Data Temperature and bucket storage: optimizing data storage costs

09 March 2023

In an increasingly connected world, data is rapidly gaining importance, and the ability to make data-driven decisions is a decisive factor in the competitiveness of companies. Thus, data management in an organization is an issue of utmost importance: the definition of the data model, the paradigm to be adopted, the type of database, and many other factors can have a significant impact on the ability to properly leverage data.

At Mia‑Platform we have long emphasized the importance of Fast Data, that is, data that needs to be updated and available in real‑time 24/7. We have developed Mia‑Platform Fast Data, a product that allows you to easily manage this kind of data and build your own Digital Integration Hub. After enhancing it with low‑code features, we have now introduced an important new feature: support for bucket storage.

Before delving into the new feature, let us take a step back and introduce the topic of data temperature. This will allow us to better explain how bucket storage works and why we introduced it.

 

What is data temperature and why it is important

Data temperature refers to the frequency of data access: the more a data item is used, the higher its temperature. As we will see in detail below, according to this logic, data can be divided into three categories: Hot Data, Warm Data, and Cold Data. Some organizations reduce the categories to just Hot and Cold Data, considering cold all data that is not strictly hot.

Before continuing, a disclaimer is in order. In the literature, Hot Data usually refers to data held in RAM, while Cold Data covers all other data stored in databases. This article analyzes only the second category. Although all of this data is stored in databases, it is not all the same: its usage can vary widely, so it can be divided into further sub‑temperatures within the macro‑division between RAM and database. The data temperature discussed in this article must therefore be understood in the context of data saved in databases.

Thus, data temperature is not an intrinsic property but a quality assigned to data based on its use over time. Some data may keep the same temperature over time, while other data that is considered Hot Data one day may become Warm Data after a certain period and later complete the transition to Cold Data. Depending on the type of data, the transition may also go directly from Hot to Cold Data.

Why is it important to set the data temperature? Because temperature depends directly on how often data is accessed, establishing it correctly makes it possible to match each class of data to a suitable type of storage, optimizing storage capacity and resource utilization.
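
As a rough illustration, assigning a temperature from access frequency could be sketched as follows. The function name, the 30-day window, and both thresholds are hypothetical values chosen for the example, not figures from Mia‑Platform Fast Data.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: average accesses per day over the window.
HOT_MIN_PER_DAY = 100.0
WARM_MIN_PER_DAY = 1.0


def classify_temperature(access_times, now, window=timedelta(days=30)):
    """Return "hot", "warm" or "cold" from the access rate within the window."""
    # Only accesses that fall inside the observation window count.
    recent = [t for t in access_times if now - window <= t <= now]
    per_day = len(recent) / (window.total_seconds() / 86400)
    if per_day >= HOT_MIN_PER_DAY:
        return "hot"
    if per_day >= WARM_MIN_PER_DAY:
        return "warm"
    return "cold"
```

Because the result depends only on recent accesses, the same record can be classified as hot today and as warm or cold once it stops being read, which mirrors the transitions described above.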

 

Hot Data

In the context of data stored on databases, Hot Data is the data that is accessed and used very frequently. Hot Data is often made available in real‑time, and therefore needs to be saved to very high‑performance databases. Strictly speaking, Hot Data is data held in RAM, but, as mentioned above, this analysis refers to data on databases.

Examples of Hot Data are the real‑time inventory of a warehouse, the current location of a vehicle, and the active users of a particular service.

 

Warm Data

Warm Data is data that is accessed less frequently than Hot Data but still needs to be available relatively quickly. This type of data is generally stored on well-performing databases to ensure short access times.

 

Cold Data

Cold Data, on the other hand, is data that is rarely used. For this reason, it is usually saved on bucket storage or on databases that are less powerful in terms of performance but offer large storage capacities. System backup files are a good example of Cold Data, as they need to be kept for auditing and security reasons but are rarely used and do not need to be made available quickly.

 

Data temperature and bucket storage

Having defined what data temperature is and the different types of data, it is easier to explain why we have introduced support for bucket storage within Mia‑Platform Fast Data. Our product collects data from the various business systems in use (also called Systems of Record or SoR), organizes it into Single Views according to defined business logic, and makes it available in real‑time to external systems. This data is mostly Hot Data, which is queried, displayed, and updated many times: for this reason, it is stored on a high‑performance database such as MongoDB.

Once this data cools down, however, there is no longer any need to keep it on high‑performance media, where space is generally limited and expensive. To collect this data (which can be either Warm or Cold Data) we have introduced support for bucket storage, a great innovation that helps optimize resources and reduce costs.

 

What is a bucket storage

A bucket storage is an object container optimized for handling large amounts of data. It is not a true database, because its capacity to update individual records is generally lower than a database's, but it offers much more storage space.

In addition, a bucket storage is an excellent solution for storing unstructured raw data, which makes it possible to defer organizing that data to a later time, where needed.

 

Mia‑Platform Fast Data and the support for bucket storage

How does Mia‑Platform Fast Data’s support for bucket storage work? In the ingestion phase, that is, when the data is collected to be sent to the MongoDB instances of Mia‑Platform Fast Data, the same data is also sent to the bucket storage in raw form. The support introduced also allows the raw data to be organized and structured some time after it was written. This feature allows you to separate raw data from structured data, saving them in different folders or possibly even on a second bucket storage.

Depending on requirements, storage can occur in parallel (the writes to MongoDB and to the bucket storage happen at the same time through two separate actions) or sequentially. In the latter case, the write to the bucket storage occurs before the write to MongoDB, adding a negligible latency that does not impact the performance of Mia‑Platform Fast Data. The parallel architecture ensures lower latency but could cause inconsistencies between the data on the bucket storage and the data on MongoDB, while the sequential architecture avoids inconsistencies between the two stores at the cost of a slight, negligible increase in latency.
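
The difference between the two write strategies can be sketched with in-memory stand-ins for the bucket and for MongoDB. This is only an illustration of the trade-off, not the Fast Data implementation: `InMemoryStore`, `write_sequential`, and `write_parallel` are hypothetical names, and real clients would replace the dictionaries.

```python
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Hypothetical stand-in for a bucket or a database; can fail on demand."""

    def __init__(self, fail=False):
        self.items = {}
        self.fail = fail

    def write(self, key, value):
        if self.fail:
            raise IOError("simulated write failure")
        self.items[key] = value


def write_sequential(bucket, database, key, value):
    """Write to the bucket first; reach the database only if that succeeded."""
    bucket.write(key, value)  # an exception here stops the flow
    database.write(key, value)


def write_parallel(bucket, database, key, value):
    """Submit both writes at once and return the names of the failed ones."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {
            "bucket": pool.submit(bucket.write, key, value),
            "database": pool.submit(database.write, key, value),
        }
    # Leaving the "with" block waits for both writes to finish.
    return [name for name, f in futures.items() if f.exception() is not None]
```

In this sketch, a failed bucket write in the sequential flow means the database is never written, so the database cannot hold a record that the bucket lacks. In the parallel flow one write can succeed while the other fails, and the caller has to reconcile the two stores afterwards.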

The currently supported services are Google Cloud Storage and all buckets that adopt the S3 protocol, e.g., Amazon S3 and Oracle Object Storage.

 

Advantages of bucket storage

From a Composable Architecture perspective, the bucket storage can be employed independently of Mia‑Platform Fast Data, as a storage space to be used whenever needed. The main benefits of bucket storage, in general, include:

  • Providing large storage space at a reduced cost;
  • Becoming an easily accessible repository for backup and compliance reasons;
  • Enabling easy exposure of data to business intelligence (BI) tools.

In addition, when used in conjunction with Mia‑Platform Fast Data, there are additional benefits:

  • Allowing only strictly necessary data to be left on MongoDB, leaving free space for other data;
  • Fostering decoupling from Systems of Record (SoR), i.e., the systems from which data is collected, as data is replicated to the bucket storage;
  • Increasing the speed of reingestion of data in Mia‑Platform Fast Data if this operation is necessary.
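
The first and last of these benefits can be pictured with a small sketch in which two dictionaries stand in for MongoDB and the bucket. Everything here is an assumption made for illustration (the function names, the `last_access` mapping, numeric timestamps); it does not describe how Mia‑Platform Fast Data performs these operations.

```python
def evict_cooled(hot_store, bucket, last_access, now, max_idle):
    """Drop from hot_store the keys idle longer than max_idle, if backed up.

    Hypothetical inputs: two dicts, a {key: last access time} mapping, and
    numeric timestamps expressed in the same unit as max_idle.
    """
    evicted = []
    for key in list(hot_store):
        # Keys with no recorded access are treated as just accessed.
        idle = now - last_access.get(key, now)
        # Evict only data that already has a copy on the bucket.
        if idle > max_idle and key in bucket:
            del hot_store[key]
            evicted.append(key)
    return evicted


def reingest(hot_store, bucket, keys):
    """Copy the given keys back from the bucket into the hot store."""
    for key in keys:
        hot_store[key] = bucket[key]
```

Since the bucket already holds a copy written at ingestion time, removing a cooled record from the hot store loses nothing, and bringing it back does not require querying the source systems again.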

The only limitation of bucket storage concerns the frequency of data updates. This is a general characteristic of this type of tool, as bucket storage allows for a limited number of operations compared to a high-performance database such as MongoDB. However, the support introduced in Mia‑Platform Console provides a feature to mitigate this constraint: updates can be grouped for each file, so that only the most recent update is applied, in a single operation, while the history of individual changes is kept.
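
The idea of grouping updates per file can be sketched as below. The `(file_key, timestamp, payload)` tuple shape and the `group_updates` name are assumptions made for this example; the actual feature may represent updates differently.

```python
from collections import defaultdict


def group_updates(updates):
    """Group (file_key, timestamp, payload) updates by file.

    Returns {file_key: {"latest": payload, "history": [(timestamp, payload)]}}
    so that each file is written once, with its most recent payload.
    """
    by_file = defaultdict(list)
    for file_key, timestamp, payload in updates:
        by_file[file_key].append((timestamp, payload))

    result = {}
    for file_key, changes in by_file.items():
        changes.sort(key=lambda change: change[0])  # oldest first
        result[file_key] = {"latest": changes[-1][1], "history": changes}
    return result
```

With this grouping, a file that received many updates in a short interval costs one write on the bucket instead of one write per update, while the `history` list still records every individual change.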

 

Conclusion

Knowing the temperature of your data matters because not all data needs to be stored on high‑performance databases: each piece of data can be stored on a different type of media according to specific needs, optimizing resources and reducing costs.

For this reason, we have introduced support for bucket storage, a storage space optimized for collecting large amounts of data, so that Cold Data can be stored on a medium suited to how it is used. In addition, bucket storage can be used to free up space on Mia‑Platform Fast Data to add more data and to achieve total decoupling from source systems.

If you are not yet familiar with Mia‑Platform Fast Data, read this article to learn more; if you already use it and want to add this new feature, you can refer to the documentation.
