Data Fabric Versus All: ETL – The Integration Revolution

6-minute read
28 November 2023

Data Fabric is quite a recent concept, and if you are not a data scientist you may not be familiar with it yet. For this reason, we decided to create a series of blog posts comparing Data Fabric to more well-known tools, to facilitate understanding of this new paradigm. This article is the second of the series, following the comparison between Data Fabric and iPaaS. We recommend reading the first article before this one, as some concepts introduced there will be taken for granted here.

In this article, we continue comparing the Data Fabric paradigm with another data integration pattern: ETL (Extract, Transform, Load). We will cover what ETL and ELT (Extract, Load, Transform) are, their main capabilities, and how they differ from Data Fabric.

What is ETL (Extract, Transform, Load)?

According to IBM, ETL, which stands for extract, transform, and load, is a data integration process that combines data from multiple data sources into a single, consistent data store, which is loaded into a data warehouse or other target system. It is important to note that this definition describes a process: as long as you follow its three steps, you can customize each one according to your specific needs.
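To make the three steps concrete, here is a minimal sketch of an ETL pipeline in Python. The source records, table name, and field names are hypothetical, and an in-memory SQLite database stands in for the target data warehouse:

```python
import sqlite3

# Hypothetical raw records, standing in for rows pulled from source systems.
source_rows = [
    {"id": 1, "name": " Alice ", "amount": "100.50"},
    {"id": 2, "name": "Bob", "amount": "99.99"},
]

def extract():
    """Extract: read raw records from the source system(s)."""
    return source_rows

def transform(rows):
    """Transform: clean and normalize the records before loading."""
    return [
        {"id": r["id"], "name": r["name"].strip(), "amount": float(r["amount"])}
        for r in rows
    ]

def load(rows, conn):
    """Load: write the consistent records into the target store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (:id, :name, :amount)", rows)

conn = sqlite3.connect(":memory:")  # stand-in for the real data warehouse
load(transform(extract()), conn)
print(conn.execute("SELECT name, amount FROM customers").fetchall())
# [('Alice', 100.5), ('Bob', 99.99)]
```

Each function can be swapped out independently, which is exactly the customization point the definition above makes: the process is fixed, the steps are yours.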

What is ELT (Extract, Load, Transform)?

ELT is similar to ETL, but the steps are executed in a different order: raw data is loaded into the final data store (whether it is a database, a data lake, a data warehouse, or something else), which then takes care of the transformation process.

Data transformation is the most complex and resource-intensive step of the whole process: postponing it to the end makes the first two steps faster. For this reason, ELT is preferable for ingesting large amounts of unstructured data. Additionally, ELT is a good fit for use cases where the transformation itself is lightweight and does not require many computational resources.

However, it is important to note that data is persisted in the load step: since transformation happens after loading, only the raw data is persistent by default, not the transformed data. For this reason, ELT is recommended for purposes that do not need the transformed data to be persistent.

Data Fabric vs ETL: differences and synergies

For the definition of Data Fabric, please refer to the previous article.

Since we are comparing the two paradigms, it is worth noting that Data Fabric can be considered an evolution of ETL/ELT: the latter are long established, while the former is more recent. Generally speaking, a Data Fabric can perform all three steps that compose an ETL/ELT process, but it is not limited to them. Other Data Fabric capabilities, beyond the ETL ones, include:

  • Data virtualization;
  • Metadata management;
  • Data security;
  • Near real-time data distribution.

In the following sections, we will unpack the main capabilities of both ETL/ELT and Data Fabric, showing the differences between them. In doing so, we will also highlight the possible synergies that may emerge.

Disclaimer: from now on, for the sake of readability, we will speak only of ETL while also referring to ELT, since ETL is the more widely adopted of the two and ELT can be considered a variant of it.

Change Data Capture (CDC)

The main difference between Data Fabric and ETL is that the latter generally does not include a Change Data Capture (CDC) mechanism (although some recent ETL tools do). For most of them, a separate CDC solution must be implemented to notify the ETL system and trigger extraction when needed. Without a supporting CDC, ETL must extract data on a periodic basis. Moreover, some source systems cannot notify or keep track of changed data: in such cases, you need to extract all of the data every time. For large datasets, this step can require a lot of time and resources, even when there are few actual changes.
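The contrast between full and incremental extraction can be sketched as follows. This is a simplified "poor man's CDC" based on a change-timestamp watermark, with a hypothetical `orders` table in an in-memory SQLite database standing in for the source system; real CDC solutions typically read the database's change log instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the source system
conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2023-11-01"), (2, "2023-11-20"), (3, "2023-11-27")],
)

def extract_full(conn):
    """Without CDC: every periodic run re-reads the whole table."""
    return conn.execute("SELECT id FROM orders").fetchall()

def extract_incremental(conn, watermark):
    """Watermark-based incremental extraction: only rows changed since
    the last run. Requires the source to track a change timestamp."""
    return conn.execute(
        "SELECT id FROM orders WHERE updated_at > ?", (watermark,)
    ).fetchall()

print(len(extract_full(conn)))                       # 3 rows, every time
print(len(extract_incremental(conn, "2023-11-15")))  # 2 changed rows
```

Even in this toy example the full scan touches every row regardless of how few actually changed, which is exactly the cost the paragraph above describes.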

On the other hand, Data Fabric has a built-in CDC mechanism, so it can focus only on the data that actually needs to be extracted. Most importantly, Data Fabric can automate the extraction process from the data source. As we mentioned in another blog post, automation is one of the main responsibilities of Data Fabric, largely thanks to metadata.

In addition, extracted data is not stored in a dedicated staging area as in ETL systems, which gives Data Fabric very high performance.

Real-time Data

Another big difference is that ETL solutions cannot support real-time data, mainly because they lack a CDC mechanism. Whatever the exact order of the steps, the whole process is long and time-consuming, especially for large amounts of data, so it takes time before updated data reaches the target data store. Considering that weeks or months can pass between one scan and the next for an ETL without CDC, it becomes clear that these processes cannot serve modern cloud services that need to be always up to date.

On the other hand, making data available in real time is one of the key capabilities of Data Fabric. The whole process of detecting changed data on the source system, updating the single view, and making it available can happen in milliseconds. With a Data Fabric, you can build complex services leveraging real-time data.
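The detect-and-update flow can be sketched as a change event applied to a materialized single view. The event shape, key, and field names below are hypothetical, chosen only to illustrate the idea of keeping a view current as changes arrive:

```python
# A hypothetical change event, as a CDC mechanism might emit it.
change_event = {
    "table": "customers",
    "op": "update",
    "key": 42,
    "after": {"name": "Alice", "tier": "gold"},
}

# The single view, kept up to date in near real time.
single_view = {42: {"name": "Alice", "tier": "silver"}}

def apply_change(view, event):
    """Apply one change event to the materialized single view."""
    if event["op"] in ("insert", "update"):
        view[event["key"]] = event["after"]
    elif event["op"] == "delete":
        view.pop(event["key"], None)

apply_change(single_view, change_event)
print(single_view[42]["tier"])  # gold
```

Because only the changed record is touched, the view stays fresh without re-scanning the source, which is what makes millisecond-level propagation feasible.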

Microservices Architecture

Since ETL is an older paradigm, it is generally not built on a microservices architecture. This makes the process more straightforward, since you do not have to cope with the complexity that microservices bring, but it also has disadvantages. For example, you cannot independently scale just the component in charge of one of the steps; you have to scale the whole system, which can be very expensive in terms of resources. Most importantly, ETL tools generally do not provide APIs for communicating with other systems, so exposing data can be slow and difficult and requires a lot of manual work.

Data Fabric, on the other hand, can be built following a microservices architecture. In this way, you can scale just the microservices that need it, resulting in more optimized and sustainable resource consumption. Plus, you can use the most appropriate language and cloud-native technology for each microservice, further optimizing the whole system. In addition, since microservices communicate with each other through APIs, it is very easy to expose data to other services that may need it.

Conclusion

In this second article of the series "Data Fabric vs All", we compared Data Fabric with the ETL/ELT paradigm. We covered the main differences between them, highlighting how Data Fabric can be considered an evolution of ETL, covering the same functionalities and extending them with more capabilities.

Mia-Platform Fast Data is our Data Fabric solution built with a microservices architecture. It strongly relies on a No-Code approach so that your team can focus more on solving problems rather than making things work. To explore all the features of Mia-Platform Fast Data, take a look at the documentation and book a free demo to see it in action.
