As an organisation grows, its software stack generally grows too, and this is where the issue of data integration arises. According to this report by Forrester, “Large organisations use an average of 367 different software tools, creating data silos and disrupting processes between teams”. No wonder, then, that “80% of respondents say reducing silos are a top priority for their organisation”.
To break down these silos, most organisations are investing in data integration solutions. According to Gartner, “Through 2024, 75% of midsize to large enterprises will leverage at least two different integration tool categories in order to strategically address most of their pervasive integration needs”.
Over the years, many different data integration tools have been developed, each addressing particular problems and focusing on specific aspects. Some of them are quite obsolete nowadays; for example, we have already explained why the ESB is an outdated approach. Recently, a new architectural paradigm known as Data Fabric has emerged. Since it is an innovation that many are not yet familiar with, we have decided to dedicate some blog posts to better understanding what it is. We will do this by comparing Data Fabric to other, more widely known data integration tools.
In this first article, we will compare Data Fabric to iPaaS (Integration Platform as a Service). We will provide a general overview of each paradigm, and then compare their main features and capabilities.
What is Data Fabric
As defined by IBM, a Data Fabric is “an architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through intelligent and automated systems”. According to Gartner, “A data fabric is an emerging data management design for attaining flexible, reusable and augmented data integration pipelines, services and semantics” (see link for the full definition).
Data Fabric is becoming increasingly popular mostly because it is strongly focused on automation. Another key characteristic of Data Fabric is that it is driven by metadata. To delve more deeply into its characteristics, you can read this blog post illustrating the 5 key capabilities you need in a Data Fabric solution.
What is iPaaS
Gartner defines Integration Platform as a Service (iPaaS) as “a suite of cloud services enabling development, execution and governance of integration flows connecting any combination of on-premises and cloud-based processes, services, applications and data within individual or across multiple organisations”. IBM, on the other hand, defines iPaaS as “a self-service cloud-based solution that standardizes how applications are integrated”.
As its name and definition suggest, iPaaS is strongly focused on integrating applications, and particularly their data. Some of its other key capabilities include:
- Adapters to simplify configuration connectivity;
- A low-code workflow environment that also involves less technical users;
- Support for hybrid deployment models.
Yet, iPaaS has severe limitations that do not allow it to scale effectively. Both its performance and availability are strictly dependent on the performance and availability of the backend systems to which it connects. If those systems are slow or overloaded, all iPaaS processes are affected in terms of speed, responsiveness, availability, and reliability.
iPaaS solutions often implement caching, redundancy, failover mechanisms, and error-handling strategies. While these approaches can mitigate the problem, it cannot be completely eliminated without substantial work on the backend systems themselves.
Understanding Integration: Three Main Use Cases
When speaking about integration, there are generally three main use cases you might want to address. To be considered an iPaaS, a platform must support at least one of them. This section discusses each use case in more detail.
Data consistency
This pattern aims to ensure that data is synchronized and consistent across different sources. In other words, you want to make sure that data about certain business entities scattered across multiple databases and applications is in sync. Such business entities include, but are not limited to, customers, products, suppliers, employees, patients, citizens, and assets. For example, the address of a given customer should be the same across your CRM, ERP, and billing applications.
iPaaS solutions that address data consistency offer two different capabilities, depending on how data synchronization is performed:
- Applications: when the purpose is to keep applications synchronized, iPaaS has the ability to detect changes in the source application and trigger integration processes to validate, enrich, and transform data, and then route it to destination applications.
- Data: when the purpose is to centralize data, iPaaS has the ability to create data pipelines from a variety of disparate applications and data sources into a destination data endpoint, such as a data warehouse or data lake.
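As a rough illustration of the first capability, the sketch below shows the validate-enrich-transform-route sequence for keeping a customer address in sync. All system names, record shapes, and helper functions are hypothetical, not a real iPaaS API:

```python
# Hypothetical sketch: keeping a customer's address consistent across
# destination applications (e.g. ERP and billing) after the CRM reports
# a change. Record shapes and names are illustrative only.

def validate(record):
    """Reject change records missing the fields we synchronise."""
    return "customer_id" in record and "address" in record

def transform(record):
    """Normalise the address before routing it downstream."""
    return {**record, "address": record["address"].strip().upper()}

def route_change(record, destinations):
    """Propagate a validated, transformed change to every destination app."""
    if not validate(record):
        raise ValueError("invalid change record")
    normalised = transform(record)
    for app in destinations.values():
        app[normalised["customer_id"]] = normalised["address"]
    return normalised

# The CRM detects a change; the ERP and billing systems must stay in sync.
destinations = {"erp": {}, "billing": {}}
change = {"customer_id": "c-42", "address": " 1 main street "}
route_change(change, destinations)
```

A real platform would of course detect changes via connectors or CDC rather than receive them as plain dictionaries; the point is only the validate-transform-route flow.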
Multistep process
This pattern aims to create a single process in which each step is triggered by an event occurring in another application. In other words, you want independent applications to collaborate in order to streamline a certain business process by automatically synchronizing their activity and exchanging data. For example, your supply chain management (SCM) application notifies your warehouse management system (WMS) about the arrival of a certain quantity of a product. In turn, the WMS updates your ERP by exchanging financial data about the new products.
Multistep processes can be divided into two main categories:
- Internal: when the automated tasks and processes are all run within the organisation. All applications, services, and data sources are inside the organisation network;
- External: when the automated tasks and processes involve business partners’ applications, services, or data sources.
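The SCM → WMS → ERP example above can be sketched as an event-driven chain, where each application subscribes to the events it cares about. The in-memory bus, event names, and handlers below are assumptions for illustration, not a specific product feature:

```python
# Hypothetical sketch of a multistep process: each step reacts to an
# event raised by the previous application. The bus is illustrative.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers[event_type]:
            handler(payload)

bus = EventBus()
log = []

# Step 1: the SCM notifies the WMS that goods have arrived.
def wms_on_goods_arrived(payload):
    log.append(f"WMS: stocked {payload['qty']} x {payload['sku']}")
    bus.publish("stock_updated", payload)   # triggers the next step

# Step 2: the WMS update triggers a financial update in the ERP.
def erp_on_stock_updated(payload):
    log.append(f"ERP: booked inventory value for {payload['sku']}")

bus.subscribe("goods_arrived", wms_on_goods_arrived)
bus.subscribe("stock_updated", erp_on_stock_updated)
bus.publish("goods_arrived", {"sku": "P-100", "qty": 50})
```

In a real deployment the bus would be a message broker spanning the organisation's network (internal) or crossing it towards partners (external), but the chaining logic is the same.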
Composite services
This pattern aims to develop a new application that needs to access data in preexisting applications, particularly legacy ones. For example, you want to develop a mobile app for your salespeople that displays and updates customer data in your ERP and CRM systems. Modern applications usually expose APIs to support this use case, but legacy applications typically do not, which makes it more difficult to address than it may seem at first.
Depending on how you expose the data — usually via APIs — you can have two types of composite services:
- Internal: when the service exposes data that is discoverable and available for use within the organisation;
- External: when the service exposes data that is discoverable and available for people outside the organization, such as business partners and/or customers.
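A minimal sketch of the salespeople-app example, assuming a CRM that exposes an API and a legacy ERP that only has flat rows (so we wrap it in an adapter). All record shapes and function names are hypothetical:

```python
# Hypothetical composite service: one response combining customer data
# from a modern CRM and a legacy ERP. Data and shapes are illustrative.

CRM_DB = {"c-42": {"name": "Acme Ltd", "email": "info@acme.example"}}
ERP_ROWS = [("c-42", "NET30", 12500.0)]  # legacy flat rows: id, terms, balance

def crm_api_get_customer(customer_id):
    """Stand-in for a modern API call to the CRM."""
    return CRM_DB[customer_id]

def erp_adapter_get_customer(customer_id):
    """Adapter giving the legacy ERP rows an API-like shape."""
    for cid, terms, balance in ERP_ROWS:
        if cid == customer_id:
            return {"payment_terms": terms, "balance": balance}
    return {}

def composite_customer(customer_id):
    """The composite service: one unified record from both systems."""
    return {"id": customer_id,
            **crm_api_get_customer(customer_id),
            **erp_adapter_get_customer(customer_id)}

view = composite_customer("c-42")
```

Exposing `composite_customer` behind an internal or public API is what distinguishes the internal from the external variant described above.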
Data Fabric and iPaaS: Unpacking Capabilities and Crafting Synergy
Having clarified what a Data Fabric and an iPaaS are, in this section we will show how these two tools can work together. In showing the possible synergies between them, the differences will also emerge.
First of all, it is important to note that they share several capabilities, including:
- Change Data Capture (CDC);
- Batch Processing;
- Microservices architecture;
- Simple Data Exposure through API;
- Historical data;
- Near real-time data update, but with different approaches. For iPaaS, this capability is related to replicated data coming from the backend systems. For Data Fabric, near real-time data update is performed on aggregated data.
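To make the first shared capability concrete, here is a deliberately simplified sketch of change data capture. Real CDC implementations read the database's transaction log; diffing snapshots, as below, is only an illustration of the concept:

```python
# Minimal sketch of change data capture: detect which records changed
# between two snapshots of a source system. Log-based CDC and deletion
# handling are omitted for brevity; data is illustrative.

def capture_changes(previous, current):
    """Return the keys whose values were added or modified."""
    changes = {}
    for key, value in current.items():
        if previous.get(key) != value:
            changes[key] = value
    return changes

before = {"c-1": "old street", "c-2": "same street"}
after = {"c-1": "new street", "c-2": "same street", "c-3": "added street"}
changes = capture_changes(before, after)
```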
Besides these shared characteristics, iPaaS and Data Fabric have different capabilities. While some of them are substantial differences, some features are complementary to each other. This means that you can have a Data Fabric and an iPaaS working in synergy, each of them focusing on its particular strengths. For instance, if you already have an iPaaS operating in your organization, you can use a Data Fabric to expand and improve its capabilities.
Let’s take a deeper look at how a Data Fabric can enhance an iPaaS system and at their differences.
Integration
The first aspect to consider is integration, since it is the main priority of both tools. The three use cases involved are illustrated in the previous section.
To qualify as an iPaaS, a platform must support at least one of the three use cases mentioned above (data consistency, multistep processes, composite services). Consequently, most iPaaS do not provide all of these capabilities, and it is necessary to analyze what each provider actually offers. You can have an iPaaS that solves the problem of data consistency, another focused only on multistep processes, and yet another for composite services. If you are facing all three issues, you might need three different iPaaS to solve them all.
Data Fabric, on the other hand, is a single solution that supports both data consistency and composite services; only multistep processes are generally not covered by Data Fabric. However, a Data Fabric solution can be leveraged to improve the performance of an iPaaS, thus covering all possible scenarios. The iPaaS can be connected to the Data Fabric, which takes care of integrating data from the underlying backend systems. As the Data Fabric is cloud-based and highly performant, the iPaaS is no longer dependent on the low performance of the backend systems and becomes more performant as well.
Data aggregation
Since it is strongly focused on integration, iPaaS usually does not support data aggregation. This task is generally outside the scope of iPaaS and, if needed, is outsourced to another dedicated tool such as a data warehouse or a data lake.
Data aggregation, by contrast, is one of the core capabilities of Data Fabric. A Data Fabric architecture comprises several layers, each with a specific purpose. Data is first integrated in a dedicated layer, then a second layer takes care of aggregation. A single Data Fabric solution that handles both data integration and data aggregation provides a more streamlined, efficient, and cost-effective approach to managing your data infrastructure. It simplifies data management processes, enhances data consistency, and improves overall system performance, making it a preferred choice for organisations looking to harness the full potential of their data.
Data transformation
Data transformation is a crucial process that involves converting data from one format to another to make it usable and valuable. Generally, iPaaS does not feature advanced data transformation capabilities. Data mapping, the pre-transformation step, is performed manually, requiring significant time and specialised staff.
Data Fabric, on the other hand, can automate this task, mainly thanks to its capability to leverage metadata. As a consequence, the whole transformation process is faster and more effective. Transforming data into a predetermined data model shared within the organisation also enables the creation of single views, i.e. unified collections of all the data about a business entity.
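The idea of metadata-driven transformation can be sketched as follows: instead of hand-coding each mapping, a metadata table declares how every source field maps onto the shared model, and one generic function applies it. The field names, sources, and mapping table below are assumptions for illustration:

```python
# Hypothetical metadata-driven transformation: the mapping lives in
# data (metadata), not in per-source code. Names are illustrative.

# Metadata: source field -> canonical field in the shared data model.
FIELD_MAP = {
    "crm": {"cust_name": "name", "cust_mail": "email"},
    "erp": {"ragione_sociale": "name", "saldo": "balance"},
}

def transform(source, record):
    """Project a source record onto the shared model using metadata only."""
    mapping = FIELD_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def single_view(records_by_source):
    """Merge transformed records into one single view of the entity."""
    view = {}
    for source, record in records_by_source.items():
        view.update(transform(source, record))
    return view

view = single_view({
    "crm": {"cust_name": "Acme Ltd", "cust_mail": "info@acme.example"},
    "erp": {"ragione_sociale": "Acme Ltd", "saldo": 12500.0},
})
```

Adding a new source then means adding one entry to the metadata table rather than writing new transformation code, which is the automation gain described above.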
Real-time data availability
As mentioned in this previous article about the five key capabilities you need in a Data Fabric solution, real-time data availability and access is one of the most important features of Data Fabric. Moreover, Data Fabric can aggregate data to produce single views and make them available in real time. This replica of all the data can then be used by the iPaaS as the source for further processes. Thanks to the single views, you also have a unified and comprehensive view of your data, no matter how scattered it is across the original data sources.
Connecting an iPaaS to a Data Fabric is a practical solution to the performance issues associated with the former: if data is served in real time by the Data Fabric, there will no longer be any slowdowns or bottlenecks caused by the underlying low-performing systems.
Efficient data repository
As mentioned above, the Data Fabric can act as the source of data to which the iPaaS connects. This is mainly because Data Fabric is designed to store and host data. Acting as a centralised repository, it can manage and serve data efficiently and effectively. A great benefit of this approach is that the Data Fabric can send update notifications without the need for a dedicated CDC mechanism.
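The "notifications without CDC" point amounts to an observer pattern on the repository: since the repository handles every write itself, it can notify consumers directly instead of having something watch the database for changes. The class and names below are a hypothetical sketch, not a real product interface:

```python
# Hypothetical sketch: a central repository that pushes update
# notifications to subscribers (e.g. iPaaS flows) on every write,
# removing the need for a separate CDC component.

class FabricRepository:
    def __init__(self):
        self.data = {}
        self.subscribers = []

    def subscribe(self, callback):
        """Register a consumer for update notifications."""
        self.subscribers.append(callback)

    def upsert(self, key, value):
        """Store the value and notify every subscriber of the change."""
        self.data[key] = value
        for callback in self.subscribers:
            callback(key, value)

repo = FabricRepository()
received = []
repo.subscribe(lambda key, value: received.append((key, value)))
repo.upsert("customer:c-42", {"address": "1 Main Street"})
```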
Having all data in a single point of access benefits other data services as well as the iPaaS. Data governance becomes easier and more efficient, and teams can dedicate more time and resources to building and maintaining services instead of chasing data issues and fixing pipeline errors caused by poor data.
On the other hand, it is also possible to work the other way around, i.e. connecting a Data Fabric to an iPaaS. This way, the iPaaS connects directly to the backend systems and then sends data to the Data Fabric which, in turn, aggregates it. Aggregated data can then be used for several purposes.
As the number of systems and software used within an organisation increases, data integration becomes more and more important to ensure a comprehensive business overview. Several different tools can be used to integrate data from multiple sources, and in this article, we have compared iPaaS, one of the most popular solutions, and Data Fabric, a new emerging paradigm.
iPaaS is a paradigm that has been established in the market for longer and is therefore more common, but it has severe performance limitations that can affect its effectiveness. Data Fabric, instead, is a new paradigm that is gaining popularity thanks to its focus on automation and metadata, which make it very fast and more complete.
There is some overlap between iPaaS and Data Fabric: several of the capabilities they offer are very similar, but there are also substantial differences between them. It is important to note, however, that Data Fabric and iPaaS can work in synergy: by connecting the two tools, it is possible to leverage the strengths of one to mitigate the limitations of the other.
Mia-Platform Fast Data is a Data Fabric solution that can be used to enhance an existing iPaaS, expanding its capabilities, or to perform data integration, aggregation, and transformation in a single tool. To further explore all the features of Mia-Platform Fast Data, take a look at the documentation and book a free demo to see it in action.