Getting started
Start to learn the main concepts of Mia-Platform and how to use to develop your services
Start to learn the main concepts of Mia-Platform and how to use to develop your services
Discover more
Discover all the benefits of becoming a Mia-Platform's partner entering the program.
Discover more
In this article we are going to compare from a technical point of view the two most used data management models on the market: the relational model, which is implemented in most corporate legacy systems, and the document model, that was recently rediscovered because it is able to overcome some inflexibility of the relational model.
So, let’s see the main features, the different approaches to the models, and a series of useful questions to decide which model to adopt.
The relational model, built around the concept of relationship, structures the data into models that relate to each other through constraints that are made explicit through foreign keys: for example, the Child Table has a reference to the Master table through its ID. This relationship is expressed in a language, usually a Data Definition Language (DDL).
One of the reasons for the popularity of this model is the fact that several tools for database management are available (also for free, such as MySQL Workbench or DBeaver for database design or schema migration).
For the sake of simplicity, in the examples that will follow, we will assume that the relational model is implemented on a SQL compliant database.
The document model focuses on data, rather than relationships. In this regard, the data is not seen as a tuple (or row) within a table, but as a document with properties, which are not theoretically defined. There is no DDL language that defines a schema and therefore some documents may or may not have these properties.
In this model, the relationships are not represented by foreign keys but by nested data. Since the document is unstructured, it may include another document or an array of documents, thus creating an N: 1 or 1: N relationship between the two entities. Therefore, nesting is preferred over an explicit relationship by one or more properties.
In the examples that will follow, we will assume that the document model is implemented with the MongoDB database, which has recently become very popular in the enterprise environment.
In a relational approach, the writing activity is performed by a component that writes data in one place and there is one and only one table that contains that data. If you add a record, a new row will be added to that table. On the contrary, if you want to read that data, the reading query will typically find that information by performing the JOIN
: an operation that allows you to query multiple tables, combine their data and return them in filtered form.
On the contrary, a document model pushes the developer to replicate data in different places so that the reading can be performed considering only one data model – therefore without a solution like the JOIN
.
Let’s make a simple example, like a library: how do you represent authors and books in the data model? The use case shows the search by book or by the author which is performed by the users of the library through the designated terminals.
In a relational approach, a possible solution could be implemented by using two tables – the Authors table and the Books table – both linked together by a property of the Books table which logically relates the two tables. In this model it is also possible to force all the books to have an author, who must be registered in the database by using the Books table foreign key, that does not allow a null
value.
By using a document-oriented approach, instead, it is possible to take advantage of the nesting to describe the relationship. There could three approaches:
By choosing the document model, it is possible to make some data consistency checks, but not others. For example: by using the third approach, it is possible to survey the books of a non-existent author. Indeed, in the Books collection, you could insert a document whose author is not registered in the Authors collection.
By using the relational model, the data is always accessible. As a matter of fact, it is possible to access all the information from both tables: starting from Authors and – through JOIN
– it is possible to find the data of the books and vice versa.
SELECT * from Authors JOIN Books on Authors.id = Books.author_id where Author.id=??;
SELECT * from Authors JOIN Books on Authors.id = Books.author_id where Book.id=??;
Both readings involve both entities, so the editing to the schema affects the reading queries: adding or removing a property the query can return more or fewer properties. As a matter of fact, if you add a column to the Authors table, both read queries will not be affected.
Using the document model and referring to the three hypotheses mentioned above bring to three possibilities:
In the document model, therefore, it is very important to choose the most suitable data model for the required reading activities.
The relational model allows you to write the data in one place. In this way, updating or removing a row can be done by a single command without the use of JOIN
: automatically, all new reading queries will inherit the change that was made to the data.
Based on the approach, the document model can provide for a greater complexity of writing implementation:
Therefore, the document model might not respect the atomicity and consistency requirements of the transactions: the writing must be performed in two sequential actions, thus violating the atomicity principle. There is also a time, albeit short, in which a collection contains some documents that have not yet been updated.
As always, there is no silver bullet: it depends on the type of project and on the critical issues of the application.
However, we can highlight some questions that can be useful in choosing which approach to use.
If you think of a book management application, such as a library, it is more likely that the readings will be executed more frequently than the writings, since the catalog consultation is the main feature. As a matter of fact, the requests for writing would be limited to the booking and/or editing of data, less frequent compared to consulting the catalog.
In these types of contexts, writing and reading operations have different weights.
In a relational database, which includes a large number of tables or records, the reading queries might be less performing, if they involve many tables. Databases with more than a thousand tables are difficult to consult since the number of crossings on these tables is large. Similarly, relational databases with a very high number of records (even billions of rows per table) and with a high number of columns can be affected by the same performance problem. In this case, a document model may be the most appropriate choice.
Otherwise, if your project involves more frequent writings, it may be more useful to adopt a relational model which, as we have seen, does not provide copies of data.
In a SQL relational database, being ACID, the problem does not exist: if the data is updated, each query executed after the update reads the new data.
In a document database, on the contrary, if the data is replicated in different collections, its updating must be propagated in all its copies: it is necessary to keep track of where each copy of the data is made.
In this regard, there are architectural solutions that decouple the writing on the main collection from the propagation on the copies through a queue system. For example, by using a queue system – such as Kafka – it is possible to update the various copies asynchronously, without having a central census point of all the copies: this facilitates the management of the copies’ writings.
In any case, a delay during the data updating must be endured.
During the development of a project, it may happen that you need to change the data schema to add, remove or edit properties. Based on the diversity of the data present and on the type of modification, this operation can be very complex.
In fact, the schema changes can impact all reading and writing queries. This means that for each change, a review of all queries – involving the modified tables – must be performed.
Suppose that you want to evolve the library database schema by adding further information on the author, such as literary prizes, won or participation in competitions.
In a relational model, you can add columns to the Authors table or make one or more tables related to the author.
In the first case, in order to take into account all the possible cases, it is necessary to add a series of columns, one for each information. This implementation allows, for each author, the census of only one data per property thus limiting the capabilities of the model.
In the second case, it is necessary to add a table for each case, with the related foreign keys. For example, you can create an Awards table, a Competitions table and two bridging tables between Authors and the other two tables. For any other information that you want to add, it would be necessary to create two tables. The limitation of this implementation is the maintainability of a large number of tables.
Using a document model instead, one possibility could be to add a field to the Authors collection document, inserting an array of documents: this way you can keep track of all the information requested.
In addition, by attributing a logical meaning to the null
value, you can distinguish who has not received awards from those for whom the information is unknown: if the field is an empty array, it means that the author has no awards; otherwise, if the field is null
, it means that the information cannot be found.
In complex architectures with complex business models, the existing applications are often based on relational databases, whose configurations are based on business requirements that have followed over time – new products, changes in regulations etc.
This evolution has necessarily transformed the underneath data models, to accommodate the new volume of information and organize it within multiple tables. The data models have been able to effectively respond to business requests by increasing their structure as the volume of information to be organized increases.
In an increasingly digitized world, companies are trying to reach users with as many channels as possible and to offer them updated and available information 24/7. The proliferation of data access points and the new data availability needs cannot be satisfied by those same systems, which are not predisposed to withstand such a load of requests. For example, the reading query that was previously performed once a day is now performed continuously, every time the user accesses the app or site. This leads to performance problems, since the different required properties are allocated on hundreds of different tables.
This problem is even more evident when the touchpoint requires data from different databases. Although some database engines allow you to execute a JOIN
between different databases, located on different hosts, this approach is not recommended for three reasons:
JOIN
can lead to inefficiencies;To solve this problem, Mia-Platform has developed an architectural solution called Fast Data which involves the creation of a document database that collects information asynchronously and organizes it to meet the needs of all touchpoints.
In this way,you can experience the benefits of both models: the applications write the data on the relational systems only once and the touchpoints interrogate the document database to access the appropriately organized data safely and quickly, without weighing down the systems underneath.
There is no silver bullet: both models have their strengths and weaknesses. We have seen that, in some cases, it is better to adopt a relational model, while in other contexts the document model best applies. Therefore, there is no better model than another, but only a model that, in certain circumstances, fits better according to the type of operation to be carried out more frequently.
If the number of readings far exceeds the number of other operations, the use of a document model should be taken into consideration, especially if the readings involve many tables or many data. On the contrary, in order to avoid the duplication of data at each update, a relational model is recommended when there are many writings on the database.
The article was written by Tommaso Allevi, R&D Leader of Mia-Platform.