By choosing the document model, it is possible to make some data consistency checks, but not others. For example: by using the third approach, it is possible to survey the books of a non-existent author. Indeed, in the Books collection, you could insert a document whose author is not registered in the Authors collection.
Reading access to data
By using the relational model, the data is always accessible. As a matter of fact, it is possible to access all the information from both tables: starting from Authors and – through
JOIN – it is possible to find the data of the books and vice versa.
SELECT * from Authors JOIN Books on Authors.id = Books.author_id where Author.id=??;
SELECT * from Authors JOIN Books on Authors.id = Books.author_id where Book.id=??;
Both readings involve both entities, so the editing to the schema affects the reading queries: adding or removing a property the query can return more or fewer properties. As a matter of fact, if you add a column to the Authors table, both read queries will not be affected.
Using the document model and referring to the three hypotheses mentioned above bring to three possibilities:
- In the first case, the access is simple if you have to locate an author, but to locate a book you have to select the author and all his books will necessarily return;
- In the second case, the access is simple if you have to locate a single book, but in order to locate all the books of an author, you have to look for the author in the Books collection as a grafted property. In this way, however, you would have a repetition of the author information for each book;
- In the third model, however, by choosing the starting collection, you can easily obtain the necessary data without duplication of information.
In the document model, therefore, it is very important to choose the most suitable data model for the required reading activities.
Writing access to data
The relational model allows you to write the data in one place. In this way, updating or removing a row can be done by a single command without the use of
JOIN: automatically, all new reading queries will inherit the change that was made to the data.
Based on the approach, the document model can provide for a greater complexity of writing implementation:
- In the first approach, updating the Author or a Book can be executed by a single request to the database;
- In the second case, updating a book can be performed with a single query; even the updating of an author can be performed with a single query, but this must consider different documents;
- Finally, in the third case, which involves both collections, updating the book or author must be performed with two different queries (one per collection) to ensure data consistency.
Therefore, the document model might not respect the atomicity and consistency requirements of the transactions: the writing must be performed in two sequential actions, thus violating the atomicity principle. There is also a time, albeit short, in which a collection contains some documents that have not yet been updated.
Which should you choose?
As always, there is no silver bullet: it depends on the type of project and on the critical issues of the application.
However, we can highlight some questions that can be useful in choosing which approach to use.
Do you foresee more readings or more writings?
If you think of a book management application, such as a library, it is more likely that the readings will be executed more frequently than the writings, since the catalog consultation is the main feature. As a matter of fact, the requests for writing would be limited to the booking and/or editing of data, less frequent compared to consulting the catalog.
In these types of contexts, writing and reading operations have different weights.
In a relational database, which includes a large number of tables or records, the reading queries might be less performing, if they involve many tables. Databases with more than a thousand tables are difficult to consult since the number of crossings on these tables is large. Similarly, relational databases with a very high number of records (even billions of rows per table) and with a high number of columns can be affected by the same performance problem. In this case, a document model may be the most appropriate choice.
Otherwise, if your project involves more frequent writings, it may be more useful to adopt a relational model which, as we have seen, does not provide copies of data.
Is the synchronization important?
In a SQL relational database, being ACID, the problem does not exist: if the data is updated, each query executed after the update reads the new data.
In a document database, on the contrary, if the data is replicated in different collections, its updating must be propagated in all its copies: it is necessary to keep track of where each copy of the data is made.
In this regard, there are architectural solutions that decouple the writing on the main collection from the propagation on the copies through a queue system. For example, by using a queue system – such as Kafka – it is possible to update the various copies asynchronously, without having a central census point of all the copies: this facilitates the management of the copies’ writings.
In any case, a delay during the data updating must be endured.
Do you foresee many evolutions in the database schema?
During the development of a project, it may happen that you need to change the data schema to add, remove or edit properties. Based on the diversity of the data present and on the type of modification, this operation can be very complex.
In fact, the schema changes can impact all reading and writing queries. This means that for each change, a review of all queries – involving the modified tables – must be performed.
Suppose that you want to evolve the library database schema by adding further information on the author, such as literary prizes, won or participation in competitions.
In a relational model, you can add columns to the Authors table or make one or more tables related to the author.
In the first case, in order to take into account all the possible cases, it is necessary to add a series of columns, one for each information. This implementation allows, for each author, the census of only one data per property thus limiting the capabilities of the model.
In the second case, it is necessary to add a table for each case, with the related foreign keys. For example, you can create an Awards table, a Competitions table and two bridging tables between Authors and the other two tables. For any other information that you want to add, it would be necessary to create two tables. The limitation of this implementation is the maintainability of a large number of tables.
Using a document model instead, one possibility could be to add a field to the Authors collection document, inserting an array of documents: this way you can keep track of all the information requested.
In addition, by attributing a logical meaning to the
null value, you can distinguish who has not received awards from those for whom the information is unknown: if the field is an empty array, it means that the author has no awards; otherwise, if the field is
null, it means that the information cannot be found.