Designing Data-Intensive Applications Notes/Overview Chapter 2 Part 1

Anthony Johnson
5 min read · Dec 16, 2020

First off, I'm writing this with dictation on my Mac so I don't have to type everything out and can just word-vomit after each section. There might be some weird grammatical errors; I will do my best to catch them. This chapter is a very dense overview of both data models and query languages, so I'm breaking it into multiple parts. This article focuses mainly on data models, and the next one will focus on query languages.

The limits of my language are the limits of my world.

— Ludwig Wittgenstein

Data models are extremely important because they change the way we think about our data when we are developing applications. A lot of things are abstracted away from us. Something very important for developers to look at is not just how they write their code, but how that code is going to be represented at the next level down.

In the book “Designing Data-Intensive Applications”, the author breaks an application down into four distinct layers of abstraction. The first is how developers look at the real world. That view is turned into an object model, and you then have to decide how to express that model in a general-purpose data model for storage and delivery, whether that is JSON, XML, tables, or a graph model. Below that are the layers that represent the data as bytes and, further down, on the hardware itself.

The main difference between relational and document databases is that a relational database has a set structure: it is built from tables with a fixed schema. A document database, often called schemaless, has a lot more flexibility, but it does not give you the same direct access to nested data.

NoSQL is usually read as “Not Only SQL”, a meaning that was attached to the name after the fact. Basically, a NoSQL database is not bound by the usual confines of a relational database, but there is also no single standard for what a NoSQL database is.

A couple of reasons why NoSQL databases came to be, and when you would use one:

  • the need for greater scalability than relational databases can easily achieve.
  • the preference for free and open-source software over commercial databases.
  • specialized query operations that are not well supported by the relational model.
  • frustration with the restrictiveness of relational schemas.

Object-relational mismatch. The object-relational mismatch is what you see when you are trying to translate an object in your code so that it fits a relational database schema. Frameworks like Entity Framework, which is used with .NET, are called object-relational mapping (ORM) frameworks. They make it easier to translate between objects and a table-based database.
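To make the mismatch a bit more concrete, here is a minimal sketch in Python. The `User` and `Position` classes, the table names, and the sample values are all made up for illustration; they are not from the book. The same object maps almost directly onto a JSON document, but has to be split across two tables and stitched back together for a relational store.

```python
import json
import sqlite3
from dataclasses import dataclass, field, asdict


@dataclass
class Position:
    title: str
    company: str


@dataclass
class User:
    user_id: int
    name: str
    positions: list[Position] = field(default_factory=list)


# Hypothetical sample object, invented for this sketch.
user = User(1, "Ada Example", [Position("Engineer", "Acme"),
                               Position("Manager", "Globex")])

# Document style: the nested object maps almost directly onto JSON.
document = json.dumps(asdict(user))

# Relational style: the one object is split across two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE positions ("
    "user_id INTEGER REFERENCES users(user_id), title TEXT, company TEXT)"
)
conn.execute("INSERT INTO users VALUES (?, ?)", (user.user_id, user.name))
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?)",
    [(user.user_id, p.title, p.company) for p in user.positions],
)


def load_user(conn, user_id):
    """Rebuild the object from both tables; this is the work an ORM hides."""
    name = conn.execute(
        "SELECT name FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT title, company FROM positions WHERE user_id = ? ORDER BY rowid",
        (user_id,),
    ).fetchall()
    return User(user_id, name, [Position(t, c) for t, c in rows])
```

The hand-written `load_user` function is roughly the translation layer that an ORM framework generates for you.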

Normalization versus denormalization. This is the idea that you shouldn’t store duplicate copies of the same human-meaningful information. For example, say we had 30 people from the same city, but they called the area they’re from by different names, or spelled the city’s name differently. With normalization, you take out the human input that creates discrepancies across your platform: the name is stored in one place and everything else refers to it by ID. Normalization sets a single standard, which makes changes much quicker and keeps your whole system consistent. It also greatly eases searching and categorizing.
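As a rough sketch of the normalization idea, assuming made-up `regions` and `people` tables in an in-memory SQLite database: each person stores only a region ID, so renaming the region is a single update and everyone picks up the new spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (region_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE people (
    person_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    region_id INTEGER REFERENCES regions(region_id)
);
INSERT INTO regions VALUES (1, 'Greater Seattle Area');
INSERT INTO people VALUES (1, 'Ana', 1), (2, 'Ben', 1), (3, 'Cy', 1);
""")


def regions_by_person(conn):
    """Join each person to the single canonical name of their region."""
    rows = conn.execute(
        "SELECT p.name, r.name FROM people p "
        "JOIN regions r ON r.region_id = p.region_id "
        "ORDER BY p.person_id"
    ).fetchall()
    return dict(rows)


# The human-readable name lives in exactly one row, so one UPDATE fixes it
# for every person that points at region 1.
conn.execute("UPDATE regions SET name = 'Seattle Metro' WHERE region_id = 1")
after = regions_by_person(conn)
```

In a denormalized design the region name would be copied into every person record, and the rename would have to touch all of them.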

Note: normalization requires many-to-one relationships, which are not an ideal fit for a document structure, because support for joins in document databases is often weak or absent.

In a document model, values like these often end up as plain strings rather than references to things. In a relational database, by contrast, you can assign IDs that point to other records, and those records have real meaning because they are used and understood across the entire platform.

Note: the controversy between relational and NoSQL/document databases is nothing new. These issues were being discussed back in the 1970s, around the time Edgar Codd proposed the relational model. Back then you had the relational model and the network model. The relational model turned into what we now know as SQL, and the network model was something similar to our document model. You could also think of it as a giant linked list of storage that was kind of difficult to navigate, as opposed to the more direct access you get with a relational database. One example of a network model is the CODASYL model… look it up. Also, something cool to note about the network model is that the only way to access a record was to follow a path from a root record, called an access path, and that is almost a direct quote from the book.
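The access path idea can be sketched in a few lines of Python. This is only a toy illustration with a made-up `Record` class and made-up record names, not how CODASYL databases were actually programmed: records hold links to other records, and the only way to reach one is to walk the links starting at the root.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    name: str
    links: list["Record"] = field(default_factory=list)


def find_access_path(root, target_name):
    """Walk the links depth-first from the root.

    Returns the chain of record names leading to the target, or None.
    """
    stack = [(root, [root.name])]
    seen = set()
    while stack:
        record, path = stack.pop()
        if record.name == target_name:
            return path
        if id(record) in seen:
            continue  # a record can be linked from more than one parent
        seen.add(id(record))
        for linked in reversed(record.links):
            stack.append((linked, path + [linked.name]))
    return None


# Hypothetical records, invented for this sketch.
invoice = Record("invoice-7")
order = Record("order-42", [invoice])
customer = Record("customer-1", [order])
root = Record("root", [customer])

path = find_access_path(root, "invoice-7")
```

There is no way to jump straight to `invoice-7` here; the application code has to know, or discover, the chain of links that leads to it.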

The section on comparison to document databases.

Document models have document references and relational models have foreign keys. If this doesn’t make sense yet, it will once you have experience with both database types.

If the application has a lot of many-to-many relationships, then using a document model can cause a lot of unnecessary complexity and slow down performance.

OK, so there are two approaches to schemas: schema-on-read and schema-on-write. The best way to think about the difference is that one is dynamic and the other is static. With schema-on-write, the relational approach, the schema is explicit and the database enforces it on everything that is written, so a change to the schema has to be made, usually with a migration, before the new data is written. With schema-on-read, the document approach, the structure is implicit and only interpreted when the data is read, so your database flexes as your requirements flex. For example, if I had to change a full name into a first name and a last name, I could just start writing new documents with the new fields, and put code in my application that handles the old documents when it reads them.
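The full name example can be sketched like this in Python. The field names `name` and `first_name` are assumptions chosen for the sketch: old documents only have `name`, new ones have `first_name`, and the reading code copes with both.

```python
def first_name(user):
    """Schema-on-read: interpret old and new document shapes at read time."""
    if user.get("first_name") is None and user.get("name"):
        # Old-style document: derive the first name from the full name.
        return user["name"].split(" ")[0]
    return user.get("first_name")


old_doc = {"name": "Ada Lovelace"}
new_doc = {"first_name": "Grace", "last_name": "Hopper"}
```

With schema-on-write, the equivalent change would be a migration, typically an `ALTER TABLE` to add the column followed by an `UPDATE` to fill it in for the existing rows.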

Storage locality. Storage locality has to do with how spread out the information that relates to a single object is within your database. For example, in a document database, all of the information for an object, or at least the majority of its one-to-many relationships, is stored within that one document. In a relational database, that information is found by reference, so you are going to make multiple queries to get it all into one spot, and that’s where your joins come into play. Keeping everything together does have a huge advantage in some situations, but not all.

This advantage is much more noticeable when you are dealing with a large document and need many parts of it at once. Also note that there can be issues when the size of a document changes: on an update the whole document usually has to be rewritten, which can affect performance.

Not sure if I already noted this in this article, but non-relational databases do have means of joining data. For example, some MongoDB drivers automatically resolve database references, and you also have databases like RethinkDB, which was mentioned in the book as supporting joins.
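Resolving a reference on the client side is basically a join done in application code. Here is a minimal sketch using plain Python dictionaries; the collections, the `organization_id` field, and the `resolve_organization` helper are hypothetical and are not the API of any real MongoDB driver.

```python
# Hypothetical "collections", invented for this sketch.
organizations = {10: {"name": "Acme"}, 11: {"name": "Globex"}}
users = [
    {"name": "Ana", "organization_id": 10},
    {"name": "Ben", "organization_id": 11},
    {"name": "Cy", "organization_id": 99},  # dangling reference
]


def resolve_organization(user, organizations):
    """Return a copy of the user with the reference replaced by the document."""
    resolved = dict(user)
    org_id = resolved.pop("organization_id", None)
    # A missing target resolves to None instead of raising an error.
    resolved["organization"] = organizations.get(org_id)
    return resolved


joined = [resolve_organization(u, organizations) for u in users]
```

Each reference here costs a lookup, so in a real database this pattern usually means extra round trips compared with a join performed inside the database.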

There is no definitive champion of data models. If you’re thinking about relational versus document, at the end of the day a hybrid of the two will likely be the best route for the future.

To Be Continued……
