After quite a while this article has finally come out! I started to write it about 6 months ago but because of many factors in my life I procrastinated. This text is mainly devoted to those who want to see major pros and cons of this model. You won’t find here the nuts and bolts however. For this, move to the reference section where listed really cool URLs for the start. Further the pattern will be named as CQRS/ES for simplicity.
2 main goals of the pattern
Some developers consider that this pattern is solely used for scalability reasons forgetting that one greatly reduces data model complexity.
Events and snapshots – no removal but always append
it is well known that append-only architectures distribute more easily than updating architectures because there are far fewer locks to deal with. Append-only operations require no locks on a database-side which increases both performance and scalability.
CQRS + Event Model
One of the largest issues when using Event Sourcing is that you cannot ask the system a query such as “Give me all users whose first names are ‘Greg’”. This is due to not having a representation of current state.
With Event Sourcing the event model is also the persistence model on the Write side. This drastically lowers costs of development as no conversion between the models is needed.
Synchronizing write and read sides
The choice of integration model though is very important as translation and synchronization between models can be become a very expensive undertaking. The model that is best suited is the introduction of events, events are a well known integration pattern and offer the best mechanism for model synchronization.
CQRS suggests using 2 different databases: one optimized for reading (denormalized) and another for writing. It is a very tempting way to designing systems. The only big problem with this solution is that you need somehow to sync up your 2 data storages.
You must definitely have a mechanism to recreate a read-side from the write-side. You might need to do this if the read side data store got out of synchronization for some reason, or because you needed to modify the structure of the read-side data store to support a new query. Remember that the CQRS pattern does not mandate that you use different stores on the read side and write side. You could decide to use a single relational store with a schema in third normal form and a set of denormalized views over that schema. However, replaying events is a very convenient mechanism for resynchronizing the read-side data store with the write-side data store.
Storage technology replaceability
CQRS/ES makes it easy to change your technologies. For example, you could start with a file-based event store for prototyping and development, and later switch to a Windows Azure table-based store for production.
Relational vs NoSQL solutions and the cost of maintenance – think twice
Let’s mention major issues in case of using a relational database as a technology for this model.
Pitfall 1: Limit in number of columns per table
Read-side is always kept in a denormalized form for performance reasons and usually requires a huge number of columns. If you have e.g. Oracle 11g database and hit a maximum number of 1000 columns per table you won’t be able to add more columns.
Pitfall 2: Adding new columns
In many cases in order to add new data to the domain model you must release a bunch of applications/services along with changing a schema of a database on the read-side, especially when a column is not empty but already has some data. Here, I prefer a NoSQL databases such as MongoDB with relaxed schema that doesn’t require to change one (data are kept in JSON documents) thus you won’t have to change the schema of a database at all! CQRS/ES patterns with NoSQL db might have a simple Event Storage for the write-side with a blob-column in which data are represented as history and read-side as a schema-less NoSQL db.
Eventual consistency – think twice
While not required, it is common to use messaging between the write and read sides. This means that the system will be in an inconsistent state from time to time. This pattern is not a fit for e.g. financial transactions where a strong consistency is required between 2 sides. However, in majority of cases strong consistency can be used only on the write-side whereas the read-side is used for querying and reporting.
Simple vs costly
Events are simple objects that describe what has happened in the system. By simply saving events, you are avoiding the complications associated with saving complex domain objects to a relational store; namely, the object-relational impedance mismatch.
The event based model may be slightly more costly due to the need of definition of events but this cost is relatively low and it also offers a smaller Impedance Mismatch to bridge which helps to make up for the cost. The event based model also offers all of the benefits discussed in “Events” that also help to reduce the overall initial cost of the creation of the events. That said the CQRS and Event Sourcing model is actually less expensive in most cases. That said the CQRS and Event Sourcing model is actually less expensive in most cases.
As long as you have a stream of events, you can project it to any form, even a conventional SQL database. For instance, my favorite approach is to project event streams into JSON documents stored in a cloud storage.
Also a great benefit of the separate read layer is that it will not suffer from an impedance mismatch – It is connected directly to the data model, this can make queries much easier to optimize. On the write side you no longer need to worry about how your locks impact queries, and on the read side your database can be read-only.
Because events are immutable, you can use an append-only operation when you save them. Events are also simple, standalone objects. Both these factors can lead to better performance and scalability for the system than approaches that use complex relational storage models. On top of this events reduce network volumes unlike full snapshots in case of a pure CQRS model.
A normalized database schema can fail to provide adequate response times because of the excessive table JOIN operations. Despite advances in relational database technology, a JOIN operation is still very expensive compared to a single-table read.
Command: In most systems, especially web systems, the Command side generally processes a very small number of transactions as a percentage of the whole. Scalability therefore is not always important.
Query: In most systems, especially web systems, the Query side generally processes a very large number of transactions as a percentage of the whole (often times 2 or more orders of magnitude). Scalability is most often needed for the query side.
Distributed development teams
The architecture can be viewed as three distinct decoupled areas. The first is the client; it consumes DTOs and produces Commands. The second is the domain; it consumes commands and produces events. The third is the Read Model; it consumes events and produces DTOs. The decoupled nature of these three areas can be extremely valuable in terms of team characteristics. Instead of working in vertical slices the team can be working on three concurrent vertical slices, the client, the domain, and the read model. This allows for a much better scaling of the number of developers working on a project as since they are isolated from each other they cause less conflict when making changes. It would be reasonable to nearly triple a team size without introducing a larger amount of conflict due to not needing to introduce more communication for the developers.
There is no one right way to implement CQRS. It is not recommended to use this model for critical projects of course if you have no experience with one. Note that It is not possible to create an optimal solution for searching, reporting, and processing transactions utilizing a single model.