The flip side of rule engines, using Drools as an example, plus some valuable tips

Hey there! This post is devoted to business rule engines and is based solely on my research and experience with the open-source Drools engine, which we have used in production for more than a year in mission-critical systems. Undoubtedly, this engine has many great features. But I want to explain what you should think about before using it, and when not to use it at all. This is a very important step: once you have adopted it in your project, it will be quite costly to remove later. I am going to be quick on the theory here, pointing you at the issues and the important things. If you don’t understand something, feel free to ask me, but please refer to the official documentation first. I’ll try to highlight the major issues I’ve come across; surely there are many more.

A rule engine is not a magical box that does something new. It is intended to be a tool that provides a higher level of abstraction so you can focus less on reinventing the wheel.

On one of our projects we had a critical system with thousands of small “if then else if…” requirements from the business that were initially developed as a sequence of checks in Java. Gradually, we realized that this clutter had turned into a mess that was very difficult to maintain. Hence we decided to adopt an open-source rule engine framework. After some high-level research, Drools became our choice, primarily because it is free and open source. No one on the team had experience with commercial rule engines, however.

Ok, let’s get to it!

The order of rules execution

Rule engines are a good fit if you don’t need to know the order in which rules are executed, i.e. the order doesn’t matter. Frequently, to find out whether this is the case, you have to decompose your logic into simple, flat rules that do not depend on each other. In a nutshell, a rule engine is like a black box that has inputs and outputs. Order independence is a necessary but not sufficient condition. Besides, when rules change object data that is used in other rules, complexity increases greatly, so you always have to stay vigilant about the order in which different rules execute.

Otherwise, if the order is fixed and known in advance, you would frequently be better off using a BPM engine or something else.
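
To make the ordering problem concrete, here is a minimal plain-Java sketch. It is not Drools code: the `Order` class and both rules are hypothetical and only imitate what happens when one rule changes data that another rule reads. The same two rules give a different answer depending on which one fires first.

```java
import java.util.List;
import java.util.function.Consumer;

class RuleOrderDemo {
    // Hypothetical fact: an order with a total and a flag set by rules.
    static final class Order {
        double total;
        boolean freeShipping;
        Order(double total) { this.total = total; }
    }

    // Rule A mutates data that rule B reads: 10% discount on orders above 100.
    static final Consumer<Order> DISCOUNT = o -> {
        if (o.total > 100) o.total *= 0.9;
    };

    // Rule B reads the total: free shipping for orders of 100 or more.
    static final Consumer<Order> FREE_SHIPPING = o -> {
        if (o.total >= 100) o.freeShipping = true;
    };

    // Applies the rules in the given order and reports the shipping flag.
    static boolean freeShippingAfter(double total, List<Consumer<Order>> rules) {
        Order order = new Order(total);
        rules.forEach(rule -> rule.accept(order));
        return order.freeShipping;
    }

    public static void main(String[] args) {
        // Discount first: the total drops below 100, so no free shipping.
        System.out.println(freeShippingAfter(105, List.of(DISCOUNT, FREE_SHIPPING)));
        // Shipping first: the flag is set before the discount is applied.
        System.out.println(freeShippingAfter(105, List.of(FREE_SHIPPING, DISCOUNT)));
    }
}
```

If your rules look like this pair, the order matters, and that is exactly the situation where you have to stay vigilant.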

Clear domain model with stable public API

Rule engines are not a fit if your system doesn’t have a clear domain model. Most immature systems don’t have one. A domain model evolves, and sometimes it takes time to understand it. Even clear requirements are not always so clear in practice. Why am I saying this? Well, because a good rule of thumb is that:

“Rules should work only with the Domain Model of your system, with a clear public API around it. If they don’t, your rules will ultimately turn into an unmanageable mess. Period!”

It seems natural, but many developers don’t realize it. Imagine what happens if your rules start using internal code or other APIs that are not part of the domain logic. With such an ugly approach, the rules will be aware of more than they need to be, such as internal implementations, behaviors and other dependencies, which means tightly coupled code. It’s just like OOP or SOA principles, but in terms of rules. Tightly coupled code and rules are very difficult to change, all the more so if the code is not encapsulated. Eventually, whenever business requirements change, you will have to modify your code, and that will break rules very often.

Idempotence or exactly once

All rules must be able to run multiple times without errors and yield the same result. It’s like writing a bash script, where the same is considered a good rule of thumb. Otherwise you may run into lots of weird misbehavior.
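
A small plain-Java sketch of the difference, assuming a hypothetical `Customer` fact and a made-up welcome bonus rule (neither comes from our project). The first version pays out again on every run; the second records that it has already fired and is safe to repeat.

```java
import java.util.HashSet;
import java.util.Set;

class IdempotentRuleDemo {
    // Hypothetical fact: a customer with a bonus balance and a set of tags.
    static final class Customer {
        int bonusPoints;
        final Set<String> tags = new HashSet<>();
    }

    // Not idempotent: every run adds the points again.
    static void addWelcomeBonus(Customer c) {
        c.bonusPoints += 50;
    }

    // Idempotent: the rule records that it has fired and does nothing on a repeat run.
    static void addWelcomeBonusOnce(Customer c) {
        if (c.tags.add("welcome-bonus")) { // add() returns false if the tag is already present
            c.bonusPoints += 50;
        }
    }

    public static void main(String[] args) {
        Customer a = new Customer();
        addWelcomeBonus(a);
        addWelcomeBonus(a);
        System.out.println(a.bonusPoints); // prints 100

        Customer b = new Customer();
        addWelcomeBonusOnce(b);
        addWelcomeBonusOnce(b);
        System.out.println(b.bonusPoints); // prints 50
    }
}
```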

Forget debugging tools, 100% test coverage is needed

Rules are difficult to refactor: you have to analyze them and/or keep the whole graph in mind if there are dependencies. In Drools, in our experience, it is impossible to trace rules line by line. Drools rules are declarative, and we found no practical way to debug them step by step. Therefore, you should ideally cover 100% of your rules with JUnit tests. One hint: use the Spock framework on top of Groovy for behaviour-driven development, where each rule is simply covered with a test. Tests usually come directly from business specs. Each rule should do only one simple thing and be as minimal as possible.
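
To show what “one small rule, one test taken from the spec” can look like, here is a sketch in plain Java without JUnit or Spock, so that it runs on its own. The eligibility rule and the spec cases are invented for illustration; in a real project the same table would live in a JUnit or Spock test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RuleSpecDemo {
    // Hypothetical rule under test: an applicant is eligible from age 18.
    static boolean isEligible(int age) {
        return age >= 18;
    }

    // Cases as they might come from a business spec: input age -> expected result.
    static Map<Integer, Boolean> specCases() {
        Map<Integer, Boolean> cases = new LinkedHashMap<>();
        cases.put(17, false);
        cases.put(18, true);
        cases.put(65, true);
        return cases;
    }

    // Returns the number of spec cases the rule violates.
    static int countFailures() {
        int failures = 0;
        for (Map.Entry<Integer, Boolean> c : specCases().entrySet()) {
            if (isEligible(c.getKey()) != c.getValue()) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failing cases: " + countFailures());
    }
}
```

The smaller the rule, the shorter this table stays, which is one more reason to keep each rule down to a single simple check.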

Cost of change

No one has any idea whether there are conflicting rules when a new one is added or an existing one is changed. A complete regression test must be run whenever rules change. Memory is sometimes a big issue too, and you can do very little to reduce consumption because you are using a third-party framework.

Some rules need to be triggered twice or more

Some of you may argue that Drools has a feature for rule invalidation of facts, but this again turns into a non-obvious, tangled mess that gets in the way of quickly understanding why the rules yielded a given result. It’s up to you, but you never know whether you will have to flush or invalidate rules in the long run.

Rete Algorithm vs your Domain API and traversals

The majority of modern rule engines, including Drools, are built on the so-called Rete algorithm. Make sure that the part of your codebase that your rules call can be represented as a Rete network in working memory. If it can’t, traversing your object tree from within rules on such a model will be a big performance issue! Bear in mind that if your public domain API contains internal traversal logic (e.g. some of your methods do the traversal for you) and it is invoked in the LHS (left-hand side) clause, it is very bad for Drools’ performance, because Drools has its own flexible querying model directly tied to the Rete tree. Drools itself should be able to build the tree for you based on your API and traverse it efficiently.

Performance and external systems

What if you are required to save a set of operations to a database, each operation must be checked by the rule engine, and the next operation depends on the result of the previous one? This causes performance problems. All operations could instead be validated first and saved afterwards in one go. Think it through in advance. I know there are rule groups in Drools, but my point is that it gets very complicated.
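
Here is a rough plain-Java sketch of the “validate everything, then save in one go” idea. The withdrawal rule and the `FakeStore` class are assumptions made up for this example; `FakeStore` merely stands in for a database and counts writes. The running balance is kept in memory, so each check still sees the result of the previous operation without a database round trip in between.

```java
import java.util.ArrayList;
import java.util.List;

class BatchValidationDemo {
    // Hypothetical stand-in for a database; counts how many writes happen.
    static final class FakeStore {
        int writes;
        final List<Integer> saved = new ArrayList<>();

        void saveAll(List<Integer> amounts) {
            writes++;
            saved.addAll(amounts);
        }
    }

    // Hypothetical rule: a withdrawal is valid if the balance stays non-negative.
    static boolean isValid(int balance, int withdrawal) {
        return withdrawal > 0 && balance - withdrawal >= 0;
    }

    // Validates every withdrawal against the in-memory running balance,
    // then persists the accepted ones with a single write.
    static List<Integer> validateThenSave(int startBalance, List<Integer> withdrawals, FakeStore store) {
        int balance = startBalance;
        List<Integer> accepted = new ArrayList<>();
        for (int w : withdrawals) {
            if (isValid(balance, w)) {
                balance -= w; // the next check depends on this result
                accepted.add(w);
            }
        }
        store.saveAll(accepted);
        return accepted;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        List<Integer> accepted = validateThenSave(100, List.of(60, 50, 40), store);
        System.out.println(accepted + " saved in " + store.writes + " write(s)");
    }
}
```

With a starting balance of 100, the withdrawals 60 and 40 are accepted, 50 is rejected, and the store is written to once.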

Rules Centralization

The same business rules may apply across different services, leading to redundancy and governance challenges. It puts a burden on us to keep the content of those rules in sync over time. Thus, it is usually recommended to move them into a new dedicated rule service. The downside is that such a service becomes an additional architectural dependency for other services. The responsibility of building and maintaining a centralized rule service can introduce various headaches. When we need to change currently active rules or introduce new ones, we have to ensure that these changes to the existing rule architecture do not have negative or unforeseen impacts. The worst of these is a cascading effect, where exceptions compound across multiple rules (and multiple services), so changes should be tested thoroughly.

And please! Make sure that your services know nothing about the rules and only talk to the rule service via a clear contract.
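
A minimal sketch of such a contract in plain Java. All names here (`DiscountService`, `DiscountRequest`, the gold tier rule) are hypothetical. The point is only that the caller sees an interface with domain-level input and output, and nothing about how the decision is made.

```java
class RuleServiceContractDemo {
    // Hypothetical request: only domain-level data crosses the boundary.
    static final class DiscountRequest {
        final String customerTier;
        final int orderTotal;

        DiscountRequest(String customerTier, int orderTotal) {
            this.customerTier = customerTier;
            this.orderTotal = orderTotal;
        }
    }

    // Hypothetical contract of the centralized rule service.
    interface DiscountService {
        int discountPercent(DiscountRequest request);
    }

    // Stand-in implementation. A real one might delegate to a rule engine;
    // callers cannot tell and do not need to.
    static final class SimpleDiscountService implements DiscountService {
        @Override
        public int discountPercent(DiscountRequest request) {
            if ("gold".equals(request.customerTier) && request.orderTotal >= 100) {
                return 10;
            }
            return 0;
        }
    }

    // A calling service depends only on the contract, never on the rules.
    static int totalAfterDiscount(DiscountService service, DiscountRequest request) {
        int percent = service.discountPercent(request);
        return request.orderTotal - request.orderTotal * percent / 100;
    }

    public static void main(String[] args) {
        DiscountService service = new SimpleDiscountService();
        System.out.println(totalAfterDiscount(service, new DiscountRequest("gold", 200)));  // prints 180
        System.out.println(totalAfterDiscount(service, new DiscountRequest("basic", 200))); // prints 200
    }
}
```

Because callers only know the interface, the rules behind it can be changed, or the engine swapped out, without touching the calling services.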

Conclusion

This write-up has only scratched the surface of the issues I have experienced. To be honest with you, for a large number of systems I would say “No” to applying rule engine frameworks in my future projects. Initially it seems pretty easy, but eventually you realize that third-party rule engines are more of a pain than a benefit, especially when it comes to interdependencies and the domain model. Project maintenance becomes very costly because of this. It’s not as simple as coding a bunch of if statements. The rule engine gives you the tools to handle rule relationships, but you still have to be able to picture all of them in your mind.

Most importantly, it is quite difficult to use a rule engine effectively and correctly. The team should have solid knowledge of how to use it, and its understanding and implementation of the business model should be mature. Rule engines work well if the logic among rules and facts is really flat and independent. Each framework restricts you to its own specific model. It’s not worth it!
