After quite a while I’ve decided to continue writing my blog. There are always lots of ideas to share with you but as usual a lack of time.
Service Discovery is an architectural pattern and a key component of most distributed systems and service oriented architectures.
In simple terms it is a central gateway for all client applications that want to use other services. A client application doesn’t have to know where a particular service is located (usually IP:port), moreover a service can be moved/redeployed to arbitrary boxes for any reasons. There’s no need to change connection details, update configs or whatever – Service Discovery will take care of that. You as a client-app just need to ask it to get access to other services. The pattern yields a good benefit especially when there are hundreds of client applications and/or dozens of services.
Most of the work in this article was made relying on trials and errors as well some research. I urge you to read also a great survey from Jason Wilder’s blog on Open Source Service Discovery frameworks.
In this particular section I won’t mention CAP-properties as they can vary depending on implementation. But CAP is important. If you want to know more about it I recommend moving to this article devoted to the CAP-theorem.
High level components
Service Discovery is the process of finding a suitable Service and its location for a given task that asked a service consumer. Let’s break down Service Discovery into 3 main components:
Any consuming application that want to use some service, that is a client application or a consumer, user, etc.
In the most basic scenario there is a Service Provider that finds this service and talks to Service Requestor.
Service Registry stores information about all services. It can be either static or dynamic. This can be IP:port, access rights and so forth.
This model comes directly from SOA but can be adapted to your needs. Conversation among 3 above entities might differ depending on implementation. As a quick example of such segregation refer to Web Service Discovery.
1. Static service registry
This one is a very basic and easy option. A big drawback – it is static. Every time a service moves to another box, static repo should be updated manually. We won’t go further here as it is quite obvious and have little benefit for flexibility.
This pattern implies that as soon as a service is up, it registers itself in the Service Registry – sometimes such a notification is called Announcement. This simple approach greatly simplifies maintenance of service registry. It’s worth mentioning also that in distributed world we always care about No Single Point Of Failure principle where we have at least 2 redundant nodes for a service. Service Registry is usually implemented as a separate independent component to be self-sufficient. Just for simplicity the registry can be a simple replicated memCached, ehCache or another storage, not necessarily cache.
A few words about ZooKeeper as a solution for Service Discovery.
A counterexample is ZooKeeper that does’t follow Shared Nothing architectural pattern. Service Registry is coupled with the Service Provider. In ZooKeeper’s world coordination-nodes in ensemble are NOT independent of each other because they share Service Registry and there’s a write-contention due to the order of messages. The last word leads to ZooKeeper’s poor write-scalability due to its synchronous replication among nodes. But in terms of requirements most applications involved into configuration management including Service Discovery don’t require frequent writes but rather reads not blocking overall scalability.
Let’s pick up where we left off – announcement.
The process of announcement is depicted below:
Using IGMP-protocol, a Service provider subscribes to a particular group, all services send a multicast message to this group. The downside of this approach is that IGMP is optional for IPv4 and not every hardware supports it.
Once a service is registered in Service Registry a Service Provider needs to know if it’s alive and healthy (that is, 100% of service is available).
Option 1 – on reconnect: As soon as a Service Requestor understands that there’s no connection to the service it worked with, it reverts back to the Service Provider to get a new one. Each time a Service Requestor accesses Service Provider, one does health-checking (let’s say it is almost the same as heartbeats plus probably some other stuff) to make sure that the service is alive, maintaining the Service Registry and returning response to the client containing information about a healthy service.
Option 2 – heartbeating: Service Provider receives heartbeats within equal time-slices from registered services. Once heartbeat is lost, the Service Provider removes corresponding record from the Service Registry. This is a good option to get a more real-time information though it is more difficult to implement. This way works ZooKeeper and other distributed systems.
In both options if some service goes down clients get back to Service Provider. What strategy to use is up to the developer but option 1 is more complex but a better one in most scenarios.
Load balancing can be easily implemented inside Service Provider as one becomes a central gateway of access for all Service Requestors to other services. For the sake of simplicity a very simple variation is round-robin. I won’t go into details as it is a different topic and you can find on the internet.
Again as with load balancing you might want to apply tokens with expiration or cryptographic keys.
Cooperation of all components
On a very high level the communication of components can happend as on the following picture:
We have defined what a Service Discovery is, broke it down into 3 components and described a few basic approaches.