C#

00011 : Service discovery, load balancing and routing

ServiceStack, a journey into the madness of microservices
  1. Context: the what and the why?
  2. Distributed debugging and logging
  3. Service discovery, load balancing and routing
  4. Service health, metrics and performance
  5. Configuration
  6. Documentation
  7. Versioning
  8. Security and access control
  9. Idempotency
  10. Fault-tolerance, Cascading failures
  11. Eventual consistency
  12. Caching
  13. Rate-limiting
  14. Deployment, provisioning and scaling
  15. Backups and Disaster Recovery
  16. Services Design
  17. Epilogue

In the previous post, I covered the challenges associated with debugging and logging RPC calls across distributed systems. Now let's turn our attention to how those RPC calls work in your services.

This boils down to one fact: a service making an RPC call relies on another service that lives in a different process, and often on a different machine.

As a system grows, and services are added or removed, keeping track of what services are available and where they are becomes an issue.

You could hard-code in each service the locations of the services it depends on, but that tends to break down once you have two services and need to add the third!

Once you need to run multiple instances of the same service, or use containers and elastic scaling, suddenly DNS propagates too slowly and you don't know where everything is.

You don't want to re-deploy a service every time another service it depends on is updated or moved, so you must decouple these types of dependencies between services.

It quickly becomes apparent that you need a more dynamic solution.

You need service discovery.

O Services, Services, wherefore art thou Services?

There are a number of tried-and-tested methods for discovery to be found in DHCP, Bonjour, UPnP, SSDP and DNS-SD. For web-based services, UDDI and WS-Discovery have come and - for the most part - gone.

Newer solutions like ZooKeeper, etcd and Consul have emerged to offer service discovery.

Gateways like NGINX also provide routing options which can be used for decoupling service-to-service calls.

Enterprise Service Bus systems like NServiceBus and MassTransit also can be used in a pub/sub messaging pattern to decouple service-to-service calls.

I've mentioned just a few but there are many more. You have a lot of options here, so how do you choose?

Let's first briefly cover some different patterns before I cover what we have chosen to use and why.

Centralised Registry vs. Self-Discovery

There are two common patterns that you find in solutions for Service Discovery.

The first is the service registry, a centralised database that stores the location of a service.

The second is self-discovery or auto-discovery, where there is no central database; it is often found in zero-configuration networking. Instead, clients use a variety of approaches to broadcast packets across a network to request a remote service and wait for the required service to respond with its location.

The service registry is another single point of failure (SPF) in your infrastructure but can provide more operational control. When used with server-side discovery, which is often found in gateways, it can completely decouple any discovery logic from the services.
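To make the registry pattern concrete, here is a minimal in-memory sketch. It is purely illustrative: the ServiceRegistry class and its Register, Deregister and Resolve methods are names I've made up for this example, and a real registry such as Consul adds persistence, health checks and replication on top of this idea.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory service registry, for illustration only.
public class ServiceRegistry
{
    // Maps a service name to the base URLs of its known instances.
    private readonly ConcurrentDictionary<string, HashSet<string>> _instances =
        new ConcurrentDictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Called by a service instance (or on its behalf) when it starts up.
    public void Register(string serviceName, string baseUrl)
    {
        var set = _instances.GetOrAdd(serviceName, _ => new HashSet<string>());
        lock (set) { set.Add(baseUrl); }
    }

    // Called when an instance shuts down or is found to be unhealthy.
    public void Deregister(string serviceName, string baseUrl)
    {
        if (_instances.TryGetValue(serviceName, out var set))
        {
            lock (set) { set.Remove(baseUrl); }
        }
    }

    // Returns a snapshot of every known location for the service.
    public IReadOnlyList<string> Resolve(string serviceName)
    {
        if (!_instances.TryGetValue(serviceName, out var set))
        {
            return Array.Empty<string>();
        }
        lock (set) { return set.ToList(); }
    }
}
```

A caller asks the registry where a service lives at the moment of the call, instead of carrying a hard-coded address around.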

Zero-configuration networking can be generous on security within networks to permit devices to 'just work' but can be more challenging to secure as systems span networks. It is often more suitable for smaller networks (UPnP, Bonjour etc.).

Communication

There are four common types of service-to-service communication.

  1. Point-to-point : services talk directly to each other.

  2. Gateway : acts as the middleman, handling the routing of requests and responses between services.

  3. Gateway Request : the responding service replies directly to the calling service rather than returning through the gateway.

  4. Message Queue : services publish messages to a queue. The responding service subscribes to those messages and in turn publishes its response to the queue for the original service to pick up.

Point-to-point involves the shortest route so is often the quickest but requires each end-point to take a dependency on your discovery mechanism.

The gateway can decouple many concerns from your services, handling not just routing, but caching, front-end to back-end bridging with HTTPS termination, transport conversions like HTTP to TCP/IP, formats, aggregation and load-balancing, to name just a few.

The message-queue pub/sub model is slower and is more suited for longer running processes.

Registration

For service registries only, the registration can be handled by each client directly or by the server.

As with server-side discovery, server-side registration completely decouples registration from your services.

Further reading

I've only scratched the surface above and kept the explanations as brief as possible, as I want to get on to some specifics. You can find a much better, more detailed overview of Service Discovery in Chris Richardson's excellent post, part of his series on Microservices.

Chris also has many video talks and articles available online and speaks very eloquently on all matters relating to distributed design which I have greatly enjoyed during my own research. I highly recommend checking them out.

It's make your mind up time.

So this is the first critical point where we had a variety of choices to make in our design.

Do we want smart versus dumb pipes? How about decentralised control with auto-discovery? How does our communication behave? Who controls registration? Is one single approach for all scenarios even practical?

For us they are opinionated and deliberate choices.
Our approach that follows is not inherently better or worse, but each choice has consequences for many of the subsequent design decisions. In many cases, they can actually remove choice.

We will come back to reference these choices in the rest of this series.

It is also worth pointing out that I couldn't try out everything available, so our choice is not a reflection on other solutions out there, it is just the one I felt best fit ServiceStack and suited our needs.

And the winner is

Consul. Let's cover the basics of Consul before we tackle how it fits in with ServiceStack.

Consul is a single binary executable that can run on Windows, Linux or macOS. The same binary can run as a Server, as a client Agent, or as a command-line tool for sending Commands to other Consul instances.

We use it as a service registry with client-side self-registration and client-side discovery, which enables point-to-point service RPCs.

Consul Datacenter

Consul, like all service registry patterns, is a potential SPF, but is designed with High Availability in mind.

In production, you run an odd number of Server nodes, typically three or five, which form a DataCenter (DC). You can scale Consul to connect multiple datacenters.

The odd number is because it implements a consensus protocol based on Raft, which holds leadership elections, and a leader needs the votes of a majority (a quorum) of the servers. Adding a fourth server to a cluster of three raises the quorum from two to three without letting you lose any more nodes.

For the best possible resiliency, server nodes can be spread across physical hardware, network locations and operating systems. Running three instances allows a single node to fail while running five can tolerate two node failures.
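The arithmetic behind those numbers is simple enough to sketch. This is just the majority rule that Raft-style consensus relies on, written out as code; the RaftQuorum class name is my own and nothing here calls into Consul.

```csharp
// Majority-quorum arithmetic for a consensus cluster (illustrative helper, not a Consul API).
public static class RaftQuorum
{
    // A quorum is a strict majority of the servers: more than half.
    public static int QuorumSize(int servers) => servers / 2 + 1;

    // The cluster keeps working as long as a quorum is still alive,
    // so it can lose whatever is left over once the quorum is accounted for.
    public static int FaultTolerance(int servers) => servers - QuorumSize(servers);
}
```

Three servers need a quorum of two and tolerate one failure, four servers need three and still tolerate only one, and five servers need three and tolerate two. That is why even-sized clusters buy you nothing extra.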

Consul is actually a hybrid model of server and client-side, something also found in Netflix's Eureka. This approach avoids one typical drawback of client-side discovery and self-registration systems i.e. network availability and latency.

It avoids this by using local agents on a loopback address.

Consul DataCenter and Agent

Each service has access to an agent co-located on the same physical hardware. Consul uses a gossip protocol (Serf) for managing membership, failure detection and message broadcasting, and Raft logs to keep each agent's list of services synchronised.

This means lookups and registrations are local and fast with no network hops.

ServiceStack [enters stage left]

This is my discovery solution, there are many just like it, but this one is mine.

So now we've made our first design choices, let me introduce our next plugin.

ServiceStack.Discovery.Consul

There is a detailed readme on the project which, as in previous posts, I won't cover here, but the minimum code to configure discovery in your ServiceStack AppHost is as follows:

public override void Configure(Container container)  
{
    SetConfig(new HostConfig
    {
        // the external url:port that other services will use to access this one
        WebHostUrl = "http://api.acme.com:1234",
    });

    // Register the plugin, that's it!
    Plugins.Add(new ConsulFeature());
}

Your ServiceStack instances can now communicate with each other requiring nothing more than a copy of the DTO POCO. This is where ServiceStack and its DTO message-driven style really shine.

You interact with local and remote services solely through simple DTO POCO message contracts.

For most service discovery solutions, you have to know first which service you want to call. Not so for our plugin.

The difference in calling a local or remote service is indistinguishable in your code.

public class MyService : Service  
{
    public void Any(RequestDTO dto)
    {
        // The gateway will automatically use the DTO type to find the correct service
        var internalResponse = Gateway.Send(new InternalDTO { ... });
        var externalResponse = Gateway.Send(new ExternalDTO { ... });
    }
}

This makes it easy to develop all your services in a single instance. You can then split them out as you need to scale, but your calling code remains exactly the same.

There are no references and no uris.

Just look at the code and let that all sink in for a second.... it's more ServiceStack magic and it's so simple, it has caused a few WTFs!

Behind the curtain, the wizard is revealed

So how does it work?

Discovery

When the AppHost starts up, it registers itself with Consul. In doing so it passes a list of all the DTOs it is able to process.

Combined with ServiceStack's ability to export its DTOs and its native pre-defined-routes, this makes it easy to move service methods between projects.

To call a remote method, the calling service only needs a copy of the DTO (the contract) with the same name and structure as the one on the remote service.

The gateway will recognise any DTO it cannot process itself and instead look up the correct service from Consul.

This allows our plugin, with Consul's help, to provide automatic and completely transparent DTO routing.
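The routing decision described above can be sketched in a few lines. To be clear, this is not the plugin's actual code: DtoRouter, its constructor arguments and the dictionary standing in for a Consul lookup are all assumptions I've made to illustrate the idea of routing by DTO type name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of DTO-based routing; a plain dictionary stands in for the Consul lookup.
public class DtoRouter
{
    // DTO types this host can process itself.
    private readonly HashSet<Type> _localDtos;

    // DTO name -> base URL of the remote service that registered it.
    private readonly IReadOnlyDictionary<string, string> _registeredDtos;

    public DtoRouter(IEnumerable<Type> localDtos, IReadOnlyDictionary<string, string> registeredDtos)
    {
        _localDtos = new HashSet<Type>(localDtos);
        _registeredDtos = registeredDtos;
    }

    // Returns "local" for DTOs handled in-process, otherwise the remote base URL.
    public string Route(object requestDto)
    {
        var dtoType = requestDto.GetType();
        if (_localDtos.Contains(dtoType))
        {
            return "local";
        }
        if (_registeredDtos.TryGetValue(dtoType.Name, out var baseUrl))
        {
            return baseUrl;
        }
        throw new InvalidOperationException($"No service has registered the DTO '{dtoType.Name}'.");
    }
}

// Empty stand-in DTOs, only here so the sketch compiles on its own.
public class LocalOrderDto { }
public class RemoteInvoiceDto { }
public class UnknownDto { }
```

Because the lookup key is the DTO name alone, this only works if DTO names are unique across every service, which is one of the consequences discussed later in this post.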

This also avoids the overheads of message-bus and gateway-style discovery by allowing point-to-point communication between services.

The verbiage on verbs

It is worth expanding slightly to cover how HTTP Verbs work in ServiceStack.

By default, ServiceClient.Send(), Gateway.Send() and Gateway.SendAsync() use the POST verb.

There are two methods by which you can control this behaviour.

The first is to use the verb specific methods available on the ServiceClient:

var externalDto = new ExternalDTO();  
var client = new JsonServiceClient("http://myservice");

// HTTP GET
client.Get(externalDto);

// HTTP PUT Async call
client.PutAsync(externalDto);

// HTTP DELETE call
client.Delete(externalDto);

The second method, which is the only one available to the Gateway and therefore the one we have to use, is to put the IVerb interface markers on the DTOs.

public class ExternalDTO : IGet, IReturn<ExternalDTOResponse>  
{
  ...
}

// Gateway.Send() + IGet sends the request as an HTTP GET
Gateway.Send(new ExternalDTO());  

The approach also helps decouple the HTTP verb specifics of any external calls from your call site and instead makes the DTO responsible for defining how it is sent.
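If you are wondering how a marker interface can drive the verb, here is a rough sketch of the idea. The interfaces below are local stand-ins that mirror the names of ServiceStack's markers so the example compiles on its own, and VerbResolver is a hypothetical helper of mine, not how ServiceStack implements it internally.

```csharp
// Local stand-ins for the verb marker interfaces, declared here so the sketch is self-contained.
public interface IVerb { }
public interface IGet : IVerb { }
public interface IPost : IVerb { }
public interface IPut : IVerb { }
public interface IDelete : IVerb { }

// Hypothetical helper: picks the HTTP verb from the markers on the request DTO.
public static class VerbResolver
{
    public static string Resolve(object requestDto)
    {
        switch (requestDto)
        {
            case IGet _: return "GET";
            case IPut _: return "PUT";
            case IDelete _: return "DELETE";
            default: return "POST"; // unmarked DTOs (and IPost) fall back to POST
        }
    }
}

// Example DTOs for the sketch.
public class FindCustomer : IGet { }
public class RemoveCustomer : IDelete { }
public class CreateCustomer { }
```

The call site stays a plain Send() either way; only the DTO declaration changes if the verb needs to.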

But wait, there's more...

In addition, Consul provides another piece of the infrastructure jigsaw which our plugin handles for you: service health, which we will cover in our next topic.

The gateway will also select the correct format for retrieving the DTO. If your remote service only communicates in XML, it will transparently call it using XML but return you a POCO.

It will also automatically cache responses from a GET request according to the remote service's cache settings. In some cases, it will not even issue an RPC, instead returning you the DTO response straight from the cache.

Our future roadmap also includes configurable time-out, retry and cache fall-back policies.

Let's get down to brass-tacks, how much for the API..?

We think the simplicity and low-ceremony approach above is really compelling, but it doesn't come for free. There are opinionated choices we've made to allow it to work this way.

So this is where we cover the consequences of those decisions and the first one is a whopper.

We've thrown RESTful routing under a bus

Oh my!

Hiding from RESTafarians

Now we have reasons for this, which I cover next in routing. It may be possible to make RESTful routing work with Consul, but I don't yet see a way to make it robust or elegant.

DTOs MUST be globally unique.

This one is actually part of the ServiceStack guidelines anyway so we don't feel bad about this at all.

The third is another whopper which I have a whole topic devoted to later on so for now, I won't clarify further but instead lob this like a grenade into the fire-pit.

You cannot EVER make a breaking-change to a DTO

Run Away!!! <Runs away>

Routing

Instead of REST and all the great custom and fallback routing options in ServiceStack, we have chosen to use only ServiceStack's pre-defined-routes.

Together with our second consequence of globally unique DTOs, this allows the RPC routing to just work with Consul.
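As a rough illustration of why this "just works", the URL for a call can be derived from nothing more than the service's base address and the DTO's type name. The sketch below assumes the /{format}/reply/{RequestName} shape of ServiceStack's pre-defined routes; PredefinedRoute is a hypothetical helper and the plugin may build its URLs differently.

```csharp
using System;

// Hypothetical helper: builds a pre-defined route URL from a base address and a DTO type.
public static class PredefinedRoute
{
    // Assumes the "/{format}/reply/{RequestName}" layout; "json" is just a default for the sketch.
    public static string For(string baseUrl, Type requestDtoType, string format = "json")
    {
        return $"{baseUrl.TrimEnd('/')}/{format}/reply/{requestDtoType.Name}";
    }
}

// Empty stand-in DTO so the sketch compiles on its own.
public class AccountOrdersRequest { }
```

No routing table is consulted anywhere. Once Consul has told you which host owns the DTO, the rest of the address follows from the type itself.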

So let me try and explain why we've not only ignored RESTful routing, but will actively seek to prevent it being used directly in our Services.

There are a few reasons behind this but first it might help to clarify that we plan to use services internally at first, but later on expose them externally using a Gateway to be built on top of Consul.

Internally, with ServiceStack's ServiceClient and the DTOs, you already have fully typed end-to-end API calls, so you never really need to see a URI, let alone care what it is. Losing custom routes isn't so bad for internal callers.

We expect that most of the internal calls will use this typed approach.

You can use custom routes, and the service-to-service calls will even use them. This is not really the problem area though.

Any non-ServiceStack client that wants to consume the services would have to go via Consul to find the right service, and Consul doesn't know a thing about your custom routes.

This affects the few internal apps or services that do not use the ServiceStack client and probably the MOST important group, the external clients.

Friends don't let friends break contracts

Hey Bob,

thank you for being a loyal customer, you mean the world to us.

Because we love you so much Bob, we're superduper excited to announce our brand new [feature] and tell you how it will change your life.

You'll literally forget your own name, that's how amazing it is!

Here is our super-secret incrementing beta code, just for our most special customers, like you Bob.

Code: 37,027,491

Thanks again Bob, you're so amazing!!!

p.s. [Feature] requires you re-write all existing integrations before launch at 3pm EST tomorrow :)


$#c*$%g WHAT?!

In accessing any external resource, the last thing you want as a consumer, is for that contract to change.

...ever.

It's painful, it involves additional work you can't plan for, work you don't have time to do.

In HTTP, these are contracts:

// Fragile, things which could change are both 'ordered' and 'embedded'
http://api.acme.com/account/123/orders/12352/shipped/2016/01

// Fragile, change requires running multiple endpoints and causes 'churn' for clients
http://api.v2.acme.com/anything

// predefined route *never* changes, DTO is the contract and *will not change* 
http://api.acme.com/sync/reply/accountorders  

In code, these are contracts

// Fragile, change to signature or return type, breaks clients (see WCF, WebAPI)
public string GetAccountOrders(int id, bool includeCompleted) { ... }

// message contract, any change to DTO, does not *have* to break clients
public AccountOrdersResponse Get(AccountOrders request) { ... }  

Contract stability is of paramount importance, but addenda to contracts are OK.

So clumsily put, if we ensure our DTOs are backward-compatible, we have far more stability in our contracts. Contracts that can tolerate change. Contracts that instil confidence and the trust of consumers.
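Here is a small sketch of what "backward-compatible" means in practice. I'm using System.Text.Json as a stand-in serializer and two invented versions of a DTO (AccountOrdersV1 and AccountOrdersV2) purely to keep the example self-contained; in a real system both sides would use the same class name, and ServiceStack would use its own serializers.

```csharp
using System.Text.Json;

// The contract as an old client knows it.
public class AccountOrdersV1
{
    public int AccountId { get; set; }
}

// The same contract after an addendum: one new, optional property.
public class AccountOrdersV2
{
    public int AccountId { get; set; }
    public bool? IncludeCompleted { get; set; }
}

public static class ContractCheck
{
    // Serialises one shape of the DTO and reads it back as another,
    // simulating an old client talking to a new service (or the reverse).
    public static TTarget Reserialize<TSource, TTarget>(TSource source)
    {
        string json = JsonSerializer.Serialize(source);
        return JsonSerializer.Deserialize<TTarget>(json);
    }
}
```

An old message read as the new shape simply leaves IncludeCompleted as null, and a new message read as the old shape ignores the property it doesn't know about. Removing or renaming AccountId, on the other hand, would be a breaking change.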

Another reason for avoiding custom routing in ServiceStack is the complexity of making it work correctly.

In what order do I add this service's routes to the routing table?

Will a fall-back or over-generous catchall route suddenly grab all other services requests?

Will the new dev/team remember to respect the guidelines?

As I mentioned previously, adding an external gateway is part of our future plans and we expect it to handle things like load-balancing, traffic shaping and SSL termination, all in one place, rather than in each service.

If in that future, we must have RESTful routing, it will be as a decoupled, globally managed concern in that gateway, carefully managing the mapping of routes to services. Even this though, by its nature, is static and prone to 'churn' in such a dynamic environment. (see schema changes in ORMs)

We are currently looking at a few options for Gateways so I'll simply mention one that stands out so far: Fabio.

It looks to have great integration with Consul and avoids the need for more complex Consul-template solutions. Another one for the roadmap.

Load-balancing

Finally, for this (not so micro)-post we come to load-balancing, or the ability to distribute requests between multiple instances of a service.

Definitely our weakest area of the three right now, we have some plans and ideas but they are still in their infancy.

Consul provides service-to-service calls with a not-really-load-balancing version of load-balancing.

It keeps track of round trip times (RTT) for its agents using network co-ordinates.

If you have multiple instances of a service available to process a DTO, our plugin will sort these by the agent RTT, giving you the most responsive.

This isn't really load-balancing, more QoS, but it is useful nonetheless and worth mentioning.
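The selection itself amounts to an ordering by estimated round trip time. In the sketch below, ServiceInstance and RttSelector are hypothetical types of my own, and the RTT values are assumed to have been obtained already; how they are fetched from Consul is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of one running instance of a service.
public class ServiceInstance
{
    public string BaseUrl { get; set; }
    public TimeSpan EstimatedRtt { get; set; }
}

public static class RttSelector
{
    // Orders candidate instances from most to least responsive.
    public static IReadOnlyList<ServiceInstance> OrderByRtt(IEnumerable<ServiceInstance> instances)
    {
        return instances.OrderBy(i => i.EstimatedRtt).ToList();
    }
}
```

Note that every caller with similar network coordinates will prefer the same instance, which is exactly why this is closer to QoS than to spreading load.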

Another thing Consul gives us is in how it maintains separate service catalogs per datacenter. Using this ability, we could locate datacenters and their services in different geographic regions to even out global traffic loads.

For true load-balancing though, we have to look for other solutions and they lie outside of each service.

A gateway is the most obvious candidate for this and Fabio allows you to split traffic between services based on rules, useful for things like canary deployments as well as more traditional load-balancing.
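For contrast with the RTT ordering, the simplest form of "true" load-balancing is a round-robin over the known endpoints. This is a generic sketch with a made-up RoundRobinBalancer class; it says nothing about how Fabio or any other gateway implements its rules.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Generic round-robin balancer, for illustration only.
public class RoundRobinBalancer
{
    private readonly IReadOnlyList<string> _endpoints;
    private int _counter = -1;

    public RoundRobinBalancer(IReadOnlyList<string> endpoints)
    {
        if (endpoints == null || endpoints.Count == 0)
        {
            throw new ArgumentException("At least one endpoint is required.", nameof(endpoints));
        }
        _endpoints = endpoints;
    }

    // Thread-safe: each call atomically takes the next slot in the rotation.
    public string Next()
    {
        // The uint cast keeps the index non-negative after the counter wraps around.
        uint ticket = unchecked((uint)Interlocked.Increment(ref _counter));
        return _endpoints[(int)(ticket % (uint)_endpoints.Count)];
    }
}
```

Each instance receives an equal share of requests regardless of which caller is asking, which is the property the RTT ordering lacks.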

In the world of microservices however, we actually have all the ingredients we need to make something ourselves if we need to.

Having a service registry in Consul with RTT, Health and performance metrics information from logging for every service end-point opens up interesting possibilities for using that data. Combined with a good automated deployment pipeline, there are possibilities for elastic scaling. I'll explore this in more detail in the deployment topic.

Wrap it up, chuck!

There is a constant tension between how much 'smarts' you put into each service and how much is centrally managed. We are trying to find a good balance.

The service discovery and registry is a fundamental part of our overall design though. I think it allows us to decouple a lot of the other parts we will need on our journey to microservices.

Parts that can be independent, composable, infrastructure-centric microservices of their own because of this design.


So at last we come to the end of part III.

There was a lot to cover here and there are parts I feel I haven't explained as well as I could, and parts I have skimmed over or left out entirely.

Definitely a couple of things to divide opinions.

If I've missed anything, or you have your own great ideas or projects, let me know in the comments.

Also, we'd love others in the community to get involved with our plugins on Github so don't be shy.

:)

so without further ado...NEXT!

Let's do microservices!

next up: Service health, metrics and performance [coming soon]

Parsing Enum Strings

Well I thought I would kick off the blog with this handy generic method to parse enum strings.

public static T ParseEnum<T>(string enumText)  
{ 
    return (T) Enum.Parse(typeof (T), enumText); 
}

So let's compare the usage, generic vs. non-generic:

public enum Vehicles  
{
    Bus,
    Car,
    Bike
}

public void TestEnumParse()
{
    //Create an enum value
    Vehicles myVehicleEnum = Vehicles.Car;

    //Get the enum string
    string vehicleString = myVehicleEnum.ToString();

    //Convert the string to an enum using non generic method
    Vehicles nonGenericParse = (Vehicles) Enum.Parse(typeof (Vehicles), vehicleString);

    //Convert the string to an enum using the generic method
    Vehicles genericParse = Helpers.ParseEnum<Vehicles>(vehicleString);
}

You could create ParseEnum as a string extension method; however, I'm not sure it's a good idea to have a parse enum method on every string created. I use the generic method from a static helper class and I think it's a much cleaner way of converting the enum string.
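One thing to be aware of is that Enum.Parse throws if the text doesn't match a member. If that matters to you, a non-throwing variant can be sketched on top of the framework's own Enum.TryParse. The EnumHelpers class and TryParseEnum name are just my suggestion for how it might look alongside the helper above.

```csharp
using System;

public static class EnumHelpers
{
    // Returns false instead of throwing when the text is not a member of the enum.
    // Matching is case-insensitive here; pass false to Enum.TryParse if you want it strict.
    public static bool TryParseEnum<T>(string enumText, out T value) where T : struct
    {
        // Enum.TryParse also accepts numeric strings such as "42",
        // so Enum.IsDefined is used to reject values that have no named member.
        return Enum.TryParse(enumText, true, out value)
            && Enum.IsDefined(typeof(T), value);
    }
}
```

With the Vehicles enum above, passing "car" would succeed and give you Vehicles.Car, while "Plane" would simply return false.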