Seq

00010 : Logging and Debugging

00010 : Logging and Debugging

Foreword:

This post on microservices started innocently as a single post. They always do, don't they?

Before I knew it, it was 10k words and showing no sign of stopping and I was advised to split it up into a series, "nah, I said, it'll be fine".

I was then advised again to split it at least into two, the first few thousand in one and the rest in the other. I relented this time.

Perhaps it is just the nature of software, or perhaps it is 'scope-creep' (we've all been there right!) but, I've only gone and written a monolithic microservices post!

I see the irony here, so I've decided to break it apart into 'micro-posts'; thanks @adamralph :)

Each 'micro-topic' will become a 'micro-post' of it's own in our journey; so my apologies to those who wanted the full meal, your dinner will now be served as Tapas!


ServiceStack, a journey into the madness of microservices
  1. Context: the what and the why?
  2. Distributed debugging and logging
  3. Service discovery, load balancing and routing
  4. Service health, metrics and performance
  5. Configuration
  6. Documentation
  7. Versioning
  8. Security and access control
  9. Idempotency
  10. Fault-tolerance, Cascading failures
  11. Eventual consistency
  12. Caching
  13. Rate-limiting
  14. Deployment, provisioning and scaling
  15. Backups and Disaster Recovery
  16. Services Design
  17. Epilogue

Fr♣m ng pro▓ llem th e

In a monolith world with all the great modern tooling available to you, it is easy to load a project up in your IDE, set some breakpoints, and hit 'Debug'.

You can step through your code line-by-line and inspect its state.

Or you can write integration tests to assess the state of a system and to verify that the components of the system are interacting correctly with each other.

The single-threaded process has rock-solid reliability and low-latency for inter-process communications.

If you call a method on a class, you aren't concerned if it will reach that method-call or if will it take an unacceptable amount of time to get there.

Even at the boundaries of a process, things are often binary. If the database that runs the application or a file resource is unavailable, the application will crash or display an error.

You can check application logs, the system eventlog, the coredump or WinDbg type listeners to help you find, reproduce and fix the problem.

It's all in one place.

Next comes the multi-threaded process, to which nearly all of the above also applies.

How many of you have experienced race-conditions in one of your multi-threaded applications?

I know I have.

Even with the great tooling available, they are often subtle and hard to pin down. Introduce state mutation into the equation and you now need to worry about concurrency. There are locks and semaphores to deal with.

Is that class I used even 'thread-safe'?

Time and the order of execution between threads is no longer reliable, and the side-effects of this increase the level of complexity, of both your code and of your ability to reason about run-time state.

But they still have reliable, fast inter-process communication.

Now let's introduce asynchronous processing. Single-threaded and multi-threaded applications can be synchronous or asynchronous.

Now, within even a single thread, the order of execution is no longer reliable.

There is an excellent overview on the differences, but the take-away is that introducing multi-threading or asynchronous programming to your application can significantly increase the difficulty of reasoning about your system state as well as finding, reproducing and fixing bugs.

Yet they still have reliable, fast inter-process communication.

Now enter the distributed system, you probably see where this is going, don't you?

You are now in the world of remote procedure calls (RPC) across processes or networks. Each remote call is an order of magnitude slower and therefore, so is your application performance.

In our case, with ServiceStack, the RPCs use a request/response message-passing style.

Just as before, RPCs can be done either synchronously or asynchronously, but due to the performance of remote calls, using asynchronous processing is not really a choice, it's practically a must-have..

Now we have to ask ourselves - will calls be inexplicably duplicated? Will they even be invoked at all?

We have just lost cabin pressure. Communication is no longer reliable.

Down the rabbit hole we go, this stuff is hard and it gets weird, really fast.

Debugging this kind of system, which is executed piecemeal across processes and networks, just cannot be done with an IDE and access to a machine or applications logs.

Trying to reconstruct it in a test environment is an exercise in futility.

In the distributed system, debugging must be done in production.

This realisation forces you to approach the design of the system differently from the very beginning; if, you want to avoid creating the distributed equivalent of the Titanic that is!

So first up, logs have to be centralised to be able to reason about your system state, find errors and be able to trace the flow of events across each node.

Logging

Now, let's get specific shall we?

For ServiceStack, I've created a plugin on Github that builds upon the Request Logger to log to Seq.

ServiceStack.Seq.RequestLogsFeature

ServiceStack is fortunate to have some great people in its community and the plugin was quickly improved by a fellow member, thank you Richard.

Seq is an installable self-hosted service, with an HTTP API that is designed for log aggregation.

There is a readme available on the project, which I won't duplicate here, that covers setting up the plugin and using it in more detail, but I'll cover the basic code required to use it in your ServiceStack AppHost.

public override void Configure(Container container)  
{
   // Define your seq location, add the plugin
  var settings = new SeqRequestLogsSettings("http://localhost:5341");
  Plugins.Add(new SeqRequestLogsFeature(settings));
} 

That's it!

See everything!

The plugin now captures every incoming request to ServiceStack in Seq and is capable of logging every detail, not just the path but the headers, the request and response DTOs, execution timing, service exceptions and errors.

In addition, the logging detail can be modified at runtime, so when you need to debug in production, you can ramp up the level of detail logged. I'll come back to this again a few times in later posts.

So the first thing to note is that you are not storing plain text, you are storing structured data and it makes all the difference in the world.

With Seq that data is now easy to search, filter and aggregate using Seq's powerful query language and UI.

Logging in action

Having used Seq for a while now in other applications, I know how quickly it can help you identify and fix issues in your production systems. It's very easy to use and it comes with a free single-user licence. I highly recommend you try it out for yourself.

Tradeoffs:

Unless your logging receiver has high-availability (which Seq at this time does not have), we have just created our first piece of critical infrastructure as a single point of failure (SPF).

When any SPF goes down, bad things can happen, so throughout the series, I'll point these out.

Seq does however have a forwarder. This works against a local loopback address to buffer requests for forwarding onto your log server.

This helps with network unreliability by eliminating remote calls and the performance penalties that are associated with them.

An alternative is to use a UDP broadcast style of logging, like statsd. It may serve your circumstances better.

Our logging uses an http async 'fire-and-forget' style, so the network performance cost is reduced, but, if your service, network or Seq fails and you do not persist ServiceStack logging locally to disk or use a forwarder, you lose potentially valuable log data.

We could improve this plugin in future (PR's welcome!) to include a more resilient, guaranteed delivery to survive network outages; but for now we are not too concerned about this.

Our rationale will become more apparent as our design is revealed.

Debugging

The second part, distributed debugging, is a much trickier problem.

In distributed systems where there are many moving parts involved, the ability to reconstruct a timeline of events and state-mutations across your services is essential to being able to effectively debug and reason about system's state and overall health.

Enter the correlation identifier, our next plugin on the road to microservices

ServiceStack.Request.Correlation

Again this captures every incoming ServiceStack request and adds an Id to the request header. The service gateway's internal and external calls will pass this identifier on so that you can identify service to service calls from their point of origin in your logs.

Correlation

It is very early days for this plugin though. We need to refine it to be able to reconstruct a full map of service-to-service calls at each node, but for now, that is on our future road-map.

Being able to map out the calls is important for a couple of reasons which are worth mentioning at this stage though.

re'Curse' of the infinite loop

As your services become distributed, it is easy to create recursive calls, recursive calls, recursive calls, recu... :(

Recursive Call

Having a good timeout policy on all remote calls can help with this, but being able to add self-referencing checks in the correlation plugin can help with this by cancelling such requests.

Am I already in the call-map of the thing calling me?

Yes, byeee

the > never > ending > chain > of > calls > . > . >

Making remote calls easy and transparent to your services is really powerful, but it is also just as easy to abuse that power.

Long call chain

With each network hop, you increase the likely-hood of timeouts and the responsiveness of your API and their consumers suffers.

Having set limits on the length of call-chains can force distributed teams to be judicious in their use of dependencies (see left-pad!) and help foster collaboration between them instead to scale horizontally and keep the stack thin.

Further reading

There is some great research material if you are interested in reading more on this topic.

Google has a whitepaper on Dapper, a large-scale distributed systems tracing infrastructure, and there are a few implementations out in the wild to be found.

I am also looking at Vector Clocks of which you can find a C# implementation by the brilliant @kellabyte, but the fixed length limitations of this algorithm have led me towards Interval tree clocks as another possibility.

Another task for the future, is to capture requests at a lower level, like a proxy, so that tracing can work beyond ServiceStack calls to include data stores, files and external infrastructure resources.

Finally, there are also very interesting possibilities emerging from Joyent for run-time inspection and tracing using dTrace and containers worth keeping an eye on too.


That concludes part II, if I've missed anything, or you have your own great ideas or projects, let me know in the comments.

Also, we'd love others in the community to get involved with our plugins in Github so don't be shy.

:)

OK, enough of that for now, NEXT!

Let's do microservices!

next up: Service discovery, load balancing and routing

Logging ServiceStack requests with Seq

Logging ServiceStack requests with Seq

Following on from my last post where I used a serilog enricher for servicestack exceptions I wanted to log all requests in a ServiceStack AppHost to Seq.

Seq

If you don't know what seq is, it is described on their site as follows:

Seq is the easiest way for .NET developers to capture, search and integrate structured log events

Basically this means your apps can send rich log data to seq and have a really great experience around searching and filtering.

ServiceStack

ServiceStack provides this functionality in the RequestLogsFeature plugin, which by default creates an in-memory history of the last 1000 requests or optionally can be persisted using Redis.

If I can instead log these requests to Seq where all my apps log to, I can better track the flow of requests from applications to my API's.

With the RequestLogsFeature as my starting point I've created a plugin which logs requests to an instance of Seq. It does this by posting over HTTP/S to Seq.

The source is available on GitHub. You can also find it on nuget as ServiceStack.Seq.RequestLogsFeature.

Usage

In your AppHost add the plugin and specify your seq url and optional apiKey.

public override void Configure(Container container)  
{
  // Config examples
  Plugins.Add(new SeqRequestLogsFeature("http://localhost:5341", "seq-api-key"));
}

That's it. Simples!

You can optionally configure the same options the native request logger has, such as logging response bodies, session tracking etc and it shares all the same defaults.

Message template

Seq uses message templates to generate 'uniqueness' for types of requests.
For the RequestLogger I used the http://server/request/path ignoring querystrings to define my requestTypes on seq.

This hopefully provides a good way of filtering the logs by the server and service method called.

Logging levels

  • All 400-499 http status codes are logged as Warnings
  • All 500 > http status codes are logged as Errors
  • All other requests are logged as Info

Security

Unlike the native request logger, which uses AppHost roles, you control logging permission using api keys and read permissions to those logs via Seq users, Both local and active directory users are supported.

The other native logger defaults have been kept but if you are logging any potentially sensitive information, you should also consider configuring ssl on your Seq instance to ensure the seq post requests are encrypted.

Log Retention

Again unlike the native request logger, seq has it's own retention policies which give you fine control over which logs to keep and for how long.

Versions

I've created two different versions. For ServiceStack version 3.x, use the v3.x of the nuget package. For ServiceStack v4.x, use ... you guessed it, the v4.x version of the nuget package.

Bugs + Requests

Log any bugs and feature requests to github or submit a pull request if you're able.

Wrapping up

In one line of code in your AppHost you can now log all you api's requests to Seq.
Seq