Here come microservices. With them come hundreds of running nodes and tons of logs. Then we get lost in those logs, because most of them are not effective, or not effective enough.

Lesson learned from my projects

In the design phase we did not have a session-id or track-id (or trace-id) in mind. Such ids are not necessary for user-facing functions, but they do matter for monitoring, debugging, and user behavior analysis. Under privacy rules, the services cannot collect much personally identifiable information about the customer. The only information we can collect is the user’s internal id, which is opaque. This internal id is also expensive to get, because fetching it involves remote calls.

When the service is onboarded to AWS ECS and all logs are pushed to Splunk servers, it’s quite hard for engineers to track a given workflow. It is even harder if the user has multiple tasks running in parallel.

Fixing this issue cost more than we expected, because the desktop clients must be modified to accept the session-id or track-id and pass it on each subsequent REST call. This also makes the service somewhat stateful, which is acceptable in this case because the impact on the service is small. The service doesn’t need to treat the id specially; it just returns the id exactly as it arrived in the HTTP request.

For REST services, the most straightforward way is to carry the trace-id in an HTTP header. Once generated, the id is kept unchanged for all the calls of the workflow. The id can be generated either by the desktop client when it initializes the workflow, or by the service on the first call (entry call) of the workflow.
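
To make the header approach concrete, here is a minimal sketch of the service side in Python. It is not tied to any web framework, and the header name `X-Trace-Id` and the functions `resolve_trace_id` and `handle_request` are my own assumptions for illustration, not something from the actual project. The idea is simply: reuse the id if the caller sent one, generate one on the entry call otherwise, put it in every log line, and echo it back unchanged.

```python
import logging
import uuid

# Hypothetical header name; pick whatever your clients and services agree on.
TRACE_HEADER = "X-Trace-Id"


def resolve_trace_id(request_headers):
    """Return the caller's trace-id, or create one if this is the entry call."""
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in request_headers.items():
        if name.lower() == TRACE_HEADER.lower() and value:
            return value
    # No id supplied: treat this as the first call of the workflow.
    return uuid.uuid4().hex


def handle_request(request_headers, log=logging.getLogger("service")):
    """Hypothetical handler: log with the trace-id and echo it in the response."""
    trace_id = resolve_trace_id(request_headers)
    # Every log line carries the id, so one search finds the whole workflow.
    log.info("trace_id=%s handling request", trace_id)
    # The service does not interpret the id; it returns it as received.
    return {TRACE_HEADER: trace_id}
```

The client keeps the header value from the first response and sends it on every later call of the same workflow. With the id in each log line, a log search on `trace_id=<value>` pulls out one workflow even when the same user runs several tasks in parallel.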