Logging With Purpose - How to Capture What Matters

Nikoloz Gabunia
Software Engineer

Imagine the moment when you need to understand what is going on in production and all you see are [DEBUG] and [INFO] logs with polluted payloads — you start to ask yourself if you even needed those logs in the first place. Logging is one of the simplest observability tools to implement. Adding new log statements is easy, and more logs can create a false sense of safety. What makes it genuinely challenging is knowing what to log — and what to leave out.
At Techery, working across multiple services and projects, this became very real. The more our systems grew, the more log noise we accumulated — until we had to sit down and rethink what we actually needed from our logs.
Having logs is good, until it isn't
Logs are essential when we try to understand application behavior and diagnose a problem. Used properly, they show what happened, help resolve problems, and indicate any concerning activity within the application. However, adding logs without proper consideration creates problems instead of helping to resolve them.
Excessive logging means pollution
Logging everything sounds good. In practice, it usually is not. Finding the required information may feel like finding a needle in a haystack. Instead of looking at the data, you will have to spend your mental resources to come up with a fine-tuned query that will filter out all unnecessary data. And time spent on finding the log you need is time that could have been used to address the reason you had to inspect logs in the first place.
For example, it is common practice to propagate errors from their origin to the higher layers of execution. TargetEntity may throw an error, which is propagated to MiddlewareEntity and then to ExecutionEntity. Logging an error is good practice, but logging it at every layer only pollutes your logs. In this case, there will be three log entries for the same error, and it will be hard to tell from the logs alone where the error originated and where it was merely propagated. You will have to spend additional time checking log traces or inspecting your application and its execution order manually. If the error were logged only once, by the entity where it actually originated, no additional clarification would be needed and the issue could be inspected straight away.
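One way to keep a propagated error to a single log entry is to log it where it originates and let the upper layers pass it along silently. The sketch below uses Python's standard logging module. The function names mirror the entities above, while TargetError, the validation rule and the fallback result are hypothetical and chosen only for illustration.

```python
import logging
from typing import Optional

logger = logging.getLogger("example")


class TargetError(Exception):
    """Hypothetical error raised by the lowest layer."""


def target_entity(value: int) -> int:
    if value < 0:
        # Log once, at the origin, with the context needed to investigate.
        logger.error("TargetEntity rejected a negative value", extra={"value": value})
        raise TargetError(f"negative value: {value}")
    return value * 2


def middleware_entity(value: int) -> int:
    # No logging here: the error propagates upward untouched.
    return target_entity(value) + 1


def execution_entity(value: int) -> Optional[int]:
    try:
        return middleware_entity(value)
    except TargetError:
        # Handle the failure (here: a fallback result) without logging it again.
        return None
```

Calling execution_entity(-1) produces one log entry instead of three, and that entry points directly at the layer that failed.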
Logging without purpose increases your costs
Having logs means storing them somewhere. There are a lot of options to choose from: you can pick one of the many cloud providers on the market, or you might decide to handle the storage yourself. Now imagine log statements that include every piece of information that is even slightly related to them: request payloads, traces, object structures, etc. Such logs would require 2-3KB each on average. If you go with a cloud provider, ingestion would cost approximately $0.60 per 1GB (depending on region and provider). With 50M log statements daily, that adds up to approximately 3.5TB per month, or about $2,100 per month. Properly selecting what information to include in log statements would easily reduce the size of such logs to 1-1.5KB. This means that, done properly, your logging costs can be reduced to about $1,100 monthly, an almost 50% decrease. Eliminate redundant and duplicate log statements and your costs can decrease by another 10%-20%.
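The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch under stated assumptions: a 30-day month, decimal units (1GB = 1,000,000KB), the $0.60 per GB price quoted above, and the midpoints of the size ranges (2.5KB and 1.25KB). The function name is hypothetical.

```python
from typing import Tuple

KB_PER_GB = 1_000_000  # decimal units, an assumption for this estimate


def monthly_ingestion(logs_per_day: int, avg_size_kb: float,
                      price_per_gb: float = 0.60, days: int = 30) -> Tuple[float, float]:
    """Return (GB ingested per month, ingestion cost per month in dollars)."""
    gb_per_month = logs_per_day * avg_size_kb * days / KB_PER_GB
    return gb_per_month, gb_per_month * price_per_gb


verbose_gb, verbose_cost = monthly_ingestion(50_000_000, 2.5)
trimmed_gb, trimmed_cost = monthly_ingestion(50_000_000, 1.25)
savings = 1 - trimmed_cost / verbose_cost

print(f"verbose: {verbose_gb:,.0f} GB/month, ${verbose_cost:,.0f}")
print(f"trimmed: {trimmed_gb:,.0f} GB/month, ${trimmed_cost:,.0f}")
print(f"savings: {savings:.0%}")
```

With the midpoint sizes this gives roughly 3.75TB and $2,250 per month for the verbose logs against 1.875TB and $1,125 for the trimmed ones, which is in the same range as the figures above. The exact numbers shift with the average entry size you assume.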
The story is different with self-hosted infrastructure. There is no longer a provider charging you for each GB of logs ingested. Instead, costs depend on purchasing and maintaining disks, making sure space never runs out, and maintaining the additional software and infrastructure needed to inspect stored logs. This mostly shows up as human resource costs: time spent and additional staff required to maintain everything. Even though exact costs cannot be estimated for self-hosted setups, maintaining lean logs is easier than maintaining overcomplicated and bloated data ingestion.
Dealing with excessive logs makes it harder to reconstruct the flow
The more logs you have to trace, the harder it gets to tell which ones are actually important. Often problems are not caused by one particular part of your application but by several related processes. When you have to deal with excessive logs, you spend more time working out what is important within the flow.
Think of an issue that occurs during communication between separate services. Although the problem originates from one source, it may be logged in multiple places. If all of that reporting looks the same, the origin of the issue will not be clear. Add the other logs generated during the failing process, and the time needed to determine the origin grows from the few moments it takes to inspect the trace of one exact log to hours spent working out which places originate the error and which merely duplicate it.
Logging should be valuable
There can be different reasons why you would like to add new logs. It can be getting information on an unexpected error, it might be tracking something important, it may be providing some operational context, or there can be any other reason that is specific to that particular log. What all of these cases have in common is that they provide particular value. Understanding what you aim to achieve is the key to having valuable logs.
How to decide what to log
There are some general questions that can help to understand what the purpose of the log is and what information should be bound to it:
- Will this log be useful in the future?
- What context should be included to make it understandable?
- Will this log help someone decide what to do?
If you can answer those questions clearly, the log might be worth adding. For example, when implementing functionality that should normally succeed but can occasionally fail, logging the failure will help you understand its cause later. If there are several known failure reasons, indicating which one occurred will help narrow down possible root causes. Including the input of the failed operation in the log will help to reproduce the failure and investigate it. With all of that in mind, we can conclude that logging this information has a purpose and will be useful in case of failure.
On the contrary, indicating all of this in case of a successful execution will not be helpful at all. We already expect it not to fail and having such information will only pollute our logs instead of providing useful information that we can act upon.
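A minimal sketch of this rule, assuming a hypothetical parse_quantity operation with two known failure reasons: each failure is logged with its reason and the offending input, while the successful path logs nothing.

```python
import logging
from typing import Optional

logger = logging.getLogger("example.import")


def parse_quantity(raw: str) -> Optional[int]:
    """Hypothetical operation that usually succeeds but has known failure modes."""
    try:
        quantity = int(raw)
    except ValueError:
        # Known failure reason 1: include the reason and the input to reproduce it.
        logger.warning("Quantity parsing failed (not a number)",
                       extra={"reason": "not_a_number", "raw_input": raw})
        return None
    if quantity <= 0:
        # Known failure reason 2.
        logger.warning("Quantity parsing failed (not positive)",
                       extra={"reason": "not_positive", "raw_input": raw})
        return None
    # Success is the expected outcome, so nothing is logged here.
    return quantity
```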
Sometimes a particular log statement may seem useful but still lack important information. For example, we might have retry logic implemented for some HTTP request, along with a log informing us that the request was retried. It can be something like this:
[INFO] {endpoint} request failed. Retrying...
Even though it is clear that something went wrong with the request, there is nothing we can use to determine the cause and do something about it. In such cases, asking the same questions in reverse can show what the log lacks in order to serve a better purpose. Here, the missing piece is context. So what information can we include to provide it? First of all, there can be several reasons why an HTTP request fails, and adding the reason will help to understand the root cause of the failure. Knowing the failure reason tells us whether the issue is on the application side or on the endpoint side, so that appropriate action can be taken. This means that our log would look something like this:
[INFO] {endpoint} request failed (Internal Server Error). Retrying...
{
  "retryReason": "Internal Server Error",
  "requestData": {
    ...requestRelatedData
  }
}
From the log above it is clear what caused retry logic to be executed and what data was part of a failed request. Based on the information you can decide if there is a need to adjust something with the way the request is being sent, determine if the issue is on the target service side or even remove retry logic for this particular failure scenario.
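A retry helper that produces this kind of entry could look roughly like the sketch below. It is not tied to any particular HTTP client: send_with_retry, RequestFailed and the send callable are hypothetical names, and the transport is passed in as a plain function so the example stays self-contained.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("example.http")


class RequestFailed(Exception):
    """Hypothetical error carrying a human-readable failure reason."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


def send_with_retry(endpoint: str, request_data: dict,
                    send: Callable[[str, dict], Any], max_attempts: int = 3) -> Any:
    for attempt in range(1, max_attempts + 1):
        try:
            return send(endpoint, request_data)
        except RequestFailed as exc:
            if attempt == max_attempts:
                # Out of attempts: let the caller decide how to report it.
                raise
            # Say why we retry and what was sent, not only that we retry.
            logger.info("%s request failed (%s). Retrying...", endpoint, exc.reason,
                        extra={"retryReason": exc.reason,
                               "requestData": request_data,
                               "attempt": attempt})
```

Handler and level configuration are left to the application. With Python's defaults, INFO entries are not emitted until the logger level is lowered to INFO.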
What information is generally useful
The information you would like to include in the log might be different from case to case, but there are things that generally apply to most of the logs:
- Severity or event type - is it an error, is it a warning, is it an indication of some rare edge-case execution that we need to be aware of? Identifying the event type will help to focus on what is important.
- Execution context - what contributed to the scenario where we had to log something? What was the input, the global state, the results of related procedures? Having more context on the cause will help to determine possible solutions.
- Correlation or scope identifiers - an identifier that can indicate execution within the same scope. It can be a request scope, a procedure scope or anything that might be specific to the particular functionality sequence. Such information will help to understand the relationship between different logs and will provide a bigger picture when needed.
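For the correlation identifier, one possible approach in Python is to keep the ID in a context variable and let a logging filter stamp it onto every record. The names below (request_id_var, RequestIdFilter, handle_request) are hypothetical, and generating the ID with uuid4 is just one option; in a real service it would often come from an incoming request header.

```python
import contextvars
import logging
import uuid

# Holds the ID of the request currently being processed ("-" outside a request).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copies the current request ID onto each record passing through the logger."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


logger = logging.getLogger("example.orders")
logger.setLevel(logging.INFO)
logger.addFilter(RequestIdFilter())


def handle_request(order_id: str) -> None:
    token = request_id_var.set(str(uuid.uuid4()))
    try:
        # Both entries carry the same request_id, so they can be found together.
        logger.info("Order received", extra={"order_id": order_id})
        logger.warning("Payment provider responded slowly", extra={"order_id": order_id})
    finally:
        request_id_var.reset(token)
```

A formatter can then include the identifier, for example with the format string "[%(levelname)s] %(message)s request_id=%(request_id)s".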
The difference between a useless log and a useful one often comes down to one thing: context.
For example, consider the case when order placement fails due to an issue on the payment provider side. A message like "Something went wrong. Order placement failed" does not provide any useful information. Let's first identify the type of the statement. As it indicates some kind of failure, we can say it is an error, so our log statement becomes "[ERROR] Something went wrong. Order placement failed". Even though it now indicates its type, it still carries no relevant information, so let's make it more structured and add the relevant details:
[ERROR] Order placement failed (PaymentProviderError).
{
  "service": "order-service",
  "requestId": "<unique-id-identified-single-request-chain>",
  "cause": "PaymentProviderError",
  "details": {
    "orderId": "<unique-order-id>",
    "provider": "ProviderName",
    "statusCode": "500",
    "transactionId": "<unique-transaction-id>"
  }
}
Now the log is much clearer and more structured. It makes clear that there was an error during order placement caused by a payment provider. There is additional information on context that will provide the necessary information to investigate the issue. From this log it is clear which order to check, which provider caused the issue, which transaction has failed. All this information can be used to decide on a course of action and is enough to find more information.
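As an illustration, an entry of this shape could be emitted with Python's standard logging module and a small JSON formatter. This is only a sketch: JsonFormatter, log_payment_failure and the "context" attribute are hypothetical names, and unlike the sample above the whole entry is rendered as a single JSON object so that it stays machine-parseable.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Renders level, message and an optional 'context' dict as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


logger = logging.getLogger("example.placement")


def log_payment_failure(request_id: str, order_id: str, provider: str,
                        status_code: str, transaction_id: str) -> None:
    cause = "PaymentProviderError"
    logger.error("Order placement failed (%s).", cause, extra={"context": {
        "service": "order-service",
        "requestId": request_id,
        "cause": cause,
        "details": {
            "orderId": order_id,
            "provider": provider,
            "statusCode": status_code,
            "transactionId": transaction_id,
        },
    }})
```

To use it, attach the formatter to whichever handler the service already has, for example handler.setFormatter(JsonFormatter()) on a logging.StreamHandler.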
Conclusion
Over time, I learned that the value of logging does not come from volume but from information we can act upon. When logs are done right, you can move through log entries much faster and with less confusion. I've seen the results firsthand after we agreed on a log structure to be used in all our services and reduced the number of unnecessary logs. Now, instead of trying different keywords just to find remotely related log entries, we know exactly what to search for and get the most relevant information from the logs.