Introduction
In the fast-moving, high-pressure environments of exchanges and trading firms, where complexity and customer expectations continually rise, supportability is crucial to ensuring that systems can cope with new market scenarios and challenges. In this context, the seemingly simple and prosaic process of logging extends beyond mere technical necessity; it is a finely tuned process that (almost literally) tells the story of our systems’ operation to a diverse audience. From support staff and developers to operations teams and even business users, logs offer a narrative that, done right, aids problem resolution, system optimisation, and preventive maintenance.
Part of the Coding Process
While developing any new system or system component, the practice of writing and checking logs throughout the development lifecycle is essential to long-term supportability and sustainability. At Sinara, developers are trained to view logging as an integral part of the coding process. From the outset of a project, we identify key points within our applications that require logging, ensuring every critical action and decision point is accompanied by comprehensive and context-rich log statements. This proactive approach continues throughout the development cycle, with regular reviews of logs in consultation with the Sinara support team to help verify standards and adjust as necessary.
Clarity and Context
We write logs that are understandable across various levels of technical expertise, avoiding jargon and focusing on clarity. Our logs capture the start and end of discrete processes, offer periodic updates to signal system health, and use appropriate log levels to differentiate between routine operations and potential issues. This approach ensures that critical information is never ‘lost in translation’, enabling effective decision-making and support.
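As an illustration of capturing the start and end of a discrete process, the following is a minimal sketch using Python's standard `logging` module. It is not Sinara's implementation: the logger name `orders`, the helper `logged_process`, and the `batch_id` field are hypothetical names chosen for the example. Routine start and end entries are written at INFO, while a failure is written at ERROR with its stack trace.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name, used only for this example.
logger = logging.getLogger("orders")


@contextmanager
def logged_process(name, **context):
    """Log the start and the end (or failure) of a discrete process."""
    # Sort the context fields so every entry lists them in the same order.
    details = " ".join(f"{key}={value}" for key, value in sorted(context.items()))
    logger.info("START %s %s", name, details)
    started = time.perf_counter()
    try:
        yield
    except Exception:
        # ERROR level plus traceback: this is not routine operation.
        logger.exception("FAILED %s %s", name, details)
        raise
    else:
        elapsed_ms = (time.perf_counter() - started) * 1000
        logger.info("END %s %s elapsed_ms=%.1f", name, details, elapsed_ms)


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    # Hypothetical process name and identifier.
    with logged_process("end_of_day_batch", batch_id="B-1"):
        time.sleep(0.01)  # stand-in for the real work
```

Pairing every START with an END or FAILED entry means a reader can tell at a glance whether a process completed, without needing to know the code.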
Good examples of logging practices at Sinara include:
- Contextual information: Logs that provide enough context to understand what was happening within the system when the event occurred. For instance, including user IDs, transaction IDs, and other relevant details that can help trace the issue without needing to replicate the problem.
- Clear level indication: Using log levels appropriately (DEBUG, INFO, WARN, ERROR) to quickly filter and prioritise issues. For example, using ERROR for actual system errors that need immediate attention and INFO for operational entries that indicate normal behaviour.
- Actionable messages: Logs that clearly state what went wrong and potentially how to fix it. An example would be a log entry for a failed transaction that includes not only the error but also which part of the system was affected and suggestions for next steps.
- Consistency in format: Logs that maintain a consistent format across the system, making it easier for Sinara or client monitoring systems to parse logs. This typically includes standardised timestamps and a consistent ordering of elements (timestamp, level, message), as well as a uniform approach to describing events.
- Performance metrics: Including performance metrics in logs, such as processing times or memory usage, to help identify potential bottlenecks or performance degradation over time.
- Security and privacy: Ensuring that logs do not contain sensitive information, such as passwords or personally identifiable information, to comply with privacy laws and security best practices.
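Several of the practices above can be sketched together in a few lines of Python. This is an illustrative example under stated assumptions, not a description of Sinara's logging framework: `LOG_FORMAT`, `RedactingFilter`, `build_handler`, `describe`, and the field names (`user_id`, `txn_id`, `elapsed_ms`) are all hypothetical. The handler fixes the element order (timestamp, level, logger, message), `describe` appends context as `key=value` pairs, and the filter masks values that look like credentials.

```python
import logging
import re
import sys

# Fixed element order: timestamp, level, logger name, message.
LOG_FORMAT = "%(asctime)s %(levelname)-5s %(name)s %(message)s"

# Assumption for this sketch: secrets appear as password=... or token=...
SENSITIVE = re.compile(r"(password|token)=\S+", re.IGNORECASE)


class RedactingFilter(logging.Filter):
    """Mask sensitive values before the record reaches the output."""

    def filter(self, record):
        record.msg = SENSITIVE.sub(r"\1=***", record.getMessage())
        record.args = ()  # message is already fully formatted
        return True


def build_handler(stream):
    """Create a handler with the standard format and redaction applied."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt="%Y-%m-%dT%H:%M:%S"))
    handler.addFilter(RedactingFilter())
    return handler


def describe(event, **context):
    """Build a message: plain-language event followed by key=value context."""
    fields = " ".join(f"{key}={value}" for key, value in context.items())
    return f"{event} {fields}".strip()


if __name__ == "__main__":
    log = logging.getLogger("payments")  # hypothetical component name
    log.setLevel(logging.INFO)
    log.addHandler(build_handler(sys.stdout))
    # Routine operation, with context and a performance metric.
    log.info(describe("transaction accepted", user_id="u-17", txn_id="T-1001", elapsed_ms=12))
    # Actionable error: what failed, where, and a suggested next step.
    log.error(describe("transaction rejected: gateway timeout, safe to retry", txn_id="T-1002"))
```

The redaction filter is best treated as a safety net; the more reliable approach is not to pass sensitive values to the logger in the first place.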
Tooling
With log files containing a rich source of information, Sinara regularly deploys various log analysis tools (both external and internally built) to help our support analysts with their investigations. One of the key tasks during any support incident is to extract a narrative around a reported problem, or even to identify what the issue might be in the first place. The root cause may lie in the interaction between multiple distinct processes, each with its own log file. Being able to extract and collate this information rapidly helps resolve issues faster.
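To show what collating logs from several processes can look like, here is a minimal sketch that merges multiple logs into a single timeline. It does not represent any Sinara tool. It assumes every entry is a single line beginning with an ISO 8601 timestamp in the same timezone (so that timestamps sort as plain strings) and that each individual log is already in time order; the function names and sample entries are hypothetical.

```python
import heapq


def tagged_lines(source, lines):
    """Yield (timestamp, source, line) for each non-empty log line."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            # Assumption: the timestamp is the first space-separated field.
            timestamp = line.split(" ", 1)[0]
            yield timestamp, source, line


def collate(logs):
    """Merge several time-ordered logs into one chronological stream.

    `logs` maps a source name to an iterable of lines, for example an
    open file object.
    """
    streams = [tagged_lines(source, lines) for source, lines in logs.items()]
    # heapq.merge lazily interleaves inputs that are already sorted.
    for _timestamp, source, line in heapq.merge(*streams):
        yield f"[{source}] {line}"


if __name__ == "__main__":
    # Hypothetical entries from two cooperating processes.
    gateway = [
        "2024-05-01T09:00:00.100 INFO order received txn_id=T-1",
        "2024-05-01T09:00:00.450 ERROR no response from matcher txn_id=T-1",
    ]
    matcher = [
        "2024-05-01T09:00:00.200 WARN queue depth high depth=950",
    ]
    for entry in collate({"gateway": gateway, "matcher": matcher}):
        print(entry)
```

Because the merge is lazy, it can be applied to open file handles without loading whole files into memory. Multi-line entries such as stack traces would need to be grouped with their first line before merging, which this sketch does not attempt.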
Summary
Good logging is an important part of good engineering, especially for large-scale, complex software systems. In a marketplace where downtime equates to lost revenue and customer trust, detailed, context-rich logging is a key part of ensuring Sinara systems remain maintainable and transparent in their operation. This builds trust, makes troubleshooting easier, and helps clients gain a better understanding of how their critical systems operate.