Logging Design and Best Practices

Introduction

Centralized logging is typically an afterthought: a logging solution is often deployed as an analysis aid only after problems arise. Logging is essential, however, when designing a Containers-as-a-Service (CaaS) platform using MKE, MSR, and MCR.

What You Will Learn

This reference architecture provides an overview of how Docker logging works, explains the two main categories of Docker logs, and then discusses Docker logging best practices.

Understanding Docker Logging

Before diving into design considerations, it’s important to start with the basics of Docker logging.

Docker supports different logging drivers, which store and/or stream the stdout and stderr output of a container’s main process (PID 1). By default, Docker uses the json-file logging driver, but it can be configured to use many other drivers by setting the value of log-driver in /etc/docker/daemon.json and then restarting the Docker daemon to reload its configuration.

The logging driver settings apply to all containers launched after the daemon is reconfigured. Restarting an existing container does not switch it to the updated configuration; the container must be recreated. To override the default logging driver for a single container, run the container with the --log-driver and --log-opt options. Swarm-mode services, on the other hand, can be updated to use a different logging driver on the fly by using:

$ docker service update --log-driver <DRIVER_NAME> --log-opt <LIST OF OPTIONS> <SERVICE NAME>
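Because --log-opt takes one key=value pair per flag, a service with several options needs the flag repeated. The following Python sketch assembles such a command line. The helper name build_service_update_cmd and the service name web are hypothetical, and the command is only printed, not executed.

```python
import shlex


def build_service_update_cmd(service, driver, opts):
    """Return the argv list for `docker service update` with logging flags."""
    cmd = ["docker", "service", "update", "--log-driver", driver]
    # --log-opt is repeated once per key=value pair (sorted for stable output).
    for key, value in sorted(opts.items()):
        cmd += ["--log-opt", f"{key}={value}"]
    cmd.append(service)
    return cmd


if __name__ == "__main__":
    argv = build_service_update_cmd(
        "web",  # hypothetical service name
        "json-file",
        {"max-size": "10m", "max-file": "3"},
    )
    # Print a copy-pasteable command; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Building the command as a list rather than a string avoids quoting problems when an option value contains spaces or template braces.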

What about Mirantis Container Runtime logs? These logs are typically handled by the system’s default logging service. Most modern distributions (CentOS 7, RHEL 7, Ubuntu 16, etc.) use systemd, which uses journald for logging and journalctl for accessing the logs. To access the MCR logs, use journalctl -u docker.service.

Docker Logs Categories and Sources

Now that the basics of Docker logging have been covered, this section explains their categories and sources.

Docker logs typically fall into one of two categories: Infrastructure Management or Application logs. Most logs naturally fall into these categories based on the roles of who needs access to the logs.

  • Operators are mostly concerned with the stability of the platform as well as the availability of the services.

  • Developers are more concerned with their application code and how their service is performing.

In order to have a self-service platform, both operators and developers should have access to the logs they need to perform their roles. DevOps practices suggest that there is an overall, shared responsibility when it comes to service availability and performance. However, not everyone needs access to every log on the platform. For instance, developers should only need access to the logs for their services and the integration points. Operators are more concerned with Docker daemon logs, MKE and MSR availability, and service availability. There is some overlap, since developers and operators should both be aware of service availability. Giving each role access to the logs it needs allows for simpler troubleshooting when an issue occurs and a decreased Mean Time To Resolve (MTTR).

Infrastructure Management Logs

The infrastructure management logs include the logs of the Mirantis Container Runtime, containers running MKE or MSR, and any containerized infrastructure services that are deployed (think containerized monitoring agents).

Mirantis Container Runtime Logs

As previously mentioned, Mirantis Container Runtime logs are captured by the OS’s system manager by default. These logs can be sent to a centralized logging server.

MKE and MSR System Logs

MKE and MSR are deployed as Docker containers. All their logs are captured in the container’s STDOUT/STDERR. The default logging driver for Mirantis Container Runtime captures these logs.

MKE can be configured to use remote syslog logging. This can be done post-installation from the MKE UI for all of its containers.

Note

It is recommended that the Mirantis Container Runtime default logging driver be configured before installing MKE and MSR so that their logs are captured by the chosen logging driver. This is because a container’s logging driver cannot be changed once the container has been created. The only exception to this is ucp-agent, which is a component of MKE that gets deployed as a Swarm service.

Infrastructure Services

Infrastructure operation teams deploy containerized infrastructure services used for various infrastructure operations such as monitoring, auditing, reporting, config deployment, etc. These services also produce important logs that need to be captured. Typically, their logs are limited to the STDOUT/STDERR of their containers, so they are also captured by the Mirantis Container Runtime default logging driver. If not, they need to be handled separately.

Application Logs

Application-produced logs can be a combination of custom application logs and the STDOUT/STDERR logs of the main process of the application. As described earlier, the STDOUT/STDERR logs of all containers are captured by the Mirantis Container Runtime default logging driver, so no custom configuration is needed to capture them. If the application has custom logging (e.g. it writes logs to /var/log/myapp.log within the container), those logs are not captured by the logging driver and must be handled separately.
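One way to account for custom logging is to have the application write to stdout instead of a file, so that the configured logging driver picks up the messages. A minimal sketch using Python’s standard logging module follows. The logger name myapp and the format string are illustrative assumptions, not something the platform prescribes.

```python
import logging
import sys


def configure_stdout_logging(name="myapp", stream=None):
    """Send log records to stdout rather than a file inside the container."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    # Replace existing handlers so records are not also written to a log file.
    logger.handlers = [handler]
    # Do not pass records on to the root logger, which may have other handlers.
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = configure_stdout_logging()
    log.info("service started")
```

Applications that cannot be changed this way are covered by the sidecar approach described under Brownfield Application Logs.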

Docker Logging Design Considerations

Understanding the types of Docker logs is important. It is also important to define which entities are best suited to consume and own them.

Categorizing the Docker Logs

Mainly, there are two categories: infrastructure logs and application logs.

Defining the Organizational Ownership

Based on the organization’s structure and policies, decide if these categories have a direct mapping to existing teams. If they do not, then it is important to define the right organization or team responsible for these log categories:

| Category | Team |
|---|---|
| System and Management Logs | Infrastructure Operations |
| Application Logs | Application Operations |

If the organization is part of a larger organization, these categories may be too broad. Sub-divide them into more specific ownership teams:

| Category | Team |
|---|---|
| Mirantis Container Runtime Logs | Infrastructure Operations |
| Infrastructure Services | Infrastructure Operations |
| MKE and MSR Logs | MKE/MSR Operations |
| Application A Logs | Application A Operations |
| Application B Logs | Application B Operations |

Some organizations don’t distinguish between infrastructure and application operations, so they might combine the two categories and have a single operations team own them.

| Category | Team |
|---|---|
| System and Management Logs | Infrastructure Operations |
| Application Logs | Infrastructure Operations |

Pick the right model to clearly define the appropriate ownership for each type of log, resulting in decreased mean time to resolve (MTTR). Once organizational ownership has been determined for the type of logs, it is time to start investigating the right logging solution for deployment.

Picking a Logging Infrastructure

Docker integrates easily with existing logging tools and solutions. Most of the major logging utilities in the logging ecosystem have either developed Docker logging integrations or documented how to integrate with Docker.

Pick the logging solution that:

  1. Allows for the implementation of the organizational ownership model defined in the previous section. For example, some organizations may choose to send all logs to a single logging infrastructure and then provide the right level of access to the functional teams.

  2. Is the one the organization is most familiar with. Docker can integrate with most of the popular logging providers. Please refer to your logging provider’s documentation for additional information.

  3. Has Docker integration: pre-configured dashboards, stable Docker plugin, proper documentation, etc.

Application Log Drivers

Docker offers several logging drivers for use in managing application logs. Check the Docker documentation for the complete list, as well as for detailed information on their use.

Many logging vendors provide agents that can be used to collect and ship the logs. As necessary, refer to vendor documentation on how to configure those agents for the MKE, MSR, and MCR platform.

As a general rule, if you already have logging infrastructure in place, use the logging driver for that existing infrastructure. Below is a list of the logging drivers built into MCR.

| Driver | Advantages | Disadvantages |
|---|---|---|
| none | Ultra-secure, since nothing gets logged. | Much harder to troubleshoot issues with no logs. |
| local | Optimized for performance and disk use. Limits on log size by default. | Can’t be used for centralized logging due to the file format (it’s compressed). |
| json-file | The default. Supports tags. | Logs reside locally and are not aggregated. Logs can fill up the local disk if no restrictions are in place (see the Docker documentation for more details). Additional disk I/O. Additional utilities are needed to ship these logs. |
| syslog | Most machines come with syslog. Supports TLS for encrypted log shipping. Supports tags. Centralized view of logs. | Needs to be set up as highly available (HA), or else there can be issues on container start if it’s not available. Additional network I/O. Subject to network outages. |
| journald | The log aggregator can be down without impact because logs are spooled locally. Also collects Docker daemon logs. | Because journal logs are in binary format, extra steps are needed to ship them to the log collector. Additional disk I/O. |
| gelf | Provides indexable fields by default (container ID, host, container name, etc.). Tag support. Centralized view of logs. Flexible. | Additional network I/O. Subject to network outages. More components to maintain. |
| fluentd | Provides container_name and container_id fields by default. fluentd supports multiple outputs. Centralized view of logs. Flexible. | No TLS support. Additional network I/O. Subject to network outages. More components to maintain. |
| awslogs | Easy integration when using Amazon Web Services. Less infrastructure to maintain. Tag support. Centralized view of logs. | Not ideal for hybrid cloud configurations or on-premises installations. Additional network I/O. Subject to network outages. |
| splunk | Easy integration with Splunk. TLS support. Highly configurable. Tag support. Additional metrics. Works on Windows. | Splunk needs to be highly available, or there can be issues on container start; set splunk-verify-connection to false to prevent this. Additional network I/O. Subject to network outages. |
| etwlogs | Common framework for logging on Windows. Default indexable values. | Only works on Windows. The logs have to be shipped from Windows machines to a log aggregator with a different utility. |
| gcplogs | Simple integration with Google Compute. Less infrastructure to maintain. Tag support. Centralized view of logs. | Not ideal for hybrid cloud configurations or on-premises installations. Additional network I/O. Subject to network outages. |
| logentries | Less to manage. SaaS-based log aggregation and analytics. Supports TLS. | Requires a logentries subscription. |

Collecting Logs

There are a few different ways to perform cluster-level logging with MKE, MSR, and MCR:

  • At the node level using a logging driver

  • Using a logging agent deployed either as a global service with Swarm or as a DaemonSet with Kubernetes

  • By having applications themselves send logs to your logging infrastructure

Node Level Logging

To implement node-level logging, create an entry in the Docker daemon configuration file that specifies your log driver. The file is located at /etc/docker/daemon.json on Linux machines and, by default, at %programdata%\docker\config\daemon.json on Windows machines.

Logging at the node level can also be accomplished by using the default json-file or journald log driver and then using a logging agent to ship these logs.

Note

When no logging driver is set in daemon.json, the json-file log driver is used by default. This default does not rotate logs, so to ensure that the disk does not fill up with logs, Mirantis recommends configuring log rotation before installing the MKE, MSR, and MCR platform:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

Use “dual logging” to enable the docker logs command for any logging driver (refer to the Docker documentation for more detail).
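The rotation settings in the note above bound how much disk space each container’s logs can use: roughly max-size multiplied by max-file. The Python sketch below works through that arithmetic. It assumes the k, m, and g suffixes are binary multiples (1k = 1024 bytes), the helper names are hypothetical, and the container count is an illustrative figure. Treat the result as an approximate ceiling for capacity planning.

```python
# Binary multipliers for the k/m/g suffixes (an assumption: 1k = 1024 bytes).
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '10m' to bytes; a bare number is taken as bytes."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def max_log_bytes_per_container(log_opts):
    """Approximate upper bound on json-file log disk use for one container."""
    # Rotation keeps at most max-file files of about max-size bytes each.
    return parse_size(log_opts["max-size"]) * int(log_opts["max-file"])


if __name__ == "__main__":
    opts = {"max-size": "10m", "max-file": "3"}
    per_container = max_log_bytes_per_container(opts)
    containers = 50  # hypothetical number of containers on one node
    print(per_container * containers / 1024 ** 2, "MiB worst case on this node")
```

With the values from the note, each container is capped at about 30 MiB of logs, so the node-level budget scales with the number of containers scheduled on it.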

Windows Logging

The ETW logging driver is supported for Windows. ETW stands for Event Tracing for Windows, and is the common framework for tracing applications in Windows. Each ETW event contains a message with both the log and its context information. A client can then create an ETW listener to listen to these events.

Alternatively, if Splunk is available in your organization, it can be used to collect Windows container logs. For this to function properly, the HTTP Event Collector needs to be configured on the Splunk server side. Below is an example daemon.json for sending container logs to Splunk on Windows:

{
  "data-root": "d:\\docker",
  "labels": ["os=windows"],
  "log-driver": "splunk",
  "log-opts": {
    "splunk-token": "AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE",
    "splunk-url": "https://splunk.example.com",
    "splunk-format": "json",
    "splunk-index": "main",
    "splunk-insecureskipverify": "true",
    "splunk-verify-connection": "false",
    "tag": "{{.ImageName}} | {{.Name}} | {{.ID}}"
  }
}

Node Level Swarm Logging Example

To implement system-wide logging, create an entry in /etc/docker/daemon.json. For example, use the following to enable the gelf logging driver:

{
  "log-driver": "gelf",
  "log-opts": {
    "gelf-address": "udp://1.2.3.4:12201",
    "tag": "{{.ImageName}}/{{.Name}}/{{.ID}}"
  }
}

Then restart the Docker daemon. All of the logging drivers can be configured in a similar way through the /etc/docker/daemon.json file. In the previous gelf example, the tag option attaches additional data that can be searched and indexed when logs are collected. Refer to the documentation for each logging driver to see which additional fields it can set.
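When daemon.json already holds other settings, the logging entries need to be merged in rather than written over the file. The following Python sketch shows one way to do that. The function name set_log_driver is hypothetical, and the demo writes to a temporary directory rather than /etc/docker/daemon.json. Restarting the daemon afterwards is still a separate step.

```python
import json
import tempfile
from pathlib import Path


def set_log_driver(config_path, driver, log_opts):
    """Merge logging settings into a daemon.json file, keeping other keys."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    # Start from the existing configuration if the file has content.
    config = json.loads(text) if text.strip() else {}
    config["log-driver"] = driver
    # Options are driver-specific, so replace them instead of merging old ones.
    config["log-opts"] = dict(log_opts)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "daemon.json"
        target.write_text('{"labels": ["os=linux"]}')
        set_log_driver(
            target,
            "gelf",
            {"gelf-address": "udp://1.2.3.4:12201",
             "tag": "{{.ImageName}}/{{.Name}}/{{.ID}}"},
        )
        print(target.read_text())
```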

Configuring logging in the /etc/docker/daemon.json file sets the default logging behavior on a per-node basis. This default can be overridden at the service or container level. Overriding the default logging behavior can be useful for troubleshooting, so that the logs can be viewed in real time.

If a service is created on a system where the daemon.json file is configured to use the gelf log driver, then the logs of all containers running on that host are sent to the destination set in gelf-address.

If a different logging driver is preferred, for instance to view a log stream from the stdout of the container, then it’s possible to override the default logging behavior ad-hoc.

$ docker service create \
    --log-driver json-file --log-opt max-size=10m \
    nginx:alpine

This can then be coupled with the docker service logs command to more readily identify issues with the service.

Docker Swarm Service Logs

docker service logs provides a multiplexed stream of logs when a service has multiple replica tasks. Running docker service logs <service_id> shows the originating task name in the first column and the real-time log output of each replica in the second column. For example:

$ docker service create -d --name ping --replicas=3 alpine:latest ping 8.8.8.8
5x3enwyyr1re3hg1u2nogs40z

$ docker service logs ping
ping.2.n0bg40kksu8e@m00    | 64 bytes from 8.8.8.8: seq=43 ttl=43 time=24.791 ms
ping.3.pofxdol20p51@w01    | 64 bytes from 8.8.8.8: seq=44 ttl=43 time=34.161 ms
ping.1.o07dvxfx2ou2@w00    | 64 bytes from 8.8.8.8: seq=44 ttl=43 time=30.111 ms
ping.2.n0bg40kksu8e@m00    | 64 bytes from 8.8.8.8: seq=44 ttl=43 time=25.276 ms
ping.3.pofxdol20p51@w01    | 64 bytes from 8.8.8.8: seq=45 ttl=43 time=24.239 ms
ping.1.o07dvxfx2ou2@w00    | 64 bytes from 8.8.8.8: seq=45 ttl=43 time=26.403 ms

This command is useful when trying to view the log output of a service that contains multiple replicas. Viewing the logs in real time, streamed across multiple replicas allows for instant understanding and troubleshooting of service issues across the entire cluster.
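When the multiplexed stream is saved or piped to a script, it can help to split it back out by task. The Python sketch below groups lines by the task and node shown in the first column. It assumes the "task@node | message" layout from the example output above, and the function name group_by_task is hypothetical.

```python
from collections import defaultdict


def group_by_task(lines):
    """Group `docker service logs` lines by the task that produced them."""
    grouped = defaultdict(list)
    for line in lines:
        # Each line looks like "<task>@<node>    | <message>".
        source, sep, message = line.partition("|")
        if not sep:
            continue  # skip lines without the expected separator
        task, _, node = source.strip().partition("@")
        grouped[(task, node)].append(message.strip())
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        "ping.2.n0bg40kksu8e@m00    | 64 bytes from 8.8.8.8: seq=43 ttl=43 time=24.791 ms",
        "ping.3.pofxdol20p51@w01    | 64 bytes from 8.8.8.8: seq=44 ttl=43 time=34.161 ms",
        "ping.2.n0bg40kksu8e@m00    | 64 bytes from 8.8.8.8: seq=44 ttl=43 time=25.276 ms",
    ]
    for (task, node), messages in group_by_task(sample).items():
        print(task, node, len(messages))
```

Splitting on the first "|" only keeps any later pipe characters inside the message text intact.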

Deploying a Logging Agent

Many logging providers have their own logging agents. Please refer to each provider’s documentation for detailed instructions on using its tooling.

Generally speaking, those agents will either be deployed as a global Swarm service or as a Kubernetes DaemonSet.

Brownfield Application Logs

Sometimes, especially when dealing with brownfield (existing) applications, not all logs are written to stdout. In this case, it can be useful to deploy a sidecar container to ensure that logs written to disk are also collected. Please refer to the Kubernetes documentation for an example of using fluentd with a sidecar container to collect these additional logs.
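The simplest form of such a sidecar just streams a log file to its own stdout, where the logging driver can collect it. The Python sketch below illustrates that idea only; it is not the fluentd setup from the Kubernetes documentation. The function name copy_new_lines is hypothetical, and the sketch does not handle log rotation or truncation.

```python
import sys


def copy_new_lines(path, offset, out=sys.stdout):
    """Copy complete lines appended to `path` since `offset`; return new offset."""
    with open(path, "rb") as log_file:
        log_file.seek(offset)
        while True:
            line = log_file.readline()
            # Stop at end of file or at a partially written line.
            if not line.endswith(b"\n"):
                break
            out.write(line.decode("utf-8", errors="replace"))
            offset = log_file.tell()
    out.flush()
    return offset
```

A sidecar would call copy_new_lines in a loop with a short sleep, passing back the returned offset each time, against a file on a volume shared with the application container (for example /var/log/myapp.log).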

Logging Infrastructure

It’s recommended that logging infrastructure be placed in a separate environment from where you deploy your applications. Troubleshooting cluster and application issues will become much more complicated when your logging infrastructure is unavailable. Creating a utility cluster to collect metrics and logs is an MKE, MSR, and MCR platform best practice.

Conclusion

The MKE, MSR, and MCR platform provides many logging options, and as such it’s good to have a logging strategy in place prior to its adoption (for most systems, leaving the log data on the host is not adequate). Having the ability to index, search, and use a self-service platform provides operators and developers with a smoother experience.