Traditionally, designing and implementing centralized logging is an afterthought. It is not until problems arise that priorities shift to a centralized logging solution for querying, viewing, and analyzing the logs so the root cause of the problem can be found. In the container era, however, when designing a Containers-as-a-Service (CaaS) platform with Docker Enterprise, it is critical to prioritize centralized logging. As the number of microservices deployed in containers increases, the amount of data they produce in the form of logs (or events) increases exponentially.
This reference architecture provides an overview of how Docker logging works, explains the two main categories of Docker logs, and then discusses Docker logging best practices.
Before diving into design considerations, it’s important to start with the basics of Docker logging.
Docker supports different logging drivers used to store and/or stream the `stdout` and `stderr` logs of a container's main process (PID 1). By default, Docker uses the `json-file` logging driver, but it can be configured to use many other drivers by setting the value of `log-driver` in `/etc/docker/daemon.json` and then restarting the Docker daemon to reload its configuration.

The logging driver settings apply to all containers launched after reconfiguring the daemon (restarting existing containers after reconfiguring the logging driver does not result in those containers using the updated configuration). To override the default logging driver for an individual container, run the container with the `--log-driver` and `--log-opt` options.
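For example, an individual container can be pointed at a syslog endpoint at run time. A minimal sketch, assuming an illustrative syslog server address:

$ docker run -d \
    --log-driver syslog \
    --log-opt syslog-address=udp://192.0.2.10:514 \
    nginx:alpine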
Swarm-mode services, on the other hand, can be updated to use a different logging driver on the fly by using:
$ docker service update --log-driver <DRIVER_NAME> --log-opt <LIST OF OPTIONS> <SERVICE NAME>
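For instance, a hypothetical `web` service could be moved to the `json-file` driver with log rotation enabled (the service name and option values are illustrative):

$ docker service update \
    --log-driver json-file \
    --log-opt max-size=10m --log-opt max-file=3 \
    web

Updating a service this way redeploys its tasks with the new logging configuration.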
What about Mirantis Container Runtime logs? These logs are typically handled by the default system manager logger. Most modern distributions (CentOS 7, RHEL 7, Ubuntu 16.04, etc.) use `systemd`, which uses `journald` for logging and `journalctl` for accessing the logs. To access the MCR logs, use `journalctl -u docker.service`.
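`journalctl` can also follow the runtime logs in real time or restrict them to a time window, both useful when troubleshooting (the time value is illustrative):

$ journalctl -u docker.service -f
$ journalctl -u docker.service --since "1 hour ago"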
Now that the basics of Docker logging have been covered, this section explains the categories and sources of those logs.
Docker logs typically fall into one of two categories: infrastructure management logs or application logs. Most logs naturally fall into these categories based on the roles of those who need access to them.
In order to have a self-service platform, both operators and developers should have access to the logs they need to perform their roles. DevOps practices suggest an overall, shared responsibility when it comes to service availability and performance. However, not everyone needs access to every log on the platform. For instance, developers should only need access to the logs for their services and their integration points. Operators are more concerned with Docker daemon logs, MKE and MSR availability, and service availability. There is a bit of overlap, since developers and operators should both be aware of service availability. Giving each role access to the logs it needs allows for simpler troubleshooting when an issue occurs and a decreased Mean Time To Resolve (MTTR).
The infrastructure management logs include the logs of the Mirantis Container Runtime, containers running MKE or MSR, and any containerized infrastructure services that are deployed (think containerized monitoring agents).
As previously mentioned, Mirantis Container Runtime logs are captured by the OS’s system manager by default. These logs can be sent to a centralized logging server.
MKE and MSR are deployed as Docker containers. All their logs are captured in the containers' `STDOUT`/`STDERR`. The default logging driver for Mirantis Container Runtime captures these logs.
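On a manager node this means individual MKE component logs can be inspected directly with `docker logs`. A hypothetical example, using MKE's `ucp-` container naming prefix (substitute the component of interest):

$ docker logs ucp-controller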
MKE can be configured to use remote syslog logging. This can be done post-installation from the MKE UI for all of its containers.
Note

It is recommended that the Mirantis Container Runtime default logging driver be configured before installing MKE and MSR so that their logs are captured by the chosen logging driver. This is due to the inability to change a container's logging driver once it has been created. The only exception to this is `ucp-agent`, a component of MKE that is deployed as a Swarm service.
Infrastructure operations teams deploy containerized infrastructure services used for various infrastructure operations such as monitoring, auditing, reporting, config deployment, etc. These services also produce important logs that need to be captured. Typically, their logs are limited to the `STDOUT`/`STDERR` of their containers, so they are also captured by the Mirantis Container Runtime default logging driver. If not, they need to be handled separately.
Application-produced logs can be a combination of custom application logs and the `STDOUT`/`STDERR` logs of the application's main process. As described earlier, the `STDOUT`/`STDERR` logs of all containers are captured by the Mirantis Container Runtime default logging driver, so no custom configuration is needed to capture them. If the application has custom logging (e.g., it writes logs to `/var/log/myapp.log` within the container), it's important to take that into consideration.
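A common workaround for such file-based logs is to symlink the log file to the container's standard streams in the image so the default logging driver picks it up; the official nginx image uses this pattern for its access and error logs. A minimal sketch, assuming the hypothetical log path above:

# In the application's Dockerfile
RUN ln -sf /dev/stdout /var/log/myapp.log

If the file cannot be redirected this way, a sidecar container can ship it instead, as discussed later in this document.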
Understanding the types of Docker logs is important. It is also important to define which entities are best suited to consume and own them.
Mainly, there are two categories: infrastructure logs and application logs.
Based on the organization’s structure and policies, decide if these categories have a direct mapping to existing teams. If they do not, then it is important to define the right organization or team responsible for these log categories:
Category | Team |
---|---|
System and Management Logs | Infrastructure Operations |
Application Logs | Application Operations |
If the organization is part of a larger organization, these categories may be too broad. Sub-divide them into more specific ownership teams:
Category | Team |
---|---|
Mirantis Container Runtime Logs | Infrastructure Operations |
Infrastructure Services | Infrastructure Operations |
MKE and MSR Logs | MKE/MSR Operations |
Application A Logs | Application A Operations |
Application B Logs | Application B Operations |
Some organizations don’t distinguish between infrastructure and application operations, so they might combine the two categories and have a single operations team own them.
Category | Team |
---|---|
System and Management Logs | Infrastructure Operations |
Application Logs | Infrastructure Operations |
Pick the right model to clearly define the appropriate ownership for each type of log, resulting in a decreased MTTR. Once organizational ownership has been determined for the different log types, it is time to start investigating the right logging solution for deployment.
Docker can easily integrate with existing logging tools and solutions. Most of the major utilities in the logging ecosystem have developed Docker logging drivers or provide proper documentation for integrating with Docker.
Pick the logging solution that best fits the ownership model defined above and any logging infrastructure the organization already has in place.
Docker has several available logging drivers that can be used for the management of application logs. Check the Docker docs for the complete list as well as detailed information on how to use them. Many logging vendors have agents that can be used to collect and ship the logs; refer to their official documentation on how to configure those agents with Docker Enterprise.
As a general rule, if you already have logging infrastructure in place, you should use the logging driver for that existing infrastructure. Below is a list of the logging drivers built into the Docker engine.
Driver | Advantages | Disadvantages |
---|---|---|
none | Ultra-secure, since nothing gets logged | Much harder to troubleshoot issues with no logs |
local | Optimized for performance and disk use. Limits on log size by default. | Can’t be used for centralized logging due to the file format (it’s compressed) |
json-file | The default, supports tags | Logs reside locally and not aggregated, logs can fill up local disk if no restrictions in place. See docs for more details. Additional disk I/O. Additional utilities needed if you want to ship these logs. |
syslog | Most machines come with syslog, supports TLS for encrypted log shipping, supports tags. Centralized view of logs. | Needs to be set up as highly available (HA) or else there can be issues on container start if it’s not available. Additional network I/O, subject to network outages. |
journald | Log aggregator can be down without impact by spooling locally, this also collects Docker daemon logs | Since journal logs are in binary format, extra steps need to be taken to ship them off to the log collector. Additional disk I/O. |
gelf | Provides indexable fields by defaults (container id, host, container name, etc.), tag support. Centralized view of logs. Flexible. | Additional network I/O. Subject to network outages. More components to maintain. |
fluentd | Provides container_name and container_id fields by default, fluentd supports multiple outputs. Centralized view of logs. Flexible. | No TLS support, additional network I/O, subject to network outages. More components to maintain. |
awslogs | Easy integration when using Amazon Web Services, less infrastructure to maintain, tag support. Centralized view of logs. | Not the most ideal for hybrid cloud configurations or on-premise installations. Additional network I/O, subject to network outages. |
splunk | Easy integration with Splunk, TLS support, highly configurable, tag support, additional metrics. Works on Windows. | Splunk needs to be highly available or possible issues on container start - set splunk-verify-connection = false to prevent. Additional network I/O, subject to network outages. |
etwlogs | Common framework for logging on Windows, default indexable values | Only works on Windows, those logs have to be shipped from Windows machines to a log aggregator with a different utility |
gcplogs | Simple integration with Google Compute, less infrastructure to maintain, tag support. Centralized view of logs. | Not the most ideal for hybrid cloud configurations or on-premise installations. Additional network I/O, subject to network outages. |
logentries | Less to manage, SaaS based log aggregation and analytics. Supports TLS. | Requires logentries subscription. |
There are a few different ways to perform cluster-level logging with Docker Enterprise.
To implement node-level logging, simply create an entry in the Docker daemon configuration file specifying your log driver. The default configuration file location is `/etc/docker/daemon.json` on Linux machines and `%programdata%\docker\config\daemon.json` on Windows machines.
Logging at the node level can also be accomplished by using the default `json-file` or `journald` log driver and then using a logging agent to ship these logs.
Note

With no specific logging driver set in `daemon.json`, the `json-file` log driver is used by default. By default this comes with no auto-rotate setting. To ensure your disk doesn't fill up with logs, it is recommended to at least switch to an auto-rotate configuration before installing Docker Enterprise, as seen here:
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}
Users of Docker Enterprise can make use of "dual logging", which enables you to use the `docker logs` command for any logging driver. Please refer to the Docker documentation for more details on this Docker Enterprise feature.
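In practice, dual logging means `docker logs` keeps working locally even for containers configured with a remote driver, because the engine keeps a local cache of the log stream. A hypothetical illustration (the driver options are placeholders):

$ docker run -d --name web --log-driver gelf \
    --log-opt gelf-address=udp://1.2.3.4:12201 nginx:alpine
$ docker logs web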
The ETW logging driver is supported for Windows. ETW stands for Event Tracing for Windows and is the common framework for tracing applications in Windows. Each ETW event contains a message with both the log and its context information. A client can then create an ETW listener to listen to these events.
Alternatively, if Splunk is available in your organization, it can be used to collect Windows container logs. In order for this to function properly, the HTTP Event Collector needs to be configured on the Splunk server side. Below is an example `daemon.json` for sending container logs to Splunk on Windows:
{
"data-root": "d:\\docker",
"labels": ["os=windows"],
"log-driver": "splunk",
"log-opts": {
"splunk-token": "AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE",
"splunk-url": "https://splunk.example.com",
"splunk-format": "json",
"splunk-index": "main",
"splunk-insecureskipverify": "true",
"splunk-verify-connection": "false",
"tag":"{{.ImageName}} | {{.Name}} | {{.ID}}"
}
}
To implement system-wide logging, create an entry in `/etc/docker/daemon.json`. For example, use the following to enable the `gelf` output plugin:
{
"log-driver": "gelf",
"log-opts": {
"gelf-address": "udp://1.2.3.4:12201",
"tag": "{{.ImageName}}/{{.Name}}/{{.ID}}"
}
}
Then restart the Docker daemon. All of the logging drivers can be configured in a similar way, by using the `/etc/docker/daemon.json` file. In the previous example using the `gelf` log driver, the `tag` field sets additional data that can be searched and indexed when logs are collected. Please refer to the documentation for each of the logging drivers to see what additional fields can be set from the log driver.
Setting the log driver in the `/etc/docker/daemon.json` file sets the default logging behavior on a per-node basis. This can be overridden at the per-service or per-container level. Overriding the default logging behavior can be useful for troubleshooting so that the logs can be viewed in real time.
If a service is created on a host whose `daemon.json` file is configured to use the `gelf` log driver, then the logs of all containers running on that host go to wherever the `gelf-address` option points.
If a different logging driver is preferred, for instance to view a log stream from the `stdout` of the container, then it's possible to override the default logging behavior ad hoc:

$ docker service create \
    --log-driver json-file --log-opt max-size=10m \
    nginx:alpine
This can then be coupled with `docker service logs` to more readily identify issues with the service.

`docker service logs` provides a multiplexed stream of logs when a service has multiple replica tasks. Running `docker service logs <service_id>` shows the originating task name in the first column and the real-time logs of each replica in the right column. For example:
$ docker service create -d --name ping --replicas=3 alpine:latest ping 8.8.8.8
5x3enwyyr1re3hg1u2nogs40z
$ docker service logs ping
ping.2.n0bg40kksu8e@m00 | 64 bytes from 8.8.8.8: seq=43 ttl=43 time=24.791 ms
ping.3.pofxdol20p51@w01 | 64 bytes from 8.8.8.8: seq=44 ttl=43 time=34.161 ms
ping.1.o07dvxfx2ou2@w00 | 64 bytes from 8.8.8.8: seq=44 ttl=43 time=30.111 ms
ping.2.n0bg40kksu8e@m00 | 64 bytes from 8.8.8.8: seq=44 ttl=43 time=25.276 ms
ping.3.pofxdol20p51@w01 | 64 bytes from 8.8.8.8: seq=45 ttl=43 time=24.239 ms
ping.1.o07dvxfx2ou2@w00 | 64 bytes from 8.8.8.8: seq=45 ttl=43 time=26.403 ms
This command is useful when trying to view the log output of a service that contains multiple replicas. Viewing the logs in real time, streamed across multiple replicas, allows for instant understanding and troubleshooting of service issues across the entire cluster.
Many logging providers have their own logging agents. Please refer to their respective documentation for detailed instructions on using their tooling.
Generally speaking, those agents will be deployed either as a global Swarm service or as a Kubernetes DaemonSet, as sketched below.
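A minimal sketch of the Swarm approach, assuming a generic fluentd-based agent image (the image and mount shown are illustrative; real agents document their own requirements):

$ docker service create --name log-agent --mode global \
    --mount type=bind,src=/var/lib/docker/containers,dst=/var/lib/docker/containers,readonly \
    fluent/fluentd:latest

Global mode ensures exactly one agent task runs on every node in the cluster.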
Sometimes, especially when dealing with brownfield (existing) applications, not all logs are written to `stdout`. In this case it can be useful to deploy a sidecar container to ensure that logs written to disk are also collected. Please refer to the Kubernetes documentation for an example of using fluentd with a sidecar container to collect these additional logs.
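The same idea can be sketched with plain Docker primitives: the application writes to a shared volume, and a sidecar tails the file to its own `stdout`, where the configured logging driver picks it up. All names below are hypothetical:

$ docker volume create app-logs
$ docker run -d --name myapp -v app-logs:/var/log myapp:latest
$ docker run -d --name myapp-log-tailer -v app-logs:/var/log:ro \
    alpine tail -F /var/log/myapp.log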
It’s recommended that logging infrastructure be placed in an environment separate from where you deploy your applications. Troubleshooting cluster and application issues will become much more complicated when your logging infrastructure is unavailable. Creating a utility cluster to collect metrics and logs is a best practice with Docker Enterprise.
Docker provides many options when it comes to logging, and it’s helpful to have a logging strategy when adopting the platform. For most systems, leaving the log data on the host isn’t adequate. Being able to index, search, and have a self-service platform allows for a smoother experience for both operators and developers.