Shipping ECS logs to Mezmo (LogDNA) with AWS Firelens and Fluent Bit
October 7, 2023
CloudWatch Logs is expensive
There are a trillion different ways you can go about shipping logs from your ECS containers to Mezmo. One of the ways often suggested is to hook up your containers to CloudWatch Logs using the awslogs log driver and have a Lambda subscribe to those log groups and pass the logs on to Mezmo (formerly known as LogDNA).
This might be the most straightforward way to go about shipping your logs to Mezmo, but it gets expensive pretty quickly. You pay for Lambda invocations, CloudWatch Logs and probably for egress. We ended up blowing about $1000 on this setup last month.
If you start looking around for solutions, you will quickly figure out that if you are using ECS Fargate, you don't have access to the Docker daemon running on the host. Setting up a sidecar running Logspout or something similar is not an option because it won't be able to access the Docker daemon.
In this article, I'll show you how to ship logs to Mezmo without CloudWatch Logs, using AWS Firelens and Fluent Bit. This will save you a lot of money that you could spend on more useful things, such as a new Bugatti or a Bored Ape NFT.
The steps to shipping logs to Mezmo without CloudWatch Logs
In this section, I'll give you a high-level overview of the steps involved in shipping logs to Mezmo without CloudWatch Logs.
If you don't care about the details, you can skip to the next section where I'll show you how to put everything together in a more compressed form.
Routing logs with AWS Firelens
AWS Firelens is a log router for Amazon ECS. What Firelens does is route the logs of your containers to a log-processing sidecar container in the same task. It does not cost anything to use Firelens, but you do pay for the resources used by the Firelens container that you set up.
The sidecar can run either Fluent Bit or Fluentd. In this article, we will be using Fluent Bit.
The basic schtick is something along these lines:
- Set up your app container in your ECS Task Definition and set its log driver to awsfirelens.
- Set up a second container running Fluent Bit in the same ECS Task Definition.
Then what happens is that the stdout and stderr output from your app container is handed to the Fluent Bit container by AWS Firelens. Fluent Bit then does some processing on the logs and ships them to Mezmo.
Fluent Bit
Fluent Bit is a log processing and forwarding agent. It takes in log events from some source, then runs those events through a pipeline (that you define) and outputs them to some destination.
Fluent Bit configs look something like this (I'll give you the config for shipping logs to Mezmo later in the article):
[SERVICE]
    flush 1
    log_level info
[INPUT]
    ...
[FILTER]
    ...
[OUTPUT]
    ...
The config is divided into sections, each section having a name in square brackets. The sections are SERVICE, INPUT, FILTER and OUTPUT.
The SERVICE section is for configuring Fluent Bit itself - for example, the log level of the Fluent Bit agent's own logs and the flush interval.
The INPUT section is for configuring where the logs should be read from. In our actual config, we will be reading from a socket that receives our logs.
FILTER sections are optional, and you can have more than one. Each FILTER is a step in the pipeline that transforms the log in some way. You can rename fields, add fields, remove fields, parse the log, etc.
The OUTPUT section is for configuring the output plugin. For example, the logdna plugin is for shipping logs to Mezmo.
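To make the INPUT, FILTER, OUTPUT flow a bit more concrete, here is a toy model of such a pipeline in Python. It is only a mental model and has nothing to do with how Fluent Bit is actually implemented. The names run_pipeline, add_field and drop_field and the sample record are made up for illustration.

```python
from typing import Callable, Iterable, List

Record = dict
Filter = Callable[[Record], Record]


def add_field(key: str, value: str) -> Filter:
    """Return a filter that adds (or overwrites) one field on each record."""
    def apply(record: Record) -> Record:
        return {**record, key: value}
    return apply


def drop_field(key: str) -> Filter:
    """Return a filter that removes one field from each record, if present."""
    def apply(record: Record) -> Record:
        return {k: v for k, v in record.items() if k != key}
    return apply


def run_pipeline(source: Iterable[Record], filters: List[Filter]) -> List[Record]:
    """Pass every record from the input through each filter, in order."""
    shipped = []
    for record in source:          # INPUT: where the records come from
        for step in filters:       # FILTER: zero or more transformations
            record = step(record)
        shipped.append(record)     # OUTPUT: where the records end up
    return shipped


# Hypothetical event with one field we want to get rid of.
events = [{"log": "hello", "debug_only": "x"}]
result = run_pipeline(events, [add_field("env", "staging"), drop_field("debug_only")])
print(result)
```

The order of the filters matters, just like the order of the FILTER sections in a Fluent Bit config: each step sees the record as the previous step left it.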
Setting up the ECS Task Definition with Firelens
To set up Firelens and have it hook up a Fluent Bit sidecar to your container, you need to define the log configuration of your task and set up the Fluent Bit sidecar in your ECS Task Definition.
I manage my infrastructure using Terraform, so I'll show you how to do it using Terraform. If you are not using Terraform, you can probably figure out how to do it using the AWS CLI or the AWS Console - it is just a fairly standard ECS Task Definition.
resource "aws_ecs_task_definition" "some-task" {
requires_compatibilities = ["FARGATE"]
network_mode = "awsvpc"
container_definitions = <<TASK_DEFINITION
[
{
"image": "your-repo/firelens-fluent-bit-mezmo:latest", # the custom image we build later in this article
"name": "log_router",
"essential": true,
"firelensConfiguration": {
"type": "fluentbit",
"options": {
"enable-ecs-log-metadata": "false",
"config-file-type": "file",
"config-file-value": "/fluent-bit/etc/fluent-bit-custom.conf" # more on this later
}
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "firelens-container",
"awslogs-region": "us-east-1",
"awslogs-create-group": "true",
"awslogs-stream-prefix": "firelens"
}
},
"environment": [
{
"name": "LOGDNA_API_KEY",
"value": "${YOUR_LOGDNA_API_KEY}" # use SSM or something to fetch it
},
{
"name": "LOGDNA_TAGS",
"value": "${var.environment},fargate,my_app,some-tag"
}
]
},
{
"name": "my_app",
"image": "some-app-image",
"essential": true,
"dependsOn": [
{
"containerName": "log_router",
"condition": "START"
}
],
"logConfiguration": {
"logDriver": "awsfirelens"
},
...omitted for brevity...
}
]
TASK_DEFINITION
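}
This is the rough shape of the Task Definition. The important bits are the firelensConfiguration in the log_router container and the logConfiguration in the my_app container. The # annotations are only there for explanation: JSON does not allow comments, so strip them before you apply this.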
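If you want to sanity-check the wiring between the two containers before deploying, a small script along these lines can help. This is a sketch under assumptions: check_firelens_wiring is a hypothetical helper, and the two checks reflect the shape used in this article (one log router, and every awsfirelens container depending on it), not an official list of ECS validation rules.

```python
import json


def check_firelens_wiring(container_definitions: list) -> list:
    """Return a list of problems; an empty list means the wiring looks fine."""
    problems = []
    routers = [c["name"] for c in container_definitions if "firelensConfiguration" in c]
    if len(routers) != 1:
        problems.append(f"expected exactly one Firelens container, found {len(routers)}")
        return problems
    router = routers[0]
    for c in container_definitions:
        driver = c.get("logConfiguration", {}).get("logDriver")
        if driver != "awsfirelens":
            continue
        # In this article's setup, the app waits for the log router to start.
        depends = [d["containerName"] for d in c.get("dependsOn", [])]
        if router not in depends:
            problems.append(f"{c['name']} does not depend on {router}")
    return problems


# Trimmed-down copy of the container definitions, without the annotations.
definitions = json.loads("""
[
  {"name": "log_router",
   "firelensConfiguration": {"type": "fluentbit"},
   "logConfiguration": {"logDriver": "awslogs"}},
  {"name": "my_app",
   "dependsOn": [{"containerName": "log_router", "condition": "START"}],
   "logConfiguration": {"logDriver": "awsfirelens"}}
]
""")
problems = check_firelens_wiring(definitions)
print(problems or "wiring looks fine")
```

Running the definitions through json.loads first is also a cheap way to catch stray comments or unquoted values before ECS rejects them.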
About not setting options in the logDriver
Notice that the logConfiguration in the my_app container has no options. You might find examples elsewhere that do set them, something like this:
"logConfiguration": {
"logDriver": "awsfirelens",
"options": {
"name": "http",
"host": "mezmo",
"uri": "/v1/XYZ",
"header": "Authorization: SOME_KEY",
"tls": "on",
"port": "443",
"format": "json_lines"
}
}
If you do this, the options defined there will be directly read by the Firelens sidecar. However, it gets in the way when you want to set up a custom Fluent Bit configuration. In this article, we are intentionally not setting any options in the logConfiguration of the my_app container.
Instead, we will build a custom Fluent Bit image with its own Fluent Bit configuration.
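To picture what happens to those options, here is a rough Python sketch of how they could be rendered into a Fluent Bit OUTPUT section. render_output_section is a hypothetical function, and the Match pattern is an assumption about how Firelens tags records (container name followed by -firelens and the task ID). Treat the result as an approximation of the generated config, not a copy of it.

```python
def render_output_section(container_name: str, options: dict) -> str:
    """Render awsfirelens logConfiguration options as a Fluent Bit [OUTPUT] section."""
    lines = ["[OUTPUT]"]
    # "name" selects the output plugin, so it goes first.
    lines.append(f"    Name {options['name']}")
    # Assumption: records are tagged "<container name>-firelens-<task id>".
    lines.append(f"    Match {container_name}-firelens*")
    for key, value in options.items():
        if key != "name":
            lines.append(f"    {key} {value}")
    return "\n".join(lines)


# A subset of the options from the example above.
options = {
    "name": "http",
    "host": "mezmo",
    "port": "443",
    "tls": "on",
    "format": "json_lines",
}
section = render_output_section("my_app", options)
print(section)
```

Because that output section is generated for you, there is no obvious place to put your own FILTER steps, which is why this article goes the custom config route instead.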
Building a custom Fluent Bit image
AWS has a default Fluent Bit image (amazon/aws-for-fluent-bit) that you can use. However, we need to do some custom configuration, so we are going to build our own custom Fluent Bit image.
Note that there may exist a different way to do this (passing a custom configuration file to the sidecar, without having to build a custom image). But building a custom Fluent Bit image is probably the most transparent and straightforward way to do it, and easy to debug because you know exactly what is going on.
The Dockerfile for our custom Fluent Bit image to ship to Mezmo looks something like this:
FROM amazon/aws-for-fluent-bit:latest
# Install curl and jq for metadata API query
RUN yum install -y curl jq
# Copy custom configuration file (we have not written it yet)
COPY fluent-bit-custom.conf /fluent-bit/etc/
# Copy entrypoint script
COPY entrypoint.sh /entrypoint.sh
# Make entrypoint script executable
RUN chmod +x /entrypoint.sh
# Set entrypoint
ENTRYPOINT ["/entrypoint.sh"]
The entrypoint script is optional. It is a simple shell script that queries the ECS task metadata endpoint to grab the container ID so we can use it in our logs and know which logs come from which container.
Here it is:
#!/bin/sh
# Query ECS metadata API to get container ID
CONTAINER_ID=$(curl -s "${ECS_CONTAINER_METADATA_URI}/task" | jq -r '.Containers[] | select(.Name == "log_router") | .DockerId')
# Trim the container ID to the first 6 characters
SHORT_CONTAINER_ID=$(echo "$CONTAINER_ID" | cut -c 1-6)
# Export the trimmed container ID as an environment variable
export FARGATE_CONTAINER_ID="$SHORT_CONTAINER_ID"
# Run Fluent Bit
exec /fluent-bit/bin/fluent-bit -c /fluent-bit/etc/fluent-bit-custom.conf
That is all there is to it. You can now build the image and push it to your image repository.
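If the jq expression in the entrypoint script looks cryptic, the same selection can be sketched in Python. The sample response below is made up and heavily trimmed (a real task metadata response has many more fields), and short_container_id is a hypothetical helper, not something that ships with the image.

```python
import json


def short_container_id(task_metadata: dict, container_name: str, length: int = 6) -> str:
    """Return the first `length` characters of the named container's DockerId."""
    for container in task_metadata.get("Containers", []):
        if container.get("Name") == container_name:
            return container["DockerId"][:length]
    raise LookupError(f"no container named {container_name!r} in task metadata")


# Made-up, trimmed-down task metadata response with two containers.
sample = json.loads("""
{
  "Containers": [
    {"Name": "my_app", "DockerId": "aaaaaa1111112222"},
    {"Name": "log_router", "DockerId": "bbbbbb3333334444"}
  ]
}
""")
container_id = short_container_id(sample, "log_router")
print(container_id)
```

One difference from the shell version: when no container matches, jq quietly prints nothing and the variable ends up empty, while this sketch raises an error.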
Custom Fluent Bit configuration
By default, whatever your app writes to stdout and stderr ends up as a string under the log key of the event.
For example, if you log some JSON to stdout from your app, you will end up with this useless nonsense in Mezmo:
{
"container_id": "xyz",
..., # other metadata
"log": "{\n \"level\": \"info\",\n \"service\": \"some-service\",\n \"message\": \"Some error bla\"\n}"
}
Since your log is nested under log as a string, you cannot filter on the level or service fields in Mezmo. Your log stream won't even use the message as the log message; it will just print out the whole stringified JSON instead.
You could probably try to do some custom parsing in Mezmo using their Pipeline feature. But I found it to be difficult to work with. I also tried the old school custom parser on Mezmo, but it did not seem to be able to parse the nested JSON.
The easiest way to do the transformation and avoid scattering the mess all over different areas of your infra is to do it on the Fluent Bit side.
The custom configuration file for Fluent Bit looks something like this:
[SERVICE]
    # you need Parsers_File, otherwise it won't know how to parse your logs
    Parsers_File /fluent-bit/parsers/parsers.conf
    Flush 1
    Daemon Off
    Log_Level off
[INPUT]
    Name forward
    unix_path /var/run/fluent.sock
    Port 24224
[FILTER]
    # this filter lifts the nested object to the top level
    Name nest
    Match *
    Operation lift
    Nested_under log
[FILTER]
    Name parser
    Match *
    Key_name log
    Parser docker
    Reserve_data true
[OUTPUT]
    Name logdna
    Match *
    api_key ${LOGDNA_API_KEY}
    hostname /${ENV}/whatever
    # for some reason, I could not use the FARGATE_CONTAINER_ID as hostname, but it works as app
    app ${FARGATE_CONTAINER_ID}
    tags ${LOGDNA_TAGS}
The SERVICE section is for configuring Fluent Bit itself. Note that the Parsers_File option is important - it tells Fluent Bit where to find the parsers. You need this, otherwise it won't know how to parse your logs in your FILTER steps. Also note that comments sit on their own lines: a trailing # comment on a value line would be read as part of the value.
The INPUT section is for configuring the input to read events from. In this case, we are listening to a Unix socket that receives the logs from the awsfirelens log driver.
With our FILTER steps, we apply some transformations to the log events. We are lifting the nested log object to the top level such that the level, service and message fields are at the top level. We are then parsing the log using the docker parser.
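To see what the end result of the FILTER steps should be, here is the same transformation sketched in Python: parse the JSON string stored under log and merge its fields into the top level while keeping the other fields (the role Reserve_data plays). parse_log_field is a hypothetical helper that only mirrors the idea. Fluent Bit's own behavior may differ in the details, for instance when keys collide.

```python
import json


def parse_log_field(record: dict, key: str = "log") -> dict:
    """Replace a JSON string stored under `key` with its parsed fields."""
    raw = record.get(key)
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        return record                  # not JSON: leave the record alone
    if not isinstance(parsed, dict):
        return record                  # JSON, but not an object: leave it alone
    rest = {k: v for k, v in record.items() if k != key}
    return {**rest, **parsed}          # keep the other fields, add the parsed ones


record = {
    "container_id": "xyz",
    "log": '{"level": "info", "service": "some-service", "message": "Some error bla"}',
}
flat = parse_log_field(record)
print(flat)
```

After this step level, service and message are regular top-level fields, which is the shape Mezmo needs in order to filter on them.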
The OUTPUT section is where we tell Fluent Bit where to ship the logs. We are using the logdna output. Its options are documented in the Fluent Bit docs for the LogDNA output plugin.
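You need to set the api_key and tags options. You can get these values from environment variables (if you remember, we set these environment variables in our ECS Task Definition). Note that hostname also references ${ENV}, which the Task Definition above does not set, so add it to the environment list or hardcode the value.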
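Since the config leans on ${VAR} substitution, a quick way to catch a forgotten variable is to compare the names referenced in the config with the ones you actually provide. The sketch below does that in Python. missing_variables is a hypothetical helper and task_environment is a stand-in for whatever your task definition sets.

```python
import re

# Matches ${NAME} references in the config text.
VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_variables(config_text: str, environment: dict) -> list:
    """Return the ${VAR} names used in the config that the environment lacks."""
    missing = []
    for name in VAR_PATTERN.findall(config_text):
        if name not in environment and name not in missing:
            missing.append(name)
    return missing


config = """
[OUTPUT]
    Name logdna
    Match *
    api_key ${LOGDNA_API_KEY}
    hostname /${ENV}/whatever
    app ${FARGATE_CONTAINER_ID}
    tags ${LOGDNA_TAGS}
"""
# Stand-in for the variables set in the task definition.
task_environment = {"LOGDNA_API_KEY": "secret", "LOGDNA_TAGS": "fargate,my_app"}
missing = missing_variables(config, task_environment)
print(missing)
```

Of the names it reports here, FARGATE_CONTAINER_ID is exported by the entrypoint script at startup, so only the rest needs your attention. In a real check you would read the config from the file and pass in dict(os.environ).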
Putting it all together
Now that we have all the pieces, we can put them together.
Build your custom Fluent Bit image and push it to your image repository
Write the entrypoint script:
#!/bin/sh
# Query ECS metadata API to get container ID
CONTAINER_ID=$(curl -s "${ECS_CONTAINER_METADATA_URI}/task" | jq -r '.Containers[] | select(.Name == "log_router") | .DockerId')
# Trim the container ID to the first 6 characters
SHORT_CONTAINER_ID=$(echo "$CONTAINER_ID" | cut -c 1-6)
# Export the trimmed container ID as an environment variable
export FARGATE_CONTAINER_ID="$SHORT_CONTAINER_ID"
# Run Fluent Bit
exec /fluent-bit/bin/fluent-bit -c /fluent-bit/etc/fluent-bit-custom.conf
Prepare your configuration:
[SERVICE]
    Parsers_File /fluent-bit/parsers/parsers.conf
    Flush 1
    Daemon Off
    Log_Level off
[INPUT]
    Name forward
    unix_path /var/run/fluent.sock
    Port 24224
[FILTER]
    Name nest
    Match *
    Operation lift
    Nested_under log
[FILTER]
    Name parser
    Match *
    Key_name log
    Parser docker
    Reserve_data true
[OUTPUT]
    Name logdna
    Match *
    api_key ${LOGDNA_API_KEY}
    hostname /${ENV}/whatever
    app ${FARGATE_CONTAINER_ID}
    tags ${LOGDNA_TAGS}
Write the Dockerfile:
FROM amazon/aws-for-fluent-bit:latest
# Install curl and jq for metadata API query
RUN yum install -y curl jq
# Copy custom configuration file
COPY fluent-bit-custom.conf /fluent-bit/etc/
# Copy entrypoint script
COPY entrypoint.sh /entrypoint.sh
# Make entrypoint script executable
RUN chmod +x /entrypoint.sh
# Set entrypoint
ENTRYPOINT ["/entrypoint.sh"]
Build the image and push it to your image repository:
docker build -t your-repo/firelens-fluent-bit-mezmo:latest .
docker push your-repo/firelens-fluent-bit-mezmo:latest
Set up your ECS Task Definition
Write your ECS Task Definition:
resource "aws_ecs_task_definition" "some-task" {
requires_compatibilities = ["FARGATE"]
network_mode = "awsvpc"
container_definitions = <<TASK_DEFINITION
[
{
"image": "your-repo/firelens-fluent-bit-mezmo:latest",
"name": "log_router",
"essential": true,
"firelensConfiguration": {
"type": "fluentbit",
"options": {
"enable-ecs-log-metadata": "false",
"config-file-type": "file",
"config-file-value": "/fluent-bit/etc/fluent-bit-custom.conf"
}
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "firelens-container",
"awslogs-region": "us-east-1",
"awslogs-create-group": "true",
"awslogs-stream-prefix": "firelens"
}
},
"environment": [
{
"name": "LOGDNA_API_KEY",
"value": "${YOUR_LOGDNA_API_KEY}" # use SSM or something to fetch it
},
{
"name": "LOGDNA_TAGS",
"value": "${var.environment},fargate,my_app,some-tag"
}
]
},
{
"name": "my_app",
"image": "some-app-image",
"essential": true,
"dependsOn": [
{
"containerName": "log_router",
"condition": "START"
}
],
"logConfiguration": {
"logDriver": "awsfirelens"
},
...omitted for brevity...
}
]
TASK_DEFINITION
}
There you have it. You can now deploy your task and see the logs flowing into Mezmo and save a lot of money.