Shipping ECS logs to Mezmo (LogDNA) with AWS Firelens and Fluent Bit

October 7, 2023

CloudWatch Logs is expensive

There are a trillion different ways you can go about shipping logs from your ECS containers to Mezmo. One of the ways often suggested is to hook up your containers to CloudWatch Logs using the awslogs log driver, then have a Lambda subscribe to those log groups and pass the logs on to Mezmo (formerly known as LogDNA).

This might be the most straightforward way to go about shipping your logs to Mezmo, but it gets expensive pretty quickly. You pay for Lambda invocations, CloudWatch Logs and probably for egress. We ended up blowing about $1000 on this setup last month.

If you start looking around for solutions, you will quickly figure out that if you are using ECS Fargate, you don't have access to the Docker daemon running on the host. Setting up a sidecar running Logspout or something similar is not an option because it won't be able to access the Docker daemon.

In this article, I'll show you how to ship logs to Mezmo without CloudWatch Logs, using AWS Firelens and Fluent Bit. This will save you a lot of money that you could spend on more useful things, such as a new Bugatti or a Bored Ape NFT.

The steps to shipping logs to Mezmo without CloudWatch Logs

In this section, I'll give you a high-level overview of the steps involved in shipping logs to Mezmo without CloudWatch Logs.

If you don't care about the details, you can skip to the next section where I'll show you how to put everything together in a more compressed form.

Routing logs with AWS Firelens

AWS Firelens is a log router for Amazon ECS, and it works on both the EC2 and Fargate launch types. What Firelens does is route your logs from one container to another. It does not cost anything to use Firelens itself, but you do pay for the resources used by the Firelens container that you set up.

Firelens can run either Fluent Bit or Fluentd as the log router. In this article, we will be using Fluent Bit.

The basic schtick is something along these lines:

  1. Set up your app container in your ECS Task Definition and set its log driver to awsfirelens.
  2. Set up a second container running Fluent Bit in the same ECS Task Definition.

Then what happens is that the stdout and stderr output from your app container is routed to the Fluent Bit container by AWS Firelens. Fluent Bit then does some processing on the logs and ships them to Mezmo.

Fluent Bit

Fluent Bit is a log processing and forwarding agent. It takes in log events from some source, then runs those events through a pipeline (that you define) and outputs them to some destination.

Fluent Bit configs look something like this (I'll give you the config for shipping logs to Mezmo later in the article):
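Here is a minimal sketch of that general shape, not the Mezmo config. It assumes the classic Fluent Bit config format and uses the built-in dummy input, modify filter and stdout output purely for illustration; the tag and the added field are made up.

```ini
[SERVICE]
    # Settings for the Fluent Bit agent itself.
    Flush     1
    Log_Level info

[INPUT]
    # Where events come from. "dummy" just generates sample events.
    Name dummy
    Tag  app.demo

[FILTER]
    # An optional pipeline step. This one adds a field to every matching event.
    Name  modify
    Match app.*
    Add   environment demo

[OUTPUT]
    # Where events go. "stdout" prints them to the console.
    Name  stdout
    Match *
```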

The config is divided into sections, each section having a name in square brackets. The sections are [SERVICE], [INPUT], [FILTER] and [OUTPUT].

The [SERVICE] section is for configuring Fluent Bit itself - for example, the log level of the Fluent Bit agent's own logs and the flush interval.

The [INPUT] section is for configuring where the logs should be read from. In our actual config, we will be reading from a socket that receives our logs.

The [FILTER] blocks are optional. Each [FILTER] is a step in the pipeline that transforms the log in some way. You can rename fields, add fields, remove fields, parse the log, etc.

The [OUTPUT] section is for configuring the output plugin. For example, the logdna plugin is for shipping logs to Mezmo.

Setting up the ECS Task Definition with Firelens

To set up Firelens and have it hook up a Fluent Bit sidecar to your container, you need to define the log configuration of your task and set up the Fluent Bit sidecar in your ECS Task Definition.

I manage my infrastructure using Terraform, so I'll show you how to do it using Terraform. If you are not using Terraform, you can probably figure out how to do it using the AWS CLI or the AWS Console - it is just a fairly standard ECS Task Definition.

This is the rough shape of the Task Definition. The important bits are the logConfiguration in the app container and the firelensConfiguration in the Fluent Bit container.
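A minimal Terraform sketch of that shape, assuming Fargate. The variable names, container names, CPU and memory values, and the MEZMO_INGESTION_KEY and MEZMO_APP environment variable names are all placeholders I picked for illustration, so swap in your own.

```hcl
# Hypothetical inputs; names and values are placeholders.
variable "app_image" { type = string }
variable "fluent_bit_image" { type = string }
variable "execution_role_arn" { type = string }
variable "mezmo_ingestion_key" {
  type      = string
  sensitive = true
}

resource "aws_ecs_task_definition" "app" {
  family                   = "app"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512
  memory                   = 1024
  execution_role_arn       = var.execution_role_arn

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = var.app_image
      essential = true

      # Route stdout/stderr through Firelens. No options here, on purpose.
      logConfiguration = {
        logDriver = "awsfirelens"
      }
    },
    {
      name      = "log-router"
      image     = var.fluent_bit_image
      essential = true

      # Marks this container as the Firelens log router.
      firelensConfiguration = {
        type = "fluentbit"
      }

      # Read by the Fluent Bit config inside the custom image.
      environment = [
        { name = "MEZMO_INGESTION_KEY", value = var.mezmo_ingestion_key },
        { name = "MEZMO_APP", value = "app" }
      ]
    }
  ])
}
```

Passing the key as a plain environment variable keeps the sketch short. The container definition also supports a secrets block if you would rather pull it from Secrets Manager or SSM.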

About not setting options in the logDriver

If you read the documentation on Firelens, you will probably find examples where the logConfiguration in the app container has options.

For example, you might find some example that looks like this:
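As an illustration of that style, here is a sketch written as a Terraform local, modelled on the kind of CloudWatch example that shows up in Firelens documentation. The local name and all option values are placeholders.

```hcl
locals {
  # The output plugin is configured through the log driver options
  # instead of a Fluent Bit config file. Values are placeholders.
  app_log_configuration_with_options = {
    logDriver = "awsfirelens"
    options = {
      Name              = "cloudwatch"
      region            = "us-east-1"
      log_group_name    = "firelens-example"
      log_stream_prefix = "from-fluent-bit"
      auto_create_group = "true"
    }
  }
}
```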

If you do this, the options defined there are picked up by the Firelens sidecar and used as its output configuration. However, this gets in the way when you want to set up a custom Fluent Bit configuration. In this article, we are intentionally not setting any options in the logConfiguration of the app container.

Instead, we will build a custom Fluent Bit image with its own Fluent Bit configuration.

Building a custom Fluent Bit image

AWS has a default Fluent Bit image (amazon/aws-for-fluent-bit) that you can use. However, we need to do some custom configuration, so we are going to build our own custom Fluent Bit image.

Note that there may exist a different way to do this (passing a custom configuration file to the sidecar, without having to build a custom image). But building a custom Fluent Bit image is probably the most transparent and straightforward way to do it, and easy to debug because you know exactly what is going on.

The Dockerfile for our custom Fluent Bit image to ship to Mezmo looks something like this:
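A minimal sketch, assuming the AWS base image ships a shell that can run the entrypoint script. The file names fluent-bit.conf and entrypoint.sh and the destination paths are hypothetical. I copy the config to its own path on the assumption that Firelens manages /fluent-bit/etc/fluent-bit.conf itself.

```dockerfile
# Base image mentioned above. Pin a specific tag in practice.
FROM amazon/aws-for-fluent-bit:latest

# Hypothetical file names and paths.
COPY fluent-bit.conf /fluent-bit/etc/mezmo.conf
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

# Start Fluent Bit through our own script instead of the image default.
ENTRYPOINT ["/entrypoint.sh"]
```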

The entrypoint script is optional. It is a simple bash script that fetches the ECS metadata to grab the container ID so we can use it in our logs and know which logs come from which container.

Here it is:
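A rough sketch of such a script, under a few assumptions: curl is available in the image, the task exposes the ECS task metadata endpoint v4 through ECS_CONTAINER_METADATA_URI_V4, and the binary and config paths match your image. The CONTAINER_ID variable name and both paths are my own placeholders.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical paths; adjust to match your image.
FLUENT_BIT_BIN="${FLUENT_BIT_BIN:-/fluent-bit/bin/fluent-bit}"
FLUENT_BIT_CONF="${FLUENT_BIT_CONF:-/fluent-bit/etc/mezmo.conf}"

# Print the first "DockerId" value found in the JSON read from stdin.
extract_docker_id() {
  sed -n 's/.*"DockerId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Outside ECS (or if the lookup fails) we fall back to "unknown".
CONTAINER_ID="unknown"
if [ -n "${ECS_CONTAINER_METADATA_URI_V4:-}" ]; then
  fetched="$(curl -fsS --max-time 2 "$ECS_CONTAINER_METADATA_URI_V4" | extract_docker_id || true)"
  if [ -n "$fetched" ]; then
    CONTAINER_ID="$fetched"
  fi
fi
# Exported so the Fluent Bit config can reference it as ${CONTAINER_ID}.
export CONTAINER_ID

# Hand over to Fluent Bit when the binary is present.
if [ -x "$FLUENT_BIT_BIN" ]; then
  exec "$FLUENT_BIT_BIN" -c "$FLUENT_BIT_CONF"
fi
echo "fluent-bit not found at $FLUENT_BIT_BIN" >&2
```

The sed-based extraction avoids depending on jq. If your image has jq, reading the DockerId field with it is more robust.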

That is all there is to it. You can now build the image and push it to your image repository.

Custom Fluent Bit configuration

By default, whatever your app writes to stdout and stderr arrives as a plain string nested under a log field.

For example, say you log some JSON to stdout from your app, you will end up with this useless nonsense in Mezmo:
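An illustrative record, not real output: the field names inside the JSON (level, message, userId) and the container name are made up. The point is that the JSON your app printed ends up as one escaped string under log.

```json
{
  "container_name": "app",
  "source": "stdout",
  "log": "{\"level\":\"info\",\"message\":\"user signed in\",\"userId\":42}"
}
```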

Since your log is nested under log as a string, you cannot filter on the individual fields of your JSON in Mezmo. Your log stream won't even use your message field as the log message; it will just print out the whole stringified JSON as the log message.

You could probably try to do some custom parsing in Mezmo using their feature. But I found it to be difficult to work with. I also tried the old school custom parser on Mezmo, but it did not seem to be able to parse the nested JSON.

The easiest way to do the transformation and avoid scattering the mess all over different areas of your infra is to do it on the Fluent Bit side.

The custom configuration file for Fluent Bit looks something like this:
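A minimal sketch of such a config, with assumptions called out. The parsers file path is where I expect the stock parsers to live in the AWS image, the socket path is the one Firelens is expected to deliver logs on, and the MEZMO_INGESTION_KEY and MEZMO_APP environment variable names are placeholders. It uses a single parser filter with the stock json parser to unpack the log string, which is a simplification of the filter steps described in this section.

```ini
[SERVICE]
    Flush        1
    Log_Level    info
    # Assumed location of the stock parsers file in the AWS image.
    Parsers_File /fluent-bit/parsers/parsers.conf

[INPUT]
    # Container logs arrive over the forward protocol on this socket.
    Name      forward
    Unix_Path /var/run/fluent.sock

[FILTER]
    # Parse the stringified JSON under "log" and keep the other keys.
    Name         parser
    Match        *
    Key_Name     log
    Parser       json
    Reserve_Data On

[OUTPUT]
    # Ship everything to Mezmo via the logdna output plugin.
    Name    logdna
    Match   *
    api_key ${MEZMO_INGESTION_KEY}
    app     ${MEZMO_APP}
```

Log lines that are not valid JSON should pass through this filter unchanged, so plain-text output from your app still reaches Mezmo.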

The [SERVICE] section is for configuring Fluent Bit itself. Note that the Parsers_File option is important - it tells Fluent Bit where to find the parsers. You need this, otherwise it won't know how to parse your logs in your [FILTER] steps.

The [INPUT] section is for configuring the input to read events from. In this case, we are listening to a Unix socket that receives the logs from the awsfirelens log driver.

With our [FILTER] steps, we apply some transformations to the log events. We lift the nested object to the top level so that its fields sit at the top level of the record. We then parse the log using a parser from the parsers file.

The [OUTPUT] section is where we tell Fluent Bit where to ship the logs. We are using the logdna output. Its options are documented in the Fluent Bit documentation.

You need to set the plugin's required options, such as api_key. You can get these values from environment variables (if you remember, we set these environment variables in our ECS Task Definition).

Putting it all together

Now that we have all the pieces, we can put them together.

Build your custom Fluent Bit image and push it to your image repository

Write the entrypoint script:

Prepare your configuration:

Write the Dockerfile:

Build the image and push it to your image repository:
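A sketch of the build and push step, assuming your image repository is Amazon ECR and that docker and the AWS CLI are installed and authenticated. The account ID, region, repository name and tag in the example call are placeholders.

```shell
#!/usr/bin/env bash
set -eu

# Build the ECR image URI from account ID, region, repository and tag.
image_uri() {
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Build the image, log in to ECR and push it.
# Arguments: account ID, region, repository name, tag.
build_and_push() {
  account_id="$1"
  region="$2"
  uri="$(image_uri "$1" "$2" "$3" "$4")"

  docker build -t "$uri" .
  aws ecr get-login-password --region "$region" \
    | docker login --username AWS --password-stdin "${account_id}.dkr.ecr.${region}.amazonaws.com"
  docker push "$uri"
}

# Example call (not run here):
#   build_and_push 123456789012 us-east-1 fluent-bit-mezmo latest
```

If you use a different registry, only the login step and the URI format change.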

Set up your ECS Task Definition

Write your ECS Task Definition:

There you have it. You can now deploy your task and see the logs flowing into Mezmo and save a lot of money.