fluent-plugin-gcs

A Fluentd output plugin that buffers events and uploads them to Google Cloud Storage.

Features

  • Multiple formats: store objects as gzip, lzo, xz (lzma2), zstd, JSON, or plain text.
  • Fast compression: optionally shell out to the external gzip binary, with automatic fallback to the pure-Ruby compressor.
  • Flexible object keys: build paths from time slices, tags, hostnames, random tokens, and UUIDs.
  • Server-side controls: set ACLs, storage class, customer-supplied encryption keys, and custom object metadata.
  • Flexible auth: explicit credentials or Application Default Credentials on GCE / GKE / Cloud Run.

Requirements

| fluent-plugin-gcs | fluentd | ruby |
| --- | --- | --- |
| >= 0.5.0 | >= 1.0 | >= 3.3 |

Installation

gem install fluent-plugin-gcs

Using td-agent / fluent-package:

fluent-gem install fluent-plugin-gcs

Quick start

The minimal configuration needs only a bucket. On GCE, GKE, or Cloud Run the credentials are picked up automatically from the environment.

<match your.tag>
  @type gcs

  bucket YOUR_GCS_BUCKET_NAME
  path logs/

  <buffer time>
    @type file
    path /var/log/fluent/gcs
    timekey 1h
    timekey_wait 10m
    timekey_use_utc true
  </buffer>
</match>

This writes gzip-compressed objects such as logs/2024010112_0.gz, one or more per hourly time slice (%{index} increments when a slice produces several chunks).

Configuration

Authentication

Provide credentials explicitly, or rely on Application Default Credentials when running on Google Cloud.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| project | string | nil | GCS project identifier |
| keyfile | string | nil | Path to a service account credentials JSON file |
| credentials_json | hash | nil | Service account credentials inline as JSON. Takes precedence over keyfile |
| client_retries | integer | nil | Number of retries on server error |
| client_timeout | integer | nil | Request timeout in seconds |

project is resolved in the following order: the project option, then the STORAGE_PROJECT / GOOGLE_CLOUD_PROJECT / GCLOUD_PROJECT environment variables, then GCE metadata.

keyfile is resolved in the following order: the keyfile option, the GOOGLE_CLOUD_KEYFILE / GCLOUD_KEYFILE (path) or GOOGLE_CLOUD_KEYFILE_JSON / GCLOUD_KEYFILE_JSON (inline) environment variables, the Cloud SDK's well-known path, then GCE metadata.
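The lookup order for project can be sketched in a few lines of Ruby. This is only an illustration: resolve_project and PROJECT_ENV_VARS are hypothetical names, the environment variables are assumed to be tried in the order listed above, and the final GCE metadata step is left out because it needs a network call.

```ruby
# Hypothetical helper that mirrors the documented lookup order for `project`.
# Assumption: the environment variables are tried in the order they are listed.
PROJECT_ENV_VARS = %w[STORAGE_PROJECT GOOGLE_CLOUD_PROJECT GCLOUD_PROJECT].freeze

def resolve_project(option, env = ENV)
  # 1. An explicit `project` option wins.
  return option unless option.nil? || option.empty?

  # 2. Otherwise take the first environment variable that is set and non-empty.
  PROJECT_ENV_VARS.each do |name|
    value = env[name]
    return value unless value.nil? || value.empty?
  end

  # 3. Nothing found here; the real client would query GCE metadata next.
  nil
end

puts resolve_project("from-config", {}).inspect
puts resolve_project(nil, { "GCLOUD_PROJECT" => "from-env" }).inspect
puts resolve_project(nil, {}).inspect
```

Passing the environment as an argument keeps the sketch easy to try without touching the real ENV.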

Object placement

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| bucket | string | | Required. GCS bucket name |
| path | string | "" | Path prefix for objects |
| object_key_format | string | %{path}%{time_slice}_%{index}.%{file_extension} | Template for object keys. See Object key format |
| hex_random_length | integer | 4 | Length of the %{hex_random} placeholder (max 32) |
| overwrite | bool | false | Overwrite the existing object instead of incrementing %{index} |
| blind_write | bool | false | Skip the existence check before writing (see below) |

Avoiding key collisions. When object_key_format contains %{index} (the default), the plugin checks GCS for an existing object and increments %{index} until it finds an unused key, so existing objects are never overwritten. This existence check requires the storage.objects.get permission.

blind_write skips that existence check, so the storage.objects.get permission is no longer needed. The trade-off is that %{index} stops working (it always stays 0), so you must keep keys unique another way: include %{hex_random} (unique per chunk) or %{uuid_flush} (unique per flush) in object_key_format.

[!WARNING] If a key collides with an existing object (which can happen with blind_write true, or with overwrite true), uploading it overwrites the existing object, and GCS requires the storage.objects.delete permission to do so. Without that permission the flush fails repeatedly and the buffer chunk is eventually lost. With blind_write true, include %{hex_random} or %{uuid_flush} in object_key_format to avoid collisions.
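The difference between the default index search and blind_write can be illustrated with a small simulation. This is a sketch, not the plugin's code: next_object_key is a hypothetical helper, and an in-memory Set stands in for the bucket, so existing.include? plays the role of the existence check that needs storage.objects.get.

```ruby
require "set"

# Hypothetical helper: picks an object key the way the text describes.
# `existing` stands in for the keys already stored in the bucket.
def next_object_key(existing, path:, time_slice:, ext:, blind_write: false)
  index = 0
  loop do
    key = "#{path}#{time_slice}_#{index}.#{ext}"
    # With blind_write the lookup is skipped, so the index never moves past 0.
    return key if blind_write || !existing.include?(key)

    index += 1
  end
end

bucket = Set["logs/2024010112_0.gz", "logs/2024010112_1.gz"]

# Default behaviour: skips the two taken keys.
puts next_object_key(bucket, path: "logs/", time_slice: "2024010112", ext: "gz")
# blind_write: returns index 0 even though that key already exists.
puts next_object_key(bucket, path: "logs/", time_slice: "2024010112", ext: "gz", blind_write: true)
```

The second call returns a key that is already taken, which is the collision the warning describes.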

Format and compression

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| store_as | enum | gzip | Object format. See the table below |
| command_parameter | string | (per format) | Override the default arguments for the compression command (gzip_command / lzo / lzma2 / zstd) |
| transcoding | bool | false | Enable decompressive transcoding (gzip only) |

| store_as | Compression | Requires | Default args | Extension | content_type |
| --- | --- | --- | --- | --- | --- |
| gzip | Ruby's built-in Zlib::GzipWriter | | (none) | gz | application/gzip |
| gzip_command | External gzip. Faster for large chunks, falls back to Zlib::GzipWriter on failure | gzip command | (none) | gz | application/gzip |
| lzo | External lzop | lzop command | -qf1 | lzo | application/x-lzop |
| lzma2 | External xz | xz command | -qf0 | xz | application/x-xz |
| zstd | External zstd | zstd command | (none) | zst | application/x-zst |
| json | None (upload as JSON) | | (none) | json | application/json |
| text | None (upload as text) | | (none) | txt | text/plain |

The command-based formats (gzip_command, lzo, lzma2, zstd) stream the chunk through the command's stdin (no intermediate temp file). Each has a sensible default argument set; override it with command_parameter. Multiple arguments are separated by spaces; the value is parsed with shellsplit, so it is not evaluated by a shell:

store_as gzip_command
command_parameter -1             # single argument

store_as zstd
command_parameter -19 --long     # multiple arguments, split on spaces

Quote a value that itself contains a space, the same way you would in a shell (command_parameter -o "with space").
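To see how such a value is split, Ruby's standard Shellwords module can be tried directly. The wrapper name parse_command_parameter is hypothetical and only used for illustration; the splitting itself is done by Shellwords.shellsplit from the standard library.

```ruby
require "shellwords"

# Hypothetical wrapper: turns a command_parameter string into an argument list.
def parse_command_parameter(value)
  Shellwords.shellsplit(value)
end

p parse_command_parameter("-1")               # one argument
p parse_command_parameter("-19 --long")       # split on whitespace
p parse_command_parameter('-o "with space"')  # quotes keep the space inside one argument
p parse_command_parameter('-9 $HOME')         # no shell runs, so $HOME stays literal
```

Because the result is an argument array rather than a shell command line, variables, globs, and pipes in the value are passed through as plain text.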

gzip_command falls back to Zlib::GzipWriter if the gzip command fails. lzo / lzma2 / zstd have no fallback, so the command must be installed (checked at startup), and they are not compatible with transcoding, which is gzip-specific.
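The fallback idea for gzip_command can be sketched as follows. This is an assumption-laden illustration rather than the plugin's implementation: gzip_with_fallback is a hypothetical helper that pipes data through the external gzip via stdin and switches to Zlib::GzipWriter if the command is missing or exits with an error.

```ruby
require "open3"
require "zlib"
require "stringio"

# Hypothetical helper, not the plugin's code: compress with the external gzip
# binary and fall back to Zlib::GzipWriter if the command is missing or fails.
def gzip_with_fallback(data, args = [])
  # Feed the data through gzip's stdin and collect stdout; no temp file is used.
  out, status = Open3.capture2("gzip", "-c", *args, stdin_data: data, binmode: true)
  return out if status.success?

  raise "gzip exited with status #{status.exitstatus}"
rescue StandardError
  # Pure-Ruby path: used when gzip is not installed or returned an error.
  io = StringIO.new("".b)
  gz = Zlib::GzipWriter.new(io)
  gz.write(data)
  gz.finish
  io.string
end

compressed = gzip_with_fallback("hello\n" * 1000, ["-1"])
restored = Zlib::GzipReader.new(StringIO.new(compressed)).read
puts restored.bytesize
```

Either path produces a valid gzip stream, so the round trip through Zlib::GzipReader works whether or not the gzip binary is installed.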

[!NOTE] gzip_command_parameter is a deprecated alias of command_parameter, kept for backward compatibility with v0.4.x configs. New configs should use command_parameter.

The per-line format is configured with a <format> section (default out_file):

<format>
  @type json
</format>

See the Formatter documentation for available types (out_file, json, ltsv, single_value, ...).

GCS object settings

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| auto_create_bucket | bool | true | Create the bucket if it does not exist |
| acl | enum | nil | Predefined ACL for uploaded objects (see below) |
| storage_class | enum | nil | Storage class for uploaded objects (see below) |
| encryption_key | string | nil | Customer-supplied AES-256 key for server-side encryption |

acl accepts one of auth_read, owner_full, owner_read, private, project_private, public_read. Defaults to the bucket's default object ACL. See the access control documentation.

storage_class accepts one of dra, nearline, coldline, multi_regional, regional, standard. See the storage classes documentation.

encryption_key enables customer-supplied encryption; the encryption_key_sha256 is computed automatically.
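For reference, the hash that accompanies a customer-supplied key can be reproduced by hand. The sketch below assumes a 32-byte raw AES-256 key and that the expected value is the Base64-encoded SHA-256 digest of those bytes. The helper name encryption_key_sha256 is hypothetical, and how the plugin itself encodes the key is not covered by this document.

```ruby
require "digest"
require "securerandom"

# Hypothetical helper: Base64-encoded SHA-256 digest of a raw key.
# Assumption: `key` holds the raw 32 key bytes, not a Base64 string.
def encryption_key_sha256(key)
  [Digest::SHA256.digest(key)].pack("m0") # "m0" = Base64 without line breaks
end

key = SecureRandom.random_bytes(32) # throwaway AES-256 key for the demo
puts key.bytesize
puts encryption_key_sha256(key)
```

With the plugin you only supply encryption_key; the sketch is useful mainly for checking a key against a hash shown elsewhere.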

Object key format

object_key_format supports the following placeholders:

| Placeholder | Description |
| --- | --- |
| %{path} | The value of the path option |
| %{time_slice} | Time slice text derived from the <buffer> timekey |
| %{index} | Sequential number (from 0) within the same time slice |
| %{file_extension} | Inferred from store_as (gz / lzo / xz / zst / json / txt) |
| %{uuid_flush} | A UUID generated on every buffer flush |
| %{hex_random} | A random hex string per chunk, length set by hex_random_length |
| %{hostname} | The hostname of the running server |
The default is %{path}%{time_slice}_%{index}.%{file_extension}.
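Placeholder expansion can be pictured as a simple template substitution. The following is a minimal sketch under stated assumptions: expand_object_key is a hypothetical helper, the sample values are made up, and the plugin's real substitution logic may differ.

```ruby
require "securerandom"
require "socket"

# Hypothetical expansion of object_key_format: replaces each %{name}
# with the matching entry from `values` and rejects unknown names.
def expand_object_key(format, values)
  format.gsub(/%\{(\w+)\}/) do
    name = Regexp.last_match(1)
    values.fetch(name) { raise KeyError, "unknown placeholder %{#{name}}" }
  end
end

values = {
  "path" => "logs/",
  "time_slice" => "2024010112",
  "index" => "0",
  "file_extension" => "gz",
  "uuid_flush" => SecureRandom.uuid,
  "hex_random" => SecureRandom.hex(2), # 2 random bytes give 4 hex characters
  "hostname" => Socket.gethostname
}

puts expand_object_key("%{path}%{time_slice}_%{index}.%{file_extension}", values)
puts expand_object_key("%{path}%{time_slice}_%{hex_random}.%{file_extension}", values)
```

The first line prints the default-format key logs/2024010112_0.gz; the second varies on every run because of the random suffix.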

Object metadata

Attach arbitrary x-goog-meta-* headers to uploaded objects with one or more <object_metadata> sections:

<object_metadata>
  key KEY_1
  value VALUE_1
</object_metadata>

<object_metadata>
  key KEY_2
  value VALUE_2
</object_metadata>

Examples

Partition by tag and date

<match app.**>
  @type gcs

  project YOUR_PROJECT
  bucket YOUR_GCS_BUCKET_NAME
  object_key_format %{path}%{time_slice}/%{hostname}_%{index}.%{file_extension}
  path logs/${tag}/

  <buffer tag,time>
    @type file
    path /var/log/fluent/gcs
    timekey 1d
    timekey_wait 10m
    timekey_use_utc true
  </buffer>

  <format>
    @type json
  </format>
</match>

For the tag app.web on host web1, this writes objects such as logs/app.web/20240101/web1_0.gz.

Fine-grained 1-minute partitions

When timekey is under an hour, %{time_slice} automatically resolves to minute granularity (%Y%m%d%H%M).

<match app.**>
  @type gcs

  bucket YOUR_GCS_BUCKET_NAME
  path logs/

  <buffer time>
    @type file
    path /var/log/fluent/gcs
    timekey 1m          # 1 minute partition
    timekey_wait 10s    # short wait for late events
    timekey_use_utc true
  </buffer>
</match>

This writes objects such as logs/202401011230_0.gz, one (or more) per minute.
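The relationship between timekey and the %{time_slice} text in the examples of this document can be sketched as a small lookup. This is an assumption based on the object names shown here (minute, hour, and day granularity); time_slice_format is a hypothetical helper, and sub-minute timekeys are not covered.

```ruby
# Hypothetical mapping from timekey (in seconds) to a strftime pattern,
# inferred from the example object names in this document.
def time_slice_format(timekey_seconds)
  if timekey_seconds < 3600
    "%Y%m%d%H%M"   # under an hour: minute granularity
  elsif timekey_seconds < 86_400
    "%Y%m%d%H"     # under a day: hour granularity
  else
    "%Y%m%d"       # a day or more: day granularity
  end
end

t = Time.utc(2024, 1, 1, 12, 30)
[60, 3600, 86_400].each do |timekey|
  puts "timekey #{timekey}s -> #{t.strftime(time_slice_format(timekey))}"
end
```

For 12:30 UTC on 1 January 2024 this yields 202401011230, 2024010112, and 20240101, matching the object names used in the examples.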

Fast compression with the external gzip

<match app.**>
  @type gcs

  bucket YOUR_GCS_BUCKET_NAME
  path logs/
  store_as gzip_command
  command_parameter -1

  <buffer time>
    @type file
    path /var/log/fluent/gcs
    timekey 1h
    timekey_wait 10m
  </buffer>
</match>

Using the default object_key_format, this writes objects such as logs/2024010112_0.gz, one per hourly slice.

Cost-optimized cold storage

<match archive.**>
  @type gcs

  bucket YOUR_GCS_BUCKET_NAME
  path archive/
  storage_class coldline
  acl project_private

  <buffer time>
    @type file
    path /var/log/fluent/gcs-archive
    timekey 1d
    timekey_wait 1h
  </buffer>
</match>

Using the default object_key_format, this writes objects such as archive/20240101_0.gz, one per day, stored in the Coldline class.

Write without the get permission (blind_write)

blind_write true skips the existence check, so the storage.objects.get permission is not required. Because %{index} does not work in this mode, include %{hex_random} or %{uuid_flush} to keep keys unique.

<match app.**>
  @type gcs

  bucket YOUR_GCS_BUCKET_NAME
  path logs/
  object_key_format %{path}%{time_slice}_%{hex_random}.%{file_extension}
  blind_write true

  <buffer time>
    @type file
    path /var/log/fluent/gcs
    timekey 1h
    timekey_wait 10m
    timekey_use_utc true
  </buffer>
</match>

This writes objects such as logs/2024010112_a1b2.gz, with a per-chunk random suffix instead of an incrementing index.

Development

bundle install
bundle exec rake test                       # run the test suite
bundle exec bundler-audit check --update    # audit dependencies
gem build fluent-plugin-gcs.gemspec         # build the gem

Author

Daichi HIRATA

License

Apache License 2.0. See LICENSE.txt.