fluent-plugin-uri-parser
Fluentd filter plugins that decompose URIs and query strings into structured fields, so you can search, group, and aggregate on them in your downstream stack.
"https://example.com:8080/search?q=fluentd&lang=ja"
↓
{ scheme: "https", host: "example.com", port: 8080,
path: "/search", query: "q=fluentd&lang=ja", fragment: nil }
Why?
URL strings sitting in a single log field are a black box — you can't filter by host, group by path, or count by query parameter without parsing them first. These filters turn a raw URL into a record your storage can index.
| Plugin | Turns this | Into this |
|---|---|---|
| uri_parser | https://example.com:8080/p?q=1#x | scheme, host, port, path, query, fragment |
| query_string_parser | foo=bar&hoge=fuga | { "foo" => "bar", "hoge" => "fuga" } |
Requirements
| fluent-plugin-uri-parser | fluentd | ruby |
|---|---|---|
| >= 0.4.0 | >= v1.0.0 | >= 3.2 |
Installation
gem install fluent-plugin-uri-parser
Or in your Gemfile:
gem "fluent-plugin-uri-parser"
uri_parser
Decomposes a URI field into its components.
Minimal example
<filter access.log>
@type uri_parser
key_name url
out_key_scheme scheme
out_key_host host
out_key_port port
out_key_path path
out_key_query query
out_key_fragment fragment
</filter>
// input
{ "url": "https://example.com:8080/search?q=fluentd#top" }
// output
{
"url": "https://example.com:8080/search?q=fluentd#top",
"scheme": "https",
"host": "example.com",
"port": 8080,
"path": "/search",
"query": "q=fluentd",
"fragment": "top"
}
The port value uses Addressable::URI#inferred_port, so well-known schemes (http → 80, https → 443, ...) get a port even when the URL omits one.
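To make the inferred-port behaviour concrete, here is a minimal sketch. It uses Ruby's standard `uri` library as a stand-in so the snippet runs without extra gems; that substitution is an assumption, since the plugin itself relies on Addressable, and the two libraries may differ for less common schemes. `port_for` is a hypothetical helper name.

```ruby
require "uri"

# Stand-in for Addressable::URI#inferred_port (assumption: stdlib URI is used
# here only for illustration). The stdlib URI classes fall back to the scheme's
# default port when the URL does not specify one.
def port_for(url)
  URI.parse(url).port
end

puts port_for("https://example.com/search")       # 443
puts port_for("http://example.com/search")        # 80
puts port_for("https://example.com:8080/search")  # 8080
```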
Group output under a single key — hash_value_field
<filter access.log>
@type uri_parser
key_name url
hash_value_field parsed
out_key_host host
out_key_path path
</filter>
// input
{ "url": "https://example.com/search" }
// output
{
"url": "https://example.com/search",
"parsed": { "host": "example.com", "path": "/search" }
}
Namespace output keys — inject_key_prefix
<filter access.log>
@type uri_parser
key_name url
inject_key_prefix url.
out_key_host host
out_key_path path
</filter>
// input
{ "url": "https://example.com/search" }
// output
{
"url": "https://example.com/search",
"url.host": "example.com",
"url.path": "/search"
}
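The two placement options, hash_value_field and inject_key_prefix, can be pictured as a small merge step applied to the extracted fields. The sketch below is illustrative only: `place_fields` is a hypothetical helper, not the plugin's code, and the behaviour when both options are set together (prefix first, then nest) is an assumption.

```ruby
# Hypothetical helper (not part of the plugin): places already-extracted
# fields into a record according to the two placement options.
def place_fields(record, fields, hash_value_field: nil, inject_key_prefix: nil)
  # Prepend the prefix to every extracted key, if one is configured.
  # Assumption: when both options are set, prefixing happens before nesting.
  fields = fields.transform_keys { |key| "#{inject_key_prefix}#{key}" } if inject_key_prefix

  if hash_value_field
    # Nest all extracted fields under a single key.
    record.merge(hash_value_field => fields)
  else
    # Merge the extracted fields into the top level of the record.
    record.merge(fields)
  end
end

record = { "url" => "https://example.com/search" }
fields = { "host" => "example.com", "path" => "/search" }

nested   = place_fields(record, fields, hash_value_field: "parsed")
prefixed = place_fields(record, fields, inject_key_prefix: "url.")

p nested
p prefixed
```

Both calls leave the original `url` key untouched, matching the outputs shown in the examples above.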
Drop empty components — ignore_nil
When a component is missing (no query, no fragment, etc.) the default is to emit it as null. Set ignore_nil true to omit those keys entirely.
// input
{ "url": "https://example.com/path" }
// ignore_nil false (default)
{ "scheme": "https", "host": "example.com", "port": 443,
"path": "/path", "query": null, "fragment": null }
// ignore_nil true
{ "scheme": "https", "host": "example.com", "port": 443,
"path": "/path" }
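The effect of ignore_nil amounts to dropping nil-valued entries from the extracted fields. A rough sketch, again assuming Ruby's standard `uri` library as a stand-in for Addressable; `decompose` is a hypothetical helper and not the plugin's implementation:

```ruby
require "uri"

# Hypothetical helper: split a URL into six components and optionally drop
# the ones that are nil, as ignore_nil true is described above.
# Assumption: stdlib URI stands in for Addressable.
def decompose(url, ignore_nil: false)
  uri = URI.parse(url)
  fields = {
    "scheme"   => uri.scheme,
    "host"     => uri.host,
    "port"     => uri.port,      # default port for http/https when omitted
    "path"     => uri.path,
    "query"    => uri.query,     # nil when the URL has no query string
    "fragment" => uri.fragment,  # nil when the URL has no fragment
  }
  ignore_nil ? fields.compact : fields
end

p decompose("https://example.com/path")
p decompose("https://example.com/path", ignore_nil: true)
```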
query_string_parser
Decomposes a query string field into individual parameters.
Pairs with an empty key (e.g. the leading & in &foo=1) are silently dropped: they're noise from user-supplied URLs and never represent a real parameter.
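As a rough illustration of dropping empty-key pairs, the sketch below decodes a query string with Ruby's standard `URI.decode_www_form` and rejects any pair whose key is empty. `decode_pairs` is a hypothetical helper, and using the stdlib decoder here is an assumption rather than a statement about the plugin's internals.

```ruby
require "uri"

# Hypothetical helper: decode a query string into [key, value] pairs and
# discard every pair whose key is the empty string.
def decode_pairs(query)
  URI.decode_www_form(query).reject { |key, _value| key.empty? }
end

p decode_pairs("&foo=1")
p decode_pairs("=orphan&foo=1")
```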
Minimal example
<filter access.log>
@type query_string_parser
key_name query
</filter>
// input
{ "query": "foo=bar&hoge=fuga" }
// output
{ "query": "foo=bar&hoge=fuga", "foo": "bar", "hoge": "fuga" }
Group output under a single key — hash_value_field
<filter access.log>
@type query_string_parser
key_name query
hash_value_field params
</filter>
// input
{ "query": "foo=bar&hoge=fuga" }
// output
{
"query": "foo=bar&hoge=fuga",
"params": { "foo": "bar", "hoge": "fuga" }
}
Handle repeated parameters
A request like ?tag=ruby&tag=fluentd has two tag values. By default the last one wins (scalar). You have two ways to keep both:
Option A: always arrays — multi_value_params true
Every parameter becomes an array, even when it appeared once.
// input
{ "query": "tag=ruby&tag=fluentd&lang=ja" }
// output
{ "tag": ["ruby", "fluentd"], "lang": ["ja"] }
Option B: array only for listed names — multi_value_param_names
You know tag may repeat but lang won't. Keep lang as a scalar and only wrap tag in an array.
<filter access.log>
@type query_string_parser
key_name query
multi_value_param_names tag
</filter>
// input
{ "query": "tag=ruby&tag=fluentd&lang=ja" }
// output
{ "tag": ["ruby", "fluentd"], "lang": "ja" }
When both are set, multi_value_params true wins.
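The three behaviours for repeated parameters (last value wins, always arrays, arrays only for listed names) can be sketched as follows. `collect_params` is a hypothetical helper written for illustration, and it assumes the standard `URI.decode_www_form` for decoding; it is not the plugin's code.

```ruby
require "uri"

# Hypothetical helper mirroring the behaviours described above.
def collect_params(query, multi_value_params: false, multi_value_param_names: [])
  URI.decode_www_form(query).each_with_object({}) do |(key, value), out|
    next if key.empty? # pairs with an empty key are dropped

    # multi_value_params takes precedence over the per-name list.
    if multi_value_params || multi_value_param_names.include?(key)
      (out[key] ||= []) << value # keep every occurrence, in order
    else
      out[key] = value           # a later occurrence overwrites an earlier one
    end
  end
end

query = "tag=ruby&tag=fluentd&lang=ja"

p collect_params(query)
p collect_params(query, multi_value_params: true)
p collect_params(query, multi_value_param_names: ["tag"])
```

Keeping the default as a scalar keeps records flat for the common case, while the per-name list avoids wrapping parameters that are known never to repeat.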
Options
Shared between both filters unless noted.
| Option | Type | Default | What it does |
|---|---|---|---|
| key_name | string | (required) | Record key holding the URL or query string to parse. |
| hash_value_field | string | nil | If set, all extracted fields are nested under this key. |
| inject_key_prefix | string | nil | Prefix prepended to every extracted key. |
| ignore_key_not_exist | bool | false | When key_name is missing, drop the record instead of passing it through. |
| emit_invalid_record_to_error | bool | true | When key_name is missing, emit the record to Fluentd's error stream. |
| suppress_parse_error_log | bool | false | Silence the warning log when the value fails to parse. |
| ignore_nil | bool | false | (uri_parser only) Omit output keys whose parsed value is nil. |
| out_key_scheme / out_key_host / out_key_port / out_key_path / out_key_query / out_key_fragment | string | nil | (uri_parser only) Output key name for each URI component. Components without an out_key_* are not emitted. |
| multi_value_params | bool | false | (query_string_parser only) Emit every parameter as an array. |
| multi_value_param_names | array | nil | (query_string_parser only) Emit only the listed parameters as arrays. Ignored when multi_value_params is true. |
Development
bundle install
bundle exec rake test
To install this gem onto your local machine: bundle exec rake install.
To release: bump the version in the gemspec, then bundle exec rake release (tags, pushes, and uploads to rubygems.org).
Contributing
Bug reports and pull requests are welcome at https://github.com/daichirata/fluent-plugin-uri-parser.
License
Apache-2.0