Yerba
YAML Editing and Refactoring with Better Accuracy
What is Yerba?
Yerba is a lossless YAML editing tool. It lets you programmatically read, modify, and enforce formatting in YAML files while preserving their original structure, including comments, blank lines, quote styles, and key ordering.
Most YAML libraries parse a file into a data structure and serialize it back, discarding all formatting in the process. Yerba operates on the concrete syntax tree (CST), so your edits are surgical: only the targeted values change, and everything else stays exactly as it was.
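To make the contrast concrete, here is a minimal sketch of the parse-and-serialize round trip using Ruby's standard YAML library (Psych), not Yerba. The sample document is made up for illustration; the point is only that comments and the original quoting do not survive the trip.

```ruby
require "yaml"

# Made-up sample document with comments and a deliberately quoted value.
source = <<~YAML
  # Primary database
  database:
    host: "localhost" # quoted on purpose
    port: 5432
YAML

# Parse into plain Ruby data, change one value, and serialize again.
data = YAML.load(source)
data["database"]["port"] = 3306
round_tripped = YAML.dump(data)

puts round_tripped
# Both comments are gone and "localhost" is no longer quoted,
# even though only the port was edited.
```

This is the behavior a CST-based editor is designed to avoid.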
Yerba is available as:
- A standalone CLI binary with zero runtime dependencies
- A Rust crate for embedding in Rust applications
- A Ruby gem for programmatic YAML editing from Ruby
Yerba was born out of the need to manage, validate, and enforce consistent formatting for hundreds of YAML data files in the RubyEvents.org project.
Installation
CLI (standalone)
The yerba CLI is a standalone Rust binary with no Ruby dependency. Install it via Cargo:
cargo install yerba
Rust Crate
Use yerba as a library in your Rust project:
[dependencies]
yerba = "0.5"
let mut document = yerba::parse_file("config.yml")?;
document.set("database.host", "0.0.0.0")?;
document.save()?; // saves to original path
document.save_to("output.yml")?; // saves to new path
Ruby Gem
The Ruby gem bundles both the CLI binary and a native extension for programmatic access from Ruby:
gem install yerba
Or add it to your Gemfile:
gem "yerba"
The gem ships with precompiled binaries for macOS and Linux.
If no precompiled binary is available for your platform, it will compile from source automatically, which requires a Rust toolchain.
CLI Usage
The yerba CLI follows a consistent pattern:
yerba <command> <file> <selector> [value] [options]
Selectors use dot notation for nested keys and brackets for array access. The file argument also accepts glob patterns, so a single command can operate on many files at once.
Selectors
Selectors let you address any node in a YAML document:
| Pattern | Meaning | Example |
|---|---|---|
| `key` | A single key | `"database.host"` |
| `key.nested` | Nested key path | `"database.settings.pool"` |
| `[]` | All items in array | `"[].title"` |
| `[N]` | Item at index | `"[0].title"` |
| `[].key[].nested` | Nested array access | `"[].speakers[].name"` |
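As a rough mental model for how these patterns address nodes, the sketch below resolves a selector against plain Ruby hashes and arrays. It is not Yerba's implementation (Yerba works on the syntax tree, not on parsed data); `selector_steps` and `resolve` are hypothetical helper names, and the sample data is invented.

```ruby
# Hypothetical helper: split "[].speakers[].name" into steps.
# "[]" becomes :each, "[N]" becomes an Integer, anything else is a key.
def selector_steps(selector)
  selector.scan(/\[\d*\]|[^.\[\]]+/).map do |token|
    case token
    when "[]" then :each
    when /\A\[(\d+)\]\z/ then $1.to_i
    else token
    end
  end
end

# Hypothetical helper: walk the steps, keeping every node that still matches.
def resolve(data, selector)
  selector_steps(selector).reduce([data]) do |nodes, step|
    case step
    when :each   then nodes.flat_map { |node| node.is_a?(Array) ? node : [] }
    when Integer then nodes.map { |node| node.is_a?(Array) ? node[step] : nil }.compact
    else              nodes.map { |node| node.is_a?(Hash) ? node[step] : nil }.compact
    end
  end
end

videos = [
  { "title" => "Opening Keynote", "speakers" => [{ "name" => "Alice" }] },
  { "title" => "Lightning Talks", "speakers" => [{ "name" => "Bob" }, { "name" => "Charlie" }] }
]

all_titles    = resolve(videos, "[].title")            # every item's title
first_title   = resolve(videos, "[0].title")           # only the first item's title
speaker_names = resolve(videos, "[].speakers[].name")  # names across all items
```

Each `[]` fans out over a sequence, which is why a single selector can address many nodes.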
Conditions
Conditions filter which items a command operates on:
| Syntax | Meaning | Example |
|---|---|---|
| `.key == value` | Equality | `".kind == keynote"` |
| `.key != value` | Inequality | `".status != draft"` |
| `.key contains val` | Substring or member | `".title contains Ruby"` |
| `.key not_contains val` | Negated contains | `".title not_contains test"` |
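The four operators can be pictured with a small evaluator over plain Ruby data. This is only a sketch under assumptions: `matches?` is a hypothetical helper, values are compared as strings, and `contains` is read as "substring" for scalars and "member" for sequences, following the table above. Yerba's actual parsing and comparison rules may differ.

```ruby
# Hypothetical helper, not Yerba's implementation: test one item against
# a condition string such as ".kind == keynote".
def matches?(item, condition)
  pattern = /\A\.(\S+)\s+(==|!=|not_contains|contains)\s+(.+)\z/
  match = pattern.match(condition)
  raise ArgumentError, "unsupported condition: #{condition}" unless match

  key, operator, expected = match.captures
  actual = item[key]
  contained =
    if actual.is_a?(Array)
      actual.map(&:to_s).include?(expected)  # member of a sequence
    else
      actual.to_s.include?(expected)         # substring of a scalar
    end

  case operator
  when "==" then actual.to_s == expected
  when "!=" then actual.to_s != expected
  when "contains" then contained
  when "not_contains" then !contained
  end
end

talks = [
  { "title" => "Ruby in 2030", "kind" => "keynote", "speakers" => ["Matz"] },
  { "title" => "Testing tips", "kind" => "talk", "speakers" => ["Alice", "Bob"] }
]

keynotes = talks.select { |talk| matches?(talk, ".kind == keynote") }
by_matz  = talks.select { |talk| matches?(talk, ".speakers contains Matz") }
no_ruby  = talks.select { |talk| matches?(talk, ".title not_contains Ruby") }
```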
get
Retrieve values from YAML files. Supports single values, array traversal, glob patterns across multiple files, conditions for filtering, and field selection.
yerba get config.yml "database.host"
yerba get videos.yml "[].title"
yerba get videos.yml "[0].title"
Use --select to pick specific fields from each item, and --condition to filter which items are returned:
yerba get videos.yml "[]" --select ".title,.speakers"
yerba get videos.yml "[]" --condition ".kind == keynote"
yerba get videos.yml "[]" --select ".title" --condition ".kind == keynote"
Glob patterns let you query across many files at once:
yerba get "data/**/videos.yml" "[].speakers[].name"
yerba get "data/**/videos.yml" "[]" --condition ".kind == keynote" --select ".id,.title"
Use --raw to output plain values (one per line) instead of JSON:
yerba get videos.yml "[]" --condition ".speakers contains Matz" --raw
set
Update an existing value at a path. The original quote style is preserved automatically: if a value was double-quoted before, it stays double-quoted after the edit.
yerba set config.yml "database.host" "0.0.0.0"
yerba set videos.yml "[0].title" "New Title"
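The quote-preservation rule can be sketched as a function over the raw scalar token. `requote` is a hypothetical helper written for this illustration, not part of Yerba; it handles only plain, single-quoted, and double-quoted scalars, and escapes only the characters that would end the quoted string early.

```ruby
# Hypothetical helper: write new_value using the same quote style
# as the original raw token taken from the file.
def requote(original_token, new_value)
  case original_token[0]
  when '"'
    # Double-quoted scalars escape backslashes and double quotes.
    %("#{new_value.gsub(/["\\]/) { |char| "\\#{char}" }}")
  when "'"
    # Single-quoted scalars escape a quote by doubling it.
    "'#{new_value.gsub("'", "''")}'"
  else
    new_value
  end
end

double = requote('"localhost"', "0.0.0.0")
single = requote("'localhost'", "it's here")
plain  = requote("localhost", "0.0.0.0")
```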
Use --if-exists to only set the value when the path already exists, or --if-missing to only set it when the path does not exist:
yerba set config.yml "database.host" "0.0.0.0" --if-exists
yerba set "data/**/event.yml" "website" "" --if-exists
Use --condition to only apply the change when a sibling field matches:
yerba set config.yml "database.host" "0.0.0.0" --condition ".port == 5432"
Use --all to update all nodes matching a wildcard selector:
yerba set videos.yml "[].description" "" --all
insert
Insert a new key into a map or a new item into a sequence. By default, new items are appended at the end.
yerba insert config.yml "database.ssl" true
yerba insert config.yml "tags" "yaml"
Control placement with --before, --after, or --at:
yerba insert config.yml "database.ssl" true --after "host"
yerba insert config.yml "database.ssl" true --before "port"
yerba insert config.yml "tags" "yaml" --at 0
yerba insert config.yml "tags" "yaml" --after "ruby"
For sequences of maps, use conditions to position relative to other items:
yerba insert speakers.yml "" "name: Bob" --after ".name == Alice"
yerba insert videos.yml "[0].speakers" "Diana" --before ".name == Charlie"
Use --from to read the value from another file (or stdin with -):
yerba insert videos.yml "" --from "new_talk.yml" --after ".id == first-talk"
delete
Remove a key and its value from a map:
yerba delete config.yml "database.pool"
yerba delete videos.yml "[0].description"
Use --dry-run to preview the result without writing to the file:
yerba delete config.yml "database.pool" --dry-run
remove
Remove a specific item from a sequence by its value:
yerba remove config.yml "tags" "rust"
yerba remove videos.yml "[0].speakers" "Alice"
rename
Rename a key in a map while preserving its value and position:
yerba rename config.yml "database.host" "database.hostname"
yerba rename config.yml "database.host" "hostname"
move
Move a sequence item to a new position. You can reference items by value, index, or condition:
yerba move config.yml "tags" "rust" --before "ruby"
yerba move config.yml "tags" "rust" --after "yaml"
yerba move config.yml "tags" 2 --to 0
yerba move videos.yml "" ".id == talk-2" --after ".id == talk-1"
move-key
Move a key to a new position within a map:
yerba move-key config.yml "database.name" --to 0
yerba move-key config.yml "database.pool" --before "database.host"
yerba move-key config.yml "database.pool" --after "database.name"
sort
Sort items in a sequence. For simple scalar sequences, no options are needed. For sequences of maps, use --by to specify the sort field. Use --order desc for descending. Repeat --by and --order for tie-breakers:
yerba sort config.yml "tags"
yerba sort videos.yml --by ".title"
yerba sort videos.yml --by ".date" --order desc --by ".title"
yerba sort videos.yml "[].speakers" --by ".name"
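For intuition, the tie-breaker form above (date descending, then title) corresponds to a two-level comparison. The sketch below shows that logic on plain Ruby data, assuming each --order applies to the --by that precedes it and that a --by without --order sorts ascending; the sample records are invented.

```ruby
videos = [
  { "title" => "Zero Downtime", "date" => "2024-05-01" },
  { "title" => "Async Ruby",    "date" => "2024-05-02" },
  { "title" => "Parsing YAML",  "date" => "2024-05-01" }
]

sorted = videos.sort do |a, b|
  by_date = b["date"] <=> a["date"]  # descending: newest first
  # Fall back to the title (ascending) only when the dates are equal.
  by_date.zero? ? a["title"] <=> b["title"] : by_date
end

titles = sorted.map { |video| video["title"] }
# => ["Async Ruby", "Parsing YAML", "Zero Downtime"]
```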
Use --order with a comma-separated list to specify an explicit custom order. All items must be listed:
yerba sort videos.yml "[]" --by ".id" --order "talk-c,talk-a,talk-b"
yerba sort speakers.yml "[]" --by ".name" --order "Charlie,Alice,Bob"
yerba sort config.yml "tags" --by "." --order "yaml,ruby,rust"
This is useful for reordering items in a specific sequence (e.g., conference schedule order, priority lists) or when an LLM agent needs to rearrange items programmatically.
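The explicit-order rule, including the requirement that every item be listed, can be sketched on plain Ruby data as follows. `sort_by_explicit_order` is a hypothetical helper for illustration, and the error it raises only stands in for however Yerba reports an incomplete list.

```ruby
# Hypothetical helper: order items by the position of their key in `order`.
def sort_by_explicit_order(items, order, &key)
  unlisted = items.map(&key) - order
  unless unlisted.empty?
    raise ArgumentError, "not in order list: #{unlisted.join(", ")}"
  end

  items.sort_by { |item| order.index(key.call(item)) }
end

talks = [{ "id" => "talk-a" }, { "id" => "talk-b" }, { "id" => "talk-c" }]

ordered = sort_by_explicit_order(talks, %w[talk-c talk-a talk-b]) { |talk| talk["id"] }
ordered_ids = ordered.map { |talk| talk["id"] }
# => ["talk-c", "talk-a", "talk-b"]
```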
sort-keys
Reorder the keys in a map to match a predefined order. If any key in the document is not present in the order list, the command aborts with an error. This ensures you account for every field:
yerba sort-keys config.yml "database" "host,port,name,pool"
yerba sort-keys "data/**/event.yml" "" "id,title,kind,location"
yerba sort-keys "data/**/videos.yml" "[]" "id,title,speakers"
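The abort-on-unknown-key behavior can be pictured with a small sketch on a plain Ruby hash. `sort_keys` here is a hypothetical helper, not Yerba's API, and it assumes that keys named in the order list but absent from the map are simply skipped; the text above does not say how Yerba treats that case.

```ruby
# Hypothetical helper: rebuild the map in the given key order,
# refusing to run if the map contains a key the list does not mention.
def sort_keys(map, order)
  unknown = map.keys - order
  unless unknown.empty?
    raise ArgumentError, "keys not in order list: #{unknown.join(", ")}"
  end

  order.each_with_object({}) do |key, sorted|
    sorted[key] = map[key] if map.key?(key)
  end
end

database = { "port" => 5432, "name" => "app", "host" => "localhost" }
sorted = sort_keys(database, %w[host port name pool])
sorted_keys = sorted.keys
# => ["host", "port", "name"]
```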
quote-style
Enforce a consistent quote style across keys and/or values:
yerba quote-style config.yml --values double
yerba quote-style config.yml --keys plain
yerba quote-style config.yml --keys plain --values double
Scope the operation to a specific selector:
yerba quote-style config.yml "[].speakers" --values plain
yerba quote-style "data/**/*.yml" --keys plain --values double
Use block scalar styles to enforce multiline formatting on specific fields:
yerba quote-style videos.yml "[].description" --values literal
Key styles (--keys):
| Style | Symbol | Example |
|---|---|---|
| `plain` | (none) | `host: value` |
| `single` | `'` | `'host': value` |
| `double` | `"` | `"host": value` |
Value styles (--values):
| Style | Symbol | Example or effect | Behavior |
|---|---|---|---|
| `plain` | (none) | `host: localhost` | Unquoted |
| `single` | `'` | `host: 'localhost'` | Single-quoted |
| `double` | `"` | `host: "localhost"` | Double-quoted, supports `\n` escapes |
| `literal` | `\|-` | Preserves newlines | Strip trailing newline |
| `literal-clip` | `\|` | Preserves newlines | Keep one trailing newline |
| `literal-keep` | `\|+` | Preserves newlines | Keep all trailing newlines |
| `folded` | `>-` | Folds newlines to spaces | Strip trailing newline |
| `folded-clip` | `>` | Folds newlines to spaces | Keep one trailing newline |
| `folded-keep` | `>+` | Folds newlines to spaces | Keep all trailing newlines |
Block scalars are only converted when scoped to a specific selector. An unscoped --values double will not touch existing block scalars.
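The trailing-newline behavior of the block styles comes from YAML itself, so it can be checked with any YAML parser. The sketch below uses Ruby's standard YAML library (Psych), not Yerba, to show what each indicator yields when the file is loaded; the sample text is invented.

```ruby
require "yaml"

# The same two lines followed by one blank line, under different indicators.
body = "  line one\n  line two\n\n"

strip  = YAML.load("text: |-\n#{body}")["text"]  # literal: strip
clip   = YAML.load("text: |\n#{body}")["text"]   # literal-clip: keep one
keep   = YAML.load("text: |+\n#{body}")["text"]  # literal-keep: keep all
folded = YAML.load("text: >-\n#{body}")["text"]  # folded: newlines become spaces

p strip   # "line one\nline two"
p clip    # "line one\nline two\n"
p keep    # "line one\nline two\n\n"
p folded  # "line one line two"
```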
blank-lines
Enforce a consistent number of blank lines between sequence entries:
yerba blank-lines videos.yml 1
yerba blank-lines videos.yml "[]" 1
yerba blank-lines config.yml "tags" 0
directives
Add or remove the document start marker (---):
yerba directives config.yml --ensure
yerba directives config.yml --remove
yerba directives "data/**/*.yml" --ensure
unique
Find or remove duplicate items in a sequence. Use --by to specify which field determines uniqueness:
yerba unique videos.yml --by ".id"
yerba unique speakers.yml --by ".name"
yerba unique config.yml "tags" --by "."
By default, duplicates are reported but not removed. Use --remove to remove them (keeps the first occurrence):
yerba unique videos.yml --by ".id" --remove
yerba unique speakers.yml --by ".name" --remove --dry-run
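The two modes, reporting duplicates and removing all but the first occurrence, map onto familiar Ruby operations. This sketch works on plain Ruby data with invented records; it illustrates the semantics described above and is not a use of Yerba's API.

```ruby
videos = [
  { "id" => "talk-1", "title" => "First Talk" },
  { "id" => "talk-2", "title" => "Second Talk" },
  { "id" => "talk-1", "title" => "First Talk (re-upload)" }
]

# Report: ids that occur more than once.
duplicate_ids = videos.group_by { |video| video["id"] }
                      .select { |_id, group| group.size > 1 }
                      .keys

# Remove: Array#uniq with a block keeps the first item for each key.
deduped = videos.uniq { |video| video["id"] }
deduped_titles = deduped.map { |video| video["title"] }
```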
location
Show the location (line, column, byte offset) of a selector in a YAML file:
yerba location config.yml "database.host"
yerba location videos.yml "[0].title"
yerba location videos.yml "[0]"
Output:
{
"selector": "[0].title",
"file": "videos.yml",
"start_line": 2,
"start_column": 9,
"end_line": 2,
"end_column": 19,
"start_offset": 22,
"end_offset": 32
}
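To show how line, column, and offset relate, the sketch below derives all three from a byte offset. The input file is a guess made for this illustration: with it, the numbers in the sample output are reproduced when lines are counted from 1 and columns and offsets from 0. That convention is inferred from the sample, not confirmed by the text. `locate` is a hypothetical helper that searches for a literal value rather than resolving a selector.

```ruby
# Hypothetical helper: find `needle` in `source` and describe its span.
def locate(source, needle)
  bytes = source.b
  start_offset = bytes.index(needle.b)
  return nil if start_offset.nil?

  end_offset = start_offset + needle.bytesize

  # Convert a byte offset into [line (1-based), column (0-based)].
  position = lambda do |offset|
    before = bytes[0, offset]
    last_newline = before.rindex("\n")
    line_start = last_newline ? last_newline + 1 : 0
    [before.count("\n") + 1, offset - line_start]
  end

  start_line, start_column = position.call(start_offset)
  end_line, end_column = position.call(end_offset)

  {
    start_line: start_line, start_column: start_column,
    end_line: end_line, end_column: end_column,
    start_offset: start_offset, end_offset: end_offset
  }
end

# Guessed contents of videos.yml, chosen to match the sample output.
source = "- id: talk-1\n  title: First Talk\n"
loc = locate(source, "First Talk")
```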
schema
Validate YAML files against a JSON schema:
yerba schema data/speakers.yml --schema lib/schemas/speaker_schema.json
yerba schema "data/**/videos.yml" --schema lib/schemas/video_schema.json
Use --selector to scope validation to a specific selector (e.g. validate each item in an array):
yerba schema data/speakers.yml --schema speaker_schema.json --selector "[]"
yerba schema data/sponsors.yml --schema tier_schema.json --selector "tiers[]"
selectors
Show all valid selectors for a YAML file. Useful for discovering the structure of a file and knowing which selectors you can use with other commands:
yerba selectors config.yml
Output:
database
database.host
database.port
[]
For sequences of objects:
yerba selectors videos.yml
Output:
[]
[].id
[].title
[].speakers
[].speakers[]
[].speakers[].name
[].speakers[].slug
[].video_id
[].video_provider
Pass a selector to scope the output to a specific subtree:
yerba selectors config.yml "database"
yerba selectors videos.yml "[]"
yerba selectors videos.yml "[].speakers"
Works with glob patterns to show the union of selectors across multiple files:
yerba selectors "data/**/videos.yml"
yerba selectors "data/**/videos.yml" "[]"
Yerbafile
A Yerbafile is a YAML configuration file that defines formatting and editing rules as pipelines of operations that are applied to your files across your project.
Use yerba init to create one, then yerba apply to apply all rules, or yerba check to verify compliance (exits with code 1 if files would change):
yerba init
yerba apply
yerba apply path/to/file.yml
yerba check
yerba check path/to/file.yml
Each rule specifies a file glob and a list of steps to run in order:
rules:
- files: "config/**/*.yml"
pipeline:
- quote_style:
key_style: plain
value_style: double
- sort_keys:
path: ""
order:
- id
- title
- description
- blank_lines:
count: 1
- files: "data/speakers.yml"
pipeline:
- quote_style:
key_style: plain
value_style: double
- sort_keys:
path: ""
order:
- name
- slug
- github
- twitter
- website
- sort:
path: ""
by: name
Available pipeline steps:
| Step | Description |
|---|---|
| `quote_style` | Enforce quote style on keys and/or values, optionally scoped by path |
| `sort_keys` | Reorder keys to match a predefined list |
| `sort` | Sort sequence items by field(s) |
| `blank_lines` | Enforce blank lines between sequence entries |
| `set` | Set a value (supports conditions) |
| `insert` | Insert a new key or sequence item |
| `delete` | Remove a key (supports conditions) |
| `rename` | Rename a key |
| `remove` | Remove an item from a sequence |
| `directives` | Add or remove the document start marker (---) |
| `unique` | Find or remove duplicate items in a sequence |
| `schema` | Validate against a JSON schema (with optional `path` for scoping) |
| `get` | Read a value and store it as a variable for subsequent steps |
This makes it easy to enforce project-wide YAML conventions in CI:
yerba check
Ruby API
Yerba includes a native C extension (backed by the same Rust core) that provides a full Ruby API for YAML editing.
Parsing
Create a document from a file path or from a string:
require "yerba"
document = Yerba.parse_file("config.yml")
document = Yerba.parse(<<~YAML)
database:
host: localhost
port: 5432
YAML
Reading Values
Use bracket notation ([]) to navigate the document. It returns typed node objects (Scalar, Map, or Sequence) that are live references: mutations flow back to the document.
All access methods ([], fetch, dig, value_at) accept full selector strings like "database.host", "[0].title", or "[].speakers[].name". In the examples below we prefer the more idiomatic chained bracket style, but the two forms are equivalent:
document["database"]["host"].value # => "localhost"
document["database.host"].value # => "localhost" (same thing)
The returned object type depends on what's at the path:
document["database"] # => Yerba::Map
document["database"]["host"] # => Yerba::Scalar
document["tags"] # => Yerba::Sequence
Scalars expose their value and quote style:
scalar = document["database"]["host"]
scalar.value # => "localhost"
scalar.quote_style # => :double
Use fetch for strict access. It raises Yerba::SelectorNotFoundError with "did you mean?" suggestions if the selector doesn't exist:
document.fetch("database.host") # => Yerba::Scalar
document.fetch("databse.host") # => raises SelectorNotFoundError: ... Did you mean: database.host?
Use dig to traverse multiple levels, returning nil for missing paths:
document.dig("database", "host") # => Yerba::Scalar
document.dig("items", 0, "name") # => Yerba::Scalar
document.dig("database", "missing") # => nil
Use value_at to get the plain Ruby value (String, Integer, Hash, Array, etc.) instead of a node object:
document.value_at("database.host") # => "localhost"
document.value_at("database.port") # => 5432
document.value_at("database") # => {"host" => "localhost", "port" => 5432}
document.value_at("[].title") # => ["First Talk", "Second Talk"]
Summary of access methods:
| Method | Not found | Returns |
|---|---|---|
[] |
nil |
Scalar / Map / Sequence node |
fetch |
raises SelectorNotFoundError |
Scalar / Map / Sequence node |
dig |
nil |
Scalar / Map / Sequence node |
value_at |
nil |
plain Ruby value |
Mutations
Modify values in place. The original formatting is preserved:
document["database"]["host"].value = "0.0.0.0"
document.set("database.port", 3306)
Set all matching nodes at once with all: true:
document.set("[].description", "", all: true)
Insert new keys with positional control:
document["database"].insert("ssl", true, after: "host")
Work with sequences using familiar Ruby patterns:
tags = document["tags"]
tags << "yaml"
tags << { name: "Rust", version: "1.80" }
tags.remove("obsolete")
Sorting
Sort sequences in place. Works on both the document and sequence level:
document.sort(by: :name)
document.sort(by: :name, order: :desc)
document.sort(by: :name, order: ["Charlie", "Bob", "Alice"])
document.sort("tags")
document.sort("tags", order: :desc)
document.sort("tags", order: ["rust", "ruby", "go"])
The by: option accepts symbols, strings, or dot-prefixed strings (:name, "name", ".name").
Querying
Find and filter items in sequences with find_by, where, and pluck:
document.find_by(name: "Alice")
document.where(role: "admin")
document.pluck(:name)
document.find_by(speakers: { name: "Alice" })
document.where(tags: ["ruby"])
document.find_by("database.host": "localhost")
These methods work on Document (delegates to root), Sequence, and Collection (searches across files):
collection = Yerba.files("data/**/*.yml")
collection.find_by(name: "Alice")
collection.where(kind: "talk")
collection.pluck(:name)
Schema Validation
Validate documents against JSON schemas from Ruby:
schema = {
type: "object",
properties: { name: { type: "string" }, slug: { type: "string" } },
required: ["name", "slug"]
}
document.valid?(schema) # => true/false
document.valid?(schema, selector: "[]") # validate each array item
errors = document.validate(schema, selector: "[]")
errors.each do |error|
puts "#{error["message"]} at #{error["path"]} (line #{error["line"]})"
end
Also accepts a JSON string:
document.valid?('{"type":"object","required":["name"]}')
Quote Style Control
Read and set the quote style on individual scalars:
scalar = document["database"]["host"]
scalar.quote_style # => :double
scalar.quote_style = :single
Location
Get the precise location (line, column, byte offset) of any selector in a document:
loc = document[0]["title"].location
loc.start_line # => 2
loc.start_column # => 9
loc.end_line # => 2
loc.end_column # => 19
loc.start_offset # => 22
loc.end_offset # => 32
You can also get a location by selector string:
document.location("[0].title")
The above is the same as:
document[0]["title"].location
document["[0].title"].location
Omit the selector to get the whole document's location:
document.location # => #<Yerba::Location start_line=1, ...>
Returns nil for non-existent selectors. Use locations for wildcard selectors that match multiple nodes:
locs = document.locations("[].title")
locs.each { |loc| puts "line #{loc.start_line}" }
# line 2
# line 4
document.locations("[]")
document.locations("[].speakers[]")
Wildcard Access
When [] receives a wildcard selector (containing []), it returns an array of nodes instead of a single node:
document["[].title"] # => [Yerba::Scalar, Yerba::Scalar, ...]
document["[].speakers[]"] # => [Yerba::Scalar, Yerba::Scalar, ...]
document["items[].name"] # => [Yerba::Scalar, Yerba::Scalar, ...]
document["[].title"].each { |scalar| puts scalar.value }
document["[].title"].each { |scalar| scalar.value = "Updated" }
Collections
Operate on multiple files matching a glob pattern:
collection = Yerba.files("data/**/videos.yml")
collection.each do |document|
puts document[0]["title"].value
end
collection.find_by(name: "Alice")
collection.where(kind: "talk")
collection.pluck(:name)
collection.apply! do |document|
document.set("status", "published")
end
Use Collection.get to retrieve nodes across all matching files in parallel. Returns Scalar, Map, or Sequence objects with file_path, line, and selector:
speakers = Yerba::Collection.get("data/**/videos.yml", "[].speakers[]")
speakers.each do |scalar|
puts "#{scalar.value} in #{scalar.file_path}:#{scalar.line}"
end
maps = Yerba::Collection.get("data/**/videos.yml", "[]")
maps.first.class
# => Yerba::Map
sequences = Yerba::Collection.get("data/**/videos.yml", "[].speakers")
sequences.first.class
# => Yerba::Sequence
Nodes returned by Collection.get lazily load their Document on first mutation, so reads are fast and writes work transparently:
scalars = Yerba::Collection.get("data/**/videos.yml", "[].title")
scalars.first.value = "New Title"
scalars.first.document.save!
Saving
Write changes back to the original file:
document.save!
Or render the document as a string without writing to disk:
document.to_s
Development
See CONTRIBUTING.md for details on how to set up the repo locally, run tests, and contribute.
License
The gem is available as open source under the terms of the MIT License.