SmarterJSON


A lenient, fast JSON parser for Ruby. It parses strict JSON, JSON5, HJSON-style config, and the messy JSON-ish input humans actually write, and in benchmarks it matches or beats Oj on nearly every file. SmarterJSON is opinionated: we want your JSON processing to succeed. Other parsers are strict and stop at the first deviation; SmarterJSON keeps going. It optimizes for getting your data out, not for policing the JSON spec.

SmarterJSON: one parser, no modes. If you need strict JSON validation, use the stdlib json gem.

Why SmarterJSON?

Most JSON parsers reject anything that isn't perfectly strict JSON. SmarterJSON is built on the opposite principle: you shouldn't have to care what flavor of JSON you were handed, and you shouldn't lose the whole document because of a formatting error. Give it strict JSON, JSON5, an HJSON-style config file, newline-delimited JSON, or a copy-pasted blob with comments and trailing commas, and it just parses it. Being lenient does not mean dropping data: SmarterJSON only declines to fail on a gap that carries no data (such as an extra comma). A strict parser would refuse the whole document and recover nothing; SmarterJSON returns everything except the formatting error.

For an ingestion tool, "reject the whole document because of one stray comma" is the worst outcome: you throw away the 99% that's fine to avoid maybe-mishandling a gap that carries no data anyway.

Three things set it apart:

  1. One parser, no modes, no flags. There is no dialect: option and no "strict mode" — SmarterJSON.process(input) accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the parser to match your input; it adapts to whatever you give it.

  2. It parses multi-document input automatically — a distinguishing feature. SmarterJSON.process handles NDJSON / JSONL / concatenated JSON with no block and no special method: one document returns its value, several documents return an Array, empty input returns nil. Only SmarterJSON parses multi-document input via plain process — Oj and the stdlib json library raise without a block. For input larger than memory, pass a block to stream one document at a time.

  3. It's fast. A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark and keeps it competitive with the stdlib json C parser, which is the fastest general-purpose Ruby JSON parser.
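
To make item 2 concrete, the sketch below shows the stdlib side of the comparison using only the json library: a plain JSON.parse call on NDJSON raises, and the usual workaround is to split the input into lines yourself. It illustrates what a single SmarterJSON.process call replaces; it does not use SmarterJSON, and the sample data is made up for illustration.

```ruby
require "json"

# Three documents, one per line (NDJSON). Sample data is illustrative.
ndjson = %({"id":1}\n{"id":2}\n{"id":3})

# JSON.parse expects exactly one document, so the second one is an error.
stdlib_error =
  begin
    JSON.parse(ndjson)
    nil
  rescue JSON::ParserError => e
    e
  end
puts "JSON.parse raised: #{stdlib_error.class}"

# Manual workaround: parse line by line and collect the results.
docs = ndjson.each_line.map { |line| JSON.parse(line) }
puts docs.inspect
```

The line-by-line workaround only works when each document sits on its own line; concatenated documents such as {"id":1}{"id":2} need a parser that understands multi-document input.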

What it accepts, beyond strict JSON

  • //, /* … */, and # comments (a # or // only starts a comment when preceded by whitespace, so url: http://x.com parses as a string, not a truncated value)
  • Trailing commas; unquoted keys ({host: localhost}); single-quoted, triple-quoted ('''…'''), and quoteless string values
  • Implicit root object — a config file that starts with key: value, no outer {}
  • NaN, Infinity, hex (0xFF), leading + / ., underscores in numbers (1_000_000)
  • UTF-8 BOM, smart/curly quotes, Python literals (True / False / None), JavaScript undefined
  • Mixed CR / LF / CRLF line endings, and any Ruby-supported input encoding (via encoding:)
  • Duplicate keys (last value wins by default; configurable)
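
The whitespace rule for # and // in the first bullet can be pictured with a small regular expression. This is a simplified sketch of the rule as described, not SmarterJSON's actual implementation: strip_line_comment and COMMENT_START are hypothetical names, and the sketch ignores quoted strings and /* … */ block comments.

```ruby
# Hypothetical helper: remove a trailing line comment from one line of input
# (no trailing newline). A "#" or "//" counts as a comment marker only at the
# start of the line or directly after whitespace. Quoted strings are not
# handled in this sketch.
COMMENT_START = %r{(?:\A|\s)(?:\#|//).*\z}

def strip_line_comment(line)
  line.sub(COMMENT_START, "").rstrip
end

puts strip_line_comment("url: http://x.com")      # "//" follows ":", so it stays
puts strip_line_comment("port: 5432 # database")  # "#" follows a space, so it is cut
puts strip_line_comment("// whole-line comment").inspect
```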

It raises only on genuinely unparseable input (unterminated string, mismatched bracket), with line and column in the message — never on valid-but-lenient input.

Installation

# Gemfile
gem "smarter_json"

# or install it directly from the shell
gem install smarter_json

The C extension is built on install and used automatically. On platforms where it can't build, the pure-Ruby parser runs instead and produces identical results.

Documentation

Usage

require "smarter_json"

SmarterJSON.process('{"a": 1, "b": [2, 3]}')          # => {"a"=>1, "b"=>[2, 3]}
SmarterJSON.process("host: localhost\nport: 5432")     # => {"host"=>"localhost", "port"=>5432}  (no braces needed)
SmarterJSON.process_file("config.json5")               # read a file, then parse

# Multiple documents (NDJSON / JSONL / concatenated) — no block, no special method:
SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3}))   # => [{"id"=>1}, {"id"=>2}, {"id"=>3}]
SmarterJSON.process('{"id":1}')                         # => {"id"=>1}   (one document → the value itself)
SmarterJSON.process("")                                 # => nil          (zero documents)

# For input larger than memory, stream one document at a time with a block
# (process and process_file both forward the block):
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }

Options

| option | default | meaning |
|---|---|---|
| symbolize_keys | false | return object keys as Symbols instead of Strings |
| duplicate_key | :last_wins | :last_wins / :first_wins / :raise for repeated keys in one object |
| bigdecimal_load | :auto | :auto keeps high-precision decimals as BigDecimal; :float forces Float; :bigdecimal forces BigDecimal |
| acceleration | true | true uses the C extension when compiled and loadable; false forces pure Ruby (identical results) |
| encoding | "UTF-8" | labels the input's encoding (no transcoding pass; see Encoding) |
| warnings | false | when true, return [result, warnings]; warnings lists the lenient fixes applied (:empty_slot, :empty_value, :duplicate_key) |

Performance

Benchmarks: p10 of 40 runs, Apple M1 Max, Ruby 3.4.7, on the standard JSON corpus (canada, citm_catalog, twitter, github_events, …). The apples-to-apples comparisons are SmarterJSON/C vs Oj/strict vs stdlib json, all producing Float (run rake report in json_benchmarks/ for the full table — numbers vary run to run).

  • vs Oj/strict (the JSON.parse-equivalent mode, both producing Float): SmarterJSON/C is faster on nearly every file — typically 1.1–1.6× (e.g. big_decimals ~1.6×, deeply-nested ~1.4×, citm / twitter / usgs ~1.3×, github / citylots / weather ~1.1–1.2×). The one exception is string_array, where Oj/strict's SIMD string scan is ~1.7× faster — that's the current frontier.
  • vs stdlib json (C): competitive with the fastest Ruby JSON parser — it ties json on big_decimals and string_array, and trails by ~1.1–1.7× on the rest. (canada.json is the outlier, far behind — that's the BigDecimal default, see below.)
  • Numbers: floats are parsed with Ryū (correctly rounded, single-pass), so number-heavy data is fast and bit-exact.

Two notes on fair comparison:

  • NDJSON: on multi-document files, only SmarterJSON parses the input via plain process — Oj and json raise without a block, so their cells are N/A. That N/A reflects real default behavior, not a measurement gap. Plain process collects every document into an Array at ~270 MB/s; the streaming block form runs faster (~440 MB/s) because it doesn't hold all documents in memory at once — use it for input larger than RAM.
  • High-precision decimals (e.g. canada.json): SmarterJSON's default :auto mode preserves high-precision numbers as BigDecimal (matching Oj's default), which is intrinsically slower than Float. Against Float-producing parsers it looks slower on such files; pass bigdecimal_load: :float to compare like-for-like (it then runs much faster). Against the equivalent BigDecimal-producing Oj mode, SmarterJSON is faster.
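
The cost described in the second note comes from a real precision difference. The snippet below uses only Ruby's stdlib (not SmarterJSON) to show what a Float loses on a long decimal and what BigDecimal keeps; the sample literal is made up for illustration.

```ruby
require "bigdecimal"

# A decimal with more significant digits than a Float can hold (illustrative).
text = "0.1234567890123456789012345"

as_float   = Float(text)        # rounded to the nearest 64-bit double
as_decimal = BigDecimal(text)   # keeps every digit, at a higher cost

puts as_float                        # prints fewer digits than the input
puts as_decimal.to_s("F")            # prints the input unchanged
puts as_float.to_s == text           # false: digits were lost
puts as_decimal.to_s("F") == text    # true: round-trips exactly
```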

Encoding

encoding: (default "UTF-8") labels what the input is — it does not trigger a transcoding pass. The parser works on the bytes in their native encoding and emits string values with the same encoding tag, the same way smarter_csv handles encodings. Bytes that are invalid for the claimed encoding raise SmarterJSON::EncodingError (a kind of SmarterJSON::ParseError).
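
The difference between labeling and transcoding can be seen with Ruby's own String API. This is an analogy built on core String methods, not on SmarterJSON internals, and the sample bytes are illustrative.

```ruby
# "café" encoded as ISO-8859-1: 4 bytes, with é stored as the single byte 0xE9.
raw = "caf\xE9".b

# Labeling: attach an encoding tag; the bytes are untouched.
labeled = raw.dup.force_encoding("ISO-8859-1")
puts labeled.bytes == raw.bytes    # true
puts labeled.valid_encoding?       # true

# Transcoding: a separate pass that rewrites the bytes for another encoding.
transcoded = labeled.encode("UTF-8")
puts transcoded.bytesize           # 5, because é takes two bytes in UTF-8

# A wrong label leaves bytes that are invalid for the claimed encoding.
mislabeled = raw.dup.force_encoding("UTF-8")
puts mislabeled.valid_encoding?    # false
```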

Nesting & untrusted input

Both the C extension and the pure-Ruby parser are iterative, not recursive — they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input cannot overflow the call stack or segfault: nesting is bounded only by available memory, the same posture as Oj (which also ships no nesting limit; the stdlib json caps at 100). The deeply_nested.json benchmark (212 MB of nesting) parses without issue.

The trade-off: there is currently no fixed nesting or input-size limit, so extremely large or adversarially nested untrusted input can exhaust RAM, even though it cannot crash the parser through a stack overflow. A hard cap for untrusted input is planned as an opt-in guard; for now, enforce a size limit upstream of the parser.

Development

After checking out the repo, run bin/setup to install dependencies, then rake compile to build the C extension and rake spec to run the tests. The test suite runs every example against both the C and pure-Ruby paths, so the two stay behavior-identical.

License

Available as open source under the terms of the MIT License.