SmarterJSON

A lenient, fast JSON parser for Ruby. It parses strict JSON, JSON5, HJSON-style config, and the messy JSON-ish input humans actually write, and in benchmarks it matches or beats Oj on nearly every file. SmarterJSON is opinionated: we want your JSON processing to succeed. Other parsers are strict and stop at the first deviation; SmarterJSON keeps going. It optimizes for getting your data out, not for policing the JSON spec.

SmarterJSON: one parser, no modes — want strict? Please use the stdlib json gem.

Why SmarterJSON?

Most JSON parsers reject anything that isn't perfectly strict JSON. SmarterJSON is built on the opposite principle: you shouldn't have to care what flavor of JSON you were handed and you shouldn't lose the whole document because of formatting errors. Give it strict JSON, JSON5, an HJSON-style config file, newline-delimited JSON, or a copy-pasted blob with comments and trailing commas — it just parses it. When it is lenient, smarter_json isn't dropping data that exists — it's just not raising an eyebrow at a suspicious gap (like an extra comma). A strict parser would refuse the whole document and recover nothing; smarter_json returns everything except the formatting error.

For an ingestion tool, "reject the whole document because of one stray comma" is the worst outcome: you throw away the 99% that's fine to avoid maybe-mishandling a gap that carries no data anyway.

Three things set it apart:

  1. One parser, no modes, no flags. There is no dialect: option and no "strict mode" — SmarterJSON.process(input) accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the parser to match your input; it adapts to whatever you give it.

  2. It parses multi-document input automatically. SmarterJSON.process handles NDJSON / JSONL / concatenated JSON with no block and no special method: one document returns its value, several documents return an Array, empty input returns nil. The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. Of the parsers compared here, only SmarterJSON parses multi-document input via plain process; Oj and the stdlib json library raise unless given a block. Pass a block to iterate the recovered documents one at a time.

  3. It's fast. A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark, and keeps it competitive with the stdlib json C parser, which is the fastest general-purpose Ruby JSON parser.
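To make the zero / one / many return rule from point 2 concrete, here is a minimal sketch that uses only the stdlib json library. The helper names `stdlib_raises?`, `collapse_documents`, and `parse_lines` are hypothetical and not part of SmarterJSON. The sketch also assumes one document per line, a restriction SmarterJSON itself does not impose.

```ruby
require "json"

NDJSON = %({"id":1}\n{"id":2}\n{"id":3})

# The stdlib parser expects exactly one document per call,
# so multi-document input raises JSON::ParserError.
def stdlib_raises?(text)
  JSON.parse(text)
  false
rescue JSON::ParserError
  true
end

# Hypothetical helper mirroring the return convention described above:
# zero documents -> nil, one document -> the value itself, several -> an Array.
def collapse_documents(docs)
  case docs.length
  when 0 then nil
  when 1 then docs.first
  else docs
  end
end

# Manual NDJSON handling with the stdlib: split into lines, parse each one.
def parse_lines(text)
  docs = text.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
  collapse_documents(docs)
end

stdlib_raises?(NDJSON) # the stdlib needs the manual loop below instead
parse_lines(NDJSON)    # several documents come back as an Array
```

The point of the sketch is the shape of the result, not the parsing: callers that always want an Array can wrap the return value with `Array(...)` for the one-document case, but must handle `nil` for empty input themselves.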

What it accepts, beyond strict JSON

  • //, /* … */, and # comments (a # or // only starts a comment when preceded by whitespace, so url: http://x.com parses as a string, not a truncated value)
  • Markdown-wrapped / chatty blobs around the payload: strips ```json / ``` fences, ignores obvious prose before/after the payload, unwraps <json>...</json> and BEGIN_JSON ... END_JSON, and preserves multiple recovered payloads as an Array
  • Trailing commas; unquoted keys ({host: localhost}); single-quoted, triple-quoted ('''…'''), and quoteless string values
  • Implicit root object — a config file that starts with key: value, no outer {}
  • NaN, Infinity, hex (0xFF), leading + / ., underscores in numbers (1_000_000)
  • UTF-8 BOM, smart/curly quotes, Python literals (True / False / None), JavaScript undefined
  • Mixed CR / LF / CRLF line endings, and any Ruby-supported input encoding (via encoding:)
  • Duplicate keys (last value wins by default; configurable)

It raises only on genuinely unparseable input (unterminated string, mismatched bracket), with line and column in the message — never on valid-but-lenient input.
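The whitespace rule for # and // comment markers in the list above can be sketched with a single regular expression. This is an illustration of the rule only, not SmarterJSON's actual scanner: `COMMENT_MARKER`, `comment_index`, and `strip_comment` are hypothetical names, the sketch treats start-of-line as a valid position for a marker (an assumption), and it ignores quoted strings and /* … */ block comments for brevity.

```ruby
# A marker counts only at the start of the line or directly after whitespace.
# Hypothetical sketch; quoted strings and block comments are not handled.
COMMENT_MARKER = %r{(?:\A|(?<=\s))(?:#|//)}

# Index of the first comment marker on the line, or nil if there is none.
def comment_index(line)
  line.index(COMMENT_MARKER)
end

# The line with any trailing comment (and the whitespace before it) removed.
def strip_comment(line)
  idx = comment_index(line)
  idx ? line[0...idx].rstrip : line
end

strip_comment("port: 5432 # database port") # comment removed
strip_comment("url: http://x.com")          # unchanged: the // follows a colon
```

Because the `//` in `http://x.com` is preceded by `:` rather than whitespace, the URL survives intact, which is the behavior the list item describes.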

Installation

# Gemfile
gem "smarter_json"

# or install it directly from the shell
gem install smarter_json

The C extension is built on install and used automatically. On platforms where it can't build, the pure-Ruby parser runs instead and produces identical results.

Documentation

Usage

require "smarter_json"

SmarterJSON.process('{"a": 1, "b": [2, 3]}')          # => {"a"=>1, "b"=>[2, 3]}
SmarterJSON.process("host: localhost\nport: 5432")     # => {"host"=>"localhost", "port"=>5432}  (no braces needed)
SmarterJSON.process_file("config.json5")               # read a file, then parse

# Multiple documents (NDJSON / JSONL / concatenated) — no block, no special method:
SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3}))   # => [{"id"=>1}, {"id"=>2}, {"id"=>3}]
SmarterJSON.process('{"id":1}')                         # => {"id"=>1}   (one document → the value itself)
SmarterJSON.process("")                                 # => nil          (zero documents)

# Iterate one recovered document at a time with a block
# (process and process_file both forward the block):
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }

# Wrapper noise is stripped automatically:
SmarterJSON.process("Here is the JSON:\n\n```json\n{\"a\":1}\n```\n")  # => {"a"=>1}
SmarterJSON.process("<json>{\"a\":1}</json>")                          # => {"a"=>1}
SmarterJSON.process("first:\n{\"a\":1}\nsecond:\n{\"b\":2}")      # => [{"a"=>1}, {"b"=>2}]

Options

| Option | Default | Meaning |
| --- | --- | --- |
| symbolize_keys | false | return object keys as Symbols instead of Strings |
| duplicate_key | :last_wins | :last_wins / :first_wins / :raise for repeated keys in one object |
| bigdecimal_load | :auto | :auto keeps high-precision decimals as BigDecimal; :float forces Float; :bigdecimal forces BigDecimal |
| acceleration | true | true uses the C extension when compiled and loadable; false forces pure Ruby (identical results) |
| encoding | "UTF-8" | labels the input's encoding (no transcoding pass; see below) |
| on_warning | nil | a callable invoked once per lenient fix applied (:empty_slot, :empty_value, :duplicate_key), passed a SmarterJSON::Warning; the return value is never changed. See below. |

Warnings (on_warning)

When the parser quietly fixes something lenient — collapses an empty comma slot, reads a key with no value as null, drops a duplicate key, strips code fences, ignores wrapper prose, unwraps wrapper tags — it can tell you, without changing what process returns. Pass a callable as on_warning:; it is invoked once per fix with a SmarterJSON::Warning (type, message, line, col). It fires on every path, including the streaming block form. With no handler (the default) nothing is recorded and there is zero overhead.

# Collect them all:
warns = []
data  = SmarterJSON.process(input, on_warning: ->(w) { warns << w })

# Or route them — log, count, raise:
SmarterJSON.process(input, on_warning: ->(w) { Rails.logger.warn(w) })
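
A handler can also be a small object rather than an inline lambda. The sketch below tallies fixes by type and keeps one readable line per warning. It runs without the gem by using a stand-in Struct: `FakeWarning` and `WarningTally` are hypothetical names, and the warning messages are made up for illustration. Only the four field names (type, message, line, col) come from the description above.

```ruby
# Stand-in with the four fields named above (type, message, line, col).
# Hypothetical; real warnings are SmarterJSON::Warning objects.
FakeWarning = Struct.new(:type, :message, :line, :col)

# Tallies fixes by type and records a log line for each warning,
# so the caller can decide afterwards whether to log, count, or raise.
class WarningTally
  attr_reader :counts, :lines

  def initialize
    @counts = Hash.new(0)
    @lines  = []
  end

  def call(warning)
    @counts[warning.type] += 1
    @lines << "#{warning.type} at #{warning.line}:#{warning.col}: #{warning.message}"
  end
end

tally = WarningTally.new
tally.call(FakeWarning.new(:empty_slot, "empty slot between commas", 1, 8))   # made-up message
tally.call(FakeWarning.new(:duplicate_key, "duplicate key in object", 2, 3))  # made-up message
tally.counts # one entry per warning type seen
```

With the gem installed, the tally could be wired in as `on_warning: ->(w) { tally.call(w) }`. A pattern like this keeps the policy (ignore, log, or fail when `counts` is non-empty) outside the parse call.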

Performance

Benchmarks: p10 of 40 runs, Apple M1 Max, Ruby 3.4.7, on the standard JSON corpus (canada, citm_catalog, twitter, github_events, …). The apples-to-apples comparisons are SmarterJSON/C vs Oj/strict vs stdlib json, all producing Float (run rake report in json_benchmarks/ for the full table — numbers vary run to run).

  • vs Oj/strict (the JSON.parse-equivalent mode, both producing Float): SmarterJSON/C is faster on nearly every file — typically 1.1–1.6× (e.g. big_decimals ~1.6×, deeply-nested ~1.4×, citm / twitter / usgs ~1.3×, github / citylots / weather ~1.1–1.2×). The one exception is string_array, where Oj/strict's SIMD string scan is ~1.7× faster — that's the current frontier.
  • vs stdlib json (C): competitive with the fastest Ruby JSON parser — it ties json on big_decimals and string_array, and trails by ~1.1–1.7× on the rest. (canada.json is the outlier, far behind — that's the BigDecimal default, see below.)
  • Numbers: floats are parsed with Ryū (correctly rounded, single-pass), so number-heavy data is fast and bit-exact.
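
For readers who want to reproduce a "p10 of 40 runs" style figure on their own data, here is a minimal sketch using the stdlib json parser. It assumes p10 means the 10th percentile of per-run wall-clock times, computed with the nearest-rank method; the project's actual benchmark harness may compute it differently. The helper names `percentile` and `time_runs` and the generated payload are hypothetical.

```ruby
require "json"

# Nearest-rank percentile: the smallest sample such that at least
# pct percent of the samples are less than or equal to it.
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Runs the block `runs` times and returns the elapsed seconds of each run,
# measured with the monotonic clock so wall-clock adjustments cannot skew it.
def time_runs(runs)
  Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

# Illustrative payload; substitute a real corpus file for meaningful numbers.
payload = JSON.generate((1..1_000).map { |i| { "id" => i, "name" => "item #{i}" } })
timings = time_runs(40) { JSON.parse(payload) }
p10 = percentile(timings, 10)
```

A low percentile is less sensitive to GC pauses and scheduler noise than a mean, which is a common reason to report it for micro-benchmarks.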

Two notes on fair comparison:

  • NDJSON: on multi-document files, only SmarterJSON parses the input via plain process — Oj and json raise without a block, so their cells are N/A. That N/A reflects real default behavior, not a measurement gap. Plain process collects every document into an Array at ~270 MB/s; the block form yields each recovered document instead of returning the collected Array.
  • High-precision decimals (e.g. canada.json): SmarterJSON's default :auto mode preserves high-precision numbers as BigDecimal (matching Oj's default), which is intrinsically slower than Float. Against Float-producing parsers it looks slower on such files; pass bigdecimal_load: :float to compare like-for-like (it then runs much faster). Against the equivalent BigDecimal-producing Oj mode, SmarterJSON is faster.
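
The Float versus BigDecimal trade-off in the second note can be seen with plain Ruby and the stdlib json parser, independent of SmarterJSON. In this minimal sketch the literal is an illustrative value chosen to have more digits than a Float can hold; it is not taken from canada.json or any benchmark file.

```ruby
require "bigdecimal"
require "json"

# Illustrative 25-digit literal; more precision than a 64-bit Float can carry.
LITERAL = "0.1234567890123456789012345"

as_float   = Float(LITERAL)      # rounded to the nearest representable double
as_decimal = BigDecimal(LITERAL) # keeps every digit, at a higher cost per number

float_text   = as_float.to_s        # shortest text that round-trips the Float
decimal_text = as_decimal.to_s("F") # plain, non-scientific notation

# The stdlib parser makes the same choice available through decimal_class:
parsed = JSON.parse("[#{LITERAL}]", decimal_class: BigDecimal).first
```

Here `decimal_text` reproduces the input digits while `float_text` does not, which is why a BigDecimal-producing mode should be compared against other BigDecimal-producing modes, and a Float-producing mode against Float-producing ones.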

Encoding

encoding: (default "UTF-8") labels what the input is — it does not trigger a transcoding pass. The parser works on the bytes in their native encoding and emits string values with the same encoding tag, the same way smarter_csv handles encodings. Bytes that are invalid for the claimed encoding raise SmarterJSON::EncodingError (a kind of SmarterJSON::ParseError).
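
The difference between labeling and transcoding can be shown with core Ruby String methods alone. This sketch does not call SmarterJSON; `label` and `transcode` are hypothetical helper names, and `valid_encoding?` stands in for whatever check the parser performs before raising its encoding error.

```ruby
# "café" as ISO-8859-1 bytes: the final byte 0xE9 is "é" in that encoding.
LATIN1_BYTES = "caf\xE9".b.freeze

# Labeling: same bytes, different encoding tag. Nothing is rewritten.
def label(bytes, encoding)
  bytes.dup.force_encoding(encoding)
end

# Transcoding: the bytes themselves are rewritten for the target encoding.
def transcode(string, target)
  string.encode(target)
end

labeled    = label(LATIN1_BYTES, "ISO-8859-1") # 4 bytes, tagged ISO-8859-1
transcoded = transcode(labeled, "UTF-8")       # 5 bytes: "é" becomes 0xC3 0xA9
mislabeled = label(LATIN1_BYTES, "UTF-8")      # same 4 bytes, wrong tag

labeled.valid_encoding?    # the bytes are valid for the claimed encoding
mislabeled.valid_encoding? # 0xE9 alone is not valid UTF-8
```

The mislabeled case is the one the paragraph above warns about: claiming an encoding the bytes do not satisfy is an error, not something a labeling pass can repair.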

Nesting & untrusted input

Both the C extension and the pure-Ruby parser are iterative, not recursive — they track nesting on an explicit, heap-allocated stack rather than the call stack. So deeply nested input cannot overflow the call stack or segfault: nesting is bounded only by available memory, the same posture as Oj (which also ships no nesting limit; the stdlib json caps at 100). The deeply_nested.json benchmark (212 MB of nesting) parses without issue.
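
The explicit-stack technique can be illustrated with a small bracket walker. This is a sketch of the general idea, not SmarterJSON's parser: `max_depth`, `nested_arrays`, and `stdlib_accepts?` are hypothetical helpers, and the walker ignores strings, comments, and escapes for brevity. The stdlib call shows the default nesting cap mentioned above.

```ruby
require "json"

CLOSER_FOR = { "[" => "]", "{" => "}" }.freeze

# Walks the text once, pushing the expected closer onto a heap-allocated Array
# instead of recursing. Returns the maximum nesting depth, or raises on a
# mismatched or unclosed bracket. Strings and comments are not handled.
def max_depth(text)
  stack = []
  deepest = 0
  text.each_char do |ch|
    if CLOSER_FOR.key?(ch)
      stack.push(CLOSER_FOR[ch])
      deepest = stack.length if stack.length > deepest
    elsif ch == "]" || ch == "}"
      expected = stack.pop
      raise ArgumentError, "mismatched #{ch.inspect}" unless expected == ch
    end
  end
  raise ArgumentError, "unclosed bracket" unless stack.empty?
  deepest
end

def nested_arrays(depth)
  ("[" * depth) + ("]" * depth)
end

# Whether the stdlib parser accepts the text under its default nesting limit.
def stdlib_accepts?(text)
  JSON.parse(text)
  true
rescue JSON::NestingError
  false
end

max_depth(nested_arrays(200_000))   # depth lives in an Array, not the call stack
stdlib_accepts?(nested_arrays(200)) # beyond the stdlib's default limit
```

Because depth is tracked in a growable Array, the only limit in `max_depth` is memory, which mirrors the trade-off the parser makes.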

The trade-off: there is currently no fixed nesting or input-size limit, so extremely large or adversarially nested untrusted input can exhaust RAM rather than crash the parser. A hard cap is planned as an opt-in guard; for now, if you parse untrusted input, enforce a size limit upstream of the parser.
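
One way to enforce a size limit upstream is a small guard that runs before any parser sees the input. The sketch below is an assumption about how an application might do this, not a SmarterJSON feature: `InputTooLarge`, `MAX_BYTES`, `check_size!`, and `read_capped` are all hypothetical names, and the 10 MiB budget is arbitrary.

```ruby
require "stringio"

class InputTooLarge < StandardError; end # hypothetical error class

MAX_BYTES = 10 * 1024 * 1024 # assumed budget; tune for your service

# Rejects oversized input before it reaches the parser; returns it otherwise.
def check_size!(input, max_bytes: MAX_BYTES)
  size = input.bytesize
  raise InputTooLarge, "input is #{size} bytes, limit is #{max_bytes}" if size > max_bytes
  input
end

# Reads at most max_bytes + 1 bytes from an IO, so an oversized upload is
# detected without buffering all of it in memory.
def read_capped(io, max_bytes: MAX_BYTES)
  data = io.read(max_bytes + 1) || ""
  check_size!(data, max_bytes: max_bytes)
end

safe_text = read_capped(StringIO.new('{"a": 1}'), max_bytes: 1024)
# safe_text can now be handed to the parser of your choice.
```

Reading one byte past the limit is what lets `read_capped` distinguish "exactly at the limit" from "over the limit" without consuming the rest of the stream.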

Development

After checking out the repo, run bin/setup to install dependencies, then rake compile to build the C extension and rake spec to run the tests. The test suite runs every example against both the C and pure-Ruby paths, so the two stay behavior-identical.

License

Available as open source under the terms of the MIT License.