SmarterJSON
A lenient, fast JSON parser for Ruby. It parses strict JSON, JSON5, HJSON-style config, and the messy JSON-ish input humans actually write, and in our benchmarks it matches or beats Oj on nearly every file. SmarterJSON is opinionated: we want your JSON processing to succeed. Strict parsers stop at the first deviation; SmarterJSON keeps going. It optimizes for getting your data out, not for policing the JSON spec.
SmarterJSON: one parser, no modes. Want strict? Use the stdlib `json` gem.
Why SmarterJSON?
Most JSON parsers reject anything that isn't perfectly strict JSON. SmarterJSON is built on the opposite principle: you shouldn't have to care which flavor of JSON you were handed, and you shouldn't lose the whole document to a formatting error. Give it strict JSON, JSON5, an HJSON-style config file, newline-delimited JSON, or a copy-pasted blob with comments and trailing commas, and it parses it. Leniency never drops data that exists: SmarterJSON simply doesn't object to a gap that carries no data (like an extra comma). A strict parser would refuse the whole document and recover nothing; SmarterJSON returns everything except the formatting error.
For an ingestion tool, "reject the whole document because of one stray comma" is the worst outcome: you throw away the 99% that is fine to avoid possibly mishandling a gap that carries no data anyway.
Three things set it apart:
- One parser, no modes, no flags. There is no `dialect:` option and no "strict mode": `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply its narrowest case. You don't configure the parser to match your input; it adapts to whatever you give it.
- It parses multi-document input automatically. `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with no block and no special method: one document returns its value, several documents return an `Array`, and empty input returns `nil`. Of the parsers benchmarked here, only SmarterJSON parses multi-document input via plain `process`; Oj and the stdlib `json` library raise without a block. For input larger than memory, pass a block to stream one document at a time.
- It's fast. A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark and makes it competitive with the stdlib `json` C parser, the fastest general-purpose Ruby JSON parser.
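The return-shape rule (nothing, one value, or an `Array`) can be emulated with the stdlib `json` gem to make it concrete. This is a minimal sketch, not SmarterJSON's implementation: it assumes strictly newline-delimited input with one document per line, skips blank lines, and handles neither concatenated documents nor lenient syntax. The helper name `parse_ndjson_like_process` is hypothetical.

```ruby
require "json"

# Hypothetical helper: emulates the documented return shape of
# SmarterJSON.process, for strictly newline-delimited input only.
def parse_ndjson_like_process(input)
  docs = input.each_line
              .map(&:strip)
              .reject(&:empty?)
              .map { |line| JSON.parse(line) }

  case docs.size
  when 0 then nil        # zero documents -> nil
  when 1 then docs.first # one document   -> the value itself
  else docs              # several        -> an Array of values
  end
end
```

One consequence of this contract: a single document that is itself an array and several documents both come back as an `Array`. When the difference matters, the block form avoids the ambiguity, since each yield is exactly one document.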
What it accepts, beyond strict JSON
- `//`, `/* … */`, and `#` comments (a `#` or `//` only starts a comment when preceded by whitespace, so `url: http://x.com` parses as a string, not a truncated value)
- Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
- Implicit root object: a config file that starts with `key: value`, no outer `{}`
- `NaN`, `Infinity`, hex (`0xFF`), leading `+`/`.`, underscores in numbers (`1_000_000`)
- UTF-8 BOM, smart/curly quotes, Python literals (`True`/`False`/`None`), JavaScript `undefined`
- Mixed CR / LF / CRLF line endings, and any Ruby-supported input encoding (via `encoding:`)
- Duplicate keys (last value wins by default; configurable)
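The whitespace rule for `#` and `//` can be illustrated with a small sketch. The helper `strip_line_comment` is hypothetical and deliberately simplified: it works on a single line, ignores quoted strings and `/* … */` blocks, and treats the start of a line as whitespace-preceded (an assumption). It is not SmarterJSON's scanner.

```ruby
# Hypothetical helper: removes a trailing `#` or `//` comment from one line,
# but only when the marker is at the start of the line or follows whitespace.
# Simplification: quoted strings and /* ... */ blocks are not handled.
def strip_line_comment(line)
  i = 0
  while i < line.length
    marker = line[i] == "#" || line[i, 2] == "//"
    preceded_ok = i.zero? || line[i - 1].match?(/\s/)
    return line[0...i].rstrip if marker && preceded_ok
    i += 1
  end
  line
end
```

With this rule, `strip_line_comment("url: http://x.com")` returns the line unchanged (the `//` follows a `:`), while `strip_line_comment("port: 5432 # default")` returns `"port: 5432"`.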
SmarterJSON raises only on genuinely unparseable input (an unterminated string, a mismatched bracket), with the line and column in the message. It never raises on valid-but-lenient input.
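Reporting a line and column means mapping an offset in the input back to a position. A minimal sketch, assuming 1-based lines and columns and counting `\n`, `\r\n`, and a lone `\r` each as one line break (to match the mixed line endings listed above). `line_and_column` is a hypothetical helper, not part of SmarterJSON's API, and it counts characters rather than bytes.

```ruby
# Hypothetical helper: converts a 0-based character offset into a 1-based
# [line, column] pair. "\r\n" counts as one break, as do lone "\r" and "\n".
def line_and_column(text, offset)
  line = 1
  column = 1
  i = 0
  while i < offset
    ch = text[i]
    if ch == "\r" && text[i + 1] == "\n"
      # CRLF: skip the CR here so the following LF counts the break once.
      i += 1
      next
    elsif ch == "\n" || ch == "\r"
      line += 1
      column = 1
    else
      column += 1
    end
    i += 1
  end
  [line, column]
end
```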
Installation
```ruby
# Gemfile
gem "smarter_json"
```

Or install it directly:

```shell
gem install smarter_json
```
The C extension is built on install and used automatically. On platforms where it can't build, the pure-Ruby parser runs instead and produces identical results.
Documentation
Usage
```ruby
require "smarter_json"

SmarterJSON.process('{"a": 1, "b": [2, 3]}')           # => {"a"=>1, "b"=>[2, 3]}
SmarterJSON.process("host: localhost\nport: 5432")    # => {"host"=>"localhost", "port"=>5432} (no braces needed)
SmarterJSON.process_file("config.json5")              # read a file, then parse

# Multiple documents (NDJSON / JSONL / concatenated): no block, no special method.
SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3}))  # => [{"id"=>1}, {"id"=>2}, {"id"=>3}]
SmarterJSON.process('{"id":1}')                       # => {"id"=>1} (one document → the value itself)
SmarterJSON.process("")                               # => nil (zero documents)

# For input larger than memory, stream one document at a time with a block
# (process and process_file both forward the block):
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
```
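The block form's memory profile comes from yielding each document as soon as it is parsed instead of collecting an Array. A minimal stdlib sketch of that idea, assuming one JSON document per line; `each_document` is a hypothetical helper, and returning an Enumerator when called without a block is a common Ruby convention, not documented SmarterJSON behavior.

```ruby
require "json"
require "stringio"

# Hypothetical helper: yields one parsed document per non-blank line, so only
# the current line is held in memory (plus whatever the block keeps).
def each_document(io)
  return enum_for(:each_document, io) unless block_given?

  io.each_line do |line|
    line = line.strip
    next if line.empty?
    yield JSON.parse(line)
  end
  nil
end

io = StringIO.new(%({"id":1}\n\n{"id":2}\n))
each_document(io) { |doc| puts doc["id"] } # prints 1, then 2
```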
Options
| option | default | meaning |
|---|---|---|
| `symbolize_keys` | `false` | return object keys as Symbols instead of Strings |
| `duplicate_key` | `:last_wins` | `:last_wins` / `:first_wins` / `:raise` for repeated keys in one object |
| `bigdecimal_load` | `:auto` | `:auto` keeps high-precision decimals as `BigDecimal`; `:float` forces `Float`; `:bigdecimal` forces `BigDecimal` |
| `acceleration` | `true` | `true` uses the C extension when it is compiled and loadable; `false` forces pure Ruby (identical results) |
| `encoding` | `"UTF-8"` | labels the input's encoding (no transcoding pass; see Encoding below) |
| `warnings` | `false` | when `true`, returns `[result, warnings]`, where `warnings` lists the lenient fixes applied (`:empty_slot`, `:empty_value`, `:duplicate_key`) |
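The three `duplicate_key` policies differ only in what happens when a key appears a second time in the same object. A sketch of that decision, using a hypothetical `store_pair` helper and a hypothetical `DuplicateKeyError` class; recording `:duplicate_key` in an optional warnings array for the two non-raising policies is an assumption about how the `warnings` option interacts with them.

```ruby
# Hypothetical error class used only in this sketch.
class DuplicateKeyError < StandardError; end

# Hypothetical helper: stores key/value in an object under construction,
# applying one of the three documented duplicate_key policies.
def store_pair(object, key, value, policy: :last_wins, warnings: nil)
  unless object.key?(key)
    object[key] = value
    return object
  end

  case policy
  when :last_wins  then object[key] = value
  when :first_wins then nil # keep the value that was stored first
  when :raise      then raise DuplicateKeyError, "duplicate key #{key.inspect}"
  else raise ArgumentError, "unknown duplicate_key policy: #{policy.inspect}"
  end
  warnings&.push(:duplicate_key) # note the lenient fix, if the caller tracks them
  object
end
```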
Performance
Benchmarks: p10 (10th-percentile time) of 40 runs, Apple M1 Max, Ruby 3.4.7, on the standard JSON corpus (canada, citm_catalog, twitter, github_events, …). The apples-to-apples comparisons are SmarterJSON/C vs Oj/strict vs stdlib `json`, all producing `Float` (run `rake report` in `json_benchmarks/` for the full table; numbers vary from run to run).
- vs Oj/strict (the `JSON.parse`-equivalent mode, both producing `Float`): SmarterJSON/C is faster on nearly every file, typically 1.1–1.6× (e.g. big_decimals ~1.6×, deeply_nested ~1.4×, citm / twitter / usgs ~1.3×, github / citylots / weather ~1.1–1.2×). The one exception is string_array, where Oj/strict's SIMD string scan is ~1.7× faster; that's the current frontier.
- vs stdlib `json` (C): competitive with the fastest Ruby JSON parser. It ties `json` on big_decimals and string_array and trails by ~1.1–1.7× on the rest. (`canada.json` is the outlier, far behind; that's the `BigDecimal` default, see below.)
- Numbers: floats are parsed with Ryū (correctly rounded, single-pass), so number-heavy data is fast and bit-exact.
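For readers reproducing the numbers, "p10 of 40 runs" means the 10th-percentile wall-clock time across 40 timed runs. A minimal sketch using the nearest-rank percentile definition (an assumption; the harness in `json_benchmarks/` may compute it differently) and Ruby's monotonic clock; `percentile` and `p10_seconds` are hypothetical helpers.

```ruby
# Hypothetical helper: nearest-rank percentile; pct is an Integer 0..100.
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct * sorted.size + 99) / 100 # integer ceiling of pct% of n
  sorted[[rank, 1].max - 1]
end

# Hypothetical helper: times the block `runs` times and returns the p10 in seconds.
def p10_seconds(runs: 40)
  samples = Array.new(runs) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end
  percentile(samples, 10)
end
```

With 40 runs, the nearest rank for p10 is 4, so the reported time is the fourth-fastest run.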
Two notes on fair comparison:
- NDJSON: on multi-document files, only SmarterJSON parses the input via plain `process`; Oj and `json` raise without a block, so their cells are N/A. That N/A reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the streaming block form runs faster (~440 MB/s) because it doesn't hold all documents in memory at once. Use it for input larger than RAM.
- High-precision decimals (e.g. `canada.json`): SmarterJSON's default `:auto` mode preserves high-precision numbers as `BigDecimal` (matching Oj's default), which is intrinsically slower than `Float`. Against `Float`-producing parsers it therefore looks slower on such files; pass `bigdecimal_load: :float` to compare like-for-like (it then runs much faster). Against the equivalent `BigDecimal`-producing Oj mode, SmarterJSON is faster.
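This README doesn't spell out how `:auto` decides that a decimal is "high-precision". One plausible rule, shown here as a hypothetical sketch rather than SmarterJSON's actual heuristic, is a round-trip test: a literal needs `BigDecimal` if converting it to `Float` and back to Ruby's shortest decimal form changes its exact value. The sketch compares with `Rational` so it runs on core Ruby alone; `needs_bigdecimal?` is a hypothetical name.

```ruby
# Hypothetical helper: true when a finite decimal literal cannot be stored
# as a Float without changing its exact decimal value.
def needs_bigdecimal?(literal)
  exact = Rational(literal)                # exact value of the text
  shortest = Rational(Float(literal).to_s) # exact value of the Float's shortest repr
  exact != shortest
end
```

Under this rule, `"0.1"` and `"2.5"` stay `Float`, while a 19-digit literal such as `"3.141592653589793238"` would be kept as `BigDecimal`.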
Encoding
`encoding:` (default `"UTF-8"`) labels what the input is; it does not trigger a transcoding pass. The parser works on the bytes in their native encoding and emits string values with the same encoding tag, the same way smarter_csv handles encodings. Bytes that are invalid for the claimed encoding raise `SmarterJSON::EncodingError` (a kind of `SmarterJSON::ParseError`).
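The difference between labeling and transcoding maps onto core Ruby string methods: `force_encoding` changes only the tag, while `encode` rewrites the bytes. A sketch of the documented behavior using a hypothetical `label_input` helper and a stand-in error class (the real error is `SmarterJSON::EncodingError`, which this sketch does not load):

```ruby
# Stand-in for SmarterJSON::EncodingError, defined so the sketch runs alone.
class InvalidInputEncoding < StandardError; end

# Hypothetical helper: tags the bytes with the claimed encoding without
# converting them, and rejects byte sequences that are invalid for it.
def label_input(bytes, encoding = "UTF-8")
  labeled = bytes.dup.force_encoding(encoding)
  unless labeled.valid_encoding?
    raise InvalidInputEncoding, "input is not valid #{encoding}"
  end
  labeled
end
```

Because nothing is transcoded, the labeled string has exactly the same bytes as the input; only its `encoding` changes.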
Nesting & untrusted input
Both the C extension and the pure-Ruby parser are iterative, not recursive: they track nesting on an explicit, heap-allocated stack rather than the call stack. Deeply nested input therefore cannot overflow the call stack or segfault; nesting is bounded only by available memory. This is the same posture as Oj, which also ships no nesting limit (the stdlib `json` caps nesting at 100 by default). The `deeply_nested.json` benchmark (212 MB of nesting) parses without issue.
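The iterative approach can be illustrated with a bracket checker that keeps its nesting state in a heap-allocated Ruby Array instead of recursing. This is a hypothetical, simplified sketch, not SmarterJSON's parser: it skips double-quoted strings (with backslash escapes) and ignores every other token, but it shows why depth is limited by memory rather than by the call stack, and how a mismatched bracket gets reported.

```ruby
# Hypothetical helper: returns the maximum nesting depth of JSON-like text,
# raising on a mismatched or unclosed bracket. No recursion is used.
def max_depth(text)
  pairs = { "}" => "{", "]" => "[" }
  stack = [] # explicit nesting stack, allocated on the heap
  deepest = 0
  in_string = false
  escaped = false

  text.each_char.with_index do |ch, i|
    if in_string
      if escaped then escaped = false
      elsif ch == "\\" then escaped = true
      elsif ch == '"' then in_string = false
      end
    elsif ch == '"'
      in_string = true
    elsif ch == "{" || ch == "["
      stack.push(ch)
      deepest = stack.size if stack.size > deepest
    elsif pairs.key?(ch)
      unless stack.pop == pairs[ch]
        raise ArgumentError, "mismatched #{ch} at offset #{i}"
      end
    end
  end
  raise ArgumentError, "unterminated string" if in_string
  raise ArgumentError, "unclosed #{stack.last}" unless stack.empty?
  deepest
end
```

Input nested 100,000 levels deep is handled by the same loop; a recursive descent parser would need one call frame per level.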
The trade-off: there is currently no fixed nesting or input-size limit, so extremely large or adversarially nested untrusted input won't crash the parser, but it can exhaust RAM. If you parse untrusted input and want a hard cap, that's a planned opt-in guard; for now, size-limit upstream of the parser.
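One way to size-limit upstream is to read at most one byte more than your cap and reject anything longer before it reaches the parser. A minimal sketch; `read_capped` and `InputTooLarge` are hypothetical names, and the cap value is up to you.

```ruby
require "stringio"

class InputTooLarge < StandardError; end

# Hypothetical helper: reads at most max_bytes from an IO-like object,
# raising instead of buffering an oversized payload in full.
def read_capped(io, max_bytes)
  data = io.read(max_bytes + 1) || "" # read(n) returns nil at EOF
  if data.bytesize > max_bytes
    raise InputTooLarge, "input exceeds #{max_bytes} bytes"
  end
  data
end
```

`IO#read` with a length returns bytes tagged `ASCII-8BIT`, so relabel them (for example with `force_encoding("UTF-8")`) before parsing.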
Development
After checking out the repo, run `bin/setup` to install dependencies, then `rake compile` to build the C extension and `rake spec` to run the tests. The test suite runs every example against both the C and pure-Ruby paths, so the two stay behavior-identical.
License
Available as open source under the terms of the MIT License.