# SmarterJSON
A lenient, fast JSON parser for Ruby. It parses strict JSON, JSON5, HJSON-style config, and the messy JSON-ish input humans actually write, and in benchmarks it matches or beats Oj on nearly every file. SmarterJSON is opinionated: we want your JSON processing to succeed. Strict parsers stop at the first deviation; SmarterJSON keeps going. It optimizes for getting your data out, not for policing the JSON spec.
SmarterJSON: one parser, no modes. Want strict parsing? Use the stdlib `json` gem.
## Why SmarterJSON?
Most JSON parsers reject anything that isn't perfectly strict JSON. SmarterJSON is built on the opposite principle: you shouldn't have to care what flavor of JSON you were handed, and you shouldn't lose the whole document because of a formatting error. Give it strict JSON, JSON5, an HJSON-style config file, newline-delimited JSON, or a copy-pasted blob with comments and trailing commas, and it parses it. Leniency never means dropping data that exists: when smarter_json is lenient, it is only declining to object to a suspicious gap (like an extra comma). A strict parser would refuse the whole document and recover nothing; smarter_json returns everything except the formatting error.
For an ingestion tool, rejecting the whole document over one stray comma is the worst outcome: you throw away the 99% that's fine to avoid possibly mishandling a gap that carries no data anyway.
Three things set it apart:
- **One parser, no modes, no flags.** There is no `dialect:` option and no "strict mode": `SmarterJSON.process(input)` accepts the whole superset, and strict JSON is simply the narrowest case. You don't configure the parser to match your input; it adapts to whatever you give it.
- **Multi-document input is parsed automatically**, a distinguishing feature. `SmarterJSON.process` handles NDJSON / JSONL / concatenated JSON with no block and no special method: one document returns its value, several documents return an `Array`, and empty input returns `nil`. The same rule applies when wrapper noise is stripped and several payloads are recovered from one blob. Only SmarterJSON parses multi-document input via plain `process`; Oj and the stdlib `json` library raise without a block. For input larger than memory, pass a block to stream one document at a time.
- **It's fast.** A C extension (with a pure-Ruby fallback that runs everywhere) puts it ahead of Oj on nearly every file we benchmark and makes it competitive with the stdlib `json` C parser, the fastest general-purpose Ruby JSON parser.
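The zero / one / many return-shape rule for multi-document input can be illustrated with a small stdlib-only sketch. The helper `parse_lines` is hypothetical and assumes strictly one JSON document per non-blank line; it does not reproduce how SmarterJSON splits concatenated or wrapped input, only the mapping from document count to return value described above.

```ruby
require "json"

# Hypothetical sketch of the documented return-shape rule, using stdlib JSON.
# Assumes one complete JSON document per non-blank line (NDJSON / JSONL);
# SmarterJSON itself also handles concatenated and wrapped documents.
def parse_lines(text)
  docs = text.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
  case docs.size
  when 0 then nil          # zero documents
  when 1 then docs.first   # one document: the value itself
  else docs                # several documents: an Array of values
  end
end

parse_lines(%({"id":1}\n{"id":2}))  # => [{"id"=>1}, {"id"=>2}]
parse_lines(%({"id":1}))            # => {"id"=>1}
parse_lines("")                     # => nil
```

One consequence of the rule, visible in the sketch: from the return value alone, a single document whose top-level value is an Array looks the same as several documents. When that distinction matters, the block form, which handles one document at a time, avoids the ambiguity.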
## What it accepts, beyond strict JSON
- `//`, `/* … */`, and `#` comments (a `#` or `//` only starts a comment when preceded by whitespace, so `url: http://x.com` parses as a string, not a truncated value)
- Markdown-wrapped / chatty blobs around the payload: strips `` ```json `` / `` ``` `` fences, ignores obvious prose before/after the payload, unwraps `<json>...</json>` and `BEGIN_JSON ... END_JSON`, and preserves multiple recovered payloads as an Array
- Trailing commas; unquoted keys (`{host: localhost}`); single-quoted, triple-quoted (`'''…'''`), and quoteless string values
- Implicit root object: a config file that starts with `key: value`, no outer `{}`
- `NaN`, `Infinity`, hex (`0xFF`), leading `+`/`.`, underscores in numbers (`1_000_000`)
- UTF-8 BOM, smart/curly quotes, Python literals (`True`/`False`/`None`), JavaScript `undefined`
- Mixed CR / LF / CRLF line endings, and any Ruby-supported input encoding (via `encoding:`)
- Duplicate keys (last value wins by default; configurable)
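The whitespace rule for `#` and `//` is what keeps URLs intact. Below is a hypothetical one-line-at-a-time sketch of that rule in plain Ruby; `strip_line_comment` is not part of SmarterJSON. It assumes a comment marker also counts at the very start of a line, and it treats any `'` as opening a single-quoted string, so a quoteless value containing an apostrophe would confuse it. A real scanner needs more context than one line.

```ruby
# Hypothetical sketch: `#` or `//` starts a comment only at the start of a
# line or right after whitespace, and never inside a quoted string.
def strip_line_comment(line)
  quote = nil # quote character of the string we are inside, if any
  i = 0
  while i < line.length
    ch = line[i]
    if quote
      if ch == "\\"
        i += 2 # skip the backslash and the character it escapes
        next
      end
      quote = nil if ch == quote
    elsif ch == '"' || ch == "'"
      quote = ch
    elsif (ch == "#" || line[i, 2] == "//") && (i.zero? || line[i - 1].match?(/\s/))
      return line[0...i].rstrip # comment found: keep only what precedes it
    end
    i += 1
  end
  line
end

strip_line_comment("url: http://x.com")    # => "url: http://x.com"
strip_line_comment("port: 5432 # db port") # => "port: 5432"
```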
It raises only on genuinely unparseable input (unterminated string, mismatched bracket), with line and column in the message — never on valid-but-lenient input.
## Installation
Add it to your Gemfile:
```ruby
# Gemfile
gem "smarter_json"
```
or install it directly:
```sh
gem install smarter_json
```
The C extension is built on install and used automatically. On platforms where it can't build, the pure-Ruby parser runs instead and produces identical results.
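The build-time fallback and the `acceleration:` option (see Options) follow a common Ruby pattern: try to load a compiled extension and fall back to pure Ruby on `LoadError`. A minimal sketch of that pattern; the module `FallbackDemo` and the extension name `fallback_demo_ext` are hypothetical, not SmarterJSON's internals.

```ruby
# Generic "native extension with pure-Ruby fallback" pattern, sketched with a
# hypothetical extension name; it is not SmarterJSON's actual file layout.
module FallbackDemo
  begin
    require "fallback_demo_ext" # hypothetical compiled extension
    NATIVE_AVAILABLE = true
  rescue LoadError
    NATIVE_AVAILABLE = false    # build failed or unsupported platform
  end

  # Mirrors the documented `acceleration:` semantics: true means "use native
  # code if it is compiled and loadable"; false always selects pure Ruby.
  def self.backend(acceleration: true)
    acceleration && NATIVE_AVAILABLE ? :native : :pure_ruby
  end
end

FallbackDemo.backend(acceleration: false) # => :pure_ruby
```

Because the extension in this sketch does not exist, `require` raises `LoadError` and the pure-Ruby path is selected even with `acceleration: true`, which is exactly the fallback case.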
## API stability and thread safety
The public API is now considered stable: `SmarterJSON.process`, `SmarterJSON.process_file`, `SmarterJSON.generate`, and the documented options in this README/docs are the supported surface.
Concurrent calls are safe. The parser/generator keep per-call state local, and the C extension only caches Ruby IDs / constants at load time; it does not share mutable parse state across calls.
## Documentation
## Usage
```ruby
require "smarter_json"

SmarterJSON.process('{"a": 1, "b": [2, 3]}')        # => {"a"=>1, "b"=>[2, 3]}
SmarterJSON.process("host: localhost\nport: 5432")   # => {"host"=>"localhost", "port"=>5432} (no braces needed)
SmarterJSON.process_file("config.json5")             # read a file, then parse

# Multiple documents (NDJSON / JSONL / concatenated): no block, no special method
SmarterJSON.process(%({"id":1}\n{"id":2}\n{"id":3}))  # => [{"id"=>1}, {"id"=>2}, {"id"=>3}]
SmarterJSON.process('{"id":1}')                      # => {"id"=>1} (one document → the value itself)
SmarterJSON.process("")                              # => nil (zero documents)

# For input larger than memory, stream one document at a time with a block
# (process and process_file both forward the block):
SmarterJSON.process_file("events.ndjson") { |event| EventJob.perform_async(event) }
```
````ruby
# Wrapper noise is stripped automatically:
SmarterJSON.process(<<~TEXT)
  Here is the JSON:
  ```json
  {
    "a": 1
  }
  ```
TEXT
# => {"a"=>1}

SmarterJSON.process(<<~TEXT)
  Here is the result:
  { "a": 1 }
  Hope this helps.
TEXT
# => {"a"=>1}
````
```ruby
SmarterJSON.process("<json>{\"a\": 1}</json>")
# => {"a"=>1}
```
```ruby
# Several recovered payloads come back as an Array:
SmarterJSON.process(<<~TEXT)
  first attempt: {"a":1}
  corrected payload: {"b":2}
TEXT
# => [{"a"=>1}, {"b"=>2}]
```
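Fence and tag stripping can be pictured with a much-simplified sketch. `extract_payload`, `FENCE`, and `TAG` are hypothetical names; the sketch returns only the first fenced or tagged body and does none of the prose skipping or multi-payload recovery that SmarterJSON performs.

```ruby
require "json"

# Hypothetical, much-simplified wrapper stripping: return the body of the first
# markdown json fence or <json>...</json> tag, else the input unchanged.
FENCE = /`{3}(?:json)?[ \t]*\n(.*?)\n[ \t]*`{3}/m # /m lets . match newlines
TAG   = %r{<json>(.*?)</json>}m

def extract_payload(text)
  (text.match(FENCE) || text.match(TAG))&.captures&.first || text
end

chatty = "Here is the JSON:\n```json\n{\n  \"a\": 1\n}\n```\n"
JSON.parse(extract_payload(chatty)) # => {"a"=>1}
```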
### Options
| option | default | meaning |
|-------------------|--------------|-------------------------------------------------------------------------|
| `symbolize_keys` | `false` | return object keys as Symbols instead of Strings |
| `duplicate_key` | `:last_wins` | `:last_wins` / `:first_wins` / `:raise` for repeated keys in one object |
| `bigdecimal_load` | `:auto` | `:auto` keeps high-precision decimals as `BigDecimal`; `:float` forces `Float`; `:bigdecimal` forces `BigDecimal` |
| `acceleration` | `true` | `true` uses the C extension when compiled and loadable; `false` forces pure Ruby (identical results) |
| `encoding` | `"UTF-8"` | labels the input's encoding (no transcoding pass; see below) |
| `on_warning`      | `nil`        | a callable invoked once per lenient fix applied (e.g. `:empty_slot`, `:empty_value`, `:duplicate_key`) and passed a `SmarterJSON::Warning`; it never changes the return value. See below. |
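The three `duplicate_key` policies differ only in what happens when a key repeats inside one object. A hypothetical sketch of those semantics as the table describes them; `build_object` and `DuplicateKeyError` are illustrative names, not SmarterJSON's API.

```ruby
# Hypothetical illustration of the three duplicate_key policies, applied while
# building one object from (key, value) pairs in input order.
class DuplicateKeyError < StandardError; end

def build_object(pairs, duplicate_key: :last_wins)
  pairs.each_with_object({}) do |(key, value), obj|
    unless obj.key?(key)
      obj[key] = value
      next
    end

    case duplicate_key
    when :last_wins  then obj[key] = value # overwrite; key keeps its first position
    when :first_wins then nil              # keep the value already stored
    when :raise      then raise DuplicateKeyError, "duplicate key #{key.inspect}"
    else raise ArgumentError, "unknown duplicate_key policy: #{duplicate_key.inspect}"
    end
  end
end

pairs = [["a", 1], ["b", 2], ["a", 3]]
build_object(pairs)                             # => {"a"=>3, "b"=>2}
build_object(pairs, duplicate_key: :first_wins) # => {"a"=>1, "b"=>2}
```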
### Warnings (`on_warning`)
When the parser quietly fixes something lenient — collapses an empty comma slot, reads a key with no value as `null`, drops a duplicate key, strips code fences, ignores wrapper prose, unwraps wrapper tags — it can tell you, without changing what `process` returns. Pass a callable as `on_warning:`; it is invoked once per fix with a `SmarterJSON::Warning` (`type`, `message`, `line`, `col`). It fires on every path, including the streaming block form. With no handler (the default) nothing is recorded and there is zero overhead.
```ruby
# Collect them all:
warns = []
data = SmarterJSON.process(input, on_warning: ->(w) { warns << w })

# Or route them (log, count, raise):
SmarterJSON.process(input, on_warning: ->(w) { Rails.logger.warn("#{w.type} at #{w.line}:#{w.col}: #{w.message}") })
```
## Performance
Benchmarks: p10 (10th-percentile time) of 40 runs, Apple M1 Max, Ruby 3.4.7, on the standard JSON corpus (canada, citm_catalog, twitter, github_events, …). The apples-to-apples comparisons are SmarterJSON/C vs Oj/strict vs stdlib `json`, all producing `Float`. Run `rake report` in `json_benchmarks/` for the full table; numbers vary from run to run.
- **vs Oj/strict** (the `JSON.parse`-equivalent mode, both producing `Float`): SmarterJSON/C is faster on nearly every file, typically 1.1–1.6× (e.g. big_decimals ~1.6×, deeply-nested ~1.4×, citm / twitter / usgs ~1.3×, github / citylots / weather ~1.1–1.2×). The one exception is string_array, where Oj/strict's SIMD string scan is ~1.7× faster; that's the current frontier.
- **vs stdlib `json` (C)**: competitive with the fastest Ruby JSON parser. SmarterJSON ties `json` on big_decimals and string_array and trails it by ~1.1–1.7× on the rest. (`canada.json` is the outlier and falls far behind; that's due to the `BigDecimal` default, see below.)
- **Numbers**: floats are parsed with Ryū (correctly rounded, single-pass), so number-heavy data is fast and bit-exact.
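For readers reproducing numbers of this kind, a hypothetical mini-harness shows what "p10 of 40 runs" and an MB/s figure mean in practice. It times the stdlib `json` parser on a synthetic document and assumes the nearest-rank definition of a percentile; it is not the project's `rake report`, the benchmark suite may compute p10 differently, and its output depends entirely on your machine.

```ruby
require "json"

# Hypothetical mini-harness: time a block `runs` times (monotonic clock).
def timings(runs)
  Array.new(runs) do
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
  end
end

# Nearest-rank percentile (an assumption about how "p10" is computed):
# the ceil(pct/100 * n)-th smallest value.
def nearest_rank(values, pct)
  sorted = values.sort
  rank = (Rational(pct, 100) * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Synthetic document; the real benchmarks use corpus files such as canada.json.
doc = JSON.generate(Array.new(10_000) { |i| { "id" => i, "score" => i * 0.5 } })
p10 = nearest_rank(timings(40) { JSON.parse(doc) }, 10)
puts format("p10: %.5f s, %.1f MB/s", p10, doc.bytesize / p10 / 1_000_000.0)
```

With 40 runs, nearest-rank p10 is the 4th-fastest run: less noisy than the single best run, but still filtering out runs disturbed by GC or the OS.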
Two notes on fair comparison:
- **NDJSON**: on multi-document files, only SmarterJSON parses the input via plain `process`; Oj and `json` raise without a block, so their cells are `N/A`. That `N/A` reflects real default behavior, not a measurement gap. Plain `process` collects every document into an Array at ~270 MB/s; the streaming block form runs faster (~440 MB/s) because it doesn't hold all documents in memory at once.
- **High-precision decimals** (e.g. `canada.json`): SmarterJSON's default `:auto` mode preserves high-precision numbers as `BigDecimal` (matching Oj's default), which is intrinsically slower than `Float`. Against `Float`-producing parsers it therefore looks slower on such files; pass `bigdecimal_load: :float` to compare like-for-like (it then runs much faster). Against the equivalent `BigDecimal`-producing Oj mode, SmarterJSON is faster.
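The README does not say how `:auto` decides that a decimal is high-precision. One plausible criterion, sketched here purely as an assumption, is whether converting to `Float` loses information. The sketch uses core `Rational` as a stand-in for `BigDecimal` so it runs without extra gems; `float_is_lossless?` is a hypothetical name.

```ruby
# Hypothetical test for "would Float lose information for this literal?".
# Float#to_s prints the shortest decimal that round-trips to the same Float,
# so if that decimal differs from the literal's exact value, Float is lossy.
# Rational (core Ruby) stands in for BigDecimal to keep the sketch dependency-free.
def float_is_lossless?(literal)
  f = Float(literal)
  return false unless f.finite? # overflow to Infinity always loses the value
  Rational(f.to_s) == Rational(literal)
end

float_is_lossless?("0.1")                   # => true  (prints back as "0.1")
float_is_lossless?("-65.61361050724983215") # => false (19 significant digits)
```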
## Encoding
`encoding:` (default `"UTF-8"`) labels what the input is; it does not trigger a transcoding pass. The parser works on the bytes in their native encoding and emits string values with the same encoding tag, the same way smarter_csv handles encodings. Bytes that are invalid for the claimed encoding raise `SmarterJSON::EncodingError` (a kind of `SmarterJSON::ParseError`).
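Labeling without transcoding maps onto two core `String` methods: `force_encoding` re-tags the bytes and `valid_encoding?` checks them. A minimal sketch, assuming the parser behaves like this at its entry point; `label_input` and `LabelError` (a stand-in for `SmarterJSON::EncodingError`) are hypothetical.

```ruby
# Hypothetical sketch of "label, validate, don't transcode": the bytes are
# re-tagged with the claimed encoding and checked, but never converted.
class LabelError < StandardError; end

def label_input(input, encoding = "UTF-8")
  labeled = input.dup.force_encoding(encoding) # same bytes, new tag
  unless labeled.valid_encoding?
    raise LabelError, "input is not valid #{labeled.encoding}"
  end
  labeled
end

latin1 = label_input("caf\xE9".b, "ISO-8859-1")
latin1.encoding.name # => "ISO-8859-1"
latin1.bytesize      # => 4 (transcoding to UTF-8 would have made it 5)
```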
## Nesting & untrusted input
Both the C extension and the pure-Ruby parser are iterative, not recursive: they track nesting on an explicit, heap-allocated stack rather than the call stack. Deeply nested input therefore cannot overflow the call stack or segfault; nesting is bounded only by available memory. That is the same posture as Oj, which also ships no nesting limit (the stdlib `json` caps it at 100). The `deeply_nested.json` benchmark (212 MB of nesting) parses without issue.
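The difference between a recursive and an iterative design shows up even in a bracket matcher. This hypothetical sketch (`max_nesting_depth` is not a SmarterJSON method) keeps open brackets on a Ruby Array, so a 100,000-level input costs a 100,000-element Array rather than 100,000 stack frames. It skips double-quoted strings and reports mismatched or unclosed brackets, but does not parse values.

```ruby
# Hypothetical iterative bracket scanner: nesting lives on an explicit Array
# (heap memory), not on the call stack, so depth is limited only by memory.
def max_nesting_depth(text)
  stack = []        # open brackets, innermost last
  deepest = 0
  in_string = false
  escaped = false

  text.each_char do |ch|
    if in_string
      if escaped
        escaped = false # this character was escaped; ignore it
      elsif ch == "\\"
        escaped = true
      elsif ch == '"'
        in_string = false
      end
      next
    end

    case ch
    when '"'
      in_string = true
    when "[", "{"
      stack.push(ch)
      deepest = stack.size if stack.size > deepest
    when "]", "}"
      opener = ch == "]" ? "[" : "{"
      raise ArgumentError, "mismatched #{ch}" unless stack.pop == opener
    end
  end

  raise ArgumentError, "unclosed #{stack.last}" unless stack.empty?
  deepest
end

max_nesting_depth('{"a": [1, {"b": "]"}]}')        # => 3
max_nesting_depth("[" * 100_000 + "]" * 100_000)    # => 100000
```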
The trade-off: there is currently no fixed nesting or input-size limit, so extremely large or adversarially nested untrusted input is limited by available memory (it can exhaust RAM) rather than ending in a stack-overflow crash. A hard cap is planned as an opt-in guard; until then, if you parse untrusted input, enforce a size limit upstream of the parser.
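Until an opt-in cap exists, an upstream size limit can be as simple as never reading more than the limit plus one byte. A hypothetical sketch; `read_capped` and `InputTooLarge` are illustrative names, and the limit you choose is an application decision.

```ruby
require "stringio"

# Hypothetical upstream size guard: read at most max_bytes + 1 bytes, so an
# oversized input is rejected without being read in full or parsed.
class InputTooLarge < StandardError; end

def read_capped(io, max_bytes)
  data = io.read(max_bytes + 1) || "" # read(n) returns nil at EOF
  if data.bytesize > max_bytes
    raise InputTooLarge, "input exceeds #{max_bytes} bytes"
  end
  data # binary (ASCII-8BIT) string
end

read_capped(StringIO.new('{"a": 1}'), 1024) # => "{\"a\": 1}"
```

`IO#read` with a length returns a binary (ASCII-8BIT) string, so re-tag it (for example with `force_encoding("UTF-8")`) before handing it to the parser.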
## Development
After checking out the repo, run `bin/setup` to install dependencies, then `rake compile` to build the C extension and `rake spec` to run the tests. The test suite runs every example against both the C and pure-Ruby paths, so the two stay behavior-identical.
## License
Available as open source under the terms of the MIT License.