Module: SmarterJSON

Defined in:
lib/smarter_json/backports.rb,
lib/smarter_json.rb,
lib/smarter_json/errors.rb,
lib/smarter_json/parser.rb,
lib/smarter_json/options.rb,
lib/smarter_json/version.rb,
lib/smarter_json/warning.rb,
lib/smarter_json/generator.rb,
ext/smarter_json/smarter_json.c

Overview

Refinement backport of Array#filter_map for Ruby < 2.7 (the gem supports >= 2.6.0).

filter_map shipped in Ruby 2.7. Rather than monkey-patching core Enumerable globally, this is a refinement scoped to the single file that needs it: parser.rb does `using SmarterJSON::Backports` (guarded to Ruby < 2.7). On 2.7+ the refinement is never activated, so the native (C) filter_map is used and this is a complete no-op.

DELETE this file, its require in lib/smarter_json.rb, and the `using` line in parser.rb once the minimum supported Ruby is >= 2.7.
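The scoping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the gem's backports.rb: the names BackportsSketch, ParserSketch and positive_doubles are hypothetical, and only the block form of filter_map is covered.

```ruby
# Hypothetical stand-in for SmarterJSON::Backports (assumed shape, not the gem's source).
module BackportsSketch
  refine Array do
    # Block form only: keep the truthy block results, drop nil/false.
    def filter_map
      each_with_object([]) do |item, kept|
        result = yield(item)
        kept << result if result
      end
    end
  end
end

# Hypothetical stand-in for parser.rb: the only scope where the refinement can be active.
module ParserSketch
  # Activate the backport only on Rubies that lack a native filter_map.
  using BackportsSketch if Gem::Version.new(RUBY_VERSION) < Gem::Version.new("2.7")

  def self.positive_doubles(numbers)
    numbers.filter_map { |n| n * 2 if n.positive? }
  end
end

doubled = ParserSketch.positive_doubles([1, -2, 3])
```

Because the refinement is activated only inside ParserSketch, no other code sees a changed Array, and on Ruby 2.7+ the guard leaves the built-in method in place.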

Defined Under Namespace

Modules: Backports, Bytes, Framer, Options, Recovery

Classes: EncodingError, Error, GenerateError, Generator, ParseError, Parser, Warning

Constant Summary

HAS_ACCELERATION = respond_to?(:parse_c)
VERSION = "1.1.0"

Class Method Details

.foreach(source, options = {}, &block) ⇒ Object

SmarterJSON.foreach(source, options = {}) — the streaming, composable sibling of process_file, mirroring the stdlib convention (CSV.foreach / File.foreach): a plain Enumerator (NOT Enumerator::Lazy), so .map / .select behave the normal way and return an Array.

`source` is a file path (opened and streamed from disk, like process_file) OR an IO (a socket, a StringIO, an open File), streamed directly from its current position. A String is always a path, never content. An IO source is single-pass: it can only be read once, so iterating the returned Enumerator a second time over the same IO yields nothing.

Without a block: returns an Enumerator over each top-level document, reading one document at a time via readpartial — it never slurps the whole file the way process_file(path) does. So foreach(path).first(3) reads only ~3 documents off disk, and foreach(src).each { … } / .next stream in bounded memory. .map / .select read the source one document at a time but still build an Array of their result; for a chain that stays bounded end to end (a large filtered set off a fat file) opt into .lazy at the call site: foreach(src).lazy.select { … }.each { … }.

With a block: streams each document and returns the document count — identical to process_file(path) { |doc| … } (or process(io) { |doc| … } for an IO).

Options are validated eagerly (before the Enumerator is returned), so a bad option key or value fails fast rather than on first iteration.
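The Enumerator-or-count contract can be illustrated with the standard library alone. The sketch below is an assumption-laden stand-in, not the gem's implementation: ForeachSketch is a hypothetical module, it reads line by line with each_line rather than readpartial, and it uses the stdlib JSON parser.

```ruby
require "json"
require "stringio"

# Hypothetical stand-in for SmarterJSON.foreach over an IO; one JSON document per line.
module ForeachSketch
  def self.foreach(io, &block)
    # No block: hand back a plain Enumerator that re-enters this method on demand.
    return enum_for(:foreach, io) unless block

    count = 0
    io.each_line do |line|
      next if line.strip.empty?

      block.call(JSON.parse(line))
      count += 1
    end
    count # the block form returns the number of documents
  end
end

ndjson = %({"id":1}\n{"id":2}\n{"id":3}\n{"id":4}\n)

# first(2) stops pulling after two documents; the rest stays unread on the IO.
io = StringIO.new(ndjson)
first_two = ForeachSketch.foreach(io).first(2)

# Single-pass: another Enumerator over the same IO resumes from the current position...
remaining = ForeachSketch.foreach(io).to_a
# ...and once the IO is exhausted, a further pass finds nothing.
exhausted = ForeachSketch.foreach(io).to_a

# .lazy keeps a select/map chain bounded end to end.
even_ids = ForeachSketch.foreach(StringIO.new(ndjson)).lazy
                        .select { |doc| doc["id"].even? }
                        .map { |doc| doc["id"] }
                        .first(1)

# Block form: streams every document and returns the count.
total = ForeachSketch.foreach(StringIO.new(ndjson)) { |doc| doc }
```

In this sketch the IO is consumed exactly one line per document; a chunked reader may buffer ahead, so how much of the IO remains unread after first(n) is specific to the sketch.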



# File 'lib/smarter_json/parser.rb', line 84

def foreach(source, options = {}, &block)
  options = Options.process_options(options)
  return enum_for(:foreach, source, options) unless block

  if source.respond_to?(:read) # an IO (socket, StringIO, open File) — stream it directly
    stream_io(source, options, &block)
  else                         # a path — open the file and stream from disk
    process_file(source, options, &block)
  end
end

.generate(obj, options = {}) ⇒ Object

SmarterJSON.generate(obj, options = {}) — write a Ruby value as JSON.

options:

:json   (default) — standard JSON. Hash -> object, Array -> array,
                    scalar -> scalar. Always valid, interoperable JSON.
:ndjson           — newline-delimited JSON. An Array writes one element per
                    line; any other value writes as a single line. The
                    inverse of process reading NDJSON back into an Array.

options: spaces per nesting level for pretty-printing (Integer, default 0 = compact). Empty objects/arrays stay inline. Not allowed with :ndjson (a record must be a single line); combining them raises ArgumentError.

Symbol keys/values are emitted as strings; BigDecimal as a JSON number. Unsupported types (Time, custom objects) and non-finite Floats raise SmarterJSON::GenerateError. Returns a String.
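The :ndjson shape (one Array element per line, any other value as a single line) can be approximated with the standard library. This is a rough sketch, not the gem's Generator: ndjson_sketch is a hypothetical helper, and it emits no trailing newline, which may or may not match the gem.

```ruby
require "json"

# Hypothetical helper: an Array becomes one compact JSON document per line,
# any other value becomes a single line.
def ndjson_sketch(value)
  records = value.is_a?(Array) ? value : [value]
  records.map { |record| JSON.generate(record) }.join("\n")
end

lines  = ndjson_sketch([{ "id" => 1 }, { "id" => 2 }])
single = ndjson_sketch({ "id" => 3 })

# Reading the lines back yields the original Array again.
round_trip = lines.each_line.map { |line| JSON.parse(line) }
```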



# File 'lib/smarter_json/generator.rb', line 24

def generate(obj, options = {})
  Generator.new(options).generate(obj)
end

.normalize_default_encoding(input, options) ⇒ Object

Smart default for the nil :encoding option. A String tagged ASCII-8BIT (BINARY) is how Net::HTTP and many HTTP libraries hand back a response body even when the bytes are UTF-8. JSON’s interchange encoding is UTF-8, so we relabel such input to UTF-8 when its bytes are valid UTF-8 — otherwise string values would come back tagged ASCII-8BIT and compare unequal to UTF-8 literals (a silent footgun). When the bytes are NOT valid UTF-8 we raise EncodingError rather than guess a legacy encoding — pass an explicit :encoding for that. An explicit (non-nil) :encoding, or any non-BINARY tag, is left untouched (the per-path force_encoding / validation handles it). Only relabels — never transcodes.

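Raises:

EncodingError: when the input is tagged ASCII-8BIT, :encoding is nil, and the bytes are not valid UTF-8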



# File 'lib/smarter_json/parser.rb', line 156

def normalize_default_encoding(input, options)
  return input unless options[:encoding].nil?
  return input unless input.encoding == Encoding::ASCII_8BIT

  utf8 = input.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  raise EncodingError, "input is tagged ASCII-8BIT and is not valid UTF-8 — pass encoding: to declare its encoding"
end
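The footgun this method guards against can be reproduced in plain Ruby. The sketch below uses only core String methods and makes no calls into the gem; the variable names are illustrative.

```ruby
# The UTF-8 bytes of "café", tagged ASCII-8BIT the way many HTTP clients return a body.
body = "caf\u00E9".b

tagged_binary = (body.encoding == Encoding::ASCII_8BIT)
equal_before  = (body == "caf\u00E9") # same bytes, but the encoding tags are incompatible

# Relabel a copy; force_encoding changes only the tag, never the bytes.
relabeled   = body.dup.force_encoding(Encoding::UTF_8)
equal_after = (relabeled == "caf\u00E9")
same_bytes  = (relabeled.bytes == body.bytes)

# A lone 0xFF byte can never be valid UTF-8, so relabeling it would be a wrong guess.
invalid    = [0xFF].pack("C").force_encoding(Encoding::UTF_8)
rejectable = !invalid.valid_encoding?
```

Here equal_before is false and equal_after is true, which is why relabeling valid UTF-8 input by default removes a silent comparison bug, while the valid_encoding? check marks the inputs that need an explicit :encoding.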

.parse_c(input, opts) ⇒ Object



# File 'ext/smarter_json/smarter_json.c', line 1548

static VALUE fj_parse_c(VALUE self, VALUE input, VALUE opts) {
  fj_state st;
  VALUE enc_opt, dk;

  Check_Type(input, T_STRING);

  enc_opt = rb_hash_aref(opts, fj_sym_encoding);
  if (!NIL_P(enc_opt)) {
    input = rb_funcall(rb_str_dup(input), fj_force_encoding_id, 1, enc_opt);
  }
  if (!RTEST(rb_funcall(input, fj_valid_encoding_p_id, 0))) {
    VALUE name = rb_funcall(rb_funcall(input, fj_encoding_id, 0), fj_name_id, 0);
    VALUE msg = rb_sprintf("invalid byte sequence for %" PRIsVALUE, name);
    rb_exc_raise(rb_funcall(cEncodingError, fj_new_id, 3, msg, Qnil, Qnil));
  }

  st.buf = RSTRING_PTR(input);
  st.len = RSTRING_LEN(input);
  st.pos = 0;
  st.enc = rb_enc_get(input);
  st.depth = 0;
#ifdef HAVE_RB_ENC_INTERNED_STR
  fj_kc_slot kcache[FJ_KCACHE_SIZE];
  memset(kcache, 0, sizeof(kcache));
  st.kcache = kcache;
#else
  st.kcache = NULL;
#endif

  st.symbolize_keys = RTEST(rb_hash_aref(opts, fj_sym_symbolize_keys));
  dk = rb_hash_aref(opts, fj_sym_duplicate_key);
  st.dup_first_wins = (dk == fj_sym_first_wins);

  {
    VALUE bd = rb_hash_aref(opts, fj_sym_decimal_precision);
    if (bd == fj_sym_float) st.decimal_precision = 0;
    else if (bd == fj_sym_bigdecimal) st.decimal_precision = 2;
    else st.decimal_precision = 1; /* :auto (default), including nil */
  }

  st.on_warning = rb_hash_aref(opts, fj_sym_on_warning); /* Qnil when absent */

  if (st.len >= 3 && (unsigned char)st.buf[0] == 0xEF &&
      (unsigned char)st.buf[1] == 0xBB && (unsigned char)st.buf[2] == 0xBF) {
    st.pos = 3;
  }

  /* With a block: yield each top-level document until EOF and return the document
   * count (NDJSON / JSONL / concatenated). Same loop as the Ruby each_value path. */
  if (rb_block_given_p()) {
    long count = 0;
    for (;;) {
      VALUE v;
      fj_skip_document_separators(&st);
      if (fj_eof(&st)) break;
      v = fj_parse_iter(&st, fj_implicit_root_ahead(&st));
      fj_enforce_scalar_boundary(&st, v);
      rb_yield(v);
      count++;
    }
    return LONG2NUM(count);
  }

  /* No block: always return an Array of every top-level document (0 -> [], 1 ->
   * [doc], 2+ -> [d1, d2, …]) — the always-array contract. Documents are separated by
   * newline / comma / concatenation (self-delimiting values); a space alone never
   * separates, and a bare scalar must be followed by a real separator, so `1 2 3`
   * raises while `1\n2\n3` and `1, 2, 3` are three documents. */
  {
    VALUE arr = rb_ary_new();
    for (;;) {
      VALUE v;
      fj_skip_document_separators(&st);
      if (fj_eof(&st)) break;
      v = fj_parse_iter(&st, fj_implicit_root_ahead(&st));
      fj_enforce_scalar_boundary(&st, v);
      rb_ary_push(arr, v);
    }
    return arr;
  }
}

.process(input, options = {}, &block) ⇒ Object

SmarterJSON.process(input, options = {}) — the main entry point.

`input` is either a String of JSON content or an IO to read from. (A String is always content, never a filename; use process_file for paths.) The values in `options` override Parser::DEFAULT_OPTIONS.

Without a block: always returns an Array of the documents found: [] for none, [doc] for one, [d1, d2, …] for several (NDJSON / JSONL / concatenated). A top-level value must be a recognized JSON value (number / literal / quoted string / object / array) or an implicit-root object, else it raises. For the single-document case use SmarterJSON.process_one (returns the bare value). :acceleration (default true) selects the C extension when compiled and loaded (SmarterJSON::HAS_ACCELERATION); otherwise the pure-Ruby parser.

With a block: yields each top-level document as it is parsed, and returns the document count. For an IO this streams document-by-document in bounded memory: it reads the stream as newline-delimited documents (NDJSON / JSONL), one per line.



# File 'lib/smarter_json/parser.rb', line 31

def process(input, options = {}, &block)
  options = Options.process_options(options)
  if input.is_a?(String)
    Recovery.process_string(input, options, &block)
  elsif input.respond_to?(:read)
    block ? stream_io(input, options, &block) : process(input.read, options)
  else
    raise ArgumentError, "SmarterJSON.process expects a String or an IO, got #{input.class}"
  end
end

.process_file(path, options = {}, &block) ⇒ Object

SmarterJSON.process_file(path, options = {}) — open a file and process it.

The :encoding option labels the file’s encoding (default “UTF-8”); it does NOT trigger a transcoding pass — the parser works on the bytes in their native encoding and emits string values with the same encoding tag. With a block, streams document-by-document straight from disk in bounded memory (never loading the whole file); the documents are read as newline-delimited (NDJSON / JSONL), one per line.



# File 'lib/smarter_json/parser.rb', line 50

def process_file(path, options = {}, &block)
  options = Options.process_options(options)
  encoding = options[:encoding] || "UTF-8"
  if block
    File.open(path, "r:#{encoding}") { |io| stream_io(io, options, &block) }
  else
    process(File.read(path, encoding: encoding), options)
  end
end
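The "labels, does not transcode" behaviour comes from Ruby's own file APIs, which process_file calls with an external encoding only. A small standard-library sketch, assuming Encoding.default_internal is left at its default of nil (the temp-file prefix and variable names are illustrative):

```ruby
require "tempfile"

# "café" encoded as ISO-8859-1: the final byte 0xE9 is "é".
latin1_bytes = [0x63, 0x61, 0x66, 0xE9].pack("C*")

slurped = nil
streamed = nil
Tempfile.create("encoding_sketch") do |file|
  file.binmode
  file.write(latin1_bytes)
  file.flush

  # The same two call shapes process_file uses: File.read without a block,
  # File.open with "r:<encoding>" with one.
  slurped  = File.read(file.path, encoding: "ISO-8859-1")
  streamed = File.open(file.path, "r:ISO-8859-1") { |io| io.read }
end

# Transcoding happens only when asked for explicitly.
as_utf8 = slurped.encode(Encoding::UTF_8)
```

Both strings come back tagged ISO-8859-1 with the four original bytes untouched; only the explicit encode call produces UTF-8 bytes.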

.process_one(input, options = {}) ⇒ Object

SmarterJSON.process_one(input, options = {}) — the single-document accessor.

Returns the first document’s value (or nil when the input holds no documents). When the input holds MORE than one document it returns the first and warns once. It never raises, since an extra document is valid data; the warning goes to on_warning if set, else Rails.logger.warn when Rails is loaded, else Kernel#warn. For an IO this is bounded memory: it parses just the first document and stops as soon as a second is seen, instead of materialising the whole stream the way process(io).first would. (process(input).first and process(input)[0] silently drop documents 2+, a footgun; use process_one instead.)



# File 'lib/smarter_json/parser.rb', line 105

def process_one(input, options = {})
  options = Options.process_options(options)

  # IO: bounded memory — parse just the first document and stop once a second is
  # seen (peek-to-warn). A String is already in memory, so use the plain no-block
  # path: it returns the full (wrapper-recovered, de-duplicated) Array in one pass,
  # which also avoids the reactive-recovery double-yield the block path would hit.
  unless input.respond_to?(:read)
    docs = process(input, options)
    warn_extra_documents(options) if docs.length > 1
    return docs.first
  end

  first = nil
  count = 0
  catch(:smarter_json_first_document) do
    process(input, options) do |doc|
      count += 1
      first = doc if count == 1
      throw(:smarter_json_first_document) if count > 1
    end
  end
  warn_extra_documents(options) if count > 1
  first
end
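The peek-to-warn early exit in the IO branch can be exercised in isolation. The sketch below is a simplified stand-in built on the standard library: each_document_sketch and first_document_sketch are hypothetical helpers, the reader is line-based, and the on_warning callable's single-message signature is an assumption rather than the gem's documented interface.

```ruby
require "json"
require "stringio"

# Hypothetical line-based reader standing in for process(io) { |doc| ... }.
def each_document_sketch(io)
  io.each_line { |line| yield JSON.parse(line) unless line.strip.empty? }
end

# Returns the first document (or nil); reports extras through on_warning.
def first_document_sketch(io, on_warning:)
  first = nil
  count = 0
  catch(:first_document) do
    each_document_sketch(io) do |doc|
      count += 1
      first = doc if count == 1
      throw(:first_document) if count > 1 # stop as soon as a second document is seen
    end
  end
  on_warning.call("extra documents ignored") if count > 1
  first
end

warnings = []
collect = ->(message) { warnings << message }

io = StringIO.new(%({"n":1}\n{"n":2}\n{"n":3}\n))
first  = first_document_sketch(io, on_warning: collect)
unread = io.read # the third document was never pulled off the IO

empty = first_document_sketch(StringIO.new(""), on_warning: collect)
```

The throw unwinds out of the reader after the second document, so only two lines are consumed and exactly one warning is recorded.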