Module: SmarterJSON
- Defined in:
- lib/smarter_json/backports.rb,
lib/smarter_json.rb,
lib/smarter_json/errors.rb,
lib/smarter_json/parser.rb,
lib/smarter_json/options.rb,
lib/smarter_json/version.rb,
lib/smarter_json/warning.rb,
lib/smarter_json/generator.rb,
ext/smarter_json/smarter_json.c
Overview
Refinement backport of Array#filter_map for Ruby < 2.7 (the gem supports >= 2.6.0).
filter_map shipped in Ruby 2.7. Rather than monkey-patching core Enumerable
globally, this is a refinement scoped to the single file that needs it: parser.rb
does using SmarterJSON::Backports (guarded to Ruby < 2.7). On 2.7+ the
refinement is never activated, so the native (C) filter_map is used and this is a
complete no-op.
DELETE this file, its require in lib/smarter_json.rb, and the using line in
parser.rb once the minimum supported Ruby is >= 2.7.
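The scoping described above can be illustrated with a minimal sketch of a refinement-based backport. The module name `FilterMapBackport` is hypothetical and the method body is an assumption about how such a backport might look; it is not copied from lib/smarter_json/backports.rb.

```ruby
# Hypothetical module name; the real backport lives in SmarterJSON::Backports.
module FilterMapBackport
  refine Array do
    # Map each element through the block and keep only the truthy results.
    def filter_map
      each_with_object([]) do |item, kept|
        result = yield(item)
        kept << result if result
      end
    end
  end
end

# Activate the refinement only where the native method is missing.
# On Ruby 2.7+ this line does nothing and the built-in filter_map is used.
using FilterMapBackport if Gem::Version.new(RUBY_VERSION) < Gem::Version.new("2.7")

evens_doubled = [1, 2, 3, 4].filter_map { |n| n * 2 if n.even? }
```

Because `using` is file-scoped, no other file sees the refined method, which is why deleting the backport later is limited to the three places listed above.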
Defined Under Namespace
Modules: Backports, Bytes, Framer, Options, Recovery
Classes: EncodingError, Error, GenerateError, Generator, ParseError, Parser, Warning
Constant Summary
- HAS_ACCELERATION =
respond_to?(:parse_c)
- UNSCANNABLE_ASCII_COMPATIBLE =
Legacy CJK double-byte encodings whose trail bytes can fall in the ASCII range, so a 0x5C trail byte looks like a string escape, a 0x7B like a brace, etc. — i.e. they are ascii_compatible? yet still NOT safe to byte-scan for JSON structure. (EUC-* and single-byte encodings keep their non-ASCII bytes above 0x7F, so they ARE safe.)
%w[
  Shift_JIS Windows-31J MacJapanese SHIFT_JISX0213 SJIS-DoCoMo SJIS-KDDI SJIS-SoftBank
  Big5 Big5-HKSCS Big5-UAO CP950 GBK GB18030 GB12345
].each_with_object({}) do |name, h|
  h[Encoding.find(name)] = true
rescue ArgumentError
  # encoding not built into this Ruby, skip it
end.freeze
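To see why these encodings cannot be byte-scanned, the following sketch encodes one katakana character in Shift_JIS and in EUC-JP using only core Ruby. The variable names are illustrative; the example assumes the Ruby build includes the Shift_JIS and EUC-JP transcoders, which standard builds do.

```ruby
katakana_so = "\u30BD" # the katakana character "so"

sjis = katakana_so.encode("Shift_JIS")
euc  = katakana_so.encode("EUC-JP")

sjis_bytes = sjis.bytes # the trail byte is 0x5C, the ASCII backslash
euc_bytes  = euc.bytes  # both bytes stay above 0x7F

# Shift_JIS reports itself as ASCII-compatible even though a trail byte
# can collide with an ASCII character that matters to a JSON scanner.
sjis_ascii_compatible   = sjis.encoding.ascii_compatible?
sjis_has_backslash_byte = sjis_bytes.include?(0x5C)
euc_has_ascii_byte      = euc_bytes.any? { |byte| byte < 0x80 }
```

A scanner that looks for byte 0x5C to detect string escapes would misread the second byte of this character in Shift_JIS, while the EUC-JP form contains no byte in the ASCII range at all.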
- VERSION =
"1.2.3"
Class Method Summary
-
.foreach(source, options = {}, &block) ⇒ Object
SmarterJSON.foreach(source, options = {}) — the streaming, composable sibling of process_file, mirroring the stdlib convention (CSV.foreach / File.foreach): a plain Enumerator (NOT Enumerator::Lazy), so .map / .select behave the normal way and return an Array.
-
.generate(obj, options = {}) ⇒ Object
SmarterJSON.generate(obj, options = {}) — write a Ruby value as JSON.
- .parse_c(input, opts) ⇒ Object
-
.process(input, options = {}, &block) ⇒ Object
SmarterJSON.process(input, options = {}) — the main entry point.
-
.process_file(path, options = {}, &block) ⇒ Object
SmarterJSON.process_file(path, options = {}) — open a file and process it.
-
.process_one(input, options = {}) ⇒ Object
SmarterJSON.process_one(input, options = {}) — the single-document accessor.
Class Method Details
.foreach(source, options = {}, &block) ⇒ Object
SmarterJSON.foreach(source, options = {}) — the streaming, composable sibling of process_file, mirroring the stdlib convention (CSV.foreach / File.foreach): a plain Enumerator (NOT Enumerator::Lazy), so .map / .select behave the normal way and return an Array.
source is a file path (opened and streamed from disk, like process_file) OR an
IO — a socket, a StringIO, an open File — streamed directly from its current
position. A String is always a path, never content. An IO source is single-pass:
it can only be read once, so iterating the returned Enumerator a second time over
the same IO yields nothing.
Without a block: returns an Enumerator over each top-level document, reading one document at a time via readpartial — it never slurps the whole file the way process_file(path) does. So foreach(path).first(3) reads only ~3 documents off disk, and foreach(src).each { … } / .next stream in bounded memory. .map / .select read the source one document at a time but still build an Array of their result; for a chain that stays bounded end to end (a large filtered set off a fat file) opt into .lazy at the call site: foreach(src).lazy.select { … }.each { … }.
With a block: streams each document and returns the document count — identical to process_file(path) { |doc| … } (or process(io) { |doc| … } for an IO).
Options are validated eagerly (before the Enumerator is returned), so a bad option key or value fails fast rather than on first iteration.
# File 'lib/smarter_json/parser.rb', line 104

def foreach(source, options = {}, &block)
  options = Options.(options)
  return enum_for(:foreach, source, options) unless block

  if source.respond_to?(:read) # an IO (socket, StringIO, open File): stream it directly
    stream_io(source, options, &block)
  else # a path: open the file and stream from disk
    process_file(source, options, &block)
  end
end
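The Enumerator behaviour described for foreach (plain rather than lazy, single-pass over an IO, document count with a block) can be tried out with a small stand-in built on the standard library. `each_document` below is a hypothetical helper, not SmarterJSON itself: it reads line by line with `each_line` and parses with stdlib `JSON`, whereas foreach is documented as reading via readpartial.

```ruby
require "json"
require "stringio"

# Hypothetical stand-in for the foreach pattern (NOT the SmarterJSON method).
# Without a block it returns a plain Enumerator; with a block it yields each
# document and returns the document count.
def each_document(io)
  return enum_for(:each_document, io) unless block_given?

  count = 0
  io.each_line do |line|
    next if line.strip.empty?

    yield JSON.parse(line)
    count += 1
  end
  count
end

data = %({"id":1}\n{"id":2}\n{"id":3}\n)

io = StringIO.new(data)
docs = each_document(io)
first_two = docs.first(2) # stops after two documents
remaining = docs.to_a     # the IO continues from its current position
exhausted = docs.to_a     # single-pass: nothing is left to read

block_count = each_document(StringIO.new(data)) { |doc| doc }

# Opting into laziness at the call site keeps a filter chain bounded.
first_odd = each_document(StringIO.new(data)).lazy.select { |doc| doc["id"].odd? }.first(1)
```

The same shape should apply to `SmarterJSON.foreach(src)`: `.first(n)` and `.lazy` chains stop early, while `.map` and `.select` on the plain Enumerator build a full Array.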
.generate(obj, options = {}) ⇒ Object
SmarterJSON.generate(obj, options = {}) — write a Ruby value as JSON.
:json (default) — standard JSON. Hash -> object, Array -> array,
scalar -> scalar. Always valid, interoperable JSON.
:ndjson — newline-delimited JSON. An Array writes one element per
line; any other value writes as a single line. The
inverse of process reading NDJSON back into an Array.
options: spaces per nesting level for pretty-printing (Integer, default 0 = compact). Empty objects/arrays stay inline. Not allowed with :ndjson (a record must be a single line) — combining them raises ArgumentError.
Symbol keys/values are emitted as strings; BigDecimal as a JSON number. Unsupported types (Time, custom objects) and non-finite Floats raise SmarterJSON::GenerateError. Returns a String.
# File 'lib/smarter_json/generator.rb', line 24

def generate(obj, options = {})
  Generator.new(options).generate(obj)
end
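The relationship between the :json and :ndjson shapes can be sketched with the standard library alone. `to_ndjson` is a hypothetical helper that mimics the documented rule (an Array writes one element per line, any other value writes a single line); it is not the SmarterJSON generator, and details such as a trailing newline are assumptions left out here.

```ruby
require "json"

# Hypothetical helper illustrating the NDJSON shape with stdlib JSON.
def to_ndjson(value)
  records = value.is_a?(Array) ? value : [value]
  records.map { |record| JSON.generate(record) }.join("\n")
end

rows = [{ "id" => 1 }, { "id" => 2 }]

as_json   = JSON.generate(rows) # one JSON array on a single line
as_ndjson = to_ndjson(rows)     # one JSON object per line

# Reading the lines back yields the original Array, the inverse direction
# that the description above attributes to process.
round_trip = as_ndjson.each_line.map { |line| JSON.parse(line) }
```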
.parse_c(input, opts) ⇒ Object
# File 'ext/smarter_json/smarter_json.c', line 1664
static VALUE fj_parse_c(VALUE self, VALUE input, VALUE opts) {
fj_state st;
VALUE enc_opt, dk;
Check_Type(input, T_STRING);
enc_opt = rb_hash_aref(opts, fj_sym_encoding);
if (!NIL_P(enc_opt)) {
input = rb_funcall(rb_str_dup(input), fj_force_encoding_id, 1, enc_opt);
}
if (!RTEST(rb_funcall(input, fj_valid_encoding_p_id, 0))) {
VALUE name = rb_funcall(rb_funcall(input, fj_encoding_id, 0), fj_name_id, 0);
VALUE msg = rb_sprintf("invalid byte sequence for %" PRIsVALUE, name);
rb_exc_raise(rb_funcall(cEncodingError, fj_new_id, 3, msg, Qnil, Qnil));
}
st.buf = RSTRING_PTR(input);
st.len = RSTRING_LEN(input);
st.pos = 0;
st.enc = rb_enc_get(input);
st.depth = 0;
#ifdef HAVE_RB_ENC_INTERNED_STR
fj_kc_slot kcache[FJ_KCACHE_SIZE];
memset(kcache, 0, sizeof(kcache));
st.kcache = kcache;
#else
st.kcache = NULL;
#endif
st.symbolize_keys = RTEST(rb_hash_aref(opts, fj_sym_symbolize_keys));
dk = rb_hash_aref(opts, fj_sym_duplicate_key);
st.dup_first_wins = (dk == fj_sym_first_wins);
{
VALUE bd = rb_hash_aref(opts, fj_sym_decimal_precision);
if (bd == fj_sym_float) st.decimal_precision = 0;
else if (bd == fj_sym_bigdecimal) st.decimal_precision = 2;
else st.decimal_precision = 1; /* :auto (default), including nil */
}
st.on_warning = rb_hash_aref(opts, fj_sym_on_warning); /* Qnil when absent */
if (st.len >= 3 && (unsigned char)st.buf[0] == 0xEF &&
(unsigned char)st.buf[1] == 0xBB && (unsigned char)st.buf[2] == 0xBF) {
st.pos = 3;
}
/* With a block: yield each top-level document until EOF and return the document
* count (NDJSON / JSONL / concatenated). Same loop as the Ruby each_value path. */
if (rb_block_given_p()) {
long count = 0;
for (;;) {
VALUE v;
fj_skip_document_separators(&st);
if (fj_eof(&st)) break;
v = fj_parse_iter(&st, fj_implicit_root_ahead(&st));
fj_enforce_scalar_boundary(&st, v);
rb_yield(v);
count++;
}
return LONG2NUM(count);
}
/* No block: always return an Array of every top-level document (0 -> [], 1 ->
* [doc], 2+ -> [d1, d2, …]) — the always-array contract. Documents are separated by
* newline / comma / concatenation (self-delimiting values); a space alone never
* separates, and a bare scalar must be followed by a real separator, so `1 2 3`
* raises while `1\n2\n3` and `1, 2, 3` are three documents. */
{
VALUE arr = rb_ary_new();
for (;;) {
VALUE v;
fj_skip_document_separators(&st);
if (fj_eof(&st)) break;
v = fj_parse_iter(&st, fj_implicit_root_ahead(&st));
fj_enforce_scalar_boundary(&st, v);
rb_ary_push(arr, v);
}
return arr;
}
}
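The byte-order-mark check near the top of the C function (bytes 0xEF 0xBB 0xBF at offset 0 move the start position to 3) can be expressed in a few lines of Ruby. `bom_skip_offset` is a hypothetical name used only for this sketch; it mirrors the C condition and is not the pure-Ruby parser's code.

```ruby
UTF8_BOM = [0xEF, 0xBB, 0xBF].freeze

# Return 3 when the input starts with a UTF-8 byte order mark, else 0.
def bom_skip_offset(input)
  input.byteslice(0, 3).bytes == UTF8_BOM ? 3 : 0
end

with_bom    = "\xEF\xBB\xBF[1]"
without_bom = "[1]"

offset_with    = bom_skip_offset(with_bom)
offset_without = bom_skip_offset(without_bom)

# Parsing would then begin at the returned byte offset.
body = with_bom.byteslice(offset_with..-1)
```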
.process(input, options = {}, &block) ⇒ Object
SmarterJSON.process(input, options = {}) — the main entry point.
input is either a String of JSON content or an IO to read from. (A String
is always content, never a filename — use process_file for paths.) The values
in options override Parser::DEFAULT_OPTIONS.
Without a block: always returns an Array of the documents found — [] for none, [doc] for one, [d1, d2, …] for several (NDJSON / JSONL / concatenated). A top-level value must be a recognized JSON value (number / literal / quoted string / object / array) or an implicit-root object, else it raises. For the single-document case use SmarterJSON.process_one (returns the bare value). :acceleration (default true) selects the C extension when compiled and loaded (SmarterJSON::HAS_ACCELERATION); otherwise the pure-Ruby parser.
With a block: yields each top-level document as it is parsed, and returns the document count. For an IO this streams document-by-document in bounded memory — it reads the stream as newline-delimited documents (NDJSON / JSONL), one per line.
# File 'lib/smarter_json/parser.rb', line 31

def process(input, options = {}, &block)
  options = Options.(options)
  if input.is_a?(String)
    Recovery.process_string(input, options, &block)
  elsif input.respond_to?(:read)
    block ? stream_io(input, options, &block) : process(input.read, options)
  else
    raise ArgumentError, "SmarterJSON.process expects a String or an IO, got #{input.class}"
  end
end
.process_file(path, options = {}, &block) ⇒ Object
SmarterJSON.process_file(path, options = {}) — open a file and process it.
The :encoding option labels the file's encoding (default "UTF-8").
The user can send any encoding to SmarterJSON - we make zero assumptions about encoding. We also do not "normalize" the input to a different encoding on our own (this is not Python).
We parse the bytes in whatever encoding they arrive in and emit string values with that same encoding tag.
The caller is free to transcode the input themselves (e.g. open the file with a "r:ext:int" mode); however the bytes arrive, we parse them and preserve their encoding.
With a block, streams document-by-document straight from disk in bounded memory (never loading the whole file); the documents are read as newline-delimited (NDJSON / JSONL), one per line.
# File 'lib/smarter_json/parser.rb', line 57

def process_file(path, options = {}, &block)
  options = Options.(options)
  encoding = options[:encoding] || "UTF-8"
  mode = file_read_mode(encoding)
  if block
    File.open(path, mode) { |io| stream_io(io, options, &block) }
  else
    process(File.read(path, mode: mode), options)
  end
end
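The difference between labelling a file's encoding and transcoding it with an "r:ext:int" mode can be seen with core Ruby file APIs. This sketch does not call SmarterJSON; the temporary file and variable names are illustrative only.

```ruby
require "tempfile"

labelled, transcoded = Tempfile.create("sjis_sample") do |file|
  file.binmode
  file.write("\u30BD".encode("Shift_JIS")) # two bytes: 0x83 0x5C
  file.flush

  [
    # "r:ext" only tags the bytes with an encoding; nothing is converted.
    File.read(file.path, mode: "r:Shift_JIS"),
    # "r:ext:int" converts from the external to the internal encoding on read.
    File.read(file.path, mode: "r:Shift_JIS:UTF-8")
  ]
end

labelled_encoding   = labelled.encoding
transcoded_encoding = transcoded.encoding
```

The first form corresponds to what the :encoding option is described as doing (a label), while the second is the caller-side transcoding mentioned above.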
.process_one(input, options = {}) ⇒ Object
SmarterJSON.process_one(input, options = {}) — the single-document accessor.
Returns the first document's value (or nil when the input holds no documents). When the input holds MORE than one document it returns the first and warns once — it never raises, since an extra document is valid data; the warning goes to on_warning if set, else Rails.logger.warn when Rails is loaded, else Kernel#warn. For an IO this is bounded memory: it parses just the first document and stops as soon as a second is seen, instead of materialising the whole stream the way process(io).first would. (process(input).first and process(input)[0] silently drop documents 2+ — a footgun; use process_one instead.)
# File 'lib/smarter_json/parser.rb', line 125

def process_one(input, options = {})
  options = Options.(options)
  # IO: bounded memory. Parse just the first document and stop once a second is
  # seen (peek-to-warn). A String is already in memory, so use the plain no-block
  # path: it returns the full (wrapper-recovered, de-duplicated) Array in one pass,
  # which also avoids the reactive-recovery double-yield the block path would hit.
  unless input.respond_to?(:read)
    docs = process(input, options)
    warn_extra_documents(options) if docs.length > 1
    return docs.first
  end

  first = nil
  count = 0
  catch(:smarter_json_first_document) do
    process(input, options) do |doc|
      count += 1
      first = doc if count == 1
      throw(:smarter_json_first_document) if count > 1
    end
  end
  warn_extra_documents(options) if count > 1
  first
end
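The peek-to-warn early exit used for IO input can be sketched in isolation with catch/throw over any enumerable source. `first_document` and the callable `on_warning:` keyword are hypothetical names for this illustration; how the real on_warning option is invoked is an assumption.

```ruby
# Hypothetical helper: return the first document, stop as soon as a second
# one is seen, and report the extra document once.
def first_document(source, on_warning: nil)
  first = nil
  count = 0
  catch(:first_document_found) do
    source.each do |doc|
      count += 1
      first = doc if count == 1
      throw(:first_document_found) if count > 1
    end
  end
  if count > 1
    message = "input holds more than one document; returning the first"
    on_warning ? on_warning.call(message) : warn(message)
  end
  first
end

pulled = []
source = Enumerator.new do |yielder|
  [{ "id" => 1 }, { "id" => 2 }, { "id" => 3 }].each do |doc|
    pulled << doc # record how far the source was actually read
    yielder << doc
  end
end

warnings = []
result = first_document(source, on_warning: ->(message) { warnings << message })
empty_result = first_document([])
```

Only two of the three documents are pulled from the source, which is the bounded-memory property that `process(io).first` lacks.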