Class: SmarterJSON::Parser

Inherits:
Object
Defined in:
lib/smarter_json/parser.rb

Overview

Hand-rolled, single-pass FSM parser. It recognizes four layers of syntax:

  • Layer 1: strict JSON (RFC 8259).
  • Layer 2: JSON5 additions: line/block comments, trailing comma, unquoted ECMAScript identifier keys, single-quoted strings, hex numbers, leading/trailing decimal points, Infinity/NaN, explicit + sign, \-line-continuation inside strings.
  • Layer 3: HJSON-inspired additions: #/comment-marker rule, triple-quoted strings, quoteless single-line strings, implicit root object, newline-as-separator, broader unquoted keys, recognized-literals-win.
  • Layer 4: smarter_json additions: UTF-8 BOM skip, smart/curly quotes, Python literals (True/False/None) and undefined, underscores in numeric literals, and encoding validation (SmarterJSON::EncodingError).

Constant Summary

LBRACE =
0x7B
RBRACE =
0x7D
LBRACKET =
0x5B
RBRACKET =
0x5D
COLON =
0x3A
COMMA =
0x2C
DQUOTE =
0x22
SQUOTE =
0x27
BACKSLASH =
0x5C
SLASH =
0x2F
STAR =
0x2A
HASH =
0x23
MINUS =
0x2D
PLUS =
0x2B
DOT =
0x2E
ZERO =
0x30
NINE =
0x39
LOWER_E =
0x65
UPPER_E =
0x45
LOWER_T =
0x74
LOWER_F =
0x66
LOWER_N =
0x6E
LOWER_U =
0x75
LOWER_X =
0x78
UPPER_X =
0x58
UPPER_I =
0x49
UPPER_N =
0x4E
UPPER_T =
0x54
UPPER_F =
0x46
UNDERSCORE =
0x5F
DOLLAR =
0x24
SPACE =
0x20
TAB =
0x09
LF =
0x0A
CR =
0x0D
NOT_NUMERIC =
Object.new
HEX_RE =
/\A[-+]?0[xX][0-9a-fA-F_]+\z/.freeze
DEC_RE =
/\A[-+]?(?:0|[1-9][0-9_]*)?(?:\.[0-9_]*)?(?:[eE][-+]?[0-9_]+)?\z/.freeze
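
To make the two numeric patterns concrete, here is a small sketch that copies HEX_RE and DEC_RE verbatim and runs a few candidate tokens through them. The helper `numeric_shape` is hypothetical and exists only for this illustration; how the parser itself combines the two patterns is not shown on this page.

```ruby
# Patterns copied verbatim from the constant summary.
HEX_RE = /\A[-+]?0[xX][0-9a-fA-F_]+\z/.freeze
DEC_RE = /\A[-+]?(?:0|[1-9][0-9_]*)?(?:\.[0-9_]*)?(?:[eE][-+]?[0-9_]+)?\z/.freeze

# Hypothetical helper (not part of the library): report which pattern,
# if any, accepts the token. Hex is checked first.
def numeric_shape(token)
  return :hex if HEX_RE.match?(token)
  return :decimal if DEC_RE.match?(token)

  :not_numeric
end

%w[0x1F -0Xff 1_000 .5 5. +1.5e-3 01 1e abc].each do |token|
  puts format("%-8s %s", token, numeric_shape(token))
end
```

Leading zeros ("01") and a dangling exponent ("1e") are rejected, while underscores and leading or trailing dots are accepted. Every component of DEC_RE is optional, so the empty string also matches it; presumably the caller only applies the pattern to non-empty tokens, but that is an inference from the pattern alone.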
NEEDS_DECIMAL_FIXUP =

Matches a decimal literal that BigDecimal() would reject as-is: a leading dot (“.5”) or a dot not followed by a digit (“5.”, “5.e3”). The pattern matches if and only if normalize_for_bigdecimal would change the string, so normalization is skipped whenever it does not match.

/\A[+-]?\.|\.(?:[eE]|\z)/.freeze
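
The "match only when normalization would change the string" idea can be sketched as follows. The body of normalize_for_bigdecimal is not shown on this page, so `normalize_decimal_sketch` below is a hypothetical stand-in that assumes normalization means padding a missing digit with "0"; only the regex is taken from the source.

```ruby
# Pattern copied verbatim from the constant summary.
NEEDS_DECIMAL_FIXUP = /\A[+-]?\.|\.(?:[eE]|\z)/.freeze

# Hypothetical stand-in for normalize_for_bigdecimal (assumed behavior).
def normalize_decimal_sketch(text)
  # Fast path: hand back the very same String when no fixup is needed.
  return text unless NEEDS_DECIMAL_FIXUP.match?(text)

  text
    .sub(/\A([+-]?)\./) { "#{Regexp.last_match(1)}0." } # leading dot: pad on the left
    .sub(/\.(?=[eE]|\z)/, ".0")                          # dot without a digit: pad on the right
end

[".5", "-.5", "5.", "5.e3", "1.5"].each do |literal|
  puts "#{literal.inspect} -> #{normalize_decimal_sketch(literal).inspect}"
end
```

The guard keeps the common case ("1.5", "42", "1e5") free of string allocation, which is the point of having a separate pre-check pattern.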
BLANK_HEAD =
/\A[[:space:]]+/.freeze
BLANK_TAIL =
/[[:space:]]+\z/.freeze
DEFAULT_OPTIONS =

All caller-facing settings live in one options hash (smarter_csv style).

{
  acceleration: true, # use the C extension when available
  encoding: nil, # label the input's encoding (no transcoding)
  symbolize_keys: false, # Symbol keys instead of String
  duplicate_key: :last_wins, # :last_wins | :first_wins | :raise
  bigdecimal_load: :auto, # :auto | :float | :bigdecimal (Oj-compatible)
}.freeze
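
As a rough illustration of how a single options hash behaves, the sketch below reproduces the defaults and the `DEFAULT_OPTIONS.merge(options)` step used by the constructor. The helper name `resolve_options` is hypothetical; the hash contents are copied from above.

```ruby
# Defaults copied from the constant summary (comments omitted).
DEFAULT_OPTIONS = {
  acceleration: true,
  encoding: nil,
  symbolize_keys: false,
  duplicate_key: :last_wins,
  bigdecimal_load: :auto,
}.freeze

# Hypothetical helper: the same merge the constructor performs.
def resolve_options(options = {})
  DEFAULT_OPTIONS.merge(options)
end

opts = resolve_options(symbolize_keys: true, duplicate_key: :raise)
puts opts.inspect

# Hash#merge returns a new Hash, so the frozen defaults are never mutated.
puts DEFAULT_OPTIONS[:symbolize_keys].inspect
```

A plain merge does not reject unknown keys, so a misspelled option would pass through the merge silently. Whether the library validates keys elsewhere is not visible on this page.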

Instance Method Summary

Constructor Details

#initialize(input, options = {}) ⇒ Parser

Returns a new instance of Parser.

Raises:

  • (ArgumentError)


# File 'lib/smarter_json/parser.rb', line 144

def initialize(input, options = {})
  raise ArgumentError, "input must be a String" unless input.is_a?(String)

  opts = DEFAULT_OPTIONS.merge(options)
  @symbolize_keys  = opts[:symbolize_keys]
  @duplicate_key   = opts[:duplicate_key]
  @bigdecimal_load = opts[:bigdecimal_load]

  encoding = opts[:encoding]
  @input = encoding ? input.dup.force_encoding(encoding) : input
  raise EncodingError, "invalid byte sequence for #{@input.encoding.name}" unless @input.valid_encoding?

  @bytesize = @input.bytesize
  # Skip a UTF-8 BOM (EF BB BF) at the start of input.
  @pos = @input.getbyte(0) == 0xEF && @input.getbyte(1) == 0xBB && @input.getbyte(2) == 0xBF ? 3 : 0
  @line = 1
  @col = 1
end
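
Two details of the constructor are easy to check in isolation with core String methods: the encoding option relabels the bytes without transcoding them, and the BOM check works on raw bytes. The helper `bom_offset` is a hypothetical name for the expression used in the constructor.

```ruby
# Hypothetical helper mirroring the constructor's BOM check.
def bom_offset(input)
  input.getbyte(0) == 0xEF && input.getbyte(1) == 0xBB && input.getbyte(2) == 0xBF ? 3 : 0
end

with_bom = "\uFEFF{\"a\":1}"              # U+FEFF encodes to EF BB BF in UTF-8
start = bom_offset(with_bom)
puts with_bom.byteslice(start..).inspect  # payload without the BOM
puts bom_offset("").inspect               # getbyte returns nil past the end, so 0

# force_encoding only changes the label; the bytes stay the same.
latin1_bytes = "caf\xE9".b                # "café" in ISO-8859-1, tagged as binary
as_utf8   = latin1_bytes.dup.force_encoding("UTF-8")
as_latin1 = latin1_bytes.dup.force_encoding("ISO-8859-1")
puts as_utf8.valid_encoding?              # the lone 0xE9 byte is not valid UTF-8
puts as_latin1.valid_encoding?
```

Because the constructor calls `dup` before `force_encoding`, the caller's String keeps its original encoding label; the validity check then decides whether EncodingError is raised.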

Instance Method Details

#each_value ⇒ Object

Yield each top-level value until EOF (JSONL / NDJSON / concatenated / whitespace-separated). Used by the block form of SmarterJSON.process.



# File 'lib/smarter_json/parser.rb', line 187

def each_value
  loop do
    skip_whitespace_and_comments
    break if eof?

    yield parse_document
  end
  nil
end
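
The control flow of each_value can be tried out with a toy stand-in. The class below is hypothetical and far simpler than the real parser: its "documents" are bare non-negative integers and it knows nothing about comments. Only the loop in `each_value` follows the listing above.

```ruby
# Toy stand-in (hypothetical): each document is a bare non-negative integer.
class IntegerStream
  WHITESPACE = [0x20, 0x09, 0x0A, 0x0D].freeze

  def initialize(input)
    @input = input
    @pos = 0
  end

  # Same shape as the each_value listing: skip, stop at EOF, yield one value.
  def each_value
    loop do
      skip_whitespace
      break if eof?

      yield parse_document
    end
    nil
  end

  private

  def eof?
    @pos >= @input.bytesize
  end

  def skip_whitespace
    @pos += 1 while !eof? && WHITESPACE.include?(@input.getbyte(@pos))
  end

  def parse_document
    start = @pos
    @pos += 1 while !eof? && @input.getbyte(@pos).between?(0x30, 0x39)
    raise ArgumentError, "unexpected byte at offset #{start}" if @pos == start

    Integer(@input.byteslice(start, @pos - start), 10)
  end
end

seen = []
result = IntegerStream.new("1\n2\n\n3 4\n").each_value { |value| seen << value }
puts seen.inspect
puts result.inspect
```

Values are handed to the block one at a time and the method itself returns nil, so nothing is accumulated on the caller's behalf. Since the method uses a bare `yield`, calling it without a block on non-empty input raises LocalJumpError in plain Ruby.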

#parse ⇒ Object

Without a block, parse auto-detects the document count at no extra cost by reusing the “is there trailing content?” check that used to raise:

  • 0 documents -> nil
  • 1 document -> the value itself (single-document path, no Array allocated)
  • 2+ documents (NDJSON / JSONL / concatenated / whitespace-separated) -> an Array of every value

Commas do NOT separate documents (only whitespace, newlines, or concatenation do), so a bracketless comma list still raises in parse_document.



# File 'lib/smarter_json/parser.rb', line 169

def parse
  skip_whitespace_and_comments
  return nil if eof?

  value = parse_document
  skip_whitespace_and_comments
  return value if eof?

  results = [value]
  until eof?
    results << parse_document
    skip_whitespace_and_comments
  end
  results
end
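
The 0 / 1 / 2+ behavior of parse can be reproduced with a toy stand-in. The class below is hypothetical: documents are bare non-negative integers and comments are not handled. Only the body of `parse` follows the listing above.

```ruby
# Toy stand-in (hypothetical): each document is a bare non-negative integer.
class IntegerDocuments
  WHITESPACE = [0x20, 0x09, 0x0A, 0x0D].freeze

  def initialize(input)
    @input = input
    @pos = 0
  end

  # Same shape as the parse listing: nil, a single value, or an Array.
  def parse
    skip_whitespace
    return nil if eof?

    value = parse_document
    skip_whitespace
    return value if eof?

    results = [value]
    until eof?
      results << parse_document
      skip_whitespace
    end
    results
  end

  private

  def eof?
    @pos >= @input.bytesize
  end

  def skip_whitespace
    @pos += 1 while !eof? && WHITESPACE.include?(@input.getbyte(@pos))
  end

  def parse_document
    start = @pos
    @pos += 1 while !eof? && @input.getbyte(@pos).between?(0x30, 0x39)
    raise ArgumentError, "unexpected byte at offset #{start}" if @pos == start

    Integer(@input.byteslice(start, @pos - start), 10)
  end
end

["", "42", "1\n2 3"].each do |source|
  puts "#{source.inspect} -> #{IntegerDocuments.new(source).parse.inspect}"
end
```

In this toy, "1,2" raises because the comma is neither whitespace nor the start of a document, which mirrors the rule that commas do not separate documents.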