Class: SmarterJSON::Parser
- Inherits: Object
  - Object
    - SmarterJSON::Parser
- Defined in: lib/smarter_json/parser.rb
Overview
Hand-rolled, single-pass FSM (finite-state machine) parser. It accepts the following layers of syntax:
- Layer 1: strict JSON (RFC 8259).
- Layer 2: JSON5 additions (line and block comments, trailing commas, unquoted ECMAScript identifier keys, single-quoted strings, hex numbers, leading and trailing decimal points, Infinity/NaN, an explicit + sign, and \-line-continuation inside strings).
- Layer 3: HJSON-inspired additions (the #/comment-marker rule, triple-quoted strings, quoteless single-line strings, an implicit root object, newline as a separator, broader unquoted keys, and recognized-literals-win).
- Layer 4: smarter_json additions (UTF-8 BOM skip, smart/curly quotes, Python literals True/False/None and undefined, underscores in numeric literals, and encoding validation via SmarterJSON::EncodingError).
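As a rough illustration of how several layers meet on bare words, the sketch below resolves an unquoted token against a literal table before falling back to a quoteless string ("recognized literals win"). LITERALS and bare_word_value are hypothetical names, and mapping None and undefined to nil is an assumption; this page does not say what the parser returns for them.

```ruby
# Hypothetical literal table, grouped by the layer that introduces each word.
# Assumption: None and undefined both become nil.
LITERALS = {
  "true" => true, "false" => false, "null" => nil,      # Layer 1: strict JSON
  "Infinity" => Float::INFINITY, "NaN" => Float::NAN,  # Layer 2: JSON5
  "True" => true, "False" => false, "None" => nil,      # Layer 4: Python literals
  "undefined" => nil                                    # Layer 4
}.freeze

# "Recognized literals win": a bare word that is exactly a known literal yields
# that literal; anything else is kept as an HJSON-style quoteless string.
def bare_word_value(word)
  LITERALS.fetch(word) { word }
end

bare_word_value("True")        # => true
bare_word_value("true story")  # => "true story" (not an exact literal, so it stays a string)
```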
Constant Summary
- LBRACE = 0x7B
- RBRACE = 0x7D
- LBRACKET = 0x5B
- RBRACKET = 0x5D
- COLON = 0x3A
- COMMA = 0x2C
- DQUOTE = 0x22
- SQUOTE = 0x27
- BACKSLASH = 0x5C
- SLASH = 0x2F
- STAR = 0x2A
- HASH = 0x23
- MINUS = 0x2D
- PLUS = 0x2B
- DOT = 0x2E
- ZERO = 0x30
- NINE = 0x39
- LOWER_E = 0x65
- UPPER_E = 0x45
- LOWER_T = 0x74
- LOWER_F = 0x66
- LOWER_N = 0x6E
- LOWER_U = 0x75
- LOWER_X = 0x78
- UPPER_X = 0x58
- UPPER_I = 0x49
- UPPER_N = 0x4E
- UPPER_T = 0x54
- UPPER_F = 0x46
- UNDERSCORE = 0x5F
- DOLLAR = 0x24
- SPACE = 0x20
- TAB = 0x09
- LF = 0x0A
- CR = 0x0D
- NOT_NUMERIC = Object.new
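The byte constants suggest that the scanner dispatches on raw bytes from String#getbyte, which returns an Integer instead of allocating a one-character String. A minimal sketch of such a dispatch follows; value_kind and its return symbols are hypothetical, since the real parser's states are not shown on this page.

```ruby
# A subset of the constants listed above.
LBRACE   = 0x7B
LBRACKET = 0x5B
DQUOTE   = 0x22
SQUOTE   = 0x27
MINUS    = 0x2D
PLUS     = 0x2B
DOT      = 0x2E
ZERO     = 0x30
NINE     = 0x39

# Hypothetical dispatcher: choose a sub-parser from the byte at +pos+.
def value_kind(input, pos)
  case input.getbyte(pos) # nil once pos is past the end of the string
  when nil                          then :eof
  when LBRACE                       then :object
  when LBRACKET                     then :array
  when DQUOTE, SQUOTE               then :string
  when ZERO..NINE, MINUS, PLUS, DOT then :number # simplification: "-Infinity" also lands here
  else                                   :literal_or_quoteless # true, None, Infinity, quoteless text...
  end
end
```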
- HEX_RE = /\A[-+]?0[xX][0-9a-fA-F_]+\z/.freeze
- DEC_RE = /\A[-+]?(?:0|[1-9][0-9_]*)?(?:\.[0-9_]*)?(?:[eE][-+]?[0-9_]+)?\z/.freeze
- NEEDS_DECIMAL_FIXUP = /\A[+-]?\.|\.(?:[eE]|\z)/.freeze
  Matches a decimal string that BigDecimal() would reject as-is: one with a leading dot (".5") or with a dot not followed by a digit ("5.", "5.e3"). It matches if and only if normalize_for_bigdecimal would change the string, so when it does not match, normalization is skipped.
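One way the three numeric patterns could fit together is sketched below. It assumes, without confirmation from this page, that underscores are deleted before conversion and that a non-numeric token is signalled with NOT_NUMERIC. normalize_for_bigdecimal here is a hypothetical reimplementation, convert_number is a hypothetical helper, and Float stands in for BigDecimal so the sketch has no dependencies.

```ruby
HEX_RE = /\A[-+]?0[xX][0-9a-fA-F_]+\z/.freeze
DEC_RE = /\A[-+]?(?:0|[1-9][0-9_]*)?(?:\.[0-9_]*)?(?:[eE][-+]?[0-9_]+)?\z/.freeze
NEEDS_DECIMAL_FIXUP = /\A[+-]?\.|\.(?:[eE]|\z)/.freeze
NOT_NUMERIC = Object.new

# Hypothetical reimplementation: add the digit a bare dot is missing.
def normalize_for_bigdecimal(s)
  s = s.sub(/\A([+-]?)\./) { "#{$1}0." } # ".5" -> "0.5", "-.5" -> "-0.5"
  s.sub(/\.(?=[eE]|\z)/, ".0")           # "5." -> "5.0", "5.e3" -> "5.0e3"
end

# Hypothetical converter for an unquoted numeric token.
def convert_number(token)
  return NOT_NUMERIC unless token.match?(/[0-9]/) # DEC_RE alone also accepts "", "-" and "."

  if HEX_RE.match?(token)
    Integer(token.delete("_")) # Integer() understands the sign and the 0x prefix
  elsif DEC_RE.match?(token)
    digits = token.delete("_")
    digits = normalize_for_bigdecimal(digits) if NEEDS_DECIMAL_FIXUP.match?(digits)
    digits.match?(/[.eE]/) ? Float(digits) : Integer(digits, 10)
  else
    NOT_NUMERIC # e.g. "007" (leading zeros) or "12abc"
  end
end

convert_number("5.e3")  # => 5000.0
convert_number("1_000") # => 1000
```

The NEEDS_DECIMAL_FIXUP guard mirrors the documented intent: well-formed decimals such as "1.5" skip both sub calls entirely.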
- BLANK_HEAD = /\A[[:space:]]+/.freeze
- BLANK_TAIL = /[[:space:]]+\z/.freeze
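This page does not say where BLANK_HEAD and BLANK_TAIL are used; trimming quoteless HJSON values is one plausible use and is an assumption here. Unlike String#strip, which only removes ASCII whitespace and NUL, the POSIX class [[:space:]] is Unicode-aware in Ruby, so it also trims characters such as U+00A0 (no-break space). trim_blank is a hypothetical helper.

```ruby
BLANK_HEAD = /\A[[:space:]]+/.freeze
BLANK_TAIL = /[[:space:]]+\z/.freeze

# Hypothetical helper: trim leading and trailing whitespace, including
# Unicode spaces that String#strip leaves in place.
def trim_blank(s)
  s.sub(BLANK_HEAD, "").sub(BLANK_TAIL, "")
end

trim_blank("  hello world \t")  # => "hello world"
trim_blank("\u00A0value\u00A0") # => "value"
```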
- DEFAULT_OPTIONS =

      {
        acceleration:    true,       # use the C extension when available
        encoding:        nil,        # label the input's encoding (no transcoding)
        symbolize_keys:  false,      # Symbol keys instead of String keys
        duplicate_key:   :last_wins, # :last_wins | :first_wins | :raise
        bigdecimal_load: :auto,      # :auto | :float | :bigdecimal (Oj-compatible)
      }.freeze

  All caller-facing settings live in one options hash (smarter_csv style).
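Because the constructor merges caller options over DEFAULT_OPTIONS, a caller only passes the settings it wants to change. The sketch below shows how symbolize_keys and duplicate_key might apply when an object member is stored; store_pair and DuplicateKeyError are hypothetical names, since this page shows neither the helper nor the error class raised for duplicate keys.

```ruby
# Hypothetical error class; the gem's own error for duplicate keys is not shown here.
class DuplicateKeyError < StandardError; end

DEFAULT_OPTIONS = {
  acceleration: true, encoding: nil, symbolize_keys: false,
  duplicate_key: :last_wins, bigdecimal_load: :auto
}.freeze

# Hypothetical helper: store one object member under the given options.
def store_pair(hash, key, value, opts)
  key = key.to_sym if opts[:symbolize_keys]
  unless hash.key?(key)
    hash[key] = value
    return hash
  end

  case opts[:duplicate_key]
  when :last_wins  then hash[key] = value # later value overwrites
  when :first_wins then nil               # keep the value already stored
  when :raise      then raise DuplicateKeyError, "duplicate key: #{key.inspect}"
  else raise ArgumentError, "unknown duplicate_key policy: #{opts[:duplicate_key].inspect}"
  end
  hash
end

opts = DEFAULT_OPTIONS.merge(duplicate_key: :first_wins, symbolize_keys: true)
members = {}
store_pair(members, "a", 1, opts)
store_pair(members, "a", 2, opts)
members == { a: 1 } # => true
```

Hash#merge returns a new, unfrozen Hash, so per-call settings never touch the frozen defaults.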
Instance Method Summary
- #each_value ⇒ Object
  Yield each top-level value until EOF (JSONL / NDJSON / concatenated / whitespace-separated).
- #initialize(input, options = {}) ⇒ Parser (constructor)
  Returns a new instance of Parser.
- #parse ⇒ Object
  No-block form: auto-detects the document count at no extra cost (reusing the "is there trailing content?" check that used to raise).
Constructor Details
#initialize(input, options = {}) ⇒ Parser
Returns a new instance of Parser.
    # File 'lib/smarter_json/parser.rb', line 144
    def initialize(input, options = {})
      raise ArgumentError, "input must be a String" unless input.is_a?(String)

      opts = DEFAULT_OPTIONS.merge(options)
      @symbolize_keys  = opts[:symbolize_keys]
      @duplicate_key   = opts[:duplicate_key]
      @bigdecimal_load = opts[:bigdecimal_load]

      # Label (not transcode) the input when an encoding is given, then validate it.
      encoding = opts[:encoding]
      @input = encoding ? input.dup.force_encoding(encoding) : input
      raise EncodingError, "invalid byte sequence for #{@input.encoding.name}" unless @input.valid_encoding?

      @bytesize = @input.bytesize
      # Skip a UTF-8 BOM (EF BB BF) at the start of input.
      @pos = @input.getbyte(0) == 0xEF && @input.getbyte(1) == 0xBB && @input.getbyte(2) == 0xBF ? 3 : 0
      @line = 1
      @col  = 1
    end
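The encoding option only relabels the bytes (String#force_encoding) and never transcodes them (String#encode), so the same bytes can be valid under one label and invalid under another. The sketch below repeats the constructor's label, validate, and BOM-skip steps as a standalone function; prepare_input and InvalidInputEncoding are hypothetical stand-ins, the latter for SmarterJSON::EncodingError.

```ruby
# Hypothetical stand-in for SmarterJSON::EncodingError.
class InvalidInputEncoding < StandardError; end

# Returns [labeled_input, start_offset], following the constructor's steps.
def prepare_input(input, encoding = nil)
  raise ArgumentError, "input must be a String" unless input.is_a?(String)

  # force_encoding relabels the receiver in place, hence the dup.
  labeled = encoding ? input.dup.force_encoding(encoding) : input
  unless labeled.valid_encoding?
    raise InvalidInputEncoding, "invalid byte sequence for #{labeled.encoding.name}"
  end

  # A UTF-8 BOM (EF BB BF) is skipped by starting the scan at byte 3.
  bom = labeled.getbyte(0) == 0xEF && labeled.getbyte(1) == 0xBB && labeled.getbyte(2) == 0xBF
  [labeled, bom ? 3 : 0]
end

latin1_bytes = "caf\xE9".b # "café" as ISO-8859-1 bytes
text, offset = prepare_input(latin1_bytes, "ISO-8859-1")
text.encoding == Encoding::ISO_8859_1 # => true
text.bytesize                         # => 4 (relabeled, not transcoded)
```

Labeling the same four bytes as "UTF-8" instead raises InvalidInputEncoding, because 0xE9 followed by the end of input is not a valid UTF-8 sequence.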
Instance Method Details
#each_value ⇒ Object
Yield each top-level value until EOF (JSONL / NDJSON / concatenated / whitespace-separated). Used by the block form of SmarterJSON.process.
    # File 'lib/smarter_json/parser.rb', line 187
    def each_value
      loop do
        skip_whitespace_and_comments
        break if eof?

        yield parse_document
      end
      nil
    end
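To see the loop shape of each_value in isolation, the toy reader below applies it to a deliberately tiny grammar. Everything in it is hypothetical: each document is a bare integer, comments are # or // line comments, and the enum_for fallback for block-less calls is an addition that the source above does not show.

```ruby
require "strscan"

# Toy stand-in for SmarterJSON::Parser#each_value (integers only).
class ToyValueReader
  def initialize(input)
    @ss = StringScanner.new(input)
  end

  # Same loop shape as each_value: skip filler, stop at EOF, yield one document.
  def each_value
    return enum_for(:each_value) unless block_given? # hypothetical convenience

    loop do
      skip_whitespace_and_comments
      break if @ss.eos?

      yield parse_document
    end
    nil
  end

  private

  # Whitespace plus '#' or '//' line comments.
  def skip_whitespace_and_comments
    @ss.skip(%r{(?:\s+|(?:#|//)[^\n]*)*})
  end

  def parse_document
    token = @ss.scan(/-?\d+/) or raise ArgumentError, "unexpected input at byte #{@ss.pos}"
    Integer(token, 10)
  end
end

ToyValueReader.new("1 2\n# comment\n3 // trailing\n-4").each_value.to_a # => [1, 2, 3, -4]
```

As in the real method, a comma is not a separator here: "1, 2" yields 1 and then fails on the comma.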
#parse ⇒ Object
No-block form: auto-detects the document count at no extra cost, reusing the same "is there trailing content?" check that used to raise:
- 0 documents -> nil
- 1 document -> the value itself (single-document path; no Array is allocated)
- 2+ documents (NDJSON / JSONL / concatenated / whitespace-separated) -> an Array of every value
Commas do NOT separate documents (only whitespace, newlines, or direct concatenation do), so a bracketless comma-separated list still raises in parse_document.
    # File 'lib/smarter_json/parser.rb', line 169
    def parse
      skip_whitespace_and_comments
      return nil if eof?

      value = parse_document
      skip_whitespace_and_comments
      return value if eof?

      results = [value]
      until eof?
        results << parse_document
        skip_whitespace_and_comments
      end
      results
    end
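The result-shaping rule in parse (nil, the bare value, or an Array) does not depend on JSON itself. A minimal sketch applies it to any Enumerator of already-parsed documents; shape_documents is a hypothetical helper and, like the method above, allocates an Array only once a second document turns up.

```ruby
# Hypothetical helper: parse's 0 / 1 / 2+ rule over an Enumerator of documents.
def shape_documents(docs)
  first = begin
    docs.next
  rescue StopIteration
    return nil # 0 documents
  end

  second = begin
    docs.next
  rescue StopIteration
    return first # exactly 1 document: no Array allocated
  end

  results = [first, second]
  loop { results << docs.next } # Kernel#loop ends quietly on StopIteration
  results
end

shape_documents([].each)        # => nil
shape_documents([42].each)      # => 42
shape_documents([1, 2, 3].each) # => [1, 2, 3]
```

The rule has two ambiguities worth knowing about. Assuming null parses to nil, an empty input and a single null document both return nil. Likewise, a single top-level array such as [1, 2] and the two documents "1 2" both come back as [1, 2]. A caller that must distinguish these cases can use each_value instead.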