Class: RedQuilt::Inline::LinkScanner
- Inherits: Object
- Defined in: lib/red_quilt/inline/link_scanner.rb
Overview
Pure byte-level scanner for link and image tails: inline link bodies `(dest "title")`, bracketed reference labels `[label]`, and link destination URI normalization. Operates only on the document source string (no arena, token stream, or parser state), so it can be exercised in isolation. Inline::Builder owns one instance and feeds it absolute byte offsets.
Constant Summary
- NIL_PAIR =
[nil, nil].freeze
- URL_SAFE_BYTE =
Bytes left verbatim by normalize_uri: ASCII alphanumerics plus the URL sub-delims / reserved chars that the spec keeps unencoded. Everything else is percent-encoded.
begin
  a = Array.new(256, false)
  (0x30..0x39).each { |b| a[b] = true } # 0-9
  (0x41..0x5A).each { |b| a[b] = true } # A-Z
  (0x61..0x7A).each { |b| a[b] = true } # a-z
  "-._~:/?#@!$&'()*+,;=".each_byte { |b| a[b] = true }
  a.freeze
end
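As a rough illustration of how a 256-entry lookup table like this behaves, the sketch below rebuilds the same table under a hypothetical name (`SAFE`, not part of the library) and probes a few bytes. Note that `%` is not in the safe set; `normalize_uri` handles it as a separate case.

```ruby
# Hypothetical standalone copy of the table, for illustration only.
SAFE = begin
  a = Array.new(256, false)
  (0x30..0x39).each { |b| a[b] = true } # 0-9
  (0x41..0x5A).each { |b| a[b] = true } # A-Z
  (0x61..0x7A).each { |b| a[b] = true } # a-z
  "-._~:/?#@!$&'()*+,;=".each_byte { |b| a[b] = true }
  a.freeze
end

# Indexing by byte value is a constant-time membership test.
letter_ok  = SAFE["a".ord] # ASCII letter: left verbatim
space_ok   = SAFE[" ".ord] # space: would be percent-encoded
percent_ok = SAFE["%".ord] # percent sign: not in the table
high_ok    = SAFE[0xC3]    # non-ASCII byte: would be percent-encoded
```

Indexing an array by byte value avoids a per-byte regex or `include?` call, which is presumably why the table is built once and frozen.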
Instance Method Summary
-
#initialize(source) ⇒ LinkScanner
constructor
A new instance of LinkScanner.
-
#inline_link(start_byte) ⇒ Object
Parses an inline link body `(dest "title")` starting at the byte right after the link's closing `]`.
-
#normalize_uri(raw) ⇒ Object
Decodes HTML entities, then percent-encodes every byte outside the URL-safe set. Existing `%XX` escapes are preserved, with their hex digits uppercased.
-
#reference_label(start_byte) ⇒ Object
Reads a bracketed reference label `[label]` starting at start_byte (which must point at the `[`). Returns [label, after_byte], or NIL_PAIR when the label is malformed or over-long.
Constructor Details
#initialize(source) ⇒ LinkScanner
Returns a new instance of LinkScanner.
# File 'lib/red_quilt/inline/link_scanner.rb', line 25
def initialize(source)
  @source = source
end
Instance Method Details
#inline_link(start_byte) ⇒ Object
Parses an inline link body `(dest "title")` starting at the byte right after the link's closing `]`. Returns a hash with `:end_byte`, `:destination`, `:title` on success, or nil if the bytes don't form a valid inline link tail.
# File 'lib/red_quilt/inline/link_scanner.rb', line 33
def inline_link(start_byte)
  return nil unless byte_at(start_byte) == 0x28

  pos = start_byte + 1
  pos = skip_link_whitespace(pos)
  return nil if pos.nil?

  raw_dest = nil
  next_byte = byte_at(pos)
  if next_byte && next_byte != 0x29 && !link_tail_whitespace_byte?(next_byte) && next_byte != 0x0A
    dest_result = parse_link_destination(pos)
    return nil unless dest_result

    raw_dest, pos = dest_result
  end

  ws_end = skip_link_whitespace(pos)
  return nil if ws_end.nil?

  raw_title = nil
  if ws_end > pos
    opener_byte = byte_at(ws_end)
    if opener_byte && (opener_byte == 0x22 || opener_byte == 0x27 || opener_byte == 0x28)
      title_result = parse_link_title(ws_end)
      return nil unless title_result

      raw_title, pos = title_result
      pos = skip_link_whitespace(pos)
      return nil if pos.nil?
    else
      pos = ws_end
    end
  else
    pos = ws_end
  end

  return nil unless byte_at(pos) == 0x29

  destination = raw_dest ? normalize_uri(raw_dest) : ""
  title = raw_title ? decode_link_entities(raw_title) : nil
  { end_byte: pos + 1, destination: destination, title: title }
end
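The method compares raw byte values rather than characters, so the hex literals can be hard to read at a glance. The legend below maps each literal used in `inline_link` to its ASCII character; the constant name `LINK_TAIL_BYTES` is hypothetical and exists only for this illustration.

```ruby
# Legend for the hex byte literals that appear in inline_link.
# LINK_TAIL_BYTES is an illustrative name, not part of the library.
LINK_TAIL_BYTES = {
  0x28 => "(",  # opens the link tail; also accepted as a title opener
  0x29 => ")",  # closes the link tail
  0x22 => "\"", # double-quoted title opener
  0x27 => "'",  # single-quoted title opener
  0x0A => "\n"  # line feed
}.freeze

# Each entry can be checked against Ruby's own character codes.
legend_consistent = LINK_TAIL_BYTES.all? { |byte, char| char.ord == byte }
```

Reading the source with this legend in hand: a title is only attempted when at least one whitespace byte separates it from the destination (`ws_end > pos`) and the next byte is one of the three title openers.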
#normalize_uri(raw) ⇒ Object
Decodes HTML entities, then percent-encodes every byte outside the URL-safe set. Existing `%XX` escapes are preserved, with their hex digits uppercased.
# File 'lib/red_quilt/inline/link_scanner.rb', line 104
def normalize_uri(raw)
  decoded = decode_link_entities(raw)
  bytes = decoded.b
  result = +""
  i = 0
  size = bytes.bytesize
  while i < size
    b = bytes.getbyte(i)
    if b == 0x25 && i + 2 < size && hex_byte?(bytes.getbyte(i + 1)) && hex_byte?(bytes.getbyte(i + 2))
      result << "%"
      result << bytes.getbyte(i + 1).chr.upcase
      result << bytes.getbyte(i + 2).chr.upcase
      i += 3
    elsif URL_SAFE_BYTE[b]
      # All URL-safe bytes are ASCII, so appending the integer
      # codepoint matches b.chr without allocating a 1-char string.
      result << b
      i += 1
    else
      result << format("%%%02X", b)
      i += 1
    end
  end
  result
end
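To make the three branches concrete, here is a minimal standalone sketch of the same percent-encoding loop. It is not the library's method: the names `SAFE_BYTES`, `hex_digit?`, and `percent_normalize` are hypothetical, and the entity-decoding step is deliberately left out because `decode_link_entities` is not shown on this page.

```ruby
# Hypothetical standalone sketch; HTML entity decoding is omitted.
SAFE_BYTES = begin
  a = Array.new(256, false)
  (0x30..0x39).each { |b| a[b] = true } # 0-9
  (0x41..0x5A).each { |b| a[b] = true } # A-Z
  (0x61..0x7A).each { |b| a[b] = true } # a-z
  "-._~:/?#@!$&'()*+,;=".each_byte { |b| a[b] = true }
  a.freeze
end

# True for the ASCII bytes 0-9, A-F, a-f.
def hex_digit?(b)
  (0x30..0x39).cover?(b) || (0x41..0x46).cover?(b) || (0x61..0x66).cover?(b)
end

def percent_normalize(raw)
  bytes = raw.b
  out = +""
  i = 0
  while i < bytes.bytesize
    b = bytes.getbyte(i)
    if b == 0x25 && i + 2 < bytes.bytesize &&
       hex_digit?(bytes.getbyte(i + 1)) && hex_digit?(bytes.getbyte(i + 2))
      # Existing %XX escape: keep it, uppercasing the hex digits.
      out << bytes.byteslice(i, 3).upcase
      i += 3
    elsif SAFE_BYTES[b]
      # URL-safe ASCII byte: copy verbatim.
      out << b.chr
      i += 1
    else
      # Anything else, including a stray "%", is percent-encoded.
      out << format("%%%02X", b)
      i += 1
    end
  end
  out
end

percent_normalize("a b")   # => "a%20b"
percent_normalize("%2f")   # => "%2F"
percent_normalize("100%")  # => "100%25"
percent_normalize("é")     # => "%C3%A9"
```

Working on the binary copy (`raw.b`) means multibyte UTF-8 characters are encoded one byte at a time, which is why `é` becomes two escapes.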
#reference_label(start_byte) ⇒ Object
Reads a bracketed reference label `[label]` starting at start_byte (which must point at the `[`). Returns [label, after_byte], or NIL_PAIR when the label is malformed or over-long.
# File 'lib/red_quilt/inline/link_scanner.rb', line 79
def reference_label(start_byte)
  return NIL_PAIR unless @source.getbyte(start_byte) == 0x5B

  i = start_byte + 1
  while i < @source.bytesize
    b = @source.getbyte(i)
    if b == 0x5D
      label = @source.byteslice(start_byte + 1, i - start_byte - 1).to_s
      return NIL_PAIR if ReferenceDefinition.label_too_long?(label)

      return [label, i + 1]
    elsif b == 0x5B
      # An unescaped `[` inside a reference label voids the form.
      return NIL_PAIR
    elsif b == 0x5C && i + 1 < @source.bytesize
      i += 2
      next
    end
    i += 1
  end
  NIL_PAIR
end
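The scan can be tried in isolation with a sketch like the one below. It takes the source string as an argument instead of reading `@source`, and it replaces `ReferenceDefinition.label_too_long?` (not shown on this page) with an assumed limit of 999 characters, the cap CommonMark places on link labels. The names `scan_label`, `NO_LABEL`, and `MAX_LABEL_CHARS` are hypothetical.

```ruby
# Assumption: labels longer than 999 characters are rejected, following
# the CommonMark limit. The library's real check may differ.
MAX_LABEL_CHARS = 999
NO_LABEL = [nil, nil].freeze

def scan_label(source, start_byte)
  return NO_LABEL unless source.getbyte(start_byte) == 0x5B # "["

  i = start_byte + 1
  while i < source.bytesize
    b = source.getbyte(i)
    if b == 0x5D # "]" closes the label
      label = source.byteslice(start_byte + 1, i - start_byte - 1).to_s
      return NO_LABEL if label.length > MAX_LABEL_CHARS

      return [label, i + 1]
    elsif b == 0x5B # an unescaped "[" voids the label
      return NO_LABEL
    elsif b == 0x5C && i + 1 < source.bytesize # backslash: skip the escaped byte
      i += 2
      next
    end
    i += 1
  end
  NO_LABEL # ran off the end without finding "]"
end

scan_label("[foo] bar", 0)  # => ["foo", 5]
scan_label('[a\]b]', 0)     # => ["a\\]b", 6]
scan_label("[a[b]", 0)      # => [nil, nil]
scan_label("[open", 0)      # => [nil, nil]
```

The returned label keeps its backslash escapes untouched; any unescaping or case folding for reference matching would have to happen elsewhere.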