Class: RedQuilt::Inline::LinkScanner

Inherits:
Object
Defined in:
lib/red_quilt/inline/link_scanner.rb

Overview

Pure byte-level scanner for link / image tails: inline link bodies `(dest "title")`, bracketed reference labels `[label]`, and link destination URI normalization. Operates only on the document source string – no arena, token stream, or parser state – so it can be exercised in isolation. Inline::Builder owns one instance and feeds it absolute byte offsets.

Constant Summary

NIL_PAIR =
[nil, nil].freeze
URL_SAFE_BYTE =

Bytes left verbatim by normalize_uri: ASCII alphanumerics plus the URL sub-delims / reserved chars that the spec keeps unencoded. Everything else is percent-encoded.

begin
  a = Array.new(256, false)
  (0x30..0x39).each { |b| a[b] = true } # 0-9
  (0x41..0x5A).each { |b| a[b] = true } # A-Z
  (0x61..0x7A).each { |b| a[b] = true } # a-z
  "-._~:/?#@!$&'()*+,;=".each_byte { |b| a[b] = true }
  a.freeze
end
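
As a rough illustration of how a 256-entry boolean table turns the "is this byte safe?" question into a single array index, the sketch below rebuilds the same table under a hypothetical name (`SKETCH_URL_SAFE`) and wraps it in a hypothetical helper (`sketch_url_safe?`). Neither name is part of RedQuilt.

```ruby
# Same construction as URL_SAFE_BYTE above, under a hypothetical name.
SKETCH_URL_SAFE = begin
  table = Array.new(256, false)
  [0x30..0x39, 0x41..0x5A, 0x61..0x7A].each do |range| # 0-9, A-Z, a-z
    range.each { |b| table[b] = true }
  end
  '-._~:/?#@!$&\'()*+,;='.each_byte { |b| table[b] = true }
  table.freeze
end

# Hypothetical helper: true when the first byte of +char+ would be left verbatim.
def sketch_url_safe?(char)
  SKETCH_URL_SAFE[char.getbyte(0)]
end

puts sketch_url_safe?("a") # true
puts sketch_url_safe?("/") # true
puts sketch_url_safe?(" ") # false
puts sketch_url_safe?("%") # false
puts sketch_url_safe?("[") # false
```

Note that `%` is not in the table, so a `%` that does not start a valid `%XX` escape is itself percent-encoded.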

Instance Method Summary

Constructor Details

#initialize(source) ⇒ LinkScanner

Returns a new instance of LinkScanner.



# File 'lib/red_quilt/inline/link_scanner.rb', line 25

def initialize(source)
  @source = source
end

Instance Method Details

#inline_link(start_byte) ⇒ Object

Parses an inline link body `(dest "title")` starting at the byte right after the link’s closing `]`. Returns a hash with `:end_byte`, `:destination`, `:title` on success, or nil if the bytes don’t form a valid inline link tail.



# File 'lib/red_quilt/inline/link_scanner.rb', line 33

def inline_link(start_byte)
  return nil unless byte_at(start_byte) == 0x28 # "("

  pos = start_byte + 1
  pos = skip_link_whitespace(pos)
  return nil if pos.nil?

  raw_dest = nil
  next_byte = byte_at(pos)
  if next_byte && next_byte != 0x29 && !link_tail_whitespace_byte?(next_byte) && next_byte != 0x0A
    dest_result = parse_link_destination(pos)
    return nil unless dest_result

    raw_dest, pos = dest_result
  end

  ws_end = skip_link_whitespace(pos)
  return nil if ws_end.nil?

  raw_title = nil
  if ws_end > pos
    opener_byte = byte_at(ws_end)
    # A title may open with `"`, `'`, or `(`.
    if opener_byte && (opener_byte == 0x22 || opener_byte == 0x27 || opener_byte == 0x28)
      title_result = parse_link_title(ws_end)
      return nil unless title_result

      raw_title, pos = title_result
      pos = skip_link_whitespace(pos)
      return nil if pos.nil?
    else
      pos = ws_end
    end
  else
    pos = ws_end
  end

  return nil unless byte_at(pos) == 0x29 # ")"

  destination = raw_dest ? normalize_uri(raw_dest) : ""
  title = raw_title ? decode_link_entities(raw_title) : nil
  { end_byte: pos + 1, destination: destination, title: title }
end
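
Because `start_byte` is an absolute byte offset, it differs from a character index as soon as the source contains multi-byte characters. The following is a minimal sketch of how a caller might derive that offset for an illustrative source string. The constant names are hypothetical, and the scanner itself is not invoked here.

```ruby
# Illustrative input; the "ü" occupies two bytes in UTF-8.
SKETCH_SOURCE = "see [ü](/url \"t\")"

# Byte offset of the link text's closing "]". String#b returns a binary
# copy, so String#index counts bytes rather than characters.
SKETCH_CLOSE_BYTE = SKETCH_SOURCE.b.index("]")

# inline_link expects the offset of the byte right after that "]".
SKETCH_START_BYTE = SKETCH_CLOSE_BYTE + 1

puts SKETCH_SOURCE.index("]")                          # 6 (character index)
puts SKETCH_CLOSE_BYTE                                 # 7 (byte index)
puts SKETCH_SOURCE.getbyte(SKETCH_START_BYTE) == 0x28  # true, the tail opens with "("
```

With the real class loaded, this offset would presumably be passed as `LinkScanner.new(source).inline_link(start_byte)`, and the returned `:end_byte` would tell the caller where to resume scanning.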

#normalize_uri(raw) ⇒ Object

Decodes HTML entities first, then percent-encodes every byte outside the URL-safe set. Existing `%XX` escapes are preserved, with their hex digits uppercased.



# File 'lib/red_quilt/inline/link_scanner.rb', line 104

def normalize_uri(raw)
  decoded = decode_link_entities(raw)
  bytes = decoded.b
  result = +""
  i = 0
  size = bytes.bytesize
  while i < size
    b = bytes.getbyte(i)
    # "%" followed by two hex digits: keep the escape, uppercasing the digits.
    if b == 0x25 && i + 2 < size &&
       hex_byte?(bytes.getbyte(i + 1)) && hex_byte?(bytes.getbyte(i + 2))
      result << "%"
      result << bytes.getbyte(i + 1).chr.upcase
      result << bytes.getbyte(i + 2).chr.upcase
      i += 3
    elsif URL_SAFE_BYTE[b]
      # All URL-safe bytes are ASCII, so appending the integer
      # codepoint matches b.chr without allocating a 1-char string.
      result << b
      i += 1
    else
      result << format("%%%02X", b)
      i += 1
    end
  end
  result
end
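
To make the encoding rules concrete, here is a compact stand-alone approximation of the loop above. It is only a sketch: the names `SKETCH_SAFE_BYTES` and `sketch_normalize_uri` are hypothetical, and the entity-decoding step is skipped because `decode_link_entities` is not shown on this page.

```ruby
# Bytes left verbatim, mirroring URL_SAFE_BYTE (hypothetical name).
SKETCH_SAFE_BYTES = (
  ("0".."9").to_a.join + ("A".."Z").to_a.join + ("a".."z").to_a.join +
  '-._~:/?#@!$&\'()*+,;='
).bytes.freeze

# Approximation of normalize_uri WITHOUT the entity-decoding step.
def sketch_normalize_uri(raw)
  # On a binary string each token is either a full %XX escape or one byte.
  raw.b.scan(/%\h\h|./m).map do |token|
    if token.bytesize == 3
      token.upcase                              # existing escape: uppercase hex
    elsif SKETCH_SAFE_BYTES.include?(token.getbyte(0))
      token                                     # URL-safe byte: keep verbatim
    else
      format("%%%02X", token.getbyte(0))        # everything else: percent-encode
    end
  end.join
end

puts sketch_normalize_uri("/föö %2f") # /f%C3%B6%C3%B6%20%2F
puts sketch_normalize_uri("100%")     # 100%25
```

The second call shows that a `%` not followed by two hex digits is treated like any other unsafe byte.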

#reference_label(start_byte) ⇒ Object

Reads a bracketed reference label `[label]` starting at start_byte (which must point at the `[`). Returns [label, after_byte], or NIL_PAIR when the label is malformed or over-long.



# File 'lib/red_quilt/inline/link_scanner.rb', line 79

def reference_label(start_byte)
  return NIL_PAIR unless @source.getbyte(start_byte) == 0x5B # "["

  i = start_byte + 1
  while i < @source.bytesize
    b = @source.getbyte(i)
    if b == 0x5D # "]" closes the label
      label = @source.byteslice(start_byte + 1, i - start_byte - 1).to_s
      return NIL_PAIR if ReferenceDefinition.label_too_long?(label)

      return [label, i + 1]
    elsif b == 0x5B
      # An unescaped `[` inside a reference label voids the form.
      return NIL_PAIR
    elsif b == 0x5C && i + 1 < @source.bytesize # backslash: skip the escaped byte
      i += 2
      next
    end
    i += 1
  end
  NIL_PAIR
end
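
The scan above can be tried in isolation with a stand-alone sketch like the one below. It takes the source as an argument instead of reading `@source`, and it replaces `ReferenceDefinition.label_too_long?` (not shown on this page) with a plain length check. The 999-character default and all `sketch_`/`SKETCH_` names are assumptions made for illustration.

```ruby
SKETCH_NIL_PAIR = [nil, nil].freeze

# Stand-alone approximation of reference_label; max_length is an assumed limit.
def sketch_reference_label(source, start_byte, max_length: 999)
  return SKETCH_NIL_PAIR unless source.getbyte(start_byte) == 0x5B # "["

  i = start_byte + 1
  while i < source.bytesize
    case source.getbyte(i)
    when 0x5D # "]" closes the label
      label = source.byteslice(start_byte + 1, i - start_byte - 1)
      return label.length > max_length ? SKETCH_NIL_PAIR : [label, i + 1]
    when 0x5B # an unescaped "[" voids the label
      return SKETCH_NIL_PAIR
    when 0x5C # backslash: also skip the byte it escapes, if any
      i += 1 if i + 1 < source.bytesize
    end
    i += 1
  end
  SKETCH_NIL_PAIR # ran off the end without finding "]"
end

p sketch_reference_label("[foo][bar]", 5) # ["bar", 10]
p sketch_reference_label("[a[b]", 0)      # [nil, nil]
p sketch_reference_label("[a\\]b]", 0)    # ["a\\]b", 6]
```

Returning a shared frozen pair on failure lets callers destructure the result (`label, after = ...`) in both the success and failure cases without allocating a new array.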