Class: URIPattern::Compiler

Inherits:
Object
Includes:
Canonicalization
Defined in:
lib/uri_pattern/compiler.rb

Constant Summary

SEGMENT_REGEXPS =
{
  pathname: "[^/]+?",
  hostname: "[^.]+?"
}.freeze
DEFAULT_SEGMENT =
"[^#?{}]+?"
DELIMITER_CHARS =
{
  pathname: "/",
  hostname: "."
}.freeze
LITERAL_TOKEN_TYPES =

Token types that carry literal text and are buffered (not turned into a capture). Shared by the top-level and in-group compile loops.

%i[char escaped_char invalid_char].freeze
WILDCARD_PREFIX =
"_w"
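
To illustrate how the segment patterns above behave once they are embedded in an anchored regexp, here is a minimal sketch. The `SegmentSketch` module and its `capture` and `anchored` helpers are hypothetical and exist only for this example; only the pattern strings are copied from the constants. The sketch assumes that a component without an entry in SEGMENT_REGEXPS falls back to DEFAULT_SEGMENT.

```ruby
module SegmentSketch
  # Values copied from the constant summary above.
  SEGMENT_REGEXPS = { pathname: "[^/]+?", hostname: "[^.]+?" }.freeze
  DEFAULT_SEGMENT = "[^#?{}]+?"

  # Hypothetical helper (not part of URIPattern::Compiler): wrap the
  # component's segment pattern in a named capture group. The fallback to
  # DEFAULT_SEGMENT for other components is an assumption of this sketch.
  def self.capture(name, component)
    "(?<#{name}>#{SEGMENT_REGEXPS.fetch(component, DEFAULT_SEGMENT)})"
  end

  # Hypothetical helper: anchor a regexp source at both ends.
  def self.anchored(source)
    Regexp.new("\\A#{source}\\z")
  end
end

path_re = SegmentSketch.anchored("/users/#{SegmentSketch.capture('id', :pathname)}")
p path_re.match("/users/42")[:id]    # => "42"
p path_re.match?("/users/42/edit")   # => false, a segment never crosses "/"

host_re = SegmentSketch.anchored("#{SegmentSketch.capture('sub', :hostname)}\\.example\\.com")
p host_re.match("api.example.com")[:sub]  # => "api"
p host_re.match?("a.b.example.com")       # => false, a segment never crosses "."
```

The quantifier is lazy (`+?`), but because the whole expression is anchored, a single segment still expands to fill everything up to the next delimiter.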

Instance Method Summary

Methods included from Canonicalization

#canonicalize_hostname, #canonicalize_ipv6, #canonicalize_port, #encode_run

Constructor Details

#initialize(tokens, component:, ignore_case: false, opaque_path: false, ipv6: false) ⇒ Compiler

Returns a new instance of Compiler.



# File 'lib/uri_pattern/compiler.rb', line 48

def initialize(tokens, component:, ignore_case: false, opaque_path: false, ipv6: false)
  @tokens = tokens
  @component = component
  @ignore_case = ignore_case
  @opaque_path = opaque_path
  @ipv6 = ipv6
  @wildcard_index = 0
  @names_order = []
  @wildcard_name_map = {}
  @literal_buf = +""
  @seen_names = {}
end

Instance Method Details

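#compile ⇒ Object

Builds the regexp source for the token stream, anchors it with \A and \z, and returns a Hash with the keys :regexp, :names and :wildcard_name_map. Raises URIPattern::Error if the resulting source is not a valid Regexp.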



# File 'lib/uri_pattern/compiler.rb', line 61

def compile
  regexp_str = translate_v_class_sets(build_regexp_string)
  flags = @ignore_case ? Regexp::IGNORECASE : 0
  begin
    regexp = Regexp.new("\\A#{regexp_str}\\z", flags)
  rescue RegexpError => e
    raise URIPattern::Error, "Invalid pattern: #{e.message}"
  end
  { regexp: regexp, names: @names_order, wildcard_name_map: @wildcard_name_map }
end
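
The anchoring and error wrapping in #compile can be tried in isolation. The following is a minimal sketch, not the library's code path: `CompileSketch`, its `Error` class (a stand-in for URIPattern::Error) and the `build` helper are hypothetical, and the `source` argument plays the role of the string that #compile obtains from its private helpers.

```ruby
module CompileSketch
  # Stand-in for URIPattern::Error so the sketch is self-contained.
  class Error < StandardError; end

  # Hypothetical free-standing version of the anchoring and error wrapping
  # shown in #compile above.
  def self.build(source, ignore_case: false)
    flags = ignore_case ? Regexp::IGNORECASE : 0
    Regexp.new("\\A#{source}\\z", flags)
  rescue RegexpError => e
    # Surface regexp syntax problems as a pattern error.
    raise Error, "Invalid pattern: #{e.message}"
  end
end

re = CompileSketch.build("/books/(?<id>[^/]+?)", ignore_case: true)
puts re.match?("/BOOKS/7")        # true: IGNORECASE is set
puts re.match?("/books/7/extra")  # false: \z anchors the end of the input

begin
  CompileSketch.build("(unclosed")
rescue CompileSketch::Error => e
  puts e.message                  # starts with "Invalid pattern: "
end
```

Using \A and \z rather than ^ and $ matters in Ruby, where ^ and $ match at line boundaries and would let input containing a newline slip past the pattern.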

#flush_literals(result, before_part: false) ⇒ Object

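Consecutive literal characters accumulate in a buffer. flush_literals canonicalizes the whole buffered run through the component’s encode callback (which may raise), appends the Regexp-escaped result to result, and resets the buffer. This mirrors the spec, which applies an encoding callback to each fixed-text part of a pattern. When before_part is true, a trailing delimiter is kept out of the canonicalized run and appended separately.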



# File 'lib/uri_pattern/compiler.rb', line 26

def flush_literals(result, before_part: false)
  return if @literal_buf.empty?
  run = @literal_buf
  @literal_buf = +""
  delim = delimiter_char
  # When this run is immediately followed by a part (name/group/wildcard), a
  # trailing delimiter ("/" for pathname, "." for hostname) is that part's
  # prefix, not part of this fixed run. Canonicalize the run WITHOUT it — so e.g.
  # pathname dot-segments collapse correctly (`/a/../` → run `/a/..` → `/`) — and
  # re-append the delimiter verbatim for pull_delimiter_prefix / the next literal
  # to consume. This keeps the Compiler consistent with PatternString and the
  # spec, which treat the prefix as a separate token.
  if before_part && !delim.empty? && run != delim && run.end_with?(delim)
    result << Regexp.escape(encode_run(run[0...-delim.length]))
    result << Regexp.escape(delim)
  else
    result << Regexp.escape(encode_run(run))
  end
end
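
The trailing-delimiter rule in #flush_literals is easier to follow on concrete inputs. The sketch below isolates only the split decision: `FlushSketch.split_run` is a hypothetical helper written for this example, and it returns the pieces instead of encoding them. In the real method the first piece is passed through encode_run, and every piece is Regexp-escaped before being appended to result.

```ruby
module FlushSketch
  # Hypothetical helper mirroring the condition in #flush_literals.
  # Returns [run_to_encode, delimiter] when the trailing delimiter belongs
  # to the following part, otherwise [run_to_encode].
  def self.split_run(run, delim:, before_part:)
    if before_part && !delim.empty? && run != delim && run.end_with?(delim)
      [run[0...-delim.length], delim]
    else
      [run]
    end
  end
end

# Pathname: the final "/" is the prefix of the part that follows.
p FlushSketch.split_run("/users/", delim: "/", before_part: true)   # => ["/users", "/"]

# Hostname: the same rule with "." as the delimiter.
p FlushSketch.split_run("api.", delim: ".", before_part: true)      # => ["api", "."]

# A run that is only the delimiter is left whole.
p FlushSketch.split_run("/", delim: "/", before_part: true)         # => ["/"]

# Without a following part, nothing is split off.
p FlushSketch.split_run("/users/", delim: "/", before_part: false)  # => ["/users/"]

# Components without a delimiter (empty string) are never split.
p FlushSketch.split_run("a=1&", delim: "", before_part: true)       # => ["a=1&"]
```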