Module: URIPattern::Canonicalization
- Included in:
- Compiler, PatternString
- Defined in:
- lib/uri_pattern/canonicalization.rb
Overview
Per-component canonicalization of fixed (literal) text, shared by the regexp Compiler and the pattern-string generator so both apply identical encoding. Including classes must define @component, @opaque_path and @ipv6.
Instance Method Summary collapse
-
#canonicalize_hostname(run) ⇒ Object
WHATWG “canonicalize a hostname”: strip tab/newline/CR, end the host at the first path delimiter (“/”, “\”, “#”, “?”), then run the host parser.
-
#canonicalize_ipv6(run) ⇒ Object
WHATWG “canonicalize an IPv6 hostname”: only “[”, “]”, “:” and ASCII hex digits are permitted; hex letters are lowercased.
-
#canonicalize_port(run) ⇒ Object
WHATWG basic URL parser “port state”: read leading ASCII digits, stop at the first non-digit, fail if the number exceeds 65535, and serialize without leading zeros.
-
#encode_run(run) ⇒ Object
Canonicalize one fixed-text run for the current component.
Instance Method Details
#canonicalize_hostname(run) ⇒ Object
WHATWG “canonicalize a hostname”: strip tab/newline/CR, end the host at the first path delimiter (“/”, “\”, “#”, “?”), then run the host parser. A host that fails to parse (forbidden code points, bad IDN, etc.) raises.
50 51 52 53 54 55 56 57 58 59 60 61 |
# File 'lib/uri_pattern/canonicalization.rb', line 50 def canonicalize_hostname(run) return run if run.empty? value = run.gsub(/[\t\n\r]/, "") return "" if value.empty? if (idx = value.index(/[\/\\#?]/)) value = value[0, idx] end return "" if value.empty? URI::WhatwgParser::HostParser.new.parse(value) rescue => e raise URIPattern::Error, "Invalid hostname #{run.inspect}: #{e.}" end |
#canonicalize_ipv6(run) ⇒ Object
WHATWG “canonicalize an IPv6 hostname”: only “[”, “]”, “:” and ASCII hex digits are permitted; hex letters are lowercased.
65 66 67 68 69 70 71 72 73 74 |
# File 'lib/uri_pattern/canonicalization.rb', line 65 def canonicalize_ipv6(run) run.each_char.map do |c| case c when "[", "]", ":" then c when /[0-9a-fA-F]/ then c.downcase else raise URIPattern::Error, "Invalid IPv6 hostname character #{c.inspect} in #{run.inspect}" end end.join end |
#canonicalize_port(run) ⇒ Object
WHATWG basic URL parser “port state”: read leading ASCII digits, stop at the first non-digit, fail if the number exceeds 65535, and serialize without leading zeros. (Default-port stripping is protocol-dependent and not applied to the pattern string.)
38 39 40 41 42 43 44 45 |
# File 'lib/uri_pattern/canonicalization.rb', line 38 def canonicalize_port(run) return run if run.empty? digits = run[/\A[0-9]*/] raise URIPattern::Error, "Invalid port #{run.inspect}" if digits.empty? number = digits.to_i raise URIPattern::Error, "Invalid port #{run.inspect}" if number > 65_535 number.to_s end |
#encode_run(run) ⇒ Object
Canonicalize one fixed-text run for the current component. The percent-encode components (pathname/query/fragment/username/password) are delegated to the spec’s “dummy URL” canonicalizers in URLParser so the URL parser applies the exact spec encode set and dot-segment handling. Hostname/port keep their dedicated parsers; protocol (and anything else) passes through unchanged.
13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
# File 'lib/uri_pattern/canonicalization.rb', line 13 def encode_run(run) case @component when :hostname @ipv6 ? canonicalize_ipv6(run) : canonicalize_hostname(run) when :port canonicalize_port(run) when :pathname URIPattern::URLParser.canonicalize_pathname_run(run, opaque_path: @opaque_path) when :query URIPattern::URLParser.canonicalize_search_run(run) when :fragment URIPattern::URLParser.canonicalize_hash_run(run) when :username URIPattern::URLParser.canonicalize_username_run(run) when :password URIPattern::URLParser.canonicalize_password_run(run) else run end end |