Module: Dommy::Internal::Punycode
- Defined in: lib/dommy/internal/punycode.rb
Overview
RFC 3492 Punycode encoder / decoder. Used by `Internal::IDNA` to turn IDN labels (e.g. `日本`) into ASCII (`wgv71a`) before they reach Ruby’s `URI` parser, which rejects non-ASCII hosts.
`encode` / `decode` produce / consume the bare Punycode form (no `xn--` prefix). The IDNA layer is responsible for adding / stripping the prefix.
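As a rough illustration of that division of labour, the sketch below wraps an already-encoded bare label in the prefix and shows Ruby's `URI` accepting the ASCII host while rejecting the raw Unicode one. `ACE_PREFIX`, `add_ace_prefix`, and `strip_ace_prefix` are hypothetical names used only for this example; they are not taken from `Internal::IDNA`.

```ruby
require "uri"

# Hypothetical helpers, not part of Dommy: a sketch of what an IDNA layer
# might do around the bare Punycode form.
ACE_PREFIX = "xn--"

def add_ace_prefix(bare_label)
  "#{ACE_PREFIX}#{bare_label}"
end

def strip_ace_prefix(label)
  label.delete_prefix(ACE_PREFIX)
end

# "wgv71a" is the bare Punycode form of 日本 given in the overview.
ascii_host = "#{add_ace_prefix("wgv71a")}.jp"
puts URI.parse("http://#{ascii_host}/").host # prints "xn--wgv71a.jp"

# The raw Unicode host is rejected before any host parsing happens.
begin
  URI.parse("http://日本.jp/")
rescue URI::InvalidURIError => e
  puts "rejected: #{e.class}"
end
```

Keeping the prefix out of the codec means the same `encode` / `decode` pair can be exercised directly against the RFC 3492 sample strings, which are given without a prefix.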
Defined Under Namespace
Classes: Error
Constant Summary
- BASE = 36
- TMIN = 1
- TMAX = 26
- SKEW = 38
- DAMP = 700
- INITIAL_BIAS = 72
- INITIAL_N = 0x80
- DELIMITER = "-"
Class Method Summary
- .adapt(delta, numpoints, firsttime) ⇒ Object
  RFC 3492 §6.1.
- .char_to_digit(ch) ⇒ Object
- .decode(input) ⇒ Object
  Decode a bare Punycode string back to Unicode.
- .digit_to_char(d) ⇒ Object
  Punycode digits: 0..25 → ‘a’..‘z’; 26..35 → ‘0’..‘9’.
- .encode(input) ⇒ Object
  Encode a Unicode label into bare Punycode (no `xn--` prefix).
- .threshold(k, bias) ⇒ Object
Class Method Details
.adapt(delta, numpoints, firsttime) ⇒ Object
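Bias adaptation function from RFC 3492 §6.1. Called after each encoded or decoded delta to recompute the bias used by `threshold`.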
    # File 'lib/dommy/internal/punycode.rb', line 155

    def self.adapt(delta, numpoints, firsttime)
      delta = firsttime ? delta / DAMP : delta / 2
      delta += delta / numpoints
      k = 0
      while delta > ((BASE - TMIN) * TMAX) / 2
        delta /= (BASE - TMIN)
        k += BASE
      end
      k + (((BASE - TMIN + 1) * delta) / (delta + SKEW))
    end
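To make the arithmetic concrete, here is a standalone copy of the function with the relevant constants from the Constant Summary inlined. It is applied to the two deltas that arise when encoding `日本`; those input figures were derived by hand from the listed `encode` and are worth re-checking against the real module.

```ruby
# Constants copied from the Constant Summary.
BASE = 36
TMIN = 1
TMAX = 26
SKEW = 38
DAMP = 700

# Standalone copy of the listed method, for experimentation only.
def adapt(delta, numpoints, firsttime)
  delta = firsttime ? delta / DAMP : delta / 2
  delta += delta / numpoints
  k = 0
  while delta > ((BASE - TMIN) * TMAX) / 2
    delta /= (BASE - TMIN)
    k += BASE
  end
  k + (((BASE - TMIN + 1) * delta) / (delta + SKEW))
end

# First delta is damped hard: 25957 / 700 = 37, plus 37 / 1 = 74,
# no loop pass (74 <= 455), then 36 * 74 / (74 + 38) = 23.
puts adapt(25_957, 1, true)  # 23

# Later deltas are only halved: 327 + 327 / 2 = 490 > 455, so one loop
# pass gives delta = 14 and k = 36, then 36 + 36 * 14 / (14 + 38) = 45.
puts adapt(654, 2, false)    # 45
```

The heavy damping on the first call reflects that the first delta is usually far larger than later ones, since it covers the jump from `INITIAL_N` to the first non-ASCII code point.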
.char_to_digit(ch) ⇒ Object
    # File 'lib/dommy/internal/punycode.rb', line 187

    def self.char_to_digit(ch)
      cp = ch.ord
      case cp
      when ("a".ord)..("z".ord)
        cp - "a".ord
      when ("A".ord)..("Z".ord)
        cp - "A".ord
      when ("0".ord)..("9".ord)
        cp - "0".ord + 26
      else
        raise Error, "invalid punycode digit: #{ch.inspect}"
      end
    end
.decode(input) ⇒ Object
Decode a bare Punycode string back to Unicode. Inverse of `encode`. Raises `Error` on malformed input.
    # File 'lib/dommy/internal/punycode.rb', line 91

    def self.decode(input)
      str = input.to_s
      output = []

      # The last delimiter splits basic code points from the
      # extended portion. If there is no delimiter, the whole input
      # is the extended portion.
      idx = str.rindex(DELIMITER)
      if idx
        str[0...idx].each_char do |ch|
          cp = ch.ord
          raise Error, "non-basic code point in basic section" if cp >= INITIAL_N

          output << cp
        end
        extended = str[(idx + 1)..]
      else
        extended = str
      end

      n = INITIAL_N
      i = 0
      bias = INITIAL_BIAS

      pos = 0
      ext_chars = extended.each_char.to_a
      while pos < ext_chars.length
        oldi = i
        w = 1
        k = BASE
        loop do
          raise Error, "truncated punycode" if pos >= ext_chars.length

          digit = char_to_digit(ext_chars[pos])
          pos += 1
          raise Error, "punycode overflow" if digit > (((2 ** 31) - 1 - i) / w)

          i += digit * w
          t = threshold(k, bias)
          break if digit < t

          raise Error, "punycode overflow" if w > (((2 ** 31) - 1) / (BASE - t))

          w *= (BASE - t)
          k += BASE
        end
        bias = adapt(i - oldi, output.length + 1, oldi.zero?)
        n += i / (output.length + 1)
        raise Error, "punycode overflow" if n > ((2 ** 31) - 1)

        i %= (output.length + 1)
        output.insert(i, n)
        i += 1
      end

      output.pack("U*")
    end
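The inner `loop` is the part that is hardest to follow from the listing, so the sketch below isolates it: it reads one variable-length integer from the digit stream. `read_delta` and `digit_value` are hypothetical names for this example. The body of `threshold` is not shown on this page, so the clamp used here is an assumption based on RFC 3492 §6.2. Overflow checks and digit validation are left out for brevity.

```ruby
BASE = 36
TMIN = 1
TMAX = 26

# Assumed shape of `threshold`: k - bias clamped into TMIN..TMAX.
def threshold(k, bias)
  (k - bias).clamp(TMIN, TMAX)
end

# Same mapping as `char_to_digit`, without validation:
# a..z (either case) => 0..25, 0..9 => 26..35.
def digit_value(ch)
  ch.match?(/[0-9]/) ? ch.ord - "0".ord + 26 : ch.downcase.ord - "a".ord
end

# Read one variable-length integer starting at `pos`.
# Returns [value, position_after_last_digit].
def read_delta(chars, pos, bias)
  value = 0
  w = 1
  k = BASE
  loop do
    raise ArgumentError, "truncated punycode" if pos >= chars.length

    digit = digit_value(chars[pos])
    pos += 1
    value += digit * w
    t = threshold(k, bias)
    break if digit < t # a digit below the threshold ends the integer

    w *= BASE - t
    k += BASE
  end
  [value, pos]
end

chars = "wgv71a".chars
p read_delta(chars, 0, 72) # [25957, 3]  (22 + 6 * 35 + 21 * 1225), bias = INITIAL_BIAS
p read_delta(chars, 3, 23) # [654, 6]    (33 + 27 * 23 + 0), bias 23 as returned by adapt
```

In `decode`, the first value gives `n = 0x80 + 25957 = 26085` (日) inserted at position 0. The second is added to `i = 1`, giving 655, so `n` advances by `655 / 2 = 327` to 26412 (本) and the insertion position is `655 % 2 = 1`.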
.digit_to_char(d) ⇒ Object
Punycode digits: 0..25 → ‘a’..‘z’; 26..35 → ‘0’..‘9’.
    # File 'lib/dommy/internal/punycode.rb', line 179

    def self.digit_to_char(d)
      if d < 26
        ("a".ord + d).chr
      else
        ("0".ord + d - 26).chr
      end
    end
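One way to sanity-check the digit alphabet is to restate it as a lookup table. The sketch below is an alternative, table-based view of the same mapping, not the module's implementation; `DIGITS` is a name introduced here, and `ArgumentError` stands in for the module's `Error` class.

```ruby
# Table-based restatement of the Punycode digit alphabet (illustration only).
DIGITS = [*"a".."z", *"0".."9"].freeze

def digit_to_char(d)
  DIGITS.fetch(d) # raises IndexError for values above 35
end

def char_to_digit(ch)
  # Upper-case letters map to the same digits as lower-case ones.
  DIGITS.index(ch.downcase) or raise ArgumentError, "invalid punycode digit: #{ch.inspect}"
end

puts digit_to_char(0)    # a
puts digit_to_char(25)   # z
puts digit_to_char(26)   # 0
puts digit_to_char(35)   # 9
puts char_to_digit("W")  # 22
```

The arithmetic version in the listing avoids the array scan in `char_to_digit`, which matters little for labels of at most a few dozen characters.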
.encode(input) ⇒ Object
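Encode a Unicode label into bare Punycode (no `xn--` prefix). For input that contains only ASCII, the code below copies the characters and, when the input is non-empty, appends the `-` delimiter, so `abc` comes back as `abc-` rather than unchanged. Callers that want pure-ASCII pass-through therefore need to check for ASCII-only input before calling `encode`.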
    # File 'lib/dommy/internal/punycode.rb', line 31

    def self.encode(input)
      codepoints = input.to_s.unpack("U*")
      output = +""

      # Step 1: copy basic (ASCII < 0x80) code points to output.
      basic = codepoints.select { |c| c < INITIAL_N }
      output << basic.pack("U*")
      h = b = basic.length
      # RFC 3492 §6.3: append a delimiter whenever there are basic
      # code points, even if no extended encoding follows. The
      # decoder relies on the delimiter to know where the basic
      # section ends.
      output << DELIMITER if b.positive?

      n = INITIAL_N
      delta = 0
      bias = INITIAL_BIAS

      while h < codepoints.length
        # Find the minimum code point >= n in the input.
        m = codepoints.select { |c| c >= n }.min
        raise Error, "punycode overflow" if (m - n) > (((2 ** 31) - 1 - delta) / (h + 1))

        delta += (m - n) * (h + 1)
        n = m

        codepoints.each do |c|
          if c < n
            delta += 1
            raise Error, "punycode overflow" if delta > ((2 ** 31) - 1)
          elsif c == n
            q = delta
            k = BASE
            loop do
              t = threshold(k, bias)
              break if q < t

              digit = t + ((q - t) % (BASE - t))
              output << digit_to_char(digit)
              q = (q - t) / (BASE - t)
              k += BASE
            end
            output << digit_to_char(q)
            bias = adapt(delta, h + 1, h == b)
            delta = 0
            h += 1
          end
        end

        delta += 1
        n += 1
      end

      output
    end
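The digit-emitting loop inside the `elsif c == n` branch can be tried in isolation. The sketch below extracts it into a hypothetical `emit_delta` helper and feeds it the two deltas that arise for `日本`. The `threshold` body is again an assumption (a clamp of `k - bias` into `TMIN..TMAX`, following RFC 3492), since its source is not shown on this page.

```ruby
BASE = 36
TMIN = 1
TMAX = 26

# Assumed shape of `threshold`; the real body is not listed on this page.
def threshold(k, bias)
  (k - bias).clamp(TMIN, TMAX)
end

# Same mapping as the listed `digit_to_char`.
def digit_to_char(d)
  d < 26 ? ("a".ord + d).chr : ("0".ord + d - 26).chr
end

# Emit the variable-length integer for a single delta (illustration only).
def emit_delta(delta, bias)
  out = +""
  q = delta
  k = BASE
  loop do
    t = threshold(k, bias)
    break if q < t # the remaining value fits in a final digit

    out << digit_to_char(t + ((q - t) % (BASE - t)))
    q = (q - t) / (BASE - t)
    k += BASE
  end
  out << digit_to_char(q)
end

puts emit_delta(25_957, 72) # wgv  (first code point, bias = INITIAL_BIAS)
puts emit_delta(654, 23)    # 71a  (second code point, bias 23 from adapt)
```

Concatenating the two results gives `wgv71a`, the bare form quoted in the overview. No delimiter appears because `日本` has no basic code points.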