Class: Unisec::Normalization
- Inherits: Object
- Defined in: lib/unisec/normalization.rb
Overview
Normalization Forms
Constant Summary
- HTML_ESCAPE_BYPASS =
HTML-escapable characters mapped to Unicode counterparts that normalize back to the original character under the compatibility normalization forms (NFKC and NFKD).
{ '<' => ['﹤', '＜'], '>' => ['﹥', '＞'], '"' => ['＂'], "'" => ['＇'], '&' => ['﹠', '＆'] }.freeze
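The property this constant relies on can be checked with plain Ruby. The sketch below is not part of Unisec: `bypass` is a local copy of the mapping written with `\u` escapes, and `compat_only?` is a hypothetical helper. It tests that each counterpart folds to its ASCII character under NFKC and NFKD while staying unchanged under NFC and NFD.

```ruby
# Local copy of the documented mapping, written with \u escapes so the
# small-form (U+FE6x) and fullwidth (U+FFxx) characters stay unambiguous.
bypass = {
  '<' => ["\uFE64", "\uFF1C"],
  '>' => ["\uFE65", "\uFF1E"],
  '"' => ["\uFF02"],
  "'" => ["\uFF07"],
  '&' => ["\uFE60", "\uFF06"]
}

# Hypothetical helper: true when the lookalike folds to the ASCII character
# under the compatibility forms only and is left alone by the canonical forms.
def compat_only?(ascii, lookalike)
  %i[nfkc nfkd].all? { |form| lookalike.unicode_normalize(form) == ascii } &&
    %i[nfc nfd].all? { |form| lookalike.unicode_normalize(form) == lookalike }
end

checks = bypass.flat_map do |ascii, lookalikes|
  lookalikes.map { |lookalike| compat_only?(ascii, lookalike) }
end

puts checks.all?
```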
Instance Attribute Summary
- #nfc ⇒ String (readonly): Normalization Form C (NFC) - Canonical Decomposition, followed by Canonical Composition.
- #nfd ⇒ String (readonly): Normalization Form D (NFD) - Canonical Decomposition.
- #nfkc ⇒ String (readonly): Normalization Form KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition.
- #nfkd ⇒ String (readonly): Normalization Form KD (NFKD) - Compatibility Decomposition.
- #original ⇒ String (readonly): Original input.
Class Method Summary
- .display_reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd]) ⇒ String: Display a CLI-friendly output of the reverse normalization results.
- .nfc(str) ⇒ String: Normalization Form C (NFC) - Canonical Decomposition, followed by Canonical Composition.
- .nfd(str) ⇒ String: Normalization Form D (NFD) - Canonical Decomposition.
- .nfkc(str) ⇒ String: Normalization Form KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition.
- .nfkd(str) ⇒ String: Normalization Form KD (NFKD) - Compatibility Decomposition.
- .replace_bypass(str) ⇒ String: Replace HTML-escapable characters with Unicode counterparts that normalize back to the original character under the compatibility normalization forms (NFKC and NFKD).
- .reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd]) ⇒ Hash: Find the list of symbols that will transform into a given symbol after normalization.
Instance Method Summary
- #display ⇒ String: Display a CLI-friendly output summarizing all normalization forms.
- #display_replace ⇒ String: Display a CLI-friendly output of the XSS payload to bypass HTML escape and what it does once normalized in NFKC & NFKD.
- #initialize(str) ⇒ nil (constructor): Generate all normalization forms for a given input.
- #replace_bypass ⇒ Object: Instance version of Normalization.replace_bypass.
Constructor Details
#initialize(str) ⇒ nil
Generate all normalization forms for a given input
# File 'lib/unisec/normalization.rb', line 43
def initialize(str)
  @original = str
  @nfc = Normalization.nfc(str)
  @nfkc = Normalization.nfkc(str)
  @nfd = Normalization.nfd(str)
  @nfkd = Normalization.nfkd(str)
end
Instance Attribute Details
#nfc ⇒ String (readonly)
Normalization Form C (NFC) - Canonical Decomposition, followed by Canonical Composition
# File 'lib/unisec/normalization.rb', line 26
def nfc
  @nfc
end
#nfd ⇒ String (readonly)
Normalization Form D (NFD) - Canonical Decomposition
# File 'lib/unisec/normalization.rb', line 34
def nfd
  @nfd
end
#nfkc ⇒ String (readonly)
Normalization Form KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition
# File 'lib/unisec/normalization.rb', line 30
def nfkc
  @nfkc
end
#nfkd ⇒ String (readonly)
Normalization Form KD (NFKD) - Compatibility Decomposition
# File 'lib/unisec/normalization.rb', line 38
def nfkd
  @nfkd
end
#original ⇒ String (readonly)
Original input
# File 'lib/unisec/normalization.rb', line 22
def original
  @original
end
Class Method Details
.display_reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd]) ⇒ String
Display a CLI-friendly output of the reverse normalization results
# File 'lib/unisec/normalization.rb', line 195
def self.display_reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd]) # rubocop:disable Metrics/AbcSize
  colorize_form = ->(form_title) { Paint[form_title, :underline, :bold] }
  colorize_char = ->(char) { " #{char} (#{Paint[Unisec::Utils::String.chars2codepoints(char), :red]})\n" }
  out = "#{colorize_form.call('Original')}:\n#{colorize_char.call(target)}"
  res = Unisec::Normalization.reverse_normalize(target, forms: forms)
  # => {nfc: [], nfd: [], nfkc: ["﹤", "＜"], nfkd: ["﹤", "＜"]}
  res.each_key do |k|
    next if res[k].empty?

    out += "#{colorize_form.call(k.to_s.upcase)}\n"
    res[k].each do |v|
      out += colorize_char.call(v)
    end
  end
  out
end
.nfc(str) ⇒ String
Normalization Form C (NFC) - Canonical Decomposition, followed by Canonical Composition
# File 'lib/unisec/normalization.rb', line 54
def self.nfc(str)
  str.unicode_normalize(:nfc)
end
.nfd(str) ⇒ String
Normalization Form D (NFD) - Canonical Decomposition
# File 'lib/unisec/normalization.rb', line 68
def self.nfd(str)
  str.unicode_normalize(:nfd)
end
.nfkc(str) ⇒ String
Normalization Form KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition
# File 'lib/unisec/normalization.rb', line 61
def self.nfkc(str)
  str.unicode_normalize(:nfkc)
end
.nfkd(str) ⇒ String
Normalization Form KD (NFKD) - Compatibility Decomposition
# File 'lib/unisec/normalization.rb', line 75
def self.nfkd(str)
  str.unicode_normalize(:nfkd)
end
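All four class methods delegate to Ruby's built-in String#unicode_normalize, so the difference between the forms can be observed without Unisec. The following is a minimal sketch; `all_forms` is a hypothetical helper and the sample characters were picked only for illustration.

```ruby
# Hypothetical helper: apply the four normalization forms to one string.
def all_forms(str)
  %i[nfc nfd nfkc nfkd].to_h { |form| [form, str.unicode_normalize(form)] }
end

e_acute  = all_forms("\u00E9") # precomposed LATIN SMALL LETTER E WITH ACUTE
ligature = all_forms("\uFB01") # LATIN SMALL LIGATURE FI
angstrom = all_forms("\u212B") # ANGSTROM SIGN

# Canonical decomposition splits the accent off the base letter.
puts e_acute[:nfd].codepoints.map { |cp| format('U+%04X', cp) }.join(' ')

# Canonical forms keep the ligature; compatibility forms expand it.
puts ligature[:nfc] == "\uFB01"
puts ligature[:nfkc]

# A canonical singleton is replaced even by NFC.
puts angstrom[:nfc].codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
```

The ligature case is the one the bypass technique depends on: only the compatibility forms (NFKC, NFKD) rewrite a character into a visually similar but different one.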
.replace_bypass(str) ⇒ String
Replace HTML-escapable characters with Unicode counterparts that normalize back to the original character under the compatibility normalization forms (NFKC and NFKD). Useful for XSS, to bypass HTML escaping. If several values are possible, one is picked randomly.
# File 'lib/unisec/normalization.rb', line 85
def self.replace_bypass(str)
  str = str.dup
  HTML_ESCAPE_BYPASS.each do |k, v|
    str.gsub!(k, v.sample)
  end
  str
end
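To see why such a payload matters, consider an application that escapes HTML first and normalizes afterwards. The sketch below makes several assumptions: `escape_html` is a hypothetical stand-in for whatever escaping the target application performs, the payload is written by hand with fullwidth brackets instead of being produced by replace_bypass (which picks counterparts randomly), and the escape-then-normalize order is the vulnerable scenario being illustrated, not something Unisec does itself.

```ruby
# Hypothetical stand-in for an application's HTML escaping routine.
def escape_html(str)
  escapes = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;' }
  str.gsub(/[&<>"']/, escapes)
end

# Payload using FULLWIDTH LESS-THAN (U+FF1C) and GREATER-THAN (U+FF1E) signs.
payload = "\uFF1Cscript\uFF1Ealert(1)\uFF1C/script\uFF1E"

# The escaper sees no ASCII metacharacters, so the payload passes unchanged.
escaped = escape_html(payload)

# A later NFKC step turns the lookalikes into real angle brackets.
rendered = escaped.unicode_normalize(:nfkc)

puts escaped == payload
puts rendered
```

Normalizing before escaping (or using NFC instead of NFKC where possible) closes this particular gap, since the brackets would then be ASCII by the time the escaper runs.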
.reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd]) ⇒ Hash
Find the list of symbols that will transform into a given symbol after normalization
# File 'lib/unisec/normalization.rb', line 108
def self.reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd])
  forms = Utils::Arguments.to_array_of_sym(forms)
  result = {}
  forms.each do |form|
    result[form] = []
  end
  (0x000000..0x10FFFF).each do |codepoint|
    char = codepoint.chr(Encoding::UTF_8)
    forms.each do |form|
      result[form] << char if (char.unicode_normalize(form) == target) && (char != target)
    end
  rescue RangeError # skip UTF-16 surrogates and potential other invalid code points
    next
  end
  result
end
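The brute-force search above can be tried without Unisec. The following is a reduced sketch, not the library's method: `reverse_normalize_bmp` is a hypothetical name, it handles a single form, and it scans only the Basic Multilingual Plane (U+0000 to U+FFFF) to keep the run short, so it would miss any match in the supplementary planes.

```ruby
# Hypothetical, reduced variant: one form, BMP only.
def reverse_normalize_bmp(target, form)
  matches = []
  (0x0000..0xFFFF).each do |codepoint|
    begin
      char = codepoint.chr(Encoding::UTF_8)
    rescue RangeError
      next # UTF-16 surrogates (U+D800..U+DFFF) are not valid in UTF-8
    end
    # Keep characters that differ from the target but normalize to it.
    matches << char if char != target && char.unicode_normalize(form) == target
  end
  matches
end

nfkc_hits = reverse_normalize_bmp('<', :nfkc)
nfc_hits  = reverse_normalize_bmp('<', :nfc)

puts nfkc_hits.map { |c| format('U+%04X', c.ord) }.join(' ')
puts nfc_hits.empty?
```

For `<`, the hits should include the two counterparts that HTML_ESCAPE_BYPASS lists for it, U+FE64 and U+FF1C, and only under the compatibility form.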
Instance Method Details
#display ⇒ String
Display a CLI-friendly output summarizing all normalization forms
# File 'lib/unisec/normalization.rb', line 142
def display
  colorize = lambda { |form_title, form_attr|
    "#{Paint[form_title.to_s, :underline, :bold]}: #{form_attr}\n #{Paint[Unisec::Utils::String.chars2codepoints(form_attr), :red]}\n"
  }
  colorize.call('Original', @original) +
    colorize.call('NFC', @nfc) +
    colorize.call('NFKC', @nfkc) +
    colorize.call('NFD', @nfd) +
    colorize.call('NFKD', @nfkd)
end
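The shape of that report can be approximated without the Paint gem. In the sketch below, `codepoints_of` is a hypothetical replacement for Unisec::Utils::String.chars2codepoints, whose exact output format is not shown on this page, so the `U+XXXX` style used here is an assumption. `summarize` is likewise a hypothetical name, and all colouring is omitted.

```ruby
# Hypothetical stand-in for chars2codepoints; the U+XXXX format is assumed.
def codepoints_of(str)
  str.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
end

# Uncoloured approximation of the summary, in the same order as #display.
def summarize(str)
  rows = {
    'Original' => str,
    'NFC' => str.unicode_normalize(:nfc),
    'NFKC' => str.unicode_normalize(:nfkc),
    'NFD' => str.unicode_normalize(:nfd),
    'NFKD' => str.unicode_normalize(:nfkd)
  }
  rows.map { |title, value| "#{title}: #{value}\n  #{codepoints_of(value)}\n" }.join
end

report = summarize("\uFB01") # LATIN SMALL LIGATURE FI
puts report
```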
#display_replace ⇒ String
Display a CLI-friendly output of the XSS payload to bypass HTML escape and what it does once normalized in NFKC & NFKD.
# File 'lib/unisec/normalization.rb', line 168
def display_replace
  colorize = lambda { |form_title, form_attr|
    "#{Paint[form_title.to_s, :underline, :bold]}: #{form_attr}\n #{Paint[Unisec::Utils::String.chars2codepoints(form_attr), :red]}\n"
  }
  payload = replace_bypass
  colorize.call('Original', @original) +
    colorize.call('Bypass payload', payload) +
    colorize.call('NFKC', Normalization.nfkc(payload)) +
    colorize.call('NFKD', Normalization.nfkd(payload))
end
#replace_bypass ⇒ Object
Instance version of Normalization.replace_bypass.
# File 'lib/unisec/normalization.rb', line 94
def replace_bypass
  Normalization.replace_bypass(@original)
end