Class: Unisec::Normalization

Inherits: Object

Defined in: lib/unisec/normalization.rb

Overview

Normalization Forms

Constant Summary

HTML_ESCAPE_BYPASS = HTML-escapable characters mapped to Unicode counterparts that normalize back to the original character under the compatibility normalization forms (NFKC and NFKD).

{
  '<' => ['﹤', '＜'],
  '>' => ['﹥', '＞'],
  '"' => ['＂'],
  "'" => ['＇'],
  '&' => ['﹠', '＆']
}.freeze
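
As a quick check of the mapping above, the following sketch uses only Ruby's built-in String#unicode_normalize, not Unisec itself. It verifies that every counterpart folds back to its ASCII character under NFKC and NFKD but not under NFC. The names BYPASS_SAMPLE and folds_to_ascii? are hypothetical and exist only for this illustration.

```ruby
# Hypothetical copy of the table above, for illustration only.
BYPASS_SAMPLE = {
  '<' => ['﹤', '＜'],
  '>' => ['﹥', '＞'],
  '"' => ['＂'],
  "'" => ['＇'],
  '&' => ['﹠', '＆']
}.freeze

# True when every counterpart normalizes to its ASCII key under the given form.
def folds_to_ascii?(form)
  BYPASS_SAMPLE.all? do |ascii, counterparts|
    counterparts.all? { |c| c.unicode_normalize(form) == ascii }
  end
end

puts folds_to_ascii?(:nfkc) # compatibility composition
puts folds_to_ascii?(:nfkd) # compatibility decomposition
puts folds_to_ascii?(:nfc)  # canonical form, counterparts are left untouched
```

Only the compatibility forms perform the fold, which is why the bypass depends on the target applying NFKC or NFKD after its HTML escaping step.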

Instance Attribute Summary

#nfc ⇒ String readonly
Normalization Form C (NFC) - Canonical Decomposition, followed by Canonical Composition.
#nfd ⇒ String readonly
Normalization Form D (NFD) - Canonical Decomposition.
#nfkc ⇒ String readonly
Normalization Form KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition.
#nfkd ⇒ String readonly
Normalization Form KD (NFKD) - Compatibility Decomposition.
#original ⇒ String readonly
Original input.

Class Method Summary

.display_reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd]) ⇒ String
Display a CLI-friendly output of the reverse normalization results.
.nfc(str) ⇒ String
Normalization Form C (NFC) - Canonical Decomposition, followed by Canonical Composition.
.nfd(str) ⇒ String
Normalization Form D (NFD) - Canonical Decomposition.
.nfkc(str) ⇒ String
Normalization Form KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition.
.nfkd(str) ⇒ String
Normalization Form KD (NFKD) - Compatibility Decomposition.
.replace_bypass(str) ⇒ String
Replace HTML-escapable characters with Unicode counterparts that normalize back to the original character under the compatibility normalization forms (NFKC and NFKD).
.reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd]) ⇒ Hash
Find the list of symbols that will transform into a given symbol after normalization.

Instance Method Summary

#display ⇒ String
Display a CLI-friendly output summarizing all normalization forms.
#display_replace ⇒ String
Display a CLI-friendly output of the XSS payload to bypass HTML escape and what it does once normalized in NFKC & NFKD.
#initialize(str) ⇒ nil constructor
Generate all normalization forms for a given input.
#replace_bypass ⇒ Object
Instance version of Normalization.replace_bypass.

Constructor Details

#initialize(str) ⇒ `nil`

Generate all normalization forms for a given input

Parameters:

str (String) —
the target string

# File 'lib/unisec/normalization.rb', line 43

def initialize(str)
  @original = str
  @nfc = Normalization.nfc(str)
  @nfkc = Normalization.nfkc(str)
  @nfd = Normalization.nfd(str)
  @nfkd = Normalization.nfkd(str)
end
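
To make the constructor's effect concrete, here is a minimal sketch that computes the same four forms with core Ruby only, so it runs without the gem. The helper all_forms is hypothetical and returns a Hash instead of setting instance variables. The input is the sequence used in the #display example.

```ruby
# Hypothetical helper mirroring what the constructor stores, using core Ruby only.
def all_forms(str)
  {
    original: str,
    nfc: str.unicode_normalize(:nfc),
    nfkc: str.unicode_normalize(:nfkc),
    nfd: str.unicode_normalize(:nfd),
    nfkd: str.unicode_normalize(:nfkd)
  }
end

forms = all_forms("\u{1E9B 0323}")
forms.each do |name, value|
  changed = value != forms[:original]
  puts "#{name}: #{value.codepoints.length} code point(s), changed: #{changed}"
end
```

For this input NFC leaves the string as is, while the other three forms each produce a different code point sequence.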

Instance Attribute Details

#nfc ⇒ `String` (readonly)

Normalization Form C (NFC) - Canonical Decomposition, followed by Canonical Composition

Returns:

(String) —
input normalized with NFC




# File 'lib/unisec/normalization.rb', line 26

def nfc
  @nfc
end

#nfd ⇒ `String` (readonly)

Normalization Form D (NFD) - Canonical Decomposition

Returns:

(String) —
input normalized with NFD




# File 'lib/unisec/normalization.rb', line 34

def nfd
  @nfd
end

#nfkc ⇒ `String` (readonly)

Normalization Form KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition

Returns:

(String) —
input normalized with NFKC




# File 'lib/unisec/normalization.rb', line 30

def nfkc
  @nfkc
end

#nfkd ⇒ `String` (readonly)

Normalization Form KD (NFKD) - Compatibility Decomposition

Returns:

(String) —
input normalized with NFKD




# File 'lib/unisec/normalization.rb', line 38

def nfkd
  @nfkd
end

#original ⇒ `String` (readonly)

Original input

Returns:

(String) —
untouched input




# File 'lib/unisec/normalization.rb', line 22

def original
  @original
end

Class Method Details

.display_reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd]) ⇒ `String`

Display a CLI-friendly output of the reverse normalization results

Examples:

puts Unisec::Normalization.display_reverse_normalize('<')
# =>
# Original:
#   < (U+003C)
# NFKC
#   ﹤ (U+FE64)
#   ＜ (U+FF1C)
# NFKD
#   ﹤ (U+FE64)
#   ＜ (U+FF1C)

Parameters:

target (String) —
see reverse_normalize
forms (String|Symbol|Array<Symbol>) (defaults to: %i[nfc nfd nfkc nfkd]) —
see reverse_normalize

Returns:

(String) —
CLI-ready output

# File 'lib/unisec/normalization.rb', line 195

def self.display_reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd]) # rubocop:disable Metrics/AbcSize
  colorize_form = ->(form_title) { Paint[form_title, :underline, :bold] }
  colorize_char = ->(char) { "  #{char} (#{Paint[Unisec::Utils::String.chars2codepoints(char), :red]})\n" }
  out = "#{colorize_form.call('Original')}:\n#{colorize_char.call(target)}"
  res = Unisec::Normalization.reverse_normalize(target, forms: forms) # => {nfc: [], nfd: [], nfkc: ["﹤", "＜"], nfkd: ["﹤", "＜"]}
  res.each_key do |k|
    next if res[k].empty?

    out += "#{colorize_form.call(k.to_s.upcase)}\n"
    res[k].each do |v|
      out += colorize_char.call(v)
    end
  end
  out
end

.nfc(str) ⇒ `String`

Normalization Form C (NFC) - Canonical Decomposition, followed by Canonical Composition

Parameters:

str (String) —
the target string

Returns:

(String) —
input normalized with NFC




# File 'lib/unisec/normalization.rb', line 54

def self.nfc(str)
  str.unicode_normalize(:nfc)
end

.nfd(str) ⇒ `String`

Normalization Form D (NFD) - Canonical Decomposition

Parameters:

str (String) —
the target string

Returns:

(String) —
input normalized with NFD




# File 'lib/unisec/normalization.rb', line 68

def self.nfd(str)
  str.unicode_normalize(:nfd)
end

.nfkc(str) ⇒ `String`

Normalization Form KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition

Parameters:

str (String) —
the target string

Returns:

(String) —
input normalized with NFKC




# File 'lib/unisec/normalization.rb', line 61

def self.nfkc(str)
  str.unicode_normalize(:nfkc)
end

.nfkd(str) ⇒ `String`

Normalization Form KD (NFKD) - Compatibility Decomposition

Parameters:

str (String) —
the target string

Returns:

(String) —
input normalized with NFKD




# File 'lib/unisec/normalization.rb', line 75

def self.nfkd(str)
  str.unicode_normalize(:nfkd)
end
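
The four class methods above are thin wrappers around String#unicode_normalize, so the difference between canonical and compatibility forms can be seen with core Ruby alone. The sketch below is illustrative only, and the constants LIGATURE and COMPOSED are hypothetical names.

```ruby
LIGATURE = "\uFB03" # LATIN SMALL LIGATURE FFI
COMPOSED = "\u00E9" # e with acute accent as a single code point

# Canonical forms keep the ligature, compatibility forms expand it.
puts LIGATURE.unicode_normalize(:nfc) == LIGATURE
puts LIGATURE.unicode_normalize(:nfkc)

# NFD splits the accented letter into base letter plus combining mark,
# and NFC composes it back into the single code point.
decomposed = COMPOSED.unicode_normalize(:nfd)
puts decomposed.codepoints.length
puts decomposed.unicode_normalize(:nfc) == COMPOSED
```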

.replace_bypass(str) ⇒ `String`

Replace HTML-escapable characters with Unicode counterparts that normalize back to the original character under the compatibility normalization forms (NFKC and NFKD). Useful for XSS, to bypass HTML escaping. If several counterparts are possible, one is picked at random.

Parameters:

str (String) —
the target string

Returns:

(String) —
input with HTML-escapable characters replaced by their Unicode counterparts

# File 'lib/unisec/normalization.rb', line 85

def self.replace_bypass(str)
  str = str.dup
  HTML_ESCAPE_BYPASS.each do |k, v|
    str.gsub!(k, v.sample)
  end
  str
end
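
The following sketch illustrates why such a payload matters when an application escapes HTML first and applies compatibility normalization afterwards. Everything in it is an assumption made for illustration: escape_html is a hypothetical, deliberately minimal escaper standing in for an application's filter, and PAYLOAD is one possible result of replace_bypass (the real method picks counterparts at random).

```ruby
# Hypothetical minimal HTML escaper; it only knows the five ASCII characters.
HTML_ENTITIES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;'
}.freeze

def escape_html(str)
  str.gsub(/[&<>"']/, HTML_ENTITIES)
end

# One possible bypass payload for '<script>' (fullwidth variants chosen here).
PAYLOAD = '＜script＞'

escaped = escape_html(PAYLOAD)               # no ASCII < or >, so nothing is escaped
rendered = escaped.unicode_normalize(:nfkc)  # normalization applied after escaping

puts escaped == PAYLOAD
puts rendered
```

Under these assumptions the escaper passes the payload through unchanged, and the later NFKC step turns it back into the ASCII markup. Normalizing before escaping avoids this ordering problem.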

.reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd]) ⇒ `Hash`

Find the list of symbols that will transform into a given symbol after normalization

Examples:

Unisec::Normalization.reverse_normalize('<') # => {nfc: [], nfd: [], nfkc: ["﹤", "＜"], nfkd: ["﹤", "＜"]}
Unisec::Normalization.reverse_normalize('.', forms: [:nfkc, :nfkd]) # => {nfkc: ["․", "﹒", "．"], nfkd: ["․", "﹒", "．"]}
Unisec::Normalization.reverse_normalize('ffi', forms: :nfkc) # => {nfkc: ["ﬃ"]}
Unisec::Normalization.reverse_normalize('≯', forms: 'nfd') # => {nfd: ["≯"]}
Unisec::Normalization.reverse_normalize('ô', forms: 'nfc,nfd') # => {nfc: [], nfd: []}

Parameters:

target (String)
forms (String|Symbol|Array<Symbol>) (defaults to: %i[nfc nfd nfkc nfkd])

Returns:

(Hash) —
(results won't include input)

# File 'lib/unisec/normalization.rb', line 108

def self.reverse_normalize(target, forms: %i[nfc nfd nfkc nfkd])
  forms = Utils::Arguments.to_array_of_sym(forms)
  result = {}
  forms.each do |form|
    result[form] = []
  end

  (0x000000..0x10FFFF).each do |codepoint|
    char = codepoint.chr(Encoding::UTF_8)
    forms.each do |form|
      result[form] << char if (char.unicode_normalize(form) == target) && (char != target)
    end
  rescue RangeError # skip UTF-16 surrogates and potential other invalid code points
    next
  end

  result
end
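
The method above scans the whole code point space for every call. As a sketch of the same technique on a smaller scale, the hypothetical helper below (not part of Unisec) restricts the scan to a caller-supplied range and a single form, which is handy for experimenting with one Unicode block at a time.

```ruby
# Hypothetical variant of the scan above, limited to a given range and one form.
def reverse_normalize_in(target, range, form: :nfkc)
  range.each_with_object([]) do |codepoint, found|
    char = codepoint.chr(Encoding::UTF_8)
    found << char if char != target && char.unicode_normalize(form) == target
  rescue RangeError # surrogates and other invalid code points cannot be encoded
    next
  end
end

# Small Form Variants through Halfwidth and Fullwidth Forms.
matches = reverse_normalize_in('<', 0xFE50..0xFFEF)
puts matches.map { |c| format('U+%04X', c.ord) }.join(' ')
```

For '<' this narrower range covers both counterparts listed in the examples above, U+FE64 and U+FF1C.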

Instance Method Details

#display ⇒ `String`

Display a CLI-friendly output summarizing all normalization forms

Examples:

puts Unisec::Normalization.new("\u{1E9B 0323}").display
# =>
# Original: ẛ̣
#   U+1E9B U+0323
# NFC: ẛ̣
#   U+1E9B U+0323
# NFKC: ṩ
#   U+1E69
# NFD: ẛ̣
#   U+017F U+0323 U+0307
# NFKD: ṩ
#   U+0073 U+0323 U+0307

Returns:

(String) —
CLI-ready output

# File 'lib/unisec/normalization.rb', line 142

def display
  colorize = lambda { |form_title, form_attr|
    "#{Paint[form_title.to_s, :underline,
             :bold]}: #{form_attr}\n  #{Paint[Unisec::Utils::String.chars2codepoints(form_attr), :red]}\n"
  }
  colorize.call('Original', @original) +
    colorize.call('NFC', @nfc) +
    colorize.call('NFKC', @nfkc) +
    colorize.call('NFD', @nfd) +
    colorize.call('NFKD', @nfkd)
end
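
The U+XXXX lines in the output come from Unisec::Utils::String.chars2codepoints, whose implementation is not shown on this page. A minimal sketch of an equivalent formatter, under the assumption that it prints each code point as uppercase hexadecimal padded to at least four digits, could look like this. The name codepoints_label is hypothetical.

```ruby
# Hypothetical stand-in for a code point formatter, for illustration only.
def codepoints_label(str)
  str.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
end

input = "\u{1E9B 0323}"
puts codepoints_label(input)
puts codepoints_label(input.unicode_normalize(:nfkd))
```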

#display_replace ⇒ `String`

Display a CLI-friendly output of the XSS payload to bypass HTML escape and what it does once normalized in NFKC & NFKD.

Examples:

puts Unisec::Normalization.new('<script>').display_replace
# =>
# Original: <script>
#   U+003C U+0073 U+0063 U+0072 U+0069 U+0070 U+0074 U+003E
# Bypass payload: ＜script＞
#   U+FF1C U+0073 U+0063 U+0072 U+0069 U+0070 U+0074 U+FF1E
# NFKC: <script>
#   U+003C U+0073 U+0063 U+0072 U+0069 U+0070 U+0074 U+003E
# NFKD: <script>
#   U+003C U+0073 U+0063 U+0072 U+0069 U+0070 U+0074 U+003E

Returns:

(String) —
CLI-ready output

# File 'lib/unisec/normalization.rb', line 168

def display_replace
  colorize = lambda { |form_title, form_attr|
    "#{Paint[form_title.to_s, :underline,
             :bold]}: #{form_attr}\n  #{Paint[Unisec::Utils::String.chars2codepoints(form_attr), :red]}\n"
  }
  payload = replace_bypass
  colorize.call('Original', @original) +
    colorize.call('Bypass payload', payload) +
    colorize.call('NFKC', Normalization.nfkc(payload)) +
    colorize.call('NFKD', Normalization.nfkd(payload))
end

#replace_bypass ⇒ `Object`

Instance version of replace_bypass.




# File 'lib/unisec/normalization.rb', line 94

def replace_bypass
  Normalization.replace_bypass(@original)
end
