Module: ActiveSupport::Multibyte::Unicode
Constant Summary
- UNICODE_VERSION = RbConfig::CONFIG["UNICODE_VERSION"]
  The Unicode version supported by the implementation, read from the running Ruby's build configuration.
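Because the constant is simply the value of an RbConfig entry, you can inspect the same value in plain Ruby without loading Rails. This is a minimal sketch. It assumes only that the running Ruby's RbConfig defines the `UNICODE_VERSION` key, which the constant above relies on.

```ruby
require "rbconfig"

# Read the same build-configuration entry that UNICODE_VERSION is set from.
unicode_version = RbConfig::CONFIG["UNICODE_VERSION"]
puts "Ruby #{RUBY_VERSION} supports Unicode #{unicode_version}"
```

The value depends on the Ruby build, so upgrading Ruby (not Rails) is what changes the supported Unicode version.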
Instance Method Summary
- #compose(codepoints) ⇒ Object
  Composes decomposed characters into the composed form (NFC).
- #decompose(type, codepoints) ⇒ Object
  Decomposes composed characters into the decomposed form (NFKD for :compatibility, NFD otherwise).
- #default_normalization_form ⇒ Object
  Deprecated; will be removed in Rails 7.0.
- #default_normalization_form=(_) ⇒ Object
  Deprecated; will be removed in Rails 7.0.
- #tidy_bytes(string, force = false) ⇒ Object
  Replaces all ISO-8859-1 or CP1252 characters with their UTF-8 equivalents, resulting in a valid UTF-8 string.
Instance Method Details
#compose(codepoints) ⇒ Object
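Composes decomposed characters into the composed form. The codepoints are packed into a UTF-8 string, normalized with String#unicode_normalize(:nfc), and returned as an array of codepoints.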
```ruby
# File 'lib/active_support/multibyte/unicode.rb', line 33
def compose(codepoints)
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end
```
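To see what composition does to actual codepoints, here is a plain-Ruby sketch of the same pack/normalize/unpack round trip. `compose_codepoints` is a hypothetical stand-in that mirrors the method body so the example runs without Rails.

```ruby
# Hypothetical helper mirroring #compose: codepoints -> UTF-8 string -> NFC -> codepoints.
def compose_codepoints(codepoints)
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end

# "e" (U+0065) followed by COMBINING ACUTE ACCENT (U+0301)
decomposed = [0x65, 0x301]
composed = compose_codepoints(decomposed)
p composed # => [233], i.e. U+00E9 "é"
```

The two-codepoint sequence collapses into the single precomposed character, which is what NFC produces when a precomposed form exists.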
#decompose(type, codepoints) ⇒ Object
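Decomposes composed characters into the decomposed form. When type is :compatibility the codepoints are normalized to NFKD; any other value selects canonical decomposition (NFD).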
```ruby
# File 'lib/active_support/multibyte/unicode.rb', line 24
def decompose(type, codepoints)
  if type == :compatibility
    codepoints.pack("U*").unicode_normalize(:nfkd).codepoints
  else
    codepoints.pack("U*").unicode_normalize(:nfd).codepoints
  end
end
```
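The difference between the two branches is easiest to see with a character that has only a compatibility decomposition. This sketch uses a hypothetical `decompose_codepoints` helper that mirrors the method body in plain Ruby. The choice of U+FB01 (LATIN SMALL LIGATURE FI) as the test character is illustrative.

```ruby
# Hypothetical helper mirroring #decompose: :compatibility selects NFKD, anything else NFD.
def decompose_codepoints(type, codepoints)
  form = type == :compatibility ? :nfkd : :nfd
  codepoints.pack("U*").unicode_normalize(form).codepoints
end

# U+FB01 has no canonical decomposition, only a compatibility one to "f" + "i".
ligature = [0xFB01]
canonical = decompose_codepoints(:canonical, ligature)
compat = decompose_codepoints(:compatibility, ligature)
p canonical # => [64257]
p compat    # => [102, 105]

# A precomposed "é" decomposes canonically into "e" + COMBINING ACUTE ACCENT.
p decompose_codepoints(:canonical, [0xE9]) # => [101, 769]
```

Canonical decomposition preserves meaning exactly, while compatibility decomposition may discard formatting distinctions such as ligatures.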
#default_normalization_form ⇒ Object
```ruby
# File 'lib/active_support/multibyte/unicode.rb', line 11
def default_normalization_form
  ActiveSupport::Deprecation.warn(
    "ActiveSupport::Multibyte::Unicode.default_normalization_form is deprecated and will be removed in Rails 7.0."
  )
end
```
#default_normalization_form=(_) ⇒ Object
```ruby
# File 'lib/active_support/multibyte/unicode.rb', line 17
def default_normalization_form=(_)
  ActiveSupport::Deprecation.warn(
    "ActiveSupport::Multibyte::Unicode.default_normalization_form= is deprecated and will be removed in Rails 7.0."
  )
end
```
#tidy_bytes(string, force = false) ⇒ Object
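Replaces all ISO-8859-1 or CP1252 characters with their UTF-8 equivalents, resulting in a valid UTF-8 string. Empty and ASCII-only strings are returned unchanged; otherwise only the invalid byte sequences are recoded.
Passing `true` as `force` will forcibly tidy all bytes, assuming that the string's encoding is entirely CP1252 or ISO-8859-1.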
```ruby
# File 'lib/active_support/multibyte/unicode.rb', line 44
def tidy_bytes(string, force = false)
  return string if string.empty? || string.ascii_only?
  return recode_windows1252_chars(string) if force
  string.scrub { |bad| recode_windows1252_chars(bad) }
end
```
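The private helper `recode_windows1252_chars` is not shown on this page. The sketch below therefore substitutes a hypothetical `recode_cp1252` that reinterprets the bytes as Windows-1252 and transcodes them to UTF-8. That is an assumption about what the helper does, and unlike a real implementation it has no ISO-8859-1 fallback. `tidy_bytes_sketch` mirrors the control flow above: return early for empty or ASCII-only input, recode everything when `force` is true, and otherwise let String#scrub hand each invalid byte sequence to the recoder.

```ruby
# Hypothetical stand-in for the private recode_windows1252_chars helper:
# treat the bytes as CP1252 and transcode to UTF-8, replacing undefined bytes.
def recode_cp1252(bytes)
  # dup so force_encoding does not mutate the caller's string
  bytes.dup.force_encoding(Encoding::Windows_1252)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

def tidy_bytes_sketch(string, force = false)
  return string if string.empty? || string.ascii_only?
  return recode_cp1252(string) if force
  # scrub yields each invalid byte sequence; the block's result replaces it.
  string.scrub { |bad| recode_cp1252(bad) }
end

puts tidy_bytes_sketch("caf\xE9")          # prints: café
puts tidy_bytes_sketch("café \x93hi\x94")  # prints: café “hi”
puts tidy_bytes_sketch("café", true)       # prints: cafÃ©
```

The last call shows why `force` should be used only when the whole string really is CP1252 or ISO-8859-1: already valid UTF-8 bytes are reinterpreted and turn into mojibake.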