Class: Yosina::Transliterators::IvsSvsBase::ReverseTransliterator

Inherits:
Object
Defined in:
lib/yosina/transliterators/ivs_svs_base.rb

Overview

Reverse transliterator that strips IVS/SVS variation selectors from characters, yielding their base characters.

Constructor Details

#initialize(variants_to_base, charset, drop_selectors_altogether) ⇒ ReverseTransliterator

Initialize the reverse transliterator with options

Parameters:

  • variants_to_base (Hash)

    Mapping of IVS/SVS characters to their base forms

  • charset (String)

    The charset to use for base mappings (“unijis_90” or “unijis_2004”)

  • drop_selectors_altogether (Boolean)

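    Whether to also strip a trailing variation selector (U+FE00-U+FE0F or U+E0100-U+E01EF) from sequences that have no base mapping for the chosen charset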



# File 'lib/yosina/transliterators/ivs_svs_base.rb', line 63

def initialize(variants_to_base, charset, drop_selectors_altogether)
  @variants_to_base = variants_to_base
  @charset = charset
  @drop_selectors_altogether = drop_selectors_altogether
end
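
The following is a minimal sketch of how the three constructor arguments might relate to one another, not the library's own setup code. `VariantRecord`, `VARIANTS_TO_BASE`, and `base_for` are hypothetical stand-ins introduced here for illustration; the only detail taken from the source is that each record responds to `base90` and `base2004`, and the mapping values are illustrative rather than copied from the library's tables.

```ruby
# Hypothetical stand-in for the record type stored in variants_to_base.
# The real record class in the library may be defined differently.
VariantRecord = Struct.new(:base90, :base2004, keyword_init: true)

# Illustrative mapping from a variant sequence (base + selector) to a record.
# The second entry has no base2004 form, to show the "no mapping" case.
VARIANTS_TO_BASE = {
  "\u{845B}\u{E0100}" => VariantRecord.new(base90: "\u{845B}", base2004: "\u{845B}"),
  "\u{845B}\u{E0101}" => VariantRecord.new(base90: "\u{845B}", base2004: nil)
}.freeze

# Picks the base form for the given charset, or nil when the charset is
# unknown or the record has no base form for it.
def base_for(record, charset)
  case charset
  when 'unijis_2004' then record.base2004
  when 'unijis_90'   then record.base90
  end
end

record = VARIANTS_TO_BASE["\u{845B}\u{E0101}"]
base_for(record, 'unijis_90')   # the base character for this charset
base_for(record, 'unijis_2004') # nil, since no base2004 form is recorded
```

When the lookup yields nil, `drop_selectors_altogether` decides whether the selector is still stripped.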

Instance Attribute Details

#charset ⇒ Object (readonly)

Returns the value of attribute charset.



# File 'lib/yosina/transliterators/ivs_svs_base.rb', line 56

def charset
  @charset
end

#drop_selectors_altogether ⇒ Object (readonly)

Returns the value of attribute drop_selectors_altogether.



# File 'lib/yosina/transliterators/ivs_svs_base.rb', line 56

def drop_selectors_altogether
  @drop_selectors_altogether
end

#variants_to_base ⇒ Object (readonly)

Returns the value of attribute variants_to_base.



# File 'lib/yosina/transliterators/ivs_svs_base.rb', line 56

def variants_to_base
  @variants_to_base
end

Instance Method Details

#call(input_chars) ⇒ Enumerable<Char>

Removes IVS/SVS selectors from each input character to recover its base character, recomputing offsets as it goes

Parameters:

  • input_chars (Enumerable<Char>)

    The characters to transliterate

Returns:

  • (Enumerable<Char>)

    The transliterated characters



# File 'lib/yosina/transliterators/ivs_svs_base.rb', line 73

def call(input_chars)
  offset = 0

  Chars.enum do |y|
    input_chars.each do |char|
      replacement = nil

      # Try to remove IVS/SVS selectors
      record = @variants_to_base[char.c]
      if record
        if @charset == 'unijis_2004' && record.base2004
          replacement = record.base2004
        elsif @charset == 'unijis_90' && record.base90
          replacement = record.base90
        end
      end

      # If no replacement found and drop_selectors_altogether is true,
      # try to remove variation selectors manually
      if !replacement && @drop_selectors_altogether && char.c.length > 1
        second_char = char.c[1]
        second_char_ord = second_char.ord
        # Check for variation selectors: U+FE00-U+FE0F or U+E0100-U+E01EF
        if (second_char_ord >= 0xFE00 && second_char_ord <= 0xFE0F) ||
           (second_char_ord >= 0xE0100 && second_char_ord <= 0xE01EF)
          replacement = char.c[0]
        end
      end

      if replacement
        y << Char.new(c: replacement, offset: offset, source: char)
        offset += replacement.length
      else
        y << char.with_offset(offset)
        offset += char.c.length
      end
    end
  end
end
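
The fallback branch and the offset bookkeeping in the method above can be tried in isolation. The sketch below is a simplified illustration on plain strings, not the library's implementation: `variation_selector?`, `strip_selector`, and `drop_selectors` are hypothetical helper names, and `[text, offset]` pairs stand in for the library's `Char` objects.

```ruby
# True when the code point is a variation selector:
# U+FE00..U+FE0F (VS1-VS16) or U+E0100..U+E01EF (VS17-VS256).
def variation_selector?(codepoint)
  (0xFE00..0xFE0F).cover?(codepoint) || (0xE0100..0xE01EF).cover?(codepoint)
end

# Returns the first character of +cluster+ when its second code point is a
# variation selector; returns nil when there is nothing to strip.
def strip_selector(cluster)
  return nil unless cluster.length > 1

  variation_selector?(cluster[1].ord) ? cluster[0] : nil
end

# Strips selectors from each cluster and recomputes offsets in code points,
# returning [text, offset] pairs in place of Char objects.
def drop_selectors(clusters)
  offset = 0
  clusters.map do |cluster|
    text = strip_selector(cluster) || cluster
    pair = [text, offset]
    offset += text.length # String#length counts code points, not bytes
    pair
  end
end

drop_selectors(["\u{845B}\u{E0100}", "a", "\u{2252}\u{FE00}"])
# => [["\u{845B}", 0], ["a", 1], ["\u{2252}", 2]]
```

Offsets advance by the length of the emitted text rather than the input text, so later characters shift left whenever a selector is removed.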