Module: Yosina::Chars
Defined in: lib/yosina/chars.rb
Overview
Character array building and string conversion utilities
Class Method Summary
- .as_s(chars) ⇒ String
  Convert an array of characters back to a string.
- .build_char_array(input_str) ⇒ Chars
  Build a character array from a string, handling IVS/SVS sequences.
- .enum(&block) ⇒ Enumerator
  Create an enumerator that yields characters from the input.
Instance Method Summary
- #chunk(&block) ⇒ Object
- #group_by(&block) ⇒ Object
Class Method Details
.as_s(chars) ⇒ String
Convert an array of characters back to a string.
This method drops sentinel entries (Char objects whose c is an empty string), which the transliteration system uses internally, and joins the c values of the remaining entries.
    # File 'lib/yosina/chars.rb', line 69
    def self.as_s(chars)
      chars.reject { |char| char.c.empty? }.map(&:c).join
    end
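The sentinel filtering described above can be tried in isolation. The following is a minimal sketch, not the library itself: `SketchChar` is a hypothetical stand-in for `Yosina::Char` that models only the `c` and `offset` fields, and `sketch_as_s` repeats the listed method body so the example runs without the gem.

```ruby
# Hypothetical stand-in for Yosina::Char; only the fields used here are modelled.
SketchChar = Struct.new(:c, :offset, keyword_init: true)

# Same body as the listed Yosina::Chars.as_s, repeated so the sketch runs alone.
def sketch_as_s(chars)
  chars.reject { |char| char.c.empty? }.map(&:c).join
end

# Two ordinary entries followed by the trailing sentinel (empty c).
def sketch_sample_chars
  [
    SketchChar.new(c: 'a', offset: 0),
    SketchChar.new(c: 'b', offset: 1),
    SketchChar.new(c: '', offset: 2)
  ]
end

puts sketch_as_s(sketch_sample_chars) # prints "ab"
```

The sentinel contributes nothing to the output, so the result has the same text as the two non-empty entries.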
.build_char_array(input_str) ⇒ Chars
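Build a character array from a string, handling IVS/SVS sequences.
This method handles Ideographic Variation Sequences (IVS) and Standardized Variation Sequences (SVS) by combining a base character with the variation selector that follows it (U+FE00-U+FE0F or U+E0100-U+E01EF) into a single Char object. Offsets are counted in characters, not bytes, and a sentinel Char with an empty c is appended at the end of the array.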
    # File 'lib/yosina/chars.rb', line 15
    def self.build_char_array(input_str)
      result = []
      offset = 0
      prev_char = nil
      prev_codepoint = nil

      input_str.each_char do |char|
        codepoint = char.ord

        if prev_char && prev_codepoint
          # Check if current character is a variation selector
          # Variation selectors are in ranges: U+FE00-U+FE0F, U+E0100-U+E01EF
          if (0xFE00..0xFE0F).cover?(codepoint) || (0xE0100..0xE01EF).cover?(codepoint)
            # Combine previous character with variation selector
            combined_char = prev_char + char
            result << Char.new(c: combined_char, offset: offset)
            offset += combined_char.length
            prev_char = prev_codepoint = nil
            next
          end

          # Previous character was not followed by a variation selector
          result << Char.new(c: prev_char, offset: offset)
          offset += prev_char.length
        end

        # Store current character for next iteration
        prev_char = char
        prev_codepoint = codepoint
      end

      # Handle the last character if any
      if prev_char
        result << Char.new(c: prev_char, offset: offset)
        offset += prev_char.length
      end

      # Add sentinel empty character
      result << Char.new(c: '', offset: offset)

      class << result
        include Chars
      end
      result
    end
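To make the grouping and offset rules concrete, here is a simplified sketch of the same idea. It is not the library code: `sketch_group` and `SKETCH_VS_RANGES` are hypothetical names, and plain `[text, offset]` pairs stand in for `Char` objects. It uses an index-based loop instead of the one-character lookbehind in the listing, but is intended to produce the same grouping.

```ruby
# Variation selector ranges taken from the listing above.
SKETCH_VS_RANGES = [(0xFE00..0xFE0F), (0xE0100..0xE01EF)].freeze

def sketch_variation_selector?(char)
  SKETCH_VS_RANGES.any? { |range| range.cover?(char.ord) }
end

# Returns [text, offset] pairs, ending with the empty sentinel pair.
def sketch_group(input)
  chars = input.each_char.to_a
  result = []
  offset = 0
  i = 0
  while i < chars.length
    text = chars[i]
    # Attach the next character when it is a variation selector.
    text += chars[i + 1] if i + 1 < chars.length && sketch_variation_selector?(chars[i + 1])
    result << [text, offset]
    offset += text.length # counted in characters, not bytes
    i += text.length
  end
  result << ['', offset]
end

# U+845B followed by U+E0100 (a variation selector), then U+98FE.
pairs = sketch_group("\u845B\u{E0100}\u98FE")
puts pairs.map { |text, offset| [text.length, offset] }.inspect
# prints [[2, 0], [1, 2], [0, 3]]
```

The first entry spans two characters, so the second entry starts at offset 2, and the sentinel sits at offset 3, the character length of the input.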
.enum(&block) ⇒ Enumerator
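Create an enumerator that yields characters from the input. The block receives an Enumerator::Yielder and pushes characters onto it; the returned Enumerator has Chars mixed into its singleton class.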
    # File 'lib/yosina/chars.rb', line 77
    def self.enum(&block)
      e = Enumerator.new { |y| block.call(y) }
      class << e
        include Chars
      end
      e
    end
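The pattern in the listing, wrapping a block in an Enumerator and tagging the result with a module, can be sketched on its own. `SketchChars` below is a hypothetical stand-in for `Yosina::Chars` without its instance methods, and plain strings stand in for `Char` objects. `e.extend(SketchChars)` has the same effect as the `class << e; include ...; end` form in the listing.

```ruby
# Hypothetical stand-in for Yosina::Chars, without the chunk/group_by overrides.
module SketchChars
  def self.enum(&block)
    e = Enumerator.new { |y| block.call(y) }
    e.extend(SketchChars) # mixes the module into e's singleton class
    e
  end
end

# Builds a tagged enumerator over three placeholder items.
def sketch_abc
  SketchChars.enum do |y|
    %w[a b c].each { |s| y << s }
  end
end

e = sketch_abc
puts e.is_a?(SketchChars) # prints true
puts e.first(2).inspect   # prints ["a", "b"]
```

Because the block only runs when the enumerator is consumed, `first(2)` stops after two items have been pushed.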
Instance Method Details
#chunk(&block) ⇒ Object
    # File 'lib/yosina/chars.rb', line 132
    def chunk(&block)
      e = super(&block)
      e.map do |g, slice|
        class << slice
          include Chars
        end
        [g, slice]
      end
    end
#group_by(&block) ⇒ Object
    # File 'lib/yosina/chars.rb', line 142
    def group_by(&block)
      e = super(&block)
      e.transform_values do |slice|
        class << slice
          include Chars
        end
        slice
      end
    end
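Both overrides follow the same pattern: call the inherited Enumerable method through super, then mix the module into each resulting slice so the slices respond to the same overrides. A minimal sketch of that pattern follows, using a hypothetical `SketchTagged` module and plain strings in place of `Chars` and `Char`. Note that, as in the listing, `chunk` ends in `map` and therefore returns an Array of `[key, slice]` pairs rather than an Enumerator.

```ruby
# Hypothetical stand-in for Yosina::Chars with the two overrides.
module SketchTagged
  def chunk(&block)
    # super resolves to Enumerable#chunk; each slice is re-tagged.
    super(&block).map { |key, slice| [key, slice.extend(SketchTagged)] }
  end

  def group_by(&block)
    # super resolves to Enumerable#group_by; each value is re-tagged.
    super(&block).transform_values { |slice| slice.extend(SketchTagged) }
  end
end

# Returns a copy of items with SketchTagged mixed into its singleton class.
def sketch_tagged(items)
  items.dup.extend(SketchTagged)
end

letters = sketch_tagged(%w[a b C d])

runs = letters.chunk { |s| s == s.downcase }
puts runs.inspect
# prints [[true, ["a", "b"]], [false, ["C"]], [true, ["d"]]]

groups = letters.group_by { |s| s == s.downcase }
puts groups.values.all? { |slice| slice.is_a?(SketchTagged) } # prints true
```

Re-tagging the slices means a further `chunk` or `group_by` on any slice goes through the same overrides.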