Module: Yosina::Chars

Defined in:
lib/yosina/chars.rb

Overview

Character array building and string conversion utilities

Class Method Details

.as_s(chars) ⇒ String

Convert an array of characters back to a string

This function filters out sentinel characters (empty strings) that are used internally by the transliteration system.

Parameters:

  • chars (Enumerable<Char>)

    An enumerable of Char objects, such as the array returned by build_char_array

Returns:

  • (String)

    A string composed of the non-empty characters



# File 'lib/yosina/chars.rb', line 69

def self.as_s(chars)
  chars.reject { |char| char.c.empty? }.map(&:c).join
end
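
The filtering step can be tried in isolation. The following is a minimal sketch, not the library's own test code: `AsSDemoChar` is a hypothetical stand-in for Yosina's Char (assumed to expose `c` and `offset`), and `demo_as_s` repeats the one-line body shown above.

```ruby
# Hypothetical stand-in for Yosina's Char; assumes fields `c` and `offset`.
AsSDemoChar = Struct.new(:c, :offset, keyword_init: true)

# Same logic as Chars.as_s: drop entries whose text is empty, join the rest.
def demo_as_s(chars)
  chars.reject { |char| char.c.empty? }.map(&:c).join
end

AS_S_DEMO_CHARS = [
  AsSDemoChar.new(c: 'a', offset: 0),
  AsSDemoChar.new(c: '', offset: 1),  # empty entry mid-sequence (illustrative)
  AsSDemoChar.new(c: 'b', offset: 1),
  AsSDemoChar.new(c: '', offset: 2)   # trailing sentinel
].freeze

puts demo_as_s(AS_S_DEMO_CHARS) # prints "ab"
```

Because only `c` is inspected, any enumerable whose elements respond to `c` works here.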

.build_char_array(input_str) ⇒ Chars

Build a character array from a string, handling IVS/SVS sequences

This function properly handles Ideographic Variation Sequences (IVS) and Standardized Variation Sequences (SVS) by combining base characters with their variation selectors into single Char objects.

Parameters:

  • input_str (String)

    The input string to convert to character array

Returns:

  • (Chars)

    A list of Char objects representing the input string, with a sentinel empty character at the end



# File 'lib/yosina/chars.rb', line 15

def self.build_char_array(input_str)
  result = []
  offset = 0
  prev_char = nil
  prev_codepoint = nil

  input_str.each_char do |char|
    codepoint = char.ord

    if prev_char && prev_codepoint
      # Check if current character is a variation selector
      # Variation selectors are in ranges: U+FE00-U+FE0F, U+E0100-U+E01EF
      if (0xFE00..0xFE0F).cover?(codepoint) || (0xE0100..0xE01EF).cover?(codepoint)
        # Combine previous character with variation selector
        combined_char = prev_char + char
        result << Char.new(c: combined_char, offset: offset)
        offset += combined_char.length
        prev_char = prev_codepoint = nil
        next
      end

      # Previous character was not followed by a variation selector
      result << Char.new(c: prev_char, offset: offset)
      offset += prev_char.length
    end

    # Store current character for next iteration
    prev_char = char
    prev_codepoint = codepoint
  end

  # Handle the last character if any
  if prev_char
    result << Char.new(c: prev_char, offset: offset)
    offset += prev_char.length
  end

  # Add sentinel empty character
  result << Char.new(c: '', offset: offset)

  class << result
    include Chars
  end

  result
end
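
To make the grouping rule concrete, here is a simplified standalone sketch of the same idea. It is a restatement under assumptions, not the library's code: `IvsDemoChar`, `variation_selector?` and `demo_build` are hypothetical names, and the struct only mimics the `c` and `offset` fields used above. Offsets count characters (String#length), as in the original.

```ruby
# Hypothetical stand-in for Yosina's Char; assumes fields `c` and `offset`.
IvsDemoChar = Struct.new(:c, :offset, keyword_init: true)

# True when the code point lies in one of the two variation selector ranges
# named in the method's comments.
def variation_selector?(codepoint)
  (0xFE00..0xFE0F).cover?(codepoint) || (0xE0100..0xE01EF).cover?(codepoint)
end

# A selector attaches to the character directly before it, unless that
# character has already absorbed a selector.
def demo_build(input_str)
  result = []
  input_str.each_char do |char|
    last = result.last
    if last && last.c.length == 1 && variation_selector?(char.ord)
      last.c += char
    else
      offset = last ? last.offset + last.c.length : 0
      result << IvsDemoChar.new(c: char, offset: offset)
    end
  end
  offset = result.empty? ? 0 : result.last.offset + result.last.c.length
  result << IvsDemoChar.new(c: '', offset: offset) # sentinel at the end
end

# U+845B followed by U+E0100 (a variation selector), then U+98FE.
IVS_DEMO = demo_build("\u845B\u{E0100}\u98FE")
p IVS_DEMO.map { |ch| ch.c.length } # => [2, 1, 0]
p IVS_DEMO.map(&:offset)            # => [0, 2, 3]
```

The first entry holds two code points, so the second entry starts at offset 2 and the sentinel sits at offset 3, the character length of the input.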

.enum(&block) ⇒ Enumerator

Create an enumerator, extended with Chars, whose elements are produced by the given block

Parameters:

  • &block (Proc)

    A block that receives the enumerator's yielder and pushes Char objects to it

Returns:

  • (Enumerator)

    An enumerator that yields Char objects



# File 'lib/yosina/chars.rb', line 77

def self.enum(&block)
  e = Enumerator.new { |y| block.call(y) }
  class << e
    include Chars
  end
  e
end
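
A usage sketch may help show what the block receives. Everything here is a stand-in: `EnumDemoChar` and `EnumDemoChars` are hypothetical names that reproduce only the `enum` and `to_s` behaviour shown on this page, so the example runs without the library.

```ruby
# Hypothetical stand-in for Yosina's Char; assumes fields `c` and `offset`.
EnumDemoChar = Struct.new(:c, :offset, keyword_init: true)

# Stand-in for the Chars mixin: only to_s and enum are reproduced.
module EnumDemoChars
  def to_s
    reject { |char| char.c.empty? }.map(&:c).join
  end

  def self.enum(&block)
    e = Enumerator.new { |y| block.call(y) }
    # extend adds the module to e's singleton class, much like
    # `class << e; include Chars; end` in the original.
    e.extend(self)
    e
  end
end

# The block receives the yielder `y` and pushes elements onto it.
ENUM_DEMO = EnumDemoChars.enum do |y|
  y << EnumDemoChar.new(c: 'a', offset: 0)
  y << EnumDemoChar.new(c: 'b', offset: 1)
  y << EnumDemoChar.new(c: '', offset: 2) # sentinel
end

puts ENUM_DEMO.to_s    # prints "ab"
puts ENUM_DEMO.first.c # prints "a"
```

The enumerator is lazy in the sense that the block is re-run each time the sequence is traversed, so blocks with side effects should be used with care.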

Instance Method Details

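#chunk(&block) ⇒ Object

Runs the inherited chunk, then includes Chars into each resulting slice so the slices keep this module's methods. Returns an array of [key, slice] pairs.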



# File 'lib/yosina/chars.rb', line 132

def chunk(&block)
  e = super(&block)
  e.map do |g, slice|
    class << slice
      include Chars
    end
    [g, slice]
  end
end

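#group_by(&block) ⇒ Object

Runs the inherited group_by, then includes Chars into each group so the groups keep this module's methods. Returns a Hash mapping each key to its slice.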



# File 'lib/yosina/chars.rb', line 142

def group_by(&block)
  e = super(&block)
  e.transform_values do |slice|
    class << slice
      include Chars
    end
    slice
  end
end
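
The practical effect of re-including the module is that each group still responds to to_s as a character sequence. The sketch below illustrates this with hypothetical stand-ins (`GroupDemoChar`, `GroupDemoChars`); it mirrors the override shown above but is not the library's code.

```ruby
# Hypothetical stand-in for Yosina's Char; assumes fields `c` and `offset`.
GroupDemoChar = Struct.new(:c, :offset, keyword_init: true)

# Stand-in for the Chars mixin with the two methods needed for the demo.
module GroupDemoChars
  def to_s
    reject { |char| char.c.empty? }.map(&:c).join
  end

  # Group with the inherited implementation, then extend every group so it
  # keeps the mixin's methods.
  def group_by(&block)
    super(&block).transform_values { |slice| slice.extend(GroupDemoChars) }
  end
end

GROUP_DEMO_INPUT = %w[a 1 b 2].each_with_index.map { |s, i| GroupDemoChar.new(c: s, offset: i) }
GROUP_DEMO_INPUT.extend(GroupDemoChars)

GROUP_DEMO = GROUP_DEMO_INPUT.group_by { |char| char.c.match?(/\d/) ? :digit : :letter }

puts GROUP_DEMO[:letter].to_s # prints "ab"
puts GROUP_DEMO[:digit].to_s  # prints "12"
```

Without the extension step, the groups would be plain arrays and to_s would return the array's inspect-style output instead of the joined text.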

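#to_s ⇒ Object

Returns the string form of this character sequence by delegating to Chars.as_s, so sentinel (empty) characters are dropped.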



# File 'lib/yosina/chars.rb', line 85

def to_s
  Chars.as_s(self)
end