Class: Ucode::Models::UnihanEntry

Inherits:
Lutaml::Model::Serializable
  • Object
Defined in:
lib/ucode/models/unihan_entry.rb

Overview

Unihan dictionary data for CJK codepoints, grouped into the 8 categories defined by the Unihan standard. Each category corresponds to one Unihan file:

Unihan_DictionaryIndices.txt      dictionary_indices
Unihan_DictionaryLikeData.txt     dictionary_like_data
Unihan_IRGSources.txt             irg_sources
Unihan_NumericValues.txt          numeric_values
Unihan_RadicalStrokeCounts.txt    radical_stroke_counts
Unihan_Readings.txt               readings
Unihan_Variants.txt               variants
Unihan_OtherMappings.txt          other_mappings

Each category attribute is a collection of UnihanField records. Category is set at parse time from the source filename (via FILE_TO_CATEGORY) — Unicode does not reorganize files across versions, so this is stable without per-field hardcoding.

Constant Summary

CATEGORIES =

Symbol → attribute name. Mirrors the 8 Unihan files.

{
  dictionary_indices: :dictionary_indices,
  dictionary_like_data: :dictionary_like_data,
  irg_sources: :irg_sources,
  numeric_values: :numeric_values,
  radical_stroke_counts: :radical_stroke_counts,
  readings: :readings,
  variants: :variants,
  other_mappings: :other_mappings,
}.freeze
FILE_TO_CATEGORY =

Filename → category symbol. Used by the parser to bucket records without callers needing to know the mapping.

{
  "Unihan_DictionaryIndices.txt" => :dictionary_indices,
  "Unihan_DictionaryLikeData.txt" => :dictionary_like_data,
  "Unihan_IRGSources.txt" => :irg_sources,
  "Unihan_NumericValues.txt" => :numeric_values,
  "Unihan_RadicalStrokeCounts.txt" => :radical_stroke_counts,
  "Unihan_Readings.txt" => :readings,
  "Unihan_Variants.txt" => :variants,
  "Unihan_OtherMappings.txt" => :other_mappings,
}.freeze
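
The filename lookup described above can be sketched as follows. This is a minimal illustration, not the gem's parser: it uses a two-entry local copy of the mapping so that it runs on its own, and `category_for` is a hypothetical helper name that does not appear in the source.

```ruby
# Local, shortened copy of the mapping so the sketch is self-contained.
FILE_TO_CATEGORY = {
  "Unihan_Readings.txt" => :readings,
  "Unihan_Variants.txt" => :variants,
}.freeze

# Hypothetical helper: resolve a category from a file path.
# Returns nil when the basename is not a known Unihan file.
def category_for(path)
  FILE_TO_CATEGORY[File.basename(path)]
end

category_for("data/Unihan_Readings.txt") # => :readings
category_for("data/NamesList.txt")       # => nil
```

Keying on the basename means callers can pass any directory layout without knowing the mapping themselves.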

Instance Method Details

#add(category, name, values) ⇒ Object

Pushes a field into the matching category bucket. Used by the Coordinator when streaming records from the parser. An unknown category is ignored and the call returns nil.

Parameters:

  • category (Symbol)

    one of CATEGORIES keys

  • name (String)

    e.g. "kMandarin"

  • values (Array<String>)

    space-split values from Unihan



# File 'lib/ucode/models/unihan_entry.rb', line 67

def add(category, name, values)
  attr_name = CATEGORIES.fetch(category) { return }
  public_send(attr_name) << UnihanField.new(name: name, values: values)
end
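
A usage sketch for `add`, under stated assumptions: `FieldSketch` and `EntrySketch` are hypothetical stand-ins for UnihanField and UnihanEntry (plain Ruby, two categories only) so that the example runs without the gem. The data line follows the tab-separated Unihan layout and is sample data.

```ruby
# Stand-in for UnihanField (assumption: only name and values matter here).
FieldSketch = Struct.new(:name, :values, keyword_init: true)

# Stand-in for UnihanEntry with the same add logic as the listing above.
class EntrySketch
  CATEGORIES = { readings: :readings, variants: :variants }.freeze
  attr_reader :readings, :variants

  def initialize
    @readings = []
    @variants = []
  end

  def add(category, name, values)
    # Unknown category: leave the method early, returning nil.
    attr_name = CATEGORIES.fetch(category) { return }
    public_send(attr_name) << FieldSketch.new(name: name, values: values)
  end
end

entry = EntrySketch.new

# Sample line: codepoint, field name, value string, separated by tabs.
_codepoint, name, raw = "U+4E00\tkJapaneseOn\tICHI ITSU".split("\t", 3)
entry.add(:readings, name, raw.split(" "))

entry.readings.first.values     # => ["ICHI", "ITSU"]
entry.add(:bogus, "kFoo", ["x"]) # => nil, nothing is stored
```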

#all_fields ⇒ Hash{String => Array<String>}

All fields across every category, flattened to name => values. Iteration helper for consumers that want a flat view (search indexing, downstream filtering). If a field name occurs more than once, the values of the last record win.

Returns:

  • (Hash{String => Array<String>})


# File 'lib/ucode/models/unihan_entry.rb', line 82

def all_fields
  CATEGORIES.keys.each_with_object({}) do |sym, h|
    public_send(sym).each { |f| h[f.name] = f.values }
  end
end
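
The flattening can be illustrated with plain data structures. In this sketch `FlatField` is a hypothetical stand-in for UnihanField, `buckets` stands in for the category attributes, and the values are sample data rather than verified Unihan content.

```ruby
# Stand-in for UnihanField; the real class lives in the gem.
FlatField = Struct.new(:name, :values, keyword_init: true)

# Hypothetical buckets keyed by category symbol.
buckets = {
  readings: [
    FlatField.new(name: "kMandarin", values: ["yī"]),
    FlatField.new(name: "kMandarin", values: ["yì"]), # same name again
  ],
  variants: [FlatField.new(name: "kZVariant", values: ["U+5F0C"])],
}

# Same shape as all_fields: walk every bucket and key by field name.
flat = buckets.keys.each_with_object({}) do |sym, h|
  buckets[sym].each { |f| h[f.name] = f.values }
end

flat.keys         # => ["kMandarin", "kZVariant"]
flat["kMandarin"] # => ["yì"] (the later record replaced the earlier one)
```

Because the hash is keyed by field name alone, the category a field came from is not recoverable from the flat view.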

#any? ⇒ Boolean

True if any category has data.

Returns:

  • (Boolean)


# File 'lib/ucode/models/unihan_entry.rb', line 73

def any?
  CATEGORIES.keys.any? { |sym| !public_send(sym).empty? }
end