Class: Kotoshu::Readers::DicReader

Inherits:
Object
Defined in:
lib/kotoshu/readers/dic_reader.rb

Overview

DIC file reader for Hunspell dictionary files.

This class reads .dic files and creates a list of Word entries.

Examples:

Reading a dic file

reader = Kotoshu::Readers::DicReader.new('en_US.dic', flag_format: 'short')
words = reader.read
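
For orientation, a Hunspell .dic file conventionally starts with a line holding the (approximate) entry count, followed by one entry per line, with optional flags after a slash. The sketch below is plain Ruby and does not use Kotoshu. `split_entry` is a hypothetical helper introduced only for illustration, and it does not handle escaped slashes or morphological fields.

```ruby
# Hypothetical helper, not part of Kotoshu: split a .dic entry into
# its stem and its raw flag string. Assumes the common "stem/FLAGS"
# layout; escaped slashes and morphological fields are not handled.
def split_entry(line)
  stem, flags = line.strip.split('/', 2)
  [stem, flags.to_s]
end

# A tiny in-memory dictionary: count line first, then one entry per line.
sample = <<~DIC
  3
  hello
  work/DGS
  try/B
DIC

lines = sample.lines.map(&:chomp)
declared_count = lines.first.to_i
entries = lines.drop(1).reject(&:empty?).map { |l| split_entry(l) }
```

Here `entries` holds one `[stem, flags]` pair per non-empty line after the count line. How Kotoshu's `Word.from_line` represents the same information is not shown on this page.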

Constructor Details

#initialize(path, encoding: 'UTF-8', flag_format: 'short', flag_synonyms: {}) ⇒ DicReader

Create a new DIC reader.

Parameters:

  • path (String)

    Path to the .dic file

  • encoding (String) (defaults to: 'UTF-8')

    File encoding used when reading the .dic file

  • flag_format (String) (defaults to: 'short')

    Flag format: one of 'short', 'long', 'num', or 'UTF-8'

  • flag_synonyms (Hash) (defaults to: {})

    Flag synonyms map
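
The flag_format values above match the names used by Hunspell's FLAG setting. As a rough illustration of what each format means for a raw flag string, here is a minimal sketch. `split_flags` is a hypothetical helper, not Kotoshu's implementation, and it assumes the usual Hunspell conventions: one character per flag for 'short' and 'UTF-8', two characters per flag for 'long', and comma-separated decimal numbers for 'num'.

```ruby
# Hypothetical helper, not part of Kotoshu: split a raw flag string
# according to the usual Hunspell FLAG conventions.
def split_flags(raw, flag_format)
  case flag_format
  when 'short', 'UTF-8'
    # One character per flag. With a UTF-8 Ruby string, String#chars
    # yields whole Unicode characters, so both formats share this branch.
    raw.chars
  when 'long'
    # Two characters per flag; a trailing odd character is dropped.
    raw.scan(/../)
  when 'num'
    # Comma-separated decimal numbers, kept as strings here.
    raw.split(',')
  else
    raise ArgumentError, "unknown flag format: #{flag_format}"
  end
end

short_flags = split_flags('DGS', 'short')
long_flags  = split_flags('AaBb', 'long')
num_flags   = split_flags('101,2045', 'num')
```

Raising on an unknown format is a choice made for this sketch. The constructor shown on this page stores flag_format without validating it.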



# File 'lib/kotoshu/readers/dic_reader.rb', line 78

def initialize(path, encoding: 'UTF-8', flag_format: 'short', flag_synonyms: {})
  @path = path
  @encoding = encoding
  @flag_format = flag_format
  @flag_synonyms = flag_synonyms
end

Instance Attribute Details

#encoding ⇒ Object (readonly)

Returns the value of attribute encoding.



# File 'lib/kotoshu/readers/dic_reader.rb', line 70

def encoding
  @encoding
end

#flag_format ⇒ Object (readonly)

Returns the value of attribute flag_format.



# File 'lib/kotoshu/readers/dic_reader.rb', line 70

def flag_format
  @flag_format
end

#flag_synonyms ⇒ Object (readonly)

Returns the value of attribute flag_synonyms.



# File 'lib/kotoshu/readers/dic_reader.rb', line 70

def flag_synonyms
  @flag_synonyms
end

#path ⇒ Object (readonly)

Returns the value of attribute path.



# File 'lib/kotoshu/readers/dic_reader.rb', line 70

def path
  @path
end

Instance Method Details

#read ⇒ Array<Word>

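Read the .dic file and return a list of Word entries. The first line is treated as the declared word count and skipped, empty lines are ignored, and the declared count is not checked against the number of entries read.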

Returns:

  • (Array<Word>)

    List of word entries



# File 'lib/kotoshu/readers/dic_reader.rb', line 88

def read
  reader = FileReader.new(@path, @encoding)

  words = []
  first_line = true
  expected_count = 0

  reader.each do |_line_no, line|
    if first_line
      # First line is word count
      expected_count = line.to_i
      first_line = false
      next
    end

    # Skip empty lines
    next if line.empty?

    # Parse word
    word = Word.from_line(line, flag_format: @flag_format, flag_synonyms: @flag_synonyms)
    words << word
  end

  # Verify word count
  # Note: We don't raise an error if count doesn't match, as some dictionaries have different formats

  words
end
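
Because #read stores the declared count without comparing it to the result, a caller who wants a consistency check has to do it separately. The following is a minimal standalone sketch of the same loop over an in-memory string. `read_entries` and the hash it returns are hypothetical, and plain strings stand in for Word and FileReader, whose APIs are not shown on this page.

```ruby
# Hypothetical standalone sketch; it mirrors the loop in #read but
# does not call Kotoshu. Entries are kept as raw strings.
def read_entries(text)
  declared = nil
  entries = []

  text.each_line(chomp: true) do |line|
    if declared.nil?
      # The first line carries the declared word count.
      declared = line.to_i
      next
    end

    # Skip empty lines, as #read does.
    next if line.empty?

    entries << line
  end

  count = declared.to_i
  { declared: count, entries: entries, count_matches: count == entries.size }
end

result = read_entries("2\nhello\n\nwork/DGS\n")
```

Reporting a mismatch as a boolean rather than raising follows the note in the source above, which says some dictionaries declare a count that does not match their contents.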