Class: Kotoshu::Readers::AffReader

Inherits:
Object
Defined in:
lib/kotoshu/readers/aff_reader.rb

Overview

AFF file reader for Hunspell affix files.

This class reads .aff files and builds the aff data structure, a Hash keyed by directive name.

Examples:

Reading an aff file

reader = AffReader.new('en_US.aff')
aff = reader.read

Constant Summary

BOOLEAN_DIRECTIVES =

Directives that are single boolean flags

%w[
  COMPLEXPREFIXES FULLSTRIP NOSPLITSUGS CHECKSHARPS
  CHECKCOMPOUNDCASE CHECKCOMPOUNDDUP CHECKCOMPOUNDREP CHECKCOMPOUNDTRIPLE
  SIMPLIFIEDTRIPLE ONLYMAXDIFF COMPOUNDMORESUFFIXES
].freeze
STRING_DIRECTIVES =

Directives that are single string values

%w[SET FLAG KEY TRY WORDCHARS LANG].freeze
INTEGER_DIRECTIVES =

Directives that are single integer values

%w[MAXDIFF MAXNGRAMSUGS MAXCPDSUGS COMPOUNDMIN COMPOUNDWORDMAX].freeze
FLAG_DIRECTIVES =

Directives that are single flag values

%w[
  NOSUGGEST KEEPCASE CIRCUMFIX NEEDAFFIX FORBIDDENWORD WARN
  COMPOUNDFLAG COMPOUNDBEGIN COMPOUNDMIDDLE COMPOUNDEND
  ONLYINCOMPOUND COMPOUNDPERMITFLAG COMPOUNDFORBIDFLAG FORCEUCASE
  SUBSTANDARD SYLLABLENUM COMPOUNDROOT
].freeze
SYNONYMS =

Outdated directive names and their synonyms

{
  'PSEUDOROOT' => 'NEEDAFFIX',
  'COMPOUNDLAST' => 'COMPOUNDEND'
}.freeze
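
The constant tables above suggest a table-driven lookup: map an outdated name through SYNONYMS, then check which table the canonical name belongs to. The following is a minimal sketch of that idea, not the library's actual parser. `directive_kind` and `DIRECTIVE_KINDS` are hypothetical names, and the boolean and flag tables are abbreviated copies of the constants listed above.

```ruby
# Abbreviated local copies of the documented constants.
BOOLEAN_DIRECTIVES = %w[COMPLEXPREFIXES FULLSTRIP NOSPLITSUGS CHECKSHARPS].freeze
STRING_DIRECTIVES  = %w[SET FLAG KEY TRY WORDCHARS LANG].freeze
INTEGER_DIRECTIVES = %w[MAXDIFF MAXNGRAMSUGS MAXCPDSUGS COMPOUNDMIN COMPOUNDWORDMAX].freeze
FLAG_DIRECTIVES    = %w[NOSUGGEST KEEPCASE NEEDAFFIX COMPOUNDEND FORBIDDENWORD].freeze
SYNONYMS = { 'PSEUDOROOT' => 'NEEDAFFIX', 'COMPOUNDLAST' => 'COMPOUNDEND' }.freeze

# Hypothetical lookup table: value kind => directive names of that kind.
DIRECTIVE_KINDS = {
  boolean: BOOLEAN_DIRECTIVES,
  string: STRING_DIRECTIVES,
  integer: INTEGER_DIRECTIVES,
  flag: FLAG_DIRECTIVES
}.freeze

# Hypothetical helper: returns [canonical_name, kind] for a directive name.
# Names found in no table (for example SFX or PFX) are reported as :other.
def directive_kind(name)
  canonical = SYNONYMS.fetch(name, name)
  kind, = DIRECTIVE_KINDS.find { |_kind, names| names.include?(canonical) }
  [canonical, kind || :other]
end

p directive_kind('PSEUDOROOT')
p directive_kind('COMPOUNDMIN')
p directive_kind('SFX')
```

Resolving synonyms before the lookup means an outdated name such as PSEUDOROOT is handled exactly like NEEDAFFIX, so only the canonical name needs to appear in the tables.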

Constructor Details

#initialize(path, encoding: 'UTF-8') ⇒ AffReader

Create a new AFF reader.

Parameters:

  • path (String)

    Path to the .aff file

  • encoding (String) (defaults to: 'UTF-8')

    File encoding to use when the file does not declare one; a SET directive in the file takes precedence



# File 'lib/kotoshu/readers/aff_reader.rb', line 50

def initialize(path, encoding: 'UTF-8')
  @path = path
  @encoding = detect_encoding(path) || encoding
  @flag_format = 'short'
  @flag_synonyms = {}
end
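
The constructor relies on a `detect_encoding` helper that is not shown on this page. As a rough illustration of how an encoding could be picked up from a SET directive, here is a minimal sketch under stated assumptions: `detect_encoding_sketch` is a hypothetical stand-in, it reads the file in binary mode, and it returns the name from the first SET line verbatim, without mapping Hunspell encoding names to Ruby `Encoding` names.

```ruby
require 'tempfile'

# Hypothetical stand-in for the private detect_encoding helper.
# Returns the name given by the first SET directive, or nil if there is none.
def detect_encoding_sketch(path)
  # Binary mode avoids decoding errors before the real encoding is known.
  File.open(path, 'rb') do |file|
    file.each_line do |line|
      match = line.match(/\ASET\s+(\S+)/)
      return match[1] if match
    end
  end
  nil
end

# Usage: fall back to a default when the file declares no encoding,
# mirroring `detect_encoding(path) || encoding` in the constructor.
Tempfile.create(['demo', '.aff']) do |file|
  file.write("SET ISO8859-1\nTRY esianrtolcdugmphbyfvkwz\n")
  file.flush
  puts detect_encoding_sketch(file.path) || 'UTF-8'
end
```

Returning nil when no SET line exists is what lets the `||` fallback in the constructor apply the `encoding:` keyword argument.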

Instance Attribute Details

#encoding ⇒ Object (readonly)

Returns the value of attribute encoding.



# File 'lib/kotoshu/readers/aff_reader.rb', line 43

def encoding
  @encoding
end

#flag_format ⇒ Object (readonly)

Returns the value of attribute flag_format.



# File 'lib/kotoshu/readers/aff_reader.rb', line 43

def flag_format
  @flag_format
end

#path ⇒ Object (readonly)

Returns the value of attribute path.



# File 'lib/kotoshu/readers/aff_reader.rb', line 43

def path
  @path
end

Instance Method Details

#read(source = nil) ⇒ Hash

Read the aff file and return the aff data structure.

Parameters:

  • source (FileReader, nil) (defaults to: nil)

    Optional file reader to use instead of creating a new one

Returns:

  • (Hash)

    The aff data structure



# File 'lib/kotoshu/readers/aff_reader.rb', line 61

def read(source = nil)
  reader = source || FileReader.new(@path, @encoding)

  data = {
    'SFX' => {},
    'PFX' => {},
    'FLAG' => 'short'
  }

  reader.each do |_line_no, line|
    dir_value = read_directive(reader, line)
    next unless dir_value

    directive, value = dir_value

    # Update the flag format as soon as a FLAG directive is encountered,
    # before any later directive uses it. The FLAG value itself needs no
    # re-parsing.
    @flag_format = value if directive == 'FLAG'

    # SFX/PFX have multiple entries
    if %w[SFX PFX].include?(directive)
      data[directive][value.first.flag] = value
    else
      data[directive] = value
    end

    # Update flag synonyms when AF directive is encountered (AFTER storing it)
    if directive == 'AF'
      @flag_synonyms = value
    end

    # Note: We don't call reset_encoding during iteration because it closes
    # the file and breaks the iteration. The FileReader is opened with the
    # encoding resolved in the constructor (UTF-8 unless detected otherwise),
    # which handles most cases.
  end

  data
end
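
To make the shape of the returned Hash concrete, the accumulation step of #read can be sketched in isolation. This is an illustration under assumptions, not the library's code: `AffixRule` and `accumulate` are hypothetical names, and the only property taken from the source is that each SFX/PFX value is a list whose first element responds to `#flag`.

```ruby
# Hypothetical affix entry; the source only shows that entries respond to #flag.
AffixRule = Struct.new(:flag, :strip, :add, :condition)

# Fold [directive, value] pairs into a Hash the way the loop in #read does.
def accumulate(pairs)
  data = { 'SFX' => {}, 'PFX' => {}, 'FLAG' => 'short' }
  pairs.each do |directive, value|
    if %w[SFX PFX].include?(directive)
      # Affix groups are keyed by the flag of their first rule.
      data[directive][value.first.flag] = value
    else
      # Any other directive is stored as is; a repeated directive overwrites.
      data[directive] = value
    end
  end
  data
end

pairs = [
  ['SET', 'UTF-8'],
  ['SFX', [AffixRule.new('S', '0', 's', '[^sxy]'),
           AffixRule.new('S', 'y', 'ies', '[^aeiou]y')]],
  ['PFX', [AffixRule.new('U', '0', 'un', '.')]]
]

data = accumulate(pairs)
p data['SFX'].keys
p data['SFX']['S'].size
p data['FLAG']
```

Because 'FLAG' is seeded with 'short', callers can read `data['FLAG']` even when the file contains no FLAG directive.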