Class: ICU4X::Segmenter

Inherits:
Object
Defined in:
lib/icu4x/yard_docs.rb,
lib/icu4x.rb

Overview

Segments text into graphemes, words, sentences, or lines.

Segmenter provides Unicode-compliant text segmentation. Grapheme, word, and sentence boundaries follow UAX #29 (Unicode Text Segmentation); line boundaries follow UAX #14 (Unicode Line Breaking Algorithm).
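Line segmentation yields the pieces between UAX #14 break opportunities. It takes no width, so fitting text to a column is left to the caller. The sketch below is a hypothetical greedy wrapper (`greedy_wrap` is not part of the gem). It assumes each `:line` segment string ends at a break opportunity with its trailing spaces attached, which matches how UAX #14 places breaks after spaces. It measures width in characters, which underestimates the display width of wide CJK characters and emoji.

```ruby
# Hypothetical consumer of :line segments: greedy word wrapping.
# Assumes each piece ends at a UAX #14 break opportunity, with any
# trailing spaces attached to the piece before the break.
def greedy_wrap(pieces, width)
  lines = []
  current = ""
  pieces.each do |piece|
    candidate = current + piece
    # Trailing spaces do not count toward the visible line width.
    if !current.empty? && candidate.rstrip.length > width
      lines << current.rstrip
      current = piece
    else
      current = candidate
    end
  end
  lines << current.rstrip unless current.empty?
  lines
end

# With the gem, the pieces would come from something like:
#   ICU4X::Segmenter.new(granularity: :line).segment(text).map(&:segment)
pieces = ["The ", "quick ", "brown ", "fox"]
p greedy_wrap(pieces, 10)  # prints ["The quick", "brown fox"]
```

A piece longer than the width goes on its own line rather than being split. Breaking inside such a piece would need `:grapheme` segmentation.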

Examples:

Word segmentation

segmenter = ICU4X::Segmenter.new(granularity: :word)
segments = segmenter.segment("Hello, world!")
segments.map(&:segment)  #=> ["Hello", ",", " ", "world", "!"]
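Word segmentation returns whitespace and punctuation as segments too, as the example above shows. One way to keep only the words is a regular-expression filter over the segment strings. This is a heuristic sketch that relies only on the documented `segment` strings. Any word-type flag the gem might expose is not documented on this page.

```ruby
# Output of segments.map(&:segment) from the word example above.
pieces = ["Hello", ",", " ", "world", "!"]

# Keep only pieces containing at least one letter or digit.
# This regex heuristic stands in for any word-type flag the gem may expose.
words = pieces.select { |s| s.match?(/[\p{L}\p{N}]/) }
p words  # prints ["Hello", "world"]
```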

Grapheme segmentation

segmenter = ICU4X::Segmenter.new(granularity: :grapheme)
segments = segmenter.segment("👨‍👩‍👧")
segments.size  #=> 1 (the family emoji is a single grapheme cluster)
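To see why the family emoji counts as one grapheme, compare its code points with Ruby's own grapheme clustering. `String#grapheme_clusters` (Ruby 2.5 and later) also implements UAX #29 extended grapheme clusters, so it makes a gem-free cross-check. Its Unicode version depends on the Ruby release and may differ from the data the ICU4X provider ships.

```ruby
# MAN, ZWJ, WOMAN, ZWJ, GIRL: the family emoji from the example above.
family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}"

p family.codepoints.size         # prints 5 (three emoji joined by two ZWJs)
p family.grapheme_clusters.size  # prints 1 (one user-perceived character)
p family.bytesize                # prints 18 (3 x 4-byte emoji + 2 x 3-byte ZWJ)
```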

Defined Under Namespace

Classes: Segment

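Instance Method Summary

  • #initialize(granularity:, provider: nil) ⇒ Segmenter

    Creates a new Segmenter instance.

  • #resolved_options ⇒ Hash

    Returns the resolved options for this instance.

  • #segment(text) ⇒ Array<Segment>

    Segments text into an array of segments.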

Constructor Details

#initialize(granularity:, provider: nil) ⇒ Segmenter

Creates a new Segmenter instance.

Examples:

segmenter = ICU4X::Segmenter.new(granularity: :word)
segmenter = ICU4X::Segmenter.new(granularity: :sentence)

Parameters:

  • granularity (Symbol)

    the segmentation granularity: `:grapheme`, `:word`, `:sentence`, or `:line`

  • provider (DataProvider, nil) (defaults to: nil)

    the data provider to load segmentation data from; the default provider is used when nil
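This page does not document what Segmenter.new raises for an unknown granularity. When the granularity comes from user input, a caller-side check against the four documented values can produce a clearer error. This is a hypothetical helper (`normalize_granularity` and `GRANULARITIES` are not part of the gem). ArgumentError is this sketch's choice, not necessarily what the gem raises.

```ruby
# Hypothetical caller-side guard; the gem's own validation and error class
# are not documented on this page.
GRANULARITIES = %i[grapheme word sentence line].freeze

# Accepts a String or Symbol such as "Word" or :line and returns the
# matching granularity symbol, raising ArgumentError for anything else.
def normalize_granularity(value)
  sym = value.to_s.downcase.to_sym
  unless GRANULARITIES.include?(sym)
    raise ArgumentError,
          "granularity must be one of #{GRANULARITIES.map(&:inspect).join(', ')}, got #{value.inspect}"
  end
  sym
end

p normalize_granularity("Word")  # prints :word
# Usage with the gem (params[:mode] is a hypothetical user input):
#   ICU4X::Segmenter.new(granularity: normalize_granularity(params[:mode]))
```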

Raises:



# File 'lib/icu4x/yard_docs.rb', line 934

def initialize(granularity:, provider: nil); end

Instance Method Details

#resolved_options ⇒ Hash

Returns the resolved options for this instance.

Returns:

  • (Hash)

    options hash with keys:

    • `:granularity` (Symbol): the segmentation granularity



# File 'lib/icu4x/yard_docs.rb', line 954

def resolved_options; end

#segment(text) ⇒ Array<Segment>

Splits text into an array of Segment objects at the boundaries defined by this segmenter's granularity.

Examples:

require "icu4x"

segmenter = ICU4X::Segmenter.new(granularity: :word)
segments = segmenter.segment("Hello world")
segments.each do |seg|
  puts "#{seg.index}: #{seg.segment.inspect}"
end

Parameters:

  • text (String)

    the text to segment

Returns:

  • (Array<Segment>)

    array of segment objects



# File 'lib/icu4x/yard_docs.rb', line 947

def segment(text); end
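This reference does not state what unit `Segment#index` counts in. Assuming segmentation tiles the whole input (the word example returns spaces and punctuation as segments, so nothing appears to be dropped), start offsets can also be derived from the segment strings themselves. The hypothetical `segment_offsets` helper below checks that the pieces join back into the text and reports both character and byte offsets, which you can compare against `index`. The pieces in the example are written by hand, not produced by the gem.

```ruby
# Hypothetical helper: check that segment strings tile the input text and
# compute where each one starts, in both characters and bytes.
def segment_offsets(text, pieces)
  raise ArgumentError, "pieces do not reproduce the text" unless pieces.join == text

  char_pos = 0
  byte_pos = 0
  pieces.map do |piece|
    entry = { segment: piece, char_offset: char_pos, byte_offset: byte_pos }
    char_pos += piece.length
    byte_pos += piece.bytesize
    entry
  end
end

# With the gem: pieces = segmenter.segment(text).map(&:segment)
offsets = segment_offsets("世界 hi", ["世界", " ", "hi"])
p offsets.map { |o| [o[:char_offset], o[:byte_offset]] }  # prints [[0, 0], [2, 6], [3, 7]]
```

Character and byte offsets diverge as soon as non-ASCII text appears, so checking `index` against this kind of input shows which unit it uses.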