Class: Phronomy::Splitter::Base

Inherits:
Object
Defined in:
lib/phronomy/splitter/base.rb

Overview

Abstract base class for text splitters.

A splitter takes a single document hash (or plain text) and returns an array of smaller chunk documents:

[{ text: String, metadata: Hash }, ...]

Subclasses must implement #split.
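
As a rough illustration of that contract, the sketch below defines a hypothetical subclass, ParagraphSplitter, which is not part of the library. It assumes that implementing #split is the only requirement, and that copying the input metadata onto every chunk is acceptable; the chunk_index key is an invented example, not a library convention. The Base class is reproduced from the source listings on this page so the snippet runs without the gem installed.

```ruby
# Stand-in for Phronomy::Splitter::Base, copied from the source listings on
# this page so the sketch runs on its own. With the gem loaded, drop this.
module Phronomy
  module Splitter
    class Base
      def split(document)
        raise NotImplementedError, "#{self.class}#split is not implemented"
      end

      def split_all(documents)
        documents.flat_map { |doc| split(doc) }
      end
    end
  end
end

# Hypothetical subclass (not shipped with the library): one chunk per paragraph.
class ParagraphSplitter < Phronomy::Splitter::Base
  def split(document)
    # Accept either a document hash or a plain String, as the contract allows.
    text, metadata =
      if document.is_a?(Hash)
        [document[:text].to_s, document[:metadata] || {}]
      else
        [document.to_s, {}]
      end

    # Paragraphs are separated by one or more blank lines.
    paragraphs = text.split(/\n{2,}/).map(&:strip).reject(&:empty?)

    paragraphs.each_with_index.map do |paragraph, index|
      # chunk_index is an illustrative key, assumed for this example only.
      { text: paragraph, metadata: metadata.merge(chunk_index: index) }
    end
  end
end

chunks = ParagraphSplitter.new.split(
  { text: "First.\n\nSecond.", metadata: { source: "a.txt" } }
)
```

Here chunks holds two hashes in the { text:, metadata: } shape described above, each carrying the original source key alongside its own chunk_index.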

Direct Known Subclasses

FixedSizeSplitter, RecursiveSplitter

Instance Method Summary

  • #split(document) ⇒ Array<Hash>
  • #split_all(documents) ⇒ Array<Hash>

Instance Method Details

#split(document) ⇒ Array<Hash>

Split document into an array of chunk documents.

Parameters:

  • document (Hash, String)

    Either a document hash ({ text: String, metadata: Hash }) returned by a Loader, or a plain String.

Returns:

  • (Array<Hash>)

    array of { text: String, metadata: Hash }

Raises:

  • (NotImplementedError)

    when not overridden by a subclass



# File 'lib/phronomy/splitter/base.rb', line 21

def split(document)
  raise NotImplementedError, "#{self.class}#split is not implemented"
end
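
The following sketch shows what a caller might see when #split has not been overridden. IncompleteSplitter is a hypothetical class, and Base is reproduced from the listing above so the snippet is self-contained. One point worth knowing: in Ruby, NotImplementedError descends from ScriptError rather than StandardError, so a bare rescue clause does not catch it and the class has to be named explicitly.

```ruby
# Stand-in for Phronomy::Splitter::Base, copied from the listing above.
module Phronomy
  module Splitter
    class Base
      def split(document)
        raise NotImplementedError, "#{self.class}#split is not implemented"
      end
    end
  end
end

# Hypothetical subclass that forgets to implement #split.
class IncompleteSplitter < Phronomy::Splitter::Base
end

message =
  begin
    IncompleteSplitter.new.split("some text")
    nil
  rescue NotImplementedError => e
    # The class must be named here; a bare `rescue` only covers StandardError.
    e.message
  end

# False: NotImplementedError < ScriptError < Exception.
standard_error = NotImplementedError.ancestors.include?(StandardError)
```

The message interpolates the concrete subclass name, which helps identify which splitter is missing its implementation.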

#split_all(documents) ⇒ Array<Hash>

Convenience method: calls #split on each element of documents and returns all resulting chunks as one flat array, in input order.

Parameters:

  • documents (Array<Hash, String>)

Returns:

  • (Array<Hash>)

    flat array of { text: String, metadata: Hash } chunks from all input documents


# File 'lib/phronomy/splitter/base.rb', line 29

def split_all(documents)
  documents.flat_map { |doc| split(doc) }
end
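
A small usage sketch of #split_all with mixed input follows. LineSplitter is a hypothetical subclass written only for this example, and Base is again reproduced from the listings on this page. Because the result is flattened, a chunk can be traced back to its source document only if the subclass copies identifying metadata onto it, which this sketch assumes is the desired behavior.

```ruby
# Stand-in for Phronomy::Splitter::Base, copied from the listings on this page.
module Phronomy
  module Splitter
    class Base
      def split(document)
        raise NotImplementedError, "#{self.class}#split is not implemented"
      end

      def split_all(documents)
        documents.flat_map { |doc| split(doc) }
      end
    end
  end
end

# Hypothetical subclass: one chunk per non-empty line.
class LineSplitter < Phronomy::Splitter::Base
  def split(document)
    # Normalize a plain String into the document hash shape.
    doc = document.is_a?(Hash) ? document : { text: document, metadata: {} }
    metadata = doc[:metadata] || {}

    doc[:text].to_s.each_line(chomp: true).reject(&:empty?).map do |line|
      # Each chunk gets its own copy of the source document's metadata.
      { text: line, metadata: metadata.dup }
    end
  end
end

documents = [
  { text: "a1\na2", metadata: { source: "a.txt" } }, # document hash
  "b1"                                                # plain String
]

all_chunks = LineSplitter.new.split_all(documents)
texts = all_chunks.map { |chunk| chunk[:text] }
```

With this input, texts comes out as ["a1", "a2", "b1"]: the chunks of the first document, followed by the chunk of the second, with no nesting per document.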