Class: Phronomy::Loader::CsvLoader

Inherits:
Base
  • Object
Defined in:
lib/phronomy/loader/csv_loader.rb

Overview

Loads a CSV file, converting each row into a separate document.

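By default the first row is treated as a header, and each column value is added to the document metadata under its header name as a symbol key. The keys :source and :row are always set by the loader and override any columns with those names. Unless text_column is set, the full row is serialised to a human-readable "key: value" string, one pair per line, for embedding. When headers are disabled, the row values are joined with ", " instead and the metadata holds only :source and :row.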

Examples:

loader = Phronomy::Loader::CsvLoader.new
docs   = loader.load("products.csv")
# => [
#   { text: "name: Widget\nprice: 9.99", metadata: { name: "Widget", price: "9.99", source: "...", row: 1 } },
#   ...
# ]

Constructor Details

#initialize(headers: true, text_column: nil) ⇒ CsvLoader

Returns a new instance of CsvLoader.

Parameters:

  • headers (Boolean) (defaults to: true)

    whether to treat the first row as column headers

  • text_column (String, nil) (defaults to: nil)

    if set, use only this column's value as the document text; has no effect when headers is false



# File 'lib/phronomy/loader/csv_loader.rb', line 23

def initialize(headers: true, text_column: nil)
  @headers = headers
  @text_column = text_column
end
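
The effect of the two options can be sketched without the gem installed. The helper below, rows_to_docs, is a hypothetical stand-in that mirrors the #load logic documented on this page but parses a CSV string instead of reading a file. It is not part of Phronomy, and the "inline" source label is likewise an assumption made for illustration.

```ruby
require "csv"

# Hypothetical stand-in mirroring the documented #load logic. It parses a
# CSV string rather than a file, so it runs without Phronomy installed.
def rows_to_docs(csv_text, headers: true, text_column: nil, source: "inline")
  rows = CSV.parse(csv_text, headers: headers)
  rows.each_with_index.map do |row, idx|
    if headers
      row_hash = row.to_h
      text = if text_column
        row_hash[text_column].to_s
      else
        row_hash.map { |k, v| "#{k}: #{v}" }.join("\n")
      end
      metadata = row_hash.transform_keys(&:to_sym).merge(source: source, row: idx + 1)
      {text: text, metadata: metadata}
    else
      # Without headers the first line is ordinary data, not column names.
      {text: row.join(", "), metadata: {source: source, row: idx + 1}}
    end
  end
end

csv_text = "name,price\nWidget,9.99\n"

default_docs = rows_to_docs(csv_text)                       # headers: true
column_docs  = rows_to_docs(csv_text, text_column: "name")  # single column as text
raw_docs     = rows_to_docs(csv_text, headers: false)       # header line becomes a row

puts default_docs.first[:text]             # prints "name: Widget" then "price: 9.99"
puts column_docs.first[:text]              # prints "Widget"
puts raw_docs.map { |d| d[:text] }.inspect # prints ["name, price", "Widget, 9.99"]
```

Note that with headers: false the header line is returned as the first document, so the result contains one more document than in the default mode.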

Instance Method Details

#load(source) ⇒ Array<Hash>

Parameters:

  • source (String)

    path to a CSV file

Returns:

  • (Array<Hash>)

    one document per row, each a hash with :text and :metadata keys

Raises:

  • (Errno::ENOENT)

    if the file does not exist



# File 'lib/phronomy/loader/csv_loader.rb', line 31

def load(source)
  rows = CSV.read(source, headers: @headers, encoding: "UTF-8")

  if @headers
    rows.each_with_index.map do |row, idx|
      row_hash = row.to_h
      text = if @text_column
        row_hash[@text_column].to_s
      else
        row_hash.map { |k, v| "#{k}: #{v}" }.join("\n")
      end
      metadata = row_hash.transform_keys(&:to_sym).merge(source: source, row: idx + 1)
      {text: text, metadata: metadata}
    end
  else
    rows.each_with_index.map do |row, idx|
      text = row.join(", ")
      {text: text, metadata: {source: source, row: idx + 1}}
    end
  end
end
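
The documented Errno::ENOENT originates from the file open that CSV.read performs, so callers who cannot guarantee the path exists may want to rescue it. The sketch below calls CSV.read directly so that it runs without the gem. The safe_read helper and the file name are hypothetical, and returning an empty array on a missing file is one possible policy rather than anything Phronomy prescribes.

```ruby
require "csv"

# Hypothetical helper: returns the rows as hashes, or an empty array when
# the file is missing.
def safe_read(path)
  CSV.read(path, headers: true, encoding: "UTF-8").map(&:to_h)
rescue Errno::ENOENT => e
  warn "CSV not found: #{e.message}"
  []
end

rows = safe_read("does_not_exist_phronomy_example.csv")
puts rows.length # prints 0, assuming no file with this name exists
```

The same rescue clause could wrap a call to loader.load(path), since the method passes the path straight to CSV.read.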