Class: SmartCsvImport::Strategies::Llm

Inherits: SmartCsvImport::Strategy

Object > SmartCsvImport::Strategy > SmartCsvImport::Strategies::Llm


Includes: Logging

Defined in: lib/smart_csv_import/strategies/llm.rb

Instance Method Summary

#match(csv_headers:, form_class:, sample_rows: []) ⇒ Object

Why we do NOT use HyDE (Hypothetical Document Embeddings) here.

Instance Method Details

#match(csv_headers:, form_class:, sample_rows: []) ⇒ Object

Why we do NOT use HyDE (Hypothetical Document Embeddings) here:

HyDE would ask the LLM to generate a description of each header in isolation, then compare those descriptions to field descriptions via embeddings. It was trialled and rejected for two reasons:

1. It throws away the best signal we have. The LLM here already sees both sides, all headers AND all field definitions, in one prompt. That cross-field context is what disambiguates genuinely ambiguous headers: “Cell” next to first_name/last_name/email is clearly a phone number, whereas “Cell” described in isolation could be a phone, a prison cell, or a biological cell, and the LLM can’t know which.
2. It adds indirection without benefit. Direct matching lets the LLM reason holistically over headers and fields together; HyDE turns that into a blind embedding lookup that loses the reasoning context.
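A minimal sketch of the direct approach described above: a single prompt that carries every CSV header and every target field, so the model sees “Cell” alongside first_name, last_name and email. `FieldDef` and `build_direct_prompt` are hypothetical names, and the prompt wording and JSON reply shape are assumptions; the gem's real `build_prompt` is not shown on this page.

```ruby
# Hypothetical field definition; the real shape of form_class.csv_fields is not documented here.
FieldDef = Struct.new(:name, :description, keyword_init: true)

# Build ONE prompt listing every CSV header next to every target field, so the
# model can use cross-field context when a single header is ambiguous.
def build_direct_prompt(csv_headers, fields)
  header_lines = csv_headers.map { |h| "- #{h}" }.join("\n")
  field_lines  = fields.map { |f| "- #{f.name}: #{f.description}" }.join("\n")

  <<~PROMPT
    Map each CSV header to at most one target field.
    Use the full set of headers and fields as context when a header is ambiguous.
    Reply with a JSON object of the form {"<csv header>": "<field name or null>"}.

    CSV headers:
    #{header_lines}

    Target fields:
    #{field_lines}
  PROMPT
end

# Usage: "Cell" is presented together with the name and email columns.
fields = [
  FieldDef.new(name: "first_name", description: "Given name"),
  FieldDef.new(name: "last_name", description: "Family name"),
  FieldDef.new(name: "email", description: "Email address"),
  FieldDef.new(name: "phone", description: "Contact phone number")
]
puts build_direct_prompt(%w[First Last Email Cell], fields)
```

Because both sides share one prompt, no per-header description or embedding step is needed.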

The right path for genuinely ambiguous headers is to enrich this prompt with business context (csv_source and csv_context on the form class) so the LLM has more signal, rather than stripping signal away via HyDE. If even that isn’t enough, surface the header as an UnmatchedResult for human review.
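The enrichment path can be sketched as follows, assuming the form class may optionally respond to `csv_source` and `csv_context` (each returning a String or nil) and that the model's reply has already been parsed into a header-to-field Hash. `Unmatched` is a hypothetical stand-in for `UnmatchedResult`, whose real attributes are not documented here; `with_business_context` and `partition_matches` are illustrative helpers, not the gem's API.

```ruby
# Hypothetical stand-in for UnmatchedResult; the real class's fields are not shown on this page.
Unmatched = Struct.new(:header, :reason, keyword_init: true)

# Prepend optional business context to a base prompt. Assumes (not confirmed)
# that the form class may define csv_source / csv_context returning String or nil.
def with_business_context(base_prompt, form_class)
  context = []
  if form_class.respond_to?(:csv_source) && form_class.csv_source
    context << "Data source: #{form_class.csv_source}"
  end
  if form_class.respond_to?(:csv_context) && form_class.csv_context
    context << "Business context: #{form_class.csv_context}"
  end
  return base_prompt if context.empty?

  context.join("\n") + "\n\n" + base_prompt
end

# Split a parsed header => field mapping into accepted matches and
# headers left for human review (nil or blank field means "no match").
def partition_matches(csv_headers, mapping)
  matched = {}
  unmatched = []
  csv_headers.each do |header|
    field = mapping[header]
    if field.nil? || field.to_s.empty?
      unmatched << Unmatched.new(header: header, reason: "no confident match")
    else
      matched[header] = field
    end
  end
  [matched, unmatched]
end

# Usage with a hypothetical form class.
class ContactForm
  def self.csv_source; "CRM contact export"; end
  def self.csv_context; "Each row is a person; phone columns may be labelled Cell or Mobile"; end
end
puts with_business_context("Map each CSV header...", ContactForm)
```

Headers the model cannot place end up in the second array, where a review step can pick them up instead of the importer guessing.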

# File 'lib/smart_csv_import/strategies/llm.rb', line 32

def match(csv_headers:, form_class:, sample_rows: [])
  field_definitions = form_class.csv_fields
  return {} if field_definitions.empty?

  prompt = build_prompt(csv_headers, field_definitions, form_class)
  response = fetch_llm_response(prompt)
  parse_response(response, csv_headers)
rescue StandardError => e
  log_error("LLM strategy failed: #{e.message}")
  {}
end
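To exercise the same control flow as `#match` offline, the sketch below injects the LLM call as a callable. `DirectLlmMatcherSketch`, its `llm:` and `logger:` parameters, the prompt text, and the JSON reply format are all assumptions; the gem's `fetch_llm_response` and `parse_response` may behave differently. Like `#match`, it returns `{}` when there are no fields or when anything raises.

```ruby
require "json"

# Self-contained sketch mirroring #match: empty fields -> {}, any error -> {}.
# The LLM is any object responding to #call(prompt) and returning a String.
class DirectLlmMatcherSketch
  def initialize(llm:, logger: $stderr)
    @llm = llm
    @logger = logger
  end

  def match(csv_headers:, field_names:)
    return {} if field_names.empty?

    prompt = "Headers: #{csv_headers.join(', ')}\nFields: #{field_names.join(', ')}\n" \
             "Reply with a JSON object mapping each header to a field."
    parse(@llm.call(prompt), csv_headers, field_names)
  rescue StandardError => e
    # Fail soft, as #match does: log the error and return no mappings.
    @logger.puts("LLM strategy failed: #{e.message}")
    {}
  end

  private

  # Keep only pairs whose header really is in the CSV and whose field is a
  # known target, so invented headers or field names are dropped.
  def parse(response, csv_headers, field_names)
    JSON.parse(response).each_with_object({}) do |(header, field), result|
      next unless csv_headers.include?(header) && field_names.include?(field)

      result[header] = field
    end
  end
end

# Usage with a canned reply standing in for a real model call.
fake_llm = ->(_prompt) { '{"Cell": "phone", "Email": "email"}' }
matcher = DirectLlmMatcherSketch.new(llm: fake_llm)
p matcher.match(csv_headers: %w[Cell Email], field_names: %w[phone email])
```

Filtering the parsed reply against the known headers and fields keeps a hallucinated column or field name from leaking into the mapping; a malformed reply raises `JSON::ParserError`, which the `rescue` turns into an empty result.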
