Module: Woods::Tasks

Defined in:
lib/woods/tasks.rb

Overview

Small helpers invoked from `lib/tasks/woods.rake`.

Keeps rake task bodies to a couple of lines each so the real work lives in plain Ruby that can be unit-tested without Rake’s global state.

Class Method Summary

Class Method Details

.build_embed_indexer ⇒ Embedding::Indexer

Build an Embedding::Indexer wired to the provider and stores described by Woods.configuration. Uses Builder so `config.embedding_provider`, `config.embedding_options`, and `config.vector_store(_options)` are all honoured. Before this change, the rake tasks hardcoded Ollama + InMemory and silently ignored configuration; the problem stayed invisible until the provider tried to reach an unreachable default host.

The TextPreparer and SemanticChunker are tuned to the selected provider so oversize units are split into chunks that fit the provider’s input budget (e.g. Ollama’s num_ctx, OpenAI’s 8k cap).

Returns:

  • (Embedding::Indexer)

# File 'lib/woods/tasks.rb', line 28

def build_embed_indexer
  config = Woods.configuration
  builder = Builder.new(config)
  provider = builder.build_embedding_provider

  # Wire the persistence-arc pieces (resolved_config, metadata_store,
  # dump_retention_count) so Indexer#persist_snapshot can write
  # woods.json, dump metadata, and honour the user's retention setting.
  # Without these kwargs, embed writes vectors.bin + latest pointer but
  # never writes woods.json — which breaks the standalone woods-mcp
  # Shape-2 boot path entirely.
  #
  # metadata_store and resolved_config are nil-safe — hosts that don't
  # configure metadata or that pre-date the persistence arc still work.
  Embedding::Indexer.new(
    provider: provider,
    text_preparer: builder.build_text_preparer(provider),
    vector_store: builder.build_vector_store,
    metadata_store: config. ? builder. : nil,
    resolved_config: build_resolved_config(config, provider: provider),
    chunker: builder.build_chunker(provider),
    dump_retention_count: config.dump_retention_count,
    output_dir: ENV.fetch('WOODS_OUTPUT', config.output_dir)
  )
end
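
The output_dir: argument above resolves its value with ENV.fetch, so the WOODS_OUTPUT environment variable takes precedence over the configured directory. A minimal sketch of that precedence follows; OutputDirConfig and resolve_output_dir are hypothetical names used only for illustration and are not part of Woods.

```ruby
# Hypothetical stand-in for Woods::Configuration; only output_dir matters here.
OutputDirConfig = Struct.new(:output_dir)

# Hypothetical helper mirroring the output_dir: expression above:
# WOODS_OUTPUT wins when set, otherwise the configured directory is used.
def resolve_output_dir(config)
  ENV.fetch('WOODS_OUTPUT', config.output_dir)
end

config = OutputDirConfig.new('tmp/woods')

ENV.delete('WOODS_OUTPUT')
puts resolve_output_dir(config) # falls back to the configured directory

ENV['WOODS_OUTPUT'] = '/var/data/woods'
puts resolve_output_dir(config) # environment override is used

ENV.delete('WOODS_OUTPUT')
```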

.build_resolved_config(config, provider: nil) ⇒ Object

Build a ResolvedConfig snapshot from the live Woods::Configuration. Returns nil if the configuration doesn’t have enough to produce one (pre-persistence-arc hosts) so the Indexer falls back to the legacy dump-without-woods.json behaviour.

Passes the live provider so ResolvedConfig.from_configuration can probe provider.dimensions — without this, Ollama snapshots record dimension: 0 and every subsequent MCP boot fails a spurious dimension-mismatch check against the real stored vectors.



# File 'lib/woods/tasks.rb', line 63

def build_resolved_config(config, provider: nil)
  return nil unless config.embedding_provider

  ResolvedConfig.from_configuration(config, provider: provider)
rescue StandardError
  nil
end
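
The guard clause and the method-level rescue give this helper two separate routes to nil. The sketch below reproduces that control flow with stand-ins so it can run on its own; SnapshotConfig, SnapshotSketch, and the Hash used as a provider are assumptions for illustration, not the real ResolvedConfig or provider classes.

```ruby
# Hypothetical stand-in for Woods::Configuration.
SnapshotConfig = Struct.new(:embedding_provider)

module SnapshotSketch
  # Stand-in for ResolvedConfig.from_configuration. It raises KeyError
  # when the (hypothetical, Hash-based) provider has no :dimensions entry.
  def self.from_configuration(config, provider: nil)
    raise ArgumentError, 'provider required' if provider.nil?

    { embedding_provider: config.embedding_provider,
      dimension: provider.fetch(:dimensions) }
  end

  # Same shape as build_resolved_config above: nil when no provider is
  # configured, nil when building the snapshot raises a StandardError.
  def self.build_resolved_config(config, provider: nil)
    return nil unless config.embedding_provider

    from_configuration(config, provider: provider)
  rescue StandardError
    nil
  end
end

unconfigured = SnapshotConfig.new(nil)
configured   = SnapshotConfig.new(:ollama)

p SnapshotSketch.build_resolved_config(unconfigured)             # guard clause returns nil
p SnapshotSketch.build_resolved_config(configured, provider: {}) # KeyError is rescued, returns nil
p SnapshotSketch.build_resolved_config(configured, provider: { dimensions: 768 })
```

Because every StandardError is rescued, a caller cannot tell an unconfigured host from a snapshot that failed to build; both take the legacy fallback path.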

.print_embed_stats(stats, mode:) ⇒ Object

Print an indexer stats hash in the format the rake tasks have historically used. `mode:` only affects the header line.

Parameters:

  • stats (Hash)
  • mode (Symbol)

    :full or :incremental



# File 'lib/woods/tasks.rb', line 76

def print_embed_stats(stats, mode:)
  header = mode == :incremental ? 'Incremental embedding complete!' : 'Embedding complete!'
  puts
  puts header
  puts "  Processed: #{stats[:processed]}"
  puts "  Skipped:   #{stats[:skipped]}"
  puts "  Errors:    #{stats[:errors]}"
end
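
Since this helper only writes to standard output, it can be exercised without loading Rake. The sketch below copies the method body into a hypothetical EmbedStatsSketch module so it runs standalone, and captures the output with StringIO; capture_stdout is likewise an illustrative helper, not part of Woods.

```ruby
require 'stringio'

# Hypothetical container holding a copy of the method shown above.
module EmbedStatsSketch
  def self.print_embed_stats(stats, mode:)
    header = mode == :incremental ? 'Incremental embedding complete!' : 'Embedding complete!'
    puts
    puts header
    puts "  Processed: #{stats[:processed]}"
    puts "  Skipped:   #{stats[:skipped]}"
    puts "  Errors:    #{stats[:errors]}"
  end
end

# Hypothetical test helper: temporarily swaps $stdout for a StringIO
# and returns everything written while the block ran.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original
end

output = capture_stdout do
  EmbedStatsSketch.print_embed_stats({ processed: 12, skipped: 3, errors: 0 }, mode: :incremental)
end

puts output
```

Any mode other than :incremental, including :full, produces the plain 'Embedding complete!' header.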