Module: Woods::Tasks
- Defined in: lib/woods/tasks.rb
Overview
Small helpers invoked from `lib/tasks/woods.rake`.
Keeps rake task bodies to a couple of lines each so the real work lives in plain Ruby that can be unit-tested without Rake’s global state.
Class Method Summary
- .build_embed_indexer ⇒ Embedding::Indexer
  Build an Embedding::Indexer wired to the provider and stores described by configuration.
- .build_resolved_config(config, provider: nil) ⇒ Object
  Build a ResolvedConfig snapshot from the live Woods::Configuration.
- .print_embed_stats(stats, mode:) ⇒ Object
  Print an indexer stats hash in the format the rake tasks have historically used.
Class Method Details
.build_embed_indexer ⇒ Embedding::Indexer
Build an Embedding::Indexer wired to the provider and stores described by Woods.configuration. Uses Builder so `config.embedding_provider`, `config.embedding_options`, and `config.vector_store(_options)` are all honoured. Previously the rake tasks hardcoded Ollama + InMemory and silently ignored configuration; the problem stayed invisible until the provider tried to reach an unreachable default host.
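The difference between honouring configuration and hardcoding a default can be sketched as follows. `SketchConfig`, `OllamaProvider`, `OpenAIProvider`, and `SketchBuilder` are hypothetical stand-ins, not the actual Woods `Builder` API; the point illustrated is that the provider is looked up from configuration and an unknown value raises immediately instead of quietly falling back to a default host.

```ruby
# Hypothetical stand-ins; the real Woods Builder and provider classes differ.
SketchConfig = Struct.new(:embedding_provider, :embedding_options, keyword_init: true)

OllamaProvider = Struct.new(:options)
OpenAIProvider = Struct.new(:options)

class SketchBuilder
  # Hypothetical registry mapping configuration values to provider classes.
  PROVIDERS = {
    ollama: OllamaProvider,
    openai: OpenAIProvider
  }.freeze

  def initialize(config)
    @config = config
  end

  # Resolve the provider from configuration. A missing or unknown value is an
  # error here, rather than surfacing later as a connection failure against a
  # hardcoded default host.
  def build_provider
    name = @config.embedding_provider&.to_sym
    klass = PROVIDERS.fetch(name) do
      raise ArgumentError, "unknown embedding_provider: #{@config.embedding_provider.inspect}"
    end
    klass.new(@config.embedding_options || {})
  end
end

config = SketchConfig.new(embedding_provider: :openai, embedding_options: { model: 'example-model' })
provider = SketchBuilder.new(config).build_provider
puts provider.class # prints "OpenAIProvider"
```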
The TextPreparer and SemanticChunker are tuned to the selected provider so oversize units are split into chunks that fit the provider’s input budget (e.g. Ollama’s num_ctx, OpenAI’s 8k cap).
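A simplified sketch of splitting an oversize unit to fit an input budget. It is not the SemanticChunker algorithm: it assumes a budget measured in characters (real providers budget in tokens) and splits on line boundaries, hard-splitting only lines that alone exceed the budget. `chunk_to_budget` is a hypothetical name.

```ruby
# Split text into chunks of at most max_chars characters, preferring line
# boundaries. A budget in characters stands in for a provider's token limit.
def chunk_to_budget(text, max_chars)
  raise ArgumentError, 'max_chars must be positive' unless max_chars.positive?

  chunks = []
  current = +''
  text.each_line do |line|
    # A single line longer than the budget is cut into max_chars-sized pieces.
    pieces = line.length > max_chars ? line.scan(/.{1,#{max_chars}}/m) : [line]
    pieces.each do |piece|
      # Start a new chunk when adding this piece would exceed the budget.
      if current.length + piece.length > max_chars && !current.empty?
        chunks << current
        current = +''
      end
      current << piece
    end
  end
  chunks << current unless current.empty?
  chunks
end

source = "def a\n  1\nend\n" * 3
chunks = chunk_to_budget(source, 20)
puts chunks.map(&:length).inspect # prints "[20, 18, 4]"
```

Joining the chunks reproduces the input exactly, so no source text is lost in the split.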
# File 'lib/woods/tasks.rb', line 28

def build_embed_indexer
  config = Woods.configuration
  builder = Builder.new(config)
  provider = builder.

  # Wire the persistence-arc pieces (resolved_config, metadata_store,
  # dump_retention_count) so Indexer#persist_snapshot can write
  # woods.json, dump metadata, and honour the user's retention setting.
  # Without these kwargs, embed writes vectors.bin + latest pointer but
  # never writes woods.json, which breaks the standalone woods-mcp
  # Shape-2 boot path entirely.
  #
  # metadata_store and resolved_config are nil-safe: hosts that don't
  # configure metadata or that pre-date the persistence arc still work.
  Embedding::Indexer.new(
    provider: provider,
    text_preparer: builder.build_text_preparer(provider),
    vector_store: builder.build_vector_store,
    metadata_store: config. ? builder. : nil,
    resolved_config: build_resolved_config(config, provider: provider),
    chunker: builder.build_chunker(provider),
    dump_retention_count: config.dump_retention_count,
    output_dir: ENV.fetch('WOODS_OUTPUT', config.output_dir)
  )
end
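The nil-safe wiring described in the source comments can be illustrated with a hypothetical `SketchIndexer`. It is not `Embedding::Indexer`, whose `persist_snapshot` internals are not shown on this page. The sketch always writes `vectors.bin`, writes `woods.json` only when a resolved config is supplied, and calls an optional metadata store (its `record_dump` method is also hypothetical) only when one is present. The latest pointer and retention handling are omitted.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical indexer illustrating nil-safe optional collaborators.
class SketchIndexer
  def initialize(output_dir:, resolved_config: nil, metadata_store: nil)
    @output_dir = output_dir
    @resolved_config = resolved_config
    @metadata_store = metadata_store
  end

  # Vectors are always written. The snapshot file and dump metadata are
  # written only when their collaborators were supplied. Returns the
  # sorted list of files in the output directory.
  def persist_snapshot(vectors)
    File.binwrite(File.join(@output_dir, 'vectors.bin'), vectors.pack('e*'))
    if @resolved_config
      File.write(File.join(@output_dir, 'woods.json'), JSON.generate(@resolved_config))
    end
    @metadata_store&.record_dump(@output_dir)
    Dir.children(@output_dir).sort
  end
end

# Dir.mktmpdir returns the block's value and removes the directory afterwards.
legacy = Dir.mktmpdir { |dir| SketchIndexer.new(output_dir: dir).persist_snapshot([0.1, 0.2]) }
full = Dir.mktmpdir do |dir|
  SketchIndexer.new(output_dir: dir, resolved_config: { 'dimension' => 2 }).persist_snapshot([0.1, 0.2])
end
p legacy # prints ["vectors.bin"]
p full   # prints ["vectors.bin", "woods.json"]
```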
.build_resolved_config(config, provider: nil) ⇒ Object
Build a ResolvedConfig snapshot from the live Woods::Configuration. Returns nil when the configuration doesn’t have enough to produce one (as on pre-persistence-arc hosts) or when building the snapshot raises, so the Indexer falls back to the legacy dump-without-woods.json behaviour.
Passes the live provider so ResolvedConfig.from_configuration can probe provider.dimensions. Without it, Ollama snapshots record dimension: 0, and every subsequent MCP boot fails a spurious dimension-mismatch check against the real stored vectors.
# File 'lib/woods/tasks.rb', line 63

def build_resolved_config(config, provider: nil)
  return nil unless config.

  ResolvedConfig.from_configuration(config, provider: provider)
rescue StandardError
  nil
end
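Why passing the provider matters can be sketched as follows. `FakeProvider`, `snapshot_dimension`, and `check_dimension!` are hypothetical names, not the `ResolvedConfig` API: without a provider there is nothing to ask for the vector width, so the recorded dimension degrades to 0, and a boot-time comparison against real stored vectors then fails even though the vectors themselves are fine.

```ruby
# Hypothetical provider that reports the vector width it produces.
FakeProvider = Struct.new(:width) do
  def dimensions
    width
  end
end

# Dimension recorded in the snapshot: asked of the provider when one is
# passed, otherwise 0 (the failure mode described for Ollama snapshots).
def snapshot_dimension(provider)
  provider ? provider.dimensions.to_i : 0
end

# Boot-time check comparing the snapshot against a vector actually stored.
def check_dimension!(snapshot_dim, stored_vector)
  return true if snapshot_dim == stored_vector.length

  raise ArgumentError, "dimension mismatch: snapshot=#{snapshot_dim}, stored=#{stored_vector.length}"
end

stored = Array.new(768, 0.0)
check_dimension!(snapshot_dimension(FakeProvider.new(768)), stored) # passes
begin
  check_dimension!(snapshot_dimension(nil), stored)
rescue ArgumentError => e
  puts e.message # prints "dimension mismatch: snapshot=0, stored=768"
end
```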
.print_embed_stats(stats, mode:) ⇒ Object
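Print an indexer stats hash in the format the rake tasks have historically used. `mode:` only affects the header line: `:incremental` prints "Incremental embedding complete!", and any other value prints "Embedding complete!".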
# File 'lib/woods/tasks.rb', line 76

def print_embed_stats(stats, mode:)
  header = mode == :incremental ? 'Incremental embedding complete!' : 'Embedding complete!'
  puts
  puts header
  puts " Processed: #{stats[:processed]}"
  puts " Skipped: #{stats[:skipped]}"
  puts " Errors: #{stats[:errors]}"
end
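Because `print_embed_stats` writes with plain `puts` (which goes to `$stdout`), its output can be checked in a unit test by temporarily swapping `$stdout`, with no Rake involvement. The sketch copies the method body into a hypothetical `StatsPrinter` module so it runs standalone; `capture_stdout` is likewise a hypothetical helper, not part of Woods.

```ruby
require 'stringio'

# Standalone copy of the method body, under a hypothetical module name.
module StatsPrinter
  module_function

  def print_embed_stats(stats, mode:)
    header = mode == :incremental ? 'Incremental embedding complete!' : 'Embedding complete!'
    puts
    puts header
    puts " Processed: #{stats[:processed]}"
    puts " Skipped: #{stats[:skipped]}"
    puts " Errors: #{stats[:errors]}"
  end
end

# Redirect $stdout for the duration of the block and return what was written.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original
end

output = capture_stdout do
  StatsPrinter.print_embed_stats({ processed: 10, skipped: 2, errors: 0 }, mode: :incremental)
end
puts output.lines[1] # prints "Incremental embedding complete!"
```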