Module: Woods::Storage::Snapshotter::Vector

Defined in:
lib/woods/storage/snapshotter/vector.rb

Overview

Reads and writes the vectors.bin / vectors.idx on-disk format.

Binary layout of vectors.bin (all integers little-endian):

offset  length   field
  0     4 bytes  magic "WVF1"
  4     4 bytes  schema_version (u32 LE)
  8     4 bytes  dimension (u32 LE)
 12     8 bytes  vector_count (u64 LE)
 20     4 bytes  gem_version_length (u32 LE)
 24     N bytes  gem_version (UTF-8)
 24+N   4 bytes  model_name_length (u32 LE)
 28+N   M bytes  model_name (UTF-8)
28+N+M rest     packed float32 data (vector_count × dimension × 4 bytes)
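
The layout above can be read back with Ruby's `String#unpack` directives (`L<` for u32 LE, `Q<` for u64 LE, `e` for little-endian float32). The following is a minimal sketch only: `read_header` is a hypothetical helper, not part of the Woods API, and the sample bytes are built by hand for illustration.

```ruby
require 'stringio'

# Hypothetical helper (not Woods API): parses the header fields in layout order
# and reports where the packed float32 data begins.
def read_header(io)
  magic = io.read(4)
  raise ArgumentError, "bad magic: #{magic.inspect}" unless magic == 'WVF1'

  schema_version, dimension = io.read(8).unpack('L<L<')
  vector_count = io.read(8).unpack1('Q<')
  gem_version = io.read(io.read(4).unpack1('L<')).force_encoding(Encoding::UTF_8)
  model_name = io.read(io.read(4).unpack1('L<')).force_encoding(Encoding::UTF_8)

  { schema_version: schema_version, dimension: dimension, vector_count: vector_count,
    gem_version: gem_version, model_name: model_name, data_offset: io.pos }
end

# Hand-built sample: 2 vectors of dimension 3, gem version "0.1.0", model "demo".
header = 'WVF1'.b + [1, 3].pack('L<L<') + [2].pack('Q<') +
         [5].pack('L<') + '0.1.0' + [4].pack('L<') + 'demo'
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0].pack('e*')

io = StringIO.new(header + data)
parsed = read_header(io)
vectors = io.read.unpack('e*').each_slice(parsed[:dimension]).to_a
```

With N = 5 and M = 4 in this sample, the data section starts at byte 28 + N + M = 37, which is the value `read_header` reports as `data_offset`.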

vectors.idx (one record per vector):

4 bytes  id_length (u32 LE) + N bytes id (UTF-8) + 8 bytes offset (u64 LE)
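
As a rough illustration of the record format, the sketch below packs and re-reads idx records with `Array#pack` and `String#unpack1`. Both helpers are hypothetical names introduced here, and the offset values are arbitrary; the source does not say what the offset is measured from, so it is treated as an opaque u64.

```ruby
require 'stringio'

# Hypothetical helper (not Woods API): encodes one idx record.
def pack_idx_record(id, offset)
  bytes = id.encode(Encoding::UTF_8).b
  [bytes.bytesize].pack('L<') + bytes + [offset].pack('Q<')
end

# Hypothetical helper (not Woods API): yields [id, offset] for each record.
def each_idx_record(io)
  return enum_for(:each_idx_record, io) unless block_given?

  until io.eof?
    id_length = io.read(4).unpack1('L<')
    id = io.read(id_length).force_encoding(Encoding::UTF_8)
    offset = io.read(8).unpack1('Q<')
    yield id, offset
  end
end

# id_length counts bytes, not characters: "café" is 4 characters but 5 bytes.
blob = pack_idx_record('doc-1', 0) + pack_idx_record('café', 12)
records = each_idx_record(StringIO.new(blob)).to_a
```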

Atomic writes use Tempfile + File.rename for crash safety.
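
The general Tempfile-then-rename pattern can be sketched as follows. This is an illustration of the technique, not the module's actual implementation: `atomic_write` is a hypothetical helper, and the `fsync` call is an assumption about what a careful writer would do. The temporary file is created in the target directory so that the rename stays on one filesystem.

```ruby
require 'tempfile'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not Woods API): readers see either the old file or the
# complete new file, never a partially written one.
def atomic_write(path, bytes)
  tmp = Tempfile.new(File.basename(path), File.dirname(path))
  tmp.binmode
  tmp.write(bytes)
  tmp.flush
  tmp.fsync                      # push data to disk before publishing it
  tmp.close
  File.rename(tmp.path, path)    # replaces any existing file in one step
ensure
  tmp&.close!                    # removes the temp file if the rename never ran
end

dir = Dir.mktmpdir
target = File.join(dir, 'vectors.bin')
atomic_write(target, 'old'.b)
atomic_write(target, 'new'.b)
contents = File.binread(target)
leftovers = Dir.children(dir) - ['vectors.bin']
FileUtils.remove_entry(dir)
```

Note that a file created through Tempfile keeps its restrictive default permissions after the rename; a real writer may want to adjust the mode explicitly.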

Constant Summary

MAGIC =
'WVF1'
SCHEMA_VERSION_SUPPORTED =
1

Class Method Summary

Class Method Details

.dump(store, artifact, dump_dir, resolved_config: nil) ⇒ void

This method returns an undefined value.

Writes vectors.bin and vectors.idx into dump_dir atomically.

Parameters:

  • store (#each_entry, #bulk_load)

    in-memory vector store adapter

  • artifact (Woods::IndexArtifact)

    artifact layout object

  • dump_dir (Pathname, String)

    target directory; must be under artifact.dumps_root

  • resolved_config (#model_name, nil) (defaults to: nil)

    supplies the model name written to the header; an empty string is written when it is nil or does not respond to #model_name

Raises:



# File 'lib/woods/storage/snapshotter/vector.rb', line 65

def self.dump(store, artifact, dump_dir, resolved_config: nil)
  validate_store!(store)
  validate_dump_dir!(artifact, Pathname.new(dump_dir.to_s))
  model_name = resolved_config.respond_to?(:model_name) ? resolved_config.model_name.to_s : ''
  entries = store.each_entry.to_a
  write_bin_and_idx(Pathname.new(dump_dir.to_s), entries, Woods::VERSION, model_name)
  nil
end
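
Because `dump` calls `store.each_entry.to_a`, the store's `each_entry` has to return an Enumerator when called without a block. A minimal sketch of a store that satisfies the documented duck type (#each_entry, #bulk_load) is shown below. `TinyStore` is a hypothetical stand-in, and the `[id, vector]` entry shape is an assumption; the real entry shape is not shown in this page.

```ruby
# Hypothetical stand-in store (not Woods::Storage::VectorStore::InMemory).
class TinyStore
  def initialize
    @vectors = {}
  end

  # Assumed entry shape: [id, vector].
  def each_entry
    return enum_for(:each_entry) unless block_given?

    @vectors.each { |id, vector| yield [id, vector] }
  end

  def bulk_load(entries)
    entries.each { |id, vector| @vectors[id] = vector }
    self
  end
end

store = TinyStore.new.bulk_load([['a', [0.1, 0.2]], ['b', [0.3, 0.4]]])
entries = store.each_entry.to_a
dumpable = %i[each_entry bulk_load].all? { |name| store.respond_to?(name) }
```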

.load_or_empty(artifact, resolved_config: nil) ⇒ Woods::Storage::VectorStore::InMemory

Returns a populated in-memory vector store loaded from the latest dump, or an empty store when no dump exists yet.

Parameters:

  • artifact (Woods::IndexArtifact)

    artifact layout object

  • resolved_config (#dimension, nil) (defaults to: nil)

    used for dimension validation

Returns:

  • (Woods::Storage::VectorStore::InMemory)

    populated store, or an empty store when no dump exists

Raises:



# File 'lib/woods/storage/snapshotter/vector.rb', line 45

def self.load_or_empty(artifact, resolved_config: nil)
  dump_dir = artifact.latest_dump_path
  return VectorStore::InMemory.new if dump_dir.nil?

  bin_path = dump_dir.join('vectors.bin')
  idx_path = dump_dir.join('vectors.idx')
  return VectorStore::InMemory.new unless bin_path.exist? && idx_path.exist?

  load_from(bin_path, idx_path, resolved_config)
end