Class: Woods::Embedding::Provider::Ollama

Inherits:
Object
Includes:
Interface
Defined in:
lib/woods/embedding/provider.rb

Overview

Ollama adapter for local embeddings via the Ollama HTTP API.

Uses the `/api/embed` endpoint to generate embeddings. Requires a running Ollama instance (default: localhost:11434) with the specified model pulled.

Examples:

provider = Woods::Embedding::Provider::Ollama.new
vector = provider.embed("class User < ApplicationRecord; end")
vectors = provider.embed_batch(["text1", "text2"])
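
The provider's private request-building code is not shown on this page, so the following is only a sketch of what a payload for `/api/embed` might look like. The helper name `build_embed_body` is hypothetical, and passing `num_ctx` under `options` is an assumption about how the provider forwards it. The `model` and `input` fields are part of the Ollama embed API, and `input` may be a single string or an array of strings.

```ruby
require 'json'

# Hypothetical helper (not part of Woods): builds the JSON payload
# for POST /api/embed. `input` is a String or an Array<String>.
def build_embed_body(model:, input:, num_ctx:)
  JSON.generate(
    model: model,
    input: input,
    options: { num_ctx: num_ctx } # assumption: num_ctx is sent as an option
  )
end

body = build_embed_body(model: 'nomic-embed-text', input: %w[text1 text2], num_ctx: 2048)
parsed = JSON.parse(body)
parsed['input'] # => ["text1", "text2"]
```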

Constant Summary

DEFAULT_MODEL =
'nomic-embed-text'
DEFAULT_HOST =
'http://localhost:11434'
MODEL_CONTEXT_LENGTHS =

Ollama enforces the model’s native context length on `/api/embed` regardless of the `num_ctx` override. We’ve validated this against 0.15.x for nomic-embed-text (rejects inputs over 2048 tokens) and bge-m3 (accepts up to 8192 tokens, silently truncates above). The registry advertises the native ceiling so the chunker can size inputs correctly. Models outside this registry fall back to Ollama’s conservative 2048 default.

See `docs/EMBEDDING_MODELS.md` for the tradeoff matrix and instructions for adding a new model here.

{
  'nomic-embed-text' => 2048,
  'bge-m3' => 8192,
  'mxbai-embed-large' => 512,
  'snowflake-arctic-embed' => 512,
  'snowflake-arctic-embed2' => 8192,
  # all-minilm: 512 is the model's context length, NOT the 384
  # embedding dimension and NOT the 256 some sources confuse with
  # the dimension. With a 256-token budget the chunker formula
  # produces a negative max_chars and silently drops every chunk.
  'all-minilm' => 512
}.freeze
FALLBACK_NUM_CTX =

Fallback when the configured model isn’t in the registry.

2048
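
The lookup order described above (explicit `num_ctx`, then the registry, then the fallback) can be sketched in isolation. The registry below is a trimmed copy for illustration, and `resolve_num_ctx` is a hypothetical standalone function that mirrors the expression used in the constructor.

```ruby
# Trimmed copy of the registry, for illustration only.
MODEL_CONTEXT_LENGTHS = {
  'nomic-embed-text' => 2048,
  'bge-m3' => 8192,
  'all-minilm' => 512
}.freeze
FALLBACK_NUM_CTX = 2048

# Hypothetical helper: an explicit num_ctx wins, then the registry
# entry, then the fallback for models the registry does not know.
def resolve_num_ctx(model, num_ctx = nil)
  num_ctx || MODEL_CONTEXT_LENGTHS.fetch(model, FALLBACK_NUM_CTX)
end

resolve_num_ctx('bge-m3')                    # => 8192
resolve_num_ctx('some-unlisted-model')       # => 2048
resolve_num_ctx('some-unlisted-model', 4096) # => 4096
```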
DEFAULT_READ_TIMEOUT =

Default read timeout for `/api/embed`, in seconds. The previous 30s default was too short for batched embed calls on cold models: Ollama has to load the model on first call, and an N-item batch can easily exceed 30s on a CPU-only host. 120s leaves headroom without wedging the whole pipeline on a genuinely dead server.

120
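
How the timeout is applied is not shown on this page. A minimal sketch, assuming the provider talks to Ollama through Ruby's standard `Net::HTTP`, would set `read_timeout` on the connection object. The helper name `build_http` and the 5-second `open_timeout` are hypothetical and not taken from the Woods source.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: configures a Net::HTTP client for an Ollama host.
# No connection is opened until a request is actually sent.
def build_http(host, read_timeout: 120, open_timeout: 5)
  uri = URI(host)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = open_timeout # assumed value, seconds to establish the connection
  http.read_timeout = read_timeout # seconds to wait for each read of the response
  http
end

http = build_http('http://localhost:11434')
http.read_timeout # => 120
```

With `Net::HTTP`, an exceeded read timeout surfaces as `Net::ReadTimeout`, which a caller can rescue to distinguish a slow cold start from other failures.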

Instance Method Summary

Constructor Details

#initialize(model: DEFAULT_MODEL, host: DEFAULT_HOST, num_ctx: nil, read_timeout: DEFAULT_READ_TIMEOUT) ⇒ Ollama

Returns a new instance of Ollama.

Parameters:

  • model (String) (defaults to: DEFAULT_MODEL)

    Ollama model name (default: nomic-embed-text). Set to `"bge-m3"` or `"snowflake-arctic-embed2"` for an 8192-token context, which skips most chunking for dense Rails units.

  • host (String) (defaults to: DEFAULT_HOST)

    Ollama server URL (default: http://localhost:11434)

  • num_ctx (Integer, nil) (defaults to: nil)

    Ollama context window in tokens. When `nil` (the default), the provider picks the model’s native context from `MODEL_CONTEXT_LENGTHS`, falling back to 2048 for unknown models. Set explicitly only if running a model with a known-larger native context that isn’t in the registry yet.

  • read_timeout (Integer) (defaults to: DEFAULT_READ_TIMEOUT)

    HTTP read timeout in seconds. Bump this for slow / cold-start hosts or very large batches.



# File 'lib/woods/embedding/provider.rb', line 123

def initialize(model: DEFAULT_MODEL, host: DEFAULT_HOST, num_ctx: nil,
               read_timeout: DEFAULT_READ_TIMEOUT)
  @model = model
  @host = host
  @num_ctx = num_ctx || MODEL_CONTEXT_LENGTHS.fetch(model, FALLBACK_NUM_CTX)
  @read_timeout = read_timeout
  @uri = URI("#{host}/api/embed")
end

Instance Method Details

#dimensions ⇒ Integer

Return the dimensionality of vectors produced by this model.

Determined dynamically by embedding a test string on the first call; the result is memoized for later calls.

Returns:

  • (Integer)

    number of dimensions



# File 'lib/woods/embedding/provider.rb', line 166

def dimensions
  @dimensions ||= embed('test').length
end
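
The `||=` memoization means only the first call to `dimensions` pays for an embed request. The sketch below shows this with a hypothetical `CountingProvider` stand-in that returns a fixed four-element vector instead of calling Ollama.

```ruby
# Hypothetical stand-in for the real provider; counts embed calls.
class CountingProvider
  attr_reader :embed_calls

  def initialize
    @embed_calls = 0
  end

  # Stand-in for a real embed call; returns a fixed 4-element vector.
  def embed(_text)
    @embed_calls += 1
    [0.1, 0.2, 0.3, 0.4]
  end

  # Same memoization pattern as the method above.
  def dimensions
    @dimensions ||= embed('test').length
  end
end

provider = CountingProvider.new
provider.dimensions  # => 4
provider.dimensions  # => 4
provider.embed_calls # => 1
```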

#embed(text) ⇒ Array<Float>

Embed a single text string.

Parameters:

  • text (String)

    the text to embed

Returns:

  • (Array<Float>)

    the embedding vector

Raises:

  • (Woods::Error)

    if the API returns an error

  • (ArgumentError)

    if the text is nil, empty, or whitespace-only (avoids a provider 400)



# File 'lib/woods/embedding/provider.rb', line 138

def embed(text)
  raise ArgumentError, 'embed(text) requires a non-empty string' if text.nil? || text.to_s.strip.empty?

  response = post_request(build_body(text))
  response['embeddings'].first
end
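
The `/api/embed` response carries its vectors in an `embeddings` array, one entry per input, which is why a single-text call takes `.first`. The sketch below unwraps a hand-written sample payload; the numbers are illustrative and are not real model output, and `first_embedding` is a hypothetical helper name.

```ruby
require 'json'

# Hypothetical helper: extracts the only vector from a
# single-input /api/embed response body.
def first_embedding(raw_json)
  JSON.parse(raw_json).fetch('embeddings').first
end

# Hand-written sample in the response shape; values are made up.
raw = '{"model":"nomic-embed-text","embeddings":[[0.01,-0.02,0.03]]}'
vector = first_embedding(raw)
vector # => [0.01, -0.02, 0.03]
```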

#embed_batch(texts) ⇒ Array<Array<Float>>

Embed multiple texts in a single request.

Parameters:

  • texts (Array<String>)

    the texts to embed

Returns:

  • (Array<Array<Float>>)

    array of embedding vectors

Raises:

  • (Woods::Error)

    if the API returns an error

  • (ArgumentError)

    if the array is nil or empty, or any element is nil, empty, or whitespace-only



# File 'lib/woods/embedding/provider.rb', line 151

def embed_batch(texts)
  raise ArgumentError, 'embed_batch(texts) requires a non-empty array' if texts.nil? || texts.empty?
  if texts.any? { |t| t.nil? || t.to_s.strip.empty? }
    raise ArgumentError, 'embed_batch(texts) rejects nil/empty entries'
  end

  response = post_request(build_body(texts))
  response['embeddings']
end
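
Because `embed_batch` rejects the whole batch when any entry is blank, a caller whose inputs may contain blanks needs to filter first and keep track of positions. One possible caller-side approach is sketched below. Both `StubProvider` and `embed_skipping_blanks` are hypothetical: the stub follows the same validation contract but returns a one-element vector per text instead of calling Ollama.

```ruby
# Hypothetical stub with the same validation contract as embed_batch.
class StubProvider
  def embed_batch(texts)
    raise ArgumentError, 'empty batch' if texts.nil? || texts.empty?
    raise ArgumentError, 'blank entry' if texts.any? { |t| t.nil? || t.to_s.strip.empty? }

    texts.map { |t| [t.length.to_f] }
  end
end

# Hypothetical helper: embeds only the non-blank texts and returns an
# array aligned with the input, with nil in every skipped slot.
def embed_skipping_blanks(provider, texts)
  kept = texts.each_with_index.reject { |t, _| t.nil? || t.to_s.strip.empty? }
  return Array.new(texts.length) if kept.empty?

  vectors = provider.embed_batch(kept.map(&:first))
  result = Array.new(texts.length)
  kept.each_with_index { |(_, original_index), i| result[original_index] = vectors[i] }
  result
end

aligned = embed_skipping_blanks(StubProvider.new, ['ab', '  ', nil, 'abcd'])
aligned # => [[2.0], nil, nil, [4.0]]
```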

#max_input_tokens ⇒ Integer

Maximum input length Ollama will accept; tracks the configured context window. Always populated: the constructor resolves `num_ctx` to the explicit argument, the model’s registry entry, or FALLBACK_NUM_CTX, so this method never returns nil for an Ollama provider.

Returns:

  • (Integer)


# File 'lib/woods/embedding/provider.rb', line 183

def max_input_tokens
  @num_ctx
end
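
To illustrate why a chunker depends on this value being accurate, here is a sketch of a character budget derived from `max_input_tokens`. The formula and both constants are hypothetical and are not the Woods chunker's actual logic. They only show how an understated token limit can push a budget to zero or below, and how raising an error avoids silently dropping chunks.

```ruby
# Hypothetical values: neither constant comes from the Woods source.
CHARS_PER_TOKEN = 3   # rough characters-per-token estimate (assumed)
RESERVED_TOKENS = 300 # tokens held back for prefixes and metadata (assumed)

# Hypothetical helper: converts a token ceiling into a character budget
# and fails loudly when nothing is left for content.
def max_chars_for(max_input_tokens)
  budget = max_input_tokens - RESERVED_TOKENS
  raise ArgumentError, "token budget #{max_input_tokens} leaves no room for content" unless budget.positive?

  budget * CHARS_PER_TOKEN
end

max_chars_for(2048) # => 5244
max_chars_for(512)  # => 636
```

Under these assumed constants, a 256-token limit would raise instead of yielding a negative character count.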

#model_name ⇒ String

Return the model name.

Returns:

  • (String)

    the Ollama model name



# File 'lib/woods/embedding/provider.rb', line 173

def model_name
  @model
end