Class: Woods::Embedding::Provider::OpenAI
- Inherits: Object
  - Object
  - Woods::Embedding::Provider::OpenAI
- Includes: Interface
- Defined in: lib/woods/embedding/openai.rb
Overview
OpenAI adapter for cloud embeddings via the OpenAI HTTP API.
Uses the `/v1/embeddings` endpoint to generate embeddings. Requires a valid OpenAI API key.
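The request helper used by this class is private and not shown on this page. As a rough sketch only, assuming it sends a JSON body authenticated with a Bearer token (the usual scheme for the OpenAI HTTP API), a request to this endpoint could be built with Ruby's standard Net::HTTP. The names `EMBEDDINGS_URI` and `build_embedding_request` are hypothetical, and the request is only constructed here, never sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

EMBEDDINGS_URI = URI('https://api.openai.com/v1/embeddings')

# Hypothetical helper: builds (but does not send) a POST request
# for the embeddings endpoint.
def build_embedding_request(api_key:, model:, input:)
  request = Net::HTTP::Post.new(EMBEDDINGS_URI)
  request['Authorization'] = "Bearer #{api_key}"
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate({ model: model, input: input })
  request
end

request = build_embedding_request(
  api_key: ENV.fetch('OPENAI_API_KEY', 'sk-placeholder'), # placeholder key, not a real one
  model: 'text-embedding-3-small',
  input: 'hello world'
)
puts request.path
puts request.body
```

Reading the key from an environment variable, as above, is one common way to keep it out of source control.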
Constant Summary
- ENDPOINT =
URI('https://api.openai.com/v1/embeddings')
- DEFAULT_MODEL =
'text-embedding-3-small'
- DIMENSIONS =
{ 'text-embedding-3-small' => 1536, 'text-embedding-3-large' => 3072 }.freeze
- MAX_INPUT_TOKENS =
The text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002 models all share an 8191-token input cap. The chunker treats this as a hard ceiling; the actual chunk size lands well below it once chars-per-token estimation and the prefix allowance are factored in (see Builder#build_chunker).
8191
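Builder#build_chunker is not shown on this page, so the following is only an illustrative sketch of how a character budget could be derived from the token cap. The method name `chunk_char_budget` and all keyword defaults (characters per token, prefix allowance, safety factor) are assumptions chosen for the example, not values taken from the library.

```ruby
TOKEN_CAP = 8191

# Hypothetical estimate: every keyword default below is illustrative.
def chunk_char_budget(max_tokens, chars_per_token: 3.0, prefix_tokens: 200, safety: 0.8)
  # Reserve room for the prefix, then keep a safety margin below the cap.
  usable_tokens = ((max_tokens - prefix_tokens) * safety).floor
  # Convert the remaining token budget into an approximate character count.
  (usable_tokens * chars_per_token).floor
end

puts chunk_char_budget(TOKEN_CAP)
```

With these assumed values the budget corresponds to well under 8191 tokens, which matches the description of the cap as a ceiling and not a target.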
Instance Method Summary
-
#dimensions ⇒ Integer
Return the dimensionality of vectors produced by this model.
-
#embed(text) ⇒ Array<Float>
Embed a single text string.
-
#embed_batch(texts) ⇒ Array<Array<Float>>
Embed multiple texts in a single request.
-
#initialize(api_key:, model: DEFAULT_MODEL) ⇒ OpenAI
constructor
A new instance of OpenAI.
-
#max_input_tokens ⇒ Integer
Maximum input length OpenAI will accept for a single embedding text.
-
#model_name ⇒ String
Return the model name.
Constructor Details
#initialize(api_key:, model: DEFAULT_MODEL) ⇒ OpenAI
Returns a new instance of OpenAI.
    # File 'lib/woods/embedding/openai.rb', line 36
    def initialize(api_key:, model: DEFAULT_MODEL)
      @api_key = api_key
      @model = model
    end
Instance Method Details
#dimensions ⇒ Integer
Return the dimensionality of vectors produced by this model.
Uses the known dimensions for standard models, falling back to a test embedding for unknown models.
    # File 'lib/woods/embedding/openai.rb', line 80
    def dimensions
      DIMENSIONS[@model] || embed('test').length
    end
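The lookup-then-probe behaviour described for #dimensions can be tried in isolation. The sketch below is a stand-alone approximation, not the class itself: `KNOWN_DIMENSIONS`, `resolve_dimensions`, and the `probe` callable are hypothetical names, and the probe stands in for a real embedding request.

```ruby
KNOWN_DIMENSIONS = {
  'text-embedding-3-small' => 1536,
  'text-embedding-3-large' => 3072
}.freeze

# Hypothetical stand-alone version of the lookup: `probe` is only
# invoked when the model is missing from the table.
def resolve_dimensions(model, probe:)
  KNOWN_DIMENSIONS[model] || probe.call('test').length
end

# Fake probe that pretends an unknown model returns 768 floats.
fake_probe = ->(_text) { Array.new(768, 0.0) }

puts resolve_dimensions('text-embedding-3-small', probe: fake_probe)
puts resolve_dimensions('some-custom-model', probe: fake_probe)
```

For an unknown model the fallback costs one embedding request per call, so a caller that asks for the dimensionality often may want to cache the result.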
#embed(text) ⇒ Array<Float>
Embed a single text string.
    # File 'lib/woods/embedding/openai.rb', line 47
    def embed(text)
      raise ArgumentError, 'embed(text) requires a non-empty string' if text.nil? || text.to_s.strip.empty?

      response = post_request({ model: @model, input: text })
      response['data'].first['embedding']
    end
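To make the two steps of #embed concrete (rejecting blank input, then reading the first embedding from the `data` array), here is a small offline sketch. The sample JSON is hand-written in the shape of an embeddings response with made-up values, and `blank_input?` and `first_embedding` are hypothetical helper names.

```ruby
require 'json'

# Hand-written sample in the shape of an embeddings response; values are made up.
SAMPLE_RESPONSE = <<~JSON
  {
    "object": "list",
    "data": [
      { "object": "embedding", "index": 0, "embedding": [0.01, -0.02, 0.03] }
    ],
    "model": "text-embedding-3-small",
    "usage": { "prompt_tokens": 2, "total_tokens": 2 }
  }
JSON

# Same guard condition as the ArgumentError check in #embed.
def blank_input?(text)
  text.nil? || text.to_s.strip.empty?
end

# Pulls the vector out of the first entry of the "data" array.
def first_embedding(raw_json)
  JSON.parse(raw_json)['data'].first['embedding']
end

p blank_input?("  \n")
p first_embedding(SAMPLE_RESPONSE)
```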
#embed_batch(texts) ⇒ Array<Array<Float>>
Embed multiple texts in a single request.
Sorts results by the index field to guarantee ordering matches input.
    # File 'lib/woods/embedding/openai.rb', line 62
    def embed_batch(texts) # rubocop:disable Metrics/CyclomaticComplexity
      raise ArgumentError, 'embed_batch(texts) requires a non-empty array' if texts.nil? || texts.empty?
      if texts.any? { |t| t.nil? || t.to_s.strip.empty? }
        raise ArgumentError, 'embed_batch(texts) rejects nil/empty entries (OpenAI returns 400)'
      end

      response = post_request({ model: @model, input: texts })
      response['data']
        .sort_by { |item| item['index'] }
        .map { |item| item['embedding'] }
    end
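The ordering guarantee comes from sorting on each item's `index` before extracting the vectors. The sketch below applies the same sort-then-map step to made-up response items that arrive out of order; `embeddings_in_input_order` is a hypothetical name used only for this example.

```ruby
# Restores input order by sorting on "index", then keeps only the vectors.
def embeddings_in_input_order(items)
  items
    .sort_by { |item| item['index'] }
    .map { |item| item['embedding'] }
end

# Made-up response items, deliberately shuffled.
shuffled = [
  { 'index' => 2, 'embedding' => [0.3] },
  { 'index' => 0, 'embedding' => [0.1] },
  { 'index' => 1, 'embedding' => [0.2] }
]

p embeddings_in_input_order(shuffled) # => [[0.1], [0.2], [0.3]]
```

Because position `i` in the result then corresponds to `texts[i]`, callers can pair inputs and vectors with `texts.zip(vectors)` without tracking indices themselves.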
#max_input_tokens ⇒ Integer
Maximum input length OpenAI will accept for a single embedding text. All current text-embedding-* models cap at 8191 tokens (MAX_INPUT_TOKENS).
    # File 'lib/woods/embedding/openai.rb', line 95
    def max_input_tokens
      MAX_INPUT_TOKENS
    end
#model_name ⇒ String
Return the model name.
    # File 'lib/woods/embedding/openai.rb', line 87
    def model_name
      @model
    end