Class: EmbeddingsHandler
- Inherits: Tep::Handler
  - Object
  - Tep::Handler
  - EmbeddingsHandler
- Defined in: lib/toy/serve/openai/embeddings_handler.rb
Overview
lib/toy/serve/openai/embeddings_handler.rb – OpenAI-shape /v1/embeddings handler.
MOVED from tep_demo/embeddings_handler.rb (P4 toy serve). Decodes the input token-id array with SpinelKit::Json.get_int_array (toy#44; the former local ApiJson shim is retired — SpinelKit’s decoder replaces it). Constructed with (STATE, MODEL_NAME) so the handler doesn’t need cross-file constant resolution.
Contract:
- Body: {"input": [int, int, …], "model": "…"}. IDs only; tokenize client-side (matches the server's overall "IDs only" policy).
- Lookup: per-token embedding via tnn_embed_lookup_to_doubles (dequantize-aware; works on f32, Q8, and Q4 tables).
- Pooling: mean over the input tokens -> single d_model vector.
- Response shape: OpenAI /v1/embeddings v1.
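The lookup and pooling steps of the contract can be sketched in plain Ruby. This is a minimal illustration, not the handler's code: `TOY_TABLE`, `toy_embed_lookup`, and `mean_pool` are hypothetical names, and the toy lookup merely stands in for the real dequantize-aware call by copying a row of floats and returning a status code.

```ruby
# Hypothetical toy embedding table: 3 tokens, d_model = 3.
TOY_TABLE = [
  [1.0, 2.0, 3.0],
  [3.0, 4.0, 5.0],
  [5.0, 6.0, 7.0]
]

# Hypothetical stand-in for the per-token lookup: copies row `id` into `out`
# and returns 0, or returns 1 when `id` is out of range.
def toy_embed_lookup(table, id, out, d_model)
  return 1 if id < 0 || id >= table.length
  k = 0
  while k < d_model
    out[k] = table[id][k]
    k += 1
  end
  0
end

# Mean-pool the embeddings of `ids` into one d_model-length vector:
# sum across tokens, then multiply by 1/n.
def mean_pool(table, ids, d_model)
  raise ArgumentError, "ids must be non-empty" if ids.empty?
  sum = Array.new(d_model, 0.0)
  tok = Array.new(d_model, 0.0)
  ids.each_with_index do |id, i|
    rc = toy_embed_lookup(table, id, tok, d_model)
    # Fail loudly instead of returning a partial vector.
    raise "lookup failed rc=#{rc} at token index #{i} (id=#{id})" if rc != 0
    d_model.times { |k| sum[k] += tok[k] }
  end
  inv_n = 1.0 / ids.length
  sum.map { |s| s * inv_n }
end

pooled = mean_pool(TOY_TABLE, [0, 2], 3)
# => [3.0, 4.0, 5.0]
```

Whatever the number of input tokens, the result is a single vector of length d_model, which is why the response carries exactly one embedding entry.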
Spinel notes:
- State is passed in at construction so the handler doesn’t need cross-file constant resolution (STATE is per-server).
- The pooling loop uses pure while loops + Array<Float> buffers seeded with pop-to-empty (the [0.0]; arr.pop type-pin pattern).
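The type-pin pattern from the second note can be shown in isolation. This is a sketch under the assumption that the element type is inferred from the seed literal; `zeroed_float_buffer` is a hypothetical helper name, and the snippet runs unchanged under standard Ruby.

```ruby
# Build a zero-filled Float buffer of length n using the type-pin pattern.
def zeroed_float_buffer(n)
  buf = [0.0]   # the Float literal pins the element type
  buf.pop       # drop the seed, leaving an empty array
  i = 0
  while i < n   # plain while loop, no block iterators
    buf.push(0.0)
    i = i + 1
  end
  buf
end

buf = zeroed_float_buffer(4)
# => [0.0, 0.0, 0.0, 0.0]
```

Seeding with `[0.0]` and popping immediately leaves an empty array, so the fill loop alone determines the final length.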
Instance Method Summary
- #handle(req, res) ⇒ Object
- #initialize(state, model_name) ⇒ EmbeddingsHandler (constructor): A new instance of EmbeddingsHandler.
Constructor Details
#initialize(state, model_name) ⇒ EmbeddingsHandler
Returns a new instance of EmbeddingsHandler.
# File 'lib/toy/serve/openai/embeddings_handler.rb', line 25
def initialize(state, model_name)
  @state = state
  @model_name = model_name
end
Instance Method Details
#handle(req, res) ⇒ Object
# File 'lib/toy/serve/openai/embeddings_handler.rb', line 30
def handle(req, res)
  res.headers["Content-Type"] = "application/json"
  body = req.body
  ids = SpinelKit::Json.get_int_array(body, "input")
  if ids.length == 0
    res.set_status(400)
    return "{\"error\":{\"message\":\"input must be a non-empty int array " +
           "(this server speaks IDs only; tokenize client-side)\"," +
           "\"type\":\"invalid_request_error\"}}\n"
  end
  d_model = @state.cfg.d_model
  sess = @state.kv.sess
  … = @state.kv.…
  # Mean-pool buffer: sum across tokens, then divide.
  sum_buf = [0.0]; sum_buf.pop
  j0 = 0
  while j0 < d_model
    sum_buf.push(0.0)
    j0 = j0 + 1
  end
  tok_buf = [0.0]; tok_buf.pop
  j1 = 0
  while j1 < d_model
    tok_buf.push(0.0)
    j1 = j1 + 1
  end
  i = 0
  while i < ids.length
    rc = TinyNN.…(sess, …, ids[i], tok_buf, d_model)
    if rc != 0
      # Per the "never mask, fail loud" rule, surface as an error
      # response rather than silently returning a partial vector.
      res.set_status(500)
      return "{\"error\":{\"message\":\"embed_lookup rc=" + rc.to_s +
             " at token index " + i.to_s + " (id=" + ids[i].to_s +
             ")\",\"type\":\"server_error\"}}\n"
    end
    k = 0
    while k < d_model
      sum_buf[k] = sum_buf[k] + tok_buf[k]
      k = k + 1
    end
    i = i + 1
  end
  inv_n = 1.0 / ids.length.to_f
  out = "{\"object\":\"list\",\"data\":[{\"object\":\"embedding\",\"index\":0,\"embedding\":["
  kk = 0
  while kk < d_model
    if kk > 0; out = out + ","; end
    out = out + (sum_buf[kk] * inv_n).to_s
    kk = kk + 1
  end
  out = out + "]}],\"model\":\"" + @model_name +
        "\",\"usage\":{\"prompt_tokens\":" + ids.length.to_s +
        ",\"total_tokens\":" + ids.length.to_s + "}}\n"
  out
end
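From the client's side, the request and response shapes that #handle works with can be exercised using Ruby's standard json library. The sketch below is illustrative only: `build_embeddings_request` and `parse_embeddings_response` are hypothetical helper names, the model name "toy" is made up, and the sample payloads are hand-written to follow the strings assembled in #handle rather than captured from a running server.

```ruby
require "json"

# Build the request body: token IDs only, already tokenized client-side.
def build_embeddings_request(ids, model)
  JSON.generate({ "input" => ids, "model" => model })
end

# Return the pooled vector from a success response,
# or raise if the body carries an "error" object.
def parse_embeddings_response(text)
  doc = JSON.parse(text)
  if doc.key?("error")
    err = doc["error"]
    raise "#{err['type']}: #{err['message']}"
  end
  doc["data"][0]["embedding"]
end

# Hand-written sample in the shape #handle emits on success.
sample_ok = "{\"object\":\"list\",\"data\":[{\"object\":\"embedding\"," \
            "\"index\":0,\"embedding\":[0.5,-1.25]}],\"model\":\"toy\"," \
            "\"usage\":{\"prompt_tokens\":2,\"total_tokens\":2}}\n"

# Hand-written sample in the shape #handle emits for an empty input.
sample_err = "{\"error\":{\"message\":\"input must be a non-empty int array\"," \
             "\"type\":\"invalid_request_error\"}}\n"

request_body = build_embeddings_request([101, 2057], "toy")
vector = parse_embeddings_response(sample_ok)
# => [0.5, -1.25]
```

Because the handler pools before responding, `data` holds exactly one entry at index 0, and `prompt_tokens` equals `total_tokens`, both being the number of input IDs.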