Module: Pikuri::Tool::Search::Exa
- Defined in: lib/pikuri/tool/search/exa.rb
Overview
Performs an Exa search via the official /search endpoint and returns the hits as a list of Result rows. Split into a thin HTTP fetch (#search) and a pure parser (#parse) so tests can exercise the parser against fixture JSON without hitting the network. The cascade in Pikuri::Tool::Search::Engines.search owns the final Markdown rendering.
Requires an Exa API key. Get one at exa.ai — the service is paid, so the cascade in Pikuri::Tool::Search::Engines.providers only includes Exa when ENV_KEY is set in the environment; users who haven’t registered never spend money on it.
The request sets type: “auto” (Exa picks neural vs keyword per query) and contents: { highlights: true }, so each result carries a short neural-ranked snippet. This is the closest analog to Brave’s description field and populates Result#body consistently across providers.
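The request body described above can be sketched as follows. This is a minimal illustration that mirrors the fields .search sends; build_body is a hypothetical helper name and is not part of the module.

```ruby
require 'json'

# Hypothetical helper (not part of Pikuri): builds the JSON body that the
# overview describes for a POST to the Exa /search endpoint.
def build_body(query, max_results: 10)
  JSON.dump({
    query: query,
    type: 'auto',                   # let Exa pick neural vs keyword per query
    numResults: max_results,
    contents: { highlights: true }  # ask for a short snippet per result
  })
end

puts build_body('ruby faraday retry', max_results: 3)
```

Keeping the payload in one place makes it easy to see that highlights is the only content option requested, which is why Result#body is a short snippet and not full page text.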
Privacy posture
Exa’s Privacy Policy states Query Data is used to improve our products and technology, including by training and fine-tuning models that power our Services, and the Terms of Service §1.2(c) grant Exa a perpetual and irrevocable, sub-licensable, worldwide license over User Input that can be disclosed to third parties as needed. Business customers under a Master Subscription Agreement / DPA get carve-outs; the default pay-as-you-go API key (which is what pikuri uses) does not.
Bottom line: Exa does not sell queries to data brokers, but it does mine them to train competing models, and the license it claims is effectively “do what we want with this, forever”. If a query would be embarrassing or sensitive in a training set, drop Exa out of the cascade by unsetting ENV_KEY — Pikuri::Tool::Search::Engines.providers is recomputed every call.
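The environment gating described above might look roughly like this. It is a sketch under stated assumptions: sketch_providers, the provider symbols, and their ordering are hypothetical, and the real logic in Pikuri::Tool::Search::Engines.providers is not shown on this page and may differ.

```ruby
# Hypothetical stand-in for the provider cascade. Only the "include Exa
# when the key is set" rule comes from the documentation above; the
# provider symbols and their order are assumptions for illustration.
EXA_ENV_KEY = 'EXA_API_KEY'

def sketch_providers(env = ENV)
  list = [:duckduckgo]  # assumed key-free provider
  list.unshift(:exa) unless env[EXA_ENV_KEY].to_s.strip.empty?
  list
end

p sketch_providers({ 'EXA_API_KEY' => 'example-key' })  # Exa is included
p sketch_providers({})                                   # Exa is skipped
```

Because the list is rebuilt from the environment on each call, unsetting the variable takes effect on the next search without restarting anything.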
Constant Summary
- ENDPOINT = 'https://api.exa.ai/search'
  Search endpoint (POST, JSON body).
- DEFAULT_MAX_RESULTS = 10
  Default number of results returned, matching DuckDuckGo::DEFAULT_MAX_RESULTS.
- ENV_KEY = 'EXA_API_KEY'
  Env var holding the API key; sent as x-api-key.
- LIMITER = RateLimiter.new(min_interval: 0.0, cooldown: 300.0)
  Exa is paid and doesn’t aggressively throttle, so no minimum interval is enforced. The 5-minute cooldown still applies on Pikuri::Tool::Search::Engines::Unavailable so the user’s budget isn’t burned on doomed retries while a 429 / 5xx condition persists.
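To make the min_interval / cooldown pairing concrete, here is a minimal sketch of a limiter with those two knobs. SketchLimiter, SketchUnavailable, and the injectable clock are hypothetical; the real RateLimiter class is not shown on this page and may be implemented differently.

```ruby
# Hypothetical sketch, not the real RateLimiter. It spaces calls by
# min_interval and, after the block raises SketchUnavailable, fails fast
# for cooldown seconds instead of running the block again.
class SketchUnavailable < StandardError; end

class SketchLimiter
  def initialize(min_interval:, cooldown:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @min_interval = min_interval
    @cooldown = cooldown
    @clock = clock
    @last_call = nil
    @blocked_until = nil
  end

  def call
    now = @clock.call
    raise SketchUnavailable, 'cooling down' if @blocked_until && now < @blocked_until

    # Enforce spacing between calls; skipped entirely when min_interval is 0.0.
    if @last_call && @min_interval.positive?
      wait = @min_interval - (now - @last_call)
      sleep(wait) if wait.positive?
    end
    @last_call = @clock.call

    begin
      yield
    rescue SketchUnavailable
      # Open the circuit so later calls fail fast until the cooldown expires.
      @blocked_until = @clock.call + @cooldown
      raise
    end
  end
end

limiter = SketchLimiter.new(min_interval: 0.0, cooldown: 300.0)
puts limiter.call { 'request went through' }
```

With min_interval at 0.0 the limiter never sleeps, so the only effect left is the cooldown: one failed request blocks further paid calls for five minutes.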
Class Method Summary
- .parse(json, max_results: DEFAULT_MAX_RESULTS) ⇒ Array<Result>
  Parse an Exa Search JSON response into a list of Result rows, where body is the first non-empty highlights snippet (empty when Exa returned no highlight for that result, e.g. for navigational results).
- .search(query, max_results: DEFAULT_MAX_RESULTS, api_key: ENV.fetch(ENV_KEY, nil)) ⇒ Array<Result>
  Fetch results for query and return them as an Array<Result>.
Class Method Details
.parse(json, max_results: DEFAULT_MAX_RESULTS) ⇒ Array<Result>
Parse an Exa Search JSON response into a list of Result rows, where body is the first non-empty highlights snippet (empty when Exa returned no highlight for that result — e.g. for navigational results).
When the response yields zero result entries, two cases are distinguished: a genuine “no results” payload (response carries a requestId and an empty results array — Exa ran the query but matched nothing) returns an empty array instead of raising, so Pikuri::Tool::Search::Engines.search can render its standard no-results stub. Anything else (unknown shape, structured error) raises with a diagnostic so the failure surfaces.
    # File 'lib/pikuri/tool/search/exa.rb', line 127

    def self.parse(json, max_results: DEFAULT_MAX_RESULTS)
      data = JSON.parse(json)
      results = Array(data['results']).take(max_results).filter_map do |r|
        href = r['url'].to_s
        next nil if href.empty?

        Result.new(
          url: href,
          title: clean(r['title']) || href,
          body: first_highlight(r['highlights'])
        )
      end

      if results.empty?
        return [] if genuine_no_results?(data)

        raise diagnose_empty(data, json)
      end

      results
    end
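Because the parser is pure, it can be exercised against inline fixture JSON. The sketch below is a self-contained stand-in, not the module itself: SketchResult, sketch_parse, and the inlined versions of clean, first_highlight, genuine_no_results? and diagnose_empty are assumptions about how those helpers behave, based only on the descriptions above.

```ruby
require 'json'

# Stand-in for Result, for illustration only.
SketchResult = Struct.new(:url, :title, :body, keyword_init: true)

# Stand-in for .parse with assumed helper behavior inlined.
def sketch_parse(json, max_results: 10)
  data = JSON.parse(json)
  results = Array(data['results']).take(max_results).filter_map do |r|
    href = r['url'].to_s
    next nil if href.empty?

    title = r['title'].to_s.strip
    # Assumed: body is the first highlight that is not blank, else ''.
    highlight = Array(r['highlights']).map { |h| h.to_s.strip }.find { |h| !h.empty? }
    SketchResult.new(url: href, title: title.empty? ? href : title, body: highlight.to_s)
  end
  return results unless results.empty?

  # Assumed check: Exa ran the query (requestId present) and matched nothing.
  return [] if data['requestId'] && data['results'] == []

  raise "unexpected Exa response: #{json[0, 200]}"
end

hit = '{"requestId":"abc","results":[' \
      '{"url":"https://example.com","title":" Example ","highlights":["","A snippet."]},' \
      '{"url":"","title":"skipped"}]}'
empty = '{"requestId":"abc","results":[]}'

p sketch_parse(hit).map(&:to_h)
p sketch_parse(empty)
```

A payload such as '{"error":"invalid api key"}' has neither a requestId nor a results array, so this sketch raises instead of returning an empty list, matching the two-case distinction described above.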
.search(query, max_results: DEFAULT_MAX_RESULTS, api_key: ENV.fetch(ENV_KEY, nil)) ⇒ Array<Result>
Fetch results for query and return them as an Array<Result>. Calls are circuit-broken for 5 minutes on rate-limit / unavailable responses; see LIMITER. The caller (typically Pikuri::Tool::Search::Engines.search) is expected to have already normalized the query and to wrap this in a result cache.
    # File 'lib/pikuri/tool/search/exa.rb', line 81

    def self.search(query, max_results: DEFAULT_MAX_RESULTS, api_key: ENV.fetch(ENV_KEY, nil))
      raise ArgumentError, "Exa Search API key not set (#{ENV_KEY})" if api_key.to_s.strip.empty?

      LIMITER.call do
        response = Faraday.post(ENDPOINT) do |req|
          req.headers['x-api-key'] = api_key
          req.headers['Content-Type'] = 'application/json'
          req.headers['Accept'] = 'application/json'
          req.body = JSON.dump(
            query: query,
            type: 'auto',
            numResults: max_results,
            contents: { highlights: true }
          )
        end

        unless response.success?
          if response.status == 429 || response.status >= 500
            raise Engines::Unavailable, "HTTP #{response.status}"
          end

          raise "Exa Search request failed: #{response.status} #{response.body}"
        end

        parse(response.body, max_results: max_results)
      end
    end
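The status handling in .search splits failures into two groups: those that trip the cooldown and those that fail immediately. The sketch below restates that branching as a small function; classify_status is a hypothetical name used only for illustration.

```ruby
# Hypothetical helper mirroring the branching in .search. A 2xx status is
# treated as success, 429 and 5xx map to the "unavailable" path that
# triggers the cooldown, and any other status is a plain failure.
def classify_status(status)
  return :ok if (200..299).cover?(status)
  return :unavailable if status == 429 || status >= 500

  :fatal
end

[200, 401, 429, 503].each do |status|
  puts "#{status} => #{classify_status(status)}"
end
```

Treating 4xx responses other than 429 as fatal means a bad API key surfaces as an ordinary error and does not put Exa into a five-minute cooldown.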