Module: Pikuri::Tool::Search::Exa

Defined in:
lib/pikuri/tool/search/exa.rb

Overview

Performs an Exa search via the official /search endpoint and returns the hits as a list of Result rows. Split into a thin HTTP fetch (#search) and a pure parser (#parse) so tests can exercise the parser against fixture JSON without hitting the network. The cascade in Pikuri::Tool::Search::Engines.search owns the final Markdown rendering.

Requires an Exa API key. Get one at exa.ai — the service is paid, so the cascade in Pikuri::Tool::Search::Engines.providers only includes Exa when ENV_KEY is set in the environment; users who haven’t registered never spend money on it.

The request sets type: “auto” (Exa picks neural vs keyword per query) and contents: { highlights: true }, so each result carries a short neural-ranked snippet. That snippet is the closest analog to Brave’s description field and populates Result#body consistently across providers.

Privacy posture

Exa’s Privacy Policy states Query Data is used to improve our products and technology, including by training and fine-tuning models that power our Services, and the Terms of Service §1.2(c) grant Exa a perpetual and irrevocable, sub-licensable, worldwide license over User Input that can be disclosed to third parties as needed. Business customers under a Master Subscription Agreement / DPA get carve-outs; the default pay-as-you-go API key (which is what pikuri uses) does not.

Bottom line: Exa does not sell queries to data brokers, but it does mine them to train competing models, and the license it claims is effectively “do what we want with this, forever”. If a query would be embarrassing or sensitive in a training set, drop Exa out of the cascade by unsetting ENV_KEY; Pikuri::Tool::Search::Engines.providers is recomputed on every call.
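The opt-out works because the provider list is rebuilt from the environment each time it is requested. A minimal sketch of that kind of gating follows. The `providers` helper, its `env` parameter, and the provider symbols are hypothetical stand-ins; the real Pikuri::Tool::Search::Engines.providers is not shown on this page and may differ in shape and ordering.

```ruby
# Hypothetical stand-in for the gating logic; not pikuri's actual code.
ENV_KEY = 'EXA_API_KEY'

# Rebuilds the list on every call, so a change to the environment takes
# effect on the next lookup. Provider symbols and their order are
# illustrative placeholders.
def providers(env = ENV)
  list = [:duckduckgo]
  list << :exa unless env.fetch(ENV_KEY, '').to_s.strip.empty?
  list
end

providers({})                          # => [:duckduckgo]
providers({ 'EXA_API_KEY' => 'abc' })  # => [:duckduckgo, :exa]
```

Passing the environment as an argument is only a convenience for the sketch: it lets the gating be exercised with a plain Hash instead of mutating the process environment.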

Constant Summary

ENDPOINT =

Returns Search endpoint (POST, JSON body).

Returns:

  • (String)

    Search endpoint (POST, JSON body)

'https://api.exa.ai/search'
DEFAULT_MAX_RESULTS =

Returns the default number of results, matching DuckDuckGo::DEFAULT_MAX_RESULTS.

Returns:

  • (Integer)

    default number of results, matching DuckDuckGo::DEFAULT_MAX_RESULTS

10
ENV_KEY =

Returns env var holding the API key; sent as x-api-key.

Returns:

  • (String)

    env var holding the API key; sent as x-api-key

'EXA_API_KEY'
LIMITER =

Returns the rate limiter for Exa requests: no minimum interval, with a 5-minute cooldown after an Unavailable failure.

Returns:

  • (RateLimiter)

    Exa is paid and doesn’t aggressively throttle, so no minimum interval is enforced. The 5-minute cooldown still applies on Pikuri::Tool::Search::Engines::Unavailable so the user’s budget isn’t burned on doomed retries while a 429 / 5xx condition persists.

RateLimiter.new(min_interval: 0.0, cooldown: 300.0)
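
To make the two settings concrete, here is a minimal sketch of a limiter with a minimum interval and a cooldown. It is an assumption-laden illustration: `SketchLimiter`, its injectable `clock`, and the top-level `Unavailable` class are hypothetical, and pikuri's real RateLimiter and Pikuri::Tool::Search::Engines::Unavailable are not shown on this page and may behave differently.

```ruby
# Hypothetical sketch; pikuri's real RateLimiter is not shown and may differ.
class Unavailable < StandardError; end

class SketchLimiter
  def initialize(min_interval:, cooldown:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @min_interval = min_interval
    @cooldown = cooldown
    @clock = clock
    @last_call = nil
    @cooldown_until = nil
  end

  def call
    now = @clock.call
    # Fail fast while cooling down: the block (the HTTP request) never runs.
    raise Unavailable, 'in cooldown' if @cooldown_until && now < @cooldown_until

    # With min_interval 0.0 (the Exa setting) this never sleeps.
    if @last_call
      wait = @min_interval - (now - @last_call)
      sleep(wait) if wait.positive?
    end
    @last_call = @clock.call

    begin
      yield
    rescue Unavailable
      # Only a failure raised by the block itself starts the cooldown.
      @cooldown_until = @clock.call + @cooldown
      raise
    end
  end
end

limiter = SketchLimiter.new(min_interval: 0.0, cooldown: 300.0)
limiter.call { :ok } # => :ok
```

The injectable clock exists only so the cooldown can be exercised in tests without waiting five minutes.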

Class Method Details

.parse(json, max_results: DEFAULT_MAX_RESULTS) ⇒ Array<Result>

Parse an Exa Search JSON response into a list of Result rows, where body is the first non-empty highlights snippet (empty when Exa returned no highlight for that result — e.g. for navigational results).

When the response yields zero result entries, two cases are distinguished: a genuine “no results” payload (response carries a requestId and an empty results array — Exa ran the query but matched nothing) returns an empty array instead of raising, so Pikuri::Tool::Search::Engines.search can render its standard no-results stub. Anything else (unknown shape, structured error) raises with a diagnostic so the failure surfaces.

Parameters:

  • json (String)

    response body from ENDPOINT

  • max_results (Integer) (defaults to: DEFAULT_MAX_RESULTS)

    maximum number of result entries

Returns:

  • (Array<Result>)

    hits, possibly empty on a recognized empty-results payload

Raises:

  • (RuntimeError)

    when the response yields no result entries and is not recognized as a genuine empty-results payload



# File 'lib/pikuri/tool/search/exa.rb', line 127

def self.parse(json, max_results: DEFAULT_MAX_RESULTS)
  data = JSON.parse(json)
  results = Array(data['results']).take(max_results).filter_map do |r|
    href = r['url'].to_s
    next nil if href.empty?

    Result.new(
      url: href,
      title: clean(r['title']) || href,
      body: first_highlight(r['highlights'])
    )
  end

  if results.empty?
    return [] if genuine_no_results?(data)

    raise diagnose_empty(data, json)
  end

  results
end
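
The listing above calls helpers (clean, first_highlight, genuine_no_results?, diagnose_empty) whose bodies are not shown on this page. The sketch below fills them in with hypothetical stand-ins so the three documented outcomes (hits, genuine empty result, unrecognized payload) can be tried against fixture JSON without the network. `parse_sketch`, the `Result` Struct, and every helper body here are assumptions for illustration, not pikuri's implementation.

```ruby
require 'json'

# Stand-in for pikuri's Result; only the three fields used in the listing.
Result = Struct.new(:url, :title, :body, keyword_init: true)

# Hypothetical helper bodies; the real ones may differ.
def clean(text)
  s = text.to_s.strip
  s.empty? ? nil : s
end

def first_highlight(highlights)
  Array(highlights).map { |h| h.to_s.strip }.find { |h| !h.empty? }.to_s
end

def genuine_no_results?(data)
  data.is_a?(Hash) && data.key?('requestId') && data['results'] == []
end

def parse_sketch(json, max_results: 10)
  data = JSON.parse(json)
  results = Array(data['results']).take(max_results).filter_map do |r|
    href = r['url'].to_s
    next if href.empty?

    Result.new(url: href, title: clean(r['title']) || href,
               body: first_highlight(r['highlights']))
  end
  return results unless results.empty?
  return [] if genuine_no_results?(data)

  # Simplified placeholder for diagnose_empty.
  raise "unrecognized Exa response: #{json[0, 200]}"
end

hit = '{"requestId":"r1","results":[{"url":"https://example.com",' \
      '"title":" Example ","highlights":["","A snippet."]}]}'
parse_sketch(hit).first.body                     # => "A snippet."
parse_sketch('{"requestId":"r2","results":[]}')  # => []
```

A payload such as `{"error":"invalid request"}` has no results and no requestId, so in this sketch it raises RuntimeError rather than returning an empty array.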

.search(query, max_results: DEFAULT_MAX_RESULTS, api_key: ENV.fetch(ENV_KEY, nil)) ⇒ Array<Result>

Fetch results for query and return them as an Array<Result>. Calls are circuit-broken for 5 minutes on rate-limit / unavailable responses; see LIMITER. The caller (typically Pikuri::Tool::Search::Engines.search) is expected to have already normalized the query and to wrap this in a result cache.

Parameters:

  • query (String)

    search query (already normalized)

  • max_results (Integer) (defaults to: DEFAULT_MAX_RESULTS)

    maximum number of result entries; passed through as Exa’s numResults

  • api_key (String) (defaults to: ENV.fetch(ENV_KEY, nil))

    Exa API key; defaults to the ENV_KEY environment variable

Returns:

  • (Array<Result>)

    hits, possibly empty when Exa ran the query and matched nothing

Raises:

  • (ArgumentError)

    if no API key is available

  • (Engines::Unavailable)

    when Exa returns HTTP 429 (rate limit / quota exhausted) or 5xx — “try again later” responses the cascade in Pikuri::Tool::Search::Engines.search can fall back from. Also raised immediately if LIMITER is in cooldown. Other non-2xx (e.g. 401/403 from a bad API key) bubble up as RuntimeError so config problems stay visible.

  • (RuntimeError)

    for non-rate-limit HTTP failures or when the response shape contains no results and isn’t a recognized empty-results payload.



# File 'lib/pikuri/tool/search/exa.rb', line 81

def self.search(query, max_results: DEFAULT_MAX_RESULTS, api_key: ENV.fetch(ENV_KEY, nil))
  raise ArgumentError, "Exa Search API key not set (#{ENV_KEY})" if api_key.to_s.strip.empty?

  LIMITER.call do
    response = Faraday.post(ENDPOINT) do |req|
      req.headers['x-api-key'] = api_key
      req.headers['Content-Type'] = 'application/json'
      req.headers['Accept'] = 'application/json'
      req.body = JSON.dump(
        query: query,
        type: 'auto',
        numResults: max_results,
        contents: { highlights: true }
      )
    end
    unless response.success?
      if response.status == 429 || response.status >= 500
        raise Engines::Unavailable, "HTTP #{response.status}"
      end

      raise "Exa Search request failed: #{response.status} #{response.body}"
    end

    parse(response.body, max_results: max_results)
  end
end
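
The split between Engines::Unavailable and RuntimeError matters to the caller: the first means "try the next provider", the second means "stop and show the problem". A minimal sketch of a caller that honors that split follows. `first_available`, the engine lambdas, and the top-level `Unavailable` class are hypothetical; the real cascade in Pikuri::Tool::Search::Engines.search is not shown on this page and may work differently.

```ruby
# Hypothetical stand-in for Pikuri::Tool::Search::Engines::Unavailable.
class Unavailable < StandardError; end

# Tries each [name, callable] pair in order. An engine that raises
# Unavailable is skipped; any other error (e.g. a RuntimeError from a bad
# API key) propagates so configuration problems stay visible.
def first_available(engines, query)
  engines.each do |name, engine|
    begin
      return [name, engine.call(query)]
    rescue Unavailable
      next
    end
  end
  raise Unavailable, 'all engines unavailable'
end

engines = [
  [:exa,        ->(_q) { raise Unavailable, 'HTTP 429' }],
  [:duckduckgo, ->(q)  { ["hit for #{q}"] }]
]
first_available(engines, 'ruby') # => [:duckduckgo, ["hit for ruby"]]
```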