Module: Pikuri::Tool::Search::Engines

Defined in:
lib/pikuri/tool/search/engines.rb

Overview

Search-orchestration entry point: the cascade across configured providers, the result cache, and the Unavailable protocol marker the cascade uses to fall back. The LLM-facing tool itself (WEB_SEARCH) lives in lib/tool/web_search.rb and calls into Engines.search below. Each Pikuri::Tool::Search provider module (DuckDuckGo, Brave, Exa) raises Unavailable when it wants the cascade to try the next one.
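The fallback protocol can be sketched with stand-in providers (Flaky, Stable, and the cascade helper below are hypothetical illustrations, not the real Pikuri modules):

```ruby
# Hypothetical stand-ins illustrating the Unavailable fallback protocol.
class Unavailable < StandardError; end

module Flaky
  def self.search(_query, max_results:)
    raise Unavailable, 'rate limited'
  end
end

module Stable
  def self.search(query, max_results:)
    [{ title: "Result for #{query}", url: 'https://example.com' }].first(max_results)
  end
end

# Try each provider in turn; Unavailable means "try the next one".
def cascade(query, providers, max_results:)
  failures = []
  providers.each do |provider|
    return provider.search(query, max_results: max_results)
  rescue Unavailable => e
    failures << "#{provider.name} (#{e.message})"
  end
  raise Unavailable, "all search providers temporarily unavailable: #{failures.join('; ')}"
end

results = cascade('ruby agents', [Flaky, Stable], max_results: 3)
# Flaky raises Unavailable, so Stable answers.
```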

Defined Under Namespace

Classes: Unavailable

Constant Summary

LOGGER =

Subsystem logger; set its level with PIKURI_LOG_ENGINES (e.g. PIKURI_LOG_ENGINES=debug) or the global PIKURI_LOG.

Returns:

  • (Logger)
Pikuri.logger_for('Engines')
CACHE =

On-disk cache used by search to memoize answered queries. Exposed through the cache method so specs can swap it for an isolated cache or UrlCache::NULL without touching the shared instance.

Returns:

  • (UrlCache)
UrlCache.new(ttl: UrlCache::DEFAULT_TTL, dir: "#{UrlCache::ROOT_DIR}/web_search")
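The fetch-style memoization this doc assumes of UrlCache can be sketched with an in-memory stand-in (MemoryCache below is hypothetical; the real cache persists to disk with a TTL):

```ruby
# Hypothetical in-memory stand-in for fetch-style memoization:
# a hit returns the stored value without running the block.
class MemoryCache
  def initialize
    @store = {}
  end

  def fetch(key)
    @store.fetch(key) { @store[key] = yield }
  end
end

cache = MemoryCache.new
calls = 0
first  = cache.fetch('query') { calls += 1; 'rendered markdown' }
second = cache.fetch('query') { calls += 1; 'never evaluated' }
# first == second, and the block ran exactly once.
```

Note that raising inside the block stores nothing, which is how search avoids persisting the all-providers-unavailable message.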

Class Method Summary

Class Method Details

.cache ⇒ Object



# File 'lib/pikuri/tool/search/engines.rb', line 48

def self.cache
  CACHE
end

.providers ⇒ Array<Module>

All providers that are currently configured. DuckDuckGo is always available (no API key needed); Brave and Pikuri::Tool::Search::Exa each join the list when their API token is present in the environment. Recomputed on every call so a process picks up a newly-set token without a restart.

Returns:

  • (Array<Module>)

Tool::Search::* provider modules, each exposing .search(query, max_results:) ⇒ Array<Result>



# File 'lib/pikuri/tool/search/engines.rb', line 35

def self.providers
  list = [DuckDuckGo]
  list << Brave unless ENV[Brave::ENV_KEY].to_s.strip.empty?
  list << Exa unless ENV[Exa::ENV_KEY].to_s.strip.empty?
  list
end
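The recompute-on-every-call behaviour can be demonstrated with a hypothetical env key (EXAMPLE_SEARCH_TOKEN and example_providers below are illustrative, not the real Brave::ENV_KEY or Exa::ENV_KEY):

```ruby
# Hypothetical env key; the real providers read their own ENV_KEY constants.
EXAMPLE_KEY = 'EXAMPLE_SEARCH_TOKEN'

def example_providers
  list = [:duckduckgo]                                    # always available
  list << :brave unless ENV[EXAMPLE_KEY].to_s.strip.empty?
  list
end

ENV.delete(EXAMPLE_KEY)
before = example_providers        # token absent
ENV[EXAMPLE_KEY] = 'token'
after  = example_providers        # picked up without a restart
```

Because the list is rebuilt per call rather than cached in a constant, setting the token in a running process is enough to enable the provider.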

.search(query, max_results:) ⇒ String

Run query through the configured providers in random order, falling back to the next one each time a provider raises Unavailable. The shuffle spreads load so a single provider isn’t always hit first (and exhausted first); revisit if it stops being the right default.

The query is whitespace-trimmed and runs of whitespace are collapsed to a single space before the cascade runs. The winning provider’s Array<Result> is rendered here into smolagents-style Markdown (a “## Search Results” header, then [title](url) entries each followed by the result body on the next line, joined by blank lines; an empty array becomes “No results found.”), and the rendered Markdown is cached on disk via cache, keyed by the cleaned query. A cache hit short-circuits the cascade entirely (and benefits whichever provider would have answered next time too: once a query is cached, the cooldown state of the original answering provider no longer matters). max_results is not part of the cache key, so callers passing a non-default value may get a result rendered with the previously-cached size.
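The normalization and rendering described above can be sketched as follows (Result, normalize, and render are illustrative stand-ins; the real rendering lives inside Engines):

```ruby
Result = Struct.new(:title, :url, :body)

# Cache key: trim the ends, collapse whitespace runs to single spaces.
def normalize(query)
  query.to_s.strip.gsub(/\s+/, ' ')
end

# smolagents-style Markdown: a header, then "[title](url)\nbody" entries
# joined by blank lines; an empty list becomes a fixed message.
def render(results)
  return 'No results found.' if results.empty?
  entries = results.map { |r| "[#{r.title}](#{r.url})\n#{r.body}" }
  "## Search Results\n\n#{entries.join("\n\n")}"
end

normalize("  ruby\t  agents \n")   # => "ruby agents"
render([])                         # => "No results found."
render([Result.new('Ruby', 'https://ruby-lang.org', 'A language.')])
```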

If every provider reports temporary unavailability, returns an “Error: …” string instead of raising — same convention as Calculator.calculate, so the agent loop can feed the failure back to the model as the next observation. Any non-Unavailable exception (network error, parser failure, malformed response, bad API key) bubbles up to the caller.
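The calculator-style convention can be sketched as follows (the Unavailable stand-in and guarded_search helper are illustrative, not part of the real API):

```ruby
class Unavailable < StandardError; end

# Temporary unavailability becomes an observation string, not an exception;
# anything else propagates to the caller unchanged.
def guarded_search
  yield
rescue Unavailable => e
  "Error: #{e.message}"
end

guarded_search { raise Unavailable, 'all search providers temporarily unavailable' }
# => "Error: all search providers temporarily unavailable"
```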

Parameters:

  • query (String)

    search query

  • max_results (Integer)

    maximum number of result entries

Returns:

  • (String)

    Markdown-formatted result list, or “Error: …” when all providers are exhausted

Raises:

  • (ArgumentError)

    if the query is empty after normalization



# File 'lib/pikuri/tool/search/engines.rb', line 82

def self.search(query, max_results:)
  cleaned = query.to_s.strip.gsub(/\s+/, ' ')
  raise ArgumentError, 'query is empty' if cleaned.empty?

  current_providers = providers
  log_providers(current_providers)

  hit = true
  result = cache.fetch(cleaned) do
    hit = false
    failures = []
    results = nil
    chosen = nil
    current_providers.shuffle.each do |provider|
      results = provider.search(cleaned, max_results: max_results)
      chosen = provider
      break
    rescue Unavailable => e
      failures << "#{provider.name.split('::').last} (#{e.message})"
    end
    # Raise so {UrlCache#fetch} does NOT persist the all-unavailable
    # message — otherwise that string would block every future search
    # for this query until the TTL expires. The outer +rescue+ turns
    # the raise back into the calculator-style "Error: …" string.
    chosen or raise Unavailable, "all search providers temporarily unavailable: #{failures.join('; ')}"

    LOGGER.info do
      "engine=#{chosen.name.split('::').last} query=#{cleaned.inspect} results=#{results.size}"
    end
    render(results)
  end
  LOGGER.info { "cache=hit query=#{cleaned.inspect} bytes=#{result.bytesize}" } if hit
  result
rescue Unavailable => e
  "Error: #{e.message}"
end