Class: Archaeo::ArchiveSearch

Inherits:
Object
Defined in:
lib/archaeo/archive_search.rb

Overview

Full-text search across archived snapshots.

Fetches snapshots from CDX, downloads their content, and searches for the given query string. Returns matches with surrounding context for each hit.
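The matching step itself is not shown on this page. As a rough sketch of what "searches for the given query string" could look like for one downloaded document, the hypothetical helper below collects the character offset of every hit. The name `match_offsets` is an assumption made for illustration; its keyword arguments mirror `case_sensitive` and `max_results` from `#search`, but the library's real implementation may differ.

```ruby
# Hypothetical helper (not part of Archaeo): character offsets of every
# occurrence of `query` in `text`.
def match_offsets(text, query, case_sensitive: false, max_results: nil)
  # Escape the query so it is matched literally, not as a regex pattern.
  flags = case_sensitive ? 0 : Regexp::IGNORECASE
  pattern = Regexp.new(Regexp.escape(query), flags)

  offsets = []
  text.scan(pattern) do
    offsets << Regexp.last_match.begin(0)
    # Stop early once the optional cap is reached.
    break if max_results && offsets.length >= max_results
  end
  offsets
end

p match_offsets("Foo foo FOO", "foo")                       # → [0, 4, 8]
p match_offsets("Foo foo FOO", "foo", case_sensitive: true) # → [4]
p match_offsets("Foo foo FOO", "foo", max_results: 2)       # → [0, 4]
```

Escaping the query keeps characters such as `.` or `?` from being interpreted as regular-expression syntax, which matters when users search for URLs or punctuation.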

Constant Summary

CONTEXT_RADIUS = 80
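
How CONTEXT_RADIUS is applied is not shown on this page. One plausible reading, assuming the radius counts characters on each side of a hit, is sketched below. `context_for` is a hypothetical helper written for illustration and is not part of Archaeo.

```ruby
# Assumed meaning: number of characters kept on each side of a match.
CONTEXT_RADIUS = 80

# Hypothetical helper: returns the match plus up to `radius` characters of
# surrounding text, clamped to the bounds of the string.
def context_for(text, index, length, radius: CONTEXT_RADIUS)
  start = [index - radius, 0].max
  stop = [index + length + radius, text.length].min
  text[start...stop]
end

body = ("a" * 100) + "needle" + ("b" * 100)
hit = body.index("needle")
snippet = context_for(body, hit, "needle".length)
puts snippet.length # → 166 (80 + 6 + 80)
```

Clamping both ends means a hit near the start or end of a document yields a shorter snippet instead of raising or wrapping around.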

Instance Method Summary

Constructor Details

#initialize(cdx_api: CdxApi.new, fetcher: Fetcher.new) ⇒ ArchiveSearch

Returns a new instance of ArchiveSearch.



# File 'lib/archaeo/archive_search.rb', line 31

def initialize(cdx_api: CdxApi.new, fetcher: Fetcher.new)
  @cdx = cdx_api
  @fetcher = fetcher
end
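
Because both collaborators arrive as keyword arguments, they can be swapped for test doubles. The sketch below illustrates the idea with a stand-in class that mirrors the constructor, so it runs without the gem. `ArchiveSearchStandIn`, `FakeCdx`, and `FakeFetcher` are hypothetical names, and `FakeFetcher#fetch` is an assumed method, since the Fetcher interface is not shown on this page.

```ruby
# Stand-in that mirrors the constructor shown above (not the real class).
class ArchiveSearchStandIn
  attr_reader :cdx, :fetcher

  def initialize(cdx_api:, fetcher:)
    @cdx = cdx_api
    @fetcher = fetcher
  end
end

# Hypothetical double for CdxApi. Only `snapshots(url, **opts)` is taken
# from the source listing on this page.
class FakeCdx
  def snapshots(_url, **_opts)
    []
  end
end

# Hypothetical double for Fetcher. The method name `fetch` is an assumption.
class FakeFetcher
  def fetch(_snapshot)
    ""
  end
end

search = ArchiveSearchStandIn.new(cdx_api: FakeCdx.new, fetcher: FakeFetcher.new)
puts search.cdx.snapshots("example.com").empty? # → true
```

Injecting doubles this way lets tests exercise the search logic without touching the network.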

Instance Method Details

#search(url, query:, from: nil, to: nil, max_results: nil, case_sensitive: false) ⇒ Object

Searches the archived text snapshots of url for query. Raises ArgumentError if query is nil or empty. Only snapshots that report success and whose MIME type contains "text" are searched.



# File 'lib/archaeo/archive_search.rb', line 36

def search(url, query:, from: nil, to: nil,
           max_results: nil, case_sensitive: false)
  # Reject a nil or empty query up front.
  if query.nil? || query.empty?
    raise ArgumentError,
          "query must not be empty"
  end

  url = UrlNormalizer.normalize(url)
  opts = build_options(from, to)

  # Keep only successful snapshots whose MIME type contains "text".
  snapshots = @cdx.snapshots(url, **opts)
    .select { |s| s.success? && s.mimetype.to_s.include?("text") }
    .to_a

  find_matches(snapshots, query, case_sensitive, max_results)
end
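
The snapshot filter in `#search` can be tried in isolation. The sketch below applies the same `select` condition to a handful of hypothetical records. The `Snapshot` struct and its definition of `success?` (status 200) are assumptions made for illustration; the real snapshot class is not shown on this page.

```ruby
# Hypothetical snapshot record exposing the two methods the filter relies on.
Snapshot = Struct.new(:timestamp, :status, :mimetype) do
  # Assumption: "success" means an HTTP 200 capture.
  def success?
    status == 200
  end
end

snapshots = [
  Snapshot.new("20200101000000", 200, "text/html"),
  Snapshot.new("20200201000000", 404, "text/html"),  # dropped: not successful
  Snapshot.new("20200301000000", 200, "image/png"),  # dropped: not text
  Snapshot.new("20200401000000", 200, nil),          # dropped: nil.to_s is ""
  Snapshot.new("20200501000000", 200, "text/plain")
]

# Same condition as in #search.
searchable = snapshots.select { |s| s.success? && s.mimetype.to_s.include?("text") }
p searchable.map(&:timestamp) # → ["20200101000000", "20200501000000"]
```

The `to_s` call makes a missing MIME type safe to test. Because the check is a substring match on "text", types such as `application/json` or `application/xhtml+xml` are skipped.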