Class: Archaeo::ArchiveSearch
- Inherits: Object
- Defined in: lib/archaeo/archive_search.rb
Overview
Full-text search across archived snapshots.
Fetches a URL's snapshot list from the CDX API, downloads the content of each snapshot, and searches that content for the given query string. Each hit is returned with its surrounding context.
Constant Summary
- CONTEXT_RADIUS = 80
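The source does not say what CONTEXT_RADIUS measures. Assuming it is the number of characters kept on each side of a hit, a context window could be computed roughly as below. `find_hits`, its keyword arguments, and its return shape are hypothetical illustrations, not the gem's own `find_matches`.

```ruby
CONTEXT_RADIUS = 80

# Hypothetical helper (not archaeo's find_matches): returns each occurrence
# of +query+ in +text+ with up to +radius+ characters of context per side.
# Assumption: CONTEXT_RADIUS counts characters on each side of a hit.
def find_hits(text, query, case_sensitive: false, radius: CONTEXT_RADIUS, limit: nil)
  raise ArgumentError, "query must not be empty" if query.nil? || query.empty?

  # Escape the query so it matches literally; IGNORECASE gives case-insensitive search.
  flags = case_sensitive ? 0 : Regexp::IGNORECASE
  pattern = Regexp.new(Regexp.escape(query), flags)

  hits = []
  pos = 0
  while (m = text.match(pattern, pos))
    first, last = m.begin(0), m.end(0)
    from = [first - radius, 0].max          # clamp at start of text
    to = [last + radius, text.length].min   # clamp at end of text
    hits << { offset: first, context: text[from...to] }
    break if limit && hits.size >= limit
    pos = first + 1 # resume one character later, so overlapping hits are found
  end
  hits
end

text = "The quick brown fox jumps over the lazy dog. A fox again."
find_hits(text, "FOX", radius: 4).map { |h| h[:context] }
# => ["own fox jum", ". A fox aga"]
```

Matching with an escaped, case-insensitive Regexp rather than downcasing both strings keeps the match offsets aligned with the original text, since `downcase` can change the length of some non-ASCII strings.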
Instance Method Summary
- #initialize(cdx_api: CdxApi.new, fetcher: Fetcher.new) ⇒ ArchiveSearch
  Constructor. Returns a new instance of ArchiveSearch.
- #search(url, query:, from: nil, to: nil, max_results: nil, case_sensitive: false) ⇒ Object
  Searches the archived snapshots of a URL for a query string and returns the matches.
Constructor Details
#initialize(cdx_api: CdxApi.new, fetcher: Fetcher.new) ⇒ ArchiveSearch
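Returns a new instance of ArchiveSearch. cdx_api is the CDX client used to list snapshots, and fetcher downloads their content; both default to fresh instances when omitted.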
```ruby
# File 'lib/archaeo/archive_search.rb', line 31
def initialize(cdx_api: CdxApi.new, fetcher: Fetcher.new)
  @cdx = cdx_api      # CDX client used to list snapshots
  @fetcher = fetcher  # downloads snapshot content
end
```
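In Ruby, a keyword default such as `cdx_api: CdxApi.new` is evaluated on every call that omits the argument, so each argument-less `ArchiveSearch.new` builds its own collaborators, while passing objects explicitly lets a caller share a client or substitute a test double. The sketch below shows this with hypothetical stand-in classes (`StubCdx`, `StubFetcher`, `Search`); it mirrors only the constructor's shape and does not load the archaeo gem.

```ruby
# Hypothetical stand-ins for CdxApi and Fetcher; not part of archaeo.
class StubCdx; end
class StubFetcher; end

# Same constructor shape as ArchiveSearch: collaborators are injectable,
# and the defaults are only built when the caller omits them.
class Search
  attr_reader :cdx, :fetcher

  def initialize(cdx_api: StubCdx.new, fetcher: StubFetcher.new)
    @cdx = cdx_api
    @fetcher = fetcher
  end
end

a = Search.new
b = Search.new
a.cdx.equal?(b.cdx)  # => false: StubCdx.new ran once per call

shared = StubCdx.new
c = Search.new(cdx_api: shared)
c.cdx.equal?(shared) # => true: the injected object is used as-is
```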
Instance Method Details
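#search(url, query:, from: nil, to: nil, max_results: nil, case_sensitive: false) ⇒ Object
Searches the archived snapshots of url for query. The URL is normalized with UrlNormalizer.normalize, snapshots are listed through the CDX client (optionally bounded by from and to), only successful snapshots with a text MIME type are kept, and the matches are collected by the find_matches helper, honoring case_sensitive and max_results. Raises ArgumentError if query is nil or empty.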
```ruby
# File 'lib/archaeo/archive_search.rb', line 36
def search(url, query:, from: nil, to: nil, max_results: nil, case_sensitive: false)
  if query.nil? || query.empty?
    raise ArgumentError, "query must not be empty"
  end

  url = UrlNormalizer.normalize(url)
  # Keyword options for the CDX query, derived from the from/to bounds.
  opts = (from, to)
  # Keep only successful captures whose MIME type mentions "text".
  snapshots = @cdx.snapshots(url, **opts)
                  .select { |s| s.success? && s.mimetype.to_s.include?("text") }
                  .to_a
  find_matches(snapshots, query, case_sensitive, max_results)
end
```
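`search` keeps only snapshots for which `success?` is true and whose `mimetype` contains the substring `"text"`. The sketch below applies that same predicate to a hypothetical `Snapshot` struct; treating `success?` as "HTTP 2xx status" is our assumption, not something the source states. It also shows that the substring test is permissive: a `nil` MIME type becomes `""` through `to_s` and is dropped, while a non-`text/*` type such as `application/vnd.oasis.opendocument.text` is kept.

```ruby
# Hypothetical snapshot record; archaeo's real snapshot class may differ.
Snapshot = Struct.new(:timestamp, :status, :mimetype) do
  # Assumption: "success" means an HTTP 2xx status code.
  def success?
    (200..299).cover?(status.to_i)
  end
end

# Same predicate as in ArchiveSearch#search.
def searchable(snapshots)
  snapshots.select { |s| s.success? && s.mimetype.to_s.include?("text") }.to_a
end

snaps = [
  Snapshot.new("20200101000000", "200", "text/html"),
  Snapshot.new("20200102000000", "404", "text/html"),
  Snapshot.new("20200103000000", "200", "application/json"),
  Snapshot.new("20200104000000", "200", nil),
  Snapshot.new("20200105000000", "200", "application/vnd.oasis.opendocument.text")
]
searchable(snaps).map(&:timestamp)
# => ["20200101000000", "20200105000000"]
```

Matching on a `text/` prefix instead would exclude the OpenDocument case; which behavior is wanted depends on what the caller considers searchable.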