Class: Kataba::Fetcher

Inherits:
Object
Defined in:
lib/kataba/fetcher.rb

Overview

Fetches a schema body, recovering from Library of Congress (LoC) delivery quirks that a plain URI.open call would surface as cache-poisoning errors:

- 5xx (Cloudflare bot-management 503/529, origin overload):
  retry once on the alternate scheme.
- same-origin HTTPS->HTTP 3xx: follow. open-uri refuses all
  scheme downgrades; we relax to same-origin because the
  consumer already trusted this host by putting it in their
  schemaLocation. Cross-origin downgrades stay refused — that's
  the actual DNS-redirect attack vector.
- /mods/xml.xsd: rewrite to /standards/mods/xml.xsd before the
  first request. The /mods/xml.xsd path is what every mods-3-N.xsd
  embeds in its xs:import, but LoC only serves the file from
  /standards/mods/xml.xsd today — the embedded path bounces
  HTTPS->HTTP and 503s. Applied here so transitive xs:import
  resolution benefits, not just top-level fetch_schema calls.
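
The redirect rule in the list above can be sketched as a small predicate. This is an illustration only, not Kataba's implementation: `redirect_allowed?` is a hypothetical helper, and "same-origin" is assumed here to mean "same host, compared case-insensitively", since the scheme (and therefore the default port) necessarily differs on a downgrade.

```ruby
require 'uri'

# Hypothetical helper (not part of the documented API): decides whether a
# redirect from `from` to `to` may be followed under the policy above.
def redirect_allowed?(from, to)
  from = URI(from)
  to = URI(to)

  # Only HTTPS -> HTTP counts as a downgrade; everything else is left to the
  # normal redirect handling.
  downgrade = from.scheme == 'https' && to.scheme == 'http'
  return true unless downgrade

  # Assumption: a downgrade is tolerated only when the host is unchanged.
  from.host.to_s.downcase == to.host.to_s.downcase
end
```

Under these assumptions, a bounce from `https://www.loc.gov/...` to `http://www.loc.gov/...` is followed, while a downgrade that also changes the host is refused.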

mirror_list remains the consumer’s backstop for URI-identity changes (path renames, host moves) that no delivery heuristic can rescue.

Defined Under Namespace

Classes: FetchError

Constant Summary

MAX_REDIRECTS = 5
PATH_REWRITES =

Maps a path that is embedded in published schemas but no longer serves the file to the path that does. For xml.xsd, every mods-3-N.xsd imports /mods/xml.xsd, but only /standards/mods/xml.xsd serves it.

{
  '/mods/xml.xsd' => '/standards/mods/xml.xsd',
}.freeze
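
A minimal sketch of how such a map might be applied before the first request. The real `normalize` method is not shown on this page, so `rewrite_path` and `SKETCH_REWRITES` are hypothetical names, and matching on the exact path alone (without checking the host) is an assumption.

```ruby
require 'uri'

# Local copy of the documented mapping, renamed so it is clearly a sketch.
SKETCH_REWRITES = {
  '/mods/xml.xsd' => '/standards/mods/xml.xsd',
}.freeze

# Hypothetical helper: returns a URI whose path has been rewritten when it
# matches a known dead path, and the parsed URI unchanged otherwise.
def rewrite_path(uri)
  parsed = URI(uri)
  replacement = SKETCH_REWRITES[parsed.path]
  return parsed unless replacement

  rewritten = parsed.dup
  rewritten.path = replacement
  rewritten
end
```

Because the rewrite operates on the URI itself rather than on a particular call site, the same helper would cover both a top-level fetch and a path reached through a transitive xs:import.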

Instance Method Summary

Constructor Details

#initialize(uri) ⇒ Fetcher

Returns a new instance of Fetcher.



# File 'lib/kataba/fetcher.rb', line 37

def initialize(uri)
  @original_uri = uri
end

Instance Method Details

#fetch ⇒ Object



# File 'lib/kataba/fetcher.rb', line 41

def fetch
  attempt(normalize(@original_uri), alt_scheme_retry: true)
end
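
The `alt_scheme_retry: true` flag corresponds to the "retry once on the alternate scheme" rule from the overview. The private `attempt` method is not shown on this page, so the following is only a sketch of that rule under stated assumptions: `fetch_with_retry` and `alternate_scheme` are hypothetical names, the transport is injected as a callable returning `[status, body]`, redirects are left out, and a plain RuntimeError stands in for FetchError.

```ruby
require 'uri'

# Hypothetical helper: the same URI with http and https swapped.
def alternate_scheme(uri)
  uri = URI(uri)
  other = uri.scheme == 'https' ? 'http' : 'https'
  URI(uri.to_s.sub(/\A#{uri.scheme}:/i, "#{other}:"))
end

# Hypothetical sketch of the retry rule. `transport` is any callable that
# takes a URI and returns [status_code, body].
def fetch_with_retry(uri, transport)
  status, body = transport.call(URI(uri))
  return body if status == 200
  # Non-5xx failures are not retried in this sketch.
  raise "fetch failed: #{status}" if status < 500

  # Exactly one retry, on the other scheme.
  status, body = transport.call(alternate_scheme(uri))
  return body if status == 200

  raise "fetch failed after alternate-scheme retry: #{status}"
end
```

Injecting the transport keeps the sketch testable without network access; a caller could pass a lambda that wraps whatever HTTP client they use.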