Module: Relaton::W3c::SafeRealize

Included in:
DataFetcher, DataParser
Defined in:
lib/relaton/w3c/safe_realize.rb

Overview

Thin wrapper over lutaml-hal’s `realize`. w3c_api already caches successfully realized objects, keyed by URL, so this module only remembers resources that failed terminally and returns nil for them. As a result, one broken link does not abort the crawl and is not re-fetched on every reference.

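Transient failures are retried upstream: w3c_api retries HTTP 403 (the W3C rate-limit signal) and connection/timeout errors, and lutaml-hal retries 429 and 5xx. An HTTP error that surfaces here has therefore exhausted its retries and is treated as terminal. Network-level failures that still get through are logged and return nil but are not remembered, so a later reference can try again.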
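
The behaviour described above can be sketched in isolation. This is a minimal illustration, not the module's actual source: `Sketch`, `Link`, `NetworkError` and the example.org URLs are hypothetical stand-ins, used so the snippet runs without lutaml-hal or w3c_api.

```ruby
# Hypothetical stand-ins; none of these names come from lutaml-hal or w3c_api.
module Sketch
  class Error < StandardError; end
  class NotFoundError < Error; end
  class NetworkError < StandardError; end

  # Registry of hrefs that failed terminally.
  SKIPPED = {}

  # Fake link: `realize` runs the supplied block and counts how often it ran.
  class Link
    attr_reader :href, :calls

    def initialize(href, &fetch)
      @href = href
      @fetch = fetch
      @calls = 0
    end

    def realize
      @calls += 1
      @fetch.call
    end
  end

  def self.safe_realize(link)
    return nil if SKIPPED.key?(link.href)

    link.realize
  rescue NetworkError
    nil                       # transient: not remembered, a later call retries
  rescue Error
    SKIPPED[link.href] = true # terminal: remembered, never fetched again
    nil
  end
end

broken = Sketch::Link.new("https://example.org/missing") { raise Sketch::NotFoundError, "404" }
flaky  = Sketch::Link.new("https://example.org/flaky") { raise Sketch::NetworkError, "timeout" }
good   = Sketch::Link.new("https://example.org/ok") { :resource }

2.times { Sketch.safe_realize(broken) } # second call is short-circuited
2.times { Sketch.safe_realize(flaky) }  # both calls reach the network
ok_result = Sketch.safe_realize(good)
```

In this sketch the broken link is fetched once and then skipped, while the flaky link is attempted on every call.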

Class Method Details

.skipped ⇒ Object



# File 'lib/relaton/w3c/safe_realize.rb', line 21

def self.skipped
  @skipped
end
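
Because the registry lives in an instance variable on the module itself, every class that includes the module sees the same hash. The sketch below illustrates that sharing. `Registry`, `Fetcher`, `Parser`, `mark` and `skipped?` are hypothetical names, and the `@skipped = {}` initialisation is an assumption, since the excerpt shows only the reader method.

```ruby
module Registry
  # Assumed initialisation; the documented excerpt shows only the reader.
  @skipped = {}

  # Module-level reader, analogous to SafeRealize.skipped.
  def self.skipped
    @skipped
  end

  # Instance helpers (hypothetical) that go through the module-level hash.
  def mark(href)
    Registry.skipped[href] = true
  end

  def skipped?(href)
    Registry.skipped.key?(href)
  end
end

class Fetcher
  include Registry
end

class Parser
  include Registry
end

Fetcher.new.mark("https://example.org/a")
seen_by_parser = Parser.new.skipped?("https://example.org/a")
```

Under these assumptions, a resource marked by one includer is also skipped by the other.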

Instance Method Details

#realize(obj) ⇒ Object



# File 'lib/relaton/w3c/safe_realize.rb', line 25

def realize(obj)
  href = resolve_href(obj)
  return nil if SafeRealize.skipped.key?(href)

  obj.realize
rescue Lutaml::Hal::ConnectionError, Lutaml::Hal::TimeoutError, Faraday::Error, Net::OpenTimeout => e
  # Network-level failure (already retried by w3c_api). The resource itself
  # is fine, so don't skip it permanently — a later reference can try again.
  Util.warn "Failed to realize object: #{href}, error: #{e.message}"
  nil
rescue Lutaml::Hal::NotFoundError
  Util.warn "Object not found: #{href}"
  SafeRealize.skipped[href] = true
  nil
rescue Lutaml::Hal::Error => e
  # Definitive upstream error (403 rate-limit, 5xx, 429) already retried by
  # w3c_api / lutaml-hal. Skip the broken/unavailable resource rather than
  # re-hitting it for every link that references it.
  Util.warn "Skipping #{href}, upstream error after retries: #{e.message}"
  SafeRealize.skipped[href] = true
  nil
end
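
Since `realize` returns nil instead of raising, a caller can drop failed resources without any error handling of its own. The following is a sketch of one way to do that: `FakeSafeRealize`, `Crawler`, `fetch_all` and the URLs are hypothetical, and the stub merely imitates the nil-on-failure contract.

```ruby
# Hypothetical stand-in for the mixin: returns nil for hrefs listed as broken.
module FakeSafeRealize
  BROKEN = ["https://example.org/gone"].freeze

  def realize(link)
    BROKEN.include?(link) ? nil : "resource for #{link}"
  end
end

class Crawler
  include FakeSafeRealize

  # filter_map discards nil results, so a broken link is simply left out.
  def fetch_all(links)
    links.filter_map { |link| realize(link) }
  end
end

links = [
  "https://example.org/a",
  "https://example.org/gone",
  "https://example.org/b"
]
results = Crawler.new.fetch_all(links)
```

`filter_map` requires Ruby 2.7 or later. On older versions, `map` followed by `compact` gives the same result.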