Class: Relaton::Iso::DataFetcher

Inherits:
Core::DataFetcher
  • Object
Defined in:
lib/relaton/iso/data_fetcher.rb

Overview

Fetch ISO documents from the ISO Open Data programme's bulk JSONL feed (see www.iso.org/open-data.html) and write each one as a YAML file under `@output`.

The upstream feed has no delta API, so any run that proceeds re-downloads and re-ingests the whole feed. There is therefore no value in a partial update: a run either skips entirely or does a full replace.

`source` modes (matching the `Relaton::Core::DataFetcher.fetch` argument):

  • `"iso-open-data"` (default) - skip when the feed's `Last-Modified` is unchanged; otherwise wipe `@output` and the index, then rebuild from scratch.

  • `"iso-open-data-all"` - the same full rebuild, but ignore the `Last-Modified` short-circuit and always run.

Wiping happens here, after the short-circuit decision, so `@output` and the index always mirror the current feed (records that have left it don't linger as stale files or dangling index entries) without risking an empty tree on a skipped run. `#fetch` returns true when it rebuilt, false when it skipped, so callers can chain follow-up work (e.g. the pubid-v1 index).
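
The skip-or-rebuild decision described above can be sketched as a small predicate. This is an illustration only, not the gem's code: `rebuild?` and its parameters are hypothetical names, and treating a missing `Last-Modified` value as "changed" is an assumption.

```ruby
# Hypothetical helper illustrating the two source modes; not part of relaton-iso.
# feed_last_modified:   value of the feed's Last-Modified header (String or nil)
# stored_last_modified: value saved by the previous successful run (String or nil)
def rebuild?(source, feed_last_modified, stored_last_modified)
  # "iso-open-data-all" ignores the Last-Modified short-circuit entirely.
  return true if source == "iso-open-data-all"

  # Assumption: with no header to compare, rebuild rather than skip.
  return true if feed_last_modified.nil?

  # Default mode: rebuild only when the feed has changed since the last run.
  feed_last_modified != stored_last_modified
end
```

Because the wipe would only happen when a predicate like this returns true, a skipped run leaves the existing tree untouched.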

Constant Summary

OPEN_DATA_URL =
"https://isopublicstorageprod.blob.core.windows.net/" \
"opendata/_latest/iso_deliverables_metadata/json/" \
"iso_deliverables_metadata.jsonl".freeze
TC_DATA_URL =
"https://isopublicstorageprod.blob.core.windows.net/" \
"opendata/_latest/iso_technical_committees/json/" \
"iso_technical_committees.jsonl".freeze
LAST_MODIFIED_FILE =
"last_modified.txt".freeze
MAX_DOWNLOAD_RETRIES =
4
RETRY_BACKOFF_BASE =
30
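
The page does not show how `MAX_DOWNLOAD_RETRIES` and `RETRY_BACKOFF_BASE` are used. One plausible reading is a bounded retry loop whose wait grows with each attempt. The sketch below assumes linear backoff (base multiplied by the attempt number); `with_retries` and the injectable `sleeper` are hypothetical names, and the actual schedule in the gem may differ.

```ruby
# Hypothetical retry wrapper; values mirror the constants above, behaviour is assumed.
MAX_RETRIES = 4    # mirrors MAX_DOWNLOAD_RETRIES
BACKOFF_BASE = 30  # mirrors RETRY_BACKOFF_BASE, taken here to mean seconds

def with_retries(max_retries: MAX_RETRIES, base: BACKOFF_BASE,
                 sleeper: ->(seconds) { sleep seconds })
  attempt = 0
  begin
    yield
  rescue StandardError
    attempt += 1
    raise if attempt > max_retries # give up after the last allowed retry

    sleeper.call(base * attempt)   # assumed linear backoff: 30, 60, 90, 120
    retry
  end
end
```

Passing a custom `sleeper` lets a caller record or skip the waits in tests instead of actually sleeping.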

Instance Method Summary

Instance Method Details

#fetch(source = nil) ⇒ Object



# File 'lib/relaton/iso/data_fetcher.rb', line 52

def fetch(source = nil)
  @source = source || "iso-open-data"
  @full_refresh = @source == "iso-open-data-all"

  Util.info "Fetching ISO Open Data (mode: #{@source})..."
  last_modified = fetch_last_modified
  return false if up_to_date?(last_modified)

  reset_output
  jsonl_path = download_dataset
  ref_index, amend_index, date_index = build_ref_index(jsonl_path)
  tc_index = build_tc_index
  ingest_records(jsonl_path, ref_index, tc_index, amend_index, date_index)
  merge_static_files

  index.save
  save_last_modified(last_modified)
  report_errors
  true
rescue StandardError => e
  Util.error "#{e.message}\n#{e.backtrace.join("\n")}"
  raise
end
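
Since `#fetch` returns true after a rebuild and false after a skip, a caller can gate follow-up work on the result. A minimal sketch, using a hypothetical `StubFetcher` in place of the real class (whose constructor arguments are not shown on this page) and a hypothetical `run_pipeline` helper:

```ruby
# Hypothetical stand-in for Relaton::Iso::DataFetcher; only mimics the return value.
class StubFetcher
  def initialize(changed)
    @changed = changed
  end

  # Returns true when a rebuild happened, false when the run was skipped.
  def fetch(_source = nil)
    @changed
  end
end

# Hypothetical caller: runs follow-up work only when the data was rebuilt.
def run_pipeline(fetcher, source = nil)
  return :skipped unless fetcher.fetch(source)

  # Follow-up work (for example, regenerating a secondary index) would go here.
  :rebuilt
end
```

Note that the real `#fetch` logs and re-raises any `StandardError`, so a caller that must survive a failed run needs its own `rescue`.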

#index ⇒ Object



# File 'lib/relaton/iso/data_fetcher.rb', line 46

def index
  @index ||= Relaton::Index.find_or_create(
    :iso, file: "#{INDEXFILE}.yaml", pubid_class: ::Pubid::Iso::Identifier,
  )
end

#log_error(msg) ⇒ Object



# File 'lib/relaton/iso/data_fetcher.rb', line 42

def log_error(msg)
  Util.error msg
end