Class: Relaton::Iso::DataFetcher
- Inherits:
- Core::DataFetcher
- Inheritance chain: Object, Core::DataFetcher, Relaton::Iso::DataFetcher
- Defined in:
- lib/relaton/iso/data_fetcher.rb
Overview
Fetch ISO documents from the ISO Open Data programme bulk JSONL (see www.iso.org/open-data.html) and write each one as a YAML file under `@output`.
The upstream feed has no delta API, so any run that proceeds re-downloads and re-ingests the whole feed. There is therefore no value in a partial update: a run either skips entirely or does a full replace.
`source` modes (matching the `Relaton::Core::DataFetcher.fetch` arg):
- `"iso-open-data"` (default) - skip when the feed’s `Last-Modified` is unchanged; otherwise wipe `@output` + index and rebuild from scratch.
- `"iso-open-data-all"` - the same full rebuild, but ignore the `Last-Modified` short-circuit and always run.
Wiping happens here, after the short-circuit decision, so `@output` and the index always mirror the current feed (records that have left it don’t linger as stale files or dangling index entries) without risking an empty tree on a skipped run. `#fetch` returns true when it rebuilt, false when it skipped, so callers can chain follow-up work (e.g. the pubid-v1 index).
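The skip-or-full-replace decision described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: `rebuild_if_changed`, its parameters, and the location of the stamp file (kept beside the output directory so that wiping does not remove it) are all assumptions made for the example.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper, not part of Relaton: run a full rebuild only when
# the remote stamp differs from the stored one, or when forced.
def rebuild_if_changed(output_dir, stamp_file, remote_stamp, force: false)
  previous = File.exist?(stamp_file) ? File.read(stamp_file).strip : nil
  # Short-circuit first, so a skipped run never touches the existing tree.
  return false if !force && previous == remote_stamp

  # Wipe only after deciding to proceed, then rebuild from scratch.
  FileUtils.rm_rf(output_dir)
  FileUtils.mkdir_p(output_dir)
  yield output_dir
  # Record the stamp last, so a failed rebuild is retried on the next run.
  File.write(stamp_file, remote_stamp)
  true
end

Dir.mktmpdir do |dir|
  out = File.join(dir, "data")
  stamp = File.join(dir, "last_modified.txt")
  write_doc = ->(d) { File.write(File.join(d, "doc.yaml"), "id: example\n") }

  puts rebuild_if_changed(out, stamp, "A", &write_doc)              # => true
  puts rebuild_if_changed(out, stamp, "A", &write_doc)              # => false
  puts rebuild_if_changed(out, stamp, "A", force: true, &write_doc) # => true
end
```

The `force: true` call plays the role of the `"iso-open-data-all"` mode: same rebuild, no short-circuit.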
Constant Summary
- OPEN_DATA_URL =
"https://isopublicstorageprod.blob.core.windows.net/" \
"opendata/_latest/iso_deliverables_metadata/json/" \
"iso_deliverables_metadata.jsonl".freeze
- TC_DATA_URL =
"https://isopublicstorageprod.blob.core.windows.net/" \
"opendata/_latest/iso_technical_committees/json/" \
"iso_technical_committees.jsonl".freeze
- LAST_MODIFIED_FILE =
"last_modified.txt".freeze
- MAX_DOWNLOAD_RETRIES =
4
- RETRY_BACKOFF_BASE =
30
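This page does not show how `MAX_DOWNLOAD_RETRIES` and `RETRY_BACKOFF_BASE` are used. One plausible reading is a retry loop with a growing delay, sketched below. The exponential schedule, the unit (seconds), and the helper name `with_retries` are assumptions for illustration; the actual download code may differ.

```ruby
# Hypothetical helper, not part of Relaton. The defaults mirror the
# constants above (4 retries, base of 30); the doubling schedule and the
# unit of seconds are assumptions.
def with_retries(max_retries: 4, base: 30, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield attempt
  rescue StandardError
    attempt += 1
    raise if attempt > max_retries # give up after the last allowed retry

    sleeper.call(base * (2**(attempt - 1))) # 30, 60, 120, 240
    retry
  end
end
```

Passing the sleeper as a parameter keeps the helper testable: a caller can record the delays instead of actually waiting.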
Instance Method Summary
- #fetch(source = nil) ⇒ Object
- #index ⇒ Object
- #log_error(msg) ⇒ Object
Instance Method Details
#fetch(source = nil) ⇒ Object
# File 'lib/relaton/iso/data_fetcher.rb', line 52
def fetch(source = nil)
  @source = source || "iso-open-data"
  @full_refresh = @source == "iso-open-data-all"
  Util.info "Fetching ISO Open Data (mode: #{@source})..."
  last_modified = fetch_last_modified
  return false if up_to_date?(last_modified)

  reset_output
  jsonl_path = download_dataset
  ref_index, amend_index, date_index = build_ref_index(jsonl_path)
  tc_index = build_tc_index
  ingest_records(jsonl_path, ref_index, tc_index, amend_index, date_index)
  merge_static_files
  index.save
  save_last_modified(last_modified)
  report_errors
  true
rescue StandardError => e
  Util.error "#{e.message}\n#{e.backtrace.join("\n")}"
  raise
end
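The body of `ingest_records` is not shown on this page. As a rough illustration of the general idea stated in the overview (one YAML file per JSONL record), a streaming pass could look like the sketch below. The helper name `ingest_jsonl`, the `"reference"` key, and the file-naming rule are hypothetical and are not taken from the ISO feed or from Relaton.

```ruby
require "json"
require "yaml"
require "fileutils"

# Hypothetical sketch: read a JSONL file line by line and write each
# record as its own YAML file. Returns the number of records written.
def ingest_jsonl(jsonl_path, output_dir)
  FileUtils.mkdir_p(output_dir)
  count = 0
  File.foreach(jsonl_path) do |line|
    next if line.strip.empty? # tolerate blank lines

    record = JSON.parse(line)
    # Assumed key and naming rule: "ISO 9001:2015" becomes "iso-9001-2015".
    name = record.fetch("reference").gsub(/[^A-Za-z0-9]+/, "-").downcase
    File.write(File.join(output_dir, "#{name}.yaml"), record.to_yaml)
    count += 1
  end
  count
end
```

Reading with `File.foreach` keeps memory use flat regardless of feed size, which matters when every run re-ingests the whole file.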
#index ⇒ Object
# File 'lib/relaton/iso/data_fetcher.rb', line 46
def index
  @index ||= Relaton::Index.find_or_create(
    :iso, file: "#{INDEXFILE}.yaml", pubid_class: ::Pubid::Iso::Identifier,
  )
end
#log_error(msg) ⇒ Object
# File 'lib/relaton/iso/data_fetcher.rb', line 42
def log_error(msg)
  Util.error msg
end