Class: Relaton::Iso::DataFetcher
- Inherits: Core::DataFetcher
- Inheritance chain: Object → Core::DataFetcher → Relaton::Iso::DataFetcher
- Defined in: lib/relaton/iso/data_fetcher.rb
Overview
Fetch all the documents from ISO website.
Instance Method Summary
- #fetch ⇒ void
  Go through all ICS and fetch all documents.
- #index ⇒ Object
- #iso_queue ⇒ Relaton::Iso::Queue
  ISO has too many docs to fetch in a single run.
- #log_error(msg) ⇒ Object
- #mutex ⇒ Object
- #queue ⇒ Queue
  The queue is used to store the ICS page paths being fetched in the current run.
Instance Method Details
#fetch ⇒ void
This method returns an undefined value.
Go through all ICS (International Classification for Standards) pages and fetch all documents.
    # File 'lib/relaton/iso/data_fetcher.rb', line 46
    def fetch # rubocop:disable Metrics/AbcSize
      Util.info "Scrapping ICS pages..."
      fetch_ics
      Util.info "(#{Time.now}) Scrapping documents..."
      fetch_docs
      iso_queue.save
      # index.sort! { |a, b| compare_docids a, b }
      index.save
      report_errors
    end
#index ⇒ Object
    # File 'lib/relaton/iso/data_fetcher.rb', line 26
    def index
      @index ||= Relaton::Index.find_or_create :iso, file: "#{INDEXFILE}.yaml"
    end
#iso_queue ⇒ Relaton::Iso::Queue
ISO has too many documents for GitHub Actions (GHA) to fetch in one run, so the process is split across several runs. The iso_queue stores the document paths that have not been fetched yet.
    # File 'lib/relaton/iso/data_fetcher.rb', line 37
    def iso_queue
      @iso_queue ||= Relaton::Iso::Queue.new
    end
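The idea of carrying unfetched paths over to a later run can be sketched with a small file-backed queue. This is only an illustration under stated assumptions: `SketchFileQueue`, its methods, the text-file format, and the example paths are all hypothetical and are not the `Relaton::Iso::Queue` API.

```ruby
require "tmpdir"

# Hypothetical stand-in for a persistent queue; NOT the Relaton::Iso::Queue API.
class SketchFileQueue
  attr_reader :items

  def initialize(path)
    @path = path
    # Load entries left over from a previous run, if the file exists.
    @items = File.exist?(path) ? File.readlines(path, chomp: true) : []
  end

  # Append a path unless it is already queued.
  def add(item)
    @items << item unless @items.include?(item)
  end

  # Remove and return the oldest entry (nil when the queue is empty).
  def take_first
    @items.shift
  end

  # Persist the remaining entries so the next run can resume from them.
  def save
    File.write(@path, @items.join("\n"))
  end
end

path = File.join(Dir.mktmpdir, "queue.txt")

# First run: three paths are queued, but only one is processed before saving.
first_run = SketchFileQueue.new(path)
%w[/standard/1.html /standard/2.html /standard/3.html].each { |p| first_run.add(p) }
first_run.take_first
first_run.save

# Second run: a fresh object picks up whatever the first run left behind.
second_run = SketchFileQueue.new(path)
remaining = second_run.items
```

In this sketch the second run starts with the two paths the first run did not reach, which mirrors the purpose described above for `iso_queue`.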
#log_error(msg) ⇒ Object
    # File 'lib/relaton/iso/data_fetcher.rb', line 22
    def log_error(msg)
      Util.error msg
    end
#mutex ⇒ Object
    # File 'lib/relaton/iso/data_fetcher.rb', line 18
    def mutex
      @mutex ||= Mutex.new
    end
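A memoized `Mutex` like this one is typically used to serialize updates to state shared between threads. The following is a minimal sketch of that pattern; `SketchCollector`, `record`, and the `:timeout` key are hypothetical, and how the real class uses `#mutex` is not shown in this page.

```ruby
# Hypothetical collector; the real class's use of #mutex may differ.
class SketchCollector
  attr_reader :errors

  def initialize
    @errors = Hash.new(0)
  end

  # Same memoization pattern as the documented #mutex method.
  def mutex
    @mutex ||= Mutex.new
  end

  # Only one thread at a time may update the shared counter.
  def record(kind)
    mutex.synchronize { @errors[kind] += 1 }
  end
end

collector = SketchCollector.new
collector.mutex # create the mutex up front, before any thread starts

threads = 4.times.map do
  Thread.new { 250.times { collector.record(:timeout) } }
end
threads.each(&:join)

total = collector.errors[:timeout]
```

The sketch calls `mutex` once before spawning threads because lazy `||=` initialization is not itself guaranteed to be atomic.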
#queue ⇒ Queue
The queue is used to store the ICS page paths being fetched in the current run.
    # File 'lib/relaton/iso/data_fetcher.rb', line 14
    def queue
      @queue ||= ::Queue.new
    end
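Ruby's core `::Queue` is thread-safe, which makes it a natural way to hand page paths to worker threads. The sketch below shows one common way to drain such a queue. The worker count, the `:stop` sentinel, and the example paths are assumptions for illustration, not details taken from the fetcher.

```ruby
# Paths waiting to be processed in this run (example values only).
paths = ::Queue.new
%w[/ics/01.html /ics/03.html /ics/07.html].each { |p| paths << p }

# One :stop sentinel per worker tells each thread when to exit.
workers_count = 2
workers_count.times { paths << :stop }

# A second queue collects results without needing an explicit lock.
visited = ::Queue.new

workers = workers_count.times.map do
  Thread.new do
    # Queue#pop blocks until an item is available.
    while (path = paths.pop) != :stop
      visited << path # a real worker would fetch and parse the page here
    end
  end
end
workers.each(&:join)

# Drain the result queue into a sorted array for inspection.
visited_paths = Array.new(visited.size) { visited.pop }.sort
```

Because the order in which workers pick up paths is not deterministic, the results are sorted before being inspected.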