Class: Relaton::Calconnect::Scraper

Inherits:
Object
Includes:
Core::ArrayWrapper, Core::HashKeysSymbolizer
Defined in:
lib/relaton/calconnect/scraper.rb

Constant Summary

RELEASE_ASSET_URL =
"https://github.com/%<owner>s/%<repo>s/releases/download/" \
"%<tag>s/%<asset_stem>s.zip".freeze
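
The template uses Ruby's named format references (`%<name>s`), so it can be filled with `Kernel#format` and a hash of values. The sketch below copies the template into a local variable and fills it with placeholder values. The owner, repository, tag, and asset stem shown are hypothetical, and how the scraper actually derives them from an index entry is not documented on this page.

```ruby
# Local copy of the template documented above, so the snippet runs on its own.
template =
  "https://github.com/%<owner>s/%<repo>s/releases/download/" \
  "%<tag>s/%<asset_stem>s.zip"

# All four values are made-up placeholders, not real CalConnect releases.
url = format(
  template,
  owner: "example-org",
  repo: "example-repo",
  tag: "v1.0.0",
  asset_stem: "example-doc"
)

puts url
# prints https://github.com/example-org/example-repo/releases/download/v1.0.0/example-doc.zip
```

`format` raises `KeyError` if one of the named references is missing from the hash, so an incomplete entry fails loudly instead of producing a malformed URL.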

Instance Method Summary

Constructor Details

#initialize(errors = {}) ⇒ Scraper

Returns a new instance of Scraper.

Parameters:

  • errors (Hash) (defaults to: {})

    error tracking hash



# File 'lib/relaton/calconnect/scraper.rb', line 17

def initialize(errors = {})
  @errors = errors
end

Instance Method Details

#parse_page(hit) ⇒ Relaton::Calconnect::ItemData

Parse an aggregate-index document entry: download the per-document GitHub release zip, extract the RXL, and parse it into a bibitem.

Parameters:

  • hit (Hash)

    document entry from /cc/index.json

Returns:

  • (Relaton::Calconnect::ItemData)

    the bibitem parsed from the extracted RXL

# File 'lib/relaton/calconnect/scraper.rb', line 29

def parse_page(hit)
  zip_data = download_release_zip hit
  rxl = extract_rxl zip_data, rxl_filename(hit)
  xml = normalize_rxl rxl
  Item.from_xml xml
end
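
The data flow of `parse_page` can be illustrated with a self-contained sketch. Everything in it is an assumption made for illustration: the helper bodies are in-memory stand-ins (the real private helpers are not shown on this page), the `:asset` key on the hit is hypothetical, and the final `Item.from_xml` step is left out so the snippet runs without the gem.

```ruby
# Hypothetical stand-in for the scraper; it mimics only the order of the steps.
class PipelineSketch
  # archive maps an asset name to a fake "zip": { file name => contents }.
  def initialize(archive)
    @archive = archive
  end

  # Same shape as parse_page, minus the final Item.from_xml call.
  def parse_page(hit)
    zip_data = download_release_zip(hit)
    rxl = extract_rxl(zip_data, rxl_filename(hit))
    normalize_rxl(rxl)
  end

  private

  # Looks the "zip" up in memory instead of downloading it over HTTP.
  def download_release_zip(hit)
    @archive.fetch(hit[:asset])
  end

  # Assumed naming rule: the RXL file is named after the asset.
  def rxl_filename(hit)
    "#{hit[:asset]}.rxl"
  end

  # Picks one named entry out of the fake zip.
  def extract_rxl(zip_data, name)
    zip_data.fetch(name)
  end

  # Placeholder normalization: trims surrounding whitespace only.
  def normalize_rxl(rxl)
    rxl.strip
  end
end

archive = { "example-doc" => { "example-doc.rxl" => "  <bibdata/>\n" } }
result = PipelineSketch.new(archive).parse_page({ asset: "example-doc" })
puts result
```

In the real class the normalized XML is then passed to `Item.from_xml`, which produces the returned bibitem.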