Module: Arxivarius::WebSource

Defined in:
lib/arxivarius/web_source.rb

Overview

Builds a Paper by scraping the public arXiv abstract page (arxiv.org/abs/<id>). It serves as an alternative to the Atom API when the API is rate limited. It reads the Highwire `citation_*` <meta> tags plus a few body elements. Every field the API exposes is recovered except author affiliations, which are not present on the abstract page.

Constant Summary

ABS_URL =
'https://arxiv.org/abs/'
PDF_URL =
'https://arxiv.org/pdf/'
USER_AGENT =
"arxivarius/#{Arxivarius::VERSION} " \
'(+https://github.com/antlypls/arxivarius)'.freeze
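
As a rough sketch of how these constants might be combined, the snippet below builds (but does not send) a GET request for an abstract page. The `WebSourceSketch` module, the `build_request` helper, and the placeholder `VERSION` value are assumptions made for illustration. The module's actual `fetch_html` is not shown in this documentation and may work differently.

```ruby
require 'net/http'
require 'uri'

# Illustrative stand-in module; names other than ABS_URL and USER_AGENT
# are hypothetical and not part of the documented API.
module WebSourceSketch
  VERSION = '0.0.0' # placeholder, not the gem's actual version
  ABS_URL = 'https://arxiv.org/abs/'
  USER_AGENT = "arxivarius/#{VERSION} " \
               '(+https://github.com/antlypls/arxivarius)'.freeze

  # Hypothetical helper: builds a GET request for the abstract page of +id+
  # and identifies the client through the User-Agent header.
  # Nothing is sent over the network here.
  def self.build_request(id)
    uri = URI(ABS_URL + id)
    request = Net::HTTP::Get.new(uri)
    request['User-Agent'] = USER_AGENT
    request
  end
end

request = WebSourceSketch.build_request('1202.0819')
puts request.uri
puts request['User-Agent']
```

Sending a descriptive User-Agent with a project URL is a common courtesy when scraping, since it lets the site operator identify and contact the client.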

Class Method Details

.fetch(id) ⇒ Object



# File 'lib/arxivarius/web_source.rb', line 16

def fetch(id)
  doc = ::Nokogiri::HTML(fetch_html(id))

  # No citation_title means the page is not an abstract page (e.g. an
  # arXiv "identifier not recognized" page served with a 200 status).
  return nil unless meta(doc, 'citation_title')

  build_paper(doc)
end