Module: Arxivarius::WebSource
- Defined in: lib/arxivarius/web_source.rb
Overview
Builds a Paper by scraping the public arXiv abstract page (arxiv.org/abs/<id>). It serves as an alternative to the Atom API when the API is rate limited. It reads the Highwire `citation_*` <meta> tags plus a few body elements. Every field the API exposes is recovered except author affiliations, which are not present on the abstract page.
Constant Summary
- ABS_URL = 'https://arxiv.org/abs/'
- PDF_URL = 'https://arxiv.org/pdf/'
- USER_AGENT = "arxivarius/#{Arxivarius::VERSION} " \
  '(+https://github.com/antlypls/arxivarius)'.freeze
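One plausible way these constants combine into a request is sketched below using only the standard library. This is an assumption about usage, not the module's actual implementation: `build_abs_request` is a hypothetical helper, the version in `SAMPLE_USER_AGENT` is a placeholder for `Arxivarius::VERSION`, and the identifier is a dummy value. The request is built but never sent.

```ruby
require 'net/http'
require 'uri'

ABS_URL = 'https://arxiv.org/abs/'
# Placeholder version; the module interpolates Arxivarius::VERSION here.
SAMPLE_USER_AGENT = 'arxivarius/0.0.0 (+https://github.com/antlypls/arxivarius)'

# Hypothetical helper: builds (but does not send) a GET request for an
# abstract page, identifying the client through the User-Agent header.
def build_abs_request(id)
  uri = URI("#{ABS_URL}#{id}")
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = SAMPLE_USER_AGENT
  [uri, request]
end

uri, request = build_abs_request('0000.00000') # dummy identifier

puts uri
puts request['User-Agent']
```

Sending a descriptive User-Agent with a project URL gives the site operator a way to identify and contact the client.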
Class Method Summary
Class Method Details
.fetch(id) ⇒ Object
# File 'lib/arxivarius/web_source.rb', line 16

def fetch(id)
  doc = ::Nokogiri::HTML(fetch_html(id))

  # No citation_title means the page is not an abstract page (e.g. an
  # arXiv "identifier not recognized" page served with a 200 status).
  return nil unless (doc, 'citation_title')

  build_paper(doc)
end