Module: Relaton::Gb::GbScraper

Extended by: Scraper

Defined in: lib/relaton/gb/gb_scraper.rb

Overview

Scraper for Chinese national (GB) standards, backed by the openstd.samr.gov.cn standards search service.

Constant Summary

SEARCH_URL = "https://openstd.samr.gov.cn/bzgk/gb/std_list"

DOC_URL = "http://openstd.samr.gov.cn/bzgk/gb/newGbInfo?hcno="
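As a rough illustration of how these constants are combined, the sketch below builds the search URL the way `scrape_page` does (the standard code goes into the `p.p2` query parameter) and the document URL the way `scrape_doc` does (the page id is appended to `DOC_URL`). The helper names `search_url` and `doc_url` are hypothetical, and the sketch uses stdlib `URI.encode_www_form_component` in place of `CGI.escape`; both encode a code such as `GB/T 20223` as `GB%2FT+20223`.

```ruby
require "uri"

# Constants copied from Relaton::Gb::GbScraper.
SEARCH_URL = "https://openstd.samr.gov.cn/bzgk/gb/std_list"
DOC_URL = "http://openstd.samr.gov.cn/bzgk/gb/newGbInfo?hcno="

# Hypothetical helper: search URL for a standard code, mirroring scrape_page.
def search_url(code)
  "#{SEARCH_URL}?p.p2=#{URI.encode_www_form_component(code)}"
end

# Hypothetical helper: document URL for a page id, mirroring scrape_doc.
def doc_url(pid)
  DOC_URL + pid
end

puts search_url("GB/T 20223")
# prints "https://openstd.samr.gov.cn/bzgk/gb/std_list?p.p2=GB%2FT+20223"
puts doc_url("0123ABCD") # "0123ABCD" is a made-up page id
```

Note that `DOC_URL` uses `http` while `SEARCH_URL` uses `https`; the sketch keeps both exactly as defined.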

Constants included from Scraper

Scraper::STAGES

Class Method Summary

.agent ⇒ Object
.scrape_doc(hit) ⇒ RelatonGb::GbBibliographicItem
.scrape_page(text) ⇒ RelatonGb::HitCollection

Methods included from Scraper

create_org_name, get_contributors, get_docid, get_status, get_titles, scrapped_data
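Because `GbScraper` is extended by `Scraper`, the methods above become module-level (singleton) methods of `GbScraper`. That is why `scrape_doc` can call `scrapped_data` unqualified: inside a module method `self` is `GbScraper`, and the call resolves through the extended module. A minimal sketch of that Ruby mechanism, with a hypothetical `normalize` method standing in for the real `Scraper` helpers (its body is a placeholder, not the gem's logic):

```ruby
# Stand-in for the shared Scraper module; `normalize` is hypothetical.
module Scraper
  def normalize(text)
    text.strip.downcase
  end
end

# Stand-in for GbScraper: `extend` adds Scraper's instance methods
# as singleton methods of this module.
module GbScraper
  extend Scraper

  def self.describe(raw_status)
    # self is GbScraper here, so `normalize` is found via Scraper.
    "status: #{normalize(raw_status)}"
  end
end

puts GbScraper.describe("  Current ")            # prints "status: current"
puts GbScraper.singleton_class.include?(Scraper) # prints "true"
```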

Class Method Details

.agent ⇒ `Object`

Returns the memoized `Mechanize` agent shared by `scrape_page` and `scrape_doc`.

# File 'lib/relaton/gb/gb_scraper.rb', line 35

def agent
  @agent ||= Mechanize.new
end
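`agent` memoizes a single `Mechanize` instance in a module-level instance variable with `||=`, so every scraper call reuses the same agent (and its cookie jar) instead of building a new one. The sketch below reproduces the pattern with a hypothetical `FakeAgent` class standing in for `Mechanize`, so it runs without the gem.

```ruby
# Hypothetical stand-in for Mechanize; counts how often it is built.
class FakeAgent
  @instances = 0

  class << self
    attr_accessor :instances
  end

  def initialize
    self.class.instances += 1
  end
end

module CachedAgent
  # The first call builds the agent; later calls return the cached one.
  def self.agent
    @agent ||= FakeAgent.new
  end
end

a = CachedAgent.agent
b = CachedAgent.agent
puts a.equal?(b)         # prints "true"
puts FakeAgent.instances # prints "1"
```

One caveat of this pattern: `||=` is not synchronized, so two threads hitting the first call at the same moment could each build an agent.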

.scrape_doc(hit) ⇒ `RelatonGb::GbBibliographicItem`

Parameters:

hit (RelatonGb::Hit) —

search hit whose `pid` (the standard's page id) is appended to `DOC_URL` to build the page URL

Returns:

(RelatonGb::GbBibliographicItem)

# File 'lib/relaton/gb/gb_scraper.rb', line 41

def scrape_doc(hit)
  src = DOC_URL + hit.pid
  doc = agent.get src
  ItemData.new(**scrapped_data(doc, src, hit))
rescue Mechanize::Error => e
  raise Relaton::RequestError, e.message
end
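`scrape_doc` fetches the standard's page and converts any `Mechanize::Error` into `Relaton::RequestError`, so callers only need to rescue the Relaton error type. A dependency-free sketch of that wrap-and-re-raise pattern, using hypothetical `FetchError`, `RequestError`, and `fake_fetch` stand-ins:

```ruby
# Hypothetical stand-ins for Mechanize::Error and Relaton::RequestError.
class FetchError < StandardError; end
class RequestError < StandardError; end

DOC_URL = "http://openstd.samr.gov.cn/bzgk/gb/newGbInfo?hcno="

# Hypothetical fetcher: fails when the page id is empty, else echoes the URL.
def fake_fetch(url)
  raise FetchError, "404 => #{url}" if url.end_with?("=")

  "<html>#{url}</html>"
end

def scrape_doc_sketch(pid)
  src = DOC_URL + pid
  fake_fetch(src)
rescue FetchError => e
  # Re-raise as the library-level error, keeping the original message.
  raise RequestError, e.message
end

puts scrape_doc_sketch("0123ABCD")
begin
  scrape_doc_sketch("")
rescue RequestError => e
  puts "RequestError: #{e.message}"
end
```

Because the new exception is raised inside a `rescue` clause, Ruby records the original error as its `cause`, so the low-level details remain available for debugging.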

.scrape_page(text) ⇒ `RelatonGb::HitCollection`

Parameters:

text (String) —

code of the standard to search for

Returns:

(RelatonGb::HitCollection)

# File 'lib/relaton/gb/gb_scraper.rb', line 18

def scrape_page(text) # rubocop:disable Metrics/MethodLength, Metrics/AbcSize
  doc = agent.get("#{SEARCH_URL}?p.p2=#{CGI.escape(text)}")
  hits = doc.xpath(
    "//table[contains(@class, 'result_list')]/tbody[2]/tr",
  ).map do |h|
    ref = h.at "./td[2]/a"
    pid = ref[:onclick].match(/[0-9A-F]+/).to_s
    status = h.at("./td[7]").text.strip
    rdate = h.at("./td[8]").text.strip
    Hit.new pid: pid, docref: ref.text, scraper: self,
            release_date: rdate, status: status
  end
  HitCollection.new hits.sort_by(&:release_date).reverse
rescue Mechanize::Error => e
  raise Relaton::RequestError, e.message
end
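In `scrape_page`, each row of the second `tbody` in the `result_list` table yields one `Hit`: the page id is the first run of uppercase hexadecimal characters in the link's `onclick` attribute, the status comes from the seventh cell, and the release date from the eighth. The hits are then ordered newest first. The sketch below replays that extraction and ordering on hypothetical pre-parsed rows (plain hashes instead of Nokogiri nodes, with made-up ids, codes, and dates), using a `Struct` in place of `Hit` and assuming the site lists dates as `YYYY-MM-DD`.

```ruby
# Stand-in for Hit; only the fields scrape_page sets.
HitSketch = Struct.new(:pid, :docref, :release_date, :status, keyword_init: true)

# Hypothetical rows, as if already read from td[2]/a, td[7] and td[8].
rows = [
  { onclick: "showInfo('0A1B2C')", text: "GB/T 0000-2009", status: "abolished", date: "2009-06-17" },
  { onclick: "showInfo('3D4E5F')", text: "GB/T 0000-2020", status: "current", date: "2020-03-31" },
]

hits = rows.map do |r|
  # First run of uppercase hex characters, as in scrape_page.
  pid = r[:onclick].match(/[0-9A-F]+/).to_s
  HitSketch.new(pid: pid, docref: r[:text], release_date: r[:date], status: r[:status])
end

# ISO dates sort correctly as strings; reverse puts the newest first.
sorted = hits.sort_by(&:release_date).reverse
sorted.each { |h| puts "#{h.docref} #{h.pid} #{h.status}" }
```

One consequence of the regex: if the `onclick` handler's name contained a capital A to F (say `showDetail`), the first match would be that single letter rather than the page id.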
