Class: Oddb2xml::Chapter70xtractor
- Defined in:
- lib/oddb2xml/chapter_70_hack.rb
Constant Summary
- LIMITATIONS =
{
  "L" => "Kostenübernahme nur nach vorgängiger allergologischer Abklärung.",
  "L1" => "Eine Flasche zu 20 ml Urtinktur einer bestimmten Pflanze pro Monat.",
  "L1, L2" => "Eine Flasche zu 20 ml Urtinktur einer bestimmten Pflanze pro Monat. Für Aesculus, Carduus Marianus, Ginkgo, Hedera helix, Hypericum perforatum, Lavandula, Rosmarinus officinalis, Taraxacum officinale.",
  "L3" => "Alle drei Monate wird eine Verordnung/Originalpackung pro Mittel vergütet."
}
Instance Attribute Summary
Attributes inherited from Extractor
Class Method Summary
- .items ⇒ Object
- .parse(html_file = "http://www.spezialitaetenliste.ch/varia_De.htm") ⇒ Object
- .parse_td(elem) ⇒ Object
Methods inherited from Extractor
Constructor Details
This class inherits a constructor from Oddb2xml::Extractor
Class Method Details
.items ⇒ Object
# File 'lib/oddb2xml/chapter_70_hack.rb', line 31

def self.items
  @@items
end
.parse(html_file = "http://www.spezialitaetenliste.ch/varia_De.htm") ⇒ Object
# File 'lib/oddb2xml/chapter_70_hack.rb', line 35

def self.parse(html_file = "http://www.spezialitaetenliste.ch/varia_De.htm")
  data = Hash.new { |h, k| h[k] = [] }
  Ox.default_options = { mode: :generic, effort: :tolerant, smart: true }
  parsed = Ox.load(Oddb2xml.uri_open(html_file).read, mode: :hash_no_attrs)
  res = parsed.values.first["body"] if parsed.respond_to?(:values) && parsed.values.first.is_a?(Hash)
  result = []
  idx = 0
  @@items = {}
  unless res.respond_to?(:values)
    warn "Chapter70: varia page has no <body> to parse (got #{res.class}); skipping"
    return []
  end
  # The varia page used to expose the chapter-70 table as static HTML. It
  # is now a JavaScript single-page app whose <body> only contains an empty
  # <sl-root> shell, so there is no data table to walk. Each entry yielded
  # by iterating a Hash is a [tag, content] pair; a real row carries a Hash
  # of cells, while stray nodes (e.g. <script>) carry an Array. Skip the
  # latter so a redesigned/empty page degrades to "no items" instead of
  # raising NoMethodError. See GitHub issue #118.
  rows = res.values.last
  unless rows.respond_to?(:each)
    warn "Chapter70: varia page has no parseable rows (got #{rows.class}); skipping"
    return []
  end
  rows.each do |item|
    cells = item.is_a?(Hash) ? item.values.first : nil
    next unless cells.respond_to?(:each)
    cells.each do |sub_elem|
      what = Chapter70xtractor.parse_td(sub_elem)
      idx += 1
      puts "#{idx}: xx #{what}" if $VERBOSE
      result << what
    end
  end
  result2 = result.find_all { |x| (x.is_a?(Array) && x.first.is_a?(String)) && x.first.to_i > 100 }
  warn "Chapter70: varia page yielded no chapter-70 products; skipping" if result2.empty?
  result2.each do |entry|
    data = {}
    pharma_code = entry.first
    ean13 = (Oddb2xml::FAKE_GTIN_START + pharma_code.to_s)
    german = if entry[2].encoding.to_s.eql?("ASCII-8BIT")
      CGI.unescape(entry[2].force_encoding("ISO-8859-1"))
    else
      entry[2]
    end
    @@items[ean13] = {
      data_origin: "Chapter70",
      line: entry.join(","),
      ean13: ean13,
      description: german,
      quantity: entry[3],
      pharmacode: pharma_code,
      pub_price: entry[4],
      limitation: entry[5],
      type: :pharma
    }
  end
  result2
end
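The last two steps of .parse (keeping only rows whose first cell is a numeric pharmacode above 100, then keying each item by a synthetic EAN-13) can be tried in isolation. The following is a minimal sketch, not the library's code: the rows are invented sample data, and FAKE_GTIN_START is a hypothetical stand-in, because the real value of Oddb2xml::FAKE_GTIN_START is not shown on this page.

```ruby
# Hypothetical prefix; the real Oddb2xml::FAKE_GTIN_START is not shown here.
FAKE_GTIN_START = "999999"

# Invented rows in the shape parse_td returns: flat arrays of cell strings.
rows = [
  ["Pharmacode", "Name"],                                       # header: "Pharmacode".to_i is 0
  ["2465312", "", "Sample tincture", "20 ml", "12.50", "L1"],   # product row
  [nil, "stray"],                                               # first cell is not a String
  nil                                                           # parse_td returned nil
]

# Same predicate as in .parse: Array, String first cell, numeric value > 100.
products = rows.find_all do |x|
  x.is_a?(Array) && x.first.is_a?(String) && x.first.to_i > 100
end

# Build the item hash keyed by prefix + pharmacode, as .parse does.
items = products.each_with_object({}) do |entry, acc|
  ean13 = FAKE_GTIN_START + entry.first.to_s
  acc[ean13] = {
    ean13: ean13,
    pharmacode: entry.first,
    description: entry[2],
    quantity: entry[3],
    pub_price: entry[4],
    limitation: entry[5]
  }
end

puts items.keys.inspect
```

Header rows and stray nodes drop out because String#to_i returns 0 for non-numeric text, so no separate header check is needed.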
.parse_td(elem) ⇒ Object
# File 'lib/oddb2xml/chapter_70_hack.rb', line 7

def self.parse_td(elem)
  begin
    values = elem.is_a?(Array) ? elem : elem.values
    res = values.flatten.collect { |x|
      if x.nil?
        nil
      else
        x.is_a?(Hash) ? x.values : x.gsub(/\r\n/, "").strip
      end
    }
    puts "parse_td returns: #{res}" if $VERBOSE
  rescue => exc
    puts "Unable to parse #{elem} #{exc}"
    # binding.pry
    return nil
  end
  res.flatten
  # .join("\t")
end
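To see what .parse_td produces for a single cell, the logic above can be copied into a standalone method and fed a small input. This is a sketch under assumptions: the method below is a free-standing copy (without the $VERBOSE output and writing the error to stderr via warn), and the sample cell is invented to resemble the Hash that Ox returns in :hash_no_attrs mode.

```ruby
# Standalone copy of the parse_td logic, for experimentation only.
def parse_td(elem)
  values = elem.is_a?(Array) ? elem : elem.values
  res = values.flatten.collect do |x|
    if x.nil?
      nil
    else
      # Nested hashes are unwrapped to their values; strings are cleaned.
      x.is_a?(Hash) ? x.values : x.gsub(/\r\n/, "").strip
    end
  end
  res.flatten
rescue => exc
  warn "Unable to parse #{elem} #{exc}"
  nil
end

# Invented sample cell: a string with CRLF, a nil, and a nested tag.
cell = { "td" => ["  2465312\r\n", nil, { "b" => "Sample tincture " }] }
parsed = parse_td(cell)
p parsed

# Anything that is neither an Array nor responds to #values yields nil.
bad = parse_td(42)
p bad
```

Note that only top-level strings are stripped: a string nested inside a Hash is unwrapped as-is, so "Sample tincture " keeps its trailing space. Callers also need to tolerate nil entries in the result and a nil return on failure, which is why .parse filters on `x.is_a?(Array) && x.first.is_a?(String)`.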