Module: Rospatent::PatentParser
Defined in: lib/rospatent/patent_parser.rb
Overview
Module for parsing the XML content of patent documents into structured formats.
Class Method Summary
- .parse_abstract(patent_data, format: :text, language: "ru") ⇒ String?
Extract and parse the abstract content from a patent document.
- .parse_description(patent_data, format: :text, language: "ru") ⇒ String, ...
Extract and parse the description content from a patent document.
Class Method Details
.parse_abstract(patent_data, format: :text, language: "ru") ⇒ String?
Extract and parse the abstract content from a patent document
    # File 'lib/rospatent/patent_parser.rb', line 15
    def self.parse_abstract(patent_data, format: :text, language: "ru")
      return nil unless patent_data && patent_data["abstract"] && patent_data["abstract"][language]

      abstract_xml = patent_data["abstract"][language]

      case format
      when :html
        # Extract the inner HTML content
        extract_inner_html(abstract_xml)
      else
        # Extract plain text
        extract_text_content(abstract_xml)
      end
    end
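The following is a minimal, self-contained sketch of how the guard clause and the `format:` dispatch behave. It does not load the gem: `AbstractSketch`, `inner_html`, and `text_content` are hypothetical stand-ins, and their regex-based behavior is an assumption, since the real `extract_inner_html` and `extract_text_content` helpers are not shown on this page.

```ruby
# Hypothetical stand-in for Rospatent::PatentParser.parse_abstract, so the
# example runs without the gem installed.
module AbstractSketch
  def self.parse_abstract(patent_data, format: :text, language: "ru")
    # Same guard as the original: nil when the data, the "abstract" key,
    # or the requested language is missing.
    abstract_xml = patent_data&.dig("abstract", language)
    return nil unless abstract_xml

    case format
    when :html then inner_html(abstract_xml)
    else text_content(abstract_xml)
    end
  end

  # Assumed behavior: drop the outermost element, keep nested markup.
  def self.inner_html(xml)
    xml.sub(/\A\s*<[^>]+>/, "").sub(/<\/[^>]+>\s*\z/, "").strip
  end

  # Assumed behavior: strip all tags and collapse whitespace.
  def self.text_content(xml)
    xml.gsub(/<[^>]+>/, " ").gsub(/\s+/, " ").strip
  end
end

# Sample data with an English abstract only (invented for illustration).
patent = {
  "abstract" => { "en" => "<abstract><p>A <b>solar</b> cell.</p></abstract>" }
}

text    = AbstractSketch.parse_abstract(patent, language: "en")
html    = AbstractSketch.parse_abstract(patent, format: :html, language: "en")
missing = AbstractSketch.parse_abstract(patent) # default language "ru" is absent

puts text            # plain text with tags removed
puts html            # markup inside the outer <abstract> element
puts missing.inspect # nil
```

Because the method returns nil instead of raising when the requested language is absent, callers will probably want to check for nil or fall back to another language explicitly.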
.parse_description(patent_data, format: :text, language: "ru") ⇒ String, ...
Extract and parse the description content from a patent document
    # File 'lib/rospatent/patent_parser.rb', line 41
    def self.parse_description(patent_data, format: :text, language: "ru")
      unless patent_data && patent_data["description"] && patent_data["description"][language]
        return nil
      end

      description_xml = patent_data["description"][language]

      case format
      when :html
        # Extract the inner HTML content
        extract_inner_html(description_xml)
      when :sections
        # Split the description into numbered sections
        extract_description_sections(description_xml)
      else
        # Extract plain text
        extract_text_content(description_xml)
      end
    end
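To illustrate the three-way dispatch, including the `:sections` branch, here is a hedged sketch that runs without the gem. `DescriptionSketch` and its helpers are hypothetical. In particular, the shape returned for `:sections` (one hash per `<p>` element with a 1-based `:number`) is an assumption made for illustration, because this page does not show what `extract_description_sections` returns.

```ruby
# Hypothetical stand-in for Rospatent::PatentParser.parse_description.
module DescriptionSketch
  def self.parse_description(patent_data, format: :text, language: "ru")
    description_xml = patent_data&.dig("description", language)
    return nil unless description_xml

    case format
    when :html     then inner_html(description_xml)
    when :sections then sections(description_xml)
    else strip_tags(description_xml)
    end
  end

  # Assumed behavior: drop the outermost element, keep nested markup.
  def self.inner_html(xml)
    xml.sub(/\A\s*<[^>]+>/, "").sub(/<\/[^>]+>\s*\z/, "").strip
  end

  # Assumed behavior: strip all tags and collapse whitespace.
  def self.strip_tags(xml)
    xml.gsub(/<[^>]+>/, " ").gsub(/\s+/, " ").strip
  end

  # Assumption: one section per <p> element, numbered from 1.
  def self.sections(xml)
    xml.scan(/<p\b[^>]*>(.*?)<\/p>/m).each_with_index.map do |(body), index|
      { number: index + 1, content: strip_tags(body) }
    end
  end
end

# Sample data invented for illustration.
patent = {
  "description" => {
    "en" => "<description><p>Field of the invention.</p>" \
            "<p>The device has a <i>rotor</i> and a stator.</p></description>"
  }
}

puts DescriptionSketch.parse_description(patent, language: "en")

DescriptionSketch.parse_description(patent, format: :sections, language: "en").each do |section|
  puts "#{section[:number]}. #{section[:content]}"
end
```

Any `format:` value other than `:html` or `:sections` falls through to the plain-text branch, matching the `else` clause in the original method.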