Module: Rospatent::PatentParser

Defined in:
lib/rospatent/patent_parser.rb

Overview

Module for parsing patent documents’ XML content into structured formats

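Class Method Summary

  • .parse_abstract(patent_data, format: :text, language: "ru") ⇒ String?

    Extract and parse the abstract content from a patent document

  • .parse_description(patent_data, format: :text, language: "ru") ⇒ String, ...

    Extract and parse the description content from a patent document
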
Class Method Details

.parse_abstract(patent_data, format: :text, language: "ru") ⇒ String?

Extract and parse the abstract content from a patent document

Examples:

Get plain text abstract

abstract = PatentParser.parse_abstract(patent_doc)

Get HTML abstract in English

abstract_html = PatentParser.parse_abstract(patent_doc, format: :html, language: "en")

Parameters:

  • patent_data (Hash)

    The patent document data returned by the Client#patent method

  • format (Symbol) (defaults to: :text)

    The desired output format (:text or :html); any other value is treated as :text

  • language (String) (defaults to: "ru")

    The language code (e.g., “ru”, “en”)

Returns:

  • (String, nil)

    The parsed abstract content in the requested format, or nil if patent_data has no abstract for the requested language



# File 'lib/rospatent/patent_parser.rb', line 15

def self.parse_abstract(patent_data, format: :text, language: "ru")
  return nil unless patent_data && patent_data["abstract"] && patent_data["abstract"][language]

  abstract_xml = patent_data["abstract"][language]

  case format
  when :html
    # Extract the inner HTML content
    extract_inner_html(abstract_xml)
  else
    # Extract plain text
    extract_text_content(abstract_xml)
  end
end
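
The guard clause above implies a particular shape for patent_data: a Hash with string keys, where "abstract" maps language codes to XML strings. The following is a minimal sketch of that lookup only. AbstractLookupSketch and the sample data are hypothetical, and the sketch returns the raw stored string instead of calling the extract helpers, whose implementation is not shown here.

```ruby
# Hypothetical stand-in that mirrors only the guard clause of parse_abstract.
# It returns the raw stored string rather than calling the real extract helpers.
module AbstractLookupSketch
  def self.raw_abstract(patent_data, language: "ru")
    return nil unless patent_data && patent_data["abstract"] && patent_data["abstract"][language]

    patent_data["abstract"][language]
  end
end

# Made-up sample data; a real Client#patent response will carry more fields.
sample_doc = { "abstract" => { "ru" => "<p>Пример реферата</p>" } }

AbstractLookupSketch.raw_abstract(sample_doc)                  # the "ru" entry
AbstractLookupSketch.raw_abstract(sample_doc, language: "en")  # nil: no "en" entry
AbstractLookupSketch.raw_abstract({ abstract: {} })            # nil: symbol keys are not consulted
AbstractLookupSketch.raw_abstract(nil)                         # nil: no document at all
```

Because a missing language yields nil rather than an error, callers that need a fallback (for example, trying "en" when "ru" is absent) have to implement it themselves.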

.parse_description(patent_data, format: :text, language: "ru") ⇒ String, ...

Extract and parse the description content from a patent document

Examples:

Get plain text description

description = PatentParser.parse_description(patent_doc)

Get HTML description

description_html = PatentParser.parse_description(patent_doc, format: :html)

Get description split into sections

sections = PatentParser.parse_description(patent_doc, format: :sections)

Parameters:

  • patent_data (Hash)

    The patent document data returned by the Client#patent method

  • format (Symbol) (defaults to: :text)

    The desired output format (:text, :html, or :sections); any other value is treated as :text

  • language (String) (defaults to: "ru")

    The language code (e.g., “ru”, “en”)

Returns:

  • (String, Array, nil)

    The parsed description content in the requested format, or nil if patent_data has no description for the requested language



# File 'lib/rospatent/patent_parser.rb', line 41

def self.parse_description(patent_data, format: :text, language: "ru")
  unless patent_data && patent_data["description"] && patent_data["description"][language]
    return nil
  end

  description_xml = patent_data["description"][language]

  case format
  when :html
    # Extract the inner HTML content
    extract_inner_html(description_xml)
  when :sections
    # Split the description into numbered sections
    extract_description_sections(description_xml)
  else
    # Extract plain text
    extract_text_content(description_xml)
  end
end
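
The case statement above dispatches on format and falls through to plain text for any unrecognized value. The sketch below illustrates that dispatch pattern in isolation. DescriptionFormatSketch and its regex-based branches are assumptions made for illustration: the real extract_inner_html, extract_description_sections, and extract_text_content helpers are not shown in this page and may behave differently, including in what the elements of the returned Array look like.

```ruby
# Hypothetical illustration of the format dispatch in parse_description.
# The regex handling here is a simplification, not the library's own parsing.
module DescriptionFormatSketch
  TAG = /<[^>]+>/

  def self.render(xml, format: :text)
    case format
    when :html
      # Assumption: hand the markup back unchanged.
      xml
    when :sections
      # Assumption: treat each <p> element as one section and strip inline tags.
      xml.scan(%r{<p[^>]*>(.*?)</p>}m).flatten.map { |body| body.gsub(TAG, "").strip }
    else
      # Everything else, including unknown formats, becomes plain text.
      xml.gsub(TAG, " ").squeeze(" ").strip
    end
  end
end

xml = "<p>First part.</p><p>Second <b>part</b>.</p>"

DescriptionFormatSketch.render(xml, format: :sections) # => ["First part.", "Second part."]
DescriptionFormatSketch.render(xml, format: :html)     # the input string, unchanged
DescriptionFormatSketch.render(xml, format: :markdown) # same result as format: :text
```

One consequence of the else branch is that a misspelled format symbol does not raise; it silently produces plain text, so callers may want to validate the symbol before passing it in.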