Class: Html2rss::HtmlExtractor::HeadingExtractor

Inherits:
Object
Defined in:
lib/html2rss/html_extractor/heading_extractor.rb

Overview

HeadingExtractor identifies and returns the best heading element within a container.

Constant Summary

HEADING_TAGS = HtmlExtractor::HEADING_TAGS

    Heading tags used to prioritize title extraction.
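
The method source on this page passes HEADING_TAGS.join(',') to a CSS query. A minimal sketch of what that join produces is shown here. The tag list is an assumption for illustration only; the actual value is defined in HtmlExtractor::HEADING_TAGS and is not shown on this page.

```ruby
# Assumed value for illustration; the real list is HtmlExtractor::HEADING_TAGS.
ASSUMED_HEADING_TAGS = %w[h1 h2 h3 h4 h5 h6].freeze

# Joining with commas yields a CSS selector group that matches any listed tag.
selector = ASSUMED_HEADING_TAGS.join(',')
puts selector
```

A selector group like this lets a single query collect all heading levels at once, rather than issuing one query per tag.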

Class Method Summary

Class Method Details

.call(article_tag, fallback_anchorless:, selected_anchor:) ⇒ Nokogiri::XML::Node?

Returns the best heading node within article_tag, or nil if none is found.

Parameters:

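  • article_tag (Nokogiri::XML::Element)

    container node whose descendants are searched for heading tags

  • fallback_anchorless (Boolean)

    whether to fall back to fallback_heading when the container has no heading tags; the fallback runs only if selected_anchor is nil

  • selected_anchor (Nokogiri::XML::Node, nil)

    anchor element, if one was selected; a non-nil value disables the fallback search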

Returns:

  • (Nokogiri::XML::Node, nil)

    the heading node, if found



# File 'lib/html2rss/html_extractor/heading_extractor.rb', line 17

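def call(article_tag, fallback_anchorless:, selected_anchor:)
  # Collect every heading tag inside the container.
  tags = article_tag.css(HEADING_TAGS.join(','))
  if tags.any?
    select_best_heading(tags)
  elsif fallback_anchorless && selected_anchor.nil?
    # No heading tags and no selected anchor: try the fallback search.
    fallback_heading(article_tag)
  end
end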
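
The branching in .call can be tried out with a small, dependency-free sketch. Everything below is an illustration under stated assumptions, not the library's code: Node is a hypothetical stand-in for a Nokogiri node (it mimics only a comma-separated tag lookup), the tag list is assumed, and the bodies of select_best_heading and fallback_heading are invented policies, since their real implementations are not shown on this page.

```ruby
# Hypothetical stand-in for a parsed HTML node; the real method receives Nokogiri nodes.
Node = Struct.new(:name, :text, :children) do
  # Mimics only the slice of #css needed here: a comma-separated list of
  # tag names, matched against all descendants in document order.
  def css(selector)
    names = selector.split(',').map(&:strip)
    descendants.select { |node| names.include?(node.name) }
  end

  def descendants
    (children || []).flat_map { |child| [child, *child.descendants] }
  end
end

module HeadingSketch
  # Assumed value; the real constant is HtmlExtractor::HEADING_TAGS.
  HEADING_TAGS = %w[h1 h2 h3 h4 h5 h6].freeze

  module_function

  # Same control flow as the documented method.
  def call(article_tag, fallback_anchorless:, selected_anchor:)
    tags = article_tag.css(HEADING_TAGS.join(','))
    if tags.any?
      select_best_heading(tags)
    elsif fallback_anchorless && selected_anchor.nil?
      fallback_heading(article_tag)
    end
  end

  # Hypothetical policy: prefer the highest-ranked tag (h1 before h2, and so on).
  def select_best_heading(tags)
    tags.min_by { |tag| HEADING_TAGS.index(tag.name) }
  end

  # Hypothetical policy: first descendant with non-empty text.
  def fallback_heading(article_tag)
    article_tag.descendants.find { |node| !node.text.to_s.strip.empty? }
  end
end

with_heading = Node.new('article', nil, [
  Node.new('p', 'Intro'),
  Node.new('h3', 'Minor'),
  Node.new('h2', 'Major')
])
without_heading = Node.new('article', nil, [Node.new('p', 'Only paragraph')])

best = HeadingSketch.call(with_heading, fallback_anchorless: true, selected_anchor: nil)
fallback = HeadingSketch.call(without_heading, fallback_anchorless: true, selected_anchor: nil)
skipped = HeadingSketch.call(without_heading, fallback_anchorless: true, selected_anchor: :anchor)

puts best.text      # Major
puts fallback.text  # Only paragraph
p skipped           # nil
```

The third call shows the guard in the elsif branch: with a non-nil selected_anchor, the fallback is skipped and the if expression evaluates to nil, which matches the nilable return type in the signature.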