Class: Html2rss::AutoSource::Scraper::LinkHeuristics::HrefExtractor

Inherits:
Object
Defined in:
lib/html2rss/auto_source/scraper/link_heuristics.rb

Overview

Extracts a normalized href from a Nokogiri anchor or raw href value.

Constant Summary

HREF_BASE_PATTERN =
  /\A([^#]*)/

Regexp that captures everything before the first '#'.
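As a rough illustration of what the pattern captures, the snippet below redefines the regexp locally (so it runs without html2rss loaded) and applies it with `String#[]`. The sample hrefs and variable names are made up for the example.

```ruby
# Local copy of the pattern, redefined here so the snippet runs on its own.
HREF_BASE_PATTERN = /\A([^#]*)/

# String#[] with a regexp and a capture index returns that capture group.
with_fragment    = '/articles/42#comments'[HREF_BASE_PATTERN, 1]
without_fragment = '/articles/42'[HREF_BASE_PATTERN, 1]
fragment_only    = '#top'[HREF_BASE_PATTERN, 1]

puts with_fragment.inspect    # "/articles/42"
puts without_fragment.inspect # "/articles/42"
puts fragment_only.inspect    # ""
```

Because `[^#]*` can match zero characters, the pattern matches any string, including one that starts with '#'; in that case the capture is an empty string rather than nil.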

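Class Method Summary

  • .call(anchor_or_href) ⇒ String?

    Href without fragment, or nil when blank.

Instance Method Summary

  • #call ⇒ String?

    Href without fragment, or nil when blank.

  • #initialize(anchor_or_href) ⇒ HrefExtractor

    Returns a new instance of HrefExtractor.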

Constructor Details

#initialize(anchor_or_href) ⇒ HrefExtractor

Returns a new instance of HrefExtractor.

Parameters:

  • anchor_or_href (Nokogiri::XML::Element, String, #to_s)

    anchor element or href-like value



# File 'lib/html2rss/auto_source/scraper/link_heuristics.rb', line 56

def initialize(anchor_or_href)
  @anchor_or_href = anchor_or_href
end

Class Method Details

.call(anchor_or_href) ⇒ String?

Returns href without fragment, or nil when blank.

Parameters:

  • anchor_or_href (Nokogiri::XML::Element, String, #to_s)

    anchor element or href-like value

Returns:

  • (String, nil)

    href without fragment, or nil when blank



# File 'lib/html2rss/auto_source/scraper/link_heuristics.rb', line 53

def self.call(anchor_or_href) = new(anchor_or_href).call

Instance Method Details

#call ⇒ String?

Returns href without fragment, or nil when blank.

Returns:

  • (String, nil)

    href without fragment, or nil when blank



# File 'lib/html2rss/auto_source/scraper/link_heuristics.rb', line 61

def call
  href = case @anchor_or_href
         when Nokogiri::XML::Node
           @anchor_or_href['href']
         else
           @anchor_or_href
         end

  return unless href

  # Extract base part before # and strip whitespace
  base = href.to_s[HREF_BASE_PATTERN, 1].strip
  base unless base.empty?
end
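
To see the branches of `#call` in isolation, here is a minimal sketch that mirrors the logic above without requiring Nokogiri or html2rss. `HrefSketch`, `FakeAnchor`, and `extract` are hypothetical names introduced for illustration only; `FakeAnchor` merely stands in for a Nokogiri node by responding to `[]`, and the sketch matches on that class instead of `Nokogiri::XML::Node`.

```ruby
module HrefSketch
  # Same pattern as HREF_BASE_PATTERN, copied so the sketch is standalone.
  BASE_PATTERN = /\A([^#]*)/

  # Hypothetical stand-in for a Nokogiri anchor: attribute lookup via #[].
  class FakeAnchor
    def initialize(attributes)
      @attributes = attributes
    end

    def [](name)
      @attributes[name]
    end
  end

  # Mirrors the documented #call: pick the href, drop the fragment,
  # strip whitespace, and return nil for missing or blank values.
  def self.extract(anchor_or_href)
    href = case anchor_or_href
           when FakeAnchor then anchor_or_href['href']
           else anchor_or_href
           end
    return unless href

    base = href.to_s[BASE_PATTERN, 1].strip
    base unless base.empty?
  end
end

anchor = HrefSketch::FakeAnchor.new('href' => ' /posts/1#section-2 ')
p HrefSketch.extract(anchor)     # "/posts/1"
p HrefSketch.extract('/posts/1') # "/posts/1"
p HrefSketch.extract('#only')    # nil
p HrefSketch.extract(nil)        # nil
```

Note that a fragment-only href such as `'#only'` yields nil, since nothing remains once the fragment is removed.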