Class: Html2rss::AutoSource::Scraper::LinkHeuristics::HrefExtractor
- Inherits: Object
- Defined in: lib/html2rss/auto_source/scraper/link_heuristics.rb
Overview
Extracts a normalized href (fragment removed, surrounding whitespace stripped) from a Nokogiri anchor node or a raw href value.
Constant Summary
- HREF_BASE_PATTERN = /\A([^#]*)/
  Regexp to capture everything before the first ‘#’.
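The pattern is anchored at the start of the string and captures the longest run of characters that are not '#'. Combined with String#[] and capture index 1, it returns the part of an href that precedes the fragment. The snippet below copies the constant locally so it runs on its own; the sample hrefs are illustrative.

```ruby
# Local copy of the documented pattern so the snippet runs standalone.
HREF_BASE_PATTERN = /\A([^#]*)/

# String#[](regexp, capture) returns the text matched by capture group 1.
with_fragment = '/articles/42?page=2#comments'[HREF_BASE_PATTERN, 1]
no_fragment   = '/articles/42'[HREF_BASE_PATTERN, 1]
only_fragment = '#top'[HREF_BASE_PATTERN, 1]

p with_fragment  # => "/articles/42?page=2"
p no_fragment    # => "/articles/42"
p only_fragment  # => "" (the group matches zero characters)
```

Because [^#]* may match zero characters, the pattern always matches and the capture is never nil, only possibly empty. That is why #call checks empty? rather than nil? on the result.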
Class Method Summary
- .call(anchor_or_href) ⇒ String?
  Href without fragment, or nil when blank.
Instance Method Summary
- #call ⇒ String?
  Href without fragment, or nil when blank.
- #initialize(anchor_or_href) ⇒ HrefExtractor (constructor)
  A new instance of HrefExtractor.
Constructor Details
#initialize(anchor_or_href) ⇒ HrefExtractor
Returns a new instance of HrefExtractor.
    # File 'lib/html2rss/auto_source/scraper/link_heuristics.rb', line 56
    def initialize(anchor_or_href)
      @anchor_or_href = anchor_or_href
    end
Class Method Details
.call(anchor_or_href) ⇒ String?
Returns href without fragment, or nil when blank.
    # File 'lib/html2rss/auto_source/scraper/link_heuristics.rb', line 53
    def self.call(anchor_or_href) = new(anchor_or_href).call
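.call is a convenience wrapper: it builds an instance and immediately invokes #call, so callers never hold the intermediate object. The sketch below shows the same endless-method idiom (Ruby 3.0+) on a hypothetical FragmentStripper class that accepts only strings; the class name and its simplified body are illustrative and not part of html2rss.

```ruby
# Hypothetical class illustrating the `def self.call(...) = new(...).call` idiom
# used by HrefExtractor.call. Only String (or nil) input is handled here; there
# is no Nokogiri node support.
class FragmentStripper
  PATTERN = /\A([^#]*)/

  # Class-level entry point: construct an instance, then delegate to #call.
  def self.call(href) = new(href).call

  def initialize(href)
    @href = href
  end

  # Returns the href without its fragment, or nil when nothing is left.
  def call
    base = @href.to_s[PATTERN, 1].strip
    base unless base.empty?
  end
end

p FragmentStripper.call('/feed.xml#latest')  # => "/feed.xml"
p FragmentStripper.call('#latest')           # => nil
```

Keeping the logic in an instance method leaves room for private helpers that share state through instance variables, while the class method keeps each call site to a single expression.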
Instance Method Details
#call ⇒ String?
Returns the href with any fragment and surrounding whitespace removed, or nil when the href is missing, blank, or fragment-only (e.g. "#top").
    # File 'lib/html2rss/auto_source/scraper/link_heuristics.rb', line 61
    def call
      href = case @anchor_or_href
             when Nokogiri::XML::Node
               @anchor_or_href['href']
             else
               @anchor_or_href
             end

      return unless href

      # Extract base part before # and strip whitespace
      base = href.to_s[HREF_BASE_PATTERN, 1].strip
      base unless base.empty?
    end
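The branches of #call can be traced with a standalone sketch. The real method checks for Nokogiri::XML::Node; the hypothetical normalize_href below swaps in a plain Hash as a stand-in for an anchor (an assumption made only so the example runs without the nokogiri gem), while the string handling follows the source above.

```ruby
# Standalone sketch of the normalization performed by HrefExtractor#call.
# Assumption: an anchor is modelled as a Hash with an 'href' key instead of a
# Nokogiri::XML::Node, so the snippet runs without the nokogiri gem.
HREF_BASE_PATTERN = /\A([^#]*)/

def normalize_href(anchor_or_href)
  href = anchor_or_href.is_a?(Hash) ? anchor_or_href['href'] : anchor_or_href
  return unless href # nil input, or an anchor without an href attribute

  # Keep everything before the first '#', then trim surrounding whitespace.
  base = href.to_s[HREF_BASE_PATTERN, 1].strip
  base unless base.empty?
end

examples = [
  '/posts/1#comments',          # fragment dropped
  '  /posts/1  ',               # whitespace trimmed
  '#top',                       # fragment-only link
  '   ',                        # blank string
  nil,                          # no value at all
  { 'href' => '/about#team' },  # anchor-like object with an href
  {}                            # anchor-like object without an href
]

examples.each { |input| puts "#{input.inspect} -> #{normalize_href(input).inspect}" }
```

Whitespace-only and fragment-only hrefs both end up as nil, so in-page anchors are filtered out rather than returned as empty strings.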