Class: Html2rss::AutoSource::Scraper::LinkHeuristics::LeadingSegments
- Inherits: Object
- Hierarchy: Object, Html2rss::AutoSource::Scraper::LinkHeuristics::LeadingSegments
- Defined in:
- lib/html2rss/auto_source/scraper/link_heuristics.rb
Overview
Classifies the route context given by the path segments that precede the final segment.
Instance Method Summary
- #all_junk? ⇒ Boolean
  True when every leading segment is utility chrome.
- #initialize(segments) ⇒ LeadingSegments (constructor)
  A new instance of LeadingSegments.
- #trusted_post_context? ⇒ Boolean
  True when leading segments provide article context.
Constructor Details
#initialize(segments) ⇒ LeadingSegments
Returns a new instance of LeadingSegments.
    # File 'lib/html2rss/auto_source/scraper/link_heuristics.rb', line 348

    def initialize(segments)
      @segments = segments[0...-1]
    end
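The slice in the constructor is what makes the segments "leading": the exclusive range `0...-1` keeps every element except the last. A minimal sketch of that behavior on plain arrays follows; the example paths are hypothetical and not taken from the library.

```ruby
# Hypothetical input: the path "/blog/2024/my-post" split into segments.
segments = %w[blog 2024 my-post]

# The exclusive range 0...-1 keeps everything except the final element.
leading = segments[0...-1]

# A single-segment path has no leading segments, so the slice is empty.
single = %w[about][0...-1]
```

Here `leading` is `["blog", "2024"]` and `single` is `[]`, so a one-segment path reaches the predicates with an empty `@segments`.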
Instance Method Details
#all_junk? ⇒ Boolean
Returns true when every leading segment is utility chrome.
    # File 'lib/html2rss/auto_source/scraper/link_heuristics.rb', line 353

    def all_junk?
      junk_segments = PathClassifier::SEGMENT_SETS.fetch(:high_confidence_junk)

      @segments.any? && @segments.all? { |segment| junk_segments.include?(segment) }
    end
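The `@segments.any?` guard matters because `all?` is vacuously true on an empty array. The sketch below reproduces the same predicate as a standalone method. The `JUNK` set is a hypothetical stand-in: the real contents of `PathClassifier::SEGMENT_SETS.fetch(:high_confidence_junk)` are not shown on this page.

```ruby
require 'set'

# Hypothetical stand-in for the :high_confidence_junk segment set.
JUNK = Set['login', 'search', 'tag'].freeze

# Same shape as #all_junk?, but taking the leading segments as an argument.
def all_junk?(segments, junk = JUNK)
  segments.any? && segments.all? { |segment| junk.include?(segment) }
end

only_junk = all_junk?(%w[login search]) # every segment is in the set
mixed     = all_junk?(%w[login blog])   # "blog" is not in the set
no_leading = all_junk?([])              # guard short-circuits on empty input
vacuous   = [].all? { |segment| JUNK.include?(segment) }
```

Under these assumptions `only_junk` is `true`, `mixed` and `no_leading` are `false`, and `vacuous` is `true`, which is the case the guard exists to exclude.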
#trusted_post_context? ⇒ Boolean
Returns true when leading segments provide article context.
    # File 'lib/html2rss/auto_source/scraper/link_heuristics.rb', line 360

    def trusted_post_context?
      content_segments = PathClassifier::SEGMENT_SETS.fetch(:content)
      context_segments = PathClassifier::SEGMENT_SETS.fetch(:deep_post_context)

      @segments.any? do |segment|
        content_segments.include?(segment) ||
          segment.match?(PathClassifier::YEARISH_SEGMENT) ||
          context_segments.include?(segment)
      end
    end
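Unlike `#all_junk?`, this predicate needs only one matching segment. The following sketch mirrors its three checks as a standalone method. `CONTENT`, `DEEP_POST_CONTEXT`, and `YEARISH` are hypothetical stand-ins, since the actual values of the `PathClassifier` sets and of `YEARISH_SEGMENT` are not shown on this page.

```ruby
require 'set'

# Hypothetical stand-ins for the PathClassifier constants.
CONTENT = Set['blog', 'news', 'articles'].freeze
DEEP_POST_CONTEXT = Set['posts', 'stories'].freeze
YEARISH = /\A(?:19|20)\d{2}\z/ # assumed: a four-digit year such as "2024"

# Same shape as #trusted_post_context?, taking the leading segments as an argument.
def trusted_post_context?(segments)
  segments.any? do |segment|
    CONTENT.include?(segment) ||
      segment.match?(YEARISH) ||
      DEEP_POST_CONTEXT.include?(segment)
  end
end

by_content = trusted_post_context?(%w[blog])             # matches the content set
by_year    = trusted_post_context?(%w[archive 2024])     # matches the year pattern
untrusted  = trusted_post_context?(%w[account settings]) # no check matches
no_leading = trusted_post_context?([])                   # any? is false on empty input
```

With these assumed values, `by_content` and `by_year` are `true`, while `untrusted` and `no_leading` are `false`.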