Class: Html2rss::Articles::Deduplicator
- Inherits:
- Object
- Object
- Html2rss::Articles::Deduplicator
- Defined in:
- lib/html2rss/articles/deduplicator.rb
Overview
Deduplicates a list of articles while preserving their original order.
The deduplicator prefers each article’s URL (combined with its ID when available) to determine uniqueness. When no URL is present, it falls back to the article ID, then to the GUID enriched with title and description metadata. If none of these identifiers is available, it falls back to the article object’s hash, so the entry is kept rather than dropped.
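The fallback order described above can be sketched roughly as follows. This is an illustration only, not the library's code: the `SketchArticle` struct, its field names, the `sketch_fingerprint_for` helper, and the array shape of the fingerprint are all assumptions made for this example, since the private fingerprint helper is not shown on this page.

```ruby
# Hypothetical stand-in for Html2rss::RssBuilder::Article; field names are assumed.
SketchArticle = Struct.new(:url, :id, :guid, :title, :description, keyword_init: true)

# Hypothetical helper: returns a fingerprint following the priority in the
# overview (URL plus ID, then ID, then GUID with title and description),
# or nil when the article carries none of these identifiers.
def sketch_fingerprint_for(article)
  url  = article.url.to_s
  id   = article.id.to_s
  guid = article.guid.to_s

  if !url.empty?
    # Prefer the URL, combined with the ID when one is available.
    id.empty? ? [:url, url] : [:url, url, id]
  elsif !id.empty?
    [:id, id]
  elsif !guid.empty?
    # Enrich the GUID with title and description metadata.
    [:guid, guid, article.title, article.description]
  end
end

with_url = SketchArticle.new(url: 'https://example.com/a', id: '1')
sketch_fingerprint_for(with_url) # => [:url, "https://example.com/a", "1"]
```

When the helper returns nil, the caller would fall back to `article.hash`, as the `#call` source on this page does.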
Instance Method Summary
-
#call ⇒ Array<Html2rss::RssBuilder::Article>
Returns the list of unique articles, preserving the order of the original collection and keeping the first occurrence of a duplicate.
-
#initialize(articles) ⇒ Deduplicator
constructor
A new instance of Deduplicator.
Constructor Details
#initialize(articles) ⇒ Deduplicator
Returns a new instance of Deduplicator.
    # File 'lib/html2rss/articles/deduplicator.rb', line 20
    def initialize(articles)
      raise ArgumentError, 'articles must be provided' unless articles

      @articles = articles
    end
Instance Method Details
#call ⇒ Array<Html2rss::RssBuilder::Article>
Returns the list of unique articles, preserving the order of the original collection and keeping the first occurrence of a duplicate.
    # File 'lib/html2rss/articles/deduplicator.rb', line 30
    def call
      seen = Set.new

      articles.filter do |article|
        fingerprint = deduplication_fingerprint_for(article) || article.hash
        seen.add?(fingerprint)
      end
    end
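The order-preserving behaviour of `#call` rests on `Set#add?`, which returns the set for a new element and nil for one already present. A minimal standalone sketch of that idiom follows; the `first_occurrences` helper is hypothetical and plain URL strings stand in for articles, so it runs without html2rss loaded.

```ruby
require 'set'

# Hypothetical helper illustrating the idiom used in #call. The block
# supplies the fingerprint for each item.
def first_occurrences(items)
  seen = Set.new
  # Set#add? is truthy only the first time a fingerprint is seen, so
  # filter keeps the first occurrence and preserves the input order.
  items.filter { |item| seen.add?(yield(item)) }
end

urls = ['https://example.com/a', 'https://example.com/b', 'https://example.com/a']
unique = first_occurrences(urls) { |url| url }
# unique => ["https://example.com/a", "https://example.com/b"]
```

Because the filter runs in a single pass with a set lookup per item, the cost grows linearly with the number of articles.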