Class: Html2rss::RssBuilder::Article
- Inherits:
- Object
- Object
- Html2rss::RssBuilder::Article
- Includes:
- Comparable, Enumerable
- Defined in:
- lib/html2rss/rss_builder/article.rb
Overview
Article is a simple data object representing an article extracted from a page. It is enumerable and responds to all keys specified in PROVIDED_KEYS.
Constant Summary
- PROVIDED_KEYS =
Allowed article attributes accepted by the value object constructor.
%i[id title description url image author guid published_at enclosures categories scraper].freeze
- DEDUP_FINGERPRINT_SEPARATOR =
Separator used to build deterministic deduplication fingerprints.
'#!/'
- NOT_SET =
Sentinel object used to pre-initialize instance variables in the constructor. This ensures all Article instances share the exact same object shape (Ruby 3.3+ optimization), preventing performance warnings and slower instance variable access due to shape transitions when attributes are lazily/conditionally accessed in different sequences.
Object.new.freeze
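The sentinel pattern described for NOT_SET can be sketched in isolation. The class below is a hypothetical stand-in (LazyRecord, number, and parse_calls are illustrative names, not part of html2rss); it assumes the same idea of assigning every lazily computed instance variable in the constructor and using a frozen sentinel so that a computed nil is still cached.

```ruby
# Hypothetical stand-in class; not part of html2rss.
class LazyRecord
  NOT_SET = Object.new.freeze

  attr_reader :parse_calls

  def initialize(raw)
    @raw = raw
    @parse_calls = 0
    # Assign the lazily computed ivar up front so every instance
    # ends up with the same set of instance variables.
    @number = NOT_SET
  end

  def number
    # A plain `@number ||= ...` would recompute whenever the result is nil.
    return @number unless @number == NOT_SET

    @parse_calls += 1
    @number = Integer(@raw, exception: false) # nil when @raw is not numeric
  end
end

record = LazyRecord.new('abc')
record.number      # nil, computed once
record.number      # nil again, served from the cached value
record.parse_calls # 1
```

Comparing against the sentinel instead of nil is what lets a legitimately nil result count as "already computed".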
Instance Method Summary
- #<=>(other) ⇒ Integer?
Comparison result for compatible Article values.
- #author ⇒ String?
- #categories ⇒ Array<String>
Normalized, unique category names.
- #deduplication_fingerprint ⇒ String, Integer
Returns a deterministic fingerprint used to detect duplicate articles.
- #description ⇒ String
Rendered article description.
- #each {|key, value| ... } ⇒ Enumerator
Returns an Enumerator if no block is given.
- #enclosure ⇒ Html2rss::RssBuilder::Enclosure?
- #enclosures ⇒ Array<Html2rss::RssBuilder::Enclosure>
Normalized enclosure objects.
- #guid ⇒ String
Generates an identifier based on the URL and ID: a CRC32 checksum rendered in base 36.
- #id ⇒ String?
Stable article identifier.
- #image ⇒ Url?
- #initialize(**options) ⇒ Article (constructor)
A new instance of Article.
- #published_at ⇒ DateTime?
Parses and returns the published_at time.
- #scraper ⇒ Class?
Scraper class that produced this article.
- #title ⇒ String?
Article title.
- #url ⇒ Url?
- #valid? ⇒ Boolean
Checks if the article is valid based on the presence of URL, ID, and either title or description.
Constructor Details
#initialize(**options) ⇒ Article
Returns a new instance of Article.
# File 'lib/html2rss/rss_builder/article.rb', line 39
def initialize(**options)
  # Drop nil/false values and freeze the rest, then freeze the hash itself.
  @to_h = options.each_with_object({}) { |(k, v), h| h[k] = v.freeze if v }.freeze
  # Pre-initialize every lazily computed attribute with the sentinel.
  @description = @url = @image = @guid = @enclosures = @enclosure = @categories = @published_at = NOT_SET

  return unless (unknown_keys = options.keys - PROVIDED_KEYS).any?

  Log.warn "Article: unknown keys found: #{unknown_keys.join(', ')}"
end
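To make the constructor's handling of its keyword arguments concrete, here is a minimal standalone sketch. ALLOWED_KEYS and build_attributes are hypothetical names, the key list is shortened for illustration, and Kernel#warn stands in for the library's logger.

```ruby
# Shortened, hypothetical key list; the real class uses PROVIDED_KEYS.
ALLOWED_KEYS = %i[id title url].freeze

# Mirrors the constructor's two steps: freeze what was given, then report
# keys outside the allowed list. Unknown keys are reported, not removed.
def build_attributes(**options)
  attributes = options.each_with_object({}) { |(k, v), h| h[k] = v.freeze if v }.freeze
  unknown_keys = options.keys - ALLOWED_KEYS
  warn "unknown keys found: #{unknown_keys.join(', ')}" if unknown_keys.any?
  attributes
end

attrs = build_attributes(id: '1', title: nil, colour: 'red')
attrs.key?(:title) # false, nil values are never stored
attrs.frozen?      # true
```

Note that `if v` skips false as well as nil, so a literal false value would also be left out of the stored hash.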
Instance Method Details
#<=>(other) ⇒ Integer?
Returns 0 when every attribute of other equals the receiver's, and nil otherwise (including when other is not an Article).
# File 'lib/html2rss/rss_builder/article.rb', line 167
def <=>(other)
  return nil unless other.is_a?(Article)

  0 if other.all? { |key, value| value == public_send(key) ? public_send(key) <=> value : false }
end
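Because this method only ever returns 0 or nil, Comparable effectively provides equality here rather than ordering. The sketch below illustrates that with a hypothetical Entry class; the comparison is simplified to a plain attribute-by-attribute equality check and is not the library's implementation.

```ruby
# Hypothetical stand-in for an attribute-based value object; not html2rss code.
class Entry
  include Comparable
  include Enumerable

  KEYS = %i[id title].freeze

  attr_reader :id, :title

  def initialize(id:, title:)
    @id = id
    @title = title
  end

  def each
    return enum_for(:each) unless block_given?

    KEYS.each { |key| yield(key, public_send(key)) }
  end

  # 0 when every attribute matches, nil otherwise (no ordering is defined).
  def <=>(other)
    return nil unless other.is_a?(Entry)

    0 if other.all? { |key, value| value == public_send(key) }
  end
end

first = Entry.new(id: '1', title: 'Hello')
same  = Entry.new(id: '1', title: 'Hello')
other = Entry.new(id: '2', title: 'Hello')

first == same  # true, Comparable#== sees a 0 from <=>
first == other # false, <=> returned nil
```

Ordering operators such as `<` raise ArgumentError when `<=>` returns nil, so two differing objects of this kind cannot be sorted against each other.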
#author ⇒ String?
# File 'lib/html2rss/rss_builder/article.rb', line 97
def author = blank_string_to_nil(@to_h[:author])
#categories ⇒ Array<String>
Returns normalized, unique category names.
# File 'lib/html2rss/rss_builder/article.rb', line 139
def categories
  return @categories unless @categories == NOT_SET

  @categories = @to_h[:categories].dup.to_a.tap do |categories|
    categories.map! { |category| category.to_s.strip }
    categories.reject!(&:empty?)
    categories.uniq!
  end
end
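The normalization steps can be tried outside the class. normalize_categories is a hypothetical helper that copies the same chain (stringify and strip, drop empties, de-duplicate) so the effect on messy input is visible.

```ruby
# Hypothetical standalone copy of the normalization chain shown above.
def normalize_categories(raw)
  # dup first so a frozen input array is never mutated; nil.to_a gives [].
  raw.dup.to_a.tap do |categories|
    categories.map! { |category| category.to_s.strip }
    categories.reject!(&:empty?)
    categories.uniq!
  end
end

normalize_categories([' Ruby ', 'ruby', '', :Ruby, nil, 'Ruby'])
# => ["Ruby", "ruby"]
normalize_categories(nil)
# => []
```

De-duplication is case-sensitive, so "Ruby" and "ruby" are kept as separate categories.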
#deduplication_fingerprint ⇒ String, Integer
Returns a deterministic fingerprint used to detect duplicate articles.
# File 'lib/html2rss/rss_builder/article.rb', line 111
def deduplication_fingerprint
  dedup_from_url || dedup_from_id || dedup_from_guid || hash
end
#description ⇒ String
Returns rendered article description.
# File 'lib/html2rss/rss_builder/article.rb', line 70
def description
  return @description unless @description == NOT_SET

  @description = Rendering::DescriptionBuilder.new(
    base: @to_h[:description],
    title:,
    url:,
    enclosures:,
    image:
  ).call
end
#each {|key, value| ... } ⇒ Enumerator
Returns an Enumerator if no block is given.
# File 'lib/html2rss/rss_builder/article.rb', line 57
def each
  return enum_for(:each) unless block_given?

  PROVIDED_KEYS.each { |key| yield(key, public_send(key)) }
end
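Yielding key/value pairs from each is what makes the Enumerable helpers useful on such an object. A small sketch with a hypothetical Settings class (not part of html2rss) shows the blockless call and a couple of derived methods.

```ruby
# Hypothetical class using the same each/enum_for pattern.
class Settings
  include Enumerable

  KEYS = %i[host port].freeze

  attr_reader :host, :port

  def initialize(host:, port:)
    @host = host
    @port = port
  end

  def each
    return enum_for(:each) unless block_given?

    KEYS.each { |key| yield(key, public_send(key)) }
  end
end

settings = Settings.new(host: 'example.com', port: 443)
settings.each                        # an Enumerator, since no block was given
settings.to_h                        # => {host: "example.com", port: 443}
settings.map { |key, _value| key }   # => [:host, :port]
```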
#enclosure ⇒ Html2rss::RssBuilder::Enclosure?
# File 'lib/html2rss/rss_builder/article.rb', line 124
def enclosure
  return @enclosure unless @enclosure == NOT_SET

  @enclosure = case (object = @to_h[:enclosures]&.first)
               when Hash
                 Html2rss::RssBuilder::Enclosure.new(**object)
               when nil
                 Html2rss::RssBuilder::Enclosure.new(url: image) if image
               else
                 Log.warn "Article: unknown enclosure type: #{object.class}"
                 nil
               end
end
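The branching in this method (first enclosure hash, else fall back to the image, else warn and return nil) can be sketched without the library. Attachment and first_attachment are hypothetical names, a Struct stands in for Html2rss::RssBuilder::Enclosure, and Kernel#warn replaces the logger.

```ruby
# Hypothetical stand-in for the enclosure value object.
Attachment = Struct.new(:url, :type, keyword_init: true)

def first_attachment(enclosures, image)
  case (object = enclosures&.first)
  when Hash
    Attachment.new(**object)
  when nil
    # No enclosure given: fall back to the image, if there is one.
    Attachment.new(url: image) if image
  else
    warn "unknown enclosure type: #{object.class}"
    nil
  end
end

first_attachment([{ url: 'a.mp3', type: 'audio/mpeg' }], 'cover.png').url # => "a.mp3"
first_attachment(nil, 'cover.png').url                                    # => "cover.png"
first_attachment([], nil)                                                 # => nil
```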
#enclosures ⇒ Array<Html2rss::RssBuilder::Enclosure>
Returns normalized enclosure objects.
# File 'lib/html2rss/rss_builder/article.rb', line 116
def enclosures
  return @enclosures unless @enclosures == NOT_SET

  @enclosures = Array(@to_h[:enclosures])
                .map { |enclosure| Html2rss::RssBuilder::Enclosure.new(**enclosure) }
end
#guid ⇒ String
Generates an identifier based on the URL and ID: a CRC32 checksum rendered in base 36.
# File 'lib/html2rss/rss_builder/article.rb', line 101
def guid
  return @guid unless @guid == NOT_SET

  @guid = Zlib.crc32(fetch_guid).to_s(36).encode('utf-8')
end
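The checksum step on its own looks like the sketch below. short_checksum is a hypothetical helper, and the input string is an assumption for illustration: what the private fetch_guid helper actually returns is not shown on this page.

```ruby
require 'zlib'

# Hypothetical helper: CRC32 of a string, rendered in base 36 as UTF-8.
def short_checksum(input)
  Zlib.crc32(input).to_s(36).encode('utf-8')
end

# Assumed input for illustration only.
checksum = short_checksum('https://example.com/articles/1')
checksum.match?(/\A[0-9a-z]+\z/) # true, only digits and lowercase letters
checksum.length <= 7             # true, a 32-bit value needs at most 7 base-36 digits
```

CRC32 is a 32-bit checksum, so the result is deterministic and compact, but distinct inputs can collide.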
#id ⇒ String?
Returns stable article identifier.
# File 'lib/html2rss/rss_builder/article.rb', line 64
def id = blank_string_to_nil(@to_h[:id])
#image ⇒ Url?
# File 'lib/html2rss/rss_builder/article.rb', line 90
def image
  return @image unless @image == NOT_SET

  @image = Url.sanitize(@to_h[:image])
end
#published_at ⇒ DateTime?
Parses and returns the published_at time.
# File 'lib/html2rss/rss_builder/article.rb', line 151
def published_at
  return @published_at unless @published_at == NOT_SET

  string = @to_h[:published_at].to_s.strip
  @published_at = string.empty? ? nil : DateTime.parse(string)
rescue ArgumentError
  @published_at = nil
end
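The parsing rule (blank input gives nil, unparseable input gives nil, anything else goes through DateTime.parse) can be exercised with a hypothetical standalone helper, parse_timestamp, that leaves out the memoization.

```ruby
require 'date'

# Hypothetical helper mirroring the parsing logic, without caching.
def parse_timestamp(value)
  string = value.to_s.strip
  string.empty? ? nil : DateTime.parse(string)
rescue ArgumentError
  # Date::Error, raised for invalid dates, is a subclass of ArgumentError.
  nil
end

parse_timestamp(nil)                           # => nil
parse_timestamp('not a date')                  # => nil
parse_timestamp(' 2024-05-01T12:30:00+00:00 ') # => a DateTime for 1 May 2024, 12:30 UTC
```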
#scraper ⇒ Class?
Returns scraper class that produced this article.
# File 'lib/html2rss/rss_builder/article.rb', line 161
def scraper
  @to_h[:scraper]
end
#title ⇒ String?
Returns article title.
# File 'lib/html2rss/rss_builder/article.rb', line 67
def title = blank_string_to_nil(@to_h[:title])
#url ⇒ Url?
# File 'lib/html2rss/rss_builder/article.rb', line 83
def url
  return @url unless @url == NOT_SET

  @url = Url.sanitize(@to_h[:url])
end
#valid? ⇒ Boolean
Checks if the article is valid based on the presence of URL, ID, and either title or description.
# File 'lib/html2rss/rss_builder/article.rb', line 51
def valid?
  !url.to_s.empty? && (!title.to_s.empty? || !description.to_s.empty?) && !id.to_s.empty?
end
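The validity rule reads as "URL and ID are required, plus at least one of title or description". A hypothetical predicate, publishable?, applies the same expression to plain values; it assumes raw strings, whereas the real method works on the sanitized URL and the rendered description.

```ruby
# Hypothetical predicate applying the same rule to plain values.
def publishable?(url:, id:, title: nil, description: nil)
  !url.to_s.empty? && (!title.to_s.empty? || !description.to_s.empty?) && !id.to_s.empty?
end

publishable?(url: 'https://example.com/1', id: '1', title: 'Hello')      # => true
publishable?(url: 'https://example.com/1', id: '1', description: 'Text') # => true
publishable?(url: 'https://example.com/1', id: '1')                      # => false
publishable?(url: nil, id: '1', title: 'Hello')                          # => false
```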