Class: Html2rss::RssBuilder::Article

Inherits:
Object
Includes:
Comparable, Enumerable
Defined in:
lib/html2rss/rss_builder/article.rb

Overview

Article is a simple data object representing an article extracted from a page. It is enumerable and responds to all keys specified in PROVIDED_KEYS.

Constant Summary

PROVIDED_KEYS =

Allowed article attributes accepted by the value object constructor.

%i[id title description url image author guid published_at enclosures categories scraper].freeze
DEDUP_FINGERPRINT_SEPARATOR =

Separator used to build deterministic deduplication fingerprints.

'#!/'
NOT_SET =

Sentinel object used to pre-initialize instance variables in the constructor. This ensures all Article instances share the exact same object shape (Ruby 3.3+ optimization), preventing performance warnings and slower instance variable access due to shape transitions when attributes are lazily/conditionally accessed in different sequences.

Object.new.freeze
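
The sentinel pattern described above can be sketched in isolation. This is a minimal illustration, not code from html2rss: the `LazyValue` class, its `parsed` method, and the `computations` counter are hypothetical names introduced here. It shows why a dedicated sentinel is used instead of `nil`: a memoized result may itself be `nil`, and the sentinel still lets the method tell "not computed yet" apart from "computed, and the answer is nil".

```ruby
# Hypothetical class for illustration; not part of html2rss.
class LazyValue
  NOT_SET = Object.new.freeze

  attr_reader :computations

  def initialize(raw)
    @raw = raw
    @computations = 0
    # Assign every memoized instance variable up front, in a fixed order,
    # so all instances are initialized with the same set of variables.
    @parsed = NOT_SET
  end

  def parsed
    return @parsed unless @parsed == NOT_SET

    @computations += 1
    stripped = @raw.to_s.strip
    # The result may legitimately be nil; it is cached all the same.
    @parsed = stripped.empty? ? nil : stripped
  end
end

value = LazyValue.new('   ')
value.parsed # computes once and caches nil
value.parsed # served from the cache, no second computation
```

With a plain `@parsed ||= ...` idiom, the blank input above would be recomputed on every call, because a cached `nil` is indistinguishable from an unset variable.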

Constructor Details

#initialize(**options) ⇒ Article

Returns a new instance of Article.

Parameters:

  • options (Hash{Symbol => Object})

Options Hash (**options):

  • :id (String)

    stable article identifier

  • :title (String)

    article title

  • :description (String)

    article description/content

  • :url (String, Html2rss::Url)

    canonical article URL

  • :image (String, Html2rss::Url)

    image URL for fallback enclosure rendering

  • :author (String)

    author name

  • :guid (String)

    explicit GUID override

  • :published_at (String, Time, DateTime)

    publication timestamp

  • :enclosures (Array<Hash{Symbol => Object}>)

    enclosure attribute hashes

  • :categories (Array<String>)

    category labels

  • :scraper (Class)

    scraper class that produced the article



# File 'lib/html2rss/rss_builder/article.rb', line 39

def initialize(**options)
  @to_h = options.each_with_object({}) { |(k, v), h| h[k] = v.freeze if v }.freeze

  @description = @url = @image = @guid = @enclosures = @enclosure = @categories = @published_at = NOT_SET

  return unless (unknown_keys = options.keys - PROVIDED_KEYS).any?

  Log.warn "Article: unknown keys found: #{unknown_keys.join(', ')}"
end
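
The effect of the constructor's hash-building expression is easy to miss, so here is a standalone sketch of it. The helper name `compact_and_freeze`, the constant name `KEYS`, and the sample option values are assumptions made for this example; only the expression itself and the key list are taken from the source above.

```ruby
# Key list copied from PROVIDED_KEYS above; the name KEYS is only used in this sketch.
KEYS = %i[id title description url image author guid published_at enclosures categories scraper].freeze

# Same expression as in the constructor, extracted into a hypothetical helper.
def compact_and_freeze(options)
  options.each_with_object({}) { |(k, v), h| h[k] = v.freeze if v }.freeze
end

options = { title: 'Hello', author: nil, url: 'https://example.com/a', colour: 'red' }

stored = compact_and_freeze(options)
unknown_keys = options.keys - KEYS

stored.key?(:author) # false: nil (and false) values are dropped
stored[:colour]      # "red": unknown keys are stored, the constructor only warns about them
unknown_keys         # [:colour]
```

Note that `v.freeze` freezes the caller's objects in place rather than copies of them, so a string passed to the constructor can no longer be mutated afterwards.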

Instance Method Details

#<=>(other) ⇒ Integer?

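Compares two articles attribute by attribute. Returns 0 when every attribute of other matches this article, and nil otherwise, including when other is not an Article.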

Parameters:

  • other (Object)

    value compared against this article

Returns:

  • (Integer, nil)

    comparison result for compatible Article values



# File 'lib/html2rss/rss_builder/article.rb', line 167

def <=>(other)
  return nil unless other.is_a?(Article)

  0 if other.all? { |key, value| value == public_send(key) ? public_send(key) <=> value : false }
end
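
Because `<=>` only ever returns 0 or nil, the Comparable mixin gives Article equality but no ordering. The sketch below shows that consequence with a hypothetical two-attribute class named `Pair`; the comparison block is simplified to a plain equality check, which is an assumption of this example rather than the exact source expression.

```ruby
# Hypothetical stand-in for Article with only two attributes.
class Pair
  include Comparable
  include Enumerable

  attr_reader :title, :url

  def initialize(title:, url:)
    @title = title
    @url = url
  end

  def each
    return enum_for(:each) unless block_given?

    %i[title url].each { |key| yield(key, public_send(key)) }
  end

  # 0 when every attribute matches, nil otherwise (no ordering is defined).
  def <=>(other)
    return nil unless other.is_a?(Pair)

    0 if other.all? { |key, value| value == public_send(key) }
  end
end

a = Pair.new(title: 'Hello', url: 'https://example.com/a')
b = Pair.new(title: 'Hello', url: 'https://example.com/a')
c = Pair.new(title: 'Other', url: 'https://example.com/a')

a == b  # true, because <=> returned 0
a == c  # false, because <=> returned nil
a <=> c # nil
```

Operators such as `<` or `sort` raise ArgumentError for non-equal instances, since Comparable cannot order values whose `<=>` returns nil.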

#author ⇒ String?

Returns:

  • (String, nil)


# File 'lib/html2rss/rss_builder/article.rb', line 97

def author = blank_string_to_nil(@to_h[:author])

#categories ⇒ Array<String>

Returns normalized, unique category names.

Returns:

  • (Array<String>)

    normalized, unique category names



# File 'lib/html2rss/rss_builder/article.rb', line 139

def categories
  return @categories unless @categories == NOT_SET

  @categories = @to_h[:categories].dup.to_a.tap do |categories|
    categories.map! { |category| category.to_s.strip }
    categories.reject!(&:empty?)
    categories.uniq!
  end
end
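
The normalization steps can be tried outside the class. The helper name `normalize_categories` and the sample input are assumptions for this sketch; the body follows the method shown above.

```ruby
# Standalone version of the normalization steps, for illustration only.
def normalize_categories(raw)
  raw.dup.to_a.tap do |categories|
    categories.map! { |category| category.to_s.strip } # stringify and trim
    categories.reject!(&:empty?)                        # drop blanks (including former nils)
    categories.uniq!                                    # remove exact duplicates
  end
end

input = [' Ruby ', 'ruby', 'Ruby', '', nil, :rss].freeze

normalize_categories(input) # => ["Ruby", "ruby", "rss"]
normalize_categories(nil)   # => []
```

The `dup` matters: the stored option values are frozen by the constructor, and the in-place `map!`, `reject!`, and `uniq!` calls need an unfrozen copy to work on. Deduplication is case-sensitive, so "Ruby" and "ruby" both survive.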

#deduplication_fingerprint ⇒ String, Integer

Returns a deterministic fingerprint used to detect duplicate articles.

Returns:

  • (String, Integer)


# File 'lib/html2rss/rss_builder/article.rb', line 111

def deduplication_fingerprint
  dedup_from_url || dedup_from_id || dedup_from_guid || hash
end

#description ⇒ String

Returns rendered article description.

Returns:

  • (String)

    rendered article description



# File 'lib/html2rss/rss_builder/article.rb', line 70

def description
  return @description unless @description == NOT_SET

  @description = Rendering::DescriptionBuilder.new(
    base: @to_h[:description],
    title:,
    url:,
    enclosures:,
    image:
  ).call
end

#each {|key, value| ... } ⇒ Enumerator

Yields each key in PROVIDED_KEYS together with its value. Returns an Enumerator if no block is given.

Yields:

  • (key, value)

Returns:

  • (Enumerator)

    if no block is given



# File 'lib/html2rss/rss_builder/article.rb', line 57

def each
  return enum_for(:each) unless block_given?

  PROVIDED_KEYS.each { |key| yield(key, public_send(key)) }
end
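
Defining `each` this way is what makes the Enumerable mixin usable. A minimal sketch, assuming a hypothetical `Card` class with two attributes in place of the full key list:

```ruby
# Hypothetical class for illustration; mirrors the each pattern shown above.
class Card
  include Enumerable

  KEYS = %i[title url].freeze

  attr_reader :title, :url

  def initialize(title:, url:)
    @title = title
    @url = url
  end

  def each
    return enum_for(:each) unless block_given?

    KEYS.each { |key| yield(key, public_send(key)) }
  end
end

card = Card.new(title: 'Hello', url: 'https://example.com/a')

card.each.class # Enumerator, because no block was given
card.to_a       # [[:title, "Hello"], [:url, "https://example.com/a"]]
card.to_h       # { title: "Hello", url: "https://example.com/a" }
```

Each pair is produced by calling the public reader, so in the real class the enumerated values are the normalized ones (for example the sanitized URL), not the raw constructor options.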

#enclosure ⇒ Html2rss::RssBuilder::Enclosure?



# File 'lib/html2rss/rss_builder/article.rb', line 124

def enclosure
  return @enclosure unless @enclosure == NOT_SET

  @enclosure = case (object = @to_h[:enclosures]&.first)
               when Hash
                 Html2rss::RssBuilder::Enclosure.new(**object)
               when nil
                 Html2rss::RssBuilder::Enclosure.new(url: image) if image
               else
                 Log.warn "Article: unknown enclosure type: #{object.class}"
                 nil
               end
end

#enclosures ⇒ Array<Html2rss::RssBuilder::Enclosure>

Returns normalized enclosure objects.

Returns:

  • (Array<Html2rss::RssBuilder::Enclosure>)

    normalized enclosure objects



# File 'lib/html2rss/rss_builder/article.rb', line 116

def enclosures
  return @enclosures unless @enclosures == NOT_SET

  @enclosures = Array(@to_h[:enclosures])
                .map { |enclosure| Html2rss::RssBuilder::Enclosure.new(**enclosure) }
end

#guid ⇒ String

Generates a unique identifier based on the URL and ID using CRC32.

Returns:

  • (String)


# File 'lib/html2rss/rss_builder/article.rb', line 101

def guid
  return @guid unless @guid == NOT_SET

  @guid = Zlib.crc32(fetch_guid).to_s(36).encode('utf-8')
end
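
The hashing step can be reproduced with the standard library alone. In this sketch the helper name `short_guid` and the input string are assumptions: the private `fetch_guid` helper that supplies the real input is not shown on this page, so an arbitrary URL stands in for it.

```ruby
require 'zlib'

# `source` stands in for whatever the private fetch_guid helper returns.
def short_guid(source)
  # CRC32 yields an unsigned 32-bit integer; base 36 keeps the text form short.
  Zlib.crc32(source).to_s(36).encode('utf-8')
end

guid = short_guid('https://example.com/articles/1')

guid == short_guid('https://example.com/articles/1') # true: same input, same GUID
guid.length <= 7                                     # true: 2**32 - 1 needs at most 7 base-36 digits
guid.encoding                                        # Encoding::UTF_8
```

`Integer#to_s` returns a US-ASCII string, which is why the explicit `encode('utf-8')` is there. CRC32 is a checksum rather than a cryptographic hash, so collisions between different inputs are possible, if unlikely within a single feed.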

#id ⇒ String?

Returns stable article identifier.

Returns:

  • (String, nil)

    stable article identifier



# File 'lib/html2rss/rss_builder/article.rb', line 64

def id = blank_string_to_nil(@to_h[:id])

#image ⇒ Url?

Returns:

  • (Url, nil)



# File 'lib/html2rss/rss_builder/article.rb', line 90

def image
  return @image unless @image == NOT_SET

  @image = Url.sanitize(@to_h[:image])
end

#published_at ⇒ DateTime?

Parses and returns the published_at time.

Returns:

  • (DateTime, nil)


# File 'lib/html2rss/rss_builder/article.rb', line 151

def published_at
  return @published_at unless @published_at == NOT_SET

  string = @to_h[:published_at].to_s.strip
  @published_at = string.empty? ? nil : DateTime.parse(string)
rescue ArgumentError
  @published_at = nil
end
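
The parsing rules are easier to see in a standalone helper. The name `parse_timestamp` and the sample inputs are assumptions for this sketch; the logic follows the method above, minus the memoization.

```ruby
require 'date'

# Standalone version of the parsing logic, for illustration only.
def parse_timestamp(value)
  string = value.to_s.strip
  string.empty? ? nil : DateTime.parse(string)
rescue ArgumentError
  # DateTime.parse raises Date::Error, a subclass of ArgumentError,
  # for input it cannot interpret.
  nil
end

parse_timestamp('2024-01-02T03:04:05+00:00') # a DateTime for 2 January 2024, 03:04:05
parse_timestamp(Time.utc(2024, 1, 2, 3, 4))  # Time is converted via to_s, then parsed
parse_timestamp('   ')                       # nil: blank input
parse_timestamp(nil)                         # nil
parse_timestamp('xyz')                       # nil: unparseable input is swallowed
```

Because every input goes through `to_s` first, Time and DateTime values are accepted alongside strings, at the cost of a round trip through their string form.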

#scraper ⇒ Class?

Returns scraper class that produced this article.

Returns:

  • (Class, nil)

    scraper class that produced this article



# File 'lib/html2rss/rss_builder/article.rb', line 161

def scraper
  @to_h[:scraper]
end

#title ⇒ String?

Returns article title.

Returns:

  • (String, nil)

    article title



# File 'lib/html2rss/rss_builder/article.rb', line 67

def title = blank_string_to_nil(@to_h[:title])

#url ⇒ Url?

Returns:

  • (Url, nil)



# File 'lib/html2rss/rss_builder/article.rb', line 83

def url
  return @url unless @url == NOT_SET

  @url = Url.sanitize(@to_h[:url])
end

#valid? ⇒ Boolean

Checks if the article is valid based on the presence of URL, ID, and either title or description.

Returns:

  • (Boolean)

    True if the article is valid, otherwise false.



# File 'lib/html2rss/rss_builder/article.rb', line 51

def valid?
  !url.to_s.empty? && (!title.to_s.empty? || !description.to_s.empty?) && !id.to_s.empty?
end
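
The validity rule reads as "URL and ID are required, plus at least one of title or description". The sketch below applies the same boolean expression to plain values; the helper name `valid_article?` and its keyword arguments are assumptions for illustration. The real method calls the accessors instead, so it checks the sanitized URL and the rendered description rather than the raw inputs.

```ruby
# Same boolean expression as valid?, applied to plain values for illustration.
def valid_article?(url:, id:, title: nil, description: nil)
  !url.to_s.empty? && (!title.to_s.empty? || !description.to_s.empty?) && !id.to_s.empty?
end

valid_article?(url: 'https://example.com/a', id: 'a', title: 'Hello')      # true
valid_article?(url: 'https://example.com/a', id: 'a', description: 'Text') # true
valid_article?(url: 'https://example.com/a', id: 'a')                      # false: no title or description
valid_article?(url: nil, id: 'a', title: 'Hello')                          # false: no URL
valid_article?(url: 'https://example.com/a', id: '', title: 'Hello')       # false: no ID
```

Calling `to_s` before `empty?` lets nil and empty strings be handled the same way without a separate nil check.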