Class: Html2rss::Url
Overview
A value object representing a resolved, absolute URL with built-in operations. Provides URL resolution, sanitization, and titleization capabilities.
Constant Summary
- URI_REGEXP = Addressable::URI::URIREGEX
Regular expression for basic URI format validation.
- SUPPORTED_SCHEMES = %w[http https].to_set.freeze
Schemes accepted by channel URL validation.
Instance Attribute Summary
-
#path_segments ⇒ Array<String>
readonly
Returns the URL path split into non-empty segments.
Class Method Summary
-
.extract_from_html(html) ⇒ Url?
Extracts a base URL from HTML metadata tags.
-
.for_channel(url_string) ⇒ Url
Creates a URL for channel use with validation.
-
.from_absolute(url_string) ⇒ Url
Creates a URL from an already-absolute URL string.
-
.from_relative(relative_url, base_url) ⇒ Url
Creates a URL from a relative path and base URL.
-
.sanitize(raw_url) ⇒ Url?
Creates a URL by sanitizing a raw URL string.
Instance Method Summary
-
#<=>(other) ⇒ Integer
Compares this URL with another URL by string representation, for ordering.
-
#==(other) ⇒ Boolean
Returns true if this URL is equal to another URL.
-
#absolute? ⇒ Boolean
Whether the URL includes scheme and host.
-
#channel_titleized ⇒ String
Returns a titleized representation of the URL with prefixed host.
-
#eql?(other) ⇒ Boolean
Supports hash-based comparisons by ensuring equality semantics match `hash`.
-
#fragment ⇒ String?
URI fragment without leading `#`.
-
#hash ⇒ Integer
Returns the hash code for this URL.
-
#host ⇒ String?
URI host component.
-
#initialize(uri) ⇒ Url
constructor
A new instance of Url.
-
#inspect ⇒ String
Returns a string representation of the URL for debugging.
-
#path ⇒ String?
URI path component.
-
#port ⇒ Integer?
URI port component.
-
#query ⇒ String?
URI query string without leading `?`.
-
#query_values ⇒ Hash{String => String}
Returns the URL query string as a hash of string keys and values.
-
#scheme ⇒ String?
URI scheme, for example `http` or `https`.
-
#titleized ⇒ String
Returns a titleized representation of the URL path.
-
#to_s ⇒ String
Normalized URL string.
-
#with_path(path) ⇒ Url
Returns a copy of the URL with the provided path.
-
#with_query_values(values) ⇒ Url
Returns a copy of the URL with the provided query values.
Constructor Details
#initialize(uri) ⇒ Url
Returns a new instance of Url.
    # File 'lib/html2rss/url.rb', line 139
    def initialize(uri)
      @uri = uri.freeze
      @path_segments = @uri.path.to_s.split('/').reject(&:empty?).freeze
      freeze
    end
Instance Attribute Details
#path_segments ⇒ Array<String> (readonly)
Returns the URL path split into non-empty segments.
    # File 'lib/html2rss/url.rb', line 179
    def path_segments
      @path_segments
    end
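The segment splitting performed in the constructor can be tried in isolation. The following is a minimal sketch on a plain String; `path_segments_for` is a hypothetical helper name, not part of the library.

```ruby
# Hypothetical helper mirroring the split/reject step from the constructor.
# Empty segments (leading slash, doubled slashes) are dropped.
def path_segments_for(path)
  path.to_s.split('/').reject(&:empty?).freeze
end

path_segments_for('/blog//2024/post/') # => ["blog", "2024", "post"]
path_segments_for('')                  # => []
path_segments_for(nil)                 # => []
```

Freezing the resulting array matches the value-object style of the class: callers can read the segments but cannot modify them in place.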
Class Method Details
.extract_from_html(html) ⇒ Url?
Extracts a base URL from HTML metadata tags.
    # File 'lib/html2rss/url.rb', line 105
    def self.extract_from_html(html)
      doc = Nokogiri::HTML(html)
      # NOTE: the original local variable name was lost in extraction;
      # `selectors` is a placeholder.
      selectors = {
        'link[rel="canonical"]' => 'href',
        'meta[property="og:url"]' => 'content',
        'meta[name="twitter:url"]' => 'content',
        'base[href]' => 'href'
      }
      selectors.each do |sel, attr|
        val = doc.at_css(sel)&.[](attr).to_s.strip
        return from_absolute(val) unless val.empty?
      rescue ArgumentError
        next
      end
      nil
    end
.for_channel(url_string) ⇒ Url
Creates a URL for channel use with validation. Validates that the URL meets channel requirements (absolute, no @, supported schemes). Returns nil when the input is blank.
    # File 'lib/html2rss/url.rb', line 93
    def self.for_channel(url_string)
      stripped = url_string.to_s.strip
      return nil if stripped.empty?

      from_absolute(stripped).tap { validate_channel_url(_1) }
    end
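The body of `validate_channel_url` is not shown on this page. As a rough illustration of the three stated requirements (absolute, no @, supported schemes), here is a sketch built on Ruby's standard `URI` library rather than Addressable. The helper name `check_channel_url`, the error messages, and the reading of "no @" as "reject any `@` character" are all assumptions.

```ruby
require 'set'
require 'uri'

# Hypothetical validator; NOT the library's validate_channel_url.
# Returns the stripped URL string, nil for blank input, or raises ArgumentError.
def check_channel_url(url_string)
  supported_schemes = %w[http https].to_set
  stripped = url_string.to_s.strip
  return nil if stripped.empty?

  # Assumption: "no @" means the raw string must not contain an @ character.
  raise ArgumentError, 'URL must not contain an @ character' if stripped.include?('@')

  uri = URI.parse(stripped)
  raise ArgumentError, 'URL must be absolute' unless uri.absolute? && uri.host
  raise ArgumentError, 'URL scheme is not supported' unless supported_schemes.include?(uri.scheme)

  uri.to_s
end

check_channel_url(' https://example.com/feed ') # => "https://example.com/feed"
check_channel_url('   ')                        # => nil
```

Inputs such as `ftp://example.com/x`, `example.com/feed`, or `https://user@example.com/` raise `ArgumentError` in this sketch.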
.from_absolute(url_string) ⇒ Url
Creates a URL from an already-absolute URL string.
    # File 'lib/html2rss/url.rb', line 70
    def self.from_absolute(url_string)
      return url_string if url_string.is_a?(self)

      new(Addressable::URI.parse(url_string.to_s.strip).normalize).tap do |url|
        raise ArgumentError, 'URL must be absolute' unless url.absolute?
      end
    rescue Addressable::URI::InvalidURIError
      raise ArgumentError, 'URL must be absolute'
    end
.from_relative(relative_url, base_url) ⇒ Url
Creates a URL from a relative path and base URL.
    # File 'lib/html2rss/url.rb', line 39
    def self.from_relative(relative_url, base_url)
      url = Addressable::URI.parse(relative_url.to_s.strip)
      return new(url) if url.absolute?

      base_uri = Addressable::URI.parse(base_url.to_s)
      base_uri.path = '/' if base_uri.path.empty?

      new(base_uri.join(url).normalize)
    rescue Addressable::URI::InvalidURIError
      raise ArgumentError, 'URL could not be parsed'
    end
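The resolution flow (return absolute input unchanged, otherwise join against a base whose empty path is treated as `/`) can be sketched with Ruby's standard `URI` library. This is a stand-in for the Addressable-based code, not a drop-in replacement: `resolve` is a hypothetical helper, and the normalization step is omitted.

```ruby
require 'uri'

# Hypothetical helper: resolves +relative+ against +base+ using stdlib URI.
def resolve(relative, base)
  candidate = URI.parse(relative.to_s.strip)
  return candidate.to_s if candidate.absolute?

  base_uri = URI.parse(base.to_s)
  base_uri.path = '/' if base_uri.path.empty? # treat a bare host as its root
  base_uri.merge(candidate).to_s
end

resolve('/articles/1', 'https://example.com')
# => "https://example.com/articles/1"
resolve('next?page=2', 'https://example.com/blog/')
# => "https://example.com/blog/next?page=2"
resolve('https://other.test/x', 'https://example.com')
# => "https://other.test/x"
```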
.sanitize(raw_url) ⇒ Url?
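Creates a URL by sanitizing a raw URL string. Extracts the first URL-like substring (http, https, ftp, or mailto), reading up to the next whitespace, `<`, `>`, or `"`, and normalizes it. Returns nil when nothing matches.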
    # File 'lib/html2rss/url.rb', line 57
    def self.sanitize(raw_url)
      match = raw_url.to_s.match(%r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+})
      return unless match

      new(Addressable::URI.parse(match[0].strip).normalize)
    end
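To see what the extraction pattern picks out of noisy input, the regular expression can be exercised on its own. The sketch below reuses the pattern from the method body; `first_url_in` and `URL_PATTERN` are hypothetical names, and the Addressable parsing and normalization step is left out.

```ruby
# Same pattern as in the method body above, given a name for the sketch.
URL_PATTERN = %r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+}

# Hypothetical helper: returns the first URL-like substring, or nil.
def first_url_in(raw)
  match = raw.to_s.match(URL_PATTERN)
  match && match[0]
end

first_url_in('Read more at <https://example.com/a?b=1> today')
# => "https://example.com/a?b=1"
first_url_in('  http://example.com/x y ')
# => "http://example.com/x"
first_url_in('write to mailto:someone@example.com')
# => "mailto:someone@example.com"
first_url_in('no links here')
# => nil
```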
Instance Method Details
#<=>(other) ⇒ Integer
Compares this URL with another object by string representation. Returns -1, 0, or 1 following String ordering; 0 means the string representations are identical.
    # File 'lib/html2rss/url.rb', line 254
    def <=>(other) = to_s <=> other.to_s
#==(other) ⇒ Boolean
Returns true if this URL is equal to another URL.
    # File 'lib/html2rss/url.rb', line 261
    def ==(other) = other.is_a?(Url) && to_s == other.to_s
#absolute? ⇒ Boolean
Returns whether the URL includes scheme and host.
    # File 'lib/html2rss/url.rb', line 167
    def absolute? = @uri.absolute?
#channel_titleized ⇒ String
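Returns a titleized representation of the URL with prefixed host. Creates a channel title of the form `host: Segment Segment` from the host and the capitalized path segments, or returns the host alone when the path has no segments. Useful for RSS channel titles that need to identify the source.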
    # File 'lib/html2rss/url.rb', line 241
    def channel_titleized
      nicer_path = CGI.unescapeURIComponent(@uri.path).split('/').reject(&:empty?)
      host = @uri.host

      nicer_path.any? ? "#{host}: #{nicer_path.map(&:capitalize).join(' ')}" : host
    end
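A standalone sketch of the same host-plus-path formatting follows. `channel_title` is a hypothetical helper, and `URI.decode_www_form_component` is swapped in for `CGI.unescapeURIComponent` to keep the example on widely available stdlib calls; unlike the CGI method, it also decodes `+` as a space.

```ruby
require 'uri'

# Hypothetical helper: builds "host: Segment Segment" from a host and a path.
def channel_title(host, path)
  segments = URI.decode_www_form_component(path.to_s).split('/').reject(&:empty?)
  return host if segments.empty?

  "#{host}: #{segments.map(&:capitalize).join(' ')}"
end

channel_title('example.com', '/news/world%20report')
# => "example.com: News World report"
channel_title('example.com', '/')
# => "example.com"
```

Note that `capitalize` only uppercases the first character of each path segment, so a decoded segment such as `world report` becomes `World report`.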
#eql?(other) ⇒ Boolean
Supports hash-based comparisons by ensuring equality semantics match `hash`.
    # File 'lib/html2rss/url.rb', line 268
    def eql?(other) = other.is_a?(Url) && to_s == other.to_s
#fragment ⇒ String?
Returns URI fragment without leading `#`.
    # File 'lib/html2rss/url.rb', line 164
    def fragment = @uri.fragment
#hash ⇒ Integer
Returns the hash code for this URL.
    # File 'lib/html2rss/url.rb', line 274
    def hash = to_s.hash
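The combination of `<=>`, `==`, `eql?`, and `hash`, all derived from `to_s`, is what lets instances sort, deduplicate, and act as Hash keys. The sketch below shows the pattern on a hypothetical `Link` class that wraps a plain String; it is a stand-in for illustration, not the library's `Url`.

```ruby
# Hypothetical stand-in for Url: a frozen wrapper around a String.
class Link
  def initialize(string)
    @string = string.dup.freeze
    freeze
  end

  def to_s
    @string
  end

  # Ordering by string representation.
  def <=>(other)
    to_s <=> other.to_s
  end

  # Equality requires the same class and the same string.
  def ==(other)
    other.is_a?(Link) && to_s == other.to_s
  end
  alias eql? ==

  # Must agree with eql? so equal links land in the same Hash bucket.
  def hash
    to_s.hash
  end
end

a = Link.new('https://example.com/a')
b = Link.new('https://example.com/a')
c = Link.new('https://example.com/b')

a == b                       # => true
a == 'https://example.com/a' # => false (a String is not a Link)
a <=> c                      # => -1
[c, a, b].uniq.size          # => 2
{ a => :seen }.key?(b)       # => true
[c, a].sort.map(&:to_s)      # => ["https://example.com/a", "https://example.com/b"]
```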
#host ⇒ String?
Returns URI host component.
    # File 'lib/html2rss/url.rb', line 152
    def host = @uri.host
#inspect ⇒ String
Returns a string representation of the URL for debugging.
    # File 'lib/html2rss/url.rb', line 280
    def inspect = "#<#{self.class}:#{object_id} @uri=#{@uri.inspect}>"
#path ⇒ String?
Returns URI path component.
    # File 'lib/html2rss/url.rb', line 158
    def path = @uri.path
#port ⇒ Integer?
Returns URI port component.
    # File 'lib/html2rss/url.rb', line 155
    def port = @uri.port
#query ⇒ String?
Returns URI query string without leading `?`.
    # File 'lib/html2rss/url.rb', line 161
    def query = @uri.query
#query_values ⇒ Hash{String => String}
Returns the URL query string as a hash of string keys and values.
    # File 'lib/html2rss/url.rb', line 173
    def query_values = @uri.query_values(Hash) || {}
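The "hash of string keys and values, or an empty hash when there is no query" contract can be illustrated with Ruby's standard `URI` library. This is a sketch under that substitution; `query_hash` is a hypothetical helper, and edge cases such as repeated keys may behave differently in Addressable.

```ruby
require 'uri'

# Hypothetical helper: parses the query string of +url+ into a Hash.
# Falls back to an empty Hash when the URL has no query component.
def query_hash(url)
  query = URI.parse(url).query
  query ? URI.decode_www_form(query).to_h : {}
end

query_hash('https://example.com/search?q=ruby&page=2')
# => {"q"=>"ruby", "page"=>"2"}
query_hash('https://example.com/')
# => {}
```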
#scheme ⇒ String?
Returns URI scheme, for example `http` or `https`.
    # File 'lib/html2rss/url.rb', line 149
    def scheme = @uri.scheme
#titleized ⇒ String
Returns a titleized representation of the URL path. Converts the path to a human-readable title by cleaning and capitalizing words. Removes file extensions and special characters, then capitalizes each word.
    # File 'lib/html2rss/url.rb', line 215
    def titleized
      path = @uri.path
      return '' if path.empty?

      nicer_path = CGI.unescapeURIComponent(path)
                      .split('/')
                      .flat_map do |part|
                        part.gsub(/[^a-zA-Z0-9.]/, ' ').gsub(/\s+/, ' ').split
                      end

      nicer_path.map!(&:capitalize)
      File.basename(nicer_path.join(' '), '.*')
    end
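The titleizing pipeline (decode, split on `/`, replace non-alphanumeric characters with spaces, capitalize, strip the extension) can be followed step by step on a plain path string. In this sketch `titleize_path` is a hypothetical helper, and `URI.decode_www_form_component` stands in for `CGI.unescapeURIComponent`, with the difference that it also decodes `+` as a space.

```ruby
require 'uri'

# Hypothetical helper mirroring the steps of the method above on a String.
def titleize_path(path)
  return '' if path.to_s.empty?

  words = URI.decode_www_form_component(path)
             .split('/')
             .flat_map { |part| part.gsub(/[^a-zA-Z0-9.]/, ' ').gsub(/\s+/, ' ').split }
             .map(&:capitalize)

  # File.basename with '.*' drops a trailing extension such as ".html".
  File.basename(words.join(' '), '.*')
end

titleize_path('/blog/my-first_post.html') # => "Blog My First Post"
titleize_path('/')                        # => ""
```

Dots are kept during the character cleanup so that the extension is still recognizable when `File.basename` removes it at the end.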
#to_s ⇒ String
Returns normalized URL string.
    # File 'lib/html2rss/url.rb', line 146
    def to_s = @uri.to_s
#with_path(path) ⇒ Url
Returns a copy of the URL with the provided path.
    # File 'lib/html2rss/url.rb', line 186
    def with_path(path)
      uri = @uri.dup
      uri.path = path
      self.class.from_absolute(uri.normalize.to_s)
    end
#with_query_values(values) ⇒ Url
Returns a copy of the URL with the provided query values.
    # File 'lib/html2rss/url.rb', line 197
    def with_query_values(values)
      uri = @uri.dup
      uri.query_values = values.transform_keys(&:to_s).transform_values(&:to_s)
      self.class.from_absolute(uri.normalize.to_s)
    end
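Both `with_path` and `with_query_values` follow a copy-then-modify pattern: duplicate the underlying URI, change one component, and build a new object so the receiver stays untouched. A minimal sketch of that pattern with Ruby's standard `URI` library is shown here; `with_query` is a hypothetical helper, and the Addressable normalization step is omitted.

```ruby
require 'uri'

# Hypothetical helper: returns a new URI with its query replaced by +values+.
# Keys and values are stringified first, as in the method above.
def with_query(uri, values)
  copy = uri.dup
  copy.query = URI.encode_www_form(values.transform_keys(&:to_s).transform_values(&:to_s))
  copy
end

original = URI.parse('https://example.com/feed?page=1')
updated  = with_query(original, page: 2, tag: :ruby)

updated.to_s  # => "https://example.com/feed?page=2&tag=ruby"
original.to_s # => "https://example.com/feed?page=1"
```

The existing query is replaced as a whole rather than merged, so any parameter that should survive has to be passed in again.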