Class: Html2rss::Url
Overview
A value object representing a resolved, absolute URL with built-in operations. Provides URL resolution, sanitization, and titleization capabilities.
Constant Summary collapse
- URI_REGEXP =
Regular expression for basic URI format validation
Addressable::URI::URIREGEX
- SUPPORTED_SCHEMES =
Schemes accepted by channel URL validation.
%w[http https].to_set.freeze
Class Method Summary collapse
-
.for_channel(url_string) ⇒ Url
Creates a URL for channel use with validation.
-
.from_absolute(url_string) ⇒ Url
Creates a URL from an already-absolute URL string.
-
.from_relative(relative_url, base_url) ⇒ Url
Creates a URL from a relative path and base URL.
-
.sanitize(raw_url) ⇒ Url?
Creates a URL by sanitizing a raw URL string.
Instance Method Summary collapse
-
#<=>(other) ⇒ Integer
Compares this URL with another URL for equality.
-
#==(other) ⇒ Boolean
Returns true if this URL is equal to another URL.
-
#absolute? ⇒ Boolean
Whether the URL includes scheme and host.
-
#channel_titleized ⇒ String
Returns a titleized representation of the URL with prefixed host.
-
#eql?(other) ⇒ Boolean
Supports hash-based comparisons by ensuring equality semantics match ‘hash`.
-
#fragment ⇒ String?
URI fragment without leading ‘#`.
-
#hash ⇒ Integer
Returns the hash code for this URL.
-
#host ⇒ String?
URI host component.
-
#initialize(uri) ⇒ Url
constructor
A new instance of Url.
-
#inspect ⇒ String
Returns a string representation of the URL for debugging.
-
#path ⇒ String?
URI path component.
-
#path_segments ⇒ Array<String>
Returns the URL path split into non-empty segments.
-
#port ⇒ Integer?
URI port component.
-
#query ⇒ String?
URI query string without leading ‘?`.
-
#query_values ⇒ Hash{String => String}
Returns the URL query string as a hash of string keys and values.
-
#scheme ⇒ String?
URI scheme, for example ‘http` or `https`.
-
#titleized ⇒ String
Returns a titleized representation of the URL path.
-
#to_s ⇒ String
Normalized URL string.
-
#with_path(path) ⇒ Url
Returns a copy of the URL with the provided path.
-
#with_query_values(values) ⇒ Url
Returns a copy of the URL with the provided query values.
Constructor Details
#initialize(uri) ⇒ Url
Returns a new instance of Url.
124 125 126 127 |
# File 'lib/html2rss/url.rb', line 124 def initialize(uri) @uri = uri.freeze freeze end |
Class Method Details
.for_channel(url_string) ⇒ Url
Creates a URL for channel use with validation. Validates that the URL meets channel requirements (absolute, no @, supported schemes).
92 93 94 95 96 97 98 99 100 101 |
# File 'lib/html2rss/url.rb', line 92 def self.for_channel(url_string) return nil if url_string.nil? || url_string.empty? stripped = url_string.strip return nil if stripped.empty? url = from_absolute(stripped) validate_channel_url(url) url end |
.from_absolute(url_string) ⇒ Url
Creates a URL from an already-absolute URL string.
68 69 70 71 72 73 74 75 76 77 |
# File 'lib/html2rss/url.rb', line 68 def self.from_absolute(url_string) return url_string if url_string.is_a?(self) url = new(Addressable::URI.parse(url_string.to_s.strip).normalize) raise ArgumentError, 'URL must be absolute' unless url.absolute? url rescue Addressable::URI::InvalidURIError raise ArgumentError, 'URL must be absolute' end |
.from_relative(relative_url, base_url) ⇒ Url
Creates a URL from a relative path and base URL.
38 39 40 41 42 43 44 45 46 |
# File 'lib/html2rss/url.rb', line 38 def self.from_relative(relative_url, base_url) url = Addressable::URI.parse(relative_url.to_s.strip) return new(url) if url.absolute? base_uri = Addressable::URI.parse(base_url.to_s) base_uri.path = '/' if base_uri.path.empty? new(base_uri.join(url).normalize) end |
.sanitize(raw_url) ⇒ Url?
Creates a URL by sanitizing a raw URL string. Removes spaces and extracts the first valid URL from the string.
54 55 56 57 58 59 60 |
# File 'lib/html2rss/url.rb', line 54 def self.sanitize(raw_url) matched_urls = raw_url.to_s.scan(%r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+}) url = matched_urls.first.to_s.strip return nil if url.empty? new(Addressable::URI.parse(url).normalize) end |
Instance Method Details
#<=>(other) ⇒ Integer
Compares this URL with another URL for equality. URLs are considered equal if their string representations are the same.
238 |
# File 'lib/html2rss/url.rb', line 238 def <=>(other) = to_s <=> other.to_s |
#==(other) ⇒ Boolean
Returns true if this URL is equal to another URL.
245 |
# File 'lib/html2rss/url.rb', line 245 def ==(other) = other.is_a?(Url) && to_s == other.to_s |
#absolute? ⇒ Boolean
Returns whether the URL includes scheme and host.
151 |
# File 'lib/html2rss/url.rb', line 151 def absolute? = @uri.absolute? |
#channel_titleized ⇒ String
Returns a titleized representation of the URL with prefixed host. Creates a channel title by combining host and path information. Useful for RSS channel titles that need to identify the source.
225 226 227 228 229 230 |
# File 'lib/html2rss/url.rb', line 225 def channel_titleized nicer_path = CGI.unescapeURIComponent(@uri.path).split('/').reject(&:empty?) host = @uri.host nicer_path.any? ? "#{host}: #{nicer_path.map(&:capitalize).join(' ')}" : host end |
#eql?(other) ⇒ Boolean
Supports hash-based comparisons by ensuring equality semantics match ‘hash`.
252 |
# File 'lib/html2rss/url.rb', line 252 def eql?(other) = other.is_a?(Url) && to_s == other.to_s |
#fragment ⇒ String?
Returns URI fragment without leading ‘#`.
148 |
# File 'lib/html2rss/url.rb', line 148 def fragment = @uri.fragment |
#hash ⇒ Integer
Returns the hash code for this URL.
258 |
# File 'lib/html2rss/url.rb', line 258 def hash = to_s.hash |
#host ⇒ String?
Returns URI host component.
136 |
# File 'lib/html2rss/url.rb', line 136 def host = @uri.host |
#inspect ⇒ String
Returns a string representation of the URL for debugging.
264 |
# File 'lib/html2rss/url.rb', line 264 def inspect = "#<#{self.class}:#{object_id} @uri=#{@uri.inspect}>" |
#path ⇒ String?
Returns URI path component.
142 |
# File 'lib/html2rss/url.rb', line 142 def path = @uri.path |
#path_segments ⇒ Array<String>
Returns the URL path split into non-empty segments.
163 |
# File 'lib/html2rss/url.rb', line 163 def path_segments = @uri.path.to_s.split('/').reject(&:empty?) |
#port ⇒ Integer?
Returns URI port component.
139 |
# File 'lib/html2rss/url.rb', line 139 def port = @uri.port |
#query ⇒ String?
Returns URI query string without leading ‘?`.
145 |
# File 'lib/html2rss/url.rb', line 145 def query = @uri.query |
#query_values ⇒ Hash{String => String}
Returns the URL query string as a hash of string keys and values.
157 |
# File 'lib/html2rss/url.rb', line 157 def query_values = @uri.query_values(Hash) || {} |
#scheme ⇒ String?
Returns URI scheme, for example ‘http` or `https`.
133 |
# File 'lib/html2rss/url.rb', line 133 def scheme = @uri.scheme |
#titleized ⇒ String
Returns a titleized representation of the URL path. Converts the path to a human-readable title by cleaning and capitalizing words. Removes file extensions and special characters, then capitalizes each word.
199 200 201 202 203 204 205 206 207 208 209 210 211 |
# File 'lib/html2rss/url.rb', line 199 def titleized path = @uri.path return '' if path.empty? nicer_path = CGI.unescapeURIComponent(path) .split('/') .flat_map do |part| part.gsub(/[^a-zA-Z0-9.]/, ' ').gsub(/\s+/, ' ').split end nicer_path.map!(&:capitalize) File.basename(nicer_path.join(' '), '.*') end |
#to_s ⇒ String
Returns normalized URL string.
130 |
# File 'lib/html2rss/url.rb', line 130 def to_s = @uri.to_s |
#with_path(path) ⇒ Url
Returns a copy of the URL with the provided path.
170 171 172 173 174 |
# File 'lib/html2rss/url.rb', line 170 def with_path(path) uri = @uri.dup uri.path = path self.class.from_absolute(uri.normalize.to_s) end |
#with_query_values(values) ⇒ Url
Returns a copy of the URL with the provided query values.
181 182 183 184 185 |
# File 'lib/html2rss/url.rb', line 181 def with_query_values(values) uri = @uri.dup uri.query_values = values.transform_keys(&:to_s).transform_values(&:to_s) self.class.from_absolute(uri.normalize.to_s) end |