Class: Html2rss::Url
Overview
A value object representing a resolved, absolute URL with built-in operations. Provides URL resolution, sanitization, and titleization capabilities.
Constant Summary collapse
- URI_REGEXP =
Regular expression for basic URI format validation
Addressable::URI::URIREGEX
- SUPPORTED_SCHEMES =
Schemes accepted by channel URL validation.
%w[http https].to_set.freeze
Class Method Summary collapse
-
.for_channel(url_string) ⇒ Url
Creates a URL for channel use with validation.
-
.from_absolute(url_string) ⇒ Url
Creates a URL from an already-absolute URL string.
-
.from_relative(relative_url, base_url) ⇒ Url
Creates a URL from a relative path and base URL.
-
.sanitize(raw_url) ⇒ Url?
Creates a URL by sanitizing a raw URL string.
Instance Method Summary collapse
-
#<=>(other) ⇒ Integer
Compares this URL with another URL for equality.
-
#==(other) ⇒ Boolean
Returns true if this URL is equal to another URL.
-
#absolute? ⇒ Boolean
Whether the URL includes scheme and host.
-
#channel_titleized ⇒ String
Returns a titleized representation of the URL with prefixed host.
-
#eql?(other) ⇒ Boolean
Supports hash-based comparisons by ensuring equality semantics match ‘hash`.
-
#fragment ⇒ String?
URI fragment without leading ‘#`.
-
#hash ⇒ Integer
Returns the hash code for this URL.
-
#host ⇒ String?
URI host component.
-
#initialize(uri) ⇒ Url
constructor
A new instance of Url.
-
#inspect ⇒ String
Returns a string representation of the URL for debugging.
-
#path ⇒ String?
URI path component.
-
#path_segments ⇒ Array<String>
Returns the URL path split into non-empty segments.
-
#port ⇒ Integer?
URI port component.
-
#query ⇒ String?
URI query string without leading ‘?`.
-
#query_values ⇒ Hash{String => String}
Returns the URL query string as a hash of string keys and values.
-
#scheme ⇒ String?
URI scheme, for example ‘http` or `https`.
-
#titleized ⇒ String
Returns a titleized representation of the URL path.
-
#to_s ⇒ String
Normalized URL string.
-
#with_path(path) ⇒ Url
Returns a copy of the URL with the provided path.
-
#with_query_values(values) ⇒ Url
Returns a copy of the URL with the provided query values.
Constructor Details
#initialize(uri) ⇒ Url
Returns a new instance of Url.
126 127 128 129 |
# File 'lib/html2rss/url.rb', line 126 def initialize(uri) @uri = uri.freeze freeze end |
Class Method Details
.for_channel(url_string) ⇒ Url
Creates a URL for channel use with validation. Validates that the URL meets channel requirements (absolute, no @, supported schemes).
94 95 96 97 98 99 100 101 102 103 |
# File 'lib/html2rss/url.rb', line 94 def self.for_channel(url_string) return nil if url_string.nil? || url_string.empty? stripped = url_string.strip return nil if stripped.empty? url = from_absolute(stripped) validate_channel_url(url) url end |
.from_absolute(url_string) ⇒ Url
Creates a URL from an already-absolute URL string.
70 71 72 73 74 75 76 77 78 79 |
# File 'lib/html2rss/url.rb', line 70 def self.from_absolute(url_string) return url_string if url_string.is_a?(self) url = new(Addressable::URI.parse(url_string.to_s.strip).normalize) raise ArgumentError, 'URL must be absolute' unless url.absolute? url rescue Addressable::URI::InvalidURIError raise ArgumentError, 'URL must be absolute' end |
.from_relative(relative_url, base_url) ⇒ Url
Creates a URL from a relative path and base URL.
38 39 40 41 42 43 44 45 46 47 48 |
# File 'lib/html2rss/url.rb', line 38 def self.from_relative(relative_url, base_url) url = Addressable::URI.parse(relative_url.to_s.strip) return new(url) if url.absolute? base_uri = Addressable::URI.parse(base_url.to_s) base_uri.path = '/' if base_uri.path.empty? new(base_uri.join(url).normalize) rescue Addressable::URI::InvalidURIError raise ArgumentError, 'URL could not be parsed' end |
.sanitize(raw_url) ⇒ Url?
Creates a URL by sanitizing a raw URL string. Removes spaces and extracts the first valid URL from the string.
56 57 58 59 60 61 62 |
# File 'lib/html2rss/url.rb', line 56 def self.sanitize(raw_url) matched_urls = raw_url.to_s.scan(%r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+}) url = matched_urls.first.to_s.strip return nil if url.empty? new(Addressable::URI.parse(url).normalize) end |
Instance Method Details
#<=>(other) ⇒ Integer
Compares this URL with another URL for equality. URLs are considered equal if their string representations are the same.
240 |
# File 'lib/html2rss/url.rb', line 240 def <=>(other) = to_s <=> other.to_s |
#==(other) ⇒ Boolean
Returns true if this URL is equal to another URL.
247 |
# File 'lib/html2rss/url.rb', line 247 def ==(other) = other.is_a?(Url) && to_s == other.to_s |
#absolute? ⇒ Boolean
Returns whether the URL includes scheme and host.
153 |
# File 'lib/html2rss/url.rb', line 153 def absolute? = @uri.absolute? |
#channel_titleized ⇒ String
Returns a titleized representation of the URL with prefixed host. Creates a channel title by combining host and path information. Useful for RSS channel titles that need to identify the source.
227 228 229 230 231 232 |
# File 'lib/html2rss/url.rb', line 227 def channel_titleized nicer_path = CGI.unescapeURIComponent(@uri.path).split('/').reject(&:empty?) host = @uri.host nicer_path.any? ? "#{host}: #{nicer_path.map(&:capitalize).join(' ')}" : host end |
#eql?(other) ⇒ Boolean
Supports hash-based comparisons by ensuring equality semantics match ‘hash`.
254 |
# File 'lib/html2rss/url.rb', line 254 def eql?(other) = other.is_a?(Url) && to_s == other.to_s |
#fragment ⇒ String?
Returns URI fragment without leading ‘#`.
150 |
# File 'lib/html2rss/url.rb', line 150 def fragment = @uri.fragment |
#hash ⇒ Integer
Returns the hash code for this URL.
260 |
# File 'lib/html2rss/url.rb', line 260 def hash = to_s.hash |
#host ⇒ String?
Returns URI host component.
138 |
# File 'lib/html2rss/url.rb', line 138 def host = @uri.host |
#inspect ⇒ String
Returns a string representation of the URL for debugging.
266 |
# File 'lib/html2rss/url.rb', line 266 def inspect = "#<#{self.class}:#{object_id} @uri=#{@uri.inspect}>" |
#path ⇒ String?
Returns URI path component.
144 |
# File 'lib/html2rss/url.rb', line 144 def path = @uri.path |
#path_segments ⇒ Array<String>
Returns the URL path split into non-empty segments.
165 |
# File 'lib/html2rss/url.rb', line 165 def path_segments = @uri.path.to_s.split('/').reject(&:empty?) |
#port ⇒ Integer?
Returns URI port component.
141 |
# File 'lib/html2rss/url.rb', line 141 def port = @uri.port |
#query ⇒ String?
Returns URI query string without leading ‘?`.
147 |
# File 'lib/html2rss/url.rb', line 147 def query = @uri.query |
#query_values ⇒ Hash{String => String}
Returns the URL query string as a hash of string keys and values.
159 |
# File 'lib/html2rss/url.rb', line 159 def query_values = @uri.query_values(Hash) || {} |
#scheme ⇒ String?
Returns URI scheme, for example ‘http` or `https`.
135 |
# File 'lib/html2rss/url.rb', line 135 def scheme = @uri.scheme |
#titleized ⇒ String
Returns a titleized representation of the URL path. Converts the path to a human-readable title by cleaning and capitalizing words. Removes file extensions and special characters, then capitalizes each word.
201 202 203 204 205 206 207 208 209 210 211 212 213 |
# File 'lib/html2rss/url.rb', line 201 def titleized path = @uri.path return '' if path.empty? nicer_path = CGI.unescapeURIComponent(path) .split('/') .flat_map do |part| part.gsub(/[^a-zA-Z0-9.]/, ' ').gsub(/\s+/, ' ').split end nicer_path.map!(&:capitalize) File.basename(nicer_path.join(' '), '.*') end |
#to_s ⇒ String
Returns normalized URL string.
132 |
# File 'lib/html2rss/url.rb', line 132 def to_s = @uri.to_s |
#with_path(path) ⇒ Url
Returns a copy of the URL with the provided path.
172 173 174 175 176 |
# File 'lib/html2rss/url.rb', line 172 def with_path(path) uri = @uri.dup uri.path = path self.class.from_absolute(uri.normalize.to_s) end |
#with_query_values(values) ⇒ Url
Returns a copy of the URL with the provided query values.
183 184 185 186 187 |
# File 'lib/html2rss/url.rb', line 183 def with_query_values(values) uri = @uri.dup uri.query_values = values.transform_keys(&:to_s).transform_values(&:to_s) self.class.from_absolute(uri.normalize.to_s) end |