Class: Html2rss::Url

Inherits:
Object
  • Object
show all
Includes:
Comparable
Defined in:
lib/html2rss/url.rb

Overview

A value object representing a resolved, absolute URL with built-in operations. Provides URL resolution, sanitization, and titleization capabilities.

Examples:

Creating a URL from a relative path

url = Url.from_relative('/path/to/article', 'https://example.com')
url.to_s # => "https://example.com/path/to/article"

Sanitizing a raw URL string

url = Url.sanitize('https://example.com/  ')
url.to_s # => "https://example.com/"

Getting titleized versions

url = Url.from_relative('/foo-bar/baz.txt', 'https://example.com')
url.titleized # => "Foo Bar Baz"
url.channel_titleized # => "example.com: Foo Bar Baz"

Constant Summary collapse

URI_REGEXP =

Regular expression for basic URI format validation

Addressable::URI::URIREGEX
SUPPORTED_SCHEMES =

Schemes accepted by channel URL validation.

%w[http https].to_set.freeze

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(uri) ⇒ Url

Returns a new instance of Url.

Parameters:

  • uri (Addressable::URI)

    the underlying Addressable::URI object (internal use only)



124
125
126
127
# File 'lib/html2rss/url.rb', line 124

def initialize(uri)
  @uri = uri.freeze
  freeze
end

Class Method Details

.for_channel(url_string) ⇒ Url

Creates a URL for channel use with validation. Validates that the URL meets channel requirements (absolute, no @, supported schemes).

Examples:

Creating a channel URL

Url.for_channel('https://example.com')
# => #<Html2rss::Url:... @uri=#<Addressable::URI:... URI:https://example.com>>

Invalid channel URL

Url.for_channel('/relative/path')
# => raises ArgumentError: "URL must be absolute"

Parameters:

  • url_string (String)

    the URL string to validate and parse

Returns:

  • (Url)

    the validated and parsed URL

Raises:

  • (ArgumentError)

    if the URL doesn’t meet channel requirements



92
93
94
95
96
97
98
99
100
101
# File 'lib/html2rss/url.rb', line 92

def self.for_channel(url_string)
  return nil if url_string.nil? || url_string.empty?

  stripped = url_string.strip
  return nil if stripped.empty?

  url = from_absolute(stripped)
  validate_channel_url(url)
  url
end

.from_absolute(url_string) ⇒ Url

Creates a URL from an already-absolute URL string.

Parameters:

  • url_string (String, Html2rss::Url)

    the absolute URL to parse

Returns:

  • (Url)

    the parsed and normalized URL

Raises:

  • (ArgumentError)

    if the URL is not absolute or cannot be parsed



68
69
70
71
72
73
74
75
76
77
# File 'lib/html2rss/url.rb', line 68

def self.from_absolute(url_string)
  return url_string if url_string.is_a?(self)

  url = new(Addressable::URI.parse(url_string.to_s.strip).normalize)
  raise ArgumentError, 'URL must be absolute' unless url.absolute?

  url
rescue Addressable::URI::InvalidURIError
  raise ArgumentError, 'URL must be absolute'
end

.from_relative(relative_url, base_url) ⇒ Url

Creates a URL from a relative path and base URL.

Parameters:

  • relative_url (String, Html2rss::Url)

    the relative URL to resolve

  • base_url (String, Html2rss::Url)

    the base URL to resolve against

Returns:

  • (Url)

    the resolved absolute URL

Raises:

  • (ArgumentError)

    if the URL cannot be parsed



38
39
40
41
42
43
44
45
46
# File 'lib/html2rss/url.rb', line 38

def self.from_relative(relative_url, base_url)
  url = Addressable::URI.parse(relative_url.to_s.strip)
  return new(url) if url.absolute?

  base_uri = Addressable::URI.parse(base_url.to_s)
  base_uri.path = '/' if base_uri.path.empty?

  new(base_uri.join(url).normalize)
end

.sanitize(raw_url) ⇒ Url?

Creates a URL by sanitizing a raw URL string. Removes spaces and extracts the first valid URL from the string.

Parameters:

  • raw_url (String)

    the raw URL string to sanitize

Returns:

  • (Url, nil)

    the sanitized URL, or nil if no valid URL found



54
55
56
57
58
59
60
# File 'lib/html2rss/url.rb', line 54

def self.sanitize(raw_url)
  matched_urls = raw_url.to_s.scan(%r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+})
  url = matched_urls.first.to_s.strip
  return nil if url.empty?

  new(Addressable::URI.parse(url).normalize)
end

Instance Method Details

#<=>(other) ⇒ Integer

Compares this URL with another URL for equality. URLs are considered equal if their string representations are the same.

Parameters:

  • other (Url)

    the other URL to compare with

Returns:

  • (Integer)

    -1, 0, or 1 for less than, equal, or greater than



238
# File 'lib/html2rss/url.rb', line 238

def <=>(other) = to_s <=> other.to_s

#==(other) ⇒ Boolean

Returns true if this URL is equal to another URL.

Parameters:

  • other (Object)

    the other object to compare with

Returns:

  • (Boolean)

    true if the URLs are equal



245
# File 'lib/html2rss/url.rb', line 245

def ==(other) = other.is_a?(Url) && to_s == other.to_s

#absolute?Boolean

Returns whether the URL includes scheme and host.

Returns:

  • (Boolean)

    whether the URL includes scheme and host



151
# File 'lib/html2rss/url.rb', line 151

def absolute? = @uri.absolute?

#channel_titleizedString

Returns a titleized representation of the URL with prefixed host. Creates a channel title by combining host and path information. Useful for RSS channel titles that need to identify the source.

Examples:

With path

url = Url.from_absolute('https://example.com/foo-bar/baz')
url.channel_titleized # => "example.com: Foo Bar Baz"

Without path (root URL)

url = Url.from_absolute('https://example.com')
url.channel_titleized # => "example.com"

Returns:

  • (String)

    the titleized channel URL



225
226
227
228
229
230
# File 'lib/html2rss/url.rb', line 225

def channel_titleized
  nicer_path = CGI.unescapeURIComponent(@uri.path).split('/').reject(&:empty?)
  host = @uri.host

  nicer_path.any? ? "#{host}: #{nicer_path.map(&:capitalize).join(' ')}" : host
end

#eql?(other) ⇒ Boolean

Supports hash-based comparisons by ensuring equality semantics match ‘hash`.

Parameters:

  • other (Object)

    the other object to compare with

Returns:

  • (Boolean)

    true if the URLs are considered equal



252
# File 'lib/html2rss/url.rb', line 252

def eql?(other) = other.is_a?(Url) && to_s == other.to_s

#fragmentString?

Returns URI fragment without leading ‘#`.

Returns:

  • (String, nil)

    URI fragment without leading ‘#`



148
# File 'lib/html2rss/url.rb', line 148

def fragment = @uri.fragment

#hashInteger

Returns the hash code for this URL.

Returns:

  • (Integer)

    the hash code



258
# File 'lib/html2rss/url.rb', line 258

def hash = to_s.hash

#hostString?

Returns URI host component.

Returns:

  • (String, nil)

    URI host component



136
# File 'lib/html2rss/url.rb', line 136

def host = @uri.host

#inspectString

Returns a string representation of the URL for debugging.

Returns:

  • (String)

    the debug representation



264
# File 'lib/html2rss/url.rb', line 264

def inspect = "#<#{self.class}:#{object_id} @uri=#{@uri.inspect}>"

#pathString?

Returns URI path component.

Returns:

  • (String, nil)

    URI path component



142
# File 'lib/html2rss/url.rb', line 142

def path = @uri.path

#path_segmentsArray<String>

Returns the URL path split into non-empty segments.

Returns:

  • (Array<String>)

    normalized path segments



163
# File 'lib/html2rss/url.rb', line 163

def path_segments = @uri.path.to_s.split('/').reject(&:empty?)

#portInteger?

Returns URI port component.

Returns:

  • (Integer, nil)

    URI port component



139
# File 'lib/html2rss/url.rb', line 139

def port = @uri.port

#queryString?

Returns URI query string without leading ‘?`.

Returns:

  • (String, nil)

    URI query string without leading ‘?`



145
# File 'lib/html2rss/url.rb', line 145

def query = @uri.query

#query_valuesHash{String => String}

Returns the URL query string as a hash of string keys and values.

Returns:

  • (Hash{String => String})

    normalized query parameters



157
# File 'lib/html2rss/url.rb', line 157

def query_values = @uri.query_values(Hash) || {}

#schemeString?

Returns URI scheme, for example ‘http` or `https`.

Returns:

  • (String, nil)

    URI scheme, for example ‘http` or `https`



133
# File 'lib/html2rss/url.rb', line 133

def scheme = @uri.scheme

#titleizedString

Returns a titleized representation of the URL path. Converts the path to a human-readable title by cleaning and capitalizing words. Removes file extensions and special characters, then capitalizes each word.

Examples:

Basic titleization

url = Url.from_absolute('https://example.com/foo-bar/baz.txt')
url.titleized # => "Foo Bar Baz"

With URL encoding

url = Url.from_absolute('https://example.com/hello%20world/article.html')
url.titleized # => "Hello World Article"

Returns:

  • (String)

    the titleized path, or empty string if path is empty



199
200
201
202
203
204
205
206
207
208
209
210
211
# File 'lib/html2rss/url.rb', line 199

def titleized
  path = @uri.path
  return '' if path.empty?

  nicer_path = CGI.unescapeURIComponent(path)
                  .split('/')
                  .flat_map do |part|
                    part.gsub(/[^a-zA-Z0-9.]/, ' ').gsub(/\s+/, ' ').split
                  end

  nicer_path.map!(&:capitalize)
  File.basename(nicer_path.join(' '), '.*')
end

#to_sString

Returns normalized URL string.

Returns:

  • (String)

    normalized URL string



130
# File 'lib/html2rss/url.rb', line 130

def to_s = @uri.to_s

#with_path(path) ⇒ Url

Returns a copy of the URL with the provided path.

Parameters:

  • path (String)

    normalized absolute path

Returns:

  • (Url)

    a new URL with the updated path



170
171
172
173
174
# File 'lib/html2rss/url.rb', line 170

def with_path(path)
  uri = @uri.dup
  uri.path = path
  self.class.from_absolute(uri.normalize.to_s)
end

#with_query_values(values) ⇒ Url

Returns a copy of the URL with the provided query values.

Parameters:

  • values (Hash{String, Symbol => #to_s})

    query parameters to assign

Returns:

  • (Url)

    a new URL with the updated query string



181
182
183
184
185
# File 'lib/html2rss/url.rb', line 181

def with_query_values(values)
  uri = @uri.dup
  uri.query_values = values.transform_keys(&:to_s).transform_values(&:to_s)
  self.class.from_absolute(uri.normalize.to_s)
end