Class: Html2rss::Url

Inherits: Object

Includes: Comparable

Defined in: lib/html2rss/url.rb

Overview

A value object representing a resolved, absolute URL with built-in operations. Provides URL resolution, sanitization, and titleization capabilities.

Examples:

Creating a URL from a relative path

url = Url.from_relative('/path/to/article', 'https://example.com')
url.to_s # => "https://example.com/path/to/article"

Sanitizing a raw URL string

url = Url.sanitize('https://example.com/  ')
url.to_s # => "https://example.com/"

Getting titleized versions

url = Url.from_relative('/foo-bar/baz.txt', 'https://example.com')
url.titleized # => "Foo Bar Baz"
url.channel_titleized # => "example.com: Foo-bar Baz.txt"

Constant Summary

URI_REGEXP = Addressable::URI::URIREGEX

Regular expression for basic URI format validation.

SUPPORTED_SCHEMES = %w[http https].to_set.freeze

Schemes accepted by channel URL validation.

Class Method Summary

.for_channel(url_string) ⇒ Url?

Creates a URL for channel use with validation. Returns nil for nil or blank input.
.from_absolute(url_string) ⇒ Url

Creates a URL from an already-absolute URL string.
.from_relative(relative_url, base_url) ⇒ Url

Creates a URL from a relative path and base URL.
.sanitize(raw_url) ⇒ Url?

Creates a URL by sanitizing a raw URL string.

Instance Method Summary

#<=>(other) ⇒ Integer

Compares this URL with another URL for ordering, based on their string representations.
#==(other) ⇒ Boolean

Returns true if this URL is equal to another URL.
#absolute? ⇒ Boolean

Whether the URL includes scheme and host.
#channel_titleized ⇒ String

Returns a titleized representation of the URL with prefixed host.
#eql?(other) ⇒ Boolean

Supports hash-based comparisons by ensuring equality semantics match `hash`.
#fragment ⇒ String?

URI fragment without leading `#`.
#hash ⇒ Integer

Returns the hash code for this URL.
#host ⇒ String?

URI host component.
#initialize(uri) ⇒ Url constructor

A new instance of Url.
#inspect ⇒ String

Returns a string representation of the URL for debugging.
#path ⇒ String?

URI path component.
#path_segments ⇒ Array<String>

Returns the URL path split into non-empty segments.
#port ⇒ Integer?

URI port component.
#query ⇒ String?

URI query string without leading `?`.
#query_values ⇒ Hash{String => String}

Returns the URL query string as a hash of string keys and values.
#scheme ⇒ String?

URI scheme, for example `http` or `https`.
#titleized ⇒ String

Returns a titleized representation of the URL path.
#to_s ⇒ String

Normalized URL string.
#with_path(path) ⇒ Url

Returns a copy of the URL with the provided path.
#with_query_values(values) ⇒ Url

Returns a copy of the URL with the provided query values.

Constructor Details

#initialize(uri) ⇒ `Url`

Returns a new instance of Url.

Parameters:

uri (Addressable::URI) —

the underlying Addressable::URI object (internal use only)

# File 'lib/html2rss/url.rb', line 126

def initialize(uri)
  @uri = uri.freeze
  freeze
end
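
The constructor freezes both the wrapped URI and the wrapper itself, which is what lets the class act as an immutable value object. A minimal sketch of that pattern follows. The `FrozenValue` class, its `rename!` method, and the use of a plain String in place of `Addressable::URI` are assumptions made for illustration and are not part of html2rss.

```ruby
# Hypothetical stand-in for the freeze-on-construct pattern.
class FrozenValue
  def initialize(value)
    @value = value.freeze # freeze the wrapped object
    freeze                # then freeze the wrapper itself
  end

  def to_s = @value.to_s

  # Tries to reassign an instance variable, which a frozen object forbids.
  def rename!(new_value)
    @value = new_value
  end
end

value = FrozenValue.new(+'https://example.com/')
value.frozen? # => true

error =
  begin
    value.rename!('https://other.example/')
    nil
  rescue FrozenError => e
    e
  end
error.class # => FrozenError
```

Because no state can change after construction, instances built this way are safe to share and to use as Hash keys.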

Class Method Details

.for_channel(url_string) ⇒ `Url?`

Creates a URL for channel use with validation. Returns nil when the input is nil or blank; otherwise validates that the URL meets channel requirements (absolute, no @, supported schemes).

Examples:

Creating a channel URL

Url.for_channel('https://example.com')
# => #<Html2rss::Url:... @uri=#<Addressable::URI:... URI:https://example.com>>

Invalid channel URL

Url.for_channel('/relative/path')
# => raises ArgumentError: "URL must be absolute"

Parameters:

url_string (String) —

the URL string to validate and parse

Returns:

(Url, nil) —

the validated and parsed URL, or nil if url_string is nil or blank

Raises:

(ArgumentError) —

if the URL doesn’t meet channel requirements

# File 'lib/html2rss/url.rb', line 94

def self.for_channel(url_string)
  return nil if url_string.nil? || url_string.empty?

  stripped = url_string.strip
  return nil if stripped.empty?

  url = from_absolute(stripped)
  validate_channel_url(url)
  url
end
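
The guard-then-validate flow above can be approximated with the standard library alone. In this sketch, `channel_uri`, the `SUPPORTED` constant, and the "unsupported scheme" message are hypothetical; the body of `validate_channel_url` is not shown in this page, so its checks are assumed from the description (absolute URL, supported scheme).

```ruby
require 'uri'
require 'set'

SUPPORTED = %w[http https].to_set.freeze

# Hypothetical helper: returns nil for blank input, raises for invalid URLs.
def channel_uri(url_string)
  return nil if url_string.nil?

  stripped = url_string.strip
  return nil if stripped.empty?

  uri = URI.parse(stripped)
  raise ArgumentError, 'URL must be absolute' unless uri.absolute? && uri.host
  raise ArgumentError, 'unsupported scheme' unless SUPPORTED.include?(uri.scheme)

  uri
end

channel_uri(nil)                      # => nil
channel_uri('   ')                    # => nil
channel_uri(' https://example.com ').to_s # => "https://example.com"
```

Calling `channel_uri('/relative/path')` raises `ArgumentError` with the message "URL must be absolute", so callers need to handle both the nil return and the exception.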

.from_absolute(url_string) ⇒ `Url`

Creates a URL from an already-absolute URL string.

Parameters:

url_string (String, Html2rss::Url) —

the absolute URL to parse

Returns:

(Url) —

the parsed and normalized URL

Raises:

(ArgumentError) —

if the URL is not absolute or cannot be parsed

# File 'lib/html2rss/url.rb', line 70

def self.from_absolute(url_string)
  return url_string if url_string.is_a?(self)

  url = new(Addressable::URI.parse(url_string.to_s.strip).normalize)
  raise ArgumentError, 'URL must be absolute' unless url.absolute?

  url
rescue Addressable::URI::InvalidURIError
  raise ArgumentError, 'URL must be absolute'
end

.from_relative(relative_url, base_url) ⇒ `Url`

Creates a URL from a relative path and base URL.

Parameters:

relative_url (String, Html2rss::Url) —

the relative URL to resolve
base_url (String, Html2rss::Url) —

the base URL to resolve against

Returns:

(Url) —

the resolved absolute URL

Raises:

(ArgumentError) —

if the URL cannot be parsed

# File 'lib/html2rss/url.rb', line 38

def self.from_relative(relative_url, base_url)
  url = Addressable::URI.parse(relative_url.to_s.strip)
  return new(url) if url.absolute?

  base_uri = Addressable::URI.parse(base_url.to_s)
  base_uri.path = '/' if base_uri.path.empty?

  new(base_uri.join(url).normalize)
rescue Addressable::URI::InvalidURIError
  raise ArgumentError, 'URL could not be parsed'
end
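
The resolution steps (pass absolute input through, default an empty base path to "/", then join) can be sketched with Ruby's standard `uri` library. The `resolve` helper is hypothetical, and stdlib `URI` is swapped in for `Addressable::URI`, so normalization details may differ from the real method.

```ruby
require 'uri'

# Hypothetical stdlib-only approximation of relative URL resolution.
def resolve(relative_url, base_url)
  candidate = URI.parse(relative_url.to_s.strip)
  return candidate if candidate.absolute? # already absolute: nothing to join

  base = URI.parse(base_url.to_s)
  base.path = '/' if base.path.empty?     # give a bare host a root path
  base.merge(candidate)                   # RFC 3986 reference resolution
rescue URI::InvalidURIError
  raise ArgumentError, 'URL could not be parsed'
end

resolve('/path/to/article', 'https://example.com').to_s
# => "https://example.com/path/to/article"
resolve('article', 'https://example.com/blog/index.html').to_s
# => "https://example.com/blog/article"
resolve('https://other.example/x', 'https://example.com').to_s
# => "https://other.example/x"
```

Note that a path without a leading slash replaces only the last segment of the base path, which is standard reference-resolution behavior.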

.sanitize(raw_url) ⇒ `Url?`

Creates a URL by sanitizing a raw URL string. Extracts the first http, https, ftp, or mailto URL found in the string, ignoring surrounding whitespace, and normalizes it.

Parameters:

raw_url (String) —

the raw URL string to sanitize

Returns:

(Url, nil) —

the sanitized URL, or nil if no valid URL found

# File 'lib/html2rss/url.rb', line 56

def self.sanitize(raw_url)
  matched_urls = raw_url.to_s.scan(%r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+})
  url = matched_urls.first.to_s.strip
  return nil if url.empty?

  new(Addressable::URI.parse(url).normalize)
end
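
The extraction step relies only on the regular expression shown above, so it can be tried in isolation. In this sketch, `URL_PATTERN` and `first_url` are hypothetical names; the pattern itself is copied from the method source, and the final parse and normalize step is left out.

```ruby
# Pattern copied from the method source above.
URL_PATTERN = %r{(?:(?:https?|ftp|mailto)://|mailto:)[^\s<>"]+}

# Hypothetical helper: returns the first matching URL string, or nil.
def first_url(raw)
  match = raw.to_s.scan(URL_PATTERN).first.to_s.strip
  match.empty? ? nil : match
end

first_url('  https://example.com/  ')
# => "https://example.com/"
first_url('see <a href="https://example.com/a">link</a>')
# => "https://example.com/a"
first_url('mailto:jane@example.com')
# => "mailto:jane@example.com"
first_url('no url here')
# => nil
```

The character class stops at whitespace, angle brackets, and double quotes, which is why a URL embedded in an HTML attribute is cut off cleanly.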

Instance Method Details

#<=>(other) ⇒ `Integer`

Compares this URL with another URL for ordering. The comparison uses the string representations, so two URLs compare as equal (0) when their strings are the same.

Parameters:

other (Url) —

the other URL to compare with

Returns:

(Integer) —

-1, 0, or 1 for less than, equal, or greater than

# File 'lib/html2rss/url.rb', line 240

def <=>(other) = to_s <=> other.to_s

#==(other) ⇒ `Boolean`

Returns true if this URL is equal to another URL.

Parameters:

other (Object) —

the other object to compare with

Returns:

(Boolean) —

true if the URLs are equal

# File 'lib/html2rss/url.rb', line 247

def ==(other) = other.is_a?(Url) && to_s == other.to_s

#absolute? ⇒ `Boolean`

Returns whether the URL includes scheme and host.

Returns:

(Boolean) —

whether the URL includes scheme and host

# File 'lib/html2rss/url.rb', line 153

def absolute? = @uri.absolute?

#channel_titleized ⇒ `String`

Returns a titleized representation of the URL with prefixed host. Creates a channel title by combining host and path information. Useful for RSS channel titles that need to identify the source.

Examples:

With path

url = Url.from_absolute('https://example.com/foo-bar/baz')
url.channel_titleized # => "example.com: Foo-bar Baz"

Without path (root URL)

url = Url.from_absolute('https://example.com')
url.channel_titleized # => "example.com"

Returns:

(String) —

the titleized channel URL

# File 'lib/html2rss/url.rb', line 227

def channel_titleized
  nicer_path = CGI.unescapeURIComponent(@uri.path).split('/').reject(&:empty?)
  host = @uri.host

  nicer_path.any? ? "#{host}: #{nicer_path.map(&:capitalize).join(' ')}" : host
end
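
The channel title logic depends only on a host and a path string, so it can be sketched without a URL object. The `channel_title` helper is hypothetical, and `URI.decode_www_form_component` stands in for `CGI.unescapeURIComponent`; unlike the original call, the stand-in also decodes "+" as a space.

```ruby
require 'uri'

# Hypothetical helper mirroring the host-plus-path title format.
def channel_title(host, path)
  segments = URI.decode_www_form_component(path).split('/').reject(&:empty?)
  return host if segments.empty?

  "#{host}: #{segments.map(&:capitalize).join(' ')}"
end

channel_title('example.com', '/news/latest') # => "example.com: News Latest"
channel_title('example.com', '/')            # => "example.com"
channel_title('example.com', '')             # => "example.com"
```

Each path segment is capitalized as a whole, so characters inside a segment are not split into separate words.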

#eql?(other) ⇒ `Boolean`

Supports hash-based comparisons by ensuring equality semantics match `hash`.

Parameters:

other (Object) —

the other object to compare with

Returns:

(Boolean) —

true if the URLs are considered equal

# File 'lib/html2rss/url.rb', line 254

def eql?(other) = other.is_a?(Url) && to_s == other.to_s

#fragment ⇒ `String?`

Returns URI fragment without leading `#`.

Returns:

(String, nil) —

URI fragment without leading `#`

# File 'lib/html2rss/url.rb', line 150

def fragment = @uri.fragment

#hash ⇒ `Integer`

Returns the hash code for this URL.

Returns:

(Integer) —

the hash code

# File 'lib/html2rss/url.rb', line 260

def hash = to_s.hash
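
The comparison methods (`<=>`, `==`, `eql?`, `hash`) all delegate to `to_s`. The sketch below reproduces that wiring in a hypothetical `Link` class wrapping a plain String, to show how sorting, `uniq`, and Hash lookups behave as a result. It is an illustration of the pattern, not the html2rss class itself.

```ruby
# Hypothetical value object with string-based comparison.
class Link
  include Comparable

  def initialize(value)
    @value = value.to_s.freeze
    freeze
  end

  def to_s = @value
  def <=>(other) = to_s <=> other.to_s
  def ==(other) = other.is_a?(Link) && to_s == other.to_s
  def eql?(other) = other.is_a?(Link) && to_s == other.to_s
  def hash = to_s.hash
end

a = Link.new('https://example.com/a')
b = Link.new('https://example.com/b')
copy = Link.new('https://example.com/a')

[b, a].sort.map(&:to_s) # => ["https://example.com/a", "https://example.com/b"]
a == copy               # => true
[a, copy, b].uniq.size  # => 2
{ a => 1 }[copy]        # => 1

# == checks the class, while <=> only compares strings.
a == 'https://example.com/a'  # => false
a <=> 'https://example.com/a' # => 0
```

The last two lines show a subtlety of this design: ordering accepts anything that responds to `to_s`, while equality requires the same class.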

#host ⇒ `String?`

Returns URI host component.

Returns:

(String, nil) —

URI host component

# File 'lib/html2rss/url.rb', line 138

def host = @uri.host

#inspect ⇒ `String`

Returns a string representation of the URL for debugging.

Returns:

(String) —

the debug representation

# File 'lib/html2rss/url.rb', line 266

def inspect = "#<#{self.class}:#{object_id} @uri=#{@uri.inspect}>"

#path ⇒ `String?`

Returns URI path component.

Returns:

(String, nil) —

URI path component

# File 'lib/html2rss/url.rb', line 144

def path = @uri.path

#path_segments ⇒ `Array<String>`

Returns the URL path split into non-empty segments.

Returns:

(Array<String>) —

normalized path segments

# File 'lib/html2rss/url.rb', line 165

def path_segments = @uri.path.to_s.split('/').reject(&:empty?)

#port ⇒ `Integer?`

Returns URI port component.

Returns:

(Integer, nil) —

URI port component

# File 'lib/html2rss/url.rb', line 141

def port = @uri.port

#query ⇒ `String?`

Returns URI query string without leading `?`.

Returns:

(String, nil) —

URI query string without leading `?`

# File 'lib/html2rss/url.rb', line 147

def query = @uri.query

#query_values ⇒ `Hash{String => String}`

Returns the URL query string as a hash of string keys and values.

Returns:

(Hash{String => String}) —

normalized query parameters

# File 'lib/html2rss/url.rb', line 159

def query_values = @uri.query_values(Hash) || {}
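
Both `path_segments` and `query_values` return an empty collection rather than nil when the component is missing. A rough stdlib equivalent is sketched below. The helper names `segments_of` and `params_of` are hypothetical, and `URI.decode_www_form` is used in place of Addressable's `query_values`, so handling of repeated keys may differ.

```ruby
require 'uri'

# Hypothetical helpers returning empty collections for missing components.
def segments_of(uri)
  uri.path.to_s.split('/').reject(&:empty?)
end

def params_of(uri)
  URI.decode_www_form(uri.query.to_s).to_h
end

uri = URI.parse('https://example.com/blog/2024/?page=2&tag=ruby')
segments_of(uri) # => ["blog", "2024"]
params_of(uri)   # => {"page"=>"2", "tag"=>"ruby"}

bare = URI.parse('https://example.com')
segments_of(bare) # => []
params_of(bare)   # => {}
```

Returning empty collections lets callers iterate or look up keys without a nil check.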

#scheme ⇒ `String?`

Returns URI scheme, for example `http` or `https`.

Returns:

(String, nil) —

URI scheme, for example `http` or `https`

# File 'lib/html2rss/url.rb', line 135

def scheme = @uri.scheme

#titleized ⇒ `String`

Returns a titleized representation of the URL path. Converts the path to a human-readable title by cleaning and capitalizing words. Removes file extensions and special characters, then capitalizes each word.

Examples:

Basic titleization

url = Url.from_absolute('https://example.com/foo-bar/baz.txt')
url.titleized # => "Foo Bar Baz"

With URL encoding

url = Url.from_absolute('https://example.com/hello%20world/article.html')
url.titleized # => "Hello World Article"

Returns:

(String) —

the titleized path, or empty string if path is empty

# File 'lib/html2rss/url.rb', line 201

def titleized
  path = @uri.path
  return '' if path.empty?

  nicer_path = CGI.unescapeURIComponent(path)
                  .split('/')
                  .flat_map do |part|
                    part.gsub(/[^a-zA-Z0-9.]/, ' ').gsub(/\s+/, ' ').split
                  end

  nicer_path.map!(&:capitalize)
  File.basename(nicer_path.join(' '), '.*')
end
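
The titleization pipeline can be exercised on a bare path string. In this sketch, `titleize_path` is a hypothetical helper, and `URI.decode_www_form_component` replaces `CGI.unescapeURIComponent` (an assumption made to stay within widely available stdlib calls; it additionally treats "+" as a space).

```ruby
require 'uri'

# Hypothetical helper following the same steps as the method above.
def titleize_path(path)
  return '' if path.empty?

  words = URI.decode_www_form_component(path)
             .split('/')
             .flat_map { |part| part.gsub(/[^a-zA-Z0-9.]/, ' ').split }
             .map(&:capitalize)

  # File.basename with '.*' drops the trailing extension, if any.
  File.basename(words.join(' '), '.*')
end

titleize_path('/foo-bar/baz.txt')            # => "Foo Bar Baz"
titleize_path('/hello%20world/article.html') # => "Hello World Article"
titleize_path('')                            # => ""
```

Dots are kept during cleanup so that the extension survives until the final `File.basename` call removes it.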

#to_s ⇒ `String`

Returns normalized URL string.

Returns:

(String) —

normalized URL string

# File 'lib/html2rss/url.rb', line 132

def to_s = @uri.to_s

#with_path(path) ⇒ `Url`

Returns a copy of the URL with the provided path.

Parameters:

path (String) —

normalized absolute path

Returns:

(Url) —

a new URL with the updated path

# File 'lib/html2rss/url.rb', line 172

def with_path(path)
  uri = @uri.dup
  uri.path = path
  self.class.from_absolute(uri.normalize.to_s)
end

#with_query_values(values) ⇒ `Url`

Returns a copy of the URL with the provided query values.

Parameters:

values (Hash{String, Symbol => #to_s}) —

query parameters to assign

Returns:

(Url) —

a new URL with the updated query string

# File 'lib/html2rss/url.rb', line 183

def with_query_values(values)
  uri = @uri.dup
  uri.query_values = values.transform_keys(&:to_s).transform_values(&:to_s)
  self.class.from_absolute(uri.normalize.to_s)
end
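
Both `with_path` and `with_query_values` follow a copy-then-modify approach: duplicate the underlying URI, change one component, and build a new object. A stdlib sketch of the same idea is shown below. The `with_query` helper is hypothetical, and it uses `URI.encode_www_form` instead of Addressable's `query_values=`.

```ruby
require 'uri'

# Hypothetical helper: returns a modified copy, leaving the original untouched.
def with_query(uri, values)
  copy = uri.dup
  stringified = values.transform_keys(&:to_s).transform_values(&:to_s)
  copy.query = URI.encode_www_form(stringified)
  copy
end

original = URI.parse('https://example.com/search?q=old')
updated = with_query(original, { q: 'ruby', page: 2 })

updated.to_s  # => "https://example.com/search?q=ruby&page=2"
original.to_s # => "https://example.com/search?q=old"
```

Converting keys and values with `to_s` first means Symbol keys and Integer values are accepted, matching the documented parameter type.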
