Class: Archaeo::UrlNormalizer
- Inherits:
-
Object
- Object
- Archaeo::UrlNormalizer
- Defined in:
- lib/archaeo/url_normalizer.rb
Overview
Sanitizes and normalizes URLs for Wayback Machine API queries.
Handles common URL issues: whitespace, surrounding quotes, double percent-encoding, and inconsistent percent-encoding case.
Constant Summary collapse
- VALID_URL_RE =
%r{\A([a-z][a-z0-9+\-.]*://)?[^\s]+\z}
Instance Attribute Summary collapse
-
#normalized ⇒ Object
readonly
Returns the value of attribute normalized.
-
#original ⇒ Object
readonly
Returns the value of attribute original.
Class Method Summary collapse
Instance Method Summary collapse
-
#initialize(url) ⇒ UrlNormalizer
constructor
A new instance of UrlNormalizer.
- #to_s ⇒ Object
Constructor Details
#initialize(url) ⇒ UrlNormalizer
Returns a new instance of UrlNormalizer.
11 12 13 14 |
# File 'lib/archaeo/url_normalizer.rb', line 11 def initialize(url) @original = url.to_s @normalized = normalize(@original) end |
Instance Attribute Details
#normalized ⇒ Object (readonly)
Returns the value of attribute normalized.
9 10 11 |
# File 'lib/archaeo/url_normalizer.rb', line 9 def normalized @normalized end |
#original ⇒ Object (readonly)
Returns the value of attribute original.
9 10 11 |
# File 'lib/archaeo/url_normalizer.rb', line 9 def original @original end |
Class Method Details
.normalize(url) ⇒ Object
16 17 18 |
# File 'lib/archaeo/url_normalizer.rb', line 16 def self.normalize(url) new(url).normalized end |
.valid?(url) ⇒ Boolean
27 28 29 30 31 32 |
# File 'lib/archaeo/url_normalizer.rb', line 27 def self.valid?(url) normalized = normalize(url) return false if normalized.empty? normalized.match?(VALID_URL_RE) end |
.validate!(url) ⇒ Object
34 35 36 37 38 39 40 |
# File 'lib/archaeo/url_normalizer.rb', line 34 def self.validate!(url) normalized = normalize(url) raise ArgumentError, "URL cannot be empty" if normalized.empty? raise ArgumentError, "Invalid URL: #{url}" unless valid?(url) normalized end |
.with_scheme(url) ⇒ Object
20 21 22 23 |
# File 'lib/archaeo/url_normalizer.rb', line 20 def self.with_scheme(url) normalized = normalize(url) normalized.match?(%r{\A[a-z][a-z0-9+\-.]*://}) ? normalized : "https://#{normalized}" end |
Instance Method Details
#to_s ⇒ Object
42 43 44 |
# File 'lib/archaeo/url_normalizer.rb', line 42 def to_s @normalized end |