Class: Archaeo::UrlNormalizer

Inherits:
Object
Defined in:
lib/archaeo/url_normalizer.rb

Overview

Sanitizes and normalizes URLs for Wayback Machine API queries.

Handles common URL issues: whitespace, surrounding quotes, double percent-encoding, and inconsistent percent-encoding case.
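The private normalization routine is not shown on this page, so the following is only a minimal sketch of what the four listed clean-ups could look like in plain Ruby. The helper name sketch_normalize is hypothetical, and the real rules, their order, and their edge-case handling in Archaeo may differ.

```ruby
# Hypothetical sketch of the four clean-ups listed above. This is NOT
# Archaeo's private #normalize (not shown on this page); the real rules
# and their order may differ.
def sketch_normalize(url)
  s = url.to_s.strip # 1. trim surrounding whitespace

  # 2. peel off matching surrounding quotes, e.g. "https://..." in quotes
  while s.length >= 2 && s.start_with?('"', "'") && s.end_with?(s[0])
    s = s[1..-2].strip
  end

  # 3. undo one level of double encoding: %2520 -> %20, %252F -> %2F
  s = s.gsub(/%25(\h\h)/, '%\1')

  # 4. make percent-escape hex digits uppercase: %2f -> %2F
  s.gsub(/%\h\h/) { |escape| escape.upcase }
end

sketch_normalize('  "https://example.com/a%2520b"  ') # => "https://example.com/a%20b"
sketch_normalize("https://example.com/a%2fb")         # => "https://example.com/a%2Fb"
```

Step 3 is a heuristic: a URL that deliberately encodes a literal "%" followed by two hex characters would also be rewritten, which is one reason a real implementation might apply narrower rules.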

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(url) ⇒ UrlNormalizer

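Returns a new instance of UrlNormalizer. The argument is converted with to_s and stored as original; the normalized result is stored as normalized.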

# File 'lib/archaeo/url_normalizer.rb', line 11

def initialize(url)
  @original = url.to_s
  @normalized = normalize(@original)
end
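Because the constructor calls url.to_s before normalizing, it accepts nil, strings, or URI objects, and original always holds a String. The sketch below mirrors that constructor with a hypothetical stand-in class, assuming the two readers are plain attr_reader accessors, and uses strip as a placeholder for the private normalize routine, which this page does not show.

```ruby
require "uri"

# Hypothetical stand-in mirroring UrlNormalizer#initialize. The real class
# calls its private #normalize; #strip is only a placeholder here.
class ConstructorSketch
  attr_reader :original, :normalized

  def initialize(url)
    @original = url.to_s          # nil -> "", URI objects -> their string form
    @normalized = @original.strip # placeholder for the real #normalize
  end
end

ConstructorSketch.new(nil).original                         # => ""
ConstructorSketch.new(URI("https://example.com/")).original # => "https://example.com/"
```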

Instance Attribute Details

#normalized ⇒ Object (readonly)

Returns the normalized URL string produced by the constructor.

# File 'lib/archaeo/url_normalizer.rb', line 9

def normalized
  @normalized
end

#original ⇒ Object (readonly)

Returns the input URL as passed to the constructor, converted to a String with to_s.

# File 'lib/archaeo/url_normalizer.rb', line 9

def original
  @original
end

Class Method Details

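.normalize(url) ⇒ Object

Convenience shorthand for new(url).normalized: returns the normalized form of url without keeping the instance.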

# File 'lib/archaeo/url_normalizer.rb', line 16

def self.normalize(url)
  new(url).normalized
end
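The class method builds a throwaway instance and returns only its normalized value. The following minimal sketch shows that pattern on a hypothetical stand-in class, again with strip standing in for the unshown private normalize routine.

```ruby
# Hypothetical stand-in showing the shortcut pattern used by .normalize:
# construct an instance, return its normalized value, discard the instance.
class ShortcutSketch
  attr_reader :normalized

  def initialize(url)
    @normalized = url.to_s.strip # placeholder for the real private #normalize
  end

  def self.normalize(url)
    new(url).normalized
  end
end

ShortcutSketch.normalize("  https://example.com  ") # => "https://example.com"
```

Callers that also need the original input keep the instance instead, for example ShortcutSketch.new(url), and read both attributes from it.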

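.with_scheme(url) ⇒ Object

Normalizes url and, if the result does not start with a lowercase scheme followed by "://" (such as "http://" or "ftp://"), prepends "https://".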

# File 'lib/archaeo/url_normalizer.rb', line 20

def self.with_scheme(url)
  normalized = normalize(url)
  normalized.match?(%r{\A[a-z][a-z0-9+\-.]*://}) ? normalized : "https://#{normalized}"
end
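The scheme check is a single anchored regular expression. The sketch below applies the same pattern from the source to strings that are assumed to be already normalized; the helper name prefix_https is hypothetical. Note that the pattern has no i flag, so an uppercase scheme such as "HTTP://" is not recognized and gets "https://" prepended, unless the private normalize step (not shown here) lowercases schemes beforehand.

```ruby
# Same scheme test as .with_scheme, applied to an already-normalized string.
SCHEME_PREFIX = %r{\A[a-z][a-z0-9+\-.]*://}

# Hypothetical helper: keep strings that already carry "scheme://",
# otherwise assume HTTPS.
def prefix_https(normalized)
  normalized.match?(SCHEME_PREFIX) ? normalized : "https://#{normalized}"
end

prefix_https("example.com/page")   # => "https://example.com/page"
prefix_https("ftp://example.com")  # => "ftp://example.com"
prefix_https("localhost:8080")     # => "https://localhost:8080" (no "//")
prefix_https("HTTP://example.com") # => "https://HTTP://example.com"
```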

Instance Method Details

#to_s ⇒ Object

Returns the normalized URL, the same value as #normalized.

# File 'lib/archaeo/url_normalizer.rb', line 25

def to_s
  @normalized
end
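Defining to_s lets an instance be used wherever Ruby converts an object to a string explicitly, such as string interpolation or puts. A minimal sketch with a hypothetical stand-in class (strip again replaces the private normalize routine); the archive URL prefix is illustrative only.

```ruby
# Hypothetical stand-in: #to_s returns the normalized value, so string
# interpolation and puts use it automatically.
class ToSSketch
  def initialize(url)
    @normalized = url.to_s.strip # placeholder for the real private #normalize
  end

  def to_s
    @normalized
  end
end

n = ToSSketch.new("  example.com  ")
"https://web.archive.org/web/2020/#{n}" # => "https://web.archive.org/web/2020/example.com"
```

Implicit conversions are a different matter: "https://" + n raises TypeError, because String#+ requires to_str, which this class does not define.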