Module: Iriq::Normalizer

Defined in:
lib/iriq/normalizer.rb

Overview

Produces a canonical, shape-aware string for an identifier.

Normalizer.normalize("https://Foo.com:443/users/123")
# => "https://foo.com/users/{user_id}"

The form is intended for grouping and diffing; it is not a round-trippable URL.

Path and query rendering dispatch through an evidence source, so the mechanical (classifier-only) and corpus-informed code paths share one entry point. When `evidence` is nil, NullEvidenceSource provides the mechanical behavior (PathShape plus param-name-hint query rules). When a Corpus is passed as `evidence`, its observed Position / Cluster stats drive the rendering (variability promotion, popular outlier preservation, cluster-inferred query types).
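
The dispatch described above can be illustrated with a minimal, self-contained sketch. Every name here (SketchIri, SketchNullSource, SketchCorpusSource, sketch_normalize) is a hypothetical stand-in, and the classification rules are deliberately crude. It does not reproduce the real NullEvidenceSource or Corpus; it only shows how a nil `evidence` can fall back to a null object that answers the same `render_path` message.

```ruby
# Hypothetical stand-ins; the real Iriq classes are not reproduced here.
SketchIri = Struct.new(:scheme, :host, :segments)

# Mechanical fallback: every all-digit segment becomes a placeholder.
class SketchNullSource
  def render_path(iri)
    "/" + iri.segments.map { |s| s.match?(/\A\d+\z/) ? "{integer}" : s }.join("/")
  end
end

# Corpus-like source: a position becomes a placeholder only when more
# than one distinct value was observed at that position.
class SketchCorpusSource
  def initialize(observed_paths)
    @values_by_position = Hash.new { |h, k| h[k] = [] }
    observed_paths.each do |segments|
      segments.each_with_index { |s, i| @values_by_position[i] << s }
    end
  end

  def render_path(iri)
    rendered = iri.segments.each_with_index.map do |s, i|
      @values_by_position[i].uniq.size > 1 ? "{var}" : s
    end
    "/" + rendered.join("/")
  end
end

# Single entry point: nil evidence selects the mechanical behavior.
def sketch_normalize(iri, evidence: nil)
  src = evidence || SketchNullSource.new
  "#{iri.scheme}://#{iri.host}#{src.render_path(iri)}"
end
```

Because both sources respond to the same message, the caller never branches on whether a corpus is present; swapping the evidence object is the only change needed.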

Class Method Details

.normalize(input, classifier: SegmentClassifier::DEFAULT, hints: true, evidence: nil) ⇒ Object



# File 'lib/iriq/normalizer.rb', line 19

def normalize(input, classifier: SegmentClassifier::DEFAULT, hints: true, evidence: nil)
  iri = input.is_a?(Identifier) ? input : Parser.parse(input)
  normalize_identifier(iri, classifier: classifier, hints: hints, evidence: evidence)
end

.normalize_identifier(iri, classifier: SegmentClassifier::DEFAULT, hints: true, evidence: nil) ⇒ Object



# File 'lib/iriq/normalizer.rb', line 24

def normalize_identifier(iri, classifier: SegmentClassifier::DEFAULT, hints: true, evidence: nil)
  return normalize_urn(iri, classifier, hints) if iri.urn?

  src = evidence || NullEvidenceSource.new
  out = +""
  out << "#{iri.scheme}://" if iri.scheme
  out << iri.host if iri.host
  out << ":#{iri.port}" if iri.port
  out << src.render_path(iri, classifier, hints)
  if iri.query_params && !iri.query_params.empty?
    out << "?" << src.render_query(iri, classifier)
  end
  out
end
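
As an illustration of the assembly order in normalize_identifier, the following sketch rebuilds a string from optional parts. SketchId and sketch_assemble are hypothetical, and the `key={key}` query rendering is a placeholder for whatever `render_query` actually produces. The listing above does not show where a default port such as 443 is dropped, so this sketch makes no attempt to strip it.

```ruby
# Hypothetical value object standing in for a parsed identifier.
SketchId = Struct.new(:scheme, :host, :port, :path, :query_params, keyword_init: true)

def sketch_assemble(id)
  out = +"" # unary plus yields a mutable string even when literals are frozen
  out << "#{id.scheme}://" if id.scheme
  out << id.host if id.host
  out << ":#{id.port}" if id.port
  out << id.path.to_s
  # The "?" is appended only when at least one query parameter exists.
  if id.query_params && !id.query_params.empty?
    out << "?" << id.query_params.keys.sort.map { |k| "#{k}={#{k}}" }.join("&")
  end
  out
end
```

Each component is appended only when present, so an identifier without a port or query yields no stray `:` or `?`.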

.normalize_urn(iri, classifier, hints) ⇒ Object



# File 'lib/iriq/normalizer.rb', line 39

def normalize_urn(iri, classifier, hints)
  return iri.canonical unless iri.scheme == "urn" && iri.nss && iri.nss.include?(":")

  ns, value = iri.nss.split(":", 2)
  entry     = SegmentHints.derive([ns, value], classifier).last
  shaped =
    if entry[:type] == :date && (canon = SegmentClassifier.canonical_date(entry[:value]))
      canon
    elsif entry[:type] == :currency && (canon = SegmentClassifier.canonical_currency(entry[:value]))
      canon
    elsif entry[:variable]
      "{#{(hints && entry[:hint]) || SegmentClassifier.display_type(entry[:type])}}"
    else
      entry[:value]
    end
  "urn:#{ns}:#{shaped}"
end
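
The URN branch can be approximated in isolation. In this sketch, sketch_normalize_urn is hypothetical, the two regular expressions stand in for SegmentHints and SegmentClassifier, and the placeholder names `{integer}` and `{uuid}` are assumptions rather than the library's actual display types. What it keeps from the listing is the guard (scheme must be "urn" and the namespace-specific string must contain a colon) and the two-way split with a limit of 2.

```ruby
def sketch_normalize_urn(urn)
  scheme, nss = urn.split(":", 2)
  # Anything that is not "urn:<ns>:<value>" is returned unchanged.
  return urn unless scheme == "urn" && nss && nss.include?(":")

  # Limit 2 keeps any further colons inside the value.
  ns, value = nss.split(":", 2)
  shaped =
    case value
    when /\A\d+\z/ then "{integer}"
    when /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/ then "{uuid}"
    else value
    end
  "urn:#{ns}:#{shaped}"
end

puts sketch_normalize_urn("urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
# prints "urn:uuid:{uuid}"
```

Only the value part is ever replaced; the namespace is kept verbatim so that URNs from different namespaces never collapse into the same shape.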