Module: Iriq::Normalizer
- Defined in:
- lib/iriq/normalizer.rb
Overview
Produces a canonical, shape-aware string for an identifier.
Normalizer.normalize("https://Foo.com:443/users/123")
# => "https://foo.com/users/{user_id}"
The form is intended for grouping/diffing — it is not a round-trippable URL.
Path + query rendering dispatches through an evidence source so the mechanical (classifier-only) and corpus-informed code paths share one entry point. When ‘evidence` is nil, NullEvidenceSource provides the mechanical behavior (PathShape + param-name-hint query rules). When a Corpus is passed as `evidence`, its observed Position / Cluster stats drive the rendering (variability promotion, popular outlier preservation, cluster-inferred query types).
Class Method Summary collapse
- .normalize(input, classifier: SegmentClassifier::DEFAULT, hints: true, evidence: nil) ⇒ Object
- .normalize_identifier(iri, classifier: SegmentClassifier::DEFAULT, hints: true, evidence: nil) ⇒ Object
- .normalize_urn(iri, classifier, hints) ⇒ Object
Class Method Details
.normalize(input, classifier: SegmentClassifier::DEFAULT, hints: true, evidence: nil) ⇒ Object
19 20 21 22 |
# File 'lib/iriq/normalizer.rb', line 19 def normalize(input, classifier: SegmentClassifier::DEFAULT, hints: true, evidence: nil) iri = input.is_a?(Identifier) ? input : Parser.parse(input) normalize_identifier(iri, classifier: classifier, hints: hints, evidence: evidence) end |
.normalize_identifier(iri, classifier: SegmentClassifier::DEFAULT, hints: true, evidence: nil) ⇒ Object
24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
# File 'lib/iriq/normalizer.rb', line 24 def normalize_identifier(iri, classifier: SegmentClassifier::DEFAULT, hints: true, evidence: nil) return normalize_urn(iri, classifier, hints) if iri.urn? src = evidence || NullEvidenceSource.new out = +"" out << "#{iri.scheme}://" if iri.scheme out << iri.host if iri.host out << ":#{iri.port}" if iri.port out << src.render_path(iri, classifier, hints) if iri.query_params && !iri.query_params.empty? out << "?" << src.render_query(iri, classifier) end out end |
.normalize_urn(iri, classifier, hints) ⇒ Object
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 |
# File 'lib/iriq/normalizer.rb', line 39 def normalize_urn(iri, classifier, hints) return iri.canonical unless iri.scheme == "urn" && iri.nss && iri.nss.include?(":") ns, value = iri.nss.split(":", 2) entry = SegmentHints.derive([ns, value], classifier).last shaped = if entry[:type] == :date && (canon = SegmentClassifier.canonical_date(entry[:value])) canon elsif entry[:type] == :currency && (canon = SegmentClassifier.canonical_currency(entry[:value])) canon elsif entry[:variable] "{#{(hints && entry[:hint]) || SegmentClassifier.display_type(entry[:type])}}" else entry[:value] end "urn:#{ns}:#{shaped}" end |