Class: Iriq::Clusterer
- Inherits: Object
  - Object
    - Iriq::Clusterer
- Defined in: lib/iriq/clusterer.rb
Overview
Groups many identifiers by host + path shape. Use `add` to feed inputs and `clusters` to read out the groups. `explain` annotates a single identifier against the cluster it would fall into, including which positions are stable across all observed members.
Implemented as a thin wrapper over Storage::Memory — the same code path Corpus uses for the cluster portion of its state, so there’s only one place that knows how clusters get stored.
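To make "host + path shape" concrete, here is a minimal, self-contained sketch of the grouping idea. It does not use Iriq: `ToyClusterer`, its key format, and the rule that all-digit segments collapse to `:num` are assumptions made for illustration, not the library's classifier or storage.

```ruby
require "uri"

# Hypothetical stand-in for Iriq::Clusterer: groups URLs by host plus a
# crude path shape in which all-digit segments are replaced by ":num".
class ToyClusterer
  def initialize
    @clusters = Hash.new { |hash, key| hash[key] = [] }
  end

  # Feed one identifier; returns the cluster key it was filed under.
  def add(input)
    uri = URI.parse(input)
    segments = uri.path.split("/").reject(&:empty?)
    shape = segments.map { |seg| seg.match?(/\A\d+\z/) ? ":num" : seg }
    key = "#{uri.host}/#{shape.join('/')}"
    @clusters[key] << input
    key
  end

  # Cluster key => list of members, in insertion order.
  def clusters
    @clusters
  end

  # Number of distinct clusters seen so far.
  def size
    @clusters.size
  end
end

toy = ToyClusterer.new
toy.add("https://example.com/users/1/posts")
toy.add("https://example.com/users/22/posts")
toy.add("https://example.com/about")

toy.clusters.each { |key, members| puts "#{key}: #{members.size}" }
```

The two `/users/.../posts` URLs land in one group because only the numeric segment differs, while `/about` gets a group of its own.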
Class Method Summary
- .from_dump(h, classifier: SegmentClassifier::DEFAULT) ⇒ Object
Instance Method Summary
- #add(input, shape: nil) ⇒ Object
- #clusters ⇒ Object
- #dump ⇒ Object
- #explain(input) ⇒ Object
  Returns a per-segment explanation for the input, merging classifier output with what has been observed in its cluster (i.e. positions that are stable across all observed members are marked variable: false even if the classifier would otherwise call them variable).
- #initialize(classifier: SegmentClassifier::DEFAULT) ⇒ Clusterer (constructor)
  A new instance of Clusterer.
- #size ⇒ Object
Constructor Details
Class Method Details
.from_dump(h, classifier: SegmentClassifier::DEFAULT) ⇒ Object
    # File 'lib/iriq/clusterer.rb', line 54

    def self.from_dump(h, classifier: SegmentClassifier::DEFAULT)
      c = new(classifier: classifier)
      restored = h["clusters"].transform_values { |cdump| Cluster.from_dump(cdump) }
      c.instance_variable_get(:@storage).instance_variable_set(:@clusters, restored)
      c
    end
Instance Method Details
#add(input, shape: nil) ⇒ Object
    # File 'lib/iriq/clusterer.rb', line 16

    def add(input, shape: nil)
      iri = coerce(input)
      key, host, scheme, derived = Cluster.key_for(iri, classifier: @classifier, shape: shape)
      @storage.add_to_cluster(key, host, scheme, derived, iri)
    end
#clusters ⇒ Object
    # File 'lib/iriq/clusterer.rb', line 22

    def clusters
      @storage.clusters
    end
#dump ⇒ Object
    # File 'lib/iriq/clusterer.rb', line 50

    def dump
      { "clusters" => clusters.each_with_object({}) { |c, h| h[c.key] = c.dump } }
    end
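The dump is a hash with a single "clusters" entry, keyed by cluster key, and `.from_dump` rebuilds each value with `transform_values`. The sketch below replays that round trip on a hypothetical `ToyCluster` struct. The struct and its `key`/`members` fields are assumptions for illustration only; the real `Cluster#dump` format is not shown in this page.

```ruby
# Hypothetical miniature of a cluster: just a key and its members.
ToyCluster = Struct.new(:key, :members) do
  def dump
    { "key" => key, "members" => members }
  end

  def self.from_dump(h)
    new(h["key"], h["members"])
  end
end

clusters = [
  ToyCluster.new("example.com/users/:num", ["/users/1", "/users/2"]),
  ToyCluster.new("example.com/about", ["/about"]),
]

# Same shape as #dump: a "clusters" hash keyed by cluster key.
dumped = { "clusters" => clusters.each_with_object({}) { |c, h| h[c.key] = c.dump } }

# Same restoration step as .from_dump: rebuild each value, keep the keys.
restored = dumped["clusters"].transform_values { |cdump| ToyCluster.from_dump(cdump) }

puts restored.keys.inspect
```

Because the dumped structure contains only hashes, arrays, and strings in this sketch, it could be passed through a serializer such as JSON before being restored.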
#explain(input) ⇒ Object
Returns a per-segment explanation for the input, merging classifier output with what has been observed in its cluster (i.e. positions that are stable across all observed members are marked variable: false even if the classifier would otherwise call them variable).

    # File 'lib/iriq/clusterer.rb', line 34

    def explain(input)
      iri = coerce(input)
      key, * = Cluster.key_for(iri, classifier: @classifier)
      cluster = clusters.find { |c| c.key == key }
      stats = cluster ? cluster.segment_stats : []
      hinted = SegmentHints.derive(iri.path_segments, @classifier)
      hinted.each_with_index.map do |entry, i|
        stable = stats[i] && stats[i][:stable]
        entry.merge(
          variable: !stable && entry[:variable],
          stable: !!stable,
        )
      end
    end
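The merge step at the end of `explain` can be tried in isolation. In the sketch below, `merge_stability` is a hypothetical helper that copies only that final `map`. The `hinted` and `stats` arrays are made-up inputs, and the `:segment` key is an assumption about what a hint entry might carry.

```ruby
# Hypothetical helper reproducing only the final merge in #explain.
def merge_stability(hinted, stats)
  hinted.each_with_index.map do |entry, i|
    stable = stats[i] && stats[i][:stable]
    entry.merge(variable: !stable && entry[:variable], stable: !!stable)
  end
end

# Made-up classifier output for the path /users/42/posts.
hinted = [
  { segment: "users", variable: false },
  { segment: "42",    variable: true },
  { segment: "posts", variable: true }, # classifier guess that the cluster contradicts
]

# Made-up per-position stats: positions 0 and 2 never changed across members.
stats = [{ stable: true }, { stable: false }, { stable: true }]

explained = merge_stability(hinted, stats)
explained.each { |e| puts e.inspect }

# With no matching cluster, stats is empty and the classifier's flags pass through.
unclustered = merge_stability(hinted, [])
```

Observed stability can only turn a `variable: true` into `false`, never the reverse, so a segment the classifier calls fixed stays fixed.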
#size ⇒ Object
    # File 'lib/iriq/clusterer.rb', line 26

    def size
      @storage.cluster_size
    end