Class: Iriq::Clusterer

Inherits:
Object
Defined in:
lib/iriq/clusterer.rb

Overview

Groups many identifiers by host + path shape. Use `add` to feed inputs and `clusters` to read out the groups. `explain` annotates a single identifier against the cluster it would fall into, including which positions are stable across all observed members.

Implemented as a thin wrapper over Storage::Memory — the same code path Corpus uses for the cluster portion of its state, so there’s only one place that knows how clusters get stored.
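To make "host + path shape" concrete, here is a minimal, self-contained sketch of the grouping idea. It does not use Iriq: `toy_shape` and `toy_key` are hypothetical helpers, and treating all-digit segments as variable is an assumption standing in for whatever `SegmentClassifier` actually decides.

```ruby
require "uri"

# Hypothetical stand-in for a segment classifier: all-digit segments are
# treated as variable, everything else as a literal.
def toy_shape(path)
  path.split("/").reject(&:empty?).map { |seg| seg.match?(/\A\d+\z/) ? "{int}" : seg }
end

# Hypothetical cluster key: the host plus the shaped path.
def toy_key(url)
  uri = URI.parse(url)
  "#{uri.host}/#{toy_shape(uri.path).join('/')}"
end

urls = [
  "https://example.com/users/1/posts",
  "https://example.com/users/42/posts",
  "https://example.com/about",
]

# Identifiers that share a host and a path shape land in the same group.
groups = urls.group_by { |u| toy_key(u) }
groups.each { |key, members| puts "#{key}: #{members.size}" }
```

Under these assumptions the two `/users/<n>/posts` URLs share one group and `/about` forms its own.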

Constructor Details

#initialize(classifier: SegmentClassifier::DEFAULT) ⇒ Clusterer

Returns a new instance of Clusterer.



# File 'lib/iriq/clusterer.rb', line 11

def initialize(classifier: SegmentClassifier::DEFAULT)
  @classifier = classifier
  @storage    = Storage::Memory.new(classifier: classifier)
end

Class Method Details

.from_dump(h, classifier: SegmentClassifier::DEFAULT) ⇒ Object



# File 'lib/iriq/clusterer.rb', line 54

def self.from_dump(h, classifier: SegmentClassifier::DEFAULT)
  c = new(classifier: classifier)
  restored = h["clusters"].transform_values { |cdump| Cluster.from_dump(cdump) }
  c.instance_variable_get(:@storage).instance_variable_set(:@clusters, restored)
  c
end

Instance Method Details

#add(input, shape: nil) ⇒ Object



# File 'lib/iriq/clusterer.rb', line 16

def add(input, shape: nil)
  iri = coerce(input)
  key, host, scheme, derived = Cluster.key_for(iri, classifier: @classifier, shape: shape)
  @storage.add_to_cluster(key, host, scheme, derived, iri)
end

#clusters ⇒ Object



# File 'lib/iriq/clusterer.rb', line 22

def clusters
  @storage.clusters
end

#dump ⇒ Object



# File 'lib/iriq/clusterer.rb', line 50

def dump
  { "clusters" => clusters.each_with_object({}) { |c, h| h[c.key] = c.dump } }
end
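The shape produced by `dump` and consumed by `.from_dump` can be sketched without Iriq. `ToyCluster` below is a hypothetical stand-in for `Cluster` (the real class carries more state), and passing the dump through JSON is an assumption suggested by the string keys, not something this page confirms.

```ruby
require "json"

# Hypothetical stand-in for Cluster, with just enough state to round-trip.
ToyCluster = Struct.new(:key, :members) do
  def dump
    { "key" => key, "members" => members }
  end

  def self.from_dump(h)
    new(h["key"], h["members"])
  end
end

clusters = [
  ToyCluster.new("example.com/users/{int}", ["https://example.com/users/1"]),
  ToyCluster.new("example.com/about", ["https://example.com/about"]),
]

# Same shape as #dump: a "clusters" hash keyed by cluster key.
dumped = { "clusters" => clusters.each_with_object({}) { |c, h| h[c.key] = c.dump } }

# Assumed serialization step: string keys survive a JSON round trip unchanged.
reloaded = JSON.parse(JSON.generate(dumped))

# Same shape as .from_dump: rebuild each cluster from its dumped hash.
restored = reloaded["clusters"].transform_values { |cdump| ToyCluster.from_dump(cdump) }
```

Keying the dump by cluster key means the restored hash can be looked up directly, without scanning a list of clusters.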

#explain(input) ⇒ Object

Returns a per-segment explanation for the input, merging the classifier's output with what has been observed in its cluster: positions that are stable across all observed members are marked `variable: false`, even if the classifier would otherwise call them variable.



# File 'lib/iriq/clusterer.rb', line 34

def explain(input)
  iri = coerce(input)
  key, * = Cluster.key_for(iri, classifier: @classifier)
  cluster = clusters.find { |c| c.key == key }
  stats   = cluster ? cluster.segment_stats : []
  hinted  = SegmentHints.derive(iri.path_segments, @classifier)

  hinted.each_with_index.map do |entry, i|
    stable = stats[i] && stats[i][:stable]
    entry.merge(
      variable: !stable && entry[:variable],
      stable:   !!stable,
    )
  end
end
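The merge rule in `explain` is easy to check in isolation. The sketch below reuses the same expression on hand-written data: the `hinted` and `stats` arrays are hypothetical, and the `:value` key is an assumption about what a hint entry might carry. Only `:variable` and `:stable` appear in the source above.

```ruby
# Same merge as in #explain, extracted so it can run on plain arrays.
merge_hints = lambda do |hinted, stats|
  hinted.each_with_index.map do |entry, i|
    stable = stats[i] && stats[i][:stable]
    entry.merge(variable: !stable && entry[:variable], stable: !!stable)
  end
end

# Hypothetical classifier hints for a path like /users/42/posts.
hinted = [
  { value: "users", variable: false },
  { value: "42",    variable: true  },
  { value: "posts", variable: false },
]

# Hypothetical stats for a cluster in which every position is stable.
stats = [{ stable: true }, { stable: true }, { stable: true }]

with_stats    = merge_hints.call(hinted, stats)
without_stats = merge_hints.call(hinted, []) # no matching cluster yet

puts with_stats[1].inspect
puts without_stats[1].inspect
```

With stats present, the `"42"` position is reported as `variable: false, stable: true` because observation overrides the classifier. With no cluster, `stats` is empty, so the classifier's `variable: true` stands and `stable` is `false`.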

#size ⇒ Object



# File 'lib/iriq/clusterer.rb', line 26

def size
  @storage.cluster_size
end