Class: Ckmedian::Clusterer

Inherits:
Object
  • Object
show all
Defined in:
lib/ckmedian/clusterer.rb

Overview

Optimal k-median clustering for univariate (1D) data using dynamic programming. Minimizes within-cluster sum of absolute deviations (L1 norm). More robust to outliers than k-means.

Instance Method Summary collapse

Constructor Details

#initialize(entries, kmin, kmax = kmin, kestimate = :fast) ⇒ Clusterer

Creates a new Ckmedian clusterer.

Examples:

Fixed number of clusters

Ckmedian::Clusterer.new([1, 2, 3, 100, 101], 2).clusters
# => [[1, 2, 3], [100, 101]]

Photo timeline clustering (robust to bursts and outliers)

timestamps = photos.map(&:taken_at).map(&:to_i)
Ckmedian::Clusterer.new(timestamps, 1, 20, :stable).clusters

Parameters:

  • entries (Array<Numeric>)

    The data points to cluster

  • kmin (Integer)

    Minimum number of clusters to consider

  • kmax (Integer) (defaults to: kmin)

    Maximum number of clusters to consider (defaults to kmin for fixed K)

  • kestimate (Symbol) (defaults to: :fast)

    Method for estimating optimal K:

    • :fast - Quick heuristic using implicit Gaussian assumption (best for large datasets)

    • :stable - Model-based estimation using Laplace Mixture Model (better for outliers/bursts)

    • :lmm - Alias for :stable (Laplace Mixture Model)

Raises:

  • (ArgumentError)


25
26
27
28
29
30
31
32
33
34
35
36
37
# File 'lib/ckmedian/clusterer.rb', line 25

def initialize(entries, kmin, kmax = kmin, kestimate = :fast)
  @xcount = entries.size

  raise ArgumentError, "Minimum cluster count is bigger than element count" if kmin > @xcount
  raise ArgumentError, "Maximum cluster count is bigger than element count" if kmax > @xcount

  @kmin                  = kmin
  @unique_xcount         = entries.uniq.size
  @kmax                  = [@unique_xcount, kmax].min
  @xsorted_original      = entries.sort
  @xsorted               = @xsorted_original.map(&:to_f)
  @use_stable_estimation = %i[lmm stable].include?(kestimate)
end

Instance Method Details

#clustersObject



39
40
41
42
43
44
45
46
47
48
# File 'lib/ckmedian/clusterer.rb', line 39

def clusters
  @clusters ||=
    if @unique_xcount <= 1
      [@xsorted_original]
    else
      sorted_group_sizes.each_with_object([]) do |size, groups|
        groups << @xsorted_original.shift(size)
      end
    end
end