Class: Ckmeans::Clusterer

Inherits:
Object
Defined in:
lib/ckmeans/clusterer.rb

Overview

Optimal k-means clustering for univariate (1D) data using dynamic programming. Minimizes within-cluster sum of squared distances (L2 norm).
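To make the objective concrete, the sketch below computes the within-cluster sum of squares and brute-forces the best two-way split of a sorted array. It is an illustration only, not the library's dynamic-programming algorithm; the helper names `within_cluster_ss` and `best_two_way_split` are hypothetical and do not exist in Ckmeans.

```ruby
# Sum of squared distances from each point to its cluster mean.
def within_cluster_ss(cluster)
  mean = cluster.sum.to_f / cluster.size
  cluster.sum { |x| (x - mean)**2 }
end

# Brute force for k = 2 (illustration only): in 1D an optimal clustering
# consists of contiguous runs of the sorted data, so it is enough to try
# every split point and keep the one with the smallest total cost.
def best_two_way_split(entries)
  sorted = entries.sort
  (1...sorted.size)
    .map { |i| [sorted[0...i], sorted[i..]] }
    .min_by { |parts| parts.sum { |c| within_cluster_ss(c) } }
end

best_two_way_split([1, 2, 3, 100, 101])
# => [[1, 2, 3], [100, 101]]
```

Trying every split point is only practical for very small K; the dynamic-programming approach named above exists to avoid that enumeration.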

Instance Method Summary

Constructor Details

#initialize(entries, kmin, kmax = kmin, kestimate = :fast) ⇒ Clusterer

Creates a new Ckmeans clusterer.

Examples:

Fixed number of clusters

Ckmeans::Clusterer.new([1, 2, 3, 100, 101], 2).clusters
# => [[1, 2, 3], [100, 101]]

Automatic K selection with stable estimation

Ckmeans::Clusterer.new([1, 1, 1, 5, 5, 5, 10, 10, 10], 1, 5, :stable).clusters

Parameters:

  • entries (Array<Numeric>)

    The data points to cluster

  • kmin (Integer)

    Minimum number of clusters to consider

  • kmax (Integer) (defaults to: kmin)

    Maximum number of clusters to consider. Leaving it at kmin fixes K. Values above the number of distinct entries are capped at that number.

  • kestimate (Symbol) (defaults to: :fast)

    Method for estimating optimal K:

    • :fast - Quick heuristic using an implicit Gaussian assumption (best for large datasets)

    • :stable - Model-based estimation using a Gaussian Mixture Model (better for duplicates/edge cases)

    • :gmm - Alias for :stable (Gaussian Mixture Model)

Raises:

  • (ArgumentError)

    If kmin or kmax is greater than the number of entries


# File 'lib/ckmeans/clusterer.rb', line 23

def initialize(entries, kmin, kmax = kmin, kestimate = :fast)
  @xcount = entries.size

  raise ArgumentError, "Minimum cluster count is bigger than element count" if kmin > @xcount
  raise ArgumentError, "Maximum cluster count is bigger than element count" if kmax > @xcount

  @kmin                  = kmin
  @unique_xcount         = entries.uniq.size
  @kmax                  = [@unique_xcount, kmax].min
  @xsorted_original      = entries.sort
  @xsorted               = @xsorted_original.map(&:to_f)
  @use_stable_estimation = %i[gmm stable].include?(kestimate)
end
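The argument handling in the constructor can be tried in isolation. The sketch below mirrors the checks from the listing in a standalone helper; `normalize_k` is a hypothetical name used only for illustration, and the returned hash is not something the library exposes.

```ruby
# Hypothetical helper mirroring the constructor's checks; not part of Ckmeans.
def normalize_k(entries, kmin, kmax = kmin, kestimate = :fast)
  count = entries.size
  raise ArgumentError, "Minimum cluster count is bigger than element count" if kmin > count
  raise ArgumentError, "Maximum cluster count is bigger than element count" if kmax > count

  {
    kmin: kmin,
    # kmax is capped at the number of distinct values
    kmax: [entries.uniq.size, kmax].min,
    # only :gmm and :stable switch on stable estimation
    stable: %i[gmm stable].include?(kestimate)
  }
end

# Nine entries but only three distinct values, so kmax drops from 5 to 3.
normalize_k([1, 1, 1, 5, 5, 5, 10, 10, 10], 1, 5, :stable)

# Raises ArgumentError because kmin (3) exceeds the two entries.
begin
  normalize_k([1, 2], 3)
rescue ArgumentError => e
  e.message # => "Minimum cluster count is bigger than element count"
end
```

Going by the listing alone, the constructor does not reject unknown kestimate symbols; anything other than :gmm or :stable leaves stable estimation switched off.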

Instance Method Details

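#clusters ⇒ Object

Returns the sorted entries split into consecutive groups, one array per cluster. The result is memoized.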



# File 'lib/ckmeans/clusterer.rb', line 37

def clusters
  @clusters ||=
    if @unique_xcount <= 1
      [@xsorted_original]
    else
      sorted_group_sizes.each_with_object([]) do |size, groups|
        groups << @xsorted_original.shift(size)
      end
    end
end
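The slicing step can be sketched on its own. In the listing, `Array#shift(size)` removes elements from `@xsorted_original` itself, so that array is consumed on the first call and the `||=` memoization is what keeps later calls consistent. The example below does the same slicing on a copy; `split_by_sizes` and the sizes `[3, 2]` are hypothetical stand-ins for whatever `sorted_group_sizes` returns.

```ruby
# Hypothetical helper: cut a sorted array into consecutive groups of the
# given sizes without modifying the caller's array.
def split_by_sizes(sorted, sizes)
  remaining = sorted.dup # shift removes elements from its receiver
  sizes.each_with_object([]) do |size, groups|
    groups << remaining.shift(size)
  end
end

data = [1, 2, 3, 100, 101]
split_by_sizes(data, [3, 2]) # => [[1, 2, 3], [100, 101]]
data                         # => [1, 2, 3, 100, 101]
```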