Class: Classifier::LSI

Inherits:
Object
Includes:
Streaming, Mutex_m
Defined in:
lib/classifier/lsi.rb,
lib/classifier/lsi/incremental_svd.rb

Overview

This class implements a Latent Semantic Indexer, which can search, classify and cluster data based on underlying semantic relations. For more information on the algorithms used, please consult Wikipedia.

Defined Under Namespace

Modules: IncrementalSVD

Constant Summary

DEFAULT_MAX_RANK = 100

Default maximum rank for incremental SVD

Constants included from Streaming

Streaming::DEFAULT_BATCH_SIZE

Methods included from Streaming

#delete_checkpoint, #list_checkpoints, #save_checkpoint

Constructor Details

#initialize(options = {}) ⇒ LSI

Create a fresh index. If you want to call #build_index manually, use

Classifier::LSI.new auto_rebuild: false

For incremental SVD mode (adds documents without full rebuild):

Classifier::LSI.new incremental: true, max_rank: 100


# File 'lib/classifier/lsi.rb', line 99

def initialize(options = {})
  super()
  @auto_rebuild = true unless options[:auto_rebuild] == false
  @word_list = WordList.new
  @items = {}
  @version = 0
  @built_at_version = -1
  @dirty = false
  @storage = nil

  # Incremental SVD settings
  @incremental_mode = options[:incremental] == true
  @max_rank = options[:max_rank] || DEFAULT_MAX_RANK
  @u_matrix = nil
  @initial_vocab_size = nil
  @min_word_length = options[:min_word_length] || Classifier.config.min_word_length
end
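
The option handling in the constructor can be sketched in isolation. `resolve_options` is a hypothetical helper, not part of the library, and `DEFAULT_MAX_RANK` is copied from the constant above. It mirrors the comparisons in the listing: only a literal `false` turns auto-rebuild off, and only a literal `true` turns incremental mode on.

```ruby
# Hypothetical stand-in for the option handling shown in #initialize above.
DEFAULT_MAX_RANK = 100

def resolve_options(options = {})
  {
    # Anything other than an explicit `false` leaves auto-rebuild on.
    # (The listing leaves @auto_rebuild as nil in that case; a boolean is
    # used here for readability.)
    auto_rebuild: options[:auto_rebuild] != false,
    # Only a literal `true` switches incremental mode on.
    incremental: options[:incremental] == true,
    max_rank: options[:max_rank] || DEFAULT_MAX_RANK
  }
end

defaults = resolve_options
manual = resolve_options(auto_rebuild: false)
incremental = resolve_options(incremental: true, max_rank: 50)
```

Under this reading, a truthy but non-boolean value such as `incremental: 'yes'` would not enable incremental mode.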

Class Attribute Details

.backend ⇒ Object

Returns the value of attribute backend.

# File 'lib/classifier/lsi.rb', line 15

def backend
  @backend
end

Instance Attribute Details

#auto_rebuild ⇒ Object

Returns the value of attribute auto_rebuild.

# File 'lib/classifier/lsi.rb', line 86

def auto_rebuild
  @auto_rebuild
end

#singular_values ⇒ Object (readonly)

Returns the value of attribute singular_values.

# File 'lib/classifier/lsi.rb', line 85

def singular_values
  @singular_values
end

#storage ⇒ Object

Returns the value of attribute storage.

# File 'lib/classifier/lsi.rb', line 86

def storage
  @storage
end

#word_list ⇒ Object (readonly)

Returns the value of attribute word_list.

# File 'lib/classifier/lsi.rb', line 85

def word_list
  @word_list
end

Class Method Details

.from_json(json) ⇒ Object

Loads an LSI index from a JSON string or Hash created by #to_json or #as_json. The index will be rebuilt after loading.

Raises:

  • (ArgumentError)


# File 'lib/classifier/lsi.rb', line 537

def self.from_json(json)
  data = json.is_a?(String) ? JSON.parse(json) : json
  raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'lsi'

  # Create instance with auto_rebuild disabled during loading
  instance = new(auto_rebuild: false)

  # Restore items (categories stay as strings, matching original storage)
  data['items'].each do |item_key, item_data|
    word_hash = item_data['word_hash'].transform_keys(&:to_sym)
    categories = item_data['categories']
    instance.instance_variable_get(:@items)[item_key] = ContentNode.new(word_hash, *categories)
    instance.instance_variable_set(:@version, instance.instance_variable_get(:@version) + 1)
  end

  # Restore auto_rebuild setting and rebuild index
  instance.auto_rebuild = data['auto_rebuild']
  instance.build_index
  instance
end

.load(storage:) ⇒ Object

Loads an LSI index from the configured storage. The storage is set on the returned instance.

Raises:

  • (StorageError)

# File 'lib/classifier/lsi.rb', line 620

def self.load(storage:)
  data = storage.read
  raise StorageError, 'No saved state found' unless data

  instance = from_json(data)
  instance.storage = storage
  instance
end

.load_checkpoint(storage:, checkpoint_id:) ⇒ Object

Loads an LSI index from a checkpoint.

Raises:

  • (ArgumentError)


# File 'lib/classifier/lsi.rb', line 639

def self.load_checkpoint(storage:, checkpoint_id:)
  raise ArgumentError, 'Storage must be File storage for checkpoints' unless storage.is_a?(Storage::File)

  dir = File.dirname(storage.path)
  base = File.basename(storage.path, '.*')
  ext = File.extname(storage.path)
  checkpoint_path = File.join(dir, "#{base}_checkpoint_#{checkpoint_id}#{ext}")

  checkpoint_storage = Storage::File.new(path: checkpoint_path)
  instance = load(storage: checkpoint_storage)
  instance.storage = storage
  instance
end
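
The checkpoint file name is derived from the storage path with standard `File` calls. A minimal sketch of that path arithmetic follows; `checkpoint_path_for`, the sample path and the checkpoint id are hypothetical and used only for illustration.

```ruby
# Sketch of the path arithmetic in .load_checkpoint; `checkpoint_path_for`
# is a hypothetical helper name used only for this illustration.
def checkpoint_path_for(storage_path, checkpoint_id)
  dir = File.dirname(storage_path)
  base = File.basename(storage_path, '.*') # file name without its extension
  ext = File.extname(storage_path)         # extension including the dot
  File.join(dir, "#{base}_checkpoint_#{checkpoint_id}#{ext}")
end

puts checkpoint_path_for('/var/data/index.json', '50pct')
# prints /var/data/index_checkpoint_50pct.json
```

The checkpoint lives next to the main file, so the returned instance can keep pointing at the original storage, as the listing does.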

.load_from_file(path) ⇒ Object

Loads an LSI index from a file (legacy API).



# File 'lib/classifier/lsi.rb', line 632

def self.load_from_file(path)
  from_json(File.read(path))
end

.matrix_class ⇒ Object

Get the Matrix class for the current backend

# File 'lib/classifier/lsi.rb', line 31

def matrix_class
  backend == :native ? Classifier::Linalg::Matrix : ::Matrix
end

.native_available? ⇒ Boolean

Check if using native C extension

Returns:

  • (Boolean)

# File 'lib/classifier/lsi.rb', line 19

def native_available?
  backend == :native
end

.vector_class ⇒ Object

Get the Vector class for the current backend

# File 'lib/classifier/lsi.rb', line 25

def vector_class
  backend == :native ? Classifier::Linalg::Vector : ::Vector
end

Instance Method Details

#<<(item) ⇒ Object

A less flexible shorthand for add_item that assumes you are passing in a string with no categories. item will be duck typed via to_s.

# File 'lib/classifier/lsi.rb', line 248

def <<(item)
  add_item(item)
end

#add(**items) ⇒ Object

Adds items to the index using hash-style syntax. The hash keys are categories, and values are items (or arrays of items).

For example:

lsi = Classifier::LSI.new
lsi.add("Dog" => "Dogs are loyal pets")
lsi.add("Cat" => "Cats are independent")
lsi.add(Bird: "Birds can fly")  # Symbol keys work too

Multiple items with the same category:

lsi.add("Dog" => ["Dogs are loyal", "Puppies are cute"])

Batch operations with multiple categories:

lsi.add(
  "Dog" => ["Dogs are loyal", "Puppies are cute"],
  "Cat" => ["Cats are independent", "Kittens are playful"]
)


# File 'lib/classifier/lsi.rb', line 198

def add(**items)
  items.each do |category, value|
    Array(value).each { |doc| add_item(doc, category.to_s) }
  end
end

#add_batch(batch_size: Streaming::DEFAULT_BATCH_SIZE, **items) ⇒ Object

Adds items to the index in batches from an array. Documents are added without rebuilding, then the index is rebuilt at the end.

Examples:

Batch add with progress

lsi.add_batch(Dog: documents, batch_size: 100) do |progress|
  puts "#{progress.percent}% complete"
end


# File 'lib/classifier/lsi.rb', line 696

def add_batch(batch_size: Streaming::DEFAULT_BATCH_SIZE, **items)
  original_auto_rebuild = @auto_rebuild
  @auto_rebuild = false

  begin
    total_docs = items.values.sum { |v| Array(v).size }
    progress = Streaming::Progress.new(total: total_docs)

    items.each do |category, documents|
      Array(documents).each_slice(batch_size) do |batch|
        batch.each { |doc| add_item(doc, category.to_s) }
        progress.completed += batch.size
        progress.current_batch += 1
        yield progress if block_given?
      end
    end
  ensure
    @auto_rebuild = original_auto_rebuild
    build_index if original_auto_rebuild
  end
end
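
The slicing and progress bookkeeping in the listing can be tried without an index. In this sketch `Progress` is a simplified stand-in for Streaming::Progress (its `percent` calculation is an assumption), and `each_batch_with_progress` is a hypothetical helper that yields instead of calling add_item.

```ruby
# Stand-in for Streaming::Progress; the real class may expose more fields
# and may compute `percent` differently.
Progress = Struct.new(:total, :completed, :current_batch) do
  def percent
    return 0.0 if total.zero?

    (completed.to_f / total * 100).round(1)
  end
end

# Walks `items` (category => document or array of documents) in slices of
# `batch_size`, yielding the category, the slice and the progress object.
def each_batch_with_progress(items, batch_size:)
  total = items.values.sum { |docs| Array(docs).size }
  progress = Progress.new(total, 0, 0)

  items.each do |category, docs|
    Array(docs).each_slice(batch_size) do |batch|
      progress.completed += batch.size
      progress.current_batch += 1
      yield category.to_s, batch, progress
    end
  end
  progress
end

each_batch_with_progress({ Dog: %w[a b c], Cat: 'd' }, batch_size: 2) do |category, batch, progress|
  puts "#{category}: #{batch.size} docs, #{progress.percent}% complete"
end
```

As in the listing, slices never cross a category boundary, so the last batch of a category can be smaller than `batch_size`.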

#add_item(item, *categories, &block) ⇒ Object

Deprecated.

Use #add instead for clearer hash-style syntax.

Adds an item to the index. item is assumed to be a string, but any item may be indexed so long as it responds to #to_s, or if you provide an optional block explaining how the indexer can fetch fresh string data. This optional block is passed the item, so the item itself may be just a reference, such as a URL or file name.

For example:

lsi = Classifier::LSI.new
lsi.add_item "This is just plain text"
lsi.add_item("/home/me/filename.txt") { |x| File.read x }
ar = ActiveRecordObject.find(:all)
lsi.add_item(ar, *ar.categories) { |x| x.content }


# File 'lib/classifier/lsi.rb', line 220

def add_item(item, *categories, &block)
  clean_word_hash =
    if block
      block.call(item).clean_word_hash(@min_word_length)
    else
      item.to_s.clean_word_hash(@min_word_length)
    end

  node = nil

  synchronize do
    node = ContentNode.new(clean_word_hash, *categories)
    @items[item] = node
    @version += 1
    @dirty = true
  end

  # Use incremental update if enabled and we have a U matrix
  return perform_incremental_update(node, clean_word_hash) if @incremental_mode && @u_matrix

  build_index if @auto_rebuild
end

#as_json ⇒ Object

Returns a hash representation of the LSI index. Only source data (word_hash, categories) is included, not computed vectors. This can be converted to JSON or used directly.

# File 'lib/classifier/lsi.rb', line 508

def as_json(*)
  items_data = @items.transform_values do |node|
    {
      word_hash: node.word_hash.transform_keys(&:to_s),
      categories: node.categories.map(&:to_s)
    }
  end

  {
    version: 1,
    type: 'lsi',
    auto_rebuild: @auto_rebuild,
    items: items_data
  }
end
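
The serialized shape and the key conversion done on load can be sketched with the standard `json` library alone. `sample_payload` is hand-built in the shape #as_json returns (the item key and word counts are invented), and `restore_word_hashes` is a hypothetical helper that mirrors only the key handling of .from_json, not the index rebuild.

```ruby
require 'json'

# Hand-built payload in the shape #as_json returns; the item key and word
# counts are invented for illustration.
def sample_payload
  {
    version: 1,
    type: 'lsi',
    auto_rebuild: true,
    items: {
      'Dogs are loyal pets' => {
        word_hash: { 'dog' => 1, 'loyal' => 1, 'pet' => 1 },
        categories: ['Dog']
      }
    }
  }
end

# Mirrors the key handling in .from_json: JSON.parse returns string keys,
# and each word_hash key is turned back into a symbol.
def restore_word_hashes(json)
  data = JSON.parse(json)
  raise ArgumentError, "Invalid classifier type: #{data['type']}" unless data['type'] == 'lsi'

  data['items'].transform_values { |item| item['word_hash'].transform_keys(&:to_sym) }
end

restored = restore_word_hashes(JSON.generate(sample_payload))
p restored
```

Because only word hashes and categories are stored, the computed vectors have to be rebuilt after loading, which is why .from_json ends with build_index.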

#build_index(cutoff = 0.75, force: false) ⇒ Object

This function rebuilds the index if needs_rebuild? returns true, or unconditionally when force: true is passed. For very large document spaces, this indexing operation may take some time to complete, so it may be wise to place the operation in another thread.

As a rule, indexing will be fairly swift on modern machines until you have well over 500 documents indexed, or have an incredibly diverse vocabulary for your documents.

The optional parameter “cutoff” is a tuning parameter. When the index is built, a certain number of s-values are discarded from the system. The cutoff parameter tells the indexer how many of these values to keep. A value of 1 for cutoff means that no semantic analysis will take place, turning the LSI class into a simple vector search engine.



# File 'lib/classifier/lsi.rb', line 301

def build_index(cutoff = 0.75, force: false)
  validate_cutoff!(cutoff)

  synchronize do
    return unless force || needs_rebuild_unlocked?

    make_word_list

    doc_list = @items.values
    tda = doc_list.collect { |node| node.raw_vector_with(@word_list) }

    if self.class.native_available?
      # Convert vectors to arrays for matrix construction
      tda_arrays = tda.map { |v| v.respond_to?(:to_a) ? v.to_a : v }
      tdm = self.class.matrix_class.alloc(*tda_arrays).trans
      ntdm, u_mat = build_reduced_matrix_with_u(tdm, cutoff)
      assign_native_ext_lsi_vectors(ntdm, doc_list)
    else
      tdm = Matrix.rows(tda).trans
      ntdm, u_mat = build_reduced_matrix_with_u(tdm, cutoff)
      assign_ruby_lsi_vectors(ntdm, doc_list)
    end

    # Store U matrix for incremental mode
    if @incremental_mode
      @u_matrix = u_mat
      @initial_vocab_size = @word_list.size
    end

    @built_at_version = @version
  end
end
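
How the cutoff selects singular values is not shown on this page; it happens inside build_reduced_matrix_with_u. As a rough sketch only, assuming the cutoff is the fraction of singular values to keep and that the smallest ones are zeroed, the idea might look like this. `truncate_singular_values` is a hypothetical helper and the library's actual rule may differ.

```ruby
# Assumed behavior, not taken from this page: keep the largest
# round(n * cutoff) singular values and zero out the rest.
def truncate_singular_values(values, cutoff)
  keep = (values.size * cutoff).round
  return values.map { 0.0 } if keep.zero?

  # Smallest value that still makes it into the kept set.
  threshold = values.sort.reverse[keep - 1]
  values.map { |v| v < threshold ? 0.0 : v }
end

p truncate_singular_values([5.0, 3.0, 2.0, 1.0], 0.75)
p truncate_singular_values([5.0, 3.0, 2.0, 1.0], 1)
```

Under this assumption a cutoff of 1 keeps every value, which matches the description above of LSI degrading into a plain vector search. Tied values at the threshold are all kept, so the result can retain slightly more than the requested fraction.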

#categories_for(item) ⇒ Object

Returns the categories for a given indexed item. You are free to add and remove items from this as you see fit. It does not invalidate an index to change its categories.

# File 'lib/classifier/lsi.rb', line 256

def categories_for(item)
  synchronize do
    return [] unless @items[item]

    @items[item].categories
  end
end

#classify(doc, cutoff = 0.30, &block) ⇒ Object

This function uses a voting system to categorize documents, based on the categories of other documents. It uses the same logic as the find_related function to find related documents, then returns the most obvious category from this list.



# File 'lib/classifier/lsi.rb', line 429

def classify(doc, cutoff = 0.30, &block)
  validate_cutoff!(cutoff)

  synchronize do
    votes = vote_unlocked(doc, cutoff, &block)

    ranking = votes.keys.sort_by { |x| votes[x] }
    ranking[-1]
  end
end

#classify_with_confidence(doc, cutoff = 0.30, &block) ⇒ Object

Returns the same category as classify() but also returns a confidence value derived from the vote share that the winning category got.

e.g.

category, confidence = classify_with_confidence(doc)
if confidence && confidence < 0.3
  category = nil
end

See classify() for argument docs



# File 'lib/classifier/lsi.rb', line 459

def classify_with_confidence(doc, cutoff = 0.30, &block)
  validate_cutoff!(cutoff)

  synchronize do
    votes = vote_unlocked(doc, cutoff, &block)
    votes_sum = votes.values.sum
    return [nil, nil] if votes_sum.zero?

    ranking = votes.keys.sort_by { |x| votes[x] }
    winner = ranking[-1]
    vote_share = votes[winner] / votes_sum.to_f
    [winner, vote_share]
  end
end
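
The vote-share arithmetic can be sketched on a plain hash of category scores. `winner_with_share` is a hypothetical helper; it uses `max_by` in place of the sort in the listing and does not involve the index at all.

```ruby
# Picks the winning category and its share of the total vote from a
# category => score hash. `winner_with_share` is a hypothetical name.
def winner_with_share(votes)
  total = votes.values.sum
  return [nil, nil] if total.zero?

  winner, score = votes.max_by { |_category, value| value }
  [winner, score / total.to_f]
end

category, confidence = winner_with_share('Dog' => 3.0, 'Cat' => 1.0)
puts "#{category} (#{confidence})"
```

Since the pair can be `[nil, nil]` when no votes are cast, callers should check for a nil confidence before comparing it to a threshold.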

#current_rank ⇒ Object

Returns the current rank of the incremental SVD (number of singular values kept). Returns nil if incremental mode is not active.

# File 'lib/classifier/lsi.rb', line 157

def current_rank
  @singular_values&.count(&:positive?)
end

#dirty? ⇒ Boolean

Returns true if there are unsaved changes.

Returns:

  • (Boolean)

# File 'lib/classifier/lsi.rb', line 612

def dirty?
  @dirty
end

#disable_incremental_mode! ⇒ Object

Disables incremental mode. Subsequent adds will trigger full rebuilds.

# File 'lib/classifier/lsi.rb', line 164

def disable_incremental_mode!
  @incremental_mode = false
  @u_matrix = nil
  @initial_vocab_size = nil
end

#enable_incremental_mode!(max_rank: DEFAULT_MAX_RANK) ⇒ Object

Enables incremental mode with optional max_rank setting. The next build_index call will store the U matrix for incremental updates.



# File 'lib/classifier/lsi.rb', line 174

def enable_incremental_mode!(max_rank: DEFAULT_MAX_RANK)
  @incremental_mode = true
  @max_rank = max_rank
end

#find_related(doc, max_nearest = 3, &block) ⇒ Object

This function takes content and finds other documents that are semantically “close”, returning an array of documents sorted from most to least relevant. max_nearest specifies the number of documents to return. A value of 0 means that it returns all the indexed documents, sorted by relevance.

This is particularly useful for identifying clusters in your document space. For example, you may want to identify several “What’s Related” items for weblog articles, or find paragraphs that relate to each other in an essay.

# File 'lib/classifier/lsi.rb', line 414

def find_related(doc, max_nearest = 3, &block)
  synchronize do
    carry =
      proximity_array_for_content_unlocked(doc, &block).reject { |pair| pair[0] == doc }
    result = carry.collect { |x| x[0] }
    result[0..(max_nearest - 1)]
  end
end
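
The filtering and slicing at the end of find_related can be checked on a hand-made ranking. `related` is a hypothetical helper that takes an already sorted array of `[document, score]` pairs, which stands in for the output of the proximity primitive; the scores are invented.

```ruby
# Drops the query document itself, keeps only the documents, and slices
# indices 0 through max_nearest - 1, as the listing does.
def related(pairs, doc, max_nearest = 3)
  pairs.reject { |pair| pair[0] == doc }.map(&:first)[0..(max_nearest - 1)]
end

ranking = [['a', 1.0], ['b', 0.9], ['c', 0.5], ['d', 0.1]]
p related(ranking, 'a', 2)
p related(ranking, 'a', 0)
```

With `max_nearest` set to 0 the slice becomes `0..-1`, which is why 0 returns every remaining document.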

#highest_ranked_stems(doc, count = 3) ⇒ Object

Prototype; only works on indexed documents and raises for anything else. Returns the count stems with the highest weights in the document's LSI vector. This is untested: in theory it should work.



# File 'lib/classifier/lsi.rb', line 478

def highest_ranked_stems(doc, count = 3)
  synchronize do
    raise 'Requested stem ranking on non-indexed content!' unless @items[doc]

    arr = node_for_content_unlocked(doc).lsi_vector.to_a
    top_n = arr.sort.reverse[0..(count - 1)]
    top_n.collect { |x| @word_list.word_for_index(arr.index(x)) }
  end
end

#highest_relative_content(max_chunks = 10) ⇒ Object

This method returns max_chunks entries, ordered by their average semantic rating. Essentially, the average distance of each entry from all other entries is calculated, and the highest are returned.

This can be used to build a summary service, or to provide more information about your dataset’s general content. For example, if you were to use categorize on the results of this data, you could gather information on what your dataset is generally about.



# File 'lib/classifier/lsi.rb', line 344

def highest_relative_content(max_chunks = 10)
  synchronize do
    return [] if needs_rebuild_unlocked?

    avg_density = {}
    @items.each_key { |x| avg_density[x] = proximity_array_for_content_unlocked(x).sum { |pair| pair[1] } }

    avg_density.keys.sort_by { |x| avg_density[x] }.reverse[0..(max_chunks - 1)].map
  end
end

#incremental_enabled? ⇒ Boolean

Returns true if incremental mode is enabled and active. Incremental mode becomes active after the first build_index call.

Returns:

  • (Boolean)

# File 'lib/classifier/lsi.rb', line 149

def incremental_enabled?
  @incremental_mode && !@u_matrix.nil?
end

#items ⇒ Object

Returns an array of items that are indexed.

# File 'lib/classifier/lsi.rb', line 281

def items
  synchronize { @items.keys }
end

#marshal_dump ⇒ Object

Custom marshal serialization to exclude mutex state

# File 'lib/classifier/lsi.rb', line 490

def marshal_dump
  [@auto_rebuild, @word_list, @items, @version, @built_at_version, @dirty, @min_word_length]
end

#marshal_load(data) ⇒ Object

Custom marshal deserialization to recreate mutex



# File 'lib/classifier/lsi.rb', line 496

def marshal_load(data)
  mu_initialize
  @auto_rebuild, @word_list, @items, @version, @built_at_version, @dirty,
    @min_word_length = data
  @storage = nil
end
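
The dump and load pair above can be tried out on a smaller class. `LockedCounter` is a hypothetical example that uses a plain `Mutex` rather than Mutex_m; it only illustrates the pattern of leaving lock state out of the dump and recreating it on load.

```ruby
# Hypothetical class showing the same pattern with a plain Mutex: the lock
# is left out of the dump and recreated on load.
class LockedCounter
  attr_reader :count

  def initialize
    @lock = Mutex.new
    @count = 0
  end

  def increment
    @lock.synchronize { @count += 1 }
  end

  # Serialize data only; a Mutex cannot be marshaled.
  def marshal_dump
    [@count]
  end

  # Restore the data and build a fresh lock for the new object.
  def marshal_load(data)
    @lock = Mutex.new
    @count = data.first
  end
end

counter = LockedCounter.new
2.times { counter.increment }
copy = Marshal.load(Marshal.dump(counter))
copy.increment
puts copy.count
```

Anything left out of the dumped array has to be reset on load, as the listing does for @storage.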

#needs_rebuild? ⇒ Boolean

Returns true if the index needs to be rebuilt. The index needs to be built after all information is added, but before you start using it for search, classification and cluster detection.

Returns:

  • (Boolean)

# File 'lib/classifier/lsi.rb', line 122

def needs_rebuild?
  synchronize { (@items.keys.size > 1) && (@version != @built_at_version) }
end
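
The version counters behind this check can be sketched with a small stand-in class. `VersionedIndex` and its methods are hypothetical; only the comparison in `needs_rebuild?` is taken from the listing.

```ruby
# Minimal stand-in for the bookkeeping behind #needs_rebuild?; the class
# name and its methods are hypothetical.
class VersionedIndex
  def initialize
    @items = {}
    @version = 0
    @built_at_version = -1
  end

  def add(key, value)
    @items[key] = value
    @version += 1 # every mutation bumps the version
  end

  def build
    @built_at_version = @version # remember which version the build reflects
  end

  def needs_rebuild?
    @items.size > 1 && @version != @built_at_version
  end
end

index = VersionedIndex.new
index.add(:a, 'first')
puts index.needs_rebuild? # a single item never requires a rebuild
index.add(:b, 'second')
puts index.needs_rebuild?
```

Note that with fewer than two items the check is always false, regardless of the version numbers.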

#proximity_array_for_content(doc, &block) ⇒ Object

This function is the primitive that find_related and classify build upon. It returns an array of 2-element arrays. The first element of each pair is a document, and the second is its “score”, defining how “close” that document is to doc.

These values are somewhat arbitrary, having to do with the vector space created by your content, so the magnitude is interpretable but not always meaningful between indexes.

The parameter doc is the content to compare. If that content is not indexed, you can pass an optional block to define how to create the text data. See add_item for examples of how this works.



# File 'lib/classifier/lsi.rb', line 369

def proximity_array_for_content(doc, &block)
  synchronize { proximity_array_for_content_unlocked(doc, &block) }
end

#proximity_norms_for_content(doc, &block) ⇒ Object

Similar to proximity_array_for_content, this function takes similar arguments and returns a similar array. However, it uses the normalized calculated vectors instead of their full versions. This is useful when you’re trying to perform operations on content that is much smaller than the text you’re working with. search uses this primitive.



# File 'lib/classifier/lsi.rb', line 380

def proximity_norms_for_content(doc, &block)
  synchronize { proximity_norms_for_content_unlocked(doc, &block) }
end

#reload ⇒ Object

Reloads the LSI index from the configured storage. Raises UnsavedChangesError if there are unsaved changes. Use reload! to force reload and discard changes.

Raises:

  • (ArgumentError)

# File 'lib/classifier/lsi.rb', line 583

def reload
  raise ArgumentError, 'No storage configured' unless storage
  raise UnsavedChangesError, 'Unsaved changes would be lost. Call save first or use reload!' if @dirty

  data = storage.read
  raise StorageError, 'No saved state found' unless data

  restore_from_json(data)
  @dirty = false
  self
end

#reload! ⇒ Object

Force reloads the LSI index from storage, discarding any unsaved changes.

Raises:

  • (ArgumentError)

# File 'lib/classifier/lsi.rb', line 598

def reload!
  raise ArgumentError, 'No storage configured' unless storage

  data = storage.read
  raise StorageError, 'No saved state found' unless data

  restore_from_json(data)
  @dirty = false
  self
end

#remove_item(item) ⇒ Object

Removes an item from the database, if it is indexed.



# File 'lib/classifier/lsi.rb', line 267

def remove_item(item)
  removed = synchronize do
    next false unless @items.key?(item)

    @items.delete(item)
    @version += 1
    @dirty = true
    true
  end
  build_index if removed && @auto_rebuild
end

#save ⇒ Object

Saves the LSI index to the configured storage. Raises ArgumentError if no storage is configured.

Raises:

  • (ArgumentError)

# File 'lib/classifier/lsi.rb', line 562

def save
  raise ArgumentError, 'No storage configured. Use save_to_file(path) or set storage=' unless storage

  storage.write(to_json)
  @dirty = false
end

#save_to_file(path) ⇒ Object

Saves the LSI index to a file (legacy API).



# File 'lib/classifier/lsi.rb', line 572

def save_to_file(path)
  result = File.write(path, to_json)
  @dirty = false
  result
end

#search(string, max_nearest = 3) ⇒ Object

This function allows for text-based search of your index. Unlike other functions like find_related and classify, search only takes short strings. It will also ignore factors like repeated words. It is best for short, Google-like search terms. A search will first prioritize lexical relationships, then semantic ones.

While this may seem backwards compared to the other functions that LSI supports, it is actually the same algorithm, just applied on a smaller document.



# File 'lib/classifier/lsi.rb', line 393

def search(string, max_nearest = 3)
  synchronize do
    return [] if needs_rebuild_unlocked?

    carry = proximity_norms_for_content_unlocked(string)
    result = carry.collect { |x| x[0] }
    result[0..(max_nearest - 1)]
  end
end

#singular_value_spectrum ⇒ Object

# File 'lib/classifier/lsi.rb', line 127

def singular_value_spectrum
  return nil unless @singular_values

  total = @singular_values.sum
  return nil if total.zero?

  cumulative = 0.0
  @singular_values.map.with_index do |value, i|
    cumulative += value
    {
      dimension: i,
      value: value,
      percentage: value / total,
      cumulative_percentage: cumulative / total
    }
  end
end
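
The arithmetic in the listing can be run on a plain array of singular values. In this sketch `spectrum` is a standalone copy of that arithmetic, and `dimensions_for` is a hypothetical helper showing one possible use: picking the smallest number of dimensions whose cumulative share reaches a target. The sample values are invented.

```ruby
# Standalone version of the arithmetic in #singular_value_spectrum, taking
# the singular values as a plain array.
def spectrum(singular_values)
  total = singular_values.sum
  return nil if total.zero?

  cumulative = 0.0
  singular_values.each_with_index.map do |value, i|
    cumulative += value
    { dimension: i, value: value, percentage: value / total, cumulative_percentage: cumulative / total }
  end
end

# Hypothetical helper: smallest number of dimensions whose cumulative share
# reaches `target` (a fraction between 0 and 1).
def dimensions_for(singular_values, target)
  rows = spectrum(singular_values)
  return nil unless rows

  row = rows.find { |r| r[:cumulative_percentage] >= target }
  row && row[:dimension] + 1
end

puts dimensions_for([4.0, 3.0, 1.0], 0.8)
```

Despite their names, `percentage` and `cumulative_percentage` are fractions between 0 and 1, not values out of 100.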

#to_json ⇒ Object

Serializes the LSI index to a JSON string. Only source data (word_hash, categories) is serialized, not computed vectors. On load, the index will be rebuilt automatically.

# File 'lib/classifier/lsi.rb', line 529

def to_json(*)
  as_json.to_json
end

#train_batch(category = nil, documents = nil, batch_size: Streaming::DEFAULT_BATCH_SIZE, **categories, &block) ⇒ Object

Alias train_batch to add_batch for API consistency with other classifiers. Note: LSI uses categories differently (items have categories, not the training call).



# File 'lib/classifier/lsi.rb', line 722

def train_batch(category = nil, documents = nil, batch_size: Streaming::DEFAULT_BATCH_SIZE, **categories, &block)
  if category && documents
    add_batch(batch_size: batch_size, **{ category.to_sym => documents }, &block)
  else
    add_batch(batch_size: batch_size, **categories, &block)
  end
end

#train_from_stream(category, io, batch_size: Streaming::DEFAULT_BATCH_SIZE) ⇒ Object

Trains the LSI index from an IO stream. Each line in the stream is treated as a separate document. Documents are added without rebuilding, then the index is rebuilt at the end.

Examples:

Train from a file

lsi.train_from_stream(:category, File.open('corpus.txt'))

With progress tracking

lsi.train_from_stream(:category, io, batch_size: 500) do |progress|
  puts "#{progress.completed} documents processed"
end


# File 'lib/classifier/lsi.rb', line 666

def train_from_stream(category, io, batch_size: Streaming::DEFAULT_BATCH_SIZE)
  original_auto_rebuild = @auto_rebuild
  @auto_rebuild = false

  begin
    reader = Streaming::LineReader.new(io, batch_size: batch_size)
    total = reader.estimate_line_count
    progress = Streaming::Progress.new(total: total)

    reader.each_batch do |batch|
      batch.each { |text| add_item(text, category) }
      progress.completed += batch.size
      progress.current_batch += 1
      yield progress if block_given?
    end
  ensure
    @auto_rebuild = original_auto_rebuild
    build_index if original_auto_rebuild
  end
end
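
The implementation of Streaming::LineReader is not shown on this page. As a minimal sketch of the line batching it is described as doing, assuming one document per line with the trailing newline removed, the standard library is enough. `each_line_batch` is a hypothetical helper, and a `StringIO` stands in for the corpus file.

```ruby
require 'stringio'

# Stand-in for the line batching done by Streaming::LineReader: reads the
# IO line by line and hands the block arrays of up to `batch_size` lines.
def each_line_batch(io, batch_size:, &block)
  io.each_line(chomp: true).each_slice(batch_size, &block)
end

io = StringIO.new("first doc\nsecond doc\nthird doc\n")
each_line_batch(io, batch_size: 2) do |batch|
  puts "batch of #{batch.size}: #{batch.inspect}"
end
```

Because the enumerator reads lazily, only one batch of lines is held in memory at a time, which is the point of training from a stream.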

#vote(doc, cutoff = 0.30, &block) ⇒ Object



# File 'lib/classifier/lsi.rb', line 441

def vote(doc, cutoff = 0.30, &block)
  validate_cutoff!(cutoff)

  synchronize { vote_unlocked(doc, cutoff, &block) }
end