index_util

Subclass-defined local search indexes for Ruby.

index_util lets a small Ruby class define where documents come from, how they are split, and how final searchable fragments are produced. The gem stores those fragments in SQLite with FTS5 keyword lookup, sqlite-vec vector lookup, and embedding_util embeddings/reranking.

This gem is in the 0.x series. The API is intentionally unstable until 1.0, and public method names, configuration options, return shapes, and default profiles may change between minor releases.

Installation

Add the gem to your Gemfile:

gem "index_util"

Then install dependencies:

bundle install

Quick Start

#!/usr/bin/env ruby

require "index_util"

class MyIndex < IndexUtil::Index
  def database_file
    "myindex.sqlite3"
  end

  def document_list
    Dir["docs/**/*.md"]
  end

  def document_sections(_document, content)
    content.split(/^## /).each_with_index.to_h { |section, index| [index, section] }
  end
end

MyIndex.cli if $PROGRAM_NAME == __FILE__

Build and query:

./myindex.rb --index
./myindex.rb "how do I split text?" --limit 5
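To see what the Quick Start's document_sections produces, the same splitting expression can be run on its own, without the gem. This is a minimal sketch: split_sections and the sample text are hypothetical names used only for illustration.

```ruby
# Same expression as document_sections in the Quick Start, extracted into a
# standalone helper (split_sections is a hypothetical name).
def split_sections(content)
  content.split(/^## /).each_with_index.to_h { |section, index| [index, section] }
end

sample = "Intro text\n## Setup\nInstall it.\n## Usage\nRun it.\n"
sections = split_sections(sample)

sections.each { |id, text| puts "#{id}: #{text.inspect}" }
# 0: "Intro text\n"
# 1: "Setup\nInstall it.\n"
# 2: "Usage\nRun it.\n"
```

The split consumes the "## " marker, so each section starts with the heading text. Anything before the first heading becomes section 0, and a file that starts with "## " gets an empty string as section 0.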

CLI

A subclass script that calls MyIndex.cli accepts:

./myindex.rb --index
./myindex.rb --index-new
./myindex.rb --index-update
./myindex.rb "query text" --limit 5

The installed executable can load a class explicitly:

index_util --require ./myindex.rb --class MyIndex index
index_util --require ./myindex.rb --class MyIndex query "query text" --limit 5

When a script should support both direct execution and installed CLI loading, guard the class CLI call with if $PROGRAM_NAME == __FILE__.

Indexing commands print one-line progress messages to stderr while they list, load, embed, and store documents. A successful query prints a JSON array to stdout in which each result carries document and content. A failure prints compact JSON to stderr and exits with a non-zero status.
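A calling script could consume query output along these lines. This is only a sketch: the sample JSON is made up for illustration, and it assumes each array entry is an object with "document" and "content" keys, which is a reading of the description above rather than a documented schema.

```ruby
require "json"

# Hypothetical query output; real results may carry other fields or a
# different shape.
sample_output = <<~'JSON'
  [
    {"document": "docs/guide.md#1", "content": "Setup: install the gem."},
    {"document": "docs/guide.md#2", "content": "Usage: run the script."}
  ]
JSON

hits = JSON.parse(sample_output)

# fetch raises KeyError if an assumed key is missing, so a wrong guess about
# the shape fails loudly.
hits.each do |hit|
  puts "#{hit.fetch("document")}: #{hit.fetch("content")}"
end
```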

Examples

Examples live under examples/. Each example is a small standalone directory with its own Gemfile pointing back to this checkout.

Recipe-card example:

cd examples/recipes
bundle install
bundle exec ruby recipes.rb --index
bundle exec ruby recipes.rb "quick vegetarian dinner" --limit 3

Kreuzberg PDF example:

cd examples/pdf_notes
bundle install
bundle exec ruby pdf_notes.rb --index
bundle exec ruby pdf_notes.rb "how should a risk assessment be conducted?" --limit 3

Ruby API example:

cd examples/ruby_api
bundle install
bundle exec ruby ruby_api.rb --index
bundle exec ruby ruby_api.rb "how to split a string" --limit 5

API

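Subclass IndexUtil::Index and define the following methods. Those with a listed default only need to be overridden when the default does not fit: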

  • database_file returns the SQLite path.
  • document_list returns source documents.
  • document_content(document) defaults to File.read(document).
  • document_checksum(document, content) defaults to a stable SHA-256 digest of content.to_s.
  • document_sections(document, content) defaults to { nil => content }.
  • document_postprocess(fragment_document, content) defaults to identity and must return a String.
  • query_amendments(query) defaults to {} and returns extra {document_id => content} candidates.

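A fragment's document id is the document itself when the section id is nil, and document#section_id otherwise. The # separator is not escaped, so a document or section id that itself contains # can produce an ambiguous fragment id.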
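The documented defaults and the id rule can be sketched in plain Ruby. These are hypothetical stand-ins, not the gem's source: the helper names are invented here, and hex encoding of the digest is an assumption (only "SHA-256 over content.to_s" is stated).

```ruby
require "digest"

# Stand-in for the default document_checksum. Hex output is an assumption.
def default_checksum(content)
  Digest::SHA256.hexdigest(content.to_s)
end

# Stand-in for the default document_sections: one section with a nil id.
def default_sections(content)
  { nil => content }
end

# Stand-in for the id rule: the bare document for nil sections,
# document#section_id otherwise, with no escaping of "#".
def fragment_id(document, section_id)
  section_id.nil? ? document : [document, section_id].join("#")
end

content = "Intro text\n"
ids = default_sections(content).keys.map { |id| fragment_id("docs/a.md", id) }
puts ids.inspect                   # ["docs/a.md"]
puts fragment_id("docs/a.md", 2)   # docs/a.md#2

# Because "#" is not escaped, two different inputs can collide:
puts fragment_id("notes#1.md", nil) == fragment_id("notes", "1.md")  # true
```

If source paths may contain #, one option is to normalize them in document_list so that fragment ids stay unambiguous.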

Development

bundle install
bundle exec rake

Contributing

Bug reports and pull requests are welcome at https://github.com/rbutils/index_util.

License

The gem is available as open source under the terms of the MIT License.