index_util
Subclass-defined local search indexes for Ruby.
index_util lets a small Ruby class define where documents come from, how they are split, and how final searchable fragments are produced. The gem stores those fragments in SQLite with FTS5 keyword lookup, sqlite-vec vector lookup, and embedding_util embeddings/reranking.
This gem is in the 0.x series. The API is intentionally unstable until 1.0, and public method names, configuration options, return shapes, and default profiles may change between minor releases.
Installation
Add the gem to your Gemfile:
gem "index_util"
Then install dependencies:
bundle install
Quick Start
#!/usr/bin/env ruby
require "index_util"
class MyIndex < IndexUtil::Index
  def database_file
    "myindex.sqlite3"
  end
  def document_list
    Dir["docs/**/*.md"]
  end
  def document_sections(_document, content)
    content.split(/^## /).each_with_index.to_h { |section, index| [index, section] }
  end
end
MyIndex.cli if $PROGRAM_NAME == __FILE__
Build and query:
./myindex.rb --index
./myindex.rb "how do I split text?" --limit 5
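To see what the document_sections override in the Quick Start returns, the same expression can be run on a small sample string in plain Ruby. The sample text and the split_sections helper name are hypothetical and exist only for illustration; nothing in this sketch calls index_util.

```ruby
# Stand-alone illustration of the Quick Start's document_sections body.
# `split_sections` is a hypothetical helper name, not part of index_util.
def split_sections(content)
  content.split(/^## /).each_with_index.to_h { |section, index| [index, section] }
end

# Made-up sample document with two "## " headings.
sample = "Intro text\n## Install\nRun bundler.\n## Usage\nCall the CLI.\n"
sections = split_sections(sample)

sections.each { |id, text| puts "#{id}: #{text.inspect}" }
# 0: "Intro text\n"
# 1: "Install\nRun bundler.\n"
# 2: "Usage\nCall the CLI.\n"
```

The split consumes the "## " marker itself, and a file that begins with a heading produces an empty string as section 0, so an override may want to restore the marker or drop empty sections.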
CLI
Subclass scripts can call MyIndex.cli and use:
./myindex.rb --index
./myindex.rb --index-new
./myindex.rb --index-update
./myindex.rb "query text" --limit 5
The installed executable can load a class explicitly:
index_util --require ./myindex.rb --class MyIndex index
index_util --require ./myindex.rb --class MyIndex query "query text" --limit 5
When a script should support both direct execution and installed CLI loading, guard the class CLI call with if $PROGRAM_NAME == __FILE__.
Indexing commands print one-line progress to stderr while they list, load, embed, and store documents. Successful queries print a JSON array with document and content. Failures print compact JSON to stderr and exit non-zero.
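A calling script might consume the query output along these lines. This is a sketch under assumptions: it treats each array element as a JSON object with the keys "document" and "content", which is inferred from the description above rather than confirmed, and the sample payload and the parse_results helper are made up.

```ruby
require "json"

# Hypothetical helper: turns the JSON array printed by a successful query
# into [document, content] pairs. The key names are assumed, not confirmed.
def parse_results(json_text)
  JSON.parse(json_text).map do |row|
    [row.fetch("document"), row.fetch("content")]
  end
end

# Made-up sample standing in for captured stdout.
sample_stdout = <<~'JSON'
  [
    {"document": "docs/strings.md#2", "content": "Splitting text into fields"},
    {"document": "docs/intro.md", "content": "Getting started"}
  ]
JSON

parse_results(sample_stdout).each do |document, content|
  puts "#{document}: #{content}"
end
```

Using fetch instead of [] makes the script fail loudly if the return shape changes, which is worth having while the API is in the 0.x series.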
Examples
Examples live under examples/. Each example is a small standalone directory with its own Gemfile pointing back to this checkout.
Recipe-card example:
cd examples/recipes
bundle install
bundle exec ruby recipes.rb --index
bundle exec ruby recipes.rb "quick vegetarian dinner" --limit 3
Kreuzberg PDF example:
cd examples/pdf_notes
bundle install
bundle exec ruby pdf_notes.rb --index
bundle exec ruby pdf_notes.rb "how should a risk assessment be conducted?" --limit 3
Ruby API example:
cd examples/ruby_api
bundle install
bundle exec ruby ruby_api.rb --index
bundle exec ruby ruby_api.rb "how to split a string" --limit 5
API
Subclass IndexUtil::Index and define:
database_file returns the SQLite path.
document_list returns source documents.
document_content(document) defaults to File.read(document).
document_checksum(document, content) defaults to stable SHA-256 over content.to_s.
document_sections(document, content) defaults to { nil => content }.
document_postprocess(fragment_document, content) defaults to identity and must return a String.
query_amendments(query) defaults to {} and returns extra {document_id => content} candidates.
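The documented defaults can be pictured as the following plain-Ruby sketch. The DefaultHooksSketch module is hypothetical and is not the gem's source; in particular, hex encoding of the SHA-256 digest is an assumption, since the text above only says the checksum is a stable SHA-256 over content.to_s.

```ruby
require "digest"

# Hypothetical stand-in for the documented default hook behavior.
# Not copied from index_util; the hex digest format is an assumption.
module DefaultHooksSketch
  module_function

  # Reads the document from disk.
  def document_content(document)
    File.read(document)
  end

  # SHA-256 over the stringified content (hex encoding assumed).
  def document_checksum(_document, content)
    Digest::SHA256.hexdigest(content.to_s)
  end

  # One section with a nil id: the whole document is a single fragment.
  def document_sections(_document, content)
    { nil => content }
  end

  # Identity: the fragment text is stored unchanged.
  def document_postprocess(_fragment_document, content)
    content
  end

  # No extra candidates are added to a query.
  def query_amendments(_query)
    {}
  end
end

checksum = DefaultHooksSketch.document_checksum("notes.md", "hello")
puts checksum.length # 64 hex characters for a SHA-256 digest
```

A subclass of IndexUtil::Index would override only the hooks whose defaults do not fit, as the Quick Start does with document_sections.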
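Fragment document ids are built as document for nil sections, or document#section_id otherwise. The # separator is not escaped, so a document name or section id that itself contains # can produce an ambiguous id.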
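The id rule can be sketched in a few lines of plain Ruby. The fragment_document_id helper is a hypothetical name used only to illustrate the rule and is not taken from the gem.

```ruby
# Hypothetical helper mirroring the documented id rule; not the gem's code.
def fragment_document_id(document, section_id)
  section_id.nil? ? document.to_s : "#{document}##{section_id}"
end

puts fragment_document_id("docs/a.md", nil) # docs/a.md
puts fragment_document_id("docs/a.md", 2)   # docs/a.md#2

# Because "#" is not escaped, different inputs can yield the same id.
same = fragment_document_id("docs/a.md#2", nil) == fragment_document_id("docs/a.md", 2)
puts same # true
```

If document names may contain #, choosing section ids that cannot collide with them keeps fragment ids unique.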
Development
bundle install
bundle exec rake
Contributing
Bug reports and pull requests are welcome at https://github.com/rbutils/index_util.
License
The gem is available as open source under the terms of the MIT License.