redact_ner
Ruby bindings for the redact-ner Rust
crate. Provides Named Entity Recognition (NER) for PII detection backed by the
ONNX Runtime.
Status
Early MVP. Wraps the upstream NerRecognizer API surface:
from_file, analyze, available?, supports_language?,
supported_entities, plus configuration accessors.
Installation
The gem ships a native Rust extension built via
rb_sys. You need:
- Ruby >= 3.0
- A working Rust toolchain (rustc and cargo, stable)
- The ONNX Runtime shared library available at runtime (see below)
bundle install
bin/rake compile # or: bundle exec rake compile
Note: invoke rake via bin/rake (a Bundler binstub) or bundle exec rake. Running plain rake will fail because the globally installed rake conflicts with the bundle-locked version.
Precompiled musl gems (Alpine / distroless)
The precompiled x86_64-linux-musl and aarch64-linux-musl gems link the C++
runtime libstdc++ dynamically, so it must be present at load time. Bare
Alpine/distroless images do not ship it — install it first, e.g. apk add
--no-cache libstdc++ on Alpine. glibc images (e.g. Debian slim) already
include it.
ONNX Runtime
redact-ner uses the ort crate with the load-dynamic feature, which means
the ONNX Runtime shared library is looked up at runtime, not at link time. You
must point ORT_DYLIB_PATH to a libonnxruntime.so / .dylib / .dll
compatible with the upstream crate.
Example (Linux):
export ORT_DYLIB_PATH=/path/to/onnxruntime-linux-x64-1.20.0/lib/libonnxruntime.so.1.20.0
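Because the library is only resolved at runtime, a wrong or unset ORT_DYLIB_PATH is easy to miss until a model fails to load. The following is a minimal pre-flight sketch in plain Ruby. The helper name ort_dylib_problem is hypothetical and not part of this gem, and it only checks that the path exists, not that the library version is compatible with the upstream crate.

```ruby
# Hypothetical pre-flight check, not part of redact_ner.
# Returns nil when ORT_DYLIB_PATH points at an existing file,
# otherwise a short description of the problem.
def ort_dylib_problem(env = ENV)
  path = env["ORT_DYLIB_PATH"]
  return "ORT_DYLIB_PATH is not set" if path.nil? || path.empty?
  return "ORT_DYLIB_PATH does not point to a file: #{path}" unless File.file?(path)

  nil
end

# Report the problem (if any) before trying to load a model.
problem = ort_dylib_problem
warn "ONNX Runtime check: #{problem}" if problem
```

Passing the environment as an argument keeps the check easy to exercise in tests with a plain Hash instead of the real ENV.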
Graceful fallback — important
Upstream redact-ner does not raise when a model or tokenizer cannot be
loaded. Recognizer.from_file always returns a recognizer object; if the ONNX
session or tokenizer fails to initialize (model file missing, ORT_DYLIB_PATH
unset, etc.), the recognizer is left in an "unavailable" state and
#analyze quietly returns an empty array.
If you want a hard failure instead, check #available? immediately:
rec = RedactNer::Recognizer.from_file("model.onnx")
raise "NER model failed to load" unless rec.available?
Usage
require "redact_ner"
recognizer = RedactNer::Recognizer.from_file("path/to/model.onnx")
results = recognizer.analyze("John Doe works at Acme Corp in New York", "en")
results.each do |r|
puts "#{r.entity_type}\t#{r.start}..#{r.end}\t#{r.score.round(3)}\t#{r.text}"
end
analyze returns an array of RedactNer::Result, which is a Struct with the
following attributes:
| Attribute | Type | Notes |
|---|---|---|
| entity_type | String | e.g. "PERSON", "ORGANIZATION", "LOCATION" |
| start | Integer | byte offset, inclusive |
| end | Integer | byte offset, exclusive |
| score | Float | model confidence in [0.0, 1.0] |
| recognizer_name | String | always "NerRecognizer" |
| text | String | the matched substring |
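Since start and end are byte offsets, indexing the input with text[start...end] (which counts characters) can return the wrong span once the input contains multi-byte UTF-8 characters. String#byteslice is the matching accessor. The sketch below uses a stand-in Struct with the same attribute names rather than the real RedactNer::Result, hand-written offsets and scores rather than model output, and a hypothetical redact helper that is not part of this gem.

```ruby
# Stand-in with the same attributes as RedactNer::Result, so the sketch
# runs without the native extension.
Span = Struct.new(:entity_type, :start, :end, :score, :recognizer_name, :text)

# Hypothetical helper: replaces each span with "<ENTITY_TYPE>".
# It edits a binary copy so offsets are interpreted as bytes, and works
# right to left so earlier offsets stay valid after each replacement.
def redact(text, results)
  out = text.b
  results.sort_by(&:start).reverse_each do |r|
    out[r.start...r.end] = "<#{r.entity_type}>".b
  end
  out.force_encoding(text.encoding)
end

text = "José Álvarez works at Acme"

# Illustrative, hand-computed values: "é" and "Á" take two bytes each,
# so the person name covers bytes 0...14 but only 12 characters.
results = [
  Span.new("PERSON", 0, 14, 0.98, "NerRecognizer", "José Álvarez"),
  Span.new("ORGANIZATION", 24, 28, 0.91, "NerRecognizer", "Acme"),
]

person = results.first
by_bytes = text.byteslice(person.start, person.end - person.start)
by_chars = text[person.start...person.end]

puts by_bytes               # José Álvarez
puts by_chars               # José Álvarez w   (character indexing overshoots)
puts redact(text, results)  # <PERSON> works at <ORGANIZATION>
```

For ASCII-only input the two indexing styles agree, which is why this kind of mismatch tends to show up only with accented or non-Latin text.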
Other methods:
recognizer.available? # => true if a model + tokenizer were loaded
recognizer.supports_language?("ja") # => true / false
recognizer.supported_entities # => ["PERSON", "LOCATION", ...]
recognizer.name # => "NerRecognizer"
recognizer.min_confidence # => 0.7
recognizer.max_seq_length # => 512
recognizer.model_path # => the path you passed in
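These accessors can be combined into an up-front check, so that an empty result means "nothing found" rather than "model not loaded". The sketch below is one way to do that. analyze_strict is a hypothetical wrapper, not part of this gem, and FakeRecognizer is a stand-in that only mimics the three methods used, so the example runs without a model.

```ruby
# Hypothetical wrapper, not part of redact_ner: raises instead of
# silently returning [] when the recognizer cannot do useful work.
def analyze_strict(recognizer, text, language)
  raise "NER model is not loaded" unless recognizer.available?
  unless recognizer.supports_language?(language)
    raise ArgumentError, "unsupported language: #{language}"
  end

  recognizer.analyze(text, language)
end

# Stand-in for RedactNer::Recognizer, for illustration only.
class FakeRecognizer
  def initialize(available:, languages:)
    @available = available
    @languages = languages
  end

  def available?
    @available
  end

  def supports_language?(language)
    @languages.include?(language)
  end

  # A real recognizer would return RedactNer::Result objects here.
  def analyze(_text, _language)
    []
  end
end

rec = FakeRecognizer.new(available: true, languages: ["en"])
p analyze_strict(rec, "John Doe works at Acme Corp", "en")
```

With the gem loaded, the same wrapper should accept the object returned by RedactNer::Recognizer.from_file, since it relies only on the methods listed above.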
Model files
This gem does not bundle models. Use the
scripts/export_ner_model.py
helper from the upstream censgate/redact repository to export a HuggingFace
NER model to ONNX. Place the resulting model.onnx, tokenizer.json, and
config.json in a single directory and pass the .onnx path to
Recognizer.from_file.
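A quick way to verify that layout before loading is to check for the three files explicitly. This is a minimal sketch, assuming the recognizer looks for tokenizer.json and config.json next to the .onnx file as the layout above suggests. The helper name missing_model_files is hypothetical and not part of this gem.

```ruby
# File names taken from the layout described above.
REQUIRED_SIBLINGS = %w[tokenizer.json config.json].freeze

# Hypothetical helper, not part of redact_ner: returns the paths that
# are expected for a model directory but do not exist on disk.
def missing_model_files(model_path)
  dir = File.dirname(model_path)
  candidates = [model_path] + REQUIRED_SIBLINGS.map { |name| File.join(dir, name) }
  candidates.reject { |path| File.file?(path) }
end

missing = missing_model_files("path/to/model.onnx")
warn "missing model files: #{missing.join(', ')}" unless missing.empty?
```

Because from_file does not raise on a missing file, a check like this gives a more specific message than a bare available? failure.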
Development
bundle install
bin/rake compile
bin/rake test
Releasing
For maintainers — to cut a new release:
- Update lib/redact_ner/version.rb and CHANGELOG.md. Commit.
- Tag the release:
git tag v$(ruby -Ilib -rredact_ner/version -e 'puts RedactNer::VERSION')
- Push:
git push && git push --tags
- Build and ship the gem:
bin/rake build # produces pkg/redact_ner-X.Y.Z.gem
gem push pkg/redact_ner-*.gem
The first gem push requires a rubygems.org API key (or an OIDC
trusted-publisher setup). Multi-factor authentication (MFA) is required by
this gem's rubygems_mfa_required metadata.
Note: this currently ships a "source" gem only. End users compile the Rust extension on
gem install. Cross-compiled precompiled gems (per platform) can be added later via rb-sys-dock and rake-compiler-dock.
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option. The upstream redact-ner crate is Apache-2.0.