Ocrb

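Ocrb is a Ruby gem for OCR: it sends an image and a prompt to a vision model, through the ollama CLI by default or through an OpenAI-compatible API, and returns the result.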

Installation

Install the gem and add it to the application's Gemfile by executing:

bundle add ocrb

If you are not using Bundler to manage dependencies, install the gem by executing:

gem install ocrb

Usage

Require the gem and call Ocrb.run with an image path and prompt:

require "ocrb"

text = Ocrb.run("receipt.jpg", "Extract the text from this image.")
puts text

By default, Ocrb.run uses the ollama CLI with the glm-ocr:bf16 model:

Ocrb.run("receipt.jpg", "Summarize the line items.")

If you want to use an OpenAI-compatible API instead, pass the built-in extractor explicitly:

require "ocrb"

text = Ocrb.run(
  "receipt.jpg",
  "Recognize total amount.",
  extractor: Ocrb::Extractors::OpenAi.new(
    url: "http://127.0.0.1:1234/v1",
    model: "zai-org/glm-4.6v-flash",
    api_key: ENV.fetch("OPENAI_API_KEY", "asdf"),
    # json may be nil, true, or a schema for `response_format.json_schema.schema`
    json: {type: "object", properties: {amount: {type: "string"}}}
  )
)
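
When a JSON schema is passed, the result presumably comes back as JSON text that still needs parsing. The sketch below assumes the return value is a JSON string matching the schema above; the sample string and the parse_amount helper are hypothetical and not part of the gem.

```ruby
require "json"

# Hypothetical helper: pulls a numeric amount out of the JSON text.
# Returns nil when the text is not valid JSON, is not an object,
# lacks the "amount" key, or holds a non-numeric value.
def parse_amount(raw)
  data = JSON.parse(raw)
  return nil unless data.is_a?(Hash)

  Float(data.fetch("amount"))
rescue JSON::ParserError, KeyError, ArgumentError, TypeError
  nil
end

# Made-up sample; real output depends on the model and the schema.
sample = '{"amount": "12.50"}'
puts parse_amount(sample).inspect
```

Returning nil on any malformed response keeps the caller's handling simple, since model output is not guaranteed to follow the schema.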

You can also resize the image before OCR by passing a resizer:

require "ocrb"

text = Ocrb.run(
  "receipt.jpg",
  "Extract all visible text.",
  resizer: Ocrb::Resizers::Sips.new(resample_width: 1024)
)

Both extractor and resizer are duck-typed. An extractor is any object that responds to extract(image_path, prompt), and a resizer is any object that responds to resize(image_path).
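
As a rough illustration of the duck typing, the two classes below implement only those method signatures. StubExtractor and CopyResizer are hypothetical names, not part of the gem, and the return values are assumptions: extract is assumed to return the recognized text, and resize is assumed to return the path of the image to use.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical extractor: returns canned text instead of calling a model,
# which can be handy in tests.
class StubExtractor
  def initialize(text)
    @text = text
  end

  # Signature named in this README: extract(image_path, prompt).
  def extract(image_path, prompt)
    "#{@text} (image: #{File.basename(image_path)}, prompt: #{prompt})"
  end
end

# Hypothetical resizer: copies the file unchanged and returns the copy's path.
# A real resizer would write a scaled image instead.
class CopyResizer
  def initialize(dir)
    @dir = dir
  end

  # Signature named in this README: resize(image_path).
  def resize(image_path)
    target = File.join(@dir, "resized-#{File.basename(image_path)}")
    FileUtils.cp(image_path, target)
    target
  end
end
```

If Ocrb.run accepts both keywords in one call (the examples above only show them separately), these objects would be passed as extractor: StubExtractor.new("TOTAL 12.50") and resizer: CopyResizer.new(Dir.tmpdir).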

License

The gem is available as open source under the terms of the MIT License.