pdf2markdownOCR

A Ruby gem that converts PDF documents to Markdown using a locally hosted vision LLM (AI-based OCR).
Pages are rendered as high-resolution PNG images and then sent to an OpenAI-compatible API endpoint for text extraction.
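
As a rough illustration of the second step, the sketch below builds a request body in the chat completions format that OpenAI-compatible servers generally accept for image input. The helper name, the prompt text, and the exact message layout are assumptions made for this example; the gem's actual request may differ.

```ruby
require 'json'

# Hypothetical helper (not part of the gem's documented API): builds the body
# of an OpenAI-compatible chat completions request carrying one page image.
def build_ocr_request(png_bytes, model:, prompt: "Convert this page to Markdown.")
  # pack("m0") is strict Base64 without line breaks, using only core Ruby.
  data_uri = "data:image/png;base64,#{[png_bytes].pack('m0')}"
  {
    model: model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: data_uri } }
        ]
      }
    ]
  }
end

# Placeholder bytes stand in for a real rendered page.
body = JSON.generate(build_ocr_request("\x89PNG".b, model: "deepseek-ai/DeepSeek-OCR-2"))
```

Such a body would typically be sent with an HTTP POST to the server's chat completions route.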

Requirements

  • Ruby >= 3.1
  • poppler-utils (pdftoppm, pdfinfo)
  • OpenAI-compatible vision LLM server (e.g. vLLM, Ollama, llama.cpp)

Install poppler on Debian/Ubuntu:

sudo apt install poppler-utils
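
To confirm that the poppler tools are visible before running a conversion, a small check along these lines may help. The helper name is made up for this example and is not part of the gem.

```ruby
# Hypothetical helper: true if an executable called `name` exists in one of
# the directories listed in `path` (defaults to the PATH environment variable).
def executable_on_path?(name, path: ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Report any of the two required poppler tools that cannot be found.
missing = %w[pdftoppm pdfinfo].reject { |tool| executable_on_path?(tool) }
warn "Missing poppler tools: #{missing.join(', ')}" unless missing.empty?
```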

You can download the DeepSeek-OCR-2 model (deepseek-ai/DeepSeek-OCR-2) from Hugging Face.

Installation

Add to your Gemfile:

gem 'pdf2markdownOCR'

Then run:

bundle install

Or install directly:

gem install pdf2markdownOCR

Configuration

Configuration can be set via a block or via environment variables. The block takes priority.
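
The precedence can be pictured with the sketch below. It is not the gem's code, and because the environment variable names are not listed in this README, PDF2MARKDOWNOCR_LLM_API_URL is only a placeholder chosen for the example.

```ruby
# Hypothetical helper, not part of the gem: picks the value for one setting.
# Precedence: explicit block value, then environment variable, then default.
def resolve_setting(block_value, env_name, default, env: ENV)
  return block_value unless block_value.nil?

  env.fetch(env_name, default)
end

# Placeholder variable name and values, used only for illustration.
fake_env = { "PDF2MARKDOWNOCR_LLM_API_URL" => "http://localhost:9800" }
default_url = "http://localhost:8000"

# A value set in the block wins over the environment.
from_block = resolve_setting("http://localhost:7000", "PDF2MARKDOWNOCR_LLM_API_URL", default_url, env: fake_env)
# With no block value, the environment variable is used.
from_env = resolve_setting(nil, "PDF2MARKDOWNOCR_LLM_API_URL", default_url, env: fake_env)
# With neither, the default applies.
from_default = resolve_setting(nil, "PDF2MARKDOWNOCR_LLM_API_URL", default_url, env: {})
```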

Via configure block

You can configure the gem using a configuration block. These are the options and their default values.

require 'pdf2markdownOCR'

Pdf2MarkdownOCR.configure do |config|
  # URL of your OpenAI-compatible LLM server
  config.llm_api_url = "http://localhost:8000"

  # Model name to request from the server
  config.llm_model = "deepseek-ai/DeepSeek-OCR-2"

  # PNG resolution used when rasterising PDF pages (higher = better OCR, slower)
  config.png_dpi_resolution = 300

  # Conversion mode: :single_thread or :multi_thread
  # :multi_thread converts all pages to PNGs in parallel threads
  config.mode = :multi_thread

  # Logger: by default the gem uses Ruby's stdlib Logger writing to $stdout.
  # You can provide your own instance. To silence it completely, pass
  # Logger.new(File::NULL).
  config.logger = Logger.new($stdout).tap do |log|
    log.progname = "Pdf2MarkdownOCR"
  end
end
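
To get a feel for what png_dpi_resolution means in pixels: PDF page sizes are measured in points (72 per inch), so the rendered size is roughly the page size scaled by DPI / 72. The helper below is an illustration written for this README, not a function the gem provides, and the renderer's exact rounding may differ by a pixel.

```ruby
POINTS_PER_INCH = 72.0

# Hypothetical helper: approximate pixel dimensions of a page rendered at `dpi`,
# given its size in PDF points.
def raster_size(width_pts, height_pts, dpi)
  [
    (width_pts * dpi / POINTS_PER_INCH).round,
    (height_pts * dpi / POINTS_PER_INCH).round
  ]
end

letter_at_300 = raster_size(612, 792, 300) # US Letter at the default DPI
letter_at_150 = raster_size(612, 792, 150) # same page at half the DPI
```

Halving the DPI halves each dimension, so the image carries a quarter of the pixels. That is the trade-off between OCR quality and speed noted in the comment on png_dpi_resolution.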

Usage as a library

Convert a PDF and get Markdown as a string

require 'pdf2markdownOCR'

markdown = Pdf2MarkdownOCR.convert_pdf("document.pdf")
puts markdown

Convert a PDF and write directly to a file

Pdf2MarkdownOCR.convert_pdf("document.pdf", "output.md")
# => nil  (content written to output.md)
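
For converting a whole folder, a loop like the following sketch could be used. The helper name is hypothetical, and the converter is passed in as a callable so the example does not depend on the gem being loaded.

```ruby
# Hypothetical helper, not part of the gem: converts every PDF in `dir`,
# writing NAME.md next to NAME.pdf. `converter` is any callable that takes
# (pdf_path, output_path). Returns the list of output paths.
def convert_directory(dir, converter)
  Dir.glob(File.join(dir, "*.pdf")).sort.map do |pdf_path|
    output_path = pdf_path.sub(/\.pdf\z/, ".md")
    converter.call(pdf_path, output_path)
    output_path
  end
end
```

With the gem loaded, the two-argument form shown above could be passed as `convert_directory("docs", Pdf2MarkdownOCR.method(:convert_pdf))`, where "docs" is an example directory name.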

Usage as a CLI

After installation, the pdf2markdownocr executable is available on your PATH. It accepts the same options as the configuration block.

Usage: pdf2markdownocr [options] <pdf_path>

Converts a PDF file to Markdown using OCR.

Options:
  -o, --output FILE    Output Markdown file
      --llm-api-url    OpenAI-compatible server URL
      --llm-model MODEL
      --mode           Processing mode: single_thread or multi_thread
      --png-dpi        DPI resolution for PNG conversion
  -h, --help           Show help message
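
The correspondence between these flags and the configuration block can be sketched with Ruby's stdlib OptionParser. This is an illustration only, not the gem's CLI code. The function name, the argument placeholders (URL, MODE, DPI) and the hash keys are assumptions that mirror the option names in the configuration section.

```ruby
require 'optparse'

# Hypothetical sketch, not the gem's actual CLI: parses the flags listed above
# into a hash keyed by the matching configuration option names.
def parse_cli(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.on("-o", "--output FILE") { |v| options[:output] = v }
    opts.on("--llm-api-url URL") { |v| options[:llm_api_url] = v }
    opts.on("--llm-model MODEL") { |v| options[:llm_model] = v }
    # Only the two documented modes are accepted.
    opts.on("--mode MODE", %w[single_thread multi_thread]) { |v| options[:mode] = v.to_sym }
    opts.on("--png-dpi DPI", Integer) { |v| options[:png_dpi_resolution] = v }
  end
  # parse returns the remaining non-option arguments; the first is the PDF path.
  pdf_path = parser.parse(argv).first
  [pdf_path, options]
end

pdf_path, options = parse_cli(%w[document.pdf -o result.md --png-dpi 150 --mode single_thread])
```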

Examples

# Basic conversion (output saved to output.md)
pdf2markdownocr document.pdf

# Custom output file
pdf2markdownocr document.pdf -o result.md

# Custom LLM server and model
pdf2markdownocr document.pdf -o result.md --llm-api-url http://localhost:9800 --llm-model deepseek-ai/DeepSeek-OCR

# Print version
pdf2markdownocr --version

License

MIT