pdf2markdownOCR

A Ruby gem for converting PDF documents to Markdown using a locally-hosted vision LLM (OCR via AI).
Pages are rendered as high-resolution PNG images and then sent to an OpenAI-compatible API endpoint for text extraction.
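
As a rough illustration of the second step, the sketch below builds an OpenAI-style chat completion body that carries one page image as a base64 data URL. It assumes the server accepts `image_url` content parts, as OpenAI-compatible vision servers generally do. The helper name `build_ocr_request` and the prompt text are hypothetical and are not taken from the gem's source.

```ruby
require 'json'

# Hypothetical helper (not part of the gem's API): builds the request body
# for a single rasterised page.
def build_ocr_request(png_bytes, model:, prompt: "Convert this page to Markdown.")
  # pack("m0") produces base64 without line breaks, as data URLs require
  data_url = "data:image/png;base64,#{[png_bytes].pack('m0')}"
  {
    model: model,
    messages: [
      {
        role: "user",
        content: [
          { type: "image_url", image_url: { url: data_url } },
          { type: "text", text: prompt }
        ]
      }
    ]
  }
end

# Placeholder bytes stand in for a real PNG file here
payload = build_ocr_request("fake png bytes".b, model: "deepseek-ai/DeepSeek-OCR-2")
puts JSON.generate(payload).bytesize
```

A real client would POST this body as JSON to the server's chat completions endpoint and read the Markdown from the first choice in the response.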

Requirements

Ruby >= 3.1
poppler-utils (pdftoppm, pdfinfo)
OpenAI-compatible vision LLM server (e.g. vLLM, Ollama, llama.cpp)

Install poppler on Debian/Ubuntu:

sudo apt install poppler-utils
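
To check that poppler works before involving the gem, you can rasterise a single page by hand. The sketch below only assembles the `pdftoppm` argument list. The helper name `pdftoppm_args` and the output prefix are hypothetical, and the gem may invoke poppler differently.

```ruby
# Hypothetical helper: returns the argv for rendering one page as a PNG.
def pdftoppm_args(pdf_path, page:, dpi: 300, out_prefix: "page")
  [
    "pdftoppm",
    "-png",            # write PNG instead of the default PPM
    "-r", dpi.to_s,    # resolution in DPI
    "-f", page.to_s,   # first page to render
    "-l", page.to_s,   # last page to render (same page, so one image)
    pdf_path,
    "#{out_prefix}-#{page}"
  ]
end

args = pdftoppm_args("document.pdf", page: 3)
puts args.join(" ")
# To actually run it: system(*args) or raise "pdftoppm failed"
```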

You can get DeepSeek's OCR-2 model from Hugging Face.

Installation

Add to your Gemfile:

gem 'pdf2markdownOCR'

Then run:

bundle install

Or install directly:

gem install pdf2markdownOCR

Configuration

Configuration can be set via a block or via environment variables. The block takes priority.
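
The precedence rule can be pictured with a small sketch: a value set in the block wins, then the environment, then the built-in default. The helper `resolve_setting` and the environment variable name used below are hypothetical; this README does not list the actual variable names the gem reads.

```ruby
# Hypothetical resolver, for illustration only.
# block_value: value set in the configure block (nil if unset)
# env_name:    name of the environment variable to fall back to
# default:     value used when neither source provides one
def resolve_setting(block_value, env_name, default, env: ENV)
  return block_value unless block_value.nil?

  env.fetch(env_name, default)
end

# The variable name is an assumption, not a documented setting
fake_env = { "EXAMPLE_LLM_API_URL" => "http://localhost:11434" }

puts resolve_setting("http://localhost:9800", "EXAMPLE_LLM_API_URL", "http://localhost:8000", env: fake_env)
puts resolve_setting(nil, "EXAMPLE_LLM_API_URL", "http://localhost:8000", env: fake_env)
puts resolve_setting(nil, "EXAMPLE_UNSET", "http://localhost:8000", env: fake_env)
```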

Via configure block

You can configure the gem using a configuration block. These are the options and their default values:

require 'pdf2markdownOCR'

Pdf2MarkdownOCR.configure do |config|
  # URL of your OpenAI-compatible LLM server
  config.llm_api_url = "http://localhost:8000"

  # Model name to request from the server
  config.llm_model = "deepseek-ai/DeepSeek-OCR-2"

  # PNG resolution used when rasterising PDF pages (higher = better OCR, slower)
  config.png_dpi_resolution = 300

  # Conversion mode: :single_thread or :multi_thread
  # :multi_thread converts all pages to PNGs in parallel threads
  config.mode = :multi_thread

  # The gem logs through Ruby's stdlib `Logger`, writing to `$stdout` by default.
  # You can provide your own instance. To silence it completely, pass Logger.new("/dev/null").
  config.logger = Logger.new($stdout).tap do |log|
    # At the top level `self.class.name` would be "Object", so name the gem explicitly
    log.progname = "Pdf2MarkdownOCR"
  end
end
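
To give an idea of what `:multi_thread` means in practice, the sketch below runs one thread per page and collects the results in page order. It is a minimal illustration under the assumption of one thread per page. The helper `map_pages_in_threads` is hypothetical, and the gem's own threading strategy may differ (for example, it might cap the number of threads).

```ruby
# Hypothetical sketch: run the given block once per page, each in its own
# thread, and return the results in the same order as the input pages.
def map_pages_in_threads(pages, &block)
  threads = pages.map { |page| Thread.new { block.call(page) } }
  # Thread#value waits for the thread to finish and returns its result
  threads.map(&:value)
end

# The block here just builds a file name; real work would rasterise the page
results = map_pages_in_threads([1, 2, 3]) { |page| "page-#{page}.png" }
puts results.inspect
```

Because results are read back in the order the threads were created, page order is preserved even when later pages finish first.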

Usage as a library

Convert a PDF and get Markdown as a string

require 'pdf2markdownOCR'

markdown = Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf")
puts markdown

Convert a PDF and write directly to a file


Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf", output_file: "output.md")
# => nil  (content written to output.md)

Convert a specific page range


Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf", output_file: "output.md", pages: "1,2,5-7") # Converts pages 1, 2, 5, 6 and 7
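
The `pages` string combines single pages and inclusive ranges. As a sketch of how such a spec expands, the hypothetical `parse_pages` helper below turns it into a sorted list of page numbers. The gem's own parser is not shown in this README and may handle edge cases differently.

```ruby
# Hypothetical parser for page specs such as "1,2,5-7".
def parse_pages(spec)
  pages = spec.split(",").flat_map do |part|
    # "5-7" becomes [5, 7]; a single page such as "2" becomes [2]
    first, last = part.strip.split("-", 2).map { |n| Integer(n, 10) }
    (first..(last || first)).to_a
  end
  pages.uniq.sort
end

puts parse_pages("1,2,5-7").inspect
# prints [1, 2, 5, 6, 7]
```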

Usage as a CLI

After installation, the pdf2markdownocr executable is available on your PATH. Its options mirror those of the configuration block.

Usage: pdf2markdownocr [options] <pdf_path>

Converts a PDF file to Markdown using OCR.

Options:
  -o, --output FILE    Output Markdown file
      --llm-api-url    OpenAI-compatible server URL
      --llm-model      MODEL
      --mode           Processing mode: single_thread or multi_thread
      --png-dpi        DPI resolution for PNG conversion
      --pages          Page range
  -h, --help           Show help message

Examples

# Basic conversion (output saved to output.md)
pdf2markdownocr document.pdf

# Custom output file
pdf2markdownocr document.pdf -o result.md

# Custom LLM server and model
pdf2markdownocr document.pdf -o result.md --llm-api-url http://localhost:9800 --llm-model deepseek-ai/DeepSeek-OCR

# Print version
pdf2markdownocr --version

Running the models

Ollama setup

Easy to try, but not recommended: performance isn't great because it doesn't process requests in parallel.

Pull and run the model:

ollama pull deepseek-ocr:latest
ollama run deepseek-ocr:latest

Then call the tool with the correct port and model:

pdf2markdownocr document.pdf -o result.md --llm-api-url http://localhost:11434 --llm-model deepseek-ocr:latest

vLLM

Official vLLM Guide

Install uv, then create and activate a virtual environment for torch and vLLM:

uv venv
source .venv/bin/activate

I've had problems with my GPU when using the default vLLM install, and I find that installing torch and torchvision separately helps. See [Install PyTorch](https://pytorch.org/get-started/locally/).

uv run pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu132 # This depends on the CUDA version installed on your system

Install vLLM:

uv pip install -U vllm --torch-backend auto

Then serve the model:

uv run vllm serve deepseek-ai/DeepSeek-OCR-2 --logits_processors vllm.model_executor.models.deepseek_ocr:NGramPerReqLogitsProcessor --no-enable-prefix-caching --mm-processor-cache-gb 0

License

MIT