pdf2markdownOCR
A Ruby gem for converting PDF documents to Markdown using a locally-hosted vision LLM (OCR via AI).
Pages are rendered as high-resolution PNG images and then sent to an OpenAI-compatible API endpoint for text extraction.
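As a rough illustration of the second step, the sketch below builds the JSON body that an OpenAI-compatible `/v1/chat/completions` endpoint expects for a single page image. The helper name `build_ocr_request` and the prompt text are assumptions made for this example, and the gem's own request may differ.

```ruby
require "json"

# Builds the JSON body for a POST to an OpenAI-compatible
# /v1/chat/completions endpoint, embedding one page image as a data URL.
# The default prompt is a placeholder, not the gem's actual prompt.
def build_ocr_request(png_bytes:, model:, prompt: "Convert this page to Markdown.")
  # pack("m0") is strict Base64 (no line breaks) without extra requires.
  encoded = [png_bytes].pack("m0")
  {
    model: model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: "data:image/png;base64,#{encoded}" } }
        ]
      }
    ]
  }.to_json
end
```

Typical use would be `build_ocr_request(png_bytes: File.binread("page-1.png"), model: "deepseek-ai/DeepSeek-OCR-2")`, with the result sent as the request body.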
Requirements
- Ruby >= 3.1
- poppler-utils (pdftoppm, pdfinfo)
- OpenAI-compatible vision LLM server (e.g. vLLM, Ollama, llama.cpp)
Install poppler on Debian/Ubuntu:
sudo apt install poppler-utils
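To confirm that the poppler tools are reachable before converting anything, a small check along these lines may help. `missing_tools` is a hypothetical helper written for this README, not part of the gem.

```ruby
# Returns the names from `tools` that cannot be found as executables
# in any directory of `path` (a PATH-style string).
def missing_tools(tools, path: ENV.fetch("PATH", ""))
  dirs = path.split(File::PATH_SEPARATOR).reject(&:empty?)
  tools.reject do |tool|
    dirs.any? do |dir|
      candidate = File.join(dir, tool)
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

missing = missing_tools(%w[pdftoppm pdfinfo])
warn "Missing poppler tools: #{missing.join(', ')}" unless missing.empty?
```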
You can get DeepSeek's OCR-2 model on Hugging Face.
Installation
Add to your Gemfile:
gem 'pdf2markdownOCR'
Then run:
bundle install
Or install directly:
gem install pdf2markdownOCR
Configuration
Configuration can be set via a block or via environment variables. The block takes priority.
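The precedence rule can be pictured with the sketch below: a value set in the block wins, then the environment variable, then the built-in default. `resolve_setting` and the variable name used in the usage line are hypothetical; check the gem's source for the environment variable names it actually reads.

```ruby
# Resolves one setting: an explicit block value wins, then the environment
# variable, then the default. `env` is injectable so the lookup can be
# exercised without touching the real environment.
def resolve_setting(block_value, env_name, default, env: ENV)
  return block_value unless block_value.nil?

  env_value = env[env_name]
  (env_value.nil? || env_value.empty?) ? default : env_value
end
```

For example, `resolve_setting(nil, "PDF2MD_LLM_MODEL", "deepseek-ai/DeepSeek-OCR-2")` falls back to the default unless that (made-up) variable is set.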
Via configure block
You can configure the gem using a configuration block. These are the options and their default values:
require 'pdf2markdownOCR'
Pdf2MarkdownOCR.configure do |config|
# URL of your OpenAI-compatible LLM server
config.llm_api_url = "http://localhost:8000"
# Model name to request from the server
config.llm_model = "deepseek-ai/DeepSeek-OCR-2"
# PNG resolution used when rasterising PDF pages (higher = better OCR, slower)
config.png_dpi_resolution = 300
# Conversion mode: :single_thread or :multi_thread
# :multi_thread converts all pages to PNGs in parallel threads
config.mode = :multi_thread
# The gem uses Ruby's stdlib `Logger` writing to `$stdout`.
# You can provide your own instance.
# To silence it completely, pass Logger.new("/dev/null").
config.logger = Logger.new($stdout).tap do |log|
log.progname = "Pdf2MarkdownOCR"
end
end
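To make the `:multi_thread` and `png_dpi_resolution` options more concrete, here is a minimal sketch of rendering pages in parallel with `pdftoppm`, assuming one thread per page. The helper names are hypothetical and the gem's internals may be organised differently; a real implementation would probably cap the number of threads.

```ruby
# Builds the pdftoppm invocation that renders a single page to PNG.
# -r sets the resolution in DPI; -f and -l select the first and last page.
def pdftoppm_command(pdf_path:, page:, dpi:, out_prefix:)
  ["pdftoppm", "-png", "-r", dpi.to_s, "-f", page.to_s, "-l", page.to_s,
   pdf_path, out_prefix]
end

# Renders the given pages, one thread per page, and returns a hash of
# page number => runner result. `runner` executes a command array and
# defaults to Kernel#system (truthy on success).
def render_pages(pdf_path:, pages:, dpi: 300, out_prefix: "page",
                 runner: ->(cmd) { system(*cmd) })
  threads = pages.map do |page|
    Thread.new do
      cmd = pdftoppm_command(pdf_path: pdf_path, page: page, dpi: dpi,
                             out_prefix: out_prefix)
      [page, runner.call(cmd)]
    end
  end
  threads.map(&:value).to_h
end
```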
Usage as a library
Convert a PDF and get Markdown as a string
require 'pdf2markdownOCR'
markdown = Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf")
puts markdown
Convert a PDF and write directly to a file
Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf", output_file: "output.md")
# => nil (content written to output.md)
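Building on the call above, a whole folder could be converted with a small wrapper like this sketch. `markdown_path_for` and `convert_all` are hypothetical helpers; only `Pdf2MarkdownOCR.convert_pdf` with `pdf_path:` and `output_file:` comes from the gem.

```ruby
# Maps "dir/report.pdf" to "dir/report.md".
def markdown_path_for(pdf_path)
  pdf_path.sub(/\.pdf\z/i, "") + ".md"
end

# Converts every PDF in `pdf_paths`, writing each result next to its source,
# and returns the output paths. `converter` defaults to the gem's convert_pdf
# and is injectable so the loop can be tried without a running LLM server.
def convert_all(pdf_paths, converter: ->(**opts) { Pdf2MarkdownOCR.convert_pdf(**opts) })
  pdf_paths.map do |pdf|
    out = markdown_path_for(pdf)
    converter.call(pdf_path: pdf, output_file: out)
    out
  end
end
```

With the gem loaded, `convert_all(Dir.glob("docs/*.pdf"))` would process each file in turn.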
Convert a specific page range
Pdf2MarkdownOCR.convert_pdf(pdf_path: "document.pdf", output_file: "output.md", pages: "1,2,5-7") # converts pages 1, 2, 5, 6 and 7
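The `pages` string combines single pages and inclusive ranges. The sketch below shows one way such a specification could be expanded; `parse_pages` is a hypothetical helper for illustration, and the gem's own parser may treat edge cases differently.

```ruby
# Expands a page specification such as "1,2,5-7" into a sorted list of
# unique page numbers. Raises ArgumentError on malformed or reversed ranges.
def parse_pages(spec)
  pages = spec.split(",").flat_map do |part|
    case part.strip
    when /\A(\d+)\z/
      [$1.to_i]
    when /\A(\d+)\s*-\s*(\d+)\z/
      first = $1.to_i
      last = $2.to_i
      raise ArgumentError, "reversed range: #{part.strip}" if first > last

      (first..last).to_a
    else
      raise ArgumentError, "invalid page spec: #{part.inspect}"
    end
  end
  pages.uniq.sort
end
```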
Usage as a CLI
After installation, the pdf2markdownocr executable is available on your PATH. The options are the same as in the configuration block:
Usage: pdf2markdownocr [options] <pdf_path>
Converts a PDF file to Markdown using OCR.
Options:
    -o, --output FILE    Output Markdown file
        --llm-api-url    OpenAI compatible server URL
        --llm-model MODEL
        --mode           Processing mode: single_thread or multi_thread
        --png-dpi        DPI resolution for PNG conversion
        --pages          Page range
    -h, --help           Show help message
Examples
# Basic conversion (output saved to output.md)
pdf2markdownocr document.pdf
# Custom output file
pdf2markdownocr document.pdf -o result.md
# Custom LLM server and model
pdf2markdownocr document.pdf -o result.md --llm-api-url http://localhost:9800 --llm-model deepseek-ai/DeepSeek-OCR
# Print version
pdf2markdownocr --version
Running the models
Ollama setup
Easy to try, but not recommended: performance isn't great because it doesn't process requests in parallel.
Pull and run the model:
ollama pull deepseek-ocr:latest
ollama run deepseek-ocr:latest
Then call the tool with the correct port and model:
pdf2markdownocr document.pdf -o result.md --llm-api-url http://localhost:11434 --llm-model deepseek-ocr:latest
vLLM
- Install uv, torch and vLLM. Start by creating and activating a virtual environment:
uv venv
source .venv/bin/activate
I've had problems with my GPU when using the default vLLM install, and I find that installing torch and torchvision separately helps. See [Install PyTorch](https://pytorch.org/get-started/locally/).
uv run pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu132 # the index URL depends on the CUDA version installed on your system
Install vLLM:
uv pip install -U vllm --torch-backend auto
Then run the model:
uv run vllm serve deepseek-ai/DeepSeek-OCR-2 --logits_processors vllm.model_executor.models.deepseek_ocr:NGramPerReqLogitsProcessor --no-enable-prefix-caching --mm-processor-cache-gb 0
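Once a server is running, it can be useful to confirm that it actually serves the model you plan to request. The sketch below assumes the server exposes the OpenAI-style `GET /v1/models` route (vLLM does; other servers may vary) and uses hypothetical helper names.

```ruby
require "json"
require "net/http"
require "uri"

# Builds the URI of the model listing endpoint for a base server URL.
def models_uri(base_url)
  base = base_url.end_with?("/") ? base_url : "#{base_url}/"
  URI.join(base, "v1/models")
end

# Extracts model ids from an OpenAI-style {"data": [{"id": ...}]} response.
def model_ids(json_body)
  JSON.parse(json_body).fetch("data", []).map { |m| m["id"] }
end

# Returns true when `model` is listed by the server at `base_url`.
def model_served?(base_url, model)
  response = Net::HTTP.get_response(models_uri(base_url))
  response.is_a?(Net::HTTPSuccess) && model_ids(response.body).include?(model)
end
```

For the vLLM command above, `model_served?("http://localhost:8000", "deepseek-ai/DeepSeek-OCR-2")` would be the matching check.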
License
MIT