pdf2markdownOCR
A Ruby gem for converting PDF documents to Markdown using a locally hosted vision LLM (OCR via AI).
Pages are rendered as high-resolution PNG images and then sent to an OpenAI-compatible API endpoint for text extraction.
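As a rough illustration of the second step, the sketch below builds the kind of JSON body that OpenAI-compatible servers generally accept on their chat completions endpoint, with one page image embedded as a base64 data URI. The helper name `build_ocr_request` and the prompt text are assumptions made for this example; the gem's actual request may differ.

```ruby
require "json"

# Hypothetical helper: wraps one rendered page (raw PNG bytes) in an
# OpenAI-style chat completions payload. The prompt is a placeholder.
def build_ocr_request(png_bytes, model:, prompt: "Convert this page to Markdown.")
  # pack("m0") is core Ruby's strict base64 encoding (no line breaks).
  data_uri = "data:image/png;base64,#{[png_bytes].pack('m0')}"
  {
    model: model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: data_uri } }
        ]
      }
    ]
  }
end

# Dummy bytes stand in for a real PNG file here.
body = build_ocr_request("\x89PNG".b, model: "deepseek-ai/DeepSeek-OCR-2")
puts JSON.pretty_generate(body)
```

A body like this would typically be sent as a POST to the server's `/v1/chat/completions` path.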
Requirements
- Ruby >= 3.1
- poppler-utils (pdftoppm, pdfinfo)
- OpenAI-compatible vision LLM server (e.g. vLLM, Ollama, llama.cpp)
Install poppler on Debian/Ubuntu:
sudo apt install poppler-utils
You can get DeepSeek's OCR-2 model (deepseek-ai/DeepSeek-OCR-2) from Hugging Face.
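To show what the two poppler tools contribute, here is a minimal sketch, not the gem's own code: `pdfinfo` prints a `Pages:` line that gives the page count, and `pdftoppm` can rasterise a single page to PNG at a chosen resolution. The helper names `parse_page_count` and `pdftoppm_command` are hypothetical.

```ruby
# Extract the page count from the text printed by `pdfinfo file.pdf`.
# Returns nil when no "Pages:" line is present.
def parse_page_count(pdfinfo_output)
  match = pdfinfo_output.match(/^Pages:\s+(\d+)/)
  match && match[1].to_i
end

# Build the argument list for rasterising one page with pdftoppm.
# -png selects PNG output, -r sets the DPI, -f/-l set first and last page.
def pdftoppm_command(pdf_path, page:, dpi:, output_prefix:)
  ["pdftoppm", "-png", "-r", dpi.to_s,
   "-f", page.to_s, "-l", page.to_s,
   pdf_path, output_prefix]
end

# Usage (requires poppler-utils to be installed):
#   pages = parse_page_count(`pdfinfo document.pdf`)
#   system(*pdftoppm_command("document.pdf", page: 1, dpi: 300, output_prefix: "page"))
```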
Installation
Add to your Gemfile:
gem 'pdf2markdownOCR'
Then run:
bundle install
Or install directly:
gem install pdf2markdownOCR
Configuration
Configuration can be set via a block or via environment variables. The block takes priority.
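The precedence rule can be pictured with the sketch below. It is only an illustration of "block first, then environment, then default": the environment variable names (`PDF2MD_LLM_API_URL`, `PDF2MD_PNG_DPI`) and the helper `resolve_setting` are invented for this example and are not taken from the gem.

```ruby
# Default values as listed in the configuration block of this README.
DEFAULTS = { llm_api_url: "http://localhost:8000", png_dpi_resolution: 300 }.freeze

# Hypothetical environment variable names, for illustration only.
ENV_NAMES = {
  llm_api_url: "PDF2MD_LLM_API_URL",
  png_dpi_resolution: "PDF2MD_PNG_DPI"
}.freeze

# Resolve one setting: a value set in the block wins, then a non-empty
# environment variable, then the default. Environment values are strings.
def resolve_setting(key, block_values, env = ENV)
  return block_values[key] if block_values.key?(key)

  env_value = env[ENV_NAMES.fetch(key)]
  return env_value unless env_value.nil? || env_value.empty?

  DEFAULTS.fetch(key)
end

fake_env = { "PDF2MD_LLM_API_URL" => "http://gpu-box:9800" }
puts resolve_setting(:llm_api_url, {}, fake_env)
puts resolve_setting(:llm_api_url, { llm_api_url: "http://localhost:8000" }, fake_env)
```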
Via configure block
You can configure the gem using a configuration block. These are the options and their default values.
require 'pdf2markdownOCR'
Pdf2MarkdownOCR.configure do |config|
# URL of your OpenAI-compatible LLM server
config.llm_api_url = "http://localhost:8000"
# Model name to request from the server
config.llm_model = "deepseek-ai/DeepSeek-OCR-2"
# PNG resolution used when rasterising PDF pages (higher = better OCR, slower)
config.png_dpi_resolution = 300
# Conversion mode: :single_thread or :multi_thread
# :multi_thread converts all pages to PNGs in parallel threads
config.mode = :multi_thread
# Logger used by the gem. By default it is Ruby's stdlib `Logger` writing to `$stdout`.
# You can provide your own instance. To silence it completely, pass Logger.new("/dev/null").
config.logger = Logger.new($stdout).tap do |log|
log.progname = "Pdf2MarkdownOCR"
end
end
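The difference between the two modes can be sketched roughly as follows, assuming one thread per page in `:multi_thread` mode. The helper `render_pages` is hypothetical, and the block passed to it stands in for whatever actually rasterises a page; the gem's internals may be organised differently.

```ruby
# Run the given block once per page, either sequentially or with one
# thread per page. Results are returned in page order in both modes.
def render_pages(page_count, mode:, &render_page)
  pages = (1..page_count)
  case mode
  when :single_thread
    pages.map { |n| render_page.call(n) }
  when :multi_thread
    threads = pages.map { |n| Thread.new(n) { |i| render_page.call(i) } }
    # Thread#value joins the thread and returns the block's result.
    threads.map(&:value)
  else
    raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

paths = render_pages(3, mode: :multi_thread) { |n| format("page-%03d.png", n) }
p paths # => ["page-001.png", "page-002.png", "page-003.png"]
```

Collecting results through `Thread#value` keeps the output in page order even when threads finish at different times.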
Usage as a library
Convert a PDF and get Markdown as a string
require 'pdf2markdownOCR'
markdown = Pdf2MarkdownOCR.convert_pdf("document.pdf")
puts markdown
Convert a PDF and write directly to a file
Pdf2MarkdownOCR.convert_pdf("document.pdf", "output.md")
# => nil (content written to output.md)
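For converting several files, a small wrapper along these lines may be convenient. It is a sketch under the assumption that each Markdown file should sit next to its PDF; `markdown_path_for` and `convert_all` are hypothetical helpers, and the conversion itself is passed in as a block so the example runs without the gem loaded.

```ruby
# Derive the Markdown path from a PDF path by swapping the extension.
def markdown_path_for(pdf_path)
  pdf_path.sub(/\.pdf\z/i, "") + ".md"
end

# Call the given block with (pdf_path, markdown_path) for each PDF
# and return the list of Markdown paths.
def convert_all(pdf_paths, &convert)
  pdf_paths.map do |pdf|
    output = markdown_path_for(pdf)
    convert.call(pdf, output)
    output
  end
end

# Usage with the gem (requires pdf2markdownOCR to be installed):
#   require 'pdf2markdownOCR'
#   convert_all(Dir.glob("docs/*.pdf")) do |pdf, md|
#     Pdf2MarkdownOCR.convert_pdf(pdf, md)
#   end
```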
Usage as a CLI
After installation, the pdf2markdownocr executable is available on your PATH. Options are the same as in the configuration block.
Usage: pdf2markdownocr [options] <pdf_path>
Converts a PDF file to Markdown using OCR.
Options:
-o, --output FILE Output Markdown file
--llm-api-url OpenAI compatible server URL
--llm-model MODEL
--mode Processing mode: single_thread or multi_thread
--png-dpi DPI resolution for PNG conversion
-h, --help Show help message
Examples
# Basic conversion (output saved to output.md)
pdf2markdownocr document.pdf
# Custom output file
pdf2markdownocr document.pdf -o result.md
# Custom LLM server and model
pdf2markdownocr document.pdf -o result.md --llm-api-url http://localhost:9800 --llm-model deepseek-ai/DeepSeek-OCR
# Print version
pdf2markdownocr --version
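To make the relationship between the command-line options and the configuration keys concrete, here is a minimal sketch using Ruby's stdlib `OptionParser`. It is not the executable's actual source: the function `parse_cli_args`, the `URL` placeholder, and the mapping of `--png-dpi` to `png_dpi_resolution` are assumptions made for the example.

```ruby
require "optparse"

# Parse an argument list into [pdf_path, options], where the option keys
# mirror the names used in the configuration block.
def parse_cli_args(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: pdf2markdownocr [options] <pdf_path>"
    opts.on("-o", "--output FILE") { |v| options[:output] = v }
    opts.on("--llm-api-url URL") { |v| options[:llm_api_url] = v }
    opts.on("--llm-model MODEL") { |v| options[:llm_model] = v }
    opts.on("--mode MODE") { |v| options[:mode] = v.to_sym }
    # The Integer argument makes OptionParser coerce and validate the value.
    opts.on("--png-dpi DPI", Integer) { |v| options[:png_dpi_resolution] = v }
  end
  # parse returns the non-option arguments and leaves argv untouched.
  remaining = parser.parse(argv)
  [remaining.first, options]
end

pdf_path, options = parse_cli_args(%w[document.pdf -o result.md --png-dpi 150])
p pdf_path # => "document.pdf"
p options  # => {:output=>"result.md", :png_dpi_resolution=>150} (Ruby 3.4+ prints {output: "result.md", png_dpi_resolution: 150})
```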
License
MIT