Module: Rubino::Documents
Defined in:
lib/rubino/documents.rb,
lib/rubino/documents/html.rb,
lib/rubino/documents/table.rb,
lib/rubino/documents/registry.rb,
lib/rubino/documents/converters/csv.rb,
lib/rubino/documents/converters/pdf.rb,
lib/rubino/documents/converters/xml.rb,
lib/rubino/documents/converters/docx.rb,
lib/rubino/documents/converters/html.rb,
lib/rubino/documents/converters/json.rb,
lib/rubino/documents/converters/pptx.rb,
lib/rubino/documents/converters/xlsx.rb,
lib/rubino/documents/converters/plain.rb
Overview
In-repo document-to-Markdown conversion – a focused reimplementation of markitdown’s CORE converters in pure Ruby (issue #6). The public surface is a single entry point:
Rubino::Documents.to_markdown(path, mime: nil) -> String | nil
Architecture (mirrors markitdown): most converters extract structure via a mature MIT gem, shape it into an intermediate HTML string, and let ONE HTML->Markdown core (Documents::Html, built on kramdown, which is already a rubino dependency) emit the final Markdown. The csv and xlsx converters instead feed ONE Markdown table emitter (Documents::Table). The per-format converters are therefore thin.
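To make the shared table emitter idea concrete, here is a minimal sketch of how rows from a csv or xlsx converter could be turned into one Markdown table. The module name TableSketch and its methods are hypothetical; this is not the actual Documents::Table code, only an illustration that assumes the first row is the header.

```ruby
# Hypothetical stand-in for a shared Markdown table emitter.
module TableSketch
  module_function

  # rows: Array of Arrays. The first row is treated as the header (an assumption).
  def emit(rows)
    return "" if rows.empty?

    width = rows.map(&:length).max
    cells = rows.map do |row|
      # Pad short rows so every line has the same number of columns.
      padded = row + [nil] * (width - row.length)
      # Escape pipes and flatten newlines so cell text cannot break the table.
      padded.map { |c| c.to_s.gsub("|") { "\\|" }.gsub(/\r?\n/, " ").strip }
    end

    header, *body = cells
    lines = [format_row(header), format_row(Array.new(width, "---"))]
    lines.concat(body.map { |r| format_row(r) })
    lines.join("\n") + "\n"
  end

  def format_row(cells)
    "| " + cells.join(" | ") + " |"
  end
end

puts TableSketch.emit([["name", "qty"], ["apple", 3], ["pear"]])
```

Because every tabular format funnels through one emitter, escaping and padding rules only need to be right in one place.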
Extraction gems (roo, docx, pdf-reader, ruby_powerpoint) are OPTIONAL: each converter `require`s its gem lazily inside a begin/rescue LoadError, and a converter that can’t load its gem simply reports itself unavailable. The module MUST load and run with NONE of the optional gems installed; callers then fall back to the existing shell-extraction hint. No external process is ever spawned and there is never a hard runtime dependency. That is the whole point: the original concern was “markitdown isn’t installed”.
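The lazy-require pattern described above could look roughly like the following. LazyGem and PdfSketch are hypothetical names used only for illustration, and the convert body is simplified to plain-text extraction rather than the intermediate HTML the real converters are said to produce.

```ruby
# Hypothetical helper: reports whether an optional gem can be loaded, without raising.
module LazyGem
  @cache = {}

  def self.loadable?(feature)
    return @cache[feature] if @cache.key?(feature)

    @cache[feature] = begin
      require feature
      true
    rescue LoadError
      # Missing gem: report unavailable instead of failing at load time.
      false
    end
  end
end

# Hypothetical converter: unavailable rather than broken when its gem is missing.
class PdfSketch
  def self.available?
    LazyGem.loadable?("pdf/reader")
  end

  def self.convert(path)
    return nil unless available?

    # PDF::Reader comes from the pdf-reader gem; pages are joined as plain text here.
    PDF::Reader.new(path).pages.map(&:text).join("\n\n")
  end
end

puts "pdf conversion available: #{PdfSketch.available?}"
```

Nothing outside `loadable?` ever names the gem at load time, so the file defining the converter can always be required safely.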
Defined Under Namespace
Modules: Converters, Html, Registry, Table
Class Method Summary
- .supported?(mime: nil, path: nil) ⇒ Boolean
True when at least one converter for the (mime, path) pair is available in-process (its optional gem, if any, is loadable).
- .to_markdown(path, mime: nil) ⇒ Object
Converts the file at path to Markdown, picking the first registered converter that accepts the (mime, path) pair and whose optional gem is loadable.
Class Method Details
.supported?(mime: nil, path: nil) ⇒ Boolean
True when at least one converter for the (mime, path) pair is available in-process (its optional gem, if any, is loadable). Drives the preamble / environment / doctor advertising without attempting a conversion.
# File 'lib/rubino/documents.rb', line 46

    def supported?(mime: nil, path: nil)
      !Registry.for(mime: mime, path: path).nil?
    end
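The source of Registry.for is not shown here. As a rough sketch of the lookup that supported? relies on, the following assumes a registry that returns the first converter which both accepts the (mime, path) pair and reports itself available. RegistrySketch, its Converter struct, and the sample entries are all hypothetical.

```ruby
# Hypothetical registry: first converter that accepts the input AND is available wins.
module RegistrySketch
  Converter = Struct.new(:name, :mimes, :extensions, :available) do
    # Match on MIME type first, then fall back to the file extension.
    def accepts?(mime:, path:)
      return true if mime && mimes.include?(mime)

      path ? extensions.include?(File.extname(path).downcase) : false
    end

    def available?
      available
    end
  end

  # Sample entries; the pdf entry simulates a missing optional gem.
  CONVERTERS = [
    Converter.new(:csv, ["text/csv"], [".csv"], true),
    Converter.new(:pdf, ["application/pdf"], [".pdf"], false)
  ].freeze

  def self.for(mime: nil, path: nil)
    CONVERTERS.find { |c| c.accepts?(mime: mime, path: path) && c.available? }
  end

  def self.supported?(mime: nil, path: nil)
    !self.for(mime: mime, path: path).nil?
  end
end

puts RegistrySketch.supported?(path: "report.csv")
puts RegistrySketch.supported?(path: "report.pdf")
```

Under these assumptions the check is cheap: it inspects availability flags and never opens the file.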
.to_markdown(path, mime: nil) ⇒ Object
Converts the file at path to Markdown, picking the first registered converter that accepts the (mime, path) pair and whose optional gem is loadable. Returns the Markdown String, or nil when no converter can handle the file (unknown format, or the format’s optional gem isn’t installed, or extraction produced nothing). Never raises – a converter failure degrades to nil so the caller emits the actionable shell-hint.
# File 'lib/rubino/documents.rb', line 32

    def to_markdown(path, mime: nil)
      converter = Registry.for(mime: mime, path: path)
      return nil unless converter

      out = converter.convert(path)
      out = out.to_s
      out.strip.empty? ? nil : out
    rescue LoadError, StandardError
      nil
    end
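Since to_markdown returns nil instead of raising, a caller only needs one branch to handle every failure mode. The sketch below shows one way a caller might consume that contract. The helper name document_text and the wording of the hint are hypothetical, and the converter is injected as a callable so the example runs without rubino installed.

```ruby
# Hypothetical caller: uses the Markdown when present, otherwise emits a shell hint.
# In real use the converter would presumably be Rubino::Documents.method(:to_markdown).
def document_text(path, converter:)
  markdown = converter.call(path)
  return markdown if markdown

  # nil covers every failure: unknown format, missing gem, or empty extraction.
  "No in-process converter for #{File.basename(path)}; extract it with a shell tool instead."
end

# Stand-in converters for demonstration only.
always_nil = ->(_path) { nil }
fake_ok    = ->(_path) { "# Title\n\nBody text.\n" }

puts document_text("/tmp/report.pdf", converter: fake_ok)
puts document_text("/tmp/report.pdf", converter: always_nil)
```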