Class: Markdownator::Converters::Docx
- Defined in: lib/markdownator/converters/docx.rb
Overview
Converts a Word .docx (Office Open XML) document into Markdown.
A .docx file is a ZIP archive whose `word/document.xml` entry holds the document body. The converter maps heading styles to `#` levels, list paragraphs to bullets, and `w:tbl` elements to Markdown tables.
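The mapping described above can be illustrated without any XML parsing. The following is a minimal sketch, not the converter's actual code: `heading_prefix`, `render_table`, and the `Heading1`..`Heading6` style ids are assumptions chosen for illustration.

```ruby
# Hypothetical sketch: these helpers are illustrative, not Markdownator's API.

# Map a Word paragraph style id such as "Heading2" to a Markdown heading
# prefix. Returns nil when the style is not a heading.
def heading_prefix(style_id)
  match = /\AHeading([1-6])\z/.match(style_id.to_s)
  match && ("#" * match[1].to_i)
end

# Render rows (arrays of cell strings) as a Markdown table,
# treating the first row as the header.
def render_table(rows)
  return "" if rows.empty?

  header, *body = rows
  lines = []
  lines << "| #{header.join(' | ')} |"
  lines << "| #{header.map { '---' }.join(' | ')} |"
  body.each { |row| lines << "| #{row.join(' | ')} |" }
  lines.join("\n")
end

puts "#{heading_prefix('Heading2')} Title"
puts "- list item"
puts render_table([%w[Name Qty], %w[Apples 3]])
```

A real document may use localized or custom style names, so a lookup keyed only on `HeadingN` is likely too narrow for production use.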
Constant Summary
- W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
Instance Method Summary
Instance Method Details
#accepts?(_io, stream_info) ⇒ Boolean
# File 'lib/markdownator/converters/docx.rb', line 13

def accepts?(_io, stream_info)
  matches?(
    stream_info,
    extensions: %w[docx],
    mimetypes: %w[application/vnd.openxmlformats-officedocument.wordprocessingml.document]
  )
end
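The source does not show how `matches?` or the stream info object are defined. As a rough sketch of how such a check could behave, the example below assumes a `StreamInfo` struct with `extension` and `mimetype` fields and assumes that a match on either one is sufficient. Both the struct and the matching rule are hypothetical.

```ruby
# Hypothetical sketch: StreamInfo and matches? are stand-ins, not the
# library's own definitions.
StreamInfo = Struct.new(:extension, :mimetype, keyword_init: true)

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

# Assumed semantics: accept when either the extension or the MIME type matches.
def matches?(stream_info, extensions:, mimetypes:)
  ext = stream_info.extension.to_s.downcase.delete_prefix(".")
  mime = stream_info.mimetype.to_s.downcase
  extensions.include?(ext) || mimetypes.include?(mime)
end

def accepts?(_io, stream_info)
  matches?(stream_info, extensions: %w[docx], mimetypes: [DOCX_MIME])
end

puts accepts?(nil, StreamInfo.new(extension: ".DOCX", mimetype: nil))
puts accepts?(nil, StreamInfo.new(extension: nil, mimetype: DOCX_MIME))
puts accepts?(nil, StreamInfo.new(extension: "pdf", mimetype: "application/pdf"))
```

Under these assumptions the `_io` argument is never read, which is consistent with the underscore prefix in the documented signature.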
#convert(io, _stream_info, **_options) ⇒ Object
# File 'lib/markdownator/converters/docx.rb', line 21

def convert(io, _stream_info, **)
  Markdownator.require_optional("zip", feature: "DOCX conversion")
  Markdownator.require_optional("nokogiri", feature: "DOCX conversion")

  xml = read_entry(io, "word/document.xml")
  raise FileConversionError, "DOCX is missing word/document.xml" if xml.nil?

  doc = Nokogiri::XML(xml)
  doc.remove_namespaces!
  body = doc.at_xpath("//body")
  blocks = body.nil? ? [] : body.element_children.filter_map { |node| render_block(node) }
  Result.new(markdown: blocks.join("\n\n"))
end
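The last two lines of `convert` rely on `render_block` returning `nil` for elements that produce no output, so that `filter_map` drops them before the blocks are joined with blank lines. The sketch below shows that pattern in isolation. `Node`, `render_body`, and this version of `render_block` are hypothetical stand-ins, since the real private helper is not shown in the source.

```ruby
# Hypothetical sketch: Node stands in for a parsed XML element, and
# render_block here is a simplified substitute for the private helper.
Node = Struct.new(:name, :text)

# Return Markdown for a supported element, or nil when there is nothing to emit.
def render_block(node)
  return nil unless node.name == "p"

  text = node.text.strip
  text.empty? ? nil : text
end

# Mirror the tail of convert: skip nil blocks and join the rest with blank
# lines. A missing body yields an empty string.
def render_body(children)
  blocks = children.nil? ? [] : children.filter_map { |node| render_block(node) }
  blocks.join("\n\n")
end

children = [
  Node.new("p", "First paragraph"),
  Node.new("sectPr", ""),            # unsupported element: no output
  Node.new("p", "   "),              # whitespace-only paragraph: skipped
  Node.new("p", "Second paragraph")
]

puts render_body(children)
```

Because `filter_map` discards `nil` and `false` results, the renderer can signal "nothing to emit" without the caller needing a separate `compact` step.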