Class: Markdownator::Converters::Docx

Inherits:
Base
  • Object
Defined in:
lib/markdownator/converters/docx.rb

Overview

Converts a Word .docx (Office Open XML) document into Markdown.

A .docx file is a ZIP archive whose `word/document.xml` entry holds the document body. We map heading styles to `#` levels, list paragraphs to bullets, and `w:tbl` elements to Markdown tables.
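The three mappings above can be sketched in plain Ruby. This is a minimal illustration, not the converter's private rendering code: `paragraph_to_markdown` and `table_to_markdown` are hypothetical names, and the sketch assumes Word's default English style IDs (`Heading1` through `Heading6`) and a caller that has already pulled out each paragraph's style ID, run text, and whether it carries list numbering (`w:numPr`).

```ruby
# Hypothetical sketch of the overview's mappings; not the converter's own code.
# Assumes Word's default English style IDs ("Heading1".."Heading6").
HEADING_STYLE = /\AHeading([1-6])\z/

# Render one paragraph from its w:pStyle value, its concatenated run text,
# and whether it carries list numbering (w:numPr).
def paragraph_to_markdown(style_id, text, list_item: false)
  if (m = HEADING_STYLE.match(style_id.to_s))
    "#{'#' * m[1].to_i} #{text}"
  elsif list_item
    "- #{text}"
  else
    text
  end
end

# Render a w:tbl (already reduced to rows of cell strings) as a GFM pipe
# table. Markdown tables need a header row, so the first row is used as one.
def table_to_markdown(rows)
  return "" if rows.empty?

  width = rows.map(&:size).max
  # Pad ragged rows and escape pipes so cell text cannot split a column.
  cells = rows.map do |row|
    (row + [""] * (width - row.size)).map { |c| c.to_s.gsub("|") { "\\|" } }
  end
  lines = ["| #{cells.first.join(' | ')} |", "|#{' --- |' * width}"]
  cells.drop(1).each { |row| lines << "| #{row.join(' | ')} |" }
  lines.join("\n")
end

paragraph_to_markdown("Heading2", "Scope")                          # => "## Scope"
paragraph_to_markdown("ListParagraph", "Step one", list_item: true) # => "- Step one"
puts table_to_markdown([%w[Name Size], %w[a.docx 12KB]])
# | Name | Size |
# | --- | --- |
# | a.docx | 12KB |
```

The sketch treats every numbered paragraph as a flat bullet. Telling ordered lists from bulleted ones, or recovering nesting, would also need the `w:ilvl` level inside `w:numPr` and the numbering definitions in `word/numbering.xml`.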

Constant Summary

W_NS =
"http://schemas.openxmlformats.org/wordprocessingml/2006/main"

Instance Method Summary

  • #accepts?(_io, stream_info) ⇒ Boolean
  • #convert(io, _stream_info, **_options) ⇒ Object

Instance Method Details

#accepts?(_io, stream_info) ⇒ Boolean

Returns:

  • (Boolean): the result of `matches?` checked against the `docx` extension and the WordprocessingML MIME type.

# File 'lib/markdownator/converters/docx.rb', line 13

def accepts?(_io, stream_info)
  matches?(
    stream_info,
    extensions: %w[docx],
    mimetypes: %w[application/vnd.openxmlformats-officedocument.wordprocessingml.document]
  )
end
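`accepts?` delegates to `matches?`, whose body is not shown on this page. The sketch below is one plausible reading under stated assumptions: `StreamInfo` is a hypothetical struct with `extension` and `mimetype` fields, extensions are compared case-insensitively without a leading dot, MIME parameters such as `; charset=...` are ignored, and a match on either field is enough to accept the stream.

```ruby
# Hypothetical stand-in for the stream metadata that accepts? receives;
# the real stream_info type and matches? are not shown on this page.
StreamInfo = Struct.new(:extension, :mimetype, keyword_init: true)

# One plausible reading of matches?: accept when either the file extension
# or the declared MIME type is in the converter's allow-list.
def matches?(info, extensions: [], mimetypes: [])
  ext = info.extension.to_s.downcase.delete_prefix(".")
  # Drop MIME parameters such as "; charset=binary" before comparing.
  mime = info.mimetype.to_s.downcase.split(";").first.to_s.strip
  extensions.include?(ext) || mimetypes.include?(mime)
end

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

matches?(StreamInfo.new(extension: ".DOCX"), extensions: %w[docx], mimetypes: [DOCX_MIME])
# => true
matches?(StreamInfo.new(mimetype: "#{DOCX_MIME}; charset=binary"), extensions: %w[docx], mimetypes: [DOCX_MIME])
# => true
matches?(StreamInfo.new(extension: "doc"), extensions: %w[docx], mimetypes: [DOCX_MIME])
# => false
```

Accepting on either field lets a file with a missing or wrong extension still be routed by its MIME type; a stricter converter could require both.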

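#convert(io, _stream_info, **_options) ⇒ Object

Reads `word/document.xml` from the DOCX archive in `io`, renders each child element of `<body>` as a Markdown block, and returns a Result whose markdown joins those blocks with blank lines. Requires the optional `zip` and `nokogiri` libraries.

Raises:

  • (FileConversionError): if the archive has no `word/document.xml` entry.
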
# File 'lib/markdownator/converters/docx.rb', line 21

def convert(io, _stream_info, **_options)
  Markdownator.require_optional("zip", feature: "DOCX conversion")
  Markdownator.require_optional("nokogiri", feature: "DOCX conversion")

  xml = read_entry(io, "word/document.xml")
  raise FileConversionError, "DOCX is missing word/document.xml" if xml.nil?

  doc = Nokogiri::XML(xml)
  doc.remove_namespaces!
  body = doc.at_xpath("//body")
  blocks = body.nil? ? [] : body.element_children.filter_map { |node| render_block(node) }
  Result.new(markdown: blocks.join("\n\n"))
end
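The last two lines of `convert` rely on `filter_map`: any child of `<body>` for which `render_block` returns `nil` is dropped before the blocks are joined with blank lines. The sketch below illustrates that step with a hypothetical `Node` struct in place of Nokogiri elements and a simplified, hypothetical `render_block` (the real one is private and not shown; tables are left out here). In WordprocessingML the last child of `w:body` is usually `w:sectPr`, which holds page setup and has no Markdown equivalent, so it is a natural case for returning `nil`.

```ruby
# Hypothetical stand-in for a namespace-stripped child of <body>.
Node = Struct.new(:name, :text)

# Simplified, hypothetical renderer: returns a Markdown string, or nil to skip.
def render_block(node)
  case node.name
  when "p"
    text = node.text.to_s.strip
    text.empty? ? nil : text # assumed choice: drop empty spacer paragraphs
  else
    nil # e.g. the trailing <sectPr>; tables are left out of this sketch
  end
end

body = [
  Node.new("p", "Introduction"),
  Node.new("p", ""),
  Node.new("p", "Body text."),
  Node.new("sectPr", nil)
]

markdown = body.filter_map { |node| render_block(node) }.join("\n\n")
markdown # => "Introduction\n\nBody text."
```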