Class: Docx::Document
- Inherits:
- Object (hierarchy: Object > Docx::Document)
- Includes:
- SimpleInspect
- Defined in:
- lib/docx/document.rb
Overview
The Document class wraps around a docx file and provides methods to interface with it.
# get a Docx::Document for a docx file in the local directory
doc = Docx::Document.open("test.docx")
# get the text from the document
puts doc.text
# do the same thing in a block
Docx::Document.open("test.docx") do |d|
  puts d.text
end
Instance Attribute Summary
- #doc ⇒ Object (readonly)
  Returns the value of attribute doc.
- #footers ⇒ Object (readonly)
  Returns the value of attribute footers.
- #headers ⇒ Object (readonly)
  Returns the value of attribute headers.
- #styles ⇒ Object (readonly)
  Returns the value of attribute styles.
- #xml ⇒ Object (readonly)
  Returns the value of attribute xml.
- #zip ⇒ Object (readonly)
  Returns the value of attribute zip.
Class Method Summary
- .open(path, &block) ⇒ Object
  With no associated block, Docx::Document.open is a synonym for Docx::Document.new.
Instance Method Summary
- #bookmarks ⇒ Object
- #default_paragraph_style ⇒ Object
- #document_properties ⇒ Object
  This stores the current global document properties, for now.
- #each_paragraph ⇒ Object
  Deprecated.
- #font_size ⇒ Object
  Some documents have this set, others don’t.
- #hyperlink_relationships ⇒ Object
- #hyperlinks ⇒ Object
  Hyperlink targets are extracted from the document.xml.rels file.
- #initialize(path_or_io, options = {}) ⇒ Document (constructor)
  A new instance of Document.
- #paragraphs ⇒ Object
- #replace_entry(entry_path, file_contents) ⇒ Object
- #save(path) ⇒ Object
  Saves the document to the provided path.
- #stream ⇒ Object
  Output entire document as a StringIO object.
- #style_name_of(style_id) ⇒ Object
- #styles_configuration ⇒ Object
- #tables ⇒ Object
- #to_html ⇒ Object
  Output entire document as a String HTML fragment.
- #to_s ⇒ Object (also: #text)
  Returns the document text as a String.
- #to_xml ⇒ Object
Methods included from SimpleInspect
Constructor Details
#initialize(path_or_io, options = {}) ⇒ Document
Returns a new instance of Document.
# File 'lib/docx/document.rb', line 27
def initialize(path_or_io, options = {})
  @replace = {}

  # accept path-like objects (e.g. Pathname, File) by using their path (#101)
  path_or_io = path_or_io.to_path if path_or_io.respond_to?(:to_path)

  # if path-or_io is string && does not contain a null byte
  if (path_or_io.instance_of?(String) && !/\u0000/.match?(path_or_io))
    @zip = Zip::File.open(path_or_io)
  else
    @zip = Zip::File.open_buffer(path_or_io)
  end

  document = @zip.glob('word/document*.xml').first
  raise Errno::ENOENT if document.nil?

  @document_xml = document.get_input_stream.read
  @doc = Nokogiri::XML(@document_xml)
  load_styles
  load_rels
  load_headers
  yield(self) if block_given?
ensure
  @zip.close unless @zip.nil?
end
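The constructor's choice between Zip::File.open and Zip::File.open_buffer can be illustrated in isolation. The following is a minimal sketch, not part of the gem: source_kind is a hypothetical helper that only repeats the argument check from the listing above and reports which branch would be taken.

```ruby
require 'pathname'
require 'stringio'

# Hypothetical helper (not part of the docx gem): reports which branch of
# the constructor's check a given argument would take.
def source_kind(path_or_io)
  # Path-like objects (Pathname, File) are reduced to their path string first.
  path_or_io = path_or_io.to_path if path_or_io.respond_to?(:to_path)

  # A String without a null byte is treated as a filesystem path;
  # anything else (String with a null byte, StringIO, ...) as a buffer.
  if path_or_io.instance_of?(String) && !/\u0000/.match?(path_or_io)
    :path
  else
    :buffer
  end
end

puts source_kind('test.docx')                # path
puts source_kind(Pathname.new('test.docx'))  # path
puts source_kind(StringIO.new('PK'))         # buffer
puts source_kind("PK\u0000rest")             # buffer
```

Under this check, a String holding the raw bytes of a docx file is expected to contain a null byte and so falls through to the buffer branch, while ordinary path strings do not.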
Instance Attribute Details
#doc ⇒ Object (readonly)
Returns the value of attribute doc.
# File 'lib/docx/document.rb', line 25
def doc
  @doc
end
#footers ⇒ Object (readonly)
Returns the value of attribute footers.
# File 'lib/docx/document.rb', line 25
def footers
  @footers
end
#headers ⇒ Object (readonly)
Returns the value of attribute headers.
# File 'lib/docx/document.rb', line 25
def headers
  @headers
end
#styles ⇒ Object (readonly)
Returns the value of attribute styles.
# File 'lib/docx/document.rb', line 25
def styles
  @styles
end
#xml ⇒ Object (readonly)
Returns the value of attribute xml.
# File 'lib/docx/document.rb', line 25
def xml
  @xml
end
#zip ⇒ Object (readonly)
Returns the value of attribute zip.
# File 'lib/docx/document.rb', line 25
def zip
  @zip
end
Class Method Details
.open(path, &block) ⇒ Object
With no associated block, Docx::Document.open is a synonym for Docx::Document.new. If the optional code block is given, it is passed the opened docx file as an argument, and the Docx::Document object is automatically closed when the block terminates. Because the method delegates to new, the call evaluates to the Docx::Document instance rather than to the value of the block. call-seq:
open(filepath) => file
open(filepath) {|file| block } => obj
# File 'lib/docx/document.rb', line 66
def self.open(path, &block)
  new(path, &block)
end
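A point worth checking when relying on the block form: in Ruby, Class#new returns the new instance whatever the block passed to initialize evaluates to. The sketch below uses FakeDocument, a hypothetical stand-in that copies only the open/new delegation and the ensure-based cleanup from the listings above; it does no real docx handling.

```ruby
# Hypothetical stand-in for Docx::Document; it only mimics the open/new
# delegation and the ensure-based cleanup, not any real docx handling.
class FakeDocument
  attr_reader :closed

  def self.open(path, &block)
    new(path, &block)
  end

  def initialize(path)
    @path = path
    @closed = false
    yield(self) if block_given?
  ensure
    @closed = true # stands in for @zip.close
  end
end

block_value = nil
result = FakeDocument.open('test.docx') { |_d| block_value = :from_block }

puts result.class   # FakeDocument
puts result.closed  # true
puts block_value    # from_block
```

To keep a value computed inside the block, assign it to a variable defined outside the block, as block_value does here.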
Instance Method Details
#bookmarks ⇒ Object
# File 'lib/docx/document.rb', line 74
def bookmarks
  bkmrks_hsh = {}
  bkmrks_ary = @doc.xpath('//w:bookmarkStart').map { |b_node| parse_bookmark_from b_node }
  # also scan headers and footers so their bookmarks can be read and edited
  bkmrks_ary += headers.values.flat_map { |h| h.xpath('//w:bookmarkStart').map { |b_node| parse_bookmark_from b_node } }
  bkmrks_ary += footers.values.flat_map { |f| f.xpath('//w:bookmarkStart').map { |b_node| parse_bookmark_from b_node } }
  # auto-generated by office 2010
  bkmrks_ary.reject! { |b| b.name == '_GoBack' }
  bkmrks_ary.each { |b| bkmrks_hsh[b.name] = b }
  bkmrks_hsh
end
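The indexing step at the end of bookmarks can be sketched without any XML. In this illustration Bookmark is a hypothetical Struct standing in for the gem's bookmark objects, and index_bookmarks is an invented helper that repeats only the combine, reject and index-by-name steps.

```ruby
# Hypothetical stand-in for the gem's bookmark objects.
Bookmark = Struct.new(:name, :location)

# Invented helper: combines bookmarks from the body, headers and footers,
# drops the auto-generated '_GoBack' entry, and indexes the rest by name.
def index_bookmarks(body, headers, footers)
  all = body + headers + footers
  all.reject! { |b| b.name == '_GoBack' }
  all.each_with_object({}) { |b, hash| hash[b.name] = b }
end

index = index_bookmarks(
  [Bookmark.new('intro', :body), Bookmark.new('_GoBack', :body)],
  [Bookmark.new('logo', :header)],
  [Bookmark.new('intro', :footer)]
)

p index.keys               # ["intro", "logo"]
p index['intro'].location  # :footer
```

Since the result is a Hash keyed by name, a bookmark name that occurs more than once keeps only the entry processed last, which in this sketch is the footer one.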
#default_paragraph_style ⇒ Object
# File 'lib/docx/document.rb', line 190
def default_paragraph_style
  @styles&.at_xpath("w:styles/w:style[@w:type='paragraph' and @w:default='1']/w:name/@w:val")&.value
end
#document_properties ⇒ Object
This stores the current global document properties, for now.
# File 'lib/docx/document.rb', line 55
def document_properties
  {
    font_size: font_size,
    hyperlinks: hyperlinks
  }
end
#each_paragraph ⇒ Object
Deprecated
Iterates over the paragraphs within the document. call-seq:
each_paragraph => Enumerator
# File 'lib/docx/document.rb', line 125
def each_paragraph
  paragraphs.each { |p| yield(p) }
end
#font_size ⇒ Object
Some documents have this set, others don’t; when it is not set, nil is returned. The value is stored in half-points, so it is divided by 2 to obtain points.
# File 'lib/docx/document.rb', line 96
def font_size
  size_value = @styles&.at_xpath('//w:docDefaults//w:rPrDefault//w:rPr//w:sz/@w:val')&.value

  return nil unless size_value

  size_value.to_i / 2
end
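The half-point conversion can be tried on its own. This is a minimal sketch with a hypothetical helper, half_points_to_points, that repeats only the arithmetic from the method above; the input strings are made-up attribute values.

```ruby
# Hypothetical helper repeating the arithmetic of font_size: the attribute
# value arrives as a String of half-points, or nil when it is not set.
def half_points_to_points(size_value)
  return nil unless size_value

  size_value.to_i / 2 # Integer division: any remainder is discarded
end

p half_points_to_points('24')  # 12
p half_points_to_points('21')  # 10
p half_points_to_points(nil)   # nil
```

Because both operands are Integers, an odd half-point value such as 21 (10.5 pt) comes back as 10 rather than 10.5.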
#hyperlink_relationships ⇒ Object
# File 'lib/docx/document.rb', line 115
def hyperlink_relationships
  @rels.xpath("//xmlns:Relationship[contains(@Type,'hyperlink')]")
end
#hyperlinks ⇒ Object
Hyperlink targets are extracted from the document.xml.rels file.
# File 'lib/docx/document.rb', line 105
def hyperlinks
  hyperlink_relationships.each_with_object({}) do |rel, hash|
    id = rel.attributes['Id']
    target = rel.attributes['Target']
    next unless id && target

    hash[id.value] = target.value
  end
end
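The shape of the returned Hash (relationship Id mapped to Target) can be shown with plain objects. In this sketch Attr and Rel are hypothetical Structs that imitate just enough of an XML node for the loop to run, and the Ids and URL are invented sample data.

```ruby
# Hypothetical stand-ins for XML nodes and their attributes.
Attr = Struct.new(:value)
Rel  = Struct.new(:attributes)

# Invented helper repeating the loop from hyperlinks on arbitrary input.
def hyperlink_map(rels)
  rels.each_with_object({}) do |rel, hash|
    id = rel.attributes['Id']
    target = rel.attributes['Target']
    next unless id && target # skip relationships missing either attribute

    hash[id.value] = target.value
  end
end

rels = [
  Rel.new({ 'Id' => Attr.new('rId5'), 'Target' => Attr.new('https://example.com/') }),
  Rel.new({ 'Id' => Attr.new('rId6') }) # no Target, so it is skipped
]

p hyperlink_map(rels)  # {"rId5"=>"https://example.com/"}
```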
#paragraphs ⇒ Object
# File 'lib/docx/document.rb', line 70
def paragraphs
  @doc.xpath('//w:document//w:body/w:p').map { |p_node| parse_paragraph_from p_node }
end
#replace_entry(entry_path, file_contents) ⇒ Object
# File 'lib/docx/document.rb', line 186
def replace_entry(entry_path, file_contents)
  @replace[entry_path] = file_contents
end
#save(path) ⇒ Object
Saves the document to the provided path. call-seq:
save(filepath) => void
# File 'lib/docx/document.rb', line 143
def save(path)
  with_zip64_disabled do
    update
    Zip::OutputStream.open(path) do |out|
      zip.each do |entry|
        next unless entry.file?

        out.put_next_entry(entry.name)
        value = @replace[entry.name] || zip.read(entry.name)

        out.write(value)
      end
    end

    zip.close
  end
end
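How save combines the original archive with the contents registered through replace_entry can be sketched with plain Hashes. This is an illustration under assumptions: merged_entries is a hypothetical helper, the entry names and contents are invented, and the effect of the update call (whose body is not shown here) is ignored.

```ruby
# Hypothetical helper: `original` maps entry names to contents (standing in
# for the zip), `replacements` stands in for the @replace Hash.
def merged_entries(original, replacements)
  original.each_with_object({}) do |(name, contents), out|
    # A registered replacement wins; otherwise the original entry is kept.
    out[name] = replacements[name] || contents
  end
end

original = {
  'word/document.xml' => 'old body',
  'word/styles.xml'   => 'styles'
}
replacements = {
  'word/document.xml' => 'new body',
  'word/extra.xml'    => 'not in the original archive'
}

p merged_entries(original, replacements)
# {"word/document.xml"=>"new body", "word/styles.xml"=>"styles"}
```

As in the loop above, only names already present in the original are visited, so in this sketch a replacement registered under a new entry path does not appear in the output.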
#stream ⇒ Object
Output entire document as a StringIO object.
# File 'lib/docx/document.rb', line 162
def stream
  with_zip64_disabled do
    update
    stream = Zip::OutputStream.write_buffer do |out|
      zip.each do |entry|
        next unless entry.file?

        out.put_next_entry(entry.name)

        if @replace[entry.name]
          out.write(@replace[entry.name])
        else
          out.write(zip.read(entry.name))
        end
      end
    end

    stream.rewind
    stream
  end
end
#style_name_of(style_id) ⇒ Object
# File 'lib/docx/document.rb', line 194
def style_name_of(style_id)
  styles_configuration.style_of(style_id).name
end
#styles_configuration ⇒ Object
# File 'lib/docx/document.rb', line 198
def styles_configuration
  @styles_configuration ||= Elements::Containers::StylesConfiguration.new(@styles.dup)
end
#tables ⇒ Object
# File 'lib/docx/document.rb', line 90
def tables
  @doc.xpath('//w:document//w:body//w:tbl').map { |t_node| parse_table_from t_node }
end
#to_html ⇒ Object
Output entire document as a String HTML fragment.
# File 'lib/docx/document.rb', line 136
def to_html
  paragraphs.map(&:to_html).join("\n")
end
#to_s ⇒ Object Also known as: text
call-seq:
to_s -> string
# File 'lib/docx/document.rb', line 131
def to_s
  paragraphs.map(&:to_s).join("\n")
end
#to_xml ⇒ Object
# File 'lib/docx/document.rb', line 86
def to_xml
  Nokogiri::XML(@document_xml)
end