Class: Docx::Document
- Inherits: Object
  - Object
  - Docx::Document
- Includes: SimpleInspect
- Defined in: lib/docx/document.rb
Overview
The Document class wraps around a docx file and provides methods to interface with it.
# get a Docx::Document for a docx file in the local directory
doc = Docx::Document.open("test.docx")
# get the text from the document
puts doc.text
# do the same thing in a block
Docx::Document.open("test.docx") do |d|
  puts d.text
end
Instance Attribute Summary
- #doc ⇒ Object (readonly)
  Returns the value of attribute doc.
- #footers ⇒ Object (readonly)
  Returns the value of attribute footers.
- #headers ⇒ Object (readonly)
  Returns the value of attribute headers.
- #styles ⇒ Object (readonly)
  Returns the value of attribute styles.
- #xml ⇒ Object (readonly)
  Returns the value of attribute xml.
- #zip ⇒ Object (readonly)
  Returns the value of attribute zip.
Class Method Summary
- .open(path, &block) ⇒ Object
  With no associated block, Docx::Document.open is a synonym for Docx::Document.new.
Instance Method Summary
- #bookmarks ⇒ Object
- #default_paragraph_style ⇒ Object
- #document_properties ⇒ Object
  This stores the current global document properties, for now.
- #each_paragraph ⇒ Object
  Deprecated.
- #font_size ⇒ Object
  Some documents have this set, others don’t.
- #hyperlink_relationships ⇒ Object
- #hyperlinks ⇒ Object
  Hyperlink targets are extracted from the document.xml.rels file.
- #initialize(path_or_io, options = {}) ⇒ Document (constructor)
  A new instance of Document.
- #paragraphs ⇒ Object
- #replace_entry(entry_path, file_contents) ⇒ Object
- #save(path) ⇒ Object
  Saves the document to the provided path. call-seq: save(filepath) => void.
- #stream ⇒ Object
  Output entire document as a StringIO object.
- #style_name_of(style_id) ⇒ Object
- #styles_configuration ⇒ Object
- #tables ⇒ Object
- #to_html ⇒ Object
  Output entire document as a String HTML fragment.
- #to_s ⇒ Object (also: #text)
  call-seq: to_s -> string.
- #to_xml ⇒ Object
Methods included from SimpleInspect
Constructor Details
#initialize(path_or_io, options = {}) ⇒ Document
Returns a new instance of Document.
# File 'lib/docx/document.rb', line 27
def initialize(path_or_io, options = {})
  @replace = {}

  # if path_or_io is a String and does not contain a null byte
  if (path_or_io.instance_of?(String) && !/\u0000/.match?(path_or_io))
    @zip = Zip::File.open(path_or_io)
  else
    @zip = Zip::File.open_buffer(path_or_io)
  end

  document = @zip.glob('word/document*.xml').first
  raise Errno::ENOENT if document.nil?

  @document_xml = document.get_input_stream.read
  @doc = Nokogiri::XML(@document_xml)
  load_styles
  load_headers
  yield(self) if block_given?
ensure
  @zip.close unless @zip.nil?
end
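The branch at the top of the constructor decides whether the argument is opened as a file path or as an in-memory buffer. A minimal sketch of that decision, without rubyzip, might look like the following. The helper name `source_kind` and its return symbols are hypothetical and not part of the gem; only the condition is taken from the listing above.

```ruby
require 'stringio'

# Hypothetical helper (not part of the docx gem): mirrors the condition in
# Docx::Document#initialize that decides how the input is opened.
def source_kind(path_or_io)
  # A String without a null byte is treated as a filesystem path.
  # Anything else (an IO object, or a String of raw zip bytes) is a buffer.
  if path_or_io.instance_of?(String) && !/\u0000/.match?(path_or_io)
    :path
  else
    :buffer
  end
end

p source_kind('test.docx')                  # => :path
p source_kind(StringIO.new('PK'))           # => :buffer
p source_kind("PK\u0003\u0004\u0000\u0000") # => :buffer
```

The null-byte test is a heuristic: raw zip data passed as a String almost always contains a null byte, whereas a usable file path cannot.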
Instance Attribute Details
#doc ⇒ Object (readonly)
Returns the value of attribute doc.
# File 'lib/docx/document.rb', line 25
def doc
  @doc
end
#footers ⇒ Object (readonly)
Returns the value of attribute footers.
# File 'lib/docx/document.rb', line 25
def footers
  @footers
end
#headers ⇒ Object (readonly)
Returns the value of attribute headers.
# File 'lib/docx/document.rb', line 25
def headers
  @headers
end
#styles ⇒ Object (readonly)
Returns the value of attribute styles.
# File 'lib/docx/document.rb', line 25
def styles
  @styles
end
#xml ⇒ Object (readonly)
Returns the value of attribute xml.
# File 'lib/docx/document.rb', line 25
def xml
  @xml
end
#zip ⇒ Object (readonly)
Returns the value of attribute zip.
# File 'lib/docx/document.rb', line 25
def zip
  @zip
end
Class Method Details
.open(path, &block) ⇒ Object
With no associated block, Docx::Document.open is a synonym for Docx::Document.new. If the optional code block is given, it will be passed the opened docx file as an argument, and the Docx::Document object will automatically be closed when the block terminates. The value of the block will be returned from Docx::Document.open.
call-seq:
open(filepath) => file
open(filepath) {|file| block } => obj
# File 'lib/docx/document.rb', line 62
def self.open(path, &block)
  new(path, &block)
end
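The interplay between `.open`, the block, and the `ensure` clause in the constructor can be sketched with a stand-in class. `FakeDocument` and its `closed` flag are hypothetical and only imitate the shape of the listings on this page; no zip file is involved.

```ruby
# Stand-in class (hypothetical, not the gem's code) with the same shape:
# .open forwards to .new, and #initialize yields itself, then closes in ensure.
class FakeDocument
  attr_reader :closed

  def self.open(path, &block)
    new(path, &block)
  end

  def initialize(path)
    @path = path
    @closed = false
    yield(self) if block_given?
  ensure
    @closed = true # stands in for @zip.close
  end
end

seen_inside = nil
doc = FakeDocument.open('test.docx') { |d| seen_inside = d.closed }

p seen_inside # => false (the resource is still open inside the block)
p doc.closed  # => true  (it was closed once the block finished)
```

In this sketch `new` hands back the freshly built object, so `doc` is the `FakeDocument` itself. As in the constructor listing, the `ensure` clause runs whether or not a block is given.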
Instance Method Details
#bookmarks ⇒ Object
# File 'lib/docx/document.rb', line 70
def bookmarks
  bkmrks_hsh = {}
  bkmrks_ary = @doc.xpath('//w:bookmarkStart').map { |b_node| parse_bookmark_from b_node }
  # also scan headers and footers so their bookmarks can be read and edited
  bkmrks_ary += headers.values.flat_map { |h| h.xpath('//w:bookmarkStart').map { |b_node| parse_bookmark_from b_node } }
  bkmrks_ary += footers.values.flat_map { |f| f.xpath('//w:bookmarkStart').map { |b_node| parse_bookmark_from b_node } }
  # auto-generated by office 2010
  bkmrks_ary.reject! { |b| b.name == '_GoBack' }
  bkmrks_ary.each { |b| bkmrks_hsh[b.name] = b }
  bkmrks_hsh
end
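The filtering and indexing step at the end of `#bookmarks` can be illustrated on its own. In the sketch below, `Bookmark` is a stand-in Struct rather than the gem's bookmark element class, and `index_bookmarks` is a hypothetical helper name.

```ruby
# Bookmark is a stand-in Struct, not the gem's bookmark element class.
Bookmark = Struct.new(:name)

# Hypothetical helper: drops the '_GoBack' bookmark and indexes the rest by name.
def index_bookmarks(bookmarks)
  bookmarks
    .reject { |b| b.name == '_GoBack' } # auto-generated by Office 2010
    .each_with_object({}) { |b, hash| hash[b.name] = b }
end

found = [Bookmark.new('intro'), Bookmark.new('_GoBack'), Bookmark.new('summary')]
index = index_bookmarks(found)

p index.keys # => ["intro", "summary"]
```

Because the result is a Hash keyed by name, a later bookmark with the same name as an earlier one replaces it.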
#default_paragraph_style ⇒ Object
# File 'lib/docx/document.rb', line 182
def default_paragraph_style
  @styles.at_xpath("w:styles/w:style[@w:type='paragraph' and @w:default='1']/w:name/@w:val").value
end
#document_properties ⇒ Object
This stores the current global document properties, for now.
# File 'lib/docx/document.rb', line 51
def document_properties
  {
    font_size: font_size,
    hyperlinks: hyperlinks
  }
end
#each_paragraph ⇒ Object
Deprecated.
Iterates over the paragraphs within the document.
call-seq:
each_paragraph => Enumerator
# File 'lib/docx/document.rb', line 117
def each_paragraph
  paragraphs.each { |p| yield(p) }
end
#font_size ⇒ Object
Some documents have this set, others don’t. The value is stored in half-points, so it is divided by 2 to get points.
# File 'lib/docx/document.rb', line 92
def font_size
  size_value = @styles&.at_xpath('//w:docDefaults//w:rPrDefault//w:rPr//w:sz/@w:val')&.value

  return nil unless size_value

  size_value.to_i / 2
end
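The half-point conversion is plain integer arithmetic, which has one consequence worth seeing in isolation. The helper name `half_points_to_points` below is hypothetical; its body follows the last two statements of `#font_size`.

```ruby
# Hypothetical helper mirroring the arithmetic in #font_size.
# The w:sz attribute value is a String holding a size in half-points.
def half_points_to_points(size_value)
  return nil unless size_value

  size_value.to_i / 2 # Integer division: an odd half-point value rounds down
end

p half_points_to_points('24') # => 12
p half_points_to_points('21') # => 10
p half_points_to_points(nil)  # => nil
```

A document default of 10.5 pt (21 half-points) is therefore reported as 10, not 10.5.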
#hyperlink_relationships ⇒ Object
# File 'lib/docx/document.rb', line 107
def hyperlink_relationships
  @rels.xpath("//xmlns:Relationship[contains(@Type,'hyperlink')]")
end
#hyperlinks ⇒ Object
Hyperlink targets are extracted from the document.xml.rels file.
# File 'lib/docx/document.rb', line 101
def hyperlinks
  hyperlink_relationships.each_with_object({}) do |rel, hash|
    hash[rel.attributes['Id'].value] = rel.attributes['Target'].value
  end
end
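Taken together, `#hyperlink_relationships` and `#hyperlinks` select the relationships whose type mentions "hyperlink" and map each relationship id to its target. A rough sketch of that mapping, using a stand-in Struct instead of Nokogiri nodes, could look like this. `Relationship` and `hyperlink_targets` are hypothetical names introduced for illustration.

```ruby
# Relationship is a stand-in for the parsed <Relationship> nodes.
Relationship = Struct.new(:id, :type, :target)

# Hypothetical helper: keeps hyperlink relationships and maps id => target.
def hyperlink_targets(relationships)
  relationships
    .select { |rel| rel.type.include?('hyperlink') } # like contains(@Type,'hyperlink')
    .each_with_object({}) { |rel, hash| hash[rel.id] = rel.target }
end

base = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
rels = [
  Relationship.new('rId1', "#{base}/styles", 'styles.xml'),
  Relationship.new('rId2', "#{base}/hyperlink", 'https://example.com')
]

targets = hyperlink_targets(rels)
puts targets['rId2'] # prints https://example.com
p targets.key?('rId1') # => false
```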
#paragraphs ⇒ Object
# File 'lib/docx/document.rb', line 66
def paragraphs
  @doc.xpath('//w:document//w:body/w:p').map { |p_node| parse_paragraph_from p_node }
end
#replace_entry(entry_path, file_contents) ⇒ Object
# File 'lib/docx/document.rb', line 178
def replace_entry(entry_path, file_contents)
  @replace[entry_path] = file_contents
end
#save(path) ⇒ Object
Saves the document to the provided path.
call-seq:
save(filepath) => void
# File 'lib/docx/document.rb', line 135
def save(path)
  with_zip64_disabled do
    update
    Zip::OutputStream.open(path) do |out|
      zip.each do |entry|
        next unless entry.file?

        out.put_next_entry(entry.name)
        value = @replace[entry.name] || zip.read(entry.name)

        out.write(value)
      end
    end
    zip.close
  end
end
#stream ⇒ Object
Output entire document as a StringIO object.
# File 'lib/docx/document.rb', line 154
def stream
  with_zip64_disabled do
    update
    stream = Zip::OutputStream.write_buffer do |out|
      zip.each do |entry|
        next unless entry.file?

        out.put_next_entry(entry.name)

        if @replace[entry.name]
          out.write(@replace[entry.name])
        else
          out.write(zip.read(entry.name))
        end
      end
    end

    stream.rewind
    stream
  end
end
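Both `#save` and `#stream` walk the existing archive entries and write either a pending replacement (registered through `#replace_entry`) or the original contents. The sketch below models that rule with plain Hashes standing in for the zip archive and for `@replace`. The helper name `merged_entries` is hypothetical.

```ruby
# Plain Hashes stand in for the zip archive and for @replace.
# Hypothetical helper: returns what would be written for each entry.
def merged_entries(archive, replacements)
  archive.each_with_object({}) do |(name, contents), out|
    # A pending replacement wins over the contents already in the archive.
    out[name] = replacements[name] || contents
  end
end

archive = {
  'word/document.xml' => '<old/>',
  'word/styles.xml'   => '<styles/>'
}
replacements = {
  'word/document.xml' => '<new/>',
  'word/extra.xml'    => '<extra/>' # no matching entry in the archive
}

result = merged_entries(archive, replacements)
puts result['word/document.xml']  # prints <new/>
puts result['word/styles.xml']    # prints <styles/>
p result.key?('word/extra.xml')   # => false
```

As the listings read, only entries already present in the archive are iterated, so in this model a replacement registered for a path that does not exist in the archive is never written.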
#style_name_of(style_id) ⇒ Object
# File 'lib/docx/document.rb', line 186
def style_name_of(style_id)
  styles_configuration.style_of(style_id).name
end
#styles_configuration ⇒ Object
# File 'lib/docx/document.rb', line 190
def styles_configuration
  @styles_configuration ||= Elements::Containers::StylesConfiguration.new(@styles.dup)
end
#tables ⇒ Object
# File 'lib/docx/document.rb', line 86
def tables
  @doc.xpath('//w:document//w:body//w:tbl').map { |t_node| parse_table_from t_node }
end
#to_html ⇒ Object
Output entire document as a String HTML fragment.
# File 'lib/docx/document.rb', line 128
def to_html
  paragraphs.map(&:to_html).join("\n")
end
#to_s ⇒ Object Also known as: text
call-seq:
to_s -> string
# File 'lib/docx/document.rb', line 123
def to_s
  paragraphs.map(&:to_s).join("\n")
end
#to_xml ⇒ Object
# File 'lib/docx/document.rb', line 82
def to_xml
  Nokogiri::XML(@document_xml)
end