rbpptx
Read-and-edit Ruby gem for .pptx (PowerPoint, OOXML) files, modeled after
python-pptx.
rbpptx is built around a single design promise: what we don't understand,
we don't touch. The presentation is opened as a ZIP package, only the
parts you mutate are re-serialized, and every other entry — slide layouts,
masters, themes, embedded fonts, images, animations, hyperlink colors,
PowerPoint-extension XML — round-trips byte-for-byte. Inside a slide that
you do edit, only the <a:p> children of the targeted <p:txBody> are
rewritten; surrounding <p:nvSpPr>, <p:spPr>, <a:bodyPr>, <a:lstStyle>,
and any unknown OOXML extensions remain in place. The first existing
<a:rPr> (font, size, color, language) is cloned onto each new run, so
template formatting carries through to the replacement text.
Supported:
- open an existing .pptx, enumerate slides in presentation order
- read every shape's name, concatenated text, and placeholder type (:ctrTitle, :subTitle, :body, ...)
- walk into paragraphs and runs (shape.paragraphs, shape.runs, paragraph.runs)
- bulldozer setter: shape.text = "..." replaces the entire text body, cloning the first existing <a:rPr> onto each new run so template formatting carries through
- surgical setter: run.text = "..." rewrites only the run's <a:t> content, preserving <a:rPr>, soft line breaks, sibling runs, paragraph properties, and unknown OOXML extensions
- save back to a new path, or overwrite the original via temp-file + atomic rename; unmodified entries stream through untouched
Out of scope:
- creating a presentation from scratch: Rbpptx.new is reserved for a future release and currently raises NotImplementedError
- inserting / deleting / reordering / duplicating slides
- editing media (images, embedded fonts) or adding new shapes
- placeholder-aware API (slide.title, slide.placeholders[:body])
- notes pages, comments, charts, SmartArt
- legacy .ppt (BIFF/CFB) input: rbpptx reads OOXML .pptx only. Convert first, e.g. libreoffice --headless --convert-to pptx file.ppt. Rbpptx.open detects the OLE compound-file magic and raises Rbpptx::UnsupportedFormatError with the conversion hint rather than surfacing an opaque ZIP parse error from rubyzip.
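
The format check described in the last item can be pictured as a comparison of the file's leading bytes against well-known signatures. The sketch below is illustrative only and is not taken from rbpptx's source: the names sniff_container and sniff_file are hypothetical, and it only recognizes the OLE compound-file header and the ZIP local-file-header signature.

```ruby
# Well-known container signatures, held as binary (ASCII-8BIT) strings.
OLE_CFB_MAGIC   = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b # legacy .ppt/.doc/.xls
ZIP_LOCAL_MAGIC = "PK\x03\x04".b                       # first local file header

# Classify leading bytes. Returns :empty, :ole_cfb, :zip, or :unknown.
# (Hypothetical helper, not part of the rbpptx API.)
def sniff_container(head)
  head = head.to_s.b # tolerate nil, which a length-limited read can return at EOF
  return :empty   if head.empty?
  return :ole_cfb if head.start_with?(OLE_CFB_MAGIC)
  return :zip     if head.start_with?(ZIP_LOCAL_MAGIC)

  :unknown
end

# Read only the first 8 bytes of a file and classify them.
def sniff_file(path)
  sniff_container(File.binread(path, 8))
end
```

A caller could map :empty, :ole_cfb, and :unknown to an unsupported-format error before handing the file to a ZIP reader, which is the behaviour the list item describes.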
Install
gem install rbpptx
Or in a Gemfile:
gem "rbpptx"
Requires Ruby 3.2+, nokogiri, and rubyzip.
Usage
Reading
require "rbpptx"

Rbpptx.open("deck.pptx") do |pres|
  pres.slides.each do |slide|
    puts slide.name # => "slide1", "slide2", ...
    slide.shapes.each do |shape|
      puts "  #{shape.name}: #{shape.text.inspect}"
    end
  end
end
Rbpptx.open yields a Rbpptx::Presentation and auto-closes the underlying
ZIP when the block returns (or raises). Without a block it returns the
presentation; call pres.close when done.
Slides are enumerated in the order they appear inside the presentation —
i.e. the order of <p:sldId> entries in ppt/presentation.xml, which is
not necessarily the slide-part filename order.
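
To make the ordering rule concrete, here is a minimal sketch that resolves slide order from the two XML parts involved. It is not rbpptx's implementation (the gem uses nokogiri); it uses REXML so it runs without extra gems, and the method name slide_targets_in_order is hypothetical. It assumes the relationships part maps each r:id to a slide target, as in a typical package.

```ruby
require "rexml/document"

# Return slide part targets in presentation order: follow the <p:sldId>
# entries in presentation.xml and resolve each r:id through the rels part.
# (Hypothetical helper for illustration.)
def slide_targets_in_order(presentation_xml, rels_xml)
  # Build a map of relationship Id => Target from the rels part.
  targets = {}
  REXML::Document.new(rels_xml).root.each_element do |rel|
    next unless rel.name == "Relationship"

    targets[rel.attributes["Id"]] = rel.attributes["Target"]
  end

  # Walk <p:sldIdLst> in document order; that order is the slide order.
  ordered = []
  REXML::Document.new(presentation_xml).root.each_element do |child|
    next unless child.name == "sldIdLst"

    child.each_element do |sld_id|
      next unless sld_id.name == "sldId"

      ordered << targets.fetch(sld_id.attributes["r:id"])
    end
  end
  ordered
end
```

If a deck lists rId3 before rId2 in its slide-id list, the part behind rId3 comes first even when its filename sorts later.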
Editing text and saving
The bulldozer setter replaces a shape's entire text body. Lines (separated
by "\n") become individual <a:p> paragraphs:
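require "rbpptx"

Rbpptx.open("template.pptx") do |pres|
  # Replace the first shape's text ("差し替えタイトル" means "Replacement title").
  pres.slides.first.shapes.first.text = "差し替えタイトル"
  # On the second slide, replace the shape whose text contains "会社名" ("Company name").
  pres.slides[1].shapes.find { |s| s.text.include?("会社名") }.text = "Acme Inc."
  pres.save("filled.pptx")
end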
The surgical setter rewrites only the targeted run's <a:t>, preserving
all other formatting and structural XML. Use it for {{slot}}-style
template fills where a marker run sits inside a multi-run paragraph:
Rbpptx.open("template.pptx") do |pres|
  pres.slides.each do |slide|
    slide.shapes.flat_map(&:runs).each do |run|
      run.text = run.text.gsub("{{name}}", "Acme")
    end
  end
  pres.save("filled.pptx")
end
Or target by placeholder role:
Rbpptx.open("template.pptx") do |pres|
  first_slide = pres.slides.first
  first_slide.shapes.find { |s| s.placeholder_type == :ctrTitle }.text = "Q2 Review"
  first_slide.shapes.find { |s| s.placeholder_type == :subTitle }.text = "Acme Inc."
  pres.save("filled.pptx")
end
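
When a template carries several {{slot}} markers, the per-run gsub shown earlier can be generalized to a lookup table. The helper below is a plain-Ruby sketch and not part of rbpptx: fill_slots and SLOT_PATTERN are hypothetical names, and it assumes each marker sits entirely inside one run (a marker split across runs would not match).

```ruby
# Matches {{name}}-style markers and captures the slot name.
SLOT_PATTERN = /\{\{(\w+)\}\}/

# Replace every known marker with its value; unknown markers are left
# untouched so a later pass (or a human) can still spot them.
# (Hypothetical helper for illustration.)
def fill_slots(text, values)
  text.gsub(SLOT_PATTERN) do |marker|
    values.fetch(Regexp.last_match(1).to_sym, marker)
  end
end
```

Inside the run loop, the assignment would then read run.text = fill_slots(run.text, { name: "Acme", quarter: "Q2" }).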
Same-path overwrite
Presentation#save writes to a temp file in the target directory and
atomically renames it into place, so pres.save(original_path) is safe:
Rbpptx.open("deck.pptx") do |pres|
  pres.slides.first.shapes.first.text = "Updated"
  pres.save("deck.pptx") # overwrite in place
end
If the rename fails, the temp file is removed before the error propagates.
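
The temp-file-plus-rename pattern described above can be sketched with the standard library alone. This is an illustration of the general technique under stated assumptions, not rbpptx's actual save code; atomic_write is a hypothetical name, and the example writes a byte string rather than a ZIP stream.

```ruby
require "tempfile"

# Write bytes to a sibling temp file, then rename it over the target.
# The rename is atomic on POSIX filesystems when both paths are on the
# same volume, which is why the temp file lives in the target directory.
# (Hypothetical helper for illustration.)
def atomic_write(path, bytes)
  tmp = Tempfile.create([".atomic-", ".tmp"], File.dirname(path))
  begin
    tmp.binmode
    tmp.write(bytes)
    tmp.flush
    tmp.fsync # push data to disk before the rename makes it visible
    tmp.close
    File.rename(tmp.path, path)
  rescue StandardError
    # Clean up the temp file, then let the original error propagate.
    tmp.close unless tmp.closed?
    File.unlink(tmp.path) if File.exist?(tmp.path)
    raise
  end
  path
end
```

One caveat of this sketch: Tempfile.create makes the file with owner-only permissions, so a real implementation would probably copy the original file's mode before renaming.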
Inspecting without parsing every slide
Slide XML is parsed lazily: enumerating slides, reading their name, or
checking dirty? does not pay the per-slide parse cost. The first call to
slide.shapes (or slide.to_xml) is what triggers the parse, and the
result is cached.
Rbpptx.open("deck.pptx") do |pres|
  pres.slides.length                  # cheap
  pres.slides.map(&:name)             # cheap
  pres.slides.first.shapes.first.text # parses only slide 1
end
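
The lazy-parse-and-cache behaviour amounts to memoizing an expensive loader behind the first accessor that needs it. The class below is a standalone sketch of that idea, not rbpptx's Slide class: LazySlide, parse_count, and the loader block are all hypothetical and exist only to make the caching observable.

```ruby
# A slide-like object that defers its expensive parse until first use.
# (Hypothetical class for illustration.)
class LazySlide
  attr_reader :name, :parse_count

  # The block stands in for "read the slide part and parse its XML".
  def initialize(name, &loader)
    @name = name
    @loader = loader
    @shapes = nil
    @parse_count = 0
  end

  # True once the loader has run.
  def parsed?
    !@shapes.nil?
  end

  # First call runs the loader; later calls return the cached result.
  def shapes
    @shapes ||= begin
      @parse_count += 1
      @loader.call
    end
  end
end
```

Reading name never touches the loader, and calling shapes twice runs it once, which mirrors the cheap and parsing calls annotated in the example above.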
Error handling
All rbpptx exceptions inherit from Rbpptx::Error and the messages carry
the presentation path and (where relevant) the offending part name:
| Error | Raised when |
|---|---|
| Rbpptx::UnsupportedFormatError | file is empty, not a ZIP, or a legacy .ppt (BIFF/CFB) container |
| Rbpptx::PresentationFormatError | _rels/.rels, ppt/presentation.xml, or its rels are malformed |
| Rbpptx::SlideFormatError | a slide part is missing or its XML cannot be parsed |
| Rbpptx::SlideNotFoundError | reserved for future index-based slide accessors |
| Rbpptx::ClosedPresentationError | slides / save called after close |
begin
  # process is your own per-slide handler.
  Rbpptx.open("deck.pptx") { |pres| pres.slides.each { |s| process(s) } }
rescue Rbpptx::UnsupportedFormatError => e
  warn "not a pptx: #{e.message}"
rescue Rbpptx::PresentationFormatError, Rbpptx::SlideFormatError => e
  warn "malformed: #{e.message}"
end
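
Because every exception shares one base class, a caller can handle specific cases first and fall back to a single catch-all. The sketch below mirrors that shape with a stand-in DemoPptx module so it runs without the gem; DemoPptx and describe_failure are hypothetical, and with rbpptx installed the rescue clauses would name Rbpptx::UnsupportedFormatError and Rbpptx::Error instead.

```ruby
# Stand-in hierarchy mirroring the table above (not the real gem classes).
module DemoPptx
  class Error < StandardError; end
  class UnsupportedFormatError < Error; end
  class PresentationFormatError < Error; end
  class SlideFormatError < Error; end
end

# Run the block and turn library failures into a short description.
# Specific classes are rescued first; the base class catches the rest.
def describe_failure
  yield
  "ok"
rescue DemoPptx::UnsupportedFormatError => e
  "not a pptx: #{e.message}"
rescue DemoPptx::Error => e
  "library failure: #{e.message}"
end
```

Errors outside the hierarchy, such as a missing file raising Errno::ENOENT, are deliberately not swallowed by this pattern.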
Design Notes
- Fidelity over feature coverage. rbpptx is a small, surgical editor, not a PowerPoint object model. The bytes it doesn't understand are the bytes it doesn't touch, including PowerPoint-version-specific extensions (p14:, p15:, ahyp:, mc:AlternateContent) that a full object model would have to know about to round-trip safely.
- Lazy parse, eager write. Slides parse on first access; saves re-serialize only the slides whose shapes have been mutated, and stream every other ZIP entry through with its already-deflated bytes (Zip::OutputStream#copy_raw_entry): no decompress, no re-deflate. Bit-identical preservation, not just content-identical.
- rPr cloning, not rebuilding. When a shape's text is replaced, the first existing <a:rPr> is deep-cloned onto each new run. A template's font, size, color, and lang attribute carry through without rbpptx needing to model those properties.
- Atomic same-path save. save writes to a sibling temp file and renames it into place, so a crash mid-write cannot leave a half-written .pptx at the target path.
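
The rPr-cloning note can be illustrated on a bare <p:txBody> fragment. The sketch below is an approximation under stated assumptions, not rbpptx's code: it uses REXML rather than nokogiri so it runs without extra gems, the helper names first_descendant and replace_body_text are hypothetical, and it assumes the a: prefix is declared on an ancestor of the text body.

```ruby
require "rexml/document"

# Depth-first search for the first descendant with the given local name.
def first_descendant(element, local_name)
  element.each_element do |child|
    return child if child.name == local_name

    found = first_descendant(child, local_name)
    return found if found
  end
  nil
end

# Replace the <a:p> children of a text body with one paragraph per line,
# deep-cloning the first existing <a:rPr> onto every new run. Siblings such
# as <a:bodyPr> and <a:lstStyle> are left where they are.
def replace_body_text(tx_body, text)
  template_rpr = first_descendant(tx_body, "rPr")

  # Remove only the paragraph children.
  tx_body.elements.to_a.each do |child|
    tx_body.delete_element(child) if child.name == "p"
  end

  # Append one paragraph with a single run per line of text.
  text.split("\n", -1).each do |line|
    run = tx_body.add_element("a:p").add_element("a:r")
    run.add_element(template_rpr.deep_clone) if template_rpr
    run.add_element("a:t").text = line
  end
  tx_body
end
```

Because the run properties are copied as an opaque subtree, attributes and child elements the code never inspects (fonts, fills, extension lists) travel along unchanged.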
Benchmarks
rbpptx shows its largest advantage on real-world decks whose package
carries embedded media: themes, slide masters, slide layouts, images,
fonts. python-pptx parses these eagerly at Presentation(path) time;
rbpptx defers everything past presentation.xml and its rels, so the
open-and-read path stays cheap regardless of how much binary mass the
deck carries.
Real-world deck (18 MB, 21 slides, embedded fonts and images), 5 iterations:
| benchmark | rbpptx (s) | python-pptx (s) | docxtemplater (s) | rbpptx vs python-pptx | rbpptx vs docxtemplater |
|---|---|---|---|---|---|
| read (walk all text) | 0.019 | 0.089 | 0.037 | 4.7x faster | 2.0x faster |
| fill_one (1 + save) | 0.033 | 0.555 | 0.149 | 16.8x faster | 4.5x faster |
| fill_many (10 + save) | 0.042 | 0.545 | 0.144 | 13.0x faster | 3.4x faster |
Memory (RSS delta) on the read scenario:
| library | rss Δ (kB) |
|---|---|
| rbpptx | 2,956 |
| python-pptx | 28,416 |
| docxtemplater | 34,816 |
The fill scenarios used to tie when each library re-deflated every ZIP
entry on save. rbpptx now streams unmodified entries through with
Zip::OutputStream#copy_raw_entry, so the deflate cost is paid only on
the slide XML that actually changed. That accounts for the 13–17x gap
against python-pptx and the 3–4.5x gap against docxtemplater on
read-modify-write. To reproduce the measurements:
cd benchmark && npm install && cd ..
RBPPTX_SAMPLE=/path/to/your/deck.pptx bundle exec ruby benchmark/compare.rb
uv is required for the python-pptx subprocess; the comparison script
invokes uv run --with python-pptx python3 benchmark/python_pptx_compare.py
so no system-level Python install is mutated. The Node.js side runs from
benchmark/node_modules (gitignored); rerun npm install after pulling
benchmark changes. Use any .pptx you have on hand — the table above
was measured against an 18 MB deck with 21 slides and embedded fonts.
To benchmark against a synthetic deck with no embedded media (isolates the XML-rewrite cost from binary deflate), generate a fixture first:
uv run --with python-pptx python3 benchmark/fixture_gen.py 50 /tmp/bench_50.pptx
RBPPTX_SAMPLE=/tmp/bench_50.pptx bundle exec ruby benchmark/compare.rb
Development
Development in this repository assumes Ruby 3.4.8 (.ruby-version).
bundle install
# Run tests
bundle exec ruby -Ilib -Itest test/rbpptx_test.rb
# Generate API docs
bundle exec rake rdoc
The bulk of the test suite runs against a small in-tree fixture
(~3 KB) built from scratch by test/fixture_builder.rb using only
rubyzip and hand-coded OOXML. No external sample is required for a
fresh checkout to get meaningful coverage of read, edit, save, and
fidelity behavior. A handful of dog-fooding tests (test_sample_*) opt
into any real-world .pptx you point at via the RBPPTX_SAMPLE
environment variable, and skip cleanly when it is unset:
RBPPTX_SAMPLE=/path/to/your/deck.pptx bundle exec ruby -Ilib -Itest test/rbpptx_test.rb