mpp_reader
Pure Ruby reader for Microsoft Project .mpp files. No native extensions, no
runtime dependencies.
Status: core reader working. Reads tasks (name, dates, duration, outline hierarchy, percent complete, milestone/active/manual flags, notes), resources, assignments (task/resource links, units, work, dates), predecessor links (type and lag) and calendars (weekly hours, exceptions, base/resource calendar relationships) from MPP14 files, with output verified field-by-field against MPXJ on a 29-file / 2173-task real-world corpus. Notes/comments (RTF converted to plain text) are available on tasks, resources and assignments. Not yet read: custom fields, baselines, recurring exception patterns.
require "mpp_reader"
project = MppReader.open("plan.mpp")
project.tasks.each do |task|
  puts "#{' ' * task.outline_level}#{task.name}: " \
       "#{task.start} .. #{task.finish} (#{task.duration&.value} #{task.duration&.units})"
end
project.resources.map(&:name)
project.assignments.each { |a| puts "#{a.task_unique_id} -> #{a.resource_unique_id}: #{a.units}" }
project.tasks.flat_map(&:predecessors).each { |r| puts "#{r.predecessor_task_unique_id} #{r.type} #{r.successor_task_unique_id}" }
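The flat task list can be turned into a nested tree using outline_level alone. The following is a minimal sketch, not part of the gem: OutlineNode and build_outline_tree are hypothetical names, and it assumes tasks are returned in outline order (each task directly after its parent or an earlier sibling), which the indentation example above suggests but does not guarantee.

```ruby
# Hypothetical helper, not part of mpp_reader.
# Works with any objects that respond to #outline_level.
OutlineNode = Struct.new(:task, :children)

# Builds a nested tree from a flat task list.
# Assumption: the list is in outline order.
def build_outline_tree(tasks)
  roots = []
  stack = [] # ancestors of the current task, shallowest first
  tasks.each do |task|
    node = OutlineNode.new(task, [])
    level = task.outline_level.to_i
    # Pop until the top of the stack is strictly shallower than this task.
    stack.pop while stack.any? && stack.last.task.outline_level.to_i >= level
    (stack.empty? ? roots : stack.last.children) << node
    stack.push(node)
  end
  roots
end
```

With the gem loaded, build_outline_tree(project.tasks) would return the top-level nodes, each carrying its subtasks in children.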
CLI
mpp_reader plan.mpp # human-readable task tree, resources, notes
mpp_reader plan.mpp --json # full structured dump (tasks, resources,
# assignments, calendars) as one JSON object
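The JSON dump can be post-processed with the standard library. The sketch below only assumes what is stated above (one JSON object at the top level) plus one guess: that the collections appear as top-level arrays. summarize_dump is a hypothetical helper, and it deliberately avoids hard-coding key names because the exact keys are not documented here.

```ruby
require "json"

# Hypothetical helper, not part of mpp_reader.
# Returns { key => element count } for every top-level array in the dump.
# Assumption: the dump's collections are top-level arrays.
def summarize_dump(json_text)
  data = JSON.parse(json_text)
  raise ArgumentError, "expected a JSON object at the top level" unless data.is_a?(Hash)

  data.select { |_key, value| value.is_a?(Array) }
      .transform_values(&:size)
end

# Made-up sample input; the key names are illustrative only.
sample = '{"tasks":[{"name":"Design"},{"name":"Build"}],"resources":[]}'
summary = summarize_dump(sample)
summary.each { |key, count| puts "#{key}: #{count}" }
```

Against a real file, the input would come from the CLI instead of the sample string, for example by saving the output of mpp_reader plan.mpp --json to a file and passing File.read of that file.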
Scope
Targets the MPP14 format only — files saved by Microsoft Project 2010
through 2021. Older formats (MPP8/9/12) and password-protected files raise
MppReader::UnsupportedFormatError.
Architecture
An .mpp file is an OLE2 compound document ([MS-CFB]): a mini-filesystem of
storages (folders) and streams (files).
- MppReader::Cfbf: generic compound-file reader (Cfbf::File.read(path), #stream("path/in/file")). Handles v3/v4 sector sizes, chained DIFAT, miniFAT, with cycle and corruption detection.
- Project data lives under the " 114" storage: TBkndTask, TBkndRsc, TBkndAssn, TBkndCal, TBkndCons directories, each holding FixedData/FixedMeta (fixed-size records) and VarMeta/Var2Data (variable-length fields) streams. The binary layouts are ported from MPXJ, which is the de facto specification of this undocumented format.
Tools: tool/compare_oracle.rb diffs this gem's output against MPXJ JSON
output for a corpus; tool/generate_field_tables.rb regenerates the field-id
tables from an MPXJ checkout.
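To make the v3/v4 sector-size point concrete, here is a minimal sketch of reading the fixed fields at the start of a compound-file header, following the layout in [MS-CFB]. It is not the MppReader::Cfbf implementation; parse_cfb_header and CFB_SIGNATURE are hypothetical names used only for illustration.

```ruby
# The 8-byte signature that opens every compound file, per [MS-CFB].
CFB_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b

# Hypothetical helper, not part of mpp_reader.
# Parses the version and sector-size fields of a compound-file header.
def parse_cfb_header(bytes)
  raise ArgumentError, "header too short" if bytes.nil? || bytes.bytesize < 34
  unless bytes.byteslice(0, 8).b == CFB_SIGNATURE
    raise ArgumentError, "not a compound file"
  end

  # Offset 24: minor version, major version, byte order, sector shift,
  # mini sector shift, each a little-endian 16-bit integer.
  minor, major, byte_order, sector_shift, mini_shift =
    bytes.byteslice(24, 10).unpack("v5")

  {
    minor_version: minor,
    major_version: major,              # 3 or 4
    byte_order: byte_order,            # 0xFFFE marks little-endian
    sector_size: 1 << sector_shift,    # 512 for v3, 4096 for v4
    mini_sector_size: 1 << mini_shift  # 64
  }
end
```

A call such as parse_cfb_header(File.binread("plan.mpp", 512)) would report which sector size a given file uses, without touching the FAT or directory structures.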
Tests
bundle install
bundle exec rake test
Unit tests use synthetic compound files (test/support/) and run
self-contained. The smoke test additionally runs against a local corpus of
real .mpp files when present — it looks in ../examples or the directory
named by MPP_EXAMPLES. That corpus is company data and is never committed
(.gitignore blocks *.mpp).
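The corpus lookup can be pictured roughly as follows. This is a sketch under assumptions, not the smoke test's actual code: example_files is a hypothetical name, MPP_EXAMPLES is assumed to take precedence over ../examples when set, and only the top level of the directory is scanned.

```ruby
# Hypothetical helper, not taken from the test suite.
# Resolves the corpus directory and lists the .mpp files directly inside it.
# Assumptions: MPP_EXAMPLES wins over the fallback; no recursion.
def example_files(env: ENV, fallback: "../examples")
  dir = env["MPP_EXAMPLES"]
  dir = fallback if dir.nil? || dir.empty?
  return [] unless Dir.exist?(dir) # no corpus present: nothing to run

  Dir.glob(File.join(dir, "*.mpp")).sort
end
```

A smoke test built on this would skip itself when example_files returns an empty array, which keeps the suite self-contained on machines without the corpus.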
License
LGPL-2.1-or-later. The MPP14 format-reading logic is ported from MPXJ (LGPL 2.1); the vendored CFBF container code originates from the MIT-licensed msg-extractor-ruby and is relicensed here under the LGPL by its author.