mpp_reader

Pure Ruby reader for Microsoft Project .mpp files. No native extensions, no runtime dependencies.

Status: the core reader works. From MPP14 files it reads tasks (name, dates, duration, outline hierarchy, percent complete, milestone/active/manual flags, notes), resources, assignments (task/resource links, units, work, dates), predecessor links (type and lag) and calendars (weekly hours, exceptions, base/resource calendar relationships). Output has been verified field-by-field against MPXJ on a real-world corpus of 29 files and 2,173 tasks. Notes/comments are converted from RTF to plain text and are available on tasks, resources and assignments. Not yet read: custom fields, baselines and recurring exception patterns.

require "mpp_reader"

project = MppReader.open("plan.mpp")

# Task tree, indented by outline level, with dates and duration
project.tasks.each do |task|
  puts "#{'  ' * task.outline_level}#{task.name}: " \
       "#{task.start} .. #{task.finish} (#{task.duration&.value} #{task.duration&.units})"
end

# Resource names
puts project.resources.map(&:name)

# Assignments: task unique ID -> resource unique ID, with units
project.assignments.each { |a| puts "#{a.task_unique_id} -> #{a.resource_unique_id}: #{a.units}" }

# Predecessor links: predecessor ID, link type, successor ID
project.tasks.flat_map(&:predecessors).each do |r|
  puts "#{r.predecessor_task_unique_id} #{r.type} #{r.successor_task_unique_id}"
end

CLI

mpp_reader plan.mpp           # human-readable task tree, resources, notes
mpp_reader plan.mpp --json    # full structured dump (tasks, resources,
                              # assignments, calendars) as one JSON object
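Because --json emits a single JSON object, the dump is easy to post-process from Ruby. The sketch below counts the records in each collection. It assumes the top-level keys are literally named tasks, resources, assignments and calendars and that each holds an array; those key names are an assumption, not documented here, and summarize_dump is a hypothetical helper.

```ruby
require "json"

# ASSUMPTION: top-level collection keys of an `mpp_reader --json` dump.
COLLECTIONS = %w[tasks resources assignments calendars].freeze

# Return a Hash of collection name => record count (0 when the key is absent).
def summarize_dump(json_text)
  dump = JSON.parse(json_text)
  COLLECTIONS.to_h { |key| [key, Array(dump[key]).size] }
end

# Usage: summarize_dump(`mpp_reader plan.mpp --json`)
```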

Scope

Targets the MPP14 format only, i.e. files saved by Microsoft Project 2010 through 2021. Older formats (MPP8/9/12) and password-protected files raise MppReader::UnsupportedFormatError.
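One way to enforce this scope is to look at the compound file's top-level entries before decoding any records. A minimal sketch, assuming detection keys off the presence of the "114" project storage named under Architecture; check_format! and the local UnsupportedFormatError class are hypothetical stand-ins, and the gem's actual check may differ (for example, it must also recognise password-protected files).

```ruby
# Hypothetical stand-in for MppReader::UnsupportedFormatError.
class UnsupportedFormatError < StandardError; end

# root_entry_names: names of the entries directly below the compound file's root storage.
# ASSUMPTION: an MPP14 file is recognised by its "114" project storage. The storage
# name carries leading spaces, so names are compared after stripping.
def check_format!(root_entry_names)
  return :mpp14 if root_entry_names.any? { |name| name.strip == "114" }

  raise UnsupportedFormatError, "no 114 project storage found; not an MPP14 file"
end
```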

Architecture

An .mpp file is an OLE2 compound document ([MS-CFB]): a mini-filesystem of storages (folders) and streams (files).
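Storages and streams are described by 128-byte directory entries. The sketch below decodes one entry using the field offsets from [MS-CFB] (name at 0x00, name length at 0x40, object type at 0x42, sibling/child IDs at 0x44, starting sector at 0x74, size at 0x78). It illustrates the format only and is not the gem's Cfbf code; DirEntry and parse_dir_entry are hypothetical names.

```ruby
# Object types defined by [MS-CFB] for the directory entry's type byte.
OBJECT_TYPES = { 0 => :unallocated, 1 => :storage, 2 => :stream, 5 => :root }.freeze

DirEntry = Struct.new(:name, :type, :left, :right, :child, :start_sector, :size)

# Decode one 128-byte directory entry (raw must be a binary String).
def parse_dir_entry(raw)
  raise ArgumentError, "directory entry must be 128 bytes" unless raw.bytesize == 128

  # Name: up to 32 UTF-16LE code units; the length field counts the trailing NUL.
  name_len = raw.byteslice(0x40, 2).unpack1("v")
  name_bytes = raw.byteslice(0, (name_len - 2).clamp(0, 62))
  name = name_bytes.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)

  type = OBJECT_TYPES.fetch(raw.getbyte(0x42), :unknown)
  # Left/right sibling IDs link the entries of one storage into a red-black tree;
  # for a storage, child is the root of the tree of its own entries.
  left, right, child = raw.byteslice(0x44, 12).unpack("V3")
  start_sector = raw.byteslice(0x74, 4).unpack1("V")
  # 64-bit size; for v3 files a reader should ignore the high 32 bits.
  size = raw.byteslice(0x78, 8).unpack1("Q<")
  DirEntry.new(name, type, left, right, child, start_sector, size)
end
```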

  • MppReader::Cfbf: generic compound-file reader (Cfbf::File.read(path), #stream("path/in/file")). Handles v3 (512-byte) and v4 (4096-byte) sectors, chained DIFAT sectors and the miniFAT, with cycle and corruption detection.
  • Project data lives under the " 114" storage, in the TBkndTask, TBkndRsc, TBkndAssn, TBkndCal and TBkndCons sub-storages. Each holds FixedData/FixedMeta streams (fixed-size records) and VarMeta/Var2Data streams (variable-length fields). The binary layouts are ported from MPXJ, the de-facto specification of this undocumented format.
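The sector handling and cycle detection mentioned for MppReader::Cfbf come down to following FAT chains safely. A minimal sketch per [MS-CFB], assuming the FAT has already been assembled (from the header DIFAT plus any chained DIFAT sectors) into an array of integers; CorruptFileError, sector_chain and read_stream are hypothetical names, not the gem's API, and mini streams (below the 4096-byte cutoff, read via the miniFAT) are not covered.

```ruby
# Special FAT value from [MS-CFB] marking the last sector of a chain.
ENDOFCHAIN = 0xFFFFFFFE

class CorruptFileError < StandardError; end # hypothetical error class

# v3 files use a sector shift of 9 (512-byte sectors), v4 files a shift of 12 (4096).
def sector_size(sector_shift)
  raise CorruptFileError, "bad sector shift #{sector_shift}" unless [9, 12].include?(sector_shift)

  1 << sector_shift
end

# Follow a FAT chain from `start`, returning sector indexes in order.
# FREESECT, FATSECT and DIFSECT are far beyond any real FAT size, so the range check rejects them.
def sector_chain(fat, start)
  chain = []
  seen = {}
  sector = start
  until sector == ENDOFCHAIN
    raise CorruptFileError, "sector #{sector} out of range" if sector >= fat.size
    raise CorruptFileError, "cycle at sector #{sector}" if seen[sector]

    seen[sector] = true
    chain << sector
    sector = fat[sector]
  end
  chain
end

# Read `size` bytes of a regular (non-mini) stream. Sector n starts at (n + 1) * sector size,
# because the header occupies the first sector-sized block of the file.
def read_stream(data, fat, start, size, sector_shift)
  ss = sector_size(sector_shift)
  bytes = sector_chain(fat, start).map { |n| data.byteslice((n + 1) * ss, ss) || "".b }.join
  raise CorruptFileError, "stream truncated" if bytes.bytesize < size

  bytes.byteslice(0, size)
end
```

Tracking visited sectors in a Hash makes a looping chain fail fast instead of reading forever, which matters for damaged files.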

Tools: tool/compare_oracle.rb diffs this gem's output against MPXJ JSON output for a corpus; tool/generate_field_tables.rb regenerates the field-id tables from an MPXJ checkout.
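The internals of tool/compare_oracle.rb are not described here. As a hedged illustration of what a field-by-field comparison of two JSON dumps involves, this recursive walk reports every path where two parsed documents disagree. json_diff is a hypothetical name; arrays are aligned by index (a real oracle comparison would more likely match tasks by unique ID), and a missing key is treated the same as an explicit null.

```ruby
# Walk two parsed-JSON values in parallel and collect [path, ours, theirs] for each difference.
def json_diff(ours, theirs, path = "$", out = [])
  if ours.is_a?(Hash) && theirs.is_a?(Hash)
    # Union of keys, so fields present on only one side are reported too.
    (ours.keys | theirs.keys).sort.each do |k|
      json_diff(ours[k], theirs[k], "#{path}.#{k}", out)
    end
  elsif ours.is_a?(Array) && theirs.is_a?(Array)
    # Naive index alignment; extra elements on either side compare against nil.
    [ours.size, theirs.size].max.times do |i|
      json_diff(ours[i], theirs[i], "#{path}[#{i}]", out)
    end
  elsif ours != theirs
    out << [path, ours, theirs]
  end
  out
end
```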

Tests

bundle install
bundle exec rake test

Unit tests use synthetic compound files (test/support/) and are self-contained. When a local corpus of real .mpp files is present, the smoke test also runs against it; it looks in ../examples or in the directory named by the MPP_EXAMPLES environment variable. That corpus is company data and is never committed (.gitignore blocks *.mpp).
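Corpus discovery of this kind can be sketched as below. It assumes MPP_EXAMPLES takes precedence over ../examples, that ../examples is resolved against the current working directory, and that only top-level *.mpp files count; all three are assumptions, and example_files is a hypothetical helper rather than the test suite's code.

```ruby
# Locate the optional real-file corpus for a smoke test.
# ASSUMPTION: MPP_EXAMPLES wins over ../examples; the default is relative to Dir.pwd.
def example_files(env = ENV, default_dir = File.expand_path("../examples", Dir.pwd))
  dir = env["MPP_EXAMPLES"] || default_dir
  return [] unless File.directory?(dir)

  # Top-level .mpp files only, sorted for a stable test order.
  Dir.glob(File.join(dir, "*.mpp")).sort
end
```

In a Minitest smoke test, an empty result would typically lead to skip rather than a failure, so the suite still passes on machines without the corpus.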

License

LGPL-2.1-or-later. The MPP14 format-reading logic is ported from MPXJ (LGPL 2.1); the vendored CFBF container code originates from the MIT-licensed msg-extractor-ruby and is relicensed here under the LGPL by its author.