msg_extractor
Pure Ruby parser for Microsoft Outlook .msg files. Parses the OLE2/CFBF
container and MAPI properties into structured Ruby objects — no native
extensions, no runtime dependencies, no Python. Built for use in Ruby and
Rails applications.
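For readers curious about the container layer: every OLE2/CFBF file begins with a fixed 512-byte header defined in [MS-CFB]. The sketch below checks the documented signature, byte-order mark and major version, then derives the sector sizes. It illustrates the format only and is not msg_extractor's internal code; the method read_cfb_header and the CfbHeader struct are hypothetical names.

```ruby
# Sketch: validate the 512-byte OLE2/CFBF header described in [MS-CFB] section 2.2.
CFB_SIGNATURE = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1].pack("C*").freeze

CfbHeader = Struct.new(:major_version, :sector_size, :mini_sector_size, keyword_init: true)

def read_cfb_header(bytes)
  bytes = bytes.b # work on a binary (ASCII-8BIT) copy
  raise ArgumentError, "too short for a CFB header" if bytes.bytesize < 512
  raise ArgumentError, "missing CFB signature" unless bytes.byteslice(0, 8) == CFB_SIGNATURE

  # Five little-endian uint16 fields start at offset 0x18: minor version,
  # major version, byte order mark, sector shift, mini sector shift.
  _minor, major, byte_order, sector_shift, mini_shift = bytes.byteslice(0x18, 10).unpack("v5")
  raise ArgumentError, "unexpected byte order mark" unless byte_order == 0xFFFE
  raise ArgumentError, "unsupported major version #{major}" unless [3, 4].include?(major)

  CfbHeader.new(
    major_version: major,
    sector_size: 1 << sector_shift,   # the spec requires 512 for version 3, 4096 for version 4
    mini_sector_size: 1 << mini_shift # 64 in conforming files
  )
end

# Build a minimal synthetic version-3 header to exercise the parser.
sample = CFB_SIGNATURE + ("\0" * 16).b + [0x3E, 3, 0xFFFE, 9, 6].pack("v5")
sample += "\0".b * (512 - sample.bytesize)
p read_cfb_header(sample)
```

A real parser would go on to read the FAT and directory sectors; the header check alone is enough to reject non-CFB input early.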
Installation
```ruby
# Gemfile
gem "msg_extractor"
```
Requires Ruby >= 3.1.
Usage
```ruby
require "msg_extractor"

msg = MsgExtractor.open("invoice.msg") # also accepts an IO or a binary String
msg.subject   # => "Invoice 2026-001"
msg.sender    # => #<MsgExtractor::Recipient name="Bob" email="bob@example.com">
msg.to        # => [Recipient, ...] (also: cc, bcc, recipients)
msg.date      # => Time (UTC)
msg.body      # => plain text body (UTF-8)
msg.html_body # => HTML body; extracted from the RTF body when absent
msg.rtf_body  # => decompressed RTF (binary) or nil
msg.headers   # => case-insensitive transport headers: msg.headers["Subject"]

msg..each do |att|
  att.filename   # => "report.pdf"
  att.mime_type  # => "application/pdf"
  att.content_id # => for matching cid: URLs in html_body
  att.data       # => raw bytes; hand to ActiveStorage, S3, etc.
  att.save(dir: "tmp/")
  att. # => parsed MsgExtractor::Message when the attachment
       #    is itself an embedded .msg (att.embedded_message?)
end

msg.save(dir: "out/") # writes message.txt, message.html and attachments
```
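Because content_id exists for matching cid: URLs, a common follow-up is inlining images so html_body renders on its own. A minimal sketch, assuming attachment objects expose the filename, mime_type, content_id and data readers shown above (a Struct stands in for them so the example runs without the gem) and that a content_id may or may not carry MIME-style angle brackets. The helper inline_cid_images is hypothetical.

```ruby
# Stand-in for the gem's attachment objects: only the readers used below.
FakeAttachment = Struct.new(:filename, :mime_type, :content_id, :data)

# Rewrite cid: references in HTML into base64 data: URIs using matching attachments.
def inline_cid_images(html, attachments)
  by_cid = {}
  attachments.each do |att|
    # Tolerate a "<...>" form of the Content-ID (an assumption about how it is stored).
    cid = att.content_id.to_s.delete_prefix("<").delete_suffix(">")
    by_cid[cid] = att unless cid.empty?
  end

  html.gsub(/cid:([^"'\s>)]+)/) do |match|
    att = by_cid[Regexp.last_match(1)]
    next match unless att # leave references without a matching attachment untouched

    mime = att.mime_type || "application/octet-stream"
    "data:#{mime};base64,#{[att.data].pack('m0')}" # "m0" = strict Base64, no newlines
  end
end

logo = FakeAttachment.new("logo.png", "image/png", "<logo@example>", "abc".b)
puts inline_cid_images('<img src="cid:logo@example"><img src="cid:missing">', [logo])
# => <img src="data:image/png;base64,YWJj"><img src="cid:missing">
```

In an application you would pass msg.html_body and the attachments iterated above. Data URIs inflate the HTML by about a third, so for large images serving the bytes from storage and rewriting cid: to that URL may be preferable.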
MsgExtractor.open returns a typed object based on the message class:
| Message class | Returned type | Extra readers |
|---|---|---|
| IPM.Note, REPORT.* | Message | none |
| IPM.Contact, IPM.DistList | Contact | display_name, given_name, surname, company, job_title, business_phone, home_phone, mobile_phone, postal_address, emails |
| IPM.Appointment, IPM.Schedule.Meeting.* | Appointment | starts_at, ends_at, location, all_day?, organizer, required_attendees, optional_attendees |
| IPM.Task | Task | starts_on, due_on, status, percent_complete, complete?, owner |
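The dispatch in the table above can be approximated as a match on the message class string. This is a sketch under stated assumptions: rows without a wildcard are treated as exact matches, matching is case-insensitive, and ArgumentError stands in for MsgExtractor::UnsupportedTypeError. The gem may classify derived classes such as IPM.Note.SMIME differently; type_for_message_class is a hypothetical name.

```ruby
# Table rows as [pattern, type]; a pattern ending in "." is a prefix (the ".*" rows).
MESSAGE_CLASS_TYPES = [
  ["IPM.Note", :Message],
  ["REPORT.", :Message],
  ["IPM.Contact", :Contact],
  ["IPM.DistList", :Contact],
  ["IPM.Appointment", :Appointment],
  ["IPM.Schedule.Meeting.", :Appointment],
  ["IPM.Task", :Task]
].freeze

# Return the reader type for a message class; unknown classes raise unless strict: false.
def type_for_message_class(message_class, strict: true)
  klass = message_class.to_s.upcase # assumption: case-insensitive comparison
  MESSAGE_CLASS_TYPES.each do |pattern, type|
    pat = pattern.upcase
    matched = pat.end_with?(".") ? klass.start_with?(pat) : klass == pat
    return type if matched
  end
  # ArgumentError stands in for MsgExtractor::UnsupportedTypeError here.
  raise ArgumentError, "unsupported message class: #{message_class}" if strict

  :MessageObject
end

p type_for_message_class("IPM.Schedule.Meeting.Request") # => :Appointment
p type_for_message_class("IPM.StickyNote", strict: false) # => :MessageObject
```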
Other message classes raise MsgExtractor::UnsupportedTypeError; pass
strict: false to MsgExtractor.open to get a generic MessageObject instead.
Errors: InvalidFormatError, UnsupportedTypeError, and CorruptFileError all
inherit from MsgExtractor::Error.
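Because every error shares the MsgExtractor::Error base class, callers can handle the one recoverable case specifically and treat the rest uniformly. The module below is a hypothetical stand-in that mirrors the documented hierarchy so the example runs on its own; its StandardError superclass is an assumption, and open_outcome is an illustrative helper.

```ruby
# Hypothetical stand-in mirroring the documented hierarchy (StandardError base is an assumption).
module DemoExtractor
  class Error < StandardError; end
  class InvalidFormatError < Error; end
  class UnsupportedTypeError < Error; end
  class CorruptFileError < Error; end
end

# Run a parsing step and classify the outcome: the specific subclass first,
# then the shared base class as a catch-all for the others.
def open_outcome
  yield
  :ok
rescue DemoExtractor::UnsupportedTypeError
  :unsupported # a real caller might retry with strict: false here
rescue DemoExtractor::Error
  :unreadable  # InvalidFormatError, CorruptFileError, or any other subclass
end

p open_outcome { raise DemoExtractor::CorruptFileError, "truncated stream" } # => :unreadable
```

Rescue clauses are tried in order, so the more specific subclass must come before the base class.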
CLI
```
msg_extractor FILE... [--out DIR] [--json] [--attachments-only]
```
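The synopsis above maps directly onto Ruby's standard OptionParser. This sketch only illustrates how the documented arguments decompose; parse_cli is hypothetical and is not the gem's executable, and it says nothing about what each flag does beyond its name.

```ruby
require "optparse"

# Parse argv according to the documented synopsis:
#   msg_extractor FILE... [--out DIR] [--json] [--attachments-only]
def parse_cli(argv)
  options = { out: nil, json: false, attachments_only: false }
  files = OptionParser.new do |opts|
    opts.banner = "Usage: msg_extractor FILE... [--out DIR] [--json] [--attachments-only]"
    opts.on("--out DIR") { |dir| options[:out] = dir }
    opts.on("--json") { options[:json] = true }
    opts.on("--attachments-only") { options[:attachments_only] = true }
  end.parse(argv) # non-destructive: returns the leftover (non-option) arguments
  raise OptionParser::MissingArgument, "FILE" if files.empty?

  [files, options]
end

files, options = parse_cli(["--out", "tmp", "--json", "a.msg", "b.msg"])
p files # => ["a.msg", "b.msg"]
p options
```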
Development
```sh
bundle install
rake test
```
Test fixtures and oracle data come from the Python extract_msg project,
which this gem uses as a black-box behavioral reference (no code is
translated from it). See tool/generate_oracle.py.
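The black-box oracle approach boils down to comparing reader output field by field against a recorded reference. A minimal sketch; the flat JSON layout keyed by reader name and the helper oracle_mismatches are assumptions for illustration, not the format tool/generate_oracle.py actually writes.

```ruby
require "json"

# Compare readers on a parsed message with a recorded oracle value for each field.
# Hypothetical oracle layout: a flat JSON object keyed by reader name.
def oracle_mismatches(message, oracle_json, fields: %w[subject body])
  expected = JSON.parse(oracle_json)
  fields.filter_map do |field|
    actual = message.public_send(field)
    want = expected[field]
    "#{field}: expected #{want.inspect}, got #{actual.inspect}" unless actual == want
  end
end

# A Struct stands in for a parsed message so the example runs without a .msg fixture.
SampleMessage = Struct.new(:subject, :body)
message = SampleMessage.new("Invoice 2026-001", "Hello")
p oracle_mismatches(message, '{"subject": "Invoice 2026-001", "body": "Hi"}')
# => ["body: expected \"Hi\", got \"Hello\""]
```

Returning a list of human-readable differences, rather than failing on the first one, makes a single test run report every diverging field.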
Credits
This gem was developed with Claude Code (Anthropic's Claude Fable 5 model), implementing the Microsoft open specifications ([MS-CFB], [MS-OXMSG], [MS-OXRTFCP], [MS-OXRTFEX]) and validated against the output of the Python extract_msg library.
License
MIT.