msg_extractor

Pure Ruby parser for Microsoft Outlook .msg files. Parses the OLE2/CFBF container and MAPI properties into structured Ruby objects — no native extensions, no runtime dependencies, no Python. Built for use in Ruby and Rails applications.
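Before handing untrusted uploads to a parser, it can help to check that a file is an OLE2/CFBF container at all. The sketch below tests for the 8-byte signature that [MS-CFB] defines for the start of every compound file. The helper name `cfbf?` is hypothetical and not part of msg_extractor; the gem's own detection logic may differ.

```ruby
require "stringio"

# First 8 bytes of every Compound File Binary (OLE2) container, per [MS-CFB].
CFBF_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b.freeze

# Hypothetical helper (not part of msg_extractor): returns true when the IO
# begins with the CFBF signature. The IO is rewound before and after the check.
def cfbf?(io)
  io.rewind
  header = io.read(CFBF_SIGNATURE.bytesize)
  io.rewind
  !header.nil? && header.b == CFBF_SIGNATURE
end
```

A matching signature only shows that the file is a compound file; Word and Excel legacy formats share the same container, so this is a cheap pre-filter rather than proof of a valid .msg.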

Installation

# Gemfile
gem "msg_extractor"

Requires Ruby >= 3.1.

Usage

require "msg_extractor"

msg = MsgExtractor.open("invoice.msg")   # also accepts an IO or a binary String

msg.subject        # => "Invoice 2026-001"
msg.sender         # => #<MsgExtractor::Recipient name="Bob" email="bob@example.com">
msg.to             # => [Recipient, ...]   (also: cc, bcc, recipients)
msg.date           # => Time (UTC)
msg.body           # => plain text body (UTF-8)
msg.html_body      # => HTML body; extracted from the RTF body when absent
msg.rtf_body       # => decompressed RTF (binary) or nil
msg.headers        # => case-insensitive transport headers: msg.headers["Subject"]

msg.attachments.each do |att|
  att.filename     # => "report.pdf"
  att.mime_type    # => "application/pdf"
  att.content_id   # => for matching cid: URLs in html_body
  att.data         # => raw bytes — hand to ActiveStorage, S3, etc.
  att.save(dir: "tmp/")
  att.message      # => parsed MsgExtractor::Message when the attachment
                   #    is itself an embedded .msg (att.embedded_message?)
end

msg.save(dir: "out/")  # writes message.txt, message.html and attachments
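
The content_id reader exists so that cid: URLs in html_body can be matched to attachments. A minimal sketch of that matching follows, inlining each referenced attachment as a data: URI. `Attachment` here is a stand-in Struct exposing the same three readers shown above, and `inline_cid_images` is a hypothetical helper, not part of the gem. It assumes content_id is returned without surrounding angle brackets.

```ruby
# Stand-in for the attachment readers shown above (content_id, mime_type, data).
Attachment = Struct.new(:content_id, :mime_type, :data)

# Hypothetical helper: replaces cid: references in the HTML with data: URIs
# built from the matching attachment. Unknown references are left untouched.
def inline_cid_images(html, attachments)
  by_cid = attachments.each_with_object({}) do |att, index|
    index[att.content_id] = att if att.content_id
  end

  html.gsub(/cid:([^"'\s>)]+)/) do |match|
    att = by_cid[Regexp.last_match(1)]
    next match unless att

    # pack("m0") is Base64 without line breaks, using only core Ruby.
    "data:#{att.mime_type};base64,#{[att.data].pack('m0')}"
  end
end
```

With the gem loaded, the same helper would presumably be called as `inline_cid_images(msg.html_body, msg.attachments)`. For large attachments, rewriting the URLs to point at stored files is likely a better choice than inlining.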

MsgExtractor.open returns a typed object based on the message class:

| Message class | Returned type | Extra readers |
| --- | --- | --- |
| IPM.Note, REPORT.* | Message | |
| IPM.Contact, IPM.DistList | Contact | display_name, given_name, surname, company, job_title, business_phone, home_phone, mobile_phone, postal_address, emails |
| IPM.Appointment, IPM.Schedule.Meeting.* | Appointment | starts_at, ends_at, location, all_day?, organizer, required_attendees, optional_attendees |
| IPM.Task | Task | starts_on, due_on, status, percent_complete, complete?, owner |

Other message classes raise MsgExtractor::UnsupportedTypeError; pass strict: false to get a generic MessageObject instead.
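
The mapping from message class to returned type can be pictured as a small lookup with exact names and prefix wildcards. The sketch below mirrors the documented mapping only; `TYPE_BY_CLASS` and `returned_type_for` are hypothetical names, the symbols stand in for the real classes, and the gem's internal dispatch may work differently (for example, in how it treats letter case or subclasses such as IPM.Note.*).

```ruby
# Hypothetical lookup mirroring the documented mapping; not the gem's internals.
# Keys ending in "." are prefixes (the ".*" entries); all others match exactly.
TYPE_BY_CLASS = {
  "IPM.Note"              => :Message,
  "REPORT."               => :Message,
  "IPM.Contact"           => :Contact,
  "IPM.DistList"          => :Contact,
  "IPM.Appointment"       => :Appointment,
  "IPM.Schedule.Meeting." => :Appointment,
  "IPM.Task"              => :Task
}.freeze

# Returns the symbol for the documented type, or nil for any other class
# (the case the README describes as raising UnsupportedTypeError).
def returned_type_for(message_class)
  TYPE_BY_CLASS.each do |pattern, type|
    matched = pattern.end_with?(".") ? message_class.start_with?(pattern) : message_class == pattern
    return type if matched
  end
  nil
end
```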

All errors inherit from MsgExtractor::Error: InvalidFormatError, UnsupportedTypeError, and CorruptFileError.
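
Because the error classes share one base class, callers can rescue specific failures first and fall back to MsgExtractor::Error. The sketch below shows one possible ordering. The stand-in class definitions exist only so the snippet runs without the gem installed, and they assume each error subclasses MsgExtractor::Error directly; `classify_parse` and its return symbols are hypothetical.

```ruby
# Stand-ins so this sketch runs without the gem; with msg_extractor loaded,
# the real classes are used and this block is skipped.
unless defined?(MsgExtractor)
  module MsgExtractor
    class Error < StandardError; end
    class InvalidFormatError < Error; end
    class UnsupportedTypeError < Error; end
    class CorruptFileError < Error; end
  end
end

# Hypothetical wrapper: runs the block and maps each error class to a symbol.
def classify_parse
  yield
  :ok
rescue MsgExtractor::UnsupportedTypeError
  :unsupported # a caller could retry here with strict: false
rescue MsgExtractor::InvalidFormatError, MsgExtractor::CorruptFileError
  :unreadable
rescue MsgExtractor::Error
  :failed      # any other error raised by the library
end
```

In an application this might be called as `classify_parse { MsgExtractor.open(path) }`, with the specific rescues listed before the base class so they are not shadowed.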

CLI

msg_extractor FILE... [--out DIR] [--json] [--attachments-only]

Development

bundle install
rake test

Test fixtures and oracle data come from the Python extract_msg project, which this gem uses as a black-box behavioral reference (no code is translated from it). See tool/generate_oracle.py.

Credits

This gem was developed with Claude Code (Anthropic's Claude Fable 5 model), implementing the Microsoft open specifications ([MS-CFB], [MS-OXMSG], [MS-OXRTFCP], [MS-OXRTFEX]) and validated against the output of the Python extract_msg library.

License

MIT.