Module: MsgExtractor::Mapi::Decoders

Defined in:
lib/msg_extractor/mapi/decoders.rb

Overview

Converts raw MAPI property bytes into Ruby values. Text properties are transcoded to UTF-8, with invalid or unmappable sequences replaced and trailing NULs stripped; binary values are returned as-is and stay ASCII-8BIT.

Constant Summary

CODE_PAGES =
{
  437 => "IBM437", 850 => "IBM850", 932 => "Windows-31J", 936 => "GBK",
  949 => "EUC-KR", 950 => "Big5",
  1250 => "Windows-1250", 1251 => "Windows-1251", 1252 => "Windows-1252",
  1253 => "Windows-1253", 1254 => "Windows-1254", 1255 => "Windows-1255",
  1256 => "Windows-1256", 1257 => "Windows-1257", 1258 => "Windows-1258",
  20127 => "US-ASCII", 28591 => "ISO-8859-1", 28592 => "ISO-8859-2",
  28605 => "ISO8859-15", 65001 => "UTF-8"
}.freeze
EPOCH_DELTA =

Seconds between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix).

11_644_473_600
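The value can be re-derived from the two epoch dates. The following is a small sketch using Ruby's standard date library; the constant names DAYS_BETWEEN_EPOCHS and DERIVED_EPOCH_DELTA are illustrative and not part of the module.

```ruby
require "date"

# Whole days from the FILETIME epoch (1601-01-01) to the Unix epoch (1970-01-01).
# Both dates fall after the Gregorian reform, so Date's default calendar applies.
DAYS_BETWEEN_EPOCHS = (Date.new(1970, 1, 1) - Date.new(1601, 1, 1)).to_i

# 86_400 seconds per day gives the offset in seconds.
DERIVED_EPOCH_DELTA = DAYS_BETWEEN_EPOCHS * 86_400

puts DAYS_BETWEEN_EPOCHS  # 134774
puts DERIVED_EPOCH_DELTA  # 11644473600
```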

Class Method Summary

Class Method Details

.bytes_to_utf8(bytes, codepage) ⇒ Object

Decodes PR_HTML bytes into a UTF-8 string, using the code page given by PR_INTERNET_CPID. Delegates to string8.



# File 'lib/msg_extractor/mapi/decoders.rb', line 51

def bytes_to_utf8(bytes, codepage) = string8(bytes, codepage)

.decode(type, bytes, codepage: 1252) ⇒ Object

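Decodes bytes according to the MAPI property type; types without a dedicated decoder are returned raw. For fixed-width types, bytes may be the full 8-byte record value field; unpack reads only the leading bytes it needs.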



# File 'lib/msg_extractor/mapi/decoders.rb', line 23

def decode(type, bytes, codepage: 1252)
  case type
  when PT_UNICODE then utf16(bytes)
  when PT_STRING8 then string8(bytes, codepage)
  when PT_SYSTIME then filetime(bytes.unpack1("Q<"))
  when PT_LONG    then bytes.unpack1("l<")
  when PT_SHORT   then bytes.unpack1("s<")
  when PT_I8      then bytes.unpack1("q<")
  when PT_DOUBLE  then bytes.unpack1("E")
  when PT_BOOLEAN then (bytes.unpack1("v") || 0) != 0
  else bytes # PT_BINARY, PT_OBJECT, PT_CLSID and anything unknown: raw
  end
end
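The "leading bytes" behaviour can be seen with String#unpack1 alone. This is a minimal sketch, assuming a hypothetical 8-byte value field that holds a 32-bit -2 followed by four padding bytes; the constant names are illustrative and not part of the module.

```ruby
# Hypothetical 8-byte value field: FE FF FF FF 00 00 00 00
VALUE_FIELD = [-2, 0].pack("l<l<")

# Each directive consumes only as many leading bytes as its width.
AS_LONG  = VALUE_FIELD.unpack1("l<") # PT_LONG: first 4 bytes
AS_SHORT = VALUE_FIELD.unpack1("s<") # PT_SHORT: first 2 bytes
AS_I8    = VALUE_FIELD.unpack1("q<") # PT_I8: all 8 bytes, so padding counts

# PT_BOOLEAN: any nonzero 16-bit word is true.
AS_BOOLEAN = (VALUE_FIELD.unpack1("v") || 0) != 0

# unpack1 returns nil when the input is too short, hence the `|| 0` guard.
EMPTY_BOOLEAN = ("".b.unpack1("v") || 0) != 0

p AS_LONG, AS_SHORT, AS_I8, AS_BOOLEAN, EMPTY_BOOLEAN
```

The same bytes give -2 as PT_LONG and PT_SHORT but 4294967294 as PT_I8, so the caller has to pass the property's actual type.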

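.filetime(ticks) ⇒ Object

Converts a FILETIME tick count (100-nanosecond intervals since 1601-01-01 UTC) into a UTC Time. Returns nil when ticks is nil or zero.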



# File 'lib/msg_extractor/mapi/decoders.rb', line 53

def filetime(ticks)
  return nil if ticks.nil? || ticks.zero?
  Time.at(Rational(ticks, 10_000_000) - EPOCH_DELTA).utc
end
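A worked example of the conversion follows. It is a standalone sketch: filetime_sketch and EPOCH_DELTA_SKETCH are illustrative copies of the method and constant listed on this page, so the snippet runs without loading the module.

```ruby
EPOCH_DELTA_SKETCH = 11_644_473_600 # seconds from 1601-01-01 to 1970-01-01

# Standalone copy of the conversion, for illustration only.
def filetime_sketch(ticks)
  return nil if ticks.nil? || ticks.zero?
  # Rational keeps the 100 ns resolution exact (no Float rounding).
  Time.at(Rational(ticks, 10_000_000) - EPOCH_DELTA_SKETCH).utc
end

# 11_644_473_600 s * 10_000_000 ticks/s lands exactly on the Unix epoch.
p filetime_sketch(116_444_736_000_000_000) # 1970-01-01 00:00:00 UTC

# 2000-01-01 00:00:00 UTC plus a single tick (100 ns).
t = filetime_sketch(125_911_584_000_000_001)
p t.year, t.nsec # 2000, 100

# Zero ticks is treated as "no timestamp".
p filetime_sketch(0) # nil
```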

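.string8(bytes, codepage) ⇒ Object

Decodes bytes using the encoding that CODE_PAGES maps codepage to (Windows-1252 when the code page is not listed), transcodes to UTF-8 with invalid and undefined sequences replaced, and strips trailing NULs.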



# File 'lib/msg_extractor/mapi/decoders.rb', line 43

def string8(bytes, codepage)
  encoding = CODE_PAGES.fetch(codepage, "Windows-1252")
  bytes.dup.force_encoding(encoding)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
       .sub(/\0+\z/, "")
end
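The code page lookup and the Windows-1252 fallback can be illustrated as follows. This is a standalone sketch: string8_sketch and the two-entry CODE_PAGES_SKETCH table are illustrative stand-ins for the method and constant listed on this page, and the sample bytes are made up.

```ruby
# Trimmed-down stand-in for CODE_PAGES (illustration only).
CODE_PAGES_SKETCH = { 1251 => "Windows-1251", 1252 => "Windows-1252" }.freeze

def string8_sketch(bytes, codepage)
  encoding = CODE_PAGES_SKETCH.fetch(codepage, "Windows-1252")
  bytes.dup.force_encoding(encoding) # label the bytes; dup leaves the input untouched
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
       .sub(/\0+\z/, "")             # drop the trailing NUL terminator(s)
end

# "Привет" in Windows-1251, NUL-terminated.
RUSSIAN = string8_sketch("\xCF\xF0\xE8\xE2\xE5\xF2\x00".b, 1251)

# An unlisted code page falls back to Windows-1252, where 0xE9 is "é".
FALLBACK = string8_sketch("caf\xE9".b, 99_999)

puts RUSSIAN, FALLBACK
```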

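.utf16(bytes) ⇒ Object

Interprets bytes as UTF-16LE, transcodes to UTF-8 with invalid and undefined sequences replaced, and strips trailing NULs.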



# File 'lib/msg_extractor/mapi/decoders.rb', line 37

def utf16(bytes)
  bytes.dup.force_encoding(Encoding::UTF_16LE)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
       .sub(/\0+\z/, "")
end
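A short round trip shows the effect on a NUL-terminated PT_UNICODE value. This is a standalone sketch: utf16_sketch is an illustrative copy of the method listed on this page, and the sample bytes are made up.

```ruby
def utf16_sketch(bytes)
  bytes.dup.force_encoding(Encoding::UTF_16LE)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
       .sub(/\0+\z/, "") # the UTF-16 terminator becomes a single NUL in UTF-8
end

# "Hié" plus a terminator, as raw little-endian bytes: 48 00 69 00 E9 00 00 00
RAW_UTF16 = "Hi\u00E9\0".encode(Encoding::UTF_16LE).b

DECODED = utf16_sketch(RAW_UTF16)
puts DECODED          # Hié
puts DECODED.encoding # UTF-8
```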