Module: MsgExtractor::Mapi::Decoders
- Defined in:
- lib/msg_extractor/mapi/decoders.rb
Overview
Converts raw MAPI property bytes into Ruby values. All text becomes UTF-8 with invalid sequences replaced; binary stays ASCII-8BIT.
Constant Summary
- CODE_PAGES =
{
  437 => "IBM437", 850 => "IBM850",
  932 => "Windows-31J", 936 => "GBK", 949 => "EUC-KR", 950 => "Big5",
  1250 => "Windows-1250", 1251 => "Windows-1251", 1252 => "Windows-1252",
  1253 => "Windows-1253", 1254 => "Windows-1254", 1255 => "Windows-1255",
  1256 => "Windows-1256", 1257 => "Windows-1257", 1258 => "Windows-1258",
  20127 => "US-ASCII",
  28591 => "ISO-8859-1", 28592 => "ISO-8859-2", 28605 => "ISO8859-15",
  65001 => "UTF-8"
}.freeze
- EPOCH_DELTA =
Seconds between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix).
11_644_473_600
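The value of EPOCH_DELTA can be checked independently. The following is a minimal sketch using only Ruby's standard `date` library; the constant names are illustrative and are not part of the module.

```ruby
require "date"

# Whole days between the FILETIME epoch (1601-01-01) and the Unix epoch (1970-01-01).
DAYS_1601_TO_1970 = (Date.new(1970, 1, 1) - Date.new(1601, 1, 1)).to_i

# Each of those days is counted as exactly 86,400 seconds.
DERIVED_EPOCH_DELTA = DAYS_1601_TO_1970 * 86_400

puts DAYS_1601_TO_1970
puts DERIVED_EPOCH_DELTA
```

The span covers 369 years containing 89 leap days, which is where the day count comes from.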
Class Method Summary
- .bytes_to_utf8(bytes, codepage) ⇒ Object
PR_HTML bytes -> UTF-8 string using PR_INTERNET_CPID.
- .decode(type, bytes, codepage: 1252) ⇒ Object
For fixed-width types, bytes may be the full 8-byte record value field; unpack reads only the leading bytes it needs.
- .filetime(ticks) ⇒ Object
- .string8(bytes, codepage) ⇒ Object
- .utf16(bytes) ⇒ Object
Class Method Details
.bytes_to_utf8(bytes, codepage) ⇒ Object
PR_HTML bytes -> UTF-8 string using PR_INTERNET_CPID.
    # File 'lib/msg_extractor/mapi/decoders.rb', line 51
    def bytes_to_utf8(bytes, codepage) = string8(bytes, codepage)
.decode(type, bytes, codepage: 1252) ⇒ Object
For fixed-width types, bytes may be the full 8-byte record value field; unpack reads only the leading bytes it needs.
    # File 'lib/msg_extractor/mapi/decoders.rb', line 23
    def decode(type, bytes, codepage: 1252)
      case type
      when PT_UNICODE then utf16(bytes)
      when PT_STRING8 then string8(bytes, codepage)
      when PT_SYSTIME then filetime(bytes.unpack1("Q<"))
      when PT_LONG then bytes.unpack1("l<")
      when PT_SHORT then bytes.unpack1("s<")
      when PT_I8 then bytes.unpack1("q<")
      when PT_DOUBLE then bytes.unpack1("E")
      when PT_BOOLEAN then (bytes.unpack1("v") || 0) != 0
      else bytes # PT_BINARY, PT_OBJECT, PT_CLSID and anything unknown: raw
      end
    end
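To illustrate the note about fixed-width types, the sketch below passes a full 8-byte field to the same `unpack1` directives that `decode` uses. The helper name `leading_values` and the sample field layout (a 4-byte value followed by 4 bytes of padding) are assumptions made for this example only.

```ruby
# Hypothetical helper: applies three of the directives used by decode
# to the same byte string and reports what each one reads.
def leading_values(field)
  {
    long: field.unpack1("l<"),                 # first 4 bytes, signed little-endian
    short: field.unpack1("s<"),                # first 2 bytes, signed little-endian
    boolean: (field.unpack1("v") || 0) != 0    # first 2 bytes, unsigned; nil counts as 0
  }
end

# Assumed layout: 32-bit value -7 followed by 4 zero bytes of padding.
field = [-7, 0].pack("l<l<")

p leading_values(field)
p leading_values("".b)
```

With an empty string, `unpack1` returns `nil` because there are too few bytes to read, which is why the boolean branch falls back to `0` before comparing.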
.filetime(ticks) ⇒ Object
    # File 'lib/msg_extractor/mapi/decoders.rb', line 53
    def filetime(ticks)
      return nil if ticks.nil? || ticks.zero?
      Time.at(Rational(ticks, 10_000_000) - EPOCH_DELTA).utc
    end
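A worked conversion may help here. The sketch below repeats the same arithmetic as a standalone method and adds an inverse for round-trip checking. The names `filetime_to_time`, `time_to_filetime`, and `FILETIME_EPOCH_DELTA` are hypothetical and exist only in this example.

```ruby
FILETIME_EPOCH_DELTA = 11_644_473_600 # seconds between 1601-01-01 and 1970-01-01

# Ticks are 100-nanosecond intervals since 1601-01-01 UTC.
# Rational keeps the sub-second part exact instead of rounding through Float.
def filetime_to_time(ticks)
  return nil if ticks.nil? || ticks.zero?
  Time.at(Rational(ticks, 10_000_000) - FILETIME_EPOCH_DELTA).utc
end

# Hypothetical inverse, useful for building test fixtures.
def time_to_filetime(time)
  ((time.to_r + FILETIME_EPOCH_DELTA) * 10_000_000).to_i
end

ticks = time_to_filetime(Time.utc(2020, 1, 1))
puts ticks
puts filetime_to_time(ticks)

# The raw property value is 8 little-endian bytes, matching the "Q<" directive.
raw = [ticks].pack("Q<")
puts filetime_to_time(raw.unpack1("Q<"))
```

A tick count of zero is treated as "no value" and yields `nil` rather than a date in 1601.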
.string8(bytes, codepage) ⇒ Object
    # File 'lib/msg_extractor/mapi/decoders.rb', line 43
    def string8(bytes, codepage)
      encoding = CODE_PAGES.fetch(codepage, "Windows-1252")
      bytes.dup.force_encoding(encoding)
           .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
           .sub(/\0+\z/, "")
    end
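The effect of the code page lookup and the Windows-1252 fallback can be seen with a few sample inputs. This is a standalone sketch that mirrors the method above with a two-entry table; `SAMPLE_CODE_PAGES` and `sample_string8` are illustrative names, not part of the module.

```ruby
# Small subset of the code page table, for illustration only.
SAMPLE_CODE_PAGES = { 1251 => "Windows-1251", 1252 => "Windows-1252" }.freeze

def sample_string8(bytes, codepage)
  # Unknown code pages fall back to Windows-1252.
  encoding = SAMPLE_CODE_PAGES.fetch(codepage, "Windows-1252")
  bytes.dup.force_encoding(encoding)                                  # label the bytes, no conversion yet
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)   # transcode to UTF-8
       .sub(/\0+\z/, "")                                              # drop trailing NUL terminators
end

puts sample_string8("caf\xE9\x00".b, 1252)               # Western European bytes with a NUL terminator
puts sample_string8("\xCF\xF0\xE8\xE2\xE5\xF2".b, 1251)  # Cyrillic bytes
puts sample_string8("caf\xE9".b, 99_999)                 # unknown code page, decoded as Windows-1252
```

Because the input is duplicated before `force_encoding`, the caller's byte string keeps its original encoding.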
.utf16(bytes) ⇒ Object
    # File 'lib/msg_extractor/mapi/decoders.rb', line 37
    def utf16(bytes)
      bytes.dup.force_encoding(Encoding::UTF_16LE)
           .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
           .sub(/\0+\z/, "")
    end
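As a quick illustration, the sketch below builds a NUL-terminated UTF-16LE byte string and decodes it with a standalone copy of the method. `sample_utf16` and `SAMPLE_UTF16_BYTES` are names made up for this example.

```ruby
def sample_utf16(bytes)
  bytes.dup.force_encoding(Encoding::UTF_16LE)                        # label the bytes as UTF-16LE
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)   # transcode to UTF-8
       .sub(/\0+\z/, "")                                              # drop the trailing NUL terminator
end

# "Hié" as raw UTF-16LE bytes followed by a two-byte NUL terminator.
SAMPLE_UTF16_BYTES = "Hi\u00E9".encode(Encoding::UTF_16LE).b + "\x00\x00".b

puts SAMPLE_UTF16_BYTES.bytesize
puts sample_utf16(SAMPLE_UTF16_BYTES)
```

The terminator is stripped after transcoding, so the pattern only has to match UTF-8 NUL characters.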