Module: Sisimai::Message

Defined in:
lib/sisimai/message.rb

Overview

Sisimai::Message converts bounce email text into a data structure. It resolves the email text into a UNIX From line, the header part of the mail, the delivery status, and the RFC822 header part. When the email given as an argument to the “rise” method is not a bounce email, the method returns nil.

Constant Summary

DefaultSet =
Sisimai::Order.another.freeze
LhostTable =
Sisimai::Lhost.path.freeze
Fields1894 =
Sisimai::RFC1894.FIELDINDEX.freeze
Fields5322 =
Sisimai::RFC5322.FIELDINDEX.freeze
Fields5965 =
Sisimai::RFC5965.FIELDINDEX.freeze
FieldIndex =
[Fields1894.flatten, Fields5322.flatten, Fields5965.flatten].flatten.freeze
FieldTable =
FieldIndex.map { |e| [e.downcase, e] }.to_h.freeze
Boundaries =
['Content-Type: message/rfc822', 'Content-Type: text/rfc822-headers'].freeze
MediaTypes =
[
  %w[message/xdelivery-status message/delivery-status],
  %w[message/disposition-notification message/delivery-status],
  %w[message/global-delivery-status message/delivery-status],
  %w[message/global-disposition-notification  message/delivery-status],
  %w[message/global-delivery-status message/delivery-status],
  %w[message/global-headers text/rfc822-headers],
  %w[message/global message/rfc822],
].freeze
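
The way FieldTable and MediaTypes are used can be illustrated with a small standalone sketch. The constants below are a hypothetical subset chosen for the example (the real lists come from Sisimai::RFC1894, Sisimai::RFC5322, and Sisimai::RFC5965), and canonical_field and normalize_media_type are illustrative helper names, not part of Sisimai.

```ruby
# Hypothetical subset of field names; the real FieldIndex is much longer.
FIELD_NAMES = %w[Content-Type Diagnostic-Code Final-Recipient Reporting-MTA].freeze

# Same construction as FieldTable: lower-cased name => canonical name.
FIELD_TABLE = FIELD_NAMES.map { |e| [e.downcase, e] }.to_h.freeze

# Hypothetical subset of MediaTypes: [non-standard type, replacement].
MEDIA_TYPES = [
  %w[message/xdelivery-status message/delivery-status],
  %w[message/global-headers text/rfc822-headers]
].freeze

# Return the canonical spelling of the field name on this line, or nil
# when the line has no ":" or the name is not in the table.
def canonical_field(line)
  colon = line.index(':')
  return nil if colon.nil?

  FIELD_TABLE[line[0, colon].downcase.rstrip]
end

# Replace the first known non-standard media type with its standard form.
def normalize_media_type(value)
  MEDIA_TYPES.each do |from, to|
    return value.sub(from, to) if value.include?(from)
  end
  value
end
```

The lookup is case-insensitive because the table keys are lower-cased once, when the table is built, rather than on every comparison.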

Class Method Summary

Class Method Details

.makemap(argv0 = '', argv1 = nil) ⇒ Hash

Convert text containing email headers into a hash

Parameters:

  • argv0 (String) (defaults to: '')

    Email header data

  • argv1 (Bool) (defaults to: nil)

    Decode “Subject:” header

Returns:

  • (Hash)

    Structured email header data

Since:

  • v4.25.6



# File 'lib/sisimai/message.rb', line 146

def makemap(argv0 = '', argv1 = nil)
  return {} if argv0.empty?
  argv0.gsub!(/^[>]+[ ]/m, '') # Remove '>' indent symbol of forwarded message

  # Select and convert all the headers in $argv0. The following regular expression is based on
  # https://gist.github.com/xtetsuji/b080e1f5551d17242f6415aba8a00239
  headermaps = {'subject' => ''}
  receivedby = []
  argv0.scan(/^([\w-]+):[ ]*(.*?)\n(?!\s)/m) { |e| headermaps[e[0].downcase] = e[1] }
  headermaps.delete('received')
  headermaps.each_key { |e| headermaps[e] = headermaps[e].gsub(/\n\s+/, ' ') }

  if argv0.include?('Received:')
    # Capture values of each Received: header
    re = argv0.scan(/^Received:[ ]*(.*?)\n(?!\s)/m).flatten
    re.each do |e|
      # 1. Exclude the Received header including "(qmail ** invoked from network)".
      # 2. Convert all consecutive spaces and line breaks into a single space character.
      next if e.include?(' invoked by uid')
      next if e.include?(' invoked from network')

      e.gsub!(/\n\s+/, ' ')
      e.squeeze!("\n\t ")
      receivedby << e
    end
  end
  headermaps['received'] = receivedby

  return headermaps if argv1.nil? || headermaps['subject'].empty?

  # Convert MIME-Encoded subject
  if Sisimai::String.is_8bit(headermaps['subject'])
    # The value of the "Subject" header includes multibyte characters and is not MIME-encoded text.
    headermaps['subject'].scrub!('?')
  else
    # MIME-Encoded subject field or ASCII characters only
    r = []
    if Sisimai::RFC2045.is_encoded(headermaps['subject'])
      # split the value of Subject by borderline
      headermaps['subject'].split(/ /).each do |v|
        # Insert value to the array if the string is MIME encoded text
        r << v if Sisimai::RFC2045.is_encoded(v)
      end
    else
      # Subject line is not MIME encoded
      r << headermaps['subject']
    end
    headermaps['subject'] = Sisimai::RFC2045.decodeH(r)
  end
  return headermaps
end
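
The header-scanning step of makemap can be tried in isolation. The following is a minimal sketch that assumes the same regular expression as the listing above; parse_headers is a hypothetical name, and the sketch skips the qmail "invoked" filter and the Subject decoding.

```ruby
# Parse a header block into a Hash keyed by lower-cased field name.
# Folded (continuation) lines are joined with a single space, and every
# Received: header is collected into an Array under 'received'.
def parse_headers(text)
  headers  = {}
  received = []

  # A header ends at a newline that is NOT followed by whitespace, so a
  # folded value (continuation lines start with a space or tab) is
  # captured as one match.
  text.scan(/^([\w-]+):[ ]*(.*?)\n(?!\s)/m) do |name, value|
    unfolded = value.gsub(/\n\s+/, ' ')
    if name.casecmp?('received')
      received << unfolded
    else
      headers[name.downcase] = unfolded
    end
  end

  headers['received'] = received
  headers
end

sample = "Received: from mx1.example.jp\n\tby mx2.example.jp\n" \
         "Subject: Returned mail:\n see transcript\n" \
         "Received: from localhost\n" \
         "To: kijitora@example.jp\n"
parsed = parse_headers(sample)
```

Received: is kept as a list because it is the one field that routinely appears several times in a message; every other field keeps only its last occurrence.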

.part(email) ⇒ Array

Divide email data into a UNIX From line, headers, and a body part.

Parameters:

  • email (String)

    Email data

Returns:

  • (Array)

    Email data after the split (From line, headers, body), or nil if the data is empty or has no body part



# File 'lib/sisimai/message.rb', line 106

def part(email)
  return nil if email.empty?

  parts = ['', '', '']  # 0:From, 1:Header, 2:Body
  email.gsub!(/\A\s+/, '')
  email.gsub!(/\r\n/, "\n") if email.include?("\r\n")

  (parts[1], parts[2]) = email.split(/\n\n/, 2)
  return nil if parts[1].nil? || parts[2].nil?

  if parts[1].start_with?('From ')
    # From MAILER-DAEMON Tue Feb 11 00:00:00 2014
    parts[0] = parts[1].split(/\n/, 2)[0].delete("\r")
  else
    # Set pseudo UNIX From line
    parts[0] = 'MAILER-DAEMON Tue Feb 11 00:00:00 2014'
  end
  parts[1] += "\n" if parts[1].end_with?("\n") == false

  %w[image/ application/ text/html].each do |e|
    # https://github.com/sisimai/p5-sisimai/issues/492, Reduce email size
    p0 = 0
    p1 = 0
    ep = e == 'text/html' ? '</html>' : "--\n"
    while true
      # Remove each part from "Content-Type: image/..." to "--\n" (the end of each boundary)
      p0 = parts[2].index("Content-Type: #{e}", p0); break if p0.nil?
      p1 = parts[2].index(ep, p0 + 32);              break if p1.nil?
      parts[2][p0, p1 - p0] = ''
    end
  end
  parts[2] += "\n"
  return parts
end
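
The core of part is the split at the first blank line. The sketch below reproduces only that step under simplified assumptions: split_email is a hypothetical name, the input string is not modified in place, no pseudo From line is generated, and the attachment-stripping loop is omitted.

```ruby
# Split raw email text into [from_line, header_block, body].
# Returns nil when there is no blank line separating headers from body.
def split_email(email)
  text = email.sub(/\A\s+/, '').gsub("\r\n", "\n")

  # Limit 2: only the first blank line separates headers from the body,
  # later blank lines belong to the body.
  header, body = text.split(/\n\n/, 2)
  return nil if header.nil? || body.nil?

  # An mbox-style message starts with "From " (no colon).
  from = header.start_with?('From ') ? header.split("\n", 2)[0] : ''
  [from, header, body]
end

raw = "From MAILER-DAEMON Tue Feb 11 00:00:00 2014\r\n" \
      "Subject: Returned mail\r\n\r\nfirst paragraph\r\n\r\nsecond paragraph\r\n"
from, header, body = split_email(raw)
```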

.rise(**argvs) ⇒ Sisimai::Message

Read an email message and convert to structured format

Parameters:

  • argvs (Hash)

    Email data (:data) and an optional callback method (:hook)

Returns:

  • (Sisimai::Message)

    Structured email data, or nil if the email could not be decoded as a bounce message



# File 'lib/sisimai/message.rb', line 40

def rise(**argvs)
  return nil if argvs.nil?
  email = argvs[:data].scrub('?').gsub("\r\n", "\n")
  thing = {'from' => '','header' => {}, 'rfc822' => '', 'ds' => [], 'catch' => nil}
  param = {}

  aftersplit = nil
  beforefact = nil
  parseagain = 0

  while parseagain < 2 do
    # 1. Split email data to headers and a body part.
    break unless aftersplit = Sisimai::Message.part(email)

    # 2. Convert email headers from text to hash reference
    thing['from']   = aftersplit[0]
    thing['header'] = Sisimai::Message.makemap(aftersplit[1])

    # 3. Decode and rewrite the "Subject:" header
    if thing['header']['subject'].empty? == false
      # Decode MIME-Encoded "Subject:" header
      cv = thing['header']['subject']
      cq = Sisimai::RFC2045.is_encoded(cv) ? Sisimai::RFC2045.decodeH(cv.split(/[ ]/)) : cv
      cl = cq.downcase
      p1 = cl.index('fwd:'); p1 = cl.index('fw:') if p1.nil?

      # Remove "Fwd:" string from the Subject: header
      if p1
        # Delete quoted strings, quote symbols(>)
        cq = cq[cq.index(':') + 1, cq.size]
        aftersplit[2] = aftersplit[2].gsub(/^[>][ ]/, '').gsub(/^[>]$/, '')
      end
      thing['header']['subject'] = cq
    end

    # 4. Rewrite message body for detecting the bounce reason
    param = {
      'hook' => argvs[:hook] || nil,
      'mail' => thing,
      'body' => aftersplit[2],
      'tryonfirst' => Sisimai::Order.make(thing['header']['subject'])
    }
    break if beforefact = Sisimai::Message.sift(param)
    break if Boundaries.none? { |a| aftersplit[2].include?(a) }

    # 5. Try to sift again
    #    There is a bounce message inside of multipart/*, try to sift the first message/rfc822
    #    part as an entire message body again.
    parseagain += 1
    email = Sisimai::RFC5322.part(aftersplit[2], Boundaries, true).pop.sub(/\A\s+/, '')
    break if email.size < 128
  end
  return nil if beforefact.nil? || beforefact.empty?

  # 6. Rewrite headers of the original message in the body part
  %w|ds catch rfc822|.each { |e| thing[e] = beforefact[e] }
  p = beforefact['rfc822']
  p = aftersplit[2] if p.empty?
  thing['rfc822'] = p.is_a?(::String) ? Sisimai::Message.makemap(p, true) : p

  return thing
end
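
Step 3 of rise (handling a forwarded bounce) can be sketched on its own. The helper name unforward is hypothetical, and unlike the listing above this sketch trims the leading space left behind after the removed prefix; treat it as an illustration of the idea rather than the library's exact behavior.

```ruby
# If the subject carries a "Fwd:" or "Fw:" marker, drop everything up to
# the first ":" and remove "> " quote marks from the body.
# Returns [subject, body]; both are returned unchanged when there is no marker.
def unforward(subject, body)
  lowered = subject.downcase
  marker  = lowered.index('fwd:') || lowered.index('fw:')
  return [subject, body] if marker.nil?

  # Assumption of this sketch: strip the space that follows the prefix.
  stripped = subject[(subject.index(':') + 1)..-1].lstrip

  # Remove "> " at the start of quoted lines, then empty ">" lines.
  unquoted = body.gsub(/^> /, '').gsub(/^>$/, '')
  [stripped, unquoted]
end

subject, body = unforward('Fwd: Returned mail', "> line one\n>\n> line two\n")
```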

.sift(argvs) ⇒ Hash

This method is abstract.

Sift bounce mail with each MTA module

Returns Decoded and structured bounce mails.

Parameters:

  • argvs (Hash)

    Processing message entity.

Options Hash (argvs):

  • mail (Hash)

    Email message entity

  • body (String)

    Email message body

  • tryonfirst (Array)

    MTA module list to load first

Options Hash (mail):

  • from (String)

    From line of mbox

  • header (Hash)

    Email header data

  • rfc822 (String)

    Original message part

  • ds (Array)

    Delivery status list (decoded data)

Returns:

  • (Hash)

    Decoded and structured bounce mails



# File 'lib/sisimai/message.rb', line 316

def sift(argvs)
  return nil if argvs['mail'].nil? || argvs['body'].nil?

  mailheader = argvs['mail']['header']
  bodystring = argvs['body']
  hookmethod = argvs['hook'] || nil
  havecaught = nil
  return nil if mailheader.nil?

  # PRECHECK_EACH_HEADER:
  # Set empty string if the value is nil
  mailheader['from']         ||= ''
  mailheader['subject']      ||= ''
  mailheader['content-type'] ||= ''

  # Tidy up each field name and value in the entire message body
  bodystring = Sisimai::Message.tidy(bodystring)

  # Decode BASE64 Encoded message body, rewrite.
  mesgformat = (mailheader['content-type'] || '').downcase
  ctencoding = (mailheader['content-transfer-encoding'] || '').downcase
  if mesgformat.start_with?('text/plain', 'text/html')
    # Content-Type: text/plain; charset=UTF-8
    if ctencoding == 'base64'
      # Content-Transfer-Encoding: base64
      bodystring = Sisimai::RFC2045.decodeB(bodystring)

    elsif ctencoding == 'quoted-printable'
      # Content-Transfer-Encoding: quoted-printable
      bodystring = Sisimai::RFC2045.decodeQ(bodystring)
    end

    if mesgformat.start_with?('text/html;')
      # Content-Type: text/html;...
      bodystring = Sisimai::String.to_plain(bodystring, true)
    end
  elsif mesgformat.start_with?('multipart/')
    # NOT text/plain
    # In case of Content-Type: multipart/*
    p = Sisimai::RFC2045.makeflat(mailheader['content-type'], bodystring)
    bodystring = p if p.empty? == false
  end
  bodystring = bodystring.scrub('?').delete("\r").gsub("\t", " ")

  haveloaded = {}
  havesifted = nil
  modulename = ''
  if hookmethod.is_a? Proc
    # Call the hook method
    begin
      p = {'headers' => mailheader, 'message' => bodystring}
      havecaught = hookmethod.call(p)
    rescue StandardError => ce
      warn ' ***warning: Something is wrong in hook method ":hook":' + ce.to_s
    end
  end

  catch :DECODER do
    while true
      # 1. MTA Module Candidates to be tried on first
      # 2. Sisimai::Lhost::*
      # 3. Sisimai::RFC3464
      # 4. Sisimai::ARF
      # 5. Sisimai::RFC3834
      [argvs['tryonfirst'], DefaultSet].flatten.each do |r|
        # Try MTA module candidates
        next if haveloaded[r]
        require LhostTable[r]
        havesifted = Module.const_get(r).inquire(mailheader, bodystring)
        haveloaded[r] = true
        modulename = r
        throw :DECODER if havesifted
      end

      unless haveloaded['Sisimai::RFC3464']
        # When none of the Sisimai::Lhost::* modules returned bounce data, call Sisimai::RFC3464
        require 'sisimai/rfc3464'
        havesifted = Sisimai::RFC3464.inquire(mailheader, bodystring)
        modulename = 'RFC3464'
        throw :DECODER if havesifted
      end

      unless haveloaded['Sisimai::ARF']
        # Feedback Loop message
        require 'sisimai/arf'
        havesifted = Sisimai::ARF.inquire(mailheader, bodystring)
        modulename = "ARF"
        throw :DECODER if havesifted
      end

      unless haveloaded['Sisimai::RFC3834']
        # Try to sift the message as auto reply message defined in RFC3834
        require 'sisimai/rfc3834'
        havesifted = Sisimai::RFC3834.inquire(mailheader, bodystring)
        modulename = 'RFC3834'
        throw :DECODER if havesifted
      end

      break # as of now, we have no sample email for coding this block
    end
  end
  return nil if havesifted.nil?

  havesifted['catch'] = havecaught
  modulename = modulename.sub(/\A.+::/, '')
  havesifted['ds'].each do |e|
    e["agent"] = modulename if e["agent"].nil? || e["agent"].empty?
    e.each_key { |a| e[a] = "" if e[a].nil? }  # Replace nil with ""
  end
  return havesifted
end
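
The dispatch pattern inside sift (try the preferred decoders first, skip any already tried, stop at the first one that returns data) can be sketched without the MTA modules. Everything named here is hypothetical: DECODERS holds stand-in lambdas in place of the Sisimai::Lhost::* modules, and sift_with is an illustrative helper.

```ruby
# Stand-in decoders: each returns a Hash with a 'ds' list, or nil when
# the body is not in a format it recognizes.
DECODERS = {
  'Alpha' => ->(body) { body.include?('alpha') ? { 'ds' => [{ 'agent' => '' }] } : nil },
  'Beta'  => ->(body) { body.include?('beta')  ? { 'ds' => [{ 'agent' => 'custom' }] } : nil }
}.freeze

# Try decoders in order (preferred ones first) and return the first result.
def sift_with(try_first, default_order, body)
  tried  = {}
  result = nil
  agent  = ''

  catch(:decoded) do
    (try_first + default_order).each do |name|
      next if tried[name]            # never run the same decoder twice

      tried[name] = true
      result = DECODERS.fetch(name).call(body)
      agent  = name
      throw :decoded if result       # leave the search at the first hit
    end
  end
  return nil if result.nil?

  # Fill in the agent name only where the decoder left it blank.
  result['ds'].each { |e| e['agent'] = agent if e['agent'].to_s.empty? }
  result
end
```

Using catch and throw lets the search exit from inside the iteration as soon as one decoder succeeds, which is the same role :DECODER plays in the listing above.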

.tidy(argv0 = '') ⇒ String

This method is abstract.

Tidy up each field name and format

Returns Strings tidied up.

Parameters:

  • argv0 (String) (defaults to: '')

    String containing the fields and values used in an email

Returns:

  • (String)

    Strings tidied up

Since:

  • v5.0.0



# File 'lib/sisimai/message.rb', line 202

def tidy(argv0 = '')
  return '' if argv0.empty?

  email = ''
  lines = argv0.split("\n")
  index = -1
  lines.each do |e|
    # Find and tidy up fields defined in RFC5322, RFC1894, and RFC5965
    # 1. Find a field label defined in RFC5322, RFC1894, or RFC5965 from this line
    p0 = e.index(':') || -1
    cf = e.downcase[0, p0].to_s.rstrip
    fn = FieldTable[cf] || ''

    index += 1
    if fn == ''
      # There is neither ":" character nor the field listed in $FieldTable
      email += "#{e}\n"
      next
    end

    # 2. Tidy up a sub type of each field defined in RFC1894 such as Reporting-MTA: DNS;...
    ab = []
    bf = e[p0 + 1, e.size - p0 - 1]
    p1 = bf.index(';')
    while true
      # Such as Diagnostic-Code, Remote-MTA, and so on
      # - Before: Diagnostic-Code: SMTP;550 User unknown
      # - After:  Diagnostic-Code: smtp; 550 User unknown
      break if ['Content-Type'].concat(Fields1894).none? { |a| a == fn }

      if p1
        # The field including one or more ";"
        bf.split(';').each do |f|
          # 2-1. Trim leading and trailing space characters from the current buffer
          f.strip!
          ps = ''

          # 2-2. Convert some parameters to the lower-cased string
          while true
            # For example,
            # - Content-Type: Message/delivery-status => message/delivery-status
            # - Content-Type: Charset=UTF8            => charset=utf8
            # - Reporting-MTA: DNS; ...               => dns
            # - Final-Recipient: RFC822; ...          => rfc822
            break if f.include?(' ')

            p2 = f.index('=')
            if p2
              # charset=, boundary=, and other pairs divided by "="
              ps = f[0, p2].downcase
              f[0, p2] = ps
            end
            f.downcase! if ps != 'boundary'
            f = 'rfc822' if f == 'rfc/822'
            break
          end
          ab << f
        end

        while true
          # Diagnostic-Code: x-unix;
          #   /var/email/kijitora/Maildir/tmp/1000000000.A000000B00000.neko22:
          #   Disk quota exceeded
          break if fn != 'Diagnostic-Code'
          break if ab.size != 1
          break if lines[index + 1].start_with?(' ') == false

          ab << ''
          break
        end
        bf = ab.join('; ') # Insert " " (space character) immediately after ";"
        ab = []

      else
        # There is no ";" in the field
        break if fn.end_with?('-Date')        # Arrival-Date, Last-Attempt-Date
        break if fn.end_with?('-Message-ID')  # X-Original-Message-ID
        bf.downcase!
      end
      break
    end

    # 3. Tidy up a value, and a parameter of Content-Type: field
    if fn == "Content-Type"
      # Replace the value of "Content-Type" field
      MediaTypes.each do |f|
        # - Before: Content-Type: message/xdelivery-status; ...
        # - After:  Content-Type: message/delivery-status; ...
        p1 = bf.index(f[0]) || next
        bf[p1, f[0].size] = f[1]
      end
    end

    # 4. Remove redundant space characters
    bf = bf.squeeze(' ').strip
    email += sprintf("%s: %s\n", fn, bf)
  end

  # 5. Convert the lower-cased SMTP command to the upper-cased.
  email  = email.gsub("after end of data:", "after end of DATA:")
  email += "\n" if email.end_with?("\n\n") == false
  return email
end
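
Step 2 of tidy, which normalizes the sub type of fields such as Diagnostic-Code, can be illustrated with a reduced sketch. tidy_subtype is a hypothetical helper that works on the field value only; it leaves out the boundary= exception, the "rfc/822" correction, and the multi-line Diagnostic-Code case handled in the listing above.

```ruby
# Normalize a ";"-separated field value:
# - trim each piece
# - lower-case pieces that contain no space (sub types such as SMTP, DNS)
# - rejoin with "; " and collapse repeated spaces
def tidy_subtype(value)
  pieces = value.split(';').map do |piece|
    piece = piece.strip
    piece.include?(' ') ? piece : piece.downcase
  end
  pieces.join('; ').squeeze(' ').strip
end

before = ' SMTP;550 User  unknown'
after  = tidy_subtype(before)
```

Pieces containing a space are left untouched because they are usually free-form text, such as the server's reply, whose case should be preserved.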