Class: Uniword::Infrastructure::MimeParser

Inherits:
Object
Defined in:
lib/uniword/infrastructure/mime_parser.rb

Overview

Parses MHTML (MIME HTML) files into an Mhtml::Document model.

Parses the MIME multipart structure, decodes content transfer encodings, creates typed MimePart objects, and populates the Mhtml::Document.
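The overview says the parser decodes content transfer encodings, but this page does not show how. Below is a minimal sketch of that step, assuming only the two RFC 2045 encodings that need decoding (base64 and quoted-printable) matter. The helper name `decode_transfer_encoding` is hypothetical and not part of Uniword; it relies on Ruby's built-in `String#unpack1` directives `"m"` (base64) and `"M"` (quoted-printable).

```ruby
# Hypothetical helper, not part of Uniword's documented API.
# Decodes a MIME part body according to its Content-Transfer-Encoding value.
def decode_transfer_encoding(body, encoding)
  case encoding.to_s.strip.downcase
  when "base64"
    # "m" unpacks base64; line breaks in the encoded text are skipped.
    body.unpack1("m")
  when "quoted-printable"
    # "M" unpacks quoted-printable, including "=\n" soft line breaks.
    body.unpack1("M")
  else
    # 7bit, 8bit, binary, or a missing header: the body is used unchanged.
    body
  end
end

decoded = decode_transfer_encoding("caf=C3=A9", "quoted-printable")
puts decoded.encoding                 # ASCII-8BIT
puts decoded.force_encoding("UTF-8")  # café
```

The decoded bytes come back tagged ASCII-8BIT, so a caller would still need to apply the part's declared charset before treating the result as text.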

Examples:

Parse an MHTML file

parser = Uniword::Infrastructure::MimeParser.new
document = parser.parse("document.mhtml")

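Instance Method Summary

  • #parse(path) ⇒ Mhtml::Document

    Parse an MHTML file and return a populated Mhtml::Document.

  • #parse_content(content) ⇒ Mhtml::Document

    Parse an MHTML content string.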

Instance Method Details

#parse(path) ⇒ Mhtml::Document

Parse an MHTML file and return a populated Mhtml::Document.

Parameters:

  • path (String)

    The file path to parse

Returns:

  • (Mhtml::Document)

    The populated document

Raises:

  • (ArgumentError)

    if path is nil or the file does not exist



# File 'lib/uniword/infrastructure/mime_parser.rb', line 21

def parse(path)
  raise ArgumentError, "Path cannot be nil" if path.nil?
  raise ArgumentError, "File not found: #{path}" unless File.exist?(path)

  content = File.binread(path).force_encoding("UTF-8")
  parse_content(content)
end
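`#parse` reads the file with `File.binread` and then calls `force_encoding("UTF-8")`. That call relabels the bytes without converting them, so a file whose 8-bit content is in another charset yields a string that claims to be UTF-8 but fails `valid_encoding?`. The sketch below illustrates this with `read_mhtml`, a hypothetical stand-in that copies the guards and read step shown above; it is not the Uniword method and returns the string instead of parsing it.

```ruby
require "tempfile"

# Hypothetical stand-in copying the guards and read step of #parse.
def read_mhtml(path)
  raise ArgumentError, "Path cannot be nil" if path.nil?
  raise ArgumentError, "File not found: #{path}" unless File.exist?(path)

  # File.binread returns raw bytes tagged ASCII-8BIT; force_encoding only
  # relabels them as UTF-8 and neither transcodes nor validates them.
  File.binread(path).force_encoding("UTF-8")
end

Tempfile.create("latin1") do |file|
  file.binmode
  file.write("caf\xE9".b) # "café" saved as ISO-8859-1, not UTF-8
  file.flush

  content = read_mhtml(file.path)
  p content.encoding        # => #<Encoding:UTF-8>
  p content.valid_encoding? # => false: the bytes were relabelled, not converted
end
```

This mostly matters for parts sent as 8bit or binary; base64 and quoted-printable bodies are plain ASCII until they are decoded.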

#parse_content(content) ⇒ Mhtml::Document

Parse an MHTML content string.

Parameters:

  • content (String)

    The MHTML content to parse

Returns:

  • (Mhtml::Document)

    The populated document



# File 'lib/uniword/infrastructure/mime_parser.rb', line 33

def parse_content(content)
  @content = content
  @boundary = extract_boundary
  @raw_parts = split_parts

  document = Mhtml::Document.new
  document.boundary = @boundary

  @raw_parts.each do |part|
    mime_part = parse_mime_part(part)
    next unless mime_part

    document.html_part = mime_part if mime_part.is_a?(Mhtml::HtmlPart) && !document.html_part

    document.add_part(mime_part)
  end

  (document)
  document
end
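`#parse_content` delegates to private helpers (`extract_boundary`, `split_parts`, `parse_mime_part`) whose bodies are not shown on this page. The sketch below shows one plausible way to extract the boundary, split the content on it as RFC 2046 describes, and apply the "first HTML part wins" rule from the loop above. All names here (`RawPart`, `find_boundary`, `split_on_boundary`, `parse_raw_part`, `first_html_part`) are hypothetical; folded headers and nested multiparts are not handled, and parts stay as raw header/body pairs rather than typed MimePart objects.

```ruby
# Hypothetical sketch of MIME multipart splitting; not Uniword's implementation.
RawPart = Struct.new(:headers, :body)

# Read the boundary parameter from the first "boundary=" in the content,
# which in MHTML sits in the top-level Content-Type header.
def find_boundary(content)
  match = content.match(/boundary\s*=\s*"?([^";\r\n]+)"?/i)
  match && match[1]
end

# Split on "--boundary" lines, dropping the preamble (top-level headers)
# and the closing "--boundary--" delimiter with any epilogue.
def split_on_boundary(content, boundary)
  delimiter = Regexp.escape("--#{boundary}")
  chunks = content.split(/^#{delimiter}/)
  chunks.drop(1).reject { |chunk| chunk.start_with?("--") }
end

# Separate one chunk into a lowercase header hash and a body string.
def parse_raw_part(chunk)
  text = chunk.sub(/\A\r?\n/, "")
  head, body = text.split(/\r?\n\r?\n/, 2)
  headers = {}
  head.to_s.each_line do |line|
    name, value = line.split(":", 2)
    headers[name.strip.downcase] = value.strip if value
  end
  # The line break before the next delimiter belongs to the delimiter.
  RawPart.new(headers, body.to_s.sub(/\r?\n\z/, ""))
end

# Mirror the loop in #parse_content: the first HTML part becomes html_part.
def first_html_part(parts)
  parts.find { |part| part.headers["content-type"].to_s.start_with?("text/html") }
end

sample = <<~MHTML.gsub("\n", "\r\n")
  MIME-Version: 1.0
  Content-Type: multipart/related; boundary="----=_NextPart_01"

  ------=_NextPart_01
  Content-Type: text/html; charset="utf-8"

  <p>Hello</p>
  ------=_NextPart_01
  Content-Type: image/png
  Content-Transfer-Encoding: base64

  iVBORw0KGgo=
  ------=_NextPart_01--
MHTML

boundary = find_boundary(sample)
parts = split_on_boundary(sample, boundary).map { |chunk| parse_raw_part(chunk) }
puts parts.size                  # 2
puts first_html_part(parts).body # <p>Hello</p>
```

Using `find` keeps the same precedence as the original loop, which assigns `html_part` only while it is still unset.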