Class: Uniword::FormatDetector
- Inherits: Object
  - Object
  - Uniword::FormatDetector
- Defined in: lib/uniword/format_detector.rb
Overview
Detects document format from file signatures and extensions.
Responsibility: identify the document format using file magic numbers, falling back to extension-based detection. Follows the Single Responsibility Principle: detection logic is kept separate from other concerns.
Detection strategy:
- Check the file signature (magic number)
- Check MIME headers for MHTML
- Fall back to the file extension
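The three steps above can be sketched as a single function operating on a file's leading bytes and its name. This is an illustrative sketch only, not the library's code: the method name sketch_detect, the extension table, and the returned symbols (:docx, :mhtml, :unknown) are assumptions made for the example.

```ruby
# Hypothetical sketch of the detection order; names and return symbols are assumed.
SKETCH_ZIP_MAGIC = [0x50, 0x4B, 0x03, 0x04].pack("C*").freeze # bytes "PK\x03\x04"
SKETCH_MIME_HEADER = "MIME-Version:"

# Assumed extension table, used only when content gives no answer.
SKETCH_EXTENSIONS = {
  ".docx" => :docx,
  ".mhtml" => :mhtml,
  ".mht" => :mhtml
}.freeze

def sketch_detect(head, filename)
  head = head.b # compare as raw bytes

  # Step 1: file signature (magic number)
  return :docx if head.start_with?(SKETCH_ZIP_MAGIC)

  # Step 2: MIME header, which marks MHTML content
  return :mhtml if head.include?(SKETCH_MIME_HEADER)

  # Step 3: fall back to the file extension
  SKETCH_EXTENSIONS.fetch(File.extname(filename).downcase, :unknown)
end

puts sketch_detect("PK\x03\x04rest", "upload.bin")
puts sketch_detect("plain text", "report.docx")
```

Checking content before the extension means a renamed or extensionless file can still be classified from its bytes; the extension is consulted only when the content is inconclusive.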
Constant Summary
- ZIP_SIGNATURE = [0x50, 0x4B, 0x03, 0x04].pack("C*").freeze
  ZIP file magic number (PK\x03\x04)
- HTML_MARKERS = ["<!DOCTYPE html", "<html", "<HTML"].freeze
  HTML tag markers
- MIME_HEADER = "MIME-Version:"
  MIME version header for MHTML
Instance Method Summary
- #detect(path) ⇒ Symbol
  Detect the format of a file or stream.
Instance Method Details
#detect(path) ⇒ Symbol
Detect the format of a file or stream.
    # File 'lib/uniword/format_detector.rb', line 40

    def detect(path)
      # For streams, detect from content
      return detect_stream_format(path) if path.is_a?(IO) || path.is_a?(StringIO)

      validate_path(path)

      # Try signature-based detection first
      format = detect_by_signature(path)
      return format if format

      # Fallback to extension-based detection
      detect_by_extension(path)
    end
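To show how the stream and path branches of #detect might behave, here is a self-contained stand-in class. Only the control flow of #detect is taken from the source above; the class name SketchDetector, the bodies of the private helpers, the ArgumentError raised by validate_path, and the returned symbols (:docx, :unknown) are assumptions for illustration and may differ from what Uniword actually does.

```ruby
require "stringio"
require "tempfile"

# Hypothetical stand-in, not Uniword's implementation. Helper bodies and
# returned symbols are assumed; only the shape of #detect mirrors the source.
class SketchDetector
  ZIP_SIGNATURE = [0x50, 0x4B, 0x03, 0x04].pack("C*").freeze

  def detect(source)
    # Streams are classified from their content
    return detect_stream_format(source) if source.is_a?(IO) || source.is_a?(StringIO)

    validate_path(source)

    # Signature first, extension as the fallback
    detect_by_signature(source) || detect_by_extension(source)
  end

  private

  def validate_path(path)
    raise ArgumentError, "not a readable file: #{path}" unless File.file?(path)
  end

  def detect_stream_format(stream)
    head = stream.read(ZIP_SIGNATURE.bytesize).to_s.b
    stream.rewind # assumes a seekable stream; a pipe would need buffering instead
    classify(head) || :unknown
  end

  def detect_by_signature(path)
    # Read only the leading bytes, in binary mode
    classify(File.binread(path, ZIP_SIGNATURE.bytesize).to_s)
  end

  def detect_by_extension(path)
    File.extname(path).downcase == ".docx" ? :docx : :unknown
  end

  # Returns :docx for a ZIP signature, nil otherwise
  def classify(head)
    :docx if head == ZIP_SIGNATURE
  end
end

detector = SketchDetector.new
puts detector.detect(StringIO.new("PK\x03\x04payload".b))
```

Rewinding the stream after peeking at its first bytes leaves it ready for whatever reads the document next; whether the real detect_stream_format does this is not stated in the source.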