Module: Jekyll::L10n::TextNormalizer
Defined in: lib/jekyll-l10n/utils/text_normalizer.rb
Overview
Normalizes whitespace in text for consistent matching.
TextNormalizer replaces whitespace characters (newlines, tabs, carriage returns) with spaces and collapses runs of consecutive spaces into a single space. It is used during extraction and translation so that text matches consistently regardless of how the HTML source is formatted.
Key responsibilities:
- Replace newlines, tabs, and carriage returns with spaces
- Collapse consecutive spaces into a single space
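As a quick illustration of these responsibilities, here is a minimal sketch. `TextNormalizerSketch` is a hypothetical stand-in, not the gem's module; it only copies the method body shown in the source listing on this page so the example runs without jekyll-l10n installed.

```ruby
# Hypothetical stand-in mirroring the method body in this page's source
# listing; the real module is Jekyll::L10n::TextNormalizer.
module TextNormalizerSketch
  module_function

  def normalize(text)
    return nil if text.nil?

    # Any run of whitespace (newlines, tabs, carriage returns, spaces)
    # becomes exactly one space.
    text.gsub(/[[:space:]]+/, ' ')
  end
end

puts TextNormalizerSketch.normalize("Hello\n\t  world\r\n").inspect # "Hello world "
puts TextNormalizerSketch.normalize(nil).inspect                    # nil
```

Note that trailing whitespace is collapsed to one space rather than removed, so callers that need trimmed text would have to strip it themselves.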
Class Method Summary
- .normalize(text) ⇒ String?
  Normalize whitespace in text for consistent translation matching.
Class Method Details
.normalize(text) ⇒ String?
Normalize whitespace in text for consistent translation matching.
Why Normalization Is Critical
HTML rendering treats whitespace differently than source code:
- Multiple spaces render as one space
- Newlines become spaces (unless inside <pre> tags)
- Tabs become spaces
Without normalization, matching fails:
Source HTML: "<p>Hello  world</p>" (two spaces)
Rendered: "Hello world" (one space)
PO entry msgid: "Hello world" (one space)
Without normalization: "Hello  world" ≠ "Hello world" (NO MATCH!)
With normalization: "Hello world" == "Hello world" (MATCH!)
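The mismatch above can be reproduced with a plain hash lookup. This is only a sketch: `CATALOG`, `normalize_ws`, and `translate` are hypothetical names invented for illustration, and the hash stands in for whatever structure the gem actually uses to hold PO entries.

```ruby
# Hypothetical catalog: msgid => msgstr, as might be loaded from a PO file.
CATALOG = { 'Hello world' => 'Bonjour le monde' }.freeze

# Same pattern as the .normalize source listing on this page.
def normalize_ws(text)
  text&.gsub(/[[:space:]]+/, ' ')
end

# Look up a translation, optionally normalizing the key first.
def translate(text, normalize: true)
  key = normalize ? normalize_ws(text) : text
  CATALOG[key]
end

dom_text = 'Hello  world' # two spaces, as found in the HTML source

puts translate(dom_text, normalize: false).inspect # nil
puts translate(dom_text).inspect                   # "Bonjour le monde"
```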
Process
- Replace all newlines, tabs, and carriage returns with spaces
- Collapse consecutive spaces into a single space
This ensures that text read from the DOM matches the extracted msgid exactly.
# File 'lib/jekyll-l10n/utils/text_normalizer.rb', line 40

def normalize(text)
  return nil if text.nil?

  # Replace every run of whitespace with a single space
  text.gsub(/[[:space:]]+/, ' ')
end
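To relate the two steps listed under Process to a single-pattern implementation, the sketch below writes the steps out explicitly and compares the results. Both method names are hypothetical and exist only for this comparison.

```ruby
# Two explicit passes, following the steps listed under Process.
def normalize_two_step(text)
  return nil if text.nil?

  # Step 1: newline, tab, and carriage return each become a space.
  spaced = text.gsub(/[\n\t\r]/, ' ')
  # Step 2: runs of two or more spaces become one space.
  spaced.gsub(/ {2,}/, ' ')
end

# Single pass over any run of whitespace.
def normalize_one_pass(text)
  return nil if text.nil?

  text.gsub(/[[:space:]]+/, ' ')
end

sample = "<p>Hello\r\n\t  world</p>"
puts normalize_two_step(sample) == normalize_one_pass(sample) # true
```

The single pass gives the same result for the characters named in the steps. It also covers other whitespace that `[[:space:]]` matches, such as form feeds and vertical tabs, which the two-step version leaves untouched.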