Module: Jekyll::L10n::TextNormalizer

Defined in:
lib/jekyll-l10n/utils/text_normalizer.rb

Overview

Normalizes whitespace in text for consistent matching.

TextNormalizer replaces every run of whitespace characters (spaces, newlines, tabs, carriage returns) with a single space. It is used during extraction and translation so that text matches consistently regardless of how the HTML source is formatted.

Key responsibilities:

  • Replace newlines, tabs, carriage returns with spaces

  • Collapse consecutive spaces to single space

Class Method Summary

  • .normalize(text) ⇒ String?

    Normalize whitespace in text for consistent translation matching.

Class Method Details

.normalize(text) ⇒ String?

Normalize whitespace in text for consistent translation matching.

Why Normalization Is Critical

Browsers render whitespace differently from how it appears in the HTML source:

  • Multiple spaces render as one space

  • Newlines become spaces (unless in <pre> tags)

  • Tabs become spaces

Without normalization, matching fails:

Source HTML: "<p>Hello  world</p>"  (two spaces)
Rendered: "Hello world" (one space)
PO entry msgid: "Hello world" (one space)
Without normalization: "Hello  world" ≠ "Hello world" (NO MATCH!)
With normalization: "Hello world" == "Hello world" (MATCH!)
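
The mismatch above can be reproduced with a small lookup. This is a minimal sketch: the `normalize` method is a stand-in copied from the source listing on this page so that the example runs without the gem, and the `translations` hash is hypothetical illustrative data, not the library's actual PO storage.

```ruby
# Stand-in for the module's normalize method, copied from the source
# listing on this page so the example runs without the gem installed.
def normalize(text)
  return nil if text.nil?

  text.gsub(/[[:space:]]+/, ' ')
end

# Hypothetical translation table keyed by msgid (illustrative data only).
translations = { 'Hello world' => 'Bonjour le monde' }

# Text as it appears in the source HTML, with two spaces.
dom_text = 'Hello  world'

raw_lookup        = translations[dom_text]            # key not found
normalized_lookup = translations[normalize(dom_text)] # key found

puts raw_lookup.inspect        # nil
puts normalized_lookup.inspect # "Bonjour le monde"
```

The raw lookup fails because the key differs by one space; normalizing the DOM text first makes it identical to the msgid.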

Process

  1. Replace all newlines, tabs, carriage returns with spaces

  2. Collapse consecutive spaces into single space

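The implementation performs both steps in a single pass by replacing each run of whitespace with one space. This ensures that text read from the DOM matches the extracted msgid exactly.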
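
The two steps can also be written out literally and compared with the single-pass form. The sketch below is illustrative only: `normalize_two_step` and `normalize_single_pass` are hypothetical names, and only the single-pass body comes from the source listing on this page.

```ruby
# Hypothetical two-step version that follows the process list literally.
def normalize_two_step(text)
  return nil if text.nil?

  # Step 1: replace newlines, tabs, and carriage returns with spaces.
  spaced = text.gsub(/[\n\t\r]/, ' ')
  # Step 2: collapse runs of two or more spaces into one.
  spaced.gsub(/ {2,}/, ' ')
end

# Single-pass version, with the same body as the source listing.
def normalize_single_pass(text)
  return nil if text.nil?

  text.gsub(/[[:space:]]+/, ' ')
end

sample = "Hello\n\tworld,\r\n  again"

two_step    = normalize_two_step(sample)
single_pass = normalize_single_pass(sample)

puts two_step    # Hello world, again
puts single_pass # Hello world, again
```

The two forms agree on this input. They are not fully interchangeable, though: `[[:space:]]` also matches other whitespace characters, such as form feeds and vertical tabs, which the two-step sketch leaves untouched.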

Parameters:

  • text (String, nil)

    Text to normalize

Returns:

  • (String, nil)

    Normalized text, or nil if text is nil
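
A short usage sketch of the parameter and return behavior described above. The `normalize` method here is a stand-in with the same body as the source listing, defined at top level so the example runs without the gem; in the library it is called on the module.

```ruby
# Stand-in with the same body as the source listing on this page.
def normalize(text)
  return nil if text.nil?

  text.gsub(/[[:space:]]+/, ' ')
end

nil_result    = normalize(nil)          # nil input is returned as nil
tab_result    = normalize("a\tb")       # a tab becomes a single space
padded_result = normalize("  padded\n") # edges are collapsed, not stripped

p nil_result    # nil
p tab_result    # "a b"
p padded_result # " padded "
```

Note that leading and trailing whitespace is collapsed to a single space rather than removed, so callers that need trimmed text would have to call `strip` themselves.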



# File 'lib/jekyll-l10n/utils/text_normalizer.rb', line 40

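def normalize(text)
  # Pass nil through unchanged so callers need not guard against it.
  return nil if text.nil?

  # Replace each run of whitespace (spaces, tabs, newlines, carriage
  # returns) with a single space.
  text.gsub(/[[:space:]]+/, ' ')
end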