Class: Jekyll::L10n::HtmlTranslator

Inherits:
Object
Defined in:
lib/jekyll-l10n/translation/html_translator.rb

Overview

Applies translations from PO files to HTML text nodes and DOM attributes.

HtmlTranslator walks the DOM tree of a parsed HTML document and applies translations to text content and to configurable HTML attributes (title, alt, aria-label, etc.). It supports three fallback modes for missing translations: using the original text, marking the content as untranslated, or leaving it blank. It also applies block-level translations to elements that have a complete translation and transforms relative URLs into locale-prefixed URLs.

Key responsibilities:

  • Parse full HTML documents while preserving DOCTYPE and structure

  • Translate text nodes using normalized text for matching

  • Translate HTML attributes (title, alt, aria-label, placeholder, aria-description)

  • Apply fallback modes when translations are missing (english/marker/empty)

  • Handle block-level translations for content elements

  • Transform relative URLs to locale-prefixed URLs

  • Remove auto-inserted meta charset tags from serialized HTML

Examples:

translator = HtmlTranslator.new('english', ['title', 'alt'])
translated = translator.translate(html, translations, 'es', '/baseurl')
# Returns HTML with text and attributes translated to Spanish
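
The text-node matching listed under the key responsibilities depends on normalized text. The following is a minimal sketch of that lookup, assuming normalization means collapsing whitespace runs and trimming the ends. Both normalize_text and lookup_translation are hypothetical helpers written for illustration; they are not part of the documented API.

```ruby
# Hypothetical helper: collapse runs of whitespace into one space and trim.
# The plugin's real normalization rules may differ.
def normalize_text(text)
  text.gsub(/\s+/, ' ').strip
end

# Hypothetical helper: look up the normalized text, falling back to the
# untouched original when no translation exists.
def lookup_translation(translations, raw_text)
  translations.fetch(normalize_text(raw_text), raw_text)
end

translations = { 'Hello world' => 'Hola mundo' }

puts lookup_translation(translations, "  Hello\n   world ") # Hola mundo
puts lookup_translation(translations, 'Goodbye')            # Goodbye
```

Normalizing before the lookup means that indentation and line wrapping in the source HTML do not prevent a match against the PO entry.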

Constant Summary

INLINE_ELEMENT_TAGS

Inline element tags that are indexed and restored during block-level translation. Includes translatable-content tags (replaced with <g> in Phase 2) and literal-content tags (class/style stripped but tag preserved). Derived from the canonical taxonomy in HtmlTextUtils so extraction and injection never diverge.

INLINE_ELEMENT_TAGS =
  (HtmlTextUtils::PLACEHOLDERED_INLINE_TAGS +
   HtmlTextUtils::LITERAL_INLINE_TAGS).freeze
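
The pattern used for this constant can be sketched in isolation. The tag lists below are hypothetical stand-ins for the HtmlTextUtils constants, whose real contents are not shown on this page; only the concatenate-then-freeze shape is taken from the definition above.

```ruby
# Hypothetical stand-ins for the HtmlTextUtils tag lists.
PLACEHOLDERED_TAGS = %w[a em strong].freeze
LITERAL_TAGS = %w[code kbd].freeze

# Array#+ returns a new, unfrozen array, so the combined list is frozen again.
COMBINED_TAGS = (PLACEHOLDERED_TAGS + LITERAL_TAGS).freeze

puts COMBINED_TAGS.frozen?          # true
puts COMBINED_TAGS.include?('code') # true
```

Because the combined list is computed from the two source lists, adding a tag to either one updates every consumer of the combined constant.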

Constructor Details

#initialize(fallback_mode, translatable_attrs, debug_logging: false) ⇒ HtmlTranslator

Initialize a new HtmlTranslator.

Parameters:

  • fallback_mode (String, Symbol)

    How to handle missing translations:

      'english' or :english - use the original text (default)
      'marker' or :marker - wrap the text with an untranslated marker
      'empty' or :empty - leave the text blank

  • translatable_attrs (Array<String>)

    HTML attributes to extract and translate (e.g., ['title', 'alt', 'aria-label', 'placeholder', 'aria-description'])

  • debug_logging (Boolean) (defaults to: false)

    (keyword) Enable detailed debug logging for translation process



# File 'lib/jekyll-l10n/translation/html_translator.rb', line 52

def initialize(fallback_mode, translatable_attrs, debug_logging: false)
  @fallback_mode = fallback_mode
  @translatable_attrs = translatable_attrs
  @debug_logging = debug_logging
end
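
The three fallback modes accepted by the constructor could be dispatched along these lines. This is a sketch under stated assumptions: apply_fallback is a hypothetical helper, the "[UNTRANSLATED]" marker format is invented for illustration, and treating unknown modes as 'english' is an assumption about the default rather than documented behavior.

```ruby
# Hypothetical helper: decide what to emit when a translation is missing.
# Accepts the mode as a String or Symbol, matching the documented parameter.
def apply_fallback(original, fallback_mode)
  case fallback_mode.to_s
  when 'marker'
    "[UNTRANSLATED] #{original}" # invented marker format, for illustration only
  when 'empty'
    ''
  else
    original # 'english': keep the source text (assumed for unknown modes too)
  end
end

puts apply_fallback('Read more', :english) # Read more
puts apply_fallback('Read more', 'marker') # [UNTRANSLATED] Read more
puts apply_fallback('Read more', :empty).inspect # ""
```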

Instance Attribute Details

#debug_logging ⇒ Object (readonly)

Returns the value of attribute debug_logging.



# File 'lib/jekyll-l10n/translation/html_translator.rb', line 40

def debug_logging
  @debug_logging
end

#fallback_mode ⇒ Object (readonly)

Returns the value of attribute fallback_mode.



# File 'lib/jekyll-l10n/translation/html_translator.rb', line 40

def fallback_mode
  @fallback_mode
end

#translatable_attrs ⇒ Object (readonly)

Returns the value of attribute translatable_attrs.



# File 'lib/jekyll-l10n/translation/html_translator.rb', line 40

def translatable_attrs
  @translatable_attrs
end

Instance Method Details

#translate(html, translations, locale = 'en', baseurl = '') ⇒ String

Translate an HTML document to a specific locale.

Parses the HTML document, applies translations to text nodes and attributes, transforms URLs to be locale-aware, and returns the translated HTML with proper structure preserved.
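
As a rough illustration of what a locale-aware URL might look like, the sketch below prefixes site-relative paths with the base URL and locale. The helper localize_path and the "baseurl/locale/path" layout are assumptions made for this example; the actual rules live in UrlTransformer and are not shown on this page.

```ruby
# Hypothetical helper: prefix a site-relative path with baseurl and locale.
# Absolute URLs, protocol-relative URLs, and fragments are left untouched.
def localize_path(path, locale, baseurl = '')
  return path unless path.start_with?('/') && !path.start_with?('//')

  "#{baseurl}/#{locale}#{path}"
end

puts localize_path('/about/', 'es', '/baseurl')        # /baseurl/es/about/
puts localize_path('https://example.com/', 'es')       # https://example.com/
puts localize_path('//cdn.example.com/app.js', 'es')   # //cdn.example.com/app.js
```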

Parameters:

  • html (String)

    Full HTML document to translate

  • translations (Hash)

    Hash mapping normalized source text to either a translated string or a metadata hash with :msgstr, :reference, and :fuzzy keys

  • locale (String) (defaults to: 'en')

    Target locale code (e.g., 'es', 'fr')

  • baseurl (String) (defaults to: '')

    Base URL for relative URL transformation

Returns:

  • (String)

    Translated HTML with URLs transformed and meta charset removed
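
Since a translations value may be either a plain string or a metadata hash, a caller-side view of the two shapes can be sketched as follows. resolve_msgstr is a hypothetical helper; treating an empty msgstr as missing follows the gettext convention and is an assumption about this plugin. The sample reference value is made up.

```ruby
# Hypothetical helper: return the translated string for a key, or nil when
# the entry is absent or its msgstr is empty.
def resolve_msgstr(translations, key)
  entry = translations[key]
  msgstr = entry.is_a?(Hash) ? entry[:msgstr] : entry
  (msgstr.nil? || msgstr.empty?) ? nil : msgstr
end

translations = {
  'Home' => 'Inicio',                                            # plain string
  'About' => { msgstr: 'Acerca de', reference: 'about.html', fuzzy: false },
  'Contact' => { msgstr: '', reference: 'contact.html', fuzzy: false }
}

puts resolve_msgstr(translations, 'Home')            # Inicio
puts resolve_msgstr(translations, 'About')           # Acerca de
puts resolve_msgstr(translations, 'Contact').inspect # nil
puts resolve_msgstr(translations, 'Missing').inspect # nil
```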



# File 'lib/jekyll-l10n/translation/html_translator.rb', line 70

def translate(html, translations, locale = 'en', baseurl = '')
  # Use HtmlParser to properly parse full HTML documents while preserving
  # DOCTYPE, html tag, and document structure. Any auto-inserted meta tags are
  # removed by HtmlParser.remove_meta_charset after serialization.
  # See: spec/regression/nokogiri_meta_tag_spec.rb for regression tests
  doc = HtmlParser.parse_document(html)

  translate_node(doc, translations)

  # Transform URLs on the document object before serialization to avoid double-parsing
  # and preserve the correct DOCTYPE and HTML structure. This prevents Nokogiri from
  # downgrading to HTML 4.0 DOCTYPE when parsing the serialized HTML again.
  # See: spec/jekyll-l10n/utils/url_transformer_spec.rb for tests
  UrlTransformer.transform_document(doc, locale, baseurl)

  result = doc.to_html

  # Remove the meta tag that libxml2 auto-inserts during HTML serialization
  # Matches: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  HtmlParser.remove_meta_charset(result)
end
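
The final cleanup step can be approximated with a single substitution. This is a sketch only: strip_meta_charset and META_CHARSET_PATTERN are hypothetical names, and the pattern is built from the tag quoted in the comment above, not from the source of HtmlParser.remove_meta_charset.

```ruby
# Hypothetical pattern matching the tag quoted in the comment above, plus any
# whitespace that follows it so no blank line is left behind.
META_CHARSET_PATTERN =
  %r{<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\s*}i

# Hypothetical helper: remove the first auto-inserted meta tag, if present.
def strip_meta_charset(html)
  html.sub(META_CHARSET_PATTERN, '')
end

serialized = <<~HTML
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <title>Inicio</title>
  </head>
HTML

puts strip_meta_charset(serialized)
```

Input without the tag passes through unchanged, so the helper is safe to call on every serialized document.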