Class: Jekyll::L10n::HtmlTranslator

Inherits:
Object
Defined in:
lib/jekyll-l10n/translation/html_translator.rb

Overview

Applies translations from PO files to HTML text nodes and DOM attributes.

HtmlTranslator walks the DOM tree of a parsed HTML document and applies translations to text content and to a configurable set of HTML attributes (title, alt, aria-label, etc.). It supports three fallback modes for missing translations: using the original text, marking the content as untranslated, or leaving it blank. It also applies block-level translations to elements that have a complete translation, and rewrites relative URLs to locale-prefixed URLs.

Key responsibilities:

  • Parse full HTML documents while preserving DOCTYPE and structure

  • Translate text nodes using normalized text for matching

  • Translate HTML attributes (title, alt, aria-label, placeholder, aria-description)

  • Apply fallback modes when translations are missing (english/marker/empty)

  • Handle block-level translations for content elements

  • Transform relative URLs to locale-prefixed URLs

  • Remove auto-inserted meta charset tags from serialized HTML

Examples:

translator = HtmlTranslator.new('english', ['title', 'alt'])
translated = translator.translate(html, translations, 'es', '/baseurl')
# Returns HTML with text and attributes translated to Spanish
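
The "locale-prefixed URLs" responsibility listed above can be pictured with a minimal sketch. The helper name `localize_url` and the resulting path layout (`baseurl`, then locale, then path) are assumptions made for illustration only; the actual UrlTransformer works on the parsed document and may build paths differently.

```ruby
# Hypothetical helper, not part of jekyll-l10n: prefixes a root-relative URL
# with the locale, keeping the site baseurl in front.
def localize_url(url, locale, baseurl = '')
  # Leave anchors, protocol-relative URLs and URLs with a scheme untouched.
  return url if url.empty? || url.start_with?('#', '//')
  return url if url.match?(/\A[a-z][a-z0-9+.-]*:/i)
  # This sketch only handles root-relative paths such as "/about/".
  return url unless url.start_with?('/')

  # Drop an existing baseurl prefix so it is not duplicated.
  path = url
  path = url.delete_prefix(baseurl) if !baseurl.empty? && url.start_with?("#{baseurl}/")
  "#{baseurl}/#{locale}#{path}"
end

puts localize_url('/about/', 'es', '/baseurl')          # /baseurl/es/about/
puts localize_url('/baseurl/about/', 'es', '/baseurl')  # /baseurl/es/about/
puts localize_url('https://example.com/', 'es')         # https://example.com/
```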

Constructor Details

#initialize(fallback_mode, translatable_attrs, debug_logging: false) ⇒ HtmlTranslator

Initialize a new HtmlTranslator.

Parameters:

  • fallback_mode (String, Symbol)

    How to handle missing translations:
      ‘english’ or :english - use original text (default)
      ‘marker’ or :marker - wrap with untranslated marker
      ‘empty’ or :empty - leave blank

  • translatable_attrs (Array<String>)

    HTML attributes to extract and translate (e.g., [‘title’, ‘alt’, ‘aria-label’, ‘placeholder’, ‘aria-description’])

  • debug_logging (Boolean) (defaults to: false)

    (keyword) Enable detailed debug logging for the translation process
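
As a rough illustration of the three fallback modes, the sketch below picks the text to emit when no translation is found. The helper `fallback_text` and the marker string are hypothetical; the page does not show how HtmlTranslator formats its untranslated marker.

```ruby
# Hypothetical helper, not part of jekyll-l10n.
UNTRANSLATED_MARKER = '[UNTRANSLATED] ' # assumed marker format

# Accepts the mode as a String or Symbol, as the constructor does.
def fallback_text(original, fallback_mode)
  case fallback_mode.to_s
  when 'marker' then "#{UNTRANSLATED_MARKER}#{original}"
  when 'empty'  then ''
  else original # 'english' (and, in this sketch, any unknown mode)
  end
end

puts fallback_text('Read more', :english)        # Read more
puts fallback_text('Read more', 'marker')        # [UNTRANSLATED] Read more
puts fallback_text('Read more', :empty).inspect  # ""
```

Calling `to_s` on the mode is one simple way to treat ‘marker’ and :marker identically.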

# File 'lib/jekyll-l10n/translation/html_translator.rb', line 51

def initialize(fallback_mode, translatable_attrs, debug_logging: false)
  @fallback_mode = fallback_mode
  @translatable_attrs = translatable_attrs
  @debug_logging = debug_logging
end

Instance Attribute Details

#debug_logging ⇒ Object (readonly)

Returns the value of attribute debug_logging.

# File 'lib/jekyll-l10n/translation/html_translator.rb', line 39

def debug_logging
  @debug_logging
end

#fallback_mode ⇒ Object (readonly)

Returns the value of attribute fallback_mode.

# File 'lib/jekyll-l10n/translation/html_translator.rb', line 39

def fallback_mode
  @fallback_mode
end

#translatable_attrs ⇒ Object (readonly)

Returns the value of attribute translatable_attrs.

# File 'lib/jekyll-l10n/translation/html_translator.rb', line 39

def translatable_attrs
  @translatable_attrs
end

Instance Method Details

#translate(html, translations, locale = 'en', baseurl = '') ⇒ String

Translate an HTML document to a specific locale.

Parses the HTML document, applies translations to text nodes and attributes, transforms URLs to be locale-aware, and returns the translated HTML with proper structure preserved.

Parameters:

  • html (String)

    Full HTML document to translate

  • translations (Hash)

    Translation hash mapping normalized text to translated strings or metadata hashes with :msgstr, :reference, :fuzzy keys

  • locale (String) (defaults to: 'en')

    Target locale code (e.g., ‘es’, ‘fr’)

  • baseurl (String) (defaults to: '')

    Base URL for relative URL transformation

Returns:

  • (String)

    Translated HTML with URLs transformed and meta charset removed
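
The shape of the translations hash can be illustrated with a small lookup sketch. The helpers `normalize` and `lookup` are hypothetical, and the normalization rule (collapse whitespace, trim the ends) is an assumption; the page only states that keys are normalized text and that values are either strings or metadata hashes. The sketch ignores the :fuzzy flag.

```ruby
# Hypothetical helpers, not part of jekyll-l10n.

# Assumed normalization: collapse whitespace runs and trim both ends.
def normalize(text)
  text.gsub(/\s+/, ' ').strip
end

# Returns the translated string, or nil when the entry is missing or empty.
def lookup(translations, text)
  entry = translations[normalize(text)]
  msgstr = entry.is_a?(Hash) ? entry[:msgstr] : entry
  msgstr.nil? || msgstr.empty? ? nil : msgstr
end

translations = {
  'Hello world' => 'Hola mundo',                                            # plain string value
  'Read more'   => { msgstr: 'Leer más', reference: 'index.html', fuzzy: false } # metadata hash
}

puts lookup(translations, "  Hello\n  world ")  # Hola mundo
puts lookup(translations, 'Read more')          # Leer más
puts lookup(translations, 'Missing').inspect    # nil
```

A nil result is the point at which a fallback mode would take over.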

# File 'lib/jekyll-l10n/translation/html_translator.rb', line 69

def translate(html, translations, locale = 'en', baseurl = '')
  # Use HtmlParser to properly parse full HTML documents while preserving
  # DOCTYPE, html tag, and document structure. Any auto-inserted meta tags are
  # removed by HtmlParser.remove_meta_charset after serialization.
  # See: spec/regression/nokogiri_meta_tag_spec.rb for regression tests
  doc = HtmlParser.parse_document(html)

  translate_node(doc, translations)

  # Transform URLs on the document object before serialization to avoid double-parsing
  # and preserve the correct DOCTYPE and HTML structure. This prevents Nokogiri from
  # downgrading to HTML 4.0 DOCTYPE when parsing the serialized HTML again.
  # See: spec/jekyll-l10n/utils/url_transformer_spec.rb for tests
  UrlTransformer.transform_document(doc, locale, baseurl)

  result = doc.to_html

  # Remove the meta tag that libxml2 auto-inserts during HTML serialization
  # Matches: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  HtmlParser.remove_meta_charset(result)
end
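
The final cleanup step can be approximated with a regular expression over the serialized string. This is a minimal sketch based only on the tag quoted in the comment above; the helper name `strip_meta_charset` and the pattern are assumptions, and the real HtmlParser.remove_meta_charset may match differently.

```ruby
# Hypothetical stand-in for HtmlParser.remove_meta_charset.
# Matches the Content-Type meta tag quoted in the source comment, plus any
# whitespace that follows it.
META_CONTENT_TYPE = %r{<meta\s+http-equiv="Content-Type"\s+content="text/html;\s*charset=UTF-8"\s*/?>\s*}i

def strip_meta_charset(html)
  html.gsub(META_CONTENT_TYPE, '')
end

serialized = "<head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n" \
             '<title>Inicio</title></head>'
puts strip_meta_charset(serialized)  # <head><title>Inicio</title></head>
```

In this sketch a hand-written `<meta charset="utf-8">` tag is left in place, since only the http-equiv form is matched.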