Module: Jekyll::L10n::BlockTextExtractor

Defined in:
lib/jekyll-l10n/translation/block_text_extractor.rb

Overview

Extracts normalized text from block-level HTML elements.

BlockTextExtractor returns the complete text content of a block element as a single normalized string, with nested block-level elements and empty icon tags removed. The result is used to look up block-level translations, where the whole element has one translation rather than one translation per text node.

Key responsibilities:

  • Extract text from extractable block elements

  • Remove nested block elements from text

  • Remove empty icon tags (external link markers)

  • Normalize and validate extracted text

  • Decode HTML entities

Examples:

text = BlockTextExtractor.extract(paragraph_node)
# Returns normalized text from paragraph, useful for finding block translations

Class Method Details

.extract(node) ⇒ String?

Extract normalized block text from an element.

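Works on a copy of the node, so the original DOM is not modified. Nested block elements and empty icon tags are removed from the copy, its inner HTML is normalized and stripped of surrounding whitespace, and HTML entities are decoded. Returns nil if the element is not extractable or if the resulting text fails validation.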

Parameters:

  • node (Nokogiri::XML::Element)

    DOM element to extract from

Returns:

  • (String, nil)

    Normalized text from element, or nil if not valid



# File 'lib/jekyll-l10n/translation/block_text_extractor.rb', line 38

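def extract(node)
  # Only element nodes listed in HtmlElements::CONTENT_ELEMENTS are processed.
  return nil unless extractable?(node)

  # Work on a copy so the caller's DOM is not mutated.
  clone = node.dup
  HtmlTextUtils.remove_block_elements(clone)
  HtmlTextUtils.remove_empty_icon_tags(clone)

  # Normalize what remains, trim it, then decode HTML entities.
  text = TextNormalizer.normalize(clone.inner_html).strip
  text = HtmlTextUtils.decode_html_entities(text)

  # Text that fails validation is reported as nil.
  TextValidator.valid?(text) ? text : nil
end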
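
The normalize, decode, and validate steps at the end of the method above can be tried in isolation. The following is a minimal sketch, not the library's implementation: `sketch_normalize`, `sketch_valid?`, and `sketch_extract_text` are hypothetical stand-ins, since the real TextNormalizer, HtmlTextUtils, and TextValidator are not shown on this page. It assumes that normalization collapses runs of whitespace, that validation rejects empty strings, and it uses Ruby's standard `CGI.unescapeHTML` in place of `HtmlTextUtils.decode_html_entities`.

```ruby
require "cgi"

# Hypothetical stand-in for TextNormalizer.normalize:
# assumed here to collapse every run of whitespace into one space.
def sketch_normalize(html)
  html.gsub(/\s+/, " ")
end

# Hypothetical stand-in for TextValidator.valid?:
# assumed here to reject empty strings only.
def sketch_valid?(text)
  !text.empty?
end

# Mirrors the order of operations in .extract:
# normalize, strip, decode entities, then validate.
def sketch_extract_text(inner_html)
  text = sketch_normalize(inner_html).strip
  text = CGI.unescapeHTML(text)
  sketch_valid?(text) ? text : nil
end

p sketch_extract_text("  Tom &amp; Jerry\n   live here ") # => "Tom & Jerry live here"
p sketch_extract_text("   \n ")                           # => nil
```

Decoding after normalization, as the original method does, means the string that is validated is the decoded one, for example `&` rather than `&amp;`.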

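.extractable?(node) ⇒ Boolean

Check whether a node can be processed by .extract.

A node is extractable when it is an element and its tag name is listed in HtmlElements::CONTENT_ELEMENTS.

Parameters:

  • node (Nokogiri::XML::Node)

    DOM node to check

Returns:

  • (Boolean)

    true if the node is an element with an extractable tag name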


# File 'lib/jekyll-l10n/translation/block_text_extractor.rb', line 51

def extractable?(node)
  node.element? && HtmlElements::CONTENT_ELEMENTS.include?(node.name)
end
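
The check above can be illustrated without Nokogiri. This is a sketch under stated assumptions: `SKETCH_CONTENT_ELEMENTS` is a hypothetical list (the real contents of HtmlElements::CONTENT_ELEMENTS are not shown on this page), and `FakeNode` is a stand-in that implements only the two methods the check calls, `element?` and `name`.

```ruby
require "set"

# Hypothetical tag list; the real HtmlElements::CONTENT_ELEMENTS may differ.
SKETCH_CONTENT_ELEMENTS = Set.new(%w[p li h1 h2 h3 blockquote]).freeze

# Minimal stand-in for a DOM node, exposing only `name` and `element?`.
FakeNode = Struct.new(:name, :element) do
  def element?
    element
  end
end

# Same shape as extractable?: must be an element AND have a listed tag name.
def sketch_extractable?(node)
  node.element? && SKETCH_CONTENT_ELEMENTS.include?(node.name)
end

p sketch_extractable?(FakeNode.new("p", true))      # => true
p sketch_extractable?(FakeNode.new("span", true))   # => false (tag not listed)
p sketch_extractable?(FakeNode.new("text", false))  # => false (not an element)
```

Because `&&` short-circuits, the tag-name lookup is skipped for non-element nodes such as text nodes and comments.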