Module: Jekyll::L10n::BlockTextExtractor

Defined in:
lib/jekyll-l10n/translation/block_text_extractor.rb

Overview

Extracts normalized text from block-level HTML elements.

BlockTextExtractor extracts the complete text content of a block element, removing nested block-level elements and empty icon tags along the way. The result is matched against block-level translations, where the whole element has a single translation instead of one translation per text node.

Key responsibilities:

  • Extract text from extractable block elements

  • Remove nested block elements from text

  • Remove empty icon tags (external link markers)

  • Normalize and validate extracted text

Examples:

text = BlockTextExtractor.extract(paragraph_node)
# Returns normalized text from paragraph, useful for finding block translations

Class Method Details

.extract(node) ⇒ String?

Extract normalized block text from an element.

Clones the node, removes nested block elements and empty icon tags from the clone, normalizes whitespace, and validates the result. Returns nil if the element is not extractable or if the extracted text fails validation. The original node is not modified. HTML entities are preserved verbatim to match the keys produced by the extraction pipeline.

Parameters:

  • node (Nokogiri::XML::Element)

    DOM element to extract from

Returns:

  • (String, nil)

    Normalized text from element, or nil if not valid



# File 'lib/jekyll-l10n/translation/block_text_extractor.rb', line 38

def extract(node)
  return nil unless extractable?(node)

  clone = node.dup
  HtmlTextUtils.remove_block_elements(clone)
  HtmlTextUtils.remove_empty_icon_tags(clone)

  text = TextNormalizer.normalize(clone.inner_html).strip

  TextValidator.valid?(text) ? text : nil
end
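
The clone, strip, normalize, validate sequence above can be sketched without Nokogiri. Everything in this block is a hypothetical stand-in for illustration: `Node`, `BLOCK_TAGS`, `CONTENT_TAGS`, the two `strip_*` helpers, and `sketch_extract` are not part of jekyll-l10n, and the tag lists and the "non-empty" validation rule are assumptions, since the real `HtmlTextUtils`, `TextNormalizer`, and `TextValidator` are not shown on this page.

```ruby
# Minimal stand-in for a DOM element: a tag name plus children that are
# either nested Node objects or plain strings (text).
Node = Struct.new(:name, :children) do
  # Deep copy so that stripping never touches the original tree.
  def deep_dup
    Node.new(name, children.map { |c| c.is_a?(Node) ? c.deep_dup : c.dup })
  end

  # Serialize children back to markup, keeping inline tags.
  def inner_html
    children.map { |c| c.is_a?(Node) ? "<#{c.name}>#{c.inner_html}</#{c.name}>" : c }.join
  end
end

BLOCK_TAGS   = %w[div ul ol table blockquote].freeze # assumed list
CONTENT_TAGS = %w[p li h1 h2 h3].freeze              # assumed list

# Drop nested block-level children, recursing into what remains.
def strip_nested_blocks!(node)
  node.children.reject! { |c| c.is_a?(Node) && BLOCK_TAGS.include?(c.name) }
  node.children.each { |c| strip_nested_blocks!(c) if c.is_a?(Node) }
end

# Drop <i> elements with no children (assumed shape of an "empty icon tag").
def strip_empty_icons!(node)
  node.children.reject! { |c| c.is_a?(Node) && c.name == "i" && c.children.empty? }
  node.children.each { |c| strip_empty_icons!(c) if c.is_a?(Node) }
end

def sketch_extract(node)
  return nil unless CONTENT_TAGS.include?(node.name)

  copy = node.deep_dup
  strip_nested_blocks!(copy)
  strip_empty_icons!(copy)

  # Assumed normalization: collapse whitespace runs to a single space.
  text = copy.inner_html.gsub(/\s+/, " ").strip

  # Assumed validation: reject empty results.
  text.empty? ? nil : text
end

para = Node.new("p", [
  "Read  the ",
  Node.new("a", ["docs", Node.new("i", [])]),
  Node.new("ul", [Node.new("li", ["nested"])]),
  "\n now"
])

puts sketch_extract(para) # prints: Read the <a>docs</a> now
```

Working on a copy is what keeps the page DOM intact for later rendering; only the derived string is used for lookup.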

.extractable?(node) ⇒ Boolean

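Check whether a node can be processed by .extract: it must be an element node whose tag name appears in HtmlElements::CONTENT_ELEMENTS.

Parameters:

  • node (Nokogiri::XML::Node)

    DOM node to check

Returns:

  • (Boolean)

    true if the node is an element listed in HtmlElements::CONTENT_ELEMENTS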


# File 'lib/jekyll-l10n/translation/block_text_extractor.rb', line 50

def extractable?(node)
  node.element? && HtmlElements::CONTENT_ELEMENTS.include?(node.name)
end
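
As a rough illustration of the predicate above, the following sketch uses a stub node instead of a Nokogiri object. `StubNode`, `SKETCH_CONTENT_ELEMENTS`, and `sketch_extractable?` are hypothetical names, and the tag list is an assumed subset; the actual contents of `HtmlElements::CONTENT_ELEMENTS` are not shown on this page.

```ruby
# Assumed subset of content tags, for illustration only.
SKETCH_CONTENT_ELEMENTS = %w[p li h1 h2 h3 td].freeze

# Stub exposing the two methods the predicate relies on: name and element?.
StubNode = Struct.new(:name, :element) do
  def element?
    element
  end
end

# Same shape as the method above: must be an element AND have a listed tag name.
def sketch_extractable?(node)
  node.element? && SKETCH_CONTENT_ELEMENTS.include?(node.name)
end

puts sketch_extractable?(StubNode.new("p", true))     # prints: true
puts sketch_extractable?(StubNode.new("span", true))  # prints: false
puts sketch_extractable?(StubNode.new("text", false)) # prints: false
```

Checking `element?` first means non-element nodes are rejected before the tag-name lookup runs.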