Module: Jekyll::L10n::BlockTextExtractor
- Defined in:
- lib/jekyll-l10n/translation/block_text_extractor.rb
Overview
Extracts normalized text from block-level HTML elements.
BlockTextExtractor extracts the complete text content of a block element while removing nested block-level elements and empty icon tags. The result is used to look up block-level translations, where the entire element has a single translation instead of one translation per text node.
Key responsibilities:
- Extract text from extractable block elements
- Remove nested block elements from text
- Remove empty icon tags (external link markers)
- Normalize and validate extracted text
- Decode HTML entities
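To illustrate why a single normalized string per block is useful, here is a minimal sketch of a block-level lookup. The catalog `BLOCK_TRANSLATIONS`, the helper `normalize_block_text`, and `translate_block` are hypothetical names invented for this example, and the whitespace rule is an assumption; the real lookup and normalization in jekyll-l10n may differ.

```ruby
# Hypothetical catalog: keys are normalized block text (inline markup kept),
# values are the translation for the whole block.
BLOCK_TRANSLATIONS = {
  'Read the <a href="/docs">docs</a> first.' =>
    'Lisez la <a href="/docs">documentation</a> en premier.'
}.freeze

# Assumed normalization: collapse whitespace runs and trim the ends.
def normalize_block_text(text)
  text.gsub(/\s+/, " ").strip
end

# Look up the whole block at once; returns nil when no block-level entry exists.
def translate_block(inner_html)
  BLOCK_TRANSLATIONS[normalize_block_text(inner_html)]
end

puts translate_block("  Read the\n <a href=\"/docs\">docs</a>   first. ")
```

Because the key is the whole block, a translator can reorder words around the inline link, which per-text-node lookups would not allow.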
Class Method Summary
- .extract(node) ⇒ String?
Extract normalized block text from an element.
- .extractable?(node) ⇒ Boolean
Class Method Details
.extract(node) ⇒ String?
Extract normalized block text from an element.
Returns nil if the element is not extractable or if the extracted text fails validation. Otherwise it clones the node, removes nested block elements and empty icon tags from the clone, normalizes whitespace, decodes HTML entities, and validates the result.
# File 'lib/jekyll-l10n/translation/block_text_extractor.rb', line 38

def extract(node)
  return nil unless extractable?(node)

  clone = node.dup
  HtmlTextUtils.remove_block_elements(clone)
  HtmlTextUtils.(clone)
  text = TextNormalizer.normalize(clone.inner_html).strip
  text = HtmlTextUtils.decode_html_entities(text)
  TextValidator.valid?(text) ? text : nil
end
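The same sequence of steps can be sketched without the library's helpers. Everything below is an illustrative stand-in, not the library's code: `Node` is a toy struct rather than a real parser node, `BLOCK_NAMES`, `ENTITIES`, `prune!`, and `extract_sketch` are hypothetical names, the icon tag is assumed to be an empty `<i>`, and "valid" is assumed to mean "non-empty".

```ruby
# Toy node: a tag name plus children that are either Nodes or plain strings.
Node = Struct.new(:name, :children) do
  def inner_html
    children.map { |c| c.is_a?(Node) ? "<#{c.name}>#{c.inner_html}</#{c.name}>" : c }.join
  end

  def deep_dup
    Node.new(name, children.map { |c| c.is_a?(Node) ? c.deep_dup : c.dup })
  end
end

BLOCK_NAMES = %w[p div ul ol li blockquote].freeze # assumed set of block elements
ENTITIES = { "&amp;" => "&", "&lt;" => "<", "&gt;" => ">", "&quot;" => '"' }.freeze

# Remove nested block elements and empty <i> tags at any depth (mutates node).
def prune!(node)
  node.children.reject! do |c|
    c.is_a?(Node) && (BLOCK_NAMES.include?(c.name) || (c.name == "i" && c.children.empty?))
  end
  node.children.each { |c| prune!(c) if c.is_a?(Node) }
end

def extract_sketch(node)
  return nil unless BLOCK_NAMES.include?(node.name)

  clone = node.deep_dup                                 # leave the original untouched
  prune!(clone)
  text = clone.inner_html.gsub(/\s+/, " ").strip        # normalize whitespace
  text = text.gsub(/&(?:amp|lt|gt|quot);/, ENTITIES)    # decode entities in one pass
  text.empty? ? nil : text                              # assumed validation rule
end

item = Node.new("li", ["Item  one &amp; two ", Node.new("i", []),
                       Node.new("ul", [Node.new("li", ["nested"])])])
puts extract_sketch(item) # prints "Item one & two"
```

Working on a copy matters here: the nested list is dropped only from the clone, so the original tree can still be traversed and its nested items translated on their own.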
.extractable?(node) ⇒ Boolean
Returns true when the node is an element whose name is listed in HtmlElements::CONTENT_ELEMENTS.
# File 'lib/jekyll-l10n/translation/block_text_extractor.rb', line 51

def extractable?(node)
  node.element? && HtmlElements::CONTENT_ELEMENTS.include?(node.name)
end
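The predicate can be tried in isolation with a stand-in node. In this sketch `FakeNode`, `SKETCH_CONTENT_ELEMENTS`, and `extractable_sketch?` are hypothetical; in particular, the element names in the set are an assumption, since the actual contents of HtmlElements::CONTENT_ELEMENTS are not shown on this page.

```ruby
require "set"

# Assumed contents; the real HtmlElements::CONTENT_ELEMENTS may differ.
SKETCH_CONTENT_ELEMENTS = Set.new(%w[p li h1 h2 h3 td th blockquote]).freeze

# Stand-in exposing the two methods the predicate relies on: element? and name.
FakeNode = Struct.new(:name, :element) do
  def element?
    element
  end
end

def extractable_sketch?(node)
  node.element? && SKETCH_CONTENT_ELEMENTS.include?(node.name)
end

puts extractable_sketch?(FakeNode.new("p", true))     # prints "true"
puts extractable_sketch?(FakeNode.new("span", true))  # prints "false"
puts extractable_sketch?(FakeNode.new("text", false)) # prints "false"
```

Checking `element?` first means text and comment nodes are rejected before their name is ever compared against the set.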