Class: Jekyll::L10n::HtmlStringExtractor
- Inherits: Object
  - Object
    - Jekyll::L10n::HtmlStringExtractor
- Defined in: lib/jekyll-l10n/extraction/html_string_extractor.rb
Overview
Extracts translatable strings from HTML documents for localization.
HtmlStringExtractor walks the DOM tree of parsed HTML and extracts text content from content elements and values from configurable HTML attributes. It deduplicates entries by msgid and generates a file location reference for each extraction to aid debugging and tracking. Elements that match the configured exclude selectors (CSS selectors) are skipped.
Key responsibilities:
- Parse HTML into DOM tree
- Walk DOM recursively to find translatable content
- Extract text from content elements (p, h1-h6, li, etc.)
- Extract attribute values (title, alt, aria-label, etc.)
- Generate file location references for each extracted string
- Skip elements matching exclude selectors
- Deduplicate entries by msgid
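The walk-and-collect steps above can be sketched roughly as follows. This is a simplified illustration, not the library's walk_dom: it uses a hypothetical Node struct instead of Nokogiri nodes, treats exclude selectors as bare tag names, and assumes a :reference key that simply holds the file path. The CONTENT_ELEMENTS list is likewise an assumed subset.

```ruby
# Hypothetical stand-in for a parsed DOM node (the real extractor uses Nokogiri).
Node = Struct.new(:name, :attrs, :text, :children)

# Small helper so nodes can be built with keyword defaults.
def node(name, text: nil, attrs: {}, children: [])
  Node.new(name, attrs, text, children)
end

# Assumed subset of "content elements"; the real list may differ.
CONTENT_ELEMENTS = %w[p h1 h2 h3 h4 h5 h6 li].freeze

# Recursively collect translatable text and attribute values into entries.
def walk(current, file_path, entries, translatable_attrs, excluded_names)
  # Skip excluded elements together with their whole subtree.
  return if excluded_names.include?(current.name)

  text = current.text.to_s.strip
  if CONTENT_ELEMENTS.include?(current.name) && !text.empty?
    entries << { msgid: text, reference: file_path }
  end

  translatable_attrs.each do |attr|
    value = current.attrs[attr].to_s
    entries << { msgid: value, reference: file_path } unless value.empty?
  end

  current.children.each do |child|
    walk(child, file_path, entries, translatable_attrs, excluded_names)
  end
end

tree = node('body', children: [
  node('h1', text: 'Welcome'),
  node('p', text: 'Read the guide.',
            children: [node('img', attrs: { 'alt' => 'Guide cover' })]),
  node('pre', children: [node('p', text: 'not extracted')])
])

entries = []
walk(tree, 'index.html', entries, %w[title alt aria-label], %w[pre])
msgids = entries.map { |e| e[:msgid] }
# msgids => ["Welcome", "Read the guide.", "Guide cover"]
```

Returning early on an excluded element skips its entire subtree, which is why the paragraph nested inside pre never reaches the entries list.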
Instance Attribute Summary
- #exclude_selectors ⇒ Object (readonly)
  Returns the value of attribute exclude_selectors.
- #translatable_attrs ⇒ Object (readonly)
  Returns the value of attribute translatable_attrs.
Instance Method Summary
- #extract(html, dest, file_path) ⇒ Array<Hash>
  Extract translatable strings from HTML.
- #initialize(translatable_attrs, exclude_selectors) ⇒ HtmlStringExtractor (constructor)
  Initialize a new HtmlStringExtractor.
Constructor Details
#initialize(translatable_attrs, exclude_selectors) ⇒ HtmlStringExtractor
Initialize a new HtmlStringExtractor.
    # File 'lib/jekyll-l10n/extraction/html_string_extractor.rb', line 38
    def initialize(translatable_attrs, exclude_selectors)
      @translatable_attrs = translatable_attrs
      @exclude_selectors = exclude_selectors
    end
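A minimal usage sketch of the constructor follows. So that it runs on its own, it defines a stand-in class that mirrors only the constructor and readers shown in this page; in a real site you would load the plugin instead. The attribute names are taken from the examples in the overview, and the selector values ('code', '.no-translate') are hypothetical.

```ruby
module Jekyll
  module L10n
    # Stand-in that reproduces only the constructor and readers listed above.
    class HtmlStringExtractor
      attr_reader :translatable_attrs, :exclude_selectors

      def initialize(translatable_attrs, exclude_selectors)
        @translatable_attrs = translatable_attrs
        @exclude_selectors = exclude_selectors
      end
    end
  end
end

# Attribute names to extract, and CSS selectors to skip (hypothetical values).
extractor = Jekyll::L10n::HtmlStringExtractor.new(
  %w[title alt aria-label],
  ['code', '.no-translate']
)

puts extractor.translatable_attrs.inspect
puts extractor.exclude_selectors.inspect
```

Both arguments are stored as given and exposed through read-only accessors, so the configuration is fixed once the extractor has been created.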
Instance Attribute Details
#exclude_selectors ⇒ Object (readonly)
Returns the value of attribute exclude_selectors.
    # File 'lib/jekyll-l10n/extraction/html_string_extractor.rb', line 30
    def exclude_selectors
      @exclude_selectors
    end
#translatable_attrs ⇒ Object (readonly)
Returns the value of attribute translatable_attrs.
    # File 'lib/jekyll-l10n/extraction/html_string_extractor.rb', line 30
    def translatable_attrs
      @translatable_attrs
    end
Instance Method Details
#extract(html, dest, file_path) ⇒ Array<Hash>
Extract translatable strings from HTML.
Walks the DOM tree and extracts text nodes from content elements and values from the specified attributes. Each extraction is assigned a file location reference for debugging. Entries are deduplicated by msgid, so multiple occurrences of the same text yield a single entry.
    # File 'lib/jekyll-l10n/extraction/html_string_extractor.rb', line 57
    def extract(html, dest, file_path)
      entries = []
      doc = Nokogiri::HTML(html)
      walk_dom(doc.root, file_path, entries, dest)
      entries.uniq { |e| e[:msgid] }
    end
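The final deduplication step can be tried in isolation. The sketch below applies the same `uniq` call to hand-written entries; only the :msgid key comes from the listing, while the :reference key and its values are hypothetical and used here just to show which duplicate survives.

```ruby
# Hand-written entries; only :msgid is confirmed by the listing above.
entries = [
  { msgid: 'Home',  reference: 'first occurrence' },
  { msgid: 'About', reference: 'only occurrence' },
  { msgid: 'Home',  reference: 'second occurrence' }
]

# Same call as the last line of #extract: one entry per distinct msgid.
unique = entries.uniq { |e| e[:msgid] }

unique.each { |e| puts "#{e[:msgid]} (#{e[:reference]})" }
```

Array#uniq with a block keeps the first element for each distinct block value, so the entry that survives is the first one encountered during the walk, and any location data carried by later duplicates is dropped.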