Class: Jekyll::L10n::Extractor
- Inherits: Object
  - Object
  - Jekyll::L10n::Extractor
- Defined in: lib/jekyll-l10n/extraction/extractor.rb
Overview
String Extraction Orchestrator - Finds translatable strings in generated HTML
The Extractor is the main entry point for the string extraction workflow. It scans all generated HTML files after Jekyll’s build, identifies translatable content (text nodes and configurable HTML attributes), and creates or updates GNU Gettext PO files with the extracted strings.
The extraction workflow:
- Scans all HTML files in the Jekyll output directory (_site/)
- For each HTML file, extracts translatable text and attributes
- Normalizes text for consistent matching across builds
- Creates or updates page-specific PO files in the _locales/ directory
- Optionally applies automatic translations via the LibreTranslate API
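The normalization step matters because the same sentence can be wrapped or indented differently from one build to the next. A minimal sketch of the idea follows, assuming normalization means collapsing whitespace. The `normalize_text` helper is hypothetical, and the gem's actual rules may differ.

```ruby
# Hypothetical helper, not part of jekyll-l10n: the gem's real
# normalization rules are not shown on this page and may differ.
def normalize_text(text)
  # Collapse every run of whitespace (spaces, tabs, newlines) into a
  # single space, then trim both ends.
  text.gsub(/\s+/, ' ').strip
end

wrapped = normalize_text("Hello,\n    world!  ")
inline  = normalize_text('Hello, world!')

# Both variants now map to the same lookup key.
puts wrapped == inline
```

With a rule like this, re-indenting a template would not change the key under which an existing translation is looked up.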
Key responsibilities:
- Load and validate extraction configuration from pages
- Extract text and attributes from HTML with file location references
- Create and update PO files with extracted strings
- Log extraction statistics and progress
- Coordinate with LibreTranslate for automatic translation
Instance Attribute Summary
- #site ⇒ Object (readonly)
  Returns the value of attribute site.
Instance Method Summary
- #default_stats ⇒ Object
- #extract_site ⇒ Hash<Symbol, Integer>
  Extract all translatable strings from the generated site.
- #extract_strings_from_file(file_path, config) ⇒ Object
- #find_libretranslate_config ⇒ Object
- #initialize(site) ⇒ Extractor (constructor)
  Initialize the string extractor.
- #process_all_html_files ⇒ Object
- #process_file(file_path) ⇒ Object
- #translate_all_compendia ⇒ Object
Constructor Details
#initialize(site) ⇒ Extractor
Initialize the string extractor
Sets up the configuration loader and the result-saving infrastructure used during extraction.
# File 'lib/jekyll-l10n/extraction/extractor.rb', line 53
def initialize(site)
  @site = site
  @source = SiteConfigAccessor.source(@site)
  @dest = SiteConfigAccessor.dest(@site)
  @config_loader = ExtractionConfigLoader.new(@site, @dest)
  @result_saver = ExtractionResultSaver.new(@site)
end
Instance Attribute Details
#site ⇒ Object (readonly)
Returns the value of attribute site.
# File 'lib/jekyll-l10n/extraction/extractor.rb', line 46
def site
  @site
end
Instance Method Details
#default_stats ⇒ Object
# File 'lib/jekyll-l10n/extraction/extractor.rb', line 123
def default_stats
  { files_processed: 0, strings_extracted: 0, po_files_created: 0 }
end
#extract_site ⇒ Hash<Symbol, Integer>
Extract all translatable strings from the generated site
Main entry point for extraction. Scans all HTML files in the build output, extracts translatable strings and attributes, creates or updates PO files, and optionally translates strings via the LibreTranslate API. Returns a stats hash with the keys :files_processed, :strings_extracted, and :po_files_created.
# File 'lib/jekyll-l10n/extraction/extractor.rb', line 74
def extract_site
  Jekyll.logger.info 'Localization', 'Extracting translatable strings...'
  start_time = Time.now
  stats = process_all_html_files
  @result_saver.finalize_compendia
  translate_all_compendia
  ExtractionLogger.log_summary(stats, Time.now - start_time)
  stats
end
#extract_strings_from_file(file_path, config) ⇒ Object
# File 'lib/jekyll-l10n/extraction/extractor.rb', line 127
def extract_strings_from_file(file_path, config)
  return [] unless File.exist?(file_path)

  html = FileOperations.read_utf8(file_path)
  exclude_selectors = @config_loader.extract_exclude_selectors(config)
  extractor = HtmlStringExtractor.new(config.translatable_attributes,
                                      exclude_selectors)
  extractor.extract(html, @dest, file_path)
end
#find_libretranslate_config ⇒ Object
# File 'lib/jekyll-l10n/extraction/extractor.rb', line 137
def find_libretranslate_config
  return nil unless @site.respond_to?(:pages)

  @site.pages.each do |page|
    next unless page.data['with_locales'] == true

    config = @config_loader.load_page_config(page.destination(''))
    return config if config.libretranslate_enabled?
  end

  nil
end
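The lookup above returns the configuration of the first page that both opts in with `with_locales: true` and has LibreTranslate enabled. It does not merge settings across pages. The sketch below reproduces that first-match behaviour with stand-in objects. `DemoPage`, `DemoConfig`, and `first_enabled_config` are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-ins for Jekyll pages and loaded page configuration.
DemoPage = Struct.new(:name, :data)
DemoConfig = Struct.new(:source, :enabled) do
  def libretranslate_enabled?
    enabled
  end
end

# Return the config of the first opted-in page whose config enables
# LibreTranslate, or nil when no page qualifies.
def first_enabled_config(pages, configs)
  pages.each do |page|
    # Strict comparison: only a literal `true` opts a page in.
    next unless page.data['with_locales'] == true

    config = configs[page.name]
    return config if config&.libretranslate_enabled?
  end

  nil
end

pages = [
  DemoPage.new('about', { 'with_locales' => 'yes' }), # skipped: not literally true
  DemoPage.new('index', { 'with_locales' => true }),  # opted in, translation disabled
  DemoPage.new('docs',  { 'with_locales' => true })   # first match
]
configs = {
  'about' => DemoConfig.new('about', true),
  'index' => DemoConfig.new('index', false),
  'docs'  => DemoConfig.new('docs', true)
}

puts first_enabled_config(pages, configs).source
```

Because the comparison is `== true`, a front-matter value such as the string `'yes'` would not opt a page in under this pattern.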
#process_all_html_files ⇒ Object
# File 'lib/jekyll-l10n/extraction/extractor.rb', line 84
def process_all_html_files
  stats = { files_processed: 0, strings_extracted: 0, po_files_created: 0 }
  html_files = Dir.glob(File.join(@dest, '**', '*.html'))

  html_files.each do |file_path|
    next if @config_loader.skip_localized_page?(file_path)

    file_stats = process_file(file_path)
    stats[:files_processed] += file_stats[:files_processed]
    stats[:strings_extracted] += file_stats[:strings_extracted]
    stats[:po_files_created] += file_stats[:po_files_created]
  end

  stats
end
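The scan-and-aggregate pattern used here can be tried in isolation. The following sketch globs a temporary directory recursively and sums per-file stats hashes with `Hash#merge!`. It assumes a hypothetical `count_file` helper that merely counts `<p>` tags as a stand-in for real extraction, and `scan_output` and `demo_scan` are likewise illustrative names.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for the per-file work. The real #process_file
# extracts strings and writes PO files; this only counts <p> tags.
def count_file(path)
  {
    files_processed: 1,
    strings_extracted: File.read(path).scan(/<p>/).size,
    po_files_created: 0
  }
end

# Recursively find *.html files under dest and sum their stats.
def scan_output(dest)
  totals = { files_processed: 0, strings_extracted: 0, po_files_created: 0 }
  Dir.glob(File.join(dest, '**', '*.html')).each do |path|
    # The block resolves key collisions by adding the two counters.
    totals.merge!(count_file(path)) { |_key, sum, n| sum + n }
  end
  totals
end

# Build a throwaway output tree and scan it.
def demo_scan
  Dir.mktmpdir do |dest|
    FileUtils.mkdir_p(File.join(dest, 'docs'))
    File.write(File.join(dest, 'index.html'), '<p>Home</p>')
    File.write(File.join(dest, 'docs', 'guide.html'), '<p>One</p><p>Two</p>')
    File.write(File.join(dest, 'style.css'), 'p { margin: 0 }')
    scan_output(dest)
  end
end

p demo_scan
```

The `'**'` segment lets the pattern match files at any depth, including the top level of the output directory, so non-HTML assets are ignored without extra filtering.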
#process_file(file_path) ⇒ Object
# File 'lib/jekyll-l10n/extraction/extractor.rb', line 109
def process_file(file_path)
  return default_stats unless @config_loader.valid_for_extraction?(file_path)

  config = @config_loader.load_page_config(file_path)
  entries = extract_strings_from_file(file_path, config)
  return default_stats if entries.empty?

  page_path = construct_page_path(file_path)
  @result_saver.save_results(config, entries, page_path)
rescue StandardError => e
  ExtractionLogger.log_error(file_path, e)
  default_stats
end
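The method-level `rescue` means a failure in one file yields zeroed stats instead of aborting the whole run. A minimal sketch of that fallback pattern follows. `DEFAULT_STATS` and `safe_process` are hypothetical names, and errors are collected into an array here rather than passed to a logger.

```ruby
DEFAULT_STATS = { files_processed: 0, strings_extracted: 0, po_files_created: 0 }.freeze

# Run the block for one file. On any StandardError, record the failure
# and fall back to zeroed stats so the caller can keep going.
def safe_process(path, errors)
  yield path
rescue StandardError => e
  errors << "#{path}: #{e.message}"
  DEFAULT_STATS.dup
end

errors = []

ok = safe_process('index.html', errors) do |_path|
  { files_processed: 1, strings_extracted: 4, po_files_created: 1 }
end

failed = safe_process('broken.html', errors) do |_path|
  raise ArgumentError, 'bad markup'
end

p ok
p failed
p errors
```

Returning a fresh copy of the zeroed hash keeps the aggregation code in the caller free of nil checks.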
#translate_all_compendia ⇒ Object
# File 'lib/jekyll-l10n/extraction/extractor.rb', line 100
def translate_all_compendia
  config = find_libretranslate_config
  return unless config

  ErrorHandler.handle_with_logging('machine translation') do
    @result_saver.translate_compendia(config)
  end
end