Class: Jekyll::L10n::Extractor
- Inherits: Object
  - Object
  - Jekyll::L10n::Extractor
- Defined in: lib/jekyll-l10n/extraction/extractor.rb
Overview
String Extraction Orchestrator - Finds translatable strings in generated HTML
The Extractor is the main entry point for the string extraction workflow. It scans all generated HTML files after Jekyll’s build, identifies translatable content (text nodes and configurable HTML attributes), and creates or updates GNU Gettext PO files with the extracted strings.
The extraction workflow:
- Scans all HTML files in the Jekyll output directory (_site/)
- For each HTML file, extracts translatable text and attributes
- Normalizes text for consistent matching across builds
- Creates or updates page-specific PO files in the _locales/ directory
- Optionally applies automatic translations via the LibreTranslate API
Key responsibilities:
- Load and validate extraction configuration from pages
- Extract text and attributes from HTML with file location references
- Create and update PO files with extracted strings
- Log extraction statistics and progress
- Coordinate with LibreTranslate for automatic translation
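To make the PO output concrete, here is a minimal sketch of what a single untranslated entry could look like when rendered in GNU Gettext PO syntax. The helpers `po_escape` and `po_entry` are hypothetical names used only for illustration, and the reference format (a bare file name) is an assumption; the plugin's actual writer is not shown on this page.

```ruby
# Hypothetical helper: escape a string for a double-quoted PO string literal.
# Block-form gsub is used so backslashes in the replacement are taken literally.
def po_escape(text)
  text.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }.gsub("\n") { '\\n' }
end

# Hypothetical helper: render one untranslated entry with a "#:" reference
# comment, the source string as msgid, and an empty msgstr.
def po_entry(msgid, reference)
  [
    "#: #{reference}",
    %(msgid "#{po_escape(msgid)}"),
    'msgstr ""'
  ].join("\n")
end

puts po_entry('Read the guide', 'index.html')
```

An empty `msgstr` marks the entry as untranslated, which is the state a translator or a machine translation step would later fill in.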
Instance Attribute Summary
- #site ⇒ Object (readonly): Returns the value of attribute site.
Instance Method Summary
- #default_stats ⇒ Object
- #extract_site ⇒ Hash<Symbol, Integer>: Extract all translatable strings from the generated site.
- #extract_strings_from_file(file_path, config) ⇒ Object
- #find_libretranslate_config ⇒ Object
- #initialize(site) ⇒ Extractor (constructor): Initialize the string extractor.
- #process_all_html_files ⇒ Object
- #process_file(file_path) ⇒ Object
- #translate_all_compendia ⇒ Object
Constructor Details
#initialize(site) ⇒ Extractor
Initialize the string extractor
Sets up configuration and result saving infrastructure for extraction.
    # File 'lib/jekyll-l10n/extraction/extractor.rb', line 53
    def initialize(site)
      @site = site
      @source = SiteConfigAccessor.source(@site)
      @dest = SiteConfigAccessor.dest(@site)
      @config_loader = ExtractionConfigLoader.new(@site, @dest)
      @result_saver = ExtractionResultSaver.new(@site)
    end
Instance Attribute Details
#site ⇒ Object (readonly)
Returns the value of attribute site.
    # File 'lib/jekyll-l10n/extraction/extractor.rb', line 46
    def site
      @site
    end
Instance Method Details
#default_stats ⇒ Object
    # File 'lib/jekyll-l10n/extraction/extractor.rb', line 122
    def default_stats
      { files_processed: 0, strings_extracted: 0, po_files_created: 0 }
    end
#extract_site ⇒ Hash<Symbol, Integer>
Extract all translatable strings from the generated site
Main entry point for extraction. Scans all HTML files in the build output, extracts translatable strings and attributes, creates/updates PO files, and optionally translates strings via LibreTranslate API.
    # File 'lib/jekyll-l10n/extraction/extractor.rb', line 74
    def extract_site
      Jekyll.logger.info 'Localization', 'Extracting translatable strings...'
      start_time = Time.now
      stats = process_all_html_files
      translate_all_compendia
      ExtractionLogger.log_summary(stats, Time.now - start_time)
      stats
    end
#extract_strings_from_file(file_path, config) ⇒ Object
    # File 'lib/jekyll-l10n/extraction/extractor.rb', line 126
    def extract_strings_from_file(file_path, config)
      return [] unless File.exist?(file_path)

      html = FileOperations.read_utf8(file_path)
      exclude_selectors = @config_loader.extract_exclude_selectors(config)
      extractor = HtmlStringExtractor.new(config.translatable_attributes,
                                          exclude_selectors)
      extractor.extract(html, @dest, file_path)
    end
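The guard-then-read shape of this method can be sketched without the plugin's helpers. In the sketch below, `naive_text_runs` is a deliberately simplistic, hypothetical stand-in for `HtmlStringExtractor`: it strips tags with a regular expression instead of parsing the document, and it ignores attributes and exclude selectors. `File.read` with an explicit encoding stands in for `FileOperations.read_utf8`, whose behavior is assumed rather than documented here.

```ruby
require 'tmpdir'

# Hypothetical stand-in for HtmlStringExtractor: replaces every tag with a
# newline and returns the non-empty text runs. Real HTML needs a proper parser.
def naive_text_runs(html)
  html.gsub(/<[^>]*>/, "\n").split("\n").map(&:strip).reject(&:empty?)
end

# Return an empty list for a missing file, otherwise read it as UTF-8
# and hand the markup to the (stand-in) extractor.
def extract_strings(file_path)
  return [] unless File.exist?(file_path)

  html = File.read(file_path, encoding: 'UTF-8')
  naive_text_runs(html)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'page.html')
  File.write(path, '<h1>Title</h1><p>Hello <b>world</b></p>')
  p extract_strings(path)
  p extract_strings(File.join(dir, 'missing.html'))
end
```

Returning an empty array for a missing file lets the caller treat "no file" and "no strings" the same way, which matches the `entries.empty?` check in #process_file.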
#find_libretranslate_config ⇒ Object
    # File 'lib/jekyll-l10n/extraction/extractor.rb', line 136
    def find_libretranslate_config
      return nil unless @site.respond_to?(:pages)

      @site.pages.each do |page|
        next unless page.data['with_locales'] == true

        config = @config_loader.load_page_config(page.destination(''))
        return config if config.libretranslate_enabled?
      end

      nil
    end
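The lookup above returns the first matching configuration and stops. A minimal sketch of that first-match search follows, using hypothetical `StubPage` and `StubConfig` structs in place of Jekyll pages and the plugin's config objects, and a plain hash keyed by path in place of `@config_loader.load_page_config`.

```ruby
# Hypothetical stand-ins for a Jekyll page and a loaded page configuration.
StubPage = Struct.new(:path, :data)
StubConfig = Struct.new(:path, :libretranslate) do
  def libretranslate_enabled?
    libretranslate == true
  end
end

# Walk pages in order and return the first config with LibreTranslate enabled.
def first_libretranslate_config(pages, configs_by_path)
  pages.each do |page|
    # Only a literal `true` opts a page in; truthy values such as 'yes' do not.
    next unless page.data['with_locales'] == true

    config = configs_by_path[page.path]
    return config if config&.libretranslate_enabled?
  end
  nil
end

pages = [
  StubPage.new('about.html', { 'with_locales' => 'yes' }),
  StubPage.new('index.html', { 'with_locales' => true }),
  StubPage.new('docs.html',  { 'with_locales' => true })
]
configs = {
  'about.html' => StubConfig.new('about.html', true),
  'index.html' => StubConfig.new('index.html', false),
  'docs.html'  => StubConfig.new('docs.html', true)
}

found = first_libretranslate_config(pages, configs)
puts found.path
```

One consequence of this design, visible in the original method as well, is that a single page's configuration decides the LibreTranslate settings for the whole translation pass.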
#process_all_html_files ⇒ Object
    # File 'lib/jekyll-l10n/extraction/extractor.rb', line 83
    def process_all_html_files
      stats = { files_processed: 0, strings_extracted: 0, po_files_created: 0 }
      html_files = Dir.glob(File.join(@dest, '**', '*.html'))

      html_files.each do |file_path|
        next if @config_loader.skip_localized_page?(file_path)

        file_stats = process_file(file_path)
        stats[:files_processed] += file_stats[:files_processed]
        stats[:strings_extracted] += file_stats[:strings_extracted]
        stats[:po_files_created] += file_stats[:po_files_created]
      end

      stats
    end
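The scan-and-accumulate loop can be tried in isolation. The sketch below assumes nothing about the plugin beyond the three counter keys: `html_files_under` repeats the recursive glob, and `merge_stats` is a hypothetical helper that sums per-file hashes with `Hash#merge` rather than three explicit `+=` lines. The per-file numbers are made up for the demonstration.

```ruby
require 'tmpdir'
require 'fileutils'

# Zeroed counters, mirroring the keys used by the extractor.
EMPTY_STATS = { files_processed: 0, strings_extracted: 0, po_files_created: 0 }.freeze

# Hypothetical helper: sum per-file stats hashes key by key.
def merge_stats(per_file_stats)
  per_file_stats.reduce(EMPTY_STATS.dup) do |total, file_stats|
    total.merge(file_stats) { |_key, a, b| a + b }
  end
end

# Recursively list every .html file under dest, including the top level.
def html_files_under(dest)
  Dir.glob(File.join(dest, '**', '*.html')).sort
end

Dir.mktmpdir do |dest|
  FileUtils.mkdir_p(File.join(dest, 'docs'))
  File.write(File.join(dest, 'index.html'), '<p>Hello</p>')
  File.write(File.join(dest, 'docs', 'guide.html'), '<p>Guide</p>')
  File.write(File.join(dest, 'style.css'), 'p {}')

  files = html_files_under(dest)
  # Pretend every file yielded two strings and one PO file.
  fake = files.map { |_path| { files_processed: 1, strings_extracted: 2, po_files_created: 1 } }
  puts files.length
  puts merge_stats(fake).inspect
end
```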
#process_file(file_path) ⇒ Object
    # File 'lib/jekyll-l10n/extraction/extractor.rb', line 108
    def process_file(file_path)
      return default_stats unless @config_loader.valid_for_extraction?(file_path)

      config = @config_loader.load_page_config(file_path)
      entries = extract_strings_from_file(file_path, config)
      return default_stats if entries.empty?

      page_path = construct_page_path(file_path)
      @result_saver.save_results(config, entries, page_path)
    rescue StandardError => e
      ExtractionLogger.log_error(file_path, e)
      default_stats
    end
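The method-level `rescue` means a failure in one file is logged and contributes zeroed counters instead of aborting the run. The following sketch isolates that pattern. `process_with_fallback` is a hypothetical name, the block stands in for the extraction step, and an array of messages stands in for `ExtractionLogger`. The shape of the success hash is an assumption, since the return value of `save_results` is not shown on this page.

```ruby
# Zeroed counters returned whenever a file contributes nothing.
def default_stats
  { files_processed: 0, strings_extracted: 0, po_files_created: 0 }
end

# Hypothetical sketch: run the given block to get entries for one file and
# fall back to default_stats when there is nothing to save or an error occurs.
def process_with_fallback(file_path, errors)
  entries = yield(file_path)
  return default_stats if entries.empty?

  # Assumed success shape: one file, one PO file, one count per entry.
  { files_processed: 1, strings_extracted: entries.length, po_files_created: 1 }
rescue StandardError => e
  # Record the failure and keep going with the next file.
  errors << "#{file_path}: #{e.message}"
  default_stats
end

errors = []
ok = process_with_fallback('index.html', errors) { |_path| %w[Hello World] }
empty = process_with_fallback('blank.html', errors) { |_path| [] }
failed = process_with_fallback('broken.html', errors) { |_path| raise ArgumentError, 'invalid byte sequence' }

puts ok.inspect
puts failed.inspect
puts errors.inspect
```

Because only `StandardError` is rescued, signals and other non-standard exceptions still interrupt the build.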
#translate_all_compendia ⇒ Object
    # File 'lib/jekyll-l10n/extraction/extractor.rb', line 99
    def translate_all_compendia
      config = find_libretranslate_config
      return unless config

      ErrorHandler.handle_with_logging('machine translation') do
        @result_saver.translate_compendia(config)
      end
    end