Class: Jekyll::L10n::PoFileReader
- Inherits: Object
  - Object
  - Jekyll::L10n::PoFileReader
- Defined in: lib/jekyll-l10n/po_file/reader.rb
Overview
Parses GNU Gettext PO files into translation hashes.
PoFileReader reads and parses PO files in standard GNU Gettext format, supporting multiple parsing modes: simple (msgid -> msgstr), with references (for debugging), and with merge metadata (including fuzzy flags). It handles multi-line strings, comment extraction, and various escape sequences. Both instance-based and class-based APIs are supported for backward compatibility.
Key responsibilities:
- Parse PO files into translation hashes
- Handle msgid/msgstr pairs with continuation lines
- Extract and preserve reference comments (file location references)
- Extract and preserve fuzzy flags during merging
- Parse multi-line strings with proper escaping
- Support three modes: simple, with_references, and for_merge
- Handle both file paths and inline content strings
Constant Summary
- MSGID_PATTERN =
/^msgid ['"](.*)['"] *$/.freeze
- MSGSTR_PATTERN =
/^msgstr ['"](.*)['"] *$/.freeze
- NO_REFERENCE =
nil
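As a rough illustration of what the two patterns capture, the sketch below copies the regexes from this page into a standalone script. The sample lines and the result constant names are made up for the example.

```ruby
# Patterns copied from the constant summary above.
MSGID_PATTERN = /^msgid ['"](.*)['"] *$/.freeze
MSGSTR_PATTERN = /^msgstr ['"](.*)['"] *$/.freeze

# Hypothetical sample lines, not taken from a real catalog.
SINGLE_CAPTURE = 'msgid "Hello, world"'.match(MSGID_PATTERN)[1]
HEADER_CAPTURE = 'msgid ""'.match(MSGID_PATTERN)[1]
QUOTED_CAPTURE = "msgstr 'Bonjour'".match(MSGSTR_PATTERN)[1]

# A keyword that merely starts with "msgid" does not match, because the
# pattern requires a space directly after the keyword.
NO_MATCH = 'msgid_plural "Hellos"'.match(MSGID_PATTERN)

puts SINGLE_CAPTURE
puts QUOTED_CAPTURE
```

Both patterns accept either quote character as the delimiter, and the capture group holds the text between the first and last quote on the line.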
Class Method Summary
- .build_translation_entry(msgstr, reference, fuzzy, with_metadata) ⇒ Object
- .collect_continuation_lines(lines, start_idx, values, delimiter) ⇒ Object
- .continuation_line?(line) ⇒ Boolean
- .extract_metadata_before_msgid(lines, msgid_idx, include_fuzzy: false) ⇒ Object
  Unified metadata extraction: extracts reference and optionally fuzzy flag.
- .extract_msgid_and_continuation(lines, start_idx) ⇒ Object
- .extract_msgstr_and_continuation(lines, start_idx) ⇒ Object
- .extract_po_field(lines, start_idx, pattern) ⇒ Object
- .extract_reference_and_fuzzy_before_msgid(lines, msgid_idx) ⇒ Object
  Backward compatibility wrapper.
- .extract_reference_before_msgid(lines, msgid_idx) ⇒ Object
  Backward compatibility wrapper.
- .extract_reference_from_line(comment_line) ⇒ Object
- .fuzzy_line?(comment_line) ⇒ Boolean
- .msgid_line?(line) ⇒ Boolean
- .parse(po_path) ⇒ Hash
  Parse a PO file (class method, for backward compatibility).
- .parse_for_merge(po_path) ⇒ Hash
  Parse a PO file for merging (class method, for backward compatibility).
- .parse_with_references(po_path) ⇒ Hash
  Parse a PO file with references (class method, for backward compatibility).
- .process_line(lines, idx, translations) ⇒ Object
  Backward compatibility wrapper.
- .process_line_for_merge(lines, idx, translations) ⇒ Object
- .process_line_internal(lines, idx, translations, with_references) ⇒ Object
- .process_line_with_reference(lines, idx, translations) ⇒ Object
  Backward compatibility wrappers.
- .process_msgid_msgstr_pair(lines, start_idx, translations, reference: nil, fuzzy: nil, with_mode: false) ⇒ Object
  Unified method for processing msgid/msgstr pairs with optional reference and fuzzy metadata. with_mode selects the storage format: false (default, simple format), true (with reference), or :merge (with reference and fuzzy flag).
- .process_msgid_msgstr_pair_internal(lines, start_idx, translations, reference = nil, fuzzy = nil) ⇒ Object
  Backward compatibility alias.
- .process_msgid_msgstr_pair_with_metadata(lines, start_idx, translations, reference: nil, fuzzy: nil) ⇒ Object
  Backward compatibility alias.
- .process_msgid_msgstr_pair_with_reference(lines, start_idx, translations, reference) ⇒ Object
  Backward compatibility alias.
- .process_msgid_with_references(lines, idx, translations, with_references) ⇒ Object
- .process_po_lines(content) ⇒ Object
  Backward compatibility wrapper.
- .process_po_lines_for_merge(content) ⇒ Object
- .process_po_lines_internal(content, with_references) ⇒ Object
- .process_po_lines_with_references(content) ⇒ Object
- .read_po_file(po_path) ⇒ Object
- .skip_blank_and_comments(lines, idx) ⇒ Object
  Skips blank lines and comments before processing entries.
- .split_lines(content) ⇒ Object
  Splits PO file content into individual lines.
- .stop_collecting?(line) ⇒ Boolean
  Determines if we should stop collecting continuation lines.
- .store_translation(translations, msgid, msgstr, reference: nil, fuzzy: nil, with_metadata: false) ⇒ Object
  Store translation with optional metadata (reference location and fuzzy flag).
- .store_translation_internal(translations, msgid, msgstr, reference: nil, fuzzy: nil) ⇒ Object
  Kept for backward compatibility with existing tests. Supports both positional and keyword argument calling styles.
- .store_translation_with_fuzzy(translations, msgid, msgstr, fuzzy:) ⇒ Object
- .store_translation_with_reference(translations, msgid, msgstr, reference:) ⇒ Object
  Backward compatibility wrappers for old method signatures.
- .store_translation_with_reference_and_fuzzy(translations, msgid, msgstr, reference:, fuzzy:) ⇒ Object
- .unescape_string(str, delimiter) ⇒ Object
  Unescape PO file string values containing escape sequences.
Instance Method Summary
- #initialize(po_path_or_content = nil) ⇒ PoFileReader (constructor)
  Initialize a new PoFileReader.
- #parse ⇒ Hash
  Parse PO file into simple translation hash.
- #parse_for_merge ⇒ Hash
  Parse PO file with all metadata for merging.
- #parse_with_references ⇒ Hash
  Parse PO file with reference comments preserved.
Constructor Details
#initialize(po_path_or_content = nil) ⇒ PoFileReader
Initialize a new PoFileReader.
Accepts either a file path (if file exists) or inline PO content. Determines which based on whether the path exists in the filesystem. Defaults to nil, which initializes the reader with empty content.
# File 'lib/jekyll-l10n/po_file/reader.rb', line 47
def initialize(po_path_or_content = nil)
  # Support both file path and content string
  if po_path_or_content && File.exist?(po_path_or_content.to_s)
    @po_path = po_path_or_content
    @content = nil
  else
    @content = po_path_or_content
    @po_path = nil
  end
end
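The path-or-content decision can be tried out in isolation. The sketch below is a minimal standalone version of the check, assuming nothing beyond the Ruby standard library; `classify_source` and `classify_demo` are hypothetical names, not part of PoFileReader.

```ruby
require 'tempfile'

# Hypothetical helper mirroring the constructor's check: an argument is
# treated as a path only when a file with that name exists.
def classify_source(po_path_or_content)
  if po_path_or_content && File.exist?(po_path_or_content.to_s)
    :path
  else
    :content
  end
end

# Runs the check against a real temporary file, inline PO text, a path
# that does not exist, and nil.
def classify_demo
  Tempfile.create(['messages', '.po']) do |file|
    {
      existing_file: classify_source(file.path),
      inline_po: classify_source(%(msgid "Hi"\nmsgstr "Salut"\n)),
      missing_file: classify_source('no/such/file.po'),
      nothing: classify_source(nil)
    }
  end
end

p classify_demo
```

One consequence of this design is that a path that does not exist is classified as inline content rather than reported as a missing file, so callers who need that distinction would have to check for the file themselves.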
Class Method Details
.build_translation_entry(msgstr, reference, fuzzy, with_metadata) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 414
def self.build_translation_entry(msgstr, reference, fuzzy, with_metadata)
  # Simple format when no metadata requested and none provided
  return msgstr if !with_metadata && reference.nil? && fuzzy.nil?

  # Build metadata hash based on what's provided
  entry = { msgstr: msgstr }
  entry[:reference] = reference unless reference.nil?
  entry[:fuzzy] = fuzzy unless fuzzy.nil?
  entry[:comment] = nil if !fuzzy.nil? || !reference.nil?
  entry
end
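To see which entry shape each argument combination produces, the following sketch re-implements the logic as a standalone method. It is renamed `build_entry` to make clear that it is an illustration, and the fourth parameter is assumed to be `with_metadata`, as in the method signature.

```ruby
# Standalone copy of the entry-building logic, for illustration only.
def build_entry(msgstr, reference, fuzzy, with_metadata)
  # Bare string when no metadata is requested and none is supplied.
  return msgstr if !with_metadata && reference.nil? && fuzzy.nil?

  entry = { msgstr: msgstr }
  entry[:reference] = reference unless reference.nil?
  entry[:fuzzy] = fuzzy unless fuzzy.nil?
  # A nil comment slot is added whenever any metadata value is present.
  entry[:comment] = nil if !fuzzy.nil? || !reference.nil?
  entry
end

p build_entry('Bonjour', nil, nil, false)
p build_entry('Bonjour', nil, nil, true)
p build_entry('Bonjour', 'index.html:10', nil, true)
p build_entry('Bonjour', 'index.html:10', false, true)
```

The first call yields the bare string, the second a hash holding only `msgstr`, and the last two add `reference`, `fuzzy`, and a `comment` key set to nil as the corresponding values become available.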
.collect_continuation_lines(lines, start_idx, values, delimiter) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 333
def self.collect_continuation_lines(lines, start_idx, values, delimiter)
  idx = start_idx

  while idx < lines.length
    cont_line = lines[idx].strip

    break if stop_collecting?(cont_line)
    break unless continuation_line?(cont_line)

    unescaped = unescape_string(cont_line[1...-1], delimiter)
    values << unescaped
    idx += 1
  end

  combined_value = values.join

  {
    value: combined_value,
    next_line: idx
  }
end
.continuation_line?(line) ⇒ Boolean
# File 'lib/jekyll-l10n/po_file/reader.rb', line 354
def self.continuation_line?(line)
  (line.start_with?('"') && line.end_with?('"')) ||
    (line.start_with?("'") && line.end_with?("'"))
end
.extract_metadata_before_msgid(lines, msgid_idx, include_fuzzy: false) ⇒ Object
Unified metadata extraction: extracts reference and optionally fuzzy flag
# File 'lib/jekyll-l10n/po_file/reader.rb', line 274
def self.extract_metadata_before_msgid(lines, msgid_idx, include_fuzzy: false)
  reference = nil
  fuzzy = false
  comments_end = msgid_idx - 1

  while comments_end >= 0
    comment_line = lines[comments_end].strip
    break unless comment_line.start_with?('#') || comment_line.empty?

    reference = extract_reference_from_line(comment_line) || reference
    fuzzy = true if include_fuzzy && fuzzy_line?(comment_line)
    comments_end -= 1
  end

  include_fuzzy ? [reference, fuzzy] : reference
end
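The backward scan over the comment lines preceding a msgid can be sketched as a standalone method. `metadata_before` is a hypothetical name, the reference and fuzzy checks are inlined from the helper methods documented on this page, and the sample lines are made up.

```ruby
# Walks upward from the line before the msgid, collecting the reference
# ("#:" comment) and the fuzzy flag ("#," comment containing "fuzzy").
def metadata_before(lines, msgid_idx)
  reference = nil
  fuzzy = false
  idx = msgid_idx - 1

  while idx >= 0
    line = lines[idx].strip
    # Stop at the first line that is neither a comment nor blank.
    break unless line.start_with?('#') || line.empty?

    reference = line.sub(/^#:\s*/, '').strip if line.start_with?('#:')
    fuzzy = true if line.start_with?('#,') && line.include?('fuzzy')
    idx -= 1
  end

  [reference, fuzzy]
end

SAMPLE_LINES = [
  '#: _layouts/default.html:12',
  '#, fuzzy',
  'msgid "Home"',
  'msgstr "Accueil"'
].freeze

p metadata_before(SAMPLE_LINES, 2)
```

Because the scan runs bottom-up and each `#:` line overwrites the previous value, an entry with several reference lines ends up with the topmost one only.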
.extract_msgid_and_continuation(lines, start_idx) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 310
def self.extract_msgid_and_continuation(lines, start_idx)
  extract_po_field(lines, start_idx, MSGID_PATTERN)
end
.extract_msgstr_and_continuation(lines, start_idx) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 314
def self.extract_msgstr_and_continuation(lines, start_idx)
  extract_po_field(lines, start_idx, MSGSTR_PATTERN)
end
.extract_po_field(lines, start_idx, pattern) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 318
def self.extract_po_field(lines, start_idx, pattern)
  line = lines[start_idx].strip
  match = line.match(pattern)

  if match
    delimiter = line[match.begin(1) - 1]
    values = [unescape_string(match[1], delimiter)]
  else
    values = []
    delimiter = '"'
  end

  collect_continuation_lines(lines, start_idx + 1, values, delimiter)
end
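The delimiter lookup relies on the position of the first capture group: the character just before it is the opening quote. A small sketch of that step alone, with `delimiter_of` as a hypothetical helper name:

```ruby
MSGID_PATTERN = /^msgid ['"](.*)['"] *$/.freeze

# Returns the quote character that opens the msgid value, falling back to
# a double quote when the line does not match the pattern.
def delimiter_of(line)
  match = line.match(MSGID_PATTERN)
  match ? line[match.begin(1) - 1] : '"'
end

puts delimiter_of('msgid "Hi"')
puts delimiter_of("msgid 'Hi'")
```

The detected delimiter is what later decides which quote escape the unescaping step looks for.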
.extract_reference_and_fuzzy_before_msgid(lines, msgid_idx) ⇒ Object
Backward compatibility wrapper
# File 'lib/jekyll-l10n/po_file/reader.rb', line 306
def self.extract_reference_and_fuzzy_before_msgid(lines, msgid_idx)
  extract_metadata_before_msgid(lines, msgid_idx, include_fuzzy: true)
end
.extract_reference_before_msgid(lines, msgid_idx) ⇒ Object
Backward compatibility wrapper
# File 'lib/jekyll-l10n/po_file/reader.rb', line 301
def self.extract_reference_before_msgid(lines, msgid_idx)
  extract_metadata_before_msgid(lines, msgid_idx, include_fuzzy: false)
end
.extract_reference_from_line(comment_line) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 292
def self.extract_reference_from_line(comment_line)
  comment_line.sub(/^#:\s*/, '').strip if comment_line.start_with?('#:')
end
.fuzzy_line?(comment_line) ⇒ Boolean
# File 'lib/jekyll-l10n/po_file/reader.rb', line 296
def self.fuzzy_line?(comment_line)
  comment_line.start_with?('#,') && comment_line.include?('fuzzy')
end
.msgid_line?(line) ⇒ Boolean
# File 'lib/jekyll-l10n/po_file/reader.rb', line 126
def self.msgid_line?(line)
  line.start_with?('msgid ')
end
.parse(po_path) ⇒ Hash
Parse a PO file (class method, for backward compatibility).
# File 'lib/jekyll-l10n/po_file/reader.rb', line 96
def self.parse(po_path)
  new(po_path).parse
end
.parse_for_merge(po_path) ⇒ Hash
Parse a PO file for merging (class method, for backward compatibility).
# File 'lib/jekyll-l10n/po_file/reader.rb', line 112
def self.parse_for_merge(po_path)
  new(po_path).parse_for_merge
end
.parse_with_references(po_path) ⇒ Hash
Parse a PO file with references (class method, for backward compatibility).
# File 'lib/jekyll-l10n/po_file/reader.rb', line 104
def self.parse_with_references(po_path)
  new(po_path).parse_with_references
end
.process_line(lines, idx, translations) ⇒ Object
Backward compatibility wrapper
# File 'lib/jekyll-l10n/po_file/reader.rb', line 122
def self.process_line(lines, idx, translations)
  process_line_internal(lines, idx, translations, false)
end
.process_line_for_merge(lines, idx, translations) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 234
def self.process_line_for_merge(lines, idx, translations)
  process_line_internal(lines, idx, translations, :merge)
end
.process_line_internal(lines, idx, translations, with_references) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 238
def self.process_line_internal(lines, idx, translations, with_references)
  return idx + 1 if idx >= lines.length

  idx = skip_blank_and_comments(lines, idx)
  return idx if idx >= lines.length

  line = lines[idx].strip
  if msgid_line?(line)
    process_msgid_with_references(lines, idx, translations, with_references)
  else
    idx + 1
  end
end
.process_line_with_reference(lines, idx, translations) ⇒ Object
Backward compatibility wrappers
# File 'lib/jekyll-l10n/po_file/reader.rb', line 230
def self.process_line_with_reference(lines, idx, translations)
  process_line_internal(lines, idx, translations, true)
end
.process_msgid_msgstr_pair(lines, start_idx, translations, reference: nil, fuzzy: nil, with_mode: false) ⇒ Object
Unified method for processing msgid/msgstr pairs with optional reference and fuzzy metadata. with_mode selects the storage format: false (default, simple format), true (with reference), or :merge (with reference and fuzzy flag).
# File 'lib/jekyll-l10n/po_file/reader.rb', line 151
def self.process_msgid_msgstr_pair(lines, start_idx, translations,
                                   reference: nil, fuzzy: nil, with_mode: false)
  # rubocop:enable Metrics/ParameterLists, Metrics/AbcSize, Metrics/PerceivedComplexity
  # Handle nil sentinel values (from NO_REFERENCE constant)
  reference = nil if reference == NO_REFERENCE
  fuzzy = nil if fuzzy == NO_REFERENCE

  msgid = extract_msgid_and_continuation(lines, start_idx)
  msgid_value = msgid[:value]
  i = msgid[:next_line]

  if i < lines.length && lines[i].strip.start_with?('msgstr ')
    msgstr = extract_msgstr_and_continuation(lines, i)
    msgstr_value = msgstr[:value]
    i = msgstr[:next_line]

    # Always use metadata format if in reference or merge mode
    with_metadata = with_mode == true || with_mode == :merge || !reference.nil? || !fuzzy.nil?
    store_translation(
      translations, msgid_value, msgstr_value,
      reference: reference, fuzzy: fuzzy, with_metadata: with_metadata
    )
  else
    i += 1
  end

  i
end
.process_msgid_msgstr_pair_internal(lines, start_idx, translations, reference = nil, fuzzy = nil) ⇒ Object
Backward compatibility alias
# File 'lib/jekyll-l10n/po_file/reader.rb', line 196
def self.process_msgid_msgstr_pair_internal(lines, start_idx, translations,
                                            reference = nil, fuzzy = nil)
  process_msgid_msgstr_pair(lines, start_idx, translations,
                            reference: reference, fuzzy: fuzzy)
end
.process_msgid_msgstr_pair_with_metadata(lines, start_idx, translations, reference: nil, fuzzy: nil) ⇒ Object
Backward compatibility alias
# File 'lib/jekyll-l10n/po_file/reader.rb', line 181
def self.process_msgid_msgstr_pair_with_metadata(lines, start_idx, translations,
                                                 reference: nil, fuzzy: nil)
  process_msgid_msgstr_pair(lines, start_idx, translations,
                            reference: reference, fuzzy: fuzzy, with_mode: :merge)
end
.process_msgid_msgstr_pair_with_reference(lines, start_idx, translations, reference) ⇒ Object
Backward compatibility alias
# File 'lib/jekyll-l10n/po_file/reader.rb', line 188
def self.process_msgid_msgstr_pair_with_reference(lines, start_idx, translations, reference)
  process_msgid_msgstr_pair(
    lines, start_idx, translations,
    reference: reference, fuzzy: nil, with_mode: true
  )
end
.process_msgid_with_references(lines, idx, translations, with_references) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 253
def self.process_msgid_with_references(lines, idx, translations, with_references)
  case with_references
  when true
    reference = extract_reference_before_msgid(lines, idx)
    process_msgid_msgstr_pair(
      lines, idx, translations,
      reference: reference, fuzzy: nil, with_mode: true
    )
  when :merge
    reference, fuzzy = extract_reference_and_fuzzy_before_msgid(lines, idx)
    process_msgid_msgstr_pair(
      lines, idx, translations,
      reference: reference, fuzzy: fuzzy, with_mode: :merge
    )
  else
    process_msgid_msgstr_pair(lines, idx, translations,
                              reference: nil, fuzzy: nil, with_mode: false)
  end
end
.process_po_lines(content) ⇒ Object
Backward compatibility wrapper
# File 'lib/jekyll-l10n/po_file/reader.rb', line 117
def self.process_po_lines(content)
  process_po_lines_internal(content, false)
end
.process_po_lines_for_merge(content) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 206
def self.process_po_lines_for_merge(content)
  process_po_lines_internal(content, :merge)
end
.process_po_lines_internal(content, with_references) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 210
def self.process_po_lines_internal(content, with_references)
  translations = {}
  lines = split_lines(content)

  i = 0
  while i < lines.length
    i = case with_references
        when true
          process_line_internal(lines, i, translations, true)
        when :merge
          process_line_internal(lines, i, translations, :merge)
        else
          process_line_internal(lines, i, translations, false)
        end
  end

  translations
end
.process_po_lines_with_references(content) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 202
def self.process_po_lines_with_references(content)
  process_po_lines_internal(content, true)
end
.read_po_file(po_path) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 385
def self.read_po_file(po_path)
  FileOperations.read_utf8(po_path)
end
.skip_blank_and_comments(lines, idx) ⇒ Object
Skips blank lines and comments before processing entries
# File 'lib/jekyll-l10n/po_file/reader.rb', line 136
def self.skip_blank_and_comments(lines, idx)
  return idx if idx >= lines.length

  line = lines[idx].strip
  if line.empty? || line.start_with?('#')
    idx + 1
  else
    idx
  end
end
.split_lines(content) ⇒ Object
Splits PO file content into individual lines
# File 'lib/jekyll-l10n/po_file/reader.rb', line 131
def self.split_lines(content)
  content.split("\n")
end
.stop_collecting?(line) ⇒ Boolean
Determines if we should stop collecting continuation lines
# File 'lib/jekyll-l10n/po_file/reader.rb', line 453
def self.stop_collecting?(line)
  line.empty? ||
    line.start_with?('#') ||
    line.start_with?('msgid') ||
    line.start_with?('msgstr')
end
.store_translation(translations, msgid, msgstr, reference: nil, fuzzy: nil, with_metadata: false) ⇒ Object
Store translation with optional metadata (reference location and fuzzy flag).
Three Storage Formats
- Simple: { msgid => msgstr }. Used during translation lookup: minimal memory, fastest access.
- With Reference: { msgid => { msgstr: "...", reference: "file.html:10" } }. Used during extraction: preserves source location for debugging.
- With Merge Metadata: { msgid => { msgstr: "...", reference: "...", fuzzy: false } }. Used during merging: tracks the fuzzy flag for incomplete translations.
In the two metadata formats, build_translation_entry also adds a comment key set to nil whenever a reference or fuzzy value is present.
Why three formats instead of one?
- Memory efficiency: the simple format is used most often (translation lookup)
- Flexibility: different parsing modes are handled without wasting storage
- Backward compatibility: legacy calling conventions remain supported
# File 'lib/jekyll-l10n/po_file/reader.rb', line 406
def self.store_translation(translations, msgid, msgstr,
                           reference: nil, fuzzy: nil, with_metadata: false)
  # rubocop:enable Metrics/ParameterLists
  return if msgid.nil? || msgstr.nil? || msgid.empty?

  translations[msgid] = build_translation_entry(msgstr, reference, fuzzy, with_metadata)
end
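Since a stored entry may be either a bare string or a metadata hash, code that consumes the result has to cope with both. The sketch below shows one way to do that; `msgstr_of` and `sample_entries` are hypothetical names, and the sample values are made up.

```ruby
# Hypothetical helper, not part of PoFileReader: returns the translated
# string from either storage format.
def msgstr_of(entry)
  entry.is_a?(Hash) ? entry[:msgstr] : entry
end

# Made-up entries in each of the three shapes. The comment: nil key
# follows the build_translation_entry source shown on this page.
def sample_entries
  {
    simple: 'Accueil',
    with_reference: { msgstr: 'Accueil', reference: 'file.html:10', comment: nil },
    with_merge_metadata: { msgstr: 'Accueil', reference: 'file.html:10',
                           fuzzy: false, comment: nil }
  }
end

sample_entries.each do |format, entry|
  puts "#{format}: #{msgstr_of(entry)}"
end
```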
.store_translation_internal(translations, msgid, msgstr, reference: nil, fuzzy: nil) ⇒ Object
Kept for backward compatibility with existing tests. Supports both positional and keyword argument calling styles.
# File 'lib/jekyll-l10n/po_file/reader.rb', line 428
def self.store_translation_internal(translations, msgid, msgstr, reference: nil, fuzzy: nil)
  # Handle nil sentinel values (from NO_REFERENCE constant)
  reference = nil if reference == NO_REFERENCE
  fuzzy = nil if fuzzy == NO_REFERENCE
  store_translation(translations, msgid, msgstr, reference: reference, fuzzy: fuzzy)
end
.store_translation_with_fuzzy(translations, msgid, msgstr, fuzzy:) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 441
def self.store_translation_with_fuzzy(translations, msgid, msgstr, fuzzy:)
  store_translation(translations, msgid, msgstr,
                    reference: nil, fuzzy: fuzzy, with_metadata: true)
end
.store_translation_with_reference(translations, msgid, msgstr, reference:) ⇒ Object
Backward compatibility wrappers for old method signatures
# File 'lib/jekyll-l10n/po_file/reader.rb', line 436
def self.store_translation_with_reference(translations, msgid, msgstr, reference:)
  store_translation(translations, msgid, msgstr,
                    reference: reference, fuzzy: nil, with_metadata: true)
end
.store_translation_with_reference_and_fuzzy(translations, msgid, msgstr, reference:, fuzzy:) ⇒ Object
# File 'lib/jekyll-l10n/po_file/reader.rb', line 446
def self.store_translation_with_reference_and_fuzzy(translations, msgid, msgstr,
                                                    reference:, fuzzy:)
  store_translation(translations, msgid, msgstr,
                    reference: reference, fuzzy: fuzzy, with_metadata: true)
end
.unescape_string(str, delimiter) ⇒ Object
Unescape PO file string values containing escape sequences.
IMPORTANT: Order Matters for Correctness
Escaped quotes must be unescaped BEFORE escaped backslashes. The two substitutions each run over the whole string, so unescaping backslashes first loses the information about which backslashes belonged to an escape sequence and which were literal text.
Example: the raw value Say \\" Hello (an escaped backslash followed by a quote character).
Wrong order (backslashes first): \\ becomes \, leaving Say \" Hello. The quote pass then treats that remaining backslash as the start of an escaped quote and produces Say " Hello, so the literal backslash is lost.
Correct order (quotes first): the quote pass rewrites \" to ", leaving Say \" Hello. No \\ pair remains, so the backslash pass changes nothing and the literal backslash survives.
The delimiter decides which quote escape is recognized:
- Single quotes: \' and \\
- Double quotes: \" and \\
# File 'lib/jekyll-l10n/po_file/reader.rb', line 377
def self.unescape_string(str, delimiter)
  if delimiter == "'"
    str.gsub("\\'", "'").gsub('\\\\', '\\')
  else
    str.gsub('\\"', '"').gsub('\\\\', '\\')
  end
end
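The effect of the substitution order can be checked with a small standalone comparison. Both methods below are illustrative copies of the double-quote branch with hypothetical names; the second one simply swaps the two `gsub` calls.

```ruby
# Quotes first, then backslashes (the order used by unescape_string).
def unescape_quotes_first(str)
  str.gsub('\\"', '"').gsub('\\\\', '\\')
end

# The same two substitutions in the opposite order.
def unescape_backslashes_first(str)
  str.gsub('\\\\', '\\').gsub('\\"', '"')
end

# Raw text with two backslashes followed by a quote character.
# In a single-quoted Ruby literal, each \\ stands for one backslash.
RAW = 'Say \\\\" Hello'

puts unescape_quotes_first(RAW)
puts unescape_backslashes_first(RAW)
```

With this input the quotes-first version keeps one backslash in front of the quote, while the swapped version drops it. For values that contain only properly escaped quotes, such as `He said \"hi\"`, the two orders should agree.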
Instance Method Details
#parse ⇒ Hash
Parse PO file into simple translation hash.
Returns hash mapping msgid strings to msgstr strings (no metadata).
# File 'lib/jekyll-l10n/po_file/reader.rb', line 63
def parse
  content = load_content
  process_po_lines_instance(content, false)
end
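For a feel of what simple-mode parsing produces, here is a deliberately simplified standalone parser. It is not the library's code: `parse_simple` and `read_quoted` are hypothetical names, only double-quoted strings are handled, and no unescaping is performed. The sample catalog is made up.

```ruby
# Reads one field (msgid or msgstr) plus any quoted continuation lines.
# Returns the joined value and the index of the next unread line.
def read_quoted(lines, idx, keyword)
  value = lines[idx][/^#{keyword} "(.*)" *$/, 1].to_s
  idx += 1
  while idx < lines.length && lines[idx].start_with?('"') && lines[idx].end_with?('"')
    value += lines[idx][1...-1]
    idx += 1
  end
  [value, idx]
end

# Builds a { msgid => msgstr } hash, skipping entries with an empty msgid
# (such as the catalog header).
def parse_simple(content)
  lines = content.split("\n").map(&:strip)
  translations = {}
  idx = 0

  while idx < lines.length
    unless lines[idx].start_with?('msgid ')
      idx += 1
      next
    end

    msgid, idx = read_quoted(lines, idx, 'msgid')
    next unless idx < lines.length && lines[idx].start_with?('msgstr ')

    msgstr, idx = read_quoted(lines, idx, 'msgstr')
    translations[msgid] = msgstr unless msgid.empty?
  end

  translations
end

SAMPLE_PO = <<~'PO'
  # French catalog (made-up sample)
  msgid ""
  msgstr ""
  "Content-Type: text/plain; charset=UTF-8\n"

  #: index.html:3
  msgid "Home"
  msgstr "Accueil"

  msgid ""
  "Read "
  "more"
  msgstr "Lire la suite"
PO

p parse_simple(SAMPLE_PO)
```

The header entry is dropped because its msgid is empty, and the multi-line msgid is joined into a single key before being stored.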
#parse_for_merge ⇒ Hash
Parse PO file with all metadata for merging.
Returns hash mapping msgid strings to metadata hashes containing msgstr, reference, and fuzzy flag. Used when merging with existing translations.
# File 'lib/jekyll-l10n/po_file/reader.rb', line 87
def parse_for_merge
  content = load_content
  process_po_lines_instance(content, :merge)
end
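A typical use of the merge-mode result is separating fuzzy entries from finished ones. The sketch below works on a hand-written hash shaped like the metadata entries described on this page; `MERGE_RESULT` and `partition_by_fuzzy` are hypothetical names, and no library call is involved.

```ruby
# Made-up merge-mode result.
MERGE_RESULT = {
  'Home' => { msgstr: 'Accueil', reference: 'index.html:3', fuzzy: false, comment: nil },
  'About' => { msgstr: 'Infos', reference: 'about.html:1', fuzzy: true, comment: nil }
}.freeze

# Splits entries into those flagged fuzzy (needing review) and the rest.
def partition_by_fuzzy(entries)
  fuzzy, ready = entries.partition { |_msgid, entry| entry[:fuzzy] }
  [fuzzy.to_h, ready.to_h]
end

needs_review, ready = partition_by_fuzzy(MERGE_RESULT)
puts "needs review: #{needs_review.keys.join(', ')}"
puts "ready: #{ready.keys.join(', ')}"
```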
#parse_with_references ⇒ Hash
Parse PO file with reference comments preserved.
Returns hash mapping msgid strings to metadata hashes containing msgstr and reference (file location for debugging).
# File 'lib/jekyll-l10n/po_file/reader.rb', line 75
def parse_with_references
  content = load_content
  process_po_lines_instance(content, true)
end