Class: Jekyll::L10n::PoFileReader

Inherits:
Object
Defined in:
lib/jekyll-l10n/po_file/reader.rb

Overview

Parses GNU Gettext PO files into translation hashes.

PoFileReader reads and parses PO files in standard GNU Gettext format and supports three parsing modes: simple (msgid -> msgstr), with references (for debugging), and with merge metadata (references plus fuzzy flags). It handles multi-line strings, comment extraction, and escaped quotes and backslashes. Both instance-based and class-based APIs are supported for backward compatibility.

Key responsibilities:

  • Parse PO files into translation hashes

  • Handle msgid/msgstr pairs with continuation lines

  • Extract and preserve reference comments (file location references)

  • Extract and preserve fuzzy flags during merging

  • Parse multi-line strings with proper escaping

  • Support three modes: simple, with_references, and for_merge

  • Handle both file paths and inline content strings


Examples:

reader = PoFileReader.new('_locales/es.po')
simple = reader.parse                     # { "msgid" => "msgstr" }
with_refs = reader.parse_with_references  # { "msgid" => { msgstr: "...", reference: "..." } }
for_merge = reader.parse_for_merge        # { "msgid" => { msgstr: "...", reference: "...", fuzzy: false } }

Constant Summary

MSGID_PATTERN =
/^msgid ['"](.*)['"] *$/.freeze
MSGSTR_PATTERN =
/^msgstr ['"](.*)['"] *$/.freeze
NO_REFERENCE =
nil
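
The two patterns accept either quote style and capture everything between the opening quote and the last quote on the line. A minimal sketch of that behavior, using local copies of the patterns so it runs without the gem; the helper name `captured_text` and the sample lines are illustrative assumptions:

```ruby
# Hypothetical helper: return the text captured by a PO field pattern, or nil.
def captured_text(line, pattern)
  match = line.match(pattern)
  match && match[1]
end

# Local copies of the patterns listed above.
msgid_pattern = /^msgid ['"](.*)['"] *$/
msgstr_pattern = /^msgstr ['"](.*)['"] *$/

captured_text('msgid "Hello"', msgid_pattern)    # => "Hello"
captured_text("msgstr 'Hola'", msgstr_pattern)   # => "Hola"
captured_text('msgid ""', msgid_pattern)         # => "" (header of a multi-line entry)
captured_text('msgctxt "menu"', msgid_pattern)   # => nil (not a msgid line)
```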

Constructor Details

#initialize(po_path_or_content = nil) ⇒ PoFileReader

Initialize a new PoFileReader.

Accepts either a file path or inline PO content. The argument is treated as a path only if a file with that name exists; any other string is kept as inline content. Defaults to nil, which leaves both the path and the content unset.

Parameters:

  • po_path_or_content (String, nil) (defaults to: nil)

    File path to PO file or inline content string (defaults to nil)



# File 'lib/jekyll-l10n/po_file/reader.rb', line 47

def initialize(po_path_or_content = nil)
  # Support both file path and content string
  if po_path_or_content && File.exist?(po_path_or_content.to_s)
    @po_path = po_path_or_content
    @content = nil
  else
    @content = po_path_or_content
    @po_path = nil
  end
end
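
One consequence of the existence check is that a mistyped path is not rejected; it is stored as if it were PO text. The sketch below mirrors only the branching of the constructor. The helper `po_source_kind` and its return symbols are assumptions made for illustration, not part of the class:

```ruby
require 'tempfile'

# Hypothetical helper mirroring the constructor's check: the argument counts
# as a path only when a file with that name exists.
def po_source_kind(po_path_or_content)
  return :empty if po_path_or_content.nil?

  File.exist?(po_path_or_content.to_s) ? :path : :content
end

inline = %(msgid "Hello"\nmsgstr "Hola"\n)

Tempfile.create(['es', '.po']) do |file|
  file.write(inline)
  file.flush
  po_source_kind(file.path) # => :path
end

po_source_kind(inline)                           # => :content
po_source_kind('_locales/no_such_file_here.po')  # => :content, unless that file happens to exist
po_source_kind(nil)                              # => :empty
```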

Class Method Details

.build_translation_entry(msgstr, reference, fuzzy, with_metadata) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 414

def self.build_translation_entry(msgstr, reference, fuzzy, with_metadata)
  # Simple format when no metadata requested and none provided
  return msgstr if !with_metadata && reference.nil? && fuzzy.nil?

  # Build metadata hash based on what's provided
  entry = { msgstr: msgstr }
  entry[:reference] = reference unless reference.nil?
  entry[:fuzzy] = fuzzy unless fuzzy.nil?
  entry[:comment] = nil if !fuzzy.nil? || !reference.nil?
  entry
end
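
The entry shape depends on which arguments are non-nil, and a `:comment` key set to nil is added whenever a reference or fuzzy value is present. The following standalone copy of the logic (renamed `build_entry` here so it runs without the gem) shows the four cases:

```ruby
# Standalone copy of the method above, for illustration only.
def build_entry(msgstr, reference, fuzzy, with_metadata)
  return msgstr if !with_metadata && reference.nil? && fuzzy.nil?

  entry = { msgstr: msgstr }
  entry[:reference] = reference unless reference.nil?
  entry[:fuzzy] = fuzzy unless fuzzy.nil?
  entry[:comment] = nil if !fuzzy.nil? || !reference.nil?
  entry
end

build_entry('Hola', nil, nil, false)
# => "Hola"
build_entry('Hola', nil, nil, true)
# => { msgstr: "Hola" }
build_entry('Hola', 'index.html:10', nil, true)
# => { msgstr: "Hola", reference: "index.html:10", comment: nil }
build_entry('Hola', 'index.html:10', false, true)
# => { msgstr: "Hola", reference: "index.html:10", fuzzy: false, comment: nil }
```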

.collect_continuation_lines(lines, start_idx, values, delimiter) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 333

def self.collect_continuation_lines(lines, start_idx, values, delimiter)
  idx = start_idx

  while idx < lines.length
    cont_line = lines[idx].strip

    break if stop_collecting?(cont_line)

    break unless continuation_line?(cont_line)

    unescaped = unescape_string(cont_line[1...-1], delimiter)
    values << unescaped
    idx += 1

  end

  combined_value = values.join

  { value: combined_value, next_line: idx }
end
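
As a rough illustration of how continuation lines are joined, here is a simplified sketch that handles only double-quoted lines and leaves out unescaping. The method name `join_continuations` and the sample lines are assumptions for this example:

```ruby
# Simplified sketch: join the quoted lines that follow an empty `msgid ""`
# or `msgstr ""` header. Escape handling is deliberately left out.
def join_continuations(lines, start_idx)
  idx = start_idx
  parts = []

  while idx < lines.length
    line = lines[idx].strip
    break unless line.start_with?('"') && line.end_with?('"')

    parts << line[1...-1] # drop the surrounding quotes
    idx += 1
  end

  { value: parts.join, next_line: idx }
end

lines = [
  'msgid ""',
  '"Welcome to "',
  '"my site"',
  'msgstr ""'
]

join_continuations(lines, 1)
# => { value: "Welcome to my site", next_line: 3 }
```

Returning `next_line` alongside the value lets the caller resume parsing at the first line that was not consumed.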

.continuation_line?(line) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/jekyll-l10n/po_file/reader.rb', line 354

def self.continuation_line?(line)
  (line.start_with?('"') && line.end_with?('"')) ||
    (line.start_with?("'") && line.end_with?("'"))
end

.extract_metadata_before_msgid(lines, msgid_idx, include_fuzzy: false) ⇒ Object

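Unified metadata extraction: scans the comment lines above a msgid and extracts the reference and, optionally, the fuzzy flag. Returns the reference alone, or [reference, fuzzy] when include_fuzzy is true.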



# File 'lib/jekyll-l10n/po_file/reader.rb', line 274

def self.extract_metadata_before_msgid(lines, msgid_idx, include_fuzzy: false)
  reference = nil
  fuzzy = false
  comments_end = msgid_idx - 1

  while comments_end >= 0
    comment_line = lines[comments_end].strip
    break unless comment_line.start_with?('#') || comment_line.empty?

    reference = extract_reference_from_line(comment_line) || reference
    fuzzy = true if include_fuzzy && fuzzy_line?(comment_line)

    comments_end -= 1
  end

  include_fuzzy ? [reference, fuzzy] : reference
end
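
The backward scan can be sketched in isolation as follows. This is a simplified stand-in that always returns both values; the name `metadata_before` and the sample entry are assumptions for illustration:

```ruby
# Simplified sketch of the backward scan: walk up from the msgid line while
# lines are comments or blank, collecting `#:` references and `#,` fuzzy flags.
def metadata_before(lines, msgid_idx)
  reference = nil
  fuzzy = false
  idx = msgid_idx - 1

  while idx >= 0
    line = lines[idx].strip
    break unless line.start_with?('#') || line.empty?

    found = line.start_with?('#:') ? line.sub(/^#:\s*/, '').strip : nil
    reference = found || reference
    fuzzy = true if line.start_with?('#,') && line.include?('fuzzy')
    idx -= 1
  end

  [reference, fuzzy]
end

lines = [
  '#: index.html:10',
  '#, fuzzy',
  'msgid "Hello"',
  'msgstr "Hola"'
]

metadata_before(lines, 2)
# => ["index.html:10", true]
```

Because the scan moves upward and each `#:` line overwrites the previous value, the topmost reference line in a comment block is the one that is kept.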

.extract_msgid_and_continuation(lines, start_idx) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 310

def self.extract_msgid_and_continuation(lines, start_idx)
  extract_po_field(lines, start_idx, MSGID_PATTERN)
end

.extract_msgstr_and_continuation(lines, start_idx) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 314

def self.extract_msgstr_and_continuation(lines, start_idx)
  extract_po_field(lines, start_idx, MSGSTR_PATTERN)
end

.extract_po_field(lines, start_idx, pattern) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 318

def self.extract_po_field(lines, start_idx, pattern)
  line = lines[start_idx].strip

  match = line.match(pattern)
  if match
    delimiter = line[match.begin(1) - 1]
    values = [unescape_string(match[1], delimiter)]
  else
    values = []
    delimiter = '"'
  end

  collect_continuation_lines(lines, start_idx + 1, values, delimiter)
end
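
The delimiter lookup relies on `MatchData#begin`, which returns the offset of the first captured character; the character just before it is the opening quote. A small sketch, with `opening_delimiter` as a hypothetical helper name:

```ruby
# Hypothetical helper: find which quote character opens the captured value.
def opening_delimiter(line, pattern)
  match = line.match(pattern)
  return '"' unless match # same fallback as the method above

  line[match.begin(1) - 1]
end

pattern = /^msgid ['"](.*)['"] *$/

opening_delimiter("msgid 'It works'", pattern)  # => "'"
opening_delimiter('msgid "It works"', pattern)  # => "\""
opening_delimiter('# a comment', pattern)       # => "\"" (fallback)
```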

.extract_reference_and_fuzzy_before_msgid(lines, msgid_idx) ⇒ Object

Backward compatibility wrapper



# File 'lib/jekyll-l10n/po_file/reader.rb', line 306

def self.extract_reference_and_fuzzy_before_msgid(lines, msgid_idx)
  extract_metadata_before_msgid(lines, msgid_idx, include_fuzzy: true)
end

.extract_reference_before_msgid(lines, msgid_idx) ⇒ Object

Backward compatibility wrapper



# File 'lib/jekyll-l10n/po_file/reader.rb', line 301

def self.extract_reference_before_msgid(lines, msgid_idx)
  extract_metadata_before_msgid(lines, msgid_idx, include_fuzzy: false)
end

.extract_reference_from_line(comment_line) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 292

def self.extract_reference_from_line(comment_line)
  comment_line.sub(/^#:\s*/, '').strip if comment_line.start_with?('#:')
end

.fuzzy_line?(comment_line) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/jekyll-l10n/po_file/reader.rb', line 296

def self.fuzzy_line?(comment_line)
  comment_line.start_with?('#,') && comment_line.include?('fuzzy')
end
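
The check requires both the `#,` flag prefix and the word fuzzy, so a translator comment that merely mentions fuzzy is not counted. A standalone copy (renamed `fuzzy_flag?` for this sketch) with a few illustrative inputs:

```ruby
# Standalone copy of the check above, for illustration only.
def fuzzy_flag?(comment_line)
  comment_line.start_with?('#,') && comment_line.include?('fuzzy')
end

fuzzy_flag?('#, fuzzy')            # => true
fuzzy_flag?('#, fuzzy, c-format')  # => true
fuzzy_flag?('#, c-format')         # => false
fuzzy_flag?('# fuzzy wording')     # => false (translator comment, not a flag line)
```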

.msgid_line?(line) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/jekyll-l10n/po_file/reader.rb', line 126

def self.msgid_line?(line)
  line.start_with?('msgid ')
end

.parse(po_path) ⇒ Hash

Parse a PO file (class method, for backward compatibility).

Parameters:

  • po_path (String)

    Path to PO file

Returns:

  • (Hash)

    Simple translation hash



# File 'lib/jekyll-l10n/po_file/reader.rb', line 96

def self.parse(po_path)
  new(po_path).parse
end

.parse_for_merge(po_path) ⇒ Hash

Parse a PO file for merging (class method, for backward compatibility).

Parameters:

  • po_path (String)

    Path to PO file

Returns:

  • (Hash)

    Translation hash with merge metadata



# File 'lib/jekyll-l10n/po_file/reader.rb', line 112

def self.parse_for_merge(po_path)
  new(po_path).parse_for_merge
end

.parse_with_references(po_path) ⇒ Hash

Parse a PO file with references (class method, for backward compatibility).

Parameters:

  • po_path (String)

    Path to PO file

Returns:

  • (Hash)

    Translation hash with references



# File 'lib/jekyll-l10n/po_file/reader.rb', line 104

def self.parse_with_references(po_path)
  new(po_path).parse_with_references
end

.process_line(lines, idx, translations) ⇒ Object

Backward compatibility wrapper



# File 'lib/jekyll-l10n/po_file/reader.rb', line 122

def self.process_line(lines, idx, translations)
  process_line_internal(lines, idx, translations, false)
end

.process_line_for_merge(lines, idx, translations) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 234

def self.process_line_for_merge(lines, idx, translations)
  process_line_internal(lines, idx, translations, :merge)
end

.process_line_internal(lines, idx, translations, with_references) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 238

def self.process_line_internal(lines, idx, translations, with_references)
  return idx + 1 if idx >= lines.length

  idx = skip_blank_and_comments(lines, idx)
  return idx if idx >= lines.length

  line = lines[idx].strip

  if msgid_line?(line)
    process_msgid_with_references(lines, idx, translations, with_references)
  else
    idx + 1
  end
end

.process_line_with_reference(lines, idx, translations) ⇒ Object

Backward compatibility wrappers



# File 'lib/jekyll-l10n/po_file/reader.rb', line 230

def self.process_line_with_reference(lines, idx, translations)
  process_line_internal(lines, idx, translations, true)
end

.process_msgid_msgstr_pair(lines, start_idx, translations, reference: nil, fuzzy: nil, with_mode: false) ⇒ Object

Unified method for processing msgid/msgstr pairs with optional reference and fuzzy metadata. The with_mode argument selects the output format: false (default, simple format), true (with reference), or :merge (with both reference and fuzzy flag).



# File 'lib/jekyll-l10n/po_file/reader.rb', line 151

def self.process_msgid_msgstr_pair(lines, start_idx, translations,
                                   reference: nil, fuzzy: nil, with_mode: false)
  # rubocop:enable Metrics/ParameterLists, Metrics/AbcSize, Metrics/PerceivedComplexity
  # Handle nil sentinel values (from NO_REFERENCE constant)
  reference = nil if reference == NO_REFERENCE
  fuzzy = nil if fuzzy == NO_REFERENCE

  msgid = extract_msgid_and_continuation(lines, start_idx)
  msgid_value = msgid[:value]
  i = msgid[:next_line]

  if i < lines.length && lines[i].strip.start_with?('msgstr ')
    msgstr = extract_msgstr_and_continuation(lines, i)
    msgstr_value = msgstr[:value]
    i = msgstr[:next_line]

    # Always use metadata format if in reference or merge mode
    with_metadata = with_mode == true || with_mode == :merge || !reference.nil? || !fuzzy.nil?
    store_translation(
      translations, msgid_value, msgstr_value,
      reference: reference, fuzzy: fuzzy, with_metadata: with_metadata
    )
  else
    i += 1
  end

  i
end

.process_msgid_msgstr_pair_internal(lines, start_idx, translations, reference = nil, fuzzy = nil) ⇒ Object

Backward compatibility alias



# File 'lib/jekyll-l10n/po_file/reader.rb', line 196

def self.process_msgid_msgstr_pair_internal(lines, start_idx, translations,
                                            reference = nil, fuzzy = nil)
  process_msgid_msgstr_pair(lines, start_idx, translations, reference: reference,
                                                            fuzzy: fuzzy)
end

.process_msgid_msgstr_pair_with_metadata(lines, start_idx, translations, reference: nil, fuzzy: nil) ⇒ Object

Backward compatibility alias



# File 'lib/jekyll-l10n/po_file/reader.rb', line 181

def self.process_msgid_msgstr_pair_with_metadata(lines, start_idx, translations,
                                                 reference: nil, fuzzy: nil)
  process_msgid_msgstr_pair(lines, start_idx, translations, reference: reference,
                                                            fuzzy: fuzzy, with_mode: :merge)
end

.process_msgid_msgstr_pair_with_reference(lines, start_idx, translations, reference) ⇒ Object

Backward compatibility alias



# File 'lib/jekyll-l10n/po_file/reader.rb', line 188

def self.process_msgid_msgstr_pair_with_reference(lines, start_idx, translations, reference)
  process_msgid_msgstr_pair(
    lines, start_idx, translations,
    reference: reference, fuzzy: nil, with_mode: true
  )
end

.process_msgid_with_references(lines, idx, translations, with_references) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 253

def self.process_msgid_with_references(lines, idx, translations, with_references)
  case with_references
  when true
    reference = extract_reference_before_msgid(lines, idx)
    process_msgid_msgstr_pair(
      lines, idx, translations,
      reference: reference, fuzzy: nil, with_mode: true
    )
  when :merge
    reference, fuzzy = extract_reference_and_fuzzy_before_msgid(lines, idx)
    process_msgid_msgstr_pair(
      lines, idx, translations,
      reference: reference, fuzzy: fuzzy, with_mode: :merge
    )
  else
    process_msgid_msgstr_pair(lines, idx, translations, reference: nil, fuzzy: nil,
                                                        with_mode: false)
  end
end

.process_po_lines(content) ⇒ Object

Backward compatibility wrapper



# File 'lib/jekyll-l10n/po_file/reader.rb', line 117

def self.process_po_lines(content)
  process_po_lines_internal(content, false)
end

.process_po_lines_for_merge(content) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 206

def self.process_po_lines_for_merge(content)
  process_po_lines_internal(content, :merge)
end

.process_po_lines_internal(content, with_references) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 210

def self.process_po_lines_internal(content, with_references)
  translations = {}
  lines = split_lines(content)
  i = 0

  while i < lines.length
    i = case with_references
        when true
          process_line_internal(lines, i, translations, true)
        when :merge
          process_line_internal(lines, i, translations, :merge)
        else
          process_line_internal(lines, i, translations, false)
        end
  end

  translations
end
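
The driver loop follows a common pattern: each step consumes one or more lines and returns the index to continue from. The sketch below is a much reduced stand-in that handles only single-line, double-quoted entries and ignores metadata; `parse_simple` and the sample content are assumptions for illustration, not the class's implementation:

```ruby
# Reduced sketch of the index-driven loop (simple mode, single-line entries).
def parse_simple(content)
  translations = {}
  lines = content.split("\n")
  i = 0

  while i < lines.length
    msgid = lines[i].strip[/^msgid "(.*)"$/, 1]
    msgstr = lines[i + 1].to_s.strip[/^msgstr "(.*)"$/, 1]

    if msgid && msgstr
      translations[msgid] = msgstr unless msgid.empty?
      i += 2 # consumed the msgid line and the msgstr line
    else
      i += 1 # comment, blank, or unrecognised line
    end
  end

  translations
end

po = <<~PO
  # Spanish translations
  #: index.html:10
  msgid "Hello"
  msgstr "Hola"

  msgid "Thanks"
  msgstr "Gracias"
PO

parse_simple(po)
# => { "Hello" => "Hola", "Thanks" => "Gracias" }
```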

.process_po_lines_with_references(content) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 202

def self.process_po_lines_with_references(content)
  process_po_lines_internal(content, true)
end

.read_po_file(po_path) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 385

def self.read_po_file(po_path)
  FileOperations.read_utf8(po_path)
end

.skip_blank_and_comments(lines, idx) ⇒ Object

Skips blank lines and comments before processing entries



# File 'lib/jekyll-l10n/po_file/reader.rb', line 136

def self.skip_blank_and_comments(lines, idx)
  return idx if idx >= lines.length

  line = lines[idx].strip
  if line.empty? || line.start_with?('#')
    idx + 1
  else
    idx
  end
end

.split_lines(content) ⇒ Object

Splits PO file content into individual lines



# File 'lib/jekyll-l10n/po_file/reader.rb', line 131

def self.split_lines(content)
  content.split("\n")
end

.stop_collecting?(line) ⇒ Boolean

Determines if we should stop collecting continuation lines

Returns:

  • (Boolean)


# File 'lib/jekyll-l10n/po_file/reader.rb', line 453

def self.stop_collecting?(line)
  line.empty? ||
    line.start_with?('#') ||
    line.start_with?('msgid') ||
    line.start_with?('msgstr')
end

.store_translation(translations, msgid, msgstr, reference: nil, fuzzy: nil, with_metadata: false) ⇒ Object

Store translation with optional metadata (reference location and fuzzy flag).

Three Storage Formats

  1. Simple: { msgid => msgstr }. Used during translation lookup - minimal memory, fastest access.

  2. With Reference: { msgid => { msgstr: "...", reference: "file.html:10" } }. Used during extraction - preserves source location for debugging.

  3. With Merge Metadata: { msgid => { msgstr: "...", reference: "...", fuzzy: false } }. Used during merging - tracks the fuzzy flag for incomplete translations.

Why three formats instead of one?

  • Memory efficiency: Simple format used most often (translation lookup)

  • Flexibility: Can handle different parsing modes without wasting storage

  • Backward compatibility: Supports legacy calling conventions
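
Since a stored value may be either a bare String or a metadata Hash, code that consumes more than one format may want a small normalizing step. A minimal sketch; the helper `msgstr_of` is hypothetical and the hashes are hand-written samples of the three shapes described above:

```ruby
# Hypothetical helper: return the translated string whether the entry is a
# bare String (simple format) or a metadata Hash (reference and merge formats).
def msgstr_of(entry)
  entry.is_a?(Hash) ? entry[:msgstr] : entry
end

simple = { 'Hello' => 'Hola' }
with_reference = { 'Hello' => { msgstr: 'Hola', reference: 'file.html:10' } }
for_merge = { 'Hello' => { msgstr: 'Hola', reference: 'file.html:10', fuzzy: false } }

[simple, with_reference, for_merge].map { |table| msgstr_of(table['Hello']) }
# => ["Hola", "Hola", "Hola"]
```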




# File 'lib/jekyll-l10n/po_file/reader.rb', line 406

def self.store_translation(translations, msgid, msgstr, reference: nil, fuzzy: nil,
                           with_metadata: false)
  # rubocop:enable Metrics/ParameterLists
  return if msgid.nil? || msgstr.nil? || msgid.empty?

  translations[msgid] = build_translation_entry(msgstr, reference, fuzzy, with_metadata)
end

.store_translation_internal(translations, msgid, msgstr, reference: nil, fuzzy: nil) ⇒ Object

Kept for backward compatibility with existing tests. Supports both positional and keyword argument calling styles.



# File 'lib/jekyll-l10n/po_file/reader.rb', line 428

def self.store_translation_internal(translations, msgid, msgstr, reference: nil, fuzzy: nil)
  # Handle nil sentinel values (from NO_REFERENCE constant)
  reference = nil if reference == NO_REFERENCE
  fuzzy = nil if fuzzy == NO_REFERENCE
  store_translation(translations, msgid, msgstr, reference: reference, fuzzy: fuzzy)
end

.store_translation_with_fuzzy(translations, msgid, msgstr, fuzzy:) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 441

def self.store_translation_with_fuzzy(translations, msgid, msgstr, fuzzy:)
  store_translation(translations, msgid, msgstr, reference: nil, fuzzy: fuzzy,
                                                 with_metadata: true)
end

.store_translation_with_reference(translations, msgid, msgstr, reference:) ⇒ Object

Backward compatibility wrappers for old method signatures



# File 'lib/jekyll-l10n/po_file/reader.rb', line 436

def self.store_translation_with_reference(translations, msgid, msgstr, reference:)
  store_translation(translations, msgid, msgstr, reference: reference, fuzzy: nil,
                                                 with_metadata: true)
end

.store_translation_with_reference_and_fuzzy(translations, msgid, msgstr, reference:, fuzzy:) ⇒ Object



# File 'lib/jekyll-l10n/po_file/reader.rb', line 446

def self.store_translation_with_reference_and_fuzzy(translations, msgid, msgstr,
                                                    reference:, fuzzy:)
  store_translation(translations, msgid, msgstr, reference: reference, fuzzy: fuzzy,
                                                 with_metadata: true)
end

.unescape_string(str, delimiter) ⇒ Object

Unescape PO file string values containing escape sequences.

IMPORTANT: Order Matters for Correctness

Must unescape escaped quotes BEFORE unescaping backslashes. Why: If we unescape backslashes first, we lose information about which backslashes were part of escape sequences vs. literal text.

Example with wrong order (backslash first):

Input: "Say \\" Hello"  (should be: Say \ Hello with closing quote)
Wrong: "\\\\" -> "\\" -> Remove quotes -> "Say \" Hello" (quote not closed!)

Correct order (quote first):

Input: "Say \\" Hello"
Right: "\\" -> " " (two backslashes become one literal backslash) -> "Say \ Hello"

Single vs Double quotes matter:

  • Single quotes: Use \' and \\

  • Double quotes: Use \" and \\



# File 'lib/jekyll-l10n/po_file/reader.rb', line 377

def self.unescape_string(str, delimiter)
  if delimiter == "'"
    str.gsub("\\'", "'").gsub('\\\\', '\\')
  else
    str.gsub('\\"', '"').gsub('\\\\', '\\')
  end
end
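
To see what the two substitutions do in practice, here is a standalone copy of the double-quote branch (renamed `unescape_double_quoted` for this sketch). The inputs are written as Ruby single-quoted literals, so each `\\` in the source stands for one backslash in the actual string:

```ruby
# Standalone copy of the double-quote branch above, for illustration only.
def unescape_double_quoted(str)
  str.gsub('\\"', '"').gsub('\\\\', '\\')
end

# PO text: Say \"hi\"   -> escaped quotes become plain quotes
unescape_double_quoted('Say \\"hi\\"')  # => 'Say "hi"'

# PO text: C:\\temp     -> a doubled backslash becomes a single one
unescape_double_quoted('C:\\\\temp')    # => 'C:\\temp' (one backslash in the string)

# PO text: Line\n       -> left unchanged, still a backslash followed by "n"
unescape_double_quoted('Line\\n')       # => 'Line\\n'
```

As the last call shows, the code as listed only rewrites escaped quotes and doubled backslashes; sequences such as \n or \t pass through as two characters.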

Instance Method Details

#parse ⇒ Hash

Parse PO file into simple translation hash.

Returns hash mapping msgid strings to msgstr strings (no metadata).

Returns:

  • (Hash)

    Simple translation hash { msgid => msgstr }



# File 'lib/jekyll-l10n/po_file/reader.rb', line 63

def parse
  content = load_content
  process_po_lines_instance(content, false)
end

#parse_for_merge ⇒ Hash

Parse PO file with all metadata for merging.

Returns hash mapping msgid strings to metadata hashes containing msgstr, reference, and fuzzy flag. Used when merging with existing translations.

Returns:

  • (Hash)

    Translation hash with merge metadata { msgid => { msgstr: "...", reference: "...", fuzzy: false } }



# File 'lib/jekyll-l10n/po_file/reader.rb', line 87

def parse_for_merge
  content = load_content
  process_po_lines_instance(content, :merge)
end

#parse_with_references ⇒ Hash

Parse PO file with reference comments preserved.

Returns hash mapping msgid strings to metadata hashes containing msgstr and reference (file location for debugging).

Returns:

  • (Hash)

    Translation hash with references { msgid => { msgstr: "...", reference: "..." } }



# File 'lib/jekyll-l10n/po_file/reader.rb', line 75

def parse_with_references
  content = load_content
  process_po_lines_instance(content, true)
end