Class: Pubid::Cie::Parser

Inherits:
Parslet::Parser
  • Object
Defined in:
lib/pubid/cie/parser.rb

Overview

Parser for CIE identifiers. Handles the dual-style identifier system (legacy vs. current).

Class Method Summary

Class Method Details

.parse(string) ⇒ Object

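Class method that parses an identifier string after light preprocessing: it strips surrounding whitespace, removes trailing `#` comments, collapses runs of whitespace, and inserts a missing colon between a language code and a four-digit year.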

# File 'lib/pubid/cie/parser.rb', line 329

def self.parse(string)
  # Minimal preprocessing for data quality
  cleaned = string.strip

  # Remove comments (text after #)
  cleaned = cleaned.gsub(/\s*#.*$/, "")

  # Normalize spaces
  cleaned = cleaned.gsub(/\s+/, " ")

  # Insert missing colon before year in language patterns like /E2007 -> /E:2007
  # This is a data quality fix - correct format always has colon
  cleaned = cleaned.gsub(%r{/(E|F|G|DE|ES|CN|RU|FR)(\d{4})}, '/\1:\2')

  new.parse(cleaned)
end
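
The preprocessing steps above can be tried in isolation. The following is a minimal sketch, not part of the library: `preprocess` is a hypothetical helper name that copies only the cleanup steps from `.parse`, and the sample identifiers are invented for illustration. The final `new.parse(cleaned)` call is omitted because it depends on the gem's Parslet grammar.

```ruby
# Hypothetical standalone helper mirroring the cleanup steps in .parse.
# It returns the cleaned string instead of handing it to the Parslet grammar.
def preprocess(string)
  cleaned = string.strip

  # Drop a trailing comment (everything from "#" to the end of the line)
  cleaned = cleaned.gsub(/\s*#.*$/, "")

  # Collapse runs of whitespace into a single space
  cleaned = cleaned.gsub(/\s+/, " ")

  # Insert the missing colon between a language code and a four-digit year
  cleaned.gsub(%r{/(E|F|G|DE|ES|CN|RU|FR)(\d{4})}, '/\1:\2')
end

# Sample inputs are assumptions made up for this sketch.
puts preprocess("  CIE 015/E2004   # colorimetry ")  # CIE 015/E:2004
puts preprocess("CIE  S 017/ES2011")                 # CIE S 017/ES:2011
puts preprocess("CIE 015/E:2004")                    # CIE 015/E:2004 (unchanged)
```

Because the colon fix only matches a language code followed directly by four digits, input that already uses the `/E:2004` form passes through unchanged.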