Class: Pubid::Cie::Parser
- Inherits: Parslet::Parser (Object > Parslet::Parser > Pubid::Cie::Parser)
- Defined in: lib/pubid/cie/parser.rb
Overview
Parser for CIE identifiers. Handles the dual-style system (legacy vs. current).
Class Method Summary
- .parse(string) ⇒ Object
Class method for parsing with preprocessing.
Class Method Details
.parse(string) ⇒ Object
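Class method for parsing with preprocessing. It strips the input, removes trailing # comments, collapses runs of whitespace into single spaces, inserts a missing colon between a language code and a four-digit year (for example, /E2007 becomes /E:2007), and then parses the cleaned string with a new parser instance.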
    # File 'lib/pubid/cie/parser.rb', line 329

    def self.parse(string)
      # Minimal preprocessing for data quality
      cleaned = string.strip
      # Remove comments (text after #)
      cleaned = cleaned.gsub(/\s*#.*$/, "")
      # Normalize spaces
      cleaned = cleaned.gsub(/\s+/, " ")
      # Insert missing colon before year in language patterns like /E2007 -> /E:2007
      # This is a data quality fix - correct format always has colon
      cleaned = cleaned.gsub(%r{/(E|F|G|DE|ES|CN|RU|FR)(\d{4})}, '/\1:\2')
      new.parse(cleaned)
    end
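The preprocessing in .parse can be tried out on its own. The sketch below copies the same substitutions into a hypothetical standalone helper, preprocess_cie_identifier (not part of the gem), so the clean-up can be checked without Parslet or the grammar. The sample input is only illustrative.

```ruby
# Hypothetical helper (not part of pubid-cie): mirrors the string clean-up that
# Pubid::Cie::Parser.parse performs before handing the input to the grammar.
def preprocess_cie_identifier(string)
  cleaned = string.strip
  # Drop a trailing "#" comment together with the whitespace just before it.
  cleaned = cleaned.gsub(/\s*#.*$/, "")
  # Collapse runs of whitespace (spaces, tabs, newlines) into single spaces.
  cleaned = cleaned.gsub(/\s+/, " ")
  # Insert the missing colon between a listed language code and a 4-digit year.
  cleaned.gsub(%r{/(E|F|G|DE|ES|CN|RU|FR)(\d{4})}, '/\1:\2')
end

puts preprocess_cie_identifier("  CIE S 017/E2011   # duplicate entry")
# prints "CIE S 017/E:2011"
```

The colon fix only fires when one of the listed language codes (E, F, G, DE, ES, CN, RU, FR) is followed directly by four digits. An identifier that already contains the colon, or that uses a code outside the list, passes through unchanged.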