Class: Pubid::Nist::Preprocessor

Inherits:
Object
Defined in:
lib/pubid/nist/preprocessor.rb

Overview

Owns all regex-based normalization applied to NIST identifier strings before the Parslet grammar sees them.

The Parser entry point delegates to Preprocessor#call; the grammar itself never inspects raw user input. Each private method below is a named normalization stage, applied in the order declared in #call. Stages are kept in their historically validated sequence. Reordering them risks regressions, because later stages often match patterns produced by earlier ones.

Format detection (:mr vs :short) is also owned here because it is a property of the original input, not of the parsed tree.
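The delegation described above can be sketched as follows. Every name here (SketchResult, SketchPreprocessor, SketchParser) is a hypothetical stand-in, not the real Pubid::Nist API, and the single whitespace stage is illustrative only. The point is that the "grammar" receives only the cleaned string, while the format is computed from the original input.

```ruby
# Hypothetical sketch of the Parser -> Preprocessor -> grammar hand-off.
# None of these classes are the real Pubid::Nist API.
SketchResult = Struct.new(:cleaned, :format, keyword_init: true)

class SketchPreprocessor
  def initialize(input)
    @input = input.to_s.strip
  end

  def call
    # Illustrative single stage: collapse internal whitespace runs.
    cleaned = @input.gsub(/\s+/, " ")
    # Format is decided from the original (stripped) input, not from `cleaned`.
    format = @input.include?(".") && !@input.match?(/\s/) ? :mr : :short
    SketchResult.new(cleaned: cleaned, format: format)
  end
end

module SketchParser
  # The stand-in "grammar" (a split) only ever sees result.cleaned.
  def self.parse(raw)
    result = SketchPreprocessor.new(raw).call
    { tokens: result.cleaned.split(" "), format: result.format }
  end
end

parsed = SketchParser.parse("  NIST   SP  800-53r5 ")
puts parsed[:tokens].inspect # prints ["NIST", "SP", "800-53r5"]
puts parsed[:format]         # prints short
```

Keeping detection on the untouched input means no normalization stage can accidentally change whether an identifier counts as machine-readable.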

Defined Under Namespace

Classes: Result

Constant Summary

ROMAN_TO_ARABIC =

Maps Roman numerals I through X to Arabic numerals, per the NIST specification.

{
  "I" => "1",
  "II" => "2",
  "III" => "3",
  "IV" => "4",
  "V" => "5",
  "VI" => "6",
  "VII" => "7",
  "VIII" => "8",
  "IX" => "9",
  "X" => "10",
}.freeze
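The table is presumably consumed by the convert_roman_volumes! stage. The sketch below shows one plausible way such a lookup could be applied. The `Vol.` context pattern and the `convert_roman_volumes` helper are hypothetical; the real stage may match different contexts.

```ruby
ROMAN_TO_ARABIC = {
  "I" => "1", "II" => "2", "III" => "3", "IV" => "4", "V" => "5",
  "VI" => "6", "VII" => "7", "VIII" => "8", "IX" => "9", "X" => "10",
}.freeze

# Hypothetical context: a Roman numeral directly after "Vol." (optional space).
ROMAN_VOLUME = /\b(Vol\.\s*)([IVX]+)\b/

def convert_roman_volumes(text)
  text.gsub(ROMAN_VOLUME) do |match|
    arabic = ROMAN_TO_ARABIC[Regexp.last_match(2)]
    # Numerals outside the table (e.g. "XI") are left untouched rather than guessed.
    arabic ? "#{Regexp.last_match(1)}#{arabic}" : match
  end
end

puts convert_roman_volumes("Example Vol. IV") # prints Example Vol. 4
puts convert_roman_volumes("Example Vol. XI") # prints Example Vol. XI
```

Matching the whole `[IVX]+` run before looking it up avoids the classic pitfall of replacing "I" inside "IV" or "VIII".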

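Instance Method Summary

#call ⇒ Object
Run every normalization stage and return a Result.

#detected_format ⇒ Object
Detect input format: :mr (dot-separated machine-readable) or :short.

#initialize(input) ⇒ Preprocessor (constructor)
Returns a new instance of Preprocessor.

#run_stages ⇒ Object
Sequence of normalization stages in historically validated order.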

Constructor Details

#initialize(input) ⇒ Preprocessor

Returns a new instance of Preprocessor.



# File 'lib/pubid/nist/preprocessor.rb', line 37

def initialize(input)
  @input = input.to_s.strip
  @cleaned = Core::UpdateCodes.apply(@input, :nist)
end
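The constructor keeps two strings: the stripped original (@input) and a working copy (@cleaned) that has already passed through the update-code substitution. The sketch below mirrors that split. SketchUpdateCodes is a hypothetical stand-in for Core::UpdateCodes, whose data source and matching rules are not shown on this page.

```ruby
# Hypothetical stand-in for Core::UpdateCodes (exact-match lookup only).
module SketchUpdateCodes
  TABLE = { nist: { "OLD ID 1" => "NEW ID 1" } }.freeze

  # Returns the replacement for an exact match, or the input unchanged.
  def self.apply(input, flavor)
    TABLE.fetch(flavor, {}).fetch(input, input)
  end
end

class InitSketch
  attr_reader :input, :cleaned

  def initialize(input)
    @input = input.to_s.strip                         # nil becomes ""; outer whitespace dropped
    @cleaned = SketchUpdateCodes.apply(@input, :nist) # later stages rewrite only @cleaned
  end
end

s = InitSketch.new("  OLD ID 1\n")
puts s.input   # prints OLD ID 1
puts s.cleaned # prints NEW ID 1
```

Because `to_s` is called first, a nil argument yields an empty string instead of raising NoMethodError.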

Instance Method Details

#call ⇒ Object

Run every normalization stage and return a Result.

Stage order is load-bearing: later stages match patterns produced by earlier ones. Any reordering requires running the full NIST fixture suite to verify that nothing regresses.



# File 'lib/pubid/nist/preprocessor.rb', line 47

def call
  run_stages
  Result.new(cleaned: @cleaned, format: detected_format)
end
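Why order is load-bearing can be shown with two toy stages. Both methods below are hypothetical and are not the real Pubid::Nist regexes: `attach_revision` only matches text that `verbose_revision` produces, so swapping them changes the result.

```ruby
# Hypothetical stage A: rewrite a verbose "Rev. 5" marker as " r5".
def verbose_revision(text)
  text.gsub(/\s*Rev\.\s*(\d+)/, ' r\1')
end

# Hypothetical stage B: attach a detached " r5" to the preceding digit.
def attach_revision(text)
  text.gsub(/(\d)\s+r(\d+)/, '\1r\2')
end

input = "NIST SP 800-53 Rev. 5"
puts attach_revision(verbose_revision(input)) # prints NIST SP 800-53r5
puts verbose_revision(attach_revision(input)) # prints NIST SP 800-53 r5
```

In the reversed order, stage B runs before any lowercase " r5" exists, so it matches nothing and the identifier is left half-normalized.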

#detected_format ⇒ Object

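Detect the input format from the original stripped input (not the cleaned string): :mr (dot-separated machine-readable) when it contains a dot and no whitespace, otherwise :short.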



# File 'lib/pubid/nist/preprocessor.rb', line 81

def detected_format
  @input.include?(".") && !@input.match?(/\s/) ? :mr : :short
end
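The rule can be exercised on its own. `detect_format` is a hypothetical standalone copy that takes the string as an argument and strips it, matching what #initialize does to @input.

```ruby
# Standalone copy of the detected_format rule (hypothetical helper name).
def detect_format(input)
  input = input.to_s.strip
  input.include?(".") && !input.match?(/\s/) ? :mr : :short
end

puts detect_format("NIST.SP.800-53r5")      # prints mr
puts detect_format("NIST SP 800-53r5")      # prints short (no dot)
puts detect_format("NIST SP 800-53 Rev. 5") # prints short (dot, but contains spaces)
```

Both conditions are needed: a dot alone is not enough, since short-form identifiers such as "Rev. 5" also contain dots.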

#run_stages ⇒ Object

Sequence of normalization stages in historically validated order. Extracted into its own method so the inline RuboCop exemptions for Metrics/MethodLength and Metrics/AbcSize apply to this method only.



# File 'lib/pubid/nist/preprocessor.rb', line 55

def run_stages
  normalize_publisher_and_series!
  normalize_lcirc_supplement_contexts!
  normalize_revision_spacing!
  normalize_letter_suffix_casing!
  normalize_draft_and_volume!
  convert_roman_volumes!
  normalize_supplement_and_part!
  normalize_version_notation!
  normalize_edition_year_suffix!
  normalize_revision_with_letter!
  normalize_version_dotted_spaces!
  normalize_update_markers!
  normalize_supplement_variants!
  normalize_revision_language!
  normalize_mr_translation_codes!
  convert_dashyear_to_edition!
  revert_dashyear_for_series!
  normalize_version_verbose!
  normalize_part_notation!
  normalize_series_specific_spacing!
  normalize_verbose_keywords!
end
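For comparison, the same ordered sequencing can be expressed as data. This is a hypothetical alternative design, not how Pubid::Nist is written: the stage names live in a frozen array and are dispatched with `send`, and the two stages shown are illustrative placeholders.

```ruby
# Hypothetical alternative: ordered stage names as data, dispatched with send.
class StageListSketch
  STAGES = %i[
    collapse_whitespace!
    normalize_rev_keyword!
  ].freeze

  attr_reader :cleaned

  def initialize(input)
    @cleaned = input.to_s.strip
  end

  def run_stages
    STAGES.each { |stage| send(stage) } # array order is execution order
    self
  end

  private

  # Placeholder stage: collapse whitespace runs to one space.
  def collapse_whitespace!
    @cleaned = @cleaned.gsub(/\s+/, " ")
  end

  # Placeholder stage: abbreviate the word "Revision".
  def normalize_rev_keyword!
    @cleaned = @cleaned.gsub(/\bRevision\b/, "Rev.")
  end
end

puts StageListSketch.new("NIST  SP 800-53  Revision 5").run_stages.cleaned
# prints NIST SP 800-53 Rev. 5
```

The trade-off: a data-driven list makes the order inspectable at runtime, while explicit method calls, as in run_stages above, stay greppable and give clearer stack traces.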