Class: Pubid::Nist::Preprocessor
- Inherits: Object
  - Object
    - Pubid::Nist::Preprocessor
- Defined in: lib/pubid/nist/preprocessor.rb
Overview
Owns all regex-based normalization applied to NIST identifier strings before the Parslet grammar sees them.
The Parser entry point delegates to Preprocessor#call; the grammar itself never inspects raw user input. Each private stage method invoked from #run_stages is one named normalization step, applied in the order listed there (#call runs #run_stages before building the Result). Stages are kept in a historically validated sequence: reordering them risks regressions because later stages often match patterns produced by earlier ones.
Format detection (:mr vs :short) is also owned here because it is a property of the original input, not of the parsed tree.
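The ordering constraint described in the overview can be illustrated with a two-stage toy preprocessor. This is a hypothetical sketch: the class OrderDemo, its stage names, and its regexes are invented for illustration and are not the real Pubid::Nist stages. It shows a second stage that only fires on the shape the first stage produces, so swapping the two changes the output.

```ruby
# Hypothetical two-stage preprocessor. Stage names and regexes are
# illustrative only; they are NOT the real Pubid::Nist stages.
class OrderDemo
  DEFAULT_ORDER = %i[shorten_rev_keyword! join_revision_suffix!].freeze

  def initialize(input, order: DEFAULT_ORDER)
    @cleaned = input.to_s.strip
    @order = order
  end

  # Runs each named stage in sequence; `send` can reach private methods.
  def call
    @order.each { |stage| send(stage) }
    @cleaned
  end

  private

  # Stage 1: " Rev. 5" or " Rev 5" becomes " r5".
  def shorten_rev_keyword!
    @cleaned = @cleaned.gsub(/\s+Rev\.?\s*(\d+)/, ' r\1')
  end

  # Stage 2: joins " rN" onto the preceding token. For verbose input it
  # only finds something to join if stage 1 has already run.
  def join_revision_suffix!
    @cleaned = @cleaned.gsub(/\s+r(\d+)\b/, 'r\1')
  end
end

in_order = OrderDemo.new("NIST SP 800-53 Rev. 5").call
# => "NIST SP 800-53r5"
reversed = OrderDemo.new("NIST SP 800-53 Rev. 5",
                         order: OrderDemo::DEFAULT_ORDER.reverse).call
# => "NIST SP 800-53 r5" (stage 2 ran before there was anything to join)
```

Each stage rewrites the shared @cleaned string, which is what makes the sequence load-bearing: a stage's input is whatever the previous stages left behind.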
Defined Under Namespace
Classes: Result
Constant Summary
- ROMAN_TO_ARABIC =
Convert Roman numerals to Arabic numbers per NIST spec.
{ "I" => "1", "II" => "2", "III" => "3", "IV" => "4", "V" => "5", "VI" => "6", "VII" => "7", "VIII" => "8", "IX" => "9", "X" => "10", }.freeze
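The private stage that consumes this table (convert_roman_volumes! in #run_stages) is not documented on this page. The sketch below is a hypothetical illustration of how such a lookup can drive a regex substitution; the "Vol." keyword, the ROMAN_VOLUME pattern, and the non-bang helper convert_roman_volumes are assumptions, not the library's actual rule.

```ruby
ROMAN_TO_ARABIC = {
  "I" => "1", "II" => "2", "III" => "3", "IV" => "4", "V" => "5",
  "VI" => "6", "VII" => "7", "VIII" => "8", "IX" => "9", "X" => "10",
}.freeze

# Hypothetical pattern: "Vol" or "Vol." followed by an upper-case Roman
# numeral from I to X. The trailing \b rejects partial matches such as
# the "X" at the start of "XI", which is not in the table.
ROMAN_VOLUME = /\b(Vol\.?)\s*(X|IX|IV|V?I{1,3}|V)\b/

# Hypothetical stand-in for a convert_roman_volumes!-style stage.
def convert_roman_volumes(str)
  str.gsub(ROMAN_VOLUME) do
    keyword = Regexp.last_match(1)
    numeral = Regexp.last_match(2)
    "#{keyword} #{ROMAN_TO_ARABIC.fetch(numeral)}"
  end
end

convert_roman_volumes("NBS HB 44 Vol. IV") # => "NBS HB 44 Vol. 4"
convert_roman_volumes("Vol XI")            # => "Vol XI" (not in the table)
```

Using fetch rather than [] means a numeral that the regex accepts but the table lacks would raise instead of silently producing an empty volume number.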
Instance Method Summary
- #call ⇒ Object
  Run every normalization stage and return a Result.
- #detected_format ⇒ Object
  Detect input format: :mr (dot-separated machine-readable) or :short.
- #initialize(input) ⇒ Preprocessor (constructor)
  Returns a new instance of Preprocessor.
- #run_stages ⇒ Object
  Sequence of normalization stages in historically validated order.
Constructor Details
#initialize(input) ⇒ Preprocessor
Returns a new instance of Preprocessor.
```ruby
# File 'lib/pubid/nist/preprocessor.rb', line 37
def initialize(input)
  @input = input.to_s.strip
  @cleaned = Core::UpdateCodes.apply(@input, :nist)
end
```
Instance Method Details
#call ⇒ Object
Run every normalization stage and return a Result.
Stage order is load-bearing — later stages match patterns produced by earlier ones. Reordering requires running the full NIST fixture suite to verify no regression.
```ruby
# File 'lib/pubid/nist/preprocessor.rb', line 47
def call
  run_stages
  Result.new(cleaned: @cleaned, format: detected_format)
end
```
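Because #call builds the Result from @cleaned but detects the format from @input, no stage can change the reported format. The sketch below is a hypothetical miniature of that shape: the Struct-based Result, the class MiniPreprocessor, and its single space-to-dot stage are assumptions for illustration (the real Result class may carry more fields, and the real constructor also applies Core::UpdateCodes, which is omitted here).

```ruby
# Hypothetical stand-in for Preprocessor::Result; the real class may differ.
Result = Struct.new(:cleaned, :format, keyword_init: true)

class MiniPreprocessor
  def initialize(input)
    @input = input.to_s.strip
    @cleaned = @input.dup
  end

  # Same shape as Preprocessor#call: run the stages, then package the result.
  def call
    run_stages
    Result.new(cleaned: @cleaned, format: detected_format)
  end

  private

  # Single hypothetical stage: rewrite spaces as dots.
  def run_stages
    @cleaned = @cleaned.tr(" ", ".")
  end

  # Reads @input, not @cleaned, so stages cannot alter the detected format.
  def detected_format
    @input.include?(".") && !@input.match?(/\s/) ? :mr : :short
  end
end

result = MiniPreprocessor.new("NIST SP 800-53").call
result.cleaned # => "NIST.SP.800-53"
result.format  # => :short (the original input had spaces and no dot)
```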
#detected_format ⇒ Object
Detect input format: :mr (dot-separated machine-readable) or :short.
```ruby
# File 'lib/pubid/nist/preprocessor.rb', line 81
def detected_format
  @input.include?(".") && !@input.match?(/\s/) ? :mr : :short
end
```
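The heuristic is a single boolean expression, so it can be exercised on its own. Below it is copied into a free function, detect_format (a name chosen for this sketch), and applied to a few illustrative identifiers. In the real class the check runs on @input, which the constructor has already stripped, so leading or trailing whitespace does not push machine-readable input into :short; whitespace anywhere inside the string does.

```ruby
# Standalone copy of the expression in Preprocessor#detected_format:
# contains a dot and no whitespace => :mr, anything else => :short.
def detect_format(input)
  input.include?(".") && !input.match?(/\s/) ? :mr : :short
end

detect_format("NIST.SP.800-53r5")      # => :mr
detect_format("NIST SP 800-53r5")      # => :short (no dot)
detect_format("NIST SP 800-53 Rev. 5") # => :short (has a dot, but also spaces)
```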
#run_stages ⇒ Object
Sequence of normalization stages in historically validated order. The list is extracted from #call into its own method so that RuboCop's Metrics/MethodLength and Metrics/AbcSize cops can be disabled for this method alone (rubocop:disable Metrics/MethodLength, Metrics/AbcSize).
```ruby
# File 'lib/pubid/nist/preprocessor.rb', line 55
def run_stages
  normalize_publisher_and_series!
  normalize_lcirc_supplement_contexts!
  normalize_revision_spacing!
  normalize_letter_suffix_casing!
  normalize_draft_and_volume!
  convert_roman_volumes!
  normalize_supplement_and_part!
  normalize_version_notation!
  normalize_edition_year_suffix!
  normalize_revision_with_letter!
  normalize_version_dotted_spaces!
  normalize_update_markers!
  normalize_supplement_variants!
  normalize_revision_language!
  normalize_mr_translation_codes!
  convert_dashyear_to_edition!
  revert_dashyear_for_series!
  normalize_version_verbose!
  normalize_part_notation!
  normalize_series_specific_spacing!
  normalize_verbose_keywords!
end
```
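With twenty-one stages sharing one string, it can help to see which stages actually fire for a given input, for example when investigating a fixture failure after a reordering. The sketch below is a hypothetical tracing helper, not part of the library: the STAGES table, its two lambdas, and trace_stages are invented for illustration. It applies the stages in order and records only those that changed the string.

```ruby
# Hypothetical stages; names and regexes are illustrative, not the real ones.
STAGES = [
  [:collapse_whitespace, ->(s) { s.gsub(/\s+/, " ") }],
  [:shorten_supplement,  ->(s) { s.sub(/\s+Supplement\b/, "sup") }],
].freeze

# Applies each stage in order and returns the final string together with a
# list of [stage_name, before, after] entries for stages that changed it.
def trace_stages(input, stages = STAGES)
  trace = []
  result = stages.reduce(input) do |acc, (name, stage)|
    out = stage.call(acc)
    trace << [name, acc, out] unless out == acc
    out
  end
  [result, trace]
end

result, trace = trace_stages("NBS CIRC  13 Supplement")
result             # => "NBS CIRC 13sup"
trace.map(&:first) # => [:collapse_whitespace, :shorten_supplement]
```

Stages that leave the string unchanged are skipped in the trace, so the output stays short even when most of a long pipeline does nothing for a particular identifier.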