Class: Pubid::Nist::ParserOutputNormalizer

Inherits:
Object
Defined in:
lib/pubid/nist/parser_output_normalizer.rb

Overview

Normalizes the raw hash produced by the NIST parser before the Builder constructs the identifier object.

The parser emits a flat hash with keys like :first_number, :second_number, :edition_dash_year, :update_prefix, etc. Many of those values arrive in the wrong shape: for example, the parser captures a year as :edition_dash_year when it is actually a second_number, or a letter+digits suffix lives inside :first_number when it should become a Part component.

Each `normalize_*` method here performs one such shape correction, mutating the hash in place. The Normalizer is intentionally side-effect-only: it never reads from the Builder, the Caster, or the identifier classes, so it can be tested in isolation.
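A minimal sketch of what one in-place shape correction could look like. The module name, the method name, the sample values, and the exact condition are assumptions made for illustration; they are not taken from the library's source.

```ruby
# Hypothetical illustration only; not the library's actual method.
module ShapeCorrectionSketch
  module_function

  # Moves a value captured under :edition_dash_year to :second_number
  # when no :second_number was parsed. The condition is an assumption
  # made for this sketch.
  def move_dash_year_to_second_number(parsed_hash)
    return parsed_hash if parsed_hash.key?(:second_number)
    return parsed_hash unless parsed_hash.key?(:edition_dash_year)

    # Hash#delete returns the removed value, so the value is re-keyed
    # without building a new hash.
    parsed_hash[:second_number] = parsed_hash.delete(:edition_dash_year)
    parsed_hash
  end
end

# Sample parser output (values are made up for the example).
parsed = { first_number: "800", edition_dash_year: "53" }
result = ShapeCorrectionSketch.move_dash_year_to_second_number(parsed)
```

Because the method only touches the hash it is given, a test needs nothing beyond a literal hash: here `result` is the same object as `parsed`, now keyed with :second_number instead of :edition_dash_year.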

Pre-processing blocks that need to surface extracted components to the Builder (e.g. letter-suffix Part components, embedded-edition objects) remain in Builder#build because they create local variables that flow into the construction phase. All other normalizations live here.

Constant Summary

VALID_YEAR_RANGE =

Range of years we treat as “looks like a calendar year” when disambiguating :edition_dash_year from :second_number.

(1901..2026).freeze
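
As a rough sketch of how such a range can be used, the check below asks whether a captured value "looks like a calendar year". The module and the helper name `looks_like_calendar_year?` are hypothetical, and it is an assumption that the parser hands the value over as a string.

```ruby
module YearCheckSketch
  # Same bounds as the documented constant.
  VALID_YEAR_RANGE = (1901..2026).freeze

  module_function

  # Hypothetical helper: true when the value parses as a base-10
  # integer that falls inside VALID_YEAR_RANGE.
  def looks_like_calendar_year?(value)
    # exception: false makes Integer return nil instead of raising
    # on input such as "2a" or "".
    year = Integer(value.to_s, 10, exception: false)
    !year.nil? && VALID_YEAR_RANGE.cover?(year)
  end
end

YearCheckSketch.looks_like_calendar_year?("1985") # => true
YearCheckSketch.looks_like_calendar_year?("53")   # => false
YearCheckSketch.looks_like_calendar_year?("2a")   # => false
```
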
DASH_YEAR_AS_EDITION_SERIES =

Series that treat :edition_dash_year as a year-only edition when the dash year falls in VALID_YEAR_RANGE. For any other series, a dash year in this range is interpreted differently, or kept as part of a compound number, depending on the branch.

%w[HB CS FIPS].freeze
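
The two constants can be read together as a small decision rule. The sketch below is an assumption about that rule, not the library's code: `dash_year_role` and its return symbols are hypothetical, the series is assumed to arrive as a bare abbreviation such as "FIPS", and :series_specific merely stands in for the other branches, which this page does not spell out.

```ruby
module DashYearSketch
  VALID_YEAR_RANGE = (1901..2026).freeze
  DASH_YEAR_AS_EDITION_SERIES = %w[HB CS FIPS].freeze

  module_function

  # Hypothetical classifier for a value captured as :edition_dash_year.
  #   :not_a_year      - value is not an integer inside VALID_YEAR_RANGE
  #   :edition_year    - listed series, value treated as a year-only edition
  #   :series_specific - placeholder for the branches not documented here
  def dash_year_role(series, dash_year)
    year = Integer(dash_year.to_s, 10, exception: false)
    return :not_a_year unless year && VALID_YEAR_RANGE.cover?(year)

    DASH_YEAR_AS_EDITION_SERIES.include?(series) ? :edition_year : :series_specific
  end
end

DashYearSketch.dash_year_role("FIPS", "1985") # => :edition_year
DashYearSketch.dash_year_role("SP", "1985")   # => :series_specific
DashYearSketch.dash_year_role("HB", "44")     # => :not_a_year
```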

Instance Method Summary

Instance Method Details

#normalize(parsed_hash) ⇒ Hash

Apply all normalizations to the parsed hash in the correct order.

Parameters:

  • parsed_hash (Hash)

    parser output (mutated in place)

Returns:

  • (Hash)

    the same hash, normalized



37
38
39
40
41
42
43
44
45
46
# File 'lib/pubid/nist/parser_output_normalizer.rb', line 37

def normalize(parsed_hash)
  merge_edition_e_into_update(parsed_hash)
  extract_embedded_edition_with_year(parsed_hash)
  extract_embedded_edition_without_dash_year(parsed_hash)
  split_second_number_edition_year(parsed_hash)
  split_fips_month_year_after_part(parsed_hash)
  disambiguate_ir_compound_vs_edition(parsed_hash)
  disambiguate_dash_year(parsed_hash)
  parsed_hash
end
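
The contract shown above (run the steps in a fixed order, mutate the argument, return the same hash) can be exercised with a stand-in class. Everything in this sketch is hypothetical: `PipelineSketch` and its two steps are invented so the example runs on its own, and they do not mirror the real normalization steps.

```ruby
# Stand-in with the same contract as #normalize. The steps are
# hypothetical and exist only to make the sketch runnable.
class PipelineSketch
  def normalize(parsed_hash)
    strip_blank_values(parsed_hash) # must run first, see promote_dash_year
    promote_dash_year(parsed_hash)
    parsed_hash
  end

  private

  # Removes keys whose value is nil or whitespace-only.
  def strip_blank_values(parsed_hash)
    parsed_hash.delete_if { |_key, value| value.to_s.strip.empty? }
  end

  # Re-keys :edition_dash_year as :second_number, but only when no
  # :second_number is present. A blank :second_number would block this
  # step, which is why the order of the two calls matters.
  def promote_dash_year(parsed_hash)
    return if parsed_hash.key?(:second_number)
    return unless parsed_hash.key?(:edition_dash_year)

    parsed_hash[:second_number] = parsed_hash.delete(:edition_dash_year)
  end
end

parsed = { first_number: "800", second_number: "", edition_dash_year: "53" }
snapshot = parsed.dup # keep the raw parser output if it is needed later
result = PipelineSketch.new.normalize(parsed)
```

Since the return value is the argument itself, `result.equal?(parsed)` holds, and a caller that still needs the raw parser output has to copy the hash before normalizing, as `snapshot` does here.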