Class: Pubid::Utils::StringNormalizer

Inherits:
Object
Defined in:
lib/pubid/utils/string_normalizer.rb

Overview

String normalization utilities for PubID parsing and rendering

This class provides centralized string manipulation methods, as class methods, to reduce duplication and keep normalization consistent between parsing and rendering.

Usage

Pubid::Utils::StringNormalizer.normalize_dashes(str)
Pubid::Utils::StringNormalizer.normalize_whitespace(str)
Pubid::Utils::StringNormalizer.split_compound_number(str)
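
To illustrate how these helpers compose, the sketch below chains dash normalization, whitespace normalization, and compound-number splitting. NormalizerSketch is a hypothetical stand-in written for this example so that it runs without the gem. Its dash list (U+2011 and U+2013) is an assumption taken from the examples on this page, not the library's actual DASH_CHARS.

```ruby
# Hypothetical stand-in for Pubid::Utils::StringNormalizer (illustration only).
module NormalizerSketch
  # Assumed dash set: ASCII hyphen, non-breaking hyphen, en dash.
  DASHES = ["-", "\u2011", "\u2013"].freeze
  WHITESPACE = [" ", "\t", "\n", "\r", "\u00A0"].freeze

  module_function

  # Replace every assumed dash with an ASCII hyphen.
  def normalize_dashes(str)
    return str if str.nil?

    str.gsub(Regexp.union(DASHES), "-")
  end

  # Collapse runs of whitespace into one space and strip the ends.
  def normalize_whitespace(str)
    return "" if str.nil?

    str.gsub(/[#{WHITESPACE.join}]+/, " ").strip
  end

  # Split on hyphen or slash after normalizing dashes.
  def split_compound_number(str)
    return [] unless str

    normalize_dashes(str).split(%r{[-/]}).reject(&:empty?)
  end
end

raw = "  SP\u00A0800\u201353\u20111  "
clean = NormalizerSketch.normalize_whitespace(NormalizerSketch.normalize_dashes(raw))
p clean                                                          # => "SP 800-53-1"
p NormalizerSketch.split_compound_number(clean.split(" ").last)  # => ["800", "53", "1"]
```

Normalizing dashes first means the later split only has to deal with ASCII separators.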

Constant Summary

DASH_CHARS =

Unicode dash characters that should be normalized to ASCII hyphen

["-", "", "", "", ""].freeze
WHITESPACE_CHARS =

Whitespace characters to normalize to single space

[" ", "\t", "\n", "\r", "\u00A0"].freeze

Class Method Details

.blank?(str) ⇒ Boolean

Check if string is blank (nil, empty, or only whitespace)

Examples:

Check blank

StringNormalizer.blank?(nil) # => true
StringNormalizer.blank?("") # => true
StringNormalizer.blank?("  ") # => true
StringNormalizer.blank?("ISO") # => false

Parameters:

  • str (String, nil)

    input string

Returns:

  • (Boolean)

    true if blank



# File 'lib/pubid/utils/string_normalizer.rb', line 177

def blank?(str)
  str.nil? || str.to_s.strip.empty?
end

.clean_abbr(str) ⇒ String

Clean and uppercase abbreviation

Examples:

Clean abbreviation

StringNormalizer.clean_abbr("  amd  ") # => "AMD"
StringNormalizer.clean_abbr("Amd") # => "AMD"

Parameters:

  • str (String, nil)

    input string

Returns:

  • (String)

    cleaned and uppercased abbreviation



# File 'lib/pubid/utils/string_normalizer.rb', line 63

def clean_abbr(str)
  return "" if str.nil?

  normalize_whitespace(str).upcase
end

.extract_number_suffix(str) ⇒ Array<String, nil>

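Extract a number and its optional alphabetic suffix from a string. The whole string must consist of an optional non-digit prefix, a single run of digits, and an optional run of letters.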

Examples:

Extract number and suffix

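StringNormalizer.extract_number_suffix("53r") # => ["53", "r"]
StringNormalizer.extract_number_suffix("800-53r5") # => [nil, nil]  # Interior hyphen and digit in suffix do not match
StringNormalizer.extract_number_suffix("ISO") # => [nil, nil]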

Parameters:

  • str (String, nil)

    input string

Returns:

  • (Array<String, nil>)

    [number, suffix], where suffix is nil when absent; [nil, nil] when the input is nil or does not match



# File 'lib/pubid/utils/string_normalizer.rb', line 97

def extract_number_suffix(str)
  return [nil, nil] unless str

  match = str.match(/^(\D+)?(\d+)([a-zA-Z]+)?$/)
  return [nil, nil] unless match

  number = match[2]
  suffix = match[3]
  [number, suffix]
end
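
Because the pattern above is anchored at both ends and its suffix group accepts letters only, an identifier such as "800-53r5" does not match. If the trailing number and an alphanumeric suffix are wanted, a variant along the following lines could be used. The method extract_trailing_number_suffix and its pattern are hypothetical and are not part of StringNormalizer.

```ruby
# Hypothetical variant: capture the last run of digits and an optional
# alphanumeric suffix that starts with a letter.
TRAILING_NUMBER = /(?:\A|\D)(\d+)([a-zA-Z][a-zA-Z0-9]*)?\z/

def extract_trailing_number_suffix(str)
  return [nil, nil] unless str

  match = str.match(TRAILING_NUMBER)
  return [nil, nil] unless match

  [match[1], match[2]]
end

p extract_trailing_number_suffix("800-53r5") # => ["53", "r5"]
p extract_trailing_number_suffix("53r")      # => ["53", "r"]
p extract_trailing_number_suffix("9001")     # => ["9001", nil]
p extract_trailing_number_suffix("ISO")      # => [nil, nil]
```

The `\A` and `\z` anchors match only at the start and end of the whole string, whereas `^` and `$` match at line boundaries in Ruby.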

.join_parts(parts, separator: "") ⇒ String

Join parts with the given separator, skipping nil entries

Examples:

Join parts

StringNormalizer.join_parts(["ISO", "9001", nil, "2015"], separator: " ")
# => "ISO 9001 2015"
StringNormalizer.join_parts(["ISO", nil, "9001"], separator: "-")
# => "ISO-9001"

Parameters:

  • parts (Array)

    parts to join

  • separator (String) (defaults to: "")

    separator between parts

Returns:

  • (String)

    joined string



# File 'lib/pubid/utils/string_normalizer.rb', line 120

def join_parts(parts, separator: "")
  parts.compact.join(separator)
end
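
Note that compact removes only nil, so empty or whitespace-only strings still contribute a separator. If that matters, the parts can be filtered first. The helper join_present_parts below is a hypothetical sketch and is not part of StringNormalizer.

```ruby
# Hypothetical helper: drop nil, empty, and whitespace-only parts before joining.
def join_present_parts(parts, separator: "")
  parts.compact.map(&:to_s).reject { |part| part.strip.empty? }.join(separator)
end

p ["ISO", "", "9001"].compact.join(" ")                         # => "ISO  9001"
p join_present_parts(["ISO", "", nil, "9001"], separator: " ")  # => "ISO 9001"
```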

.normalize_dashes(str) ⇒ String

Normalize all dash characters to ASCII hyphen

Examples:

Normalize unicode dashes

StringNormalizer.normalize_dashes("ISO‑9001") # => "ISO-9001"
StringNormalizer.normalize_dashes("ISO–9001") # => "ISO-9001"

Parameters:

  • str (String, nil)

    input string

Returns:

  • (String, nil)

    normalized string with ASCII hyphens, or nil if the input is nil



# File 'lib/pubid/utils/string_normalizer.rb', line 33

def normalize_dashes(str)
  return str if str.nil?

  str.tr(DASH_CHARS.join, "-")
end
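
Literal dash characters are easy to lose or confuse in source files and rendered documentation. One way to keep such a list unambiguous is to spell it with code point escapes. The list below is an assumption for illustration (a handful of common Unicode dashes) and is not claimed to match the actual contents of DASH_CHARS. The names ASSUMED_DASHES, DASH_PATTERN, and normalize_dashes_sketch are hypothetical.

```ruby
# Assumed dash set, written as escapes so every entry stays visible.
ASSUMED_DASHES = [
  "\u2010", # hyphen
  "\u2011", # non-breaking hyphen
  "\u2012", # figure dash
  "\u2013", # en dash
  "\u2014", # em dash
  "\u2212", # minus sign
].freeze

# Regexp.union builds an alternation that matches any one of the strings.
DASH_PATTERN = Regexp.union(ASSUMED_DASHES)

def normalize_dashes_sketch(str)
  return str if str.nil?

  str.gsub(DASH_PATTERN, "-")
end

p normalize_dashes_sketch("ISO\u20119001") # => "ISO-9001"
p normalize_dashes_sketch("ISO\u20139001") # => "ISO-9001"
```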

.normalize_whitespace(str) ⇒ String

Collapse each run of whitespace into a single space and strip leading and trailing whitespace; returns an empty string for nil

Examples:

Normalize whitespace

StringNormalizer.normalize_whitespace("ISO  9001") # => "ISO 9001"
StringNormalizer.normalize_whitespace("ISO\t9001") # => "ISO 9001"

Parameters:

  • str (String, nil)

    input string

Returns:

  • (String)

    normalized string with single spaces



# File 'lib/pubid/utils/string_normalizer.rb', line 48

def normalize_whitespace(str)
  return "" if str.nil?

  str.gsub(/[#{WHITESPACE_CHARS.join}]+/, " ").strip
end
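
WHITESPACE_CHARS lists five characters, so other Unicode spaces (for example U+2003 EM SPACE) pass through unchanged. Where broader coverage is wanted, Ruby's POSIX bracket class `[[:space:]]` is Unicode-aware and could be used instead. The method normalize_unicode_whitespace is a hypothetical name for this sketch, not a method of StringNormalizer.

```ruby
# Hypothetical variant: treat any Unicode whitespace as a space.
def normalize_unicode_whitespace(str)
  return "" if str.nil?

  str.gsub(/[[:space:]]+/, " ").strip
end

p normalize_unicode_whitespace("ISO\u2003 9001\u00A0") # => "ISO 9001"
p normalize_unicode_whitespace("ISO\t9001")            # => "ISO 9001"
```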

.split_compound_number(str, separators: ["-", "/"]) ⇒ Array<String>

Split a compound number on its separators (e.g., "800-53-1" -> ["800", "53", "1"])

Examples:

Split compound numbers

StringNormalizer.split_compound_number("800-53-1")
# => ["800", "53", "1"]
StringNormalizer.split_compound_number("800/53")
# => ["800", "53"]

Parameters:

  • str (String, nil)

    input string

  • separators (Array<String>) (defaults to: ["-", "/"])

    separators to split on

Returns:

  • (Array<String>)

    split parts



# File 'lib/pubid/utils/string_normalizer.rb', line 81

def split_compound_number(str, separators: ["-", "/"])
  return [] unless str

  normalized = normalize_dashes(str)
  normalized.split(/[#{separators.join}]/).reject(&:empty?)
end
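
The separators are interpolated directly into a character class, so a separator such as "^" or "]" would change the meaning of the pattern. If callers may pass arbitrary separators, Regexp.union escapes each one. The method split_on_separators below is a hypothetical sketch, not the library's implementation, and it omits the dash normalization step to stay self-contained.

```ruby
# Hypothetical variant: build the split pattern with Regexp.union so that
# regex metacharacters in separators are escaped.
def split_on_separators(str, separators: ["-", "/"])
  return [] unless str

  str.split(Regexp.union(separators)).reject(&:empty?)
end

p split_on_separators("800-53/1")                  # => ["800", "53", "1"]
p split_on_separators("800^53", separators: ["^"]) # => ["800", "53"]
p split_on_separators("800.53", separators: ["."]) # => ["800", "53"]
```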

.title_case(str) ⇒ String

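Convert to title case: capitalize each whitespace-separated word, leaving words that are already all uppercase unchanged. Runs of whitespace between words are collapsed to a single space.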

Examples:

Title case

StringNormalizer.title_case("international standard")
# => "International Standard"
StringNormalizer.title_case("ISO/TR")
# => "ISO/TR"  # Preserves existing caps

Parameters:

  • str (String, nil)

    input string

Returns:

  • (String)

    title cased string



# File 'lib/pubid/utils/string_normalizer.rb', line 154

def title_case(str)
  return "" if str.nil?

  str.split.map do |word|
    if word.upcase == word # Preserve acronyms
      word
    else
      word.capitalize
    end
  end.join(" ")
end

.to_s(str) ⇒ String

Safe to_s method that handles nil

Examples:

Safe to_s

StringNormalizer.to_s(nil) # => ""
StringNormalizer.to_s("ISO") # => "ISO"

Parameters:

  • str (String, nil)

    input string

Returns:

  • (String)

    string or empty string



# File 'lib/pubid/utils/string_normalizer.rb', line 190

def to_s(str)
  str.nil? ? "" : str.to_s
end

.truncate(str, max_length:, ellipsis: "...") ⇒ String

Truncate a string to at most max_length characters, with the ellipsis counted in that length

Examples:

Truncate string

StringNormalizer.truncate("International Standard", max_length: 15)
# => "Internationa..."
StringNormalizer.truncate("ISO", max_length: 10)
# => "ISO"

Parameters:

  • str (String, nil)

    input string

  • max_length (Integer)

    maximum length

  • ellipsis (String) (defaults to: "...")

    string appended to mark truncation

Returns:

  • (String, nil)

    truncated string, or the input unchanged if it is nil or already fits



# File 'lib/pubid/utils/string_normalizer.rb', line 137

def truncate(str, max_length:, ellipsis: "...")
  return str if str.nil? || str.length <= max_length

  str[0...(max_length - ellipsis.length)] + ellipsis
end
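
When max_length is smaller than the ellipsis, `max_length - ellipsis.length` becomes negative and the slice counts from the end of the string, so the result can be longer than max_length. A guarded variant might look like the following. The name truncate_safely is hypothetical, and falling back to a hard cut is only one possible policy.

```ruby
# Hypothetical variant: never return more than max_length characters.
def truncate_safely(str, max_length:, ellipsis: "...")
  return str if str.nil? || str.length <= max_length

  # No room for text plus ellipsis: fall back to a hard cut.
  return str[0, [max_length, 0].max] if max_length <= ellipsis.length

  str[0, max_length - ellipsis.length] + ellipsis
end

p truncate_safely("International Standard", max_length: 15) # => "Internationa..."
p truncate_safely("International Standard", max_length: 2)  # => "In"
p truncate_safely("ISO", max_length: 10)                    # => "ISO"
```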