Module: SmartCsvImport::HeaderNormalizer
- Defined in:
- lib/smart_csv_import/header_normalizer.rb
Constant Summary
- ABBREVIATIONS =
Unambiguous abbreviations only — terms that reliably mean one thing in a business CSV context. This list is intentionally small and conservative.
DO NOT add entries that could mean two different things depending on domain (e.g. “ext” = file extension or phone extension, “co” = company or county, “apt” = apartment or adjective, “sal” / “val” = names).
This list is not meant to be comprehensive. The LLM fallback strategy handles the long tail of ambiguous and domain-specific abbreviations far better than any static dictionary can.
{
  # Personal
  "dob" => "date of birth",
  "dod" => "date of death",
  "ssn" => "social security number",
  "nin" => "national insurance number",
  "dba" => "doing business as",

  # Contact
  "tel" => "telephone",

  # Location
  "addr" => "address",
  "zip" => "zip code",
  "ste" => "suite",

  # Organisation / HR
  "dept" => "department",
  "mgr" => "manager",
  "emp" => "employee",
  "org" => "organization",
  "corp" => "corporation",

  # Quantities / identifiers
  "qty" => "quantity",
  "amt" => "amount",
  "num" => "number",
  "ref" => "reference",
  "acct" => "account",

  # Finance
  "bal" => "balance",
  "pmt" => "payment",
  "inv" => "invoice",

  # Misc
  "desc" => "description",
  "info" => "information",
  "misc" => "miscellaneous",
}.freeze
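The whole-word, case-insensitive lookup described above can be illustrated with a minimal sketch. It assumes a three-entry subset of the table, and `SAMPLE_ABBREVIATIONS` and `expand_words` are hypothetical names used only for this illustration; they are not part of the module.

```ruby
# Subset of the documented table, copied here so the sketch runs standalone.
# SAMPLE_ABBREVIATIONS is a hypothetical name, not a constant of the module.
SAMPLE_ABBREVIATIONS = {
  "dept" => "department",
  "desc" => "description",
  "tel"  => "telephone",
}.freeze

# Hypothetical helper: look up each whitespace-separated word in the table,
# ignoring case, and leave any word without an entry untouched.
def expand_words(text, table = SAMPLE_ABBREVIATIONS)
  text.split(" ").map { |word| table[word.downcase] || word }.join(" ")
end

puts expand_words("Dept Tel")    # department telephone
puts expand_words("Hotel Desc")  # Hotel description ("Hotel" contains "tel" but is not a whole-word match)
puts expand_words("Phone Ext")   # Phone Ext (the ambiguous "ext" has no entry)
```

Because matching is per word rather than per substring, a short key such as "tel" cannot corrupt longer words, and an ambiguous token such as "ext" passes through unchanged for a later strategy to handle.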
Class Method Summary
Class Method Details
.normalize(header) ⇒ Object
# File 'lib/smart_csv_import/header_normalizer.rb', line 50

def self.normalize(header)
  text = header.to_s

  # Split camelCase and PascalCase: "CustomerDOB" → "Customer DOB"
  text = text
    .gsub(/([a-z])([A-Z])/, '\1 \2')
    .gsub(/([A-Z]{2,})([A-Z][a-z])/, '\1 \2')

  # Underscores, dashes, dots, slashes → spaces
  text = text.tr("_./\\-", " ")

  # Strip non-alphanumeric characters (removes #, *, (, ), etc.)
  text = text.gsub(/[^a-zA-Z0-9\s]/, " ")

  # Collapse whitespace
  text = text.gsub(/\s+/, " ").strip

  # Expand abbreviations (whole-word, case-insensitive)
  text = text.split(" ").map do |word|
    ABBREVIATIONS[word.downcase] || word
  end.join(" ")

  # Final collapse in case expansions introduced extra spaces
  text.gsub(/\s+/, " ").strip
end
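To make the order of the steps easier to follow, here is a hedged sketch that replays the same transformations and records the intermediate string after each stage. `trace_normalize` and `TRACE_ABBREVIATIONS` are hypothetical names introduced for illustration; the sketch uses a three-entry subset of the table and is not part of the module.

```ruby
# Subset of the documented table so the sketch runs standalone (hypothetical name).
TRACE_ABBREVIATIONS = {
  "dob"  => "date of birth",
  "acct" => "account",
  "num"  => "number",
}.freeze

# Hypothetical helper: applies the same steps as the documented method
# and returns the intermediate value after each one.
def trace_normalize(header)
  steps = {}

  # 1. Split camelCase / PascalCase boundaries
  text = header.to_s
    .gsub(/([a-z])([A-Z])/, '\1 \2')
    .gsub(/([A-Z]{2,})([A-Z][a-z])/, '\1 \2')
  steps[:split_case] = text

  # 2. Separator characters become spaces
  text = text.tr("_./\\-", " ")
  steps[:separators] = text

  # 3. Drop remaining punctuation, then collapse whitespace
  text = text.gsub(/[^a-zA-Z0-9\s]/, " ").gsub(/\s+/, " ").strip
  steps[:cleaned] = text

  # 4. Whole-word, case-insensitive abbreviation expansion
  text = text.split(" ").map { |word| TRACE_ABBREVIATIONS[word.downcase] || word }.join(" ")
  steps[:expanded] = text

  steps
end

trace_normalize("acct_num (#)").each { |step, value| puts "#{step}: #{value}" }
# split_case: acct_num (#)
# separators: acct num (#)
# cleaned: acct num
# expanded: account number
```

Expansion runs last, so it only ever sees clean, space-separated words. Expanded terms come out in lowercase while unexpanded words keep their original casing, for example "CustomerDOB" becomes "Customer date of birth".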