neu-mods
Northeastern-flavored MODS v3 projection + selection for the DRS, shared by Cerberus (front end) and Atlas (API backend).
It is a Nokogiri-native, dependency-light contract over MODS documents — pure functions over a parsed document, nothing else. No Rails, no persistence, no HTTP. It answers two questions:
- "Where is X?": Selectors return live Nokogiri nodes, so they serve both the read path (projection reads their text) and the write path (an editor mutates the returned node in place). The node an editor changes is provably the node the projection reads.
- "What does this project to?": Projection returns plain data (hashes/strings/arrays, never opaque typed objects) for indexing/display.
It depends on Nokogiri alone — deliberately not the sul-dlss/mods +
nom-xml stack (which is sunsetting alongside Stanford's move to Cocina). See
the design note in the DRS gap-reports for the full rationale.
Usage
require "neu-mods"
doc = NEU::MODS::Document.parse(xml_string)
# Projection (plain data)
doc.plain_title # => "What's New - How We Respond to Disaster, Episode 1"
doc.title_parts # => { non_sort:, subtitle:, title:, part_name:, part_number: }
doc.abstract # => normalized, paragraph-joined String
doc.topical_subjects # => ["Civil society", ...] (every <topic>, for the access copy)
doc.keywords # => [...] (only the editable attribute-free keyword subjects)
doc.to_h # => full projection, keyed to Atlas's Metadata::MODS attributes
# Pure title composition (no document needed) — for callers that already hold
# the parts (e.g. Atlas's access-copy model) and must not re-parse XML on read.
NEU::MODS.compose_title(non_sort: "", title: "What's New",
                        part_name: "How We Respond to Disaster", part_number: "Episode 1")
# => "What's New - How We Respond to Disaster, Episode 1" (== doc.plain_title)
# Selectors (live nodes — for editing)
node = doc.primary_title_info.at_xpath("mods:title", NEU::MODS::NAMESPACE)
node.content = "New Title" unless NEU::MODS.whitespace_equivalent?(node.text, "New Title")
doc.to_xml
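The pure title composition in the usage above can be approximated without any XML at all. This is a minimal sketch, not the gem's implementation: the method name `sketch_compose_title` is hypothetical, the separators (" - " before the part name, ", " before the part number) are inferred only from the single documented example, the direct `non_sort` prefix is an assumption, and subtitle handling is left out because this README does not show its format.

```ruby
# Hypothetical stand-in for NEU::MODS.compose_title, for illustration only.
# Separators are inferred from the one documented example; the real rules
# (including subtitle handling) may differ.
def sketch_compose_title(title:, non_sort: "", part_name: nil, part_number: nil)
  present = ->(s) { !s.to_s.strip.empty? }

  # Assumption: the non-sorting prefix (e.g. "The ") is prepended as-is.
  main = "#{non_sort}#{title}".strip

  # Keep only the part fields that carry text, joined as "name, number".
  parts = [part_name, part_number].select(&present).map(&:strip).join(", ")

  present.call(parts) ? "#{main} - #{parts}" : main
end

puts sketch_compose_title(non_sort: "", title: "What's New",
                          part_name: "How We Respond to Disaster",
                          part_number: "Episode 1")
```

Because the function takes only strings, a caller that already holds the parts can build the display title on read without parsing the MODS document again.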
Two normalizers, two jobs
- NEU::MODS.whitespace_equivalent? / .canonical_ws: the no-op guard. Did an edit change anything, or only insignificant whitespace? (Used to avoid minting an unchanged OCFL MODS version.)
- NEU::MODS.normalize_paragraphs / .normalize: clean curator freetext for the JSON/Solr access copy (dash/smart-punctuation transliteration, control stripping, paragraph handling). The XML preservation copy is never touched.
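The no-op guard can be pictured as comparing two strings after whitespace canonicalization. The sketch below is an illustration under stated assumptions, not the gem's code: the `sketch_` method names are hypothetical, and "insignificant whitespace" is assumed to mean leading/trailing whitespace plus runs of whitespace that collapse to a single space.

```ruby
# Hypothetical canonical form: collapse whitespace runs, trim the ends.
# The gem's actual definition of "insignificant" may be stricter or looser.
def sketch_canonical_ws(text)
  text.to_s.gsub(/\s+/, " ").strip
end

# True when two strings differ only in whitespace under the rule above.
def sketch_whitespace_equivalent?(left, right)
  sketch_canonical_ws(left) == sketch_canonical_ws(right)
end

stored   = "What's  New\n"
proposed = "What's New"

# An editor would skip the write (and so skip a new stored version)
# when the proposed value is whitespace-equivalent to the stored one.
if sketch_whitespace_equivalent?(stored, proposed)
  puts "no-op edit, nothing to write"
else
  puts "content changed, write the node"
end
```

Keeping this comparison separate from the freetext normalizer matters: the guard only decides whether to write, while the normalizer rewrites text for the access copy.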
Behavior fidelity & known caveats
The projection is behavior-preserving with Atlas's prior mods-gem-based
extraction, pinned by spec/conformance_spec.rb against work-mods.xml. Two
intentional notes:
- Name display reproduces the mods gem's display_value_w_date including its quirks (e.g. multiple given nameParts concatenate with no separator), to preserve existing Solr/display output. Cleanups are a deliberate future contract change, not a silent one.
- Roles & languages read the type="text" term and fall back to the raw code; they are not MARC-relator / ISO-639 translated. Records carrying text forms (the norm) are unaffected; code-only records would differ. Vendoring those lookup tables (or depending on iso-639) is deferred to keep the gem Nokogiri-only and small.
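The text-then-code fallback for roles and languages can be illustrated on plain data. This is a sketch under assumptions: `sketch_display_term` is a hypothetical name, and each term is modeled as a hash with `:type` and `:value` keys standing in for a term element and its type attribute, rather than a live Nokogiri node.

```ruby
# Hypothetical model: each hash stands in for one term element,
# e.g. { type: "text", value: "Author" } or { type: "code", value: "aut" }.
def sketch_display_term(terms)
  blank = ->(v) { v.to_s.strip.empty? }

  text = terms.find { |t| t[:type] == "text" && !blank.call(t[:value]) }
  code = terms.find { |t| t[:type] == "code" && !blank.call(t[:value]) }

  # Prefer the human-readable text form; otherwise surface the raw code
  # untranslated (no relator or language lookup table is consulted).
  (text || code)&.fetch(:value)
end

puts sketch_display_term([{ type: "code", value: "aut" },
                          { type: "text", value: "Author" }])
puts sketch_display_term([{ type: "code", value: "aut" }])
```

The second call shows the caveat in practice: a code-only record yields the raw code where a translated lookup would have produced a label.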
Source convention
Every character-class regex in TextNormalizer is built programmatically from
codepoints, so the source stays pure ASCII (no literal smart-quotes/dashes, no
raw control bytes). A spec enforces this. Keep it that way.
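One way to keep such a source file pure ASCII is to build each character class from integer codepoints at load time. The sketch below is illustrative only: the constant names, the `char_class` helper, and the specific codepoint lists are assumptions chosen for the example, not the contents of TextNormalizer.

```ruby
# Hypothetical codepoint tables; the real TextNormalizer lists may differ.
DASH_CODEPOINTS         = [0x2012, 0x2013, 0x2014, 0x2015].freeze # figure, en, em, bar
SINGLE_QUOTE_CODEPOINTS = [0x2018, 0x2019].freeze                 # curly single quotes

# Build a character-class regex from codepoints, so no literal
# non-ASCII character ever appears in the source text.
def char_class(codepoints)
  members = codepoints.map { |cp| Regexp.escape(cp.chr(Encoding::UTF_8)) }.join
  Regexp.new("[#{members}]")
end

DASHES        = char_class(DASH_CODEPOINTS)
SINGLE_QUOTES = char_class(SINGLE_QUOTE_CODEPOINTS)

# Transliterate typographic dashes and quotes to their ASCII forms.
def sketch_transliterate(text)
  text.gsub(DASHES, "-").gsub(SINGLE_QUOTES, "'")
end

# Input uses \u escapes, so this example file is itself ASCII-only.
puts sketch_transliterate("What\u2019s New \u2014 Part 1")
```

A spec guarding the convention could, for instance, read the source file and assert `String#ascii_only?` on its contents; the exact form of the gem's own check is not shown in this README.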
Development
bundle install
bundle exec rspec
bundle exec rubocop
Versioned via the .version file (read by lib/neu/mods/version.rb); released
with bundler/gem_tasks (rake release), mirroring atlas_rb.