Module: Relaton::Bib::Sanitizer

Defined in: lib/relaton/bib/sanitizer.rb

Overview

Strips inline markup not in the basicdoc PureTextElement set (plus <p>, <eref>, <xref>, <fn>) from raw marked-up content strings. Disallowed elements are unwrapped: tags removed, inner text kept.
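
A minimal sketch of the unwrap behaviour, using a regex tokenizer rather than a parsed tree (the module itself performs a recursive walk). `KEEP`, `TAG` and `unwrap_disallowed` are hypothetical names, not part of the module. The sketch only illustrates "tags removed, inner text kept"; comments, CDATA and processing instructions are left untouched.

```ruby
# Hypothetical allow-list for illustration; mirrors the ALLOWED constant below.
KEEP = %w[em strong sub sup tt underline strike smallcap br stem p eref xref fn].freeze

# One start, end or self-closing tag: optional "/", the element name,
# then any attributes up to ">". Comments and PIs ("<!", "<?") do not match.
TAG = %r{<(/?)([A-Za-z][\w:.-]*)([^>]*)>}

# Unwrap every element whose name is not in KEEP: drop its tags,
# keep the text (and any allowed descendants) in between.
def unwrap_disallowed(content)
  content.gsub(TAG) do |tag|
    KEEP.include?(Regexp.last_match(2)) ? tag : ""
  end
end

puts unwrap_disallowed('Steel <span class="x">tubes</span> <em>for</em> <b>use</b>')
# prints: Steel tubes <em>for</em> use
```

Because this naive pass does not special-case `<stem>`, any MathML inside a `<stem>` element would be unwrapped as well, which is the loss the OPAQUE handling exists to prevent.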

<fn> is admitted beyond the strict PureTextElement set because bibliographic titles in real Metanorma input routinely carry footnotes (e.g. ISO standards titles with a disclaimer footnote), and downstream consumers, notably relaton-render’s own inline-tag allow-list, already accept <fn> as a legitimate child of <title>. Stripping it here would break the round-trip.

OPAQUE elements (currently <stem>) are also allowed, but the sanitiser does not descend into them: their contents are out-of-band inline notation (MathML, AsciiMath, LaTeX) rather than basicdoc markup, and must be preserved verbatim. Without the opaque-skip, the recursive walk would unwrap MathML / AsciiMath elements down to bare text nodes; see #116 for the round-trip-loss symptom.
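
A sketch of the opaque skip, again regex-based rather than tree-based: `<stem>` spans are cut out with `String#split` (a capture group in the pattern keeps the matched spans in the result array, at odd indices) and passed through untouched, while the text between them is unwrapped as usual. All names are hypothetical, and the lazy `.*?` match assumes `<stem>` elements never nest.

```ruby
# Hypothetical allow-list and tag pattern, as in a plain unwrap pass.
KEEP = %w[em strong sub sup tt underline strike smallcap br stem p eref xref fn].freeze
TAG = %r{<(/?)([A-Za-z][\w:.-]*)([^>]*)>}

# A whole <stem> element, matched lazily across newlines (/m).
# Assumes <stem> elements do not nest.
STEM_SPAN = %r{(<stem\b[^>]*>.*?</stem>)}m

def sanitize_with_opaque_skip(content)
  content.split(STEM_SPAN).each_with_index.map do |part, i|
    # Odd indices are captured <stem> spans: preserve them verbatim.
    next part if i.odd?

    # Even indices are ordinary text: unwrap disallowed elements.
    part.gsub(TAG) { |tag| KEEP.include?(Regexp.last_match(2)) ? tag : "" }
  end.join
end

input = 'Area <i>A</i> = <stem type="MathML"><math><mi>r</mi></math></stem> m<sup>2</sup>'
puts sanitize_with_opaque_skip(input)
# prints: Area A = <stem type="MathML"><math><mi>r</mi></math></stem> m<sup>2</sup>
```

Without the split, `<math>` and `<mi>` would be unwrapped because neither is on the allow-list, leaving only the bare `r`.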

Constant Summary

ALLOWED =

%w[
  em strong sub sup tt underline strike smallcap br stem
  p eref xref fn
].freeze

OPAQUE =

Elements whose children are non-basicdoc inline notation (MathML, AsciiMath, LaTeX, …) and must be preserved verbatim rather than sanitised against ALLOWED.

%w[stem].freeze

RENAME =

{
  "italic" => "em",
}.freeze
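
The interplay between RENAME and ALLOWED can be sketched as a name-resolution step. Applying RENAME before the allow-list check is an assumption here (it is the order under which `<italic>` survives as `<em>` rather than being unwrapped); `resolve_tag` and `rename_and_unwrap` are hypothetical helpers, and the constants are local copies of the documented values.

```ruby
# Local copies of the documented constants, for illustration only.
ALLOWED = %w[
  em strong sub sup tt underline strike smallcap br stem
  p eref xref fn
].freeze
RENAME = { "italic" => "em" }.freeze

# Hypothetical helper: the name an element should be emitted under,
# or nil if it should be unwrapped. RENAME is applied first (assumed order).
def resolve_tag(name)
  target = RENAME.fetch(name, name)
  ALLOWED.include?(target) ? target : nil
end

# Hypothetical helper: rewrite start/end tags via resolve_tag, dropping
# the tags of unresolved elements while keeping their text.
def rename_and_unwrap(content)
  content.gsub(%r{<(/?)([A-Za-z][\w:.-]*)([^>]*)>}) do
    slash, name, rest = Regexp.last_match.captures
    target = resolve_tag(name)
    target ? "<#{slash}#{target}#{rest}>" : ""
  end
end

p resolve_tag("italic") # => "em"
p resolve_tag("span")   # => nil
puts rename_and_unwrap("<italic>Note</italic> on <span>x</span>")
# prints: <em>Note</em> on x
```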

TAG_RX =

%r{<[a-zA-Z/!?]}
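
TAG_RX matches a `<` followed by a letter, `/`, `!` or `?`, i.e. the opening of a start tag, end tag, comment or declaration, or processing instruction. A plausible use, assumed here rather than confirmed by this page, is a cheap pre-check that lets strings containing no markup skip the sanitising walk; `markup?` is a hypothetical helper.

```ruby
TAG_RX = %r{<[a-zA-Z/!?]}

# Hypothetical fast path: only strings that look like they contain
# markup need to go through the sanitising walk.
def markup?(content)
  content.match?(TAG_RX)
end

p markup?("Plain title")       # => false
p markup?("a < b and b > c")   # => false ("< " does not open a tag)
p markup?("Title <em>x</em>")  # => true
p markup?("x<!-- note -->")    # => true
```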

Class Method Summary

.sanitize(content) ⇒ Object

Class Method Details

.sanitize(content) ⇒ Object