Module: Relaton::Bib::Sanitizer

Defined in:
lib/relaton/bib/sanitizer.rb

Overview

Strips inline markup that is not in the basicdoc PureTextElement set (plus <p>, <eref>, and <xref>) from raw marked-up content strings. Disallowed elements are unwrapped: their tags are removed and their inner text is kept.

Constant Summary

ALLOWED =
%w[
  em strong sub sup tt underline strike smallcap br stem
  p eref xref
].freeze
RENAME =
{
  "italic" => "em",
}.freeze
TAG_RX =
%r{<[a-zA-Z/!?]}
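
The constants above suggest a cheap regex pre-check followed by a per-element decision. The sketch below copies the three constants and shows one way they could be consulted. The helpers `maybe_markup?` and `disposition` are hypothetical names introduced for illustration, and checking RENAME before ALLOWED is an assumption; the module's own private helpers are not shown on this page.

```ruby
# Constants copied from the summary above.
ALLOWED = %w[
  em strong sub sup tt underline strike smallcap br stem
  p eref xref
].freeze
RENAME = { "italic" => "em" }.freeze
TAG_RX = %r{<[a-zA-Z/!?]}

# Hypothetical helper: true when a string contains something that
# looks like the start of a tag, comment, or processing instruction.
def maybe_markup?(content)
  content.is_a?(String) && content.match?(TAG_RX)
end

# Hypothetical helper: decide what to do with an element name.
# Assumption: renames are checked before the allow-list.
def disposition(name)
  return [:rename, RENAME[name]] if RENAME.key?(name)
  return [:keep, name] if ALLOWED.include?(name)

  [:unwrap, nil]
end

maybe_markup?("Plain title, a < b")     # => false ("<" is followed by a space)
maybe_markup?("Title with <em>x</em>")  # => true
disposition("italic")                   # => [:rename, "em"]
disposition("sup")                      # => [:keep, "sup"]
disposition("span")                     # => [:unwrap, nil]
```

Because TAG_RX requires a letter, `/`, `!`, or `?` directly after `<`, a bare less-than sign in plain text does not trigger the check.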

Class Method Summary

Class Method Details

.sanitize(content) ⇒ Object



# File 'lib/relaton/bib/sanitizer.rb', line 20

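def self.sanitize(content)
  # Return the input untouched when the sanitizable? check fails.
  return content unless sanitizable?(content)

  # Parse as an XML fragment; return the input unchanged if the parser reported errors.
  fragment = Nokogiri::XML::DocumentFragment.parse(content)
  return content if fragment.errors.any?

  # Process the fragment's nodes in place, then serialize each
  # top-level node back into a single UTF-8 string.
  sanitize_children(fragment)
  fragment.children.map { |c| c.to_xml(encoding: "UTF-8") }.join
end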