philiprehberger-sanitize_html

An HTML sanitizer with configurable allow lists, security profiles, and URL/CSS sanitization, for rendering user-supplied content safely.

Requirements

  • Ruby >= 3.1

Installation

Add to your Gemfile:

gem "philiprehberger-sanitize_html"

Or install directly:

gem install philiprehberger-sanitize_html

Usage

require "philiprehberger/sanitize_html"

# Clean HTML with default allowed tags
safe = Philiprehberger::SanitizeHtml.clean('<p>Hello <script>alert("xss")</script></p>')
# => "<p>Hello </p>"

Custom Allow Lists

Philiprehberger::SanitizeHtml.clean(
  '<div class="box"><span>text</span></div>',
  tags: %w[div span],
  attributes: { 'div' => %w[class] }
)
# => '<div class="box"><span>text</span></div>'

Security Profiles

# :strict - removes all tags
Philiprehberger::SanitizeHtml.clean('<p>Hello <b>world</b></p>', profile: :strict)
# => "Hello world"

# :moderate - basic formatting (p, br, strong, em, b, i, u, lists, blockquote)
Philiprehberger::SanitizeHtml.clean('<p>Hello <b>world</b></p>', profile: :moderate)
# => "<p>Hello <b>world</b></p>"

# :permissive - most safe tags (formatting, links, images, tables, divs, spans)
Philiprehberger::SanitizeHtml.clean('<div><table><tr><td>cell</td></tr></table></div>', profile: :permissive)
# => "<div><table><tr><td>cell</td></tr></table></div>"

# :markdown - code, links, formatting, headings, tables
Philiprehberger::SanitizeHtml.clean('<pre><code>puts "hi"</code></pre>', profile: :markdown)
# => '<pre><code>puts "hi"</code></pre>'

URL Protocol Sanitization

# Default: allows http, https, mailto
Philiprehberger::SanitizeHtml.clean('<a href="javascript:alert(1)">click</a>')
# => "<a>click</a>"

# Custom allowed protocols
Philiprehberger::SanitizeHtml.clean(
  '<a href="ftp://files.example.com/doc.pdf">download</a>',
  allowed_protocols: %w[http https ftp]
)
# => '<a href="ftp://files.example.com/doc.pdf">download</a>'
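
To illustrate the idea behind a protocol allow list, here is a minimal standalone sketch. It is not the gem's implementation: `protocol_allowed?` is a hypothetical helper, and treating scheme-less (relative) URLs as safe is an assumption made for this example.

```ruby
# Hypothetical helper, not part of the gem: returns true when the URL's
# scheme is in the allow list. Relative URLs (no scheme) are treated as safe.
def protocol_allowed?(url, allowed = %w[http https mailto])
  # Browsers tolerate whitespace and control characters inside a scheme
  # ("java\tscript:"), so remove them before looking at the scheme.
  cleaned = url.to_s.gsub(/[\x00-\x20]/, "")
  scheme = cleaned[/\A([a-zA-Z][a-zA-Z0-9+.\-]*):/, 1]
  return true if scheme.nil?

  allowed.include?(scheme.downcase)
end

protocol_allowed?("http://example.com")        # => true
protocol_allowed?("javascript:alert(1)")       # => false
protocol_allowed?("java\tscript:alert(1)")     # => false
protocol_allowed?("ftp://files.example.com/doc.pdf", %w[http https ftp]) # => true
```

The check is an allow list rather than a block list: any scheme that is not explicitly named is rejected, so unfamiliar schemes fail closed.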

Data URI Filtering

# Allow specific MIME types for data: URIs
Philiprehberger::SanitizeHtml.clean(
  '<a href="data:image/png;base64,abc123">image</a>',
  allowed_data_mimes: ['image/png', 'image/jpeg']
)
# => '<a href="data:image/png;base64,abc123">image</a>'
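
The MIME check can be pictured roughly as follows. This is a sketch under stated assumptions, not the gem's code: `data_mime_allowed?` is a hypothetical helper, and the empty default allow list mirrors the documented default of allowing no data: URIs.

```ruby
# Hypothetical helper, not part of the gem: extracts the MIME type from a
# data: URI and checks it against an allow list (empty by default).
def data_mime_allowed?(uri, allowed_mimes = [])
  match = uri.to_s.strip.match(/\Adata:([^;,]*)[;,]/i)
  return false unless match

  # RFC 2397: an omitted media type defaults to text/plain.
  mime = match[1].empty? ? "text/plain" : match[1].downcase
  allowed_mimes.include?(mime)
end

data_mime_allowed?("data:image/png;base64,abc123", %w[image/png image/jpeg]) # => true
data_mime_allowed?("data:text/html,<script>alert(1)</script>", %w[image/png]) # => false
data_mime_allowed?("data:image/png;base64,abc123")                            # => false
```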

CSS Sanitization

# Safe CSS properties are preserved, dangerous ones are stripped
Philiprehberger::SanitizeHtml.clean(
  '<p style="color: red; expression(alert(1))">text</p>',
  tags: %w[p],
  attributes: { 'p' => %w[style] }
)
# => '<p style="color: red">text</p>'
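
As a rough illustration of property-level filtering, the sketch below keeps only allow-listed declarations and drops values containing risky constructs. It is an assumption-laden example: `EXAMPLE_SAFE_PROPERTIES`, `UNSAFE_VALUE`, and `filter_style` are hypothetical names, and the gem's `SAFE_CSS_PROPERTIES` may contain different entries. Splitting on `;` is a simplification that does not handle semicolons inside quoted strings.

```ruby
# Hypothetical allow list for illustration only.
EXAMPLE_SAFE_PROPERTIES = %w[color background-color font-size font-weight text-align].freeze

# Value constructs that can run script or load remote resources in some browsers.
UNSAFE_VALUE = /expression\s*\(|url\s*\(|javascript:|@import/i

def filter_style(style, safe = EXAMPLE_SAFE_PROPERTIES)
  style.to_s.split(";").filter_map do |declaration|
    property, value = declaration.split(":", 2).map(&:strip)
    # Skip fragments that are not a complete "property: value" pair.
    next if property.nil? || value.nil? || value.empty?
    next unless safe.include?(property.downcase)
    next if value.match?(UNSAFE_VALUE)

    "#{property.downcase}: #{value}"
  end.join("; ")
end

filter_style("color: red; expression(alert(1))")
# => "color: red"
```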

Callback Hooks

# Custom tag processing with on_tag callback
result = Philiprehberger::SanitizeHtml.clean(
  '<a href="http://example.com">link</a>',
  on_tag: ->(tag, attrs) {
    attrs['rel'] = 'nofollow' if tag == 'a'
    attrs
  }
)

# Return nil from callback to remove a tag
result = Philiprehberger::SanitizeHtml.clean(
  '<p>Keep</p><strong>Remove</strong>',
  on_tag: ->(tag, attrs) { tag == 'strong' ? nil : attrs }
)
# => "<p>Keep</p>"
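
Because a callback is an ordinary lambda, it can be defined once and tested without running the sanitizer. The sketch below assumes the contract shown in the examples above (a tag name and a hash of attributes with string keys go in; a hash or nil comes out). `LINK_POLICY` is a hypothetical name.

```ruby
# Hypothetical reusable policy: drop <strong>, mark links as nofollow,
# and pass every other tag through with its attributes unchanged.
LINK_POLICY = lambda do |tag, attrs|
  case tag
  when "strong" then nil
  when "a" then attrs.merge("rel" => "nofollow noopener")
  else attrs
  end
end

LINK_POLICY.call("a", { "href" => "http://example.com" })
# => {"href"=>"http://example.com", "rel"=>"nofollow noopener"}
LINK_POLICY.call("strong", {})
# => nil

# Intended usage with the gem:
#   Philiprehberger::SanitizeHtml.clean(html, on_tag: LINK_POLICY)
```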

Strip All Tags

Philiprehberger::SanitizeHtml.strip('<p>Hello <strong>world</strong></p>')
# => "Hello world"

Plain Text Extraction

# strip_tags removes all HTML and decodes entities for indexing or previews
Philiprehberger::SanitizeHtml.strip_tags('<p>Tom &amp; Jerry</p>')
# => "Tom & Jerry"

# script and style content is removed entirely, matching browser behavior
Philiprehberger::SanitizeHtml.strip_tags('Hi<script>alert(1)</script> there')
# => "Hi there"

# The :text_only profile is equivalent to strip_tags
Philiprehberger::SanitizeHtml.clean('<b>hi</b>', profile: :text_only)
# => "hi"
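
The behavior described above (drop script/style content, remove tags, decode entities) can be approximated with a few substitutions. This is a regex-based sketch for illustration only, with hypothetical names `ENTITY_DECODE` and `plain_text`; it is not how the gem is implemented, and it handles only five common entities.

```ruby
# Hypothetical entity table covering only the most common entities.
ENTITY_DECODE = {
  "&amp;" => "&", "&lt;" => "<", "&gt;" => ">", "&quot;" => '"', "&#39;" => "'"
}.freeze

def plain_text(html)
  html.to_s
      .gsub(%r{<(script|style)\b[^>]*>.*?</\1\s*>}mi, "") # drop script/style with content
      .gsub(/<[^>]*>/, "")                                # drop remaining tags
      .gsub(/&(?:amp|lt|gt|quot|#39);/, ENTITY_DECODE)    # decode in a single pass
end

plain_text("<p>Tom &amp; Jerry</p>")            # => "Tom & Jerry"
plain_text("Hi<script>alert(1)</script> there") # => "Hi there"
```

Entities are decoded last and in one pass, so `&amp;lt;` becomes the literal text `&lt;` rather than being decoded twice into `<`.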

Escape HTML

Philiprehberger::SanitizeHtml.escape('<p>Hello</p>')
# => "&lt;p&gt;Hello&lt;/p&gt;"
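
Escaping differs from cleaning: nothing is removed, and markup is turned into literal text. A minimal standalone sketch of entity encoding follows; `ESCAPES` and `escape_html` are hypothetical names, and the exact set of characters the gem encodes is an assumption here.

```ruby
# Hypothetical mapping of HTML special characters to entities.
ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;"
}.freeze

def escape_html(text)
  # gsub with a hash replaces each matched character with its mapped entity.
  text.to_s.gsub(/[&<>"']/, ESCAPES)
end

escape_html("<p>Hello</p>")
# => "&lt;p&gt;Hello&lt;/p&gt;"
```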

API

| Method / Constant | Description |
|---|---|
| .clean(html, tags:, attributes:, profile:, allowed_protocols:, allowed_data_mimes:, on_tag:) | Sanitize HTML keeping only allowed tags and attributes with optional security profile, URL sanitization, data URI filtering, and callback hooks |
| .strip(html) | Remove all HTML tags, returning plain text (with entity normalization) |
| .strip_tags(html) | Convert HTML to plain text by removing all tags (including script/style content) and decoding entities; returns "" for nil or empty input |
| .escape(html) | Entity-encode all HTML special characters |
| DEFAULT_ALLOWED_TAGS | Frozen array of tag names allowed by default (p, br, strong, em, b, i, u, a, ul, ol, li, blockquote, code, pre, h1-h6) |
| DEFAULT_ALLOWED_ATTRIBUTES | Frozen hash of attributes allowed per tag (a => href, title; img => src, alt) |
| DEFAULT_ALLOWED_PROTOCOLS | Frozen array of allowed URL protocols (http, https, mailto) |
| DEFAULT_ALLOWED_DATA_MIMES | Frozen empty array of allowed data URI MIME types (none by default) |
| SAFE_CSS_PROPERTIES | Frozen array of CSS property names considered safe for style attributes |
| PROFILES | Frozen hash of predefined security profiles (:strict, :moderate, :permissive, :markdown, :text_only) |
| DANGEROUS_TAGS | Frozen array of tags always removed with their content (script, style, iframe) |
| EVENT_ATTRIBUTE_PATTERN | Regex matching event-handler attributes (e.g. onclick, onload) that are always stripped |
| Error | Base error class for the module (Philiprehberger::SanitizeHtml::Error) |
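
To show what matching event-handler attributes can look like, here is a small sketch. The pattern and helper are hypothetical (`EXAMPLE_EVENT_PATTERN`, `strip_event_attributes`); the gem's actual `EVENT_ATTRIBUTE_PATTERN` may be defined differently.

```ruby
# Hypothetical pattern: any attribute name of the form "on" + letters.
EXAMPLE_EVENT_PATTERN = /\Aon[a-z]+\z/i

# Returns a copy of the attribute hash without event-handler attributes.
def strip_event_attributes(attrs)
  attrs.reject { |name, _value| name.to_s.match?(EXAMPLE_EVENT_PATTERN) }
end

strip_event_attributes({ "href" => "/x", "onclick" => "evil()", "ONLOAD" => "x" })
# => {"href"=>"/x"}
```

A pattern this broad also matches names that are not real event handlers; erring on that side is the safer default for untrusted input.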

Development

bundle install
bundle exec rspec
bundle exec rubocop

Support

If you find this project useful:

Star the repo

🐛 Report issues

💡 Suggest features

❤️ Sponsor development

🌐 All Open Source Projects

💻 GitHub Profile

🔗 LinkedIn Profile

License

MIT