Class: Markbridge::Renderers::Discourse::MarkdownEscaper
- Inherits: Object
- Defined in: lib/markbridge/renderers/discourse/markdown_escaper.rb
Overview
Escapes text to prevent interpretation as Markdown formatting.
Design principles:
- No false negatives: all potentially special sequences MUST be escaped
- False positives OK: over-escaping is acceptable for safety
- Autolinks preserved: <https://…>, <mailto:…>, and <email@domain> remain functional
- HTML escaped: tags, processing instructions, and SGML declarations are neutralized
- Performance: minimal allocations, byte-level processing, early returns
- Discourse-compatible: handles ndash conversion and ordered list numbers of any length
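The split between preserved autolinks and escaped HTML comes down to deciding what a `<` starts. The sketch below shows one way to make that decision with the AUTOLINK, HTML_TAG, and HTML_TAG_START patterns from the Constant Summary, assuming autolinks are checked first. `classify_lt` is a hypothetical helper, not part of MarkdownEscaper's API.

```ruby
# Patterns copied from the Constant Summary of MarkdownEscaper.
HTML_ATTR = /(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'=<>`]+))?)/
HTML_TAG = %r{\A</?[a-zA-Z][a-zA-Z0-9-]*#{HTML_ATTR}*\s*/?>}
AUTOLINK = %r{\A<(?:https?://|mailto:)[^>\s]*>|\A<[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*>}i
HTML_TAG_START = %r{\A<(?:[?!]|/?\s*[a-zA-Z][a-zA-Z0-9-]*(?:[\s/]|$))}

# Hypothetical helper: classifies text that begins at a "<".
# Autolinks are tested first so they are never mistaken for tags (assumed order).
def classify_lt(fragment)
  return :autolink if AUTOLINK.match?(fragment)
  return :html if HTML_TAG.match?(fragment) || HTML_TAG_START.match?(fragment)

  :text
end

p classify_lt("<https://example.com>") # autolink: kept as-is
p classify_lt('<div class="a>b">')     # complete tag, quoted ">" handled by HTML_ATTR
p classify_lt("<div")                  # incomplete tag at end of line
p classify_lt("<3 you")                # not HTML at all
```

Under this scheme only the `:html` case would have its `<` escaped; `:autolink` fragments pass through untouched, matching the "Autolinks preserved" principle.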
Optimized for Ruby 3.3+ with YJIT. Key optimizations:
- Fast path returns the original string for plain text (no allocations)
- Pre-allocated result buffers with estimated capacity
- Byte-level processing for inline escaping (YJIT-friendly tight loops)
- Simplified escaping rules: escaping [ already breaks link syntax, so ] doesn't need escaping
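The optimizations above can be illustrated with a deliberately simplified inline escaper, assuming that "escaping" means prefixing a backslash to every byte in INLINE_SPECIAL. `escape_inline_sketch` and `ESCAPABLE` are hypothetical names, and the 25% buffer headroom is a guess; this is not MarkdownEscaper's implementation.

```ruby
# Copied from the Constant Summary of MarkdownEscaper.
INLINE_SPECIAL = /[\\*_`\[!|<&~-]/

# Hypothetical 256-entry lookup table: true for bytes this sketch escapes.
ESCAPABLE = Array.new(256, false)
"\\*_`[!|<&~-".each_byte { |b| ESCAPABLE[b] = true }
ESCAPABLE.freeze

# Hypothetical, simplified inline escaper.
def escape_inline_sketch(text)
  # Fast path: nothing special, so return the same object without allocating.
  return text unless INLINE_SPECIAL.match?(text)

  len = text.bytesize
  # Pre-size the buffer (headroom is a guess; the string still grows if needed).
  out = String.new(capacity: len + (len >> 2), encoding: text.encoding)
  run_start = 0
  i = 0
  while i < len
    if ESCAPABLE[text.getbyte(i)]
      # Copy the untouched run, then a backslash; the special byte starts the next run.
      out << text.byteslice(run_start, i - run_start) << "\\"
      run_start = i
    end
    i += 1
  end
  out << text.byteslice(run_start, len - run_start)
end

p escape_inline_sketch("[a](b)") # only the opening [ is escaped
```

Because every escapable byte is ASCII, run boundaries never split a UTF-8 character, so copying runs with `byteslice` is safe. Unlike the class documented here, this sketch escapes every `<` (breaking autolinks) and ignores line-start constructs such as headings.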
Constant Summary
- MAYBE_SPECIAL =
  Fast-path check: any character that might need escaping. Only includes characters we actually escape (], {, }, and ^ were removed). > is needed for blockquote detection at line start.
  /[\\`*_\[#+\-.!<>&|~=>)]/
- MAYBE_INDENTED_CODE =
  Checks for indented code on any line. Matches 4+ spaces, a tab, or space+tab combinations that reach column 4+.
  /(?:^|\n)(?: {4}|\t| {1,3}\t)/
- ATX_HEADING =
  Block-level patterns.
  /\A\#{1,6}(?=[ \t]|$)/
- BLOCK_QUOTE =
  /\A>/
- BULLET_LIST =
  List markers followed by a space, tab, or end of line.
  /\A[-+*](?=[ \t]|$)/
- ORDERED_LIST =
  /\A(\d+)([.)])(?=[ \t])/
- THEMATIC_BREAK_DASH =
  /\A(?:-[ \t]*){3,}$/
- THEMATIC_BREAK_STAR =
  /\A(?:\*[ \t]*){3,}$/
- THEMATIC_BREAK_UNDERSCORE =
  /\A(?:_[ \t]*){3,}$/
- FENCED_CODE_BACKTICK =
  /\A`{3,}[^`]*$/
- FENCED_CODE_TILDE =
  /\A~{3,}/
- SETEXT_UNDERLINE_EQUALS =
  /\A=+[ \t]*$/
- SETEXT_UNDERLINE_DASH =
  /\A-+[ \t]*$/
- INDENTED_CODE =
  Indented code: 4+ spaces, a tab at the start, or space+tab reaching column 4+.
  /\A(?: {4}|\t| {1,3}\t)/
- INLINE_SPECIAL =
  Inline quick-check pattern (includes < for HTML tag escaping).
  /[\\*_`\[!|<&~-]/
- ENTITY_REF =
  Entity reference pattern (we escape these to prevent conversion).
  /\A&(?:\#[xX][0-9a-fA-F]{1,6}|\#[0-9]{1,7}|[a-zA-Z][a-zA-Z0-9]{0,31});/
- HTML_ATTR =
  HTML tag pattern (we escape these, but NOT autolinks). Handles quoted attributes, which can contain > characters. Attribute patterns: name="value" | name='value' | name=value | name
  /(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'=<>`]+))?)/
- HTML_TAG =
  %r{\A</?[a-zA-Z][a-zA-Z0-9-]*#{HTML_ATTR}*\s*/?>}
- AUTOLINK =
  Autolink pattern: we pass these through entirely unchanged. Matches <http://…>, <https://…>, <mailto:…>, and email addresses.
  %r{\A<(?:https?://|mailto:)[^>\s]*>|\A<[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*>}i
- HTML_TAG_START =
  Matches HTML-like constructs that need escaping:
  - Processing instructions: <?php, <?xml, etc.
  - SGML declarations: <!DOCTYPE, <!ELEMENT, <![CDATA[, <!--, etc.
  - Incomplete or multi-line HTML tags: <div followed by attributes on the next line
  - Custom elements: <my-component>, <responsive-image>
  The (?:[\s/]|$) ensures we don't match comparisons like "a < b".
  %r{\A<(?:[?!]|/?\s*[a-zA-Z][a-zA-Z0-9-]*(?:[\s/]|$))}
- BACKSLASH =
  Byte constants for inline processing.
  92 (\)
- BANG =
  33 (!)
- HASH =
  35 (#)
- AMP =
  38 (&)
- STAR =
  42 (*)
- PLUS =
  43 (+)
- DASH =
  45 (-)
- LT =
  60 (<)
- EQUALS =
  61 (=)
- GT =
  62 (>)
- BRACKET_OPEN =
  91 ([)
- UNDERSCORE =
  95 (_)
- BACKTICK =
  96 (`)
- PIPE =
  124 (|)
- TILDE =
  126 (~)
- SPACE =
  32 (space)
- TAB =
  9 (tab)
- DIGIT_0 =
  48 (0)
- DIGIT_9 =
  57 (9)
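The block-level patterns in the Constant Summary are anchored with \A, which suggests they are tested against the start of each line. A minimal sketch of line-start escaping under that assumption follows. `escape_line_start` and `PREFIX_ESCAPED` are hypothetical, and escaping an ordered list's delimiter (rather than its first digit) is an assumed design choice. INDENTED_CODE is left out: a backslash before whitespace is not an escape in CommonMark and would render literally.

```ruby
# Block-level patterns copied from the Constant Summary of MarkdownEscaper.
ATX_HEADING               = /\A\#{1,6}(?=[ \t]|$)/
BLOCK_QUOTE               = /\A>/
BULLET_LIST               = /\A[-+*](?=[ \t]|$)/
ORDERED_LIST              = /\A(\d+)([.)])(?=[ \t])/
THEMATIC_BREAK_DASH       = /\A(?:-[ \t]*){3,}$/
THEMATIC_BREAK_STAR       = /\A(?:\*[ \t]*){3,}$/
THEMATIC_BREAK_UNDERSCORE = /\A(?:_[ \t]*){3,}$/
FENCED_CODE_BACKTICK      = /\A`{3,}[^`]*$/
FENCED_CODE_TILDE         = /\A~{3,}/
SETEXT_UNDERLINE_EQUALS   = /\A=+[ \t]*$/
SETEXT_UNDERLINE_DASH     = /\A-+[ \t]*$/

# Hypothetical grouping: constructs neutralized by a leading backslash.
PREFIX_ESCAPED = [
  ATX_HEADING, BLOCK_QUOTE, BULLET_LIST,
  THEMATIC_BREAK_DASH, THEMATIC_BREAK_STAR, THEMATIC_BREAK_UNDERSCORE,
  FENCED_CODE_BACKTICK, FENCED_CODE_TILDE,
  SETEXT_UNDERLINE_EQUALS, SETEXT_UNDERLINE_DASH
].freeze

# Hypothetical helper: escapes a block construct at the start of one line.
def escape_line_start(line)
  if (m = ORDERED_LIST.match(line))
    # "12. x" -> "12\. x": escaping the delimiter keeps the number readable.
    return "#{m[1]}\\#{m[2]}#{m.post_match}"
  end
  return "\\#{line}" if PREFIX_ESCAPED.any? { |re| re.match?(line) }

  line
end

p escape_line_start("# Title")
p escape_line_start("#hashtag") # no space after #, so not a heading
```

Note that `(\d+)` in ORDERED_LIST also matches numbers longer than the nine digits CommonMark allows for a list marker, which errs toward over-escaping in line with the "false positives OK" principle.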
Instance Method Summary
- #escape(text) ⇒ String
  Escapes markdown special characters in the given text.
- #initialize(escape_hard_line_breaks: false) ⇒ MarkdownEscaper
  Constructor. Returns a new instance of MarkdownEscaper.
Constructor Details
#initialize(escape_hard_line_breaks: false) ⇒ MarkdownEscaper
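Returns a new instance of MarkdownEscaper. When escape_hard_line_breaks is true, #escape removes trailing spaces before each newline so they cannot render as hard line breaks.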
# File 'lib/markbridge/renderers/discourse/markdown_escaper.rb', line 37
def initialize(escape_hard_line_breaks: false)
  @escape_hard_line_breaks = escape_hard_line_breaks
  @inline_content = nil
  @inline_result = nil
  @inline_len = 0
end
Instance Method Details
#escape(text) ⇒ String
Escapes markdown special characters in the given text.
Handles both block-level constructs (headings, lists, code blocks, HTML blocks) and inline formatting (emphasis, code spans, links, inline HTML). Autolinks (<https://…>, <email@domain>) are intentionally preserved.
Note: Multi-line HTML tags and blocks are handled by escaping the opening <.
# File 'lib/markbridge/renderers/discourse/markdown_escaper.rb', line 124
def escape(text)
  return "".freeze if text.nil?
  return text if text.empty?

  # Neutralize hard line breaks (trailing 2+ spaces before newline)
  text = text.gsub(/ +\n/, "\n") if @escape_hard_line_breaks && text.include?(" \n")

  return text unless MAYBE_SPECIAL.match?(text) || MAYBE_INDENTED_CODE.match?(text)

  escape_text(text)
end
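To see which inputs take the early exits, the two checks at the top of #escape can be exercised on their own. `neutralize_hard_breaks` and `needs_escaping?` are hypothetical helpers that copy the expressions from the method body; they are not part of MarkdownEscaper.

```ruby
# Fast-path patterns copied from the Constant Summary of MarkdownEscaper.
MAYBE_SPECIAL = /[\\`*_\[#+\-.!<>&|~=>)]/
MAYBE_INDENTED_CODE = /(?:^|\n)(?: {4}|\t| {1,3}\t)/

# Hypothetical helper mirroring the hard-line-break step of #escape.
# Returns the input object itself when there is nothing to strip.
def neutralize_hard_breaks(text)
  text.include?(" \n") ? text.gsub(/ +\n/, "\n") : text
end

# Hypothetical helper mirroring the fast-path test of #escape:
# false means #escape would return its input unchanged.
def needs_escaping?(text)
  MAYBE_SPECIAL.match?(text) || MAYBE_INDENTED_CODE.match?(text)
end

p neutralize_hard_breaks("one  \ntwo")
p needs_escaping?("Hello world")
p needs_escaping?("Hello world.")
```

A trailing period is enough to leave the fast path because . is in MAYBE_SPECIAL, presumably for ordered-list markers such as "1."; this is the "false positives OK" principle at work.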