Commonmarker
Ruby wrapper for Rust's comrak crate.
It passes all of the CommonMark test suite, and is therefore spec-complete. It also includes extensions to the CommonMark spec as documented in the GitHub Flavored Markdown spec, such as support for tables, strikethroughs, and autolinking.
[!NOTE] By default, the following extensions are enabled for end user convenience:
strikethrough, tagfilter, table, autolink, tasklist (all from the GFM spec), and shortcodes. The syntax_highlighter plugin is also enabled by default, using the "base16-ocean.dark" theme. For more information on the available options and extensions, see the documentation below.
Installation
Add this line to your application's Gemfile:
gem 'commonmarker'
And then execute:
$ bundle
Or install it yourself as:
$ gem install commonmarker
Usage
This gem expects to receive UTF-8 strings. Ensure your strings are the right encoding before passing them into Commonmarker.
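One way to meet this requirement is to normalize strings before handing them over. The sketch below uses only Ruby's built-in String#encode and String#scrub; the helper name to_utf8 is hypothetical and is not part of Commonmarker.

```ruby
# Hypothetical helper (not part of Commonmarker): returns a valid UTF-8 copy
# of the given string.
def to_utf8(text)
  # Transcode from the string's current encoding, replacing anything that
  # cannot be represented in UTF-8.
  utf8 = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  # encode may leave a string that is already tagged UTF-8 untouched, so
  # scrub as well to replace any invalid byte sequences.
  utf8.scrub
end

# "caf\xE9" is "café" in ISO-8859-1
latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)
markdown = to_utf8(latin1)

markdown.encoding == Encoding::UTF_8 # => true
markdown.valid_encoding?             # => true
```

The result can then be passed to Commonmarker.to_html or Commonmarker.parse as usual.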
Converting to HTML
Call to_html on a string to convert it to HTML:
require 'commonmarker'
Commonmarker.to_html('"Hi *there*"', options: {
  parse: { smart: true }
})
# => <p>“Hi <em>there</em>”</p>\n
(The second argument is optional--see below for more information.)
Generating a document
You can also parse a string to receive a :document node. You can then print that node to HTML, iterate over the children, and do other fun node stuff. For example:
require 'commonmarker'
doc = Commonmarker.parse("*Hello* world", options: {
  parse: { smart: true }
})
puts(doc.to_html) # => <p><em>Hello</em> world</p>\n
doc.walk do |node|
  puts node.type # => [:document, :paragraph, :emph, :text, :text]
end
(The second argument is optional--see below for more information.)
When it comes to modifying the document, you can perform the following operations:
insert_before, insert_after, prepend_child, append_child, delete
You can also get the source position of a node by calling source_position:
doc = Commonmarker.parse("*Hello* world")
puts doc.first_child.first_child.source_position
# => {:start_line=>1, :start_column=>1, :end_line=>1, :end_column=>7}
You can also modify the following attributes:
url, title, header_level, list_type, list_start, list_tight, fence_info
Example: Walking the AST
You can use walk or each to iterate over nodes:
walk will iterate on a node and recursively iterate on a node's children.
each will iterate on a node's direct children, but no further.
require 'commonmarker'
# parse some string
doc = Commonmarker.parse("# The site\n\n [GitHub](https://www.github.com)")
# Walk tree and print out URLs for links
doc.walk do |node|
  if node.type == :link
    printf("URL = %s\n", node.url)
  end
end
# => URL = https://www.github.com
# Transform links to regular text
doc.walk do |node|
  if node.type == :link
    node.insert_before(node.first_child)
    node.delete
  end
end
# => <h1><a href=\"#the-site\"></a>The site</h1>\n<p>GitHub</p>\n
Example: Converting a document back into raw CommonMark
You can use to_commonmark on a node to render it as raw text:
require 'commonmarker'
# parse some string
doc = Commonmarker.parse("# The site\n\n [GitHub](https://www.github.com)")
# Transform links to regular text
doc.walk do |node|
  if node.type == :link
    node.insert_before(node.first_child)
    node.delete
  end
end
doc.to_commonmark
# => # The site\n\nGitHub\n
Options and plugins
Options
Commonmarker accepts the same parse, render, and extension options that comrak does, as a hash with symbol keys:
Commonmarker.to_html('"Hi *there*"', options: {
  parse: { smart: true },
  render: { hardbreaks: false }
})
Note that comrak distinguishes between "parse" options and "render" options, which are listed in separate tables below. To disable any non-boolean option, pass in nil.
Parse options
| Name | Description | Default |
|---|---|---|
| smart | Punctuation (quotes, full-stops and hyphens) is converted into 'smart' punctuation. | false |
| default_info_string | The default info string for fenced code blocks. | "" |
| relaxed_tasklist_matching | Enables relaxing of the tasklist extension matching, allowing any non-space to be used for the "checked" state instead of only x and X. | false |
| relaxed_autolinks | Enable relaxing of the autolink extension parsing, allowing links to be recognized when in brackets, as well as permitting any url scheme. | false |
| leave_footnote_definitions | Allow footnote definitions to remain in their original positions instead of being moved to the document's end (only affects AST). | false |
| ignore_setext | Ignores setext-style headings. | false |
| sourcepos_chars | Use character-based column tracking in source positions instead of byte-based. Relevant for multi-byte UTF-8 documents with sourcepos. | false |
Render options
| Name | Description | Default |
|---|---|---|
| hardbreaks | Soft line breaks translate into hard line breaks. | true |
| github_pre_lang | GitHub-style <pre lang="xyz"> is used for fenced code blocks with info tags. | true |
| full_info_string | Gives info string data after a space in a data-meta attribute on code blocks. | false |
| width | The wrap column when outputting CommonMark. | 80 |
| unsafe | Allow rendering of raw HTML and potentially dangerous links. | false |
| escape | Escape raw HTML instead of clobbering it. | false |
| sourcepos | Include source position attribute in HTML and XML output. | false |
| escaped_char_spans | Wrap escaped characters in span tags. | true |
| ignore_empty_links | Ignores empty links, leaving the Markdown text in place. | false |
| gfm_quirks | Outputs HTML with GFM-style quirks; namely, not nesting <strong> inlines. | false |
| prefer_fenced | Always output fenced code blocks, even where an indented one could be used. | false |
| tasklist_classes | Add CSS classes to the HTML output of the tasklist extension. | false |
| compact_html | Suppress newlines in pretty-printed HTML output. | false |
As well, there are several extensions which you can toggle in the same manner:
Commonmarker.to_html('"Hi *there*"', options: {
  extension: { footnotes: true, description_lists: true },
  render: { hardbreaks: false }
})
Extension options
| Name | Description | Default |
|---|---|---|
| strikethrough | Enables the strikethrough extension from the GFM spec. | true |
| tagfilter | Enables the tagfilter extension from the GFM spec. | true |
| table | Enables the table extension from the GFM spec. | true |
| autolink | Enables the autolink extension from the GFM spec. | true |
| tasklist | Enables the task list extension from the GFM spec. | true |
| superscript | Enables the superscript Comrak extension. | false |
| header_ids | Enables the header IDs Comrak extension. | "" |
| header_id_prefix_in_href | Also add the prefix to generated href attributes pointing to headers. | false |
| footnotes | Enables the footnotes extension per cmark-gfm. | false |
| inline_footnotes | Enables the inline footnotes extension. | false |
| description_lists | Enables the description lists extension. | false |
| front_matter_delimiter | Enables the front matter extension. | "" |
| multiline_block_quotes | Enables the multiline block quotes extension. | false |
| math_dollars, math_code | Enables the math extension. | false |
| shortcodes | Enables the shortcodes extension. | true |
| wikilinks_title_before_pipe | Enables the wikilinks extension, placing the title before the dividing pipe. | false |
| wikilinks_title_after_pipe | Enables the wikilinks extension, placing the title after the dividing pipe. | false |
| underline | Enables the underline extension. | false |
| spoiler | Enables the spoiler extension. | false |
| greentext | Enables the greentext extension. | false |
| subtext | Enables the subtext extension. | false |
| subscript | Enables the subscript extension. | false |
| alerts | Enables the alerts extension. | false |
| cjk_friendly_emphasis | Enables the CJK friendly emphasis extension. | false |
| highlight | Enables highlighting via == | false |
| insert | Enables the insert extension, rendering ++text++ as <ins>text</ins>. | false |
| block_directive | Enables the block directive extension. | false |
For more information on these options, see the comrak documentation.
Plugins
In addition to the possibilities provided by generic CommonMark rendering, Commonmarker also supports plugins as a means of providing further niceties.
Syntax Highlighter Plugin
The syntax highlighter plugin is enabled by default, using the "base16-ocean.dark" theme. It applies syntax highlighting to fenced code blocks that specify a language.
The library comes with a set of pre-existing themes for highlighting code:
"base16-ocean.dark", "base16-eighties.dark", "base16-mocha.dark", "base16-ocean.light", "InspiredGitHub", "Solarized (dark)", "Solarized (light)"
code = <<~CODE
```ruby
def hello
puts "hello"
end
```
CODE
# pass in a theme name from a pre-existing set
puts Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "InspiredGitHub" } })
# <pre style="background-color:#ffffff;" lang="ruby"><code>
# <span style="font-weight:bold;color:#a71d5d;">def </span><span style="font-weight:bold;color:#795da3;">hello
# </span><span style="color:#62a35c;">puts </span><span style="color:#183691;">"hello"
# </span><span style="font-weight:bold;color:#a71d5d;">end
# </span>
# </code></pre>
To disable this plugin, set the value to nil:
code = <<~CODE
```ruby
def hello
puts "hello"
end
```
CODE
Commonmarker.to_html(code, plugins: { syntax_highlighter: nil })
# <pre lang="ruby"><code>def hello
# puts "hello"
# end
# </code></pre>
To output CSS classes instead of style attributes, set the theme key to "":
code = <<~CODE
  ```ruby
  def hello
    puts "hello"
  end
  ```
CODE
Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "" } })
# <pre class="syntax-highlighting"><code><span class="source ruby"><span class="meta function ruby"><span class="keyword control def ruby">def</span></span><span class="meta function ruby">
# <span class="entity name function ruby">hello</span></span>
# <span class="support function builtin ruby">puts</span> <span class="string quoted double ruby"><span class="punctuation definition string begin ruby">"</span>hello<span class="punctuation definition string end ruby">"</span></span>
# <span class="keyword control ruby">end</span>\n</span></code></pre>
To use a custom theme, you can provide a path to a directory containing .tmtheme files to load:
Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "Monokai", path: "./themes" } })
Output formats
Commonmarker can currently only generate output in one format: HTML.
HTML
puts Commonmarker.to_html('*Hello* world!')
# <p><em>Hello</em> world!</p>
Developing locally
After cloning the repo:
script/bootstrap
bundle exec rake compile
If there were no errors, you're done! Otherwise, make sure to follow the comrak dependency instructions.
Benchmarks
❯ bundle exec rake benchmark
input size = 11064832 bytes
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
Markly.render_html 1.000 i/100ms
Markly::Node#to_html 1.000 i/100ms
Commonmarker.to_html 1.000 i/100ms
Commonmarker::Node.to_html
1.000 i/100ms
Kramdown::Document#to_html
1.000 i/100ms
Calculating -------------------------------------
Markly.render_html 15.606 (±25.6%) i/s - 71.000 in 5.047132s
Markly::Node#to_html 15.692 (±25.5%) i/s - 72.000 in 5.095810s
Commonmarker.to_html 4.482 (± 0.0%) i/s - 23.000 in 5.137680s
Commonmarker::Node.to_html
5.092 (±19.6%) i/s - 25.000 in 5.072220s
Kramdown::Document#to_html
0.379 (± 0.0%) i/s - 2.000 in 5.277770s
Comparison:
Markly::Node#to_html: 15.7 i/s
Markly.render_html: 15.6 i/s - same-ish: difference falls within error
Commonmarker::Node.to_html: 5.1 i/s - 3.08x slower
Commonmarker.to_html: 4.5 i/s - 3.50x slower
Kramdown::Document#to_html: 0.4 i/s - 41.40x slower
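The "x slower" figures in the comparison appear to be each entry's iterations per second divided into the fastest entry's rate. The sketch below reproduces that arithmetic from the numbers reported above, using only plain Ruby; it is not how the benchmark task itself is implemented.

```ruby
# Iterations per second, copied from the "Calculating" section above
ips = {
  "Markly::Node#to_html"       => 15.692,
  "Commonmarker::Node.to_html" => 5.092,
  "Commonmarker.to_html"       => 4.482,
  "Kramdown::Document#to_html" => 0.379
}

fastest = ips.values.max

# Slowdown factor of each entry relative to the fastest one
slowdown = ips.transform_values { |rate| (fastest / rate).round(2) }

slowdown.each { |name, factor| puts format("%-28s %.2fx", name, factor) }
```

For example, 15.692 / 5.092 rounds to 3.08, matching the figure reported for Commonmarker::Node.to_html.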