gfm_to_blockkit

AST-based converter from GitHub-Flavored Markdown (GFM) to Slack Block Kit blocks.

Unlike regex-based converters, gfm_to_blockkit uses commonmarker (which wraps the comrak Rust crate) to parse GFM into an abstract syntax tree, then walks that tree to produce structured Block Kit JSON ready for the Slack API.

Installation

Add to your Gemfile:

gem "gfm_to_blockkit"

Then run bundle install.

Usage

require "gfm_to_blockkit"

markdown = <<~MD
  # Hello World

  This is **bold** and *italic* text with a [link](https://example.com).

  - Item 1
  - Item 2
    - Nested item

  ```ruby
  puts "hello"
  ```
MD

blocks = GfmToBlockkit.convert(markdown)


The result is an array of Block Kit block hashes (with symbol keys) that you can pass directly to the Slack API:

client.chat_postMessage(
  channel: "#general",
  blocks: GfmToBlockkit.convert(markdown),
  text: "Fallback text"
)

Table format option

Tables can be rendered as ASCII art in a preformatted block (default) or as native Slack table blocks:

# Default: ASCII table in rich_text_preformatted
blocks = GfmToBlockkit.convert(markdown)

# Native Slack table block (limited availability)
blocks = GfmToBlockkit.convert(markdown, table_format: :native)

How it works

This section walks through the architecture and conversion pipeline so you can reason about how any given markdown input becomes Block Kit output.

Why AST-based?

GFM and Slack's mrkdwn look superficially similar, but their syntax diverges in enough places that regex-based transliteration breaks down on anything non-trivial. Bold is **text** in GFM but *text* in mrkdwn. Italic is *text* in GFM but _text_ in mrkdwn. That means a naive s/\*\*/\*/g pass will mangle any document that uses both bold and italic. Nesting, escaping, and context-sensitivity make it worse.

By parsing into an AST first, we sidestep all of this. The parser resolves ambiguity for us--we get a tree of typed nodes (:strong, :emph, :link, etc.) and can emit the correct Slack syntax for each one without worrying about what the original markdown characters were.

The pipeline

GFM string
    |
    v
Commonmarker.parse()  -->  AST (document node with block-level children)
    |
    v
Converter walks top-level nodes, looks up handler via registry
    |
    +-- :paragraph        -->  Converters::Paragraph   -->  section block(s)
    +-- :heading          -->  Converters::Heading     -->  header block
    +-- :list             -->  Converters::List        -->  rich_text block with rich_text_list elements
    +-- :code_block       -->  Converters::CodeBlock   -->  rich_text block with rich_text_preformatted
    +-- :block_quote      -->  Converters::BlockQuote  -->  rich_text block with rich_text_quote
    +-- :table            -->  Converters::Table       -->  rich_text block or native table block
    +-- :thematic_break   -->  Converters::ThematicBreak  -->  divider block
    +-- ...
    |
    v
Array of Block Kit block hashes

The entry point is GfmToBlockkit.convert(markdown), which instantiates a Converter and calls it. The converter builds an immutable Context (a Data.define class) holding the two inline renderers and the table_format option. It then parses the markdown with all GFM extensions enabled (tables, strikethrough, autolinks, task lists, footnotes) and iterates over the document's top-level children. Each child is a block-level AST node: a paragraph, heading, list, code block, and so on. The converter dispatches each node by looking up the appropriate converter class in a registry on Converters::Base and passing the Context to that class's constructor. Each converter class registers itself for the node types it handles by calling handles :node_type, and implements a convert(node) method that returns an array of one or more Block Kit blocks.
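The registry and dispatch flow just described can be sketched in a few lines of plain Ruby. This is an illustrative sketch, not the gem's source: Context, Converters::Base, handles, and converter_for take their names from this README, but the method bodies are assumptions, FakeNode is a hypothetical stand-in for a commonmarker node, and silently skipping unregistered node types is an assumed policy.

```ruby
# Sketch of registry-based dispatch. Names follow the README; the bodies are
# assumptions, not the gem's actual code.

# Immutable per-conversion settings (Data.define needs Ruby >= 3.2).
Context = Data.define(:mrkdwn, :rich_text, :table_format)

# Hypothetical stand-in for a commonmarker AST node.
FakeNode = Struct.new(:type, :text)

module Converters
  class Base
    REGISTRY = {}

    # Called in a subclass body to claim one or more node types.
    def self.handles(*types)
      types.each { |type| REGISTRY[type] = self }
    end

    def self.converter_for(type)
      REGISTRY[type]
    end

    def initialize(context)
      @context = context
    end
  end

  class Paragraph < Base
    handles :paragraph

    def convert(node)
      [{ type: "section", text: { type: "mrkdwn", text: node.text } }]
    end
  end

  class ThematicBreak < Base
    handles :thematic_break

    def convert(_node)
      [{ type: "divider" }]
    end
  end
end

# Walk top-level nodes; each converter returns an array of blocks.
# Assumption: node types with no registered converter are skipped.
def dispatch(nodes, context)
  nodes.flat_map do |node|
    klass = Converters::Base.converter_for(node.type)
    klass ? klass.new(context).convert(node) : []
  end
end

context = Context.new(mrkdwn: nil, rich_text: nil, table_format: :native)
blocks = dispatch([FakeNode.new(:paragraph, "Hello"), FakeNode.new(:thematic_break, nil)], context)
```

Because each converter registers itself, supporting another node type in this sketch means adding one class; the dispatcher itself never changes.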

The hybrid block strategy

Slack's Block Kit has two main ways to display formatted text:

  1. Section blocks with mrkdwn text: a block shaped like { type: "section", text: { type: "mrkdwn", text: "..." } }, whose text string uses Slack's mrkdwn syntax (*bold*, _italic_, <url|text>, etc.). Simple and clean, but limited to flat text: there is no structural support for nested lists, code with language hints, or semantic quotes.

  2. Rich text blocks: a block shaped like { type: "rich_text", elements: [...] } containing structured sub-elements such as rich_text_list, rich_text_preformatted, and rich_text_quote. These render structured content much better but are more verbose.

The gem uses a hybrid approach: section blocks for paragraphs, rich text blocks for everything structural. This means simple text gets the clean, lightweight section treatment, while lists, code blocks, and blockquotes get the richer rendering they need.

The practical consequence is that the gem has two inline rendering paths, since the format of inline content (bold, italic, links, etc.) differs depending on which block type it's going into.

Two inline renderers

This is the key architectural decision. When the converter encounters inline content (bold text, links, code spans, etc.), it needs to render that content differently depending on the destination block type:

Renderers::Mrkdwn produces a string in Slack's mrkdwn syntax. It's used for content going into section blocks. It recursively walks the inline children of a node, wrapping text in the appropriate mrkdwn delimiters:

AST: paragraph > strong > text("hello")
                          |
MrkdwnRenderer walks:     strong wraps children in *...*
                          text returns escaped "hello"
                          |
Output string:            "*hello*"

The renderer maps GFM constructs to their mrkdwn equivalents: **bold** becomes *bold*, *italic* becomes _italic_, ~~strike~~ becomes ~strike~, [text](url) becomes <url|text>. It also escapes &, <, and > in text content (but not inside URLs) to prevent Slack from misinterpreting them.
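Here is a minimal sketch of that recursive walk, assuming a simplified node shape. MdNode is a hypothetical stand-in for commonmarker's node class (carrying only type, children, text, and url), and MrkdwnSketch is illustrative, not the gem's Renderers::Mrkdwn.

```ruby
# Sketch of a recursive mrkdwn renderer in the spirit of Renderers::Mrkdwn.
# MdNode is a hypothetical stand-in for a commonmarker node.
MdNode = Struct.new(:type, :children, :text, :url, keyword_init: true) do
  def self.leaf(type, text) = new(type: type, children: [], text: text)
  def self.wrap(type, *children, url: nil) = new(type: type, children: children, url: url)
end

module MrkdwnSketch
  DELIMITERS = { strong: "*", emph: "_", strikethrough: "~" }.freeze

  module_function

  # Slack treats &, < and > as control characters in mrkdwn text.
  def escape(text)
    text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
  end

  def render(node)
    case node.type
    when :text then escape(node.text)
    when :code then "`#{escape(node.text)}`"
    # The URL is emitted unescaped; the label is rendered recursively.
    when :link then "<#{node.url}|#{render_children(node)}>"
    when *DELIMITERS.keys
      delimiter = DELIMITERS[node.type]
      "#{delimiter}#{render_children(node)}#{delimiter}"
    else
      render_children(node) # paragraph and any container not handled above
    end
  end

  def render_children(node)
    node.children.map { |child| render(child) }.join
  end
end
```

Nesting falls out of the recursion: a strong node wrapping an emph node yields *_text_* without any special case.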

Renderers::RichText produces an array of element hashes for use inside rich text blocks. Instead of wrapping text in delimiter characters, it creates structured objects with explicit style flags:

AST: paragraph > strong > emph > text("hello")
                                 |
RichTextRenderer walks:          strong pushes {bold: true} onto style stack
                                 emph pushes {italic: true}
                                 text emits element with merged style
                                 |
Output element:                  { type: "text", text: "hello", style: { bold: true, italic: true } }

The key mechanism here is the style stack--an immutable hash that accumulates active styles as the renderer descends through formatting nodes. When it hits a :strong node, it merges { bold: true } into the stack and recurses into the children. When it hits a :emph, it merges { italic: true }. When it finally reaches a :text leaf, it emits a text element with whatever styles have accumulated. This naturally handles arbitrary nesting depths without any special-casing.

Links become { type: "link", url: "...", text: "..." } elements (with the current style stack applied). Inline images become link elements as a fallback, since Slack's rich text elements have no inline image type.
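The style stack can be sketched the same way. RtNode and RichTextSketch are hypothetical names, not the gem's classes; the style keys (bold, italic, strike, code) are the ones Slack's rich text elements accept, and treating a link's label as plain-text children is a simplification.

```ruby
# Sketch of style-stack rendering in the spirit of Renderers::RichText.
# RtNode is a hypothetical stand-in for a commonmarker node.
RtNode = Struct.new(:type, :children, :text, :url, keyword_init: true) do
  def self.leaf(type, text) = new(type: type, children: [], text: text)
  def self.wrap(type, *children, url: nil) = new(type: type, children: children, url: url)
end

module RichTextSketch
  # Style flags each formatting node contributes.
  STYLES = {
    strong: { bold: true },
    emph: { italic: true },
    strikethrough: { strike: true }
  }.freeze

  module_function

  # Returns an array of rich_text element hashes. `style` is never mutated:
  # each formatting node passes a merged copy down to its children.
  def render(node, style = {}.freeze)
    case node.type
    when :text
      [element("text", { text: node.text }, style)]
    when :code
      [element("text", { text: node.text }, style.merge(code: true))]
    when :link
      # Simplification: the label is assumed to be plain text children.
      label = node.children.map(&:text).join
      [element("link", { url: node.url, text: label }, style)]
    when *STYLES.keys
      inner = style.merge(STYLES[node.type]).freeze
      node.children.flat_map { |child| render(child, inner) }
    else
      node.children.flat_map { |child| render(child, style) }
    end
  end

  # Attach the accumulated style only when something is active.
  def element(type, attrs, style)
    el = { type: type, **attrs }
    el[:style] = style unless style.empty?
    el
  end
end
```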

Block-level conversion details

Paragraphs are the simplest case. The converter passes the paragraph node to Renderers::Mrkdwn, gets back a mrkdwn string, and wraps it in a section block. If the resulting string exceeds 3,000 characters (Slack's limit for section text), TextSplitter breaks it into multiple section blocks.

There's one special case: if a paragraph contains nothing but a single image, the converter recognizes this and emits an image block instead of a section. Images mixed with other text stay inline as mrkdwn link fallbacks.
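A sketch of the standalone-image check, again with a hypothetical stand-in node (PNode). The mrkdwn renderer is passed in as a callable so the example stays self-contained, and TextSplitter is left out.

```ruby
# Sketch of standalone-image detection; not the gem's Converters::Paragraph.
# PNode is a hypothetical stand-in for a commonmarker node.
PNode = Struct.new(:type, :children, :text, :url, keyword_init: true)

ALT_TEXT_LIMIT = 2000 # Slack's limit for image alt_text

def standalone_image?(paragraph)
  paragraph.children.size == 1 && paragraph.children.first.type == :image
end

def image_block(image)
  alt = image.children.map(&:text).join # alt text lives in the image's children
  { type: "image", image_url: image.url, alt_text: alt[0, ALT_TEXT_LIMIT] }
end

# render_mrkdwn is any callable that turns a paragraph into a mrkdwn string.
def convert_paragraph(paragraph, render_mrkdwn)
  if standalone_image?(paragraph)
    [image_block(paragraph.children.first)]
  else
    [{ type: "section", text: { type: "mrkdwn", text: render_mrkdwn.call(paragraph) } }]
  end
end
```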

Headings become header blocks. Since header blocks only accept plain_text (no formatting), the converter extracts raw text content from the heading node, stripping any bold/italic/code formatting. If the text exceeds 150 characters (Slack's header limit), it's truncated with an ellipsis. All heading levels (h1-h6) map to the same block type since Slack only has one header style.
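A sketch of the heading path, with a hypothetical stand-in node (HNode). Whether the gem truncates with a single "…" character or three dots is not stated; the single character used here is an assumption that keeps the result at exactly 150 characters.

```ruby
# Sketch of heading conversion; not the gem's Converters::Heading.
# HNode is a hypothetical stand-in for a commonmarker node.
HNode = Struct.new(:type, :children, :text, keyword_init: true)

HEADER_LIMIT = 150

# Concatenate the raw text under a node, dropping all formatting.
def plain_text(node)
  return node.text.to_s if %i[text code].include?(node.type)
  return " " if %i[softbreak linebreak].include?(node.type)

  Array(node.children).map { |child| plain_text(child) }.join
end

def header_block(heading)
  text = plain_text(heading)
  # Assumption: one-character ellipsis, so the total stays at the limit.
  text = "#{text[0, HEADER_LIMIT - 1]}…" if text.length > HEADER_LIMIT
  { type: "header", text: { type: "plain_text", text: text } }
end
```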

Code blocks become rich_text blocks containing a rich_text_preformatted element. The fence info string (e.g., ruby in a block opened with ```ruby) is extracted and passed as the language field, which Slack uses for syntax highlighting. The code content itself goes in as a plain text element.
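The block shape can be sketched as a plain function. The info and literal arguments stand in for the fence info string and code content of a :code_block node; taking the first word of the info string as the language and stripping the trailing newline are assumptions about the gem's behavior.

```ruby
# Sketch of the code block shape described above; not the gem's source.
def code_block(info, literal)
  # "ruby" from "ruby" (or "ruby title=x"); nil when there is no info string.
  language = info.to_s.split.first
  preformatted = {
    type: "rich_text_preformatted",
    elements: [{ type: "text", text: literal.chomp }]
  }
  preformatted[:language] = language if language
  { type: "rich_text", elements: [preformatted] }
end
```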

The List converter is the most complex. It walks the list structure recursively via collect_items, building rich_text_list elements. Each list item becomes a rich_text_section containing its inline content (rendered via Renderers::RichText). Nesting is expressed through the indent property: the top-level list has indent: 0, a sub-list has indent: 1, and so on. The style ("bullet" or "ordered") comes from the AST's list_type, and ordered lists that start at a number other than 1 get an offset value.

After collecting all list items, the converter runs a merge pass: adjacent rich_text_list elements with the same style and indent level are combined into a single element with multiple rich_text_section children. This is because each list item initially produces its own rich_text_list wrapper (to handle interleaving with sub-lists at different indent levels), but Slack expects sibling items to be grouped.
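A sketch of the two passes, with a hypothetical stand-in node (LNode, whose list_type is assumed to be :bullet or :ordered). Item content is joined as plain text here, whereas the gem renders it via Renderers::RichText, and the offset formula (start number minus one) is an assumption.

```ruby
# Sketch of collect_items plus the merge pass; not the gem's Converters::List.
# LNode is a hypothetical stand-in for a commonmarker node.
LNode = Struct.new(:type, :children, :text, :list_type, :list_start, keyword_init: true)

def section(text)
  { type: "rich_text_section", elements: [{ type: "text", text: text }] }
end

# Emit one single-item rich_text_list per item, depth-first, so each item
# precedes its sub-lists and carries its own indent level.
def collect_items(list, indent = 0, out = [])
  style = list.list_type == :ordered ? "ordered" : "bullet"
  list.children.each do |item|
    sublists, content = item.children.partition { |child| child.type == :list }
    wrapper = { type: "rich_text_list", style: style, indent: indent,
                elements: [section(content.map(&:text).join(" "))] }
    start = list.list_start.to_i
    wrapper[:offset] = start - 1 if style == "ordered" && start > 1 # assumed formula
    out << wrapper
    sublists.each { |sub| collect_items(sub, indent + 1, out) }
  end
  out
end

# Combine adjacent lists that share style and indent into one element.
def merge_lists(lists)
  lists.each_with_object([]) do |list, merged|
    prev = merged.last
    if prev && prev[:style] == list[:style] && prev[:indent] == list[:indent]
      prev[:elements] += list[:elements] # builds a new array; inputs stay untouched
    else
      merged << list.dup
    end
  end
end
```

For the README's example list (Item 1, Item 2 with a nested item), collect_items yields three single-item wrappers at indents 0, 0, 1, and merge_lists folds the first two into one bullet list.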

Task list items (GFM's - [x] / - [ ] syntax) are detected via the :taskitem node type. The checked state is determined by rendering the node to HTML and checking for the checked attribute (the commonmarker Ruby binding doesn't expose this as a direct accessor). Checked items get a :white_check_mark: emoji prepended; unchecked items get :white_large_square:.
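A sketch of the detection and prefixing step. The regex, the helper names, and emitting the emoji as a rich text emoji element are assumptions; the gem may prepend the emoji differently. The HTML string in the usage lines is illustrative of what a checked task item renders to.

```ruby
# Sketch of task-item handling; not the gem's source.
# Assumption: the item's HTML contains an <input> tag carrying `checked` when ticked.
CHECKED_INPUT = /<input\b[^>]*\bchecked\b/

def task_checked?(item_html)
  CHECKED_INPUT.match?(item_html)
end

# Prefix the item's inline elements with a checkbox emoji.
def task_item_elements(item_html, inline_elements)
  emoji = task_checked?(item_html) ? "white_check_mark" : "white_large_square"
  [{ type: "emoji", name: emoji }, { type: "text", text: " " }, *inline_elements]
end

html = %(<li><input type="checkbox" checked="" disabled="" /> Ship it</li>)
elements = task_item_elements(html, [{ type: "text", text: "Ship it" }])
```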

Block-level children inside list items (code blocks, blockquotes, headings) are rendered as inline rich text elements via the shared render_child_as_elements method on Converters::Base. This same method is used by the BlockQuote converter, avoiding duplication of the "render a block-level node as inline elements" logic.

Blockquotes become rich_text blocks containing a rich_text_quote element. The quote's inline content is rendered via Renderers::RichText. Since Slack's rich_text_quote doesn't support nesting, nested blockquotes are flattened into a single quote element with > text prefixes to visually indicate depth. Like the List converter, block-level children (code, headings) are handled via render_child_as_elements; the only special cases BlockQuote handles itself are nested :block_quote (depth recursion) and :list (text bullet rendering within quotes).
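A sketch of the depth-prefix flattening, with a hypothetical stand-in node (QNode) and plain-text children. The exact prefix format (no prefix at the top level, one "> " per extra level) is an assumption.

```ruby
# Sketch of flattening nested quotes into one rich_text_quote; not the gem's
# Converters::BlockQuote. QNode is a hypothetical stand-in for a commonmarker node.
QNode = Struct.new(:type, :children, :text, keyword_init: true)

def quote_elements(quote, depth = 0, out = [])
  quote.children.each do |child|
    if child.type == :block_quote
      quote_elements(child, depth + 1, out) # recurse one level deeper
    else
      out << { type: "text", text: "#{'> ' * depth}#{child.text}\n" }
    end
  end
  out
end

def quote_block(quote)
  { type: "rich_text", elements: [{ type: "rich_text_quote", elements: quote_elements(quote) }] }
end
```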

Tables have two rendering paths controlled by the table_format option. The Table converter is a thin dispatcher that delegates to one of two sub-converters in converters/table/. The default Table::Preformatted extracts rows and cells from the AST, computes column widths, reads alignment from the HTML output (since commonmarker doesn't expose alignment as a direct accessor), and formats each row into an ASCII-art table inside a rich_text_preformatted element. Table::Native emits a Slack table block with structured column and row data.
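A sketch of the ASCII layout step, assuming rows and per-column alignments have already been extracted (the extraction from the AST and HTML is omitted). The returned string is what would go inside the rich_text_preformatted element. Widths are measured in characters, so wide glyphs such as CJK or emoji may misalign.

```ruby
# Sketch of ASCII table layout; not the gem's Converters::Table::Preformatted.
# rows: array of string arrays, header first. aligns: :left, :center or :right.
def ascii_table(rows, aligns)
  widths = rows.transpose.map { |column| column.map(&:length).max }

  format_row = lambda do |row|
    cells = row.each_with_index.map do |cell, i|
      case aligns[i]
      when :right then cell.rjust(widths[i])
      when :center then cell.center(widths[i])
      else cell.ljust(widths[i])
      end
    end
    "| #{cells.join(' | ')} |"
  end

  separator = "|#{widths.map { |w| '-' * (w + 2) }.join('|')}|"
  [format_row.call(rows.first), separator, *rows.drop(1).map(&format_row)].join("\n")
end

puts ascii_table([["Name", "Qty"], ["apple", "3"], ["kiwi", "12"]], [:left, :right])
# | Name  | Qty |
# |-------|-----|
# | apple |   3 |
# | kiwi  |  12 |
```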

Thematic breaks (---) map directly to { type: "divider" }.

HTML blocks are preserved as-is in rich_text_preformatted elements (Slack can't render HTML, so monospace display is the most faithful option). The content is extracted via to_commonmark rather than string_content because the commonmarker binding doesn't expose string content for HTML block nodes.

Footnote definitions become context blocks--Slack's block type for de-emphasized supplementary information. The footnote label is extracted from the node's HTML output. Footnote references in body text are rendered as [^1] style markers.

Text splitting

Slack section blocks have a hard 3,000-character limit on the text field. TextSplitter handles this by finding intelligent split points when text exceeds the limit. It searches backward from the 3,000-character mark with a priority order:

  1. Paragraph break (\n\n) -- preserves document structure
  2. Line break (\n) -- preserves line structure
  3. Sentence boundary (. / ! / ? followed by space) -- preserves readability
  4. Word boundary (space) -- avoids mid-word breaks
  5. Hard split at 3,000 -- last resort for pathological input like very long URLs

Each candidate must be at least 25% of the way into the text to avoid producing tiny fragments. The splitter runs iteratively until all chunks are within the limit.
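The search order above can be sketched as follows. SplitSketch is a hypothetical name, not the gem's TextSplitter; measuring the 25% minimum against the 3,000-character window (rather than the whole text) and trimming whitespace at each cut are assumptions of this sketch.

```ruby
# Sketch of priority-ordered split-point search; not the gem's TextSplitter.
module SplitSketch
  LIMIT = 3000
  MIN_FRACTION = 0.25
  # Separators in priority order: paragraph, line, sentence, word.
  SEPARATORS = [/\n\n/, /\n/, /[.!?] /, / /].freeze

  module_function

  def split(text, limit: LIMIT)
    chunks = []
    rest = text
    while rest.length > limit
      cut = cut_point(rest, limit)
      chunks << rest[0, cut].rstrip
      rest = rest[cut..].lstrip
    end
    chunks << rest unless rest.empty?
    chunks
  end

  # Index just past the last match of the highest-priority separator inside
  # the first `limit` characters, or `limit` itself (a hard split) if none
  # lands at least MIN_FRACTION of the way in.
  def cut_point(text, limit)
    window = text[0, limit]
    minimum = (limit * MIN_FRACTION).to_i
    SEPARATORS.each do |separator|
      last_end = nil
      window.scan(separator) { last_end = Regexp.last_match.end(0) }
      return last_end if last_end && last_end >= minimum
    end
    limit
  end
end
```

Every chunk is at most limit characters, and each iteration consumes at least the minimum, so the loop always terminates.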

Constraints and limits

The converter enforces Slack's Block Kit constraints:

Constraint            Limit          Behavior
Section text          3,000 chars    Split into multiple section blocks
Header text           150 chars      Truncated with ellipsis
Image alt_text        2,000 chars    Truncated
Blocks per message    50             Output array truncated

File layout

File                                Purpose
lib/gfm_to_blockkit.rb              Entry point, defines GfmToBlockkit.convert()
lib/gfm_to_blockkit/context.rb      Immutable Data.define passed to converters (renderers + options)
lib/gfm_to_blockkit/converter.rb    Thin orchestrator: parse, build context, dispatch via registry
Converters
converters/base.rb                  Registry, converter_for(), shared render_child_as_elements, helpers
converters/paragraph.rb             Handles :paragraph (section blocks, standalone image detection)
converters/heading.rb               Handles :heading (header blocks, 150-char truncation)
converters/code_block.rb            Handles :code_block (rich_text_preformatted with language)
converters/block_quote.rb           Handles :block_quote (rich_text_quote, nested quote flattening)
converters/list.rb                  Handles :list (rich_text_list with nesting, task items, merge pass)
converters/table.rb                 Handles :table (thin dispatcher to sub-converters)
converters/table/preformatted.rb    ASCII art table in rich_text_preformatted
converters/table/native.rb          Slack native table block
converters/thematic_break.rb        Handles :thematic_break (divider block)
converters/html_block.rb            Handles :html_block (rich_text_preformatted)
converters/footnote_definition.rb   Handles :footnote_definition (context block)
converters/image.rb                 Image block (called by Paragraph for standalone images)
Renderers
renderers/mrkdwn.rb                 Inline AST nodes → mrkdwn strings (for section blocks)
renderers/rich_text.rb              Inline AST nodes → rich_text element arrays (for rich_text blocks)
Utilities
text_splitter.rb                    Split long text at sensible boundaries (3,000 char limit)
version.rb                          Version constant

GFM to Block Kit mapping

GFM                                   Block Kit                                           Converter / Renderer
Paragraph                             section block with mrkdwn text                      Converters::Paragraph, Renderers::Mrkdwn
Heading (h1-h6)                       header block (plain_text, max 150 chars)            Converters::Heading
Bold / Italic / Strikethrough / Code  *bold* _italic_ ~strike~ `code`                     Renderers::Mrkdwn
Link                                  <url|text>                                          Renderers::Mrkdwn
Image (standalone)                    image block                                         Converters::Image
Image (inline)                        mrkdwn link fallback                                Renderers::Mrkdwn
Code block                            rich_text > rich_text_preformatted (with language)  Converters::CodeBlock
Blockquote                            rich_text > rich_text_quote                         Converters::BlockQuote, Renderers::RichText
Unordered list                        rich_text > rich_text_list (style: bullet)          Converters::List, Renderers::RichText
Ordered list                          rich_text > rich_text_list (style: ordered)         Converters::List, Renderers::RichText
Task list                             rich_text > rich_text_list with checkbox emoji      Converters::List
Table (default)                       rich_text > rich_text_preformatted (ASCII)          Converters::Table::Preformatted
Table (native)                        table block                                         Converters::Table::Native
Horizontal rule                       divider block                                       Converters::ThematicBreak
Footnote                              context block                                       Converters::FootnoteDefinition
HTML block                            rich_text > rich_text_preformatted                  Converters::HtmlBlock

Requirements

  • Ruby >= 3.2
  • commonmarker ~> 2.0

License

MIT