Purpose
prosereflect is a Ruby gem for working with the document structure used by the ProseMirror rich text editor.
It provides models and utilities for parsing, manipulating, and traversing the hierarchical document tree that ProseMirror represents in JSON or YAML, which makes it convenient to extract content from rich text documents.
Full documentation is available on the docs site.
Installation
Add this line to your application’s Gemfile:
gem 'prosereflect'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install prosereflect
Usage
Parsing ProseMirror documents
From YAML
require 'prosereflect'
# Parse from YAML string or file
yaml_content = File.read('document.yaml')
document = Prosereflect::Parser.parse_document(yaml_content)
# Access the document structure
document.content.each do |node|
# Work with nodes
end
From JSON
require 'prosereflect'
# Parse from JSON string or file
json_content = File.read('document.json')
document = Prosereflect::Parser.parse_document(json_content)
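Both formats describe the same tree, so a YAML file and a JSON file with equivalent content should load into the same nested structure. The sketch below uses only Ruby's standard library, not prosereflect itself, to show that shape. The sample document and the constant names are illustrations, not fixtures shipped with the gem.

```ruby
require 'json'
require 'yaml'

# A tiny ProseMirror-style document: every node has a "type",
# and container nodes hold their children under "content".
JSON_SOURCE = <<~JSON
  {
    "type": "doc",
    "content": [
      {
        "type": "paragraph",
        "content": [
          { "type": "text", "text": "Hello, world" }
        ]
      }
    ]
  }
JSON

# The same document written as YAML.
YAML_SOURCE = <<~YAML
  type: doc
  content:
    - type: paragraph
      content:
        - type: text
          text: Hello, world
YAML

from_json = JSON.parse(JSON_SOURCE)
from_yaml = YAML.safe_load(YAML_SOURCE)

# Both sources load into the same nested Hash/Array structure.
puts from_json == from_yaml
puts from_json['content'].first['type']
```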
Navigating the document
# Get all tables in the document
tables = document.tables
# Get all paragraphs
paragraphs = document.paragraphs
# Access the first table
first_table = document.find_first('table')
# Access header row and data rows in a table
header = first_table.header_row
data_rows = first_table.data_rows
# Access cells in a table
cell = first_table.cell_at(0, 0) # First data row, first column
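To make the header/data split concrete, here is a minimal sketch over plain Hashes rather than the gem's node objects. The helpers split_rows, cell_at, and cell are hypothetical, and the node type names "table_row", "table_header", and "table_cell" are assumed from ProseMirror's table schema; the gem's own implementation may differ.

```ruby
# Hypothetical helpers over plain Hashes, not prosereflect's API.
# Assumption: rows are "table_row" nodes and header cells are
# "table_header" nodes, as in ProseMirror's table schema.
def split_rows(table)
  rows = Array(table['content']).select { |node| node['type'] == 'table_row' }
  first = rows.first
  has_header = !first.nil? &&
               Array(first['content']).any? { |cell| cell['type'] == 'table_header' }
  # The first row counts as the header only if it contains header cells.
  has_header ? [first, rows.drop(1)] : [nil, rows]
end

# Row and column indexes are relative to the data rows, not the header.
def cell_at(table, row_index, column_index)
  _header, data_rows = split_rows(table)
  row = data_rows[row_index]
  row && Array(row['content'])[column_index]
end

# Build a cell that wraps a single paragraph of text.
def cell(type, text)
  { 'type' => type,
    'content' => [{ 'type' => 'paragraph',
                    'content' => [{ 'type' => 'text', 'text' => text }] }] }
end

SAMPLE_TABLE = {
  'type' => 'table',
  'content' => [
    { 'type' => 'table_row',
      'content' => [cell('table_header', 'Name'), cell('table_header', 'Code')] },
    { 'type' => 'table_row',
      'content' => [cell('table_cell', 'Alpha'), cell('table_cell', 'A1')] },
    { 'type' => 'table_row',
      'content' => [cell('table_cell', 'Beta'), cell('table_cell', 'B2')] }
  ]
}.freeze

header, data_rows = split_rows(SAMPLE_TABLE)
puts header['content'].length
puts data_rows.length
puts cell_at(SAMPLE_TABLE, 0, 0)['content'].first['content'].first['text']
```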
Accessing content
# Get text content from a paragraph
paragraph = document.paragraphs.first
text = paragraph.text_content
# Get text content from a table cell
cell = document.tables.first.cell_at(0, 0)
cell_text = cell.text_content
# Get cell content as separate lines
lines = cell.lines
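Conceptually, text_content is a concatenation of every text node below a given node. The following sketch shows that idea on a plain Hash, without the gem. The text_content helper here is a hypothetical stand-in, and the "bold" mark name in the sample is an assumption.

```ruby
# Hypothetical stand-in for the idea behind text_content: walk the
# subtree and join the text of every "text" node in document order.
def text_content(node)
  return node['text'].to_s if node['type'] == 'text'

  Array(node['content']).map { |child| text_content(child) }.join
end

# Sample paragraph; the "bold" mark name is an assumption.
SAMPLE_PARAGRAPH = {
  'type' => 'paragraph',
  'content' => [
    { 'type' => 'text', 'text' => 'This is a ' },
    { 'type' => 'text', 'text' => 'bold', 'marks' => [{ 'type' => 'bold' }] },
    { 'type' => 'text', 'text' => ' text in a paragraph.' }
  ]
}.freeze

puts text_content(SAMPLE_PARAGRAPH)
```

Marks do not affect the result: they describe formatting, so only the text values are joined.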
Finding nodes
# Find the first node of a specific type
table = document.find_first('table')
paragraph = document.find_first('paragraph')
# Find all nodes of a specific type
tables = document.find_all('table')
text_nodes = document.find_all('text')
# Find child nodes of a specific type
table_cells = table.find_children(Prosereflect::TableCell)
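A type-based search of this kind amounts to a depth-first walk of the tree. The sketch below illustrates the walk on plain Hashes; find_all_nodes and find_first_node are hypothetical helpers written for this example, not methods of the gem.

```ruby
# Hypothetical helper, not part of prosereflect: collect every node of
# the given type from a ProseMirror-style Hash, depth first.
def find_all_nodes(node, type, found = [])
  found << node if node['type'] == type
  Array(node['content']).each { |child| find_all_nodes(child, type, found) }
  found
end

# First match in document order, or nil when nothing matches.
def find_first_node(node, type)
  find_all_nodes(node, type).first
end

SAMPLE_DOC = {
  'type' => 'doc',
  'content' => [
    { 'type' => 'paragraph',
      'content' => [{ 'type' => 'text', 'text' => 'Intro' }] },
    { 'type' => 'table',
      'content' => [
        { 'type' => 'table_row',
          'content' => [
            { 'type' => 'table_cell',
              'content' => [
                { 'type' => 'paragraph',
                  'content' => [{ 'type' => 'text', 'text' => 'Cell' }] }
              ] }
          ] }
      ] }
  ]
}.freeze

puts find_all_nodes(SAMPLE_DOC, 'paragraph').length
puts find_all_nodes(SAMPLE_DOC, 'text').map { |node| node['text'] }.inspect
puts find_first_node(SAMPLE_DOC, 'table')['type']
```

Note that the search descends into nested nodes, so the paragraph inside the table cell is counted along with the top-level one.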
HTML Conversion
The gem provides functionality to convert between HTML and ProseMirror document models.
From HTML
require 'prosereflect'
# Parse from HTML string
html_content = '<p>This is a <strong>bold</strong> text in a paragraph.</p>'
document = Prosereflect::Input::Html.parse(html_content)
# Access the document structure
paragraph = document.paragraphs.first
text_content = paragraph.text_content # "This is a bold text in a paragraph."
User Mentions
The gem supports user mentions in documents, which can be useful for social features or collaborative editing.
# Create a document with user mentions
document = Prosereflect::Document.create
paragraph = document.add_paragraph('Hello ')
# Add a user mention
user = Prosereflect::User.new
user.id = '123'
paragraph.add_child(user)
paragraph.add_text('!')
# Convert to HTML
html = Prosereflect::Output::Html.convert(document)
# => "<p>Hello <user-mention data-id=\"123\"></user-mention>!</p>"
# Parse HTML with user mentions
html_content = '<p>Hello <user-mention data-id="123"></user-mention>!</p>'
document = Prosereflect::Input::Html.parse(html_content)
# Access user mentions
user_mentions = document.find_all('user')
first_user = user_mentions.first
user_id = first_user.id # => "123"
User mentions are represented as <user-mention> elements in HTML with a data-id attribute containing the user’s identifier. When parsing HTML, these elements are converted to User nodes in the document model.
Common use cases:
- Mentioning users in comments or messages
- Tagging users in collaborative documents
- Tracking user references in content
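For a quick check of which users an HTML fragment mentions, without building a document model, the data-id attributes can be collected directly. This is a rough sketch that assumes the exact <user-mention data-id="..."> form shown above; mentioned_user_ids and MENTION_PATTERN are hypothetical names, and a regular expression is not a substitute for real HTML parsing.

```ruby
# Matches the opening tag of a mention and captures its data-id value.
# Assumes double-quoted attributes, as in the examples above.
MENTION_PATTERN = /<user-mention\s+data-id="([^"]*)"/

# Hypothetical helper: unique user ids in order of first appearance.
def mentioned_user_ids(html)
  html.scan(MENTION_PATTERN).flatten.uniq
end

html = '<p>Hello <user-mention data-id="123"></user-mention> and ' \
       '<user-mention data-id="456"></user-mention>, thanks ' \
       '<user-mention data-id="123"></user-mention>!</p>'

puts mentioned_user_ids(html).inspect
```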
To HTML
require 'prosereflect'
# Create a document
document = Prosereflect::Document.create
paragraph = document.add_paragraph('Plain text')
paragraph.add_text(' with bold', [Prosereflect::Mark::Bold.new])
# Convert to HTML
html = Prosereflect::Output::Html.convert(document)
# => "<html><body><p>Plain text<strong> with bold</strong></p></body></html>"
Round-trip Conversion
# Start with HTML
original_html = '<p>This is <em>styled</em> text.</p>'
# Convert to document model
document = Prosereflect::Input::Html.parse(original_html)
# Modify the document if needed
document.paragraphs.first.add_text(' with additions')
# Convert back to HTML
modified_html = Prosereflect::Output::Html.convert(document)
Data model
The prosereflect gem represents the document structure as a hierarchy of node objects.
+-------------------+
|     Document      |
|                   |
| +content          |
+---------+---------+
          |
          | 1..*
+---------v---------+
|       Node        |
|                   |
| -type             |
| -attrs            |
| -marks            |
| +content          |
+---------+---------+
          |
    +-----+-------+-----------------+--------------+
    |             |                 |              |
+---v---+  +------v-------+  +------v-------+  +---v---+
| Table |  |  Paragraph   |  |     Text     |  | User  |
+---+---+  +--------------+  +--------------+  +-------+
    |
+---v-----------+
|   TableRow    |
+---+-----------+
    |
+---v-----------+
|   TableCell   |
+---------------+
Classes
Node
Base class for all node types.
type-
The node type (e.g., "doc", "paragraph", "text", "table")
content-
A collection of child nodes
attrs-
Attributes specific to the node type
marks-
Formatting marks applied to the node
Document
Top-level container representing a ProseMirror document.
content-
A collection of top-level nodes in the document
Paragraph
Represents a paragraph of text.
text_content-
Returns the combined text content of all child text nodes
Text
Represents a text node.
text-
The text content of the node
User
Represents a user mention in the document.
id-
The unique identifier of the referenced user
type-
Always set to "user"
content-
Always empty (user mentions cannot have child nodes)
Table
Represents a table structure.
rows-
Collection of table rows
header_row-
First row if it contains header cells
data_rows-
All non-header rows
Heading
Represents a heading element (h1-h6).
level-
The heading level (1-6)
text_content-
Returns the combined text content of all child text nodes
content-
Collection of child nodes (text, styled text, etc.)
Image
Represents an image element.
src-
The image source URL
alt-
Alternative text description
title-
Image tooltip text
width-
Image width in pixels
height-
Image height in pixels
HorizontalRule
Represents a horizontal rule (hr) element.
style-
Border style (solid, dashed, dotted)
width-
Rule width (px or %)
thickness-
Border thickness in pixels
BulletList
Represents an unordered list.
bullet_style-
List style type (disc, circle, square)
items-
Collection of list items
OrderedList
Represents an ordered list.
start-
Starting number for the list
items-
Collection of list items
ListItem
Represents a list item within ordered or unordered lists.
content-
Collection of child nodes (can contain paragraphs, nested lists, etc.)
text_content-
Returns the combined text content
Blockquote
Represents a blockquote element.
citation-
Optional citation URL
blocks-
Collection of content blocks within the quote
CodeBlockWrapper
Container for code blocks with additional attributes.
line_numbers-
Whether to display line numbers
highlight_lines-
Array of line numbers to highlight
code_blocks-
Collection of code blocks
CodeBlock
Represents a code block with syntax highlighting.
content-
The code content
language-
Programming language for syntax highlighting
Mark
Base class for text formatting marks.
Available Mark Types
Bold-
Bold text formatting
Italic-
Italic text formatting
Code-
Inline code formatting
Link-
Hyperlink with href attribute
Strike-
Strikethrough text
Subscript-
Subscript text
Superscript-
Superscript text
Underline-
Underlined text
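As an illustration of how marks wrap a text node, the sketch below renders a plain-Hash text node to HTML. Everything in it is an assumption made for the example: the mark type names, the tag mapping in MARK_TAGS, and the render_text helper are hypothetical and may not match what the gem serializes. Link is left out because it also needs an href attribute.

```ruby
# Assumed mapping from mark type names to HTML tags; the names and
# tags used by the gem itself may differ.
MARK_TAGS = {
  'bold' => 'strong',
  'italic' => 'em',
  'code' => 'code',
  'strike' => 'del',
  'subscript' => 'sub',
  'superscript' => 'sup',
  'underline' => 'u'
}.freeze

HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Hypothetical helper: escape the text, then wrap it in one tag per
# mark so that the first mark in the list ends up outermost.
def render_text(node)
  html = node['text'].to_s.gsub(/[&<>"]/, HTML_ESCAPES)
  Array(node['marks']).reverse_each do |mark|
    tag = MARK_TAGS[mark['type']]
    html = "<#{tag}>#{html}</#{tag}>" if tag # unknown marks are skipped
  end
  html
end

node = { 'type' => 'text', 'text' => ' with bold',
         'marks' => [{ 'type' => 'bold' }] }
puts render_text(node)
```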
TableRow
Represents a row in a table.
cells-
All cells in the row
TableCell
Represents a cell in a table.
paragraphs-
All paragraphs in the cell
text_content-
All text content combined
lines-
Text content split into separate lines
Development
Adding test fixtures
The repository includes a utility script bin/extract-ituob-amendments.rb to
extract ProseMirror content from the ITU Operational Bulletin for test fixtures.
Syntax:
$ bin/extract-ituob-amendments.rb {filename} {issue_number}
Where,
{filename}-
The amendments YAML file to extract from. The script expects this file to be in the format used by the ITU Operational Bulletin data repository: https://github.com/ituob/itu-ob-data/
{issue_number}-
The issue number to use in the generated file names.
This command:
- Extracts ProseMirror content from the specified amendments file
- Generates both YAML and JSON files in the current directory
- Names the files according to the pattern ituob-<issue_number>-<publication>.<format>
These generated files can be moved to spec/fixtures/ituob-<issue_number>/ to use in tests.
$ bin/extract-ituob-amendments.rb amendments.yaml 1000
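The naming pattern and target directory described above can be sketched as a small helper. fixture_paths is a hypothetical function written for this example, and the publication name "example" is a placeholder; the real publication names come from the amendments file.

```ruby
# Hypothetical helper: where the generated files for one publication
# would live once moved under spec/fixtures, one path per format.
def fixture_paths(issue_number, publication, formats: %w[yaml json])
  dir = File.join('spec', 'fixtures', "ituob-#{issue_number}")
  formats.map do |format|
    File.join(dir, "ituob-#{issue_number}-#{publication}.#{format}")
  end
end

# "example" is a placeholder publication name.
puts fixture_paths(1000, 'example')
```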
Copyright
This gem is developed, maintained and funded by Ribose Inc.
License
The gem is available as open source under the terms of the 2-Clause BSD License.