File: README — Documentation by YARD 0.9.38

Purpose

prosereflect is a Ruby gem for working with the document structure used by the ProseMirror rich text editor.

It provides a set of models and utilities for parsing, manipulating, and accessing the hierarchical document tree structure represented in ProseMirror’s JSON/YAML format. This allows for convenient traversal and extraction of content from rich text documents.

Full documentation is available on the docs site.

Installation

Add this line to your application’s Gemfile:

gem 'prosereflect'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install prosereflect

Usage

Parsing ProseMirror documents

From YAML

require 'prosereflect'

# Parse from YAML string or file
yaml_content = File.read('document.yaml')
document = Prosereflect::Parser.parse_document(yaml_content)

# Access the document structure
document.content.each do |node|
  # Work with nodes
end

From JSON

require 'prosereflect'

# Parse from JSON string or file
json_content = File.read('document.json')
document = Prosereflect::Parser.parse_document(json_content)

Navigating the document

# Get all tables in the document
tables = document.tables

# Get all paragraphs
paragraphs = document.paragraphs

# Access the first table
first_table = document.find_first('table')

# Access header row and data rows in a table
header = first_table.header_row
data_rows = first_table.data_rows

# Access cells in a table
cell = first_table.cell_at(0, 0)  # First data row, first column

Accessing content

# Get text content from a paragraph
paragraph = document.paragraphs.first
text = paragraph.text_content

# Get text content from a table cell
cell = document.tables.first.cell_at(0, 0)
cell_text = cell.text_content

# Get cell content as separate lines
lines = cell.lines

Finding nodes

# Find the first node of a specific type
table = document.find_first('table')
paragraph = document.find_first('paragraph')

# Find all nodes of a specific type
tables = document.find_all('table')
text_nodes = document.find_all('text')

# Find child nodes of a specific type
table_cells = table.find_children(TableCell)

HTML Conversion

The gem provides functionality to convert between HTML and ProseMirror document models.

From HTML

require 'prosereflect'

# Parse from HTML string
html_content = '<p>This is a <strong>bold</strong> text in a paragraph.</p>'
document = Prosereflect::Input::Html.parse(html_content)

# Access the document structure
paragraph = document.paragraphs.first
text_content = paragraph.text_content # "This is a bold text in a paragraph."

User Mentions

The gem supports user mentions in documents, which can be useful for social features or collaborative editing.

# Create a document with user mentions
document = Prosereflect::Document.create
paragraph = document.add_paragraph('Hello ')

# Add a user mention
user = Prosereflect::User.new
user.id = '123'
paragraph.add_child(user)

paragraph.add_text('!')

# Convert to HTML
html = Prosereflect::Output::Html.convert(document)
# => "<p>Hello <user-mention data-id=\"123\"></user-mention>!</p>"

# Parse HTML with user mentions
html_content = '<p>Hello <user-mention data-id="123"></user-mention>!</p>'
document = Prosereflect::Input::Html.parse(html_content)

# Access user mentions
user_mentions = document.find_all('user')
first_user = user_mentions.first
user_id = first_user.id # => "123"

User mentions are represented as <user-mention> elements in HTML with a data-id attribute containing the user’s identifier. When parsing HTML, these elements are converted to User nodes in the document model.

Common use cases: - Mentioning users in comments or messages - Tagging users in collaborative documents - Tracking user references in content

To HTML

require 'prosereflect'

# Create a document
document = Prosereflect::Document.create
paragraph = document.add_paragraph('Plain text')
paragraph.add_text(' with bold', [Prosereflect::Mark::Bold.new])

# Convert to HTML
html = Prosereflect::Output::Html.convert(document)
# => "<html><body><p>Plain text<strong> with bold</strong></p></body></html>"

Round-trip Conversion

# Start with HTML
original_html = '<p>This is <em>styled</em> text.</p>'

# Convert to document model
document = Prosereflect::Input::Html.parse(original_html)

# Modify the document if needed
document.paragraphs.first.add_text(' with additions')

# Convert back to HTML
modified_html = Prosereflect::Output::Html.convert(document)

Data model

The prosereflect gem represents the document structure as a hierarchy of node objects.

+-------------------+
|      Document     |
|                   |
| +content          |
+--------+----------+
         |
         | 1..*
+--------v----------+
|        Node       |
|                   |
| -type             |
| -attrs            |
| -marks            |
| +content          |
+-------------------+
         |
    +----+----+---------------------+-------------+
    |         |                     |             |
+---v---+ +---v----------+  +-------v--------+  +-v-----+
|Table  | |  Paragraph   |  |     Text       |  | User  |
|       | |              |  |                |  |       |
+---+---+ +--------------+  +----------------+  +-------+
    |
    |
+---v-----------+
|   TableRow    |
|               |
+---+-----------+
    |
+---v-----------+
|   TableCell   |
|               |
+---------------+

Classes

Node

Base class for all node types.

type: The node type (e.g., "doc", "paragraph", "text", "table")
content: A collection of child nodes
attrs: Attributes specific to the node type
marks: Formatting marks applied to the node

Document

Top-level container representing a ProseMirror document.

content: A collection of top-level nodes in the document

Paragraph

Represents a paragraph of text.

text_content: Returns the combined text content of all child text nodes

Text

Represents a text node.

text: The text content of the node

User

Represents a user mention in the document.

id: The unique identifier of the referenced user
type: Always set to "user"
content: Always empty (user mentions cannot have child nodes)

Table

Represents a table structure.

rows: Collection of table rows
header_row: First row if it contains header cells
data_rows: All non-header rows

Heading

Represents a heading element (h1-h6).

level: The heading level (1-6)
text_content: Returns the combined text content of all child text nodes
content: Collection of child nodes (text, styled text, etc.)

Image

Represents an image element.

src: The image source URL
alt: Alternative text description
title: Image tooltip text
width: Image width in pixels
height: Image height in pixels

HorizontalRule

Represents a horizontal rule (hr) element.

style: Border style (solid, dashed, dotted)
width: Rule width (px or %)
thickness: Border thickness in pixels

BulletList

Represents an unordered list.

bullet_style: List style type (disc, circle, square)
items: Collection of list items

OrderedList

Represents an ordered list.

start: Starting number for the list
items: Collection of list items

ListItem

Represents a list item within ordered or unordered lists.

content: Collection of child nodes (can contain paragraphs, nested lists, etc.)
text_content: Returns the combined text content

Blockquote

Represents a blockquote element.

citation: Optional citation URL
blocks: Collection of content blocks within the quote

CodeBlockWrapper

Container for code blocks with additional attributes.

line_numbers: Whether to display line numbers
highlight_lines: Array of line numbers to highlight
code_blocks: Collection of code blocks

CodeBlock

Represents a code block with syntax highlighting.

content: The code content
language: Programming language for syntax highlighting

Mark

Base class for text formatting marks.

Available Mark Types

Bold: Bold text formatting
Italic: Italic text formatting
Code: Inline code formatting
Link: Hyperlink with href attribute
Strike: Strikethrough text
Subscript: Subscript text
Superscript: Superscript text
Underline: Underlined text

TableRow

Represents a row in a table.

cells: All cells in the row

TableCell

Represents a cell in a table.

paragraphs: All paragraphs in the cell
text_content: All text content combined
lines: Text content split into separate lines

Development

Adding test fixtures

The repository includes a utility script bin/extract-ituob-amendments.rb to extract ProseMirror content from the ITU Operational Bulletin for test fixtures.

Syntax:

$ bin/extract-ituob-amendments.rb {filename} {issue_number}

Where,

{filename}: The amendments YAML file to extract from. The script expects the {filename} file in the format used by the ITU Operational Bulletin data repository: https://github.com/ituob/itu-ob-data/
{issue_number}: The issue number to use in the generated file names.

This command:

Extract ProseMirror content from the specified amendments file
Generate both YAML and JSON files in the current directory
Name files according to the pattern ituob-<issue_number>-<publication>.<format>

These generated files can be moved to spec/fixtures/ituob-<issue_number>/ to use in tests.

$ bin/extract-ituob-amendments.rb amendments.yaml 1000

Copyright

This gem is developed, maintained and funded by Ribose Inc.

License

The gem is available as open source under the terms of the 2-Clause BSD License.