Class: Inkmark
- Inherits: Object
  - Object
    - Inkmark
- Defined in:
  lib/inkmark.rb,
  lib/inkmark/toc.rb,
  lib/inkmark/event.rb,
  lib/inkmark/options.rb,
  lib/inkmark/version.rb
Overview
Inkmark is a very fast, feature-rich, AI-first CommonMark/GFM markdown renderer backed by the Rust pulldown-cmark parser.
Default behavior: GFM extensions (tables, strikethrough, tasklists, footnotes) are enabled; raw HTML is suppressed. Override via options.
### Presets
Four named bundles of options cover the common profiles:
- :gfm (the default): CommonMark + core GFM only.
- :commonmark: strict CommonMark, no GFM.
- :recommended: opinionated bundle for modern web content (smart punctuation, auto heading IDs, lazy images, autolinks + nofollow, URL scheme allowlists, emoji shortcodes, syntax highlighting, frontmatter).
- :trusted: :recommended plus raw-HTML pass-through. **Use only for fully trusted content.**
See Options::PRESETS.
### Raw HTML safety
Raw HTML is suppressed by default; every <tag> in the source is escaped to text. Enable pass-through with raw_html: true or the :trusted preset **only for trusted input**. Inkmark does not sanitize raw HTML beyond the narrow GFM tagfilter; sanitize before rendering user-influenced content.
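To make "escaped to text" concrete, the sketch below runs Ruby's standard-library ERB::Util.html_escape over a snippet that contains a tag. It illustrates the general idea of escaping only; it does not call Inkmark, and Inkmark's exact output for the same input may differ.

```ruby
require "erb"

# Untrusted input that tries to smuggle in a script tag.
untrusted = "Hello <script>alert(1)</script> world"

# Escaping turns markup characters into entities, so a browser shows the
# tag as literal text instead of executing it.
escaped = ERB::Util.html_escape(untrusted)

puts escaped
```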
Defined Under Namespace
Classes: Error, Event, Options, Toc
Constant Summary
- VERSION = "0.1.1"
  Current gem version.
Instance Attribute Summary
- #options ⇒ Object
  Returns the value of attribute options.
- #source ⇒ String
  The markdown source string that will be rendered.
Class Method Summary
- .chunks_by_heading(source, options: nil, truncate: nil) ⇒ Array<Hash>
  Chunk source by heading into an Array of section Hashes.
- .chunks_by_size(source, chars: nil, words: nil, overlap: 0, at: :block, options: nil) ⇒ Array<Hash>
  Split source into sliding-window chunks bounded by a character and/or word budget.
- .default_options ⇒ Inkmark::Options
  The class-level default options used when no per-instance options are given.
- .default_options=(value) ⇒ Inkmark::Options
  Replace the class-level default options.
- .highlight_css(theme: nil) ⇒ String
  Return the CSS stylesheet for syntax-highlighted code blocks.
- .highlight_themes ⇒ Array<String>
  Return an array of available syntax-highlighting theme names.
- .normalize_truncate_params(params) ⇒ Object (private)
  Normalize and validate truncation params coming from either the Inkmark.truncate_markdown kwargs or the Inkmark.chunks_by_heading truncate: kwarg.
- .normalize_window_params(chars:, words:, overlap:, at:) ⇒ Object (private)
  Validate sliding-window chunking params.
- .to_html(source, options: nil) ⇒ String
  Render source markdown to HTML in one call.
- .to_markdown(source, options: nil) ⇒ String
  Render source markdown through the filter pipeline and serialize back to Markdown text.
- .to_plain_text(source, options: nil) ⇒ String
  Render source through the filter pipeline and serialize to plain text.
- .truncate_markdown(source, chars: nil, words: nil, at: :block, marker: "…", options: nil) ⇒ String
  Truncate a Markdown document to fit a char and/or word budget.
Instance Method Summary
- #chunks_by_heading(truncate: nil) ⇒ Array<Hash>
  Chunk the document by heading into an Array of section Hashes, with filter-applied Markdown content.
- #chunks_by_size(chars: nil, words: nil, overlap: 0, at: :block) ⇒ Array<Hash>
  Split the stored document into sliding-window chunks.
- #extracts ⇒ Hash?
  Return structured extracts for the element kinds requested via extract: { … }, or nil when no kinds were requested.
- #frontmatter ⇒ Hash?
  Return the parsed frontmatter as a Hash, or nil when the document has no frontmatter block or the frontmatter option is not enabled.
- #initialize(source = nil, options: nil) ⇒ Inkmark (constructor)
  Create a new renderer for source.
- #on(kind) {|event| ... } ⇒ self
  Register a handler block for a document element kind.
- #statistics ⇒ Hash?
  Return the collected document statistics as a Hash, or nil when neither statistics nor toc is enabled.
- #to_html ⇒ String
  Render the stored source to HTML using the stored options.
- #to_markdown ⇒ String
  Apply the filter pipeline and serialize back to Markdown text.
- #to_plain_text ⇒ String
  Serialize the parsed document to plain text.
- #to_s ⇒ String
  Coerce the renderer to a String by returning the stored source.
- #toc ⇒ Inkmark::Toc?
  Return the table of contents as a Toc value object, exposing #to_markdown / #to_html / #to_s (markdown).
- #truncate_markdown(chars: nil, words: nil, at: :block, marker: "…") ⇒ String
  Truncate the stored document.
- #walk ⇒ self
  Walk the document, firing all registered handlers, without producing HTML output.
Constructor Details
#initialize(source = nil, options: nil) ⇒ Inkmark
Create a new renderer for source.
    # File 'lib/inkmark.rb', line 417
    def initialize(source = nil, options: nil)
      self.source = source
      self. =
      @handlers = nil
    end
Instance Attribute Details
#options ⇒ Object
Returns the value of attribute options.
    # File 'lib/inkmark.rb', line 431
    attr_reader :source, :options
#source ⇒ String
The markdown source string that will be rendered. Always a String (never nil); a nil assignment is stored as an empty string.
    # File 'lib/inkmark.rb', line 431
    def source
      @source
    end
Class Method Details
.chunks_by_heading(source, options: nil, truncate: nil) ⇒ Array<Hash>
Chunk source by heading into an Array of section Hashes. Each section’s :content is filter-applied Markdown (emoji expanded, autolinks resolved, allowlists applied). Designed for feeding RAG / embedding pipelines that want pre-HTML chunks with clean content.
Sections are hierarchical: a ## section’s :content includes any nested ### subsections, which also appear as their own entries. Content before the first heading (if any) is emitted as a preamble entry with heading: nil and level: 0.
Filter the returned array with plain Enumerable: by heading, level, id, or any other field. See the “Section extraction” section in the README for recipes.
**HTML-emitting filters** (syntax_highlight, images: { lazy: true }, links: { nofollow: true }) embed raw HTML into :content when enabled. For RAG pipelines you almost always want these off so chunks stay pure Markdown.
    # File 'lib/inkmark.rb', line 151
    def chunks_by_heading(source, options: nil, truncate: nil)
      source = source.to_s
      return [] if source.empty?

      opts_hash = ()
      opts_hash[:truncate] = normalize_truncate_params(truncate) if truncate
      _native_chunks_by_heading(source, opts_hash)
    end
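As a rough illustration of filtering the result with plain Enumerable, the sketch below works on a hand-written Array of section Hashes rather than real Inkmark output. The key names (:heading, :level, :id, :content) follow the description above; the values, and whether :content repeats the section's own heading line, are assumptions made for the example.

```ruby
# Hand-written stand-in for the Array<Hash> described above.
# All values are invented for illustration.
sections = [
  { heading: nil, level: 0, id: nil, content: "Intro text." },
  { heading: "Install", level: 2, id: "install",
    content: "Run the installer.\n\n### From source\n\nBuild it." },
  { heading: "From source", level: 3, id: "from-source", content: "Build it." },
  { heading: "Usage", level: 2, id: "usage", content: "Call the API." }
]

# Level-2 sections only: skips the preamble and the nested subsection.
top_level = sections.select { |s| s[:level] == 2 }

# Look a section up by its heading id.
usage = sections.find { |s| s[:id] == "usage" }

# The preamble is the entry with no heading.
preamble = sections.find { |s| s[:heading].nil? }

puts top_level.map { |s| s[:heading] }.inspect
puts usage[:content]
puts preamble[:content]
```

Because nested subsections also appear as their own entries, selecting by level is a simple way to avoid embedding the same text twice.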
.chunks_by_size(source, chars: nil, words: nil, overlap: 0, at: :block, options: nil) ⇒ Array<Hash>
Split source into sliding-window chunks bounded by a character and/or word budget. Adjacent chunks can share trailing context via overlap, which preserves continuity for embedding models. Unlike chunks_by_heading, this ignores document structure and walks the filter-applied Markdown sequentially — useful for heading-free or heading-uneven documents.
    # File 'lib/inkmark.rb', line 183
    def chunks_by_size(source, chars: nil, words: nil, overlap: 0, at: :block, options: nil)
      source = source.to_s
      return [] if source.empty?

      opts_hash = ()
      opts_hash[:__window] = normalize_window_params(
        chars: chars, words: words, overlap: overlap, at: at
      )
      _native_chunks_by_size(source, opts_hash)
    end
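The idea of a sliding window with overlap can be sketched in a few lines of plain Ruby. The helper below (window_words, a hypothetical name) splits plain text on whitespace and uses a word budget only; it is not Inkmark's native implementation, it ignores the at: boundary option, and its overlap check against the word budget is an assumption of the sketch.

```ruby
# Illustrative word-based sliding window; not Inkmark's native implementation.
def window_words(text, words:, overlap: 0)
  raise ArgumentError, ":words must be a positive Integer" unless words.is_a?(Integer) && words > 0
  raise ArgumentError, ":overlap must be in 0...words" unless (0...words).cover?(overlap)

  tokens = text.split
  step = words - overlap # how far the window advances each time
  chunks = []
  start = 0
  while start < tokens.length
    chunks << tokens[start, words].join(" ")
    break if start + words >= tokens.length # this window reached the end
    start += step
  end
  chunks
end

chunks = window_words("a b c d e f g", words: 3, overlap: 1)
puts chunks.inspect
```

Each chunk after the first starts with the last word of the previous one, which is the shared trailing context that overlap provides.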
.default_options ⇒ Inkmark::Options
The class-level default options used when no per-instance options are given.
    # File 'lib/inkmark.rb', line 358
    def default_options
      @default_options ||= Inkmark::Options.new
    end
.default_options=(value) ⇒ Inkmark::Options
Replace the class-level default options.
    # File 'lib/inkmark.rb', line 368
    def default_options=(value)
      @default_options = case value
      when Inkmark::Options then value.dup
      when Hash then Inkmark::Options.new(value)
      else
        raise TypeError, "default_options must be a Hash or Inkmark::Options, got #{value.class}"
      end
    end
.highlight_css(theme: nil) ⇒ String
Return the CSS stylesheet for syntax-highlighted code blocks. Pair this with syntax_highlight: true in the rendering options.
    # File 'lib/inkmark.rb', line 343
    def highlight_css(theme: nil)
      _syntax_css(theme)
    end
.highlight_themes ⇒ Array<String>
Return an array of available syntax-highlighting theme names. Memoized—the theme list is fixed at compile time.
    # File 'lib/inkmark.rb', line 351
    def highlight_themes
      @highlight_themes ||= _syntax_themes.freeze
    end
.normalize_truncate_params(params) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Normalize and validate truncation params coming from either the truncate_markdown kwargs or the chunks_by_heading truncate: kwarg. Accepts a Hash with :chars/:words/:at/:marker keys, or positional kwargs (collected by the caller into a Hash). Returns a Hash ready to hand to the native side.
    # File 'lib/inkmark.rb', line 254
    def normalize_truncate_params(params)
      if params.respond_to?(:to_hash)
        params = params.to_hash
      end
      unless params.is_a?(Hash)
        raise TypeError, "truncate must be a Hash, got #{params.class}"
      end

      unknown = params.keys - [:chars, :words, :at, :marker]
      unless unknown.empty?
        raise ArgumentError,
          "unknown truncate key(s): #{unknown.inspect}; " \
          "expected :chars, :words, :at, :marker"
      end

      chars = params[:chars]
      words = params[:words]
      at = params.fetch(:at, :block)
      marker = params.fetch(:marker, "…")

      if chars.nil? && words.nil?
        raise ArgumentError, "truncate requires at least one of :chars or :words"
      end
      if chars && !chars.is_a?(Integer)
        raise ArgumentError, ":chars must be an Integer, got #{chars.class}"
      end
      if words && !words.is_a?(Integer)
        raise ArgumentError, ":words must be an Integer, got #{words.class}"
      end
      unless %i[block word].include?(at)
        raise ArgumentError, ":at must be :block or :word, got #{at.inspect}"
      end
      unless marker.nil? || marker.is_a?(String)
        raise ArgumentError, ":marker must be a String or nil, got #{marker.class}"
      end
      if marker && chars && marker.length >= chars
        raise ArgumentError,
          ":marker (#{marker.length} chars) must be shorter than :chars budget (#{chars})"
      end

      {chars: chars, words: words, at: at.to_s, marker: marker}
    end
.normalize_window_params(chars:, words:, overlap:, at:) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Validate sliding-window chunking params. Keeps chunks_by_size tight by raising on obvious misconfiguration rather than silent clamping — invalid overlap or missing budget is almost always a swapped-arg bug.
    # File 'lib/inkmark.rb', line 301
    def normalize_window_params(chars:, words:, overlap:, at:)
      if chars.nil? && words.nil?
        raise ArgumentError, "chunks_by_size requires at least one of :chars or :words"
      end
      if chars && !chars.is_a?(Integer)
        raise ArgumentError, ":chars must be an Integer, got #{chars.class}"
      end
      if words && !words.is_a?(Integer)
        raise ArgumentError, ":words must be an Integer, got #{words.class}"
      end
      if chars && chars <= 0
        raise ArgumentError, ":chars must be positive, got #{chars}"
      end
      if words && words <= 0
        raise ArgumentError, ":words must be positive, got #{words}"
      end
      unless overlap.is_a?(Integer)
        raise ArgumentError, ":overlap must be an Integer, got #{overlap.class}"
      end
      if overlap < 0
        raise ArgumentError, ":overlap must be non-negative, got #{overlap}"
      end
      if chars && overlap >= chars
        raise ArgumentError,
          ":overlap (#{overlap}) must be less than :chars budget (#{chars})"
      end
      unless %i[block word].include?(at)
        raise ArgumentError, ":at must be :block or :word, got #{at.inspect}"
      end

      {chars: chars, words: words, overlap: overlap, at: at.to_s}
    end
.to_html(source, options: nil) ⇒ String
Render source markdown to HTML in one call.
This is a class-method fast path that skips Inkmark instance and Options copy allocation for the common one-shot render pattern. When the caller passes options: nil (the default), we reuse the cached frozen hash that Inkmark::Options#to_native_hash_frozen returns; the cache lives on the Options instance itself and is invalidated by the Options mutation methods, so Inkmark.default_options.tables = false followed by Inkmark.to_html(src) picks up the new value without stale-cache bugs.
**Raw HTML safety.** raw_html: false (the default) escapes every raw HTML tag in the source—safe for untrusted input. Enable raw_html: true (or preset: :trusted) only for content you fully trust, and run the output through a dedicated HTML sanitizer before displaying it.
    # File 'lib/inkmark.rb', line 90
    def to_html(source, options: nil)
      source = source.to_s
      return "" if source.empty?
      _native_to_html(source, ())
    end
.to_markdown(source, options: nil) ⇒ String
Render source markdown through the filter pipeline and serialize back to Markdown text.
The same event-level filters as to_html are applied (emoji expansion, allowlists, autolink, etc.), then the event stream is serialized back to Markdown using pulldown-cmark-to-cmark. Use this as a preprocessing step in pipelines that consume Markdown: LLM prompts, secondary renderers, content storage.
HTML-emitting filters (syntax_highlight, images: { lazy: true }, links: { nofollow: true }) embed raw HTML verbatim in the Markdown output when enabled. That is valid CommonMark but may break downstream consumers. See the “Markdown-to-Markdown pipeline” section in the README.
    # File 'lib/inkmark.rb', line 114
    def to_markdown(source, options: nil)
      source = source.to_s
      return "" if source.empty?
      _native_to_markdown(source, ())
    end
.to_plain_text(source, options: nil) ⇒ String
Render source through the filter pipeline and serialize to plain text. Markdown syntax (emphasis, headings, list bullets, fences) is stripped; inline content is preserved. Links become “text (url)”; images become “alt (src)”; tables are tab-separated; code blocks keep their raw body.
Designed as a preprocessor for embedding models, token counting, LLM input, and any downstream consumer that treats Markdown syntax as noise.
    # File 'lib/inkmark.rb', line 241
    def to_plain_text(source, options: nil)
      source = source.to_s
      return "" if source.empty?
      _native_to_plain_text(source, ())
    end
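The link and image rules described above ("text (url)" and "alt (src)") can be approximated with two regular expressions. This is only a sketch of the output shape: approximate_plain_text is a hypothetical helper, Inkmark itself works on parser events rather than regexes, and the patterns ignore link titles, reference-style links, and nested brackets.

```ruby
# Regex approximation of two plain-text rules: images become "alt (src)"
# and inline links become "text (url)".
def approximate_plain_text(markdown)
  markdown
    .gsub(/!\[([^\]]*)\]\(([^)\s]+)\)/, '\1 (\2)') # images first, so the "!" is consumed
    .gsub(/\[([^\]]+)\]\(([^)\s]+)\)/, '\1 (\2)')  # then ordinary inline links
end

sample = "See [the docs](https://example.com/docs) and ![logo](logo.png)."
plain = approximate_plain_text(sample)
puts plain
```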
.truncate_markdown(source, chars: nil, words: nil, at: :block, marker: "…", options: nil) ⇒ String
Truncate a Markdown document to fit a char and/or word budget. Returns filter-applied Markdown cut at either the last block boundary that fits (at: :block) or the last Unicode word boundary that fits (at: :word).
Designed as a preprocessing step for LLM context-window budgeting and RAG chunk normalization. The marker (default “…”) is appended only when truncation actually occurred and counts toward the budget, so chars: 4000 always yields output ≤ 4000 codepoints.
    # File 'lib/inkmark.rb', line 218
    def truncate_markdown(source, chars: nil, words: nil, at: :block, marker: "…", options: nil)
      source = source.to_s
      return "" if source.empty?

      params = normalize_truncate_params(
        chars: chars, words: words, at: at, marker: marker
      )
      _native_truncate_markdown(source, params, ())
    end
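The rule that the marker counts toward the budget can be sketched in plain Ruby. The helper below (truncate_at_word, a hypothetical name) truncates plain text at a whitespace boundary; it is not Inkmark's native truncation, it uses whitespace as a stand-in for Unicode word boundaries, and it knows nothing about Markdown blocks.

```ruby
# Illustrative only: truncate plain text so that the result, marker
# included, never exceeds the character budget.
def truncate_at_word(text, chars:, marker: "…")
  return text if text.length <= chars # fits already, so no marker is added

  room = chars - marker.length # space left for text once the marker is reserved
  raise ArgumentError, "marker must be shorter than the budget" if room <= 0

  head = text[0, room]
  # Back up to the last whitespace unless the cut already fell between words.
  unless text[room] =~ /\s/
    cut = head.rindex(/\s/)
    head = head[0, cut] if cut
  end
  head.rstrip + marker
end

short = truncate_at_word("alpha beta gamma delta", chars: 12)
puts short
puts short.length
```

Reserving room for the marker before cutting is what keeps the final length within the budget.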
Instance Method Details
#chunks_by_heading(truncate: nil) ⇒ Array<Hash>
Chunk the document by heading into an Array of section Hashes, with filter-applied Markdown content. See Inkmark.chunks_by_heading for the output shape.
    # File 'lib/inkmark.rb', line 575
    def chunks_by_heading(truncate: nil)
      return [] if @source.empty?
      opts_hash = @options.to_native_hash_frozen.dup
      opts_hash[:truncate] = Inkmark.normalize_truncate_params(truncate) if truncate
      Inkmark._native_chunks_by_heading(@source, opts_hash)
    end
#chunks_by_size(chars: nil, words: nil, overlap: 0, at: :block) ⇒ Array<Hash>
Split the stored document into sliding-window chunks. See Inkmark.chunks_by_size for the full parameter contract.
    # File 'lib/inkmark.rb', line 587
    def chunks_by_size(chars: nil, words: nil, overlap: 0, at: :block)
      return [] if @source.empty?
      opts_hash = @options.to_native_hash_frozen.dup
      opts_hash[:__window] = Inkmark.normalize_window_params(
        chars: chars, words: words, overlap: overlap, at: at
      )
      Inkmark._native_chunks_by_size(@source, opts_hash)
    end
#extracts ⇒ Hash?
Return structured extracts for the element kinds requested via extract: { … }, or nil when no kinds were requested.
The returned Hash is keyed by the same symbols you passed in (:images, :links, :code_blocks, :headings, :footnote_definitions); each value is an Array of record Hashes including a :byte_range Range for slicing the original source.
toc: true auto-enables extract[:headings]—the heading walk is shared, so you get the structured view for free.
Collected during #to_html as a side-effect of the single-pass render. Calling this before to_html triggers the render.
    # File 'lib/inkmark.rb', line 666
    def extracts
      return nil unless extract_requested?
      to_html unless @extracts_data
      @extracts_data
    end
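Since :byte_range is a byte Range, slicing the source with String#byteslice rather than String#[] matters once the document contains multibyte characters. The sketch below builds a hypothetical record by hand; only the :byte_range key comes from the description above, while the :href key and the assumption that the range spans the whole link markup are invented for illustration.

```ruby
source = "# Título\n\nSee [docs](https://example.com).\n"

# Hypothetical record shaped like one entry of extracts[:links].
# The byte range is computed here by hand, not returned by Inkmark.
link_markup = "[docs](https://example.com)"
start = source.b.index(link_markup.b) # byte offset, not character offset
record = { href: "https://example.com", byte_range: start...(start + link_markup.bytesize) }

# byteslice interprets the Range as byte offsets.
by_bytes = source.byteslice(record[:byte_range])

# String#[] interprets the same Range as character offsets, which drifts
# as soon as the preceding text contains a multibyte character ("í").
by_chars = source[record[:byte_range]]

puts by_bytes
puts by_chars
```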
#frontmatter ⇒ Hash?
Return the parsed frontmatter as a Hash, or nil when the document has no frontmatter block or the frontmatter option is not enabled.
The raw YAML text is extracted by Rust during the event walk; parsing uses Ruby’s stdlib YAML.safe_load so all standard YAML types (strings, numbers, arrays, nested hashes) are supported.
    # File 'lib/inkmark.rb', line 684
    def frontmatter
      return @frontmatter if defined?(@frontmatter)
      return @frontmatter = nil unless @options[:frontmatter]
      to_html unless @frontmatter_raw
      @frontmatter = @frontmatter_raw ? YAML.safe_load(@frontmatter_raw) : nil
    end
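To show what the YAML.safe_load step yields, the sketch below parses a hand-written frontmatter body with Ruby's standard library. The raw text is a made-up example of what might sit between the frontmatter fences; no Inkmark call is involved.

```ruby
require "yaml"

# Hypothetical raw frontmatter text (without the surrounding fences).
raw = <<~YAML
  title: Release notes
  draft: false
  tags:
    - ruby
    - markdown
  author:
    name: Ada
YAML

meta = YAML.safe_load(raw)

puts meta["title"]
puts meta["tags"].inspect
puts meta.dig("author", "name")
```

Note that safe_load returns String keys by default, so lookups use meta["title"] rather than meta[:title].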
#on(kind) {|event| ... } ⇒ self
Register a handler block for a document element kind.
The block receives an Event object when an element of kind is encountered. Handlers fire post-order (children before parents), so container elements (tables, blockquotes, lists) see their children populated when the handler runs.
Multiple handlers for the same kind are supported and fire in registration order. Returns self for chaining.
Trigger handlers by calling #to_html (render + transform) or #walk (analysis only, no HTML output).
    # File 'lib/inkmark.rb', line 490
    def on(kind, &block)
      (@handlers ||= {})[kind.to_sym] ||= []
      @handlers[kind.to_sym] << block
      self
    end
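The registration behaviour (several handlers per kind, fired in registration order, chainable through the returned self) can be tried in isolation with a small stand-in class. HandlerRegistry and its fire method are hypothetical; in Inkmark the native walk dispatches events, and the event passed here is a plain String rather than an Event object.

```ruby
# Minimal stand-in for the registration half of #on.
class HandlerRegistry
  def initialize
    @handlers = nil
  end

  # Same shape as the listing above: normalize the kind to a Symbol and
  # append the block, creating the per-kind Array on first use.
  def on(kind, &block)
    (@handlers ||= {})[kind.to_sym] ||= []
    @handlers[kind.to_sym] << block
    self
  end

  # Hypothetical dispatcher used only for this demonstration.
  def fire(kind, event)
    (@handlers || {}).fetch(kind.to_sym, []).each { |handler| handler.call(event) }
    self
  end
end

seen = []
registry = HandlerRegistry.new
registry
  .on(:heading) { |event| seen << "first: #{event}" }
  .on("heading") { |event| seen << "second: #{event}" }
registry.fire(:heading, "Intro")

puts seen.inspect
```

Because kind.to_sym is applied on registration, :heading and "heading" land in the same handler list.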
#statistics ⇒ Hash?
Return the collected document statistics as a Hash, or nil when neither statistics nor toc is enabled.
When statistics: true, the full hash includes language detection, character/word counts, code block count, and image/link arrays. When only toc: true, a lightweight hash with heading_count is returned.
Collected during #to_html. Calling this before to_html triggers the render.
    # File 'lib/inkmark.rb', line 641
    def statistics
      return nil unless @options[:statistics] || @options[:toc]
      to_html unless @statistics_data
      @statistics_data
    end
#to_html ⇒ String
Render the stored source to HTML using the stored options.
When statistics: true or toc: true is set, the render uses a single-pass entry point that also collects stats and TOC data as side-effects (set as instance variables by the Rust side). Call #statistics or #toc after to_html to read the collected data.
    # File 'lib/inkmark.rb', line 521
    def to_html
      return "" if @source.empty?
      if @handlers
        Inkmark._native_render_with_handlers(@source, @options.to_native_hash_frozen, @handlers)
      elsif @options[:statistics] || @options[:toc] || @options[:frontmatter] || extract_requested?
        result = Inkmark._native_render_full(@source, @options.to_native_hash_frozen)
        @toc_value = if result[:toc] || result[:toc_html]
          Inkmark::Toc.new(markdown: result[:toc] || "", html: result[:toc_html] || "")
        end
        @statistics_data = result[:statistics]
        @extracts_data = result[:extracts]
        @frontmatter_raw = result[:frontmatter]
        result[:html]
      else
        Inkmark._native_to_html(@source, @options.to_native_hash_frozen)
      end
    end
#to_markdown ⇒ String
Apply the filter pipeline and serialize back to Markdown text.
Runs the same event-level filters as #to_html (controlled by the same options object), then serializes the event stream to Markdown. Useful as a preprocessing step in LLM or multi-renderer pipelines.
HTML-emitting filters (syntax_highlight, images: { lazy: true }, links: { nofollow: true }) embed raw HTML in the output when enabled—see the “Markdown-to-Markdown pipeline” section in the README for guidance on which filters to enable.
    # File 'lib/inkmark.rb', line 551
    def to_markdown
      return "" if @source.empty?
      Inkmark._native_to_markdown(@source, @options.to_native_hash_frozen)
    end
#to_plain_text ⇒ String
Serialize the parsed document to plain text. Runs the same event-level filters as #to_html (controlled by the same options object). See Inkmark.to_plain_text for output format details.
    # File 'lib/inkmark.rb', line 561
    def to_plain_text
      return "" if @source.empty?
      Inkmark._native_to_plain_text(@source, @options.to_native_hash_frozen)
    end
#to_s ⇒ String
Coerce the renderer to a String by returning the stored source. Mirrors the wrapper idiom used by Pathname, URI, etc.: the stringified form of the wrapper is its carried value. Explicit renderings (HTML, Markdown, plain text) are available via #to_html, #to_markdown, #to_plain_text, and #chunks_by_heading.
    # File 'lib/inkmark.rb', line 441
    def to_s
      @source
    end
#toc ⇒ Inkmark::Toc?
Return the table of contents as a Toc value object, exposing #to_markdown / #to_html / #to_s (markdown). Returns nil when no TOC was requested (neither toc, statistics, nor extract: { headings: true } is set).
Collected during #to_html as a side-effect of the single-pass render. If to_html hasn’t been called yet, calling this triggers it.
    # File 'lib/inkmark.rb', line 623
    def toc
      return nil unless toc_surface_requested?
      to_html unless defined?(@toc_value) && @toc_value
      @toc_value
    end
#truncate_markdown(chars: nil, words: nil, at: :block, marker: "…") ⇒ String
Truncate the stored document. See Inkmark.truncate_markdown for the full parameter contract.
    # File 'lib/inkmark.rb', line 601
    def truncate_markdown(chars: nil, words: nil, at: :block, marker: "…")
      return "" if @source.empty?
      params = Inkmark.normalize_truncate_params(
        chars: chars, words: words, at: at, marker: marker
      )
      Inkmark._native_truncate_markdown(@source, params, @options.to_native_hash_frozen)
    end
#walk ⇒ self
Walk the document, firing all registered handlers, without producing HTML output. Use this for analysis—collecting headings, extracting links, building a TOC—when you don’t need to render.
Returns self.
    # File 'lib/inkmark.rb', line 507
    def walk
      return self if @source.empty?
      Inkmark._native_walk(@source, @options.to_native_hash_frozen, @handlers || {})
      self
    end