Module: Coradoc

Extended by:
Configurable
Defined in:
lib/coradoc/coradoc.rb,
lib/coradoc.rb,
lib/coradoc/cli.rb,
lib/coradoc/hooks.rb,
lib/coradoc/input.rb,
lib/coradoc/query.rb,
lib/coradoc/errors.rb,
lib/coradoc/logger.rb,
lib/coradoc/output.rb,
lib/coradoc/version.rb,
lib/coradoc/visitor.rb,
lib/coradoc/registry.rb,
lib/coradoc/transform.rb,
lib/coradoc/core_model.rb,
lib/coradoc/validation.rb,
lib/coradoc/configurable.rb,
lib/coradoc/format_module.rb,
lib/coradoc/core_model/toc.rb,
lib/coradoc/core_model/base.rb,
lib/coradoc/core_model/term.rb,
lib/coradoc/core_model/block.rb,
lib/coradoc/core_model/image.rb,
lib/coradoc/core_model/table.rb,
lib/coradoc/document_builder.rb,
lib/coradoc/core_model/builder.rb,
lib/coradoc/processor_registry.rb,
lib/coradoc/core_model/footnote.rb,
lib/coradoc/core_model/metadata.rb,
lib/coradoc/serializer/registry.rb,
lib/coradoc/core_model/list_item.rb,
lib/coradoc/document_manipulator.rb,
lib/coradoc/core_model/list_block.rb,
lib/coradoc/core_model/bibliography.rb,
lib/coradoc/core_model/toc_generator.rb,
lib/coradoc/core_model/inline_element.rb,
lib/coradoc/core_model/definition_item.rb,
lib/coradoc/core_model/definition_list.rb,
lib/coradoc/core_model/annotation_block.rb,
lib/coradoc/core_model/children_content.rb,
lib/coradoc/core_model/builder/detection.rb,
lib/coradoc/core_model/element_attribute.rb,
lib/coradoc/core_model/bibliography_entry.rb,
lib/coradoc/core_model/structural_element.rb,
lib/coradoc/core_model/builder/list_builder.rb,
lib/coradoc/core_model/builder/text_builder.rb,
lib/coradoc/core_model/builder/block_builder.rb,
lib/coradoc/core_model/builder/element_builder.rb

Overview

Coradoc - A hub-and-spoke document transformation library

Coradoc provides a unified document model (CoreModel) and transformation infrastructure for converting between document formats such as AsciiDoc, HTML, and Markdown.

## Architecture

Coradoc uses a hub-and-spoke architecture where CoreModel acts as the canonical document representation. Each format (AsciiDoc, HTML, Markdown) has its own model and transformers to/from CoreModel.

Source Format → Source Model → CoreModel → Target Model → Target Format
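
The hub-and-spoke flow above can be sketched with two toy formats. This is a minimal illustration, not Coradoc's implementation: `MiniCore`, `ToyAsciiDoc`, `ToyHtml`, `TOY_FORMATS`, and `toy_convert` are hypothetical names, and only the method names `parse_to_core` and `serialize` mirror calls that appear in the source listings further down. The toy HTML writer does no escaping.

```ruby
# Toy stand-ins for illustration only; none of these names are Coradoc's API.
MiniCore = Struct.new(:title, :paragraphs)

module ToyAsciiDoc
  # Spoke -> hub: read "= Title" plus blank-line-separated paragraphs.
  def self.parse_to_core(text)
    blocks = text.split(/\n{2,}/).map(&:strip).reject(&:empty?)
    title = blocks.first&.start_with?('= ') ? blocks.shift.delete_prefix('= ') : nil
    MiniCore.new(title, blocks)
  end

  # Hub -> spoke: write the hub model back out as text.
  def self.serialize(core)
    parts = []
    parts << "= #{core.title}" if core.title
    parts.concat(core.paragraphs)
    parts.join("\n\n")
  end
end

module ToyHtml
  # Hub -> spoke only; this toy format cannot parse.
  def self.serialize(core)
    html = +''
    html << "<h1>#{core.title}</h1>" if core.title
    core.paragraphs.each { |para| html << "<p>#{para}</p>" }
    html
  end
end

TOY_FORMATS = { asciidoc: ToyAsciiDoc, html: ToyHtml }.freeze

# Every conversion passes through the hub model.
def toy_convert(text, from:, to:)
  core = TOY_FORMATS.fetch(from).parse_to_core(text)
  TOY_FORMATS.fetch(to).serialize(core)
end

puts toy_convert("= Title\n\nContent", from: :asciidoc, to: :html)
```

Because each format only talks to the hub model, adding a format means writing one reader and one writer rather than a converter for every existing format.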

## Quick Start

Examples:

Parsing documents

require 'coradoc'

# Parse AsciiDoc to CoreModel
doc = Coradoc.parse("= Title\n\nContent", format: :asciidoc)

Converting between formats

# Convert AsciiDoc to HTML
html = Coradoc.convert(adoc_text, from: :asciidoc, to: :html)

# Convert Markdown to AsciiDoc
adoc = Coradoc.convert(md_text, from: :markdown, to: :asciidoc)

Using the hooks system

Coradoc::Hooks.register(:before_parse) do |content, format:|
  puts "Parsing #{format} document..."
  content
end
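
The handler above returns `content` because `parse` (listed later on this page) assigns the result of `Hooks.invoke(:before_parse, ...)` back to the text it parses. The sketch below shows one way such value threading could work. `MiniHooks` is a hypothetical stand-in, and running several handlers in registration order is an assumption, not documented behaviour of `Coradoc::Hooks`.

```ruby
# Hypothetical stand-in for a hook registry; not Coradoc::Hooks itself.
module MiniHooks
  @handlers = Hash.new { |hash, key| hash[key] = [] }

  def self.register(event, &block)
    @handlers[event] << block
  end

  # Pass the value through each handler in registration order (assumed).
  def self.invoke(event, value, **context)
    @handlers[event].reduce(value) { |acc, handler| handler.call(acc, **context) }
  end
end

# Normalize line endings for every format.
MiniHooks.register(:before_parse) { |content, format:| content.gsub("\r\n", "\n") }
# Trim surrounding whitespace for AsciiDoc input only.
MiniHooks.register(:before_parse) { |content, format:| format == :asciidoc ? content.strip : content }

cleaned = MiniHooks.invoke(:before_parse, "  = Title\r\n\r\nContent\r\n", format: :asciidoc)
puts cleaned
```

A handler that forgets to return the content would hand `nil` to the next stage, which is why the example in the Quick Start ends with `content`.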

Defined Under Namespace

Modules: Configurable, CoreModel, FormatModule, Hooks, Input, Output, ProcessorRegistry, Query, Serializer, Transform, Validation, Visitor

Classes: CLI, DocumentBuilder, DocumentManipulator, Error, FileNotFoundError, Logger, ParseError, Registry, TransformationError, UnsupportedFormatError, ValidationError

Constant Summary

ERROR_SUGGESTIONS =

Suggestion patterns for common parsing errors

These patterns are matched against error messages and source content to provide helpful suggestions for fixing common issues.

[
  {
    pattern: /unterminated.*string|unexpected.*end.*of.*input|expected.*["']/i,
    suggestion: 'Check for unclosed quotes or strings',
    examples: ["'text'", '"text"']
  },
  {
    pattern: /unexpected.*indentation|indentation.*error|inconsistent.*indent/i,
    suggestion: 'Check indentation - use consistent spaces or tabs',
    examples: ['  indented line', '    nested item']
  },
  {
    pattern: /missing.*separator|expected.*delimiter|missing.*comma/i,
    suggestion: 'Add missing separator between elements',
    examples: ['item1, item2', 'key: value']
  },
  {
    pattern: /invalid.*attribute|unknown.*attribute|attribute.*not.*allowed/i,
    suggestion: 'Check attribute spelling and allowed values',
    examples: ['[role=example]', '[source,ruby]']
  },
  {
    pattern: /invalid.*heading|heading.*level|expected.*heading/i,
    suggestion: 'Use valid heading syntax with = or # markers',
    examples: ['= Level 1', '== Level 2', '### Level 3']
  },
  {
    pattern: /invalid.*list|list.*marker|expected.*list.*item/i,
    suggestion: 'Use correct list markers (*, -, ., or numbered)',
    examples: ['* bullet', '. ordered', 'term:: definition']
  },
  {
    pattern: /invalid.*link|malformed.*url|link.*syntax/i,
    suggestion: 'Use correct link syntax: text[url] or link:url[]',
    examples: ['Google[https://google.com]', 'link:file.adoc[]']
  },
  {
    pattern: /invalid.*table|table.*delimiter|expected.*separator/i,
    suggestion: 'Check table syntax with | delimiters',
    examples: ["|===\n| Cell 1 | Cell 2\n|==="]
  },
  {
    pattern: /invalid.*block|block.*delimiter|unterminated.*block/i,
    suggestion: 'Ensure block delimiters match (----, ****, ====, etc.)',
    examples: ["----\ncode\n----", "====\nexample\n===="]
  },
  {
    pattern: /invalid.*macro|unknown.*macro|macro.*syntax/i,
    suggestion: 'Check macro syntax: name:target[attributes]',
    examples: ['include::file.adoc[]', 'image::image.png[]']
  }
].freeze
VERSION =
'2.0.1'
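
The description of ERROR_SUGGESTIONS says its patterns are matched against error messages. A minimal sketch of such a lookup follows, using two entries copied from the table. The helper `suggestions_for` and the constant `SAMPLE_SUGGESTIONS` are hypothetical; how Coradoc itself consumes the table is not shown on this page.

```ruby
# Two entries copied from ERROR_SUGGESTIONS; the lookup helper is hypothetical.
SAMPLE_SUGGESTIONS = [
  {
    pattern: /invalid.*block|block.*delimiter|unterminated.*block/i,
    suggestion: 'Ensure block delimiters match (----, ****, ====, etc.)',
    examples: ["----\ncode\n----", "====\nexample\n===="]
  },
  {
    pattern: /invalid.*macro|unknown.*macro|macro.*syntax/i,
    suggestion: 'Check macro syntax: name:target[attributes]',
    examples: ['include::file.adoc[]', 'image::image.png[]']
  }
].freeze

# Return the suggestion text of every entry whose pattern matches the message.
def suggestions_for(message, table = SAMPLE_SUGGESTIONS)
  table.select { |entry| entry[:pattern].match?(message) }
       .map { |entry| entry[:suggestion] }
end

puts suggestions_for('Unterminated listing block at line 12')
```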

Class Method Summary

Methods included from Configurable

load_configuration, reset_configuration!

Class Method Details

.binary_format?(format) ⇒ Boolean

Check if a format requires binary (file path) input

Parameters:

  • format (Symbol)

    the format to check

Returns:

  • (Boolean)

    true if the format is binary



# File 'lib/coradoc/coradoc.rb', line 258

def binary_format?(format)
  opts = registry.options_for(format)
  opts&.fetch(:binary, false) == true
end

.build(&block) ⇒ Object



# File 'lib/coradoc/document_builder.rb', line 191

def self.build(&block)
  DocumentBuilder.build(&block)
end

.config ⇒ Configuration

Shortcut to configuration

Returns:

  • (Configuration)



# File 'lib/coradoc/configurable.rb', line 512

def self.config
  Configurable.configuration
end

.configure {|Configuration| ... } ⇒ void

This method returns an undefined value.

Shortcut to configure

Yields:

  • (Configuration)



# File 'lib/coradoc/configurable.rb', line 520

def self.configure(&block)
  Configurable.configure(&block) if block_given?
end

.convert(text, from:, to:, **options) ⇒ String

Convert document text from one format to another

This is the main entry point for format conversion. It handles the complete pipeline: parse -> transform to CoreModel -> transform to target -> serialize

Examples:

Convert AsciiDoc to HTML

html = Coradoc.convert(adoc_text, from: :asciidoc, to: :html)

Convert HTML to AsciiDoc

adoc = Coradoc.convert(html_text, from: :html, to: :asciidoc)

Parameters:

  • text (String)

    the source document text

  • from (Symbol)

    the source format (:asciidoc, :html, :markdown)

  • to (Symbol)

    the target format (:asciidoc, :html, :markdown)

  • options (Hash)

    additional options for the conversion

Returns:

  • (String)

    the converted document text

Raises:

  • (UnsupportedFormatError)

    if the source or target format is not registered



# File 'lib/coradoc/coradoc.rb', line 135

def convert(text, from:, to:, **options)
  # Parse to CoreModel
  core = parse(text, format: from)

  # Convert to target format
  serialize(core, to: to, **options)
end

.convert_file(path, from: nil, to:, **options) ⇒ String

Convert a file from one format to another

Examples:

html = Coradoc.convert_file("document.adoc", to: :html)
adoc = Coradoc.convert_file("report.docx", to: :asciidoc)

Parameters:

  • path (String)

    path to the source document file

  • from (Symbol, nil) (defaults to: nil)

    source format (auto-detected if nil)

  • to (Symbol)

    target format

  • options (Hash)

    additional options

Returns:

  • (String)

    the converted document text

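Raises:

  • (UnsupportedFormatError)

    if the source format cannot be detected, or a format is not registered

  • (FileNotFoundError)

    if the file does not exist (raised by parse_file)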



# File 'lib/coradoc/coradoc.rb', line 246

def convert_file(path, from: nil, to:, **options)
  source_format = from || detect_format(path)
  raise UnsupportedFormatError, "Could not detect format for: #{path}" unless source_format

  core = parse_file(path, format: source_format)
  serialize(core, to: to, **options)
end

.describe_element(elem) ⇒ String

Describe an element for display

Parameters:

  • elem (Object)

    element to describe

Returns:

  • (String)

    human-readable description



# File 'lib/coradoc/coradoc.rb', line 378

def describe_element(elem)
  return elem.to_s unless elem.is_a?(CoreModel::Base)

  type = elem.class.name.split('::').last
  if elem.respond_to?(:title) && elem.title
    "#{type}: #{elem.title}"
  elsif elem.respond_to?(:content) && elem.content
    preview = elem.content.to_s[0..50]
    preview += '...' if elem.content.to_s.length > 50
    "#{type}: #{preview}"
  else
    type
  end
end
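
The preview branch of `describe_element` is easy to misread, so here is its truncation logic isolated on a plain string. `preview_of` is a hypothetical helper written for this illustration; the two statements inside it follow the listing above.

```ruby
# Mirrors the preview logic of describe_element on a plain string.
def preview_of(content)
  text = content.to_s
  preview = text[0..50]                 # inclusive range: up to 51 characters
  preview += '...' if text.length > 50  # ellipsis once the text exceeds 50
  preview
end

puts preview_of('short text')
puts preview_of('x' * 60).length
```

Since `0..50` is an inclusive range, the preview keeps 51 characters, so a string of exactly 51 characters is shown in full and still gets the trailing ellipsis.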

.detect_format(filename) ⇒ Symbol?

Detect format from a file extension

Examples:

Coradoc.detect_format("document.adoc")  # => :asciidoc
Coradoc.detect_format("file.md")        # => :markdown

Parameters:

  • filename (String)

    Filename or extension to detect

Returns:

  • (Symbol, nil)

    the detected format symbol



# File 'lib/coradoc/coradoc.rb', line 196

def detect_format(filename)
  ext = File.extname(filename).downcase
  registry.each do |name, _mod|
    opts = registry.options_for(name)
    return name if opts[:extensions]&.include?(ext)
  end
  nil
end
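
The lookup in `detect_format` can be tried without any registered formats by swapping the registry for a plain hash. The extension table below is an assumption made for this sketch (including the leading dots, chosen because `File.extname` returns them); the real values come from the options each format registers.

```ruby
# Hypothetical extension table; the real one lives in each format's options.
TOY_EXTENSIONS = {
  asciidoc: %w[.adoc .asciidoc],
  markdown: %w[.md .markdown],
  html: %w[.html .htm]
}.freeze

def toy_detect_format(filename)
  ext = File.extname(filename).downcase # includes the leading dot, e.g. ".adoc"
  TOY_EXTENSIONS.each do |name, extensions|
    return name if extensions.include?(ext)
  end
  nil
end

p toy_detect_format('README.MD')
p toy_detect_format('notes.txt')
```

One consequence of relying on `File.extname`: a bare extension such as `".adoc"` is treated as a dotfile with no extension, so this sketch returns `nil` for it.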

.document_stats(doc) ⇒ Hash

Gather statistics about a parsed document

Parameters:

  • doc

    the parsed document

Returns:

  • (Hash)

    statistics including element counts, title, etc.



# File 'lib/coradoc/coradoc.rb', line 361

def document_stats(doc)
  stats = {}

  stats[:title] = doc.title if doc.respond_to?(:title) && doc.title

  if doc.respond_to?(:children)
    stats[:child_count] = count_elements(doc)
    stats[:element_counts] = count_element_types(doc)
  end

  stats
end

.file_info(path) ⇒ Hash

Get file metadata for display

Parameters:

  • path (String)

    path to the file

Returns:

  • (Hash)

    metadata including :size, :format, and :lines (for text formats)



# File 'lib/coradoc/coradoc.rb', line 332

def file_info(path)
  fmt = detect_format(path)
  info = { size: File.size(path), format: fmt }
  info[:lines] = File.read(path).lines.count unless binary_format?(fmt)
  info
end

.format_capabilities ⇒ Hash<Symbol, Hash<Symbol, Boolean>>

Get capability summary for all registered formats

Returns a hash mapping each format name to its capabilities (parse: bool, serialize: bool). Useful for CLI display and introspection.

Returns:

  • (Hash<Symbol, Hash<Symbol, Boolean>>)


# File 'lib/coradoc/coradoc.rb', line 308

def format_capabilities
  registered_formats.each_with_object({}) do |name, caps|
    caps[name] = {
      parse: parse_format?(name),
      serialize: serialize_format?(name)
    }
  end
end
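
To make the shape of the returned hash concrete, the sketch below rebuilds the capability summary over two hypothetical format modules. The checks follow `parse_format?` and `serialize_format?` as listed on this page; `ReadWriteFormat`, `ReadOnlyFormat`, `TOY_REGISTRY`, and `toy_capabilities` are invented for the example.

```ruby
# Hypothetical format modules; only the probed method names come from the source.
module ReadWriteFormat
  def self.parse_to_core(text)
    text
  end

  def self.serialize(model, **_options)
    model.to_s
  end
end

module ReadOnlyFormat
  def self.parse_to_core(path)
    path
  end

  # Explicit opt-out, consulted before the "assume true" default.
  def self.serialize?
    false
  end
end

TOY_REGISTRY = { markdown: ReadWriteFormat, docx: ReadOnlyFormat }.freeze

def toy_capabilities(registry = TOY_REGISTRY)
  registry.each_with_object({}) do |(name, mod), caps|
    can_parse = mod.respond_to?(:parse_to_core) || mod.respond_to?(:parse)
    can_serialize = mod.respond_to?(:serialize?) ? mod.serialize? : true
    caps[name] = { parse: can_parse, serialize: can_serialize }
  end
end

p toy_capabilities
```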

.get_format(format_name) ⇒ Module?

Get a registered format

Parameters:

  • format_name (Symbol)

    the format name

Returns:

  • (Module, nil)

    the format module or nil if not found



# File 'lib/coradoc/coradoc.rb', line 78

def get_format(format_name)
  registry.get(format_name)
end

.manipulate(document) ⇒ DocumentManipulator

Create a DocumentManipulator for chainable operations

Examples:

Chainable document manipulation

html = Coradoc.manipulate(doc)
  .transform_text(&:upcase)
  .add_toc
  .to_html

Parameters:

  • document

    the document to manipulate

Returns:

  • (DocumentManipulator)



# File 'lib/coradoc/coradoc.rb', line 184

def manipulate(document)
  DocumentManipulator.new(document)
end

.normalize_format(name) ⇒ Symbol?

Normalize a format name string to a symbol

Handles common aliases like “adoc” → :asciidoc, “md” → :markdown.

Parameters:

  • name (String, Symbol, nil)

    the format name to normalize

Returns:

  • (Symbol, nil)

    the normalized format symbol, or nil



# File 'lib/coradoc/coradoc.rb', line 269

def normalize_format(name)
  return nil unless name

  key = name.to_s.downcase
  registry.each do |fmt_name, _mod|
    opts = registry.options_for(fmt_name)
    return fmt_name if opts[:aliases]&.include?(key)
  end
  key.to_sym
end
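
The alias lookup and its fall-through can be sketched with a plain hash in place of the registry. The alias lists are assumptions for illustration; the control flow follows the listing above.

```ruby
# Hypothetical alias table; real aliases come from each format's options.
TOY_ALIASES = {
  asciidoc: %w[adoc asciidoc],
  markdown: %w[md markdown]
}.freeze

def toy_normalize_format(name)
  return nil unless name

  key = name.to_s.downcase
  TOY_ALIASES.each do |format_name, aliases|
    return format_name if aliases.include?(key)
  end
  key.to_sym # unknown names fall through as a symbol
end

p toy_normalize_format('ADOC')
p toy_normalize_format('docx')
```

Note that an unrecognized name still comes back as a symbol rather than `nil`, so a caller that needs to know whether the format exists has to check separately, for example with `get_format`.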

.parse(text, format:) ⇒ Coradoc::CoreModel::Base, Object

Parse text to a document model

This is the main entry point for parsing documents. It automatically selects the appropriate parser based on the format.

Examples:

Parse AsciiDoc

doc = Coradoc.parse("= Title\n\nContent", format: :asciidoc)
doc = Coradoc.parse(File.read("doc.adoc"), format: :asciidoc)

Parse and get CoreModel

core = Coradoc.parse(text, format: :asciidoc)  # Returns CoreModel

Parameters:

  • text (String)

    the document text to parse

  • format (Symbol)

    the source format (:asciidoc, :html, :markdown)

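Returns:

  • (Coradoc::CoreModel::Base, Object)

    the parsed document model

Raises:

  • (UnsupportedFormatError)

    if the format is not registered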



# File 'lib/coradoc/coradoc.rb', line 105

def parse(text, format:)
  format_module = get_format(format)
  unless format_module
    raise UnsupportedFormatError,
          "Format '#{format}' is not registered. " \
          "Available formats: #{registered_formats.join(', ')}"
  end

  text = Hooks.invoke(:before_parse, text, format: format)
  result = format_module.parse_to_core(text)
  Hooks.invoke(:after_parse, result, format: format)
end

.parse_file(path, format: nil) ⇒ Coradoc::CoreModel::Base

Parse a document from a file path

Handles both text formats (reads file content) and binary formats (passes file path directly to the format module).

Examples:

doc = Coradoc.parse_file("document.adoc")
doc = Coradoc.parse_file("report.docx", format: :docx)

Parameters:

  • path (String)

    path to the document file

  • format (Symbol, nil) (defaults to: nil)

    source format (auto-detected if nil)

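Returns:

  • (Coradoc::CoreModel::Base)

    the parsed document

Raises:

  • (FileNotFoundError)

    if the file does not exist

  • (UnsupportedFormatError)

    if the format cannot be detected or is not registered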



# File 'lib/coradoc/coradoc.rb', line 218

def parse_file(path, format: nil)
  raise FileNotFoundError, path unless File.exist?(path)

  source_format = format || detect_format(path)
  raise UnsupportedFormatError, "Could not detect format for: #{path}" unless source_format

  format_module = get_format(source_format)
  raise UnsupportedFormatError, "Format '#{source_format}' is not registered" unless format_module

  if binary_format?(source_format)
    format_module.parse_to_core(path)
  else
    content = File.read(path)
    parse(content, format: source_format)
  end
end

.parse_format?(format) ⇒ Boolean

Check if a format supports parsing (reading input)

Parameters:

  • format (Symbol)

    the format to check

Returns:

  • (Boolean)

    true if the format can parse



# File 'lib/coradoc/coradoc.rb', line 297

def parse_format?(format)
  mod = get_format(format)
  mod&.respond_to?(:parse_to_core) || mod&.respond_to?(:parse) || false
end

.register_format(format_name, format_module, **options) ⇒ void

This method returns an undefined value.

Register a format gem

Parameters:

  • format_name (Symbol)

    the format name (e.g., :asciidoc, :html, :markdown)

  • format_module (Module)

    the format module

  • options (Hash)

    optional configuration (e.g., extensions: [])



# File 'lib/coradoc/coradoc.rb', line 69

def register_format(format_name, format_module, **options)
  registry.register(format_name, format_module, options)
  FormatModule.validate!(format_module, format_name)
end

.registered_formats ⇒ Array<Symbol>

List all registered formats

Returns:

  • (Array<Symbol>)

    list of registered format names



# File 'lib/coradoc/coradoc.rb', line 85

def registered_formats
  registry.list
end

.registry ⇒ Registry

Get the format registry

Returns:

  • (Registry)



# File 'lib/coradoc/coradoc.rb', line 59

def registry
  @registry ||= Registry.new
end

.resolve_output_format(output_file, default: :html) ⇒ Symbol

Resolve the output format from a filename, with a default

Parameters:

  • output_file (String, nil)

    output filename to detect from

  • default (Symbol) (defaults to: :html)

    format to use when output_file is nil or detection fails

Returns:

  • (Symbol)

    the resolved format



# File 'lib/coradoc/coradoc.rb', line 322

def resolve_output_format(output_file, default: :html)
  return default unless output_file

  detect_format(output_file) || default
end

.serialize(model, to:, **options) ⇒ String

Serialize a CoreModel to a specific format

Parameters:

  • model (Coradoc::CoreModel::Base)

    the CoreModel to serialize

  • to (Symbol)

    the target format

  • options (Hash)

    additional options

Returns:

  • (String)

    the serialized document

Raises:

  • (UnsupportedFormatError)

    if the target format is not registered



# File 'lib/coradoc/coradoc.rb', line 165

def serialize(model, to:, **options)
  format_module = get_format(to)
  raise UnsupportedFormatError, "Format '#{to}' is not registered" unless format_module

  model = Hooks.invoke(:before_serialize, model, format: to)
  result = format_module.serialize(model, **options)
  Hooks.invoke(:after_serialize, result, format: to)
end

.serialize_format?(format) ⇒ Boolean

Check if a format supports serialization (writing output)

Parameters:

  • format (Symbol)

    the format to check

Returns:

  • (Boolean)

    true if the format can serialize



# File 'lib/coradoc/coradoc.rb', line 284

def serialize_format?(format)
  mod = get_format(format)
  return false unless mod

  return mod.serialize? if mod.respond_to?(:serialize?)

  true
end

.strip_unicode(string, only: nil) ⇒ String

Strip Unicode space separators (\p{Zs}) from the ends of a string

Parameters:

  • string (String)

    the string to strip

  • only (Symbol, nil) (defaults to: nil)

    what to strip: :begin, :end, or nil for both

Returns:

  • (String)

    the stripped string



# File 'lib/coradoc/coradoc.rb', line 398

def strip_unicode(string, only: nil)
  return string if string.nil?

  case only
  when :begin
    string.sub(/^\p{Zs}+/, '')
  when :end
    string.sub(/\p{Zs}+$/, '')
  else
    string.sub(/^\p{Zs}+/, '').sub(/\p{Zs}+$/, '')
  end
end
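
The effect of `\p{Zs}` is easiest to see next to Ruby's built-in `String#strip`. The helper below reuses the two regular expressions from the listing above; its name, `strip_space_separators`, is made up for this sketch.

```ruby
# Same regular expressions as strip_unicode, wrapped in a standalone helper.
def strip_space_separators(string)
  string.sub(/^\p{Zs}+/, '').sub(/\p{Zs}+$/, '')
end

padded = "\u00A0\u3000text\u2003" # no-break space, ideographic space, em space

p strip_space_separators(padded) == 'text' # space separators are removed
p padded.strip == 'text'                   # String#strip leaves them in place
p strip_space_separators("\ttext")         # tab is not in \p{Zs}, so it stays
```

Two details follow from the patterns themselves. Tabs and newlines are control characters rather than space separators, so they are left alone. And because `^` and `$` are line anchors in Ruby, a multi-line string can lose the indentation of an inner line: `"a\n  b"` becomes `"a\nb"`.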

.to_core(model) ⇒ Coradoc::CoreModel::Base

Transform a model to CoreModel

Parameters:

  • model (Object)

    a format-specific model

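Returns:

  • (Coradoc::CoreModel::Base)

    the model itself if it is already a CoreModel, otherwise the transformed model

Raises:

  • (TransformationError)

    if no registered format handles the model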



# File 'lib/coradoc/coradoc.rb', line 147

def to_core(model)
  return model if model.is_a?(CoreModel::Base)

  registry.each_value do |format_module|
    next unless format_module.respond_to?(:handles_model?) && format_module.handles_model?(model)

    return format_module.to_core(model)
  end

  raise TransformationError, "No transformer found for #{model.class}"
end
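
The dispatch in `to_core` asks each registered format module whether it recognizes the model. A self-contained sketch of that pattern is shown below; every name in it (`ToyCore`, `ToyMarkdownDoc`, `ToyMarkdown`, `TOY_MODULES`, `toy_to_core`) is hypothetical, and `ArgumentError` stands in for `TransformationError`.

```ruby
# Hypothetical models and format module illustrating handles_model? dispatch.
ToyCore = Struct.new(:text)
ToyMarkdownDoc = Struct.new(:source)

module ToyMarkdown
  def self.handles_model?(model)
    model.is_a?(ToyMarkdownDoc)
  end

  def self.to_core(model)
    ToyCore.new(model.source.strip)
  end
end

TOY_MODULES = { markdown: ToyMarkdown }.freeze

def toy_to_core(model)
  return model if model.is_a?(ToyCore) # already the hub model

  TOY_MODULES.each_value do |format_module|
    next unless format_module.respond_to?(:handles_model?) && format_module.handles_model?(model)

    return format_module.to_core(model)
  end

  # ArgumentError stands in for Coradoc's TransformationError.
  raise ArgumentError, "No transformer found for #{model.class}"
end

p toy_to_core(ToyMarkdownDoc.new("  hello  "))
```

The first module that claims the model wins, so the result depends on registration order if two formats ever accept the same model class.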

.validate_file(path, format: nil) ⇒ Coradoc::Validation::Result

Validate a document file

Parses the file and validates against auto-generated schema. Returns a Coradoc::Validation::Result.

Parameters:

  • path (String)

    path to the document file

  • format (Symbol, nil) (defaults to: nil)

    source format (auto-detected if nil)

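Returns:

  • (Coradoc::Validation::Result)

    the validation result (an empty result when no schema is generated)

Raises:

  • (FileNotFoundError)

    if the file does not exist (raised by parse_file)

  • (UnsupportedFormatError)

    if the format cannot be detected or is not registered (raised by parse_file)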



# File 'lib/coradoc/coradoc.rb', line 348

def validate_file(path, format: nil)
  doc = parse_file(path, format: format)

  schema = Validation::SchemaGenerator.generate(doc.class)
  return schema.validate(doc) if schema

  Validation::Result.new
end