Class: Coradoc::AsciiDoc::Transformer

Inherits:
Parslet::Transform
  • Object
Defined in:
lib/coradoc/asciidoc/transformer.rb,
lib/coradoc/asciidoc/transformer/list_rules.rb,
lib/coradoc/asciidoc/transformer/misc_rules.rb,
lib/coradoc/asciidoc/transformer/text_rules.rb,
lib/coradoc/asciidoc/transformer/block_rules.rb,
lib/coradoc/asciidoc/transformer/header_rules.rb,
lib/coradoc/asciidoc/transformer/inline_rules.rb,
lib/coradoc/asciidoc/transformer/structural_rules.rb

Overview

Parslet::Transform subclass that converts AST to AsciiDoc model objects.

This transformer uses a modular rule system where each group of rules is defined in a separate file for maintainability.

Rule modules (each autoloaded):

  • HeaderRules: Document header, author, revision

  • InlineRules: Inline formatting (bold, italic, etc.)

  • TextRules: Text elements and paragraphs

  • BlockRules: Block elements (example, admonition, etc.)

  • ListRules: List items and list types

  • StructuralRules: Sections, tables, documents

  • MiscRules: Comments, attributes, media elements

Defined Under Namespace

Modules: BlockRules, HeaderRules, InlineRules, ListRules, MiscRules, StructuralRules, TextRules

Class Method Details

.build_table_cell(format, content) ⇒ Model::TableCell

Helper method for building table cells with format specification

Parameters:

  • format (Hash, String, Object)

    Cell format specification from parser

  • content (Object)

    Cell content

Returns:

  • (Model::TableCell)

    Cell built from the parsed format options and content



# File 'lib/coradoc/asciidoc/transformer.rb', line 166

def self.build_table_cell(format, content)
  cell_opts = {}

  # Extract style first for content parsing
  style = nil

  # Parse format specification if present
  if format.is_a?(Hash)
    # Colspan
    cell_opts[:colspan] = format[:colspan].to_i if format[:colspan]

    # Rowspan (remove leading dot)
    if format[:rowspan]
      rowspan_str = format[:rowspan].to_s
      rowspan_str = rowspan_str.sub(/^\./, '')
      cell_opts[:rowspan] = rowspan_str.to_i if rowspan_str.match?(/^\d+$/)
    end

    # Horizontal alignment
    cell_opts[:halign] = format[:halign].to_s if format[:halign]

    # Vertical alignment (remove leading dot)
    if format[:valign]
      valign_str = format[:valign].to_s
      valign_str = valign_str.sub(/^\./, '')
      cell_opts[:valign] = valign_str if %w[< ^ >].include?(valign_str)
    end

    # Style
    style = format[:style].to_s if format[:style]
    cell_opts[:style] = style

    # Repeat marker
    cell_opts[:repeat] = true if format[:repeat]
  elsif format.is_a?(String)
    # Parse format string like ".2+^.^" or "4+^" or ".3+a"
    # Format: [colspan][.rowspan][halign][valign][style][*]
    format_str = format.to_s

    # Parse colspan (digits before +)
    cell_opts[:colspan] = Regexp.last_match(1).to_i if format_str =~ /^(\d+)\+/

    # Parse rowspan (.digits)
    cell_opts[:rowspan] = Regexp.last_match(1).to_i if format_str =~ /\.(\d+)/

    # Parse horizontal alignment (^ < >)
    # Note: In AsciiDoc, ^ is center, < is left, > is right
    cell_opts[:halign] = Regexp.last_match(0) if format_str =~ /[<>^]/

    # Parse vertical alignment (.<, .^, .>)
    cell_opts[:valign] = Regexp.last_match(0)[1] if format_str =~ /\.[.^<>]/

    # Parse style (d=default, s=strong, e=emphasis, m=monospace, a=asciidoc, l=literal, h=header, v=verse)
    style = Regexp.last_match(0) if format_str =~ /[dsemalhv]/
    cell_opts[:style] = style

    # Parse repeat marker
    cell_opts[:repeat] = true if format_str.include?('*')
  end

  # Parse content based on style
  parsed_content = parse_inline_content(content, style)
  cell_opts[:content] = parsed_content

  Model::TableCell.new(**cell_opts)
end
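
The String branch can be tried in isolation. The following is a minimal sketch, not the library's code: `parse_cell_format` is a hypothetical helper that reuses the regular expressions from the listing above and returns a plain Hash instead of a Model::TableCell.

```ruby
# Hypothetical helper (not part of Coradoc): mirrors the String branch of
# build_table_cell and returns the parsed options as a plain Hash.
def parse_cell_format(format_str)
  opts = {}
  # Colspan: digits directly before "+" at the start of the string
  opts[:colspan] = Regexp.last_match(1).to_i if format_str =~ /^(\d+)\+/
  # Rowspan: digits after a dot
  opts[:rowspan] = Regexp.last_match(1).to_i if format_str =~ /\.(\d+)/
  # Horizontal alignment: first alignment character found
  opts[:halign] = Regexp.last_match(0) if format_str =~ /[<>^]/
  # Vertical alignment: character following a dot
  opts[:valign] = Regexp.last_match(0)[1] if format_str =~ /\.[.^<>]/
  # Style letter
  opts[:style] = Regexp.last_match(0) if format_str =~ /[dsemalhv]/
  # Repeat marker
  opts[:repeat] = true if format_str.include?('*')
  opts
end

p parse_cell_format('2+^')    # colspan 2, centered horizontally
p parse_cell_format('.3+a')   # rowspan 3, AsciiDoc style
p parse_cell_format('.2+^.^') # rowspan 2, centered on both axes
```

Keys appear in the Hash only when the corresponding part of the format string is present, so callers can splat the result into a constructor without passing explicit nils.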

.extract_inline_content(data) ⇒ Object

Helper method for extracting inline content (used by InlineRules)



# File 'lib/coradoc/asciidoc/transformer.rb', line 43

def self.extract_inline_content(data)
  if data.is_a?(Hash) && data.key?(:content)
    data[:content]
  elsif data.is_a?(Array)
    data.map do |item|
      if item.is_a?(Hash) && item.key?(:text)
        text = item[:text]
        if text.is_a?(Model::Base) && text.class.attributes.key?(:content)
          text.content
        elsif text.is_a?(Model::Base)
          text
        else
          text.to_s
        end
      else
        item
      end
    end
  else
    data
  end
end

.extract_simple_inline_content(data) ⇒ Object

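Helper method for extracting simple inline content. Unlike extract_inline_content, Array input is flattened to a single String: each :text value is converted with to_s and the items are joined.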



# File 'lib/coradoc/asciidoc/transformer.rb', line 67

def self.extract_simple_inline_content(data)
  if data.is_a?(Hash) && data.key?(:content)
    data[:content]
  elsif data.is_a?(Array)
    data.map do |item|
      item.is_a?(Hash) && item.key?(:text) ? item[:text].to_s : item
    end.join
  else
    data
  end
end
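
The three input shapes this helper accepts can be illustrated with a standalone copy of its logic. This is a sketch under the assumption that the parser emits Hashes with a :text or :content key; `extract_simple` is a hypothetical name used only here.

```ruby
# Hypothetical standalone copy of the extraction logic shown above.
def extract_simple(data)
  if data.is_a?(Hash) && data.key?(:content)
    # A Hash with :content is unwrapped
    data[:content]
  elsif data.is_a?(Array)
    # An Array is flattened to one String; :text values are stringified
    data.map { |item| item.is_a?(Hash) && item.key?(:text) ? item[:text].to_s : item }.join
  else
    # Anything else is passed through unchanged
    data
  end
end

puts extract_simple([{ text: 'Hello' }, ', ', { text: :world }])
puts extract_simple({ content: 'unwrapped' })
puts extract_simple('passed through')
```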

.group_cells_into_rows(cells, explicit_col_count = nil) ⇒ Array<Model::TableRow>

Group cells into rows based on column count

AsciiDoc table row semantics:

  • Column count is determined by cols attribute or first row

  • A new row starts once the previous row has filled `column_count` column slots

  • Cells with colspan > 1 take multiple column slots

Parameters:

  • cells (Array<Model::TableCell>)

    Flat list of cells

  • explicit_col_count (Integer, nil) (defaults to: nil)

    Column count from cols attribute

Returns:

  • (Array<Model::TableRow>)

    Rows built from the grouped cells



# File 'lib/coradoc/asciidoc/transformer.rb', line 278

def self.group_cells_into_rows(cells, explicit_col_count = nil)
  return [] if cells.nil? || cells.empty?

  # Normalize cells to ensure they're TableCell objects
  normalized_cells = cells.map do |cell|
    case cell
    when Model::TableCell
      cell
    when Hash
      content = cell[:text] || cell[:content] || ''
      Model::TableCell.new(content: parse_inline_content(content))
    else
      Model::TableCell.new(content: parse_inline_content(cell))
    end
  end

  # Determine column count
  # If explicit_col_count is provided, use it
  # Otherwise, count cells until we find a row boundary
  col_count = explicit_col_count

  if col_count.nil? || col_count.zero?
    # Infer from first row - count cells until we have a complete row
    # A complete row is when the total column slots equals a consistent number
    col_count = infer_column_count(normalized_cells)
  end

  # If still no column count, assume all cells are one row
  col_count = normalized_cells.size if col_count.nil? || col_count.zero?

  # Group cells into rows
  rows = []
  current_row_cells = []
  current_col_slots = 0

  normalized_cells.each do |cell|
    # Get colspan (default 1)
    colspan = cell.is_a?(Model::TableCell) && cell.colspan ? cell.colspan : 1

    current_row_cells << cell
    current_col_slots += colspan

    # Check if row is complete
    next unless current_col_slots >= col_count

    rows << Model::TableRow.new(columns: current_row_cells)
    current_row_cells = []
    current_col_slots = 0
  end

  # Handle remaining cells (incomplete last row)
  rows << Model::TableRow.new(columns: current_row_cells) if current_row_cells.any?

  rows
end
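
The slot-counting loop can be sketched without the model classes. In this illustration cells are plain Hashes with an optional :colspan key, and `group_cells` is a hypothetical helper; the real method works on Model::TableCell objects and wraps each group in a Model::TableRow.

```ruby
# Hypothetical sketch of the grouping loop: a row is closed as soon as the
# accumulated column slots reach col_count.
def group_cells(cells, col_count)
  rows = []
  current = []
  slots = 0

  cells.each do |cell|
    current << cell
    slots += cell[:colspan] || 1 # a cell without colspan takes one slot
    next unless slots >= col_count

    rows << current
    current = []
    slots = 0
  end

  # Leftover cells form an incomplete last row
  rows << current if current.any?
  rows
end

cells = [{ content: 'a', colspan: 2 }, { content: 'b' },
         { content: 'c' }, { content: 'd' }]
rows = group_cells(cells, 3)
puts rows.map { |row| row.map { |c| c[:content] }.join(' | ') }
```

With three columns, the spanning cell `a` and cell `b` fill the first row, and `c` and `d` end up in an incomplete second row.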

.infer_column_count(cells) ⇒ Object

Infer the column count from the cells by looking for a slot count that every row fills exactly. Candidates are capped at 12 columns, and the largest valid candidate is returned.



# File 'lib/coradoc/asciidoc/transformer.rb', line 336

def self.infer_column_count(cells)
  return nil if cells.nil? || cells.empty?

  col_slots = cells.map do |cell|
    cell.is_a?(Model::TableCell) && cell.colspan ? cell.colspan : 1
  end

  total_cells = col_slots.sum

  # Find all valid column counts
  possible_cols = (1..[total_cells, 12].min).select do |candidate|
    next false if candidate > total_cells
    next false if total_cells % candidate != 0

    slots_used = 0
    valid = true

    col_slots.each do |slots|
      slots_used += slots
      if slots_used == candidate
        slots_used = 0
      elsif slots_used > candidate
        valid = false
        break
      end
    end

    valid && slots_used.zero?
  end

  possible_cols.max || col_slots.first || 1
end
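
To see which candidate this heuristic picks, the same logic can be run on bare colspan values. The sketch below is an illustration only: `infer_columns` is a hypothetical helper that takes an Array of Integers instead of cell objects.

```ruby
# Hypothetical sketch: infer a column count from a list of colspans.
def infer_columns(col_slots)
  return nil if col_slots.empty?

  total = col_slots.sum
  candidates = (1..[total, 12].min).select do |candidate|
    next false unless (total % candidate).zero?

    used = 0
    # Every row must be filled exactly; overshooting a row rejects the candidate
    tiles = col_slots.all? do |slots|
      used += slots
      used = 0 if used == candidate
      used <= candidate
    end
    tiles && used.zero?
  end

  candidates.max || col_slots.first || 1
end

p infer_columns([1] * 6)     # total fits under the cap of 12
p infer_columns([1] * 14)    # total exceeds the cap
p infer_columns([2, 1] * 5)  # spanning cells constrain the candidates
```

Because the largest valid candidate wins, a table whose total slot count is at most 12 is treated as a single row when no cols attribute is given; the smaller divisors only come into play for larger tables.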

.legacy_transform(syntax_tree) ⇒ Object

Deprecated. Use transform instead.

Legacy transform method; it calls new.apply exactly as transform does.



# File 'lib/coradoc/asciidoc/transformer.rb', line 401

def self.legacy_transform(syntax_tree)
  new.apply(syntax_tree)
end

.parse_block_content(text) ⇒ Array

Parse block-level AsciiDoc content (for ‘a’ style cells)

Parameters:

  • text (String)

    Raw text containing AsciiDoc blocks

Returns:

  • (Array)

    Parsed block content



# File 'lib/coradoc/asciidoc/transformer.rb', line 112

def self.parse_block_content(text)
  return [Coradoc::AsciiDoc::Model::TextElement.new(content: '')] if text.nil? || text.to_s.strip.empty?

  parser = Coradoc::AsciiDoc::Parser::Base.new
  text_str = text.to_s

  # Try parsing as a list if content contains list markers
  # List markers can appear after other content (e.g., "Title:\n\n* item")
  if /^(\*+|-+|\d+\.)/m.match?(text_str)
    # Extract just the list portion
    list_match = text_str.match(/\n(\*+|-+|\d+\.)(.*)$/m)
    if list_match
      list_text = list_match[1] + list_match[2]
      begin
        ast = parser.list.parse(list_text)
        transformed = new.apply(ast)

        # Parse the text before the list as inline content
        before_list = text_str[0, list_match.begin(1) - 1].strip
        before_elements = []
        unless before_list.empty?
          begin
            before_ast = parser.text_any.parse(before_list)
            before_transformed = new.apply(before_ast)
            before_array = before_transformed.is_a?(Array) ? before_transformed : [before_transformed]
            before_elements = [Coradoc::AsciiDoc::Model::TextElement.new(content: before_array)]
          rescue Parslet::ParseFailed
            before_elements = [Coradoc::AsciiDoc::Model::TextElement.new(content: before_list)]
          end
        end

        return before_elements + [transformed]
      rescue Parslet::ParseFailed
        # Fall through to inline parsing
      end
    end
  end

  # Try parsing as inline content
  begin
    ast = parser.text_any.parse(text_str)
    transformed = new.apply(ast)
    content_array = transformed.is_a?(Array) ? transformed : [transformed]
    [Coradoc::AsciiDoc::Model::TextElement.new(content: content_array)]
  rescue Parslet::ParseFailed
    # If parsing fails, return the text as a simple TextElement
    [Coradoc::AsciiDoc::Model::TextElement.new(content: text_str)]
  end
end
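
The step that separates leading text from a trailing list can be sketched on its own. This is a simplified illustration that reuses the regular expression from the listing above; `split_before_list` is a hypothetical helper and does no parsing of either part.

```ruby
# Hypothetical sketch: split cell text into the text before a list and the
# list portion itself, using the same pattern as the listing above.
def split_before_list(text)
  match = text.match(/\n(\*+|-+|\d+\.)(.*)$/m)
  return [text.strip, nil] unless match

  # Everything up to the newline that precedes the first list marker
  before = text[0, match.begin(1) - 1].strip
  [before, match[1] + match[2]]
end

before, list = split_before_list("Title:\n\n* one\n* two")
puts before
puts list
```

Note that the pattern requires a newline before the marker, so text that begins directly with a list marker is not split by this step.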

.parse_cols_attribute(attrs) ⇒ Integer?

Parse the cols attribute to determine column count

Parameters:

  • attrs (Model::AttributeList, Hash, nil)

    Table attributes that may contain a cols entry

Returns:

  • (Integer, nil)

    Column count or nil if not specified



# File 'lib/coradoc/asciidoc/transformer.rb', line 236

def self.parse_cols_attribute(attrs)
  return nil if attrs.nil?

  # Get the cols value from named attributes
  cols_value = if attrs.is_a?(Model::AttributeList)
                 attrs.named.find { |n| n.name.to_s == 'cols' }&.value
               elsif attrs.is_a?(Hash)
                 attrs['cols'] || attrs[:cols]
               end

  return nil if cols_value.nil?

  # cols can be:
  # - A single number: "3" -> 3 columns
  # - A list: "1,2,1" -> 3 columns
  # - With multipliers: "3*" -> 3 columns
  # - Quoted: "\"3\"" -> 3 columns
  cols_str = cols_value.is_a?(Array) ? cols_value.first.to_s : cols_value.to_s

  # Remove surrounding quotes if present
  cols_str = cols_str.gsub(/^["']|["']$/, '')

  # Handle multiplier syntax: "3*" means 3 columns
  return Regexp.last_match(1).to_i if cols_str =~ /^(\d+)\*$/

  # Handle comma-separated list: count the parts
  return cols_str.split(',').size if cols_str.include?(',')

  # Single number
  cols_str.to_i if /^\d+$/.match?(cols_str)
end
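
The supported cols forms can be checked with a string-only sketch. `cols_count` is a hypothetical helper that covers just the string handling shown above and skips the attribute lookup.

```ruby
# Hypothetical sketch: derive a column count from a raw cols value.
def cols_count(cols_value)
  cols_str = cols_value.to_s.gsub(/^["']|["']$/, '') # drop surrounding quotes

  # Multiplier syntax: "3*" means 3 columns
  return Regexp.last_match(1).to_i if cols_str =~ /^(\d+)\*$/
  # Comma-separated list: one column per entry
  return cols_str.split(',').size if cols_str.include?(',')

  # Single number, or nil when the value is not recognized
  cols_str.to_i if /^\d+$/.match?(cols_str)
end

p cols_count('3')
p cols_count('1,2,1')
p cols_count('3*')
p cols_count('"1,1"')
p cols_count('wide')
```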

.parse_inline_content(text, style = nil) ⇒ Array<TextElement>

Helper method for parsing inline content from raw text. This is used for table cells, where content is captured as raw text.

Parameters:

  • text (String)

    Raw text to parse

  • style (String, nil) (defaults to: nil)

    Cell style (‘a’ for AsciiDoc, ‘l’ for literal, etc.)

Returns:

  • (Array<TextElement>)

    Parsed content as array of TextElement objects



# File 'lib/coradoc/asciidoc/transformer.rb', line 84

def self.parse_inline_content(text, style = nil)
  return [Coradoc::AsciiDoc::Model::TextElement.new(content: '')] if text.nil? || text.to_s.strip.empty?

  # For AsciiDoc style cells, parse as block content
  return parse_block_content(text) if style == 'a'

  # For literal style cells, preserve text as-is
  return [Coradoc::AsciiDoc::Model::TextElement.new(content: text.to_s)] if style == 'l'

  # For default cells, parse inline content
  parser = Coradoc::AsciiDoc::Parser::Base.new
  begin
    ast = parser.text_any.parse(text.to_s)
    # Transform the AST to model objects
    transformed = new.apply(ast)

    # Wrap in TextElement
    content_array = transformed.is_a?(Array) ? transformed : [transformed]
    [Coradoc::AsciiDoc::Model::TextElement.new(content: content_array)]
  rescue Parslet::ParseFailed
    # If parsing fails, return the text as a simple TextElement
    [Coradoc::AsciiDoc::Model::TextElement.new(content: text.to_s)]
  end
end

.regroup_table_rows(rows, attrs = nil) ⇒ Array<Model::TableRow>

Regroup parser-level rows into proper AsciiDoc rows. The parser produces one “row” per line; this flattens all cells and regroups by the cols attribute, then marks the first row as header.

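Parameters:

  • rows (Array<Model::TableRow>)

    Parser-level rows, one per source line

  • attrs (Model::AttributeList, Hash, nil) (defaults to: nil)

    Table attributes used to look up the cols value

Returns:

  • (Array<Model::TableRow>)

    Regrouped rows, with the first row marked as header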



# File 'lib/coradoc/asciidoc/transformer.rb', line 376

def self.regroup_table_rows(rows, attrs = nil)
  return rows if rows.nil? || rows.empty?

  col_count = parse_cols_attribute(attrs)
  all_cells = rows.flat_map do |r|
    r.is_a?(Model::TableRow) ? r.columns : []
  end

  return rows if all_cells.empty?

  grouped = group_cells_into_rows(all_cells, col_count)
  grouped.first.header = true unless grouped.empty?
  grouped
end
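
The flatten-and-regroup idea can be shown with plain Arrays. This sketch is a simplification: `regroup_rows` is a hypothetical helper, it assumes a positive column count, and it ignores colspan by slicing the cells evenly, whereas the real method delegates to group_cells_into_rows.

```ruby
# Hypothetical sketch: the parser yields one "row" per line; flatten the
# cells, regroup them by column count, and mark the first row as header.
def regroup_rows(line_rows, col_count)
  cells = line_rows.flatten
  return [] if cells.empty?

  rows = cells.each_slice(col_count).map { |slice| { cells: slice, header: false } }
  rows.first[:header] = true
  rows
end

rows = regroup_rows([['Name'], ['Age'], ['Ada'], ['36']], 2)
rows.each { |row| puts "#{row[:header] ? 'header' : 'body'}: #{row[:cells].join(' | ')}" }
```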

.transform(syntax_tree) ⇒ Object

Transform a syntax tree using this transformer’s rules

Parameters:

  • syntax_tree (Hash, Array)

    The AST from the parser

Returns:

  • (Object)

    The transformed model object(s)



# File 'lib/coradoc/asciidoc/transformer.rb', line 395

def self.transform(syntax_tree)
  new.apply(syntax_tree)
end