Module: Sourcerer::Sync::BlockParser

Defined in: lib/sourcerer/sync/block_parser.rb

Overview

Parses tagged regions from any text file, regardless of comment style.

Recognizes AsciiDoc `tag::`/`end::` markers in HTML comments, AsciiDoc line comments, and shell/Ruby/YAML comments.

The trailing `[]` is optional. See the project README for the full tag-syntax reference.

Defined Under Namespace

Classes: Block, ParseError, TextSegment

Constant Summary

DEFAULT_CANONICAL_PREFIX = Default prefix that marks a block as canonical (managed by Sync/Cast).

'universal-'

DEFAULT_TAG_SYNTAX_START = Default opening tag marker template. `<tagged_block_name>` is the placeholder for the block name character class. A trailing `[]` is treated as optional in the compiled pattern.

'tag::<tagged_block_name>[]'

DEFAULT_TAG_SYNTAX_END = Default closing tag marker template.

'end::<tagged_block_name>[]'

DEFAULT_COMMENT_SYNTAX_PATTERNS = Default comment-wrapper templates. `<tag_syntax>` is the placeholder for the compiled tag marker pattern. A space between the comment delimiter and `<tag_syntax>` compiles as `\s*`.

[
  '<!-- <tag_syntax> -->',
  '// <tag_syntax>',
  '# <tag_syntax>'
].freeze

DEFAULT_TAG_PATTERNS = Default compiled pattern set, built from the three DEFAULT_* template constants. Retained for backward compatibility; prefer the template constants for customisation.

build_tag_patterns(
  DEFAULT_TAG_SYNTAX_START,
  DEFAULT_TAG_SYNTAX_END,
  DEFAULT_COMMENT_SYNTAX_PATTERNS).freeze

TAG_PATTERNS = Backward-compatible alias for DEFAULT_TAG_PATTERNS.

DEFAULT_TAG_PATTERNS

Class Method Summary

.build_tag_patterns(tag_start, tag_end, comment_patterns) ⇒ Array<Hash>

Compile template strings into a patterns array compatible with BlockParser.parse.

.comment_template_to_full_regex(comment_template, inner_regex) ⇒ String

Wrap a compiled inner-tag regex fragment with a comment-wrapper template.

.extract_canonical(segments, canonical_prefix: DEFAULT_CANONICAL_PREFIX) ⇒ Hash{String => Block}

Extract all canonical blocks (those whose tag name starts with `canonical_prefix`) as a Hash keyed by tag name.

.parse(text, canonical_prefix: DEFAULT_CANONICAL_PREFIX, tag_syntax_start: DEFAULT_TAG_SYNTAX_START, tag_syntax_end: DEFAULT_TAG_SYNTAX_END, comment_syntax_patterns: DEFAULT_COMMENT_SYNTAX_PATTERNS, tag_patterns: nil) ⇒ Array<TextSegment, Block>

Parse a text string into an array of TextSegment and Block objects.

.tag_template_to_inner_regex(template) ⇒ String

Compile a tag marker template string into a plain regex fragment (no `\A` anchor).

Class Method Details

.build_tag_patterns(tag_start, tag_end, comment_patterns) ⇒ `Array<Hash>`

Compile template strings into a patterns array compatible with parse.

Each entry in the returned array is a `{ open: Regexp, close: Regexp }` hash. This is the same shape as DEFAULT_TAG_PATTERNS and may be passed directly to parse via the `tag_patterns:` keyword to avoid recompilation per call.

Parameters:

tag_start (String) —

opening tag template, e.g. DEFAULT_TAG_SYNTAX_START (required; the method applies no default)
tag_end (String) —

closing tag template, e.g. DEFAULT_TAG_SYNTAX_END (required)
comment_patterns (Array<String>) —

comment-wrapper templates, e.g. DEFAULT_COMMENT_SYNTAX_PATTERNS (required)

Returns:

(Array<Hash>)

# File 'lib/sourcerer/sync/block_parser.rb', line 95

def self.build_tag_patterns tag_start, tag_end, comment_patterns
  open_inner  = tag_template_to_inner_regex(tag_start)
  close_inner = tag_template_to_inner_regex(tag_end)
  comment_patterns.map do |cp|
    {
      open:  Regexp.new(comment_template_to_full_regex(cp, open_inner)),
      close: Regexp.new(comment_template_to_full_regex(cp, close_inner))
    }
  end
end
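
The shape of the returned array can be illustrated without loading the gem. The sketch below is not the library's code: `NAME`, `PATTERNS_SKETCH`, and `first_tag` are hypothetical names, and the regexes are hand-written to mirror what the default templates appear to compile to. It shows how a caller might scan such an array for the first matching marker.

```ruby
# Hypothetical stand-in for the compiled name fragment: a captured tag name
# followed by optional literal brackets.
NAME = '(?<tag>[\w-]+)(?:\[\])?'

# Hand-written array in the { open: Regexp, close: Regexp } shape.
PATTERNS_SKETCH = [
  { open: /\A<!--\s*tag::#{NAME}\s*-->/, close: /\A<!--\s*end::#{NAME}\s*-->/ },
  { open: %r{\A//\s*tag::#{NAME}},       close: %r{\A//\s*end::#{NAME}} },
  { open: /\A\#\s*tag::#{NAME}/,         close: /\A\#\s*end::#{NAME}/ }
].freeze

# Hypothetical helper: return the tag name from the first pattern whose
# :open or :close regex matches the line, or nil when none match.
def first_tag(line, patterns, key)
  patterns.each do |pattern|
    match = pattern[key].match(line)
    return match[:tag] if match
  end
  nil
end

puts first_tag('// tag::universal-ci[]', PATTERNS_SKETCH, :open)        # prints universal-ci
puts first_tag('<!-- end::universal-ci -->', PATTERNS_SKETCH, :close)   # prints universal-ci
p    first_tag('plain text', PATTERNS_SKETCH, :open)                    # prints nil
```

Building the array once and passing it as `tag_patterns:` is the reuse that the description above recommends when many files are parsed with the same syntax.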

.comment_template_to_full_regex(comment_template, inner_regex) ⇒ `String`

Wrap a compiled inner-tag regex fragment with a comment-wrapper template.

`<tag_syntax>` in `comment_template` is replaced by `inner_regex`. Adjacent literal spaces around `<tag_syntax>` are compiled as `\s*`. The result is anchored to `\A`.

Parameters:

comment_template (String) —

e.g. `'<!-- <tag_syntax> -->'`
inner_regex (String) —

regex source from tag_template_to_inner_regex

Returns:

(String) —

full anchored regex source string

# File 'lib/sourcerer/sync/block_parser.rb', line 73

def self.comment_template_to_full_regex comment_template, inner_regex
  halves    = comment_template.split('<tag_syntax>', 2)
  left_raw  = halves[0]
  right_raw = halves[1].to_s
  left_trim  = left_raw.rstrip
  right_trim = right_raw.lstrip
  left_re  = Regexp.escape(left_trim) + (left_trim == left_raw ? '' : '\s*')
  right_re = (right_trim == right_raw ? '' : '\s*') + Regexp.escape(right_trim)
  "\\A#{left_re}#{inner_regex}#{right_re}"
end
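
To make the space handling concrete, the sketch below copies the method body shown above into a standalone top-level method so it runs without the gem. `INNER` is an assumed inner fragment (the form the default opening template appears to compile to), and the sample lines are made up for illustration.

```ruby
# Standalone copy of the method body shown above.
def comment_template_to_full_regex(comment_template, inner_regex)
  halves     = comment_template.split('<tag_syntax>', 2)
  left_raw   = halves[0]
  right_raw  = halves[1].to_s
  left_trim  = left_raw.rstrip
  right_trim = right_raw.lstrip
  # A space next to the placeholder becomes \s*, so it is optional in input.
  left_re  = Regexp.escape(left_trim) + (left_trim == left_raw ? '' : '\s*')
  right_re = (right_trim == right_raw ? '' : '\s*') + Regexp.escape(right_trim)
  "\\A#{left_re}#{inner_regex}#{right_re}"
end

# Assumed inner fragment for an opening marker.
INNER = 'tag::(?<tag>[\w-]+)(?:\[\])?'

html_re = Regexp.new(comment_template_to_full_regex('<!-- <tag_syntax> -->', INNER))
hash_re = Regexp.new(comment_template_to_full_regex('# <tag_syntax>', INNER))

puts html_re.match('<!-- tag::universal-nav[] -->')[:tag]  # prints universal-nav
puts html_re.match('<!--tag::universal-nav-->')[:tag]      # prints universal-nav
puts hash_re.match?('#    tag::universal-ci')              # prints true
puts hash_re.match?('x # tag::universal-ci[]')             # prints false (\A anchor)
```

Because the source is anchored only at the start with `\A`, a marker preceded by other characters on the line does not match.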

.extract_canonical(segments, canonical_prefix: DEFAULT_CANONICAL_PREFIX) ⇒ `Hash{String => Block}`

Extract all canonical blocks (those whose tag name starts with `canonical_prefix`) as a Hash keyed by tag name.

Because parse already filters for canonical blocks when given the same `canonical_prefix`, this method is largely a deduplication check. It raises ParseError if more than one canonical block carries the same tag name, which would make synchronization ambiguous.

Parameters:

segments (Array<TextSegment, Block>)
canonical_prefix (String) (defaults to: DEFAULT_CANONICAL_PREFIX) —

Prefix that identifies managed blocks

Returns:

(Hash{String => Block})

# File 'lib/sourcerer/sync/block_parser.rb', line 230

def self.extract_canonical segments, canonical_prefix: DEFAULT_CANONICAL_PREFIX
  result = {}
  segments.each do |s|
    next unless s.is_a?(Block) && s.tag.start_with?(canonical_prefix)

    raise ParseError, "Duplicate canonical block '#{s.tag}'" if result.key?(s.tag)

    result[s.tag] = s
  end
  result
end
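
The deduplication behaviour can be exercised in isolation. In the sketch below, `ParseError`, `TextSegment`, and `Block` are minimal stand-ins (the real classes are not shown on this page; the field names are taken from the `Block.new` and `TextSegment.new` calls in parse), `sample_block` is a hypothetical helper, and the method body is copied from the listing above with the default prefix written out as a literal.

```ruby
# Minimal stand-ins for the namespaced classes (assumed shapes).
ParseError  = Class.new(StandardError)
TextSegment = Struct.new(:content, keyword_init: true)
Block       = Struct.new(:tag, :open_line, :content, :close_line, keyword_init: true)

# Standalone copy of the method body shown above.
def extract_canonical(segments, canonical_prefix: 'universal-')
  result = {}
  segments.each do |s|
    next unless s.is_a?(Block) && s.tag.start_with?(canonical_prefix)

    raise ParseError, "Duplicate canonical block '#{s.tag}'" if result.key?(s.tag)

    result[s.tag] = s
  end
  result
end

# Hypothetical helper that builds a Block with shell-style marker lines.
def sample_block(tag, body)
  Block.new(tag: tag, open_line: "# tag::#{tag}[]\n", content: body,
            close_line: "# end::#{tag}[]\n")
end

segments = [
  TextSegment.new(content: "intro\n"),
  sample_block('universal-header', "A\n"),
  sample_block('universal-footer', "B\n")
]
p extract_canonical(segments).keys  # prints ["universal-header", "universal-footer"]

begin
  extract_canonical(segments + [sample_block('universal-header', "C\n")])
rescue ParseError => e
  puts e.message                    # prints Duplicate canonical block 'universal-header'
end
```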

.parse(text, canonical_prefix: DEFAULT_CANONICAL_PREFIX, tag_syntax_start: DEFAULT_TAG_SYNTAX_START, tag_syntax_end: DEFAULT_TAG_SYNTAX_END, comment_syntax_patterns: DEFAULT_COMMENT_SYNTAX_PATTERNS, tag_patterns: nil) ⇒ `Array<TextSegment, Block>`

Parse a text string into an array of TextSegment and Block objects.

The result is ordered and reconstructable: joining every element’s serialized form reproduces the original text character-perfectly.

Only blocks whose tag name starts with `canonical_prefix` are parsed as proper Block objects; all other tag markers (open and close) are treated as ordinary text.

This makes the parser robust against files that use tag markers for unrelated purposes (e.g. AsciiDoc `include::` target regions or non-canonical project sections) regardless of whether those regions are properly closed or even nested.

When a canonical block is open, every line is treated as content until the matching close marker appears (including any inner tag markers).

Canonical blocks therefore cannot be nested.

Parameters:

text (String) —

Full text of the file to parse
canonical_prefix (String) (defaults to: DEFAULT_CANONICAL_PREFIX) —

Only tags starting with this prefix are parsed as managed Block objects (default DEFAULT_CANONICAL_PREFIX).
tag_syntax_start (String) (defaults to: DEFAULT_TAG_SYNTAX_START) —

Opening tag template; used to build patterns when `tag_patterns:` is not given (default DEFAULT_TAG_SYNTAX_START).
tag_syntax_end (String) (defaults to: DEFAULT_TAG_SYNTAX_END) —

Closing tag template (default DEFAULT_TAG_SYNTAX_END).
comment_syntax_patterns (Array<String>) (defaults to: DEFAULT_COMMENT_SYNTAX_PATTERNS) —

Comment-wrapper templates (default DEFAULT_COMMENT_SYNTAX_PATTERNS).
tag_patterns (Array<Hash>, nil) (defaults to: nil) —

Pre-compiled pattern set; skips template compilation when provided. Build once with build_tag_patterns and reuse.

Returns:

(Array<TextSegment, Block>)

Raises:

(ParseError) —

if a canonical tag is opened but never closed.

# File 'lib/sourcerer/sync/block_parser.rb', line 144

def self.parse text,
  canonical_prefix: DEFAULT_CANONICAL_PREFIX,
  tag_syntax_start: DEFAULT_TAG_SYNTAX_START,
  tag_syntax_end: DEFAULT_TAG_SYNTAX_END,
  comment_syntax_patterns: DEFAULT_COMMENT_SYNTAX_PATTERNS,
  tag_patterns: nil
  patterns = tag_patterns ||
             build_tag_patterns(tag_syntax_start, tag_syntax_end, comment_syntax_patterns)
  lines = text.lines
  segments = []
  text_acc = []
  block_state = nil # nil or { tag:, open_line:, content_lines: [] }

  lines.each do |line|
    stripped = line.chomp

    if block_state.nil?
      tag = detect_open_tag(stripped, patterns)
      if tag&.start_with?(canonical_prefix)
        segments << TextSegment.new(content: text_acc.join) unless text_acc.empty?
        text_acc = []
        block_state = { tag: tag, open_line: line, content_lines: [] }
      else
        # Non-canonical open tags and all close tags at the top level are
        # treated as ordinary text.
        text_acc << line
      end
    else
      close_tag = detect_close_tag(stripped, patterns)
      if close_tag == block_state[:tag]
        segments << Block.new(
          tag: block_state[:tag],
          open_line: block_state[:open_line],
          content: block_state[:content_lines].join,
          close_line: line)
        block_state = nil
      else
        # Nested open tags or mismatched close tags: treat as block content
        block_state[:content_lines] << line
      end
    end
  end

  raise ParseError, "Unclosed canonical tag '#{block_state[:tag]}'" if block_state

  segments << TextSegment.new(content: text_acc.join) unless text_acc.empty?
  segments
end
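
The round-trip property and the treatment of non-canonical markers can be checked with a condensed sketch of the loop above. Everything here is an assumption made for illustration: the Struct stand-ins, the fixed shell-comment regexes `OPEN_RE` and `CLOSE_RE` (used in place of the private `detect_open_tag` / `detect_close_tag` helpers, which this page does not show), the names `mini_parse` and `tag_in`, and `serialize`, which assumes a block's serialized form is its open line, content, and close line concatenated.

```ruby
ParseError  = Class.new(StandardError)
TextSegment = Struct.new(:content, keyword_init: true)
Block       = Struct.new(:tag, :open_line, :content, :close_line, keyword_init: true)

OPEN_RE  = /\A\#\s*tag::(?<tag>[\w-]+)(?:\[\])?/
CLOSE_RE = /\A\#\s*end::(?<tag>[\w-]+)(?:\[\])?/

# Return the captured tag name, or nil when the line is not a marker.
def tag_in(regex, line)
  match = regex.match(line.chomp)
  match && match[:tag]
end

def mini_parse(text, canonical_prefix: 'universal-')
  segments = []
  text_acc = []
  state    = nil
  text.lines.each do |line|
    if state.nil?
      tag = tag_in(OPEN_RE, line)
      if tag&.start_with?(canonical_prefix)
        segments << TextSegment.new(content: text_acc.join) unless text_acc.empty?
        text_acc = []
        state = { tag: tag, open_line: line, content_lines: [] }
      else
        text_acc << line # non-canonical markers stay ordinary text
      end
    elsif tag_in(CLOSE_RE, line) == state[:tag]
      segments << Block.new(tag: state[:tag], open_line: state[:open_line],
                            content: state[:content_lines].join, close_line: line)
      state = nil
    else
      state[:content_lines] << line
    end
  end
  raise ParseError, "Unclosed canonical tag '#{state[:tag]}'" if state

  segments << TextSegment.new(content: text_acc.join) unless text_acc.empty?
  segments
end

# Assumed serialization: marker lines plus content for blocks, content otherwise.
def serialize(segment)
  return segment.content unless segment.is_a?(Block)

  segment.open_line + segment.content + segment.close_line
end

SAMPLE = <<~TEXT
  # tag::local-notes[]
  free-form notes
  # end::local-notes[]
  # tag::universal-ci[]
  run: bundle exec rake
  # end::universal-ci[]
  trailing line
TEXT

parsed = mini_parse(SAMPLE)
p parsed.map(&:class)                                # prints [TextSegment, Block, TextSegment]
puts parsed.map { |s| serialize(s) }.join == SAMPLE  # prints true
```

The `local-notes` region ends up inside the first TextSegment, markers included, because its tag name does not start with the canonical prefix.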

.tag_template_to_inner_regex(template) ⇒ `String`

Compile a tag marker template string into a plain regex fragment (no `\A` anchor).

`<tagged_block_name>` is replaced with the `(?<tag>[\w-]+)` named capture group. A trailing `[]` in the template becomes `(?:\[\])?` (optional literal brackets).

Parameters:

template (String) —

e.g. `'tag::<tagged_block_name>[]'`

Returns:

(String) —

regex source string

# File 'lib/sourcerer/sync/block_parser.rb', line 56

def self.tag_template_to_inner_regex template
  parts  = template.split('<tagged_block_name>', 2)
  left   = Regexp.escape(parts[0])
  right  = parts[1].to_s
  suffix = right == '[]' ? '(?:\[\])?' : Regexp.escape(right)
  "#{left}(?<tag>[\\w-]+)#{suffix}"
end
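
As a quick check of the compiled form, the sketch below copies the method body shown above into a standalone top-level method (so it runs without the gem) and applies it to the default opening template. The sample marker strings are made up for illustration.

```ruby
# Standalone copy of the method body shown above.
def tag_template_to_inner_regex(template)
  parts  = template.split('<tagged_block_name>', 2)
  left   = Regexp.escape(parts[0])
  right  = parts[1].to_s
  # Only an exact "[]" suffix is made optional; anything else is literal.
  suffix = right == '[]' ? '(?:\[\])?' : Regexp.escape(right)
  "#{left}(?<tag>[\\w-]+)#{suffix}"
end

inner = tag_template_to_inner_regex('tag::<tagged_block_name>[]')
puts inner                                          # prints tag::(?<tag>[\w-]+)(?:\[\])?

regex = Regexp.new(inner)
puts regex.match('tag::universal-footer[]')[:tag]   # prints universal-footer
puts regex.match('tag::universal-footer')[:tag]     # prints universal-footer
```

The fragment carries no anchor of its own; the `\A` is added later by the comment-wrapper step.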
