Module: Sourcerer::Sync::BlockParser

Defined in:
lib/sourcerer/sync/block_parser.rb

Overview

Parses tagged regions from any text file, regardless of comment style.

Recognizes AsciiDoc `tag::`/`end::` markers in HTML comments, AsciiDoc line comments, and shell/Ruby/YAML comments.

The trailing `[]` is optional. See the project README for the full tag-syntax reference.

Defined Under Namespace

Classes: Block, ParseError, TextSegment

Constant Summary

DEFAULT_CANONICAL_PREFIX =

Default prefix that marks a block as canonical (managed by Sync/Cast).

'universal-'
DEFAULT_TAG_SYNTAX_START =

Default opening tag marker template. `<tagged_block_name>` is the placeholder for the block name character class. A trailing `[]` is treated as optional in the compiled pattern.

'tag::<tagged_block_name>[]'
DEFAULT_TAG_SYNTAX_END =

Default closing tag marker template.

'end::<tagged_block_name>[]'
DEFAULT_COMMENT_SYNTAX_PATTERNS =

Default comment-wrapper templates. `<tag_syntax>` is the placeholder for the compiled tag marker pattern. A space between the comment delimiter and `<tag_syntax>` compiles as `\s*`.

[
  '<!-- <tag_syntax> -->',
  '// <tag_syntax>',
  '# <tag_syntax>'
].freeze
DEFAULT_TAG_PATTERNS =

Default compiled pattern set, built from the three DEFAULT_* template constants. Retained for backward compatibility; prefer the template constants for customisation.

build_tag_patterns(
  DEFAULT_TAG_SYNTAX_START,
  DEFAULT_TAG_SYNTAX_END,
  DEFAULT_COMMENT_SYNTAX_PATTERNS
).freeze
TAG_PATTERNS =

Backward-compatible alias for DEFAULT_TAG_PATTERNS.

DEFAULT_TAG_PATTERNS

Class Method Details

.build_tag_patterns(tag_start, tag_end, comment_patterns) ⇒ Array<Hash>

Compile template strings into a patterns array compatible with parse.

Each entry in the returned array is a `{ open: Regexp, close: Regexp }` hash. This is the same shape as DEFAULT_TAG_PATTERNS and may be passed directly to parse via the `tag_patterns:` keyword to avoid recompilation per call.

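Parameters:

  • tag_start (String)

    opening tag template, e.g. `'tag::<tagged_block_name>[]'`

  • tag_end (String)

    closing tag template, e.g. `'end::<tagged_block_name>[]'`

  • comment_patterns (Array<String>)

    comment-wrapper templates, e.g. DEFAULT_COMMENT_SYNTAX_PATTERNS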

Returns:

  • (Array<Hash>)


# File 'lib/sourcerer/sync/block_parser.rb', line 95

def self.build_tag_patterns tag_start, tag_end, comment_patterns
  open_inner  = tag_template_to_inner_regex(tag_start)
  close_inner = tag_template_to_inner_regex(tag_end)
  comment_patterns.map do |cp|
    {
      open:  Regexp.new(comment_template_to_full_regex(cp, open_inner)),
      close: Regexp.new(comment_template_to_full_regex(cp, close_inner))
    }
  end
end
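
The returned shape can be illustrated without the gem. The following is a minimal sketch, assuming hand-written patterns that only approximate the `#` and `//` wrappers (they are not the compiled defaults), and a hypothetical `first_tag` helper that shows one way a consumer might walk the array. The library's own matching helpers are not shown on this page and may behave differently.

```ruby
# Hand-written pattern set in the { open:, close: } shape described above.
# These literals approximate the '#' and '//' wrappers; they are not the
# library's compiled defaults.
patterns = [
  { open:  /\A\#\s*tag::(?<tag>[\w-]+)(?:\[\])?/,
    close: /\A\#\s*end::(?<tag>[\w-]+)(?:\[\])?/ },
  { open:  %r{\A//\s*tag::(?<tag>[\w-]+)(?:\[\])?},
    close: %r{\A//\s*end::(?<tag>[\w-]+)(?:\[\])?} }
].freeze

# Hypothetical helper: return the tag name captured by the first pattern
# of the requested kind (:open or :close) that matches the line.
def first_tag(line, patterns, kind)
  patterns.each do |pattern|
    match = pattern[kind].match(line)
    return match[:tag] if match
  end
  nil
end

open_name  = first_tag('// tag::universal-license[]', patterns, :open)
close_name = first_tag('# end::universal-license[]', patterns, :close)
no_match   = first_tag('plain text', patterns, :open)

puts open_name.inspect
puts close_name.inspect
puts no_match.inspect
```

Because the array holds already-compiled Regexp objects, building it once and reusing it avoids recompiling the templates on every call.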

.comment_template_to_full_regex(comment_template, inner_regex) ⇒ String

Wrap a compiled inner-tag regex fragment with a comment-wrapper template.

`<tag_syntax>` in `comment_template` is replaced by `inner_regex`. Adjacent literal spaces around `<tag_syntax>` are compiled as `\s*`. The result is anchored to `\A`.

Parameters:

  • comment_template (String)

    e.g. `'<!-- <tag_syntax> -->'`

  • inner_regex (String)

    regex source from tag_template_to_inner_regex

Returns:

  • (String)

    full anchored regex source string



# File 'lib/sourcerer/sync/block_parser.rb', line 73

def self.comment_template_to_full_regex comment_template, inner_regex
  halves    = comment_template.split('<tag_syntax>', 2)
  left_raw  = halves[0]
  right_raw = halves[1].to_s
  left_trim  = left_raw.rstrip
  right_trim = right_raw.lstrip
  left_re  = Regexp.escape(left_trim) + (left_trim == left_raw ? '' : '\s*')
  right_re = (right_trim == right_raw ? '' : '\s*') + Regexp.escape(right_trim)
  "\\A#{left_re}#{inner_regex}#{right_re}"
end
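
To see what the whitespace handling and the `\A` anchor mean in practice, here is a small sketch that copies the method above into a standalone function so it runs without the gem. The `inner_open` string is written out by hand as an assumed inner fragment for the default opening tag.

```ruby
# Standalone copy of the wrapper helper shown above.
def comment_template_to_full_regex(comment_template, inner_regex)
  halves     = comment_template.split('<tag_syntax>', 2)
  left_raw   = halves[0]
  right_raw  = halves[1].to_s
  left_trim  = left_raw.rstrip
  right_trim = right_raw.lstrip
  # A space next to <tag_syntax> becomes \s*, so zero or more spaces match.
  left_re  = Regexp.escape(left_trim) + (left_trim == left_raw ? '' : '\s*')
  right_re = (right_trim == right_raw ? '' : '\s*') + Regexp.escape(right_trim)
  "\\A#{left_re}#{inner_regex}#{right_re}"
end

# Assumed inner fragment for the default opening tag, written by hand.
inner_open = 'tag::(?<tag>[\w-]+)(?:\[\])?'

html_re  = Regexp.new(comment_template_to_full_regex('<!-- <tag_syntax> -->', inner_open))
shell_re = Regexp.new(comment_template_to_full_regex('# <tag_syntax>', inner_open))

html_spaced  = html_re.match('<!-- tag::universal-nav[] -->')
html_compact = html_re.match('<!--tag::universal-nav[]-->') # no spaces at all
shell_hit    = shell_re.match('# tag::universal-nav[]')
indented     = shell_re.match('  # tag::universal-nav[]')   # leading text before '#'

puts html_spaced[:tag]
puts html_compact[:tag]
puts shell_hit[:tag]
puts indented.inspect
```

The last match fails because the pattern is anchored to the start of the string, so a marker must begin the string it is tested against.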

.extract_canonical(segments, canonical_prefix: DEFAULT_CANONICAL_PREFIX) ⇒ Hash{String => Block}

Extract all canonical blocks (those whose tag name starts with `canonical_prefix`) as a Hash keyed by tag name.

Because parse already filters for canonical blocks when given the same `canonical_prefix`, this method is largely a deduplication check. It raises ParseError if more than one canonical block carries the same tag name, which would make synchronization ambiguous.

Parameters:

  • segments (Array<TextSegment, Block>)
  • canonical_prefix (String) (defaults to: DEFAULT_CANONICAL_PREFIX)

    Prefix that identifies managed blocks

Returns:

  • (Hash{String => Block})


# File 'lib/sourcerer/sync/block_parser.rb', line 230

def self.extract_canonical segments, canonical_prefix: DEFAULT_CANONICAL_PREFIX
  result = {}
  segments.each do |s|
    next unless s.is_a?(Block) && s.tag.start_with?(canonical_prefix)

    raise ParseError, "Duplicate canonical block '#{s.tag}'" if result.key?(s.tag)

    result[s.tag] = s
  end
  result
end
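
The deduplication behavior can be sketched in isolation. The example below uses stand-in classes (`SketchBlock`, `SketchText`, `SketchError`) whose fields follow the keyword arguments seen in the parse listing; the real Block, TextSegment, and ParseError classes may differ. `make_block` is a hypothetical convenience helper.

```ruby
# Stand-ins for the library's classes; the real definitions may differ.
SketchBlock = Struct.new(:tag, :open_line, :content, :close_line, keyword_init: true)
SketchText  = Struct.new(:content, keyword_init: true)
SketchError = Class.new(StandardError)

# Same logic as the method above, applied to the stand-in classes.
def extract_canonical_sketch(segments, canonical_prefix: 'universal-')
  segments.each_with_object({}) do |segment, result|
    next unless segment.is_a?(SketchBlock) && segment.tag.start_with?(canonical_prefix)
    raise SketchError, "Duplicate canonical block '#{segment.tag}'" if result.key?(segment.tag)

    result[segment.tag] = segment
  end
end

# Hypothetical helper that builds a block with '#'-style marker lines.
def make_block(tag, body)
  SketchBlock.new(tag: tag, open_line: "# tag::#{tag}[]\n",
                  content: body, close_line: "# end::#{tag}[]\n")
end

segments = [
  SketchText.new(content: "intro\n"),
  make_block('universal-header', "A\n"),
  make_block('universal-footer', "B\n")
]

canonical = extract_canonical_sketch(segments)
puts canonical.keys.inspect

# A second block with an existing tag name triggers the error.
duplicate_message = nil
begin
  extract_canonical_sketch(segments + [make_block('universal-header', "C\n")])
rescue SketchError => e
  duplicate_message = e.message
end
puts duplicate_message
```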

.parse(text, canonical_prefix: DEFAULT_CANONICAL_PREFIX, tag_syntax_start: DEFAULT_TAG_SYNTAX_START, tag_syntax_end: DEFAULT_TAG_SYNTAX_END, comment_syntax_patterns: DEFAULT_COMMENT_SYNTAX_PATTERNS, tag_patterns: nil) ⇒ Array<TextSegment, Block>

Parse a text string into an array of TextSegment and Block objects.

The result is ordered and reconstructable: joining every element's serialized form reproduces the original text character for character.

Only blocks whose tag name starts with `canonical_prefix` are parsed as proper Block objects; all other tag markers (open and close) are treated as ordinary text. This makes the parser robust against files that use tag markers for unrelated purposes (e.g. AsciiDoc `include::` target regions or non-canonical project sections), regardless of whether those regions are properly closed or even nested.

When a canonical block is open, every line, including any inner tag markers, is treated as content until the matching close marker appears. Canonical blocks therefore cannot be nested.

Parameters:

  • text (String)

    Full text of the file to parse

  • canonical_prefix (String) (defaults to: DEFAULT_CANONICAL_PREFIX)

    Only tags starting with this prefix are parsed as managed Block objects (default DEFAULT_CANONICAL_PREFIX).

  • tag_syntax_start (String) (defaults to: DEFAULT_TAG_SYNTAX_START)

    Opening tag template; used to build patterns when `tag_patterns:` is not given (default DEFAULT_TAG_SYNTAX_START).

  • tag_syntax_end (String) (defaults to: DEFAULT_TAG_SYNTAX_END)

    Closing tag template (default DEFAULT_TAG_SYNTAX_END).

  • comment_syntax_patterns (Array<String>) (defaults to: DEFAULT_COMMENT_SYNTAX_PATTERNS)

    Comment-wrapper templates (default DEFAULT_COMMENT_SYNTAX_PATTERNS).

  • tag_patterns (Array<Hash>, nil) (defaults to: nil)

    Pre-compiled pattern set; skips template compilation when provided. Build once with build_tag_patterns and reuse.

Returns:

  • (Array<TextSegment, Block>)

Raises:

  • (ParseError)

    if a canonical tag is opened but never closed.



# File 'lib/sourcerer/sync/block_parser.rb', line 144

def self.parse text,
  canonical_prefix: DEFAULT_CANONICAL_PREFIX,
  tag_syntax_start: DEFAULT_TAG_SYNTAX_START,
  tag_syntax_end: DEFAULT_TAG_SYNTAX_END,
  comment_syntax_patterns: DEFAULT_COMMENT_SYNTAX_PATTERNS,
  tag_patterns: nil
  patterns = tag_patterns ||
             build_tag_patterns(tag_syntax_start, tag_syntax_end, comment_syntax_patterns)
  lines = text.lines
  segments = []
  text_acc = []
  block_state = nil # nil or { tag:, open_line:, content_lines: [] }

  lines.each do |line|
    stripped = line.chomp

    if block_state.nil?
      tag = detect_open_tag(stripped, patterns)
      if tag&.start_with?(canonical_prefix)
        segments << TextSegment.new(content: text_acc.join) unless text_acc.empty?
        text_acc = []
        block_state = { tag: tag, open_line: line, content_lines: [] }
      else
        # Non-canonical open tags and all close tags at the top level are
        # treated as ordinary text.
        text_acc << line
      end
    else
      close_tag = detect_close_tag(stripped, patterns)
      if close_tag == block_state[:tag]
        segments << Block.new(
          tag: block_state[:tag],
          open_line: block_state[:open_line],
          content: block_state[:content_lines].join,
          close_line: line)
        block_state = nil
      else
        # Nested open tags or mismatched close tags: treat as block content
        block_state[:content_lines] << line
      end
    end
  end

  raise ParseError, "Unclosed canonical tag '#{block_state[:tag]}'" if block_state

  segments << TextSegment.new(content: text_acc.join) unless text_acc.empty?
  segments
end
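
The canonical-only rule and the round-trip property can be tried out with a simplified stand-in. This sketch is not the library's parser: it recognizes only `#`-style markers, returns plain hashes instead of TextSegment and Block objects, raises ArgumentError in place of ParseError, and assumes a block serializes as its open line, content, and close line. The names `parse_sketch`, `tag_of`, and `serialize` are hypothetical.

```ruby
# Simplified patterns: only '# tag::name[]' and '# end::name[]' are recognized.
OPEN_RE  = /\A\#\s*tag::(?<tag>[\w-]+)(?:\[\])?/
CLOSE_RE = /\A\#\s*end::(?<tag>[\w-]+)(?:\[\])?/

# Return the captured tag name, or nil when the line is not a marker.
def tag_of(regex, line)
  match = regex.match(line)
  match && match[:tag]
end

def parse_sketch(text, canonical_prefix: 'universal-')
  segments   = []
  text_acc   = []
  open_block = nil

  text.lines.each do |line|
    stripped = line.chomp
    if open_block.nil?
      tag = tag_of(OPEN_RE, stripped)
      if tag&.start_with?(canonical_prefix)
        segments << { text: text_acc.join } unless text_acc.empty?
        text_acc = []
        open_block = { tag: tag, open_line: line, content: +'' }
      else
        text_acc << line # non-canonical markers stay ordinary text
      end
    elsif tag_of(CLOSE_RE, stripped) == open_block[:tag]
      segments << open_block.merge(close_line: line)
      open_block = nil
    else
      open_block[:content] << line # inner markers are block content
    end
  end

  raise ArgumentError, "Unclosed canonical tag '#{open_block[:tag]}'" if open_block

  segments << { text: text_acc.join } unless text_acc.empty?
  segments
end

# Assumed serialization: open line + content + close line for blocks.
def serialize(segment)
  segment[:text] || segment.values_at(:open_line, :content, :close_line).join
end

source = <<~TEXT
  # tag::notes[]
  not managed
  # end::notes[]
  # tag::universal-ci[]
  # tag::inner[]
  managed
  # end::universal-ci[]
  trailing
TEXT

segments  = parse_sketch(source)
roundtrip = segments.map { |segment| serialize(segment) }.join
puts segments.length
puts roundtrip == source
```

In this input the `notes` region is kept as plain text, while the `inner` marker ends up inside the content of the `universal-ci` block.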

.tag_template_to_inner_regex(template) ⇒ String

Compile a tag marker template string into a plain regex fragment (no `\A` anchor).

`<tagged_block_name>` is replaced with the `(?<tag>[\w-]+)` named capture group. A trailing `[]` in the template becomes `(?:\[\])?` (optional literal brackets).

Parameters:

  • template (String)

    e.g. `'tag::<tagged_block_name>[]'`

Returns:

  • (String)

    regex source string



# File 'lib/sourcerer/sync/block_parser.rb', line 56

def self.tag_template_to_inner_regex template
  parts  = template.split('<tagged_block_name>', 2)
  left   = Regexp.escape(parts[0])
  right  = parts[1].to_s
  suffix = right == '[]' ? '(?:\[\])?' : Regexp.escape(right)
  "#{left}(?<tag>[\\w-]+)#{suffix}"
end
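
As a quick check of the two suffix cases, the sketch below copies the method above into a standalone function and compiles the default template alongside a hypothetical custom one (`'BEGIN <tagged_block_name> >>>'`, invented here for illustration). It is meant as an illustration under those assumptions, not as documented usage.

```ruby
# Standalone copy of the helper shown above, so the sketch runs without the gem.
def tag_template_to_inner_regex(template)
  parts  = template.split('<tagged_block_name>', 2)
  left   = Regexp.escape(parts[0])
  right  = parts[1].to_s
  # Only an exact '[]' suffix is made optional; anything else is escaped literally.
  suffix = right == '[]' ? '(?:\[\])?' : Regexp.escape(right)
  "#{left}(?<tag>[\\w-]+)#{suffix}"
end

default_src = tag_template_to_inner_regex('tag::<tagged_block_name>[]')
default_re  = Regexp.new(default_src)

# The trailing [] is optional, and the block name is captured as :tag.
with_brackets    = default_re.match('tag::universal-header[]')
without_brackets = default_re.match('tag::universal-header')

# Hypothetical custom template with a literal suffix instead of [].
custom_re      = Regexp.new(tag_template_to_inner_regex('BEGIN <tagged_block_name> >>>'))
custom_hit     = custom_re.match('BEGIN universal-footer >>>')
custom_partial = custom_re.match('BEGIN universal-footer')

puts default_src
puts with_brackets[:tag]
puts without_brackets[:tag]
puts custom_hit[:tag]
puts custom_partial.inspect
```

With a custom suffix the literal text is required, so the last match returns nil; only the exact `[]` suffix gets the optional treatment.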