Module: Rigor::Builtins::RegexRefinement

Defined in:
lib/rigor/builtins/regex_refinement.rb

Overview

Maps a curated table of canonical regex sub-patterns onto the imported refinement carriers Rigor already ships (`decimal-int-string`, `hex-int-string`, `octal-int-string`, `lowercase-string`, `uppercase-string`, `numeric-string`). See `docs/type-specification/imported-built-in-types.md` for the registry the refinements come from, and `docs/MILESTONES.md` § “v0.1.1 — Planned” Track 1 slice 1 for the binding scope of this recogniser.

The intended consumer is `Inference::Narrowing.analyse_match_write`. Given `if /(?<year>\d+)/ =~ str; year; end`, the v0.1.0 baseline narrows `year` to plain `String`. v0.1.1 introspects the regex source and narrows further to `decimal-int-string` whenever the named-capture body matches one of the rows in RULES.
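A plain-Ruby sketch of the consumer side follows. `baseline_capture` shows the local-variable binding that `=~` performs when the regex literal is on the left, which is the plain `String` the baseline narrows to. `named_capture_bodies` is a hypothetical helper, not part of Rigor, that pulls each simple named-capture body out of `Regexp#source` to show which text would reach the recogniser. How `analyse_match_write` actually obtains the body is not shown on this page.

```ruby
# Plain-Ruby illustration; neither method below is part of Rigor.

# With the regex literal on the left of =~, Ruby binds each named capture to
# a local variable. Here `year` is a String, or the `if` yields nil on no match.
def baseline_capture(str)
  if /(?<year>\d+)/ =~ str
    year
  end
end

# Hypothetical helper: "name => body" for each named capture whose body has
# no parentheses. Bodies containing parentheses of any kind (nested groups,
# escaped parens) are outside what this sketch handles.
NAMED_CAPTURE = /\(\?<([A-Za-z_]\w*)>([^()]*)\)/

def named_capture_bodies(regex)
  # Regexp#source keeps the pattern text as written, backslashes included.
  regex.source.scan(NAMED_CAPTURE).to_h
end

named_capture_bodies(/(?<year>\d+)-(?<month>\d{1,2})/)
# returns { "year" => "\\d+", "month" => "\\d{1,2}" }
```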

Recognised body shapes (each row admits the `+` quantifier and the bounded `{n}` / `{n,m}` forms with `n >= 1` and `m >= n`):

- `\d`                     -> decimal-int-string
- `\h`                     -> hex-int-string
- `[0-9a-fA-F]`            -> hex-int-string
- `[0-9a-f]`, `[0-9A-F]`   -> hex-int-string
- `[0-7]`                  -> octal-int-string
- `[a-z]`                  -> lowercase-string
- `[A-Z]`                  -> uppercase-string
- `[[:digit:]]`            -> numeric-string

Anything outside the table returns `nil`, so the calling narrowing site falls back to its previous behaviour (plain `String`). Arbitrary regex semantic equivalence is undecidable, so the table is intentionally a small audited set of canonical shapes rather than a general equivalence checker.
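A minimal sketch of a shape table in the spirit of `RULES`, assuming each row is an anchored regex over the body text and that one of the quantifiers `+`, `{n}` or `{n,m}` is required. `SHAPE_RULES`, `QUANTIFIER`, `classify_shape` and the result symbols are hypothetical names; in the real module the last element of each row is passed to `Type::Combinator.public_send`, and the exact patterns are not shown on this page.

```ruby
# Hypothetical stand-in for RULES: [shape pattern, carrier name] pairs.
# Each pattern matches the literal text of a capture body, e.g. '\d+'.
QUANTIFIER = /(?:\+|\{\d+(?:,\d+)?\})/

SHAPE_RULES = [
  [/\A\\d#{QUANTIFIER}\z/,                :decimal_int_string],
  [/\A\\h#{QUANTIFIER}\z/,                :hex_int_string],
  [/\A\[0-9a-fA-F\]#{QUANTIFIER}\z/,      :hex_int_string],
  [/\A\[0-9(?:a-f|A-F)\]#{QUANTIFIER}\z/, :hex_int_string], # [0-9a-f] / [0-9A-F]
  [/\A\[0-7\]#{QUANTIFIER}\z/,            :octal_int_string],
  [/\A\[a-z\]#{QUANTIFIER}\z/,            :lowercase_string],
  [/\A\[A-Z\]#{QUANTIFIER}\z/,            :uppercase_string],
  [/\A\[\[:digit:\]\]#{QUANTIFIER}\z/,    :numeric_string]
].freeze

# Returns the carrier symbol for the first matching row, or nil otherwise.
# Checking the {n,m} bounds themselves is a separate step.
def classify_shape(body)
  return nil if body.nil? || body.empty?

  rule = SHAPE_RULES.find { |pattern, _| pattern.match?(body) }
  rule&.last
end
```

Because the rows are literal shapes rather than an equivalence check, a body such as `\w+` or `[a-z]*` simply matches no row and yields `nil`.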


Class Method Details

.for_capture_body(body) ⇒ Rigor::Type?

Returns the matching imported refinement carrier, or `nil` if `body` is not a recognised shape.

Parameters:

  • body (String, nil)

    a regex sub-pattern, typically the inner body of a `(?<name>body)` named capture. Anchors (`\A`, `\z`, `^`, `$`) are not stripped: the recogniser table targets bodies that the regex engine already treats as anchored to the capture group's bounds.

Returns:

  • (Rigor::Type, nil)

    the matching imported refinement carrier, or `nil` if `body` is not a recognised shape.



# File 'lib/rigor/builtins/regex_refinement.rb', line 75

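def for_capture_body(body)
  # Empty or missing bodies can never be refined.
  return nil if body.nil? || body.empty?

  # First row in RULES whose shape pattern matches the body, if any.
  rule = RULES.find { |pattern, _| pattern.match?(body) }
  return nil if rule.nil?
  # A recognised shape can still carry an unusable {n,m} bound.
  return nil unless valid_bounds?(body)

  # The row's last element names the Type::Combinator factory to call.
  Type::Combinator.public_send(rule.last)
end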
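The note on anchors in the `body` parameter can be checked in plain Ruby, with no Rigor code involved: whatever surrounds a named group, the substring it records satisfies the group body from end to end, so an unanchored body such as `\d+` already describes the whole capture. `captured` is a hypothetical helper used only for this illustration.

```ruby
# Plain-Ruby illustration; `captured` is a hypothetical helper, not a Rigor API.
def captured(regex, str, name)
  m = regex.match(str)
  m && m[name]
end

# The surrounding pattern is unanchored and has text on both sides of the group.
year = captured(/born in (?<year>\d+)!/, "she was born in 1987!", :year)

# Still, the captured text matches the body anchored at both ends.
puts(/\A\d+\z/.match?(year)) # prints true
```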

.valid_bounds?(body) ⇒ Boolean

Filters the bounded-quantifier forms down to those whose lower bound is at least 1 and whose upper bound (if any) is at least the lower bound. Without this check, `\d{0,5}` would be accepted even though it admits the empty string, which is not a valid `decimal-int-string`.

Returns:

  • (Boolean)


# File 'lib/rigor/builtins/regex_refinement.rb', line 90

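def valid_bounds?(body)
  m = BOUND_RE.match(body)
  # No {n} / {n,m} quantifier: nothing to check.
  return true if m.nil?

  # Parse in base 10: Integer("08") alone raises, and Integer("010") reads as octal 8.
  low = Integer(m[1], 10)
  return false if low < 1

  high = m[2] && Integer(m[2], 10)
  return true if high.nil?

  low <= high
end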
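For experimenting with the bound rules outside Rigor, here is a standalone sketch that mirrors `valid_bounds?`. `BOUND_RE` itself is not shown on this page, so `BOUND_SKETCH` is an assumed stand-in that reads a trailing `{n}` or `{n,m}`, and `bounds_ok?` is a hypothetical name. The bounds are parsed in base 10 so that a bound written with a leading zero is not read as octal.

```ruby
# Hypothetical stand-in for BOUND_RE: captures the lower bound and the
# optional upper bound of a trailing {n} or {n,m} quantifier.
BOUND_SKETCH = /\{(\d+)(?:,(\d+))?\}\z/

# Bodies without a bounded quantifier pass through; otherwise the lower
# bound must be at least 1 and no greater than the upper bound.
def bounds_ok?(body)
  m = BOUND_SKETCH.match(body)
  return true if m.nil?

  low = Integer(m[1], 10)
  return false if low < 1

  high = m[2] && Integer(m[2], 10)
  high.nil? || low <= high
end
```

With this sketch, `\d{1,5}` and `\d{3}` pass, while `\d{0,5}` (admits the empty string) and `\d{5,2}` (empty range) are rejected.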