Module: Iriq::Specificity
- Defined in: lib/iriq/specificity.rb
Overview
Per-Recognizer claim strength. When multiple Recognizers fire on the same segment, the ensemble picks the claim with the highest specificity × confidence, so higher specificity wins at equal confidence.
The bands below capture the current type taxonomy at a coarse grain; they are explicitly NOT linear “how confident” scores. They encode “how surprising would it be for this Recognizer to fire by accident on a different actual type.” UUID’s shape is so distinctive that a non-UUID producing that string is vanishingly unlikely (SEMANTIC); a 4-digit integer could plausibly be a year, an HTTP status, or an ID, so `:integer` claims only TYPED.
The calibration corpus tests in spec/iriq/calibration_spec.rb (Ruby) and calibration_test.go (Go) are the source of truth for whether these values are well-chosen; adjust the values and re-run the tests to validate.
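The selection rule described in this overview can be sketched as follows. This is a minimal illustration, not the library's implementation: the `Claim` struct, its fields, and `pick_winner` are hypothetical names introduced here, and the confidence values are made up for the example.

```ruby
# Hypothetical claim record; not part of the documented Iriq API.
Claim = Struct.new(:type, :specificity, :confidence) do
  # Combined score the ensemble is described as maximizing.
  def score
    specificity * confidence
  end
end

# Return the claim with the highest specificity * confidence.
def pick_winner(claims)
  claims.max_by(&:score)
end

# Two Recognizers firing on the same segment (illustrative numbers).
claims = [
  Claim.new(:integer, 0.5, 1.0),   # TYPED band, full confidence
  Claim.new(:timestamp, 0.7, 0.9)  # BOUNDED band, slightly lower confidence
]

winner = pick_winner(claims)
puts winner.type
```

In this sketch the `:timestamp` claim wins even though its confidence is lower, because its band carries more weight than the TYPED band of `:integer`.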
Constant Summary
- SEMANTIC = 1.0
Unambiguous semantic shapes: the regex effectively can’t fire by accident. (UUID, JWT, email with @, URL with ://, color hex.)
- STRUCTURED = 0.8
Restrictive structured patterns. Could collide with broader types at edges. (date, file with known ext, ipv4, mime.)
- BOUNDED = 0.7
Digit-shaped with an additional bound (range or allowlist) that makes the shape alone meaningful. (timestamp, currency, country, boolean.)
- TYPED = 0.5
Lexically broad but typed. (integer, float, version.)
- PATTERN = 0.3
Generic pattern-based shape. (slug.)
- FALLBACK = 0.1
Generic fallback shapes. (literal, opaque_id.)
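As a rough illustration of how the bands might map onto recognizer types, the sketch below mirrors the documented values in a local module. The module name `SpecificitySketch`, the `BY_TYPE` table, the `specificity_for` helper, and every type symbol other than `:integer` are assumptions made for this example; the library may attach specificity to Recognizers in a different way.

```ruby
# Local mirror of the documented values under a hypothetical module name,
# so it does not clash with the real Iriq::Specificity.
module SpecificitySketch
  SEMANTIC   = 1.0
  STRUCTURED = 0.8
  BOUNDED    = 0.7
  TYPED      = 0.5
  PATTERN    = 0.3
  FALLBACK   = 0.1

  # Hypothetical lookup built from the examples listed in the constant docs.
  BY_TYPE = {
    uuid: SEMANTIC, jwt: SEMANTIC,
    date: STRUCTURED, ipv4: STRUCTURED,
    timestamp: BOUNDED, currency: BOUNDED,
    integer: TYPED, float: TYPED, version: TYPED,
    slug: PATTERN,
    literal: FALLBACK, opaque_id: FALLBACK
  }.freeze

  # Assumption: unknown types get the weakest band.
  def self.specificity_for(type)
    BY_TYPE.fetch(type, FALLBACK)
  end
end

puts SpecificitySketch.specificity_for(:uuid)
puts SpecificitySketch.specificity_for(:integer)
```

Defaulting unknown types to FALLBACK is a design choice of this sketch, chosen so that an unrecognized type can never outrank a deliberately banded one.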