Module: Woods::Console::SqlTableScanner
- Defined in:
- lib/woods/console/sql_table_scanner.rb
Overview
Extracts table and schema-qualified identifiers from a SQL string.
Handles both JOIN-style and ANSI-89 comma-join syntax across MySQL and PostgreSQL quoting styles (‘backtick`, `“double”`, bare). Schema-qualified identifiers (`schema.table`, `“schema”.“table”`, “ `db`.`table` “) are returned as `schema.table` strings so callers can compare against either the bare or qualified form.
Noise (comments, string literals, dollar-quoted bodies) is stripped via SqlNoiseStripper before scanning so that identifiers embedded in literal content are never surfaced.
All methods are module-level and stateless — pass a SQL string in, receive an array of identifier strings out.
Constant Summary collapse
- JOIN_REFERENCE =
Matches a JOIN token followed by its target identifier. The identifier may be schema-qualified in any quoting style — ‘“schema”.“table”`, “ `db`.`table` “, bare `schema.table`, or the mixed `schema.“table”` / “ schema.`table` “ forms — and the optional schema prefix is captured separately so callers can compare against either the bare or qualified configured form. An optional `ONLY` keyword (PostgreSQL inheritance opt-out) is consumed before the identifier so it does not hide the table name. ANSI-89 comma joins are handled separately — see FROM_CLAUSE.
/ \b(?:STRAIGHT_)?JOIN\s+ (?:ONLY\s+)? (?: (?: `(?<jschema_bt>[^`]+)` | "(?<jschema_dq>[^"]+)" | (?<jschema_bare>\w+) ) \. )? (?: `(?<backtick>[^`]+)` | "(?<double>[^"]+)" | (?<bare>\w+(?:\.\w+)?) ) /xi- FROM_CLAUSE =
Matches a FROM clause and captures its body up to the next clause terminator. The body may be a single table or a comma-joined list.
An inner ‘FROM` is also a terminator — this is H-3 of the bypass series. Without it, a FROM-clause subquery like `FROM (SELECT * FROM blocked) AS a` would be swallowed by the outer clause’s ‘.+?` match, and the inner `FROM blocked` would never be re-scanned because `.scan` advances past consumed input. Treating every `FROM` as its own independent scan match is what keeps CTEs, UNIONs, and nested subqueries in coverage.
/ \bFROM\s+ (?<clause>.+?) (?= \b(?:WHERE|GROUP|HAVING|ORDER|LIMIT|OFFSET|UNION|INTERSECT|EXCEPT| STRAIGHT_JOIN|JOIN|INNER|OUTER|LEFT|RIGHT|FULL|CROSS|FROM)\b | [;)] | \z ) /xim- LEAD_IDENT =
Matches a leading table identifier at the start of a FROM-list chunk. The identifier may carry an optional schema prefix in any quoting style — ‘“schema”.“table”`, “ `db`.`table` “, or the mixed `schema.“table”` / “ schema.`table` “ form — captured separately so callers can match against bare or qualified configured forms.
/ \A (?: (?: `(?<schema_bt>[^`]+)` | "(?<schema_dq>[^"]+)" | (?<schema_bare>\w+) ) \. )? (?: `(?<backtick>[^`]+)` | "(?<double>[^"]+)" | (?<bare>\w+(?:\.\w+)?) ) /xi- ONLY_PREFIX =
PostgreSQL ONLY keyword that appears between FROM and the table identifier. Strip it so the lead-identifier regex sees the table directly. Anchored with ‘A` because callers strip leading whitespace first via #strip.
/\AONLY\s+/i
Class Method Summary collapse
-
.identifiers_in(sql) ⇒ Array<String>
Returns every table/schema-qualified identifier referenced in the SQL string.
Class Method Details
.identifiers_in(sql) ⇒ Array<String>
Returns every table/schema-qualified identifier referenced in the SQL string. Noise (comments, string literals, dollar-quoted bodies) is stripped before scanning. Both JOIN-style and ANSI-89 comma-join syntax are handled.
115 116 117 118 119 120 121 122 123 |
# File 'lib/woods/console/sql_table_scanner.rb', line 115 def self.identifiers_in(sql) return [] if sql.nil? || sql.empty? stripped = strip_noise(sql) results = [] collect_join_identifiers(stripped, results) collect_from_identifiers(stripped, results) results end |