Module: Woods::Console::SqlTableScanner

Defined in:
lib/woods/console/sql_table_scanner.rb

Overview

Extracts table and schema-qualified identifiers from a SQL string.

Handles both JOIN-style and ANSI-89 comma-join syntax across MySQL and PostgreSQL quoting styles (‘backtick`, `“double”`, bare). Schema-qualified identifiers (`schema.table`, `“schema”.“table”`, “ `db`.`table` “) are returned as `schema.table` strings so callers can compare against either the bare or qualified form.

Noise (comments, string literals, dollar-quoted bodies) is stripped via SqlNoiseStripper before scanning so that identifiers embedded in literal content are never surfaced.

All methods are module-level and stateless — pass a SQL string in, receive an array of identifier strings out.

Examples:

SqlTableScanner.identifiers_in('SELECT * FROM users JOIN orders ON ...')
# => ["users", "orders"]

SqlTableScanner.identifiers_in('SELECT * FROM "audit"."events"')
# => ["audit.events"]

Constant Summary collapse

JOIN_REFERENCE =

Matches a JOIN token followed by its target identifier. The identifier may be schema-qualified in any quoting style — ‘“schema”.“table”`, “ `db`.`table` “, bare `schema.table`, or the mixed `schema.“table”` / “ schema.`table` “ forms — and the optional schema prefix is captured separately so callers can compare against either the bare or qualified configured form. An optional `ONLY` keyword (PostgreSQL inheritance opt-out) is consumed before the identifier so it does not hide the table name. ANSI-89 comma joins are handled separately — see FROM_CLAUSE.

/
  \b(?:STRAIGHT_)?JOIN\s+
  (?:ONLY\s+)?
  (?:
    (?:
      `(?<jschema_bt>[^`]+)` |
      "(?<jschema_dq>[^"]+)" |
      (?<jschema_bare>\w+)
    )
    \.
  )?
  (?:
    `(?<backtick>[^`]+)` |
    "(?<double>[^"]+)"   |
    (?<bare>\w+(?:\.\w+)?)
  )
/xi
FROM_CLAUSE =

Matches a FROM clause and captures its body up to the next clause terminator. The body may be a single table or a comma-joined list.

An inner ‘FROM` is also a terminator — this is H-3 of the bypass series. Without it, a FROM-clause subquery like `FROM (SELECT * FROM blocked) AS a` would be swallowed by the outer clause’s ‘.+?` match, and the inner `FROM blocked` would never be re-scanned because `.scan` advances past consumed input. Treating every `FROM` as its own independent scan match is what keeps CTEs, UNIONs, and nested subqueries in coverage.

/
  \bFROM\s+
  (?<clause>.+?)
  (?=
    \b(?:WHERE|GROUP|HAVING|ORDER|LIMIT|OFFSET|UNION|INTERSECT|EXCEPT|
         STRAIGHT_JOIN|JOIN|INNER|OUTER|LEFT|RIGHT|FULL|CROSS|FROM)\b
    | [;)]
    | \z
  )
/xim
LEAD_IDENT =

Matches a leading table identifier at the start of a FROM-list chunk. The identifier may carry an optional schema prefix in any quoting style — ‘“schema”.“table”`, “ `db`.`table` “, or the mixed `schema.“table”` / “ schema.`table` “ form — captured separately so callers can match against bare or qualified configured forms.

/
  \A
  (?:
    (?:
      `(?<schema_bt>[^`]+)` |
      "(?<schema_dq>[^"]+)" |
      (?<schema_bare>\w+)
    )
    \.
  )?
  (?:
    `(?<backtick>[^`]+)` |
    "(?<double>[^"]+)"   |
    (?<bare>\w+(?:\.\w+)?)
  )
/xi
ONLY_PREFIX =

PostgreSQL ONLY keyword that appears between FROM and the table identifier. Strip it so the lead-identifier regex sees the table directly. Anchored with ‘A` because callers strip leading whitespace first via #strip.

/\AONLY\s+/i

Class Method Summary collapse

Class Method Details

.identifiers_in(sql) ⇒ Array<String>

Returns every table/schema-qualified identifier referenced in the SQL string. Noise (comments, string literals, dollar-quoted bodies) is stripped before scanning. Both JOIN-style and ANSI-89 comma-join syntax are handled.

Parameters:

  • sql (String, nil)

    the SQL string to scan

Returns:

  • (Array<String>)

    identifiers in the order they were encountered; may contain duplicates if the same table is referenced multiple times



115
116
117
118
119
120
121
122
123
# File 'lib/woods/console/sql_table_scanner.rb', line 115

def self.identifiers_in(sql)
  return [] if sql.nil? || sql.empty?

  stripped = strip_noise(sql)
  results = []
  collect_join_identifiers(stripped, results)
  collect_from_identifiers(stripped, results)
  results
end