Class: RailsAiContext::Tools::Query

Inherits:
BaseTool
  • Object
show all
Defined in:
lib/rails_ai_context/tools/query.rb

Defined Under Namespace

Classes: ResultProxy

Constant Summary collapse

BLOCKED_KEYWORDS =

── Layer 1: SQL validation ─────────────────────────────────────

/\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE|GRANT|REVOKE|SET|COPY|MERGE|REPLACE)\b/i
BLOCKED_CLAUSES =
/\bFOR\s+(UPDATE|SHARE|NO\s+KEY\s+UPDATE)\b/i
BLOCKED_SHOWS =
/\bSHOW\s+(GRANTS|PROCESSLIST|BINLOG|SLAVE|MASTER|REPLICAS)\b/i
SELECT_INTO =
/\bSELECT\b[^;]*\bINTO\b/i
MULTI_STATEMENT =
/;\s*\S/
ALLOWED_PREFIX =
/\A\s*(SELECT|WITH|SHOW|EXPLAIN|DESCRIBE|DESC)\b/i
BLOCKED_FUNCTIONS =

SELECT-callable functions that give the caller a filesystem / network exfiltration primitive even though the query is technically a SELECT. These pass ‘SET TRANSACTION READ ONLY` because they’re reads from the DB engine’s perspective, but they bypass the gem’s ‘sensitive_patterns` file allowlist entirely by pivoting through the database process.

Postgres: pg_read_file / pg_read_binary_file / pg_ls_dir / pg_stat_file

(file read), lo_import / lo_export (large-object I/O),
dblink* (cross-db exfiltration), COPY ... FROM/TO PROGRAM
(arbitrary command execution when COPY permissions permit).

MySQL: LOAD_FILE (scalar file read outside SELECT INTO contexts). SQLite: load_extension (shared-library load — disabled by default

but harden in defense).
/\b(
  pg_read_binary_file | pg_read_file |
  pg_ls_dir | pg_ls_logdir | pg_ls_tmpdir | pg_ls_waldir | pg_ls_archive_statusdir |
  pg_stat_file | pg_file_settings | pg_current_logfile |
  lo_import | lo_export |
  dblink[a-z_]* |
  LOAD\s+DATA |
  load_file | load_extension
)\b/ix
BLOCKED_OUTPUT =

MySQL ‘SELECT … INTO OUTFILE ’path’‘ / `INTO DUMPFILE ’path’‘ are caught by SELECT_INTO already, but make an explicit pattern so the error message is accurate.

/\bINTO\s+(OUTFILE|DUMPFILE)\b/i
SENSITIVE_COLUMN_SUFFIXES =

Defense against the column-aliasing redaction bypass:

SELECT password_digest AS x FROM users       -- bypasses result.columns redaction
SELECT substring(password_digest, 1, 60) ... -- column name becomes "substring"
SELECT md5(session_data) FROM sessions       -- column name becomes "md5"

Post-execution redaction operates on the column names the DB returns, which the caller controls via aliases and expressions. The only defense that works is to reject queries that TEXTUALLY reference any sensitive column, before execution. Users who need to query non-sensitive columns with a similar name can subtract from ‘config.query_redacted_columns` in an initializer.

%w[
  password_digest password_hash encrypted_password
  password_reset_token confirmation_token unlock_token
  remember_token reset_password_token api_key api_secret
  access_token refresh_token jti otp_secret session_data
  secret_key secret private_key
].freeze
TAUTOLOGY_PATTERNS =

SQL injection tautology patterns: OR 1=1, OR true, OR ”=”, UNION SELECT, etc.

[
  /\bOR\s+1\s*=\s*1\b/i,
  /\bOR\s+true\b/i,
  /\bOR\s+'[^']*'\s*=\s*'[^']*'/i,
  /\bOR\s+"[^"]*"\s*=\s*"[^"]*"/i,
  /\bOR\s+\d+\s*=\s*\d+/i,
  /\bUNION\s+(ALL\s+)?SELECT\b/i
].freeze
HARD_ROW_CAP =
1000

Constants inherited from BaseTool

BaseTool::SESSION_CONTEXT, BaseTool::SHARED_CACHE

Class Method Summary collapse

Methods inherited from BaseTool

abstract!, abstract?, cache_key, cached_context, config, extract_method_source_from_file, extract_method_source_from_string, find_closest_match, fuzzy_find_key, inherited, not_found_response, paginate, rails_app, registered_tools, reset_all_caches!, reset_cache!, session_queries, session_record, session_reset!, set_call_params, text_response

Class Method Details

.call(sql: nil, limit: nil, format: "table", explain: false, server_context: nil, **_extra) ⇒ Object



111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
# File 'lib/rails_ai_context/tools/query.rb', line 111

def self.call(sql: nil, limit: nil, format: "table", explain: false, server_context: nil, **_extra)
  set_call_params(sql: sql&.truncate(60))
  # ── Environment guard ───────────────────────────────────────
  unless config.allow_query_in_production || !Rails.env.production?
    return text_response(
      "rails_query is disabled in production for data privacy. " \
      "Set config.allow_query_in_production = true to override."
    )
  end

  # ── ActiveRecord guard (api-only apps) ──────────────────────
  # Must come BEFORE any code that rescues ActiveRecord::* — Ruby
  # resolves rescue class constants at raise time, and `rescue
  # ActiveRecord::ConnectionNotEstablished` crashes with NameError
  # on apps where ActiveRecord is not loaded (e.g.
  # `rails new --api --skip-active-record`).
  unless defined?(ActiveRecord::Base)
    return text_response(
      "Database queries are unavailable: ActiveRecord is not loaded in this app. " \
      "This happens on API-only apps created with `rails new --api --skip-active-record`. " \
      "rails_query requires a database connection to function."
    )
  end

  # ── Layer 1: SQL validation ─────────────────────────────────
  valid, error = validate_sql(sql)
  return text_response(error) unless valid

  # ── EXPLAIN mode ────────────────────────────────────────────
  if explain
    return execute_explain(sql.strip, config.query_timeout)
  end

  # Resolve row limit
  row_limit = limit ? [ limit.to_i, HARD_ROW_CAP ].min : config.query_row_limit
  row_limit = [ row_limit, 1 ].max
  timeout_seconds = config.query_timeout

  # ── Layers 2-3: Execute with DB-level safety + row limit ────
  result = execute_safely(sql.strip, row_limit, timeout_seconds)

  # ── Layer 4: Redact sensitive columns ───────────────────────
  redacted = redact_results(result)

  # ── Format output ───────────────────────────────────────────
  output = case format
  when "csv"
    format_csv(redacted)
  else
    format_table(redacted)
  end

  text_response(output)
rescue ActiveRecord::ConnectionNotEstablished, ActiveRecord::NoDatabaseError => e
  text_response("Database unavailable: #{clean_error_message(e.message)}\n\n**Troubleshooting:**\n- Check `config/database.yml` for correct host/port/credentials\n- Try `RAILS_ENV=test` if the development DB is remote\n- Run `bin/rails db:create` if the database doesn't exist yet")
rescue ActiveRecord::StatementInvalid => e
  if e.message.match?(/timeout|statement_timeout|MAX_EXECUTION_TIME/i)
    text_response("Query exceeded #{config.query_timeout} second timeout. Simplify the query or add indexes.")
  elsif e.message.match?(/could not find|does not exist|Unknown database/i)
    text_response("Database not found: #{clean_error_message(e.message)}\n\n**Troubleshooting:**\n- Run `bin/rails db:create` to create the database\n- Check `config/database.yml` for the correct database name\n- Try `RAILS_ENV=test` if the development DB is remote")
  else
    text_response("SQL error: #{clean_error_message(e.message)}")
  end
rescue => e
  text_response("Query failed: #{clean_error_message(e.message)}")
end

.references_sensitive_column?(cleaned_sql) ⇒ Boolean

Returns the first sensitive column name referenced by the SQL, or nil. Checks both the user’s configured redacted columns (config.query_redacted_columns) AND the hard-coded suffix list (SENSITIVE_COLUMN_SUFFIXES). The match is case-insensitive and word-bounded so unrelated identifiers containing a sensitive substring are not false-positives.

Returns:

  • (Boolean)


242
243
244
245
246
247
248
249
250
251
252
# File 'lib/rails_ai_context/tools/query.rb', line 242

def self.references_sensitive_column?(cleaned_sql)
  down = cleaned_sql.downcase
  # Build the combined list ONCE per call and dedupe.
  configured = Array(config.query_redacted_columns).map { |c| c.to_s.downcase }
  suffixed   = SENSITIVE_COLUMN_SUFFIXES.map(&:downcase)
  (configured + suffixed).uniq.each do |col|
    next if col.empty?
    return col if down.match?(/\b#{Regexp.escape(col)}\b/)
  end
  nil
end

.strip_sql_comments(sql) ⇒ Object

── SQL comment stripping ───────────────────────────────────────



179
180
181
182
183
184
185
# File 'lib/rails_ai_context/tools/query.rb', line 179

def self.strip_sql_comments(sql)
  sql
    .gsub(/\/\*.*?\*\//m, " ")   # Block comments: /* ... */
    .gsub(/--[^\n]*/, " ")        # Line comments: -- ...
    .gsub(/^\s*#[^\n]*/m, " ")   # MySQL-style comments: # at line start only
    .squeeze(" ").strip
end

.validate_sql(sql) ⇒ Object

── SQL validation (Layer 1) ────────────────────────────────────



188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
# File 'lib/rails_ai_context/tools/query.rb', line 188

def self.validate_sql(sql)
  return [ false, "SQL query is required." ] if sql.nil? || sql.strip.empty?

  cleaned = strip_sql_comments(sql)

  # Check multi-statement and clause patterns first — they provide more
  # specific error messages than the generic keyword blocker.
  return [ false, "Blocked: multiple statements (no semicolons)" ] if cleaned.match?(MULTI_STATEMENT)
  return [ false, "Blocked: FOR UPDATE/SHARE clause" ] if cleaned.match?(BLOCKED_CLAUSES)
  return [ false, "Blocked: sensitive SHOW command" ] if cleaned.match?(BLOCKED_SHOWS)
  return [ false, "Blocked: SELECT INTO creates a table" ] if cleaned.match?(SELECT_INTO)
  return [ false, "Blocked: SELECT INTO OUTFILE / DUMPFILE writes to disk" ] if cleaned.match?(BLOCKED_OUTPUT)

  # Block database functions that give a filesystem/network primitive —
  # pg_read_file, lo_import, dblink, LOAD_FILE, load_extension, etc.
  # These pass SET TRANSACTION READ ONLY but bypass sensitive_patterns.
  if (m = cleaned.match(BLOCKED_FUNCTIONS))
    return [ false, "Blocked: dangerous function #{m[0]} (filesystem/network primitive)" ]
  end

  # Check for SQL injection tautology patterns (OR 1=1, UNION SELECT, etc.)
  tautology = TAUTOLOGY_PATTERNS.find { |p| cleaned.match?(p) }
  return [ false, "Blocked: SQL injection pattern detected (#{cleaned[tautology]})" ] if tautology

  # Check blocked keywords before the allowed-prefix fallback so that
  # INSERT/UPDATE/DELETE/DROP etc. get a specific "Blocked" error
  # rather than the generic "Only SELECT... allowed" message.
  if (m = cleaned.match(BLOCKED_KEYWORDS))
    return [ false, "Blocked: contains #{m[0]}" ]
  end

  return [ false, "Only SELECT, WITH, SHOW, EXPLAIN, DESCRIBE allowed" ] unless cleaned.match?(ALLOWED_PREFIX)

  # Column-aliasing redaction bypass defense: reject any query that
  # textually references a sensitive column name. See the comment on
  # SENSITIVE_COLUMN_SUFFIXES above — post-execution redaction cannot
  # survive `SELECT password_digest AS x`.
  if (offending = references_sensitive_column?(cleaned))
    return [ false,
      "Blocked: query references sensitive column `#{offending}`. " \
      "Post-execution redaction cannot survive aliases / expressions, so " \
      "the entire query is rejected. Remove the reference or subtract " \
      "from config.query_redacted_columns in an initializer if this " \
      "column is not actually sensitive in your app." ]
  end

  [ true, nil ]
end