Class: RailsAiContext::Tools::Query

Inherits:

BaseTool

Object
MCP::Tool
BaseTool
RailsAiContext::Tools::Query

show all

Defined in:: lib/rails_ai_context/tools/query.rb

Defined Under Namespace

Classes: ResultProxy

Constant Summary collapse

BLOCKED_KEYWORDS = ── Layer 1: SQL validation ─────────────────────────────────────

/\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE|GRANT|REVOKE|SET|COPY|MERGE|REPLACE)\b/i

BLOCKED_CLAUSES =

/\bFOR\s+(UPDATE|SHARE|NO\s+KEY\s+UPDATE)\b/i

BLOCKED_SHOWS =

/\bSHOW\s+(GRANTS|PROCESSLIST|BINLOG|SLAVE|MASTER|REPLICAS)\b/i

SELECT_INTO =

/\bSELECT\b[^;]*\bINTO\b/i

MULTI_STATEMENT =

/;\s*\S/

ALLOWED_PREFIX =

/\A\s*(SELECT|WITH|SHOW|EXPLAIN|DESCRIBE|DESC)\b/i

BLOCKED_FUNCTIONS = SELECT-callable functions that give the caller a filesystem / network exfiltration primitive even though the query is technically a SELECT. These pass ‘SET TRANSACTION READ ONLY` because they’re reads from the DB engine’s perspective, but they bypass the gem’s ‘sensitive_patterns` file allowlist entirely by pivoting through the database process. Postgres: pg_read_file / pg_read_binary_file / pg_ls_dir / pg_stat_file (file read), lo_import / lo_export (large-object I/O), dblink* (cross-db exfiltration), COPY ... FROM/TO PROGRAM (arbitrary command execution when COPY permissions permit). MySQL: LOAD_FILE (scalar file read outside SELECT INTO contexts). SQLite: load_extension (shared-library load — disabled by default but harden in defense).

/\b(
  pg_read_binary_file | pg_read_file |
  pg_ls_dir | pg_ls_logdir | pg_ls_tmpdir | pg_ls_waldir | pg_ls_archive_statusdir |
  pg_stat_file | pg_file_settings | pg_current_logfile |
  lo_import | lo_export |
  dblink[a-z_]* |
  LOAD\s+DATA |
  load_file | load_extension
)\b/ix

BLOCKED_OUTPUT = MySQL ‘SELECT … INTO OUTFILE ’path’‘ / `INTO DUMPFILE ’path’‘ are caught by SELECT_INTO already, but make an explicit pattern so the error message is accurate.

/\bINTO\s+(OUTFILE|DUMPFILE)\b/i

SENSITIVE_COLUMN_SUFFIXES = Defense against the column-aliasing redaction bypass: SELECT password_digest AS x FROM users -- bypasses result.columns redaction SELECT substring(password_digest, 1, 60) ... -- column name becomes "substring" SELECT md5(session_data) FROM sessions -- column name becomes "md5" Post-execution redaction operates on the column names the DB returns, which the caller controls via aliases and expressions. The only defense that works is to reject queries that TEXTUALLY reference any sensitive column, before execution. Users who need to query non-sensitive columns with a similar name can subtract from ‘config.query_redacted_columns` in an initializer.

%w[
  password_digest password_hash encrypted_password
  password_reset_token confirmation_token unlock_token
  remember_token reset_password_token api_key api_secret
  access_token refresh_token jti otp_secret session_data
  secret_key secret private_key
].freeze

TAUTOLOGY_PATTERNS = SQL injection tautology patterns: OR 1=1, OR true, OR ”=”, UNION SELECT, etc.

[
  /\bOR\s+1\s*=\s*1\b/i,
  /\bOR\s+true\b/i,
  /\bOR\s+'[^']*'\s*=\s*'[^']*'/i,
  /\bOR\s+"[^"]*"\s*=\s*"[^"]*"/i,
  /\bOR\s+\d+\s*=\s*\d+/i,
  /\bUNION\s+(ALL\s+)?SELECT\b/i
].freeze

HARD_ROW_CAP =

Constants inherited from BaseTool

BaseTool::SESSION_CONTEXT, BaseTool::SHARED_CACHE

Class Method Summary collapse

.call(sql: nil, limit: nil, format: "table", explain: false, server_context: nil, **_extra) ⇒ Object
.references_sensitive_column?(cleaned_sql) ⇒ Boolean

Returns the first sensitive column name referenced by the SQL, or nil.
.strip_sql_comments(sql) ⇒ Object

── SQL comment stripping ───────────────────────────────────────.
.validate_sql(sql) ⇒ Object

── SQL validation (Layer 1) ────────────────────────────────────.

Methods inherited from BaseTool

abstract!, abstract?, cache_key, cached_context, config, extract_method_source_from_file, extract_method_source_from_string, find_closest_match, fuzzy_find_key, inherited, not_found_response, paginate, rails_app, registered_tools, reset_all_caches!, reset_cache!, session_queries, session_record, session_reset!, set_call_params, text_response

Class Method Details

.call(sql: nil, limit: nil, format: "table", explain: false, server_context: nil, **_extra) ⇒ `Object`

# File 'lib/rails_ai_context/tools/query.rb', line 111

def self.call(sql: nil, limit: nil, format: "table", explain: false, server_context: nil, **_extra)
  set_call_params(sql: sql&.truncate(60))
  # ── Environment guard ───────────────────────────────────────
  unless config.allow_query_in_production || !Rails.env.production?
    return text_response(
      "rails_query is disabled in production for data privacy. " \
      "Set config.allow_query_in_production = true to override."
    )
  end

  # ── ActiveRecord guard (api-only apps) ──────────────────────
  # Must come BEFORE any code that rescues ActiveRecord::* — Ruby
  # resolves rescue class constants at raise time, and `rescue
  # ActiveRecord::ConnectionNotEstablished` crashes with NameError
  # on apps where ActiveRecord is not loaded (e.g.
  # `rails new --api --skip-active-record`).
  unless defined?(ActiveRecord::Base)
    return text_response(
      "Database queries are unavailable: ActiveRecord is not loaded in this app. " \
      "This happens on API-only apps created with `rails new --api --skip-active-record`. " \
      "rails_query requires a database connection to function."
    )
  end

  # ── Layer 1: SQL validation ─────────────────────────────────
  valid, error = validate_sql(sql)
  return text_response(error) unless valid

  # ── EXPLAIN mode ────────────────────────────────────────────
  if explain
    return execute_explain(sql.strip, config.query_timeout)
  end

  # Resolve row limit
  row_limit = limit ? [ limit.to_i, HARD_ROW_CAP ].min : config.query_row_limit
  row_limit = [ row_limit, 1 ].max
  timeout_seconds = config.query_timeout

  # ── Layers 2-3: Execute with DB-level safety + row limit ────
  result = execute_safely(sql.strip, row_limit, timeout_seconds)

  # ── Layer 4: Redact sensitive columns ───────────────────────
  redacted = redact_results(result)

  # ── Format output ───────────────────────────────────────────
  output = case format
  when "csv"
    format_csv(redacted)
  else
    format_table(redacted)
  end

  text_response(output)
rescue ActiveRecord::ConnectionNotEstablished, ActiveRecord::NoDatabaseError => e
  text_response("Database unavailable: #{clean_error_message(e.message)}\n\n**Troubleshooting:**\n- Check `config/database.yml` for correct host/port/credentials\n- Try `RAILS_ENV=test` if the development DB is remote\n- Run `bin/rails db:create` if the database doesn't exist yet")
rescue ActiveRecord::StatementInvalid => e
  if e.message.match?(/timeout|statement_timeout|MAX_EXECUTION_TIME/i)
    text_response("Query exceeded #{config.query_timeout} second timeout. Simplify the query or add indexes.")
  elsif e.message.match?(/could not find|does not exist|Unknown database/i)
    text_response("Database not found: #{clean_error_message(e.message)}\n\n**Troubleshooting:**\n- Run `bin/rails db:create` to create the database\n- Check `config/database.yml` for the correct database name\n- Try `RAILS_ENV=test` if the development DB is remote")
  else
    text_response("SQL error: #{clean_error_message(e.message)}")
  end
rescue => e
  text_response("Query failed: #{clean_error_message(e.message)}")
end

.references_sensitive_column?(cleaned_sql) ⇒ `Boolean`

Returns the first sensitive column name referenced by the SQL, or nil. Checks both the user’s configured redacted columns (config.query_redacted_columns) AND the hard-coded suffix list (SENSITIVE_COLUMN_SUFFIXES). The match is case-insensitive and word-bounded so unrelated identifiers containing a sensitive substring are not false-positives.

Returns:

(Boolean)

# File 'lib/rails_ai_context/tools/query.rb', line 242

def self.references_sensitive_column?(cleaned_sql)
  down = cleaned_sql.downcase
  # Build the combined list ONCE per call and dedupe.
  configured = Array(config.query_redacted_columns).map { |c| c.to_s.downcase }
  suffixed   = SENSITIVE_COLUMN_SUFFIXES.map(&:downcase)
  (configured + suffixed).uniq.each do |col|
    next if col.empty?
    return col if down.match?(/\b#{Regexp.escape(col)}\b/)
  end
  nil
end

.strip_sql_comments(sql) ⇒ `Object`

── SQL comment stripping ───────────────────────────────────────

# File 'lib/rails_ai_context/tools/query.rb', line 179

def self.strip_sql_comments(sql)
  sql
    .gsub(/\/\*.*?\*\//m, " ")   # Block comments: /* ... */
    .gsub(/--[^\n]*/, " ")        # Line comments: -- ...
    .gsub(/^\s*#[^\n]*/m, " ")   # MySQL-style comments: # at line start only
    .squeeze(" ").strip
end

.validate_sql(sql) ⇒ `Object`

── SQL validation (Layer 1) ────────────────────────────────────

# File 'lib/rails_ai_context/tools/query.rb', line 188

def self.validate_sql(sql)
  return [ false, "SQL query is required." ] if sql.nil? || sql.strip.empty?

  cleaned = strip_sql_comments(sql)

  # Check multi-statement and clause patterns first — they provide more
  # specific error messages than the generic keyword blocker.
  return [ false, "Blocked: multiple statements (no semicolons)" ] if cleaned.match?(MULTI_STATEMENT)
  return [ false, "Blocked: FOR UPDATE/SHARE clause" ] if cleaned.match?(BLOCKED_CLAUSES)
  return [ false, "Blocked: sensitive SHOW command" ] if cleaned.match?(BLOCKED_SHOWS)
  return [ false, "Blocked: SELECT INTO creates a table" ] if cleaned.match?(SELECT_INTO)
  return [ false, "Blocked: SELECT INTO OUTFILE / DUMPFILE writes to disk" ] if cleaned.match?(BLOCKED_OUTPUT)

  # Block database functions that give a filesystem/network primitive —
  # pg_read_file, lo_import, dblink, LOAD_FILE, load_extension, etc.
  # These pass SET TRANSACTION READ ONLY but bypass sensitive_patterns.
  if (m = cleaned.match(BLOCKED_FUNCTIONS))
    return [ false, "Blocked: dangerous function #{m[0]} (filesystem/network primitive)" ]
  end

  # Check for SQL injection tautology patterns (OR 1=1, UNION SELECT, etc.)
  tautology = TAUTOLOGY_PATTERNS.find { |p| cleaned.match?(p) }
  return [ false, "Blocked: SQL injection pattern detected (#{cleaned[tautology]})" ] if tautology

  # Check blocked keywords before the allowed-prefix fallback so that
  # INSERT/UPDATE/DELETE/DROP etc. get a specific "Blocked" error
  # rather than the generic "Only SELECT... allowed" message.
  if (m = cleaned.match(BLOCKED_KEYWORDS))
    return [ false, "Blocked: contains #{m[0]}" ]
  end

  return [ false, "Only SELECT, WITH, SHOW, EXPLAIN, DESCRIBE allowed" ] unless cleaned.match?(ALLOWED_PREFIX)

  # Column-aliasing redaction bypass defense: reject any query that
  # textually references a sensitive column name. See the comment on
  # SENSITIVE_COLUMN_SUFFIXES above — post-execution redaction cannot
  # survive `SELECT password_digest AS x`.
  if (offending = references_sensitive_column?(cleaned))
    return [ false,
      "Blocked: query references sensitive column `#{offending}`. " \
      "Post-execution redaction cannot survive aliases / expressions, so " \
      "the entire query is rejected. Remove the reference or subtract " \
      "from config.query_redacted_columns in an initializer if this " \
      "column is not actually sensitive in your app." ]
  end

  [ true, nil ]
end