Module: EzLogsAgent::BulkSqlParser
- Defined in:
- lib/ez_logs_agent/bulk_sql_parser.rb
Overview
Pure-functional parser for the four ActiveRecord bulk operations that bypass per-row callbacks: delete_all, update_all, insert_all, upsert_all. Used by BulkDatabaseCapturer to turn an AS::Notifications “sql.active_record” payload into a structured wire shape the server can humanize.
## What it extracts
For delete_all:
{ operation: :delete_all, where_template: "status = $1", where_binds: [{column:, value:}] }
For update_all:
{ operation: :update_all, set: {"status" => "paid"}, where_template:, where_binds: }
For insert_all / upsert_all:
{ operation: :insert_all|:upsert_all, columns: ["name", "email"] }
(No values — per the product decision; column SHAPE only.)
For anything else (subqueries, joins, raw SQL the regex can’t parse, malformed binds): returns { unparseable: true }. BulkDatabaseCapturer falls back to shipping row_count + operation + model_class with no template / binds / set — the timeline still reads “Bulk delete: 50,000 orders” minus the WHERE detail.
## Why regex (not Arel / pg_query)
Both Arel and pg_query are adapter-specific and either slow or heavy to add as a runtime dependency on every customer host app. Regex on the standardized AR-emitted SQL string is fast (sub-millisecond on typical statements) and adapter-tolerant. The graceful “unparseable” branch covers the long tail.
## Adapter quoting handled
-
PostgreSQL: “orders”.“status” = $1 (double-quoted, $N placeholders)
-
SQLite: “orders”.“status” = ? (double-quoted, ? placeholders)
-
MySQL: ‘orders`.`status` = ? (backticks, ? placeholders)
Constant Summary collapse
- IDENTIFIER =
Loose match for an identifier wrapped in any of the three quote styles AR uses. Captures the unquoted name.
/["`]([^"`]+)["`]/.freeze
- COLUMN =
Either a “qualified” or “bare” identifier reference, used in WHERE/SET clause columns. AR prefixes the table name in some paths and not in others — accept both. Captures just the column.
/(?:#{IDENTIFIER}\.)?#{IDENTIFIER}/.freeze
- PLACEHOLDER =
A bind placeholder — Postgres uses $N (1-indexed), SQLite/MySQL use ?. The order of placeholders in the SQL corresponds 1:1 with the order of values in ‘binds` / `type_casted_binds`.
/(?:\$\d+|\?)/.freeze
Class Method Summary collapse
-
.build_template_and_binds(where_sql, type_casted_binds) ⇒ Object
Given a WHERE clause SQL and the binds in placeholder order, walk the placeholders and pair each one with the column to its LEFT (the standard ‘“table”.“col” = $1` shape AR emits).
-
.detect_operation(sql) ⇒ Symbol?
Returns the symbolic operation name we expect downstream.
-
.extract_insert_columns(sql) ⇒ Object
— INSERT INTO “users” (“name”,“email”) VALUES …
-
.extract_set(sql) ⇒ Object
Returns the SET-clause SQL between “SET” and “WHERE”/end-of-string.
-
.extract_where(sql) ⇒ Object
Returns the WHERE-clause SQL (without the keyword), or nil if absent.
-
.nearest_column_left_of(where_sql, pos) ⇒ Object
Finds the most recent column identifier to the left of ‘pos` in the WHERE clause.
-
.parse(sql:, type_casted_binds:) ⇒ Hash
Always returns a Hash.
-
.parse_delete(sql, type_casted_binds) ⇒ Object
— DELETE FROM “orders” WHERE “orders”.“status” = $1 —.
-
.parse_insert(sql) ⇒ Object
— INSERT INTO “users” (“name”,“email”) VALUES (…), (…) —.
-
.parse_set_assignments(set_sql) ⇒ Object
SET clause is comma-separated ‘“col” = <placeholder|literal>` pairs.
-
.parse_update(sql, type_casted_binds) ⇒ Object
— UPDATE “orders” SET “status” = $1 WHERE “orders”.“status” = $2 — SET binds come first in placeholder order, then WHERE binds.
-
.split_top_level_commas(str) ⇒ Object
Splits on commas that are NOT inside (parens) or quotes.
-
.strip_query_log_tags(sql) ⇒ Object
Strip Rails 7+ Query Log Tags (‘/*application=’X’,action=‘Y’*/‘) AND any other trailing SQL comments.
-
.zip_set_values(set_pairs, set_binds) ⇒ Object
Walks set_pairs in order, taking the next bind for each placeholder RHS and pulling the literal text for non-placeholder RHS (e.g., ‘updated_at = ’2026-06-05 12:00:00’‘).
Class Method Details
.build_template_and_binds(where_sql, type_casted_binds) ⇒ Object
Given a WHERE clause SQL and the binds in placeholder order, walk the placeholders and pair each one with the column to its LEFT (the standard ‘“table”.“col” = $1` shape AR emits). Returns the template (placeholder-preserved) and the binds tagged with column.
Unrecognized shapes (subselects, joins, NULL checks without binds) leave the template intact but produce no column for the bind, in which case the bind ships with column: nil. The display layer handles nil-column binds.
263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 263 def build_template_and_binds(where_sql, type_casted_binds) return [nil, []] if where_sql.nil? template = where_sql binds = [] bind_index = 0 # Walk the template scanning each placeholder, looking backward # for the nearest column identifier to its left. template.scan(PLACEHOLDER) do match_data = Regexp.last_match next unless match_data column = nearest_column_left_of(template, match_data.begin(0)) value = type_casted_binds[bind_index] binds << { column: column, value: value } bind_index += 1 end [template, binds] end |
.detect_operation(sql) ⇒ Symbol?
Returns the symbolic operation name we expect downstream. Detected from SQL shape, independent of the ‘payload` Rails version variance (see plan §“insert_all/upsert_all payload :name varies”). Upsert detection requires the `ON CONFLICT` (PG) or `ON DUPLICATE KEY` (MySQL) clause — bare INSERTs are insert_all.
105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 105 def detect_operation(sql) return nil if sql.nil? sql_up = sql.lstrip.upcase return :delete_all if sql_up.start_with?("DELETE FROM ") return :update_all if sql_up.start_with?("UPDATE ") if sql_up.start_with?("INSERT INTO ") # Disambiguate insert_all vs upsert_all: # - insert_all on PG/SQLite emits `ON CONFLICT DO NOTHING` (still insert). # - upsert_all on PG/SQLite emits `ON CONFLICT ... DO UPDATE SET ...`. # - upsert_all on MySQL emits `ON DUPLICATE KEY UPDATE ...`. if (sql_up.include?("ON CONFLICT") && sql_up.include?("DO UPDATE")) || sql_up.include?("ON DUPLICATE KEY") return :upsert_all end :insert_all end end |
.extract_insert_columns(sql) ⇒ Object
— INSERT INTO “users” (“name”,“email”) VALUES … — Extracts the ordered column list. Returns nil if the open paren column list isn’t present (e.g., INSERT … DEFAULT VALUES).
301 302 303 304 305 306 307 308 309 310 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 301 def extract_insert_columns(sql) # First parenthesized list after the table name. m = sql.match(/\AINSERT INTO\s+#{IDENTIFIER}\s*\(([^)]+)\)/i) return nil unless m columns_block = m[2] columns_block.split(",").map do |col| col.strip.gsub(/\A["`]|["`]\z/, "") end.reject(&:empty?) end |
.extract_set(sql) ⇒ Object
Returns the SET-clause SQL between “SET” and “WHERE”/end-of-string.
177 178 179 180 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 177 def extract_set(sql) match = sql.match(/\sSET\s+(.+?)(?:\s+WHERE\s+|\s+RETURNING\b|\z)/i) match && match[1].strip end |
.extract_where(sql) ⇒ Object
Returns the WHERE-clause SQL (without the keyword), or nil if absent. Stops at end-of-string, RETURNING, ORDER BY, LIMIT (AR rarely emits these on bulk DML but cheap insurance).
171 172 173 174 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 171 def extract_where(sql) match = sql.match(/\sWHERE\s+(.+?)(?:\s+(?:RETURNING|ORDER BY|LIMIT)\b|\z)/i) match && match[1].strip end |
.nearest_column_left_of(where_sql, pos) ⇒ Object
Finds the most recent column identifier to the left of ‘pos` in the WHERE clause. Used to attribute each bind to its column name so we can mask sensitive values by name downstream.
288 289 290 291 292 293 294 295 296 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 288 def nearest_column_left_of(where_sql, pos) prefix = where_sql[0...pos] # Look for the last identifier before the operator (=, <, >, etc.) match = prefix.match(/#{COLUMN}\s*(?:=|<>|!=|<=|>=|<|>|LIKE|IN|IS)\s*\z/i) return nil unless match # COLUMN has two capture groups: [table, col] or [_, col]. match[2] || match[1] end |
.parse(sql:, type_casted_binds:) ⇒ Hash
Returns Always returns a Hash. Either the parsed structure (keys above) or ‘{ unparseable: true }`.
63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 63 def parse(sql:, type_casted_binds:) return { unparseable: true } if sql.nil? || sql.empty? sql_stripped = (sql.strip) case sql_stripped when /\ADELETE FROM /i parse_delete(sql_stripped, type_casted_binds || []) when /\AUPDATE /i parse_update(sql_stripped, type_casted_binds || []) when /\AINSERT INTO /i parse_insert(sql_stripped) else { unparseable: true } end rescue StandardError # If the regex engine, binds zipping, or any sub-parse step raises, # ship unparseable rather than crash the capture handler. The # BulkDatabaseCapturer's own rescue would also catch this, but # defending here means the rest of the capturer sees a uniform # return shape. { unparseable: true } end |
.parse_delete(sql, type_casted_binds) ⇒ Object
— DELETE FROM “orders” WHERE “orders”.“status” = $1 —
128 129 130 131 132 133 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 128 def parse_delete(sql, type_casted_binds) where_sql = extract_where(sql) template, binds = build_template_and_binds(where_sql, type_casted_binds) { operation: :delete_all, where_template: template, where_binds: binds } end |
.parse_insert(sql) ⇒ Object
— INSERT INTO “users” (“name”,“email”) VALUES (…), (…) —
158 159 160 161 162 163 164 165 166 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 158 def parse_insert(sql) operation = detect_operation(sql) return { unparseable: true } unless operation == :insert_all || operation == :upsert_all columns = extract_insert_columns(sql) return { unparseable: true } if columns.nil? || columns.empty? { operation: operation, columns: columns } end |
.parse_set_assignments(set_sql) ⇒ Object
SET clause is comma-separated ‘“col” = <placeholder|literal>` pairs. AR inlines non-symbol values as literals (no placeholder) — we still surface those in the result so the reader sees what changed. Returns [[col_name, placeholder_or_literal_str], …] + total bind count.
186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 186 def parse_set_assignments(set_sql) pairs = [] bind_count = 0 split_top_level_commas(set_sql).each do |assignment| m = assignment.match(/\A#{COLUMN}\s*=\s*(.+)\z/) next unless m col = m[2] || m[1] rhs = m[3].strip pairs << [col, rhs] bind_count += 1 if rhs.match?(PLACEHOLDER) end [pairs, bind_count] end |
.parse_update(sql, type_casted_binds) ⇒ Object
— UPDATE “orders” SET “status” = $1 WHERE “orders”.“status” = $2 — SET binds come first in placeholder order, then WHERE binds.
137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 137 def parse_update(sql, type_casted_binds) set_sql = extract_set(sql) where_sql = extract_where(sql) return { unparseable: true } if set_sql.nil? set_pairs, set_bind_count = parse_set_assignments(set_sql) set_binds = type_casted_binds.first(set_bind_count) where_binds_raw = type_casted_binds.drop(set_bind_count) set_hash = zip_set_values(set_pairs, set_binds) where_template, where_binds = build_template_and_binds(where_sql, where_binds_raw) { operation: :update_all, set: set_hash, where_template: where_template, where_binds: where_binds } end |
.split_top_level_commas(str) ⇒ Object
Splits on commas that are NOT inside (parens) or quotes. Bulk DML SET / WHERE almost never nests, but defensive against function calls like ‘coalesce(x, 0)`.
222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 222 def split_top_level_commas(str) result = [] depth = 0 in_quote = false quote_char = nil buffer = +"" str.each_char do |ch| if in_quote buffer << ch in_quote = false if ch == quote_char elsif ch == "'" || ch == '"' || ch == "`" in_quote = true quote_char = ch buffer << ch elsif ch == "(" depth += 1 buffer << ch elsif ch == ")" depth -= 1 buffer << ch elsif ch == "," && depth.zero? result << buffer.strip unless buffer.empty? buffer = +"" else buffer << ch end end result << buffer.strip unless buffer.empty? result end |
.strip_query_log_tags(sql) ⇒ Object
Strip Rails 7+ Query Log Tags (‘/*application=’X’,action=‘Y’*/‘) AND any other trailing SQL comments. They land at the end of every statement when `config.active_record.query_log_tags_enabled = true` and are pure noise on the timeline — they leak the host app’s name and controller into the user-visible filter line. Removing them at parse time means we never ship them on the wire.
93 94 95 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 93 def (sql) sql.gsub(%r{/\*.*?\*/}m, "").rstrip end |
.zip_set_values(set_pairs, set_binds) ⇒ Object
Walks set_pairs in order, taking the next bind for each placeholder RHS and pulling the literal text for non-placeholder RHS (e.g., ‘updated_at = ’2026-06-05 12:00:00’‘).
206 207 208 209 210 211 212 213 214 215 216 217 |
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 206 def zip_set_values(set_pairs, set_binds) bind_idx = 0 set_pairs.each_with_object({}) do |(col, rhs), acc| if rhs.match?(PLACEHOLDER) acc[col] = set_binds[bind_idx] bind_idx += 1 else # Strip surrounding quotes to surface the actual value. acc[col] = rhs.gsub(/\A['"]|['"]\z/, "") end end end |