Module: Parse::PipelineSecurity
- Defined in:
- lib/parse/pipeline_security.rb
Overview
Canonical security validator for MongoDB aggregation pipelines and filter hashes that the SDK forwards to the driver or to Parse Server.
Previously the codebase had three different validators with three different rule sets:
- `Parse::Agent::PipelineValidator`: strict allowlist for the Agent (read-only paths only)
- `Parse::Query#validate_pipeline!`: outer-stage-only denylist
- `Parse::MongoDB.assert_no_denied_operators!`: recursive denylist of server-side JS operators
`Parse::AtlasSearch.convert_filter_for_mongodb` was a complete passthrough that bypassed all three. A user-supplied filter containing `$where`/`$expr`/`$function`/`$regex` was injected straight into the pipeline `$match` stage, bypassing every existing constraint guard.
This module consolidates the rules. Every entry point that forwards a caller-supplied pipeline or filter to MongoDB now routes through one of the two public methods here:
- PipelineSecurity.validate_pipeline!: strict mode (allowlist + size/depth caps). Used by `Parse::Agent` and by `Parse::Query#aggregate` for user-facing aggregation entry points.
- PipelineSecurity.validate_filter!: permissive mode (recursive denylist only). Used by `Parse::MongoDB.find/aggregate` and Atlas Search filter passthrough where the pipeline is constructed by SDK code but a user-controlled filter hash is interpolated. Refuses `$where`/`$function`/`$accumulator` and the data-mutating stages at any nesting depth.
Policy: allowlist top-level, denylist recursive
Strict mode enforces ALLOWED_STAGES ONLY at the top-level stage key. Nested sub-pipelines (inside `$lookup.pipeline`, `$unionWith.pipeline`, `$facet.*`, `$graphLookup`) are walked with the operator denylist but NOT with the stage allowlist. This is intentional: Atlas Search and uncommon-but-legitimate read stages like `$densify` and `$fill` must be allowed inside sub-pipelines even when the outer pipeline is strict-validated. The denylist is the security boundary; the allowlist is a shape check.
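The policy above can be illustrated with a minimal, self-contained sketch. It is not the module's implementation: the `SKETCH_*` constants are deliberately trimmed-down stand-ins, and `first_denied_operator` and `sketch_strict_ok?` are hypothetical helper names introduced only for this example.

```ruby
# Hypothetical, trimmed-down lists for illustration only (not the real constants).
SKETCH_ALLOWED_STAGES = %w[$match $lookup $facet $limit].freeze
SKETCH_DENIED_OPERATORS = %w[$where $function $accumulator $out $merge].freeze

# Returns the first denied operator key found at any nesting depth, or nil.
def first_denied_operator(node)
  case node
  when Hash
    node.each do |key, value|
      return key.to_s if SKETCH_DENIED_OPERATORS.include?(key.to_s)
      found = first_denied_operator(value)
      return found if found
    end
    nil
  when Array
    node.each do |child|
      found = first_denied_operator(child)
      return found if found
    end
    nil
  end
end

# Strict check: allowlist on the outer stage key only, denylist at every depth.
def sketch_strict_ok?(pipeline)
  pipeline.all? do |stage|
    SKETCH_ALLOWED_STAGES.include?(stage.keys.first.to_s) &&
      first_denied_operator(stage).nil?
  end
end

# $densify is not in the sketch allowlist, but it is nested inside $lookup.pipeline,
# so only the denylist applies to it.
nested_densify = [
  { "$lookup" => { "from" => "Post", "as" => "posts",
                   "pipeline" => [{ "$densify" => { "field" => "ts" } }] } },
]
puts sketch_strict_ok?(nested_densify)                       # nested read stage is tolerated
puts sketch_strict_ok?([{ "$densify" => { "field" => "ts" } }]) # same stage at top level is refused
```

In this sketch the same `$densify` stage passes when nested and fails at the top level, which is the asymmetry the policy describes: the allowlist shapes the outer pipeline, while the denylist is what actually blocks dangerous operators.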
Caveat for Query#aggregate callers
`Parse::Query#aggregate` routes through PipelineSecurity.validate_filter!, not PipelineSecurity.validate_pipeline!, so user-supplied pipelines are checked against the denylist only. Permissive mode does NOT block `$lookup`, `$graphLookup`, or `$unionWith` reading from arbitrary collections. These are legitimate read stages, but they are powerful enough to cross Parse ACL/CLP boundaries when the source collection lacks row-level enforcement. **Never pass raw attacker-controlled input into `Parse::Query#aggregate`.** Construct the pipeline in SDK code and interpolate only validated values.
Capability gap: `$expr`
`$expr` itself is not in DENIED_OPERATORS. The recursive walker catches `$function`/`$accumulator` nested inside `$expr`, so the immediate JavaScript-execution risk is closed. A future Atlas operator gated under `$expr` would slip through until DENIED_OPERATORS is extended. Defense-in-depth callers concerned about expensive aggregation expressions (`$regexMatch` ReDoS, large `$reduce` loops) should validate user input shape before reaching this module.
Defined Under Namespace
Classes: Error
Constant Summary
- DENIED_OPERATORS =
Operators that are ALWAYS refused at any nesting depth. These either execute server-side JavaScript (`$where`, `$function`, `$accumulator`) or mutate the database (`$out`, `$merge`) or the server itself (`$collMod`, `$createIndex`, `$dropIndex`, `$planCacheSetFilter`, `$planCacheClear`). None of them is needed for read queries.
%w[ $where $function $accumulator $out $merge $collMod $createIndex $dropIndex $planCacheSetFilter $planCacheClear ].freeze
- DENIED_FIELD_REFS =
Field-reference paths (string values inside `$expr` whose first byte is `$`) that point at server-internal columns and must never be reachable from a user-influenced pipeline. A boolean expression inside `$expr` over any of these is a 1-bit-per-query side channel that bisects the value of a bcrypt hash, session token, or password-reset token. Names match Parse Server’s internal column layout (cf. MongoStorageAdapter).
%w[ $_hashed_password $_password_history $_session_token $_sessionToken $_email_verify_token $_perishable_token $_failed_login_count $_account_lockout_expires_at $_rperm $_wperm $_auth_data ].freeze
- DENIED_FIELD_REF_PREFIXES =
String prefix for per-provider auth-data field references inside `$expr`. Parse Server stores per-provider columns as `_auth_data_facebook`, `_auth_data_google`, etc.; none of these should be reachable from a user-influenced pipeline. The prefix `$_auth_data_` covers all of them without requiring an exhaustive list.
%w[$_auth_data_].freeze
- ALLOWED_UNDERSCORE_COLLECTIONS =
MongoDB collection names that an SDK aggregation IS permitted to name in `from:`/`coll:`. Any name starting with `_` outside this set is refused as an internal Parse Server collection. The four entries here are the only `_`-prefixed collections that hold Parse SDK data classes; everything else with a leading `_` is server-managed state (`_SCHEMA` discloses class-level permissions; `_Hooks` discloses Cloud Code webhook URLs + secret keys; `_GraphQLConfig` discloses GraphQL schema state; `_Audit` holds operational telemetry; `_Idempotency`/`_PushStatus`/`_JobStatus`/`_JobSchedule`/`_GlobalConfig`/`_Audience` hold internal Parse Server bookkeeping).
%w[_User _Role _Installation _Session].freeze
- INTERNAL_FIELDS_DENYLIST =
Field names that are internal to Parse Server’s storage layout and must never appear in returned documents. Most are stripped by `Parse::MongoDB.convert_document_to_parse`, but a raw-result path (`raw: true`) bypasses that conversion and would otherwise surface the bcrypt hash, session token, or reset token.
`sessionToken` / `session_token` (no leading underscore) are the credential column on `_Session` rows. Unlike the `_User`-side `_session_token`, the Session class declares it as a regular property, so without this entry a master-key agent that has had the class explicitly unhidden would receive raw bearer tokens in every row of a `query_class("_Session")` response. The denylist is the process-level floor, independent of class-visibility state, so even a deliberate `agent_unhidden` on `_Session` (or a compromised superadmin tool) cannot exfiltrate active tokens.
%w[ _hashed_password _password_history _session_token _sessionToken sessionToken session_token _email_verify_token _perishable_token _failed_login_count _account_lockout_expires_at _rperm _wperm _tombstone _auth_data ].freeze
- INTERNAL_FIELDS_PREFIX_DENYLIST =
Prefix covering per-provider auth-data columns (`_auth_data_facebook`, `_auth_data_google`, …). Used by strip_internal_fields and by the walk_for_denied! field-name screen.
%w[_auth_data_].freeze
- FORENSIC_OPERATORS =
Forensic string-introspection operators. When any of these appears INSIDE `$expr` with a field-reference input string, the query becomes a per-character oracle even though the operator itself is otherwise legitimate. Refused inside `$expr` regardless of the input: the validator does not try to introspect operand shapes deeply, and these operators have no legitimate use against Parse-Server-managed columns from an SDK aggregation.
%w[ $regexMatch $regexFind $regexFindAll $substr $substrBytes $substrCP $indexOfBytes $indexOfCP $strLenBytes $strLenCP $strcasecmp ].freeze
- ALLOWED_STAGES =
Top-level pipeline stages permitted by the strict validator. The set covers Parse-Stack’s own aggregation use, plus Atlas Search entry points (`$search`, `$searchMeta`, `$listSearchIndexes`) so that `Parse::AtlasSearch` calls do not break. `$vectorSearch` is included for `Parse::VectorSearch`; like `$search`, it is a read-only Atlas index stage and must be the FIRST stage of the pipeline (Atlas refuses it otherwise).
%w[ $match $group $sort $project $limit $skip $unwind $lookup $count $addFields $set $unset $bucket $bucketAuto $facet $sample $sortByCount $replaceRoot $replaceWith $redact $graphLookup $unionWith $search $searchMeta $listSearchIndexes $vectorSearch ].freeze
- STAGE0_ONLY_ATLAS_STAGES =
Atlas operators that are valid only as the FIRST stage of a pipeline (Atlas refuses them anywhere else). They are present in ALLOWED_STAGES so that the SDK’s own modules, `Parse::AtlasSearch` and `Parse::VectorSearch`, can emit them; both of those modules bypass validate_pipeline! and build their pipelines internally. Caller-supplied pipelines (e.g. through `Parse::Agent::Tools.aggregate`) must NOT include these stages: the Agent’s tenant-scope `$match` prepend would push them off stage 0, and the proper agent surface for full-text and vector search is the dedicated `atlas_search` / `semantic_search` tools, not raw aggregate.
%w[ $search $searchMeta $vectorSearch $listSearchIndexes ].freeze
- MAX_REGEX_PATTERN_LENGTH =
Cap on the length of a caller-supplied `$regex` (or the `regex:` field inside `$regexMatch` / `$regexFind` / `$regexFindAll`) pattern string. ReDoS protection: doesn’t catch every pathological pattern (small patterns like `(a+)+$` can still backtrack catastrophically), but caps the worst class of caller-shipped patterns and stops the “1MB regex” denial-of-service shape that an attacker could send through `vector_filter:` / `filter:` / `where:`. Legitimate Parse-Server queries are well under this.
512
- MAX_PIPELINE_STAGES =
Cap on number of top-level stages in a strict-validated pipeline.
20
- MAX_DEPTH =
Cap on nested object/array depth during recursive walks. Stops a caller from forcing the validator into a near-infinite traversal. Legitimate Parse-generated pipelines with `$facet` containing `$lookup` with `let` and correlated sub-pipelines (`$match.$expr.$and.`) can reach depth 12+ on a normal read, so we keep comfortable headroom above the real ceiling.
20
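As a rough illustration of how the regex-length and depth caps could be enforced during a recursive walk, here is a small standalone sketch. The names `SketchLimitError` and `check_limits!` are hypothetical, the comparison operators (`>` rather than `>=`) are an assumption, and the real validator's treatment of `regex:` keys may be narrower than this simplified version, which checks every `$regex` or `regex` key it meets.

```ruby
# Values copied from the constant summary above; everything else is illustrative.
SKETCH_MAX_DEPTH = 20
SKETCH_MAX_REGEX_LENGTH = 512

# Hypothetical error class for this sketch only.
class SketchLimitError < StandardError; end

# Walks a Hash/Array tree, raising when nesting or a regex pattern exceeds its cap.
def check_limits!(node, depth = 0)
  if depth > SKETCH_MAX_DEPTH
    raise SketchLimitError, "nesting deeper than #{SKETCH_MAX_DEPTH}"
  end

  case node
  when Hash
    node.each do |key, value|
      if %w[$regex regex].include?(key.to_s) && value.is_a?(String) &&
         value.length > SKETCH_MAX_REGEX_LENGTH
        raise SketchLimitError, "regex pattern longer than #{SKETCH_MAX_REGEX_LENGTH}"
      end
      check_limits!(value, depth + 1)
    end
  when Array
    node.each { |child| check_limits!(child, depth + 1) }
  end
  true
end

# A short pattern passes; an oversized one is refused before reaching the database.
check_limits!({ "name" => { "$regex" => "^ada" } })
begin
  check_limits!({ "name" => { "$regex" => "a" * 4096 } })
rescue SketchLimitError => e
  puts "refused: #{e.message}"
end
```

As the description of MAX_REGEX_PATTERN_LENGTH notes, a length cap does not stop short catastrophic patterns such as `(a+)+$`; it only bounds the size of what a caller can ship.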
Class Method Summary
- .assert_collection_allowed!(name) ⇒ Object
Refuses any collection name reserved for Parse Server’s internal state.
- .refuse_protected_field_references!(pipeline, collection_name, resolution) ⇒ void
Wave-3 TRACK-CLP-4: refuse caller-supplied pipelines that reference a protected field via `$<field>` on the RHS of a `$project` / `$addFields` / `$set` / `$group` / `$bucket` / `$replaceWith` / `$lookup.let` clause.
- .strip_internal_fields(doc) ⇒ Object
Strip INTERNAL_FIELDS_DENYLIST keys from a Hash document (one level deep – raw search documents are flat).
- .valid_filter?(node) ⇒ Boolean
True if the node passes permissive validation.
- .valid_pipeline?(pipeline) ⇒ Boolean
True if the pipeline passes strict validation.
- .validate_filter!(node, allow_internal_fields: false) ⇒ true
Permissive validation: walks the given Hash or Array (or anything else, which is a no-op) and refuses any nested key that appears in DENIED_OPERATORS.
- .validate_pipeline!(pipeline) ⇒ true
Strict validation: pipeline must be a non-empty Array of Hashes, each Hash’s top-level key must be in ALLOWED_STAGES, and no entry in DENIED_OPERATORS may appear at any nesting depth.
Class Method Details
.assert_collection_allowed!(name) ⇒ Object
Refuses any collection name reserved for Parse Server’s internal state. Accepts the four SDK-data system classes (`_User`, `_Role`, `_Installation`, `_Session`) and any non-`_`-prefixed name. Used by `LookupRewriter` and by the Agent’s pipeline walker to enforce a hard floor independent of any per-Agent `MetadataRegistry.hidden?` policy.
    # File 'lib/parse/pipeline_security.rb', line 309

    def assert_collection_allowed!(name)
      return if name.nil?
      str = name.to_s
      return if str.empty?
      return unless str.start_with?("_")
      return if ALLOWED_UNDERSCORE_COLLECTIONS.include?(str)
      raise Error.new(
        "SECURITY: Collection '#{str}' is reserved for Parse Server's internal " \
        "state and is not reachable from an SDK aggregation pipeline.",
        operator: str,
        reason: :denied_internal_collection,
      )
    end
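To make the accept/refuse behaviour concrete without loading the SDK, the following sketch restates the same decision as a standalone predicate. `sketch_collection_allowed?` and the `SKETCH_` constant are hypothetical names for this example; the real method raises `Error` instead of returning `false`.

```ruby
# Copy of the four permitted underscore-prefixed collections listed above.
SKETCH_ALLOWED_UNDERSCORE_COLLECTIONS = %w[_User _Role _Installation _Session].freeze

# Predicate form of the check: true when the name would be accepted.
def sketch_collection_allowed?(name)
  str = name.to_s
  # nil, empty, and non-underscore names are outside the scope of this guard.
  return true if str.empty? || !str.start_with?("_")
  SKETCH_ALLOWED_UNDERSCORE_COLLECTIONS.include?(str)
end

%w[Post _User _Session _SCHEMA _Hooks].each do |name|
  verdict = sketch_collection_allowed?(name) ? "allowed" : "refused"
  puts "#{name}: #{verdict}"
end
```

Application classes and the four SDK data classes pass, while server-managed collections such as `_SCHEMA` and `_Hooks` are refused.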
.refuse_protected_field_references!(pipeline, collection_name, resolution) ⇒ void
This method returns an undefined value.
Wave-3 TRACK-CLP-4: refuse caller-supplied pipelines that reference a protected field via `$<field>` on the RHS of a `$project` / `$addFields` / `$set` / `$group` / `$bucket` / `$replaceWith` / `$lookup.let` clause.
The protectedFields enforcement layer (CLPScope.redact_protected_fields!) strips the field by NAME from the result rows. But a pipeline can launder a protected field through a rename:
{ "$addFields" => { "ssn_copy" => "$ssn" } }
{ "$project" => { "renamed" => "$ssn", "objectId" => 1 } }
{ "$group" => { "_id" => "$ssn", "n" => { "$sum" => 1 } } }
The post-fetch strip walks the rows and deletes `ssn` keys, but the value is now stored under `ssn_copy` / `renamed` / `_id`, so the strip walks past it. This scanner runs BEFORE the pipeline reaches Mongo: any `$<field>` string whose unprefixed name is in the class’s protected-fields set raises CLPScope::Denied so the caller knows the join was refused, rather than silently leaking the renamed value.
Variable references (`$$ROOT`, `$$CURRENT`, `$$user_var`) are NOT field references; they are aggregation variables. The walker checks that the leading `$` is single, not double, before treating the string as a field path.
Master mode + nil resolution short-circuit at the entry: the walker is a no-op when the caller can read everything anyway.
    # File 'lib/parse/pipeline_security.rb', line 374

    def refuse_protected_field_references!(pipeline, collection_name, resolution)
      return if resolution.nil? || (resolution.respond_to?(:master?) && resolution.master?)
      return if pipeline.nil? || pipeline.empty?
      perms = resolution.respond_to?(:permission_strings) ? resolution.permission_strings : nil
      return if perms.nil?
      # Lazy-require to avoid forcing CLPScope load order when the
      # caller hasn't otherwise needed it.
      require_relative "clp_scope" unless defined?(Parse::CLPScope)
      protected_set = Parse::CLPScope.protected_fields_for(collection_name, perms)
      return if protected_set.nil? || protected_set.empty?
      pipeline.each_with_index do |stage, idx|
        walk_for_protected_ref!(stage, protected_set, collection_name, "pipeline[#{idx}]")
      end
      nil
    end
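The distinction between field references (`$ssn`) and aggregation variables (`$$ROOT`) can be sketched as follows. This is an illustrative stand-in, not `walk_for_protected_ref!`: the helper names are hypothetical, it collects matches instead of raising, it scans every string value rather than only the RHS of the listed stages, and treating the first dotted segment as the field name is an assumption.

```ruby
require "set"

# Returns the referenced top-level field name for "$field" / "$field.sub",
# or nil for plain strings and for "$$variable" references.
def field_reference_name(value)
  return nil unless value.is_a?(String)
  return nil unless value.start_with?("$") && !value.start_with?("$$")
  value[1..-1].split(".").first
end

# Collects every protected field name referenced anywhere in the node.
def protected_references(node, protected_set, found = [])
  case node
  when Hash then node.each_value { |v| protected_references(v, protected_set, found) }
  when Array then node.each { |v| protected_references(v, protected_set, found) }
  else
    name = field_reference_name(node)
    found << name if name && protected_set.include?(name)
  end
  found
end

protected_set = Set["ssn"]
laundering = [{ "$addFields" => { "ssn_copy" => "$ssn" } }]
harmless   = [{ "$project" => { "doc" => "$$ROOT", "name" => "$name" } }]

puts protected_references(laundering, protected_set).inspect
puts protected_references(harmless, protected_set).inspect
```

Scanning before execution is what closes the rename loophole: once `$ssn` has been copied to `ssn_copy`, a name-based strip on the result rows no longer has anything to match.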
.strip_internal_fields(doc) ⇒ Object
Strip INTERNAL_FIELDS_DENYLIST keys from a Hash document (one level deep – raw search documents are flat). Returns a new Hash; the input is not mutated. Non-Hash inputs return unchanged so callers can pipe arbitrary cursor entries through this.
    # File 'lib/parse/pipeline_security.rb', line 327

    def strip_internal_fields(doc)
      return doc unless doc.is_a?(Hash)
      doc.each_with_object({}) do |(key, value), out|
        k = key.to_s
        next if INTERNAL_FIELDS_DENYLIST.include?(k)
        next if INTERNAL_FIELDS_PREFIX_DENYLIST.any? { |prefix| k.start_with?(prefix) }
        out[key] = value
      end
    end
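A short standalone example may help show the "one level deep" and "input is not mutated" behaviour. The `SketchSecurity` module is a hypothetical wrapper so the snippet runs without the SDK; its method body follows the listing above, but the denylist is abbreviated for brevity.

```ruby
# Hypothetical stand-in module; the denylist here is a shortened subset.
module SketchSecurity
  INTERNAL_FIELDS_DENYLIST = %w[_hashed_password _session_token sessionToken _rperm _wperm].freeze
  INTERNAL_FIELDS_PREFIX_DENYLIST = %w[_auth_data_].freeze

  def self.strip_internal_fields(doc)
    return doc unless doc.is_a?(Hash)
    doc.each_with_object({}) do |(key, value), out|
      k = key.to_s
      next if INTERNAL_FIELDS_DENYLIST.include?(k)
      next if INTERNAL_FIELDS_PREFIX_DENYLIST.any? { |prefix| k.start_with?(prefix) }
      out[key] = value
    end
  end
end

row = {
  "username" => "ada",
  "_hashed_password" => "x",
  "_auth_data_facebook" => { "id" => "1" },
  # Nested keys are NOT inspected: the strip is one level deep.
  "profile" => { "_hashed_password" => "nested" },
}

clean = SketchSecurity.strip_internal_fields(row)
puts clean.keys.inspect   # top-level internal keys are gone
puts row.keys.length      # the original hash still has all four keys
```

Because the strip only inspects top-level keys, a caller that returns nested documents would need its own handling for inner hashes.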
.valid_filter?(node) ⇒ Boolean
Returns true if the node passes permissive validation.
    # File 'lib/parse/pipeline_security.rb', line 290

    def valid_filter?(node)
      validate_filter!(node)
      true
    rescue Error
      false
    end
.valid_pipeline?(pipeline) ⇒ Boolean
Returns true if the pipeline passes strict validation.
    # File 'lib/parse/pipeline_security.rb', line 282

    def valid_pipeline?(pipeline)
      validate_pipeline!(pipeline)
      true
    rescue Error
      false
    end
.validate_filter!(node, allow_internal_fields: false) ⇒ true
Permissive validation: walks the given Hash or Array (or anything else, which is a no-op) and refuses any nested key that appears in DENIED_OPERATORS. Does NOT check the top-level stage allowlist or the stage count cap. Used by direct-MongoDB sinks where callers have explicit intent and want flexibility in stage selection, but server-side JS and data-mutating operators must still be refused.
    # File 'lib/parse/pipeline_security.rb', line 276

    def validate_filter!(node, allow_internal_fields: false)
      walk_for_denied!(node, depth: 0, allow_internal_fields: allow_internal_fields)
      true
    end
.validate_pipeline!(pipeline) ⇒ true
Strict validation: pipeline must be a non-empty Array of Hashes, each Hash’s top-level key must be in ALLOWED_STAGES, and no entry in DENIED_OPERATORS may appear at any nesting depth.
    # File 'lib/parse/pipeline_security.rb', line 238

    def validate_pipeline!(pipeline)
      unless pipeline.is_a?(Array)
        raise Error.new("Pipeline must be an Array, got #{pipeline.class}", reason: :invalid_type)
      end
      if pipeline.empty?
        raise Error.new("Pipeline cannot be empty", reason: :empty_pipeline)
      end
      if pipeline.size > MAX_PIPELINE_STAGES
        raise Error.new(
          "Pipeline exceeds maximum of #{MAX_PIPELINE_STAGES} stages (got #{pipeline.size})",
          reason: :too_many_stages,
        )
      end

      pipeline.each_with_index do |stage, idx|
        validate_stage!(stage, idx)
      end
      true
    end