Module: Parse::PipelineSecurity
Defined in: lib/parse/pipeline_security.rb
Overview
Canonical security validator for MongoDB aggregation pipelines and filter hashes that the SDK forwards to the driver or to Parse Server.
Previously the codebase had three different validators with three different rule sets:
- `Parse::Agent::PipelineValidator`: strict allowlist for the Agent (read-only paths only)
- `Parse::Query#validate_pipeline!`: outer-stage-only denylist
- `Parse::MongoDB.assert_no_denied_operators!`: recursive denylist of server-side JS operators
`Parse::AtlasSearch.convert_filter_for_mongodb` was a complete passthrough that bypassed all three. A user-supplied filter containing `$where`/`$expr`/`$function`/`$regex` was injected straight into the pipeline `$match` stage, bypassing every existing constraint guard.
This module consolidates the rules. Every entry point that forwards a caller-supplied pipeline or filter to MongoDB now routes through one of the two public methods here:
- PipelineSecurity.validate_pipeline!: strict mode (allowlist + size/depth caps). Used by `Parse::Agent` and by `Parse::Query#aggregate` for user-facing aggregation entry points.
- PipelineSecurity.validate_filter!: permissive mode (recursive denylist only). Used by `Parse::MongoDB.find/aggregate` and Atlas Search filter passthrough where the pipeline is constructed by SDK code but a user-controlled filter hash is interpolated. Refuses `$where`/`$function`/`$accumulator` and the data-mutating stages at any nesting depth.
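The permissive mode described above amounts to a recursive key walk. The following is a minimal sketch of that idea, not the SDK's implementation: the module name `DenylistSketch`, its `Refused` error, and the shortened operator list are all hypothetical stand-ins chosen for illustration.

```ruby
# Hypothetical stand-in for a recursive denylist walk; not the SDK's code.
module DenylistSketch
  # A subset of the operators the text says are refused at any depth.
  DENIED = %w[$where $function $accumulator $out $merge].freeze

  class Refused < StandardError; end

  # Walk Hashes and Arrays recursively and raise on the first denied key.
  # Any other value (String, Integer, nil, ...) is a no-op.
  def self.check!(node)
    case node
    when Hash
      node.each do |key, value|
        raise Refused, "denied operator #{key}" if DENIED.include?(key.to_s)
        check!(value)
      end
    when Array
      node.each { |child| check!(child) }
    end
    true
  end

  # Boolean wrapper, analogous in spirit to a `valid_filter?` helper.
  def self.ok?(node)
    check!(node)
  rescue Refused
    false
  end
end

safe   = { "name" => { "$in" => %w[a b] } }
nested = { "$and" => [{ "x" => 1 }, { "$expr" => { "$function" => { "body" => "..." } } }] }
puts DenylistSketch.ok?(safe)
puts DenylistSketch.ok?(nested)
```

Because the walk descends through every Hash value and Array element, an operator hidden under `$and` or `$expr` is treated the same as one at the top level.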
Policy: allowlist top-level, denylist recursive
Strict mode enforces ALLOWED_STAGES ONLY at the top-level stage key. Nested sub-pipelines (inside `$lookup.pipeline`, `$unionWith.pipeline`, `$facet.*`, `$graphLookup`) are walked with the operator denylist but NOT with the stage allowlist. This is intentional: Atlas Search and uncommon-but-legitimate read stages like `$densify` and `$fill` must be allowed inside sub-pipelines even when the outer pipeline is strict-validated. The denylist is the security boundary; the allowlist is a shape check.
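To make the split between the two checks concrete, here is a small sketch under stated assumptions: `StrictSketch`, its shortened stage and operator lists, and the symbolic verdicts are hypothetical and only mirror the policy described above, not the module's actual code.

```ruby
# Hypothetical sketch of "allowlist top-level, denylist recursive".
module StrictSketch
  ALLOWED_TOP = %w[$match $group $sort $project $limit $lookup $facet $unionWith].freeze
  DENIED      = %w[$where $function $accumulator $out $merge].freeze

  # Returns :ok, or a symbol naming the rule that refused the pipeline.
  def self.verdict(pipeline)
    pipeline.each do |stage|
      # The allowlist is applied to the outer stage key only.
      return :stage_not_allowed unless ALLOWED_TOP.include?(stage.keys.first.to_s)
      # The denylist is applied at every nesting depth.
      return :denied_operator if denied?(stage)
    end
    :ok
  end

  def self.denied?(node)
    case node
    when Hash  then node.any? { |k, v| DENIED.include?(k.to_s) || denied?(v) }
    when Array then node.any? { |child| denied?(child) }
    else false
    end
  end
end

inner_densify = [{ "$lookup" => { "from" => "Event", "as" => "events",
                                  "pipeline" => [{ "$densify" => { "field" => "ts" } }] } }]
outer_densify = [{ "$densify" => { "field" => "ts" } }]
nested_out    = [{ "$facet" => { "a" => [{ "$out" => "stolen" }] } }]

puts StrictSketch.verdict(inner_densify)
puts StrictSketch.verdict(outer_densify)
puts StrictSketch.verdict(nested_out)
```

In this sketch `$densify` passes inside a `$lookup` sub-pipeline but is refused as an outer stage, while `$out` is refused wherever it appears.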
Caveat for Query#aggregate callers
`Parse::Query#aggregate` routes through PipelineSecurity.validate_filter!, not PipelineSecurity.validate_pipeline!, so user-supplied pipelines are checked against the denylist only. Permissive mode does NOT block `$lookup`, `$graphLookup`, or `$unionWith` reading from arbitrary collections. These are legitimate read stages, but powerful enough to cross Parse ACL/CLP boundaries when the source collection lacks row-level enforcement. **Never pass raw attacker-controlled input into `Parse::Query#aggregate`.** Construct the pipeline in SDK code and interpolate only validated values.
Capability gap: `$expr`
`$expr` itself is not in DENIED_OPERATORS. The recursive walker catches `$function`/`$accumulator` nested inside `$expr`, so the immediate JavaScript-execution risk is closed. A future Atlas operator gated under `$expr` would slip until DENIED_OPERATORS is extended. Defense-in-depth callers concerned about expensive aggregation expressions (`$regexMatch` ReDoS, large `$reduce` loops) should validate user input shape before reaching this module.
Defined Under Namespace
Classes: Error
Constant Summary
- DENIED_OPERATORS =
Operators that are ALWAYS refused at any nesting depth. These either execute server-side JavaScript (`$where`, `$function`, `$accumulator`) or mutate the database (`$out`, `$merge`) or the server itself (`$collMod`, `$createIndex`, `$dropIndex`, `$planCacheSetFilter`, `$planCacheClear`). None of them are needed for read queries.
%w[ $where $function $accumulator $out $merge $collMod $createIndex $dropIndex $planCacheSetFilter $planCacheClear ].freeze
- DENIED_FIELD_REFS =
Field-reference paths (string values inside `$expr` whose first byte is `$`) that point at server-internal columns and must never be reachable from a user-influenced pipeline. A boolean expression inside `$expr` over any of these is a 1-bit-per-query side channel that bisects the value of a bcrypt hash, session token, or password-reset token. Names match Parse Server’s internal column layout (cf. MongoStorageAdapter).
%w[ $_hashed_password $_password_history $_session_token $_sessionToken $_email_verify_token $_perishable_token $_failed_login_count $_account_lockout_expires_at $_rperm $_wperm $_auth_data ].freeze
- DENIED_FIELD_REF_PREFIXES =
String prefix for per-provider auth-data field references inside `$expr`. Parse Server stores per-provider columns as `_auth_data_facebook`, `_auth_data_google`, etc., and none of these should be reachable from a user-influenced pipeline. The prefix `$_auth_data_` covers all of them without requiring an exhaustive list.
%w[$_auth_data_].freeze
- ALLOWED_UNDERSCORE_COLLECTIONS =
MongoDB collection names that an SDK aggregation IS permitted to name in `from:`/`coll:`. Any name starting with `_` outside this set is refused as an internal Parse Server collection. The four entries here are the only `_`-prefixed collections that hold Parse SDK data classes; everything else with a leading `_` is server-managed state (`_SCHEMA` discloses class-level permissions; `_Hooks` discloses Cloud Code webhook URLs + secret keys; `_GraphQLConfig` discloses GraphQL schema state; `_Audit` holds operational telemetry; `_Idempotency`/`_PushStatus`/`_JobStatus`/`_JobSchedule`/`_GlobalConfig`/`_Audience` hold internal Parse Server bookkeeping).
%w[_User _Role _Installation _Session].freeze
- INTERNAL_FIELDS_DENYLIST =
Field names that are internal to Parse Server’s storage layout and must never appear in returned documents. Most are stripped by `Parse::MongoDB.convert_document_to_parse`, but a raw-result path (`raw: true`) bypasses that conversion and would otherwise surface the bcrypt hash, session token, or reset token.
`sessionToken` / `session_token` (no leading underscore) are the credential column on `_Session` rows. Unlike the `_User`-side `_session_token`, the Session class declares it as a regular property, so without this entry a master-key agent that has had the class explicitly unhidden would receive raw bearer tokens in every row of a `query_class("_Session")` response. The denylist is the process-level floor, independent of class-visibility state, so even a deliberate `agent_unhidden` on `_Session` (or a compromised superadmin tool) cannot exfiltrate active tokens.
%w[ _hashed_password _password_history _session_token _sessionToken sessionToken session_token _email_verify_token _perishable_token _failed_login_count _account_lockout_expires_at _rperm _wperm _tombstone _auth_data ].freeze
- INTERNAL_FIELDS_PREFIX_DENYLIST =
Prefix covering per-provider auth-data columns (`_auth_data_facebook`, `_auth_data_google`, …). Used by strip_internal_fields and by the walk_for_denied! field-name screen.
%w[_auth_data_].freeze
- FORENSIC_OPERATORS =
Forensic string-introspection operators. When any of these appears INSIDE `$expr` with a field-reference input string, the query becomes a per-character oracle even though the operator itself is otherwise legitimate. Refused inside `$expr` regardless of the input: the validator does not try to introspect operand shapes deeply, and these operators have no legitimate use against Parse-Server-managed columns from an SDK aggregation.
%w[ $regexMatch $regexFind $regexFindAll $substr $substrBytes $substrCP $indexOfBytes $indexOfCP $strLenBytes $strLenCP $strcasecmp ].freeze
- ALLOWED_STAGES =
Top-level pipeline stages permitted by the strict validator. The set covers Parse-Stack’s own aggregation use, plus Atlas Search entry points (`$search`, `$searchMeta`, `$listSearchIndexes`) so that `Parse::AtlasSearch` calls do not break.
%w[ $match $group $sort $project $limit $skip $unwind $lookup $count $addFields $set $unset $bucket $bucketAuto $facet $sample $sortByCount $replaceRoot $replaceWith $redact $graphLookup $unionWith $search $searchMeta $listSearchIndexes ].freeze
- MAX_PIPELINE_STAGES =
Cap on number of top-level stages in a strict-validated pipeline.
20
- MAX_DEPTH =
Cap on nested object/array depth during recursive walks. Stops a caller from forcing the validator into a near-infinite traversal. Legitimate Parse-generated pipelines with `$facet` containing `$lookup` with `let` and correlated sub-pipelines (`$match.$expr. $and.`) can reach depth 12+ on a normal read, so we keep comfortable headroom above the real ceiling.
20
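A depth cap of this kind can be sketched as a counter threaded through the recursive walk. The example below is illustrative only: `DepthSketch`, `TooDeep`, the `nest` helper, and the exact comparison (`>` rather than `>=`) are assumptions, since the source does not show how the walker counts levels.

```ruby
# Hypothetical sketch of a depth-capped recursive walk.
module DepthSketch
  MAX_DEPTH = 20

  class TooDeep < StandardError; end

  # Each Hash or Array level adds one to the depth; exceeding the cap raises.
  def self.walk!(node, depth = 0)
    raise TooDeep, "nesting exceeds #{MAX_DEPTH}" if depth > MAX_DEPTH
    case node
    when Hash  then node.each_value { |v| walk!(v, depth + 1) }
    when Array then node.each { |v| walk!(v, depth + 1) }
    end
    true
  end

  # Demonstration helper: wraps a leaf in `levels` layers of { "$and" => [...] }.
  # Every layer contributes two levels (one Hash, one Array).
  def self.nest(levels)
    (1..levels).inject(1) { |inner, _| { "$and" => [inner] } }
  end
end

DepthSketch.walk!(DepthSketch.nest(5))   # 10 levels deep, within the cap

begin
  DepthSketch.walk!(DepthSketch.nest(15)) # 30 levels deep, over the cap
rescue DepthSketch::TooDeep => e
  puts "refused: #{e.message}"
end
```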
Class Method Summary
- .assert_collection_allowed!(name) ⇒ Object
Refuses any collection name reserved for Parse Server’s internal state.
- .refuse_protected_field_references!(pipeline, collection_name, resolution) ⇒ void
Wave-3 TRACK-CLP-4: refuse caller-supplied pipelines that reference a protected field via `$<field>` on the RHS of a `$project` / `$addFields` / `$set` / `$group` / `$bucket` / `$replaceWith` / `$lookup.let` clause.
- .strip_internal_fields(doc) ⇒ Object
Strip INTERNAL_FIELDS_DENYLIST keys from a Hash document (one level deep – raw search documents are flat).
- .valid_filter?(node) ⇒ Boolean
True if the node passes permissive validation.
- .valid_pipeline?(pipeline) ⇒ Boolean
True if the pipeline passes strict validation.
- .validate_filter!(node, allow_internal_fields: false) ⇒ true
Permissive validation: walks the given Hash or Array (or anything else, which is a no-op) and refuses any nested key that appears in DENIED_OPERATORS.
- .validate_pipeline!(pipeline) ⇒ true
Strict validation: pipeline must be a non-empty Array of Hashes, each Hash’s top-level key must be in ALLOWED_STAGES, and no entry in DENIED_OPERATORS may appear at any nesting depth.
Class Method Details
.assert_collection_allowed!(name) ⇒ Object
Refuses any collection name reserved for Parse Server’s internal state. Accepts the four SDK-data system classes (`_User`, `_Role`, `_Installation`, `_Session`) and any non-`_`-prefixed name. Used by `LookupRewriter` and by the Agent’s pipeline walker to enforce a hard floor independent of any per-Agent `MetadataRegistry.hidden?` policy.
    # File 'lib/parse/pipeline_security.rb', line 282
    def assert_collection_allowed!(name)
      return if name.nil?
      str = name.to_s
      return if str.empty?
      return unless str.start_with?("_")
      return if ALLOWED_UNDERSCORE_COLLECTIONS.include?(str)
      raise Error.new(
        "SECURITY: Collection '#{str}' is reserved for Parse Server's internal " \
        "state and is not reachable from an SDK aggregation pipeline.",
        operator: str,
        reason: :denied_internal_collection,
      )
    end
.refuse_protected_field_references!(pipeline, collection_name, resolution) ⇒ void
This method returns an undefined value.
Wave-3 TRACK-CLP-4: refuse caller-supplied pipelines that reference a protected field via `$<field>` on the RHS of a `$project` / `$addFields` / `$set` / `$group` / `$bucket` / `$replaceWith` / `$lookup.let` clause.
The protectedFields enforcement layer (CLPScope.redact_protected_fields!) strips the field by NAME from the result rows. But a pipeline can launder a protected field through a rename:
{ "$addFields" => { "ssn_copy" => "$ssn" } }
{ "$project" => { "renamed" => "$ssn", "objectId" => 1 } }
{ "$group" => { "_id" => "$ssn", "n" => { "$sum" => 1 } } }
The post-fetch strip walks the rows and deletes `ssn` keys, but the value is now stored under `ssn_copy` / `renamed` / `_id`, so the strip walks past it. This scanner runs BEFORE the pipeline reaches Mongo: any `$<field>` string whose unprefixed name is in the class’s protected-fields set raises CLPScope::Denied so the caller knows the join was refused, rather than silently leaking the renamed value.
Variable references (`$$ROOT`, `$$CURRENT`, `$$user_var`) are NOT field references; they are aggregation variables. The walker checks the leading `$` is single, not double, before treating the string as a field path.
Master mode + nil resolution short-circuit at the entry: the walker is a no-op when the caller can read everything anyway.
    # File 'lib/parse/pipeline_security.rb', line 347
    def refuse_protected_field_references!(pipeline, collection_name, resolution)
      return if resolution.nil? || (resolution.respond_to?(:master?) && resolution.master?)
      return if pipeline.nil? || pipeline.empty?
      perms = resolution.respond_to?(:permission_strings) ? resolution.permission_strings : nil
      return if perms.nil?

      # Lazy-require to avoid forcing CLPScope load order when the
      # caller hasn't otherwise needed it.
      require_relative "clp_scope" unless defined?(Parse::CLPScope)
      protected_set = Parse::CLPScope.protected_fields_for(collection_name, perms)
      return if protected_set.nil? || protected_set.empty?

      pipeline.each_with_index do |stage, idx|
        walk_for_protected_ref!(stage, protected_set, collection_name, "pipeline[#{idx}]")
      end
      nil
    end
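The single-`$` versus double-`$$` distinction described for this method can be illustrated with a small scanner. This is a sketch under assumptions, not the body of `walk_for_protected_ref!` (which the source does not show): `ProtectedRefSketch` and its helpers are hypothetical, and matching on the first path segment is a choice made here for illustration.

```ruby
# Hypothetical sketch of detecting "$field" references to protected fields.
module ProtectedRefSketch
  # True when `value` is a single-"$" field path whose first path segment
  # is in `protected_set`. "$$VAR" strings are aggregation variables and
  # are never treated as field references.
  def self.protected_ref?(value, protected_set)
    return false unless value.is_a?(String)
    return false unless value.start_with?("$")
    return false if value.start_with?("$$")
    protected_set.include?(value[1..-1].split(".").first)
  end

  # Collect every protected reference found in the values of a stage.
  def self.refs_in(node, protected_set)
    case node
    when Hash  then node.values.flat_map { |v| refs_in(v, protected_set) }
    when Array then node.flat_map { |v| refs_in(v, protected_set) }
    else protected_ref?(node, protected_set) ? [node] : []
    end
  end
end

protected_set = ["ssn"]
rename   = { "$addFields" => { "ssn_copy" => "$ssn" } }
group    = { "$group" => { "_id" => "$ssn", "n" => { "$sum" => 1 } } }
variable = { "$project" => { "doc" => "$$ROOT", "name" => "$name" } }

p ProtectedRefSketch.refs_in(rename, protected_set)
p ProtectedRefSketch.refs_in(group, protected_set)
p ProtectedRefSketch.refs_in(variable, protected_set)
```

The first two stages are the rename patterns listed above and each yields a `"$ssn"` hit; the third yields none, because `$$ROOT` is a variable and `$name` is not protected.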
.strip_internal_fields(doc) ⇒ Object
Strip INTERNAL_FIELDS_DENYLIST keys from a Hash document (one level deep – raw search documents are flat). Returns a new Hash; the input is not mutated. Non-Hash inputs return unchanged so callers can pipe arbitrary cursor entries through this.
    # File 'lib/parse/pipeline_security.rb', line 300
    def strip_internal_fields(doc)
      return doc unless doc.is_a?(Hash)
      doc.each_with_object({}) do |(key, value), out|
        k = key.to_s
        next if INTERNAL_FIELDS_DENYLIST.include?(k)
        next if INTERNAL_FIELDS_PREFIX_DENYLIST.any? { |prefix| k.start_with?(prefix) }
        out[key] = value
      end
    end
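A usage sketch may help show the "one level deep" and "input is not mutated" behaviour. To keep it runnable without the SDK, the method and the two constants are copied from this page into a hypothetical `StripSketch` module; the sample row is invented for illustration.

```ruby
# Standalone copy of the listed method and constants, wrapped in a
# hypothetical module so the example runs without loading the SDK.
module StripSketch
  INTERNAL_FIELDS_DENYLIST = %w[
    _hashed_password _password_history _session_token _sessionToken
    sessionToken session_token _email_verify_token _perishable_token
    _failed_login_count _account_lockout_expires_at _rperm _wperm
    _tombstone _auth_data
  ].freeze
  INTERNAL_FIELDS_PREFIX_DENYLIST = %w[_auth_data_].freeze

  def self.strip_internal_fields(doc)
    return doc unless doc.is_a?(Hash)
    doc.each_with_object({}) do |(key, value), out|
      k = key.to_s
      next if INTERNAL_FIELDS_DENYLIST.include?(k)
      next if INTERNAL_FIELDS_PREFIX_DENYLIST.any? { |prefix| k.start_with?(prefix) }
      out[key] = value
    end
  end
end

# Invented sample row; field values are placeholders.
row = {
  "objectId"          => "abc123",
  "username"          => "ada",
  "_hashed_password"  => "placeholder-hash",
  "_auth_data_google" => { "id" => "1" },
  "profile"           => { "_rperm" => ["*"] }, # nested key is NOT stripped
}

clean = StripSketch.strip_internal_fields(row)
p clean.keys
p row.key?("_hashed_password") # the input Hash is left untouched
```

Note that the nested `_rperm` under `profile` survives, which matches the documented one-level scope; callers with nested documents would need their own recursion.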
.valid_filter?(node) ⇒ Boolean
Returns true if the node passes permissive validation.
    # File 'lib/parse/pipeline_security.rb', line 263
    def valid_filter?(node)
      validate_filter!(node)
      true
    rescue Error
      false
    end
.valid_pipeline?(pipeline) ⇒ Boolean
Returns true if the pipeline passes strict validation.
    # File 'lib/parse/pipeline_security.rb', line 255
    def valid_pipeline?(pipeline)
      validate_pipeline!(pipeline)
      true
    rescue Error
      false
    end
.validate_filter!(node, allow_internal_fields: false) ⇒ true
Permissive validation: walks the given Hash or Array (or anything else, which is a no-op) and refuses any nested key that appears in DENIED_OPERATORS. Does NOT check the top-level stage allowlist or the stage count cap. Used by direct-MongoDB sinks where callers have explicit intent and want flexibility in stage selection, but server-side JS and data-mutating operators must still be refused.
    # File 'lib/parse/pipeline_security.rb', line 249
    def validate_filter!(node, allow_internal_fields: false)
      walk_for_denied!(node, depth: 0, allow_internal_fields: allow_internal_fields)
      true
    end
.validate_pipeline!(pipeline) ⇒ true
Strict validation: pipeline must be a non-empty Array of Hashes, each Hash’s top-level key must be in ALLOWED_STAGES, and no entry in DENIED_OPERATORS may appear at any nesting depth.
    # File 'lib/parse/pipeline_security.rb', line 211
    def validate_pipeline!(pipeline)
      unless pipeline.is_a?(Array)
        raise Error.new("Pipeline must be an Array, got #{pipeline.class}", reason: :invalid_type)
      end
      if pipeline.empty?
        raise Error.new("Pipeline cannot be empty", reason: :empty_pipeline)
      end
      if pipeline.size > MAX_PIPELINE_STAGES
        raise Error.new(
          "Pipeline exceeds maximum of #{MAX_PIPELINE_STAGES} stages (got #{pipeline.size})",
          reason: :too_many_stages,
        )
      end

      pipeline.each_with_index do |stage, idx|
        validate_stage!(stage, idx)
      end
      true
    end
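The three shape checks in the listing each raise with a distinct `reason:`. The sketch below shows how a caller might branch on that value. It is a stand-in, not the SDK: `PipelineSketch` is hypothetical, the `Error` class here assumes a `reason` reader (the listing shows only the `reason:` keyword being passed), and per-stage validation via `validate_stage!` is omitted because its body is not shown.

```ruby
# Hypothetical stand-in reproducing only the top-level shape checks.
module PipelineSketch
  MAX_PIPELINE_STAGES = 20

  # Assumed error shape: keyword arguments exposed through readers.
  class Error < StandardError
    attr_reader :operator, :reason

    def initialize(message, operator: nil, reason: nil)
      super(message)
      @operator = operator
      @reason = reason
    end
  end

  def self.validate_pipeline!(pipeline)
    unless pipeline.is_a?(Array)
      raise Error.new("Pipeline must be an Array, got #{pipeline.class}", reason: :invalid_type)
    end
    raise Error.new("Pipeline cannot be empty", reason: :empty_pipeline) if pipeline.empty?
    if pipeline.size > MAX_PIPELINE_STAGES
      raise Error.new(
        "Pipeline exceeds maximum of #{MAX_PIPELINE_STAGES} stages (got #{pipeline.size})",
        reason: :too_many_stages,
      )
    end
    true # per-stage checks are intentionally left out of this sketch
  end

  # Returns nil when the pipeline passes, otherwise the refusal reason.
  def self.reason_for(pipeline)
    validate_pipeline!(pipeline)
    nil
  rescue Error => e
    e.reason
  end
end

p PipelineSketch.reason_for("not an array")
p PipelineSketch.reason_for([])
p PipelineSketch.reason_for(Array.new(21) { { "$limit" => 1 } })
p PipelineSketch.reason_for([{ "$match" => { "active" => true } }])
```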