Class: Gitlab::SecretDetection::Core::Scanner
- Inherits:
  Object
  - Object
  - Gitlab::SecretDetection::Core::Scanner
- Defined in:
- lib/gitlab/secret_detection/core/scanner.rb
Overview
Scanner is responsible for running the Secret Detection scan operation.
Constant Summary
- DEFAULT_SCAN_TIMEOUT_SECS =
Default time limit (in seconds) for running the scan operation per invocation
180
- DEFAULT_PAYLOAD_TIMEOUT_SECS =
Default time limit (in seconds) for running the scan operation on a single payload
30
- DEFAULT_PATTERN_MATCHER_TAGS =
Tags used for creating the default pattern matcher
['gitlab_blocking'].freeze
- MAX_PROCS_PER_REQUEST =
Maximum number of child processes to spawn per request. Ref: gitlab.com/gitlab-org/gitlab/-/issues/430160
5
- MIN_CHUNK_SIZE_PER_PROC_BYTES =
Minimum cumulative size of the payloads required to spawn and run the scan within a new subprocess.
2_097_152
- RUN_IN_SUBPROCESS =
Whether to run the scan in subprocesses or not. Default is false.
false
Instance Method Summary
-
#initialize(rules:, logger: Logger.new($stdout)) ⇒ Scanner
constructor
Initializes the instance with the logger, extracts keywords from the ruleset, and builds the default matchers.
-
#secrets_scan(payloads, timeout: DEFAULT_SCAN_TIMEOUT_SECS, payload_timeout: DEFAULT_PAYLOAD_TIMEOUT_SECS, exclusions: {}, tags: DEFAULT_PATTERN_MATCHER_TAGS, subprocess: RUN_IN_SUBPROCESS) ⇒ Object
Runs Secret Detection scan on the list of given payloads.
Constructor Details
#initialize(rules:, logger: Logger.new($stdout)) ⇒ Scanner
Initializes the instance with the logger and performs the following operations:
- Extracts keywords from the parsed ruleset so that payloads can be matched against keywords before any regex operation.
- Builds and compiles the rule regex patterns obtained from the ruleset with the DEFAULT_PATTERN_MATCHER_TAGS tags. Raises RulesetCompilationError if the regex pattern compilation fails.
# File 'lib/gitlab/secret_detection/core/scanner.rb', line 33
def initialize(rules:, logger: Logger.new($stdout))
  @logger = logger
  @rules = rules
  @keywords = create_keywords(rules)
  @default_keyword_matcher = build_keyword_matcher(
    tags: DEFAULT_PATTERN_MATCHER_TAGS,
    include_missing_tags: false
  )
  @default_pattern_matcher, @default_rules = build_pattern_matcher(
    tags: DEFAULT_PATTERN_MATCHER_TAGS,
    include_missing_tags: false
  ) # includes only gitlab_blocking rules
end
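The two constructor steps (keyword extraction for a cheap pre-filter, then regex compilation for rules with matching tags) can be illustrated with a minimal, self-contained sketch. This is not the gem's implementation: the rule fields (:id, :keywords, :regex, :tags), the helper names, and the use of ArgumentError in place of RulesetCompilationError are all assumptions made for illustration.

```ruby
# Hypothetical ruleset entries; the field names are assumptions for this
# sketch, not the gem's confirmed schema.
rules = [
  { id: "example_token", keywords: ["extok-"], regex: "extok-[0-9a-z]{8}", tags: ["gitlab_blocking"] },
  { id: "other_token", keywords: ["othr_"], regex: "othr_[0-9]{4}", tags: ["experimental"] }
]

# Keep only the rules that carry at least one of the requested tags.
def rules_for(rules, tags)
  rules.select { |rule| (rule[:tags] & tags).any? }
end

# Build one regex that matches any keyword of the given rules.
# Regexp.union escapes the keyword strings.
def keyword_matcher(rules)
  Regexp.union(rules.flat_map { |rule| rule[:keywords] })
end

# Compile each rule pattern; a malformed pattern aborts the whole build.
# ArgumentError stands in for the gem's RulesetCompilationError.
def compile_patterns(rules)
  rules.to_h { |rule| [rule[:id], Regexp.new(rule[:regex])] }
rescue RegexpError => e
  raise ArgumentError, "ruleset compilation failed: #{e.message}"
end

active = rules_for(rules, ["gitlab_blocking"])
matcher = keyword_matcher(active)
patterns = compile_patterns(active)

payloads = [
  { id: 1, data: "token=extok-abcd1234" },
  { id: 2, data: "nothing to see here" }
]

# Step 1: the keyword pre-filter discards payloads that cannot match any rule.
candidates = payloads.select { |payload| payload[:data].match?(matcher) }

# Step 2: the more expensive rule regexes run only on the remaining payloads.
findings = candidates.flat_map do |payload|
  patterns.filter_map do |rule_id, regex|
    { payload_id: payload[:id], rule: rule_id } if payload[:data].match?(regex)
  end
end
```

Only the first payload survives the keyword pre-filter here, so the compiled rule patterns are evaluated against one payload instead of two.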
Instance Method Details
#secrets_scan(payloads, timeout: DEFAULT_SCAN_TIMEOUT_SECS, payload_timeout: DEFAULT_PAYLOAD_TIMEOUT_SECS, exclusions: {}, tags: DEFAULT_PATTERN_MATCHER_TAGS, subprocess: RUN_IN_SUBPROCESS) ⇒ Object
Runs a Secret Detection scan on the given list of payloads. Both the total scan duration and the duration for each payload are time-bound via timeout and payload_timeout respectively.
payloads-
Array of payloads, where each payload should have `id` and `data` properties.
timeout-
Number of seconds (accepts floating point for smaller time values) to limit the total scan duration.
payload_timeout-
Number of seconds (accepts floating point for smaller time values) to limit
the scan duration on each payload.
exclusions-
Hash with keys :raw_value and :rule, whose values are arrays of either
GRPC::Exclusion objects (when used as a standalone service)
or Security::ProjectSecurityExclusion objects (when used as a gem).
:raw_value - Exclusions in the :raw_value array are the raw values to ignore.
:rule - Exclusions in the :rule array are the rules to exclude from the ruleset used for the scan.
Each rule is represented by its ID. For example: `gitlab_personal_access_token`
for a GitLab Personal Access Token. By default, no rule is excluded from the ruleset.
tags-
Array of tag values used to filter the default ruleset when determining the rules used for the scan.
For example: add `gitlab_blocking` to include only rules for Push Protection. Defaults to [`gitlab_blocking`] (DEFAULT_PATTERN_MATCHER_TAGS).
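As a rough illustration of the argument shapes described above, the sketch below builds a payload list and an exclusions hash from stand-in objects. The Payload and Exclusion structs and the `value` attribute are assumptions made for this example; real callers pass GRPC::Exclusion or Security::ProjectSecurityExclusion objects.

```ruby
# Stand-in value objects for this sketch only; the `value` attribute is an
# assumption about what an exclusion object exposes.
Payload = Struct.new(:id, :data, keyword_init: true)
Exclusion = Struct.new(:value, keyword_init: true)

payloads = [
  Payload.new(id: "blob-1", data: "api_key = example-not-a-real-secret"),
  Payload.new(id: "blob-2", data: "plain text without credentials")
]

exclusions = {
  raw_value: [Exclusion.new(value: "example-not-a-real-secret")],  # raw values to ignore
  rule: [Exclusion.new(value: "gitlab_personal_access_token")]     # rule IDs to skip
}

# Keyword arguments in the shape documented above.
scan_options = {
  timeout: 60,           # whole scan, in seconds
  payload_timeout: 0.5,  # per payload; fractional seconds are accepted
  exclusions: exclusions,
  tags: ["gitlab_blocking"]
}

# Hypothetical pre-flight check: every payload exposes `id` and `data`.
payloads_valid = payloads.all? do |payload|
  payload.respond_to?(:id) && payload.respond_to?(:data)
end

# With the gem loaded and a Scanner instance available, the call would be:
#   scanner.secrets_scan(payloads, **scan_options)
```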
NOTE: Running the scan in fork mode (subprocess: true) primarily aims to reduce the memory consumption of the scan by offloading regex operations on large payloads to subprocesses. It does not guarantee lower overall latency: for smaller payloads, the overhead of forking a new process adds to the latency of the scan. For more on subprocess-based execution, see gitlab.com/gitlab-org/gitlab/-/issues/430160.
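To make the trade-off in the note concrete, here is one possible way payloads could be grouped so that each subprocess receives at least a minimum cumulative size and the number of groups stays under a cap. The grouping strategy and the helper name are hypothetical; the gem's actual chunking logic may differ. The two limits mirror MIN_CHUNK_SIZE_PER_PROC_BYTES and MAX_PROCS_PER_REQUEST.

```ruby
# Hypothetical grouping helper, not the gem's implementation.
# Payloads are hashes with a :data string in this sketch.
def group_by_chunk_size(payloads, min_chunk_bytes:, max_groups:)
  groups = []
  current = []
  current_size = 0

  payloads.each do |payload|
    current << payload
    current_size += payload[:data].bytesize

    # Keep filling until the group is large enough to justify a fork.
    # Once only one slot is left, everything else goes into the last group.
    next if current_size < min_chunk_bytes || groups.length >= max_groups - 1

    groups << current
    current = []
    current_size = 0
  end

  # A leftover below the minimum size is merged into the previous group
  # rather than paying the fork overhead for a tiny chunk.
  if current.any?
    if current_size < min_chunk_bytes && groups.any?
      groups.last.concat(current)
    else
      groups << current
    end
  end

  groups
end

sample = %w[aaaa bb cc dddd e].map { |data| { data: data } }

# Small limits keep the example readable; the documented constants are
# 2_097_152 bytes and 5 processes.
roomy = group_by_chunk_size(sample, min_chunk_bytes: 4, max_groups: 5)
tight = group_by_chunk_size(sample, min_chunk_bytes: 4, max_groups: 2)
```

With room for five groups the sample splits into three chunks; with a cap of two, the first chunk closes as soon as it reaches the minimum and the rest accumulates in the second.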
Returns an instance of Gitlab::SecretDetection::Core::Response with the following structure:
status: One of the Core::Status values
results: [SecretDetection::Finding]
applied_exclusions: exclusions that were applied during the scan
# File 'lib/gitlab/secret_detection/core/scanner.rb', line 78
def secrets_scan(
  payloads,
  timeout: DEFAULT_SCAN_TIMEOUT_SECS,
  payload_timeout: DEFAULT_PAYLOAD_TIMEOUT_SECS,
  exclusions: {},
  tags: DEFAULT_PATTERN_MATCHER_TAGS,
  subprocess: RUN_IN_SUBPROCESS
)
  return Core::Response.new(status: Core::Status::INPUT_ERROR) unless validate_scan_input(payloads)

  # assign defaults since grpc passing zero timeout value to `Timeout.timeout(..)` makes it effectively useless.
  timeout = DEFAULT_SCAN_TIMEOUT_SECS unless timeout.positive?
  payload_timeout = DEFAULT_PAYLOAD_TIMEOUT_SECS unless payload_timeout.positive?
  tags = DEFAULT_PATTERN_MATCHER_TAGS if tags.empty?

  Timeout.timeout(timeout) do
    keyword_matcher = build_keyword_matcher(tags:)

    matched_payloads = filter_by_keywords(keyword_matcher, payloads)

    next Core::Response.new(status: Core::Status::NOT_FOUND) if matched_payloads.empty?

    # the pattern matcher will filter rules by tags so we use the filtered rule list
    pattern_matcher, active_rules = build_pattern_matcher(tags:)

    scan_args = {
      payloads: matched_payloads,
      payload_timeout:,
      pattern_matcher:,
      exclusions:,
      rules: active_rules
    }.freeze

    logger.info(
      message: "Scan input parameters for running Secret Detection scan",
      timeout:,
      payload_timeout:,
      given_total_payloads: payloads.length,
      scannable_payloads_post_keyword_filter: matched_payloads.length,
      tags:,
      run_in_subprocess: subprocess,
      given_exclusions: format_exclusions_hash(exclusions)
    )

    secrets, applied_exclusions = subprocess ? run_scan_within_subprocess(**scan_args) : run_scan(**scan_args)

    scan_status = overall_scan_status(secrets)

    logger.info(
      message: "Secret Detection scan completed with #{secrets.length} secrets detected in the given payloads",
      detected_secrets_metadata: (secrets),
      applied_exclusions: format_exclusions_arr(applied_exclusions)
    )

    Core::Response.new(status: scan_status, results: secrets, applied_exclusions:)
  end
rescue Timeout::Error => e
  logger.error "Secret detection operation timed out: #{e}"

  Core::Response.new(status: Core::Status::SCAN_TIMEOUT)
end
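The reason the method replaces non-positive timeouts with the defaults can be seen in isolation: Ruby's Timeout.timeout treats a value of 0 as "no limit". The sketch below uses only the standard library; the helper names and the returned status symbols are placeholders chosen for this example, not the gem's Core::Status values.

```ruby
require "timeout"

# Mirrors the fallback in the method body: a non-positive value selects
# the default instead of being passed to Timeout.timeout.
def effective_timeout(requested, default)
  requested.positive? ? requested : default
end

# Runs the block under a time limit and maps a timeout to a status symbol.
def run_with_limit(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :scan_timeout
end

# A zero limit disables the timeout entirely, so the block runs to completion.
unbounded = run_with_limit(0) { sleep 0.05; :finished }

# A positive limit shorter than the work interrupts the block.
bounded = run_with_limit(0.05) { sleep 1; :finished }

# Zero (for example, an unset gRPC field) falls back to the default.
fallback = effective_timeout(0, 180)
```

Without the fallback, a zero value arriving over gRPC would silently remove the time bound rather than apply the shortest possible one.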