Class: Archsight::Import::LicenseAnalyzer

Inherits:
Object
Defined in:
lib/archsight/import/license_analyzer.rb

Overview

License detection and dependency license scanning for repositories.

Detects the repository’s own license from LICENSE/COPYING files and SPDX headers, then scans dependency licenses using language-specific tools when available.

Examples:

analyzer = Archsight::Import::LicenseAnalyzer.new("/path/to/repo")
result = analyzer.analyze
result["license_spdx"]     # => "Apache-2.0"
result["dependency_risk"]  # => "low"

Constant Summary

SPDX_PATTERNS =

SPDX patterns, matched against LICENSE/COPYING file content. Order matters: more specific patterns come first.

[
  { id: "Apache-2.0",   re: /Apache License.*(?:Version 2|v2\.0)/mi },
  { id: "MIT",          re: /\bMIT License\b|Permission is hereby granted, free of charge/mi },
  { id: "BSD-3-Clause", re: /BSD 3-Clause|Redistribution and use.*three conditions/mi },
  { id: "BSD-2-Clause", re: /BSD 2-Clause|Simplified BSD/mi },
  { id: "GPL-3.0",      re: /GNU GENERAL PUBLIC LICENSE.*Version 3/mi },
  { id: "GPL-2.0",      re: /GNU GENERAL PUBLIC LICENSE.*Version 2/mi },
  { id: "AGPL-3.0",     re: /GNU AFFERO GENERAL PUBLIC LICENSE.*Version 3/mi },
  { id: "LGPL-3.0",     re: /GNU LESSER GENERAL PUBLIC LICENSE.*Version 3/mi },
  { id: "LGPL-2.1",     re: /GNU LESSER GENERAL PUBLIC LICENSE.*Version 2\.1/mi },
  { id: "MPL-2.0",      re: /Mozilla Public License.*(?:Version 2|v2\.0)/mi },
  { id: "ISC",          re: /\bISC License\b|ISC\s+license/mi },
  { id: "Unlicense",    re: /\bThis is free and unencumbered software\b/mi },
  { id: "CC0-1.0",      re: /Creative Commons.*CC0|CC0 1\.0 Universal/mi },
  { id: "BSL-1.0",      re: /Boost Software License/mi },
  { id: "BUSL-1.1",     re: /Business Source License.*1\.1/mi },
  { id: "EUPL-1.2",     re: /European Union Public Licen[cs]e.*1\.2/mi }
].freeze
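
This page does not show how the list is consumed. One plausible reading of "order matters" is a first-match scan, sketched below under that assumption. `first_matching_spdx` is a hypothetical helper, and `PATTERNS` is a two-entry excerpt of the constant above.

```ruby
# Two entries excerpted from SPDX_PATTERNS; the full list is above.
PATTERNS = [
  { id: "Apache-2.0", re: /Apache License.*(?:Version 2|v2\.0)/mi },
  { id: "MIT",        re: /\bMIT License\b|Permission is hereby granted, free of charge/mi }
].freeze

# Hypothetical helper: returns the id of the first pattern that matches,
# or nil when none do. Assumes a first-match-wins scan.
def first_matching_spdx(text, patterns = PATTERNS)
  entry = patterns.find { |pattern| pattern[:re].match?(text) }
  entry && entry[:id]
end

first_matching_spdx("Apache License\nVersion 2.0, January 2004") # => "Apache-2.0"
first_matching_spdx("All rights reserved.")                       # => nil
```

In Ruby the `/m` flag lets `.` match newlines, so `.*` can span the whole file. Under a first-match scan, that is why the position of an entry in the list can decide which id wins.
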
CATEGORIES =

License category classification

{
  "permissive" => %w[Apache-2.0 MIT BSD-3-Clause BSD-2-Clause ISC Unlicense CC0-1.0 BSL-1.0 0BSD Ruby],
  "copyleft" => %w[GPL-3.0 GPL-2.0 AGPL-3.0],
  "weak-copyleft" => %w[LGPL-3.0 LGPL-2.1 MPL-2.0 EUPL-1.2 CDDL-1.0],
  "source-available" => %w[BUSL-1.1],
  "proprietary" => %w[proprietary]
}.freeze
CATEGORY_LOOKUP =
CATEGORIES.each_with_object({}) do |(cat, ids), h|
  ids.each { |id| h[id] = cat }
end.freeze
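
As an illustration of the inversion above, the sketch below rebuilds the lookup from a shortened copy of `CATEGORIES` and queries it. `category_for` and its `"unknown"` fallback are assumptions made for this example, not part of the documented API.

```ruby
# Shortened copy of CATEGORIES for illustration.
CATEGORIES = {
  "permissive" => %w[Apache-2.0 MIT],
  "copyleft" => %w[GPL-3.0],
  "weak-copyleft" => %w[MPL-2.0]
}.freeze

# Invert category => ids into id => category, as in the constant above.
CATEGORY_LOOKUP = CATEGORIES.each_with_object({}) do |(cat, ids), h|
  ids.each { |id| h[id] = cat }
end.freeze

# Hypothetical helper; the "unknown" fallback is an assumption.
def category_for(spdx_id)
  CATEGORY_LOOKUP.fetch(spdx_id, "unknown")
end

category_for("MIT")      # => "permissive"
category_for("MPL-2.0")  # => "weak-copyleft"
category_for("WTFPL")    # => "unknown"
```
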
PROPRIETARY_RE =

Proprietary / copyright patterns — matched against the trimmed string

/
  \bcopyright\b | \bproprietary\b | \bUNLICENSED\b |
  \binternal\b |
  \ACustom:\s |
  \b[a-z0-9-]+\.(com|io|de|net|org|cloud)\b |
  \(c\)\s
/xi
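
To make the alternatives concrete, the sketch below copies the pattern and applies it to a few sample strings after trimming, as the description states. `proprietary_like?` is a hypothetical wrapper, and the sample strings are invented for illustration.

```ruby
# Copy of PROPRIETARY_RE from above (extended mode ignores the whitespace).
PROPRIETARY_RE = /
  \bcopyright\b | \bproprietary\b | \bUNLICENSED\b |
  \binternal\b |
  \ACustom:\s |
  \b[a-z0-9-]+\.(com|io|de|net|org|cloud)\b |
  \(c\)\s
/xi

# Hypothetical wrapper: trims the value, then tests it against the pattern.
def proprietary_like?(value)
  PROPRIETARY_RE.match?(value.to_s.strip)
end

proprietary_like?("Copyright 2024 Example Corp")  # => true
proprietary_like?("  Custom: in-house terms")      # => true (\A matches only after trimming)
proprietary_like?("licensing.example.com/terms")   # => true (domain alternative)
proprietary_like?("Apache-2.0")                    # => false
```
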
CUSTOM_LICENSE_VALUES =

Custom non-SPDX values we accept

Set.new(%w[NOASSERTION proprietary unknown]).freeze
LICENSE_FILES =

License file names to search (in order of priority)

%w[
  LICENSE LICENSE.md LICENSE.txt
  LICENCE LICENCE.md LICENCE.txt
  COPYING COPYING.md COPYING.txt
].freeze
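
A minimal sketch of a priority-ordered search, assuming only the repository root is checked. `find_license_file` is a hypothetical helper; the actual lookup is not shown on this page.

```ruby
require "tmpdir"
require "fileutils"

# Copy of LICENSE_FILES from above.
LICENSE_FILES = %w[
  LICENSE LICENSE.md LICENSE.txt
  LICENCE LICENCE.md LICENCE.txt
  COPYING COPYING.md COPYING.txt
].freeze

# Hypothetical helper: returns the first candidate name that exists as a
# regular file in repo_path, or nil when none exists.
def find_license_file(repo_path, candidates = LICENSE_FILES)
  candidates.find { |name| File.file?(File.join(repo_path, name)) }
end

Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "COPYING"))
  FileUtils.touch(File.join(dir, "LICENSE.md"))
  find_license_file(dir) # => "LICENSE.md" (listed before COPYING)
end
```
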
ECOSYSTEM_MANIFESTS =

Manifest files that indicate an ecosystem

{
  "go" => %w[go.mod],
  "python" => %w[requirements.txt setup.py pyproject.toml Pipfile],
  "ruby" => %w[Gemfile Gemfile.lock],
  "java" => %w[pom.xml build.gradle build.gradle.kts],
  "nodejs" => %w[package.json],
  "rust" => %w[Cargo.toml]
}.freeze
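
The following sketch shows one way a manifest map like this could be used to detect ecosystems. It assumes manifests are looked up in the repository root only; `ecosystems_present` is a hypothetical helper, and the map is a shortened copy of the constant above.

```ruby
require "tmpdir"
require "fileutils"

# Shortened copy of ECOSYSTEM_MANIFESTS for illustration.
ECOSYSTEM_MANIFESTS = {
  "go" => %w[go.mod],
  "ruby" => %w[Gemfile Gemfile.lock],
  "nodejs" => %w[package.json]
}.freeze

# Hypothetical helper: returns the ecosystem keys for which at least one
# manifest file exists directly under repo_path.
def ecosystems_present(repo_path, manifests = ECOSYSTEM_MANIFESTS)
  manifests.select do |_ecosystem, files|
    files.any? { |file| File.file?(File.join(repo_path, file)) }
  end.keys
end

Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "Gemfile.lock"))
  FileUtils.touch(File.join(dir, "package.json"))
  ecosystems_present(dir) # => ["ruby", "nodejs"]
end
```
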
LANGUAGE_TO_ECOSYSTEM =

Map scc language names to ecosystem keys. When scc data is available, only ecosystems matching detected languages are probed.

{
  "Go" => "go",
  "Python" => "python",
  "Ruby" => "ruby",
  "Java" => "java", "Kotlin" => "java", "Groovy" => "java", "Scala" => "java",
  "JavaScript" => "nodejs", "TypeScript" => "nodejs", "JSX" => "nodejs", "TSX" => "nodejs",
  "Rust" => "rust"
}.freeze
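
As a sketch of the filtering described above, the helper below maps scc language names to the set of ecosystems to probe. `ecosystems_for` is hypothetical, and falling back to every ecosystem when no language data is given is an assumption.

```ruby
# Copy of LANGUAGE_TO_ECOSYSTEM from above.
LANGUAGE_TO_ECOSYSTEM = {
  "Go" => "go",
  "Python" => "python",
  "Ruby" => "ruby",
  "Java" => "java", "Kotlin" => "java", "Groovy" => "java", "Scala" => "java",
  "JavaScript" => "nodejs", "TypeScript" => "nodejs", "JSX" => "nodejs", "TSX" => "nodejs",
  "Rust" => "rust"
}.freeze

# Hypothetical helper: unmapped languages are dropped and duplicates removed.
# Assumption: with no language data, every known ecosystem is a candidate.
def ecosystems_for(languages, mapping = LANGUAGE_TO_ECOSYSTEM)
  return mapping.values.uniq if languages.nil? || languages.empty?

  languages.filter_map { |language| mapping[language] }.uniq
end

ecosystems_for(%w[Kotlin Java TypeScript Haskell]) # => ["java", "nodejs"]
```
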
@@command_cache =

Cache of resolved command variants per ecosystem. After the first repo probes which tool works, all subsequent repos reuse it. Example shape: { "go" => ["go-licenses", …args], "nodejs" => :none, … }

{}
@@command_cache_mutex =

Mutex guarding access to @@command_cache.

Mutex.new
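
The probe-once, reuse-everywhere behavior can be sketched as a mutex-guarded memo. `ProbeCache` is a hypothetical stand-in that uses class-level instance variables instead of class variables; the real probing logic is not shown on this page.

```ruby
# Hypothetical stand-in for the class-level command cache.
class ProbeCache
  @cache = {}
  @mutex = Mutex.new

  class << self
    # Runs the block only the first time an ecosystem is requested;
    # later calls return the stored value (which may be :none).
    def fetch(ecosystem)
      @mutex.synchronize do
        @cache[ecosystem] = yield unless @cache.key?(ecosystem)
        @cache[ecosystem]
      end
    end

    # Clears all stored results, e.g. between tests.
    def reset!
      @mutex.synchronize { @cache = {} }
    end
  end
end

probes = 0
ProbeCache.fetch("nodejs") { probes += 1; :none }
ProbeCache.fetch("nodejs") { probes += 1; :none }
probes # => 1
```

Checking with `key?` rather than truthiness means a negative result is cached too. Holding the lock during the probe keeps the sketch simple, at the cost of serializing concurrent probes.
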

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(repo_path, options = {}) ⇒ LicenseAnalyzer

Returns a new instance of LicenseAnalyzer.

Parameters:

  • repo_path (String)

    path to the git repository

  • options (Hash) (defaults to: {})

    optional settings

Options Hash (options):

  • :languages (Array<String>)

    scc language names (e.g. ["Go", "Python"]). When provided, only ecosystems matching these languages are scanned for dependencies.



# File 'lib/archsight/import/license_analyzer.rb', line 103

def initialize(repo_path, options = {})
  @repo_path = repo_path
  @options = options
  @languages = options[:languages]
end

Class Method Details

.reset_command_cache! ⇒ Object

Reset the command cache (useful in tests)



# File 'lib/archsight/import/license_analyzer.rb', line 110

def self.reset_command_cache!
  @@command_cache_mutex.synchronize { @@command_cache = {} } # rubocop:disable Style/ClassVars
end

Instance Method Details

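#analyze ⇒ Object

Detects the repository license, scans dependencies, and returns the combined result as a Hash with String keys. The keys license_file, dependency_ecosystems, dependency_licenses, and dependency_license_counts are set only when the corresponding data is present.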



# File 'lib/archsight/import/license_analyzer.rb', line 114

def analyze
  repo_license = detect_repo_license
  dep_data = scan_dependencies

  result = {}
  result["license_spdx"] = repo_license[:spdx]
  result["license_file"] = repo_license[:file] if repo_license[:file]
  result["license_category"] = repo_license[:category]

  result["dependency_count"] = dep_data[:count]
  result["dependency_ecosystems"] = dep_data[:ecosystems].join(",") if dep_data[:ecosystems].any?
  result["dependency_licenses"] = dep_data[:licenses].join(",") if dep_data[:licenses].any?
  result["dependency_copyleft"] = dep_data[:copyleft].to_s
  result["dependency_risk"] = dep_data[:risk]
  result["dependency_license_counts"] = dep_data[:license_counts] if dep_data[:license_counts].any?

  result
end
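
Because the list-valued fields are joined with commas and the copyleft flag is stringified, callers need to decode them. The sketch below does so on a hand-written sample hash. The values are illustrative only, `split_list` is a hypothetical helper, and treating `"true"` as the copyleft marker assumes the underlying value is a boolean.

```ruby
# Illustrative sample only; not output from a real repository.
result = {
  "license_spdx" => "Apache-2.0",
  "license_category" => "permissive",
  "dependency_ecosystems" => "go,nodejs",
  "dependency_copyleft" => "false",
  "dependency_risk" => "low"
}

# Hypothetical helper: optional keys may be absent, so default to "".
def split_list(result, key)
  result.fetch(key, "").split(",")
end

ecosystems = split_list(result, "dependency_ecosystems") # => ["go", "nodejs"]
licenses   = split_list(result, "dependency_licenses")   # => [] (key absent)
copyleft   = result["dependency_copyleft"] == "true"     # => false
```
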