Class: OllamaAgent::Indexing::RepoScanner

Inherits:
Object
  • Object
Defined in:
lib/ollama_agent/indexing/repo_scanner.rb

Overview

Scans a repository and returns a file inventory with language tags. Language detection is extension-based (no external gems required). Used by ContextPacker to select relevant files for the agent context.
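
As a rough illustration of extension-based detection, the sketch below inverts a table shaped like LANGUAGE_EXTENSIONS into a lookup hash. The helper names (build_sample_ext_map, detect_sample_language) and the bare-filename fallback for "Dockerfile" are assumptions for illustration; the class's own build_ext_map and detect_language are not shown on this page and may behave differently.

```ruby
# Hypothetical helper: invert { lang => [exts] } into { ext => lang }.
# When an extension is listed under two languages (".h" below), the
# language that appears later in the table overwrites the earlier one.
def build_sample_ext_map(table)
  table.each_with_object({}) do |(lang, exts), map|
    exts.each { |ext| map[ext] = lang }
  end
end

# Hypothetical helper: look up by extension first, then fall back to the
# bare filename so that entries such as "Dockerfile" can match.
def detect_sample_language(path, ext_map)
  ext_map[File.extname(path).downcase] || ext_map[File.basename(path)]
end

# Trimmed-down table in the same shape as LANGUAGE_EXTENSIONS.
table = {
  ruby: %w[.rb .rake .gemspec],
  cpp: %w[.cpp .hpp .h],
  c: %w[.c .h],
  dockerfile: %w[Dockerfile]
}
ext_map = build_sample_ext_map(table)

detect_sample_language("lib/foo.rb", ext_map)      # => :ruby
detect_sample_language("Dockerfile", ext_map)      # => :dockerfile
detect_sample_language("include/util.h", ext_map)  # => :c (listed after :cpp)
detect_sample_language("README", ext_map)          # => nil
```

Under this inversion scheme, the ".h" extension shared by cpp and c resolves to whichever language is listed last, which is worth keeping in mind when filtering a scan by language.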

Defined Under Namespace

Classes: FileEntry

Constant Summary

LANGUAGE_EXTENSIONS =
{
  ruby: %w[.rb .rake .gemspec],
  javascript: %w[.js .jsx .mjs .cjs],
  typescript: %w[.ts .tsx],
  python: %w[.py .pyw],
  go: %w[.go],
  rust: %w[.rs],
  java: %w[.java],
  kotlin: %w[.kt .kts],
  swift: %w[.swift],
  cpp: %w[.cpp .cc .cxx .hpp .hh .h],
  c: %w[.c .h],
  csharp: %w[.cs],
  php: %w[.php],
  elixir: %w[.ex .exs],
  erlang: %w[.erl .hrl],
  haskell: %w[.hs .lhs],
  scala: %w[.scala],
  clojure: %w[.clj .cljs .cljc],
  shell: %w[.sh .bash .zsh .fish],
  yaml: %w[.yml .yaml],
  json: %w[.json .jsonc],
  toml: %w[.toml],
  markdown: %w[.md .mdx .markdown],
  html: %w[.html .htm .xhtml],
  css: %w[.css .scss .sass .less],
  sql: %w[.sql],
  dockerfile: %w[Dockerfile],
  terraform: %w[.tf .tfvars],
  proto: %w[.proto]
}.freeze
IGNORED_DIRS =
%w[
  .git .svn .hg .bzr
  node_modules vendor .bundle
  tmp log coverage .nyc_output dist build out target
  __pycache__ .pytest_cache .mypy_cache .tox venv env .venv
  .ollama_agent .idea .vscode .cursor
].freeze
IGNORED_FILES =
%w[
  Gemfile.lock yarn.lock package-lock.json pnpm-lock.yaml
  .DS_Store Thumbs.db *.min.js *.min.css
].freeze
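
IGNORED_FILES mixes literal names with glob patterns such as *.min.js. One way such a list could be matched against a basename is File.fnmatch, sketched below. The helper sample_ignored_file? is hypothetical; the class's own ignored_file? is not shown on this page.

```ruby
# Hypothetical helper: true when the basename matches any ignore pattern.
# File.fnmatch treats a pattern without wildcards as an exact match, so
# literal names and globs can live in the same list. FNM_DOTMATCH lets
# "*" match names that start with a dot.
def sample_ignored_file?(basename, patterns)
  patterns.any? { |pattern| File.fnmatch(pattern, basename, File::FNM_DOTMATCH) }
end

patterns = %w[Gemfile.lock .DS_Store *.min.js *.min.css]

sample_ignored_file?("Gemfile.lock", patterns)  # => true  (literal match)
sample_ignored_file?("app.min.js", patterns)    # => true  (glob match)
sample_ignored_file?("app.js", patterns)        # => false
```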

Instance Method Summary

Constructor Details

#initialize(root:, exclude_dirs: nil, max_file_size: 1_048_576) ⇒ RepoScanner

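Returns a new instance of RepoScanner. The root is expanded to an absolute path, exclude_dirs is combined with IGNORED_DIRS, and max_file_size is a byte limit (1 MiB by default) above which #scan skips a file.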



# File 'lib/ollama_agent/indexing/repo_scanner.rb', line 58

def initialize(root:, exclude_dirs: nil, max_file_size: 1_048_576)
  @root            = File.expand_path(root)
  @exclude_dirs    = (exclude_dirs || []) + IGNORED_DIRS
  @max_file_size   = max_file_size
  @ext_map         = build_ext_map
end

Instance Method Details

#recently_modified(n: 20) ⇒ Object

Returns the n most recently modified files, newest first. Each call performs a full scan.



# File 'lib/ollama_agent/indexing/repo_scanner.rb', line 117

def recently_modified(n: 20)
  scan.max_by(n, &:modified_at)
end
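
The selection relies on Enumerable#max_by with a count argument, which returns the n largest elements in descending order. A minimal sketch with a stand-in struct (not the real FileEntry; the helper name newest_entries is hypothetical):

```ruby
# Hypothetical helper mirroring the max_by(n) call above.
def newest_entries(entries, n)
  entries.max_by(n, &:modified_at)
end

# Stand-in for FileEntry with only the fields this sketch needs.
entry = Struct.new(:relative_path, :modified_at)
now = Time.now
entries = [
  entry.new("a.rb", now - 300),  # modified 5 minutes ago
  entry.new("b.rb", now - 10),   # modified 10 seconds ago
  entry.new("c.rb", now - 60)    # modified 1 minute ago
]

newest_entries(entries, 2).map(&:relative_path)  # => ["b.rb", "c.rb"]
```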

#scan(languages: nil) ⇒ Array<FileEntry>

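Scans the repository and returns FileEntry objects. Ignored directories are pruned from the walk; ignored files, files larger than max_file_size, and paths that raise an error while being inspected are skipped.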

Parameters:

  • languages (Array<Symbol>, nil) (defaults to: nil)

    filter to specific languages

Returns:

  • (Array<FileEntry>)

    entries for the matching files, sorted by relative path



# File 'lib/ollama_agent/indexing/repo_scanner.rb', line 68

def scan(languages: nil)
  results = []

  Find.find(@root) do |path|
    basename = File.basename(path)

    if File.directory?(path)
      Find.prune if prune_dir?(path, basename)
      next
    end

    next unless File.file?(path)
    next if ignored_file?(basename)

    size = File.size(path)
    next if size > @max_file_size

    lang = detect_language(path)
    next if languages && !languages.map(&:to_sym).include?(lang)

    rel = path.sub("#{@root}/", "")
    results << FileEntry.new(
      path: path,
      relative_path: rel,
      language: lang,
      size: size,
      modified_at: File.mtime(path)
    )
  rescue StandardError
    next
  end

  results.sort_by(&:relative_path)
end
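
The directory walk above depends on Find.find together with Find.prune. The sketch below reproduces only that pruning pattern against a temporary directory. The helper sample_walk and its basename-only ignore check are assumptions for illustration; the real prune_dir? is not shown on this page.

```ruby
require "find"
require "tmpdir"
require "fileutils"

# Hypothetical helper: walk root, skipping any directory whose basename
# is in ignored_dirs. Returns file paths relative to root, sorted.
def sample_walk(root, ignored_dirs)
  root = File.expand_path(root)
  found = []
  Find.find(root) do |path|
    if File.directory?(path)
      # Find.prune skips this directory and everything beneath it.
      Find.prune if path != root && ignored_dirs.include?(File.basename(path))
      next
    end
    found << path.delete_prefix("#{root}/")
  end
  found.sort
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "lib"))
  FileUtils.mkdir_p(File.join(dir, "node_modules/pkg"))
  File.write(File.join(dir, "lib/app.rb"), "puts 1\n")
  File.write(File.join(dir, "node_modules/pkg/index.js"), "1\n")

  p sample_walk(dir, %w[node_modules .git])  # prints ["lib/app.rb"]
end
```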

#stats ⇒ Object

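Summary statistics about the repository: total file count, total bytes, the root path, and per-language file and byte counts. Each call performs a full scan.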



# File 'lib/ollama_agent/indexing/repo_scanner.rb', line 104

def stats
  files = scan
  by_lang = files.group_by(&:language)

  {
    total_files: files.size,
    total_bytes: files.sum(&:size),
    root: @root,
    languages: by_lang.transform_values { |fs| { files: fs.size, bytes: fs.sum(&:size) } }
  }
end
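
To make the shape of the returned hash concrete, the sketch below applies the same group_by and transform_values aggregation to a few stand-in entries. The struct and the helper sample_language_stats are hypothetical and carry only the fields the aggregation reads.

```ruby
# Hypothetical helper: per-language file and byte counts, as in #stats.
def sample_language_stats(entries)
  entries.group_by(&:language).transform_values do |files|
    { files: files.size, bytes: files.sum(&:size) }
  end
end

# Stand-in for FileEntry (language and size only).
entry = Struct.new(:language, :size)
entries = [
  entry.new(:ruby, 100),
  entry.new(:ruby, 50),
  entry.new(:yaml, 20)
]

sample_language_stats(entries)
# => { ruby: { files: 2, bytes: 150 }, yaml: { files: 1, bytes: 20 } }
```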