Module: DurableHuggingfaceHub::Cache
Defined in: lib/durable_huggingface_hub/cache.rb
Defined Under Namespace
Classes: DeleteCacheStrategy
Class Method Summary
- .cached_assets_path(repo_id:, repo_type: "model", cache_dir: nil) ⇒ Pathname?
  Get the path to cached assets for a repository.
- .get_refs_for_commit(repo_dir, commit_hash) ⇒ Array<String>
  Gets refs (branches/tags) that point to a specific commit.
- .scan_cache_dir(cache_dir: nil) ⇒ DurableHuggingfaceHub::Types::HFCacheInfo
  Scans the cache directory and returns comprehensive information about cached content.
- .scan_file(file_path, commit_hash) ⇒ DurableHuggingfaceHub::Types::CachedFileInfo
  Scans a single file and returns file information.
- .scan_repository(repo_dir) ⇒ DurableHuggingfaceHub::Types::CachedRepoInfo?
  Scans a single repository directory and returns repository information.
- .scan_revision(repo_dir, revision_dir, repo_type) ⇒ DurableHuggingfaceHub::Types::CachedRevisionInfo?
  Scans a revision directory and returns revision information.
Class Method Details
.cached_assets_path(repo_id:, repo_type: "model", cache_dir: nil) ⇒ Pathname?
Get the path to cached assets for a repository.
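This utility method validates the repository ID and type, builds the expected directory name ({repo_type}s--{namespace}--{name}, or {repo_type}s--{name} when the ID has no namespace), and returns that path if it exists under the cache directory. It returns nil when the directory is absent.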
    # File 'lib/durable_huggingface_hub/cache.rb', line 258

    def self.cached_assets_path(repo_id:, repo_type: "model", cache_dir: nil)
      DurableHuggingfaceHub::Utils::Validators.validate_repo_id(repo_id)
      repo_type = DurableHuggingfaceHub::Utils::Validators.validate_repo_type(repo_type)
      cache_dir = FileDownload.resolve_cache_dir(cache_dir)

      # Build the expected repository directory name
      repo_id_parts = repo_id.split("/")
      if repo_id_parts.length == 2
        folder_name = "#{repo_type}s--#{repo_id_parts[0]}--#{repo_id_parts[1]}"
      else
        folder_name = "#{repo_type}s--#{repo_id}"
      end

      repo_path = cache_dir.join(folder_name)
      repo_path.exist? ? repo_path : nil
    end
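The directory-naming rule used by cached_assets_path can be tried in isolation. The following is a minimal sketch using plain Ruby only; cache_folder_name is a hypothetical helper, not part of the library, and it skips the validation and cache-directory resolution that the real method performs. The repository IDs are made up for illustration.

```ruby
# Hypothetical helper (not part of DurableHuggingfaceHub): mirrors the
# folder-name rule shown in cached_assets_path, without any validation.
def cache_folder_name(repo_id, repo_type = "model")
  parts = repo_id.split("/")
  if parts.length == 2
    # namespace/name becomes {repo_type}s--{namespace}--{name}
    "#{repo_type}s--#{parts[0]}--#{parts[1]}"
  else
    # IDs without a namespace are used as-is
    "#{repo_type}s--#{repo_id}"
  end
end

puts cache_folder_name("acme/tiny-model")   # models--acme--tiny-model
puts cache_folder_name("squad", "dataset")  # datasets--squad
```

Joining the result onto the resolved cache directory gives the path that cached_assets_path checks for existence.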
.get_refs_for_commit(repo_dir, commit_hash) ⇒ Array<String>
Gets refs (branches/tags) that point to a specific commit.
    # File 'lib/durable_huggingface_hub/cache.rb', line 218

    def self.get_refs_for_commit(repo_dir, commit_hash)
      refs = []
      refs_dir = repo_dir.join("refs")

      return refs unless refs_dir.exist?

      refs_dir.glob("**/*") do |ref_file|
        next if ref_file.directory?

        begin
          ref_commit = ref_file.read.strip
          if ref_commit == commit_hash
            # Get relative path from refs directory
            rel_path = ref_file.relative_path_from(refs_dir).to_s
            refs << rel_path
          end
        rescue
          # Skip unreadable ref files
          next
        end
      end

      refs
    end
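To see what the ref lookup returns, it can help to run the same idea against a throwaway directory. This sketch assumes the layout implied by the method (a refs directory whose files each contain a commit hash); refs_pointing_at is a hypothetical stand-alone helper, and unlike the method above it sorts its result and does not rescue read errors. The hashes and names are invented.

```ruby
require "pathname"
require "tmpdir"
require "fileutils"

# Hypothetical stand-alone version of the lookup: collect every file under
# refs/ whose stripped content equals the given commit hash.
def refs_pointing_at(repo_dir, commit_hash)
  refs_dir = repo_dir.join("refs")
  return [] unless refs_dir.exist?

  refs_dir.glob("**/*")
          .reject(&:directory?)
          .select { |ref_file| ref_file.read.strip == commit_hash }
          .map { |ref_file| ref_file.relative_path_from(refs_dir).to_s }
          .sort
end

Dir.mktmpdir do |tmp|
  repo = Pathname.new(tmp).join("models--acme--tiny-model")
  FileUtils.mkdir_p(repo.join("refs", "pr"))
  repo.join("refs", "main").write("abc123\n")
  repo.join("refs", "pr", "1").write("abc123\n")
  repo.join("refs", "dev").write("def456\n")

  p refs_pointing_at(repo, "abc123")  # ["main", "pr/1"]
end
```

Nested ref files keep their relative path, so a ref stored at refs/pr/1 is reported as "pr/1".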
.scan_cache_dir(cache_dir: nil) ⇒ DurableHuggingfaceHub::Types::HFCacheInfo
Scans the cache directory and returns comprehensive information about cached content.
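This method walks each child directory of the cache, scans it as a repository, and totals the sizes of the repositories found. Directories that cannot be parsed as repositories or that contain no revisions are skipped. If the cache directory does not exist, the result is an empty HFCacheInfo with no repos and a size of 0.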
    # File 'lib/durable_huggingface_hub/cache.rb', line 27

    def self.scan_cache_dir(cache_dir: nil)
      cache_dir = FileDownload.resolve_cache_dir(cache_dir)

      unless cache_dir.exist?
        # Return empty cache info if directory doesn't exist
        return DurableHuggingfaceHub::Types::HFCacheInfo.new(
          cache_dir: cache_dir,
          repos: [],
          size: 0
        )
      end

      repos = []
      total_size = 0

      # Scan each repository directory
      cache_dir.each_child do |repo_dir|
        next unless repo_dir.directory?

        repo_info = scan_repository(repo_dir)
        next unless repo_info

        repos << repo_info
        total_size += repo_info.size
      end

      DurableHuggingfaceHub::Types::HFCacheInfo.new(
        cache_dir: cache_dir,
        repos: repos,
        size: total_size
      )
    end
.scan_file(file_path, commit_hash) ⇒ DurableHuggingfaceHub::Types::CachedFileInfo
Scans a single file and returns file information.
    # File 'lib/durable_huggingface_hub/cache.rb', line 172

    def self.scan_file(file_path, commit_hash)
      # Get file stats, handling broken symlinks
      stat = begin
        file_path.stat
      rescue Errno::ENOENT
        # For broken symlinks, use lstat to get link info
        file_path.lstat
      end

      # Try to get ETag from blob metadata if this is a symlink
      etag = nil
      if file_path.symlink?
        begin
          target_path = file_path.readlink
          if target_path.absolute?
            # This should point to a blob file
            blob_name = target_path.basename.to_s
            etag = blob_name if blob_name.match?(/^[a-f0-9]{40,}$/) # SHA-like hash
          end
        rescue Errno::ENOENT
          # Broken symlink, no ETag available
          etag = nil
        end
      else
        # For direct files, we might not have ETag info
        etag = nil
      end

      # Build attributes hash
      attrs = {
        file_path: file_path,
        size: stat.size,
        etag: etag,
        commit_hash: commit_hash,
        last_accessed: stat.atime,
        last_modified: stat.mtime
      }

      DurableHuggingfaceHub::Types::CachedFileInfo.new(attrs)
    end
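The ETag step in scan_file relies on the symlink target's file name looking like a hash. The sketch below isolates that step under a few assumptions: etag_from_symlink is a hypothetical helper, the directory and file names are invented, the platform supports symlinks, and the pattern is anchored with \A and \z rather than the ^ and $ used in the listing. Like the listing, it only considers absolute link targets.

```ruby
require "pathname"
require "tmpdir"
require "fileutils"

# Hypothetical helper: returns the name of the blob a symlink points to when
# that name is a lowercase hex string of at least 40 characters, else nil.
def etag_from_symlink(path)
  return nil unless path.symlink?

  target = path.readlink
  return nil unless target.absolute? # same restriction as scan_file

  name = target.basename.to_s
  name.match?(/\A[a-f0-9]{40,}\z/) ? name : nil
end

Dir.mktmpdir do |tmp|
  root = Pathname.new(tmp)
  blob = root.join("blobs", "a" * 40)
  snapshot = root.join("snapshots", "abc123")
  FileUtils.mkdir_p([blob.dirname, snapshot])
  blob.write("weights")
  File.symlink(blob.to_s, snapshot.join("model.bin").to_s)

  puts etag_from_symlink(snapshot.join("model.bin")) # the 40-character blob name
  p etag_from_symlink(blob)                          # nil, a regular file
end
```

A relative link target yields nil under this rule, so a cache populated with relative symlinks would report no ETag for its files.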
.scan_repository(repo_dir) ⇒ DurableHuggingfaceHub::Types::CachedRepoInfo?
Scans a single repository directory and returns repository information.
    # File 'lib/durable_huggingface_hub/cache.rb', line 64

    def self.scan_repository(repo_dir)
      # Parse repo_id and repo_type from directory name
      # Format: {repo_type}s--{namespace}--{name} or {repo_type}s--{name}
      dir_name = repo_dir.basename.to_s
      match = dir_name.match(/^(\w+)s--(.+)$/)

      return nil unless match

      repo_type = match[1] # "model", "dataset", or "space"
      repo_id_part = match[2]

      # Convert back to repo_id format (handle both namespace/name and just name)
      if repo_id_part.include?("--")
        repo_id = repo_id_part.gsub("--", "/")
      else
        repo_id = repo_id_part
      end

      revisions = []
      total_size = 0
      last_accessed = nil
      last_modified = nil

      # Scan snapshots directory
      snapshots_dir = repo_dir.join("snapshots")
      if snapshots_dir.exist?
        snapshots_dir.each_child do |revision_dir|
          next unless revision_dir.directory?

          revision_info = scan_revision(repo_dir, revision_dir, repo_type)
          next unless revision_info

          revisions << revision_info
          total_size += revision_info.size

          # Track last accessed/modified times
          if revision_info.last_modified
            last_modified = [last_modified, revision_info.last_modified].compact.max
          end

          revision_info.files.each do |file_info|
            if file_info.last_accessed
              last_accessed = [last_accessed, file_info.last_accessed].compact.max
            end
          end
        end
      end

      return nil if revisions.empty?

      DurableHuggingfaceHub::Types::CachedRepoInfo.new(
        repo_id: repo_id,
        repo_type: repo_type,
        revisions: revisions,
        size: total_size,
        last_accessed: last_accessed,
        last_modified: last_modified
      )
    end
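The directory-name parsing at the top of scan_repository is the inverse of the folder-name rule, and it can be exercised on its own. This is a sketch under the assumption that only the name parsing matters; parse_repo_dir_name is a hypothetical helper that reuses the regular expression and the "--" to "/" substitution from the listing, and the directory names are invented.

```ruby
# Hypothetical helper: splits a cache directory name into repo type and
# repo ID, using the same pattern as scan_repository.
def parse_repo_dir_name(dir_name)
  match = dir_name.match(/^(\w+)s--(.+)$/)
  return nil unless match

  # Every "--" in the remainder becomes "/", as in the listing
  { repo_type: match[1], repo_id: match[2].gsub("--", "/") }
end

info = parse_repo_dir_name("models--acme--tiny-model")
puts info[:repo_type] # model
puts info[:repo_id]   # acme/tiny-model

puts parse_repo_dir_name("datasets--squad")[:repo_id] # squad
p parse_repo_dir_name(".locks")                       # nil, not a repository directory
```

Because the substitution is global, a directory name whose repository part contains more than one "--" would be turned into an ID with more than one slash.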
.scan_revision(repo_dir, revision_dir, repo_type) ⇒ DurableHuggingfaceHub::Types::CachedRevisionInfo?
Scans a revision directory and returns revision information.
    # File 'lib/durable_huggingface_hub/cache.rb', line 129

    def self.scan_revision(repo_dir, revision_dir, repo_type)
      commit_hash = revision_dir.basename.to_s

      files = []
      total_size = 0
      last_modified = nil

      # Get refs pointing to this commit
      refs = get_refs_for_commit(repo_dir, commit_hash)

      # Scan all files in the revision
      revision_dir.glob("**/*") do |file_path|
        next if file_path.directory?

        begin
          file_info = scan_file(file_path, commit_hash)
          files << file_info
          total_size += file_info.size

          if file_info.last_modified
            last_modified = [last_modified, file_info.last_modified].compact.max
          end
        rescue => e
          # Skip files that can't be analyzed
          next
        end
      end

      return nil if files.empty?

      DurableHuggingfaceHub::Types::CachedRevisionInfo.new(
        commit_hash: commit_hash,
        refs: refs,
        files: files,
        size: total_size,
        last_modified: last_modified
      )
    end
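The aggregation in scan_revision sums file sizes and keeps a running maximum of modification times with the [a, b].compact.max idiom, which tolerates the initial nil. The sketch below shows that pattern on a temporary directory; summarize_dir is a hypothetical helper that reads stat data directly instead of calling scan_file, and the file names are invented.

```ruby
require "pathname"
require "tmpdir"
require "fileutils"

# Hypothetical helper: total size and newest mtime of all files under dir.
def summarize_dir(dir)
  total_size = 0
  last_modified = nil

  dir.glob("**/*") do |path|
    next if path.directory?

    stat = path.stat
    total_size += stat.size
    # compact drops the initial nil so max only compares real Time values
    last_modified = [last_modified, stat.mtime].compact.max
  end

  { size: total_size, last_modified: last_modified }
end

Dir.mktmpdir do |tmp|
  dir = Pathname.new(tmp)
  FileUtils.mkdir_p(dir.join("nested"))
  dir.join("config.json").write("{}")          # 2 bytes
  dir.join("nested", "vocab.txt").write("abc") # 3 bytes

  puts summarize_dir(dir)[:size] # 5
end
```

For an empty directory the helper reports a size of 0 and a nil timestamp, whereas scan_revision returns nil for a revision with no files.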