Module: DurableHuggingfaceHub::Cache

Defined in:
lib/durable_huggingface_hub/cache.rb

Defined Under Namespace

Classes: DeleteCacheStrategy

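Class Method Summary

  • .cached_assets_path ⇒ Pathname?
  • .get_refs_for_commit ⇒ Array<String>
  • .scan_cache_dir ⇒ DurableHuggingfaceHub::Types::HFCacheInfo
  • .scan_file ⇒ DurableHuggingfaceHub::Types::CachedFileInfo
  • .scan_repository ⇒ DurableHuggingfaceHub::Types::CachedRepoInfo?
  • .scan_revision ⇒ DurableHuggingfaceHub::Types::CachedRevisionInfo?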

Class Method Details

.cached_assets_path(repo_id:, repo_type: "model", cache_dir: nil) ⇒ Pathname?

Get the path to cached assets for a repository.

This utility function helps locate cached files and directories for a specific repository.

Examples:

Get cache path for a model

cache_path = DurableHuggingfaceHub::Cache.cached_assets_path(
  repo_id: "bert-base-uncased",
  repo_type: "model"
)
puts cache_path # e.g. /home/user/.cache/huggingface/hub/models--bert-base-uncased (nil if not cached)

Parameters:

  • repo_id (String)

    Repository ID

  • repo_type (String) (defaults to: "model")

    Type of repository (“model”, “dataset”, or “space”)

  • cache_dir (String, Pathname, nil) (defaults to: nil)

    Custom cache directory. If nil, uses the default cache directory.

Returns:

  • (Pathname, nil)

    Path to the repository’s cache directory, or nil if not found

# File 'lib/durable_huggingface_hub/cache.rb', line 258

def self.cached_assets_path(repo_id:, repo_type: "model", cache_dir: nil)
  DurableHuggingfaceHub::Utils::Validators.validate_repo_id(repo_id)
  repo_type = DurableHuggingfaceHub::Utils::Validators.validate_repo_type(repo_type)

  cache_dir = FileDownload.resolve_cache_dir(cache_dir)

  # Build the expected repository directory name
  repo_id_parts = repo_id.split("/")
  if repo_id_parts.length == 2
    folder_name = "#{repo_type}s--#{repo_id_parts[0]}--#{repo_id_parts[1]}"
  else
    folder_name = "#{repo_type}s--#{repo_id}"
  end

  repo_path = cache_dir.join(folder_name)
  repo_path.exist? ? repo_path : nil
end
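The folder-naming rule in the method body can be pulled out into a pure function. That makes it easy to predict where a repository will land in the cache without touching the filesystem. The sketch below mirrors that logic. `cache_folder_name` is a hypothetical helper and not part of the gem's API, and it skips the validation performed by `Utils::Validators`.

```ruby
# Hypothetical helper mirroring the folder-name logic of cached_assets_path.
# It performs no validation and never touches the filesystem.
def cache_folder_name(repo_id, repo_type = "model")
  parts = repo_id.split("/")
  if parts.length == 2
    # "namespace/name" becomes "<type>s--namespace--name"
    "#{repo_type}s--#{parts[0]}--#{parts[1]}"
  else
    # A bare name becomes "<type>s--name"
    "#{repo_type}s--#{repo_id}"
  end
end

puts cache_folder_name("bert-base-uncased")                # models--bert-base-uncased
puts cache_folder_name("google/flan-t5-small")             # models--google--flan-t5-small
puts cache_folder_name("my-org/my-dataset", "dataset")     # datasets--my-org--my-dataset
```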

.get_refs_for_commit(repo_dir, commit_hash) ⇒ Array<String>

Gets refs (branches/tags) that point to a specific commit.

Parameters:

  • repo_dir (Pathname)

    Repository directory

  • commit_hash (String)

    Commit hash to find refs for

Returns:

  • (Array<String>)

    Ref names, as paths relative to the repository's refs directory (for example "main"), whose stored commit hash equals commit_hash

# File 'lib/durable_huggingface_hub/cache.rb', line 218

def self.get_refs_for_commit(repo_dir, commit_hash)
  refs = []
  refs_dir = repo_dir.join("refs")

  return refs unless refs_dir.exist?

  refs_dir.glob("**/*") do |ref_file|
    next if ref_file.directory?

    begin
      ref_commit = ref_file.read.strip
      if ref_commit == commit_hash
        # Get relative path from refs directory
        rel_path = ref_file.relative_path_from(refs_dir).to_s
        refs << rel_path
      end
    rescue
      # Skip unreadable ref files
      next
    end
  end

  refs
end
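The refs directory maps ref names to commit hashes, with one small text file per ref. The self-contained sketch below recreates a tiny refs tree in a temporary directory and runs the same lookup. `refs_pointing_to` is a hypothetical stand-in for `get_refs_for_commit`, and the `main`, `release/v1`, and `dev` refs are an assumed layout chosen for illustration. Results are sorted here only to make the output stable.

```ruby
require "pathname"
require "tmpdir"

# Hypothetical helper with the same logic as get_refs_for_commit: each file
# under <repo_dir>/refs stores a commit hash, and the file's path relative to
# refs/ is the ref name.
def refs_pointing_to(repo_dir, commit_hash)
  refs_dir = repo_dir.join("refs")
  return [] unless refs_dir.exist?

  refs = refs_dir.glob("**/*").filter_map do |ref_file|
    next if ref_file.directory?
    next unless ref_file.read.strip == commit_hash

    ref_file.relative_path_from(refs_dir).to_s
  end
  refs.sort
end

found_commit = found_other = nil

Dir.mktmpdir do |tmp|
  repo_dir = Pathname(tmp).join("models--my-org--demo")
  commit = "a" * 40
  other = "b" * 40

  # Assumed layout for illustration: a flat ref and a nested ref share a commit
  repo_dir.join("refs", "release").mkpath
  repo_dir.join("refs", "main").write("#{commit}\n")
  repo_dir.join("refs", "release", "v1").write(commit)
  repo_dir.join("refs", "dev").write(other)

  found_commit = refs_pointing_to(repo_dir, commit)
  found_other = refs_pointing_to(repo_dir, other)
end

p found_commit # ["main", "release/v1"]
p found_other  # ["dev"]
```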

.scan_cache_dir(cache_dir: nil) ⇒ DurableHuggingfaceHub::Types::HFCacheInfo

Scans the cache directory and returns comprehensive information about cached content.

This method walks every repository directory in the cache and collects information about its revisions and files. If the cache directory does not exist, it returns an empty HFCacheInfo with no repositories and a size of 0.

Examples:

Scan default cache directory

cache_info = DurableHuggingfaceHub::Cache.scan_cache_dir

Scan custom cache directory

cache_info = DurableHuggingfaceHub::Cache.scan_cache_dir(cache_dir: "/custom/cache")

Parameters:

  • cache_dir (String, Pathname, nil) (defaults to: nil)

    Custom cache directory path. If nil, uses the default cache directory.

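Returns:

  • (DurableHuggingfaceHub::Types::HFCacheInfo)

    Cache information: the resolved cache directory, the cached repositories, and their total size in bytes
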
Raises:

  • (ArgumentError)

    If cache_dir is invalid

# File 'lib/durable_huggingface_hub/cache.rb', line 27

def self.scan_cache_dir(cache_dir: nil)
  cache_dir = FileDownload.resolve_cache_dir(cache_dir)

  unless cache_dir.exist?
    # Return empty cache info if directory doesn't exist
    return DurableHuggingfaceHub::Types::HFCacheInfo.new(
      cache_dir: cache_dir,
      repos: [],
      size: 0
    )
  end

  repos = []
  total_size = 0

  # Scan each repository directory
  cache_dir.each_child do |repo_dir|
    next unless repo_dir.directory?

    repo_info = scan_repository(repo_dir)
    next unless repo_info

    repos << repo_info
    total_size += repo_info.size
  end

  DurableHuggingfaceHub::Types::HFCacheInfo.new(
    cache_dir: cache_dir,
    repos: repos,
    size: total_size
  )
end
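A typical use of the returned HFCacheInfo is a disk-usage report. The sketch below relies only on the attribute names passed to the constructors above: `cache_dir`, `repos`, and `size` on HFCacheInfo, and `repo_id`, `repo_type`, and `size` on CachedRepoInfo. It assumes the real objects expose readers with those names. The `CacheInfo` and `RepoInfo` structs and the `human_size` and `cache_report` helpers are hypothetical, so the example runs without the gem installed. With the gem, `cache_report` could presumably be given the result of `DurableHuggingfaceHub::Cache.scan_cache_dir` instead.

```ruby
# Hypothetical stand-ins exposing the same attribute names that
# scan_cache_dir passes to HFCacheInfo and CachedRepoInfo.
CacheInfo = Struct.new(:cache_dir, :repos, :size, keyword_init: true)
RepoInfo  = Struct.new(:repo_id, :repo_type, :size, keyword_init: true)

# Format a byte count using binary units (B, KiB, MiB, GiB, TiB).
def human_size(bytes)
  units = %w[B KiB MiB GiB TiB]
  value = bytes.to_f
  unit = 0
  while value >= 1024 && unit < units.length - 1
    value /= 1024
    unit += 1
  end
  format("%.1f %s", value, units[unit])
end

# Build report lines, largest repository first, followed by a total line.
def cache_report(cache_info)
  lines = cache_info.repos.sort_by { |repo| -repo.size }.map do |repo|
    format("%-8s %-30s %10s", repo.repo_type, repo.repo_id, human_size(repo.size))
  end
  lines << "Total: #{human_size(cache_info.size)}"
end

repos = [
  RepoInfo.new(repo_id: "my-org/my-dataset", repo_type: "dataset", size: 12 * 1024),
  RepoInfo.new(repo_id: "bert-base-uncased", repo_type: "model", size: 440 * 1024**2)
]
info = CacheInfo.new(cache_dir: "/tmp/hub", repos: repos, size: repos.sum(&:size))
puts cache_report(info)
```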

.scan_file(file_path, commit_hash) ⇒ DurableHuggingfaceHub::Types::CachedFileInfo

Scans a single file and returns file information.

Parameters:

  • file_path (Pathname)

    Path to the file

  • commit_hash (String)

    Commit hash this file belongs to

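Returns:

  • (DurableHuggingfaceHub::Types::CachedFileInfo)

    Information about the file: path, size, ETag (the blob hash, when available), commit hash, and access and modification times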

# File 'lib/durable_huggingface_hub/cache.rb', line 172

def self.scan_file(file_path, commit_hash)
  # Get file stats, handling broken symlinks
  stat = begin
    file_path.stat
  rescue Errno::ENOENT
    # For broken symlinks, use lstat to get link info
    file_path.lstat
  end

  # Try to get the ETag from the blob name if this is a symlink.
  # The link target may be relative (e.g. "../../blobs/<hash>") or absolute;
  # either way its basename is the blob file name.
  etag = nil
  if file_path.symlink?
    begin
      blob_name = file_path.readlink.basename.to_s
      etag = blob_name if blob_name.match?(/\A[a-f0-9]{40,}\z/) # SHA-like hash
    rescue Errno::ENOENT
      # The link disappeared before it could be read; no ETag available
    end
  end
  # Regular files (not symlinks) carry no ETag information, so etag stays nil

  # Build attributes hash
  attrs = {
    file_path: file_path,
    size: stat.size,
    etag: etag,
    commit_hash: commit_hash,
    last_accessed: stat.atime,
    last_modified: stat.mtime
  }

  DurableHuggingfaceHub::Types::CachedFileInfo.new(attrs)
end
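Snapshot entries are usually symlinks into the repository's `blobs` directory. `readlink` returns the link target exactly as stored. For a relative link such as `../../blobs/<hash>`, the basename is still the blob hash, so no path resolution is needed. The sketch below builds that layout in a temporary directory and applies the same check for 40 or more hex characters. It requires a platform that supports symlinks. `blob_etag` is a hypothetical helper, and the relative link target is an assumption about how the cache was populated.

```ruby
require "pathname"
require "tmpdir"

# Hypothetical helper following the same idea as scan_file's ETag lookup:
# a snapshot entry that is a symlink into blobs/ is named after the blob hash.
def blob_etag(file_path)
  return nil unless file_path.symlink?

  # readlink returns the stored target (relative or absolute); only the
  # basename, i.e. the blob file name, matters here.
  name = file_path.readlink.basename.to_s
  name.match?(/\A[a-f0-9]{40,}\z/) ? name : nil
end

blob_hash = "0123456789abcdef" * 4 # 64 hex characters, SHA-256 sized
etag_link = etag_plain = nil

Dir.mktmpdir do |tmp|
  repo = Pathname(tmp).join("models--my-org--demo")
  blob = repo.join("blobs", blob_hash)
  snapshot = repo.join("snapshots", "c" * 40)
  blob.dirname.mkpath
  snapshot.mkpath
  blob.write("{}")

  # Relative link: snapshots/<commit>/config.json -> ../../blobs/<hash>
  snapshot.join("config.json").make_symlink("../../blobs/#{blob_hash}")
  # A plain file, e.g. a copy made where symlinks are unavailable
  snapshot.join("README.md").write("# demo")

  etag_link = blob_etag(snapshot.join("config.json"))
  etag_plain = blob_etag(snapshot.join("README.md"))
end

p etag_link  # the 64-character blob hash
p etag_plain # nil
```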

.scan_repository(repo_dir) ⇒ DurableHuggingfaceHub::Types::CachedRepoInfo?

Scans a single repository directory and returns repository information.

Parameters:

  • repo_dir (Pathname)

    Repository directory to scan

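Returns:

  • (DurableHuggingfaceHub::Types::CachedRepoInfo, nil)

    Repository information, or nil if the directory name does not follow the {repo_type}s--{repo_id} convention or no revision contains any files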

# File 'lib/durable_huggingface_hub/cache.rb', line 64

def self.scan_repository(repo_dir)
  # Parse repo_id and repo_type from directory name
  # Format: {repo_type}s--{namespace}--{name} or {repo_type}s--{name}
  dir_name = repo_dir.basename.to_s
  match = dir_name.match(/^(\w+)s--(.+)$/)
  return nil unless match

  repo_type = match[1] # "model", "dataset", or "space"
  repo_id_part = match[2]

  # Convert back to repo_id format (handle both namespace/name and just name)
  if repo_id_part.include?("--")
    repo_id = repo_id_part.gsub("--", "/")
  else
    repo_id = repo_id_part
  end

  revisions = []
  total_size = 0
  last_accessed = nil
  last_modified = nil

  # Scan snapshots directory
  snapshots_dir = repo_dir.join("snapshots")
  if snapshots_dir.exist?
    snapshots_dir.each_child do |revision_dir|
      next unless revision_dir.directory?

      revision_info = scan_revision(repo_dir, revision_dir, repo_type)
      next unless revision_info

      revisions << revision_info
      total_size += revision_info.size

      # Track last accessed/modified times
      if revision_info.last_modified
        last_modified = [last_modified, revision_info.last_modified].compact.max
      end

      revision_info.files.each do |file_info|
        if file_info.last_accessed
          last_accessed = [last_accessed, file_info.last_accessed].compact.max
        end
      end
    end
  end

  return nil if revisions.empty?

  DurableHuggingfaceHub::Types::CachedRepoInfo.new(
    repo_id: repo_id,
    repo_type: repo_type,
    revisions: revisions,
    size: total_size,
    last_accessed: last_accessed,
    last_modified: last_modified
  )
end
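The directory-name parsing at the top of the method inverts the naming rule used by `cached_assets_path`. The minimal sketch below exercises that parsing on its own. `parse_cache_folder` is a hypothetical helper. Entries that do not follow the convention, such as a `.locks` directory, yield `nil` and are therefore skipped by the scan. Note that `gsub` leaves names without `--` unchanged, so the sketch drops the separate `include?` check.

```ruby
# Hypothetical inverse of the cache folder naming, mirroring the regex in
# scan_repository. Returns [repo_type, repo_id], or nil for non-repo entries.
def parse_cache_folder(dir_name)
  match = dir_name.match(/\A(\w+)s--(.+)\z/)
  return nil unless match

  repo_type = match[1]
  # "--" separates namespace and name; turn it back into "/"
  repo_id = match[2].gsub("--", "/")
  [repo_type, repo_id]
end

p parse_cache_folder("models--bert-base-uncased")    # ["model", "bert-base-uncased"]
p parse_cache_folder("datasets--my-org--my-dataset") # ["dataset", "my-org/my-dataset"]
p parse_cache_folder(".locks")                       # nil
```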

.scan_revision(repo_dir, revision_dir, repo_type) ⇒ DurableHuggingfaceHub::Types::CachedRevisionInfo?

Scans a revision directory and returns revision information.

Parameters:

  • repo_dir (Pathname)

    Repository directory

  • revision_dir (Pathname)

    Revision directory to scan

  • repo_type (String)

    Type of repository (accepted for context; not used by the current implementation)

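Returns:

  • (DurableHuggingfaceHub::Types::CachedRevisionInfo, nil)

    Revision information, or nil if the revision contains no files that could be scanned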

# File 'lib/durable_huggingface_hub/cache.rb', line 129

def self.scan_revision(repo_dir, revision_dir, repo_type)
  commit_hash = revision_dir.basename.to_s
  files = []
  total_size = 0
  last_modified = nil

  # Get refs pointing to this commit
  refs = get_refs_for_commit(repo_dir, commit_hash)

  # Scan all files in the revision
  revision_dir.glob("**/*") do |file_path|
    next if file_path.directory?

    begin
      file_info = scan_file(file_path, commit_hash)
      files << file_info
      total_size += file_info.size

      if file_info.last_modified
        last_modified = [last_modified, file_info.last_modified].compact.max
      end
    rescue StandardError
      # Skip files that can't be analyzed
      next
    end
  end

  return nil if files.empty?

  DurableHuggingfaceHub::Types::CachedRevisionInfo.new(
    commit_hash: commit_hash,
    refs: refs,
    files: files,
    size: total_size,
    last_modified: last_modified
  )
end
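The per-revision aggregation can be tried in isolation: collect every regular file under `snapshots/<commit>`, sum the sizes, and keep the newest modification time. The sketch below uses a hypothetical `summarize_revision` helper that returns a plain hash instead of CachedRevisionInfo. Unlike `scan_revision`, it does not skip files that fail to `stat`, such as broken symlinks.

```ruby
require "pathname"
require "tmpdir"

# Hypothetical sketch of the aggregation in scan_revision: walk every file
# under the revision directory, sum sizes, and keep the newest mtime.
# Pathname#stat follows symlinks, as scan_file does for intact links.
def summarize_revision(revision_dir)
  files = revision_dir.glob("**/*").reject(&:directory?)
  return nil if files.empty?

  stats = files.map(&:stat)
  {
    commit_hash: revision_dir.basename.to_s,
    file_count: files.length,
    size: stats.sum(&:size),
    last_modified: stats.map(&:mtime).max
  }
end

summary = nil

Dir.mktmpdir do |tmp|
  revision_dir = Pathname(tmp).join("snapshots", "d" * 40)
  revision_dir.join("sub").mkpath
  revision_dir.join("config.json").write("x" * 10)     # 10 bytes
  revision_dir.join("sub", "vocab.txt").write("y" * 5) # 5 bytes
  summary = summarize_revision(revision_dir)
end

p summary[:size]       # 15
p summary[:file_count] # 2
```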