Module: DurableHuggingfaceHub::Utils::Paths

Defined in:
lib/durable_huggingface_hub/utils/paths.rb

Overview

Path manipulation and filtering utilities.

This module provides functions for working with file paths, including expansion, filtering, and pattern matching.

Class Method Details

.expand_path(path) ⇒ Pathname

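Expands a path by substituting environment variables and then resolving the home directory (~) and any relative segments to an absolute path. Only uppercase variable names in the forms $VAR and ${VAR} are substituted; unset variables expand to an empty string.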

Examples:

Paths.expand_path("~/models")  # => Pathname("/home/user/models")
Paths.expand_path("$HOME/data")  # => Pathname("/home/user/data")

Parameters:

  • path (String, Pathname)

    Path to expand

Returns:

  • (Pathname)

    Expanded path



# File 'lib/durable_huggingface_hub/utils/paths.rb', line 20

def self.expand_path(path)
  path_str = path.to_s

  # Expand environment variables
  path_str = path_str.gsub(/\$([A-Z_][A-Z0-9_]*)|\$\{([A-Z_][A-Z0-9_]*)\}/) do
    key = Regexp.last_match(1) || Regexp.last_match(2)
    ENV[key] || ""
  end

  # Expand the home directory (~) and resolve to an absolute path
  Pathname.new(path_str).expand_path
end
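
The substitution step can be tried in isolation. The following is a minimal sketch, not part of the library: the helper name expand_env and its injectable env argument are hypothetical, added only so the behavior can be shown without touching the real environment. It reuses the regular expression from the source above.

```ruby
# Hypothetical helper mirroring the substitution step of expand_path.
# `env` defaults to ENV but can be any Hash-like object for demonstration.
def expand_env(path_str, env = ENV)
  path_str.gsub(/\$([A-Z_][A-Z0-9_]*)|\$\{([A-Z_][A-Z0-9_]*)\}/) do
    # Group 1 holds the name for $VAR, group 2 for ${VAR}
    key = Regexp.last_match(1) || Regexp.last_match(2)
    env[key] || ""
  end
end

env = { "DATA_DIR" => "/srv/data" }

expand_env("$DATA_DIR/models", env)    # => "/srv/data/models"
expand_env("${DATA_DIR}/models", env)  # => "/srv/data/models"
expand_env("$UNSET_VAR/models", env)   # => "/models" (unset variables become "")
expand_env("$data_dir/models", env)    # => "$data_dir/models" (lowercase names are left alone)
```

Note that an unset variable silently collapses to an empty string, so "$UNSET_VAR/models" resolves to a path under the filesystem root once expanded.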

.extract_path(obj, key) ⇒ String?

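Extracts a path from an object. A String is returned as-is. For a Hash, the value is looked up under key, trying the key as given, then its string form, then its symbol form. Any other object, or a Hash with no key given, yields nil.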

Parameters:

  • obj (String, Hash)

    A path string, or a hash containing the path

  • key (String, Symbol, nil)

    Key for hash extraction

Returns:

  • (String, nil)

    Extracted path, or nil if none could be extracted



# File 'lib/durable_huggingface_hub/utils/paths.rb', line 218

def self.extract_path(obj, key)
  case obj
  when String
    obj
  when Hash
    key ? (obj[key] || obj[key.to_s] || obj[key.to_sym]) : nil
  else
    nil
  end
end
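
The key fallback is easiest to see with a few calls. This is a standalone sketch that copies the method body shown above into a top-level method so it runs without the library; the sample hashes are made up for illustration.

```ruby
# Standalone copy of the lookup logic shown above, for illustration only.
def extract_path(obj, key)
  case obj
  when String
    obj
  when Hash
    # Try the key as given, then as a String, then as a Symbol
    key ? (obj[key] || obj[key.to_s] || obj[key.to_sym]) : nil
  end
end

extract_path("config.json", nil)                  # => "config.json"
extract_path({ "path" => "config.json" }, :path)  # => "config.json" (symbol key, string-keyed hash)
extract_path({ path: "config.json" }, "path")     # => "config.json" (string key, symbol-keyed hash)
extract_path({ path: "config.json" }, nil)        # => nil (no key given)
extract_path(42, :path)                           # => nil (unsupported type)
```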

.filter_repo_objects(objects, allow_patterns: nil, ignore_patterns: nil, key: nil) ⇒ Array

Filters a list of repository objects (files) based on allow and ignore patterns.

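This function implements the filtering logic used by the Hugging Face Hub for selecting which files to include in operations such as snapshot downloads. Ignore patterns take precedence over allow patterns, and objects from which no path can be extracted are dropped whenever at least one pattern is given.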

Examples:

Filter file list with glob patterns

files = ["config.json", "model.safetensors", "README.md", "data/train.csv"]
Paths.filter_repo_objects(files, allow_patterns: ["*.json", "*.safetensors"])
# => ["config.json", "model.safetensors"]

Filter with ignore patterns

files = ["model.bin", "config.json", "training_log.txt"]
Paths.filter_repo_objects(files, ignore_patterns: ["*.txt"])
# => ["model.bin", "config.json"]

Filter hash objects

files = [{ path: "config.json" }, { path: "model.bin" }]
Paths.filter_repo_objects(files, allow_patterns: "*.json", key: :path)
# => [{ path: "config.json" }]

Parameters:

  • objects (Array<String>, Array<Hash>)

    List of file paths or file info hashes

  • allow_patterns (Array<String, Regexp>, String, Regexp, nil) (defaults to: nil)

    Patterns to allow (glob strings or Regexp objects)

  • ignore_patterns (Array<String, Regexp>, String, Regexp, nil) (defaults to: nil)

    Patterns to ignore (glob strings or Regexp objects)

  • key (String, Symbol, nil) (defaults to: nil)

    Key to extract path from hash objects

Returns:

  • (Array)

    Filtered list of objects



# File 'lib/durable_huggingface_hub/utils/paths.rb', line 58

def self.filter_repo_objects(objects, allow_patterns: nil, ignore_patterns: nil, key: nil)
  return objects if objects.nil? || objects.empty?

  # Normalize patterns to arrays
  allow_patterns = normalize_patterns(allow_patterns)
  ignore_patterns = normalize_patterns(ignore_patterns)

  # If no patterns, return all objects
  return objects if allow_patterns.nil? && ignore_patterns.nil?

  objects.select do |obj|
    path = extract_path(obj, key)
    next false if path.nil?

    should_include?(path, allow_patterns: allow_patterns, ignore_patterns: ignore_patterns)
  end
end
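
When both pattern lists are given, the ignore list is checked first. The sketch below condenses that flow into two hypothetical helpers (match_any? and filter_paths, which are not library methods) built on File.fnmatch? with the same flags used by the source on this page. It handles plain string paths only.

```ruby
# Hypothetical condensed version of the allow/ignore flow, for string paths only.
def match_any?(path, patterns)
  flags = File::FNM_PATHNAME | File::FNM_EXTGLOB
  patterns.any? do |pattern|
    pattern.is_a?(Regexp) ? pattern.match?(path) : File.fnmatch?(pattern, path, flags)
  end
end

def filter_paths(paths, allow: nil, ignore: nil)
  paths.select do |path|
    # Ignore patterns win over allow patterns
    next false if ignore && match_any?(path, ignore)

    allow ? match_any?(path, allow) : true
  end
end

files = ["config.json", "tokenizer.json", "model.safetensors", "logs/run.json"]

filter_paths(files, allow: ["*.json"], ignore: ["tokenizer.*"])
# => ["config.json"]
```

"tokenizer.json" matches the allow pattern but is removed by the ignore pattern, and "logs/run.json" is excluded because "*.json" does not match across a directory separator under FNM_PATHNAME.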

.matches_any_pattern?(path, patterns) ⇒ Boolean

Checks if a path matches any of the given patterns.

Examples:

Paths.matches_any_pattern?("config.json", ["*.json", "*.yaml"])  # => true
Paths.matches_any_pattern?("data.txt", ["*.json", "*.yaml"])  # => false

Parameters:

  • path (String)

    File path to check

  • patterns (Array<String, Regexp>)

    Glob strings or Regexp objects

Returns:

  • (Boolean)

    True if path matches any pattern



# File 'lib/durable_huggingface_hub/utils/paths.rb', line 111

def self.matches_any_pattern?(path, patterns)
  return false if patterns.nil? || patterns.empty?

  patterns.any? { |pattern| matches_pattern?(path, pattern) }
end

.matches_pattern?(path, pattern) ⇒ Boolean

Checks if a path matches a single pattern.

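Regexp patterns are tested directly against the path. String patterns are treated as globs and matched with File.fnmatch? using File::FNM_PATHNAME | File::FNM_EXTGLOB, so * does not match across / and brace alternatives such as {json,yaml} are supported. Any other pattern type returns false.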

Examples:

Glob patterns

Paths.matches_pattern?("config.json", "*.json")  # => true
Paths.matches_pattern?("data/train.csv", "data/*.csv")  # => true
Paths.matches_pattern?("model.bin", "*.json")  # => false

Regex patterns

Paths.matches_pattern?("config.json", /\.json$/)  # => true

Parameters:

  • path (String)

    File path to check

  • pattern (String, Regexp)

    Glob pattern or regex

Returns:

  • (Boolean)

    True if path matches pattern



# File 'lib/durable_huggingface_hub/utils/paths.rb', line 132

def self.matches_pattern?(path, pattern)
  case pattern
  when Regexp
    pattern.match?(path)
  when String
    # Match the glob directly; FNM_PATHNAME keeps * from crossing "/",
    # FNM_EXTGLOB enables {a,b} alternatives
    File.fnmatch?(pattern, path, File::FNM_PATHNAME | File::FNM_EXTGLOB)
  else
    false
  end
end
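
The effect of the two fnmatch flags is worth seeing directly, since it determines which repository files a glob selects. The calls below use only Ruby's standard File.fnmatch? with the flag combination from the source above; the file names are illustrative.

```ruby
flags = File::FNM_PATHNAME | File::FNM_EXTGLOB

# A bare * stays within one path segment
File.fnmatch?("*.json", "config.json", flags)          # => true
File.fnmatch?("*.json", "data/config.json", flags)     # => false

# **/ descends into directories
File.fnmatch?("**/*.json", "data/config.json", flags)  # => true

# Brace alternatives are enabled by FNM_EXTGLOB
File.fnmatch?("*.{json,yaml}", "config.yaml", flags)   # => true

# Leading dots are not matched by * unless File::FNM_DOTMATCH is also set
File.fnmatch?("*", ".gitattributes", flags)            # => false
```

To select files at any depth, a caller would therefore pass a pattern such as "**/*.json" or a Regexp rather than "*.json".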

.normalize_patterns(patterns) ⇒ Array<String>?

Normalizes pattern input to an array.

Parameters:

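  • patterns (Array, String, Regexp, nil)

    A single pattern, an array of patterns, or nil

Returns:

  • (Array<String>, nil)

    The patterns as an array (a single String or Regexp is wrapped in a one-element array), or nil if the input is nil or of an unsupported type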



# File 'lib/durable_huggingface_hub/utils/paths.rb', line 205

def self.normalize_patterns(patterns)
  return nil if patterns.nil?
  return [patterns] if patterns.is_a?(String) || patterns.is_a?(Regexp)
  return patterns if patterns.is_a?(Array)

  nil
end

.safe_join(base, *parts) ⇒ Pathname

Joins path components onto a base path, rejecting absolute components and any result that resolves outside the base directory.

Examples:

Paths.safe_join("/cache", "models", "bert")
# => Pathname("/cache/models/bert")

Parameters:

  • base (String, Pathname)

    Base path

  • *parts (String)

    Path components to join

Returns:

  • (Pathname)

    Joined path

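Raises:

  • (ValidationError)

    If a path component is absolute, or if the joined path would escape the base directory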



# File 'lib/durable_huggingface_hub/utils/paths.rb', line 173

def self.safe_join(base, *parts)
  # Validate that no part is an absolute path
  parts.each do |part|
    if part.to_s.start_with?("/")
      raise ValidationError.new(
        "path",
        "Path component cannot be absolute: #{part}"
      )
    end
  end

  base_path = Pathname.new(base).expand_path
  joined_path = parts.reduce(base_path) { |path, part| path.join(part) }
  final_path = joined_path.expand_path

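  # Ensure the final path is within the base path.
  # Note: this is a plain string-prefix comparison, so a sibling directory
  # whose name begins with the base name (e.g. /cache2 for base /cache) also passes.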
  unless final_path.to_s.start_with?(base_path.to_s)
    raise ValidationError.new(
      "path",
      "Path traversal detected: result would escape base directory"
    )
  end

  final_path
end
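
The containment check above compares string prefixes, which treats a sibling such as /cache2 as being inside /cache. The following is a sketch of a stricter check that compares against the base followed by a separator. The helper name within_base? is hypothetical and not part of the library, and the sketch assumes POSIX-style paths.

```ruby
require "pathname"

# Hypothetical helper: true when `candidate` is `base` itself or lies beneath it.
def within_base?(base, candidate)
  base_path = Pathname.new(base).expand_path
  candidate_path = Pathname.new(candidate).expand_path
  return true if candidate_path == base_path

  # Compare against "base/" so that "/cache2" is not mistaken for a child of "/cache"
  prefix = "#{base_path.to_s.chomp('/')}/"
  candidate_path.to_s.start_with?(prefix)
end

within_base?("/cache", "/cache/models/bert")    # => true
within_base?("/cache", "/cache/../etc/passwd")  # => false
within_base?("/cache", "/cache2/models")        # => false

# The plain prefix comparison accepts the sibling directory:
"/cache2/models".start_with?("/cache")          # => true
```

Neither approach resolves symbolic links; Pathname#expand_path works on the path string only.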

.sanitize_filename(filename) ⇒ String

Sanitizes a filename by replacing path separators (/ and \), whitespace, and the characters < > : " | ? * with underscores. Other characters are left unchanged.

Examples:

Paths.sanitize_filename("my file!.txt")  # => "my_file!.txt"
Paths.sanitize_filename("test/file.json")  # => "test_file.json"

Parameters:

  • filename (String)

    Filename to sanitize

Returns:

  • (String)

    Sanitized filename



# File 'lib/durable_huggingface_hub/utils/paths.rb', line 152

def self.sanitize_filename(filename)
  # Replace path separators with underscores
  sanitized = filename.gsub(%r{[/\\]}, "_")

  # Replace spaces with underscores
  sanitized = sanitized.gsub(/\s/, "_")

  # Replace other problematic characters
  sanitized.gsub(/[<>:"|?*]/, "_")
end

.should_include?(path, allow_patterns: nil, ignore_patterns: nil) ⇒ Boolean

Checks if a path should be included based on allow and ignore patterns.

Examples:

Paths.should_include?("config.json", allow_patterns: ["*.json"])  # => true
Paths.should_include?("data.txt", allow_patterns: ["*.json"])  # => false
Paths.should_include?("temp.log", ignore_patterns: ["*.log"])  # => false

Parameters:

  • path (String)

    File path to check

  • allow_patterns (Array<String>, nil) (defaults to: nil)

    Patterns to allow

  • ignore_patterns (Array<String>, nil) (defaults to: nil)

    Patterns to ignore

Returns:

  • (Boolean)

    True if path should be included



# File 'lib/durable_huggingface_hub/utils/paths.rb', line 87

def self.should_include?(path, allow_patterns: nil, ignore_patterns: nil)
  # If ignore patterns specified and path matches, exclude it
  if ignore_patterns && matches_any_pattern?(path, ignore_patterns)
    return false
  end

  # If allow patterns specified, path must match at least one
  if allow_patterns
    return matches_any_pattern?(path, allow_patterns)
  end

  # If no allow patterns, include by default (unless already ignored above)
  true
end