Module: DurableHuggingfaceHub::Utils::Paths
Defined in: lib/durable_huggingface_hub/utils/paths.rb
Overview
Path manipulation and filtering utilities.
This module provides functions for working with file paths, including expansion, filtering, and pattern matching.
Class Method Summary
- .expand_path(path) ⇒ Pathname
  Expands a path, resolving home directory and environment variables.
- .extract_path(obj, key) ⇒ String?
  Extracts path from an object (string or hash).
- .filter_repo_objects(objects, allow_patterns: nil, ignore_patterns: nil, key: nil) ⇒ Array
  Filters a list of repository objects (files) based on allow and ignore patterns.
- .matches_any_pattern?(path, patterns) ⇒ Boolean
  Checks if a path matches any of the given patterns.
- .matches_pattern?(path, pattern) ⇒ Boolean
  Checks if a path matches a single pattern.
- .normalize_patterns(patterns) ⇒ Array<String>?
  Normalizes pattern input to an array.
- .safe_join(base, *parts) ⇒ Pathname
  Joins path components safely, ensuring no path traversal.
- .sanitize_filename(filename) ⇒ String
  Sanitizes a filename by removing or replacing unsafe characters.
- .should_include?(path, allow_patterns: nil, ignore_patterns: nil) ⇒ Boolean
  Checks if a path should be included based on allow and ignore patterns.
Class Method Details
.expand_path(path) ⇒ Pathname
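Expands a path, resolving environment variables and the home directory. Variables may be written as $VAR or ${VAR}; only uppercase names (letters, digits, underscores) are recognized, and an unset variable expands to an empty string.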
    # File 'lib/durable_huggingface_hub/utils/paths.rb', line 20

    def self.expand_path(path)
      path_str = path.to_s

      # Expand environment variables
      path_str = path_str.gsub(/\$([A-Z_][A-Z0-9_]*)|\$\{([A-Z_][A-Z0-9_]*)\}/) do
        key = Regexp.last_match(1) || Regexp.last_match(2)
        ENV[key] || ""
      end

      # Expand home directory
      Pathname.new(path_str).expand_path
    end
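The substitution rules are easiest to see on concrete inputs. The following is a minimal standalone sketch, not the library itself: `expand_path_sketch` and the `HF_SKETCH_DIR` variable are hypothetical names, and the sketch assumes the home directory is resolved with the standard `Pathname#expand_path`.

```ruby
require "pathname"

# Hypothetical stand-in that mirrors the documented logic using only the stdlib.
def expand_path_sketch(path)
  # Replace $VAR and ${VAR}; only uppercase names are matched by this regex.
  path_str = path.to_s.gsub(/\$([A-Z_][A-Z0-9_]*)|\$\{([A-Z_][A-Z0-9_]*)\}/) do
    key = Regexp.last_match(1) || Regexp.last_match(2)
    ENV[key] || "" # unset variables collapse to an empty string
  end
  # Resolve "~" and relative segments against the current directory.
  Pathname.new(path_str).expand_path
end

ENV["HF_SKETCH_DIR"] = "/tmp/hf-cache" # illustrative variable, not used by the library

puts expand_path_sketch("$HF_SKETCH_DIR/models")   # /tmp/hf-cache/models
puts expand_path_sketch("${HF_SKETCH_DIR}/models") # /tmp/hf-cache/models
```

Because unset variables become empty strings rather than raising, a typo in a variable name silently shortens the path; callers who need strictness may want to check `ENV.key?` first.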
.extract_path(obj, key) ⇒ String?
Extracts path from an object (string or hash).
    # File 'lib/durable_huggingface_hub/utils/paths.rb', line 218

    def self.extract_path(obj, key)
      case obj
      when String
        obj
      when Hash
        key ? (obj[key] || obj[key.to_s] || obj[key.to_sym]) : nil
      else
        nil
      end
    end
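The hash branch tries the key as given, then as a string, then as a symbol, so callers do not need to know how the hash was keyed. A small standalone sketch of that lookup order follows; `extract_path_sketch` is a hypothetical stand-in and the `path` key is only an example.

```ruby
# Hypothetical stand-in mirroring the documented lookup order.
def extract_path_sketch(obj, key)
  case obj
  when String
    obj # strings are already paths
  when Hash
    # Try the key as given, then its String and Symbol forms.
    key ? (obj[key] || obj[key.to_s] || obj[key.to_sym]) : nil
  end
end

p extract_path_sketch("README.md", nil)                 # "README.md"
p extract_path_sketch({ "path" => "a.bin" }, :path)     # "a.bin"
p extract_path_sketch({ path: "a.bin" }, "path")        # "a.bin"
p extract_path_sketch({ path: "a.bin" }, nil)           # nil
```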
.filter_repo_objects(objects, allow_patterns: nil, ignore_patterns: nil, key: nil) ⇒ Array
Filters a list of repository objects (files) based on allow and ignore patterns.
This function implements the filtering logic used by HuggingFace Hub for selecting which files to include in operations like snapshot downloads.
    # File 'lib/durable_huggingface_hub/utils/paths.rb', line 58

    def self.filter_repo_objects(objects, allow_patterns: nil, ignore_patterns: nil, key: nil)
      return objects if objects.nil? || objects.empty?

      # Normalize patterns to arrays
      allow_patterns = normalize_patterns(allow_patterns)
      ignore_patterns = normalize_patterns(ignore_patterns)

      # If no patterns, return all objects
      return objects if allow_patterns.nil? && ignore_patterns.nil?

      objects.select do |obj|
        path = extract_path(obj, key)
        next false if path.nil?

        should_include?(path, allow_patterns: allow_patterns, ignore_patterns: ignore_patterns)
      end
    end
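To illustrate the end-to-end behavior, here is a compact standalone sketch of the same pipeline. It is an approximation under stated assumptions: `sketch_filter`, its shortened keyword names, and the `rfilename` key are hypothetical, and the hash lookup is simplified to the exact key.

```ruby
# Hypothetical, condensed stand-in for the documented filtering pipeline.
def sketch_filter(objects, allow: nil, ignore: nil, key: nil)
  return objects if allow.nil? && ignore.nil? # no patterns: nothing to filter

  flags = File::FNM_PATHNAME | File::FNM_EXTGLOB
  matches = lambda do |path, patterns|
    Array(patterns).any? do |pat|
      pat.is_a?(Regexp) ? pat.match?(path) : File.fnmatch?(pat, path, flags)
    end
  end

  objects.select do |obj|
    path = obj.is_a?(Hash) ? obj[key] : obj
    next false if path.nil?                           # objects without a path are dropped
    next false if ignore && matches.call(path, ignore) # ignore patterns are checked first
    allow.nil? || matches.call(path, allow)
  end
end

files = [
  { "rfilename" => "config.json" },
  { "rfilename" => "model.safetensors" },
  { "rfilename" => "onnx/model.onnx" },
  { "size" => 123 } # no path under the key
]

kept = sketch_filter(files, allow: ["*.json", "*.safetensors"], key: "rfilename")
p kept.map { |f| f["rfilename"] } # ["config.json", "model.safetensors"]
```

Note that objects lacking a path are only dropped once at least one pattern is supplied; with no patterns the input is returned unchanged.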
.matches_any_pattern?(path, patterns) ⇒ Boolean
Checks if a path matches any of the given patterns.
    # File 'lib/durable_huggingface_hub/utils/paths.rb', line 111

    def self.matches_any_pattern?(path, patterns)
      return false if patterns.nil? || patterns.empty?

      patterns.any? { |pattern| matches_pattern?(path, pattern) }
    end
.matches_pattern?(path, pattern) ⇒ Boolean
Checks if a path matches a single pattern.
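Supports both glob patterns and regular expressions. String patterns are matched as globs with File.fnmatch? using the FNM_PATHNAME and FNM_EXTGLOB flags; Regexp patterns are matched against the path directly. Any other pattern type never matches.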
    # File 'lib/durable_huggingface_hub/utils/paths.rb', line 132

    def self.matches_pattern?(path, pattern)
      case pattern
      when Regexp
        !pattern.match(path).nil?
      when String
        # Convert glob pattern to regex
        File.fnmatch?(pattern, path, File::FNM_PATHNAME | File::FNM_EXTGLOB)
      else
        false
      end
    end
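The two flags passed to `File.fnmatch?` determine how globs behave, which is not obvious from the signature. The sketch below uses only Ruby's standard library to show the consequences; `glob_or_regexp_match?` is a hypothetical helper name, not part of the module.

```ruby
# Hypothetical helper mirroring the documented matching rules.
def glob_or_regexp_match?(path, pattern)
  case pattern
  when Regexp
    pattern.match?(path)
  when String
    File.fnmatch?(pattern, path, File::FNM_PATHNAME | File::FNM_EXTGLOB)
  else
    false
  end
end

# FNM_PATHNAME: "*" does not cross "/" boundaries.
p glob_or_regexp_match?("config.json", "*.json")        # true
p glob_or_regexp_match?("sub/config.json", "*.json")    # false
# "**/" spans directories when FNM_PATHNAME is set.
p glob_or_regexp_match?("sub/config.json", "**/*.json") # true
# FNM_EXTGLOB: brace alternation is supported.
p glob_or_regexp_match?("config.json", "*.{json,txt}")  # true
# FNM_DOTMATCH is not set, so "*" skips dotfiles.
p glob_or_regexp_match?(".gitattributes", "*")          # false
# Regexp patterns bypass glob rules entirely.
p glob_or_regexp_match?("model.safetensors", /\.safetensors\z/) # true
```

A practical consequence is that a pattern such as `*.json` selects only top-level files; nested files need a `**/` prefix or a Regexp.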
.normalize_patterns(patterns) ⇒ Array<String>?
Normalizes pattern input to an array.
    # File 'lib/durable_huggingface_hub/utils/paths.rb', line 205

    def self.normalize_patterns(patterns)
      return nil if patterns.nil?
      return [patterns] if patterns.is_a?(String) || patterns.is_a?(Regexp)
      return patterns if patterns.is_a?(Array)

      nil
    end
.safe_join(base, *parts) ⇒ Pathname
Joins path components safely, ensuring no path traversal.
    # File 'lib/durable_huggingface_hub/utils/paths.rb', line 173

    def self.safe_join(base, *parts)
      # Validate that no part is an absolute path
      parts.each do |part|
        if part.to_s.start_with?("/")
          raise ValidationError.new(
            "path",
            "Path component cannot be absolute: #{part}"
          )
        end
      end

      base_path = Pathname.new(base).expand_path
      joined_path = parts.reduce(base_path) { |path, part| path.join(part) }
      final_path = joined_path.expand_path

      # Ensure the final path is within the base path
      unless final_path.to_s.start_with?(base_path.to_s)
        raise ValidationError.new(
          "path",
          "Path traversal detected: result would escape base directory"
        )
      end

      final_path
    end
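The two checks (no absolute components, result stays under the base) can be exercised with a standalone sketch. This is an approximation, not the library code: `safe_join_sketch` is a hypothetical name, `ArgumentError` stands in for the library's `ValidationError`, and normalization is assumed to use `Pathname#expand_path`.

```ruby
require "pathname"

# Hypothetical stand-in; raises ArgumentError where the library raises ValidationError.
def safe_join_sketch(base, *parts)
  parts.each do |part|
    if part.to_s.start_with?("/")
      raise ArgumentError, "Path component cannot be absolute: #{part}"
    end
  end

  base_path = Pathname.new(base).expand_path
  final_path = parts.reduce(base_path) { |path, part| path.join(part) }.expand_path

  # Containment is checked with a plain string-prefix comparison.
  unless final_path.to_s.start_with?(base_path.to_s)
    raise ArgumentError, "Path traversal detected: result would escape base directory"
  end

  final_path
end

puts safe_join_sketch("/srv/cache", "models", "bert") # /srv/cache/models/bert

begin
  safe_join_sketch("/srv/cache", "../etc/passwd")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end

# Edge case of a prefix comparison: a sibling directory that shares the base's
# name as a prefix is accepted by this sketch.
puts safe_join_sketch("/srv/cache", "../cache-evil/x") # /srv/cache-evil/x
```

If sibling directories are a concern, one option is to compare against the base path with a trailing separator appended, or to walk `Pathname#ascend` and look for the base explicitly.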
.sanitize_filename(filename) ⇒ String
Sanitizes a filename by replacing unsafe characters (path separators, whitespace, and < > : " | ? *) with underscores.
    # File 'lib/durable_huggingface_hub/utils/paths.rb', line 152

    def self.sanitize_filename(filename)
      # Replace path separators with underscores
      sanitized = filename.gsub(%r{[/\\]}, "_")

      # Replace spaces with underscores
      sanitized = sanitized.gsub(/\s/, "_")

      # Replace other problematic characters
      sanitized.gsub(/[<>:"|?*]/, "_")
    end
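A quick worked input shows the three substitution passes together. The sketch below chains the same regular expressions in a hypothetical `sanitize_sketch` helper; the filename is made up for illustration.

```ruby
# Hypothetical stand-in applying the same three substitutions in order.
def sanitize_sketch(filename)
  filename
    .gsub(%r{[/\\]}, "_")    # path separators
    .gsub(/\s/, "_")         # any whitespace, including tabs and newlines
    .gsub(/[<>:"|?*]/, "_")  # characters commonly rejected by filesystems
end

puts sanitize_sketch("models/bert base:v1?.bin") # models_bert_base_v1_.bin
```

Each unsafe character is replaced one-for-one, so the result has the same length as the input, and distinct inputs such as `a/b` and `a b` map to the same name.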
.should_include?(path, allow_patterns: nil, ignore_patterns: nil) ⇒ Boolean
Checks if a path should be included based on allow and ignore patterns.
    # File 'lib/durable_huggingface_hub/utils/paths.rb', line 87

    def self.should_include?(path, allow_patterns: nil, ignore_patterns: nil)
      # If ignore patterns specified and path matches, exclude it
      if ignore_patterns && matches_any_pattern?(path, ignore_patterns)
        return false
      end

      # If allow patterns specified, path must match at least one
      if allow_patterns
        return matches_any_pattern?(path, allow_patterns)
      end

      # If no allow patterns, include by default (unless already ignored above)
      true
    end
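The order of the two checks defines the precedence: ignore patterns win over allow patterns, and an allow list that is present but empty excludes everything. A minimal sketch of that decision logic follows, assuming glob-only patterns; `include_sketch?` and its keyword names are hypothetical.

```ruby
# Hypothetical stand-in for the documented decision order (glob patterns only).
def include_sketch?(path, allow: nil, ignore: nil)
  flags = File::FNM_PATHNAME | File::FNM_EXTGLOB
  hit = ->(patterns) { patterns.any? { |pat| File.fnmatch?(pat, path, flags) } }

  return false if ignore && hit.call(ignore) # 1. ignore patterns take precedence
  return hit.call(allow) if allow            # 2. otherwise an allow list must match
  true                                       # 3. no allow list: include by default
end

p include_sketch?("model.bin", allow: ["*.bin"], ignore: ["model.*"]) # false
p include_sketch?("README.md")                                        # true
p include_sketch?("README.md", allow: [])                             # false
```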