Module: DurableHuggingfaceHub::Utils::Validators

Defined in:
lib/durable_huggingface_hub/utils/validators.rb

Overview

Input validation utilities for HuggingFace Hub parameters.

This module provides validation functions for repository IDs, revisions, filenames, and other user inputs to ensure they meet HuggingFace Hub requirements.

Constant Summary

MAX_REPO_ID_LENGTH = 96

Maximum length for repository ID.

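Class Method Summary

  • .require_non_empty(value, name) ⇒ String

    Validates that a string is not empty.

  • .require_non_nil(value, name) ⇒ Object

    Validates that a value is not nil.

  • .validate_filename(filename) ⇒ String

    Validates a filename for use in repository paths.

  • .validate_repo_id(repo_id, repo_type: nil) ⇒ String

    Validates a repository ID format.

  • .validate_repo_type(repo_type) ⇒ String

    Validates a repository type.

  • .validate_revision(revision) ⇒ String

    Validates a revision (branch, tag, or commit SHA).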

Class Method Details

.require_non_empty(value, name) ⇒ String

Validates that a string is not empty.

Parameters:

  • value (String)

    String to check

  • name (String)

    Parameter name for error message

Returns:

  • (String)

    The value if not empty

Raises:

  • (ValidationError)

    If value is nil or empty

# File 'lib/durable_huggingface_hub/utils/validators.rb', line 227

def self.require_non_empty(value, name)
  if value.nil? || (value.respond_to?(:empty?) && value.empty?)
    raise ValidationError.new(name, "#{name} cannot be empty")
  end

  value
end
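Because of the `respond_to?(:empty?)` guard, the check applies to any object with an `empty?` method (strings, arrays, hashes), while objects without one, such as integers, pass through unchanged. The standalone sketch below exercises that behavior. `ValidationError` here is a hypothetical stand-in that mirrors the `(name, message)` constructor the source calls, and `empty_check_error` is an illustrative helper, not part of the library.

```ruby
# Hypothetical stand-in for the library's ValidationError(field, message).
class ValidationError < StandardError
  attr_reader :field

  def initialize(field, message)
    @field = field
    super(message)
  end
end

# Same logic as Validators.require_non_empty above.
def require_non_empty(value, name)
  if value.nil? || (value.respond_to?(:empty?) && value.empty?)
    raise ValidationError.new(name, "#{name} cannot be empty")
  end

  value
end

# Hypothetical helper: returns the error message, or nil when accepted.
def empty_check_error(value)
  require_non_empty(value, "token")
  nil
rescue ValidationError => e
  e.message
end

empty_check_error("hf_abc") # accepted: non-empty string
empty_check_error([])       # "token cannot be empty": arrays respond to empty?
empty_check_error(0)        # accepted: Integer has no empty? method
empty_check_error("  ")     # accepted: whitespace-only strings are not empty
```

Callers that also need to reject whitespace-only strings would have to strip the value before calling.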

.require_non_nil(value, name) ⇒ Object

Validates that a value is not nil.

Parameters:

  • value (Object)

    Value to check

  • name (String)

    Parameter name for error message

Returns:

  • (Object)

    The value if not nil

Raises:

  • (ValidationError)

    If value is nil

# File 'lib/durable_huggingface_hub/utils/validators.rb', line 213

def self.require_non_nil(value, name)
  if value.nil?
    raise ValidationError.new(name, "#{name} is required and cannot be nil")
  end

  value
end
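Only `nil` is rejected, so falsy-but-present values such as `false`, `""`, or `0` are returned unchanged. That matters for boolean options, where a caller passing `false` must not be treated as missing. A minimal sketch, assuming a hypothetical `ValidationError` stand-in with a `field` reader and a hypothetical `options` hash:

```ruby
# Hypothetical stand-in for the library's ValidationError(field, message).
class ValidationError < StandardError
  attr_reader :field

  def initialize(field, message)
    @field = field
    super(message)
  end
end

# Same logic as Validators.require_non_nil above.
def require_non_nil(value, name)
  if value.nil?
    raise ValidationError.new(name, "#{name} is required and cannot be nil")
  end

  value
end

# Hypothetical caller options: force is explicitly false, token is missing.
options = { force: false, token: nil }

# false is a real value, so it is accepted and returned as-is.
accepted = require_non_nil(options[:force], "force")

# nil raises; the error carries the parameter name in #field.
error = begin
  require_non_nil(options[:token], "token")
  nil
rescue ValidationError => e
  e
end
```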

.validate_filename(filename) ⇒ String

Validates a filename for use in repository paths.

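Rejects empty names, absolute paths (a leading “/”), path traversal sequences (“../” or “..\”), null bytes, and basenames that match a Windows reserved device name (CON, PRN, AUX, NUL, COM1 through COM9, LPT1 through LPT9, compared case-insensitively).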

Examples:

Valid filenames

Validators.validate_filename("config.json")
Validators.validate_filename("models/pytorch_model.bin")
Validators.validate_filename("data/train.csv")

Invalid filenames

Validators.validate_filename("../etc/passwd")  # raises ValidationError
Validators.validate_filename("/absolute/path")  # raises ValidationError

Parameters:

  • filename (String)

    Filename to validate

Returns:

  • (String)

    The validated filename

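Raises:

  • (ValidationError)

    If filename is empty, is an absolute path, contains a traversal sequence or null byte, or uses a Windows reserved name
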
# File 'lib/durable_huggingface_hub/utils/validators.rb', line 155

def self.validate_filename(filename)
  if filename.nil? || filename.empty?
    raise ValidationError.new("filename", "Filename cannot be empty")
  end

  # Disallow absolute paths
  if filename.start_with?("/")
    raise ValidationError.new("filename", "Filename cannot be an absolute path")
  end

  # Disallow path traversal
  if filename.include?("../") || filename.include?("..\\")
    raise ValidationError.new("filename", "Filename cannot contain path traversal sequences")
  end

  # Disallow null bytes
  if filename.include?("\0")
    raise ValidationError.new("filename", "Filename cannot contain null bytes")
  end

  # Disallow Windows reserved names
  basename = File.basename(filename)
  windows_reserved = %w[CON PRN AUX NUL COM1 COM2 COM3 COM4 COM5 COM6 COM7 COM8 COM9
                       LPT1 LPT2 LPT3 LPT4 LPT5 LPT6 LPT7 LPT8 LPT9]
  if windows_reserved.include?(basename.upcase)
    raise ValidationError.new("filename", "Filename cannot use Windows reserved names")
  end

  filename
end
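The checks run in a fixed order and stop at the first failure. The standalone sketch below reproduces them with a hypothetical `ValidationError` stand-in and a hypothetical `filename_error` helper that returns the failing message (or nil), which makes it easy to see which rule a given name trips.

```ruby
# Hypothetical stand-in for the library's ValidationError(field, message).
class ValidationError < StandardError
  attr_reader :field

  def initialize(field, message)
    @field = field
    super(message)
  end
end

# Same reserved names as the %w[...] list in the source.
WINDOWS_RESERVED = (%w[CON PRN AUX NUL] +
                    (1..9).map { |i| "COM#{i}" } +
                    (1..9).map { |i| "LPT#{i}" }).freeze

# Same checks, in the same order, as Validators.validate_filename above.
def validate_filename(filename)
  if filename.nil? || filename.empty?
    raise ValidationError.new("filename", "Filename cannot be empty")
  end
  if filename.start_with?("/")
    raise ValidationError.new("filename", "Filename cannot be an absolute path")
  end
  if filename.include?("../") || filename.include?("..\\")
    raise ValidationError.new("filename", "Filename cannot contain path traversal sequences")
  end
  if filename.include?("\0")
    raise ValidationError.new("filename", "Filename cannot contain null bytes")
  end
  if WINDOWS_RESERVED.include?(File.basename(filename).upcase)
    raise ValidationError.new("filename", "Filename cannot use Windows reserved names")
  end

  filename
end

# Hypothetical helper: returns the failing rule's message, or nil if valid.
def filename_error(filename)
  validate_filename(filename)
  nil
rescue ValidationError => e
  e.message
end

%w[config.json models/pytorch_model.bin ../etc/passwd /absolute/path logs/nul].each do |name|
  puts "#{name}: #{filename_error(name) || 'ok'}"
end
```

As written, the traversal check only matches “..” followed by a separator, so a name ending in “..” (for example “models/..”) is accepted; the reserved-name check compares the whole basename, so “CON.txt” is accepted as well.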

.validate_repo_id(repo_id, repo_type: nil) ⇒ String

Validates a repository ID format.

Rules:

  • Between 1 and 96 characters

  • Either “repo_name” or “namespace/repo_name”

  • Contains only [a-zA-Z0-9], “-”, “_”, or “.” (plus at most one “/” separator)

  • Cannot have “--” or “..” sequences

  • Cannot end with “.git”

  • Name parts cannot start or end with “.”, “-”, or “_”

Examples:

Valid repository IDs

Validators.validate_repo_id("bert-base-uncased")
Validators.validate_repo_id("huggingface/transformers")
Validators.validate_repo_id("my-org/my.model-v2")

Invalid repository IDs

Validators.validate_repo_id("")  # raises ValidationError
Validators.validate_repo_id("foo--bar")  # raises ValidationError
Validators.validate_repo_id("foo.git")  # raises ValidationError

Parameters:

  • repo_id (String)

    Repository ID to validate

  • repo_type (String, nil) (defaults to: nil)

    Repository type (optional; accepted as context, but not referenced by the checks in this implementation)

Returns:

  • (String)

    The validated repo_id

Raises:

  • (ValidationError)

    If repo_id is nil, is not a String, or breaks any of the rules above

# File 'lib/durable_huggingface_hub/utils/validators.rb', line 37

def self.validate_repo_id(repo_id, repo_type: nil)
  if repo_id.nil?
    raise ValidationError.new("repo_id", "Repository ID cannot be empty")
  end

  unless repo_id.is_a?(String)
    raise ValidationError.new("repo_id", "Repository ID must be a string, not #{repo_id.class}: '#{repo_id}'")
  end

  if repo_id.empty?
    raise ValidationError.new("repo_id", "Repository ID cannot be empty")
  end

  if repo_id.length > MAX_REPO_ID_LENGTH
    raise ValidationError.new("repo_id", "Repository ID is too long (max #{MAX_REPO_ID_LENGTH} characters)")
  end

  # Check for multiple slashes
  if repo_id.count("/") > 1
    raise ValidationError.new("repo_id", "Repository ID must be in format 'repo_name' or 'namespace/repo_name': '#{repo_id}'")
  end

  # Check for "--" and ".." sequences
  if repo_id.include?("--") || repo_id.include?("..")
    raise ValidationError.new("repo_id", "Cannot have -- or .. in repo_id: '#{repo_id}'")
  end

  # Check for .git suffix
  if repo_id.end_with?(".git")
    raise ValidationError.new("repo_id", "Repository ID cannot end with '.git': '#{repo_id}'")
  end

  # Validate with regex pattern (equivalent to Python REPO_ID_REGEX)
  unless repo_id.match?(/\A(\b[\w\-.]+\b\/)?\b[\w\-.]{1,96}\b\z/)
    raise ValidationError.new("repo_id", "Repository ID must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: '#{repo_id}'")
  end

  # Additional validation for namespace/repo format
  if repo_id.include?("/")
    namespace, name = repo_id.split("/", 2)

    if namespace.empty? || name.empty?
      raise ValidationError.new("repo_id", "Both namespace and name must be non-empty")
    end

    # Validate no leading/trailing special chars in parts
    [namespace, name].each do |part|
      if part.start_with?(".", "-", "_") || part.end_with?(".", "-", "_")
        raise ValidationError.new("repo_id", "Repository name parts cannot start or end with '.', '-', or '_'")
      end
    end
  elsif repo_id.start_with?(".", "-", "_") || repo_id.end_with?(".", "-", "_")
    raise ValidationError.new("repo_id", "Repository name cannot start or end with '.', '-', or '_'")
  end

  repo_id
end
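Callers that only need a yes/no answer (for example, to validate form input before calling the Hub) can wrap the raising check in a predicate. The sketch below is a condensed restatement of the rule list above, not the library's code: it replaces the single regex with a per-part character check, merges the nil and non-string errors into one message, and uses a hypothetical `ValidationError` stand-in. `check_repo_id` and `valid_repo_id?` are illustrative names.

```ruby
# Hypothetical stand-in for the library's ValidationError(field, message).
class ValidationError < StandardError
  attr_reader :field

  def initialize(field, message)
    @field = field
    super(message)
  end
end

MAX_REPO_ID_LENGTH = 96
PART_EDGE = [".", "-", "_"].freeze

# Condensed restatement of the documented validate_repo_id rules.
def check_repo_id(repo_id)
  unless repo_id.is_a?(String) && !repo_id.empty?
    raise ValidationError.new("repo_id", "Repository ID must be a non-empty string")
  end
  if repo_id.length > MAX_REPO_ID_LENGTH
    raise ValidationError.new("repo_id", "Repository ID is too long")
  end
  if repo_id.count("/") > 1
    raise ValidationError.new("repo_id", "Use 'repo_name' or 'namespace/repo_name'")
  end
  if repo_id.include?("--") || repo_id.include?("..")
    raise ValidationError.new("repo_id", "Cannot have -- or .. in repo_id")
  end
  if repo_id.end_with?(".git")
    raise ValidationError.new("repo_id", "Repository ID cannot end with '.git'")
  end

  # split with -1 keeps empty parts, so "foo/" and "/foo" are rejected.
  repo_id.split("/", -1).each do |part|
    unless part.match?(/\A[\w.-]+\z/) # Ruby's \w is ASCII [a-zA-Z0-9_]
      raise ValidationError.new("repo_id", "Empty part or invalid characters")
    end
    if part.start_with?(*PART_EDGE) || part.end_with?(*PART_EDGE)
      raise ValidationError.new("repo_id", "Parts cannot start or end with '.', '-', or '_'")
    end
  end

  repo_id
end

# Boolean wrapper: true when every rule passes.
def valid_repo_id?(repo_id)
  check_repo_id(repo_id)
  true
rescue ValidationError
  false
end
```

Returning a boolean discards the failing rule's message; keep the raising form where users need to know what to fix.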

.validate_repo_type(repo_type) ⇒ String

Validates that a repository type is one of Constants::REPO_TYPES.

Examples:

Validators.validate_repo_type("model")
Validators.validate_repo_type("dataset")

Parameters:

  • repo_type (String)

    Repository type

Returns:

  • (String)

    The validated repo_type

Raises:

  • (ValidationError)

    If repo_type is not included in Constants::REPO_TYPES

# File 'lib/durable_huggingface_hub/utils/validators.rb', line 195

def self.validate_repo_type(repo_type)
  unless Constants::REPO_TYPES.include?(repo_type)
    valid_types = Constants::REPO_TYPES.join(", ")
    raise ValidationError.new(
      "repo_type",
      "Invalid repository type '#{repo_type}'. Must be one of: #{valid_types}"
    )
  end

  repo_type
end
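The check is an exact `include?` lookup, so it is case-sensitive and does not accept symbols. A minimal sketch of that behavior follows. The contents of `Constants::REPO_TYPES` are not shown on this page, so the three values below are an assumption based on the Hub's model, dataset, and space repository kinds; `ValidationError` is a hypothetical stand-in.

```ruby
# Hypothetical stand-in for the library's ValidationError(field, message).
class ValidationError < StandardError
  attr_reader :field

  def initialize(field, message)
    @field = field
    super(message)
  end
end

# Assumption: stands in for Constants::REPO_TYPES, which this page does not list.
REPO_TYPES = %w[model dataset space].freeze

# Same logic as Validators.validate_repo_type above.
def validate_repo_type(repo_type)
  unless REPO_TYPES.include?(repo_type)
    raise ValidationError.new(
      "repo_type",
      "Invalid repository type '#{repo_type}'. Must be one of: #{REPO_TYPES.join(', ')}"
    )
  end

  repo_type
end

validate_repo_type("dataset")            # accepted
validate_repo_type(:Model.to_s.downcase) # caller-side normalization, then accepted
```

Passing `:model` or `"Model"` directly raises, so callers that accept loose input should normalize it first.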

.validate_revision(revision) ⇒ String

Validates a revision (branch, tag, or commit SHA).

Valid formats:

  • Branch names: “main”, “dev”, “feature/my-feature”

  • Tags: “v1.0.0”, “release-2023”

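  • Commit SHAs: 40 hexadecimal characters

Revisions are limited to 255 characters. Branch and tag names may contain only ASCII letters, digits, “.”, “_”, “/”, and “-”, and cannot start or end with “/”.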

Examples:

Validators.validate_revision("main")
Validators.validate_revision("v1.0.0")
Validators.validate_revision("a" * 40)  # commit SHA

Parameters:

  • revision (String)

    Revision to validate

Returns:

  • (String)

    The validated revision

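Raises:

  • (ValidationError)

    If revision is empty, longer than 255 characters, contains invalid characters, or starts or ends with “/”
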
# File 'lib/durable_huggingface_hub/utils/validators.rb', line 110

def self.validate_revision(revision)
  if revision.nil? || revision.empty?
    raise ValidationError.new("revision", "Revision cannot be empty")
  end

  # Check length (reasonable max for branch/tag names)
  if revision.length > 255
    raise ValidationError.new("revision", "Revision name is too long")
  end

  # A full commit SHA (40 hex chars) is accepted as-is, skipping the name checks below
  if revision.match?(Constants::REGEX_COMMIT_OID)
    return revision
  end

  # For branch/tag names, allow alphanumeric, hyphen, underscore, dot, slash
  unless revision.match?(/\A[a-zA-Z0-9._\/-]+\z/)
    raise ValidationError.new("revision", "Revision contains invalid characters")
  end

  # Disallow leading/trailing slashes
  if revision.start_with?("/") || revision.end_with?("/")
    raise ValidationError.new("revision", "Revision cannot start or end with '/'")
  end

  revision
end
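A full commit SHA returns before the character and slash checks, while anything else is validated as a branch or tag name. The sketch below makes that split visible through a hypothetical `revision_kind` helper that runs the same checks and reports which path accepted the input. `Constants::REGEX_COMMIT_OID` is not shown on this page, so the lowercase 40-hex pattern below is an assumption; `ValidationError` is a hypothetical stand-in.

```ruby
# Hypothetical stand-in for the library's ValidationError(field, message).
class ValidationError < StandardError
  attr_reader :field

  def initialize(field, message)
    @field = field
    super(message)
  end
end

# Assumption: stands in for Constants::REGEX_COMMIT_OID (40 hex characters).
REGEX_COMMIT_OID = /\A[0-9a-f]{40}\z/

# Hypothetical helper: same checks as validate_revision above, but returns
# :commit_sha or :ref_name instead of the revision string.
def revision_kind(revision)
  if revision.nil? || revision.empty?
    raise ValidationError.new("revision", "Revision cannot be empty")
  end
  if revision.length > 255
    raise ValidationError.new("revision", "Revision name is too long")
  end
  return :commit_sha if revision.match?(REGEX_COMMIT_OID)

  unless revision.match?(%r{\A[a-zA-Z0-9._/-]+\z})
    raise ValidationError.new("revision", "Revision contains invalid characters")
  end
  if revision.start_with?("/") || revision.end_with?("/")
    raise ValidationError.new("revision", "Revision cannot start or end with '/'")
  end

  :ref_name
end

revision_kind("feature/my-feature") # :ref_name
revision_kind("a" * 40)             # :commit_sha
```

Short SHAs such as “a1b2c3d” are not matched by the commit pattern, but they still pass as ref names because hexadecimal digits are valid branch and tag characters.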