# stable_diffusion_ruby
Ruby SDK for Stable Diffusion. Run outpainting, inpainting, and text-to-image generation from Ruby — locally or on remote GPUs via Modal. Supports both SDXL and SD 3.5 Large on either backend.
## Installation

```ruby
gem "stable_diffusion_ruby"
```
## Quick start
The gem has two backends and two models — mix and match as you like:
Backends:
| | Local Python | Modal (remote GPU) |
|---|---|---|
| Runs on | Your machine (MPS/CUDA/CPU) | Modal A100 GPU |
| Setup | Python venv with diffusers | Modal account + S3 bucket |
| Cost | Free | ~$0.01 per image |
Models:
| | SDXL | SD 3.5 Large |
|---|---|---|
| Quality | Good | Best |
| Local speed | ~30s on M-series Mac | Needs CUDA GPU (too slow on CPU/MPS) |
| Modal speed | ~5s | ~5s |
Any combination works. SDXL runs well locally on a Mac. SD 3.5 produces better results but realistically needs a CUDA GPU or Modal. If you pass `model: "sd35"` and Modal is configured, the gem automatically routes to Modal. Otherwise it runs locally via Python.
## Configuration

```ruby
# config/initializers/stable_diffusion.rb (Rails)
# or anywhere before first use
StableDiffusionRuby.configure do |config|
  # --- Local Python backend ---
  # Point to a Python that has diffusers, torch, etc. installed.
  # Runs both SDXL and SD 3.5 locally. See "Local setup" below.
  config.python_executable = "/path/to/venv/bin/python3"

  # --- Modal backend (remote GPU) ---
  # Optional. When set, model: "sd35" routes to Modal instead of local Python.
  # See "Production / Modal deployment" below.
  config.modal_endpoint = ENV["MODAL_ENDPOINT_URL"]

  # S3-compatible storage for shipping files to/from Modal.
  # Works with AWS S3, Cloudflare R2, MinIO, etc.
  config.s3_bucket = ENV["S3_BUCKET"]
  config.s3_region = ENV["S3_REGION"]
  config.s3_endpoint = ENV["S3_ENDPOINT"]
  config.s3_access_key = ENV["S3_ACCESS_KEY"]
  config.s3_secret_key = ENV["S3_SECRET_KEY"]

  # Optional
  config.logger = Rails.logger # defaults to $stdout
  config.presigned_url_expiry = 3600 # seconds, default 1 hour
end
```
## Usage

### Outpaint — extend a square image to 16:9

```ruby
result = StableDiffusionRuby.outpaint(
  source_path: "/path/to/square.png",
  prompt: "seamless landscape extension", # optional, auto-generated from the image
  model: "sd35",   # "sd35" or "sdxl" — see "How it works" for routing
  seed_offset: 42  # optional, for reproducibility
)

result.output_path # => "/tmp/sd_a1b2c3_output.png"
result.success?    # => true
```
### Generate — text-to-image

```ruby
result = StableDiffusionRuby.generate(
  prompt: "a neon-lit cityscape at midnight",
  style: "photorealistic", # or "anime"
  model: "sd35"
)

result.output_path # => "/tmp/sd_d4e5f6_output.png"
```
### Inpaint — repaint a masked region

```ruby
result = StableDiffusionRuby.inpaint(
  image_path: "/path/to/scene.png",
  mask_path: "/path/to/mask.png", # white = repaint, black = keep
  prompt: "a red sports car",
  model: "sdxl"
)

result.output_path # => "/tmp/sd_g7h8i9_output.png"
```
## Choosing a backend

By default, the gem auto-detects: `model: "sd35"` routes to Modal when configured; everything else runs locally. You can override this with the `backend:` parameter:

```ruby
# Force Modal for SDXL (needs SDXL weights on the Modal volume — see modal_setup.py)
result = StableDiffusionRuby.generate(prompt: "test", model: "sdxl", backend: :modal)

# Force local Python for SD 3.5 (even when Modal is configured)
result = StableDiffusionRuby.outpaint(source_path: "img.png", model: "sd35", backend: :python)

# Auto-detect (default) — sd35 goes to Modal if available, else local
result = StableDiffusionRuby.outpaint(source_path: "img.png", model: "sd35")
```

Valid values: `:modal`, `:python`, or `nil` (auto-detect).
### Client instance

```ruby
client = StableDiffusionRuby::Client.new
result = client.outpaint(source_path: "img.png", model: "sdxl", backend: :modal)
```
## Error handling

```ruby
result = StableDiffusionRuby.generate(prompt: "test")

if result.success?
  # use result.output_path
else
  puts result.error # human-readable error message
end

# Or let it raise:
begin
  result = StableDiffusionRuby.outpaint(source_path: "img.png", model: "sd35")
rescue StableDiffusionRuby::BackendError => e
  # Modal or Python subprocess failed
rescue StableDiffusionRuby::StorageError => e
  # S3 upload/download failed
rescue StableDiffusionRuby::ConfigurationError => e
  # Missing required config (no modal_endpoint, no python_executable, etc.)
end
```
## How it works

When `backend:` is not specified, the gem auto-detects:

| model | Modal configured? | Backend used |
|---|---|---|
| `"sd35"` | Yes | Modal (remote A100 GPU) |
| `"sd35"` | No | Local Python (needs CUDA GPU) |
| `"sdxl"` | Yes or No | Local Python (works on MPS/CUDA/CPU) |
| `nil` | — | Local Python (auto-selects: SD 3.5 on CUDA, SDXL on MPS/CPU) |

When `backend:` is specified, it overrides auto-detection — any model runs on any backend. Pass `backend: :modal` to run SDXL on Modal, or `backend: :python` to force SD 3.5 locally.
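In code, the auto-detection above amounts to something like this. This is a hypothetical helper written for illustration, not the gem's internals; `modal_configured` stands in for "`config.modal_endpoint` is set":

```ruby
# Sketch of the routing rules in the table above (illustrative only).
def resolve_backend(model:, backend: nil, modal_configured: false)
  # An explicit backend: argument always wins.
  return backend if backend

  # Only sd35 prefers Modal, and only when Modal is configured.
  if model == "sd35" && modal_configured
    :modal
  else
    :python
  end
end
```

Note that `nil` and `"sdxl"` both fall through to local Python, matching the last two table rows.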
### Modal backend flow

```
Your Ruby app
  -> uploads input image to S3, gets presigned GET URL
  -> creates presigned PUT URL for the output
  -> POSTs { source_url, output_upload_url } to Modal endpoint
  -> Modal downloads input, runs inference (SD 3.5 or SDXL), uploads result to PUT URL
  -> gem downloads result from S3 to local /tmp file
  -> cleans up all temp S3 objects
  -> returns Result with output_path
```
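The round trip can be sketched in plain Ruby. Everything here is illustrative: `storage` and `post` are injected stand-ins for the gem's real S3 and HTTP plumbing, the method and key names are invented, and the `-outpaint` URL suffix is a simplification of the deployed Modal function URL:

```ruby
require "json"

# Hypothetical sketch of the flow above. `storage` responds to
# upload/presigned_get/presigned_put/download/delete; `post.call(url, body)`
# performs the HTTP POST to the Modal endpoint.
def modal_round_trip(endpoint:, storage:, post:, source_path:)
  in_key  = "inputs/#{File.basename(source_path)}"
  out_key = "outputs/#{File.basename(source_path)}"

  storage.upload(in_key, source_path) # ship the input to S3

  payload = JSON.generate({
    source_url: storage.presigned_get(in_key),        # Modal downloads from here
    output_upload_url: storage.presigned_put(out_key) # Modal uploads result here
  })
  # Simplified: the real deployed URL likely carries a .modal.run suffix.
  post.call("#{endpoint}-outpaint", payload)

  local_path = storage.download(out_key)            # fetch the finished image
  [in_key, out_key].each { |k| storage.delete(k) }  # clean up temp objects
  local_path
end
```

The GPU container never receives credentials; it only sees short-lived presigned URLs.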
### Python backend flow

```
Your Ruby app
  -> sends JSON { command, params } to worker.py via stdin
  -> worker.py loads SDXL pipeline, runs inference
  -> returns JSON { output_path } on stdout
  -> gem returns Result with output_path
```
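A minimal sketch of that stdin/stdout JSON handshake, assuming `worker_cmd` is the full command line (the configured Python plus the bundled `worker.py` path); the gem's actual internals may differ:

```ruby
require "json"
require "open3"

# Illustrative only: send one JSON request on stdin, read one JSON
# response from stdout, surface stderr on failure.
def call_worker(worker_cmd, command, params)
  request = JSON.generate({ "command" => command, "params" => params })
  stdout, stderr, status = Open3.capture3(*worker_cmd, stdin_data: request)
  raise "worker failed: #{stderr}" unless status.success?
  JSON.parse(stdout) # e.g. { "output_path" => "/tmp/..._output.png" }
end
```

Keeping the protocol to one request per process is what makes each call pay the model-load cost; see "First run" below.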
## Local setup
The gem bundles its own Python inference scripts. You just need a Python environment with the right packages installed. The gem calls that Python as a subprocess — it doesn't need to be in the same virtualenv as your app.
### Option A: Dedicated venv (recommended)

```bash
# Create a venv — location doesn't matter, pick anywhere convenient
python3 -m venv ~/.stable_diffusion_venv
source ~/.stable_diffusion_venv/bin/activate

# Install dependencies from the gem's bundled requirements.txt
pip install -r $(bundle show stable_diffusion_ruby)/python/requirements.txt
# This installs: diffusers, transformers, torch, accelerate, Pillow, numpy,
# opencv-python-headless, sentencepiece, protobuf (~3GB total)
```
Then configure:
```ruby
StableDiffusionRuby.configure do |config|
  config.python_executable = File.expand_path("~/.stable_diffusion_venv/bin/python3")
end
```
### Option B: Existing project venv
If your project already has a Python venv with diffusers and torch installed (e.g. for other ML tasks), just point to it:
```ruby
StableDiffusionRuby.configure do |config|
  config.python_executable = Rails.root.join("python", ".venv", "bin", "python3").to_s
end
```
### Option C: System Python
If your system Python has the packages installed globally (e.g. in a Docker container):
```ruby
StableDiffusionRuby.configure do |config|
  config.python_executable = "/usr/bin/python3"
end
```
### First run

The first call downloads the SDXL model weights (~7GB) from HuggingFace. diffusers caches them in `~/.cache/huggingface/` and reuses them on subsequent calls. Because each call spawns a fresh Python subprocess, the model is reloaded from that cache every time; for faster repeated calls, use the Modal backend, which keeps the pipeline loaded in memory between requests.
## Production / Modal deployment
For production, you'll want the Modal backend — it runs SD 3.5 Large on A100 GPUs with models preloaded in memory. Cold starts take ~15s, warm requests ~5s.
### Prerequisites
- A Modal account (free tier gives $30/month credits)
- An S3-compatible bucket for file transfer (AWS S3, Cloudflare R2, MinIO, etc.)
- A HuggingFace account with access to SD 3.5 Large
### Step 1: Install the Modal CLI

```bash
pip install modal
modal token new
```
### Step 2: Download model weights to a Modal Volume
This downloads SD 3.5 Large (~12GB) to a persistent Modal Volume so it doesn't re-download on each container start:
```bash
# Set your HuggingFace token as a Modal secret first:
# modal secret create huggingface HF_TOKEN=hf_xxxxx
modal run $(bundle show stable_diffusion_ruby)/python/modal_setup.py
```
### Step 3: Deploy the inference endpoint

```bash
modal deploy $(bundle show stable_diffusion_ruby)/python/modal_app.py
```
This prints a URL like:
```
https://yourname--stable-diffusion-ruby-inference-outpaint.modal.run
```
The base URL (everything before `-outpaint`) is your `modal_endpoint`:

```
https://yourname--stable-diffusion-ruby-inference
```
### Step 4: Configure

```ruby
StableDiffusionRuby.configure do |config|
  config.modal_endpoint = "https://yourname--stable-diffusion-ruby-inference"
  config.s3_bucket = "my-bucket"
  config.s3_region = "us-east-1"
  config.s3_access_key = ENV["S3_ACCESS_KEY"]
  config.s3_secret_key = ENV["S3_SECRET_KEY"]
  # config.s3_endpoint = "https://xxx.r2.cloudflarestorage.com" # for R2
end
```
### How Modal billing works
- Containers spin up on demand and shut down after 60s of idle time
- You pay only for GPU-seconds used (~$0.01 per image)
- No fixed costs, no reserved instances
- The Volume created by `modal_setup.py` stores model weights so containers start fast
### Production architecture
```
Your Rails app (any host: Heroku, Render, EC2, etc.)
  |
  |-- config.modal_endpoint = "https://..."
  |-- config.s3_bucket = "my-bucket"
  |
  v
S3 bucket (input images + presigned URLs)
  |
  v
Modal A100 GPU container
  |-- SD 3.5 Large preloaded from Volume
  |-- Downloads input from S3 GET URL
  |-- Runs inference (~5s)
  |-- Uploads result to S3 PUT URL
  |
  v
S3 bucket (output images)
  |
  v
Your Rails app downloads result, cleans up S3 temp files
```
Your Rails app never touches a GPU. It uploads to S3, calls Modal, downloads the result. Works from any hosting provider.
## Image processing features
The Python inference code includes:
- Auto-prompting — analyzes source image colors, tone, saturation, contrast, and texture to generate contextual prompts
- Laplacian pyramid blending — seamless multi-band compositing between generated and original regions
- Reinhard color transfer — matches generated region colors to the original image distribution
- Feathered masking — cosine-ramped mask edges to avoid ringing artifacts
- Pipeline caching — model stays loaded in memory across requests on Modal (not across local subprocess calls)
## Customizing prompts
The gem uses prompt templates for generation and outpainting. You can override them by placing files in a custom prompts directory, or by always passing explicit prompt: arguments.
Default prompt files (bundled with the gem in `python/prompts/`):
- `outpaint.txt` — template for outpainting (uses `{tone}`, `{colors}`, `{vibrancy}`, `{contrast}`, `{texture}` placeholders filled from image analysis)
- `outpaint_negative.txt` — negative prompt for outpainting
- `generate_photorealistic.txt` — style prompt for photorealistic generation
- `generate_anime.txt` — style prompt for anime generation
- `generate_negative.txt` — negative prompt for generation
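As a rough illustration of how such `{...}` placeholders get filled from the image analysis, here is a hypothetical Ruby helper; the actual substitution happens inside the gem's bundled Python code:

```ruby
# Illustrative only: replace {name} placeholders with values from an
# analysis hash, leaving unknown placeholders blank.
def fill_template(template, analysis)
  template.gsub(/\{(\w+)\}/) { analysis.fetch(Regexp.last_match(1), "") }
end

fill_template("a {tone} scene, {colors} palette",
              { "tone" => "warm", "colors" => "amber" })
# => "a warm scene, amber palette"
```

Passing an explicit `prompt:` argument to `outpaint` or `generate` bypasses the templates entirely.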
## License
MIT