stable_diffusion_ruby

Ruby SDK for Stable Diffusion. Run outpainting, inpainting, and text-to-image generation from Ruby — locally or on remote GPUs via Modal. Supports both SDXL and SD 3.5 Large on either backend.

Installation

gem "stable_diffusion_ruby"

Quick start

The gem has two backends and two models — mix and match as you like:

Backends:

             Local Python                  Modal (remote GPU)
Runs on      Your machine (MPS/CUDA/CPU)   Modal A100 GPU
Setup        Python venv with diffusers    Modal account + S3 bucket
Cost         Free                          ~$0.01 per image

Models:

              SDXL                   SD 3.5 Large
Quality       Good                   Best
Local speed   ~30s on M-series Mac   Needs a CUDA GPU (too slow on CPU/MPS)
Modal speed   ~5s                    ~5s

Any combination works. SDXL runs well locally on a Mac. SD 3.5 produces better results but realistically needs a CUDA GPU or Modal. If you pass model: "sd35" and Modal is configured, the gem automatically routes to Modal. Otherwise it runs locally via Python.

Configuration

# config/initializers/stable_diffusion.rb (Rails)
# or anywhere before first use

StableDiffusionRuby.configure do |config|
  # --- Local Python backend ---
  # Point to a Python that has diffusers, torch, etc. installed.
  # Runs both SDXL and SD 3.5 locally. See "Local setup" below.
  config.python_executable = "/path/to/venv/bin/python3"

  # --- Modal backend (remote GPU) ---
  # Optional. When set, model: "sd35" routes to Modal instead of local Python.
  # See "Production / Modal deployment" below.
  config.modal_endpoint = ENV["MODAL_ENDPOINT_URL"]

  # S3-compatible storage for shipping files to/from Modal.
  # Works with AWS S3, Cloudflare R2, MinIO, etc.
  config.s3_bucket     = ENV["S3_BUCKET"]
  config.s3_region     = ENV["S3_REGION"]
  config.s3_endpoint   = ENV["S3_ENDPOINT"]
  config.s3_access_key = ENV["S3_ACCESS_KEY"]
  config.s3_secret_key = ENV["S3_SECRET_KEY"]

  # Optional
  config.logger = Rails.logger          # defaults to $stdout
  config.presigned_url_expiry = 3600    # seconds, default 1 hour
end

Usage

Outpaint — extend a square image to 16:9

result = StableDiffusionRuby.outpaint(
  source_path: "/path/to/square.png",
  prompt: "seamless landscape extension",  # optional; auto-generated from the image if omitted
  model: "sd35",       # "sd35" or "sdxl" — see "How it works" for routing
  seed_offset: 42      # optional, for reproducibility
)

result.output_path  # => "/tmp/sd_a1b2c3_output.png"
result.success?     # => true

Generate — text-to-image

result = StableDiffusionRuby.generate(
  prompt: "a neon-lit cityscape at midnight",
  style: "photorealistic",  # or "anime"
  model: "sd35"
)

result.output_path  # => "/tmp/sd_d4e5f6_output.png"

Inpaint — repaint a masked region

result = StableDiffusionRuby.inpaint(
  image_path: "/path/to/scene.png",
  mask_path: "/path/to/mask.png",  # white = repaint, black = keep
  prompt: "a red sports car",
  model: "sdxl"
)

result.output_path  # => "/tmp/sd_g7h8i9_output.png"

Choosing a backend

By default, the gem auto-detects the backend: model: "sd35" routes to Modal when Modal is configured; everything else runs locally. You can override this with the backend: parameter:

# Force Modal for SDXL (needs SDXL weights on the Modal volume — see modal_setup.py)
result = StableDiffusionRuby.generate(prompt: "test", model: "sdxl", backend: :modal)

# Force local Python for SD 3.5 (even when Modal is configured)
result = StableDiffusionRuby.outpaint(source_path: "img.png", model: "sd35", backend: :python)

# Auto-detect (default) — sd35 goes to Modal if available, else local
result = StableDiffusionRuby.outpaint(source_path: "img.png", model: "sd35")

Valid values: :modal, :python, or nil (auto-detect).

Client instance

client = StableDiffusionRuby::Client.new
result = client.outpaint(source_path: "img.png", model: "sdxl", backend: :modal)

Error handling

result = StableDiffusionRuby.generate(prompt: "test")

if result.success?
  # use result.output_path
else
  puts result.error  # human-readable error message
end

# Or let it raise:
begin
  result = StableDiffusionRuby.outpaint(source_path: "img.png", model: "sd35")
rescue StableDiffusionRuby::BackendError => e
  # Modal or Python subprocess failed
rescue StableDiffusionRuby::StorageError => e
  # S3 upload/download failed
rescue StableDiffusionRuby::ConfigurationError => e
  # Missing required config (no modal_endpoint, no python_executable, etc.)
end

How it works

When backend: is not specified, the gem auto-detects:

model    Modal configured?   Backend used
"sd35"   Yes                 Modal (remote A100 GPU)
"sd35"   No                  Local Python (needs a CUDA GPU)
"sdxl"   Yes or No           Local Python (works on MPS/CUDA/CPU)
nil      Yes or No           Local Python (auto-selects: SD 3.5 on CUDA, SDXL on MPS/CPU)

When backend: is specified, it overrides auto-detection — any model runs on any backend. Pass backend: :modal to run SDXL on Modal, or backend: :python to force SD 3.5 locally.
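
The routing table above can be sketched as a small resolver. This is a hypothetical illustration of the documented behavior, not the gem's actual internals; the method name and parameters are assumed:

```ruby
# Resolves which backend handles a request, per the table above.
# An explicit backend: override always wins; otherwise only "sd35"
# with Modal configured goes remote, and everything else runs locally.
def resolve_backend(model:, backend: nil, modal_configured: false)
  return backend if backend
  return :modal if model == "sd35" && modal_configured
  :python
end
```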

Modal backend flow

Your Ruby app
  -> uploads input image to S3, gets presigned GET URL
  -> creates presigned PUT URL for the output
  -> POSTs { source_url, output_upload_url } to Modal endpoint
  -> Modal downloads input, runs inference (SD 3.5 or SDXL), uploads result to PUT URL
  -> gem downloads result from S3 to local /tmp file
  -> cleans up all temp S3 objects
  -> returns Result with output_path
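
The POST step in the flow above can be sketched as follows. Field names mirror the flow description; the exact wire format is an assumption, not the gem's verified payload:

```ruby
require "json"

# Hedged sketch: build the JSON body POSTed to the Modal endpoint,
# carrying a presigned GET URL for the input and a presigned PUT URL
# for the output. URLs below are placeholders.
def modal_request_body(source_url:, output_upload_url:)
  JSON.generate(source_url: source_url, output_upload_url: output_upload_url)
end

body = modal_request_body(
  source_url: "https://bucket.s3.amazonaws.com/in.png?X-Amz-Signature=abc",
  output_upload_url: "https://bucket.s3.amazonaws.com/out.png?X-Amz-Signature=def"
)
```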

Python backend flow

Your Ruby app
  -> sends JSON { command, params } to worker.py via stdin
  -> worker.py loads SDXL pipeline, runs inference
  -> returns JSON { output_path } on stdout
  -> gem returns Result with output_path
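
The stdin/stdout protocol above can be exercised end to end with a stand-in subprocess. Here `cat` simply echoes the request back in place of worker.py, so the round-trip runs anywhere; the real worker parses the request and replies with {"output_path": ...}:

```ruby
require "json"
require "open3"

# Build the JSON request as the gem would, pipe it through a subprocess,
# and parse the reply from stdout. "cat" stands in for worker.py.
request = JSON.generate(command: "generate", params: { prompt: "a cat" })
stdout, _status = Open3.capture2("cat", stdin_data: request)
echoed = JSON.parse(stdout)
```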

Local setup

The gem bundles its own Python inference scripts. You just need a Python environment with the right packages installed. The gem calls that Python as a subprocess — it doesn't need to be in the same virtualenv as your app.

Option A: Dedicated venv

# Create a venv — location doesn't matter, pick anywhere convenient
python3 -m venv ~/.stable_diffusion_venv
source ~/.stable_diffusion_venv/bin/activate

# Install dependencies from the gem's bundled requirements.txt
pip install -r $(bundle show stable_diffusion_ruby)/python/requirements.txt

# This installs: diffusers, transformers, torch, accelerate, Pillow, numpy,
# opencv-python-headless, sentencepiece, protobuf (~3GB total)

Then configure:

StableDiffusionRuby.configure do |config|
  config.python_executable = File.expand_path("~/.stable_diffusion_venv/bin/python3")
end

Option B: Existing project venv

If your project already has a Python venv with diffusers and torch installed (e.g. for other ML tasks), just point to it:

StableDiffusionRuby.configure do |config|
  config.python_executable = Rails.root.join("python", ".venv", "bin", "python3").to_s
end

Option C: System Python

If your system Python has the packages installed globally (e.g. in a Docker container):

StableDiffusionRuby.configure do |config|
  config.python_executable = "/usr/bin/python3"
end

First run

The first call downloads the SDXL model weights from HuggingFace (~7GB). diffusers caches them in ~/.cache/huggingface/ and reuses them on subsequent calls. Note that each call spawns a fresh Python subprocess, so the model is reloaded into memory every time; there is no cross-call caching on the local backend. For faster repeated calls, use the Modal backend, which keeps the pipeline loaded across requests.

Production / Modal deployment

For production, you'll want the Modal backend — it runs SD 3.5 Large on A100 GPUs with models preloaded in memory. Cold starts take ~15s, warm requests ~5s.

Prerequisites

  1. A Modal account (free tier gives $30/month credits)
  2. An S3-compatible bucket for file transfer (AWS S3, Cloudflare R2, MinIO, etc.)
  3. A HuggingFace account with access to SD 3.5 Large

Step 1: Install the Modal CLI

pip install modal
modal token new

Step 2: Download model weights to a Modal Volume

This downloads SD 3.5 Large (~12GB) to a persistent Modal Volume so it doesn't re-download on each container start:

# Set your HuggingFace token as a Modal secret first:
#   modal secret create huggingface HF_TOKEN=hf_xxxxx

modal run $(bundle show stable_diffusion_ruby)/python/modal_setup.py

Step 3: Deploy the inference endpoint

modal deploy $(bundle show stable_diffusion_ruby)/python/modal_app.py

This prints a URL like:

https://yourname--stable-diffusion-ruby-inference-outpaint.modal.run

The base URL (everything before -outpaint) is your modal_endpoint:

https://yourname--stable-diffusion-ruby-inference
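
One way to derive the base URL programmatically, assuming the deploy URL shape shown above:

```ruby
# Strip the function suffix and domain from the printed deploy URL
# to get the modal_endpoint value.
deploy_url = "https://yourname--stable-diffusion-ruby-inference-outpaint.modal.run"
modal_endpoint = deploy_url.sub(/-outpaint\.modal\.run\z/, "")
```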

Step 4: Configure

StableDiffusionRuby.configure do |config|
  config.modal_endpoint = "https://yourname--stable-diffusion-ruby-inference"

  config.s3_bucket     = "my-bucket"
  config.s3_region     = "us-east-1"
  config.s3_access_key = ENV["S3_ACCESS_KEY"]
  config.s3_secret_key = ENV["S3_SECRET_KEY"]
  # config.s3_endpoint = "https://xxx.r2.cloudflarestorage.com"  # for R2
end

How Modal billing works

  • Containers spin up on demand and shut down after 60s of idle time
  • You pay only for GPU-seconds used (~$0.01 per image)
  • No fixed costs, no reserved instances
  • The modal_setup.py volume stores model weights so containers start fast

Production architecture

Your Rails app (any host: Heroku, Render, EC2, etc.)
  |
  |-- config.modal_endpoint = "https://..."
  |-- config.s3_bucket = "my-bucket"
  |
  v
S3 bucket (input images + presigned URLs)
  |
  v
Modal A100 GPU container
  |-- SD 3.5 Large preloaded from Volume
  |-- Downloads input from S3 GET URL
  |-- Runs inference (~5s)
  |-- Uploads result to S3 PUT URL
  |
  v
S3 bucket (output images)
  |
  v
Your Rails app downloads result, cleans up S3 temp files

Your Rails app never touches a GPU. It uploads to S3, calls Modal, downloads the result. Works from any hosting provider.

Image processing features

The Python inference code includes:

  • Auto-prompting — analyzes source image colors, tone, saturation, contrast, and texture to generate contextual prompts
  • Laplacian pyramid blending — seamless multi-band compositing between generated and original regions
  • Reinhard color transfer — matches generated region colors to the original image distribution
  • Feathered masking — cosine-ramped mask edges to avoid ringing artifacts
  • Pipeline caching — model stays loaded in memory across requests on Modal (not across local subprocess calls)
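
As an illustration of the feathered-masking idea (a sketch of the math, not the gem's Python implementation): the blend weight rises from 0 to 1 along a half-cosine over the feather width, which avoids the hard mask edge that causes ringing artifacts.

```ruby
# Cosine-ramped feathering: weight eases smoothly from 0 (keep original)
# to 1 (fully generated) as distance into the mask grows toward `width`.
def feather_weight(distance, width)
  return 0.0 if distance <= 0
  return 1.0 if distance >= width
  0.5 * (1 - Math.cos(Math::PI * distance.to_f / width))
end
```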

Customizing prompts

The gem uses prompt templates for generation and outpainting. You can override them by placing files in a custom prompts directory, or by always passing explicit prompt: arguments.

Default prompt files (bundled with the gem in python/prompts/):

  • outpaint.txt — template for outpainting (uses {tone}, {colors}, {vibrancy}, {contrast}, {texture} placeholders filled from image analysis)
  • outpaint_negative.txt — negative prompt for outpainting
  • generate_photorealistic.txt — style prompt for photorealistic generation
  • generate_anime.txt — style prompt for anime generation
  • generate_negative.txt — negative prompt for generation
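
For illustration, placeholder filling of the kind outpaint.txt uses might look like this. The template text and analysis values here are hypothetical, and this is not the gem's actual substitution code:

```ruby
# Replace {name} placeholders in a template with values from an
# image-analysis hash, leaving unknown placeholders empty.
TEMPLATE = "a {tone} scene with {colors} colors, {vibrancy} vibrancy"

def fill_template(template, analysis)
  template.gsub(/\{(\w+)\}/) { analysis.fetch($1.to_sym, "") }
end

prompt = fill_template(TEMPLATE, tone: "warm", colors: "earthy", vibrancy: "muted")
```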

License

MIT