S3arch
Full-text search for DynamoDB on AWS Lambda using SQLite FTS5. Per-owner indexes stored on S3, queried from Lambda /tmp with LRU eviction and DynamoDB version tracking.
Why?
DynamoDB doesn't have native full-text search. OpenSearch/Elasticsearch is expensive and complex for many use cases. S3arch gives you fast, typo-tolerant search with zero infrastructure beyond what you already have (Lambda, S3, DynamoDB).
How it works:
DynamoDB (source) → Stream → EventBridge Pipe → SQS → Indexer Lambda
                                                             ↓
                                                      SQLite FTS5 DB
                                                             ↓
                                                      S3 (per-owner)
                                                             ↓
                                                      Search Lambda ← /tmp cache (LRU)
Each owner (user, tenant, account) gets their own SQLite database. The indexer rebuilds it on every change. The searcher downloads it to /tmp on first query, then serves subsequent queries from the warm cache until the version changes.
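The warm-cache behavior described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not S3arch's actual implementation: the `WarmCache` class, its `fetch` method, and the block standing in for the S3 download are all hypothetical names.

```ruby
# Hypothetical sketch of a per-owner warm cache with LRU eviction.
# None of these names come from S3arch itself.
class WarmCache
  def initialize(max_entries: 20)
    @max_entries = max_entries
    @entries = {} # owner_id => { version:, path: }; Ruby hashes keep insertion order
  end

  # Returns the local path for owner_id. The block (a stand-in for the S3
  # download) runs only when the owner is not cached or the version differs.
  def fetch(owner_id, version)
    entry = @entries.delete(owner_id) # removed so re-inserting marks it most recent
    if entry.nil? || entry[:version] != version
      entry = { version: version, path: yield(owner_id, version) }
    end
    @entries[owner_id] = entry
    evict!
    entry[:path]
  end

  def cached_owner_ids
    @entries.keys
  end

  private

  # Drops least recently used entries (front of the hash) until within the limit.
  def evict!
    @entries.shift while @entries.size > @max_entries
  end
end

downloads = []
cache = WarmCache.new(max_entries: 2)
loader = lambda do |owner_id, version|
  downloads << owner_id
  "/tmp/#{owner_id}-#{version}.sqlite"
end

cache.fetch('user-1', 1, &loader) # cold: downloads
cache.fetch('user-1', 1, &loader) # warm: served from cache
cache.fetch('user-1', 2, &loader) # version changed: downloads again
```

A real cache would presumably also delete the evicted file from /tmp; the sketch tracks only the bookkeeping.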
Installation
gem 's3arch'
Requires the sqlite3 native extension at runtime. On Lambda, build the layer with rake s3arch:layer:build (see Building the Lambda Layer).
Usage
Configuration
require 's3arch'
S3arch.configure do |c|
c.from_env! # Reads S3ARCH_* env vars
c.source_index = 'UserIndex' # GSI for owner lookup
c.owner_key = 'userId' # Partition key for owner
c.searchable_fields = %w[name description tags] # FTS5 indexed fields
c.metadata_fields = %w[status created_at] # Stored for filtering (not searched)
c.record_filter = ->(item) { item['status'] == 'active' }
c.logger = Logger.new($stdout)
end
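To show how the field options and `record_filter` relate, here is a rough sketch of turning one source item into an index row. It assumes the item has already been converted to a plain Ruby hash. The `build_row` helper and the row shape are hypothetical and only illustrate the intent of the options, not what S3arch does internally.

```ruby
# Hypothetical helper: illustrates the intent of the options, not S3arch internals.
SEARCHABLE_FIELDS = %w[name description tags].freeze
METADATA_FIELDS   = %w[status created_at].freeze
RECORD_FILTER     = ->(item) { item['status'] == 'active' }

# Returns nil for items rejected by the filter, otherwise a hash holding the
# text to index and the metadata to store alongside it.
def build_row(item)
  return nil unless RECORD_FILTER.call(item)

  {
    # Array values (such as tags) are flattened into one space-separated string.
    searchable: SEARCHABLE_FIELDS.to_h { |f| [f, Array(item[f]).join(' ')] },
    metadata: METADATA_FIELDS.to_h { |f| [f, item[f]] }
  }
end

active   = { 'name' => 'Blue chair', 'tags' => %w[furniture blue], 'status' => 'active' }
archived = active.merge('status' => 'archived')

row = build_row(active)      # indexed
skipped = build_row(archived) # nil: filtered out before indexing
```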
Lambda Handlers
Indexer (triggered by SQS from DynamoDB Streams):
def handler(event:, context:)
S3arch::Handler.indexer(event)
end
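For orientation, the sketch below shows one way the owners affected by a batch could be pulled out of the incoming event. It assumes each SQS message body is a DynamoDB stream record serialized as JSON and that the owner attribute is a string named `userId`. The `owner_ids_from` helper is hypothetical; S3arch's own `owner_extractor` may work differently.

```ruby
require 'json'
require 'set'

# Hypothetical extractor: assumes each SQS body is a DynamoDB stream record
# in JSON form and that the owner attribute is a string ("S") attribute.
def owner_ids_from(event, owner_key: 'userId')
  owners = event.fetch('Records', []).each_with_object(Set.new) do |message, set|
    record = JSON.parse(message['body'])
    images = record.fetch('dynamodb', {})
    # REMOVE events carry only OldImage, so fall back to it.
    image = images['NewImage'] || images['OldImage'] || {}
    owner = image.dig(owner_key, 'S')
    set << owner if owner
  end
  owners.to_a
end

# Builds a sample message body with the given image key and owner.
body = lambda do |image_key, owner|
  JSON.generate('dynamodb' => { image_key => { 'userId' => { 'S' => owner } } })
end

event = {
  'Records' => [
    { 'body' => body.call('NewImage', 'user-123') },
    { 'body' => body.call('NewImage', 'user-123') },
    { 'body' => body.call('OldImage', 'user-456') }
  ]
}

owners = owner_ids_from(event)
```

Deduplicating owners means a batch with several changes for the same owner would trigger one rebuild rather than one per message.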
Searcher (invoked directly or via API Gateway):
def handler(event:, context:)
result = S3arch::Handler.search(event)
# => { record_ids: ["id1", "id2", ...], search_mode: "fts5" }
end
Search Event Format
{
"query": "blue chair",
"owner_ids": ["user-123", "user-456"],
"filters": { "status": "active" }
}
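When the searcher sits behind API Gateway's proxy integration, the JSON above arrives as a string in the event's `body` field rather than as top-level keys. Whether S3arch unwraps that itself is not stated here, so the following is only a sketch of normalizing both shapes; `normalize_search_event` is a hypothetical helper, not part of the gem.

```ruby
require 'json'

# Hypothetical helper, not part of S3arch: accepts either a direct-invoke
# payload or an API Gateway proxy event whose JSON payload is in "body".
def normalize_search_event(event)
  payload = event['body'].is_a?(String) ? JSON.parse(event['body']) : event
  {
    'query' => payload.fetch('query').to_s.strip,
    'owner_ids' => Array(payload.fetch('owner_ids')),
    'filters' => payload.fetch('filters', {})
  }
end

direct  = { 'query' => ' blue chair ', 'owner_ids' => ['user-123'] }
proxied = { 'body' => JSON.generate(direct.merge('filters' => { 'status' => 'active' })) }

from_direct  = normalize_search_event(direct)
from_proxied = normalize_search_event(proxied)
```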
Manual Indexing
indexer = S3arch::Indexer.new
indexer.rebuild("user-123")
Environment Variables
| Variable | Used By | Description |
|---|---|---|
| S3ARCH_SOURCE_TABLE | Indexer | DynamoDB source table name |
| S3ARCH_SOURCE_INDEX | Indexer | GSI name for owner lookup (default: UserIndex) |
| S3ARCH_INDEX_BUCKET | Both | S3 bucket for SQLite index files |
| S3ARCH_VERSION_TABLE | Both | DynamoDB version tracking table |
Infrastructure
A Terraform module is provided at terraform/ that provisions:
- S3 bucket for index storage (with 90-day lifecycle)
- DynamoDB version tracking table (PAY_PER_REQUEST)
- SQS queue + DLQ for the indexer
- EventBridge Pipe (DynamoDB Stream → SQS)
module "s3arch" {
source = "github.com/stowzilla/s3arch//terraform"
app_name = "myapp"
environment = "production"
source_table_name = aws_dynamodb_table.items.name
source_table_arn = aws_dynamodb_table.items.arn
source_table_stream_arn = aws_dynamodb_table.items.stream_arn
}
Outputs include indexer_env_vars, searcher_env_vars, indexer_permissions, and searcher_permissions for easy Lambda configuration.
Configuration Options
| Option | Default | Description |
|---|---|---|
| source_table | — | DynamoDB table with source records |
| source_index | — | GSI name for querying by owner |
| owner_key | "user_id" | Partition key field for owner lookup |
| index_bucket | — | S3 bucket for SQLite files |
| version_table | — | DynamoDB table for version tracking |
| searchable_fields | ["name", "description"] | Fields indexed in FTS5 |
| metadata_fields | ["status", "created_at"] | Fields stored for filtering |
| record_filter | ->(_) { true } | Proc to filter records during indexing |
| owner_extractor | (extracts from DynamoDB stream image) | Proc to extract owner_id from stream event |
| version_ttl | 30 (seconds) | How long to cache version checks |
| max_results | 50 | Maximum search results returned |
| max_cached_dbs | 20 | Max databases cached in /tmp |
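The effect of `version_ttl` can be illustrated with a small TTL-bounded lookup. This is a sketch under assumptions: the `VersionCache` class, the injectable clock, and the block standing in for the DynamoDB read are hypothetical and not taken from S3arch.

```ruby
# Hypothetical sketch of a TTL-bounded version lookup; names are illustrative.
class VersionCache
  def initialize(ttl: 30, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @entries = {} # owner_id => [version, fetched_at]
  end

  # Returns the cached version while it is younger than ttl seconds; otherwise
  # calls the block (standing in for a DynamoDB read) and caches the result.
  def version_for(owner_id)
    version, fetched_at = @entries[owner_id]
    now = @clock.call
    return version if fetched_at && now - fetched_at < @ttl

    fresh = yield(owner_id)
    @entries[owner_id] = [fresh, now]
    fresh
  end
end

now = 0.0     # fake clock so the example does not need to sleep
lookups = 0
versions = VersionCache.new(ttl: 30, clock: -> { now })
read = ->(_owner_id) { lookups += 1 }

versions.version_for('user-123', &read) # reads the table
now = 10.0
versions.version_for('user-123', &read) # within TTL: served from cache
now = 31.0
versions.version_for('user-123', &read) # TTL expired: reads again
```

The trade-off is staleness: with a 30-second TTL, a warm Lambda may keep serving an older index for up to 30 seconds after a rebuild.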
How Search Works
- Query arrives with owner_ids and a search string
- For each owner, check DynamoDB version table (cached for version_ttl seconds)
- If version changed (or first request), download SQLite DB from S3 to /tmp
- Run FTS5 MATCH query with prefix matching (term*)
- Apply metadata filters, sort by rank, return record IDs
- LRU eviction when /tmp fills up
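As an illustration of the prefix-matching step, the sketch below turns a raw search string into an FTS5 MATCH expression. It assumes each term is wrapped in double quotes (so FTS5 operators in user input are treated as plain text) before `*` is appended; S3arch may build its queries differently. The `fts5_prefix_query` helper and the table and column names in the comment are hypothetical.

```ruby
# Hypothetical query builder: quotes each whitespace-separated term, doubling
# any embedded double quotes, then appends * so FTS5 treats it as a prefix.
def fts5_prefix_query(input)
  input.to_s.split.map { |term| '"' + term.gsub('"', '""') + '"*' }.join(' ')
end

match = fts5_prefix_query('blue cha')

# The expression would then be bound as a parameter, for example:
#   SELECT record_id FROM records_fts
#   WHERE records_fts MATCH ? ORDER BY rank LIMIT 50
# (records_fts and record_id are assumed names, not S3arch's schema).
```

FTS5 treats space-separated phrases as an implicit AND, so every term must match as a prefix for a row to be returned.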
Building the Lambda Layer
S3arch requires the sqlite3 native extension at runtime. A Rake task is included to build and publish the Lambda layer:
# Build the layer zip (outputs to pkg/sqlite-layer.zip)
rake s3arch:layer:build
# Build and publish in one step
rake s3arch:layer:publish PROFILE=devzilla
# Customize the build
rake s3arch:layer:build RUBY_VERSION=3.4 ARCHITECTURE=x86_64
# Publish to a specific region/account
rake s3arch:layer:publish PROFILE=production REGION=us-west-2 LAYER_NAME=stowzilla-sqlite3-ruby
| Environment Variable | Default | Description |
|---|---|---|
| RUBY_VERSION | 3.4 | Ruby runtime version |
| ARCHITECTURE | x86_64 | x86_64 or arm64 |
| OUTPUT | pkg/sqlite-layer.zip | Output path for the zip |
| PROFILE | (none) | AWS CLI profile for publishing |
| REGION | us-east-1 | AWS region to publish to |
| LAYER_NAME | stowzilla-sqlite3-ruby | Layer name in AWS |
| S3ARCH_VERSION | ~> current minor | Version constraint for s3arch in the layer |
Requires Docker to be running (uses the official AWS SAM build images).
Rake Tasks
rake s3arch:layer:build # Build the Lambda layer zip
rake s3arch:layer:publish # Build and publish to AWS
rake s3arch:rebuild # Rebuild index for an owner (OWNER_ID=xxx)
rake s3arch:info # Show config and version info
To make these tasks available in your app, add to your Rakefile:
require 's3arch/tasks'
Requirements
- Ruby >= 3.2
- AWS Lambda with /tmp storage (recommend 2048MB ephemeral)
- SQLite3 native extension (via Lambda layer)
- DynamoDB table with streams enabled
- S3 bucket for index storage
- Docker (for building the Lambda layer)
License
MIT