S3arch

Full-text search for DynamoDB on AWS Lambda using SQLite FTS5. Per-owner indexes stored on S3, queried from Lambda /tmp with LRU eviction and DynamoDB version tracking.

Why?

DynamoDB has no native full-text search, and OpenSearch/Elasticsearch is often more expensive and complex than many use cases justify. S3arch gives you fast, typo-tolerant search with zero infrastructure beyond what you already have (Lambda, S3, DynamoDB).

How It Works

DynamoDB (source) → Stream → EventBridge Pipe → SQS → Indexer Lambda
                                                         ↓
                                                    SQLite FTS5 DB
                                                         ↓
                                                    S3 (per-owner)
                                                         ↓
                                    Search Lambda ← /tmp cache (LRU)

Each owner (user, tenant, account) gets their own SQLite database. The indexer rebuilds it on every change. The searcher downloads it to /tmp on first query, then serves subsequent queries from the warm cache until the version changes.
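
The warm-cache behaviour described above can be pictured with a small sketch. This is not S3arch's actual implementation: the class name VersionedLruCache, its methods, and the block-based loader are all hypothetical, and a real cache would also delete evicted files from /tmp. It only illustrates the idea of reusing a downloaded database until its version changes and evicting the least recently used entry once a limit (compare max_cached_dbs) is exceeded.

```ruby
# Hypothetical sketch of a version-aware LRU cache; not S3arch's own code.
class VersionedLruCache
  def initialize(max_entries:)
    @max_entries = max_entries
    # owner_id => { version:, path: }. Ruby hashes keep insertion order,
    # so the first key is always the least recently used one.
    @entries = {}
  end

  # Returns the cached path for owner_id if the version matches; otherwise
  # yields so the caller can download a fresh copy and return its path.
  def fetch(owner_id, version)
    entry = @entries.delete(owner_id) # re-inserted below as most recent
    if entry.nil? || entry[:version] != version
      entry = { version: version, path: yield(owner_id, version) }
    end
    @entries[owner_id] = entry
    evict!
    entry[:path]
  end

  def cached_owner_ids
    @entries.keys
  end

  private

  # Drops the oldest entries until the cache is back within its limit.
  # A real implementation would also remove the file on disk here.
  def evict!
    @entries.shift while @entries.size > @max_entries
  end
end
```

A caller would pass a block that downloads the owner's database from S3 and returns the local path. The block only runs on a cold or stale entry.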

Installation

gem 's3arch'

Requires the sqlite3 native extension to be available at runtime. On Lambda, build the layer with rake s3arch:layer:build (see Building the Lambda Layer).

Usage

Configuration

require 's3arch'

S3arch.configure do |c|
  c.from_env!                                    # Reads S3ARCH_* env vars
  c.source_index = 'UserIndex'                   # GSI for owner lookup
  c.owner_key = 'userId'                         # Partition key for owner
  c.searchable_fields = %w[name description tags] # FTS5 indexed fields
  c.metadata_fields = %w[status created_at]       # Stored for filtering (not searched)
  c.record_filter = ->(item) { item['status'] == 'active' }
  c.logger = Logger.new($stdout)
end
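
To make the roles of these options concrete, here is a rough sketch of how record_filter, searchable_fields, and metadata_fields might partition a single DynamoDB item. The index_row helper and the constants are hypothetical and only mirror the configuration above. How S3arch actually flattens values such as tag lists is an assumption here.

```ruby
# Hypothetical illustration; mirrors the configuration values shown above.
SEARCHABLE_FIELDS = %w[name description tags].freeze
METADATA_FIELDS = %w[status created_at].freeze
RECORD_FILTER = ->(item) { item['status'] == 'active' }

# Returns nil for items the filter rejects; otherwise the text that would be
# indexed for search and the metadata that would be stored for filtering.
def index_row(item)
  return nil unless RECORD_FILTER.call(item)

  {
    # Assumption: list values (e.g. tags) are joined into one string.
    searchable: SEARCHABLE_FIELDS.to_h { |field| [field, Array(item[field]).join(' ')] },
    metadata: item.slice(*METADATA_FIELDS)
  }
end
```

With this configuration, an item whose status is not 'active' never reaches the index, and status can be used as a filter but is not matched by the search string.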

Lambda Handlers

Indexer (triggered by SQS from DynamoDB Streams):

def handler(event:, context:)
  S3arch::Handler.indexer(event)
end

Searcher (invoked directly or via API Gateway):

def handler(event:, context:)
  result = S3arch::Handler.search(event)
  # => { record_ids: ["id1", "id2", ...], search_mode: "fts5" }
end

Search Event Format

{
  "query": "blue chair",
  "owner_ids": ["user-123", "user-456"],
  "filters": { "status": "active" }
}
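
When the searcher sits behind API Gateway, the payload above typically arrives as a JSON string in the proxy event's body rather than as top-level keys. Whether S3arch::Handler.search unwraps this itself is not stated here, so the following is only a sketch of a hypothetical normalize_search_event helper that accepts both shapes and validates the required fields. It does not handle base64-encoded bodies.

```ruby
require 'json'

# Hypothetical helper: accepts a direct-invoke payload or an API Gateway
# proxy event (JSON string under 'body') and returns the search event shape.
def normalize_search_event(event)
  payload = event.key?('body') ? JSON.parse(event['body'] || '{}') : event
  query = payload['query'].to_s.strip
  owner_ids = Array(payload['owner_ids']).map(&:to_s).reject(&:empty?)

  raise ArgumentError, 'query must not be empty' if query.empty?
  raise ArgumentError, 'owner_ids must not be empty' if owner_ids.empty?

  { 'query' => query, 'owner_ids' => owner_ids, 'filters' => payload['filters'] || {} }
end
```

The returned hash has the same keys as the event format above, so it could be passed on to the search handler.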

Manual Indexing

indexer = S3arch::Indexer.new
indexer.rebuild("user-123")

Environment Variables

Variable	Used By	Description
`S3ARCH_SOURCE_TABLE`	Indexer	DynamoDB source table name
`S3ARCH_SOURCE_INDEX`	Indexer	GSI name for owner lookup (default: `UserIndex`)
`S3ARCH_INDEX_BUCKET`	Both	S3 bucket for SQLite index files
`S3ARCH_VERSION_TABLE`	Both	DynamoDB version tracking table

Infrastructure

A Terraform module is provided at terraform/ that provisions:

S3 bucket for index storage (with 90-day lifecycle)
DynamoDB version tracking table (PAY_PER_REQUEST)
SQS queue + DLQ for the indexer
EventBridge Pipe (DynamoDB Stream → SQS)

module "s3arch" {
  source                  = "github.com/stowzilla/s3arch//terraform"
  app_name                = "myapp"
  environment             = "production"
  source_table_name       = aws_dynamodb_table.items.name
  source_table_arn        = aws_dynamodb_table.items.arn
  source_table_stream_arn = aws_dynamodb_table.items.stream_arn
}

Outputs include indexer_env_vars, searcher_env_vars, indexer_permissions, and searcher_permissions for easy Lambda configuration.

Configuration Options

Option	Default	Description
`source_table`	—	DynamoDB table with source records
`source_index`	—	GSI name for querying by owner
`owner_key`	`"user_id"`	Partition key field for owner lookup
`index_bucket`	—	S3 bucket for SQLite files
`version_table`	—	DynamoDB table for version tracking
`searchable_fields`	`["name", "description"]`	Fields indexed in FTS5
`metadata_fields`	`["status", "created_at"]`	Fields stored for filtering
`record_filter`	`->(_) { true }`	Proc to filter records during indexing
`owner_extractor`	(extracts from DynamoDB stream image)	Proc to extract owner_id from stream event
`version_ttl`	`30` (seconds)	How long to cache version checks
`max_results`	`50`	Maximum search results returned
`max_cached_dbs`	`20`	Max databases cached in `/tmp`
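
As an illustration of a custom owner_extractor, the sketch below reads the owner attribute from a DynamoDB stream record, falling back to OldImage for REMOVE events. It assumes the proc receives one stream record as a Hash with string keys and that the owner attribute is a string (S) named userId. S3arch's actual calling convention may differ, so treat the names OWNER_ATTRIBUTE and OWNER_EXTRACTOR as hypothetical.

```ruby
# Assumption: the proc is called with a single DynamoDB stream record
# (string keys, DynamoDB-typed attribute values such as { 'S' => '...' }).
OWNER_ATTRIBUTE = 'userId'

OWNER_EXTRACTOR = lambda do |record|
  images = record.fetch('dynamodb', {})
  # REMOVE events carry only OldImage, so fall back to it.
  image = images['NewImage'] || images['OldImage'] || {}
  image.dig(OWNER_ATTRIBUTE, 'S')
end
```

Under these assumptions it would be wired in with c.owner_extractor = OWNER_EXTRACTOR inside the configure block.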

How Search Works

Query arrives with owner_ids and a search string
For each owner, check DynamoDB version table (cached for version_ttl seconds)
If version changed (or first request), download SQLite DB from S3 to /tmp
Run FTS5 MATCH query with prefix matching (term*)
Apply metadata filters, sort by rank, return record IDs
Evict the least recently used databases when /tmp fills up
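
The prefix-matching step can be sketched as follows. This is a minimal illustration rather than S3arch's own query builder: the fts5_prefix_query helper is hypothetical. It splits the input into alphanumeric terms, quotes each one so that FTS5 keywords such as NOT are treated as plain text, and appends * to make each term a prefix query.

```ruby
# Hypothetical helper: builds an FTS5 MATCH expression in which every term
# is a quoted prefix query. Terms separated by spaces are implicitly ANDed.
def fts5_prefix_query(input)
  # Keep only alphanumeric runs, so quotes and operators in user input
  # cannot change the structure of the expression.
  terms = input.to_s.scan(/[[:alnum:]]+/)
  terms.map { |term| %("#{term}"*) }.join(' ')
end

puts fts5_prefix_query('blue chair') # prints "blue"* "chair"*
```

The resulting string would be bound as a parameter to a statement along the lines of SELECT ... WHERE some_fts_table MATCH ? ORDER BY rank LIMIT ?, where the table name is a placeholder and the limit corresponds to max_results.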

Building the Lambda Layer

S3arch requires the sqlite3 native extension at runtime. A Rake task is included to build and publish the Lambda layer:

# Build the layer zip (outputs to pkg/sqlite-layer.zip)
rake s3arch:layer:build

# Build and publish in one step
rake s3arch:layer:publish PROFILE=devzilla

# Customize the build
rake s3arch:layer:build RUBY_VERSION=3.4 ARCHITECTURE=x86_64

# Publish to a specific region/account
rake s3arch:layer:publish PROFILE=production REGION=us-west-2 LAYER_NAME=stowzilla-sqlite3-ruby

Environment Variable	Default	Description
`RUBY_VERSION`	`3.4`	Ruby runtime version
`ARCHITECTURE`	`x86_64`	`x86_64` or `arm64`
`OUTPUT`	`pkg/sqlite-layer.zip`	Output path for the zip
`PROFILE`	(none)	AWS CLI profile for publishing
`REGION`	`us-east-1`	AWS region to publish to
`LAYER_NAME`	`stowzilla-sqlite3-ruby`	Layer name in AWS
`S3ARCH_VERSION`	`~> current minor`	Version constraint for s3arch in the layer

Requires Docker to be running (uses the official AWS SAM build images).

Rake Tasks

rake s3arch:layer:build    # Build the Lambda layer zip
rake s3arch:layer:publish  # Build and publish to AWS
rake s3arch:rebuild        # Rebuild index for an owner (OWNER_ID=xxx)
rake s3arch:info           # Show config and version info

To make these tasks available in your app, add to your Rakefile:

require 's3arch/tasks'

Requirements

Ruby >= 3.2
AWS Lambda with /tmp storage (2048 MB of ephemeral storage recommended)
SQLite3 native extension (via Lambda layer)
DynamoDB table with streams enabled
S3 bucket for index storage
Docker (for building the Lambda layer)

License

MIT