Crawlscope
crawlscope is a small Ruby gem for sitemap-driven SEO validation.
It is built by Ethos Link and used in production by Reviato.
It is designed for Rails apps and plain Ruby scripts that want:
- deterministic sitemap crawling
- structured validation issues instead of free-form strings
- app-configurable rule and schema registries
- first-party rake tasks instead of a large DSL
- optional browser rendering for JavaScript-heavy pages
It works in three modes:
- as a plain Ruby library
- as a standalone CLI
- as Rails rake tasks through the included Railtie
The default rule set includes:
- metadata validation
- structured-data validation
- uniqueness checks
- internal-link checks
Installation
Add this line to your application's Gemfile:
gem "crawlscope"
And then execute:
bundle install
Or install it directly:
gem install crawlscope
If you want browser rendering, also add:
gem "ferrum"
crawlscope only loads Ferrum when you run in browser mode.
CLI Usage
Validate a site directly from the gem:
crawlscope validate --base-url https://example.com
Validate only specific rules:
crawlscope validate --base-url https://example.com --rules metadata,links
Validate structured data on one or more URLs:
crawlscope ldjson --url https://example.com/article
crawlscope ldjson --url https://example.com/a --url https://example.com/b --summary
If you do not pass --sitemap, crawlscope defaults to:
- https://example.com/sitemap.xml for real site URLs
- public/sitemap.xml for localhost-style development URLs when that file exists
Child sitemap indexes are supported automatically.
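The fallback described above can be sketched in a few lines of Ruby. This is only an illustration of the documented behaviour, not crawlscope's own code: the default_sitemap helper, the LOCAL_HOSTS list, and the public_dir parameter are all hypothetical names, and which hosts count as "localhost-style" is an assumption.

```ruby
require "uri"

# Hypothetical helper, not part of crawlscope's public API: picks a sitemap
# location when no --sitemap value is given.
# Assumption: these hosts are what "localhost-style" means.
LOCAL_HOSTS = %w[localhost 127.0.0.1 0.0.0.0].freeze

def default_sitemap(base_url, public_dir: "public")
  host = URI.parse(base_url).host
  local_file = File.join(public_dir, "sitemap.xml")

  # Development-style hosts prefer the file on disk, but only if it exists.
  return local_file if LOCAL_HOSTS.include?(host) && File.exist?(local_file)

  # Everything else falls back to the conventional remote location.
  URI.join(base_url, "/sitemap.xml").to_s
end

puts default_sitemap("https://example.com")
```

When the local file is missing, a localhost URL falls through to the same remote-style default in this sketch.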
Ruby Usage
require "crawlscope"
audit = Crawlscope::Audit.new(
base_url: "https://example.com",
sitemap_path: "https://example.com/sitemap.xml",
rules: Crawlscope::RuleRegistry.default(site_name: "Example").rules,
schema_registry: Crawlscope::SchemaRegistry.default
)
result = audit.call
puts result.ok?
puts result.issues.to_a.map(&:message)
Result Shape
Crawlscope::Audit returns a Crawlscope::Result with:
- urls: sitemap URLs selected for validation
- pages: fetched page snapshots
- issues: structured issues with code, severity, category, url, and message
result.ok? returns false if any error, warning, or notice is present.
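To make the shape concrete without depending on the gem, here is a minimal sketch built from plain Structs. Issue and Result below are stand-ins, not the gem's classes; the symbol values for code, severity, and category are assumptions, since this README does not say how those fields are typed.

```ruby
# Illustrative stand-ins only; the real classes live inside the gem.
Issue = Struct.new(:code, :severity, :category, :url, :message, keyword_init: true)

Result = Struct.new(:urls, :pages, :issues, keyword_init: true) do
  # Mirrors the documented rule: any error, warning, or notice fails the audit.
  def ok?
    issues.none? { |issue| %i[error warning notice].include?(issue.severity) }
  end
end

# Hypothetical issue; the code and severity values are made up for the example.
missing_title = Issue.new(
  code: :missing_title,
  severity: :warning,
  category: :metadata,
  url: "https://example.com/pricing",
  message: "Page has no <title>"
)

clean = Result.new(urls: ["https://example.com/"], pages: [], issues: [])
failing = Result.new(urls: ["https://example.com/pricing"], pages: [], issues: [missing_title])

# Group messages by severity, for example to decide how loudly CI should complain.
by_severity = failing.issues.group_by(&:severity).transform_values { |list| list.map(&:message) }
```

Because issues are structured, callers can filter or group on fields such as severity or category instead of parsing message strings.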
Rails Usage
In an initializer:
Crawlscope.configure do |config|
config.base_url = -> { "https://example.com" }
config.sitemap_path = -> { Rails.public_path.join("sitemap.xml").to_s }
config.site_name = "Example"
config.schema_registry = -> { Crawlscope::SchemaRegistry.default }
end
Then run:
bin/rails crawlscope:validate
Available environment overrides:
- BASE_URL
- SITEMAP
- RULES=metadata,links
- JS=1 or RENDERER=browser
- TIMEOUT=30
- NETWORK_IDLE_TIMEOUT=10
- CONCURRENCY=5
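One way to picture how these overrides could be read is the sketch below. It is an assumption about the mapping, not the Railtie's actual code: options_from_env, the option keys, and the :http renderer name are hypothetical.

```ruby
# Hypothetical mapping from environment variables to audit options.
# Takes any Hash-like object so it can be exercised without touching ENV.
def options_from_env(env)
  browser = env["JS"] == "1" || env["RENDERER"] == "browser"

  {
    base_url: env["BASE_URL"],
    sitemap_path: env["SITEMAP"],
    # RULES is a comma-separated list such as "metadata,links".
    rules: env["RULES"]&.split(",")&.map(&:strip),
    # Assumption: plain HTTP fetching is the default renderer.
    renderer: browser ? :browser : :http,
    timeout: env["TIMEOUT"]&.to_i,
    network_idle_timeout: env["NETWORK_IDLE_TIMEOUT"]&.to_i,
    concurrency: env["CONCURRENCY"]&.to_i
  }.compact # unset variables are dropped so configured defaults still apply
end

options = options_from_env("BASE_URL" => "https://example.com", "RULES" => "metadata,links", "JS" => "1")
```

In practice the variables are simply placed in front of the task, for example BASE_URL=https://example.com RULES=metadata,links bin/rails crawlscope:validate.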
Available tasks:
bin/rails crawlscope:validate
bin/rails crawlscope:validate:metadata
bin/rails crawlscope:validate:structured_data
bin/rails crawlscope:validate:uniqueness
bin/rails crawlscope:validate:links
bin/rails crawlscope:validate:ldjson URL=https://example.com/article
The same validation surface is also available in the gem repository itself through plain rake:
bundle exec rake crawlscope:validate BASE_URL=https://example.com
bundle exec rake crawlscope:validate:metadata BASE_URL=https://example.com
bundle exec rake crawlscope:validate:ldjson URL=https://example.com/article
Structured Data URL Audit
For one-off structured-data checks:
bin/rails crawlscope:validate:ldjson URL=https://example.com/article
bin/rails crawlscope:validate:ldjson URL='https://example.com/a;https://example.com/b' SUMMARY=1
bin/rails crawlscope:validate:ldjson URL=https://example.com/article REPORT_PATH=tmp/structured-data.json
Optional flags:
- DEBUG=1: print detected items
- SUMMARY=1: print grouped failures
- REPORT_PATH=...: write a JSON report
- JS=1 or RENDERER=browser: render with Ferrum
Rules
Built-in rules:
- metadata
- structured_data
- uniqueness
- links
Metadata
Checks:
- missing <h1>
- missing <title>
- title length
- repeated site name in the title
- missing meta description
- meta description length
- missing canonical link
- canonical mismatch
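As a rough illustration of what a few of these checks involve, the sketch below covers the <h1>, <title>, and canonical checks using only the standard library. It is not the gem's rule implementation: metadata_issues and the issue symbols are hypothetical, the title length bounds are assumed values, and a real rule would use an HTML parser rather than regular expressions (the canonical pattern here only matches when rel appears before href).

```ruby
# Assumption: these bounds are illustrative, not crawlscope's thresholds.
TITLE_LENGTH = (10..60)

# Returns a list of hypothetical issue codes for one page.
def metadata_issues(html, url:)
  issues = []
  title = html[%r{<title[^>]*>(.*?)</title>}mi, 1]&.strip
  canonical = html[/<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)["']/i, 1]

  issues << :missing_h1 unless html.match?(/<h1[\s>]/i)

  if title.nil? || title.empty?
    issues << :missing_title
  elsif !TITLE_LENGTH.cover?(title.length)
    issues << :title_length
  end

  if canonical.nil?
    issues << :missing_canonical
  elsif canonical.chomp("/") != url.chomp("/")
    # Trailing slashes are ignored in this sketch; other normalisation is not.
    issues << :canonical_mismatch
  end

  issues
end

page = <<~HTML
  <html><head>
    <title>Example page title</title>
    <link rel="canonical" href="https://example.com/other">
  </head><body><h1>Hello</h1></body></html>
HTML

found = metadata_issues(page, url: "https://example.com/page")
```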
Structured Data
Checks:
- malformed JSON-LD
- missing required fields for supported schema types
- schema validation failures from the configured registry
- direct URL structured-data audits through crawlscope:validate:ldjson
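The first two checks can be approximated with the standard library alone. The following is a sketch under stated assumptions rather than the gem's validator: ldjson_issues and parse_ldjson are hypothetical helpers, the REQUIRED_FIELDS table is made up for the example, and script tags are located with a regular expression instead of an HTML parser.

```ruby
require "json"

# Assumption: an illustrative required-field table, not the default registry.
REQUIRED_FIELDS = {"Article" => %w[headline author]}.freeze

LDJSON_SCRIPT = %r{<script[^>]+type=["']application/ld\+json["'][^>]*>(.*?)</script>}mi

# Returns the parsed document, or nil when the block is not valid JSON.
def parse_ldjson(raw)
  JSON.parse(raw)
rescue JSON::ParserError
  nil
end

def ldjson_issues(html)
  html.scan(LDJSON_SCRIPT).flatten.flat_map do |raw|
    data = parse_ldjson(raw)
    next ["malformed JSON-LD"] if data.nil?

    # A block may hold one item or an array of items.
    items = data.is_a?(Array) ? data : [data]
    items.grep(Hash).flat_map do |item|
      type = item["@type"]
      missing = REQUIRED_FIELDS.fetch(type, []) - item.keys
      missing.map { |field| "#{type} is missing #{field}" }
    end
  end
end

html = <<~HTML
  <script type="application/ld+json">{"@type": "Article", "headline": "Hello"}</script>
  <script type="application/ld+json">{"@type": "Article", </script>
HTML

problems = ldjson_issues(html)
```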
Uniqueness
Checks:
- duplicate titles
- duplicate meta descriptions
- duplicate content fingerprints
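Duplicate detection of this kind usually comes down to grouping pages by a key. Below is a small sketch of that idea; the Page struct, the duplicates helper, and the SHA-256 fingerprint over whitespace-normalised text are assumptions for illustration, and the gem may fingerprint content differently.

```ruby
require "digest"

# Hypothetical page snapshot with just the fields this sketch needs.
Page = Struct.new(:url, :title, :description, :body_text, keyword_init: true)

# Assumption: normalise case and whitespace, then hash the result.
def fingerprint(text)
  Digest::SHA256.hexdigest(text.downcase.gsub(/\s+/, " ").strip)
end

# Groups pages by the given key and keeps only keys shared by several URLs.
def duplicates(pages, &key)
  pages.group_by(&key)
    .select { |value, group| !value.nil? && group.size > 1 }
    .transform_values { |group| group.map(&:url) }
end

pages = [
  Page.new(url: "https://example.com/", title: "Home", description: "Welcome", body_text: "Hello   world"),
  Page.new(url: "https://example.com/index", title: "Home", description: "Start here", body_text: "hello world"),
  Page.new(url: "https://example.com/about", title: "About", description: "Who we are", body_text: "Our story")
]

duplicate_titles = duplicates(pages, &:title)
duplicate_content = duplicates(pages) { |page| fingerprint(page.body_text) }
```

The same helper handles meta descriptions by passing &:description as the key.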
Links
Checks:
- broken internal links
- unresolved internal links
- low inbound anchor-link counts
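The inbound-count check can be pictured as a pass over the crawl's link graph. The sketch below assumes a hypothetical input shape (page URL mapped to the internal link targets found on that page) and an assumed minimum of one inbound link; neither comes from the gem.

```ruby
# links_by_page: page URL => internal link targets found on that page.
# Returns how many distinct other pages link to each crawled page.
def inbound_counts(links_by_page)
  counts = links_by_page.keys.to_h { |url| [url, 0] }

  links_by_page.each do |source, targets|
    targets.uniq.each do |target|
      next if target == source # self-links are ignored in this sketch
      counts[target] += 1 if counts.key?(target)
    end
  end

  counts
end

# Assumption: the threshold is configurable and defaults to one inbound link.
def low_inbound(links_by_page, minimum: 1)
  inbound_counts(links_by_page).select { |_url, count| count < minimum }.keys
end

graph = {
  "https://example.com/" => ["https://example.com/a", "https://example.com/b"],
  "https://example.com/a" => ["https://example.com/"],
  "https://example.com/b" => ["https://example.com/a"],
  "https://example.com/orphan" => []
}

orphans = low_inbound(graph)
```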
Schema Registry
crawlscope ships with a default schema registry for common types such as:
- Article
- FAQPage
- Organization
- Product
- Review
- SoftwareApplication
- WebApplication
- WebSite
Host apps can replace or extend the registry:
Crawlscope.configure do |config|
config.schema_registry = -> { MyApp::StructuredData::SchemaRegistry.new }
end
That makes crawlscope useful as the audit engine while the app remains the owner of stricter product-specific schema rules.
Development
git clone https://github.com/ethos-link/crawlscope.git
cd crawlscope
bundle install
bundle exec rake test
bundle exec rake standard
bundle exec rake
Git hooks
We use lefthook with the Ruby commitlint gem to enforce Conventional Commits on every commit. We also use Standard Ruby to keep code style consistent. CI validates commit messages, Standard Ruby, tests, and git-cliff changelog generation on pull requests and pushes to main/master.
Run the hook installer once per clone:
bundle exec lefthook install
Install locally
rake install
Release
Releases are tag-driven and published by GitHub Actions to RubyGems. Local release commands never publish directly.
Install git-cliff locally before preparing a release. The release task regenerates CHANGELOG.md from Conventional Commits.
Before preparing a release, make sure you are on main or master with a clean worktree.
Then run one of:
bundle exec rake 'release:prepare[patch]'
bundle exec rake 'release:prepare[minor]'
bundle exec rake 'release:prepare[major]'
bundle exec rake 'release:prepare[0.1.0]'
The task will:
- Regenerate CHANGELOG.md with git-cliff.
- Update lib/crawlscope/version.rb.
- Commit the release changes.
- Create and push the vX.Y.Z tag.
The Release workflow then runs tests, publishes the gem to RubyGems, and creates the GitHub release from the changelog entry.
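For readers unfamiliar with the bump keywords, the sketch below shows how an argument such as patch, minor, major, or an explicit version would typically resolve under semantic versioning. The next_version helper is hypothetical and is not taken from the release task.

```ruby
# Hypothetical helper: resolves a release:prepare argument to a version string.
def next_version(current, bump)
  major, minor, patch = current.split(".").map(&:to_i)

  case bump
  when "major" then "#{major + 1}.0.0"
  when "minor" then "#{major}.#{minor + 1}.0"
  when "patch" then "#{major}.#{minor}.#{patch + 1}"
  when /\A\d+\.\d+\.\d+\z/ then bump # an explicit version is used as given
  else raise ArgumentError, "unknown bump: #{bump}"
  end
end

tag = "v#{next_version("0.1.0", "minor")}"
```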
Contributing
- Fork it
- Create a branch (git checkout -b feature/my-feature)
- Commit your changes
- Push (git push origin feature/my-feature)
- Open a Pull Request
Please use Conventional Commits for commit messages.
License
MIT License, see LICENSE.txt
About
Made by the team at Ethos Link — practical software for growing businesses. We build tools for hospitality operators who need clear workflows, fast onboarding, and real human support.
We also build Reviato ("Capture. Interpret. Act."). It turns guest feedback into clear next steps for your team: collect private appraisals, spot patterns across reviews, and act before small issues turn into public ones.