How Webflow Handles Sitemaps Automatically
Webflow generates a sitemap.xml file automatically at yourdomain.com/sitemap.xml. It updates every time you publish. The sitemap includes all published pages and CMS collection items. This is convenient, but it also means utility pages, password-protected pages, and 'draft' pages that are technically published can end up in your sitemap—telling Google to crawl and index pages you never intended to rank.
Auditing Your Webflow Sitemap
- Open yourdomain.com/sitemap.xml in your browser.
- Check every URL. Are there style guide pages, 404 pages, or test pages listed?
- For unwanted pages, go to Page Settings in Webflow and toggle 'Exclude this page from sitemap'.
- For CMS items you do not want indexed, use a conditional visibility trick: set the item to 'Draft' status, or use a toggle field to control sitemap inclusion programmatically.
- After cleanup, republish and verify the sitemap no longer contains excluded URLs.
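The audit steps above can be partially automated. Here is a minimal sketch (Node.js, no dependencies) that pulls every `<loc>` URL out of sitemap XML and flags paths that usually should not be indexed. The sample XML and the list of suspicious patterns are illustrative assumptions — adjust them for your own site:

```javascript
// Sketch: flag likely-unwanted URLs in a Webflow sitemap.
// The sample XML and suspicious-path patterns are assumptions for illustration.
const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yoursite.com/</loc></url>
  <url><loc>https://yoursite.com/blog/launch</loc></url>
  <url><loc>https://yoursite.com/style-guide</loc></url>
  <url><loc>https://yoursite.com/utility-pages/404</loc></url>
</urlset>`;

// Paths that usually should not be indexed (adjust for your site).
const suspiciousPatterns = [/style-guide/, /utility-pages/, /password/];

function auditSitemap(xml) {
  // Pull every <loc> value out with a simple regex (fine for well-formed sitemaps).
  const urls = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map(m => m[1]);
  const flagged = urls.filter(u => suspiciousPatterns.some(p => p.test(u)));
  return { urls, flagged };
}

const { urls, flagged } = auditSitemap(sampleSitemap);
console.log(`Found ${urls.length} URLs, ${flagged.length} flagged:`, flagged);
```

Run it against your live sitemap by pasting the real XML in place of the sample; anything it flags is a candidate for the 'Exclude this page from sitemap' toggle.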
Understanding Webflow’s Default Robots.txt
Webflow generates a default robots.txt that allows all crawlers access to all pages. On the staging subdomain (yoursite.webflow.io), Webflow serves a 'Disallow: /' directive to discourage crawling. Once you connect a custom domain, robots.txt becomes permissive. You can edit its contents in Webflow under Site Settings → SEO → Indexing, but changes only take effect on publish and apply one static file to the whole site. For anything dynamic—different rules per crawler, per environment, or generated on the fly—you can override the file at the edge with a reverse proxy or a Cloudflare Worker.
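For reference, the two defaults look roughly like this (paraphrased for illustration — always check your own /robots.txt for the exact output Webflow serves):

```
# Staging subdomain (yoursite.webflow.io) — discourages all crawling
User-agent: *
Disallow: /

# Custom domain — permissive by default
User-agent: *
Disallow:
```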
If your Webflow staging site (yoursite.webflow.io) is indexed in Google, keep in mind that 'Disallow: /' only blocks crawling, not indexing—Google can still index the subdomain from external links, or it may have crawled it before the directive was served. Verify the staging domain as a property in Search Console and submit a removal request for it.
Custom Robots.txt with Cloudflare Workers
If you need to block specific crawlers or directories, the cleanest approach is a Cloudflare Worker that intercepts requests to /robots.txt and returns your custom content. This takes about 10 minutes to set up. Create a Worker that checks if the request URL path is /robots.txt, and if so, returns a new Response with your desired directives. Otherwise, pass the request through to Webflow’s origin.
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  const url = new URL(request.url)
  // Serve a custom robots.txt instead of Webflow's default
  if (url.pathname === '/robots.txt') {
    const robotsTxt = `User-agent: *
Allow: /
Disallow: /style-guide
Disallow: /utility-pages/
Sitemap: https://yoursite.com/sitemap.xml`
    return new Response(robotsTxt, {
      headers: { 'Content-Type': 'text/plain' }
    })
  }
  // Everything else passes through to Webflow's origin
  return fetch(request)
}

Submitting Your Sitemap to Google Search Console
After connecting your custom domain and publishing, go to Google Search Console, select your property, navigate to Sitemaps in the left menu, and submit yourdomain.com/sitemap.xml. Google will crawl it and report any errors. Check back after 48 hours to verify all URLs are discovered. If you see 'Couldn’t fetch' errors, verify your custom domain DNS is properly configured and SSL is active.
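Before submitting, it is worth sanity-checking the sitemap locally. This sketch (Node.js, regex-based; the host and sample XML are assumptions) verifies that every `<loc>` entry is an absolute HTTPS URL on the domain you are submitting for—mismatched hosts or http:// entries are a common source of sitemap errors in Search Console:

```javascript
// Sketch: verify every sitemap <loc> is an absolute HTTPS URL on the
// expected host before submitting to Search Console.
// The host and sample XML below are illustrative assumptions.
const expectedHost = 'yoursite.com';
const xml = `<urlset>
  <url><loc>https://yoursite.com/</loc></url>
  <url><loc>https://yoursite.com/blog/launch</loc></url>
</urlset>`;

function invalidLocs(sitemapXml, host) {
  const locs = [...sitemapXml.matchAll(/<loc>(.*?)<\/loc>/g)].map(m => m[1]);
  return locs.filter(loc => {
    try {
      const u = new URL(loc);
      return u.protocol !== 'https:' || u.hostname !== host;
    } catch {
      return true; // not even a parseable absolute URL
    }
  });
}

const bad = invalidLocs(xml, expectedHost);
console.log(bad.length === 0 ? 'Sitemap URLs look consistent.' : bad);
```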
Sitemap Best Practices for Webflow CMS
- Keep your sitemap under 50,000 URLs (Webflow will not hit this limit, but it is the Google maximum).
- Use descriptive slugs on CMS items—these become the URL paths in the sitemap.
- Update content regularly so Google sees fresh lastmod dates when recrawling.
- If using multiple CMS collections, verify all collections appear in the sitemap.
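The URL-count limit and lastmod freshness from the list above can be checked with a small script. In this sketch, the sample XML and the 180-day staleness threshold are illustrative assumptions:

```javascript
// Sketch: check sitemap size against Google's 50,000-URL limit and surface
// stale <lastmod> dates. Sample XML and the 180-day threshold are assumptions.
const GOOGLE_URL_LIMIT = 50000;
const STALE_DAYS = 180;

const sitemapXml = `<urlset>
  <url><loc>https://yoursite.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://yoursite.com/blog/old-post</loc><lastmod>2020-06-01</lastmod></url>
</urlset>`;

function sitemapReport(xml, now = new Date()) {
  // Count URLs and collect lastmod dates older than the staleness threshold.
  const urlCount = (xml.match(/<loc>/g) || []).length;
  const stale = [...xml.matchAll(/<lastmod>(.*?)<\/lastmod>/g)]
    .map(m => m[1])
    .filter(d => (now - new Date(d)) / 86400000 > STALE_DAYS);
  return { urlCount, overLimit: urlCount > GOOGLE_URL_LIMIT, stale };
}

console.log(sitemapReport(sitemapXml));
```

Stale entries are a prompt to refresh the underlying CMS items so Google sees updated lastmod dates on its next recrawl.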
Need help with crawl management on your Webflow site? We handle sitemaps, robots.txt, and Search Console setup.
Talk to Our SEO Team →
