What is Googlebot’s crawl budget?
A website’s crawl budget is the number of pages Googlebot can and wants to crawl on your site. Google sets it so that Googlebot isn't sent to your website more frequently than it needs to be.
• The crawl rate limit is designed so that Googlebot doesn't eat up all of your website's bandwidth
• Fast-loading, error-free websites are rewarded with additional crawl budget
• Crawl demand determines how often Googlebot wants to crawl your pages
I’ve added two more factors into the mixing pot here: crawl rate limit and crawl demand. Together, these two factors decide how your crawl budget is set.
The crawl rate limit is the friendlier of the two. Its purpose is to visit and crawl pages on your site at a frequency that won't harm or impact server performance. The second factor, crawl demand, is the interesting one. Crawl demand is a popularity signal: the more inbound links a page has, the higher its crawl demand is likely to be. Conversely, pages with fewer inbound links will tend to have lower crawl demand.
Combining crawl rate and crawl demand is what decides how your crawl budget is set. Google states that most website owners don't need to worry about their crawl budget, and to an extent I agree. It’s larger websites, with tens of thousands of URLs or more, that might want to consider optimising for their crawl budget.
From what I have read, crawl budget is calculated on a 90-day rolling basis. Your actual crawl budget is the average across the peaks and troughs. So, if someone does ask you what your website's crawl budget is, you can tell them your average crawl budget is xxx pages crawled per day.
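If you'd like to work out that average yourself rather than read it off a report, one rough approach is to count Googlebot requests per day in your server access logs and average them over the window. The sketch below assumes your logs use the common Combined Log Format, takes the 90-day window from my reading above rather than from any official figure, and identifies Googlebot by its user-agent string only (which can be spoofed, so it isn't a verified count). The function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Date part of a Combined Log Format timestamp, e.g. [01/Mar/2024:10:00:00 +0000]
LOG_DATE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def googlebot_hits_per_day(log_lines):
    """Count requests per calendar day from lines whose user agent mentions Googlebot.

    Assumption: this only matches the string 'Googlebot'; it does not verify the
    requester via reverse DNS, so spoofed bots would be counted too.
    """
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = LOG_DATE.search(line)
        if match:
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            counts[day] += 1
    return counts


def average_crawl_budget(daily_counts, window=90):
    """Average requests per day over the last `window` days of data.

    Days inside the window with no Googlebot requests count as zero, so the
    troughs pull the average down just as the peaks push it up.
    """
    if not daily_counts:
        return 0.0
    end = max(daily_counts)
    # Start of the window, or the first day of data if the logs are shorter.
    start = max(end - timedelta(days=window - 1), min(daily_counts))
    span_days = (end - start).days + 1
    total = sum(n for day, n in daily_counts.items() if start <= day <= end)
    return total / span_days
```

Calling `average_crawl_budget(googlebot_hits_per_day(lines))` on a few months of logs gives the sort of "pages crawled per day" figure described above. Bear in mind it counts requests, including redirects and errors, rather than unique pages.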
How to optimise Googlebot’s crawl budget?
Moving on: how do you optimise a website to make the most of its crawl budget? As previously mentioned, small websites with a few hundred pages probably won't see any noticeable difference, although optimising certainly wouldn't have any negative impact. It’s larger websites with tens of thousands of URLs that should be more concerned: for example, eCommerce websites, forum communities and well-aged publisher sites.
Crawl budget optimisation is not difficult to implement. Let’s get into the right mindset first and think about the objectives. We want to instruct Googlebot not to crawl areas of a website that have no need to be indexed in the organic (natural) search results. Why? Those areas could be wasting your crawl budget, reducing how often pages with meaningful, indexable URLs are crawled.
Where do you start?
Blocking Googlebot's access to folders and URLs it has no business snooping around in would be a good place to start. Any URL (webpage) that does not have meaningful (indexable) content can be blocked in Robots.txt. Blocking those URLs individually could, and probably would, take months. To save time, I would recommend using wildcards in Robots.txt.
If your website runs on a closed, hosted platform such as Shopify that restricts editing of the Robots.txt file, you can work around that by using the URL Parameters tool in Google Search Console. Another method is content pruning: removing content from your website that no longer has any meaningful value to a visitor.
Which method you choose is your decision, but think it through carefully before acting on any of the methods suggested above. If you have a small to medium website and very little technical knowledge, I wouldn't recommend acting on any of the above except content pruning, and make sure you back up the content first. I say this because a mistake could potentially wipe out all of your website's organic listings. If that happens, it's on you. You have been warned, although I can help you out if it does.
Be mindful that URLs which redirect, carry a canonical pointing elsewhere, or are marked noindex or nofollow still count towards your crawl budget when Googlebot crawls them. Also, make sure you remove any blocked or pruned URLs from your XML sitemap. The same goes for URLs with non-self-referencing canonicals. Listing them is wasteful and sends out mixed signals.
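Because wildcard rules are easy to get wrong, it can help to check a batch of URLs against your patterns before publishing them. Python's built-in `urllib.robotparser` doesn't understand wildcards inside a path, so the sketch below hand-rolls the matching behaviour Google documents for its crawler: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching rule wins, and an Allow wins a tie. It's a simplified assumption-laden version (it ignores user-agent groups and percent-encoding, and uses the raw pattern length), and the example rules are hypothetical, so treat it as a sanity check alongside Search Console rather than a replacement.

```python
import re

# Hypothetical rules, equivalent to these Robots.txt lines:
#   Disallow: /*?sort=
#   Disallow: /search
#   Disallow: /*.pdf$
#   Allow: /search/help
RULES = [
    ("disallow", "/*?sort="),
    ("disallow", "/search"),
    ("disallow", "/*.pdf$"),
    ("allow", "/search/help"),
]


def rule_to_regex(pattern):
    """Translate a Robots.txt path pattern using * and a trailing $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each * stand for any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` (path plus query string) may be crawled.

    The longest matching pattern decides; if an Allow and a Disallow of equal
    length both match, the Allow (least restrictive) wins. No match means allowed.
    """
    best_length = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:  # an empty Disallow blocks nothing
            continue
        if rule_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_length or (length == best_length and directive == "allow"):
                best_length = length
                allowed = directive == "allow"
    return allowed


if __name__ == "__main__":
    for path in ["/shoes", "/shoes?sort=price", "/search?q=boots", "/search/help", "/files/guide.pdf"]:
        print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```

Running a list of real URLs from your site through `is_allowed` before going live shows quickly whether a wildcard is catching more (or less) than you intended.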
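Keeping the sitemap tidy is easy to script. The sketch below takes a standard `urlset` sitemap and drops every `<url>` entry whose address is in a set of blocked or pruned URLs, or whose canonical points somewhere else, leaving the remaining entries (including any `<lastmod>`) untouched. Where the blocked list and the canonical mapping come from is an assumption: you might export them from a site crawler or build them with a Robots.txt check. It doesn't handle sitemap index files, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def clean_sitemap(sitemap_xml, excluded, canonicals):
    """Remove wasted entries from a <urlset> sitemap.

    excluded:   set of URLs blocked in Robots.txt or pruned from the site
    canonicals: dict of URL -> canonical URL (URLs missing from it are
                assumed to be self-canonical)
    Returns (kept_urls, cleaned_xml_string).
    """
    ET.register_namespace("", SITEMAP_NS)  # write the namespace without an ns0: prefix
    root = ET.fromstring(sitemap_xml)
    kept = []
    for entry in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = (entry.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
        if loc in excluded or canonicals.get(loc, loc) != loc:
            root.remove(entry)  # blocked, pruned or non-self-referencing canonical
        else:
            kept.append(loc)
    return kept, ET.tostring(root, encoding="unicode")
```

Write the returned string back out as your sitemap file, then resubmit it in Search Console.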
• Block Googlebot from unwanted areas of your website
• Wildcards in Robots.txt are great timesavers
• Use the URL Parameters tool in Search Console for closed-platform sites
• Prune pages lacking any meaningful content
• Exclude blocked URLs from sitemaps
• Remove URLs with non-self-referencing canonicals from sitemaps
• ALWAYS test using the Robots.txt Tester and Fetch as Google tools in Search Console
• Keep an eye on blocked resources in Search Console
Why should you optimise?
Crawl budget is not a ranking signal. It’s one of the many tools Google has at its disposal to help determine and understand the overall quality signals a website has to offer. Excluding Googlebot from less meaningful areas of a site helps divert its attention towards the more meaningful pages. Reading between the lines, this could mean a higher crawl frequency for those more ‘meaningful’ pages, which in turn could help stabilise organic search rankings, with less of the wild yo-yo effect.
The sites that would benefit the most, in my opinion, are forum community websites with tens of thousands of threads and millions of posts. Their crawl demand is likely to be lower, as forum communities do not always generate quality inbound links. In my opinion, large forums have been penalised by the introduction of crawl budget a few years back. A simple way to test this theory is to compare the number of URLs indexed in Search Console with the number of URLs in your XML sitemaps.
Second on my list would be eCommerce stores. Relying less on canonicals, nofollows and noindexes for facets, pagination and hashbang URLs could have a profound impact. The opportunity to increase crawl frequency on less-crawled sub-categories and product URLs would be a blessing from the legendary Matt Cutts himself.
Lastly, small website owners who’ve started to take an interest in optimising for crawl rate limit and crawl demand are far less likely to see any impact on their organic search visibility. I would recommend focusing instead on creating meaningful website content that attracts visitors and earns links to your site at a pace you’re comfortable with.
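To put numbers on that comparison, you can take the URLs listed in your XML sitemaps and a list of indexed URLs exported from Search Console, and work out what share of the sitemap is actually indexed. The sketch below assumes both lists are already loaded as plain URL strings (how you export them is up to you), and the function name is hypothetical. A low ratio is a prompt to dig deeper, not proof that crawl budget is the cause.

```python
def index_coverage(sitemap_urls, indexed_urls):
    """Return (share of sitemap URLs that are indexed, sorted list of those that aren't).

    Indexed URLs that aren't in the sitemap are ignored, so the ratio stays between 0 and 1.
    """
    sitemap = set(sitemap_urls)
    if not sitemap:
        return 0.0, []
    indexed = set(indexed_urls) & sitemap
    missing = sorted(sitemap - indexed)
    return len(indexed) / len(sitemap), missing


if __name__ == "__main__":
    # Hypothetical forum thread URLs.
    ratio, missing = index_coverage(
        ["/thread/1", "/thread/2", "/thread/3", "/thread/4"],
        ["/thread/1", "/thread/3", "/members"],
    )
    print(f"{ratio:.0%} indexed; not indexed: {missing}")
```

Tracking this ratio month by month is more telling than a single snapshot, since a steady decline is what would point back towards reduced crawling.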
• Optimising crawl budget can help to stabilise yo-yo ranking fluctuations
• Forum community websites can boost their indexed thread count
• Increase Googlebot crawl frequency to meaningful pages
• Helps rule out other technical SEO issues
• May help improve organic search traffic, sales and conversions
• Boosts the overall site quality signals sent to Google
How to increase crawl budget?
This has a straightforward answer: create lots of meaningful content on your website. It’s a proven, time-tested method of attracting and earning links. In reality, though, it can be very resource- and time-intensive.
Looking at this from another perspective:
• A higher count of links to your website from other domains equals increased crawl demand for your website.
• Authoritative websites will pass along a greater amount of crawl demand than less authoritative sites.
• Crawl demand passed from one site to another will increase how often the receiving site is crawled.
Is crawl budget a repurposed and repackaged version of PageRank? The indicators and traits are very similar. Google started pulling back on toolbar PageRank a few years ago; maybe crawl budget is its replacement. Who knows? The similarities are certainly there for all to see.
Back to the question: how do you increase crawl budget? Attract links from websites with a greater crawl budget than yours. 😊 If your team is running content marketing campaigns, start asking webmasters what their crawl budget is and see what response you get. Brilliant move, Google.
Time to wrap this up. The point of this article has been to answer some questions I had whizzing around in my head, mainly about the correlation with PageRank. Personally, as an SEO, I never really gave crawl budgets much attention until about 12 months ago. As I have worked on larger sites, I’ve noticed a greater need to optimise for crawl budget. As the owner of a large forum community, I’ve noticed the index count dropping, which pointed back towards a reduced crawl rate limit.
This has led me to believe that over the last 12 months Google has started to bring crawl budget to the forefront as a leading overall site quality signal. It certainly carries more weight than I previously thought. I also strongly suspect that the very foundations behind legacy toolbar PageRank are leading considerations when evaluating a website’s crawl budget.
Hopefully, my article on crawl budgets has given you insights that other articles have not covered. In particular, other articles rarely mention that crawl budgets can decrease as well as increase. Please, God, if you decide to implement any of the actions suggested above, take every precaution. You could wipe out whole swathes of traffic-driving URLs from the organic results.