
Tracking Site Page Importance Via Crawl Cache Dates

By Justin Seibert | 6 Min Read | November 24, 2008

We’ve been thinking about how we can bring more value to our readers on this here internet marketing blog and settled on running an occasional series of guest posts penned by various marketing thought leaders.  This remained filed in our Good Ideas We Feel Like We’re Too Busy to Implement drawer until now.  What finally got me out of my hammock to implement this series was being featured in a post by Donna Fontenot of SEO Scoop as part of a series meant to spread some love and introduce the search engine marketing community to other, often newer, voices.

DazzlinDonna, as she is better known on the internets, has been involved in this industry for quite some time and really knows her stuff.  I am personally delighted to have Donna kick off The Marketing Experts series and now present her post on a very specific way you can improve your own search engine optimization efforts (or stay on top of your SEO firm).  — Justin

Over two years ago, way back in September, 2006, Google let us know that they were changing the way they display cache dates for our pages.  Specifically, Vanessa Fox said:

We’ve recently changed the date we show for the cached page to reflect when Googlebot last accessed it (whether the page had changed or not). This should make it easier for you to determine the most recent date Googlebot visited the page.

This was good to know because SEOs had already noticed that Google crawling patterns could indicate a sort of “crawling sandbox” and were soon realizing that cache dates were the new PageRank.  Aaron Wall summed it up by saying “What Google frequently visits (and spends significant resources to keep updated) is what they consider important.”

What is important is not just when Google last crawled a web page, but how often it returns to re-cache the page.  This crawl frequency is an excellent indicator of the importance of a page.  A good way to determine which pages will likely struggle to rank well is to see how often Google recaches each page.  Unfortunately, there aren’t any really easy methods of tracking the crawl frequency, especially across many pages of a large website.

Michael Gray recently outlined his method of crawl tracking in his post, How To Figure Out What Parts Of Your Website Aren’t Being Crawled.  His method involves uniquely date-stamping every page and then querying the search engines over time to see which pages haven’t been cached with that date phrase.  It’s not a bad method, and frankly I don’t have a better one.  I’ve always used the more manual process of noting each page’s cache date from a cache: query in a spreadsheet, and then re-running the query daily for a month.  This only works well for a limited number of pages, of course, since that kind of manual checking takes time.
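
If you’d rather not retype those dates by hand every day, a tiny script can keep the spreadsheet for you.  The sketch below is a minimal example of that bookkeeping, assuming you still read the cache date off the cache: result yourself each morning and pass it in; the file name cache_log.csv and the command-line usage are just my illustration, not part of either method described above.

#!/usr/bin/env python3
# Append today's observed cache date for a page to a simple CSV log.
# Hypothetical usage:
#   python3 log_cache_date.py http://example.com/page.html 2008-11-20
# The second argument is the cache date you read from Google's cache: result.
import csv
import sys
from datetime import date

LOG_FILE = "cache_log.csv"  # assumed file name; one row per daily observation

def log_observation(url, cache_date, log_file=LOG_FILE):
    # Record (date checked, page URL, cache date shown by Google).
    with open(log_file, "a", newline="") as f:
        csv.writer(f).writerow([date.today().isoformat(), url, cache_date])

if __name__ == "__main__":
    page_url, observed_cache_date = sys.argv[1], sys.argv[2]
    log_observation(page_url, observed_cache_date)
    print(f"Logged {page_url} -> {observed_cache_date}")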

There is a crawl rate tracker plugin for WordPress blogs, and a couple of so-so paid tools for tracking crawl rates, but I’ve never found the perfect “works for all sites and does it all for free” tool.  For that reason, I always concentrate on tracking just the few pages that matter most to me.  If there are pages I’m most concerned with ranking well, I’ll track their cache dates until I’ve determined the “cache-importance” of each page (which usually takes just a month or so).  If a page has been deemed fairly unimportant by Google, based on a very slow cache/crawl rate, then I know it’s time to focus more attention on updating the page and marketing it better.  If the page is being crawled frequently, I’ll focus my attention elsewhere for now.
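
Once you have a month or so of those observations, the “cache-importance” falls out of the same log with a little arithmetic.  Here’s a minimal sketch, assuming the cache_log.csv format from the example above; the one-week cutoff for “crawled frequently” is just my own illustrative threshold, not anything Google publishes.

import csv
from collections import defaultdict
from datetime import datetime

def recache_intervals(log_file="cache_log.csv"):
    # Return {url: average days between distinct cache dates} from the daily log.
    dates_by_url = defaultdict(set)
    with open(log_file, newline="") as f:
        for checked, url, cache_date in csv.reader(f):
            dates_by_url[url].add(datetime.strptime(cache_date, "%Y-%m-%d").date())
    intervals = {}
    for url, dates in dates_by_url.items():
        ordered = sorted(dates)
        if len(ordered) < 2:
            intervals[url] = None  # never saw a fresh cache during the tracking window
        else:
            gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
            intervals[url] = sum(gaps) / len(gaps)
    return intervals

if __name__ == "__main__":
    for url, avg in recache_intervals().items():
        if avg is None:
            print(f"{url}: no recache observed -- needs fresher content and better promotion")
        elif avg <= 7:  # illustrative cutoff, not an official Google threshold
            print(f"{url}: recached every ~{avg:.0f} days -- Google treats it as important")
        else:
            print(f"{url}: recached every ~{avg:.0f} days -- worth some extra attention")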

If you want to spend more time analyzing the bots that come around, your best bet is to look to your log files for more information.  JavaScript-based web analytics programs such as Google Analytics won’t help with this analysis: spiders don’t run JavaScript, so those programs will never be able to track spiders’ crawls.  Use them strictly for human visitor analysis, where they’re great at giving you a realistic view of actual user behavior.  To track the crawl patterns of spiders, you’ll need log file analyzers that build reports from your raw web server log files.  Ideally, you want to know when a spider comes to a page, the path it takes through your site, and how often it comes back to each page.  Most log file analysis programs can handle this task to some degree.  This kind of detailed information can give you even more insight into which pages are considered the most important – and which need more work.
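
If you’re comfortable with a little scripting, you can also get a quick first look at spider activity without a full log analyzer.  The sketch below is a rough example, assuming an Apache-style combined-format log named access.log and spotting Googlebot by its user-agent string; those names are my assumptions, and in practice you’d also verify the bot via reverse DNS, since anyone can fake that header.

import re
from collections import Counter

# One line of an Apache/Nginx "combined" log looks like:
# IP ident user [timestamp] "METHOD path HTTP/x" status size "referer" "user-agent"
LINE_RE = re.compile(r'\S+ \S+ \S+ \[([^\]]+)\] "(?:GET|HEAD) (\S+) [^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

def googlebot_activity(log_path="access.log"):
    # Count Googlebot hits per URL and remember the last time each URL was fetched.
    hits = Counter()
    last_seen = {}
    with open(log_path) as f:
        for line in f:
            m = LINE_RE.match(line)
            if not m:
                continue
            timestamp, path, user_agent = m.groups()
            if "Googlebot" in user_agent:  # naive check; confirm with reverse DNS in practice
                hits[path] += 1
                last_seen[path] = timestamp
    return hits, last_seen

if __name__ == "__main__":
    hits, last_seen = googlebot_activity()
    for path, count in hits.most_common(20):
        print(f"{count:5d} crawls   last seen {last_seen[path]}   {path}")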

In summary, the more often your page is crawled and cached by Google’s spiders, the more important that page is in Google’s eyes.  The more important the page is, the more likely it is to show up in Google’s search results for related queries.  If the page isn’t being crawled and cached very often, you should focus more efforts on keeping the page updated with useful content, and spend more time promoting the page.

About Donna Fontenot

Donna Fontenot, aka DazzlinDonna, is an Internet Entrepreneur and SEO, who has long utilized search engine optimization and affiliate marketing to create a successful online business.  Her goal as an ebusiness coach is to help others make a living online from the comfort of their homes (and in their pajamas).  Her motto is “You’ll never shine if you don’t glow.”

About The Marketing Experts Series

The above post is part of an occasional series of guest posts by various marketing experts.  Through it we hope to expand our community’s understanding of a variety of marketing techniques and strategies to improve their own online (and offline) marketing efforts.  If you are interested in writing for The Marketing Experts series, let us know.

To get more information on this topic, contact us today for a free consultation or learn more about our status as a Google Partner before you reach out.



Written by Justin Seibert

Justin Seibert is the President of Direct Online Marketing. Justin holds a Bachelor of Arts from Vanderbilt University. He writes on a wide range of online business-oriented topics, including exporting. His contributions can be found in publications such as the Pittsburgh Business Times, AdAge, SES Magazine, and La Voz del interior. Justin and his family enjoy learning about new cultures during their travels.

