Tracking Site Page Importance Via Crawl Cache Dates

Posted on November 24th, 2008 by Justin Seibert in DOM News, Google - SEO, SEO, Search Engines, The Marketing Experts

We’ve been thinking about how we can bring more value to our readers on this here internet marketing blog and settled on running an occasional series of guest posts penned by various marketing thought leaders.  This remained filed in our Good Ideas We Feel Like We’re Too Busy to Implement drawer until now.  What finally got me out of my hammock to implement this series was being featured in a post by Donna Fontenot on SEO Scoop, part of a series meant to spread some love and introduce the search engine marketing community to other, often newer, voices.

DazzlinDonna, as she is better known on the internets, has been involved in this industry for quite some time and really knows her stuff.  I am personally delighted to have Donna kick off our The Marketing Experts series and now present her post on a very specific way you can improve your own search engine optimization efforts (or stay on top of your SEO firm).  - Justin

Over two years ago, way back in September 2006, Google let us know that they were changing the way they display cache dates for our pages.  Specifically, Vanessa Fox said:

We’ve recently changed the date we show for the cached page to reflect when Googlebot last accessed it (whether the page had changed or not). This should make it easier for you to determine the most recent date Googlebot visited the page.

This was good to know because SEOs had already noticed that Google crawling patterns could indicate a sort of “crawling sandbox” and were soon realizing that cache dates were the new PageRank.  Aaron Wall summed it up by saying “What Google frequently visits (and spends significant resources to keep updated) is what they consider important.”

What is important is not just when Google last crawled a web page, but how often it returns to re-cache the page.  This crawl frequency is an excellent indicator of the importance of a page.  A good way to determine which pages will likely struggle to rank well is to see how often Google recaches each page.  Unfortunately, there aren’t any really easy methods of tracking the crawl frequency, especially across many pages of a large website.

Michael Gray recently outlined his method of crawl tracking in his post, How To Figure Out What Parts Of Your Website Aren’t Being Crawled.  This method involves uniquely date stamping every page and then querying search engines over time to see which pages haven’t been cached using that date phrase.  It’s not a bad method, and frankly I don’t have a better one.  I’ve always used a more manual process: run a cache: query for each page, note the cache date in a spreadsheet, and repeat the query daily for a period of one month.  This only works well for a limited number of pages, of course, since this type of manual checking takes time.

There is a crawl rate tracker plugin for WordPress blogs, and a couple of so-so paid tools for tracking crawl rates are available, but I’ve never found the perfect “works for all sites and does it all for free” tool.  For that reason, I always concentrate on tracking just a few pages that are most important to me.  If there are some pages that I’m most concerned with ranking well, I’ll track their cache dates until I’ve determined the “cache-importance” of the page (which usually takes just a month or so).  If a page has been deemed fairly unimportant by Google, based on a very slow cache/crawl rate, then I know it’s time to focus more attention on updating the page and marketing it better.  If the page is being crawled frequently, I’ll focus my attention elsewhere for now.
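
For readers who keep that spreadsheet of cache dates, the arithmetic behind "cache-importance" can be sketched in a few lines of Python. This is only an illustration, not a tool from the post: the function names, the seven-day threshold, and the sample dates are all made up, and the cache dates are assumed to have been noted by hand as described above.

```python
from datetime import date
from statistics import mean


def recache_intervals(observations):
    """Return the gaps, in days, between successive distinct cache dates.

    `observations` holds the cache date noted on each daily check, so the
    same date normally repeats until Google recaches the page.
    """
    distinct = sorted(set(observations))
    return [(later - earlier).days for earlier, later in zip(distinct, distinct[1:])]


def needs_attention(observations, max_avg_days=7):
    """Flag a page whose average recache gap exceeds a threshold.

    The 7-day default is a hypothetical cut-off; pick one that suits your site.
    """
    gaps = recache_intervals(observations)
    if not gaps:
        # The cache date never changed during the tracking window.
        return True
    return mean(gaps) > max_avg_days


# Made-up observations for two pages (not real data).
log = {
    "/": [date(2008, 11, 1), date(2008, 11, 1), date(2008, 11, 3),
          date(2008, 11, 6), date(2008, 11, 8)],
    "/old-page": [date(2008, 10, 2), date(2008, 10, 2), date(2008, 10, 30)],
}

for url, observed in log.items():
    print(url, recache_intervals(observed), needs_attention(observed))
```

A page that is flagged here is a candidate for fresh content and more promotion; a page that is not flagged can wait.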

If you want to spend more time on analyzing the bots that come around, your best bet is to look to your log files for more information.  JavaScript-based web analytics programs such as Google Analytics won’t help with this analysis: spiders don’t run JavaScript, so those types of analytics programs will never be able to track spiders’ crawls.  Use those types of analytics programs strictly for human visitor analysis, as they are great for giving you a more realistic view of actual human visitors.  You’ll need log file analyzers that create reports from your actual web server log files in order to track the crawl patterns of spiders.  Ideally, you want to know when a spider comes to a page, the path it takes through your site, and how often it comes back to each page.  Most log file analysis programs can handle this task to some degree.  This kind of detailed information can give you even more insight into which pages are considered to be the most important – and which need more work.
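
If you would rather not rely on a packaged log analyzer, a rough sketch of the same idea in Python follows. It assumes your server writes the common Apache/Nginx "combined" log format and that Googlebot can be recognized by the string "Googlebot" in the user agent; both are assumptions, and since user agents can be spoofed, treat the counts as approximate. The function name and the sample log lines are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Pattern for the "combined" log format (an assumption about your server setup).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_visits(lines):
    """Map each requested path to the sorted timestamps of Googlebot hits."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # skip unparseable lines and non-Googlebot traffic
        stamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        visits[match.group("path")].append(stamp)
    for stamps in visits.values():
        stamps.sort()
    return dict(visits)


# Made-up sample lines; in practice, pass an open log file instead.
sample = [
    '66.249.66.1 - - [24/Nov/2008:10:15:32 -0500] "GET /blog/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [26/Nov/2008:08:01:10 -0500] "GET /blog/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [26/Nov/2008:09:00:00 -0500] "GET /blog/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows; U)"',
]

for path, stamps in googlebot_visits(sample).items():
    print(path, len(stamps), "visits, last on", stamps[-1].date())
```

Sorting the output by visit count, or by the gap between visits, gives a quick list of the pages the spider returns to most and least often.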

In summary, the more often your page is crawled and cached by Google’s spiders, the more important that page is in Google’s eyes.  The more important the page is, the more likely it is to show up in Google’s search results for related queries.  If the page isn’t being crawled and cached very often, you should focus more efforts on keeping the page updated with useful content, and spend more time promoting the page.

About Donna Fontenot

Donna Fontenot, aka DazzlinDonna, is an Internet Entrepreneur and SEO, who has long utilized search engine optimization and affiliate marketing to create a successful online business.  Her goal as an ebusiness coach is to help others make a living online from the comfort of their homes (and in their pajamas).  Her motto is “You’ll never shine if you don’t glow.”

About The Marketing Experts Series

The above post is part of an occasional series of guest posts by various marketing experts.  Through it we hope to expand our community’s understanding of a variety of marketing techniques and strategies to improve their own online (and offline) marketing efforts.  If you are interested in writing for The Marketing Experts series, let us know.

8 Comments

  1. Post: Tracking Site Page Importance Via Crawl Cache Dates http://twurl.nl/iymuys

    Trackback by Paul Woodhouse on November 24, 2008

  2. Another of my guest posts today: http://cli.gs/trackcache Tracking Site Page Importance Via Crawl Cache Dates

    Trackback by Donna Fontenot on November 24, 2008

  3. Thanks Donna for this nice and informative post.
    I have a question??
    If crawl rate is the new pagerank, then the sites/blogs with frequent updates must be having more crawl rates which means they are more important for search engines. Is this true?

    Comment by Amol on January 16, 2009

  4. Amol, frequent updates can help, but aren’t necessarily going to be the only solution.  (Is there ever just one solution to any seo problem?  LOL, not!)  You should also be getting others to link to your pages/posts.   You see, it’s not just a matter of getting the bot to come to a page soon after you post – but getting it to return to that same page to recache it on a frequent basis.  It won’t return often if it believes the page isn’t important enough or popular enough to do so.  That’s why promotion of the page is important, and frequency of updates doesn’t address that aspect of it.

    Comment by DazzlinDonna on January 16, 2009

  5. @melaniephung not exactly what you’re looking for, but might lead you in the right direction – http://cli.gs/NU4P6p

    Trackback by Donna Fontenot on January 22, 2009

  6. [...] Tracking Site Page Importance Via Crawl Cache Dates [...]

    Pingback by DazzlinDonna » 22 (+) SEO and Social Media Guest Posts on May 12, 2009

  7. [...] in a guest post I wrote on directom.com in November of 2008, I discussed a few ways to track cache date, including Michael [...]

    Pingback by Discover What Google Really Thinks Of Your Pages | Search Engine People Blog on November 23, 2009

  8. I don’t know of any way to directly influence the speed with which Google will crawl and index sites, or update the cache. Setting update frequency in your sitemap or meta tags will have no effect; Google crawls and indexes your site based on other factors, such as site authority and history. A sitemap just helps Google find all your pages eventually (on its own schedule).

    Updates to the Google cache typically trail updates to the actual search index by quite a bit, often days or weeks. You cannot use the cache as an indication of what is in the index. A better way to check is to search on exact strings in your new pages; you may find them indexed but not yet in the cache.

    As to how to speed up crawling and indexing, the best way to do this is to build the authority of your site, and that means building inbound links.

    Comment by Barny on January 28, 2010

© 2006-10, Direct Online Marketing™