Bringing a Website Out of the Shadows
If your site can’t get found, it doesn’t matter how good it is.
For your products or services to show up in a Google search, you need to ensure that the search engine is “indexing” your website, storing your content so it can be ranked against competing sites. Wiley Efficient Learning’s site performance indicated an indexing problem.
Rebel had the answer: a Site Crawl & Indexation Audit that would help remove the roadblocks.
A Website in the Dark
There are any number of reasons that Google may be unable to effectively crawl and index a site. (Crawling is how Googlebot, Google’s automated crawler, discovers your pages; indexing is how Google stores that information so your pages can be ranked.) Keyword, metadata, and status-code issues, or related signals, can block Googlebot’s access, so your content either never shows up in search results or a URL serves the wrong content, greeting searchers with the dreaded “404 Page Not Found” error. That was the issue for Wiley Efficient Learning: poor indexing was keeping the website off the search radar.
Uncovering the Roadblocks
The only way to truly gauge where the obstacles were, especially given the overall expansiveness of the Wiley site, was through a comprehensive Site Crawl & Indexation Audit.
The exhaustive exploration included, but was not limited to:
URL statuses across the site. One of the first things we look at when crawling large websites is the spread of response codes returned by URLs across the site. These status codes matter because they can determine whether, and how, Google crawls your website. While reviewing efficientlearning.com, we found a wide range of status codes, some of which can stop crawling altogether.
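As a rough illustration of this first step, crawl results can be grouped by response-code class to spot problem areas at a glance. This is a minimal sketch, not the audit tooling Rebel used; the URLs and status codes below are hypothetical examples, not Wiley data.

```python
# Sketch: group crawled URLs by response-code class (2xx, 3xx, 4xx, 5xx).
# URLs and statuses are hypothetical, not from the Wiley audit.
from collections import Counter

def status_breakdown(crawl_results):
    """Count crawled URLs by response-code class."""
    buckets = Counter()
    for url, status in crawl_results:
        buckets[f"{status // 100}xx"] += 1
    return dict(buckets)

crawl_results = [
    ("/cpa-review/", 200),    # OK: crawlable and indexable
    ("/old-course/", 301),    # permanent redirect
    ("/retired-exam/", 404),  # not found: wasted crawl budget
    ("/checkout/", 500),      # server error: can halt crawling
]

print(status_breakdown(crawl_results))
# → {'2xx': 1, '3xx': 1, '4xx': 1, '5xx': 1}
```

A breakdown like this quickly shows whether a site is serving mostly healthy 200s or leaking crawl budget on redirects and errors.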
Robots.txt files. These are instructions for crawlers like Googlebot. When a crawler arrives at your website, the first thing it checks is your robots.txt file, which tells it which pages it is allowed to visit and where your sitemaps are located. For Wiley, we found “broken” robots.txt files resulting in approximately 78 unindexed pages, all of which had valuable content.
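To see how these instructions work in practice, Python’s standard-library robots.txt parser can show which URLs a given set of rules lets a crawler fetch. The rules and URLs below are hypothetical examples, not Wiley’s actual file.

```python
# Sketch: how a crawler reads robots.txt rules, via the stdlib parser.
# The rules and URLs are hypothetical, not Wiley's actual robots.txt.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Any crawler may fetch course pages, but not the checkout flow.
print(parser.can_fetch("Googlebot", "https://www.example.com/cpa-review/"))   # → True
print(parser.can_fetch("Googlebot", "https://www.example.com/checkout/cart")) # → False
```

A malformed or overbroad `Disallow` rule works the same way in reverse: pages you want indexed quietly become unreachable, which is how valuable content ends up unindexed.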
Sitemaps. Sitemaps tell search engines which pages on your site are most important. The rule of thumb: include only pages that return a 200 status code (the response code indicating a request has succeeded) and exclude pages that cannot be indexed. In reviewing Wiley’s sitemaps, Rebel found some sitemaps incorrectly redirecting back to the home page, hundreds of URLs excluded from sitemaps or orphaned, and duplicate and non-indexable URLs.
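The rule of thumb above can be sketched as a simple filter: keep only 200-status, indexable pages, then emit sitemap XML. This is an illustrative sketch with hypothetical page data, not the audit’s actual tooling.

```python
# Sketch: sitemap hygiene - keep only 200-status, indexable URLs,
# then emit sitemap XML. Page data is hypothetical, not Wiley's.
import xml.etree.ElementTree as ET

pages = [
    {"loc": "https://www.example.com/cpa-review/", "status": 200, "indexable": True},
    {"loc": "https://www.example.com/old-course/", "status": 301, "indexable": False},
    {"loc": "https://www.example.com/private/",    "status": 200, "indexable": False},
]

def build_sitemap(pages):
    """Emit sitemap XML containing only 200-status, indexable pages."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in pages:
        if page["status"] == 200 and page["indexable"]:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = page["loc"]
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap(pages))
```

Of the three hypothetical pages, only the first survives the filter; redirecting and non-indexable URLs are exactly the entries that should never appear in a sitemap.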
Canonical tags. Canonical tags suggest to Google which URL is the authoritative version of a page and should appear in search results. This is helpful when exact or near duplicates of a page exist across a website. For efficientlearning.com, we found 500 missing canonical tags and several canonical tags pointing to non-indexable URLs.
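A canonical tag is just a `<link rel="canonical">` element in a page’s `<head>`, so an audit can flag pages where it is missing. Here is a minimal sketch using Python’s standard-library HTML parser; the HTML snippets are hypothetical examples.

```python
# Sketch: flag pages missing a canonical tag, using the stdlib HTML parser.
# The HTML snippets are hypothetical examples, not Wiley pages.
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect the href of any <link rel="canonical"> tag."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")

def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical  # None means the page has no canonical tag

page_with_tag = '<head><link rel="canonical" href="https://www.example.com/cpa-review/"></head>'
page_without_tag = "<head><title>Untitled duplicate</title></head>"

print(find_canonical(page_with_tag))     # → https://www.example.com/cpa-review/
print(find_canonical(page_without_tag))  # → None
```

Run across a crawl, every page that returns `None` is a candidate for the “missing canonical” list; a returned URL can then be checked to confirm it points at an indexable page.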
The point of the Site Crawl & Indexation Audit is that most of us simply don’t know what we don’t know. The audit uncovered hundreds of “hidden” obstacles to an optimized, Google-friendly Wiley Efficient Learning site.