Are you troubleshooting another product disapproval in Google Merchant Center (GMC)? If it says “Product pages cannot be crawled because of robots.txt restriction” you’ve come to the right place. Find out some of the most common reasons Google Shopping products are disapproved because of a robots.txt restriction.
What is a robots.txt and why is it important?
The Robotstxt website explains: “Website owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.” Describing a two-line example file, it continues: “The ‘User-agent: *’ means this section applies to all robots. The ‘Disallow: /’ tells the robot that it should not visit any pages on the site.” Place your robots.txt file in the root of the website.
Most websites publish a robots.txt file for a variety of reasons:
- Optimize organic search engine ranking – tell search engines which web pages to crawl and which ones to skip
- Optimize paid search ads – a poorly constructed robots.txt can lower your quality scores, keep your ads from being approved, get your products disapproved in GMC as part of your product listing ads and create a multitude of other problems
- Comply with advertising guidelines – advertising guidelines, especially for pharmaceutical products in some countries, may restrict a whole website or just pharmaceutical product pages from displaying in search
- Remove content not useful to search engines – login pages, duplicate content, some PDFs, thank you pages and any other content that doesn’t make sense to index can be put in folders that you disallow.
NOTE: A robots.txt file may not always achieve the above goals, especially if your aim is to keep certain pages out of search engines entirely.
Reasons your pages still could be crawled by Google
SEMrush warns website owners that their “website content, even if it is disallowed in your robot.txt file, may still be indexed if the page has been linked from an external source, the bots will still flow through and index the page. Illegitimate bots will still crawl and index the content.”
Why the error “Product pages cannot be crawled because of robots.txt restriction” can prevent your product pages from being approved
Google looks at the user experience when approving shopping campaigns. Its crawlers follow the rules in your robots.txt file when they crawl your product pages to compare them with your Google Shopping ads. If the crawlers are blocked from a page, or the page content doesn’t match the ad, Google may disapprove the affected products.
- Google can’t access your landing pages
For Google to access your whole site, ensure that your robots.txt file allows the user-agent ‘Googlebot’ to crawl your website.
- Google can’t access your images
For Google to access your images, your robots.txt file should allow the user-agent ‘Googlebot-image’ to crawl your site.
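To see how these two checks play out, here is a minimal Python sketch that uses the standard library's `urllib.robotparser`. The rules, the `example.com` URLs and the helper name `crawl_access` are all made up for illustration. Keep in mind that Python's parser uses plain prefix matching in file order rather than Google's wildcard and longest-match rules, so treat the result as an approximation, not as Google's verdict.

```python
from urllib.robotparser import RobotFileParser


def crawl_access(robots_txt, checks):
    """Return {(user_agent, url): allowed} for each (user_agent, url) pair."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {(agent, url): parser.can_fetch(agent, url) for agent, url in checks}


# Hypothetical file that blocks product pages and product images.
BLOCKING = """\
User-agent: *
Disallow: /checkout/
Disallow: /products/

User-agent: Googlebot-Image
Disallow: /media/
"""

# Hypothetical fix: only the checkout stays off limits for every crawler.
FIXED = """\
User-agent: *
Disallow: /checkout/
"""

CHECKS = [
    ("Googlebot", "https://example.com/products/blue-shirt"),
    ("Googlebot-Image", "https://example.com/media/catalog/product/blue-shirt.jpg"),
]

for name, text in (("blocking", BLOCKING), ("fixed", FIXED)):
    for (agent, url), allowed in crawl_access(text, CHECKS).items():
        print(name, agent, url, "allowed" if allowed else "BLOCKED")
```

With the first file both the landing page and the image come back blocked; with the second file both are allowed.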
Magento eCommerce robots.txt file sample
If you are looking for a sample to start from, here is an example of a Magento robots.txt file from the digital marketing agency Blue Acorn:
User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /customer/
Disallow: /checkout/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Allow: /media/catalog/product/
Disallow: /*.php$
Disallow: /skin/
Disallow: /catalog/product/view/
User-agent: Googlebot-Image
Disallow: /
Allow: /media/catalog/product/
Sitemap: http://example.com/sitemap/sitemap
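Before you copy a sample like this, it helps to check what it does to your own product URLs. Google documents that its crawlers treat `*` and `$` as wildcards and apply the most specific (longest) matching rule, with Allow winning a tie. The Python sketch below is one way to approximate that logic; the function names and the test paths such as `/blue-shirt.html` are assumptions for illustration, and the sketch is not Google's actual implementation.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Apply the longest matching rule; on a tie, Allow wins. No match means allowed."""
    best_length, allowed = -1, True
    for directive, pattern in rules:
        if not pattern or not pattern_to_regex(pattern).match(path):
            continue
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
            best_length, allowed = len(pattern), is_allow
    return allowed


# The "User-agent: *" group from the sample (Googlebot falls back to this group).
ALL_ROBOTS = [
    ("Disallow", "/index.php/"), ("Disallow", "/*?"), ("Disallow", "/*.js$"),
    ("Disallow", "/*.css$"), ("Disallow", "/customer/"), ("Disallow", "/checkout/"),
    ("Disallow", "/js/"), ("Disallow", "/lib/"), ("Disallow", "/media/"),
    ("Allow", "/media/catalog/product/"), ("Disallow", "/*.php$"),
    ("Disallow", "/skin/"), ("Disallow", "/catalog/product/view/"),
]
# The "User-agent: Googlebot-Image" group from the sample.
GOOGLEBOT_IMAGE = [("Disallow", "/"), ("Allow", "/media/catalog/product/")]

print(is_allowed(ALL_ROBOTS, "/blue-shirt.html"))                    # True
print(is_allowed(ALL_ROBOTS, "/blue-shirt.html?utm_source=google"))  # False
print(is_allowed(ALL_ROBOTS, "/catalog/product/view/id/42"))         # False
print(is_allowed(GOOGLEBOT_IMAGE, "/media/catalog/product/b/blue-shirt.jpg"))  # True
```

Note the second result: under this sample, any product link in your feed that carries a query string is blocked by `Disallow: /*?`, and so is any link that uses the `/catalog/product/view/` path.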
The Google Robots.txt Test
What’s next? Once you have finished your robots.txt file, the next step is to test it. Google’s robots.txt tester shows whether specific URLs on your site are blocked from Google web crawlers.
The Google robots.txt test webpage suggests the following steps to test your file:
- Open the tester tool for your site, and scroll through the robots.txt code to locate the highlighted syntax warnings and logic errors. The number of syntax warnings and logic errors is shown immediately below the editor.
- Type in the URL of a page on your site in the text box at the bottom of the page.
- Select the user-agent you want to simulate in the dropdown list to the right of the text box.
- Click the TEST button to test access.
- Check to see if the TEST button now reads ACCEPTED or BLOCKED to find out if the URL you entered is blocked from Google web crawlers.
- Edit the file on the page and retest as necessary. Note that changes made in the page are not saved to your site! See the next step.
- Copy your changes to your robots.txt file on your site. This tool does not make changes to the actual file on your site; it only tests against the copy hosted in the tool.
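If you have many product URLs to check, the manual steps above can be roughly approximated with a short script. This is a sketch only: `load_robots`, `parse_robots` and `blocked_urls` are hypothetical helper names, the URLs are placeholders, and Python's `urllib.robotparser` does not understand wildcards the way Google does, so confirm anything it flags in Google's own tester.

```python
from urllib.robotparser import RobotFileParser


def load_robots(site_root):
    """Download and parse <site_root>/robots.txt (this makes a network request)."""
    parser = RobotFileParser()
    parser.set_url(site_root.rstrip("/") + "/robots.txt")
    parser.read()
    return parser


def parse_robots(text):
    """Parse robots.txt content that you already have as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def blocked_urls(parser, urls, agent="Googlebot"):
    """Return the URLs that the given user-agent may not fetch."""
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Offline demo with made-up rules and URLs; use load_robots() for a live site.
demo = parse_robots("User-agent: *\nDisallow: /checkout/\n")
feed_urls = [
    "https://example.com/checkout/cart",
    "https://example.com/blue-shirt",
]
print(blocked_urls(demo, feed_urls))
```

Swap the demo parser for `load_robots("https://your-site.example")` and pass in the landing page links from your product feed to get a list of URLs worth a closer look.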
For more help troubleshooting GMC product disapprovals, contact Highstreet.io to speak with experts in Product Feed Management.