The “WordPress SEO Tutorial: Disallow WP-Admin via a robots.txt” article is part of a series of SEO tutorials supporting the Stallion WordPress SEO Plugin and the Stallion Responsive WordPress SEO Theme.
This WordPress SEO tutorial covers how to stop Google and other search engines from indexing pages under /wp-admin/ via a robots.txt file.
Currently (October 2016) there are no Stallion SEO Plugin or Stallion Responsive Theme features to create or manage a robots.txt file, so use an FTP program like FileZilla to upload your robots.txt file.
Return to the main Stallion WordPress SEO Plugin Documentation article.
Stallion WordPress SEO Plugin Disallow WP-Admin Pages Tutorial
In the Stallion WordPress SEO Plugin version 2.* and the Stallion Responsive Theme version 8.4 there’s an option to block /wp-admin/ pages via a canonical URL. While upgrading the Stallion WordPress SEO Plugin to version 3.0.0, I realized the feature doesn’t work. Doh!
For WordPress to add a canonical URL to a webpage, the page has to be generated by WordPress and loaded by a browser or search engine spider, so that viewing the page’s source shows the canonical URL in the code.
View the source of this post and you’ll see a canonical URL, but directly load a WordPress core PHP file under /wp-admin/ and the result will either be blank or an error, which means no canonical URL. There’s no way to block the indexing of PHP pages under /wp-admin/ by adding a canonical URL :-)
I will be removing the Plugin Admin Pages options in both the Stallion SEO Plugin version 3.0.0 and the Stallion Responsive Theme version 8.5. No harm done, just a waste of time.
WordPress SEO Tutorial
That being said, it does make sense to block search engines from indexing everything under /wp-admin/, and the easiest way to achieve this is via your site’s robots.txt file. I’m afraid that means using an FTP program like FileZilla to upload a modified robots.txt file to your server.
PHP Files Indexed Under /wp-admin/
Everything under example.tld/wp-admin/ is an admin resource and shouldn’t be indexed by Google. Occasionally, Google indexes WordPress pages under the admin section by mistake.
A simple Google site: search shows whether this issue exists on your WordPress site: copy the entire line below and paste it into a Google search.
site:https://stallion-theme.co.uk/wp-admin/
Replace this site’s domain name https://stallion-theme.co.uk/ with your site’s domain name.
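If you check several sites, a tiny script can build the query for you. This is a minimal sketch; wp_admin_site_query is a hypothetical helper name, and it assumes https when you type a bare domain without a scheme.

```python
from urllib.parse import urlsplit


def wp_admin_site_query(site_url: str) -> str:
    """Build the Google site: query for a site's /wp-admin/ folder.

    Hypothetical helper: it only swaps your domain into the query above.
    Assumption: a bare domain (no scheme) is treated as https.
    """
    if "://" not in site_url:
        site_url = "https://" + site_url  # assumption: the site uses https
    parts = urlsplit(site_url)
    # Keep the scheme and host; discard any path or query pasted in.
    return f"site:{parts.scheme}://{parts.netloc}/wp-admin/"


print(wp_admin_site_query("https://stallion-theme.co.uk/"))
# prints site:https://stallion-theme.co.uk/wp-admin/
```

Paste the printed line into Google exactly as you would the hand-edited version.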
The Google site: search for this website shows no pages indexed under the /wp-admin/ folder (this is what you want to see: the robots.txt file blocks that folder). This site doesn’t have a problem.
On some servers, loading a folder under the WordPress admin section, such as example.tld/wp-admin/includes/, will list the PHP files and other files it contains.
It was easy enough to find a site where this happens.
This isn’t ideal for security reasons. It’s disappointing that the WordPress development team hasn’t added ‘blank’ index.php files (an index.php file containing the single line of code <?php // Silence is golden) to all /wp-admin/ (and /wp-includes/) folders, so that anyone loading them in a browser (including Google’s search engine spiders) sees a blank page.
Since PHP pages under the admin folders could be indexed, it makes sense to block them via the robots.txt file as described below.
WordPress robots.txt File Contents
The content of this site’s robots.txt file (it’s just a text file called robots.txt) can be seen by loading it in a browser. First step: load your robots.txt file in a browser to see whether one exists and, if so, what’s in it. To see your robots.txt, load your domain’s home page and add /robots.txt to the end, like this:
https://stallion-theme.co.uk/robots.txt
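If you’d rather check from a script than a browser, here’s a minimal Python sketch using only the standard library. The helper names robots_txt_url and fetch_robots_txt are hypothetical; the sketch relies on the fact that robots.txt always sits at the root of the host, and it treats a 404 response as “no robots.txt file”.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_txt_url(page_url: str) -> str:
    """robots.txt lives at the root of the host, so joining the absolute
    path /robots.txt onto any URL from the site gives its address."""
    return urljoin(page_url, "/robots.txt")


def fetch_robots_txt(page_url: str, timeout: float = 10.0) -> Optional[str]:
    """Download the site's robots.txt; return None if the server answers 404."""
    try:
        with urlopen(robots_txt_url(page_url), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if err.code == 404:
            return None  # no robots.txt file exists yet
        raise  # any other HTTP error is unexpected, so surface it


print(robots_txt_url("https://stallion-theme.co.uk/some-post/?p=1"))
# prints https://stallion-theme.co.uk/robots.txt
```

Calling fetch_robots_txt("https://your-domain/") then returns the same text you would see in the browser.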
If a robots.txt file doesn’t exist, create a new file in a plain text editor, save it as robots.txt and upload it to the root folder of your domain (so it loads at the address shown above).
The important WordPress-related lines from this site’s robots.txt file are:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /readme.html
Disallow: /license.txt
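Before uploading, you can sanity-check rules like these with Python’s built-in urllib.robotparser module. This is a sketch, not a guarantee of Googlebot’s behaviour: Python’s parser applies the first matching rule and ignores wildcards inside paths, whereas Google uses the longest matching rule. These rules contain no Allow lines or wildcards, so the difference doesn’t matter here. can_crawl is a hypothetical helper and example.tld a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# The WordPress-relevant rules from this site's robots.txt, as quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /readme.html
Disallow: /license.txt
"""


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs on a placeholder example.tld WordPress site.
for path in ("/", "/a-blog-post/", "/wp-admin/options.php",
             "/wp-includes/js/jquery/jquery.js", "/readme.html", "/license.txt"):
    verdict = "allowed" if can_crawl(ROBOTS_TXT, "https://example.tld" + path) else "blocked"
    print(f"{path:35} {verdict}")
```

The home page and the post print “allowed”; the four WordPress paths print “blocked”.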
The first line:
User-agent: *
User-agent means: for this user agent (typically a search engine spider or other crawler), apply the rules on the lines that follow. Since the user agent is set to the wildcard (*), the rules apply to ALL user agents (all search engine spiders; this can also be set to a specific search engine spider, such as Googlebot).
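The difference between the wildcard and a named spider can be seen with Python’s urllib.robotparser. A minimal sketch, assuming Googlebot and Bingbot as the example crawler names and example.tld as a placeholder domain; make_parser is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Wildcard group: the rule applies to every crawler.
everyone = make_parser("User-agent: *\nDisallow: /wp-admin/\n")

# Named group: only the crawler whose name matches is affected.
google_only = make_parser("User-agent: Googlebot\nDisallow: /wp-admin/\n")

url = "https://example.tld/wp-admin/"
for agent in ("Googlebot", "Bingbot"):
    print(agent, everyone.can_fetch(agent, url), google_only.can_fetch(agent, url))
# prints:
# Googlebot False False
# Bingbot False True
```

With the wildcard both spiders are blocked; with the named group Bingbot finds no rules for itself and is allowed in.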
The next line:
Disallow: /wp-admin/
is our first rule. To a search engine spider, Disallow means do not spider or index anything under the /wp-admin/ folder (this includes sub-folders under /wp-admin/, like /wp-admin/includes/).
A robots.txt file with the following two lines will stop Google from indexing anything under /wp-admin/.
User-agent: *
Disallow: /wp-admin/
Job done: Google respects the Disallow rule, so Google won’t index anything under /wp-admin/.
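Disallow works as a path-prefix match, which is why one line covers every sub-folder. A quick check of the two-line file with Python’s urllib.robotparser (a sketch; example.tld is a placeholder and the URLs are illustrative):

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the tutorial.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /wp-admin/"])

# Disallow is a path-prefix match, so every sub-folder of /wp-admin/ is covered.
print(parser.can_fetch("Googlebot", "https://example.tld/wp-admin/"))
print(parser.can_fetch("Googlebot", "https://example.tld/wp-admin/includes/file.php"))
# Front-end URLs such as category archives are untouched.
print(parser.can_fetch("Googlebot", "https://example.tld/category/wordpress-seo/"))
# The trailing slash in the rule matters: plain /wp-admin is not a prefix match.
print(parser.can_fetch("Googlebot", "https://example.tld/wp-admin"))
```

This prints False, False, True, True: the admin folder and everything below it are blocked, while the rest of the site stays crawlable.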
The next rule:
Disallow: /wp-includes/
stops search engines from indexing everything under /wp-includes/.
Disallowing the /wp-includes/ folder isn’t always a good idea, because it contains WordPress CSS (Cascading Style Sheets) and JS (JavaScript) files that are sometimes loaded by WordPress theme/plugin features on the front end. The front end is what users and search engines see when they load your home page, categories, posts etc…
I can safely use this rule because this site uses the W3 Total Cache plugin, set to combine all CSS files and all JS files. Even if this site loads CSS/JS files from the /wp-includes/ folder, they aren’t linked in the code via the /wp-includes/ folder: view the source of this page and you won’t find any links to files under /wp-includes/. The W3 Total Cache Plugin (with the right settings) combines the CSS and JS files (for performance and SEO reasons) and serves them from another folder.
This domain can safely disallow the /wp-includes/ folder as long as the W3 Total Cache plugin is active.
To check your site, load the home page, a category, a tag, a post and a page and view their HTML source: in Firefox, “Right Click” on the webpage, then Left Click “View Page Source”.
Search through the HTML code for instances of /wp-includes/. If you find any (most likely JS files), don’t use the “Disallow: /wp-includes/” rule. Google doesn’t like it, and you might get notifications via Google Webmaster Tools (now called Google Search Console) about blocked JS/CSS files.
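If you have several pages to check, the search can be scripted. A minimal sketch using Python’s standard html.parser: paste in the HTML from “View Page Source” and it lists every src or href that points under /wp-includes/. WpIncludesFinder and find_wp_includes_refs are hypothetical names, and the sample HTML is made up. It only inspects src/href attributes, so a plain text search for wp-includes is still worth doing to catch references inside inline scripts.

```python
from html.parser import HTMLParser
from typing import List


class WpIncludesFinder(HTMLParser):
    """Collects src/href attribute values that point under /wp-includes/."""

    def __init__(self) -> None:
        super().__init__()
        self.found: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("src", "href") and value and "/wp-includes/" in value:
                self.found.append(value)


def find_wp_includes_refs(html: str) -> List[str]:
    finder = WpIncludesFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Made-up page source: one combined CSS file, one script from /wp-includes/.
sample_html = """<html><head>
<link rel='stylesheet' href='https://example.tld/wp-content/cache/combined.css' />
<script src='https://example.tld/wp-includes/js/jquery/jquery.js'></script>
</head><body><a href='https://example.tld/a-blog-post/'>Post</a></body></html>"""

print(find_wp_includes_refs(sample_html))
# prints ['https://example.tld/wp-includes/js/jquery/jquery.js']
```

An empty list for every page you check suggests the “Disallow: /wp-includes/” rule is safe for your site; any result means leave it out.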
WordPress readme.html File and license.txt Security Concerns
The last two lines:
Disallow: /readme.html
Disallow: /license.txt
are related to two WordPress files that contain the WordPress version number (every version of WordPress has a unique version number). Hackers search for the WordPress version number to find WordPress sites that haven’t been updated and are running versions of WordPress with security vulnerabilities.
Search Google for phrases within the readme.html file like this line:
Version 4.6.1 Semantic Personal Publishing Platform
or
“WordPress has no multi-million dollar marketing campaign or celebrity sponsors, but we do have something even better—you.”
The number of results indicates how many sites can easily be found this way.
Blocking these files via the robots.txt file is a no-brainer.
If you’ve read a WordPress SEO tutorial recommending blocking other parts of a WordPress site, it’s almost certainly wrong. Blocking anything from search engines this way is severe: for example, some people recommend blocking all your tag archives via Disallow: /tag/ or your category archives via Disallow: /category/.
This is a huge mistake. Instead, use the Stallion WordPress SEO Plugin options related to Blocking Tags and Blocking Categories, which recover most of the link benefit (it’s generally not a good idea to block categories and tags).
David Law