Alien Road Company

Control what you share with Google

Control what information Google sees on your site and what is shown in search results. There are a few reasons you might want to hide content from Google:

  • To keep data private: You might have private data hosted on your site that you want to keep other users from accessing. You can block Google from crawling such data so it doesn’t show up in search results.
  • To hide content of less value to your audience: Your website might have the same content in different places, which could negatively affect your page rankings in Google Search. A common source of duplicate content is a site-wide search function that helps users navigate your site: some search functions generate and display a custom search results page for every query a user enters. If those pages are not blocked, Google can crawl each of them individually, see a site with many similar pages, and categorize the duplicate content as spam, which could undermine your rankings. Your website might also republish information generated by third-party sources that is available elsewhere on the web. Google sees less value in including pages made up largely of duplicated content in its search results, so blocking the copied content can improve what Google sees and help your page rankings.
  • To have Google focus on your important content: If you have a very large site (many thousands of URLs) and pages with less important content, or if you have a lot of duplicate content, you might want to prevent Google from crawling the duplicate or less important pages in order to focus on your more important content.

How to block content

Here are the main ways to block content from appearing in Google:

Methods

  • Remove the content (all content types): Removing content from your site is the best way to ensure that it won’t appear in Google Search, or anywhere else. If the information already appears in Google, you might need to take additional steps to make your removal permanent.
  • Password-protect your files (all content types): If you have confidential or private content that you don’t want to appear in Google search results, the simplest and most effective way to block private URLs is to store them in a password-protected directory on your site server. Googlebot and all other web crawlers are unable to access content in password-protected directories. Advanced users: If you’re using Apache Web Server, you can edit your .htaccess file to password-protect the directory on your server. There are many tools on the web that can help you do this.
  • robots.txt and/or emergency image removal request (images): Use robots.txt rules to block images.
  • noindex directive (web pages): noindex tells Google not to include your page in its search results. Google must still be able to crawl the page to see the tag, and the page can still be linked to from other pages or visited directly by users, but it will not appear in Google search results. This method requires some technical savvy, and may not be available if you use a content management system to host your site.
  • Opt out of specific Google properties (web pages): You can tell Google not to include content from your site in specific Google properties, rather than in all of them.
  • nosnippet meta tag (search result snippets): Add the <meta name="robots" content="nosnippet"> tag to your page’s HTML head section to prevent a snippet from appearing in Search. Note that this can produce a confusing message in search results (“No information is available for this page”).
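For the password-protection method, HTTP Basic Auth on Apache is one common setup. A minimal sketch, assuming Apache with mod_authn_file enabled; the password-file path is illustrative:

```
# .htaccess placed in the directory you want to protect
AuthType Basic
AuthName "Restricted content"
# Password file created beforehand with: htpasswd -c /path/to/.htpasswd username
AuthUserFile /path/to/.htpasswd
Require valid-user
```

Because crawlers can’t supply credentials, every URL under the protected directory returns 401 to Googlebot, so its content can’t be crawled or indexed.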

Remove existing content from Google

Learn how to Remove a page hosted on your site from Google.

Remove a page hosted on your site from Google

If you don’t own the page, see Remove your personal information from Google instead.

For quick removals, use the Removals tool to remove a page hosted on your site from Google’s search results within a day.

Protect or remove all variations of the URL for the content that you want to remove. In many cases, different URLs can point to the same page, for example: example.com/puppies, example.com/PUPPIES, and example.com/petchooser?pet=puppies. Learn how to find the right URL to block.

Make your removal permanent 

Requests made in the Removals tool last for about 6 months. To permanently block a page from Google Search results, take one of the following actions:

  • Remove or update the content on your page. This is the most secure way to prevent your information from appearing in other search engines that might not respect the noindex tag. It also ensures that other people can’t access your page.
  • Password-protect your page. Limiting access to your page enables the right users to view your page, while preventing Googlebot and other web crawlers from accessing it.
  • Add a noindex tag to your page. A noindex tag only blocks your page from showing up in Google search results. Users and other search engines that don’t support noindex can still access your page.
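A noindex rule can be applied in the page’s HTML head, for example:

```
<!-- In the page's <head> section -->
<meta name="robots" content="noindex">
```

For non-HTML files such as PDFs, the same rule can be sent as an `X-Robots-Tag: noindex` HTTP response header. In either case the page must not be blocked by robots.txt, or Google will never fetch the page and see the rule.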

Don’t use robots.txt as a way to block your page. Learn more.

Remove an image from search results 

Learn how to remove images that are hosted on your site from search results.

Remove information from other Google properties 

To remove content from other Google properties, search the help documentation for your product to learn how to remove it.

How do I remove content from a site that I don’t own?

See the help article on how to Remove your personal information from Google.

Remove images hosted on your site from search results

Trying to remove images of yourself? See Remove your personal information from Google instead.

For quick removal, use the Removals tool to remove images hosted on your site from Google’s search results within hours.

For non-emergency image removal

To prevent images from your site from appearing in Google’s search results, add a robots.txt file to the root of the server that blocks the image. While it takes longer to remove an image from search results this way than it does with the Removals tool, it gives you more flexibility and control through the use of wildcards or subpath blocking. It also applies to all search engines, whereas the Removals tool only applies to Google.

For example, if you want Google to exclude the dogs.jpg image that appears on your site at www.yoursite.com/images/dogs.jpg, add the following to your robots.txt file:

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

The next time Google crawls your site, we’ll see this rule and drop your image from our search results.
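The effect of a rule like this can be sanity-checked locally with Python’s standard-library robots.txt parser. A sketch; note that urllib.robotparser implements only the basic rules and does not understand wildcard extensions such as * and $:

```python
import urllib.robotparser

# The rule from the example above, as a crawler would fetch it from /robots.txt
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The blocked image is disallowed for Googlebot-Image...
print(rp.can_fetch("Googlebot-Image", "https://www.yoursite.com/images/dogs.jpg"))  # False
# ...while other images on the same site remain crawlable.
print(rp.can_fetch("Googlebot-Image", "https://www.yoursite.com/images/cats.jpg"))  # True
```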

Rules may include special characters for more flexibility and control. The * character matches any sequence of characters, and patterns may end in $ to indicate the end of a path.

To remove multiple images on your site from our index, add a disallow rule for each image, or, if the images share a common pattern such as a suffix in the filename, use the * character in the filename. For example:

User-agent: Googlebot-Image
# Repeated 'disallow' rules for each image:
Disallow: /images/dogs.jpg
Disallow: /images/cats.jpg
Disallow: /images/llamas.jpg

# Wildcard character in the filename for
# images that share a common suffix:
Disallow: /images/animal-picture-*.jpg

To remove all the images on your site from our index, place the following robots.txt file in your server root:

User-agent: Googlebot-Image
Disallow: /

To remove all files of a specific file type (for example, to remove .gif images while keeping .jpg images), use the following robots.txt entry:

User-agent: Googlebot-Image
Disallow: /*.gif$

Specifying Googlebot-Image as the user agent excludes the images from Google Images only. If you would like to exclude the images from all Google searches (including Google Search and Google Images), specify the Googlebot user agent instead.
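For example, to keep everything under an images directory out of all Google search surfaces, a robots.txt sketch (the directory path is illustrative):

```
User-agent: Googlebot
Disallow: /images/
```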

How do I remove images from properties that I don’t own?

See the Google Search help documentation on how to remove an image from search results.

When publishing documents and images on the web, you may unintentionally publish information beyond what is immediately visible to the human eye. In particular, information that you might not see, or that was intended to be redacted, might be included in some document formats and visible to search engines.

Because search engines index public material on the web, including images, content that is not completely redacted can potentially be findable in search engines. Assistive technologies like screen readers can make this seemingly “hidden” content more easily accessible, and common image understanding techniques like optical character recognition (OCR) similarly make it possible to search for this content.

Even though putting text in a tiny font, using a font color that’s the same as the background the text is on, or covering text with an image may make something invisible to the human eye, these methods don’t actually redact material in a way that prevents search engines from indexing it and making it findable.

Similarly, some document types include information in various ways that aren’t immediately visible. They might include the document’s change history, allowing users to see text that has been redacted or altered. They might retain the full versions of images that contain cropped or redacted information. There might also be metadata that’s included in a file, which is not immediately visible, that may list the names of people who accessed or edited the file.

All of this information can remain even when a document is exported or converted from one format to another. If you need to remove information from a file, it’s critical that the information is removed completely from the file before that file is made public.

Here are some best practices for how to appropriately redact information from documents that you don’t want to be indexed and made discoverable via Google Search.

Edit and export images before embedding them

Google Search lists images that it finds across the web, both those that appear on web pages and those that are embedded in various document formats. Embedded images are sometimes edited using only the containing document’s editing tools, which can cause the redaction to fail when an image is indexed apart from the document. That is why it’s best to edit images before embedding them into a document, not after. In particular:

  • Crop out unwanted information from images before embedding them into documents. Some document editing tools (such as word processors or slide creation tools) will maintain any uncropped images that you use in the public version of the document, so be sure to review the tool’s documentation thoroughly.
  • Completely remove or obscure any text or other non-public parts of the image, as OCR systems may turn any image text seen into searchable text.
  • Remove any undesired metadata.

After following the suggestions in this document, export or save the updated images in a non-vector, flattened image file format such as PNG or WebP. This prevents the removed parts of the images from being inadvertently included in a public document.

Edit or remove unwanted text before moving to a public file format

Before you generate the public document, remove any text that you don’t want displayed in the final version of the file. Move to a public format that does not keep your previous change history. Here are more specific tips:

  • Use proper document redacting tools if a file needs to have information redacted. For example, avoid placing black rectangles over text as a redaction method, as this can result in the text still being included in the public document.
  • Double-check the document metadata in the public file.
  • Follow the document redaction best practices for the format that you are using (PDF, image, etc.).
  • Consider information in the URL or file name itself. Even if a part of a website is blocked by robots.txt, the URLs may be indexed in search (without their content). Use hashes in URL parameters instead of email addresses or names.
  • Consider using authentication to limit access to the redacted content. Serve the resulting login page with a noindex robots meta tag to block indexing.
  • When publishing, make sure that the website is verified in Google Search Console. This allows quick removal action, if needed.
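The suggestion above to use hashes in URL parameters instead of email addresses or names can be sketched in Python. The parameter name and salt are illustrative; keep the salt secret so the token can’t be reversed by hashing guessed inputs:

```python
import hashlib

def user_token(email: str, salt: str) -> str:
    """Derive an opaque, URL-safe token from an email address."""
    return hashlib.sha256((salt + email).encode("utf-8")).hexdigest()[:16]

token = user_token("jane@example.com", salt="demo-salt")
# The URL exposes only the opaque token, not the email address itself.
print(f"https://example.com/profile?user={token}")
```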
If you accidentally publish a document that should have been redacted, take these steps:

  1. Remove the live document from the website or location where you published it.
  2. Use the Removals tool for the verified site to remove the documents in question from Search. Use a URL prefix if you need to remove many documents. For verified sites, a URL removal generally takes less than a day. This prevents the document in question from appearing for any searches for redacted content.
  3. Host the properly redacted document under a different URL. This makes sure that any newly indexed version is of the new document, and not an older version of the document (since recrawling of URLs and updating them in a search index can take a bit of time). Update any links to those documents.
  4. Contact any other site that may also be hosting the improperly redacted documents and ask them to take them down as well. Ask them to use the Removals tool in their Search Console account, or you can use the Outdated Content tool to ask Google’s systems to update the search results.
  5. Allow the URL removal requests to expire (this happens after the URLs were either updated in the Google Search index, or after about 6 months).

Author

alienroad