If you want to know the difference between crawling and indexing, read this post to the end.
This time I want to talk about something that I have seen many people have doubts about or are not entirely clear on. I also wanted to write about this topic myself, so here it is: a post about the differences between crawling and indexing in Google.
So let’s start!
What is crawling for Google?
Google is a search engine, and as such, it needs to show web page results for the searches its users carry out. It finds these pages by crawling the web, that is, by jumping from one link to another.
Likewise, Google does not crawl those URLs just once: it comes back to them later to check whether the content has changed, whether new content has been added, whether the page still exists, etc.
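To make the idea of “jumping from one link to another” more concrete, here is a minimal sketch of a breadth-first crawler plus a comparison between two crawls. It is a toy model, not how Googlebot is actually implemented: the in-memory LINK_GRAPH stands in for real pages downloaded over HTTP, and the names crawl and compare_snapshots are hypothetical.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would download each page over HTTP and parse its HTML instead.
LINK_GRAPH = {
    "https://example.com/": ["https://example.com/blog/", "https://example.com/about/"],
    "https://example.com/blog/": ["https://example.com/blog/post-1/", "https://example.com/"],
    "https://example.com/about/": [],
    "https://example.com/blog/post-1/": ["https://example.com/about/"],
}


def crawl(seed: str) -> list[str]:
    """Discover URLs breadth-first by jumping from one link to the next."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)  # stand-in for "download and read the page"
        for link in LINK_GRAPH.get(url, []):
            if link not in seen:  # queue each URL only once per crawl
                seen.add(link)
                queue.append(link)
    return visited


def compare_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, str]:
    """Classify each URL by comparing two crawls (URL -> page content)."""
    status = {}
    for url in before.keys() | after.keys():
        if url not in after:
            status[url] = "gone"
        elif url not in before:
            status[url] = "new"
        elif before[url] != after[url]:
            status[url] = "changed"
        else:
            status[url] = "unchanged"
    return status


print(crawl("https://example.com/"))
print(compare_snapshots({"/a": "v1", "/b": "v1"}, {"/a": "v2", "/c": "v1"}))
```

The second function mirrors the recrawl described above: by comparing what it found last time with what it finds now, a crawler can tell whether a page changed, appeared, or disappeared.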
What is indexing for Google?
The fact that Google crawls a URL does not mean that it indexes it: after crawling it, Google must analyze it and then decide whether or not to index it.
To simplify, we could say that Google works with keywords and search intents: for a user to find any URL, they must enter a keyword in the search engine.
Therefore, indexing a URL implies associating it with the keywords or search intents for which it can appear in the results. These associations can vary over time.
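The “keyword to URL” association can be illustrated with an inverted index, the classic data structure behind keyword search. This is a deliberately simplified sketch: Google’s real index also handles ranking, synonyms and search intent, none of which is modeled here, and the page texts and function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map every keyword to the set of URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs associated with every keyword in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical pages that the search engine has already crawled and decided to index.
PAGES = {
    "https://example.com/crawling/": "What is crawling for Google",
    "https://example.com/indexing/": "What is indexing for Google",
}

index = build_index(PAGES)
print(search(index, "indexing Google"))
```

A URL that was crawled but never passed to build_index simply has no keywords associated with it, so no query can return it; that is the practical difference between “crawled” and “indexed”.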
How to avoid Google crawling
To prevent Google from crawling a URL, we must use the robots.txt file.
In this file, we use the Disallow directive so that Google cannot crawl a specific area of our website. Because it is a directive, when Google reads our robots.txt it must obey it, no matter what.
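As an illustration, here is a hypothetical robots.txt that blocks the /private/ area, checked with Python’s standard urllib.robotparser module. The paths are made up for the example. Keep in mind that urllib.robotparser implements simple prefix matching and may not behave exactly like Google’s own parser in every edge case (for instance, the * and $ wildcards that Google supports).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from the /private/ area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/blog/post-1/", "https://example.com/private/report.html"):
    verdict = "allowed" if robots.can_fetch("Googlebot", url) else "blocked"
    print(f"{url}: crawling {verdict}")
```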
Blocking the crawling of a URL does not imply blocking its indexing. Google cannot read the content of that URL because it cannot crawl it, but it can still know that the URL exists through both internal and external links.
Therefore, a URL that is blocked from crawling can still be indexed by Google.
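The following toy model shows why: a crawler that obeys robots.txt never reads the blocked page, yet it still learns the URL from the links pointing to it. The link graph, robots.txt rules and function name are hypothetical, and this is only a sketch of the idea, not Google’s actual pipeline.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks crawling of /private/.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

# Hypothetical link graph: the links each page contains.
LINK_GRAPH = {
    "https://example.com/": ["https://example.com/blog/", "https://example.com/private/report.html"],
    "https://example.com/blog/": ["https://example.com/private/report.html"],
    # The crawler never sees this link, because it never fetches the blocked page.
    "https://example.com/private/report.html": ["https://example.com/private/draft.html"],
}


def crawl_respecting_robots(seed: str) -> tuple[set[str], set[str]]:
    """Return (known URLs, fetched URLs) for a crawler that obeys robots.txt."""
    known = {seed}
    fetched = set()
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if not robots.can_fetch("Googlebot", url):
            continue  # the URL stays known (it was linked), but its content is never read
        fetched.add(url)
        for link in LINK_GRAPH.get(url, []):
            if link not in known:
                known.add(link)
                queue.append(link)
    return known, fetched


known, fetched = crawl_respecting_robots("https://example.com/")
print(sorted(known - fetched))  # URLs that are known but were never crawled
```

The blocked report page ends up in the “known” set without ever being fetched, which is exactly the situation in which a URL can appear in Google without its content having been read.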
How to avoid indexing in Google
To prevent a URL from being indexed in Google, we have to use the robots noindex meta tag.
This tag is a directive: as soon as Google sees it, it has to obey it (it is not optional).
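The tag goes in the page’s head as <meta name="robots" content="noindex">. To check which of our pages actually carry it, a small parser built on Python’s standard html.parser module is enough. This is a hypothetical helper for auditing our own pages, not a Google tool; it also accepts the googlebot meta name and the none value, which Google treats as noindex, nofollow.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lower-cases tag and attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # "none" is shorthand for "noindex, nofollow".
    return bool(parser.directives & {"noindex", "none"})


# Hypothetical page (for example, a thank-you page) that should stay out of the index.
PAGE = """<!doctype html>
<html>
<head>
  <meta name="robots" content="noindex, follow">
  <title>Thank you</title>
</head>
<body>Thanks for subscribing!</body>
</html>"""

print(has_noindex(PAGE))
```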
If the URL is already indexed in Google, it will not be instantly de-indexed because Google has to read that tag first.
We can wait until Google recrawls that URL or try to speed things up in one of the following two ways:
-In Google Search Console, enter the URL in the URL Inspection tool, click “Test live URL” (this step is more of a habit of mine), and then click “Request indexing.”
-Along with the noindex, if we want that URL removed from the index, we can go directly to the “Removals” tool in Google Search Console and request its removal.
When to block Google crawling
We restrict Google’s crawling to avoid wasting the search engine’s resources and our crawl budget “foolishly.”
This can be useful to stop Googlebot from crawling certain URLs, such as those with parameters, or in specific cases, such as when Google keeps crawling areas that no longer exist on our website.
Following that last example, suppose a section of our website no longer exists and we have not redirected it anywhere; we have simply deleted it.
If we check the logs and see that Google keeps crawling those URLs again and again (which could be due to external links), one solution would be to block crawling of those URLs.
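Checking the logs can be as simple as counting Googlebot requests that return 404 or 410 under the deleted path. The sketch below assumes an access log in the common Combined Log Format; the log lines, the /old-section/ prefix and the function name are all hypothetical. If the counts keep growing, the matching robots.txt rule would be Disallow: /old-section/.

```python
import re
from collections import Counter

# Combined Log Format: IP, identity, user, [time], "request", status, size, "referer", "user agent".
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical, made-up log lines for illustration only.
SAMPLE_LOG = [
    f'192.0.2.10 - - [10/Mar/2024:06:25:13 +0000] "GET /old-section/page-1 HTTP/1.1" 404 512 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [11/Mar/2024:07:02:41 +0000] "GET /old-section/page-1 HTTP/1.1" 404 512 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [11/Mar/2024:07:05:09 +0000] "GET /old-section/page-2 HTTP/1.1" 404 512 "-" "{GOOGLEBOT_UA}"',
    '198.51.100.7 - - [11/Mar/2024:08:15:00 +0000] "GET /old-section/page-1 HTTP/1.1" 404 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    f'192.0.2.10 - - [11/Mar/2024:09:00:00 +0000] "GET /blog/ HTTP/1.1" 200 2048 "-" "{GOOGLEBOT_UA}"',
]


def removed_url_hits(lines: list[str], prefix: str) -> Counter:
    """Count Googlebot requests to deleted URLs (404/410) under a path prefix."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines in other formats
        # The user agent can be spoofed; confirming real Googlebot needs a reverse DNS lookup.
        if "Googlebot" not in match["agent"]:
            continue
        if match["path"].startswith(prefix) and match["status"] in ("404", "410"):
            hits[match["path"]] += 1
    return hits


for path, count in removed_url_hits(SAMPLE_LOG, "/old-section/").most_common():
    print(f"{count:>3}  {path}")
```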
When to block Google indexing
As a general rule, we will avoid indexing URLs that we do not want to rank, that is, URLs that have no SEO value: if they are not going to rank in Google, why would we want them there?
The same applies to URLs that can be considered duplicate content, thin content, etc., which, for some reason, we cannot delete or must keep for usability.
Blocking both crawling and indexing
One mistake I’ve seen many times is applying both a meta robots noindex and a robots.txt Disallow to the same directory or URL. After a while, people wonder why Google isn’t removing those URLs. Google does whatever it wants!
In those cases, it is normal that Google does not remove them, because to do so it must see the robots noindex meta tag, and it cannot see it because you have blocked its crawling with robots.txt!
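This mistake is easy to detect automatically. The sketch below takes a robots.txt and the HTML of our own pages and flags every URL that has a noindex tag but is also disallowed, that is, every noindex Google can never read. The site data and names are hypothetical, and it relies on urllib.robotparser, which may not match Google’s parser in every edge case. For each flagged URL, allowing crawling again is what lets Google see the tag.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class NoindexDetector(HTMLParser):
    """Sets .noindex when a robots/googlebot meta tag contains noindex (or none)."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "").lower() for name, value in attrs}
        if tag == "meta" and attributes.get("name") in ("robots", "googlebot"):
            directives = {d.strip() for d in attributes.get("content", "").split(",")}
            if directives & {"noindex", "none"}:
                self.noindex = True


def conflicting_urls(robots_lines: list[str], pages: dict[str, str]) -> list[str]:
    """URLs whose noindex Google can never see, because robots.txt blocks crawling them."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    conflicts = []
    for url, html in pages.items():
        detector = NoindexDetector()
        detector.feed(html)
        detector.close()
        if detector.noindex and not robots.can_fetch("Googlebot", url):
            conflicts.append(url)
    return conflicts


# Hypothetical site: a robots.txt plus the HTML of a few pages.
ROBOTS = ["User-agent: *", "Disallow: /private/"]
PAGES = {
    "https://example.com/private/old-report.html": '<meta name="robots" content="noindex">',
    "https://example.com/thank-you/": '<meta name="robots" content="noindex">',
    "https://example.com/blog/": "<title>Blog</title>",
}

for url in conflicting_urls(ROBOTS, PAGES):
    print("noindex is invisible to Google (crawling blocked):", url)
```

The thank-you page is not flagged: it carries noindex but can be crawled, so Google can read the tag and act on it.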
As you have seen, crawling and indexing are two very different things; therefore, it is important to know how to tell them apart in order to act correctly according to our needs.