Indexing goes hand in hand with crawling, and together they form the foundation of web positioning.
The direct consequence of indexing is that a URL is part of the indexes of the different search engines.
As usual, we will refer to Google as shorthand for search engines in general, since it is by far the most widely used.
Without crawling, there is no indexing, and without indexing, there is no positioning.
Let’s look at all this in a little more detail.
What is the indexing of a web page?
If you are wondering what indexing a website means, let me clarify: web indexing is a key factor in any SEO process, because how we manage it determines what and how much we rank (and whether what we rank is actually what we want to rank).
When I talked about what SEO is, I explained that it rests on two pillars: the crawling and the indexing of the web.
I leave you this video to see the difference and complementarity of both concepts.
Crawling is how the Google bot enters our website and comes to understand its content.
Once that is done, the page is placed in the queue for indexing.
Therefore, we have to make sure the website is accessible to search engines and, in addition, mark as indexable only the pages we want to rank.
Only indexable pages can appear in the SERPs.
What pages should be part of the indexing?
The issue is simple: we should only index the pages that serve our marketing strategy. The pages with a strategic role in our online business are the ones that should be part of the index.
For example, a post like “the best professionals in sector X,” which is usually written to boost the reach of an article, does not need to be indexed, since its objective is not to rank for any keyword but to build brand awareness.
Another type of page we want to be crawlable, but not necessarily indexed, are the internal pages of a membership site or the “sitemap” pages whose function is to make crawling easier for search engines.
As a result, our website may contain many pages, and that number will not necessarily match the number of pages Google has indexed.
What factors affect indexing in Google?
As we have seen, the process is crawling followed by indexing; after those two steps, we become part of the SERPs. From this process we can identify three factors that affect indexing:
- Discoverability: the ability of search engines to discover a piece of content or a URL in the first place. The Internet is called a network because it is an interconnected system in which we move from one site to another, discovering new sites along the way. Search engine bots do exactly the same, so the interconnection created by links feeds the next factor affecting indexing in Google.
- Crawlability: to be indexed in Google, we must first be discovered, and our website must be crawlable by search engines. It would not be the first time a client came to me because they were not on Google, and the first thing I found was that the site was not crawlable because they forgot to untick the “discourage search engines” option when they built the site. Web indexing was impossible because Google’s bots were not allowed to access the pages.
- Indexability: analogously to the above, if a page that can be crawled is explicitly marked as non-indexable, the odds are that URL will not be part of the SERPs, since it does not allow indexing.
It is very common to find pages where indexing is not allowed but crawling is, especially pages that are useless for our business strategy, such as the cookie policy.
On the one hand we tell the bots, “come in, welcome to my house,” and once they are inside we tell them that, for this particular page, it would have been better to stay outside.
In this type of “conflict,” Google will normally respect the noindex and leave the page out of its index, as long as it can actually crawl the page and read the directive. That’s why it’s important to understand what crawling and indexing are all about.
Differences between crawling and indexing in web positioning
The point is that if something has not been discovered by anyone, it is as if it did not exist. The same happens with your pages: if search engines cannot discover and understand them, for all practical purposes they do not exist.
How can we check indexing in Google?
One of the things we must be clear about regarding indexing is what we want Google to show of our website in the search results.
That means it is not enough to see that pages are indexed; we must check that the pages being indexed are the ones that interest us.
We only want to index what we want to rank.
We only want to rank those pages whose search terms, or the search intent they satisfy, help us get leads.
We will see two ways to check if our website is being indexed and what is being indexed on our page.
See pages indexed by Google
In addition to being a search engine, Google can become a useful tool to extract information from various areas that affect web positioning, including those related to indexing, thanks to its commands.
Use the site: command: it will show us all the pages of the website that Google has indexed. To use it, type site:yourdomain.com in the search box, with no space after the colon and omitting the www.
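For instance, to see what Google has indexed for a hypothetical domain, or only for one of its sections, the queries would look like this:

```
site:yourdomain.com
site:yourdomain.com/blog/
```

The first query lists every indexed page of the domain; the second restricts the results to the /blog/ directory.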
See indexed pages in Search Console
An SEO’s best friend is Search Console. Period.
Thanks to Search Console, we can see our website’s indexing status by checking the Coverage report.
Does Google not index my website?
Your website may not appear in Google because it does not meet the three indexing requirements (discoverability, crawlability, and indexability). Several factors can make it difficult, or outright impossible, for search engines to index your web pages.
Factors that hinder web indexing
- The use of technologies that Google struggles to crawl, such as Flash or content that depends heavily on Ajax and JavaScript. What cannot be crawled will not be indexed.
- Use of iframes (frames). An iframe is an HTML tag used to embed the content of another document within a page. Google often cannot properly associate the content displayed inside an iframe with the embedding page, and therefore will not index it as part of that page.
- Lack of internal links. If a page on our website has no internal or external links pointing to it and is not included in the sitemap, it will be difficult for Google to crawl. These are what we call orphan pages.
- Links with rel="nofollow". This HTML attribute tells Google’s bots not to follow that link or pass link juice through it.
- A poorly designed web architecture with a high crawl depth makes pages harder to find and therefore harder to crawl.
- Redirects and error codes. The status codes that the server returns can also affect the indexing of web pages (see the example after this list):
- 200 (OK): all ok
- 301 (permanently moved): the content of a URL has been permanently moved
- 302 (Found): A page is temporarily located at another URL. If it lasts long, Google interprets it as a 301 redirect.
- 307 (temporary redirect): the request is temporarily redirected to a different resource. Be careful, because this response code is not always a problem, since it is the type of redirect commonly seen when moving from http:// to https://.
- 404 (not found): the URL we want to access is not found.
- 410 (no longer available): the page we tried to access has been removed
- 5XX (server errors): the server cannot respond to the request. When the error persists for a long time, Google begins to deindex the content affected by the 5XX error.
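To make this more concrete, here is roughly what a permanent redirect looks like at the HTTP level (the URLs are hypothetical); you can see these status codes yourself in your browser’s developer tools or with the URL Inspection tool in Search Console:

```
GET /old-page HTTP/1.1
Host: www.example.com

HTTP/1.1 301 Moved Permanently
Location: https://www.example.com/new-page
```

Google follows the Location header and, over time, transfers the indexing (and most of the ranking signals) of /old-page to /new-page.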
How can I control the indexing of my web page?
We can, and should, control (or at least try to control) the indexing of our website with technical SEO tools such as robots.txt and sitemap.xml.
Robots.txt
The robots.txt is a text file that we place in the root of the domain.
This file contains a series of rules that help us control crawling: limiting access for certain bots and restricting access to certain pages.
When a search engine bot lands on a website, the first thing it checks is the robots.txt file, so that it can crawl the site according to the rules specified there.
At least in theory.
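As a rough sketch, a typical robots.txt might look like this (the bot name and paths are just examples, not a recommendation for your site):

```
# https://www.example.com/robots.txt
User-agent: *
Disallow: /wp-admin/            # don't crawl the admin area
Allow: /wp-admin/admin-ajax.php # ...except this file, which the front end needs

User-agent: BadBot              # hypothetical bot we want to block completely
Disallow: /

Sitemap: https://www.example.com/sitemap_index.xml
```

Remember that robots.txt controls crawling, not indexing: a URL blocked here can still end up indexed if other sites link to it.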
Sitemap.xml
The sitemap is a file that lists all the URLs on our website that we want search engines to index.
It includes a list of the URLs that make up the various areas of our website and the date and frequency with which that content is updated.
In the sitemap, we should only include the pages that we want to rank in the different search engines. Anything whose ranking does not serve our business objectives or relate to some phase of our sales funnel is better left out of the sitemap.
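A minimal sketch of what such a file contains (the URLs and dates are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2023-05-01</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>https://www.example.com/services/seo/</loc>
    <lastmod>2023-04-15</lastmod>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
```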
We can create our sitemaps and manage them easily with the famous Yoast SEO or Rank Math, the most used SEO plugins.
If you want to check if the sitemap you have created is correct, you can use Search Console again.
Meta robots tag
It is an HTML tag that allows us to tell the different search engines whether a page should be indexed. The meta robots tag is important because search engines treat its directives as binding: if Google can crawl the page and finds a noindex, it keeps that page out of its index.
We can also address the meta robots tag to a specific crawler (or several) instead of to all of them.
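A couple of sketches of how the tag looks in the <head> of a page (the directives shown are common examples, not a one-size-fits-all recommendation):

```html
<!-- Keep this page out of the index, but let bots follow its links -->
<meta name="robots" content="noindex, follow">

<!-- The same instruction, but addressed only to Google's crawler -->
<meta name="googlebot" content="noindex">
```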
X-robots-tag
It works very much like the meta robots tag, but in this case the directives travel in the HTTP response header of the document rather than in its HTML, so we have to touch some files on our server. Thanks to the X-Robots-Tag we have greater configuration capacity: we can use rules and regular expressions that apply to pages or files sharing some characteristic, and in this way define indexing directives even for resources that cannot carry a meta tag.
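For example, a minimal sketch for an Apache server with mod_headers enabled (the file pattern is just an illustration) that keeps every PDF on the site out of the index:

```apache
# In .htaccess or the virtual host configuration
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

This is precisely the kind of case the meta robots tag cannot cover, since a PDF has no <head> in which to place it.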
rel="canonical"
The rel="canonical" tag helps us handle the problem of duplicate content. With this tag we tell Google which URL we want crawled and indexed, so that it treats that version as the preferred one.
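For instance, if the same product were reachable through several hypothetical URLs, each variant could declare the preferred version in its <head>:

```html
<!-- Placed on https://www.example.com/product?color=red -->
<link rel="canonical" href="https://www.example.com/product">
```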
Paginations: rel=prev/next
When there is a lot of content, for example many products in an eCommerce store or many articles in a blog, it is usually divided into several pages (as if it were a book). To manage this, we can do several things:
- Use infinite scroll, so that there is no pagination at all
- Establish a relationship between the paginated pages (page 1 of n), as in the markup sketch below
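A sketch of that second option for a hypothetical page 2 of a blog, using the rel="prev" and rel="next" link tags (worth knowing: Google has said it no longer uses these tags as an indexing signal, although they remain valid markup that other search engines may read):

```html
<!-- In the <head> of https://www.example.com/blog/page/2/ -->
<link rel="prev" href="https://www.example.com/blog/">
<link rel="next" href="https://www.example.com/blog/page/3/">
```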
Conclusion on Web indexing
Indexing is what allows a web page to appear on Google’s results pages (also known as SERPs), and for that our pages must be crawlable and indexable. In other words, indexing is the prelude to a website being visible on the Internet.
Just as important as making the site crawlable and indexable is knowing which factors make web indexing difficult, so that we can avoid or minimize them as best we can.
We have plenty of tools that give us more control over indexing: robots.txt, the sitemap file, the meta robots tag, and the X-Robots-Tag are just some of them, although there are others (canonical tags, pagination, or the hreflang markup, which we haven’t covered in this post).
As with everything involved in a business, to know whether something is working we must monitor and measure it. For that we have Search Console, which warns us of possible indexing problems thanks to its coverage reports.
Just as useful is the sitemap testing feature in Google Search Console, which lets us check whether our sitemap meets our needs.