One of the tools for managing how search engines index a site is the robots.txt file. It is mainly used to keep all robots, or only certain ones, from downloading the content of particular groups of pages. This helps remove "garbage" from the search results and, in some cases, significantly improves the ranking of the resource. A correctly composed robots.txt file is essential for this to work.
You will need
a text editor
Instructions
Step 1
Make a list of the robots for which you will set special exclusion rules, or for which you will use directives of the extended robots.txt standard as well as non-standard, engine-specific directives (extensions supported by a particular search engine). Enter into this list the values of the User-Agent field of the HTTP request headers that the selected robots send to the site's server. The names of the robots can also be found in the help sections of the search engines' sites. The result might look like the short list sketched below.
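As an illustration, such a working list might look like this; the tokens shown are the commonly documented ones for Yandex, Google and Bing, but you should confirm the current values in each engine's own help pages:

# Robots that will get dedicated rule groups (illustrative)
Yandex      # main Yandex crawler
Googlebot   # main Google crawler
Bingbot     # main Bing crawler
*           # every other robot (the catch-all group)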
Step 2
Select the groups of site URLs to which access should be closed for each robot on the list compiled in the first step. Do the same for all remaining robots (the indefinite set of indexing bots). In other words, the result should be several lists of links to site sections, groups of pages, or sources of media content that must not be indexed: one list per robot, plus one list of prohibited URLs for all other bots. Build the lists by comparing the logical structure of the site with the physical location of the data on the server, and by grouping page URLs by their function. For example, the deny lists can include the contents of any service directories (grouped by location) or all user profile pages (grouped by purpose). A hypothetical result of this step is sketched below.
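The following sketch shows what such lists might look like; the paths are hypothetical and serve only to illustrate grouping by location and by purpose:

# Deny list for Yandex:         /temp/data/images/
# Deny list for Googlebot:      /cgi-bin/, /admin/reports/
# Deny list for all other bots: /temp/data/, /users/profiles/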
Step 3
Identify distinguishing URL fragments for each of the resources in the lists compiled in the second step. When processing exclusion lists intended for robots that understand only the standard robots.txt directives, and for the undefined robots, extract unique URL prefixes of maximum length. For the remaining sets of addresses, you can build patterns according to the specifications of the particular search engines. An example of both approaches is sketched below.
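For example, if the pages to hide are /temp/data/images/a.jpg and /temp/data/images/b.jpg, the longest common prefix /temp/data/images/ is enough for the standard Disallow directive; for engines that document wildcard support (Yandex and Google do), a pattern can be used instead. Both variants are sketched here with hypothetical paths:

Disallow: /temp/data/images/    # standard prefix match, understood by all robots
Disallow: /*.jpg$               # wildcard pattern, only for robots that support * and $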
Step 4
Create the robots.txt file. Add to it groups of directives, each corresponding to the set of prohibiting rules for a particular robot from the list compiled in the first step, followed by a group of directives for all other robots. Separate the rule groups with a single blank line. Each rule group must begin with a User-agent directive identifying the robot, followed by Disallow directives that prohibit indexing of the URL groups. Use the strings obtained in the third step as the values of the Disallow directives. Separate the directives from their values with a colon. Consider the following example:

User-agent: Yandex
Disallow: /temp/data/images/

User-agent: *
Disallow: /temp/data/

This set of directives instructs the main robot of the Yandex search engine not to index URLs whose path begins with /temp/data/images/, and it prevents all other robots from indexing URLs whose path begins with /temp/data/.
Step 5
Supplement robots.txt with directives of the extended standard or with search-engine-specific directives, where the target robots support them. Examples of such directives: Host, Sitemap, Request-rate, Visit-time, Crawl-delay. A sketch of how they might look is given below.
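A minimal sketch of how such directives might be appended to the file; the sitemap URL and the numeric values are placeholders, Host and Crawl-delay are extensions historically honored by Yandex and some other engines, and Request-rate and Visit-time come from the extended-standard draft and are recognized by relatively few crawlers:

Sitemap: https://example.com/sitemap.xml   # location of the XML sitemap
Host: example.com                          # preferred mirror (Yandex extension)
Crawl-delay: 5                             # wait 5 seconds between requests
Request-rate: 1/10                         # fetch at most one document per 10 seconds
Visit-time: 0400-0845                      # crawl only during this window (UTC)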