Robots.txt: disallow a page
Posted: Wed Aug 28, 2024 7:01 pm
by bzc0fq@gmail.com
Is it possible to disallow a single page from a project?
Let's say that robots.txt looks like this:
User-agent: *
Allow: /
Allow: /aaa/
Allow: /bbb/
Allow: /ccc/
I would like it to look like this:
User-agent: *
Allow: /
Allow: /aaa/
Disallow: /aaa/a1.php
Allow: /aaa/a2.php
Allow: /bbb/
Allow: /ccc/
Could someone please advise how this can be done?
Thanks
Re: Robots.txt: disallow a page
Posted: Wed Aug 28, 2024 7:32 pm
by BaconFries
Have you read the following?
Adding robots.txt to your website reading from Pages and Folders
Under 'Pages and Folders' you can override rules for individual pages and folders
Re: Robots.txt: disallow a page
Posted: Wed Aug 28, 2024 7:53 pm
by bzc0fq@gmail.com
I have read this tutorial and set the rule to 'disallow index, disallow follow' for a page, but robots.txt has not changed: the newly generated robots.txt file lists only folders, no pages.
Re: Robots.txt: disallow a page
Posted: Wed Aug 28, 2024 8:14 pm
by Pablo
Note that some options do not affect robots.txt. For example, "disallow index, disallow follow" controls the meta tags of the page; these are not robots.txt options.
Rule -> Allow index/disallow index for files and folders is added to robots.txt
All these options are combined in this dialog so that they can be set from one place, but they can also be set via the page properties of each page.
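For anyone who wants to check how a generated robots.txt would be read, here is a minimal sketch using Python's standard urllib.robotparser module. The rules and paths are hypothetical examples, not actual output from WYSIWYG Web Builder, and is_allowed is just an illustrative helper name. Note that Python's parser applies the first matching rule in file order, while some crawlers prefer the most specific match, so results for overlapping Allow/Disallow rules can differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical folder-level rules, similar in shape to the examples in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /aaa/
Allow: /bbb/
"""


def is_allowed(robots_txt, path, user_agent="*"):
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


for path in ("/aaa/a1.php", "/bbb/b1.php", "/index.php"):
    print(path, is_allowed(ROBOTS_TXT, path))
```

In this sketch everything under /aaa/ is blocked by the folder rule, while paths that match no rule (such as /index.php) are allowed by default.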
Re: Robots.txt: disallow a page
Posted: Wed Aug 28, 2024 8:35 pm
by bzc0fq@gmail.com
OK, but I cannot set "allow/disallow index" for pages, only "allow/disallow index, allow/disallow follow" and "not set".
I can set "allow/disallow index" only for folders....
What am I doing wrong?
Re: Robots.txt: disallow a page
Posted: Thu Aug 29, 2024 5:43 am
by Pablo
1. Select the page in the site tree
2. Select the rule
Re: Robots.txt: disallow a page
Posted: Thu Aug 29, 2024 6:35 am
by bzc0fq@gmail.com
OK... please let me explain... I will use the example from this tutorial:
https://www.wysiwygwebbuilder.com/robots_txt.html
I followed the steps:
1. Choose a PAGE in the project (Pages and Folders under the Website tree): done, easy.
2. Set the Rule for the page: here I have exactly the same rules that are shown in the tutorial ("allow/disallow index, allow/disallow follow" and "not set" for PAGES). So, as I understand your comment, it is NOT possible to create an entry like:
DISALLOW: /index.php, right?
Whatever I do, I cannot create an entry within robots.txt that contains a page definition, or I am too stupid to do this!
I do need this to restrict robots from indexing certain pages.
Re: Robots.txt: disallow a page
Posted: Thu Aug 29, 2024 6:46 am
by Pablo
"allow/disallow index, allow/disallow follow" and "not set" for the PAGE.
For pages, this option controls meta tags, not robots.txt.
<meta name="robots" content="noindex, nofollow">
robots.txt sets the global rules.
Meta tags override the rules for individual pages.
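To confirm that a published page actually carries the robots meta tag, something like the following sketch could be used. It relies only on Python's standard html.parser module; the class and function names (RobotsMetaParser, robots_directives) and the sample HTML string are made up for illustration and are not part of WYSIWYG Web Builder.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source containing the tag quoted above.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page)))  # prints ['nofollow', 'noindex']
```

A page without such a tag yields an empty set, which would mean only the robots.txt rules apply to it.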
Re: Robots.txt: disallow a page
Posted: Thu Aug 29, 2024 7:11 am
by bzc0fq@gmail.com
Should I read this as meaning that I do not need a page entry in robots.txt because robots get the information in a different way?
If so, it makes sense.
Re: Robots.txt: disallow a page
Posted: Thu Aug 29, 2024 8:17 am
by Pablo
Correct, the meta tags in the page override the information in robots.txt. Therefore, the generated robots.txt does not include the same information.
Re: Robots.txt: disallow a page
Posted: Thu Aug 29, 2024 8:30 am
by bzc0fq@gmail.com
Perfect...
Thank you for the explanation
