robots.txt

.htaccess

Watch out if the application has an .htaccess file: instead of placing robots.txt in the domain root, you must save it in the folder that the .htaccess redirects requests to.

For example, in a project built with CakePHP, the file goes in:

app/webroot/robots.txt

htaccess vs robots.txt vs noindex

Use robots.txt to control what robots.txt-compliant 'good' spiders *fetch*; in other words, bandwidth control. If they are polite enough to ask, give them a polite robots.txt reply. I specifically said 'fetch' here, because that is all that is accomplished. Some search engines, including some of the majors, don't need to fetch a page to list it in their results; they can create a search result based on links they find on other sites pointing to your page, and the link text associated with those links. I find this annoying, but that leads us to…
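As a rough illustration of the 'fetch' decision a compliant spider makes, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt rules, the `ExampleBot` user agent and the example.com URLs are all made up for the example; real spiders may interpret rules with their own extensions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant spider asks this question before every request.
print(parser.can_fetch("ExampleBot", "https://example.com/private/report.html"))
print(parser.can_fetch("ExampleBot", "https://example.com/index.html"))
```

The first check comes back `False` and the second `True`. Note that this only answers "may I fetch this URL?"; it says nothing about whether the URL can still appear in search results.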

Use the on-page meta-robots tag to control what search engines *list* in their results. (If you mark a page as “noindex,” then you must allow it to be fetched in robots.txt; otherwise the spider can't fetch and read the page to find the robots meta tag.)
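The interaction between "noindex" and robots.txt can be sketched in Python. This is only a simplified model of a well-behaved spider, assuming it checks robots.txt first and reads the meta tag only from pages it was allowed to fetch. The names `MetaRobotsParser` and `spider_sees_noindex`, the `ExampleBot` agent and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def spider_sees_noindex(robots_txt, page_url, page_html, agent="ExampleBot"):
    """Return True only if a compliant spider would actually read a noindex tag."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, page_url):
        # The page is never fetched, so the meta tag is never read.
        return False
    meta = MetaRobotsParser()
    meta.feed(page_html)
    return "noindex" in meta.directives


# Hypothetical page and rule sets for illustration.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
URL = "https://example.com/drafts/page.html"
BLOCKED = "User-agent: *\nDisallow: /drafts/\n"
ALLOWED = "User-agent: *\nDisallow:\n"

print(spider_sees_noindex(BLOCKED, URL, PAGE))
print(spider_sees_noindex(ALLOWED, URL, PAGE))
```

With the `BLOCKED` rules the function returns `False`, because the tag is never seen; with the `ALLOWED` rules it returns `True`. That is the reason a "noindex" page must stay fetchable in robots.txt.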

Use .htaccess to stop rogue spiders that don't fetch robots.txt, or that fetch it and ignore it, and to ensure that good spiders don't wander into forbidden territory due to a bug in their code or an error in your robots.txt or on-page meta-robots tags.

The three methods are complementary, but none of them is equivalent to another.

Doxbo
