Googlebot and Site Redirects

Posted by reto on 02 October, 2003 19:45

At first glance it seems that nothing on the web can hide from being indexed by Google. Not only HTML but twelve(!) other file types are being indexed at the moment.
But Google is much pickier than one might assume, and its reasons are evident and reasonable:

1. Redirects
Googlebot (Google's spider) doesn't follow the "HTTP/1.1 302 Found" status code (resource temporarily moved). Use an "HTTP/1.1 301 Moved Permanently" header instead to make Google follow the redirect.
To make a long story short: if you're using PHP to do the redirect (and many are using PHP these days), you should add the status code header manually, because PHP sends a 302 Found status code by default when you set a Location header.

This stops Google and is therefore only useful if your site is really under maintenance at the moment:

<?php
    // No explicit status line, so PHP answers with "302 Found"
    header('Location: http://www.foo.com/bar/');
?>


This makes Google follow the redirect and index the site:

<?php
    // Send the 301 status line first, then the redirect target
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.foo.com/bar/');
?>


If you prefer to do the redirects within an .htaccess file (on Apache, of course), you could do it like this. Every request for the root URL foo.com/ is redirected to foo.com/bar/:

 #Redirect (this will result in a 301 permanently moved status code)
 RedirectMatch permanent ^/$ http://www.foo.com/bar/


I expect it's faster and less resource-intensive to set up an .htaccess file, because no PHP code needs to be parsed at all, though it won't matter in most cases anyway. (Untested assumption.)
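
Whichever way you set up the redirect, it is worth checking which status code actually goes out. Here is a minimal sketch of a check that reads the status line of a response. The function names status_code_from_line() and is_permanent_redirect() are made up for this example and are not part of PHP.

```php
<?php
// Hypothetical helper (not part of PHP): pulls the numeric status code
// out of a raw status line such as "HTTP/1.1 301 Moved Permanently".
function status_code_from_line($statusLine)
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#i', $statusLine, $matches)) {
        return (int) $matches[1];
    }
    return null; // not a recognizable status line
}

// Hypothetical helper: true only for a 301 status line.
function is_permanent_redirect($statusLine)
{
    return status_code_from_line($statusLine) === 301;
}

// Usage sketch (needs network access; the URL is the placeholder used above).
// get_headers() returns the response headers as an array, and element 0
// is the status line of the first response:
// $headers = get_headers('http://www.foo.com/');
// var_dump(is_permanent_redirect($headers[0]));
```

If the check reports false for a redirect that is meant to be permanent, the page is most likely still sending PHP's default 302.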

2. Sessions
Google doesn't follow links with a session ID attached. If you've enabled session.use_trans_sid in your php.ini, you should check whether Google is requesting the page. If your site displays fine without sessions, simply don't start one when Google is visiting. ;-)

<?php
    // Don't start a session when serving Google
    if (stristr($_SERVER['HTTP_USER_AGENT'], 'google') === false)
    {
        session_start();
    }
?>


Add as many search engine bots as you like. A more sophisticated method (like regular expressions) is not needed here, but would of course work, too.
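
Checking for several bots could look roughly like this. It is only a sketch: is_search_bot() is a made-up helper name, and the substrings in the default list are illustrative examples, not a complete or authoritative list of crawler user agents.

```php
<?php
// Hypothetical helper: returns true if the user agent string contains
// any of the given bot name fragments (case-insensitive).
// The default fragments are examples only; adjust them to your needs.
function is_search_bot($userAgent, array $botNames = array('google', 'slurp', 'msnbot'))
{
    foreach ($botNames as $name) {
        if (stripos($userAgent, $name) !== false) {
            return true;
        }
    }
    return false;
}

// Usage sketch: only start a session for visitors that are not on the list.
// A missing user agent header is treated as a normal visitor here.
// $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// if (!is_search_bot($userAgent)) {
//     session_start();
// }
```

A plain substring match keeps the check cheap, and the list can be extended by passing a different array as the second argument.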
