Handling Content Migration with 301 Redirects from a 404 Error Page

 
Published on 2007-02-11 by John Collins.

While migrating the content of the old version of this site from the existing system to the Alpha CMS, the main issue I encountered was a massive rise in the number of 404 (file not found) errors on the website. When a site has been up since 2001 like this one, it gets a lot of links from other websites directly to the URLs of articles rather than to the homepage. When these URLs changed because the underlying content management system changed, I had a big problem.

This problem is actually made up of two equally important issues:

  1. How to get humans to find the content they are after, even though it is in a new location.
  2. How to get bots to find the content, and also hopefully update their databases of URLs to point to the new locations instead for future requests.

Luckily there is a standard HTTP status code that can solve both issues for us: 301 (Moved Permanently), which re-directs a request to the new, permanent location of the requested resource. In this tutorial, I will show you how to send this response from a custom 404 error page, which works out which URL the user requested and re-directs to the new location for that resource automatically.

The Code

Here is the code in its entirety. The file is a PHP script (404.php) which is set up on Apache to be the page to display when a 404 error occurs.

<?php
 
// get the URL of the requested file
$missing_file = $_SERVER["REQUEST_URI"];
 
// an array of old to new URL mappings for the re-directs
$URL_mappings = array(
 
    // as an example mapping, here are the old and new URLs for the "about" page
    "/about.php" => "/alpha/controller/view_article.php?oid=00000000065",
 
    // and another mapping for the links page
    "/links.php" => "/alpha/controller/view_article.php?oid=00000000066"
);
 
/*
 * now check the array keys to see if the requested (missing) resource exists,
 * and if it does then re-direct to that resource's new location.
 */
if (array_key_exists($missing_file, $URL_mappings)) {
 
    // set the correct HTTP header for the response
    header("HTTP/1.1 301 Moved Permanently");
 
    /*
     * Re-direct. Notice how I am combining server variables and the
     * relative paths of the $URL_mappings array to form the final URL used.
     */
    header("Location: http://".$_SERVER["HTTP_HOST"].$URL_mappings[$missing_file]);
 
    // we're done here
    exit();
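} else {
    // otherwise make the 404 status explicit (rather than relying on the
    // server to preserve it), then echo out your custom 404-error page
    header("HTTP/1.1 404 Not Found");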
 
echo <<<END
 
<html>
 
<head>
 
<title>404 Error - File Not Found!</title>
 
</head>
 
<body>
 
<h1>404 Error - File Not Found!</h1>
 
<p>The file that you are looking for could not be found!</p>
 
</body>
 
</html>
 
END;
 
}
 
?>
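
For Apache to hand missing pages to this script, it needs an ErrorDocument directive. The original setup is not shown, so the following is only a minimal sketch, assuming the script is saved as 404.php in the document root and that .htaccess files are allowed to override this setting (AllowOverride FileInfo):

```apache
# .htaccess in the document root (assumed location), or the equivalent
# section of the main server configuration
ErrorDocument 404 /404.php
```

Use a local path here rather than a full http:// URL. With a full URL, Apache sends the browser a re-direct to the error page instead of running it internally, so the script would no longer see the originally requested URL.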

The key component of this script is the $URL_mappings array, an associative (hash) array that uses the old URL of a resource as the key and the new URL as the value. All we need to do is call the PHP function array_key_exists() to see if a mapping exists for the requested resource, like so:

if (array_key_exists($missing_file, $URL_mappings)) {
    // do the re-direct...
}

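The $missing_file in question is simply the URL which was requested by the user, which is available in the server variable $_SERVER["REQUEST_URI"]. Note that this value includes any query string, so a request for /about.php?x=1 will not match the /about.php key. The workflow of this script looks like the following: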

  1. Request for non-existent resource re-directed to 404.php by Apache.
  2. Get the URL of the requested resource and store it in $missing_file.
  3. Check the $URL_mappings array for the $missing_file.
  4. If $missing_file is in the array, re-direct to the new location.
  5. If $missing_file is not in the array, display a standard "file not found" message to the user as normal.
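
The lookup steps above can also be pulled out into a small function, which makes them easy to try out without a web server. This is only a sketch under some assumptions: the function name resolve_redirect() and the host www.example.com are hypothetical, and unlike the script above it ignores the query string when matching, using PHP's parse_url() to keep only the path.

```php
<?php

/**
 * Hypothetical helper (not part of the original 404.php).
 * Returns the full URL to re-direct to, or null if no mapping exists.
 */
function resolve_redirect($request_uri, array $mappings, $host)
{
    // REQUEST_URI can carry a query string, so keep only the path part
    $path = parse_url($request_uri, PHP_URL_PATH);

    // parse_url() gives null or false when there is no usable path
    if (!is_string($path) || !array_key_exists($path, $mappings)) {
        return null;
    }

    // same URL construction as in the script above
    return "http://" . $host . $mappings[$path];
}

// a single example mapping, copied from the script above
$mappings = array(
    "/about.php" => "/alpha/controller/view_article.php?oid=00000000065",
);

// matches despite the extra query string:
// "http://www.example.com/alpha/controller/view_article.php?oid=00000000065"
$target = resolve_redirect("/about.php?ref=feed", $mappings, "www.example.com");

// no mapping for this path, so the result is null
$missing = resolve_redirect("/no-such-page.php", $mappings, "www.example.com");
```

In 404.php itself, a non-null result could then be sent with header("Location: " . $target, true, 301) followed by exit(), while a null result falls through to the normal "file not found" page.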

Conclusion

The script presented here is very simple but very powerful, and will save you a lot of trouble if you have a lot of content to migrate, which was the position I found myself in recently. It is particularly good for search engine bots: rather than suddenly hitting them with lots of 404 errors from your site, you can re-direct them to the new content while also encouraging them to update their internal lists of URLs for your website. People will also like this method. If they are kind enough to link to your content, you do not want to discourage them by sending them a 404 error page; instead you honour their link or bookmark by re-directing them to what they expect to see.


Updated 2021: note that the above post is out of date, given that it was originally published in 2007, but it is left here for archival purposes.