It’s become increasingly rare to see web page addresses with a file extension, such as “.html”. It’s a good idea to clean these up, and it’s easy to do.
In the early days of the web, most pages were crafted by hand. You wrote all the markup in a text editor, uploaded the files using an FTP program, and viewed them in a web browser.
Nowadays most pages are stored in a database. When requested, the site’s code fetches the database entry and presents it to the browser via a template. The web address (URL) is no longer a page as such.
Although the two approaches are fundamentally different, the end user may perceive only one difference. A 1990s web page might have an address like this: www.mysite.com/this-is-a-page.html while its modern counterpart will have an address like this: www.mysite.com/this-is-a-page
Let’s consider that for a moment.
Advantages of “clean” web addresses
There are a couple of reasons to favour the more modern version.
It's cleaner
Web addresses without an extension are less cluttered and easier to read.
It's easier to maintain
The first version implicitly specifies the page technology, in the same way that a file ending in “.jpeg” is always an image. If you ever change your server technology, you’ll have a bunch of URLs to change and redirects to put in place. It’s better for your URLs to be technology-agnostic in the first place.
The cleanup
If your site already has lots of pages with an unwanted extension, cleaning it up is easier than you might think.
Most web hosts that run the Apache web server let you use a file named .htaccess, which allows you to specify rules for how your site can be accessed. Although the syntax is cryptic, the following two lines will allow you to omit the extension, so that an address ending in this-is-a-page will actually request this-is-a-page.html.
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.+?)/?$ $1.html [L]
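On their own, those two lines rely on the rewrite engine already being switched on. Here is a minimal self-contained sketch, assuming an Apache server with mod_rewrite available and .htaccess overrides permitted by your host (neither is guaranteed, so check with your provider):

```apache
<IfModule mod_rewrite.c>
  # Rewrite rules are ignored until the engine is enabled
  RewriteEngine On

  # Only act if "<requested path>.html" exists as a regular file ...
  RewriteCond %{REQUEST_FILENAME}.html -f
  # ... then serve that file internally; the address shown in the browser is unchanged
  RewriteRule ^(.+?)/?$ $1.html [L]
</IfModule>
```

Because this is an internal rewrite rather than a redirect, the visitor never sees the .html version of the address.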
That was easy! However, you shouldn’t stop there.
Case study
My own site – the one you’re on now – is unusual in that it’s a mixture of static PHP pages (ending with the .php extension) and database-driven pages, none of which have an extension.
To clean up the .php pages so that their URLs matched the database-driven pages, I added a sequence of rules to the .htaccess file, as follows.
Skip selected pages
Most content management systems (CMS) have static pages, usually organised in standard subdirectories. In a CMS such as WordPress or Drupal, these pages will end in .php. To avoid possible problems, I used the following rule first:
RewriteRule ^(web/)?(update|install|rebuild)(\.php)?/?$ - [L]
This rule says “allow all .php pages that match this pattern to pass through without further processing.” None of these pages are public-facing, so I saw no benefit in risking unwanted side effects caused by renaming them.
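The pattern is specific to each site. Purely as an illustration, a WordPress-flavoured variant might bypass the login script, the XML-RPC endpoint and the admin area. The paths below are common WordPress locations, not taken from the site described here, so treat them as assumptions and check them against your own installation:

```apache
# Hypothetical WordPress variant: let login, XML-RPC and admin requests
# pass through unchanged ("-" means no substitution, [L] stops further rules)
RewriteRule ^(wp-login\.php|xmlrpc\.php|wp-admin/) - [L]
```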
Skip pages that may already exist as directories
Wait, what? Let’s say my site has a subdirectory named about, at the same level as a page named about.php. If I remove the “.php” from the page, now I have two assets on my site named about, one a directory and one a page.
To avoid this problem, I add another pass-through rule that says: in all such situations, just let the request for the .php through without further processing. Here's how that looks:
RewriteCond %{REQUEST_FILENAME} ^(.+)\.php$ [NC]
RewriteCond %1 -d
RewriteRule ^ - [L]
Redirect “old” to “new”
This is important, albeit a little confusing. The purpose of this exercise is to allow all requests for this-is-a-page to load this-is-a-page.php. However, a request for this-is-a-page.php will still load this-is-a-page.php.
To capture the benefits of this exercise, we need to ensure that any requests for this-is-a-page.php that have passed through the initial checks will redirect to this-is-a-page. Here's how that looks:
RewriteCond %{THE_REQUEST} \s(/[^?\s]+?)\.php(?:[\s?]|$) [NC]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ %1 [R=302,L,NE]
This is a redirect, which is to say, it causes your browser to load the page again, this time with the cleaned-up version of the URL.
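The condition on THE_REQUEST is the least obvious part, so here is the same rule again with comments. The sample request line is made up for illustration:

```apache
# THE_REQUEST is the raw request line sent by the browser, for example:
#   GET /this-is-a-page.php?x=1 HTTP/1.1
# Internal rewrites never change it, so this rule does not fire again when a
# later rule maps /this-is-a-page back to this-is-a-page.php (no redirect loop).
#
#   \s(/[^?\s]+?)   capture the path that follows the method, up to ...
#   \.php           ... a literal ".php" ...
#   (?:[\s?]|$)     ... followed by whitespace, "?" or the end of the line
RewriteCond %{THE_REQUEST} \s(/[^?\s]+?)\.php(?:[\s?]|$) [NC]
# Only redirect when the .php file really exists on disk
RewriteCond %{REQUEST_FILENAME} -f
# %1 is the captured path without ".php"; NE stops Apache re-escaping it
RewriteRule ^ %1 [R=302,L,NE]
```

Since the substitution contains no query string of its own, any query string on the original request is carried over to the redirected address.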
The “R=302” is important. The 302 tells the browser that this is a temporary redirect. After you’ve tested everything, you should change the 302 to 301, which represents a permanent redirect. Search engines pick this up, and will remove the old URL from their index and replace it with the new one.
Why not just make it a 301 redirect to begin with? Because browsers have a habit of clinging tenaciously to 301s so that, if you need to reconfigure the rules, the old redirect may remain in place and may be difficult to override. It’s best to add a small extra step to spare yourself that aggravation.
The effect of this redirect is that, not only do you no longer need to use the extension when you link to your pages, you'll also ensure that any requests for the old version will get redirected to the new. With this rule, all traces of the old extension will be removed from your site.
Putting it all together
Putting all those rules together and wrapping them in a conditional to ensure that your server has access to the rewrite rules, we end up with something like the example below. Note that, although these rules should run in the prescribed sequence, your site may have its own rules that go before or after these.
Be sure to test this thoroughly!
<IfModule mod_rewrite.c>
RewriteEngine On
# Optionally, other rules go here …
# Bypass all .php files used by content management system
RewriteRule ^(web/)?(update|install|rebuild)(\.php)?/?$ - [L]
# Bypass potential name clashes
# If /foo.php is requested but /foo is a directory, do NOT redirect.
RewriteCond %{REQUEST_FILENAME} ^(.+)\.php$ [NC]
RewriteCond %1 -d
RewriteRule ^ - [L]
# Otherwise, redirect /.../foo.php -> /.../foo
# Test thoroughly before changing 302 to 301
RewriteCond %{THE_REQUEST} \s(/[^?\s]+?)\.php(?:[\s?]|$) [NC]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ %1 [R=302,L,NE]
# Serve /path/to/page from /path/to/page.php (internal rewrite)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+?)/?$ $1.php [L]
# More rules (typically standard content management system rewrites)
</IfModule>
Test it out
All pages on this site have this rewrite and redirect in place. Note that the link below has the “real” filename – this-is-a-page.php – yet when you click on the link, the page that loads is the extensionless version.
More information
- Apache .htaccess tutorial (Official Apache guide to .htaccess basics)
- MDN (MDN Apache Configuration: .htaccess)
- AskApache (.htaccess Examples: Cookies, Variables, Custom Headers)