URL Layout

October 20, 2008 at 2:40 pm Leave a comment

URL Layout

Canonical URLs

Description:
On some webservers there are more than one URL for a resource. Usually there are canonical URLs (which should be actually used and distributed) and those which are just shortcuts, internal ones, etc. Independed which URL the user supplied with the request he should finally see the canonical one only.

Solution:
We do an external HTTP redirect for all non-canonical URLs to fix them in the location view of the Browser and for all subsequent requests. In the example ruleset below we replace /~user by the canonical /u/user and fix a missing trailing slash for /u/user.

RewriteRule   ^/~([^/]+)/?(.*)    /u/$1/$2  [R]
RewriteRule   ^/([uge])/([^/]+)$  /$1/$2/   [R]

Canonical Hostnames

Description:

Solution:
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteCond %{SERVER_PORT} !^80$
RewriteRule ^/(.*)         http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R]
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteRule ^/(.*)         http://fully.qualified.domain.name/$1 [L,R]

Moved DocumentRoot

Description:
Usually the DocumentRoot of the webserver directly relates to the URL “/”. But often this data is not really of top-level priority, it is perhaps just one entity of a lot of data pools. For instance at our Intranet sites there are /e/www/ (the homepage for WWW), /e/sww/ (the homepage for the Intranet) etc. Now because the data of the DocumentRoot stays at /e/www/ we had to make sure that all inlined images and other stuff inside this data pool work for subsequent requests.

Solution:
We just redirect the URL / to /e/www/. While is seems trivial it is actually trivial with mod_rewrite, only. Because the typical old mechanisms of URL Aliases (as provides by mod_alias and friends) only used prefix matching. With this you cannot do such a redirection because the DocumentRoot is a prefix of all URLs. With mod_rewrite it is really trivial:

RewriteEngine on
RewriteRule   ^/$  /e/www/  [R]

Trailing Slash Problem

Description:
Every webmaster can sing a song about the problem of the trailing slash on URLs referencing directories. If they are missing, the server dumps an error, because if you say /~quux/foo instead of /~quux/foo/ then the server searches for a file named foo. And because this file is a directory it complains. Actually is tries to fix it themself in most of the cases, but sometimes this mechanism need to be emulated by you. For instance after you have done a lot of complicated URL rewritings to CGI scripts etc.

Solution:
The solution to this subtle problem is to let the server add the trailing slash automatically. To do this correctly we have to use an external redirect, so the browser correctly requests subsequent images etc. If we only did a internal rewrite, this would only work for the directory page, but would go wrong when any images are included into this page with relative URLs, because the browser would request an in-lined object. For instance, a request for image.gif in /~quux/foo/index.html would become /~quux/image.gif without the external redirect! So, to do this trick we write:

RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^foo$  foo/  [R]

The crazy and lazy can even do the following in the top-level .htaccess file of their homedir. But notice that this creates some processing overhead.

RewriteEngine  on
RewriteBase    /~quux/
RewriteCond    %{REQUEST_FILENAME}  -d
RewriteRule    ^(.+[^/])$           $1/  [R]

Webcluster through Homogeneous URL Layout

Description:
We want to create a homogenous and consistent URL layout over all WWW servers on a Intranet webcluster, i.e. all URLs (per definition server local and thus server dependent!) become actually server independed! What we want is to give the WWW namespace a consistent server-independend layout: no URL should have to include any physically correct target server. The cluster itself should drive us automatically to the physical target host.

Solution:
First, the knowledge of the target servers come from (distributed) external maps which contain information where our users, groups and entities stay. The have the form

user1  server_of_user1
user2  server_of_user2
:      :

We put them into files map.xxx-to-host. Second we need to instruct all servers to redirect URLs of the forms

/u/user/anypath
/g/group/anypath
/e/entity/anypath

to

http://physical-host/u/user/anypath
http://physical-host/g/group/anypath
http://physical-host/e/entity/anypath

when the URL is not locally valid to a server. The following ruleset does this for us by the help of the map files (assuming that server0 is a default server which will be used if a user has no entry in the map):

RewriteEngine on

RewriteMap      user-to-host   txt:/path/to/map.user-to-host
RewriteMap     group-to-host   txt:/path/to/map.group-to-host
RewriteMap    entity-to-host   txt:/path/to/map.entity-to-host

RewriteRule   ^/u/([^/]+)/?(.*)   http://${user-to-host:$1|server0}/u/$1/$2
RewriteRule   ^/g/([^/]+)/?(.*)  http://${group-to-host:$1|server0}/g/$1/$2
RewriteRule   ^/e/([^/]+)/?(.*) http://${entity-to-host:$1|server0}/e/$1/$2

RewriteRule   ^/([uge])/([^/]+)/?$          /$1/$2/.www/
RewriteRule   ^/([uge])/([^/]+)/([^.]+.+)   /$1/$2/.www/$3\
Advertisements

Entry filed under: Script. Tags: , , , , , , , , , , , , , , , .

The Apache configuration htpasswd, md5, mod_rewrite, Rsultats and wages the fall at HP

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Trackback this post  |  Subscribe to the comments via RSS Feed


October 2008
M T W T F S S
« Sep   Feb »
 12345
6789101112
13141516171819
20212223242526
2728293031  

%d bloggers like this: