Filesystem Reorganization
Description:
This really is a hardcore example: a killer application which heavily uses per-directory
RewriteRules to get a smooth look and feel on the Web while its data structure is never
touched or adjusted. Background: net.sw is my archive of freely available Unix software
packages, which I started to collect in 1992. It is both my hobby and job to do this, because
while I'm studying computer science I have also worked for many years as a system and
network administrator in my spare time. Every week I need some sort of software, so I created a
deep hierarchy of directories where I stored the packages:
drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/
In July 1996 I decided to make this archive public to the world via a nice Web interface.
"Nice" means that I wanted to offer an interface where you can browse directly through the
archive hierarchy. And "nice" means that I didn't want to change anything inside this
hierarchy - not even by putting some CGI scripts at the top of it. Why? Because the above
structure should later be accessible via FTP as well, and I didn't want any Web or CGI stuff to
be there.
Solution:
The solution has two parts: The first is a set of CGI scripts which create all the pages at all
directory levels on the fly. I put them under /e/netsw/.www/ as follows:
-rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
-rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
-rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
-rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
-rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
-rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
-rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
-rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
-rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
-rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
-rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst
The DATA/ subdirectory holds the above directory structure, i.e. the real net.sw stuff, and
gets automatically updated via rdist from time to time. The second part of the problem remains:
how to link these two structures together into one smooth-looking URL tree? We want to hide
the DATA/ directory from the user while running the appropriate CGI scripts for the various
URLs. Here is the solution: first I put the following into the per-directory configuration file in the
DocumentRoot of the server to rewrite the announced URL /net.sw/ to the internal path
/e/netsw:
RewriteRule   ^net\.sw$        net.sw/        [R]
RewriteRule   ^net\.sw/(.*)$   e/netsw/$1
The first rule is for requests which miss the trailing slash: it redirects them to the same URL
with the slash appended. The second rule does the real thing. And then comes the killer
configuration which stays in the per-directory config file /e/netsw/.www/.wwwacl:
Options       ExecCGI FollowSymLinks Includes MultiViews

RewriteEngine on

#  we are reached via /net.sw/ prefix
RewriteBase   /net.sw/

#  first we rewrite the root dir to
#  the handling cgi script
RewriteRule   ^$                       netsw-home.cgi       [L]
RewriteRule   ^index\.html$            netsw-home.cgi       [L]

#  strip out the subdirs when
#  the browser requests us from perdir pages
RewriteRule   ^.+/(netsw-[^/]+/.+)$    $1                   [L]

#  and now break the rewriting for local files
RewriteRule   ^netsw-home\.cgi.*       -                    [L]
RewriteRule   ^netsw-changes\.cgi.*    -                    [L]
RewriteRule   ^netsw-search\.cgi.*     -                    [L]
RewriteRule   ^netsw-tree\.cgi$        -                    [L]
RewriteRule   ^netsw-about\.html$      -                    [L]
RewriteRule   ^netsw-img/.*$           -                    [L]

#  anything else is a subdir which gets handled
#  by another cgi script
RewriteRule   !^netsw-lsdir\.cgi.*     -                    [C]
RewriteRule   (.*)                     netsw-lsdir.cgi/$1
Some hints for interpretation:
1. Notice the L (last) flag and no substitution field ('-') in the fourth part
2. Notice the ! (not) character and the C (chain) flag at the first rule in the last part
3. Notice the catch-all pattern in the last rule
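To see how the three hints play together, the ruleset can be traced with a small simulation.
What follows is only a rough Python sketch, not how mod_rewrite is implemented: it models a
single pass over the rules for a path that already has the /net.sw/ prefix stripped, and it
knows only the L and C flags, the '-' substitution and the '!' negation. The names RULES and
rewrite() are made up for this illustration, and the sample paths (including logo.gif) are
hypothetical.

```python
import re

# (pattern, substitution, flags), in the same order as in the .wwwacl ruleset.
# A leading "!" negates the pattern; "-" means "leave the path unchanged".
RULES = [
    (r"^$",                    "netsw-home.cgi",     "L"),
    (r"^index\.html$",         "netsw-home.cgi",     "L"),
    (r"^.+/(netsw-[^/]+/.+)$", "$1",                 "L"),
    (r"^netsw-home\.cgi.*",    "-",                  "L"),
    (r"^netsw-changes\.cgi.*", "-",                  "L"),
    (r"^netsw-search\.cgi.*",  "-",                  "L"),
    (r"^netsw-tree\.cgi$",     "-",                  "L"),
    (r"^netsw-about\.html$",   "-",                  "L"),
    (r"^netsw-img/.*$",        "-",                  "L"),
    (r"!^netsw-lsdir\.cgi.*",  "-",                  "C"),
    (r"(.*)",                  "netsw-lsdir.cgi/$1", ""),
]


def rewrite(path):
    """Run one simplified pass of RULES over a path relative to /net.sw/."""
    skip_chain = False
    for pattern, subst, flags in RULES:
        if skip_chain:
            # An earlier rule of this chain failed: skip this rule too,
            # and keep skipping while the chain continues.
            skip_chain = "C" in flags
            continue
        negate = pattern.startswith("!")
        match = re.search(pattern[1:] if negate else pattern, path)
        matched = (match is None) if negate else (match is not None)
        if not matched:
            # A failed rule carrying C takes the rest of its chain with it.
            skip_chain = "C" in flags
            continue
        if subst != "-":
            # Expand $N backreferences. Negated patterns capture nothing,
            # so this sketch assumes they are only used with "-".
            path = re.sub(r"\$(\d)", lambda m: match.group(int(m.group(1))), subst)
        if "L" in flags:
            break  # hint 1: L stops the pass right here
    return path


for sample in ["", "index.html", "Audio/", "Crypto/netsw-img/logo.gif",
               "netsw-tree.cgi", "netsw-lsdir.cgi/Audio/"]:
    print(repr(sample), "->", repr(rewrite(sample)))
```

In this model a path that already starts with netsw-lsdir.cgi fails the negated rule, so the
chained catch-all is skipped and the path comes out unchanged. That is what keeps an already
rewritten request from being rewritten a second time.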