Alan’s blog

August 8, 2010

Names, URIs and why the web discards 50 years of computing experience

Filed under: HCI and usability, academic, web development — alan @ 5:46 pm

Names and naming have always been a big issue both in computer science and philosophy, and a topic I have posted on before (see “names – a file by any other name“).

In computer science, and in particular programming languages, a whole vocabulary has arisen to talk about names: scope, binding, referential transparency. As in philosophy, it is typically the association between a name and its ‘meaning’ that is of interest. Names and words, whether in programming languages or day-to-day language, are, what philosophers call, ‘intentional‘: they refer to something else. In computer science the ’something else’ is typically some data or code or a placeholder/variable containing data or code, and the key question of semantics or ‘meaning’ is about how to identify which variable, function or piece of data a name refers to in a particular context at a particular time.

The emphasis in computing has tended to be about:

(a) Making sure names have unambiguous meaning when looking locally inside code. Concerns such as referential transparency, avoiding dynamic binding and the deprecation of global variables are about this.

(b) Putting boundaries on where names can be seen/understood, both as a means to ensure (a) and also as part of encapsulation of semantics in object-based languages and abstract data types.

However, there has always been a tension between clarity of intention (in both the normal and philosophical sense) and abstraction/reuse. If names are totally unambiguous then it becomes impossible to say general things. Without a level of controlled ambiguity in language a legal statement such as “if a driver exceeds the speed limit they will be fined” would need to be stated separately for every citizen. Similarly in computing when we write:

function f(x) { return (x+1)*(x-1); }

The meaning of x is different when we use it in ‘f(2)’ or ‘f(3)’ and must be so to allow ‘f’ to be used generically. Crucially there is no internal ambiguity, the two ‘x’s refer to the same thing in a particular invocation of ‘f’, but the precise meaning of ‘x’ for each invocation is achieved by external binding (the argument list ‘(2)’).

Come the web and URLs and URIs.

Fiona@lovefibre was recently making a test copy of a website built using Wordpress. In a pure html website, this is easy (so long as you have used relative or site-relative links within the site), you just copy the files and put them in the new location and they work :-) Occasionally a more dynamic site does need to know its global name (URL), for example if you want to send a link in an email, but this can usually be achieved using configuration file. For example, there is a development version of Snip!t at cardiff.snip!t.org (rather then www.snipit.org), and there is just one configuration file that needs to be changed between this test site and the live one.

Similarly in a pristine Wordpress install there is just such a configuration file and one or two database entries. However, as soon as it has been used to create a site, the database content becomes filled with URLs. Some are in clear locations, but many are embedded within HTML fields or serialised plugin options. Copying and moving the database requires a series of SQL updates with string replacements matching the old site name and replacing it with the new — both tedious and needing extreme care not to corrupt the database in the process.

Is this just a case of Wordpress being poorly engineered?

In fact I feel more a problem endemic in the web and driven largely by the URL.

Recently I was experimenting with Firefox extensions. Being a good 21st century programmer I simply found an existing extension that was roughly similar to what I was after and started to alter it. First of course I changed its name and then found I needed to make changes through pretty much every file in the extension as the knowledge of the extension name seemed to permeate to the lowest level of the code. To be fair XUL has mechanisms to achieve a level of encapsulation introducing local URIs through the ‘chrome:’ naming scheme and having been through the process once. I maybe understand a bit better how to design extensions to make them less reliant on the external name, and also which names need to be changed and which are more like the ‘x’ in the ‘f(x)’ example. However, despite this, the experience was so different to the levels of encapsulation I have learnt to take for granted in traditional programming.

Much of the trouble resides with the URL. Going back to the two issues of naming, the URL focuses strongly on (a) making the name unambiguous by having a single universal namespace;  URLs are a bit like saying “let’s not just refer to ‘Alan’, but ‘the person with UK National Insurance Number XXXX’ so we know precisely who we are talking about”. Of course this focus on uniqueness of naming has a consequential impact on generality and abstraction. There are many visitors on Tiree over the summer and maybe one day I meet one at the shop and then a few days later pass the same person out walking; I don’t need to know the persons NI number or URL in order to say it was the same person.

Back to Snip!t, over the summer I spent some time working on the XML-based extension mechanism. As soon as these became even slightly complex I found URLs sneaking in, just like the Wordpress database :-( The use of namespaces in the XML file can reduce this by at least limiting full URLs to the XML header, but, still, embedded in every XML file are un-abstracted references … and my pride in keeping the test site and live site near identical was severely dented1.

In the years when the web was coming into being the Hypertext community had been reflecting on more than 30 years of practical experience, embodied particularly in the Dexter Model2. The Dexter model and some systems, such as Wendy Hall’s Microcosm3, incorporated external linkage; that is, the body of content had marked hot spots, but the association of these hot spots to other resources was in a separate external layer.

Sadly HTML opted for internal links in anchor and image tags in order to make html files self-contained, a pattern replicated across web technologies such as XML and RDF. At a practical level this is (i) why it is hard to have a single anchor link to multiple things, as was common in early Hypertext systems such as Intermedia, and (ii), as Fiona found, a real pain for maintenance!


  1. I actually resolved this by a nasty ‘hack’ of having internal functions alias the full site name when encountered and treating them as if they refer to the test site — very cludgy! [back]
  2. Halasz, F. and Schwartz, M. 1994. The Dexter hypertext reference model. Commun. ACM 37, 2 (Feb. 1994), 30-39. DOI= http://doi.acm.org/10.1145/175235.175237 [back]
  3. Hall, W., Davis, H., and Hutchings, G. 1996 Rethinking Hypermedia: the Microcosm Approach. Kluwer Academic Publishers. [back]

July 13, 2010

Apache: pretty URLs and rewrite loops

Filed under: academic, web development — alan @ 9:30 am

[another techie post - a problem I had and can see that other people have had too]

It is common in various web frameworks to pass pretty much everything through a central script using Apache .htaccess file and mod_rewrite.  For example enabling permalinks in a Wordpress blog generates an .htaccess file like this:

RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]

I use similar patterns for various sites such as vfridge (see recent post “Phoenix rises“) and Snip!t.  For Snip!t however I was using not a local .htaccess file, but an AliasMatch in httpd.conf, which meant I needed to ask Fiona every time I needed to do a change (as I can never remember the root passwords!).  It seemed easier (even if slightly less efficient) to move this to a local .htaccess file:

RewriteEngine On
RewriteBase /
RewriteRule ^(.*)$ code/top.php/$1 [L]

The intention is to map “/an/example?args” into “/code/top.php/an/example?args”.

Unfortunately this resulted in a “500 internal server error” page and in the Apache error log messages saying there were too many internal redirects.  This seems to be a common problem reported in forums (see here, here and here).  The reason for this is that .htaccess files are encountered very late in Apache’s processing and so anything rewritten by the rules gets thrown back into Apache’s processing almost as if they were a fresh request.  While the “[L]“(last)  flags says “don’t execute any more rules”, this means “no more rules on this pass”, but when Apache gets back to the .htaccess in the fresh round the rule gets encountered again and again leading to an infinite loop “/code/top/php/code/top.php/…/code/top.php/an/example?args”.

Happily, mod_rewrite thought of this and there is an additional “[NS]” (nosubreq) flag that says “only use this rule on the first pass”.  The mod_rewrite documentation for RewriteRule in Apache 1.3, 2.0 and 2.3 says:

Use the following rule for your decision: whenever you prefix some URLs with CGI-scripts to force them to be processed by the CGI-script, the chance is high that you will run into problems (or even overhead) on sub-requests. In these cases, use this flag.

I duly added the flag:

RewriteRule ^(.*)$ code/top.php/$1 [L,NS]

This should work, but doesn’t.  I’m not sure why except that the Apache 2.2 documentation for NS|nosubreq reads:

NS|nosubreq

Use of the [NS] flag prevents the rule from being used on subrequests. For example, a page which is included using an SSI (Server Side Include) is a subrequest, and you may want to avoid rewrites happening on those subrequests.

Images, javascript files, or css files, loaded as part of an HTML page, are not subrequests – the browser requests them as separate HTTP requests.

This is identical to the documentation for 1.3, 2.0 and 2.3 except that quote about “URLs with CGI-scripts” is singularly missing.  I can’t find anything that says so, but my guess is that there was some bug (feature?) introduced 2.2 that is being fixed in 2.3.

Wordpress is immune from the infinite loop as the directive “RewriteCond %{REQUEST_FILENAME} !-f” says “if the file exists use that without rewriting”.  As “index.php” is a file, the rule does not rewrite a second time.  However, the layout of my files meant that I sometimes have an actual file in the pseudo location (e.g. /an/example really exists).  I could have reorganised the complete directory structure … but then I would have been still fixing all the broken links now!

Instead I simply added an explicit “please don’t rewrite my top.php script” condition:

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI}  !^/code/top.php/.*
RewriteRule ^(.*)$ code/top.php/$1 [L,NS]

I suspect that this will be unnecessary when Apache upgrades to 2.3, but for now … it works :-)

July 26, 2009

fix for Wordpress shortcode bug

Filed under: academic, web development — alan @ 3:56 pm

I’m starting to use shortcodes heavily in WordPress1 as we are using it internally on the DEPtH project to coordinate our new TouchIT book.  There was minor bug which meant that HTML tags came out unbalanced (e.g. “<p></div></p”).

I’ve just been fixing it and posting a patch2, interestingly the bug was partly due to the fact that back-references in regular expressions count from the beginning of the regular expression, making it impossible to use them if the expression may be ‘glued’ into a larger one … lack of referential transparency!

For anyone having similar problems, full details and patch below (all WP and PHP techie stuff).

(more…)


  1. see section “using dynamic binding” in What’s wrong with dynamic binding? [back]
  2. TRAC ticket #10490 [back]

July 20, 2009

What’s wrong with dynamic binding?

Filed under: academic, web development — alan @ 8:14 pm

Dynamic scoping/binding of variables has a bad name, rather like GOTO and other remnants of the Bad Old Days before Structured Programming saved us all1.  But there are times when dynamic binding is useful and looking around it is very common in web scripting languages, event propagation, meta-level programming, and document styles.

So is it really so bad?

(more…)


  1. Strangely also the days when major advances in substance seemed to be more important than minor advances in nomenclature [back]

June 24, 2009

going SIOC (Semantically-Interlinked Online Communities)

Filed under: academic, web development — alan @ 2:31 pm

I’ve just SIOC enabled this blog using the SIOC Exporter for WordPress by Uldis Bojars.  Quoting from the SIOC project web site:

The SIOC initiative (Semantically-Interlinked Online Communities) aims to enable the integration of online community information. SIOC provides a Semantic Web ontology for representing rich data from the Social Web in RDF.

This means you can explore the blog as an RDF Graph including this post.

<sioc:Post rdf:about="http://www.alandix.com/blog/?p=176">
    <sioc:link rdf:resource="http://www.alandix.com/blog/?p=176"/>
    <sioc:has_container rdf:resource="http://www.alandix.com/blog/index.php?sioc_type=site#weblog"/>
    <dc:title>going SIOC (Semantically-Interlinked Online Communities)</dc:title>
    <sioc:has_creator>
        <sioc:User rdf:about="http://www.alandix.com/blog/author/admin/" rdfs:label="alan">
            <rdfs:seeAlso rdf:resource="http://www.alandix.com/blog/index.php?sioc_type=user&amp;sioc_id=1"/>
        </sioc:User>
    </sioc:has_creator>
...

May 13, 2007

tagging … I am not alone … or am I?

Filed under: academic, web development — alan @ 7:50 pm

I’ve noticed that I reuse very few tags … and thought I was just a poor tag-user. However, I read the other day a reference to a paper at the CSCW confernce last year; it reported that the average number of re-uses of a tag was just 1.311 . I thought this meant that most tags are never reused … I am not alone :-)

Having downloaded and read the paper it turns out that this is the average number of users who use a tag – that is most tags are used by only one person, in fact individuals reuse their own tags a lot more … so I am no-good tagger after all :-(

Incidentally, I use ultimate tag warrior plugin for wordpress and it seems OK. Only drawback is that if you want tags displayed with your post, they really get inserted into the post itself. This is not a problem for tags at the end, but would mess up an RSS feed if you like your tags above the post. I guess this is because wordpress does not have a handle for plugins to add things to display loops, so the only way to ensure the tags are displayed are to make them part of the post.

Also Nad sent me a link to a neat tag visualisation by Moritz Stefaner.


  1. Sen, S., Lam, S. K., Rashid, A., Cosley, D., Frankowski, D., Osterhouse, J., Harper, F. M., and Riedl, J. 2006. tagging, communities, vocabulary, evolution. In Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work {Banff, Alberta, Canada, November 04 – 08, 2006}. CSCW ‘06. ACM Press, New York, NY, 181-190. DOI= http://doi.acm.org/10.1145/1180875.1180904 [back]

January 1, 2007

footnotes

Filed under: web development — alan @ 8:08 pm

I’ve just installed the footnotes plugin in my Wordpress1 . It seems to work a dream.

I’ve used the following CSS:

ol.footnotes{
font-size:.9em;
color:#666666;
margin-top:0px;
}

ol.footnotes li{
list-style-type:decimal;
}


  1. makes footnotes like this [back]

Powered by WordPress