Alan’s blog

August 8, 2010

Names, URIs and why the web discards 50 years of computing experience

Filed under: HCI and usability, academic, web development — alan @ 5:46 pm

Names and naming have always been a big issue both in computer science and philosophy, and a topic I have posted on before (see “names – a file by any other name“).

In computer science, and in particular programming languages, a whole vocabulary has arisen to talk about names: scope, binding, referential transparency. As in philosophy, it is typically the association between a name and its ‘meaning’ that is of interest. Names and words, whether in programming languages or day-to-day language, are, what philosophers call, ‘intentional‘: they refer to something else. In computer science the ’something else’ is typically some data or code or a placeholder/variable containing data or code, and the key question of semantics or ‘meaning’ is about how to identify which variable, function or piece of data a name refers to in a particular context at a particular time.

The emphasis in computing has tended to be about:

(a) Making sure names have unambiguous meaning when looking locally inside code. Concerns such as referential transparency, avoiding dynamic binding and the deprecation of global variables are about this.

(b) Putting boundaries on where names can be seen/understood, both as a means to ensure (a) and also as part of encapsulation of semantics in object-based languages and abstract data types.

However, there has always been a tension between clarity of intention (in both the normal and philosophical sense) and abstraction/reuse. If names are totally unambiguous then it becomes impossible to say general things. Without a level of controlled ambiguity in language a legal statement such as “if a driver exceeds the speed limit they will be fined” would need to be stated separately for every citizen. Similarly in computing when we write:

function f(x) { return (x+1)*(x-1); }

The meaning of x is different when we use it in ‘f(2)’ or ‘f(3)’ and must be so to allow ‘f’ to be used generically. Crucially there is no internal ambiguity, the two ‘x’s refer to the same thing in a particular invocation of ‘f’, but the precise meaning of ‘x’ for each invocation is achieved by external binding (the argument list ‘(2)’).

Come the web and URLs and URIs.

Fiona@lovefibre was recently making a test copy of a website built using Wordpress. In a pure html website, this is easy (so long as you have used relative or site-relative links within the site), you just copy the files and put them in the new location and they work :-) Occasionally a more dynamic site does need to know its global name (URL), for example if you want to send a link in an email, but this can usually be achieved using configuration file. For example, there is a development version of Snip!t at cardiff.snip!t.org (rather then www.snipit.org), and there is just one configuration file that needs to be changed between this test site and the live one.

Similarly in a pristine Wordpress install there is just such a configuration file and one or two database entries. However, as soon as it has been used to create a site, the database content becomes filled with URLs. Some are in clear locations, but many are embedded within HTML fields or serialised plugin options. Copying and moving the database requires a series of SQL updates with string replacements matching the old site name and replacing it with the new — both tedious and needing extreme care not to corrupt the database in the process.

Is this just a case of Wordpress being poorly engineered?

In fact I feel more a problem endemic in the web and driven largely by the URL.

Recently I was experimenting with Firefox extensions. Being a good 21st century programmer I simply found an existing extension that was roughly similar to what I was after and started to alter it. First of course I changed its name and then found I needed to make changes through pretty much every file in the extension as the knowledge of the extension name seemed to permeate to the lowest level of the code. To be fair XUL has mechanisms to achieve a level of encapsulation introducing local URIs through the ‘chrome:’ naming scheme and having been through the process once. I maybe understand a bit better how to design extensions to make them less reliant on the external name, and also which names need to be changed and which are more like the ‘x’ in the ‘f(x)’ example. However, despite this, the experience was so different to the levels of encapsulation I have learnt to take for granted in traditional programming.

Much of the trouble resides with the URL. Going back to the two issues of naming, the URL focuses strongly on (a) making the name unambiguous by having a single universal namespace;  URLs are a bit like saying “let’s not just refer to ‘Alan’, but ‘the person with UK National Insurance Number XXXX’ so we know precisely who we are talking about”. Of course this focus on uniqueness of naming has a consequential impact on generality and abstraction. There are many visitors on Tiree over the summer and maybe one day I meet one at the shop and then a few days later pass the same person out walking; I don’t need to know the persons NI number or URL in order to say it was the same person.

Back to Snip!t, over the summer I spent some time working on the XML-based extension mechanism. As soon as these became even slightly complex I found URLs sneaking in, just like the Wordpress database :-( The use of namespaces in the XML file can reduce this by at least limiting full URLs to the XML header, but, still, embedded in every XML file are un-abstracted references … and my pride in keeping the test site and live site near identical was severely dented1.

In the years when the web was coming into being the Hypertext community had been reflecting on more than 30 years of practical experience, embodied particularly in the Dexter Model2. The Dexter model and some systems, such as Wendy Hall’s Microcosm3, incorporated external linkage; that is, the body of content had marked hot spots, but the association of these hot spots to other resources was in a separate external layer.

Sadly HTML opted for internal links in anchor and image tags in order to make html files self-contained, a pattern replicated across web technologies such as XML and RDF. At a practical level this is (i) why it is hard to have a single anchor link to multiple things, as was common in early Hypertext systems such as Intermedia, and (ii), as Fiona found, a real pain for maintenance!


  1. I actually resolved this by a nasty ‘hack’ of having internal functions alias the full site name when encountered and treating them as if they refer to the test site — very cludgy! [back]
  2. Halasz, F. and Schwartz, M. 1994. The Dexter hypertext reference model. Commun. ACM 37, 2 (Feb. 1994), 30-39. DOI= http://doi.acm.org/10.1145/175235.175237 [back]
  3. Hall, W., Davis, H., and Hutchings, G. 1996 Rethinking Hypermedia: the Microcosm Approach. Kluwer Academic Publishers. [back]

July 13, 2010

Apache: pretty URLs and rewrite loops

Filed under: academic, web development — alan @ 9:30 am

[another techie post - a problem I had and can see that other people have had too]

It is common in various web frameworks to pass pretty much everything through a central script using Apache .htaccess file and mod_rewrite.  For example enabling permalinks in a Wordpress blog generates an .htaccess file like this:

RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]

I use similar patterns for various sites such as vfridge (see recent post “Phoenix rises“) and Snip!t.  For Snip!t however I was using not a local .htaccess file, but an AliasMatch in httpd.conf, which meant I needed to ask Fiona every time I needed to do a change (as I can never remember the root passwords!).  It seemed easier (even if slightly less efficient) to move this to a local .htaccess file:

RewriteEngine On
RewriteBase /
RewriteRule ^(.*)$ code/top.php/$1 [L]

The intention is to map “/an/example?args” into “/code/top.php/an/example?args”.

Unfortunately this resulted in a “500 internal server error” page and in the Apache error log messages saying there were too many internal redirects.  This seems to be a common problem reported in forums (see here, here and here).  The reason for this is that .htaccess files are encountered very late in Apache’s processing and so anything rewritten by the rules gets thrown back into Apache’s processing almost as if they were a fresh request.  While the “[L]“(last)  flags says “don’t execute any more rules”, this means “no more rules on this pass”, but when Apache gets back to the .htaccess in the fresh round the rule gets encountered again and again leading to an infinite loop “/code/top/php/code/top.php/…/code/top.php/an/example?args”.

Happily, mod_rewrite thought of this and there is an additional “[NS]” (nosubreq) flag that says “only use this rule on the first pass”.  The mod_rewrite documentation for RewriteRule in Apache 1.3, 2.0 and 2.3 says:

Use the following rule for your decision: whenever you prefix some URLs with CGI-scripts to force them to be processed by the CGI-script, the chance is high that you will run into problems (or even overhead) on sub-requests. In these cases, use this flag.

I duly added the flag:

RewriteRule ^(.*)$ code/top.php/$1 [L,NS]

This should work, but doesn’t.  I’m not sure why except that the Apache 2.2 documentation for NS|nosubreq reads:

NS|nosubreq

Use of the [NS] flag prevents the rule from being used on subrequests. For example, a page which is included using an SSI (Server Side Include) is a subrequest, and you may want to avoid rewrites happening on those subrequests.

Images, javascript files, or css files, loaded as part of an HTML page, are not subrequests – the browser requests them as separate HTTP requests.

This is identical to the documentation for 1.3, 2.0 and 2.3 except that quote about “URLs with CGI-scripts” is singularly missing.  I can’t find anything that says so, but my guess is that there was some bug (feature?) introduced 2.2 that is being fixed in 2.3.

Wordpress is immune from the infinite loop as the directive “RewriteCond %{REQUEST_FILENAME} !-f” says “if the file exists use that without rewriting”.  As “index.php” is a file, the rule does not rewrite a second time.  However, the layout of my files meant that I sometimes have an actual file in the pseudo location (e.g. /an/example really exists).  I could have reorganised the complete directory structure … but then I would have been still fixing all the broken links now!

Instead I simply added an explicit “please don’t rewrite my top.php script” condition:

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI}  !^/code/top.php/.*
RewriteRule ^(.*)$ code/top.php/$1 [L,NS]

I suspect that this will be unnecessary when Apache upgrades to 2.3, but for now … it works :-)

June 11, 2010

Phoenix rises – vfridge online again

Filed under: HCI and usability, academic, personal, web development — alan @ 7:47 pm

vfridge is back!

I mentioned ‘Project Phoenix’ in my last previous post, and this was it – getting vfridge up and running again.

Ten years ago I was part of a dot.com company aQtive1 with Russell Beale, Andy Wood and others.  Just before it folded in the aftermath of the dot.com crash, aQtive spawned a small spin-off vfridge.com.  The virtual fridge was a social networking web site before the term existed, and while vfridge the company went the way of most dot.coms, for some time after I kept the vfridge web site running on Fiona’s servers until it gradually ‘decayed’ partly due to Javascript/DOM changes and partly due to Java’s interactions with mysql becoming unstable (note very, very old Java code!).  But it is now back online :-)

The core idea of vfridge is placing small notes, photos and ‘magnets’ in a shareable web area that can be moved around and arranged like you might with notes held by magnets to a fridge door.

Underlying vfridge was what we called the websharer vision, which looked towards a web of user-generated content.  Now this is passé, but at the time  was directly counter to accepted wisdom and looking back seem prescient – remember this was written in 1999:

Although everyone isn’t a web developer, it is likely that soon everyone will become an Internet communicator — email, PC-voice-comms, bulletin boards, etc. For some this will be via a PC, for others using a web-phone, set-top box or Internet-enabled games console.

The web/Internet is not just a medium for publishing, but a potential shared place.

Everyone may be a web sharer — not a publisher of formal public ‘content’, but personal or semi-private sharing of informal ‘bits and pieces’ with family, friends, local community and virtual communities such as fan clubs.

This is not just a future for the cognoscenti, but for anyone who chats in the pub or wants to show granny in Scunthorpe the baby’s first photos.

Just over a year ago I thought it would be good to write a retrospective about vfridge in the light of the social networking revolution.  We did a poster “Designing a virtual fridge” about vfridge years ago at a Computers and Fun workshop, but have never written at length abut its design and development.  In particular it would be good to analyse the reasons, technical, social and commercial, why it did not ‘take off’ the time.  However, it is hard to do write about it without good screen shots, and could I find any? (Although now I have)  So I thought it would be good to revive it and now you can try it out again. I started with a few days effort last year at Christmas and Easter time (leisure activity), but now over the last week have at last used the fact that I have half my time unpaid and so free for my own activities … and it is done :-)

The original vfridge was implemented using Java Servlets, but I have rebuilt it in PHP.  While the original development took over a year (starting down in Coornwall while on holiday watching the solar eclipse), this re-build took about 10 days effort, although of course with no design decisions needed.  The reason it took so much development back then is one of the things I want to consider when I write the retrospective.

As far as possible the actual behaviour and design is exactly as it was back in 2000 … and yes it does feel clunky, with lots of refreshing (remember no AJAX or web2.0 in those days) and of course loads of frames!  In fact there is a little cleverness that allowed some client-end processing pre-AJAX2.    Also the new implementation uses the same templates as the original one, although the expansion engine had to be rewritten in PHP.  In fact this template engine was one of our most re-used bits of Java code, although now of course many alternatives.  Maybe I will return to a discussion of that in another post.

I have even resurrected the old mobile interface.  Yes there were WAP phones even in 2000, albeit with tiny green and black screens.  I still recall the excitement I felt the first time I entered a note on the phone and saw it appear on a web page :-)   However, this was one place I had to extensively edit the page templates as nothing seems to process WML anymore, so the WML had to be converted to plain-text-ish HTML, as close as possible to those old phones!  Looks rather odd on the iPhone :-/

So, if you were one of those who had an account back in 2000 (Panos Markopoulos used it to share his baby photos :-) ), then everything is still there just as you left it!

If not, then you can register now and play.


  1. The old aQtive website is still viewable at aqtive.org, but don’t try to install onCue, it was developed in the days of Windows NT. [back]
  2. One trick used the fact that you can get Javascript to pre-load images.  When the front-end Javascript code wanted to send information back to the server it preloaded an image URL that was really just to activate a back-end script.  The frames  used a change-propagation system, so that only those frames that were dependent on particular user actions were refreshed.  All of this is preserved in the current system, peek at the Javascript on the pages.    Maybe I’ll write about the details of these another time. [back]

June 9, 2010

PHP syntax checker updated

Filed under: academic, web development — alan @ 4:05 pm

Took a quick break today from Project Phoenix1.

I’ve had a PHP syntax checker on meandeviation for several years, but only checking PHP 4 as that is what is running on the server.  However, I had an email asking about PHP 5 , so now there is a PHP 5 version too :-)

The syntax checker is a pretty simple layer over the PHP command line option “php -l” and also uses the PHP highlight_file function.  The main complication is parsing the HTML outputs of both as they change between versions of PHP!

There is also a download archive so you can also have it running locally on your own system.


  1. watch this space … [back]

February 17, 2010

names – a file by any other name

Filed under: HCI and usability, academic, web development — alan @ 4:55 pm

Naming things seems relatively unproblematic until you try to do it — ask any couple with a baby on the way.  Naming files is no easier.

Earlier today Fiona @lovefibre was using the MAC OS Time Machine to retrieve an old version of a file (let’s call it “fisfile.doc”).  She wanted to extract a part that she knew she had deleted in order to use in the current version.  Of course the file you are retrieving has the same name as the current file, and the default is to overwrite the current version; that is a simple backup restore.  However, you can ask Time Machine to retain both versions; at which point you end up with two files called, for example, “fisfile.doc” and “fisfile-original.doc”.  In this case ‘original’ means ‘the most recent version’ and the unlabelled one is the old version you have just restored.  This was not  too confusing, but personally I would have been tempted to call the restored file something like “fisfile-2010-01-17-10-33.doc”, in particular because one wonders what will happen if you try to restore several copies of the same file to work on, for example, to work out when an error slipped into a document.

OK, just an single incident, but only a few minutes later I had another example of problematic naming.

(more…)

December 22, 2009

reading barcodes on the iPhone

Filed under: academic, web development — alan @ 7:11 pm

I was wondering about scanning barcodes using the iPhone and found that pic2shop (a free app) lets third part iPhone apps and even web pages access its scanning software through a simply URL scheme interface.

Their developer documentation has an example iPhone app, but not an example web page.  However, I made a web page using their interface in about 5 minutes (see PHP source of barcode reader).

If you have an iPhone you can try it out and scan a bar code now, although you need to install pic2shop first (but it is free).

By allowing third party apps to use their software they encourage downloads of their app, which will bring them revenue through product purchases.  Through free giving they bring themselves benefit; a good open access story.

December 18, 2009

big file?

Filed under: HCI and usability, academic, web development — alan @ 2:45 pm

Encountered the following when downloading a file for proof checking from the Elsevier “E-Proofing” System web site1.  Happily it turned out to be a mere 950 K not a gigabyte!


  1. I was going to add a link to the system web site itself at http://elsevier.sps.co.in/ but instead of some sort of home page you get redirected to the proofs of someone’s article in the Journal of Computational Physics! [back]

August 3, 2009

data types and interpretation in RDF

Filed under: academic, web development — alan @ 1:46 pm

After following a link from one of Nad’s tweets, read Jeni Tennison’s “SPARQL & Visualisation Frustrations: RDF Datatyping“.  Jeni had been having problems processing RDF of MP’s expense claims, because the amounts were plain RDF strings rather than as typed numbers.  She  suggests some best practice rules for data types in RDF based on the underlying philosophy of RDF that it should be self-describing:

  • if the literal is XML, it should be an XML literal
  • if the literal is in a particular language (such as a description or a name), it should be a plain literal with that language
  • otherwise it should be given an appropriate datatype

These seem pretty sensible for simple data types.

In work on the TIM project with colleagues in Athens and Rome, we too had issues with representing data types in ontologies, but more to do with the status of a data type.  Is a date a single thing “2009-08-03T10:23+01:00″, or is it a compound [[date year="2009" month="8" ...]]?

I just took a quick peek at how Dublin Core handles dates and see that the closest to standard references1 still include dates as ‘bare’ strings with implied semantics only, although one of the most recent docs does say:

It is recommended that RDF applications use explicit rdf:type triples …”

and David MComb’s “An OWL version of the Dublin Core” gives an alternative OWL ontology for DC that does include an explicit type for dc:date:

<owl:DatatypeProperty rdf:about="#date">
  <rdfs:domain rdf:resource="#Document"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#dateTime"/>
</owl:DatatypeProperty>

Our solution to the compound types has been to have “value classes” which do not represent ‘things’ in the world, similar to the way the RDF for vcard represents  complex elements such as names using blank nodes:

<vCard:N rdf:parseType="Resource">
  <vCard:Family> Crystal </vCard:Family>
  <vCard:Given> Corky </vCard:Given>
  ...
</vCard:N>

From2

This is fine, and we can have rules for parsing and formatting dates as compound objects to and from, say, W3C datetime strings.  However, this conflicts with the desire to have self-describing RDF as these formatting and parsing rules have to be available to any application or be present as reasoning rules in RDF stores.  If Jeni had been trying to use RDF data coded like this she would be cursing us!

This tension between representations of things (dates, names) and more semantic descriptions is also evident in other areas.  Looking again at Dublin Core the metamodal allows a property such as “subject”  to have a complex object with a URI and possibly several string values.

Very semantic, but hardly mashes well with sources that just say <dc:subject>Biology</dc:subject>.  Again a reasoning store could infer one from the other, but we still have issues about where the knowledge for such transformations resides.

Part of the problem is that the ’self-describing’ nature of RDF is a bit illusary.   In (Piercian) semiotics the interpretant of a sign is crucial, representations are interpreted by an agent in a particular context assuming a particular language, etc.  We do not expect human language to be ’sef describing’ in the sense of being totally acontextual.  Similarly in philosophy words and ideas are treated as intentional, in the (not standard English) sense that they refer out to something else; however, the binding of the idea to the thing it refers to is not part of the word, but separate from it.  Effectively the desire to be self-describing runs the risk of ignoring this distinction3.

Leigh Dodds commented on Jeni’s post to explain that the reason the expense amounts were not numbers was that some were published in non-standard ways such as “12345 (2004)”.  As an example this captures succinctly the perpetual problem between representation and abstracted meaning.  If a journal article was printed in the “Autumn 2007″ issue of  quarterly magazine, do we express this as <dc:date>2007</dc:date> or <dc:date>2007-10-01</dc:date>  attempting to give an approximation or inference from the actual represented date.

This makes one wonder whether what is really needed here is a meta-description of the RDF source (not simply the OWL as one wants to talk about the use of dc:date or whatever in a particular context) that can say things like “mainly numbers, but also occasionally non-strandard forms”, or “amounts sometimes refer to different years”.  Of course to be machine mashable there would need to be an ontology for such annotation …


  1. see “Expressing Simple Dublin Core in RDF/XML“, “Expressing Dublin Core metadata using HTML/XHTML meta and link elements” and Stanford DC OWL [back]
  2. Renato Iannella, Representing vCard Objects in RDF/XML, W3C Note, 22 February 2001. [back]
  3. Doing a quick web seek, these issues are discussed in several places, for example: Glaser, H., Lewy, T., Millard, I. and Dowling, B. (2007) On Coreference and the Semantic Web, (Technical Report, Electronics & Computer Science, University of Southampton) and Legg, C. (2007). Peirce, meaning and the semantic web (Paper presented at Applying Peirce Conference, University of Helsinki, Finland, June 2007). [back]

July 20, 2009

What’s wrong with dynamic binding?

Filed under: academic, web development — alan @ 8:14 pm

Dynamic scoping/binding of variables has a bad name, rather like GOTO and other remnants of the Bad Old Days before Structured Programming saved us all1.  But there are times when dynamic binding is useful and looking around it is very common in web scripting languages, event propagation, meta-level programming, and document styles.

So is it really so bad?

(more…)


  1. Strangely also the days when major advances in substance seemed to be more important than minor advances in nomenclature [back]

July 10, 2009

grammer aint wot it used two be

Filed under: HCI and usability, academic, personal, web development — alan @ 10:35 am

Fiona @ lovefibre and I have often discussed the worrying decline of language used in many comments and postings on the web. Sometimes people are using compressed txtng language or even leetspeak, both of these are reasonable alternative codes to ‘proper’ English, and potentially part of the natural growth of the language.  However, it is often clear that the cause is ignorance not choice.  One of the reasons may be that many more people are getting a voice on the Internet; it is not just the journalists, academics and professional classes.  If so, this could be a positive social sign indicating that a public voice is no longer restricted to university graduates, who, of course, know their grammar perfectly …

Earlier today I was using Google to look up the author of a book I was reading and one of the top links was a listing on ratemyprofessors.com.  For interest I clicked through and saw:

“He sucks.. hes mean and way to demanding if u wanan work your ass off for a C+ take his class1

Hmm I wonder what this student’s course assignment looked like?

(more…)


  1. In case you think I’m a complete pedant, personally, I am happy with both the slang ’sucks’ and ‘ass’ (instead of ‘arse’!), and the compressed speech ‘u’. These could be well-considered choices in language. The mistyped ‘wanna’ is also just a slip. It is the slightly more proper “hes mean and way to demanding” that seems to show  general lack of understanding.  Happily, the other comments, were not as bad as this one, but I did find the student who wanted a “descent grade” amusing :-) [back]
Next Page »

Powered by WordPress