Lots of Words is a new experiment that mashes up a Wikipedia-based lexicon with images from Flickr and whatever else I can get my hands on, with the goal of building a representative and informative source of translations for any word in any language. I’m trying to keep things as machine-readable as possible for now, so others can build on it, too.

My friend Patrick Hall and I have been musing about it for some time, and only now has a technology stack come along that lets me do this as a relatively small hack rather than months of optimization work.

It turns out that indexing something as big as Wikipedia (check out those dump file sizes!) isn’t really an “idea in the head and 500 lines of code” kind of job, unless you use the right tools. In this case, a shiny new CouchDB instance on Amazon EC2, a bit of Ruby and Merb to add some logic and presentation magic, and jQuery as a finishing touch did the trick. This gets pretty much every Web N.0 buzzword covered, although I haven’t yet made any millions from an iPhone app.

This is a spare-time project, so it made sense for me to try out as many different bits of new technology as possible and make it into a breakable toy. This is its third implementation, and the first I’m really happy with in terms of performance and malleability. CouchDB, even with 21+ million documents taking up about 120 GB of storage, still responds in under 200 ms on all queries I’ve tried so far. It truly is, even in its pre-1.0 days, a fantastic piece of software.
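
For a rough idea of what a lookup against a setup like this could look like, here is a minimal sketch in Ruby. It is not lifted from the actual code: the database name (`lexicon`), design document (`words`), view (`by_word`) and the local CouchDB address are all made up for illustration. The only thing it relies on is CouchDB's standard HTTP view API, where the `key` parameter has to be JSON-encoded.

```ruby
require "json"
require "uri"

# Assumption: a local CouchDB; the real instance lives on EC2.
COUCH_BASE = "http://localhost:5984"

# Build the URL for a CouchDB view lookup by key.
# The db, design and view names are hypothetical placeholders.
# CouchDB expects the key as JSON (a quoted string), then URL-escaped.
def view_url(word, db: "lexicon", design: "words", view: "by_word", limit: 10)
  query = URI.encode_www_form(key: word.to_json, limit: limit)
  "#{COUCH_BASE}/#{db}/_design/#{design}/_view/#{view}?#{query}"
end

# Pull the emitted values out of a CouchDB view response body.
# A view response is a JSON object with a "rows" array of key/value pairs.
def translations_from(body)
  JSON.parse(body).fetch("rows", []).map { |row| row["value"] }
end
```

With a running instance, something like `translations_from(Net::HTTP.get(URI(view_url("cat"))))` (after `require "net/http"`) would fetch and unpack the rows. What the view actually emits as its value depends entirely on the map function behind it, so treat the shape of the result as an assumption too.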

Now I find myself wanting to put a nice front-end on this. The current Flickr mash-up is already very interesting (and, it turns out, solves the problem of cross-language information retrieval for a small subset of Flickr), but I’m sure others will have much more useful ideas about what to do with this data. My colleague Robert Rees has helped put together a hackfest here in the ThoughtWorks UK office, together with the nice folks from the London JavaScript Meetup group. Come join us on 12 November!

If you just want to get to the code (be forewarned it is ugly!), it’s on GitHub.