Calais

No, not the refugee city in northern France, but Calais, a content tagging web-service. I couldn’t quite get my head around it to begin with, but once it clicked, its status as a truly great idea was cemented (in my head, at least).
The idea is that you can send a piece of content (up to 100,000 characters) to Calais (via a SOAP or POST call to their web-service API) and it’ll analyse the text you’ve sent and return a list of metadata that it thinks is appropriate to the content. It’s fairly easy to imagine how it can do this, with access to common vocabularies, archives of content and matching metadata, and some neat search algorithms, but it’s such a stunningly simple and useful service that I’m amazed no-one else has done this before.
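For a rough idea of what the sending side might look like, here’s a minimal Python sketch that builds (but doesn’t send) a POST request. The endpoint URL and the form-field names are placeholders I’ve made up, not the real Calais API, so check their documentation before borrowing any of it. The only detail taken from the description above is the 100,000-character limit.

```python
from urllib.parse import urlencode
from urllib.request import Request

MAX_CHARS = 100_000  # content limit mentioned above

# Hypothetical endpoint: a placeholder, NOT the real Calais URL.
CALAIS_URL = "https://calais.example.invalid/enrich"


def build_calais_request(content: str, api_key: str) -> Request:
    """Build (but do not send) a POST request carrying the content."""
    if len(content) > MAX_CHARS:
        raise ValueError(
            f"content is {len(content)} characters; the limit is {MAX_CHARS}"
        )
    # The field names 'api_key' and 'content' are assumptions for illustration.
    body = urlencode({"api_key": api_key, "content": content}).encode("utf-8")
    return Request(
        CALAIS_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_calais_request("Calais is a content tagging web-service.", "MY-KEY")
print(request.get_method(), len(request.data), "bytes")
```

Checking the length before building the request means an over-long article fails fast on your side rather than after a round trip to the service.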
The metadata that the web-service returns is grouped into ‘Entities’ such as City, Company, Country, Fact (so for instance in the text of this post, ‘France’ would be extracted as a Country Entity, as it’s mentioned in the first sentence) and also ‘Events & Facts’ - which are a little more complicated, but so incredibly clever it would be criminal of me to ignore them. Let us imagine the following text was being analysed:
SmartForce (Nasdaq:SMTF), the world’s largest e-Learning company, today announced an agreement with DigitalMed, a subsidiary of Tenet Healthcare Corporation (NYSE:THC), to provide e-Learning for the health care industry.
Calais would extract a BusinessRelation ‘Event’ that had the following information:
- Company = SmartForce
- Company = DigitalMed
- Status = announced
Clever, eh? How about this:
Electronic component developer Panja Inc. said on Tuesday it has appointed Berry Cash as chairman of the company.
You guessed it, Calais would extract:
- Company = Panja Inc.
- Person = Berry Cash
- Position = chairman
- Action = enters
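To make the two examples a bit more concrete, here’s one way the extracted events could be held in Python once you’ve got them back. The `Event` class and its layout are my own invention for illustration, and Calais’s actual response format will almost certainly look different.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical container for one extracted event (not Calais's real schema)."""

    kind: str
    # (role, value) pairs; a list, because a role such as Company can repeat.
    attributes: list[tuple[str, str]] = field(default_factory=list)

    def values(self, role: str) -> list[str]:
        """Return every value recorded under the given role, in order."""
        return [value for r, value in self.attributes if r == role]


business_relation = Event(
    "BusinessRelation",
    [("Company", "SmartForce"), ("Company", "DigitalMed"), ("Status", "announced")],
)

# The second example's event type isn't named above, so "Appointment"
# is just a placeholder label.
appointment = Event(
    "Appointment",
    [
        ("Company", "Panja Inc."),
        ("Person", "Berry Cash"),
        ("Position", "chairman"),
        ("Action", "enters"),
    ],
)

print(business_relation.values("Company"))  # ['SmartForce', 'DigitalMed']
print(appointment.values("Person"))         # ['Berry Cash']
```

Storing the attributes as pairs rather than a dictionary matters here, because the BusinessRelation example has two Company values and a plain dictionary would silently keep only one.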
And it doesn’t stop at business stuff: it’ll also extract quotations, personal biographical information, family relationships and more (have a look at the page on Calais Semantic Metadata for some ideas).
It takes about five seconds to start visualising how this kind of service could be used - from something as simple as generating lists of ‘suggested tags’ for content created in a CMS or web-application, to a news aggregator that only deals in ‘hard facts’ - and we’ll definitely start playing with it here at Cimex sooner rather than later. Watch this space.