Online ‘09 – Recover and Discover

4 12 2009

This week, I’ve been at the ‘Online 09’ event in London, during the course of which I was able to present a sort of summary of some of the topics that I’ve been covering this year on this very blog. Indeed the subject itself, ‘Recover & Discover – Matching Search Modes To Users’, was one that I wrote a great deal about earlier in the year and is covered in some detail in the ‘Search – The Trilogy et al’ on the site.

However, I worked on the basis that presenting on the last afternoon of a busy 3 day event, people were going to be pretty fed-up with yet another burst of Powerpoint ‘Smart Art’ and in a decision that I may have cause to regret, I decided to hand draw my slides. A weekend on the sofa with pencils, felt tips and a sketch pad produced the presentation embedded below.

For those of you who attended and want to review, or were unable to attend and would like to ‘entertain yourselves’ (very loosely) by working your way through the deck, I’ve written a sort of ‘slide-by-slide’ breakdown of what was discussed during the talk.


Introduction (over slide 1)

One of the nice things about discussing topics like search and search methodologies is that there are no right answers (in fact several of the things I’ll tell you in the presentation are 100% false and, I hope, obviously so).

There is a great deal of debate – some well-reasoned, some not – on the subject, but nobody has the right answers. Not even the big players. They might have suggestions that help them make a great deal of money, allow them to wield a great deal of influence, even create their own verbs that become part of the lingua franca of the internet. Yet it is arguable that whilst search has continued to iterate better and better solutions, it has yet to begin to meet the demands of the ever-changing demographic of the medium.

How has that changed ? Let’s take a small step back in time. Which, like all small steps back in WWW history, is in fact a pretty big one.

1994 (slides 2-5)

1994. Why 1994 ? Well it was a pretty important year. For those of you too young (or indeed too old) to remember it clearly, here is a refresher.

This was a year in the era before broadband, before even the home-user 56k modem. Before Google, before eBay, before Internet Explorer, Netscape and the browser wars.

What we did have was the most symmetrical ‘World Cup’ ever in the USA in the summer. It began with Diana Ross missing a penalty during the opening ceremony, upon which – for reasons that baffled me at the time and still baffle me now – the goal exploded. The tournament ended with a strangely hirsute Roberto Baggio sending the last penalty of the tournament into sub-space orbit, rather than anywhere near Taffarel in the Brazilian goal.

Also, 1994 was apparently nominated by the United Nations as being the ‘International Year Of The Family’. This presumably meant that all families that existed prior to 1994 were declared as ‘Beta’ versions and could not rely on ongoing support for upgrades. Or of course, maybe not.

Most importantly, there were two major events in the history of the internet.

Firstly there was ‘The Superhighway Summit’ at UCLA, where Al Gore as keynote speaker is alleged to have coined the phrase ‘The Information Superhighway’. A phrase that, in the years that followed, many people kind of wished he hadn’t, as it, along with the term ‘surfing’, was a highlight of many a late 90’s wince-inducing vapour-ware presentation.

Secondly – and perhaps not as well known – was that 1994 was also the year that I first used the internet; specifically the WWW, to solve an office argument about ‘The Dukes Of Hazzard’ (whether the doors of the ‘General Lee’ were welded shut, or whether the Dukes were just too plain lazy to open them).

I was at the time employed by a large computer company and sat on my desk was a monstrously large IBM 3270 terminal, on which I had discovered a few months before, you could access a Lynx-like text browser and through a combination of ‘F keys’ it was possible to enter URLs and see a textual representation of the HTML. This, in combination with ‘Webcrawler’ – the first keyword search tool – allowed me to peer into the ether and find information (not always about bad 1970’s TV shows I hasten to add).

Back then, every search was a discovery. You didn’t know what was out there, and there were no really reliable sources on which you could rely. Finding something was an iterative process of filtering results until you either found what you were looking for, or hit an informational dead end.

2009 (slides 6-10)

Obviously, by comparison 2009 isn’t just a step forward of 15 years. The internet has a ‘cat-like’ approach to aging, in that a single additional year in human terms equals a great deal more in its evolution. It might as well be another planet by comparison to 1994.

In 2009 it is now law that the iPhone must be mentioned in all presentations, regardless of whether it is actually relevant to the discussion or not. This device apparently is due to ‘….solve all world problems by Q3 2010…’, which is obviously something to look forward to for all of us.

2009 is also the year where a major constitutional crisis was narrowly averted after it was alleged that the Prime Minister wasn’t easily able to identify his favourite biscuit. Who knew ultimately that politics could be distilled down to such simple matters of snack-food preference ? Maybe the 2010 election could end up being fought over the key battleground of whether ‘Monster Munch’ are smaller than they used to be, or whether it just seems that way ?

Back on topic, the magic of being able to keyword search has long since passed. Webcrawler begat Lycos, which begat Altavista, Hotbot, Metacrawler, Dogpile, Ask….. Google. The familiarity that we now have with search tools is such that we don’t really think about search tools. Looking at the data from Google Insights, for example, shows that at a high level.

On the day I was sitting down to write this presentation, I checked the previous 7 days ‘Top Searches’ and ‘Rising Searches’. I then compared them to the same set I had recorded when I was writing my original ‘Recover’ search piece back a few months before. The ‘Top Searches’ (which more accurately could be described as ‘Top Search Terms’) were more or less the same. Whilst the ‘Rising Searches’ were of course different, the actual types of term were more-or-less the same.

  • ‘Top Searches’ – existing web properties
  • ‘Rising Searches’ – known entities (mainly ‘people’, but also ‘places’ and ‘organisations’)

Taking ‘Top Searches’, what does this tell us about how people are using Google ? They are – in this simple example – not using Google to discover whether something exists, but rather to recover information that they already know is there. At a more extreme level it would suggest that search is even used as a form of, or replacement for, bookmarks (hence why if you drill down into the term ‘facebook’, you quickly get to the detailed term ‘facebook login’). We all knew that search queries were dead; now it seems that basic browser functionality is to follow: bookmarks and the address bar, all replaced by the search box.

Not only that, but by implication, users also don’t trust the internal search – site search – at the given locations to give them what they want. Drill down, though, into the ‘News and Current Affairs’ category and look at just how many searches are the combination of known news providers and entity keywords. Google does not care where it sends them, even if the initial intention was to search a specific provider (assuming that the search isn’t using ‘site:[domain]’, which I guess is probably a given).

‘Who Searches For What ?’ (slides 11-16)

It’s obvious that a tool like Google Insights is going to be somewhat skewed towards the mainstream consumer when you look at the data in such a high-level way. It is important, though, not to consider the ‘Recover’ mode as just something that exists in some lumpen-consumer way, because we are, as users of the web, far more sophisticated.

So here’s me in work-mode. You can tell I’m in work mode because I am almost smart and I have my hair tucked behind my ears in an attempt to look less scruffy than I actually am. In the course of my work, I will do a fair few ‘recover’ searches;

  • Checking on the company shareprice.
  • Placing a postcode onto a map so I know where a meeting is taking place.
  • Pulling up a quick bio on someone I’m meeting.

There I am in casual, non-work mode. Similarly speaking, I will also in my own time conduct searches of this type;

  • Checking the latest football scores (which as a Southampton fan, is something I do through gritted-teeth).
  • Placing a postcode onto a map so I know where I’m meeting my friends.
  • Pulling up some information on a celeb, so I know what everybody else will be discussing when we meet.

Back in work-mode, I may well also perform more complex searches.

  • That place I found on the map earlier, what is the best way to travel there ? How long will that take ?
  • That person I was reading up about, is there any co-occurrence in other articles with this other person or this company/product ?

Whilst the initial search might have been something of a commodity hunt, now I’m performing a much more advanced sort of query. I’m trying to make comparisons and look for patterns. I do the same in casual-mode;

  • What is the best way for me to hook-up my iPod to speakers ? Headphone socket or via an offboard DAC of some sort ?
  • On a 32″ LCD, do I really need 1080p or 720p support for video ?

We can broadly map these two types of query against the ‘Recover and Discover’ model. These models are not there to characterise a user, but to characterise the sort of modes that we all use to get hold of the information that we are after.

‘Recover’ is much the simpler of the two and could almost be said to be a commodity. A quick, basic single-word or phrase search and a user will most likely jump into the source that they know will fulfil their requirements. Its simplicity doesn’t alter the value of the search, just the complexity of result it takes to fulfil the need.

To be able to just deal with one of these modes and not the other excludes the usefulness of the content we might have, from being found by the very people who actually want to find it. But what is it that they want ?

‘What Users Want’ (slides 17-23)

As I said right at the top, there’s very little in the way of facts in this area of debate. And I’m about to contribute further to the weight of conjecture when I say;

‘Users don’t want search, they want find’

What do I mean by this wantonly facile statement ? I mean this; 15 years ago we gave users their first keyword search experience to be able to type in what it was that they were looking for. We then got cleverer and cleverer in what you could do in this text box, how you could construct clever queries which narrowed the results by site or inclusion/exclusion. We developed natural language searching where you asked rather than queried. And then, after all of this, where did our cleverness get us ? Right back to where we started. To keywords.

But that’s not to say that keywords mean that we cannot use well-proven types of search ‘abstraction’ to meet users’ needs. We are, if nothing else, the experts in our own content. We know what is important about it, how it relates to other content we also produce and where the commonalities are between them.

The 30k+ volunteers that maintain articles on Wikipedia do this on a manual basis, building the linkages between articles that are related, which if we only looked at those in isolation would give us a pretty good summary of what the most important (and valuable) elements to each article actually are. Not many of us however can count on a volunteer army of enthusiasts to perform that job for us.

If we sit down with a piece of content that we are familiar with, we can fairly quickly pick out the most important terms and add them as metadata to the original document, helping search tools to make better judgements about what is and is not truly relevant to the user’s attempts to find. Of course, that’s not a suitable solution for people who publish large amounts of content. Luckily, there are some excellent fully or semi-automated tools out there that will be able to do that for us, regardless of how numerous or complex our content is.
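To make the idea of automatic metadata a little more concrete, here is a deliberately naive sketch in Python. It simply counts the most frequent non-trivial words in a piece of text; real tools go much further (entities, concepts, disambiguation). The function name `suggest_metadata`, the stop-word list and the sample article are all made up for illustration.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list; a real tool would use a far larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for",
              "on", "by", "with", "that", "it", "as", "at", "was"}

def suggest_metadata(text, limit=3):
    """Return the most frequent non-stop-words as candidate metadata terms."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [term for term, _ in counts.most_common(limit)]

# Hypothetical sample content.
article = ("Southampton beat Portsmouth in the derby. "
           "Southampton scored twice, and the derby crowd saw Portsmouth miss a penalty.")
print(suggest_metadata(article))
```

Even something this crude picks out the two clubs and the occasion, which is roughly what a human editor would have tagged by hand.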

By turning each of these metadata elements into an inline link, we can start to second-guess where that user might want to go to next. What started as a simple ‘Recover’ search might actually be the beginning of a more complex traverse through your content (like the travel example I gave earlier when I was in ‘Work Mode’).

Thinking again about ‘Recover’ we can start to think about the output of search not as a set of results, but as information that enlightens a visitor about a specific ‘Topic’. Taking the input from our above inline tags, we might choose not to link to a specific piece of content, but rather to a ‘Topic Page’, which could automatically be produced on-the-fly via a search tool dressed with an SEO-friendly permalink. As we can produce these pages dynamically and on any subject that we have established that we have relevant content concerning, we can almost do this to n degrees; our only limit is the depth of our content and the depth of the associated metadata.
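As a rough sketch of how inline tags might point at topic pages, the Python below wraps the first mention of each metadata term in a link to a permalink. The `/topics/` URL scheme and both function names are assumptions for the sake of the example, not a description of any particular product.

```python
import re

def topic_permalink(term):
    """Build an SEO-friendly permalink, e.g. 'Al Gore' -> '/topics/al-gore'."""
    slug = re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")
    return f"/topics/{slug}"

def link_terms(text, terms):
    """Wrap the first occurrence of each term in a link to its topic page."""
    if not terms:
        return text
    # Longest terms first so that a longer phrase wins over a shorter overlap.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")
    seen = set()

    def replace(match):
        term = match.group(1)
        if term in seen:          # only link the first mention
            return term
        seen.add(term)
        return f'<a href="{topic_permalink(term)}">{term}</a>'

    # A single pass means the inserted links are never re-scanned for terms.
    return pattern.sub(replace, text)

print(link_terms("Al Gore spoke at UCLA. Al Gore left.", ["Al Gore", "UCLA"]))
```

The page behind each permalink would then be nothing more than a saved search on that term, rendered as a page rather than a list of results.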

In ‘Discover’, we are actually more familiar with the concept of ‘faceted search’ than we are probably aware. When we use eBay to search for products, we’re at a beginning point of knowing some elements of what we’re looking for, but whether they exist for sale is something that we do not know. Starting with a basic keyword search, we are presented back a list of potential candidates to match our first search, with the ability to refine this by the categories under which each item is listed.

These categories and keywords are manually entered by sellers when they list an item (via a system of drop-down menus and some basic automatic keyword extraction), and as such rely to a great degree on the accuracy and honesty of the human being creating the listing. The results are what you’d expect from such an approach; far better than without any metadata, but massively inconsistent. An inconsistency which sometimes manages to hide exactly what you’re looking for (whilst also creating a bargain on some poorly listed items for those of us who try to second guess common listing mistakes).

Newssift, a business news aggregation service from the Financial Times’ ‘FT Search’ unit [disclosure : a customer of Nstein], uses faceted search techniques to allow users to browse articles by selecting topics, companies, locations (and combinations of all of those) to quickly refine their areas of interest. As you drill down through individual topics for example, the associated other facets are dynamically redrawn to only show you those which co-occur with what you have selected. It’s a neat solution to assist users to be able to find the content that meets their specific needs via discovery, without them having to make frequent repeated manual queries to refine their requirements.
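The ‘redrawing’ of facets is easier to see in code than in prose. The sketch below is a minimal Python illustration of the general technique, not of how Newssift itself is built; the articles, facet names and functions are all invented for the example.

```python
from collections import Counter

# Hypothetical articles with metadata already extracted.
ARTICLES = [
    {"title": "Bank posts record profit", "topic": "Banking",
     "company": "Acme Bank", "location": "London"},
    {"title": "Bank opens New York office", "topic": "Banking",
     "company": "Acme Bank", "location": "New York"},
    {"title": "Retailer cuts prices", "topic": "Retail",
     "company": "ShopCo", "location": "London"},
]

def refine(articles, selected):
    """Keep only the articles that match every selected facet value."""
    return [a for a in articles
            if all(a.get(facet) == value for facet, value in selected.items())]

def facet_counts(articles, facets=("topic", "company", "location")):
    """Count facet values in the current result set only."""
    return {facet: Counter(a[facet] for a in articles) for facet in facets}

results = refine(ARTICLES, {"topic": "Banking"})
facets = facet_counts(results)
print(facets["company"])
print(facets["location"])
```

Because the counts are rebuilt from the refined results, ‘ShopCo’ simply disappears from the company facet once ‘Banking’ is selected; the user is only ever offered refinements that will actually return something.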

Finally, we’ve just launched a new product; S3, our ‘Semantic Site Search’, which incorporates many if not most of the aforementioned technologies, including faceted search, dynamic topic page creation and the all-important automatic metadata creation that powers both outputs. Although I would of course say so, it’s well worth a look.

Matt Mullen is an Industry Consultant at Nstein Technologies [http://www.nstein.com]





PubBiz – The Velvet Rope

19 11 2009

There is a pub in the city centre, not far from where I live. Until recently it was part of a chain of Australian-themed establishments, but upon arriving home from a recent trip, whilst sat in a taxi, I saw it had undergone something of a revamp.

Outside there were swirling searchlights illuminating the exterior, which had been painted with purple highlights. Guarding the outside were a couple of dinner-jacketed doormen, who were patrolling the entrance via the means of opening and closing a small gate, across which was a purple velvet rope. In short, the place was giving the impression that rather than being a city centre vertical-drinking establishment, this was something a bit exclusive. It wasn’t just a pub anymore. It had the air of a film premiere or a swanky private party, except in the high street of my drizzle-stained home city.

On seeing me staring at the place whilst we waited for the traffic lights to change, the taxi driver proffered his opinion…

‘…yeah it looks all fancy on the outside, but you go in there and apparently it’s exactly the same. Well, I mean it’s painted purple and there are new carpets. But it’s still just a pub. Except you have to pay to get in now….’

Last time – in my usual ‘long answer to a short question’ manner – I talked about tension between niche and mass market products, using as my example the rapid growth that the mobile phone market had undergone once it offered ‘Pay As You Go’ to supplement the contract model.

Additionally, I started to talk about how not all content is equal; that market specific material, such as that produced by business publications, could be more simply supported by an ‘All or Nothing’ payment model, as it was likely to be supported by a commercial value judgement by its users. What is interesting is that, given that, publications such as The Financial Times have not closed the drawbridge entirely, and offer a small amount of material for free every day for non-subscribers.

Indeed the financial-data market is one that has always interested me when it comes to charging models. I do keep half an eye on things like the FTSE via the BBC News site, which updates me for free on a 15 minute delay from the live feed. Why 15 minutes ? Why not 5 minutes or as-live ? Presumably this is because the value of the data to the sort of financial industry professionals who would pay for a live feed (via services offered by FT, Bloomberg et al) has pretty much expired by then and is only useful for amateur market watchers like me. It no longer has any real commercial value.

In short, volatile content has a distinct shelf life, and in this case quite a brief one.

It’s possible to extrapolate this out across all sorts of factual news data. It doesn’t matter whether we’re talking about stock prices or the latest football scores, there is a very short period where that data is valuable, sometimes mere seconds. After that, you might as well give it away for free in an attempt to support customer acquisition activities.

News content similarly has a shelf-life, but this is less easy to measure.

Taking my local paper as an example, by the time that the first edition has hit the news stands at lunchtime, a small proportion of the stories have already been published to its website. However, these are usually truncated versions of a single paragraph (‘….for the full story, read today’s edition of the paper…’), which is for me at least enormously frustrating. Especially when come the following day, the same caveat still cuts across the stories, whilst the paper edition fills the recycling bins of the city. Whilst the shelf-life isn’t anywhere near the brevity of the FTSE, by the mid-evening of publication day, the sales cycle for the physical paper has ended.

Not all content is equal in terms of its value, both regarding its target-market and its consumption model. Using the example of market data, we could say for example;

FTSE : Current Data – 15 minutes
FTSE : Market Closing Data – 6 Hours
FTSE : Market Analysis – 12 Hours

or for football data;

Match : Latest Score – 15 minutes
Match : Final Score – 1 Hour
Match : Match Analysis – 48 Hours

Ok, I’m really applying my own subjective opinion to the timings and the values, but for me it is the analytical content which is where the valuable content resides. Not only is it more valuable in itself, but it retains its value for longer and is unique to its owner. There may be similarities to its peers in other publications, but at a detailed level, only one publication possesses it.
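For what it’s worth, the shelf-life idea can be written down as a simple rule. The Python sketch below uses the (entirely subjective) timings listed above; the content type names and the `is_chargeable` function are hypothetical, and a real system would obviously need something more nuanced than a hard cut-off.

```python
from datetime import datetime, timedelta

# Shelf-lives taken from the subjective figures above.
SHELF_LIFE = {
    "ftse_current": timedelta(minutes=15),
    "ftse_closing": timedelta(hours=6),
    "ftse_analysis": timedelta(hours=12),
    "match_latest_score": timedelta(minutes=15),
    "match_final_score": timedelta(hours=1),
    "match_analysis": timedelta(hours=48),
}

def is_chargeable(content_type, published, now):
    """True while an item is still inside its shelf-life; after that, give it away."""
    return now - published < SHELF_LIFE[content_type]

published = datetime(2009, 11, 19, 12, 0)
print(is_chargeable("ftse_current", published, published + timedelta(minutes=20)))
print(is_chargeable("match_analysis", published, published + timedelta(hours=24)))
```

The live price has expired after twenty minutes, whereas the match analysis is still worth charging for a day later.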

I think that you’ve probably gathered that I’m not entirely convinced by the ‘All or Nothing’ model for content charging. Nor for that matter am I especially enamoured by much of what I have read thus far about its much lauded alternative, ‘Micropayments’. However, that is less about the principle – PAYG, as I previously stated, is one in which I see some potential – but more about the ways in which its application is being discussed, which seem to lack the flexibility to have the long term sustainability required.

  • What is the physical payment model ?

In order for Micropayments to work, there needs to be a simple method for paying for content, yet other than the offer from Google – hardly considered the best friend of the industry – the likely offering seems to be a point-to-point payment mechanism between the reader and the publisher. Read multiple publications, then you need multiple accounts. Want to read something on the spur of the moment ? First you must sign up and create an account. Want to read in an anonymous way ? No chance, you need to be a verified user.

  • What is the ideal ‘Pay Per x’ ?

Pay per click ? Pay per day ? Pay per week ? The simpler the model, the more it creeps into the territory of a subscription and less of PAYG. Even the former – Pay per click – assumes that all content is equal within an edition, and assumes that an accurate value buying decision can be made upon the article abstract. If that action is going to cost me x, then I’ll be unlikely to tolerate inaccuracy in the data that trails what you wish me to purchase.

  • Ties users to a single platform for consumption.

I know many people hate the idea of describing journalistic output as ‘content’ (reading the ever excellent John Naughton in this weekend’s ‘The Observer’ reminds me that the term is often pronounced through gritted teeth). However, in the previous part of this rambling series, I mentioned how the ‘product’ has become too intertwined with the medium of consumption.

Historically this of course makes sense. There was the paper output. Just that. If you wanted to read the words, that was the only way you could do it. When the web first arrived it was seen as a by-product of the printed work; the printed word made electronic, but very much secondary to its established older sibling.

I do wonder whether, once the industry starts to introduce (or re-introduce) a charging model, there will be a reluctance to be similarly tied down to a single method of reading. If I’m paying, why can I not read that content in paper, web, mobile, e-reader and most vitally, a combination of all of the above ? If I buy the paper in the morning, shouldn’t that also allow me to read that same content on the web over my lunch ? Is this a simple code printed on the paper that allows access to that day’s electronic version ? If so, then surely the reverse must be the case, allowing those who paid for PAYG access to the electronic version to claim a free paper version ?

In Naughton’s column, he states that the industry ‘…must turn Fleet Street into Quality Street…’ if people are to pay for content. This is what life is like if you decide to raise the Velvet Rope. You create an expectation that beyond it, a promised land of previously unseen quality awaits. Looking back up at the idea of shelf-life, it is the analysis, the real unique journalistic content that defines the difference between something which will and will not be worthy of people’s cash.

If inside it’s the same old establishment with a lick of paint and a few fancy light fittings, then people will not hang around long. The pub down the street might not look as fancy, but the beer tastes just the same.

Next Time : Enough of the theory, how could this actually work ? And how as consumers are we going to work with it ? Keep your mobile phone handy, you’re going to need it.

Matt Mullen is an Industry Consultant at Nstein Technologies [http://www.nstein.com]





PubBiz – Charging Ahead

13 11 2009

Whilst I was sat on the train last Sunday – heading off on my travels again – I did what I do every weekend and read a newspaper. The weekends are generally the only time I can guarantee to have the time to read the physical product, something which I imagine is recognised in the nature of the extended weekend editions.

Usually this is a ritual conducted on the sofa with a cup of coffee and a few Hob Nobs (plain chocolate only, please), but even though travel interrupted my normal way of enjoying the process, I tried my best to keep to the routine (coffee in paper cup and KitKat, papers spread across two seats).

As we’re in the last few weeks of the first decade of this century, we’re already beginning to see articles looking to try and see how we can sum up the most portentous elements of the past ten years. My weekend paper contained something trying to do just this for technology. What was the technological breakthrough of the decade ? 10 responses, virtually all name-checked the iPhone.

Whilst I grumbled to myself about this homogeneity of response (and the fact all these articles only ever seem to ask people who only have a collective memory of the last thing that they thought was ‘cool’), I realised that what I really wanted the question to be was ‘What was the greatest technology enabler of the decade ?’. Yes, I realise that people would still say ‘iPhone’ (‘Need a review of the decade ? There’s an app for that’).

In the unlikely event of anyone asking me to answer my own question, I decided that I would answer ‘Pay As You Go’ mobile phones. Which it turns out would have been just as incorrect, as the model dates back to Eircom’s first bash at it in 1997. So, I’m nowhere near as clever as I thought I was. And actually, what I was suggesting was that the most exciting technology I could think of was in essence a pricing/billing model and not actually technology at all. So equally, I’m actually far more dull than I thought I was too.

The reason why I suspect that PAYG was in my mind at the time, was that I have increasingly come to the conclusion that the current cyclical debate about the future of publishing – and newspapers in particular – is not one of purpose or content, but rather consumption and payment. I don’t believe that there is any less appetite for journalism, any less desire from people for news or analysis, the critical issue is more about the business model for its provision. Or rather that there is still a deal of confusion between the method of producing ‘the product’ and the model for the distribution and consumption of that product.

So, why mobile phones ? In the UK, with a population of around 65m, we own an estimated 90m mobile phones. Going back just over a decade, the only way to get hold of a mobile phone was to enter into a contract with a telecoms provider. For this you needed to be credit worthy (providing the usual 97 different forms of ID and at least one vital organ) which excluded a sizeable number of the population for all sorts of reasons. As a result, the prices of handsets and talktime were still prohibitively high, as the infrastructure costs were shared by a relatively small number of subscribers. Sure, the companies were profitable (go back and see the returns that Vodafone were delivering back then), but how could they extend the user base beyond the high value – predominantly business – subscribers ?

PAYG was hugely successful. For a significant part of the early decade, it propped up many a high street chain and brought many new players into the market. A decade on, it is beginning to significantly undermine the fixed line business; indeed when we see 2-5Mb speeds delivered by ‘next gen’ mobile internet, the last USP for the fixed line – namely ADSL provision – might be gone forever. [Update - As a side topic, I might be overselling this somewhat]

The ‘All Or Nothing’ model, offering services only to those who could make an ongoing financial commitment, was a good opening gambit for the mobile industry. They were able to sign up pretty much every person who had to make calls on the move and could justify a commercial reason for doing so. The PAYG model opened up the market to a massive secondary market; those who wanted, rather than needed, to make mobile calls. It’s niche (business) vs mass market (consumer).

To me it is somewhat ironic that when we look at the current debate about online charging for newspaper content, those who propose a blanket ‘All Or Nothing’ (subscription) approach generally use as a success model a specialist publisher – a financial newspaper or business magazine – to illustrate their point. And of course, within the narrow parameters of those examples, you can prove the point. The content that these publishers produce can support a long-term financial commitment from a user, as they are weighing this cost against the value that they can derive from the information they are being provided with.

Can the same really be said for mass-market publishers ?

Next Time : How practical are PAYG systems for publishers ? What is the model for what content can be charged for ? And how can we deliver that content flexibly enough to meet consumer needs ?

Matt Mullen is an Industry Consultant at Nstein Technologies [http://www.nstein.com]

Whilst I was sat on the train last Sunday – heading off on my travels again – I did what I do every weekendand read a newspaper. The weekends are generally the only time I can guarantee to have the time to read thephysical product, something I imagine which is recognised in the nature of the extended weekend editions.
Usually this is a ritual conducted on the sofa with a cup of coffee and a few Hob Nobs (plain chocolateonly, please), but even though travel interrupted my normal way of enjoying the process, I tried my best tokeep to the routine (coffee in paper cup and KitKat, papers spread across two seats).As we’re in the last few weeks of the first decade of this century, we’re already beginning to see articles

looking to try and see how we can sum up the most portentus elements of the past ten years. My weekend paper

contained something trying to do just this for technology. What was the technological breakthrough of the

decade ? 10 responses, virtually all name checked iPhone.

Whilst I gumbled to myself about this homogony of response (and the fact all these articles only ever seem

to ask people who only have a collective memory of the last thing that they thought was ‘cool’), I realised

that what I really wanted the question to be ‘What was the greatest technology enabler of the decade ?’.

Yes, I realise that people would still say ‘iPhone’ (‘Need a review of the decade ? There’s an app for

that’).

In the unlikely event of anyone asking me to answer my own question, I decided that I would answer ‘Pay As

You Go’ mobile phones. Which it turns out would have been just as incorrect, as the model dates back to

Eircom’s first bash at it in 1997. So, I’m nowhere near as clever as I thought I was. And actually, what I

was suggesting was that the most exciting technology I could think of was in essence a pricing/billing model

and not actually technology at all. So equally, I’m actually far more dull than I thought I was too.

The reason why I suspect that PAYG was in my mind at the time, was that I have increasingly come to the conclusion that the current cyclical debate about the future of publishing – and newspapers in particular – is not one of purpose or content, but rather consumption and payment. I don’t believe that there is any less appetite for journalism, any less desire from people for news or analysis; the critical issue is more about the business model for its provision. Or rather that there is still a deal of confusion between the method of producing ‘the product’ and the model for the distribution and consumption of that product.

So, why mobile phones ? In the UK, with a population of around 65m, we own an estimated 90m mobile phones. Going back just over a decade, the only way to get hold of a mobile phone, was to enter into a contract with a telecoms provider. For this you needed to be credit worthy (providing the usual 97 different forms of ID and at least one vital organ) which excluded a sizeable number of the population for all sorts of reasons. As a result, handset and talktime prices were still prohibitively high, as the infrastructure costs were shared by a relatively small number of subscribers. Sure, the companies were profitable (go back and see the returns that Vodafone were delivering back then), but how could they extend the user base beyond the high value – predominantly business – subscribers ?

PAYG was hugely successful. For a significant part of the early decade, it propped up many a high street chain and brought many new players into the market. A decade on, it is beginning to significantly undermine the fixed line business; indeed when we see 20MB+ speeds delivered by ‘next gen’ mobile internet, the last USP for the fixed line – namely ADSL provision – might be gone forever.

The ‘All Or Nothing’ model, offering services only to those who could make an ongoing financial commitment, was a good opening gambit for the mobile industry. They were able to sign up pretty much every person who had to make calls on the move and could justify a commercial reason for doing so. The PAYG model opened up the market to a massive secondary market; those who wanted, rather than needed, to make mobile calls. It’s niche (business) vs mass market (consumer).

To me it is somewhat ironic that when we look at the current debate about newspaper subscriptions, those who propose a blanket ‘All Or Nothing’ approach generally use as a success model, a specialist publisher – a financial newspaper or business magazine – to illustrate their point. And within the narrow parameters of those examples, you can of course prove the point. The content that these publishers produce can support a long-term financial commitment from a user, as they are weighing this cost against the value that they can derive from the information they are being provided.

Can the same really be said for mass-market publishers ?





PubBiz – The Phone(y) War

12 11 2009

Over the last few months I have been dragging my poor old, battered, suitcase all over Europe. As those of you who do a fair bit of business travel will attest, this is a far from glamorous activity, but there are benefits.

  • I now have a pretty good working knowledge of the (excellent) Vienna Metro system.
  • I now know that there’s nowhere better to eat lunch than in Barcelona with knowledgeable locals.
  • I am now clear that not being in the Euro is an utter, utter pain when you travel backwards and forwards into the Euro zone.
  • I now have final proof that I should never allow myself to be photographed without a stylist present.
  • You’ve not had to suffer my ramblings on this blog for a while. Until now.

Along the way I’ve been lucky enough to spend a great deal of time with newspaper and magazine publishers all over Europe and listen to loads of knowledgeable speakers whilst making copious notes about how I’m going to pass off their ideas as my own when I get the opportunity.

The poor souls visiting the Ifra 'Beyond...' conference in Barcelona saw this on the welcome screen as they arrived. I can only apologise.

It doesn’t seem to matter where you are or to whom you speak, there is the same topic ‘du jour’ in almost every organisation; ‘How can we build readership, whilst still deriving a direct revenue benefit ?’ Almost as an afterthought, there is ‘How can we do all that whilst still delivering multi-channel ?’.

The challenge – and irony given the industry we’re in – is that this is a debate where there are no facts. No truly compelling case-histories, just conjecture thought-through to various levels of completeness. Nobody can claim any degree of victory yet, the spoils are yet to be divided, but already the casualties are mounting. Time is not on our side.

So I can claim my stake in that world of partially thought-through conjecture (free from facts but full of opinion), over the next 2 posts I’ll be discussing these issues, namely; charging and distribution models for publishing companies.

In this (short) series I’ll be discussing some topics that I at least think are pretty key, but seem to have been largely overlooked in the debate thus far;

  • How can we bring some subtlety to the idea of content paywalls ?
  • Can we really consider all content to be equal when we look to charge for it ?
  • How can we make this all work within consumption models that are already well-established and familiar to readers ?

Y’know, some of the answers might just lurk in the pockets of just about every one of us.

Matt Mullen is an Industry Consultant at Nstein Technologies [http://www.nstein.com]





Livin’IT – The ‘Retrophone’ Experiment

25 07 2009

Returning to your favourite things is potentially a dangerous game to play.

Speaking as someone with an obsession with collecting vinyl records, I’m often strangely compelled to dig out something from the racks that I’ve not played in years and give it a spin. Sometimes this provides proof of the quality of your recollection (Cookie Crew’s ‘Got To Keep On’ is still a fine tune), on other occasions reminds you that your subsequent experiences have rendered your memory inaccurate in the extreme (Overlord X’s ‘14 Days In May’ is not the masterpiece that I remembered it as being, lyrically worthy as it still is).

The last couple of months I’ve been travelling all over the UK (and touching down a couple of times in Southern Europe too) with work and my trusty HTC Tytn II Windows Mobile smartphone has done me proud. It might not exactly be cutting edge anymore – a feeling I know only too well – but despite the battering it has taken over the last two years, it’s remained about the most trusted piece of hardware I own.

We’ve been adjusting our corporate phone contracts recently, something that will mean that I would have to separate my personal number from a (new) work number, meaning that I would need a new phone for one of these two numbers. Now, aside from the fact that I have never understood why twin SIM handsets never became more than a briefly glimpsed niche product (as that would be ideal), I started to look at the alternatives.

I knew I needed my new work phone to do everything that my Tytn II could do as a bare minimum, and whilst browsing through the handsets available on our new network it occurred to me that there really wasn’t anything there which added anything new to the party.

Going back a few years, the first MS Mobile phones I used suggested the adage ‘…as a phone, they make a good PDA…’ and subsequent Blackberries only confirmed that experience. Today, things are somewhat better; MS Mobile is very usable and iPhone OS is gradually bridging the professional/consumer smartphone market with every release. Android looks like it will develop into something interesting, but support for Exchange seems limited right now to 3rd party apps (like Touchdown) and that is a primary requirement for any work device in my current job. In short – despite some interesting upcoming HTC handsets – there seemed no reason to migrate to anything new. So, unlock the Tytn II and swap the SIM. Problem solved…. sort of.

Now, I had the reverse issue. The existing SIM had to live somewhere. I’ve had the same mobile number since 1998 (and my original Motorola brick complete with a mighty 15 mins of talk time per month) and it is still the primary way in which people know they can reach me. After a similar browse through the consumer end of the market for something appropriate (most of which do seem to be hybrid MP3 players or cameras first, phones second), I decided to do something radical. Or rather, not radical, but regressive.  I’d take a step back and remove myself from the arms race. I’d go ‘Retrophone’.

So, on the first day of my two-week break from work, I put on my circa ‘99 pair of Levis ‘Engineered’ jeans and Adidas ‘Stan Smith Comfort’ (both of which had seen better days) and went to the post office to reunite myself with an old member of the family. One whose birth date matches the era of those (now battered) items of clothing.

The Ericsson T28 'Retrophone' charging on the kitchen worktop.

Y’see, recalling that the aforementioned Motorola was a complete dog of a device, I went for my 2nd ever handset, the mighty Ericsson T28. Actually, mighty isn’t exactly accurate, given its diminutive dimensions and weight (only 81g complete with the onboard Lithium Polymer battery, a first for a mass market device back then). A few minutes on eBay got me a reconditioned example, complete with a charger and two batteries for a whole £15 delivered.

Aside from the fact it took 48 hours for the package to arrive from Hong Kong to my local Royal Mail office, who then lost it for 3 weeks, the first thing that struck me was how little there was to the package. None of the ephemera that you get in a modern phone box (CDs, cables, headphones….), just a charger. You could at the time of launch buy a serial cable for attaching it to your PC, but in the spirit of ‘keeping it retro’, I didn’t have one first time around and I wasn’t going to have one this time either.

My experiment was going to be a simple one. Could I cope without all the smartphone functionality ? OK, I still would have all that for my work device, but this was a consumer test. How much of the smart stuff on the HTC would I miss paring my mobile computing down to the bare minimum ? No mobile web, no GPS, no keyboard or predictive texting, no desktop syncing.

5 days in and so far there are a few things that are immediately glaringly obvious;

  • I’d forgotten what a pain that little aerial was. Literally. In the aforementioned jeans (so far only trousers tested), I now recall the unerring ability for the aerial to lodge itself in my groin on each occasion I sit down with the phone in my pocket. I do appreciate why I had removed that from my consciousness. Minus 1 for ‘Retrophone’.
  • The battery life is spectacular. Even using the (obviously fake) battery supplied on a first charge, the ‘Retrophone’ was still well and truly alive after 5 days. Now, I know it’s not actually doing very much in comparison to that in the Tytn II which does tend to exhaust itself within 12 hours, but still….. Plus 1 for ‘Retrophone’.
  • Texting is not easy. This is a device from before the days of predictive texting (which in itself is still a consistent partial fail), so you are back to the days of hitting ‘2’ once for ‘A’ or ‘9’ four times for ‘Z’ (plus * if you want to change case). Additionally, you’re also dealing with single-deck messages limited to 160 characters…. so if someone sends you a multi-deck message, you get two separate messages that don’t always arrive in order. Mind you, I have managed to recall some of the limited texting skills I had back in the day and whilst I miss my soft keyboard, currently …. A score draw.

I’ve still got a full week of holiday before I return to work, so I’m hoping to shakedown any more obvious flaws in the next few days. Once we’re back into the working cycle, we’ll see how well ‘Retrophone’ copes alongside the Tytn II in the daily grind.

Stay tuned.

Matt Mullen is an Industry Consultant at Nstein Technologies [http://www.nstein.com]. He promises to keep mentions of his groin to the bare minimum in future. Apologies.





Orgdata – The Attributes of Football

15 06 2009

There are many things that define us as people, over which we have no control. Some of these are obviously decided at a genetic level; colour of your eyes, skin tone…. the fact that I had dead straight hair until I was about 15 and then it went inexplicably super-curly almost overnight. We’re just born this way and there is no way to fight it. Even with ‘Frizz Ease’.

For many of us, the same goes for sporting allegiance.

As soon as I was old enough to be able to pick out colour and shape, it was pretty clear what life had in store for me. I was dressed in red and white stripes and from that moment, I was to be a Saints fan. Getting on for 40 years later and save for a single fleeting day of glory back in the mid-70s that I can barely remember, I can almost count my football genes as being as unsuccessful as my skintone genes (10 minutes outside on a sunny day and I’m generally sporting burns of the same palette as one of those aforementioned stripes).

Right now in the UK, we’re supposed to be in the football ‘Close Season’. All the leagues are done for another year, silverware distributed and players off rapidly gaining weight on their summer holidays whilst keeping half an ear for the mobile call from their agent to alert them to a pre-season move elsewhere.

The reality of course is that as far as News goes, there is no such thing as a football close season. Arriving back at a London mainline station one afternoon this week to begin the final leg of my journey back to the coast, the giant TV News screen screamed the headline ‘£80m!’, the fee agreed for Cristiano Ronaldo’s transfer from Manchester United to Real Madrid. This, just days after the same buyers had agreed to pay A.C Milan £56m for the services of Kaka.

The next day, I watched a solitary Saints player walking to the ground – at the end of my street – for his preseason fitness tests. Tests which, unless something concrete changes in the next few weeks, might be somewhat redundant. Saints were forced into financial administration at the tail end of last season, an event triggered initially by exceeding a banking overdraft facility by £5k.

There are times when I forget how dense the information is surrounding specialist areas of knowledge and football is a perfect example. Growing up as a kid, we used to collect the football stickers produced by the Italian company ‘Panini’ and try and get complete collections of the players, grounds and badges of all the top teams stuck into our albums. Of course in the process you gained an increasingly detailed and somewhat arcane knowledge of the subject…. even now I don’t have to even think about these things, so ingrained are they in my consciousness.

Sometimes you’ll overhear a football conversation on a train. Someone will mention ‘City’. I can’t help myself wondering which ‘City’ they are talking about. Manchester ? Leicester ? Norwich ? Then you’ll hear something else that helps ‘… down at Dean Court…’. Ahh, ok so they’ve been to A.F.C. Bournemouth, which makes them much more likely to be Lincoln City fans. Or Chester City, Exeter City…. and it’s not until you turn to see them and see Maroon & Gold and know right away that it was actually Bradford City after all. They were Bantams*.

Last time, I talked about Geodata, adding descriptive information specifically related to ‘Places’ that might be found in text (for example geographic coordinates) and some of the opportunities that present themselves when you mix them cleverly into the user experience.

These additional bits of information we can call ‘Attributes’. For example, the city of Southampton, might look like this when described at a geographical level;

Southampton
<Latitude = “N 50° 54′ 0””>
<Longitude = “W 1° 24′ 0””>

That information in itself is enough to plot it onto a mapping application. However, there’s obviously more that we can add. For example, population.

Southampton
<Latitude = “N 50° 54′ 0””>
<Longitude = “W 1° 24′ 0””>
<Population = “246,201″>

Now we can plot this onto a map and also weight the location pin by the size of the city. Of course those vested in the subject will correctly recognise the rather facile nature of the above example and rightly point out that it is massively over simplified. Mapping information is something that is so well covered across the globe (for example in repositories like Geonames and by organisations like OS in the UK), that maintaining this sort of detailed data at a local level (‘Curating’) is just not necessary.

In his recent exemplary article, my colleague Chris Scott posed the question ‘Semantic Web ? What’s in it for me ?’ and whilst I don’t intend to retread what he describes in great detail, there is much in there that will help us here, as we’re beginning to make the journey towards the world of ‘Linked Data’.

What ‘Linked Data’ is beginning to do to a greater or lesser degree is to almost commoditise very high-level generic factual knowledge. Any of us can hook up applications to ‘The Cloud’ and get hold of ‘attribute’ information which will help us improve the user experience of our sites. All we need to do is to hold the linkage between us and it, the ‘Uniform Resource Identifier’ (‘URI’) and we can call the data whenever we need it.
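As a rough illustration of the attribute idea above, here is a minimal Python sketch of a ‘Place’ carrying its attributes, plus one way of weighting a map pin by population. The class, the `pin_radius` function and its square-root scaling rule are my own assumptions for the sake of the example (as is the made-up village), not how any particular mapping tool does it.

```python
import math
from dataclasses import dataclass


@dataclass
class Place:
    """A place entity plus the attributes described in the text."""
    name: str
    latitude: str    # sexagesimal string, as written in the example above
    longitude: str
    population: int


def pin_radius(place: Place, base: float = 2.0) -> float:
    """Hypothetical weighting rule: the radius grows with the square root
    of the population, so the pin's area grows in line with the population."""
    return base * math.sqrt(place.population / 10_000)


southampton = Place("Southampton", "N 50° 54′ 0″", "W 1° 24′ 0″", 246_201)
# Invented comparison point, purely for illustration.
village = Place("Example village", "N 51° 0′ 0″", "W 1° 0′ 0″", 2_500)

for place in (southampton, village):
    print(f"{place.name}: pin radius {pin_radius(place):.1f}")
```

The point is simply that once the attribute travels with the entity, the presentation layer can make decisions from it without anyone having to look anything up by hand.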

However, publishers hold another very precious thing within their organisations; their own specialist information, their own highly valuable ‘Knowledgebase’. For example, what does the average UK newspaper hold in terms of specialist data on football ? Far more than exists currently within recognised Cloud resources for sure.

Looking back at the early paragraphs of this post, it is packed with footballing information, both ‘entities’ (in this case ‘People’ and ‘Organisations’), but also what we could refer to as ‘attribute’ data of the entities themselves.

I’m a Saints fan. In that case ‘The Saints’ could be said to be a ‘Nickname’ attribute for the entity ‘Southampton Football Club’, the same way as ‘The Bantams’ is of ‘Bradford City Football Club’. When we can categorise an article as being about ‘Football/England’ and we identify ‘Saints’ as a term within the text, we are able to use the presence of that term to suggest that it is also about the parent term, even if that is not actually present in the text directly. This collection of terms, we can call ‘Orgdata’ and something like club nicknames is barely scratching the surface of the attribute data that can be described against an entity like a football club.
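The nickname idea can be sketched in a few lines of Python. This is a toy, assuming a tiny hand-curated knowledgebase and a naive word match; the `Organisation` class, `KNOWLEDGEBASE` list and `suggest_entities` function are hypothetical names of my own and bear no relation to how TME works internally.

```python
from dataclasses import dataclass, field


@dataclass
class Organisation:
    """An organisation entity with a category and a 'Nickname' attribute."""
    name: str
    category: str
    nicknames: set = field(default_factory=set)


# A tiny, locally curated knowledgebase (illustrative entries only).
KNOWLEDGEBASE = [
    Organisation("Southampton Football Club", "Football/England", {"saints"}),
    Organisation("Bradford City Football Club", "Football/England", {"bantams"}),
]


def suggest_entities(text: str, category: str) -> list:
    """Suggest parent entities whose nickname appears in the text,
    but only for organisations in the article's category."""
    words = {w.strip(".,;:!?'\"()").lower() for w in text.split()}
    return [
        org.name
        for org in KNOWLEDGEBASE
        if org.category == category and org.nicknames & words
    ]


article = "Saints were forced into financial administration last season."
print(suggest_entities(article, "Football/England"))
```

Note that the category is doing real work here: the same word in an article categorised under something other than football would suggest nothing at all, which is exactly the sort of disambiguation the overheard ‘City’ conversation needed.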

This presents publishers with both choices and opportunities.

By locally curating their own knowledge – and adding their own specialist terms and attributes to the extraction, normalisation and knowledge management capabilities of Text Mining tools such as Nstein’s TME – they are able to add additional flavour and richness to that data when they present it to web users. This not only helps build the overall user experience, but also helps enrich the actual content itself, greatly assisting the ability to package data for example, for syndication.

The opportunities of course do not end there. For many areas of specialist knowledge, there is not right now a ‘de facto’ trusted source, no Geonames or IMDB to refer to.

In the Semantic Web where Linked Data is an essential component part, is being that trusted source the next critical step towards making content pay its keep ?

Matt Mullen is an Industry Consultant at Nstein Technologies [http://www.nstein.com].

(* Yes, I realise you’d probably be able to disambiguate in that case by accent. Assuming you’re good at that sort of thing, natch)





Geodata – The Properties Of Property

7 06 2009

Every so often, as regular as the arrival of yet another UK non-Summer, comes news of another apparent government IT project ‘failure’.

Much in the same way as people seem to be greeting the current travails of the Newspaper industry with some sadistic relish, news that a few million has been spiffed on a grand project that has failed to do what it was supposed to, delivers a similar public response.

Everybody feigns surprise at the news at first and then has a good old moan about how all technology is useless, misuse of public funds and modern life being rubbish. And then promptly forgets about it until the next time it happens.

The first government IT project in this country was not exactly a roaring success either. The government saw a fantastic demo given by an enthusiastic entrepreneur, bought into the panacea described and dropped a fair wodge of public cash with a start-up, who promptly burned through the cash in a few months and delivered absolutely nothing back in return.

Another tale of modern life being rubbish ? Not really. This was 1822. The entrepreneur ? Charles Babbage. The product ? The Difference Engine.

When Microsoft by-lined Bing as ‘The Decision Engine’, I winced slightly with Babbage in mind. Last weekend, before it was formally available, I wrote of my hope that we might finally see a proper search ‘product’ that utilised some of the best practice you can find in the vertical search market. Then it launched. My shoulders slumped after a few minutes of play and I went back to hoping again.

‘Google Squared’ – another beta arrival from Google’s big house of endless betas – popped up in the second half of the week and was at least entertaining in its inability to handle fairly basic complex linguistic searches (especially if you add a geographical element to the search query).

This week’s crop of betas has not been all uninspiring though. From the folks at MySociety (and paid for by Channel 4’s 4ip investment fund) comes ‘Mapumental’. Whilst it’s in invite-only beta right now, there is an introductory video to watch whilst you wait for an invitation to turn up in your inbox.

The principle is pretty simple. Let’s say you’re considering a new job, simply enter the postcode of the location you will be working at (UK only I’m afraid) and by selecting a map location, it will tell you the estimated journey time by public transport (using public data).

The map itself can be manipulated in a couple of other ways. By setting a maximum time that you are prepared to drag yourself out of bed in the morning, the map will show you the places that you can realistically live and still be at your desk by 9am. You can then set a property value – based on the average sale price, itself public data again – and the map will further refine the areas showing you where you can afford to live.

Finally, to gild the lily further, you can set a final control called ‘Scenicness’, which can be set to show only those places remaining that have been scored as having varying degrees of ‘prettiness’ (via MySociety’s photo scoring site ‘Scenic’). Ok, this bit isn’t especially scientific, especially when the places voted as being most scenic tend to be those without actual houses or places of work. Still, nice idea in principle.
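Stripped of the map, the three controls described above amount to successive filters over the same set of areas. The Python sketch below shows that idea only; the `Area` fields, the `shortlist` function and every figure in the sample data are invented for illustration and are not taken from Mapumental or its data sources.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    commute_minutes: int   # estimated public transport journey time
    average_price: int     # average sale price, in pounds
    scenicness: float      # crowd-sourced score, 0 (grim) to 10 (lovely)


def shortlist(areas, max_minutes, max_price, min_scenicness):
    """Keep only the areas that pass all three slider settings."""
    return [
        area.name
        for area in areas
        if area.commute_minutes <= max_minutes
        and area.average_price <= max_price
        and area.scenicness >= min_scenicness
    ]


# Entirely made-up sample data.
areas = [
    Area("Area A", 35, 180_000, 6.5),
    Area("Area B", 50, 150_000, 8.0),   # too far
    Area("Area C", 25, 320_000, 7.0),   # too expensive
    Area("Area D", 40, 170_000, 3.0),   # not pretty enough
]

print(shortlist(areas, max_minutes=45, max_price=200_000, min_scenicness=5.0))
```

As with the real thing, the answer is only ever as good as the journey times and prices that feed it.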

The tool itself is a great deal of fun to play with and the results are a useful guide… however, if MySociety are suggesting that it is possible to commute into Central Southampton from Sandown, IOW in 2 hours by public transport, then I suggest that they’ve never attempted that journey (train, ferry, train) for real. However good the tool is, it’s still relying on the quality of the data that supplies the abstraction.

Given our national obsession with property and property prices, it is no wonder that some of the best vertical search tools of this type come from the UK. Globrix – a UK-based property search tool – is a good example of how using the detailed data that can be extracted from text can provide a high quality of user experience.

In my recent search trilogy I discussed the idea of ‘Aboutness’, basically the understanding of what information a piece of content contains, and how we can use that information (‘Metadata’) to drive the sort of user experiences that keep people on our sites longer. Globrix uses some of these ideas – for instance finding linguistic ‘concepts’ within the property detail text – and allows those to be used to refine search. Location information via Postcode drives a map abstraction of these results produced in real time. It’s an impressive effort.

Obviously this sort of mapping and the use of ‘geodata’ is key for organisations for whom property is their core business, but how can other content-heavy organisations like Newspapers use some of this technology to help develop their own user experience ?

News content is often heavy with geographical information. Many of the most persuasive ‘keywords’ – those terms we use intuitively to make our split-second decisions on whether to read or ignore – are those that tell us how ‘close’ this content is to us. This is not just the high level terms like country, but more distinct…. city, town, district, even landmark.

As previously discussed, manually adding relevant tags to content quickly becomes difficult when you are dealing with large scale operations. Add into that the requirements of geographic information and the task becomes even more daunting. What are these requirements ?

Well, with a normal tag, it is just the descriptive term itself that might be required (e.g. simple : ‘politics’ or complex : ‘council elections’). If we want to start utilising geographical tag data, then this needs to be far more distinct, especially if we want to start tying this together with mapping applications. Aside from the ‘disambiguation’ that we discussed before with reference to ‘People’ (but is equally valid with ‘Places’ as placenames are far from unique), to be able to map the stories we need to know where they actually are, with a great deal of precision.

In Nstein’s Text Mining Engine (TME), we automatically apply additional geodata to those ‘Places’ that TME automatically detects within text. This geodata is supplied in two forms:

- Co-ordinates : The traditional longitude/latitude ‘Sexagesimal’ information.
- WGS-84 : The same data as your car’s GPS / Sat Nav uses to build maps and that used by almost all online mapping applications.
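To make the sexagesimal form a little more concrete, here is a minimal Python sketch that turns a degrees/minutes/seconds string into the signed decimal degrees that online mapping applications generally expect. The function name and the input format it accepts (hemisphere letter first, then three numbers) are assumptions of mine for illustration; this is not TME output or code.

```python
import re


def sexagesimal_to_decimal(value: str) -> float:
    """Convert e.g. 'N 50° 54′ 0″' to signed decimal degrees.

    South and West come out negative, North and East positive.
    """
    hemisphere = value.strip()[0].upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"No hemisphere letter found in {value!r}")

    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", value)]
    if len(numbers) != 3:
        raise ValueError(f"Expected degrees, minutes and seconds in {value!r}")

    degrees, minutes, seconds = numbers
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if hemisphere in ("S", "W") else decimal


# Southampton, using the coordinates quoted in the Orgdata post.
print(sexagesimal_to_decimal("N 50° 54′ 0″"))
print(sexagesimal_to_decimal("W 1° 24′ 0″"))
```

Having both forms to hand saves every consuming application from repeating this sort of conversion for itself.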

Adding this information turns your ‘tags’ into ‘geotags’. Once you have this information, you can display your content not only by subject maps (like our previously discussed ‘Topic Pages’), but geographical maps. For a demonstration using Nstein’s WCM product, we built a simple widget using Google Maps, which showed editorial staff at a glance the geographical ‘Aboutness’ spread of their content, using the geotags generated by TME, displayed on a Google Maps globe.

Google Maps of course is just a start. For example if Google Latitude starts to get real user adoption  – the upcoming iPhone version will surely help that process – then ‘News content about where I am right now’ will be a viable option for mobile users. Ally this with (user opt-in) advertising content…. and there are some interesting applications on the horizon.
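‘News content about where I am right now’ is, underneath, a distance calculation between the reader’s position and each story’s geotag. The Python sketch below uses the standard haversine great-circle formula; the `stories_near` function, the sample headlines and the coordinates are all hypothetical, and a real service would of course take the reader’s position from the device rather than a hard-coded pair of numbers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stories_near(stories, lat, lon, radius_km):
    """Return headlines geotagged within radius_km, nearest first.

    `stories` is a list of (headline, latitude, longitude) tuples.
    """
    measured = [
        (haversine_km(lat, lon, s_lat, s_lon), headline)
        for headline, s_lat, s_lon in stories
    ]
    return [headline for distance, headline in sorted(measured)
            if distance <= radius_km]


# Invented geotagged stories.
stories = [
    ("Edinburgh story", 55.95, -3.19),
    ("Southampton story", 50.90, -1.40),
]

print(stories_near(stories, lat=50.91, lon=-1.40, radius_km=25))
```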

Matt Mullen is an Industry Consultant at Nstein Technologies [http://www.nstein.com].

Every so often, as regular as the arrival of the UK non-Summer, come news of another

supposed government IT project failure.

Much in the same way as people seem to be greeting the current travails of the Newspaper

industry with some sadistic relish, news that a few million has been spiffed on a grand

project that has failed to do what it was supposed to, delivers a similar public response.

Everybody feigns suprise at the news at first and then has a good old moan about how all

technology is useless, misuse of public funds and modern life being rubbish. And then

promptly forgets about it until the next time it happens.

The first government IT project in this country was not exactly a roaring success either.

The government saw a fantastic demo given by an enthuiastic entrapeneur, bought into the

panacea described and dropped a fair wodge of public cash with a start-up, who promptly

burned through the cash in a few months and delivered absolutely nothing back in return.

Another take of modern life being rubbish ? Not really. This was 1822. The entrepraneur ?

Charles Babbage. The product ? The Difference Engine.

When Microsoft bylined Bing as ‘The Decision Engine’, I winced slightly with Babbage in

mind. Last weekend, before is was formally available, I wrote of my hope that we might

finally see a proper search ‘product’ that utlised some of the best practise you can find

in the vertical search market. Then it launched. My shoulders slumped after a few minutes

of play and I went back to hoping again.

‘Google Squared’ – another beta arrival from the Google’s big house of endless betas -

popped up in the second half of the week and was at least entertaining in its inability to

handle fairly basic complex linguistic searches (especially if you add a geographical

element to the search query).

This weeks crop of betas has not been all uninspiring though. From the folks at MySociety

(and paid for by Channel 4’s 4i investment fund) comes ‘Mapumental’. Whilst it’s in

invite-only beta right now, there is an introductory video to watch whilst you wait for an

invitation to run up in your inbox.

The principle is pretty simple. Let’s say you’re considering a new job, simply enter the

postcode of the location you will be working at (UK only I’m afraid) and by selecting a

map location, it will tell you the estimated journey time by public transport (using

public data).

The map itself can be manipulated in a couple of other ways. By setting a maxiumum time

that you are prepared to drag yourself out of bed in the morning, the map will show you

the places that you can realistically live and still be at your desk by 9am. You can then

set a property value – based on the average sale price, itself public data again – and the

map will further refine the areas showing you where you can afford to live.

Finally, to gild the lily further, you can set a final control called ‘Scenicness’, which

can be set to show only those places remaining that have been scored as being having

varying degress of ‘prettiness’ (via the MySociety’s photo scoring site ‘Scenic’). Ok,

this bit isn’t especially scientific, especially when the places voted as being most

scenic tend to be those without actual houses or places of work. Still, nice idea in

principle.

The tool itself is a great deal of fun to play with and the results are a useful guide…

however, if MySociety are suggesting that it is possible to commute into Central

Southampton from Sandown, IOW in 2 hours by public transport, then I suggest that they’ve

never attempted that journey (train, ferry, train) for real. However good the tool is,

it’s still relying on the quality of the data that supplies the abstraction.

Given our national obessesion with property and property prices, it is no wonder that

some of the best vertical search tools of this type come from the UK. Looking at Globrix -

a UK-based property search tool – is a good example of how using the detailed data that

can be extracted from text can provide a high quality of user experience.

In my recent search trilogy I discussed the idea of ‘Aboutness’, basically the

understanding of what information a piece of content contains, and how we can use that

information (‘Metadata’) to drive the sort of user experiences that keep people on our

sites longer. Globrix uses some of these ideas – for instance finding linguistic

‘concepts’ within the property detail text – and allowing those to be used to refine

search. Location information via Postcode drives a map abstraction of these results

produced in real time. It’s an impressive effort.

Obviously this sort of mapping and the use of ‘geodata’ is key for organisations whose core business is property, but how can other content-heavy organisations like Newspapers use some of this technology to help develop their own user experience ?

News content is often heavy with geographical information. Many of the most persuasive ‘keywords’ – those terms we use intuitively to make our split-second decisions on whether to read or ignore – are those that tell us how ‘close’ this content is to us. This is not just the high level terms like country, but the more distinct…. city, town, district, even landmark.

As previously discussed, manually adding relevant tags to content quickly becomes difficult when you are dealing with large scale operations. Add into that the requirements of geographic information and the task becomes even more daunting. What are these requirements ?

Well, with a simple tag, it is just the descriptive term itself that might be required (e.g. simple : ‘politics’ or complex : ‘council elections’). If we want to start utilising geographical data, then this needs to be far more distinct, especially if we want to start adding this into mapping applications. Aside from the ‘disambiguation’ that we discussed before with reference to ‘People’ (but which is equally valid with ‘Places’, as placenames are far from unique), to be able to map the stories we need to know where they actually are.

In Nstein’s Text Mining Engine (TME), we automatically apply additional geodata to those ‘Places’ that TME automatically detects within text. This geodata is supplied in two forms:

- Co-ordinates : The traditional longitude/latitude ‘Sexagesimal’ information (degrees, minutes and seconds).
- WGS-84 : The same data as your car’s GPS / Sat Nav uses to build maps.
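
To make the relationship between the two forms a little more concrete, here is a minimal Python sketch that converts a sexagesimal co-ordinate into the signed decimal degrees that most mapping applications expect. The function name, the `geotag` dictionary and its field names are my own illustrative assumptions; this is not TME's actual output format.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are conventionally expressed as negative values
    return -decimal if hemisphere in ("S", "W") else decimal


# Hypothetical geotag for a detected 'Place' (field names are illustrative only)
geotag = {
    "place": "London",
    "lat": dms_to_decimal(51, 30, 26, "N"),
    "lon": dms_to_decimal(0, 7, 39, "W"),
}
```

With both values stored as plain decimals, a tagged 'Place' can be handed straight to a mapping widget as a latitude/longitude pair.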

Adding this information turns your ‘tags’ into ‘geotags’. Once you have this information, you can display your content not only by subject maps (like our previously discussed ‘Topic Pages’), but geographical maps. For a demonstration using Nstein’s WCM product, we built a simple widget using Google Maps, which showed editorial staff at a glance the geographical ‘Aboutness’ of their content, using the geotags generated by TME.

Google Maps of course is just a start. For example, if Google Latitude starts to get real user adoption – the upcoming iPhone version will surely help that process – then ‘News content about where I am right now’ will be a viable option for mobile users. Ally this with (user opt-in) advertising content…. and there are some interesting applications on the horizon.





Livin’IT – Bing There, Done That

30 05 2009

15 years ago I bought an LP.

It was one that I had been really looking forward to being released. The two singles that had preceded it had whetted my appetite and having devoured the full-length release just once, I immediately phoned everybody I knew to demand that they too go out and buy it. A few of us traveled the length of the country a few months later to hear it live and for the first and only time in my life, I cried at a gig due to its quality.

Of course what happened in the year or so after my purchase, was that gradually it was everywhere (every shop, every TV show) closely followed by a small army of similar closely modeled products, each facsimile filtering out and diluting many of the elements that made the original artifact what it was. And indeed, still is.

This morning I dug out the LP for its annual single airing, and whilst it spun on the Soundburger, I sat down to catch up on the week’s industry news. Primary amongst this was of course Microsoft’s bi-annual attempt to do mass-market search, ‘Bing’.

Now aside from the curious naming decision, there is plenty in this that many of us in the industry will find strikingly familiar. Indeed, Paul Miller’s blog post sums it all up very neatly; this is technology which is already out there and well proven, and I suppose explains my initial ‘meh’ reaction to the announcement. But in that lies the explanation; ‘…us in the industry…’.

Just because we are familiar with faceted-search, result clustering, semantic keyword analysis et al, doesn’t mean that anyone outside our little cosseted gang is. ‘Bing’ is supposed to be a mass-market tool, in a way that something like Clusty will never be and Newssift was never intended to be. And as for Wolfram Alpha… that was the answer to a question that nobody actually asked.

So, it’s not about the technology per se in this respect, but the packaging. A good case in point would be Apple.

Now, I’ve been accused in the past of being somewhat anti-Apple. For the record, it’s not the case, I guess I just don’t buy into the Applecult. Whilst I do not see Apple as a technology company at the bleeding edge, I do see them as a hugely admirable and successful product company. What do I mean by this ?

What are Apple held in high regard for ? Macs, iPod & iPhone. None of these were conceptual developments that came from within Apple. I mean they did not invent the idea of the personal computer with a GUI (Xerox would be a better bet – check this at 4:20 for what I’m referring to). They did not invent the MP3 player and they did not invent the mobile telephone.

Now the last one is more complicated. iPhone is not just a mobile ‘phone, but a smartphone. I’ve had an increasingly improving experience as a Windows Mobile user for the last 5 years and Symbian has been around a tad longer than that.

Again, iPhone is not just a smartphone, but an application platform. But that again was not their concept, Nokia got there first by a fairly long chalk. Ok, they screwed up their market advantage with a series of baffling ideas (segmenting the OS to device releases is just one of the boneheaded decisions) and are now dropping marketshare every quarter.

What Apple have done is to package technology better than everyone else. They’ve made products that the mass-market can use out-of-the-box and where there have been shortcomings in the products (it’ll be the 3rd iteration of the OS before iPhone can do what my HTC TyTn II could do 2 years ago), they’ve been almost of background importance to the functioning of the product in the eyes of the consumer.

They work, they look good and they are easy to use. How could I fail to appreciate that ?

With ‘Bing’ the key will not be whether the technology is unique. Not whether you can argue that it has been done elsewhere before. What will be important is whether the best practice demonstrated by a range of small-scale and niche players can be packaged right for the mass-market, but not over-diluted. Whether the execution is perfect.

In short, to succeed ‘Bing’ must not feel like using technology. It must instead feel like using a product.

Matt Mullen is an Industry Consultant at Nstein Technologies [http://www.nstein.com].





Linking – Is All Similarity The Same?

20 05 2009

Today I was lucky enough to be able to speak on the second day of the 2009 ePublishing Innovation Forum in London, presenting; ‘Is All Similarity The Same? How Context Drives Revenue and Brand Loyalty’.

Now whilst you can cover a fair bit in a 15 minute talk, there are some areas where naturally you have to gloss over a fair bit of the potential detail. And indeed, there might be those of you out there who would have liked to have heard the presentation but for reasons of geography/time/money/lethargy (delete as applicable) were not able to.

So, in order to enlarge on the talk itself and to open the discussion to anyone who is interested, here is a slight redux. A ‘Directors Cut’ if you like (but not like the ‘Directors Cut’ of ‘Cinema Paradiso’ where the ending got totally screwed, or a George Lucas one where it’s sort of the same but with extra CGI and Greedo shooting first).

It is probably worth mentioning at this point that as a primer, I have recently written a 3 part series on user search modes. In that series I touch on a number of areas which are complementary to this piece and it goes without saying that I’d recommend you read that too if you have time.

Right now, there is a huge ongoing debate about what sort of charging model publishers should use to try and derive cash from their content. Thus far, the overriding model has been a free / advertising supported model, where the building of a mass audience has outweighed any real thought of creating mass revenues.

The collapse of the advertising market, both in terms of quantity of potential clients and the rates at which they are charged, has seen a rapid shift towards re-addressing charging models for this content. Some – perhaps those with more specialist data – have well-established models for carrying this out, but now it’s the mainstream news publishers who are looking seriously at following suit.

Ultimately however, regardless of the charging models themselves, the same challenges exist. I might be accused of over-simplifying here, but to me these broadly are;

- Reader Acquisition
- Reader Retention

In fact they are no different for news publications when we look at their paper-versions, and online they don’t really differ whether you are applying a charging model for readers or not (just replace ‘Reader’ with ‘Subscriber’). As we’re talking about online here, I’m just going to refer to ‘Visitors’ as a generic term. So ;

We want new visitors. And we’d like them to come back.

Traditionally, the area of visitor acquisition has been the domain of Search Engine Optimisation (SEO). This being the method of luring users in through the ‘side door’ (from places like Google) directly into pages that match their searches. Doing this well has a proven success rate and as a result there is a myriad of sources out there to read up on and learn best practice.

What I am going to focus on is one of the most important ways of building visitor retention. That of automatically providing content similar and relevant to what they are already reading.

Now, this sort of functionality is not exactly new. Look at an ‘article page’ on any news site and you’ll see this in the accompanying side bar, usually called ‘Related Stories’. Some of these are generated by our old friends the search engines, some from specialist tools, but many are hand-cranked by editorial staff during the creation-cycle.

However they are created, they are important to our goal of retention, because they help us to show the depth of our knowledge, the gravitas of our brand and crucially, that we understand the requirements of the visitor. For the free charging model, they help as additional clicks to the visit (helping build our advertising charging model); with a charging model they also assist in demonstrating the ‘Fitness For Fee’. The more a visitor consumes, the more likely they are to see value in continuing to subscribe.

In the search series, it was suggested that the key to meeting the various modes discussed was real understanding of the ‘Aboutness’ of content. You’ll not be surprised that it is also key here. However, whilst before we really only touched on basic tagging of ‘Entities’ (People, Places & Organisations) and ‘Concepts’ (single and multi word descriptive text) we now have to add to that, ‘Relevance’.

Knowing that say ‘Girls Aloud’ appears as an entity in our text is one thing. What now becomes more important is how relevant they are to the overall subject matter of the article itself. The more relevant they are, the more likely the article is to match another piece of content on the same subject.

With Nstein’s Text Mining Engine (TME), mathematical scores are added to each automatically generated entity/concept, suggesting how relevant they are to the overall content item (e.g. ‘Girls Aloud’ are 83% relevant to this article).

Of course, this is something that you could consider performing manually. Again, with small collections of content, where the same manual tagger scores all items, this is even achievable. However, it is vital to have a consistent scoring mechanism – the same baseline methodology for all the mathematical results – and this is almost impossible for a human being to do alone.

A single article might have entity/concept lists that run into high double figures for each. Scale that out across the tens of thousands of articles that a modest content heavy organisation produces annually and the size of the task becomes clear.

When we apply this sort of tagging methodology to collections of content – for example all articles inside a news content database – we’re actually creating something that we can refer to as a ‘Knowledgebase’. This being a repository not only of content, but also knowledge about what that content is about, the ‘Aboutness’ being described in that tagging ‘metadata’ for each document.

This Knowledgebase is also potentially highly interconnected. The relative ‘Aboutness’ of each document can be calculated against each and every other document by the use of this metadata. At the heart of the relative strengths of these connections lies their ‘Base Similarity’. Now these connections are pretty complex things. If we consider that each item of content might have 50 separate elements within its metadata, each with a different level of relevance, the connections between any two items are multi-faceted.
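
To give a feel for what a similarity calculation over this kind of metadata might look like, here is a minimal Python sketch. It uses cosine similarity over tag-to-relevance mappings, which is simply one common textbook approach that I am swapping in for illustration; it is not a description of how TME calculates its linkages, and the articles, tags and scores are invented.

```python
import math
from typing import Dict


def base_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two tag -> relevance mappings."""
    # Only tags present in both items contribute to the dot product
    shared = set(a) & set(b)
    dot = sum(a[tag] * b[tag] for tag in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an item with no tags is similar to nothing
    return dot / (norm_a * norm_b)


# Invented metadata: each tag carries a relevance score between 0 and 1
article = {"Girls Aloud": 0.83, "London": 0.40, "pop music": 0.65}
gig_review = {"Girls Aloud": 0.90, "pop music": 0.70, "O2 Arena": 0.55}
council_story = {"London": 0.75, "council elections": 0.88}

pop_score = base_similarity(article, gig_review)
council_score = base_similarity(article, council_story)
```

Here the gig review scores far higher against the article than the council story does, because the tags they share are also the ones most relevant to both items. Even in this toy form you can see why the relevance scores matter as much as the tags themselves.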

The Knowledgebase itself is of course not static, but rather a living and breathing organism. New content is likely to be added on a constant drip basis, with each new item creating a disturbance to the existing connection calculations. It shouldn’t be any surprise why we employ really bright mathematicians to bring order to these conditions within our products.

Now, ‘Base Similarity’ is a wonderful thing. We can take our repository of content and using the tagging that we can apply to each, create a ‘Knowledgebase’ rich with multi-dimensional relevancy links between each item. Better still, as we can use a constant automated method to do this, there would be no subjectivity in how these linkages would be created or maintained (or arguably, a consistent level of very low-level subjectivity in the calculations). Best of all, we can provide highly accurate ‘Similar Stories’ result sets to accompany articles.

I’m lucky in my job that I get to hang out with bright people. Not only the aforementioned Nstein mathematicians (who have to explain things to me very slowly so I can keep up), but also within our customers in the publishing industry. Spending time with these people gives the slower-learners like me the chance to absorb their interesting ideas, re-phrase them, pretend that they were actually my idea in the first place, steal the credit and learn from their best practices.

Talking to a few of these people late last year gave me an interesting insight into how ‘Similar Items’ could be better modeled for the online publishing world, especially for newspapers. It’s not that the above described methodology is wrong, far from it, but that it is a baseline. A starting point on which you can then build something far more interesting.

As we have discussed, the similarity calculations that we have are very low in subjectivity. Trouble is that our visitors are not. They bring their own context to their judgement of similarity and it is important to reflect this in the experience we provide them with.

For example, I’m viewing an article in the ‘Travel’ section of a site. The place I’m reading about has been visited by a few celebrities over the years. Now it may be that the ‘Base Similarity’ calculation for similar items at this point would correctly produce articles that reference these celebrities, but is this what the visitor would expect ? After all, they are reading the ‘Travel’ section. We know this. Shouldn’t we favour similarity calculations upon ‘Places’ rather than ‘People’ ? In essence, the context in both what and where a visitor is reading content should carry weight in the similarity calculation.

‘Base Similarity’ also treats all content as being equal. It is part of that important objectivity that we want from the creation of the metadata and the linkages. However, for news organisations who are attempting to acquire and retain visitor numbers, all content is most certainly not of the same value.

Today’s football scores for example, appear in a variety of sources, you can find them almost everywhere and as such they are really commodity content. Today’s exclusive new column from the well-respected and popular football writer is a high value unique property. We want to tip visitors towards our highest value content where we can, and that is another weighting that should be able to be applied to the similarity calculation in certain circumstances.

The conclusion to this might be judged as a bit backwards. I started by discussing that similarity needs to be based around detailed objective scoring and then seemingly have contradicted myself by saying that judging similarity purely objectively might not be enough.

In fact turning your content repository into a knowledgebase is the critical starting point for any automated similarity solution. To be able to apply any editorial imperatives to these calculations – for example adding subject category context as I described with ‘Travel’ – still relies on the objective scoring as a starting point. The weighting of the results is a secondary action – a filter – that can be applied dynamically dependent on the house-rules for an organisation. Critically, these filter rules can then be rapidly shaped and tuned by the online editorial staff on-the-fly, as they are not part of the core methodology of how the complex linkages are calculated in the knowledgebase.
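
As a rough sketch of that two-stage idea, the Python below leaves the objective base score untouched and applies the editorial weighting as a separate step afterwards. The house-rule table, the field names (`base`, `matched_on`, `value`) and all of the numbers are hypothetical and chosen purely for illustration; they are not taken from any Nstein product.

```python
from typing import Dict, List, Tuple

# Hypothetical house rules: per-section multipliers for each tag type.
# Editors could tune this table without touching the base calculation.
SECTION_TYPE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "Travel": {"Place": 2.0, "Person": 0.5},
}


def rerank(candidates: List[dict], section: str) -> List[Tuple[str, float]]:
    """Apply section context and content value on top of the base score."""
    type_weights = SECTION_TYPE_WEIGHTS.get(section, {})
    scored = []
    for item in candidates:
        # Tag types without a rule for this section keep a neutral weight
        context = type_weights.get(item["matched_on"], 1.0)
        scored.append((item["title"], item["base"] * context * item["value"]))
    # Weighted scores are only used for ordering, so they may exceed 1.0
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented candidates: 'value' is 1.0 for commodity content, higher for exclusives
candidates = [
    {"title": "Celebrity spotted on holiday", "base": 0.80, "matched_on": "Person", "value": 1.0},
    {"title": "Weekend guide to the same resort", "base": 0.60, "matched_on": "Place", "value": 1.0},
    {"title": "Exclusive column on the region", "base": 0.50, "matched_on": "Place", "value": 1.5},
]

travel_order = [title for title, _ in rerank(candidates, "Travel")]
news_order = [title for title, _ in rerank(candidates, "News")]
```

In the ‘Travel’ section the exclusive column and the resort guide rise above the celebrity story, whereas in a section with no rules defined the celebrity story stays on top, followed by the exclusive. The base scores never change; only the filter applied over them does.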

In short, they don’t need to be mathematicians. Because we already have them.

Matt Mullen is an Industry Consultant at Nstein Technologies [http://www.nstein.com].





Livin’IT – Where There’s A Will, There’s A Weg

15 05 2009

I have an almost-namesake and you’re reading this blog via his baby. When we’re old men, I suspect our lives will be pretty different. I’ll be the bloke in the park, feeding the ducks and waving his walking stick as the kids roar past on their hover boards shouting….

‘I fought in the browser wars so you could ride those damn things’

…whilst my almost-namesake will probably be enjoying his retirement by traversing the globe in his luxury Enourmoyacht. Having invented the damn hover board no doubt.

Back in the heady days of the late 90’s, I was working for an internet technology ‘incubator’ unit based at my local university. I’d spent the previous few years working for one of the biggest computing companies on the planet and now going to work in jeans on a university campus was a welcome change. My colleague and I had persuaded our new boss to introduce an unofficial 20% rule – inspired by a similar system at 3M – so we could develop our own ideas on top of the work that the unit was supposed to be doing.

A few months into the job, I called my colleague into my little office to show him something that I’d hacked together over a few weeks of my 20% time. I’d been trying to keep a little diary and was getting bored mashing this manually into HTML, so I’d built a rudimentary GUI, which allowed me to write entries in text and then saved it to a database and served it out properly rendered.

There was no inline linking or styling (you could add links into a ‘related links’ section separately, which when served appeared alongside in the page). It was therefore pretty basic stuff.

‘What is it ?’ asked my colleague.
‘It’s a Web Diary system. You can type in your diary entry and then publish it to the web’ I replied.
‘Who the hell wants to publish a diary ?’
‘Erm…. not sure. I’m sure someone will’.

That was the last we heard from that project.

A year or two later, I had started to develop small-scale systems which we would now recognise as ‘Web Content Management’, mainly at that stage for intranet and extranets. The incubator unit had been disbanded and I’d been taken on by a small consultancy company, to further develop these systems and add additional functionality for the increasing number of customers.

I remembered the diary code, blew the dust off it and integrated it into the pre-release version of the new intranet software. The idea was that the staff could maintain their own pages within the organisational structure about themselves. I’d added the ability to upload pictures, extended the authoring a bit to allow inline linking so they could add links to each other’s pages etc… it looked half-decent.

‘Explain it to me’ said the new boss.
‘Well, it’s sort of a way for staff to share their interests with colleagues’
‘Why would they want to do that ?’
‘I s’pose so they can find other people in the company who like the same things as they do….’
‘Nobody is going to do that’.

Again, that was the end of that idea.

Between these two events – before the end of the unit – my colleague and I had invented an addictive new game.

We’d been sent a small plastic football from one of our customers and we used to spend our lunch hours in the server room, trying to chip it into a waste paper basket (or as we say here, ‘bin’). So addictive was this game, that we’d get to work early and leave late just to try and perfect the techniques required to win at what we’d now dubbed ‘BinBall’.

An unfortunate incident involving one of us knocking out one of the DNS servers for over an hour with a misdirected shot led to it voluntarily being banned, which was a shame as we’d only just finished the 15 page ‘Official Guide to BinBall – Authorised by the National BinBall Association’.

And we’d been too slow to get the ‘nba.com’ domain.

Matt Mullen is an Industry Consultant at Nstein Technologies [http://www.nstein.com]. He also now accepts that server rooms are no place to develop dangerous ballgames.