This week, I’ve been at the ‘Online 09’ event in London, during the course of which I was able to present a sort of summary of some of the topics that I’ve been covering this year on this very blog. Indeed the subject itself, ‘Recover & Discover – Matching Search Modes To Users’, was one that I wrote a great deal about earlier in the year and is covered in some detail in the ‘Search – The Trilogy et al’ on the site.
However, I worked on the basis that by the last afternoon of a busy three-day event, people were going to be pretty fed up with yet another burst of PowerPoint ‘Smart Art’, and in a decision that I may have cause to regret, I decided to hand-draw my slides. A weekend on the sofa with pencils, felt tips and a sketch pad produced the presentation embedded below.
For those of you who attended and wanted to review, or were unable to attend and would like to ‘entertain yourselves’ (very loosely) by working your way through the deck, I’ve written a sort of ‘slide-by-slide’ breakdown of what was discussed during the talk.
Introduction (over slide 1)
One of the nice things about discussing topics like search and search methodologies is that there are no right answers (in fact several of the things I’ll tell you in the presentation are 100% false and, I hope, obviously so).
There is a great deal of debate on the subject, some well-reasoned, some not, but nobody has the right answers. Not even the big players. They might have suggestions that help them make a great deal of money, allow them to wield a great deal of influence, even create their own verbs that become part of the lingua franca of the internet. Yet it is arguable that whilst search has continued to iterate better and better solutions, it has yet to begin to meet the demands of the ever-changing demographic of the medium.
How has that changed ? Let’s take a small step back in time. Which, like all small steps back in WWW history, is in fact a pretty big one.
1994 (slides 2-5)
1994. Why 1994 ? Well it was a pretty important year. For those of you too young (or indeed too old) to remember it clearly, here is a refresher.
This was a year in the era before broadband, before even the home-user 56k modem. Before Google, before eBay, before Internet Explorer, Netscape and the browser wars.
What we did have was the most symmetrical ‘World Cup’ ever in the USA in the summer. It began with Diana Ross missing a penalty during the opening ceremony, upon which, for reasons that baffled me at the time and still baffle me now, the goal exploded. The tournament ended with a strangely hirsute Roberto Baggio sending the last penalty of the tournament into sub-space orbit, rather than anywhere near Taffarel in the Brazilian goal.
Also, 1994 was apparently nominated by the United Nations as the ‘International Year Of The Family’. This presumably meant that all families that existed prior to 1994 were declared ‘Beta’ versions and could not rely on ongoing support for upgrades. Or of course, maybe not.
Most importantly, there were two major events in the history of the internet.
Firstly there was ‘The Superhighway Summit’ at UCLA, where Al Gore as keynote speaker is alleged to have coined the phrase ‘The Information Superhighway’. It is a phrase that, in the years that followed, many people kind of wished he hadn’t, as it and the term ‘surfing’ were highlights of many a late-90s wince-inducing vapourware presentation.
Secondly, and perhaps not as well known, 1994 was also the year that I first used the internet, specifically the WWW, to solve an office argument about ‘The Dukes Of Hazzard’ (whether the doors of the ‘General Lee’ were welded shut, or whether the Dukes were just too plain lazy to open them).
I was at the time employed by a large computer company and sat on my desk was a monstrously large IBM 3270 terminal, on which, I had discovered a few months before, you could access a Lynx-like text browser. Through a combination of ‘F keys’ it was possible to enter URLs and see a textual representation of the HTML. This, in combination with ‘Webcrawler’, the first keyword search tool, allowed me to peer into the ether and find information (not always about bad 1970s TV shows I hasten to add).
Back then, every search was a discovery. You didn’t know what was out there, and there were no really reliable sources that you could count on. Finding anything was an iterative process of filtering results until you either found what you were looking for, or hit an informational dead end.
2009 (slides 6-10)
Obviously, 2009 isn’t just a step forward of 15 years. The internet has a ‘cat-like’ approach to ageing, in that a single additional year in human terms equals a great deal more in its evolution. Compared to 1994, it might as well be another planet.
In 2009 it is now law that the iPhone must be mentioned in all presentations, regardless of whether it is actually relevant to the discussion or not. This device is apparently due to ‘…solve all world problems by Q3 2010…’, which is obviously something for all of us to look forward to.
2009 is also the year in which a major constitutional crisis was narrowly averted after it was alleged that the Prime Minister wasn’t easily able to identify his favourite biscuit. Who knew that politics could ultimately be distilled down to such simple matters of snack-food preference ? Maybe the 2010 election could end up being fought over the key battleground of whether ‘Monster Munch’ are smaller than they used to be, or whether it just seems that way ?
Back on topic, the magic of being able to keyword search has long since passed. Webcrawler begat Lycos, which begat Altavista, Hotbot, Metacrawler, Dogpile, Ask… Google. The familiarity that we now have with search tools is such that we don’t really think about them any more. Looking at the data from Google Insights, for example, shows that at a high level.
On the day I was sitting down to write this presentation, I checked the previous 7 days’ ‘Top Searches’ and ‘Rising Searches’. I then compared them to the same set I had recorded when I was writing my original ‘Recover’ search piece a few months before. The ‘Top Searches’ (which more accurately could be described as ‘Top Search Terms’) were more or less the same. Whilst the ‘Rising Searches’ were of course different, the actual types of term were also much the same.
- ‘Top Searches’ – existing web properties
- ‘Rising Searches’ – known entities (mainly ‘people’, but also ‘places’ and ‘organisations’)
Taking ‘Top Searches’, what does this tell us about how people are using Google ? They are, in this simple example, not using Google to discover whether something exists, but rather to recover information that they already know is there. At a more extreme level it would suggest that search is even used as a form of, or replacement for, bookmarks (hence why, if you drill down into the term ‘facebook’, you quickly get to the detailed term ‘facebook login’). We all knew that search queries were dead; now it seems that basic browser functionality is to follow, with bookmarks and the address bar all replaced by the search box.
Not only that, but by implication, users also don’t trust the internal search, the site search, at the given locations to give them what they want. Drill down, though, into the ‘News and Current Affairs’ category and look at just how many searches are the combination of known news providers and entity keywords. Google does not care where it sends them, even if the initial intention was to search a specific provider (assuming that the search isn’t using ‘site:[domain]’, which I guess is probably a given).
‘Who Searches For What ?’ (slides 11-16)
It’s obvious that a tool like Google Insights is going to be somewhat skewed towards the mainstream consumer when you look at the data in such a high-level way. It is important, though, not to think of the ‘Recover’ mode as something that exists only in some lumpen-consumer way, because we are, as users of the web, far more sophisticated than that.
So here’s me in work-mode. You can tell I’m in work-mode because I’m almost smart and I have my hair tucked behind my ears in an attempt to look less scruffy than I actually am. In the course of my work, I will do a fair few ‘recover’ searches;
- Checking on the company share price.
- Placing a postcode onto a map so I know where a meeting is taking place.
- Pulling up a quick bio on someone I’m meeting.
There I am in casual, non-work mode. Similarly, I will also conduct searches of this type in my own time;
- Checking the latest football scores (which as a Southampton fan, is something I do through gritted teeth).
- Placing a postcode onto a map so I know where I’m trying to meet my friends.
- Pulling up some information on a celeb, so I know what everybody else will be discussing when we meet.
Back in work-mode, I may well also perform more complex searches.
- That place I found on the map earlier, what is the best way to travel there ? How long will that take ?
- That person I was reading up about, is there any co-occurrence in other articles with this other person or this company/product ?
Whilst the initial search might have been something of a commodity hunt, now I’m performing a much more advanced sort of query. I’m trying to make comparisons and look for patterns. I do the same in casual-mode;
- What is the best way for me to hook up my iPod to speakers ? Headphone socket or via an offboard DAC of some sort ?
- On a 32″ LCD, do I really need 1080p or 720p support for video ?
We can broadly map these two types of query against the ‘Recover and Discover’ model. These models are not there to characterise a user, but to characterise the sort of modes that we all use to get hold of the information that we are after.
‘Recover’ is the simpler of the two modes and could almost be said to be a commodity. A quick, basic single-word or phrase search and a user will most likely jump into the source that they know will fulfil their requirements. Its simplicity doesn’t alter the value of the search, just the complexity of result it takes to fulfil the need.
To deal with just one of these modes and not the other stops the content we have from being found by the very people who actually want to find it. But what is it that they want ?
‘What Users Want’ (slides 17-23)
As I said right at the top, there’s very little in the way of facts in this area of debate. And I’m about to contribute further to the weight of conjecture when I say;
‘Users don’t want search, they want find’
What do I mean by this wantonly facile statement ? I mean this; 15 years ago we gave users their first keyword search experience to be able to type in what it was that they were looking for. We then got cleverer and cleverer in what you could do in this text box, how you could construct clever queries which narrowed the results by site or inclusion/exclusion. We developed natural language searching where you asked rather than queried. And then, after all of this, where did our cleverness get us ? Right back to where we started. To keywords.
But that’s not to say that keywords mean that we cannot use well-proven types of search ‘abstraction’ to meet users’ needs. We are, if nothing else, the experts in our own content. We know what is important about it, how it relates to other content we also produce and where the commonalities are between them.
The 30k+ volunteers that maintain articles on Wikipedia do this on a manual basis, building the linkages between articles that are related. If we looked only at those linkages in isolation, they would give us a pretty good summary of what the most important (and valuable) elements of each article actually are. Not many of us, however, can count on a volunteer army of enthusiasts to perform that job for us.
If we sit down with a piece of content that we are familiar with, we can fairly quickly pick out the most important terms and add them as metadata to the original document, helping search tools to make better judgements about what is and is not truly relevant to the user’s attempts to find. Of course, that’s not a suitable solution for people who publish large amounts of content. Luckily, there are some excellent fully or semi-automated tools out there that will be able to do that for us, regardless of how numerous or complex our content is.
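For the technically curious, here is a deliberately naive sketch of what ‘picking out the most important terms’ might look like when a machine does it. It simply counts words and ignores the dull ones; the stopword list, the function name `extract_keywords` and the sample article are all my own inventions for illustration, and real extraction tools are a great deal cleverer than this.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real tool would use a far richer one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is",
             "for", "on", "with", "that", "by", "it", "as"}

def extract_keywords(text, limit=5):
    """Return the most frequent non-stopword terms as candidate metadata."""
    # Lower-case the text and keep runs of two or more letters.
    words = re.findall(r"[a-z][a-z'-]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(limit)]

# Hypothetical article text, purely for demonstration.
article = (
    "Southampton drew with Portsmouth in the derby. "
    "Southampton fans praised the Southampton defence, "
    "while Portsmouth fans blamed the referee."
)
keywords = extract_keywords(article, limit=3)
print(keywords)
```

Raw frequency is the crudest possible measure of importance, but it is enough to show the principle: the terms that come out are the ones you would store as metadata alongside the document.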
By turning each of these metadata elements into an inline link, we can start to second-guess where that user might want to go to next. What started as a simple ‘Recover’ search might actually be the beginning of a more complex traverse through your content (like the travel example I gave earlier when I was in ‘Work Mode’).
Thinking again about ‘Recover’ we can start to think about the output of search not as a set of results, but as information that enlightens a visitor about a specific ‘Topic’. Taking the inline tags above as input, we might choose not to link to a specific piece of content, but rather to a ‘Topic Page’, which could automatically be produced on-the-fly via a search tool dressed with an SEO-friendly permalink. As we can produce these pages dynamically and on any subject for which we have established that we have relevant content, we can almost do this to n degrees; our only limit is the depth of our content and the depth of the associated metadata.
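To make the inline link and ‘Topic Page’ idea a little more concrete, here is a rough sketch under some loud assumptions: the articles live in a small in-memory list, the permalinks sit under a made-up `/topics/` path, and the names `slugify`, `link_terms` and `topic_page` are hypothetical rather than taken from any real product.

```python
import re

# Hypothetical in-memory content store: a title plus metadata tags per article.
ARTICLES = [
    {"title": "Cup final preview", "tags": ["Southampton", "FA Cup"]},
    {"title": "Transfer round-up", "tags": ["Southampton", "Transfers"]},
    {"title": "Ground guide", "tags": ["FA Cup"]},
]

def slugify(term):
    """Turn a metadata term into an SEO-friendly permalink fragment."""
    return re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")

def link_terms(text, terms, base="/topics/"):
    """Wrap each known metadata term in the text with a link to its topic page."""
    if not terms:
        return text
    # Longest terms first, so a longer term wins over a shorter overlapping one.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")
    # A single pass over the text, so links already inserted are never re-matched.
    return pattern.sub(
        lambda m: f'<a href="{base}{slugify(m.group(1))}">{m.group(1)}</a>', text
    )

def topic_page(slug):
    """Build a topic page on the fly: every article tagged with that topic."""
    return [a["title"] for a in ARTICLES
            if slug in {slugify(t) for t in a["tags"]}]

html = link_terms("Southampton reach the FA Cup final.", ["Southampton", "FA Cup"])
print(html)
print(topic_page("fa-cup"))
```

In a real deployment the `topic_page` lookup would be a query against the search tool rather than a loop over a list, but the shape is the same: the permalink is just a tidy name for a saved search.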
In ‘Discover’, we are actually more familiar with the concept of ‘faceted search’ than we are probably aware. When we use eBay to search for products, we’re at a beginning point of knowing some elements of what we’re looking for, but whether they exist for sale is something that we do not know. Starting with a basic keyword search, we are presented with a list of potential candidates to match our first search, with the ability to refine this by the categories under which each item is listed.
These categories and keywords are manually entered by sellers when they list an item (via a system of drop-down menus and some basic automatic keyword extraction), and as such rely to a great degree on the accuracy and honesty of the human being creating the listing. The results are what you’d expect from such an approach; far better than without any metadata, but massively inconsistent. An inconsistency which sometimes manages to hide exactly what you’re looking for (whilst also creating a bargain on some poorly listed items for those of us who try to second-guess common listing mistakes).
Newssift, a business news aggregation service from the Financial Times’ ‘FT Search’ unit [disclosure : a customer of Nstein], uses faceted search techniques to allow users to browse articles by selecting topics, companies, locations (and combinations of all of those) to quickly refine their areas of interest. As you drill down through individual topics for example, the other associated facets are dynamically redrawn to show you only those which ‘co-occur’ with what you have selected. It’s a neat solution to help users find the content that meets their specific needs via discovery, without them having to make frequent repeated manual queries to refine their requirements.
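As a rough illustration of that dynamic redrawing (and emphatically not a description of how Newssift itself is built), the sketch below filters a handful of invented articles by the facets already selected and then recounts the values left in the other facets. The company names, the `FACETS` tuple and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical articles, each tagged with one value for each of three facets.
ARTICLES = [
    {"id": 1, "topic": "Mergers", "company": "Acme", "location": "London"},
    {"id": 2, "topic": "Mergers", "company": "Globex", "location": "New York"},
    {"id": 3, "topic": "Earnings", "company": "Acme", "location": "London"},
    {"id": 4, "topic": "Earnings", "company": "Initech", "location": "Paris"},
]
FACETS = ("topic", "company", "location")

def refine(articles, selections):
    """Keep only the articles that match every selected facet value."""
    return [a for a in articles
            if all(a[facet] == value for facet, value in selections.items())]

def remaining_facets(articles, selections):
    """Count the facet values that co-occur with the current selection."""
    matches = refine(articles, selections)
    return {
        facet: Counter(a[facet] for a in matches)
        for facet in FACETS
        if facet not in selections
    }

selection = {"topic": "Mergers"}
facets = remaining_facets(ARTICLES, selection)
print(facets)
```

Having selected the ‘Mergers’ topic, ‘Initech’ and ‘Paris’ simply drop out of the choices on offer, because no merger article mentions them. The user never gets the chance to click their way to an empty result.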
Matt Mullen is an Industry Consultant at Nstein Technologies [http://www.nstein.com]

