Tuesday, July 22, 2008

not as radical as I thought. . .

http://en.wikipedia.org/wiki/Generation_V

It looks like my boss might have been surfing the web . . .

Generation V

My boss has a new buzzword he is using: "Generation V". He sees a new class of people, a generation not bounded by age or birth, but instead bounded by the experience of using computers. This is an interesting idea. There are people who are active users of the Internet, blogging, social networking tools, and so on. There are other people who are not, and likely never will be. That set of differences is larger than the similarities that might exist across an age group.

Any other opinions on this concept?

Monday, April 28, 2008

Improving internal search

I think that it is critical that our people have tools to find information inside and outside our organization. The more high-quality information we can place in the hands of the practitioner, the better they can do their job. I believe that intranet search is an area where we can improve on this capability. The first step towards improving is to measure where we are today. If we do not know where we are, we cannot know whether we have improved. We have taken that baseline. We have started to "tweak" our environment to improve on that baseline. As we prove out improvements in the lab, we are moving them out to production as Beta implementations. As improvements are proved out in Beta, they will be moved into production search.

How did we take that baseline? The standard for measuring search is based on a subjective, human understanding of relevance. The standard was set by NIST. NIST holds an annual event where they test and tweak search. They use a standard, fairly small set of content. They have experts evaluate the content and determine the ideal documents within that standard set. They then use complex queries supplied by those experts to bring back documents from the search engine. They measure how many documents within a number of results are from the ideal set - precision. They measure how many of the ideal documents can be found by the search engine - recall. This seemed like a reasonable process, so that is how we measured our impact on the search engine.

We established a test environment – a new instance of the search engine, indexing the production content. We asked for volunteers to act as our experts. We asked them to establish an area of expertise for themselves. We then had them identify the "top" 25 documents within that area of expertise, given a query that they suggested. This became our ideal set. We used that ideal set to measure precision and recall at 3, 10 and 25 results.
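
To make the measurement concrete, here is a minimal sketch (in Python, which is not necessarily what we use in practice) of how precision and recall at a cutoff can be calculated against an expert's ideal set. The document identifiers and variable names are invented for illustration.

    def precision_at_k(results, ideal_docs, k):
        """Fraction of the top-k results that appear in the expert's ideal set."""
        hits = sum(1 for doc in results[:k] if doc in ideal_docs)
        return hits / k

    def recall_at_k(results, ideal_docs, k):
        """Fraction of the ideal set that shows up in the top-k results."""
        hits = sum(1 for doc in results[:k] if doc in ideal_docs)
        return hits / len(ideal_docs)

    # Invented example: an expert's 25 ideal documents and a ranked result list.
    ideal = {"doc%d" % i for i in range(25)}
    returned = ["doc3", "docX", "doc7", "doc1", "docY"] + ["doc%d" % i for i in range(8, 28)]
    for k in (3, 10, 25):
        print(k, precision_at_k(returned, ideal, k), recall_at_k(returned, ideal, k))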

Usually search engine vendors apply more than one approach to search and combine them to create a relevance ranking. Generally, the various approaches to calculating relevancy - Vector, Probability, Language Model, Inference Networks, Boolean Indexing, Latent Semantic Indexing, Neural Networks, Genetic Algorithms and Fuzzy Set Retrieval - are all capable of retrieving a good set of documents for a query. On a similar set of content, with a similar query, they will each retrieve the same documents in about the same order. The difference comes from the work done to "tweak" the engine for particular content collections.
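
As a rough illustration of what "combining approaches" can look like, here is a hypothetical sketch that blends pre-computed scores from two approaches using weights. The documents, scores and weights are all made up; tuning those weights per content collection is exactly the "tweaking" work I mean.

    # Invented scores from two different relevance approaches for three documents.
    vector_scores = {"docA": 0.82, "docB": 0.40, "docC": 0.71}
    lm_scores = {"docA": 0.65, "docB": 0.90, "docC": 0.60}

    weights = {"vector": 0.6, "lm": 0.4}  # these weights are what gets tuned per collection

    blended = {
        doc: weights["vector"] * vector_scores[doc] + weights["lm"] * lm_scores[doc]
        for doc in vector_scores
    }
    ranking = sorted(blended, key=blended.get, reverse=True)
    print(ranking)  # relevance order after blending the two signals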

One thing that can improve search relevance is a set of known good documents for a particular query, presented either as "Best Bets" or used behind the scenes as a "golden sample" to refine relevancy. This presupposes that your queries follow a Zipf distribution and that you can reach most of your users by tuning a relatively small set of queries. Unfortunately, our current search logs do not follow a Zipf distribution. Our top 100 queries do not represent 10% of our total queries. To reach an 80% penetration rate we would need to generate 20,000 separate "golden samples". This tuning method depends on the Zipf distribution to be effective. When your logs are flat, like ours are, it does not scale.
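
For anyone curious how that kind of claim gets checked, the sketch below shows the basic log analysis involved: count distinct queries and see what share of all traffic the top N account for. The sample log is invented; our real logs are obviously far larger and live elsewhere.

    from collections import Counter

    def coverage_of_top_n(query_log, n):
        """Share of all query traffic covered by the n most frequent distinct queries."""
        counts = Counter(q.strip().lower() for q in query_log)
        top_n_total = sum(count for _, count in counts.most_common(n))
        return top_n_total / len(query_log)

    # In a Zipf-like log the top 100 queries cover a large share of traffic;
    # in a flat log like ours the coverage stays small, so per-query tuning does not scale.
    sample_log = ["internal audit", "tax", "internal audit", "ifrs", "sox", "expense policy"]
    print(coverage_of_top_n(sample_log, 2))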

We are also investigating other search technologies and evaluating them on our current content to see how well they retrieve documents with the same queries. What we are seeing is that these other search engines are not significantly better against our content than our existing implementation, according to our measures of precision and recall. What they do offer is a significantly improved set of tools for tweaking the search engine based on our content collection.

I hope all of the above has been of interest to you. I have found it interesting to compare vendor claims to reality as we have been working through these projects. The difference is quite large. Nothing is ever simple, or easy. I'd rather have a "silver bullet", but recent evidence shows that one does not exist. The only silver bullet is to invest hard work, pay attention to the details and persevere. I think improving search is worth hard work, attention to detail and perseverance.

Friday, March 14, 2008

Response time effect on our users with Search

Looking across the industry, we see some consistent information about system response time.

Generally speaking, when a page takes more than 8 seconds to load there is a significant impact on the user's perception of quality.

Nielsen says in this article:

The basic advice regarding response times has been about the same for thirty years [Miller 1968; Card et al. 1991]:
  • 0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.
  • 1.0 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.
  • 10 seconds is about the limit for keeping the user's attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.
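
Just to make those limits concrete, here is a tiny sketch that buckets a measured response time against the three thresholds quoted above; the timings in the example are invented.

    def feedback_for(seconds):
        """Map a measured response time onto the three rough limits quoted above."""
        if seconds <= 0.1:
            return "feels instantaneous - no special feedback needed"
        if seconds <= 1.0:
            return "flow of thought stays intact - no special feedback needed"
        if seconds <= 10.0:
            return "attention still on the dialogue - show a busy indicator"
        return "attention lost - show progress and an expected completion time"

    for t in (0.05, 0.8, 4.0, 15.0):
        print(t, "->", feedback_for(t))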

So - what is your ideal search response time?

Friday, February 8, 2008

Usability Techniques

We have a number of core techniques we employ in our usability program. This post is a brief overview of the techniques. In future posts, I'll dwell on each technique in more detail and discuss how we employ each on a project. The three core techniques are Card Sorting, Expert Review and Lab Testing.

Card Sorting – uncovering the mapping between the computer's display of information and the user's conceptual model of that information; each concept is written on a card and the users sort the cards into piles.

Expert Review – a formal review conducted by usability specialists according to common, pre-established usability principles.

Lab Testing – while being observed by usability specialists, users attempt to complete scripted tasks, which take advantage of the functionality of the system.

What do users want from a search engine?


This post is going to be just a quick bulleted list of what I know my users want from an enterprise search engine, based on my study of our search logs, usability studies, profiles, recent research in the field, information retrieval studies and some conference presentations. If you have additional suggestions, please include them as comments.

  • A single search box, persistently placed on all pages. Wide enough to avoid typos.
  • Google - everyone knows Google and wants internal search to be Google. This doesn't mean the Google appliance. It means quick, accurate and comprehensive search. Quick means sub-5-second response time. Accurate means the right document in the first 3 documents. Comprehensive means every possible document - regardless of which firm silo created it, regardless of which technology holds it, regardless of whether "it" is a Word document, a zip file, an email, a document on their hard drive or a Documentum folder.
  • Some kind of advanced search, even though they will not use it.
  • An ability to narrow the search to specific areas of content that are contextual to them. Sometimes this means content types, like Policies, People, Sites. Other times it means my country, my service line, my language, my industry - taxonomy, but without having to call it taxonomy.
  • For the tool to allow them to type in "How do I do an internal audit?" and bring back documents on audit methodology. This doesn't have to mean natural language queries. If you ignore the "How do I do an" and the "?", that query is "internal audit". The system has to ignore the "?" and the "How do I do" to run correctly. But it does mean that there needs to be a relationship between internal audit and audit methodology. (There is a rough sketch of this kind of query handling after this list.)
  • People expect the system to find things using an AND - not a PHRASE and not an OR. People looking for Risk Management in Technology Companies in the UK type in Risk Management Technology UK. The tool needs to understand that behavior and correct for it.
  • People expect that the system will correct their spelling.
  • People expect that documents with some of the words they searched for in the title will be more relevant. That documents with some of the words in the summary will be more relevant. That the system will know that TAS and Transaction Advisory Services are the same exact thing, even though they are not to the computer. People expect that more recent documents will be relevant, except when they are looking for older documents - and they want the tool to know the difference.

Those are some of the things that I know people want from search. What do you want from search?
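
As promised above, here is a hypothetical sketch of that kind of query handling: strip the question scaffolding, expand one known synonym and treat whatever is left as an implicit AND. The stopword list and synonym table here are invented for illustration, not our actual configuration.

    QUESTION_WORDS = {"how", "do", "i", "an", "a", "the", "what", "is"}
    SYNONYMS = {"tas": "transaction advisory services"}

    def normalize_query(raw_query):
        """Turn a natural-language query into a set of AND'ed search terms."""
        terms = raw_query.lower().replace("?", "").split()
        terms = [t for t in terms if t not in QUESTION_WORDS]
        terms = [SYNONYMS.get(t, t) for t in terms]
        return " AND ".join(terms)

    print(normalize_query("How do I do an internal audit?"))   # internal AND audit
    print(normalize_query("Risk Management Technology UK"))    # risk AND management AND technology AND uk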

How do you evaluate an enterprise search engine?

I am not going to claim this is the only way to evaluate an enterprise search engine. But early on in my search program we started thinking about how we could measure our progress with search. How could we tell if a search engine tweak caused the results to improve, get worse, or stay the same? Seems like a simple question, no? How do you evaluate a search engine?

Companies have been selling search engines for decades. IBM was selling a product called STAIRS back in the early 1970s. Given that long history, you would think there would be a simple answer - do x, look at y and if it is larger than z, you have a good search engine. Evaluating a search engine is actually a complex question. Search is a very context-sensitive behavior. In a knowledge environment, the documents that are of interest to you are not the same as the documents that are of interest to me. Search really can only be evaluated within a specific context for a specific user.

There is, however, a standard for search evaluation. The standard is based on a subjective, human understanding of relevance. The standard was set by NIST. NIST holds an annual event where they test and tweak search. They use a standard, fairly small set of content. They have experts evaluate the content and determine the ideal documents within that standard set. They then use queries supplied by those experts to bring back documents from the search engine. They measure how many documents within a number of results are from the ideal set - precision. They measure how many of the ideal documents can be found by the search engine - recall. This seemed like a reasonable process, so that is how we measured our impact on the search engine.

We established a test environment – a new instance of the search engine, indexing the production content. We asked for volunteers to act as our experts. We asked them to establish an area of expertise for themselves. We then had them identify the “top” 25 documents within that area of expertise, given a query that they suggested. This became our ideal set. We used that ideal set to measure precision and recall at 3, 10 and 25 results.
We also established another way of measuring the impact of our changes: we created a Beta site and asked real users to tell us how we are doing - have we improved, declined or stayed about the same.

From this testing we determined that we could improve relevancy, and that these improvements would be noticeable to the end user.

What is usability?

Usability is a process for measuring and improving the user's experience with a web site or application. Within the KWeb team, usability begins with a focus on users' needs, tasks, and goals. Usability requires that you change your mindset and spend more time on initial research and requirements. Instead of starting with the design of the system, you start by identifying your target audience and observing them as they use an application to accomplish their tasks. You use this research to identify what the users' needs are, what they want to accomplish, how their environment affects their behavior and what their priorities are.
This creates an emphasis on an iterative design process. You need to develop applications in prototype form and test the application with the end users – have them try to accomplish tasks using your prototype. Once you see the pain points, change the prototype to address those pain points and test again. Through a series of iterations you develop a product that solves real end users' problems in a simple and intuitive way.
One way to describe a usable system is that it is easy to learn, easy to use, easy to explain and hard to forget. Think of a tool, software or hardware, that you enjoy using and that works well. Usability is a way of developing new tools that hit that sweet spot. It all comes back to measures – in order to make effective change you have to know where you are, know the effect of your changes and have a target goal in mind. We target improvements in task completion, task time and user satisfaction as measured by the System Usability Scale, and a reduction in wayfinding errors.
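
Since I mentioned the System Usability Scale, here is a minimal sketch of its standard scoring: ten items answered on a 1 to 5 scale, odd items contribute (answer - 1), even items contribute (5 - answer), and the total is multiplied by 2.5 to give a score from 0 to 100. The responses in the example are made up.

    def sus_score(responses):
        """Score one System Usability Scale questionnaire (ten answers, each 1-5)."""
        if len(responses) != 10:
            raise ValueError("SUS has exactly ten items")
        total = 0
        for item, answer in enumerate(responses, start=1):
            total += (answer - 1) if item % 2 == 1 else (5 - answer)
        return total * 2.5

    print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 80.0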

The great search experiment

Back in 2007 I tried an experiment with some coworkers to see how well other search engines worked. We wanted to know, from experience, how well faceted search, guided navigation, search presentation layers and so on performed in day-to-day use. If you are interested, you can see the entire experience here: http://30daysgooglefree.blogspot.com/. In the end, we determined that if the search engine gives you a good document on your first page of results, you won't need these other search tools. If it does not, these tools will not give you a great improvement either.

I was a bit disappointed, really.