Information Visualisation

By Glenda: First published in Online Currents – Vol.18 Issue 8, October 2003


Information visualisation is a technique for visually presenting large quantities of information to users. It has been applied to search engine results, library catalogues, business information and scientific research data, with the aim of making information quickly accessible to end-users, and of developing new ways for users to explore data. The methods for displaying the data are often compared with traditional maps; both summarise an enormous amount of information and allow users to identify patterns and trends.

In this article I will examine features of two visualisation tools: KartOO, a metasearch engine which visually displays automatically categorised results from Internet searches, and Visual Net, a program usually used for visual display of more structured information, including library catalogues. I will not be covering the visualisation of scientific and other research data, nor the broader interpretation of visualisation, which includes all aspects of the display of information (including plain text, lists and so on).

KartOO
KartOO (http://www.kartoo.com) is a French metasearch engine that uses Flash to present search results visually as a collection of nodes showing Web site URLs, with links between them. The size of the nodes indicates the relevance of the site to the search query, and shading between the nodes indicates relatedness of the hits, while an isolated node suggests a site with little in common with the other sites that were retrieved by the search.

Between the nodes are words that were used in identifying the relatedness of the nodes. As you hover over a node you see the links from that node to the words in the links. Summary information about the site identified by the node (a title, brief description and URL) also appears at the left of the screen. As you hover over the words in the links, the links from those words to Web site nodes are highlighted, along with plus and minus icons. You can add or delete these terms from your search by clicking on the icons.
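The node-and-link structure described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not KartOO's actual method: the sample sites, relevance scores, keyword sets, the build_map function and the radius range are all hypothetical. Node size grows with relevance, two sites are linked when they share a keyword, and a site that shares nothing is reported as isolated.

```python
from itertools import combinations

# Hypothetical sample hits: URL -> (relevance score from 0 to 1, extracted keywords).
hits = {
    "site-a.example": (0.9, {"graphics", "tech", "user"}),
    "site-b.example": (0.6, {"graphics", "books"}),
    "site-c.example": (0.4, {"wyoming"}),
}

def build_map(hits, min_radius=10, max_radius=40):
    """Return node radii, keyword-labelled links, and isolated sites."""
    # Node size: scale the relevance score into an assumed radius range.
    nodes = {
        url: round(min_radius + score * (max_radius - min_radius))
        for url, (score, _terms) in hits.items()
    }
    # Links: two sites are related when they share at least one keyword.
    links = {}
    for a, b in combinations(sorted(hits), 2):
        shared = hits[a][1] & hits[b][1]
        if shared:
            links[(a, b)] = shared
    # Isolated nodes: sites that appear in no link at all.
    linked = {url for pair in links for url in pair}
    isolated = [url for url in sorted(hits) if url not in linked]
    return nodes, links, isolated

nodes, links, isolated = build_map(hits)
```

In this sample, the two sites sharing the keyword "graphics" are linked, and the site whose only keyword is "wyoming" ends up isolated.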

On the left of the page, the top five sites are listed, along with the top fifteen automatically identified topic clusters. You can refine a search by clicking on the plus or minus icons next to the topics. These clusters usually offer more useful refinement than the single keywords within the visual display.

You can now download the Kartoolbar for use on your own computer (http://www.kartoo.net/a/en/metamoteur.html).

In a search for 'information visualization' on 28 July 2003, 10 sites were displayed (Figure 1). The terms displayed in links were 'advanced', 'books', 'tech', 'graphics', 'user' and 'wyoming'. (By coincidence, two of the hits involved Wyoming, and one was BarnesandNoble.com, a site sponsor.) Unfortunately, none of these terms is useful for refining the search (unless you're interested in Wyoming, of course). On the left of the page, the topic clusters that have been automatically identified are displayed. They include 'visualization software', 'data visualization', 'software products' and 'georgia tech', all of which could potentially be useful concepts for narrowing the search.

Figure 1: Search for information visualization on KartOO

When the same search was performed on 30 July 2003, the top four sites were the same (the fifth one was the second sponsor), and the categories were similar, but not identical. The Web sites in the visual display had changed: the two linked by the term 'wyoming' had gone, and there were two new ones related to the visualisation of proteins. Also, the Olive site, which is discussed below, moved from isolated status to become linked to the group through the term 'categories'.

KartOO shows the potential of visualisation for improving browsing of search engine results, particularly through showing relationships between topics, and allowing refinement using single keywords or identified categories. It is also a tool that could be useful for in-depth exploration of the relationships within a subject area (i.e. understanding a topic without necessarily searching for Web sites).

To me, the big stumbling block in the use of KartOO at the moment is the fact that the keywords within the visual display are not generally useful for search refinement, or for understanding the nature of the hits linked to those keywords. From the two searches done on 'information visualization', the terms 'proteins' and 'wyoming' are most useful for refinement, but both are very specific and not relevant to my needs. Some other terms could be useful in certain situations (e.g. 'tools'), while others don't add value at all (e.g. 'special'). It is possible that use of a stop list of general words, or automatic techniques to identify more content-bearing words, could help here.
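The stop-list idea can be illustrated with a short Python sketch. The stop list itself and the content_terms function are my own hypothetical choices, and I have no information on how KartOO actually selects its keywords; the sample terms are those displayed in the 28 July search.

```python
# Hypothetical stop list of general words that say little about content.
STOP_WORDS = {"advanced", "special", "user"}

def content_terms(terms, stop_words=STOP_WORDS):
    """Keep only the terms that are not on the stop list (case-insensitive)."""
    return [term for term in terms if term.lower() not in stop_words]

# Terms displayed in the links for the 28 July 2003 search.
displayed = ["advanced", "books", "tech", "graphics", "user", "wyoming"]
filtered = content_terms(displayed)
```

A stop list only removes the most general words. Very specific but irrelevant terms, such as 'wyoming', would still get through, which is why automatic techniques for identifying content-bearing words might also be needed.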

Other limitations of KartOO relate to speed: it takes longer to display results visually than textually, and it takes longer to explore them, as it is necessary to hover over a node to read the summary information about it. That said, for a serious search there is no need to begrudge a few extra seconds. My approach for subject exploration will now be to use a combination of Google and KartOO, along with other search approaches, to ensure a view from different perspectives.

Grokker is another program that provides knowledge maps of search results. It is also available for use on personal computers (http://www.groxis.com).

Antarctica/Visual Net
Antarctica is a Canadian company founded by Tim Bray, co-creator of XML. It produces Visual Net software, which creates large-scale browseable maps (http://www.antarctica.net). Version 4.0 has just been released. A white paper for libraries (February 2003) is available after registering at http://antarctica.net/request.html.

Visual Net has been used to display: a library catalogue; medical bibliographic data (PubMed; http://pubmed.antarcti.ca/, authorisation required); financial data (Macdonald and Associates Limited, http://public.vn.canadavc.com/start); and a map of the Internet using Open Directory input (http://maps.map.net). A number of metaphors are used for the visual display, including geographic maps, and blocks that imply physical dimensions similar to library shelves.

The library catalogue at Belmont Abbey College, North Carolina (http://Belmont.antarcti.ca/help/help_2D.html) uses Visual Net software in an attempt to replicate the concept of shelf browsing.

The library is organised according to the Library of Congress Classification (LC), and resources are grouped by classification number. The size of the blocks indicates the number of holdings for each category, and bits jutting out at the side of blocks indicate subcategories. Selected resources within each category (e.g. new or popular ones) are indicated by dots or other icons, along with the title. Icons are used to indicate computer accessible resources, videos, maps and so on, and the colour within a dot indicates the type of book (print or ebook). Coloured circles of different thicknesses around the dots provide further information, such as the newness or language of the item.
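To make the block-size idea concrete, here is a minimal Python sketch. It assumes that the area of each block is directly proportional to the number of holdings, which is my guess rather than a documented Visual Net rule; the holdings counts, the map area and the block_areas function are all hypothetical.

```python
import math

# Hypothetical holdings counts for three Library of Congress classes.
holdings = {"P": 500, "Q": 400, "Z": 100}

def block_areas(holdings, map_area=10_000):
    """Share an assumed total map area among classes in proportion to holdings."""
    total = sum(holdings.values())
    return {cls: map_area * count / total for cls, count in holdings.items()}

areas = block_areas(holdings)
# Side length of each block if it were drawn as a square.
sides = {cls: math.sqrt(area) for cls, area in areas.items()}
```

Under this assumption a class with four times the holdings gets four times the area, but only twice the side length, so small categories such as Z can look cramped even when they are drawn to scale.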

You can browse the collection by clicking on sections of interest, and you can limit the holdings that are visible on the maps by typing a keyword into the filter. You can also see text-based search results by clicking the List button.

While some of the displays could enhance browsing, others seem to provide little useful context and less content than would be available in a text-based results list. For example, the display for Z (Library Science) is squashed into a short area and names five categories and the top ten items according to the software's visibility algorithm, including '100 library lifesavers', 'How to find money online' and 'A day in the life of a colonial printer'. For a discussion on the implementation of the system and the value of browsing see http://www.charlestonco.com/features.cfm?id=95&type=np.

A search for 'Knowledge Management' using the Visual Net display of the Open Directory site (http://maps.map.net) resulted in a fairly readable collection of categories (Figure 2) including Information Architecture (with Jakob Nielsen's useit.com site highlighted) and Knowledge Discovery (with 'Antarcti.ca Systems: Visual Mapping Technology' highlighted). The icons on this site have different meanings to those in the library catalogue, with the circles indicating the number of pages at the site, the number of incoming and outgoing links, and whether the human editor says the site is 'cool'.

Figure 2: Search for knowledge management on Open Directory using Visual Net

A search for 'indexing societies' retrieves only one hit, for the Canadian society, although there are also societies in the UK, US, Australia and Southern Africa. This highlights the dependence of the display on the quality of the information underlying it. If the directory is good, there is a good chance of retrieving useful information, but where the directory is lacking, the display becomes irrelevant.

Visual Net uses XML and JavaScript, and can visualise one or more information repositories using one or more categorisation schemes (taxonomies). The ease of implementation depends on the number of collections and taxonomies, and the ease of extracting existing metadata. The program works with millions of items and categories, and it is possible to restrict access for some users to specific areas of the map.

You can explore other visual displays of structured information at http://www.webbrain.com and http://www.inxight.com/map/.

Information Visualisation Research and Commentary
Most of the references in this section discuss research into various aspects of information visualisation, while others provide useful commentary.

The Atlas of Cyberspaces (http://www.cybergeography.org/atlas/atlas.html) is a beautiful and intriguing site, with examples of many different types of maps of the Internet, organised into categories such as Web Site Maps, Surf Maps and Historic. It's worth a look just for general interest.

'Evaluation of text, numeric and graphical presentations for information retrieval interfaces: user preference and task performance measures' by E. Morse, M. Lewis, R. Korfhage and K. Olsen (http://usl.sis.pitt.edu/ulab/pubs/SMC98MLKO.html) is a report on the evaluation of five interface representations: ordered text, ordered icons, tables, x-y graphs and a spring-based visualisation. The evaluation showed that the ordered icon list and text list produced the best results, while users liked the textual format least, and the two visualisation methods (ordered icons and spring-based) the most. 'Evaluating Visualizations: Using a Taxonomic Guide' by E. Morse, M. Lewis and K. Olsen (http://usl.sis.pitt.edu/ulab/pubs/IJHCS00MLO.html) is a similar report with graphics showing the types of interfaces studied. This study also found that visualisation methods worked well (especially with more difficult searches) and were strongly preferred, while text methods worked well but were strongly disliked.

'Grokking the infoviz' (19 June, 2003; http://www.economist.com/; search for 'grokking') is an introduction to information visualisation published in The Economist print and Web editions. It includes a graphic of the Visual Thesaurus from Plumb Design (http://www.plumbdesign.com/thesaurus/ or http://www.visualthesarurus.com/desktop/index.jsp; my system always times out or crashes when I try to explore this).

InfoVis.net (http://www.infovis.net/MainPage.htm) is a project devoted to information visualization, which is seen as the process of the incorporation of knowledge through the perception of information, mainly (but not only) in visual form.

Olive: the on-line library of visualization environments (http://www.otal.umd.edu/Olive/) is the result of a class project set in 1997. It gives an excellent overview of different types of visualization, including temporal, 1-D, 2-D, 3-D, tree and workspace.

Scientific data visualisation is discussed in the Web sites of Sydney VisLab (http://www.vislab.usyd.edu.au/about/about_visualisation.html) and Spotfire software (http://www.spotfire.com).

'To search is great, to find is greater: a study of visualisation tools for the Web' (http://w3.informatik.gu.se/~dixi/publ/mdi.htm) starts with an exploration of search problems, and some techniques that enhance search success (natural language support, query expansion, domain pruning, and intelligent retrieval agents). It then provides an overview of visualisation tools including cone trees, hyperbolic trees, bullseye graphs, perspective walls and scatter/gather interfaces. It concludes with a suggestion that good results can be achieved using an approach that combines natural language, domain pruning (refining searches) and visualisation techniques.

'Visualizing keyword distribution across multidisciplinary C-space' (D-Lib Magazine, June 2003; http://www.dlib.org/dlib/june03/beagle/06beagle.html) is an exploration of the concept of keyword vector clusters to enhance retrieval, using Belmont Abbey College library (discussed above) as the example.

Evaluation
According to research by Lewis and others discussed above, users love visualisation techniques and do well in information retrieval tasks using them. Why, then, have they not rapidly taken off?

Maryellen Mott Allen (ONLINE Magazine v.26, n.3, May/June 2002; http://www.infotoday.com/online/may02/allen.htm) believes that information professionals tend not to like visualisation approaches, and have therefore held them back. She says the effort to design interfaces that cater to users' desires rather than to those of the information professional is a constant struggle, usually with the user on the losing side. However, research by Lewis and others suggests that user preferences are not necessarily a good guide to the most successful approaches, so other criteria need to be taken into account.

It appears that projects such as the visualisation of the Belmont Abbey College library catalogue have received a positive response. They have the strength that they are based on structured data (already classified according to the Library of Congress Classification), and they offer traditional text-based results as an alternative, so no access has been lost. The Open Directory using Visual Net is easy to use and often, but not always, offers useful access points. Again, it is based on structured data, which would be reasonably easy to navigate without visualisation as well.

Where visualisation really has the chance to add value is in the presentation of vast quantities of unstructured data, such as that retrieved from Internet searches. In this situation, KartOO does a valiant job, but still suffers from limitations in the display of useful keywords for search refinement. The fact that it shows relationships both within the visual display, and in textual lists at the left of the screen is a strength, meaning it is providing a range of approaches for different users and different needs.

A Gartner report (14 November, 2000; http://news.com.com/2100-1023-248587.html) on visualisation has suggested that it might be a particularly useful technique for the detection of trends in enterprise data stores. Greater success with the technique might come when there is a better match between underlying data, user needs, and technological ability.

Search Tools Consulting (http://www.searchtools.com/info/visualization.html) has suggested that the lack of success experienced by visualisation tools might be because they are attempting to combine graphical displays with text concepts, and perhaps because most people are searching for individual items rather than a topic or category. Visualisation tools might become one tool in our search arsenal, but not the main one.

One thing I find lacking in some visual displays is the context to enable me to select a site, or to choose how to refine my search. Marcia Bates (http://dlis.gseis.ucla.edu/research/mjbates.html) has discussed research on information retrieval by Resnikoff and Dolby in 1971 and 1972, which found that people process information in gradual steps with a ratio of 30:1. For example, on average a book title is 1/30 the length of a table of contents; a table of contents is 1/30 the length of a back-of-the-book index; and an index is 1/30 the length of the text of a book. These results suggest that there is something about the 30:1 ratio that matches the way people process information. It is possible that visualisation software that provides these layers of access will be more successful than software that requires larger jumps from, say, URLs to the sites themselves. Sites such as KartOO that offer descriptions to the side of the visual display are moving in the right direction.
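The arithmetic behind the 30:1 steps can be worked through in a few lines of Python. The 81,000-word book is a hypothetical figure chosen so that the numbers divide evenly, and the access_layers function is my own illustration, not something taken from Resnikoff and Dolby.

```python
def access_layers(text_length, ratio=30,
                  layers=("text", "index", "table of contents", "title")):
    """Estimate the size of each access layer, each 1/ratio of the one before."""
    sizes = {}
    size = text_length
    for name in layers:
        sizes[name] = size
        size = size / ratio
    return sizes

# Hypothetical book of 81,000 words.
layers = access_layers(81_000)
```

For this hypothetical book the layers come out at 81,000 words of text, a 2,700-word index, a 90-word table of contents and a three-word title. Jumping straight from the title to the full text would be a step of 27,000:1, which gives a sense of how large the gap is when a display offers only a URL and then the site itself.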

Ben Shneiderman has said there are many visual alternatives, but the basic principle for browsing and searching might be summarized as the Visual Information Seeking Mantra: overview first, zoom and filter, then details-on-demand (http://www.cs.umd.edu/hcil/members/bshneiderman/ivwp.html). Edward Tufte (quoted in the Antarctica white paper and many other sources) describes two fundamental rules for visual display: maximise the data-ink ratio (i.e. everything on the screen should be information-bearing) and maximise information density (people prefer more, rather than less, information on a screen, so long as the presentation is clear and uncluttered).

With bigger screens and more data to be navigated, and strong user preferences (at least from some users) for visualisation approaches, it seems certain that visualisation tools will continue to develop. The more these techniques follow the rules of information design as discussed above, the more likely they are to do their job well.

All links were valid on 30 July 2003.