Creating classifications and concepts

This article came about because readers tend to use the index instead of the TOC in hypertext documents (online help, the web) and in printed books, but indices are not a very good way to list information in online help, or in a book. I perceived this as a classification problem. This led me to think that we writers need to find better techniques for classifying information, possibly using concepts to provide the proper focus.

The big question for HTML (especially HTML help) is what would be the best thing to put in the left panel that would work well for locating (and of course navigating to) information? Loosely, that's indexing - but obviously not an "index." A TOC is one way to index information. Other ways include icons, page headings, putting material in alphabetical order (dictionary and encyclopedia), lists of tables or illustrations, chapter contents, thumbtabs, and creating an index at the end. We have a lot of conventions in the book world for "indexing" information. Those conventions work for the medium. Dictionaries, encyclopedias, and telephone books are much easier to use by being indexed alphabetically - a TOC is not useful for the main information in those books.

Hypertext has created other ways of indexing information - expanding hierarchies, dynamic links, graphs, and search engines. Hypertext has also brought us closer to the user - we hear more about their frustrations and their needs. The need that I am clearly hearing is tremendous user frustration over not being able to find information, and I see now (partly from my own frustration) that this goes back to books as well.

This article is about classification in literature, and I will eventually link it to the article on cultural analysis. I'm not an expert in any of these fields, and this article is intended to be a starting point for further investigation, and for a dialogue with others in the writing and publishing fields. Usability studies and reader feedback are needed to confirm (or disprove) the ideas presented here.

My conclusions are that there are better ways of conceptualizing and classifying information. Basic research in cognitive psychology and in linguistics has created a body of knowledge that should be illuminating and helpful. Those areas of basic research haven't reached final conclusions yet, and might not for 5, 25, 50, or 100 years, but they do present some evidence that is useful. What I think we tend to do is classify information without regard for the reader's purpose.

For example, in the publishing world, books are written by authors with the idea in mind: "What is the thing that makes this book different from other books?" The focus is on that idea because they want to convince the publisher to buy it. Publishers come at the book from a marketing point of view: "What will get the public to buy the book?" Notice that the emphasis of both is on the sale of the book, not what the buyer wants - although it may reflect what the buyer wants. So what you get is a steady succession of books that incrementally inform the buyer, and he has to buy many books to get good coverage of the entire area. This isn't all bad - the buyer gets multiple points of view and also keeps up with the field. But what the buyer wants is not a consideration, except to manipulate him into buying book after book.

On the other hand, the buyer knows not to judge a book by its cover and looks in the index to find out what is really in it. The authors and publishers also know this, so they follow the rule of thumb that a good index sells the book. Thus many books index nearly every word. You can't tell a book by its index. (This is one reason why I like Internet publishing - I write material in the cross-section of what interests me and what I think visitors want to read, and I monitor the number of visits to pages and quit writing those types if they aren't getting visits.)

Writing is about communicating ideas. When a writer creates a document he tries to do the best he can to communicate the ideas he thinks are necessary for the reader. He then creates an organization scheme (TOC) that he thinks represents what he wants to tell the reader. Often the writer has experience and knows a great pattern that supposedly works. The reader then turns to the index. Why?

Because the writer and the reader have two different perspectives on the information. Following are examples of six different ways to categorize information on the topic Garden Weed Control:

Environmentalist

Preplanting preparation
Ground cover
Cultivation
Last resort chemicals

Effectiveness

Ground cover
Preplanting preparation
Pre-emergence chem.
Last ditch efforts

System

Ground cover
Preplanting preparation
Pre-emergence chem.
Cultivation
Post-emergence chem.

Natural ordered steps

Ground cover
Preplanting preparation
Pre-emergence chem.
Cultivation
Post-emergence chem.

Time of year

April

May - July

User oriented

What do I want to do?

Most effective
Preventive
Corrective
Time of year
Five Steps
Environmentally friendly

In a typical table of contents for online documents, items are buried in a hierarchy so that the reader can't find them. In many books, there are so many items that they can't all be in the TOC.

TOCs present information that is abstract. The user must think from the specific information that he wants to some abstract classification that it might be under. Any piece of information can be classified many different ways, so if there are six to a hundred different potential categories, the reader may have to check all of them.
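The cost of competing classifications can be made concrete with a small sketch. It stores four of the Garden Weed Control schemes listed earlier as ordered Python lists and reports where a single topic lands in each one. The names `SCHEMES` and `where_is` are invented for this illustration, and the code is only a rough model of a reader scanning several possible TOCs, not a tested design.

```python
# Four of the Garden Weed Control schemes from the lists above, each kept in
# the order its writer would present the headings.
SCHEMES = {
    "Environmentalist": [
        "Preplanting preparation", "Ground cover",
        "Cultivation", "Last resort chemicals",
    ],
    "Effectiveness": [
        "Ground cover", "Preplanting preparation",
        "Pre-emergence chem.", "Last ditch efforts",
    ],
    "System": [
        "Ground cover", "Preplanting preparation", "Pre-emergence chem.",
        "Cultivation", "Post-emergence chem.",
    ],
    "Natural ordered steps": [
        "Ground cover", "Preplanting preparation", "Pre-emergence chem.",
        "Cultivation", "Post-emergence chem.",
    ],
}


def where_is(topic, schemes=SCHEMES):
    """Map each scheme name to the 1-based position of topic, or None if absent."""
    return {
        name: (headings.index(topic) + 1 if topic in headings else None)
        for name, headings in schemes.items()
    }


if __name__ == "__main__":
    for name, position in where_is("Cultivation").items():
        print(f"{name}: {position if position is not None else 'not listed'}")
```

The same topic sits third in one scheme, fourth in two others, and is missing from the fourth, which is the situation the reader faces when he has to guess which scheme the writer chose.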

The two pane screen with the contents continuously displayed in the left pane (persistent) appears to be the user preference. The persistent display has the advantage of providing context - the reader always has a sense of where he is in the document. This sense of place or context is very helpful to readers, particularly in hypertext where it is easy to get disoriented, but it has its precedent in books, which show the chapter and section in the header and footer.

After the section on relevant theory, I will explain using meta-concepts to focus the table of contents.

Searching for information and processing information are not only different activities; they have been shown to involve quite different cognitive processes, in which categorizing may conflict with searching. According to Kearsley, 1993, these activities have the following characteristics:

Categorizing is involved in processing information, but categorizing is not involved when the person is searching for information. So if the person is forced to go through abstract hierarchies to locate information, this adds an additional mental load. This can make the task of finding information all the more daunting and frustrating if the person has to stop and think about the different ways the information could be categorized.

What does this mean? Put items in lists instead of hierarchies as much as possible. When it is necessary to create a hierarchy, make the title very obvious with regard to the type of information that would be found within it.
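As a rough sketch of what "lists instead of hierarchies" could look like in practice, the following turns a nested TOC into one flat, alphabetized list, keeping the parent headings only as trailing context. The sample hierarchy (the "Preventive" and "Corrective" grouping) and the function names `flatten` and `as_list` are assumptions made up for this example; they are not taken from any particular help system.

```python
def flatten(toc, path=()):
    """Yield each leaf topic as a tuple of the headings leading down to it."""
    for heading, children in toc.items():
        here = path + (heading,)
        if children:
            yield from flatten(children, here)
        else:
            yield here


def as_list(toc):
    """Return one flat, alphabetized list; parent headings trail as context."""
    entries = []
    for *parents, leaf in flatten(toc):
        context = f" ({' > '.join(parents)})" if parents else ""
        entries.append(leaf + context)
    return sorted(entries, key=str.lower)


# Hypothetical two-level TOC built from the weed control topics used earlier.
TOC = {
    "Weed control": {
        "Preventive": {
            "Ground cover": {},
            "Preplanting preparation": {},
            "Pre-emergence chem.": {},
        },
        "Corrective": {
            "Cultivation": {},
            "Post-emergence chem.": {},
        },
    }
}

if __name__ == "__main__":
    for entry in as_list(TOC):
        print(entry)
```

The reader scans for the specific item first and sees the writer's category only afterward, so he never has to guess the category in order to find the item.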

Categorizing things is fundamental to humans. It starts from the first moments of life. The world is a continuum and breaking things down into concepts and categories is how we make sense of our world in bite-size chunks. Is there anything natural about the way we categorize information? Yes, very early studies in the field by Eleanor Rosch showed that colors are named in the same sequence in different cultures. It seems to have to do with the central nervous system. Tellingly, though, similar results were not found for other things.

It seems the very basics of categorization begin early in life, and are a neurological function, but there the similarities between us nearly end. The brain forms what researchers call "attractors," which like nails on a pegboard gather very basic similar information to them. All coats get hung on one peg, but the information is much more basic than a coat. The brain seems to form structures that mirror the external environment, but in a way that categorizes things. This process seems to proceed at first by visual similarities. All coats get hung on the same peg, people on another, and all the typical things in our environment get their own peg.

This process (eigenvalues, prototyping, family resemblance) has been studied by many researchers, and the results are consistent. But there is evidence of other types of categorization, currently being studied, that depart from the neurological level. To understand this shift, it helps to understand that this pegging system is both stable and dynamic. Eigenvalues form a stable system for us to categorize things - the categories are stable - but if our information about a category changes, the category changes to reflect the change. But it is thought that this self-organizing system of ours can only grasp supportive aspects of things in our environment, and only provides a small subset of possible classifications. We could be stuck at this point were it not for other mechanisms.

One important point to note is that all of this classification comes from our interaction with our environment. If we don't interact with the environment, we have nothing to classify. And just as importantly, the environment serves as a focus for us for classification. People working in a lumber yard have a lumber orientation - they might be likely to classify things by the wood they are made from. People working with accounting are likely to be in a mathematical frame of mind. People working with communications programs are likely to come at information from that point of view. The environment is very influential on how we categorize.

So when someone is working with a computer program and needs help, what is his environment? A computer environment. A program that does a specific thing. A task within the program. These environmental factors all shape how the person is classifying information.

There are two mechanisms, currently being researched, that shed more light on how we categorize. These mechanisms have support from research results. One is an extension of the neurological model called "dualism." This theory sees categorization as a two step process. First things are categorized by similarity. Then they are further clarified by "definitions." So in this we can see that we first see similarities, then see differences. We define things by how they are different from other things. One illustration I often use is the following line:

What we take note of on the line is not the individual dots, but instead the lone !. I suppose we quickly categorize the dots as a line of dots and then we're looking for what is different. This seems to hold true in many aspects of life. Compare with the following line:

The mind very quickly assimilates the new, more complex pattern and overlooks the similarity to spot the !.

the mind catches the ! in the middle. And in the second and third lines we can do pattern matching - the ! occurs in the middle of each and we quickly catch the pattern.

But as patterns become more complex and unfamiliar, it becomes more difficult to see the different item:

In this last line we have to stop and analyze the segments of four, note what is different, and compare to the next segment to see just how different it is. In the last line, the middle has the letter T which doesn't occur in the rest of the line. The mental load gets much bigger. When presented with an index or a TOC, the mind has to try to pattern match the words it is looking for while also trying to decipher how the writer chose to label and categorize the subject.

We can ask ourselves about a reader: has he reached the point yet of making fine distinctions about a subject, or about a computer program, or is he still categorizing things that are similar?

The other mechanism currently under research is where the fields of cognitive psychology and linguistics finally meet - symbols. The fact that our neurological categorizing system is dynamic - can change, is not set in concrete - makes it possible for higher order categorizing and creating new constructions.

We can use symbols to unite categories. Words are symbols. The word chair is just a word that we can identify with the object chair. The word chair has no natural meaning to us. There are many words for chair in many languages, and even in our own language we have other words to define what a chair is: rocker, recliner, sofa, couch, futon, seat, etc. We may understand these all conceptually as fitting in the category seat or chair. But we define them by their characteristics. So the word chair is a symbol, and a concept that can bridge the categories of sitting, and of furniture that is similar.

In the case of a table of contents, what symbols are best used to categorize information? Probably words that don't require a mental leap to knowing specific definitions, but are not so abstract that too many categories are covered; the best words are close to the conceptual category. So not rocker, not mobile furniture, but chair.

On the other hand, people seem to prefer lists of information (the index) to hierarchies. We should avoid making more out of this than it is. People get very frustrated with hierarchies because they can't find the specific information that they want. Lists are very specific - they have information that is much closer to the definition level - and they are very long. The writer's categorization scheme is also likely to have influenced the index listings. So there are hidden hierarchies in an index.

So we need to find a better method than lists and a better method than tables of contents. We also need to keep in mind that as readers become more experienced, their needs change and the delivery system needs to adapt to their needs. The adaptation should be as seamless as possible. The best system of categorizing would be one that meets the needs of both inexperienced and expert users.

In all of the information I have looked at, and conclusions I have drawn, one theme seems to stand out. Contrast. Or you can call it definition. That is, giving information the ability to be located by providing sufficient contrast from other information. This is a task of being able to see what is different in a group of familiar items. That seems to be what the mind does - quickly categorizes similar information and looks for what is different. As the level of complexity increases, we need to increase the level of definition. The mind needs to be able to contrast one item with another. This affects information design at all levels.

There are other things to keep in mind. The mind also needs to see things in context (similar information together), which can also be described as using environmental cues (the computer environment, the program environment, the process environment, the task environment). Following are some suggestions about improving different kinds of information, including some other good suggestions from other usability research.

To create contrast, create definers to distinguish text. Some ways to do this are to:

Use bold leads and headings for ease of scanning (contrast), and to establish context. (A lead as I'm using it here is a bold word or sentence that defines or identifies the text that follows it.)

Everything I have put in this article really should be tested or proven in usability testing. (Not that it can't be of benefit without testing, but I wouldn't want to see this information as misapplied as the chunking principle.) Several ideas of my own, and the available literature (especially the book The Psychology of Menu Selection: Designing Cognitive Control at the Human/Computer Interface, by Kent L. Norman, Ablex Publishing Corporation, 1991) suggest several areas that need further investigation and testing to determine more usable information design categorizing structures:

There are a number of things that can be done to improve categorization so that it works for the reader:

Organize information in conceptual units so that information that is related (part of an environment) can be found in a corresponding literary environment. For example, in a book about creating a hit record, for a unit on recording a song, put in conceptual information about the process of recording, put in task information about getting the room acoustics set up, put in task information about microphone placement, put in task information about sound filters that cut down on noise and remove unwanted voice qualities, put in task information about controlling the volume so it doesn't overmodulate, and task information about mixing the instruments and voice, then put in the procedural information for starting the tape recorder, cutting the tape, rewinding the recorder, etc. This is a conceptual unit, and contains everything about making a recording.

Recording is a category broad enough to cover all the processes, tasks, and procedures. Recording is descriptive, not abstract.
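One way to picture a conceptual unit is as a single record that holds the concept, the tasks, and the procedures together under one descriptive title. The sketch below is only an illustration of that idea: the class name `ConceptualUnit`, its field names, and the `outline` method are my own assumptions, and the entries are paraphrased from the recording example above.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptualUnit:
    """Everything a reader needs about one subject, kept in one place."""
    title: str
    concept: str
    tasks: list = field(default_factory=list)
    procedures: list = field(default_factory=list)

    def outline(self):
        """Render the unit as an indented plain-text outline."""
        lines = [self.title, f"  About: {self.concept}"]
        lines += [f"  Task: {task}" for task in self.tasks]
        lines += [f"  Procedure: {step}" for step in self.procedures]
        return "\n".join(lines)


# The recording unit from the hit-record example, paraphrased.
recording = ConceptualUnit(
    title="Recording",
    concept="The process of recording a song",
    tasks=[
        "Setting up the room acoustics",
        "Placing the microphones",
        "Using sound filters",
        "Controlling the volume",
        "Mixing the instruments and voice",
    ],
    procedures=[
        "Starting the tape recorder",
        "Cutting the tape",
        "Rewinding the recorder",
    ],
)

if __name__ == "__main__":
    print(recording.outline())
```

The title is the only label the reader has to recognize; the distinction between concept, task, and procedure shows up inside the unit rather than in the path to it.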

Conceptualize what the reader's purpose is. Does he want to know more about a process? Does he want to know about the tasks involved in a process? Does he want to know the steps in a procedure? Categorize according to the concepts. Personally I like to conceptualize primarily on the How to level. Whether it is a process, a task, or a procedure, the reader usually wants to know how to do something. A litmus test I use for information is, "How does this benefit me?" If there is no specific benefit to the information, then it is wasted space.

Categorize for the conceptual level, not the definitive level. Most people who need help are thinking in conceptual terms, not by precise definitions.

I frequently use meta-concepts in writing (usually just called concepts). They are useful for bringing focus to organizations (mission statements) and to stories. They basically state what something is about, and I refine them to specify the things that are important to the organization or story. For writing non-fiction, a concept might go like this:

"The purpose of this category is to serve the reader's purpose for using the information, at the level the reader conceptualizes information, and in common terms used in the environment the reader is working in, understanding that the reader is searching (not categorizing), and putting information in conceptual units."

Are people going to learn tasks, or just go through numbered steps to accomplish the task? That is a very active debate in the technical writing community. When the reader gets to the point of asking for help in accomplishing the task, he is frustrated with the computer program and may be mentally disoriented. He wants to get the task accomplished. It may take three steps, or it may take fifteen. Readers have a preference for steps instead of conceptual material. The trend is to give bare procedures that get the task accomplished. The fact is, people don't learn very quickly from steps. So the next time they have to do the task, they will go through the same frustration.

The writer needs to take a hard look at the task and decide if it is a frequently recurring task and if the information needed to complete the task is conceptual in nature, or is so complex that it requires careful step by step direction. (I see very few step by step procedures that require specific direction.) If it is a frequently occurring task, the writer can best serve the user by giving instructions that work but that also help him learn. This can be as simple as giving him the option of selecting a procedure or an overview of the task. If he doesn't understand the task, then following steps is a slow way to learn. For example, if he is using a communication program and needs to connect, connecting is a task that may involve several procedures. A topic that simply says:

is a much less complicated approach and much less frustrating than letting him discover the need to do all these procedures by trial and error. And it is a very simple topic.

The Selecting Your Modem topic can simply say: "Select the type of modem you have and the maximum speed of your modem." Notes can give related information. Creating three or more steps only ensures the reader will have to come back.

Other programs, such as paint programs and backup and compression programs, have routine tasks that are very complicated, but often go without any explanation of what is involved.

Simply preceding steps with a few lines of explanatory text may be all the information the user needs - they can forget the steps - they have learned the procedure. Minimalist designs (which I basically agree with) and usability studies don't ask the question about the user coming back and user frustration - they only look at getting the user through the task as conveniently as possible.

These articles are all available on the Internet. They typically include large bibliographies of their own. Not all of these articles were used directly, but are included for general interest and further reading. I especially recommend Practical problems and proposed solutions in designing action-centered documentation, because of its user-centered approach (which I think is the best approach), and The Psychology of Menu Selection, which is a very thorough work that seems well investigated.

Draper, S. (1996). Practical problems and proposed solutions in designing action-centered documentation. Department of Psychology, University of Glasgow. http://(later)

(Harnad, S. - Group at University of Southampton) A Hybrid Framework for Categorization - Thesis - see Harnad, S. University of Southampton, Highfield, Southampton, United Kingdom
http://www.soton.ac.uk/~coglab/coglab/Thesis

Henry, C., and Rocha, L.M. Language Theory: Consensual Selection of Dynamics. In: Cybernetics and Systems: An International Journal. Vol. 27, pp. 541-553. (Also see Rocha, L.M.)
http://ssie.binghamton.edu/~rocha/

Norman, K. (1991). The Psychology of Menu Selection: Designing Cognitive Control at the Human/Computer Interface. Ablex Publishing Corporation. ISBN 0-89391-553-X
http://www.lap.umd.edu/POMSFolder/pomsHome.html (full text)

Schneider, Daniel. Teaching & Learning with Internet Tools: A Position Paper, Appendix 1, Some learning theory background. Presented at the Workshop on "Teaching & Learning with the Web" at the First International Conference on the World-Wide Web, 1994 at CERN, Geneva. TECFA, FPSE, University of Geneva
http://tecfa.unige.ch/edu-comp/edu-ws94/contrib/schneider/schneide.book.html

Rocha, L.M. Eigenbehavior and Symbols. In: Systems Research, Vol. 12 No. 3, pp. 371-384, 1996, Special Issue Heinz von Foerster Festschrift, Ranulph Glanville (ed.).
http://ssie.binghamton.edu/~rocha/

Note: This article came about because of two disparate but synergistic areas that I was trying to resolve problems in. The first area was about finding a method that I could use to analyze the cultural use of Postmodernism so I could explain it. This inadvertently involved looking into classification methods. (Classification is something I'm not especially fond of because I think classification schemes are misused and overused. I really dislike labels.) The second area was information classification in literature. The literature problem arose because of the apparent failure of tables of contents in hypertext documents. Users tend to use the index instead, but indices are not a very good way to list information in online help, or in a book. I perceived this as a classification problem also. This led me to think that we writers need to find better techniques for classifying information, possibly using concepts to provide the proper focus.