Creating Classifications and Concepts
Literature, cognitive psychology
- How purpose affects the categorization of material
- Applicable research on categorization and concepts
- Summary of problems
- Improving searchability
- Conceptualizing information
This article on categorization came about because readers tend to use the index instead of the table of contents (TOC), both in hypertext documents (online help, the web) and in printed books, yet an index is not a very good way to list information in online help or in a book. I perceived this as a classification problem. This led me to think that we writers need to find better techniques for classifying information, possibly using concepts to provide the proper focus.
The big question for HTML (especially HTML help) is what would be the best thing to put in the left panel that would work well for locating (and of course navigating to) information? Loosely, that's indexing - but obviously not an "index." A TOC is one way to index information. Other ways include icons, page headings, putting material in alphabetical order (dictionary and encyclopedia), lists of tables or illustrations, chapter contents, thumb tabs, and creating an index at the end.
We have a lot of conventions in the book world for "indexing" information. Those conventions work for the medium. Dictionaries, encyclopedias, and telephone books are much easier to use by being indexed alphabetically - a TOC is not useful for the main information in these books.
Hypertext has created other ways of indexing information, such as expanding hierarchies, dynamic links, graphs, and search engines. Hypertext has also brought us closer to the user - we hear more about their frustrations and their needs. The need that I am clearly hearing is tremendous user frustration over not being able to find information, and I see now (partly from my own frustration) that this goes back to books as well.
This article is about classification in literature. I'm not an expert in any of these fields, and this article is intended to be a starting point for further investigation, and for a dialogue with others in the writing and publishing fields. Usability studies and reader feedback are needed to confirm (or disprove) the ideas presented here.
My conclusion is that there are better ways of conceptualizing and classifying information. Basic research in cognitive psychology and in linguistics has created a body of knowledge that should be illuminating and helpful. Those areas of basic research haven't reached final conclusions yet, and might not for 5, 25, 50, or 100 years, but they do present some evidence that is useful. What I think we tend to do is classify information without regard for the reader's purpose.
For example, in the publishing world, books are written by authors with the idea in mind: "What is the thing that makes this book different from other books?" The focus is on that idea because they want to convince the publisher to buy it.
Publishers come at the book from a marketing point of view: "What will get the public to buy the book?" Notice that the emphasis of both is on the sale of the book, not what the buyer wants, although it may reflect what the buyer wants. So what you get is a steady succession of books that incrementally inform the buyer, and he has to buy many books to get good coverage of the entire area.
This isn't all bad - the buyer gets multiple points of view and also keeps up with the field. But what the buyer wants is not a consideration, except to manipulate him into buying book after book.
On the other hand, the buyer knows not to judge a book by its cover and looks in the index to find out what is really in it. The authors and publishers also know this, so they follow the rule of thumb that a good index sells the book. Thus many books index nearly every word. You can't tell a book by its index. (This is one reason why I like Internet publishing - I write material in the cross-section of what interests me and what I think visitors want to read, and I monitor the number of visits to pages and quit writing those types if they don't get visits.)
Tables of contents (TOCs) in reference materials have several problems:
- TOCs are general, and users are typically looking for specific information. This means readers can't find specific locations and must read the entire chapter where information might possibly be located.
- TOCs represent the writer's mental model of the material, but not necessarily the reader's way of classifying.
- The purpose the writer had in mind for the material may be entirely different from the purpose of the reader.
- TOCs are typically in hierarchical format, which forces the reader to guess through several levels of classification, which is frustrating.
The following sections will cover these topics:
- How purpose affects the categorization of material
- Applicable research on categorization and concepts
- Using meta-concepts to provide focus
How purpose affects the categorization of material
Writing is about communicating ideas. When a writer creates a document he tries to do the best he can to communicate the ideas he thinks are necessary for the reader. He then creates an organization scheme (TOC) that he thinks represents what he wants to tell the reader. Often the writer has experience and knows a great pattern that supposedly works. The reader then turns to the index. Why?
Because the writer and the reader have two different perspectives on the information. Following are examples of six different ways to categorize information on the topic Garden Weed Control:
[Table: ways to categorize Garden Weed Control. Only fragments survive: "Last resort chemicals," "Last ditch efforts," "Time of year" (May - July), "What do I want to do?"]
In a typical table of contents for online documents, items are buried in a hierarchy where the reader can't find them. In many books, there are so many items that they can't all fit in the TOC.
TOCs present information that is abstract. The user must think from the specific information that he wants to some abstract classification that it might be under. Any piece of information can be classified many different ways, so if there are six to a hundred different potential categories, the reader may have to check all of them.
What do users prefer? The two pane screen with the contents continuously displayed in the left pane (persistent) appears to be the user preference. The persistent display has the advantage of providing context - the reader always has a sense of where he is in the document. This sense of place or context is very helpful to readers, particularly in hypertext where it is easy to get disoriented, but it has its precedent in books, which show the chapter and section in the header and footer.
The following things can help bridge the gap between writer and reader:
- Readers prefer lists to complicated hierarchies.
- The larger environment (computer world, etc.) provides context for the reader's mental model.
- The task the user is involved in will dictate the information he wants.
- Conceptualize the information.
- If hierarchies must be used, use the most revealing headings.
After the section on relevant theory, I will explain using meta-concepts to focus the table of contents.
Applicable research on categorization and concepts
Searching for information and processing information
Searching for information and processing information are not only different activities; they have been shown to involve quite different cognitive operations, and categorizing can get in the way of searching. According to Kearsley, 1993, these activities have the following characteristics:
- Searching for or receiving information - detects, observes, inspects, identifies, reads, surveys
- Processing information - categorizes, calculates, codes, itemizes, tabulates, translates
Categorizing is involved in processing information, but not in searching for it. So if the person is forced to go through abstract hierarchies to locate information, this adds to the mental load. The task of finding information becomes all the more daunting and frustrating if the person has to stop and think about the different ways the information could be categorized.
What does this mean? We should put items in lists instead of hierarchies as much as possible. When it is necessary to create a hierarchy, make the title very obvious with regard to the type of information that would be found within it.
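As a rough illustration of turning a hierarchy into a list, here is a minimal Python sketch. It assumes the table of contents is held as a nested dictionary. The `toc` structure, the `flatten` function, and the "Printing" entries are hypothetical and invented for this example; the "Connecting" topics borrow names used later in this article. Each leaf topic becomes one list line that carries its parent heading with it, so the reader scans a single list instead of guessing which branch to open.

```python
from typing import Iterator

# Hypothetical nested table of contents: each key is a heading, and each
# value is either a dict of sub-headings or None for a leaf topic.
toc = {
    "Connecting": {
        "Selecting your modem": None,
        "Setting the communications protocol": None,
        "Selecting the user profile": None,
    },
    "Printing": {
        "Choosing a printer": None,
    },
}


def flatten(tree: dict, path: tuple = ()) -> Iterator[str]:
    """Yield one line per leaf topic, prefixed with its parent headings."""
    for heading, children in tree.items():
        if children:
            # A heading with sub-headings: descend, remembering where we are.
            yield from flatten(children, path + (heading,))
        else:
            # A leaf topic: emit it with its full, revealing label.
            yield ": ".join(path + (heading,))


flat_list = sorted(flatten(toc))
for entry in flat_list:
    print(entry)
```

Whether a flat list like this actually helps depends on its length. With hundreds of entries it runs into the same unwieldiness as a long index, so this is a sketch of the idea rather than a recommendation for every document.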
How do we categorize information?
Categorizing things is fundamental to humans. It starts from the first moments of life. The world is a continuum, and breaking it down into concepts and categories is how we make sense of it in bite-size chunks.
Is there anything natural about the way we categorize information? Yes, very early studies in the field by Eleanor Rosch showed that colors are named in the same sequence in different cultures. It seems to have to do with the development of the central nervous system. Tellingly, though, similar results were not found for other kinds of things.
It seems the very basics of categorization begin early in life and are a neurological function, but there the similarities between us nearly end. The brain forms what researchers call "attractors," which, like nails on a pegboard, gather very basic, similar information to them. All coats get hung on one peg, but the information is much more basic than a coat.
The brain seems to form structures that mirror the external environment, but in a way that categorizes things. This process seems to proceed at first by visual similarities. All coats get hung on the same peg, people on another, and all the typical things in our environment get their own peg.
This process (eigenvalues, prototyping, family resemblance) has been studied by many researchers, and the results are consistent. But there is evidence of other types of categorization, currently being studied, that depart from the neurological level.
To understand this shift, it helps to understand that this pegging system is both stable and dynamic. Eigenvalues form a stable system for us to categorize things - the categories are stable - but if our information about a category changes, the category changes to reflect the change. But it is thought that this self-organizing system of ours can only grasp supportive aspects of things in our environment, and only provides a small subset of possible classifications. We could be stuck at this point were it not for other mechanisms.
One important point to note is that all of this classification comes from our interaction with our environment. If we don't interact with the environment, we have nothing to classify. And just as importantly, the environment serves as a focus for classification. People working in a lumber yard have a lumber orientation - they might be likely to classify things by the wood they are made from. People working with accounting are likely to be in a mathematical frame of mind. People working with communications programs are likely to come at information from that point of view. The environment is very influential on how we categorize.
So when someone is working with a computer program and needs help, what is his environment? A computer environment. A program that does a specific thing. A task within the program. These environmental factors all shape how the person is classifying information.
There are two mechanisms, currently being researched, that shed more light on how we categorize. These mechanisms have support from research results. One is an extension of the neurological model called "dualism." This theory sees categorization as a two step process. First things are categorized by similarity. Then they are further clarified by "definitions." So in this we can see that we first see similarities, and then see differences. We define things by how they are different from other things. One illustration I often use is the following line:
What we take note of on the line is not the individual dots, but instead the lone !. I suppose we quickly categorize the dots as a line of dots and then we're looking for what is different. This seems to hold true in many aspects of life. Compare with the following line:
The mind very quickly assimilates the new, more complex pattern and overlooks the similarity to spot the !.
Even in the following pattern:
the mind catches the ! in the middle. And in the second and third lines we can do pattern matching - the ! occurs in the middle of each and we quickly catch the pattern.
But as patterns become more complex and unfamiliar, it becomes more difficult to see the different item:
In the line above we have to stop and analyze the segments of four, note what is different, and compare to the next segment to see just how different it is. The letter T is in the middle, which doesn't occur in the rest of the line. The mental load gets much bigger. When presented with an index or a TOC, the mind has to try to pattern match the words it is looking for while also trying to decipher how the writer chose to label and categorize the subject.
We can ask ourselves about a reader, "Has he reached the point yet of making fine distinctions about a subject, or about a computer program, or is he still categorizing things that are similar?"
The other mechanism currently under research is where the fields of cognitive psychology and linguistics finally meet - symbols. The fact that our neurological categorizing system is dynamic (meaning it can change, is not set in concrete) makes it possible for higher order categorizing and creating new constructions.
We can use symbols to unite categories. Words are symbols. The word chair is just a word that we can identify with the object chair. The word chair has no natural meaning to us. There are many words for chair in many languages, and even in our own language we have other words to define what a chair is: rocker, recliner, sofa, couch, futon, seat, etc. We may understand these all conceptually as fitting in the category seat or chair. But we define them by their characteristics. So the word chair is a symbol, and a concept that can bridge the categories of sitting, and of furniture that is similar.
In the case of a table of contents, what symbols are best used to categorize information? Probably words that don't require a mental leap to knowing specific definitions, but are not so abstract that too many categories are covered; the best words are close to the conceptual category. So not rocker, not mobile furniture, but chair.
On the other hand, people seem to prefer lists of information (the index) to hierarchies. We should avoid making more out of this than it is. People get very frustrated with hierarchies because they can't find the specific information that they want. Lists are very specific - they have information that is much closer to the definition level - and they are very long. The writer's categorization scheme is also likely to have influenced the index listings. So there are hidden hierarchies in an index.
Summary of problems
So we need to find a better method than lists and a better method than tables of contents. We also need to keep in mind that as readers become more experienced, their needs change and the delivery system needs to adapt to their needs. The adaptation should be as seamless as possible. The best system of categorizing would be one that meets the needs of both inexperienced and expert users.
Following are the most significant problems with classifying information:
- The writer's classification scheme will typically depend on past experience, a very definitive knowledge of the environment, and expert knowledge of the program or subject.
- The reader's classification scheme will typically depend on minimal to no past experience, a very poor mental model of the environment, and minimal knowledge of the program or subject.
- Classification by the reader may not have reached the stage of definitions.
- Searching for information is a different process than categorizing, so having to categorize is an extra mental load on a person who may already be frustrated.
- Readers are often involved in a specific task, so they don't want general information - and general is how TOCs are typically organized.
- Lists are very specific, yet they are still shaped by a hidden hierarchy (the writer's categorizing scheme) that the reader may not be able to decipher, and they are so long they are unwieldy.
We should keep in mind that about half of readers will go directly to a search feature if it is available. Search is the ultimate "say it my way" feature.
Improving searchability
In all of the information I have looked at, and the conclusions I have drawn, one theme seems to stand out: contrast. Or you can call it definition. That is, making information locatable by giving it sufficient contrast from other information. This is a matter of being able to see what is different in a group of familiar items. That seems to be what the mind does - it quickly categorizes similar information and looks for what is different. As the level of complexity increases, we need to increase the level of definition. The mind needs to be able to contrast one item with another. This affects information design at all levels.
There are other things to keep in mind. The mind also needs to see things in context (similar information together), which can also be described as using environmental cues (the computer environment, the program environment, the process environment, the task environment). Following are some suggestions about improving different kinds of information, including some other good suggestions from other usability research.
- Use consistent wording (similarity) with easily distinguishable definers (contrast).
- Visually distinct words are easier to find.
- Semantically distinct words are easier to find.
- Categorize or define by the reader's purpose (environment, mental model).
- If a categorizing or defining word is well known, the user will use visual matching, which is very fast.
To create contrast, create definers to distinguish text. Some ways to do this are to:
- Use bold leads and headings for ease of scanning (contrast), and to establish context. (A lead as I'm using it here is a bold word or sentence that defines or identifies the text that follows it.)
- Use short bulleted lists.
- Use background images for context.
Areas for further testing or usability
Everything I have put in this article really should be tested or proven in usability testing. (Not that it can't be of benefit without testing, but I wouldn't want to see this information as misapplied as the chunking principle.) Several ideas of my own, and the available literature (especially the book The Psychology of Menu Selection: Designing Cognitive Control at the Human/Computer Interface, by Kent L. Norman, Ablex Publishing Corporation, 1991), suggest several areas that need further investigation and testing to determine more usable information design categorizing structures:
- How do you add semantic contrast (definition) to items so they can easily be located?
- Incorporate the hierarchy structure into the listing semantically. For example, using verbs, gerunds, etc. to identify types of information and tasks. But only if the user can easily discern the principle.
- Use a metaphor for categorizing. For example, menu (restaurant listing). Metaphors transfer knowledge about ways of categorizing, making the information design easier to grasp. (Menu is currently not a good selection in the computer world because it is already known in the computer environment.)
- Index information using a graph structure. Graph structures show nodes that are connected by arcs to indicate relationships. The problem here is the amount of space required for showing complex relationships. It doesn't take many lines on a page to obscure the nodes.
- Mouse-over display traces. Traces indicate relationships (as in graphs), but only appear while a mouse is over a node. This is an instant cue whether the information is in the correct category.
- Mouse-over expands hierarchy. A system that can respond rapidly can quickly display a full information hierarchy so the user can quickly determine if he is in the correct hierarchy.
- Mouse-over selective group display. In large lists, when the mouse is over larger groups of information, information that is related can be automatically highlighted (black or red).
- User constructed categories. This is similar to bookmarking. Information that a user thinks he may need again, or is related to a specific process he frequently uses, and doesn't want to look for again, can be placed in a category created by the user.
- Conceptualize information (the subject of the next section)
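The graph-structure and mouse-over trace ideas in the list above could be prototyped along the following lines. This is only a sketch under stated assumptions: the topic names, the `arcs` list, and the `build_graph` and `traces` functions are hypothetical ("Modem speed" in particular is invented), and no real help system or user-interface toolkit is implied. It stores the index as an undirected graph and returns the topics directly linked to the one under the pointer, which is the set a mouse-over trace would highlight.

```python
from collections import defaultdict

# Hypothetical index graph: each pair is an arc between two related topics.
arcs = [
    ("Connecting", "Selecting your modem"),
    ("Connecting", "Setting the communications protocol"),
    ("Connecting", "Selecting the user profile"),
    ("Selecting your modem", "Modem speed"),
]


def build_graph(pairs):
    """Build an undirected adjacency map from (topic, topic) arcs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def traces(graph, node):
    """Return the topics directly linked to `node`, sorted for display.

    This stands in for what a mouse-over trace might highlight.
    Unknown topics simply return an empty list.
    """
    return sorted(graph.get(node, ()))


graph = build_graph(arcs)
print(traces(graph, "Selecting your modem"))
```

Showing only the arcs of the node under the pointer sidesteps the space problem noted above, since the full set of lines is never drawn at once. Whether readers find this easier than a list is exactly the kind of question that needs usability testing.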
Conceptualizing information
There are a number of things that can be done to improve categorization so that it works for the reader:
Organize information in conceptual units so that related information (part of one environment) can be found in a corresponding literary environment. For example, in a book about creating a hit record, a unit on recording a song would include conceptual information about the process of recording; task information about setting up the room acoustics, placing microphones, using sound filters that cut down on noise and remove unwanted voice qualities, controlling the volume so it doesn't overmodulate, and mixing the instruments and voice; and procedural information for starting the tape recorder, cutting the tape, rewinding the recorder, and so on. This is a conceptual unit, and it contains everything about making a recording.
Recording is a category broad enough to cover all the processes, tasks, and procedures. Recording is descriptive, not abstract.
Conceptualize what the reader's purpose is. Does he want to know more about a process? Does he want to know about the tasks involved in a process? Does he want to know the steps in a procedure? Categorize according to the concepts. Personally I like to conceptualize primarily on the How to level. Whether it is a process, a task, or a procedure, the reader usually wants to know how to do something. A litmus test I use for information is, "How does this benefit me?" If there is no specific benefit to the information, then it is wasted space.
Categorize for the conceptual level, not the definitive level. Most people who need help are thinking in conceptual terms, not by precise definitions.
Using meta-concepts to focus material
I frequently use meta-concepts in writing (usually just called concepts). They are useful for bringing focus to organizations (mission statements) and to stories. They basically state what something is about, and I refine them to specify the things that are important to the organization or story. For writing non-fiction, a concept might go like this:
"The purpose of this category is to serve the reader's purpose for using the information, at the level the reader conceptualizes information, and in common terms used in the environment the reader is working in, understanding that the reader is searching (not categorizing), and putting information in conceptual units."
Learning or just following steps?
Are people going to learn tasks, or just go through numbered steps to accomplish the task? That is a very active debate in the technical writing community. When the reader gets to the point of asking for help in accomplishing the task, he is frustrated with the computer program and may be mentally disoriented. He wants to get the task accomplished. It may take three steps, or it may take fifteen. Readers have a preference for steps instead of conceptual material. The trend is to give bare procedures that get the task accomplished. The fact is, people don't learn very quickly from steps. So the next time they have to do the task, they will go through the same frustration.
The writer needs to take a hard look at the task and decide if it is a frequently recurring task and if the information needed to complete the task is conceptual in nature, or is so complex that it requires careful step by step direction. (I see very few step by step procedures that require specific direction.)
If it is a frequently occurring task, the writer can best serve the user by giving instructions that work but that also help him learn. This can be as simple as giving him the option of selecting a procedure or an overview of the task. If he doesn't understand the task, then following steps is a slow way to learn. For example, if he is using a communication program and needs to connect, connecting is a task that may involve several procedures. A topic that simply says:
- Selecting your modem (see related procedure)
- Setting the communications protocol (see related procedure)
- Selecting the user profile (see related procedure)
is a much less complicated approach and much less frustrating than letting him discover the need to do all these procedures by trial and error. And it is a very simple topic.
The Selecting Your Modem topic can simply say: "Select the type of modem you have and the maximum speed of your modem." Notes can give related information. Creating three or more steps only ensures the reader will have to come back.
Other programs, such as paint programs and backup and compression programs, have routine tasks that are very complicated, but these often go without any explanation of what is involved.
Following are some tips learned from learning theory about successful learning:
- Instruction should be designed to mentally engage the user in the experience, and encourage them to fill in the gaps beyond the information that is given. (Bruner, 66)
- People learn large amounts of information from text (as in textbooks), as opposed to rote (procedures). (Ausubel, 63)
- Procedural knowledge is learned by making inferences from already existing factual knowledge. (Anderson, 87)
Simply preceding steps with a few lines of explanatory text may be all the information the user needs - they can forget the steps - they have learned the procedure. Minimalist designs (which I basically agree with) and usability studies don't ask the question about the user coming back and user frustration - they only look at getting the user through the task as conveniently as possible.
These articles are all available on the Internet. They typically include large bibliographies of their own. Not all of these articles were used directly, but are included for general interest and further reading. I especially recommend Practical problems and proposed solutions in designing action-centered documentation, because of its user-centered approach (which I think is the best approach), and The Psychology of Menu Selection, which is a very thorough work that seems well investigated.
Note: Most Internet links are quickly outdated. If the information is not at the link indicated, use a search engine to find it or similar information.
Classification Society of North America: http://www.pitt.edu/~csna/csna.html
Draper, S. (1996). Practical problems and proposed solutions in designing action-centered documentation. Department of Psychology, University of Glasgow. http://staff.psy.gla.ac.uk/~steve/
Harnad, S. (1987) Psychophysical and cognitive aspects of categorical perception: A critical overview, Chapter 1 of:
Harnad, S. (ed.) (1987) Categorical Perception: The Groundwork of Cognition. New York: Cambridge University Press.
(Harnad, S. - Group at University of Southampton) A Hybrid Framework for Categorization - Thesis - see Harnad, S. University of Southampton, Highfield, Southampton, United Kingdom
Harnad, Stevan (2002) Symbol grounding and the origin of language. (On categorization.) http://eprints.ecs.soton.ac.uk/archive/00006471/01/harnad02.symlang.html
Harrison, Claire (2002) Hypertext Links: Whither Thou Goest, and Why. firstmonday.org, issue7_10
Henry, C., and Rocha, L.M. Language Theory: Consensual Selection of Dynamics. In: Cybernetics and Systems: An International Journal. Vol. 27, pp. 541-553. (Also see Rocha, L.M.)
Norman, K. (1991). The Psychology of Menu Selection: Designing Cognitive Control at the Human/Computer Interface. Ablex Publishing Corporation. ISBN 0-89391-553-X http://www.lap.umd.edu/POMSFolder/pomsHome.html (full text)
Pustejovsky, James [no date]. Models of Lexical Meaning. (An overview of his research.)
Schneider, Daniel. Teaching & Learning with Internet Tools: A Position Paper, Appendix 1, Some learning theory background. Presented at the Workshop on "Teaching & Learning with the Web" at the First International Conference on the World-Wide Web, 1994 at CERN, Geneva. TECFA, FPSE, University of Geneva
Rocha, L.M. Eigenbehavior and Symbols. In: Systems Research, Vol. 12 No. 3, pp. 371-384, 1996, Special Issue Heinz von Foerster Festschrift, Ranulph Glanville (ed.).
Nielsen, J. (1997). How Users Read on the Web. Alertbox. http://www.useit.com/alertbox/9710a.html
Nielsen, J. (1997). Search and You May Find. Alertbox. http://www.useit.com/alertbox/9707b.html
Flatus & Inflatus (Gas & Inspiration)
And this is why journalists misquote speakers.
With all of the things we have to take into account to understand each other, it is a wonder that we can communicate at all. I was never very good with foreign languages. I took French in high school, but failed at asking a Frenchman what time it was. I took Ancient Greek in college, but can barely read it. I don't try to interpret it, leaving that to experts who love to disagree. At least in their disagreement I see my own naiveté.
At the risk of being too wordy, I make my e-mails too long, making sure that word meanings aren't left to arbitrary decisions and oversights. At the risk of being too brief, I use words that are packed with meaning, knowing that some will be forced to either ignore them or look them up. I do this rather than reduce the use of our language to the lowest common denominator and deny us all opportunities for growth.
But regardless of our personal perspectives, the true test of all speaking and writing is, "Did it communicate effectively?" The rest is just style and venue.
Chapter 10 Footnotes
1. The American Heritage® Dictionary of the English Language, Third Edition copyright © 1992 by Houghton Mifflin Company. Electronic version licensed from INSO Corporation. All rights reserved.
2. The American Heritage® Dictionary of the English Language, Third Edition copyright © 1992 by Houghton Mifflin Company. Electronic version licensed from INSO Corporation. All rights reserved.
3. The American Heritage® Dictionary of the English Language, Third Edition copyright © 1992 by Houghton Mifflin Company. Electronic version licensed from INSO Corporation. All rights reserved.
4. Eco, Umberto. 1992. Interpretation and Overinterpretation. Cambridge University Press.
5. Cole, Dorian Scott, The Last Prophet, 1979. (Includes research on Biblical symbols. Used for reference only; not commercially available.)
Other distribution restrictions: None
Page URL: http://www.visualwriter.com/ReadFun/WhatsInAWordPartIV.htm