20 August 2009

Concept, Task, and Reference

Structured Authoring With DITA

Structured authoring is, fundamentally, about consistent organization of content. When authoring in DITA, it is the writing use (as distinct from the output generation use) of the general semantic tagging capability of XML markup.

With DITA, there are at least three levels of structure: the organization of topics, the organization of the content of topics, and the external objects imported into topics, such as images. I'm addressing only the "content of topics" part of structured authoring with DITA in this post.

The point to having topic types, and thus the point to DITA including topic types as part of the core specification of the XML vocabulary, is to be able to separate different information objectives into different topic types. Different types of topics enable their audience in different ways.

Concept, task, and reference are specialized, and in terms of internal structure, much more specific, compared to the generic DITA topic type. The problem is that they remain general, and in any group of writers larger than one, the reflexive understanding of what these topic types obviously mean will vary.

So as you implement your DITA solution, you will have to decide precisely how you want the individual topic types to be used, make sure the whole writing team knows the local topic information type definitions, and be prepared to reconsider those definitions if you don't like how they work out in practice.

Generic Topic

DITA includes a completely generic topic type; the root element is <topic/>, with class[1] "- topic/topic".
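
As a minimal sketch of what a generic topic looks like on disk, assuming the standard OASIS topic DTD is available to your editor (the id, title, and text are made up for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<!-- Hypothetical example content; only the element names come from DITA. -->
<topic id="generic_example">
  <title>About the widget</title>
  <body>
    <!-- The generic body accepts paragraphs, lists, sections, and examples. -->
    <p>The widget does several things.</p>
    <section>
      <title>More about the widget</title>
      <p>Nothing in the markup says what kind of information this is.</p>
    </section>
  </body>
</topic>
```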

There are two problems with saying "well, simple is good; generic means we should be able to handle future surprise easily" and going with the generic DITA topic for your writing.

The first problem is that the generic topic is a little too generic; writing within the full scope of the generic topic is not obviously structured writing. This gets rid of the "simple" part of the advantage, and much of the "generic" part as well; you would have to develop specific local business rules and processes about how you would use the generic topic type in order to practise structured writing by using it.[2]

The second problem is that if all your topics are just "topic", you can't do information typing other than by metadata. In the unlikely event that it was designed to handle this case, a DITA CMS might make this straightforward, but DITA was designed on the assumption that you don't necessarily have a CMS, and so were the content management systems meant to support it. DITA allows for information typing by providing three specialized topic types.

These types are the concept topic, task topic, and reference topic.


Concept topics are the least structured and most general of the three default topic types.

From a writing perspective, concept topics contain both paragraph-level elements, such as paragraphs and lists, and section-level elements: <section/>, <example/>, and <table/>.[3] Concept topics are general enough to present pretty much any kind of information.

Concept topics are rare in most technical writing infosets. They are concerned with theory, abstract information, and ideas; with the general rather than the specific.

I would recommend considering a rule that concept topics are used for those cases where no quantified information or instructions are being provided, and that those responsible for tagging style watch very carefully for those cases where there ought to be quantified information. Pure theory, in other words; mathematical proofs, definitions of capacitance, SI units, and economic value are all examples of the type of thing that goes in a concept topic.
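
A concept topic written under that rule might look like the following sketch. The content is illustrative only, and the example assumes the standard OASIS concept DTD; note that paragraph-level elements come before any sections inside <conbody/>.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<!-- Hypothetical example: pure definition, no instructions, no measured values. -->
<concept id="capacitance">
  <title>Capacitance</title>
  <conbody>
    <p>Capacitance is the ability of a body to store electric charge.</p>
    <section>
      <title>Unit of measure</title>
      <p>The SI unit of capacitance is the farad.</p>
    </section>
  </conbody>
</concept>
```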


Task topics are the most structured and least general of the three default topic types.

From a writing perspective, tasks contain a number of specialized section-level elements: <prereq/> ("pre-requisites"), <context/>, <steps/>, <result/>, <example/>, and <postreq/> ("post-requisites"). Task topics are specialized for presenting sequential instructions. Task topics are not suitable for presenting anything else.

Task topics are common in most technical writing infosets. Everywhere you have a procedure or instructions, you use one or more task topics.

I would recommend considering a rule that all instructions are a task; this is the easy part. The hard part is agreeing on how much content, and what kind, goes in the <context/> element, and how task steps are to be broken down.

Context is intended to provide context for the task ("you brush your cat to cut down on hairballs and shedding") or a small amount of conceptual information ("This experiment allows you to measure the acceleration due to gravity. Remember that acceleration is defined as the rate of change in velocity, measured in meters per second per second.")
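
To show where <context/> sits among the other task elements, here is a sketch of a complete task topic built around the cat-brushing context above. Everything other than the context sentence is invented for illustration, and the example assumes the standard OASIS task DTD, which expects the section-level elements in the order shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<!-- Hypothetical example content; element order follows the task DTD. -->
<task id="brush_cat">
  <title>Brushing your cat</title>
  <taskbody>
    <prereq>You need a soft-bristled brush.</prereq>
    <context>You brush your cat to cut down on hairballs and shedding.</context>
    <steps>
      <step><cmd>Settle the cat on a stable surface.</cmd></step>
      <step><cmd>Brush from head to tail, following the direction of the fur.</cmd></step>
      <step><cmd>Remove the collected fur from the brush.</cmd></step>
    </steps>
    <result>Loose fur ends up in the brush rather than on the furniture.</result>
    <example>A long-haired cat may need a second pass with the brush.</example>
    <postreq>Put the brush away where the cat cannot reach it.</postreq>
  </taskbody>
</task>
```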

Task step (or sub-step) breakdown depends very much on the house style, as well as the type of information to be presented. There are arguments for and against presenting complex instructions as a single large task with sub-steps or as a sequence of individual tasks.

I would recommend that the number of steps in a task, or sub-steps in a step, not go above five. Whether this is best handled with sub-steps (and treating "log on to the service" as a step with sub-steps for the details) or breaking the procedure into multiple task topics is a function of the complexity of the material. More complex instructions benefit from being chunked into discrete topics more than relatively simple material with a large number of specific operations.
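
The sub-step option might be sketched like this, treating "log on to the service" as a single step whose details live in <substeps/>. This is a fragment of a task body rather than a whole topic, and the individual commands are assumptions made up for the example.

```xml
<!-- Fragment: belongs inside <taskbody/>. Commands are hypothetical. -->
<steps>
  <step>
    <cmd>Log on to the service.</cmd>
    <substeps>
      <substep><cmd>Open the sign-in page.</cmd></substep>
      <substep><cmd>Enter your user name and password.</cmd></substep>
      <substep><cmd>Select Sign in.</cmd></substep>
    </substeps>
  </step>
  <step>
    <cmd>Open the report you want to print.</cmd>
  </step>
</steps>
```

The alternative is to move the three sub-steps into their own "Logging on to the service" task topic and have the parent procedure refer to it.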

While keeping everything to five or fewer steps is not always achievable, it's well-attested that people can keep track of seven things at once, plus or minus two. Since it's unlikely documentation will get all of someone's available attention—remembering to pick the kids up from daycare is, or ought to be, difficult to displace—five is sometimes too large a number.


Reference topics are only slightly more structured and less general than concept topics.

From a writing perspective, references (in the <refbody/> element) contain section-level elements: <section/>, <example/>, and <table/>.[3] They do not directly contain paragraph-level elements, which is the primary structural difference between a reference topic and a concept topic.

Reference topics are common in most technical writing infosets. Reference topics present information intended for quick look-up of specific facts. While you would generally have to read all of a concept topic or task topic to make use of it, it is common for someone to read a reference topic solely to find a single fact, and to stop once they have done so.

I would recommend a rule that reference topics must contain content that is specific and quantified; always "five apples" rather than "some apples". Aside from making the distinction between the information function of the concept and reference topic types clearer to both the writing team and the audience, there is value in terms of content quality in insisting on being specific in a reference topic. Removing qualifiers as much as possible and presenting quantified facts makes the technical reviewers' jobs easier, pushes the writer toward precision, and makes it easier for the audience to find that single fact they might be looking for.
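
A reference topic following that rule could look like the sketch below. The crate and its contents are invented for illustration, and the example assumes the standard OASIS reference DTD. Note that nothing sits directly inside <refbody/> except section-level elements and the table.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd">
<!-- Hypothetical example: every statement is specific and quantified. -->
<reference id="fruit_crate_contents">
  <title>Fruit crate contents</title>
  <refbody>
    <section>
      <title>Standard crate</title>
      <p>Each standard crate holds five apples.</p>
    </section>
    <table>
      <tgroup cols="2">
        <thead>
          <row><entry>Item</entry><entry>Quantity</entry></row>
        </thead>
        <tbody>
          <row><entry>Apples</entry><entry>5</entry></row>
        </tbody>
      </tgroup>
    </table>
  </refbody>
</reference>
```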

[1] The class attribute, often abbreviated "@class" after XPath syntax, is how processing figures out what to do with a particular element. Values of @class are provided by the DITA DTDs during XML normalization; users need not, and should not, ever be entering the class attribute manually. This means that you might spend years writing documentation with DITA using raw XML and never see the class attribute.

[2] You can, of course, specialize your own topic types starting from the generic topic type. Specialization is another subject for another time.

[3] This is a simplification, leaving out special-purpose elements like <data/> and <foreign/>. By all means, see the DITA language specification for details.
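
As an illustration of note [1], this is roughly what a small concept topic looks like after a parser has applied the defaults from the standard DITA DTDs. Other defaulted attributes are left out of this sketch for brevity, and the content is hypothetical. Each class value lists the element's ancestry from the generic topic vocabulary down to the specialized one, which is how processing can treat a <concept/> as a kind of <topic/>.

```xml
<!-- What the author types: -->
<concept id="capacitance">
  <title>Capacitance</title>
  <conbody>
    <p>Capacitance is the ability of a body to store electric charge.</p>
  </conbody>
</concept>

<!-- What processing sees once the DTD defaults are applied: -->
<concept id="capacitance" class="- topic/topic concept/concept ">
  <title class="- topic/title ">Capacitance</title>
  <conbody class="- topic/body concept/conbody ">
    <p class="- topic/p ">Capacitance is the ability of a body to store electric charge.</p>
  </conbody>
</concept>
```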
