10 August 2009

Narrative authoring vs. topic-based authoring: productivity gains and their causes

Moving from narrative to topic-based authoring should give you a productivity improvement of at least a factor of two, measured per writer per delivered document. This assumes that you're in a position to take most or all of the improvements as productivity increases, rather than needing or wanting to use them as an opportunity to improve information quality instead.

There are three main reasons for this.

Supported Complexity Management

Firstly, topic-based authoring supports better complexity management, reducing the non-writing overhead effort needed to produce effective writing.

Good Information Delivery And Complexity

Good information delivery can be said to be correct, complete, clear, and consistent: all the information is accurate, all necessary information is present, the information is expressed in an understandable way, and the way the information is expressed follows a common pattern across the whole information delivery, so familiarity with one part aids in understanding the other parts.

It's easier to do this using DITA XML and topic-based authoring, though it takes a bit of explanation to understand why it's easier.

Difficulty Scales with Complexity

The difficulty of managing the information associated with any task scales with the number of information groupings that have to be kept coherent (mutually up to date, so that if you change A you also change B, and vice versa), and it scales rapidly: if n is the number of information groupings, the difficulty goes up as n(n-1). This is just the number of one-direction paths from one information grouping to another.

Consider the case where your phone number is held in three places: a building phone list (name and phone number only), a company employee database that includes phone numbers, and a contacts database that also includes phone numbers. If your phone number changes, six checks have to be made to ensure that all three groups of information containing your phone number agree with each other:
building list -> employee database
building list -> contacts database
employee database -> building list
employee database -> contacts database
contacts database -> building list
contacts database -> employee database
This issue of coherency is why there's a strong trend to centralize databases as much as possible; the effort of keeping groups of databases up to date with each other quickly becomes overwhelming.
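The six checks above are exactly the ordered pairs of distinct groupings, so they can be enumerated mechanically. A minimal Python sketch using the standard library's itertools.permutations; the grouping names are just the labels from the example:

```python
from itertools import permutations

# The three information groupings that each hold a copy of the phone number.
groupings = ["building list", "employee database", "contacts database"]


def coherence_checks(groups):
    """Return every one-direction check (source -> target) between distinct groupings."""
    return list(permutations(groups, 2))


checks = coherence_checks(groupings)
for source, target in checks:
    print(f"{source} -> {target}")

print(len(checks))  # 3 * (3 - 1) = 6
```

The printed checks come out in the same order as the list above, and adding a fourth copy of the phone number would raise the count to 4 x 3 = 12.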

Keeping all four of correct, complete, clear, and consistent coherent with each other is 12 times as difficult (4 x (4 - 1) = 4 x 3 = 12) as producing writing where only one of those things is true, and six times as hard as achieving any two of them (2 x (2 - 1) = 2; two things are only twice as difficult as one thing). Three things at once are six times, four things 12 times, and five things 20 times as hard as updating one information grouping at a time. This kind of complexity-management overhead is often missed as a cost in document production, because those components of good information delivery aren't seen as distinct information groupings.
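The arithmetic in the paragraph above can be checked in a few lines of Python. This is a sketch of the n(n-1) rule as stated; treating a single grouping as difficulty 1 is an assumption made to match the "twice as difficult as one thing" comparison, since the formula itself gives 0 for n = 1 (a lone grouping has no paths to check).

```python
def coherence_paths(n: int) -> int:
    """Number of one-direction paths between n information groupings: n(n-1)."""
    return n * (n - 1)


def relative_difficulty(n: int) -> int:
    """Difficulty relative to keeping a single grouping right.

    Assumption: n = 1 is the baseline of 1, because n(n-1) is 0 there.
    """
    return 1 if n == 1 else coherence_paths(n)


for n in range(1, 6):
    print(n, relative_difficulty(n))  # 1 1, 2 2, 3 6, 4 12, 5 20

# All four qualities versus any two of them: 12 / 2 = 6 times as hard.
print(relative_difficulty(4) // relative_difficulty(2))  # 6
```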

Those components of good information delivery are distinct groupings; we've all seen an information delivery that was clear and consistent but neither complete nor correct, or one that was both complete and correct but far from clear, or one that was correct (so far as it went) but not complete.

Topic-Based Authoring in the Context of Supported Complexity Management

Topic-based authoring separates content from structure, and topics are small; this lets you work on one easily comprehensible piece of your information delivery at a time, and then worry about how to arrange the pieces for best effectiveness. If your content management system supports (as it should!) labelling topics with an information quality or process state label (e.g., "needs technical review", "waiting for stakeholder review", "knowledge transfer"), you can explicitly separate the work stages associated with the four elements of good information delivery. This is a little counter-intuitive, because everyone is used to staged production creating more work in a desktop publishing application, but it does work. You can divide up the work so that the people doing it only need to worry about their own piece of it: engineers can handle technical correctness and writers can handle clear language, a net win for all concerned. This also allows a more senior person to assign a list of topics to a junior writer; the junior writer doesn't have to worry about how things fit together into the information delivery, only about writing good topics to specific requirements.
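A minimal sketch of how state labels let work be divided by role, without anyone needing to look at the whole information delivery. The label strings come from the examples above; the Topic class, the role names, and the state-to-role mapping are hypothetical and not features of any particular CMS.

```python
from dataclasses import dataclass

# Process-state labels taken from the examples in the text, plus a finished state.
NEEDS_TECH_REVIEW = "needs technical review"
WAITING_STAKEHOLDER = "waiting for stakeholder review"
KNOWLEDGE_TRANSFER = "knowledge transfer"
FINISHED = "finished"

# Hypothetical mapping from each state to the role expected to act on it.
ROLE_FOR_STATE = {
    KNOWLEDGE_TRANSFER: "writer",
    NEEDS_TECH_REVIEW: "engineer",
    WAITING_STAKEHOLDER: "stakeholder",
}


@dataclass
class Topic:
    topic_id: str
    title: str
    state: str


def work_queue(topics, role):
    """Return the topics a role should work on, judged by state label alone."""
    return [t for t in topics if ROLE_FOR_STATE.get(t.state) == role]


topics = [
    Topic("t1", "Installing the client", NEEDS_TECH_REVIEW),
    Topic("t2", "Reading resistor colour codes", FINISHED),
    Topic("t3", "Configuring alerts", KNOWLEDGE_TRANSFER),
]

print([t.topic_id for t in work_queue(topics, "engineer")])  # ['t1']
print([t.topic_id for t in work_queue(topics, "writer")])    # ['t3']
```

Finished topics fall out of every queue automatically, which is what lets each person concentrate on their own piece.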

At my former employer, a contract technical writer with no previous experience of XML authoring of any kind produced a bit more than 500 topics in their first full month. While that writer is experienced and capable, much of the reason for that level of productivity is the ability to abstract all of the complications away into other roles and divisions of labour; with XML topics, a writer is free to simply write.

Automate Output Formatting

Secondly, manual formatting in a DTP application eats a lot of time. Before most technical writers will believe how much, you generally have to stand behind them with a stopwatch or, better, video them working for a couple of hours and go through the recording with a stopwatch. It comes to somewhere around 50% of their time, but because formatting is emotionally null time in almost all cases, it rarely registers as taking up much time or effort.

Semantic tagging with XML allows all the formatting to be done automatically; you feed in the contents of an information delivery marked up with semantic tags, such as <title>Introduction</title>, and the output processing figures out what to do with the content based on the semantic tagging and produces a shippable document for you.
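A toy illustration of the principle, using Python's standard xml.etree.ElementTree: the formatting decision is made from the tag name, not by the writer. The topic follows the basic shape of a DITA topic (topic, title, body, p), but the HTML mapping is an assumption for illustration only; production DITA output is normally generated by a dedicated toolchain such as the DITA Open Toolkit, not by code like this.

```python
import xml.etree.ElementTree as ET
from html import escape

# A minimal, simplified DITA-style topic: the writer supplies only semantic tags.
TOPIC_XML = """<topic id="intro">
  <title>Introduction</title>
  <body>
    <p>This guide explains how to install the product.</p>
    <p>Read it before you begin.</p>
  </body>
</topic>"""

# Formatting rules live here, once, instead of being applied by hand per document.
TAG_TO_HTML = {"title": "h1", "p": "p"}


def render(topic_xml: str) -> str:
    """Turn semantically tagged content into HTML based only on the tag names.

    Container elements (topic, body) are skipped, and inline markup and tail
    text are ignored in this sketch.
    """
    root = ET.fromstring(topic_xml)
    parts = []
    for element in root.iter():  # document order, depth first
        html_tag = TAG_TO_HTML.get(element.tag)
        if html_tag is not None:
            parts.append(f"<{html_tag}>{escape(element.text or '')}</{html_tag}>")
    return "\n".join(parts)


print(render(TOPIC_XML))
```

Changing the look of every title in every document means editing TAG_TO_HTML once, which is the "capital cost" point made below.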

Building the output processing takes effort, but that's a capital cost; once done, it keeps on replacing a lot of writer time thereafter. It's also a capital cost with good opportunities for leverage, since whether one writer or one hundred writers are going to use the output processing doesn't really affect the cost of building or maintaining it.

Automatic output formatting is an excellent example of one of the general benefits of process automation in information production: generally speaking, effort spent on a document dies with that document, while effort spent on the process persists.

Which is to say, all the work you do on a document you generally have to do again for the next document. Work done on the process automation is there for this document, and the next document, and the one after that, and the productivity benefit over a year's worth of document delivery is substantial.
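The "capital cost" argument reduces to simple break-even arithmetic. The sketch below is a hypothetical model, not data from this post: it assumes a one-off effort to build the output processing and a fixed amount of manual formatting time saved per document, and asks how many documents it takes to recover the build effort.

```python
import math


def break_even_documents(build_hours: float, hours_saved_per_document: float) -> int:
    """Smallest number of documents after which automated formatting has paid for itself.

    build_hours: one-off (capital) effort to build the output processing.
    hours_saved_per_document: manual formatting time no longer spent per
    document, summed across every writer who works on it.
    """
    if hours_saved_per_document <= 0:
        raise ValueError("automation must save some time per document to break even")
    return math.ceil(build_hours / hours_saved_per_document)


# Hypothetical figures, for illustration only.
print(break_even_documents(build_hours=400, hours_saved_per_document=30))  # 14
```

Because the build cost stays the same whether one writer or a hundred use the pipeline, while the savings recur with every document, more writers and more documents only bring the break-even point closer.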

Single Sourcing with Information Quality Labels

Thirdly, topic-based authoring is further up the content management stratigraphy than single-sourcing. So when you're using topic-based authoring, you should be getting substantial re-use out of your system by taking a single-source approach to your content.

Single-sourcing and re-use in general ought to drive increased productivity, but using topics with process stage or information quality state labels as content objects provides some extra benefits over single-sourcing with less well-defined content divisions.

If you have state labels, you've implemented a form of information abstraction: you don't need to look at the content of a topic, or know anything about its subject, in order to decide how you can use the topic in an information delivery. The most immediate benefit is that this enables an "only change what needs changing" approach to the content objects that make up your information delivery.

As a practical matter, it's immediately obvious to a writer looking at a list of possible topics, or a list of existing topics in a map, which topics are being worked on and which are finished. In the absence of a reason to change them, finished topics can stay finished across multiple content deliveries. This isn't practical with narrative authoring, where the component size of the content is larger, and where structural changes to any part of the content necessarily require formatting changes in the rest.

Since it's very common for information not to change over product life cycles (many software features are stable after being introduced, for example, as are most kinds of standard reference topic explaining things like how to use measuring cups or how to read resistor colour codes), you can, with a DITA CMS that knows about state labels, safely leave finished content finished across product life cycles. If content objects are unchanged in this version compared to the last shipped version, nothing about the way content deliveries are produced will compel you to open or change them. Conversely, if you only need to change one thing because of a late-arriving review, you can change just that one topic, leaving the rest of your content delivery in a finished state. This sort of granularity is only possible when, as in the DITA case, content deliveries are assembled from content objects by reference.
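A minimal sketch of "only change what needs changing", assuming the CMS records a content fingerprint (here a SHA-256 hash from Python's hashlib) for each topic when a version ships. The topic ids, their text, and the needs_work helper are hypothetical. Because the release map refers to topics by id, anything whose fingerprint is unchanged can stay in its finished state.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to tell whether a topic changed since it last shipped."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Fingerprints recorded when the previous version shipped (hypothetical topics).
shipped = {
    "measuring-cups": fingerprint("Fill the cup to the line, then level it off."),
    "install": fingerprint("Run the installer and accept the defaults."),
}

# Current topic text: "install" was edited after a review, "new-feature" is new.
current = {
    "measuring-cups": "Fill the cup to the line, then level it off.",
    "install": "Run the installer and choose a target folder.",
    "new-feature": "The reporting view summarizes weekly usage.",
}

# The release map assembles the delivery by referring to topic ids, not by copying text.
release_map = ["measuring-cups", "install", "new-feature"]


def needs_work(topic_id: str) -> bool:
    """True if the topic is new or its content differs from the shipped version."""
    previous = shipped.get(topic_id)
    return previous is None or previous != fingerprint(current[topic_id])


for topic_id in release_map:
    print(topic_id, "needs work" if needs_work(topic_id) else "stays finished")
```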

In order for this to work, the state labelling has to have automatic aspects; any change to a topic with a "finished" label has to alter the state label to one that indicates that the topic is being worked on. Generally, state changes can be automatic toward lesser information quality, and require human decisions to move them toward greater information quality. Automatic state changes to a lesser information quality level also need to affect referencing objects; you don't want a map to stay in the "finished, you can ship this" state when one of the topics referenced by that map has suddenly been changed to "requires technical review". Getting your CMS to manage these state changes usefully has some technically tricky aspects, but it prevents inadvertently shipping unfinished content in your information delivery.
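A minimal sketch of the rule just described: edits lower a topic's state automatically and push the lower state up to every map that references it, while raising a state requires a named human approver, and a map can never be promoted above its least-finished topic. The State values, class names, and methods are hypothetical, not the API of any real CMS; a real system would also have to handle nested maps and persistent storage.

```python
from enum import IntEnum


class State(IntEnum):
    """Hypothetical quality states, ordered from least to most finished."""
    IN_PROGRESS = 1
    REQUIRES_TECH_REVIEW = 2
    FINISHED = 3


class Topic:
    def __init__(self, topic_id: str, text: str, state: State = State.FINISHED):
        self.topic_id = topic_id
        self.text = text
        self.state = state
        self.maps = []  # maps that reference this topic, filled in by Map

    def edit(self, new_text: str) -> None:
        """Any change lowers the topic's state automatically, and the lower
        state propagates to every map that references the topic."""
        self.text = new_text
        self.state = State.IN_PROGRESS
        for ref_map in self.maps:
            ref_map.state = min(ref_map.state, self.state)

    def promote(self, new_state: State, approved_by: str) -> None:
        """Raising a state is a human decision, so a named approver is required."""
        if not approved_by:
            raise PermissionError("promotion requires a named human approver")
        if new_state <= self.state:
            raise ValueError("promotion must raise the state")
        self.state = new_state


class Map:
    def __init__(self, map_id: str, topics):
        self.map_id = map_id
        self.topics = list(topics)
        for topic in self.topics:
            topic.maps.append(self)
        # A map starts out no more finished than its least-finished topic.
        self.state = min((t.state for t in self.topics), default=State.FINISHED)

    def promote(self, new_state: State, approved_by: str) -> None:
        """A human may promote the map, but never above its least-finished topic."""
        if not approved_by:
            raise PermissionError("promotion requires a named human approver")
        ceiling = min((t.state for t in self.topics), default=State.FINISHED)
        if new_state > ceiling:
            raise ValueError("a map cannot be more finished than its topics")
        self.state = new_state


intro = Topic("intro", "Original text.")
guide = Map("user-guide", [intro])

intro.edit("Text changed after a late-arriving review.")
print(intro.state.name, guide.state.name)  # IN_PROGRESS IN_PROGRESS

intro.promote(State.FINISHED, approved_by="reviewer")
print(guide.state.name)  # IN_PROGRESS: the map still waits for a human decision

guide.promote(State.FINISHED, approved_by="lead writer")
print(guide.state.name)  # FINISHED
```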

This is effectively impossible with narrative authoring, because narrative authoring approaches do not support automatic assembly of a content delivery from small, independent pieces.


The partially supported complexity management, automatic formatting, and content re-use with state labels combine to drive the productivity benefits of using a well-designed DITA content management system. As noted at the beginning, you should expect at least a twofold productivity increase over narrative authoring approaches.
