12 August 2009

Narrative Authoring Vs. Topic-Based Authoring, Part 2: Everything is Trade-offs

In my previous DITA post, I discussed the productivity benefits you get from moving from narrative authoring with some kind of desktop publishing software to topic-based authoring with DITA XML. Those benefits are substantial, but this being the actual world, and not the happy land of theory, everything is trade-offs.

Change is Hard Work

One of the most basic of those trade-offs is that change is hard work.

Any writing team that is shipping content and making its deadlines using a narrative authoring process, with some kind of DTP software (Word, Frame, OpenOffice, etc.) as its primary authoring tool, has a set of skills that is at most half applicable to a topic-based authoring process with some kind of XML editor (oXygen, XMLmind...) as its primary authoring tool.

In practice, it's going to be less than half; the members of that writing team will need to learn new software, new process habits, how XML works, and new natural language techniques suitable to the reverse-lumberjack (building trees!) writing approach appropriate to XML. This has the side effect of obsoleting skills members of that writing team may have spent the last ten or fifteen years developing. It flattens the team skill hierarchy (everybody now has six months, or a year, of XML experience, instead of a much broader range of narrative writing experience) and probably flattens the functional hierarchy at the same time. Automated processing of an XML content deliverable into a formatted document removes individual control of the appearance of shipped documentation, and removes formatting skill from the value a writer can add to the delivery.

It's very easy for this set of changes to appear purely negative from the viewpoint of an individual writer or writing team. This is especially the case since the countervailing benefits (increased productivity, speed of delivery, and modularity of production) appear in ways that are not immediately obvious. One or two quarters after fully switching to the XML system, individual team members may start to notice that the twelve and sixteen hour days have gone away, or that a lengthy period of high-volume work was awful, but a lot less awful than the DTP version would have been. While they are switching, they're going to notice that they feel like they don't know what they're doing and that the delivered documentation feels like it's out of their control.

It's Not Supposed to Look the Same

The core trade-off between using DTP software and automated processing of XML is between flexibility and speed/consistency. Good XML output processing easily handles things like changed languages and automatic text, such as the various translations of the word "note", automatically inserted in the output of a <note/> element. It can also make sensible decisions about grouping: how many rows of a table, or how many list items, is the minimum to display before a page break, for example. All of this holds provided, and this is critical, that the writing (which, with XML, means tagging style as well as word choice) is consistent in representing similar information in similar ways.
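
To make the automatic-text idea concrete, here is a minimal sketch in Python of an output step that supplies the label for a <note/> element based on the output language. It is an illustration only, not how any particular DITA toolchain does it; the NOTE_LABELS table and the render_note function are hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical label table, invented for this sketch. A real output
# toolchain would keep generated text like this in its own string files.
NOTE_LABELS = {"en": "Note", "fr": "Remarque", "de": "Hinweis"}


def render_note(note_xml: str, lang: str) -> str:
    """Return plain text for a <note> element, prefixed with the label for lang."""
    note = ET.fromstring(note_xml)
    if note.tag != "note":
        raise ValueError(f"expected <note>, got <{note.tag}>")
    # Fall back to English when no label exists for the requested language.
    label = NOTE_LABELS.get(lang, NOTE_LABELS["en"])
    # Collect all text inside the element and normalise the whitespace.
    body = " ".join("".join(note.itertext()).split())
    return f"{label}: {body}"


print(render_note("<note>Sauvegardez vos fichiers.</note>", "fr"))
```

The writer never types the word "Note"; the tag carries the meaning and the processing supplies the text. That only works if every writer uses <note/> for the same kind of information.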

It's still, to venture an analogy, exchanging hand-jointed Hepplewhite furniture for the good grade of Ikea flat-pack. If you're in a business where hand-optimized output is critical, you get to consider whether you can feed your processed XML output into layout software for human attention as an intermediate production step[1]. Otherwise, and this is almost everyone, consistency and speed have a business case; pretty (as distinct from legible and presentable) does not. For technical writers who have spent their previous professional life with the quality of their work being judged in large part on its aesthetics as well as its content, this is a big change. It can be a wrenching change.

As a result, it's important to go into your DITA transition emphasizing to your writing team that the output from the DITA system isn't supposed to look the same as the output from the DTP program did. The individual writers should be trying to get the semantic tagging right—similar information gets tagged in the same DITA-logical ways by the whole team—and not trying to reproduce the look the content had in the DTP application.

My experience is that this is a tough transition to make, and will take considerable time to be fully accomplished. One of the toughest aspects of this transition can be explaining to internal customers that formatting changes they request on a per-document basis are no longer possible; all the formatting comes off an output template, and template changes are both global (for an output type, at least) and require a business case.

Whose Topic Is That?

Topic-based authoring with DITA means assembling content deliveries out of many different topics. It is not likely that a single writer will produce, or be the last person to modify, all of the topics in a content delivery.

This brings up a number of CMS issues: How are topics assigned to writers? Is this assignment visible as part of the topic metadata? What writer roles are there, and how does the CMS support their application to individual topics as units of work? How can I tell when this topic, currently assigned to someone else but which I want to reference from my map and use in my content delivery, is safe to change?
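
Part of answering "is this topic safe to change?" is knowing which maps reference it. A CMS would normally answer that from its own index; the Python sketch below only illustrates the idea, assuming the maps are available as XML strings. The topicref element and its href attribute are DITA map markup; the function names, map names, and file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def topics_in_map(map_xml: str) -> set[str]:
    """Collect the href of every topicref in a map, including nested ones."""
    root = ET.fromstring(map_xml)
    return {ref.get("href") for ref in root.iter("topicref") if ref.get("href")}


def maps_using(topic_href: str, maps: dict[str, str]) -> list[str]:
    """Return the names of the maps that reference topic_href, sorted."""
    return sorted(
        name for name, map_xml in maps.items()
        if topic_href in topics_in_map(map_xml)
    )


# Hypothetical maps, keyed by file name, for illustration only.
MAPS = {
    "install_guide.ditamap": (
        '<map><topicref href="setup.dita">'
        '<topicref href="licence.dita"/></topicref></map>'
    ),
    "admin_guide.ditamap": (
        '<map><topicref href="backup.dita"/>'
        '<topicref href="licence.dita"/></map>'
    ),
}

print(maps_using("licence.dita", MAPS))
```

A topic that shows up in more than one map is exactly the case where one writer's change lands in another writer's delivery.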

While those issues have to be addressed in CMS selection or configuration, the other issue that arises is that technical writers generally don't immediately trust each other as writers, and find distributed authoring—which is what having multiple maps assembled from a common pool of topics by multiple writers is, even if all the writers involved sit next to each other in the same block of cubes—an uncomfortable state.

The traditional one-writer, one-document approach to job assignment certainly contributes to this. So do the per-individual relationships with subject matter experts that result from having writers specialize in a particular range of content deliverables, and any divergent issues of style. Issues of style are a particular problem, because these are most often driven by a real desire to achieve quality in output. Despite this, effective topic-based authoring requires being able to use any writer's topics anywhere, often in a context the writer did not know about when the topic was created. Because of this, it's necessary to arrive at a common style for topics and to stick to it. This necessitates some (formal or informal) editorial review for consistency.

This can be an easy change, if the narrative-based production mechanism already included editing for consistency and the editor or editors find the transition to topic-based authoring with XML straightforward. It is more likely that it will be an effortful change, requiring a dedicated effort toward achieving general consistency in topic style over at least the first year of authoring in XML.

WYSIWYG Considered Harmful

DITA XML content can be processed (and, if you take advantage of the multi-channel publishing potential of XML, is being processed) into different output formats for different audiences. One of the consequences of this is that the concept of WYSIWYG, What You See Is What You Get, stops being directly applicable. The writer of a topic does not necessarily know which maps their topic will wind up being referenced by, or how those maps will be processed to documents that include the content in "their" topic. The writer may know what map they're working on at the moment, but even that is not required; another writer may be responsible for the "build the map" step, and the writer of the topic may not know or need to know about the map or maps currently being worked on. Since there's no telling what the eventual formatted document is going to be, there's no way to display that formatting to the writer as they work.

It is certainly possible to use an XML editor that is style-supplying, or which has a style-supplying mode; this will display the text contents of the XML elements of the topic using a consistent set of styles. This capacity can be helpful during the initial transition to XML authoring, but there are two things that have to be emphasized about it:
  • this is not how the eventual processed output is going to look
  • style-supplying views are an impediment to learning to see XML content in terms of the inherent tree structure

Style-supplying editors can be compared to training wheels on a bicycle; it's easier to get started, but there are things you don't learn and stuff you can't do while you're dependent on them.

To avoid the training-wheels effect, and a writing team that never becomes confident and capable using XML as an authoring medium, it is important to provide XML training and an editor that fully supports a tagged XML view.

XML training needs to cover not just things like what an element is, or which DITA element should be used for what purpose, but also ways to approach writing in a tree structure (the scale of individual topics) and what the house style for topic-based authoring is going to be. That includes deciding what the supported-for-output DITA elements[2] are, which should be as much as possible a collaborative process across the writing team.
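
Once the team has agreed on a supported-for-output list, checking topics against it can be automated. The Python sketch below is one possible shape for such a check, not a feature of any particular editor or CMS; the SUPPORTED set is a hypothetical house list and unsupported_elements is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical house list of supported-for-output elements. The real list
# is whatever the writing team agrees on.
SUPPORTED = {"topic", "title", "shortdesc", "body", "p", "ul", "li", "note"}


def unsupported_elements(topic_xml: str, supported: set[str] = SUPPORTED) -> list[str]:
    """Return the sorted names of elements in the topic that are not on the list."""
    root = ET.fromstring(topic_xml)
    # root.iter() walks the whole tree, starting with the root element itself.
    return sorted({el.tag for el in root.iter() if el.tag not in supported})


# Sample topic, for illustration only.
TOPIC = (
    '<topic id="backup"><title>Backing up</title><body>'
    "<p>Run the <cmdname>backup</cmdname> tool.</p>"
    "<note>Test restores too.</note>"
    "</body></topic>"
)

print(unsupported_elements(TOPIC))
```

A result that is not empty is a prompt for a conversation, either to retag the content or to make the case for adding the element to the supported list.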

This transition appears to take about a quarter to get through; it's not especially difficult, but it does involve a certain amount of memorization. The memorization load can be ameliorated somewhat by choosing an XML editor capable of displaying information from the DITA DTD or schema, so that a writer can see what elements are allowed as children of the current element, and read the element descriptions from the DTD. The transition to writing in XML will also involve producing content deliveries in both the old and new systems through one set of deliverables as delivery capability in DITA is developed, and this is inevitably extra work.

On the plus side, once through the transition, the sixteen hour days really do go away.

[1] If you can't, you might not be able to use topic-based authoring at all.
[2] It's unlikely you need all of the DITA elements to effectively deliver your content; that being so, you don't need to process all of them, either.
