Friday, December 24, 2010

Doxygen and QuickBook

Doxygen and QuickBook are both great tools for generating documentation from C++ code. But making them cooperate is quite complex... let's say, a challenge.

QuickBook

QuickBook (QBK) is based on BoostBook, which used to be the documentation tool for the Boost libraries. BoostBook in turn is based on DocBook.

With QuickBook (and also BoostBook) you can make good-looking library documentation, with header descriptions, examples, syntax highlighting, etc. Look at the QBK website for more information. For example, I like the callouts.

QuickBook input files (extension .qbk) are plain text files with Wiki-style commands. However, using QuickBook alone to document hundreds of classes and functions is not maintainable; you really need a link to the source code. That is what Doxygen provides.

Doxygen

Doxygen is a standalone tool that accepts C++ source code (and other languages as well) with JavaDoc-style comments documenting function parameters, etc. Look at its website for a manual. While Doxygen does a great job, the generated documentation is often considered a bit too technical. The review report included:

Using Doxygen alone as a documentation tool has its (known) shortcomings when it comes to generic libraries.

And the individual reviews mentioned things such as:
  • "I think doxygen is convenient for developers but not very useful for users, because of the high structure/content ratio."
  • "I really did not like the doxygen generated content. I found it difficult to find the real documentation among all the doxygen generated fluff."
  • "I have a general problem with doxygen generated docu with regard to generic libraries. It not easy to separate the important classes and fuctions from the implementation details. GGL helps with this through Grouping by modules and a overview Page, but it could be even better. I think the Modules should them self be further grouped, and the most important should be highlighted."

This does not mean that Doxygen is wrong or bad; maybe we did not spend enough time configuring it properly. An individual review also mentioned: "Doc: Excellent. Looks very nice. Only an index is missing - but extensive Doxygenation including Doxygen comments are far above what passes for documentation in many packages". At least we did our best.

Anyway, people do like QuickBook-generated documentation, so for Boost.Geometry we are moving to QuickBook. But we want to keep Doxygen in the chain to link the documentation with the source code.

Doxygen -> QuickBook via XSLT

It is certainly possible to let Doxygen and QuickBook work together. Doug Gregor wrote here:

(...) If I haven't scared you yet, I'll put it in a diagram:

C++ sources -> Doxygen XML -> BoostBook XML -\
                                              -> Merged BoostBook XML
Quickbook sources -> BoostBook XML ----------/

Then, for the output:

Merged                      /--> HTML
BoostBook --> DocBook XML ------> FO --------> PDF
XML                         \--> Man pages

Look complex? It is. (...)

So it is complex. Especially the XSLT file that converts Doxygen XML to BoostBook XML proved complex and sometimes incomplete. We (Mateusz) spent time on it and achieved good documentation, but it really was complex. And in the end you don't get QuickBook, you get BoostBook... Suppose you want something different, for example adding "complexity" to the documentation: that means changing the XSLT files again.

What is in the docs?

I'd like to see the following in the documentation, per function or per class:
  • What it does
  • The header where the function or class is found
  • Its parameters (for a function)
  • Its return type (for a function)
  • Its limitations
  • An example (this is very important)
  • An image (very important for a geometry-library, probably not for all libraries) to show what it is doing
  • Behaviour in various combinations
  • Its complexity
The cplusplus.com documentation has a small example for every function or method, and I like that. It is great for programmers who program by example. Examples need to be compilable: they need to be extracted from real, compiled code, otherwise outdated constructs, other obsolete things or typos stay there unnoticed. You see that often.
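QuickBook supports this directly: it can import a real source file and reuse the code between a `//[name` and a `//]` comment marker as a named snippet, so the example shown in the documentation is the very code that was compiled. A minimal sketch, assuming a hypothetical example file; the snippet name area_polygon matches the one used later in this post, but the body is a standalone, standard-library-only stand-in (the shoelace formula), not Boost.Geometry's actual example:

```cpp
// Hypothetical example file, e.g. area_example.cpp (name assumed).
// QuickBook's [import] picks up the code between "//[area_polygon" and "//]"
// and makes it available in the .qbk file as [area_polygon].
#include <cstddef>
#include <utility>
#include <vector>

//[area_polygon
// Shoelace formula: signed area of a simple polygon given as a ring of (x, y)
// points; positive for counter-clockwise rings, negative for clockwise ones.
double ring_area(const std::vector<std::pair<double, double>>& ring)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < ring.size(); ++i)
    {
        const auto& p = ring[i];
        const auto& q = ring[(i + 1) % ring.size()]; // wrap around to close the ring
        sum += p.first * q.second - q.first * p.second;
    }
    return sum / 2.0;
}
//]
```

Because the file is compiled (and ideally run) as part of the build, a snippet that no longer compiles is noticed immediately instead of silently going stale in the docs.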

About behaviour: within Boost.Geometry, one function (e.g. distance) can do many things (distance between two points, between a point and a polygon, etc.). We also have multiple dimensions (2D, 3D) and coordinate systems. The behaviour of the functions might differ in different circumstances. Therefore, we want to be able to document these differences explicitly.

Besides this, we often have two overloads: one with a so-called strategy, which lets the library user change the default calculation method, and one without, which sticks to the defaults. Apart from that, the overloads are the same, so their documentation is the same. Where does the documentation come from? From the sources. Do we want the same documentation duplicated for two functions? No. Or probably not.
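A minimal sketch of such an overload pair, assuming hypothetical names (point, my_distance, pythagoras_strategy, manhattan_strategy; none of these are Boost.Geometry's API). The overload without a strategy simply forwards to the one with a strategy, passing the default, so a single description fits both:

```cpp
#include <cmath>

// Hypothetical 2D point type, for illustration only.
struct point { double x, y; };

// Default strategy: Euclidean distance.
struct pythagoras_strategy
{
    double apply(point const& a, point const& b) const
    {
        return std::hypot(a.x - b.x, a.y - b.y);
    }
};

// An alternative strategy a user might pass in.
struct manhattan_strategy
{
    double apply(point const& a, point const& b) const
    {
        return std::abs(a.x - b.x) + std::abs(a.y - b.y);
    }
};

// Overload 1: the caller supplies the strategy.
template <typename Strategy>
double my_distance(point const& a, point const& b, Strategy const& strategy)
{
    return strategy.apply(a, b);
}

// Overload 2: no strategy given, so forward to overload 1 with the default.
inline double my_distance(point const& a, point const& b)
{
    return my_distance(a, b, pythagoras_strategy());
}
```

For points (0, 0) and (3, 4), the default overload gives 5 and the Manhattan strategy gives 7; everything else about the two calls (parameters, return value, behavior) is identical, which is exactly why duplicating their documentation feels wrong.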

Don't repeat yourself, and macros

Doxygen does have an overload option, but it seems to work only for overloaded member functions. We are overloading free functions, which are often heavily templated. I did not succeed in getting Doxygen's overload construct to work with our code.

Anyway, Doxygen provides a great utility: an alias. This is a sort of macro, expanding to whatever you specify. So where we would otherwise repeat ourselves (e.g. writing "Coordinate system (e.g. cs::cartesian)" for a template parameter, which appears in many places), we can introduce an alias, param_macro_coorsystem, and use it in the documentation.

The result looks cryptic, but at least you don't repeat yourself and if you want to change that sentence, you do it in one place.

To make the picture complete: QuickBook also provides a great utility, the macro. It does about the same: it is a placeholder which QuickBook replaces with the actual text.

So we can actually choose: write plain English, use an alias, use a macro. Or mix them.

Repeating yourself is usually not so good, but in documentation it cannot always be avoided. Factoring all repetitions out of comments and documentation fragments results in a very cryptic, developer-defined language. Actually, it already looks cryptic now, read on...

Doxygen -> QuickBook via parser

As described above, the XSLT process to go from Doxygen to BoostBook was not completely satisfactory for us, given all our requirements for the documentation. It probably can be done, but it requires advanced XSLT skills, and who still writes XSLT nowadays? There are a lot of libraries for parsing XML, in C++ (and in other languages), and for a C++ programmer it is quite easy to parse the Doxygen XML and output QuickBook. So that is what we did: we developed a new tool, doxygen_xml2qbk, which converts the XML files generated by Doxygen to QBK.

The tool might be useful more generally (for other Boost libraries), but currently it just does its job for Boost.Geometry.

It is currently (lightly) based on RapidXML (great library, by the way, and super fast).

The markup

So we document functions like this; here I take the area function as an example:

/*!
\brief \brief_calc{area}
\ingroup area
\details \details_calc{area}. \details_default_strategy

So these lines use macros to say that the function calculates the area (brief), and again that it does so (details). The distinction between brief and detailed descriptions is Doxygen's; sometimes it is convenient, sometimes not. Then a remark macro is added saying that this function uses the default strategy. Many functions do, so this is a macro too, to avoid repetition. Then it continues:

\tparam Geometry \tparam_geometry
\param geometry \param_geometry
\return \return_calc{area}

So here the parameters are described. There is a template parameter (Doxygen's \tparam) and a parameter (Doxygen's \param). They are similar: both are about the geometry. Not literally the same, but the parameter geometry (lower case) is of type Geometry (CamelCase). Most functions are like this, accepting sometimes one, sometimes two geometries. Therefore macros (\tparam_geometry, \param_geometry) are introduced here... Below we will see that they can be handled in the same line! Not everyone is convinced of that feature, though; it might go away or be made more flexible.

Avoiding repetition: it is the same for the return value, after Doxygen's \return. We could write here: "the function area calculates the area of the input geometry". For the function length we would write (about) the same comment, replacing area with length. For perimeter the same. Et cetera. So, to avoid repetition, we created the cryptic macro \return_calc, which takes a literal string as its parameter (in this case "area").
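To make the alias mechanism concrete, here is a toy model of what expanding a parameterized alias such as \return_calc{area} amounts to: the alias text contains a placeholder \1, which is replaced by the argument. This is a sketch under simplifying assumptions, not Doxygen's implementation: it expands in a single pass (real Doxygen aliases can nest), handles at most one argument, and the alias texts used with it are hypothetical stand-ins for the ones in our Doxyfile.

```cpp
#include <cctype>
#include <map>
#include <string>

// Alias table: alias name -> replacement text; "\1" marks the argument slot.
using alias_table = std::map<std::string, std::string>;

// Replace every "\1" in tmpl with arg.
std::string substitute(std::string tmpl, const std::string& arg)
{
    std::string::size_type pos = 0;
    while ((pos = tmpl.find("\\1", pos)) != std::string::npos)
    {
        tmpl.replace(pos, 2, arg);
        pos += arg.size(); // do not rescan the inserted argument
    }
    return tmpl;
}

// Expand known aliases in one pass; other commands (e.g. \brief) are copied unchanged.
std::string expand_aliases(const std::string& text, const alias_table& aliases)
{
    std::string out;
    std::string::size_type i = 0;
    while (i < text.size())
    {
        if (text[i] != '\\')
        {
            out += text[i++];
            continue;
        }
        // Read the command name following the backslash.
        std::string::size_type j = i + 1;
        while (j < text.size()
               && (std::isalnum(static_cast<unsigned char>(text[j])) || text[j] == '_'))
            ++j;
        auto it = aliases.find(text.substr(i + 1, j - i - 1));
        if (it == aliases.end())
        {
            out += text.substr(i, j - i); // not an alias: keep it as written
            i = j;
            continue;
        }
        // Optional single argument in braces, e.g. {area}.
        std::string arg;
        if (j < text.size() && text[j] == '{')
        {
            std::string::size_type close = text.find('}', j);
            if (close != std::string::npos)
            {
                arg = text.substr(j + 1, close - j - 1);
                j = close + 1;
            }
        }
        out += substitute(it->second, arg);
        i = j;
    }
    return out;
}
```

For example, with brief_calc mapped to "Calculates the \1 of a geometry" (hypothetical text), the line \brief \brief_calc{area} becomes \brief Calculates the area of a geometry, while \brief itself, not being an alias, is left alone.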

We continue:

\qbk{example,area_polygon}
\qbk{example,area_polygon_spherical}

So \qbk is a Doxygen alias that generates an XML node called <qbk.example>; here it generates two of them, one per line. Our new tool doxygen_xml2qbk recognizes such XML nodes and, as expected, generates an Examples section from them. So these two lines generate this QuickBook syntax:

[heading Examples]
[area_polygon]
[area_polygon_spherical]

Here area_polygon refers to a snippet in one of the example files, which a QuickBook construct shows as a syntax-highlighted example.
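A minimal sketch of this conversion step, under stated assumptions: the texts of the <qbk.example> nodes have already been read from Doxygen's XML (e.g. with RapidXML), and examples_section is a hypothetical name, not the actual doxygen_xml2qbk code. Each example id simply becomes a QuickBook snippet call under a heading:

```cpp
#include <string>
#include <vector>

// Build the QuickBook "Examples" section from the <qbk.example> node texts.
std::string examples_section(const std::vector<std::string>& example_ids)
{
    if (example_ids.empty())
        return ""; // no \qbk{example,...} lines: no section at all
    std::string qbk = "[heading Examples]\n";
    for (const std::string& id : example_ids)
        qbk += "[" + id + "]\n"; // shows the imported snippet with that name
    return qbk;
}
```

Called with area_polygon and area_polygon_spherical, it yields exactly the three QuickBook lines shown above.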

We continue:

\qbk{behavior,__0dim__:[qbk_ret 0]}
\qbk{behavior,__1dim__:[qbk_ret 0]}
\qbk{behavior,__2dim__:[qbk_ret the area]}
\qbk{behavior,__cart__:[qbk_ret the area] __cs_units__}
\qbk{behavior,__sph__:[qbk_ret the area] __sph1__}
\qbk{behavior,__rev__:[qbk_ret the negative area]}

All this goes to <qbk.behavior> XML nodes, and from there to a QuickBook Behavior section: a table describing the behavior for 0 dimensions (point), 1 dimension (linear), etc. All of this is macro'd: __sph__ is a QuickBook macro, and [qbk_ret] is a QuickBook template (a macro that takes an argument). They can be nested as well. qbk_ret just says "returns". It has the same length as the word itself; the only advantage is that you no longer repeat yourself in words, but in macros.
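Similarly, a hypothetical sketch of how the behavior entries could become that table: each entry is split at its first colon into the case (e.g. __0dim__) and its description. The table layout is an assumption; the real tool's output may differ. The QuickBook macros and templates inside the cells are left untouched, for QuickBook to expand later.

```cpp
#include <string>
#include <vector>

// Build a QuickBook Behavior table from <qbk.behavior> node texts of the form
// "case:description". Only the first ':' separates the two parts.
std::string behavior_section(const std::vector<std::string>& entries)
{
    if (entries.empty())
        return "";
    std::string qbk = "[heading Behavior]\n[table\n[[Case] [Behavior]]\n";
    for (const std::string& entry : entries)
    {
        std::string::size_type colon = entry.find(':');
        std::string which = entry.substr(0, colon); // whole entry if there is no colon
        std::string what = colon == std::string::npos ? "" : entry.substr(colon + 1);
        qbk += "[[" + which + "] [" + what + "]]\n"; // one table row
    }
    qbk += "]\n";
    return qbk;
}
```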

OK, mixing two macro systems, some of them with a specific meaning for our converter tool, is quite complex, and I realize that. On the other hand, if everything were written out in words here, it would probably be unmaintainable.

The results

Finally, let's show the results here, as screen dumps.

Doxygen generates this piece:
[Screen dump: Doxygen-generated documentation of area]

To be honest, we must say that examples (using syntax highlighting) within Doxygen are certainly possible as well.

With the Doxygen-to-QuickBook conversion, the following piece is generated:

[Screen dump: QuickBook documentation of area, generated via doxygen_xml2qbk]

So here we see what our \qbk alias does: it generates the Behavior, Complexity and Examples sections, and might generate more.

I like it, despite its complexity.


5 comments:

  1. Hi, I'm a computer engineering student and currently an intern at a company. I have been asked to do the things you describe here. However, I'm a little unfamiliar with Linux. I found and did everything you describe, except merging the two BoostBook XML files. What command do I need for that? There is a combine.xslt file in the folder. I tried "xsltproc combine.xslt" but had no idea how to proceed. I'd appreciate it if you could help me combine two BoostBook XML files.

    Thanks.

    Faruk

  2. Faruk: first, it is work in progress (I changed it a bit recently). Second, it is multi-platform (I apply it on Windows). Third, instead of combining XML files, it avoids them... (well, they are intermediate files). bjam takes care of them.

  3. Hello again, thanks for your help. But I still don't understand. How does bjam take care of them? I need a detailed explanation. If you know a source I can learn from, I'd appreciate it if you shared it with me. My e-mail is faruk.kuscan@gmail.com. Now I have a BoostBook XML file called quick.xml, generated from quick.qbk, and another BoostBook XML file called reference.xml, generated from doxygen-all.xml. I want them to be merged.

    Sorry for taking your time.

    Faruk

  4. By "bjam takes care of them" I mean that I specify a qbk file in my Jamfile and bjam translates it to HTML.
    I really don't do anything with the XML files, nor do I look into the XML files that are created as intermediate files.
    I will mail you soon with some details.

  5. @Faruk, you are using the wrong command to combine the XML files; if you look in the script you will see:

    xsltproc combine.xslt index.xml >all.xml

    is the way to do it.
