Semantic docs - DocBook vs. Sphinx


#1

Dear colleagues,
I’m in a rare position where I’m designing a new “documentation system”. While I’m sure that I want to use the single-sourcing principle and semantic mark-up, I’m stuck on choosing a particular format.
The dilemma is to choose between DocBook 5 (XML) and reStructuredText with Sphinx extensions (lightweight wiki-like mark-up). Both formats provide more-or-less the same features and both are supported by free open-source tools. I tried to discuss this with my colleagues but nobody has a strong opinion in either direction.
Do you have any experience with these formats? I would be grateful for your opinions on the pros and cons, or for suggestions on which criteria to base my decision.
Thank you for reading.


#2

Wow, sounds like an awesome opportunity. A few questions, though:

  • Who will be contributing to the doc content? Dedicated writers only?
    Multiple stakeholders? Dev? Support? (IOW, how easy to use does your
    system need to be for a range of contributor types?)
  • What has led to the choice between (only) DocBook
    and rST/Sphinx? Any particular reasons to discard asciidoc or
    markdown?
  • Any other requirements you can share that might help the
    discussion along?

#3

Thanks for a fast response!

  • in this case, I will be the only contributor - there may be a second dedicated TW in the future, though

  • the main idea is to use a single source which will be transformed to several deliverable formats (mainly to HTML and PDF)

    • I’ve been using DocBook for more than five years (in my old job) and the semantic markup was very useful because it helps to separate content from presentation. Also, with a good editor (I use jEdit), it’s not such a big pain to edit XML. (Yeah, my main argument is that I’m used to it.)
    • Sphinx is an alternative here because it is also semantic and my colleagues (in my new job) have chosen it for the documentation of another project. (Their main argument being that it is more readable than DocBook.)
    • I wouldn’t choose Markdown because it lacks many of DocBook’s features, and I’m not very familiar with AsciiDoc. My wild guess about why Sphinx is the better fit is that it supports automated Python API documentation (yes, we develop in Python).
  • other requirements…

    • well, we develop free, open-source software, so the tools we use should be free and open-source, too
    • it should be possible to transform from one format to the other (I’ve discovered Pandoc, so a rough transformation shouldn’t be a problem)
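
A rough sketch of what such a Pandoc conversion could look like when scripted from Python. It assumes pandoc is installed and on the PATH; the file names are hypothetical, and the result would still need manual clean-up.

```python
import subprocess


def pandoc_command(source, target, from_format="docbook", to_format="rst"):
    """Build the argument list for a pandoc conversion."""
    return ["pandoc", "--from", from_format, "--to", to_format,
            "--output", target, source]


def convert(source, target):
    """Run pandoc (must be on PATH); raises CalledProcessError on failure."""
    subprocess.run(pandoc_command(source, target), check=True)


if __name__ == "__main__":
    # guide.xml and guide.rst are hypothetical file names.
    # Only the command is printed here; call convert() to actually run it.
    print(" ".join(pandoc_command("guide.xml", "guide.rst")))
```

Swapping the two format arguments gives the reverse direction (rST to DocBook).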

I hope my answers covered what you asked.

I’m not too lazy to learn new things, on the contrary, and that’s why I have such a hard time deciding :smile:


#4

Confession, I don’t always respond this quickly, but (a) I’m on a train with wifi, and (b) I love this kind of discussion.

If you haven’t already, you might want to look at the asciidoc thread on this forum.

But Python! In that case, I’d be inclined to go with rST/Sphinx. Because that’s a complete code/doc ecosystem, and it seems to me to be the most broadly supportable in an environment like yours. (I come from the doc/writing side of things, not the code side, and it’s this beautiful support for doc that has me learning Python :D)

Also, if your project is in Python, markdown makes no sense, for exactly the reasons you give. And while I love asciidoc, in a Python environment it doesn’t make sense either.


#5

Just a little moar followon to the previous post – I’d go for Sphinx/rST over XML also because it’s more approachable for multiple stakeholders. Especially if you’re the only writer, it seems likely you’ll need content from other contributors, and XML can be pretty off-putting, even with “a good editor.” (I don’t see the big deal either, because like you I’m used to it, with and without a good editor, but in my experience The World Out There does not share our comfort level …)

I asked about asciidoc primarily because of its support for DocBook output. But (I repeat myself) Python!


#6

As a former DocBook/SGML user, I kind of miss it sometimes. But, I’d gotten very used to it and very comfortable with DocBook over the years. I’m in a different environment now, with Python coders all around, and I’m using Sphinx/rST. The learning curve isn’t that steep and it makes it easier for our other team members to submit Pull Requests in GitHub against the docs. Take the plunge and learn something new! And gain a little freedom from the rigid markup side of DocBook (I feel like I focus on the content more than the tagging, which is a nice trade-off). :slight_smile: Oh, and I’m using Sublime as my editor, which I am digging a lot more than emacs.


#7

Thank you both for your posts, they helped a lot :relaxed: I am convinced to go with Sphinx.

I’m just not sure if jEdit is a good idea because its support for rST is only “partial” – it doesn’t recognize multi-line comments (it highlights only the first line) and it doesn’t highlight the text of a title, only the special-character underline of the title.
docschick, does Sublime have better support for rST?


#8

I’ve not used jEdit, but Sublime seemed like the better choice for me among the editors I tried out this spring. I’m using version 2, which was free to download. It highlights enough of the tagging to make me happy, though it is less highlighting than I was used to in DocBook/SGML. I think it just takes a bit of getting used to all around when moving to Sphinx/rST.


#9

Hi,

To keep my documents in a structured format, I am using YAML text files with an ad-hoc structure. Then I generate (parts of) the Sphinx documentation from the YAML data using Jinja templates. Some formatting is permitted with reStructuredText directives.

Example of source: https://github.com/miohtama/opsec/blob/master/data/team.yaml

Generated document: http://operationssecurity.org/en/latest/team/index.html

The benefit of this approach is that I can easily change the generated format. E.g. I created a script to generate docx instead of Sphinx for proof-reading.
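
A minimal sketch of the idea, not the actual scripts from the repository above. To keep it runnable without extra packages, a plain Python list stands in for the parsed YAML file and a small function stands in for the Jinja template; the field names (`name`, `role`) are hypothetical.

```python
# Stand-in for data loaded from a YAML file (field names are hypothetical).
TEAM = [
    {"name": "Alice", "role": "Operations"},
    {"name": "Bob", "role": "Security"},
]


def rst_title(text, underline="="):
    """Return an rST section title: the text plus an underline of equal length."""
    return f"{text}\n{underline * len(text)}"


def render_team(members):
    """Render the member list as a small reStructuredText document."""
    lines = [rst_title("Team"), ""]
    for member in members:
        lines.append(f"* **{member['name']}**: {member['role']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_team(TEAM))
```

Because the data and the rendering are separate, a second renderer for another output format can reuse the same data unchanged.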


#10

Editors with good RST support include PyCharm (and other JetBrains IDEs) and Sublime Text with the RestructuredText Improved plugin.


#11

I’m developing a kind of YAML/Markdown hybrid that I call SAM (Semantic Authoring Markdown). The most significant feature is that it will have a simple template-based schema language so that it can be validated against a specific schema. It is designed to support the creation of specific semantically constrained languages (like XML) but be much simpler and easily writable by humans.

It is very much a work in progress at this point, and the schema validator is not implemented. I am currently testing it by using it for my own work and tweaking the markup/parser as I find what I need and what I am comfortable with. Still, if anyone is interested in checking it out or providing feedback, you can find it at https://github.com/mbakeranalecta/sam

Mark


#12

At my dayjob, we use Sphinx. Having used DocBook as part of the Fedora Docs team, I have to say I prefer DocBook in almost every case. Sphinx is definitely easier to produce and to read in source form, but I think it’s too simple in a lot of cases. DocBook has a lot more flexibility in terms of different kinds of elements (e.g. <guimenu> versus <command>) that allow you to style them differently if you want.

I’ve also had a lot of issues where I’ve misused whitespace, which resulted in the built document not quite looking how I wanted. If you’re not careful when proofreading the finished result, you may end up with a broken doc. “But Ben,” you say, “XML is so easy to break!” That’s entirely true, but at least when it’s invalid XML, you find out very quickly because it will just fail to build.
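
To illustrate the fail-fast point, here is a small sketch using Python’s standard XML parser. It only checks well-formedness (a real DocBook build would also validate against the schema), and the sample fragments are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up DocBook-like fragments for illustration.
VALID = "<para>Run <command>make html</command> to build.</para>"
BROKEN = "<para>Run <command>make html</para>"  # <command> is never closed


def check_well_formed(xml_text):
    """Return None if the XML parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(check_well_formed(VALID))   # no error for the valid fragment
    print(check_well_formed(BROKEN))  # an error message from the parser
```

The broken fragment is rejected outright, whereas a misplaced indent in rST typically still builds and only shows up when someone reads the output.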

I also find DocBook documents to be better looking, although that might just be the result of not using good Sphinx templates. One area that Sphinx excels in is autogenerated API docs. If that’s important to you, that might be a compelling reason to choose Sphinx.


#13

Lena,

Have you settled on a solution? I’m curious to know how you’re getting on with developing your new documentation process. I think that’s a position almost all technical writers would love to have. :smile:

I am a fan of AsciiDoc, because it is a plain text-like markup but provides some powerful features. The markup alone has a history dating back about 14 years, although the processor has changed fairly recently. I would encourage you to at least consider AsciiDoc. I don’t think it offers all the features of Sphinx but I expect it would be possible to fill in some gaps.

Make sure you have thought through your workflow, from drafting content, through peer and technical review, to the final published form. Before making a final decision, perhaps assemble a rough system using Sphinx and another using AsciiDoc, and experiment with each.

Good luck.


#14

Note that although Sublime Text is available to download for free, it is not free. The developer allows people to download and use it but maintains a nag screen until it’s licenced. Development and enthusiasm in the ST world seem to have declined a LOT. I would suggest trying GitHub’s Atom text editor instead.


#15

Hi Russel,

thanks for your interest. :smile: I, indeed, consider my position very fortunate – I am learning so much!

I decided to stick with Sphinx. Developers (programmers) will do technical reviews, so they will be happier with a more readable source. Also, my supervisors are technically skilled and they will review my work in source form. But apart from them, I’m on my own for now; I don’t have any “peers” (fellow TW colleagues, you mean?). The source will also be kept under version control, and it will be much easier to review changes in this format.

I suppose the process will be very similar to our software development process, or so it was suggested to me, because it is desirable that I use established tools. For now, the process is chaotic and it will require gradual improvement, which can take some time because I have to produce actual results as well.
I don’t have the luxury of time to experiment with more solutions, which is a shame because I would love to. Thanks, though, for your suggestion.

Editor – I’m currently trying out PyCharm Community Edition but I will definitely look into Atom.


#16

Hi bcotton,

thanks for your contribution. I agree with all your points, actually! :smile:
I’ve already had issues with source indentation and that is a pain… but apart from that, I’m quite satisfied with Sphinx so far. When we improve the output formats, although that means additional work, I will not miss DocBook that much.

API docs – I think it will come to that. I’ve figured out that some of our Python projects lack code documentation, so I will probably have to tell our developers what to fill in, where and how. (They will probably “love” me for that.) But that is for a different discussion :blush:
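
A minimal sketch of what such code documentation could look like: a docstring written with Sphinx-style field lists (`:param:`, `:returns:`), which the `sphinx.ext.autodoc` extension can pull into the built docs. The function itself is a made-up example, and `inspect.getdoc` is used here only to show that the docstring is available to tools.

```python
import inspect


def convert_temperature(celsius):
    """Convert a temperature from Celsius to Fahrenheit.

    :param celsius: Temperature in degrees Celsius.
    :type celsius: float
    :returns: Temperature in degrees Fahrenheit.
    :rtype: float
    """
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    # Print the cleaned-up docstring text that documentation tools can read.
    print(inspect.getdoc(convert_temperature))
```

With autodoc enabled in `conf.py`, a directive such as `.. autofunction::` followed by the function’s import path would render this docstring on the page.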


#17

Hi Lena,
I’ve found this “old” post because I’m looking for the same things as you were (one year later). So, I’m wondering what you chose for your documentation.
I will ask you something else later.

TIA

Renato


#18

Hi Renato,

I’ve chosen reStructuredText with Sphinx. The main reason was that people prefer to edit the source in this format; it’s more people-friendly.
Sometimes, I miss the flexibility of XML – rST lacks some features which DocBook had, and rST is not as easy to extend.
But as long as you need only the basics, I recommend rST.

And maybe AsciiDoc would be worth looking into – I learned it can be transformed into DocBook, and then the DocBook toolchain is available for further transformations. But I’m not that familiar with it and there’s another discussion about this :smile:


#19

Hi Lena, thank you for the answer.
This is roughly my scenario: I’m thinking of using a wiki (MoinMoin or another one; do you know any others?) to collect data, then exporting it to .rst and finally transforming it with Sphinx. Do you think that could be a good toolchain?

TIA

Renato


#20

Hi Renato,

I am not an expert on wikis, so I can’t give much advice in this area. Also, I don’t know your environment, so I don’t know your needs or what is or is not good for you.
It also depends on how you want to deliver the content.

Generally, I don’t think it is a good idea to collect data in one markup (which probably is not semantic) and then transform it to another (semantic, in the case of rST), because automatic transformations usually make a mess and you probably would not be able to take advantage of the advanced features of rST and Sphinx. If you don’t need them, then fine, but why bother with Sphinx at all? In that case you can use a wiki for documentation directly and, when needed, its built-in ability (or a plugin) to export to PDF.