How to replicate CCMS needs with static site generators and version control?


#1

A component content management system (CCMS) is a component-based database that stores individual topics for many doc projects. Writers grab the topics they need and integrate those topics into their documentation. A CCMS facilitates re-use of content, so the same component (topic) can be re-used as needed in other doc projects in the organization.

For example, if the documentation for Widget X appears in Products 1, 2, and 3, then you write and translate the Widget X documentation one time, and all the other product doc sets include it in their outputs. You don’t copy and paste the same documentation three times (tripling translation costs and leading to all kinds of inconsistencies when Widget X gets updated).

Most CCMSs cost upwards of $50k per year and require content to be in DITA. If you want to avoid using a CCMS, could you use version control with tools like GitHub or Bitbucket to achieve a similar result?

Sure, you could put all your topics into a GitHub repo, and then each writer could clone and pull the repo to their own system. But then you end up with a long list of topics on your local machine. Suppose you want only 100 of 1,000 topics? How do you avoid including the other 900 in your individual project? If using a static site generator, how do you filter out those files you don’t want?

I’m interested to know if anyone has strategies for replicating what a CCMS does, but with version control and static site generators. This post by Eliot Kimber gives me hope that it’s possible, but I’m not quite sure how to execute it with Jekyll.


#2

Just looking at the repo perspective:

  • you can have things like submodules, where the main repo pulls in subrepos a, b, and c, but afaik they are fiddly and not really worth the effort in most situations
  • if we’re talking plain (ish) text, even 10k files would not be too bad a download.

The problem, I think, does not lie in getting the files or keeping them updated (git is pretty efficient), but in knowing where each file is reused in the static site generator, so you know what context(s) you need to edit in.

A pre-commit hook that reports which files include each file affected by the commit would be a start. I’ve not seen that used, but it is pretty easy to implement.
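A minimal sketch of the lookup such a hook would need, assuming Jekyll-style `{% include file %}` tags and shared files kept under `_includes/`. The function names and the in-memory `sources` hash are hypothetical, chosen so the logic can be read without a real repo:

```ruby
# Matches Jekyll-style include tags such as {% include widget_x.md %}.
# Assumption: the include name is a literal file name, not a variable.
INCLUDE_TAG = /\{%-?\s*include\s+([^\s%]+)/

# Build a reverse index: included file name => files that include it.
# `sources` maps a file path to that file's text.
def reverse_include_index(sources)
  index = Hash.new { |hash, key| hash[key] = [] }
  sources.each do |path, text|
    text.scan(INCLUDE_TAG).flatten.uniq.each { |included| index[included] << path }
  end
  index
end

# For each changed file, list the files that pull it in.
# Assumption: shared files live under _includes/, as in a default Jekyll layout.
def affected_contexts(sources, changed)
  index = reverse_include_index(sources)
  changed.to_h do |name|
    include_name = name.sub(%r{\A_includes/}, "")
    [name, index.fetch(include_name, [])]
  end
end

sources = {
  "pages/project_a/install.md" => "Intro\n{% include widget_x.md %}\n",
  "pages/project_b/setup.md"   => "{% include widget_x.md %}\n{% include legal.md %}\n"
}
p affected_contexts(sources, ["_includes/widget_x.md"])
```

In a real hook, the `changed` list could come from `git diff --cached --name-only`, and `sources` would be read from the working tree rather than typed in by hand.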


#3

I am not quite answering the original question here, but I’m working with a new team on a workflow that (we hope) will look something like this (more or less what @plaindocs describes, with a bit of extra processing) :

  • Write appropriately chunked content in AsciiDoc
  • Pull appropriately chunked content into “parent” AsciiDoc files with include statements (yes, there’s some organizing and creating of file and directory structures that needs to be done here to deal with Sam’s “problem … in knowing where each file is reused …”)
  • Run Asciidoctor or a custom build of the Python AsciiDoc processor to generate HTML
  • In our case, post HTML via platform TBD
  • In a case like Tom’s, use an Asciidoctor/Jekyll tool to publish (they exist, but that is all I know about them)

I’m grateful to this thread for raising the architectural issue that Sam mentions. Might it not be even easier to do something like this?

In AsciiDoc (thinking with my fingers on the keyboard here, so details absolutely NOT worked out), I’m thinking we could/should tag content chunks in the header with some identifier that we could then simply search the whole source tree for. (I come from a company where we used just such an elaborate system as Tom describes, and it didn’t let us differentiate between source and deliverable in search results, so I see no loss of functionality here.)
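A rough sketch of that search, assuming each chunk carries a header attribute named `:chunk-id:` (a made-up attribute name) and that parents pull chunks in with AsciiDoc `include::path[]` directives. The helper names are hypothetical, and include paths are treated as repo-relative for simplicity:

```ruby
# Hypothetical header attribute, e.g. ":chunk-id: widget-x".
CHUNK_ID = /^:chunk-id:\s*(\S+)\s*$/

# Matches AsciiDoc include directives such as include::chunks/widget-x.adoc[]
INCLUDE_DIRECTIVE = /^include::([^\[]+)\[/

# Map each chunk identifier to the file that declares it.
# `sources` maps a file path to that file's text.
def chunk_ids(sources)
  sources.each_with_object({}) do |(path, text), ids|
    match = text.match(CHUNK_ID)
    ids[match[1]] = path if match
  end
end

# List the "parent" files that include the given chunk path.
def parents_of(sources, chunk_path)
  sources.select do |_path, text|
    text.scan(INCLUDE_DIRECTIVE).flatten.include?(chunk_path)
  end.keys
end

sources = {
  "chunks/widget-x.adoc"  => "= Widget X\n:chunk-id: widget-x\n\nBody text.\n",
  "guides/product-1.adoc" => "= Product 1\n\ninclude::chunks/widget-x.adoc[]\n"
}
chunk_path = chunk_ids(sources)["widget-x"]
p parents_of(sources, chunk_path)
```

Looking up the identifier first and the includes second keeps source chunks and the deliverables that reuse them apart in the results.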

I need to think about this some more. But the repo/publish side of things does look pretty straightforward to me.

If you’re wedded to XML, then a very simple XSLT would take care of your search for includes, wouldn’t it? Or am I missing something?


#4

Thanks for sharing your thoughts on different strategies. After thinking about this a bit, here’s what I’m going to try. Instead of having separate projects and trying to pull content in from a parent project, I’ll just have one project that has various folders. In my Jekyll project, I’ll simply pull from the content I need because it will all be in the same project.

For example, in a pages folder, I might have project_a, project_b, and project_c. Then I’ll just have to filter out content for those projects when I build my output.
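One way to do that filtering is Jekyll's `exclude` configuration option. Here is a minimal sketch that generates a per-project override config, assuming the three folder names above live under `pages/`. The helper names and the override file name are hypothetical:

```ruby
require "yaml"

# Project folders under pages/, taken from the example above.
ALL_PROJECTS = %w[project_a project_b project_c].freeze

# Return the folders to exclude so that only the `wanted` projects are built.
def exclude_paths(wanted, all_projects = ALL_PROJECTS, root = "pages")
  (all_projects - wanted).map { |name| File.join(root, name) }
end

# Build the YAML text of a small override config for one output.
def override_config(wanted)
  { "exclude" => exclude_paths(wanted) }.to_yaml
end

puts override_config(%w[project_a])
```

Saved as, say, `_config_project_a.yml`, the result could be layered on top of the shared config with `jekyll build --config _config.yml,_config_project_a.yml`, where settings in the later file override the earlier one.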

One other challenge is that we have several other writers working on other projects. Up until now, they have been handling their content separately from me. Everyone has their own Jekyll project for the different products.

In this new paradigm, we would all be working within the same large project. As a result, when each person gets the latest files from the repo, there will be a lot of files from other projects. But as long as we filter them out when we build our own outputs, then it should work. If everyone works in the same project, we’ll be more consistent with our styles, variables, and other approaches.

However, as a drawback, if someone puts some content in the project that doesn’t validate (like Liquid tags that aren’t valid), it will break the builds for everyone else. Yet at the same time, others could more easily help each other out when their sites don’t build.


#5

The more writers you have, the more important validation and checks are. Can you lint liquid tags? I’ve not checked.

Regarding includes, they are easy enough to find, but it does depend on your include syntax. In Jekyll / Markdown / Liquid you can either do it with a simple grep or something, or get cleverer and use the parser.

I find it easier if you put all of your boilerplate text in one place, and then include it in, rather than arbitrarily include it from within the flow of the document.


#6

Jekyll will complain if you add a tag that makes no sense {% like this one %}, but it won’t complain if you have one {{ like this }}. Here’s an open issue to get around this.

At GitHub we just grep around for text that looks like {{ }} and fail the build if it finds something unmatched.
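A check along those lines might look like the sketch below. It is not the actual GitHub script, only an illustration that assumes each `{{ }}` tag opens and closes on the same line; the function name is hypothetical:

```ruby
# Return [line_number, line] pairs where the counts of "{{" and "}}" differ.
# Assumption: output tags never span more than one line.
def unbalanced_output_tags(text)
  problems = []
  text.each_line.with_index(1) do |line, number|
    opens = line.scan("{{").size
    closes = line.scan("}}").size
    problems << [number, line.chomp] if opens != closes
  end
  problems
end

sample = "Use {{ site.product }} here.\nBroken {{ site.product } tag.\n"
unbalanced_output_tags(sample).each do |number, line|
  puts "line #{number}: #{line}"
end
```

A build script could call this on every source file and exit with a non-zero status if any file returns a non-empty list.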

I find it easier if you put all of your boilerplate text in one place, and then include it in,

This is also something we do, with Jekyll’s data files. We have YAML that looks like this:


product_name:
  {% if page.version == 'dotcom' %}
    'GitHub'
  {% else %}
    'GitHub Enterprise'
  {% endif %}

and then filter the content appropriately by setting page versions. I’ve got a few Jekyll plugins to support this kind of workflow; I can no longer remember what’s core Jekyll and what I’ve added.


#7

Garen, I would love to use this plugin: https://github.com/gjtorikian/jekyll-conrefifier

Being able to put variables or Liquid tags into YAML areas of a Jekyll project would be huge. However, I can’t figure out how to get that plugin working. There don’t seem to be any instructions for implementing it. Can you put some info there about how to get it working?

For example, are there specific dependencies? Do you put certain files inside the _plugins folder? Then run bundle install? Sorry, I’m kind of in the dark about the whole plugin architecture with Jekyll. It seems like whenever I try implementing a plugin, I run into gem errors. That’s why I’ve tried to stay away from them, but I would also really like to overcome some shortcomings with Jekyll through plugins.


#8

Sure, I’ll update the README in the next few days.

Do you put certain files inside the _plugins folder? Then run bundle install?

In general, you’d create a file called Gemfile that looks like this:

source 'https://rubygems.org'

group :jekyll_plugins do
  gem "jekyll-conrefifier"
end

And then run bundle install. That should get you all set up.


#9

Garen, sorry for my slow reply to this thread. Thanks for adding the installation instructions. However, when I include the code you listed in my Gemfile, run bundle install, and then try to build my project, I get this error:

/Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/jekyll-conrefifier-0.5.1/lib/jekyll-conrefifier.rb:80:in `alias_method': undefined method `read_collections' for class `Jekyll::Site' (NameError)
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/jekyll-conrefifier-0.5.1/lib/jekyll-conrefifier.rb:80:in `<class:Site>'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/jekyll-conrefifier-0.5.1/lib/jekyll-conrefifier.rb:79:in `<module:Jekyll>'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/jekyll-conrefifier-0.5.1/lib/jekyll-conrefifier.rb:1:in `<top (required)>'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/bundler-1.10.6/lib/bundler/runtime.rb:76:in `require'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/bundler-1.10.6/lib/bundler/runtime.rb:76:in `block (2 levels) in require'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/bundler-1.10.6/lib/bundler/runtime.rb:72:in `each'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/bundler-1.10.6/lib/bundler/runtime.rb:72:in `block in require'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/bundler-1.10.6/lib/bundler/runtime.rb:61:in `each'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/bundler-1.10.6/lib/bundler/runtime.rb:61:in `require'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/bundler-1.10.6/lib/bundler.rb:134:in `require'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/jekyll-3.0.0/lib/jekyll/plugin_manager.rb:39:in `require_from_bundler'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/gems/jekyll-3.0.0/bin/jekyll:13:in `<top (required)>'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/bin/jekyll:23:in `load'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/bin/jekyll:23:in `<main>'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/bin/ruby_executable_hooks:15:in `eval'
	from /Users/tjohnson/.rvm/gems/ruby-2.0.0-p481/bin/ruby_executable_hooks:15:in `<main>'

Am I missing something? Is the plugin compatible with Jekyll 3.0? Any tips or insights you can give me would be much appreciated. Thanks,

Tom


#10

Also, I’m not sure why I can’t just add the gem in my Gemfile like this:

gem 'jekyll-conrefifier', '~> 0.5.1'

When I add the gem this way and run bundle install, then build my project, I don’t get the error. However, the plugin doesn’t seem to work. When I reference a value in my config or data file in a page title, it doesn’t render.
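Both symptoms look consistent with the trace above. It shows Jekyll loading plugins through Bundler (`require_from_bundler`), so a gem listed outside the `:jekyll_plugins` group may simply never be loaded, which would explain why there is no error and no effect. Inside the group the plugin does load, and its `alias_method` call fails because `Jekyll::Site` in 3.0.0 has no `read_collections` method. A small sketch of that failure mode, using a stand-in class (`HostSite` and the helper names are hypothetical, not Jekyll or plugin code):

```ruby
# Stand-in for a host class that lacks the method a plugin expects.
# (HostSite is hypothetical; it is not Jekyll's real Site class.)
class HostSite
  def read
    "reading site"
  end
end

# Unconditional aliasing, as a plugin might do at load time.
# Raises NameError when the method is not defined on the class.
def patch_unconditionally(klass, method_name)
  klass.class_eval { alias_method :"original_#{method_name}", method_name }
end

# Defensive variant: only alias when the method actually exists.
def patch_if_present(klass, method_name)
  defined_here = klass.method_defined?(method_name) ||
                 klass.private_method_defined?(method_name)
  return false unless defined_here

  patch_unconditionally(klass, method_name)
  true
end

begin
  patch_unconditionally(HostSite, :read_collections)
rescue NameError => error
  puts "load-time failure: #{error.message}"
end
```

If this reading is right, the fix would have to come from a plugin release that targets Jekyll 3.0, or from pinning Jekyll to a version the plugin was written against.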


#11

Great discussion. This is partly what got us started on Corilla in the first place. Except we were already happy with our own version control. Plus we noticed that our hosted editor solved our needs for collaboration with multiple team members being able to view and edit the same topic.

So that separated the CCMS topic from the publishing/output topic. We learned a lot about what not to do when working on PressGang CCMS at Red Hat. For various reasons we used Publican, which even running locally was a nuisance. I had to update an entire OS once to be able to access a Publican update just to run a fresh build for minor edits. Our customers suffered because our workflow was over-complicated.

The experience made me focus on UX issues in technical writing, rather than my own preference for a command-line workflow. I often wonder if this attitude of “local editor, GitHub repository, server with static site builder” is really the best we can do for writers, who really should be writing but who should also be able to collaborate in real time at scale (e.g. what we built into the Corilla editor).

Technical peeps like us love tinkering, but our research has shown that writers want to experience the kind of UX that other content and creative roles are enjoying. I’m not sure that translates into “pull a repo to sift through thousands of source files via command line”. Just my first JBoss EAP 6 guide was a few hundred topics that I wrote myself, let alone the team across the suite, let alone other Middleware products, let alone the entire company’s writers. Even if we do want to manage that locally… what else could we be doing instead?

I used to think I’d never leave the command line, but freeing myself up from obsessive svn st and svn up or manual xi:include madness helped me focus on higher level problems. Plus I feel the fundamental abstraction of content authoring and content publishing is worthy to uphold (even if they can be bundled into the same workflow or even SaaS tool).

Moving this into a shared environment and adding a graphical repo with search and tagging worked for us. Lots of work to do though and I think even the GUI aspects will always draw from what we do well from a command line workflow. Great topic @tomjohnson1492, I’d love to hear how your new process goes.