Converting from word processor to Markdown

I was thinking about how complex documents might be converted to Markdown so they can be pasted here as Commentaries. There are lots of bad ways, but the following is one way of producing good-quality Markdown from a word processor with only a little hand tweaking. To do this you’ll need LibreOffice with the LaTeX plugin, pandoc, and a text editor that supports regular expressions.

The process was this.

  1. Create an .odt file in LibreOffice, using the following. The first three items in the list must be applied using styles, i.e. from “Styles and Formatting” > “Paragraph Styles”:
  • Headings—not Titles,
  • Preformatted Text (only works for block elements),
  • Quotations,
  • ordered and unordered lists,
  • hyperlinks,
  • italics and bold,
  • footnotes.
  2. Export it to LaTeX using the LibreOffice LaTeX extension, selecting “Clean Article”. (This works much better than exporting to HTML, etc.)

  3. Run pandoc --no-wrap file.tex -o file.text. (Pandoc choked on the original .odt file, so I couldn’t test going straight from .odt to Markdown.)

  4. Run the following regex Find & Replace:
     Find: \[\^(\d+?)\]\: (.*)
     Replace: \[\^\1\]\: h "\2"

This adds an “h” to the footnotes, which Markdown interprets as the start of an http://... and thus treats as a link. The footnotes become “title” attributes in HTML. I’ve made them prettier by removing ^ and wrapping in <sup></sup>. Note that, as the footnotes are title attributes, you can’t apply formatting to them, and you can’t use double quotes (“”) inside the notes.
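For anyone who would rather script the pandoc and Find & Replace steps than do them by hand, here is a rough Python sketch. The function names are made up for illustration. As far as I know, newer pandoc releases spell the no-wrap option --wrap=none rather than --no-wrap, so treat that flag as an assumption and adjust it for your version. It does not do the <sup></sup> prettifying.

```python
import re
import subprocess

# The post's Find pattern, minus the unneeded escape before the colon:
# matches a footnote definition such as "[^3]: Some note text" and
# captures the footnote number and the rest of the line.
FOOTNOTE_DEF = re.compile(r'\[\^(\d+?)\]: (.*)')


def latex_to_markdown(tex_path: str, out_path: str) -> None:
    """Run pandoc on the exported LaTeX file (pandoc must be on the PATH)."""
    # Assumption: --wrap=none is the current spelling of the post's
    # --no-wrap flag; swap it back if your pandoc is an older release.
    subprocess.run(
        ["pandoc", "--wrap=none", "-f", "latex", "-t", "markdown",
         tex_path, "-o", out_path],
        check=True,
    )


def footnotes_to_link_titles(markdown: str) -> str:
    """Rewrite each '[^n]: text' definition as '[^n]: h "text"'."""
    return FOOTNOTE_DEF.sub(r'[^\1]: h "\2"', markdown)
```

Calling latex_to_markdown("file.tex", "file.text") and then passing the contents of file.text through footnotes_to_link_titles should roughly reproduce those two steps. Inline references such as [^1] in the body text are left alone, because the pattern only matches when a colon and a space follow the closing bracket.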

And there you have it. This uses pretty much all the kinds of markup that Markdown recognizes, and it’s mostly all automatic. The following test doc was produced with this method.

This is a “glass half full” solution. On the one hand, it’s possible to very simply convert complex documents to Markdown so they can be pasted straight into Discourse. On the other hand, the constraints of creating the document suggest that it’s unlikely that messy real-world documents will come smoothly through this process.


Here is a heading

Here is some text. Here is some italics. Here is some bold. Here is a link. Here is a footnote.1

Here is some text. Here is some italics. Here is some bold. Here is a link. Here is a footnote.2

Here is some text. Here is some italics. Here is some bold. Here is a link. Here is a footnote.3

Here is a subheading

Here is a subsubheading

Here is a subsubsubheading

  • Here is an unordered list.

  • Here is an unordered list.

  • Here is an unordered list.

Here is some text. Here is some italics. Here is some bold. Here is a link. Here is a footnote.4

  1. Here is an ordered list.

  2. Here is an ordered list.

  3. Here is an ordered list.

Here is some monospace text.

Here is a blockquote. Here is a blockquote.

There is another way to convert .doc files to Markdown. Try Writage, an MS Word plugin that lets you save a Word document as a Markdown file. All required styles are supported.

It is also possible to open and edit a Markdown file in MS Word as a normal document, without Markdown tags.

Thanks for the tip. One of the big problems is support for footnotes: do you know how this plugin handles them? Footnote support in Markdown is coming to Discourse later in the year; hopefully this plugin will play nice with that.

Footnotes with Writage won’t be a problem, as it supports the Pandoc version of Markdown, which includes footnotes. Here is a screenshot with a footnotes example: https://dl.dropboxusercontent.com/u/50217142/footnotes.PNG

That’s great, but Discourse doesn’t support all the markdown extensions available on Pandoc. If you have Writage, would you be able to make a sample document with some footnotes and see what happens when you paste it here?