LaTeX to Word document

Jan 24, 2014
Tested on: Linux Mint 13 (Ubuntu 12.04), Microsoft Office 2010, LibreOffice 3

It is common to typeset scientific documents in \(\LaTeX\). However, some people still prefer to read and edit documents in word processors such as Microsoft Word (.doc or .docx) or LibreOffice (.odt). Here are quick (& dirty) steps to produce reasonable-looking "office" documents from .tex files. The idea is to use HTML, which word processors can read, as an intermediate format. This idea is shown below and has been explored and reported widely.
.tex → .html → .docx
  1. Convert \(\LaTeX\) to .html: There are various ways to achieve this, but htlatex from TeX4ht worked best for me. First install TeX4ht, then run the following on the \(\LaTeX\) document. Note that htlatex runs latex three times before calling the TeX4ht programs, and it writes all of its output to the same directory as the .tex file (see below for handling PDF images). The generated files include one .html file (paper.html for the commands shown below).
       $ sudo apt-get install tex4ht
       $ htlatex paper.tex
  2. PDF images: If the .tex file uses PDF files as images, htlatex (or latex) may not convert them correctly. Some discussions on Stack Exchange were very helpful for this issue and provided the solution. First create a configuration file, say myxhtml.cfg, with the following content:
       \Preamble{xhtml}
       \Configure{graphics*}
         {pdf}
         {\Needs{"convert \csname Gin@base\endcsname.pdf \csname Gin@base\endcsname.png"}%
          \Picture[pict]{\csname Gin@base\endcsname.png}%
          \special{t4ht+@File: \csname Gin@base\endcsname.png}
         }
       \begin{document}
       \EndPreamble
     The \Needs line has TeX4ht run ImageMagick's convert command to turn each PDF image into a PNG, so ImageMagick must be installed. Then run htlatex as shown below. Note that the .tex file should give the correct extension for every included image (or else use \DeclareGraphicsExtensions{.pdf}).
       $ htlatex paper.tex myxhtml
  3. Fix HTML: Sometimes TeX4ht does not write out clean HTML; for example, tags may not be properly nested or closed. This can prevent word processors from importing the file. Online tools such as Fix My Html and HTML Tidy Online did the trick for me: give them the .html file generated by htlatex as input and save the fixed HTML file for the next step.
  4. Import HTML into the word processor: Most word processors support importing HTML files. If that does not work (as in my case), open the HTML file in a browser and copy-paste the whole page into the word processor (check whether your word processor also supports Paste Special for formatted text). If everything works out, you can then save the pasted document in .doc, .docx or .odt format.
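If steps 1 and 2 have to be repeated often, the htlatex call can be scripted. The following is a minimal Python sketch, not part of the workflow described above: the helper names htlatex_command and run_htlatex are hypothetical, and it assumes htlatex is on the PATH and that any .cfg file (such as myxhtml.cfg) sits next to the .tex file.

```python
import subprocess
from pathlib import Path


def htlatex_command(tex_path, cfg_name=None):
    """Build the htlatex command line from steps 1 and 2 (hypothetical helper)."""
    tex = Path(tex_path)
    cmd = ["htlatex", tex.name]
    if cfg_name:
        # Configuration name without the .cfg suffix, e.g. "myxhtml".
        cmd.append(cfg_name)
    return cmd


def run_htlatex(tex_path, cfg_name=None):
    """Run htlatex from the .tex file's directory and return the expected .html path."""
    tex = Path(tex_path).resolve()
    # Running in the .tex directory keeps the outputs next to the .tex file.
    subprocess.run(htlatex_command(tex, cfg_name), cwd=tex.parent, check=True)
    return tex.with_suffix(".html")
```

For example, run_htlatex("paper.tex", "myxhtml") corresponds to the command in step 2 and returns the path where paper.html is expected.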
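The \Needs line in the step 2 configuration asks TeX4ht to run ImageMagick's convert on each PDF image. If you would rather pre-generate the PNG files yourself, a rough Python sketch like the one below can list the needed commands. It is only an illustration under stated assumptions: pdf_convert_commands is a hypothetical helper, its regular expression only recognises \includegraphics calls whose argument ends in an explicit .pdf, and it ignores images found through \DeclareGraphicsExtensions.

```python
import re

# Matches \includegraphics[options]{path.pdf}; assumes the .pdf extension is
# written explicitly in the .tex source (hypothetical pattern, not from TeX4ht).
INCLUDE_PDF = re.compile(r"\\includegraphics\s*(?:\[[^\]]*\])?\s*\{([^}]+?\.pdf)\}")


def pdf_convert_commands(tex_source):
    """Return one ImageMagick `convert in.pdf out.png` command per distinct PDF image."""
    commands = []
    seen = set()
    for match in INCLUDE_PDF.finditer(tex_source):
        pdf_path = match.group(1).strip()
        if pdf_path in seen:
            continue  # the same figure may be included more than once
        seen.add(pdf_path)
        png_path = pdf_path[: -len(".pdf")] + ".png"
        commands.append(["convert", pdf_path, png_path])
    return commands
```

Each returned list could be passed to subprocess.run(cmd, check=True) to do the conversion; ImageMagick usually needs Ghostscript installed to read PDF input.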
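For step 3, it can help to see where the tags go wrong before handing the file to an online fixer. Below is a minimal sketch using Python's standard html.parser module; it only reports problems and does not repair them. The names TagBalanceChecker and find_unbalanced_tags and the list of void elements are assumptions of this sketch. Because htlatex is asked for xhtml output, it assumes every non-void element should be explicitly closed, so ordinary HTML that legitimately omits closing tags such as </p> or </li> would be reported too.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (assumed list for this sketch).
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Records closing tags that do not match the most recently opened tag."""

    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, line) for each currently open element
        self.problems = []  # human-readable descriptions of mismatches

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        # XHTML-style self-closing tags such as <br /> are already balanced.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        line = self.getpos()[0]
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
        elif any(open_tag == tag for open_tag, _ in self.stack):
            # Report every element left open inside this one, then close it.
            while self.stack[-1][0] != tag:
                open_tag, open_line = self.stack.pop()
                self.problems.append(
                    f"<{open_tag}> opened on line {open_line} "
                    f"not closed before </{tag}> on line {line}")
            self.stack.pop()
        else:
            self.problems.append(f"stray </{tag}> on line {line}")


def find_unbalanced_tags(html_text):
    """Return a list of tag-nesting problems found in html_text."""
    checker = TagBalanceChecker()
    checker.feed(html_text)
    checker.close()
    for open_tag, open_line in checker.stack:
        checker.problems.append(f"<{open_tag}> opened on line {open_line} never closed")
    return checker.problems
```

For example, calling find_unbalanced_tags on the contents of paper.html returns an empty list when every tag is properly closed, and otherwise points at the lines worth inspecting.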
The above steps produced a good-looking, single-column document for me with correct bibliography references. The math equations are embedded as images in the document, which may not be optimal, but the process works well enough for reviewing and editing the text by someone who prefers word processors.