Spreadsheets as a data source for LaTeX

The problem

In the process of organising ICANN 2016, I made the pdf version of the conference programme, using LaTeX, of course 🙂

The programme includes the list of all the ~170 talks and posters, including authors, divided in the different conference sessions. All these data were prepared by the other organisers in a spreadsheet (an xlsx Excel file, for the records) and the way they had proceeded in previous editions of the conference was to copy paste all these details in the LaTeX file, hoping not to make many mistakes. One problem with this is that I hate copy-pasting 🙂 It is tedious, and every time I do this operation, there is a (small) chance that I will make a mistake. If I have to copy-paste more than 400 times, the chances increase a lot. Also, it happens that we have to change something in the Excel table on the way, and with the copy-paste approach we should always remember to propagate the changes in the LaTeX file. These thoughts led me to the question:

Can we use a spreadsheet as a data source for an LaTeX file, including automatically the data of the relevant cells in the appropriate places in the LaTeX code?

This would solve both the issues above and automate our work! Surprisingly, I found only little help online, in the form of a perl script on the LaTeX stack exchange. I wanted something more flexible, so I decided to make my own script (in Python) to take care of the task. It works with Excel files (both xls and xlsx).

The code

Repo: https://github.com/pmasulli/spreadsheet_to_LaTeX

# -*- coding: utf-8 -*-

# Spreadsheet to LaTeX - import data in your LaTeX code automatically
# Copyright (C) 2016 Paolo Masulli
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software Foundation,
# Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301  USA

from xlrd import open_workbook
import re
import sys
import os.path

default_sheet_name = "Sheet1" # insert here the default sheet name in the xlsx file

def cell_coords(cell_ref):
    pos = re.search("[0-9]", cell_ref).start()
    col = cell_ref[:pos].upper()
    if len(col) == 1:
        col = ord(col) - ord('A')
    elif len(col) == 2:
        col = 26 * (1 + ord(col[0]) - ord('A')) + ord(col[1]) - ord('A')
    row = int(cell_ref[pos:]) - 1
    return row, col

if len(sys.argv) < 3:
    print "Usage: ", sys.argv[0], "datafile.xlsx templatefile"

xlsx_file = sys.argv[1]
template_file = sys.argv[2]
latex_file = template_file + ".tex"

if not (os.path.isfile(xlsx_file) and os.path.isfile(template_file)):
    print "Cannot find the data file or the template file."

print "Writing output in %s." % latex_file,

tf = open(template_file, 'r')
lf = open(latex_file, 'w')

wb = open_workbook(xlsx_file)

p = re.compile("(<data>(.+?)</data>)")

for line in tf:
    line_rep = line
    for f in p.findall(line):
        sheet_name = None
        coords = f[1]
        if ":" in coords:
            sheet_name = coords[:coords.find(":")]
            coords = coords[(coords.find(":")+1):]
        row, col = cell_coords(coords)
        if not sheet_name:
            sheet_name = default_sheet_name
        sheet = wb.sheet_by_name(sheet_name)
        value =  sheet.cell(row, col).value.encode('utf-8').replace("_",r'\_')
        line_rep = line_rep.replace(f[0], value)
        print ".",


print "Done."

I decided to use xlrd to read the spreadsheet. It seems quite solid, copes well will cells containing the result of formulae and its recent versions read both xls and xlsx.

The code above is a very preliminary version put together in an hour or so. It should definitely be improved (among the other things, I would like to support Libreoffice spreadsheets as well!).


Assume that you have your data contained in the file data.xslx and that you want to include those data in document.tex. Then prepare a LaTeX template where, instead of the data, you include tags of the form:


where SheetName is the name of the sheet in the Excel file and A2 and B3 the coordinates of some cell. Note that you can specify a default sheet name in the Python script at line 8 and then just use tags of the form:








Let's include a number for the spreadsheet: <data>B2</data>.


Once you have the template, open a terminal window (how this is done depends on your operating system — google is your friend!) and then move to the directory where you saved the script:

cd path/to/the/script

Afterwards, just run the script to generate the LaTeX file including the data from the spreadsheet:

python spreadsheet_to_latex.py data.xlsx document_template.txt

This will generate the LaTeX file document_template.txt.tex, which you can just feed to pdflatex:

pdflatex document_template.txt.tex
The resulting pdf file
The resulting pdf file