KevCaz's Website

Today I was asked whether I knew of a way to extract a table from a pdf file. I actually knew about one CLI tool, pdftotext, that converts a pdf file to a text file, and I remembered having used it for tables in the past. pdftotext was originally developed by Glyph & Cog, along with several other CLI tools to manipulate pdf files and the pdf viewer Xpdf. On Debian (and Debian derivatives), pdftotext and the other CLI tools are included in the package poppler-utils (built from Poppler, a fork of Xpdf), which can be installed like so:

$ sudo apt-get install poppler-utils

Once installed, the following command does the conversion:

$ pdftotext input.pdf output.txt

There are several additional options, and if you mean to extract a table, the -layout option is pretty helpful, as it maintains the original physical layout of the text (as explained in the documentation):

$ pdftotext -layout input.pdf table.txt
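
If the goal is to load the table somewhere else, the text produced with -layout can often be turned into a rough CSV. Below is a minimal sketch, not something pdftotext provides: `layout_to_csv` is a hypothetical helper of mine, and it assumes that columns are separated by at least two spaces and that no cell contains a comma or a run of two or more spaces.

```shell
# Hypothetical helper (not part of pdftotext): convert layout-preserved
# text into rough CSV. Assumes columns are separated by 2+ spaces and
# that cells contain neither commas nor runs of 2+ spaces.
layout_to_csv() {
  # 1. strip leading spaces, 2. strip trailing spaces,
  # 3. replace every run of two or more spaces with a comma.
  # Reads the files given as arguments, or stdin when there are none.
  sed -E 's/^ +//; s/ +$//; s/ {2,}/,/g' "$@"
}
```

With that function defined, `layout_to_csv table.txt > table.csv` converts the file created above, and `pdftotext -layout input.pdf - | layout_to_csv` does it in one go (passing `-` as the output file makes pdftotext write to stdout). Real tables with empty cells or single-spaced columns will likely need some manual cleanup afterwards.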

Pretty sweet 😄!