Tabular Data Extraction Ver. 1.0.1, Finally on Linux OS for Mobile Phones
Coming from the world of data journalism, Tabula is a tool for easily extracting tabular data from PDF files. It was developed by the Pyneo team for the best operating system for mobile. Its use goes well beyond journalism, though: students, researchers and others can benefit from it too.
If you have ever tried to copy and paste tables from PDF documents in order to rework them in LibreOffice, for example, or to save them in CSV format, you know how complicated and time-consuming it is.
Free of charge and free software (MIT License), Tabula runs on Mac, Windows and Linux. Written in Ruby and running on the JVM, Tabula is a powerful web application with two table detection modes:
- automatic detection of the whitespace between columns (stream mode);
- automatic detection of the ruling lines that separate cells (lattice mode).
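To make the idea behind stream mode more concrete, here is a minimal Python sketch that splits fixed-width text lines into columns wherever a character position is blank in every line. It is only an illustration of whitespace-based column detection, not Tabula's actual algorithm; the function name `split_columns_by_whitespace` and the sample data are hypothetical.

```python
def split_columns_by_whitespace(lines):
    """Split fixed-width text lines at positions that are blank in every line.

    Illustrative only: a simplified stand-in for whitespace-based
    column detection, not the algorithm Tabula itself uses.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]

    # A character position counts as a gap if it is a space in every line.
    is_gap = [all(row[i] == " " for row in padded) for i in range(width)]

    # Collect (start, end) spans of consecutive non-gap positions.
    spans = []
    start = None
    for i, gap in enumerate(is_gap):
        if not gap and start is None:
            start = i
        elif gap and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, width))

    # Cut every line at the same spans and trim the cell contents.
    return [[row[a:b].strip() for a, b in spans] for row in padded]


# Hypothetical sample: three rows as they might appear in a text PDF.
sample = [
    "Name      Qty   Price",
    "Apples    12    3.50",
    "Pears     7     12.25",
]
table = split_columns_by_whitespace(sample)
for row in table:
    print(row)
```

A sketch this simple will wrongly split a cell when every row happens to have a space at the same position inside it, which is one reason a manual selection and a preview step remain useful.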
Tabula was designed so that you stay in control of your data. At no time do your files travel over the internet: although you use Tabula through your browser, it runs entirely on your own machine.
Tabula can also be installed on a LAN.
Limitation: the creators of the software specify that Tabula is designed for text-based PDFs. It does not work on image-only PDFs (scans).
From personal experience, however, good results can be obtained on OCR-processed scans of good resolution (400 DPI) saved in uncompressed PDF format.
The software is released today in version 1.0.1, which corrects some bugs in version 1.0.
How Can Tabula Help Me?
If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is: there’s no easy way to copy and paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV file or a Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux.
How to Use Tabula
- Upload a PDF file containing a data table.
- Browse to the page you want, then select the table by clicking and dragging to draw a box around the table.
- Click “Preview & Export Extracted Data”. Tabula will try to extract the data and display a preview. Inspect the data to make sure it looks correct. If data is missing, you can go back to adjust your selection.
- Click the “Export” button.
- Now you can work with your data as a text file or a spreadsheet rather than a PDF! (You can open the downloaded file in Microsoft Excel or the free LibreOffice Calc.)
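Once the data is exported, it can also be processed with a script instead of a spreadsheet. The following Python sketch reads CSV text with the standard `csv` module and sums one column. The column names and values are made up for illustration; with a real export you would open the downloaded file instead of using the inline string.

```python
import csv
import io

# Hypothetical contents of a CSV exported from Tabula. For a real file,
# use open("your-export.csv", newline="") instead of io.StringIO.
exported = "Name,Qty,Price\nApples,12,3.50\nPears,7,12.25\n"


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(exported)

# Cells arrive as strings, so convert them before doing arithmetic.
total_qty = sum(int(row["Qty"]) for row in rows)
print(total_qty)
```

Because every extracted cell is a string, it is worth checking the preview for stray spaces or merged cells before relying on numeric conversions like the one above.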
Note: Tabula only works on text-based PDFs, not scanned documents.
What’s new in version 1.0
- new user interface;
- bug fixes;
- addition of Lattice detection mode;
- improved detection of unmarked columns;
- the OS X version now ships its own version of the JVM.