Camelot: PDF Table Extraction for Humans

Camelot is a Python library that extracts tables from PDFs and converts them into structured formats, such as a pandas DataFrame or a CSV file, so the data is easier to analyze, manipulate, and integrate with other tools.

To see how Camelot works, start by reading a PDF file named ‘foo.pdf’ that contains a table:

import camelot
tables = camelot.read_pdf('foo.pdf')
tables

The output, <TableList n=1>, shows that one table was extracted from the PDF file.
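When a PDF yields more than one table, it can help to list what was found before working with any single table. The sketch below is one way to do that. The helper names describe_tables and read_and_describe are hypothetical, and the code assumes each extracted table exposes page, order, and shape attributes, as Camelot's Table objects do. The stand-in object at the end uses made-up dimensions purely to show the output format.

```python
from types import SimpleNamespace


def describe_tables(tables):
    """Return one summary line per extracted table.

    Hypothetical helper: assumes each item has `page`, `order`, and
    `shape` attributes, as Camelot Table objects do.
    """
    return [
        f"page {t.page}, table {t.order}: {t.shape[0]} rows x {t.shape[1]} columns"
        for t in tables
    ]


def read_and_describe(path="foo.pdf", pages="1"):
    """Read a PDF with Camelot and summarize the tables it finds."""
    # Imported here so describe_tables can be tried without Camelot installed.
    import camelot

    tables = camelot.read_pdf(path, pages=pages)
    return describe_tables(tables)


# Stand-in for a Camelot table, with made-up dimensions.
example = SimpleNamespace(page=1, order=1, shape=(7, 7))
print(describe_tables([example]))
```

Calling read_and_describe('foo.pdf') on a real file returns one line per table. The pages argument takes a string such as '1' or '1,3-5' when the tables are not on the first page.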

Check the table's parsing report, which gives metrics on how well the table was extracted, then export the table to a CSV file named ‘foo.csv’. Camelot can also export tables to JSON, Excel, HTML, Markdown, and SQLite.

tables[0].parsing_report
# Output:
# {
#     'accuracy': 99.02,
#     'whitespace': 12.24,
#     'order': 1,
#     'page': 1
# }
tables[0].to_csv('foo.csv')  # or to_json, to_excel, to_html, to_markdown, to_sqlite
tables[0].df  # get a pandas DataFrame!
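The parsing report can also act as a gate before export, so that poorly extracted tables are not written to disk. The sketch below shows one possible approach. The helpers passes_quality and export_good_tables are hypothetical, and the thresholds of 90 for accuracy and 50 for whitespace are illustrative assumptions, not values recommended by Camelot.

```python
def passes_quality(report, min_accuracy=90.0, max_whitespace=50.0):
    """Return True if a parsing report meets the given thresholds.

    Hypothetical helper; the default thresholds are illustrative only.
    """
    return (
        report["accuracy"] >= min_accuracy
        and report["whitespace"] <= max_whitespace
    )


def export_good_tables(path="foo.pdf", out_prefix="foo"):
    """Export each table that passes the quality check to its own CSV file."""
    # Imported here so passes_quality can be tried without Camelot installed.
    import camelot

    written = []
    for i, table in enumerate(camelot.read_pdf(path)):
        if passes_quality(table.parsing_report):
            out_path = f"{out_prefix}-{i}.csv"
            table.to_csv(out_path)
            written.append(out_path)
    return written


# The parsing report shown above, checked against the default thresholds.
report = {"accuracy": 99.02, "whitespace": 12.24, "order": 1, "page": 1}
print(passes_quality(report))  # True
```

Tables that fail the check are skipped rather than exported, which leaves room to retry them with different extraction settings.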

Link to Camelot.

