Deep	Link	Demo	Notebook	Deep?	Reads image?	Detectron?	OCR included?	Seems to work	get pandas df?	get text?	get image?	throughput (cpu)
nougat	github		Nougat eval	✓	✓		✓	✓✓	latex table (mmd)	✓	✗	~330 s/page
gmft	github		gmft eval	✓	✓		✗	✓✓	✓	✓	✓	~1.867 s/page
img2table	github		img2table eval	✗	✓		✓	✓✓	✓	✓	✓	~1.45 s/page
unstructured	docs.unstructured.io		Unstructured eval	✓	✓	✓	✓	✓	✓ (html -> df)	✓	?	~15.35 s/page
open-parse (unitable)	github	openparse_quickstart.ipynb	open-parse eval	✓	✓			✓	✓ (html -> df)	✓	✓ (custom)	~126 s/page
open-parse (tatr)	github		open-parse eval	✓	✓			✓	✓ (html -> df)	✓	✓ (custom)	~4.992 s/page
open-parse (pymupdf)	github		open-parse eval	✗	✗			✗			✓ (custom)	~0.67 s/page
deepdoctection, tatr	github		deepdoctection tatr eval	✓	✓	✓	✓	✗ needs config			?	~58 s/page
surya	github		surya eval	✓	✓		✓	✓	✗	✗	✓	~60.679 s/page
paddleocr	github		https://medium.com/@malshanCS/automating-table-data-extraction-tools-and-techniques-for-efficiency-a29df313cbda#629d	✓	✓			?
alibaba/omniparser	github			✓	✓			?
alibaba/DocXChain	github			✓	✓			?
layoutparser (no commit in 2 yrs?)	github	https://github.com/Layout-Parser/layout-parser/blob/main/examples/OCR%20Tables%20and%20Parse%20the%20Output.ipynb		✓	✓	✓		unmaintained

doctr (not tbl focused)	github	https://huggingface.co/spaces/mindee/doctr		✓	✓			N/A	N/A

Non-deep
camelot	github		camelot eval	✗				✓ many false positives, needs config	✓	✓	possible	~1.82 s/page
pdfplumber	github		pdfplumber eval	✗				✗ or needs config			possible	~0.273 s/page
pymupdf	github		pymupdf eval	✗				✗ or needs config			possible	~0.250 s/page
pdfminer	github			✗
Proprietary
mathpix				✓				✓
Adobe Sensei	developer.adobe.com			✓				✓
AWS Textract				✓				✓
Azure Document Intelligence	azure.microsoft.com			✓				✓
Google Document AI	cloud.google.com			✓				✓
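
Several rows above list "html -> df" under "get pandas df?", meaning the tool returns the table as HTML and a DataFrame has to be built from it. The following is a minimal sketch of that conversion using only the Python standard library. The class `TableRows`, the helper `html_table_to_rows`, and the sample HTML are hypothetical illustrations, not the API or output of any tool in the table, and the parser assumes a single table with no nesting and no `rowspan`/`colspan`.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of a single, non-nested HTML table (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None outside <tr>
        self._cell = None   # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td> or <th>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Return the table as a list of rows, header row first."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical HTML standing in for what an "html -> df" tool might emit.
sample_html = (
    "<table>"
    "<tr><th>tool</th><th>s/page</th></tr>"
    "<tr><td>gmft</td><td>1.867</td></tr>"
    "<tr><td>img2table</td><td>1.45</td></tr>"
    "</table>"
)
rows = html_table_to_rows(sample_html)
header, body = rows[0], rows[1:]
```

With pandas installed, `pandas.DataFrame(body, columns=header)` turns the result into a DataFrame. `pandas.read_html` does the parsing and conversion in one call, but it needs an HTML parser backend such as lxml to be installed.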
