Extract Text from PDF without Including Table Context: PymuPDF4llm #184
VaheSahakyan23
started this conversation in
General
-
I understand what you mean, but allow me to state that this is logically impossible to do. Or are you saying that you want to exclude a table's content even when it has been accepted as a table?
-
Hello,
I wanted to inquire if there is currently a feature, or if there are plans to introduce one, in PyMuPDF4LLM that allows text extraction from PDF files while ignoring table data (i.e., excluding table context from the extracted text).
This feature would be particularly useful for documents with complex table structures that may not be extracted correctly. Including such tables often results in messy or unreadable text in the output.
Looking forward to your thoughts or suggestions on this!
Thank you!
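For readers with the same question, here is a minimal sketch of one possible workaround. It is not a built-in PyMuPDF4LLM feature and it produces plain text rather than Markdown. It uses the underlying PyMuPDF library: `page.find_tables()` to detect tables and `page.get_text("blocks")` to get text blocks with their coordinates. It then drops every block that overlaps a detected table. The helper names (`overlaps`, `drop_table_blocks`, `extract_text_without_tables`) are made up for this illustration. The sketch assumes a PyMuPDF version that supports `import pymupdf`; older installs use `import fitz` instead. As the reply above points out, this can only exclude tables that are actually detected, so a table the detector misses still ends up in the output as ordinary text.

```python
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlaps(a: BBox, b: BBox) -> bool:
    """Return True if the two rectangles share a non-empty area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def drop_table_blocks(blocks: Sequence[tuple], table_bboxes: Sequence[BBox]) -> List[str]:
    """Return the text of all text blocks that do not overlap any table bbox.

    Each block is a tuple as returned by page.get_text("blocks"):
    (x0, y0, x1, y1, text, block_no, block_type), where block_type 0 is text.
    """
    kept = []
    for block in blocks:
        if block[6] != 0:  # skip image blocks
            continue
        if any(overlaps(tuple(block[:4]), t) for t in table_bboxes):
            continue  # block lies (at least partly) inside a detected table
        kept.append(block[4])
    return kept


def extract_text_without_tables(pdf_path: str) -> str:
    """Hypothetical helper: plain text of a PDF minus detected tables."""
    import pymupdf  # imported here so the helpers above work without PyMuPDF

    parts: List[str] = []
    with pymupdf.open(pdf_path) as doc:
        for page in doc:
            # Bounding boxes of every table the detector found on this page.
            table_bboxes = [tuple(t.bbox) for t in page.find_tables().tables]
            parts.extend(drop_table_blocks(page.get_text("blocks"), table_bboxes))
    return "\n".join(text.strip() for text in parts)
```

Calling `extract_text_without_tables("input.pdf")` (the file name is a placeholder) would return the remaining text as one string. Dropping any block that merely touches a table region is a deliberately coarse choice: it may also remove a caption or paragraph that PyMuPDF groups into the same block as table cells.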