Repeated table extraction in Markdown output #168
Replies: 4 comments
-
I noticed the same issue while investigating performance improvements. I forked the repository, profiled the code, and found that skipping Markdown table extraction at this line let me process PDFs approximately five times faster, and those PDFs didn't contain many tables.
-
So I think this is a bug in the latest version. To avoid the duplicated table content, I had to go down a version with:
-
@PiochU19 @jamie-lemon This did indeed turn out to be a bug. I reported it as an issue in #171.
-
0.0.17 does not seem to have fixed this issue.
-
Hello there,
Thanks for the wonderful work! This outperforms even most commercial solutions out there!
I have a question regarding table extraction: when extracting a PDF page that contains a table to Markdown, it seems that the table's raw text is first extracted and put in place of the table, and then the formatted table is added at the bottom of the page.
Is this the desired output? Why?
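For anyone who needs a stopgap, the duplication described above (raw table text in place, formatted table at the bottom of the page) can be filtered out after conversion. The sketch below is only a rough heuristic on the Markdown string for a single page, not part of the library: the names `drop_duplicated_table_text` and `page_md` are hypothetical, and it assumes the raw text shows up as lines that match either one table cell or one whole table row joined by spaces.

```python
import re

# Matches a Markdown table separator row such as |---|:---:| or | --- | --- |
SEPARATOR_ROW = re.compile(r"^\|(\s*:?-+:?\s*\|)+$")


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so comparisons are forgiving."""
    return " ".join(text.split()).lower()


def _is_table_row(line: str) -> bool:
    stripped = line.strip()
    return len(stripped) >= 2 and stripped.startswith("|") and stripped.endswith("|")


def drop_duplicated_table_text(page_md: str) -> str:
    """Remove plain-text lines that repeat the content of a Markdown table.

    Heuristic only (an assumption, not the library's behaviour): a non-table
    line is dropped when it equals a single cell or a full row of cells
    joined by spaces. Legitimate text that happens to match a cell is
    dropped as well.
    """
    lines = page_md.splitlines()

    # Collect the text of every cell and every space-joined row.
    duplicates = set()
    for line in lines:
        stripped = line.strip()
        if not _is_table_row(line) or SEPARATOR_ROW.match(stripped):
            continue
        cells = [_normalize(c) for c in stripped.strip("|").split("|")]
        cells = [c for c in cells if c]
        if cells:
            duplicates.update(cells)
            duplicates.add(" ".join(cells))

    # Keep table rows untouched; drop plain lines that repeat table content.
    kept = []
    for line in lines:
        if not _is_table_row(line) and _normalize(line) in duplicates:
            continue
        kept.append(line)
    return "\n".join(kept)
```

This only hides the symptom in the output string; it does not avoid the extra extraction work, so it does nothing for the processing time.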