Page extraction with pdf4llm.to_markdown not extracting the first line of the page in specific scenarios #196
leelaraj72 asked this question in Q&A
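I am using the following simple Python code to extract a page (and the corresponding text) from a PDF file:

```python
import pdf4llm

# pages uses 0-based page numbers, so [13] selects the 14th page.
# page_chunks=True returns a list with one dict per page; the Markdown is under "text".
data = pdf4llm.to_markdown("../docs/abc.pdf", pages=[13], page_chunks=True)
```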
Answered by JorjMcKie on Nov 29, 2024
Replies: 1 comment, 11 replies
Answer selected by leelaraj72
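Try setting the `margins` parameter. The default is `margins=(0, 50, 0, 50)` (left, top, right, bottom), which ignores stripes of height 50 points at the top and bottom of the page, so a first line that sits inside the top stripe is dropped. Using `margins=0` looks at the full page.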
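To make this concrete, here is a minimal pure-Python sketch of how a margins value of the form (left, top, right, bottom) shrinks the part of the page whose text is kept. The helpers `normalize_margins`, `kept_region`, and `is_kept` are hypothetical and are not part of pdf4llm. The page size and line position are made-up example values, and the rule that a line is kept only if its whole bounding box lies inside the region is an assumption; the library may decide differently.

```python
# Hypothetical helpers for illustration only; this is NOT pdf4llm's implementation.

def normalize_margins(margins):
    """Expand a margins value into a (left, top, right, bottom) tuple."""
    if isinstance(margins, (int, float)):
        return (margins,) * 4  # a single number applies to all four sides
    if len(margins) == 4:
        return tuple(margins)
    raise ValueError("this sketch only accepts a number or 4 values")


def kept_region(page_width, page_height, margins):
    """Return the (x0, y0, x1, y1) rectangle left after removing the margins.

    Coordinates follow the PyMuPDF convention: origin at the top-left
    corner, y growing downward, units in points.
    """
    left, top, right, bottom = normalize_margins(margins)
    return (left, top, page_width - right, page_height - bottom)


def is_kept(line_bbox, region):
    """Assumed rule: keep a text line only if its bbox lies fully inside region."""
    x0, y0, x1, y1 = line_bbox
    rx0, ry0, rx1, ry1 = region
    return x0 >= rx0 and y0 >= ry0 and x1 <= rx1 and y1 <= ry1


# Made-up first text line whose top edge is 40 pt below the top of a
# US Letter page (612 x 792 pt).
first_line = (72, 40, 540, 52)

default_region = kept_region(612, 792, (0, 50, 0, 50))
full_region = kept_region(612, 792, 0)

print(is_kept(first_line, default_region))  # False: starts inside the 50 pt top stripe
print(is_kept(first_line, full_region))     # True: the whole page is considered
```

With the default margins, the made-up first line falls inside the ignored top stripe, which is consistent with the symptom in the question; with `margins=0` it is kept. In the original snippet, the corresponding change is passing `margins=0` to `pdf4llm.to_markdown`.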