Error extracting data from PDF document: "No current point in closepath" #45

Open
DigitalLeaves opened this issue Dec 21, 2022 · 3 comments

@DigitalLeaves

Hello,
First of all, thanks a lot for the awesome work on this library. We have been using it for some time and are quite amazed by what you have built here.
Today we ran into this error with a PDF that otherwise appears to be perfectly fine:

  error: 'Syntax Error (30523): No current point in closepath\n' +
    'Syntax Error (30538): No current point in closepath\n' +
    'Syntax Error (30556): No current point in closepath\n' +
    'Syntax Error (30566): No current point in closepath\n',
  pdf_path: '../samples/515317730_121477412.pdf'

This is a searchable (text) PDF, so we are calling pdfOCR with the following options:

const ocrSearchableOptions = {
    type: 'text', // extract searchable text from PDF
    ocr_flags: ['--psm 1'], 
    enc: 'UTF-8',  
    mode: 'layout'
}

I can provide the PDF if it is needed for analysis.
Any help is greatly appreciated 🙏.
Thanks a lot in advance.

@DigitalLeaves
Author

Any info or help with this would be greatly appreciated 🙏

@DigitalLeaves
Author

DigitalLeaves commented Feb 7, 2023

The issue seems to be related to the PDF itself, which may be "broken" as far as pdftotext is concerned. However, other OCR tools and software seem to read it without problems (for example, Node PDF Text).

@DigitalLeaves
Author

More info: the pdf-text-extract package also uses pdftotext, but it seems to work with these files.
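
One possible explanation, which I have not confirmed against this library's source: pdftotext may print these "Syntax Error" lines to stderr as recoverable diagnostics while still producing text, and a wrapper that treats any stderr output as a failure would then reject the file. Below is a minimal sketch of a more tolerant call. `extractText` and `isOnlySyntaxWarnings` are hypothetical helpers of mine, not part of this library or of pdf-text-extract, and the sketch assumes the `pdftotext` binary is on the PATH.

```javascript
const { execFile } = require('node:child_process');

// Matches lines such as "Syntax Error (30523): No current point in closepath".
const SYNTAX_DIAGNOSTIC = /^Syntax (Error|Warning)( \(\d+\))?: /;

// Returns true when stderr is non-empty and every non-empty line
// looks like a syntax diagnostic of the form shown above.
function isOnlySyntaxWarnings(stderr) {
  const lines = stderr.split('\n').map((line) => line.trim()).filter(Boolean);
  return lines.length > 0 && lines.every((line) => SYNTAX_DIAGNOSTIC.test(line));
}

// Hypothetical helper: runs pdftotext and resolves with the extracted text.
// Assumption: syntax diagnostics on stderr are non-fatal when the exit code is 0.
function extractText(pdfPath) {
  return new Promise((resolve, reject) => {
    // '-' as the output file sends the extracted text to stdout.
    const args = ['-layout', '-enc', 'UTF-8', pdfPath, '-'];
    execFile('pdftotext', args, (err, stdout, stderr) => {
      if (err) return reject(err); // spawn failure or non-zero exit code
      if (stderr && !isOnlySyntaxWarnings(stderr)) {
        return reject(new Error(stderr)); // anything else is still treated as fatal
      }
      resolve({ text: stdout, warnings: stderr });
    });
  });
}
```

With something like this, `extractText('../samples/515317730_121477412.pdf')` would resolve with the text and pass the diagnostics along in `warnings` instead of failing, provided pdftotext itself exits successfully on that file.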
