Read document structure and apply PDF accessibility tag? #873

Open
gobsmack opened this issue Aug 2, 2024 · 2 comments
Labels
document-editing (Related to creating or editing/modifying documents), question

Comments

@gobsmack

gobsmack commented Aug 2, 2024

I am working on a project to make existing PDF documents accessible. I am trying to switch from PdfSharp to PdfPig because it seems to handle metadata better. My goal is to read the document structure, and then add accessibility tags to the PDF.

I've got a pretty good start using the PDFMerger: I can copy the input document to an output document with all the metadata. But I'm stuck on the tags, which are the whole point. The output claims to be a tagged PDF, but there are no tags in it.

It looks like this is all written into PdfPig already. PdfPig can analyze the document structure. So, I wonder if there is a way to write the tags based on the document structure.

Am I missing something? Could somebody point me in the correct direction?

@gobsmack
Author

gobsmack commented Aug 2, 2024

More specifically, I wonder if it's possible to create the tag structure based on the bookmark structure.
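One way to picture the bookmark-to-tag idea is a rough, library-agnostic sketch in Python. It assumes each bookmark becomes a heading whose level follows its nesting depth, using the standard structure types H1 to H6. The `Bookmark` class and `bookmarks_to_headings` function are hypothetical names for illustration and are not part of PdfPig or PdfSharp.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bookmark:
    """Hypothetical stand-in for one outline (bookmark) node."""
    title: str
    page: int
    children: List["Bookmark"] = field(default_factory=list)


def bookmarks_to_headings(nodes, level=1):
    """Flatten a bookmark tree into (structure_type, title, page) tuples.

    Nesting depth maps to the standard structure types H1..H6;
    anything nested deeper than six levels is clamped to H6.
    """
    headings = []
    for node in nodes:
        headings.append((f"H{min(level, 6)}", node.title, node.page))
        headings.extend(bookmarks_to_headings(node.children, level + 1))
    return headings


outline = [
    Bookmark("Introduction", 1),
    Bookmark("Methods", 2, [Bookmark("Sampling", 2), Bookmark("Analysis", 3)]),
]
for tag, title, page in bookmarks_to_headings(outline):
    print(tag, title, page)
```

This only yields a heading skeleton. A usable tagged PDF would also need the page content itself marked up (marked-content sequences with MCIDs that the structure elements point to), which bookmarks alone cannot supply.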

@EliotJones
Member

Sorry, it has been so long since I last worked with PDFs that I don't recall what is and isn't available. On that basis, I think the library as-is probably doesn't support this. Editing is not really full-featured yet. PDFSharp may have a better editing story here.

I think if tags are per-page it would be possible to insert them directly, but I assume the tags you're referring to are a document-level structure like AcroForms or Bookmarks. Unfortunately there's no API support for writing custom objects at this level yet. There are 2 paths to enable such a thing. One is dedicated support for tags à la PdfDocumentBuilder.CreateBookmarkTree. The other is support for writing arbitrary PDF objects in PdfDocumentBuilder, which would probably be a fairly well-isolated change: it would just require being able to call context.WriteToken(...) for some list of objects attached to the builder, and adding the required keys to the catalog or trailer dictionaries to plug in the user's desired functionality. In this case it sounds like you need to set both the MarkInfo and StructTreeRoot entries in the catalog dictionary as well as write the actual StructTree. This is not something the library currently supports, alas.
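To make the catalog entries concrete, here is a minimal sketch in plain Python that emits the raw PDF object syntax involved: a `/MarkInfo` dictionary, a `/StructTreeRoot` reference, and one `/StructElem` per heading. It does not use PdfPig. The function names and the starting object number `100` are assumptions for illustration, and a real writer would also have to allocate object numbers, update the cross-reference table, and link elements to page content.

```python
def pdf_string(text):
    """Escape text as a PDF literal string (ASCII only in this sketch)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def build_structure_objects(headings, first_obj=100):
    """Return ({object_number: body}, catalog_entries) for a flat heading list.

    headings: list of (structure_type, title) pairs, e.g. ("H1", "Intro").
    first_obj: hypothetical first free object number in the output file.
    """
    root_num = first_obj
    elem_nums = [first_obj + 1 + i for i in range(len(headings))]

    # The structure tree root lists its children under /K.
    kids = " ".join(f"{n} 0 R" for n in elem_nums)
    objects = {root_num: f"<< /Type /StructTreeRoot /K [{kids}] >>"}

    # Each structure element names its type (/S), parent (/P) and title (/T).
    for num, (tag, title) in zip(elem_nums, headings):
        objects[num] = (
            f"<< /Type /StructElem /S /{tag} /P {root_num} 0 R "
            f"/T {pdf_string(title)} >>"
        )

    # Entries that would be added to the document catalog dictionary.
    catalog_entries = f"/MarkInfo << /Marked true >> /StructTreeRoot {root_num} 0 R"
    return objects, catalog_entries


objects, catalog_entries = build_structure_objects([("H1", "Intro"), ("H2", "Scope")])
print(catalog_entries)
for number, body in objects.items():
    print(f"{number} 0 obj\n{body}\nendobj")
```

The structure elements here carry no `/K` entries of their own, so they describe an outline without pointing at any page content. That is enough to show where the objects plug in, but not enough for an accessible document.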

EliotJones added the question and document-editing labels Sep 29, 2024