-
I'm loading documents with these options:

PDFDocument.load(docData, {updateMetadata: false})

I have around 4000 documents, each of which has /CreationDate and /ModDate. To verify this I looked them up via regex (very inefficient...). The date value is always in the form described in the PDF reference. Yet when I try to read it through the API, PDFDocument -> getCreationDate() or getModificationDate(), I get undefined for around 1000 documents. My assumption is that the dates are present, but stored in a different location than where pdf-lib looks them up.

The document formats range from pdf-1.3 to pdf-1.7 and "PDF/X-1:2001". The issue is not bound to a particular format: every format has documents with and without found dates.

I looked into pdf-lib's code and found that the dates are looked up in the PDFDict:

doc.context.lookup(doc.context.trailerInfo.Info)

The document that produce

So I guess the /CreationDate and /ModDate objects must be somewhere else, because they are definitely present in every single one of the documents. Is there any way to look in the other places? The context and lookup docs in the API are pretty much void of comments.
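For reference, the regex check mentioned above can be sketched roughly as follows. This is a minimal sketch, not pdf-lib code: `parsePdfDate` and `findRawDates` are hypothetical helper names, and it assumes the dates are stored as literal strings, `(D:...)`, in an uncompressed part of the file. Dates stored as hex strings or inside compressed object streams would be missed.

```typescript
// Matches a PDF date string such as D:20230115103000+01'00'.
// Everything after the four-digit year is optional in the PDF date format.
const PDF_DATE =
  /^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?([+\-Z])?(\d{2})?'?(\d{2})?'?$/;

// Hypothetical helper: converts a PDF date string into a JavaScript Date.
function parsePdfDate(text: string): Date | undefined {
  const m = PDF_DATE.exec(text);
  if (!m) return undefined;
  // Missing fields fall back to the defaults: January 1st, 00:00:00, UTC.
  const [, year, month = "01", day = "01", hour = "00", min = "00", sec = "00",
    sign = "Z", offH = "00", offM = "00"] = m;
  const utc = Date.UTC(+year, +month - 1, +day, +hour, +min, +sec);
  if (sign === "Z") return new Date(utc);
  const offsetMs = (+offH * 60 + +offM) * 60000;
  // The fields are local time, and local = UTC + offset, so UTC = local - offset.
  return new Date(sign === "+" ? utc - offsetMs : utc + offsetMs);
}

// Hypothetical helper: scans raw PDF text for literal-string values of a key.
function findRawDates(raw: string, key: "CreationDate" | "ModDate"): string[] {
  const re = new RegExp(`/${key}\\s*\\((D:[^)]*)\\)`, "g");
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(raw)) !== null) {
    found.push(match[1]);
  }
  return found;
}
```

Running `findRawDates` over the file contents decoded as latin1 and feeding each hit to `parsePdfDate` confirms whether a date is physically present, independent of what the API returns.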
Replies: 2 comments
-
In the problematic docs, I find for example this:
Though /CreationDate is:
And the
And another one with the problem:
And:
And the
Do you see what I see? It looks like it gets confused and parses the /Size value as a reference to the Info.
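To compare the two values side by side, something like the sketch below could be used. It is only an illustration: `summarizeTrailer` and `TrailerSummary` are hypothetical names, and it assumes a classic plain-text trailer dictionary rather than a cross-reference stream. The result can then be compared with what pdf-lib reports in doc.context.trailerInfo.Info.

```typescript
// Hypothetical summary of the two trailer entries discussed above.
interface TrailerSummary {
  size?: number; // numeric value of /Size
  info?: { objectNumber: number; generation: number }; // target of the /Info reference
}

// Hypothetical helper: pulls /Size and /Info out of raw trailer text.
// Assumes a plain-text trailer dictionary, not a cross-reference stream.
function summarizeTrailer(trailer: string): TrailerSummary {
  const size = /\/Size\s+(\d+)/.exec(trailer);
  const info = /\/Info\s+(\d+)\s+(\d+)\s+R/.exec(trailer);
  return {
    size: size ? Number(size[1]) : undefined,
    info: info
      ? { objectNumber: Number(info[1]), generation: Number(info[2]) }
      : undefined,
  };
}
```

If the object number pdf-lib resolves for Info equals the /Size value instead of the /Info object number found here, that would support the suspicion that the two entries are being mixed up.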
-
I figured out that the /CreationDate and /ModDate entries end up in the indirectObjects list, I assume because they don't get associated with the trailer -> Info. Currently I work around it like this:

import {
  PDFDict,
  PDFDocument,
  PDFHexString,
  PDFName,
  PDFObject,
  PDFString
} from "pdf-lib"

function getDates(doc: PDFDocument) {
  // Fast path: pdf-lib found the dates through the trailer's Info dictionary.
  const cdate = doc.getCreationDate()
  const mdate = cdate && doc.getModificationDate()
  if (cdate) {
    return {cdate, mdate}
  }
  // see https://github.com/Hopding/pdf-lib/discussions/1477
  // Fallback: scan all indirect objects for a dictionary that carries /CreationDate.
  const alt = (() => {
    // Only string objects can hold a PDF date; anything else yields undefined.
    function tryParseDate(obj?: PDFObject) {
      if (obj instanceof PDFString || obj instanceof PDFHexString) {
        return obj.decodeDate()
      }
    }
    for (const [_ref, obj] of doc.context.enumerateIndirectObjects()) {
      if (!(obj instanceof PDFDict)) {
        continue
      }
      const cdateAlt = obj.lookup(PDFName.CreationDate)
      const mdateAlt = cdateAlt && obj.lookup(PDFName.ModDate)
      if (cdateAlt) {
        // The first dictionary with a /CreationDate entry wins.
        return {cdate: tryParseDate(cdateAlt), mdate: tryParseDate(mdateAlt)}
      }
    }
  })()
  return {
    cdate: alt?.cdate,
    mdate: alt?.mdate
  }
}

The inferred return type is:

{
  cdate: Date | undefined;
  mdate: Date | undefined;
}

Yes, technically a PDF might have neither date, depending on its properties / compliance, so that's okay.