Clarification Request: Cross-Reference Stream Behavior in ISO 32000-2:2020 #500

Open
RahamanMdEkhlasur opened this issue Dec 6, 2024 · 6 comments
Assignees
Labels
bug (Something isn't correct), proposed solution (Proposed solution is ready for review)

Comments

@RahamanMdEkhlasur

I am reaching out to request clarification regarding certain behaviors of cross-reference streams in the PDF specification, as detailed in ISO 32000-2, Second edition, Section 7.5.8.2. Specifically, I am investigating the W array (Table 17, page 67) in cross-reference stream dictionaries. Below are the detailed scenarios and questions for which clarification would be greatly appreciated.

1. Default Behavior for Type Field Omission
According to the specification, "If the first element [of the W array] is zero, the type field shall not be present, and shall default to Type 1." My understanding is that this behavior eliminates the need to explicitly store Type 1 objects, which represent regular, non-compressed objects.

Questions:

  • If the Type field (first element) is 0 then defaults to Type 1, why is it necessary to define Type 0 entries explicitly (Table 18)? Could the specification not rely solely on the default mechanism for active objects while omitting free objects altogether?
  • Are there specific scenarios where the absence of Type 0 entries would cause issues in parsing or reusing free objects?

2. Prohibition of Zero for the Second Element in the W Array
The specification states, "A value of zero shall not be used for the second element of the W array." The second element represents the byte offset for both Type 1 and Type 2 entries. This requirement ensures that all entries provide a valid offset for locating the object or its containing stream.
However, Field 2 of Type 1 (offset) includes "default value: 0," which seems contradictory to the prohibition.

Questions:

  • Could you confirm if the default value of 0 for Field 2 applies only when W[1] = 0, indicating that the offset field is omitted?
  • In practice, what should happen when the offset defaults to 0? For example, does it imply that the object resides at the start of the file? How is this practically interpreted?

We greatly value the expertise and insights of the PDF Association and would deeply appreciate any clarifications or references to further documentation that might address these points.

@RahamanMdEkhlasur added the bug label Dec 6, 2024
@mkl-public

mkl-public commented Dec 6, 2024

(Answering as I read the specification; may differ from an 'official' reading.)

1. Default Behavior for Type Field Omission
* If the Type field (first element) is 0 then defaults to Type 1, why is it necessary to define Type 0 entries explicitly (Table 18)? Could the specification not rely solely on the default mechanism for active objects while omitting free objects altogether?

Please remember that cross-reference streams are merely an alternative way to store cross-reference information, with the option to store new object types. Thus, the requirements in the cross-reference table section that are not specific to the structure of the xref table apply to cross-reference streams as well. And there you find:

"The cross-reference table (comprising the original cross-reference section and all update sections) shall contain one entry for each object number from 0 to the maximum object number defined in the PDF file, even if one or more of the object numbers in this range do not actually occur in the PDF file."

Thus, if you have free object numbers in the object number range, you need to have free object number entries, even if you use cross-reference streams. In particular, you cannot simply omit them altogether.

Ok, that was the strict answer...

* Are there specific scenarios where the absence of Type 0 entries would cause issues in parsing or reusing free objects?

No, no technical issues, but nonetheless a violation of the spec which a PDF processor need not accept.

While the above is true (the spec requires cross-reference entries for all object numbers in the object number range of the document), some PDF producers create sparse cross-references, i.e. they omit free object entries. Thus, any PDF processor that wants to be able to process real-world PDFs must somehow handle such sparse tables, usually by assuming that missing entries represent free object numbers, or at least by failing gently and rejecting the document as broken.
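A minimal sketch of the two strategies just mentioned, assuming the cross-reference data has already been merged into a dictionary keyed by object number and that `size` comes from the trailer's Size entry. The names `XrefEntry`, `fill_sparse_xref`, and the `strict` flag are hypothetical and not taken from the specification or any PDF library; the field values used for the synthesized free entries are placeholders.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class XrefEntry:
    kind: int    # 0 = free, 1 = uncompressed, 2 = inside an object stream
    field2: int  # meaning depends on kind (e.g. byte offset for kind 1)
    field3: int  # meaning depends on kind (e.g. generation for kind 1)


def fill_sparse_xref(entries: Dict[int, XrefEntry], size: int,
                     strict: bool = False) -> Dict[int, XrefEntry]:
    """Return a table holding an entry for every object number in 0..size-1."""
    missing = [n for n in range(size) if n not in entries]
    if strict and missing:
        # Strict reading: a sparse table violates the spec, so reject the file.
        raise ValueError(f"cross-reference data lacks entries for objects {missing}")
    filled = dict(entries)
    for n in missing:
        # Lenient reading (an assumption, not a spec rule): treat as free.
        filled[n] = XrefEntry(kind=0, field2=0, field3=0)
    return filled
```

With `strict=False` the function mirrors the lenient behavior; with `strict=True` it fails gently by raising an error instead of guessing.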

2. Prohibition of Zero for the Second Element in the W Array

Here I think that the prohibition of W[1] == 0 is the relevant part: you cannot omit the second field and, therefore, the default value for type 1, field 2 makes no sense. Thus, there is no need to wonder about the meaning of offset 0 entries.
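To make this reading concrete, here is a rough sketch of decoding cross-reference stream rows. It assumes the stream data is already decoded (filters and predictors undone) and that `w` holds the three integers of the W array; `decode_xref_rows` is a hypothetical helper name, not an API of any PDF library. An omitted type field falls back to Type 1 as quoted in the issue. Falling back to 0 for an omitted third field for every entry type is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple


def decode_xref_rows(data: bytes, w: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Split decoded stream data into (type, field2, field3) tuples."""
    if len(w) != 3 or any(x < 0 for x in w):
        raise ValueError("W must hold three non-negative integers")
    if w[1] == 0:
        # The second field can never be omitted, so no default is needed.
        raise ValueError("W[1] must not be zero")
    row_len = sum(w)
    if len(data) % row_len != 0:
        raise ValueError("stream length is not a multiple of the row width")
    rows = []
    for start in range(0, len(data), row_len):
        pos = start
        fields = []
        for width in w:
            # Fields are big-endian integers; a zero width means "absent".
            value = int.from_bytes(data[pos:pos + width], "big") if width else None
            fields.append(value)
            pos += width
        kind = 1 if fields[0] is None else fields[0]    # omitted type -> Type 1
        field3 = 0 if fields[2] is None else fields[2]  # assumed fallback
        rows.append((kind, fields[1], field3))
    return rows
```

For example, `decode_xref_rows(bytes([1, 0, 16, 0]), [1, 2, 1])` returns `[(1, 16, 0)]`, and passing `[1, 0, 1]` as `w` raises `ValueError` before any row is read.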

@RahamanMdEkhlasur
Author

@mkl-public
Thank you so much for clarifying the behavior of Cross-Reference Streams. I appreciate your detailed explanation.

Regarding Question 1: I now understand that I missed the following important point:

"The requirements in the cross-reference table section that are not specific to the structure of the xref table apply to cross-reference streams as well."

This clears up why Type 0 entries for free objects cannot be omitted, as they are required by the broader cross-reference table rules.

For Question 2: I understand now that the second field in the W array cannot be omitted. However, I am still unclear on one point:
Are there any scenarios where the default value of 0 for Field 2 (offset) in Type 1 entries would actually be used? If so, how should this be interpreted in practice?

Thank you again for your assistance and insights. I greatly appreciate your expertise!

@mkl-public

For Question 2: I understand now that the second field in the W array cannot be omitted. However, I am still unclear on one point:
Are there any scenarios where the default value of 0 for Field 2 (offset) in Type 1 entries would actually be used? If so, how should this be interpreted in practice?

Well, as the second field in the W array cannot be omitted, there is always a value for field 2 in type 1 entries. Thus, in valid PDFs the default never kicks in.

One can of course wonder about invalid PDFs. But in that case we are essentially talking about repair strategies, which are not the topic of the PDF specification (at least not in the case of cross-reference tables and streams).

@RahamanMdEkhlasur
Author

@mkl-public

I would like to clarify the following point once more:

The specification states:
"A value of zero for an element in the W array indicates that the corresponding field shall not be present in the stream, and the default value shall be used, if there is one."
Additionally, it mentions:
"If the first element is zero, the type field shall not be present, and shall default to Type 1."

Based on these statements, can we conclude that the default value of 0 for Field 1 of Type 0 (as outlined in Table 18 — Entries in a cross-reference stream) is unnecessary?

If so, would it not be more appropriate to remove the unnecessary default value of 0 from the specification? Or am I misunderstanding something?

@mkl-public

If so, would it not be more appropriate to remove the unnecessary default value of 0 from the specification? Or am I misunderstanding something?

That's also how I perceive this.

@petervwyatt I think you can consider the removal of that unnecessary default as the proposed solution for this issue.

@petervwyatt
Member

Proposed solution as per above: delete "Default value: 0" from Table 18 for Type=1, Field=2 Description cell.

@petervwyatt self-assigned this Jan 6, 2025
@petervwyatt added the proposed solution label Jan 6, 2025
3 participants