Form fields missing when rendering after flatten #140
Hi @bavardage, thank you for reporting the issue, and for your example code that demonstrates the problem. I can reproduce the problem. The Pdfium documentation on … But it turns out all you need to do is drop and reload the page handle in order to get the flattened page:

```rust
...
let mut page = document.pages().first().expect("to get first page");
page.flatten()?;

// Close and reopen the page. Shadowing alone would keep the stale handle
// alive until the end of the scope, so drop it explicitly first.
drop(page);
let page = document.pages().first().expect("to get first page");
...
```

Now the output will include the filled-in form fields. This workaround should help you in the short term. It would certainly be nice if …
Added page drop/auto-reload to … In the meantime, you can take …

This is great, thanks so much!
I realize this is a closed issue; however, I'm having the same problem. I have a multi-page PDF that maps HashMap values into the form:

```rust
pub fn fill_form(
    &self,
    document: &mut PdfDocument,
    data: HashMap<usize, HashMap<String, String>>,
) -> SystemSyncResult<()> {
    let pages = document.pages();
    for (page_index, page) in pages.iter().enumerate() {
        if let Some(page_data) = data.get(&page_index) {
            let annotations = page.annotations();
            for (annotation_index, mut annotation) in annotations.iter().enumerate() {
                if let Some(mut field) = annotation.as_form_field_mut() {
                    if let Some(field_name) = field.name() {
                        if let Some(value) = page_data.get(&field_name) {
                            if let Some(mut text_field) = field.as_text_field_mut() {
                                log_message!(
                                    SysLogLevel::Debug,
                                    "Page {}, text field {}: {:?} currently has value: {}",
                                    page_index,
                                    annotation_index,
                                    field_name,
                                    format!("{:?}", text_field.value()),
                                );
                                text_field.set_value(value)?;
                                log_message!(
                                    SysLogLevel::Debug,
                                    "Page {}, text field {}: {:?} now has updated value: {}",
                                    page_index,
                                    annotation_index,
                                    field_name,
                                    format!("{:?}", text_field.value()),
                                );
                            }
                        }
                    }
                }
            }
        }
    }
    Ok(())
}
```
I call flatten:

```rust
pub fn flatten(&self, document: &mut PdfDocument) -> SystemSyncResult<()> {
    let pages = document.pages();
    for mut page in pages.iter() {
        page.flatten()?;
    }
    Ok(())
}
```

And the PDF is blank (devoid of any content) after the flatten operation. Commenting out the flatten operation:

```rust
pub fn safe_fill_form(
    &self,
    bytes: Vec<u8>,
    data: HashMap<usize, HashMap<String, String>>,
    copies: Option<usize>,
) -> SystemSyncResult<Vec<u8>> {
    match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        let mut document = self.load_pdf(bytes, copies)?;
        self.fill_form(&mut document, data)?;
        // self.flatten(&mut document)?;
        self.save_to_bytes(&document)
    })) {
        Ok(result) => result,
        Err(_) => Err(SystemError::new(
            file!(),
            line!(),
            SystemErrorCode::GenericErrorCode,
            SystemErrorType::GenericErrorType,
            "PDF Production panicked with error",
        )
        .into()),
    }
}
```

results in a filled PDF, but the form fields are still fillable (which is what I want to avoid). In other projects I have used iText extensively to produce millions of PDFs. In my opinion, form filling is one of the best features available in this library for business use cases. Letting designers design the layout and then merging data into it is an excellent use case; thanks for your efforts on this. I'm using a PDF 1.3 form I created in Illustrator, and the latest 0.8.22 crate:

```rust
pub fn flatten(&mut self) -> Result<(), PdfiumError> {
    // TODO: AJRC - 28/5/22 - consider allowing the caller to set the FLAT_NORMALDISPLAY or FLAT_PRINT flag.
    let flag = FLAT_PRINT;
    match self
        .bindings()
        .FPDFPage_Flatten(self.page_handle, flag as c_int) as u32
    {
        FLATTEN_SUCCESS => {
            self.regenerate_content()?;
            // As noted at https://bugs.chromium.org/p/pdfium/issues/detail?id=2055,
            // FPDFPage_Flatten() updates the underlying dictionaries and content streams for
            // the page, but does not update the FPDF_Page structure. We must reload the
            // page for the effects of the flatten operation to be visible. For more information, see:
            // https://github.com/ajrcarey/pdfium-render/issues/140
            self.reload_in_place();
            Ok(())
        }
        FLATTEN_NOTHINGTODO => Ok(()),
        FLATTEN_FAIL => Err(PdfiumError::PageFlattenFailure),
        _ => Err(PdfiumError::PageFlattenFailure),
    }
}
```

Let me know if I'm doing something wrong here. Thanks again for your time.
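For reference, the values matched in the crate snippet above are plain integer constants in Pdfium's public header `fpdf_flatten.h`: `FLATTEN_FAIL` = 0, `FLATTEN_SUCCESS` = 1 and `FLATTEN_NOTHINGTODO` = 2 for the return code, and `FLAT_NORMALDISPLAY` = 0 or `FLAT_PRINT` = 1 for the flag. A minimal std-only sketch of mapping them to a Rust result follows; `FlattenOutcome`, `interpret_flatten_result` and `flatten_flag` are hypothetical names for illustration, not part of pdfium-render.

```rust
// Constants as defined in Pdfium's public header fpdf_flatten.h.
const FLATTEN_FAIL: u32 = 0;
const FLATTEN_SUCCESS: u32 = 1;
const FLATTEN_NOTHINGTODO: u32 = 2;
const FLAT_NORMALDISPLAY: i32 = 0;
const FLAT_PRINT: i32 = 1;

/// Hypothetical outcome type, for illustration only.
#[derive(Debug, PartialEq)]
enum FlattenOutcome {
    /// Pdfium rewrote the page's dictionaries and content streams; the page
    /// must be reloaded before the change is visible (see this issue).
    Flattened,
    /// There was nothing on the page to flatten.
    NothingToDo,
}

/// Hypothetical helper: maps a raw FPDFPage_Flatten() return code to a Result.
fn interpret_flatten_result(code: u32) -> Result<FlattenOutcome, String> {
    match code {
        FLATTEN_SUCCESS => Ok(FlattenOutcome::Flattened),
        FLATTEN_NOTHINGTODO => Ok(FlattenOutcome::NothingToDo),
        FLATTEN_FAIL => Err("FPDFPage_Flatten() failed".to_string()),
        other => Err(format!("unexpected FPDFPage_Flatten() return code {other}")),
    }
}

/// Hypothetical helper: picks the nFlag argument passed to FPDFPage_Flatten().
fn flatten_flag(for_print: bool) -> i32 {
    if for_print {
        FLAT_PRINT
    } else {
        FLAT_NORMALDISPLAY
    }
}
```

Note that a `Flattened` outcome only says Pdfium updated the underlying page objects; as the crate's own comment explains, the page still has to be reloaded to see the result.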
Hi @chriskyndrid, thank you for adding to the issue. My first thought is that it may be something specific to your document. Are you able to share a page that exhibits the problem, along with a sample call to your …
Ok, I think I can reproduce the problem using the …

```rust
use pdfium_render::prelude::*;
use std::collections::HashMap;

pub fn main() -> Result<(), PdfiumError> {
    let pdfium = Pdfium::new(Pdfium::bind_to_library(
        Pdfium::pdfium_platform_library_name_at_path("../pdfium/"),
    )?);

    let mut document = pdfium.load_pdf_from_file("../pdfium/test/form-test.pdf", None)?; // Load the sample file...

    let mut x = HashMap::new();
    x.insert(
        "form1[0].#subform[0].Pt1Line1a_FamilyName[0]".to_string(),
        "foo".to_string(),
    );

    let mut y = HashMap::new();
    y.insert(0, x);

    fill_form(&mut document, y)?;
    flatten(&mut document)?;

    document.save_to_file("./output.pdf")?;

    Ok(())
}

pub fn fill_form(
    document: &mut PdfDocument,
    data: HashMap<usize, HashMap<String, String>>,
) -> Result<(), PdfiumError> {
    let pages = document.pages();
    for (page_index, page) in pages.iter().enumerate() {
        if let Some(page_data) = data.get(&page_index) {
            let annotations = page.annotations();
            for (annotation_index, mut annotation) in annotations.iter().enumerate() {
                if let Some(field) = annotation.as_form_field_mut() {
                    if let Some(field_name) = field.name() {
                        if let Some(value) = page_data.get(&field_name) {
                            if let Some(text_field) = field.as_text_field_mut() {
                                println!(
                                    "Page {}, text field {}: {:?} currently has value: {}",
                                    page_index,
                                    annotation_index,
                                    field_name,
                                    format!("{:?}", text_field.value()),
                                );
                                println!("Applying value: {}", value);
                                text_field.set_value(value)?;
                                println!(
                                    "Page {}, text field {}: {:?} now has updated value: {}",
                                    page_index,
                                    annotation_index,
                                    field_name,
                                    format!("{:?}", text_field.value()),
                                );
                            }
                        }
                    }
                }
            }
        }
    }
    Ok(())
}

pub fn flatten(document: &mut PdfDocument) -> Result<(), PdfiumError> {
    let pages = document.pages();
    for mut page in pages.iter() {
        page.flatten()?;
    }
    Ok(())
}
```

Running the sample produces the following debugging output:

When the call to …
My first thought was that dropping and reloading the page handle (which is what happens inside …)

```rust
pub fn flatten(document: &mut PdfDocument) -> Result<(), PdfiumError> {
    let pages = document.pages();
    for page in pages.iter() {
        // Call Pdfium's FPDFPage_Flatten() directly (flag 1 == FLAT_PRINT).
        page.bindings()
            .FPDFPage_Flatten(page.bindings().get_handle_from_page(&page), 1);
        // page.flatten()?;
    }
    Ok(())
}
```

If the problem is in Pdfium itself, it might be hard to mitigate from …
Thanks for looking into this. I think it is intrinsic to Pdfium. Regrettably, I also ran into other issues getting various readers to process and display the filled form data, similar to what @liammcdermott noted elsewhere. In addition, even with non-flattened form fields, every time I sent the document to print in Chrome or Firefox, the data in the form fields would not render. I've observed this behavior in Chrome before when filling forms manually, too. This morning I wrote a solution for the form filling and mapping using the Rust-native lopdf crate. It's working well, with some gotchas I had to address, such as non-ASCII characters requiring conversion to UTF-16. I believe I can address the flatten requirement in lopdf as well, and will work on that this weekend. I did not experience any issues with printing or rendering after filling with lopdf. It was more work, but I think it may be the solution for now, and I can run parallel generation of PDF documents. In any case, thanks again for your efforts on this and for getting back to me so quickly.
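The UTF-16 gotcha mentioned above follows from how PDF encodes text strings such as a field's `/V` value: a string is either PDFDocEncoding or UTF-16BE prefixed with the byte-order mark `FE FF`. Here is a minimal std-only sketch of that choice. `encode_pdf_text_string` is a hypothetical helper, and for simplicity it treats only printable ASCII (where PDFDocEncoding agrees with ASCII) as safe single-byte text, switching the whole string to UTF-16BE otherwise.

```rust
/// Hypothetical helper (sketch only): encodes `text` as a PDF text string.
/// Printable ASCII is emitted as single bytes; anything else switches the
/// whole string to UTF-16BE with a leading byte-order mark (FE FF).
fn encode_pdf_text_string(text: &str) -> Vec<u8> {
    if text.chars().all(|c| (' '..='~').contains(&c)) {
        return text.as_bytes().to_vec();
    }
    let mut bytes = vec![0xFE, 0xFF];
    // encode_utf16() yields surrogate pairs for characters outside the BMP.
    for unit in text.encode_utf16() {
        bytes.extend_from_slice(&unit.to_be_bytes());
    }
    bytes
}
```

For example, `"é"` becomes `FE FF 00 E9`, and an emoji outside the Basic Multilingual Plane becomes the BOM followed by a four-byte surrogate pair.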
Yes, I agree with you that the problem is upstream; there's an open bug https://issues.chromium.org/issues/40584154 which looks related, and it's been open for five years :/ See also: https://issues.chromium.org/issues/42271909

Pdfium's flatten operation is at https://pdfium.googlesource.com/pdfium/+/main/fpdfsdk/fpdf_flatten.cpp#292. I would like to build this locally with some debugging output to see if I can figure out what's happening. The problem happens with both XFA and non-XFA (AcroForm) forms. It looks like …
Thanks again for all your effort and investigation on this. Although I have some concerns about lopdf's long-term maintenance, given the activity level on the crate, I like the idea of a Rust-native solution. Once I get flattening working in lopdf, I'll post the struct that does all the work here, in case somebody else has this need until Pdfium is ready. Thanks again!
I ended up implementing a solution with lopdf. I can now successfully fill forms in an editable manner, or map straight to a flattened document. My implementation is about 1,200 lines across 10 or so files, so I may end up creating a crate for this, since there isn't really anything readily available in Rust that is easy to work with for form manipulation/flattening. I'd forgotten what a pain working with PDFs is. My code runs the form fill/flatten operations in parallel across templates and uses cosmic_text to size text into the bounding boxes of a form: it automatically reduces the font size down to a threshold, then truncates the text if necessary, so that it fits into the desired region of the form's bounding box on the page. It's all fairly alpha right now, but on my development machine, mapping 100 fields across 4 pages (two 2-page templates) takes about 15 ms in a release build, including Actix API overhead, Postgres data retrieval, and all the PDF operations (flatten, compress, cleanup, and merge) with lopdf. I still have some character-encoding issues to work out with flattened text, but otherwise it's working pretty well. I think you can close this issue.
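The fit-then-truncate strategy described above can be sketched without any PDF or text-layout library. This is a simplified, single-line version under stated assumptions: it uses the standard Courier metric (every glyph is 600/1000 em wide, so a string's width is `0.6 * font_size * char_count` points), shrinks the size in 0.5 pt steps, and truncates from the end with no ellipsis. The implementation described above uses cosmic_text for measurement and line wrapping instead; `fit_single_line` and its parameters are hypothetical.

```rust
/// Width of one Courier glyph in thousandths of an em (Courier is monospaced).
const COURIER_GLYPH_WIDTH: f32 = 600.0;

/// Width in points of `text` set in Courier at `font_size` points.
fn courier_width(text: &str, font_size: f32) -> f32 {
    text.chars().count() as f32 * COURIER_GLYPH_WIDTH / 1000.0 * font_size
}

/// Hypothetical helper: returns the (possibly truncated) text and the font
/// size needed to fit it on one line of `box_width` points.
/// Assumes `min_size > 0.0`.
fn fit_single_line(text: &str, box_width: f32, suggested: f32, min_size: f32) -> (String, f32) {
    let mut size = suggested;
    // Shrink in 0.5 pt steps until the text fits or the minimum size is reached.
    while size > min_size && courier_width(text, size) > box_width {
        size = (size - 0.5).max(min_size);
    }
    if courier_width(text, size) <= box_width {
        return (text.to_string(), size);
    }
    // Still too wide at the minimum size: keep only as many characters as fit.
    let max_chars = (box_width / (COURIER_GLYPH_WIDTH / 1000.0 * size)).floor() as usize;
    (text.chars().take(max_chars).collect(), size)
}
```

With a 100 pt box and a 12 pt suggestion, a 20-character value shrinks to 8 pt, while a 50-character value bottoms out at the 6 pt threshold mentioned later in the thread and is cut to 27 characters.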
Thanks for the update, @chriskyndrid. Good to hear you found a solution. If you do end up making it public, I'd be keen to take a look at it, because I'm thinking a similar approach may need to happen here, where the best way to achieve a consistent flatten result in … For that reason, I'll leave the issue open for now, although it may not make much progress in the short term. I'm curious whether you had to regenerate appearance streams for your flattened elements at any point?
@ajrcarey what I have right now is fairly alpha and somewhat kludgy/hacky, but it works for my use case, which currently only deals with text fields. I did not have to regenerate the appearance streams to get things to work. Here is an example, which is currently specific to my codebase. For flatten operations, I use the field coordinates and handle all the text placement, sizing, line wrapping, etc., manually. I suggest a specific font size, but my text sizing/placement routines will reduce it as needed to fit, down to a lower threshold (6pt in my case).

```rust
use crate::log_message;
use crate::pdf::lopdf::cleanup::Cleanup;
use crate::pdf::lopdf::font::{Font, PDFFont};
use crate::pdf::lopdf::pdf::PDF;
use crate::pdf::lopdf::placement::Placement;
use crate::pdf::lopdf::utility::Utility;
use crate::prelude::*;
use erp_contrib::lopdf::content::{Content, Operation};
use erp_contrib::lopdf::{dictionary, Document, Object, ObjectId, Stream, StringFormat};
use std::collections::HashMap;

pub trait FormFlatten: Utility + Placement + Cleanup + Font {
    fn fill_field_updates(
        document: &Document,
        fields: &Vec<Object>,
        parent_name: Option<&str>,
        parent_value: Option<&String>,
        page_data: &HashMap<String, String>,
        updates: &mut Vec<(ObjectId, ObjectId, (f32, f32, f32, f32), String)>,
    ) -> SystemSyncResult<()> {
        for field_ref in fields {
            let field_id = field_ref.as_reference()?;
            let field = document.get_object(field_id)?.as_dict()?;
            let name_string = if let Ok(name_obj) = field.get(b"T") {
                if let Ok(name) = name_obj.as_str() {
                    let mut name_string =
                        Self::clean_string(&String::from_utf8_lossy(name).to_string());
                    if let Some(parent) = parent_name {
                        name_string = format!("{}.{}", parent, name_string);
                    }
                    Some(name_string)
                } else {
                    None
                }
            } else {
                None
            };
            let value = if let Some(name_string) = &name_string {
                log_message!(
                    SysLogLevel::Debug,
                    "Value from page_data key: {:?}",
                    name_string
                );
                if let Some(val) = page_data.get(name_string) {
                    val.to_string()
                } else {
                    "".to_string()
                }
            } else if let Some(parent_val) = parent_value {
                parent_val.clone()
            } else {
                "".to_string()
            };
            log_message!(SysLogLevel::Debug, "Field ID: {:?}", field_id);
            log_message!(SysLogLevel::Debug, "Field Name: {:?}", name_string);
            log_message!(SysLogLevel::Debug, "Value: {:?}", value);
            log_message!(SysLogLevel::Debug, "Field Dict: {:?}", field);
            if let (Ok(page_id), Ok(rect)) = (
                field.get(b"P").and_then(|p| p.as_reference()),
                field.get(b"Rect").and_then(|r| r.as_array()),
            ) {
                if rect.len() == 4 {
                    let x = rect[0].as_f32().unwrap_or(0.0);
                    let y = rect[1].as_f32().unwrap_or(0.0);
                    let width = rect[2].as_f32().unwrap_or(0.0) - x;
                    let height = rect[3].as_f32().unwrap_or(0.0) - y;
                    log_message!(
                        SysLogLevel::Debug,
                        "Page ID: {:?}, Rect: ({}, {}, {}, {})",
                        page_id,
                        x,
                        y,
                        width,
                        height
                    );
                    updates.push((field_id, page_id, (x, y, width, height), value.clone()));
                }
            }
            if let Ok(kids_obj) = field.get(b"Kids") {
                if let Ok(kids) = kids_obj.as_array() {
                    Self::fill_field_updates(
                        document,
                        kids,
                        name_string.as_deref(),
                        Some(&value),
                        page_data,
                        updates,
                    )?;
                }
            }
        }
        Ok(())
    }

    fn flatten_form_fields(
        document: &mut Document,
        page_data: &HashMap<String, String>,
        font: PDFFont,
    ) -> SystemSyncResult<()> {
        let mut updates = Vec::new();
        // Collect all necessary information
        let catalog = document
            .trailer
            .get(b"Root")
            .and_then(|obj| obj.as_reference())?;
        let catalog_dict = document.get_object(catalog)?.as_dict()?;
        let acroform = catalog_dict
            .get(b"AcroForm")
            .and_then(|obj| obj.as_reference())?;
        let acroform_dict = document.get_object(acroform)?.as_dict()?;
        let fields_list = acroform_dict.get(b"Fields")?.as_array()?;
        log_message!(SysLogLevel::Debug, "Fields List: {:?}", fields_list);
        Self::fill_field_updates(document, fields_list, None, None, page_data, &mut updates)?;
        log_message!(SysLogLevel::Debug, "Updates Collected: {:?}", updates);
        // Process collected information
        for (annotation_id, page_id, (x, y, width, height), value) in updates {
            let suggested_font_size = 12.0;
            let (lines_with_coords, optimal_font_size, _) = Self::get_text_placement(
                &value,
                x,
                y + height, // Start from top of the box
                width,
                height,
                suggested_font_size,
                Some(font.clone()),
            );
            log_message!(SysLogLevel::Debug, "Lines: {:?}", lines_with_coords);
            let mut operations = Vec::new();
            operations.push(Operation::new("q", vec![]));
            operations.push(Operation::new("BT", vec![]));
            operations.push(Operation::new(
                "Tf",
                vec!["FS1".into(), optimal_font_size.into()],
            ));
            if let Some((first_line, line_x, line_y)) = lines_with_coords.first() {
                operations.push(Operation::new(
                    "Td",
                    vec![(*line_x).into(), (*line_y).into()],
                ));
                operations.push(Operation::new(
                    "Tj",
                    vec![Object::String(
                        Self::to_ascii_unicode_point(first_line.clone()),
                        StringFormat::Hexadecimal,
                    )],
                ));
            }
            for i in 1..lines_with_coords.len() {
                let (_, _, previous_y) = lines_with_coords[i - 1];
                let (line, line_x, current_y) = &lines_with_coords[i];
                let y_difference = previous_y - current_y;
                operations.push(Operation::new("Td", vec![0.into(), (-y_difference).into()]));
                operations.push(Operation::new(
                    "Tj",
                    vec![Object::String(
                        Self::to_ascii_unicode_point(line.clone()),
                        StringFormat::Hexadecimal,
                    )],
                ));
            }
            operations.push(Operation::new("ET", vec![]));
            operations.push(Operation::new("Q", vec![]));
            let content = Content { operations };
            let content_id = document.add_object(Stream::new(dictionary! {}, content.encode()?));
            // Update the page to include the new content and resources
            if let Ok(page_obj) = document.get_object_mut(page_id) {
                if let Ok(page_dict) = page_obj.as_dict_mut() {
                    // Add new content to existing contents
                    let contents = page_dict.get(b"Contents")?;
                    let mut contents_array = match contents {
                        Object::Array(arr) => arr.clone(),
                        Object::Reference(id) => vec![Object::Reference(*id)],
                        _ => vec![],
                    };
                    contents_array.push(Object::Reference(content_id));
                    page_dict.set("Contents", Object::Array(contents_array));
                }
            }
            // Remove the annotation
            document.objects.remove(&annotation_id);
        }
        Self::cleanup_pdf(document)?;
        Ok(())
    }
}

impl FormFlatten for PDF {}
```
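To see what the operators emitted by `flatten_form_fields` above look like as raw content-stream syntax, here is a std-only sketch of the same pattern: one `Td` from the text-space origin to the first baseline, then `0 -dy Td` for each later line, since each `Td` is relative to the start of the previous line. `text_block_content` is a hypothetical helper; the `/FS1` font resource name is taken from the code above and must exist in the page's `/Resources`. As a simplifying assumption, each string is hex-encoded byte by byte from its UTF-8 form, which is only correct for single-byte text in a simple font.

```rust
use std::fmt::Write;

/// Hypothetical helper: builds a content-stream fragment that shows `lines`
/// (text, baseline y) starting at x = `x`, using the font resource /FS1.
fn text_block_content(lines: &[(&str, f32)], x: f32, font_size: f32) -> String {
    let mut out = String::from("q\nBT\n");
    writeln!(out, "/FS1 {font_size} Tf").unwrap();
    let mut previous_y: Option<f32> = None;
    for (text, y) in lines {
        match previous_y {
            // First line: the text line matrix starts at the origin after BT,
            // so this Td effectively sets an absolute position.
            None => writeln!(out, "{x} {y} Td").unwrap(),
            // Later lines: Td is relative to the start of the previous line.
            Some(prev) => writeln!(out, "0 {} Td", -(prev - y)).unwrap(),
        }
        // Hex string of the raw bytes (simplification, see lead-in).
        let hex: String = text.bytes().map(|b| format!("{b:02X}")).collect();
        writeln!(out, "<{hex}> Tj").unwrap();
        previous_y = Some(*y);
    }
    out.push_str("ET\nQ\n");
    out
}
```

For two lines at baselines 700 and 688, this yields `50 700 Td` for the first line and `0 -12 Td` for the second, which is the same relative step that the `y_difference` computation above produces.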
For fill operations, I'm just filling the value (V) of existing annotations:

```rust
use crate::log_message;
use crate::parallel::spawn_channel_task;
use crate::pdf::lopdf::form_flatten::FormFlatten;
use crate::pdf::lopdf::pdf::PDF;
use crate::pdf::lopdf::utility::Utility;
use crate::prelude::*;
use erp_contrib::crossbeam::channel::{unbounded, Receiver, SendError, Sender};
use erp_contrib::lopdf::{Document, Object, ObjectId};
use erp_contrib::{lopdf, rayon};
use std::collections::HashMap;
use std::sync::Arc;

pub trait FormFill: Utility {
    fn fill_annotation_updates(
        document: &Document,
        fields: &Vec<Object>,
        parent_name: Option<&str>,
        data: &HashMap<String, String>,
        annotation_updates: &mut Vec<(ObjectId, String)>,
    ) -> SystemSyncResult<()> {
        for field_ref in fields {
            let field_id = field_ref.as_reference()?;
            let field = document.get_object(field_id)?.as_dict()?;
            log_message!(SysLogLevel::Debug, "Field dictionary: {:?}", field);
            let name_string = if let Ok(name_obj) = field.get(b"T") {
                log_message!(SysLogLevel::Debug, "Got Name Obj: {:?}", name_obj);
                if let Ok(name) = name_obj.as_str() {
                    log_message!(SysLogLevel::Debug, "Got Name");
                    let mut name_string =
                        Self::clean_string(&String::from_utf8_lossy(name).to_string());
                    if let Some(parent) = parent_name {
                        name_string = format!("{}.{}", parent, name_string);
                    }
                    log_message!(
                        SysLogLevel::Debug,
                        "Constructed Field Name: {}",
                        name_string
                    );
                    Some(name_string)
                } else {
                    None
                }
            } else {
                None
            };
            if let Some(name_string) = &name_string {
                if let Some(value) = data.get(name_string) {
                    log_message!(
                        SysLogLevel::Debug,
                        "Field {} currently has value: {:?}",
                        name_string,
                        format!("{:?}", field.get(b"V")),
                    );
                    annotation_updates.push((field_id, value.clone()));
                }
            }
            if let Ok(kids_obj) = field.get(b"Kids") {
                if let Ok(kids) = kids_obj.as_array() {
                    Self::fill_annotation_updates(
                        document,
                        kids,
                        name_string.as_deref(),
                        data,
                        annotation_updates,
                    )?;
                }
            }
        }
        Ok(())
    }

    fn collect_annotation_updates(
        document: &Document,
        data: &HashMap<String, String>,
    ) -> SystemSyncResult<Vec<(ObjectId, String)>> {
        let mut annotation_updates = Vec::new();
        log_message!(SysLogLevel::Debug, "Hashmap Data: {:?}", data);
        let catalog = document
            .trailer
            .get(b"Root")
            .and_then(|obj| obj.as_reference())?;
        let catalog_dict = document.get_object(catalog)?.as_dict()?;
        let acroform = catalog_dict
            .get(b"AcroForm")
            .and_then(|obj| obj.as_reference())?;
        let acroform_dict = document.get_object(acroform)?.as_dict()?;
        let fields_list = acroform_dict.get(b"Fields")?.as_array()?;
        log_message!(SysLogLevel::Debug, "Field list value: {:?}", fields_list);
        Self::fill_annotation_updates(document, fields_list, None, data, &mut annotation_updates)?;
        Ok(annotation_updates)
    }

    fn apply_annotation_updates(
        document: &mut Document,
        annotation_updates: Vec<(ObjectId, String)>,
    ) -> SystemSyncResult<()> {
        for (annotation_id, value) in annotation_updates {
            let cleaned_value = Self::clean_string(&value);
            let utf16be_value = Self::to_utf16be(&cleaned_value);
            if let Ok(annotation_object) = document.get_object_mut(annotation_id) {
                if let Ok(dict) = annotation_object.as_dict_mut() {
                    dict.set(
                        "V",
                        Object::String(utf16be_value, lopdf::StringFormat::Literal),
                    );
                }
            }
            log_message!(
                SysLogLevel::Debug,
                "Text field {:?} now has updated value: {}",
                annotation_id,
                cleaned_value,
            );
        }
        Ok(())
    }

    fn fill_form(
        &self,
        data: Arc<HashMap<usize, HashMap<String, String>>>,
        flatten: bool,
    ) -> SystemSyncResult<()>;
}

impl FormFill for PDF {
    fn fill_form(
        &self,
        data: Arc<HashMap<usize, HashMap<String, String>>>,
        flatten: bool,
    ) -> SystemSyncResult<()> {
        let (tx_data, rx_data): (Sender<SystemSyncResult<()>>, Receiver<SystemSyncResult<()>>) =
            unbounded();
        for (i, document) in self.documents.iter().enumerate() {
            let tx_clone = tx_data.clone();
            let data_clone = data.clone();
            let document_clone = document.clone();
            let font = self.default_font.clone();
            spawn_channel_task(tx_clone, move || {
                let document = document_clone.clone();
                if let Some(page_data) = data_clone.get(&i) {
                    let mut document_guard = document.write();
                    if flatten {
                        PDF::flatten_form_fields(&mut document_guard, page_data, font)?;
                    } else {
                        let annotation_updates =
                            PDF::collect_annotation_updates(&document_guard, page_data)?;
                        PDF::apply_annotation_updates(&mut document_guard, annotation_updates)?;
                    }
                }
                Ok(())
            });
        }
        // Drop the original sender so the receive loop ends once all tasks finish.
        drop(tx_data);
        while let Ok(result) = rx_data.recv() {
            result?;
        }
        Ok(())
    }
}
```
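The fan-out/fan-in pattern in `fill_form` above (one task per cloned document, each result sent over a channel, the original sender dropped so the receive loop terminates) can be sketched with only the standard library. In this sketch `std::thread::scope` and `std::sync::mpsc` stand in for the crate's `spawn_channel_task` and crossbeam channel, a `Vec<String>` stands in for a lopdf `Document`, and `process_page`/`fill_all` are hypothetical placeholders for the fill or flatten work.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical per-document work: records how many fields were mapped.
fn process_page(doc: &mut Vec<String>, page_data: &HashMap<String, String>) -> Result<(), String> {
    if page_data.is_empty() {
        return Err("no data for page".to_string());
    }
    doc.push(format!("{} fields", page_data.len()));
    Ok(())
}

/// Runs `process_page` on every document in parallel; returns the first error, if any.
fn fill_all(
    documents: &[Arc<RwLock<Vec<String>>>],
    data: &HashMap<usize, HashMap<String, String>>,
) -> Result<(), String> {
    let (tx, rx) = mpsc::channel::<Result<(), String>>();
    // Scoped threads may borrow `documents` and `data`; all are joined at scope end.
    thread::scope(|scope| {
        for (i, document) in documents.iter().enumerate() {
            let tx = tx.clone();
            scope.spawn(move || {
                let result = match data.get(&i) {
                    Some(page_data) => process_page(&mut document.write().unwrap(), page_data),
                    None => Ok(()),
                };
                // A send error only means the receiver is gone, so it is ignored.
                let _ = tx.send(result);
            });
        }
    });
    // Drop the original sender so iteration ends after the buffered results.
    drop(tx);
    // Collecting into Result stops at the first Err.
    rx.into_iter().collect()
}
```

One design difference worth noting: because the scope joins every thread before the results are read, this sketch always lets all documents finish, whereas the `while let` loop above returns as soon as it receives the first error.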
To put a crate together, I'd need to clean things up and package it for distribution, which would take a while. I'd also want to expand it to support other form field types, like checkboxes. Right now I interact with it via:

```rust
pub struct PDF {
    pub documents: Vec<Arc<RwLock<Document>>>,
    pub default_font: PDFFont,
    pub copies: Option<usize>,
}

impl PDF {
    pub fn new() -> Self {
        Self {
            documents: Vec::new(),
            default_font: PDFFont::Courier,
            copies: None,
        }
    }

    pub fn set_font(mut self, font: PDFFont) -> Self {
        self.default_font = font;
        self
    }

    pub fn set_copies(mut self, copies: Option<usize>) -> Self {
        self.copies = copies;
        self
    }

    pub fn safe_fill_form(
        &mut self,
        bytes: Vec<u8>,
        data: HashMap<usize, HashMap<String, String>>,
        flatten: bool,
    ) -> SystemSyncResult<Vec<u8>> {
        match std::panic::catch_unwind(std::panic::AssertUnwindSafe(
            || -> SystemSyncResult<Vec<u8>> {
                self.load_pdf(bytes, self.copies.clone())?;
                let arc_data = Arc::new(data);
                self.fill_form(arc_data.clone(), flatten)?;
                self.save()
            },
        )) {
            Ok(result) => result,
            Err(_) => Err(SystemError::new(
                file!(),
                line!(),
                SystemErrorCode::GenericErrorCode,
                SystemErrorType::GenericErrorType,
                "PDF Production panicked with error",
            )
            .into()),
        }
    }
}
```

In my case, a single source template is reused based on the number of pages/documents a caller wants. For example, a multi-page invoice would use the same source template: the caller requests the number of pages the invoice requires (say 3) and organizes the incoming HashMap by the corresponding page/document index (usize), where the inner map contains all the mappings for the page/document at that ordinal position. I clone the template n times, process the flatten or fill operations in parallel, then merge the resulting documents together and pass them through optimization. The optimization logic deduplicates content, moves everything into XObjects, deduplicates fonts, etc., because lopdf doesn't handle any of this, and cloning out a 1000-page document, for example, would produce a very large file due to redundant information (like fonts) being cloned per page. It would be more efficient to deal with some of that as the pages are cloned out, rather than in a single optimization pass at the end, but that doesn't matter to me right now. Fields in the template, in my case, are delimited with a ".", e.g. named "question.1.prompt", and the PDF standard expands that into A -> B -> C dictionaries, which is why the code follows the Kids to determine placement in the form. I used Illustrator to design the test templates.
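To illustrate how a dotted name such as `question.1.prompt` corresponds to nested field dictionaries, here is a std-only sketch of the same recursive walk as `fill_field_updates`/`fill_annotation_updates`: each node's partial name (`/T`) is appended to its parent's fully qualified name with a period, and nodes without `/T` (typically widget annotations) inherit the parent's name. `FieldNode` and `qualified_names` are hypothetical stand-ins for the lopdf dictionaries. Unlike the code above, this version keeps passing the inherited name down through unnamed nodes.

```rust
/// Hypothetical stand-in for an AcroForm field dictionary.
struct FieldNode {
    /// Partial field name (the /T entry), if present.
    partial_name: Option<String>,
    /// Child fields or widget annotations (the /Kids array).
    kids: Vec<FieldNode>,
}

/// Collects the fully qualified names of all named fields, depth-first.
fn qualified_names(nodes: &[FieldNode], parent: Option<&str>, out: &mut Vec<String>) {
    for node in nodes {
        let name = match (&node.partial_name, parent) {
            (Some(t), Some(p)) => Some(format!("{p}.{t}")),
            (Some(t), None) => Some(t.clone()),
            // No /T: the node (usually a widget) inherits its parent's name.
            (None, p) => p.map(str::to_string),
        };
        // Only nodes that carry their own /T contribute a new name.
        if let (Some(n), Some(_)) = (&name, &node.partial_name) {
            out.push(n.clone());
        }
        qualified_names(&node.kids, name.as_deref(), out);
    }
}
```

For a `question` field whose kid `1` has a kid `prompt` (with an unnamed widget below it), the walk yields `question`, `question.1` and `question.1.prompt`. The last of these is the key a caller would use in the per-page HashMap.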
Began work on an internal flatten implementation for pdfium-render behind feature flag …
Test whether the upstream issue is resolved by https://pdfium.googlesource.com/pdfium/+/a4682f9a0f699288e7b84992d2096340133b74ac.
I may be misunderstanding how flatten() works, but doing something like this:

examples/form_flatten.rs

and running it as

cargo run --example form_flatten

I end up with:

whereas I would have expected the form fields to be inlined and thus rendered (without any yellow background).

Something is happening, because no yellow-highlighted fields are rendered!