Incorrect bounding box on TimesNewRomanPSMT #749
Hi @grinay, would you be able to share a screenshot of the iTextSharp bounding box? The bounding box here looks correct to me, but I might be wrong. For words, the bounding boxes start from the baseline points. Maybe compare the bounding box of this word with the bounding boxes of its letters.
Hi @BobLd, I've made a simple test with iTextSharp: https://gist.github.com/grinay/ff5bba3c0b9b6a81f11413ca669583ff.
I was able to fix it without modifying PdfPig.
After this fix everything looks correct. I've tested it on other documents, and it works.
@grinay Thanks a lot for the great explanation of your fix. I had a look on my side and I think there's an easier way to achieve what you want. Can you try computing the word bounding box using the following:

```csharp
using System.Linq;            // Max / Min
using UglyToad.PdfPig.Core;   // PdfRectangle

foreach (var word in page.GetWords())
{
    var first = word.Letters[0];
    var last = word.Letters[word.Letters.Count - 1];

    // Horizontal extent: left edge of the first glyph to the right edge of the last glyph.
    double x1 = first.GlyphRectangle.TopLeft.X;
    double x2 = last.GlyphRectangle.TopRight.X;

    // Vertical extent: union of all the letters' glyph rectangles.
    double y1 = word.Letters.Max(l => l.GlyphRectangle.TopLeft.Y);
    double y2 = word.Letters.Min(l => l.GlyphRectangle.BottomLeft.Y);
    var bbox = new PdfRectangle(x1, y1, x2, y2); // This is the bbox you're looking for

    // DrawRectangle, canvas, redPaint, size and Scale come from the drawing test code.
    DrawRectangle(bbox, canvas, redPaint, size.Height, Scale);
}
```

I've pushed my code to https://github.com/BobLd/PdfPig/tree/issue-749-word-bbox (NB: this branch is not directly based on …). Have a look at the test. You can see (above) that the "difficulty" bbox is now what you want. One difference between my code and yours might be the bbox of words that do not contain letters with ascenders and descenders, for example "The" and "in" in "The difficulty in".
Let's leave the issue open, as I'd like to do some further tests.
@BobLd Yes, it looks good in the image. As I understand it, this will not work on the current master branch, right? Should we wait until you merge your changes? By the way, in the image you showed, the upper boundary of some words, like "program", is shifted down, which is why we had to extend the boxes by up to 20%, as in other documents it was very incorrect for us. After I applied the fix with the font descriptor, this problem disappeared, and we removed the code that extended the boxes.
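One possible way to combine the two ideas, sketched here as an assumption rather than grinay's actual fix: take the horizontal extent from the letters' glyph rectangles, as in the snippet above, and the vertical extent from the baseline plus the font descriptor's `Ascent`/`Descent`, so that the top edge no longer depends on which glyphs a word happens to contain. `WordBox`, the tuple layout and all the numbers are hypothetical; the sketch assumes horizontal, unrotated text on a single baseline and a non-Type 3 font, whose `Ascent`/`Descent` are expressed in units of 1/1000 of the font size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a PdfPig API): word box whose x-range comes from the
// letters' glyph rectangles and whose y-range comes from baseline + Ascent/Descent.
// Assumes horizontal, unrotated text on one baseline and a non-Type 3 font.
static (double Left, double Bottom, double Right, double Top) WordBox(
    IReadOnlyList<(double Left, double Right)> letters,
    double baselineY, double fontSize, double ascent, double descent)
{
    double scale = fontSize / 1000.0;          // glyph space units -> user space units
    double left = letters.Min(l => l.Left);    // leftmost glyph edge
    double right = letters.Max(l => l.Right);  // rightmost glyph edge
    // Descent is negative, so adding it places the bottom edge below the baseline.
    return (left, baselineY + descent * scale, right, baselineY + ascent * scale);
}

// Hypothetical glyph x-ranges for a two-letter word such as "in",
// on a baseline at y = 700 in 12 pt text with Ascent = 891 and Descent = -216.
var letters = new List<(double Left, double Right)> { (100.0, 103.3), (103.4, 109.4) };
var box = WordBox(letters, 700, 12, 891, -216);
Console.WriteLine(box);
```

The design choice here is that words in the same font and size on a line all get the same height, whether or not they contain ascenders or descenders; the trade-off is that the box no longer hugs the glyph outlines.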
@grinay I think you can try my fix with the current PdfPig version, or with the latest pre-release version. But do let me know if that does not work. As you point out, I do think some issues remain (especially the upper boundary problem), and your approach might be the solution. This is what I want to look into. I am referring to the upper boundary in your initial screenshot. Let's leave the issue open for now so that we keep that in mind.
Hi, I found an issue with the bounding boxes for the font TimesNewRomanPSMT:
![image](https://private-user-images.githubusercontent.com/13212299/295513055-a4d007d2-0290-4465-8d08-a4fce5dbe2c4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1NTk0ODYsIm5iZiI6MTczOTU1OTE4NiwicGF0aCI6Ii8xMzIxMjI5OS8yOTU1MTMwNTUtYTRkMDA3ZDItMDI5MC00NDY1LThkMDgtYTRmY2U1ZGJlMmM0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE0VDE4NTMwNlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTJjMWFkMDQzODY3NTg2MDg1MmIxYjYxNzNmMTI4NWNmZDJhNDlhZGY0Mzk2OGFhZWZiMTI4MjA0ZGRmNmM4NmImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.1ggFg640FN7CKC0YKC3tG9BJAzHMxWbYetP8OB29KAc)
document11.pdf
If you extract the bounding box for the word "difficulty" on the first page, you will see that the bounding box is shifted. I tested that case in iTextSharp and found that it uses the font descender to calculate the bounding box.
Looking into the code, I can't find any use of the font's descender. Is that correct? Could you advise how to fix this?
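The descender-based calculation mentioned above can be sketched as follows. This is a minimal sketch, not iTextSharp's or PdfPig's code: `VerticalExtent` is a hypothetical helper, the values are illustrative, and it assumes unrotated text, a non-Type 3 font (whose FontDescriptor `Ascent` and `Descent` are given in units of 1/1000 of the font size, with `Descent` negative) and a font size already scaled to user space by the text matrix and CTM.

```csharp
using System;

// Hypothetical helper (not a PdfPig or iTextSharp API): vertical extent of a text run
// from its baseline and the FontDescriptor Ascent/Descent values.
// Assumes unrotated text and a non-Type 3 font, where glyph space is 1/1000 text space.
static (double Bottom, double Top) VerticalExtent(
    double baselineY, double fontSize, double ascent, double descent)
{
    double scale = fontSize / 1000.0;  // glyph space units -> user space units
    // Descent is negative in the FontDescriptor, so adding it moves below the baseline.
    return (baselineY + descent * scale, baselineY + ascent * scale);
}

// Illustrative values: baseline at y = 700, 12 pt text, Ascent = 891, Descent = -216.
var (bottom, top) = VerticalExtent(700, 12, 891, -216);
Console.WriteLine($"bottom = {bottom:F3}, top = {top:F3}");
```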