Issue with multiple page item from IIIF training #46

Closed
glenrobson opened this issue Dec 14, 2023 · 8 comments
Labels
bug Something isn't working enhancement New feature or request

Comments

@glenrobson
Collaborator

This came up in the IIIF training:

https://archive.org/details/st-anthony-relics-01/

It contains 5 images but the v3 manifest contains 1 image:

https://iiif.archive.org/iiif/3/st-anthony-relics-01/manifest.json

and the v2 manifest doesn't work:

https://iiif.archive.org/iiif/2/st-anthony-relics-01/manifest.json

@digitaldogsbody
Collaborator

This is caused by the same issue we were discussing on Slack with Sara: our code expects an item with mediatype image to contain just a single image, and anything with multiple images to use texts and the <identifier>_images.zip file construction.

Should be an easy enough fix: when processing an image type record, we can just check how many image files with source: original there are in the metadata.
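
A minimal sketch of that check, assuming the item metadata has the shape of the excerpt quoted later in this thread (a `files` list whose entries carry `name` and `source`). The helper name, the extension list, and the sample file names other than ASB-Consorzio.pdf are hypothetical and are not taken from the iiify code:

```python
from pathlib import PurePosixPath

# Hypothetical set of extensions treated as images; adjust to taste.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".jp2"}


def original_image_files(metadata):
    """Return the names of files marked source=original that look like images."""
    names = []
    for entry in metadata.get("files", []):
        if entry.get("source") != "original":
            continue  # skip derivatives and metadata files
        suffix = PurePosixPath(entry.get("name", "")).suffix.lower()
        if suffix in IMAGE_EXTENSIONS:
            names.append(entry["name"])
    return names


# Illustrative metadata only; the JPEG names are made up for the example.
sample = {
    "files": [
        {"name": "ASB-Consorzio.pdf", "source": "original"},
        {"name": "page-1.jpg", "source": "original"},
        {"name": "page-2.jpg", "source": "original"},
        {"name": "page-1_thumb.jpg", "source": "derivative"},
    ]
}

images = original_image_files(sample)
print(len(images), images)
```

If the returned list has more than one entry, the manifest builder could emit one canvas per file instead of assuming a single image.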

@digitaldogsbody
Collaborator

Although, looking at the item in question, I think the mediatype might be wrong: in addition to the static images in the top-level directory, there are also zip files with JP2s that look as if they have been processed by the BookReader code. Because the mediatype is not texts, you can't access them in the IA interface (and we would never expose them via IIIF).

@digitaldogsbody digitaldogsbody added the enhancement New feature or request label Dec 14, 2023
@digitaldogsbody
Collaborator

The V2 manifest issue may be related. The code downloads the item in order to open it with PIL and get dimension data etc., but it is quite naive: it ends up downloading the first original file from the item, which here is one of the PDFs, so PIL errors out:

[2023-12-14 11:01:28,724] ERROR in app: Exception on /iiif/2/st-anthony-relics-01/manifest.json [GET]
Traceback (most recent call last):
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/flask/app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/flask/app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/mike/projects/iiif.archive.org/iiify/app.py", line 208, in manifest2
    return ldjsonify(create_manifest(identifier, domain=domain, page=page))
  File "/home/mike/projects/iiif.archive.org/iiify/resolver.py", line 208, in create_manifest
    info = web.info(domain, path)
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/iiif2/web.py", line 32, in info
    w, h = Image.open(path).size
  File "/home/mike/projects/iiif.archive.org/venv/lib/python3.10/site-packages/PIL/Image.py", line 3283, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file 'media/st-anthony-relics-01'

This is the file that it downloads:

(venv) mike@revelator:~/projects/iiif.archive.org$ ls -lta media/st-anthony-relics-01 
-rw-rw-r-- 1 mike mike 8653052 Dec 14 11:00 media/st-anthony-relics-01
(venv) mike@revelator:~/projects/iiif.archive.org$ file media/st-anthony-relics-01 
media/st-anthony-relics-01: PDF document, version 1.3
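
The `file` output above identifies the download by its leading bytes. A minimal sketch of the same check in Python, using only the standard library, is shown here. `looks_like_pdf` is a hypothetical helper, not part of iiify or iiif2; it only illustrates how a non-image download could be detected before it reaches `Image.open`:

```python
import os
import tempfile

# Every PDF file starts with this signature.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF signature."""
    with open(path, "rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC


# Demonstration with a throwaway file standing in for the downloaded item.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"%PDF-1.3\n")
    demo_path = tmp.name

is_pdf = looks_like_pdf(demo_path)
os.unlink(demo_path)
print(is_pdf)
```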

And the matching type and bytesize from the metadata:

{
  "created": 1702519976,
  "d1": "ia601202.us.archive.org",
  "d2": "ia801202.us.archive.org",
  "dir": "/22/items/st-anthony-relics-01",
  "files": [
    {
      "name": "ASB-Consorzio.pdf",
      "source": "original",
      "mtime": "1702313965",
      "size": "8653052",
      "md5": "4f9f26a566c797410ebd05b58596e8de",
      "crc32": "ad52676c",
      "sha1": "6e482b42e9d670088fa4816f235345f0b195c6e8",
      "format": "Image Container PDF",
      "viruscheck": "1702314437"
    },
<snip>

So I would say that we should handle multiple images in an image mediatype object (regardless of whether this object should actually be texts), but the V2 issue is a wontfix.

@glenrobson glenrobson added the bug Something isn't working label Jan 18, 2024
@glenrobson
Collaborator Author

Duplicate of #52

@glenrobson glenrobson marked this as a duplicate of #52 Jan 18, 2024
@benwbrum benwbrum added this to the Spring 2024 sprint milestone Jan 18, 2024
@glenrobson glenrobson self-assigned this Jan 18, 2024
@glenrobson
Collaborator Author

Re-target as a Cantaloupe issue.

@glenrobson glenrobson assigned cdrini and unassigned glenrobson Feb 22, 2024
@glenrobson
Collaborator Author

This is a very complicated example of:

#12

It contains a number of PDF documents and jpg images. Currently only the PDFs are shown in the Internet Archive viewer.
