I think this is caused by a change in assets: OCR-D/assets@b12e5eb, which was supposed to fix OCR-D/assets#87, but does not work.
Here is a debug log of what actually happens when copying the workspace to a temporary location:
DEBUG ocrd.resolver.workspace_from_url:resolver.py:164 workspace_from_url
mets_basename='mets.xml'
mets_url='/ocrd_calamari/test/assets/kant_aufklaerung_1784-page-region-line-word_glyph/data/mets.xml'
src_baseurl='/ocrd_calamari/test/assets/kant_aufklaerung_1784-page-region-line-word_glyph/data'
dst_dir='/tmp/test-ocrd-calamari'
DEBUG ocrd.resolver.download_to_directory:resolver.py:49 directory=|/tmp/test-ocrd-calamari| url=|/ocrd_calamari/test/assets/kant_aufklaerung_1784-page-region-line-word_glyph/data/mets.xml| basename=|mets.xml| if_exists=|skip| subdir=|None|
DEBUG ocrd.resolver.download_to_directory:resolver.py:99 Copying file '/ocrd_calamari/test/assets/kant_aufklaerung_1784-page-region-line-word_glyph/data/mets.xml' to '/tmp/test-ocrd-calamari/mets.xml'
DEBUG ocrd.workspace.download_file:workspace.py:142 download_file <OcrdFile fileGrp=OCR-D-IMG ID=OCR-D-IMG_0001, mimetype=image/tiff, url=OCR-D-IMG/INPUT_0017.tif, local_filename=OCR-D-IMG/INPUT_0017.tif]/> [_recursion_count=0]
DEBUG ocrd.resolver.download_to_directory:resolver.py:49 directory=|/tmp/test-ocrd-calamari| url=|OCR-D-IMG/INPUT_0017.tif| basename=|OCR-D-IMG_0001.tif| if_exists=|skip| subdir=|OCR-D-IMG|
DEBUG ocrd.workspace.download_file:workspace.py:158 First run of resolver.download_to_directory(OCR-D-IMG/INPUT_0017.tif) failed, try prepending baseurl '/ocrd_calamari/test/assets/kant_aufklaerung_1784-page-region-line-word_glyph/data': File path passed as 'url' to download_to_directory does not exist: OCR-D-IMG/INPUT_0017.tif
DEBUG ocrd.workspace.download_file:workspace.py:142 download_file <OcrdFile fileGrp=OCR-D-IMG ID=OCR-D-IMG_0001, mimetype=image/tiff, url=/ocrd_calamari/test/assets/kant_aufklaerung_1784-page-region-line-word_glyph/data/OCR-D-IMG/INPUT_0017.tif, local_filename=OCR-D-IMG/INPUT_0017.tif]/> [_recursion_count=1]
DEBUG ocrd.resolver.download_to_directory:resolver.py:49 directory=|/tmp/test-ocrd-calamari| url=|/ocrd_calamari/test/assets/kant_aufklaerung_1784-page-region-line-word_glyph/data/OCR-D-IMG/INPUT_0017.tif| basename=|OCR-D-IMG_0001.tif| if_exists=|skip| subdir=|OCR-D-IMG|
DEBUG ocrd.resolver.download_to_directory:resolver.py:99 Copying file '/ocrd_calamari/test/assets/kant_aufklaerung_1784-page-region-line-word_glyph/data/OCR-D-IMG/INPUT_0017.tif' to '/tmp/test-ocrd-calamari/OCR-D-IMG/OCR-D-IMG_0001.tif'
So, essentially, Resolver.workspace_from_url undoes the non-standard path names when downloading, and subsequently the @imageFilename reference does not work (again).
@kba I suppose we could fix this in assets by using standard basenames, but it looks more like a bug in core to me.
IOW, when you have a partial clone of a local workspace, and you attempt to download some of its files, the following happens:
1. chdir to the clone's Workspace.directory (the only reference to the original workspace is in Workspace.baseurl now)
2. resolving the relative local URL fails
3. "downloading" it fails
4. a recursive attempt is started with the absolute local URL (from baseurl + url)
5. chdir to the same directory again
6. resolving the absolute local URL fails
7. downloading it into ID+ext succeeds ← this changes the relative local URL though
8. further down the line, for Workspace.resolve_image_exif or Workspace.image_from_page, the old relative local URL is requested via a PAGE-XML's @imageFilename
9. it cannot be found
My feeling is that step 7 is wrong: we should at least keep the old relative URL.
But what if some PAGE files in the workspace to be cloned even contain remote references for @imageFilename?
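For the purely local case, the difference between the two behaviours can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in core: `copy_renamed` and `copy_preserving` are hypothetical helpers that use plain `shutil` and `pathlib` instead of `Resolver` or `Workspace`, and the file content is a placeholder. The first helper stores the copy as fileGrp/ID+ext, as the log above shows; the second keeps the original relative path.

```python
import shutil
import tempfile
from pathlib import Path


def copy_renamed(src_dir: Path, dst_dir: Path, rel_url: str,
                 file_id: str, file_grp: str) -> Path:
    """Hypothetical helper: store the copy as <fileGrp>/<ID><ext>,
    mimicking what the debug log shows."""
    src = src_dir / rel_url
    dst = dst_dir / file_grp / (file_id + src.suffix)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, dst)
    return dst


def copy_preserving(src_dir: Path, dst_dir: Path, rel_url: str) -> Path:
    """Hypothetical alternative: keep the original relative path in the clone."""
    src = src_dir / rel_url
    dst = dst_dir / rel_url
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, dst)
    return dst


def demo() -> dict:
    # The relative path that a PAGE-XML @imageFilename would still point to.
    rel_url = "OCR-D-IMG/INPUT_0017.tif"
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src_dir = root / "original"
        (src_dir / "OCR-D-IMG").mkdir(parents=True)
        (src_dir / rel_url).write_bytes(b"placeholder image data")

        renamed = root / "clone_renamed"
        copy_renamed(src_dir, renamed, rel_url, "OCR-D-IMG_0001", "OCR-D-IMG")
        # Does the old relative reference still resolve inside the clone?
        results["renamed"] = (renamed / rel_url).exists()

        preserved = root / "clone_preserved"
        copy_preserving(src_dir, preserved, rel_url)
        results["preserved"] = (preserved / rel_url).exists()
    return results


if __name__ == "__main__":
    print(demo())  # {'renamed': False, 'preserved': True}
```

With renaming, the clone contains OCR-D-IMG/OCR-D-IMG_0001.tif, so a lookup of OCR-D-IMG/INPUT_0017.tif fails. Preserving the relative path keeps that reference valid, but it does not answer the question about remote @imageFilename references.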
Originally posted by @bertsky in OCR-D/ocrd_calamari#73 (comment)