%20 is not escaped in image file names/URLs for LaTeX #13189

Open
Paebbels opened this issue Dec 22, 2024 · 4 comments

@Paebbels

Paebbels commented Dec 22, 2024

Describe the bug

When a shield from Shields.io is embedded in the documentation, the URL might contain %20 for a space (or other percent-escaped characters). Sphinx uses the last part of this URL as the file name without escaping or rewriting the percent sign. When the generated document is then compiled with LaTeX (xelatex), the document structure breaks because LaTeX treats % as the start of a comment.

Suggestion:

  • Remove all occurrences of %, or
  • rename all occurrences of % to _.
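
The second suggestion can be sketched in a few lines of Python. This is only an illustration of the idea, not Sphinx's actual code: the function name `latex_safe_basename` is hypothetical, and it assumes the file name is taken from the last path segment of the URL.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def latex_safe_basename(url: str) -> str:
    """Return the last path segment of ``url`` with every ``%`` replaced by ``_``.

    Hypothetical helper for illustration; not part of Sphinx.
    """
    # urlsplit separates the path from any query string or fragment
    # and does not decode percent-escapes.
    path = urlsplit(url).path
    # .name keeps only the final segment, e.g. "doc-CC--BY%204.0-green.png".
    return PurePosixPath(path).name.replace("%", "_")


print(latex_safe_basename("https://raster.shields.io/badge/doc-CC--BY%204.0-green.png"))
# prints: doc-CC--BY_204.0-green.png
```

Note that a plain replacement leaves the hex digits of the escape (`20`) in the file name.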

How to Reproduce

Example URL: https://raster.shields.io/badge/doc-CC--BY%204.0-green.png

This generates a shield with CC-BY 4.0 as text; %20 is the percent-encoded space character.

Generated LaTeX Code:

\sphinxAtStartPar
\sphinxhref{https://GitHub.com/pyTooling/Actions}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{pyTooling-Actions-63bf7f}.png}} \sphinxhref{https://GitHub.com/pyTooling/Actions/blob/main/LICENSE.md}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{pyTooling}.png}} \sphinxhref{https://pyTooling.github.io/pyTooling/}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{website}.png}} \sphinxhref{https://GitHub.com/pyTooling/Actions/blob/main/doc/License.rst}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{doc-CC--BY%2041}.0-green}}
\sphinxhref{https://GitHub.com/pyTooling/Actions/tags}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{Actions}.png}} \sphinxhref{https://GitHub.com/pyTooling/Actions/releases}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{Actions1}.png}}

Produced LaTeX Error message:

Actions.tex: Error: 108: Paragraph ended before \sphinxhref was complete.

Environment Information

Platform:              win32; (Windows-11-10.0.22631-SP0)
Python version:        3.13.1 (tags/v3.13.1:0671451, Dec  3 2024, 19:06:28) [MSC v.1942 64 bit (AMD64)])
Python implementation: CPython
Sphinx version:        8.1.3
Docutils version:      0.21.2
Jinja2 version:        3.1.4
Pygments version:      2.18.0

Sphinx extensions

No extension needed for this bug.

Additional context

LaTeX environment: MikTeX (all updated)
LaTeX processor: xelatex

@jfbu
Contributor

jfbu commented Dec 22, 2024

I observe that using \ (i.e. backslash + space) in the URL mark-up like this:

.. image:: https://raster.shields.io/badge/doc-CC--BY\ 4.0-green.png

"works": the LaTeX build directory contains a fetched copy of the image file with an underscore replacing the space, and the generated mark-up refers to it correctly:

\noindent\sphinxincludegraphics{{doc-CC--BY_4.0-green}.png}

while the HTML file contains an unescaped space character

<img alt="https://raster.shields.io/badge/doc-CC--BY 4.0-green.png" src="https://raster.shields.io/badge/doc-CC--BY 4.0-green.png" />

and Firefox displays it correctly. I am not HTML-savvy enough to tell whether this mark-up is generally accepted by browsers. If one control-clicks (macOS mouse) on the image and selects "Copy image link", Firefox reconstructs https://raster.shields.io/badge/doc-CC--BY%204.0-green.png.

Trying the same \ for local image files as in

.. image:: /images/doc-CC--BY\ 4.0-green.png

leads to a different behavior: the HTML contains the %20 escape:

<img alt="_images/doc-CC--BY%204.0-green.png" src="_images/doc-CC--BY%204.0-green.png" />

On the other hand, the LaTeX file contains an unescaped space, but this is fine because the image file is copied to the LaTeX build directory with its filename unchanged, and the mark-up is

\noindent\sphinxincludegraphics{{doc-CC--BY 4.0-green}.png}

and is ok for the PDF build.

Using %20 in local references

.. image:: /images/doc-CC--BY%204.0-green.png

does not work even for HTML builds.

As for what is presumably your use case:

.. image:: https://raster.shields.io/badge/doc-CC--BY%204.0-green.png

the remote file is correctly fetched and copied to the LaTeX build directory, but with the filename doc-CC--BY%204.0-green.png.
Then the LaTeX mark-up

\noindent\sphinxincludegraphics{{doc-CC--BY%204.0-green}.png}

refers to the correct file but fails in a similar way to what you reported, because % is special to LaTeX.
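
To make the failure concrete, here is a rough Python model of what TeX still sees on that line once the comment starts. It is a simplification under stated assumptions: a % counts as a comment start unless it is directly preceded by a backslash (so a literal `\\%` would be misjudged), and the helper names `visible_to_tex` and `unclosed_braces` are made up for this illustration.

```python
import re

# Simplification: % starts a comment unless directly preceded by a backslash.
_COMMENT = re.compile(r"(?<!\\)%")


def visible_to_tex(line: str) -> str:
    """Return the part of ``line`` that comes before a TeX comment starts."""
    match = _COMMENT.search(line)
    return line if match is None else line[: match.start()]


def unclosed_braces(text: str) -> int:
    """Count unescaped opening braces minus unescaped closing braces."""
    opened = len(re.findall(r"(?<!\\)\{", text))
    closed = len(re.findall(r"(?<!\\)\}", text))
    return opened - closed


line = r"\noindent\sphinxincludegraphics{{doc-CC--BY%204.0-green}.png}"
print(visible_to_tex(line))                   # \noindent\sphinxincludegraphics{{doc-CC--BY
print(unclosed_braces(visible_to_tex(line)))  # 2
```

The two closing braces sit behind the %, so the argument of \sphinxincludegraphics never ends on that line.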

Note that this does not involve \sphinxhref so it would be nice if you could provide some minimal example.

Trying to reproduce your problem led me to the above which does not involve \sphinxhref at all but the similar % error related to \sphinxincludegraphics.

If we use the URL without the image directive, this kind of TeX is produced:

A link \sphinxhref{https://raster.shields.io/badge/doc-CC--BY\%204.0-green.png}{https://raster.shields.io/badge/doc\sphinxhyphen{}CC\textendash{}BY\%204.0\sphinxhyphen{}green.png}

which produces a correct PDF (which does not have the image but only the link to the image).

I see how to make \sphinxincludegraphics work with \sphinxincludegraphics{{doc-CC--BY%204.0-green}.png}; however, this would complicate the Sphinx LaTeX code and its future maintenance.

It would be simpler if Sphinx, when fetching the remote image file and copying it to the LaTeX build directory, did not keep the original %20 in the created filename but, as you suggest, used either a space or an underscore character. The latter is probably better, in view of what happens with the \ input and also because spaces in filenames are evil.

But then do we also have to take care of all % prefixed HTML escapes... ?
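
One way to cover every percent-escape at once is to decode the name first and then replace anything outside a conservative character set. The following is a sketch under assumptions, not what Sphinx does: `decoded_safe_basename` is a hypothetical name, and the choice to keep only ASCII letters, digits, dots and hyphens is arbitrary.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Assumption: only ASCII letters, digits, "." and "-" are kept as-is;
# every other character (space, %, {, }, ...) becomes "_".
_UNSAFE = re.compile(r"[^A-Za-z0-9.-]")


def decoded_safe_basename(url: str) -> str:
    """Decode percent-escapes in the last URL path segment, then sanitize it."""
    # Take the last segment before decoding, so an escaped "/" (%2F)
    # cannot turn into a path separator.
    name = PurePosixPath(urlsplit(url).path).name
    return _UNSAFE.sub("_", unquote(name))


print(decoded_safe_basename("https://raster.shields.io/badge/doc-CC--BY%204.0-green.png"))
# prints: doc-CC--BY_4.0-green.png
```

For the shield URL this yields the same name that the backslash-space input produced. Different URLs can map to the same sanitized name, so a real implementation would still need to handle collisions.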

@Paebbels
Author

Any character in a URL can be escaped as hex (here: space = dec 32 = hex 0x20). Escaping is done with a percent character. In form data (after the ? character), + is used for spaces.

Using \ for escaping might work, but this does not conform to the URL specification.

I might use this as a workaround until Sphinx has a fix to handle URLs with hex-escaped characters.
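
As a small illustration, Python's standard `urllib.parse` module shows both conventions. The helper `percent_escape` is only defined here for the example.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus


def percent_escape(char: str) -> str:
    """Hex-escape a single ASCII character, e.g. a space (dec 32) becomes %20."""
    return f"%{ord(char):02X}"


print(percent_escape(" "))                    # %20
print(quote("CC-BY 4.0"))                     # CC-BY%204.0  (path style)
print(quote_plus("CC-BY 4.0"))                # CC-BY+4.0    (form-data style)
print(unquote("doc-CC--BY%204.0-green.png"))  # doc-CC--BY 4.0-green.png
print(unquote_plus("CC-BY+4.0"))              # CC-BY 4.0
```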


My ReST has 2 images, one for HTML in SVG format and one in PNG format for LaTeX. So I can specify different URLs if needed.

@skoehler

Or you properly escape the percent sign when generating the LaTeX code. This works:

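\documentclass{article}
\usepackage{graphicx}

% Make % an ordinary character (catcode 12) so that it can be stored in \pcnt.
\catcode`\%=12
\newcommand\pcnt{%}
\catcode`\%=14
% The line above restores % as the comment character (catcode 14).

\begin{document}
\includegraphics[width=1cm]{test\pcnt ing.png}
\end{document}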

Note the space after \pcnt. Found the workaround here: https://tex.stackexchange.com/questions/351123/percentage-sign-and-space-in-image-filename

@skoehler

skoehler commented Dec 22, 2024

And before you ask, here's how to escape curly braces:

\documentclass{article}

\usepackage{graphicx}

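% Make %, { and } ordinary characters (catcode 12) so that each can be stored in a macro.
\catcode`\%=12
\catcode`\{=12
\catcode`\}=12
\newcommand\pcnt%
\newcommand\curlya{
\newcommand\curlyb}
\catcode`\%=14
\catcode`\{=1
\catcode`\}=2
% The three lines above restore the usual catcodes (14 = comment, 1 = begin group, 2 = end group).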

\begin{document}
\includegraphics[width=1cm]{test.png}
\includegraphics[width=1cm]{test ing.png}
\includegraphics[width=1cm]{test^ing.png}
\includegraphics[width=1cm]{test_ing.png}
\includegraphics[width=1cm]{test\pcnt ing.png}
\includegraphics[width=1cm]{test\curlya ing.png}
\includegraphics[width=1cm]{test\curlyb ing.png}
\end{document}

Paebbels added a commit to pyTooling/Actions that referenced this issue Dec 24, 2024