Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up stale image checker #2530

Closed
Eric-Arellano opened this issue Dec 23, 2024 · 3 comments · Fixed by #2541
Closed

Speed up stale image checker #2530

Eric-Arellano opened this issue Dec 23, 2024 · 3 comments · Fixed by #2541
Assignees

Comments

@Eric-Arellano
Copy link
Collaborator

#2528 improved the stale image checker to be "fast enough" to run occasionally, but the program is still quite slow due to starting up ripgrep 3450 times. It takes several minutes on my mac.

I suspect a faster approach would be to read the markdown to extract all used images, such as by using the link checker, then compare that to the images we do have on disk. It would avoid subprocess calls. However, it's possible this solution would overwhelm system memory.

@arnaucasau
Copy link
Collaborator

What do you think about integrating the stale image checker with our alt text image checker in https://github.com/Qiskit/documentation/blob/main/scripts/js/commands/checkImages.ts? I think we could use the unified plugin to find the stale images:

visit(tree, "image", (node) => {
// Sphinx uses the image path as alt text if it wasn't defined using the
// :alt: option.
const imageName = last(split(node.url, "/"));
if (!node.alt || node.alt.endsWith(imageName!)) {
imagesErrors.add(`The image '${node.url}' does not have alt text.`);
}
});
visit(tree, "html", (node) => {

@Eric-Arellano
Copy link
Collaborator Author

That might be helpful to use the stale image checker. However, the stale image checker doesn't check historical API docs. Also, I realize we could probably implement #2538 such that we no longer even need to worry about checking API docs. If true, that would speed up this overall program enough automatically that we could avoid spending more dev resources on this ticket.

@Eric-Arellano Eric-Arellano self-assigned this Jan 2, 2025
@Eric-Arellano
Copy link
Collaborator Author

Indeed, #2541 fixes the performance problem 🚀

It's blocked by #2540, which I want to fix because that is a total gotcha for future contributors.

github-merge-queue bot pushed a commit that referenced this issue Jan 3, 2025
Closes #2538. We only
didn't delete the API images due to
#2540.

To fix #2540, we have to
fix our skip of release note images. We also had to manually move 4
release note images for legacy release note files from the latest Qiskit
images folder to instead the 0.40 and 0.43 Box artifacts, then manually
update those release notes to point to the paths. With those fixes and
the re-uploaded Box artifacts, `npm run regen-apis` doesn't make
changes.

This change allows us to have checkStaleImages.ts no longer worry about
API docs images, which dramatically speeds up the program. It's now fast
enough to run on every PR. Closes
#2530.

With ripgrep:

```
  Time (mean ± σ):      1.705 s ±  0.022 s    [User: 1.143 s, System: 1.085 s]
  Range (min … max):    1.660 s …  1.735 s    10 runs
```

With git grep (CI):

```
  Time (mean ± σ):      4.118 s ±  0.054 s    [User: 11.290 s, System: 1.092 s]
  Range (min … max):    4.044 s …  4.200 s    10 runs
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants