
HPC: Broken/Insufficient Testing of Load Balancer Functionality #63

Open
Schlevidon opened this issue Mar 5, 2024 · 1 comment


@Schlevidon
Collaborator

Mistakes related to the examples in hpc/test:

  • Helix/Makefile is outdated and unusable in its current state.
  • MultiplyBy2/client.py sends a request with too many input dimensions and crashes.
  • minimal and MultiplyBy2 do almost the same thing on the server side and should be merged into a single example or renamed appropriately.
  • Currently only Evaluate requests are tested.
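A client-side shape check would catch the MultiplyBy2 crash before the request ever reaches the load balancer. The sketch below assumes UM-Bridge's convention that model inputs are a list of vectors, one per input. With the umbridge Python client, the expected sizes would presumably come from `model.get_input_sizes()`. The helper name `check_input_shape` and the sizes `[1]` used in the demo are hypothetical.

```python
def check_input_shape(parameters, input_sizes):
    """Raise ValueError if `parameters` does not match `input_sizes`.

    `parameters` is a list of input vectors (UM-Bridge convention);
    `input_sizes` is the list of lengths the model reports via GetInputSizes.
    """
    # One vector per declared input.
    if len(parameters) != len(input_sizes):
        raise ValueError(
            f"expected {len(input_sizes)} input vector(s), got {len(parameters)}"
        )
    # Each vector must have the declared length.
    for i, (vec, size) in enumerate(zip(parameters, input_sizes)):
        if len(vec) != size:
            raise ValueError(f"input {i}: expected length {size}, got {len(vec)}")


if __name__ == "__main__":
    input_sizes = [1]  # hypothetical: would normally come from get_input_sizes()
    check_input_shape([[3.0]], input_sizes)  # matches, no error
    try:
        check_input_shape([[3.0], [4.0]], input_sizes)  # one input vector too many
    except ValueError as err:
        print(err)
```

Failing early with a readable message is easier to debug in a test log than a crash on the server side.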

Tasks:

  • Clean up and fix the current examples.
  • Add tests to ensure coverage of the entire UM-Bridge model interface, i.e.:
    • GetInputSizes and GetOutputSizes
    • Evaluate, Gradient, ApplyJacobian and ApplyHessian
    • SupportsEvaluate, SupportsGradient, SupportsApplyJacobian and SupportsApplyHessian
  • If possible, add CI to run the tests automatically.
    • It should be possible to use HQ without SLURM by using hq worker start instead of automatic allocation.
    • Alternatively, could we run a virtual SLURM cluster?
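A test covering the whole interface could call every method once and check the result shapes. The sketch below is hedged. It uses a hypothetical in-process stand-in, `Doubler`, for a MultiplyBy2-style model computing f(x) = 2x, so it runs without a server. The method names are assumed to follow the umbridge Python client (`get_input_sizes`, `gradient`, `apply_jacobian`, `apply_hessian`, `supports_*`). If that holds, passing `umbridge.HTTPModel(url, name)` pointed at the load balancer instead of `Doubler` should exercise the real HPC path, but this has not been verified.

```python
class Doubler:
    """Hypothetical in-process stand-in for a MultiplyBy2-style model: f(x) = 2x."""

    def __init__(self, size=1):
        self.size = size

    def get_input_sizes(self, config=None):
        return [self.size]

    def get_output_sizes(self, config=None):
        return [self.size]

    def supports_evaluate(self):
        return True

    def supports_gradient(self):
        return True

    def supports_apply_jacobian(self):
        return True

    def supports_apply_hessian(self):
        return True

    def __call__(self, parameters, config=None):
        return [[2.0 * x for x in parameters[0]]]

    def gradient(self, out_wrt, in_wrt, parameters, sens, config=None):
        # J = 2I, so J^T sens = 2 * sens
        return [2.0 * s for s in sens]

    def apply_jacobian(self, out_wrt, in_wrt, parameters, vec, config=None):
        # J vec = 2 * vec
        return [2.0 * v for v in vec]

    def apply_hessian(self, out_wrt, in_wrt1, in_wrt2, parameters, sens, vec, config=None):
        # f is linear, so all second derivatives vanish
        return [0.0] * len(vec)


def check_full_interface(model):
    """Call each supported UM-Bridge method once; return the names that passed."""
    in_sizes = model.get_input_sizes()
    out_sizes = model.get_output_sizes()
    assert in_sizes and out_sizes, "model reports no inputs or outputs"
    x = [[1.0] * n for n in in_sizes]
    checked = []
    if model.supports_evaluate():
        y = model(x)
        assert [len(v) for v in y] == out_sizes
        checked.append("Evaluate")
    if model.supports_gradient():
        g = model.gradient(0, 0, x, [1.0] * out_sizes[0])
        assert len(g) == in_sizes[0]
        checked.append("Gradient")
    if model.supports_apply_jacobian():
        jv = model.apply_jacobian(0, 0, x, [1.0] * in_sizes[0])
        assert len(jv) == out_sizes[0]
        checked.append("ApplyJacobian")
    if model.supports_apply_hessian():
        hv = model.apply_hessian(0, 0, 0, x, [1.0] * out_sizes[0], [1.0] * in_sizes[0])
        assert len(hv) == in_sizes[0]
        checked.append("ApplyHessian")
    return checked


if __name__ == "__main__":
    print(check_full_interface(Doubler(size=3)))
```

Only shapes are checked generically. Value checks are model-specific and would belong in each example's own test.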
@linusseelinger
Member

We might actually consider removing the examples in hpc entirely. They made sense initially, when the hpc code lived in a separate repo, but now they mostly duplicate the existing (CI-tested) example models.

Running hq without SLURM sounds good for CI!
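For CI, a local HQ setup might look like the sketch below. It starts an HQ server and one worker as background processes using only `hq server start` and `hq worker start`, so no SLURM allocation is involved. The helper names are hypothetical, the fixed startup delay is a crude assumption rather than a proper readiness check, and launching the load balancer and the tests themselves is left out.

```python
import shutil
import subprocess
import time

HQ_SERVER_CMD = ["hq", "server", "start"]
HQ_WORKER_CMD = ["hq", "worker", "start"]


def start_local_hq(startup_delay=2.0):
    """Start an HQ server and a single worker in the background.

    Returns the Popen handles so the caller can stop them after the tests.
    """
    if shutil.which("hq") is None:
        raise RuntimeError("hq executable not found on PATH")
    server = subprocess.Popen(HQ_SERVER_CMD)
    # Assumption: a short fixed wait is enough for the server to come up.
    time.sleep(startup_delay)
    worker = subprocess.Popen(HQ_WORKER_CMD)
    return [server, worker]


def stop_local_hq(procs):
    """Terminate the processes in reverse start order and wait for them."""
    for proc in reversed(procs):
        proc.terminate()
        proc.wait(timeout=10)
```

A CI job would call `start_local_hq()`, run the load balancer and the interface tests against it, and call `stop_local_hq(...)` in a cleanup step.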
