Bug: Multi-Threaded Generation of XDLRC Files #36

Open
ttown523 opened this issue Oct 18, 2016 · 8 comments

Collaborator

ttown523 commented Oct 18, 2016

I have observed that the multi-threaded generation of XDLRC files in Tincr hangs a significant fraction of the time. I resorted to specifying just one thread to get my file(s) generated.

This needs to be fixed so that multi-threaded generation is reliable.

This issue was originally posted in the RS2 repository by Dr. Nelson, but it is related to Tincr, so I am moving it here.

Collaborator Author

ttown523 commented May 2, 2017

This is still an issue! I tried running the command:

tincr::write_xdlrc -part xcku025-ffva1156-1-c -max_processes 4 -primitive_defs xcku025ffva1156_full.xdlrc

twice, and both times the process hung: once at 16% and once at 22%. I have generated an XDLRC file using just one process, but it took almost 24 hours to complete on my machine for the specified part. I need to generate a new XDLRC because there is a bug with XDLRC generation for UltraScale devices (I will describe it further in a future pull request), but I don't want to wait 24 hours to regenerate the XDLRC every time I find an issue.

Can you reproduce the hanging bug and try to find a solution?

Member

nelsobe commented May 2, 2017

This has been an off-and-on issue for some time. I have sometimes gotten it to work and other times had it hang, just like Thomas. I think we should elevate it to a higher level since, as Thomas points out, debugging with these time scales is problematic.

Another suggestion: could we parameterize it via the command line so that the work can be divided across many CPUs on the university supercomputer to speed it up?
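One way to picture the command-line parameterization suggested above is a helper that gives each batch job a contiguous slice of the work. This is only a sketch in Python for illustration, not Tincr code: the names `slice_bounds` and `parse_args`, the `--total-items`, `--num-jobs`, and `--job-index` options, and the idea that the work can be indexed as a flat list of items are all assumptions, and merging the per-job outputs is not shown.

```python
import argparse
from typing import Tuple


def slice_bounds(total_items: int, num_jobs: int, job_index: int) -> Tuple[int, int]:
    """Return the half-open [start, end) range of items owned by one job."""
    if num_jobs < 1 or not 0 <= job_index < num_jobs:
        raise ValueError("need num_jobs >= 1 and 0 <= job_index < num_jobs")
    base, extra = divmod(total_items, num_jobs)
    # The first `extra` jobs each take one additional item, so the
    # slices differ in size by at most one and cover every item once.
    start = job_index * base + min(job_index, extra)
    end = start + base + (1 if job_index < extra else 0)
    return start, end


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the hypothetical per-job options (all names are assumptions)."""
    parser = argparse.ArgumentParser(description="Select one job's slice of the work")
    parser.add_argument("--total-items", type=int, required=True)
    parser.add_argument("--num-jobs", type=int, required=True)
    parser.add_argument("--job-index", type=int, required=True)
    return parser.parse_args(argv)
```

Each supercomputer job would be launched with the same `--total-items` and `--num-jobs` but a different `--job-index`, so no coordination between jobs is needed to decide who handles what.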

Collaborator Author

ttown523 commented May 3, 2017

@bradselw can you take a look at this when you get some time? I just had an XDLRC run stall at 98% even with the -max_processes flag set to 1.

Collaborator

bradselw commented May 3, 2017

I'll take a look this weekend.

Collaborator

bradselw commented May 8, 2017

I'm still having trouble reproducing this error. I ran this command overnight:

tincr::write_xdlrc -part xcku025-ffva1156-1-c -max_processes 4 -primitive_defs xcku025ffva1156_full.xdlrc

It completed successfully after 10 hours.

This appears to be an intermittent issue, possibly machine-related, which makes it difficult to debug. The best action at this point is probably to add better logging, to help diagnose the issue the next time it happens.
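As a rough illustration of what "better logging" could look like, the sketch below records the last time each worker reported progress and flags workers that have gone quiet. It is written in Python for illustration only and is not how Tincr is implemented; the class name `ProgressWatchdog`, the logger name, and the assumption that workers can report a percentage at all are hypothetical.

```python
import logging
import time
from typing import Dict, List, Optional

log = logging.getLogger("xdlrc_watchdog")  # hypothetical logger name


class ProgressWatchdog:
    """Remembers when each worker last reported progress."""

    def __init__(self, stall_after_s: float) -> None:
        self.stall_after_s = stall_after_s
        self._last_seen: Dict[str, float] = {}

    def report(self, worker: str, percent: float, now: Optional[float] = None) -> None:
        """Record a progress report; `now` can be injected for testing."""
        now = time.monotonic() if now is None else now
        self._last_seen[worker] = now
        log.info("worker=%s progress=%.1f%%", worker, percent)

    def stalled(self, now: Optional[float] = None) -> List[str]:
        """Return workers silent for longer than the stall threshold."""
        now = time.monotonic() if now is None else now
        quiet = sorted(
            w for w, t in self._last_seen.items() if now - t > self.stall_after_s
        )
        for w in quiet:
            log.warning("worker=%s silent for %.0fs", w, now - self._last_seen[w])
        return quiet
```

A log like this would at least show which worker stopped reporting and at what percentage, which is the information missing from the hang reports so far.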

Collaborator

bradselw commented May 9, 2017

@ttown523 are you sure you have enough free space on your machine before running the command? Ideally, the drive you are running the command on and saving the output to should have free space greater than twice the size of the file you are generating. I am looking into adding disk-space checks to write_xdlrc so that it can detect when it is running out of space and print a helpful message.
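The rule of thumb above (free space greater than twice the output size) can be sketched as a pre-flight check. This is a Python illustration, not the planned write_xdlrc change: the function names and the idea of passing in an expected output size are assumptions, and only `shutil.disk_usage` is a real standard-library call.

```python
import shutil


def enough_space(free_bytes: int, expected_bytes: int, safety_factor: float = 2.0) -> bool:
    """True if free space exceeds safety_factor times the expected output size."""
    return free_bytes > expected_bytes * safety_factor


def check_output_drive(path: str, expected_bytes: int, safety_factor: float = 2.0) -> bool:
    """Check the drive containing `path` and print a message if space is short."""
    free = shutil.disk_usage(path).free
    ok = enough_space(free, expected_bytes, safety_factor)
    if not ok:
        needed = expected_bytes * safety_factor
        print(
            f"Not enough free space on the drive containing {path!r}: "
            f"{free} bytes free, more than {needed:.0f} bytes recommended."
        )
    return ok
```

For an output of roughly 13 GB, as reported later in this thread, the default factor of 2.0 would require more than 26 GB free before starting.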

Collaborator Author

ttown523 commented May 9, 2017

Yes. I am installing the XDLRC files on my C: drive, which has plenty of space for the UltraScale XDLRC (which was around 13 GB).

image

Member

DallonTG commented Apr 2, 2018

Just wanted to comment on something I noticed a few weeks ago.

I am seeing this bug frequently on Windows 10. I also saw it occur once when I was only using a single process. Interestingly, I used to be running Ubuntu on this same machine. I don't think I ever saw this error occur on Ubuntu - if I did, it was very rare.

So, there is a possibility that this bug is OS-dependent.
