-
If you are running with
Obviously those variables would be substituted with the real values in your environment.
You mention Kubernetes, so the one thing that comes to mind is: how are you invoking your script? Are you calling it from an entrypoint script in your container image, manually from a shell, via the
Whichever of those it is, I would make sure you are properly escaping the arguments because it's possible that
Note that the script honours the value of the
In general it would be useful to confirm exactly which version of Jena you are using and how you invoke the
-
Thanks for such a prompt reply @rvesse, you're right, I should have stated the version etc. The version I am using is
It's good to know it should honour the
-
I'm not sure what's in 3.4.0. There are important security upgrades and fixes.
More work has gone into the TDB2 loader. As well as an xloader-style loader, there is a multithreaded loader which is easier to use.
-
This is all great stuff, thanks, it gives me avenues to investigate. If I were to upgrade, we use both
If I were to upgrade, what would be the
-
Some further background. The commands all work fine when I use, say, 300 files, but fail when I use 1000 files, which means more files to work with and more data; the total will be closer to 2000+ files. It consistently fails during the sorting, so it can only be memory or disk space. If you disagree, though, please just say.
The actual error I see is:
/opt/apache-jena-3.4.0/bin/tdbloader2index: line 306: 109143 Killed sort $SORT_ARGS -u $KEYS < "$DATA" > $WORK
The logging I'm seeing is:
12:09:41 DEBUG JVM Arguments are -Xmx1200M
12:09:41 DEBUG Jena Classpath is /opt/apache-jena-3.4.0/lib/*
12:09:41 INFO Index Building Phase
12:09:41 DEBUG Sort Arguments: --buffer-size=50% --parallel=3
12:09:41 DEBUG Sort Temp Directory: /var/lib/wims-staging/
12:09:41 DEBUG Sort Temp Directory is on disk //fac25d2a92a4740cf8686bc.file.core.windows.net/pvc-69ec4bea-4244-4249-aa61-a27910cbc6a7 which has 70% free space (113054253056 bytes)
12:09:41 INFO Creating Index SPO
12:09:41 DEBUG Size of data to be sorted is 8902588356 bytes
12:09:41 DEBUG Sufficient free space on database drive //fac25d2a92a4740cf8686bc.file.core.windows.net/pvc-ade1ddd5-a53f-421b-92b2-cf81a2144b94 to attempt sorting data file /fuseki/databases/green/DS-DB//data-triples.tmp (8902588356 bytes required from 201771515904 bytes free)
12:09:41 DEBUG Should be sufficient free memory (14374588416 bytes) for sort to be fully in-memory
12:09:41 INFO Sort SPO
12:09:41 DEBUG Sorting /fuseki/databases/green/DS-DB//data-triples.tmp into work file /fuseki/databases/green/DS-DB//SPO-txt
Suggesting it can do it in memory. Since the error is in the
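A "Killed" message like the one above means the sort process received SIGKILL. Inside a container that is often the kernel's out-of-memory killer enforcing the pod's memory limit, although the log here does not confirm that. One thing worth comparing is the limit the container actually runs under against the free memory figure the script logged, since a percentage-based sort buffer may be sized from a different number than the pod limit. A minimal sketch, assuming the standard cgroup v1/v2 file locations; `cgroup_mem_limit` is a hypothetical helper, not part of Jena:

```shell
#!/bin/sh
# Print the memory limit (in bytes) that the current container runs under.
# Prints "unlimited" if no limit is set and "unknown" if neither cgroup
# layout is found. The optional argument overrides the cgroup root and
# exists mainly so the function can be exercised outside a container.
cgroup_mem_limit() {
  root=${1:-/sys/fs/cgroup}
  if [ -r "$root/memory.max" ]; then
    # cgroup v2 layout
    limit=$(cat "$root/memory.max")
    if [ "$limit" = "max" ]; then
      limit=unlimited
    fi
  elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
    # cgroup v1 layout
    limit=$(cat "$root/memory/memory.limit_in_bytes")
  else
    limit=unknown
  fi
  echo "$limit"
}
```

Running `cgroup_mem_limit` inside the pod and comparing the result with the 14374588416 bytes reported in the log would show whether the "fully in-memory" estimate leaves any headroom under the limit.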
-
That's approaching 14G. I think
-
Thanks guys, I did the first suggestion of setting it to 1G and it has worked, so maybe I just have to force disk-based sorting. I'll try a couple of times, though, just in case, and I will also try the options @afs added. Definitely heading in the right direction, thanks for all the help.
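For anyone wanting to try the same workaround in isolation, here is a rough sketch of a disk-based sort, assuming the 1G setting refers to GNU sort's `--buffer-size` option (the debug log earlier in the thread shows `--buffer-size=50%`). With a fixed buffer, sort spills sorted runs to the temporary directory rather than sizing its buffer from available memory. `sort_with_cap` is a hypothetical helper for illustration and the paths are placeholders:

```shell
#!/bin/sh
# Sort a file with a fixed-size in-memory buffer so that large inputs
# spill to temporary files in tmp_dir. -u drops duplicate lines.
# Arguments: input file, output file, temp directory, buffer size
# (the buffer size defaults to 1G).
sort_with_cap() {
  input=$1
  output=$2
  tmp_dir=$3
  buffer=${4:-1G}
  sort --buffer-size="$buffer" --temporary-directory="$tmp_dir" -u \
    < "$input" > "$output"
}
```

For example, `sort_with_cap data.tmp sorted.txt /var/lib/tmp 1G` would sort `data.tmp` using at most roughly 1G of buffer, provided `/var/lib/tmp` has enough free space for the intermediate files.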
-
Hi there,
I'm trying to run the tdbloader2 command but it consistently fails during sorting; more annoyingly, it seems to fail silently. I have got it to work with a small number of files, about 300, but when I went up to 1000 (the true total is 2048), it successfully passes the Load Data Phase, but during the Build Index Phase it quietly fails on SPO. I expect to see something like Completed index build SPO and then a move on to the other indexes, but what I actually see is nothing: it starts indexing, obviously fails and quietly moves on to the next steps in my script. I have tried to run this manually, but the same thing happens, it just quietly fails and exits. I have tried adding --debug and that didn't give me any indication as to the problem either, so I'm looking for some advice.
It is running inside Kubernetes with 85G free disk space and allowed up to 16G of memory. I have tried to set --sort-args=--temporary-directory=/var/lib/tmp which is a mounted disk with GBs of disk space. I have experienced this problem before on my local machine, where I was able to clear out disk space so that over 100GB was free, but I cannot do that on the Kubernetes pod. I'm restricted to what is available, which is why I'm trying to set the temporary-directory setting.
Any advice would be greatly appreciated, thanks in advance.
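On the "fails quietly and moves on" part: one way to make a failing step visible in a wrapper script is to check each command's exit status before continuing. The sketch below assumes the failing command propagates a non-zero status to the caller, which is not confirmed for this Jena version. A status above 128 conventionally means the process was terminated by a signal (137 corresponds to signal 9, SIGKILL). `run_step` is a hypothetical helper:

```shell
#!/bin/sh
# Run a command and report on stderr if it failed or was killed by a
# signal, then return the same status so the caller can stop the script.
run_step() {
  "$@"
  status=$?
  if [ "$status" -gt 128 ]; then
    echo "step '$1' was killed by signal $((status - 128))" >&2
  elif [ "$status" -ne 0 ]; then
    echo "step '$1' failed with exit status $status" >&2
  fi
  return "$status"
}
```

In a load script this could be used as `run_step tdbloader2 ... || exit 1`, with the real arguments in place of the dots, so later steps do not run after a failed load.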