Initial docs from existing issue
will-moore committed Mar 28, 2024
1 parent 6594001 commit 3289c53
Showing 2 changed files with 175 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.rst
@@ -13,6 +13,10 @@ Plugin to swap OMERO filesets with NGFF
Usage
=====

For the full workflow used to update IDR with NGFF data, see
docs.md.


To create sql containing required functions and run it:

::
171 changes: 171 additions & 0 deletions docs.md
@@ -0,0 +1,171 @@
Once a submission has been processed by BioStudies, it will become available at a URL like:
https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD815.html

We need to harvest the UUIDs from the links in the "Viewable Images" table. We can do this by pasting the following JavaScript code into the `Console` tab of the browser dev tools:

```
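// Uses jQuery ($), which the page is assumed to provide.
let csv = "";
$("#viewable tbody tr").each(function() {
  let $this = $(this);
  // Skip rows that have no link
  if ($("a", $this).length === 0) return;
  // href of the first link, minus ".html", e.g. S-BIAD815/<uuid>
  let uid = $("a:first", $this).attr("href").replace(".html", "");
  // Text of the 3rd column, minus ".zip", e.g. idr0051/<name>.ome.zarr
  let zarrname = $("td:nth-child(3)", $this).text().replace(".zip", "");
  csv += `${zarrname},${uid}\n`;
});
console.log(csv);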
```
Which will print something like this:
```
idr0051/180712_H2B_22ss_Courtney_p00_c00_reg_preview.klb.ome.zarr,S-BIAD815/51afff7c-eed4-44b4-95c7-1437d8807b97
idr0051/embryo_dmso_2_new_17-00-44_p00_c00_reg_preview.klb.ome.zarr,S-BIAD815/b2633930-86b0-489e-a845-d2a7afe6ff15
idr0051/180712_H2B_22ss_Courtney1_20180712-163837_p00_c00_preview.ome.zarr,S-BIAD815/c49efcfd-e767-4ae5-adbf-299cafd92120
idr0051/2018-06-28_21ss_DMSO_TF_20180628-185945_p00_c00_reg_preview.ome.zarr,S-BIAD815/e12a8e2a-4fce-4579-a78b-b0c4597c3ada
```

That CSV is a table of `filesetName.ome.zarr, UUID`. We need to add the Fileset IDs from IDR to that table, using the `idr-utils` scripts from https://github.com/IDR/idr-utils/pull/56.
That PR contains a file, `idr_filesets.csv`, which lists `Fileset ID, filesetName.ome.zarr` from IDR.
It also contains a script that takes the csv from above and adds the matching Fileset IDs (from `idr_filesets.csv`).

Check out the branch of that `idr-utils` PR. This can be done on a local machine.
Copy the csv generated by the JavaScript above and save it to a file like `idr-utils/scripts/ngff_filesets/idr0051.csv`. You will see some examples included in that PR.
Then run the script, passing in the IDR ID:

```
$ cd idr-utils/scripts/ngff_filesets
$ python parse_bia_uuids.py idr0051
```

This will update the csv file you just created, adding in the Fileset IDs to a new 3rd column.
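
To make the join concrete, here is a minimal Python sketch of the kind of lookup described above. It is not the actual `parse_bia_uuids.py` from the PR: the function name `add_fileset_ids`, the exact-match rule on the zarr name, and the Fileset IDs in the example are all assumptions for illustration.

```python
import csv
import io


def add_fileset_ids(bia_rows, idr_rows):
    """Append the Fileset ID as a 3rd column to each (zarr name, BIA path) row.

    bia_rows: rows of [zarr_name, bia_path] (the JavaScript output).
    idr_rows: rows of [fileset_id, zarr_name] (the layout of idr_filesets.csv).
    Assumption: names match exactly once surrounding whitespace is stripped.
    Rows without a match are returned unchanged so they are easy to spot.
    """
    ids_by_name = {
        row[1].strip(): row[0].strip() for row in idr_rows if len(row) >= 2
    }
    merged = []
    for row in bia_rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, bia_path = row[0], row[1]
        fsid = ids_by_name.get(name.strip())
        merged.append([name, bia_path] + ([fsid] if fsid is not None else []))
    return merged


# Made-up names and Fileset IDs, for illustration only
bia_csv = "idr0051/a.ome.zarr,S-BIAD815/uuid-a\nidr0051/b.ome.zarr,S-BIAD815/uuid-b\n"
idr_csv = "101,idr0051/a.ome.zarr\n102,idr0051/b.ome.zarr\n"
merged = add_fileset_ids(
    csv.reader(io.StringIO(bia_csv)), csv.reader(io.StringIO(idr_csv))
)
for row in merged:
    print(",".join(row))
```

Any row that still has only two columns after the real script has run points at a fileset name that was not found in `idr_filesets.csv`.
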

Now we want to use that data with `omero-mkngff`.
We need to do everything as the `omero-server` user since we'll want to be able to create symlinks from the ManagedRep.

E.g. working on `idr0138-pilot`...

```
$ sudo -u omero-server -s
```

Create a conda environment as the `omero-server` user, e.g. `mkngff`, and install `omero-py` and `omero-mkngff`:

```
conda create -n mkngff -c conda-forge -c ome omero-py bioformats2raw
conda activate mkngff
pip install 'omero-mkngff @ git+https://github.com/IDR/omero-mkngff@main'
```

Get the database password (and host) needed for psql, and set these as environment variables. Also set `$IDRID` so you can copy and paste the other commands below:

```
export IDRID=idr0012
export OMERODIR=/opt/omero/server/OMERO.server
omero config get | grep omero.db.host
export DBHOST=192.168.10.231
omero config get --show-password | grep omero.db.pass
export PGPASSWORD=[********]
```
Use psql to get the SECRET (last session ID). NB: pilot servers have only 1 process (as in this example). For other servers, change the inner `limit 1` to `limit 3` in this psql command:
```
psql -U omero -d idr -h $DBHOST -c "select uuid from (select * from session where node = 0 and owner = 0 and defaulteventtype = 'Sessions' order by id desc limit 1) x order by x.id asc limit 1;"
uuid
--------------------------------------
8add790d-7855-46f6-8239-c6a72937d572
(1 row)
export SECRET=8add790d-7855-46f6-8239-c6a72937d572
```

Copy the contents of the `idr0051.csv` table from above (which now contains the `Fileset ID` and `UUID`) into a csv file in the `omero-server` user's home dir:

```
$ cd
$ vi $IDRID.csv # paste in the csv contents from above
```

Now we can read that csv and create an sql file for each Fileset (named `FILESET_ID.sql`).
In the loop below, `biapath` is like `S-BIAD815/51afff7c-eed4-44b4-95c7-1437d8807b97` and `uuid` is like `51afff7c-eed4-44b4-95c7-1437d8807b97`.


The BIA s3 repository should be mounted under `/bia-integrator-data`:

```
sudo mkdir /bia-integrator-data && sudo /opt/goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other bia-integrator-data /bia-integrator-data
```

Check that e.g. `$ ls /bia-integrator-data/S-BIAD815/51afff7c-eed4-44b4-95c7-1437d8807b97/51afff7c-eed4-44b4-95c7-1437d8807b97.zarr` gives you `0 OME`.
The `omero mkngff` command below also creates the symlinks we need, from the ManagedRepository to the s3-mounted data (if they don't already exist).

```
# first output sql functions and login...
omero mkngff setup > setup.sql
omero login
$ mkdir -p $IDRID
$ for r in $(cat $IDRID.csv); do
biapath=$(echo $r | cut -d',' -f2)
uuid=$(echo $biapath | cut -d'/' -f2)
fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
omero mkngff sql $fsid --clientpath="https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/$biapath/$uuid.zarr" "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"
done
# IF YOU WANT TO EXECUTE SQL IMMEDIATELY... include $SECRET and create symlinks...
$ for r in $(cat $IDRID.csv); do
biapath=$(echo $r | cut -d',' -f2)
uuid=$(echo $biapath | cut -d'/' -f2)
fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret=$SECRET $fsid --clientpath="https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/$biapath/$uuid.zarr" "/bia-integrator-data/$biapath/$uuid.zarr" >> "$IDRID/$fsid.sql" --bfoptions
psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
done
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Found prefix demo_2/Blitz-0-Ice.ThreadPool.Server-14/2018-11/26 // 10-39-49.639 for fileset 604306
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-14/2018-11/26/10-39-49.639
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-14/2018-11/26/10-39-49.639_converted/bia-integrator-data/S-BIAD815/51afff7c-eed4-44b4-95c7-1437d8807b97
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-14/2018-11/26/10-39-49.639_converted/bia-integrator-data/S-BIAD815/51afff7c-eed4-44b4-95c7-1437d8807b97/51afff7c-eed4-44b4-95c7-1437d8807b97.zarr -> /bia-integrator-data/S-BIAD815/51afff7c-eed4-44b4-95c7-1437d8807b97/51afff7c-eed4-44b4-95c7-1437d8807b97.zarr
BEGIN
mkngff_fileset
----------------
5811532
(1 row)
COMMIT
...
```
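
The `cut` and `tr` steps in the loops above can be sketched in Python to check what each csv row turns into before running `omero mkngff`. This is only an illustration of the parsing: `parse_row` and `zarr_locations` are hypothetical helpers, not part of `omero-mkngff`, and the name and Fileset ID in the example row are made up.

```python
S3_BASE = "https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data"
MOUNT_BASE = "/bia-integrator-data"


def parse_row(line):
    """Split a 'name,biapath,fsid' csv line into (biapath, uuid, fsid)."""
    fields = line.strip().split(",")
    biapath = fields[1]                # like cut -d',' -f2
    uuid = biapath.split("/")[1]       # like cut -d'/' -f2
    fsid = "".join(fields[2].split())  # like tr -d '[:space:]'
    return biapath, uuid, fsid


def zarr_locations(biapath, uuid):
    """Return the (clientpath URL, mounted path) pair passed to omero mkngff."""
    clientpath = f"{S3_BASE}/{biapath}/{uuid}.zarr"
    mounted = f"{MOUNT_BASE}/{biapath}/{uuid}.zarr"
    return clientpath, mounted


# Hypothetical row: the name and the Fileset ID 12345 are made up
row = "idr0051/example.ome.zarr,S-BIAD815/51afff7c-eed4-44b4-95c7-1437d8807b97,12345 \n"
biapath, uuid, fsid = parse_row(row)
clientpath, mounted = zarr_locations(biapath, uuid)
print(fsid, clientpath, mounted)
```

Note that `for r in $(cat $IDRID.csv)` splits on any whitespace, so the shell loops assume that no field in the csv contains a space.
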

**Running sql on a different server (using saved sql)**

Zip the sql files and copy them to the other server.
Unzip them and update the SECRET in all sql files, getting the current `$SECRET` as above.
The replacement didn't work with `$SECRET` in the sed expression (probably because shell variables are not expanded inside single quotes), so just use the actual value.
`SECRETUUID` is the default placeholder if you didn't use the `--secret` option when creating the sql.

```
$ for i in $(ls); do sed -i 's/SECRETUUID/fc5d3566-eea0-412c-849e-daa6d3c6bfcc/g' $i; done
```
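
If you prefer to avoid shell quoting altogether, the same placeholder replacement can be sketched in Python. This is an illustrative alternative to the `sed` loop, not something shipped with `omero-mkngff`: `replace_secret` is a hypothetical helper, it assumes the files end in `.sql`, and the demo file content is made up.

```python
import tempfile
from pathlib import Path


def replace_secret(sql_dir, secret, placeholder="SECRETUUID"):
    """Replace the placeholder in every .sql file in sql_dir.

    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(sql_dir).glob("*.sql")):
        text = path.read_text()
        if placeholder in text:
            path.write_text(text.replace(placeholder, secret))
            changed.append(path.name)
    return changed


# Demo in a throwaway directory, so no real sql files are touched
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.sql"
    sample.write_text("-- made-up content containing SECRETUUID\n")
    changed = replace_secret(tmp, "fc5d3566-eea0-412c-849e-daa6d3c6bfcc")
    updated = sample.read_text()

print(changed)
```

Because the secret is passed as an ordinary argument, it can come from `os.environ["SECRET"]` without any quoting concerns.
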
We want to execute all sql, using the csv, and also to use `omero mkngff` to do just the symlink creation...
```
$ for r in $(cat $IDRID.csv); do
biapath=$(echo $r | cut -d',' -f2)
uuid=$(echo $biapath | cut -d'/' -f2)
fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
omero mkngff symlink /data/OMERO/ManagedRepository $fsid "/bia-integrator-data/$biapath/$uuid.zarr" --bfoptions
done
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/2017-03/07/16-50-40.721
Creating dir at /data/OMERO/ManagedRepository/demo_2/2017-03/07/16-50-40.721_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/2017-03/07/16-50-40.721_mkngff/e45c988b-945e-49d6-8c6a-7284a2b0525e.zarr -> /bia-integrator-data/S-BIAD848/e45c988b-945e-49d6-8c6a-7284a2b0525e/e45c988b-945e-49d6-8c6a-7284a2b0525e.zarr
```

Now we can try viewing the images in the webclient.
NB: it can take a while for the memo file to be regenerated. To check the timings, grep the Blitz log for a unique string from the fileset path (here `46.368_mkngff`):

```
grep -A 2 "saved memo" /opt/omero/server/OMERO.server/var/log/Blitz-0.log | grep -A 2 "46.368_mkngff"
2023-08-29 12:21:51,993 DEBUG [ loci.formats.Memoizer] (l.Server-4) saved memo file: /data/OMERO/BioFormatsCache/data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-2/2023-05/11/22-57-46.368_mkngff/HT20.ome.zarr/OME/.METADATA.ome.xml.bfmemo (3838714 bytes)
2023-08-29 12:21:51,993 DEBUG [ loci.formats.Memoizer] (l.Server-4) start[1693309192879] time[2519114] tag[loci.formats.Memoizer.setId]
2023-08-29 12:21:51,995 INFO [ ome.io.nio.PixelsService] (l.Server-4) Creating BfPixelBuffer: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-2/2023-05/11/22-57-46.368_mkngff/HT20.ome.zarr/OME/METADATA.ome.xml Series: 0
```
E.g. `2519114` ms is about 42 minutes.
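
As a quick way to do that conversion, the following Python sketch pulls the `time[...]` value out of a log line like the one above and converts it to minutes. It assumes, as stated above, that the value is in milliseconds; `memo_minutes` is a hypothetical helper name.

```python
import re


def memo_minutes(log_line):
    """Return the time[...] value of a Memoizer log line in minutes, or None."""
    match = re.search(r"time\[(\d+)\]", log_line)
    if match is None:
        return None
    return int(match.group(1)) / 60000  # milliseconds to minutes


line = (
    "2023-08-29 12:21:51,993 DEBUG [ loci.formats.Memoizer] (l.Server-4) "
    "start[1693309192879] time[2519114] tag[loci.formats.Memoizer.setId]"
)
minutes = memo_minutes(line)
print(f"{minutes:.1f} min")  # prints "42.0 min"
```
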
