Preparing environments for execution
Preparing a DataLad-enabled environment
On CHUV's cluster
When processing on HPC is planned, DataLad will be required on those systems.
- Start an interactive session on the HPC cluster.

  Do not run the installation of Conda and DataLad on the login node

  HPC systems typically recommend using their login nodes only for tasks related to job submission, data management, and the preparation of job scripts. Running resource-intensive tasks, such as fMRIPrep or container builds, on a login node can therefore degrade the performance and responsiveness of the system for all users. Interactive sessions are a great alternative when available, and should be used when creating the DataLad dataset. For example, on systems operating SLURM, the following command would open a new interactive session:
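On a SLURM system, such a session could be requested as follows (the partition name and resource limits are assumptions; adapt them to your site's policies):

```shell
# Request an interactive shell on a compute node ("interactive" is an
# assumed partition name; adjust time/memory/CPUs to your needs)
salloc --partition interactive --time 02:00:00 --mem 8G --cpus-per-task 2
```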
- Install DataLad. Generally, the most convenient and user-sandboxed installation (i.e., one not requiring elevated permissions) is achieved with Conda, but other alternatives (such as Lmod) are equally valid.

  Using Conda:

  - Get and install Conda if it is not already deployed on the system:
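A user-space Miniconda installation (using the official installer URL; the target prefix is an assumption) might look like:

```shell
# Download the Miniconda installer and run it in batch mode into $HOME/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p "${HOME}/miniconda3"
# Make conda available in the current shell
source "${HOME}/miniconda3/bin/activate"
```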
  - Install DataLad:
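For example, from the conda-forge channel (which also provides the git-annex dependency):

```shell
# Install DataLad (pulls in git-annex as a dependency) from conda-forge
conda install -c conda-forge datalad
```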
  Using environment modules (Lmod):

  - Check the availability and dependencies of a specific Python version (here we check 3.8.2):
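With Lmod, availability can be queried with `module spider` (the module's exact name is a site-specific assumption):

```shell
# Show available versions of the Python module and their requirements
module spider python/3.8.2
```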
  - Load Python (note that `ml` below is shorthand for `module load`):
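For example (the module name is a site-specific assumption):

```shell
# "ml" is Lmod's shorthand for "module load"
ml python/3.8.2
```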
  - Update pip:
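The `--user` flag keeps the installation within your home directory, so no elevated permissions are needed:

```shell
# Upgrade pip in the user's scope
python -m pip install --user --upgrade pip
```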
  - Install DataLad:
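Again within the user's scope:

```shell
# Install DataLad into the user's home directory
python -m pip install --user datalad
```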
- Check that DataLad is properly installed; for instance:
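For example:

```shell
# Print the installed version; "datalad wtf" produces a fuller report
datalad --version
```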
DataLad crashes (Conda installations)

DataLad may fail with the following error:

ImportError: cannot import name 'getargspec' from 'inspect' (/home/users/cprovins/miniconda3/lib/python3.11/inspect.py)

In such a scenario, create a Conda environment with a lower version of Python and re-install DataLad:
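A possible fix (the environment name and Python version below are illustrative):

```shell
# Create an environment pinned to an older Python and install DataLad there
conda create -n datalad python=3.10 -c conda-forge datalad
conda activate datalad
```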
- Configure your Git identity settings.
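These settings label the commits that DataLad creates on your behalf (replace the placeholders with your details):

```shell
# Identity recorded in every commit
git config --global user.name "Your Name"
git config --global user.email "you@example.org"
```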
On UNIL's Curnagl
Do not run the installation on the login node

As on CHUV's cluster, login nodes should be reserved for job submission, data management, and the preparation of job scripts; run the installation from an interactive session instead.
- Install Micromamba following Curnagl's instructions:
- Add the following two lines to your `~/.bashrc` file:
- Instruct Micromamba to update your profile by issuing the following command:
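In recent Micromamba versions this is done with `micromamba shell init` (flag spelling varies slightly across versions; the root prefix below is an assumption):

```shell
# Write the initialization hook into ~/.bashrc
micromamba shell init -s bash -p "${HOME}/micromamba"
```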
- Log out and log back in.
- Create a new environment called `datamgt` with git-annex in it:
- Activate the environment:
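For example, pulling git-annex from conda-forge:

```shell
# Create the datamgt environment with git-annex, then activate it
micromamba create -n datamgt -c conda-forge git-annex
micromamba activate datamgt
```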
- Install DataLad and DataLad-next:
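With the `datamgt` environment active:

```shell
# Install DataLad and the DataLad-next extension
python -m pip install datalad datalad-next
```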
- Configure your Git identity settings.
Getting data
Installing the original HCPh dataset with DataLad
Wherever you want to process the data, you'll need to `datalad install` it before you can pull the data down with `datalad get`.
To access the metadata (e.g., the sidecar JSON files of the BIDS structure), you'll need access to the Git repository that corresponds to the data (https://github.com/<organization>/<repo_name>.git).
To fetch the dataset from the RIA store, you will need your SSH key to be added to the authorized keys at Curnagl.
Getting access to the RIA store

These steps must be completed just once before you can access the dataset's data:

- Create a secure SSH key on the system(s) on which you want to install the dataset.
- Send the SSH public key you just generated (e.g., `~/.ssh/id_ed25519.pub`) over email to Oscar at *@****.
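The key pair can be generated with `ssh-keygen` (accept the default file location and set a passphrase when prompted):

```shell
# Generate an Ed25519 key; the public half is written to ~/.ssh/id_ed25519.pub
ssh-keygen -t ed25519
```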
- Install the dataset:
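Assuming access has been granted, the installation might look like (the URL uses the placeholders introduced above):

```shell
datalad install https://github.com/<organization>/<repo_name>.git
```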
- Reconfigure the RIA store:

  ```shell
  micromamba run -n datamgt \
      git annex initremote --private --sameas=ria-storage \
      curnagl-storage type=external externaltype=ora encryption=none \
      url="ria+file://<path>"
  ```
  REQUIRED step

  When on Curnagl, you'll need to convert the `ria-storage` remote into a local `ria-store`, because you cannot SSH from Curnagl into itself.
- Get the dataset:

  Data MUST be fetched from a development node

  The NAS is not accessible from the compute nodes of Curnagl.
- Execute `datalad get` within a development node. Success is demonstrated by an output like:
- Fetch the data:
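A sketch of the fetch, assuming the dataset's root is the working directory (`-r` recurses into subdatasets, `-J` parallelizes downloads):

```shell
# Recursively fetch annexed file content with 8 parallel jobs
datalad get -r -J 8 .
```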
Installing derivatives
Derivatives are installed in a similar way:

- Install the dataset:
- Reconfigure the RIA store:
- Fetch the data:
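The three steps above can be sketched as follows (the repository URL and RIA path are placeholders/assumptions):

```shell
# 1. Install the derivatives dataset (hypothetical URL) and enter it
datalad install https://github.com/<organization>/<derivatives_repo>.git
cd <derivatives_repo>
# 2. Reconfigure the RIA store as a local remote (path is a placeholder)
git annex initremote --private --sameas=ria-storage \
    curnagl-storage type=external externaltype=ora encryption=none \
    url="ria+file://<path>"
# 3. Fetch the data
datalad get -r .
```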
Registering containers
We use `datalad containers-run` to execute software while keeping track of provenance. Before first use, containers must be added to the DataLad dataset as follows (example for MRIQC):
- Register the MRIQC container to the dataset:

  ```shell
  datalad containers-add \
      --call-fmt 'singularity exec --cleanenv -B {{${HOME}/tmp/}}:/tmp {img} {cmd}' \
      mriqc \
      --url docker://nipreps/mriqc:23.1.0
  ```
Insert relevant arguments into the `singularity` command line with `--call-fmt`

In the example above, we configure the container's call to automatically bind the temporary folder (the `-B` flag mounts a path of the host's filesystem into the container), where MRIQC stores its working directory by default. Please replace the path with one appropriate for your setting (i.e., laptop, cluster, etc.).

Pinning a particular version of MRIQC
If a different version of MRIQC should be executed, replace the Docker image's tag (`23.1.0`) with the appropriate version tag in the command line above.
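Once registered, the container can be executed with `datalad containers-run`, which records the invocation in the dataset's history; the paths and participant label below are illustrative:

```shell
# Run MRIQC on one participant; outputs and provenance are captured by DataLad
datalad containers-run \
    --container-name mriqc \
    -m "MRIQC for sub-001" \
    "mriqc . derivatives/mriqc participant --participant-label 001"
```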