Before data acquisition (storage preparation)

DataLad must be version 1.0 or later

This project maintains data under version control with DataLad¹. For instructions on how to set up DataLad on your computer, please refer to the official documentation. For high-performance computing (HPC) environments, we provide some specific guidelines.

Please read the DataLad Handbook, especially if you are new to this tool
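
The version requirement above can be checked before going further. The following is a minimal sketch, assuming a POSIX shell and a `sort` that supports `-V` (as GNU coreutils does); `version_at_least` is a hypothetical helper name, not part of DataLad:

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v datalad >/dev/null 2>&1; then
    # `datalad --version` prints a line such as "datalad 1.1.0"; keep the last field.
    installed=$(datalad --version | awk '{print $NF}')
    if version_at_least "$installed" "1.0"; then
        echo "DataLad $installed satisfies the requirement"
    else
        echo "DataLad $installed is older than 1.0" >&2
    fi
else
    echo "datalad was not found in PATH" >&2
fi
```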

Creating a DataLad dataset

Designate a host and folder where data will be centralized. In the context of this study, the primary copy of data will be downloaded into <hostname>, under the path /data/datasets/hcph-pilot-sourcedata for the piloting acquisitions and /data/datasets/hcph-sourcedata for the experimental data collection.
Install the bids DataLad procedure provided by this repository to facilitate the correct intake of data and metadata:
```
PYTHON_SITE_PACKAGES=$( python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])' )
ln -s <path>/code/datalad/cfg_bids.py ${PYTHON_SITE_PACKAGES}/datalad/resources/procedures/
```
DataLad's documentation does not recommend this approach

For safety, you may prefer to follow DataLad's recommendations and place the cfg_bids.py file in one of the suggested paths.
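
As far as we are aware from DataLad's documentation on procedures, one such path is the user-level directory set by the `datalad.locations.user-procedures` configuration, which defaults to `~/.config/datalad/procedures` on GNU/Linux. The sketch below assumes that default; `install_user_procedure` is a hypothetical helper name, and the source path must be absolute so the symbolic link resolves:

```shell
# Hypothetical helper: link a procedure script into a user-level procedures directory.
#   $1: absolute path to the procedure script (e.g., cfg_bids.py)
#   $2: destination directory (optional; the default below is an assumption)
install_user_procedure() {
    src=$1
    default_dir="${XDG_CONFIG_HOME:-$HOME/.config}/datalad/procedures"
    dest_dir="${2:-$default_dir}"
    if [ ! -f "$src" ]; then
        echo "procedure script not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    ln -sf "$src" "$dest_dir/$(basename "$src")"
}

# Usage (replace <path> with the location of your clone of this repository):
#   install_user_procedure "<path>/code/datalad/cfg_bids.py"
```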

Check that the new procedure is available (it is listed as cfg_bids and invoked as bids):

```
$ datalad run-procedure --discover
cfg_bids (/home/oesteban/.miniconda/lib/python3.9/site-packages/datalad/resources/procedures/cfg_bids.py) [python_script]
cfg_yoda (/home/oesteban/.miniconda/lib/python3.9/site-packages/datalad/resources/procedures/cfg_yoda.py) [python_script]
cfg_metadatatypes (/home/oesteban/.miniconda/lib/python3.9/site-packages/datalad/resources/procedures/cfg_metadatatypes.py) [python_script]
cfg_text2git (/home/oesteban/.miniconda/lib/python3.9/site-packages/datalad/resources/procedures/cfg_text2git.py) [python_script]
cfg_noannex (/home/oesteban/.miniconda/lib/python3.9/site-packages/datalad/resources/procedures/cfg_noannex.py) [python_script]
```
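
The same check can be scripted rather than read by eye. This is a minimal sketch, assuming the output format shown above (procedure name at the start of each line, followed by a space); `has_procedure` is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if the procedure named "$1" starts a line of the
# `datalad run-procedure --discover` output supplied on standard input.
has_procedure() {
    grep -q "^$1 "
}

# Only query DataLad when it is installed.
if command -v datalad >/dev/null 2>&1; then
    if datalad run-procedure --discover | has_procedure cfg_bids; then
        echo "cfg_bids is available"
    else
        echo "cfg_bids was not discovered" >&2
    fi
fi
```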

Learn more about the YODA principles (DataLad Handbook)

Create a DataLad dataset for the original dataset:

```
cd /data/datasets/
datalad create -c bids hcph-dataset
```

Configure a RIA store, to which large files will be pushed (and from which they will be pulled when installing the dataset on other computers):

Creating a RIA sibling to store large files

```
cd hcph-dataset
datalad create-sibling-ria -s ria-storage --alias hcph-dataset \
        --new-store-ok --storage-sibling=only \
        "ria+ssh://<username>@curnagl.dcsr.unil.ch:<absolute-path-of-store>"
```

Getting [ERROR ] 'SSHRemoteIO' ...

If you encounter:

```
[ERROR ] 'SSHRemoteIO' object has no attribute 'url2transport_path'
```

Set the following Git configuration (datalad/datalad-next#754):

```
git config --global --add datalad.extensions.load next
```

Configure a GitHub sibling, to host the Git history and the annex metadata:

Creating a GitHub sibling to store DataLad's infrastructure and dataset's metadata

```
datalad siblings add --dataset . --name github \
        --pushurl git@github.com:<organization>/<repo_name>.git \
        --url https://github.com/<organization>/<repo_name>.git \
        --publish-depends ria-storage
```
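
Because the github sibling is declared with `--publish-depends ria-storage`, a push to github should first push to the RIA store and only then update GitHub. The sketch below wraps `datalad push` under that assumption; `publish_dataset` and the `DATALAD` override variable are hypothetical conveniences for previewing the command, not part of DataLad:

```shell
# Hypothetical helper: push a dataset to the github sibling.
#   $1: path to the dataset (optional; defaults to the current directory)
# Setting DATALAD=echo prints the command instead of running it.
publish_dataset() {
    ${DATALAD:-datalad} push --dataset "${1:-.}" --to github
}

# Usage:
#   DATALAD=echo publish_dataset /data/datasets/hcph-dataset   # preview only
#   publish_dataset /data/datasets/hcph-dataset                # actual push
```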

Synchronizing your DataLad dataset

Once the dataset is installed, new sessions will be added as data collection goes on. When a new session is added upstream, your local copy of the dataset stays at the same point in history and therefore becomes out of date.

Pull the new changes into the Git history. DataLad first fetches from the Git remotes and then merges the changes for you (fast-forward only, with --how ff-only):

```
cd hcph-dataset/  # <--- cd into the dataset's path
datalad update -r --how ff-only
```

If you need the file contents, retrieve them as usual:

```
find sub-001/ses-pilot019 -name "*.nii.gz" | xargs datalad get -J 8
```

Adding data or metadata

Use datalad save indicating the paths you want to add, and include --to-git if the file contains only metadata (e.g., JSON files).

Adding data files (e.g., NIfTI and compressed TSV files):

```
find sub-001/ses-pilot019 -name "*.nii" -or -name "*.nii.gz" -or -name "*.tsv.gz" | \
    xargs datalad save -m "add(pilot019): new session data (NIfTI and compressed TSV)"
```

Adding metadata files:

```
find sub-001/ses-pilot019 -name "*.json" -or -name "*.tsv" -or -name "*.bvec" -or -name "*.bval" | \
    xargs datalad save --to-git -m "add(pilot019): new session metadata (JSON, TSV, bvec/bval)"
```
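
The split between data (annexed) and metadata (saved with --to-git) can be summarized by file extension. The following sketch takes its extension lists from the two find commands above; `classify_path` is a hypothetical helper name, and any other extension is reported as unknown rather than guessed:

```shell
# Hypothetical helper: print "data", "metadata", or "unknown" for a given path,
# following the extensions used in the find commands of this section.
classify_path() {
    case "$1" in
        *.nii|*.nii.gz|*.tsv.gz) echo "data" ;;          # annexed content
        *.json|*.tsv|*.bvec|*.bval) echo "metadata" ;;   # saved with --to-git
        *) echo "unknown" ;;
    esac
}

# Usage:
#   classify_path sub-001/ses-pilot019/func/example_bold.nii.gz
#   classify_path sub-001/ses-pilot019/func/example_bold.json
```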