Data Submission

Overview

The data submission portal enables laboratories to submit data to the Brain Image Library. The key steps are:

  1. Create an account.
  2. Log in to the data submission portal.
  3. Create a submission: A submission is a collection of image sets and their associated metadata.
  4. Upload image data: Because this is usually the slowest step, we suggest starting it before uploading the metadata for the submission.
  5. Upload metadata: Submission metadata can be uploaded through the portal in spreadsheet format.
  6. Validate data: Request validation of the data and correct any errors that are found.
  7. Data is made public: Once data is validated and metadata is curated, the submission is made public.

Definitions

The following terms have specific meanings in this document:

Dataset
At BIL, a dataset is a stand-alone entry of an image volume or image set that is associated with a single subject or experimental unit and has unique metadata. When a collection includes multiple subjects, a dataset usually corresponds to a single donor or subject; when many parts of the brain are imaged, it usually corresponds to a single part of the brain. Many datasets make up a collection. A dataset typically contains many 2D image files that are assembled into a larger two- or three-dimensional volume.

Submission
A submission contains one or more related datasets and their associated metadata. Submissions inherit project metadata (such as the NIH project, grant number, and laboratory name), so all datasets within a submission must belong to the same project. Smaller submissions are generally recommended, because no dataset in a submission is published until every dataset in that submission has passed validation. Each “level” of data should be uploaded as a separate submission (e.g., raw data and the same data aligned to a reference are two separate submissions).

Project
A way of grouping data to ensure that (1) people working with the data gain appropriate access and credit for their contributions, and (2) data is linked to the proper funding sources.

Publish
The act of making a submission that has passed all validation checks publicly available.

How to submit data

Once you have created an ACCESS portal account, requested access to the BIL submission portal, and set your initial password, you are ready to use the submission portal.

1. Enter the submission portal

Visit submit.brainimagelibrary.org and enter your PSC username and password.

2. Define your projects (PIs and Data Managers Only)

Only users with PI or Data Manager access have the functions described in this step. If you need access to these functions, have the PI send an email to bil-support@psc.edu with your BIL user ID and name to request access.

The PI dashboard allows PIs and Data Managers to define and manage projects. From this dashboard, the PI can add authorized users to a project and see the status of in-progress, unsuccessful, and successful submissions, regardless of which user uploaded and submitted the data.

Add/edit projects

To enter or edit project information, select the Manage Projects option. From this menu, you can create a new project and view the personnel and submissions associated with each project.

To define a project, select the Create a New Project button. Enter the project name, and enter the grant number associated with the project in the Funded By: field.

If the project is a BICCN project, please be sure to tag the project appropriately and make sure that the Project Name field contains the project name assigned by BCDC.

Add users to a project

To add a user (such as a data submitter) to a project, select the Manage Projects link from the main dashboard, then select View Personnel for that project. Select the user from the list and select Submit at the bottom of the page.

3. Create a submission

A project MUST be defined before you can create a BIL submission. See step 2 above for details.

A submission contains one or more related datasets and their associated metadata. Submissions inherit project metadata (such as the grant), so all datasets within a submission must belong to the same project.

Smaller submissions are recommended, because no dataset in a submission is published until every dataset in that submission has passed validation.

To create a submission, select the Submission sub-option of the New menu.

Next, enter the required metadata associated with the submission collection (see image below).

When you are done filling out the form, select the Save button. You will then be automatically taken to the metadata submission step.

When created, each BIL submission is assigned a unique 16-character identifier (e.g. 6247417d691a4548) and a unique Dropbox-like landing zone directory (e.g. /bil/lz/username/6247417d691a4548). The datasets belonging to the submission must be transferred to this landing zone directory for validation and ingestion processing.
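
If you script your uploads, the landing zone path can be derived from your username and the submission identifier. The sketch below is a POSIX shell example that assumes (from the example identifiers in this document, not from a BIL specification) that identifiers are exactly 16 lowercase hexadecimal characters; the function names is_submission_id and bil_lz_path are hypothetical.

```shell
# Hypothetical helpers (not part of any BIL tool).
# Assumption: a submission ID is exactly 16 lowercase hexadecimal characters,
# inferred from the example IDs in this document.

# Succeeds (status 0) if $1 looks like a submission ID.
is_submission_id() {
  case $1 in
    *[!0123456789abcdef]* | '') return 1 ;;
  esac
  [ "${#1}" -eq 16 ]
}

# Prints /bil/lz/<username>/<submission-id>; fails on a malformed ID.
bil_lz_path() {
  if ! is_submission_id "$2"; then
    echo "not a 16-character hex submission ID: $2" >&2
    return 1
  fi
  printf '/bil/lz/%s/%s\n' "$1" "$2"
}
```

For example, `bil_lz_path testuser abcdef0123456789` prints /bil/lz/testuser/abcdef0123456789. The path shown for the submission in the portal remains the authoritative value.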

4. Upload Descriptive Metadata

After the submission is created, you will automatically be taken to the metadata submission page. You can also reach this page from the submission portal by selecting New and then New Metadata in the drop-down menu.

If you are NOT ready to upload metadata at this time, click Cancel to exit the upload metadata spreadsheet step. When you are ready to load your metadata, return to the portal and select New and then New Metadata in the drop-down menu.

At this point you can download the metadata spreadsheet from the submission portal. Next, select the metadata schema that fits your data. Choose carefully, as this option determines how your metadata is ingested.

Metadata Schema Options

  1. Each dataset was generated from a single sample and donor (e.g. whole brain imaging) on the same instrument. Multiple datasets can be accommodated in a single submission. The lines in the specimen tab of the metadata spreadsheet must match the lines in the dataset tab exactly (i.e. the specimen listed in specimen tab row 5 is the specimen for dataset row 5). Each dataset line should have a corresponding line in the image tab. There should be exactly one line in the instrument tab.

  2. The dataset was generated from multiple samples, specimens, or donors on the same instrument. Only a single dataset can be accommodated per submission. List multiple specimens in the metadata spreadsheet, but only a single dataset. There should be exactly one line in the image tab and exactly one line in the instrument tab.

  3. All datasets were generated from one specimen, but from unique regions of interest on that specimen. Multiple datasets can be accommodated in a single submission. List the specimen multiple times in the specimen tab, and identify the specific region of interest in the locations column. The lines in the specimen tab must match the lines in the dataset tab exactly (i.e. the specimen listed in specimen tab row 5 is the specimen for dataset row 5). Each dataset line should have a corresponding line in the image tab. There should be exactly one line in the instrument tab.

  4. All datasets were generated from one specimen, but with different experimental (machinery) parameters. Multiple datasets can be accommodated in a single submission. There should be exactly one entry in the specimen tab. The lines in the instrument tab of the metadata spreadsheet must match the lines in the dataset tab exactly (i.e. the instrument listed in instrument tab row 5 is the instrument for dataset row 5). Each dataset line should have a corresponding line in the image tab.
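
Before uploading, it can help to check that the row counts in the spreadsheet tabs agree with the schema option you chose. The sketch below encodes one reading of options 1 to 4 above; it is not a BIL tool, it assumes you have counted the data rows in each tab yourself, and check_schema_rows is a hypothetical name. The portal's own validation remains authoritative.

```shell
# Hypothetical pre-check (not a BIL tool); the rules are one reading of the
# four schema options and may not match the portal's validation exactly.
# Usage: check_schema_rows SCHEMA DATASETS SPECIMENS INSTRUMENTS IMAGES
# Prints each problem to stderr and returns non-zero if any rule is broken.
check_schema_rows() {
  local schema="$1" ds="$2" sp="$3" ins="$4" img="$5" bad=0
  case $schema in
    1|3)  # one specimen line per dataset line, a single instrument line
      [ "$sp" -eq "$ds" ] || { echo "specimen rows ($sp) != dataset rows ($ds)" >&2; bad=1; }
      [ "$img" -eq "$ds" ] || { echo "image rows ($img) != dataset rows ($ds)" >&2; bad=1; }
      [ "$ins" -eq 1 ] || { echo "instrument rows ($ins) != 1" >&2; bad=1; }
      ;;
    2)    # a single dataset built from one or more specimens
      [ "$ds" -eq 1 ] || { echo "dataset rows ($ds) != 1" >&2; bad=1; }
      [ "$sp" -ge 1 ] || { echo "no specimen rows" >&2; bad=1; }
      [ "$img" -eq 1 ] || { echo "image rows ($img) != 1" >&2; bad=1; }
      [ "$ins" -eq 1 ] || { echo "instrument rows ($ins) != 1" >&2; bad=1; }
      ;;
    4)    # one specimen, one instrument line per dataset line
      [ "$sp" -eq 1 ] || { echo "specimen rows ($sp) != 1" >&2; bad=1; }
      [ "$ins" -eq "$ds" ] || { echo "instrument rows ($ins) != dataset rows ($ds)" >&2; bad=1; }
      [ "$img" -eq "$ds" ] || { echo "image rows ($img) != dataset rows ($ds)" >&2; bad=1; }
      ;;
    *)
      echo "unknown schema option: $schema" >&2; bad=1
      ;;
  esac
  return "$bad"
}
```

For example, `check_schema_rows 1 5 5 1 5` succeeds for five whole-brain datasets, while `check_schema_rows 4 3 2 3 3` reports that schema option 4 expects a single specimen row.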

After a metadata spreadsheet is prepared, select the submission collection it will be associated with. Then select the Upload Metadata button to select your metadata spreadsheet and upload it. This button will not be available until a metadata schema is selected.

Note that when preparing the metadata spreadsheet, you must list a separate data directory for each dataset in the submission.

5. Upload image data

The submission portal creates a landing zone for you to transfer your image data to. To find it, select the Submissions sub-option of the View menu; the “Data Path” field shows the landing zone for the submission.

The ingestion process currently supports native TIFF formats; files are validated using the Bio-Formats tool. For more information on Bio-Formats, see:
https://docs.openmicroscopy.org/bio-formats/5.7.3/formats/index.html.
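
A quick local check before uploading can catch stray non-TIFF files (for example .DS_Store or notes files) in a dataset directory. The sketch below only looks at file extensions, which is an assumption of this example rather than how BIL validates; list_non_tiffs and check_tiffs are hypothetical names. To see how Bio-Formats reads an individual file, the Bio-Formats command-line tools include showinf.

```shell
# Hypothetical helpers. Only the file extension is checked; Bio-Formats
# validation on the BIL side may still reject a file with a .tif name.

# Prints every regular file under $1 whose name does not end in a TIFF extension.
list_non_tiffs() {
  find "$1" -type f ! -name '*.tif' ! -name '*.tiff' ! -name '*.TIF' ! -name '*.TIFF' -print
}

# Returns 0 if the dataset directory $1 contains only TIFF-named files.
check_tiffs() {
  local found
  found=$(list_non_tiffs "$1")
  [ -z "$found" ] && return 0
  printf '%s\n' "$found" | sed 's/^/not a TIFF: /' >&2
  return 1
}
```

Running `check_tiffs mouse1` before each upload lists any files to move out of the dataset directory first. Mixed-case extensions such as .Tif are also reported, so rename those or extend the patterns.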

Note that each dataset in the submission requires its own data subdirectory. For example, if an experiment contains 5 mouse datasets that you want to include in a single submission, create 5 subdirectories in the submission's landing zone, one for each mouse dataset:

/bil/lz/testuser/abcdef0123456789/mouse1
/bil/lz/testuser/abcdef0123456789/mouse2
/bil/lz/testuser/abcdef0123456789/mouse3
/bil/lz/testuser/abcdef0123456789/mouse4
/bil/lz/testuser/abcdef0123456789/mouse5
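
The subdirectories in this example can be created in one step, either in a local staging directory that you later upload or in the landing zone itself from the login node. The sketch below assumes the mouse1 to mouse5 naming from the example; make_dataset_dirs is a hypothetical helper.

```shell
# Hypothetical helper: creates <base>/<prefix>1 ... <base>/<prefix><count>.
# mkdir -p also creates <base> if needed and does not fail if a directory exists.
make_dataset_dirs() {
  local base="$1" prefix="$2" count="$3" i=1
  while [ "$i" -le "$count" ]; do
    mkdir -p -- "$base/$prefix$i" || return 1
    i=$((i + 1))
  done
}
```

For example, `make_dataset_dirs /bil/lz/testuser/abcdef0123456789 mouse 5`. In bash, `mkdir /bil/lz/testuser/abcdef0123456789/mouse{1..5}` does the same through brace expansion.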

The recommended practices to structure your data directories are further described here.

To tie metadata to an image dataset, each image dataset must be uploaded in a separate subdirectory.
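
Because metadata is tied to datasets by directory, a quick consistency check can confirm that each directory named in your metadata spreadsheet exists, is non-empty, and is not reused by another dataset. This is a sketch that assumes you pass the directory names copied from the spreadsheet; check_dataset_dirs is a hypothetical name, and BIL's validation remains authoritative.

```shell
# Hypothetical pre-check. Usage: check_dataset_dirs LANDING_ZONE DIR...
# Reports duplicate, missing, or empty dataset directories on stderr.
check_dataset_dirs() {
  local lz="$1" d dups status=0
  shift
  # A directory listed twice would tie two datasets to the same images.
  dups=$(printf '%s\n' "$@" | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "listed for more than one dataset: $dups" >&2
    status=1
  fi
  for d in "$@"; do
    if [ ! -d "$lz/$d" ]; then
      echo "missing directory: $lz/$d" >&2; status=1
    elif [ -z "$(ls -A -- "$lz/$d")" ]; then
      echo "empty directory: $lz/$d" >&2; status=1
    fi
  done
  return "$status"
}
```

For example, `check_dataset_dirs /bil/lz/testuser/abcdef0123456789 mouse1 mouse2 mouse3 mouse4 mouse5`.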

Due to its size, image data cannot be uploaded through the submission portal (submit.brainimagelibrary.org). It must be uploaded separately through the BIL data transfer nodes, which are available at the virtual host upload.brainimagelibrary.org. All users authorized to use the data submission portal are also authorized to use the data transfer nodes, with the same username and password on both systems.

Data Upload Methods

Several methods are supported for uploading files to the landing zone directory through the BIL data transfer nodes, including rsync, Globus, sftp, and scp.

rsync

Using rsync with upload.brainimagelibrary.org: The example below uploads all data in the directory mouse1, as user testuser, through the data transfer node upload.brainimagelibrary.org to the landing zone of submission abcdef0123456789:

$ rsync -lrtpDvP mouse1 testuser@upload.brainimagelibrary.org:/bil/lz/testuser/abcdef0123456789
sending incremental file list
mouse1/
mouse1/data1.tiff
     1356122 100%  126.20MB/s    0:00:00 (xfer#1, to-check=0/2)

sent 1356392 bytes  received 35 bytes  2712854.00 bytes/sec
total size is 1356122  speedup is 1.00
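
When a submission has many dataset directories, the same rsync command can be run once per directory. In the sketch below, upload_all and the DRY_RUN variable are hypothetical; the rsync options are the ones used in the example (-l copy symlinks as symlinks, -r recurse, -t keep modification times, -p keep permissions, -D keep device and special files, -v verbose, -P keep partially transferred files and show progress).

```shell
# Hypothetical helper: rsync every subdirectory of a local staging directory
# into the submission's landing zone. Set DRY_RUN=1 to print the commands only.
# Usage: upload_all STAGING_DIR USERNAME SUBMISSION_ID
upload_all() {
  local src="$1" user="$2" sub_id="$3" d
  local dest="$user@upload.brainimagelibrary.org:/bil/lz/$user/$sub_id/"
  for d in "$src"/*/; do
    [ -d "$d" ] || continue   # the glob matched nothing
    d=${d%/}                  # "mouse1", not "mouse1/", so rsync creates the subdirectory
    if [ -n "${DRY_RUN:-}" ]; then
      echo rsync -lrtpDvP "$d" "$dest"
    else
      rsync -lrtpDvP "$d" "$dest" || return 1
    fi
  done
}
```

For example, `DRY_RUN=1 upload_all ./staging testuser abcdef0123456789` prints one rsync command per subdirectory of ./staging; drop DRY_RUN to run them. Each rsync call opens its own SSH connection, so you may be prompted for your password once per directory. Because -P keeps partial files, re-running the same command after an interruption continues from what was already transferred.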

sftp

Using sftp with upload.brainimagelibrary.org: The example below logs into the data transfer node as testuser. The first cd command moves to the BIL landing zone for the submission. The mkdir command creates a subdirectory (called mouse1) in the landing zone, and the second cd command moves into that subdirectory. Finally, the put command uploads the data, in this case a single file (data1.tiff):

$ sftp testuser@upload.brainimagelibrary.org
The authenticity of host 'upload.brainimagelibrary.org (128.182.108.164)' can't be established.
ECDSA key fingerprint is 32:cf:46:44:3d:9c:8e:b2:1d:14:03:66:45:0b:11:29.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'upload.brainimagelibrary.org,128.182.108.164' (ECDSA) to the list of known hosts.
testuser@upload.brainimagelibrary.org's password:
Connected to upload.brainimagelibrary.org.
sftp> cd /bil/lz/testuser/abcdef0123456789
sftp> mkdir /bil/lz/testuser/abcdef0123456789/mouse1
sftp> cd /bil/lz/testuser/abcdef0123456789/mouse1

sftp> put data1.tiff
Uploading data1.tiff to /bil/lz/testuser/abcdef0123456789/mouse1/data1.tiff
data1.tiff                                      0%    0     0.0KB/s   --:-- ETA
data1.tiff                                    100% 1324KB   1.3MB/s   00:00

sftp> exit

Because rsync and Globus can resume interrupted transfers, they are recommended over sftp.

scp

Using scp with upload.brainimagelibrary.org: The example below uploads a single file (data1.tiff) as testuser into an existing mouse1 subdirectory of the landing zone:

$ scp data1.tiff testuser@upload.brainimagelibrary.org:/bil/lz/testuser/abcdef0123456789/mouse1/data1.tiff
testuser@upload.brainimagelibrary.org's password:
data1.tiff                                                                      0%    0     0.0KB/s   --:-- ETA
data1.tiff                                                                    100% 1324KB   1.3MB/s   00:00

Because rsync and Globus can resume interrupted transfers, they are recommended over scp.
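
To copy a whole dataset directory rather than a single file, scp's -r option copies the directory recursively and creates it inside the landing zone if it does not exist yet. scp_dataset and DRY_RUN in the sketch below are hypothetical names.

```shell
# Hypothetical helper: recursively copy one dataset directory into the landing zone.
# Set DRY_RUN=1 to print the command instead of running it.
# Usage: scp_dataset DATASET_DIR USERNAME SUBMISSION_ID
scp_dataset() {
  local dir="$1" user="$2" sub_id="$3"
  local dest="$user@upload.brainimagelibrary.org:/bil/lz/$user/$sub_id/"
  if [ -n "${DRY_RUN:-}" ]; then
    echo scp -r "$dir" "$dest"
  else
    scp -r "$dir" "$dest"
  fi
}
```

For example, `scp_dataset mouse1 testuser abcdef0123456789` creates /bil/lz/testuser/abcdef0123456789/mouse1 with the contents of the local mouse1 directory. An interrupted scp restarts from the beginning, which is why rsync or Globus is preferable for large datasets.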

rclone

For data stored on Google Drive, you can use rclone to transfer the data directly.
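
A minimal sketch of an rclone copy from Google Drive into a landing zone subdirectory follows. It assumes you have already run `rclone config` to create two remotes: a Google Drive remote (type drive) named gdrive, and an SFTP remote (type sftp) named bil that points at upload.brainimagelibrary.org with your BIL username. Both remote names, rclone_dataset, and DRY_RUN are hypothetical; use whatever names you chose. Data copied between two remotes passes through the machine running rclone.

```shell
# Hypothetical helper; assumes rclone remotes named "gdrive" (Google Drive)
# and "bil" (SFTP to upload.brainimagelibrary.org) already exist.
# Usage: rclone_dataset DRIVE_FOLDER USERNAME SUBMISSION_ID
rclone_dataset() {
  local drive_path="$1" user="$2" sub_id="$3" name
  name=$(basename "$drive_path")
  # rclone copy transfers the *contents* of the source folder,
  # so the target subdirectory is named explicitly.
  set -- copy "gdrive:$drive_path" "bil:/bil/lz/$user/$sub_id/$name" --progress
  if [ -n "${DRY_RUN:-}" ]; then
    echo rclone "$@"
  else
    rclone "$@"
  fi
}
```

For example, `rclone_dataset experiments/mouse1 testuser abcdef0123456789` copies the Drive folder experiments/mouse1 to /bil/lz/testuser/abcdef0123456789/mouse1.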

Globus

Globus file transfer web service (www.globus.org) can be used to schedule and execute data transfers reliably. Globus will attempt to resume interrupted file transfers and will send a notification via e-mail upon completion of data transfer tasks. Globus web service is free for non-commercial use.

Data uploads to the BIL landing zone (/bil/lz) via Globus are accomplished via authenticated access to the Globus endpoint “Brain Image Library Upload with CILogon Authentication”.

To use Globus, you can log in with an existing institutional identity, Google, ORCID, or a Globus ID as described at https://docs.globus.org/how-to/get-started/.

For data transfers to or from your local computer, a free Globus Connect Personal client application should be installed, as described at https://www.globus.org/globus-connect-personal.

One-time setup to authenticate via CILogon to BIL for Globus data uploads:

To upload data via Globus to the BIL landing zone (/bil/lz), you must first have a BIL account and then provide the BIL Helpdesk with your CILogon identity information, so that this identity can be mapped to your BIL account for authentication via Globus.

Please provide your CILogon Certificate Subject and ePPN information to the BIL Helpdesk (bil-support@psc.edu) as follows:

  1. Go to https://cilogon.org/.
  2. Select your institution from the ‘Select an Identity Provider’ list. If your institution is not listed but you have an ACCESS ID (formerly XSEDE username), select ACCESS ID as your CILogon Identity Provider.
  3. Click the ‘Log On’ button. You will be taken to your institutional login page.
  4. Log in with your username and password for your institution.
    • If your institution has an additional login requirement (e.g., Duo), authenticate to that as well.
  5. You will be returned to the CILogon webpage after successfully authenticating your institution’s credentials.
  6. Click on the ‘Certificate Information’ drop-down link to find the ‘Certificate Subject’. Select and copy the entire certificate subject string to include in your e-mail to bil-support@psc.edu.
  7. Click on the User Attributes drop-down link to find the ‘ePPN’. Select and copy the ePPN string (which typically looks like an e-mail address) to include in your e-mail to bil-support@psc.edu. If your CILogon ePPN string is blank, please let us know that, and also which CILogon Identity Provider you selected.
  8. Send an email to bil-support@psc.edu with your CILogon Certificate Subject and ePPN fields, asking that they be mapped to your BIL username for Globus GridFTP data transfers.

Your CILogon Certificate Subject and ePPN will be installed on BIL data transfer servers within one business day, after which you will be able to transfer files via Globus to and from BIL.
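
The e-mail requested in steps 6 to 8 above can be drafted with a small helper such as the hypothetical cilogon_request below. It only prints a message for you to paste into your mail client; the wording is illustrative, not a required format.

```shell
# Hypothetical helper: prints an e-mail draft; it does not send anything.
# Usage: cilogon_request BIL_USERNAME CERT_SUBJECT EPPN IDENTITY_PROVIDER
cilogon_request() {
  local bil_user="$1" subject="$2" eppn="$3" idp="$4"
  printf 'To: bil-support@psc.edu\n'
  printf 'Subject: CILogon mapping for BIL user %s\n\n' "$bil_user"
  printf 'Please map the following CILogon identity to BIL username %s\n' "$bil_user"
  printf 'for Globus GridFTP data transfers.\n\n'
  printf 'Certificate Subject: %s\n' "$subject"
  if [ -n "$eppn" ]; then
    printf 'ePPN: %s\n' "$eppn"
  else
    # A blank ePPN must be reported together with the Identity Provider used.
    printf 'ePPN: (blank in CILogon)\n'
    printf 'Identity Provider selected: %s\n' "$idp"
  fi
}
```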

Transferring files with Globus

The Globus file transfer service moves data between Globus Connect Servers (formerly GridFTP endpoints). A Globus Connect Server service may be on your desktop/laptop (Globus Connect Personal, See https://www.globus.org/globus-connect-personal) or installed on a host system connected to data storage at a remote site such as BIL at PSC. File transfers may be scheduled between Globus Connect Servers using the web-based Globus File Transfer service at https://app.globus.org/file-manager, or using a Python-based command-line interface (See https://docs.globus.org/cli/ for installation and examples).
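
For scripted transfers with the Globus CLI, a recursive transfer into a landing zone subdirectory might look like the sketch below. It assumes you have run `globus login`, that the source and BIL arguments are the endpoint (collection) UUIDs for your own endpoint and for “Brain Image Library Upload with CILogon Authentication” (which you can look up with `globus endpoint search`), and that the BIL endpoint exposes the landing zone at the same /bil/lz path used by rsync; confirm the path with `globus ls` first. globus_upload and DRY_RUN are hypothetical names.

```shell
# Hypothetical helper around the Globus CLI.
# Assumption: the BIL endpoint exposes the landing zone at /bil/lz/<user>/<id>.
# Usage: globus_upload SRC_EP SRC_PATH BIL_EP USERNAME SUBMISSION_ID
globus_upload() {
  local src_ep="$1" src_path="$2" bil_ep="$3" user="$4" sub_id="$5" name
  name=$(basename "$src_path")
  # Build the argument list in "$@" (POSIX sh has no arrays).
  set -- transfer "$src_ep:$src_path" "$bil_ep:/bil/lz/$user/$sub_id/$name" \
    --recursive --label "BIL $sub_id $name"
  if [ -n "${DRY_RUN:-}" ]; then
    echo globus "$@"
  else
    globus "$@"
  fi
}
```

`globus transfer` submits the task and returns a task ID; the transfer then runs on the Globus service, and `globus task wait` can block until it finishes. Globus also sends the completion e-mail mentioned above.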

6. Validate and Submit Publish Request

Once all data has been uploaded to the landing zone and the metadata has been submitted, request that the data be validated and made publicly available by selecting Submit Publish Request from the main menu of the submission portal. If you are requesting an embargo period, please send an email to bil-support@psc.edu with the submission ID and a brief note.

To submit your datasets for publication:

  1. Log in to the submission portal https://submit.brainimagelibrary.org/
  2. On the top navigation bar, select “Submit Publish Request”
  3. Select the submission you would like to publish from the list by selecting the check box on the left
  4. Select “Submit Validation Request”

If data validation fails, the data submitter will be notified by email and should address the validation issue(s) and re-submit the validation request. Both data and metadata must pass the validation checks; datasets that fail validation are considered incomplete and will not be made publicly available.

Additional Submission Information

Image File Formats

The ingestion process supports the native TIFF file format, and files are validated using the Bio-Formats tool. (For more information on Bio-Formats, see: https://docs.openmicroscopy.org/bio-formats/5.7.3/formats/index.html.)

Metadata Formats and Schema

Please see the Submission Portal for the most recent metadata specification.

The archive is transitioning to a new, more comprehensive metadata schema developed as part of the NIH BRAIN Initiative "BRAIN 3D MICROSCOPY STANDARDS PROJECT". For more information about the standards project, please see the Dory website. A draft spreadsheet implementing this standard, which will soon be required by the ingest portal, is available here. Collecting metadata on this more extensive form will also enable BIL to issue DOIs for datasets when data is submitted.

Sharing and computing on data prior to submission:

All data submitters are given access to a login node, login.brainimagelibrary.org. To connect to it, use a terminal program that implements SSH (such as xterm, the native Mac Terminal, or PuTTY) with the same username and password you use for the submission portal. The login node is suited to light, non-CPU-intensive work, such as editing or deleting a file that is already in the landing zone. It is not intended for heavy computation or visualization, but it is connected to the BIL Computational Cluster, which provides suitable resources for both. If you need to perform more intensive processing on your data, please see this computing and visualization page or contact the BIL Helpdesk for further assistance.
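
Before requesting validation, you might check from the login node that every dataset subdirectory in the landing zone has arrived. The sketch below uses only find and du; lz_summary is a hypothetical name, and it prints one tab-separated line per subdirectory.

```shell
# Hypothetical helper: one line per dataset subdirectory of a landing zone,
# formatted as <name> TAB files=<count> TAB size=<du -sh size>.
# Usage: lz_summary LANDING_ZONE_DIR
lz_summary() {
  local d n size
  for d in "$1"/*/; do
    [ -d "$d" ] || continue   # the glob matched nothing
    n=$(find "$d" -type f | wc -l | tr -d ' ')
    size=$(du -sh "$d" | cut -f1)
    printf '%s\tfiles=%s\tsize=%s\n' "$(basename "$d")" "$n" "$size"
  done
}
```

For example, `lz_summary /bil/lz/testuser/abcdef0123456789` lists each dataset directory with its file count and total size, which you can compare with your local copy.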

Assistance and Help

If you need assistance with anything, please contact the BIL Helpdesk. Please note that we are located in the Eastern Time zone.

Please note that BIL also provides networking support. If you are experiencing networking issues related to data transfer (including slow transfer speeds), contact the BIL Helpdesk. If network issues prevent data transfer, we may recommend that you send your datasets to us via an alternate path such as on our BrainBall portable device or on LTO tape.

Distribution License

Data contributed to the library will be redistributed to others under a Creative Commons Attribution-ShareAlike 4.0 International License. In addition, all data submitters expressly permit the Library data to be transferred to and maintained by another open data repository or by a government agency such as the US National Institutes of Health.

Release of entries

By default, data submitted to the library will be released as soon as possible after it has completed the submission process. Optionally, data submitters can select a limited embargo period for their submitted data, consistent with the data-sharing policies of the BRAIN Initiative and NIH, or of up to one year if the data was not NIH funded.

Assignment of DOI

BIL will soon issue permanent, citable DOIs for submitted datasets when the new metadata schema is implemented. BIL can also issue DOIs for groups of data so that they may be cited in a paper (See here for an example).

Changes to entries after submission

Changes may be made to submitted entries before release. Minor changes, such as updating metadata or adding citations, may be made after dataset release. Major revisions require the existing entry to be marked as obsolete and replaced by a new entry. The DOI pointing to the metadata for the obsolete entry may have a superseded-by field added to it to point users to the new entry. In some circumstances an entry in the library may be marked as withdrawn (for example, in cases of research misconduct); in those cases, the DOI for the withdrawn entry will have a withdrawn field added to it.

Use of Pre-publication Data

Unless the data submitter or the data submitter's project explicitly grants otherwise, the Library expects all users of the submitted data to grant the authors of the data the right to publish the first paper concerning the dataset within three years of deposit. If there is no publication associated with the dataset, please contact the listed authors for permission to publish. There is no restriction on post-publication data.