Guidance and best practices for uploading

 

What if I am uploading a lot of files?

Your project may have thousands of existing files of diverse types. If you are planning to move your research into the workspace, understanding how the upload process works will help you plan the move effectively. For example, keep the following points in mind:

  • We recommend that you never try to upload more than 500 files at a time, just as you probably would not when copying files between your desktop and any other shared file server.
  • If you have a virtual desktop add-on to your workspace, you can use .zip files to batch uploads into manageable chunks. You can unpack the archives later in the workspace, which makes the upload process easier.
  • The SFTP upload route is best for large files, as you can choose the Datafiles destination, which is designed to hold such files.
  • File and data uploads via the web UI have an upload limit of 1GB.
  • Individual files (including .zip files) that are larger than 250GB should not be uploaded into the workspace using the methods described. If you have files that are over 250GB, please get in touch with your Data Steward Team who will be able to help plan the data migration.
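
The limits above can be checked before you start an upload. The following is a minimal local sketch, not part of the platform: the function names (`chunk`, `upload_route`, `zip_in_batches`) and the archive naming are hypothetical, and it assumes the limits are decimal gigabytes and that a file of exactly 1GB is still accepted by the web UI.

```python
from pathlib import Path
import zipfile

MAX_FILES_PER_BATCH = 500        # recommended ceiling from the guidance above
WEB_UI_LIMIT = 1 * 1000**3       # 1GB web UI limit (decimal GB is an assumption)
MAX_FILE_SIZE = 250 * 1000**3    # 250GB ceiling for self-service uploads


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def upload_route(size_bytes):
    """Suggest an upload route for a single file based on its size."""
    if size_bytes > MAX_FILE_SIZE:
        return "contact Data Steward Team"
    if size_bytes > WEB_UI_LIMIT:
        return "SFTP"
    return "web UI or SFTP"


def zip_in_batches(source_dir, out_dir, batch_size=MAX_FILES_PER_BATCH):
    """Pack every file under `source_dir` into .zip archives of at most
    `batch_size` files each and return the archive paths."""
    # The file list is built before any archive is written, so archives
    # created under `out_dir` are never picked up as input.
    files = sorted(p for p in Path(source_dir).rglob("*") if p.is_file())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archives = []
    for n, batch in enumerate(chunk(files, batch_size), start=1):
        archive = out / f"batch_{n:03d}.zip"  # hypothetical naming scheme
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in batch:
                # Store paths relative to the source folder inside the archive.
                zf.write(path, path.relative_to(source_dir))
        archives.append(archive)
    return archives
```

After creating the archives, you could call `upload_route(archive.stat().st_size)` on each one to confirm that it still fits the route you intend to use.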
 

What happens to my data?

If you choose to upload your data or files into the Scripts, Documents or Datafiles folders, they are not modified when they are stored on the workspace file system.

If you choose to upload structured data (for example, a CSV file) into the Data folder, the data contained in the file is loaded into a database table. The CSV columns are mapped to columns in the new data table automatically. It is also possible to provide a table definition file that tells the platform exactly how the data should be loaded. More information on how table definition files work can be found in mapping structured CSV file data to a new data table.

To ensure the mapping is exactly what you want, you can either:

  • Provide a table definition file along with your CSV file if you are using SFTP.
  • Use the upload tool provided within the web interface to help you upload your data.

In both cases you can, for example, rename columns and specify the destination table name. You are in control of this process.
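
If you would rather fix column names before the file leaves your machine, you can also rewrite the CSV header locally. The sketch below is an illustration only: it does not use the platform's table definition format (which is described in the separate article mentioned above), and the function name `rename_csv_columns` and the example column names are hypothetical.

```python
import csv
import io


def rename_csv_columns(src, dst, renames):
    """Copy CSV rows from `src` to `dst`, renaming header columns.

    `renames` maps old column names to new ones; columns that are not
    listed keep their original name. `src` must contain a header row.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)  # first row is assumed to be the header
    writer.writerow([renames.get(name, name) for name in header])
    writer.writerows(reader)  # data rows are copied unchanged


# Example with in-memory text; real files would be opened with newline="".
source = io.StringIO("pt id,dob\n1,2000-01-01\n")
target = io.StringIO()
rename_csv_columns(source, target, {"pt id": "patient_id", "dob": "date_of_birth"})
```

Only the header row changes, so the data rows reach the workspace exactly as they were on your machine.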

 

Guidance

See the table below for a summary of different types of source data, guidance on where to store this data, and how you can access the data once it has been uploaded into the workspace.

| Source data to be uploaded | File extensions | Typical size per file | Purpose | Workspace folder | Data mapping applied? | Stored in | Accessed from web interface | Accessed from virtual desktop |
|---|---|---|---|---|---|---|---|---|
| Tabular data | .csv | 1000s of rows and columns, < 10MB | Database analysis | Data | Yes | Workspace database | Yes | Yes |
| Analysis scripts | .r, .sql | 100 – 500kB | Reproducible statistics | Scripts | No | Workspace file system | Yes | Yes |
| Text, pdf documents and small images | .txt, .doc, .pdf, .png, .jpg | 2MB | Project communication and reports | Documents | No | Workspace file system | Yes | Yes |
| Large image files, image series, genomic data, executable files for tool installation, other non-structured data | .png, .jpg, .vcf, .exe | 100MB – 250GB | Raw data for analysis and information extraction | Datafiles | No | Workspace file system | No | Yes |
| Previously created workspace snapshot | .zip | | Recreating a (standard) workspace | Snapshots | No | Workspace file system | No | Yes |
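
The folder guidance in the table can be approximated with a small local helper when sorting a large collection of files before upload. This is a rough sketch under stated assumptions: the function name `suggest_folder` is hypothetical, the 100MB cut-off between Documents and Datafiles is taken from the lower end of the Datafiles size range rather than from any platform rule, and .zip files are sent to Datafiles because a workspace snapshot cannot be recognised from its extension alone.

```python
from pathlib import PurePath

# Assumed cut-off: the table lists 100MB as the lower end of typical
# Datafiles content; the platform itself may not enforce this boundary.
SMALL_FILE_LIMIT = 100 * 1000**2

DOCUMENT_EXTENSIONS = {".txt", ".doc", ".pdf", ".png", ".jpg"}
SCRIPT_EXTENSIONS = {".r", ".sql"}


def suggest_folder(filename, size_bytes):
    """Suggest a workspace folder from a file's extension and size."""
    ext = PurePath(filename).suffix.lower()
    if ext == ".csv":
        return "Data"        # loaded into the workspace database
    if ext in SCRIPT_EXTENSIONS:
        return "Scripts"
    if ext in DOCUMENT_EXTENSIONS and size_bytes < SMALL_FILE_LIMIT:
        return "Documents"
    # Large images, genomic data, executables and anything unrecognised.
    return "Datafiles"
```

For example, `suggest_folder("scan.png", 2_000_000)` returns `"Documents"`, while the same file name with a size of 500MB returns `"Datafiles"`.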
Updated on August 6, 2018
