As a Workspaces Ecosystem Edition user, you have three options when uploading data into your workspace via the web wizard.
If you’re not sure which type of user you are, consider whether Workspaces is offered as the digital research environment for your institution; if this is the case, you are an Ecosystem Edition user and should follow the specific guidance below. If not, you should follow the Project Edition guidance.
Each of the Ecosystem Edition upload options follows the same basic outline detailed below, with a few important differences for organisations where a Data Staging Area is in use. The three options are:
- selecting files which have previously been de-identified
- selecting files which do not require de-identification
- selecting files which require de-identification through your organisation’s Data Staging Area
To upload tabular data formatted as a CSV file from the web interface, start by navigating to the workspace you want to upload data into.
Select files which have previously been de-identified or do not require de-identification
Once in the correct workspace, navigate to the ‘Add’ dropdown menu and select ‘Upload Data’. In the ‘Upload Data’ panel, browse to select the CSV file you wish to upload and click ‘Upload File’.
If your data has already been de-identified, you should select the statement indicating that your file either does not contain any identifiable data, or that it has already been de-identified.
Once selected, you also have the option to provide an accompanying table definition file (TDF) which describes the fields and data within the CSV file. This is not mandatory; however, if a TDF is not provided, a new one will be created based on the input gathered during the upload process. You can download it at the end of the upload process if you wish.
You can also provide an authorisation reference for the dataset if applicable. This might be the name of the project or study which this dataset has been approved for use in, or the name of the data owner who has provided consent for this data to be uploaded.
Describe your dataset
The next screen requires you to enter a title and gives you the option of adding a basic description for your dataset. You may also provide a web URL pointing to the location of the dataset in a web-available repository, or to a scientific journal article relating to the dataset. Again, this is optional.
Parse your data
The next screen allows you to configure how your data should be processed when it is uploaded into the workspace.
The following settings can be configured in this screen:
- Data table name: The name of the database table that your data will be loaded into.
- Delimiter: The character used to separate the columns in your CSV file.
- Include header row: Determines if the first row in your CSV file contains column headers.
- Text qualifier: The character used to surround text within each column in the CSV file.
- Null qualifier: This character in the CSV file will be replaced with a database Null value when it is loaded into the workspace database.
- Encoding: The character encoding set to use when processing the data. Several common options are provided, with the default set to UTF-8.
Clicking on a column header in the grid allows you to change the name of the column that will be created in the workspace database. Similarly, clicking on the data type directly below the column header allows you to change the type of the column that will be created.
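To see how these settings interact, the following minimal Python sketch parses CSV bytes using the standard `csv` module. It is an illustration only and not the Workspaces implementation: the function name `parse_csv`, its parameter names, and the generated `column_N` names are assumptions made for this example.

```python
import csv
import io


def parse_csv(raw_bytes, delimiter=",", has_header=True,
              text_qualifier='"', null_qualifier="NULL", encoding="utf-8"):
    """Parse CSV bytes into (header, rows) using the wizard-style settings.

    Hypothetical helper: names and behaviour are assumptions for illustration.
    """
    # Encoding: decode the raw bytes before any parsing takes place.
    text = raw_bytes.decode(encoding)
    # Delimiter and text qualifier map onto csv.reader's delimiter/quotechar.
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, quotechar=text_qualifier)
    rows = list(reader)
    if not rows:
        return [], []
    if has_header:
        # Include header row: the first row supplies the column names.
        header, data = rows[0], rows[1:]
    else:
        # Without a header row, generate placeholder column names.
        header = [f"column_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    # Null qualifier: matching values become None (a database Null).
    cleaned = [[None if value == null_qualifier else value for value in row]
               for row in data]
    return header, cleaned


sample = 'id;name;score\n1;"Smith; J";42\n2;NULL;17\n'.encode("utf-8")
header, rows = parse_csv(sample, delimiter=";", null_qualifier="NULL")
print(header)
print(rows)
```

Note that the text qualifier lets a value contain the delimiter itself (`"Smith; J"` stays a single value), which is why both settings need to match your file before the preview grid looks right.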
Describe your fields
You can change the label of each column by editing the text in the ‘Label’ field, and describe the data captured within that column by editing the text in the ‘Description’ field. Together these provide a metadata description of each field, which is useful should you wish to share the resulting TDF with another individual.
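As a rough illustration of the kind of per-field metadata gathered at this step, the Python sketch below collects a name, data type, label and description for each column and serialises them as JSON. The actual TDF file format is not described in this guide, so the structure, the `FieldDescription` class, the `build_table_definition` function and the example table and column names are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FieldDescription:
    """Metadata for one column (hypothetical layout, not the real TDF format)."""
    name: str              # column name in the workspace database
    data_type: str         # column type chosen on the 'Parse your data' screen
    label: str             # human-readable label from the 'Label' field
    description: str = ""  # optional text from the 'Description' field


def build_table_definition(table_name, fields):
    """Combine the table name and field metadata into one dictionary."""
    return {"table": table_name, "fields": [asdict(f) for f in fields]}


fields = [
    FieldDescription("visit_id", "integer", "Visit ID", "Unique visit number"),
    FieldDescription("visit_date", "date", "Visit date"),
]
definition = build_table_definition("clinic_visits", fields)
print(json.dumps(definition, indent=2))
```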
Once you have successfully completed these steps, you will be informed that your data is ready to be uploaded. Pressing ‘Upload’ will upload data from your CSV file to the workspace.
When the upload is complete, a note will appear in the Summary tab telling you whether the upload was successful. You can also select ‘Download TDF’ if you wish to share the table definition file generated during the upload process. The TDF is not automatically saved in your workspace, so you may wish to download a copy if you plan to use it in future.
Select files which require de-identification through your organisation’s Data Staging Area
Some organisations choose to enable a Data Staging Area which provides a controlled environment for staging data prior to selection and usage within a workspace. It is designed to manage sensitive healthcare data.
The process of uploading data from your organisation’s networks, or your local machine, via the Data Staging Area is much the same as the method detailed above. The important difference is that you must select the correct ‘Authorisation and Privacy’ option to ensure that your data is de-identified via the Data Staging Area before it is uploaded into your workspace.
Select the ‘This data needs to be de-identified before it is uploaded to my workspace’ option. This will ensure that you are offered options for configuring how each dataset field should be processed through the de-identification service.
Now follow the process described in the ‘Select files which have previously been de-identified or do not require de-identification’ section above, all the way through to the ‘Describe your dataset’ segment. At this point, in addition to completing the stage as detailed, you will be given the option to store a copy of your data in your organisation’s data catalogue. This means that the metadata of this particular data file becomes searchable and findable by other data catalogue users, either within specific departments, across your entire organisation, or by the general public, depending on the options selected when creating the data catalogue entry.
Next, follow the process detailed in the ‘Parse your data’ section above. Then, on reaching the ‘Describe your fields’ section, you can select the de-identification configuration options which should be applied prior to upload. Use the de-identify dropdown options to choose how to de-identify the values in each column.
The following options are available:
- Keep: Ensures that values will remain unchanged and will not be de-identified.
- Drop: Removes this column from the data table.
- Drop and store in keyfile: Removes this column from the data table, but stores the column value in the key file along with the pseudonymised identifier. This allows re-identification of this data field at a later date should this be required.
- Use as pseudonym: Uses the value in this column as the pseudonymised key for the column marked ‘pseudonymise’, rather than generating a new key value.
- Pseudonymise: Generates a pseudonymised identifier from the value in the column and stores the identifier in the key file.
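The effect of these options can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the de-identification service itself: the function `deidentify`, the action constants, the key file layout and the random-token pseudonyms are all hypothetical. It also assumes that a column marked ‘Use as pseudonym’ is removed from the output table because its value takes the place of the pseudonymised column; the real service may behave differently.

```python
import secrets

# Hypothetical action names mirroring the dropdown options.
KEEP = "keep"
DROP = "drop"
DROP_AND_STORE = "drop_and_store"
USE_AS_PSEUDONYM = "use_as_pseudonym"
PSEUDONYMISE = "pseudonymise"


def deidentify(rows, actions, new_key=lambda: secrets.token_hex(8)):
    """Apply per-column actions to rows (dicts); return (table, keyfile)."""
    pseudo_col = next((c for c, a in actions.items() if a == PSEUDONYMISE), None)
    supplied_col = next((c for c, a in actions.items() if a == USE_AS_PSEUDONYM), None)
    known = {}  # original value -> generated pseudonym, so repeats match
    table, keyfile = [], []
    for row in rows:
        pseudonym = None
        entry = {"pseudonym": None, "original": None, "stored": {}}
        if pseudo_col is not None:
            original = row[pseudo_col]
            if supplied_col is not None:
                # Use as pseudonym: take the key from the supplied column.
                pseudonym = row[supplied_col]
            else:
                # Pseudonymise: generate a new key the first time a value is seen.
                if original not in known:
                    known[original] = new_key()
                pseudonym = known[original]
            entry["pseudonym"], entry["original"] = pseudonym, original
        out = {}
        for col, value in row.items():
            action = actions.get(col, KEEP)
            if action == KEEP:
                out[col] = value
            elif action == PSEUDONYMISE:
                out[col] = pseudonym
            elif action == DROP_AND_STORE:
                entry["stored"][col] = value  # kept only in the key file
            # DROP and USE_AS_PSEUDONYM columns are left out of the table.
        table.append(out)
        if pseudo_col is not None or entry["stored"]:
            keyfile.append(entry)
    return table, keyfile


rows = [
    {"patient_id": "111", "name": "Ann", "age": "34", "notes": "x"},
    {"patient_id": "111", "name": "Ann", "age": "35", "notes": "y"},
]
actions = {"patient_id": PSEUDONYMISE, "name": DROP_AND_STORE,
           "age": KEEP, "notes": DROP}
table, keyfile = deidentify(rows, actions)
print(table)
print(keyfile)
```

In this sketch the key file is the only place where the original identifier and the dropped-and-stored values survive, which is what makes later re-identification possible.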
Once you have specified how you wish to de-identify each field, click on ‘Next’ to progress to the Review section.
You will then be informed that you have chosen to have your data de-identified via the Data Staging Area (where the de-identification service is typically deployed). At this point you may also download the table definition file if one was created during this process. Upon clicking ‘Upload’, the data will be added to your workspace and the process is complete.