Before You Start
This page describes what you need to prepare before depositing data into MaveDB. Having these items ready before you begin will make the upload process faster and smoother.
Prerequisites
Before you can submit data to MaveDB, you must:
- Have an ORCID account. MaveDB uses ORCID for authentication. If you do not have an ORCID iD, you can register for free. See User Accounts for more details.
- Set an email address. MaveDB requires an email address to upload datasets. You can provide one on the Profile settings page after logging in. See User Accounts for instructions.
- Understand the record types. MaveDB organizes data into experiment sets, experiments, and score sets. Review the Key Concepts page to understand how these relate to each other and when to create multiples of each.
Danger
Do not submit patient data or anything that could be used to identify individuals to MaveDB.
Submission workflow overview
The typical workflow for depositing a study in MaveDB is:
graph LR
A["Create<br/>experiment"] --> B["Create<br/>score set"]
B --> C["Review &<br/>publish"]
style A fill:#4a90d9,color:#fff
style B fill:#4a90d9,color:#fff
style C fill:#7ab648,color:#fff
- Create an experiment -- Describe the MAVE assay, including library construction, functional assay, and sequencing strategy. This automatically creates an experiment set.
- Create a score set -- Define the score set's target and analysis methods, and upload your data, all in a single form submission. This includes:
    - Associating the score set with the experiment you created.
    - Specifying the target sequence or accession that was mutagenized.
    - Uploading your variant score CSV file (required) and count CSV file (recommended). See Data Formats for file specifications.
- Review and publish -- Review your submission for completeness, then publish to make it publicly accessible.
Note
When you first create records, they are assigned temporary accession numbers (beginning with tmp:) and are only visible to you and any contributors you add. You can continue editing until you are ready to publish.
Warning
Once a record is published, certain fields (including the target, scores, and counts) can no longer be edited. Make sure everything is correct before publishing. If you need to fix errors after publication, you can deprecate and replace the score set.
What you will need
Use the checklist below to gather everything before starting the upload process.
Required for all records
- Title -- A descriptive title for the experiment or score set.
- Short description -- One or two sentences summarizing the record.
- Abstract -- A longer description of the motivation and approach. Supports Markdown formatting.
- Methods -- A detailed description of the experimental or analytical methods.
See the Metadata Guide for formatting guidelines and recommended content for each field.
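As a rough pre-flight check, the four required fields above can be tracked in a small script before you open the upload form. This is only a sketch: the dictionary keys (`title`, `short_description`, `abstract`, `methods`) are hypothetical names chosen for illustration and are not taken from MaveDB's forms or API.

```python
# Hypothetical key names for the four required fields listed above.
REQUIRED_FIELDS = ("title", "short_description", "abstract", "methods")


def missing_required_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in `record`."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(record.get(name) or "").strip()
    ]


draft = {"title": "Example deep mutational scan", "abstract": "   "}
print(missing_required_fields(draft))
```

A blank or whitespace-only value is treated the same as a missing one, since an empty abstract or methods section would not describe the record.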
Required for score sets
- Score CSV file -- A CSV file containing variant scores. Must include a score column and at least one variant column (hgvs_nt or hgvs_pro). See Data Formats for detailed formatting requirements.
- Target information -- Either a target sequence (DNA or amino acid) or an external accession number (RefSeq or Ensembl). See Targets for guidance on choosing the right target type.
- License selection -- Choose from CC0, CC BY 4.0, or CC BY-SA 4.0. The default is CC0. See the Metadata Guide for details.
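The score CSV header requirement above (a score column plus at least one of hgvs_nt or hgvs_pro) can be checked locally before uploading. The following is a minimal sketch using only the Python standard library. The function name and the example row are illustrative assumptions, and it checks the header only; the full rules on the Data Formats page still apply.

```python
import csv
import io

# Variant columns named in the checklist; at least one must be present.
VARIANT_COLUMNS = ("hgvs_nt", "hgvs_pro")


def check_score_header(csv_text: str) -> list[str]:
    """Return a list of header problems (empty if none were found)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    columns = {name.strip() for name in header}

    problems = []
    if "score" not in columns:
        problems.append("missing required 'score' column")
    if not any(col in columns for col in VARIANT_COLUMNS):
        problems.append("missing a variant column (hgvs_nt or hgvs_pro)")
    return problems


# Illustrative content only, not real data.
example = "hgvs_pro,score\np.Ala2Gly,0.42\n"
print(check_score_header(example))
```

To check a file on disk, read it with `open(path, newline="").read()` and pass the text to `check_score_header`.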
Recommended for all records
- Publication identifiers -- DOIs, PubMed IDs, bioRxiv IDs, or medRxiv IDs for associated publications.
- Digital Object Identifiers (DOIs) -- For any non-publication digital resources (e.g., code repositories, external datasets).
- Contributor ORCID iDs -- ORCID identifiers for all contributors. Each contributor must have logged into MaveDB at least once. See Contributors for details on permissions.
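When collecting contributor ORCID iDs, it can help to catch typos before entering them. The sketch below assumes the standard ORCID iD layout (16 characters in four hyphen-separated groups, with a final check character computed using ISO 7064 MOD 11-2). It only confirms that an iD is well formed; it cannot tell you whether the iD is registered or whether the contributor has logged into MaveDB. The function names are illustrative.

```python
import re

# Four groups of four; the last character may be a digit or 'X'.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if `orcid` has the expected layout and a valid check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_well_formed_orcid("0000-0002-1825-0097"))
```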
Recommended for experiments
- Raw data accession numbers -- Accession numbers for raw sequencing data in public repositories such as the Sequence Read Archive (SRA) or ArrayExpress.
- Controlled keywords -- Keywords from the controlled vocabulary to improve searchability and support the generation of assay facts.
Recommended for score sets
- Count CSV file -- A CSV file containing variant count data. Count data supports the development of new statistical models for calculating variant effect scores. See Data Formats for details.
- Score calibrations -- Calibration data that maps functional scores to clinical evidence strength, enabling clinical interpretation of variants. See Score Calibrations.
- Column metadata files -- JSON files describing the columns in your score and count tables. See Data Formats for the expected format.
Optional for score sets
- Extra metadata -- A JSON object containing any additional structured metadata.
- Data usage policy -- Free-text terms describing any restrictions on data use (e.g., pre-publication data sharing agreements).
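Because the extra metadata field expects a JSON object, it may be worth confirming that your text parses and that its top level is an object rather than a list or a bare value. A minimal sketch with the standard library follows; the function name and the example keys are illustrative assumptions.

```python
import json


def parse_extra_metadata(text: str) -> dict:
    """Parse `text` as JSON and require that the top level is an object."""
    value = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(value, dict):
        raise ValueError(
            "extra metadata must be a JSON object, got " + type(value).__name__
        )
    return value


# Illustrative keys only; the content of the object is up to you.
print(parse_extra_metadata('{"replicates": 3, "notes": "pilot run"}'))
```

`json.JSONDecodeError` is a subclass of `ValueError`, so a single `except ValueError` covers both invalid JSON and a non-object top level.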
Next steps
Once you have gathered the required information:
- Review the Data Formats page to ensure your CSV files meet MaveDB's formatting requirements.
- Review the Metadata Guide for detailed descriptions of every metadata field.
- Follow the Upload Guide for step-by-step instructions on creating experiments and score sets.