Curating Metadata
First, reach out to the Data Management team (fairdata@mit.edu) to set up a brief meeting for an overview of metadata curation. Below is a summary of what the team will cover in that meeting.
We structure metadata as a relational database. A very brief overview:
We organize sample metadata by provenance: the chain starts with where the sample came from (PAT: Patient, PAV: Patient Visit, and TIS: Tissue) and continues through the metadata describing the data files (DNA: DNA Library, D.SEQ: Sequencing Data File, A.GEX: Gene Expression Analysis File).
Each sample type (PAT, PAV, TIS, DNA, D.SEQ, A.GEX) has its own set of metadata attributes, and different labs are responsible for different sample types.
For example, the Mayo Clinic will handle the PAT, PAV, and TIS metadata, i.e. everything about the patients (PAT), what happened to them and when they came in for visits (PAV), and which tissues were extracted (TIS).
Since our system is not HIPAA/DSL4 compliant, the PAT/PAV metadata attributes are recorded as binary flags (0 if the value does not exist, 1 if it does), along with a contact email for someone at the Mayo Clinic who can provide the specific values.
Then the lab responsible for the sequencing (i.e. the lab that received the tissues from the Mayo Clinic) will handle the DNA, D.SEQ, and A.GEX sample types.
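The provenance structure described above can be sketched as a small relational example. This is only an illustration under stated assumptions, not the actual schema: the table name `sample`, the `parent_id` column, the sample IDs, and the attribute names `age` and `diagnosis` are all hypothetical, and the sketch assumes each sample type links directly to the one before it in the order PAT, PAV, TIS, DNA, D.SEQ, A.GEX.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding one hypothetical provenance chain."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sample (
            sample_id   TEXT PRIMARY KEY,
            sample_type TEXT NOT NULL,
            parent_id   TEXT REFERENCES sample(sample_id)
        )
        """
    )
    # Hypothetical IDs; each row points at the sample it was derived from.
    rows = [
        ("PAT-1", "PAT", None),
        ("PAV-1", "PAV", "PAT-1"),
        ("TIS-1", "TIS", "PAV-1"),
        ("DNA-1", "DNA", "TIS-1"),
        ("D.SEQ-1", "D.SEQ", "DNA-1"),
        ("A.GEX-1", "A.GEX", "D.SEQ-1"),
    ]
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)
    return conn


def trace_provenance(conn, sample_id):
    """Follow parent links from a sample back to its patient record.

    Returns the sample types ordered from the patient down to the given sample.
    """
    chain = []
    current = sample_id
    while current is not None:
        row = conn.execute(
            "SELECT sample_type, parent_id FROM sample WHERE sample_id = ?",
            (current,),
        ).fetchone()
        if row is None:
            raise KeyError(f"unknown sample: {current}")
        chain.append(row[0])
        current = row[1]
    return chain[::-1]


def to_presence_flags(record):
    """Replace each PAT/PAV value with 1 if it is recorded, 0 if it is missing."""
    return {key: 0 if value is None else 1 for key, value in record.items()}
```

Calling `trace_provenance(build_demo_db(), "A.GEX-1")` walks an analysis file back to its patient, and `to_presence_flags({"age": 54, "diagnosis": None})` shows how a protected value would be reduced to a flag before it is stored.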
An in-depth description of AssaySheets exists here. You only need to fill out the Samples page. The Instructions page includes references and notes on what each column contains.
The AssaySheet templates are available below: