The National Institutes of Health (NIH) Genomic Data Sharing Policy expects that genomic research data from NIH-supported studies involving human specimens as well as non-human and model organisms will be submitted to an NIH-designated data repository. The list below provides examples of relevant databases.
NIH Data Repositories, NIH-Funded Databases, and NIH Database Collaborations
Array Express: an NIH-funded database at the European Molecular Biology Laboratory-European Bioinformatics Institute that collects and disseminates microarray-based gene-expression data. Read more about Array Express.
DNA Data Bank of Japan (DDBJ): a data bank organized by the National Institute of Genetics in Japan that collects sequence data. As a member of the International Nucleotide Sequence Database Collaboration, DDBJ exchanges data with GenBank at the NIH National Center for Biotechnology Information and the European Nucleotide Archive at the European Molecular Biology Laboratory-European Bioinformatics Institute. Read more about DDBJ.
Database of Genotypes and Phenotypes (dbGaP): an NIH database at the National Center for Biotechnology Information originally designed to archive and distribute coded genotype, phenotype, exposure, and pedigree data from genome-wide association studies. dbGaP now accepts additional types of data such as copy number variants and large-scale sequencing. Read more about dbGaP.
Database of Short Genetic Variations (dbSNP): an NIH database at the National Center for Biotechnology Information that includes single nucleotide variations, microsatellites, and small-scale insertions and deletions. dbSNP provides population-specific frequency and genotype data, experimental conditions, molecular context, and mapping information for both neutral variations and clinical mutations. Read more about dbSNP.
Database of Genomic Structural Variation (dbVar): an NIH database at the National Center for Biotechnology Information for large-scale structural genomic variations--such as insertions, deletions, translocations, and inversions--and associated phenotype information. dbVar accepts germline and somatic human structural variant data as well as data from a diverse array of organisms, including agriculturally important plants and livestock. Read more about dbVar.
European Nucleotide Archive (ENA): a database at the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) that collects, maintains, and presents comprehensive sequencing information--including raw sequencing data, sequence assembly information, and functional annotation--as part of the permanent public scientific record. As a member of the International Nucleotide Sequence Database Collaboration, EMBL-EBI exchanges data with GenBank at the NIH National Center for Biotechnology Information and the DNA Data Bank of Japan. Read more about ENA.
FlyBase: an NIH-funded database for genetic and genomic information on the fruit fly Drosophila melanogaster and related fly species. It includes referenced sequence genomes, phenotypic and gene expression data, chromosome maps, and additional resources. Read more about FlyBase.
GenBank: an NIH genetic sequence database at the National Center for Biotechnology Information (NCBI) that provides an annotated collection of publicly available DNA sequences. As a member of the International Nucleotide Sequence Database Collaboration, NCBI exchanges GenBank data with the European Nucleotide Archive at the European Molecular Biology Laboratory-European Bioinformatics Institute and the DNA Data Bank of Japan. Read more about GenBank.
Gene Expression Omnibus (GEO): an NIH data repository that archives and distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data. Read more about GEO.
Influenza Research Database (IRD): an NIH-funded database that provides genomic and proteomic data for influenza viruses as well as surveillance data and phenotypic characteristics of viruses isolated from extracts. Read more about IRD.
Mouse Genome Informatics (MGI): an NIH-funded international database for the laboratory mouse Mus musculus that provides data on gene characterization, allelic variants, gene expression, mouse tumor biology, strain-specific phenotypes and genotypes, and mammalian orthology. Read more about MGI.
Rat Genome Database (RGD): an NIH-funded database that serves as a repository of genetic and genomic data from the laboratory rat Rattus norvegicus and also provides curation of mapped positions for quantitative trait loci, known mutations, and other phenotypic data. Read more about RGD.
Sequence Read Archive (SRA): NIH's primary archive of high-throughput sequencing data at the National Center for Biotechnology Information (NCBI). SRA stores raw sequencing data as well as alignment information in the form of read placements on a reference sequence. As a member of the International Nucleotide Sequence Database Collaboration, NCBI exchanges SRA data with the European Nucleotide Archive at the European Molecular Biology Laboratory-European Bioinformatics Institute and the DNA Data Bank of Japan. Read more about SRA.
WormBase: an NIH-funded international consortium that provides accurate, current, accessible information concerning the genetics, genomics, and biology of Caenorhabditis elegans and related nematodes. Read more about WormBase.
Xenbase: an NIH-funded database that serves as a biology and genomics resource for research on the African frog species Xenopus laevis and Xenopus tropicalis. Read more about Xenbase.
Zebrafish Information Network (ZFIN): an NIH-funded database that collects, curates, and disseminates genetic, genomic, phenotypic, and developmental data about the zebrafish Danio rerio. Data represented in ZFIN are derived from three primary sources: curation of zebrafish publications, individual research laboratories, and collaborations with bioinformatics organizations. Read more about ZFIN.
Data Repositories Established as NIH Trusted Partners
The National Institutes of Health (NIH) promotes data sharing as an essential element to facilitate the translation of research results into knowledge, products, and procedures to improve human health. To achieve this goal, NIH has created a central repository model for data storage and distribution through the database of Genotypes and Phenotypes (dbGaP). However, in light of the increasing volume and complexity of the data, which necessitate innovative solutions for storing and presenting the data, NIH is exploring new models for data management resources, including structured partnerships with external organizations or “trusted partners.”
A “trusted partner” is defined as a public or private, national or international organization that is able to meet core NIH standards for establishing data quality and data management service protocols. Currently, trusted partners can only be established through a contract between NIH and the trusted partner organization.
Standards for models outside of this scope, such as those using funding mechanisms other than contracts, will be considered at a later date.
NIH Established Trusted Partners
Cancer Genomics Hub (CGHub): CGHub stores, catalogs, and facilitates research using cancer genome sequences, alignments, and mutation information from The Cancer Genome Atlas (TCGA) consortium and related projects.
Bionimbus: Bionimbus is a collaboration between the Institute for Genomics and Systems Biology (IGSB) at the University of Chicago and the Open Science Data Cloud to develop open source technology for managing, analyzing, transporting, and sharing large NCI-funded cancer genomics datasets in a secure and compliant fashion.