Aggregate data (or information) - conclusions that summarize the analysis of data from many individuals, without reporting results from any one individual.  Also called summary data.
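As an illustration only, the sketch below shows how individual-level results might be reduced to aggregate data. The participant codes, genotype labels, and the function name `summarize_genotypes` are hypothetical and are not taken from any NIH system.

```python
from collections import Counter


def summarize_genotypes(individual_genotypes):
    """Return counts and fractions per genotype, dropping per-person rows.

    `individual_genotypes` maps a (hypothetical) participant code to a
    genotype label at a single marker.
    """
    counts = Counter(individual_genotypes.values())
    total = sum(counts.values())
    # Only group-level numbers are returned; no participant code survives.
    return {
        genotype: {"count": n, "fraction": n / total}
        for genotype, n in counts.items()
    }


# Hypothetical individual-level data for four participants.
individual = {"1A": "AA", "1B": "AG", "1C": "AG", "1D": "GG"}
summary = summarize_genotypes(individual)
print(summary)
```

The summary reports how many participants carry each genotype but gives no way to look up the result for a single participant.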

Association - comparison of the chance that people who have a particular genetic variation in their DNA have a particular characteristic (trait), symptom, or disease with the chance that people who do not have the particular genetic variation have the particular characteristic, symptom, or disease.
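The comparison described above can be sketched with a small calculation. This is a minimal illustration under assumed counts, not the statistical method used in any particular study; the function names and numbers are hypothetical, and real analyses also test whether a difference could be due to chance.

```python
def trait_chance(with_trait, total):
    """Fraction of a group that has the trait."""
    if total <= 0:
        raise ValueError("group must contain at least one person")
    return with_trait / total


def compare_trait_chance(carriers_with_trait, carriers_total,
                         noncarriers_with_trait, noncarriers_total):
    """Compare the chance of a trait in people with and without a variant.

    Returns (chance in carriers, chance in non-carriers, ratio of the two).
    """
    carrier_chance = trait_chance(carriers_with_trait, carriers_total)
    noncarrier_chance = trait_chance(noncarriers_with_trait, noncarriers_total)
    if noncarrier_chance == 0:
        ratio = float("inf")  # trait never seen in non-carriers
    else:
        ratio = carrier_chance / noncarrier_chance
    return carrier_chance, noncarrier_chance, ratio


# Hypothetical counts: 50 of 200 carriers and 25 of 200 non-carriers
# have the trait.
carrier, noncarrier, ratio = compare_trait_chance(50, 200, 25, 200)
print(carrier, noncarrier, ratio)
```

With these made-up counts, carriers have twice the chance of the trait that non-carriers have. A ratio near 1 would suggest no association.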

Chromosome - one of the threadlike "packages" of genes and other DNA in the nucleus of a cell. Humans have 23 pairs of chromosomes, 46 in all: 44 autosomes (or non-sex chromosomes) and two sex chromosomes. Each parent contributes one chromosome to each pair, so children get half of their chromosomes from their mothers and half from their fathers. See also the NHGRI Talking Glossary entry for "chromosome".

Coded data (or information) - data for which a code (for example, 1A, 1B, 1C, etc.) is assigned to each study participant to track the samples and data, in place of the person’s name or other specific information. This process removes information from the data that could be used readily to identify the person from whom the data were collected.
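A minimal sketch of the coding step follows. The field names (`name`, `genotype`), the code format, and the function `code_participants` are assumptions made for illustration; an actual study would remove more kinds of identifying information and would store the key under separate safeguards.

```python
def code_participants(records):
    """Replace each participant's name with a sequential code.

    Returns the coded records and a separate key linking code to name.
    The key is meant to stay with the submitting institution.
    """
    coded_records = []
    key = {}
    for index, record in enumerate(records, start=1):
        code = f"P{index:04d}"  # hypothetical code format
        key[code] = record["name"]
        # Copy every field except the name into the coded record.
        coded = {"code": code}
        coded.update({k: v for k, v in record.items() if k != "name"})
        coded_records.append(coded)
    return coded_records, key


# Hypothetical participants.
participants = [
    {"name": "Ann Example", "genotype": "AG"},
    {"name": "Bo Sample", "genotype": "GG"},
]
coded, key = code_participants(participants)
print(coded)
```

Only the coded records would be shared; without the key, the codes alone do not readily reveal who the participants are.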

Controlled Access - access to data that has been granted to researchers following review and approval by an NIH Data Access Committee for specific research uses.

DAC (Data Access Committee) - a committee of federal government employees charged with making decisions about requests for access to data deposited in the NIH Genome-wide Association Studies (GWAS) Data Repository, called dbGaP. DACs review requests for access to the data and determine: 1) whether access is intended for research uses that are consistent with any limitations or conditions on data use, as identified by the institution that submitted the data to NIH, and 2) whether the researchers who receive the data agree to the terms of the Data Use Certification.

Database - a library of datasets (or a repository) that have been collected and maintained in a format that allows researchers with approved access to retrieve particular datasets.  At NIH, the current repository for GWAS data is dbGaP (see below).

dbGaP (database of Genotype and Phenotype) - the central repository at the NIH that is designed to maintain datasets from genome-wide association studies, to present summaries of GWAS results in an organized and searchable web format, and to release coded GWAS data to approved recipients for specific research purposes through controlled access procedures.

DNA (deoxyribonucleic acid) - the chemical inside the nucleus of a cell that carries the genetic instructions (genes) for making living organisms. DNA is bundled into chromosomes within the cell. See also the NHGRI Talking Glossary and "Deoxyribonucleic Acid (DNA)".

Data Use Certification (DUC) - the agreement among a research organization, an investigator at that organization (or institution), and the NIH that specifies the terms and conditions under which access is provided to particular GWAS datasets in the dbGaP database.

Gene - the functional and physical unit of heredity passed from parent to offspring. Genes are pieces of DNA, and many genes contain the information for making a specific protein. See also the NHGRI Talking Glossary.

Genetic marker - a segment of DNA with a known physical location on a chromosome and a discernible inheritance pattern. A marker can be a gene, or it can be a section of DNA with no known function. DNA segments that lie near each other on a chromosome tend to be inherited together. Therefore, markers are often used as indirect ways of tracking the inheritance pattern of a gene that has not yet been identified, but whose approximate location is known. See also the NHGRI Talking Glossary.

Genetic variation (variant) - differences (or variants) in DNA sequences that are found by comparing the genomes of different individuals and can be used as genetic markers to track inheritance patterns in families. (See also "SNPs".)

Genome - all the DNA contained in an organism or a cell, which includes the chromosomes within the nucleus and the DNA in organelles called mitochondria. See also "A Guide to Your Genome" and the NHGRI Talking Glossary.

Genotype - all or part of the genetic make-up of an individual or group, including variation at a particular genetic marker or gene.

Genotyping - the process whereby the genotype(s) of an individual or many individuals is (are) determined from a DNA sample(s) in the laboratory.  Typically, DNA samples are obtained by drawing a small amount of blood or by collecting cheek cells.

GWAS (Genome-wide Association Studies) - research studies that involve scanning markers (genotypes) across the complete set of DNA, or genomes, of many people to find genetic variations associated with a particular disease. See also "Genome Wide Association Studies".

Human Genome Project - the international, collaborative research program that completed mapping all the genes of human beings. See also "All About the Human Genome Project".

Informed consent - a process whereby an individual decides whether to participate in a clinical trial or other research study, or undergo a genetic test, after being informed of the test’s purpose, study procedures, medical implications, and possible risks and benefits. As part of the informed consent procedure, individuals are told about their privacy rights and who will have access to their personal information. See also "Informed Consent".

IRB (Institutional Review Board) - a panel of research and ethics experts and members of the public brought together by a research institution to assess and approve or disapprove protocols for research studies involving human subjects. Protocols specify the design, procedures, and safeguards that researchers will follow in carrying out the proposed study to be sponsored by that institution. The functions of the IRBs are specified in regulation at 45 CFR 46.

Phenotype - the observable traits or characteristics of an individual such as hair color, weight, or the presence or absence of a disease. Phenotypic traits are not necessarily genetic. See also the NHGRI Talking Glossary.

SNPs (single nucleotide polymorphisms) - differences, or genetic variations, in DNA sequence that are found at particular chromosomal locations among individuals in the population. Although most of the differences in DNA sequence among people are of little consequence, some are associated with the prevalence of a characteristic (trait), disease, or symptom, either as a nearby marker or as a variation within a gene that causes the characteristic (trait), disease, or symptom. See also the NHGRI Talking Glossary.

Summary or aggregate data (or information) - grouped information that summarizes the analysis of data derived from many individuals in a study, without reporting results from any one individual and without identifying individuals.

Additional terms that are important to an understanding of your genome and the NIH genome-wide association study policy are defined in the NHGRI Talking Glossary.

* Unless otherwise noted, all of the definitions in this glossary are presented within the context of the NIH genome-wide association studies data sharing policy.