Compliance Statistics for Policies that Govern Data Submission, Access, and Use of Genomic Data
Under the Genomic Data Sharing (GDS) Policy and its predecessor, the Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS), investigators and their institutions seeking to access data from dbGaP must agree to a Data Use Certification (DUC) Agreement, which describes the terms associated with data use. Failure to adhere to the terms of the DUC constitutes a compliance violation. As of July 1, 2015, NIH has approved 20,178 Data Access Requests and, of these Requests, has identified and managed 27 policy compliance violations. These violations represent 0.1% of the approved Data Access Requests and fall into four categories: data submission, research use or data access, data security, and the publication embargo period.
- Data Submission: These violations resulted from errors made by submitting investigators while the data were being prepared for submission to dbGaP.
- Research or Data Access: These violations occurred when dbGaP data were used in an inappropriate manner by approved users or when a dbGaP study configuration error caused the wrong data to be distributed to approved users.
- Data Security: These violations resulted from a computer software error or misconfiguration that allowed or had the potential to allow unapproved individuals to access dbGaP data.
- Publication Embargo: These violations have no impact on research participant protections. They occurred when dbGaP users, who were not part of the submitting investigator's team or collaborators, published or presented secondary research findings prior to the embargo date set by the original submitting investigator. Publication embargos apply only to datasets accessed under the GWAS Policy.
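The 0.1% figure quoted above can be checked with a short calculation. The sketch below is illustrative only: the variable and function names are hypothetical, and the per-category counts are tallied by hand from the rows of the summary table rather than stated in the text. It runs the calculation for both totals that appear on this page (20,178 in the opening paragraph and 20,718 in the table footnote); both round to 0.1%.

```python
# Figures quoted on this page (as of July 1, 2015).
violations = 27
approved_requests_text = 20_178      # count given in the opening paragraph
approved_requests_footnote = 20_718  # count given in the table footnote


def violation_rate_percent(violations: int, approved: int) -> float:
    """Return violations as a percentage of approved Data Access Requests."""
    return 100 * violations / approved


# Per-category counts, tallied by hand from the rows of the summary table
# (an assumption of this sketch, not a figure stated in the text).
by_category = {
    "Data Submission": 6,
    "Research or Data Access": 13,
    "Data Security": 3,
    "Publication Embargo": 5,
}

for approved in (approved_requests_text, approved_requests_footnote):
    rate = violation_rate_percent(violations, approved)
    print(f"{violations} / {approved:,} = {rate:.2f}% (rounds to {rate:.1f}%)")

print("Category total:", sum(by_category.values()))
```

The category tally should sum to the 27 cases named in the table title, which gives a quick consistency check on the table itself.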
As described in the DUC, investigators and their institutions agree to notify the appropriate DAC(s) of any violation of or departure from the terms of the DUC within 24 hours of when the incident is identified. When a violation is reported, the appropriate DAC Chair is responsible for corresponding with the users and institution involved in the incident to gain an understanding of the full scope of the violation, to take immediate action to protect dbGaP data, and to implement remediations as necessary.
Summary of Database of Genotypes and Phenotypes (dbGaP) Compliance Violations (27 Cases, 2007 to July 1, 2015 [1])
|Brief Summary||Policy Expectation(s)||Actions Taken and/or Preventative Measures Implemented|
|Data Submission Incidents|
|The submitter included data from a research participant who had withdrawn from the study.||Submission of data to the National Institutes of Health (NIH) and subsequent sharing for research purposes should be consistent with the wishes of the research participants from whom the data were obtained.||The mistake was discovered by the submitter. NIH requested, and confirmed, that the 49 approved users [2] who received the dataset destroyed the data and resulting analyses. NIH removed the participant's data from the datasets. Approved users were able to download the updated version of the dataset.|
|An investigator erroneously submitted data from research participants who did not consent to broad sharing of their data.||Submission of data to NIH and subsequent sharing for research purposes should be consistent with the informed consent of the study participants from whom the data were obtained.||NIH staff worked with the investigator to ensure that all affected data were removed from dbGaP. NIH requested, and confirmed, that approved users who downloaded the data destroyed their copy of the dataset and all resulting analyses. The submitting investigator reported the incident to the Institutional Review Board, and the institution took corrective actions.|
|An investigator submitted a dbGaP dataset without sufficiently random identifier codes for individual participants, and the dataset was released for access before the submitting investigators realized the error.||Data submitted to dbGaP should be de-identified and coded using a random, unique code.||NIH requested, and confirmed, destruction of the dataset by approved users who downloaded the dataset, and removed the dataset from dbGaP. As a preventive measure, the institution that submitted the dataset implemented an additional review in their system. The institution created the appropriate random codes for the dataset, and this version replaced the inappropriate version in dbGaP.|
|NIH discovered that it was possible to deduce human genome sequence from microbial sequence data posted in the open access Sequence Read Archive (SRA) database. The microbial sequence data were collected from human specimens, and the human DNA sequence data had not been thoroughly removed at the data cleaning stage. Thirty-five institutions downloaded at least one of the affected files.||Human DNA sequence data intended only for controlled-access distribution through dbGaP should not be available in an open-access database.||Affected files were removed from SRA, and were re-filtered with improved data cleaning software to remove detectable human sequence. Data that could be used to reconstruct any remaining human sequence were also removed. The re-filtered datasets were reposted.|
|An investigator who submitted data to NIH inadvertently reversed the data use limitations for two datasets. The dataset restricted to "disease-specific research use only" was labeled for "general research use," and the "general research use" dataset was labeled for "disease-specific research use only."||Submitting institutions certify that the appropriate research uses of the data are delineated. Research with the requested datasets must be consistent with the data use limitations as delineated by the submitting institution.||NIH blocked access to the study data until the error was corrected. The five approved users who had downloaded the data destroyed their copy of the dataset(s) and all resulting analyses, and they confirmed that the data had been used in accordance with the correct data use limitations. The submitting investigator implemented an additional mechanism for checking study data configuration prior to submission to NIH.|
|While uploading human sequence data to controlled access, the submitting investigator was able to upload the data to the open access Sequence Read Archive (SRA) database. An NIH software filter that would have prevented the upload was inactive at the time. The data were available in open access for one month; however, they were not accessed by SRA users.||Controlled-access data will be accessed only by approved users.||NIH removed the data from SRA and reactivated the SRA filter software. The European Bioinformatics Institute and DNA Database of Japan mirror SRA data and, upon NIH request, also removed these data from their databases.|
|Research or Data Access Incidents|
|Due to a dbGaP software configuration error, a user approved to download one of two datasets from a study was able to download both datasets from the study. The user did not have approval to access the second dataset and should not have downloaded it.||Controlled-access data will be accessed only by approved users.||NIH requested, and the user confirmed, destruction of the unapproved downloaded dataset. As a preventive measure, NIH conducted an audit of its system and changed how dbGaP's software was configured.|
|A user was approved to access data that were part of a study the user helped conduct. The Data Access Committee (DAC) failed to determine whether the user had access to personal or identifying information for the research participants, which may have meant that Institutional Review Board (IRB) approval was required before accessing the data.||Users who have access to personal identifying information for research participants in the original study at their institution or through their collaborators may be required to obtain IRB approval in order to access data.||The approved user's access to the dataset was suspended until the DAC confirmed that the user did not have access to personal identifying information. As a preventive measure, the DAC implemented multiple checkpoints in its review process. The NIH data use certification template was updated to clarify when IRB approval is required.|
|A Data Access Request was approved by a DAC before confirming that the user's institutional information technology (IT) director [3] reviewed the dbGaP IT security best practices.||The IT director should be listed as a "Senior/Key Person" in the data access request application.||The NIH DAC implemented additional checks in the data access request review process, and NIH modified the dbGaP system to make the IT director requirement more apparent.|
|An approved user conducted research that was not stated in the data access request.||Approved users are expected to limit research use to that described in the approved data access request.||The approved user's access to dbGaP datasets was revoked for three months, and the institutional signing official [4] was asked to take training in oversight of user projects. A memo was sent to all investigators stressing the importance of adhering to the terms of access. The signing official will review all research use statements and data use limitations in the future and remind investigators that changes in the scope of their project require submission of a revised statement.|
|An approved user conducted research that was not stated in the data access request. However, the research was within the bounds of the data use limitations assigned to the datasets.||Approved users are expected to limit research use to that described in the approved data access request.||In this case, the user recognized the error and contacted NIH to report the mistake. NIH suspended the user's access to the data for three months. The user may submit a new data access request after the suspension period.|
|Due to a coding mistake in the dbGaP system software, a dataset was misconfigured, and a user who had been approved to access a certain subset of a dataset was able to download data from another subset. The user did not have approval to access the other dataset and should not have downloaded it.||Controlled-access data will be accessed only by approved users.||NIH requested, and confirmed, destruction of the unapproved subset. NIH corrected the coding mistake and configuration error, and, as a preventive measure, implemented an additional configuration review protocol.|
|An approved user who moved to a new institution downloaded data at the new institution without submitting a new data access request.||Approved users agree that if they change institutions during the access period, they will close their project before leaving the institution, and submit a new data access request from the new institution if they wish to continue their work.||NIH requested, and confirmed, destruction of the data. The investigator was allowed to submit a new data access request and was warned that a future violation could lead to access suspension.|
|An approved user conducted research that was not stated in the data access request.||Approved users are expected to limit research use to that described in the approved data access request.||NIH requested, and confirmed, destruction of the data, and suspended the user's access to the data for three months. The user may submit a new data access request after the suspension period.|
|An approved user conducted research that was not stated in the data access request.||Approved users are expected to limit research use to that described in the approved data access request.||NIH requested, and confirmed, destruction of the data, and suspended the user's access to the data for three months. The user may submit a new data access request after the suspension period.|
|NIH incorrectly configured a study file, resulting in a swap of participants between the groups consented for "eye disease research only" and "general research use".||Research with the requested datasets must be consistent with the data use limitations as delineated by the submitting institution.||NIH blocked access to the two datasets until the error was corrected, and contacted users who downloaded the affected files to request data destruction and downloading of the corrected files.|
|A dbGaP audit identified compressed archive "MULTI" files distributed with several studies that were improperly configured by including individual-level phenotype data that were restricted to disease-specific research use. The MULTI files are intended to distribute data files characterizing the entire study or analysis such as pedigrees, release notes, or analysis-specific sample attributes, where redistribution is not restricted by data use limitations. The restricted individual-level phenotype data should not have been included in these MULTI files.||Research with the requested datasets must be consistent with the data use limitations as delineated by the submitting institution.||NIH blocked access to all affected files until the files were corrected, and contacted users who downloaded the affected files to request data destruction and downloading of the corrected files. dbGaP developed automated checks of data files to confirm that the contents of MULTI files are appropriate. In addition, submitting investigators will be asked to review and certify the final packaging of files before their release.|
|An approved user's trainee uploaded NIH controlled-access data to an unauthorized private data sharing and collaboration website. Controlled-access data were downloaded by three unapproved users who had access to the website.||Controlled-access data will be accessed only by approved users. Approved users must not distribute controlled-access data to any entity (e.g., a third party) not covered in the Data Access Request.||The user notified NIH of the inappropriate data sharing within 24 hours of becoming aware of it. The data were promptly removed from the data sharing site, and the unapproved downloaders were contacted and asked to destroy the data, which was confirmed. NIH suspended the user's access to the data for six months.|
|NIH incorrectly configured a study file, resulting in the reversal of the data use limitations for two datasets. The dataset restricted to "non-profit research use only" was incorrectly labeled for "general research use," and the "general research use" dataset was labeled for "non-profit research use only." The users who downloaded the data were approved to use both consent groups.||Research with the requested datasets must be consistent with the data use limitations as delineated by the submitting institution.||NIH blocked access to the two datasets until the error was corrected. NIH also adjusted the processing software and, as a preventive measure, implemented additional checks.|
|Data Security Incidents|
|A dbGaP software bug caused a disapproval decision for a data access request to be recorded as an approval, and the user downloaded the data.||Controlled-access data will be accessed only by approved users.||NIH requested and confirmed destruction of the data. NIH corrected the software problem, and implemented other audit review procedures as a preventive measure.|
|The security system at an approved user's institution was discovered (by the approved user) to be vulnerable to security breaches. The institution corrected the problem before a breach occurred.||Approved users and institutions must adhere to dbGaP security standards. Approved users agree to notify NIH of security breaches within 24 hours.||The institution undertook a thorough analysis of the problem. The institution implemented additional security measures to protect machines used for analyzing dbGaP data.|
|An approved user's trainee uploaded NIH controlled-access data to a file system directory that was accessible from the internet, which is not consistent with NIH data security expectations. The problem was discovered the same day and the data were removed immediately. No unauthorized individuals accessed the data.||Approved users and institutions must adhere to NIH security standards.||The data were removed from the server. NIH suspended the user's access to the data for five months.|
|Publication Embargo Incidents|
|An approved user's trainee submitted an abstract for a scientific meeting prior to the expiration of the embargo date of a dbGaP dataset. The violation was discovered prior to the meeting when the trainee contacted NIH staff to clarify how to acknowledge the dataset in presentations.||Approved users will not make presentations or submit manuscripts for publication before the embargo date(s) of dbGaP datasets.||NIH requested, and confirmed, withdrawal of the abstract. As a preventive measure, the approved user reminded trainees about NIH expectations for research using dbGaP datasets. NIH improved the visibility of the embargo date information on the study page and in the Data Use Certification agreement.|
|An approved user's manuscript was published prior to the dataset's publication embargo date.||Approved users will not make presentations or submit manuscripts for publication before the embargo date(s) of dbGaP datasets.||NIH suspended the approved user's and collaborators' access to all dbGaP data for six months. NIH requested, and confirmed, destruction of their downloaded data. NIH requested a report from the investigator's institution describing the circumstances surrounding the violation. NIH quickly published an editorial in the same journal emphasizing the importance of the embargo policy.|
|A submitting investigator erroneously set the embargo period for two related dbGaP datasets at zero months, the default setting in the system, instead of 12 months. NIH was notified by the submitting investigator that an approved user had submitted a journal manuscript that included analysis of the datasets within the 1-year embargo period.||Approved users will not make presentations or submit manuscripts for publication before the embargo date(s) of dbGaP datasets.||The investigator who submitted the data reviewed the manuscript and provided permission for the embargo date to be waived in this one case. NIH corrected the embargo dates and set the default embargo period in the study registration system to 12 months from the date the study was released. All users approved to access these datasets were notified by NIH of the correct embargo date.|
|An approved user's manuscript was published prior to the dataset's publication embargo date.||Approved users will not make presentations or submit manuscripts for publication before the embargo date(s) of dbGaP datasets.||NIH suspended the approved user's and collaborators' access to all dbGaP data for six months, and requested, and confirmed, destruction of all downloaded data. NIH asked the investigator to inform the journal of the embargo violation.|
|An approved user presented findings at a conference prior to the dataset's publication embargo date.||Approved users will not make presentations or submit manuscripts for publication before the embargo date(s) of dbGaP datasets.||NIH suspended the approved user's and collaborators' access to all dbGaP data for six months, and requested, and confirmed, destruction of all downloaded data.|
1. As of July 1, 2015, there were 20,718 approved Data Access Requests.
2. "Approved users" are users who are approved by an NIH Data Access Committee to access dbGaP data.
3. "Information Technology (IT) director" is a senior IT official with the necessary expertise and authority to affirm the IT capacities at an academic institution, company, or other research entity.
4. "Signing official" is a senior institutional official with the authority to enter the institution into a legally binding contract.