The Genomic Data Commons (GDC) Two-Year Anniversary
This month, the NCI Genomic Data Commons celebrated its second anniversary. The GDC was launched on June 6, 2016, by former Vice President Joe Biden.
Today, the GDC Data Portal contains data from 40 projects and over 32,000 cases. On an average day it is used by between 1,500 and 3,000 unique researchers, with over 20,000 unique researchers using it each month, and over 100,000 unique researchers each year.
The GDC makes over 3 petabytes of harmonized data available to the research community. Harmonized here means that a common set of bioinformatics pipelines is applied to all of the data, which reduces the batch effects that arise when different projects and different sites use different algorithms to analyze the data they generate.
The GDC exposes an API that supports GraphQL queries using the GDC Data Model. The GDC Data Portal itself is built on the GDC API, as is a growing list of third-party tools and applications, including an R Bioconductor package called GenomicDataCommons.
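As a rough illustration of what a programmatic query can look like, the sketch below builds a request for case records in a single project. It is not taken from this post: it assumes the REST-style `cases` endpoint and the `filters`, `fields`, `format`, and `size` parameters described in the public GDC API documentation (rather than the GraphQL interface mentioned above), and the project ID and field names are only examples. The code constructs the URL without sending it.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the public GDC API (see the GDC API documentation).
GDC_API = "https://api.gdc.cancer.gov"


def build_cases_params(project_id, fields, size=10):
    """Build query parameters for the assumed /cases endpoint."""
    # GDC filters are nested JSON objects: an operator plus its content.
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    return {
        "filters": json.dumps(filters),  # filters travel as a JSON string
        "fields": ",".join(fields),      # comma-separated fields to return
        "format": "JSON",
        "size": str(size),               # maximum number of records
    }


# Example values: "TCGA-BRCA" and the field names are illustrative only.
params = build_cases_params("TCGA-BRCA", ["submitter_id", "primary_site"])
url = f"{GDC_API}/cases?{urlencode(params)}"
print(url)
```

The resulting URL can be fetched with any HTTP client. Keeping the filter as a plain dictionary until the last step makes it easy to inspect or combine with other filters before it is serialized.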
Over the past year, we have averaged one release, either data or software, each month. For example, in May 2018 we released a new slide image viewer that lets researchers view, zoom, and pan slide images associated with a case directly in their browser. Researchers can apply case filters to perform range searches for images by percent tumor cells and other criteria. Slide images can also be downloaded in their original format (SVS) and are accessible via the GDC API.
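A range search of this kind could be expressed through the API as a pair of comparison filters joined with `and`. The sketch below is a hedged example, not the portal's own implementation: the `and`, `>=`, and `<=` operators follow the filter syntax in the public GDC API documentation, while the field path for percent tumor cells is an assumption and should be checked against the GDC Data Model before use.

```python
import json

# Assumed field path for the slide-level percent tumor cells value;
# verify against the GDC Data Model before relying on it.
PERCENT_TUMOR_FIELD = "cases.samples.portions.slides.percent_tumor_cells"


def range_filter(field, low, high):
    """Combine a lower and an upper bound into one 'and' filter."""
    return {
        "op": "and",
        "content": [
            {"op": ">=", "content": {"field": field, "value": [low]}},
            {"op": "<=", "content": {"field": field, "value": [high]}},
        ],
    }


# Example bounds (illustrative): slides with 60 to 90 percent tumor cells.
tumor_filter = range_filter(PERCENT_TUMOR_FIELD, 60, 90)
print(json.dumps(tumor_filter, indent=2))
```

The serialized filter would be passed as the `filters` parameter of a request, in the same way as any other GDC filter.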