At the start of 2017, AACR Project Genomics Evidence Neoplasia Information Exchange, or AACR Project GENIE for short, hit a major milestone when it made available one of the largest fully public cancer genomic data sets ever released.
AACR Project GENIE Steering Committee chairperson Charles L. Sawyers, MD, FAACR, a Past President of the AACR, explained at the time in a post on this blog that the immediate intention of the AACR Project GENIE consortium when it released the data was to give the cancer research community at large, particularly those interested in cancer genomics, a new resource for analysis. To provide the community with a flavor of what is in the expansive data set and what can be learned using it, members of the consortium presented their first analyses of the data during the Pan-Cancer Genomic Analysis major symposium at the recent AACR Annual Meeting 2017.
AACR Project GENIE 101
Sawyers, who is also chairperson of the Human Oncology and Pathogenesis Program at Memorial Sloan Kettering Cancer Center in New York, and a Howard Hughes Medical Institute investigator, prefaced the AACR Project GENIE consortium presentations by explaining that the initiative was launched in November 2015 in partnership with eight global leaders in genomic sequencing for clinical utility, as well as two informatics partners.
He then invited the first presenter, Ethan Cerami, PhD, director of the Knowledge Systems Group and lead scientist in the Department of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute in Boston, to introduce the nuts and bolts of AACR Project GENIE to the audience.
Cerami explained that AACR Project GENIE is a multi-phase, multi-year, international data-sharing project that aims to catalyze precision oncology by developing a regulatory-grade registry that aggregates, harmonizes, and links clinical-grade cancer genomic data with clinical outcomes from tens of thousands of cancer patients treated at multiple international institutions and by making all de-identified data publicly available.
He then provided detailed information on what data are collected at the different institutions, highlighting that the types of sequencing and the sizes of the gene panels differ among institutions and are evolving over time. For example, three of the institutions perform large-panel, exon-capture sequencing, while the other five institutions perform hotspot genomic profiling. More information about the data collected at each site can be found in the AACR Project GENIE data guide.
What is in the publicly available data set and what can we do with it?
Cerami concluded his presentation by giving an overview of the first publicly released data set, which comprises nearly 19,000 de-identified genomic records collected from patients during routine care at the eight participating institutions and a limited amount of linked clinical data for each patient.
Overall, non–small cell lung cancer, breast cancer, and colorectal cancer are the cancer types for which there are the most samples. However, there are many rare cancers represented in the data set, which provides a unique opportunity to analyze the genomic landscape of these cancers.
The subsequent three presentations provided insight into some of the applications of the data set released by AACR Project GENIE, including clinical trial matching, the identification of clinically actionable mutations, and the determination of the mutational burden of a tumor.
First, Alison Schram, MD, a medical oncology/hematology fellow at Memorial Sloan Kettering Cancer Center in New York, presented an analysis of the landscape of somatic ERBB2 mutations present in the real-world AACR Project GENIE data set and compared this with the landscape of somatic ERBB2 mutations in the ongoing phase II SUMMIT basket clinical trial.
Schram’s data showed that the landscape of somatic ERBB2 mutations was largely similar in the AACR Project GENIE data set and the SUMMIT clinical trial, providing evidence that enrollment in the trial reflects the true landscape of the target alteration. This analysis shows the power of the AACR Project GENIE registry to help researchers design clinical trials and identify the target enrollment population.
The second presentation, by Christine M. Micheel, PhD, a research assistant professor of medicine in the Division of Hematology/Oncology at Vanderbilt-Ingram Cancer Center in Nashville, Tennessee, highlighted the broad clinical actionability and clinical trial matching possible for patients with genotypes present in the AACR Project GENIE data set.
In fact, Micheel and colleagues found that there is a standard therapy match for 7 percent of samples in the AACR Project GENIE data set. Another 7 percent of samples matched a standard therapy used to treat a different type of cancer, suggesting an off-label use of a standard therapy. The proportion of samples that matched in these ways varied substantially by cancer type, with gastrointestinal stromal tumor samples being the most likely to match a standard therapy for the disease, imatinib (Gleevec).
Micheel also showed that nearly every sample in the AACR Project GENIE data set matched at least one biomarker-driven cancer clinical trial, including clinical trials testing targeted therapeutics and clinical trials exploring the impact of mutations. The number of clinical trial matches for each type of cancer closely mirrored the prevalence of the cancer in the data set, with the greatest number of clinical trial matches found for non–small cell lung cancer, which is the cancer type with the greatest number of samples.
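To make the distinction between a standard therapy match and an off-label match concrete, here is a minimal Python sketch of how samples might be sorted into those two groups. It is not the method Micheel and colleagues used. The rule table, the gene names (`GENE_A`, `GENE_B`), the cancer type labels, and the function names are all hypothetical placeholders, and real matching would depend on curated knowledge bases and specific variants rather than whole genes.

```python
from collections import Counter

# Hypothetical rule table: each entry pairs an altered gene with the cancer
# type in which a standard therapy targeting that gene is used.
STANDARD_THERAPY_RULES = {
    ("GENE_A", "Cancer type X"),
    ("GENE_B", "Cancer type Y"),
}


def classify_sample(cancer_type, altered_genes):
    """Return 'standard', 'off_label', or 'none' for a single sample."""
    # Cancer types in which any of this sample's alterations has a standard therapy.
    matched_types = {
        rule_type
        for gene, rule_type in STANDARD_THERAPY_RULES
        if gene in altered_genes
    }
    if cancer_type in matched_types:
        return "standard"   # therapy is standard for this cancer type
    if matched_types:
        return "off_label"  # therapy is standard only for a different cancer type
    return "none"


def match_proportions(samples):
    """Fraction of samples in each category; samples are (cancer_type, genes) pairs."""
    if not samples:
        raise ValueError("samples must not be empty")
    counts = Counter(classify_sample(ct, genes) for ct, genes in samples)
    return {label: counts[label] / len(samples)
            for label in ("standard", "off_label", "none")}


# Illustrative input only; these are not GENIE records.
example_samples = [
    ("Cancer type X", {"GENE_A"}),
    ("Cancer type Y", {"GENE_A"}),
    ("Cancer type Z", {"GENE_C"}),
    ("Cancer type Y", {"GENE_B", "GENE_C"}),
]
print(match_proportions(example_samples))
```

Grouping the same classification by cancer type would show how the match rate varies from one disease to another.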
In the final AACR Project GENIE presentation, Alexander S. Baras, MD, PhD, director of Precision Medicine Informatics in the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins in Baltimore, talked about his work investigating the ability of different sequencing panels to assess tumor mutational burden.
High tumor mutational burden, assessed using whole-exome sequencing, has been associated with response to immune checkpoint inhibitors. Baras showed that for samples submitted to the AACR Project GENIE data set by institutions using large sequencing panels of about 1 Mb, the tumor mutational burden estimates across a diverse array of cancer types correlated with tumor mutational burden estimates obtained by whole-exome sequencing. In contrast, for samples submitted to the AACR Project GENIE data set by institutions using small sequencing panels, tumor mutational burden correlated with results from whole-exome sequencing in only about one-third of cases.
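For readers who want to see the arithmetic behind such a comparison, the following Python sketch expresses tumor mutational burden as mutations per megabase of sequenced territory, which is a common convention, and computes a Pearson correlation between panel-based and exome-based estimates. It is an illustration under stated assumptions, not the analysis Baras presented. The function names, the mutation counts, and the roughly 30 Mb exome size are all assumptions chosen for the example.

```python
from math import sqrt


def tmb_per_mb(mutation_count, covered_bases):
    """Mutations per megabase of sequenced territory."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return mutation_count / (covered_bases / 1_000_000)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical mutation counts for the same three tumors, sequenced two ways.
PANEL_BASES = 1_000_000    # a large panel of about 1 Mb, as described above
EXOME_BASES = 30_000_000   # assumed exome size for illustration only

panel_counts = [3, 12, 40]
exome_counts = [90, 350, 1200]

panel_tmb = [tmb_per_mb(n, PANEL_BASES) for n in panel_counts]
exome_tmb = [tmb_per_mb(n, EXOME_BASES) for n in exome_counts]
print(pearson(panel_tmb, exome_tmb))
```

Because a small panel covers far fewer bases, each observed mutation shifts its per-megabase estimate by a larger amount, which is one plausible reason small-panel estimates track whole-exome results less reliably.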
With tumor mutational burden being investigated as a potential clinical biomarker for immune checkpoint inhibitor response, it is important to know which sequencing approach should be used to determine it accurately. It will also be important for further analyses using the AACR Project GENIE data set to investigate clinical outcomes for different levels of tumor mutational burden.
What can we expect from AACR Project GENIE in the future?
During his presentation, Cerami explained that the consortium is hoping to release data quarterly and that the registry is currently projected to grow to more than 100,000 samples within five years. However, he also noted that the consortium is accepting applications for new participating centers, which could substantially enhance the growth of the registry.