BPC NSCLC 2.0-public

The GENIE BPC NSCLC v2.0-public dataset contains data on 1,846 NSCLC patients from four institutions: MSKCC, DFCI, VICC, and UHN.

Data Access:

See an overview of the data set

What is included in GENIE BPC data? 

  • Genomic data: Clinical-grade next-generation sequencing data for each patient from the GENIE Registry. Genomic profiling was performed between 2014 and 2018. 
  • Treatment Histories: All anti-neoplastic systemic therapies (intravenous and oral chemotherapies) are included in the dataset. Dates are provided as intervals from diagnosis to the start and stop of each drug. Investigational drugs are masked, and no dosing information is included. 
  • PRISSMM™: The BPC NSCLC dataset uses the PRISSMM™ framework, developed at the Dana-Farber Cancer Institute, to ascertain cancer treatment responses and outcomes from retrospective real-world data. Additional information can be found in the analytic data guide. For information about licensing PRISSMM™, email [email protected] 
  • Pathologic information: Each pathology specimen from diagnosis through death or last follow-up is curated with specimen type, site, and histology. 
  • Imaging information: Each CT, MRI, and PET-CT scan from diagnosis through death or last follow-up is curated for the presence or absence of cancer and an evaluation of whether the cancer was stable, responding, or progressing. These data are used to compute progression-free survival-imaging (PFS-I). Sites of tumor involvement are also recorded.  
  • Medical oncologist’s evaluations: Medical oncology notes (one per month) have been curated to ascertain the presence or absence of cancer and whether the cancer was stable, responding, or progressing. These data are used to compute progression-free survival-medonc (PFS-M) from diagnosis through death or date of last follow-up.  
  • Additional relevant biomarkers: Information about select biomarkers not included on the NGS panels, including PDL1, PD1, and MSI, is also curated. Note that MSI data is currently not exposed in the cBioPortal for this cohort, but can be accessed in the raw data files found in Synapse.
  • There are no patient (self-) reported outcomes in the data. 
  • The NSCLC diagnosis is considered the index tumor for this patient cohort. Data are also included about other cancer diagnoses that preceded or followed the NSCLC. 
  • Overall survival is based on death with censoring at date of last contact known alive. Ascertainment of death varies by institution. 
  • Exact date fields have been masked to preserve confidentiality. However, exact intervals between dates are available and allow calculation of the time between events, e.g., diagnosis, treatment start, treatment end, PFS-I, PFS-M, and OS.  
  • Patients with any interval that could be used to determine that they were over 89 years of age have been removed and may be returned to the dataset at a later date. 
  • Analytical Data Guide: A more comprehensive overview of the data can be found in the data guide, and a description and location of the variables collected can be found in the variable synopsis spreadsheet.
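
Because dates are released only as intervals from diagnosis, durations and survival times have to be derived by subtraction and by applying the censoring rule described above. The following is a minimal sketch of that arithmetic, not the dataset's actual schema: every field name is hypothetical (the real variable names are in the variable synopsis spreadsheet), the use of days as the interval unit is an assumption, and the two patient records are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# NOTE: all field names are hypothetical placeholders, and the unit (days)
# is an assumption; consult the variable synopsis for the real variables.
@dataclass
class PatientIntervals:
    patient_id: str
    drug_start_days: int        # interval from diagnosis to start of a drug
    drug_stop_days: int         # interval from diagnosis to stop of that drug
    death_days: Optional[int]   # interval from diagnosis to death, None if no death recorded
    last_alive_days: int        # interval from diagnosis to last contact known alive


def treatment_duration_days(p: PatientIntervals) -> int:
    """Time on a drug, computed as the difference between two masked-date intervals."""
    return p.drug_stop_days - p.drug_start_days


def overall_survival(p: PatientIntervals) -> Tuple[int, bool]:
    """Return (time, event): event is True for death, False if censored
    at the date of last contact known alive."""
    if p.death_days is not None:
        return p.death_days, True
    return p.last_alive_days, False


# Invented example records, not real patient data.
patients = [
    PatientIntervals("P1", 30, 120, 400, 390),
    PatientIntervals("P2", 14, 200, None, 730),
]

for p in patients:
    print(p.patient_id, treatment_duration_days(p), overall_survival(p))
```

The (time, event) pairs produced this way are the usual input to a survival estimator such as Kaplan-Meier. Since ascertainment of death varies by institution, censored times may not be directly comparable across sites.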