Overall strategy and general description

Our strategy is to use existing genome-wide genotype data to select SNPs, which will then be genotyped in our consortium datasets to identify genetic loci associated with cancer risk. We will combine these data with data on lifestyle risk factors and tumour characteristics to provide a detailed evaluation of gene-environment interactions and of associations with tumour phenotypes and disease outcome.

The genetic loci will be identified using a three-stage procedure. This procedure is the central part of COGS, and the data generated from the genotyping form the basis for the work performed by all other WPs. WP2 will have responsibility for the statistical analyses to identify genetic loci. The three stages are:
  1. Stage I consists of gathering the current data from genome-wide association studies already performed within the different consortia. Through combined analysis of these existing data we will select 1,536 SNPs to be taken forward to stage II. Stage I should be completed during year 1. The groups providing genome-wide SNP data are described in WP2, Task 2.1.

  2. In stage II, the 1,536 SNPs selected in stage I will be genotyped in 10,000 breast cancer cases and 10,000 matched controls, 10,000 prostate cancer cases and 10,000 matched controls, and 5,000 ovarian cancer cases and 5,000 matched controls. Stage II will be completed during year 2. The genotyping in this stage will be done centrally because of the large amount of work and the high-throughput genotyping required. A detailed description of stages II and III is given in WP3.

  3. In stage III, the 50 most promising loci will be genotyped in up to 70,000 further cases and controls. This step will be performed by each of the groups included in COGS. Stage III will be completed during year 3.
We predict that, using this strategy, we will identify at least 30 susceptibility loci for each cancer.
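
The staged design described above can be pictured as a ranking filter applied between stages. The following is a minimal sketch only: it assumes that SNPs are ranked by association p-value, which the text does not specify, and the function names are hypothetical. The SNP and sample counts are those stated above.

```python
from typing import Dict, List, Tuple

# Numbers taken from the stage descriptions above.
STAGE_II_SNPS = 1536
STAGE_III_LOCI = 50
STAGE_II_SAMPLES: Dict[str, Tuple[int, int]] = {
    "breast": (10_000, 10_000),    # (cases, matched controls)
    "prostate": (10_000, 10_000),
    "ovarian": (5_000, 5_000),
}


def select_top(p_values: Dict[str, float], n: int) -> List[str]:
    """Return the n identifiers with the smallest p-values.

    Ranking by p-value is an assumption made for illustration;
    ties are broken alphabetically by identifier.
    """
    ranked = sorted(p_values.items(), key=lambda kv: (kv[1], kv[0]))
    return [snp for snp, _ in ranked[:n]]


def total_stage_ii_samples() -> int:
    """Total number of cases plus controls genotyped in stage II."""
    return sum(cases + controls for cases, controls in STAGE_II_SAMPLES.values())
```

In this sketch, stage I would call `select_top(stage_i_p_values, STAGE_II_SNPS)` and stage II would call `select_top(stage_ii_p_values, STAGE_III_LOCI)`, where both dictionaries are hypothetical inputs produced by the WP2 analyses.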

WP3 will have overall responsibility for coordinating the isolation of germline DNA and for performing the genotyping. This will involve interaction with academic centres or SMEs. WP2 will be responsible for the initial selection of SNPs and for analysing the data generated by WP3, and consequently for the final selection of the candidate SNPs to be used. The genetic loci identified will then be used in three further WPs:

WP4 will undertake fine-scale mapping of the most strongly associated loci. For each cancer studied (breast, ovarian and prostate), a confirmed locus from WP2 will be a defined region of a chromosome in which the causative mutation for the relevant cancer must lie. The ultimate aim of WP4 is to identify the causative mutation within each locus. Knowledge of the exact mutation and the gene affected by it will provide an understanding of the biological activities underpinning each of the genetic risk factors. Knowledge of this kind will generate new hypotheses for potential interactions with known risk factors, which can be tested in the gene-environment interaction analyses in WP5. WP4 will also identify the genetic marker with the strongest effect on risk at each locus, thus maximising the statistical power of these WP5 gene-environment analyses.

However, previous experience has shown that it is often not possible to isolate a single mutation within European populations. European populations show relatively little genetic diversity, so large numbers of genetic variants tend to be inherited together as a block, making it difficult to separate the causative mutation from nearby genetic variants that have no direct effect on disease. A more realistic goal, therefore, is to reduce the total number of genetic variants within a particular locus to the smallest possible set of highly correlated candidate variants that must contain the causative mutation. Where available, we will use more diverse, non-European populations in which the same genetic variants are present to narrow down the number of candidates, because the variants are more likely to be separable in those populations.
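
To illustrate what a "set of highly correlated candidate variants" means in practice, the sketch below computes the squared correlation (r²) between genotype vectors coded as 0, 1 or 2 copies of an allele, and keeps the variants that exceed a threshold relative to a lead marker. This is an illustration under stated assumptions, not the WP4 method: genotype-based r² is used as a simple stand-in for a haplotype-based measure, and the 0.8 threshold and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype vectors (coded 0/1/2)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic variant carries no correlation information.
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def candidate_set(
    lead: str,
    genotypes: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # illustrative cut-off, not taken from the text
) -> List[str]:
    """Variants (including the lead itself) with r^2 >= threshold to the lead marker."""
    lead_genotypes = genotypes[lead]
    return sorted(
        name
        for name, values in genotypes.items()
        if r_squared(lead_genotypes, values) >= threshold
    )
```

Running the same function on genotypes from a population with shorter correlated blocks would typically return a smaller set, which is the rationale given above for using non-European populations.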

Before mapping can begin, a complete catalogue of all the common genetic variants within the chromosomal region defined as the locus is needed. Previous experience from BCAC indicates that a defined locus may range from 10 to 300 kb in length, and re-sequencing of such regions showed that only ~50% of the genetic variants (SNPs) we found were already known and listed in the existing SNP databases.

WP5 will evaluate the combined effects of genetic and lifestyle factors (“gene-environment interactions”). Data on relevant lifestyle risk factors are available through studies within the consortia. WP5 will synthesise these data so that risk factors are coded to a common protocol, and will then perform combined analyses of genetic and lifestyle risk factors. It will use standard analytical modelling as well as a hierarchical Bayesian approach, examining both multiplicative and additive interaction.
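
The distinction between multiplicative and additive interaction can be made concrete with two standard summary measures computed from odds ratios relative to the group with neither the risk genotype nor the exposure. The sketch below is an illustration only, not the WP5 analysis plan or the hierarchical Bayesian model: the function and argument names are hypothetical, and the additive measure shown is the relative excess risk due to interaction (RERI), one common choice.

```python
def multiplicative_interaction(or_gene: float, or_env: float, or_both: float) -> float:
    """Ratio of the joint odds ratio to the product of the separate odds ratios.

    A value of 1 indicates no interaction on the multiplicative scale.
    """
    if or_gene <= 0 or or_env <= 0 or or_both <= 0:
        raise ValueError("odds ratios must be positive")
    return or_both / (or_gene * or_env)


def reri(or_gene: float, or_env: float, or_both: float) -> float:
    """Relative excess risk due to interaction (RERI).

    A value of 0 indicates no interaction on the additive scale.
    """
    if or_gene <= 0 or or_env <= 0 or or_both <= 0:
        raise ValueError("odds ratios must be positive")
    return or_both - or_gene - or_env + 1.0
```

For example, with hypothetical odds ratios of 2 for the genotype alone, 3 for the exposure alone and 6 for both together, the multiplicative measure is 1 (no multiplicative interaction) while the RERI is 2 (a positive additive interaction), showing that the two scales can give different conclusions for the same data.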

WP6 will examine the effects of susceptibility variants and risk factors on tumour subtypes. It will gather data on tumour subtype in consortium studies using immunohistochemical analysis of tumours and clinicopathological data. Specifically, in the case of breast cancer, analysis of existing genomic and gene expression array data from the NBAC study will be substantiated by prospectively acquired large-scale expression array data from the TRANSBIG study. OCAC will have extensive gene copy number data available. WP6 will then perform analyses to evaluate associations between germline genotype, tumour phenotypes and disease outcome. The results obtained in WP4-6 will then be used to develop statistical risk models for each cancer (WP2), which will allow prediction of individual risk based on genetic and lifestyle risk factors.

Finally, we will explore the implications of such risk prediction by evaluating the public health potential of population-based screening programmes targeted at those at greatest risk, and by considering the key organisational, ethical, legal and social issues that arise. The aim of WP7 is to develop appropriate policy recommendations.

WP7 will build on the results of the primary research into genetic susceptibility to hormone-related cancers and related gene-environment interactions to address some of these questions. It will comprise two main components.

The first component will address the general question: what are the implications for population health of emerging knowledge about the architecture of genetic susceptibility to the hormone-related cancers? More specifically, models will be developed to evaluate how knowledge of genotype and environment together adds to our ability to stratify the population according to absolute risk. (Screening strategies are currently based on general factors such as age.) We will then go on to evaluate the potential impact of using such genetic and environmental risk stratification to target preventive interventions at at-risk sub-populations, and thence to reduce disease and death within the whole population. It is anticipated that these models will be developed over the first two years using current best estimates. In years 3-4, data becoming available from WPs 2-6 will be integrated, allowing the models to be refined.

The second component will address the implications of these emerging findings for health policy, including organisational, ethical, legal and social issues. The primary purpose of this component will be to identify the key issues for future implementation and the range of likely obstacles, and to begin to propose potential solutions. These issues will be addressed progressively through a series of workshops in which expert stakeholders from European states will be invited to participate. The workshops will be supported by further background public health and policy analysis undertaken directly by, or commissioned by, the PHG Foundation.

Additional information