WP3 - Genotyping technology and sample handling

Work package number: 3
Start date or starting event: Month 1
Work package title: Genotyping technology and sample handling
Activity type: RTD
Participant number:         5      1   2   3   8   9   13   16
Person-months participant:  108.6  2   48  19  4   4   48   90

To coordinate and carry out the genotyping of SNPs selected by WP2 for further evaluation in multi-centre consortia, using validated technology platforms.

Description of work (WP3)
Work package 3 is responsible for organising, coordinating, quality-controlling and performing the genotyping of cases and controls for the three cancers considered in COGS (breast, ovarian and prostate cancer). The COGS genotyping will be conducted in three stages. In the first stage, described in WP2, existing genome-wide SNP data will be used to identify the top 1,536 SNPs for each cancer. The two subsequent stages are the responsibility of WP3. For a detailed description of the three-stage procedure, please see B.1.3.1 Overall strategy and general description.

Task 3.1. Organisation of the samples
DNA from individual groups in the different organ-based consortia will be sent to partner 5, the Spanish National Genotyping Centre (http://www.cnio.es), Madrid node (CEGEN-CNIO). At CEGEN-CNIO the samples will be organised and prepared for shipment to the two subcontracted centres (see Task 3.2).

The initial task will be to collect and organise DNA samples from 25,000 cases and 25,000 controls from the consortia that collaborate in this study: the Breast Cancer Association Consortium (BCAC), with 10,000 breast cancer cases and 10,000 controls; the Ovarian Cancer Association Consortium (OCAC), with 5,000 ovarian cancer cases and 5,000 controls; and the PRACTICAL consortium, with 10,000 cases and 10,000 controls. These consortia have been operating for 2-3 years with excellent results (please see Consortium as a whole B2.3 for details). All 50,000 DNA samples will be centralised at CEGEN-CNIO, where they will be quantified, normalised and organised in 96-well plates for redistribution to the two genotyping centres.

In order to reduce the risk of being left without DNA for some individuals, the total amount of DNA will never be sent. Groups with small amounts of DNA will work with amplified DNA. This means that there will always be backups. Samples will be sent by courier to CEGEN-CNIO in 96-well plates. The corresponding unique sample identifiers will be sent electronically and included in the CEGEN-CNIO laboratory information management system (LIMS). Plates sent to CEGEN-CNIO will be organised according to the following criteria:

  1. Although the minimum amount of DNA required for the Illumina GoldenGate assay is 250 ng (excluding DNA quantification), 750 ng of DNA will be sent. This amount will allow for a second round of genotyping if necessary (see Task 3.2.7). DNA remaining after final genotyping will be returned to the original lab.

  2. Plates will include DNA from a mixture of 90 cases and controls, plus 2 duplicated samples and 4 empty wells that will be used for one Coriell sample in duplicate and two negative controls. That means at least 3% duplicates will be included in every study.

  3. We will avoid the use of amplified DNA wherever possible. Where amplified DNA must be used, it will be analysed on separate plates that do not include any genomic DNA, and duplicates on these plates will come from independent amplifications.

  4. Plates will be re-coded with codes that allow genotyping results to be linked to the sample identifiers, plate identifiers and plate positions.

  5. The samples will be quantified with Picogreen and new plates will be created dispensing only 250 ng of DNA, which is the amount required for Illumina genotyping.

  6. The DNA in these new plates will be evaporated for shipping. Evaporated DNA is easier to transport and minimises the risk of degradation.

  7. The remaining quantified DNA will be stored at CEGEN-CNIO as a repository for a second round of Illumina genotyping, if required. Samples that fail in Stage II genotyping will be regrouped in new plates, evaporated and resent to the corresponding genotyping centre. However, in our experience the number of failed samples is expected to be very low (less than 1%).
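
The plate criteria above imply some simple bookkeeping, which can be sketched as follows. This is an illustrative Python sketch only: the well counts and DNA amounts are taken from the criteria, while the names (PLATE_LAYOUT, plates_needed, runs_supported) are hypothetical and do not describe any CEGEN-CNIO system. It assumes that every plate carries exactly 90 study samples, which ignores the separate plates reserved for amplified DNA.

```python
# Illustrative sketch: all names are hypothetical; the numbers come from the
# plate criteria listed above.
WELLS_PER_PLATE = 96

PLATE_LAYOUT = {
    "study_samples": 90,      # mixture of cases and controls
    "duplicate_samples": 2,   # repeats of study samples
    "coriell_duplicate": 2,   # one Coriell sample plated twice
    "negative_controls": 2,
}

DNA_SENT_NG = 750      # amount shipped to CEGEN-CNIO per sample
DNA_PER_RUN_NG = 250   # amount dispensed per Illumina genotyping run


def wells_used(layout):
    """Total number of wells occupied by a plate layout."""
    return sum(layout.values())


def plates_needed(n_samples, per_plate=PLATE_LAYOUT["study_samples"]):
    """Plates required for n_samples, rounded up to whole plates.

    Assumption: every plate holds exactly `per_plate` study samples.
    """
    return -(-n_samples // per_plate)  # ceiling division


def runs_supported(sent_ng=DNA_SENT_NG, per_run_ng=DNA_PER_RUN_NG):
    """Genotyping runs covered by the shipped DNA, ignoring DNA used for quantification."""
    return sent_ng // per_run_ng


print(wells_used(PLATE_LAYOUT) == WELLS_PER_PLATE)  # True: the layout fills the plate
print(plates_needed(50_000))                        # 556
print(runs_supported())                             # 3
```

Under these assumptions the 50,000 Stage II samples occupy 556 plates, and the 750 ng shipment leaves a margin beyond the 250 ng needed for a single run.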

A majority of the third parties have already extracted DNA from blood samples. However, approximately 10,000 samples will require DNA extraction and DNA amplification. Isolation and amplification of DNA is today considered a routine and non-scientific procedure and is normally not done at research laboratories. We have chosen to subcontract this part of the WP. Please see Consortia as a whole B.2.3.

Task 3.2. High-throughput genotyping, Stage II
In stage II, the 1,536 SNPs selected in stage I will be genotyped in 25,000 cases and 25,000 controls. Partner 5 will be responsible for genotyping 10,000 cases of breast cancer and an equivalent number of controls. The subcontractors will be responsible for the genotyping of ovarian and prostate cancer cases and controls. Because a very large number of SNPs and samples must be analysed in a short period of time (year 2), it is necessary to subcontract part of the work; for this reason, a large proportion of the COGS budget will need to be spent on subcontracting.

A tender process will be initiated to identify the most suitable subcontractors. We have been in contact with two potential subcontractors to learn about costs and resources: (1) the Wellcome Trust Centre for Human Genetics (http://www.well.ox.ac.uk), which, if selected, will be responsible for genotyping the prostate cancer patients (n=10,000) and controls (n=10,000); and (2) CEGEN-Barcelona (http://www.cegen.org), located at the Centre for Genomic Regulation (http://www.crg.es), which, if selected, will perform the high-throughput genotyping of the ovarian cancer samples (5,000 cases and 5,000 controls). The main reasons for including subcontractors are listed below.

  1. Both subcontracted laboratories are core facilities of their academic institutes with all the necessary high-throughput equipment and bioinformatics support (Illumina platforms and other additional platforms, robots, etc) already in place.

  2. The equipment needed to carry out high-throughput genotyping is very expensive and requires regular maintenance, updating and replacement of obsolete accessories, all of which is also very costly. Only core facilities are therefore able to carry out this work efficiently.

  3. Both potential subcontractors work with a LIMS that is compatible with ours. It is used to manage and audit samples and laboratory users, to control instruments and standards, and for other laboratory functions such as invoicing, plate management and workflow automation.

  4. The two centres already have extensive experience of large genotyping projects, accumulated over more than 4 years. Both have confirmed their availability for the project and are committed to complying with the proposed timelines.

An Illumina custom array will be prepared for genotyping using the GoldenGate technology. All selected SNPs will be checked against the in silico design criteria to confirm that they are suitable for assay development. Illumina will provide a SNPScore for each marker; a SNP is included if its value is >0.6 (out of 1). A low SNPScore can have several causes: an assay outside limits, a SNP in a duplicated or repetitive region, another SNP within 60 nucleotides, or a tri- or quad-allelic SNP. We anticipate that low scores will be rare, because all SNPs will have been identified through previous genome-wide studies with similar design requirements. For SNPs that fail, we will determine whether another SNP in strong linkage disequilibrium can be used as a replacement, or whether another genotyping technology will be required for that SNP.
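
The selection rule described above can be sketched in a few lines of Python. This is a hedged illustration, not the Illumina design tool: the 0.6 threshold comes from the text, whereas the SNP identifiers, the function names and the r-squared cut-off of 0.8 used to stand for "strong linkage disequilibrium" are assumptions made for the example.

```python
# Illustrative sketch: identifiers, function names and the LD cut-off are
# hypothetical; only the 0.6 SNPScore threshold comes from the text.
SNPSCORE_THRESHOLD = 0.6
MIN_R2 = 0.8  # assumed definition of "strong" linkage disequilibrium


def split_by_snpscore(scores, threshold=SNPSCORE_THRESHOLD):
    """Split {snp_id: SNPScore} into (accepted, failing) sorted lists."""
    accepted = sorted(s for s, v in scores.items() if v > threshold)
    failing = sorted(s for s, v in scores.items() if v <= threshold)
    return accepted, failing


def pick_replacement(failing_snp, ld_proxies, proxy_scores,
                     min_r2=MIN_R2, threshold=SNPSCORE_THRESHOLD):
    """Return the proxy SNP in strongest LD that passes the design score.

    ld_proxies maps a SNP to a list of (proxy_id, r2) pairs. None is
    returned when no proxy qualifies, i.e. when another genotyping
    technology would have to be considered for that SNP.
    """
    candidates = [
        (r2, proxy)
        for proxy, r2 in ld_proxies.get(failing_snp, [])
        if r2 >= min_r2 and proxy_scores.get(proxy, 0.0) > threshold
    ]
    if not candidates:
        return None
    return max(candidates)[1]


selected = {"snp_A": 0.91, "snp_B": 0.42, "snp_D": 0.30}
proxy_scores = {"snp_X": 0.80, "snp_Y": 0.90}
ld_proxies = {"snp_B": [("snp_X", 0.95)], "snp_D": [("snp_Y", 0.50)]}

accepted, failing = split_by_snpscore(selected)
print(accepted)  # ['snp_A']
print({s: pick_replacement(s, ld_proxies, proxy_scores) for s in failing})
# {'snp_B': 'snp_X', 'snp_D': None}
```

In this toy example snp_B can be replaced by a well-scoring proxy, while snp_D has no proxy in sufficiently strong LD and would need another technology.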

Plates will be sent to CEGEN-Barcelona and Wellcome Trust-Oxford by courier, and unique sample identifiers will be sent electronically. As a quality control, plates with 90 Coriell samples (30 trios) will also be genotyped for each of the 1,536 SNP oligo pool assays. Plates will be included in each genotyping centre's own LIMS in order to facilitate the control and management of samples. The Illumina standard protocol will be followed at all three genotyping centres, starting with 5 ul of DNA at a concentration of 50 ng/ul. To this end, 5 ul of water will be added to the plates to resuspend the evaporated DNA. The Illumina protocol will then be started by adding 5 ul of MS1.
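
Trio samples such as the 30 Coriell trios are commonly used to check genotypes for Mendelian consistency. The text does not state that this particular check will be applied, so the following Python sketch is only an assumption about how trio data might serve as a quality control. Genotypes are written as two-letter allele strings for autosomal SNPs, None marks a failed call, and the function names are hypothetical.

```python
# Illustrative sketch: a Mendelian consistency check on trio genotypes.
# Assumes autosomal, biallelic SNPs; all names are hypothetical.

def mendelian_consistent(father, mother, child):
    """True if the child's alleles can be drawn one from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def count_mendelian_errors(trios):
    """Return (errors, informative) over (father, mother, child) tuples.

    Trios with any missing genotype (None) are skipped.
    """
    errors = 0
    informative = 0
    for father, mother, child in trios:
        if None in (father, mother, child):
            continue
        informative += 1
        if not mendelian_consistent(father, mother, child):
            errors += 1
    return errors, informative


example_trios = [
    ("AA", "GG", "AG"),  # consistent
    ("AG", "AG", "GG"),  # consistent
    ("AA", "AA", "AG"),  # inconsistent: no parent carries G
    ("AG", None, "AA"),  # skipped: mother not called
]
print(count_mendelian_errors(example_trios))  # (1, 3)
```

A SNP with an unexpectedly high error count across the trios would be a candidate for closer inspection of its cluster plot.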

Once all genotypes have been generated, each genotyping centre will upload its results to a repository provided by CEGEN-CNIO (SNPator). This web-based tool will be used as a centralised data repository and will save backups on a regular basis. The repository will permit genetic data to be integrated into specific databases for analysis by WP2 in order to identify the best 50 SNPs for stage III (see Task 3.3).

In addition, and in the absence of genome wide scans for modifiers of BRCA1 which might identify genes that only act epistatically with BRCA1, Georgia Chenevix-Trench, partner 3, will request funds from the National Health and Medical Research Council of Australia (NHMRC) to genotype the 3,000 BRCA1 carrier DNAs for 1,536 SNPs from genes involved in response to ionizing radiation also using the GoldenGate technology. Putative modifiers identified from this approach will be genotyped in the rest of CIMBA using the technologies described in Task 3.3.

Task 3.3. Genotyping 50 SNPs from each study, stage III
The best 50 SNPs from each study will be genotyped in a further 70,000 samples (20,000 breast cancer cases and 20,000 controls; 10,000 prostate cancer cases and 10,000 controls; 5,000 ovarian cancer cases and 5,000 controls). An additional 100 SNPs (those selected from the breast and ovarian cancer studies) will be validated in 9,000 BRCA1 and 6,000 BRCA2 mutation carriers. Genotyping in stage II is performed centrally. In contrast, genotyping in stage III will be performed at the laboratories of third parties not participating in stages I and II.
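
To give a sense of scale, the genotype counts implied by the sample and SNP numbers quoted for stages II and III can be tallied as below. This Python sketch is a rough illustration only: it assumes that each sample is typed once for exactly its own study's SNP set and ignores duplicates, Coriell controls and repeat runs. The variable names are hypothetical.

```python
# Illustrative tally: figures come from the text, names are hypothetical.
# Duplicates, control samples and failed runs are deliberately ignored.
STAGE_II_SNPS = 1_536
STAGE_II_SAMPLES = 25_000 + 25_000  # cases + controls

STAGE_III_SNPS_PER_STUDY = 50
STAGE_III_SAMPLES = {
    "breast": 20_000 + 20_000,
    "prostate": 10_000 + 10_000,
    "ovarian": 5_000 + 5_000,
}

CARRIER_SNPS = 100
CARRIER_SAMPLES = 9_000 + 6_000  # BRCA1 + BRCA2 mutation carriers


def genotype_count(n_snps, n_samples):
    """Number of individual genotypes for n_snps typed in n_samples."""
    return n_snps * n_samples


stage_ii_total = genotype_count(STAGE_II_SNPS, STAGE_II_SAMPLES)
stage_iii_samples = sum(STAGE_III_SAMPLES.values())
stage_iii_total = genotype_count(STAGE_III_SNPS_PER_STUDY, stage_iii_samples)
carrier_total = genotype_count(CARRIER_SNPS, CARRIER_SAMPLES)

print(stage_ii_total)     # 76800000
print(stage_iii_samples)  # 70000
print(stage_iii_total)    # 3500000
print(carrier_total)      # 1500000
```

Under these assumptions stage II generates far more genotypes than stage III, even though stage III covers more samples.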

Different genotyping platforms will therefore be used, including Taqman, Sequenom iPlex and Veracode. For groups using Taqman, assays will be evaluated and optimised at UCAM, partner 2, in 90 Coriell samples (trios) and distributed to each participating group. For each of the SNPs, these 90 Coriell samples will also be sent for genotyping as a control. For groups that do not have the capacity, stage III genotyping will be done at CEGEN-CNIO.

Individual genotyping centres will ensure that cases and controls are mixed on 96-well plates that include at least 1 empty well. A minimum of 3% of samples will be genotyped in duplicate. Genotyping data and cluster images will be uploaded to a repository developed by WP2 for centralised quality control (QC). Essential QC information for each genotyped SNP includes: concordance in Coriell samples, concordance in duplicate samples, call rate, and consistency with Hardy-Weinberg equilibrium.
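
The per-SNP QC quantities named above can be computed along the following lines. This is a minimal Python sketch under stated assumptions, not the WP2 QC pipeline: genotypes are coded 0, 1 or 2 with None for a failed call, the Hardy-Weinberg check uses a 1-degree-of-freedom chi-square test (exact tests are a common alternative), and all function names are hypothetical.

```python
import math

# Illustrative sketch of per-SNP QC metrics; names and the genotype coding
# (0/1/2, None = no call) are assumptions made for this example.

def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def duplicate_concordance(pairs):
    """Fraction of duplicate pairs with identical calls.

    Only pairs in which both replicates were called are compared;
    None is returned if no pair can be compared.
    """
    compared = [(a, b) for a, b in pairs if a is not None and b is not None]
    if not compared:
        return None
    return sum(a == b for a, b in compared) / len(compared)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) and p-value for deviation from HWE."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


print(call_rate([0, 1, None, 2]))                           # 0.75
print(duplicate_concordance([(0, 0), (1, 2), (None, 1)]))   # 0.5
print(hwe_chi_square(25, 50, 25))                           # (0.0, 1.0)
```

The Hardy-Weinberg check is usually applied to controls; which samples enter the test is not specified in the text and is left to the caller here.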

Plates with a call rate below 90% will be excluded. After centralised quality control, all genotype data will be uploaded to the repository of the WP2 Consortium database. For BRCA1 and BRCA2 carriers, genotyping will be carried out using Sequenom iPLEX and Veracode technologies. Of the 15,000 available BRCA1/2 carrier DNA samples, 6,000 have already been centralised and plated in Brisbane, Australia, and the 100 candidate SNPs (50 ovarian + 50 breast cancer) will be genotyped there by iPLEX (funds to be requested from the Australian NHMRC).

These genotypes will be incorporated into COGS in order to increase the statistical power. For the remaining cases, we plan to use the Veracode platform from Illumina, which permits the study of 96 SNPs simultaneously. Results will be analysed by WP2 in order to identify the best susceptibility SNPs for each of the three diseases.

In summary, the funding sought for the three stages is distributed as follows:

Stage I. Genome-wide SNP scans have already been performed by several groups. Data will be sent to UCAM nine months after the start of the project. Costs for the stage I genotyping are not covered by COGS.

Stage II. High-throughput genotyping will be performed by one partner and two subcontractors. Third parties will contribute DNA samples and COGS will pay for shipment of the samples.

Stage III. Genotyping will be performed by third parties not included in stage I or II. COGS will provide the reagents and control samples.

These data will be used to evaluate the association between each genotype and disease, combined over all available studies. To allow for differences in the genotyping platforms used in each genome scan, we will use imputation methods to estimate genotypes at all known SNPs, using the international HapMap as a reference. We will then perform statistical tests of association for each known SNP against disease. We will also provide a web-based tool to visualise results from all the genome-scan data (and, subsequently, the follow-up results). Once the initial combined analyses are complete, we will identify a set of up to 1,536 SNPs for each disease, representing the most promising loci, for further genetic analyses as described in WP3 (task 3.2). This will be done by first identifying the most significant SNPs and then using multiple regression approaches to define a set of independently significant SNPs for further genotyping. Simultaneously, those loci with strong evidence of association will be passed to WP4 for fine-scale mapping.
D3.1 Sample shipment to CNIO from other groups (month 9)
D3.2 Sample preparation for shipping and genotyping (month 12)
D3.3 Design and synthesis of Illumina Custom Arrays (month 14)
D3.4 Generation of genotypes (stage II) of the three diseases (50,000 samples) (month 22)
D3.5 Evaluation of assays for Taqman/iPlex/Veracode (month 28)
D3.6 Data replication and generation of genotypes (stage III) (month 33)

Additional information