Hybridization and amplification rate correction for Affymetrix SNP arrays
Date: 18-10-2012

 


Quan Wang 1, Pei C Peng 2, Min P Qian 1,2, Lin Wan 3,4*, Ming H Deng 1,2,5**

1 Center for Theoretical Biology, Peking University, Beijing 100871, People's Republic of China
2 LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, People's Republic of China
3 Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, USA
4 National Center for Mathematics and Interdisciplinary Sciences, and the Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
5 Center for Statistical Science, Peking University, Beijing 100871, People's Republic of China
* Corresponding author. National Center for Mathematics and Interdisciplinary Sciences, and the Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
** Corresponding author. Center for Statistical Science, Peking University, Beijing 100871, People's Republic of China

 

Abstract

 
Background
Copy number variation (CNV) is key to understanding the pathology of many complex diseases at the DNA level. Affymetrix SNP arrays are widely used for CNV studies, and these studies depend heavily on accurate copy number (CN) estimation. However, CN estimates may be biased by three factors: cross-hybridization, training sample batch, and genomic waves of intensities induced by sequence-dependent hybridization rate and amplification efficiency. Because many available algorithms address only one or two of these factors, CNV identification often suffers from a high false discovery rate (FDR).
We have therefore developed a new CNV detection pipeline based on hybridization and amplification rate correction (CNVhac).
Methods
CNVhac first estimates the allelic concentrations (ACs) of target sequences using sample-independent parameters trained according to a physicochemical hybridization law. The raw CN at each site is then estimated as the ratio of the AC to the corresponding average AC from a reference sample set. Finally, a hidden Markov model (HMM) segmentation step detects CNV regions.
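The last two steps (ratio to a reference average, then HMM segmentation) can be illustrated with a minimal sketch. This is not the CNVhac implementation: the function names, the scaling by an assumed reference copy number of 2, the three candidate states, the Gaussian emissions, and the values of `sigma` and `stay_prob` are all assumptions made for illustration, and the AC values are treated as already estimated.

```python
import math

def raw_copy_number(sample_ac, reference_acs, reference_cn=2.0):
    """Raw CN per site: sample AC divided by the mean reference AC.

    Scaling by reference_cn (assumed 2 here) is an illustrative choice.
    """
    raw = []
    for site, ac in enumerate(sample_ac):
        ref_mean = sum(ref[site] for ref in reference_acs) / len(reference_acs)
        raw.append(reference_cn * ac / ref_mean)
    return raw

def viterbi_segment(raw_cn, states=(1.0, 2.0, 3.0), sigma=0.3, stay_prob=0.9):
    """Most likely CN state per site under a toy Gaussian-emission HMM."""
    n = len(states)
    switch = (1.0 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(n)]
                 for i in range(n)]

    def log_emit(x, mean):
        # Gaussian log-density up to a constant shared by all states.
        return -0.5 * ((x - mean) / sigma) ** 2

    # Uniform initial distribution over the candidate states.
    score = [math.log(1.0 / n) + log_emit(raw_cn[0], m) for m in states]
    back = []
    for x in raw_cn[1:]:
        prev, score, pointers = score, [], []
        for j, mean in enumerate(states):
            best_i = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + log_emit(x, mean))
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]

# Toy data (hypothetical): one sample and two reference samples over 7 sites.
sample = [1.0, 1.1, 0.5, 0.55, 0.45, 1.0, 0.95]
references = [[1.0] * 7, [1.0] * 7]
raw = raw_copy_number(sample, references)
print(viterbi_segment(raw))
```

On this toy input, the three middle sites have a ratio near 0.5 and are the ones the segmentation is expected to label as a one-copy loss. A real analysis would fit the emission and transition parameters to data rather than fixing them by hand.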
Results
Results on public HapMap data show that CNVhac effectively smooths the genomic waves and yields more accurate raw CN estimates than other methods. Moreover, CNVhac alleviates, to a certain extent, the sample dependence of inference and calls CNVs with appreciably low FDRs.
Conclusion
CNVhac is an effective approach to address the common difficulties in SNP array analysis, and the working principles of CNVhac can be easily extended to other platforms.
Keywords: SNP array, Copy number variation (CNV), Cross-hybridization, Genomic waves

BMC Medical Genomics, Vol. 5, p. 24, 2012
