Tuesday 23 April 2013

Imputing Microarray Gene Expression Data By Implementing Partial Least Square Method For Survival Analysis


Abstract

Microarray technology is widely used in molecular biology and medicine. In survival analysis of cancer patients, it is applied to predict the time until events such as metastasis or death. Gene expression data for cancer patients are mostly accurate, but the data sets are still flawed because microarray data typically contain many missing values. Even at a low rate, missing values can affect the outcome of a survival-time analysis. This creates a need for machine learning methods that impute the missing values in microarray gene expression data. In this research, the partial least squares (PLS) algorithm is implemented to impute the missing values. Solving this problem will give experts a better understanding of DNA microarray data. The data used in this research are the diffuse large B-cell lymphoma (DLBCL) dataset and a carcinoma dataset, as cited from related journals. The PLS algorithm is applied to these datasets, and the results are analysed for performance and compared with other imputation methods. The analysis shows that imputation using PLS is significantly effective.
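To make the idea of PLS-based imputation concrete, here is a minimal sketch in plain Python. It is not the implementation used in this research. It assumes a single-response PLS regression (PLS1, fitted with the NIPALS procedure) in which a gene with missing values is regressed on other genes that are fully observed, using the samples where the target gene was measured. The function names (`pls1_fit`, `pls1_predict`, `impute_gene`) and the default number of components are hypothetical choices for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit PLS1 with NIPALS. X: list of samples (rows), y: one response per sample."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre predictors and response before extracting components.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        # Weight vector: covariance direction between predictors and response.
        w = [dot([row[j] for row in Xc], yc) for j in range(p)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xc]  # sample scores
        tt = dot(t, t)
        if tt < 1e-12:
            break
        load = [dot([row[j] for row in Xc], t) / tt for j in range(p)]
        q = dot(yc, t) / tt
        # Deflate: remove what this component already explains.
        Xc = [[row[j] - t[i] * load[j] for j in range(p)]
              for i, row in enumerate(Xc)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict the response for one new sample x by replaying the deflation."""
    x_mean, y_mean, comps = model
    xc = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = dot(xc, w)
        y_hat += q * t
        xc = [v - t * l for v, l in zip(xc, load)]
    return y_hat


def impute_gene(target, predictors, n_components=2):
    """Fill None entries of `target` (one value per sample).

    `predictors` is a list of fully observed genes, each a list over the
    same samples. This pairing of target and predictor genes is an
    assumption of the sketch, not a detail taken from the abstract.
    """
    observed = [i for i, v in enumerate(target) if v is not None]
    X = [[g[i] for g in predictors] for i in observed]
    y = [target[i] for i in observed]
    model = pls1_fit(X, y, n_components)
    return [v if v is not None
            else pls1_predict(model, [g[i] for g in predictors])
            for i, v in enumerate(target)]
```

Calling `impute_gene(target, predictors)` returns a copy of `target` in which each missing entry is replaced by the PLS estimate and observed entries are left unchanged. In practice the number of components and the choice of predictor genes would need tuning, and the imputed values would be compared against held-out known values to judge performance.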
