An Optimized Feature Selection Method for High Dimensional Data
Keywords:
ACO, Accuracy, Feature selection, GA, Subset
Abstract
High-dimensional datasets consist of a large number of both relevant and irrelevant features, which increases the computational and prediction time needed to process them. Feature selection (FS) extracts the most relevant features, known as a subset, for prediction, so that the computational time can be reduced. The dataset used here, a widely used benchmark for feature selection on microarray gene expression data, is taken from the National Center for Biotechnology Information (NCBI). In gene expression data analysis, the problems of cancer classification and gene selection are closely related, and selecting informative genes is essential for classification performance. However, a high-dimensional dataset causes high computational cost and overfitting during classification, so it is necessary to reduce the dimension of the data by feature selection. In this paper, a mean-based Genetic Algorithm (GA) is proposed to select optimal subsets from the raw dataset based on the mean value of the features. The accuracy of each subset is evaluated using a Support Vector Machine (SVM) classifier, which reduces the complexity of the model in terms of computational cost and size. The proposed method is compared with the ant colony optimization (ACO) algorithm, and the results show that the proposed method achieves a better accuracy rate.
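The general idea of a GA wrapper for feature selection can be illustrated with a minimal sketch. The abstract does not specify how the mean value enters the algorithm, so everything below is an assumption made for illustration: the initial population is seeded with features whose column mean is at or above the average column mean, candidates are scored by classification accuracy, and the fitter half survives each generation. To keep the sketch dependency-free, a leave-one-out nearest-centroid classifier stands in for the SVM used in the paper. The names `centroid_accuracy`, `mean_seeded_mask`, and `select_features`, and all parameter values, are hypothetical.

```python
import random
from statistics import mean


def centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the
    masked features. Assumption: a simple stand-in for the paper's SVM."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if rows:
                centroids[label] = [mean(r[j] for r in rows) for j in cols]
        pred = min(
            centroids,
            key=lambda c: sum((X[i][j] - m) ** 2
                              for j, m in zip(cols, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def mean_seeded_mask(X, rng, flip=0.1):
    """Assumed 'mean-based' seeding: keep features whose column mean is at
    or above the average column mean, then flip each bit with probability
    `flip` so the initial population is diverse."""
    n_features = len(X[0])
    col_means = [mean(row[j] for row in X) for j in range(n_features)]
    threshold = mean(col_means)
    return [(m >= threshold) != (rng.random() < flip) for m in col_means]


def select_features(X, y, pop_size=12, generations=15, mutation=0.05, seed=0):
    """Plain elitist GA over boolean feature masks (requires >= 2 features)."""
    rng = random.Random(seed)
    n_features = len(X[0])

    def fitness(mask):
        return centroid_accuracy(X, y, mask)

    population = [mean_seeded_mask(X, rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [bit != (rng.random() < mutation) for bit in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Synthetic toy data: feature 0 carries the class signal, the rest is noise.
    data_rng = random.Random(1)
    y = [0] * 10 + [1] * 10
    X = [[2 * label + data_rng.gauss(0, 0.5)]
         + [data_rng.gauss(0, 1) for _ in range(5)] for label in y]
    mask, accuracy = select_features(X, y)
    print("selected features:", [j for j, keep in enumerate(mask) if keep])
    print("leave-one-out accuracy:", accuracy)
```

Swapping `centroid_accuracy` for a cross-validated SVM score would bring the sketch closer to the evaluation described in the abstract. The seeding rule and GA operators would still be guesses rather than the authors' method.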
License
Copyright (c) 2020 J. Priyadharshini, C. Kanimozhi
This work is licensed under a Creative Commons Attribution 4.0 International License.