Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction

Michael C. Lee, Lilla Boroczky, Kivilcim Sungur-Stasik, Aaron D. Cann, Alain C. Borczuk, Steven M. Kawut, Charles A. Powell

Research output: Contribution to journal › Article › peer-review

114 Scopus citations


Objective: Accurate classification methods are critical in computer-aided diagnosis (CADx) and other clinical decision support systems. Previous research has reported methods for combining genetic algorithm (GA) feature selection with ensemble classifier systems to increase classification accuracy. In this study, we describe a CADx system for pulmonary nodules that uses a two-step supervised learning system combining a GA with the random subspace method (RSM). Our aims are to explore algorithm design parameters and to demonstrate improved classification performance over either GA-based or RSM-based ensembles alone.

Methods and materials: We used a retrospective database of 125 pulmonary nodules (63 benign; 62 malignant) with CT volumes and clinical history. A total of 216 features were derived from the segmented image data and clinical history. Ensemble classifiers using RSM or GA-based feature selection were constructed and tested via leave-one-out validation, with feature selection and classifier training executed within each iteration. We further tested a two-step approach in which a GA ensemble first assesses the relevance of the features, and this information then controls feature selection during a subsequent RSM step. The base classification was performed using linear discriminant analysis (LDA).

Results: The RSM classifier alone achieved a maximum leave-one-out area under the receiver operating characteristic curve (Az) of 0.866 (95% confidence interval: 0.794-0.919) at a subset size of s=36 features. The GA ensemble (GA-LDA) yielded an Az of 0.851 (0.775-0.907). The proposed two-step algorithm produced a maximum Az of 0.889 (0.823-0.936) when the GA ensemble was used to completely remove less relevant features from the second RSM step, with similar results obtained when the GA-LDA results were used to reduce but not eliminate the occurrence of certain features. After accounting for correlations in the data, the leave-one-out Az of the two-step method was significantly higher than that of either the RSM or the GA-LDA ensemble.

Conclusions: We have developed a CADx system for the evaluation of pulmonary nodules based on a two-step feature selection and ensemble classifier algorithm. We have shown that by combining classifier ensemble algorithms in this two-step manner, it is possible to predict the malignancy of solitary pulmonary nodules with a performance exceeding that of either of the individual steps.
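The abstract describes two ways a GA ensemble can steer the second RSM step: removing less relevant features entirely, or reducing how often they occur. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes that relevance is measured as the fraction of GA-selected subsets containing a feature, which the abstract does not specify. The function names (`ga_relevance`, `sample_subspaces`) and the `threshold` and `floor` parameters are hypothetical, and LDA training and leave-one-out validation are omitted.

```python
import random
from collections import Counter


def ga_relevance(ga_subsets, n_features):
    """Score each feature by the fraction of GA-selected subsets containing it.

    Assumption: selection frequency stands in for "relevance".
    """
    counts = Counter(f for subset in ga_subsets for f in set(subset))
    return [counts[f] / len(ga_subsets) for f in range(n_features)]


def _weighted_sample(pool, weights, k, rng):
    """Draw k distinct items, favoring larger weights (Efraimidis-Spirakis keys)."""
    keyed = [(rng.random() ** (1.0 / w), f) for f, w in zip(pool, weights)]
    keyed.sort(reverse=True)
    return sorted(f for _, f in keyed[:k])


def sample_subspaces(relevance, subset_size, n_subsets,
                     mode="eliminate", threshold=0.0, floor=0.05, seed=0):
    """Build random feature subsets for the RSM step, guided by GA relevance.

    mode="eliminate": features at or below `threshold` are never drawn.
    mode="reweight":  every feature can be drawn, but low-relevance
                      features are drawn less often (weight >= `floor`).
    """
    rng = random.Random(seed)
    features = range(len(relevance))
    if mode == "eliminate":
        pool = [f for f in features if relevance[f] > threshold]
        weights = [1.0] * len(pool)  # uniform sampling among the survivors
    elif mode == "reweight":
        pool = list(features)
        weights = [max(relevance[f], floor) for f in pool]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if subset_size > len(pool):
        raise ValueError("subset_size exceeds the number of eligible features")
    return [_weighted_sample(pool, weights, subset_size, rng)
            for _ in range(n_subsets)]


# Toy example: three GA-selected subsets over six features (indices 0..5).
ga_subsets = [[0, 1, 2], [0, 1, 3], [0, 2, 4]]
relevance = ga_relevance(ga_subsets, n_features=6)
hard = sample_subspaces(relevance, subset_size=2, n_subsets=5,
                        mode="eliminate", threshold=0.5)
soft = sample_subspaces(relevance, subset_size=2, n_subsets=5,
                        mode="reweight")
```

In a full pipeline, each sampled subset would train one LDA base classifier, and, as the abstract states, both the GA step and the RSM step would be rerun inside every leave-one-out iteration so that the held-out nodule never influences feature selection.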

Original language: English
Pages (from-to): 43-53
Number of pages: 11
Journal: Artificial Intelligence in Medicine
Issue number: 1
State: Published - Sep 2010
Externally published: Yes


  • Computer-aided diagnosis
  • Feature selection
  • Genetic algorithms
  • Linear discriminant analysis
  • Pulmonary nodules
  • Random subspace

