We propose a systematic methodology for characterizing incidentally identified pulmonary nodules by scoring observed radiological traits (semantic features) on a point scale, together with a machine-learning method that uses these scores to predict cancer status. We investigated 172 patients who had undergone low-dose CT imaging; 102 were assigned to a training cohort and 70 to a validation cohort. On the images, 24 radiological traits were systematically scored, and a linear classifier was built to relate the traits to malignancy status. The model was built both with and without size descriptors to control for bias due to nodule size. The multivariate feature sets selected on the training cohort were then tested on the independent validation cohort to evaluate their performance. In the training cohort, the best four-feature set that included a size measurement (set 1) comprised short axis, contour, concavity, and texture, with an area under the receiver operating characteristic curve (AUROC) of 0.88 (accuracy = 81%, sensitivity = 76.2%, specificity = 91.7%). When size measures were excluded, the best four features (set 2) were location, fissure attachment, lobulation, and spiculation, with an AUROC of 0.83 (accuracy = 73.2%, sensitivity = 73.8%, specificity = 81.7%) for predicting malignancy in primary nodules. In the validation cohort, the AUROC was 0.80 (accuracy = 74.3%, sensitivity = 66.7%, specificity = 75.6%) for set 1 and 0.74 (accuracy = 71.4%, sensitivity = 61.9%, specificity = 75.5%) for set 2. Radiological image traits are useful for predicting malignancy in lung nodules. These semantic traits can be combined with size-based measures to improve prediction accuracy and reduce false positives.
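Each model above is summarized by its AUROC plus accuracy, sensitivity, and specificity. As a minimal illustration (not the authors' code), the sketch below computes these metrics from a classifier's continuous output score. It assumes label 1 means malignant and label 0 means benign, computes AUROC with the Mann-Whitney pairwise formulation, and uses a hypothetical decision threshold of 0.5 for the threshold-dependent metrics; the toy scores are invented and do not come from the study.

```python
from typing import Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic.

    Returns the fraction of (malignant, benign) pairs in which the malignant
    case (label 1) gets the higher score; tied scores count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Accuracy, sensitivity, specificity when score >= threshold means malignant.

    The default threshold of 0.5 is a hypothetical choice for illustration.
    """
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_malignant = s >= threshold
        if y == 1:
            if predicted_malignant:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_malignant:
                fp += 1
            else:
                tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true-positive rate among malignant nodules
    specificity = tn / (tn + fp)  # true-negative rate among benign nodules
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Invented toy scores from a hypothetical linear classifier.
    scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(f"AUROC = {auroc(scores, labels):.3f}")
    acc, sens, spec = threshold_metrics(scores, labels)
    print(f"accuracy = {acc:.3f}, sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

AUROC is threshold-free and reflects ranking quality across all cut-offs, whereas accuracy, sensitivity, and specificity depend on the chosen threshold, which is why both kinds of figures are reported side by side.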
©2016 American Association for Cancer Research.