An Early Diagnosis System for Predicting Lung Cancer Risk Using Adaptive Neuro Fuzzy Inference System and Linear Discriminant Analysis

Mustain Billah* and Nazrul Islam

Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh

*Corresponding Author:
Mustain Billah
Department of Information and Communication Technology
Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
Tel: +8801929175180
E-mail: [email protected]

Received date: May 18, 2016; Accepted date: June 16, 2016; Published date: June 23, 2016

Citation: Billah M, Islam N. An Early Diagnosis System for Predicting Lung Cancer Risk Using Adaptive Neuro Fuzzy Inference System and Linear Discriminant Analysis. 2016, 1:1.

Copyright: © 2016 Billah M et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

 
Visit for more related articles at Journal of MPE Molecular Pathological Epidemiology

Abstract

Lung cancer is the number one cause of cancer deaths in both men and women worldwide. The general prognosis of lung cancer is poor because doctors tend not to find the disease until it is at an advanced stage. Early diagnosis is therefore necessary so that preventive steps can be taken. In this paper, a lung cancer diagnosis system based on an Adaptive Neuro Fuzzy Inference System (ANFIS) and Linear Discriminant Analysis (LDA) is proposed. The diagnosis system has two main steps: feature extraction-reduction and classification. First, historical lung cancer data sets are collected from different hospitals and preprocessed. To reduce the dimensionality of the lung cancer features, Linear Discriminant Analysis (LDA) is applied. The reduced features are then fed into the ANFIS classifier. Classification accuracy, sensitivity and specificity analyses are performed to evaluate the performance of the proposed system. The obtained accuracy of about 95.4% shows that the proposed intelligent system has good diagnostic performance and can be used as a promising tool for lung cancer diagnosis.

Keywords

Adaptive neuro fuzzy inference system; Linear discriminant analysis; Lung cancer; Sensitivity and specificity analysis; Classification accuracy

Introduction

Lung cancer is a type of cancer that begins in the lung. It is the uncontrolled growth of cells in the tissues of the lung. If left untreated, this growth can spread beyond the lung into nearby tissue or other parts of the body. Lung cancer is the leading cause of cancer deaths in the United States, among both men and women. Lung cancer claims more lives each year than colon, prostate, ovarian and breast cancers combined.

There are three main types of lung cancer. Non-small cell lung cancer is the most common type: about 85% of lung cancers are non-small cell lung cancers. About 10%-15% of lung cancers are small cell lung cancers, a type that tends to spread quickly. Fewer than 5% of lung cancers are lung carcinoid tumors, also sometimes called lung neuroendocrine tumors. Most of these tumors grow slowly and rarely spread.

However, lung cancer is also the most preventable cancer. Lung cancer symptoms usually do not appear until the disease has progressed, so cure rate and prognosis depend on early detection and diagnosis of the disease. But early diagnosis is not an easy task. When a patient goes to hospital, the doctor asks about medical history and symptoms and arranges tests such as a CT scan, which takes X-ray images through sections of the body.

A physician commonly makes decisions by evaluating the current test results of a patient, or by comparing the patient with other patients with the same condition and referring to previous decisions. Therefore, it is very difficult for a physician to diagnose lung cancer. For this reason, an early diagnosis system for predicting lung cancer risk is proposed in this study to help physicians diagnose lung cancer. Linear Discriminant Analysis (LDA) and Adaptive Neuro Fuzzy Inference System (ANFIS) are used in this study. ANFIS has been used for feature selection in many papers [1-7]. LDA-ANFIS based systems have been used to diagnose various diseases, such as hepatitis and diabetes, but there is no work on lung cancer diagnosis using the LDA-ANFIS method. Various research works on lung cancer diagnosis exist [8-11]. A comparative analysis of these papers and the proposed system is also presented in this study. In [7], the authors identified some features of lung cancer. Computer-aided diagnosis systems using image processing were used for lung nodule detection in [8,9]. A computerized scheme for automated detection of lung nodules has also been proposed. The paper is organized as follows: Section 2 gives a brief description of the theory of pattern recognition, LDA and ANFIS. The proposed system is explained in Section 3. In Section 4, the obtained results are presented. Finally, in Section 5, the discussion and conclusion are presented.

Theory

Pattern recognition

It is generally easy for a person to differentiate the sound of a human voice from that of a violin; a handwritten numeral "3" from an "8"; and the aroma of a rose from that of an onion. However, it is difficult for a programmable computer to solve these kinds of perceptual problems. These problems are difficult because each pattern usually contains a large amount of information, and the recognition problems typically have an inconspicuous, high-dimensional structure.

Pattern recognition [12] is the science of making inferences from perceptual data, using tools from statistics, probability, computational geometry, machine learning, signal processing, and algorithm design.

A pattern is defined as a composite of features that are characteristic of an individual. In classification, a pattern is a pair of variables {x, w}, where x is a collection of observations or features (the feature vector) and w is the concept behind the observation (the label). The quality of a feature vector is related to its ability to discriminate examples from different classes. Examples from the same class should have similar feature values, while examples from different classes should have different feature values.

If the characteristics or attributes of a class are known, individual objects might be identified as belonging or not belonging to that class. The objects are assigned to classes by observing patterns of distinguishing characteristics and comparing them to a model member of each class. Pattern recognition involves the extraction of patterns from data, their analysis and, finally, the identification of the category (class) each of the patterns belongs to.

Linear discriminant analysis

Linear Discriminant Analysis (LDA) is a method for finding the linear combination of variables that best separates two or more classes [13]. In itself LDA is not a classification algorithm, although it makes use of class labels. However, the LDA result is mostly used as part of a linear classifier. The alternative use is dimensionality reduction before applying nonlinear classification algorithms. LDA seeks to reduce dimensionality while preserving as much of the class discriminatory information as possible [14].
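The idea of a class-separating linear combination can be illustrated with a minimal NumPy sketch of the two-class case (Fisher's discriminant). It is an illustration under stated assumptions, not the implementation used in this study: the data are synthetic, and the function name fisher_lda_direction and the small ridge term are hypothetical choices made for the example.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Return the unit vector w that best separates two classes (labels 0 and 1)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Fisher's solution: w is proportional to Sw^{-1} (m1 - m0).
    # The small ridge term keeps Sw invertible (an illustrative safeguard).
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), m1 - m0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
# Synthetic two-class data with 4 features; purely illustrative.
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)),
               rng.normal(2.0, 1.0, (100, 4))])
y = np.array([0] * 100 + [1] * 100)

w = fisher_lda_direction(X, y)
z = X @ w  # one-dimensional projection of every sample onto w
```

Projecting onto w turns each 4-dimensional sample into a single score z, and the two classes are separated along that axis as well as a linear projection allows under Fisher's criterion.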

Adaptive neuro fuzzy inference system

ANFIS [15] stands for adaptive neuro-fuzzy inference system. It works similarly to a neural network. Using a given input/output data set, ANFIS constructs a fuzzy inference system (FIS) whose membership function parameters are tuned (adjusted) using either a back propagation algorithm alone or in combination with a least squares type of method. This adjustment allows the fuzzy system to learn from the data it is modeling [16] (Figure 1).


Figure 1: Adaptive Neuro Fuzzy Inference System with two inputs.

The proposed LDA-ANFIS early diagnosis system for lung cancer has two main stages: feature extraction and reduction using LDA, and classification with ANFIS.

Assume that the fuzzy inference system has two inputs x and y and one output f. A first-order Sugeno fuzzy model has rules such as the following:

Rule 1

If x is A1 and y is B1, then f1 = p1x + q1y + r1

Rule 2

If x is A2 and y is B2, then f2 = p2x + q2y + r2
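The two rules above can be traced through the five ANFIS layers with a short sketch. This is a minimal illustration, assuming generalized bell membership functions and product firing strengths; all parameter values and the names gbell and sugeno_two_rule are hypothetical and are not taken from this study.

```python
import numpy as np

def gbell(x, a, b, c):
    """Generalized bell membership function: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def sugeno_two_rule(x, y, premise, consequent):
    """Output f of a first-order Sugeno model with two rules."""
    # Layer 1: membership grades of x in A1, A2 and of y in B1, B2.
    mu_a = [gbell(x, *premise["A"][i]) for i in range(2)]
    mu_b = [gbell(y, *premise["B"][i]) for i in range(2)]
    # Layer 2: firing strength of each rule (product of its grades).
    w = np.array([mu_a[0] * mu_b[0], mu_a[1] * mu_b[1]])
    # Layer 3: normalized firing strengths.
    w_bar = w / w.sum()
    # Layer 4: rule outputs f_i = p_i * x + q_i * y + r_i.
    f = np.array([p * x + q * y + r for (p, q, r) in consequent])
    # Layer 5: overall output as the weighted sum of rule outputs.
    return float(np.dot(w_bar, f))

# Hypothetical parameters: (a, b, c) per membership function, (p, q, r) per rule.
premise = {"A": [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)],
           "B": [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)]}
consequent = [(1.0, 1.0, 0.0), (2.0, 2.0, 1.0)]

out = sugeno_two_rule(1.0, 1.0, premise, consequent)
```

With these symmetric parameters, the point x = y = 1 lies midway between the two membership centers, so both rules fire equally and the output is the average of f1 = 2 and f2 = 5.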

Feature extraction - reduction stage

First, clinical data sets of 1000 lung cancer patients diagnosed in different hospitals were extracted. Each patient record contains more than 20 features. Because the dataset contains a large number of feature variables, Linear Discriminant Analysis (LDA) was applied to it, and the features were reduced to only 9 (Table 1).

No   Feature               Values
1    Hemoptysis            Yes, No
2    Loss of weight        Yes, No
3    Loss of appetite      Yes, No
4    Chest or rib pain     Yes, No
5    Fatigue               Yes, No
6    Severe cough          Yes, No
7    Finger clubbing       Yes, No
8    Thrombocytosis        Yes, No
9    Abnormal spirometry   Yes, No

Table 1: Reduced features of lung cancer dataset using LDA

In this lung cancer database, there are two classes: out of risk and very high risk.

Classification with ANFIS

In the classification stage, the two-class features obtained in the feature extraction and reduction stage are given as inputs to the Adaptive Network-based Fuzzy Inference System (ANFIS) classifier.

The structure of the ANFIS classifier combines an artificial neural network and fuzzy logic. The classifier is formed of if-then rules, input-output pairs and neural network learning algorithms, which are used in its training. In this experimental study, the ANFIS classifier has nine inputs (x1, x2, x3, x4, x5, x6, x7, x8, and x9) and one output (y). The parameters of the ANFIS classifier used in this study are given below:

Number of layers: 5
Inputs: 9
Outputs: 1
Number of rules: 512
Input membership functions: Bell-shaped
Training parameters learning rule: Hybrid Learning Algorithm
Sum-squared error: 0.0000001
Number of epochs to reach the sum-squared error: 2200
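In the hybrid learning algorithm as it is usually described for ANFIS, the membership function (premise) parameters are held fixed during the forward pass while the linear consequent parameters are estimated by least squares. The sketch below illustrates only that least-squares step, for the two-input, two-rule Sugeno model of Rule 1 and Rule 2 rather than the nine-input classifier. The membership parameters, the synthetic data and the function names are all hypothetical; the source does not describe its implementation.

```python
import numpy as np

def gbell(x, a, b, c):
    """Generalized bell membership function."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def normalized_firing(x, y):
    """Normalized firing strengths of two rules with fixed premise parameters."""
    w1 = gbell(x, 1.0, 2.0, 0.0) * gbell(y, 1.0, 2.0, 0.0)
    w2 = gbell(x, 1.0, 2.0, 2.0) * gbell(y, 1.0, 2.0, 2.0)
    total = w1 + w2
    return w1 / total, w2 / total

def fit_consequents(x, y, target):
    """Least-squares estimate of (p1, q1, r1, p2, q2, r2)."""
    wb1, wb2 = normalized_firing(x, y)
    # The output is linear in the consequent parameters:
    # f = wb1 * (p1*x + q1*y + r1) + wb2 * (p2*x + q2*y + r2)
    A = np.column_stack([wb1 * x, wb1 * y, wb1, wb2 * x, wb2 * y, wb2])
    theta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return theta

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 3.0, 200)
y = rng.uniform(-1.0, 3.0, 200)

# Synthetic, noise-free targets generated from known consequent parameters.
true_theta = np.array([1.0, -0.5, 0.2, 0.3, 2.0, -1.0])
wb1, wb2 = normalized_firing(x, y)
target = (wb1 * (true_theta[0] * x + true_theta[1] * y + true_theta[2])
          + wb2 * (true_theta[3] * x + true_theta[4] * y + true_theta[5]))

theta = fit_consequents(x, y, target)
```

Because the targets here are generated without noise from known parameters, the least-squares step recovers them; in a full hybrid scheme this step would alternate with gradient updates of the membership function parameters.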

The results obtained using LDA-ANFIS

The diagnostic performance of the LDA-ANFIS based lung cancer risk diagnosis system is evaluated using sensitivity and specificity analysis and classification accuracy.

In this experimental study, 800 samples were used for training and another 200 samples for testing the system. Figure 2 shows the confusion matrix of output class and target class. Here, '0' represents the class 'out of risk' and '1' represents the class 'high cancer risk'.


Figure 2: Confusion matrix for two classes.

From this confusion matrix, sensitivity, specificity and accuracy can be measured. Here, TP: True Positive; TN: True Negative; FP: False Positive; FN: False Negative.
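As a minimal sketch of how the four counts are obtained, the following code tallies TP, TN, FP and FN from true and predicted labels, with 1 as the positive class and 0 as the negative class. The label vectors are made up for illustration and are not the test data of this study; the function name confusion_counts is hypothetical.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # positives predicted positive
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # negatives predicted negative
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # negatives predicted positive
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # positives predicted negative
    return tp, tn, fp, fn

# Illustrative labels only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
```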

Sensitivity and specificity analysis

Sensitivity (also called the true positive rate, or the recall in some fields) measures the proportion of positives that are correctly identified.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (2)$$

Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified.

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (3)$$

The sensitivity and specificity values obtained for the proposed LDA-ANFIS based early lung cancer risk diagnosis system are given in Table 2.

Method                                          Sensitivity (%)   Specificity (%)
PCA-AIRS (Polat and Gunes, 2006) [10]           100               94.44
LDA-ANFIS (Esin, Akif and Derya, 2009) [17]     96.66             91.66
Proposed LDA-ANFIS                              96.44             92.90

Table 2: Values of sensitivity and specificity.

Classification accuracy analysis

Classification accuracy is the proportion of all test samples, positive and negative, that are classified correctly.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
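The three measures follow directly from the confusion matrix counts: sensitivity is TP / (TP + FN), specificity is TN / (TN + FP), and accuracy is (TP + TN) divided by the total number of samples. The sketch below computes them for hypothetical counts chosen for easy arithmetic; they are not the counts obtained in this study.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that are detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of actual negatives that are detected."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Share of all samples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts for a 200-sample test set (illustration only).
tp, tn, fp, fn = 90, 80, 20, 10

sens = sensitivity(tp, fn)       # 90 / 100
spec = specificity(tn, fp)       # 80 / 100
acc = accuracy(tp, tn, fp, fn)   # 170 / 200
```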

The classification accuracy of the proposed LDA-ANFIS based early lung cancer risk diagnosis system is given in Table 3. From Table 3, it is clear that the proposed system has higher accuracy than the other methods [17,18].

Method used           Authors                Classification accuracy (%)
PCA-ANFIS [18]        Polat and Gunes        89.47
PCA-AIRS [10]         Polat and Gunes        94.12
LDA-ANFIS [17]        Esin, Akif and Derya   94.16
Proposed LDA-ANFIS    Used in this study     95.4

Table 3: Classification Accuracy of different models

Discussion

In this study, an early diagnosis system for lung cancer risk prediction based on Adaptive Neuro Fuzzy Inference System (ANFIS) and Linear Discriminant Analysis (LDA) is proposed. Performance evaluation using sensitivity and specificity analysis and classification accuracy shows that the proposed LDA-ANFIS based automatic diagnosis system obtains very promising results in classifying possible lung cancer patients. Using this system, physicians can make rapid decisions on lung cancer risk prediction. Moreover, patients can use the proposed diagnosis system without any prior medical or clinical knowledge about lung cancer. Diagnostic clinics can also use this model. The model can be implemented in computer software and mobile apps to increase its availability and usability.

Although the proposed model overcomes the problems of previous models and outperforms them, its sensitivity and specificity are less than optimal. To address this in future work, a PCA-ANFIS based system will be proposed to increase accuracy in predicting lung cancer risk.

References
