Mustain Billah* and Nazrul Islam
Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
Received date: May 18, 2016; Accepted date: June 16, 2016; Published date: June 23, 2016
Citation: Billah M, Islam N. An Early Diagnosis System for Predicting Lung Cancer Risk Using Adaptive Neuro Fuzzy Inference System and Linear Discriminant Analysis.2016, 1:1.
Copyright: © 2016 Billah M et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Lung cancer is the number one cause of cancer deaths in both men and women worldwide. The general prognosis of lung cancer is poor because doctors tend not to find the disease until it is at an advanced stage. Early diagnosis is therefore essential for taking preventive steps. In this paper, an Adaptive Neuro Fuzzy Inference System (ANFIS) and Linear Discriminant Analysis (LDA) based lung cancer diagnosis system is proposed. The diagnosis system has two main steps: feature extraction-reduction and classification. First, historical lung cancer data sets are collected from different hospitals and preprocessed. To reduce the dimensionality of the lung cancer features, Linear Discriminant Analysis (LDA) is applied. The reduced features are then fed into the ANFIS classifier. Classification accuracy, sensitivity and specificity analyses are performed to evaluate the performance of the proposed system. The obtained accuracy of about 95.4% shows that the proposed intelligent system has good diagnostic performance and can be used as a promising tool for lung cancer diagnosis.
Lung cancer is a type of cancer that begins in the lung. It is the uncontrolled growth of cells in the tissues of the lung. If left untreated, this growth can spread beyond the lung into nearby tissue or other parts of the body. Lung cancer is the leading cause of cancer deaths in the United States, among both men and women. Lung cancer claims more lives each year than colon, prostate, ovarian and breast cancers combined.
There are three main types of lung cancer. Non-small cell lung cancer is the most common type: about 85% of lung cancers are non-small cell lung cancers. About 10%-15% of lung cancers are small cell lung cancers. This type of lung cancer tends to spread quickly. Fewer than 5% of lung cancers are lung carcinoid tumors, which are also sometimes called lung neuroendocrine tumors. Most of these tumors grow slowly and rarely spread.
However, lung cancer is also the most preventable cancer. Lung cancer symptoms usually do not appear until the disease has progressed, so cure rate and prognosis depend on early detection and diagnosis of the disease. But early diagnosis is not an easy task. Whenever a patient goes to hospital, the doctor asks about medical history and symptoms and arranges tests such as a CT scan, which takes X-ray images through sections of the body.
A physician commonly makes decisions by evaluating the current test results of a patient, or by comparing the patient with other patients with the same condition and referring to previous decisions. Therefore, it is very difficult for a physician to diagnose lung cancer. For this reason, an early diagnosis system for predicting lung cancer risk is proposed in this study to help the physician in the diagnosis of lung cancer. Linear Discriminant Analysis (LDA) and Adaptive Neuro Fuzzy Inference System (ANFIS) are used in this study. ANFIS has been used for feature selection purposes in many papers [1-7]. LDA-ANFIS based systems have been used for the diagnosis of various diseases such as hepatitis and diabetes, but there is no work on lung cancer diagnosis using the LDA-ANFIS method. Various research works exist on lung cancer diagnosis [8-11], and a comparative analysis of these papers and the proposed system is also presented in this study. One paper identified some features of lung cancer. Computer aided diagnosis systems using image processing were used for lung nodule detection in [8,9]. Another work proposed a computerized scheme for automated detection of lung nodules. The paper is organized as follows: Section 2 gives a brief description of the theory of pattern recognition, LDA and ANFIS. The proposed system is explained in Section 3. In Section 4 the obtained results are presented. Finally, in Section 5, the discussion and conclusion are presented.
It is generally easy for a person to differentiate the sound of a human voice from that of a violin; a handwritten numeral "3" from an "8"; and the aroma of a rose from that of an onion. However, it is difficult for a programmable computer to solve these kinds of perceptual problems. These problems are difficult because each pattern usually contains a large amount of information, and the recognition problems typically have an inconspicuous, high-dimensional structure.
Pattern recognition is the science of making inferences from perceptual data, using tools from statistics, probability, computational geometry, machine learning, signal processing, and algorithm design.
A pattern is defined as a composite of features that are characteristic of an individual. In classification, a pattern is a pair of variables {x, w}, where x is a collection of observations or features (the feature vector) and w is the concept behind the observation (the label). The quality of a feature vector is related to its ability to discriminate examples from different classes. Examples from the same class should have similar feature values, while examples from different classes should have different feature values.
If the characteristics or attributes of a class are known, individual objects might be identified as belonging or not belonging to that class. The objects are assigned to classes by observing patterns of distinguishing characteristics and comparing them to a model member of each class. Pattern recognition involves the extraction of patterns from data, their analysis and, finally, the identification of the category (class) each of the patterns belongs to.
Linear discriminant analysis
Linear Discriminant Analysis (LDA) is a method of finding the linear combination of variables that best separates two or more classes. In itself LDA is not a classification algorithm, although it makes use of class labels. However, the LDA result is mostly used as part of a linear classifier. The alternative use is dimension reduction before applying nonlinear classification algorithms. LDA seeks to reduce dimensionality while preserving as much of the class discriminatory information as possible.
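As a rough illustration of this idea (not the implementation used in this study), the sketch below computes the Fisher discriminant direction for two classes with two features each, w = S_W^-1 (m1 - m0), and projects the samples onto it. The toy data and all function names are hypothetical and chosen only for the example.

```python
def mean_vector(rows):
    """Column-wise mean of a list of feature tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_2x2(rows, mean):
    """Within-class scatter matrix for two-feature samples."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class0, class1):
    """Direction w = S_W^-1 (m1 - m0) that best separates the two classes."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_2x2(class0, m0), scatter_2x2(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(rows, w):
    """Reduce each two-feature sample to a single discriminant score."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]


# Hypothetical toy data: two features per sample, two classes.
class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]

w = fisher_direction(class0, class1)
z0 = project(class0, w)
z1 = project(class1, w)
```

On this toy data the one-dimensional scores of the two classes do not overlap, which is the sense in which the projection preserves class discriminatory information while reducing dimensionality.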
Adaptive neuro fuzzy inference system
ANFIS derives its name from adaptive neuro-fuzzy inference system. It works similarly to a neural network. Using a given input/output data set, ANFIS constructs a fuzzy inference system (FIS) whose membership function parameters are tuned (adjusted) using either a back propagation algorithm alone or in combination with a least squares type of method. This adjustment allows the fuzzy system to learn from the data it is modeling (Figure 1).
The proposed LDA-ANFIS early diagnosis system for lung cancer has two main stages: feature extraction and reduction using LDA, and classification with ANFIS.
Assume that the fuzzy inference system has two inputs x and y and one output f. A first-order Sugeno fuzzy model has rules of the following form:
If x is A1 and y is B1, then f1 = p1x + q1y + r1
If x is A2 and y is B2, then f2 = p2x + q2y + r2
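The two rules above can be evaluated with a short sketch. It assumes the common ANFIS formulation in which each rule's firing strength is the product of its membership grades and the output is the firing-strength-weighted average of f1 and f2. The generalized bell membership function parameters and the consequent coefficients below are hypothetical values chosen for illustration, not parameters learned in this study.

```python
def bell(x, a, b, c):
    """Generalized bell membership function: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def sugeno_two_rules(x, y, mf_a, mf_b, consequents):
    """Evaluate a two-rule first-order Sugeno model for inputs x and y.

    mf_a[i], mf_b[i]: (a, b, c) bell parameters of A_i and B_i.
    consequents[i]:   (p_i, q_i, r_i) of f_i = p_i*x + q_i*y + r_i.
    """
    # Firing strength of each rule (product of membership grades).
    w = [bell(x, *mf_a[i]) * bell(y, *mf_b[i]) for i in range(2)]
    # Normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Linear rule outputs f_i.
    f = [p * x + q * y + r for (p, q, r) in consequents]
    # Overall output: weighted average of the rule outputs.
    return sum(wb * fi for wb, fi in zip(w_bar, f))


# Hypothetical parameters (assumptions for illustration only).
mf_a = [(2.0, 2, 0.0), (2.0, 2, 10.0)]        # A1, A2
mf_b = [(2.0, 2, 0.0), (2.0, 2, 10.0)]        # B1, B2
consequents = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]

out_mid = sugeno_two_rules(5.0, 5.0, mf_a, mf_b, consequents)
out_low = sugeno_two_rules(0.0, 0.0, mf_a, mf_b, consequents)
```

At the midpoint (5, 5) both rules fire equally, so the output is the plain average of f1 and f2; near (0, 0) the first rule dominates and the output is close to f1.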
Feature extraction - reduction stage
First, clinical data sets of 1000 lung cancer patients diagnosed in different hospitals were extracted. Each patient record contains more than 20 features. As the dataset contains a large number of feature variables, Linear Discriminant Analysis (LDA) was applied to this dataset and the features were reduced to only 9 (Table 1).
|2||loss of weight||Yes,No|
|3||loss of appetite||Yes,No|
|4||Chest or rib pain||Yes,No|
Table 1: Reduced features of lung cancer dataset using LDA
In this lung cancer database, there are two classes: out of risk and very high risk.
Classification with ANFIS
In the classification stage, the two-class features obtained in the feature extraction and feature reduction stage are given as inputs to the Adaptive Neuro Fuzzy Inference System (ANFIS) classifier.
In the structure of the ANFIS classifier, both an artificial neural network and fuzzy logic are used. The ANFIS classifier is formed of if-then rules, input-output pairs and neural network learning algorithms. These are used in the training of the ANFIS classifier. In this experimental study, the ANFIS classifier has nine inputs (x1, x2, x3, x4, x5, x6, x7, x8, and x9) and one output (y). The parameters of the ANFIS classifier used in this study are given below:
The number of layers: 5
Input: 9
Output: 1
The number of rules: 512
Input membership functions: Bell-shaped
Training parameters
Learning rule: Hybrid Learning Algorithm
Sum-squared error: 0.0000001
Number of epochs to reach the sum-squared error: 2200
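The rule count listed above is consistent with a grid partition that uses two membership functions per input, since 2^9 = 512. The two-per-input figure is an assumption inferred from the numbers; the text does not state it. Under that assumption, and further assuming three parameters per bell-shaped membership function and first-order Sugeno consequents, the following sketch enumerates the rules and counts the adjustable parameters.

```python
from itertools import product

n_inputs = 9          # stated in the text
mfs_per_input = 2     # assumption: not stated in the text, inferred from 512 rules

# Each rule picks one membership function index per input (grid partition).
rules = list(product(range(mfs_per_input), repeat=n_inputs))
n_rules = len(rules)

# Assumption: generalized bell functions have three parameters (a, b, c).
premise_params = n_inputs * mfs_per_input * 3

# Assumption: first-order Sugeno consequents, one coefficient per input plus a constant.
consequent_params = n_rules * (n_inputs + 1)

print(n_rules, premise_params, consequent_params)
```

Under these assumptions the consequent parameters far outnumber the premise parameters, which is typical of grid-partitioned ANFIS models with many inputs.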
The results obtained using LDA-ANFIS
The diagnostic performance of the LDA-ANFIS based lung cancer risk diagnosis system is estimated using sensitivity and specificity analysis and classification accuracy.
In this experimental study, 800 samples were used for training and another 200 samples for testing the system. Figure 2 shows the confusion matrix of output class and target class. Here, '0' represents the class 'out of risk' and '1' represents the class 'high cancer risk'.
From this confusion matrix, sensitivity, specificity and accuracy can be measured. Here, TP: True Positive; TN: True Negative; FP: False Positive; FN: False Negative.
Sensitivity and specificity analysis
Sensitivity (also called the true positive rate, or the recall in some fields) measures the proportion of positives that are correctly identified.
Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified.
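These two definitions correspond to the standard ratios TP / (TP + FN) and TN / (TN + FP). A minimal sketch follows; the confusion-matrix counts are hypothetical and are not the results of this study.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that are predicted positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives that are predicted negative."""
    return tn / (tn + fp)


# Hypothetical counts for illustration only (not taken from Figure 2).
tp, fn, tn, fp = 90, 10, 85, 15

sens = sensitivity(tp, fn)   # 90 / 100
spec = specificity(tn, fp)   # 85 / 100
```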
The obtained values of sensitivity and specificity of proposed LDA-ANFIS based early lung cancer risk diagnosis system are given in Table 2.
|PCA-AIRS(Polat and Gunes, 2006)||100||94.44|
|LDA-ANFIS(Esin, Akif and Derya, 2009)||96.66||91.66|
Table 2: Values of sensitivity and specificity.
Classification accuracy analysis
Accuracy of a measurement system is the degree of closeness of measurements of a quantity to that quantity's true value. For a classifier, it is the proportion of all samples that are classified correctly: Accuracy = (TP + TN) / (TP + TN + FP + FN).
The classification accuracy of the proposed LDA-ANFIS based early lung cancer risk diagnosis system is given in Table 3. From Table 3, it is clear that the proposed system has higher accuracy than the other methods [17,18].
|Used method||Authors Name||Classification Accuracy (%)|
|PCA-ANFIS||Polat and Gunes||89.47|
|PCA-AIRS||Polat and Gunes||94.12|
|LDA-ANFIS||Esin, Akif and Derya||94.16|
|Proposed LDA-ANFIS||Used in this study||95.4|
Table 3: Classification Accuracy of different models
In this study, an early diagnosis system for lung cancer risk prediction based on Adaptive Neuro Fuzzy Inference System (ANFIS) and Linear Discriminant Analysis (LDA) is proposed. Performance evaluation using sensitivity and specificity analysis and classification accuracy shows that the proposed LDA-ANFIS based automatic diagnosis system for lung cancer obtains very promising results in classifying possible lung cancer patients. Using this system, physicians can make rapid decisions on lung cancer risk prediction. Moreover, patients can also use the proposed diagnosis system without any prior medical or clinical knowledge about lung cancer. Diagnostic clinics can also use this model. The model can be implemented in computer software and mobile apps to increase its availability and usability.
Though the proposed model overcomes the problems of previous models and outperforms them, its sensitivity and specificity are less than optimal. To address this in future, a PCA-ANFIS based system will be proposed to increase accuracy in predicting lung cancer risk.