ELEKTRYKA, Zeszyt 1 (217), 2011, Rok LVII

Marek WŁODARSKI
West Pomeranian University of Technology, Szczecin

APPLICATION OF ROC ANALYSIS TO BAYESIAN CLASSIFIERS OF PERG SIGNALS

Summary. This paper presents the use of ROC analysis for assessing classifier performance. Both linear and quadratic discriminant analysis assign objects to classes on the basis of a parametric model. Fitting the decision boundary according to a non-parametric ROC curve makes it possible to satisfy a required criterion such as maximum accuracy, minimum risk or the Neyman-Pearson criterion. The method was applied to measure the quality of Bayesian classifiers on a data base of real PERG signals.

Keywords: pattern recognition, ROC analysis, electroretinogram

1. INTRODUCTION

In classification problems a human expert can be effectively supported by an automatic expert system when a large amount of data (both objects and variables) has to be considered. The design of a pattern recognition system involves comparing the variants obtained from different decision algorithms and different feature vectors. ROC analysis is a method of evaluating classifier performance that reveals more about classifier quality than a simple comparison of achieved accuracy.
The method is demonstrated on a real-data problem: the assessment of electrophysiological signals in medical diagnosis.

2. PATTERN ELECTRORETINOGRAM CLASSIFICATION

Electrophysiological tests assess the state of human organs from their electrical activity. The decision is based on whether the recording matches the correct pattern. Because of the natural variability of the population, a statistical model is needed. The difficulty lies in determining the boundary that separates normal objects from abnormal ones.

2.1. PERG signal

Examination of the human eye includes localising the dysfunction [7, 10]. Retinal diseases are identified by recording bioelectrical responses to a visual stimulus, called electroretinograms (ERGs). The conventional flash ERG does not provide information about damage to the inner retinal layers that causes diseases such as glaucoma. Therefore, a temporally modulated pattern (a black-and-white checkerboard with alternating contrast reversals) of constant mean luminance is used to obtain the pattern electroretinogram (PERG). The technical details of basic PERG examinations are established in the standard of the International Society for Clinical Electrophysiology of Vision (ISCEV) [8]. Depending on the stimulation frequency, the responses form either a transient PERG (about 4 reversals/second) or a steady-state PERG (16 rev/s). This paper considers the former, which the standard treats as the basic recording [8].

The typical transient PERG waveform (Fig. 1) consists of three waves located in the first 150 milliseconds of the recording:
- a small initial negative component N35 at approximately 35 ms,
- a large positive component P50 at 45-60 ms,
- a large negative component N95 at 90-100 ms.

Fig. 1. Transient PERG waveform and its 5 basic parameters

The transient PERG is described in the time domain by the location of the extremes of its components.
The peak times (denoted A, B, C, respectively) represent the delay of the response, and the peak-to-trough amplitudes (D, E) measure its strength.

2.2. PERG data base

The set of 184 transient PERGs used in the research was recorded in the Chair and Clinic of Ophthalmology at the Pomeranian Medical University in Szczecin. The data base contains 104 recordings from normal (“healthy”) eyes and 80 from abnormal (“diseased”) ones. Because the proper medical diagnosis is known, supervised machine learning techniques can be applied and verified. The purpose is to determine whether the whole PERG response should be recognised as negative (normal) or as positive (abnormal). The decision rule is therefore dichotomous: an object is either negative (class 0) or positive (class 1). The two-class approach is also used in studies concerning the detection of a particular disease [1, 2, 9].

Some conclusions about the analysed data base have already been stated in previous articles [13, 14]. First of all, the peak-to-trough amplitudes are much more significant than the time delays. The peak times have only subsidiary meaning, and for some classifiers they slightly improve classification. Additional amplitude-based parameters, namely the product of the amplitudes and their ratio, have proved useful. The Bayesian classifiers LDA and QDA provide better PERG assessment than a medical procedure based on normal limits [11].

3. BAYESIAN CLASSIFIER DECISION BOUNDARIES

The decision of a Bayesian classifier is based on the a posteriori probability $P(k \mid \mathbf{x})$, which is the probability that an object described by feature vector $\mathbf{x}$ belongs to class $k$:

$$P(k \mid \mathbf{x}) = \frac{P(k)\, p(\mathbf{x} \mid k)}{p(\mathbf{x})}, \tag{1}$$

where $P(k)$ is the a priori probability, $p(\mathbf{x} \mid k)$ is the class-conditional density function and $p(\mathbf{x})$ is the class-independent density function. The maximum a posteriori (MAP) rule assigns the object to the class $m$ for which

$$P(m)\, p(\mathbf{x} \mid m) > P(k)\, p(\mathbf{x} \mid k), \quad \forall\, k \neq m. \tag{2}$$
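Equations (1) and (2) can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the scalar feature, the univariate Gaussian densities with their means and spreads, and the function names are assumptions made only for illustration, and $p(\mathbf{x})$ is taken as the sum of $P(k)\, p(\mathbf{x} \mid k)$ over the classes. Only the priors are taken from the class fractions given in Section 2.2.

```python
import math


def gaussian_pdf(x, mean, std):
    """Univariate normal density, used here only as a stand-in for p(x|k)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posteriors(x, priors, densities):
    """Equation (1): P(k|x) = P(k) p(x|k) / p(x), with p(x) = sum_k P(k) p(x|k)."""
    joint = [prior * density(x) for prior, density in zip(priors, densities)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


def map_decision(x, priors, densities):
    """Rule (2): choose the class m with the largest product P(m) p(x|m)."""
    joint = [prior * density(x) for prior, density in zip(priors, densities)]
    return max(range(len(joint)), key=joint.__getitem__)


# Priors: class fractions of the data base (104 normal, 80 abnormal recordings).
priors = [104 / 184, 80 / 184]

# Hypothetical class-conditional densities of one scalar feature (invented values).
densities = [
    lambda x: gaussian_pdf(x, mean=4.0, std=1.0),  # class 0, assumed
    lambda x: gaussian_pdf(x, mean=2.0, std=1.0),  # class 1, assumed
]

post = posteriors(2.5, priors, densities)
label = map_decision(2.5, priors, densities)
print(post, label)
```

Because $p(\mathbf{x})$ is common to all classes, the decision needs only the products $P(k)\, p(\mathbf{x} \mid k)$; the normalisation matters only when the posterior values themselves are reported.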
Rule (2) assumes equal benefits from all correct decisions and equal losses from all incorrect decisions. Unequal consequences are taken into account by introducing a loss function $L(k, j) = L_{kj}$ that defines the cost of assigning an object to class $j$ when it really belongs to class $k$. The risk associated with decision $j$ is then the expected cost, calculated as the average of the possible costs weighted by the posterior probabilities:

$$R(j \mid \mathbf{x}) = \sum_{k} L_{kj}\, P(k \mid \mathbf{x}). \tag{3}$$

The Bayesian rule chooses the decision $m$ that brings the smallest risk among all decisions:

$$\sum_{k} L_{km}\, P(k)\, p(\mathbf{x} \mid k) < \sum_{k} L_{kj}\, P(k)\, p(\mathbf{x} \mid k), \quad \forall\, j \neq m. \tag{4}$$

3.1. Two-category classification

In the two-class problem the MAP rule and the Bayesian rule assign an object to class 1 when, respectively:

$$\frac{p(\mathbf{x} \mid 1)}{p(\mathbf{x} \mid 0)} > \frac{P(0)}{P(1)}, \tag{5}$$

$$\frac{p(\mathbf{x} \mid 1)}{p(\mathbf{x} \mid 0)} > \frac{P(0)\,(L_{01} - L_{00})}{P(1)\,(L_{10} - L_{11})}. \tag{6}$$

The Bayesian rule (6) becomes the MAP rule when the costs of correct decisions are zero ($L_{00} = L_{11} = 0$) and the costs of incorrect decisions are equal ($L_{01} = L_{10}$). If the priors are also equal, $P(0) = P(1)$, the MAP rule becomes the ML (maximum likelihood) rule. The left side of inequalities (5) and (6) is a function of the vector $\mathbf{x}$, evaluated for each object from the probabilistic model of the class-conditional densities. The right side is a threshold $t$ that is constant for every $\mathbf{x}$. The threshold value depends on the estimated prior probabilities and the assumed costs.

3.2. Linear and quadratic discriminant analysis

Quadratic Discriminant Analysis (QDA) assumes that, for each class $k$, the feature vector $\mathbf{x}$ is described by a multivariate Gaussian density function with mean vector $\boldsymbol{\mu}_k$ and covariance matrix $\boldsymbol{\Sigma}_k$:

$$p(\mathbf{x} \mid k) = \frac{1}{(2\pi)^{M/2}\, |\boldsymbol{\Sigma}_k|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) \right), \tag{7}$$

where $M$ is the dimension of $\mathbf{x}$. When the covariance matrices of all classes are equal, Linear Discriminant Analysis (LDA) is obtained. Fig.
2 shows the decision boundaries in the feature space {D, E} obtained from three thresholds:
- $t_{ML} = 1$ for the maximum likelihood rule,
- $t_{MAP} = P(0)/P(1)$ for the MAP rule, with the priors replaced by the class fractions in the current data base,
- $t_{Bayes} = \frac{P(0)}{P(1)} \cdot \frac{L_{01}}{L_{10}}$ for the Bayesian minimum-risk rule with costs $L_{00} = L_{11} = 0$ and $L_{01}/L_{10} = 1/3$, with the same priors as above.

Fig. 2. LDA and QDA decision boundaries: ML rule (centre), MAP rule (left), Bayesian rule (right)

In practice such threshold selection can be questioned as uncertain. First, the class fractions in the data set need not represent the occurrence of the classes in the entire population. Knowing the true prior probabilities surely increases overall accuracy, but it may cause other problems. Let us assume that among the examined people “healthy” eyes are much more numerous than “diseased” ones, $P(0)/P(1) \gg 1$. Raising the threshold value will then reduce the detection of incorrect signals, which are more important from the medical point of view. The rule may be modified by the quotient of costs. However, setting the loss function is much more arbitrary when the costs are not actual values.

4. ROC CURVE

The left side of inequality (6) is a scalar function of the feature vector $\mathbf{x}$, while the right side is a threshold value $t$ independent of $\mathbf{x}$, so the inequality can be expressed as

$$y(\mathbf{x}) > t. \tag{8}$$

Inequality (8) poses the problem of how the threshold value $t$ influences the assignment of objects, which is the subject of ROC analysis [3, 4, 5, 6]. In the two-class problem the test set is divided into four mutually exclusive groups: true negatives, false positives, false negatives and true positives. Table 1 shows the measures of classifier performance defined by the numbers of objects in these groups (TN, FP, FN, TP).
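The three thresholds of Section 3.2 and the division of a test set by inequality (8) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the eight scores are invented, while the priors are the class fractions of the data base (104 and 80 out of 184) and the cost ratio is $L_{01}/L_{10} = 1/3$ as in the text.

```python
def decision_thresholds(prior0, prior1, cost_fp=1.0, cost_fn=1.0):
    """Return (t_ML, t_MAP, t_Bayes) assuming zero cost of correct decisions.

    cost_fp stands for L01 (class 0 assigned to class 1),
    cost_fn stands for L10 (class 1 assigned to class 0).
    """
    t_ml = 1.0
    t_map = prior0 / prior1
    t_bayes = t_map * cost_fp / cost_fn  # right side of (6) with L00 = L11 = 0
    return t_ml, t_map, t_bayes


def confusion_counts(scores, labels, t):
    """Apply y(x) > t to every object and count (TN, FP, FN, TP)."""
    tn = fp = fn = tp = 0
    for y, actual in zip(scores, labels):
        predicted = 1 if y > t else 0
        if actual == 0:
            if predicted == 0:
                tn += 1
            else:
                fp += 1
        else:
            if predicted == 0:
                fn += 1
            else:
                tp += 1
    return tn, fp, fn, tp


def performance(tn, fp, fn, tp):
    """Specificity, sensitivity and accuracy from the four counts."""
    return {
        "spec": tn / (tn + fp),
        "sens": tp / (fn + tp),
        "acc": (tn + tp) / (tn + fp + fn + tp),
    }


t_ml, t_map, t_bayes = decision_thresholds(104 / 184, 80 / 184, cost_fp=1.0, cost_fn=3.0)

# Invented likelihood-ratio scores: four class-0 objects followed by four class-1 objects.
scores = [0.2, 0.5, 0.9, 1.5, 0.4, 1.2, 2.5, 6.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

counts = {
    name: confusion_counts(scores, labels, t)
    for name, t in [("ML", t_ml), ("MAP", t_map), ("Bayes", t_bayes)]
}
for name, c in counts.items():
    print(name, c, performance(*c))
```

With these priors $t_{MAP} = 1.3$ and $t_{Bayes} \approx 0.43$, so the Bayesian rule with the assumed costs labels more objects as positive than the ML rule, while the MAP rule labels fewer.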
Table 1
Measures of performance of the two-category classifier

                     Actual class 0                        Actual class 1
Predicted class 0    TN;  tnr = spec = TN / (TN + FP)      FN;  fnr = 1 - sens = FN / (FN + TP)
Predicted class 1    FP;  fpr = 1 - spec = FP / (TN + FP)  TP;  tpr = sens = TP / (FN + TP)

The ROC curve presents the measures tpr and fpr for all possible divisions of the data set by a single inequality. It depicts the trade-off between sensitivity and specificity as the threshold increases from the rule that assigns all objects to the positive class to the rule that assigns all objects to the negative class (Fig. 3a). When the data cannot be separated into homogeneous groups, the ideal classification (sens = 1, spec = 1) is unattainable.

Fig. 3. ROC curve: (a) characteristic points; iso-accuracy lines obtained with different quotients of prior probabilities: (b) $P(1)/P(0) = 1/3$, (c) $P(1)/P(0) = 3$

The criteria of classifier selection are plotted as isometric lines [5], for example:
- iso-accuracy lines $acc = P(0)\,(1 - fpr) + P(1)\, tpr$,
- iso-risk lines $r = L_{00}\, P(0)\,(1 - fpr) + L_{01}\, P(0)\, fpr + L_{10}\, P(1)\,(1 - tpr) + L_{11}\, P(1)\, tpr$,
- the line of minimal sensitivity (or specificity) for the Neyman-Pearson criterion.

Plots 3b and 3c show the influence of the prior probabilities on the slope of the iso-accuracy lines. When negative examples dominate the learning set, the maximum accuracy rule tends to be conservative.

5. ROC-BASED CONSTRUCTION OF DECISION BOUNDARIES

The ROC curve can be obtained by two approaches: parametric and non-parametric. The first operates on the theoretical model of the class-conditional density functions [3, 12]. The second is model-free and directly classifies the test set [4, 12]. Because the number of objects is finite, the number of rules is also finite, and so the ROC curve is piecewise linear. The sorting capability of a classifier is measured by the area under the curve (AUC) [6]. Fig.
4 shows non-parametric ROC curves evaluated for the output (likelihood ratio) of the LDA and QDA classifiers. The decision rules obtained from three criteria are marked:
- the Bayesian MAP rule with prior probabilities estimated from the learning set,
- maximum accuracy in the test set,
- maximum accuracy in the test set among the rules with sensitivity not lower than 90%.

Fig. 4. ROC curves for LDA and QDA classifiers with calculated AUC value

Table 2
Results of ROC-based correction of LDA and QDA classifiers for the vector $[D, E]^T$ (values in %)

                 LDA                     QDA
Rule             spec   sens   acc      spec   sens   acc
MAP              83.7   80.0   82.1     88.5   81.2   85.3
max acc          92.3   77.5   85.9     94.2   80.0   88.0
sens ≥ 0.9       60.6   90.0   73.4     79.8   90.0   84.2

The measures of performance for these classifiers are presented in Table 2. Because of the differences between the Gaussian model and the real scatter of objects, the MAP rule does not provide the maximum accuracy in the given data set. That goal is reached with a more conservative rule. The last criterion is applied to ensure that at least 9 out of 10 incorrect PERG responses are recognised. It provides 90% sensitivity at the expense of lower specificity. In Fig. 5 the decision boundaries obtained with these rules are plotted on the E-D plane.

Fig. 5. Decision boundaries of LDA and QDA classifiers for the rules indicated by ROC analysis

6. CONCLUSIONS

For the given feature vector $\mathbf{x} = [D, E]^T$, decision rules satisfying two different criteria have been obtained. The LDA and QDA classifiers have been compared according to the best achieved accuracy and also according to the general ability of class separation measured by AUC. ROC analysis has been applied to compare 100 variants of Bayesian classifiers of PERG signals [11]. For the considered PERG data base the highest accuracy achieved was about 90%.
With the requirement of 90% sensitivity it was possible to obtain an overall accuracy of 87.5%. ROC analysis may be applied to the assessment of other decision algorithms that produce a scoring output (i.e. a result on at least an ordinal scale).

BIBLIOGRAPHY

1. Bach M., Speidel-Fiaux A.: Pattern Electroretinogram in glaucoma and ocular hypertension. “Documenta Ophthalmologica” 1989, Vol. 73, p. 173–181.
2. Bayer A.U., Maag K.-P., Erb C.: Detection of Optic Neuropathy in Glaucomatous Eyes with Normal Standard Visual Fields Using a Test Battery of Short-wavelength Automated Perimetry and Pattern Electroretinography. “Ophthalmology” 2002, Vol. 109, No. 7, p. 1350–1361.
3. Brown C.D., Davis H.T.: Receiver operating characteristic curves and related decision measures: A tutorial. “Chemometrics and Intelligent Laboratory Systems” 2006, Vol. 80, p. 24–38.
4. Fawcett T.: An introduction to ROC analysis. “Pattern Recognition Letters” 2006, Vol. 27, p. 861–874.
5. Fürnkranz J., Flach P.A.: ROC’n’Rule Learning – Towards a Better Understanding of Covering Algorithms. “Machine Learning” 2005, Vol. 58, p. 39–77.
6. Hanley J.A., McNeil B.J.: The meaning and use of the area under a receiver operating characteristic (ROC) curve. “Radiology” 1982, Vol. 143, p. 29–36.
7. Holder G.E.: Pattern Electroretinography (PERG) and an Integrated Approach to Visual Pathway Diagnosis. “Progress in Retinal and Eye Research” 2001, Vol. 20, No. 4, p. 531–561.
8. Holder G.E., Brigell M.G., Hawlina M., Meigen T., Vaegan, Bach M.: ISCEV standard for clinical pattern electroretinography – 2007 update. “Documenta Ophthalmologica” 2007, No. 114, p. 111–116.
9. Kara S., Guven A.: Training a learning vector quantization network using the pattern electroretinography signals. “Computers in Biology and Medicine” 2007, Vol. 37, p. 77–82.
10. Palacz O., Lubiński W., Penkala K.: Elektrofizjologiczna diagnostyka kliniczna układu wzrokowego. OFTAL, Warszawa 2003.
11.
Włodarski M.: Wykorzystanie statystycznych klasyfikatorów wzorców do analizy i separacji cyfrowych zapisów elektrofizjologicznych na przykładzie elektroretinogramów PERG. Rozprawa doktorska, Zachodniopomorski Uniwersytet Technologiczny w Szczecinie, Wydział Elektryczny, Szczecin 2010.
12. Włodarski M.: Porównanie parametrycznej i nieparametrycznej metody obliczania krzywej ROC na przykładzie zbioru sygnałów elektroretinograficznych. “Metody Informatyki Stosowanej” 2009, Nr 2, s. 177–192.
13. Włodarski M., Brykalski A.: Comparison of bayesian classifiers of pattern electroretinogram based on PERG waveform attributes and coefficients of wavelet compression. Międzynarodowa Konferencja z Podstaw Elektrotechniki i Teorii Obwodów, XXX IC-SPETO’2007, Gliwice-Ustroń, 23-26 maja 2007, s. 141–142.
14. Włodarski M., Brykalski A.: Klasyfikacja sygnałów elektroretinograficznych z wykorzystaniem kwadratowych funkcji dyskryminacyjnych. Międzynarodowa Konferencja z Podstaw Elektrotechniki i Teorii Obwodów, XXIX IC-SPETO’2006, Gliwice-Ustroń, 24-27 maja 2006, s. 517–520.

Overview

The article presents the application of ROC analysis to the design of pattern recognition systems, using the classification of one-dimensional electroretinographic signals as an example. The result of a medical test is interpreted in two classes, as negative or positive. Electrophysiological examinations assess the electrical activity of the organism recorded as a signal or a digital image. Because of the considerable amount of data, the expert creating the decision rules can be effectively supported by an automatic pattern recognition system. ROC analysis makes it possible to assess and compare different variants of a decision algorithm. The classified objects in the example are a set of pattern electroretinogram (PERG) recordings.
The PERG signal is a recording of the electrical response of the retina to a specific light stimulus in the form of a black-and-white checkerboard with alternating fields. An abnormal PERG recording may indicate diseases of the deeper layers of the retina and of the optic nerve, such as glaucoma. The data base contains 80 recordings from diseased eyes and 104 recordings from eyes without disease. In accordance with the ISCEV recommendations, the PERG signal is described by the locations of its extremes. Because of the natural variability of the examined population, it is justified to use statistical classifiers that model the distribution of the feature vector within the classes. The classical binormal parametric model used in linear and quadratic discriminant analysis (LDA and QDA) was employed. ROC analysis is applied to the continuous output variable (the likelihood ratio) produced by a probabilistic classifier. Decision boundaries derived from the parametric model can be corrected on the basis of the available data. On the non-parametric ROC curve one can indicate decision rules that meet the designer's requirements, for example minimum error, minimum risk or the Neyman-Pearson criterion. For the considered base of PERG recordings it was found that the corrected LDA and QDA methods can reach a classification accuracy of about 90%. With the requirement of 90% sensitivity (the rate of correct detection of diseased recordings), the overall accuracy is at most 87.5% for this type of decision boundary. The method can be applied to other decision algorithms that produce an output variable on an ordinal scale.
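The selection of decision rules on a non-parametric ROC curve, as summarised above, can be sketched in Python. This is a minimal illustration and not the procedure used to produce the reported results: the function names and the eight scores are assumptions, the curve is built by applying the rule y > t at every distinct score, and the AUC is taken with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Non-parametric ROC curve as a list of (fpr, tpr, threshold) points.

    A threshold below every score assigns all objects to class 1;
    the largest score used as a threshold assigns all objects to class 0.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    cuts = [float("-inf")] + sorted(set(scores))
    points = []
    for t in cuts:
        tp = sum(1 for y, c in zip(scores, labels) if c == 1 and y > t)
        fp = sum(1 for y, c in zip(scores, labels) if c == 0 and y > t)
        points.append((fp / n_neg, tp / n_pos, t))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    pts = sorted((fpr, tpr) for fpr, tpr, _ in points)
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def accuracy(point, n_neg, n_pos):
    """Accuracy of one ROC point for the given class sizes."""
    fpr, tpr, _ = point
    return (n_neg * (1.0 - fpr) + n_pos * tpr) / (n_neg + n_pos)


def best_rule(points, n_neg, n_pos, min_sens=0.0):
    """Maximum-accuracy point among the rules with sensitivity >= min_sens."""
    admissible = [p for p in points if p[1] >= min_sens]
    return max(admissible, key=lambda p: accuracy(p, n_neg, n_pos))


# Invented scores: four class-0 objects followed by four class-1 objects.
scores = [0.2, 0.5, 0.9, 1.5, 0.4, 1.2, 2.5, 6.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

points = roc_points(scores, labels)
area = auc(points)
max_acc_rule = best_rule(points, n_neg=4, n_pos=4)
np_rule = best_rule(points, n_neg=4, n_pos=4, min_sens=0.9)
print(area, max_acc_rule, np_rule)
```

Restricting the admissible points before maximising accuracy corresponds to the criterion with a required minimum sensitivity; on this toy set it lowers the attainable accuracy, in line with the trade-off reported in the article.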