ELEKTRYKA
Issue 1 (217)
2011
Year LVII
Marek WŁODARSKI
West Pomeranian University of Technology, Szczecin
APPLICATION OF ROC ANALYSIS TO BAYESIAN CLASSIFIERS
OF PERG SIGNALS
Summary. This paper presents the use of ROC analysis for assessing classifier
performance. Both linear and quadratic discriminant analysis assign objects
to classes on the basis of a parametric model. Fitting the decision boundary according to
a nonparametric ROC curve makes it possible to satisfy a chosen criterion such as maximum accuracy,
minimum risk or the Neyman-Pearson criterion. The method was applied to measure the
quality of Bayesian classifiers on a real data base of PERG signals.
Keywords: pattern recognition, ROC analysis, electroretinogram
Polish title: ZASTOSOWANIE ANALIZY ROC W KLASYFIKATORACH BAYESOWSKICH SYGNAŁÓW PERG
Polish abstract (translated). The article presents the use of ROC analysis for assessing the performance of
classifiers. Both linear and quadratic discriminant analysis assign objects to classes on the basis of a parametric model. Fitting the decision boundary
according to the nonparametric ROC curve makes it possible to meet a desired
criterion: maximum accuracy, minimum risk or the Neyman-Pearson criterion. The method was applied to assess the quality of Bayesian classifiers
of a real data base of PERG signals.
Keywords (Polish version): pattern recognition, ROC analysis, electroretinogram
1. INTRODUCTION
In classification problems a human expert can be effectively supported by an automatic expert
system when a large amount of data (both objects and variables) has to be considered. Designing
a pattern recognition system involves comparing variants obtained
from different decision algorithms and different feature vectors. ROC analysis is a method
of evaluating classifier performance that tells more about classifier quality
than a simple comparison of achieved accuracy. The method is presented on a real-data
problem: the assessment of electrophysiological signals in medical diagnosis.
2. PATTERN ELECTRORETINOGRAM CLASSIFICATION
Electrophysiological tests assess the state of human organs from their
electrical activity. The decision is based on whether the recorded response matches the correct pattern.
Because of the natural variability of the population, a statistical model is needed. The difficulty
lies in determining the boundary that separates normal objects from abnormal ones.
2.1. PERG signal
Examination of the human eye includes localising the dysfunction [7, 10]. Retinal diseases
are identified by recording bioelectrical responses to a visual stimulus,
called electroretinograms (ERGs). The conventional flash ERG does not provide information
about damage to the inner retinal layers that causes diseases such as glaucoma. Therefore, a
temporally modulated pattern (a black and white checkerboard with alternating contrast
reversals) of constant mean luminance is used to obtain the pattern electroretinogram (PERG).
The technical details of basic PERG examinations are laid down in the standard of
the International Society for Clinical Electrophysiology of Vision (ISCEV) [8].
Depending on the stimulation frequency, the PERG responses form either a transient
PERG (about 4 reversals/second) or a steady-state PERG (16 rev/s). This paper considers the
former, which is regarded as the basic recording [8]. The typical transient PERG waveform (Fig.
1) consists of three waves located in the first 150 milliseconds of the recording:
• a small initial negative component N35 at approximately 35 ms,
• a large positive component P50 at 45-60 ms,
• a large negative component N95 at 90-100 ms.
Fig. 1. Transient PERG waveform and its 5 basic parameters
The transient PERG is described in the time domain by the locations of the extremes of its components. The peak times (denoted A, B, C) represent the delay of the response,
and the peak-to-trough amplitudes (D, E) measure its strength.
2.2. PERG data base
The set of 184 transient PERGs used in this research was recorded in the Chair and
Clinic of Ophthalmology at the Pomeranian Medical University in Szczecin. The data base
contains 104 recordings from normal (“healthy”) eyes and 80 from abnormal (“diseased”)
ones. Because the correct medical diagnosis is known for each recording, supervised machine learning techniques can be
applied and verified.
The purpose is to determine whether the whole PERG response should be recognised as
negative (normal) or as positive (abnormal). The decision rule is therefore dichotomous:
an object has to be either negative (class $\omega_0$) or positive (class $\omega_1$). The two-class approach is
also used in studies concerning the detection of a particular disease [1, 2, 9].
For the analysed data base some conclusions have already been stated in previous articles
[13, 14]. First of all, the peak-to-trough amplitudes are much more significant than the time
delays. The peak times are only of subsidiary importance, and for some classifiers they slightly
improve classification. Additional amplitude-based parameters, namely the product of the
amplitudes and their ratio, have proved useful. The Bayesian classifiers LDA and QDA
provide better PERG assessment than a medical procedure based upon the normal limits [11].
3. BAYESIAN CLASSIFIER DECISION BOUNDARIES
A Bayesian classifier's decision is based on the a posteriori probability $P(\omega_k \mid \mathbf{x})$, which is the
probability that an object described by feature vector $\mathbf{x}$ belongs to class $\omega_k$:
$$P(\omega_k \mid \mathbf{x}) = \frac{P(\omega_k)\, p(\mathbf{x} \mid \omega_k)}{p(\mathbf{x})}, \tag{1}$$
where $P(\omega_k)$ is the a priori probability, $p(\mathbf{x} \mid \omega_k)$ is the class-conditional density function and $p(\mathbf{x})$ is the
class-independent density function. The maximum a posteriori (MAP) rule assigns the object to the
class $\omega_m$ for which:
$$\forall_{k \neq m}\quad P(\omega_m)\, p(\mathbf{x} \mid \omega_m) > P(\omega_k)\, p(\mathbf{x} \mid \omega_k). \tag{2}$$
Rule (2) assumes equal benefits of all correct decisions and equal losses of all
incorrect decisions. Unequal consequences are taken into account by introducing a loss function
$L(\omega_k, \omega_j) = L_{kj}$ that defines the cost of assigning an object to class $\omega_j$ when it really belongs
to class $\omega_k$. Hence, the risk associated with decision $\omega_j$ is the expected cost,
calculated as the average of the possible costs weighted by the posterior probabilities:
$$R(\omega_j \mid \mathbf{x}) = \sum_k L_{kj}\, P(\omega_k \mid \mathbf{x}). \tag{3}$$
The Bayesian rule chooses the decision $\omega_m$ that brings the smallest risk among all decisions:
$$\forall_{j \neq m}\quad \sum_k L_{km}\, P(\omega_k)\, p(\mathbf{x} \mid \omega_k) < \sum_k L_{kj}\, P(\omega_k)\, p(\mathbf{x} \mid \omega_k). \tag{4}$$
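Rules (1) to (4) can be illustrated numerically. The following is a minimal sketch only: the function names, the density values and the cost matrix are assumptions chosen here for illustration and do not come from the PERG data.

```python
import numpy as np

def posteriors(priors, likelihoods):
    """Eq. (1): P(w_k | x) from priors P(w_k) and densities p(x | w_k)."""
    joint = np.asarray(priors, dtype=float) * np.asarray(likelihoods, dtype=float)
    return joint / joint.sum()          # joint.sum() plays the role of p(x)

def conditional_risks(loss, post):
    """Eq. (3): R(w_j | x) = sum over k of L[k][j] * P(w_k | x)."""
    return np.asarray(loss, dtype=float).T @ post

# Hypothetical numbers, for illustration only.
priors = [0.5, 0.5]                     # P(w_0), P(w_1)
likelihoods = [0.6, 0.4]                # p(x | w_0), p(x | w_1) at some x
loss = [[0.0, 1.0],                     # L_00, L_01 (cost of a false positive)
        [3.0, 0.0]]                     # L_10 (cost of a false negative), L_11

post = posteriors(priors, likelihoods)
map_decision = int(np.argmax(post))                             # rule (2)
bayes_decision = int(np.argmin(conditional_risks(loss, post)))  # rule (4)
print(post, map_decision, bayes_decision)
```

With these assumed numbers the MAP rule chooses class 0, while the minimum-risk rule chooses class 1 because a false negative is taken to be three times as costly as a false positive.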
3.1. Two-category classification
In the two-class problem, the MAP rule and the Bayesian rule assign an object to the positive class $\omega_1$ when, respectively:
$$\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_0)} > \frac{P(\omega_0)}{P(\omega_1)}, \tag{5}$$
$$\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_0)} > \frac{P(\omega_0)}{P(\omega_1)} \cdot \frac{L_{01} - L_{00}}{L_{10} - L_{11}}. \tag{6}$$
The Bayesian rule (6) becomes the MAP rule when the costs of correct decisions are zero, $L_{00} = L_{11} = 0$,
and the costs of incorrect decisions are equal, $L_{01} = L_{10}$. If the priors are also equal, $P(\omega_0) = P(\omega_1)$,
the MAP rule becomes the ML (maximum likelihood) rule.
The left side of inequalities (5) and (6) is a function of the vector $\mathbf{x}$, evaluated for each object from the probabilistic model of the class-conditional densities. The right side is the threshold $t$, which is constant
for all $\mathbf{x}$. The threshold value depends on the estimated prior probabilities and the assumed costs.
3.2. Linear and quadratic discriminant analysis
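Quadratic Discriminant Analysis (QDA) assumes that for each class $\omega_k$ the feature
vector $\mathbf{x}$ is described by a multivariate Gaussian density function with mean vector $\boldsymbol{\mu}_k$ and
covariance matrix $\boldsymbol{\Sigma}_k$:
$$p(\mathbf{x} \mid \omega_k) = \frac{1}{(2\pi)^{M/2}\, |\boldsymbol{\Sigma}_k|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k)\right], \tag{7}$$
where $M$ is the dimension of the feature vector. When the covariance matrices of all classes are equal, Linear Discriminant Analysis (LDA) is
obtained.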
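The left side of rules (5) and (6) under the Gaussian model (7) can be sketched as follows. This is an illustrative implementation, not the author's code; the function names and the class parameters are hypothetical, and the logarithm of the likelihood ratio is used for numerical convenience (the thresholds then have to be compared as log t).

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """Logarithm of the multivariate Gaussian density of Eq. (7)."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    m = mean.size                                   # dimension M
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)              # log |Sigma_k|
    maha = diff @ np.linalg.solve(cov, diff)        # (x-mu)^T Sigma^-1 (x-mu)
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + maha)

def log_likelihood_ratio(x, mean0, cov0, mean1, cov1):
    """log[p(x | w_1) / p(x | w_0)]; pass cov0 equal to cov1 for the LDA case."""
    return (gaussian_log_density(x, mean1, cov1)
            - gaussian_log_density(x, mean0, cov0))

# Hypothetical class parameters in a two-dimensional feature space.
mean0, mean1 = [2.0, 3.0], [1.0, 1.5]
cov = [[1.0, 0.3], [0.3, 1.0]]
midpoint = [1.5, 2.25]
llr_mid = log_likelihood_ratio(midpoint, mean0, cov, mean1, cov)
print(llr_mid)
```

With equal covariance matrices the point halfway between the two means has a log-likelihood ratio of zero, so it lies on the ML boundary (t = 1).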
Fig. 2 shows the decision boundaries in the feature space {D, E} obtained from three thresholds:
• $t_{ML} = 1$ for the maximum likelihood rule,
• $t_{MAP} = P(\omega_0)/P(\omega_1)$ for the MAP rule, with the priors replaced by the class fractions in the
current data base,
• $t_{Bayes} = \frac{P(\omega_0)}{P(\omega_1)} \cdot \frac{L_{01}}{L_{10}}$ for the Bayesian minimal risk rule with costs $L_{00} = L_{11} = 0$ and
$L_{01}/L_{10} = 1/3$, and the same priors as above.
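With the class counts given in Section 2.2 (104 normal and 80 abnormal recordings) the three thresholds can be evaluated directly. The snippet is an illustrative calculation only; the variable names are chosen here, and it assumes, as stated in the list, that the priors are replaced by the class fractions of the data base.

```python
from fractions import Fraction

n_negative, n_positive = 104, 80                  # class counts of the PERG data base
prior_ratio = Fraction(n_negative, n_positive)    # P(w_0) / P(w_1)
cost_ratio = Fraction(1, 3)                       # L_01 / L_10

t_ml = Fraction(1)                                # maximum likelihood rule
t_map = prior_ratio                               # MAP rule
t_bayes = prior_ratio * cost_ratio                # minimal risk rule, L_00 = L_11 = 0

print(float(t_ml), float(t_map), float(t_bayes))
```

This gives t_MAP = 13/10 = 1.3 and t_Bayes = 13/30, roughly 0.43, so under these assumptions the minimal risk rule uses the lowest threshold of the three and the MAP rule the highest.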
Fig. 2. LDA and QDA decision boundaries: ML rule (centre), MAP rule (left), Bayesian rule (right)
In practice such a threshold selection may be questioned as uncertain. First, the class fractions
in the data set need not represent the occurrence of the classes within the entire population.
Knowing the true prior probabilities certainly increases overall accuracy, but it may cause other
problems. Let us assume that among the examined people the number of “healthy” eyes is much
greater than the number of “diseased” ones, $P(\omega_0)/P(\omega_1) \gg 1$. Raising the threshold value will then reduce the
detection of abnormal signals, which are more important from the medical point of view. The rule
may be modified by the quotient of costs. However, the setting of the loss function is largely
arbitrary when the costs are not actual values.
4. ROC CURVE
The left side of inequality (6) is a scalar function of the feature vector $\mathbf{x}$, while the right
side is the threshold value $t$, independent of $\mathbf{x}$. The inequality can therefore be written as:
$$y(\mathbf{x}) > t. \tag{8}$$
Inequality (8) poses the problem of how the threshold value $t$ influences the assignment of objects,
which is the subject of ROC analysis [3, 4, 5, 6].
In the two-class problem the test set is divided into four mutually exclusive groups: true negatives,
false positives, false negatives and true positives. Table 1 shows the measures of
classifier performance defined by the numbers of objects in these groups (TN, FP, FN, TP).
Table 1
Measures of performance of the two-category classifier
                          Predicted class $\omega_0$          Predicted class $\omega_1$
Actual class $\omega_0$   TN/(TN+FP) = tnr = spec             FP/(TN+FP) = fpr = 1 − spec
Actual class $\omega_1$   FN/(FN+TP) = fnr = 1 − sens         TP/(FN+TP) = tpr = sens
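The measures of Table 1 follow directly from the four group sizes. A minimal sketch, with a hypothetical function name and hypothetical counts (overall accuracy is added because it is used as a selection criterion later on):

```python
def performance_measures(tn, fp, fn, tp):
    """Rates of Table 1 plus overall accuracy, from the four group sizes."""
    negatives = tn + fp                 # objects that actually belong to w_0
    positives = fn + tp                 # objects that actually belong to w_1
    return {
        "tnr": tn / negatives,          # specificity
        "fpr": fp / negatives,          # 1 - specificity
        "fnr": fn / positives,          # 1 - sensitivity
        "tpr": tp / positives,          # sensitivity
        "acc": (tn + tp) / (negatives + positives),
    }

# Hypothetical counts for a test set of 50 negative and 50 positive objects.
m = performance_measures(tn=45, fp=5, fn=10, tp=40)
print(m)
```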
The ROC curve presents the measures tpr and fpr for all possible divisions of the data set by a single
inequality. It depicts the trade-off between sensitivity and specificity as the
threshold increases from the rule that assigns all objects to the positive class to the rule that
assigns all objects to the negative class (Fig. 3a). When the classes cannot be separated completely,
the ideal classification (sens=1, spec=1) is unattainable.
Fig. 3. ROC curve: (a) characteristic points; iso-accuracy lines obtained with different quotients of
prior probabilities: (b) $P(\omega_1)/P(\omega_0) = 1/3$, (c) $P(\omega_1)/P(\omega_0) = 3$
The criteria of classifier selection are plotted as isometric lines [5], for example:
• iso-accuracy lines $acc = P(\omega_0)(1 - fpr) + P(\omega_1)\, tpr$,
• iso-risk lines $r = L_{00} P(\omega_0)(1 - fpr) + L_{01} P(\omega_0)\, fpr + L_{10} P(\omega_1)(1 - tpr) + L_{11} P(\omega_1)\, tpr$,
• the line of minimal sensitivity (or specificity) for the Neyman-Pearson criterion.
Plots 3b and 3c show the influence of the prior probabilities on the slope of the iso-accuracy lines.
When negative examples dominate the learning set, the maximum accuracy rule tends to
be conservative.
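The dependence of the slope on the priors follows from the two line equations. The short derivation below is added for illustration and uses only the symbols already defined; it treats acc and r as constants along a line in the (fpr, tpr) plane.

```latex
\begin{align*}
acc &= P(\omega_0)(1 - fpr) + P(\omega_1)\, tpr
  &&\Rightarrow\quad
  tpr = \frac{P(\omega_0)}{P(\omega_1)}\, fpr + \frac{acc - P(\omega_0)}{P(\omega_1)}, \\
r &= \text{const}
  &&\Rightarrow\quad
  \frac{\mathrm{d}\, tpr}{\mathrm{d}\, fpr}
  = \frac{P(\omega_0)}{P(\omega_1)} \cdot \frac{L_{01} - L_{00}}{L_{10} - L_{11}}.
\end{align*}
```

The iso-accuracy slope is therefore $P(\omega_0)/P(\omega_1)$: it equals 3 when $P(\omega_1)/P(\omega_0) = 1/3$ and 1/3 when $P(\omega_1)/P(\omega_0) = 3$. A steep line touches the ROC curve at low fpr, which is why dominance of negative examples leads to a conservative rule. The iso-risk slope has the same form as the threshold on the right side of rule (6).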
5. ROC-BASED CONSTRUCTION OF DECISION BOUNDARIES
A ROC curve can be obtained by two approaches: parametric and non-parametric. The first
operates on the theoretical model of the class-conditional density functions [3, 12]. The second
is model-free and directly classifies the test set [4, 12]. Because the number of
objects is finite, the number of distinct rules is also finite, and so the ROC curve is piecewise linear. The
ranking capability of a classifier is measured by the area under the curve (AUC) [6].
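The non-parametric construction can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the procedure used in the paper: the function name and the example scores are hypothetical, an object is assigned to the positive class when its score is not lower than the threshold, and the AUC is computed with the trapezoidal rule.

```python
import numpy as np

def empirical_roc(scores, labels):
    """Non-parametric ROC points (fpr, tpr) obtained by sweeping the threshold.

    labels: 1 for positive (abnormal) objects, 0 for negative (normal) ones.
    An object is assigned to the positive class when its score is >= t.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    # Start above every score (all objects negative), then visit each
    # distinct score level in decreasing order.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    fpr = np.array([np.sum((scores >= t) & (labels == 0)) / n_neg for t in thresholds])
    tpr = np.array([np.sum((scores >= t) & (labels == 1)) / n_pos for t in thresholds])
    return fpr, tpr, thresholds

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the piecewise linear ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical classifier outputs for two negative and two positive objects.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
fpr, tpr, thresholds = empirical_roc(scores, labels)
auc = area_under_curve(fpr, tpr)
print(auc)
```

In this toy example three of the four positive-negative pairs are ranked correctly, which agrees with the computed AUC of 0.75.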
Fig. 4 shows non-parametric ROC curves evaluated for the output (likelihood ratio) of the
LDA and QDA classifiers. The decision rules obtained from three criteria are marked:
• Bayesian MAP rule with prior probabilities estimated from the learning set,
• maximum accuracy in the test set,
• maximum accuracy in the test set among the rules with sensitivity not lower than 90%.
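The last two criteria amount to a search over the points of the empirical ROC curve. A minimal sketch, assuming the curve is already available as arrays of fpr and tpr values; the function name and the example points are hypothetical:

```python
import numpy as np

def select_rules(fpr, tpr, n_neg, n_pos, min_sens=0.9):
    """Indices of the maximum-accuracy point and of the maximum-accuracy
    point among those with sensitivity (tpr) not lower than min_sens."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    # Accuracy at each ROC point, with priors taken as class fractions.
    acc = (n_neg * (1.0 - fpr) + n_pos * tpr) / (n_neg + n_pos)
    best = int(np.argmax(acc))
    # Points that violate the sensitivity constraint are excluded.
    constrained = np.where(tpr >= min_sens, acc, -np.inf)
    best_constrained = int(np.argmax(constrained))
    return best, best_constrained, acc

# Hypothetical ROC points, ordered by decreasing threshold.
fpr = [0.0, 0.0, 0.1, 0.1, 0.3, 1.0]
tpr = [0.0, 0.6, 0.6, 0.8, 0.95, 1.0]
best, best_constrained, acc = select_rules(fpr, tpr, n_neg=100, n_pos=100)
print(best, best_constrained)
```

Because the point (fpr, tpr) = (1, 1) always satisfies the sensitivity constraint, the constrained search never comes back empty.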
Fig. 4. ROC curves for LDA and QDA classifiers with calculated AUC value
Table 2
Results of ROC-based correction of LDA and QDA classifiers for the vector [D, E]^T (all values in %)
                 LDA                       QDA
Rule             spec    sens    acc       spec    sens    acc
MAP              83.7    80.0    82.1      88.5    81.2    85.3
max acc          92.3    77.5    85.9      94.2    80.0    88.0
sens ≥ 0.9       60.6    90.0    73.4      79.8    90.0    84.2
The measures of performance for these classifiers are presented in Table 2. Because of the differences
between the Gaussian model and the real scatter of objects, the MAP rule does not provide the
maximum accuracy in the given data set. That goal is reached with a more conservative rule.
The last criterion is applied to ensure that at least 9 out of 10 abnormal PERG responses are
recognised. It provides 90% sensitivity at the expense of lower specificity.
In Fig. 5 the decision boundaries obtained from these rules are plotted on the E-D plane.
Fig. 5. Decision boundaries of LDA and QDA classifiers for the rules indicated from ROC analysis
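The accuracies in Table 2 can be cross-checked against the class sizes given in Section 2.2, because with priors equal to the class fractions the accuracy is the weighted mean of specificity and sensitivity. The check below is an illustration added here; the values are as read from Table 2, and the variable and function names are hypothetical.

```python
n_negative, n_positive = 104, 80        # class sizes of the PERG data base

# (spec, sens, acc) in percent, as read from Table 2.
table2 = {
    ("LDA", "MAP"): (83.7, 80.0, 82.1),
    ("LDA", "max acc"): (92.3, 77.5, 85.9),
    ("LDA", "sens >= 0.9"): (60.6, 90.0, 73.4),
    ("QDA", "MAP"): (88.5, 81.2, 85.3),
    ("QDA", "max acc"): (94.2, 80.0, 88.0),
    ("QDA", "sens >= 0.9"): (79.8, 90.0, 84.2),
}

def accuracy_from_rates(spec, sens):
    """acc = P(w_0) * spec + P(w_1) * sens, with priors as class fractions."""
    return (n_negative * spec + n_positive * sens) / (n_negative + n_positive)

max_deviation = max(
    abs(accuracy_from_rates(spec, sens) - acc)
    for spec, sens, acc in table2.values()
)
print(round(max_deviation, 3))
```

The recomputed accuracies agree with the tabulated ones to within the rounding of the one-decimal entries.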
6. CONCLUSIONS
For the given feature vector $\mathbf{x} = [D, E]^T$, decision rules satisfying two
different criteria have been obtained. The LDA and QDA classifiers have been compared according to the
best achieved accuracy and also the general ability of class separation measured by AUC.
ROC analysis has been applied to compare 100 variants of Bayesian classifiers of the PERG signal
[11]. For the considered PERG data base the highest accuracy achieved is about 90%. With
the requirement of 90% sensitivity it has been possible to obtain an overall
accuracy of 87.5%. ROC analysis may be applied to the assessment of other decision
algorithms that produce a scoring output (i.e. a result on at least an ordinal scale).
BIBLIOGRAPHY
1. Bach M., Speidel-Fiaux A.: Pattern Electroretinogram in glaucoma and ocular hypertension. “Documenta Ophthalmologica” 1989, Vol. 73, p. 173-181.
2. Bayer A.U., Maag K.-P., Erb C.: Detection of Optic Neuropathy in Glaucomatous Eyes with Normal Standard Visual Fields Using a Test Battery of Short-wavelength Automated Perimetry and Pattern Electroretinography. “Ophthalmology” 2002, Vol. 109, No. 7, p. 1350-1361.
3. Brown C.D., Davis H.T.: Receiver operating characteristic curves and related decision measures: A tutorial. “Chemometrics and Intelligent Laboratory Systems” 2006, Vol. 80, p. 24-38.
4. Fawcett T.: An introduction to ROC analysis. “Pattern Recognition Letters” 2006, Vol. 27, p. 861-874.
5. Fürnkranz J., Flach P.A.: ROC’n’Rule Learning – Towards a Better Understanding of Covering Algorithms. “Machine Learning” 2005, Vol. 58, p. 39-77.
6. Hanley J.A., McNeil B.J.: The meaning and use of the area under a receiver operating characteristic (ROC) Curve. “Radiology” 1982, Vol. 143, p. 29-36.
7. Holder G.E.: Pattern Electroretinography (PERG) and an Integrated Approach to Visual Pathway Diagnosis. “Progress in Retinal and Eye Research” 2001, Vol. 20, No. 4, p. 531-561.
8. Holder G.E., Brigell M.G., Hawlina M., Meigen T., Vaegan, Bach M.: ISCEV standard for clinical pattern electroretinography – 2007 update. “Documenta Ophthalmologica” 2007, No. 114, p. 111-116.
9. Kara S., Guven A.: Training a learning vector quantization network using the pattern electroretinography signals. “Computers in Biology and Medicine” 2007, Vol. 37, p. 77-82.
10. Palacz O., Lubiński W., Penkala K.: Elektrofizjologiczna diagnostyka kliniczna układu wzrokowego. OFTAL, Warszawa 2003.
11. Włodarski M.: Wykorzystanie statystycznych klasyfikatorów wzorców do analizy i separacji cyfrowych zapisów elektrofizjologicznych na przykładzie elektroretinogramów PERG. Rozprawa doktorska, Zachodniopomorski Uniwersytet Technologiczny w Szczecinie, Wydział Elektryczny, Szczecin 2010.
12. Włodarski M.: Porównanie parametrycznej i nieparametrycznej metody obliczania krzywej ROC na przykładzie zbioru sygnałów elektroretinograficznych. “Metody Informatyki Stosowanej” 2009, No. 2, p. 177-192.
13. Włodarski M., Brykalski A.: Comparison of bayesian classifiers of pattern electroretinogram based on PERG waveform attributes and coefficients of wavelet compression. Międzynarodowa Konferencja z Podstaw Elektrotechniki i Teorii Obwodów, XXX IC-SPETO’2007 Gliwice-Ustroń 23-26 maja 2007, p. 141-142.
14. Włodarski M., Brykalski A.: Klasyfikacja sygnałów elektroretinograficznych z wykorzystaniem kwadratowych funkcji dyskryminacyjnych. Międzynarodowa Konferencja z Podstaw Elektrotechniki i Teorii Obwodów, XXIX IC-SPETO’2006 Gliwice-Ustroń 24-27 maja 2006, p. 517-520.
Overview
The article presents the application of ROC analysis in the design of pattern recognition
systems, using the classification of one-dimensional electroretinographic signals as an example. The result of a medical test is interpreted in two classes, as negative
or positive. In electrophysiological examinations the electrical activity of the organism,
recorded as a digital signal or image, is assessed. Because of the
large amount of data, an expert creating decision rules can be effectively
supported by an automatic pattern recognition system. ROC analysis makes it possible
to assess and compare different variants of the decision algorithm.
The classified objects are exemplified by a set of pattern electroretinogram (PERG)
recordings. The PERG signal is a
recording of the electrical response of the retina to a specific light stimulus in the form of
a black and white checkerboard with alternately reversing fields. An abnormal PERG
recording may indicate diseases of the deeper layers of the retina and of the optic
nerve, such as glaucoma. The data base contains 80 recordings from diseased eyes
and 104 recordings from eyes without disease. In accordance with the recommendations
of ISCEV, the PERG signal is described by the locations of its extremes.
Because of the natural variability of the examined population, it is justified to use
statistical classifiers that model the distribution of the feature vector within the classes. The
classical binormal parametric model used in linear and quadratic discriminant
analysis (LDA and QDA) was employed.
ROC analysis is applied to the continuous output variable (likelihood ratio) produced by the probabilistic classifier. The decision boundaries resulting
from the parametric model can be corrected on the basis of the available data. On the
non-parametric ROC curve one can indicate decision rules that meet the designer's
requirements, for example error minimisation, risk minimisation or the
Neyman-Pearson criterion. For the considered base of PERG recordings it was found that the corrected
LDA and QDA methods can reach a classification accuracy of about 90%. With
the requirement of 90% sensitivity (the rate of detecting recordings with disease), however, the overall
accuracy is at most 87.5% for this type of decision boundary. The method can be
applied to other decision algorithms that produce an output variable on an
ordinal scale.
