All of Statistics: A Concise Course in
Statistical Inference
Brief Contents
1. Introduction
Part I Probability
2. Probability
3. Random Variables
4. Expectation
5. Inequalities
6. Convergence of Random Variables
Part II Statistical Inference
7. Models, Statistical Inference and Learning
8. Estimating the CDF and Statistical Functionals
9. The Bootstrap
10. Parametric Inference
11. Hypothesis Testing and p-values
12. Bayesian Inference
13. Statistical Decision Theory
Part III Statistical Models and Methods
14. Linear Regression
15. Multivariate Models
16. Inference about Independence
17. Undirected Graphs and Conditional Independence
18. Loglinear Models
19. Causal Inference
20. Directed Graphs
21. Nonparametric Curve Estimation
22. Smoothing Using Orthogonal Functions
23. Classification
24. Stochastic Processes
25. Simulation Methods
Appendix Fundamental Concepts in Inference

The goal of this book is to provide a broad background in probability and statistics for students in statistics, computer science (especially data mining and machine learning), mathematics, and related disciplines. This book covers a much wider range of topics than a typical introductory text on mathematical statistics. It includes modern topics like nonparametric curve estimation, bootstrapping and classification, topics that are usually relegated to follow-up courses. The reader is assumed to know calculus and a little linear algebra. No previous knowledge of probability and statistics is required. The text is suitable for advanced undergraduates and graduate students.

Statistics, data mining and machine learning are all concerned with collecting and analyzing data. For some time, statistics research was conducted in statistics departments while data mining and machine learning research was conducted in computer science departments. Statisticians thought that computer scientists were reinventing the wheel. Computer scientists thought that statistical theory didn't apply to their problems.

Things are changing. Statisticians now recognize that computer scientists are making novel contributions while computer scientists now recognize the generality of statistical theory and methodology. Clever data mining algorithms are more scalable than statisticians ever thought possible. Formal statistical theory is more pervasive than computer scientists had realized. All agree that students who deal with the analysis of data should be well grounded in basic probability and mathematical statistics. Using fancy tools like neural nets, boosting and support vector machines without understanding basic statistics is like doing brain surgery before knowing how to use a bandaid.

But where can students learn basic probability and statistics quickly? Nowhere. At least, that was my conclusion when my computer science colleagues kept asking me: "Where can I send my students to get a good understanding of modern statistics quickly?" The typical mathematical statistics course spends too much time on tedious and uninspiring topics (counting methods, two dimensional integrals etc.) at the expense of covering modern concepts (bootstrapping, curve estimation, graphical models etc.). So I set out to redesign our undergraduate honors course on probability and mathematical statistics. This book arose from that course. Here is a summary of the main features of this book.

1. The book is suitable for honors undergraduates in math, statistics and computer science as well as graduate students in computer science and other quantitative fields.

2. I cover advanced topics that are traditionally not taught in a first course. For example, nonparametric regression, bootstrapping, density estimation and graphical models.

3. I have omitted topics in probability that do not play a central role in statistical inference. For example, counting methods are virtually absent.

4. In general, I try to avoid belaboring tedious calculations in favor of emphasizing concepts.

5. I cover nonparametric inference before parametric inference. This is the opposite of most statistics books but I believe it is the right way to do it. Parametric models are unrealistic and pedagogically unnatural. (How would we know everything about the distribution except for one or two parameters?) I introduce statistical functionals and bootstrapping very early and students find this quite natural.

6. I abandon the usual "First Term = Probability" and "Second Term = Statistics" approach. Some students only take the first half and it would be a crime if they did not see any statistical theory. Furthermore, probability is more engaging when students can see it put to work in the context of statistics.

7. The course moves very quickly and covers much material. My colleagues joke that I cover all of statistics in this course and hence the title. The course is demanding but I have worked hard to make the material as intuitive as possible so that the material is very understandable despite the fast pace. Anyway, slow courses are boring.

8. As Richard Feynman pointed out, rigor and clarity are not synonymous. I have tried to strike a good balance. To avoid getting bogged down in uninteresting technical details, many results are stated without proof. The bibliographic references at the end of each chapter point the student to appropriate sources.

9. On my website are files with R code which students can use for doing all the computing. However, the book is not tied to R and any computing language can be used.

The first part of the text is concerned with probability theory, the formal language of uncertainty which is the basis of statistical inference. The basic problem that we study in probability is:

    Given a data generating process, what are the properties of the outcomes?

The second part of the book is about statistical inference and its close cousins, data mining and machine learning. The basic problem of statistical inference is the inverse of probability:

    Given the outcomes, what can we say about the process that generated the data?

These ideas are illustrated in Figure 1. Prediction, classification, clustering and estimation are all special cases of statistical inference. Data analysis, machine learning and data mining are various names given to the practice of statistical inference, depending on the context. The second part of the book contains one more chapter on probability that covers stochastic processes including Markov chains.

[Figure 1: Probability and inference. Labels: Data generating process; Observed data; Probability; Inference and Data Mining.]

I have drawn heavily on other books in many places. Most chapters contain a section called Bibliographic Remarks which serves both to acknowledge my debt to other authors and to point readers to other useful references. I would especially like to mention the books by DeGroot and Schervish (2002) and Grimmett and Stirzaker (1982) from which I adapted many examples and exercises.

Statistics/Data Mining Dictionary

Statisticians and computer scientists often use different language for the same thing. Here is a dictionary that the reader may want to return to throughout the course.

Statistics               Computer Science         Meaning
estimation               learning                 using data to estimate an unknown quantity
classification           supervised learning      predicting a discrete $Y$ from $X$
clustering               unsupervised learning    putting data into groups
data                     training sample          $(X_1, Y_1), \ldots, (X_n, Y_n)$
covariates               features                 the $X_i$'s
classifier               hypothesis               a map from covariates to outcomes
hypothesis               -                        subset of a parameter space $\Theta$
confidence interval      -                        interval that contains an unknown quantity with a prescribed frequency
directed acyclic graph   Bayes net                multivariate distribution with specified conditional independence relations
Bayesian inference       Bayesian inference       statistical methods for using data to update subjective beliefs
frequentist inference    -                        statistical methods for producing point estimates and confidence intervals with guarantees on frequency behavior
large deviation bounds   PAC learning             uniform bounds on probability of errors

Notation

Symbol                                  Meaning
$\mathbb{R}$                            real numbers
$\inf_{x \in A} f(x)$                   infimum: the largest number $y$ such that $y \le f(x)$ for all $x \in A$; think of this as the minimum of $f$
$\sup_{x \in A} f(x)$                   supremum: the smallest number $y$ such that $y \ge f(x)$ for all $x \in A$; think of this as the maximum of $f$
$n!$                                    $n \times (n-1) \times (n-2) \times \cdots \times 3 \times 2 \times 1$
$\binom{n}{k}$                          $\frac{n!}{k!\,(n-k)!}$
$\Gamma(\alpha)$                        Gamma function $\int_0^\infty y^{\alpha-1} e^{-y}\,dy$
$\omega$                                outcome
$\Omega$                                sample space (set of outcomes)
$A$                                     event (subset of $\Omega$)
$I_A(\omega)$                           indicator function; 1 if $\omega \in A$ and 0 otherwise
$P(A)$                                  probability of event $A$
$|A|$                                   number of points in set $A$
$F_X$                                   cumulative distribution function
$f_X$                                   probability density (or mass) function
$X \sim F$                              $X$ has distribution $F$
$X \sim f$                              $X$ has density $f$
$X \stackrel{d}{=} Y$                   $X$ and $Y$ have the same distribution
$X_1, \ldots, X_n \sim F$               sample of size $n$ from $F$
$\phi$                                  standard Normal probability density
$\Phi$                                  standard Normal distribution function
$z_\alpha$                              upper $\alpha$ quantile of $N(0,1)$, i.e. $z_\alpha = \Phi^{-1}(1-\alpha)$
$E(X) = \int x\,dF(x)$                  expected value (mean) of random variable $X$
$E(r(X)) = \int r(x)\,dF(x)$            expected value (mean) of $r(X)$
$V(X)$                                  variance of random variable $X$
$\mathrm{Cov}(X, Y)$                    covariance between $X$ and $Y$
$X_1, \ldots, X_n$                      data
$n$                                     sample size
$\stackrel{P}{\to}$                     convergence in probability
$\rightsquigarrow$                      convergence in distribution
$\stackrel{qm}{\to}$                    convergence in quadratic mean
$X_n \approx N(\mu, \sigma_n^2)$        $(X_n - \mu)/\sigma_n \rightsquigarrow N(0,1)$
$\mathfrak{F}$                          statistical model; a set of distribution functions, density functions or regression functions
$\theta$                                parameter
$\hat{\theta}$                          estimate of parameter
$T(F)$                                  statistical functional (the mean, for example)
$\mathcal{L}_n(\theta)$                 likelihood function
$x_n = o(a_n)$                          $\lim_{n \to \infty} x_n / a_n = 0$
$x_n = O(a_n)$                          $|x_n / a_n|$ is bounded for large $n$
$X_n = o_P(a_n)$                        $X_n / a_n \stackrel{P}{\to} 0$
$X_n = O_P(a_n)$                        $|X_n / a_n|$ is bounded in probability for large $n$
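
Useful Math Facts

$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \cdots$$

$$\sum_{j=k}^{\infty} r^j = \frac{r^k}{1-r} \quad \text{for } 0 < r < 1.$$

$$\lim_{n \to \infty} \left(1 + \frac{a}{n}\right)^n = e^a.$$

The Gamma function is defined by $\Gamma(\alpha) = \int_0^\infty y^{\alpha-1} e^{-y}\,dy$ for $\alpha > 0$. If $\alpha > 1$ then $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$. If $n$ is a positive integer then $\Gamma(n) = (n-1)!$. Some special values are: $\Gamma(1) = 1$ and $\Gamma(1/2) = \sqrt{\pi}$.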
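
The Gamma function facts listed here can be spot-checked numerically. The following is a minimal sketch using Python's standard `math.gamma`; the test value `alpha = 3.7` is an arbitrary choice for illustration and does not come from the text.

```python
import math

# Gamma(n) = (n - 1)! for positive integers n.
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1) for alpha > 1.
alpha = 3.7  # arbitrary illustrative value
recursion_gap = abs(math.gamma(alpha) - (alpha - 1) * math.gamma(alpha - 1))

# Special values: Gamma(1) = 1 and Gamma(1/2) = sqrt(pi).
gamma_one = math.gamma(1.0)
gamma_half = math.gamma(0.5)
print(gamma_one, gamma_half, math.sqrt(math.pi), recursion_gap)
```

Floating-point arithmetic makes these checks approximate, which is why the comparisons use a tolerance rather than exact equality.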

Probability is the mathematical language for quantifying uncertainty. We can apply probability theory to a diverse set of problems, from coin flipping to the analysis of computer algorithms. The starting point is to specify the sample space, the set of possible outcomes.

2.1 Sample Spaces and Events

The sample space $\Omega$ is the set of possible outcomes of an experiment. Points $\omega$ in $\Omega$ are called sample outcomes or realizations. Events are subsets of $\Omega$.

Example 2.1 If we toss a coin twice then $\Omega = \{HH, HT, TH, TT\}$. The event that the first toss is heads is $A = \{HH, HT\}$. ■

Example 2.2 Let $\omega$ be the outcome of a measurement of some physical quantity, for example, temperature. Then $\Omega = \mathbb{R} = (-\infty, \infty)$. The event that the measurement is larger than 10 but less than or equal to 23 is $A = (10, 23]$. ■

Example 2.3 If we toss a coin forever then the sample space is the infinite set
$$\Omega = \{\omega = (\omega_1, \omega_2, \omega_3, \ldots) : \omega_i \in \{H, T\}\}.$$
Let $E$ be the event that the first head appears on the third toss. Then
$$E = \{(\omega_1, \omega_2, \omega_3, \ldots) : \omega_1 = T,\ \omega_2 = T,\ \omega_3 = H,\ \omega_i \in \{H, T\} \text{ for } i > 3\}. \quad ■$$

Given an event $A$, let $A^c = \{\omega \in \Omega : \omega \notin A\}$ denote the complement of $A$. Informally, $A^c$ can be read as "not $A$." The complement of $\Omega$ is the empty set $\emptyset$. The union of events $A$ and $B$ is defined as $A \cup B = \{\omega \in \Omega : \omega \in A \text{ or } \omega \in B \text{ or both}\}$, which can be thought of as "$A$ or $B$." If $A_1, A_2, \ldots$ is a sequence of sets then
$$\bigcup_{i=1}^{\infty} A_i = \{\omega \in \Omega : \omega \in A_i \text{ for at least one } i\}.$$

The intersection of $A$ and $B$ is $A \cap B = \{\omega \in \Omega : \omega \in A \text{ and } \omega \in B\}$, read "$A$ and $B$." Sometimes we write $A \cap B$ as $AB$. If $A_1, A_2, \ldots$ is a sequence of sets then
$$\bigcap_{i=1}^{\infty} A_i = \{\omega \in \Omega : \omega \in A_i \text{ for all } i\}.$$

Let $A - B = \{\omega : \omega \in A,\ \omega \notin B\}$. If every element of $A$ is also contained in $B$ we write $A \subset B$ or, equivalently, $B \supset A$. If $A$ is a finite set, let $|A|$ denote the number of elements in $A$. See Table 1 for a summary.

Table 1. Sample space and events.

$\Omega$               sample space
$\omega$               outcome
$A$                    event (subset of $\Omega$)
$|A|$                  number of points in $A$ (if $A$ is finite)
$A^c$                  complement of $A$ (not $A$)
$A \cup B$             union ($A$ or $B$)
$A \cap B$ or $AB$     intersection ($A$ and $B$)
$A - B$                set difference (points in $A$ that are not in $B$)
$A \subset B$          set inclusion ($A$ is a subset of or equal to $B$)
$\emptyset$            null event (always false)
$\Omega$               true event (always true)

We say that $A_1, A_2, \ldots$ are disjoint or are mutually exclusive if $A_i \cap A_j = \emptyset$ whenever $i \ne j$. For example, $A_1 = [0, 1)$, $A_2 = [1, 2)$, $A_3 = [2, 3), \ldots$ are disjoint. A partition of $\Omega$ is a sequence of disjoint sets $A_1, A_2, \ldots$ such that $\bigcup_{i=1}^{\infty} A_i = \Omega$. Given an event $A$, define the indicator function of $A$ by
$$I_A(\omega) = I(\omega \in A) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A. \end{cases}$$

A sequence of sets $A_1, A_2, \ldots$ is monotone increasing if $A_1 \subset A_2 \subset \cdots$ and we define $\lim_{n \to \infty} A_n = \bigcup_{i=1}^{\infty} A_i$. A sequence of sets $A_1, A_2, \ldots$ is monotone decreasing if $A_1 \supset A_2 \supset \cdots$ and then we define $\lim_{n \to \infty} A_n = \bigcap_{i=1}^{\infty} A_i$. In either case, we will write $A_n \to A$.

Example 2.4 Let $\Omega = \mathbb{R}$ and let $A_i = [0, 1/i)$ for $i = 1, 2, \ldots$. Then $\bigcup_{i=1}^{\infty} A_i = [0, 1)$ and $\bigcap_{i=1}^{\infty} A_i = \{0\}$. If instead we define $A_i = (0, 1/i)$ then $\bigcup_{i=1}^{\infty} A_i = (0, 1)$ and $\bigcap_{i=1}^{\infty} A_i = \emptyset$. ■
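
For a finite sample space, the set operations of this section map directly onto Python's built-in `set` type. The sketch below uses the two-coin-toss sample space; the variable names and the second event `B` (heads on the second toss) are illustrative choices, not notation from the text.

```python
from itertools import product

# Sample space for two coin tosses: {HH, HT, TH, TT}.
omega = {"".join(t) for t in product("HT", repeat=2)}

A = {w for w in omega if w[0] == "H"}   # first toss is heads
B = {w for w in omega if w[1] == "H"}   # second toss is heads (illustrative)

complement_A = omega - A     # "not A"
union = A | B                # "A or B"
intersection = A & B         # "A and B"
difference = A - B           # points in A that are not in B


def indicator(event, w):
    """Indicator function: 1 if the outcome w lies in the event, else 0."""
    return 1 if w in event else 0


# A and its complement are disjoint and together form a partition of omega.
is_partition = (A & complement_A == set()) and (A | complement_A == omega)

print(sorted(omega), sorted(A), sorted(union), sorted(intersection))
```

Infinite sample spaces, such as tossing a coin forever, cannot be enumerated this way; the sketch only covers the finite case.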

2.2 Probability

We want to assign a real number $P(A)$ to every event $A$, called the probability of $A$. We also call $P$ a probability distribution or a probability measure. To qualify as a probability, $P$ has to satisfy three axioms:

Definition 2.5 A function $P$ that assigns a real number $P(A)$ to each event $A$ is a probability distribution or a probability measure if it satisfies the following three axioms:

Axiom 1: $P(A) \ge 0$ for every $A$.
Axiom 2: $P(\Omega) = 1$.
Axiom 3: If $A_1, A_2, \ldots$ are disjoint then
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$

There are many interpretations of $P(A)$. The two common interpretations are frequencies and degrees of belief. In the frequency interpretation, $P(A)$ is the long run proportion of times that $A$ is true in repetitions. For example, if we say that the probability of heads is 1/2, we mean that if we flip the coin many times then the proportion of times we get heads tends to 1/2 as the number of tosses increases. An infinitely long, unpredictable sequence of tosses whose limiting proportion tends to a constant is an idealization, much like the idea of a straight line in geometry. The degree-of-belief interpretation is that $P(A)$ measures an observer's strength of belief that $A$ is true. In either interpretation, we require that Axioms 1 to 3 hold. The difference in interpretation will not matter much until we deal with statistical inference. There, the differing interpretations lead to two schools of inference: the frequentist and the Bayesian schools. We defer discussion until later.

One can derive many properties of $P$ from the axioms. Here are a few:
$$\begin{aligned}
P(\emptyset) &= 0 \\
A \subset B &\implies P(A) \le P(B) \\
0 \le P(A) &\le 1 \\
P(A^c) &= 1 - P(A) \\
A \cap B = \emptyset &\implies P(A \cup B) = P(A) + P(B). \qquad (2.1)
\end{aligned}$$

A less obvious property is given in the following Lemma.

Lemma 2.6 For any events $A$ and $B$, $P(A \cup B) = P(A) + P(B) - P(AB)$.

Proof. Write $A \cup B = (AB^c) \cup (AB) \cup (A^cB)$ and note that these events are disjoint. Hence, making repeated use of the fact that $P$ is additive for disjoint events, we see that
$$\begin{aligned}
P(A \cup B) &= P\big((AB^c) \cup (AB) \cup (A^cB)\big) \\
&= P(AB^c) + P(AB) + P(A^cB) \\
&= P(AB^c) + P(AB) + P(A^cB) + P(AB) - P(AB) \\
&= P\big((AB^c) \cup (AB)\big) + P\big((A^cB) \cup (AB)\big) - P(AB) \\
&= P(A) + P(B) - P(AB). \quad ■
\end{aligned}$$

Example 2.7 Two coin tosses. Let $H_1$ be the event that heads occurs on toss 1 and let $H_2$ be the event that heads occurs on toss 2. If all outcomes are equally likely, that is, $P(\{H_1, H_2\}) = P(\{H_1, T_2\}) = P(\{T_1, H_2\}) = P(\{T_1, T_2\}) = 1/4$, then $P(H_1 \cup H_2) = P(H_1) + P(H_2) - P(H_1 H_2) = \frac{1}{2} + \frac{1}{2} - \frac{1}{4} = 3/4$. ■

Theorem 2.8 (Continuity of Probabilities) If $A_n \to A$ then $P(A_n) \to P(A)$ as $n \to \infty$.

Proof. Suppose that $A_n$ is monotone increasing so that $A_1 \subset A_2 \subset \cdots$. Let $A = \lim_{n \to \infty} A_n = \bigcup_{i=1}^{\infty} A_i$. Define $B_1 = A_1$, $B_2 = \{\omega \in \Omega : \omega \in A_2,\ \omega \notin A_1\}$, $B_3 = \{\omega \in \Omega : \omega \in A_3,\ \omega \notin A_2,\ \omega \notin A_1\}, \ldots$ It can be shown that $B_1, B_2, \ldots$ are disjoint, $A_n = \bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i$ for each $n$ and $\bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i$. (See exercise 1.) From Axiom 3,
$$P(A_n) = P\left(\bigcup_{i=1}^{n} B_i\right) = \sum_{i=1}^{n} P(B_i)$$
and hence, using Axiom 3 again,
$$\lim_{n \to \infty} P(A_n) = \lim_{n \to \infty} \sum_{i=1}^{n} P(B_i) = \sum_{i=1}^{\infty} P(B_i) = P\left(\bigcup_{i=1}^{\infty} B_i\right) = P(A). \quad ■$$
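
The identity $P(A \cup B) = P(A) + P(B) - P(AB)$ and the two-coin-toss calculation from earlier in this section can be checked by brute-force enumeration. This is a minimal sketch that assumes equally likely outcomes; the helper `prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# Two tosses of a fair coin: four equally likely outcomes.
omega = set(product("HT", repeat=2))


def prob(event):
    """Uniform probability on a finite sample space: |A| / |Omega|."""
    return Fraction(len(event), len(omega))


H1 = {w for w in omega if w[0] == "H"}   # heads on toss 1
H2 = {w for w in omega if w[1] == "H"}   # heads on toss 2

lhs = prob(H1 | H2)                          # P(H1 or H2) by direct counting
rhs = prob(H1) + prob(H2) - prob(H1 & H2)    # P(H1) + P(H2) - P(H1 H2)
print(lhs, rhs)
```

Using `Fraction` keeps the arithmetic exact, so the two sides can be compared with `==` rather than a floating-point tolerance.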

2.3 Probability on Finite Sample Spaces

Suppose that the sample space $\Omega = \{\omega_1, \ldots, \omega_n\}$ is finite. For example, if we toss a die twice, then $\Omega$ has 36 elements: $\Omega = \{(i, j) : i, j \in \{1, \ldots, 6\}\}$. If each outcome is equally likely, then $P(A) = |A|/36$ where $|A|$ denotes the number of elements in $A$. The probability that the sum of the dice is 11 is 2/36 since there are two outcomes that correspond to this event.

In general, if $\Omega$ is finite and if each outcome is equally likely, then
$$P(A) = \frac{|A|}{|\Omega|},$$
which is called the uniform probability distribution. To compute probabilities, we need to count the number of points in an event $A$. Methods for counting points are called combinatorial methods. We needn't delve into these in any great detail. We will, however, need a few facts from counting theory that will be useful later. Given $n$ objects, the number of ways of ordering these objects is $n! = n(n-1)(n-2) \cdots 3 \cdot 2 \cdot 1$. For convenience, we define $0! = 1$. We also define
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad (2.2)$$
read "$n$ choose $k$", which is the number of distinct ways of choosing $k$ objects from $n$. For example, if we have a class of 20 people and we want to select a committee of 3 students, then there are
$$\binom{20}{3} = \frac{20!}{3!\,17!} = \frac{20 \times 19 \times 18}{3 \times 2 \times 1} = 1140$$
possible committees. We note the following properties:
$$\binom{n}{0} = \binom{n}{n} = 1 \quad \text{and} \quad \binom{n}{k} = \binom{n}{n-k}.$$
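
The dice and committee counts in this section are small enough to verify by enumeration. The sketch below relies on Python's standard `math.comb` and `math.factorial`; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

# Two rolls of a die: 36 equally likely outcomes (i, j).
omega = list(product(range(1, 7), repeat=2))
sum_is_11 = [w for w in omega if sum(w) == 11]
p_sum_11 = Fraction(len(sum_is_11), len(omega))   # |A| / |Omega|

# Committees of 3 chosen from a class of 20.
committees = comb(20, 3)
same_by_formula = factorial(20) // (factorial(3) * factorial(17))

print(len(omega), sum_is_11, p_sum_11, committees)
```

`Fraction` reduces 2/36 to its lowest terms, so the printed probability appears as 1/18.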

2.4 Independent Events

If we flip a fair coin twice, then the probability of two heads is $\frac{1}{2} \times \frac{1}{2}$. We multiply the probabilities because we regard the two tosses as independent. The formal definition of independence is as follows.

Definition 2.9 Two events $A$ and $B$ are independent if
$$P(AB) = P(A)P(B) \qquad (2.3)$$
and we write $A \perp\!\!\!\perp B$. A set of events $\{A_i : i \in I\}$ is independent if
$$P\left(\bigcap_{i \in J} A_i\right) = \prod_{i \in J} P(A_i)$$
for every finite subset $J$ of $I$.

Independence can arise in two distinct ways. Sometimes, we explicitly assume that two events are independent. For example, in tossing a coin twice, we usually assume the tosses are independent, which reflects the fact that the coin has no memory of the first toss. In other instances, we derive independence by verifying that $P(AB) = P(A)P(B)$ holds. For example, in tossing a fair die, let $A = \{2, 4, 6\}$ and let $B = \{1, 2, 3, 4\}$. Then $A \cap B = \{2, 4\}$, $P(AB) = 2/6 = P(A)P(B) = (1/2) \times (2/3)$ and so $A$ and $B$ are independent. In this case, we didn't assume that $A$ and $B$ are independent; it just turned out that they were.

Suppose that $A$ and $B$ are disjoint events, each with positive probability. Can they be independent? No. This follows since $P(A)P(B) > 0$ yet $P(AB) = P(\emptyset) = 0$. Except in this special case, there is no way to judge independence by looking at the sets in a Venn diagram.
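
Deriving independence by checking the product rule is mechanical on a finite sample space. The sketch below repeats the fair-die calculation with exact fractions; the third event `C` is a hypothetical addition used only to illustrate the disjoint case.

```python
from fractions import Fraction

omega = set(range(1, 7))   # one roll of a fair die


def prob(event):
    """Uniform probability of an event on the die's sample space."""
    return Fraction(len(event & omega), len(omega))


def independent(a, b):
    """Check the product rule P(AB) = P(A) P(B) exactly."""
    return prob(a & b) == prob(a) * prob(b)


A = {2, 4, 6}
B = {1, 2, 3, 4}
C = {1, 3, 5}   # hypothetical extra event, disjoint from A

a_b_independent = independent(A, B)   # 1/3 == 1/2 * 2/3
a_c_independent = independent(A, C)   # 0 versus 1/2 * 1/2
print(a_b_independent, a_c_independent)
```

The second check shows the point about disjoint events: `A` and `C` never occur together, yet both have positive probability, so the product rule fails.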

Example 2.10 Toss a fair coin 10 times. Let $A$ = "at least one head." Let $T_j$ be the event that tails occurs on the $j$-th toss. Then
$$\begin{aligned}
P(A) &= 1 - P(A^c) \\
&= 1 - P(\text{all tails}) \\
&= 1 - P(T_1 T_2 \cdots T_{10}) \\
&= 1 - P(T_1)P(T_2) \cdots P(T_{10}) \quad \text{using independence} \\
&= 1 - \left(\frac{1}{2}\right)^{10} \approx .999. \quad ■
\end{aligned}$$

Example 2.11 Two people take turns trying to sink a basketball into a net. Person 1 succeeds with probability 1/3 while person 2 succeeds with probability 1/4. What is the probability that person 1 succeeds before person 2? Let $E$ denote the event of interest. Let $A_j$ be the event that the first success is by person 1 and that it occurs on trial number $j$. Note that $A_1, A_2, \ldots$ are disjoint and that $E = \bigcup_{j=1}^{\infty} A_j$. Hence,
$$P(E) = \sum_{j=1}^{\infty} P(A_j).$$
Now, $P(A_1) = 1/3$. $A_2$ occurs if we have the sequence: person 1 misses, person 2 misses, person 1 succeeds. This has probability $P(A_2) = (2/3)(3/4)(1/3) = (1/2)(1/3)$. Following this logic we see that $P(A_j) = (1/2)^{j-1}(1/3)$. Hence,
$$P(E) = \sum_{j=1}^{\infty} \frac{1}{3}\left(\frac{1}{2}\right)^{j-1} = \frac{1}{3} \sum_{j=1}^{\infty} \left(\frac{1}{2}\right)^{j-1} = \frac{2}{3}.$$
Here we used the fact that, if $0 < r < 1$ then $\sum_{j=k}^{\infty} r^j = r^k/(1-r)$. ■
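
Both calculations above, the chance of at least one head in ten tosses and the basketball series, reduce to exact rational arithmetic. The following sketch compares a long partial sum of the series with the closed form from the geometric series; the cutoff of 60 terms is an arbitrary illustrative choice.

```python
from fractions import Fraction

# At least one head in 10 independent tosses of a fair coin.
p_at_least_one_head = 1 - Fraction(1, 2) ** 10

# Person 1 succeeds with probability 1/3, person 2 with probability 1/4.
p1, p2 = Fraction(1, 3), Fraction(1, 4)
both_miss = (1 - p1) * (1 - p2)   # one full round with no success: 1/2

# Partial sum of both_miss**(j - 1) * p1 over the first 60 trials.
partial_sum = sum(both_miss ** (j - 1) * p1 for j in range(1, 61))

# Closed form from the geometric series: p1 / (1 - both_miss).
closed_form = p1 / (1 - both_miss)

print(float(p_at_least_one_head), float(partial_sum), closed_form)
```

The partial sum falls short of the closed form only by the tail of the series, which shrinks geometrically with the number of terms.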

Summary of Independence

1. $A$ and $B$ are independent if $P(AB) = P(A)P(B)$.
2. Independence is sometimes assumed and sometimes derived.
3. Disjoint events with positive probability are not independent.

2.5 Conditional Probability

Assuming that $P(B) > 0$, we define the conditional probability of $A$ given that $B$ has occurred as follows.

Definition 2.12 If $P(B) > 0$ then the conditional probability of $A$ given $B$ is
$$P(A \mid B) = \frac{P(AB)}{P(B)}. \qquad (2.4)$$

Think of $P(A \mid B)$ as the fraction of times $A$ occurs among those in which $B$ occurs. Here are some facts about conditional probabilities. For any fixed $B$ such that $P(B) > 0$, $P(\cdot \mid B)$ is a probability, i.e. it satisfies the three axioms of probability. In particular, $P(A \mid B) \ge 0$, $P(\Omega \mid B) = 1$ and if $A_1, A_2, \ldots$ are disjoint then $P(\bigcup_{i=1}^{\infty} A_i \mid B) = \sum_{i=1}^{\infty} P(A_i \mid B)$. But it is in general not true that $P(A \mid B \cup C) = P(A \mid B) + P(A \mid C)$. The rules of probability apply to events on the left of the bar. In general it is not the case that $P(A \mid B) = P(B \mid A)$. People get this confused all the time. For example, the probability of spots given you have measles is 1 but the probability that you have measles given that you have spots is not 1. In this case, the difference between $P(A \mid B)$ and $P(B \mid A)$ is obvious but there are cases where it is less obvious. This mistake is made often enough in legal cases that it is sometimes called the prosecutor's fallacy.

Example 2.13 A medical test for a disease $D$ has outcomes $+$ and $-$. The probabilities are:

          $D$       $D^c$
$+$     .0081     .0900
$-$     .0009     .9010

From the definition of conditional probability, $P(+ \mid D) = P(+ \cap D)/P(D) = .0081/(.0081 + .0009) = .9$ and $P(- \mid D^c) = P(- \cap D^c)/P(D^c) = .9010/(.9010 + .0900) \approx .9$. Apparently, the test is fairly accurate. Sick people yield a positive 90 percent of the time and healthy people yield a negative about 90 percent of the time. Suppose you go for a test and get a positive. What is the probability you have the disease? Most people answer .90. The correct answer is $P(D \mid +) = P(+ \cap D)/P(+) = .0081/(.0081 + .0900) = .08$. The lesson here is that you need to compute the answer numerically. Don't trust your intuition. ■
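
The medical-test calculation can be reproduced directly from the four joint probabilities. The sketch below assumes the table values .0081, .0900, .0009 and .9010; the dictionary layout and variable names are illustrative choices.

```python
# Joint probabilities P(test result, disease status), assumed from the table.
joint = {
    ("+", "D"): 0.0081, ("+", "not D"): 0.0900,
    ("-", "D"): 0.0009, ("-", "not D"): 0.9010,
}

# Marginals obtained by summing the joint probabilities.
p_d = joint[("+", "D")] + joint[("-", "D")]               # P(D)
p_not_d = joint[("+", "not D")] + joint[("-", "not D")]   # P(D^c)
p_pos = joint[("+", "D")] + joint[("+", "not D")]         # P(+)

# Conditional probabilities from the definition P(A | B) = P(AB) / P(B).
p_pos_given_d = joint[("+", "D")] / p_d               # P(+ | D)
p_neg_given_not_d = joint[("-", "not D")] / p_not_d   # P(- | D^c)
p_d_given_pos = joint[("+", "D")] / p_pos             # P(D | +)

print(round(p_pos_given_d, 3), round(p_neg_given_not_d, 3), round(p_d_given_pos, 3))
```

The small value of `p_d_given_pos` comes from the rarity of the disease: most positives are false positives drawn from the much larger healthy group.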

If $A$ and $B$ are independent events then
$$P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A).$$
So another interpretation of independence is that knowing $B$ doesn't change the probability of $A$.

From the definition of conditional probability we can write $P(AB) = P(A \mid B)P(B)$ and also $P(AB) = P(B \mid A)P(A)$. Often, these formulae give us a convenient way to compute $P(AB)$ when $A$ and $B$ are not independent.

Example 2.14 Draw two cards from a deck, without replacement. Let $A$ be the event that the first draw is the Ace of Clubs and let $B$ be the event that the second draw is the Queen of Diamonds. Then $P(A, B) = P(A)P(B \mid A) = (1/52) \times (1/51)$. ■

Summary of Conditional Probability

1. If $P(B) > 0$ then $P(A \mid B) = \dfrac{P(AB)}{P(B)}$.
2. $P(\cdot \mid B)$ satisfies the axioms of probability, for fixed $B$. In general, $P(A \mid \cdot)$ does not satisfy the axioms of probability, for fixed $A$.
3. In general, $P(A \mid B) \ne P(B \mid A)$.
4. $A$ and $B$ are independent if and only if $P(A \mid B) = P(A)$.

2.6 Bayes' Theorem

Bayes' theorem is a useful result that is the basis of "expert systems" and "Bayes' nets." First, we need a preliminary result.

Theorem 2.15 (The Law of Total Probability) Let $A_1, \ldots, A_k$ be a partition of $\Omega$. Then, for any event $B$,
$$P(B) = \sum_{i=1}^{k} P(B \mid A_i) P(A_i).$$

Proof. Define $C_j = BA_j$ and note that $C_1, \ldots, C_k$ are disjoint and that $B = \bigcup_{j=1}^{k} C_j$. Hence,
$$P(B) = \sum_j P(C_j) = \sum_j P(BA_j) = \sum_j P(B \mid A_j) P(A_j)$$
since $P(BA_j) = P(B \mid A_j)P(A_j)$ from the definition of conditional probability. ■

Theorem 2.16 (Bayes' Theorem) Let $A_1, \ldots, A_k$ be a partition of $\Omega$ such that $P(A_i) > 0$ for each $i$. If $P(B) > 0$ then, for each $i = 1, \ldots, k$,
$$P(A_i \mid B) = \frac{P(B \mid A_i) P(A_i)}{\sum_j P(B \mid A_j) P(A_j)}. \qquad (2.5)$$

Remark 2.17 We call $P(A_i)$ the prior probability of $A_i$ and $P(A_i \mid B)$ the posterior probability of $A_i$.

Proof. We apply the definition of conditional probability twice, followed by the law of total probability:
$$P(A_i \mid B) = \frac{P(A_i B)}{P(B)} = \frac{P(B \mid A_i) P(A_i)}{P(B)} = \frac{P(B \mid A_i) P(A_i)}{\sum_j P(B \mid A_j) P(A_j)}. \quad ■$$
Example. I divide my email into three categories: $A_1$ = "spam," $A_2$ = "low priority" and $A_3$ = "high priority." From previous experience I find that $\mathbb{P}(A_1) = .7$, $\mathbb{P}(A_2) = .2$ and $\mathbb{P}(A_3) = .1$. Of course, $.7 + .2 + .1 = 1$. Let $B$ be the event that the email contains the word "free." From previous experience, $\mathbb{P}(B \mid A_1) = .9$, $\mathbb{P}(B \mid A_2) = .01$, $\mathbb{P}(B \mid A_3) = .01$. (Note: $.9 + .01 + .01 \neq 1$.) I receive an email with the word "free." What is the probability that it is spam? Bayes' theorem yields
\[ \mathbb{P}(A_1 \mid B) = \frac{.9 \times .7}{(.9 \times .7) + (.01 \times .2) + (.01 \times .1)} = .995. \quad \blacksquare \]
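The Bayes' theorem calculation for the three email categories can be checked numerically. The following is a minimal sketch, assuming priors .7, .2, .1 and likelihoods .9, .01, .01 for the word "free"; the function name `posterior` is a hypothetical helper, not something from the text.

```python
def posterior(priors, likelihoods):
    """Return P(A_i | B) for every i, given P(A_i) and P(B | A_i)."""
    # Numerators of Bayes' theorem: P(B | A_i) * P(A_i).
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Law of total probability: P(B) is the sum of the numerators.
    total = sum(joint)
    return [j / total for j in joint]


priors = [0.7, 0.2, 0.1]         # spam, low priority, high priority
likelihoods = [0.9, 0.01, 0.01]  # P("free" | category)
post = posterior(priors, likelihoods)
print(round(post[0], 3))         # posterior probability of spam
```

The posteriors always sum to one because the denominator is exactly the sum of the numerators.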
Bibliographic Remarks

The material in this chapter is standard. Details can be found in any number of books. At the introductory level, there is DeGroot and Schervish (2002); at the intermediate level, Grimmett and Stirzaker (1982) and Karr (1993); and at the advanced level, Billingsley (1979) and Breiman (1968). I adapted many examples and problems from DeGroot and Schervish (2002) and Grimmett and Stirzaker (1982).
Technical Appendix

Generally, it is not feasible to assign probabilities to all subsets of a sample space $\Omega$. Instead, one restricts attention to a set of events called a $\sigma$-algebra or a $\sigma$-field, which is a class $\mathcal{A}$ that satisfies:

(i) $\emptyset \in \mathcal{A}$,
(ii) if $A_1, A_2, \ldots \in \mathcal{A}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$, and
(iii) $A \in \mathcal{A}$ implies that $A^c \in \mathcal{A}$.

The sets in $\mathcal{A}$ are said to be measurable. We call $(\Omega, \mathcal{A})$ a measurable space. If $\mathbb{P}$ is a probability measure defined on $\mathcal{A}$, then $(\Omega, \mathcal{A}, \mathbb{P})$ is called a probability space. When $\Omega$ is the real line, we take $\mathcal{A}$ to be the smallest $\sigma$-field that contains all the open subsets, which is called the Borel $\sigma$-field.
Exercises
K `x':)*)¨'*=O+32(4AS(4I+J$8'*)*0 &@G+Q254N5Q[&@G&@G^254#!46½ÛGK K H):0Q5V¨N5!WU&4+32546#=5&+Q#=54
S54I9!4;$&0!'*=5>9;$&0!4&K
ÛGKcQU&4+Q254/0!+J$¨+3464=<+Q0c':=4CE?%$8+Q'*#=Ï«Û[K ÒeK
3. Let $\Omega$ be a sample space and let $A_1, A_2, \ldots$ be events. Define $B_n = \bigcup_{i=n}^{\infty} A_i$ and $C_n = \bigcap_{i=n}^{\infty} A_i$.
(a) Show that $B_1 \supset B_2 \supset \cdots$ and that $C_1 \subset C_2 \subset \cdots$.
(b) Show that $\omega \in \bigcap_{n=1}^{\infty} B_n$ if and only if $\omega$ belongs to an infinite number of the events $A_1, A_2, \ldots$.
(c) Show that $\omega \in \bigcup_{n=1}^{\infty} C_n$ if and only if $\omega$ belongs to all the events $A_1, A_2, \ldots$ except possibly a finite number of those events.

4. Let $\{A_i : i \in I\}$ be a collection of events where $I$ is an arbitrary index set. Show that
\[ \Bigl(\bigcup_{i \in I} A_i\Bigr)^c = \bigcap_{i \in I} A_i^c \quad\text{and}\quad \Bigl(\bigcap_{i \in I} A_i\Bigr)^c = \bigcup_{i \in I} A_i^c. \]
Hint: First prove this for $I = \{1, \ldots, n\}$.

5. Suppose we toss a fair coin until we get exactly two heads. Describe the sample space $S$. What is the probability that exactly $k$ tosses are required?

6. Let $\Omega = \{0, 1, \ldots\}$. Prove that there does not exist a uniform distribution on $\Omega$, i.e. if $\mathbb{P}(A) = \mathbb{P}(B)$ whenever $|A| = |B|$, then $\mathbb{P}$ cannot satisfy the axioms of probability.

7. Let $A_1, A_2, \ldots$ be events. Show that
\[ \mathbb{P}\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) \le \sum_{n=1}^{\infty} \mathbb{P}(A_n). \]
Hint: Define $B_n = A_n - \bigcup_{i=1}^{n-1} A_i$. Then show that the $B_n$ are disjoint and that $\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} B_n$.

8. Suppose that $\mathbb{P}(A_i) = 1$ for each $i$. Prove that
\[ \mathbb{P}\Bigl(\bigcap_{i=1}^{\infty} A_i\Bigr) = 1. \]

9. For fixed $B$ such that $\mathbb{P}(B) > 0$, show that $\mathbb{P}(\cdot \mid B)$ satisfies the axioms of probability.

10. You have probably heard it before. Now you can solve it rigorously. It is called the "Monty Hall Problem." A prize is placed at random behind one of three doors. You pick a door. To be concrete, let's suppose you always pick door 1. Now Monty Hall chooses one of the other two doors, opens it and shows you that it is empty. He then gives you the opportunity to keep your door or switch to the other unopened door. Should you stay or switch? Intuition suggests it doesn't matter. The correct answer is that you should switch. Prove it. It will help to specify the sample space and the relevant events carefully. Thus write $\Omega = \{(\omega_1, \omega_2) : \omega_i \in \{1, 2, 3\}\}$ where $\omega_1$ is where the prize is and $\omega_2$ is the door Monty opens.

11. Suppose that $A$ and $B$ are independent events. Show that $A^c$ and $B^c$ are independent events.

12. There are three cards. The first is green on both sides, the second is red on both sides and the third is green on one side and red on the other. We choose a card at random and we see one side (also chosen at random). If the side we see is green, what is the probability that the other side is also green? Many people intuitively answer 1/2. Show that the correct answer is 2/3.

13. Suppose that a fair coin is tossed repeatedly until both a head and a tail have appeared at least once.
(a) Describe the sample space $\Omega$.
(b) What is the probability that three tosses will be required?

14. Show that if $\mathbb{P}(A) = 0$ or $\mathbb{P}(A) = 1$ then $A$ is independent of every other event. Show that if $A$ is independent of itself then $\mathbb{P}(A)$ is either 0 or 1.
GKT^254\N5Q#"5$&"5'*):':+.-O+Q2%$8+$P9J25'*):S¦2%$&0]"5)*?54\4I-#4I0'*0 5K H0Q0!?564Z'*=5S(4Nd4=5S54I=594
"d4I+ 244I=}9J25':)*S5!4= K \#=(0Q'*S(4c$Ã@$867':):a- 2T':+Q2 ¦9J25':)*S5!4= K
Ï$<?Ò $ @R':+Ã':0 [=( 2T=ß+32%$8+7$8+):4;$&0_+7&=54³9J2('*)*S 2%$807"5):?544I-&40NV 2T25$8+b':0`+3254
N5!#"%$&"('*)*',+.-Ã+32%$8+P$8+T):4;$&0_++325!44/9J25':)*S5!4=2%$;U#4"5)*?(44-#4I0
ÏB"×@Ò $ @T',+¦'*0 E=5 2T=+32%$8+M+32547-##?5=(>#40_+Ã9J25'*):S2%$&0i"5)*?5474I-&40V 2T2%$8+i'*0+3254
N5!#"%$&"('*)*',+.-Ã+32%$8+P$8+T):4;$&0_++325!44/9J25':)*S5!4=2%$;U#4"5)*?(44-#4I0
16. Show that
\[ \mathbb{P}(ABC) = \mathbb{P}(A \mid BC)\,\mathbb{P}(B \mid C)\,\mathbb{P}(C). \]

17. Suppose $k$ events form a partition of the sample space $\Omega$, i.e. they are disjoint and $\bigcup_{i=1}^{k} A_i = \Omega$. Assume that $\mathbb{P}(B) > 0$. Prove that if $\mathbb{P}(A_1 \mid B) < \mathbb{P}(A_1)$ then $\mathbb{P}(A_i \mid B) > \mathbb{P}(A_i)$ for some $i = 2, \ldots, k$.

18. Suppose that 30 percent of computer owners use a Macintosh, 50 percent use Windows and 20 percent use Linux. Suppose that 65 percent of the Mac users have succumbed to a computer virus, 82 percent of the Windows users get the virus and 50 percent of the Linux users get the virus. We select a person at random and learn that her system was infected with the virus. What is the probability that she is a Windows user?

19. A box contains 5 coins and each has a different probability of showing heads. Let $p_1, \ldots, p_5$ denote the probability of heads on each coin. Suppose that
\[ p_1 = 0, \quad p_2 = 1/4, \quad p_3 = 1/2, \quad p_4 = 3/4 \quad\text{and}\quad p_5 = 1. \]
Let $H$ denote "heads is obtained" and let $C_i$ denote the event that coin $i$ is selected.
(a) Select a coin at random and toss it. Suppose a head is obtained. What is the posterior probability that coin $i$ was selected ($i = 1, \ldots, 5$)? In other words, find $\mathbb{P}(C_i \mid H)$ for $i = 1, \ldots, 5$.
(b) Toss the coin again. What is the probability of another head? In other words, find $\mathbb{P}(H_2 \mid H_1)$ where $H_j$ = "heads on toss $j$."
Now suppose that the experiment was carried out as follows. We select a coin at random and toss it until a head is obtained.
(c) Find $\mathbb{P}(C_i \mid B_4)$ where $B_4$ = "first head is obtained on toss 4."

20. (Computer Experiment.) Suppose a coin has probability $p$ of falling heads. If we flip the coin many times, we would expect the proportion of heads to be near $p$. We will make this formal later. Take $p = .3$ and $n = 1{,}000$ and simulate $n$ coin flips. Plot the proportion of heads as a function of $n$. Repeat for $p = .03$.

21. (Computer Experiment.) Suppose we flip a coin $n$ times and let $p$ denote the probability of heads. Let $X$ be the number of heads. We call $X$ a binomial random variable, which is discussed in the next chapter. Intuition suggests that $X$ will be close to $np$. To see if this is true, we can repeat this experiment many times and average the $X$ values. Carry out a simulation and compare the average of the $X$'s to $np$. Try this for $p = .3$ and $n = 10, 100, 1000$.

22. (Computer Experiment.) Here we will get some experience simulating conditional probabilities. Consider tossing a fair die. Let $A = \{2, 4, 6\}$ and $B = \{1, 2, 3, 4\}$. Then, $\mathbb{P}(A) = 1/2$, $\mathbb{P}(B) = 2/3$ and $\mathbb{P}(AB) = 1/3$. Since $\mathbb{P}(AB) = \mathbb{P}(A)\,\mathbb{P}(B)$, the events $A$ and $B$ are independent. Simulate draws from the sample space and verify that $\hat{\mathbb{P}}(AB) = \hat{\mathbb{P}}(A)\,\hat{\mathbb{P}}(B)$, where $\hat{\mathbb{P}}(A)$ is the proportion of times $A$ occurred in the simulation and similarly for $\hat{\mathbb{P}}(AB)$ and $\hat{\mathbb{P}}(B)$. Now find two events $A$ and $B$ that are not independent. Compute $\hat{\mathbb{P}}(A)$, $\hat{\mathbb{P}}(B)$ and $\hat{\mathbb{P}}(AB)$. Compare the calculated values to their theoretical values. Report your results and interpret.
Introduction

Statistics and data mining are concerned with data. How do we link sample spaces and events to data? The link is provided by the concept of a random variable.
Definition 3.1 A random variable is a mapping $X : \Omega \to \mathbb{R}$ that assigns a real number $X(\omega)$ to each outcome $\omega$. (Technically, a random variable must be measurable. See the technical appendix for details.)

At a certain point in most probability courses, the sample space is rarely mentioned and we work directly with random variables. But you should keep in mind that the sample space is really there, lurking in the background.
Example 3.2 Flip a coin ten times. Let $X(\omega)$ be the number of heads in the sequence $\omega$. For example, if $\omega = HHTHHTHHTT$ then $X(\omega) = 6$. $\blacksquare$

Example 3.3 Let $\Omega = \{(x, y) : x^2 + y^2 \le 1\}$ be the unit disc. Consider drawing a point at random from $\Omega$. (We will make this idea more precise later.) A typical outcome is of the form $\omega = (x, y)$. Some examples of random variables are $X(\omega) = x$, $Y(\omega) = y$, $Z(\omega) = x + y$ and $W(\omega) = \sqrt{x^2 + y^2}$. $\blacksquare$
Given a random variable $X$ and a subset $A$ of the real line, define $X^{-1}(A) = \{\omega \in \Omega : X(\omega) \in A\}$ and let
\[ \mathbb{P}(X \in A) = \mathbb{P}(X^{-1}(A)) = \mathbb{P}(\{\omega \in \Omega : X(\omega) \in A\}), \]
\[ \mathbb{P}(X = x) = \mathbb{P}(X^{-1}(x)) = \mathbb{P}(\{\omega \in \Omega : X(\omega) = x\}). \]
$X$ denotes the random variable and $x$ denotes a possible value of $X$.
Example 3.4 Flip a coin twice and let $X$ be the number of heads. Then, $\mathbb{P}(X = 0) = \mathbb{P}(\{TT\}) = 1/4$, $\mathbb{P}(X = 1) = \mathbb{P}(\{HT, TH\}) = 1/2$ and $\mathbb{P}(X = 2) = \mathbb{P}(\{HH\}) = 1/4$. The random variable and its distribution can be summarized as follows:

    ω     P({ω})   X(ω)
    TT    1/4      0
    TH    1/4      1
    HT    1/4      1
    HH    1/4      2

    x     P(X = x)
    0     1/4
    1     1/2
    2     1/4

Try generalizing this to $n$ flips. $\blacksquare$
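One way to explore the generalization to $n$ flips is to enumerate the sample space directly and push each outcome through the mapping $X$. The sketch below assumes a fair coin; the function name `heads_distribution` is a hypothetical helper introduced only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n):
    """Exact distribution of X = number of heads in n fair coin flips."""
    # Each outcome omega is a tuple such as ('H', 'T'); X(omega) counts the H's.
    counts = Counter(omega.count("H") for omega in product("HT", repeat=n))
    total = 2 ** n  # all outcomes are equally likely
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}


dist = heads_distribution(2)
for x, p in dist.items():
    print(x, p)
```

Enumeration grows as $2^n$, so this is only practical for small $n$; it is meant to make the mapping from outcomes to values concrete.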
Distribution Functions and Probability Functions

Given a random variable $X$, we define an important function called the cumulative distribution function (or distribution function) in the following way.

Definition 3.5 The cumulative distribution function, or cdf, $F_X : \mathbb{R} \to [0, 1]$ of a random variable $X$ is defined by
\[ F_X(x) = \mathbb{P}(X \le x). \]

[Figure 3.1: cdf for flipping a coin twice (Example 3.6).]

You might wonder why we bother to define the cdf. You will see later that the cdf is a useful function: it effectively contains all the information about the random variable.

Example 3.6 Flip a fair coin twice and let $X$ be the number of heads. Then $\mathbb{P}(X = 0) = \mathbb{P}(X = 2) = 1/4$ and $\mathbb{P}(X = 1) = 1/2$. The distribution function is
\[ F_X(x) = \begin{cases} 0 & x < 0 \\ 1/4 & 0 \le x < 1 \\ 3/4 & 1 \le x < 2 \\ 1 & x \ge 2. \end{cases} \]
The cdf is shown in Figure 3.1. Although this example is simple, study it carefully. cdf's can be very confusing. Notice that the function is right continuous, non-decreasing and that it is defined for all $x$ even though the random variable only takes values 0, 1 and 2. Do you see why $F(1.4) = .75$? $\blacksquare$
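The step-function behaviour of the cdf for two fair coin flips can be reproduced in a few lines. This is a minimal sketch, assuming the mass function 1/4, 1/2, 1/4 on the values 0, 1, 2; the names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction

# Mass function of X = number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def cdf(x, pmf=pmf):
    """F(x) = P(X <= x): add up the mass at every value not exceeding x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))


for x in (-0.5, 0, 0.99, 1, 1.4, 2, 10):
    print(x, cdf(x))
```

Evaluating at 1.4 picks up the mass at 0 and at 1 but not at 2, which is why the cdf is defined, and constant, between the values the variable can actually take.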
The following result, which we do not prove, shows that the cdf completely determines the distribution of a random variable.

Theorem 3.7 Let $X$ have cdf $F$ and let $Y$ have cdf $G$. If $F(x) = G(x)$ for all $x$ then $\mathbb{P}(X \in A) = \mathbb{P}(Y \in A)$ for all $A$. (Technically, we only have that $\mathbb{P}(X \in A) = \mathbb{P}(Y \in A)$ for every measurable event $A$.)

Theorem 3.8 A function $F$ mapping the real line to $[0, 1]$ is a cdf for some probability measure $\mathbb{P}$ if and only if it satisfies the following three conditions:

(i) $F$ is non-decreasing, i.e. $x_1 < x_2$ implies that $F(x_1) \le F(x_2)$.
(ii) $F$ is normalized: $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.
(iii) $F$ is right-continuous, i.e. $F(x) = F(x^+)$ for all $x$, where
\[ F(x^+) = \lim_{y \to x,\; y > x} F(y). \]

Proof. Suppose that $F$ is a cdf. Let us show that (iii) holds. Let $x$ be a real number and let $y_1, y_2, \ldots$ be a sequence of real numbers such that $y_1 > y_2 > \cdots$ and $\lim_i y_i = x$. Let $A_i = (-\infty, y_i]$ and let $A = (-\infty, x]$. Note that $A = \bigcap_{i=1}^{\infty} A_i$ and also note that $A_1 \supset A_2 \supset \cdots$. Because the events are monotone, $\lim_i \mathbb{P}(A_i) = \mathbb{P}(\bigcap_i A_i)$. Thus,
\[ F(x) = \mathbb{P}(A) = \mathbb{P}\Bigl(\bigcap_i A_i\Bigr) = \lim_i \mathbb{P}(A_i) = \lim_i F(y_i) = F(x^+). \]
Showing (i) and (ii) is similar. Proving the other direction, namely, that if $F$ satisfies (i), (ii) and (iii) then it is a cdf for some random variable, uses some deep tools in analysis. $\blacksquare$
Definition 3.9 $X$ is discrete if it takes countably many values $\{x_1, x_2, \ldots\}$. We define the probability function or probability mass function for $X$ by
\[ f_X(x) = \mathbb{P}(X = x). \]
(A set is countable if it is finite or it can be put in a one-to-one correspondence with the integers. The even numbers, the odd numbers and the rationals are countable; the set of real numbers between 0 and 1 is not countable.)

Thus, $f_X(x) \ge 0$ for all $x \in \mathbb{R}$ and $\sum_i f_X(x_i) = 1$. The cdf of $X$ is related to $f_X$ by
\[ F_X(x) = \mathbb{P}(X \le x) = \sum_{x_i \le x} f_X(x_i). \]
Sometimes we write $f_X$ and $F_X$ simply as $f$ and $F$.

[Figure 3.2: Probability function for flipping a coin twice (Example 3.6).]
Example 3.10 The probability function for Example 3.6 is
\[ f_X(x) = \begin{cases} 1/4 & x = 0 \\ 1/2 & x = 1 \\ 1/4 & x = 2 \\ 0 & x \notin \{0, 1, 2\}. \end{cases} \]
See Figure 3.2. $\blacksquare$
Definition 3.11 A random variable $X$ is continuous if there exists a function $f_X$ such that $f_X(x) \ge 0$ for all $x$, $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ and for every $a \le b$,
\[ \mathbb{P}(a < X < b) = \int_a^b f_X(x)\,dx. \qquad (3.1) \]
The function $f_X$ is called the probability density function (pdf). We have that
\[ F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt \]
and $f_X(x) = F_X'(x)$ at all points $x$ at which $F_X$ is differentiable.

Sometimes we shall write $\int f(x)\,dx$ or simply $\int f$ to mean $\int_{-\infty}^{\infty} f(x)\,dx$.

[Figure 3.3: cdf for Uniform(0,1).]
Example 3.12 Suppose that $X$ has pdf
\[ f_X(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases} \]
Clearly, $f_X(x) \ge 0$ and $\int f_X(x)\,dx = 1$. A random variable with this density is said to have a Uniform(0,1) distribution. The cdf is given by
\[ F_X(x) = \begin{cases} 0 & x < 0 \\ x & 0 \le x \le 1 \\ 1 & x > 1. \end{cases} \]
See Figure 3.3. $\blacksquare$

Example 3.13 Suppose that $X$ has pdf
\[ f(x) = \begin{cases} 0 & \text{for } x < 0 \\ \dfrac{1}{(1 + x)^2} & \text{otherwise.} \end{cases} \]
Since $\int f(x)\,dx = 1$, this is a well-defined pdf. $\blacksquare$

Warning! Continuous random variables can lead to confusion. First, note that if $X$ is continuous then $\mathbb{P}(X = x) = 0$ for every $x$. Don't try to think of $f(x)$ as $\mathbb{P}(X = x)$. This only holds for discrete random variables. We get probabilities from a pdf by integrating. A pdf can be bigger than 1 (unlike a mass function). For example, if $f(x) = 5$ for $x \in [0, 1/5]$ and 0 otherwise, then $f(x) \ge 0$ and $\int f(x)\,dx = 1$, so this is a well-defined pdf even though $f(x) = 5$ in some places. In fact, a pdf can be unbounded. For example, if $f(x) = (2/3)x^{-1/3}$ for $0 < x < 1$ and $f(x) = 0$ otherwise, then $\int f(x)\,dx = 1$ even though $f$ is not bounded.
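The two claims in the warning, that a density may exceed 1 or even be unbounded and still integrate to 1, can be checked numerically. The sketch below uses a plain midpoint rule, which never evaluates the integrand at the endpoint 0 where the second density blows up; the densities assumed are $f(x) = 5$ on $[0, 1/5]$ and $f(x) = (2/3)x^{-1/3}$ on $(0, 1)$, and `midpoint_integral` is a hypothetical helper.

```python
def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# A density that is larger than 1 everywhere on its support.
flat = midpoint_integral(lambda x: 5.0, 0.0, 0.2)

# An unbounded density: it grows without limit as x approaches 0.
spike = midpoint_integral(lambda x: (2.0 / 3.0) * x ** (-1.0 / 3.0), 0.0, 1.0)

print(flat, spike)
```

Both results come out close to 1; the second converges more slowly because of the singularity at the origin.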
Example 3.14 Let
\[ f(x) = \begin{cases} 0 & \text{for } x < 0 \\ \dfrac{1}{1 + x} & \text{otherwise.} \end{cases} \]
This is not a pdf since
\[ \int f(x)\,dx = \int_0^{\infty} \frac{dx}{1 + x} = \int_1^{\infty} \frac{du}{u} = \log(\infty) = \infty. \quad \blacksquare \]

Lemma 3.15 Let $F$ be the cdf for a random variable $X$. Then:

1. $\mathbb{P}(X = x) = F(x) - F(x^-)$ where $F(x^-) = \lim_{y \uparrow x} F(y)$,
2. $\mathbb{P}(x < X \le y) = F(y) - F(x)$,
3. $\mathbb{P}(X > x) = 1 - F(x)$,
4. If $X$ is continuous then
\[ \mathbb{P}(a < X < b) = \mathbb{P}(a \le X < b) = \mathbb{P}(a < X \le b) = \mathbb{P}(a \le X \le b). \]

It is also useful to define the inverse cdf (or quantile function).

Definition 3.16 Let $X$ be a random variable with cdf $F$. The inverse cdf or quantile function is defined by
\[ F^{-1}(q) = \inf\{x : F(x) \ge q\} \]
for $q \in [0, 1]$. If $F$ is strictly increasing and continuous then $F^{-1}(q)$ is the unique real number $x$ such that $F(x) = q$. (If you are unfamiliar with "inf", just think of it as the minimum.)

We call $F^{-1}(1/4)$ the first quartile, $F^{-1}(1/2)$ the median (or second quartile) and $F^{-1}(3/4)$ the third quartile.

Two random variables $X$ and $Y$ are equal in distribution, written $X \stackrel{d}{=} Y$, if $F_X(x) = F_Y(x)$ for all $x$. This does not mean that $X$ and $Y$ are equal. Rather, it means that all probability statements about $X$ and $Y$ will be the same.
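For a discrete random variable the quantile function can be computed by scanning the support in increasing order until the accumulated mass reaches $q$. The sketch below assumes the convention $F^{-1}(q) = \inf\{x : F(x) \ge q\}$ restricted to $0 < q \le 1$, and reuses the two-coin-flip mass function; `quantile` is an illustrative name.

```python
from fractions import Fraction

# Mass function of the number of heads in two fair coin flips.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def quantile(q, pmf=pmf):
    """Smallest support point x with F(x) >= q, for 0 < q <= 1."""
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    running = Fraction(0)
    for x in sorted(pmf):
        running += pmf[x]  # running equals F(x) after this step
        if running >= q:
            return x
    raise ValueError("pmf does not sum to 1")


print(quantile(Fraction(1, 4)), quantile(Fraction(1, 2)), quantile(Fraction(3, 4)))
```

Here the first quartile is 0 while the median and third quartile are both 1, which shows that distinct quantiles can coincide when the cdf has jumps.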
Some Important Discrete Random Variables

Warning About Notation! It is traditional to write $X \sim F$ to indicate that $X$ has distribution $F$. This is unfortunate notation since the symbol $\sim$ is also used to denote an approximation. The notation $X \sim F$ is so pervasive that we are stuck with it. Read $X \sim F$ as "$X$ has distribution $F$", not as "$X$ is approximately $F$".

The Point Mass Distribution. $X$ has a point mass distribution at $a$, written $X \sim \delta_a$, if $\mathbb{P}(X = a) = 1$, in which case
\[ F(x) = \begin{cases} 0 & x < a \\ 1 & x \ge a. \end{cases} \]
The probability function is $f(x) = 1$ for $x = a$ and 0 otherwise.

The Discrete Uniform Distribution. Let $k > 1$ be a given integer. Suppose that $X$ has probability mass function given by
\[ f(x) = \begin{cases} 1/k & \text{for } x = 1, \ldots, k \\ 0 & \text{otherwise.} \end{cases} \]
We say that $X$ has a uniform distribution on $\{1, \ldots, k\}$.

The Bernoulli Distribution. Let $X$ represent a coin flip. Then $\mathbb{P}(X = 1) = p$ and $\mathbb{P}(X = 0) = 1 - p$ for some $p \in [0, 1]$. We say that $X$ has a Bernoulli distribution, written $X \sim \text{Bernoulli}(p)$. The probability function is $f(x) = p^x (1 - p)^{1 - x}$ for $x \in \{0, 1\}$.

The Binomial Distribution. Suppose we have a coin which falls heads with probability $p$ for some $0 \le p \le 1$. Flip the coin $n$ times and let $X$ be the number of heads. Assume that the tosses are independent. Let $f(x) = \mathbb{P}(X = x)$ be the mass function. It can be shown that
\[ f(x) = \begin{cases} \binom{n}{x} p^x (1 - p)^{n - x} & \text{for } x = 0, \ldots, n \\ 0 & \text{otherwise.} \end{cases} \]
A random variable with this mass function is called a Binomial random variable and we write $X \sim \text{Binomial}(n, p)$. If $X_1 \sim \text{Binomial}(n_1, p)$ and $X_2 \sim \text{Binomial}(n_2, p)$ are independent, then $X_1 + X_2 \sim \text{Binomial}(n_1 + n_2, p)$.

Warning! Let us take this opportunity to prevent some confusion. $X$ is a random variable; $x$ denotes a particular value of the random variable; $n$ and $p$ are parameters, that is, fixed real numbers. The parameter $p$ is usually unknown and must be estimated from data; that's what statistical inference is all about. In most statistical models, there are random variables and parameters: don't confuse them.
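The Binomial mass function and the additivity property for independent Binomials with a common $p$ can be checked numerically. This is a sketch under assumed values ($p = 0.3$, $n_1 = 4$, $n_2 = 6$); `binomial_pmf` and `convolve` are hypothetical helpers, and the convolution is simply the mass function of a sum of two independent counts.

```python
from math import comb, isclose


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); zero outside 0..n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def convolve(f, g):
    """Mass function of the sum of two independent counts with pmfs f and g."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


p = 0.3
f1 = [binomial_pmf(x, 4, p) for x in range(5)]       # Binomial(4, p)
f2 = [binomial_pmf(x, 6, p) for x in range(7)]       # Binomial(6, p)
direct = [binomial_pmf(x, 10, p) for x in range(11)]  # Binomial(10, p)
summed = convolve(f1, f2)
print(isclose(sum(direct), 1.0), max(abs(a - b) for a, b in zip(summed, direct)))
```

The number of trials adds while $p$ stays fixed; the property does not hold when the two success probabilities differ.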
The Geometric Distribution. $X$ has a geometric distribution with parameter $p \in (0, 1)$, written $X \sim \text{Geom}(p)$, if
\[ \mathbb{P}(X = k) = p(1 - p)^{k - 1}, \quad k \ge 1. \]
We have that
\[ \sum_{k=1}^{\infty} \mathbb{P}(X = k) = p \sum_{k=1}^{\infty} (1 - p)^{k - 1} = \frac{p}{1 - (1 - p)} = 1. \]
Think of $X$ as the number of flips needed until the first heads when flipping a coin.

The Poisson Distribution. $X$ has a Poisson distribution with parameter $\lambda$, written $X \sim \text{Poisson}(\lambda)$, if
\[ f(x) = e^{-\lambda} \frac{\lambda^x}{x!}, \quad x \ge 0. \]
Note that
\[ \sum_{x=0}^{\infty} f(x) = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1. \]
The Poisson is often used as a model for counts of rare events like radioactive decay and traffic accidents. If $X_1 \sim \text{Poisson}(\lambda_1)$ and $X_2 \sim \text{Poisson}(\lambda_2)$ are independent, then $X_1 + X_2 \sim \text{Poisson}(\lambda_1 + \lambda_2)$.

Warning! We defined random variables to be mappings from a sample space $\Omega$ to $\mathbb{R}$ but we did not mention the sample space in any of the distributions above. As I mentioned earlier, the sample space often "disappears" but it is really there in the background. Let's construct a sample space explicitly for a Bernoulli random variable. Let $\Omega = [0, 1]$ and define $\mathbb{P}$ to satisfy $\mathbb{P}([a, b]) = b - a$ for $0 \le a \le b \le 1$. Fix $p \in [0, 1]$ and define
\[ X(\omega) = \begin{cases} 1 & \omega \le p \\ 0 & \omega > p. \end{cases} \]
Then $\mathbb{P}(X = 1) = \mathbb{P}(\omega \le p) = \mathbb{P}([0, p]) = p$ and $\mathbb{P}(X = 0) = 1 - p$. Thus, $X \sim \text{Bernoulli}(p)$. We could do this for all the distributions defined above. In practice, we think of a random variable like a random number but formally it is a mapping defined on some sample space.
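The explicit sample-space construction for a Bernoulli variable translates directly into a simulation: draw $\omega$ uniformly from $[0, 1]$ and apply the mapping $X$. The sketch below assumes $p = 0.3$ and a fixed seed for repeatability; `bernoulli_from_omega` is a hypothetical name.

```python
import random


def bernoulli_from_omega(omega, p):
    """The mapping X(omega): 1 if omega <= p, otherwise 0."""
    return 1 if omega <= p else 0


rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.3
# rng.random() stands in for drawing omega with P([a, b]) = b - a.
draws = [bernoulli_from_omega(rng.random(), p) for _ in range(100_000)]
freq = sum(draws) / len(draws)
print(freq)
```

The observed frequency of ones should sit close to $p$, which reflects that the length of the interval $[0, p]$ is the probability of the event $\{X = 1\}$.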
Some Important Continuous Random Variables

The Uniform Distribution. $X$ has a Uniform$(a, b)$ distribution, written $X \sim \text{Uniform}(a, b)$, if
\[ f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } x \in [a, b] \\ 0 & \text{otherwise} \end{cases} \]
where $a < b$. The distribution function is
\[ F(x) = \begin{cases} 0 & x < a \\ \dfrac{x - a}{b - a} & x \in [a, b] \\ 1 & x > b. \end{cases} \]
Normal (Gaussian). $X$ has a Normal (or Gaussian) distribution with parameters $\mu$ and $\sigma$, denoted by $X \sim N(\mu, \sigma^2)$, if
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\}, \quad x \in \mathbb{R} \]
where $\mu \in \mathbb{R}$ and $\sigma > 0$. Later we shall see that $\mu$ is the "center" (or mean) of the distribution and $\sigma$ is the "spread" (or standard deviation) of the distribution. The Normal plays an important role in probability and statistics. Many phenomena in nature have approximately Normal distributions. Later, we shall see that the distribution of a sum of random variables can be approximated by a Normal distribution (the central limit theorem).

We say that $X$ has a standard Normal distribution if $\mu = 0$ and $\sigma = 1$. Tradition dictates that a standard Normal random variable is denoted by $Z$. The pdf and cdf of a standard Normal are denoted by $\phi(z)$ and $\Phi(z)$. The pdf is plotted in Figure 3.4. There is no closed-form expression for $\Phi$. Here are some useful facts:

(i) If $X \sim N(\mu, \sigma^2)$ then $Z = (X - \mu)/\sigma \sim N(0, 1)$.
(ii) If $Z \sim N(0, 1)$ then $X = \mu + \sigma Z \sim N(\mu, \sigma^2)$.
(iii) If $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, n$ are independent then
\[ \sum_{i=1}^{n} X_i \sim N\left( \sum_{i=1}^{n} \mu_i,\; \sum_{i=1}^{n} \sigma_i^2 \right). \]

It follows from (i) that if $X \sim N(\mu, \sigma^2)$ then
\[ \mathbb{P}(a < X < b) = \mathbb{P}\left( \frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma} \right) = \Phi\left( \frac{b - \mu}{\sigma} \right) - \Phi\left( \frac{a - \mu}{\sigma} \right). \qquad (3.2) \]
Thus we can compute any probabilities we want as long as we can compute the cdf $\Phi(z)$ of a standard Normal. All statistical computing packages will compute $\Phi(z)$ and $\Phi^{-1}(q)$. All statistics texts, including this one, have a table of values of $\Phi(z)$.
[Figure 3.4: Density of a standard Normal.]

Example 3.17 Suppose that $X \sim N(3, 5)$. Find $\mathbb{P}(X > 1)$. The solution is
\[ \mathbb{P}(X > 1) = 1 - \mathbb{P}(X < 1) = 1 - \mathbb{P}\left( Z < \frac{1 - 3}{\sqrt{5}} \right) = 1 - \Phi(-0.8944) = .81. \]
Now find $q$ such that $\mathbb{P}(X < q) = .2$, that is, the $.2$ quantile of $X$. We solve this by writing
\[ .2 = \mathbb{P}(X < q) = \mathbb{P}\left( Z < \frac{q - \mu}{\sigma} \right) = \Phi\left( \frac{q - \mu}{\sigma} \right). \]
From the Normal table, $\Phi(-0.8416) = .2$. Therefore,
\[ -0.8416 = \frac{q - \mu}{\sigma} = \frac{q - 3}{\sqrt{5}} \]
and hence $q = 3 - 0.8416\sqrt{5} = 1.1181$. $\blacksquare$
Exponential Distribution. $X$ has an Exponential distribution with parameter $\beta$, denoted by $X \sim \text{Exp}(\beta)$, if
\[ f(x) = \frac{1}{\beta} e^{-x/\beta}, \quad x > 0 \]
where $\beta > 0$. The exponential distribution is used to model the lifetimes of electronic components and the waiting times between rare events.

Gamma Distribution. For $\alpha > 0$, the Gamma function is defined by $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y}\,dy$. $X$ has a Gamma distribution with parameters $\alpha$ and $\beta$, denoted by $X \sim \text{Gamma}(\alpha, \beta)$, if
\[ f(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}, \quad x > 0 \]
where $\alpha, \beta > 0$. The exponential distribution is just a Gamma$(1, \beta)$ distribution. If $X_i \sim \text{Gamma}(\alpha_i, \beta)$ are independent, then $\sum_{i=1}^{n} X_i \sim \text{Gamma}(\sum_{i=1}^{n} \alpha_i, \beta)$.

The Beta Distribution. $X$ has a Beta distribution with parameters $\alpha > 0$ and $\beta > 0$, denoted by $X \sim \text{Beta}(\alpha, \beta)$, if
\[ f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 < x < 1. \]

t and Cauchy Distribution. $X$ has a $t$ distribution with $\nu$ degrees of freedom, written $X \sim t_{\nu}$, if
\[ f(x) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)} \cdot \frac{1}{\left(1 + \frac{x^2}{\nu}\right)^{(\nu + 1)/2}}. \]
The $t$ distribution is similar to a Normal but it has thicker tails. In fact, the Normal corresponds to a $t$ with $\nu = \infty$. The Cauchy distribution is a special case of the $t$ distribution corresponding to $\nu = 1$. The density is
\[ f(x) = \frac{1}{\pi(1 + x^2)}. \]
To see that this is indeed a density, let's do the integral:
\[ \int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{d \tan^{-1}(x)}{dx}\,dx = \frac{1}{\pi}\left[ \tan^{-1}(\infty) - \tan^{-1}(-\infty) \right] = \frac{1}{\pi}\left[ \frac{\pi}{2} - \left( -\frac{\pi}{2} \right) \right] = 1. \]

The $\chi^2$ Distribution. $X$ has a $\chi^2$ distribution with $p$ degrees of freedom, written $X \sim \chi^2_p$, if
\[ f(x) = \frac{1}{\Gamma(p/2)\, 2^{p/2}}\, x^{(p/2) - 1} e^{-x/2}, \quad x > 0. \]
If $Z_1, \ldots, Z_p$ are independent standard Normal random variables then $\sum_{i=1}^{p} Z_i^2 \sim \chi^2_p$.
Bivariate Distributions

Given a pair of discrete random variables $X$ and $Y$, define the joint mass function by $f(x, y) = \mathbb{P}(X = x \text{ and } Y = y)$. From now on, we write $\mathbb{P}(X = x \text{ and } Y = y)$ as $\mathbb{P}(X = x, Y = y)$. We write $f$ as $f_{X,Y}$ when we want to be more explicit.

Example 3.18 Here is a bivariate distribution for two random variables $X$ and $Y$ each taking values 0 or 1:

             Y = 0   Y = 1
    X = 0    1/9     2/9     1/3
    X = 1    2/9     4/9     2/3
             1/3     2/3     1

Thus, $f(1, 1) = \mathbb{P}(X = 1, Y = 1) = 4/9$. $\blacksquare$
Definition 3.19 In the continuous case, we call a function $f(x, y)$ a pdf for the random variables $(X, Y)$ if (i) $f(x, y) \ge 0$ for all $(x, y)$, (ii) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$ and, for any set $A \subset \mathbb{R} \times \mathbb{R}$, $\mathbb{P}((X, Y) \in A) = \iint_A f(x, y)\,dx\,dy$. In the discrete or continuous case we define the joint cdf as $F_{X,Y}(x, y) = \mathbb{P}(X \le x, Y \le y)$.

Example 3.20 Let $(X, Y)$ be uniform on the unit square. Then,
\[ f(x, y) = \begin{cases} 1 & \text{if } 0 \le x \le 1,\; 0 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases} \]
Find $\mathbb{P}(X < 1/2, Y < 1/2)$. The event $A = \{X < 1/2, Y < 1/2\}$ corresponds to a subset of the unit square. Integrating $f$ over this subset corresponds, in this case, to computing the area of the set $A$, which is 1/4. So, $\mathbb{P}(X < 1/2, Y < 1/2) = 1/4$. $\blacksquare$
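Example 3.21 Let $(X, Y)$ have density
\[ f(x, y) = \begin{cases} x + y & \text{if } 0 \le x \le 1,\; 0 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases} \]
Then
\[ \int_0^1\!\!\int_0^1 (x + y)\,dx\,dy = \int_0^1 \left[ \int_0^1 x\,dx \right] dy + \int_0^1 \left[ \int_0^1 y\,dx \right] dy = \int_0^1 \frac{1}{2}\,dy + \int_0^1 y\,dy = \frac{1}{2} + \frac{1}{2} = 1 \]
which verifies that this is a pdf. $\blacksquare$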
Example 3.22 If the distribution is defined over a non-rectangular region, then the calculations are a bit more complicated. Here is a nice example which I borrowed from DeGroot and Schervish (2002). Let $(X, Y)$ have density
\[ f(x, y) = \begin{cases} c\,x^2 y & \text{if } x^2 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases} \]
Note first that $-1 \le x \le 1$. Now let us find the value of $c$. The trick here is to be careful about the range of integration. We pick one variable, $x$ say, and let it range over its values. Then, for each fixed value of $x$, we let $y$ vary over its range, which is $x^2 \le y \le 1$. It may help if you look at Figure 3.5. Thus,
\[ 1 = \iint f(x, y)\,dy\,dx = c \int_{-1}^{1} \int_{x^2}^{1} x^2 y\,dy\,dx = c \int_{-1}^{1} x^2 \left[ \int_{x^2}^{1} y\,dy \right] dx = c \int_{-1}^{1} x^2\, \frac{1 - x^4}{2}\,dx = \frac{4c}{21}. \]
Hence, $c = 21/4$. Now let us compute $\mathbb{P}(X \ge Y)$. This corresponds to the set $A = \{(x, y) : 0 \le x \le 1,\; x^2 \le y \le x\}$. (You can see this by drawing a diagram.) So,
\[ \mathbb{P}(X \ge Y) = \frac{21}{4} \int_0^1 \int_{x^2}^{x} x^2 y\,dy\,dx = \frac{21}{4} \int_0^1 x^2 \left[ \int_{x^2}^{x} y\,dy \right] dx = \frac{21}{4} \int_0^1 x^2\, \frac{x^2 - x^4}{2}\,dx = \frac{3}{20}. \quad \blacksquare \]

[Figure 3.5: The light shaded region is $x^2 \le y \le 1$. The density is positive over this region. The hatched region is the event $X \ge Y$ intersected with $x^2 \le y \le 1$.]
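The nested integration over a non-rectangular region, for the density proportional to $x^2 y$ on $x^2 \le y \le 1$, can be mirrored numerically: for each fixed $x$ the inner integral runs over the $y$-range that depends on $x$. This is a rough sketch using a midpoint rule; `integrate` and `mass` are hypothetical helpers and the grid size is an arbitrary choice.

```python
def integrate(f, a, b, n=400):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def mass(x_lo, x_hi, y_lo, y_hi):
    """Integral of x^2 * y over x in [x_lo, x_hi] and y in [y_lo(x), y_hi(x)]."""
    return integrate(
        lambda x: integrate(lambda y: x * x * y, y_lo(x), y_hi(x)),
        x_lo, x_hi,
    )


# Normalizing constant: integrate over -1 <= x <= 1, x^2 <= y <= 1.
total = mass(-1.0, 1.0, lambda x: x * x, lambda x: 1.0)
c = 1.0 / total

# P(X >= Y): integrate over 0 <= x <= 1, x^2 <= y <= x.
prob = c * mass(0.0, 1.0, lambda x: x * x, lambda x: x)

print(c, prob)
```

The two printed values should be close to $21/4$ and $3/20$. Passing the $y$-limits as functions of $x$ is what encodes the shape of the region.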
MÍ& 0.µP
" MQR G I. H$á& 7. J+*`$'./(0L" H
*
E
Definition 3.23. If $(X, Y)$ have joint distribution with mass function $f_{X,Y}$, then the marginal mass function for $X$ is defined by
\[ f_X(x) = P(X = x) = \sum_y P(X = x, Y = y) = \sum_y f(x,y) \qquad (3.3) \]
and the marginal mass function for $Y$ is defined by
\[ f_Y(y) = P(Y = y) = \sum_x P(X = x, Y = y) = \sum_x f(x,y). \qquad (3.4) \]
Example 3.24. Suppose that $f_{X,Y}$ is given in the table that follows. The marginal distribution for $X$ corresponds to the row totals and the marginal distribution for $Y$ corresponds to the column totals.

          Y = 0    Y = 1
X = 0     1/10     2/10     3/10
X = 1     3/10     4/10     7/10
          4/10     6/10     1

For example, $f_X(0) = 3/10$ and $f_X(1) = 7/10$.
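As a small illustration of row and column totals, here is a sketch that computes both marginals from a joint mass function stored as a dictionary. The cell values $1/10$, $2/10$, $3/10$, $4/10$ are taken as the table entries; the function name `marginals` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction as F

# Joint mass function f(x, y), keyed by (x, y).
joint = {(0, 0): F(1, 10), (0, 1): F(2, 10),
         (1, 0): F(3, 10), (1, 1): F(4, 10)}

def marginals(joint):
    """Return (f_X, f_Y): sum the joint mass over y and over x respectively."""
    f_x, f_y = {}, {}
    for (x, y), p in joint.items():
        f_x[x] = f_x.get(x, 0) + p
        f_y[y] = f_y.get(y, 0) + p
    return f_x, f_y

f_x, f_y = marginals(joint)
```

Exact fractions are used so that the totals can be compared with the table without rounding error.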
Definition 3.25. For continuous random variables, the marginal densities are
\[ f_X(x) = \int f(x,y)\,dy, \quad \text{and} \quad f_Y(y) = \int f(x,y)\,dx. \qquad (3.5) \]
The corresponding marginal distribution functions are denoted by $F_X$ and $F_Y$.
Example 3.26. Suppose that $f_{X,Y}(x,y) = e^{-(x+y)}$ for $x, y \ge 0$. Then $f_X(x) = e^{-x} \int_0^\infty e^{-y}\,dy = e^{-x}$.
Example 3.27. Suppose that
\[ f(x,y) = \begin{cases} x + y & \text{if } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases} \]
Then
\[ f_Y(y) = \int_0^1 (x + y)\,dx = \int_0^1 x\,dx + \int_0^1 y\,dx = \frac{1}{2} + y. \]
Example 3.28. Let $(X, Y)$ have density
\[ f(x,y) = \begin{cases} \frac{21}{4} x^2 y & \text{if } x^2 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases} \]
Thus,
\[ f_X(x) = \int f(x,y)\,dy = \frac{21}{4} x^2 \int_{x^2}^{1} y\,dy = \frac{21}{8} x^2 (1 - x^4) \]
for $-1 \le x \le 1$ and $f_X(x) = 0$ otherwise.

Independent Random Variables
Definition 3.29. Two random variables $X$ and $Y$ are independent if, for every $A$ and $B$,
\[ P(X \in A, Y \in B) = P(X \in A)\,P(Y \in B). \qquad (3.6) \]
We write $X \perp\!\!\!\perp Y$.

In principle, to check whether $X$ and $Y$ are independent we need to check equation (3.6) for all subsets $A$ and $B$. Fortunately, we have the following result, which we state for continuous random variables though it is true for discrete random variables too.
Theorem 3.30. Let $X$ and $Y$ have joint pdf $f_{X,Y}$. Then $X \perp\!\!\!\perp Y$ if and only if $f_{X,Y}(x,y) = f_X(x) f_Y(y)$ for all values $x$ and $y$.

(Note: the statement is not rigorous because the density is defined only up to sets of measure 0.)

Example 3.31. Let $X$ and $Y$ have the following distribution:

          Y = 0    Y = 1
X = 0     1/4      1/4      1/2
X = 1     1/4      1/4      1/2
          1/2      1/2      1

Then, $f_X(0) = f_X(1) = 1/2$ and $f_Y(0) = f_Y(1) = 1/2$. $X$ and $Y$ are independent because $f_X(0) f_Y(0) = f(0,0)$, $f_X(0) f_Y(1) = f(0,1)$, $f_X(1) f_Y(0) = f(1,0)$, $f_X(1) f_Y(1) = f(1,1)$. Suppose instead that $X$ and $Y$ have the following distribution:

          Y = 0    Y = 1
X = 0     1/2      0        1/2
X = 1     0        1/2      1/2
          1/2      1/2      1

These are not independent because $f_X(0) f_Y(1) = (1/2)(1/2) = 1/4$ yet $f(0,1) = 0$.
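The factorization check used in this example is mechanical enough to automate. The sketch below assumes the two $2 \times 2$ tables just described (all cells $1/4$ in the first; $1/2$ on the diagonal and $0$ off it in the second). The helper name `is_independent` is hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def is_independent(joint):
    """True if f(x, y) = f_X(x) * f_Y(y) for every pair (x, y)."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    f_x = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    f_y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == f_x[x] * f_y[y]
               for x, y in product(xs, ys))

first = {(0, 0): F(1, 4), (0, 1): F(1, 4),
         (1, 0): F(1, 4), (1, 1): F(1, 4)}
second = {(0, 0): F(1, 2), (0, 1): F(0),
          (1, 0): F(0), (1, 1): F(1, 2)}

first_independent = is_independent(first)
second_independent = is_independent(second)
```

The first table factorizes in every cell; the second fails at $(0,1)$, where the product of the marginals is $1/4$ but the joint mass is $0$.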
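Example 3.32. Suppose that $X$ and $Y$ are independent and both have the same density
\[ f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases} \]
Let us find $P(X + Y \le 1)$. Using independence, the joint density is
\[ f(x,y) = f_X(x) f_Y(y) = \begin{cases} 4xy & \text{if } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases} \]
Now,
\[ P(X + Y \le 1) = \int\!\!\int_{x + y \le 1} f(x,y)\,dy\,dx = 4 \int_0^1 x \left[\int_0^{1-x} y\,dy\right] dx = 4 \int_0^1 x\,\frac{(1-x)^2}{2}\,dx = \frac{1}{6}. \]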
The following result is helpful for verifying independence.

Theorem 3.33. Suppose that the range of $X$ and $Y$ is a (possibly infinite) rectangle. If $f(x,y) = g(x) h(y)$ for some functions $g$ and $h$ (not necessarily probability density functions) then $X$ and $Y$ are independent.

Example 3.34. Let $X$ and $Y$ have density
\[ f(x,y) = \begin{cases} 2 e^{-(x + 2y)} & \text{if } x > 0 \text{ and } y > 0 \\ 0 & \text{otherwise.} \end{cases} \]
The range of $X$ and $Y$ is the rectangle $(0,\infty) \times (0,\infty)$. We can write $f(x,y) = g(x) h(y)$ where $g(x) = 2e^{-x}$ and $h(y) = e^{-2y}$. Thus, $X \perp\!\!\!\perp Y$.
Conditional Distributions

If $X$ and $Y$ are discrete, then we can compute the conditional distribution of $X$ given that we have observed $Y = y$. Specifically, $P(X = x \mid Y = y) = P(X = x, Y = y)/P(Y = y)$. This leads us to define the conditional mass function as follows.

Definition 3.35. The conditional probability mass function is
\[ f_{X|Y}(x \mid y) = P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{f_{X,Y}(x,y)}{f_Y(y)} \]
if $f_Y(y) > 0$.

For continuous distributions we use the same definitions. The interpretation differs: in the discrete case, $f_{X|Y}(x \mid y)$ is $P(X = x \mid Y = y)$, but in the continuous case, we must integrate to get a probability.

(Note: we are treading in deep water here. When we compute $P(X \in A \mid Y = y)$ in the continuous case we are conditioning on the event $\{Y = y\}$, which has probability 0. We avoid this problem by defining things in terms of the pdf. The fact that this leads to a well-defined theory is proved in more advanced courses. We simply take it as a definition.)
Definition 3.36. For continuous random variables, the conditional probability density function is
\[ f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} \]
assuming that $f_Y(y) > 0$. Then,
\[ P(X \in A \mid Y = y) = \int_A f_{X|Y}(x \mid y)\,dx. \]
Example 3.37. Let $X$ and $Y$ have a uniform distribution on the unit square. Thus, $f_{X|Y}(x \mid y) = 1$ for $0 \le x \le 1$ and 0 otherwise. Given $Y = y$, $X$ is Uniform(0,1). We can write this as $X \mid Y = y \sim \text{Unif}(0,1)$.

From the definition of the conditional density, we see that $f_{X,Y}(x,y) = f_{X|Y}(x \mid y) f_Y(y) = f_{Y|X}(y \mid x) f_X(x)$. This can sometimes be useful, as in Example 3.39.

Example 3.38. Let
\[ f(x,y) = \begin{cases} x + y & \text{if } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases} \]
Let us find $P(X < 1/4 \mid Y = 1/3)$. In Example 3.27 we saw that $f_Y(y) = y + (1/2)$. Hence,
\[ f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{x + y}{y + \frac{1}{2}}. \]
So,
\[ P\left(X < \tfrac{1}{4} \,\Big|\, Y = \tfrac{1}{3}\right) = \int_0^{1/4} f_{X|Y}\left(x \,\Big|\, \tfrac{1}{3}\right) dx = \int_0^{1/4} \frac{x + \frac{1}{3}}{\frac{1}{3} + \frac{1}{2}}\,dx = \frac{\frac{1}{32} + \frac{1}{12}}{\frac{1}{3} + \frac{1}{2}} = \frac{11}{80}. \]
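Example 3.39. Suppose that $X \sim \text{Uniform}(0,1)$. After obtaining a value of $X$ we generate $Y \mid X = x \sim \text{Uniform}(x, 1)$. What is the marginal distribution of $Y$? First note that
\[ f_X(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \]
and
\[ f_{Y|X}(y \mid x) = \begin{cases} \frac{1}{1-x} & \text{if } 0 < x < y < 1 \\ 0 & \text{otherwise.} \end{cases} \]
So,
\[ f_{X,Y}(x,y) = f_{Y|X}(y \mid x) f_X(x) = \begin{cases} \frac{1}{1-x} & \text{if } 0 < x < y < 1 \\ 0 & \text{otherwise.} \end{cases} \]
The marginal for $Y$ is
\[ f_Y(y) = \int_0^y f_{X,Y}(x,y)\,dx = \int_0^y \frac{dx}{1-x} = -\int_1^{1-y} \frac{du}{u} = -\log(1 - y) \]
for $0 < y < 1$.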
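The two-stage sampling scheme in this example is easy to simulate, which gives a rough check on the marginal density $-\log(1-y)$. This is a sketch under stated assumptions: the seed, the sample size, and the helper names `simulate_y` and `cdf_y` are illustrative, and `cdf_y` is the integral of $-\log(1-t)$ from $0$ to $y$, worked out by hand as $y + (1-y)\log(1-y)$.

```python
import math
import random

def simulate_y(n, seed=0):
    """Draw X ~ Uniform(0, 1), then Y | X = x ~ Uniform(x, 1); return the Y values."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        x = rng.random()
        ys.append(rng.uniform(x, 1.0))
    return ys

def cdf_y(y):
    """Integral of -log(1 - t) over 0 <= t <= y, valid for 0 <= y < 1."""
    return y + (1.0 - y) * math.log(1.0 - y)

ys = simulate_y(100_000)
empirical = sum(y <= 0.5 for y in ys) / len(ys)  # estimate of P(Y <= 1/2)
theoretical = cdf_y(0.5)
```

With a sample this large, the empirical proportion should be close to the theoretical value, which is roughly 0.153.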
Example 3.40. Consider the density in Example 3.28. Let us find $f_{Y|X}(y \mid x)$. When $X = x$, $y$ must satisfy $x^2 \le y \le 1$. Earlier, we saw that $f_X(x) = (21/8) x^2 (1 - x^4)$. Hence, for $x^2 \le y \le 1$,
\[ f_{Y|X}(y \mid x) = \frac{f(x,y)}{f_X(x)} = \frac{\frac{21}{4} x^2 y}{\frac{21}{8} x^2 (1 - x^4)} = \frac{2y}{1 - x^4}. \]
Now let us compute $P(Y \ge 3/4 \mid X = 1/2)$. This can be done by first noting that $f_{Y|X}(y \mid 1/2) = 32y/15$. Thus,
\[ P(Y \ge 3/4 \mid X = 1/2) = \int_{3/4}^{1} f(y \mid 1/2)\,dy = \int_{3/4}^{1} \frac{32y}{15}\,dy = \frac{7}{15}. \]
Multivariate Distributions and IID Samples

Let $X = (X_1, \ldots, X_n)$ where $X_1, \ldots, X_n$ are random variables. We call $X$ a random vector. Let $f(x_1, \ldots, x_n)$ denote the pdf. It is possible to define their marginals, conditionals, etc. much the same way as in the bivariate case. We say that $X_1, \ldots, X_n$ are independent if, for every $A_1, \ldots, A_n$,
\[ P(X_1 \in A_1, \ldots, X_n \in A_n) = \prod_{i=1}^n P(X_i \in A_i). \qquad (3.7) \]
It suffices to check that $f(x_1, \ldots, x_n) = \prod_{i=1}^n f_{X_i}(x_i)$. If $X_1, \ldots, X_n$ are independent and each has the same marginal distribution with density $f$, we say that $X_1, \ldots, X_n$ are iid (independent and identically distributed). We shall write this as $X_1, \ldots, X_n \sim f$ or, in terms of the cdf, $X_1, \ldots, X_n \sim F$. This means that $X_1, \ldots, X_n$ are independent draws from the same distribution. We also call $X_1, \ldots, X_n$ a random sample from $F$.

Much of statistical theory and practice begins with iid observations, and we shall study this case in detail when we discuss statistics.

Two Important Multivariate Distributions

Multinomial. The multivariate version of a Binomial is called a Multinomial. Consider drawing a ball from an urn which has balls with $k$ different colors labeled color 1, color 2, ..., color $k$. Let $p = (p_1, \ldots, p_k)$ where $p_j \ge 0$ and $\sum_{j=1}^k p_j = 1$, and suppose that $p_j$ is the probability of drawing a ball of color $j$. Draw $n$ times (independent draws with replacement) and let $X = (X_1, \ldots, X_k)$ where $X_j$ is the number of times that color $j$ appears. Hence, $n = \sum_{j=1}^k X_j$. We say that $X$ has a Multinomial$(n, p)$ distribution, written $X \sim \text{Multinomial}(n, p)$. The probability function is
\[ f(x) = \binom{n}{x_1 \cdots x_k} p_1^{x_1} \cdots p_k^{x_k} \qquad (3.8) \]
where
\[ \binom{n}{x_1 \cdots x_k} = \frac{n!}{x_1! \cdots x_k!}. \]
Lemma 3.41. Suppose that $X \sim \text{Multinomial}(n, p)$ where $X = (X_1, \ldots, X_k)$ and $p = (p_1, \ldots, p_k)$. The marginal distribution of $X_j$ is Binomial$(n, p_j)$.
Multivariate Normal. The univariate Normal has two parameters, $\mu$ and $\sigma$. In the multivariate version, $\mu$ is a vector and $\sigma$ is replaced by a matrix $\Sigma$. To begin, let
\[ Z = (Z_1, \ldots, Z_k)^T \]
where $Z_1, \ldots, Z_k \sim N(0,1)$ are independent. The density of $Z$ is
\[ f(z) = \prod_{i=1}^k f(z_i) = \frac{1}{(2\pi)^{k/2}} \exp\left\{-\frac{1}{2} \sum_{j=1}^k z_j^2\right\} = \frac{1}{(2\pi)^{k/2}} \exp\left\{-\frac{1}{2} z^T z\right\}. \]
(If $a$ and $b$ are vectors then $a^T b = \sum_{i=1}^k a_i b_i$.) We say that $Z$ has a standard multivariate Normal distribution, written $Z \sim N(0, I)$, where it is understood that $0$ represents a vector of $k$ zeroes and $I$ is the $k \times k$ identity matrix.

More generally, a vector $X$ has a multivariate Normal distribution, denoted by $X \sim N(\mu, \Sigma)$, if it has density
\[ f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{k/2} \det(\Sigma)^{1/2}} \exp\left\{-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right\} \qquad (3.9) \]
where $\det(\Sigma)$ denotes the determinant of a matrix, $\mu$ is a vector of length $k$ and $\Sigma$ is a $k \times k$ symmetric, positive definite matrix. ($\Sigma^{-1}$ is the inverse of the matrix $\Sigma$. A matrix $\Sigma$ is positive definite if, for all nonzero vectors $x$, $x^T \Sigma x > 0$.) Setting $\mu = 0$ and $\Sigma = I$ gives back the standard Normal.

Since $\Sigma$ is symmetric and positive definite, it can be shown that there exists a matrix $\Sigma^{1/2}$, called the square root of $\Sigma$, with the following properties: (i) $\Sigma^{1/2}$ is symmetric, (ii) $\Sigma = \Sigma^{1/2} \Sigma^{1/2}$ and (iii) $\Sigma^{1/2} \Sigma^{-1/2} = \Sigma^{-1/2} \Sigma^{1/2} = I$ where $\Sigma^{-1/2} = (\Sigma^{1/2})^{-1}$.

Theorem 3.42. If $Z \sim N(0, I)$ and $X = \mu + \Sigma^{1/2} Z$ then $X \sim N(\mu, \Sigma)$. Conversely, if $X \sim N(\mu, \Sigma)$, then $\Sigma^{-1/2}(X - \mu) \sim N(0, I)$.

Suppose we partition a random Normal vector $X$ as $X = (X_a, X_b)$. We can similarly partition $\mu = (\mu_a, \mu_b)$ and
\[ \Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}. \]

Theorem 3.43. Let $X \sim N(\mu, \Sigma)$. Then:
(1) The marginal distribution of $X_a$ is $X_a \sim N(\mu_a, \Sigma_{aa})$.
(2) The conditional distribution of $X_b$ given $X_a = x_a$ is
\[ X_b \mid X_a = x_a \sim N\left(\mu_b + \Sigma_{ba} \Sigma_{aa}^{-1} (x_a - \mu_a),\ \Sigma_{bb} - \Sigma_{ba} \Sigma_{aa}^{-1} \Sigma_{ab}\right). \]
(3) If $a$ is a vector then $a^T X \sim N(a^T \mu, a^T \Sigma a)$.
(4) $V = (X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_k$.
Transformations of Random Variables

Suppose that $X$ is a random variable with pdf $f_X$ and cdf $F_X$. Let $Y = r(X)$ be a function of $X$, for example, $Y = X^2$ or $Y = e^X$. We call $Y = r(X)$ a transformation of $X$. How do we compute the pdf and cdf of $Y$? In the discrete case, the answer is easy. The mass function of $Y$ is given by
\[ f_Y(y) = P(Y = y) = P(r(X) = y) = P(\{x;\ r(x) = y\}) = P(X \in r^{-1}(y)). \]

Example 3.44. Suppose that $P(X = -1) = P(X = 1) = 1/4$ and $P(X = 0) = 1/2$. Let $Y = X^2$. Then, $P(Y = 0) = P(X = 0) = 1/2$ and $P(Y = 1) = P(X = 1) + P(X = -1) = 1/2$. Summarizing:

 x     f_X(x)          y     f_Y(y)
-1     1/4             0     1/2
 0     1/2             1     1/2
 1     1/4

$Y$ takes fewer values than $X$ because the transformation is not one-to-one.
The continuous case is harder. There are three steps for finding $f_Y$:

Three steps for transformations
1. For each $y$, find the set $A_y = \{x : r(x) \le y\}$.
2. Find the cdf
\[ F_Y(y) = P(Y \le y) = P(r(X) \le y) = P(\{x;\ r(x) \le y\}) = \int_{A_y} f_X(x)\,dx. \qquad (3.10) \]
3. The pdf is $f_Y(y) = F_Y'(y)$.

Example 3.45. Let $f_X(x) = e^{-x}$ for $x > 0$. Hence $F_X(x) = \int_0^x f_X(s)\,ds = 1 - e^{-x}$. Let $Y = r(X) = \log X$. Then $A_y = \{x : x \le e^y\}$ and
\[ F_Y(y) = P(Y \le y) = P(\log X \le y) = P(X \le e^y) = F_X(e^y) = 1 - e^{-e^y}. \]
Therefore, $f_Y(y) = e^y e^{-e^y}$ for $y \in \mathbb{R}$.
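Example 3.46. Let $X \sim \text{Unif}(-1, 3)$. Find the pdf of $Y = X^2$. The density of $X$ is
\[ f_X(x) = \begin{cases} 1/4 & \text{if } -1 < x < 3 \\ 0 & \text{otherwise.} \end{cases} \]
$Y$ can only take values in $(0, 9)$. Consider two cases: (i) $0 < y < 1$ and (ii) $1 \le y < 9$. For case (i), $A_y = [-\sqrt{y}, \sqrt{y}]$ and $F_Y(y) = \int_{A_y} f_X(x)\,dx = (1/2)\sqrt{y}$. For case (ii), $A_y = [-1, \sqrt{y}]$ and $F_Y(y) = \int_{A_y} f_X(x)\,dx = (1/4)(\sqrt{y} + 1)$. Differentiating $F$ we get
\[ f_Y(y) = \begin{cases} \frac{1}{4\sqrt{y}} & \text{if } 0 < y < 1 \\ \frac{1}{8\sqrt{y}} & \text{if } 1 < y < 9 \\ 0 & \text{otherwise.} \end{cases} \]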
When $r$ is strictly monotone increasing or strictly monotone decreasing then $r$ has an inverse $s = r^{-1}$ and in this case one can show that
\[ f_Y(y) = f_X(s(y)) \left| \frac{ds(y)}{dy} \right|. \qquad (3.11) \]
Transformations of Several Random Variables

In some cases we are interested in the transformation of several random variables. For example, if $X$ and $Y$ are given random variables, we might want to know the distribution of $X/Y$, $X + Y$, $\max\{X, Y\}$ or $\min\{X, Y\}$. Let $Z = r(X, Y)$ be the function of interest. The steps for finding $f_Z$ are the same as before:

1. For each $z$, find the set $A_z = \{(x,y) : r(x,y) \le z\}$.
2. Find the cdf
\[ F_Z(z) = P(Z \le z) = P(r(X,Y) \le z) = P(\{(x,y);\ r(x,y) \le z\}) = \int\!\!\int_{A_z} f_{X,Y}(x,y)\,dx\,dy. \]
3. Then $f_Z(z) = F_Z'(z)$.
Example 3.47. Let $X_1, X_2 \sim \text{Unif}(0,1)$ be independent. Find the density of $Y = X_1 + X_2$. The joint density of $(X_1, X_2)$ is
\[ f(x_1, x_2) = \begin{cases} 1 & \text{if } 0 < x_1 < 1,\ 0 < x_2 < 1 \\ 0 & \text{otherwise.} \end{cases} \]
Let $r(x_1, x_2) = x_1 + x_2$. Now,
\[ F_Y(y) = P(Y \le y) = P(r(X_1, X_2) \le y) = P(\{(x_1, x_2) : r(x_1, x_2) \le y\}) = \int\!\!\int_{A_y} f(x_1, x_2)\,dx_1\,dx_2. \]
Now comes the hard part: finding $A_y$. First suppose that $0 < y \le 1$. Then $A_y$ is the triangle with vertices $(0,0)$, $(y,0)$ and $(0,y)$. See Figure 3.6. In this case, $\int\!\!\int_{A_y} f(x_1, x_2)\,dx_1\,dx_2$ is the area of this triangle, which is $y^2/2$. If $1 < y < 2$ then $A_y$ is everything in the unit square except the triangle with vertices $(1, y-1)$, $(1,1)$, $(y-1, 1)$. This set has area $1 - (2-y)^2/2$. Therefore,
\[ F_Y(y) = \begin{cases} 0 & y < 0 \\ \frac{y^2}{2} & 0 \le y < 1 \\ 1 - \frac{(2-y)^2}{2} & 1 \le y < 2 \\ 1 & y \ge 2. \end{cases} \]
By differentiation, the pdf is
\[ f_Y(y) = \begin{cases} y & 0 \le y \le 1 \\ 2 - y & 1 \le y \le 2 \\ 0 & \text{otherwise.} \end{cases} \]

Figure 3.6: The set $A_y$ for Example 3.47. Left panel: the case $0 \le y \le 1$, a triangle with vertices $(0,0)$, $(y,0)$, $(0,y)$. Right panel: the case $1 < y \le 2$, the unit square minus the triangle with vertices $(1, y-1)$, $(1,1)$, $(y-1,1)$.
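A quick simulation can be used to sanity-check the piecewise cdf of the sum of two independent Uniform(0,1) variables, namely $y^2/2$ on $[0,1)$ and $1 - (2-y)^2/2$ on $[1,2)$. The sketch below is illustrative only; the seed, sample size, and evaluation points are arbitrary choices.

```python
import random

def cdf_sum(y):
    """cdf of Y = X1 + X2 for independent Uniform(0, 1) variables."""
    if y < 0:
        return 0.0
    if y < 1:
        return y * y / 2
    if y < 2:
        return 1 - (2 - y) ** 2 / 2
    return 1.0

rng = random.Random(1)
draws = [rng.random() + rng.random() for _ in range(100_000)]

# Empirical cdf at a few points, to compare with cdf_sum.
checks = {y: sum(d <= y for d in draws) / len(draws) for y in (0.5, 1.0, 1.5)}
```

Each empirical value in `checks` should be close to `cdf_sum(y)` at the same point, for instance about 0.125 at $y = 0.5$ and about 0.5 at $y = 1$.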
Technical Appendix

Recall that a probability measure $P$ is defined on a $\sigma$-field $\mathcal{A}$ of a sample space $\Omega$. A random variable $X$ is a measurable map $X : \Omega \to \mathbb{R}$. Measurable means that, for every $x$, $\{\omega : X(\omega) \le x\} \in \mathcal{A}$.
Exercises

1. Show that
\[ P(X = x) = F(x^+) - F(x^-) \]
and
\[ F(x_2) - F(x_1) = P(X \le x_2) - P(X \le x_1). \]

2. Let $X$ be such that $P(X = 2) = P(X = 3) = 1/10$ and $P(X = 5) = 8/10$. Plot the cdf $F$. Use $F$ to find $P(2 < X \le 4.8)$ and $P(2 \le X \le 4.8)$.

3. Prove Lemma 3.15.

4. Let $X$ have probability density function
\[ f_X(x) = \begin{cases} 1/4 & 0 < x < 1 \\ 3/8 & 3 < x < 5 \\ 0 & \text{otherwise.} \end{cases} \]
(a) Find the cumulative distribution function of $X$.
(b) Let $Y = 1/X$. Find the probability density function $f_Y(y)$ for $Y$. Hint: Consider three cases: $1/5 \le y \le 1/3$, $1/3 \le y \le 1$, and $y \ge 1$.

5. Let $X$ and $Y$ be discrete random variables. Show that $X$ and $Y$ are independent if and only if $f_{X,Y}(x,y) = f_X(x) f_Y(y)$ for all $x$ and $y$.

6. Let $X$ have distribution $F$ and density function $f$ and let $A$ be a subset of the real line. Let $I_A(x)$ be the indicator function for $A$:
\[ I_A(x) = \begin{cases} 1 & x \in A \\ 0 & x \notin A. \end{cases} \]
Let $Y = I_A(X)$. Find an expression for the cumulative distribution of $Y$. (Hint: first find the probability mass function for $Y$.)

7. Let $X$ and $Y$ be independent and suppose that each has a Uniform(0,1) distribution. Let $Z = \min\{X, Y\}$. Find the density $f_Z(z)$ for $Z$. Hint: It might be easier to first find $P(Z > z)$.

8. Let $X$ have cdf $F$. Find the cdf of $X^+ = \max\{0, X\}$.

9. Let $X \sim \text{Exp}(\beta)$. Find $F(x)$ and $F^{-1}(q)$.

10. Let $X$ and $Y$ be independent. Show that $g(X)$ is independent of $h(Y)$ where $g$ and $h$ are functions.

11. Suppose we toss a coin once and let $p$ be the probability of heads. Let $X$ denote the number of heads and let $Y$ denote the number of tails.
(a) Prove that $X$ and $Y$ are dependent.
(b) Let $N \sim \text{Poisson}(\lambda)$ and suppose we toss a coin $N$ times. Let $X$ and $Y$ be the number of heads and tails. Show that $X$ and $Y$ are independent.

12. Prove Theorem 3.33.

13. Let $X \sim N(0,1)$ and let $Y = e^X$.
(a) Find the pdf for $Y$. Plot it.
(b) (Computer Experiment.) Generate a vector $x = (x_1, \ldots, x_n)$ consisting of $n = 10{,}000$ random standard Normals. Let $y = (y_1, \ldots, y_n)$ where $y_i = e^{x_i}$. Draw a histogram of $y$ and compare it to the pdf you found in part (a).

14. Let $(X, Y)$ be uniformly distributed on the unit disc $\{(x,y) : x^2 + y^2 \le 1\}$. Let $R = \sqrt{X^2 + Y^2}$. Find the cdf and pdf of $R$.

15. (A universal random number generator.) Let $X$ have a continuous, strictly increasing cdf $F$. Let $Y = F(X)$. Find the density of $Y$. This is called the probability integral transform. Now let $U \sim \text{Uniform}(0,1)$ and let $X = F^{-1}(U)$. Show that $X \sim F$. Now write a program that takes Uniform(0,1) random variables and generates random variables from an Exponential($\beta$) distribution.

16. Let $X \sim \text{Poisson}(\lambda)$ and $Y \sim \text{Poisson}(\mu)$ and assume that $X$ and $Y$ are independent. Show that the distribution of $X$ given that $X + Y = n$ is Binomial$(n, \pi)$ where $\pi = \lambda/(\lambda + \mu)$.
Hint 1: You may use the following fact: If $X \sim \text{Poisson}(\lambda)$ and $Y \sim \text{Poisson}(\mu)$, and $X$ and $Y$ are independent, then $X + Y \sim \text{Poisson}(\mu + \lambda)$.
Hint 2: Note that $\{X = x, X + Y = n\} = \{X = x, Y = n - x\}$.

17. Let
\[ f_{X,Y}(x,y) = \begin{cases} c(x + y^2) & 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases} \]
Find $P(X < 1/2 \mid Y = 1/2)$.

18. Let $X \sim N(3, 16)$. Solve the following using the Normal table and using a computer package.
(a) Find $P(X < 7)$.
(b) Find $P(X > -2)$.
(c) Find $x$ such that $P(X > x) = .05$.
(d) Find $P(0 \le X < 4)$.
(e) Find $x$ such that $P(|X| > |x|) = .05$.

19. Prove formula (3.11).

20. Let $X, Y \sim \text{Unif}(0,1)$ be independent. Find the pdf for $X - Y$ and $X/Y$.

21. Let $X_1, \ldots, X_n \sim \text{Exp}(\beta)$ be iid. Let $Y = \max\{X_1, \ldots, X_n\}$. Find the pdf of $Y$. Hint: $Y \le y$ if and only if $X_i \le y$ for $i = 1, \ldots, n$.

22. Let $X$ and $Y$ be random variables. Suppose that $E(Y \mid X) = X$. Show that $\text{Cov}(X, Y) = V(X)$.

23. Let $X \sim \text{Uniform}(0,1)$. Let $0 < a < b < 1$. Let
\[ Y = \begin{cases} 1 & 0 < x < b \\ 0 & \text{otherwise} \end{cases} \]
and let
\[ Z = \begin{cases} 1 & a < x < 1 \\ 0 & \text{otherwise.} \end{cases} \]
(a) Are $Y$ and $Z$ independent? Why/Why not?
(b) (10 points) Find $E(Y \mid Z)$. Hint: What values $z$ can $Z$ take? Now find $E(Y \mid Z = z)$.
Expectation of a Random Variable

The expectation (or mean) of a random variable $X$ is the average value of $X$. The formal definition is as follows.

Definition. The expected value, or mean, or first moment, of $X$ is defined to be
\[ E(X) = \int x\,dF(x) = \begin{cases} \sum_x x f(x) & \text{if } X \text{ is discrete} \\ \int x f(x)\,dx & \text{if } X \text{ is continuous} \end{cases} \qquad (4.1) \]
assuming that the sum (or integral) is well-defined. We use the following notation to denote the expected value of $X$:
\[ E(X) = EX = \int x\,dF(x) = \mu = \mu_X. \qquad (4.2) \]

The expectation is a one-number summary of the distribution. Think of $E(X)$ as the average value you would obtain if you computed the numerical average $n^{-1} \sum_{i=1}^n X_i$ of a large number of iid draws $X_1, \ldots, X_n$. The fact that $E(X) \approx n^{-1} \sum_{i=1}^n X_i$ is actually more than a heuristic; it is a theorem called the law of large numbers that we will discuss later. The notation $\int x\,dF(x)$ deserves some comment. We use it merely as a convenient unifying notation so we don't have to write $\sum_x x f(x)$ for discrete random variables and $\int x f(x)\,dx$ for continuous random variables, but you should be aware that $\int x\,dF(x)$ has a precise meaning that is discussed in real analysis courses.

To ensure that $E(X)$ is well defined, we say that $E(X)$ exists if $\int_x |x|\,dF_X(x) < \infty$. Otherwise we say that the expectation does not exist.
Example. Let $X \sim \text{Bernoulli}(p)$. Then $E(X) = \sum_{x=0}^{1} x f(x) = (0 \times (1 - p)) + (1 \times p) = p$.

Example. Flip a fair coin two times. Let $X$ be the number of heads. Then, $E(X) = \int x\,dF_X(x) = \sum_x x f_X(x) = (0 \times f(0)) + (1 \times f(1)) + (2 \times f(2)) = (0 \times (1/4)) + (1 \times (1/2)) + (2 \times (1/4)) = 1$.

Example. Let $X \sim \text{Unif}(-1, 3)$. Then, $E(X) = \int x\,dF_X(x) = \int x f_X(x)\,dx = \frac{1}{4} \int_{-1}^{3} x\,dx = 1$.
Example. Recall that a random variable has a Cauchy distribution if it has density $f_X(x) = \{\pi(1 + x^2)\}^{-1}$. Using integration by parts (set $u = x$ and $v = \tan^{-1} x$),
\[ \int |x|\,dF(x) = \frac{2}{\pi} \int_0^\infty \frac{x\,dx}{1 + x^2} = \frac{2}{\pi} \left( \left[x \tan^{-1}(x)\right]_0^\infty - \int_0^\infty \tan^{-1} x\,dx \right) = \infty \]
so the mean does not exist. (More directly, $\frac{2}{\pi} \int_0^\infty \frac{x}{1+x^2}\,dx = \frac{1}{\pi} \left[\log(1 + x^2)\right]_0^\infty = \infty$.) If you simulate a Cauchy distribution many times and take the average, you will see that the average never settles down. This is because the Cauchy has thick tails and hence extreme observations are common.
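The claim about Cauchy averages never settling down can be explored with a short simulation. This sketch is not from the original text: it draws standard Cauchy values by the inverse-cdf method, $\tan(\pi(U - 1/2))$ with $U \sim \text{Uniform}(0,1)$, and compares running averages with those of standard Normal draws. The seed, the sample size and the name `running_means` are arbitrary.

```python
import math
import random

def running_means(draws):
    """Return the averages of the first 1, 2, ..., n draws."""
    means, total = [], 0.0
    for i, d in enumerate(draws, start=1):
        total += d
        means.append(total / i)
    return means

rng = random.Random(0)
# Inverse-cdf sampling: tan(pi * (U - 1/2)) is standard Cauchy for U ~ Uniform(0, 1).
cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(10_000)]
normal = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

cauchy_means = running_means(cauchy)
normal_means = running_means(normal)
```

Plotting `cauchy_means` next to `normal_means` (with any plotting tool) typically shows the Normal averages shrinking toward 0 while the Cauchy averages keep jumping whenever an extreme draw arrives.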
From now on, whenever we discuss expectations, we implicitly assume that they exist.

Let $Y = r(X)$. How do we compute $E(Y)$? One way is to find $f_Y(y)$ and then compute $E(Y) = \int y f_Y(y)\,dy$. But there is an easier way.

Theorem (The Rule of the Lazy Statistician). Let $Y = r(X)$. Then
\[ E(Y) = E(r(X)) = \int r(x)\,dF_X(x). \qquad (4.3) \]
This result makes intuitive sense. Think of playing a game where we draw $X$ at random and then I pay you $Y = r(X)$. Your average income is $r(x)$ times the chance that $X = x$, summed (or integrated) over all values of $x$. Here is a special case. Let $A$ be an event and let $r(x) = I_A(x)$ where $I_A(x) = 1$ if $x \in A$ and $I_A(x) = 0$ if $x \notin A$. Then
\[ E(I_A(X)) = \int I_A(x) f_X(x)\,dx = \int_A f_X(x)\,dx = P(X \in A). \]
In other words, probability is a special case of expectation.
Example. Let $X \sim \text{Unif}(0,1)$. Let $Y = r(X) = e^X$. Then,
\[ E(Y) = \int_0^1 e^x f(x)\,dx = \int_0^1 e^x\,dx = e - 1. \]
Alternatively, you could find $f_Y(y)$, which turns out to be $f_Y(y) = 1/y$ for $1 < y < e$. Then, $E(Y) = \int_1^e y f(y)\,dy = e - 1$.

Example. Take a stick of unit length and break it at random. Let $Y$ be the length of the longer piece. What is the mean of $Y$? If $X$ is the break point then $X \sim \text{Unif}(0,1)$ and $Y = r(X) = \max\{X, 1 - X\}$. Thus, $r(x) = 1 - x$ when $0 < x < 1/2$ and $r(x) = x$ when $1/2 \le x < 1$. Hence,
\[ E(Y) = \int r(x)\,dF(x) = \int_0^{1/2} (1 - x)\,dx + \int_{1/2}^{1} x\,dx = \frac{3}{4}. \]
Functions of several variables are handled in a similar way. If $Z = r(X, Y)$ then
\[ E(Z) = E(r(X,Y)) = \int\!\!\int r(x,y)\,dF(x,y). \qquad (4.4) \]

Example. Let $(X, Y)$ have a jointly uniform distribution on the unit square. Let $Z = r(X, Y) = X^2 + Y^2$. Then,
\[ E(Z) = \int\!\!\int r(x,y)\,dF(x,y) = \int_0^1 \int_0^1 (x^2 + y^2)\,dx\,dy = \int_0^1 x^2\,dx + \int_0^1 y^2\,dy = \frac{2}{3}. \]

The $k$th moment of $X$ is defined to be $E(X^k)$ assuming that $E(|X|^k) < \infty$. We shall rarely make much use of moments beyond $k = 2$.
Properties of Expectations

Theorem. If $X_1, \ldots, X_n$ are random variables and $a_1, \ldots, a_n$ are constants, then
\[ E\left(\sum_i a_i X_i\right) = \sum_i a_i E(X_i). \qquad (4.5) \]

Example. Let $X \sim \text{Binomial}(n, p)$. What is the mean of $X$? We could try to appeal to the definition:
\[ E(X) = \int x\,dF_X(x) = \sum_x x f_X(x) = \sum_{x=0}^{n} x \binom{n}{x} p^x (1 - p)^{n - x} \]
but this is not an easy sum to evaluate. Instead, note that $X = \sum_{i=1}^n X_i$ where $X_i = 1$ if the $i$th toss is heads and $X_i = 0$ otherwise. Then $E(X_i) = (p \times 1) + ((1 - p) \times 0) = p$ and $E(X) = E(\sum_i X_i) = \sum_i E(X_i) = np$.

Theorem. Let $X_1, \ldots, X_n$ be independent random variables. Then,
\[ E\left(\prod_{i=1}^n X_i\right) = \prod_i E(X_i). \qquad (4.6) \]

Notice that the summation rule does not require independence but the multiplication rule does.

Variance and Covariance

The variance measures the "spread" of a distribution.

(Note: we can't use $E(X - \mu)$ as a measure of spread since $E(X - \mu) = E(X) - \mu = \mu - \mu = 0$. We can and sometimes do use $E|X - \mu|$ as a measure of spread but more often we use the variance.)
Definition. Let $X$ be a random variable with mean $\mu$. The variance of $X$, denoted by $\sigma^2$ or $\sigma_X^2$ or $V(X)$ or $VX$, is defined by
\[ \sigma^2 = E(X - \mu)^2 = \int (x - \mu)^2\,dF(x) \qquad (4.7) \]
assuming this expectation exists. The standard deviation is $\text{sd}(X) = \sqrt{V(X)}$ and is also denoted by $\sigma$ and $\sigma_X$.

Theorem. Assuming the variance is well defined, it has the following properties:
1. $V(X) = E(X^2) - \mu^2$.
2. If $a$ and $b$ are constants then $V(aX + b) = a^2 V(X)$.
3. If $X_1, \ldots, X_n$ are independent and $a_1, \ldots, a_n$ are constants then
\[ V\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 V(X_i). \qquad (4.8) \]
Example. Let $X \sim \text{Binomial}(n, p)$. We write $X = \sum_i X_i$ where $X_i = 1$ if toss $i$ is heads and $X_i = 0$ otherwise. Then $X = \sum_i X_i$ and the random variables are independent. Also, $P(X_i = 1) = p$ and $P(X_i = 0) = 1 - p$. Recall that
\[ E(X_i) = (p \times 1) + ((1 - p) \times 0) = p. \]
Now,
\[ E(X_i^2) = (p \times 1^2) + ((1 - p) \times 0^2) = p. \]
Therefore, $V(X_i) = E(X_i^2) - p^2 = p - p^2 = p(1 - p)$. Finally, $V(X) = V(\sum_i X_i) = \sum_i V(X_i) = \sum_i p(1 - p) = np(1 - p)$. Notice that $V(X) = 0$ if $p = 1$ or $p = 0$. Make sure you see why this makes intuitive sense.
Gf @
ò
h ¯ íÀ ò h ó
SU¿Ap þ [
ò óô ñ
KWQAaîId?A@ -uUãAA quoR ILOåfG@
À ò h ? h ^ ß
^ ¯
Sq¿ApvÀ [
ó
í
?÷À
ò
ò
óô ñ
Áz- ürz 4ð «oh
ñâø ßZßZß ø h ò ¬D õUõ¶ö Â ª ¨ Ë «Ý!¯ÚÎS¶h ó [ ^ ¯ Sqh ó [JÑ uDª
ÎS h [+¯Ý ø S h [+¯ í ^ KWQAa ÎS ^ [+¯ ^ ß
ò
ò
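For concreteness, here is a direct transcription of the sample mean and the sample variance (with the $n - 1$ divisor) into code. The data values are made up for illustration, and the function names are hypothetical.

```python
def sample_mean(xs):
    """Average of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_bar = sample_mean(data)
s2 = sample_variance(data)
```

Python's standard library offers the same quantities as `statistics.mean` and `statistics.variance`; the hand-written versions are shown only to mirror the formulas.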
If $X$ and $Y$ are random variables, then the covariance and correlation between $X$ and $Y$ measure how strong the linear relationship is between $X$ and $Y$.

Definition. Let $X$ and $Y$ be random variables with means $\mu_X$ and $\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$. Define the covariance between $X$ and $Y$ by
\[ \text{Cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right] \qquad (4.11) \]
and the correlation by
\[ \rho = \rho_{X,Y} = \rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}. \qquad (4.12) \]

Theorem. The covariance satisfies:
\[ \text{Cov}(X, Y) = E(XY) - E(X) E(Y). \]
The correlation satisfies:
\[ -1 \le \rho(X, Y) \le 1. \]
If $Y = a + bX$ for some constants $a$ and $b$ then $\rho(X, Y) = 1$ if $b > 0$ and $\rho(X, Y) = -1$ if $b < 0$. If $X$ and $Y$ are independent, then $\text{Cov}(X, Y) = \rho = 0$. The converse is not true in general.

Theorem. $V(X + Y) = V(X) + V(Y) + 2\,\text{Cov}(X, Y)$ and $V(X - Y) = V(X) + V(Y) - 2\,\text{Cov}(X, Y)$. More generally, for random variables $X_1, \ldots, X_n$,
\[ V\left(\sum_i a_i X_i\right) = \sum_i a_i^2 V(X_i) + 2 \sum\!\sum_{i < j} a_i a_j\,\text{Cov}(X_i, X_j). \]
Expectation and Variance of Important Random Variables

Here we record the expectation of some important random variables.

Distribution                          Mean                        Variance
Point mass at $a$                     $a$                         $0$
Bernoulli($p$)                        $p$                         $p(1-p)$
Binomial($n, p$)                      $np$                        $np(1-p)$
Geometric($p$)                        $1/p$                       $(1-p)/p^2$
Poisson($\lambda$)                    $\lambda$                   $\lambda$
Uniform($a, b$)                       $(a+b)/2$                   $(b-a)^2/12$
Normal($\mu, \sigma^2$)               $\mu$                       $\sigma^2$
Exponential($\beta$)                  $\beta$                     $\beta^2$
Gamma($\alpha, \beta$)                $\alpha\beta$               $\alpha\beta^2$
Beta($\alpha, \beta$)                 $\alpha/(\alpha+\beta)$     $\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))$
$t_\nu$                               $0$ (if $\nu > 1$)          $\nu/(\nu-2)$ (if $\nu > 2$)
$\chi^2_p$                            $p$                         $2p$
Multinomial($n, p$)                   $np$                        see below
Multivariate Normal($\mu, \Sigma$)    $\mu$                       $\Sigma$

We derived $E(X)$ and $V(X)$ for the Binomial in the previous section. The calculations for some of the others are in the exercises.
The last two entries in the table are multivariate models which involve a random vector $X$ of the form
\[ X = \begin{pmatrix} X_1 \\ \vdots \\ X_k \end{pmatrix}. \]
The mean of a random vector $X$ is defined by
\[ \mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_k \end{pmatrix} = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_k) \end{pmatrix}. \]
The variance-covariance matrix $\Sigma$ is defined to be
\[ V(X) = \begin{pmatrix} V(X_1) & \text{Cov}(X_1, X_2) & \cdots & \text{Cov}(X_1, X_k) \\ \text{Cov}(X_2, X_1) & V(X_2) & \cdots & \text{Cov}(X_2, X_k) \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}(X_k, X_1) & \text{Cov}(X_k, X_2) & \cdots & V(X_k) \end{pmatrix}. \]
If $X \sim \text{Multinomial}(n, p)$ then $E(X) = np = n(p_1, \ldots, p_k)$ and
\[ V(X) = \begin{pmatrix} np_1(1 - p_1) & -np_1 p_2 & \cdots & -np_1 p_k \\ -np_2 p_1 & np_2(1 - p_2) & \cdots & -np_2 p_k \\ \vdots & \vdots & \ddots & \vdots \\ -np_k p_1 & -np_k p_2 & \cdots & np_k(1 - p_k) \end{pmatrix}. \]
To see this, note that the marginal distribution of any one component of the vector is binomial, that is, $X_i \sim \text{Binomial}(n, p_i)$. Thus, $E(X_i) = np_i$ and $V(X_i) = np_i(1 - p_i)$. Note that $X_i + X_j \sim \text{Binomial}(n, p_i + p_j)$. Thus, $V(X_i + X_j) = n(p_i + p_j)(1 - [p_i + p_j])$. On the other hand, using the formula for the variance of a sum, we have that $V(X_i + X_j) = V(X_i) + V(X_j) + 2\,\text{Cov}(X_i, X_j) = np_i(1 - p_i) + np_j(1 - p_j) + 2\,\text{Cov}(X_i, X_j)$. If you equate this formula with $n(p_i + p_j)(1 - [p_i + p_j])$ and solve, one gets $\text{Cov}(X_i, X_j) = -np_i p_j$.
t»NeQAKWgegvè #?A@JVd@ÇNekoKägP@-X/X`KìId?uKMIoHZKMQ4fG@Çmnkd@J^qmng^qORVtuQnaANeQAläX/@ZKWQAkoKWQnabMKWVdNeKWQAHJ@-k
OW^geNPQA@ZKWVìH-ORX:fnNeQuKMIdNeORQnkëOW^X:mAgPIdNPbMKWVÕNüKMId@¼VdKWQAaAORXbR@-HDILORVÕk-p
Mrz - -,4u3 2 m ¤ Ú¦e§ Â Ø Ã× « Â ª ¨!h ¦e§ Â Â ª ¨ ÅjØ Ã× « Í+¦¶« Å Â ª Ý Â ª ¨
Ø Â ¦ Â ªs×Ã ) « uDª ÎS \h®[;¯ £Ý Â ª ¨ S \h®[;¯ ) GwÑ m ¤ ³¦e§ Â Å Â « ¦ Ô « uDª
ÎS äh®[;¯ YÝ Â ª ¨ S ¼h®[;¯ ) (Ñ
*
#*
*
,
8.
:.
.
.
.
*
.
,+
+M c
b w
4.5 Conditional Expectation
Suppose that $X$ and $Y$ are random variables. What is the mean of $X$ among those times when $Y = y$? The answer is that we compute the mean of $X$ as before but we substitute $f_{X|Y}(x|y)$ for $f_X(x)$ in the definition of expectation.
Definition 4.21 The conditional expectation of $X$ given $Y = y$ is
$$\mathbb{E}(X \mid Y = y) = \begin{cases} \sum x\, f_{X|Y}(x|y) & \text{discrete case} \\ \int x\, f_{X|Y}(x|y)\,dx & \text{continuous case.} \end{cases} \qquad (4.13)$$
If $r(x, y)$ is a function of $x$ and $y$ then
$$\mathbb{E}(r(X, Y) \mid Y = y) = \begin{cases} \sum r(x, y)\, f_{X|Y}(x|y) & \text{discrete case} \\ \int r(x, y)\, f_{X|Y}(x|y)\,dx & \text{continuous case.} \end{cases} \qquad (4.14)$$
Whereas $\mathbb{E}(X)$ is a number, $\mathbb{E}(X \mid Y = y)$ is a function of $y$. Before we observe $Y$, we don't know the value of $\mathbb{E}(X \mid Y = y)$ so it is a random variable which we denote $\mathbb{E}(X \mid Y)$. In other words, $\mathbb{E}(X \mid Y)$ is the random variable whose value is $\mathbb{E}(X \mid Y = y)$ when $Y = y$. Similarly, $\mathbb{E}(r(X, Y) \mid Y)$ is the random variable whose value is $\mathbb{E}(r(X, Y) \mid Y = y)$ when $Y = y$. This is a very confusing point so let us look at an example.
Example 4.22 Suppose we draw $X \sim \text{Unif}(0, 1)$. After we observe $X = x$, we draw $Y \mid X = x \sim \text{Unif}(x, 1)$. Intuitively, we expect that $\mathbb{E}(Y \mid X = x) = (1 + x)/2$. In fact, $f_{Y|X}(y|x) = 1/(1-x)$ for $x < y < 1$ and
$$\mathbb{E}(Y \mid X = x) = \int_x^1 y\, f_{Y|X}(y|x)\,dy = \frac{1}{1-x} \int_x^1 y\,dy = \frac{1+x}{2}$$
as expected. Thus, $\mathbb{E}(Y \mid X) = (1 + X)/2$. Notice that $\mathbb{E}(Y \mid X) = (1 + X)/2$ is a random variable whose value is the number $\mathbb{E}(Y \mid X = x) = (1 + x)/2$ once $X = x$ is observed. ■
Theorem 4.23 (The rule of iterated expectations.) For random variables $X$ and $Y$, assuming the expectations exist, we have that
$$\mathbb{E}[\mathbb{E}(Y \mid X)] = \mathbb{E}(Y) \quad \text{and} \quad \mathbb{E}[\mathbb{E}(X \mid Y)] = \mathbb{E}(X). \qquad (4.15)$$
More generally, for any function $r(x, y)$ we have
$$\mathbb{E}[\mathbb{E}(r(X, Y) \mid X)] = \mathbb{E}(r(X, Y)) \quad \text{and} \quad \mathbb{E}[\mathbb{E}(r(X, Y) \mid Y)] = \mathbb{E}(r(X, Y)). \qquad (4.16)$$
Proof. We'll prove the first equation. Using the definition of conditional expectation and the fact that $f(x, y) = f(x) f(y|x)$,
$$\mathbb{E}[\mathbb{E}(Y \mid X)] = \int \mathbb{E}(Y \mid X = x) f_X(x)\,dx = \int\!\!\int y\, f(y|x)\,dy\, f(x)\,dx = \int\!\!\int y\, f(y|x) f(x)\,dx\,dy = \int\!\!\int y\, f(x, y)\,dx\,dy = \mathbb{E}(Y). \; ■$$
Example 4.24 Consider Example 4.22. How can we compute $\mathbb{E}(Y)$? One method is to find the joint density $f(x, y)$ and then compute $\mathbb{E}(Y) = \int\!\!\int y\, f(x, y)\,dx\,dy$. An easier way is to do this in two steps. First, we already know that $\mathbb{E}(Y \mid X) = (1 + X)/2$. Thus,
$$\mathbb{E}(Y) = \mathbb{E}\,\mathbb{E}(Y \mid X) = \mathbb{E}\left(\frac{1 + X}{2}\right) = \frac{1 + \mathbb{E}(X)}{2} = \frac{1 + (1/2)}{2} = 3/4. \; ■$$
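The two-stage draw (first $X \sim \text{Unif}(0,1)$, then $Y \mid X = x \sim \text{Unif}(x, 1)$) is easy to simulate, which gives a rough numerical check of $\mathbb{E}(Y \mid X = x) = (1+x)/2$ and $\mathbb{E}(Y) = 3/4$. The following is a minimal sketch; the window-averaging helper `cond_mean` and its half-width are illustrative assumptions, not part of the text.

```python
import random

rng = random.Random(1)
n = 200_000

# Two-stage sampling: X ~ Unif(0,1), then Y | X = x ~ Unif(x, 1).
pairs = []
for _ in range(n):
    x = rng.random()
    y = rng.uniform(x, 1.0)
    pairs.append((x, y))

# Unconditional mean of Y; iterated expectations gives 3/4.
mean_y = sum(y for _, y in pairs) / n

def cond_mean(x0, half_width=0.02):
    """Crude estimate of E(Y | X = x0): average Y over draws with X near x0."""
    ys = [y for x, y in pairs if abs(x - x0) < half_width]
    return sum(ys) / len(ys)

cm = cond_mean(0.2)   # theory: (1 + 0.2) / 2 = 0.6
print(f"E(Y) ~ {mean_y:.3f},  E(Y | X ~ 0.2) ~ {cm:.3f}")
```

Averaging over a narrow window around $x_0$ is only an approximation to conditioning on $X = x_0$; here it is harmless because $(1+x)/2$ is linear in $x$.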
Definition 4.25 The conditional variance is defined as
$$\mathbb{V}(Y \mid X = x) = \int (y - \mu(x))^2 f(y|x)\,dy \qquad (4.17)$$
where $\mu(x) = \mathbb{E}(Y \mid X = x)$.
Theorem 4.26 For random variables $X$ and $Y$,
$$\mathbb{V}(Y) = \mathbb{E}\,\mathbb{V}(Y \mid X) + \mathbb{V}\,\mathbb{E}(Y \mid X).$$
Example 4.27 Draw a county at random from the United States. Then draw $n$ people at random from the county. Let $X$ be the number of those people who have a certain disease. If $Q$ denotes the proportion of people in that county with the disease then $Q$ is also a random variable since it varies from county to county. Given $Q = q$, we have that $X \sim \text{Binomial}(n, q)$. Thus, $\mathbb{E}(X \mid Q = q) = nq$ and $\mathbb{V}(X \mid Q = q) = nq(1-q)$. Suppose that the random variable $Q$ has a Uniform(0,1) distribution. Then, $\mathbb{E}(X) = \mathbb{E}\,\mathbb{E}(X \mid Q) = \mathbb{E}(nQ) = n\mathbb{E}(Q) = n/2$. Let us compute the variance of $X$. Now, $\mathbb{V}(X) = \mathbb{E}\,\mathbb{V}(X \mid Q) + \mathbb{V}\,\mathbb{E}(X \mid Q)$. Let's compute these two terms. First, $\mathbb{E}\,\mathbb{V}(X \mid Q) = \mathbb{E}[nQ(1-Q)] = n\mathbb{E}(Q(1-Q)) = n \int q(1-q) f(q)\,dq = n \int_0^1 q(1-q)\,dq = n/6$. Next, $\mathbb{V}\,\mathbb{E}(X \mid Q) = \mathbb{V}(nQ) = n^2 \mathbb{V}(Q) = n^2 \int (q - (1/2))^2\,dq = n^2/12$. Hence, $\mathbb{V}(X) = (n/6) + (n^2/12)$. ■
4.6 Technical Appendix
4.6.1 Expectation as an Integral
The integral of a measurable function $r(x)$ is defined as follows. First suppose that $r$ is simple, meaning that it takes finitely many values $a_1, \ldots, a_k$ over a partition $A_1, \ldots, A_k$. Then $\int r(x)\,dF(x) = \sum_{i=1}^k a_i\, \mathbb{P}(r(X) \in A_i)$. The integral of a positive measurable function $r$ is defined by $\int r(x)\,dF(x) = \lim_i \int r_i(x)\,dF(x)$ where $r_i$ is a sequence of simple functions such that $r_i(x) \le r(x)$ and $r_i(x) \to r(x)$ as $i \to \infty$. This does not depend on the particular sequence. The integral of a measurable function $r$ is defined to be $\int r(x)\,dF(x) = \int r^+(x)\,dF(x) - \int r^-(x)\,dF(x)$ assuming both integrals are finite, where $r^+(x) = \max\{r(x), 0\}$ and $r^-(x) = -\min\{r(x), 0\}$.
4.6.2 Moment Generating Functions
Definition 4.28 The moment generating function (mgf), or Laplace transform, of $X$ is defined by
$$\psi_X(t) = \mathbb{E}(e^{tX}) = \int e^{tx}\,dF(x)$$
where $t$ varies over the real numbers.
In what follows, we assume that the mgf is well defined for all $t$ in a small neighborhood of 0. A related function is the characteristic function, defined by $\mathbb{E}(e^{itX})$ where $i = \sqrt{-1}$. This function is always well defined for all $t$. The mgf is useful for several reasons. First, it helps us compute the moments of a distribution. Second, it helps us find the distribution of sums of random variables. Third, it is used to prove the central limit theorem which we discuss later.
When the mgf is well defined, it can be shown that we can interchange the operations of differentiation and "taking expectation." This leads to
$$\psi'(0) = \left[\frac{d}{dt} \mathbb{E}\, e^{tX}\right]_{t=0} = \mathbb{E}\left[\frac{d}{dt} e^{tX}\right]_{t=0} = \mathbb{E}\left[X e^{tX}\right]_{t=0} = \mathbb{E}(X).$$
By taking $k$ derivatives we conclude that $\psi^{(k)}(0) = \mathbb{E}(X^k)$. This gives us a method for computing the moments of a distribution.
Example 4.29 Let $X \sim \text{Exp}(1)$. For any $t < 1$,
$$\psi_X(t) = \mathbb{E}\, e^{tX} = \int_0^\infty e^{tx} e^{-x}\,dx = \int_0^\infty e^{(t-1)x}\,dx = \frac{1}{1-t}.$$
The integral is divergent if $t \ge 1$. So, $\psi_X(t) = 1/(1-t)$ for all $t < 1$. Now, $\psi'(0) = 1$ and $\psi''(0) = 2$. Hence, $\mathbb{E}(X) = 1$ and $\mathbb{V}(X) = \mathbb{E}(X^2) - \mu^2 = 2 - 1 = 1$. ■
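The Exp(1) calculation can be checked numerically: finite differences of $\psi(t) = 1/(1-t)$ at $t = 0$ approximate the first two moments, and a Monte Carlo average of $e^{tX}$ approximates $\psi(t)$ itself. This is only a sketch; the step size `h`, the evaluation point `t = 0.25` and the sample size are assumptions chosen for the illustration.

```python
import math
import random

def psi(t):
    """mgf of Exp(1); the closed form 1/(1-t) is valid for t < 1."""
    return 1.0 / (1.0 - t)

h = 1e-3
first = (psi(h) - psi(-h)) / (2 * h)                 # approximates psi'(0) = E(X)
second = (psi(h) - 2 * psi(0.0) + psi(-h)) / h ** 2  # approximates psi''(0) = E(X^2)
variance = second - first ** 2                       # V(X) = E(X^2) - E(X)^2

# Monte Carlo estimate of E(exp(tX)) at a point where exp(tX) has finite variance.
rng = random.Random(3)
t = 0.25
n = 100_000
mc = sum(math.exp(t * rng.expovariate(1.0)) for _ in range(n)) / n

print(f"psi'(0) ~ {first:.5f}, psi''(0) ~ {second:.5f}, variance ~ {variance:.5f}")
print(f"Monte Carlo psi({t}) ~ {mc:.4f}, closed form {psi(t):.4f}")
```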
Lemma 4.30 Properties of the mgf.
(1) If $Y = aX + b$ then $\psi_Y(t) = e^{bt} \psi_X(at)$.
(2) If $X_1, \ldots, X_n$ are independent and $Y = \sum_i X_i$ then $\psi_Y(t) = \prod_i \psi_i(t)$ where $\psi_i$ is the mgf of $X_i$.
Example 4.31 Let $X \sim \text{Binomial}(n, p)$. As before we know that $X = \sum_{i=1}^n X_i$ where $\mathbb{P}(X_i = 1) = p$ and $\mathbb{P}(X_i = 0) = 1 - p$. Now $\psi_i(t) = \mathbb{E}\, e^{X_i t} = (p \times e^t) + ((1-p) \times 1) = pe^t + q$ where $q = 1 - p$. Thus, $\psi_X(t) = \prod_i \psi_i(t) = (pe^t + q)^n$. ■
Theorem 4.32 Let $X$ and $Y$ be random variables. If $\psi_X(t) = \psi_Y(t)$ for all $t$ in an open interval around 0, then $X \stackrel{d}{=} Y$.
Example 4.33 Let $X_1 \sim \text{Binomial}(n_1, p)$ and $X_2 \sim \text{Binomial}(n_2, p)$ be independent. Let $Y = X_1 + X_2$. Now
$$\psi_Y(t) = \psi_1(t)\psi_2(t) = (pe^t + q)^{n_1} (pe^t + q)^{n_2} = (pe^t + q)^{n_1 + n_2}$$
and we recognize the latter as the mgf of a Binomial($n_1 + n_2, p$) distribution. Since the mgf characterizes the distribution (i.e. there can't be another random variable which has the same mgf) we conclude that $Y \sim \text{Binomial}(n_1 + n_2, p)$. ■

Moment Generating Function for Some Common Distributions

Distribution               mgf
Bernoulli($p$)             $pe^t + (1-p)$
Binomial($n, p$)           $(pe^t + (1-p))^n$
Poisson($\lambda$)         $e^{\lambda(e^t - 1)}$
Normal($\mu, \sigma^2$)    $\exp\{\mu t + \sigma^2 t^2/2\}$
Gamma($\alpha, \beta$)     $\left(\frac{1}{1 - \beta t}\right)^{\alpha}$ for $t < 1/\beta$

4.7 Exercises
1. Suppose we play a game where we start with $c$ dollars. On each play of the game you either double or halve your money, with equal probability. What is your expected fortune after $n$ trials?
2. Show that $\mathbb{V}(X) = 0$ if and only if there is a constant $c$ such that $\mathbb{P}(X = c) = 1$.
3. Let $X_1, \ldots, X_n \sim \text{Uniform}(0, 1)$ and let $Y_n = \max\{X_1, \ldots, X_n\}$. Find $\mathbb{E}(Y_n)$.
4. A particle starts at the origin of the real line and moves along the line in jumps of one unit. For each jump the probability is $p$ that the particle will jump one unit to the left and the probability is $1 - p$ that the particle will jump one unit to the right. Let $X_n$ be the position of the particle after $n$ units. Find $\mathbb{E}(X_n)$ and $\mathbb{V}(X_n)$. (This is known as a random walk.)
5. A fair coin is tossed until a head is obtained. What is the expected number of tosses that will be required?
6. Prove Theorem 4.6 for discrete random variables.
7. Let $X$ be a continuous random variable with cdf $F$. Suppose that $\mathbb{P}(X > 0) = 1$ and that $\mathbb{E}(X)$ exists. Show that $\mathbb{E}(X) = \int_0^\infty \mathbb{P}(X > x)\,dx$. Hint: Consider integrating by parts. The following fact is helpful: if $\mathbb{E}(X)$ exists then $\lim_{x \to \infty} x[1 - F(x)] = 0$.
8. Prove Theorem 4.16.
9. (Computer Experiment.) Let $X_1, X_2, \ldots, X_n$ be $N(0, 1)$ random variables and let $\overline{X}_n = n^{-1} \sum_{i=1}^n X_i$. Plot $\overline{X}_n$ versus $n$ for $n = 1, \ldots, 10{,}000$. Repeat for $X_1, X_2, \ldots, X_n \sim$ Cauchy. Explain why there is such a difference.
10. Let $X \sim N(0, 1)$ and let $Y = e^X$. Find $\mathbb{E}(Y)$ and $\mathbb{V}(Y)$.
11. (Computer Experiment: Simulating the Stock Market.) Let $Y_1, Y_2, \ldots$ be independent random variables such that $\mathbb{P}(Y_i = 1) = \mathbb{P}(Y_i = -1) = 1/2$. Let $X_n = \sum_{i=1}^n Y_i$. Think of $Y_i = 1$ as "the stock price increased by one dollar", $Y_i = -1$ as "the stock price decreased by one dollar" and $X_n$ as the value of the stock on day $n$.
(a) Find $\mathbb{E}(X_n)$ and $\mathbb{V}(X_n)$.
(b) Simulate $X_n$ and plot $X_n$ versus $n$ for $n = 1, 2, \ldots, 10{,}000$. Repeat the whole simulation several times. Notice two things. First, it's easy to "see" patterns in the sequence even though it is random. Second, you will find that the four runs look very different even though they were generated the same way. How do the calculations in (a) explain the second observation?
12. Prove the formulas given in the table at the beginning of Section 4.4 for the Bernoulli, Poisson, Uniform, Exponential, Gamma and Beta. Here are some hints. For the mean of the Poisson, use the fact that $e^a = \sum_{x=0}^\infty a^x/x!$. To compute the variance, first compute $\mathbb{E}(X(X-1))$. For the mean of the Gamma, it will help to multiply and divide by $\Gamma(\alpha + 1)/\beta^{\alpha+1}$ and use the fact that a Gamma density integrates to 1. For the Beta, multiply and divide by $\Gamma(\alpha + 1)\Gamma(\beta)/\Gamma(\alpha + \beta + 1)$.
13. Suppose we generate a random variable $X$ in the following way. First we flip a fair coin. If the coin is heads, take $X$ to have a Unif(0,1) distribution. If the coin is tails, take $X$ to have a Unif(3,4) distribution.
(a) Find the mean of $X$.
(b) Find the standard deviation of $X$.
14. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be random variables and let $a_1, \ldots, a_m$ and $b_1, \ldots, b_n$ be constants. Show that
$$\mathrm{Cov}\left(\sum_{i=1}^m a_i X_i, \sum_{j=1}^n b_j Y_j\right) = \sum_{i=1}^m \sum_{j=1}^n a_i b_j\, \mathrm{Cov}(X_i, Y_j).$$
15. Let
$$f_{X,Y}(x, y) = \begin{cases} \frac{1}{3}(x + y) & 0 \le x \le 1,\ 0 \le y \le 2 \\ 0 & \text{otherwise.} \end{cases}$$
Find $\mathbb{V}(2X - 3Y + 8)$.
16. Let $r(x)$ be a function of $x$ and let $s(y)$ be a function of $y$. Show that
$$\mathbb{E}(r(X)s(Y) \mid X) = r(X)\,\mathbb{E}(s(Y) \mid X).$$
Also, show that $\mathbb{E}(r(X) \mid X) = r(X)$.
17. Prove that
$$\mathbb{V}(Y) = \mathbb{E}\,\mathbb{V}(Y \mid X) + \mathbb{V}\,\mathbb{E}(Y \mid X).$$
Hint: Let $m = \mathbb{E}(Y)$ and let $b(x) = \mathbb{E}(Y \mid X = x)$. Note that $\mathbb{E}(b(X)) = \mathbb{E}\,\mathbb{E}(Y \mid X) = \mathbb{E}(Y) = m$. Bear in mind that $b$ is a function of $x$. Now write $\mathbb{V}(Y) = \mathbb{E}(Y - m)^2 = \mathbb{E}((Y - b(X)) + (b(X) - m))^2$. Expand the square and take the expectation. You then have to take the expectation of three terms. In each case, use the rule of the iterated expectation: i.e. $\mathbb{E}(\text{stuff}) = \mathbb{E}(\mathbb{E}(\text{stuff} \mid X))$.
18. Show that if $\mathbb{E}(X \mid Y = y) = c$ for some constant $c$ then $X$ and $Y$ are uncorrelated.
19. This question is to help you understand the idea of a sampling distribution. Let $X_1, \ldots, X_n$ be iid with mean $\mu$ and variance $\sigma^2$. Let $\overline{X}_n = n^{-1} \sum_{i=1}^n X_i$. Then $\overline{X}_n$ is a statistic, that is, a function of the data. Since $\overline{X}_n$ is a random variable, it has a distribution. This distribution is called the sampling distribution of the statistic. Recall from Theorem 4.16 that $\mathbb{E}(\overline{X}_n) = \mu$ and $\mathbb{V}(\overline{X}_n) = \sigma^2/n$. Don't confuse the distribution of the data $f_X$ and the distribution of the statistic $f_{\overline{X}_n}$. To make this clear, let $X_1, \ldots, X_n \sim \text{Uniform}(0, 1)$. Let $f_X$ be the density of the Uniform(0,1). Plot $f_X$. Now let $\overline{X}_n = n^{-1} \sum_{i=1}^n X_i$. Find $\mathbb{E}(\overline{X}_n)$ and $\mathbb{V}(\overline{X}_n)$. Plot them as a function of $n$. Comment. Now simulate the distribution of $\overline{X}_n$ for $n = 1, 5, 25, 100$. Check that the simulated values of $\mathbb{E}(\overline{X}_n)$ and $\mathbb{V}(\overline{X}_n)$ agree with your theoretical calculations. What do you notice about the sampling distribution of $\overline{X}_n$ as $n$ increases?
20. Prove Lemma 4.20.
5.1 Markov and Chebyshev Inequalities
Inequalities are useful for bounding quantities that might otherwise be hard to compute. They will also be used in the theory of convergence which is discussed in the next chapter. Our first inequality is Markov's inequality.
Theorem 5.1 (Markov's inequality.) Let $X$ be a non-negative random variable and suppose that $\mathbb{E}(X)$ exists. For any $t > 0$,
$$\mathbb{P}(X > t) \le \frac{\mathbb{E}(X)}{t}. \qquad (5.1)$$
Proof.
$$\mathbb{E}(X) = \int_0^\infty x f(x)\,dx = \int_0^t x f(x)\,dx + \int_t^\infty x f(x)\,dx \ge \int_t^\infty x f(x)\,dx \ge t \int_t^\infty f(x)\,dx = t\, \mathbb{P}(X > t). \; ■$$
Theorem 5.2 (Chebyshev's inequality.) Let $\mu = \mathbb{E}(X)$ and $\sigma^2 = \mathbb{V}(X)$. Then,
$$\mathbb{P}(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2} \quad \text{and} \quad \mathbb{P}(|Z| \ge k) \le \frac{1}{k^2} \qquad (5.2)$$
where $Z = (X - \mu)/\sigma$. In particular, $\mathbb{P}(|Z| > 2) \le 1/4$ and $\mathbb{P}(|Z| > 3) \le 1/9$.
Proof. We use Markov's inequality to conclude that
$$\mathbb{P}(|X - \mu| \ge t) = \mathbb{P}(|X - \mu|^2 \ge t^2) \le \frac{\mathbb{E}(X - \mu)^2}{t^2} = \frac{\sigma^2}{t^2}.$$
The second part follows by setting $t = k\sigma$. ■
Example 5.3 Suppose we test a prediction method, a neural net for example, on a set of $n$ new test cases. Let $X_i = 1$ if the predictor is wrong and $X_i = 0$ if the predictor is right. Then $\overline{X}_n = n^{-1} \sum_{i=1}^n X_i$ is the observed error rate. Each $X_i$ may be regarded as a Bernoulli with unknown mean $p$. We would like to know the true, but unknown, error rate $p$. Intuitively, we expect that $\overline{X}_n$ should be close to $p$. How likely is $\overline{X}_n$ to not be within $\epsilon$ of $p$? We have that $\mathbb{V}(\overline{X}_n) = \mathbb{V}(X_1)/n = p(1-p)/n$ and
$$\mathbb{P}(|\overline{X}_n - p| > \epsilon) \le \frac{\mathbb{V}(\overline{X}_n)}{\epsilon^2} = \frac{p(1-p)}{n\epsilon^2} \le \frac{1}{4n\epsilon^2}$$
since $p(1-p) \le 1/4$ for all $p$. For $\epsilon = .2$ and $n = 100$ the bound is .0625. ■
5.2 Hoeffding's Inequality
Hoeffding's inequality is similar in spirit to Markov's inequality but it is a sharper inequality. We present the result here in two parts. The proofs are in the technical appendix.
Theorem 5.4 Let $Y_1, \ldots, Y_n$ be independent observations such that $\mathbb{E}(Y_i) = 0$ and $a_i \le Y_i \le b_i$. Let $\epsilon > 0$. Then, for any $t > 0$,
$$\mathbb{P}\left(\sum_{i=1}^n Y_i \ge \epsilon\right) \le e^{-t\epsilon} \prod_{i=1}^n e^{t^2 (b_i - a_i)^2/8}. \qquad (5.3)$$
Theorem 5.5 Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$. Then, for any $\epsilon > 0$,
$$\mathbb{P}(|\overline{X}_n - p| > \epsilon) \le 2e^{-2n\epsilon^2} \qquad (5.4)$$
where $\overline{X}_n = n^{-1} \sum_{i=1}^n X_i$.
Example 5.6 Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$. Let $n = 100$ and $\epsilon = .2$. We saw that Chebyshev's inequality yielded
$$\mathbb{P}(|\overline{X}_n - p| > \epsilon) \le .0625.$$
According to Hoeffding's inequality,
$$\mathbb{P}(|\overline{X}_n - p| > .2) \le 2e^{-2(100)(.2)^2} = .00067$$
which is much smaller than .0625. ■
Hoeffding's inequality gives us a simple way to create a confidence interval for a binomial parameter $p$. We will discuss confidence intervals later but here is the basic idea. Fix $\alpha > 0$ and let
$$\epsilon_n = \left\{ \frac{1}{2n} \log\left(\frac{2}{\alpha}\right) \right\}^{1/2}.$$
By Hoeffding's inequality,
$$\mathbb{P}(|\overline{X}_n - p| > \epsilon_n) \le 2e^{-2n\epsilon_n^2} = \alpha.$$
Let $C = (\overline{X}_n - \epsilon_n, \overline{X}_n + \epsilon_n)$. Then, $\mathbb{P}(C \not\ni p) = \mathbb{P}(|\overline{X}_n - p| > \epsilon_n) \le \alpha$. Hence, $\mathbb{P}(p \in C) \ge 1 - \alpha$, that is, the random interval $C$ traps the true parameter value $p$ with probability at least $1 - \alpha$; we call $C$ a $1 - \alpha$ confidence interval. More on this later.
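The Chebyshev and Hoeffding bounds for a Bernoulli sample mean, and the half-width $\epsilon_n$ of the Hoeffding confidence interval, are simple enough to compute directly. The sketch below uses hypothetical helper names (`chebyshev_bound`, `hoeffding_bound`, `hoeffding_half_width`) and evaluates them for $n = 100$, $\epsilon = 0.2$ and $\alpha = 0.05$.

```python
import math

def chebyshev_bound(n, eps):
    """Bound 1/(4 n eps^2) on P(|mean - p| > eps), using p(1-p) <= 1/4."""
    return min(1.0, 1.0 / (4.0 * n * eps ** 2))

def hoeffding_bound(n, eps):
    """Bound 2 exp(-2 n eps^2) on P(|mean - p| > eps)."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2))

def hoeffding_half_width(n, alpha):
    """eps_n = sqrt(log(2/alpha) / (2n)), chosen so the Hoeffding bound equals alpha."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

cheb = chebyshev_bound(100, 0.2)
hoef = hoeffding_bound(100, 0.2)
eps_n = hoeffding_half_width(100, 0.05)

print(f"Chebyshev bound: {cheb:.4f}")
print(f"Hoeffding bound: {hoef:.5f}")
print(f"half-width for n=100, alpha=0.05: {eps_n:.4f}")
```

Plugging `eps_n` back into `hoeffding_bound` returns $\alpha$, which is exactly the identity $2e^{-2n\epsilon_n^2} = \alpha$ used to build the interval.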
5.3 Cauchy-Schwarz and Jensen Inequalities
This section contains two inequalities on expected values that are often useful.
Theorem 5.7 (Cauchy-Schwarz inequality.) If $X$ and $Y$ have finite variances then
$$\mathbb{E}|XY| \le \sqrt{\mathbb{E}(X^2)\,\mathbb{E}(Y^2)}. \qquad (5.5)$$
Recall that a function $g$ is convex if for each $x, y$ and each $\alpha \in [0, 1]$,
$$g(\alpha x + (1 - \alpha) y) \le \alpha g(x) + (1 - \alpha) g(y).$$
If $g$ is twice differentiable, then convexity reduces to checking that $g''(x) \ge 0$ for all $x$. It can be shown that if $g$ is convex then it lies above any line that touches $g$ at some point, called a tangent line. A function $g$ is concave if $-g$ is convex. Examples of convex functions are $g(x) = x^2$ and $g(x) = e^x$. Examples of concave functions are $g(x) = -x^2$ and $g(x) = \log x$.
Theorem 5.8 (Jensen's inequality.) If $g$ is convex then
$$\mathbb{E}\, g(X) \ge g(\mathbb{E} X). \qquad (5.6)$$
If $g$ is concave then
$$\mathbb{E}\, g(X) \le g(\mathbb{E} X). \qquad (5.7)$$
Proof. Let $L(x) = a + bx$ be a line, tangent to $g(x)$ at the point $\mathbb{E}(X)$. Since $g$ is convex, it lies above the line $L(x)$. So,
$$\mathbb{E}\, g(X) \ge \mathbb{E}\, L(X) = \mathbb{E}(a + bX) = a + b\,\mathbb{E}(X) = L(\mathbb{E}(X)) = g(\mathbb{E} X). \; ■$$
From Jensen's inequality we see that $\mathbb{E} X^2 \ge (\mathbb{E} X)^2$ and $\mathbb{E}(1/X) \ge 1/\mathbb{E}(X)$. Since log is concave, $\mathbb{E}(\log X) \le \log \mathbb{E}(X)$. For example, suppose that $X \sim N(3, 1)$. Then $\mathbb{E}(1/X) \ge 1/3$.
5.4 Technical Appendix: Proof of Hoeffding's Inequality
We will make use of the exact form of Taylor's theorem: if $g$ is a smooth function, then there is a number $\xi \in (0, u)$ such that $g(u) = g(0) + u g'(0) + \frac{u^2}{2} g''(\xi)$.
Proof of Theorem 5.4. For any $t > 0$, we have, from Markov's inequality, that
$$\mathbb{P}\left(\sum_{i=1}^n Y_i \ge \epsilon\right) = \mathbb{P}\left(t \sum_{i=1}^n Y_i \ge t\epsilon\right) = \mathbb{P}\left(e^{t \sum_{i=1}^n Y_i} \ge e^{t\epsilon}\right) \le e^{-t\epsilon}\, \mathbb{E}\left(e^{t \sum_{i=1}^n Y_i}\right) = e^{-t\epsilon} \prod_i \mathbb{E}(e^{tY_i}). \qquad (5.8)$$
Since $a_i \le Y_i \le b_i$, we can write $Y_i$ as a convex combination of $a_i$ and $b_i$, namely, $Y_i = \alpha b_i + (1 - \alpha) a_i$ where $\alpha = (Y_i - a_i)/(b_i - a_i)$. So, by the convexity of $e^{ty}$ we have
$$e^{tY_i} \le \frac{Y_i - a_i}{b_i - a_i} e^{tb_i} + \frac{b_i - Y_i}{b_i - a_i} e^{ta_i}.$$
Take expectations of both sides and use the fact that $\mathbb{E}(Y_i) = 0$ to get
$$\mathbb{E}\, e^{tY_i} \le -\frac{a_i}{b_i - a_i} e^{tb_i} + \frac{b_i}{b_i - a_i} e^{ta_i} = e^{g(u)} \qquad (5.9)$$
where $u = t(b_i - a_i)$, $g(u) = -\gamma u + \log(1 - \gamma + \gamma e^u)$ and $\gamma = -a_i/(b_i - a_i)$. Note that $g(0) = g'(0) = 0$. Also, $g''(u) \le 1/4$ for all $u > 0$. By Taylor's theorem, there is a $\xi \in (0, u)$ such that
$$g(u) = g(0) + u g'(0) + \frac{u^2}{2} g''(\xi) = \frac{u^2}{2} g''(\xi) \le \frac{u^2}{8} = \frac{t^2 (b_i - a_i)^2}{8}.$$
Hence,
$$\mathbb{E}\, e^{tY_i} \le e^{g(u)} \le e^{t^2 (b_i - a_i)^2/8}.$$
The result follows from (5.8). ■
Proof of Theorem 5.5. Let $Y_i = (1/n)(X_i - p)$. Then $\mathbb{E}(Y_i) = 0$ and $a \le Y_i \le b$ where $a = -p/n$ and $b = (1 - p)/n$. Also, $(b - a)^2 = 1/n^2$. Applying the last theorem we get
$$\mathbb{P}(\overline{X}_n - p > \epsilon) = \mathbb{P}\left(\sum_i Y_i > \epsilon\right) \le e^{-t\epsilon} e^{t^2/(8n)}.$$
The above holds for any $t > 0$. In particular, take $t = 4n\epsilon$ and we get $\mathbb{P}(\overline{X}_n - p > \epsilon) \le e^{-2n\epsilon^2}$. By a similar argument we can show that $\mathbb{P}(\overline{X}_n - p < -\epsilon) \le e^{-2n\epsilon^2}$. Putting these together we get $\mathbb{P}(|\overline{X}_n - p| > \epsilon) \le 2e^{-2n\epsilon^2}$. ■
5.5 Bibliographic Remarks
An excellent reference on probability inequalities and their use in statistics and pattern recognition is Devroye, Györfi and Lugosi (1996). The proof of Hoeffding's inequality is from that text.
5.6 Exercises
1. Let $X \sim \text{Exponential}(\beta)$. Find $\mathbb{P}(|X - \mu_X| \ge k\sigma_X)$ for $k > 1$. Compare this to the bound you get from Chebyshev's inequality.
2. Let $X \sim \text{Poisson}(\lambda)$. Use Chebyshev's inequality to show that $\mathbb{P}(X \ge 2\lambda) \le 1/\lambda$.
3. Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$ and $\overline{X}_n = n^{-1} \sum_{i=1}^n X_i$. Bound $\mathbb{P}(|\overline{X}_n - p| > \epsilon)$ using Chebyshev's inequality and using Hoeffding's inequality. Show that, when $n$ is large, the bound from Hoeffding's inequality is smaller than the bound from Chebyshev's inequality.
4. Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$.
(a) Let $\alpha > 0$ be fixed and define
$$\epsilon_n = \sqrt{\frac{1}{2n} \log\left(\frac{2}{\alpha}\right)}.$$
Let $\hat{p}_n = n^{-1} \sum_{i=1}^n X_i$. Define $C_n = (\hat{p}_n - \epsilon_n, \hat{p}_n + \epsilon_n)$. Use Hoeffding's inequality to show that
$$\mathbb{P}(C_n \text{ contains } p) \ge 1 - \alpha.$$
We call $C_n$ a $1 - \alpha$ confidence interval for $p$. In practice, we truncate the interval so it does not go below 0 or above 1.
(b) (Computer Experiment.) Let's examine the properties of this confidence interval. Let $\alpha = 0.05$ and $p = 0.4$. Conduct a simulation study to see how often the interval contains $p$ (called the coverage). Do this for various values of $n$ between 1 and 10000. Plot the coverage versus $n$.
(c) Plot the length of the interval versus $n$. Suppose we want the length of the interval to be no more than .05. How large should $n$ be?
6
Convergence of Random Variables
6.1 Introduction
The most important aspect of probability theory concerns
the behavior of sequences of random variables. This part of
probability is called large sample theory or limit theory
or asymptotic theory. This material is extremely important
for statistical inference. The basic question is this: what can we
say about the limiting behavior of a sequence of random variables X1 , X2 , X3 , . . .? Since statistics and data mining are all
about gathering data, we will naturally be interested in what
happens as we gather more and more data.
In calculus we say that a sequence of real numbers $x_n$ converges to a limit $x$ if, for every $\epsilon > 0$, $|x_n - x| < \epsilon$ for all large $n$. In probability, convergence is more subtle. Going back to calculus for a moment, suppose that $x_n = x$ for all $n$. Then, trivially, $\lim_n x_n = x$. Consider a probabilistic version of this example. Suppose that $X_1, X_2, \ldots$ is a sequence of independent random variables, each with a $N(0, 1)$ distribution. Since these all have the same distribution, we are tempted to say that $X_n$ "converges" to $X \sim N(0, 1)$. But this can't quite be right since $\mathbb{P}(X_n = X) = 0$ for all $n$. (Two independent continuous random variables are equal with probability zero.)
Here is another example. Consider $X_1, X_2, \ldots$ where $X_n \sim N(0, 1/n)$. Intuitively, $X_n$ is very concentrated around 0 for large $n$. But $\mathbb{P}(X_n = 0) = 0$ for all $n$. This chapter develops appropriate methods of discussing convergence of random variables.
There are two main ideas in this chapter:
1. The law of large numbers says that the sample average $\overline{X}_n = n^{-1} \sum_{i=1}^n X_i$ converges in probability to the expectation $\mu = \mathbb{E}(X_i)$.
2. The central limit theorem says that the sample average has approximately a Normal distribution for large $n$. More precisely, $\sqrt{n}(\overline{X}_n - \mu)$ converges in distribution to a Normal$(0, \sigma^2)$ distribution, where $\sigma^2 = \mathbb{V}(X_i)$.
6.2 Types of Convergence
The two main types of convergence are defined as follows.
Definition 6.1 Let $X_1, X_2, \ldots$ be a sequence of random variables and let $X$ be another random variable. Let $F_n$ denote the cdf of $X_n$ and let $F$ denote the cdf of $X$.
1. $X_n$ converges to $X$ in probability, written $X_n \xrightarrow{P} X$, if, for every $\epsilon > 0$,
$$\mathbb{P}(|X_n - X| > \epsilon) \to 0 \qquad (6.1)$$
as $n \to \infty$.
2. $X_n$ converges to $X$ in distribution, written $X_n \rightsquigarrow X$, if
$$\lim_{n \to \infty} F_n(t) = F(t) \qquad (6.2)$$
at all $t$ for which $F$ is continuous.
There is another type of convergence which we introduce
mainly because it is useful for proving convergence in probability.
Definition 6.2 $X_n$ converges to $X$ in quadratic mean (also called convergence in $L_2$), written $X_n \xrightarrow{qm} X$, if
$$\mathbb{E}(X_n - X)^2 \to 0 \qquad (6.3)$$
as $n \to \infty$.
If $X$ is a point mass at $c$, that is, $\mathbb{P}(X = c) = 1$, we write $X_n \xrightarrow{qm} c$ instead of $X_n \xrightarrow{qm} X$. Similarly, we write $X_n \xrightarrow{P} c$ and $X_n \rightsquigarrow c$.
Example 6.3 Let $X_n \sim N(0, 1/n)$. Intuitively, $X_n$ is concentrating at 0 so we would like to say that $X_n \rightsquigarrow 0$. Let's see if this is true. Let $F$ be the distribution function for a point mass at 0. Note that $\sqrt{n} X_n \sim N(0, 1)$. Let $Z$ denote a standard normal random variable. For $t < 0$, $F_n(t) = \mathbb{P}(X_n < t) = \mathbb{P}(\sqrt{n} X_n < \sqrt{n} t) = \mathbb{P}(Z < \sqrt{n} t) \to 0$ since $\sqrt{n} t \to -\infty$. For $t > 0$, $F_n(t) = \mathbb{P}(X_n < t) = \mathbb{P}(\sqrt{n} X_n < \sqrt{n} t) = \mathbb{P}(Z < \sqrt{n} t) \to 1$ since $\sqrt{n} t \to \infty$. Hence, $F_n(t) \to F(t)$ for all $t \ne 0$ and so $X_n \rightsquigarrow 0$. But notice that $F_n(0) = 1/2 \ne F(0) = 1$ so convergence fails at $t = 0$. But that doesn't matter because $t = 0$ is not a continuity point of $F$ and the definition of convergence in distribution only requires convergence at continuity points. ■
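The behaviour of $F_n$ in this example can be tabulated exactly, since $F_n(t) = \Phi(\sqrt{n}\, t)$ where $\Phi$ is the standard normal cdf. The sketch below writes $\Phi$ in terms of `math.erf`; the function names `Phi` and `F_n` and the chosen values of $t$ and $n$ are illustrative assumptions.

```python
import math

def Phi(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def F_n(t, n):
    """cdf of X_n ~ N(0, 1/n) evaluated at t."""
    return Phi(math.sqrt(n) * t)

# Columns: n, F_n(-0.1), F_n(0), F_n(0.1)
table = [(n, F_n(-0.1, n), F_n(0.0, n), F_n(0.1, n)) for n in (1, 10, 100, 10_000)]
for n, left, mid, right in table:
    print(f"n={n:6d}  F_n(-0.1)={left:.6f}  F_n(0)={mid:.1f}  F_n(0.1)={right:.6f}")
```

The left column tends to 0 and the right column to 1, matching the point-mass cdf away from 0, while the middle column stays at 1/2 for every $n$.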
The next theorem gives the relationship between the types of
convergence. The results are summarized in Figure 6.1.
Theorem 6.4 The following relationships hold:
(a) $X_n \xrightarrow{qm} X$ implies that $X_n \xrightarrow{P} X$.
(b) $X_n \xrightarrow{P} X$ implies that $X_n \rightsquigarrow X$.
(c) If $X_n \rightsquigarrow X$ and if $\mathbb{P}(X = c) = 1$ for some real number $c$, then $X_n \xrightarrow{P} X$.
In general, none of the reverse implications hold except the special case in (c).
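Proof. We start by proving (a). Suppose that $X_n \xrightarrow{qm} X$. Fix $\epsilon > 0$. Then, using Markov's inequality,
$$\mathbb{P}(|X_n - X| > \epsilon) = \mathbb{P}(|X_n - X|^2 > \epsilon^2) \le \frac{\mathbb{E}|X_n - X|^2}{\epsilon^2} \to 0.$$
Proof of (b). This proof is a little more complicated. You may skip it if you wish. Fix $\epsilon > 0$ and let $x$ be a continuity point of $F$. Then
$$F_n(x) = \mathbb{P}(X_n \le x) = \mathbb{P}(X_n \le x, X \le x + \epsilon) + \mathbb{P}(X_n \le x, X > x + \epsilon)$$
$$\le \mathbb{P}(X \le x + \epsilon) + \mathbb{P}(|X_n - X| > \epsilon) = F(x + \epsilon) + \mathbb{P}(|X_n - X| > \epsilon).$$
Also,
$$F(x - \epsilon) = \mathbb{P}(X \le x - \epsilon) = \mathbb{P}(X \le x - \epsilon, X_n \le x) + \mathbb{P}(X \le x - \epsilon, X_n > x) \le F_n(x) + \mathbb{P}(|X_n - X| > \epsilon).$$
Hence,
$$F(x - \epsilon) - \mathbb{P}(|X_n - X| > \epsilon) \le F_n(x) \le F(x + \epsilon) + \mathbb{P}(|X_n - X| > \epsilon).$$
Take the limit as $n \to \infty$ to conclude that
$$F(x - \epsilon) \le \liminf_{n \to \infty} F_n(x) \le \limsup_{n \to \infty} F_n(x) \le F(x + \epsilon).$$
This holds for all $\epsilon > 0$. Take the limit as $\epsilon \to 0$ and use the fact that $F$ is continuous at $x$ to conclude that $\lim_n F_n(x) = F(x)$.
Proof of (c). Fix $\epsilon > 0$. Then,
$$\mathbb{P}(|X_n - c| > \epsilon) = \mathbb{P}(X_n < c - \epsilon) + \mathbb{P}(X_n > c + \epsilon)$$
$$\le \mathbb{P}(X_n \le c - \epsilon) + \mathbb{P}(X_n > c + \epsilon)$$
$$= F_n(c - \epsilon) + 1 - F_n(c + \epsilon)$$
$$\to F(c - \epsilon) + 1 - F(c + \epsilon)$$
$$= 0 + 1 - 1 = 0.$$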
Let us now show that the reverse implications do not hold.
Convergence in probability does not imply convergence in quadratic mean. Let $U \sim \text{Unif}(0, 1)$ and let $X_n = \sqrt{n}\, I_{(0, 1/n)}(U)$. Then, for $n > \epsilon^2$, $\mathbb{P}(|X_n| > \epsilon) = \mathbb{P}(\sqrt{n}\, I_{(0, 1/n)}(U) > \epsilon) = \mathbb{P}(0 \le U < 1/n) = 1/n \to 0$. Hence, $X_n \xrightarrow{P} 0$. But $\mathbb{E}(X_n^2) = n \int_0^{1/n} du = 1$ for all $n$ so $X_n$ does not converge in quadratic mean.
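This counterexample is easy to see in a simulation: the fraction of draws with $|X_n| > \epsilon$ shrinks like $1/n$ while the average of $X_n^2$ stays near 1. The following sketch assumes $\epsilon = 0.5$, two values of $n$, and a sample size chosen only for illustration.

```python
import math
import random

def x_n(u, n):
    """X_n = sqrt(n) * I(0 < u < 1/n) for a Uniform(0,1) draw u."""
    return math.sqrt(n) if 0.0 < u < 1.0 / n else 0.0

rng = random.Random(4)
N = 200_000
eps = 0.5
summary = {}
for n in (10, 100):
    draws = [x_n(rng.random(), n) for _ in range(N)]
    tail = sum(1 for x in draws if abs(x) > eps) / N   # estimates P(|X_n| > eps) = 1/n
    second_moment = sum(x * x for x in draws) / N      # estimates E(X_n^2) = 1
    summary[n] = (tail, second_moment)
    print(f"n={n:4d}  P(|X_n| > {eps}) ~ {tail:.4f}  E(X_n^2) ~ {second_moment:.3f}")
```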
Convergence in distribution does not imply convergence in probability. Let X ∼ N (0, 1). Let Xn = −X
for n = 1, 2, 3, . . .; hence Xn ∼ N (0, 1). Xn has the same distribution function as X for all n so, trivially, limn Fn (x) = F (x)
d
for all x. Therefore, Xn → X. But P(|Xn − X| > ²) = P(|2X| >
²) = P(|X| > ²/2) 6= 0. So Xn does not tend to X in probability.
¥
FIGURE 6.1. Relationship between types of convergence. [Diagram omitted; its node labels are: quadratic mean, probability, distribution, point-mass distribution.]
Warning! One might conjecture that if $X_n \xrightarrow{P} b$ then $E(X_n) \to b$. This is not true. (We can conclude that $E(X_n) \to b$ if $X_n$ is uniformly integrable. See the technical appendix.) Let $X_n$ be a random variable defined by $P(X_n = n^2) = 1/n$ and $P(X_n = 0) = 1 - (1/n)$. Now, $P(|X_n| < \epsilon) = P(X_n = 0) = 1 - (1/n) \to 1$. Hence, $X_n \xrightarrow{P} 0$. However, $E(X_n) = [n^2 \times (1/n)] + [0 \times (1 - (1/n))] = n$. Thus, $E(X_n) \to \infty$.
Summary. Stare at Figure 6.1.
Some convergence properties are preserved under transformations.
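Theorem 6.5 Let $X_n$, $X$, $Y_n$, $Y$ be random variables. Let $g$ be a continuous function.
(a) If $X_n \xrightarrow{P} X$ and $Y_n \xrightarrow{P} Y$, then $X_n + Y_n \xrightarrow{P} X + Y$.
(b) If $X_n \xrightarrow{qm} X$ and $Y_n \xrightarrow{qm} Y$, then $X_n + Y_n \xrightarrow{qm} X + Y$.
(c) If $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow c$, then $X_n + Y_n \rightsquigarrow X + c$.
(d) If $X_n \xrightarrow{P} X$ and $Y_n \xrightarrow{P} Y$, then $X_n Y_n \xrightarrow{P} XY$.
(e) If $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow c$, then $X_n Y_n \rightsquigarrow cX$.
(f) If $X_n \xrightarrow{P} X$ then $g(X_n) \xrightarrow{P} g(X)$.
(g) If $X_n \rightsquigarrow X$ then $g(X_n) \rightsquigarrow g(X)$.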
6.3 The Law of Large Numbers
Now we come to a crowning achievement in probability, the law
of large numbers. This theorem says that the mean of a large
sample is close to the mean of the distribution. For example,
the proportion of heads of a large number of tosses is expected
to be close to 1/2. We now make this more precise.
Let $X_1, X_2, \ldots$ be an iid sample and let $\mu = E(X_1)$ and $\sigma^2 = V(X_1)$. (Note that $\mu = E(X_i)$ is the same for all $i$ so we can define $\mu = E(X_i)$ for any $i$. By convention, we often write $\mu = E(X_1)$.) Recall that the sample mean is defined as $\overline{X}_n = n^{-1} \sum_{i=1}^n X_i$ and that $E(\overline{X}_n) = \mu$ and $V(\overline{X}_n) = \sigma^2/n$.
Theorem 6.6 (The Weak Law of Large Numbers (WLLN).) If $X_1, \ldots, X_n$ are iid, then $\overline{X}_n \xrightarrow{P} \mu$.
Interpretation of WLLN: The distribution of $\overline{X}_n$ becomes more concentrated around $\mu$ as $n$ gets large.
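Proof. Assume that $\sigma < \infty$. This is not necessary but it simplifies the proof. Using Chebyshev's inequality,
$$P\left(|\overline{X}_n - \mu| > \epsilon\right) \le \frac{V(\overline{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$$
which tends to 0 as $n \to \infty$. ■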
There is a stronger theorem in the appendix called the strong law of large numbers.
Example 6.7 Consider flipping a coin for which the probability of heads is $p$. Let $X_i$ denote the outcome of a single toss (0 or 1). Hence, $p = P(X_i = 1) = E(X_i)$. The fraction of heads after $n$ tosses is $\overline{X}_n$. According to the law of large numbers, $\overline{X}_n$ converges to $p$ in probability. This does not mean that $\overline{X}_n$ will numerically equal $p$. It means that, when $n$ is large, the distribution of $\overline{X}_n$ is tightly concentrated around $p$. Suppose that $p = 1/2$. How large should $n$ be so that $P(.4 \le \overline{X}_n \le .6) \ge .7$? First, $E(\overline{X}_n) = p = 1/2$ and $V(\overline{X}_n) = \sigma^2/n = p(1 - p)/n = 1/(4n)$. From Chebyshev's inequality,
$$\begin{aligned}
P(.4 \le \overline{X}_n \le .6) &= P(|\overline{X}_n - \mu| \le .1) \\
&= 1 - P(|\overline{X}_n - \mu| > .1) \\
&\ge 1 - \frac{1}{4n(.1)^2} = 1 - \frac{25}{n}.
\end{aligned}$$
The last expression will be larger than .7 if $n = 84$. ■
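The sample-size calculation in Example 6.7 can be sketched in a few lines of Python. This is only an illustration: the function name `chebyshev_sample_size` and its parameters are hypothetical, and the code simply solves the Chebyshev bound $1 - p(1-p)/(nh^2) \ge$ coverage for the smallest integer $n$.

```python
import math

def chebyshev_sample_size(p, half_width, coverage):
    """Smallest n such that the Chebyshev lower bound
    1 - p(1-p) / (n * half_width^2) is at least `coverage`."""
    variance = p * (1.0 - p)  # variance of a single Bernoulli(p) toss
    # Chebyshev: P(|mean - p| > h) <= variance / (n h^2); require this <= 1 - coverage.
    bound = variance / (half_width ** 2 * (1.0 - coverage))
    return math.ceil(bound)

n = chebyshev_sample_size(p=0.5, half_width=0.1, coverage=0.7)
print(n, 1 - 25 / n)
```

With $p = 1/2$, half-width .1 and coverage .7 this reproduces the value $n = 84$ found in the example. Chebyshev's inequality is conservative, so a smaller $n$ may suffice in practice.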
6.4 The Central Limit Theorem
Suppose that $X_1, \ldots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$. The central limit theorem (CLT) says that $\overline{X}_n = n^{-1} \sum_i X_i$ has a distribution which is approximately Normal with mean $\mu$ and variance $\sigma^2/n$. This is remarkable since nothing is assumed about the distribution of $X_i$, except the existence of the mean and variance.
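Theorem 6.8 (The Central Limit Theorem (CLT).) Let $X_1, \ldots, X_n$ be iid with mean $\mu$ and variance $\sigma^2$. Let $\overline{X}_n = n^{-1} \sum_{i=1}^n X_i$. Then
$$Z_n \equiv \frac{\sqrt{n}(\overline{X}_n - \mu)}{\sigma} \rightsquigarrow Z$$
where $Z \sim N(0, 1)$. In other words,
$$\lim_{n\to\infty} P(Z_n \le z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx.$$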
Interpretation: Probability statements about $\overline{X}_n$ can be approximated using a Normal distribution. It's the probability statements that we are approximating, not the random variable itself.
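In addition to $Z_n \rightsquigarrow N(0, 1)$, there are several forms of notation to denote the fact that the distribution of $Z_n$ is converging to a Normal. They all mean the same thing. Here they are:
$$Z_n \approx N(0, 1)$$
$$\overline{X}_n \approx N\left(\mu, \frac{\sigma^2}{n}\right)$$
$$\overline{X}_n - \mu \approx N\left(0, \frac{\sigma^2}{n}\right)$$
$$\sqrt{n}(\overline{X}_n - \mu) \approx N\left(0, \sigma^2\right)$$
$$\frac{\sqrt{n}(\overline{X}_n - \mu)}{\sigma} \approx N(0, 1).$$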
Example 6.9 Suppose that the number of errors per computer program has a Poisson distribution with mean 5. We get 125 programs. Let $X_1, \ldots, X_{125}$ be the number of errors in the programs. We want to approximate $P(\overline{X}_n < 5.5)$. Let $\mu = E(X_1) = \lambda = 5$ and $\sigma^2 = V(X_1) = \lambda = 5$. Then,
$$P(\overline{X}_n < 5.5) = P\left(\frac{\sqrt{n}(\overline{X}_n - \mu)}{\sigma} < \frac{\sqrt{n}(5.5 - \mu)}{\sigma}\right) \approx P(Z < 2.5) = .9938.$$
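The arithmetic in Example 6.9 can be checked with a short Python sketch. The helper name `clt_prob_mean_below` is hypothetical; it standardizes the threshold exactly as in the display above and evaluates the standard Normal cdf with the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def clt_prob_mean_below(x, mu, sigma2, n):
    """Normal (CLT) approximation to P(sample mean < x) for iid data
    with mean mu and variance sigma2."""
    z = math.sqrt(n) * (x - mu) / math.sqrt(sigma2)  # standardized threshold
    return z, NormalDist().cdf(z)

z, prob = clt_prob_mean_below(x=5.5, mu=5.0, sigma2=5.0, n=125)
print(round(z, 4), round(prob, 4))
```

For the Poisson setting of the example the standardized threshold is 2.5 and the approximate probability agrees with the value .9938 quoted there.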
The central limit theorem tells us that $Z_n = \sqrt{n}(\overline{X}_n - \mu)/\sigma$ is approximately $N(0,1)$. However, we rarely know $\sigma$. We can estimate $\sigma^2$ from $X_1, \ldots, X_n$ by
$$S_n^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - \overline{X}_n)^2.$$
This raises the following question: if we replace $\sigma$ with $S_n$, is the central limit theorem still true? The answer is yes.
Theorem 6.10 Assume the same conditions as the CLT. Then,
$$\frac{\sqrt{n}(\overline{X}_n - \mu)}{S_n} \rightsquigarrow N(0, 1).$$
You might wonder how accurate the normal approximation is. The answer is given by the Berry-Esseen theorem.
Theorem 6.11 (Berry-Esseen.) Suppose that $E|X_1|^3 < \infty$. Then
$$\sup_z |P(Z_n \le z) - \Phi(z)| \le \frac{33}{4} \frac{E|X_1 - \mu|^3}{\sqrt{n}\,\sigma^3}. \qquad (6.4)$$
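To get a feel for the size of the bound in (6.4), here is a small Python sketch for Bernoulli($p$) data. The function name is hypothetical, and the only ingredient beyond the theorem is the elementary identity $E|X_1 - p|^3 = p(1-p)^3 + (1-p)p^3$ for a Bernoulli variable.

```python
import math

def berry_esseen_bound_bernoulli(p, n):
    """Right-hand side of (6.4) for iid Bernoulli(p) observations."""
    q = 1.0 - p
    sigma = math.sqrt(p * q)
    # E|X - p|^3 = p * q^3 + q * p^3 = p q (p^2 + q^2)
    third_abs_moment = p * q * (p * p + q * q)
    return (33.0 / 4.0) * third_abs_moment / (math.sqrt(n) * sigma ** 3)

for n in (100, 10_000, 1_000_000):
    print(n, berry_esseen_bound_bernoulli(0.5, n))
```

For a fair coin the ratio $E|X_1 - \mu|^3 / \sigma^3$ equals 1, so the bound reduces to $8.25/\sqrt{n}$; it is informative only for fairly large $n$.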
There is also a multivariate version of the central limit theorem.
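Theorem 6.12 (Multivariate central limit theorem) Let $X_1, \ldots, X_n$ be iid random vectors where
$$X_i = \begin{pmatrix} X_{1i} \\ X_{2i} \\ \vdots \\ X_{ki} \end{pmatrix}$$
with mean
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{pmatrix} = \begin{pmatrix} E(X_{1i}) \\ E(X_{2i}) \\ \vdots \\ E(X_{ki}) \end{pmatrix}$$
and variance matrix $\Sigma$. Let
$$\overline{X} = \begin{pmatrix} \overline{X}_1 \\ \overline{X}_2 \\ \vdots \\ \overline{X}_k \end{pmatrix}$$
where $\overline{X}_j = n^{-1} \sum_{i=1}^n X_{ji}$. Then,
$$\sqrt{n}(\overline{X} - \mu) \rightsquigarrow N(0, \Sigma).$$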
6.5 The Delta Method
If $Y_n$ has a limiting Normal distribution then the delta method allows us to find the limiting distribution of $g(Y_n)$ where $g$ is any smooth function.
Theorem 6.13 (The Delta Method) Suppose that
$$\frac{\sqrt{n}(Y_n - \mu)}{\sigma} \rightsquigarrow N(0, 1)$$
and that $g$ is a differentiable function such that $g'(\mu) \ne 0$. Then
$$\frac{\sqrt{n}(g(Y_n) - g(\mu))}{|g'(\mu)|\sigma} \rightsquigarrow N(0, 1).$$
In other words,
$$Y_n \approx N\left(\mu, \frac{\sigma^2}{n}\right) \implies g(Y_n) \approx N\left(g(\mu), (g'(\mu))^2 \frac{\sigma^2}{n}\right).$$
Example 6.14 Let $X_1, \ldots, X_n$ be iid with finite mean $\mu$ and finite variance $\sigma^2$. By the central limit theorem, $\sqrt{n}(\overline{X}_n - \mu)/\sigma \rightsquigarrow N(0, 1)$. Let $W_n = e^{\overline{X}_n}$. Thus, $W_n = g(\overline{X}_n)$ where $g(s) = e^s$. Since $g'(s) = e^s$, the delta method implies that $W_n \approx N(e^{\mu}, e^{2\mu}\sigma^2/n)$.
■
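A quick simulation can make the approximation in Example 6.14 concrete. The sketch below is illustrative only: it assumes Exponential(1) data (so $\mu = 1$ and $\sigma^2 = 1$), and the sample size, number of replications, seed and function name are arbitrary choices rather than anything taken from the text.

```python
import math
import random
import statistics

def simulate_exp_of_mean(n, reps, seed=0):
    """Draw `reps` copies of W_n = exp(sample mean) from Exponential(1) samples of size n."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        draws.append(math.exp(xbar))
    return draws

n, reps = 200, 2000
mu, sigma2 = 1.0, 1.0                           # mean and variance of Exponential(1)
w = simulate_exp_of_mean(n, reps)
delta_var = math.exp(2 * mu) * sigma2 / n       # delta-method variance e^{2 mu} sigma^2 / n

print("simulated mean:", statistics.mean(w), "delta-method mean:", math.exp(mu))
print("simulated var: ", statistics.variance(w), "delta-method var: ", delta_var)
```

The simulated mean and variance of $W_n$ should be close to $e^{\mu}$ and $e^{2\mu}\sigma^2/n$; the agreement improves as $n$ grows.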
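There is also a multivariate version of the delta method.
Theorem 6.15 (The Multivariate Delta Method) Suppose that $Y_n = (Y_{n1}, \ldots, Y_{nk})$ is a sequence of random vectors such that
$$\sqrt{n}(Y_n - \mu) \rightsquigarrow N(0, \Sigma).$$
Let $g : \mathbb{R}^k \to \mathbb{R}$ and let
$$\nabla g(y) = \begin{pmatrix} \frac{\partial g}{\partial y_1} \\ \vdots \\ \frac{\partial g}{\partial y_k} \end{pmatrix}.$$
Let $\nabla_\mu$ denote $\nabla g(y)$ evaluated at $y = \mu$ and assume that the elements of $\nabla_\mu$ are non-zero. Then
$$\sqrt{n}(g(Y_n) - g(\mu)) \rightsquigarrow N\left(0, \nabla_\mu^T \Sigma \nabla_\mu\right).$$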
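Example 6.16 Let
$$\begin{pmatrix} X_{11} \\ X_{21} \end{pmatrix}, \begin{pmatrix} X_{12} \\ X_{22} \end{pmatrix}, \ldots, \begin{pmatrix} X_{1n} \\ X_{2n} \end{pmatrix}$$
be iid random vectors with mean $\mu = (\mu_1, \mu_2)^T$ and variance $\Sigma$. Let
$$\overline{X}_1 = \frac{1}{n}\sum_{i=1}^n X_{1i}, \qquad \overline{X}_2 = \frac{1}{n}\sum_{i=1}^n X_{2i}$$
and define $Y_n = \overline{X}_1 \overline{X}_2$. Thus, $Y_n = g(\overline{X}_1, \overline{X}_2)$ where $g(s_1, s_2) = s_1 s_2$. By the central limit theorem,
$$\sqrt{n}\begin{pmatrix} \overline{X}_1 - \mu_1 \\ \overline{X}_2 - \mu_2 \end{pmatrix} \rightsquigarrow N(0, \Sigma).$$
Now
$$\nabla g(s) = \begin{pmatrix} \frac{\partial g}{\partial s_1} \\ \frac{\partial g}{\partial s_2} \end{pmatrix} = \begin{pmatrix} s_2 \\ s_1 \end{pmatrix}$$
and so
$$\nabla_\mu^T \Sigma \nabla_\mu = (\mu_2 \ \ \mu_1) \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \begin{pmatrix} \mu_2 \\ \mu_1 \end{pmatrix} = \mu_2^2 \sigma_{11} + 2\mu_1\mu_2\sigma_{12} + \mu_1^2 \sigma_{22}.$$
Therefore,
$$\sqrt{n}(\overline{X}_1 \overline{X}_2 - \mu_1\mu_2) \rightsquigarrow N\left(0, \mu_2^2 \sigma_{11} + 2\mu_1\mu_2\sigma_{12} + \mu_1^2 \sigma_{22}\right).$$
■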
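The quadratic form in Example 6.16 is easy to check numerically. In the sketch below the function name and the particular values of $\mu$ and $\Sigma$ are made up for illustration; the code evaluates $\nabla_\mu^T \Sigma \nabla_\mu$ directly and compares it with the closed form derived above.

```python
def product_delta_variance(mu, cov):
    """Asymptotic variance grad^T Sigma grad for g(s1, s2) = s1 * s2."""
    mu1, mu2 = mu
    grad = (mu2, mu1)  # gradient of s1 * s2 evaluated at (mu1, mu2)
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))

mu = (1.0, 2.0)                      # hypothetical means
cov = [[1.0, 0.5], [0.5, 2.0]]       # hypothetical variance matrix Sigma

quadratic_form = product_delta_variance(mu, cov)
closed_form = mu[1] ** 2 * cov[0][0] + 2 * mu[0] * mu[1] * cov[0][1] + mu[0] ** 2 * cov[1][1]
print(quadratic_form, closed_form)
```

Both expressions give the same number, as they must, since the closed form is just the expanded quadratic form.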
6.6 Bibliographic Remarks
Convergence plays a central role in modern probability theory. For more details, see Grimmett and Stirzaker (1982), Karr
(1993) and Billingsley (1979). Advanced convergence theory is
explained in great detail in van der Vaart and Wellner (1996)
and van der Vaart (1998).
6.7 Technical Appendix
6.7.1 Almost Sure and L1 Convergence
We say that $X_n$ converges almost surely to $X$, written $X_n \xrightarrow{as} X$, if
$$P(\{s : X_n(s) \to X(s)\}) = 1.$$
We say that $X_n$ converges in $L_1$ to $X$, written $X_n \xrightarrow{L_1} X$, if
$$E|X_n - X| \to 0$$
as $n \to \infty$.
Theorem 6.17 Let $X_n$ and $X$ be random variables. Then:
(a) $X_n \xrightarrow{as} X$ implies that $X_n \xrightarrow{P} X$.
(b) $X_n \xrightarrow{qm} X$ implies that $X_n \xrightarrow{L_1} X$.
(c) $X_n \xrightarrow{L_1} X$ implies that $X_n \xrightarrow{P} X$.
The weak law of large numbers says that $\overline{X}_n$ converges to $E(X_1)$ in probability. The strong law asserts that this is also true almost surely.
Theorem 6.18 (The strong law of large numbers.) Let $X_1, X_2, \ldots$ be iid. If $E|X_1| < \infty$ then $\overline{X}_n \xrightarrow{as} \mu$, where $\mu = E(X_1)$.
A sequence $X_n$ is asymptotically uniformly integrable if
$$\lim_{M\to\infty} \limsup_{n\to\infty} E\left(|X_n| I(|X_n| > M)\right) = 0.$$
If $X_n \xrightarrow{P} b$ and $X_n$ is asymptotically uniformly integrable, then $E(X_n) \to b$.
6.7.2 Proof of the Central Limit Theorem
If $X$ is a random variable, define its moment generating function (mgf) by $\psi_X(t) = E e^{tX}$. Assume in what follows that the mgf is finite in a neighborhood around $t = 0$.
Lemma 6.19 Let $Z_1, Z_2, \ldots$ be a sequence of random variables. Let $\psi_n$ be the mgf of $Z_n$. Let $Z$ be another random variable and denote its mgf by $\psi$. If $\psi_n(t) \to \psi(t)$ for all $t$ in some open interval around 0, then $Z_n \rightsquigarrow Z$.
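Proof of the central limit theorem. Let $Y_i = (X_i - \mu)/\sigma$. Then, $Z_n = n^{-1/2} \sum_i Y_i$. Let $\psi(t)$ be the mgf of $Y_i$. The mgf of $\sum_i Y_i$ is $(\psi(t))^n$ and the mgf of $Z_n$ is $[\psi(t/\sqrt{n})]^n \equiv \xi_n(t)$. Now $\psi'(0) = E(Y_1) = 0$, $\psi''(0) = E(Y_1^2) = \text{Var}(Y_1) = 1$. So,
$$\begin{aligned}
\psi(t) &= \psi(0) + t\psi'(0) + \frac{t^2}{2!}\psi''(0) + \frac{t^3}{3!}\psi'''(0) + \cdots \\
&= 1 + 0 + \frac{t^2}{2} + \frac{t^3}{3!}\psi'''(0) + \cdots \\
&= 1 + \frac{t^2}{2} + \frac{t^3}{3!}\psi'''(0) + \cdots
\end{aligned}$$
Now,
$$\begin{aligned}
\xi_n(t) &= \left[\psi\left(\frac{t}{\sqrt{n}}\right)\right]^n \\
&= \left[1 + \frac{t^2}{2n} + \frac{t^3}{3!\,n^{3/2}}\psi'''(0) + \cdots\right]^n \\
&= \left[1 + \frac{\frac{t^2}{2} + \frac{t^3}{3!\,n^{1/2}}\psi'''(0) + \cdots}{n}\right]^n \\
&\to e^{t^2/2}
\end{aligned}$$
which is the mgf of a $N(0,1)$. The result follows from Lemma 6.19. In the last step we used the fact that, if $a_n \to a$ then
$$\left(1 + \frac{a_n}{n}\right)^n \to e^a.$$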
6.8 Exercises
1. Let $X_1, \ldots, X_n$ be iid with finite mean $\mu = E(X_1)$ and finite variance $\sigma^2 = V(X_1)$. Let $\overline{X}_n$ be the sample mean and let $S_n^2$ be the sample variance.
(a) Show that $E(S_n^2) = \sigma^2$.
(b) Show that $S_n^2 \xrightarrow{P} \sigma^2$. Hint: Show that $S_n^2 = c_n n^{-1} \sum_{i=1}^n X_i^2 - d_n \overline{X}_n^2$ where $c_n \to 1$ and $d_n \to 1$. Apply the law of large numbers to $n^{-1} \sum_{i=1}^n X_i^2$ and to $\overline{X}_n$. Then use part (e) of Theorem 6.5.
2. Let $X_1, X_2, \ldots$ be a sequence of random variables. Show that $X_n \xrightarrow{qm} b$ if and only if
$$\lim_{n\to\infty} E(X_n) = b \quad \text{and} \quad \lim_{n\to\infty} V(X_n) = 0.$$
3. Let $X_1, \ldots, X_n$ be iid and let $\mu = E(X_1)$. Suppose that the variance is finite. Show that $\overline{X}_n \xrightarrow{qm} \mu$.
4. Let $X_1, X_2, \ldots$ be a sequence of random variables such that
$$P\left(X_n = \frac{1}{n}\right) = 1 - \frac{1}{n^2} \quad \text{and} \quad P(X_n = n) = \frac{1}{n^2}.$$
Does $X_n$ converge in probability? Does $X_n$ converge in quadratic mean?
5. Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$. Prove that
$$\frac{1}{n}\sum_{i=1}^n X_i^2 \xrightarrow{P} p \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^n X_i^2 \xrightarrow{qm} p.$$
6. Suppose that the height of men has mean 68 inches and
standard deviation 4 inches. We draw 100 men at random. Find (approximately) the probability that the average height of men in our sample will be at least 68 inches.
7. Let $\lambda_n = 1/n$ for $n = 1, 2, \ldots$. Let $X_n \sim \text{Poisson}(\lambda_n)$.
(a) Show that $X_n \xrightarrow{P} 0$.
(b) Let $Y_n = nX_n$. Show that $Y_n \xrightarrow{P} 0$.
8. Suppose we have a computer program consisting of $n = 100$ pages of code. Let $X_i$ be the number of errors on the $i$th page of code. Suppose that the $X_i$'s are Poisson with mean 1 and that they are independent. Let $Y = \sum_{i=1}^n X_i$ be the total number of errors. Use the central limit theorem to approximate $P(Y < 90)$.
9. Suppose that $P(X = 1) = P(X = -1) = 1/2$. Define
$$X_n = \begin{cases} X & \text{with probability } 1 - \frac{1}{n} \\ e^n & \text{with probability } \frac{1}{n}. \end{cases}$$
Does $X_n$ converge to $X$ in probability? Does $X_n$ converge to $X$ in distribution? Does $E(X - X_n)^2$ converge to 0?
10. Let $Z \sim N(0, 1)$. Let $t > 0$.
(a) Show that, for any $k > 0$,
$$P(|Z| > t) \le \frac{E|Z|^k}{t^k}.$$
(b) (Mill's inequality.) Show that
$$P(|Z| > t) \le \left\{\frac{2}{\pi}\right\}^{1/2} \frac{e^{-t^2/2}}{t}.$$
Hint. Note that $P(|Z| > t) = 2P(Z > t)$. Now write out what $P(Z > t)$ means and note that $x/t > 1$ whenever $x > t$.
11. Suppose that Xn ∼ N (0, 1/n) and let X be a random
variable with distribution F (x) = 0 if x < 0 and F (x) = 1
if x ≥ 0. Does Xn converge to X in probability? (Prove or
disprove). Does Xn converge to X in distribution? (Prove
or disprove).
12. Let $X, X_1, X_2, X_3, \ldots$ be random variables that are positive and integer valued. Show that $X_n \rightsquigarrow X$ if and only if
$$\lim_{n\to\infty} P(X_n = k) = P(X = k)$$
for every integer $k$.
13. Let $Z_1, Z_2, \ldots$ be iid random variables with density $f$. Suppose that $P(Z_i > 0) = 1$ and that $\lambda = \lim_{x \downarrow 0} f(x) > 0$. Let
$$X_n = n \min\{Z_1, \ldots, Z_n\}.$$
Show that $X_n \rightsquigarrow Z$ where $Z$ has an exponential distribution with mean $1/\lambda$.
14. Let $X_1, \ldots, X_n \sim \text{Uniform}(0, 1)$. Let $Y_n = \overline{X}_n^2$. Find the limiting distribution of $Y_n$.
15. Let
$$\begin{pmatrix} X_{11} \\ X_{21} \end{pmatrix}, \begin{pmatrix} X_{12} \\ X_{22} \end{pmatrix}, \ldots, \begin{pmatrix} X_{1n} \\ X_{2n} \end{pmatrix}$$
be iid random vectors with mean $\mu = (\mu_1, \mu_2)$ and variance $\Sigma$. Let
$$\overline{X}_1 = \frac{1}{n}\sum_{i=1}^n X_{1i}, \qquad \overline{X}_2 = \frac{1}{n}\sum_{i=1}^n X_{2i}$$
and define $Y_n = \overline{X}_1/\overline{X}_2$. Find the limiting distribution of $Y_n$.
Part II
Statistical Inference
7 Models, Statistical Inference and Learning
7.1 Introduction
Statistical inference, or "learning" as it is called in computer science, is the process of using data to infer the distribution that generated the data. The basic statistical inference problem is this:
We observe $X_1, \ldots, X_n \sim F$. We want to infer (or estimate or learn) $F$ or some feature of $F$ such as its mean.
7.2 Parametric and Nonparametric Models
A statistical model is a set of distributions (or a set of densities) $\mathcal{F}$. A parametric model is a set $\mathcal{F}$ that can be parameterized by a finite number of parameters. For example, if we assume that the data come from a Normal distribution then the model is
$$\mathcal{F} = \left\{ f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\},\ \mu \in \mathbb{R},\ \sigma > 0 \right\}. \qquad (7.1)$$
This is a two-parameter model. We have written the density as $f(x; \mu, \sigma)$ to show that $x$ is a value of the random variable whereas $\mu$ and $\sigma$ are parameters. In general, a parametric model takes the form
$$\mathcal{F} = \left\{ f(x; \theta) : \theta \in \Theta \right\} \qquad (7.2)$$
where $\theta$ is an unknown parameter (or vector of parameters) that can take values in the parameter space $\Theta$. If $\theta$ is a vector but we are only interested in one component of $\theta$, we call the remaining parameters nuisance parameters. A nonparametric model is a set $\mathcal{F}$ that cannot be parameterized by a finite number of parameters. For example, $\mathcal{F}_{ALL} = \{\text{all cdf's}\}$ is nonparametric. (The distinction between parametric and nonparametric is more subtle than this but we don't need a rigorous definition for our purposes.)
Example 7.1 (One-dimensional Parametric Estimation.) Let $X_1, \ldots, X_n$ be independent Bernoulli($p$) observations. The problem is to estimate the parameter $p$.
Example 7.2 (Two-dimensional Parametric Estimation.) Suppose that $X_1, \ldots, X_n \sim F$ and we assume that the pdf $f \in \mathcal{F}$ where $\mathcal{F}$ is given in (7.1). In this case there are two parameters, $\mu$ and $\sigma$. The goal is to estimate the parameters from the data. If we are only interested in estimating $\mu$ then $\mu$ is the parameter of interest and $\sigma$ is a nuisance parameter.
Example 7.3 (Nonparametric estimation of the cdf.) Let $X_1, \ldots, X_n$ be independent observations from a cdf $F$. The problem is to estimate $F$ assuming only that $F \in \mathcal{F}_{ALL} = \{\text{all cdf's}\}$.
Example 7.4 (Nonparametric density estimation.) Let $X_1, \ldots, X_n$ be independent observations from a cdf $F$ and let $f = F'$ be the pdf. Suppose we want to estimate the pdf $f$. It is not possible to estimate $f$ assuming only that $F \in \mathcal{F}_{ALL}$. We need to assume some smoothness on $f$. For example, we might assume that $f \in \mathcal{F} = \mathcal{F}_{DENS} \cap \mathcal{F}_{SOB}$ where $\mathcal{F}_{DENS}$ is the set of all probability density functions and
$$\mathcal{F}_{SOB} = \left\{ f : \int (f''(x))^2 dx < \infty \right\}.$$
The class $\mathcal{F}_{SOB}$ is called a Sobolev space; it is the set of functions that are not "too wiggly."
Example 7.5 (Nonparametric estimation of functionals.) Let $X_1, \ldots, X_n \sim F$. Suppose we want to estimate $\mu = E(X_1) = \int x\, dF(x)$ assuming only that $\mu$ exists. The mean $\mu$ may be thought of as a function of $F$: we can write $\mu = T(F) = \int x\, dF(x)$. In general, any function of $F$ is called a statistical functional. Other examples of functionals are the variance $T(F) = \int x^2 dF(x) - \left(\int x\, dF(x)\right)^2$ and the median $T(F) = F^{-1}(1/2)$.
Example 7.6 (Regression, prediction and classification.) Suppose we observe pairs of data $(X_1, Y_1), \ldots, (X_n, Y_n)$. Perhaps $X_i$ is the blood pressure of subject $i$ and $Y_i$ is how long they live. $X$ is called a predictor or regressor or feature or independent variable. $Y$ is called the outcome or the response variable or the dependent variable. We call $r(x) = E(Y | X = x)$ the regression function. If we assume that $r \in \mathcal{F}$ where $\mathcal{F}$ is finite dimensional (the set of straight lines for example) then we have a parametric regression model. If we assume that $r \in \mathcal{F}$ where $\mathcal{F}$ is not finite dimensional then we have a nonparametric regression model. The goal of predicting $Y$ for a new patient based on their $X$ value is called prediction. If $Y$ is discrete (for example, live or die) then prediction is instead called classification. If our goal is to estimate the function $r$, then we call this regression or curve estimation. Regression models are sometimes written as
$$Y = r(X) + \epsilon \qquad (7.3)$$
where $E(\epsilon) = 0$. We can always rewrite a regression model this way. To see this, define $\epsilon = Y - r(X)$ and hence $Y = Y + r(X) - r(X) = r(X) + \epsilon$. Moreover, $E(\epsilon) = E(E(\epsilon | X)) = E(E(Y - r(X) | X)) = E(E(Y | X) - r(X)) = E(r(X) - r(X)) = 0$.
What's Next? It is traditional in most introductory courses to start with parametric inference. Instead, we will start with nonparametric inference and then we will cover parametric inference. In some respects, nonparametric inference is easier to understand and is more useful than parametric inference.
Frequentists and Bayesians. There are many approaches to statistical inference. The two dominant approaches are called frequentist inference and Bayesian inference. We'll cover both but we will start with frequentist inference. We'll postpone a discussion of the pros and cons of these two until later.
Some Notation. If $\mathcal{F} = \{f(x; \theta) : \theta \in \Theta\}$ is a parametric model, we write $P_\theta(X \in A) = \int_A f(x; \theta)dx$ and $E_\theta(r(X)) = \int r(x)f(x; \theta)dx$. The subscript $\theta$ indicates that the probability or expectation is with respect to $f(x; \theta)$; it does not mean we are averaging over $\theta$. Similarly, we write $V_\theta$ for the variance.
7.3 Fundamental Concepts in Inference
Many inferential problems can be identified as being one of three types: estimation, confidence sets or hypothesis testing. We will treat all of these problems in detail in the rest of the book. Here, we give a brief introduction to the ideas.
7.3.1 Point Estimation
Point estimation refers to providing a single "best guess" of some quantity of interest. The quantity of interest could be a parameter in a parametric model, a cdf $F$, a probability density function $f$, a regression function $r$, or a prediction for a future value $Y$ of some random variable.
By convention, we denote a point estimate of $\theta$ by $\hat{\theta}$. Remember that $\theta$ is a fixed, unknown quantity. The estimate $\hat{\theta}$ depends on the data so $\hat{\theta}$ is a random variable.
More formally, let $X_1, \ldots, X_n$ be $n$ iid data points from some distribution $F$. A point estimator $\hat{\theta}_n$ of a parameter $\theta$ is some function of $X_1, \ldots, X_n$:
$$\hat{\theta}_n = g(X_1, \ldots, X_n).$$
We define
$$\text{bias}(\hat{\theta}_n) = E_\theta(\hat{\theta}_n) - \theta \qquad (7.4)$$
to be the bias of $\hat{\theta}_n$. We say that $\hat{\theta}_n$ is unbiased if $E(\hat{\theta}_n) = \theta$. Unbiasedness used to receive much attention but these days it is not considered very important; many of the estimators we will use are biased. A point estimator $\hat{\theta}_n$ of a parameter $\theta$ is consistent if $\hat{\theta}_n \xrightarrow{P} \theta$. Consistency is a reasonable requirement for estimators. The distribution of $\hat{\theta}_n$ is called the sampling distribution. The standard deviation of $\hat{\theta}_n$ is called the standard error, denoted by se:
$$\text{se} = \text{se}(\hat{\theta}_n) = \sqrt{V_\theta(\hat{\theta}_n)}. \qquad (7.5)$$
Often, it is not possible to compute the standard error but usually we can estimate the standard error. The estimated standard error is denoted by $\widehat{\text{se}}$.
Example 7.7 Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$ and let $\hat{p}_n = n^{-1} \sum_i X_i$. Then $E(\hat{p}_n) = n^{-1} \sum_i E(X_i) = p$ so $\hat{p}_n$ is unbiased. The standard error is $\text{se} = \sqrt{V(\hat{p}_n)} = \sqrt{p(1 - p)/n}$. The estimated standard error is $\widehat{\text{se}} = \sqrt{\hat{p}(1 - \hat{p})/n}$.
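The quality of a point estimate is sometimes assessed by the mean squared error, or MSE, defined by
$$\text{MSE} = E_\theta(\hat{\theta}_n - \theta)^2.$$
Recall that $E_\theta(\cdot)$ refers to expectation with respect to the distribution
$$f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^n f(x_i; \theta)$$
that generated the data. It does not mean we are averaging over a density for $\theta$.
Theorem 7.8 The MSE can be written as
$$\text{MSE} = \text{bias}(\hat{\theta}_n)^2 + V_\theta(\hat{\theta}_n). \qquad (7.6)$$
Proof. Let $\overline{\theta}_n = E_\theta(\hat{\theta}_n)$. Then
$$\begin{aligned}
E_\theta(\hat{\theta}_n - \theta)^2 &= E_\theta(\hat{\theta}_n - \overline{\theta}_n + \overline{\theta}_n - \theta)^2 \\
&= E_\theta(\hat{\theta}_n - \overline{\theta}_n)^2 + 2(\overline{\theta}_n - \theta)E_\theta(\hat{\theta}_n - \overline{\theta}_n) + (\overline{\theta}_n - \theta)^2 \\
&= (\overline{\theta}_n - \theta)^2 + E_\theta(\hat{\theta}_n - \overline{\theta}_n)^2 \\
&= \text{bias}^2 + V_\theta(\hat{\theta}_n).
\end{aligned}$$
Theorem 7.9 If bias $\to 0$ and se $\to 0$ as $n \to \infty$ then $\hat{\theta}_n$ is consistent, that is, $\hat{\theta}_n \xrightarrow{P} \theta$.
Proof. If bias $\to 0$ and se $\to 0$ then, by Theorem 7.8, MSE $\to 0$. It follows that $\hat{\theta}_n \xrightarrow{qm} \theta$. (Recall Definition 6.3.) The result follows from part (a) of Theorem 6.4.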
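The decomposition in Theorem 7.8 can be verified exactly for a simple biased estimator. The sketch below is an illustration under assumptions not in the text: it takes $S \sim \text{Binomial}(n, p)$ and the hypothetical shrinkage estimator $(S + 1)/(n + 2)$, and computes the mean, variance and MSE by summing over the Binomial probability function.

```python
from math import comb

def exact_moments(n, p, estimator):
    """Exact mean, variance and MSE of estimator(S, n) when S ~ Binomial(n, p)."""
    pmf = [comb(n, s) * p ** s * (1 - p) ** (n - s) for s in range(n + 1)]
    values = [estimator(s, n) for s in range(n + 1)]
    mean = sum(w * v for w, v in zip(pmf, values))
    var = sum(w * (v - mean) ** 2 for w, v in zip(pmf, values))
    mse = sum(w * (v - p) ** 2 for w, v in zip(pmf, values))
    return mean, var, mse

def shrunk(s, n):
    # A deliberately biased estimator of p, chosen only for illustration.
    return (s + 1) / (n + 2)

p_true = 0.3
mean, var, mse = exact_moments(n=20, p=p_true, estimator=shrunk)
bias = mean - p_true
print(mse, bias ** 2 + var)
```

The two printed numbers agree up to floating-point rounding, which is the identity MSE = bias$^2$ + variance.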
Example 7.10 Returning to the coin flipping example, we have that $E_p(\hat{p}_n) = p$ so that bias $= p - p = 0$ and $\text{se} = \sqrt{p(1 - p)/n} \to 0$. Hence, $\hat{p}_n \xrightarrow{P} p$, that is, $\hat{p}_n$ is a consistent estimator.
Many of the estimators we will encounter will turn out to have, approximately, a Normal distribution.
Definition 7.11 An estimator is asymptotically Normal if
$$\frac{\hat{\theta}_n - \theta}{\text{se}} \rightsquigarrow N(0, 1). \qquad (7.7)$$
7.3.2 Confidence Sets
A $1 - \alpha$ confidence interval for a parameter $\theta$ is an interval $C_n = (a, b)$ where $a = a(X_1, \ldots, X_n)$ and $b = b(X_1, \ldots, X_n)$ are functions of the data such that
$$P_\theta(\theta \in C_n) \ge 1 - \alpha, \quad \text{for all } \theta \in \Theta. \qquad (7.8)$$
In words, $(a, b)$ traps $\theta$ with probability $1 - \alpha$. We call $1 - \alpha$ the coverage of the confidence interval. Commonly, people use 95 per cent confidence intervals which corresponds to choosing $\alpha = 0.05$. Note: $C_n$ is random and $\theta$ is fixed! If $\theta$ is a vector then we use a confidence set (such as a sphere or an ellipse) instead of an interval.
Warning! There is much confusion about how to interpret a confidence interval. A confidence interval is not a probability statement about $\theta$ since $\theta$ is a fixed quantity, not a random variable. Some texts interpret confidence intervals as follows: if I repeat the experiment over and over, the interval will contain the parameter 95 per cent of the time. This is correct but useless since we rarely repeat the same experiment over and over. A better interpretation is this:
On day 1, you collect data and construct a 95 per cent confidence interval for a parameter $\theta_1$. On day 2, you collect new data and construct a 95 per cent confidence interval for an unrelated parameter $\theta_2$. On day 3, you collect new data and construct a 95 per cent confidence interval for an unrelated parameter $\theta_3$. You continue this way constructing confidence intervals for a sequence of unrelated parameters $\theta_1, \theta_2, \ldots$ Then 95 per cent of your intervals will trap the true parameter value. There is no need to introduce the idea of repeating the same experiment over and over.
Example 7.12 Every day, newspapers report opinion polls. For example, they might say that "83 per cent of the population favor arming pilots with guns." Usually, you will see a statement like "this poll is accurate to within 4 points 95 per cent of the time." They are saying that $83 \pm 4$ is a 95 per cent confidence interval for the true but unknown proportion $p$ of people who favor arming pilots with guns. If you form a confidence interval this way every day for the rest of your life, 95 per cent of your intervals will contain the true parameter. This is true even though you are estimating a different quantity (a different poll question) every day.
Later, we will discuss Bayesian methods in which we treat $\theta$ as if it were a random variable and we do make probability statements about $\theta$. In particular, we will make statements like "the probability that $\theta$ is in $C_n$, given the data, is 95 per cent." However, these Bayesian intervals refer to degree-of-belief probabilities. These Bayesian intervals will not, in general, trap the parameter 95 per cent of the time.
Example 7.13 In the coin flipping setting, let $C_n = (\hat{p}_n - \epsilon_n, \hat{p}_n + \epsilon_n)$ where $\epsilon_n^2 = \log(2/\alpha)/(2n)$. From Hoeffding's inequality (5.4) it follows that
$$P(p \in C_n) \ge 1 - \alpha$$
for every $p$. Hence, $C_n$ is a $1 - \alpha$ confidence interval.
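As mentioned earlier, point estimators often have a limiting Normal distribution, meaning that equation (7.7) holds, that is, $\hat{\theta}_n \approx N(\theta, \widehat{\text{se}}^2)$. In this case we can construct (approximate) confidence intervals as follows.
Theorem 7.14 (Normal-based Confidence Interval.) Suppose that $\hat{\theta}_n \approx N(\theta, \widehat{\text{se}}^2)$. Let $\Phi$ be the cdf of a standard Normal and let $z_{\alpha/2} = \Phi^{-1}(1 - (\alpha/2))$, that is, $P(Z > z_{\alpha/2}) = \alpha/2$ and $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$ where $Z \sim N(0, 1)$. Let
$$C_n = (\hat{\theta}_n - z_{\alpha/2}\,\widehat{\text{se}},\ \hat{\theta}_n + z_{\alpha/2}\,\widehat{\text{se}}). \qquad (7.9)$$
Then
$$P_\theta(\theta \in C_n) \to 1 - \alpha. \qquad (7.10)$$
Proof. Let $Z_n = (\hat{\theta}_n - \theta)/\widehat{\text{se}}$. By assumption $Z_n \rightsquigarrow Z$ where $Z \sim N(0, 1)$. Hence,
$$\begin{aligned}
P_\theta(\theta \in C_n) &= P_\theta\left(\hat{\theta}_n - z_{\alpha/2}\,\widehat{\text{se}} < \theta < \hat{\theta}_n + z_{\alpha/2}\,\widehat{\text{se}}\right) \\
&= P_\theta\left(-z_{\alpha/2} < \frac{\hat{\theta}_n - \theta}{\widehat{\text{se}}} < z_{\alpha/2}\right) \\
&\to P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) \\
&= 1 - \alpha.
\end{aligned}$$
For 95 per cent confidence intervals, $\alpha = 0.05$ and $z_{\alpha/2} = 1.96 \approx 2$ leading to the approximate 95 per cent confidence interval $\hat{\theta}_n \pm 2\,\widehat{\text{se}}$. We will discuss the construction of confidence intervals in more generality in the rest of the book.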
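For coin flipping, the Hoeffding interval of Example 7.13 and the Normal-based interval of Theorem 7.14 can be put side by side. The sketch below is illustrative: the function names and the values $\hat{p}_n = 0.5$, $n = 100$, $\alpha = 0.05$ are assumptions, the estimated standard error $\sqrt{\hat{p}_n(1 - \hat{p}_n)/n}$ is the one from Example 7.7, and the Normal quantile comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def hoeffding_interval(p_hat, n, alpha):
    """Interval p_hat +/- eps_n with eps_n^2 = log(2/alpha) / (2n)."""
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    return p_hat - eps, p_hat + eps

def normal_interval(p_hat, n, alpha):
    """Interval p_hat +/- z_{alpha/2} * se_hat, with se_hat = sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se_hat = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se_hat, p_hat + z * se_hat

p_hat, n, alpha = 0.5, 100, 0.05   # hypothetical data summary
h_lo, h_hi = hoeffding_interval(p_hat, n, alpha)
n_lo, n_hi = normal_interval(p_hat, n, alpha)
print("Hoeffding:", (h_lo, h_hi), "width", h_hi - h_lo)
print("Normal:   ", (n_lo, n_hi), "width", n_hi - n_lo)
```

In this illustrative case the Normal-based interval comes out shorter. The Hoeffding interval has guaranteed coverage for every $n$, whereas the Normal-based interval is only approximately correct for large samples.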
Example 7.15 Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$ and let $\hat{p}_n = n^{-1} \sum_{i=1}^n X_i$. Then $V(\hat{p}_n) = n^{-2} \sum_{i=1}^n V(X_i) = n^{-2} \sum_{i=1}^n p(1 - p) = n^{-2} n p(1 - p) = p(1 - p)/n$. Hence, $\text{se} = \sqrt{p(1 - p)/n}$ and $\widehat{\text{se}} = \sqrt{\hat{p}_n(1 - \hat{p}_n)/n}$. By the Central Limit Theorem, $\hat{p}_n \approx N(p, \widehat{\text{se}}^2)$. Therefore, an approximate $1 - \alpha$ confidence interval is
$$\hat{p}_n \pm z_{\alpha/2}\,\widehat{\text{se}} = \hat{p}_n \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_n(1 - \hat{p}_n)}{n}}.$$
Compare this with the confidence interval in the previous example. The Normal-based interval is shorter but it only has approximately (large sample) correct coverage.
7.3.3 Hypothesis Testing
In hypothesis testing, we start with some default theory, called a null hypothesis, and we ask if the data provide sufficient evidence to reject the theory. If not we retain the null hypothesis. (The term "retaining the null hypothesis" is due to Chris Genovese. Other terminology is "accepting the null" or "failing to reject the null.")
Example 7.16 (Testing if a Coin is Fair) Suppose $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$ denote $n$ independent coin flips. Suppose we want to test if the coin is fair. Let $H_0$ denote the hypothesis that the coin is fair and let $H_1$ denote the hypothesis that the coin is not fair. $H_0$ is called the null hypothesis and $H_1$ is called the alternative hypothesis. We can write the hypotheses as
$$H_0 : p = 1/2 \quad \text{versus} \quad H_1 : p \ne 1/2.$$
It seems reasonable to reject $H_0$ if $T = |\hat{p}_n - (1/2)|$ is large. When we discuss hypothesis testing in detail, we will be more precise about how large $T$ should be to reject $H_0$.
7.4 Bibliographic Remarks
Statistical inference is covered in many texts. Elementary
texts include DeGroot and Schervish (2001) and Larsen and
Marx (1986). At the intermediate level I recommend Casella
and Berger (2002) and Bickel and Doksum (2001). At the advanced level, Lehmann and Casella (1998), Lehmann (1986) and
van der Vaart (1998).
Our definition of confidence interval requires that $P_\theta(\theta \in C_n) \ge 1 - \alpha$ for all $\theta \in \Theta$. A pointwise asymptotic confidence interval requires that $\liminf_{n\to\infty} P_\theta(\theta \in C_n) \ge 1 - \alpha$ for all $\theta \in \Theta$. A uniform asymptotic confidence interval requires that $\liminf_{n\to\infty} \inf_{\theta \in \Theta} P_\theta(\theta \in C_n) \ge 1 - \alpha$. The approximate Normal-based interval is a pointwise asymptotic confidence interval. In general, it might not be a uniform asymptotic confidence interval.
8 Estimating the cdf and Statistical Functionals
The first inference problem we will consider is nonparametric estimation of the cdf $F$ and functions of the cdf.
8.1 The Empirical Distribution Function
Let $X_1, \ldots, X_n \sim F$ be iid where $F$ is a distribution function on the real line. We will estimate $F$ with the empirical distribution function, which is defined as follows.
Definition 8.1 The empirical distribution function $\hat{F}_n$ is the cdf that puts mass $1/n$ at each data point $X_i$. Formally,
$$\hat{F}_n(x) = \frac{\sum_{i=1}^n I(X_i \le x)}{n} = \frac{\text{number of observations less than or equal to } x}{n} \qquad (8.1)$$
where
$$I(X_i \le x) = \begin{cases} 1 & \text{if } X_i \le x \\ 0 & \text{if } X_i > x. \end{cases}$$
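Definition 8.1 translates almost directly into code. The following Python sketch is one possible implementation, not a prescribed one: the name `make_ecdf` and the toy data are made up, and `bisect_right` on the sorted sample counts the observations less than or equal to $x$.

```python
from bisect import bisect_right

def make_ecdf(data):
    """Return the empirical distribution function of `data` as a callable."""
    xs = sorted(data)
    n = len(xs)

    def ecdf(x):
        # bisect_right gives the number of observations <= x
        return bisect_right(xs, x) / n

    return ecdf

waiting_times = [0.3, 0.5, 0.5, 0.9]      # toy data, for illustration only
F_hat = make_ecdf(waiting_times)
print(F_hat(0.2), F_hat(0.5), F_hat(1.0))
print(F_hat(0.6) - F_hat(0.4))            # estimated fraction in (0.4, 0.6]
```

Differences such as `F_hat(0.6) - F_hat(0.4)` estimate the probability of falling in an interval, which is the kind of calculation used with the nerve data.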
Example 8.2 (Nerve Data.) Cox and Lewis (1966) reported 799 waiting times between successive pulses along a nerve fibre. The first plot in Figure 8.1 shows a "toothpick plot" where each toothpick shows the location of one data point. The second plot shows the empirical cdf $\hat{F}_n$. Suppose we want to estimate the fraction of waiting times between .4 and .6 seconds. The estimate is $\hat{F}_n(.6) - \hat{F}_n(.4) = .93 - .84 = .09$.
The following theorems give some properties of $\hat{F}_n(x)$.
Theorem 8.3 At any fixed value of $x$,
$$E\left(\hat{F}_n(x)\right) = F(x) \quad \text{and} \quad V\left(\hat{F}_n(x)\right) = \frac{F(x)(1 - F(x))}{n}.$$
Thus,
$$\text{MSE} = \frac{F(x)(1 - F(x))}{n} \to 0$$
and hence, $\hat{F}_n(x) \xrightarrow{P} F(x)$.
Theorem 8.4 (Glivenko-Cantelli Theorem.) Let $X_1, \ldots, X_n \sim F$. Then
$$\sup_x |\hat{F}_n(x) - F(x)| \xrightarrow{P} 0.$$
(Actually, $\sup_x |\hat{F}_n(x) - F(x)|$ converges to 0 almost surely.)
8.2 Statistical Functionals
A statistical functional $T(F)$ is any function of $F$. Examples are the mean $\mu = \int x\, dF(x)$, the variance $\sigma^2 = \int (x - \mu)^2 dF(x)$ and the median $m = F^{-1}(1/2)$.
Definition 8.5 The plug-in estimator of $\theta = T(F)$ is defined by
$$\hat{\theta}_n = T(\hat{F}_n).$$
In other words, just plug in $\hat{F}_n$ for the unknown $F$.
FIGURE 8.1. Nerve data. The solid line in the middle is the empirical distribution function. The lines above and below the middle line are a 95 per cent confidence band. The confidence band is explained in the appendix. [Plots omitted; both panels have horizontal axis "time" and the second panel has vertical axis "cdf".]
R
A functional of theR form r(x)dF (x) is called a Rlinear functional. Recall that r(x)dFP(x) is dened to be r(x)f (x)dx
in the continuous case and j r(xj )f (xj ) in the discrete. The
empirical cdf Fbn (x)R is discrete, putting mass 1=n at each Xi .
Hence, if T (F ) = r(x)dF (x) is a linear functional then we
have:
R
The plug-in estimator for linear functional T (F ) = r(x)dF (x) is:
T (Fbn)
=
Z
r(x)dFbn (x)
n
1X
=
r(Xi ):
n i=1
(8.2)
Sometimes we can nd the estimated standard error se
b of
T (Fbn ) by doing some calculations. However, in other cases it
is not obvious how to estimate the standard error. In the next
chapter, we will discuss a general method for nding se
b . For
now, let us just assume that somehow we can nd se
b . In many
cases, it turns out that
2
T (Fbn ) N (T (F ); se
b ):
By equation (7.10), an approximate 1
for T (F ) is then
condence interval
T (Fbn ) z=2 se:
b
We will call this the Normal-based interval. For a 95 per cent
condence interval, z=2 = z:05=2 = 1:96 2 so the interval is
T (Fbn ) 2 se:
b
Example 8.6 (The mean.)
Let = T (F ) =
R
R
x dF (x). The plugin estimator
is b = x dFbn (x) = X n . The standard error is
q
p
se = V (X n ) = = n. If b denotes an estimate of , then
(8.3)
p
121
the estimated standard error is b= n. (In the next example, we
shall see how to estimate .) A Normal-based condence interval
for is X n z=2 se2 : Example 8.7 (The Variance) Let 2 = T (F ) = V (X ) =
R
xdF (x) 2 . The plug-in estimator is
b2 =
=
Z
R 2
x dF (x)
Z
2
b
xdFn (x)
!2
n
X
x2 dFbn(x)
n
1X
X2
n i=1 i
1
X
n i=1 i
n
1X
=
(X
n i=1 i
X n )2 :
Another reasoable estimator of 2 is the sample variance
Sn2 =
n
X
1
(Xi
X n )2 :
n 1 i=1
In practice, there is little dierence between b2 and Sn2 and you
can use either one. Returning the our last example, we now see
that the estimated standard error of the estimate of the mean is
p
se
b =
b = n. Example 8.8 (The Skewness) Let and 2 denote the mean and
variance of a random variable X . The skewness is dened to be
R
E (X )3
=
=
3
R
(x )3 dF (x)
:
(x )2 dF (x) 3=2
The skewness measure the lack of symmetry of a distribution.
P
To nd the
plug-in
estimate,
rst
recall
that
b = n 1 i Xi and
P
b2 = n 1 i (Xi b)2 . The plug-in estimate of is
b =
R
nR
(x
(x
1 P (Xi
n i
o3=2 =
b3
)3 dFbn(x)
)2 dFbn(x)
b)3
:
122
Example 8.9 (Correlation.) Let Z = (X; Y ) and let = T (F ) =
E (X
X )(Y Y )=(x y ) denote the correlation between X and
Y , where F (x; y ) is bivariate. We can write T (F ) = a(T1 (F ); T2 (F ); T3 (F ); T4 (F ); T5 (F ))
where
R
R
R
T1 (F ) = R x dF (z ) T2 (F ) = R y dF (z ) T3 (F ) = xy dF (z )
T4 (F ) = x2 dF (z ) T5 (F ) = y 2 dF (z )
and
a(t1 ; : : : ; t5 ) = p
(t4
t3
t1 t2
:
t21 )(t5 t22 )
Replace F with Fbn in T1 (F ), . . . , T5 (F ), and take
b = a(T1 (Fbn ); T2 (Fbn ); T3 (Fbn ); T4 (Fbn ); T5 (Fbn )):
We get
b =
qP
P
i (Xi
i (Xi
X n )(Yi Y n )
qP
2
X n )2
i (Yi Y n )
which is called the sample correlation.
Example 8.10 (Quantiles.) Let $F$ be strictly increasing with density $f$. Let $T(F) = F^{-1}(p)$ be the $p$th quantile. The estimate
of $T(F)$ is $\hat{F}_n^{-1}(p)$. We have to be a bit careful since $\hat{F}_n$ is
not invertible. To avoid ambiguity we define $\hat{F}_n^{-1}(p) = \inf\{x : \hat{F}_n(x) \ge p\}$. We call $\hat{F}_n^{-1}(p)$ the $p$th sample quantile.
Only in the first example did we compute a standard error or
a confidence interval. How shall we handle the other examples?
When we discuss parametric methods, we will develop formulae
for standard errors and confidence intervals. But in our nonparametric setting we need something else. In the next chapter,
we will introduce two methods, the jackknife and the bootstrap,
for getting standard errors and confidence intervals.
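The infimum definition of the sample quantile translates directly into code. The sketch below is an assumption-laden illustration (the function name and the data are hypothetical): it scans the order statistics and returns the first one whose empirical cdf value reaches p.

```python
def sample_quantile(xs, p):
    """Return inf{x : F_n(x) >= p}, where F_n is the empirical cdf of xs."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    ordered = sorted(xs)
    n = len(ordered)
    # F_n equals k/n at the k-th order statistic (assuming no ties),
    # so the infimum is the first order statistic with k/n >= p.
    for k, x in enumerate(ordered, start=1):
        if k / n >= p:
            return x
    return ordered[-1]  # not reached for p <= 1; kept as a safeguard

data = [3, 1, 2, 5, 4]  # hypothetical values
median = sample_quantile(data, 0.5)
```

With five observations the empirical cdf jumps by 1/5 at each point, so the 0.5 quantile is the third order statistic.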
Example 8.11 (Plasma Cholesterol.) Figure 8.2 shows histograms
for plasma cholesterol (in mg/dl) for 371 patients with chest
pain (Scott et al. 1978). The histograms show the percentage of
patients in 10 bins. The first histogram is for 51 patients who
had no evidence of heart disease while the second histogram is
for 320 patients who had narrowing of the arteries. Is the mean
cholesterol different in the two groups? Let us regard these data
as samples from two distributions $F_1$ and $F_2$. Let $\mu_1 = \int x\,dF_1(x)$
and $\mu_2 = \int x\,dF_2(x)$ denote the means of the two populations.
The plug-in estimates are $\hat\mu_1 = \int x\,d\hat{F}_{n,1}(x) = \bar{X}_{n,1} = 195.27$
and $\hat\mu_2 = \int x\,d\hat{F}_{n,2}(x) = \bar{X}_{n,2} = 216.19$. Recall that the standard
error of the sample mean $\hat\mu = \frac{1}{n}\sum_{i=1}^n X_i$ is
$$\mathrm{se}(\hat\mu) = \sqrt{V\left(\frac{1}{n}\sum_{i=1}^n X_i\right)} = \sqrt{\frac{1}{n^2}\sum_{i=1}^n V(X_i)} = \sqrt{\frac{n\sigma^2}{n^2}} = \frac{\sigma}{\sqrt{n}}$$
which we estimate by
$$\widehat{\mathrm{se}}(\hat\mu) = \frac{\hat\sigma}{\sqrt{n}}$$
where
$$\hat\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2} .$$
For the two groups this yields $\widehat{\mathrm{se}}(\hat\mu_1) = 5.0$ and $\widehat{\mathrm{se}}(\hat\mu_2) = 2.4$.
Approximate 95 per cent confidence intervals for $\mu_1$ and $\mu_2$ are
$\hat\mu_1 \pm 2\,\widehat{\mathrm{se}}(\hat\mu_1) = (185, 205)$ and $\hat\mu_2 \pm 2\,\widehat{\mathrm{se}}(\hat\mu_2) = (211, 221)$.
Now, consider the functional $\theta = T(F_2) - T(F_1)$ whose plug-in
estimate is $\hat\theta = \hat\mu_2 - \hat\mu_1 = 216.19 - 195.27 = 20.92$. The standard
error of $\hat\theta$ is
$$\mathrm{se} = \sqrt{V(\hat\mu_2 - \hat\mu_1)} = \sqrt{V(\hat\mu_2) + V(\hat\mu_1)} = \sqrt{(\mathrm{se}(\hat\mu_1))^2 + (\mathrm{se}(\hat\mu_2))^2},$$
where the second equality uses the independence of the two samples, and we estimate this by
$$\widehat{\mathrm{se}} = \sqrt{(\widehat{\mathrm{se}}(\hat\mu_1))^2 + (\widehat{\mathrm{se}}(\hat\mu_2))^2} = 5.55.$$
An approximate 95 per cent confidence interval for $\theta$ is $\hat\theta \pm 2\,\widehat{\mathrm{se}} =
(9.8, 32.0)$. This suggests that cholesterol is higher among those
with narrowed arteries. We should not jump to the conclusion
(from these data) that cholesterol causes heart disease. The leap
from statistical evidence to causation is very subtle and is discussed later in this text.
8.3 Technical Appendix
In this appendix we explain how to construct a confidence
band for the cdf.
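Theorem 8.12 (Dvoretzky-Kiefer-Wolfowitz (DKW) inequality.) Let
$X_1, \ldots, X_n$ be iid from $F$. Then, for any $\epsilon > 0$,
$$P\left(\sup_x |F(x) - \hat{F}_n(x)| > \epsilon\right) \le 2e^{-2n\epsilon^2}. \tag{8.4}$$
From the DKW inequality, we can construct a confidence set.
Let $\epsilon_n^2 = \log(2/\alpha)/(2n)$, $L(x) = \max\{\hat{F}_n(x) - \epsilon_n, 0\}$ and $U(x) =
\min\{\hat{F}_n(x) + \epsilon_n, 1\}$, and let $C_n$ be the set of distribution functions lying between $L$ and $U$ at every $x$. It follows from (8.4) that for any $F$,
$$P(F \in C_n) \ge 1 - \alpha.$$
Thus, $C_n$ is a nonparametric $1-\alpha$ confidence set for $F$. A better
name for $C_n$ is a confidence band. To summarize:
A $1-\alpha$ nonparametric confidence band for $F$ is $(L(x), U(x))$ where
$$L(x) = \max\{\hat{F}_n(x) - \epsilon_n, 0\}$$
$$U(x) = \min\{\hat{F}_n(x) + \epsilon_n, 1\}$$
$$\epsilon_n = \sqrt{\frac{1}{2n}\log\left(\frac{2}{\alpha}\right)} .$$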
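The band is easy to compute once the empirical cdf is available. The following Python sketch evaluates L(x) and U(x) at a single point; the function names and the eight data values are assumptions chosen for illustration.

```python
import bisect
import math

def dkw_epsilon(n, alpha):
    """Half-width of the DKW band: sqrt(log(2/alpha) / (2n))."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

def ecdf(sorted_xs, x):
    """Empirical cdf at x: fraction of observations <= x (input must be sorted)."""
    return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)

def dkw_band(sorted_xs, x, alpha=0.05):
    """Return (L(x), U(x)) for the 1 - alpha confidence band at the point x."""
    eps = dkw_epsilon(len(sorted_xs), alpha)
    f_hat = ecdf(sorted_xs, x)
    return max(f_hat - eps, 0.0), min(f_hat + eps, 1.0)

sample = sorted([0.3, 1.2, 0.7, 2.5, 1.9, 0.1, 1.1, 0.9])  # hypothetical data
lower, upper = dkw_band(sample, 1.0)
```

The half-width depends only on n and alpha, not on the data, so with a sample this small the band is very wide; it narrows at the rate 1/sqrt(n).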
[Figure 8.2: two histograms over a horizontal axis running from 100 to 400, titled "plasma cholesterol for patients without heart disease" and "plasma cholesterol for patients with heart disease".]
FIGURE 8.2. Plasma cholesterol for 51 patients with no heart disease
and 320 patients with narrowing of the arteries.
Example 8.13 The dashed lines in Figure 8.1 give a 95 per cent
confidence band using $\epsilon_n = \sqrt{\frac{1}{2n}\log\left(\frac{2}{.05}\right)} = .048$.
The Glivenko-Cantelli theorem is the tip of the iceberg. The
theory of distribution functions is a special case of what are
called empirical processes which underlie much of modern statistical theory. Some references on empirical processes are Shorack
and Wellner (1986) and van der Vaart and Wellner (1996).
8.5 Exercises
1. Prove Theorem 8.3.
2. Let $X_1, \ldots, X_n \sim$ Bernoulli($p$) and let $Y_1, \ldots, Y_m \sim$ Bernoulli($q$).
Find the plug-in estimator and estimated standard error
for $p$. Find an approximate 90 per cent confidence interval
for $p$. Find the plug-in estimator and estimated standard
error for $p - q$. Find an approximate 90 per cent confidence
interval for $p - q$.
3. (Computer Experiment.) Generate 100 observations from
a N(0,1) distribution. Compute a 95 per cent confidence
band for the cdf $F$. Repeat this 1000 times and see how
often the confidence band contains the true distribution
function. Repeat using data from a Cauchy distribution.
4. Let $X_1, \ldots, X_n \sim F$ and let $\hat{F}_n(x)$ be the empirical distribution function. For a fixed $x$, use the central limit theorem to find the limiting distribution of $\hat{F}_n(x)$.
5. Let $x$ and $y$ be two distinct points. Find $\mathrm{Cov}(\hat{F}_n(x), \hat{F}_n(y))$.
6. Let $X_1, \ldots, X_n \sim F$ and let $\hat{F}_n$ be the empirical distribution function. Let $a < b$ be fixed numbers and define
$\theta = T(F) = F(b) - F(a)$. Let $\hat\theta = T(\hat{F}_n) = \hat{F}_n(b) - \hat{F}_n(a)$.
Find the estimated standard error of $\hat\theta$. Find an expression
for an approximate $1-\alpha$ confidence interval for $\theta$.
7. Data on the magnitudes of earthquakes near Fiji are available on the course website. Estimate the cdf $F(x)$. Compute and plot a 95 per cent confidence envelope for $F$.
Find an approximate 95 per cent confidence interval for
$F(4.9) - F(4.3)$.
8. Get the data on eruption times and waiting times between
eruptions of the Old Faithful geyser from the course website. Estimate the mean waiting time and give a standard
error for the estimate. Also, give a 90 per cent confidence
interval for the mean waiting time. Now estimate the median waiting time. In the next chapter we will see how to
get the standard error for the median.
9. 100 people are given a standard antibiotic to treat an infection and another 100 are given a new antibiotic. In the
first group, 90 people recover; in the second group, 85 people recover. Let $p_1$ be the probability of recovery under
the standard treatment and let $p_2$ be the probability of
recovery under the new treatment. We are interested in
estimating $\theta = p_1 - p_2$. Provide an estimate, standard error, an 80 per cent confidence interval and a 95 per cent
confidence interval for $\theta$.
10. In 1975, an experiment was conducted to see if cloud seeding produced rainfall. 26 clouds were seeded with silver
nitrate and 26 were not. The decision to seed or not was
made at random. Get the data from
http://lib.stat.cmu.edu/DASL/Stories/CloudSeeding.html
Let $\theta$ be the difference in the median precipitation from
the two groups. Estimate $\theta$. Estimate the standard error of
the estimate and produce a 95 per cent confidence interval.
9
The Bootstrap
The bootstrap is a nonparametric method for estimating standard errors and computing confidence intervals. Let
$$T_n = g(X_1, \ldots, X_n)$$
be a statistic, that is, any function of the data. Suppose we
want to know $V_F(T_n)$, the variance of $T_n$. We have written $V_F$
to emphasize that the variance usually depends on the unknown
distribution function $F$. For example, if $T_n = n^{-1}\sum_{i=1}^n X_i$ then
$V_F(T_n) = \sigma^2/n$ where $\sigma^2 = \int (x-\mu)^2\,dF(x)$ and $\mu = \int x\,dF(x)$.
The bootstrap idea has two parts.
First we estimate $V_F(T_n)$ with
$V_{\hat{F}_n}(T_n)$. Thus, for $T_n = n^{-1}\sum_{i=1}^n X_i$ we have $V_{\hat{F}_n}(T_n) = \hat\sigma^2/n$
where $\hat\sigma^2 = n^{-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$. However, in more complicated
cases we cannot write down a simple formula for $V_{\hat{F}_n}(T_n)$. This
leads us to the second step which is to approximate $V_{\hat{F}_n}(T_n)$
using simulation. Before proceeding, let us discuss the idea of
simulation.
9.1 Simulation
Suppose we draw an iid sample $Y_1, \ldots, Y_B$ from a distribution
$G$. By the law of large numbers,
$$\bar{Y}_n = \frac{1}{B}\sum_{j=1}^B Y_j \stackrel{P}{\longrightarrow} \int y\,dG(y) = E(Y)$$
as $B \to \infty$. So if we draw a large sample from $G$, we can use the
sample mean $\bar{Y}_n$ to approximate $E(Y)$. In a simulation, we can
make $B$ as large as we like, in which case the difference between
$\bar{Y}_n$ and $E(Y)$ is negligible. More generally, if $h$ is any function
with finite mean then
$$\frac{1}{B}\sum_{j=1}^B h(Y_j) \stackrel{P}{\longrightarrow} \int h(y)\,dG(y) = E(h(Y))$$
as $B \to \infty$. In particular,
$$\frac{1}{B}\sum_{j=1}^B (Y_j - \bar{Y})^2 = \frac{1}{B}\sum_{j=1}^B Y_j^2 - \left(\frac{1}{B}\sum_{j=1}^B Y_j\right)^2 \stackrel{P}{\longrightarrow} \int y^2\,dG(y) - \left(\int y\,dG(y)\right)^2 = V(Y).$$
Hence, we can use the sample variance of the simulated values
to approximate $V(Y)$.
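A small simulation shows the idea at work. This sketch assumes G is the standard Normal, for which E(Y^2) = 1 and V(Y) = 1; the helper names, the seed and the choice B = 100,000 are illustrative assumptions.

```python
import random

def monte_carlo_mean(draw, h, B, rng):
    """Approximate E(h(Y)) by averaging h over B simulated draws."""
    return sum(h(draw(rng)) for _ in range(B)) / B

def monte_carlo_variance(draw, B, rng):
    """Approximate V(Y) by the variance (1/B factor) of B simulated draws."""
    ys = [draw(rng) for _ in range(B)]
    y_bar = sum(ys) / B
    return sum((y - y_bar) ** 2 for y in ys) / B

def draw_standard_normal(rng):
    """One draw from G = N(0, 1)."""
    return rng.gauss(0.0, 1.0)

rng = random.Random(0)  # fixed seed so the run is reproducible
approx_second_moment = monte_carlo_mean(draw_standard_normal, lambda y: y * y, 100_000, rng)
approx_variance = monte_carlo_variance(draw_standard_normal, 100_000, rng)
```

Both approximations should land close to 1; increasing B shrinks the simulation error at the rate 1/sqrt(B).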
9.2 Bootstrap Variance Estimation
According to what we just learned, we can approximate $V_{\hat{F}_n}(T_n)$
by simulation. Now $V_{\hat{F}_n}(T_n)$ means "the variance of $T_n$ if the
distribution of the data is $\hat{F}_n$." How can we simulate from the
distribution of $T_n$ when the data are assumed to have distribution $\hat{F}_n$? The answer is to simulate $X_1^*, \ldots, X_n^*$ from $\hat{F}_n$ and then
compute $T_n^* = g(X_1^*, \ldots, X_n^*)$. This constitutes one draw from
the distribution of $T_n$. The idea is illustrated in the following
diagram:
Real world:       $F \Longrightarrow X_1, \ldots, X_n \Longrightarrow T_n = g(X_1, \ldots, X_n)$
Bootstrap world:  $\hat{F}_n \Longrightarrow X_1^*, \ldots, X_n^* \Longrightarrow T_n^* = g(X_1^*, \ldots, X_n^*)$
How do we simulate $X_1^*, \ldots, X_n^*$ from $\hat{F}_n$? Notice that $\hat{F}_n$ puts
mass $1/n$ at each data point $X_1, \ldots, X_n$. Therefore, drawing an
observation from $\hat{F}_n$ is equivalent to drawing one point
at random from the original data set. Thus, to simulate
$X_1^*, \ldots, X_n^* \sim \hat{F}_n$ it suffices to draw $n$ observations with replacement from $X_1, \ldots, X_n$. Here is a summary:
Bootstrap Variance Estimation
1. Draw $X_1^*, \ldots, X_n^* \sim \hat{F}_n$.
2. Compute $T_n^* = g(X_1^*, \ldots, X_n^*)$.
3. Repeat steps 1 and 2, $B$ times, to get $T^*_{n,1}, \ldots, T^*_{n,B}$.
4. Let
$$v_{boot} = \frac{1}{B}\sum_{b=1}^B \left(T^*_{n,b} - \frac{1}{B}\sum_{r=1}^B T^*_{n,r}\right)^2 . \tag{9.1}$$
Example 9.1 The following pseudo-code shows how to use the
bootstrap to estimate the standard error of the median.
Bootstrap for The Median
# Given data X = (X[1], ..., X[n])
n <- length(X)
B <- 1000   # number of bootstrap replications (the value used in this chapter's examples)
T <- median(X)
Tboot <- numeric(B)                              # vector of length B
for (i in 1:B) {
  Xstar <- sample(X, size = n, replace = TRUE)   # sample of size n from X, with replacement
  Tboot[i] <- median(Xstar)
}
se <- sqrt(var(Tboot))                           # bootstrap standard error
The following schematic diagram will remind you that we are
using two approximations:
$$V_F(T_n) \;\overbrace{\approx}^{\text{not so small}}\; V_{\hat{F}_n}(T_n) \;\overbrace{\approx}^{\text{small}}\; v_{boot}.$$
Example 9.2 Consider the nerve data. Let $\theta = T(F) = \int (x-\mu)^3\,dF(x)/\sigma^3$ be the skewness. The skewness is a measure of
asymmetry. A Normal distribution, for example, has skewness
0. The plug-in estimate of the skewness is
$$\hat\theta = T(\hat{F}_n) = \frac{\int (x-\hat\mu)^3\,d\hat{F}_n(x)}{\hat\sigma^3} = \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^3}{\hat\sigma^3} = 1.76.$$
To estimate the standard error with the bootstrap we follow the
same steps as with the median example except we compute the
skewness from each bootstrap sample. When applied to the nerve
data, the bootstrap, based on $B = 1000$ replications, yields a
standard error for the estimated skewness of .16.
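As a rough illustration of this recipe, the Python sketch below bootstraps the standard error of the plug-in skewness. The nerve data are not reproduced here, so the sketch runs on synthetic exponential data; the function names, the seeds and the sample size of 200 are all assumptions.

```python
import math
import random

def skewness(xs):
    """Plug-in skewness: third central moment over sigma-hat cubed."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return (sum((x - m) ** 3 for x in xs) / n) / var ** 1.5

def bootstrap_se(xs, statistic, B=1000, rng=None):
    """Bootstrap standard error: sqrt of v_boot over B resamples of size n."""
    rng = rng or random.Random()
    n = len(xs)
    reps = []
    for _ in range(B):
        # Draw n observations with replacement from the original data.
        resample = [xs[rng.randrange(n)] for _ in range(n)]
        reps.append(statistic(resample))
    mean_rep = sum(reps) / B
    v_boot = sum((t - mean_rep) ** 2 for t in reps) / B
    return math.sqrt(v_boot)

data_rng = random.Random(1)
data = [data_rng.expovariate(1.0) for _ in range(200)]  # synthetic, NOT the nerve data
theta_hat = skewness(data)
se_boot = bootstrap_se(data, skewness, B=1000, rng=random.Random(2))
```

Passing the statistic as a function means the same routine handles the median, the skewness, or any other plug-in estimate without modification.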
9.3 Bootstrap Confidence Intervals
There are several ways to construct bootstrap confidence intervals. Here we discuss three methods.
Normal Interval. The simplest is the Normal interval
$$T_n \pm z_{\alpha/2}\,\widehat{\mathrm{se}}_{boot} \tag{9.2}$$
where $\widehat{\mathrm{se}}_{boot}$ is the bootstrap estimate of the standard error.
This interval is not accurate unless the distribution of $T_n$ is
close to Normal.
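Pivotal Intervals. Let $\theta = T(F)$ and $\hat\theta_n = T(\hat{F}_n)$ and define the
pivot $R_n = \hat\theta_n - \theta$. Let $\hat\theta^*_{n,1}, \ldots, \hat\theta^*_{n,B}$ denote bootstrap replications of $\hat\theta_n$. Let $H(r)$ denote the cdf of the pivot:
$$H(r) = P_F(R_n \le r). \tag{9.3}$$
Define $C_n^\star = (a, b)$ where
$$a = \hat\theta_n - H^{-1}\left(1 - \frac{\alpha}{2}\right) \quad \text{and} \quad b = \hat\theta_n - H^{-1}\left(\frac{\alpha}{2}\right). \tag{9.4}$$
It follows that
$$
\begin{aligned}
P(a \le \theta \le b) &= P(a - \hat\theta_n \le \theta - \hat\theta_n \le b - \hat\theta_n) \\
&= P(\hat\theta_n - b \le \hat\theta_n - \theta \le \hat\theta_n - a) \\
&= P(\hat\theta_n - b \le R_n \le \hat\theta_n - a) \\
&= H(\hat\theta_n - a) - H(\hat\theta_n - b) \\
&= H\left(H^{-1}\left(1 - \frac{\alpha}{2}\right)\right) - H\left(H^{-1}\left(\frac{\alpha}{2}\right)\right) \\
&= 1 - \frac{\alpha}{2} - \frac{\alpha}{2} = 1 - \alpha .
\end{aligned}
$$
Hence, $C_n^\star$ is an exact $1-\alpha$ confidence interval for $\theta$. Unfortunately, $a$ and $b$ depend on the unknown distribution $H$ but we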
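can form a bootstrap estimate of $H$:
$$\hat{H}(r) = \frac{1}{B}\sum_{b=1}^B I(R^*_{n,b} \le r) \tag{9.5}$$
where $R^*_{n,b} = \hat\theta^*_{n,b} - \hat\theta_n$. Let $r^*_\beta$ denote the $\beta$ sample quantile
of $(R^*_{n,1}, \ldots, R^*_{n,B})$ and let $\theta^*_\beta$ denote the $\beta$ sample quantile of
$(\hat\theta^*_{n,1}, \ldots, \hat\theta^*_{n,B})$. Note that $r^*_\beta = \theta^*_\beta - \hat\theta_n$. It follows that an approximate $1-\alpha$ confidence interval is $C_n = (\hat{a}, \hat{b})$ where
$$\hat{a} = \hat\theta_n - \hat{H}^{-1}\left(1 - \frac{\alpha}{2}\right) = \hat\theta_n - r^*_{1-\alpha/2} = 2\hat\theta_n - \theta^*_{1-\alpha/2}$$
$$\hat{b} = \hat\theta_n - \hat{H}^{-1}\left(\frac{\alpha}{2}\right) = \hat\theta_n - r^*_{\alpha/2} = 2\hat\theta_n - \theta^*_{\alpha/2} .$$
In summary:
The $1-\alpha$ bootstrap pivotal confidence interval is
$$C_n = \left(2\hat\theta_n - \theta^*_{1-\alpha/2},\; 2\hat\theta_n - \theta^*_{\alpha/2}\right). \tag{9.6}$$
Typically, this is a pointwise, asymptotic confidence interval.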
Theorem 9.3 Under weak conditions on $T(F)$, $P_F(T(F) \in C_n) \to
1 - \alpha$ as $n \to \infty$, where $C_n$ is given in (9.6).
Percentile Intervals. The bootstrap percentile interval is
defined by
$$C_n = \left(\theta^*_{\alpha/2},\; \theta^*_{1-\alpha/2}\right).$$
The justification for this interval is given in the appendix.
Example 9.4 For estimating the skewness of the nerve data, here
are the various confidence intervals.
Method      95% Interval
Normal      (1.44, 2.09)
Percentile  (1.42, 2.03)
Pivotal     (1.48, 2.11)
All these confidence intervals are approximate. The probability that $T(F)$ is in the interval is not exactly $1-\alpha$. All three
intervals have the same level of accuracy. There are more accurate bootstrap confidence intervals but they are more complicated and we will not discuss them here.
Example 9.5 (The Plasma Cholesterol Data.) Let us return to the
cholesterol data. Suppose we are interested in the difference of
the medians. Pseudo-code for the bootstrap analysis is as follows:
# x1: first sample, x2: second sample (numeric vectors)
n1 <- length(x1)
n2 <- length(x2)
th.hat <- median(x2) - median(x1)
B <- 1000
Tboot <- numeric(B)                              # vector of length B
for (i in 1:B) {
  xx1 <- sample(x1, size = n1, replace = TRUE)   # sample of size n1 with replacement from x1
  xx2 <- sample(x2, size = n2, replace = TRUE)   # sample of size n2 with replacement from x2
  Tboot[i] <- median(xx2) - median(xx1)
}
se <- sqrt(var(Tboot))
Normal     <- c(th.hat - 2*se, th.hat + 2*se)
percentile <- c(quantile(Tboot, .025), quantile(Tboot, .975))
pivotal    <- c(2*th.hat - quantile(Tboot, .975), 2*th.hat - quantile(Tboot, .025))
The point estimate is 18.5, the bootstrap standard error is 7.42
and the resulting approximate 95 per cent confidence intervals
are as follows:
Method      95% Interval
Normal      (3.7, 33.3)
Percentile  (5.0, 33.3)
Pivotal     (5.0, 34.0)
Since these intervals exclude 0, it appears that the second group
has higher cholesterol although there is considerable uncertainty
about how much higher as reflected in the width of the intervals.
The next two examples are based on small sample sizes. In
practice, statistical methods based on very small sample sizes
might not be reliable. We include the examples for their pedagogical value but we do want to sound a note of caution: the
results should be interpreted with some skepticism.
Example 9.6 Here is an example that was one of the first used
to illustrate the bootstrap by Bradley Efron, the inventor of the
bootstrap. The data are LSAT scores (for entrance to law school)
and GPA.
LSAT  GPA
576   3.39
651   3.36
635   3.30
605   3.13
558   2.81
653   3.12
578   3.03
575   2.74
666   3.44
545   2.76
580   3.07
572   2.88
555   3.00
594   3.96
661   3.43
Each data point is of the form $X_i = (Y_i, Z_i)$ where $Y_i = \mathrm{LSAT}_i$
and $Z_i = \mathrm{GPA}_i$. The law school is interested in the correlation
$$\theta = \frac{\int\!\!\int (y - \mu_Y)(z - \mu_Z)\,dF(y, z)}{\sqrt{\int (y - \mu_Y)^2\,dF(y) \int (z - \mu_Z)^2\,dF(z)}} .$$
The plug-in estimate is the sample correlation
$$\hat\theta = \frac{\sum_i (Y_i - \bar{Y})(Z_i - \bar{Z})}{\sqrt{\sum_i (Y_i - \bar{Y})^2 \sum_i (Z_i - \bar{Z})^2}} .$$
The estimated correlation is $\hat\theta = .776$. The bootstrap based on
$B = 1000$ gives $\widehat{\mathrm{se}} = .137$. Figure 9.1 shows the data and a histogram of the bootstrap replications $\hat\theta^*_1, \ldots, \hat\theta^*_B$. This histogram is
an approximation to the sampling distribution of $\hat\theta$. The Normal-based 95 per cent confidence interval is $.78 \pm 2\,\widehat{\mathrm{se}} = (.51, 1.00)$
while the percentile interval is (.46, .96). In large samples, the
two methods will show closer agreement.
Example 9.7 This example is borrowed from An Introduction to
the Bootstrap by B. Efron and R. Tibshirani. When drug companies introduce new medications, they are sometimes required
to show bioequivalence. This means that the new drug is not
substantially different from the current treatment. Here are data
on eight subjects who used medical patches to infuse a hormone
into the blood. Each subject received three treatments: placebo,
old-patch, new-patch.
subject  placebo  old    new    old-placebo  new-old
______________________________________________________________
1        9243     17649  16449  8406         -1200
2        9671     12013  14614  2342         2601
3        11792    19979  17274  8187         -2705
4        13357    21816  23798  8459         1982
5        9055     13850  12560  4795         -1290
6        6290     9806   10157  3516         351
7        12412    17208  16570  4796         -638
8        18806    29044  26325  10238        -2719
Let $Z$ = old $-$ placebo and $Y$ = new $-$ old. The Food and
Drug Administration (FDA) requirement for bioequivalence is
that $|\theta| \le .20$ where
$$\theta = \frac{E_F(Y)}{E_F(Z)} .$$
[Figure 9.1: one panel plots GPA against LSAT for the 15 data points; the other is a histogram of the bootstrap replications, labeled "Bootstrap Samples".]
FIGURE 9.1. Law school data.
The estimate of $\theta$ is
$$\hat\theta = \frac{\bar{Y}}{\bar{Z}} = \frac{-452.3}{6342} = -.0713.$$
The bootstrap standard error is $\widehat{\mathrm{se}} = .105$. To answer the bioequivalence question, we compute a confidence interval. From
$B = 1000$ bootstrap replications we get the 95 per cent interval (-.24, .15). This is not quite contained in [-.20, .20] so at
the 95 per cent level we have not demonstrated bioequivalence.
Figure 9.2 shows the histogram of the bootstrap values.
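The patch analysis can be reproduced approximately from the eight (old - placebo, new - old) pairs in the table. The Python sketch below resamples subjects, so each pair stays together; the helper names and the seed are assumptions, and because the bootstrap is random the standard error and interval will differ slightly from the values quoted in the text.

```python
import math
import random

# (old - placebo) and (new - old) for the eight subjects in the table
Z = [8406, 2342, 8187, 8459, 4795, 3516, 4796, 10238]
Y = [-1200, 2601, -2705, 1982, -1290, 351, -638, -2719]

def ratio_of_means(pairs):
    """Plug-in estimate of theta = E(Y)/E(Z) from (z, y) pairs."""
    y_bar = sum(y for _, y in pairs) / len(pairs)
    z_bar = sum(z for z, _ in pairs) / len(pairs)
    return y_bar / z_bar

def bootstrap_replications(pairs, statistic, B, rng):
    """Recompute the statistic on B resamples of whole subjects."""
    n = len(pairs)
    return [statistic([pairs[rng.randrange(n)] for _ in range(n)]) for _ in range(B)]

pairs = list(zip(Z, Y))
theta_hat = ratio_of_means(pairs)

rng = random.Random(0)  # illustrative seed
reps = sorted(bootstrap_replications(pairs, ratio_of_means, 1000, rng))
mean_rep = sum(reps) / len(reps)
se_boot = math.sqrt(sum((t - mean_rep) ** 2 for t in reps) / len(reps))
# .025 and .975 sample quantiles of the 1000 sorted replications
percentile_interval = (reps[24], reps[974])
```

Resampling (z, y) pairs rather than the two columns separately preserves the within-subject dependence between the two differences.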
The bootstrap was invented by Efron (1979). There are several
books on these topics including Efron and Tibshirani (1993),
Davison and Hinkley (1997), Hall (1992), and Shao and Tu
(1995). Also, see section 3.6 of van der Vaart and Wellner (1996).
[Figure 9.2: histogram of the bootstrap replications, horizontal axis labeled "Bootstrap Samples" and running from -0.3 to 0.2.]
FIGURE 9.2. Patch data.
9.5.1 The Jackknife
There is another method for computing standard errors called
the jackknife, due to Quenouille (1949). It is less computationally expensive than the bootstrap but is less general. Let
$T_n = T(X_1, \ldots, X_n)$ be a statistic and let $T_{(-i)}$ denote the statistic
with the $i$th observation removed. Let $\bar{T}_n = n^{-1}\sum_{i=1}^n T_{(-i)}$. The
jackknife estimate of $\mathrm{var}(T_n)$ is
$$v_{jack} = \frac{n-1}{n}\sum_{i=1}^n (T_{(-i)} - \bar{T}_n)^2$$
and the jackknife estimate of the standard error is $\widehat{\mathrm{se}}_{jack} = \sqrt{v_{jack}}$. Under suitable conditions on $T$, it can be shown
that $v_{jack}$ consistently estimates $\mathrm{var}(T_n)$ in the sense that $v_{jack}/\mathrm{var}(T_n) \stackrel{P}{\longrightarrow} 1$. However, unlike the bootstrap, the jackknife does not produce
consistent estimates of the standard error of sample quantiles.
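The jackknife needs only n recomputations of the statistic, one per deleted observation. A minimal Python sketch follows; the function names and the data are assumptions for illustration.

```python
import math

def jackknife_se(xs, statistic):
    """Jackknife standard error: sqrt((n-1)/n * sum (T_(-i) - T_bar)^2)."""
    n = len(xs)
    # T_(-i): the statistic with the i-th observation removed.
    leave_one_out = [statistic(xs[:i] + xs[i + 1:]) for i in range(n)]
    t_bar = sum(leave_one_out) / n
    v_jack = (n - 1) / n * sum((t - t_bar) ** 2 for t in leave_one_out)
    return math.sqrt(v_jack)

def mean(xs):
    return sum(xs) / len(xs)

data = [1.0, 2.0, 2.0, 3.0, 7.0]  # hypothetical values
se_jack = jackknife_se(data, mean)
```

For the sample mean the jackknife reproduces the familiar formula exactly: its standard error equals S_n / sqrt(n), with S_n^2 the sample variance (here sqrt(5.5 / 5)).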
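9.5.2 Justification For The Percentile Interval
Suppose there exists a monotone transformation $U = m(T)$
such that $U \sim N(\phi, c^2)$ where $\phi = m(\theta)$. We do not suppose we
know the transformation, only that one exists. Let $U^*_b = m(\theta^*_{n,b})$.
Let $u^*_\beta$ be the $\beta$ sample quantile of the $U^*_b$'s. Since a monotone transformation preserves quantiles, we have that $u^*_{\alpha/2} =
m(\theta^*_{\alpha/2})$. Also, since $U \sim N(\phi, c^2)$, the $\alpha/2$ quantile of $U$ is
$\phi - z_{\alpha/2}c$. Hence $u^*_{\alpha/2} = \phi - z_{\alpha/2}c$. Similarly, $u^*_{1-\alpha/2} = \phi + z_{\alpha/2}c$.
Therefore,
$$
\begin{aligned}
P(\theta^*_{\alpha/2} \le \theta \le \theta^*_{1-\alpha/2}) &= P(m(\theta^*_{\alpha/2}) \le m(\theta) \le m(\theta^*_{1-\alpha/2})) \\
&= P(u^*_{\alpha/2} \le \phi \le u^*_{1-\alpha/2}) \\
&= P(U - c z_{\alpha/2} \le \phi \le U + c z_{\alpha/2}) \\
&= P\left(-z_{\alpha/2} \le \frac{U - \phi}{c} \le z_{\alpha/2}\right) \\
&= 1 - \alpha .
\end{aligned}
$$
An exact normalizing transformation will rarely exist but there
may exist approximate normalizing transformations.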
9.6 Exercises
1. Consider the data in Example 9.6. Find the plug-in estimate of the correlation coefficient. Estimate the standard
error using the bootstrap. Find a 95 per cent confidence
interval using all three methods.
2. (Computer Experiment.) Conduct a simulation to compare
the four bootstrap confidence interval methods. Let $n =
50$ and let $T(F) = \int (x - \mu)^3\,dF(x)/\sigma^3$ be the skewness.
Draw $Y_1, \ldots, Y_n \sim N(0, 1)$ and set $X_i = e^{Y_i}$, $i = 1, \ldots, n$.
Construct the four types of bootstrap 95 per cent intervals
for $T(F)$ from the data $X_1, \ldots, X_n$. Repeat this whole
thing many times and estimate the true coverage of the
four intervals.
3. Let
$$X_1, \ldots, X_n \sim t_3$$
where $n = 25$. Let $\theta = T(F) = (q_{.75} - q_{.25})/1.34$ where $q_p$
denotes the $p$th quantile. Do a simulation to compare the
coverage and length of the following confidence intervals
for $\theta$: (i) Normal interval with standard error from the
bootstrap, (ii) bootstrap percentile interval.
Remark: The jackknife does not give a consistent estimator of the variance of a quantile.
4. Let $X_1, \ldots, X_n$ be distinct observations (no ties). Show
that there are
$$\binom{2n-1}{n}$$
distinct bootstrap samples.
Hint: Imagine putting $n$ balls into $n$ buckets.
5. Let $X_1, \ldots, X_n$ be distinct observations (no ties). Let $X_1^*, \ldots, X_n^*$
denote a bootstrap sample and let $\bar{X}_n^* = n^{-1}\sum_{i=1}^n X_i^*$.
Find: $E(\bar{X}_n^* \mid X_1, \ldots, X_n)$, $V(\bar{X}_n^* \mid X_1, \ldots, X_n)$, $E(\bar{X}_n^*)$ and
$V(\bar{X}_n^*)$.
6. (Computer Experiment.) Let $X_1, \ldots, X_n \sim$ Normal($\mu$, 1). Let
$\theta = e^\mu$ and let $\hat\theta = e^{\bar{X}}$ be the mle. Create a data set (using
$\mu = 5$) consisting of $n = 100$ observations.
(a) Use the bootstrap to get the se and 95 percent confidence interval for $\theta$.
(b) Plot a histogram of the bootstrap replications for the
parametric and nonparametric bootstraps. These are estimates of the distribution of $\hat\theta$. Compare this to the true
sampling distribution of $\hat\theta$.
7. Let $X_1, \ldots, X_n \sim$ Unif($0, \theta$). The mle is $\hat\theta = X_{max} = \max\{X_1, \ldots, X_n\}$.
Generate a data set of size 50 with $\theta = 1$.
(a) Find the distribution of $\hat\theta$. Compare the true distribution of $\hat\theta$ to the histograms from the parametric and
nonparametric bootstraps.
(b) This is a case where the nonparametric bootstrap does
very poorly. In fact, we can prove that this is the case.
Show that, for the parametric bootstrap $P(\hat\theta^* = \hat\theta) = 0$
but for the nonparametric bootstrap $P(\hat\theta^* = \hat\theta) \approx .632$.
Hint: show that $P(\hat\theta^* = \hat\theta) = 1 - (1 - (1/n))^n$ then take
the limit as $n$ gets large.
8. Let $T_n = \bar{X}_n^2$, $\mu = E(X_1)$, $\alpha_k = \int |x - \mu|^k\,dF(x)$ and
$\hat\alpha_k = n^{-1}\sum_{i=1}^n |X_i - \bar{X}_n|^k$. Show that
$$v_{boot} = \frac{4\bar{X}_n^2\,\hat\alpha_2}{n} + \frac{4\bar{X}_n\,\hat\alpha_3}{n^2} + \frac{\hat\alpha_4}{n^3} .$$
10
Parametric Inference
We now turn our attention to parametric models, that is,
models of the form
$$\mathfrak{F} = \left\{ f(x; \theta) : \theta \in \Theta \right\} \tag{10.1}$$
where $\Theta \subset \mathbb{R}^k$ is the parameter space and $\theta = (\theta_1, \ldots, \theta_k)$
is the parameter. The problem of inference then reduces to the
problem of estimating the parameter $\theta$.
Students learning statistics often ask: how would we ever
know that the distribution that generated the data is in some
parametric model? This is an excellent question. Indeed, we
would rarely have such knowledge which is why nonparametric
methods are preferable. Still, studying methods for parametric
models is useful for two reasons. First, there are some cases
where background knowledge suggests that a parametric model
provides a reasonable approximation. For example, counts of
traffic accidents are known from prior experience to follow approximately a Poisson model. Second, the inferential concepts
for parametric models provide background for understanding
certain nonparametric methods.
We will begin with a brief discussion about parameters of interest and nuisance parameters in the next section, then we will
discuss two methods for estimating $\theta$, the method of moments
and the method of maximum likelihood.
10.1 Parameter of Interest
Often, we are only interested in some function $T(\theta)$. For example, if $X \sim N(\mu, \sigma^2)$ then the parameter is $\theta = (\mu, \sigma)$. If our
goal is to estimate $\mu$ then $\mu = T(\theta)$ is called the parameter
of interest and $\sigma$ is called a nuisance parameter. The parameter of interest can be a complicated function of $\theta$ as in the
following example.
Example 10.1 Let $X_1, \ldots, X_n \sim$ Normal($\mu, \sigma^2$). The parameter
is $\theta = (\mu, \sigma)$ and the parameter space is $\Theta = \{(\mu, \sigma) : \mu \in \mathbb{R}, \sigma > 0\}$. Suppose that $X_i$
is the outcome of a blood test and suppose we are interested in
$\tau$, the fraction of the population whose test score is larger than
1. Let $Z$ denote a standard Normal random variable. Then
$$
\begin{aligned}
\tau = P(X > 1) &= 1 - P(X < 1) = 1 - P\left(\frac{X - \mu}{\sigma} < \frac{1 - \mu}{\sigma}\right) \\
&= 1 - P\left(Z < \frac{1 - \mu}{\sigma}\right) = 1 - \Phi\left(\frac{1 - \mu}{\sigma}\right).
\end{aligned}
$$
The parameter of interest is $\tau = T(\mu, \sigma) = 1 - \Phi((1 - \mu)/\sigma)$.
Example 10.2 Recall that $X$ has a Gamma($\alpha, \beta$) distribution if
$$f(x; \alpha, \beta) = \frac{1}{\beta^\alpha \Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}, \quad x > 0$$
where $\alpha, \beta > 0$ and
$$\Gamma(\alpha) = \int_0^\infty y^{\alpha - 1} e^{-y}\,dy$$
is the Gamma function. The parameter is $\theta = (\alpha, \beta)$. The Gamma
distribution is sometimes used to model lifetimes of people, animals, and electronic equipment. Suppose we want to estimate
the mean lifetime. Then $T(\alpha, \beta) = E_\theta(X_1) = \alpha\beta$.
10.2 The Method of Moments
The first method for generating parametric estimators that
we will study is called the method of moments. We will see
that these estimators are not optimal but they are often easy to
compute. They are also useful as starting values for other
methods that require iterative numerical routines.
Suppose that the parameter $\theta = (\theta_1, \ldots, \theta_k)$ has $k$ components. For $1 \le j \le k$, define the $j$th moment
$$\alpha_j \equiv \alpha_j(\theta) = E_\theta(X^j) = \int x^j\,dF_\theta(x) \tag{10.2}$$
and the $j$th sample moment
$$\hat\alpha_j = \frac{1}{n}\sum_{i=1}^n X_i^j . \tag{10.3}$$
Definition 10.3 The method of moments estimator $\hat\theta_n$
is defined to be the value of $\theta$ such that
$$
\begin{aligned}
\alpha_1(\hat\theta_n) &= \hat\alpha_1 \\
\alpha_2(\hat\theta_n) &= \hat\alpha_2 \\
&\;\;\vdots \\
\alpha_k(\hat\theta_n) &= \hat\alpha_k .
\end{aligned}
\tag{10.4}
$$
Formula (10.4) defines a system of $k$ equations with $k$ unknowns.
Example 10.4 Let $X_1, \ldots, X_n \sim$ Bernoulli($p$). Then $\alpha_1 = E_p(X) =
p$ and $\hat\alpha_1 = n^{-1}\sum_{i=1}^n X_i$. By equating these we get the estimator
$$\hat{p}_n = \frac{1}{n}\sum_{i=1}^n X_i .$$
Example 10.5 Let $X_1, \ldots, X_n \sim$ Normal($\mu, \sigma^2$). Then, $\alpha_1 = E_\theta(X_1) =
\mu$ and $\alpha_2 = E_\theta(X_1^2) = V_\theta(X_1) + (E_\theta(X_1))^2 = \sigma^2 + \mu^2$. (Recall that $V(X) = E(X^2) - (E(X))^2$. Hence, $E(X^2) = V(X) + (E(X))^2$.) We need
to solve the equations
$$\hat\mu = \frac{1}{n}\sum_{i=1}^n X_i$$
$$\hat\sigma^2 + \hat\mu^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 .$$
This is a system of 2 equations with 2 unknowns. The solution
is
$$\hat\mu = \bar{X}_n$$
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2 .$$
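For the Normal model the two moment equations can be solved in closed form, which the following Python sketch mirrors. The function names and the data values are assumptions made for the example.

```python
def sample_moment(xs, j):
    """j-th sample moment: (1/n) * sum of X_i^j."""
    return sum(x ** j for x in xs) / len(xs)

def normal_method_of_moments(xs):
    """Solve mu = a1 and sigma^2 + mu^2 = a2 for (mu, sigma^2)."""
    a1 = sample_moment(xs, 1)
    a2 = sample_moment(xs, 2)
    mu_hat = a1
    sigma2_hat = a2 - a1 ** 2  # equals (1/n) * sum (X_i - mean)^2
    return mu_hat, sigma2_hat

data = [1.0, 2.0, 2.0, 3.0, 7.0]  # hypothetical values
mu_hat, sigma2_hat = normal_method_of_moments(data)
```

Here the first two sample moments are 3 and 13.4, giving estimates 3 and 13.4 - 9 = 4.4. The form a2 - a1^2 can lose precision when the mean is large relative to the spread; the centered sum of squares is numerically safer.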
Theorem 10.6 Let $\hat\theta_n$ denote the method of moments estimator.
Under the conditions given in the appendix, the following
statements hold:
1. The estimate $\hat\theta_n$ exists with probability tending to 1.
2. The estimate is consistent: $\hat\theta_n \stackrel{P}{\longrightarrow} \theta$.
3. The estimate is asymptotically Normal:
$$\sqrt{n}(\hat\theta_n - \theta) \rightsquigarrow N(0, \Sigma)$$
where
$$\Sigma = g\,E_\theta(Y Y^T)\,g^T,$$
$Y = (X, X^2, \ldots, X^k)^T$, $g = (g_1, \ldots, g_k)$ and $g_j = \partial \alpha_j^{-1}(\theta)/\partial\theta$.
The last statement in the Theorem above can be used to find
standard errors and confidence intervals. However, there is an
easier way: the bootstrap. We defer discussion of this until the
end of the chapter.
10.3 Maximum Likelihood
The most common method for estimating parameters in a
parametric model is the maximum likelihood method. Let
$X_1, \ldots, X_n$ be iid with pdf $f(x; \theta)$.
Definition 10.7 The likelihood function is defined by
$$L_n(\theta) = \prod_{i=1}^n f(X_i; \theta). \tag{10.5}$$
The log-likelihood function is defined by $\ell_n(\theta) = \log L_n(\theta)$.
The likelihood function is just the joint density of the data,
except that we treat it as a function of the parameter $\theta$.
Thus $L_n : \Theta \to [0, \infty)$. The likelihood function is not a density
function: in general, it is not true that $L_n(\theta)$ integrates to 1.
Definition 10.8 The maximum likelihood estimator (mle),
denoted by $\hat\theta_n$, is the value of $\theta$ that maximizes
$L_n(\theta)$.
The maximum of $\ell_n(\theta)$ occurs at the same place as the maximum of $L_n(\theta)$, so maximizing the log-likelihood leads to the
same answer as maximizing the likelihood. Often, it is easier to
work with the log-likelihood.
Remark 10.9 If we multiply $L_n(\theta)$ by any positive constant $c$ (not
depending on $\theta$) then this will not change the mle. Hence, we
shall often be sloppy about dropping constants in the likelihood
function.
[Figure 10.1: plot of $L_n(p)$ against $p$ on [0, 1], with the maximizer $\hat{p}_n$ marked.]
FIGURE 10.1. Likelihood function for Bernoulli with $n = 20$ and
$\sum_{i=1}^n X_i = 12$. The mle is $\hat{p}_n = 12/20 = 0.6$.
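Example 10.10 Suppose that $X_1, \ldots, X_n \sim$ Bernoulli($p$). The probability function is $f(x; p) = p^x (1 - p)^{1 - x}$ for $x = 0, 1$. The unknown parameter is $p$. Then,
$$L_n(p) = \prod_{i=1}^n f(X_i; p) = \prod_{i=1}^n p^{X_i} (1 - p)^{1 - X_i} = p^S (1 - p)^{n - S}$$
where $S = \sum_i X_i$. Hence,
$$\ell_n(p) = S \log p + (n - S) \log(1 - p).$$
Take the derivative of $\ell_n(p)$, set it equal to 0 to find that the
mle is $\hat{p}_n = S/n$. See Figure 10.1.
Example 10.11 Let $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$. The parameter is
$\theta = (\mu, \sigma)$ and the likelihood function is (ignoring some constants)
$$
\begin{aligned}
L_n(\mu, \sigma) &= \prod_i \frac{1}{\sigma} \exp\left\{-\frac{1}{2\sigma^2}(X_i - \mu)^2\right\} \\
&= \sigma^{-n} \exp\left\{-\frac{1}{2\sigma^2}\sum_i (X_i - \mu)^2\right\} \\
&= \sigma^{-n} \exp\left\{-\frac{nS^2}{2\sigma^2}\right\} \exp\left\{-\frac{n(\bar{X} - \mu)^2}{2\sigma^2}\right\}
\end{aligned}
$$
where $\bar{X} = n^{-1}\sum_i X_i$ is the sample mean and $S^2 = n^{-1}\sum_i (X_i -
\bar{X})^2$. The last equality above follows from the fact that $\sum_i (X_i - \mu)^2 = nS^2 + n(\bar{X} - \mu)^2$ which can be verified by writing $\sum_i (X_i - \mu)^2 = \sum_i (X_i - \bar{X} + \bar{X} - \mu)^2$ and then expanding the square. The
log-likelihood is
$$\ell(\mu, \sigma) = -n \log \sigma - \frac{nS^2}{2\sigma^2} - \frac{n(\bar{X} - \mu)^2}{2\sigma^2} .$$
Solving the equations
$$\frac{\partial \ell(\mu, \sigma)}{\partial \mu} = 0 \quad \text{and} \quad \frac{\partial \ell(\mu, \sigma)}{\partial \sigma} = 0$$
we conclude that $\hat\mu = \bar{X}$ and $\hat\sigma = S$. It can be verified that these
are indeed global maxima of the likelihood.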
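Both closed-form maximizers can be checked numerically. The Python sketch below evaluates the Bernoulli log-likelihood on a grid, using the n = 20, S = 12 setting of Figure 10.1, and compares the Normal log-likelihood at the closed-form solution with nearby points. The helper names, the grid and the Normal data are assumptions for illustration.

```python
import math

def bernoulli_loglik(p, xs):
    """l_n(p) = S log p + (n - S) log(1 - p)."""
    s = sum(xs)
    return s * math.log(p) + (len(xs) - s) * math.log(1 - p)

def normal_loglik(mu, sigma, xs):
    """l(mu, sigma) up to an additive constant."""
    n = len(xs)
    return -n * math.log(sigma) - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)

# Bernoulli: n = 20 trials with S = 12 successes, as in Figure 10.1.
coin = [1] * 12 + [0] * 8
grid = [k / 1000 for k in range(1, 1000)]  # p = 0.001, ..., 0.999
p_hat = max(grid, key=lambda p: bernoulli_loglik(p, coin))

# Normal: closed-form mle is (sample mean, S) with S^2 using the 1/n factor.
data = [1.0, 2.0, 2.0, 3.0, 7.0]  # hypothetical values
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))
best = normal_loglik(mu_hat, sigma_hat, data)
```

The grid search returns 0.6 = S/n, and any perturbation of (mu_hat, sigma_hat) lowers the Normal log-likelihood, consistent with the calculus argument.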
Example 10.12 (A Hard Example.) Here is the example that confuses everyone. Let $X_1, \ldots, X_n \sim$ Unif($0, \theta$). Recall that
$$f(x; \theta) = \begin{cases} 1/\theta & 0 \le x \le \theta \\ 0 & \text{otherwise.} \end{cases}$$
Consider a fixed value of $\theta$. Suppose $\theta < X_i$ for some $i$. Then,
$f(X_i; \theta) = 0$ and hence $L_n(\theta) = \prod_i f(X_i; \theta) = 0$. It follows that
$L_n(\theta) = 0$ if any $X_i > \theta$. Therefore, $L_n(\theta) = 0$ if $\theta < X_{(n)}$
where $X_{(n)} = \max\{X_1, \ldots, X_n\}$. Now consider any $\theta \ge X_{(n)}$.
For every $X_i$ we then have that $f(X_i; \theta) = 1/\theta$ so that $L_n(\theta) =
\prod_i f(X_i; \theta) = \theta^{-n}$. In conclusion,
$$L_n(\theta) = \begin{cases} \left(\frac{1}{\theta}\right)^n & \theta \ge X_{(n)} \\ 0 & \theta < X_{(n)}. \end{cases}$$
See Figure 10.2. Now $L_n(\theta)$ is strictly decreasing over the interval $[X_{(n)}, \infty)$. Hence, $\hat\theta_n = X_{(n)}$.
[Figure 10.2: four panels. Three plot $f(x; \theta)$ against $x$ for $\theta = .75$, $\theta = 1$ and $\theta = 1.25$; the fourth plots $L_n(\theta)$ against $\theta$.]
FIGURE 10.2. Likelihood function for Uniform($0, \theta$). The
vertical lines show the observed data. The first three
plots show $f(x; \theta)$ for three different values of $\theta$. When
$\theta < X_{(n)} = \max\{X_1, \ldots, X_n\}$, as in the first plot, $f(X_{(n)}; \theta) = 0$
and hence $L_n(\theta) = \prod_{i=1}^n f(X_i; \theta) = 0$. Otherwise $f(X_i; \theta) = 1/\theta$
for each $i$ and hence $L_n(\theta) = \prod_{i=1}^n f(X_i; \theta) = (1/\theta)^n$. The last plot
shows the likelihood function.
10.4 Properties of Maximum Likelihood
Estimators.
Under certain conditions on the model, the maximum likelihood estimator $\hat\theta_n$ possesses many properties that make it an appealing choice of estimator. The main properties of the mle are:
(1) It is consistent: $\hat\theta_n \stackrel{P}{\longrightarrow} \theta_\star$ where $\theta_\star$ denotes the true value
of the parameter $\theta$;
(2) It is equivariant: if $\hat\theta_n$ is the mle of $\theta$ then $g(\hat\theta_n)$ is the
mle of $g(\theta)$;
(3) It is asymptotically Normal: $(\hat\theta_n - \theta_\star)/\widehat{\mathrm{se}} \rightsquigarrow N(0, 1)$
where $\widehat{\mathrm{se}}$ can be computed analytically;
(4) It is asymptotically optimal or efficient: roughly, this
means that among all well behaved estimators, the mle has the
smallest variance, at least for large samples.
(5) The mle is approximately the Bayes estimator. (To be
explained later.)
We will spend some time explaining what these properties
mean and why they are good things. In sufficiently complicated
problems, these properties will no longer hold and the mle will
no longer be a good estimator. For now we focus on the simpler
situations where the mle works well. The properties we discuss
only hold if the model satisfies certain regularity conditions.
These are essentially smoothness conditions on $f(x; \theta)$. Unless
otherwise stated, we shall tacitly assume that these conditions hold.
10.5 Consistency of Maximum Likelihood
Estimators.
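Consistency means that the mle converges in probability to
the true value. To proceed, we need a definition. If $f$ and $g$ are
pdf's, define the Kullback-Leibler distance between $f$ and
$g$ to be
$$D(f, g) = \int f(x) \log\left(\frac{f(x)}{g(x)}\right) dx. \tag{10.6}$$
(This is not a distance in the formal sense.) It can be shown that $D(f, g) \ge 0$ and $D(f, f) = 0$. For any
$\theta, \psi \in \Theta$ write $D(\theta, \psi)$ to mean $D(f(x; \theta), f(x; \psi))$. We will
assume that $\theta \ne \psi$ implies that $D(\theta, \psi) > 0$.
Let $\theta_\star$ denote the true value of $\theta$. Maximizing $\ell_n(\theta)$ is equivalent to maximizing
$$M_n(\theta) = \frac{1}{n}\sum_i \log \frac{f(X_i; \theta)}{f(X_i; \theta_\star)} .$$
By the law of large numbers, $M_n(\theta)$ converges to
$$
\begin{aligned}
E_{\theta_\star}\left(\log \frac{f(X_i; \theta)}{f(X_i; \theta_\star)}\right) &= \int \log\left(\frac{f(x; \theta)}{f(x; \theta_\star)}\right) f(x; \theta_\star)\,dx \\
&= -\int \log\left(\frac{f(x; \theta_\star)}{f(x; \theta)}\right) f(x; \theta_\star)\,dx \\
&= -D(\theta_\star, \theta).
\end{aligned}
$$
Hence, $M_n(\theta) \approx -D(\theta_\star, \theta)$ which is maximized at $\theta_\star$ since $-D(\theta_\star, \theta_\star) =
0$ and $-D(\theta_\star, \theta) < 0$ for $\theta \ne \theta_\star$. Hence, we expect that the maximizer will tend to $\theta_\star$. To prove this formally, we need more than
$M_n(\theta) \stackrel{P}{\longrightarrow} -D(\theta_\star, \theta)$. We need this convergence to be uniform
over $\theta$. We also have to make sure that the function $D(\theta_\star, \theta)$ is
well behaved. Here are the formal details.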
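Theorem 10.13 Let $\theta_\star$ denote the true value of $\theta$. Define
$$M_n(\theta) = \frac{1}{n}\sum_i \log \frac{f(X_i; \theta)}{f(X_i; \theta_\star)}$$
and $M(\theta) = -D(\theta_\star, \theta)$. Suppose that
$$\sup_{\theta \in \Theta} |M_n(\theta) - M(\theta)| \stackrel{P}{\longrightarrow} 0 \tag{10.7}$$
and that, for every $\epsilon > 0$,
$$\sup_{\theta : |\theta - \theta_\star| \ge \epsilon} M(\theta) < M(\theta_\star). \tag{10.8}$$
Let $\hat\theta_n$ denote the mle. Then $\hat\theta_n \stackrel{P}{\longrightarrow} \theta_\star$.
Proof. See appendix.
10.6 Equivariance of the mle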
Theorem 10.14 Let $\tau = g(\theta)$ be a one-to-one function of $\theta$. Let
$\hat\theta_n$ be the mle of $\theta$. Then $\hat\tau_n = g(\hat\theta_n)$ is the mle of $\tau$.
Proof. Let $h = g^{-1}$ denote the inverse of $g$. Then $\hat\theta_n = h(\hat\tau_n)$.
For any $\tau$, $L(\tau) = \prod_i f(x_i; h(\tau)) = \prod_i f(x_i; \theta) = L(\theta)$ where
$\theta = h(\tau)$. Hence, for any $\tau$, $L_n(\tau) = L(\theta) \le L(\hat\theta) = L_n(\hat\tau)$.
Example 10.15 Let $X_1, \ldots, X_n \sim N(\theta, 1)$. The mle for $\theta$ is $\hat\theta_n =
\bar{X}_n$. Let $\tau = e^\theta$. Then, the mle for $\tau$ is $\hat\tau = e^{\hat\theta} = e^{\bar{X}}$.
10.7 Asymptotic Normality
It turns out that $\hat\theta_n$ is approximately Normal and we can
compute its variance analytically. To explore this, we first need
a few definitions.
Definition 10.16 The score function is defined to be
\[
s(X;\theta) = \frac{\partial \log f(X;\theta)}{\partial\theta}. \tag{10.9}
\]
The Fisher information is defined to be
\[
I_n(\theta) = V_\theta\left( \sum_{i=1}^n s(X_i;\theta) \right) = \sum_{i=1}^n V_\theta\left( s(X_i;\theta) \right). \tag{10.10}
\]
For $n = 1$ we will sometimes write $I(\theta)$ instead of $I_1(\theta)$.
It can be shown that $E_\theta(s(X;\theta)) = 0$. It then follows that
$V_\theta(s(X;\theta)) = E_\theta(s^2(X;\theta))$. In fact a further simplification of
$I_n(\theta)$ is given in the next result.
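Theorem 10.17 $I_n(\theta) = nI(\theta)$ and
\[
I(\theta) = -E_\theta\left( \frac{\partial^2 \log f(X;\theta)}{\partial\theta^2} \right)
= -\int \left( \frac{\partial^2 \log f(x;\theta)}{\partial\theta^2} \right) f(x;\theta)\,dx. \tag{10.11}
\]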
Theorem 10.18 (Asymptotic Normality of the mle.) Under appropriate regularity conditions, the following hold:
1. Let $\mathrm{se} = \sqrt{1/I_n(\theta)}$. Then,
\[
\frac{\hat\theta_n - \theta}{\mathrm{se}} \rightsquigarrow N(0,1). \tag{10.12}
\]
2. Let $\widehat{\mathrm{se}} = \sqrt{1/I_n(\hat\theta_n)}$. Then,
\[
\frac{\hat\theta_n - \theta}{\widehat{\mathrm{se}}} \rightsquigarrow N(0,1). \tag{10.13}
\]
The proof is in the appendix. The first statement says that $\hat\theta_n \approx N(\theta, \mathrm{se}^2)$ where the standard error of $\hat\theta_n$ is $\mathrm{se} = \sqrt{1/I_n(\theta)}$.
The second statement says that this is still true even if we replace
the standard error by its estimated standard error $\widehat{\mathrm{se}} = \sqrt{1/I_n(\hat\theta_n)}$.
Informally, the theorem says that the distribution of the mle can
be approximated with $N(\theta, \widehat{\mathrm{se}}^2)$. From this fact we can construct
an (asymptotic) confidence interval.
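Theorem 10.19 Let
\[
C_n = \left( \hat\theta_n - z_{\alpha/2}\,\widehat{\mathrm{se}},\ \hat\theta_n + z_{\alpha/2}\,\widehat{\mathrm{se}} \right).
\]
Then, $P_\theta(\theta \in C_n) \to 1-\alpha$ as $n \to \infty$.
Proof. Let $Z$ denote a standard normal random variable. Then,
\begin{align*}
P_\theta(\theta \in C_n)
&= P_\theta\left( \hat\theta_n - z_{\alpha/2}\,\widehat{\mathrm{se}} \le \theta \le \hat\theta_n + z_{\alpha/2}\,\widehat{\mathrm{se}} \right) \\
&= P_\theta\left( -z_{\alpha/2} \le \frac{\hat\theta_n - \theta}{\widehat{\mathrm{se}}} \le z_{\alpha/2} \right) \\
&\to P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1-\alpha.
\end{align*}
For $\alpha = .05$, $z_{\alpha/2} = 1.96 \approx 2$, so
\[
\hat\theta_n \pm 2\,\widehat{\mathrm{se}}
\]
is an approximate 95 per cent confidence interval.
When you read an opinion poll in the newspaper, you often
see a statement like: the poll is accurate to within one point,
95 per cent of the time. They are simply giving a 95 per cent
confidence interval of the form $\hat\theta_n \pm 2\,\widehat{\mathrm{se}}$.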
Example 10.20 Let $X_1,\ldots,X_n \sim \mathrm{Bernoulli}(p)$. The mle is $\hat p_n = \sum_i X_i/n$ and $f(x;p) = p^x(1-p)^{1-x}$, $\log f(x;p) = x\log p + (1-x)\log(1-p)$, $s(X;p) = (X/p) - (1-X)/(1-p)$, and $-s'(X;p) = (X/p^2) + (1-X)/(1-p)^2$. Thus,
\[
I(p) = E_p(-s'(X;p)) = \frac{p}{p^2} + \frac{1-p}{(1-p)^2} = \frac{1}{p(1-p)}.
\]
Hence,
\[
\widehat{\mathrm{se}} = \frac{1}{\sqrt{I_n(\hat p_n)}} = \frac{1}{\sqrt{nI(\hat p_n)}} = \left\{ \frac{\hat p(1-\hat p)}{n} \right\}^{1/2}.
\]
An approximate 95 per cent confidence interval is
\[
\hat p_n \pm 2 \left\{ \frac{\hat p_n(1-\hat p_n)}{n} \right\}^{1/2}.
\]
Compare this with the Hoeffding interval.
Example 10.21 Let $X_1,\ldots,X_n \sim N(\theta,\sigma^2)$ where $\sigma^2$ is known.
The score function is $s(X;\theta) = (X-\theta)/\sigma^2$ and $s'(X;\theta) = -1/\sigma^2$ so that $I_1(\theta) = 1/\sigma^2$. The mle is $\hat\theta_n = \overline{X}_n$. According
to Theorem 10.18, $\overline{X}_n \approx N(\theta, \sigma^2/n)$. In fact, in this case, the
distribution is exact. (10.14)
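The Bernoulli interval of Example 10.20 is easy to compute directly. The following is a minimal sketch; the function name `bernoulli_mle_ci` and the 60/40 data set are made up for illustration, and the quantile $z_{\alpha/2}$ is used instead of the rounded value 2.

```python
from math import sqrt
from statistics import NormalDist

def bernoulli_mle_ci(xs, alpha=0.05):
    """Return (p_hat, se_hat, (lower, upper)) for 0/1 data xs."""
    n = len(xs)
    p_hat = sum(xs) / n                       # mle of p
    se_hat = sqrt(p_hat * (1 - p_hat) / n)    # 1 / sqrt(n I(p_hat))
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return p_hat, se_hat, (p_hat - z * se_hat, p_hat + z * se_hat)

# Hypothetical data: 60 successes in 100 trials
xs = [1] * 60 + [0] * 40
p_hat, se_hat, (lo, hi) = bernoulli_mle_ci(xs)
print(p_hat, se_hat, lo, hi)
```

The same pattern (estimate, plug the estimate into the Fisher information, add and subtract $z_{\alpha/2}$ standard errors) applies to any one-parameter model covered by Theorem 10.18.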
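Example 10.22 Let $X_1,\ldots,X_n \sim \mathrm{Poisson}(\lambda)$. Then $\hat\lambda_n = \overline{X}_n$
and some calculations show that $I_1(\lambda) = 1/\lambda$, so
\[
\widehat{\mathrm{se}} = \frac{1}{\sqrt{nI_1(\hat\lambda_n)}} = \sqrt{\frac{\hat\lambda_n}{n}}.
\]
Therefore, an approximate $1-\alpha$ confidence interval for $\lambda$ is $\hat\lambda_n \pm z_{\alpha/2}\sqrt{\hat\lambda_n/n}$.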
10.8 Optimality
Suppose that $X_1,\ldots,X_n \sim N(\theta,\sigma^2)$. The mle is $\hat\theta_n = \overline{X}_n$.
Another reasonable estimator is the sample median $\tilde\theta_n$. The
mle satisfies
\[
\sqrt{n}(\hat\theta_n - \theta) \rightsquigarrow N(0, \sigma^2).
\]
It can be proved that the median satisfies
\[
\sqrt{n}(\tilde\theta_n - \theta) \rightsquigarrow N\left(0, \sigma^2\frac{\pi}{2}\right).
\]
This means that the median converges to the right value but
has a larger variance than the mle.
More generally, consider two estimators $T_n$ and $U_n$ and suppose that
\[
\sqrt{n}(T_n - \theta) \rightsquigarrow N(0, t^2)
\]
and that
\[
\sqrt{n}(U_n - \theta) \rightsquigarrow N(0, u^2).
\]
We define the asymptotic relative efficiency of $U$ to $T$ by $\mathrm{ARE}(U,T) = t^2/u^2$. In the Normal example, $\mathrm{ARE}(\tilde\theta_n, \hat\theta_n) = 2/\pi = .63$. The
interpretation is that if you use the median, you are only using
about 63 per cent of the available data.
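The value $2/\pi \approx .63$ can be checked by simulation. The sketch below assumes standard Normal data and compares the sampling variance of the mean with that of the median; the sample size, number of replications and seed are arbitrary choices, so the printed ratio only approximates the asymptotic value.

```python
import random
from statistics import median, pvariance

def simulated_are(n=101, reps=2000, seed=1):
    """Estimate ARE(median, mean) = Var(mean) / Var(median) for N(0, 1) data."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(xs) / n)      # the mle of theta
        medians.append(median(xs))     # the competing estimator
    return pvariance(means) / pvariance(medians)

ratio = simulated_are()
print(ratio)   # should be in the neighborhood of 2 / pi
```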
Theorem 10.23 If $\hat\theta_n$ is the mle and $\tilde\theta_n$ is any other estimator
then
\[
\mathrm{ARE}(\tilde\theta_n, \hat\theta_n) \le 1.
\]
(The result is actually more subtle than this but we needn't worry about the details.)
Thus, the mle has the smallest (asymptotic) variance and we
say that the mle is efficient or asymptotically optimal.
This result is predicated upon the assumed model being correct. If the model is wrong, the mle may no longer be optimal.
We will discuss optimality in more generality when we discuss
decision theory.
10.9 The Delta Method.
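Let $\tau = g(\theta)$ where $g$ is a smooth function. The maximum
likelihood estimator of $\tau$ is $\hat\tau = g(\hat\theta)$. Now we address the following question: what is the distribution of $\hat\tau$?
Theorem 10.24 (The Delta Method) If $\tau = g(\theta)$ where $g$ is differentiable and $g'(\theta) \ne 0$ then
\[
\frac{\hat\tau_n - \tau}{\widehat{\mathrm{se}}(\hat\tau)} \rightsquigarrow N(0,1) \tag{10.15}
\]
where $\hat\tau_n = g(\hat\theta_n)$ and
\[
\widehat{\mathrm{se}}(\hat\tau_n) = |g'(\hat\theta)|\,\widehat{\mathrm{se}}(\hat\theta_n). \tag{10.16}
\]
Hence, if
\[
C_n = \left( \hat\tau_n - z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat\tau_n),\ \hat\tau_n + z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat\tau_n) \right) \tag{10.17}
\]
then $P_\theta(\tau \in C_n) \to 1-\alpha$ as $n \to \infty$.
Example 10.25 Let $X_1,\ldots,X_n \sim \mathrm{Bernoulli}(p)$ and let $\psi = g(p) = \log(p/(1-p))$. The Fisher information function is $I(p) = 1/(p(1-p))$ so the standard error of the mle $\hat p_n$ is $\widehat{\mathrm{se}} = \{\hat p_n(1-\hat p_n)/n\}^{1/2}$.
The mle of $\psi$ is $\hat\psi = \log(\hat p/(1-\hat p))$. Since $g'(p) = 1/(p(1-p))$,
according to the delta method
\[
\widehat{\mathrm{se}}(\hat\psi_n) = |g'(\hat p_n)|\,\widehat{\mathrm{se}}(\hat p_n) = \frac{1}{\sqrt{n\hat p_n(1-\hat p_n)}}.
\]
An approximate 95 per cent confidence interval is
\[
\hat\psi_n \pm \frac{2}{\sqrt{n\hat p_n(1-\hat p_n)}}.
\]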
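The log-odds calculation of Example 10.25 can be written out step by step. This is a sketch under the assumption that $0 < \hat p_n < 1$ (otherwise the log-odds is undefined); the function name `log_odds_ci` and the data are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist

def log_odds_ci(xs, alpha=0.05):
    """Delta-method interval for psi = log(p / (1 - p)) from 0/1 data xs."""
    n = len(xs)
    p_hat = sum(xs) / n
    psi_hat = log(p_hat / (1 - p_hat))          # mle of psi, by equivariance
    se_p = sqrt(p_hat * (1 - p_hat) / n)        # standard error of p_hat
    g_prime = 1 / (p_hat * (1 - p_hat))         # derivative of the log-odds map
    se_psi = abs(g_prime) * se_p                # delta method, equation (10.16)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return psi_hat, se_psi, (psi_hat - z * se_psi, psi_hat + z * se_psi)

# Hypothetical data: 60 successes in 100 trials
psi_hat, se_psi, (psi_lo, psi_hi) = log_odds_ci([1] * 60 + [0] * 40)
print(psi_hat, se_psi, psi_lo, psi_hi)
```

Note that `se_psi` simplifies to $1/\sqrt{n\hat p_n(1-\hat p_n)}$, matching the closed form in the example.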
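Example 10.26 Let $X_1,\ldots,X_n \sim N(\mu,\sigma^2)$. Suppose that $\mu$ is
known, $\sigma$ is unknown and that we want to estimate $\psi = \log\sigma$.
The log-likelihood is $\ell(\sigma) = -n\log\sigma - \frac{1}{2\sigma^2}\sum_i (x_i-\mu)^2$. Differentiate and set equal to 0 and conclude that
\[
\hat\sigma = \left\{ \frac{\sum_i (X_i-\mu)^2}{n} \right\}^{1/2}.
\]
To get the standard error we need the Fisher information. First,
\[
\log f(X;\sigma) = -\log\sigma - \frac{(X-\mu)^2}{2\sigma^2}
\]
with second derivative
\[
\frac{1}{\sigma^2} - \frac{3(X-\mu)^2}{\sigma^4}
\]
and hence
\[
I(\sigma) = -\frac{1}{\sigma^2} + \frac{3\sigma^2}{\sigma^4} = \frac{2}{\sigma^2}.
\]
Hence, $\widehat{\mathrm{se}} = \hat\sigma_n/\sqrt{2n}$. Let $\psi = g(\sigma) = \log(\sigma)$. Then, $\hat\psi_n = \log\hat\sigma_n$. Since $g' = 1/\sigma$,
\[
\widehat{\mathrm{se}}(\hat\psi_n) = \frac{1}{\hat\sigma_n}\,\frac{\hat\sigma_n}{\sqrt{2n}} = \frac{1}{\sqrt{2n}}
\]
and an approximate 95 per cent confidence interval is $\hat\psi_n \pm 2/\sqrt{2n}$.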
10.10 Multiparameter Models
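These ideas can directly be extended to models with several
parameters. Let $\theta = (\theta_1,\ldots,\theta_k)$ and let $\hat\theta = (\hat\theta_1,\ldots,\hat\theta_k)$ be the
mle. Let $\ell_n = \sum_{i=1}^n \log f(X_i;\theta)$,
\[
H_{jj} = \frac{\partial^2 \ell_n}{\partial\theta_j^2}
\quad\text{and}\quad
H_{jk} = \frac{\partial^2 \ell_n}{\partial\theta_j\,\partial\theta_k}.
\]
Define the Fisher Information Matrix by
\[
I_n(\theta) = -
\begin{bmatrix}
E_\theta(H_{11}) & E_\theta(H_{12}) & \cdots & E_\theta(H_{1k}) \\
E_\theta(H_{21}) & E_\theta(H_{22}) & \cdots & E_\theta(H_{2k}) \\
\vdots & \vdots & \ddots & \vdots \\
E_\theta(H_{k1}) & E_\theta(H_{k2}) & \cdots & E_\theta(H_{kk})
\end{bmatrix}. \tag{10.18}
\]
Let $J_n(\theta) = I_n^{-1}(\theta)$ be the inverse of $I_n$.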
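Theorem 10.27 Under appropriate regularity conditions,
\[
(\hat\theta - \theta) \approx N(0, J_n).
\]
Also, if $\hat\theta_j$ is the $j$th component of $\hat\theta$, then
\[
\frac{\hat\theta_j - \theta_j}{\widehat{\mathrm{se}}_j} \rightsquigarrow N(0,1)
\]
where $\widehat{\mathrm{se}}_j^2 = J_n(j,j)$ is the $j$th diagonal element of $J_n$. The approximate covariance of $\hat\theta_j$ and $\hat\theta_k$ is $\mathrm{Cov}(\hat\theta_j, \hat\theta_k) \approx J_n(j,k)$. (Because $J_n$ is the inverse of the full-sample information $I_n$, it is already of order $1/n$, so no $\sqrt{n}$ scaling appears.)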
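There is also a multiparameter delta method. Let $\tau = g(\theta_1,\ldots,\theta_k)$
be a function and let
\[
\nabla g = \begin{pmatrix} \dfrac{\partial g}{\partial\theta_1} \\ \vdots \\ \dfrac{\partial g}{\partial\theta_k} \end{pmatrix}
\]
be the gradient of $g$.
Theorem 10.28 (Multiparameter delta method) Suppose that
$\nabla g$ evaluated at $\hat\theta$ is not 0. Let $\hat\tau = g(\hat\theta)$. Then
\[
\frac{\hat\tau - \tau}{\widehat{\mathrm{se}}(\hat\tau)} \rightsquigarrow N(0,1)
\]
where
\[
\widehat{\mathrm{se}}(\hat\tau) = \sqrt{ (\widehat{\nabla} g)^T \widehat{J}_n (\widehat{\nabla} g) }, \tag{10.19}
\]
$\widehat{J}_n = J_n(\hat\theta_n)$ and $\widehat{\nabla} g$ is $\nabla g$ evaluated at $\theta = \hat\theta$.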
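Example 10.29 Let $X_1,\ldots,X_n \sim N(\mu,\sigma^2)$. Let $\tau = g(\mu,\sigma) = \sigma/\mu$. In homework question 8 you will show that
\[
I_n(\mu,\sigma) = \begin{bmatrix} \dfrac{n}{\sigma^2} & 0 \\ 0 & \dfrac{2n}{\sigma^2} \end{bmatrix}.
\]
Hence,
\[
J_n = I_n^{-1}(\mu,\sigma) = \frac{1}{n} \begin{bmatrix} \sigma^2 & 0 \\ 0 & \dfrac{\sigma^2}{2} \end{bmatrix}.
\]
The gradient of $g$ is
\[
\nabla g = \begin{pmatrix} -\dfrac{\sigma}{\mu^2} \\ \dfrac{1}{\mu} \end{pmatrix}.
\]
Thus,
\[
\widehat{\mathrm{se}}(\hat\tau) = \sqrt{ (\widehat{\nabla} g)^T \widehat{J}_n (\widehat{\nabla} g) }
= \frac{1}{\sqrt{n}} \left( \frac{\hat\sigma^4}{\hat\mu^4} + \frac{\hat\sigma^2}{2\hat\mu^2} \right)^{1/2}.
\]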
10.11 The Parametric Bootstrap
For parametric models, standard errors and confidence intervals may also be estimated using the bootstrap. There is
only one change. In the nonparametric bootstrap, we sampled
$X_1^*,\ldots,X_n^*$ from the empirical distribution $\widehat{F}_n$. In the parametric bootstrap we sample instead from $f(x;\hat\theta_n)$. Here, $\hat\theta_n$ could
be the mle or the method of moments estimator.
Example 10.30 Consider Example 10.29. To get the bootstrap standard error, simulate $X_1^*,\ldots,X_n^* \sim N(\hat\mu,\hat\sigma^2)$, compute $\hat\mu^* = n^{-1}\sum_i X_i^*$ and $\hat\sigma^{*2} = n^{-1}\sum_i (X_i^* - \hat\mu^*)^2$. Then compute $\hat\tau^* = g(\hat\mu^*,\hat\sigma^*) = \hat\sigma^*/\hat\mu^*$. Repeating this $B$ times yields bootstrap replications
\[
\hat\tau_1^*,\ldots,\hat\tau_B^*
\]
and the estimated standard error is
\[
\widehat{\mathrm{se}}_{boot} = \sqrt{ \frac{\sum_{b=1}^B (\hat\tau_b^* - \hat\tau)^2}{B} }.
\]
The bootstrap is much easier than the delta method. On the
other hand, the delta method has the advantage that it gives a
closed form expression for the standard error.
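The two approaches can be compared side by side for $\tau = \sigma/\mu$. In this sketch the delta-method standard error is computed directly from the gradient and $J_n$ of Example 10.29 rather than from a simplified closed form; the data set (200 draws from $N(5, 1)$), the number of replications $B = 500$ and the seeds are illustrative assumptions.

```python
import random
from math import sqrt

def mle_normal(xs):
    """mle of (mu, sigma) for Normal data; sigma uses the 1/n variance."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma

def delta_se(mu, sigma, n):
    """Multiparameter delta method for tau = sigma / mu."""
    g1, g2 = -sigma / mu ** 2, 1 / mu          # gradient of g(mu, sigma)
    j11, j22 = sigma ** 2 / n, sigma ** 2 / (2 * n)   # diagonal of J_n
    return sqrt(g1 ** 2 * j11 + g2 ** 2 * j22)

def parametric_bootstrap_se(mu, sigma, n, B=500, seed=2):
    """Simulate from N(mu, sigma^2), re-estimate tau, and measure its spread."""
    rng = random.Random(seed)
    tau_hat = sigma / mu
    total = 0.0
    for _ in range(B):
        xs_star = [rng.gauss(mu, sigma) for _ in range(n)]
        mu_s, sigma_s = mle_normal(xs_star)
        total += (sigma_s / mu_s - tau_hat) ** 2
    return sqrt(total / B)

rng = random.Random(0)
data = [rng.gauss(5.0, 1.0) for _ in range(200)]   # hypothetical data set
mu_hat, sigma_hat = mle_normal(data)
se_delta = delta_se(mu_hat, sigma_hat, len(data))
se_boot = parametric_bootstrap_se(mu_hat, sigma_hat, len(data))
print(se_delta, se_boot)
```

For a sample of this size the two numbers should be close; the bootstrap value carries Monte Carlo noise that shrinks as $B$ grows.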
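10.12 Technical Appendix
10.12.1 Proofs
Proof of Theorem 10.13. Since $\hat\theta_n$ maximizes $M_n(\theta)$, we
have $M_n(\hat\theta_n) \ge M_n(\theta_\star)$. Hence,
\begin{align*}
M(\theta_\star) - M(\hat\theta_n)
&= M_n(\theta_\star) - M(\hat\theta_n) + M(\theta_\star) - M_n(\theta_\star) \\
&\le M_n(\hat\theta_n) - M(\hat\theta_n) + M(\theta_\star) - M_n(\theta_\star) \\
&\le \sup_\theta |M_n(\theta) - M(\theta)| + M(\theta_\star) - M_n(\theta_\star) \\
&\stackrel{P}{\to} 0
\end{align*}
where the last line follows from (10.7). It follows that, for any
$\delta > 0$,
\[
P\left( M(\hat\theta_n) < M(\theta_\star) - \delta \right) \to 0.
\]
Pick any $\epsilon > 0$. By (10.8), there exists $\delta > 0$ such that $|\theta - \theta_\star| \ge \epsilon$ implies that $M(\theta) < M(\theta_\star) - \delta$. Hence,
\[
P(|\hat\theta_n - \theta_\star| > \epsilon) \le P\left( M(\hat\theta_n) < M(\theta_\star) - \delta \right) \to 0.
\]
Next we want to prove Theorem 10.18. First we need a Lemma.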
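Lemma 10.31 The score function satisfies
\[
E_\theta[s(X;\theta)] = 0.
\]
Proof. Note that $1 = \int f(x;\theta)\,dx$. Differentiate both sides
of this equation to conclude that
\begin{align*}
0 &= \frac{\partial}{\partial\theta} \int f(x;\theta)\,dx = \int \frac{\partial}{\partial\theta} f(x;\theta)\,dx \\
&= \int \frac{\partial f(x;\theta)/\partial\theta}{f(x;\theta)}\, f(x;\theta)\,dx = \int \frac{\partial \log f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx \\
&= \int s(x;\theta) f(x;\theta)\,dx = E_\theta\, s(X;\theta).
\end{align*}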
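Proof of Theorem 10.18. Let $\ell(\theta) = \log L(\theta)$. Then,
\[
0 = \ell'(\hat\theta) \approx \ell'(\theta) + (\hat\theta - \theta)\ell''(\theta).
\]
Rearrange the above equation to get $\hat\theta - \theta = -\ell'(\theta)/\ell''(\theta)$ or, in
other words,
\[
\sqrt{n}(\hat\theta - \theta) = \frac{\frac{1}{\sqrt{n}}\ell'(\theta)}{-\frac{1}{n}\ell''(\theta)} \equiv \frac{\mathrm{TOP}}{\mathrm{BOTTOM}}.
\]
Let $Y_i = \partial \log f(X_i;\theta)/\partial\theta$. Recall that $E(Y_i) = 0$ from the previous Lemma and also $V(Y_i) = I(\theta)$. Hence,
\[
\mathrm{TOP} = n^{-1/2}\sum_i Y_i = \sqrt{n}\,\overline{Y} = \sqrt{n}(\overline{Y} - 0) \rightsquigarrow W \sim N(0, I)
\]
by the central limit theorem. Let $A_i = -\partial^2 \log f(X_i;\theta)/\partial\theta^2$.
Then $E(A_i) = I(\theta)$ and
\[
\mathrm{BOTTOM} = \overline{A} \stackrel{P}{\to} I(\theta)
\]
by the law of large numbers. Apply Theorem 6.5 part (e), to
conclude that
\[
\sqrt{n}(\hat\theta - \theta) \rightsquigarrow \frac{W}{I(\theta)} \stackrel{d}{=} N\left(0, \frac{1}{I(\theta)}\right).
\]
Assuming that $I(\theta)$ is a continuous function of $\theta$, it follows that
$I(\hat\theta_n) \stackrel{P}{\to} I(\theta)$. Now
\begin{align*}
\frac{\hat\theta_n - \theta}{\widehat{\mathrm{se}}}
&= \sqrt{n}\, I^{1/2}(\hat\theta_n)(\hat\theta_n - \theta) \\
&= \left\{ \sqrt{n}\, I^{1/2}(\theta)(\hat\theta_n - \theta) \right\} \left\{ \frac{I(\hat\theta_n)}{I(\theta)} \right\}^{1/2}.
\end{align*}
The first term tends in distribution to N(0,1). The second term
tends in probability to 1. The result follows from Theorem 6.5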
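part (e).
Outline of Proof of Theorem 10.24. Write,
\[
\hat\tau = g(\hat\theta) \approx g(\theta) + (\hat\theta - \theta)g'(\theta) = \tau + (\hat\theta - \theta)g'(\theta).
\]
Thus,
\[
\sqrt{n}(\hat\tau - \tau) \approx \sqrt{n}(\hat\theta - \theta)g'(\theta)
\]
and hence
\[
\frac{\sqrt{nI(\theta)}\,(\hat\tau - \tau)}{g'(\theta)} \approx \sqrt{nI(\theta)}\,(\hat\theta - \theta).
\]
Theorem 10.18 tells us that the right hand side tends in distribution to a N(0,1). Hence,
\[
\frac{\sqrt{nI(\theta)}\,(\hat\tau - \tau)}{g'(\theta)} \rightsquigarrow N(0,1)
\]
or, in other words,
\[
\hat\tau \approx N\left( \tau, \mathrm{se}^2(\hat\tau_n) \right)
\]
where
\[
\mathrm{se}^2(\hat\tau_n) = \frac{(g'(\theta))^2}{nI(\theta)}.
\]
The result remains true if we substitute $\hat\theta$ for $\theta$ by Theorem 6.5
part (e).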
10.12.2 Sufficiency
A statistic is a function $T(X^n)$ of the data. A sufficient statistic is a statistic that contains all the information in the data.
To make this more formal, we need some definitions.
Definition 10.32 Write $x^n \leftrightarrow y^n$ if $f(x^n;\theta) = c\, f(y^n;\theta)$
for some constant $c$ that might depend on $x^n$ and $y^n$ but
not $\theta$. A statistic $T(x^n)$ is sufficient if $T(x^n) \leftrightarrow T(y^n)$
implies that $x^n \leftrightarrow y^n$.
Notice that if $x^n \leftrightarrow y^n$ then the likelihood function based on
$x^n$ has the same shape as the likelihood function based on $y^n$.
Roughly speaking, a statistic is sufficient if we can calculate the
likelihood function knowing only $T(X^n)$.
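Example 10.33 Let $X_1,\ldots,X_n \sim \mathrm{Bernoulli}(p)$. Then $L(p) = p^S(1-p)^{n-S}$ where $S = \sum_i X_i$ so $S$ is sufficient.
Example 10.34 Let $X_1,\ldots,X_n \sim N(\mu,\sigma^2)$ and let $T = (\overline{X}, S)$, where $S^2 = n^{-1}\sum_i (X_i - \overline{X})^2$. Then,
\begin{align*}
f(X^n;\mu,\sigma) &= \prod_i f(X_i;\mu,\sigma) \\
&= \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n \exp\left\{ -\frac{1}{2\sigma^2} \sum_i (X_i - \mu)^2 \right\} \\
&= \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n \exp\left\{ -\frac{nS^2}{2\sigma^2} \right\} \exp\left\{ -\frac{n(\overline{X} - \mu)^2}{2\sigma^2} \right\}.
\end{align*}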
The last expression depends on the data only through $T$ and
therefore, $T = (\overline{X}, S)$ is a sufficient statistic. Note that $U = (17\,\overline{X}, S)$ is also a sufficient statistic. If I tell you the value of $U$
then you can easily figure out $T$ and then compute the likelihood.
Sufficient statistics are far from unique. Consider the following
statistics for the $N(\mu,\sigma^2)$ model:
\begin{align*}
T_1(X^n) &= (X_1,\ldots,X_n) \\
T_2(X^n) &= (\overline{X}, S) \\
T_3(X^n) &= \overline{X} \\
T_4(X^n) &= (\overline{X}, S, X_3).
\end{align*}
The first statistic is just the whole data set. This is sufficient.
The second is also sufficient as we proved above. The third is not
sufficient: you can't compute $L(\mu,\sigma)$ if I only tell you $\overline{X}$. The
fourth statistic $T_4$ is sufficient. The statistics $T_1$ and $T_4$ are sufficient but they contain redundant information. Intuitively, there
is a sense in which $T_2$ is a "more concise" sufficient statistic
than either $T_1$ or $T_4$. We can express this formally by noting
that $T_2$ is a function of $T_1$ and similarly, $T_2$ is a function of $T_4$.
For example, $T_2 = g(T_4)$ where $g(a_1,a_2,a_3) = (a_1,a_2)$.
Definition 10.35 A statistic $T$ is minimally sufficient if
(i) it is sufficient and (ii) it is a function of every other
sufficient statistic.
Theorem 10.36 $T$ is minimal sufficient if $T(x^n) = T(y^n)$ if and
only if $x^n \leftrightarrow y^n$.
A statistic induces a partition on the set of outcomes. We can
think of sufficiency in terms of these partitions.
Example 10.37 Let $X_1, X_2 \sim \mathrm{Bernoulli}(\theta)$. Let $V = X_1$,
$T = \sum_i X_i$ and $U = (T, X_1)$. Here is the set of outcomes and
the statistics:

X1  X2  V  T  U
0   0   0  0  (0,0)
0   1   0  1  (1,0)
1   0   1  1  (1,1)
1   1   1  2  (2,1)

The partitions induced by these statistics are:
\begin{align*}
V &\to \{(0,0),(0,1)\},\ \{(1,0),(1,1)\} \\
T &\to \{(0,0)\},\ \{(0,1),(1,0)\},\ \{(1,1)\} \\
U &\to \{(0,0)\},\ \{(0,1)\},\ \{(1,0)\},\ \{(1,1)\}.
\end{align*}
Then $V$ is not sufficient but $T$ and $U$ are sufficient. $T$ is minimal sufficient; $U$ is not minimal since if $x^n = (1,0)$ and
$y^n = (0,1)$, then $x^n \leftrightarrow y^n$ yet $U(x^n) \ne U(y^n)$. The statistic
$W = 17\,T$ generates the same partition as $T$. It is also minimal
sufficient.
Example 10.38 For a $N(\mu,\sigma^2)$ model, $T = (\overline{X}, S)$ is a minimally
sufficient statistic. For the Bernoulli model, $T = \sum_i X_i$ is a
minimally sufficient statistic. For the Poisson model, $T = \sum_i X_i$
is a minimally sufficient statistic. Check that $T = (\sum_i X_i, X_1)$
is sufficient but not minimal sufficient. Check that $T = X_1$ is
not sufficient.
I did not give the usual definition of sufficiency. The usual
definition is this: $T$ is sufficient if the distribution of $X^n$ given
$T(X^n) = t$ does not depend on $\theta$.
Example 10.39 Two coin flips. Let $X = (X_1, X_2) \sim \mathrm{Bernoulli}(p)$.
Then $T = X_1 + X_2$ is sufficient. To see this, we need the distribution of $(X_1, X_2)$ given $T = t$. Since $T$ can take 3 possible
values, there are 3 conditional distributions to check. They are:
(i) the distribution of $(X_1, X_2)$ given $T = 0$:
$P(X_1=0, X_2=0 \mid t=0) = 1$; $P(X_1=0, X_2=1 \mid t=0) = 0$;
$P(X_1=1, X_2=0 \mid t=0) = 0$; $P(X_1=1, X_2=1 \mid t=0) = 0$
(ii) the distribution of $(X_1, X_2)$ given $T = 1$:
$P(X_1=0, X_2=0 \mid t=1) = 0$; $P(X_1=0, X_2=1 \mid t=1) = \frac{1}{2}$;
$P(X_1=1, X_2=0 \mid t=1) = \frac{1}{2}$; $P(X_1=1, X_2=1 \mid t=1) = 0$
(iii) the distribution of $(X_1, X_2)$ given $T = 2$:
$P(X_1=0, X_2=0 \mid t=2) = 0$; $P(X_1=0, X_2=1 \mid t=2) = 0$;
$P(X_1=1, X_2=0 \mid t=2) = 0$; $P(X_1=1, X_2=1 \mid t=2) = 1$.
None of these depend on the parameter $p$. Thus, the distribution
of $X_1, X_2 \mid T$ does not depend on $p$ so $T$ is sufficient.
Theorem 10.40 (Factorization Theorem) $T$ is sufficient if and
only if there are functions $g(t,\theta)$ and $h(x)$ such that $f(x^n;\theta) = g(t(x^n),\theta)\,h(x^n)$.
Example 10.41 Return to the two coin flips. Let $t = x_1 + x_2$.
Then
\begin{align*}
f(x_1,x_2;\theta) &= f(x_1;\theta) f(x_2;\theta) \\
&= \theta^{x_1}(1-\theta)^{1-x_1}\,\theta^{x_2}(1-\theta)^{1-x_2} \\
&= g(t,\theta)\,h(x_1,x_2)
\end{align*}
where $g(t,\theta) = \theta^t(1-\theta)^{2-t}$ and $h(x_1,x_2) = 1$. Therefore, $T = X_1 + X_2$ is sufficient.
Now we discuss an implication of sufficiency in point estimation. Let $\hat\theta$ be an estimator of $\theta$. The Rao-Blackwell theorem says
that an estimator should only depend on the sufficient statistic,
otherwise it can be improved. Let $R(\theta,\hat\theta) = E_\theta[(\theta - \hat\theta)^2]$ denote
the mse of the estimator.
Theorem 10.42 (Rao-Blackwell) Let $\hat\theta$ be an estimator and
let $T$ be a sufficient statistic. Define a new estimator by
\[
\tilde\theta = E(\hat\theta \mid T).
\]
Then, for every $\theta$, $R(\theta,\tilde\theta) \le R(\theta,\hat\theta)$.
Example 10.43 Consider flipping a coin twice. Let $\hat\theta = X_1$. This
is a well defined (and unbiased) estimator. But it is not a function of the sufficient statistic $T = X_1 + X_2$. However, note
that $\tilde\theta = E(X_1 \mid T) = (X_1 + X_2)/2$. By the Rao-Blackwell Theorem, $\tilde\theta$ has mse at least as small as $\hat\theta = X_1$. The same applies
with $n$ coin flips. Again define $\hat\theta = X_1$ and $T = \sum_i X_i$. Then
$\tilde\theta = E(X_1 \mid T) = n^{-1}\sum_i X_i$ has improved mse.
10.12.3 Exponential Families
Most of the parametric models we have studied so far are special cases of a general class of models called exponential families.
We say that $\{f(x;\theta) : \theta \in \Theta\}$ is a one-parameter exponential
family if there are functions $\eta(\theta)$, $B(\theta)$, $T(x)$ and $h(x)$ such
that
\[
f(x;\theta) = h(x)\, e^{\eta(\theta)T(x) - B(\theta)}.
\]
It is easy to see that $T(X)$ is sufficient. We call $T$ the natural
sufficient statistic.
Example 10.44 Let $X \sim \mathrm{Poisson}(\theta)$. Then
\[
f(x;\theta) = \frac{\theta^x e^{-\theta}}{x!} = \frac{1}{x!}\, e^{x\log\theta - \theta}
\]
and hence, this is an exponential family with $\eta(\theta) = \log\theta$, $B(\theta) = \theta$, $T(x) = x$, $h(x) = 1/x!$.
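Example 10.45 Let $X \sim \mathrm{Bin}(n,\theta)$. Then
\[
f(x;\theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x}
= \binom{n}{x} \exp\left\{ x \log\left( \frac{\theta}{1-\theta} \right) + n\log(1-\theta) \right\}.
\]
In this case,
\[
\eta(\theta) = \log\left( \frac{\theta}{1-\theta} \right), \quad B(\theta) = -n\log(1-\theta)
\]
and
\[
T(x) = x, \quad h(x) = \binom{n}{x}.
\]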
We can rewrite an exponential family as
\[
f(x;\eta) = h(x)\, e^{\eta T(x) - A(\eta)}
\]
where $\eta = \eta(\theta)$ is called the natural parameter and
\[
A(\eta) = \log \int h(x)\, e^{\eta T(x)}\,dx.
\]
For example a Poisson can be written as $f(x;\eta) = e^{\eta x - e^{\eta}}/x!$
where the natural parameter is $\eta = \log\theta$.
Let $X_1,\ldots,X_n$ be iid from an exponential family. Then $f(x^n;\theta)$
is an exponential family:
\[
f(x^n;\theta) = h_n(x^n)\, e^{\eta(\theta)T_n(x^n) - B_n(\theta)}
\]
where $h_n(x^n) = \prod_i h(x_i)$, $T_n(x^n) = \sum_i T(x_i)$ and $B_n(\theta) = nB(\theta)$. This implies that $\sum_i T(X_i)$ is sufficient.
Example 10.46 Let $X_1,\ldots,X_n \sim \mathrm{Uniform}(0,\theta)$. Then
\[
f(x^n;\theta) = \frac{1}{\theta^n}\, I(x_{(n)} \le \theta)
\]
where $I$ is 1 if the term inside the brackets is true and 0 otherwise, and $x_{(n)} = \max\{x_1,\ldots,x_n\}$. Thus $T(X^n) = \max\{X_1,\ldots,X_n\}$
is sufficient. But since $T(X^n) \ne \sum_i T(X_i)$, this cannot be an exponential family.
Theorem 10.47 Let $X$ have density in an exponential family.
Then,
\[
E(T(X)) = A'(\eta), \quad V(T(X)) = A''(\eta).
\]
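As a quick check of the theorem, consider the Poisson family in its natural parametrization, $f(x;\eta) = e^{\eta x - e^{\eta}}/x!$ with $\eta = \log\theta$ and $T(x) = x$. The following worked calculation is a sketch that uses only the standard facts that a Poisson($\theta$) variable has mean $\theta$ and variance $\theta$:

```latex
\begin{align*}
A(\eta)   &= e^{\eta}, \\
A'(\eta)  &= e^{\eta} = \theta = E(X) = E(T(X)), \\
A''(\eta) &= e^{\eta} = \theta = V(X) = V(T(X)).
\end{align*}
```

Both derivatives reproduce the familiar Poisson mean and variance, as the theorem predicts.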
If $\theta = (\theta_1,\ldots,\theta_k)$ is a vector, then we say that $f(x;\theta)$ has
exponential family form if
\[
f(x;\theta) = h(x) \exp\left\{ \sum_{j=1}^k \eta_j(\theta) T_j(x) - B(\theta) \right\}.
\]
Again, $T = (T_1,\ldots,T_k)$ is sufficient and $n$ iid samples also
has exponential form with sufficient statistic $(\sum_i T_1(X_i),\ldots,\sum_i T_k(X_i))$.
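Example 10.48 Consider the normal family with $\theta = (\mu,\sigma)$. Now,
\[
f(x;\theta) = \exp\left\{ \frac{\mu}{\sigma^2}\,x - \frac{x^2}{2\sigma^2} - \frac{1}{2}\left( \frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2) \right) \right\}.
\]
This is exponential with
\begin{align*}
\eta_1(\theta) &= \frac{\mu}{\sigma^2}, & T_1(x) &= x \\
\eta_2(\theta) &= -\frac{1}{2\sigma^2}, & T_2(x) &= x^2 \\
B(\theta) &= \frac{1}{2}\left( \frac{\mu^2}{\sigma^2} + \log(2\pi\sigma^2) \right), & h(x) &= 1.
\end{align*}
Hence, with $n$ iid samples, $(\sum_i X_i, \sum_i X_i^2)$ is sufficient.
As before we can write an exponential family as
\[
f(x;\eta) = h(x) \exp\left\{ \eta^T T(x) - A(\eta) \right\}
\]
where $A(\eta) = \log \int h(x)\, e^{\eta^T T(x)}\,dx$. It can be shown that
\[
E(T(X)) = \dot{A}(\eta), \quad V(T(X)) = \ddot{A}(\eta)
\]
where the first expression is the vector of partial derivatives and
the second is the matrix of second derivatives.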
10.13 Exercises
1. Let $X_1,\ldots,X_n \sim \mathrm{Gamma}(\alpha,\beta)$. Find the method of moments estimator for $\alpha$ and $\beta$.
2. Let $X_1,\ldots,X_n \sim \mathrm{Uniform}(a,b)$ where $a$ and $b$ are unknown parameters and $a < b$.
(a) Find the method of moments estimators for $a$ and $b$.
(b) Find the mle $\hat a$ and $\hat b$.
(c) Let $\tau = \int x\,dF(x)$. Find the mle of $\tau$.
(d) Let $\hat\tau$ be the mle of $\tau$ from (c). Let $\tilde\tau$ be the nonparametric plug-in estimator of $\tau = \int x\,dF(x)$. Suppose that
$a = 1$, $b = 3$ and $n = 10$. Find the mse of $\hat\tau$ by simulation.
Find the mse of $\tilde\tau$ analytically. Compare.
3. Let $X_1,\ldots,X_n \sim N(\mu,\sigma^2)$. Let $\tau$ be the .95 percentile,
i.e. $P(X < \tau) = .95$.
(a) Find the mle of $\tau$.
(b) Find an expression for an approximate $1-\alpha$ confidence
interval for $\tau$.
(c) Suppose the data are:
3.23 -2.50 1.88 -0.68 4.43
1.03 -0.07 -0.01 0.76 1.76
0.33 -0.31 0.30 -0.61 1.52
1.54 2.28 0.42 2.33 -1.03
0.39 0.17 3.18 5.43 4.00
Find the mle $\hat\tau$. Find the standard error using the delta
method. Find the standard error using the parametric
bootstrap.
4. Let $X_1,\ldots,X_n \sim \mathrm{Uniform}(0,\theta)$. Show that the mle is
consistent. Hint: Let $Y = \max\{X_1,\ldots,X_n\}$. For any $c$,
$P(Y < c) = P(X_1 < c, X_2 < c, \ldots, X_n < c) = P(X_1 < c)P(X_2 < c)\cdots P(X_n < c)$.
5. Let $X_1,\ldots,X_n \sim \mathrm{Poisson}(\lambda)$. Find the method of moments estimator, the maximum likelihood estimator and
the Fisher information $I(\lambda)$.
6. Let $X_1,\ldots,X_n \sim N(\theta,1)$. Define
\[
Y_i = \begin{cases} 1 & \text{if } X_i > 0 \\ 0 & \text{if } X_i \le 0. \end{cases}
\]
Let $\psi = P(Y_1 = 1)$.
(a) Find the maximum likelihood estimate $\hat\psi$ of $\psi$.
(b) Find an approximate 95 per cent confidence interval
for $\psi$.
(c) Define $\tilde\psi = (1/n)\sum_i Y_i$. Show that $\tilde\psi$ is a consistent
estimator of $\psi$.
(d) Compute the asymptotic relative efficiency of $\tilde\psi$ to $\hat\psi$.
Hint: Use the delta method to get the standard error of the
mle. Then compute the standard error (i.e. the standard
deviation) of $\tilde\psi$.
(e) Suppose that the data are not really normal. Show that
$\hat\psi$ is not consistent. What, if anything, does $\hat\psi$ converge to?
7. (Comparing two treatments.) $n_1$ people are given treatment 1 and $n_2$ people are given treatment 2. Let $X_1$ be
the number of people on treatment 1 who respond favorably to the treatment and let $X_2$ be the number of
people on treatment 2 who respond favorably. Assume
that $X_1 \sim \mathrm{Binomial}(n_1,p_1)$ and $X_2 \sim \mathrm{Binomial}(n_2,p_2)$. Let
$\psi = p_1 - p_2$.
(a) Find the mle for $\psi$.
(b) Find the Fisher information matrix $I(p_1,p_2)$.
(c) Use the multiparameter delta method to find the asymptotic standard error of $\hat\psi$.
(d) Suppose that $n_1 = n_2 = 200$, $X_1 = 160$ and $X_2 = 148$.
Find $\hat\psi$. Find an approximate 90 percent confidence interval for $\psi$ using (i) the delta method and (ii) the parametric
bootstrap.
8. Find the Fisher information matrix for Example 10.29.
9. Let $X_1,\ldots,X_n \sim \mathrm{Normal}(\mu,1)$. Let $\theta = e^{\mu}$ and let $\hat\theta = e^{\overline{X}}$
be the mle. Create a data set (using $\mu = 5$) consisting of
$n = 100$ observations.
(a) Use the delta method to get $\widehat{\mathrm{se}}$ and a 95 percent confidence interval for $\theta$. Use the parametric bootstrap to get
$\widehat{\mathrm{se}}$ and a 95 percent confidence interval for $\theta$. Use the nonparametric bootstrap to get $\widehat{\mathrm{se}}$ and a 95 percent confidence
interval for $\theta$. Compare your answers.
(b) Plot a histogram of the bootstrap replications for the
parametric and nonparametric bootstraps. These are estimates of the distribution of $\hat\theta$. The delta method also gives
an approximation to this distribution, namely, $\mathrm{Normal}(\hat\theta, \widehat{\mathrm{se}}^2)$.
Compare these to the true sampling distribution of $\hat\theta$ (which
you can get by simulation). Which approximation, parametric bootstrap, bootstrap, or delta method, is closer to
the true distribution?
10. Let $X_1,\ldots,X_n \sim \mathrm{Unif}(0,\theta)$. The mle is $\hat\theta = X_{(n)} = \max\{X_1,\ldots,X_n\}$.
Generate a data set of size 50 with $\theta = 1$.
(a) Find the distribution of $\hat\theta$ analytically. Compare the
true distribution of $\hat\theta$ to the histograms from the parametric and nonparametric bootstraps.
(b) This is a case where the nonparametric bootstrap
does very poorly. Show that, for the parametric bootstrap
$P(\hat\theta^* = \hat\theta) = 0$ but for the nonparametric bootstrap $P(\hat\theta^* = \hat\theta) \approx .632$. Hint: show that $P(\hat\theta^* = \hat\theta) = 1 - (1 - (1/n))^n$,
then take the limit as $n$ gets large. What is the implication
of this?
11
Hypothesis Testing and p-values
Suppose we want to know if a certain chemical causes cancer.
We take some rats and randomly divide them into two groups.
We expose one group to the chemical and then we compare
the cancer rate in the two groups. Consider the following two
hypotheses:
The Null Hypothesis: The cancer rate is the same
in the two groups
The Alternative Hypothesis: The cancer rate is
not the same in the two groups.
If the exposed group has a much higher rate of cancer than the
unexposed group then we will reject the null hypothesis and
conclude that the evidence favors the alternative hypothesis;
in other words we will conclude that there is evidence that the
chemical causes cancer. This is an example of hypothesis testing.
More formally, suppose that we partition the parameter space
$\Theta$ into two disjoint sets $\Theta_0$ and $\Theta_1$ and that we wish to test
\[
H_0 : \theta \in \Theta_0 \quad\text{versus}\quad H_1 : \theta \in \Theta_1. \tag{11.1}
\]
We call H0 the null hypothesis and H1 the alternative hypothesis.
Let $X$ be a random variable and let $\mathcal{X}$ be the range of $X$. We
test a hypothesis by finding an appropriate subset of outcomes
$R \subset \mathcal{X}$ called the rejection region. If $X \in R$ we reject the
null hypothesis, otherwise, we do not reject the null hypothesis:
$X \in R \implies$ reject $H_0$
$X \notin R \implies$ retain (do not reject) $H_0$
Usually the rejection region $R$ is of the form
\[
R = \left\{ x : T(x) > c \right\}
\]
where $T$ is a test statistic and $c$ is a critical value. The problem in hypothesis testing is to find an appropriate test statistic
$T$ and an appropriate cutoff value $c$.
Warning! There is a tendency to use hypothesis testing methods even when they are not appropriate. Often, estimation and
confidence intervals are better tools. Use hypothesis testing only
when you want to test a well defined hypothesis.
Hypothesis testing is like a legal trial. We assume someone is
innocent unless the evidence strongly suggests that he is guilty.
Similarly, we retain $H_0$ unless there is strong evidence to reject
$H_0$. There are two types of errors we can make. Rejecting $H_0$
when $H_0$ is true is called a type I error. Retaining $H_0$ when
$H_1$ is true is called a type II error. The possible outcomes for
hypothesis testing are summarized in the table below:
           Retain Null      Reject Null
H0 true    ✓                type I error
H1 true    type II error    ✓

Summary of outcomes of hypothesis testing.
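Definition 11.1 The power function of a test with rejection region $R$ is defined by
\[
\beta(\theta) = P_\theta(X \in R). \tag{11.2}
\]
The size of a test is defined to be
\[
\alpha = \sup_{\theta\in\Theta_0} \beta(\theta). \tag{11.3}
\]
A test is said to have level $\alpha$ if its size is less than or equal to
$\alpha$.
A hypothesis of the form $\theta = \theta_0$ is called a simple hypothesis. A hypothesis of the form $\theta > \theta_0$ or $\theta < \theta_0$ is called a
composite hypothesis. A test of the form
\[
H_0 : \theta = \theta_0 \quad\text{versus}\quad H_1 : \theta \ne \theta_0
\]
is called a two-sided test. A test of the form
\[
H_0 : \theta \le \theta_0 \quad\text{versus}\quad H_1 : \theta > \theta_0
\]
or
\[
H_0 : \theta \ge \theta_0 \quad\text{versus}\quad H_1 : \theta < \theta_0
\]
is called a one-sided test. The most common tests are two-sided.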
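Example 11.2 Let $X_1,\ldots,X_n \sim N(\mu,\sigma^2)$ where $\sigma$ is known. We
want to test $H_0 : \mu \le 0$ versus $H_1 : \mu > 0$. Hence, $\Theta_0 = (-\infty, 0]$
and $\Theta_1 = (0,\infty)$. Consider the test:
reject $H_0$ if $T > c$
where $T = \overline{X}$. The rejection region is $R = \{ x^n : T(x^n) > c \}$.
Let $Z$ denote a standard normal random variable. The power
function is
\begin{align*}
\beta(\mu) &= P_\mu\left( \overline{X} > c \right) \\
&= P_\mu\left( \frac{\sqrt{n}(\overline{X} - \mu)}{\sigma} > \frac{\sqrt{n}(c - \mu)}{\sigma} \right) \\
&= P\left( Z > \frac{\sqrt{n}(c - \mu)}{\sigma} \right) \\
&= 1 - \Phi\left( \frac{\sqrt{n}(c - \mu)}{\sigma} \right).
\end{align*}
This function is increasing in $\mu$. Hence
\[
\text{size} = \sup_{\mu \le 0} \beta(\mu) = \beta(0) = 1 - \Phi\left( \frac{\sqrt{n}\,c}{\sigma} \right).
\]
To get a size $\alpha$ test, set this equal to $\alpha$ and solve for $c$ to get
\[
c = \frac{\sigma\,\Phi^{-1}(1-\alpha)}{\sqrt{n}}.
\]
So we reject when $\overline{X} > \sigma\,\Phi^{-1}(1-\alpha)/\sqrt{n}$. Equivalently, we reject
when
\[
\frac{\sqrt{n}\,(\overline{X} - 0)}{\sigma} > z_\alpha.
\]
Finding most powerful tests is hard and, in many cases, most
powerful tests don't even exist. Instead of searching for most
powerful tests, we'll just consider three widely used tests: the
Wald test, the $\chi^2$ test and the permutation test. A fourth test,
the likelihood ratio test, is discussed in the appendix.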
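The power function of the one-sided Normal test in the example above can be evaluated numerically. The sketch below assumes $\sigma = 1$, $n = 25$ and $\alpha = 0.05$ purely for illustration; `critical_value` and `power` are hypothetical helper names.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()   # standard Normal: Z.cdf is Phi, Z.inv_cdf is Phi^{-1}

def critical_value(alpha, sigma, n):
    """Cutoff c that gives the test 'reject if sample mean > c' size alpha."""
    return sigma * Z.inv_cdf(1 - alpha) / sqrt(n)

def power(mu, c, sigma, n):
    """beta(mu) = 1 - Phi(sqrt(n) (c - mu) / sigma)."""
    return 1 - Z.cdf(sqrt(n) * (c - mu) / sigma)

alpha, sigma, n = 0.05, 1.0, 25     # illustrative values
c = critical_value(alpha, sigma, n)
for mu in (-0.2, 0.0, 0.2, 0.5):
    print(mu, power(mu, c, sigma, n))
```

The printed values increase with $\mu$ and equal $\alpha$ at the boundary $\mu = 0$, which is why the size of the test is attained there.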
11.1 The Wald Test
Let $\theta$ be a scalar parameter, let $\hat\theta$ be an estimate of $\theta$ and let
$\widehat{\mathrm{se}}$ be the estimated standard error of $\hat\theta$.
(The test is named after Abraham Wald (1902-1950), who was a very influential mathematical statistician.)
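Definition 11.3 The Wald Test
Consider testing
\[
H_0 : \theta = \theta_0 \quad\text{versus}\quad H_1 : \theta \ne \theta_0.
\]
Assume that $\hat\theta$ is asymptotically Normal:
\[
\frac{\hat\theta - \theta_0}{\widehat{\mathrm{se}}} \rightsquigarrow N(0,1).
\]
The size $\alpha$ Wald test is: reject $H_0$ when $|W| > z_{\alpha/2}$
where
\[
W = \frac{\hat\theta - \theta_0}{\widehat{\mathrm{se}}}.
\]
Theorem 11.4 Asymptotically, the Wald test has size $\alpha$, that is,
\[
P_{\theta_0}\left( |W| > z_{\alpha/2} \right) \to \alpha
\]
as $n \to \infty$.
Proof. Under $\theta = \theta_0$, $(\hat\theta - \theta_0)/\widehat{\mathrm{se}} \rightsquigarrow N(0,1)$. Hence, the
probability of rejecting when the null $\theta = \theta_0$ is true is
\begin{align*}
P_{\theta_0}\left( |W| > z_{\alpha/2} \right)
&= P_{\theta_0}\left( \frac{|\hat\theta - \theta_0|}{\widehat{\mathrm{se}}} > z_{\alpha/2} \right) \\
&\to P\left( |N(0,1)| > z_{\alpha/2} \right) \\
&= \alpha.
\end{align*}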
Remark 11.5 Most texts define the Wald test slightly differently.
They use the standard error computed at $\theta = \theta_0$ rather than at
the estimated value $\hat\theta$. Both versions are valid.
Let us consider the power of the Wald test when the null
hypothesis is false.
Theorem 11.6 Suppose the true value of $\theta$ is $\theta_\star \ne \theta_0$. The power
$\beta(\theta_\star)$, the probability of correctly rejecting the null hypothesis,
is given (approximately) by
\[
1 - \Phi\left( \frac{\theta_0 - \theta_\star}{\widehat{\mathrm{se}}} + z_{\alpha/2} \right) + \Phi\left( \frac{\theta_0 - \theta_\star}{\widehat{\mathrm{se}}} - z_{\alpha/2} \right). \tag{11.4}
\]
Recall that $\widehat{\mathrm{se}}$ tends to 0 as the sample size increases. Inspecting (11.4) closely we note that: (i) the power is large if $\theta_\star$ is far
from $\theta_0$ and (ii) the power is large if the sample size is large.
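Formula (11.4) is simple to evaluate, which makes the two remarks above easy to see numerically. This is a minimal sketch; `wald_power` is a hypothetical helper and the parameter values in the loop are arbitrary.

```python
from statistics import NormalDist

Z = NormalDist()

def wald_power(theta_star, theta0, se, alpha=0.05):
    """Approximate power of the size-alpha Wald test, equation (11.4)."""
    z = Z.inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    shift = (theta0 - theta_star) / se
    return 1 - Z.cdf(shift + z) + Z.cdf(shift - z)

# Power grows as theta_star moves away from theta0 = 0 ...
for theta_star in (0.0, 0.5, 1.0, 2.0):
    print(theta_star, wald_power(theta_star, 0.0, se=0.5))
# ... and as the standard error shrinks (larger samples)
for se in (1.0, 0.5, 0.25):
    print(se, wald_power(1.0, 0.0, se=se))
```

At $\theta_\star = \theta_0$ the expression reduces to $\alpha$, consistent with Theorem 11.4.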
Example 11.7 (Comparing Two Prediction Algorithms) We test a
prediction algorithm on a test set of size $m$ and we test a second
prediction algorithm on a second test set of size $n$. Let $X$ be
the number of incorrect predictions for algorithm 1 and let $Y$
be the number of incorrect predictions for algorithm 2. Then
$X \sim \mathrm{Binomial}(m,p_1)$ and $Y \sim \mathrm{Binomial}(n,p_2)$. To test the null
hypothesis that $p_1 = p_2$ write
\[
H_0 : \delta = 0 \quad\text{versus}\quad H_1 : \delta \ne 0
\]
where $\delta = p_1 - p_2$. The mle is $\hat\delta = \hat p_1 - \hat p_2$ with estimated
standard error
\[
\widehat{\mathrm{se}} = \sqrt{ \frac{\hat p_1(1-\hat p_1)}{m} + \frac{\hat p_2(1-\hat p_2)}{n} }.
\]
The size $\alpha$ Wald test is to reject $H_0$ when $|W| > z_{\alpha/2}$ where
\[
W = \frac{\hat\delta - 0}{\widehat{\mathrm{se}}} = \frac{\hat p_1 - \hat p_2}{\sqrt{ \frac{\hat p_1(1-\hat p_1)}{m} + \frac{\hat p_2(1-\hat p_2)}{n} }}.
\]
The power of this test will be largest when $p_1$ is far from $p_2$ and
when the sample sizes are large.
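The test in Example 11.7 can be sketched in a few lines. The error counts below (30 errors out of 100 for algorithm 1, 20 out of 100 for algorithm 2) are invented for illustration, and `two_proportion_wald` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_wald(x, m, y, n, alpha=0.05):
    """Wald statistic and decision for H0: p1 = p2 with independent samples."""
    p1, p2 = x / m, y / n                                  # mles of p1 and p2
    se = sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)       # estimated se of p1 - p2
    w = (p1 - p2) / se
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return w, abs(w) > z

# Hypothetical counts of incorrect predictions
w, reject = two_proportion_wald(30, 100, 20, 100)
print(w, reject)
```

With these made-up counts the statistic falls short of $z_{.025} \approx 1.96$, so the test retains $H_0$ at the 5 per cent level even though the observed error rates differ by ten points.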
What if we used the same test set to test both algorithms?
The two samples are no longer independent. Instead we use the
following strategy. Let $X_i = 1$ if algorithm 1 is correct on test
case $i$ and $X_i = 0$ otherwise. Let $Y_i = 1$ if algorithm 2 is correct
on test case $i$ and $Y_i = 0$ otherwise. A typical data set will look
something like this:

Test Case   X_i   Y_i   D_i = X_i - Y_i
1           1     0      1
2           1     1      0
3           1     1      0
4           0     1     -1
5           0     0      0
...         ...   ...   ...
n           0     1     -1

Let
\[
\delta = E(D_i) = E(X_i) - E(Y_i) = P(X_i = 1) - P(Y_i = 1).
\]
Then $\hat\delta = \overline{D} = n^{-1}\sum_{i=1}^n D_i$ and $\widehat{\mathrm{se}}(\hat\delta) = S/\sqrt{n}$ where $S^2 = n^{-1}\sum_{i=1}^n (D_i - \overline{D})^2$. To test $H_0 : \delta = 0$ versus $H_1 : \delta \ne 0$ we use
$W = \hat\delta/\widehat{\mathrm{se}}$ and reject $H_0$ if $|W| > z_{\alpha/2}$. This is called a paired
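comparison.
Example 11.8 (Comparing Two Means.) Let $X_1,\ldots,X_m$ and $Y_1,\ldots,Y_n$
be two independent samples from populations with means $\mu_1$ and
$\mu_2$, respectively. Let's test the null hypothesis that $\mu_1 = \mu_2$.
Write this as $H_0 : \delta = 0$ versus $H_1 : \delta \ne 0$ where $\delta = \mu_1 - \mu_2$.
Recall that the nonparametric plug-in estimate of $\delta$ is $\hat\delta = \overline{X} - \overline{Y}$
with estimated standard error
\[
\widehat{\mathrm{se}} = \sqrt{ \frac{s_1^2}{m} + \frac{s_2^2}{n} }
\]
where $s_1^2$ and $s_2^2$ are the sample variances. The size $\alpha$ Wald test
rejects $H_0$ when $|W| > z_{\alpha/2}$ where
\[
W = \frac{\hat\delta - 0}{\widehat{\mathrm{se}}} = \frac{\overline{X} - \overline{Y}}{\sqrt{ \frac{s_1^2}{m} + \frac{s_2^2}{n} }}.
\]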
Example 11.9 (Comparing Two Medians.) Consider the previous
example again but let us test whether the medians of the two
distributions are the same. Thus, $H_0 : \delta = 0$ versus $H_1 : \delta \ne 0$
where $\delta = \nu_1 - \nu_2$ where $\nu_1$ and $\nu_2$ are the medians. The
nonparametric plug-in estimate of $\delta$ is $\hat\delta = \hat\nu_1 - \hat\nu_2$ where $\hat\nu_1$ and
$\hat\nu_2$ are the sample medians. The estimated standard error $\widehat{\mathrm{se}}$ of
$\hat\delta$ can be obtained from the bootstrap. The Wald test statistic is
$W = \hat\delta/\widehat{\mathrm{se}}$.
There is a relationship between the Wald test and the $1-\alpha$
asymptotic confidence interval $\hat\theta \pm \widehat{\mathrm{se}}\, z_{\alpha/2}$.
Theorem 11.10 The size $\alpha$ Wald test rejects $H_0 : \theta = \theta_0$ versus
$H_1 : \theta \ne \theta_0$ if and only if $\theta_0 \notin C$ where
\[
C = \left( \hat\theta - \widehat{\mathrm{se}}\, z_{\alpha/2},\ \hat\theta + \widehat{\mathrm{se}}\, z_{\alpha/2} \right).
\]
Thus, testing the hypothesis is equivalent to checking whether
the null value is in the confidence interval.
11.2 p-values
Reporting "reject $H_0$" or "retain $H_0$" is not very informative.
Instead, we could ask, for every $\alpha$, whether the test rejects at
that level. Generally, if the test rejects at level $\alpha$ it will also
reject at level $\alpha' > \alpha$. Hence, there is a smallest $\alpha$ at which the
test rejects and we call this number the p-value.
Definition 11.11 Suppose that for every $\alpha \in (0,1)$ we have
a size $\alpha$ test with rejection region $R_\alpha$. Then,
\[
\text{p-value} = \inf\left\{ \alpha : T(X^n) \in R_\alpha \right\}.
\]
That is, the p-value is the smallest level at which we can
reject $H_0$.
Informally, the p-value is a measure of the evidence against
$H_0$: the smaller the p-value, the stronger the evidence against
$H_0$. Typically, researchers use the following evidence scale:

p-value      evidence
< .01        very strong evidence against H0
.01 - .05    strong evidence against H0
.05 - .10    weak evidence against H0
> .1         little or no evidence against H0

Warning! A large p-value is not strong evidence in favor of
$H_0$. A large p-value can occur for two reasons: (i) $H_0$ is true or
(ii) $H_0$ is false but the test has low power.
But do not confuse the p-value with $P(H_0 \mid \text{Data})$. The p-value is not the probability that the null hypothesis is
true. (We discuss quantities like $P(H_0 \mid \text{Data})$ in the chapter on Bayesian inference.)
The following result explains how to compute the p-value.
Theorem 11.12 Suppose that the size $\alpha$ test is of the form
reject $H_0$ if and only if $T(X^n) \ge c_\alpha$.
Then,
\[
\text{p-value} = \sup_{\theta\in\Theta_0} P_\theta\left( T(X^n) \ge T(x^n) \right).
\]
In words, the p-value is the probability (under $H_0$) of observing
a value of the test statistic as or more extreme than what was
actually observed.
For a Wald test, $W$ has an approximate $N(0,1)$ distribution
under $H_0$. Hence, the p-value is
\[
\text{p-value} \approx P\left( |Z| > |w| \right) = 2P\left( Z < -|w| \right) = 2\Phi\left( -|w| \right) \tag{11.5}
\]
where $Z \sim N(0,1)$ and $w = (\hat\theta - \theta_0)/\widehat{\mathrm{se}}$ is the observed value of
the test statistic.
Here is an important property of p-values.
Theorem 11.13 If the test statistic has a continuous distribution, then under $H_0 : \theta = \theta_0$, the p-value has a Uniform (0,1)
distribution.
If the p-value is less than .05 then people often report that
"the result is statistically significant at the 5 per cent level."
This just means that the null would be rejected if we used a size
$\alpha = 0.05$ test. In general, the size $\alpha$ test rejects if and only if
p-value $< \alpha$.
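Equation (11.5) translates directly into code. The helper below, `wald_p_value`, is a hypothetical name; the two inputs 3.78 and 2.4 are the Wald statistics reported in the cholesterol example that follows, used here only to show the rounding.

```python
from statistics import NormalDist

def wald_p_value(w):
    """Two-sided p-value 2 * Phi(-|w|) for an observed Wald statistic w."""
    return 2 * NormalDist().cdf(-abs(w))

p_means = wald_p_value(3.78)     # statistic for the difference in means
p_medians = wald_p_value(2.4)    # statistic for the difference in medians
print(round(p_means, 4), round(p_medians, 2))
```

Because the test rejects at level $\alpha$ exactly when the p-value is below $\alpha$, comparing the returned value with .05 reproduces the usual "significant at the 5 per cent level" statement.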
Example 11.14 Recall the cholesterol data from Example 8.11. To test if the means are different we compute
$$W = \frac{\hat\delta - 0}{\widehat{\mathrm{se}}} = \frac{\bar X - \bar Y}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}} = \frac{216.2 - 195.3}{\sqrt{5^2 + 2.4^2}} = 3.78.$$
To compute the p-value, let $Z \sim N(0,1)$ denote a standard Normal random variable. Then,
$$\text{p-value} = P(|Z| > 3.78) = 2P(Z < -3.78) = .0002$$
which is very strong evidence against the null hypothesis. To test if the medians are different, let $\hat\nu_1$ and $\hat\nu_2$ denote the sample medians. Then,
$$W = \frac{\hat\nu_1 - \hat\nu_2}{\widehat{\mathrm{se}}} = \frac{212.5 - 194}{7.7} = 2.4$$
where the standard error 7.7 was found using the bootstrap. The p-value is
$$\text{p-value} = P(|Z| > 2.4) = 2P(Z < -2.4) = .02$$
which is strong evidence against the null hypothesis.
Warning! A result might be statistically significant and yet the size of the effect might be small. In such a case we have a result that is statistically significant but not practically significant. It is wise to report a confidence interval as well.
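The two Wald p-values in Example 11.14 can be checked numerically. The following is a minimal sketch using only the numbers quoted in the example; the function name `wald_p_value` is an illustrative choice, and the identity $2\Phi(-|w|) = \mathrm{erfc}(|w|/\sqrt{2})$ is used to avoid needing a statistics library.

```python
import math


def wald_p_value(estimate, null_value, se):
    """Return (w, p) where w = (estimate - null_value)/se and p = 2*Phi(-|w|)."""
    w = (estimate - null_value) / se
    # For Z ~ N(0,1): 2*Phi(-|w|) equals erfc(|w| / sqrt(2)).
    p = math.erfc(abs(w) / math.sqrt(2.0))
    return w, p


# Means: difference 216.2 - 195.3, standard errors 5 and 2.4 (from the example).
se_means = math.sqrt(5.0 ** 2 + 2.4 ** 2)
w_mean, p_mean = wald_p_value(216.2 - 195.3, 0.0, se_means)

# Medians: difference 212.5 - 194, bootstrap standard error 7.7 (from the example).
w_med, p_med = wald_p_value(212.5 - 194.0, 0.0, 7.7)

print(f"means:   W = {w_mean:.2f}, p-value = {p_mean:.4f}")
print(f"medians: W = {w_med:.2f}, p-value = {p_med:.3f}")
```

Computed from the rounded inputs above, the statistics may differ from the quoted values in the last digit; the conclusions are unchanged.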
11.3 The $\chi^2$ Distribution
Let $Z_1,\ldots,Z_k$ be independent, standard Normals. Let $V = \sum_{i=1}^k Z_i^2$. Then we say that $V$ has a $\chi^2$ distribution with $k$ degrees of freedom, written $V \sim \chi^2_k$. The probability density of $V$ is
$$f(v) = \frac{v^{(k/2)-1} e^{-v/2}}{2^{k/2}\,\Gamma(k/2)}$$
for $v > 0$. It can be shown that $E(V) = k$ and $\mathbb{V}(V) = 2k$. We define the upper $\alpha$ quantile $\chi^2_{k,\alpha} = F^{-1}(1-\alpha)$ where $F$ is the cdf. That is, $P(\chi^2_k > \chi^2_{k,\alpha}) = \alpha$.
11.4 Pearson's $\chi^2$ Test For Multinomial Data
Pearson's $\chi^2$ test is used for multinomial data. Recall that $X = (X_1,\ldots,X_k)$ has a multinomial distribution if
$$f(x_1,\ldots,x_k; p) = \binom{n}{x_1 \ldots x_k} p_1^{x_1} \cdots p_k^{x_k}$$
where
$$\binom{n}{x_1 \ldots x_k} = \frac{n!}{x_1! \cdots x_k!}.$$
The mle of $p$ is $\hat p = (\hat p_1,\ldots,\hat p_k) = (X_1/n,\ldots,X_k/n)$.
Let $(p_{01},\ldots,p_{0k})$ be some fixed set of probabilities and suppose we want to test
$$H_0 : (p_1,\ldots,p_k) = (p_{01},\ldots,p_{0k}) \quad\text{versus}\quad H_1 : (p_1,\ldots,p_k) \ne (p_{01},\ldots,p_{0k}).$$
Pearson's $\chi^2$ statistic is
$$T = \sum_{j=1}^k \frac{(X_j - np_{0j})^2}{np_{0j}} = \sum_{j=1}^k \frac{(O_j - E_j)^2}{E_j}$$
where $O_j = X_j$ is the observed data and $E_j = E(X_j) = np_{0j}$ is the expected value of $X_j$ under $H_0$.
Theorem 11.15 Under $H_0$, $T \rightsquigarrow \chi^2_{k-1}$. Hence the test: reject $H_0$ if $T > \chi^2_{k-1,\alpha}$ has asymptotic level $\alpha$. The p-value is $P(\chi^2_{k-1} > t)$ where $t$ is the observed value of the test statistic.
Example 11.16 (Mendel's peas.) Mendel bred peas with round yellow seeds and wrinkled green seeds. There are four types of progeny: round yellow, wrinkled yellow, round green and wrinkled green. The number of each type is multinomial with probability $(p_1, p_2, p_3, p_4)$. His theory of inheritance predicts that
$$p = \left(\frac{9}{16}, \frac{3}{16}, \frac{3}{16}, \frac{1}{16}\right) \equiv p_0.$$
In $n = 556$ trials he observed $X = (315, 101, 108, 32)$. Since $np_{01} = 312.75$, $np_{02} = np_{03} = 104.25$ and $np_{04} = 34.75$, the test statistic is
$$\chi^2 = \frac{(315 - 312.75)^2}{312.75} + \frac{(101 - 104.25)^2}{104.25} + \frac{(108 - 104.25)^2}{104.25} + \frac{(32 - 34.75)^2}{34.75} = 0.47.$$
The $\alpha = .05$ value for a $\chi^2_3$ is 7.815. Since 0.47 is not larger than 7.815 we do not reject the null. The p-value is
$$\text{p-value} = P(\chi^2_3 > .47) = .93$$
which is not evidence against $H_0$. Hence, the data do not contradict Mendel's theory. Interestingly, there is some controversy about whether Mendel's results are "too good."
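Pearson's statistic for Mendel's data can be recomputed in a few lines. The sketch below is illustrative only: the helper names are assumptions, and the tail probability uses a closed form that holds specifically for 3 degrees of freedom, $P(\chi^2_3 > t) = \mathrm{erfc}(\sqrt{t/2}) + \sqrt{2t/\pi}\,e^{-t/2}$, rather than a general $\chi^2$ routine.

```python
import math


def pearson_statistic(observed, null_probs):
    """Pearson's statistic: sum over cells of (O_j - E_j)^2 / E_j with E_j = n * p0_j."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, null_probs))


def chi2_sf_3df(t):
    """P(chi^2_3 > t); closed form valid only for 3 degrees of freedom."""
    return math.erfc(math.sqrt(t / 2.0)) + math.sqrt(2.0 * t / math.pi) * math.exp(-t / 2.0)


observed = [315, 101, 108, 32]            # Mendel's counts
null_probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]  # probabilities predicted by his theory

t = pearson_statistic(observed, null_probs)
p_value = chi2_sf_3df(t)                  # k - 1 = 3 degrees of freedom

print(f"T = {t:.2f}, p-value = {p_value:.2f}")
```

A large p-value here reflects a very close fit between the observed counts and the theoretical proportions.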
11.5 The Permutation Test
The permutation test is a nonparametric method for testing whether two distributions are the same. This test is "exact" meaning that it is not based on large sample theory approximations. Suppose that $X_1,\ldots,X_m \sim F_X$ and $Y_1,\ldots,Y_n \sim F_Y$ are two independent samples and $H_0$ is the hypothesis that the two samples are identically distributed. This is the type of hypothesis we would consider when testing whether a treatment differs from a placebo. More precisely we are testing
$$H_0 : F_X = F_Y \quad\text{versus}\quad H_1 : F_X \ne F_Y.$$
Let $T(x_1,\ldots,x_m,y_1,\ldots,y_n)$ be some test statistic, for example,
$$T(X_1,\ldots,X_m,Y_1,\ldots,Y_n) = |\bar X_m - \bar Y_n|.$$
Let $N = m + n$ and consider forming all $N!$ permutations of the data $X_1,\ldots,X_m,Y_1,\ldots,Y_n$. For each permutation, compute the test statistic $T$. Denote these values by $T_1,\ldots,T_{N!}$. Under the null hypothesis, each of these values is equally likely. The distribution $P_0$ that puts mass $1/N!$ on each $T_j$ is called the permutation distribution of $T$. Let $t_{obs}$ be the observed value of the test statistic. Assuming we reject when $T$ is large, the p-value is
$$\text{p-value} = P_0(T > t_{obs}) = \frac{1}{N!} \sum_{j=1}^{N!} I(T_j > t_{obs}).$$
Note: Under the null hypothesis, given the ordered data values, $X_1,\ldots,X_m,Y_1,\ldots,Y_n$ is uniformly distributed over the $N!$ permutations of the data.
Example 11.17 Here is a toy example to make the idea clear. Suppose the data are: $(X_1, X_2, Y_1) = (1, 9, 3)$. Let $T(X_1, X_2, Y_1) = |\bar X - \bar Y| = 2$. The permutations are:
permutation   value of T   probability
(1,9,3)       2            1/6
(9,1,3)       2            1/6
(1,3,9)       7            1/6
(3,1,9)       7            1/6
(3,9,1)       5            1/6
(9,3,1)       5            1/6
The p-value is $P(T > 2) = 4/6$.
Usually, it is not practical to evaluate all $N!$ permutations. We can approximate the p-value by sampling randomly from the set of permutations. The fraction of times $T_j > t_{obs}$ among these samples approximates the p-value.
Algorithm for Permutation Test
1. Compute the observed value of the test statistic $t_{obs} = T(X_1,\ldots,X_m,Y_1,\ldots,Y_n)$.
2. Randomly permute the data. Compute the statistic again using the permuted data.
3. Repeat the previous step $B$ times and let $T_1,\ldots,T_B$ denote the resulting values.
4. The approximate p-value is
$$\frac{1}{B} \sum_{j=1}^{B} I(T_j > t_{obs}).$$
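The algorithm above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names, the default `B`, and the fixed seed are assumptions made here. The exact version enumerates all $N!$ orderings and is only feasible for tiny data sets such as the toy example (1, 9, 3).

```python
import itertools
import random


def mean_diff(xs, ys):
    """Test statistic |mean(xs) - mean(ys)|."""
    return abs(sum(xs) / len(xs) - sum(ys) / len(ys))


def exact_permutation_p_value(xs, ys, stat=mean_diff):
    """Enumerate all N! permutations of the pooled data (tiny N only)."""
    pooled = list(xs) + list(ys)
    m = len(xs)
    t_obs = stat(xs, ys)
    values = [stat(perm[:m], perm[m:]) for perm in itertools.permutations(pooled)]
    return sum(t > t_obs for t in values) / len(values)


def approx_permutation_p_value(xs, ys, stat=mean_diff, B=10000, seed=0):
    """Steps 1-4: shuffle the pooled data B times and count how often T_j > t_obs."""
    rng = random.Random(seed)
    pooled = list(xs) + list(ys)
    m = len(xs)
    t_obs = stat(xs, ys)
    count = 0
    for _ in range(B):
        rng.shuffle(pooled)
        if stat(pooled[:m], pooled[m:]) > t_obs:
            count += 1
    return count / B


p_exact = exact_permutation_p_value([1, 9], [3])
p_approx = approx_permutation_p_value([1, 9], [3])
print(f"exact p-value = {p_exact:.4f}, simulated p-value = {p_approx:.4f}")
```

The strict inequality follows the formula in the text; with it, the toy example gives exactly 4/6, and the simulated value should be close to that.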
Example 11.18 DNA microarrays allow researchers to measure the expression levels of thousands of genes. The data are the levels of messenger RNA (mRNA) of each gene, which is thought to provide a measure of how much protein that gene produces. Roughly, the larger the number, the more active the gene. The table below, reproduced from Efron et al. (JASA, 2001, p. 1160) shows the expression levels for genes from two types of liver cancer cells. There are 2,638 genes in this experiment but here we show just the first two. The data are log-ratios of the intensity levels of two different color dyes used on the arrays.
           Type I                                 Type II
Patient    1      2      3      4      5         6      7      8      9      10
Gene 1     230    -1350  -1580  -400   -760      970    110    -50    -190   -200
Gene 2     470    -850   -.8    -280   120       390    -1730  -1360  -.8    -330
...        ...    ...    ...    ...    ...       ...    ...    ...    ...    ...
Let's test whether the median level of gene 1 is different between the two groups. Let $\nu_1$ denote the median level of gene 1 of Type I and let $\nu_2$ denote the median level of gene 1 of Type II. The absolute difference of sample medians is $T = |\hat\nu_1 - \hat\nu_2| = 710$. Now we estimate the permutation distribution by simulation and we find that the estimated p-value is .045. Thus, if we use a $\alpha = .05$ level of significance, we would say that there is evidence to reject the null hypothesis of no difference.
In large samples, the permutation test usually gives similar results to a test that is based on large sample theory. The permutation test is thus most useful for small samples.
11.6 Multiple Testing
In some situations we may conduct many hypothesis tests. In Example 11.18, there were actually 2,638 genes. If we tested for a difference for each gene, we would be conducting 2,638 separate hypothesis tests. Suppose each test is conducted at level $\alpha$. For any one test, the chance of a false rejection of the null is $\alpha$. But the chance of at least one false rejection is much higher.
This is the multiple testing problem. The problem comes up
in many data mining situations where one may end up testing
thousands or even millions of hypotheses. There are many ways
to deal with this problem. Here we discuss two methods.
Consider $m$ hypothesis tests:
$$H_{0i} \quad\text{versus}\quad H_{1i}, \qquad i = 1,\ldots,m$$
and let $P_1,\ldots,P_m$ denote $m$ p-values for these tests.
The Bonferroni Method
Given p-values $P_1,\ldots,P_m$, reject null hypothesis $H_{0i}$ if $P_i < \alpha/m$.
Theorem 11.19 Using the Bonferroni method, the probability of falsely rejecting any null hypotheses is less than or equal to $\alpha$.
Proof. Let $R$ be the event that at least one null hypothesis is falsely rejected. Let $R_i$ be the event that the $i$th null hypothesis is falsely rejected. Recall that if $A_1,\ldots,A_k$ are events then $P\left(\bigcup_{i=1}^k A_i\right) \le \sum_{i=1}^k P(A_i)$. Hence,
$$P(R) = P\left(\bigcup_{i=1}^m R_i\right) \le \sum_{i=1}^m P(R_i) = \sum_{i=1}^m \frac{\alpha}{m} = \alpha$$
from Theorem 11.13.
Example 11.20 In the gene example, using $\alpha = .05$, we have that $.05/2638 = .00001895375$. Hence, for any gene with p-value less than .00001895375, we declare that there is a significant difference.
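The Bonferroni rule is a one-line threshold. The sketch below recomputes the threshold from Example 11.20 and, as an added illustration, the chance of at least one false rejection when no correction is made; that second calculation assumes independent tests with every null true, an assumption not stated in the text. The function names and the three p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0i whenever P_i < alpha / m."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


def prob_any_false_rejection(m, alpha=0.05):
    """P(at least one false rejection) for m independent level-alpha tests, all nulls true."""
    return 1.0 - (1.0 - alpha) ** m


gene_threshold = 0.05 / 2638                     # threshold from Example 11.20
decisions = bonferroni_reject([0.00001, 0.0004, 0.03])   # hypothetical p-values, m = 3

print(f"Bonferroni threshold for 2638 tests: {gene_threshold:.11f}")
print(f"P(any false rejection), 20 uncorrected tests: {prob_any_false_rejection(20):.3f}")
print(decisions)
```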
The Bonferroni method is very conservative because it is trying to make it unlikely that you would make even one false rejection. Sometimes, a more reasonable idea is to control the false discovery rate (FDR) which is defined as the mean of the number of false rejections divided by the number of rejections.
Suppose we reject all null hypotheses whose p-values fall below some threshold. Let $m_0$ be the number of null hypotheses that are true and let $m_1 = m - m_0$ be the number of null hypotheses that are false. The tests can be categorized in a $2 \times 2$ table as follows.
             H0 Not Rejected   H0 Rejected   Total
H0 True      U                 V             m0
H0 False     T                 S             m1
Total        m - R             R             m
Define the False Discovery Proportion (FDP)
$$\text{FDP} = \begin{cases} V/R & \text{if } R > 0 \\ 0 & \text{if } R = 0. \end{cases}$$
The FDP is the proportion of rejections that are incorrect. Next define FDR $= E(\text{FDP})$.
The Benjamini-Hochberg (BH) Method
1. Let $P_{(1)} < \cdots < P_{(m)}$ denote the ordered p-values.
2. Define
$$\ell_i = \frac{i\alpha}{C_m m}, \quad\text{and}\quad R = \max\bigl\{ i : P_{(i)} < \ell_i \bigr\} \qquad (11.6)$$
where $C_m$ is defined to be 1 if the p-values are independent and $C_m = \sum_{i=1}^m (1/i)$ otherwise.
3. Let $t = P_{(R)}$; we call $t$ the BH rejection threshold.
4. Reject all null hypotheses $H_{0i}$ for which $P_i \le t$.
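The four steps of the BH method translate directly into code. The following is a sketch under stated assumptions: the function name, the convention of returning `None` when no $i$ satisfies $P_{(i)} < \ell_i$, and the ten p-values are all illustrative and not taken from the text.

```python
def benjamini_hochberg(p_values, alpha=0.05, independent=True):
    """Return (t, decisions): the BH rejection threshold and a reject flag per hypothesis."""
    m = len(p_values)
    # C_m = 1 for independent p-values, otherwise the harmonic sum 1 + 1/2 + ... + 1/m.
    c_m = 1.0 if independent else sum(1.0 / i for i in range(1, m + 1))
    ordered = sorted(p_values)
    r = 0
    for i, p in enumerate(ordered, start=1):
        if p < i * alpha / (c_m * m):   # P_(i) < l_i
            r = i                        # keep the largest such i
    if r == 0:
        return None, [False] * m         # assumed convention: no rejections
    t = ordered[r - 1]                   # BH rejection threshold t = P_(R)
    return t, [p <= t for p in p_values]


# Hypothetical p-values for illustration.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
t_bh, reject_bh = benjamini_hochberg(pvals, alpha=0.05)
reject_bonf = [p < 0.05 / len(pvals) for p in pvals]

print(f"BH threshold = {t_bh}, BH rejections = {sum(reject_bh)}, "
      f"Bonferroni rejections = {sum(reject_bonf)}")
```

On these made-up values BH rejects two hypotheses while Bonferroni rejects one, which illustrates why BH is described as less conservative.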
Theorem 11.21 (Benjamini and Hochberg) If the procedure above is applied, then regardless of how many nulls are true and regardless of the distribution of the p-values when the null hypothesis is false,
$$\text{FDR} = E(\text{FDP}) \le \frac{m_0}{m}\alpha \le \alpha.$$
Example 11.22 Figure 11.1 shows 7 ordered p-values plotted as vertical lines. If we tested at level $\alpha$ without doing any correction for multiple testing, we would reject all hypotheses whose p-values are less than $\alpha$. In this case, the 5 hypotheses corresponding to the 5 smallest p-values are rejected. The Bonferroni method rejects all hypotheses whose p-values are less than $\alpha/m$. In this example, this leads to no rejections. The BH threshold corresponds to the last p-value that falls under the line with slope $\alpha$. This leads to three hypotheses being rejected in this case.
Summary. The Bonferroni method controls the probability of a single false rejection. This is very strict and leads to low power when there are many tests. The FDR method controls the fraction of false discoveries which is a more reasonable criterion when there are many tests.
11.7.1 The Neyman-Pearson Lemma
In the special case of a simple null $H_0 : \theta = \theta_0$ and a simple alternative $H_1 : \theta = \theta_1$ we can say precisely what the most powerful test is.
Theorem 11.23 (Neyman-Pearson.) Suppose we test $H_0 : \theta = \theta_0$ versus $H_1 : \theta = \theta_1$. Let
$$T = \frac{L(\theta_1)}{L(\theta_0)} = \frac{\prod_{i=1}^n f(x_i; \theta_1)}{\prod_{i=1}^n f(x_i; \theta_0)}.$$
Suppose we reject $H_0$ when $T > k$. If we choose $k$ so that $P_{\theta_0}(T > k) = \alpha$ then this test is the most powerful, size $\alpha$ test. That is, among all tests with size $\alpha$, this test maximizes the power $\beta(\theta_1)$.
FIGURE 11.1. Schematic illustration of Benjamini-Hochberg procedure. All hypotheses corresponding to the last undercrossing are rejected.
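11.7.2 Power of the Wald Test
Proof of Theorem 11.6. Let $Z \sim N(0,1)$. Then,
$$\begin{aligned}
\text{Power} = \beta(\theta_\star) &= P_{\theta_\star}(\text{Reject } H_0) \\
&= P_{\theta_\star}(|W| > z_{\alpha/2}) \\
&= P_{\theta_\star}\left(\frac{|\hat\theta - \theta_0|}{\widehat{\mathrm{se}}} > z_{\alpha/2}\right) \\
&= P_{\theta_\star}\left(\frac{\hat\theta - \theta_0}{\widehat{\mathrm{se}}} > z_{\alpha/2}\right) + P_{\theta_\star}\left(\frac{\hat\theta - \theta_0}{\widehat{\mathrm{se}}} < -z_{\alpha/2}\right) \\
&= P_{\theta_\star}\bigl(\hat\theta > \theta_0 + \widehat{\mathrm{se}}\, z_{\alpha/2}\bigr) + P_{\theta_\star}\bigl(\hat\theta < \theta_0 - \widehat{\mathrm{se}}\, z_{\alpha/2}\bigr) \\
&= P_{\theta_\star}\left(\frac{\hat\theta - \theta_\star}{\widehat{\mathrm{se}}} > \frac{\theta_0 - \theta_\star}{\widehat{\mathrm{se}}} + z_{\alpha/2}\right) + P_{\theta_\star}\left(\frac{\hat\theta - \theta_\star}{\widehat{\mathrm{se}}} < \frac{\theta_0 - \theta_\star}{\widehat{\mathrm{se}}} - z_{\alpha/2}\right) \\
&\approx P\left(Z > \frac{\theta_0 - \theta_\star}{\widehat{\mathrm{se}}} + z_{\alpha/2}\right) + P\left(Z < \frac{\theta_0 - \theta_\star}{\widehat{\mathrm{se}}} - z_{\alpha/2}\right) \\
&= 1 - \Phi\left(\frac{\theta_0 - \theta_\star}{\widehat{\mathrm{se}}} + z_{\alpha/2}\right) + \Phi\left(\frac{\theta_0 - \theta_\star}{\widehat{\mathrm{se}}} - z_{\alpha/2}\right).
\end{aligned}$$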
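The final expression in the proof can be evaluated numerically. The sketch below treats the standard error as a known number, which is a simplification of the estimated standard error in the proof; the function name and the example values are illustrative assumptions.

```python
from statistics import NormalDist


def wald_power(theta_star, theta0, se, alpha=0.05):
    """Approximate power 1 - Phi(d + z) + Phi(d - z), with d = (theta0 - theta_star)/se."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 quantile z_{alpha/2}
    d = (theta0 - theta_star) / se
    return 1.0 - std_normal.cdf(d + z) + std_normal.cdf(d - z)


# At the null value the power equals the size alpha; it grows as theta_star moves away.
for theta_star in (0.0, 1.0, 2.0, 3.0):
    print(f"theta* = {theta_star}: power = {wald_power(theta_star, 0.0, 1.0):.3f}")
```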
11.7.3 The t-test
To test $H_0 : \mu = \mu_0$ where $\mu$ is the mean, we can use the Wald test. When the data are assumed to be Normal and the sample size is small, it is common instead to use the t-test. A random variable $T$ has a t-distribution with $k$ degrees of freedom if it has density
$$f(t) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)\left(1 + \frac{t^2}{k}\right)^{(k+1)/2}}.$$
When the degrees of freedom $k \to \infty$, this tends to a Normal distribution. When $k = 1$ it reduces to a Cauchy.
Let $X_1,\ldots,X_n \sim N(\mu, \sigma^2)$ where $\theta = (\mu, \sigma^2)$ are both unknown. Suppose we want to test $\mu = \mu_0$ versus $\mu \ne \mu_0$. Let
$$T = \frac{\sqrt{n}(\bar X_n - \mu_0)}{S_n}$$
where $S_n^2$ is the sample variance. For large samples $T \approx N(0,1)$ under $H_0$. The exact distribution of $T$ under $H_0$ is $t_{n-1}$. Hence if we reject when $|T| > t_{n-1,\alpha/2}$ then we get a size $\alpha$ test.
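11.7.4 The Likelihood Ratio Test
Let $\theta = (\theta_1,\ldots,\theta_q,\theta_{q+1},\ldots,\theta_r)$ and suppose that $\Theta_0$ consists of all parameter values $\theta$ such that $(\theta_{q+1},\ldots,\theta_r) = (\theta_{0,q+1},\ldots,\theta_{0,r})$.
Definition 11.24 Define the likelihood ratio statistic by
$$\lambda = 2 \log\left(\frac{\sup_{\theta \in \Theta} L(\theta)}{\sup_{\theta \in \Theta_0} L(\theta)}\right) = 2 \log\left(\frac{L(\hat\theta)}{L(\hat\theta_0)}\right)$$
where $\hat\theta$ is the mle and $\hat\theta_0$ is the mle when $\theta$ is restricted to lie in $\Theta_0$. The likelihood ratio test is: reject $H_0$ when $\lambda(x^n) > \chi^2_{r-q,\alpha}$.
For example, if $\theta = (\theta_1, \theta_2, \theta_3, \theta_4)$ and we want to test the null hypothesis that $\theta_3 = \theta_4 = 0$ then the limiting distribution has $4 - 2 = 2$ degrees of freedom.
Theorem 11.25 Under $H_0$,
$$\lambda(x^n) \xrightarrow{d} \chi^2_{r-q}.$$
Hence, asymptotically, the LR test is level $\alpha$.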
The most complete book on testing is (1986). See also Casella
and Berger (1990, Chapter 8). The FDR method is due to Benjamini and Hochberg (1995).
11.9 Exercises
3. Let $X_1,\ldots,X_n \sim$ Uniform$(0, \theta)$ and let $Y = \max\{X_1,\ldots,X_n\}$. We want to test
$H_0 : \theta = 1/2$ versus $H_1 : \theta > 1/2$.
The Wald test is not appropriate since Y does not converge
to a Normal. Suppose we decide to test this hypothesis by
rejecting H0 when Y > c.
(a) Find the power function.
(b) What choice of c will make the size of the test .05?
(c) In a sample of size n = 20 with Y=0.48 what is the
p-value? What conclusion about H0 would you make?
(d) In a sample of size n = 20 with Y=0.52 what is the
p-value? What conclusion about H0 would you make?
4. There is a theory that people can postpone their death until after an important event. To test the theory, Phillips and King (1988) collected data on deaths around the Jewish holiday Passover. Of 1919 deaths, 922 died the week before the holiday and 997 died the week after. Think of this as a binomial and test the null hypothesis that $\theta = 1/2$. Report and interpret the p-value. Also construct a confidence interval for $\theta$.
Reference: Phillips, D.P. and King, E.W. (1988). Death takes a holiday: Mortality surrounding major social occasions. The Lancet, 2, 728-732.
5. In 1861, 10 essays appeared in the New Orleans Daily Crescent. They were signed "Quintus Curtius Snodgrass" and some people suspected they were actually written by Mark Twain. To investigate this, we will consider the proportion of three letter words found in an author's work.
From eight Twain essays we have:
.225 .262 .217 .240 .230 .229 .235 .217
From 10 Snodgrass essays we have:
.209 .205 .196 .210 .202 .207 .224 .223 .220 .201
(source: Rice xxxx)
(a) Perform a Wald test for equality of the means. Use the nonparametric plug-in estimator. Report the p-value and a 95 per cent confidence interval for the difference of means. What do you conclude?
(b) Now use a permutation test to avoid the use of large sample methods. What is your conclusion?
6. Let $X_1,\ldots,X_n \sim N(\theta, 1)$. Consider testing
$H_0 : \theta = 0$ versus $\theta = 1$.
Let the rejection region be $R = \{x^n : T(x^n) > c\}$ where $T(x^n) = n^{-1}\sum_{i=1}^n X_i$.
(a) Find $c$ so that the test has size $\alpha$.
(b) Find the power under $H_1$, i.e. find $\beta(1)$.
(c) Show that $\beta(1) \to 1$ as $n \to \infty$.
7. Let $\hat\theta$ be the mle of a parameter $\theta$ and let $\widehat{\mathrm{se}} = \{nI(\hat\theta)\}^{-1/2}$ where $I(\theta)$ is the Fisher information. Consider testing
$H_0 : \theta = \theta_0$ versus $\theta \ne \theta_0$.
Consider the Wald test with rejection region $R = \{x^n : |Z| > z_{\alpha/2}\}$ where $Z = (\hat\theta - \theta_0)/\widehat{\mathrm{se}}$. Let $\theta_1 > \theta_0$ be some alternative. Show that $\beta(\theta_1) \to 1$.
8. Here are the number of elderly Jewish and Chinese women who died just before and after the Chinese Harvest Moon Festival.
Week   Chinese   Jewish
-2     55        141
-1     33        145
1      70        139
2      49        161
Compare the two mortality patterns.
9. A randomized, double-blind experiment was conducted to assess the effectiveness of several drugs for reducing postoperative nausea. The data are as follows.
                         Number of Patients   Incidence of Nausea
Placebo                  80                   45
Chlorpromazine           75                   26
Dimenhydrinate           85                   52
Pentobarbital (100 mg)   67                   35
Pentobarbital (150 mg)   85                   37
(source: )
(a) Test each drug versus the placebo at the 5 per cent level. Also, report the estimated odds-ratios. Summarize your findings.
(b) Use the Bonferroni and the FDR method to adjust for multiple testing.
10. Let $X_1,\ldots,X_n \sim$ Poisson$(\lambda)$.
(a) Let $\lambda_0 > 0$. Find the size $\alpha$ Wald test for
$H_0 : \lambda = \lambda_0$ versus $H_1 : \lambda \ne \lambda_0$.
(b) (Computer Experiment.) Let $\lambda_0 = 1$, $n = 20$ and $\alpha = .05$. Simulate $X_1,\ldots,X_n \sim$ Poisson$(\lambda_0)$ and perform the Wald test. Repeat many times and count how often you reject the null. How close is the type I error rate to .05?
12 Bayesian Inference
12.1 The Bayesian Philosophy
The statistical theory and methods that we have discussed so far are known as frequentist (or classical) inference. The frequentist point of view is based on the following postulates:
(F1) Probability refers to limiting relative frequencies. Probabilities are objective properties of the real world.
(F2) Parameters are fixed, (usually unknown) constants. Because they are not fluctuating, no probability statements can be made about parameters.
(F3) Statistical procedures should be designed to have well defined long run frequency properties. For example, a 95 per cent confidence interval should trap the true value of the parameter with limiting frequency at least 95 per cent.
There is another approach to inference called Bayesian inference. The Bayesian approach is based on the following postulates:
(B1) Probability describes degree of belief, not limiting frequency. As such, we can make probability statements about lots of things, not just data which are subject to random variation. For example, I might say that "the probability that Albert Einstein drank a cup of tea on August 1 1948" is .35. This does not refer to any limiting frequency. It reflects my strength of belief that the proposition is true.
(B2) We can make probability statements about parameters, even though they are fixed constants.
(B3) We make inferences about a parameter $\theta$, by producing a probability distribution for $\theta$. Inferences, such as point estimates and interval estimates may then be extracted from this distribution.
Bayesian inference is a controversial approach because it inherently embraces a subjective notion of probability. In general, Bayesian methods provide no guarantees on long run performance. The field of Statistics puts more emphasis on frequentist methods although Bayesian methods certainly have a presence. Certain data mining and machine learning communities seem to embrace Bayesian methods very strongly. Let's put aside philosophical arguments for now and see how Bayesian inference is done. We'll conclude this chapter with some discussion on the strengths and weaknesses of each approach.
12.2 The Bayesian Method
Bayesian inference is usually carried out in the following way.
1. We choose a probability density $f(\theta)$, called the prior distribution, that expresses our degrees of belief about a parameter $\theta$ before we see any data.
2. We choose a statistical model $f(x \mid \theta)$ that reflects our beliefs about $x$ given $\theta$. Notice that we now write this as $f(x \mid \theta)$ instead of $f(x; \theta)$.
3. After observing data $X_1,\ldots,X_n$, we update our beliefs and form the posterior distribution $f(\theta \mid X_1,\ldots,X_n)$.
To see how the third step is carried out, first, suppose that $\theta$ is discrete and that there is a single, discrete observation $X$. We should use a capital letter now to denote the parameter since we are treating it like a random variable so let $\Theta$ denote the parameter. Now, in this discrete setting,
$$P(\Theta = \theta \mid X = x) = \frac{P(X = x, \Theta = \theta)}{P(X = x)} = \frac{P(X = x \mid \Theta = \theta)P(\Theta = \theta)}{\sum_\theta P(X = x \mid \Theta = \theta)P(\Theta = \theta)}$$
which you may recognize from earlier in the course as Bayes' theorem. The version for continuous variables is obtained by using density functions:
$$f(\theta \mid x) = \frac{f(x \mid \theta) f(\theta)}{\int f(x \mid \theta) f(\theta)\, d\theta}. \qquad (12.1)$$
If we have $n$ iid observations $X_1,\ldots,X_n$, we replace $f(x \mid \theta)$ with $f(x_1,\ldots,x_n \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta)$. Let us write $X^n$ to mean $(X_1,\ldots,X_n)$ and $x^n$ to mean $(x_1,\ldots,x_n)$. Then
$$f(\theta \mid x^n) = \frac{f(x^n \mid \theta) f(\theta)}{\int f(x^n \mid \theta) f(\theta)\, d\theta} = \frac{L_n(\theta) f(\theta)}{\int L_n(\theta) f(\theta)\, d\theta} \propto L_n(\theta) f(\theta). \qquad (12.2)$$
In the right hand side of the last equation, we threw away the denominator $\int L_n(\theta) f(\theta)\, d\theta$ which is a constant that does not depend on $\theta$; we call this quantity the normalizing constant. We can summarize all this by writing:
"posterior is proportional to likelihood times prior." (12.3)
You might wonder, doesn't it cause a problem to throw away the constant $\int L_n(\theta) f(\theta)\, d\theta$? The answer is that we can always recover the constant since we know that $\int f(\theta \mid x^n)\, d\theta = 1$. Hence, we often omit the constant until we really need it.
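What do we do with the posterior? First, we can get a point estimate by summarizing the center of the posterior. Typically, we use the mean or mode of the posterior. The posterior mean is
$$\bar\theta_n = \int \theta f(\theta \mid x^n)\, d\theta = \frac{\int \theta L_n(\theta) f(\theta)\, d\theta}{\int L_n(\theta) f(\theta)\, d\theta}. \qquad (12.4)$$
We can also obtain a Bayesian interval estimate. Define $a$ and $b$ by $\int_{-\infty}^a f(\theta \mid x^n)\, d\theta = \int_b^\infty f(\theta \mid x^n)\, d\theta = \alpha/2$. Let $C = (a, b)$. Then
$$P(\theta \in C \mid x^n) = \int_a^b f(\theta \mid x^n)\, d\theta = 1 - \alpha$$
so $C$ is a $1 - \alpha$ posterior interval.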
Example 12.1 Let $X_1,\ldots,X_n \sim$ Bernoulli$(p)$. Suppose we take the uniform distribution $f(p) = 1$ as a prior. By Bayes' theorem the posterior has the form
$$f(p \mid x^n) \propto f(p) L_n(p) = p^s (1-p)^{n-s} = p^{s+1-1} (1-p)^{n-s+1-1}$$
where $s = \sum_i x_i$ is the number of heads. Recall that a random variable has a Beta distribution with parameters $\alpha$ and $\beta$ if its density is
$$f(p; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} p^{\alpha-1} (1-p)^{\beta-1}.$$
We see that the posterior for $p$ is a Beta distribution with parameters $s + 1$ and $n - s + 1$. That is,
$$f(p \mid x^n) = \frac{\Gamma(n+2)}{\Gamma(s+1)\Gamma(n-s+1)} p^{(s+1)-1} (1-p)^{(n-s+1)-1}.$$
We write this as
$$p \mid x^n \sim \text{Beta}(s+1, n-s+1).$$
Notice that we have figured out the normalizing constant without actually doing the integral $\int L_n(p) f(p)\, dp$. The mean of a Beta$(\alpha, \beta)$ is $\alpha/(\alpha+\beta)$ so the Bayes estimator is
$$\bar p = \frac{s+1}{n+2}.$$
It is instructive to rewrite the estimator as
$$\bar p = \lambda_n \hat p + (1 - \lambda_n) \tilde p$$
where $\hat p = s/n$ is the mle, $\tilde p = 1/2$ is the prior mean and $\lambda_n = n/(n+2) \approx 1$. A 95 per cent posterior interval can be obtained by numerically finding $a$ and $b$ such that $\int_a^b f(p \mid x^n)\, dp = .95$.
Suppose that instead of a uniform prior, we use the prior $p \sim \text{Beta}(\alpha, \beta)$. If you repeat the calculations above, you will see that $p \mid x^n \sim \text{Beta}(\alpha + s, \beta + n - s)$. The flat prior is just the special case with $\alpha = \beta = 1$. The posterior mean is
$$\bar p = \frac{\alpha + s}{\alpha + \beta + n} = \left(\frac{n}{\alpha + \beta + n}\right)\hat p + \left(\frac{\alpha + \beta}{\alpha + \beta + n}\right) p_0$$
where $p_0 = \alpha/(\alpha + \beta)$ is the prior mean.
In the previous example, the prior was a Beta distribution and the posterior was a Beta distribution. When the prior and the posterior are in the same family, we say that the prior is conjugate.
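The Beta-Bernoulli update is simple enough to check by hand or in code. The sketch below uses hypothetical data (7 successes in 10 trials) and illustrative function names. The posterior interval is approximated by Monte Carlo quantiles of Beta draws, which is a substitute for the numerical integration mentioned in the text, not the same method.

```python
import random


def beta_posterior(s, n, a=1.0, b=1.0):
    """Posterior parameters (a + s, b + n - s) for a Beta(a, b) prior; a = b = 1 is flat."""
    return a + s, b + n - s


def posterior_mean(s, n, a=1.0, b=1.0):
    a_post, b_post = beta_posterior(s, n, a, b)
    return a_post / (a_post + b_post)


def posterior_interval(s, n, a=1.0, b=1.0, level=0.95, B=20000, seed=0):
    """Equal-tailed interval from sorted posterior draws (a Monte Carlo approximation)."""
    rng = random.Random(seed)
    a_post, b_post = beta_posterior(s, n, a, b)
    draws = sorted(rng.betavariate(a_post, b_post) for _ in range(B))
    lo = draws[int((1.0 - level) / 2.0 * B)]
    hi = draws[int((1.0 + level) / 2.0 * B) - 1]
    return lo, hi


s, n = 7, 10                       # hypothetical data
p_bar = posterior_mean(s, n)       # (s + 1) / (n + 2) under the flat prior
lam = n / (n + 2)
shrunk = lam * (s / n) + (1 - lam) * 0.5   # lambda_n * mle + (1 - lambda_n) * prior mean
lo, hi = posterior_interval(s, n)

print(f"posterior mean = {p_bar:.4f}, shrinkage form = {shrunk:.4f}")
print(f"approximate 95% posterior interval = ({lo:.3f}, {hi:.3f})")
```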
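Example 12.2 Let $X_1,\ldots,X_n \sim N(\theta, \sigma^2)$. For simplicity, let us assume that $\sigma$ is known. Suppose we take as a prior $\theta \sim N(a, b^2)$. In problem 1 in the homework, it is shown that the posterior for $\theta$ is
$$\theta \mid X^n \sim N(\bar\theta, \tau^2) \qquad (12.5)$$
where
$$\bar\theta = w\bar X + (1 - w)a$$
where
$$w = \frac{\frac{1}{\mathrm{se}^2}}{\frac{1}{\mathrm{se}^2} + \frac{1}{b^2}} \quad\text{and}\quad \frac{1}{\tau^2} = \frac{1}{\mathrm{se}^2} + \frac{1}{b^2}$$
and $\mathrm{se} = \sigma/\sqrt{n}$ is the standard error of the mle $\bar X$. This is another example of a conjugate prior. Note that $w \to 1$ and $\tau/\mathrm{se} \to 1$ as $n \to \infty$. So, for large $n$, the posterior is approximately $N(\hat\theta, \mathrm{se}^2)$. The same is true if $n$ is fixed but $b \to \infty$, which corresponds to letting the prior become very flat.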
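The posterior parameters in (12.5) are easy to compute directly. The following sketch uses made-up inputs and an illustrative function name; it also shows numerically that the weight $w$ and the ratio $\tau/\mathrm{se}$ approach 1 as $n$ grows.

```python
import math


def normal_posterior(xbar, n, sigma, a, b):
    """Posterior mean, posterior sd and weight w for a N(a, b^2) prior with known sigma."""
    se2 = sigma ** 2 / n                         # squared standard error of the mle
    precision = 1.0 / se2 + 1.0 / b ** 2         # 1 / tau^2
    w = (1.0 / se2) / precision
    theta_bar = w * xbar + (1.0 - w) * a
    tau = math.sqrt(1.0 / precision)
    return theta_bar, tau, w


# Hypothetical values: sample mean 2, sigma = 1, prior N(0, 1).
for n in (1, 10, 1000):
    theta_bar, tau, w = normal_posterior(xbar=2.0, n=n, sigma=1.0, a=0.0, b=1.0)
    se = 1.0 / math.sqrt(n)
    print(f"n = {n}: posterior mean = {theta_bar:.4f}, w = {w:.4f}, tau/se = {tau / se:.4f}")
```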
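Continuing with this example, let us find $C = (c, d)$ such that $P(\theta \in C \mid X^n) = .95$. We can do this by choosing $c$ and $d$ such that $P(\theta < c \mid X^n) = .025$ and $P(\theta > d \mid X^n) = .025$. So, we want to find $c$ such that
$$P(\theta < c \mid X^n) = P\left(\frac{\theta - \bar\theta}{\tau} < \frac{c - \bar\theta}{\tau} \,\Big|\, X^n\right) = P\left(Z < \frac{c - \bar\theta}{\tau}\right) = .025.$$
Now, we know that $P(Z < -1.96) = .025$. So
$$\frac{c - \bar\theta}{\tau} = -1.96$$
implying that $c = \bar\theta - 1.96\tau$. By similar arguments, $d = \bar\theta + 1.96\tau$. So a 95 per cent Bayesian interval is $\bar\theta \pm 1.96\tau$. Since $\bar\theta \approx \hat\theta$ and $\tau \approx \mathrm{se}$, the 95 per cent Bayesian interval is approximated by $\hat\theta \pm 1.96\,\mathrm{se}$ which is the frequentist confidence interval.
12.3 Functions of Parameters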
How do we make inferences about a function $\tau = g(\theta)$? Remember in Chapter 3 we solved the following problem: given the density $f_X$ for $X$, find the density for $Y = g(X)$. We now simply apply the same reasoning. The posterior cdf for $\tau$ is
$$H(\tau \mid x^n) = P(g(\theta) \le \tau \mid x^n) = \int_A f(\theta \mid x^n)\, d\theta$$
where $A = \{\theta : g(\theta) \le \tau\}$. The posterior density is $h(\tau \mid x^n) = H'(\tau \mid x^n)$.
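Example 12.3 Let $X_1,\ldots,X_n \sim$ Bernoulli$(p)$ and $f(p) = 1$ so that $p \mid X^n \sim \text{Beta}(s+1, n-s+1)$ with $s = \sum_{i=1}^n x_i$. Let $\psi = \log(p/(1-p))$. Then
$$\begin{aligned}
H(\psi \mid x^n) &= P(\Psi \le \psi \mid x^n) = P\left(\log\left(\frac{P}{1-P}\right) \le \psi \,\Big|\, x^n\right) \\
&= P\left(P \le \frac{e^\psi}{1+e^\psi} \,\Big|\, x^n\right) \\
&= \int_0^{e^\psi/(1+e^\psi)} f(p \mid x^n)\, dp \\
&= \frac{\Gamma(n+2)}{\Gamma(s+1)\Gamma(n-s+1)} \int_0^{e^\psi/(1+e^\psi)} p^s (1-p)^{n-s}\, dp
\end{aligned}$$
and
$$\begin{aligned}
h(\psi \mid x^n) &= H'(\psi \mid x^n) \\
&= \frac{\Gamma(n+2)}{\Gamma(s+1)\Gamma(n-s+1)} \left(\frac{e^\psi}{1+e^\psi}\right)^s \left(\frac{1}{1+e^\psi}\right)^{n-s} \frac{\partial}{\partial\psi}\left(\frac{e^\psi}{1+e^\psi}\right) \\
&= \frac{\Gamma(n+2)}{\Gamma(s+1)\Gamma(n-s+1)} \left(\frac{e^\psi}{1+e^\psi}\right)^s \left(\frac{1}{1+e^\psi}\right)^{n-s} \frac{e^\psi}{(1+e^\psi)^2} \\
&= \frac{\Gamma(n+2)}{\Gamma(s+1)\Gamma(n-s+1)} \left(\frac{e^\psi}{1+e^\psi}\right)^{s+1} \left(\frac{1}{1+e^\psi}\right)^{n-s+1}
\end{aligned}$$
for $\psi \in \mathbb{R}$.
12.4 Simulation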
The posterior can often be approximated by simulation. Suppose we draw $\theta_1,\ldots,\theta_B \sim p(\theta \mid x^n)$. Then a histogram of $\theta_1,\ldots,\theta_B$ approximates the posterior density $p(\theta \mid x^n)$. An approximation to the posterior mean $\bar\theta_n = E(\theta \mid x^n)$ is $B^{-1}\sum_{j=1}^B \theta_j$. The posterior $1 - \alpha$ interval can be approximated by $(\theta_{\alpha/2}, \theta_{1-\alpha/2})$ where $\theta_{\alpha/2}$ is the $\alpha/2$ sample quantile of $\theta_1,\ldots,\theta_B$.
Once we have a sample $\theta_1,\ldots,\theta_B$ from $f(\theta \mid x^n)$, let $\tau_i = g(\theta_i)$. Then $\tau_1,\ldots,\tau_B$ is a sample from $f(\tau \mid x^n)$. This avoids the need to do any analytical calculations. Simulation is discussed in more detail later in the book.
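The recipe above can be sketched as follows. For illustration the posterior is taken to be Beta(s+1, n-s+1), the Bernoulli model with a flat prior, and $g$ is the log-odds; the counts, the function names and the fixed seed are assumptions made for this sketch.

```python
import math
import random


def simulate_function_of_parameter(s, n, g, B=10000, alpha=0.05, seed=0):
    """Draw theta_j from Beta(s+1, n-s+1), map through g, and summarize the draws."""
    rng = random.Random(seed)
    thetas = [rng.betavariate(s + 1, n - s + 1) for _ in range(B)]
    taus = sorted(g(theta) for theta in thetas)      # sample from the posterior of g(theta)
    mean = sum(taus) / B                             # approximate posterior mean
    lo = taus[int(alpha / 2.0 * B)]                  # alpha/2 sample quantile
    hi = taus[int((1.0 - alpha / 2.0) * B) - 1]      # 1 - alpha/2 sample quantile
    return mean, (lo, hi)


def log_odds(p):
    return math.log(p / (1.0 - p))


# Hypothetical data: 10 successes in 20 trials, so the posterior of the log-odds centers near 0.
post_mean, (lo, hi) = simulate_function_of_parameter(s=10, n=20, g=log_odds)
print(f"posterior mean of log-odds = {post_mean:.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```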
Example 12.4 Consider again Example 12.3. We can approximate the posterior for $\psi$ without doing any calculus. Here are the steps:
1. Draw $P_1,\ldots,P_B \sim \text{Beta}(s+1, n-s+1)$.
2. Let $\psi_i = \log(P_i/(1-P_i))$ for $i = 1,\ldots,B$.
Now $\psi_1,\ldots,\psi_B$ are iid draws from $h(\psi \mid x^n)$. A histogram of these values provides an estimate of $h(\psi \mid x^n)$.
12.5 Large Sample Properties of Bayes' Procedures
In the Bernoulli and Normal examples we saw that the posterior mean was close to the mle. This is true in greater generality.
Theorem 12.5 Under appropriate regularity conditions, we have that the posterior is approximately $N(\hat\theta, \widehat{\mathrm{se}}^2)$ where $\hat\theta_n$ is the mle and $\widehat{\mathrm{se}} = 1/\sqrt{nI(\hat\theta_n)}$. Hence, $\bar\theta_n \approx \hat\theta_n$. Also, if $C_n = (\hat\theta_n - z_{\alpha/2}\widehat{\mathrm{se}},\ \hat\theta_n + z_{\alpha/2}\widehat{\mathrm{se}})$ is the asymptotic frequentist $1 - \alpha$ confidence interval, then $C_n$ is also an approximate $1 - \alpha$ Bayesian posterior interval:
$$P(\theta \in C_n \mid X^n) \to 1 - \alpha.$$
There is also a Bayesian delta method. Let $\tau = g(\theta)$. Then
$$\tau \mid X^n \approx N(\hat\tau, \widetilde{\mathrm{se}}^2)$$
where $\hat\tau = g(\hat\theta)$ and $\widetilde{\mathrm{se}} = \widehat{\mathrm{se}}\,|g'(\hat\theta)|$.
12.6 Flat Priors, Improper Priors and "Noninformative" Priors
A big question in Bayesian inference is: where do you get the prior $f(\theta)$? One school of thought, called "subjectivism" says that the prior should reflect our subjective opinion about $\theta$ before the data are collected. This may be possible in some cases but seems impractical in complicated problems especially if there are many parameters. An alternative is to try to define some sort of "noninformative prior." An obvious candidate for a noninformative prior is to use a "flat" prior $f(\theta) \propto$ constant.
In the Bernoulli example, taking $f(p) = 1$ leads to $p \mid X^n \sim \text{Beta}(s+1, n-s+1)$ as we saw earlier which seemed very reasonable. But unfettered use of flat priors raises some questions.
Improper Priors. Consider the $N(\theta, 1)$ example. Suppose we adopt a flat prior $f(\theta) \propto c$ where $c > 0$ is a constant. Note that $\int f(\theta)\, d\theta = \infty$ so this is not a real probability density in the usual sense. We call such a prior an improper prior. Nonetheless, we can still carry out Bayes' theorem and compute the posterior density $f(\theta \mid x^n) \propto L_n(\theta) f(\theta) \propto L_n(\theta)$. In the normal example, this gives $\theta \mid X^n \sim N(\bar X, \sigma^2/n)$ and the resulting point and interval estimators agree exactly with their frequentist counterparts. In general, improper priors are not a problem as long as the resulting posterior is a well defined probability distribution.
Flat Priors are Not Invariant. Go back to the Bernoulli example and consider using the flat prior $f(p) = 1$. Recall that a flat prior presumably represents our lack of information about $p$ before the experiment. Now let $\psi = \log(p/(1-p))$. This is a transformation and we can compute the resulting distribution for $\psi$. It turns out that
$$f(\psi) = \frac{e^\psi}{(1 + e^\psi)^2}.$$
But one could argue that if we are ignorant about $p$ then we are also ignorant about $\psi$ so shouldn't we use a flat prior for $\psi$? This contradicts the prior $f(\psi)$ for $\psi$ that is implied by using a flat prior for $p$. In short, the notion of a flat prior is not well-defined because a flat prior on a parameter does not imply a flat prior on a transformed version of the parameter. Flat priors are not transformation invariant.
Jeffreys' Prior. Jeffreys came up with a "rule" for creating priors. The rule is: take $f(\theta) \propto I(\theta)^{1/2}$ where $I(\theta)$ is the Fisher information function. This rule turns out to be transformation invariant. There are various reasons for thinking that this prior might be a useful prior but we will not go into details here.
Example 12.6 Consider the Bernoulli$(p)$. Recall that
$$I(p) = \frac{1}{p(1-p)}.$$
Jeffreys' rule says to use the prior
$$f(p) \propto \sqrt{I(p)} = p^{-1/2}(1-p)^{-1/2}.$$
This is a Beta(1/2, 1/2) density. This is very close to a uniform density.
In a multiparameter problem, the Jeffreys' prior is defined to be $f(\theta) \propto \sqrt{\det I(\theta)}$ where $\det(A)$ denotes the determinant of a matrix $A$.
12.7 Multiparameter Problems
In principle, multiparameter problems are handled the same way. Suppose that $\theta = (\theta_1,\ldots,\theta_p)$. The posterior density is still given by
$$p(\theta \mid x^n) \propto L_n(\theta) f(\theta).$$
The question now arises of how to extract inferences about one parameter. The key is to find the marginal posterior density for the parameter of interest. Suppose we want to make inferences about $\theta_1$. The marginal posterior for $\theta_1$ is
$$f(\theta_1 \mid x^n) = \int \cdots \int f(\theta_1,\ldots,\theta_p \mid x^n)\, d\theta_2 \cdots d\theta_p.$$
In practice, it might not be feasible to do this integral. Simulation can help. Draw randomly from the posterior:
$$\theta^1,\ldots,\theta^B \sim f(\theta \mid x^n)$$
where the superscripts index the different draws. Each $\theta^j$ is a vector $\theta^j = (\theta_1^j,\ldots,\theta_p^j)$. Now collect together the first component of each draw:
$$\theta_1^1,\ldots,\theta_1^B.$$
These are a sample from $f(\theta_1 \mid x^n)$ and we have avoided doing any integrals.
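The idea of keeping one component of each posterior draw can be sketched with a two-parameter model. The setting assumed here is two independent binomial proportions with flat priors, so each component has a Beta posterior; the counts, the function name and the seed are hypothetical.

```python
import random


def draw_joint_posterior(x1, n1, x2, n2, B=10000, seed=0):
    """Draw B vectors (p1, p2) from the joint posterior under independent flat priors."""
    rng = random.Random(seed)
    return [
        (rng.betavariate(x1 + 1, n1 - x1 + 1), rng.betavariate(x2 + 1, n2 - x2 + 1))
        for _ in range(B)
    ]


# Hypothetical counts: 30 of 80 in group 1, 40 of 80 in group 2.
draws = draw_joint_posterior(x1=30, n1=80, x2=40, n2=80)

# Marginal posterior sample for the first parameter: keep the first component of each draw.
first_component = [p1 for p1, _ in draws]
marginal_mean = sum(first_component) / len(first_component)

# A function of both parameters is handled the same way, draw by draw.
differences = [p2 - p1 for p1, p2 in draws]
prob_p2_larger = sum(d > 0 for d in differences) / len(differences)

print(f"posterior mean of p1 = {marginal_mean:.3f}")
print(f"posterior P(p2 > p1) = {prob_p2_larger:.3f}")
```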
Example 12.7 (Comparing two binomials.) Suppose we have $n_1$ control patients and $n_2$ treatment patients and that $X_1$ control patients survive while $X_2$ treatment patients survive. We want to estimate $\tau = g(p_1, p_2) = p_2 - p_1$. Then,
$$X_1 \sim \text{Binomial}(n_1, p_1) \quad\text{and}\quad X_2 \sim \text{Binomial}(n_2, p_2).$$
Suppose we take $f(p_1, p_2) = 1$. The posterior is
$$f(p_1, p_2 \mid x_1, x_2) \propto p_1^{x_1}(1-p_1)^{n_1-x_1}\, p_2^{x_2}(1-p_2)^{n_2-x_2}.$$
Notice that $(p_1, p_2)$ live on a rectangle (a square, actually) and that
$$f(p_1, p_2 \mid x_1, x_2) = f(p_1 \mid x_1) f(p_2 \mid x_2)$$
where
$$f(p_1 \mid x_1) \propto p_1^{x_1}(1-p_1)^{n_1-x_1} \quad\text{and}\quad f(p_2 \mid x_2) \propto p_2^{x_2}(1-p_2)^{n_2-x_2}$$
which implies that $p_1$ and $p_2$ are independent under the posterior. Also, $p_1 \mid x_1 \sim \text{Beta}(x_1+1, n_1-x_1+1)$ and $p_2 \mid x_2 \sim \text{Beta}(x_2+1, n_2-x_2+1)$. If we simulate $P_{1,1},\ldots,P_{1,B} \sim \text{Beta}(x_1+1, n_1-x_1+1)$ and $P_{2,1},\ldots,P_{2,B} \sim \text{Beta}(x_2+1, n_2-x_2+1)$ then $\tau_b = P_{2,b} - P_{1,b}$, $b = 1,\ldots,B$, is a sample from $f(\tau \mid x_1, x_2)$.
12.8 Strengths and Weaknesses of Bayesian Inference
Bayesian inference is appealing when prior information is available since Bayes' theorem is a natural way to combine prior information with data. Some people find Bayesian inference psychologically appealing because it allows us to make probability statements about parameters. In contrast, frequentist inference provides confidence sets $C_n$ which trap the parameter 95 per cent of the time, but we cannot say that $P(\theta \in C_n \mid X^n)$ is .95. In the frequentist approach we can make probability statements about $C_n$ not $\theta$. However, psychological appeal is not really an argument for using one type of inference over another.
In parametric models, with large samples, Bayesian and frequentist methods give approximately the same inferences. In general, they need not agree. Consider the following example.
Example 12.8 Let X
N (; 1)
and suppose we use the prior
N (0; 2 ). From (12.5), the posterior is
x
1
jx N
;
= N (cx; c)
1
1 + 2 1 + 12
where c = 2 =( 2 + 1). A 1 per cent posterior interval is
C = (a; b) where
p
p
a = cx
cz=2 and b = cx + cz=2 :
12.9 Appendix
217
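Thus, $P(\theta \in C \mid X) = 1 - \alpha$. We can now ask, from a frequentist perspective, what is the coverage of $C$, that is, how often will this interval contain the true value? The answer is
$$\begin{aligned}
P_\theta(a < \theta < b) &= P_\theta\!\left(cX - z_{\alpha/2}\sqrt{c} < \theta < cX + z_{\alpha/2}\sqrt{c}\right) \\
&= P_\theta\!\left(\frac{\theta - z_{\alpha/2}\sqrt{c}}{c} < X < \frac{\theta + z_{\alpha/2}\sqrt{c}}{c}\right) \\
&= P\!\left(\frac{\theta(1-c) - z_{\alpha/2}\sqrt{c}}{c} < Z < \frac{\theta(1-c) + z_{\alpha/2}\sqrt{c}}{c}\right) \\
&= \Phi\!\left(\frac{\theta(1-c) + z_{\alpha/2}\sqrt{c}}{c}\right) - \Phi\!\left(\frac{\theta(1-c) - z_{\alpha/2}\sqrt{c}}{c}\right)
\end{aligned}$$
where $Z \sim N(0, 1)$. Figure 12.1 shows the coverage as a function of $\theta$ for $\tau = 1$ and $\alpha = .05$. Unless the true value of $\theta$ is close to 0, the coverage can be very small. Thus, upon repeated use, the Bayesian 95 per cent interval might contain the true value with frequency near 0! In contrast, a confidence interval has coverage 95 per cent no matter what the true value of $\theta$ is. What should we conclude from all this? The important thing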
is to understand that frequentist and Bayesian methods are answering different questions. To combine prior beliefs with data in a principled way, use Bayesian inference. To construct procedures with guaranteed long run performance, such as confidence intervals, use frequentist methods. It is worth remarking that it is possible to develop nonparametric Bayesian methods similar to plug-in estimation and the bootstrap. Be forewarned, however, that the frequency properties of nonparametric Bayesian methods can sometimes be quite poor.
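The coverage formula derived in Example 12.8 is easy to evaluate numerically. The sketch below is only an illustration: the function name `bayes_interval_coverage` and its defaults are assumptions chosen to match the setting of Figure 12.1 ($\tau = 1$, $\alpha = .05$), and the standard Normal cdf and quantile function come from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def bayes_interval_coverage(theta, tau=1.0, alpha=0.05):
    """Frequentist coverage of the 1 - alpha posterior interval of Example 12.8.

    theta is the true mean, tau the prior standard deviation (hypothetical
    argument names; the formula is the last line of the derivation).
    """
    std_normal = NormalDist()                      # standard Normal, mean 0 and sd 1
    c = tau ** 2 / (tau ** 2 + 1.0)                # shrinkage factor c
    half_width = std_normal.inv_cdf(1.0 - alpha / 2.0) * c ** 0.5  # z_{alpha/2} * sqrt(c)
    upper = (theta * (1.0 - c) + half_width) / c
    lower = (theta * (1.0 - c) - half_width) / c
    return std_normal.cdf(upper) - std_normal.cdf(lower)


# Coverage over the range of true values plotted in Figure 12.1.
for theta in (0, 1, 2, 3, 4, 5):
    print(theta, round(bayes_interval_coverage(theta), 4))
```

With a very diffuse prior (large `tau`) the factor $c$ approaches 1 and the coverage returns to roughly $1 - \alpha$ for every $\theta$, which is one way to see that the poor coverage comes from shrinkage towards the prior mean.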
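FIGURE 12.1. Frequentist coverage of 95 per cent Bayesian posterior interval as a function of the true value $\theta$ (horizontal axis, 0 to 5; vertical axis, coverage from 0 to 1). The dotted line marks the 95 per cent level.
12.9 Appendix
It can be shown that the effect of the prior diminishes as $n$ increases so that $f(\theta \mid X^n) \propto \mathcal{L}_n(\theta) f(\theta) \approx \mathcal{L}_n(\theta)$. Hence, $\log f(\theta \mid X^n) \approx \ell(\theta)$. Now,
$$\ell(\theta) \approx \ell(\hat\theta) + (\theta - \hat\theta)\,\ell'(\hat\theta) + \frac{(\theta - \hat\theta)^2}{2}\,\ell''(\hat\theta) = \ell(\hat\theta) + \frac{(\theta - \hat\theta)^2}{2}\,\ell''(\hat\theta)$$
since $\ell'(\hat\theta) = 0$. Exponentiating, we get approximately that
$$f(\theta \mid X^n) \propto \exp\left\{ -\frac{1}{2}\, \frac{(\theta - \hat\theta)^2}{\sigma_n^2} \right\}$$
where $\sigma_n^2 = -1/\ell''(\hat\theta_n)$. So the posterior of $\theta$ is approximately Normal with mean $\hat\theta$ and variance $\sigma_n^2$. Let $\ell_i = \log f(X_i \mid \theta)$, then
$$\sigma_n^{-2} = -\ell''(\hat\theta_n) = \sum_i -\ell_i''(\hat\theta_n) = n \left( \frac{1}{n} \sum_i -\ell_i''(\hat\theta_n) \right) \approx n\, E_\theta\!\left[ -\ell_i''(\hat\theta_n) \right] = n\, I(\hat\theta_n)$$
and hence $\sigma_n \approx \text{se}(\hat\theta)$.
12.10 Bibliographic Remarks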
Some references on Bayesian inference include Carlin and
Louis (1996), Gelman, Carlin, Stern and Rubin (1995), Lee
(1997), Robert (1994) and Schervish (1995). See Cox (1997), Diaconis and Freedman (1996), Freedman (2001), Barron, Schervish
and Wasserman (1999), Ghosal, Ghosh and van der Vaart (2001),
Shen and Wasserman (2001) and Zhao (2001) for discussions of
some of the technicalities of nonparametric Bayesian inference.
12.11 Exercises
1. Verify (12.5).
2. Let $X_1, \ldots, X_n \sim \text{Normal}(\mu, 1)$.
(a) Simulate a data set (using $\mu = 5$) consisting of $n = 100$ observations.
(b) Take $f(\mu) = 1$ and find the posterior density. Plot the density.
(c) Simulate 1000 draws from the posterior. Plot a histogram of the simulated values and compare the histogram to the answer in (b).
(d) Let $\theta = e^{\mu}$. Find the posterior density for $\theta$ analytically and by simulation.
(e) Find a 95 per cent posterior interval for $\theta$.
(f) Find a 95 per cent confidence interval for $\theta$.
3. Let $X_1, \ldots, X_n \sim \text{Uniform}(0, \theta)$. Let $f(\theta) \propto 1/\theta$. Find the posterior density.
4. Suppose that 50 people are given a placebo and 50 are given a new treatment. 30 placebo patients show improvement while 40 treated patients show improvement. Let $\tau = p_2 - p_1$ where $p_2$ is the probability of improving under treatment and $p_1$ is the probability of improving under placebo.
(a) Find the mle of $\tau$. Find the standard error and 90 per cent confidence interval using the delta method.
(b) Find the standard error and 90 per cent confidence interval using the parametric bootstrap.
(c) Use the prior $f(p_1, p_2) = 1$. Use simulation to find the posterior mean and posterior 90 per cent interval for $\tau$.
(d) Let
$$\psi = \log\left( \left(\frac{p_1}{1 - p_1}\right) \div \left(\frac{p_2}{1 - p_2}\right) \right)$$
be the log-odds ratio. Note that $\psi = 0$ if $p_1 = p_2$. Find the mle of $\psi$. Use the delta method to find a 90 per cent confidence interval for $\psi$.
(e) Use simulation to find the posterior mean and posterior 90 per cent interval for $\psi$.
5. Consider the Bernoulli(p) observations
0101000000
Plot the posterior for p using these priors: Beta(1/2,1/2),
Beta(1,1), Beta(10,10), Beta(100,100).
6. Let $X_1, \ldots, X_n \sim \text{Poisson}(\lambda)$.
(a) Let $\lambda \sim \text{Gamma}(\alpha, \beta)$ be the prior. Show that the posterior is also a Gamma. Find the posterior mean.
(b) Find the Jeffreys' prior. Find the posterior.
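Several of these exercises rely on the posterior simulation for $\tau = p_2 - p_1$ described just before Section 12.8: draw each $p_j$ from its Beta posterior and take differences. The following is a minimal sketch of that recipe under the flat prior, not a worked solution. The function names, the seed, and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
import random


def simulate_tau_posterior(x1, n1, x2, n2, B=10000, seed=0):
    """Draw B values of tau = p2 - p1 from the posterior under a flat prior.

    Under f(p1, p2) = 1 the posteriors are independent:
    p1 | x1 ~ Beta(x1 + 1, n1 - x1 + 1) and p2 | x2 ~ Beta(x2 + 1, n2 - x2 + 1).
    """
    rng = random.Random(seed)          # seeded generator so runs are reproducible
    draws = []
    for _ in range(B):
        p1 = rng.betavariate(x1 + 1, n1 - x1 + 1)
        p2 = rng.betavariate(x2 + 1, n2 - x2 + 1)
        draws.append(p2 - p1)
    return draws


def posterior_summary(draws, level=0.90):
    """Posterior mean and an equal-tailed interval from the sample quantiles."""
    ordered = sorted(draws)
    B = len(ordered)
    lower = ordered[int((1.0 - level) / 2.0 * B)]
    upper = ordered[int((1.0 + level) / 2.0 * B) - 1]
    return sum(ordered) / B, (lower, upper)


# Hypothetical counts, used only to show the call pattern.
tau_draws = simulate_tau_posterior(x1=12, n1=40, x2=20, n2=40)
mean_tau, interval_tau = posterior_summary(tau_draws)
print(mean_tau, interval_tau)
```

Any function of $(p_1, p_2)$, such as a log-odds ratio, can be handled the same way by transforming each pair of draws before summarizing.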
"!#$
%'&
(*),+)-/.0)-213+5476981:-/.-2;=< >@?A8;=BC
DFEHGID
J5KML#NPORQ7OPS,TUKVOPLFW
X [Z:\]#^#_^*`_Z/
a]b,_`c*
/]`H!#\2d
Hce
Vfbg
Y
ch!#ciaj a]"]F^kl/c*
]m0/nco_]"^p]q:c*]c*_b5_`/ce
V/]
^rsU]`/_]tco[
vuvwxrqy
\z[mU`=
5ce
{|]/#}~
{"]
#/
V/=l/c*
]_u3]}^#]}o\2]F]o
co]d/#ceo*
Fg
`}:q]!^r0"P[7Mb'}\2r3
oq]c*
a/_]`{Aq]
\]co
`sl2
l/\
a`]F\_^!#_u
]#^#
d
/
co_*}#\2aZt
d
/
co_`
\ohu
_:
Ut
e_`c*
/] ]q0"uFwx*/3a¡
!
:]q^_\]*/#]l{mF
l/c*
]~¢]co_co\[
aa^d
s"y[y£V¤¦¥*
^d/#U]à
Z a!_]q¦$^\_`]!#a$
`\[
aa^§yu
XYd
aac*
!`d/n^#\__
\z{p_x}¨r
^©
!##

¥y[rªI¤¦MU¬«:P"® ¯zu¦°]c*
aa{m0«±c*
s³²´µ/]¶:u¢:`
!"
`]co$zf#
coa_]q a]q!#\_/]_
«:PF® ¯# y%$ ¯'&
`b!
_^_]a]`m
«:PF® ¯ #)( *$©+ (
]a!#/_`]a]`m
«:PF® ¯ #)( *
$©+ ( ,
« , a]_®
^10q32# 4_`] gx]#a]`m
«:PF® ¯ #.-=q¢/
# s
«:PF® ¯ #65*a]8
79;:=<>@?A HGJI¯'KLG M!aa
\2jg _a~a]ON
9;:=<>C?B AED/F
t
oq!\z/]
q]aa]}/
V
`c*
/] o
t
]q^
/
#u#]sco
Q4_t/#~]bm#`]c*z/c*_ }}aa}`/
SRk¯zub]s
``
e_`c*
/]_m}~3_Z a!
//#
[Z_/
3a]`]
ju
P~
tco^AA}
TVUOW+XLY Z[Y \]X3^_]`a^cbedgf
v=yF®2
¯#xw
?
VPOhjik/lm
7«:P"®z
¯
D
#zy
fnporqSs l o iut

qQn
«:P"®z SGU¯¯ HGJI2¯EKLGJN
F
X 'oa]`$q!#\_]k$b!
^'`]m*``jr{`!l |*}~ c*
b!
`^|`]2¯ v=PF®2
¯V#w
?
R%
$p¯ &
#
|*}~
#
?
R ¯j
Yu &
?
I ¯N
xw s/`l,]qU\2
F/_mbyªho|0z§#8*§F¥y[ªx¤
M.h£§M£¤[yJ§[¤YpMp¥y[Aªx¤¦M P¤0§M"
bMV
DFEHGC
Q TK OPSnORWjVSV ¡ ¢O SW
¦]|\]co
`=x}]r_`/ce
V/]`}~*\[
Y\]co
`=$`jrq!\ g
/]_u3]}_Z_m"/ ^#]F_ #] ]Z"^:
$\a[
l}~_~
,]}#\2
l/ce
]¢Uz/_u ]#^#/q]aa]}#hzf#
coa_u
£¤ ¥§¦ ¨ U8^_©`«ª¬fpo ®
R z¯pyF®0¯°lm+±§l nnp²³s´f¶µf l t f¶²npq ¸
m · n;¹;² lt f ±
f tt[iutºQi nn;»½¼ ium npq ± f t o¢µ i fnporqSs l o iut n;¾
¿/#ÀR l m+± & Â
# Á »Ãbedgf
¸!' [

«
Á
v=P"®
& ¯
v=yF®
¿¯
0
-
0
¸!E ¶
!
V
)(
-
$# % '&
*
Á
[
¸
!'a

"! %a

!
k ² m o¢q ium n lt f v=PF®2
¿¯# w ? SR $¬¯&3# 0 lm+± v=yF®
oSdgf m3v=yF®
w ¢Á½$£¯ & #rÁ½$´¯ & N §i o¢q f/oSd l o q k3

?
v=P"®z
¿ ¯ i oSdgf t µVqQnf v=P"® ¿¯ v=PF®2 & ¯ » fpqSoHdgf t fnporqSs l o iut
q i ² n º d
kiut s º ±¸i s8q m+l o fn8oSdgf i oSdgf t nff q · ² t f » »
&
t q ± q ² ºQi ² n fnpo¢qSs l o iut ;²³o*qSo nf t fno i q ºSº ² npo tEl o f´oSdgf i q m o oSd
qQn m i o i q i ² n*d i µo i i s lt f%orµ it qQn k ² m orq ium n;»
t qQn
*
.- 0/ 0/1
*
2/
+
)7 98 ;:=< 4: ?>A@CB 65
@
B
D
)( *
HG
E* FD
,+
65
@CB
JI
ML *
6K
9NO@ )K'0 $PK ¯ N
Q
¯
&
&
#
¯/
43
² m q
qQn l
l o qSo
£¤ ¥/¦ ¨ U ^_©`«_x¬Jfpo 3
R ¿ ®NN;N® R P~_]!#aaP ¯ » ¼ ium npq ± f t n;¹;² l t f ±
;q l n Vµf*d l f%oSd l o
f ttiut%ºQi nn lm+±´º fpo V
¿ # R » q m f½oHd¸qQn½d l n
K
%K K `¿ ¯#= R¯V#
R
v= ®
m i oHdgf t fnporqSs l o iut qQn
B
T
U
S
T WV Q
Z
IZ ¿ R lm+±\Tjlm+±
µ dgf t f
#YX
V lt f]D i npqSorq4Bf *ium npo lm oSn;» bed¸qQn
S
[
oHdgf^D i npo f t q iut s´f lm ²npq m¸·´l P~_/
`_4
T ®aV'b D t q iut » +§i µcv=6K® K ¯ #
K ¯ Y K ¯¯ &
&
, &
, &
&
d
d w d S WT
UT
#

S

$PK
,
,
T eV Q^f
T3eV3 Qf
f
Q 0K '0 $PK ¯ d Q K WT $PK & N
#
gT W
V Q ¯&
T e
V Q
f
K
&
#
qQn
!"
K
v= ¿¯
v=
K & ¯
0.0020
0.0025
0.0015
0.0000
0.0005
0.0010
0.0
0.2
0.4
K
0.6
] « \[
@! 0.8
!
0 ¸%«
¿ L
&
+§i µ º fpo T1# V # Q » _ml Fs D º f:=<»4:=<
* d i q * f;» b bedgf t fnp² º o¢q m¸· fnpo¢qSs l o iut qQn
K &
)(
lm+±t qQn
* o¢q ium
k ² m
#
S
Q

*
µf½µVq ºSº f D
ºal q m oSd¸qQn
Q Q
qQn
6K K & ¯V# Q ¯ &
Q
Q
)(
1.0
v= ®
N
OD
:=<
R
bedgf t Qq n k ² m ¢o q ium n »¸»
n µf
l t f
Qº i orof ± q m· ² t f
m fpqSoSdgf t fnpo¢qSs l o iut ² m q kiut s º ±¸i s8q m+l ofn½oHdgf i So dgf t »
65
* lm
)-
nff
`zf#
coa_#aH:_^e/] U3
a] \_]co
:j
q!\z/]u"]h^]5`]mb}~#^
]zgIb!ch~`!coce
l{=]qj
q!\z/]u¢}]!\2p`!coce
`_ =/*ce
VfFch!c `j
^Y
P¢
[{_3``ju
¸!' [

«
TVUOW+XLY Z[Y\³X^_©`)bedgf '§0r¤
¯V#
v=
lm+±M0h§F=VPOh
P¯
µ dgf t f
qQn
¯ P¯EKb
v=P"®
y
0Á#u ¯
t q iutVkiut »
FD º f:=<»<»
£¤ /
¥ ¦ ¨ U ^_©`
f½d l f
¼ ium npq ± f t l· l q m oSdgforµ i fnpo¢qSs l o iut n q m l s
B
lm+±
0Á#ua0M¯
?
D
qQn l
qQn
¯
!# =
v yF®
® ¯#
Vy;h
L
K
v= ¿ ¯V#
ce
Mf ¿ K'0
,
Q
$
K¯
#
0
Q
K & ¯#¬c*
Vf Q ¯ & # Q ¯ & N
Q
Q
Q
Q
,
°l nf ±3ium s l qSs8²³s t qQn)(- K qQn l @pfpo¢o f t fnpo¢qSs l o iut npq m* f v= K ¯]/
&
&
B f t - µ dgf m Q qQn ºalt · f)- v= K ¿¯ d l n½nps lºSº f t½t qQn)( f * fD+o
v= K ¿¯ »
i µf kiutl nps lºSºt f · q ium q m oSdgfFD ltEl s´fpof t nD l* f m f lt]K1#À0 »%bed¸² n)s lm 5 Dgf i D º f D t f k f t K ¿ o i K »bed¸qQn3q ºSº ²npo tEl ofn oSd l o ium f 3 m ²¸s;@pf t
&
np²³s8s lt q fn º q ( f8s l qSs8²¸s t qQn)( lt f qSsFg
D f tSk f * o » +/i µ *ium npq ± f t oSdgf
( »"8 iut q ºSº ² npo tEl orq ium - º fpo ² n§o l ( f %K ¯ #c0 »°bedgf m
°l 5 fn t qQn)
® K ¿`¯V# y v=%K® K ¿`¯EK K´# y K 0 $ K ¯ K K# 0
Q
Q
v=
lm+±
® K & ¯#zy v=%K® K & ¯'KK´# Q ¯ N
Q
Q &
- - ® K & ¯
® K ¿l¯ µ d¸q * d8np² ·· fnpoHn°oSd l o K
8
Q
@
i t u
¿ qQn l pfpo¢o f t
fnpo¢qSs l o iut »bed¸qQn3s8q · d¸o nffps q m ¢o ²¸qSo¢q f º t f l n ium+l º f ;²³o§oHd¸qQn lm
npµf t ± f gf m+± n ium oHdgf d i q f ik t q iut » bedgf l ± o lm o l· f ik ² npq m¸·
s l qSs8²³s t qQn
s n qQnoSd l o°Sq o ±¸i fn m i o t f ¹;²¸q t f
± fn +qSofqSoSn t[i º fpÃ
D
)(-
D
*
4B 65
D
*
D @ )-
B
@ E@
3
!"
D
*
3
- * FD º f D t[i @ º fpsÃn)65
* ² º o »HG
d q · d ± qSs´f m pn q ium+lº i s
ium f o i d ii nf l
t q iut » pm ¸
d ii npq m¸·´l3± f k f m npq º f t q iut l m pf8f o t pf s´f º ± q
*
@ D
0*
@
`x}]´!coc*
e] qj´q!\z/] !`nx}]´^q¡g
q`co_]"^:q]3^zZ"o`c*
/]\2]"]`d=
]ecoc*@4
]
/#oc*
VfFc5!c9`ja[
^/]Ac*c*
Vf|l/ce
]OI\2]"] n
coc*@ 4/§
P¢
[{_3``jda
^¢/
] P¢
[{t_`c*
/]`u
R
*
TVUOW+XLY Z[Y \]X3^_]`
t qQn qQn lºSº f ± l
kiut t q iut q k
)(
*
D
5
f oSd l os8q m qSs8qufn½oSdgf °l f n
± f qQnpq ium t ² º §
h§U"V¤¥y »
i t s u
l ºSº
qQn l fn t ² º f
8
v=P"®z
¯#
65 -
A5
#q ® ¯
?
B
'0ÁFu=Á¯
µ dgf t f§oHdgf§q m s8²¸sÂqQn i f t/lºSº fnporqSs l o iut n »
R
43
TVUOW+XLY Z[Y \]X3^_]`
m fnporqSs l o iut oHd l o3s8q m qSs8qufn oHdgf s l q
s8²¸s t qQn §qQn lºSº f ± ½
l py¦r'§r ¤¥ » iut s lºSº Qq n s8q m
qSs l q k
v P"®
=
¯V#¬#q ! v=P"®
¯
'0ÁFu b¯
?
?
µ dgf t f§oHdgf§q m s8²¸sÂqQn i f t/lºSº fnporqSs l o iut n »
)(
*
8
65 -
3
B
DFEHGE
oT LFW hW¡ ORQ T¡UK W
0z Y$
s]u°`]c ¢P [{ /_]_c|mU]`/_]^#x{
F
P( GU¯V#
HGV( ¯ y¯
S GU¯
F
#
5
F
S GV( ¯ y¯
F HGV( ¯ P¯EKb
'0Á#u ¯
}_ HGU¯# 5 HG0®/¯EKb # 5 SGV( ¯ P¯EKbrh '§P0§¥
P_[ ¦¤0M']qR£F u / F _bVydVy;hY] q,
_`c*
/]
%
« p" %
!'
SGU¯{
( GU¯V#
UO\CU¥
@
5
bedgf °
l
«:P"® SGU¯¯ P( GU¯EKb¸N
y
)( ® ¯
fn t qQn
® ¯#
y
n l orqQn fn
'0Á#u ¯
F
+ ( GU¯ HGU¯©KLGJN
9B
¬Jfpo
HGU¯ pf/oSdgf lº ²gf ik oSd l o¶s8q m qSs8qufn
f n fnpo¢qSs l o iut »
°l
I+ ( GU¯
»*bedgf m qQn oSdgf
5
¦XY\
A_}`/§P¢
[{_3``jn
q]aa]}
d
® ¯ # y v=yF® ¯ y¯'K"%# y y «:PF® HGU¯`¯ SGV( ¯EKLG y¯EKb
f
F
«:P"®z SGU¯¯ HG0®2¯EKLG+Kb§#
#
y
y
#
y
dµ
y «:PF®2
HGU¯`¯ y+( GU¯EKb
y
F
F
f
«:yF®z# HGU¯¯ P( GU¯ HGU¯EKLG+Kb
y
F
HGU¯©KLG
#zy
I ( GU¯ SGU¯©KLGJN
Iw qU}\2]"] HGU¯]U¢¢Z a!~]qU3/
V,c*coQ4_ + ( GU¯/_
}~}aabcoc*@4H¢
^s
_Z`{/Go
^5b!c*coQ4_H/
/_/
Ja 5 I ( GU¯ S GUE¯ KLG0u G
:]M} }5\
#^
fFa\q]`ch!a
=q]/#ÃP¢
[{_`/ce
V/]
q]]c*$\_ \$a]¢q!#\_/]_u
UO\CU¥
Hk«:yF® ¯ # P $ ¯ &
oHdgf m oHdgf
SGU¯#zy y+( GU¯'Kb/#w$y+( R
F
5
5
l
#xGU¯pN
fn/fnpo¢qSs l o iut qQn
'0Á#u ¯
Hk«:P"®z
¯%# ( $ ( oHdgf m oSdgf °l f n3fnporqSs l o iut Qq noSdgfs´f ± q lm
oHdgf i npof t q iut
P( GU¯ » Hkt«:P"® ¯ Qq n uf ti ium f ºQi nn Ho dgf m oHdgf °l
F
fnpo¢qSs l o iut qQn%oSdgf§s i;± f ik oHdgf i npo f t q iut
P( GU¯ »
F
?D
^D
3
)-
5
i k
fn
#X' }aa"#]Z~]c q]b!
^s_`] a]_uV#
P¢
[{_
!#a SGU¯0coc*@4_ I+ ( GU¯#65:Pg$ HGU¯`¯'& P+( GU¯EKbFu¦
jb#
F
!"
/#3^_Z
V/Z]q I ( GU¯H}/o_\z~/]h SGU¯ #^ez b!
a
/´
] -*{"_a^/5b!
]r 5y§$ SGU¯¯ P( GU¯EKb #c-#uF]aZ"oq]
F
S GU¯~}~$z/
0Á#u "u G
JI
;(
£ ¤ ¥§¦ ¨ U8^_©`^¬Jfpo R ¿2®NNN_® R

¯'H®e&z¯ µ dgf t f & qQn m i µ m »
²
i nf¶µf²nf l ¯'
® & ¯ t q iut kiut »bedgf °l fn fnporqSs l o iut µVqSoSd
d qQn
t fn gf oo i n;¹;² lt f ± f tpt[iut%ºQi nn%qQn§oSdgf i npo f t q iut s´f lm µ d¸q 3
L DD
D *
D
DFEHG
D
&
SR 2¿ ®NNN_® RJI¯V#
5
RÀ
I
&
I
I
&
-
*
G
gN
ORSORQ7Tj3NPLFW
#]ac ]q ^#oco#c*
Vfo!a\]coa\[
^
#^A}
\[
]
V/_c*F
p\]coa_/\]Z_/
] q
d]`{7##!#
}d}aa co/]´
q_} jz{£_!a/u nce
pco`/
d]r2
j [}¢
[{oq]c_\_/]d¸ P¢
[{_¢l/c*
],}n
\_]`/
j
q!\z/]A
coce
Mfvu
U;\CUu¥ .¬Jfpo @pf§oSdgf
fn t ² º f kiut n i s´f
t q iut
® V¯ # #q H® ¯N
L²DD
D
5
°l
¾
'0Á#u¯
?B
i nf%oHd l o
v=PF® ¯ bedgf m qQn%s8q m qSs llm+±
qQn
H® ±
¯ q]t
aa³N
'0Á#u¯
* lºSº f ±l5¥b§_oªI§!V§ ¥
»
"F!#U]s/
V5n#]c*c*
VfUuk/_=$
"g
]`!a= `!\2/
! ? v=P"® ¯/ ! ? v=yF®z ¯ u#F#\5
[
Z_/
*]q
nq!\z/]' a }~
{" a`
Y]$b!
a,/]Ac*
VfFg
ch!cAmF}
[Z$/
H® $¯ ! v=yF® ¯ u3_\ m
?
® ¯%¬`! =
v P"® ¯]/7!# =
v yF® %
¯ ?
?
® ¯
% 9
% }\2A\]b/
^#\_ '0Á#u ¯ u G
UO\ QU¥
L²DD i nf oSd l o qQn°oSdgf l 5 fn t ² º f µVqSoSd t f 3
ng
D f * o o i n i s´f,D t q iut » L²DD i nf k ² t oHdgf t oHd l o d l n *ium npo lm o
( ¾ v=PF® ¯# kiut n i s´f »°bedgf mn qQn%s8q m qSs l »
t qQn)
¢P [{`j H® ¯¶#c5v=PF® ¯ ,P¯'Kb8#$
#^
#\ v=P"® ¯ H® ¯q]
aaFu3
]} a{Ah`_Z"]!3] g
_c|u G
£¤ ¥/¦ ¨ U ^ _©`^_ ¼ u
i m npq ± f t oSdgf f t m i ² Sº º qs i;± f º µVqSoSdn;¹;² lt f ± f tt[iut
µ f ± So d l o oSdgf8fnporqSs l o iut
ºQi nn;» pm f
l s º f » ´µf½nd i FD :=< <
IZ
X
I
K0 HR ¯ #
[
d l n l
s´f lm
1#
qQn§s8q m
T
V
-
*ium
)(
*
Z
¿ R
Q

Q
Q
ED
gT V
npo b d¸qQn f nporqSs l o iut qQn1oSdgf
l m o t qQn k ² m ro q ium » e
i npof t q iut
lm+± dgf m f oSdgf °l f n t ² º f kiut So dgf t q iut%P~_/
® ¯ µVqSoSd
» /f m f
Ho dgf t f q i ² §
n oHdgf iut fps
oSd¸qQn8fnpo¢qSs l o iut
# qSs l »
*
Q G
£¤ ¥/¦ ¨ U ^_©`^
o¢q ium
¬Jfpo
5
* )-]@ 5
)-
D
^D B
¼ ium npq ± f t*l· l q m oSdgf
@
-
* 3
f tm i ² ºSº q ;²¸oµVqSoHd ºQi nn k ² m
«:6K® K ¯V# K06 K' 0 $ $PK K ¯'& ¯
K HR I ¯# K # S
Q
N
N
)(
d K$PK¯&
d K0 $2K¯
0
0
v=%K® K
#
¯V#
K ' 0 $PK¯ f # K'0 $PK¯
Q f Q
µ d¸q * d- l n l k ² m* orq ium ik K -qQn *ium npo lm o » o * lm @pf*nd i µ m oHd l 4o - kiut
oHd¸qQn ºQi nn k ² m* orq ium -
K HR I ¯ qQn°oSdgf l 5 fn%fnporqSs l o iut ² m+± f t oHdgf D t q iut
%K ¯ #c0 » §f m* f)- K qQn§s8q m qSs l »G
bedgf t qQn qQn
!"
!/
ab!l/]/]d
jA_}
V:/5c*c*
Vfn_`c*
/]
q]
3]c*
aco]F^_a
U;\CUu¥ ¬Jfpo R3¿z® NN;N_® RJI/ ¯pyF® 0Ã
¯ lm+±º fpo /
# R »°bedgf m
qQn8s8q m qSs l µVqSoSd t fnDgf * o o i lm 5 µf ºSº 3 @pfd l Bf ± ºQi nn k ² m* orq ium » o
U¨ ¨ U uU
¥°U XVZ uZ Z U§¨ UU¨ Qq n%oSdgf iumgº fnporqSs l o iut µVqSoSd oSd¸qQn t[i gf t o »
'UOZ ¥Z U;\]XU¤
wIq h
/
co_`
\st``\_/_^mv5/#]`c U]Zs^]F_
X
¥§¥ UpZ ¢Y \Z Z U
\ ¢Y¸Y X¸` ]:
a{d
/$ f"t f
c*a]}_u
U QU; ¨ Z \]¨ ¦
£ ¤ ¥§¦ ¨ U8^_©`^ ²
i nfoSd l o R ¯'PF®;0M¯lm+± oSd l o qQn m i µ m
Z\ 'UOZ \ ¥°U QU
o i º q fq m oSdgfq m o f t lº$ ® ! µ dgf t f ®0 »8bedgf´² m qr¹;²gf
]`
s8q m qSs l fnporqSs l o iut ² m+± f t n;¹;² lt f ± f tt[iut%ºQi nn%qQn
^D D
65
5
L DD
B
`/
/
J(
# HRk¯# 2
0 R¯
$&%$'$)(%_¯ $&% *$)(% ¯ » o * lm @pf
)-
µ g
d f t f 2
#"¯ #
nd i µ m oHd l oVoSd¸qQn
qQn oHdgf °l fn t ² º f*µVqSoSd t fn gf oo i oSdgf t q iut oHd l o +²¸oSn*s l nn + l o
lm+± s l nn + l o $ »-, iut f i f t qSo l m pf n d i µ m oHd l ooHdgf t qQn
qQn m i o ium pn o lm o ;²¸oqSo ±¸i fn n l orqQn k %v=PF®
¯$ ® ¯ kiut%lºSº nff
q · ² t f » » /f m f Ãbedgf iut fps
» 3qSs º q f n So d l o qQn s8q m qSs l »
5
G
*
;:=< <
8
DFEHG/.
F:
@
D *
* )-
D
B - * @
5
:=< 4:: FD
TOPQQ 3
0 OrL#NPO21V4365
D
:
7
)(
OPSORQ T¬TvS73 =T L#W
° ]
/
c*z/\Hco]F^_a
0/
V/lq {}[
j`!a
x{t\_]^/]m
/#dce
Mf#ch!c aj_a]F]"^k`c*
/]55
`][f#ce
V/a{c*c*
VfUu
]`^_~`"!
_^n`]~a] }\2e~`b!
_^n¡
a!Z `¡
#\u
wx
/
co_\c*]"^a¢}a¡

c*#a_m\[
`]}/
V
: ¦LY ¨=¨ ;
Z U /#hZ
¡
\hc ^]co
V/s
]n/#o``j]q,/# |98¸~
<=u QU
YuY V\/
U ]!a{db!
a/Z `¡
\_
> Q ( & ¯@? LY ¨ U Z U
¯ # ? ¯ Yu &BA ? p¯ N
v=PF®
u ¢Y X;U Y \/\/
U ¿
> Q ( ¯ `
}*
[} /# #/_$]Y
/
co_\hc*]"^_amU/sZ \_
]q/ |C8~ `][f#ce
V/a{
0
¯ A
=
P¯
Q:D
%F 6 =!
%
L
"
5
10
15
20
!
¸ % Ã

−0.5
2
0.0
0.5
!
;
« a
[
@!Ã[ !'a ! % ' !
/a %

C !' ¸ !'!
E

§
%'
©
\
\ #
}` D y¯~¢° `#q]c*
]u":\_m
v PF®
¯
Q=
A
D
0
P¯
0Á#ua0-¯
N
°]=
{£]=l/ce
]=ms\
U#]M} /
Voq]sa¡
` Q m
v=P"®2
¯ v=yF® ¯zu |]$_\a{m
ac a Ic7`! !# Q =
? ( ? v P ® ¯
0
D
P¯
N
0Á#ua0L0M¯

[{F$/
[mU
na]"\[
aPmUa
h/
coah_ m/ |C8~ c*Fg
c*
VfUuUwI$\
k
a`]ds]}/
t/ |C8~
#][fFc*
/_a{A/
P¢
[{_3`!a u
wx7`!coce
l{m 7
/
co_\|co]F^#a=}7a¡
A/
coam /
|C8~ h
`]fFce
a{'coce
MfY
^ P¢
[{_u,`=
\[
[Z
[
/_Y`!a|#[
j¬^#]M}¬} /kb!chA]qs
co_/_
a¡
`$
¢$zf"3zf#
coa`]}u
Z ¯py Z ® e& ¯ -&% #
£¤ ¥/¦ ¨ U ^_©`^ !8 X#"\ ¥ ¨¥°U X!$ ¬Jfpo
Q
S
0® NN;N® Q »Ã¬Jfpo
#

¿ ;® NNN® I¯ ± f m i o f oHdgf ± l o llm+±º fpo #
S
S
S
!"
P¿z®;NNNz®2CIb¯§±
f m i of§oSdgf§² m
(
m i µ m
D
R
ltEl s´fpof t n;»
nnp²³s´f½oHd l o
I Z
FI P ¿z®NN;N_®2CI¯~ Z & &
[¿
- oSdgf t f lt f l ns lm 5 D ltEl s´fpof t n
kiut n i s´f - » pm oHd¸qQns i;± f º *
U ¥ X " \ ¥ ¨ l n i @pnf t B l o¢q ium n;»§bedgf |C8~ IZ qQn Z# S Z # S ¿ ®;NNNz® S I¯ » m+± f t oSdgf
¥°U X ¦ Q\ ¨ Uu¥
Y ºQi nn k ² m* orq ium£«:PF®
¯# X ¿ $ ¯ & -%oSdgf t qQn)( ik oSdgf |C8~ qQn
¥°\CU LUuXU S ¨ Z u X v=P"®
¯# &» o * lm p@ f½nd i µ [ m oSd l ooSdgf§s8q m qSs lt qQn)
( qQn l DD t[i q43
YZ
¨ \³\
;O`
8X s l o f º65 e& e&p p&z¯lm+±§ium f * lm m+±½lm fnpo¢qSs l o iut oHd l o l* d¸q f B fn
X\³X ¦u S ¥ UpZ ¢Y UO oSd¸qQn t qQn)
( » L q m* f e& e&© p& ¯]/ e& - µf nff oSd l o d l n nps lºSº f t t qQn)(
Z[Y=¥*Z[Y\³X
¦ Q\ ¨ Uu¥
oSd lm oHdgf |C8~ » m D tEl* orq * f)- oHdgf ± q *
f t f m* fp@ fporµff m oHdgf t qQn)( n * lm
CU ¥*Z U¥ Z[Y ¨=¨ U<=LY u ¨ UuXZ Z\
Z L
Y p@ f3np² p@ npo lm orq lº »bed¸qQn3nd i µ n oHd l o/s l qSs8²³s º q ( f º qQd ii;± qQn m i o lm
D orqSs lº fnpo¢qSs l o iut q m d¸q · d ± qSs´f m npq ium+lº D ti @ º fpsÃn;»
i +
¥°\
U¨`
DFEHG
'3Q¨ORWMWMO,ORNPOr¡
|ce
Mf$l/c*
]
^8P¢
[{H_`/ce
V/]
``]"]"^
_ `/ce
V/]
/_~
¦/z{ Z¢c*
aa"``ju F]c*z/co¦¦
a]t!#_q!#a
/]=\2
/
\z/`Q4_
^|_`c*
/]_u
R
TVUOW+XLY Z[Y \]X3^_]`a^
m f npo¢qSs l o iut qQn
f qQnpoSn lm i oHdgf t½t ² º f pn ² d So d l o
v=P"®z
v=P"®z
£ ¤ ¥§¦

n;¹;² lt f
s8qQnnpq
nps lºSº f
v=¢F
Á ®
m it ² º
l ±± f
¯
@
¯
q k oHdgf t f
*
¯±q]3
aa ^
v=yF®
¯±q]3
a[
lt]h¸N
/
v=yF®
*
¨ U8^_©`^ ¬Jfpo R ¯'PF®;0M¯ lm+± ium npq ± f t fnporqSs l o¢q m¸·h µVqSoSd
HR¯#®Á » fµVq ºSº nd i µ oHd l o qQn l ±
± f ttiut´ºQi nn;» ¬Jfpo
º f;» ²
i nf m i o »°bedgf m oHdgf t f8f qQnpoSn l3± q *f t f m o t ² º f VµVqSoSd
¯ # - » §f m f -3#
t t qQn » m
lt o¢q ² ºalt v=¢ÁF® ¯ v=rÁ#®
¯ # 3
5 R SU
G ¯ $ Á¯'& HGJIÁ¯'KLG »%bed¸² n yHGU¯ # Á » i oSdgf t f8qQn
f oHd l o pf l oSn » fF m oSd i ² · d qQn l ± s8qQnnpq º f/qSo¶qQn pº f ltº l
qQnpq ium t ² º f;»
@ ,L DD
)(
D
y§py[[ ¥
*
A@
-
*
HG
B
)-
@
L
* )*
65
3
!% ¸
¢"
]*^#`x{©
eªI¤¦¥y¥*¤ Mqq]o_Z`{7'
#^7_Z`{
-#m5 ? ?( P¯EKb x-#u
UO\CU¥
h§F
V¤¦¥"s§Mr§pP[ ¦¥+ L²DD i nfoHd l o
¶ lm+± oSd l o v=yF®z ¯ qQn l2*ium orq m ² i ² n k ² m* orq ium ik3/kiut f Bf t 5
»¬Jfpo p@ f l D t q iut± f m npqSo 5 µVqSoSd k ² ºSº np²DD iut o lm+± º fpo @pfoSdgf
°l 5 fn t ² º f;» Sk oHdgf
l 5 fn t qQn)( qQn m qSo f%oSdgf m qQn l ± s8qQnnpq@ º f;»
F !#U]
^coà u__ ` fF`
:_`/
!\2 /
v=P"® ¯ v=yF® ¯*q]
aa3£
^jv=P ® ¯ /
!a p
v=P ®2
¯q]¢]co uz
#6v=y ®z ¯$ v=P ®2 ¯ x-#u F\_%v
\_]/b!]!m[` 0
-`!\2 v=yF® ¯O$ v=P"® ¯ q]
aa´P $ ®2 ¯zu 3
]}$m
H® ¯
$
® ¯
v=yF® y
#
#
¯ P¯EKb*$
y
v=P"®
¯ P¯EKb
Q v=P"®z ¯ $ v=P"®z
¯ y¯EKb
? y
v=P"® ¯ $ v=P"® ¯ y¯EKb
? ( y ? y¯'K"
? ( -]N
y
3_\ m H® ¯
®2 ¯zuc*a_U/
^]"#]0c*coQ4_
® ¯}\2A\]/
^\_qy
\_
t h/§P¢
[{:!au G
UO\CU¥
.
¬fpo R3¿ ®N;NN® RJI p
¯ H® e&z¯ »
t[iut§ºQi nn)- R qQn l ± s8qQnnpq@ º f;»
m+± f t n;¹;² lt f ± f t
~]"]q]q¢a¡
`/_]_c b!~/_\2\[
a#
^h ]co`/^
!#:/ ^
o:
:q]aa]}_u5]``]co[
A3
^co`#aq]
{n`\_/a{d]/Z`]_u#¦
j$/$`]/]oU%
¯' ® &_¯zu#X _
&s Zl{pa¡
m0d]``]c*
p5
#][fFc*
/_a{_"!
a]
R£u
3
!"
]} dco#c*
VfFx{
^´
^co`#ax{raj ^dwx£_
aRm¦
3
!#a ce
[{$~] m ]5]¦/_uP~!#¦`
]c* qy
\_ ajb
^coàx{o
Âcoce
VfFx{u
U;\CUu¥ L²DD i nf½oHd l o
npq@ º f;» bedgf m qSo qQn§s8q m qSs l »
j v=RF ®2¯
)(
*
d l n ium npo lm o t qQn
#
3
lm+± qQn l ± s8qQn
nq]e`]c* Vu wIqhY
}~_]
coce
Vf=A/_$zfF`/:
=!a !#\2|/
v=yF®
® ¯V#
uN
%¯ ! =
v P"® ¯ /¨!# =
v F
?
?
¢}~]!#aÂcoa{o/
Vt=
^c*àu G
3]} }=\[
`]MZo
`l/\_^Z`]]q~#]`c
b!
^|`]a]_u
*q]
0ÁFu@0
U;\CUu¥ ¬Jfpo R3¿z®N;NN_®ER;I/z¯'P"®0M¯ »bedgf m -² m+± f t n;¹;² lt f ±
f tt[iut§ºQi nn)- /
# R qQn%s8q m qSs l »
\_\]`^#k]'_]_c 0Á#u ¸0m
o
^co`#au ½0 Q }\2A\]`/
[u_!a:q]aa]}¢q`]c g
jd] q s
]`c 0ÁFu "u G
a ]!©coce
Mfp`!a_*
A]o!
/_^©]Y|
^co`g
ah/z{k
`\_a]`=]A
^coàu #
[{/
V n$_M¥ y
§py[[ ¥£q`rzfF`/
Y!a ^ -p`!\2 /
V
v=P"® ¯ /v=P"®z ¯ $ q]
aa"u
U;\CUu¥ npq@ º f;»
DFEHG
Hk

65
qQn°s8q m qSs l oHdgf m qSoqQn m i onpo t[ium¸·º
q m+l ± s8qQn
J¡VL#OPSWnJTUK T 3V F!]|/
R ¯pyF®0M¯=
^¨\]`^_*_`c*
/'}
b!
^±_]a]ut°]c /#pzZ"]!\z/] }~£jb]}i/
V
3
] « ¸ !' ¸ §!
SRk¯# R o
^#c*a u :]M} \]`^_*_`c*
/x}~]m,!#zg
a¡
^ "!
/_ §# y¿z®2 & ¯0
^5`!]/
R3¿z¯'P¿z®0¯
#^
¯V#YX & ¿ y $ ¯ & u
R
z¯'P ® 0M¯#^^#/a{mM}/ a]`«3yF®
&
&
[
3]`!#à{m H R¯ # R
^coàA}`1R #
H R3¿ ® R ¯ u 3]} \]^$/o_
a@ 4[
]r
] Z ]`ce
aZ co[
u
&
_
# P ¿ ® N;NN®2 [¯zm R # H R3¿ ;® NN NE® R¯}/ R ¯'P ® 0¯ Fg
^#^ ¯3
^ra]3«:PF® ¯ # X ¿ P $ ¯ & u "/_
`]!^_^
[
_Z`{]t}#d:`]MZ^d
[mq Á#m/## H RkV¯ #jR ,
^Fg
c*àuwI\[
Y*`]}Y
/sq]aa]}n_`c*
/]_mjb]}
/
co g "A_`c*
/]_m#
:`ce
aa_~`j

HRk¯#
d
0 $
Z$ Z R Z
X R &f
0Á#ua0M¯
Z
}`´#"¯ # ce
Mf "F®-bu,*l/c*
]=`jb* R o/] g
}¢
^
-#uHco`/
h/
m }_¨l/ce
ce
{£
/
csg
__m¢/_e
nZ a! ``jbpl/ce
_u¢
]``Z ]Ya¡
[{"h
pcoU]`2
b`]a=Yc*]"^_'#]
co_/`\
q!\z/]A_`c*
/]u
DFEHG =O ,NRO K T 1O nLFQ TUK ¦W
Iw 5hô\!a5/] ^£]F]jb5
s\_]Zsco]"^`p^\_`]'zg
]`{r[
V^_/
aPu \z/$]q ^\_`]]`{r\
U q]!^
_aa
o
^ P R--¯zm©P~` '0 ¯ m°!]''0 ¯¢
#^
c*
|
^ àa
A' 0 ¯zu
DFEHG HL"u
K FOPWL#W
0uwx=[
\2o]q/#¢q]aa]M}co]F^#a_m ^r ¯/ P¢
[{_,`jh
#^
/#/~P {:l/c*
]mF!*`b!
_^_]a]ù
P
¯
R
zP~]co¡
a
Q ®gK¯zm=K´zP~z2
gT ®aV¦¯zu
y¯¶R® ]`]U¯ m3
c*c*
"gT® V ¯ u
y\[¯R
z¯'P"® e&z¯~}` e&3jb]}r
^|§z¯p
® &z¯
u
u
!"
Fu:_ R3¿ ®N;NN® RJI6 ¯pyF® e&z¯n
#^ ` !]}~kl/ce

}/a]tq!#\_/]«:P"® ¯ # P/$ ¯ & & u F]} R ^#c*at
Âc*c*
VfUu
Á#u:_s
V¿ ®N;NN®2 Ud
/e
/
co_ \_u ,`]Z
~]l/`]co]F^/ P¢
[{_`c*
/]!^#4_]Vg
]#a]ù
#
u _aa¡
=
#^ ~P __u ¯:_
JI
R3¿z®NNN_® R =U =/
coaq`]cµ
e&[u ]^_`c*
/]`t]q,
^ `#!#/]|}/rZ \_
q]`c V& }` & =/|/
coaAZ `¡
\_u _*/|a]`
q!#\_/]nq]l/ce
e&3
«: & ® & ¯ #
&
&
d
$x0 $pa]
&
&
f
N
° ^*]F/c*
aZ a!#t]q ¢ /
coc*@4H`j=q]~
aa
e&u
Fu rP~à#m0 LÁ¯zu"_
a]¢q!#\_/]
a Q ® K¯
^d!#U]t
R®cP]co¡
«:%K® K ¯
#
d
0 $
K
K f
&
}_8-E/WK / 0u ]^3/h_`c*
/] K SRk¯ # -FuU
_`/ce
V/]:qy
aa]!#^#h/=
co_/_
\¢-#®;0M¯3!#$}
}aa
aa]}/u F#]M}'/
K S Rk¯ #.-/#!#b!mco#c*
Vf
`!a u
u \]¥§¦ ZU ¶£¤¸¦LU ¢Y=¥ UuXZp` ¯ ]c*
5hj]qhcoa ^
co g "el/ce
]t'0Á#ua0M¯{e`ch!#a¡
]u`{=Z g
]!hZ a!#o]q Q ^´Z ]!sZ_\_]eFu F!coc*
@4d{]!
`!#a/_u
Part III Statistical Models and Methods
14. Linear Regression
Regression is a method for studying the relationship between a response variable $Y$ and a covariate $X$. The covariate is also called a predictor variable or a feature. Later, we will generalize and allow for more than one covariate. The data are of the form
$$(Y_1, X_1), \ldots, (Y_n, X_n).$$
One way to summarize the relationship between $X$ and $Y$ is through the regression function
$$r(x) = E(Y \mid X = x) = \int y\, f(y \mid x)\, dy. \tag{14.1}$$
Most of this chapter is concerned with estimating the regression function.
Note: The term "regression" is due to Sir Francis Galton (1822-1911) who noticed that tall and short men tend to have sons with heights closer to the mean. He called this "regression towards the mean."
14.1 Simple Linear Regression
The simplest version of regression is when $X_i$ is simple (a scalar not a vector) and $r(x)$ is assumed to be linear:
$$r(x) = \beta_0 + \beta_1 x.$$
This model is called the simple linear regression model. Let $\epsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$. Then,
$$\begin{aligned}
E(\epsilon_i \mid X_i) &= E\big(Y_i - (\beta_0 + \beta_1 X_i) \mid X_i\big) = E(Y_i \mid X_i) - (\beta_0 + \beta_1 X_i) \\
&= r(X_i) - (\beta_0 + \beta_1 X_i) \\
&= (\beta_0 + \beta_1 X_i) - (\beta_0 + \beta_1 X_i) \\
&= 0.
\end{aligned}$$
Let $\sigma^2(x) = V(\epsilon_i \mid X = x)$. We will make the further simplifying assumption that $\sigma^2(x) = \sigma^2$ does not depend on $x$. We can thus write the linear regression model as follows.
Definition 14.1 The Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \tag{14.2}$$
where $E(\epsilon_i \mid X_i) = 0$ and $V(\epsilon_i \mid X_i) = \sigma^2$.
Example 14.2 Figure 14.1 shows a plot of Log surface temperature (Y) versus Log light intensity (X) for some nearby stars. Also on the plot is an estimated linear regression line which will be explained shortly.
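The unknown parameters in the model are the intercept $\beta_0$ and the slope $\beta_1$ and the variance $\sigma^2$. Let $\hat\beta_0$ and $\hat\beta_1$ denote estimates of $\beta_0$ and $\beta_1$. The fitted line is defined to be
$$\hat r(x) = \hat\beta_0 + \hat\beta_1 x. \tag{14.3}$$
The predicted values or fitted values are $\hat Y_i = \hat r(X_i)$ and the residuals are defined to be
$$\hat\epsilon_i = Y_i - \hat Y_i = Y_i - \left(\hat\beta_0 + \hat\beta_1 X_i\right). \tag{14.4}$$
The residual sums of squares or rss is defined by
$$\text{rss} = \sum_{i=1}^n \hat\epsilon_i^2.$$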
FIGURE 14.1. Data on nearby stars. Horizontal axis: Log light intensity (X); vertical axis: Log surface temperature (Y).
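The quantity rss measures how well the fitted line fits the data.
Definition 14.3 The least squares estimates are the values $\hat\beta_0$ and $\hat\beta_1$ that minimize $\text{rss} = \sum_{i=1}^n \hat\epsilon_i^2$.
Theorem 14.4 The least squares estimates are given by
$$\hat\beta_1 = \frac{\sum_{i=1}^n (X_i - \bar X_n)(Y_i - \bar Y_n)}{\sum_{i=1}^n (X_i - \bar X_n)^2}, \tag{14.5}$$
$$\hat\beta_0 = \bar Y_n - \hat\beta_1 \bar X_n. \tag{14.6}$$
An unbiased estimate of $\sigma^2$ is
$$\hat\sigma^2 = \left(\frac{1}{n-2}\right) \sum_{i=1}^n \hat\epsilon_i^2. \tag{14.7}$$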
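Formulas (14.5) to (14.7) translate directly into code. The sketch below is a plain-Python illustration; the function name `least_squares` and the small data set are made up for the example and are not the star data.

```python
def least_squares(x, y):
    """Return (beta0_hat, beta1_hat, sigma2_hat) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of (14.5).
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar                     # (14.6)
    residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in residuals)
    sigma2_hat = rss / (n - 2)                                # (14.7)
    return beta0_hat, beta1_hat, sigma2_hat


# Made-up data lying close to the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(least_squares(x, y))
```

The divisor $n - 2$ reflects the two estimated coefficients; at least three observations with distinct $X$ values are needed for the function to return a variance estimate.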
Example 14.5 Consider the star data from Example 14.2. The least squares estimates are $\hat\beta_0 = 3.58$ and $\hat\beta_1 = 0.166$. The fitted line $\hat r(x) = 3.58 + 0.166x$ is shown in Figure 14.1.
Example 14.6 (The 2000 Presidential Election.) Figure 14.2 shows the plot of votes for Buchanan (Y) versus votes for Bush (X) in Florida. The least squares estimates (omitting Palm Beach County) and the standard errors are
$$\hat\beta_0 = 66.0991, \qquad \widehat{\text{se}}(\hat\beta_0) = 17.2926$$
$$\hat\beta_1 = 0.0035, \qquad \widehat{\text{se}}(\hat\beta_1) = 0.0002.$$
The fitted line is
$$\text{Buchanan} = 66.0991 + .0035\ \text{Bush}.$$
(We will see later how the standard errors were computed.) Figure 14.2 also shows the residuals. The inferences from linear regression are most accurate when the residuals behave like random normal numbers. Based on the residual plot, this is not the case in this example. If we repeat the analysis replacing votes with log(votes) we get
$$\hat\beta_0 = -2.3298, \qquad \widehat{\text{se}}(\hat\beta_0) = .3529$$
$$\hat\beta_1 = 0.7303, \qquad \widehat{\text{se}}(\hat\beta_1) = 0.0358.$$
This gives the fit
$$\log(\text{Buchanan}) = -2.3298 + .7303 \log(\text{Bush}).$$
The residuals look much healthier. Later, we shall address two
interesting questions: (1) how do we see if Palm Beach County
has a statistically plausible outcome? (2) how do we do this problem nonparametrically?
14.2 Least Squares and Maximum Likelihood
Suppose we add the assumption that $\epsilon_i \mid X_i \sim N(0, \sigma^2)$, that is,
$$Y_i \mid X_i \sim N(\mu_i, \sigma^2)$$
where $\mu_i = \beta_0 + \beta_1 X_i$. The likelihood function is
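$$\prod_{i=1}^n f(X_i, Y_i) = \prod_{i=1}^n f_X(X_i)\, f_{Y|X}(Y_i \mid X_i) = \prod_{i=1}^n f_X(X_i) \times \prod_{i=1}^n f_{Y|X}(Y_i \mid X_i) = \mathcal{L}_1 \times \mathcal{L}_2$$
where $\mathcal{L}_1 = \prod_{i=1}^n f_X(X_i)$ and
$$\mathcal{L}_2 = \prod_{i=1}^n f_{Y|X}(Y_i \mid X_i). \tag{14.8}$$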
FIGURE 14.2. Voting Data for Election 2000. Panels: Buchanan versus Bush, with the residuals plotted against Bush; log Buchanan versus log Bush, with the residuals plotted against log Bush.
The term $\mathcal{L}_1$ does not involve the parameters $\beta_0$ and $\beta_1$. We shall focus on the second term $\mathcal{L}_2$ which is called the conditional likelihood, given by
$$\mathcal{L}_2 \equiv \mathcal{L}(\beta_0, \beta_1, \sigma) = \prod_{i=1}^n f_{Y|X}(Y_i \mid X_i) \propto \sigma^{-n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_i (Y_i - \mu_i)^2 \right\}.$$
The conditional log-likelihood is
$$\ell(\beta_0, \beta_1, \sigma) = -n \log \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^n \Big( Y_i - (\beta_0 + \beta_1 X_i) \Big)^2. \tag{14.9}$$
To find the mle of $(\beta_0, \beta_1)$ we maximize $\ell(\beta_0, \beta_1, \sigma)$. From (14.9) we see that maximizing the likelihood is the same as minimizing the rss $\sum_{i=1}^n \big( Y_i - (\beta_0 + \beta_1 X_i) \big)^2$. Therefore, we have shown the following.
Theorem 14.7 Under the assumption of Normality, the least squares estimator is also the maximum likelihood estimator.
We can also maximize $\ell(\beta_0, \beta_1, \sigma)$ over $\sigma$ yielding the mle
$$\hat\sigma^2 = \frac{1}{n} \sum_i \hat\epsilon_i^2. \tag{14.10}$$
This estimator is similar to, but not identical to, the unbiased estimator. Common practice is to use the unbiased estimator (14.7).
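For readers who want the intermediate step behind (14.10), a short derivation follows. It uses only (14.9), with rss denoting the residual sum of squares evaluated at the least squares estimates; the last line is an added remark comparing the result with (14.7).

```latex
\begin{align*}
\ell(\hat\beta_0, \hat\beta_1, \sigma)
  &= -n \log \sigma - \frac{\mathrm{rss}}{2\sigma^2}, \\
\frac{\partial \ell}{\partial \sigma}
  &= -\frac{n}{\sigma} + \frac{\mathrm{rss}}{\sigma^3} = 0
  \;\Longrightarrow\;
  \hat\sigma^2_{\mathrm{mle}} = \frac{\mathrm{rss}}{n}
  = \frac{1}{n} \sum_{i=1}^n \hat\epsilon_i^2, \\
\hat\sigma^2_{\mathrm{mle}}
  &= \frac{n-2}{n}\, \hat\sigma^2_{\mathrm{unbiased}} .
\end{align*}
```

The two estimators therefore differ by the factor $(n-2)/n$, which is negligible for large $n$.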
14.3 Properties of the Least Squares Estimators
We now record the standard errors and limiting distribution of the least squares estimator. In regression problems, we usually focus on the properties of the estimators conditional on $X^n = (X_1, \ldots, X_n)$. Thus, we state the means and variances as conditional means and variances.
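Theorem 14.8 Let $\hat\beta^T = (\hat\beta_0, \hat\beta_1)^T$ denote the least squares estimators. Then,
$$E(\hat\beta \mid X^n) = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$$
$$V(\hat\beta \mid X^n) = \frac{\sigma^2}{n\, s_X^2} \begin{pmatrix} \frac{1}{n} \sum_{i=1}^n X_i^2 & -\bar X_n \\ -\bar X_n & 1 \end{pmatrix} \tag{14.11}$$
where $s_X^2 = n^{-1} \sum_{i=1}^n (X_i - \bar X_n)^2$.
The estimated standard errors of $\hat\beta_0$ and $\hat\beta_1$ are obtained by taking the square roots of the corresponding diagonal terms of $V(\hat\beta \mid X^n)$ and inserting the estimate $\hat\sigma$ for $\sigma$. Thus,
$$\widehat{\text{se}}(\hat\beta_0) = \frac{\hat\sigma}{s_X \sqrt{n}} \sqrt{\frac{\sum_{i=1}^n X_i^2}{n}} \tag{14.12}$$
$$\widehat{\text{se}}(\hat\beta_1) = \frac{\hat\sigma}{s_X \sqrt{n}}. \tag{14.13}$$
We should really write these as $\widehat{\text{se}}(\hat\beta_0 \mid X^n)$ and $\widehat{\text{se}}(\hat\beta_1 \mid X^n)$ but we will use the shorter notation $\widehat{\text{se}}(\hat\beta_0)$ and $\widehat{\text{se}}(\hat\beta_1)$.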
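The standard errors (14.12) and (14.13) can be computed alongside the estimates. The following sketch is an illustration only: the function name `least_squares_with_se` and the toy data are assumptions, and $\hat\sigma$ is taken from the unbiased estimator (14.7).

```python
from math import sqrt


def least_squares_with_se(x, y):
    """Return ((beta0_hat, se0), (beta1_hat, se1)) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)            # equals n * s_X^2
    beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    rss = sum((yi - (beta0_hat + beta1_hat * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = sqrt(rss / (n - 2))                     # square root of (14.7)
    s_x = sqrt(sxx / n)                                 # s_X as defined under (14.11)
    se_beta1 = sigma_hat / (s_x * sqrt(n))              # (14.13)
    se_beta0 = se_beta1 * sqrt(sum(xi * xi for xi in x) / n)   # (14.12)
    return (beta0_hat, se_beta0), (beta1_hat, se_beta1)


# Made-up data, used only to exercise the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(least_squares_with_se(x, y))
```

Note that $\widehat{\text{se}}(\hat\beta_0)$ is just $\widehat{\text{se}}(\hat\beta_1)$ scaled by the root mean square of the $X_i$, so the intercept is estimated poorly when the covariate values lie far from 0.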
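Theorem 14.9 Under appropriate conditions we have:
1. (Consistency): $\hat\beta_0 \xrightarrow{P} \beta_0$ and $\hat\beta_1 \xrightarrow{P} \beta_1$.
2. (Asymptotic Normality):
$$\frac{\hat\beta_0 - \beta_0}{\widehat{\text{se}}(\hat\beta_0)} \rightsquigarrow N(0, 1) \quad\text{and}\quad \frac{\hat\beta_1 - \beta_1}{\widehat{\text{se}}(\hat\beta_1)} \rightsquigarrow N(0, 1).$$
3. Approximate $1 - \alpha$ confidence intervals for $\beta_0$ and $\beta_1$ are
$$\hat\beta_0 \pm z_{\alpha/2}\, \widehat{\text{se}}(\hat\beta_0) \quad\text{and}\quad \hat\beta_1 \pm z_{\alpha/2}\, \widehat{\text{se}}(\hat\beta_1). \tag{14.14}$$
The Wald test for testing $H_0 : \beta_1 = 0$ versus $H_1 : \beta_1 \neq 0$ is: reject $H_0$ if $|W| > z_{\alpha/2}$ where $W = \hat\beta_1 / \widehat{\text{se}}(\hat\beta_1)$.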
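The interval (14.14) and the Wald test of Theorem 14.9 need only an estimate and its standard error. The sketch below is illustrative: the function name `wald_slope_test` is an assumption, and the inputs in the usage line are the log-scale slope estimate and standard error reported for the election data in this chapter.

```python
from statistics import NormalDist


def wald_slope_test(beta1_hat, se_beta1, alpha=0.05):
    """Wald statistic, two-sided p-value, 1 - alpha interval and test decision."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)            # z_{alpha/2}
    w = beta1_hat / se_beta1                             # Wald statistic for H0: beta1 = 0
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(w)))       # P(|Z| > |W|)
    interval = (beta1_hat - z * se_beta1, beta1_hat + z * se_beta1)   # (14.14)
    reject = abs(w) > z
    return w, p_value, interval, reject


# Log-scale election fit: slope .7303 with estimated standard error .0358.
print(wald_slope_test(0.7303, 0.0358))
```

The text rounds $z_{.025} \approx 1.96$ up to 2 when quoting intervals, so hand calculations can differ from this function in the third decimal place.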
Example 14.10 For the election data, on the log scale, a 95 per cent confidence interval is $.7303 \pm 2(.0358) = (.66, .80)$. The interval excludes 0. The Wald statistic for testing $H_0 : \beta_1 = 0$ versus $H_1 : \beta_1 \neq 0$ is $W = |.7303 - 0|/.0358 = 20.40$ with a p-value of $P(|Z| > 20.40) \approx 0$. This is strong evidence that the true slope is not 0.
14.4 Prediction
Suppose we have estimated a regression model $\hat r(x) = \hat\beta_0 + \hat\beta_1 x$ from data $(X_1, Y_1), \ldots, (X_n, Y_n)$. We observe the value $X = x_*$ of the covariate for a new subject and we want to predict their outcome $Y_*$. An estimate of $Y_*$ is
$$\hat Y_* = \hat\beta_0 + \hat\beta_1 x_*. \tag{14.15}$$
Using the formula for the variance of the sum of two random variables,
$$V(\hat Y_*) = V(\hat\beta_0 + \hat\beta_1 x_*) = V(\hat\beta_0) + x_*^2\, V(\hat\beta_1) + 2 x_*\, \text{Cov}(\hat\beta_0, \hat\beta_1).$$
Theorem 14.8 gives the formulas for all the terms in this equation. The estimated standard error $\widehat{\text{se}}(\hat Y_*)$ is the square root of this variance, with $\hat\sigma^2$ in place of $\sigma^2$. However, the confidence interval for $Y_*$ is not of the usual form $\hat Y_* \pm z_{\alpha/2}\, \widehat{\text{se}}$. The appendix explains why. The correct form of the confidence interval is given in the following Theorem. We call the interval a prediction interval.
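Theorem 14.11 (Prediction Interval) Let
$$\hat\xi_n^2 = \widehat{\text{se}}^2(\hat Y_*) + \hat\sigma^2 = \hat\sigma^2 \left( \frac{\sum_{i=1}^n (X_i - X_*)^2}{n \sum_i (X_i - \bar X)^2} + 1 \right). \tag{14.16}$$
An approximate $1 - \alpha$ prediction interval for $Y_*$ is
$$\hat Y_* \pm z_{\alpha/2}\, \hat\xi_n. \tag{14.17}$$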
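Theorem 14.11 can be sketched in code as follows. The function name `prediction_interval`, its return layout, and the toy data are assumptions made for illustration; the variance estimate is the unbiased one from (14.7).

```python
from math import sqrt
from statistics import NormalDist


def prediction_interval(x, y, x_star, alpha=0.05):
    """Return (y_star_hat, xi_hat, (lower, upper)) following (14.15)-(14.17)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    rss = sum((yi - (beta0_hat + beta1_hat * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)                                     # (14.7)
    # (14.16): variance of the fitted value plus the noise variance of a new outcome.
    spread = sum((xi - x_star) ** 2 for xi in x) / (n * sxx)
    xi_hat = sqrt(sigma2_hat * (spread + 1.0))
    y_star_hat = beta0_hat + beta1_hat * x_star                    # (14.15)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return y_star_hat, xi_hat, (y_star_hat - z * xi_hat, y_star_hat + z * xi_hat)


# Made-up data; predicting at the sample mean and far outside the observed range.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(prediction_interval(x, y, x_star=3.0))
print(prediction_interval(x, y, x_star=10.0))
```

The second call shows the interval widening as $x_*$ moves away from the observed covariates, since $\sum_i (X_i - x_*)^2$ grows while the $+1$ term (the noise in the new outcome) stays fixed.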
Example 14.12 (Election Data Revisited.) On the log-scale, our linear regression gives the following prediction equation: $\log(\text{Buchanan}) = -2.3298 + .7303 \log(\text{Bush})$. In Palm Beach, Bush had 152954 votes and Buchanan had 3467 votes. On the log scale this is 11.93789 and 8.151045. How likely is this outcome, assuming our regression model is appropriate? Our prediction for log Buchanan votes is $-2.3298 + .7303\,(11.93789) = 6.388441$. Now 8.151045 is bigger than 6.388441 but is it "significantly" bigger? Let us compute a confidence interval. We find that $\hat\xi_n = .093775$ and the approximate 95 per cent confidence interval is (6.200, 6.578) which clearly excludes 8.151. Indeed, 8.151 is nearly 20 standard errors from $\hat Y_*$. Going back to the vote scale by exponentiating, the confidence interval is (493, 717) compared to the actual number of votes which is 3467.
14.5 Multiple Regression
Now suppose we have $k$ covariates $X_1, \ldots, X_k$. The data are of the form
$$(Y_1, X_1), \ldots, (Y_i, X_i), \ldots, (Y_n, X_n)$$
where
$$X_i = (X_{i1}, \ldots, X_{ik}).$$
Here, $X_i$ is the vector of $k$ covariate values for the $i$th observation. The linear regression model is
$$Y_i = \sum_{j=1}^k \beta_j X_{ij} + \epsilon_i \tag{14.18}$$
for $i = 1, \ldots, n$, where $E(\epsilon_i \mid X_{i1}, \ldots, X_{ik}) = 0$. Usually we want to include an intercept in the model which we can do by setting $X_{i1} = 1$ for $i = 1, \ldots, n$. At this point it will be more convenient to express the model in matrix notation. The outcomes will be denoted by
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$$
and the covariates will be denoted by
$$X = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1k} \\ X_{21} & X_{22} & \cdots & X_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nk} \end{pmatrix}.$$
Each row is one observation; the columns correspond to the $k$ covariates. Thus, $X$ is a $(n \times k)$ matrix. Let
$$\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} \quad\text{and}\quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}.$$
Then we can write (14.18) as
$$Y = X\beta + \epsilon. \tag{14.19}$$
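Theorem 14.13 Assuming that the $(k \times k)$ matrix $X^T X$ is invertible, the least squares estimate is
$$\hat{\beta} = (X^T X)^{-1} X^T Y. \tag{14.20}$$
The estimated regression function is
$$\hat{r}(x) = \sum_{j=1}^{k} \hat{\beta}_j x_j. \tag{14.21}$$
The variance-covariance matrix of $\hat{\beta}$ is
$$V(\hat{\beta} \mid X^n) = \sigma^2 (X^T X)^{-1}.$$
Under appropriate conditions,
$$\hat{\beta} \approx N\left(\beta, \sigma^2 (X^T X)^{-1}\right).$$
An unbiased estimate of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{n-k} \sum_{i=1}^{n} \hat{\epsilon}_i^2$$
where $\hat{\epsilon} = Y - X\hat{\beta}$ is the vector of residuals. An approximate $1 - \alpha$ confidence interval for $\beta_j$ is
$$\hat{\beta}_j \pm z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat{\beta}_j) \tag{14.22}$$
where $\widehat{\mathrm{se}}^2(\hat{\beta}_j)$ is the $j$th diagonal element of the matrix $\hat{\sigma}^2 (X^T X)^{-1}$.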
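The formulas of Theorem 14.13 translate almost line by line into matrix code. The following is a minimal sketch, assuming NumPy is available; the function name `least_squares` and the simulated data are illustrative choices, not part of the text.

```python
import numpy as np

def least_squares(X, Y):
    """Return (beta_hat, se_hat, sigma2_hat) for the model Y = X beta + eps."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)        # assumes X^T X is invertible
    beta_hat = XtX_inv @ X.T @ Y            # least squares estimate, equation (14.20)
    resid = Y - X @ beta_hat                # vector of residuals
    sigma2_hat = resid @ resid / (n - k)    # unbiased estimate of sigma^2
    se_hat = np.sqrt(np.diag(sigma2_hat * XtX_inv))  # standard errors used in (14.22)
    return beta_hat, se_hat, sigma2_hat

# Simulated data (hypothetical): an intercept column plus two covariates.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat, se_hat, sigma2_hat = least_squares(X, Y)
lower = beta_hat - 1.96 * se_hat            # approximate 95 per cent intervals
upper = beta_hat + 1.96 * se_hat
```

In practice one would usually call a solver such as `np.linalg.lstsq` rather than forming the inverse explicitly; the explicit inverse is used here only because it mirrors the theorem.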
Example 14.14 Crime data on 47 states in 1960 can be obtained at http://lib.stat.cmu.edu/DASL/Stories/USCrime.html. If we fit a linear regression of crime rate on 10 variables we get the following:

Covariate               Least Squares   Estimated        t value   p-value
                        Estimate        Standard Error
(Intercept)             -589.39         167.59           -3.51     0.001 **
Age                        1.04           0.45            2.33     0.025 *
Southern State            11.29          13.24            0.85     0.399
Education                  1.18           0.68            1.7      0.093
Expenditures               0.96           0.25            3.86     0.000 ***
Labor                      0.11           0.15            0.69     0.493
Number of Males            0.30           0.22            1.36     0.181
Population                 0.09           0.14            0.65     0.518
Unemployment (14-24)      -0.68           0.48           -1.4      0.165
Unemployment (25-39)       2.15           0.95            2.26     0.030 *
Wealth                    -0.08           0.09           -0.91     0.367

This table is typical of the output of a multiple regression program. The "t-value" is the Wald test statistic for testing $H_0 : \beta_j = 0$ versus $H_1 : \beta_j \neq 0$. The asterisks denote "degree of significance" with more asterisks being significant at a smaller level. The example raises several important questions. In particular: (1) should we eliminate some variables from this model? (2) should we interpret these relationships as causal? For example, should we conclude that low crime prevention expenditures cause high crime rates? We will address question (1) in the next section. We will not address question (2) until a later chapter.
14.6
Model Selection
Example 14.14 illustrates a problem that often arises in multiple regression. We may have data on many covariates but we may not want to include all of them in the model. A smaller model with fewer covariates has two advantages: it might give better predictions than a big model and it is more parsimonious (simpler). Generally, as you add more variables to a regression, the bias of the predictions decreases and the variance increases. Too few covariates yields high bias; too many covariates yields high variance. Good predictions result from achieving a good balance between bias and variance.
In model selection there are two problems: (i) assigning a "score" to each model which measures, in some sense, how good the model is and (ii) searching through all the models to find the model with the best score.
Let us first discuss the problem of scoring models. Let $S \subset \{1, \ldots, k\}$ and let $\mathcal{X}_S = \{X_j : j \in S\}$ denote a subset of the covariates. Let $\beta_S$ denote the coefficients of the corresponding set of covariates and let $\hat{\beta}_S$ denote the least squares estimate of $\beta_S$. Also, let $X_S$ denote the $X$ matrix for this subset of covariates and define $\hat{r}_S(x)$ to be the estimated regression function from (14.21). The predicted values from model $S$ are denoted by $\hat{Y}_i(S) = \hat{r}_S(X_i)$. The prediction risk is defined to be
$$R(S) = \sum_{i=1}^{n} E\left(\hat{Y}_i(S) - Y_i^*\right)^2 \tag{14.23}$$
where $Y_i^*$ denotes the value of a future observation of $Y_i$ at covariate value $X_i$. Our goal is to choose $S$ to make $R(S)$ small. The training error is defined to be
$$\hat{R}_{tr}(S) = \sum_{i=1}^{n} \left(\hat{Y}_i(S) - Y_i\right)^2.$$
This estimate is very biased and under-estimates $R(S)$.
Theorem 14.15 The training error is a downward biased estimate of the prediction risk:
$$E(\hat{R}_{tr}(S)) < R(S).$$
In fact,
$$\mathrm{bias}(\hat{R}_{tr}(S)) = E(\hat{R}_{tr}(S)) - R(S) = -2 \sum_{i=1}^{n} \mathrm{Cov}(\hat{Y}_i, Y_i). \tag{14.24}$$
The reason for the bias is that the data are being used twice: to estimate the parameters and to estimate the risk. When we fit a complex model with many parameters, the covariance $\mathrm{Cov}(\hat{Y}_i, Y_i)$ will be large and the bias of the training error gets worse. In summary, the training error is a poor estimate of risk. Here are some better estimates.
Mallows' $C_p$ statistic is defined by
$$\hat{R}(S) = \hat{R}_{tr}(S) + 2|S|\hat{\sigma}^2 \tag{14.25}$$
where $|S|$ denotes the number of terms in $S$ and $\hat{\sigma}^2$ is the estimate of $\sigma^2$ obtained from the full model (with all covariates in the model). This is simply the training error plus a bias correction. This estimate is named in honor of Colin Mallows who invented it. The first term in (14.25) measures the fit of the model while the second measures the complexity of the model. Think of the $C_p$ statistic as:
lack of fit + complexity penalty.
Thus, finding a good model involves trading off fit and complexity.
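As an illustration of the $C_p$ score, here is a small sketch that computes the training error of a sub-model and adds the penalty $2|S|\hat{\sigma}^2$, with $\hat{\sigma}^2$ taken from the full model. It assumes NumPy; the helper names `rss` and `mallows_cp` and the simulated data are hypothetical.

```python
import numpy as np

def rss(X, Y):
    """Training error: residual sum of squares of the least squares fit."""
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta_hat
    return float(resid @ resid)

def mallows_cp(X, Y, S):
    """Cp score (14.25) for the sub-model that uses the columns listed in S."""
    n, k = X.shape
    sigma2_hat = rss(X, Y) / (n - k)          # sigma^2 estimated from the full model
    return rss(X[:, S], Y) + 2 * len(S) * sigma2_hat

# Simulated data (hypothetical): only the intercept and the first covariate matter.
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
Y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)

cp_true = mallows_cp(X, Y, [0, 1])            # intercept plus the relevant covariate
cp_small = mallows_cp(X, Y, [0])              # intercept only: large lack of fit
cp_full = mallows_cp(X, Y, [0, 1, 2, 3])      # full model: larger complexity penalty
```

Smaller scores are better, so the intercept-only model is penalized through its lack of fit while the full model pays the larger complexity penalty.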
A related method for estimating risk is AIC (Akaike Information Criterion). The idea is to choose $S$ to maximize
$$\ell_S - |S| \tag{14.26}$$
where $\ell_S$ is the log-likelihood of the model evaluated at the mle. This can be thought of as "goodness of fit" minus "complexity." In linear regression with Normal errors, maximizing AIC is equivalent to minimizing Mallows' $C_p$; see exercise 8. (Some texts use a slightly different definition of AIC which involves multiplying the definition here by 2 or -2. This has no effect on which model is selected.)
Yet another method for estimating risk is leave-one-out cross-validation. In this case, the risk estimator is
$$\hat{R}_{CV}(S) = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_{(i)}\right)^2 \tag{14.27}$$
where $\hat{Y}_{(i)}$ is the prediction for $Y_i$ obtained by fitting the model with $Y_i$ omitted. It can be shown that
$$\hat{R}_{CV}(S) = \sum_{i=1}^{n} \left( \frac{Y_i - \hat{Y}_i(S)}{1 - U_{ii}(S)} \right)^2 \tag{14.28}$$
where $U_{ii}(S)$ is the $i$th diagonal element of the matrix
$$U(S) = X_S (X_S^T X_S)^{-1} X_S^T. \tag{14.29}$$
Thus, one need not actually drop each observation and re-fit the model. A generalization is k-fold cross-validation. Here we divide the data into $k$ groups; often people take $k = 10$. We omit one group of data and fit the models to the remaining data. We use the fitted model to predict the data in the group that was omitted. We then estimate the risk by $\sum_i (Y_i - \hat{Y}_i)^2$ where the sum is over the data points in the omitted group. This process is repeated for each of the $k$ groups and the resulting risk estimates are averaged.
For linear regression, Mallows' $C_p$ and cross-validation often yield essentially the same results so one might as well use Mallows' method. In some of the more complex problems we will discuss later, cross-validation will be more useful.
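The identity behind (14.28) can be checked numerically. The sketch below, which assumes NumPy and uses hypothetical function names and simulated data, computes the leave-one-out score once with the shortcut based on the diagonal of $U$ and once by actually refitting the model $n$ times.

```python
import numpy as np

def loo_cv_shortcut(X, Y):
    """Leave-one-out score via (14.28), using the diagonal of U from (14.29)."""
    U = X @ np.linalg.inv(X.T @ X) @ X.T
    fitted = U @ Y
    return float(np.sum(((Y - fitted) / (1 - np.diag(U))) ** 2))

def loo_cv_brute_force(X, Y):
    """Leave-one-out score via (14.27): refit with each observation omitted."""
    n = len(Y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i                      # drop observation i
        beta_hat, *_ = np.linalg.lstsq(X[keep], Y[keep], rcond=None)
        total += (Y[i] - X[i] @ beta_hat) ** 2        # squared prediction error at i
    return float(total)

# Simulated data (hypothetical).
rng = np.random.default_rng(2)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

score_fast = loo_cv_shortcut(X, Y)
score_slow = loo_cv_brute_force(X, Y)
```

The two scores should agree up to floating point error, which is why the refitting loop is unnecessary for least squares.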
Another scoring method is BIC (Bayesian information criterion). Here we choose a model to maximize
$$\mathrm{BIC}(S) = \ell_S - \frac{|S|}{2} \log n. \tag{14.30}$$
The BIC score has a Bayesian interpretation. Let $\mathcal{S} = \{S_1, \ldots, S_m\}$ denote a set of models. Suppose we assign the prior $P(S_j) = 1/m$ over the models. Also, assume we put a smooth prior on the parameters within each model. It can be shown that the posterior probability for a model is approximately,
$$P(S_j \mid \text{data}) \approx \frac{e^{\mathrm{BIC}(S_j)}}{\sum_r e^{\mathrm{BIC}(S_r)}}.$$
Hence, choosing the model with highest BIC is like choosing the model with highest posterior probability. The BIC score also has an information-theoretic interpretation in terms of something called minimum description length. The BIC score is similar to AIC (and hence to Mallows' $C_p$) except that it puts a more severe penalty on complexity. It thus leads one to choose a smaller model than the other methods.
Now let us turn to the problem of model search. If there are $k$ covariates then there are $2^k$ possible models. We need to search through all these models, assign a score to each one, and choose the model with the best score. If $k$ is not too large we can do a complete search over all the models. When $k$ is large, this is infeasible. In that case we need to search over a subset of all the models. Two common methods are forward and backward stepwise regression. In forward stepwise regression, we start with no covariates in the model. We then add the one variable that leads to the best score. We continue adding variables one at a time until the score does not improve. Backwards stepwise regression is the same except that we start with the biggest model and drop one variable at a time. Both are greedy searches; neither is guaranteed to find the model with the best score. Another popular method is to do random searching through the set of all models. However, there is no reason to expect this to be superior to a deterministic search.
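The forward stepwise search described above can be sketched as a short greedy loop. This is an illustration only: it assumes NumPy, uses Mallows' $C_p$ as the score (lower is better), and the function names and simulated data are hypothetical.

```python
import numpy as np

def rss(X, Y, S):
    """Residual sum of squares for the sub-model using the columns in S."""
    if len(S) == 0:
        return float(Y @ Y)                      # empty model predicts 0
    beta_hat, *_ = np.linalg.lstsq(X[:, S], Y, rcond=None)
    resid = Y - X[:, S] @ beta_hat
    return float(resid @ resid)

def forward_stepwise(X, Y):
    """Greedy forward search; the score is Mallows' Cp, so smaller is better."""
    n, k = X.shape
    sigma2_hat = rss(X, Y, list(range(k))) / (n - k)   # from the full model

    def score(S):
        return rss(X, Y, S) + 2 * len(S) * sigma2_hat

    selected = []
    best = score(selected)
    while len(selected) < k:
        candidates = [j for j in range(k) if j not in selected]
        scores = {j: score(selected + [j]) for j in candidates}
        j_best = min(scores, key=scores.get)           # best single addition
        if scores[j_best] >= best:                     # no improvement: stop
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected, best

# Simulated data (hypothetical): only columns 0 and 3 affect Y.
rng = np.random.default_rng(3)
n, k = 200, 5
X = rng.normal(size=(n, k))
Y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(size=n)

selected, best_score = forward_stepwise(X, Y)
```

Because the search is greedy, it examines at most on the order of $k^2$ models rather than all $2^k$, at the price of possibly missing the best-scoring model.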
Example 14.16 We apply backwards stepwise regression to the crime data using AIC. The following was obtained from the program R. This program uses minus our version of AIC. Hence, we are seeking the smallest possible AIC. This is the same as minimizing Mallows' $C_p$.
The full model (which includes all covariates) has AIC = 310.37. The AIC scores, in ascending order, for deleting one variable are as follows:

variable   Pop   Labor   South   Wealth   Males   U1    Educ.   U2    Age   Expend
AIC        308   309     309     309      310     310   312     314   315   324

For example, if we dropped Pop from the model and kept the other terms, then the AIC score would be 308. Based on this information we drop "population" from the model and the current AIC score is 308. Now we consider dropping a variable from the current model. The AIC scores are:

variable   South   Labor   Wealth   Males   U1    Education   U2    Age   Expend
AIC        308     308     308      309     309   310         313   313   329

We then drop "Southern" from the model. This process is continued until there is no gain in AIC by dropping any variables. In the end, we are left with the following model:
Crime = 1.2 Age + .75 Education + .87 Expenditure + .34 Males - .86 U1 + 2.31 U2.
Warning! This does not yet address the question of which variables are causes of crime.
14.7
The Lasso
There is an easier model search method although it addresses a slightly different question. The method, due to Tibshirani, is called the Lasso. In this section we assume that the covariates have all been rescaled to have the same variance. This puts each covariate on the same scale. Consider estimating $\beta = (\beta_1, \ldots, \beta_k)$ by minimizing the loss function
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \lambda \sum_{j=1}^{k} |\beta_j| \tag{14.31}$$
where $\lambda > 0$. The idea is to minimize the sums of squares but we include a penalty that gets large if any of the $\beta_j$'s are large. The solution $\hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_k)$ can be found numerically and will depend on the choice of $\lambda$. It can be shown that some of the $\hat{\beta}_j$'s will be 0. We interpret this as meaning that the $j$th covariate is omitted from the model. Hence, we are doing estimation and model selection simultaneously. We need to choose a value of $\lambda$. We can do this by estimating the prediction risk $R(\lambda)$ as a function of $\lambda$ and choosing $\lambda$ to minimize the estimated risk. For example, we can estimate the risk using leave-one-out cross-validation.
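The text says only that the minimizer of (14.31) "can be found numerically." One common way to do this is cyclic coordinate descent with soft-thresholding; the sketch below uses that technique as an assumption (it is not stated in the text), relies on NumPy, and uses hypothetical function names and simulated data.

```python
import numpy as np

def soft_threshold(rho, t):
    """Shrink rho towards zero by t, returning exactly 0 when |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def lasso(X, Y, lam, n_iter=200):
    """Minimize sum_i (Y_i - X_i beta)^2 + lam * sum_j |beta_j| by coordinate descent."""
    n, k = X.shape
    beta = np.zeros(k)
    col_ss = np.sum(X ** 2, axis=0)                # x_j^T x_j for each column
    for _ in range(n_iter):
        for j in range(k):
            # residual with the contribution of covariate j removed
            partial_resid = Y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_resid
            # exact minimizer in coordinate j; threshold is lam/2 for this loss
            beta[j] = soft_threshold(rho, lam / 2.0) / col_ss[j]
    return beta

# Simulated data (hypothetical): covariates rescaled to a common variance.
rng = np.random.default_rng(4)
n, k = 100, 4
X = rng.normal(size=(n, k))
X = (X - X.mean(axis=0)) / X.std(axis=0)
Y = 3.0 * X[:, 0] + rng.normal(size=n)
Y = Y - Y.mean()                                   # centring removes the intercept

beta_no_penalty = lasso(X, Y, lam=0.0)             # reduces to least squares
lam_big = 2 * np.max(np.abs(X.T @ Y))              # large enough to zero everything
beta_all_zero = lasso(X, Y, lam=lam_big)
beta_mid = lasso(X, Y, lam=0.5 * lam_big)          # intermediate penalty
```

Intermediate values of `lam` typically set some coefficients exactly to zero while shrinking the others, which is the simultaneous estimation and selection described above.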
Example 14.17 Returning to the crime data, Figure 14.3 shows the results of the lasso. The first plot shows the leave-one-out cross-validation score as a function of $\lambda$. The minimum occurs at $\lambda = .7$. The second plot shows the estimated coefficients as a function of $\lambda$. You can see how some estimated parameters are zero until $\lambda$ gets larger. At $\lambda = .7$, all the $\hat{\beta}_j$ are non-zero so the Lasso chooses the full model.
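FIGURE 14.3. The Lasso applied to the crime data. [Top panel: cross-validation score versus $\lambda$. Bottom panel: estimated coefficients $\hat{\beta}(\lambda)$ versus $\lambda$.]
14.8
Technical Appendix
Why is the prediction interval of a different form than the other confidence intervals we have seen? The reason is that the quantity we want to estimate, $Y_*$, is not a fixed parameter, it is a random variable. To understand this point better, let $\theta = \beta_0 + \beta_1 x_*$ and let $\hat{\theta} = \hat{\beta}_0 + \hat{\beta}_1 x_*$. Thus, $\hat{Y}_* = \hat{\theta}$ while $Y_* = \theta + \epsilon$. Now, $\hat{\theta} \approx N(\theta, \mathrm{se}^2)$ where
$$\mathrm{se}^2 = V(\hat{\theta}) = V(\hat{\beta}_0 + \hat{\beta}_1 x_*).$$
Note that $V(\hat{\theta})$ is the same as $V(\hat{Y}_*)$. Now, $\hat{\theta} \pm 2\sqrt{V(\hat{\theta})}$ is a 95 per cent confidence interval for $\theta = \beta_0 + \beta_1 x_*$ using the usual argument for a confidence interval. It is not a valid confidence interval for $Y_*$. To see why, let's compute the probability that $\hat{Y}_* \pm 2\sqrt{V(\hat{Y}_*)}$ contains $Y_*$. Let $s = \sqrt{V(\hat{Y}_*)}$. Then,
$$
\begin{aligned}
P(\hat{Y}_* - 2s < Y_* < \hat{Y}_* + 2s) &= P\left(-2 < \frac{\hat{Y}_* - Y_*}{s} < 2\right) \\
&= P\left(-2 < \frac{\hat{\theta} - \theta - \epsilon}{s} < 2\right) \\
&= P\left(-2 < \frac{\hat{\theta} - \theta}{s} - \frac{\epsilon}{s} < 2\right) \\
&\approx P\left(-2 < N(0,1) - N\left(0, \frac{\sigma^2}{s^2}\right) < 2\right) \\
&= P\left(-2 < N\left(0, 1 + \frac{\sigma^2}{s^2}\right) < 2\right) \\
&\neq .95.
\end{aligned}
$$
The problem is that the quantity of interest $Y_*$ is equal to a parameter $\theta$ plus a random variable. We can fix this by defining
$$\xi_n^2 = V(\hat{Y}_*) + \sigma^2 = \left[ \frac{\sum_i (x_i - x_*)^2}{n \sum_i (x_i - \bar{x})^2} + 1 \right] \sigma^2.$$
In practice, we substitute $\hat{\sigma}$ for $\sigma$ and we denote the resulting quantity by $\hat{\xi}_n$. Now consider the interval $\hat{Y}_* \pm 2\hat{\xi}_n$. Then,
$$
\begin{aligned}
P(\hat{Y}_* - 2\hat{\xi}_n < Y_* < \hat{Y}_* + 2\hat{\xi}_n) &= P\left(-2 < \frac{\hat{Y}_* - Y_*}{\hat{\xi}_n} < 2\right) \\
&= P\left(-2 < \frac{\hat{\theta} - \theta - \epsilon}{\hat{\xi}_n} < 2\right) \\
&\approx P\left(-2 < \frac{N(0, s^2 + \sigma^2)}{\hat{\xi}_n} < 2\right) \\
&\approx P\left(-2 < \frac{N(0, s^2 + \sigma^2)}{\xi_n} < 2\right) \\
&= P(-2 < N(0,1) < 2) = .95.
\end{aligned}
$$
Of course, a $1 - \alpha$ interval is given by $\hat{Y}_* \pm z_{\alpha/2}\,\hat{\xi}_n$.
14.9
Exercises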
2. Prove the formulas for the standard errors in Theorem 14.8. You should regard the $X_i$'s as fixed constants.
3. Consider the regression through the origin model:
$$Y_i = \beta X_i + \epsilon_i.$$
Find the least squares estimate for $\beta$. Find the standard error of the estimate. Find conditions that guarantee that the estimate is consistent.
4. Prove equation (14.24).
5. In the simple linear regression model, construct a Wald test for $H_0 : \beta_1 = 17\beta_0$ versus $H_1 : \beta_1 \neq 17\beta_0$.
6. Get the passenger car mileage data from
http://lib.stat.cmu.edu/DASL/Datafiles/carmpgdat.html
(a) Fit a simple linear regression model to predict MPG (miles per gallon) from HP (horsepower). Summarize your analysis including a plot of the data with the fitted line.
(b) Repeat the analysis but use log(MPG) as the response. Compare the analyses.
7. Get the passenger car mileage data from
http://lib.stat.cmu.edu/DASL/Datafiles/carmpgdat.html
(a) Fit a multiple linear regression model to predict MPG (miles per gallon) from HP (horsepower). Summarize your analysis.
(b) Use Mallows' $C_p$ to select a best sub-model. To search through the models try (i) all possible models, (ii) forward stepwise, (iii) backward stepwise. Summarize your findings.
(c) Repeat (b) but use BIC. Compare the results.
(d) Now use the Lasso and compare the results.
8. Assume that the errors are Normal. Show that the model with highest AIC (equation (14.26)) is the model with the lowest Mallows' $C_p$ statistic.
9. In this question we will take a closer look at the AIC method. Let $X_1, \ldots, X_n$ be iid observations. Consider two models $\mathcal{M}_0$ and $\mathcal{M}_1$. Under $\mathcal{M}_0$ the data are assumed to be $N(0,1)$ while under $\mathcal{M}_1$ the data are assumed to be $N(\theta, 1)$ for some unknown $\theta \in \mathbb{R}$:
$$
\begin{aligned}
\mathcal{M}_0 &: X_1, \ldots, X_n \sim N(0,1) \\
\mathcal{M}_1 &: X_1, \ldots, X_n \sim N(\theta,1), \quad \theta \in \mathbb{R}.
\end{aligned}
$$
This is just another way to view the hypothesis testing problem: $H_0 : \theta = 0$ versus $H_1 : \theta \neq 0$. Let $\ell_n(\theta)$ be the log-likelihood function. The AIC score for a model is the log-likelihood at the mle minus the number of parameters. (Some people multiply this score by 2 but that is irrelevant.) Thus, the AIC score for $\mathcal{M}_0$ is $AIC_0 = \ell_n(0)$ and the AIC score for $\mathcal{M}_1$ is $AIC_1 = \ell_n(\hat{\theta}) - 1$. Suppose we choose the model with the highest AIC score. Let $J_n$ denote the selected model:
$$J_n = \begin{cases} 0 & \text{if } AIC_0 > AIC_1 \\ 1 & \text{if } AIC_1 > AIC_0. \end{cases}$$
(a) Suppose that $\mathcal{M}_0$ is the true model, i.e. $\theta = 0$. Find
$$\lim_{n \to \infty} P(J_n = 0).$$
Now compute $\lim_{n \to \infty} P(J_n = 0)$ when $\theta \neq 0$.
(b) The fact that $\lim_{n \to \infty} P(J_n = 0) \neq 1$ when $\theta = 0$ is why some people say that AIC "overfits." But this is not quite true as we shall now see. Let $\phi_\theta(x)$ denote a Normal density function with mean $\theta$ and variance 1. Define
$$\hat{f}_n(x) = \begin{cases} \phi_0(x) & \text{if } J_n = 0 \\ \phi_{\hat{\theta}}(x) & \text{if } J_n = 1. \end{cases}$$
If $\theta = 0$, show that $D(\phi_0, \hat{f}_n) \xrightarrow{p} 0$ as $n \to \infty$ where
$$D(f, g) = \int f(x) \log\left(\frac{f(x)}{g(x)}\right) dx$$
is the Kullback-Leibler distance. Show also that $D(\phi_\theta, \hat{f}_n) \xrightarrow{p} 0$ if $\theta \neq 0$. Hence, AIC consistently estimates the true density even if it "overshoots" the correct model.
REMARK: If you are feeling ambitious, repeat this analysis for BIC which is the log-likelihood minus $(p/2) \log n$ where $p$ is the number of parameters and $n$ is sample size.
15
Multivariate Models
In this chapter we revisit the Multinomial model and the multivariate Normal, as we will need them in future chapters.
Let us first review some notation from linear algebra. Recall that if $x$ and $y$ are vectors then $x^T y = \sum_j x_j y_j$. If $A$ is a matrix then $\det(A)$ denotes the determinant of $A$, $A^T$ denotes the transpose of $A$ and $A^{-1}$ denotes the inverse of $A$ (if the inverse exists). The trace of a square matrix $A$, denoted by $\mathrm{tr}(A)$, is the sum of its diagonal elements. The trace satisfies $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ and $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$. Also, $\mathrm{tr}(a) = a$ if $a$ is a scalar (i.e. a real number). A matrix $\Sigma$ is positive definite if $x^T \Sigma x > 0$ for all non-zero vectors $x$. If a matrix $\Sigma$ is symmetric and positive definite, there exists a matrix $\Sigma^{1/2}$, called the square root of $\Sigma$, with the following properties:
1. $\Sigma^{1/2}$ is symmetric;
2. $\Sigma = \Sigma^{1/2} \Sigma^{1/2}$;
3. $\Sigma^{1/2} \Sigma^{-1/2} = \Sigma^{-1/2} \Sigma^{1/2} = I$ where $\Sigma^{-1/2} = (\Sigma^{1/2})^{-1}$.
15.1
Random Vectors
Multivariate models involve a random vector $X$ of the form
$$X = \begin{pmatrix} X_1 \\ \vdots \\ X_k \end{pmatrix}.$$
The mean of a random vector $X$ is defined by
$$\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_k \end{pmatrix} = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_k) \end{pmatrix}. \tag{15.1}$$
The covariance matrix $\Sigma$ is defined to be
$$\Sigma = V(X) = \begin{pmatrix} V(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_k) \\ \mathrm{Cov}(X_2, X_1) & V(X_2) & \cdots & \mathrm{Cov}(X_2, X_k) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_k, X_1) & \mathrm{Cov}(X_k, X_2) & \cdots & V(X_k) \end{pmatrix}. \tag{15.2}$$
This is also called the variance matrix or the variance-covariance matrix.
Theorem 15.1 Let $a$ be a vector of length $k$ and let $X$ be a random vector of the same length with mean $\mu$ and variance $\Sigma$. Then $E(a^T X) = a^T \mu$ and $V(a^T X) = a^T \Sigma a$. If $A$ is a matrix with $k$ columns then $E(AX) = A\mu$ and $V(AX) = A \Sigma A^T$.
Now suppose we have a random sample of $n$ vectors:
$$\begin{pmatrix} X_{11} \\ X_{21} \\ \vdots \\ X_{k1} \end{pmatrix}, \quad \begin{pmatrix} X_{12} \\ X_{22} \\ \vdots \\ X_{k2} \end{pmatrix}, \quad \ldots, \quad \begin{pmatrix} X_{1n} \\ X_{2n} \\ \vdots \\ X_{kn} \end{pmatrix}. \tag{15.3}$$
The sample mean $\bar{X}$ is a vector defined by
$$\bar{X} = \begin{pmatrix} \bar{X}_1 \\ \vdots \\ \bar{X}_k \end{pmatrix}$$
where $\bar{X}_i = n^{-1} \sum_{j=1}^{n} X_{ij}$. The sample variance matrix, also called the covariance matrix or the variance-covariance matrix, is
$$S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1k} \\ s_{12} & s_{22} & \cdots & s_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ s_{1k} & s_{2k} & \cdots & s_{kk} \end{pmatrix} \tag{15.4}$$
where
$$s_{ab} = \frac{1}{n-1} \sum_{j=1}^{n} (X_{aj} - \bar{X}_a)(X_{bj} - \bar{X}_b).$$
It follows that $E(\bar{X}) = \mu$ and $E(S) = \Sigma$.
15.2
Estimating the Correlation
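Consider $n$ data points from a bivariate distribution:
$$\begin{pmatrix} X_{11} \\ X_{21} \end{pmatrix}, \quad \begin{pmatrix} X_{12} \\ X_{22} \end{pmatrix}, \quad \ldots, \quad \begin{pmatrix} X_{1n} \\ X_{2n} \end{pmatrix}.$$
Recall that the correlation between $X_1$ and $X_2$ is
$$\rho = \frac{E\left((X_1 - \mu_1)(X_2 - \mu_2)\right)}{\sigma_1 \sigma_2}. \tag{15.5}$$
The sample correlation (the plug-in estimator) is
$$\hat{\rho} = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{s_1 s_2} \tag{15.6}$$
where $s_j = \sqrt{s_{jj}}$ is the sample standard deviation of the $j$th coordinate, as in (15.4).
We can construct a confidence interval for $\rho$ by applying the delta method as usual. However, it turns out that we get a more accurate confidence interval by first constructing a confidence interval for a function $\theta = f(\rho)$ and then applying the inverse function $f^{-1}$. The method, due to Fisher, is as follows. Define
$$f(r) = \frac{1}{2}\left(\log(1 + r) - \log(1 - r)\right)$$
and let $\theta = f(\rho)$. The inverse of $f$ is
$$g(z) \equiv f^{-1}(z) = \frac{e^{2z} - 1}{e^{2z} + 1}.$$
Now do the following steps:
Approximate Confidence Interval for The Correlation
1. Compute
$$\hat{\theta} = f(\hat{\rho}) = \frac{1}{2}\left(\log(1 + \hat{\rho}) - \log(1 - \hat{\rho})\right).$$
2. Compute the approximate standard error of $\hat{\theta}$ which can be shown to be
$$\widehat{\mathrm{se}}(\hat{\theta}) = \frac{1}{\sqrt{n - 3}}.$$
3. An approximate $1 - \alpha$ confidence interval for $\theta = f(\rho)$ is
$$(a, b) \equiv \left( \hat{\theta} - \frac{z_{\alpha/2}}{\sqrt{n-3}},\ \hat{\theta} + \frac{z_{\alpha/2}}{\sqrt{n-3}} \right).$$
4. Apply the inverse transformation $f^{-1}(z)$ to get a confidence interval for $\rho$:
$$\left( \frac{e^{2a} - 1}{e^{2a} + 1},\ \frac{e^{2b} - 1}{e^{2b} + 1} \right).$$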
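The four steps of Fisher's method can be sketched as a short function. This is an illustration using only the Python standard library; the function name `fisher_interval` and the example inputs ($\hat{\rho} = 0.6$, $n = 50$) are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_interval(rho_hat, n, alpha=0.05):
    """Approximate 1 - alpha confidence interval for a correlation (Fisher's method)."""
    # Step 1: transform the sample correlation.
    theta_hat = 0.5 * (math.log(1 + rho_hat) - math.log(1 - rho_hat))
    # Step 2: approximate standard error on the transformed scale.
    se = 1 / math.sqrt(n - 3)
    # Step 3: interval for theta = f(rho).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    a, b = theta_hat - z * se, theta_hat + z * se
    # Step 4: map the endpoints back with the inverse transformation g.
    def g(t):
        return (math.exp(2 * t) - 1) / (math.exp(2 * t) + 1)
    return g(a), g(b)

lo, hi = fisher_interval(0.6, 50)
```

Because $g$ maps the real line into $(-1, 1)$, the resulting interval always stays inside the range of valid correlations, which a delta-method interval on the original scale need not do.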
15.3
Multinomial
Let us now review the Multinomial distribution. Consider drawing a ball from an urn which has balls with $k$ different colors labeled color 1, color 2, ..., color $k$. Let $p = (p_1, \ldots, p_k)$ where $p_j \geq 0$ and $\sum_{j=1}^{k} p_j = 1$ and suppose that $p_j$ is the probability of drawing a ball of color $j$. Draw $n$ times (independent draws with replacement) and let $X = (X_1, \ldots, X_k)$ where $X_j$ is the number of times that color $j$ appeared. Hence, $n = \sum_{j=1}^{k} X_j$. We say that $X$ has a Multinomial($n$, $p$) distribution. The probability function is
$$f(x; p) = \binom{n}{x_1 \ldots x_k} p_1^{x_1} \cdots p_k^{x_k}$$
where
$$\binom{n}{x_1 \ldots x_k} = \frac{n!}{x_1! \cdots x_k!}.$$
Theorem 15.2 Let $X \sim \text{Multinomial}(n, p)$. Then the marginal distribution of $X_j$ is $X_j \sim \text{Binomial}(n, p_j)$. The mean and variance of $X$ are
$$E(X) = \begin{pmatrix} np_1 \\ \vdots \\ np_k \end{pmatrix}$$
and
$$V(X) = \begin{pmatrix} np_1(1 - p_1) & -np_1 p_2 & \cdots & -np_1 p_k \\ -np_1 p_2 & np_2(1 - p_2) & \cdots & -np_2 p_k \\ \vdots & \vdots & \ddots & \vdots \\ -np_1 p_k & -np_2 p_k & \cdots & np_k(1 - p_k) \end{pmatrix}.$$
Proof. That $X_j \sim \text{Binomial}(n, p_j)$ follows easily. Hence, $E(X_j) = np_j$ and $V(X_j) = np_j(1 - p_j)$. To compute $\mathrm{Cov}(X_i, X_j)$ we proceed as follows. Notice that $X_i + X_j \sim \text{Binomial}(n, p_i + p_j)$ and so $V(X_i + X_j) = n(p_i + p_j)(1 - p_i - p_j)$. On the other hand,
$$
\begin{aligned}
V(X_i + X_j) &= V(X_i) + V(X_j) + 2\,\mathrm{Cov}(X_i, X_j) \\
&= np_i(1 - p_i) + np_j(1 - p_j) + 2\,\mathrm{Cov}(X_i, X_j).
\end{aligned}
$$
Equating this last expression with $n(p_i + p_j)(1 - p_i - p_j)$ implies that $\mathrm{Cov}(X_i, X_j) = -np_i p_j$.
Theorem 15.3 The maximum likelihood estimator of $p$ is
$$\hat{p} = \begin{pmatrix} \hat{p}_1 \\ \vdots \\ \hat{p}_k \end{pmatrix} = \begin{pmatrix} X_1 / n \\ \vdots \\ X_k / n \end{pmatrix} = \frac{X}{n}.$$
Proof. The log-likelihood (ignoring the constant) is
$$\ell(p) = \sum_{j=1}^{k} X_j \log p_j.$$
When we maximize $\ell$ we have to be careful since we must enforce the constraint that $\sum_j p_j = 1$. We use the method of Lagrange multipliers and instead maximize
$$A(p) = \sum_{j=1}^{k} X_j \log p_j + \lambda \left( \sum_j p_j - 1 \right).$$
Now
$$\frac{\partial A(p)}{\partial p_j} = \frac{X_j}{p_j} + \lambda.$$
Setting $\partial A(p) / \partial p_j = 0$ yields $\hat{p}_j = -X_j / \lambda$. Since $\sum_j \hat{p}_j = 1$ we see that $\lambda = -n$ and hence $\hat{p}_j = X_j / n$ as claimed.
Next we would like to know the variability of the mle. We can either compute the variance matrix of $\hat{p}$ directly or we can approximate the variability of the mle by computing the Fisher information matrix. These two approaches give the same answer in this case. The direct approach is easy: $V(\hat{p}) = V(X/n) = n^{-2} V(X)$ and so
$$V(\hat{p}) = \frac{1}{n} \begin{pmatrix} p_1(1 - p_1) & -p_1 p_2 & \cdots & -p_1 p_k \\ -p_1 p_2 & p_2(1 - p_2) & \cdots & -p_2 p_k \\ \vdots & \vdots & \ddots & \vdots \\ -p_1 p_k & -p_2 p_k & \cdots & p_k(1 - p_k) \end{pmatrix}.$$
15.4
Multivariate Normal
Let us recall how the multivariate Normal distribution is defined. To begin, let
$$Z = \begin{pmatrix} Z_1 \\ \vdots \\ Z_k \end{pmatrix}$$
where $Z_1, \ldots, Z_k \sim N(0,1)$ are independent. The density of $Z$ is
$$f(z) = \frac{1}{(2\pi)^{k/2}} \exp\left\{ -\frac{1}{2} \sum_{j=1}^{k} z_j^2 \right\} = \frac{1}{(2\pi)^{k/2}} \exp\left\{ -\frac{1}{2} z^T z \right\}. \tag{15.7}$$
The variance matrix of $Z$ is the identity matrix $I$. We write $Z \sim N(0, I)$ where it is understood that 0 denotes a vector of $k$ zeroes. We say that $Z$ has a standard multivariate Normal distribution.
More generally, a vector $X$ has a multivariate Normal distribution, denoted by $X \sim N(\mu, \Sigma)$, if its density is
$$f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{k/2} \det(\Sigma)^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\} \tag{15.8}$$
where $\det(\cdot)$ denotes the determinant of a matrix, $\mu$ is a vector of length $k$ and $\Sigma$ is a $k \times k$ symmetric, positive definite matrix. Then $E(X) = \mu$ and $V(X) = \Sigma$. Setting $\mu = 0$ and $\Sigma = I$ gives back the standard Normal.
Theorem 15.4 The following properties hold:
1. If $Z \sim N(0, I)$ and $X = \mu + \Sigma^{1/2} Z$ then $X \sim N(\mu, \Sigma)$.
2. If $X \sim N(\mu, \Sigma)$, then $\Sigma^{-1/2}(X - \mu) \sim N(0, I)$.
3. If $X \sim N(\mu, \Sigma)$ and $a$ is a vector of the same length as $X$, then $a^T X \sim N(a^T \mu, a^T \Sigma a)$.
4. Let
$$V = (X - \mu)^T \Sigma^{-1} (X - \mu).$$
Then $V \sim \chi^2_k$.
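Property 1 of Theorem 15.4 gives a direct recipe for simulating from $N(\mu, \Sigma)$: draw standard Normals and apply $\mu + \Sigma^{1/2} Z$. The sketch below assumes NumPy and builds the symmetric square root from an eigendecomposition; the function names and the particular $\mu$ and $\Sigma$ are illustrative assumptions.

```python
import numpy as np

def sqrt_psd(Sigma):
    """Symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(Sigma)            # Sigma = vecs diag(vals) vecs^T
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def rmvnorm(nsim, mu, Sigma, rng):
    """Draw nsim rows from N(mu, Sigma) using X = mu + Sigma^{1/2} Z."""
    k = len(mu)
    Z = rng.standard_normal(size=(nsim, k))       # each row is a N(0, I) vector
    root = sqrt_psd(Sigma)
    return mu + Z @ root                          # root is symmetric, so this applies it row-wise

# Hypothetical parameters for illustration.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
rng = np.random.default_rng(5)
sample = rmvnorm(20000, mu, Sigma, rng)

sample_mean = sample.mean(axis=0)
sample_cov = np.cov(sample, rowvar=False)         # uses the 1/(n-1) convention of (15.4)
```

Any matrix $B$ with $BB^T = \Sigma$ (for example a Cholesky factor) would also work; the symmetric root is used here because it matches the definition of $\Sigma^{1/2}$ given earlier in the chapter.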
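Suppose we partition a random Normal vector $X$ into two parts $X = (X_a, X_b)$. We can similarly partition the mean $\mu = (\mu_a, \mu_b)$ and the variance
$$\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}.$$
Theorem 15.5 Let $X \sim N(\mu, \Sigma)$. Then:
(1) The marginal distribution of $X_a$ is $X_a \sim N(\mu_a, \Sigma_{aa})$.
(2) The conditional distribution of $X_b$ given $X_a = x_a$ is
$$X_b \mid X_a = x_a \sim N(\mu(x_a), \Sigma(x_a))$$
where
$$\mu(x_a) = \mu_b + \Sigma_{ba} \Sigma_{aa}^{-1} (x_a - \mu_a) \tag{15.9}$$
$$\Sigma(x_a) = \Sigma_{bb} - \Sigma_{ba} \Sigma_{aa}^{-1} \Sigma_{ab}. \tag{15.10}$$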
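Equations (15.9) and (15.10) are easy to evaluate once the blocks of $\mu$ and $\Sigma$ have been extracted. The following sketch assumes NumPy; the function name `conditional_normal`, the index arguments, and the numerical values are hypothetical.

```python
import numpy as np

def conditional_normal(mu, Sigma, a_idx, b_idx, x_a):
    """Mean and variance of X_b given X_a = x_a for X ~ N(mu, Sigma)."""
    mu_a, mu_b = mu[a_idx], mu[b_idx]
    S_aa = Sigma[np.ix_(a_idx, a_idx)]
    S_ab = Sigma[np.ix_(a_idx, b_idx)]
    S_ba = Sigma[np.ix_(b_idx, a_idx)]
    S_bb = Sigma[np.ix_(b_idx, b_idx)]
    # Solve linear systems instead of forming the inverse of S_aa explicitly.
    cond_mean = mu_b + S_ba @ np.linalg.solve(S_aa, x_a - mu_a)    # (15.9)
    cond_var = S_bb - S_ba @ np.linalg.solve(S_aa, S_ab)           # (15.10)
    return cond_mean, cond_var

# Hypothetical bivariate example: condition the second coordinate on the first.
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
cond_mean, cond_var = conditional_normal(mu, Sigma, [0], [1], np.array([2.0]))
```

In this example the conditional mean is $1 + 0.8 \times 2 = 2.6$ and the conditional variance is $2 - 0.8^2 = 1.36$; note that the conditional variance does not depend on $x_a$.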
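Theorem 15.6 Given a random sample of size $n$ from a $N(\mu, \Sigma)$, the log-likelihood (up to a constant not depending on $\mu$ or $\Sigma$) is given by
$$\ell(\mu, \Sigma) = -\frac{n}{2} (\bar{X} - \mu)^T \Sigma^{-1} (\bar{X} - \mu) - \frac{n}{2} \mathrm{tr}(\Sigma^{-1} S) - \frac{n}{2} \log \det(\Sigma).$$
The mle is
$$\hat{\mu} = \bar{X} \quad \text{and} \quad \hat{\Sigma} = \frac{n-1}{n} S. \tag{15.11}$$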
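15.5
Appendix
Proof of Theorem 15.6. Denote the $i$th random vector by $X^i$. The log-likelihood is
$$\ell(\mu, \Sigma) = \sum_i \log f(X^i; \mu, \Sigma) = -\frac{kn}{2} \log(2\pi) - \frac{n}{2} \log \det(\Sigma) - \frac{1}{2} \sum_i (X^i - \mu)^T \Sigma^{-1} (X^i - \mu).$$
Now,
$$
\begin{aligned}
\sum_i (X^i - \mu)^T \Sigma^{-1} (X^i - \mu) &= \sum_i \left[(X^i - \bar{X}) + (\bar{X} - \mu)\right]^T \Sigma^{-1} \left[(X^i - \bar{X}) + (\bar{X} - \mu)\right] \\
&= \sum_i (X^i - \bar{X})^T \Sigma^{-1} (X^i - \bar{X}) + n (\bar{X} - \mu)^T \Sigma^{-1} (\bar{X} - \mu)
\end{aligned}
$$
since $\sum_i (X^i - \bar{X})^T \Sigma^{-1} (\bar{X} - \mu) = 0$. Also, notice that $(X^i - \mu)^T \Sigma^{-1} (X^i - \mu)$ is a scalar, so
$$
\begin{aligned}
\sum_i (X^i - \mu)^T \Sigma^{-1} (X^i - \mu) &= \sum_i \mathrm{tr}\left[ (X^i - \mu)^T \Sigma^{-1} (X^i - \mu) \right] \\
&= \sum_i \mathrm{tr}\left[ \Sigma^{-1} (X^i - \mu)(X^i - \mu)^T \right] \\
&= \mathrm{tr}\left[ \Sigma^{-1} \sum_i (X^i - \mu)(X^i - \mu)^T \right] \\
&= n\,\mathrm{tr}\left[ \Sigma^{-1} S \right]
\end{aligned}
$$
and the conclusion follows.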
15.6
Exercises
2. Find the Fisher information matrix for the mle of a Multinomial.
4. (Computer Experiment.) Write a function to generate nsim observations from a Multinomial($n$, $p$) distribution.
5. (Computer Experiment.) Write a function to generate nsim observations from a Multivariate normal with given mean $\mu$ and covariance matrix $\Sigma$.
6. (Computer Experiment.) Generate 1000 random vectors from a $N(\mu, \Sigma)$ distribution where
$$\mu = \begin{pmatrix} 3 \\ 8 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 2 & 6 \\ 6 & 2 \end{pmatrix}.$$
Plot the simulation as a scatterplot. Find the distribution of $X_2 \mid X_1 = x_1$ using Theorem 15.5. In particular, what is the formula for $E(X_2 \mid X_1 = x_1)$? Plot $E(X_2 \mid X_1 = x_1)$ on your scatterplot. Find the correlation $\rho$ between $X_1$ and $X_2$. Compare this with the sample correlations from your simulation. Find a 95 per cent confidence interval for $\rho$. Estimate the covariance matrix $\Sigma$.
7. Generate 100 random vectors from a multivariate Normal with mean $(0, 2)^T$ and variance
$$\begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}.$$
Find a 95 per cent confidence interval for the correlation $\rho$. What is the true value of $\rho$?
16
Inference about Independence
In this chapter we address the following questions:
(1) How do we test if two random variables are independent?
(2) How do we estimate the strength of dependence between two
random variables?
When $Y$ and $Z$ are not independent, we say that they are dependent or associated or related. If $Y$ and $Z$ are associated, it does not imply that $Y$ causes $Z$ or that $Z$ causes $Y$. If $Y$ does cause $Z$ then changing $Y$ will change the distribution of $Z$, otherwise it will not change the distribution of $Z$.
For example, quitting smoking $Y$ will reduce your probability of heart disease $Z$. In this case $Y$ does cause $Z$. As another example, owning a TV $Y$ is associated with having a lower incidence of starvation $Z$. This is because if you own a TV you are less likely to live in an impoverished nation. But giving a starving person a TV will not cause them to stop being hungry. In this case, $Y$ and $Z$ are associated but the relationship is not causal.
Recall that we write $Y \perp\!\!\!\perp Z$ to mean that $Y$ and $Z$ are independent.
We defer detailed discussion of the important question of causation until a later chapter.
16.1
Two Binary Variables
Suppose that $Y$ and $Z$ are both binary. Consider a data set $(Y_1, Z_1), \ldots, (Y_n, Z_n)$. Represent the data as a two-by-two table:

          Y = 0     Y = 1
Z = 0     X_00      X_01      X_0.
Z = 1     X_10      X_11      X_1.
          X_.0      X_.1      n = X_..

where the $X_{ij}$ represent counts:
$$X_{ij} = \text{number of observations for which } Z = i \text{ and } Y = j.$$
The dotted subscripts denote sums. For example, $X_{i\cdot} = \sum_j X_{ij}$. This is a convention we use throughout the remainder of the book. Denote the corresponding probabilities by:

          Y = 0     Y = 1
Z = 0     p_00      p_01      p_0.
Z = 1     p_10      p_11      p_1.
          p_.0      p_.1      1

where $p_{ij} = P(Z = i, Y = j)$. Let $X = (X_{00}, X_{01}, X_{10}, X_{11})$ denote the vector of counts. Then $X \sim \text{Multinomial}(n, p)$ where $p = (p_{00}, p_{01}, p_{10}, p_{11})$. It is now convenient to introduce two new parameters.
Definition 16.1 The odds ratio is defined to be
$$\psi = \frac{p_{00}\, p_{11}}{p_{01}\, p_{10}}. \tag{16.1}$$
The log odds ratio is defined to be
$$\gamma = \log(\psi). \tag{16.2}$$
Theorem 16.2 The following statements are equivalent:
1. $Y \perp\!\!\!\perp Z$.
2. $\psi = 1$.
3. $\gamma = 0$.
4. For $i, j \in \{0, 1\}$,
$$p_{ij} = p_{i\cdot}\, p_{\cdot j}. \tag{16.3}$$
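Now consider testing
$$H_0 : Y \perp\!\!\!\perp Z \quad \text{versus} \quad H_1 : Y \not\perp\!\!\!\perp Z.$$
First we consider the likelihood ratio test. Under $H_1$, $X \sim \text{Multinomial}(n, p)$ and the mle is $\hat{p} = X/n$. Under $H_0$, we again have that $X \sim \text{Multinomial}(n, p)$ but $p$ is subject to the constraint
$$p_{ij} = p_{i\cdot}\, p_{\cdot j}, \quad i, j = 0, 1.$$
This leads to the following test.
Theorem 16.3 (Likelihood Ratio Test for Independence in a 2-by-2 table) Let
$$T = 2 \sum_{i=0}^{1} \sum_{j=0}^{1} X_{ij} \log\left( \frac{X_{ij}\, X_{\cdot\cdot}}{X_{i\cdot}\, X_{\cdot j}} \right). \tag{16.4}$$
Under $H_0$, $T \rightsquigarrow \chi^2_1$. Thus, an approximate level $\alpha$ test is obtained by rejecting $H_0$ when $T > \chi^2_{1,\alpha}$.
Another popular test for independence is Pearson's $\chi^2$ test.
Theorem 16.4 (Pearson's $\chi^2$ test for Independence in a 2-by-2 table) Let
$$U = \sum_{i=0}^{1} \sum_{j=0}^{1} \frac{(X_{ij} - E_{ij})^2}{E_{ij}} \tag{16.5}$$
where
$$E_{ij} = \frac{X_{i\cdot}\, X_{\cdot j}}{n}.$$
Under $H_0$, $U \rightsquigarrow \chi^2_1$. Thus, an approximate level $\alpha$ test is obtained by rejecting $H_0$ when $U > \chi^2_{1,\alpha}$.
Here is the intuition for the Pearson test. Under $H_0$, $p_{ij} = p_{i\cdot}\, p_{\cdot j}$, so the maximum likelihood estimator of $p_{ij}$ under $H_0$ is
$$\hat{p}_{ij} = \hat{p}_{i\cdot}\, \hat{p}_{\cdot j} = \frac{X_{i\cdot}}{n} \frac{X_{\cdot j}}{n}.$$
Thus, the expected number of observations in the $(i,j)$ cell is
$$E_{ij} = n \hat{p}_{ij} = \frac{X_{i\cdot}\, X_{\cdot j}}{n}.$$
The statistic $U$ compares the observed and expected counts.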
Example 16.5 The following data from Johnson and Johnson (1972) relate tonsillectomy and Hodgkins disease. (The data are actually from a case-control study; we postpone discussion of this point until the next section.)

                    Hodgkins Disease   No Disease   Total
Tonsillectomy              90              165        255
No Tonsillectomy           84              307        391
Total                     174              472        646

We would like to know if tonsillectomy is related to Hodgkins disease. The likelihood ratio statistic is $T = 14.75$ and the p-value is $P(\chi^2_1 > 14.75) = .0001$. The $\chi^2$ statistic is $U = 14.96$ and the p-value is $P(\chi^2_1 > 14.96) = .0001$. We reject the null hypothesis of independence and conclude that tonsillectomy is associated with Hodgkins disease. This does not mean that tonsillectomies cause Hodgkins disease. Suppose, for example, that doctors gave tonsillectomies to the most seriously ill patients. Then the association between tonsillectomies and Hodgkins disease may be due to the fact that those with tonsillectomies were the most ill patients and hence more likely to have a serious disease.
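The two statistics in this example can be reproduced with a few lines of code. The sketch below uses only the Python standard library; the function names are hypothetical, and the p-value uses the fact that a $\chi^2_1$ variable is the square of a standard Normal, so that $P(\chi^2_1 > t) = \mathrm{erfc}(\sqrt{t/2})$.

```python
import math

def independence_tests(table):
    """Return (T, U): likelihood ratio (16.4) and Pearson (16.5) statistics."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    T = 0.0
    U = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            e = row_totals[i] * col_totals[j] / n     # expected count under H0
            T += 2 * x * math.log(x / e)              # assumes every count is positive
            U += (x - e) ** 2 / e
    return T, U

def chi2_1_pvalue(t):
    """P(chi^2_1 > t), using chi^2_1 = Z^2 with Z standard Normal."""
    return math.erfc(math.sqrt(t / 2))

# Tonsillectomy data from Example 16.5 (rows: tonsillectomy, no tonsillectomy).
T, U = independence_tests([[90, 165], [84, 307]])
p_T = chi2_1_pvalue(T)
p_U = chi2_1_pvalue(U)
```

Run on the table above, this should give values of $T$ and $U$ that round to the 14.75 and 14.96 reported in the example, with both p-values near .0001.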
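We can also estimate the strength of dependence by estimating the odds ratio $\psi$ and the log-odds ratio $\gamma$.
Theorem 16.6 The mle's of $\psi$ and $\gamma$ are
$$\hat{\psi} = \frac{X_{00}\, X_{11}}{X_{01}\, X_{10}}, \quad \hat{\gamma} = \log \hat{\psi}. \tag{16.6}$$
The asymptotic standard errors (computed from the delta method) are
$$\widehat{\mathrm{se}}(\hat{\gamma}) = \sqrt{\frac{1}{X_{00}} + \frac{1}{X_{01}} + \frac{1}{X_{10}} + \frac{1}{X_{11}}} \tag{16.7}$$
$$\widehat{\mathrm{se}}(\hat{\psi}) = \hat{\psi}\, \widehat{\mathrm{se}}(\hat{\gamma}). \tag{16.8}$$
Remark 16.7 For small sample sizes, $\hat{\psi}$ and $\hat{\gamma}$ can have a very large variance. In this case, we often use the modified estimator
$$\hat{\psi} = \frac{\left(X_{00} + \frac{1}{2}\right)\left(X_{11} + \frac{1}{2}\right)}{\left(X_{01} + \frac{1}{2}\right)\left(X_{10} + \frac{1}{2}\right)}. \tag{16.9}$$
Yet another test for independence is the Wald test for $\gamma = 0$ given by $W = (\hat{\gamma} - 0)/\widehat{\mathrm{se}}(\hat{\gamma})$. A $1 - \alpha$ confidence interval for $\gamma$ is $\hat{\gamma} \pm z_{\alpha/2}\, \widehat{\mathrm{se}}(\hat{\gamma})$. A $1 - \alpha$ confidence interval for $\psi$ can be obtained in two ways. First, we could use $\hat{\psi} \pm z_{\alpha/2}\, \widehat{\mathrm{se}}(\hat{\psi})$. Second, since $\psi = e^{\gamma}$ we could use
$$\exp\left\{ \hat{\gamma} \pm z_{\alpha/2}\, \widehat{\mathrm{se}}(\hat{\gamma}) \right\}. \tag{16.10}$$
This second method is usually more accurate.
Example 16.8 In the previous example,
$$\hat{\psi} = \frac{90 \times 307}{165 \times 84} = 1.99$$
and
$$\hat{\gamma} = \log(1.99) = .69.$$
So tonsillectomy patients were twice as likely to have Hodgkins disease. The standard error of $\hat{\gamma}$ is
$$\sqrt{\frac{1}{90} + \frac{1}{84} + \frac{1}{165} + \frac{1}{307}} = .18.$$
The Wald statistic is $W = .69/.18 = 3.84$ whose p-value is $P(|Z| > 3.84) = .0001$, the same as the other tests. A 95 per cent confidence interval for $\gamma$ is $\hat{\gamma} \pm 2(.18) = (.33, 1.05)$. A 95 per cent confidence interval for $\psi$ is $(e^{.33}, e^{1.05}) = (1.39, 2.86)$.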
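The calculations of Example 16.8 can be sketched in a few lines, using only the Python standard library. The function name `odds_ratio_summary` is hypothetical, and the default multiplier `z=2.0` follows the example's use of 2 rather than 1.96.

```python
import math

def odds_ratio_summary(x00, x01, x10, x11, z=2.0):
    """Odds ratio, log odds ratio, standard error, Wald statistic and intervals."""
    psi_hat = (x00 * x11) / (x01 * x10)                     # (16.6)
    gamma_hat = math.log(psi_hat)
    se_gamma = math.sqrt(1 / x00 + 1 / x01 + 1 / x10 + 1 / x11)   # (16.7)
    wald = gamma_hat / se_gamma                             # test of gamma = 0
    gamma_ci = (gamma_hat - z * se_gamma, gamma_hat + z * se_gamma)
    psi_ci = (math.exp(gamma_ci[0]), math.exp(gamma_ci[1])) # (16.10)
    return psi_hat, gamma_hat, se_gamma, wald, gamma_ci, psi_ci

# Cell counts in the order used in Example 16.8.
psi_hat, gamma_hat, se_gamma, wald, gamma_ci, psi_ci = odds_ratio_summary(90, 165, 84, 307)
```

The Wald statistic computed from unrounded values is slightly smaller than the 3.84 quoted in the example, which divides the rounded figures .69 and .18; the conclusion is the same either way.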
16.2
Interpreting The Odds Ratios
Suppose event $A$ has probability $P(A)$. The odds of $A$ are defined as
$$\text{odds}(A) = \frac{P(A)}{1 - P(A)}.$$
It follows that
$$P(A) = \frac{\text{odds}(A)}{1 + \text{odds}(A)}.$$
Let $E$ be the event that someone is exposed to something (smoking, radiation, etc) and let $D$ be the event that they get a disease. The odds of getting the disease given that you are exposed are
$$\text{odds}(D \mid E) = \frac{P(D \mid E)}{1 - P(D \mid E)}$$
and the odds of getting the disease given that you are not exposed are
$$\text{odds}(D \mid E^c) = \frac{P(D \mid E^c)}{1 - P(D \mid E^c)}.$$
The odds ratio is defined to be
$$\psi = \frac{\text{odds}(D \mid E)}{\text{odds}(D \mid E^c)}.$$
If $\psi = 1$ then disease probability is the same for exposed and unexposed. This implies that these events are independent. Recall that the log-odds ratio is defined as $\gamma = \log(\psi)$. Independence corresponds to $\gamma = 0$.
Consider this table of probabilities:

          D^c       D
E^c       p_00      p_01      p_0.
E         p_10      p_11      p_1.
          p_.0      p_.1      1

Denote the data by

          D^c       D
E^c       X_00      X_01      X_0.
E         X_10      X_11      X_1.
          X_.0      X_.1      X_..

Now
$$P(D \mid E) = \frac{p_{11}}{p_{10} + p_{11}} \quad \text{and} \quad P(D \mid E^c) = \frac{p_{01}}{p_{00} + p_{01}}$$
and so
$$\text{odds}(D \mid E) = \frac{p_{11}}{p_{10}} \quad \text{and} \quad \text{odds}(D \mid E^c) = \frac{p_{01}}{p_{00}}$$
and therefore,
$$\psi = \frac{p_{11}\, p_{00}}{p_{01}\, p_{10}}.$$
To estimate the parameters, we have to first consider how the data were collected. There are three methods.
Multinomial Sampling. We draw a sample from the population and, for each person, record their exposure and disease
status. In this case, X = (X00 ; X01 ; X10 ; X11 ) Multinomial(n; p).
286
16.
We then estimate the probabilities in the table by pbij = Xij =n
and
b = pb11 pb00 = X11 X00 :
pb01 pb10
X01 X10
We get
some exposed and unexposed people and count the number with
disease in each group. Thus,
Prospective Sampling. (Cohort Sampling).
X01
X11
Binomial(X0 ; P (DjE c))
Binomial(X1 ; P (DjE )):
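We should really write $x_{0\cdot}$ and $x_{1\cdot}$ instead of $X_{0\cdot}$ and $X_{1\cdot}$ since
in this case, these are fixed, not random, but for notational simplicity I'll keep using capital letters. We can estimate $\mathbb{P}(D \mid E)$
and $\mathbb{P}(D \mid E^c)$ but we cannot estimate all the probabilities in the
table. Still, we can estimate $\psi$ since $\psi$ is a function of $\mathbb{P}(D \mid E)$
and $\mathbb{P}(D \mid E^c)$. Now
\[ \hat{\mathbb{P}}(D \mid E) = \frac{X_{11}}{X_{1\cdot}} \quad\text{and}\quad \hat{\mathbb{P}}(D \mid E^c) = \frac{X_{01}}{X_{0\cdot}}. \]
Thus,
\[ \hat\psi = \frac{X_{11} X_{00}}{X_{01} X_{10}} \]
just as before.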
Case-Control (Retrospective) Sampling. Here we get
some diseased and non-diseased people and we observe how
many are exposed. This is much more efficient if the disease
is rare. Hence,
\[ X_{10} \sim \text{Binomial}(X_{\cdot 0}, \mathbb{P}(E \mid D^c)), \qquad X_{11} \sim \text{Binomial}(X_{\cdot 1}, \mathbb{P}(E \mid D)). \]
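From these data we can estimate $\mathbb{P}(E \mid D)$ and $\mathbb{P}(E \mid D^c)$. Surprisingly, we can also still estimate $\psi$. To understand why, note
that
\[ \mathbb{P}(E \mid D) = \frac{p_{11}}{p_{01} + p_{11}}, \qquad 1 - \mathbb{P}(E \mid D) = \frac{p_{01}}{p_{01} + p_{11}}, \qquad \text{odds}(E \mid D) = \frac{p_{11}}{p_{01}}. \]
By a similar argument,
\[ \text{odds}(E \mid D^c) = \frac{p_{10}}{p_{00}}. \]
Hence,
\[ \frac{\text{odds}(E \mid D)}{\text{odds}(E \mid D^c)} = \frac{p_{11}\, p_{00}}{p_{01}\, p_{10}} = \psi. \]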
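From the data, we form the following estimates:
\[ \hat{\mathbb{P}}(E \mid D) = \frac{X_{11}}{X_{\cdot 1}}, \quad 1 - \hat{\mathbb{P}}(E \mid D) = \frac{X_{01}}{X_{\cdot 1}}, \quad \widehat{\text{odds}}(E \mid D) = \frac{X_{11}}{X_{01}}, \quad \widehat{\text{odds}}(E \mid D^c) = \frac{X_{10}}{X_{00}}. \]
Therefore,
\[ \hat\psi = \frac{X_{00} X_{11}}{X_{01} X_{10}}. \]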
So in all three data collection methods, the estimate of $\psi$ turns
out to be the same.
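The agreement of the three estimates can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the counts are hypothetical, chosen only for illustration. It computes the estimated odds ratio once from cell proportions, once from the odds of disease within each exposure group, and once from the odds of exposure within each disease group.

```python
def odds(p):
    """Odds of an event that has probability p."""
    return p / (1.0 - p)


def odds_ratio_three_ways(x00, x01, x10, x11):
    """Estimate the odds ratio from a 2x2 table in three ways.

    First index: exposure (0 = not exposed, 1 = exposed).
    Second index: disease (0 = no disease, 1 = disease).
    """
    n = x00 + x01 + x10 + x11
    # Multinomial sampling: plug the cell proportions into the definition.
    multinomial = (x11 / n) * (x00 / n) / ((x01 / n) * (x10 / n))
    # Prospective sampling: odds of disease within each exposure group.
    prospective = odds(x11 / (x10 + x11)) / odds(x01 / (x00 + x01))
    # Case-control sampling: odds of exposure within each disease group.
    case_control = odds(x11 / (x01 + x11)) / odds(x10 / (x00 + x10))
    return multinomial, prospective, case_control


# Hypothetical counts, used only to illustrate the identity.
estimates = odds_ratio_three_ways(x00=50, x01=10, x10=30, x11=20)
print(estimates)
```

All three numbers equal $X_{11}X_{00}/(X_{01}X_{10})$ up to floating point rounding.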
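It is tempting to try to estimate $\mathbb{P}(D \mid E) - \mathbb{P}(D \mid E^c)$. In a
case-control design, this quantity is not estimable. To see this,
we apply Bayes' theorem to get
\[ \mathbb{P}(D \mid E) - \mathbb{P}(D \mid E^c) = \frac{\mathbb{P}(E \mid D)\,\mathbb{P}(D)}{\mathbb{P}(E)} - \frac{\mathbb{P}(E^c \mid D)\,\mathbb{P}(D)}{\mathbb{P}(E^c)}. \]
Because of the way we obtained the data, $\mathbb{P}(D)$ is not estimable
from the data. However, we can estimate $\xi = \mathbb{P}(D \mid E)/\mathbb{P}(D \mid E^c)$,
which is called the relative risk, under the
rare disease assumption.
Theorem 16.9 Let $\xi = \mathbb{P}(D \mid E)/\mathbb{P}(D \mid E^c)$. Then
\[ \frac{\psi}{\xi} \to 1 \]
as $\mathbb{P}(D) \to 0$.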
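A short derivation sketch may make the theorem plausible. It is not taken from the original text and assumes that $\mathbb{P}(E)$ stays fixed with $0 < \mathbb{P}(E) < 1$ as $\mathbb{P}(D) \to 0$. Writing out the definitions of the odds ratio and the relative risk gives:

```latex
\[
\frac{\psi}{\xi}
  = \frac{\mathbb{P}(D \mid E)\,/\,\bigl(1 - \mathbb{P}(D \mid E)\bigr)}
         {\mathbb{P}(D \mid E^c)\,/\,\bigl(1 - \mathbb{P}(D \mid E^c)\bigr)}
    \cdot \frac{\mathbb{P}(D \mid E^c)}{\mathbb{P}(D \mid E)}
  = \frac{1 - \mathbb{P}(D \mid E^c)}{1 - \mathbb{P}(D \mid E)} .
\]
% Since P(D | E) <= P(D)/P(E) and P(D | E^c) <= P(D)/P(E^c),
% both conditional probabilities tend to 0 as P(D) -> 0,
% so the ratio on the right tends to 1.
```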
Thus, under the rare disease assumption, the relative risk is
approximately the same as the odds ratio and, as we have seen,
we can estimate the odds ratio.
In a randomized experiment, we can interpret a strong association, that is $\psi \neq 1$, as a causal relationship. In an observational
(non-randomized) study, the association can be due to other
unobserved confounding variables. We'll discuss causation in
more detail later.
16.3
Two Discrete Variables
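Now suppose that $Y \in \{1, \ldots, I\}$ and $Z \in \{1, \ldots, J\}$ are two
discrete variables. The data can be represented as an $I$ by $J$
table of counts:
\[
\begin{array}{c|cccccc|c}
 & Z=1 & Z=2 & \cdots & Z=j & \cdots & Z=J & \\ \hline
Y=1 & X_{11} & X_{12} & \cdots & X_{1j} & \cdots & X_{1J} & X_{1\cdot} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
Y=i & X_{i1} & X_{i2} & \cdots & X_{ij} & \cdots & X_{iJ} & X_{i\cdot} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
Y=I & X_{I1} & X_{I2} & \cdots & X_{Ij} & \cdots & X_{IJ} & X_{I\cdot} \\ \hline
 & X_{\cdot 1} & X_{\cdot 2} & \cdots & X_{\cdot j} & \cdots & X_{\cdot J} & n
\end{array}
\]
Consider testing $H_0 : Y \amalg Z$ versus $H_1 : Y \not\amalg Z$.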
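Theorem 16.10 Let
\[ T = 2 \sum_{i=1}^{I} \sum_{j=1}^{J} X_{ij} \log\left( \frac{X_{ij}\, X_{\cdot\cdot}}{X_{i\cdot}\, X_{\cdot j}} \right). \tag{16.11} \]
The limiting distribution of $T$ under the null hypothesis
of independence is $\chi^2_\nu$ where $\nu = (I-1)(J-1)$. Pearson's
$\chi^2$ test statistic is
\[ U = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(X_{ij} - E_{ij})^2}{E_{ij}}, \tag{16.12} \]
where $E_{ij} = X_{i\cdot} X_{\cdot j}/n$ is the expected count in cell $(i,j)$ under independence.
Asymptotically, under $H_0$, $U$ has a $\chi^2_\nu$ distribution where
$\nu = (I-1)(J-1)$.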
Example 16.11 These data are from a study by Hancock et al
(1979). Patients with Hodgkin's disease are classified by their response to treatment and by histological type.

Type   Positive Response   Partial Response   No Response   Total
LP            74                  18               12         104
NS            68                  16               12          96
MC           154                  54               58         266
LD            18                  10               44          72

The $\chi^2$ test statistic is 75.89 with $2 \times 3 = 6$ degrees of freedom.
The p-value is $\mathbb{P}(\chi^2_6 > 75.89) \approx 0$. The likelihood ratio test
statistic is 68.30 with $2 \times 3 = 6$ degrees of freedom. The p-value
is $\mathbb{P}(\chi^2_6 > 68.30) \approx 0$. Thus there is strong evidence that response
to treatment and histological type are associated.
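The two statistics in Example 16.11 can be reproduced directly from the table of counts. The sketch below is an illustration rather than the author's code; the function name `independence_statistics` is made up for this example, and the expected counts are taken to be (row total) x (column total) / n.

```python
import math

# Counts from Example 16.11. Rows: histological type (LP, NS, MC, LD).
# Columns: positive, partial and no response.
counts = [
    [74, 18, 12],
    [68, 16, 12],
    [154, 54, 58],
    [18, 10, 44],
]


def independence_statistics(table):
    """Return (T, U, df): likelihood ratio statistic, Pearson statistic, degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    t_stat = 0.0
    u_stat = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if x > 0:  # a zero count contributes nothing to T
                t_stat += 2.0 * x * math.log(x / expected)
            u_stat += (x - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return t_stat, u_stat, df


T, U, df = independence_statistics(counts)
print(f"T = {T:.2f}, U = {U:.2f}, df = {df}")
```

The p-values would then come from the upper tail of a $\chi^2_6$ distribution, which is omitted here to keep the sketch free of external libraries.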
There are a variety of ways to quantify the strength of dependence between two discrete variables Y and Z . Most of them
are not very intuitive. The one we shall use is not standard but
is more interpretable.
We define
\[ \delta(Y, Z) = \max_{A, B} \left| \mathbb{P}_{Y,Z}(Y \in A, Z \in B) - \mathbb{P}_Y(Y \in A)\, \mathbb{P}_Z(Z \in B) \right| \tag{16.13} \]
where the maximum is over all pairs of events $A$ and $B$.
Theorem 16.12 Properties of $\delta$:
1. $0 \le \delta(Y, Z) \le 1$.
2. $\delta(Y, Z) = 0$ if and only if $Y \amalg Z$.
3. The following identity holds:
\[ \delta(Y, Z) = \frac{1}{2} \sum_{i=1}^{I} \sum_{j=1}^{J} \left| p_{ij} - p_{i\cdot}\, p_{\cdot j} \right|. \tag{16.14} \]
4. The mle of $\delta$ is
\[ \hat\delta(Y, Z) = \frac{1}{2} \sum_{i=1}^{I} \sum_{j=1}^{J} \left| \hat p_{ij} - \hat p_{i\cdot}\, \hat p_{\cdot j} \right| \tag{16.15} \]
where
\[ \hat p_{ij} = \frac{X_{ij}}{n}, \quad \hat p_{i\cdot} = \frac{X_{i\cdot}}{n}, \quad \hat p_{\cdot j} = \frac{X_{\cdot j}}{n}. \]
The interpretation of $\delta$ is this: if one person makes probability
statements assuming independence and another person makes
probability statements without assuming independence, their
probability statements may differ by as much as $\delta$. Here is a
suggested scale for interpreting $\delta$:

$0 < \delta \le .01$      negligible association
$.01 < \delta \le .05$    non-negligible association
$.05 < \delta \le .10$    substantial association
$\delta > .10$            very strong association
A confidence interval for $\delta$ can be obtained by bootstrapping.
The steps for bootstrapping are:
1. Draw $X^* \sim \text{Multinomial}(n, \hat p)$.
2. Compute $\hat p^*_{ij}$, $\hat p^*_{i\cdot}$, $\hat p^*_{\cdot j}$.
3. Compute $\delta^*$.
4. Repeat.
Now we use any of the methods we learned earlier for constructing bootstrap confidence intervals. However, we should not use
a Wald interval in this case. The reason is that if $Y \amalg Z$ then
$\delta = 0$ and we are on the boundary of the parameter space. In
this case, the Wald method is not valid.
Example 16.13 Returning to Example 16.11 we find that $\hat\delta = .11$.
Using a pivotal bootstrap with 10,000 bootstrap samples, a 95
per cent confidence interval for $\delta$ is (.09, .22). Our conclusion
is that the association between histological type and response is
substantial.
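The bootstrap steps for $\delta$ can be sketched as follows. This is an illustrative implementation under stated assumptions, not the author's code: the helper names, the seed and the number of bootstrap samples (2,000 rather than 10,000, to keep the run short) are choices made here, and the resulting interval depends on them and need not match the figures quoted in the example.

```python
import random

# Counts from Example 16.11 (rows: LP, NS, MC, LD; columns: response category).
counts = [[74, 18, 12], [68, 16, 12], [154, 54, 58], [18, 10, 44]]


def delta_hat(table):
    """Plug-in estimate: half the summed |p_ij - p_i. * p_.j| over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return 0.5 * sum(
        abs(x / n - row_totals[i] * col_totals[j] / n ** 2)
        for i, row in enumerate(table)
        for j, x in enumerate(row)
    )


def bootstrap_deltas(table, n_boot, seed):
    """Draw tables from Multinomial(n, p_hat) and recompute delta on each."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    weights = [table[i][j] for i, j in cells]
    n = sum(weights)
    draws = []
    for _ in range(n_boot):
        resampled = [[0] * n_cols for _ in range(n_rows)]
        for i, j in rng.choices(cells, weights=weights, k=n):
            resampled[i][j] += 1
        draws.append(delta_hat(resampled))
    return draws


def pivotal_interval(estimate, draws, alpha=0.05):
    """Pivotal interval (2*estimate - upper quantile, 2*estimate - lower quantile)."""
    ordered = sorted(draws)
    lower_q = ordered[int((alpha / 2) * (len(ordered) - 1))]
    upper_q = ordered[int((1 - alpha / 2) * (len(ordered) - 1))]
    return 2 * estimate - upper_q, 2 * estimate - lower_q


d_hat = delta_hat(counts)
lower, upper = pivotal_interval(d_hat, bootstrap_deltas(counts, n_boot=2000, seed=0))
print(f"delta_hat = {d_hat:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

A percentile or other bootstrap interval could be substituted in `pivotal_interval`; the text only rules out the Wald interval.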
16.4
Two Continuous Variables
Now suppose that $Y$ and $Z$ are both continuous. If we assume
that the joint distribution of $Y$ and $Z$ is bivariate Normal, then
we measure the dependence between $Y$ and $Z$ by means of the
correlation coefficient $\rho$. Tests, estimates and confidence intervals for $\rho$ in the Normal case are given in the previous chapter.
If we do not assume Normality then we need a nonparametric
method for assessing dependence.
Recall that the correlation is
\[ \rho = \frac{\mathbb{E}\bigl((X_1 - \mu_1)(X_2 - \mu_2)\bigr)}{\sigma_1 \sigma_2}. \]
A nonparametric estimator of $\rho$ is the plug-in estimator which
is
\[ \hat\rho = \frac{\sum_{i=1}^{n} (X_{1i} - \bar X_1)(X_{2i} - \bar X_2)}{\sqrt{\sum_{i=1}^{n} (X_{1i} - \bar X_1)^2 \, \sum_{i=1}^{n} (X_{2i} - \bar X_2)^2}} \]
which is just the sample correlation. A confidence interval can be
constructed using the bootstrap. A test for $\rho = 0$ can be based
on the Wald test using the bootstrap to estimate the standard
error.
The plug-in approach is useful for large samples. For small
samples, we measure the correlation using the Spearman rank
correlation coefficient $\hat\rho_S$. We simply replace the data by their
ranks, ranking each variable separately, then we compute the
correlation coefficient of the ranks. To test the null hypothesis
that $\rho_S = 0$ we need the distribution of $\hat\rho_S$ under the null hypothesis. This can be obtained easily by simulation. We fix the
ranks of the first variable as $1, 2, \ldots, n$. The ranks of the second
variable are chosen at random from the set of $n!$ possible orderings. Then we compute the correlation. This is repeated many
times and the resulting distribution $\mathbb{P}_0$ is the null distribution of
$\hat\rho_S$. The p-value for the test is $\mathbb{P}_0(|R| > |\hat\rho_S|)$ where $R$ is drawn
from the null distribution $\mathbb{P}_0$.
Example 16.14 The following data (Snedecor and Cochran, 1980,
p. 191) are systolic blood pressure $X_1$ and diastolic blood
pressure $X_2$ in millimeters:

X1   100  105  110  110  120  120  125  130  130  150  170  195
X2    65   65   75   70   78   80   75   82   80   90   95   90

The sample correlation is $\hat\rho = .88$. The bootstrap standard
error is .046 and the Wald test statistic is $.88/.046 = 19.23$.
The p-value is near 0 giving strong evidence that the correlation
is not 0. The 95 per cent pivotal bootstrap confidence interval is
(.78, .94). Because the sample size is small, consider Spearman's
rank correlation. In ranking the data, we will use average ranks
if there are ties. So if the third and fourth lowest numbers are
the same, they each get rank 3.5. The ranks of the data are:

X1   1    2    3.5  3.5  5.5  5.5  7    8.5  8.5  10    11   12
X2   1.5  1.5  4.5  3    6    7.5  4.5  9    7.5  10.5  12   10.5

The rank correlation is $\hat\rho_S = .94$ and the p-value for testing the
null hypothesis that there is no correlation is $\mathbb{P}_0(|R| > .94) \approx 0$
which was obtained by simulation.
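The rank correlation and its simulated null distribution in Example 16.14 can be sketched as below. The function names, the seed and the number of simulations are assumptions made for this illustration. One further assumption: because the data contain ties, the sketch permutes the observed average ranks of the second variable instead of drawing an ordering of $1, \ldots, n$.

```python
import random

systolic = [100, 105, 110, 110, 120, 120, 125, 130, 130, 150, 170, 195]
diastolic = [65, 65, 75, 70, 78, 80, 75, 82, 80, 90, 95, 90]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1, ..., end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def correlation(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_permutation_test(x, y, n_sim=10000, seed=0):
    """Return the rank correlation and a simulated two-sided p-value."""
    rng = random.Random(seed)
    ranks_x, ranks_y = average_ranks(x), average_ranks(y)
    observed = correlation(ranks_x, ranks_y)
    shuffled = ranks_y[:]
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # random ordering of the second variable's ranks
        if abs(correlation(ranks_x, shuffled)) > abs(observed):
            exceed += 1
    return observed, exceed / n_sim


rho_s, p_value = spearman_permutation_test(systolic, diastolic)
print(f"rank correlation = {rho_s:.2f}, simulated p-value = {p_value:.4f}")
```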
16.5
One Continuous Variable and One Discrete
Suppose that $Y \in \{1, \ldots, I\}$ is discrete and $Z$ is continuous.
Let $F_i(z) = \mathbb{P}(Z \le z \mid Y = i)$ denote the cdf of $Z$ conditional
on $Y = i$.
Theorem 16.15 When $Y \in \{1, \ldots, I\}$ is discrete and $Z$ is continuous, then $Y \amalg Z$ if and only if $F_1 = \cdots = F_I$.
It follows from the previous theorem that to test for independence, we need to test
\[ H_0 : F_1 = \cdots = F_I \quad\text{versus}\quad H_1 : \text{not } H_0. \]
For simplicity, we consider the case where $I = 2$. To test the
null hypothesis that $F_1 = F_2$ we will use the two sample
Kolmogorov-Smirnov test. Let $n_1$ denote the number of observations for which $Y_i = 1$ and let $n_2$ denote the number of
observations for which $Y_i = 2$. Let
\[ \hat F_1(z) = \frac{1}{n_1} \sum_{i=1}^{n} I(Z_i \le z)\, I(Y_i = 1) \quad\text{and}\quad \hat F_2(z) = \frac{1}{n_2} \sum_{i=1}^{n} I(Z_i \le z)\, I(Y_i = 2) \]
denote the empirical distribution functions of $Z$ given $Y = 1$ and
$Y = 2$ respectively. Define the test statistic
\[ D = \sup_x \left| \hat F_1(x) - \hat F_2(x) \right|. \]
Theorem 16.16 Let
\[ H(t) = 1 - 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 t^2}. \]
Under the null hypothesis that $F_1 = F_2$,
\[ \lim_{n \to \infty} \mathbb{P}\left( \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\; D \le t \right) = H(t). \]
It follows from the theorem that an approximate level $\alpha$ test
is obtained by rejecting $H_0$ when
\[ \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\; D > H^{-1}(1 - \alpha). \]
16.6
Bibliographic Remarks
Johnson, S.K. and Johnson, R.E. (1972). New England Journal
of Medicine. 287. 1122-1125.
Hancock, B.W. (1979). Clinical Oncology, 5, 283-297.
16.7
Exercises
4. Prove equation (16.14).
5. The New York Times (January 8, 2003, page A12) reported the following data on death sentencing and race,
from a study in Maryland:

               Death Sentence   No Death Sentence
Black Victim         14                641
White Victim         62                594

(The data here are an approximate re-creation using the information in the article.)
Analyze the data using the tools from this Chapter. Interpret the results. Explain why, based only on this information, you can't make causal conclusions. (The authors
of the study did use much more information in their full
report.)
6. Analyze the data on the variables Age and Financial Status from:
http://lib.stat.cmu.edu/DASL/Datafiles/montanadat.html
7. Estimate the correlation between temperature and latitude using the data from
http://lib.stat.cmu.edu/DASL/Datafiles/USTemperatures.html
Use the correlation coefficient and the Spearman rank correlation. Provide estimates, tests and confidence intervals.
8. Test whether calcium intake and drop in blood pressure
are associated. Use the data in
http://lib.stat.cmu.edu/DASL/Datafiles/Calcium.html
17
Undirected Graphs and Conditional
Independence
Graphical models are a class of multivariate statistical models
that are useful for representing independence relations. They are
also useful for developing parsimonious models for multivariate data.
To see why parsimonious models are useful in the multivariate
setting, consider the problem of estimating the joint distribution of several discrete random variables. Two binary variables
$Y_1$ and $Y_2$ can be represented as a two-by-two table which corresponds to a multinomial with four categories. Similarly, $k$ binary
variables $Y_1, \ldots, Y_k$ correspond to a multinomial with $N = 2^k$
categories. When $k$ is even moderately large, $N = 2^k$ will be
huge. It can be shown in that case that the mle is a poor estimator. The reason is that the data are sparse: there are not
enough data to estimate so many parameters. Graphical models often require fewer parameters and may lead to estimators
with smaller risk. There are two main types of graphical models:
undirected and directed. Here, we introduce undirected graphs.
We'll discuss directed graphs later.
17.1
Conditional Independence
Underlying graphical models is the concept of conditional independence.
Definition 17.1 Let $X$, $Y$ and $Z$ be discrete random variables. We say that $X$ and $Y$ are conditionally independent given $Z$, written $X \amalg Y \mid Z$, if
\[ \mathbb{P}(X = x, Y = y \mid Z = z) = \mathbb{P}(X = x \mid Z = z)\, \mathbb{P}(Y = y \mid Z = z) \tag{17.1} \]
for all $x, y, z$. If $X$, $Y$ and $Z$ are continuous random variables, we say that $X$ and $Y$ are conditionally independent
given $Z$ if
\[ f_{X,Y \mid Z}(x, y \mid z) = f_{X \mid Z}(x \mid z)\, f_{Y \mid Z}(y \mid z) \]
for all $x$, $y$ and $z$.
Intuitively, this means that, once you know Z , Y provides no
extra information about X .
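For discrete variables, equation (17.1) can be checked directly on a joint probability table. The following sketch is an illustration and not from the original text: the function name, the dictionary representation and both example distributions are hypothetical.

```python
from itertools import product


def is_conditionally_independent(joint, tol=1e-12):
    """Check P(x, y | z) = P(x | z) * P(y | z) for every x, y and every z with P(z) > 0.

    `joint` maps (x, y, z) to a probability.
    """
    xs = sorted({key[0] for key in joint})
    ys = sorted({key[1] for key in joint})
    zs = sorted({key[2] for key in joint})
    for z in zs:
        p_z = sum(joint.get((x, y, z), 0.0) for x in xs for y in ys)
        if p_z == 0:
            continue  # conditioning on a null event is skipped
        for x, y in product(xs, ys):
            p_xy = joint.get((x, y, z), 0.0) / p_z
            p_x = sum(joint.get((x, other, z), 0.0) for other in ys) / p_z
            p_y = sum(joint.get((other, y, z), 0.0) for other in xs) / p_z
            if abs(p_xy - p_x * p_y) > tol:
                return False
    return True


# Hypothetical example 1: given Z, X and Y are drawn independently.
px_given_z = {0: [0.7, 0.3], 1: [0.2, 0.8]}
py_given_z = {0: [0.4, 0.6], 1: [0.9, 0.1]}
independent_joint = {
    (x, y, z): 0.5 * px_given_z[z][x] * py_given_z[z][y]
    for x, y, z in product(range(2), repeat=3)
}

# Hypothetical example 2: Y is a copy of X, so Z cannot explain their dependence.
copy_joint = {(0, 0, 0): 0.25, (1, 1, 0): 0.25, (0, 0, 1): 0.25, (1, 1, 1): 0.25}

print(is_conditionally_independent(independent_joint))
print(is_conditionally_independent(copy_joint))
```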
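The conditional independence relation satisfies some basic
properties.
Theorem 17.2 The following implications hold:
\[
\begin{aligned}
X \amalg Y \mid Z &\implies Y \amalg X \mid Z \\
X \amalg Y \mid Z \text{ and } U = h(X) &\implies U \amalg Y \mid Z \\
X \amalg Y \mid Z \text{ and } U = h(X) &\implies X \amalg Y \mid (Z, U) \\
X \amalg Y \mid Z \text{ and } X \amalg W \mid (Y, Z) &\implies X \amalg (W, Y) \mid Z \\
X \amalg Y \mid Z \text{ and } X \amalg Z \mid Y &\implies X \amalg (Y, Z).
\end{aligned}
\]
The last property requires the assumption that all events have
positive probability; the first four do not.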
FIGURE 17.1. A graph with vertices $V = \{X, Y, Z\}$. The edge set is $E = \{(X, Y), (Y, Z)\}$.
17.2
Undirected Graphs
An undirected graph $\mathcal{G} = (V, E)$ has a finite set $V$ of vertices (or nodes) and a set $E$ of edges (or arcs) consisting of
pairs of vertices. The vertices correspond to random variables
$X, Y, Z, \ldots$ and edges are written as unordered pairs. For example, $(X, Y) \in E$ means that $X$ and $Y$ are joined by an edge.
An example of a graph is in Figure 17.1.
Two vertices are adjacent, written $X \sim Y$, if there is an edge
between them. In Figure 17.1, $X$ and $Y$ are adjacent but $X$ and
$Z$ are not adjacent. A sequence $X_0, \ldots, X_n$ is called a path if
$X_{i-1} \sim X_i$ for each $i$. In Figure 17.1, $X, Y, Z$ is a path. A graph
is complete if there is an edge between every pair of vertices.
A subset $U \subset V$ of vertices together with their edges is called a
subgraph.
If $A$, $B$ and $C$ are three distinct subsets of $V$, we say that $C$
separates $A$ and $B$ if every path from a variable in $A$ to a
variable in $B$ intersects a variable in $C$. In Figure 17.2 $\{Y, W\}$
and $\{Z\}$ are separated by $\{X\}$. Also, $W$ and $Z$ are separated
by $\{X, Y\}$.
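Separation can be checked mechanically with a graph search: remove the vertices in $C$ and see whether any vertex of $B$ is still reachable from $A$. The sketch below is an illustration, not the author's code. The function name is hypothetical, and the example uses the graph of Figure 17.1, whose edge set is given in its caption.

```python
from collections import deque


def separates(edges, a, b, c):
    """True if every path from a vertex in `a` to a vertex in `b` passes through `c`."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    blocked = set(c)
    targets = set(b)
    start = set(a) - blocked
    visited = set(start)
    queue = deque(start)
    # Breadth-first search that never enters a vertex of c.
    while queue:
        node = queue.popleft()
        if node in targets:
            return False
        for nxt in neighbours.get(node, ()):
            if nxt not in visited and nxt not in blocked:
                visited.add(nxt)
                queue.append(nxt)
    return True


# The graph of Figure 17.1: E = {(X, Y), (Y, Z)}.
figure_17_1 = [("X", "Y"), ("Y", "Z")]
print(separates(figure_17_1, {"X"}, {"Z"}, {"Y"}))
print(separates(figure_17_1, {"X"}, {"Z"}, set()))
```

In Figure 17.1, $\{Y\}$ separates $X$ and $Z$, while the empty set does not because $X, Y, Z$ is a path.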
FIGURE 17.2. $\{Y, W\}$ and $\{Z\}$ are separated by $\{X\}$. Also, $W$ and $Z$ are separated by $\{X, Y\}$.
FIGURE 17.3. $X \amalg Z \mid Y$.
17.3
Probability and Graphs
Let $V$ be a set of random variables with distribution $\mathbb{P}$. Construct a graph with one vertex for each random variable in $V$.
Suppose we omit the edge between a pair of variables if they are
independent given the rest of the variables:
\[ \text{no edge between } X \text{ and } Y \iff X \amalg Y \mid \text{rest} \]
where "rest" refers to all the other variables besides $X$ and $Y$.
This type of graph is called a pairwise Markov graph. Some
examples are shown in Figures 17.3, 17.4, 17.5 and 17.6.
FIGURE 17.4. No implied independence relations.
FIGURE 17.5. $X \amalg Z \mid \{Y, W\}$ and $Y \amalg W \mid \{X, Z\}$.
FIGURE 17.6. Pairwise independence implies that $X \amalg Z \mid \{Y, W\}$. But is $X \amalg Z \mid Y$?
The graph encodes a set of pairwise conditional independence
relations. These relations imply other conditional independence
relations. How can we figure out what they are? Fortunately, we
can read these other conditional independence relations directly
from the graph as well, as is explained in the next theorem.
Theorem 17.3 Let $\mathcal{G} = (V, E)$ be a pairwise Markov graph for
a distribution $\mathbb{P}$. Let $A$, $B$ and $C$ be distinct subsets of $V$ such
that $C$ separates $A$ and $B$. Then $A \amalg B \mid C$.
Remark 17.4 If $A$ and $B$ are not connected (i.e. there is no path
from $A$ to $B$) then we may regard $A$ and $B$ as being separated
by the empty set. Then Theorem 17.3 implies that $A \amalg B$.
The independence condition in Theorem 17.3 is called the
global Markov property. We thus see that the pairwise and
global Markov properties are equivalent. Let us state this more
precisely. Given a graph $\mathcal{G}$, let $M_{\text{pair}}(\mathcal{G})$ be the set of distributions which satisfy the pairwise Markov property: thus $\mathbb{P} \in
M_{\text{pair}}(\mathcal{G})$ if, under $\mathbb{P}$, $X \amalg Y \mid \text{rest}$ if and only if there is no edge
between $X$ and $Y$. Let $M_{\text{global}}(\mathcal{G})$ be the set of distributions
which satisfy the global Markov property: thus $\mathbb{P} \in M_{\text{global}}(\mathcal{G})$ if,
under $\mathbb{P}$, $A \amalg B \mid C$ if and only if $C$ separates $A$ and $B$.
Theorem 17.5 Let $\mathcal{G}$ be a graph. Then, $M_{\text{pair}}(\mathcal{G}) = M_{\text{global}}(\mathcal{G})$.
This theorem allows us to construct graphs using the simpler
pairwise property and then we can deduce other independence
relations using the global Markov property. Think how hard this
would be to do algebraically. Returning to Figure 17.6, we now see that
$X \amalg Z \mid Y$ and $Y \amalg W \mid Z$.
Example 17.6 Figure 17.7 implies that $X \amalg Y$, $X \amalg Z$ and $X \amalg (Y, Z)$.
Example 17.7 Figure 17.8 implies that $X \amalg W \mid (Y, Z)$ and $X \amalg Z \mid Y$.
FIGURE 17.7. $X \amalg Y$, $X \amalg Z$ and $X \amalg (Y, Z)$.
FIGURE 17.8. $X \amalg W \mid (Y, Z)$ and $X \amalg Z \mid Y$.
17.4
Fitting Graphs to Data
Given a data set, how do we find a graphical model that fits
the data? Some authors have devoted whole books to this subject. We will only treat the discrete case and we will consider a
method based on log-linear models which are the subject of
the next chapter.
17.5
Bibliographic Remarks
Thorough treatments of undirected graphs can be found in Whittaker (1990) and Lauritzen (1996). Some of the exercises below
are adapted from Whittaker (1990).
17.6
Exercises
1. Consider random variables (X1 ; X2 ; X3 ). In each of the
following cases, draw a graph that has the given independence relations.
(a) $X_1 \amalg X_3 \mid X_2$.
(b) $X_1 \amalg X_2 \mid X_3$ and $X_1 \amalg X_3 \mid X_2$.
(c) $X_1 \amalg X_2 \mid X_3$ and $X_1 \amalg X_3 \mid X_2$ and $X_2 \amalg X_3 \mid X_1$.
2. Consider random variables (X1 ; X2 ; X3 ; X4 ). In each of the
following cases, draw a graph that has the given independence relations.
(a) $X_1 \amalg X_3 \mid X_2, X_4$ and $X_1 \amalg X_4 \mid X_2, X_3$ and $X_2 \amalg X_4 \mid X_1, X_3$.
(b) $X_1 \amalg X_2 \mid X_3, X_4$ and $X_1 \amalg X_3 \mid X_2, X_4$ and $X_2 \amalg X_3 \mid X_1, X_4$.
(c) $X_1 \amalg X_3 \mid X_2, X_4$ and $X_2 \amalg X_4 \mid X_1, X_3$.
FIGURE 17.9.
FIGURE 17.10.
3. A conditional independence between a pair of variables is
minimal if it is not possible to use the Separation Theorem to eliminate any variable from the conditioning set,
i.e. from the right hand side of the bar (Whittaker 1990).
Write down the minimal conditional independencies from:
(a) Figure 17.9; (b) Figure 17.10; (c) Figure 17.11; (d)
Figure 17.12.
FIGURE 17.11.
FIGURE 17.12.
4. Here are breast cancer data on diagnostic center ($X_1$),
nuclear grade ($X_2$), and survival ($X_3$):

                  X2:  malignant   malignant   benign   benign
                  X3:  died        survived    died     survived
X1:  Boston            35          59          47       112
     Glamorgan         42          77          26       76
(a) Treat this as a multinomial and find the maximum
likelihood estimator.
(b) If someone has a tumour classified as benign at the
Glamorgan clinic, what is the estimated probability that
they will die? Find the standard error for this estimate.
(c) Test the following hypotheses:
\[
\begin{aligned}
X_1 \amalg X_2 \mid X_3 &\quad\text{versus}\quad X_1 \not\amalg X_2 \mid X_3 \\
X_1 \amalg X_3 \mid X_2 &\quad\text{versus}\quad X_1 \not\amalg X_3 \mid X_2 \\
X_2 \amalg X_3 \mid X_1 &\quad\text{versus}\quad X_2 \not\amalg X_3 \mid X_1
\end{aligned}
\]
Based on the results of your tests, draw and interpret the
resulting graph.
18
Loglinear Models
In this chapter we study loglinear models which are useful for
modelling multivariate discrete data. There is a strong connection between loglinear models and undirected graphs. Parts of
this Chapter draw on the material in Whittaker (1990).
18.1
The Loglinear Model
Let $X = (X_1, \ldots, X_m)$ be a random vector with probability
function
\[ f(x) = \mathbb{P}(X = x) = \mathbb{P}(X_1 = x_1, \ldots, X_m = x_m) \]
where $x = (x_1, \ldots, x_m)$. Let $r_j$ be the number of values that
$X_j$ takes. Without loss of generality, we can assume that $X_j \in
\{0, 1, \ldots, r_j - 1\}$. Suppose now that we have $n$ such random vectors. We can think of the data as a sample from a Multinomial
with $N = r_1 \times r_2 \times \cdots \times r_m$ categories. The data can be represented as counts in a $r_1 \times r_2 \times \cdots \times r_m$ table. Let $p = (p_1, \ldots, p_N)$
denote the multinomial parameter.
Let $S = \{1, \ldots, m\}$. Given a vector $x = (x_1, \ldots, x_m)$ and a
subset $A \subset S$, let $x_A = (x_j : j \in A)$. For example, if $A = \{1, 3\}$
then $x_A = (x_1, x_3)$.
Theorem 18.1 The joint probability function $f(x)$ of a single
random vector $X = (X_1, \ldots, X_m)$ can be written as
\[ \log f(x) = \sum_{A \subset S} \psi_A(x) \tag{18.1} \]
where the sum is over all subsets $A$ of $S = \{1, \ldots, m\}$ and the
$\psi$'s satisfy the following conditions:
1. $\psi_\emptyset(x)$ is a constant;
2. For every $A \subset S$, $\psi_A(x)$ is only a function of $x_A$ and not
the rest of the $x_j$'s.
3. If $i \in A$ and $x_i = 0$, then $\psi_A(x) = 0$.
The formula in equation (18.1) is called the log-linear expansion of $f$. Note that this is the probability function for a
single draw. Each $\psi_A(x)$ will depend on some unknown parameters $\beta_A$. Let $\beta = (\beta_A : A \subset S)$ be the set of all these parameters.
We will write $f(x) = f(x; \beta)$ when we want to emphasize the dependence on the unknown parameters $\beta$.
In terms of the multinomial, the parameter space is
\[ \mathcal{P} = \Bigl\{ p = (p_1, \ldots, p_N) : \; p_j \ge 0, \; \sum_{j=1}^{N} p_j = 1 \Bigr\}. \]
This is an $N - 1$ dimensional space. In the log-linear representation, the parameter space is
\[ \Theta = \Bigl\{ \beta = (\beta_1, \ldots, \beta_N) : \; \beta = \beta(p), \; p \in \mathcal{P} \Bigr\} \]
where $\beta(p)$ is the set of $\beta$ values associated with $p$. The set $\Theta$ is
an $N - 1$ dimensional surface in $\mathbb{R}^N$. We can always go back and
forth between the two parameterizations: we can write $\beta = \beta(p)$
and $p = p(\beta)$.
Example 18.2 Let $X \sim \text{Bernoulli}(p)$ where $0 < p < 1$. We can
write the probability mass function for $X$ as
\[ f(x) = p^x (1 - p)^{1 - x} = p_1^x\, p_2^{1 - x} \]
for $x = 0, 1$, where $p_1 = p$ and $p_2 = 1 - p$. Hence,
\[ \log f(x) = \psi_\emptyset(x) + \psi_1(x) \]
where
\[ \psi_\emptyset(x) = \log(p_2), \qquad \psi_1(x) = x \log\left( \frac{p_1}{p_2} \right). \]
Notice that $\psi_\emptyset(x)$ is a constant (as a function of $x$) and $\psi_1(x) =
0$ when $x = 0$. Thus the three conditions of the Theorem hold.
The loglinear parameters are
\[ \beta_0 = \log(p_2), \qquad \beta_1 = \log\left( \frac{p_1}{p_2} \right). \]
The original, multinomial parameter space is $\mathcal{P} = \{(p_1, p_2) :
p_j \ge 0, \; p_1 + p_2 = 1\}$. The log-linear parameter space is
\[ \Theta = \bigl\{ (\beta_0, \beta_1) \in \mathbb{R}^2 : \; e^{\beta_0 + \beta_1} + e^{\beta_0} = 1 \bigr\}. \]
Given $(p_1, p_2)$ we can solve for $(\beta_0, \beta_1)$. Conversely, given $(\beta_0, \beta_1)$
we can solve for $(p_1, p_2)$.
Example 18.3 Let $X = (X_1, X_2)$ where $X_1 \in \{0, 1\}$ and $X_2 \in
\{0, 1, 2\}$. The joint distribution of $n$ such random vectors is a
multinomial with 6 categories. The multinomial parameters can
be written as a 2-by-3 table as follows:
\[
\begin{array}{cc|ccc}
\text{multinomial} & & & x_2 & \\
 & & 0 & 1 & 2 \\ \hline
x_1 & 0 & p_{00} & p_{01} & p_{02} \\
 & 1 & p_{10} & p_{11} & p_{12}
\end{array}
\]
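The $n$ data vectors can be summarized as counts:
\[
\begin{array}{cc|ccc}
\text{data} & & & x_2 & \\
 & & 0 & 1 & 2 \\ \hline
x_1 & 0 & C_{00} & C_{01} & C_{02} \\
 & 1 & C_{10} & C_{11} & C_{12}
\end{array}
\]
For $x = (x_1, x_2)$, the log-linear expansion takes the form
\[ \log f(x) = \psi_\emptyset(x) + \psi_1(x) + \psi_2(x) + \psi_{12}(x) \]
where
\[
\begin{aligned}
\psi_\emptyset(x) &= \log p_{00} \\
\psi_1(x) &= x_1 \log\left( \frac{p_{10}}{p_{00}} \right) \\
\psi_2(x) &= I(x_2 = 1) \log\left( \frac{p_{01}}{p_{00}} \right) + I(x_2 = 2) \log\left( \frac{p_{02}}{p_{00}} \right) \\
\psi_{12}(x) &= I(x_1 = 1, x_2 = 1) \log\left( \frac{p_{11}\, p_{00}}{p_{01}\, p_{10}} \right) + I(x_1 = 1, x_2 = 2) \log\left( \frac{p_{12}\, p_{00}}{p_{02}\, p_{10}} \right).
\end{aligned}
\]
Convince yourself that the three conditions on the $\psi$'s of the
theorem are satisfied. The six parameters of this model are:
\[
\begin{aligned}
\beta_1 &= \log p_{00}, & \beta_2 &= \log\left( \frac{p_{10}}{p_{00}} \right), & \beta_3 &= \log\left( \frac{p_{01}}{p_{00}} \right), \\
\beta_4 &= \log\left( \frac{p_{02}}{p_{00}} \right), & \beta_5 &= \log\left( \frac{p_{11}\, p_{00}}{p_{01}\, p_{10}} \right), & \beta_6 &= \log\left( \frac{p_{12}\, p_{00}}{p_{02}\, p_{10}} \right).
\end{aligned}
\]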
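The correspondence between the two parameterizations in Example 18.3 can be verified numerically. The sketch below is an illustration only: the function names and the probability table are hypothetical. It maps a 2-by-3 table $p$ to the six parameters and then rebuilds each cell probability from the log-linear expansion.

```python
import math


def loglinear_parameters(p):
    """Map a 2x3 table p[x1][x2] to the six parameters (beta_1, ..., beta_6) of Example 18.3."""
    b1 = math.log(p[0][0])
    b2 = math.log(p[1][0] / p[0][0])
    b3 = math.log(p[0][1] / p[0][0])
    b4 = math.log(p[0][2] / p[0][0])
    b5 = math.log(p[1][1] * p[0][0] / (p[0][1] * p[1][0]))
    b6 = math.log(p[1][2] * p[0][0] / (p[0][2] * p[1][0]))
    return b1, b2, b3, b4, b5, b6


def log_f(x1, x2, beta):
    """Evaluate log f(x) = psi_empty + psi_1 + psi_2 + psi_12 at x = (x1, x2)."""
    b1, b2, b3, b4, b5, b6 = beta
    value = b1                      # psi_empty: constant term
    value += x1 * b2                # psi_1: vanishes when x1 = 0
    value += b3 * (x2 == 1) + b4 * (x2 == 2)                            # psi_2
    value += b5 * (x1 == 1 and x2 == 1) + b6 * (x1 == 1 and x2 == 2)    # psi_12
    return value


# Hypothetical cell probabilities (they sum to 1), for illustration only.
p = [[0.10, 0.20, 0.15], [0.25, 0.05, 0.25]]
beta = loglinear_parameters(p)
max_error = max(
    abs(math.exp(log_f(x1, x2, beta)) - p[x1][x2])
    for x1 in range(2)
    for x2 in range(3)
)
print(max_error)
```

The maximum reconstruction error is at the level of floating point rounding, which illustrates that $p$ can be recovered from $\beta$.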
The next Theorem gives an easy way to check for conditional
independence in a loglinear model.
Theorem 18.4 Let $(X_a, X_b, X_c)$ be a partition of a vector $(X_1, \ldots, X_m)$.
Then $X_b \amalg X_c \mid X_a$ if and only if all the $\psi$-terms in the log-linear
expansion that have at least one coordinate in $b$ and one coordinate in $c$ are 0.
To prove this Theorem, we will use the following Lemma
whose proof follows easily from the definition of conditional independence.
Lemma 18.5 A partition $(X_a, X_b, X_c)$ satisfies $X_b \amalg X_c \mid X_a$ if and
only if $f(x_a, x_b, x_c) = g(x_a, x_b)\, h(x_a, x_c)$ for some functions $g$ and
$h$.
Proof. (Theorem 18.4.) Suppose that $\psi_t$ is 0 whenever $t$ has
coordinates in $b$ and $c$. Hence, $\psi_t$ is 0 unless $t \subset a \cup b$ or $t \subset a \cup c$.
Therefore
\[ \log f(x) = \sum_{t \subset a \cup b} \psi_t(x) + \sum_{t \subset a \cup c} \psi_t(x) - \sum_{t \subset a} \psi_t(x). \]
Exponentiating, we see that the joint density is of the form
$g(x_a, x_b)\, h(x_a, x_c)$. By Lemma 18.5, $X_b \amalg X_c \mid X_a$. The converse
follows by reversing the argument.
18.2
Graphical Log-Linear Models
A log-linear model is graphical if missing terms correspond
only to conditional independence constraints.
Definition 18.6 Let $\log f(x) = \sum_{A \subset S} \psi_A(x)$ be a log-linear
model. Then $f$ is graphical if all $\psi$-terms are non-zero
except for any pair of coordinates not in the edge set for
some graph $\mathcal{G}$. In other words, $\psi_A(x) = 0$ if and only if
$\{i, j\} \subset A$ and $(i, j)$ is not an edge.
Here is a way to think about the definition above:
If you can add a term to the model and the graph does
not change, then the model is not graphical.
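Definition 18.6 says that the terms of a graphical model are exactly the subsets $A$ in which every pair of coordinates is joined by an edge. A minimal sketch of that enumeration follows; the function name is hypothetical, and the edge set is an assumption read off from the two-way terms of the graphical model in Example 18.7.

```python
from itertools import combinations


def graphical_terms(vertices, edges):
    """List the subsets A whose psi_A may be non-zero: every pair in A must be an edge."""
    edge_set = {frozenset(edge) for edge in edges}
    terms = []
    for size in range(len(vertices) + 1):
        for subset in combinations(sorted(vertices), size):
            if all(frozenset(pair) in edge_set for pair in combinations(subset, 2)):
                terms.append(subset)
    return terms


# Assumed edge set: the pairs that appear as two-way terms in Example 18.7.
edges = [(1, 2), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5)]
terms = graphical_terms([1, 2, 3, 4, 5], edges)
print(terms)
```

The empty set and the singletons always qualify because they contain no pairs. With this edge set the only three-way terms are (2, 3, 5) and (3, 4, 5).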
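Example 18.7 Consider the graph in Figure 18.1.
The graphical log-linear model that corresponds to this graph is
\[
\begin{aligned}
\log f(x) = {} & \psi_\emptyset + \psi_1(x) + \psi_2(x) + \psi_3(x) + \psi_4(x) + \psi_5(x) \\
& + \psi_{12}(x) + \psi_{23}(x) + \psi_{25}(x) + \psi_{34}(x) + \psi_{35}(x) + \psi_{45}(x) + \psi_{235}(x) + \psi_{345}(x).
\end{aligned}
\]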
FIGURE 18.1. Graph for Example 18.7.
Let's see why this model is graphical. The edge (1, 5) is missing
in the graph. Hence any term containing that pair of indices is
omitted from the model. For example,
\[ \psi_{15}, \; \psi_{125}, \; \psi_{135}, \; \psi_{145}, \; \psi_{1235}, \; \psi_{1245}, \; \psi_{1345}, \; \psi_{12345} \]
are all omitted. Similarly, the edge (2, 4) is missing and hence
\[ \psi_{24}, \; \psi_{124}, \; \psi_{234}, \; \psi_{245}, \; \psi_{1234}, \; \psi_{1245}, \; \psi_{2345}, \; \psi_{12345} \]
are all omitted. There are other missing edges as well. You can
check that the model omits all the corresponding terms. Now
consider the model
\[
\begin{aligned}
\log f(x) = {} & \psi_\emptyset(x) + \psi_1(x) + \psi_2(x) + \psi_3(x) + \psi_4(x) + \psi_5(x) \\
& + \psi_{12}(x) + \psi_{23}(x) + \psi_{25}(x) + \psi_{34}(x) + \psi_{35}(x) + \psi_{45}(x).
\end{aligned}
\]
This is the same model except that the three-way interactions
were removed. If we draw a graph for this model, we will get the
same graph. For example, no terms contain (1, 5) so we omit
the edge between $X_1$ and $X_5$. But this is not graphical since it has
extra terms omitted. The independencies and graphs for the two
models are the same but the latter model has other constraints
besides conditional independence constraints. This is not a bad
thing. It just means that if we are only concerned about presence
or absence of conditional independences, then we need not consider such a model. The presence of the three-way interaction
$\psi_{235}$ means that the strength of association between $X_2$ and $X_3$
varies as a function of $X_5$. Its absence indicates that this is not
so.
18.3
Hierarchical Log-Linear Models
There is a set of log-linear models that is larger than the set of
graphical models and that are used quite a bit. These are the
hierarchical log-linear models.
Definition 18.8 A log-linear model is hierarchical if
$\psi_a = 0$ and $a \subset t$ implies that $\psi_t = 0$.
Lemma 18.9 A graphical model is hierarchical but the reverse
need not be true.
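Both properties can be checked mechanically once a model is written as the set of index sets whose $\psi$-terms are present. The sketch below is an illustration under that representation; the function names are hypothetical. It uses the contrapositive of Definition 18.8 (if $\psi_t$ is present then so is $\psi_a$ for every $a \subset t$) and tests a graphical model by comparing its terms with the complete subsets of the graph formed by its two-way terms.

```python
from itertools import combinations


def is_hierarchical(terms):
    """True if every subset of an included term is itself included."""
    included = {frozenset(t) for t in terms}
    for term in included:
        for size in range(len(term)):
            for sub in combinations(sorted(term), size):
                if frozenset(sub) not in included:
                    return False
    return True


def is_graphical(terms, vertices):
    """True if the included terms are exactly the complete subsets of the model's graph."""
    included = {frozenset(t) for t in terms}
    edges = {t for t in included if len(t) == 2}
    complete = set()
    for size in range(len(vertices) + 1):
        for subset in combinations(sorted(vertices), size):
            if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
                complete.add(frozenset(subset))
    return included == complete


variables = [1, 2, 3]
two_edges = [(), (1,), (2,), (3,), (1, 2), (1, 3)]
all_two_way = [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3)]
missing_main_effects = [(), (3,), (1, 2)]

for model in (two_edges, all_two_way, missing_main_effects):
    print(is_hierarchical(model), is_graphical(model, variables))
```

The three test models are chosen to show the three possible outcomes: hierarchical and graphical, hierarchical but not graphical, and neither.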
Example 18.10 Let
\[ \log f(x) = \psi_\emptyset(x) + \psi_1(x) + \psi_2(x) + \psi_3(x) + \psi_{12}(x) + \psi_{13}(x). \]
The model is hierarchical; its graph is given in Figure 18.2. The
model is graphical because all terms involving (2,3) are omitted.
It is also hierarchical.
Example 18.11 Let
\[ \log f(x) = \psi_\emptyset(x) + \psi_1(x) + \psi_2(x) + \psi_3(x) + \psi_{12}(x) + \psi_{13}(x) + \psi_{23}(x). \]
The model is hierarchical. It is not graphical. The graph corresponding to this model is complete; see Figure 18.3. It is not
graphical because $\psi_{123}(x) = 0$ which does not correspond to any
pairwise conditional independence.
FIGURE 18.2. Graph for Example 18.10.
FIGURE 18.3. The graph is complete. The model is hierarchical but not graphical.
FIGURE 18.4. The model for this graph is not hierarchical.
Example 18.12 Let
\[ \log f(x) = \psi_\emptyset(x) + \psi_3(x) + \psi_{12}(x). \]
The corresponding graph is in Figure 18.4. This model is not
hierarchical since $\psi_2 = 0$ but $\psi_{12}$ is not. Since it is not hierarchical, it is not graphical either.
18.4
Model Generators
Hierarchical models can be written succinctly using generators. This is most easily explained by example. Suppose that
$X = (X_1, X_2, X_3)$. Then, $M = 1.2 + 1.3$ stands for
\[ \log f = \psi_\emptyset + \psi_1 + \psi_2 + \psi_3 + \psi_{12} + \psi_{13}. \]
The formula $M = 1.2 + 1.3$ says: "include $\psi_{12}$ and $\psi_{13}$." We have
to also include the lower order terms or it won't be hierarchical.
The generator $M = 1.2.3$ is the saturated model
\[ \log f = \psi_\emptyset + \psi_1 + \psi_2 + \psi_3 + \psi_{12} + \psi_{13} + \psi_{23} + \psi_{123}. \]
The saturated model corresponds to fitting an unconstrained
multinomial. Consider $M = 1 + 2 + 3$ which means
\[ \log f = \psi_\emptyset + \psi_1 + \psi_2 + \psi_3. \]
This is the mutual independence model. Finally, consider $M = 1.2$
which has log-linear expansion
\[ \log f = \psi_\emptyset + \psi_1 + \psi_2 + \psi_{12}. \]
This model makes $X_3 \mid X_2 = x_2, X_1 = x_1$ a uniform distribution.
FIGURE 18.5. The lattice with two variables. (Nodes, top to bottom: 1.2; 1 + 2; 1 and 2; $\emptyset$.)
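The generator notation can be expanded into the full list of $\psi$-terms by taking every subset of every generator. The following sketch is illustrative only; the function name and the string format (variables joined by "." and generators joined by "+") are assumptions modelled on the notation used in the text.

```python
from itertools import combinations


def expand_generators(formula):
    """Expand a generator string such as '1.2 + 1.3' into the set of included terms."""
    terms = {()}  # the constant term is always included
    for generator in formula.split("+"):
        variables = sorted(int(v) for v in generator.strip().split("."))
        # Include the generator itself and all its lower order terms.
        for size in range(1, len(variables) + 1):
            terms.update(combinations(variables, size))
    return terms


print(sorted(expand_generators("1.2 + 1.3"), key=lambda t: (len(t), t)))
print(len(expand_generators("1.2.3")))
```

Including all subsets of each generator is what keeps the expanded model hierarchical.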
18.5
Lattices
Hierarchical models can be organized into something called a
lattice. This is the set of all hierarchical models partially ordered by inclusion. The set of all hierarchical models for two
variables can be illustrated as in Figure 18.5.
$M = 1.2$ is the saturated model, $M = 1 + 2$ is the independence model, $M = 1$ is independence plus $X_2 \mid X_1$ is uniform,
$M = 2$ is independence plus $X_1 \mid X_2$ is uniform, $M = 0$ is the
uniform distribution.
The lattice of trivariate models is shown in Figure 18.6.
18.6
Fitting Log-Linear Models to Data
Let $\beta$ denote all the parameters in a log-linear model $M$. The
loglikelihood for $\beta$ is
\[ \ell(\beta) = \sum_j x_j \log p_j(\beta) \]
where the sum is over the cells and $p(\beta)$ denotes the cell probabilities corresponding to $\beta$. The mle $\hat\beta$ generally has to be found
numerically. The model with all possible $\psi$-terms is called the
saturated model. We can also fit any sub-model which corresponds to setting some subset of $\psi$ terms to 0.
FIGURE 18.6. The lattice of models for three variables. (Levels, top to bottom: 1.2.3, saturated and graphical; 1.2 + 1.3 + 2.3, two-way; 1.2 + 1.3, 1.2 + 2.3, 1.3 + 2.3, graphical; 1.2 + 3, 1.3 + 2, 2.3 + 1, graphical; 1 + 2 + 3, mutual independence and graphical; 1.2, 1.3, 2.3; 1 + 2, 1 + 3, 2 + 3; 1, 2, 3; $\emptyset$. The lower levels carry the labels "conditional uniform", "independence" and "uniform".)
Definition 18.13 For any submodel $M$, define the deviance
$\text{dev}(M)$ by
\[ \text{dev}(M) = 2\bigl( \hat\ell_{\text{sat}} - \hat\ell_M \bigr) \]
where $\hat\ell_{\text{sat}}$ is the log-likelihood of the saturated model evaluated at the mle and $\hat\ell_M$ is the log-likelihood of the model
$M$ evaluated at its mle.
Theorem 18.14 The deviance is the likelihood ratio test statistic for
\[ H_0 : \text{the model is } M \quad\text{versus}\quad H_1 : \text{the model is not } M. \]
Under $H_0$, $\text{dev}(M) \xrightarrow{d} \chi^2_\nu$ with $\nu$ degrees of freedom equal to the
difference in the number of parameters between the saturated
model and $M$.
One way to find a good model is to use the deviance to test
every sub-model. Every model that is not rejected by this test is
then considered a plausible model. However, this is not a good
strategy for two reasons. First, we will end up doing many tests
which means that there is ample opportunity for making Type I
and Type II errors. Second, we will end up using models where
we failed to reject H0. But we might fail to reject H0 due to low
power. The result is that we end up with a bad model just due
to low power.
There are many model searching strategies. A common approach is to use some form of penalized likelihood. One version
of penalized likelihood is the AIC that we used in regression. For any model
$M$ define
\[ \text{AIC}(M) = -2\bigl( \hat\ell(M) - |M| \bigr) \tag{18.2} \]
where $|M|$ is the number of parameters. Here is the explanation
of where AIC comes from.
Consider a set of models $\{M_1, M_2, \ldots\}$. Let $\hat f_j(x)$ denote the
estimated probability function obtained by using the maximum
likelihood estimator for model $M_j$. Thus, $\hat f_j(x) = \hat f(x; \hat\beta_j)$ where
$\hat\beta_j$ is the mle of the set of parameters $\beta_j$ for model $M_j$. We will
use the loss function $D(f, \hat f)$ where
\[ D(f, g) = \sum_x f(x) \log \frac{f(x)}{g(x)} \]
is the Kullback-Leibler distance between two probability functions. The corresponding risk function is $R(f, \hat f) = \mathbb{E}(D(f, \hat f))$.
Notice that $D(f, \hat f) = c - A(f, \hat f)$ where $c = \sum_x f(x) \log f(x)$
does not depend on $\hat f$ and
\[ A(f, \hat f) = \sum_x f(x) \log \hat f(x). \]
Thus, minimizing the risk is equivalent to maximizing $a(f, \hat f) \equiv \mathbb{E}(A(f, \hat f))$.
It is tempting to estimate $a(f, \hat f)$ by $\sum_x \hat f(x) \log \hat f(x)$ but,
just as the training error in regression is a highly
biased estimate of prediction risk, it is also the case that $\sum_x \hat f(x) \log \hat f(x)$
is a highly biased estimate of $a(f, \hat f)$. In fact, the bias is approximately equal to $|M_j|$. Thus:
Theorem 18.15 $\text{AIC}(M_j)$ is an approximately unbiased estimate
of $a(f, \hat f)$.
After finding a "best model" this way we can draw the corresponding graph. We can also check the overall fit of the selected
model using the deviance as described above.
Example 18.16 Data on breast cancer from Morrison et al (1973)
were presented in homework question 4 of Chapter 17. The data
are on diagnostic center ($X_1$), nuclear grade ($X_2$), and survival
($X_3$):

                  X2:  malignant   malignant   benign   benign
                  X3:  died        survived    died     survived
X1:  Boston            35          59          47       112
     Glamorgan         42          77          26       76
The saturated log-linear model is:

Term                    Coefficient  Standard Error  Wald Statistic  p-value
(Intercept)              3.56        0.17            21.03           0.00 ***
center                   0.18        0.22             0.79           0.42
grade                    0.29        0.22             1.32           0.18
survival                 0.52        0.21             2.44           0.01 *
center×grade            -0.77        0.33            -2.31           0.02 *
center×survival          0.08        0.28             0.29           0.76
grade×survival           0.34        0.27             1.25           0.20
center×grade×survival    0.12        0.40             0.29           0.76

The best sub-model, selected using AIC and backward searching, is

Term                    Coefficient  Standard Error  Wald Statistic  p-value
(Intercept)              3.52        0.13            25.62           < 0.00 ***
center                   0.23        0.13             1.70           0.08
grade                    0.26        0.18             1.43           0.15
survival                 0.56        0.14             3.98           6.65e-05 ***
center×grade            -0.67        0.18            -3.62           0.00 ***
grade×survival           0.37        0.19             1.90           0.05
The graph for this model $M$ is shown in Figure 18.7. To test the fit of this model, we compute the deviance of $M$ which is 0.6. The appropriate $\chi^2$ has 8 − 6 = 2 degrees of freedom. The p-value is $P(\chi^2_2 > .6) = .74$. So we have no evidence to suggest that the model is a poor fit.
FIGURE 18.7. The graph for Example 18.16 (vertices: Center, Grade, Survival).
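The p-value quoted for the deviance can be reproduced by hand, because a $\chi^2$ distribution with 2 degrees of freedom has upper tail $P(\chi^2_2 > x) = e^{-x/2}$. A minimal sketch (the helper name `chi2_upper_tail_2df` is an assumption made for this illustration and only covers the 2 degree of freedom case):

```python
import math

def chi2_upper_tail_2df(x):
    """P(chi^2_2 > x); with 2 degrees of freedom the tail is exp(-x / 2)."""
    return math.exp(-x / 2.0)

deviance = 0.6   # deviance of the selected model, as reported in the text
df = 8 - 6       # saturated model parameters minus sub-model parameters
p_value = chi2_upper_tail_2df(deviance)
print(df, round(p_value, 2))
```

For other degrees of freedom a general chi-square tail function from a statistics library would be needed.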
The example above is a toy example. With so few variables, there is little to be gained by searching for a best model. The overall multinomial mle would be sufficient. More importantly, we are probably interested in predicting survival from grade and center, in which case this should really be treated as a regression problem with outcome $Y$ = survival and covariates center and grade.
Example 18.17 Here is a synthetic example. We generate n = 100 random vectors $X = (X_1, \ldots, X_5)$ of length 5. We generated the data as follows:
$$X_1 \sim \mathrm{Bernoulli}(1/2)$$
and
$$X_j \mid X_1, \ldots, X_{j-1} \sim \mathrm{Bernoulli}(p_j), \qquad p_j = \begin{cases} 1/4 & \text{if } X_{j-1} = 0 \\ 3/4 & \text{if } X_{j-1} = 1. \end{cases}$$
It follows that
$$f(x_1, \ldots, x_5) = \frac{1}{2} \prod_{j=2}^{5} \left(\frac{3}{4}\right)^{I(x_j = x_{j-1})} \left(\frac{1}{4}\right)^{I(x_j \neq x_{j-1})}.$$
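The generating mechanism of this example can be sketched in a few lines. This is an illustration only: it assumes the values 1/4 and 3/4 are the probabilities that $X_j = 1$ given $X_{j-1}$, and the function names `transition`, `joint_pmf` and `sample` are hypothetical.

```python
import itertools
import random

def transition(prev):
    """P(X_j = 1 | X_{j-1} = prev): 3/4 after a one, 1/4 after a zero."""
    return 0.75 if prev == 1 else 0.25

def joint_pmf(x):
    """f(x_1, ..., x_5) as the product of P(x_1) and the four transitions."""
    prob = 0.5  # X_1 ~ Bernoulli(1/2)
    for prev, cur in zip(x, x[1:]):
        p_one = transition(prev)
        prob *= p_one if cur == 1 else 1.0 - p_one
    return prob

def sample(rng, length=5):
    """Draw one vector (X_1, ..., X_5) from the chain."""
    x = [1 if rng.random() < 0.5 else 0]
    while len(x) < length:
        x.append(1 if rng.random() < transition(x[-1]) else 0)
    return tuple(x)

rng = random.Random(0)
data = [sample(rng) for _ in range(100)]  # n = 100 as in the example

# Sanity check: the 32 cell probabilities sum to one.
total = sum(joint_pmf(x) for x in itertools.product([0, 1], repeat=5))
print(len(data), round(total, 12))
```

The 32 values of `joint_pmf` are the true cell probabilities against which an estimate of $f$ can be compared.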
We estimated $f$ using three methods: (i) maximum likelihood treating this as a multinomial with 32 categories, (ii) maximum likelihood from the best loglinear model using AIC and forward selection and (iii) maximum likelihood from the best loglinear model using BIC and forward selection. We estimated the risk by simulating the example 100 times. The average risks were:
Method  Risk
MLE     0.63
AIC     0.54
BIC     0.53

In this example, there is little difference between AIC and BIC. Both are better than maximum likelihood.
18.7 Bibliographic Remarks
A classic reference on loglinear models is Bishop, Fienberg and Holland (1975). See also Whittaker (1990) from which some of the exercises are borrowed.
18.8 Exercises
1. Solve for the $p_{ij}$'s in terms of the $\beta$'s in Example 18.3.
2. Repeat Example 18.17 using 7 covariates and n = 1000. To avoid numerical problems, replace any zero count with a one.
3. Prove Lemma 18.5.
5. Consider random variables $(X_1, X_2, X_3, X_4)$. Suppose the log-density is
$$\log f(x) = \psi_\emptyset(x) + \psi_{12}(x) + \psi_{13}(x) + \psi_{24}(x) + \psi_{34}(x).$$
(a) Draw the graph $G$ for these variables.
(b) Write down all independence and conditional independence relations implied by the graph.
(c) Is this model graphical? Is it hierarchical?
6. Suppose that parameters $p(x_1, x_2, x_3)$ are proportional to the following values:

          x2 = 0   x2 = 0   x2 = 1   x2 = 1
          x3 = 0   x3 = 1   x3 = 0   x3 = 1
x1 = 0    2        8        4        16
x1 = 1    16       128      32       256

Find the $\psi$-terms for the log-linear expansion. Comment on the model.
7. Let $X_1, \ldots, X_4$ be binary. Draw the independence graphs corresponding to the following log-linear models. Also, identify whether each is graphical and/or hierarchical (or neither).
(a) $\log f = 7 + 11x_1 + 2x_2 + 1.5x_3 + 17x_4$
(b) $\log f = 7 + 11x_1 + 2x_2 + 1.5x_3 + 17x_4 + 12x_2x_3 + 78x_2x_4 + 3x_3x_4 + 32x_2x_3x_4$
(c) $\log f = 7 + 11x_1 + 2x_2 + 1.5x_3 + 17x_4 + 12x_2x_3 + 3x_3x_4 + x_1x_4 + 2x_1x_2$
(d) $\log f = 7 + 5055x_1x_2x_3x_4$
19 Causal Inference
In this Chapter we discuss causation. Roughly speaking, the statement "$X$ causes $Y$" means that changing the value of $X$ will change the distribution of $Y$. When $X$ causes $Y$, $X$ and $Y$ will be associated but the reverse is not, in general, true. Association does not necessarily imply causation. We will consider two frameworks for discussing causation. The first uses the notation of counterfactual random variables. The second, presented in the next Chapter, uses directed acyclic graphs.
19.1 The Counterfactual Model
Suppose that $X$ is a binary treatment variable where $X = 1$ means "treated" and where $X = 0$ means "not treated." We are using the word "treatment" in a very broad sense: treatment might refer to a medication or something like smoking. An alternative to "treated/not treated" is "exposed/not exposed" but we shall use the former.
Let $Y$ be some outcome variable such as presence or absence of disease. To distinguish the statement "$X$ is associated with $Y$" from the statement "$X$ causes $Y$" we need to enrich our probabilistic vocabulary. We will decompose the response $Y$ into a more fine-grained object.
We introduce two new random variables $(C_0, C_1)$, called potential outcomes, with the following interpretation: $C_0$ is the outcome if the subject is not treated ($X = 0$) and $C_1$ is the outcome if the subject is treated ($X = 1$). Hence,
$$Y = \begin{cases} C_0 & \text{if } X = 0 \\ C_1 & \text{if } X = 1. \end{cases}$$
We can express the relationship between $Y$ and $(C_0, C_1)$ more succinctly by
$$Y = C_X. \qquad (19.1)$$
This equation is called the consistency relationship.
Here is a toy data set to make the idea clear:

X   Y   C0   C1
0   4   4    *
0   7   7    *
0   2   2    *
0   8   8    *
1   3   *    3
1   5   *    5
1   8   *    8
1   9   *    9

The asterisks denote unobserved values. When $X = 0$ we don't observe $C_1$, in which case we say that $C_1$ is a counterfactual since it is the outcome you would have had if, counter to the fact, you had been treated ($X = 1$). Similarly, when $X = 1$ we don't observe $C_0$ and we say that $C_0$ is counterfactual.
Notice that there are four types of subjects:

Type              C0   C1
Survivors         1    1
Responders        0    1
Anti-responders   1    0
Doomed            0    0
Think of the potential outcomes $(C_0, C_1)$ as hidden variables that contain all the relevant information about the subject.
Define the average causal effect or average treatment effect to be
$$\theta = E(C_1) - E(C_0). \qquad (19.2)$$
The parameter $\theta$ has the following interpretation: $\theta$ is the mean if everyone were treated ($X = 1$) minus the mean if everyone were not treated ($X = 0$). There are other ways of measuring the causal effect. For example, if $C_0$ and $C_1$ are binary, we define the causal odds ratio
$$\frac{P(C_1 = 1)}{P(C_1 = 0)} \div \frac{P(C_0 = 1)}{P(C_0 = 0)}$$
and the causal relative risk
$$\frac{P(C_1 = 1)}{P(C_0 = 1)}.$$
The main ideas will be the same whatever causal effect we use. For simplicity, we shall work with the average causal effect $\theta$. Define the association to be
$$\alpha = E(Y \mid X = 1) - E(Y \mid X = 0). \qquad (19.3)$$
Again, we could use odds ratios or other summaries if we wish.
Theorem 19.1 (Association is not equal to Causation) In general, $\theta \neq \alpha$.
Example 19.2 Suppose the whole population is as follows:

X   Y   C0   C1
0   0   0    0*
0   0   0    0*
0   0   0    0*
0   0   0    0*
1   1   1*   1
1   1   1*   1
1   1   1*   1
1   1   1*   1

Again, the asterisks denote unobserved values. Notice that $C_0 = C_1$ for every subject, thus, this treatment has no effect. Indeed,
$$\theta = E(C_1) - E(C_0) = \frac{1}{8}\sum_{i=1}^{8} C_{1i} - \frac{1}{8}\sum_{i=1}^{8} C_{0i} = \frac{1+1+1+1+0+0+0+0}{8} - \frac{1+1+1+1+0+0+0+0}{8} = 0.$$
Thus the average causal effect is 0. The observed data are $X$'s and $Y$'s only, from which we can estimate the association:
$$\alpha = E(Y \mid X = 1) - E(Y \mid X = 0) = \frac{1+1+1+1}{4} - \frac{0+0+0+0}{4} = 1.$$
Hence, $\theta \neq \alpha$.
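The two quantities in this example can be recomputed directly from the population table. A small sketch follows; the tuple layout `(X, C0, C1)` and the helper names are choices made for this illustration.

```python
# Each row is (X, C0, C1) for one of the eight subjects in Example 19.2.
population = [(0, 0, 0)] * 4 + [(1, 1, 1)] * 4

def observed_outcome(x, c0, c1):
    """Consistency relationship Y = C_X."""
    return c1 if x == 1 else c0

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Average causal effect: uses both potential outcomes for everyone.
theta = mean(c1 for _, _, c1 in population) - mean(c0 for _, c0, _ in population)

# Association: uses only the observed (X, Y) pairs.
alpha = (mean(observed_outcome(*row) for row in population if row[0] == 1)
         - mean(observed_outcome(*row) for row in population if row[0] == 0))

print(theta, alpha)
```

Only `alpha` can be computed from real data, since `theta` needs the starred (unobserved) entries of the table.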
To add some intuition to this example, imagine that the outcome variable is 1 if "healthy" and 0 if "sick". Suppose that $X = 0$ means that the subject does not take vitamin C and that $X = 1$ means that the subject does take vitamin C. Vitamin C has no causal effect since $C_0 = C_1$. In this example there are two types of people: healthy people $(C_0, C_1) = (1, 1)$ and unhealthy people $(C_0, C_1) = (0, 0)$. Healthy people tend to take vitamin C while unhealthy people don't. It is this association between $(C_0, C_1)$ and $X$ that creates an association between $X$ and $Y$. If we only had data on $X$ and $Y$ we would conclude that $X$ and $Y$ are associated. Suppose we wrongly interpret this causally and conclude that vitamin C prevents illness. Next we might encourage everyone to take vitamin C. If most people comply the population will look something like this:

X   Y   C0   C1
0   0   0    0*
1   0   0*   0
1   0   0*   0
1   0   0*   0
1   1   1*   1
1   1   1*   1
1   1   1*   1
1   1   1*   1

Now $\alpha = (4/7) - (0/1) = 4/7$. We see that $\alpha$ went down from 1 to 4/7. Of course, the causal effect never changed but the naive observer who does not distinguish association and causation will be confused because his advice seems to have made things worse instead of better.
In the last example, $\theta = 0$ and $\alpha = 1$. It is not hard to create examples in which $\alpha > 0$ and yet $\theta < 0$. The fact that the association and causal effects can have different signs is very confusing to many people including well educated statisticians.
The example makes it clear that, in general, we cannot use the association to estimate the causal effect $\theta$. The reason that $\theta \neq \alpha$ is that $(C_0, C_1)$ was not independent of $X$. That is, treatment assignment was not independent of person type.
Can we ever estimate the causal effect? The answer is: sometimes. In particular, random assignment to treatment makes it possible to estimate $\theta$.
Theorem 19.3 Suppose we randomly assign subjects to treatment and that $P(X = 0) > 0$ and $P(X = 1) > 0$. Then $\alpha = \theta$. Hence, any consistent estimator of $\alpha$ is a consistent estimator of $\theta$. In particular,
$$\hat\theta = \hat E(Y \mid X = 1) - \hat E(Y \mid X = 0) = \bar Y_1 - \bar Y_0$$
is a consistent estimator of $\theta$, where
$$\bar Y_1 = \frac{1}{n_1} \sum_{i : X_i = 1} Y_i, \qquad \bar Y_0 = \frac{1}{n_0} \sum_{i : X_i = 0} Y_i,$$
$n_1 = \sum_{i=1}^n X_i$ and $n_0 = \sum_{i=1}^n (1 - X_i)$.
Proof. Since $X$ is randomly assigned, $X$ is independent of $(C_0, C_1)$. Hence,
$$\begin{aligned}
\theta &= E(C_1) - E(C_0) \\
&= E(C_1 \mid X = 1) - E(C_0 \mid X = 0) && \text{since } X \perp\!\!\!\perp (C_0, C_1) \\
&= E(Y \mid X = 1) - E(Y \mid X = 0) && \text{since } Y = C_X \\
&= \alpha.
\end{aligned}$$
The consistency follows from the law of large numbers.
If $Z$ is a covariate, we define the conditional causal effect by
$$\theta_z = E(C_1 \mid Z = z) - E(C_0 \mid Z = z).$$
For example, if $Z$ denotes gender with values $Z = 0$ (women) and $Z = 1$ (men), then $\theta_0$ is the causal effect among women and $\theta_1$ is the causal effect among men. In a randomized experiment, $\theta_z = E(Y \mid X = 1, Z = z) - E(Y \mid X = 0, Z = z)$ and we can estimate the conditional causal effect using appropriate sample averages.
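A short simulation can illustrate why random assignment lets the difference in sample means estimate the causal effect. Everything below is a hypothetical setup invented for the sketch: the potential outcomes are drawn so that $E(C_0) = 0.30$ and $E(C_1) = 0.65$, and treatment is assigned by a fair coin flip that ignores them.

```python
import random

def simulate_trial(n, rng):
    """Return observed (x, y) pairs from a hypothetical randomized experiment."""
    rows = []
    for _ in range(n):
        c0 = 1 if rng.random() < 0.3 else 0               # outcome if untreated
        c1 = 1 if (c0 == 1 or rng.random() < 0.5) else 0  # outcome if treated
        x = 1 if rng.random() < 0.5 else 0                # coin flip, independent of (c0, c1)
        y = c1 if x == 1 else c0                          # consistency: Y = C_X
        rows.append((x, y))
    return rows

def difference_in_means(rows):
    """Sample mean of Y among the treated minus sample mean among the untreated."""
    treated = [y for x, y in rows if x == 1]
    control = [y for x, y in rows if x == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

rng = random.Random(1)
theta_hat = difference_in_means(simulate_trial(20000, rng))
print(round(theta_hat, 3))  # should be near the true effect 0.65 - 0.30 = 0.35
```

If the coin flip were replaced by an assignment rule that depends on `c0` or `c1`, the same estimator would target the association rather than the causal effect.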
Summary
Random variables: $(C_0, C_1, X, Y)$.
Consistency relationship: $Y = C_X$.
Causal effect: $\theta = E(C_1) - E(C_0)$.
Association: $\alpha = E(Y \mid X = 1) - E(Y \mid X = 0)$.
Random assignment $\Longrightarrow (C_0, C_1) \perp\!\!\!\perp X \Longrightarrow \theta = \alpha$.
19.2 Beyond Binary Treatments
Let us now generalize beyond the binary case. Suppose that $X \in \mathcal{X}$. For example, $X$ could be the dose of a drug, in which case $X \in \mathbb{R}$. The counterfactual vector $(C_0, C_1)$ now becomes the counterfactual process
$$\{C(x) : x \in \mathcal{X}\} \qquad (19.4)$$
where $C(x)$ is the outcome a subject would have if he received dose $x$. The observed response is given by the consistency relation
$$Y \equiv C(X). \qquad (19.5)$$
See Figure 19.1. The causal regression function is
$$\theta(x) = E(C(x)). \qquad (19.6)$$
The regression function, which measures association, is $r(x) = E(Y \mid X = x)$.
Theorem 19.4 In general, $\theta(x) \neq r(x)$. However, when $X$ is randomly assigned, $\theta(x) = r(x)$.
FIGURE 19.1. A counterfactual process $C(x)$. The outcome $Y$ is the value of the curve $C(x)$ evaluated at the observed dose $X$.
Example 19.5 An example in which $\theta(x)$ is constant but $r(x)$ is not constant is shown in Figure 19.2. The figure shows the counterfactual processes for four subjects. The dots represent their $X$ values $X_1, X_2, X_3, X_4$. Since $C_i(x)$ is constant over $x$ for all $i$, there is no causal effect and hence
$$\theta(x) = \frac{C_1(x) + C_2(x) + C_3(x) + C_4(x)}{4}$$
is constant. Changing the dose $x$ will not change anyone's outcome. The four dots in the lower plot represent the observed data points $Y_1 = C_1(X_1)$, $Y_2 = C_2(X_2)$, $Y_3 = C_3(X_3)$, $Y_4 = C_4(X_4)$. The dotted line represents the regression $r(x) = E(Y \mid X = x)$. Although there is no causal effect, there is an association since the regression curve $r(x)$ is not constant.
19.3 Observational Studies and Confounding
A study in which treatment (or exposure) is not randomly assigned is called an observational study. In these studies, subjects select their own value of the exposure $X$. Many of the health studies you read about in the newspaper are like this. As we saw, association and causation could in general be quite different. This discrepancy occurs in non-randomized studies because the potential outcome $C$ is not independent of treatment $X$. However, suppose we could find groupings of subjects such that, within groups, $X$ and $\{C(x) : x \in \mathcal{X}\}$ are independent. This would happen if the subjects are very similar within groups. For example, suppose we find people who are very similar in age, gender, educational background and ethnic background. Among these people we might feel it is reasonable to assume that the choice of $X$ is essentially random. These other variables are called confounding variables. (A more precise definition of confounding is given in the next Chapter.) If we denote these other variables collectively as $Z$, then we can express this idea by saying that
$$\{C(x) : x \in \mathcal{X}\} \perp\!\!\!\perp X \mid Z. \qquad (19.7)$$
Equation (19.7) means that, within groups of $Z$, the choice of treatment $X$ does not depend on type, as represented by $\{C(x) : x \in \mathcal{X}\}$. If (19.7) holds and we observe $Z$ then we say that there is no unmeasured confounding.
FIGURE 19.2. The top plot shows the counterfactual process $C(x)$ for four subjects. The dots represent their $X$ values. Since $C_i(x)$ is constant over $x$ for all $i$, there is no causal effect. Changing the dose will not change anyone's outcome. The lower plot shows the causal regression function $\theta(x) = (C_1(x) + C_2(x) + C_3(x) + C_4(x))/4$. The four dots represent the observed data points $Y_1 = C_1(X_1)$, $Y_2 = C_2(X_2)$, $Y_3 = C_3(X_3)$, $Y_4 = C_4(X_4)$. The dotted line represents the regression $r(x) = E(Y \mid X = x)$. There is no causal effect since $C_i(x)$ is constant for all $i$. But there is an association since the regression curve $r(x)$ is not constant.
Theorem 19.6 Suppose that (19.7) holds. Then,
$$\theta(x) = \int E(Y \mid X = x, Z = z)\, dF_Z(z). \qquad (19.8)$$
If $\hat r(x, z)$ is a consistent estimate of the regression function $E(Y \mid X = x, Z = z)$, then a consistent estimate of $\theta(x)$ is
$$\hat\theta(x) = \frac{1}{n} \sum_{i=1}^{n} \hat r(x, Z_i).$$
In particular, if $r(x, z) = \beta_0 + \beta_1 x + \beta_2 z$ is linear, then a consistent estimate of $\theta(x)$ is
$$\hat\theta(x) = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 \bar Z_n \qquad (19.9)$$
where $(\hat\beta_0, \hat\beta_1, \hat\beta_2)$ are the least squares estimators.
Remark 19.7 It is useful to compare equation (19.8) to $E(Y \mid X = x)$, which can be written as $E(Y \mid X = x) = \int E(Y \mid X = x, Z = z)\, dF_{Z \mid X}(z \mid x)$.
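The difference between averaging $E(Y \mid X = x, Z = z)$ over the marginal distribution of $Z$, as in (19.8), and over the conditional distribution of $Z$ given $X$, as in Remark 19.7, can be seen on a small table. The counts below are hypothetical, and $\hat r(x, z)$ is taken to be the plain cell mean for binary $X$, $Z$ and $Y$; this is a sketch, not the least squares version in (19.9).

```python
from collections import Counter

# Hypothetical counts of (x, z, y) triples; not data from the text.
counts = Counter({
    (0, 0, 1): 10, (0, 0, 0): 30,
    (1, 0, 1): 4,  (1, 0, 0): 6,
    (0, 1, 1): 6,  (0, 1, 0): 4,
    (1, 1, 1): 30, (1, 1, 0): 10,
})
n = sum(counts.values())

def r_hat(x, z):
    """Sample mean of Y in the cell X = x, Z = z."""
    ones = counts[(x, z, 1)]
    return ones / (ones + counts[(x, z, 0)])

def theta_hat(x):
    """(1/n) * sum_i r_hat(x, Z_i): average r_hat over all observed Z values."""
    return sum(c * r_hat(x, z) for (_, z, _), c in counts.items()) / n

def unadjusted(x):
    """Sample version of E(Y | X = x), which averages over Z given X = x."""
    ones = sum(c for (xi, _, yi), c in counts.items() if xi == x and yi == 1)
    total = sum(c for (xi, _, _), c in counts.items() if xi == x)
    return ones / total

adjusted_effect = theta_hat(1) - theta_hat(0)
naive_effect = unadjusted(1) - unadjusted(0)
print(round(adjusted_effect, 3), round(naive_effect, 3))
```

In these made-up counts the group with $Z = 1$ is both more likely to be treated and more likely to have $Y = 1$, so the unadjusted contrast overstates the adjusted one.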
Epidemiologists call (19.8) the adjusted treatment effect. The process of computing adjusted treatment effects is called adjusting (or controlling) for confounding. The selection of what confounders $Z$ to measure and control for requires scientific insight. Even after adjusting for confounders, we cannot be sure that there are not other confounding variables that we missed. This is why observational studies must be treated with healthy skepticism. Results from observational studies start to become believable when: (i) the results are replicated in many studies, (ii) each of the studies controlled for plausible confounding variables, (iii) there is a plausible scientific explanation for the existence of a causal relationship.
A good example is smoking and cancer. Numerous studies
have shown a relationship between smoking and cancer even after adjusting for many confounding variables. Moreover, in laboratory studies, smoking has been shown to damage lung cells. Finally, a causal link between smoking and cancer has been found
in randomized animal studies. It is this collection of evidence
over many years that makes this a convincing case. One single
observational study is not, by itself, strong evidence. Remember
that when you read the newspaper.
19.4 Simpson's Paradox
Simpson's paradox is a puzzling phenomenon that is discussed in most statistics texts. Unfortunately, it is explained incorrectly in most statistics texts. The reason is that it is nearly impossible to explain the paradox without using counterfactuals (or directed acyclic graphs).
Let $X$ be a binary treatment variable, $Y$ a binary outcome and $Z$ a third binary variable such as gender. Suppose the joint distribution of $X, Y, Z$ is

          Z = 1 (men)         Z = 0 (women)
          Y = 1    Y = 0      Y = 1    Y = 0
X = 1     .1500    .2250      .1000    .0250
X = 0     .0375    .0875      .2625    .1125

The marginal distribution for $(X, Y)$ is

          Y = 1    Y = 0
X = 1     .25      .25      .50
X = 0     .30      .20      .50
          .55      .45      1
Now,
$$P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0) = \frac{P(Y = 1, X = 1)}{P(X = 1)} - \frac{P(Y = 1, X = 0)}{P(X = 0)} = \frac{.25}{.50} - \frac{.30}{.50} = -0.1.$$
We might naively interpret this to mean that the treatment is bad for you since $P(Y = 1 \mid X = 1) < P(Y = 1 \mid X = 0)$. Furthermore, among men,
$$P(Y = 1 \mid X = 1, Z = 1) - P(Y = 1 \mid X = 0, Z = 1) = \frac{P(Y = 1, X = 1, Z = 1)}{P(X = 1, Z = 1)} - \frac{P(Y = 1, X = 0, Z = 1)}{P(X = 0, Z = 1)} = \frac{.15}{.3750} - \frac{.0375}{.1250} = 0.1.$$
Among women,
$$P(Y = 1 \mid X = 1, Z = 0) - P(Y = 1 \mid X = 0, Z = 0) = \frac{P(Y = 1, X = 1, Z = 0)}{P(X = 1, Z = 0)} - \frac{P(Y = 1, X = 0, Z = 0)}{P(X = 0, Z = 0)} = \frac{.1}{.1250} - \frac{.2625}{.3750} = 0.1.$$
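The three differences can be recomputed from the joint distribution table. The sketch below stores the eight cell probabilities in a dictionary keyed by `(x, y, z)`; the helper name `prob_y1_given` is an assumption made for this illustration.

```python
# Joint probabilities P(X = x, Y = y, Z = z) from the table, keyed by (x, y, z).
joint = {
    (1, 1, 1): 0.1500, (1, 0, 1): 0.2250,
    (0, 1, 1): 0.0375, (0, 0, 1): 0.0875,
    (1, 1, 0): 0.1000, (1, 0, 0): 0.0250,
    (0, 1, 0): 0.2625, (0, 0, 0): 0.1125,
}

def prob_y1_given(x, z=None):
    """P(Y = 1 | X = x) if z is None, otherwise P(Y = 1 | X = x, Z = z)."""
    cells = [(key, p) for key, p in joint.items()
             if key[0] == x and (z is None or key[2] == z)]
    numerator = sum(p for key, p in cells if key[1] == 1)
    denominator = sum(p for _, p in cells)
    return numerator / denominator

overall = prob_y1_given(1) - prob_y1_given(0)
men = prob_y1_given(1, z=1) - prob_y1_given(0, z=1)
women = prob_y1_given(1, z=0) - prob_y1_given(0, z=0)
print(round(overall, 4), round(men, 4), round(women, 4))
```

The sign flips between the overall contrast and the two within-group contrasts because treated subjects are mostly men and untreated subjects are mostly women in this table.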
To summarize, we seem to have the following information:

Mathematical Statement                                English Statement?
P(Y = 1 | X = 1) < P(Y = 1 | X = 0)                   treatment is harmful
P(Y = 1 | X = 1, Z = 1) > P(Y = 1 | X = 0, Z = 1)     treatment is beneficial to men
P(Y = 1 | X = 1, Z = 0) > P(Y = 1 | X = 0, Z = 0)     treatment is beneficial to women

Clearly, something is amiss. There can't be a treatment which is good for men, good for women but bad overall. This is nonsense. The problem is with the set of English statements in the table. Our translation from math into English is specious.
The inequality $P(Y = 1 \mid X = 1) < P(Y = 1 \mid X = 0)$ does not mean that treatment is harmful.
The phrase "treatment is harmful" should be written mathematically as $P(C_1 = 1) < P(C_0 = 1)$. The phrase "treatment is harmful for men" should be written $P(C_1 = 1 \mid Z = 1) < P(C_0 = 1 \mid Z = 1)$. The three mathematical statements in the table are not at all contradictory. It is only the translation into English that is wrong.
Let us now show that a real Simpson's paradox cannot happen, that is, there cannot be a treatment that is beneficial for men and women but harmful overall. Suppose that treatment is beneficial for both sexes. Then
$$P(C_1 = 1 \mid Z = z) > P(C_0 = 1 \mid Z = z)$$
for all $z$. It then follows that
$$P(C_1 = 1) = \sum_z P(C_1 = 1 \mid Z = z) P(Z = z) > \sum_z P(C_0 = 1 \mid Z = z) P(Z = z) = P(C_0 = 1).$$
Hence, $P(C_1 = 1) > P(C_0 = 1)$ so treatment is beneficial overall. No paradox.
19.5 Bibliographic Remarks
The use of potential outcomes to clarify causation is due mainly to Jerzy Neyman and Don Rubin. Later developments are due to Jamie Robins, Paul Rosenbaum and others. A parallel development took place in econometrics by various people including Jim Heckman and Charles Manski. Currently, there are no textbook treatments of causal inference from the potential outcomes viewpoint.
19.6 Exercises
1. Create an example like Example 19.2 in which $\alpha > 0$ and $\theta < 0$.
3. Suppose you are given data $(X_1, Y_1), \ldots, (X_n, Y_n)$ from an observational study, where $X_i \in \{0, 1\}$ and $Y_i \in \{0, 1\}$. Although it is not possible to estimate the causal effect $\theta$, it is possible to put bounds on $\theta$. Find upper and lower bounds on $\theta$ that can be consistently estimated from the data. Show that the bounds have width 1. Hint: Note that $E(C_1) = E(C_1 \mid X = 1)P(X = 1) + E(C_1 \mid X = 0)P(X = 0)$.
4. Suppose that $X \in \mathbb{R}$ and that, for each subject $i$, $C_i(x) = \beta_{1i} x$. Each subject has their own slope $\beta_{1i}$. Construct a joint distribution on $(\beta_1, X)$ such that $P(\beta_1 > 0) = 1$ but $E(Y \mid X = x)$ is a decreasing function of $x$, where $Y = C(X)$. Interpret. Hint: Write $f(\beta_1, x) = f(\beta_1) f(x \mid \beta_1)$. Choose $f(x \mid \beta_1)$ so that when $\beta_1$ is large, $x$ is small and when $\beta_1$ is small, $x$ is large.
20 Directed Graphs
20.1 Introduction
Directed graphs are similar to undirected graphs except that
there are arrows between vertices instead of edges. Like undirected graphs, directed graphs can be used to represent independence relations. They can also be used as an alternative to
counterfactuals to represent causal relationships. Some people
use the phrase Bayesian network to refer to a directed graph
endowed with a probability distribution. This is a poor choice of
terminology. Statistical inference for directed graphs can be performed using frequentist or Bayesian methods so it is misleading
to call them Bayesian networks.
20.2 DAG's
A directed graph $G$ consists of a set of vertices $V$ and an edge set $E$ of ordered pairs of variables. If $(X, Y) \in E$ then there is an arrow pointing from $X$ to $Y$. See Figure 20.1.
FIGURE 20.1. A directed graph with $V = \{X, Y, Z\}$ and $E = \{(Y, X), (Y, Z)\}$.
If an arrow connects two variables $X$ and $Y$ (in either direction) we say that $X$ and $Y$ are adjacent. If there is an arrow from $X$ to $Y$ then $X$ is a parent of $Y$ and $Y$ is a child of $X$. The set of all parents of $X$ is denoted by $\pi_X$ or $\pi(X)$. A directed path from $X$ to $Y$ is a set of vertices beginning with $X$, ending with $Y$ such that each pair is connected by an arrow and all the arrows point in the same direction as follows:
$$X \to \cdots \to Y \quad \text{or} \quad X \leftarrow \cdots \leftarrow Y.$$
A sequence of adjacent vertices starting with $X$ and ending with $Y$ but ignoring the direction of the arrows is called an undirected path. The sequence $\{X, Y, Z\}$ in Figure 20.1 is an undirected path. $X$ is an ancestor of $Y$ if there is a directed path from $X$ to $Y$. We also say that $Y$ is a descendant of $X$.
A configuration of the form:
$$X \to Y \leftarrow Z$$
is called a collider. A configuration not of that form is called a non-collider, for example,
$$X \to Y \to Z \quad \text{or} \quad X \leftarrow Y \to Z.$$
A directed path that starts and ends at the same variable is called a cycle. A directed graph is acyclic if it has no cycles. In this case we say that the graph is a directed acyclic graph or DAG. From now on, we only deal with graphs that are DAG's.
20.3 Probability and DAG's
Let $G$ be a DAG with vertices $V = (X_1, \ldots, X_k)$.
Definition 20.1 If $P$ is a distribution for $V$ with probability function $p$, we say that $P$ is Markov to $G$ or that $G$ represents $P$ if
$$p(v) = \prod_{i=1}^{k} p(x_i \mid \pi_i) \qquad (20.1)$$
where $\pi_i$ are the parents of $X_i$. The set of distributions represented by $G$ is denoted by $M(G)$.
Example 20.2 For the DAG in Figure 20.2, $P \in M(G)$ if and only if its probability function $p$ has the form
$$p(x, y, z, w) = p(x)\, p(y)\, p(z \mid x, y)\, p(w \mid z).$$
The following theorem says that $P \in M(G)$ if and only if the Markov Condition holds. Roughly speaking, the Markov Condition says that every variable $W$ is independent of the "past" given its parents.
Theorem 20.3 A distribution $P \in M(G)$ if and only if the following Markov Condition holds: for every variable $W$,
$$W \perp\!\!\!\perp \widetilde{W} \mid \pi_W \qquad (20.2)$$
where $\widetilde{W}$ denotes all the other variables except the parents and descendants of $W$.
FIGURE 20.2. Another DAG (vertices X, Y, Z, W).
FIGURE 20.3. Yet another DAG (vertices A, B, C, D, E).
Example 20.4 In Figure 20.2, the Markov Condition implies that
$$X \perp\!\!\!\perp Y \quad \text{and} \quad W \perp\!\!\!\perp \{X, Y\} \mid Z.$$
Example 20.5 Consider the DAG in Figure 20.3. In this case the probability function must factor like
$$p(a, b, c, d, e) = p(a)\, p(b \mid a)\, p(c \mid a)\, p(d \mid b, c)\, p(e \mid d).$$
The Markov Condition implies the following independence relations:
$$D \perp\!\!\!\perp A \mid \{B, C\}, \quad E \perp\!\!\!\perp \{A, B, C\} \mid D \quad \text{and} \quad B \perp\!\!\!\perp C \mid A.$$
FIGURE 20.4. And yet another DAG (vertices X1, X2, X3, X4, X5).
20.4 More Independence Relations
With undirected graphs, we saw that pairwise independence relations implied other independence relations. Luckily, we were able to deduce these other relations from the graph. The situation is similar for DAG's. The Markov Condition allows us to list some independence relations. These relations may logically imply other independence relations. Consider the DAG in Figure 20.4. The Markov Condition implies:
$$X_1 \perp\!\!\!\perp X_2, \quad X_2 \perp\!\!\!\perp \{X_1, X_4\}, \quad X_4 \perp\!\!\!\perp \{X_2, X_3\} \mid X_1, \quad X_3 \perp\!\!\!\perp X_4 \mid \{X_1, X_2\}, \quad X_5 \perp\!\!\!\perp \{X_1, X_2\} \mid \{X_3, X_4\}.$$
It turns out (but it is not obvious) that these conditions imply that
$$\{X_4, X_5\} \perp\!\!\!\perp X_2 \mid \{X_1, X_3\}.$$
How do we find these extra independence relations? The answer is "d-separation." d-separation can be summarized by three rules. Consider the four DAG's in Figure 20.5 and the DAG in Figure 20.6. The first 3 DAG's in Figure 20.5 are non-colliders. The DAG in the lower right of Figure 20.5 is a collider. The DAG in Figure 20.6 is a collider with a descendant.
In what follows, we implicitly assume that $P$ is faithful to $G$, which means that $P$ has no extra independence relations other than those logically implied by the Markov Condition.
FIGURE 20.5. The first three DAG's have no colliders. The fourth DAG in the lower right corner has a collider at $Y$.
FIGURE 20.6. A collider with a descendant.
The rules of d-separation.
Consider the DAG's in Figures 20.5 and 20.6.
Rule (1). In a non-collider, $X$ and $Z$ are d-connected, but they are d-separated given $Y$.
Rule (2). If $X$ and $Z$ collide at $Y$ then $X$ and $Z$ are d-separated but they are d-connected given $Y$.
Rule (3). Conditioning on the descendant of a collider has the same effect as conditioning on the collider. Thus in Figure 20.6, $X$ and $Z$ are d-separated but they are d-connected given $W$.
Here is a more formal definition of d-separation. Let $X$ and $Y$ be distinct vertices and let $W$ be a set of vertices not containing $X$ or $Y$. Then $X$ and $Y$ are d-separated given $W$ if there exists no undirected path $U$ between $X$ and $Y$ such that (i) every collider on $U$ has a descendant in $W$ and (ii) no other vertex on $U$ is in $W$. If $U$, $V$ and $W$ are distinct sets of vertices and $U$ and $V$ are not empty, then $U$ and $V$ are d-separated given $W$ if for every $X \in U$ and $Y \in V$, $X$ and $Y$ are d-separated given $W$. Vertices that are not d-separated are said to be d-connected.
FIGURE 20.7. d-separation explained.
Example 20.6 Consider the DAG in Figure 20.7. From the d-separation rules we conclude that:
$X$ and $Y$ are d-separated (given the empty set)
$X$ and $Y$ are d-connected given $\{S_1, S_2\}$
$X$ and $Y$ are d-separated given $\{S_1, S_2, V\}$
Theorem 20.7 (Spirtes, Glymour and Scheines) Let $A$, $B$ and $C$ be disjoint sets of vertices. Then $A \perp\!\!\!\perp B \mid C$ if and only if $A$ and $B$ are d-separated by $C$.
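The three rules can be checked mechanically on small graphs. The sketch below does not walk paths as in the formal definition; it uses the moralized ancestral graph criterion, a standard equivalent test, as a swapped-in technique. The function names and the edge-list representation (a pair `(a, b)` meaning an arrow from `a` to `b`) are assumptions made for this illustration.

```python
def ancestors_of(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated given the set `given` in the DAG `edges`."""
    given = set(given)
    parents = {}
    for a, b in edges:  # edge (a, b) means a -> b
        parents.setdefault(b, set()).add(a)
    keep = ancestors_of({x, y} | given, parents)
    # Moral graph of the ancestral subgraph: undirected parent-child edges,
    # plus an edge between every two parents that share a child.
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            nbrs[p].add(child)
            nbrs[child].add(p)
            for q in ps:
                if p != q:
                    nbrs[p].add(q)
    # x and y are d-separated iff every path between them hits the conditioning set.
    stack, seen = [x], {x}
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for u in nbrs[v]:
            if u not in seen and u not in given:
                seen.add(u)
                stack.append(u)
    return True

chain = [("X", "Y"), ("Y", "Z")]          # non-collider
collider = [("X", "Y"), ("Z", "Y")]       # collider at Y
with_descendant = collider + [("Y", "W")]  # collider with a descendant W

print(d_separated(chain, "X", "Z"), d_separated(chain, "X", "Z", {"Y"}))
print(d_separated(collider, "X", "Z"), d_separated(collider, "X", "Z", {"Y"}))
print(d_separated(with_descendant, "X", "Z", {"W"}))
```

Applied to the three small graphs, the calls correspond to Rule (1), Rule (2) and Rule (3) respectively.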
Example 20.8 The fact that conditioning on a collider creates dependence might not seem intuitive. Here is a whimsical example from Jordan (2003) that makes this idea more palatable. Your friend appears to be late for a meeting with you. There are two explanations: she was abducted by aliens or you forgot to set your watch ahead one hour for daylight savings time. See Figure 20.8. Aliens and Watch are blocked by a collider, which implies they are marginally independent. This seems reasonable since, before we know anything about your friend being late, we would expect these variables to be independent. We would also expect that $P(\text{Aliens} = \text{yes} \mid \text{Late} = \text{yes}) > P(\text{Aliens} = \text{yes})$; learning that your friend is late certainly increases the probability that she was abducted. But when we learn that you forgot to set your watch properly, we would lower the chance that your friend was abducted. Hence, $P(\text{Aliens} = \text{yes} \mid \text{Late} = \text{yes}) \neq P(\text{Aliens} = \text{yes} \mid \text{Late} = \text{yes}, \text{Watch} = \text{no})$. Thus, Aliens and Watch are dependent given Late.
FIGURE 20.8. Jordan's alien example (vertices: aliens, watch, late). Was your friend kidnapped by aliens or did you forget to set your watch?
Graphs that look different may actually imply the same independence relations. If $G$ is a DAG, we let $I(G)$ denote all the independence statements implied by $G$. Two DAG's $G_1$ and $G_2$ for the same variables $V$ are Markov equivalent if $I(G_1) = I(G_2)$. Given a DAG $G$, let skeleton($G$) denote the undirected graph obtained by replacing the arrows with undirected edges.
Theorem 20.9 Two DAG's $G_1$ and $G_2$ are Markov equivalent if and only if (i) skeleton($G_1$) = skeleton($G_2$) and (ii) $G_1$ and $G_2$ have the same colliders.
Example 20.10 The first three DAG's in Figure 20.5 are Markov equivalent. The DAG in the lower right of the Figure is not Markov equivalent to the others.
20.5 Estimation for DAG's
Let $G$ be a DAG. Assume that all the variables $V = (X_1, \ldots, X_m)$ are discrete. The probability function can be written
$$p(v) = \prod_{i=1}^{m} p(x_i \mid \pi_i). \qquad (20.3)$$
To estimate $p(v)$ we need to estimate $p(x_i \mid \pi_i)$ for each $i$. Think of the parents of $X_i$ as one discrete variable $\widetilde{X}_i$ with many levels. For example, suppose that $X_3$ has parents $X_1$ and $X_2$ and that $X_1 \in \{0, 1, 2\}$ while $X_2 \in \{0, 1\}$. We can regard the parents $X_1$ and $X_2$ as a single variable $\widetilde{X}_3$ defined by
$$\widetilde{X}_3 = \begin{cases}
1 & \text{if } X_1 = 0, X_2 = 0 \\
2 & \text{if } X_1 = 0, X_2 = 1 \\
3 & \text{if } X_1 = 1, X_2 = 0 \\
4 & \text{if } X_1 = 1, X_2 = 1 \\
5 & \text{if } X_1 = 2, X_2 = 0 \\
6 & \text{if } X_1 = 2, X_2 = 1.
\end{cases}$$
Hence we can write
$$p(v) = \prod_{i=1}^{m} p(x_i \mid \widetilde{x}_i). \qquad (20.4)$$
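Theorem 20.11 Let $V_1, \ldots, V_n$ be iid random vectors from distribution $p$ given in (20.4). The maximum likelihood estimator of $p$ is
$$\hat p(v) = \prod_{i=1}^{m} \hat p(x_i \mid \widetilde{x}_i) \qquad (20.5)$$
where
$$\hat p(x_i \mid \widetilde{x}_i) = \frac{\#\{r : X_{ri} = x_i \text{ and } \widetilde{X}_{ri} = \widetilde{x}_i\}}{\#\{r : \widetilde{X}_{ri} = \widetilde{x}_i\}}$$
and $X_{ri}$, $\widetilde{X}_{ri}$ denote the values of $X_i$ and $\widetilde{X}_i$ for the $r$th observation.
Example 20.12 Let $V = (X, Y, Z)$ have the DAG in the top left of Figure 20.5. Given $n$ observations $(X_1, Y_1, Z_1), \ldots, (X_n, Y_n, Z_n)$, the mle of $p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid y)$ is
$$\hat p(x) = \frac{\#\{i : X_i = x\}}{n},$$
$$\hat p(y \mid x) = \frac{\#\{i : X_i = x \text{ and } Y_i = y\}}{\#\{i : X_i = x\}},$$
and
$$\hat p(z \mid y) = \frac{\#\{i : Y_i = y \text{ and } Z_i = z\}}{\#\{i : Y_i = y\}}.$$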
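The counting estimator of Example 20.12 can be sketched as follows for the chain with factorization $p(x)\,p(y \mid x)\,p(z \mid y)$. The eight observations are hypothetical, and each conditional probability is estimated as the joint count divided by the count of the conditioning value.

```python
from collections import Counter

# Hypothetical observations (x, y, z); not data from the text.
data = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1),
        (1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0)]
n = len(data)

count_x = Counter(x for x, _, _ in data)
count_xy = Counter((x, y) for x, y, _ in data)
count_y = Counter(y for _, y, _ in data)
count_yz = Counter((y, z) for _, y, z in data)

def p_hat(x, y, z):
    """MLE of p(x) p(y | x) p(z | y); each factor is a ratio of counts."""
    p_x = count_x[x] / n
    p_y_given_x = count_xy[(x, y)] / count_x[x]
    p_z_given_y = count_yz[(y, z)] / count_y[y]
    return p_x * p_y_given_x * p_z_given_y

# The fitted probabilities over all eight cells should sum to one.
total = sum(p_hat(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1))
print(round(total, 12), p_hat(0, 0, 0))
```

A conditioning value that never occurs in the data would give a zero denominator, so a real implementation would need to handle empty cells.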
It is possible to extend these ideas to continuous random variables as well. For example, we might use some parametric model $p(x \mid \pi_x; \theta_x)$ for each conditional density. The likelihood function is then
$$L(\theta) = \prod_{i=1}^{n} p(V_i; \theta) = \prod_{i=1}^{n} \prod_{j=1}^{m} p(X_{ij} \mid \pi_j; \theta_j)$$
where $X_{ij}$ is the value of $X_j$ for the $i$th data point and $\theta_j$ are the parameters for the $j$th conditional density. We can then proceed using maximum likelihood.
So far, we have assumed that the structure of the DAG is given. One can also try to estimate the structure of the DAG itself from the data. For example, we could fit every possible DAG using maximum likelihood and use AIC (or some other method) to choose a DAG. However, there are many possible DAG's so you would need much data for such a method to be reliable. Producing a valid, accurate confidence set for the DAG structure would require astronomical sample sizes. DAG's are thus most useful for encoding conditional independence information rather than discovering it.
20.6 Causation Revisited
We discussed causation in Chapter 19 using the idea of counterfactual random variables. A different approach to causation uses DAG's. The two approaches are mathematically equivalent though they appear to be quite different. In Chapter 19, the extra element we added to clarify causation was the idea of a counterfactual. In the DAG approach, the extra element is the idea of intervention. Consider the DAG in Figure 20.9.
FIGURE 20.9. Conditioning versus intervening (vertices X, Y, Z).
The probability function for a distribution consistent with this DAG has the form $p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid x, y)$. Here is pseudo-code for generating from this distribution:

for i = 1, ..., n:
    x_i <- p_X(x_i)
    y_i <- p_{Y|X}(y_i | x_i)
    z_i <- p_{Z|X,Y}(z_i | x_i, y_i)
Suppose we repeat this code many times yielding data $(x_1, y_1, z_1), \ldots, (x_n, y_n, z_n)$. Among all the times that we observe $Y = y$, how often is $Z = z$? The answer to this question is given by the conditional distribution of $Z \mid Y$. Specifically,
$$\begin{aligned}
P(Z = z \mid Y = y) &= \frac{P(Y = y, Z = z)}{P(Y = y)} = \frac{p(y, z)}{p(y)} \\
&= \frac{\sum_x p(x, y, z)}{p(y)} = \frac{\sum_x p(x)\, p(y \mid x)\, p(z \mid x, y)}{p(y)} \\
&= \sum_x p(z \mid x, y) \frac{p(y \mid x)\, p(x)}{p(y)} = \sum_x p(z \mid x, y) \frac{p(x, y)}{p(y)} \\
&= \sum_x p(z \mid x, y)\, p(x \mid y).
\end{aligned}$$
Now suppose we intervene by changing the computer code. Specifically, suppose we fix $Y$ at the value $y$. The code now looks like this:

set Y = y
for i = 1, ..., n:
    x_i <- p_X(x_i)
    z_i <- p_{Z|X,Y}(z_i | x_i, y)
Having set $Y = y$, how often was $Z = z$? To answer, note
that the intervention has changed the joint probability to be
$$p_*(x, z) = p(x)\, p(z \mid x, y).$$
The answer to our question is given by the marginal distribution
$$p_*(z) = \sum_x p_*(x, z) = \sum_x p(x)\, p(z \mid x, y).$$
We shall denote this as $P(Z = z \mid Y := y)$ or $p(z \mid Y := y)$. We call
$P(Z = z \mid Y = y)$ conditioning by observation or passive
conditioning. We call $P(Z = z \mid Y := y)$ conditioning by
intervention or active conditioning.
Passive conditioning is used to answer a predictive question
like:
"Given that Joe smokes, what is the probability he will get lung
cancer?"
Active conditioning is used to answer a causal question like:
"If Joe quits smoking, what is the probability he will get lung
cancer?"
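The difference between the two kinds of conditioning can be checked numerically. The following is a minimal sketch, assuming a hypothetical binary model for (X, Y, Z); the probabilities `P_X1`, `p_y1_given_x` and `p_z1_given_xy` are illustrative choices and do not come from the text. The function `simulate` mirrors the two pieces of pseudo-code, while `passive` and `active` evaluate the sums over x exactly.

```python
import random

# Hypothetical binary model; these numbers are illustrative assumptions.
P_X1 = 0.5  # P(X = 1)


def p_y1_given_x(x):
    """P(Y = 1 | X = x)."""
    return 0.8 if x == 1 else 0.2


def p_z1_given_xy(x, y):
    """P(Z = 1 | X = x, Y = y)."""
    return 0.1 + 0.5 * x + 0.3 * y


def simulate(n, set_y=None, seed=0):
    """Draw n triples (x, y, z); if set_y is given, Y is fixed by intervention."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x = int(rng.random() < P_X1)
        if set_y is None:
            y = int(rng.random() < p_y1_given_x(x))  # observational code
        else:
            y = set_y                                # intervened code
        z = int(rng.random() < p_z1_given_xy(x, y))
        draws.append((x, y, z))
    return draws


def passive(y):
    """P(Z = 1 | Y = y) = sum_x p(z | x, y) p(x | y)."""
    px = {0: 1 - P_X1, 1: P_X1}
    py = {x: p_y1_given_x(x) if y == 1 else 1 - p_y1_given_x(x) for x in (0, 1)}
    num = sum(px[x] * py[x] * p_z1_given_xy(x, y) for x in (0, 1))
    den = sum(px[x] * py[x] for x in (0, 1))
    return num / den


def active(y):
    """P(Z = 1 | Y := y) = sum_x p(x) p(z | x, y)."""
    px = {0: 1 - P_X1, 1: P_X1}
    return sum(px[x] * p_z1_given_xy(x, y) for x in (0, 1))


obs = simulate(100_000)
freq_passive = sum(z for _, y, z in obs if y == 1) / sum(y for _, y, _ in obs)
freq_active = sum(z for _, _, z in simulate(100_000, set_y=1)) / 100_000
```

In this toy model the passive and active answers differ because X influences both Y and Z, so observing Y = 1 changes what we believe about X, whereas setting Y = 1 does not.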
Consider a pair $(\mathcal{G}, P)$ where $\mathcal{G}$ is a DAG and $P$ is a distribution for the variables $V$ of the DAG. Let $p$ denote the probability
function for $P$. Consider intervening and fixing a variable $X$ to
be equal to $x$. We represent the intervention by doing two things:
(1) Create a new DAG $\mathcal{G}_*$ by removing all arrows pointing
into $X$;
(2) Create a new distribution $p_*(v) = P(V = v \mid X := x)$ by
removing the term $p(x \mid \pi_X)$ from $p(v)$.
The new pair $(\mathcal{G}_*, P_*)$ represents the intervention "set $X = x$."
Example 20.13 You may have noticed a correlation between rain
and having a wet lawn, that is, the variable "Rain" is not independent of the variable "Wet Lawn" and hence $p_{R,W}(r, w) \neq p_R(r)\, p_W(w)$ where R denotes Rain and W denotes Wet Lawn.
Consider the following two DAGs:

Rain → Wet Lawn          Rain ← Wet Lawn.

The first DAG implies that $p(w, r) = p(r)\, p(w \mid r)$ while the second implies that $p(w, r) = p(w)\, p(r \mid w)$. No matter what the joint
distribution $p(w, r)$ is, both graphs are correct. Both imply that
R and W are not independent. But, intuitively, if we want a
graph to indicate causation, the first graph is right and the second is wrong. Throwing water on your lawn doesn't cause rain.
The reason we feel the first is correct while the second is wrong is
because the interventions implied by the first graph are correct.
Look at the first graph and form the intervention W = 1 where
1 denotes "wet lawn." Following the rules of intervention, we
break the arrows into W to get the modified graph:

Rain          set Wet Lawn = 1

with distribution $p_*(r) = p(r)$. Thus $P(R = r \mid W := w) = P(R = r)$ tells us that "wet lawn" does not cause rain.
Suppose we (wrongly) assume that the second graph is the
correct causal graph and form the intervention W = 1 on the
second graph. There are no arrows into W that need to be broken
so the intervention graph is the same as the original graph. Thus
$p_*(r) = p(r \mid w)$ which would imply that changing "wet" changes
"rain." Clearly, this is nonsense.
Both are correct probability graphs but only the first is correct
causally. We know the correct causal graph by using background
knowledge.
Remark 20.14 We could try to learn the correct causal graph
from data but this is dangerous. In fact it is impossible with two
variables. With more than two variables there are methods that
can find the causal graph under certain assumptions but they are
large sample methods and, furthermore, there is no way to ever
know if the sample size you have is large enough to make the
methods reliable.
We can use DAG's to represent confounding variables. If X is
a treatment and Y is an outcome, a confounding variable Z is
a variable with arrows into both X and Y; see Figure 20.10. It
is easy to check, using the formalism of interventions, that the
following facts are true.
In a randomized study, the arrow between Z and X is broken. In this case, even with Z unobserved (represented by enclosing Z in a circle), the causal relationship between X and
Y is estimable because it can be shown that $E(Y \mid X := x) = E(Y \mid X = x)$, which does not involve the unobserved Z. In
an observational study with all confounders observed, we get
$E(Y \mid X := x) = \int E(Y \mid X = x, Z = z)\, dF_Z(z)$ as in formula
(19.8). If Z is unobserved then we cannot estimate the causal
effect because $E(Y \mid X := x) = \int E(Y \mid X = x, Z = z)\, dF_Z(z)$
involves the unobserved Z. We can't just use X and Y since, in
this case, $P(Y = y \mid X = x) \neq P(Y = y \mid X := x)$, which is just
another way of saying that causation is not association.
In fact, we can make a precise connection between DAG's and
counterfactuals as follows. Suppose that X and Y are binary.
Define the confounding variable Z by
$$
Z = \begin{cases}
1 & \text{if } (C_0, C_1) = (0, 0) \\
2 & \text{if } (C_0, C_1) = (0, 1) \\
3 & \text{if } (C_0, C_1) = (1, 0) \\
4 & \text{if } (C_0, C_1) = (1, 1).
\end{cases}
$$
From this, you can make the correspondence between the DAG
approach and the counterfactual approach explicit. I leave this
for the interested reader.

[Figure 20.10: three DAGs, each on the nodes X, Y and Z.]
FIGURE 20.10. Randomized study; Observational study with measured confounders; Observational study with unmeasured confounders. The circled variables are unobserved.
20.7
There are a number of texts on DAG's including Edwards (1996)
and Jordan (2003). The first use of DAG's for representing
causal relationships was by Wright (1934). Modern treatments
are contained in Spirtes, Glymour, and Scheines (1990) and
Pearl (2000). Robins, Scheines, Spirtes and Wasserman (2003)
discuss the problems with estimating causal structure from data.
20.8 Exercises
1. Consider the three DAG's in Figure 20.5 without a collider. Prove that $X \perp\!\!\!\perp Z \mid Y$.
2. Consider the DAG in Figure 20.5 with a collider. Prove
that $X \perp\!\!\!\perp Z$ and that X and Z are dependent given Y.
3. Let $X \in \{0, 1\}$, $Y \in \{0, 1\}$, $Z \in \{0, 1, 2\}$. Suppose the
distribution of (X, Y, Z) is Markov to:

X → Y → Z

Create a joint distribution $p(x, y, z)$ that is Markov to this
DAG. Generate 1000 random vectors from this distribution. Estimate the distribution from the data using maximum likelihood. Compare the estimated distribution to
the true distribution. Let $\theta = (\theta_{000}, \theta_{001}, \ldots, \theta_{112})$ where
$\theta_{rst} = P(X = r, Y = s, Z = t)$. Use the bootstrap to get
standard errors and 95 per cent confidence intervals for
these 12 parameters.
4. Let V = (X, Y, Z) have the following joint distribution
$$
\begin{aligned}
X &\sim \text{Bernoulli}\left(\tfrac{1}{2}\right) \\
Y \mid X = x &\sim \text{Bernoulli}\left(\frac{e^{4x - 2}}{1 + e^{4x - 2}}\right) \\
Z \mid X = x, Y = y &\sim \text{Bernoulli}\left(\frac{e^{2(x + y) - 2}}{1 + e^{2(x + y) - 2}}\right).
\end{aligned}
$$
(a) Find an expression for $P(Z = z \mid Y = y)$. In particular,
find $P(Z = 1 \mid Y = 1)$.
(b) Write a program to simulate the model. Conduct a
simulation and compute $P(Z = 1 \mid Y = 1)$ empirically.
Plot this as a function of the simulation size N. It should
converge to the theoretical value you computed in (a).
(c) Write down an expression for $P(Z = 1 \mid Y := y)$. In
particular, find $P(Z = 1 \mid Y := 1)$.
(d) Modify your program to simulate the intervention "set
Y = 1." Conduct a simulation and compute $P(Z = 1 \mid Y := 1)$ empirically. Plot this as a function of the simulation size N. It should converge to the theoretical value
you computed in (c).
5. This is a continuous, Gaussian version of the last question.
Let V = (X, Y, Z) have the following joint distribution
$$
\begin{aligned}
X &\sim \text{Normal}(0, 1) \\
Y \mid X = x &\sim \text{Normal}(\alpha x, 1) \\
Z \mid X = x, Y = y &\sim \text{Normal}(\beta y + \gamma x, 1).
\end{aligned}
$$
Here, $\alpha$, $\beta$ and $\gamma$ are fixed parameters. Economists refer to
models like this as structural equation models.
(a) Find an explicit expression for $f(z \mid y)$ and $E(Z \mid Y = y) = \int z f(z \mid y)\, dz$.
(b) Find an explicit expression for $f(z \mid Y := y)$ and then
find $E(Z \mid Y := y) \equiv \int z f(z \mid Y := y)\, dz$. Compare to (a).
(c) Find the joint distribution of (Y, Z). Find the correlation $\rho$ between Y and Z.
(d) Suppose that X is not observed and we try to make
causal conclusions from the marginal distribution of (Y, Z).
(Think of X as unobserved confounding variables.) In particular, suppose we declare that Y causes Z if $\rho \neq 0$ and
we declare that Y does not cause Z if $\rho = 0$. Show that
this will lead to erroneous conclusions.
(e) Suppose we conduct a randomized experiment in which
Y is randomly assigned. To be concrete, suppose that
$$
\begin{aligned}
X &\sim \text{Normal}(0, 1) \\
Y &\sim \text{Normal}(\alpha, 1) \\
Z \mid X = x, Y = y &\sim \text{Normal}(\beta y + \gamma x, 1).
\end{aligned}
$$
Show that the method in (d) now yields correct conclusions, i.e. $\rho = 0$ if and only if $f(z \mid Y := y)$ does not depend
on y.
21 Nonparametric Curve Estimation
In this Chapter we discuss nonparametric estimation of probability density functions and regression functions which we refer
to as curve estimation.
In Chapter 8 we saw that it is possible to consistently estimate
a cumulative distribution function F without making any assumptions about F. If we want to estimate a probability density
function $f(x)$ or a regression function $r(x) = E(Y \mid X = x)$ the
situation is different. We cannot estimate these functions consistently without making some smoothness assumptions. Correspondingly, we will perform some sort of smoothing operation
on the data.
A simple example of a density estimator is a histogram,
which we discuss in detail in Section 21.2. To form a histogram
estimator of a density f, we divide the real line into disjoint sets
called bins. The histogram estimator is a piecewise constant function where the height of the function is proportional to the number
of observations in each bin; see Figure 21.3. The number of bins
is an example of a smoothing parameter. If we smooth too
much (large bins) we get a highly biased estimator while if we
smooth too little (small bins) we get a highly variable estimator.
Much of curve estimation is concerned with trying to optimally
balance variance and bias.

[Figure 21.1: a curve $\hat{g}(x)$, annotated "This is a function of the data" and "This is the point at which we are evaluating $\hat{g}(\cdot)$".]
FIGURE 21.1. A curve estimate $\hat{g}$ is random because it is a function
of the data. The point x at which we evaluate $\hat{g}$ is not a random
variable.
21.1 The Bias-Variance Tradeoff
Let $g$ denote an unknown function and let $\hat{g}_n$ denote an estimator of $g$. Bear in mind that $\hat{g}_n(x)$ is a random function
evaluated at a point x. $\hat{g}_n$ is random because it depends on
the data. Indeed, we could be more explicit and write $\hat{g}_n(x) = h_x(X_1, \ldots, X_n)$ to show that $\hat{g}_n(x)$ is a function of the data
$X_1, \ldots, X_n$ and that the function could be different for each x.
See Figure 21.1.
As a loss function, we will use the integrated squared error
(ISE):
$$L(g, \hat{g}_n) = \int \bigl(g(u) - \hat{g}_n(u)\bigr)^2\, du. \tag{21.1}$$
The risk or mean integrated squared error (MISE) is
$$R(g, \hat{g}_n) = E\bigl(L(g, \hat{g}_n)\bigr). \tag{21.2}$$
Lemma 21.1 The risk can be written as
$$R(g, \hat{g}_n) = \int b^2(x)\, dx + \int v(x)\, dx \tag{21.3}$$
where
$$b(x) = E(\hat{g}_n(x)) - g(x) \tag{21.4}$$
is the bias of $\hat{g}_n(x)$ at a fixed x and
$$v(x) = V(\hat{g}_n(x)) = E\Bigl(\bigl(\hat{g}_n(x) - E(\hat{g}_n(x))\bigr)^2\Bigr) \tag{21.5}$$
is the variance of $\hat{g}_n(x)$ at a fixed x.
In summary,
$$\text{RISK} = \text{BIAS}^2 + \text{VARIANCE}. \tag{21.6}$$
When the data are over-smoothed, the bias term is large and
the variance is small. When the data are under-smoothed the opposite is true; see Figure 21.2. This is called the bias-variance
trade-off. Minimizing risk corresponds to balancing bias and
variance.
21.2 Histograms
Let $X_1, \ldots, X_n$ be iid on $[0, 1]$ with density $f$. The restriction
to $[0, 1]$ is not crucial; we can always rescale the data to be on
this interval. Let $m$ be an integer and define bins
$$B_1 = \left[0, \frac{1}{m}\right),\ B_2 = \left[\frac{1}{m}, \frac{2}{m}\right),\ \ldots,\ B_m = \left[\frac{m-1}{m}, 1\right]. \tag{21.7}$$
Define the binwidth $h = 1/m$, let $\nu_j$ be the number of observations in $B_j$, let $\hat{p}_j = \nu_j / n$ and let $p_j = \int_{B_j} f(u)\, du$.

[Figure 21.2: risk, bias squared and variance plotted against the amount of smoothing; a vertical line marks the optimal smoothing.]
FIGURE 21.2. The Bias-Variance trade-off. The bias increases and
the variance decreases with the amount of smoothing. The optimal
amount of smoothing, indicated by the vertical line, minimizes the
risk = bias$^2$ + variance.
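The histogram estimator is defined by
$$
\hat{f}_n(x) = \begin{cases}
\hat{p}_1 / h & x \in B_1 \\
\hat{p}_2 / h & x \in B_2 \\
\quad\vdots & \quad\vdots \\
\hat{p}_m / h & x \in B_m
\end{cases}
$$
which we can write more succinctly as
$$\hat{f}_n(x) = \sum_{j=1}^{m} \frac{\hat{p}_j}{h}\, I(x \in B_j). \tag{21.8}$$
To understand the motivation for this estimator, let $p_j = \int_{B_j} f(u)\, du$
and note that, for $x \in B_j$ and $h$ small,
$$\hat{f}_n(x) = \frac{\hat{p}_j}{h} \approx \frac{p_j}{h} = \frac{\int_{B_j} f(u)\, du}{h} \approx \frac{f(x)\, h}{h} = f(x).$$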
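As an illustration of the histogram estimator with bins of width h = 1/m on [0, 1], here is a minimal sketch. The function name `histogram_density` and the convention of placing the point x = 1 in the last bin are assumptions made for this example.

```python
def histogram_density(data, m):
    """Histogram density estimate on [0, 1] with m equal-width bins.

    Returns (f_hat, heights) where heights[j] = p_hat_j / h and f_hat(x)
    evaluates the piecewise constant estimate at x.
    """
    n = len(data)
    h = 1.0 / m
    counts = [0] * m
    for x in data:
        j = min(int(x * m), m - 1)   # x = 1 goes into the last bin
        counts[j] += 1
    heights = [c / (n * h) for c in counts]   # p_hat_j / h

    def f_hat(x):
        if not 0.0 <= x <= 1.0:
            return 0.0
        return heights[min(int(x * m), m - 1)]

    return f_hat, heights


f_hat, heights = histogram_density([0.05, 0.15, 0.15, 0.95], m=2)
```

With these four points and m = 2, three observations fall in the first bin and one in the second, so the heights are 1.5 and 0.5 and the estimate integrates to one.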
Example 21.2 Figure 21.3 shows three different histograms based
on n = 1,266 data points from an astronomical sky survey.
Each data point represents the distance from us to a galaxy.
The galaxies lie on a "pencilbeam" pointing directly from the
Earth out into space. Because of the finite speed of light, looking at galaxies farther and farther away corresponds to looking
back in time. Choosing the right number of bins involves finding
a good tradeoff between bias and variance. We shall see later
that the top left histogram has too few bins resulting in oversmoothing and too much bias. The bottom left histogram has too
many bins resulting in undersmoothing and too much variance. The
top right histogram is just right. The histogram reveals the presence of clusters of galaxies. Seeing how the size and number of
galaxy clusters varies with time helps cosmologists understand
the evolution of the universe.
The mean and variance of $\hat{f}_n(x)$ are given in the following
Theorem.

[Figure 21.3: four panels. Three histograms labelled "Oversmoothed", "Just Right" and "Undersmoothed", and a plot of cross validation score against number of bins.]
FIGURE 21.3. Three versions of a histogram for the astronomy data.
The top left histogram has too few bins. The bottom left histogram
has too many bins. The top right histogram is just right. The lower,
right plot shows the estimated risk versus the number of bins.
Theorem 21.3 Consider fixed x and fixed m, and let $B_j$ be the
bin containing x. Then,
$$E(\hat{f}_n(x)) = \frac{p_j}{h} \quad \text{and} \quad V(\hat{f}_n(x)) = \frac{p_j (1 - p_j)}{n h^2}. \tag{21.9}$$
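Let's take a closer look at the bias-variance tradeoff using
equation (21.9). Consider some $x \in B_j$. For any other $u \in B_j$,
$$f(u) \approx f(x) + (u - x) f'(x)$$
and so
$$p_j = \int_{B_j} f(u)\, du \approx \int_{B_j} \bigl(f(x) + (u - x) f'(x)\bigr)\, du = f(x)\, h + h f'(x) \left(h\left(j - \tfrac{1}{2}\right) - x\right).$$
Therefore, the bias $b(x)$ is
$$
\begin{aligned}
b(x) &= E(\hat{f}_n(x)) - f(x) = \frac{p_j}{h} - f(x) \\
&\approx \frac{f(x)\, h + h f'(x) \left(h\left(j - \tfrac{1}{2}\right) - x\right)}{h} - f(x) \\
&= f'(x) \left(h\left(j - \tfrac{1}{2}\right) - x\right).
\end{aligned}
$$
If $\tilde{x}_j$ is the center of the bin, then
$$
\begin{aligned}
\int_{B_j} b^2(x)\, dx &\approx \int_{B_j} (f'(x))^2 \left(h\left(j - \tfrac{1}{2}\right) - x\right)^2 dx \\
&\approx (f'(\tilde{x}_j))^2 \int_{B_j} \left(h\left(j - \tfrac{1}{2}\right) - x\right)^2 dx \\
&= (f'(\tilde{x}_j))^2\, \frac{h^3}{12}.
\end{aligned}
$$
Therefore,
$$
\begin{aligned}
\int_0^1 b^2(x)\, dx &= \sum_{j=1}^{m} \int_{B_j} b^2(x)\, dx \approx \sum_{j=1}^{m} (f'(\tilde{x}_j))^2\, \frac{h^3}{12} \\
&= \frac{h^2}{12} \sum_{j=1}^{m} h\, (f'(\tilde{x}_j))^2 \approx \frac{h^2}{12} \int_0^1 (f'(x))^2\, dx.
\end{aligned}
$$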
Note that this increases as a function of h. Now consider the
variance. For h small, $1 - p_j \approx 1$, so
$$v(x) \approx \frac{p_j}{n h^2} = \frac{f(x)\, h + h f'(x) \left(h\left(j - \tfrac{1}{2}\right) - x\right)}{n h^2} \approx \frac{f(x)}{n h}$$
where we have kept only the dominant term. So,
$$\int_0^1 v(x)\, dx \approx \frac{1}{n h}.$$
Note that this decreases with h. Putting all this together, we
get:
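Theorem 21.4 Suppose that $\int (f'(u))^2\, du < \infty$. Then
$$R(\hat{f}_n, f) \approx \frac{h^2}{12} \int (f'(u))^2\, du + \frac{1}{n h}. \tag{21.10}$$
The value $h^*$ that minimizes (21.10) is
$$h^* = \frac{1}{n^{1/3}} \left(\frac{6}{\int (f'(u))^2\, du}\right)^{1/3}. \tag{21.11}$$
With this choice of binwidth,
$$R(\hat{f}_n, f) \approx \frac{C}{n^{2/3}} \tag{21.12}$$
where $C = (3/4)^{2/3} \left(\int (f'(u))^2\, du\right)^{1/3}$.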
Theorem 21.4 is quite revealing. We see that with an optimally chosen binwidth, the MISE decreases to 0 at rate $n^{-2/3}$.
By comparison, most parametric estimators converge at rate
$n^{-1}$. The slower rate of convergence is the price we pay for being nonparametric. The formula for the optimal binwidth $h^*$ is
of theoretical interest but it is not useful in practice since it
depends on the unknown function f.
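For readers who want to see where the optimal binwidth and the constant C come from, here is a short worked derivation. It is a sketch that writes $A = \int (f'(u))^2\, du$ as shorthand (a symbol introduced only for this calculation) and treats the approximation in (21.10) as an equality.

```latex
\begin{aligned}
\frac{d}{dh}\left(\frac{h^2}{12} A + \frac{1}{nh}\right)
  &= \frac{hA}{6} - \frac{1}{n h^2} = 0
  \quad\Longrightarrow\quad
  (h^*)^3 = \frac{6}{nA}, \\
\frac{1}{n h^*} &= \frac{(h^*)^2 A}{6}
  \quad\text{(using } 1/n = (h^*)^3 A / 6\text{)}, \\
R(\hat{f}_n, f) &\approx \frac{(h^*)^2 A}{12} + \frac{(h^*)^2 A}{6}
  = \frac{A}{4}\,(h^*)^2
  = \frac{A}{4}\left(\frac{6}{nA}\right)^{2/3}
  = \left(\frac{3}{4}\right)^{2/3} \frac{A^{1/3}}{n^{2/3}} .
\end{aligned}
```

The last step uses $6^{2/3}/4 = (36/64)^{1/3} = (3/4)^{2/3}$, which matches the constant C in (21.12).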
A practical way to choose the binwidth is to estimate the
risk function and minimize over h. Recall that the loss function,
which we now write as a function of h, is
$$
\begin{aligned}
L(h) &= \int \bigl(\hat{f}_n(x) - f(x)\bigr)^2\, dx \\
&= \int \hat{f}_n^2(x)\, dx - 2 \int \hat{f}_n(x) f(x)\, dx + \int f^2(x)\, dx.
\end{aligned}
$$
The last term does not depend on the binwidth h so minimizing
the risk is equivalent to minimizing the expected value of
$$J(h) = \int \hat{f}_n^2(x)\, dx - 2 \int \hat{f}_n(x) f(x)\, dx.$$
We shall refer to $E(J(h))$ as the risk,
although it differs from
the true risk by the constant term $\int f^2(x)\, dx$.
Definition 21.5 The cross-validation estimator of risk
is
$$\hat{J}(h) = \int \hat{f}_n^2(x)\, dx - \frac{2}{n} \sum_{i=1}^{n} \hat{f}_{(-i)}(X_i) \tag{21.13}$$
where $\hat{f}_{(-i)}$ is the histogram estimator obtained after removing the ith observation. We refer to $\hat{J}(h)$ as the cross-validation score or estimated risk.
Theorem 21.6 The cross-validation estimator is nearly unbiased:
$$E(\hat{J}(h)) \approx E(J(h)).$$
In principle, we need to recompute the histogram n times to
compute $\hat{J}(h)$. Moreover, this has to be done for all values of h.
Fortunately, there is a shortcut formula.
Theorem 21.7 The following identity holds:
$$\hat{J}(h) = \frac{2}{(n - 1) h} - \frac{n + 1}{(n - 1) h} \sum_{j=1}^{m} \hat{p}_j^2. \tag{21.14}$$
Example 21.8 We used cross-validation in the astronomy example. The cross-validation function is quite flat near its minimum. Any m in the range of 73 to 310 is an approximate minimizer but the resulting histogram does not change much over
this range. The histogram in the top right plot in Figure 21.3
was constructed using m = 73 bins. The bottom right plot shows
the estimated risk, or more precisely, $\hat{J}$, plotted versus the number of bins.
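The cross-validation score for a histogram can be computed either by leaving out one observation at a time or through the shortcut $\hat{J}(h) = \frac{2}{(n-1)h} - \frac{n+1}{(n-1)h}\sum_j \hat{p}_j^2$. The sketch below implements both so they can be compared; the function names and the simulated data are assumptions made for illustration, and the data are taken to lie in [0, 1].

```python
import random


def bin_counts(data, m):
    """Number of observations in each of m equal-width bins on [0, 1]."""
    counts = [0] * m
    for x in data:
        counts[min(int(x * m), m - 1)] += 1
    return counts


def cv_score_shortcut(data, m):
    """Cross-validation score from the bin proportions alone."""
    n, h = len(data), 1.0 / m
    sum_p2 = sum((c / n) ** 2 for c in bin_counts(data, m))
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * sum_p2


def cv_score_bruteforce(data, m):
    """Integral of f_hat^2 minus (2/n) * sum of leave-one-out estimates."""
    n, h = len(data), 1.0 / m
    counts = bin_counts(data, m)
    integral = sum((c / (n * h)) ** 2 * h for c in counts)
    loo_sum = 0.0
    for x in data:
        j = min(int(x * m), m - 1)
        # Removing X_i leaves counts[j] - 1 points in its bin, out of n - 1.
        loo_sum += (counts[j] - 1) / ((n - 1) * h)
    return integral - 2.0 / n * loo_sum


rng = random.Random(1)
sample = [rng.betavariate(2, 5) for _ in range(50)]   # illustrative data
best_m = min(range(1, 31), key=lambda m: cv_score_shortcut(sample, m))
```

Because the shortcut needs only the bin counts, scanning many values of m is cheap; `best_m` is the number of bins with the smallest estimated risk on this sample.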
Next we want some sort of confidence set for f. Suppose $\hat{f}_n$
is a histogram with m bins and binwidth $h = 1/m$. We cannot
realistically make confidence statements about the fine details of
the true density f. Instead, we shall make confidence statements
about f at the resolution of the histogram. To this end, define
$$\tilde{f}(x) = \frac{p_j}{h} \quad \text{for } x \in B_j \tag{21.15}$$
where $p_j = \int_{B_j} f(u)\, du$, which is a "histogramized" version of f.
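Theorem 21.9 Let $m = m(n)$ be the number of bins in the histogram
$\hat{f}_n$. Assume that $m(n) \to \infty$ and $m(n) \log n / n \to 0$ as
$n \to \infty$. Define
$$
\begin{aligned}
\ell(x) &= \left(\max\left\{\sqrt{\hat{f}_n(x)} - c,\ 0\right\}\right)^2 \\
u(x) &= \left(\sqrt{\hat{f}_n(x)} + c\right)^2
\end{aligned}
\tag{21.16}
$$
where
$$c = \frac{z_{\alpha/(2m)}}{2} \sqrt{\frac{m}{n}}. \tag{21.17}$$
Then,
$$\lim_{n \to \infty} P\bigl(\ell(x) \le \tilde{f}(x) \le u(x) \text{ for all } x\bigr) \ge 1 - \alpha. \tag{21.18}$$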
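The envelope takes only a few lines to compute once the histogram heights are known. The sketch below is an illustration under stated assumptions: `histogram_envelope` is a hypothetical helper name, $z_{\alpha/(2m)}$ is taken to be the upper $\alpha/(2m)$ quantile of the standard Normal, and the heights passed in are made-up values.

```python
import math
from statistics import NormalDist


def histogram_envelope(heights, n, alpha=0.05):
    """Lower and upper envelope for a histogram with the given bin heights.

    heights[j] is the histogram height p_hat_j / h in bin j and n is the
    sample size; the number of bins m is len(heights).
    """
    m = len(heights)
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * m))   # upper alpha/(2m) quantile
    c = 0.5 * z * math.sqrt(m / n)
    lower = [max(math.sqrt(f) - c, 0.0) ** 2 for f in heights]
    upper = [(math.sqrt(f) + c) ** 2 for f in heights]
    return lower, upper


lower, upper = histogram_envelope([0.0, 1.5, 0.5, 2.0], n=100)
```

Working on the square-root scale keeps the half-width c the same in every bin, which is why the envelope is wider (on the original scale) where the histogram is taller.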
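Proof. Here is an outline of the proof. A rigorous proof
requires fairly sophisticated tools. From the central limit theorem, $\hat{p}_j \approx N(p_j, p_j(1 - p_j)/n)$. By the delta method,
$\sqrt{\hat{p}_j} \approx N(\sqrt{p_j}, 1/(4n))$. Moreover, it can be shown that the $\sqrt{\hat{p}_j}$'s are
approximately independent. Therefore,
$$2\sqrt{n}\left(\sqrt{\hat{p}_j} - \sqrt{p_j}\right) \approx Z_j \tag{21.19}$$
where $Z_1, \ldots, Z_m \sim N(0, 1)$. Let A be the event that $\ell(x) \le \tilde{f}(x) \le u(x)$ for all x. So,
$$
\begin{aligned}
A &= \left\{\ell(x) \le \tilde{f}(x) \le u(x) \text{ for all } x\right\} \\
&= \left\{\sqrt{\ell(x)} \le \sqrt{\tilde{f}(x)} \le \sqrt{u(x)} \text{ for all } x\right\} \\
&= \left\{\sqrt{\hat{f}_n(x)} - c \le \sqrt{\tilde{f}(x)} \le \sqrt{\hat{f}_n(x)} + c \text{ for all } x\right\} \\
&= \left\{\max_x \left|\sqrt{\hat{f}_n(x)} - \sqrt{\tilde{f}(x)}\right| \le c\right\}.
\end{aligned}
$$
Therefore,
$$
\begin{aligned}
P(A^c) &= P\left(\max_x \left|\sqrt{\hat{f}_n(x)} - \sqrt{\tilde{f}(x)}\right| > c\right)
= P\left(\max_j \left|\sqrt{\frac{\hat{p}_j}{h}} - \sqrt{\frac{p_j}{h}}\right| > c\right) \\
&= P\left(\max_j \left|\sqrt{\hat{p}_j} - \sqrt{p_j}\right| > c\sqrt{h}\right)
= P\left(\max_j 2\sqrt{n}\left|\sqrt{\hat{p}_j} - \sqrt{p_j}\right| > 2c\sqrt{hn}\right) \\
&= P\left(\max_j 2\sqrt{n}\left|\sqrt{\hat{p}_j} - \sqrt{p_j}\right| > z_{\alpha/(2m)}\right) \\
&\approx P\left(\max_j |Z_j| > z_{\alpha/(2m)}\right)
\le \sum_{j=1}^{m} P\left(|Z_j| > z_{\alpha/(2m)}\right) \\
&= \sum_{j=1}^{m} \frac{\alpha}{m} = \alpha.
\end{aligned}
$$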
Example 21.10 Figure 21.4 shows a 95 per cent confidence envelope for the astronomy data. We see that even with over 1,000
data points, there is still substantial uncertainty.
21.3 Kernel Density Estimation
Histograms are discontinuous. Kernel density estimators are
smoother and they converge faster to the true density than histograms.
Let $X_1, \ldots, X_n$ denote the observed data, a sample from f.
In this chapter, a kernel is defined
to be any smooth function K such that $K(x) \ge 0$, $\int K(x)\, dx = 1$, $\int x K(x)\, dx = 0$
and $\sigma_K^2 \equiv \int x^2 K(x)\, dx > 0$. Two examples of kernels are the
Epanechnikov kernel
$$
K(x) = \begin{cases}
\dfrac{3}{4}\left(1 - x^2/5\right)/\sqrt{5} & |x| < \sqrt{5} \\
0 & \text{otherwise}
\end{cases}
\tag{21.20}
$$
and the Gaussian (Normal) kernel $K(x) = (2\pi)^{-1/2} e^{-x^2/2}$.
Definition 21.11 Given a kernel K and a positive number
h, called the bandwidth, the kernel density estimator is defined to be
$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\left(\frac{x - X_i}{h}\right). \tag{21.21}$$
An example of a kernel density estimator is shown in Figure
21.5. The kernel estimator effectively puts a smoothed out lump
of mass of size $1/n$ over each data point $X_i$. The bandwidth h
controls the amount of smoothing. When h is close to 0, $\hat{f}_n$
consists of a set of spikes, one at each data point. The height of
the spikes tends to infinity as $h \to 0$. When $h \to \infty$, $\hat{f}_n$ tends
to a uniform density.

[Figure 21.4: histogram of the astronomy data with a confidence envelope.]
FIGURE 21.4. 95 per cent confidence envelope for astronomy data
using m = 73 bins.

[Figure 21.5: a kernel density estimate with the data points marked along the horizontal axis.]
FIGURE 21.5. A kernel density estimator $\hat{f}$. At each point x, $\hat{f}(x)$
is the average of the kernels centered over the data points $X_i$. The
data points are indicated by short vertical bars.
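The kernel density estimator, $\hat{f}(x) = \frac{1}{n}\sum_i \frac{1}{h} K\bigl((x - X_i)/h\bigr)$, translates directly into code. The following is a minimal sketch with a Gaussian kernel; the names `gaussian_kernel` and `kde` and the three data points are assumptions made for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard Normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(data, h, kernel=gaussian_kernel):
    """Return the kernel density estimate with bandwidth h as a function of x."""
    n = len(data)

    def f_hat(x):
        return sum(kernel((x - xi) / h) for xi in data) / (n * h)

    return f_hat


f_hat = kde([-1.0, 0.0, 2.0], h=0.5)
# Riemann-sum check that the estimate integrates to roughly one.
grid_step = 0.01
total_mass = sum(f_hat(-10.0 + k * grid_step) for k in range(2001)) * grid_step
```

Shrinking h makes each lump narrower and taller, which reproduces the spiky behaviour described above; a larger h spreads each lump out and flattens the estimate.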
Example 21.12 Figure 21.6 shows kernel density estimators for
the astronomy data using three different bandwidths. In each
case we used a Gaussian kernel. The properly smoothed kernel
density estimator in the top right panel shows similar structure
to the histogram. However, it is easier to see the clusters in the
kernel estimator.
To construct a kernel density estimator, we need to choose
a kernel K and a bandwidth h. It can be shown theoretically
and empirically that the choice of K is not crucial. However, the
choice of bandwidth h is very important. As with the histogram,
we can make a theoretical statement about how the risk of the
estimator depends on the bandwidth.
Theorem 21.13 Under weak assumptions on f and K,
$$R(f, \hat{f}_n) \approx \frac{1}{4} \sigma_K^4 h^4 \int (f''(x))^2\, dx + \frac{\int K^2(x)\, dx}{n h} \tag{21.22}$$
where $\sigma_K^2 = \int x^2 K(x)\, dx$. The optimal bandwidth is
$$h^* = \frac{c_1^{-2/5}\, c_2^{1/5}\, c_3^{-1/5}}{n^{1/5}} \tag{21.23}$$
where $c_1 = \int x^2 K(x)\, dx$, $c_2 = \int K(x)^2\, dx$ and $c_3 = \int (f''(x))^2\, dx$.
With this choice of bandwidth,
$$R(f, \hat{f}_n) \approx \frac{c_4}{n^{4/5}}$$
for some constant $c_4 > 0$.
Proof. Write $K_h(x, X) = h^{-1} K((x - X)/h)$ and $\hat{f}_n(x) = n^{-1} \sum_i K_h(x, X_i)$. Thus, $E[\hat{f}_n(x)] = E[K_h(x, X)]$ and $V[\hat{f}_n(x)] = n^{-1} V[K_h(x, X)]$.
Now,
$$
\begin{aligned}
E[K_h(x, X)] &= \int \frac{1}{h} K\left(\frac{x - t}{h}\right) f(t)\, dt \\
&= \int K(u)\, f(x - hu)\, du \\
&= \int K(u) \left[f(x) - h u f'(x) + \frac{h^2 u^2}{2} f''(x) + \cdots\right] du \\
&= f(x) + \frac{1}{2} h^2 f''(x) \int u^2 K(u)\, du + \cdots
\end{aligned}
$$
since $\int K(x)\, dx = 1$ and $\int x K(x)\, dx = 0$. The bias is
$$E[K_h(x, X)] - f(x) \approx \frac{1}{2} \sigma_K^2 h^2 f''(x).$$
By a similar calculation,
$$V[\hat{f}_n(x)] \approx \frac{f(x) \int K^2(x)\, dx}{n h}.$$

[Figure 21.6: four panels. Three kernel density estimates labelled "oversmoothed", "just right" and "undersmoothed", and a plot of cross-validation score against bandwidth.]
FIGURE 21.6. Kernel density estimators and estimated risk for the
astronomy data.
The second result follows from integrating the squared bias plus
the variance.
We see that kernel estimators converge at rate $n^{-4/5}$ while histograms converge at the slower rate $n^{-2/3}$. It can be shown that,
under weak assumptions, there does not exist a nonparametric
estimator that converges faster than $n^{-4/5}$.
The expression for $h^*$ depends on the unknown density f
which makes the result of little practical use. As with the histograms, we shall use cross-validation to find a bandwidth. Thus,
we estimate the risk (up to a constant) by
$$\hat{J}(h) = \int \hat{f}^2(x)\, dx - \frac{2}{n} \sum_{i=1}^{n} \hat{f}_{-i}(X_i) \tag{21.24}$$
where $\hat{f}_{-i}$ is the kernel density estimator after omitting the ith
observation.
Theorem 21.14 For any $h > 0$,
$$E\left[\hat{J}(h)\right] = E[J(h)].$$
Also,
$$\hat{J}(h) \approx \frac{1}{h n^2} \sum_i \sum_j K^*\left(\frac{X_i - X_j}{h}\right) + \frac{2}{n h} K(0) \tag{21.25}$$
where $K^*(x) = K^{(2)}(x) - 2K(x)$ and $K^{(2)}(z) = \int K(z - y) K(y)\, dy$.
In particular, if K is a N(0,1) Gaussian kernel then $K^{(2)}(z)$ is
the $N(0, 2)$ density.
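For a Gaussian kernel the approximate score $\hat{J}(h) \approx \frac{1}{hn^2}\sum_i\sum_j K^*\bigl((X_i - X_j)/h\bigr) + \frac{2}{nh}K(0)$ can be coded directly, since $K^{(2)}$ is then the N(0, 2) density. The sketch below uses a plain double loop, which costs on the order of $n^2$ kernel evaluations; the function names, the simulated sample and the bandwidth grid are assumptions for illustration.

```python
import math
import random


def K(u):
    """N(0, 1) density (the Gaussian kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def K2(u):
    """Convolution of K with itself, i.e. the N(0, 2) density."""
    return math.exp(-0.25 * u * u) / math.sqrt(4.0 * math.pi)


def cv_score(data, h):
    """Approximate cross-validation score for bandwidth h."""
    n = len(data)
    total = 0.0
    for xi in data:
        for xj in data:
            u = (xi - xj) / h
            total += K2(u) - 2.0 * K(u)        # K*(u)
    return total / (h * n * n) + 2.0 * K(0.0) / (n * h)


def select_bandwidth(data, grid):
    """Bandwidth in grid with the smallest cross-validation score."""
    return min(grid, key=lambda h: cv_score(data, h))


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.05 * k for k in range(1, 41)]        # 0.05, 0.10, ..., 2.00
h_cv = select_bandwidth(sample, grid)
```

A grid search is the simplest option; a finer grid or a one-dimensional optimizer could be substituted without changing `cv_score`.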
We then choose the bandwidth $h_n$ that minimizes $\hat{J}(h)$. (For large data sets, $\hat{f}$ and (21.25) can be computed quickly using the fast Fourier transform.) A
justification for this method is given by the following remarkable
theorem due to Stone.
Theorem 21.15 (Stone's Theorem.) Suppose that f is bounded.
Let $\hat{f}_h$ denote the kernel estimator with bandwidth h and let $h_n$
denote the bandwidth chosen by cross-validation. Then,
$$\frac{\int \left(f(x) - \hat{f}_{h_n}(x)\right)^2 dx}{\inf_h \int \left(f(x) - \hat{f}_h(x)\right)^2 dx} \xrightarrow{P} 1. \tag{21.26}$$
Example 21.16 The top right panel of Figure 21.6 is based on
cross-validation. These data are rounded, which causes problems for
cross-validation. Specifically, it causes the minimizer to be h =
0. To overcome this problem, we added a small amount of random Normal noise to the data. The result is that $\hat{J}(h)$ is very
smooth with a well defined minimum.
Remark 21.17 Do not assume that, if the estimator $\hat{f}$ is wiggly,
then cross-validation has let you down. The eye is not a good
judge of risk.
To construct confidence bands, we use something similar to
histograms although the details are more complicated. The version described here is from Chaudhuri and Marron (1999).
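Confidence Band for Kernel Density Estimator
1. Choose an evenly spaced grid of points $\mathcal{V} = \{v_1, \ldots, v_N\}$.
For every $v \in \mathcal{V}$, define
$$Y_i(v) = \frac{1}{h} K\left(\frac{v - X_i}{h}\right). \tag{21.27}$$
Note that $\hat{f}_n(v) = \overline{Y}_n(v)$, the average of the $Y_i(v)$'s.
2. Define
$$\text{se}(v) = \frac{s(v)}{\sqrt{n}} \tag{21.28}$$
where $s^2(v) = (n - 1)^{-1} \sum_{i=1}^{n} (Y_i(v) - \overline{Y}_n(v))^2$.
3. Compute the effective sample size
$$\text{ESS}(v) = \frac{\sum_{i=1}^{n} K\left(\frac{v - X_i}{h}\right)}{K(0)}. \tag{21.29}$$
Roughly, this is the number of data points being averaged
together to form $\hat{f}_n(v)$.
4. Let $\mathcal{V}^* = \{v \in \mathcal{V} : \text{ESS}(v) \ge 5\}$. Now define the number
of independent blocks m by
$$\frac{1}{m} = \frac{\overline{\text{ESS}}}{n} \tag{21.30}$$
where $\overline{\text{ESS}}$ is the average of ESS over $\mathcal{V}^*$.
5. Let
$$q = \Phi^{-1}\left(\frac{1 + (1 - \alpha)^{1/m}}{2}\right) \tag{21.31}$$
and define
$$\ell(v) = \hat{f}_n(v) - q\, \text{se}(v) \quad \text{and} \quad u(v) = \hat{f}_n(v) + q\, \text{se}(v) \tag{21.32}$$
where we replace $\ell(v)$ with 0 if it becomes negative.
Then,
$$P\bigl\{\ell(v) \le f(v) \le u(v) \text{ for all } v\bigr\} \approx 1 - \alpha.$$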
Example 21.18 Figure 21.7 shows approximate 95 per cent confidence bands for the astronomy data.
Suppose now that the data $X_i = (X_{i1}, \ldots, X_{id})$ are d-dimensional.
The kernel estimator can easily be generalized to d dimensions.
Let $h = (h_1, \ldots, h_d)$ be a vector of bandwidths and define
$$\hat{f}_n(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i) \tag{21.33}$$
where
$$K_h(x - X_i) = \frac{1}{h_1 \cdots h_d} \left\{\prod_{j=1}^{d} K\left(\frac{x_j - X_{ij}}{h_j}\right)\right\} \tag{21.34}$$
where $h_1, \ldots, h_d$ are bandwidths. For simplicity, we might take
$h_j = s_j h$ where $s_j$ is the standard deviation of the jth variable.
There is now only a single bandwidth h to choose. Using calculations like those in the one-dimensional case, the risk is given
by
$$R(f, \hat{f}_n) \approx \frac{1}{4} \sigma_K^4 \left[\sum_{j=1}^{d} h_j^4 \int f_{jj}^2(x)\, dx + \sum_{j \neq k} h_j^2 h_k^2 \int f_{jj} f_{kk}\, dx\right] + \frac{\left(\int K^2(x)\, dx\right)^d}{n h_1 \cdots h_d}$$
where $f_{jj}$ is the second partial derivative of f. The optimal
bandwidth satisfies $h_i \approx c_1 n^{-1/(4+d)}$ leading to a risk of order
$n^{-4/(4+d)}$. From this fact, we see that the risk increases quickly
with dimension, a problem usually called the curse of dimensionality. To get a sense of how serious this problem is, consider the following table from Silverman (1986) which shows the
sample size required to ensure a relative mean squared error less
than 0.1 at 0 when the density is multivariate normal and the
optimal bandwidth is selected.

Dimension    Sample Size
1            4
2            19
3            67
4            223
5            768
6            2790
7            10,700
8            43,700
9            187,000
10           842,000

This is bad news indeed. It says that having 842,000 observations in a ten dimensional problem is really like having 4 observations in a one-dimensional problem.

[Figure 21.7: kernel density estimate for the astronomy data with confidence bands.]
FIGURE 21.7. 95 per cent confidence bands for kernel density estimate for the astronomy data.

21.4 Nonparametric Regression
Consider pairs of points $(x_1, Y_1), \ldots, (x_n, Y_n)$ related by
$$Y_i = r(x_i) + \epsilon_i \tag{21.35}$$
where $E(\epsilon_i) = 0$ and we are treating the $x_i$'s as fixed. In nonparametric regression, we want to estimate the regression function
$r(x) = E(Y \mid X = x)$.
There are many nonparametric regression estimators. Most
involve estimating $r(x)$ by taking some sort of weighted average
of the $Y_i$'s, giving higher weight to those points near x. In particular, the Nadaraya-Watson kernel estimator is defined
by
$$\hat{r}(x) = \sum_{i=1}^{n} w_i(x)\, Y_i \tag{21.36}$$
where the weights $w_i(x)$ are given by
$$w_i(x) = \frac{K\left(\frac{x - x_i}{h}\right)}{\sum_{j=1}^{n} K\left(\frac{x - x_j}{h}\right)}. \tag{21.37}$$
The form of this estimator comes from first estimating the
joint density $f(x, y)$ using kernel density estimation and then
inserting this into:
$$r(x) = E(Y \mid X = x) = \int y f(y \mid x)\, dy = \frac{\int y f(x, y)\, dy}{\int f(x, y)\, dy}.$$
Theorem 21.19 Suppose that $V(\epsilon_i) = \sigma^2$. The risk of the Nadaraya-Watson kernel estimator is
$$R(\hat{r}_n, r) \approx \frac{h^4}{4} \left(\int x^2 K(x)\, dx\right)^2 \int \left(r''(x) + 2 r'(x) \frac{f'(x)}{f(x)}\right)^2 dx + \int \frac{\sigma^2 \int K^2(x)\, dx}{n h f(x)}\, dx. \tag{21.38}$$
The optimal bandwidth decreases at rate $n^{-1/5}$ and with this
choice the risk decreases at rate $n^{-4/5}$.
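In practice, to choose the bandwidth h we minimize the cross
validation score
$$\hat{J}(h) = \sum_{i=1}^{n} \bigl(Y_i - \hat{r}_{-i}(x_i)\bigr)^2 \tag{21.39}$$
where $\hat{r}_{-i}$ is the estimator we get by omitting the ith observation.
An approximation to $\hat{J}$ is given by
$$\hat{J}(h) = \sum_{i=1}^{n} \bigl(Y_i - \hat{r}(x_i)\bigr)^2 \frac{1}{\left(1 - \dfrac{K(0)}{\sum_{j=1}^{n} K\left(\frac{x_i - x_j}{h}\right)}\right)^2}. \tag{21.40}$$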
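The Nadaraya-Watson estimator and both forms of its cross-validation score (leave-one-out, and the version that rescales each ordinary residual by $1/(1 - K(0)/\sum_j K((x_i - x_j)/h))$) can be sketched as follows. The function names and the small synthetic data set are assumptions for illustration. The leave-one-out fit here renormalizes the remaining weights, a choice under which the two scores coincide up to rounding.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_estimate(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate at x0; optionally leave out observation `skip`."""
    num = den = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == skip:
            continue
        w = gaussian_kernel((x0 - xj) / h)
        num += w * yj
        den += w
    return num / den


def cv_leave_one_out(xs, ys, h):
    """Sum of squared leave-one-out prediction errors."""
    return sum((ys[i] - nw_estimate(xs[i], xs, ys, h, skip=i)) ** 2
               for i in range(len(xs)))


def cv_shortcut(xs, ys, h):
    """Same score computed from the full-data fit and the kernel weights."""
    k0 = gaussian_kernel(0.0)
    total = 0.0
    for xi, yi in zip(xs, ys):
        s = sum(gaussian_kernel((xi - xj) / h) for xj in xs)
        total += ((yi - nw_estimate(xi, xs, ys, h)) / (1.0 - k0 / s)) ** 2
    return total


# Synthetic example: a sine curve plus a small deterministic wiggle.
xs = [i / 19 for i in range(20)]
ys = [math.sin(2 * math.pi * x) + 0.1 * math.cos(37 * x) for x in xs]
grid = [0.02 * k for k in range(1, 16)]
h_cv = min(grid, key=lambda h: cv_shortcut(xs, ys, h))
```

The shortcut needs only one fit per bandwidth instead of n, which matters when the score is evaluated over a grid of bandwidths.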
Example 21.20 Figure 21.8 shows cosmic microwave background
(CMB) data from BOOMERaNG (Netterfield et al. 2001), Maxima (Lee et al. 2001) and DASI (Halverson 2001). The data
consist of n pairs $(x_1, Y_1), \ldots, (x_n, Y_n)$ where $x_i$ is called the
multipole moment and $Y_i$ is the estimated power spectrum of
the temperature fluctuations. What you're seeing are sound waves
in the cosmic microwave background radiation, which is the heat
left over from the big bang. If $r(x)$ denotes the true power spectrum then
$$Y_i = r(x_i) + \epsilon_i$$
where $\epsilon_i$ is a random error with mean 0. The location and size
of peaks in $r(x)$ provide valuable clues about the behavior of
the early universe. Figure 21.8 shows the fit based on cross-validation as well as an undersmoothed and oversmoothed fit.
The cross-validation fit shows the presence of three well defined
peaks, as predicted by the physics of the big bang.
The procedure for finding confidence bands is similar to that
for density estimation. However, we first need to estimate $\sigma^2$.
Suppose that the $x_i$'s are ordered. Assuming $r(x)$ is smooth, we
have $r(x_{i+1}) - r(x_i) \approx 0$ and hence
$$Y_{i+1} - Y_i = \bigl[r(x_{i+1}) + \epsilon_{i+1}\bigr] - \bigl[r(x_i) + \epsilon_i\bigr] \approx \epsilon_{i+1} - \epsilon_i$$
and hence
$$V(Y_{i+1} - Y_i) \approx V(\epsilon_{i+1} - \epsilon_i) = V(\epsilon_{i+1}) + V(\epsilon_i) = 2\sigma^2.$$
We can thus use the average of the $n - 1$ squared differences $(Y_{i+1} - Y_i)^2$
to estimate $\sigma^2$. Hence, define
$$\hat{\sigma}^2 = \frac{1}{2(n - 1)} \sum_{i=1}^{n-1} (Y_{i+1} - Y_i)^2. \tag{21.41}$$
[Figure 21.8: four panels. Three regression fits labelled "Undersmoothed", "Oversmoothed" and "Just Right (Using cross-validation)", and a plot of estimated risk against bandwidth.]
FIGURE 21.8. Regression analysis of the CMB data. The first fit is
undersmoothed, the second is oversmoothed and the third is based
on cross-validation. The last panel shows the estimated risk versus
the bandwidth of the smoother. The data are from BOOMERaNG,
Maxima and DASI.
Confidence Bands for Kernel Regression
Follow the same procedure as for kernel density estimators except, change the definition of $\text{se}(v)$ to
$$\text{se}(v) = \hat{\sigma} \sqrt{\sum_{i=1}^{n} w_i^2(v)} \tag{21.42}$$
where $\hat{\sigma}$ is defined in (21.41).
Example 21.21 Figure 21.9 shows a 95 per cent confidence envelope for the CMB data. We see that we are highly confident of
the existence and position of the first peak. We are more uncertain about the second and third peak. At the time of this writing,
more accurate data are becoming available that apparently provide sharper estimates of the second and third peak.
The extension to multiple regressors $X = (X_1, \ldots, X_p)$ is
straightforward. As with kernel density estimation we just replace the kernel with a multivariate kernel. However, the same
caveats about the curse of dimensionality apply. In some cases,
we might consider putting some restrictions on the regression
function which will then reduce the curse of dimensionality. For
example, additive regression is based on the model
$$Y = \sum_{j=1}^{p} r_j(X_j) + \epsilon. \tag{21.43}$$
Now we only need to fit p one-dimensional functions. The model
can be enriched by adding various interactions, for example,
$$Y = \sum_{j=1}^{p} r_j(X_j) + \sum_{j < k} r_{jk}(X_j X_k) + \epsilon. \tag{21.44}$$
Additive models are usually fit by an algorithm called backfitting.
[Figure 21.9: regression fit for the CMB data with a confidence envelope.]
FIGURE 21.9. 95 per cent confidence envelope for the CMB data.
Backfitting
1. Initialize $r_1(x_1), \ldots, r_p(x_p)$.
2. For $j = 1, \ldots, p$:
(a) Let $\epsilon_i = Y_i - \sum_{s \neq j} r_s(x_i)$.
(b) Let $r_j$ be the function estimate obtained by regressing
the $\epsilon_i$'s on the jth covariate.
3. If converged STOP. Else, go back to step 2.
Additive models have the advantage that they avoid the curse
of dimensionality and they can be fit quickly but they have one
disadvantage: the model is not fully nonparametric. In other
words, the true regression function $r(x)$ may not be of the form
(21.43).
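The backfitting steps can be sketched in a few lines. This is an illustration under several assumptions that are not in the text: each one-dimensional regression uses a Gaussian-kernel Nadaraya-Watson smoother with a common bandwidth, an intercept equal to the mean of Y is added, each fitted component is centered so the decomposition is identifiable, and convergence means the largest change in any fitted value falls below `tol`. The data are simulated.

```python
import math
import random


def smooth(xs, targets, h):
    """Nadaraya-Watson fit of targets on xs, evaluated at the xs themselves."""
    fitted = []
    for x0 in xs:
        ws = [math.exp(-0.5 * ((x0 - xj) / h) ** 2) for xj in xs]
        fitted.append(sum(w * t for w, t in zip(ws, targets)) / sum(ws))
    return fitted


def backfit(columns, ys, h=0.1, sweeps=20, tol=1e-6):
    """Fit Y = intercept + sum_j r_j(x_j); returns (intercept, fitted r_j values)."""
    n, p = len(ys), len(columns)
    intercept = sum(ys) / n
    r = [[0.0] * n for _ in range(p)]            # step 1: initialize r_j = 0
    for _ in range(sweeps):
        change = 0.0
        for j in range(p):                       # step 2: cycle over covariates
            partial = [ys[i] - intercept - sum(r[s][i] for s in range(p) if s != j)
                       for i in range(n)]        # (a) partial residuals
            new = smooth(columns[j], partial, h)  # (b) regress on covariate j
            mean_new = sum(new) / n
            new = [v - mean_new for v in new]    # centering (assumed convention)
            change = max(change, max(abs(a - b) for a, b in zip(new, r[j])))
            r[j] = new
        if change < tol:                         # step 3: stop when converged
            break
    return intercept, r


rng = random.Random(0)
n = 100
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
ys = [math.sin(2 * math.pi * a) + 2 * (b - 0.5) + rng.gauss(0, 0.1)
      for a, b in zip(x1, x2)]
intercept, r = backfit([x1, x2], ys)
fitted = [intercept + r[0][i] + r[1][i] for i in range(n)]
mse = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / n
```

Any one-dimensional smoother could replace `smooth`; the loop structure stays the same.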
21.5 Appendix: Confidence Sets and Bias
The confidence bands we computed are not for the density function or regression function but rather for the smoothed function.
For example, the confidence band for a kernel density estimate
with bandwidth h is a band for the function one gets by smoothing the true function with a kernel with the same bandwidth.
Getting a confidence set for the true function is complicated for
reasons we now explain.
First, let's review the parametric case. When estimating a
scalar quantity $\theta$ with an estimator $\hat{\theta}$, the usual confidence interval is of the form $\hat{\theta} \pm z_{\alpha/2}\, s_n$ where $\hat{\theta}$ is the maximum likelihood
estimator and $s_n = \sqrt{\hat{V}(\hat{\theta})}$ is the estimated standard error of the
estimator. Under standard regularity conditions, $\hat{\theta} \approx N(\theta, s_n^2)$
and
$$\lim_{n \to \infty} P\left(\hat{\theta} - z_{\alpha/2}\, s_n \le \theta \le \hat{\theta} + z_{\alpha/2}\, s_n\right) = 1 - \alpha.$$
But let us take a closer look.
Let $b_n = E(\hat{\theta}_n) - \theta$. We know that $\hat{\theta} \approx N(\theta, s_n^2)$ but a more
accurate statement is $\hat{\theta} \approx N(\theta + b_n, s_n^2)$. We usually ignore the
bias term $b_n$ in large sample calculations but let's keep track of
it. The coverage of the usual confidence interval is
$$
\begin{aligned}
P\left(\hat{\theta} - z_{\alpha/2}\, s_n \le \theta \le \hat{\theta} + z_{\alpha/2}\, s_n\right)
&= P\left(-z_{\alpha/2} \le \frac{\hat{\theta} - \theta}{s_n} \le z_{\alpha/2}\right) \\
&\approx P\left(-z_{\alpha/2} \le \frac{N(b_n, s_n^2)}{s_n} \le z_{\alpha/2}\right) \\
&= P\left(-z_{\alpha/2} \le N\left(\frac{b_n}{s_n}, 1\right) \le z_{\alpha/2}\right).
\end{aligned}
$$
In a well behaved parametric model, $s_n$ is of size $n^{-1/2}$ and $b_n$ is
of size $n^{-1}$. Hence, $b_n / s_n \to 0$ so the last displayed probability
statement becomes $P(-z_{\alpha/2} \le N(0, 1) \le z_{\alpha/2}) = 1 - \alpha$. What
makes parametric confidence intervals have the right coverage
is the fact that $b_n / s_n \to 0$.
The situation is more complicated for kernel methods. Consider estimating a density f (x) at a single point x with a kernel
density estimator. Since fb(x) is a sum if iid random variables,
the central limit theorem implies that
c f (x)
b
f (x) N f (x) + b (x);
(21.45)
1 2
1
2
2
2
n
where
nh
1
(21.46)
2
R
R
is the bias, c = x K (x)dx and c = K (x)dx. The estimated standard error is
bn (x) = h2 f 00 (x)c1
1
2
2
2
(
c fb(x)
sn (x) = 2
nh
)1=2
:
(21.47)
389
Suppose we use the usual interval fb(x) z= sn. Arguing as
above, the coverage is approximately,
bn (x)
P
z= N
; 1 z= :
sn (x)
2
2
2
The optimal bandwidth is of the form h = cn = for some
constant c. If we plug h = cn = into (21.46) and(21.47) we
see that b (x)=sn(x) does not tend to 0. Thus, the condence
interval will have coverage less than 1 .
1 5
1 5
n
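To see how quickly a non-vanishing ratio $b_n/s_n$ erodes coverage, the approximate coverage $P(-z_{\alpha/2} \le N(b_n/s_n, 1) \le z_{\alpha/2})$ can be evaluated directly. The short Python sketch below does this for a few ratios; the function name `coverage` and the ratios tried are illustrative choices, not part of the text.

```python
from statistics import NormalDist

def coverage(bias_over_se, alpha=0.05):
    """P(-z <= N(b/s, 1) <= z) with z = z_{alpha/2}."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(z - bias_over_se) - nd.cdf(-z - bias_over_se)

# Coverage of a nominal 95 per cent interval for several values of b_n / s_n.
for ratio in (0.0, 0.5, 1.0, 2.0):
    print(ratio, round(coverage(ratio), 3))
```

With a ratio of 0 the nominal level is recovered; any fixed nonzero ratio gives strictly less.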
Two very good books on curve estimation are Scott (1992) and
Silverman (1986). I drew heavily on those books for this Chapter.
21.7 Exercises
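1. Let $X_1, \ldots, X_n \sim f$ and let $\hat f_n$ be the kernel density estimator using the boxcar kernel:
$$K(x) = \begin{cases} 1 & -\frac{1}{2} < x < \frac{1}{2} \\ 0 & \text{otherwise.} \end{cases}$$
(a) Show that
$$E(\hat f(x)) = \frac{1}{h} \int_{x-(h/2)}^{x+(h/2)} f(y)\,dy$$
and
$$V(\hat f(x)) = \frac{1}{nh^2} \left[ \int_{x-(h/2)}^{x+(h/2)} f(y)\,dy - \left( \int_{x-(h/2)}^{x+(h/2)} f(y)\,dy \right)^2 \right].$$
(b) Show that if $h \to 0$ and $nh \to \infty$ as $n \to \infty$ then $\hat f_n(x) \xrightarrow{P} f(x)$.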
2. Get the data on fragments of glass collected in forensic
work. You need to download the MASS library for R then:
library(MASS)
data(fgl)
x <- fgl[[1]]
help(fgl)
The data are also on my website. Estimate the density of the first variable (refractive index) using a histogram and using a kernel density estimator. Use cross-validation to choose the amount of smoothing. Experiment with different binwidths and bandwidths. Comment on the similarities and differences. Construct 95 per cent confidence bands for your estimators.
3. Consider the data from question 2. Let $Y$ be refractive index and let $x$ be aluminium content (the fourth variable). Do a nonparametric regression to fit the model $Y = f(x) + \epsilon$. Use cross-validation to estimate the bandwidth. Construct 95 per cent confidence bands for your estimate.
8. Consider regression data $(x_1, Y_1), \ldots, (x_n, Y_n)$. Suppose that $0 \le x_i \le 1$ for all $i$. Define bins $B_j$ as in equation (21.7). For $x \in B_j$ define
$$\hat r_n(x) = \bar Y_j$$
where $\bar Y_j$ is the mean of all the $Y_i$'s corresponding to those $x_i$'s in $B_j$. Find the approximate risk of this estimator. From this expression for the risk, find the optimal bandwidth. At what rate does the risk go to zero?
9. Show that with suitable smoothness assumptions on $r(x)$, $\hat\sigma^2$ in equation (21.41) is a consistent estimator of $\sigma^2$.
22 Smoothing Using Orthogonal Functions
In this Chapter we study a different approach to nonparametric curve estimation based on orthogonal functions. We begin with a brief introduction to the theory of orthogonal functions. Then we turn to density estimation and regression.
22.1 Orthogonal Functions and L2 Spaces
Let $v = (v_1, v_2, v_3)$ denote a three dimensional vector, that is, a list of three real numbers. Let $V$ denote the set of all such vectors. If $a$ is a scalar (a number) and $v$ is a vector, we define $av = (av_1, av_2, av_3)$. The sum of vectors $v$ and $w$ is defined by $v + w = (v_1 + w_1, v_2 + w_2, v_3 + w_3)$. The inner product between two vectors $v$ and $w$ is defined by $\langle v, w \rangle = \sum_{i=1}^{3} v_i w_i$. The norm (or length) of a vector $v$ is defined by
$$\|v\| = \sqrt{\langle v, v \rangle} = \sqrt{\sum_{i=1}^{3} v_i^2}.$$ (22.1)
Two vectors are orthogonal (or perpendicular) if $\langle v, w \rangle = 0$. A set of vectors is orthogonal if each pair in the set is orthogonal. A vector is normal if $\|v\| = 1$.
Let $\phi_1 = (1, 0, 0)$, $\phi_2 = (0, 1, 0)$, $\phi_3 = (0, 0, 1)$. These vectors are said to be an orthonormal basis for $V$ since they have the following properties:
(i) they are orthogonal;
(ii) they are normal;
(iii) they form a basis for $V$, which means that if $v \in V$ then $v$ can be written as a linear combination of $\phi_1$, $\phi_2$, $\phi_3$:
$$v = \sum_{j=1}^{3} \beta_j \phi_j \quad \text{where} \quad \beta_j = \langle \phi_j, v \rangle.$$ (22.2)
For example, if $v = (12, 3, 4)$ then $v = 12\phi_1 + 3\phi_2 + 4\phi_3$. There are other orthonormal bases for $V$, for example,
$$\psi_1 = \left( \frac{1}{\sqrt 3}, \frac{1}{\sqrt 3}, \frac{1}{\sqrt 3} \right), \quad \psi_2 = \left( \frac{1}{\sqrt 2}, -\frac{1}{\sqrt 2}, 0 \right), \quad \psi_3 = \left( \frac{1}{\sqrt 6}, \frac{1}{\sqrt 6}, -\frac{2}{\sqrt 6} \right).$$
You can check that these three vectors also form an orthonormal basis for $V$. Again, if $v$ is any vector then we can write
$$v = \sum_{j=1}^{3} \beta_j \psi_j \quad \text{where} \quad \beta_j = \langle \psi_j, v \rangle.$$
For example, if $v = (12, 3, 4)$ then
$$v = 10.97\,\psi_1 + 6.36\,\psi_2 + 2.86\,\psi_3.$$
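The three-dimensional example is easy to check numerically. The Python sketch below assumes the signs $\psi_2 = (1/\sqrt2, -1/\sqrt2, 0)$ and $\psi_3 = (1/\sqrt6, 1/\sqrt6, -2/\sqrt6)$, which is the choice that makes the vectors orthogonal and reproduces the coefficients 10.97, 6.36 and 2.86 quoted above; the helper name `inner` is illustrative.

```python
import math

def inner(v, w):
    """Inner product <v, w> of two vectors of equal length."""
    return sum(a * b for a, b in zip(v, w))

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
psi = [
    (1 / s3, 1 / s3, 1 / s3),
    (1 / s2, -1 / s2, 0.0),
    (1 / s6, 1 / s6, -2 / s6),
]

# Gram matrix: should be the 3 x 3 identity for an orthonormal basis.
gram = [[inner(p, q) for q in psi] for p in psi]

v = (12, 3, 4)
beta = [inner(p, v) for p in psi]                      # beta_j = <psi_j, v>
recon = [sum(b * p[i] for b, p in zip(beta, psi)) for i in range(3)]

print([round(b, 2) for b in beta])   # [10.97, 6.36, 2.86]
print([round(c, 6) for c in recon])  # recovers v
```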
Now we make the leap from vectors to functions. Basically, we just replace vectors with functions and sums with integrals. Let $L_2(a, b)$ denote all functions defined on the interval $[a, b]$ such that $\int_a^b f(x)^2\,dx < \infty$:
$$L_2[a, b] = \left\{ f : [a, b] \to \mathbb{R},\ \int_a^b f(x)^2\,dx < \infty \right\}.$$ (22.3)
We sometimes write $L_2$ instead of $L_2(a, b)$. The inner product between two functions $f, g \in L_2$ is defined by $\int f(x) g(x)\,dx$. The norm of $f$ is
$$\|f\| = \sqrt{\int f(x)^2\,dx}.$$ (22.4)
Two functions are orthogonal if $\int f(x) g(x)\,dx = 0$. A function is normal if $\|f\| = 1$.
A sequence of functions $\phi_1, \phi_2, \phi_3, \ldots$ is orthonormal if $\int \phi_j^2(x)\,dx = 1$ for each $j$ and $\int \phi_i(x)\phi_j(x)\,dx = 0$ for $i \neq j$. An orthonormal sequence is complete if the only function that is orthogonal to each $\phi_j$ is the zero function. In this case, the functions $\phi_1, \phi_2, \phi_3, \ldots$ form a basis, meaning that if $f \in L_2$ then $f$ can be written as
$$f(x) = \sum_{j=1}^{\infty} \beta_j \phi_j(x), \quad \text{where} \quad \beta_j = \int_a^b f(x)\phi_j(x)\,dx.$$ (22.5)
(The equality in the displayed equation means that $\int (f(x) - f_n(x))^2\,dx \to 0$ where $f_n(x) = \sum_{j=1}^{n} \beta_j \phi_j(x)$.)
Parseval's relation says that
$$\|f\|^2 \equiv \int f^2(x)\,dx = \sum_{j=1}^{\infty} \beta_j^2 \equiv \|\beta\|^2$$ (22.6)
where $\beta = (\beta_1, \beta_2, \ldots)$.
Example 22.1 An example of an orthonormal basis for $L_2[0, 1]$ is the cosine basis defined as follows. Let $\phi_1(x) = 1$ and for $j \ge 2$ define
$$\phi_j(x) = \sqrt{2}\cos((j-1)\pi x).$$ (22.7)
The first six functions are plotted in Figure 22.1.
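As a quick numerical check of the claim that the cosine basis is orthonormal on $[0,1]$, the Python sketch below approximates the inner products of the first six basis functions with a midpoint rule. The helper names `phi` and `integrate` and the grid size are illustrative choices.

```python
import math

def phi(j, x):
    """Cosine basis on [0, 1]: phi_1 = 1, phi_j = sqrt(2) cos((j-1) pi x)."""
    return 1.0 if j == 1 else math.sqrt(2) * math.cos((j - 1) * math.pi * x)

def integrate(g, N=2000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    return sum(g((i + 0.5) / N) for i in range(N)) / N

# Matrix of inner products <phi_a, phi_b> for a, b = 1..6.
gram = [[integrate(lambda x, a=a, b=b: phi(a, x) * phi(b, x))
         for b in range(1, 7)] for a in range(1, 7)]

for row in gram:
    print([round(abs(g), 6) for g in row])   # approximately the identity
```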
FIGURE 22.1. The first six functions in the cosine basis.
Example 22.2 Let
$$f(x) = \sqrt{x(1-x)}\,\sin\left( \frac{2.1\pi}{x + .05} \right)$$
which is called the "doppler function." Figure 22.2 shows $f$ (top left) and its approximation
$$f_J(x) = \sum_{j=1}^{J} \beta_j \phi_j(x)$$
with $J$ equal to 5 (top right), 20 (bottom left) and 200 (bottom right). As $J$ increases we see that $f_J(x)$ gets closer to $f(x)$. The coefficients $\beta_j = \int_0^1 f(x)\phi_j(x)\,dx$ were computed numerically.
Example 22.3 The Legendre polynomials on $[-1, 1]$ are defined by
$$P_j(x) = \frac{1}{2^j j!} \frac{d^j}{dx^j} (x^2 - 1)^j, \quad j = 0, 1, 2, \ldots$$ (22.8)
It can be shown that these functions are complete and orthogonal and that
$$\int_{-1}^{1} P_j^2(x)\,dx = \frac{2}{2j + 1}.$$ (22.9)
It follows that the functions $\phi_j(x) = \sqrt{(2j+1)/2}\; P_j(x)$, $j = 0, 1, \ldots$ form an orthonormal basis for $L_2(-1, 1)$. The first few Legendre polynomials are
$$P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = \frac{1}{2}\left(3x^2 - 1\right), \quad P_3(x) = \frac{1}{2}\left(5x^3 - 3x\right).$$
These polynomials may be constructed explicitly using the following recursive relation:
$$P_{j+1}(x) = \frac{(2j + 1)\,x\,P_j(x) - j\,P_{j-1}(x)}{j + 1}.$$ (22.10)
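The recursion for the Legendre polynomials translates directly into code. The Python sketch below evaluates $P_j(x)$ from $P_0 = 1$ and $P_1 = x$, then checks the closed forms for $P_2$ and $P_3$ and the norm $\int_{-1}^{1} P_j^2 = 2/(2j+1)$ with a midpoint rule. Function names and the grid size are illustrative.

```python
def legendre(j, x):
    """P_j(x) from the three-term recursion, starting at P_0 = 1, P_1 = x."""
    if j == 0:
        return 1.0
    p_prev, p = 1.0, x
    for m in range(1, j):
        # P_{m+1} = ((2m + 1) x P_m - m P_{m-1}) / (m + 1)
        p_prev, p = p, ((2 * m + 1) * x * p - m * p_prev) / (m + 1)
    return p

def integrate(g, N=4000):
    """Midpoint-rule approximation of the integral of g over [-1, 1]."""
    h = 2.0 / N
    return sum(g(-1 + (i + 0.5) * h) for i in range(N)) * h

# Squared norms and one cross product.
norms = [integrate(lambda x, j=j: legendre(j, x) ** 2) for j in range(5)]
cross = integrate(lambda x: legendre(1, x) * legendre(3, x))
print([round(v, 4) for v in norms], round(abs(cross), 6))
```

Dividing `legendre(j, x)` by the square root of the norm gives the orthonormal functions $\phi_j$.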
FIGURE 22.2. Approximating the doppler function $f(x) = \sqrt{x(1-x)}\,\sin\left(\frac{2.1\pi}{x+.05}\right)$ with its expansion in the cosine basis. The function $f$ (top left) and its approximation $f_J(x) = \sum_{j=1}^{J} \beta_j \phi_j(x)$ with $J$ equal to 5 (top right), 20 (bottom left) and 200 (bottom right). The coefficients $\beta_j = \int_0^1 f(x)\phi_j(x)\,dx$ were computed numerically.
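The doppler approximation can be reproduced roughly as follows. This Python sketch computes the cosine coefficients with a midpoint rule on a grid of 1024 points (the grid size and helper names are assumptions, not the author's settings) and reports the integrated squared error of $f_J$ for $J = 5, 20, 200$.

```python
import math

def doppler(x):
    return math.sqrt(x * (1 - x)) * math.sin(2.1 * math.pi / (x + 0.05))

def phi(j, x):
    """Cosine basis on [0, 1]."""
    return 1.0 if j == 1 else math.sqrt(2) * math.cos((j - 1) * math.pi * x)

N = 1024
grid = [(i + 0.5) / N for i in range(N)]
f_vals = [doppler(x) for x in grid]

# beta_j ~ integral of f * phi_j, approximated on the grid.
J_max = 200
beta = [sum(f * phi(j, x) for f, x in zip(f_vals, grid)) / N
        for j in range(1, J_max + 1)]

def sq_error(J):
    """Grid approximation of the integral of (f - f_J)^2."""
    total = 0.0
    for x, f in zip(grid, f_vals):
        f_J = sum(beta[j - 1] * phi(j, x) for j in range(1, J + 1))
        total += (f - f_J) ** 2
    return total / N

errors = {J: sq_error(J) for J in (5, 20, 200)}
print(errors)
```

The error shrinks as $J$ grows, which is the numerical counterpart of the four panels in Figure 22.2.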
The coefficients $\beta_1, \beta_2, \ldots$ are related to the smoothness of the function $f$. To see why, note that if $f$ is smooth, then its derivatives will be finite. Thus we expect that, for some $k$, $\int_0^1 (f^{(k)}(x))^2\,dx < \infty$ where $f^{(k)}$ is the $k$th derivative of $f$. Now consider the cosine basis (22.7) and let $f(x) = \sum_{j=1}^{\infty} \beta_j \phi_j(x)$. Then,
$$\int_0^1 (f^{(k)}(x))^2\,dx = 2 \sum_{j=1}^{\infty} \beta_j^2\,(\pi(j-1))^{2k}.$$
The only way that $\sum_{j=1}^{\infty} \beta_j^2 (\pi(j-1))^{2k}$ can be finite is if the $\beta_j$'s get small when $j$ gets large. To summarize:
If the function $f$ is smooth then the coefficients $\beta_j$ will be small when $j$ is large.
For the rest of this chapter, assume we are using the cosine basis unless otherwise specified.
22.2 Density Estimation
Let $X_1, \ldots, X_n$ be iid observations from a distribution on $[0, 1]$ with density $f$. Assuming $f \in L_2$ we can write
$$f(x) = \sum_{j=1}^{\infty} \beta_j \phi_j(x)$$
where $\phi_1, \phi_2, \ldots$ is an orthonormal basis. Define
$$\hat\beta_j = \frac{1}{n} \sum_{i=1}^{n} \phi_j(X_i).$$ (22.11)
Theorem 22.4 The mean and variance of $\hat\beta_j$ are
$$E(\hat\beta_j) = \beta_j$$ (22.12)
$$V(\hat\beta_j) = \frac{\sigma_j^2}{n}$$ (22.13)
where
$$\sigma_j^2 = V(\phi_j(X_i)) = \int (\phi_j(x) - \beta_j)^2 f(x)\,dx.$$ (22.14)
Proof. We have
$$E(\hat\beta_j) = \frac{1}{n} \sum_{i=1}^{n} E(\phi_j(X_i)) = E(\phi_j(X_1)) = \int \phi_j(x) f(x)\,dx = \beta_j.$$
The calculation for the variance is similar.
Hence, $\hat\beta_j$ is an unbiased estimate of $\beta_j$. It is tempting to estimate $f$ by $\sum_{j=1}^{\infty} \hat\beta_j \phi_j(x)$ but this turns out to have a very high variance. Instead, consider the estimator
$$\hat f(x) = \sum_{j=1}^{J} \hat\beta_j \phi_j(x).$$ (22.15)
The number of terms $J$ is a smoothing parameter. Increasing $J$ will decrease bias while increasing variance. For technical reasons, we restrict $J$ to lie in the range
$$1 \le J \le p$$
where $p = p(n) = \sqrt{n}$. To emphasize the dependence of the risk function on $J$, we write the risk function as $R(J)$.
Theorem 22.5 The risk of $\hat f$ is given by
$$R(J) = \sum_{j=1}^{J} \frac{\sigma_j^2}{n} + \sum_{j=J+1}^{\infty} \beta_j^2.$$ (22.16)
In kernel estimation, we used cross-validation to estimate the risk. In the orthogonal function approach, we instead use the risk estimator
$$\hat R(J) = \sum_{j=1}^{J} \frac{\hat\sigma_j^2}{n} + \sum_{j=J+1}^{p} \left( \hat\beta_j^2 - \frac{\hat\sigma_j^2}{n} \right)_+$$ (22.17)
where $a_+ = \max\{a, 0\}$ and
$$\hat\sigma_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( \phi_j(X_i) - \hat\beta_j \right)^2.$$ (22.18)
To motivate this estimator, note that $\hat\sigma_j^2$ is an unbiased estimate of $\sigma_j^2$ and $\hat\beta_j^2 - \hat\sigma_j^2/n$ is an unbiased estimator of $\beta_j^2$. We take the positive part of the latter term since we know that $\beta_j^2$ cannot be negative. We now choose $1 \le \hat J \le p$ to minimize $\hat R(J)$. Here is a summary.
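Summary of Orthogonal Function Density Estimation
1. Let
$$\hat\beta_j = \frac{1}{n} \sum_{i=1}^{n} \phi_j(X_i).$$
2. Choose $\hat J$ to minimize $\hat R(J)$ over $1 \le J \le p = \sqrt{n}$ where
$$\hat R(J) = \sum_{j=1}^{J} \frac{\hat\sigma_j^2}{n} + \sum_{j=J+1}^{p} \left( \hat\beta_j^2 - \frac{\hat\sigma_j^2}{n} \right)_+$$
and
$$\hat\sigma_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( \phi_j(X_i) - \hat\beta_j \right)^2.$$
3. Let
$$\hat f(x) = \sum_{j=1}^{\hat J} \hat\beta_j \phi_j(x).$$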
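The three steps of the summary can be sketched in Python with the cosine basis. The function name `series_density`, the use of $p = \lfloor\sqrt n\rfloor$, and the simulated sample (drawn from the density $f(x) = 2x$ on $[0,1]$, for which $\beta_2 = -4\sqrt2/\pi^2$) are illustrative assumptions rather than anything taken from the text.

```python
import math
import random

def phi(j, x):
    """Cosine basis on [0, 1]."""
    return 1.0 if j == 1 else math.sqrt(2) * math.cos((j - 1) * math.pi * x)

def series_density(data):
    n = len(data)
    p = int(math.sqrt(n))
    beta, s2 = [], []                      # list index j-1 holds basis index j
    for j in range(1, p + 1):
        vals = [phi(j, x) for x in data]
        b = sum(vals) / n                  # step 1: beta_hat_j
        beta.append(b)
        s2.append(sum((v - b) ** 2 for v in vals) / (n - 1))  # sigma_hat_j^2

    def risk(J):                           # step 2: estimated risk R_hat(J)
        variance = sum(s2[:J]) / n
        bias2 = sum(max(beta[j] ** 2 - s2[j] / n, 0.0) for j in range(J, p))
        return variance + bias2

    J_hat = min(range(1, p + 1), key=risk)

    def f_hat(x):                          # step 3: truncated series
        return sum(beta[j] * phi(j + 1, x) for j in range(J_hat))

    return f_hat, J_hat, beta

random.seed(0)
data = [math.sqrt(random.random()) for _ in range(2000)]   # density 2x on [0, 1]
f_hat, J_hat, beta = series_density(data)
print(J_hat, round(beta[1], 3), round(f_hat(0.5), 3))
```

Because $\phi_1 = 1$ and the other cosine functions integrate to zero, this estimate always integrates to one, although it may dip below zero.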
The estimator $\hat f_n$ can be negative. If we are interested in exploring the shape of $f$, this is not a problem. However, if we need our estimate to be a probability density function, we can truncate the estimate and then normalize it. That is, we take
$$\hat f^*(x) = \frac{\max\{\hat f_n(x), 0\}}{\int_0^1 \max\{\hat f_n(u), 0\}\,du}.$$
Now let us construct a confidence band for $f$. Suppose we estimate $f$ using $J$ orthogonal functions. We are essentially estimating $\tilde f(x) = \sum_{j=1}^{J} \beta_j \phi_j(x)$, not the true density $f(x) = \sum_{j=1}^{\infty} \beta_j \phi_j(x)$. Thus, the confidence band should be regarded as a band for $\tilde f(x)$.
Theorem 22.6 An approximate $1 - \alpha$ confidence band for $\tilde f$ is $(\ell(x), u(x))$ where
$$\ell(x) = \hat f_n(x) - c, \quad u(x) = \hat f_n(x) + c$$ (22.19)
where
$$c = \frac{J K^2}{\sqrt{n}} \sqrt{1 + \frac{\sqrt{2}\, z_\alpha}{\sqrt{J}}}$$ (22.20)
and
$$K = \max_{1 \le j \le J} \max_x |\phi_j(x)|.$$
For the cosine basis, $K = \sqrt{2}$.
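For a sense of scale, the half-width $c$ of the band can be computed for given $n$ and $J$. The Python sketch below assumes the reading $c = (J K^2/\sqrt n)\sqrt{1 + \sqrt2\, z_\alpha/\sqrt J}$ of equation (22.20), with $z_\alpha$ the upper $\alpha$ quantile of a standard Normal and $K = \sqrt2$ for the cosine basis; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def density_band_halfwidth(n, J, alpha=0.05, K=math.sqrt(2)):
    """Half-width c of the band in (22.19), under the stated reading of (22.20)."""
    z = NormalDist().inv_cdf(1 - alpha)      # upper-alpha Normal quantile
    return (J * K ** 2 / math.sqrt(n)) * math.sqrt(1 + math.sqrt(2) * z / math.sqrt(J))

for J in (5, 10, 20):
    print(J, round(density_band_halfwidth(5000, J), 3))
```

The width grows roughly linearly in $J$ and shrinks like $1/\sqrt n$, so bands for high-resolution estimates are wide.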
Example 22.7 Let
5
1 X (x; ; :1)
f (x) = (x; 0; 1) +
j
6
6j
5
=1
where $\phi(x; \mu, \sigma)$ denotes a Normal density with mean $\mu$ and standard deviation $\sigma$, and $(\mu_1, \ldots, \mu_5) = (-1, -1/2, 0, 1/2, 1)$. Marron and Wand (1990) call this "the claw" although I prefer the "Bart Simpson." Figure 22.3 shows the true density as well as the estimated density based on $n = 5{,}000$ observations and a 95 per cent confidence band. The density has been rescaled to have most of its mass between 0 and 1 using the transformation $y = (x + 3)/6$.
FIGURE 22.3. The top plot is the true density for the Bart Simpson distribution (rescaled to have most of its mass between 0 and 1). The bottom plot is the orthogonal function density estimate and 95 per cent confidence band.
22.3 Regression
Consider the regression model
$$Y_i = r(x_i) + \epsilon_i, \quad i = 1, \ldots, n$$ (22.21)
where the $\epsilon_i$ are independent with mean 0 and variance $\sigma^2$. We will initially focus on the special case where $x_i = i/n$. We assume that $r \in L_2[0, 1]$ and hence we can write
$$r(x) = \sum_{j=1}^{\infty} \beta_j \phi_j(x) \quad \text{where} \quad \beta_j = \int_0^1 r(x)\phi_j(x)\,dx$$ (22.22)
and $\phi_1, \phi_2, \ldots$ is an orthonormal basis for $[0, 1]$. Define
$$\hat\beta_j = \frac{1}{n} \sum_{i=1}^{n} Y_i\, \phi_j(x_i), \quad j = 1, 2, \ldots$$ (22.23)
Since $\hat\beta_j$ is an average, the central limit theorem tells us that $\hat\beta_j$ will be approximately normally distributed.
Theorem 22.8
$$\hat\beta_j \approx N\left( \beta_j, \frac{\sigma^2}{n} \right).$$ (22.24)
Proof. The mean of $\hat\beta_j$ is
$$E(\hat\beta_j) = \frac{1}{n} \sum_{i=1}^{n} E(Y_i)\phi_j(x_i) = \frac{1}{n} \sum_{i=1}^{n} r(x_i)\phi_j(x_i) \approx \int r(x)\phi_j(x)\,dx = \beta_j$$
where the approximate equality follows from the definition of a Riemann integral: $\sum_i \Delta_n h(x_i) \to \int_0^1 h(x)\,dx$ where $\Delta_n = 1/n$. The variance is
$$V(\hat\beta_j) = \frac{1}{n^2} \sum_{i=1}^{n} V(Y_i)\phi_j^2(x_i) = \frac{\sigma^2}{n^2} \sum_{i=1}^{n} \phi_j^2(x_i) = \frac{\sigma^2}{n} \cdot \frac{1}{n} \sum_{i=1}^{n} \phi_j^2(x_i) \approx \frac{\sigma^2}{n} \int \phi_j^2(x)\,dx = \frac{\sigma^2}{n}$$
since $\int \phi_j^2(x)\,dx = 1$.
As we did for density estimation, we will estimate $r$ by
$$\hat r(x) = \sum_{j=1}^{J} \hat\beta_j \phi_j(x).$$
Let
$$R(J) = E \int (r(x) - \hat r(x))^2\,dx$$
be the risk of the estimator.
Theorem 22.9 The risk $R(J)$ of the estimator $\hat r_n(x) = \sum_{j=1}^{J} \hat\beta_j \phi_j(x)$ is
$$R(J) = \frac{J\sigma^2}{n} + \sum_{j=J+1}^{\infty} \beta_j^2.$$ (22.25)
To estimate $\sigma^2 = Var(\epsilon_i)$ we use
$$\hat\sigma^2 = \frac{n}{k} \sum_{j=n-k+1}^{n} \hat\beta_j^2$$ (22.26)
where $k = n/4$. To motivate this estimator, recall that if the function is smooth then $\beta_j \approx 0$ for large $j$. So, for $j \ge k$, $\hat\beta_j \approx N(0, \sigma^2/n)$. Thus, $\hat\beta_j \approx \sigma Z_j/\sqrt{n}$ for $j \ge k$, where $Z_j \sim N(0, 1)$. Therefore,
$$\hat\sigma^2 = \frac{n}{k} \sum_{j=n-k+1}^{n} \hat\beta_j^2 \approx \frac{n}{k} \sum_{j=n-k+1}^{n} \left( \frac{\sigma}{\sqrt{n}} Z_j \right)^2 = \frac{\sigma^2}{k} \sum_{j=n-k+1}^{n} Z_j^2 = \frac{\sigma^2}{k} \chi_k^2$$
since a sum of $k$ squared standard Normals has a $\chi_k^2$ distribution. Now $E(\chi_k^2) = k$ and hence $E(\hat\sigma^2) \approx \sigma^2$. Also, $V(\chi_k^2) = 2k$ and hence $V(\hat\sigma^2) \approx (\sigma^4/k^2)(2k) = (2\sigma^4/k) \to 0$ as $n \to \infty$. Thus we expect $\hat\sigma^2$ to be a consistent estimator of $\sigma^2$. There is nothing special about the choice $k = n/4$. Any $k$ that increases with $n$ at an appropriate rate will suffice.
We estimate the risk with
$$\hat R(J) = \frac{J\hat\sigma^2}{n} + \sum_{j=J+1}^{n} \left( \hat\beta_j^2 - \frac{\hat\sigma^2}{n} \right)_+.$$ (22.27)
We are now ready to give a complete description of the method, which Beran (2000) calls REACT (Risk Estimation and Adaptation by Coordinate Transformation).
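Orthogonal Series Regression Estimator
1. Let
$$\hat\beta_j = \frac{1}{n} \sum_{i=1}^{n} Y_i\, \phi_j(x_i), \quad j = 1, \ldots, n.$$
2. Let
$$\hat\sigma^2 = \frac{n}{k} \sum_{j=n-k+1}^{n} \hat\beta_j^2$$ (22.28)
where $k \approx n/4$.
3. For $1 \le J \le n$, compute the risk estimate
$$\hat R(J) = \frac{J\hat\sigma^2}{n} + \sum_{j=J+1}^{n} \left( \hat\beta_j^2 - \frac{\hat\sigma^2}{n} \right)_+.$$
4. Choose $\hat J \in \{1, \ldots, n\}$ to minimize $\hat R(J)$.
5. Let
$$\hat r(x) = \sum_{j=1}^{\hat J} \hat\beta_j \phi_j(x).$$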
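The five steps of the orthogonal series regression estimator can be sketched in Python as below. This is a minimal illustration with the cosine basis on the design $x_i = i/n$; the function name `cosine_regression`, the test function $r = \phi_2 + 0.5\,\phi_3$ and the noise level are assumptions made for the demonstration.

```python
import math
import random

def phi(j, x):
    """Cosine basis on [0, 1]."""
    return 1.0 if j == 1 else math.sqrt(2) * math.cos((j - 1) * math.pi * x)

def cosine_regression(y):
    n = len(y)
    x = [(i + 1) / n for i in range(n)]               # design x_i = i/n
    # Step 1: beta_hat_j for j = 1..n (list index j-1).
    beta = [sum(yi * phi(j, xi) for xi, yi in zip(x, y)) / n
            for j in range(1, n + 1)]
    # Step 2: variance estimate from the last k coefficients.
    k = n // 4
    sigma2 = (n / k) * sum(b * b for b in beta[n - k:])
    # Step 3: risk estimate for every J, using running tail sums.
    excess = [max(b * b - sigma2 / n, 0.0) for b in beta]
    tail = [0.0] * (n + 1)
    for j in range(n - 1, -1, -1):
        tail[j] = tail[j + 1] + excess[j]
    risk = [J * sigma2 / n + tail[J] for J in range(1, n + 1)]
    # Step 4: minimize the estimated risk.
    J_hat = 1 + min(range(n), key=lambda i: risk[i])
    # Step 5: truncated series estimate.
    def r_hat(t):
        return sum(beta[j - 1] * phi(j, t) for j in range(1, J_hat + 1))
    return r_hat, J_hat, sigma2, beta

random.seed(1)
n, sigma = 256, 0.5
y = [phi(2, (i + 1) / n) + 0.5 * phi(3, (i + 1) / n) + random.gauss(0, sigma)
     for i in range(n)]
r_hat, J_hat, sigma2, beta = cosine_regression(y)
print(J_hat, round(sigma2, 3), round(beta[1], 3), round(beta[2], 3))
```

Computing all $n$ coefficients this way costs order $n^2$ operations; a fast cosine transform would be the usual choice for large $n$.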
Example 22.10 Figure 22.4 shows the doppler function $f$ and $n = 2{,}048$ observations generated from the model
$$Y_i = r(x_i) + \epsilon_i$$
where $x_i = i/n$, $\epsilon_i \sim N(0, (.1)^2)$. The figure shows the true function, the data, the estimated function and the estimated risk. The estimate was based on $\hat J = 234$ terms.
FIGURE 22.4. The doppler test function. (The last panel plots the estimated risk against $J$.)
Finally, we turn to confidence bands. As before, these bands are not really for the true function $r(x)$ but rather for the smoothed version of the function $\tilde r(x) = \sum_{j=1}^{\hat J} \beta_j \phi_j(x)$.
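Theorem 22.11 Suppose the estimate $\hat r$ is based on $J$ terms and $\hat\sigma$ is defined as in equation (22.28). Assume that $J < n - k + 1$. An approximate $1 - \alpha$ confidence band for $\tilde r$ is $(\ell, u)$ where
$$\ell(x) = \hat r_n(x) - K\sqrt{J}\, \sqrt{ \frac{z_\alpha \hat\tau}{\sqrt{n}} + \frac{J\hat\sigma^2}{n} }$$ (22.29)
$$u(x) = \hat r_n(x) + K\sqrt{J}\, \sqrt{ \frac{z_\alpha \hat\tau}{\sqrt{n}} + \frac{J\hat\sigma^2}{n} }$$ (22.30)
where
$$K = \max_{1 \le j \le J} \max_x |\phi_j(x)|$$
and
$$\hat\tau^2 = \frac{2J\hat\sigma^4}{n} \left( 1 + \frac{J}{k} \right)$$ (22.31)
and $k = n/4$ is from equation (22.28). In the cosine basis, we may use $K = \sqrt{2}$.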
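The half-width of the regression band is a closed-form function of $n$, $J$ and $\hat\sigma^2$. The Python sketch below evaluates it under the reading $K\sqrt J\,\sqrt{z_\alpha\hat\tau/\sqrt n + J\hat\sigma^2/n}$ with $\hat\tau^2 = (2J\hat\sigma^4/n)(1 + J/k)$ and $k = n/4$; the function name and the plugged-in value $\hat\sigma^2 = 0.01$ are illustrative assumptions.

```python
import math
from statistics import NormalDist

def regression_band_halfwidth(n, J, sigma2, alpha=0.05, K=math.sqrt(2)):
    """Half-width of the band (22.29)-(22.30), under the stated reading."""
    k = n / 4
    if not J < n - k + 1:
        raise ValueError("the theorem assumes J < n - k + 1")
    z = NormalDist().inv_cdf(1 - alpha)               # upper-alpha quantile
    tau = math.sqrt((2 * J * sigma2 ** 2 / n) * (1 + J / k))
    return K * math.sqrt(J) * math.sqrt(z * tau / math.sqrt(n) + J * sigma2 / n)

# Band widths for the two resolutions discussed for the doppler example.
for J in (45, 234):
    print(J, round(regression_band_halfwidth(2048, J, 0.01), 3))
```

The comparison shows the trade-off described in the text: more terms give higher resolution but a wider band.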
Example 22.12 Figure 22.5 shows the confidence envelope for the doppler signal. The first plot is based on $J = 234$ (the value of $J$ that minimizes the estimated risk). The second is based on $J = 45 \approx \sqrt{n}$. Larger $J$ yields a higher resolution estimator at the cost of large confidence bands. Smaller $J$ yields a lower resolution estimator but has tighter confidence bands.
So far, we have assumed that the $x_i$'s are of the form $\{1/n, 2/n, \ldots, 1\}$. If the $x_i$'s are on an interval $[a, b]$ then we can rescale them so that they are in the interval $[0, 1]$. If the $x_i$'s are not equally spaced, the methods we have discussed still apply so long as the $x_i$'s "fill out" the interval $[0, 1]$ in such a way so as to not be too clumped together. If we want to treat the $x_i$'s as random instead of fixed, then the method needs significant modifications which we shall not deal with here.
FIGURE 22.5. The confidence envelope for the doppler test function using $n = 2{,}048$ observations. The top plot shows the estimate and envelope using $J = 234$ terms. The bottom plot shows the estimate and envelope using $J = 45$ terms.
22.4 Wavelets
Suppose there is a sharp jump in a regression function $f$ at some point $x$ but that $f$ is otherwise very smooth. Such a function $f$ is said to be spatially inhomogeneous. See Figure 22.6 for an example.
It is hard to estimate $f$ using the methods we have discussed so far. If we use a cosine basis and only keep low order terms, we will miss the peak; if we allow higher order terms we will find the peak but we will make the rest of the curve very wiggly. Similar comments apply to kernel regression. If we use a large bandwidth, then we will smooth out the peak; if we use a small bandwidth, then we will find the peak but we will make the rest of the curve very wiggly.
One way to estimate inhomogeneous functions is to use a more carefully chosen basis that allows us to place a "blip" in some small region without adding wiggles elsewhere. In this section, we describe a special class of bases called wavelets, that are aimed at fixing this problem. Statistical inference using wavelets is a large and active area. We will just discuss a few of the main ideas to get a flavor of this approach.
We start with a particular wavelet called the Haar wavelet. The Haar father wavelet or Haar scaling function is defined by
$$\phi(x) = \begin{cases} 1 & \text{if } 0 \le x < 1 \\ 0 & \text{otherwise.} \end{cases}$$ (22.32)
The mother Haar wavelet is defined by
$$\psi(x) = \begin{cases} -1 & \text{if } 0 \le x \le \frac{1}{2} \\ 1 & \text{if } \frac{1}{2} < x \le 1. \end{cases}$$ (22.33)
For any integers $j$ and $k$ define
$$\phi_{j,k}(x) = 2^{j/2} \phi(2^j x - k) \quad \text{and} \quad \psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k).$$ (22.34)
The function $\psi_{j,k}$ has the same shape as $\psi$ but it has been rescaled by a factor of $2^{j/2}$ and shifted by a factor of $k$.
FIGURE 22.6. An inhomogeneous function. The function is smooth except for a sharp peak in one place.
See Figure 22.7 for some examples of Haar wavelets. Notice that for large $j$, $\psi_{j,k}$ is a very localized function. This makes it possible to add a blip to a function in one place without adding wiggles elsewhere. Increasing $j$ is like looking in a microscope at increasing degrees of resolution. In technical terms, we say that wavelets provide a multiresolution analysis of $L_2(0, 1)$.
FIGURE 22.7. Some Haar wavelets. Top left: the father wavelet $\phi(x)$; top right: the mother wavelet $\psi(x)$; bottom left: $\psi_{2,2}(x)$; bottom right: $\psi_{4,10}(x)$.
Let
$$W_j = \{ \psi_{jk},\ k = 0, 1, \ldots, 2^j - 1 \}$$
be the set of rescaled and shifted mother wavelets at resolution $j$.
Theorem 22.13 The set of functions
$$\{ \phi, W_0, W_1, W_2, \ldots \}$$
is an orthonormal basis for $L_2(0, 1)$.
It follows from this theorem that we can expand any function $f \in L_2(0, 1)$ in this basis. Because each $W_j$ is itself a set of functions, we write the expansion as a double sum:
$$f(x) = \alpha\,\phi(x) + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \beta_{j,k}\, \psi_{j,k}(x)$$ (22.35)
where
$$\alpha = \int_0^1 f(x)\phi(x)\,dx, \quad \beta_{j,k} = \int_0^1 f(x)\psi_{j,k}(x)\,dx.$$
We call $\alpha$ the scaling coefficient and the $\beta_{j,k}$'s are called the detail coefficients. We call the finite sum
$$\tilde f(x) = \alpha\,\phi(x) + \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} \beta_{j,k}\, \psi_{j,k}(x)$$ (22.36)
the resolution $J$ approximation to $f$. The total number of terms in this sum is
$$1 + \sum_{j=0}^{J-1} 2^j = 1 + 2^J - 1 = 2^J.$$
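The orthonormality of the Haar system and the count of $2^J$ terms can be checked numerically. The Python sketch below builds the father wavelet and all $\psi_{j,k}$ for $j < J = 3$, using the sign convention of the mother wavelet above (value $-1$ on the first half, $+1$ on the second) and assuming, as the localization property requires, that $\psi$ is zero outside $[0,1]$. Inner products are computed on a midpoint grid that never lands on a dyadic breakpoint.

```python
def mother(x):
    """Haar mother wavelet; assumed to be zero outside [0, 1]."""
    if 0 <= x <= 0.5:
        return -1.0
    if 0.5 < x <= 1:
        return 1.0
    return 0.0

def psi(j, k, x):
    """Rescaled and shifted mother wavelet psi_{j,k}."""
    return 2 ** (j / 2) * mother(2 ** j * x - k)

N = 1024
grid = [(i + 0.5) / N for i in range(N)]

def inner(f, g):
    """Midpoint-rule approximation of the integral of f * g over [0, 1]."""
    return sum(f(x) * g(x) for x in grid) / N

J = 3
basis = [lambda x: 1.0]                                  # father wavelet on (0, 1)
basis += [lambda x, j=j, k=k: psi(j, k, x)
          for j in range(J) for k in range(2 ** j)]

gram = [[inner(f, g) for g in basis] for f in basis]
print(len(basis))                                        # 8 = 2**3 terms
print(all(abs(gram[a][b] - (a == b)) < 1e-12
          for a in range(len(basis)) for b in range(len(basis))))
```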
Example 22.14 Figure 22.8 shows the doppler signal, and its resolution $J$ approximation for $J = 3$, $J = 5$ and $J = 8$.
FIGURE 22.8. The doppler signal and its reconstruction $\tilde f(x) = \alpha\,\phi(x) + \sum_{j=0}^{J-1} \sum_k \beta_{j,k}\psi_{j,k}(x)$ based on $J = 3$, $J = 5$ and $J = 8$.
Haar wavelets are localized, meaning that they are zero outside an interval. But they are not smooth. This raises the question of whether there exist smooth, localized wavelets that form an orthonormal basis. In 1988, Ingrid Daubechies showed that such wavelets do exist. These smooth wavelets are difficult to describe. They can be constructed numerically but there is no closed form formula for the smoother wavelets. To keep things simple, we will continue to use Haar wavelets. For your interest, Figure 22.9 shows an example of a smooth wavelet called the "symmlet 8."
FIGURE 22.9. The symmlet 8 wavelet. The top plot is the mother and the bottom plot is the father.
We can now use wavelets to do density estimation and regression. We shall only discuss the regression problem $Y_i = r(x_i) + \sigma\epsilon_i$ where $\epsilon_i \sim N(0, 1)$ and $x_i = i/n$. To simplify the discussion we assume that $n = 2^J$ for some $J$.
There is one major difference between estimation using wavelets instead of a cosine (or polynomial) basis. With the cosine basis, we used all the terms $1 \le j \le J$ for some $J$. With wavelets, we use a method called thresholding where we keep a term in the function approximation if its coefficient is large; otherwise, we don't use that term. There are many versions of thresholding. The simplest is called hard, universal thresholding.
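Haar Wavelet Regression
1. Let $J = \log_2(n)$ and define
$$\hat\alpha = \frac{1}{n} \sum_i \phi(x_i) Y_i \quad \text{and} \quad D_{j,k} = \frac{1}{n} \sum_i \psi_{j,k}(x_i) Y_i$$ (22.37)
for $0 \le j \le J - 1$.
2. Estimate $\sigma$ by
$$\hat\sigma = \sqrt{n} \times \frac{\mathrm{median}\left( |D_{J-1,k}| : k = 0, \ldots, 2^{J-1} - 1 \right)}{0.6745}.$$ (22.38)
3. Apply universal thresholding:
$$\hat\beta_{j,k} = \begin{cases} D_{j,k} & \text{if } |D_{j,k}| > \hat\sigma \sqrt{\frac{2 \log n}{n}} \\ 0 & \text{otherwise.} \end{cases}$$ (22.39)
4. Set
$$\hat f(x) = \hat\alpha\,\phi(x) + \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} \hat\beta_{j,k}\, \psi_{j,k}(x).$$
In practice, we do not compute $\hat\alpha$ and $D_{j,k}$ using (22.37). Instead, we use the discrete wavelet transform (DWT) which is very fast. For Haar wavelets, the DWT works as follows.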
DWT for Haar Wavelets
Let y be the vector of $Y_i$'s (length n) and let J = log2(n). Create a list D with elements
D[[0]], ..., D[[J-1]].
Set:
temp <- y/sqrt(n)
Then do:
for(j in (J-1):0){
  m <- 2^j
  I <- (1:m)
  D[[j]] <- (temp[2*I] - temp[(2*I)-1])/sqrt(2)
  temp <- (temp[2*I] + temp[(2*I)-1])/sqrt(2)
}
Remark 22.15 If you use a programming language that does not allow a 0 index for a list, then index the list as
D[[1]], ..., D[[J]]
and replace the second last line with
D[[j+1]] <- (temp[2*I] - temp[(2*I)-1])/sqrt(2)
The estimate for $\sigma$ probably looks strange. It is similar to the estimate we used for the cosine basis but it is designed to be insensitive to sharp peaks in the function.
To understand the intuition behind universal thresholding, consider what happens when there is no signal, that is, when $\beta_{j,k} = 0$ for all $j$ and $k$.
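The Haar DWT and the thresholding recipe can be put together in a short Python sketch. It uses 0-based lists, so the pair `temp[2*i], temp[2*i+1]` plays the role of `temp[(2*I)-1], temp[2*I]` in the pseudocode. The inverse transform `haar_inverse`, the function names, and the simulated step signal are illustrative additions, not part of the text.

```python
import math
import random
import statistics

def haar_dwt(y):
    """Return (alpha_hat, D) where D[j] holds the detail coefficients D_{j,k}."""
    n = len(y)
    J = n.bit_length() - 1
    assert n >= 2 and 2 ** J == n, "n must be a power of two"
    temp = [v / math.sqrt(n) for v in y]
    D = [None] * J
    for j in range(J - 1, -1, -1):
        m = 2 ** j
        D[j] = [(temp[2 * i + 1] - temp[2 * i]) / math.sqrt(2) for i in range(m)]
        temp = [(temp[2 * i + 1] + temp[2 * i]) / math.sqrt(2) for i in range(m)]
    return temp[0], D

def haar_inverse(alpha, D, n):
    """Rebuild fitted values at x_1..x_n from the coefficients."""
    temp = [alpha]
    for level in D:                       # coarse (j = 0) to fine (j = J-1)
        nxt = []
        for s, d in zip(temp, level):
            nxt.append((s - d) / math.sqrt(2))
            nxt.append((s + d) / math.sqrt(2))
        temp = nxt
    return [v * math.sqrt(n) for v in temp]

def haar_denoise(y):
    """Hard universal thresholding; returns (fitted values, sigma_hat)."""
    n = len(y)
    alpha, D = haar_dwt(y)
    finest = [abs(d) for d in D[-1]]
    sigma_hat = math.sqrt(n) * statistics.median(finest) / 0.6745
    lam = sigma_hat * math.sqrt(2 * math.log(n) / n)
    kept = [[d if abs(d) > lam else 0.0 for d in level] for level in D]
    return haar_inverse(alpha, kept, n), sigma_hat

random.seed(0)
n = 256
truth = [0.0 if i < n // 2 else 1.0 for i in range(n)]     # one sharp jump
y = [t + random.gauss(0, 0.1) for t in truth]

alpha, D = haar_dwt(y)
fit, sigma_hat = haar_denoise(y)
mse_raw = sum((a - b) ** 2 for a, b in zip(y, truth)) / n
mse_fit = sum((a - b) ** 2 for a, b in zip(fit, truth)) / n
print(round(sigma_hat, 3), round(mse_raw, 5), round(mse_fit, 5))
```

The transform touches each data point a constant number of times, which is why the DWT is so much faster than evaluating (22.37) directly.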
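Theorem 22.16 Suppose that $\beta_{j,k} = 0$ for all $j$ and $k$ and let $\hat\beta_{j,k}$ be the universal threshold estimator. Then
$$P(\hat\beta_{j,k} = 0 \text{ for all } j, k) \to 1$$
as $n \to \infty$.
Proof. To simplify the proof, assume that $\sigma$ is known and write $\lambda = \sigma\sqrt{2 \log n / n}$ for the threshold. Now $D_{j,k} \approx N(0, \sigma^2/n)$. We will need Mill's inequality: if $Z \sim N(0, 1)$ then $P(|Z| > t) \le (c/t)\,e^{-t^2/2}$ where $c = \sqrt{2/\pi}$ is a constant. Thus,
$$P(\max |D_{j,k}| > \lambda) \le \sum_{j,k} P(|D_{j,k}| > \lambda) = \sum_{j,k} P\left( \frac{\sqrt{n}\,|D_{j,k}|}{\sigma} > \frac{\sqrt{n}\,\lambda}{\sigma} \right)$$
$$\le \sum_{j,k} \frac{c\,\sigma}{\lambda\sqrt{n}} \exp\left( -\frac{1}{2} \frac{n\lambda^2}{\sigma^2} \right) = \frac{c}{\sqrt{2 \log n}} \to 0.$$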
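Both ingredients of the proof are easy to check numerically: Mill's inequality itself, and the resulting bound $c/\sqrt{2\log n}$ on the probability that any pure-noise coefficient survives thresholding. The Python sketch below uses `math.erfc` for the exact Normal tail; function names are illustrative.

```python
import math

C = math.sqrt(2 / math.pi)

def two_sided_tail(t):
    """Exact P(|Z| > t) for Z ~ N(0, 1)."""
    return math.erfc(t / math.sqrt(2))

def mills_bound(t):
    """Mill's inequality upper bound (c / t) exp(-t^2 / 2)."""
    return C * math.exp(-t * t / 2) / t

for n in (2 ** 8, 2 ** 11, 2 ** 20):
    t = math.sqrt(2 * math.log(n))          # sqrt(n) * lambda / sigma
    union = n * two_sided_tail(t)           # union bound with exact tails
    bound = C / t                           # bound from the proof
    print(n, round(union, 4), round(bound, 4))
```

The bound decays only like $1/\sqrt{\log n}$, so the convergence in the theorem is slow.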
Example 22.17 Consider $Y_i = r(x_i) + \sigma\epsilon_i$ where $r$ is the doppler signal, $\sigma = .1$ and $n = 2048$. Figure 22.10 shows the data and the estimated function using universal thresholding. Of course, the estimate is not smooth since Haar wavelets are not smooth. Nonetheless, the estimate is quite accurate.
FIGURE 22.10. Estimate of the Doppler function using Haar wavelets and universal thresholding.
22.5 Bibliographic Remarks
A reference for orthogonal function methods is Efromovich (1999). See also Beran (2000) and Beran and Dümbgen (1998). An introduction to wavelets is given in Ogden (1997). A more advanced treatment can be found in Härdle, Kerkyacharian, Picard and Tsybakov (1998). The theory of statistical estimation using wavelets has been developed by many authors, especially David Donoho and Ian Johnstone. There is a series of papers especially worth reading: Donoho and Johnstone (1994), Donoho and Johnstone (1995a), Donoho and Johnstone (1995b), and Donoho and Johnstone (1998).
22.6 Exercises
3. Let
$$\psi_1 = \left( \frac{1}{\sqrt 3}, \frac{1}{\sqrt 3}, \frac{1}{\sqrt 3} \right), \quad \psi_2 = \left( \frac{1}{\sqrt 2}, -\frac{1}{\sqrt 2}, 0 \right), \quad \psi_3 = \left( \frac{1}{\sqrt 6}, \frac{1}{\sqrt 6}, -\frac{2}{\sqrt 6} \right).$$
Show that these vectors have norm 1 and are orthogonal.
4. Prove Parseval's relation, equation (22.6).
5. Plot the first five Legendre polynomials. Verify, numerically, that they are orthonormal.
6. Expand the following functions in the cosine basis on $[0, 1]$. For (a) and (b), find the coefficients $\beta_j$ analytically. For (c) and (d), find the coefficients $\beta_j$ numerically, i.e.
$$\beta_j = \int_0^1 f(x)\phi_j(x)\,dx \approx \frac{1}{N} \sum_{r=1}^{N} f\left( \frac{r}{N} \right) \phi_j\left( \frac{r}{N} \right)$$
for some large integer $N$. Then plot the partial sum $\sum_{j=1}^{n} \beta_j \phi_j(x)$ for increasing values of $n$.
(a) $f(x) = \sqrt{2}\cos(3\pi x)$.
(b) $f(x) = \sin(\pi x)$.
(c) $f(x) = \sum_{j=1}^{11} h_j K(x - t_j)$ where $K(t) = (1 + \mathrm{sign}(t))/2$,
$(t_j) = (.1, .13, .15, .23, .25, .40, .44, .65, .76, .78, .81)$,
$(h_j) = (4, -5, 3, -4, 5, -4.2, 2.1, 4.3, -3.1, 2.1, -4.2)$.
(d) $f(x) = \sqrt{x(1-x)}\,\sin\left( \frac{2.1\pi}{x + .05} \right)$.
7. Consider the glass fragments data. Let $Y$ be refractive index and let $X$ be aluminium content (the fourth variable).
(a) Do a nonparametric regression to fit the model $Y = f(x) + \epsilon$ using the cosine basis method. The data are not on a regular grid. Ignore this when estimating the function. (But do sort the data first.) Provide a function estimate, an estimate of the risk and a confidence band.
(b) Use the wavelet method to estimate $f$.
8. Show that the Haar wavelets are orthonormal.
9. Consider again the doppler signal:
$$f(x) = \sqrt{x(1-x)}\,\sin\left( \frac{2.1\pi}{x + 0.05} \right).$$
Let $n = 1024$ and let $(x_1, \ldots, x_n) = (1/n, \ldots, 1)$. Generate data
$$Y_i = f(x_i) + \epsilon_i$$
where $\epsilon_i \sim N(0, 1)$.
(a) Fit the curve using the cosine basis method. Plot the function estimate and confidence band for $J = 10, 20, \ldots, 100$.
(b) Use Haar wavelets to fit the curve.
10. (Haar density estimation.) Let $X_1, \ldots, X_n \sim f$ for some density $f$ on $[0, 1]$. Let's consider constructing a wavelet histogram. Let $\phi$ and $\psi$ be the Haar father and mother wavelet. Write
$$f(x) \approx \phi(x) + \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} \beta_{j,k}\, \psi_{j,k}(x)$$
where $J \approx \log_2(n)$. Let
$$\hat\beta_{j,k} = \frac{1}{n} \sum_{i=1}^{n} \psi_{j,k}(X_i).$$
(a) Show that $\hat\beta_{j,k}$ is an unbiased estimate of $\beta_{j,k}$.
(b) Define the Haar histogram
$$\hat f(x) = \phi(x) + \sum_{j=0}^{B} \sum_{k=0}^{2^j - 1} \hat\beta_{j,k}\, \psi_{j,k}(x)$$
for $0 \le B \le J - 1$.
(c) Find an approximate expression for the MSE as a function of $B$.
(d) Generate $n = 1000$ observations from a Beta(15, 4) density. Estimate the density using the Haar histogram. Use leave-one-out cross-validation to choose $B$.
11. In this question, we will explore the motivation for equation (22.38). Let $X_1, \ldots, X_n \sim N(0, \sigma^2)$. Let
$$\hat\sigma = \sqrt{n} \times \frac{\mathrm{median}(|X_1|, \ldots, |X_n|)}{0.6745}.$$
(a) Show that $E(\hat\sigma) = \sigma$.
(b) Simulate $n = 100$ observations from a N(0,1) distribution. Compute $\hat\sigma$ as well as the usual estimate of $\sigma$. Repeat 1000 times and compare the MSE.
(c) Repeat (b) but add some outliers to the data. To do this, simulate each observation from a N(0,1) with probability .95 and simulate each observation from a N(0,10) with probability .05.
12. Repeat question 6 using the Haar basis.
23 Classification
23.1 Introduction
The problem of predicting a discrete random variable $Y$ from another random variable $X$ is called classification, supervised learning, discrimination or pattern recognition.
In more detail, consider iid data $(X_1, Y_1), \ldots, (X_n, Y_n)$ where
$$X_i = (X_{i1}, \ldots, X_{id}) \in \mathcal{X} \subset \mathbb{R}^d$$
is a $d$-dimensional vector and $Y_i$ takes values in some finite set $\mathcal{Y}$. A classification rule is a function $h : \mathcal{X} \to \mathcal{Y}$. When we observe a new $X$, we predict $Y$ to be $h(X)$.
Example 23.1 Here is an example with fake data. Figure 23.1 shows 100 data points. The covariate $X = (X_1, X_2)$ is 2-dimensional and the outcome $Y \in \mathcal{Y} = \{0, 1\}$. The $Y$ values are indicated on the plot with the triangles representing $Y = 1$ and the squares representing $Y = 0$. Also shown is a linear classification rule represented by the solid line. This is a rule of the form
$$h(x) = \begin{cases} 1 & \text{if } a + b_1 x_1 + b_2 x_2 > 0 \\ 0 & \text{otherwise.} \end{cases}$$
Everything above the line is classified as a 0 and everything below the line is classified as a 1.
Example 23.2 The Coronary Risk-Factor Study (CORIS) data involve 462 males between the ages of 15 and 64 from three rural areas in South Africa (Rousseauw et al., 1983; Hastie and Tibshirani, 1987). The outcome $Y$ is the presence ($Y = 1$) or absence ($Y = 0$) of coronary heart disease. There are 9 covariates: systolic blood pressure, cumulative tobacco (kg), ldl (low density lipoprotein cholesterol), adiposity, famhist (family history of heart disease), typea (type-A behavior), obesity, alcohol (current alcohol consumption), and age. Figure 23.2 shows the outcomes and two of the covariates, systolic blood pressure and tobacco consumption. People with/without heart disease are coded with triangles and squares. Also shown is a linear decision boundary which we will explain shortly. In this example, the groups are much harder to tell apart. In fact, 141 people are misclassified using this classification rule.
At this point, it is worth revisiting the Statistics/Data Mining dictionary:
Statistics        Computer Science       Meaning
classification    supervised learning    predicting a discrete $Y$ from $X$
data              training sample        $(X_1, Y_1), \ldots, (X_n, Y_n)$
covariates        features               the $X_i$'s
classifier        hypothesis             map $h : \mathcal{X} \to \mathcal{Y}$
estimation        learning               finding a good classifier
In most cases in this Chapter, we deal with the case $\mathcal{Y} = \{0, 1\}$.
FIGURE 23.1. Two covariates ($x_1$ and $x_2$) and a linear decision boundary. Triangles mean $Y = 1$. Squares mean $Y = 0$. You won't see real data like this.
FIGURE 23.2. Two covariates (systolic blood pressure and tobacco) and a linear decision boundary with data from the Coronary Risk-Factor Study (CORIS). Triangles mean $Y = 1$. Squares mean $Y = 0$. This is real life.
23.2 Error Rates and The Bayes Classifier
Our goal is to find a classification rule h that makes accurate
predictions. We start with the following definitions.

Definition 23.3 The true error rate of a classifier h is
$$L(h) = P(\{h(X) \neq Y\}) \tag{23.1}$$
and the empirical error rate or training error rate is
$$\hat{L}_n(h) = \frac{1}{n} \sum_{i=1}^n I(h(X_i) \neq Y_i). \tag{23.2}$$
(One can use other measures of error as well.)

First we consider the special case where $\mathcal{Y} = \{0, 1\}$. Let
$$r(x) = E(Y | X = x) = P(Y = 1 | X = x)$$
denote the regression function. From Bayes' theorem we have that
$$r(x) = \frac{\pi f_1(x)}{\pi f_1(x) + (1 - \pi) f_0(x)} \tag{23.3}$$
where $f_0(x) = f(x | Y = 0)$, $f_1(x) = f(x | Y = 1)$ and $\pi = P(Y = 1)$.

Definition 23.4 The Bayes classification rule $h^*$ is defined to be
$$h^*(x) = \begin{cases} 1 & \text{if } r(x) > \frac{1}{2} \\ 0 & \text{otherwise.} \end{cases} \tag{23.4}$$
The set $D(h) = \{x : P(Y = 1 | X = x) = P(Y = 0 | X = x)\}$ is called the decision boundary.
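The definitions above can be made concrete with a small simulation. The following is a minimal sketch, assuming a hypothetical model that is not in the text: X given Y = 0 is N(0, 1), X given Y = 1 is N(2, 1), and $\pi = 1/2$. The function names are illustrative only. It evaluates r(x) from (23.3), applies the rule (23.4), and computes the empirical error rate (23.2) on simulated data.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def regression_function(x, pi1, f0, f1):
    """r(x) = P(Y = 1 | X = x), equation (23.3)."""
    num = pi1 * f1(x)
    return num / (num + (1.0 - pi1) * f0(x))

def bayes_rule(x, pi1, f0, f1):
    """Bayes classification rule (23.4): 1 if r(x) > 1/2, else 0."""
    return 1 if regression_function(x, pi1, f0, f1) > 0.5 else 0

def empirical_error(h, xs, ys):
    """Empirical error rate (23.2): fraction of points with h(X_i) != Y_i."""
    return sum(1 for x, y in zip(xs, ys) if h(x) != y) / len(xs)

# Hypothetical model (an assumption for illustration only).
pi1 = 0.5
f0 = lambda x: normal_pdf(x, 0.0, 1.0)
f1 = lambda x: normal_pdf(x, 2.0, 1.0)

# Simulate (X_i, Y_i) pairs from that model.
rng = random.Random(0)
ys = [1 if rng.random() < pi1 else 0 for _ in range(2000)]
xs = [rng.gauss(2.0 if y == 1 else 0.0, 1.0) for y in ys]

h_star = lambda x: bayes_rule(x, pi1, f0, f1)
err = empirical_error(h_star, xs, ys)
print("empirical error of the Bayes rule:", err)
```

In this symmetric setup the decision boundary is the single point x = 1, where the two weighted densities are equal.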
Warning! The Bayes rule has nothing to do with Bayesian
inference. We could estimate the Bayes rule using either frequentist or Bayesian methods.
The Bayes rule may be written in several equivalent forms:
$$h^*(x) = \begin{cases} 1 & \text{if } P(Y = 1 | X = x) > P(Y = 0 | X = x) \\ 0 & \text{otherwise} \end{cases} \tag{23.5}$$
and
$$h^*(x) = \begin{cases} 1 & \text{if } \pi f_1(x) > (1 - \pi) f_0(x) \\ 0 & \text{otherwise.} \end{cases} \tag{23.6}$$

Theorem 23.5 The Bayes rule is optimal, that is, if h is any
other classification rule then $L(h^*) \le L(h)$.
The Bayes rule depends on unknown quantities so we need to
use the data to find some approximation to the Bayes rule. At
the risk of oversimplifying, there are three main approaches:

1. Empirical Risk Minimization. Choose a set of classifiers $\mathcal{H}$
and find $\hat{h} \in \mathcal{H}$ that minimizes some estimate of L(h).

2. Regression. Find an estimate $\hat{r}$ of the regression function
r and define
$$\hat{h}(x) = \begin{cases} 1 & \text{if } \hat{r}(x) > \frac{1}{2} \\ 0 & \text{otherwise.} \end{cases}$$

3. Density Estimation. Estimate $f_0$ from the $X_i$'s for which
$Y_i = 0$, estimate $f_1$ from the $X_i$'s for which $Y_i = 1$ and let
$\hat{\pi} = n^{-1} \sum_{i=1}^n Y_i$. Define
$$\hat{r}(x) = \hat{P}(Y = 1 | X = x) = \frac{\hat{\pi} \hat{f}_1(x)}{\hat{\pi} \hat{f}_1(x) + (1 - \hat{\pi}) \hat{f}_0(x)}$$
and
$$\hat{h}(x) = \begin{cases} 1 & \text{if } \hat{r}(x) > \frac{1}{2} \\ 0 & \text{otherwise.} \end{cases}$$
Now let us generalize to the case where Y takes on more than
two values as follows.

Theorem 23.6 Suppose that $Y \in \mathcal{Y} = \{1, \ldots, K\}$. The
optimal rule is
$$h(x) = \mathrm{argmax}_k \, P(Y = k | X = x) \tag{23.7}$$
$$\phantom{h(x)} = \mathrm{argmax}_k \, \pi_k f_k(x) \tag{23.8}$$
where
$$P(Y = k | X = x) = \frac{f_k(x) \pi_k}{\sum_r f_r(x) \pi_r}, \tag{23.9}$$
$\pi_r = P(Y = r)$, $f_r(x) = f(x | Y = r)$ and $\mathrm{argmax}_k$ means
"the value of k that maximizes that expression."

23.3 Gaussian and Linear Classifiers

Perhaps the simplest approach to classification is to use the density estimation strategy and assume a parametric model for the
densities. Suppose that $\mathcal{Y} = \{0, 1\}$ and that $f_0(x) = f(x | Y = 0)$
and $f_1(x) = f(x | Y = 1)$ are both multivariate Gaussians:
$$f_k(x) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) \right\}, \quad k = 0, 1.$$
Thus, $X | Y = 0 \sim N(\mu_0, \Sigma_0)$ and $X | Y = 1 \sim N(\mu_1, \Sigma_1)$.
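Theorem 23.7 If $X | Y = 0 \sim N(\mu_0, \Sigma_0)$ and $X | Y = 1 \sim N(\mu_1, \Sigma_1)$, then the Bayes rule is
$$h^*(x) = \begin{cases} 1 & \text{if } r_1^2 < r_0^2 + 2 \log\left(\frac{\pi_1}{\pi_0}\right) + \log\left(\frac{|\Sigma_0|}{|\Sigma_1|}\right) \\ 0 & \text{otherwise} \end{cases} \tag{23.10}$$
where
$$r_i^2 = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i), \quad i = 0, 1 \tag{23.11}$$
is the Mahalanobis distance. An equivalent way of expressing the Bayes rule is
$$h^*(x) = \mathrm{argmax}_k \, \delta_k(x)$$
where
$$\delta_k(x) = -\frac{1}{2} \log |\Sigma_k| - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k \tag{23.12}$$
and |A| denotes the determinant of a matrix A.

The decision boundary of the above classifier is quadratic so
this procedure is often called quadratic discriminant analysis (QDA). In practice, we use sample estimates of $\pi_0, \pi_1, \mu_0, \mu_1, \Sigma_0, \Sigma_1$
in place of the true values, namely:
$$\hat{\pi}_0 = \frac{1}{n} \sum_{i=1}^n (1 - Y_i), \qquad \hat{\pi}_1 = \frac{1}{n} \sum_{i=1}^n Y_i$$
$$\hat{\mu}_0 = \frac{1}{n_0} \sum_{i: Y_i = 0} X_i, \qquad \hat{\mu}_1 = \frac{1}{n_1} \sum_{i: Y_i = 1} X_i$$
$$S_0 = \frac{1}{n_0} \sum_{i: Y_i = 0} (X_i - \hat{\mu}_0)(X_i - \hat{\mu}_0)^T, \qquad S_1 = \frac{1}{n_1} \sum_{i: Y_i = 1} (X_i - \hat{\mu}_1)(X_i - \hat{\mu}_1)^T$$
where $n_0 = \sum_i (1 - Y_i)$ and $n_1 = \sum_i Y_i$.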
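A simplification occurs if we assume that $\Sigma_0 = \Sigma_1 = \Sigma$. In
that case, the Bayes rule is
$$h^*(x) = \mathrm{argmax}_k \, \delta_k(x) \tag{23.13}$$
where now
$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k. \tag{23.14}$$
The parameters are estimated as before except that the mle of
$\Sigma$ is
$$S = \frac{n_0 S_0 + n_1 S_1}{n_0 + n_1}.$$
The classification rule is
$$h^*(x) = \begin{cases} 1 & \text{if } \delta_1(x) > \delta_0(x) \\ 0 & \text{otherwise} \end{cases} \tag{23.15}$$
where
$$\delta_j(x) = x^T S^{-1} \hat{\mu}_j - \frac{1}{2} \hat{\mu}_j^T S^{-1} \hat{\mu}_j + \log \hat{\pi}_j$$
is called the discriminant function. The decision boundary
$\{x : \delta_0(x) = \delta_1(x)\}$ is linear so this method is called linear
discriminant analysis (LDA).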
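As an illustration of the plug-in rule (23.15), here is a minimal sketch for two covariates using only the standard library. The toy data and all function names are made up for this example, and the 2 x 2 matrix inverse is written out by hand; it is a sketch of the estimates described above, not a general implementation.

```python
import math

def mean_vec(rows):
    """Sample mean of a list of 2-vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, mu):
    """Sum over rows of (x - mu)(x - mu)^T, as a 2x2 nested list (equals n_k * S_k)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def fit_lda(X, y):
    """Estimate mu_k, pi_k and the pooled covariance S = (n0 S0 + n1 S1) / (n0 + n1)."""
    n = len(X)
    groups = {k: [x for x, yi in zip(X, y) if yi == k] for k in (0, 1)}
    mu = {k: mean_vec(groups[k]) for k in (0, 1)}
    pi = {k: len(groups[k]) / n for k in (0, 1)}
    sc = [scatter(groups[k], mu[k]) for k in (0, 1)]
    S = [[(sc[0][a][b] + sc[1][a][b]) / n for b in range(2)] for a in range(2)]
    return mu, pi, inv2(S)

def delta(x, mu_k, pi_k, S_inv):
    """Discriminant: x^T S^{-1} mu_k - (1/2) mu_k^T S^{-1} mu_k + log pi_k."""
    v = [S_inv[0][0] * mu_k[0] + S_inv[0][1] * mu_k[1],
         S_inv[1][0] * mu_k[0] + S_inv[1][1] * mu_k[1]]  # v = S^{-1} mu_k
    return (x[0] * v[0] + x[1] * v[1]
            - 0.5 * (mu_k[0] * v[0] + mu_k[1] * v[1])
            + math.log(pi_k))

def predict(x, model):
    """Rule (23.15): classify as 1 when delta_1(x) > delta_0(x)."""
    mu, pi, S_inv = model
    return 1 if delta(x, mu[1], pi[1], S_inv) > delta(x, mu[0], pi[0], S_inv) else 0

# Toy data, made up for illustration.
X = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (1.0, 1.5),
     (3.0, 3.0), (4.0, 3.5), (3.5, 4.0), (4.0, 4.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_lda(X, y)
print([predict(x, model) for x in X])
```

Replacing the pooled matrix S with separate estimates $S_0$ and $S_1$, and using (23.12) in place of (23.14), would turn the same sketch into QDA.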
Example 23.8 Let us return to the South African heart disease
data. The decision rule in Figure 23.2 was obtained by linear
discrimination. The outcome was

         classified as 0    classified as 1
y = 0         277                 25
y = 1         116                 44

The observed misclassification rate is 141/462 = .31. Including
all the covariates reduces the error rate to .27. The results from
quadratic discrimination are

         classified as 0    classified as 1
y = 0         272                 30
y = 1         113                 47

which has about the same error rate 143/462 = .31. Including
all the covariates reduces the error rate to .26. In this example,
there is little advantage to QDA over LDA.
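Now we generalize to the case where Y takes on more than
two values.

Theorem 23.9 Suppose that $Y \in \{1, \ldots, K\}$. If $f_k(x) =
f(x | Y = k)$ is Gaussian, the Bayes rule is
$$h(x) = \mathrm{argmax}_k \, \delta_k(x)$$
where
$$\delta_k(x) = -\frac{1}{2} \log |\Sigma_k| - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k. \tag{23.16}$$
If the variances of the Gaussians are equal then
$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k. \tag{23.17}$$
We estimate $\delta_k(x)$ by inserting estimates of $\mu_k$, $\Sigma_k$ and $\pi_k$.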
There is another version of linear discriminant analysis due to
Fisher. The idea is to first reduce the dimension of covariates to
one dimension by projecting the data onto a line. Algebraically,
this means replacing the covariate $X = (X_1, \ldots, X_d)$ with a
linear combination $U = w^T X = \sum_{j=1}^d w_j X_j$. The goal is to
choose the vector $w = (w_1, \ldots, w_d)$ that "best separates the
data." Then we perform classification with the new covariate U
instead of X.

We need to define what we mean by separation of the groups.
We would like the two groups to have means that are far apart
relative to their spread. Let $\mu_j$ denote the mean of X for $Y = j$
and let $\Sigma$ be the variance matrix of X. Then $E(U | Y = j) =
E(w^T X | Y = j) = w^T \mu_j$ and $V(U) = w^T \Sigma w$. Define the separation by
$$J(w) = \frac{(E(U | Y = 0) - E(U | Y = 1))^2}{w^T \Sigma w} = \frac{(w^T \mu_0 - w^T \mu_1)^2}{w^T \Sigma w} = \frac{w^T (\mu_0 - \mu_1)(\mu_0 - \mu_1)^T w}{w^T \Sigma w}.$$
(The quantity J arises in physics, where it is called the Rayleigh coefficient.)

We estimate J as follows. Let $n_j = \sum_{i=1}^n I(Y_i = j)$ be the number
of observations in group j, let $\overline{X}_j$ be the sample mean vector of
the X's for group j, and let $S_j$ be the sample covariance matrix
in group j. Define
$$\hat{J}(w) = \frac{w^T S_B w}{w^T S_W w} \tag{23.18}$$
where
$$S_B = (\overline{X}_0 - \overline{X}_1)(\overline{X}_0 - \overline{X}_1)^T, \qquad S_W = \frac{(n_0 - 1) S_0 + (n_1 - 1) S_1}{(n_0 - 1) + (n_1 - 1)}.$$

Theorem 23.10 The vector
$$w = S_W^{-1} (\overline{X}_0 - \overline{X}_1) \tag{23.19}$$
is a maximizer of $\hat{J}(w)$. We call
$$U = w^T X = (\overline{X}_0 - \overline{X}_1)^T S_W^{-1} X \tag{23.20}$$
the Fisher linear discriminant function. The midpoint m
between the projected group means $w^T \overline{X}_0$ and $w^T \overline{X}_1$ is
$$m = \frac{1}{2} w^T (\overline{X}_0 + \overline{X}_1) = \frac{1}{2} (\overline{X}_0 - \overline{X}_1)^T S_W^{-1} (\overline{X}_0 + \overline{X}_1). \tag{23.21}$$
Fisher's classification rule is
$$h(x) = \begin{cases} 0 & \text{if } w^T x \ge m \\ 1 & \text{if } w^T x < m \end{cases} = \begin{cases} 0 & \text{if } (\overline{X}_0 - \overline{X}_1)^T S_W^{-1} x \ge m \\ 1 & \text{if } (\overline{X}_0 - \overline{X}_1)^T S_W^{-1} x < m. \end{cases}$$
Fisher's rule is the same as the Bayes linear classifier in equation (23.14) when $\hat{\pi} = 1/2$.
23.4 Linear Regression and Logistic Regression
A more direct approach to classification is to estimate the regression function $r(x) = E(Y | X = x)$ without bothering to estimate
the densities $f_k$. For the rest of this section, we will only consider the case where $\mathcal{Y} = \{0, 1\}$. Thus, $r(x) = P(Y = 1 | X = x)$
and once we have an estimate $\hat{r}$, we will use the classification
rule
$$h(x) = \begin{cases} 1 & \text{if } \hat{r}(x) > \frac{1}{2} \\ 0 & \text{otherwise.} \end{cases} \tag{23.22}$$

The simplest regression model is the linear regression model
$$Y = r(x) + \epsilon = \beta_0 + \sum_{j=1}^d \beta_j X_j + \epsilon \tag{23.23}$$
where $E(\epsilon) = 0$. This model can't be correct since it does not
force Y = 0 or 1. Nonetheless, it can sometimes lead to a decent
classifier.

Recall that the least squares estimate of $\beta = (\beta_0, \beta_1, \ldots, \beta_d)^T$
minimizes the residual sums of squares
$$RSS(\beta) = \sum_{i=1}^n \left( Y_i - \beta_0 - \sum_{j=1}^d X_{ij} \beta_j \right)^2.$$
Let us briefly review what that estimator is. Let $\mathbf{X}$ denote the
$n \times (d + 1)$ matrix of the form
$$\mathbf{X} = \begin{bmatrix} 1 & X_{11} & \cdots & X_{1d} \\ 1 & X_{21} & \cdots & X_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & \cdots & X_{nd} \end{bmatrix}.$$
Also let $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$. Then,
$$RSS(\beta) = (\mathbf{Y} - \mathbf{X}\beta)^T (\mathbf{Y} - \mathbf{X}\beta)$$
and the model can be written as
$$\mathbf{Y} = \mathbf{X}\beta + \epsilon$$
where $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^T$. The least squares solution $\hat{\beta}$ that minimizes RSS can be found by elementary calculus and is given
by
$$\hat{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}.$$
The predicted values are
$$\hat{\mathbf{Y}} = \mathbf{X} \hat{\beta}.$$
Now we use (23.22) to classify, where $\hat{r}(x) = \hat{\beta}_0 + \sum_j \hat{\beta}_j x_j$.
It seems sensible to use a regression method that takes into
account that $Y \in \{0, 1\}$. The most common method for doing
so is logistic regression. Recall that the model is
$$r(x) = P(Y = 1 | X = x) = \frac{e^{\beta_0 + \sum_j \beta_j x_j}}{1 + e^{\beta_0 + \sum_j \beta_j x_j}}. \tag{23.24}$$
We may write this as
$$\mathrm{logit} \, P(Y = 1 | X = x) = \beta_0 + \sum_j \beta_j x_j$$
where $\mathrm{logit}(a) = \log(a / (1 - a))$. Under this model, each $Y_i$ is a
Bernoulli with success probability
$$p_i(\beta) = \frac{e^{\beta_0 + \sum_j \beta_j X_{ij}}}{1 + e^{\beta_0 + \sum_j \beta_j X_{ij}}}.$$
The likelihood function for the data set is
$$\mathcal{L}(\beta) = \prod_{i=1}^n p_i(\beta)^{Y_i} (1 - p_i(\beta))^{1 - Y_i}.$$
We obtain the mle numerically.
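One simple way to obtain the mle numerically is gradient ascent on the log-likelihood. The sketch below is an assumption about how this might be done, not the method used for the examples in this chapter (Newton-type iterations are a common alternative). The step size, iteration count, toy data and function names are all illustrative choices.

```python
import math

def sigmoid(t):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-t))

def linear_predictor(beta, x):
    """beta_0 + sum_j beta_j x_j."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def log_likelihood(beta, X, y):
    """log of the likelihood: sum_i [Y_i log p_i + (1 - Y_i) log(1 - p_i)]."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(linear_predictor(beta, xi))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def fit_logistic(X, y, step=0.1, iters=2000):
    """Maximize the log-likelihood by plain gradient ascent."""
    d = len(X[0])
    beta = [0.0] * (d + 1)
    for _ in range(iters):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            resid = yi - sigmoid(linear_predictor(beta, xi))  # Y_i - p_i(beta)
            grad[0] += resid
            for j in range(d):
                grad[j + 1] += resid * xi[j]
        # Step along the average gradient.
        beta = [b + step * g / len(X) for b, g in zip(beta, grad)]
    return beta

# Toy data with overlapping groups (made up), so the mle is finite.
X = [(0.5,), (1.0,), (1.5,), (2.0,), (2.5,), (3.0,), (3.5,), (4.0,)]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta_hat = fit_logistic(X, y)
predictions = [1 if sigmoid(linear_predictor(beta_hat, x)) > 0.5 else 0 for x in X]
print(beta_hat, predictions)
```

If the two groups were perfectly separable, the likelihood would have no finite maximizer and the iterates would keep growing, which is why the toy data overlap.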
Example 23.11 Let us return to the heart disease data. A logistic
regression yields the following estimates and Wald tests for the
coefficients:

Covariate    Estimate   Std. Error   Wald statistic   p-value
Intercept     -6.145      1.300         -4.738         0.000
sbp            0.007      0.006          1.138         0.255
tobacco        0.079      0.027          2.991         0.003
ldl            0.174      0.059          2.925         0.003
adiposity      0.019      0.029          0.637         0.524
famhist        0.925      0.227          4.078         0.000
typea          0.040      0.012          3.233         0.001
obesity       -0.063      0.044         -1.427         0.153
alcohol        0.000      0.004          0.027         0.979
age            0.045      0.012          3.754         0.000

Are you surprised by the fact that systolic blood pressure is not
significant or by the minus sign for the obesity coefficient? If
yes, then you are confusing correlation and causation. There
are lots of unobserved confounding variables so the coefficients
above cannot be interpreted causally. The fact that blood pressure is not significant does not mean that blood pressure is not an
important cause of heart disease. It means that it is not an important predictor of heart disease relative to the other variables
in the model. The error rate, using this model for classification,
is .27. The error rate from a linear regression is .26.
We can get a better classifier by fitting a richer model. For
example we could fit
$$\mathrm{logit} \, P(Y = 1 | X = x) = \beta_0 + \sum_j \beta_j x_j + \sum_{j,k} \beta_{jk} x_j x_k. \tag{23.25}$$
More generally, we could add terms of up to order r for some
integer r. Large values of r give a more complicated model which
should fit the data better. But there is a bias-variance tradeoff
which we'll discuss later.

Example 23.12 If we use model (23.25) for the heart disease data
with r = 2, the error rate is reduced to .22.

The logistic regression model can easily be extended to k
groups but we shall not give the details here.

23.5 Relationship Between Logistic Regression and LDA
LDA and logistic regression are almost the same thing. If we
assume that each group is Gaussian with the same covariance
matrix then we saw that
$$\log\left( \frac{P(Y = 1 | X = x)}{P(Y = 0 | X = x)} \right) = \log\left( \frac{\pi_1}{\pi_0} \right) - \frac{1}{2} (\mu_0 + \mu_1)^T \Sigma^{-1} (\mu_1 - \mu_0) + x^T \Sigma^{-1} (\mu_1 - \mu_0) \equiv \alpha_0 + \alpha^T x.$$
On the other hand, the logistic model is, by assumption,
$$\log\left( \frac{P(Y = 1 | X = x)}{P(Y = 0 | X = x)} \right) = \beta_0 + \beta^T x.$$
These are the same model since they both lead to classification
rules that are linear in x. The difference is in how we estimate
the parameters.

The joint density of a single observation is $f(x, y) = f(x | y) f(y) =
f(y | x) f(x)$. In LDA we estimated the whole joint distribution by
estimating $f(x | y)$ and $f(y)$; specifically, we estimated $f_k(x) =
f(x | Y = k)$, $\pi_1 = f_Y(1)$ and $\pi_0 = f_Y(0)$. We maximized the
likelihood
$$\prod_i f(x_i, y_i) = \underbrace{\prod_i f(x_i | y_i)}_{\text{Gaussian}} \; \underbrace{\prod_i f(y_i)}_{\text{Bernoulli}}. \tag{23.26}$$
In logistic regression we maximized the conditional likelihood
$\prod_i f(y_i | x_i)$ but we ignored the second term $f(x_i)$:
$$\prod_i f(x_i, y_i) = \underbrace{\prod_i f(y_i | x_i)}_{\text{logistic}} \; \underbrace{\prod_i f(x_i)}_{\text{ignored}}. \tag{23.27}$$
Since classification only requires knowing $f(y | x)$, we don't really
need to estimate the whole joint distribution. Here is an analogy:
we can estimate the mean $\mu$ of a distribution with the sample
mean $\overline{X}$ without bothering to estimate the whole density function. Logistic regression leaves the marginal distribution $f(x)$
unspecified so it is more nonparametric than LDA.

To summarize: LDA and logistic regression both lead to a
linear classification rule. In LDA we estimate the entire joint
distribution $f(x, y) = f(x | y) f(y)$. In logistic regression we only
estimate $f(y | x)$ and we don't bother estimating $f(x)$.
23.6 Density Estimation and Naive Bayes
Recall that the Bayes rule is $h(x) = \mathrm{argmax}_k \, \pi_k f_k(x)$. If we
can estimate $\pi_k$ and $f_k$ then we can estimate the Bayes classification rule. Estimating $\pi_k$ is easy but what about $f_k$? We did
this previously by assuming $f_k$ was Gaussian. Another strategy
is to estimate $f_k$ with some nonparametric density estimator
$\hat{f}_k$ such as a kernel estimator. But if $x = (x_1, \ldots, x_d)$ is high
dimensional, nonparametric density estimation is not very reliable. This problem is ameliorated if we assume
that $X_1, \ldots, X_d$ are independent, for then, $f_k(x_1, \ldots, x_d) = \prod_{j=1}^d f_{kj}(x_j)$. This
reduces the problem to d one-dimensional density estimation
problems, within each of the K groups. The resulting classifier
is called the naive Bayes classifier. The assumption that the
components of X are independent is usually wrong yet the resulting classifier might still be accurate. Here is a summary of
the steps in the naive Bayes classifier:

The Naive Bayes Classifier

1. For each group k, compute an estimate $\hat{f}_{kj}$ of the density
$f_{kj}$ for $X_j$, using the data for which $Y_i = k$.

2. Let
$$\hat{f}_k(x) = \hat{f}_k(x_1, \ldots, x_d) = \prod_{j=1}^d \hat{f}_{kj}(x_j).$$

3. Let
$$\hat{\pi}_k = \frac{1}{n} \sum_{i=1}^n I(Y_i = k)$$
where $I(Y_i = k) = 1$ if $Y_i = k$ and $I(Y_i = k) = 0$ if $Y_i \neq k$.

4. Let
$$h(x) = \mathrm{argmax}_k \, \hat{\pi}_k \hat{f}_k(x).$$

The naive Bayes classifier is especially popular when x is high
dimensional and discrete. In that case, $\hat{f}_{kj}(x_j)$ is especially simple.
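The four steps can be sketched for the discrete case as follows. This is a minimal illustration assuming binary covariates, so that each $\hat{f}_{kj}$ is a single estimated probability. The add-one smoothing, the toy data and the function names are assumptions introduced here, not part of the text.

```python
def fit_naive_bayes(X, y, smoothing=1.0):
    """Estimate pi_k (step 3) and, for binary covariates, P(X_j = 1 | Y = k) (step 1).

    `smoothing` is an added assumption that keeps estimates away from 0 and 1.
    """
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    prior = {k: sum(1 for yi in y if yi == k) / n for k in classes}
    cond = {}
    for k in classes:
        rows = [x for x, yi in zip(X, y) if yi == k]
        cond[k] = [
            (sum(r[j] for r in rows) + smoothing) / (len(rows) + 2.0 * smoothing)
            for j in range(d)
        ]
    return prior, cond

def predict(x, prior, cond):
    """Steps 2 and 4: multiply the one-dimensional estimates and take the argmax."""
    best, best_score = None, -1.0
    for k in prior:
        score = prior[k]
        for j, xj in enumerate(x):
            p1 = cond[k][j]
            score *= p1 if xj == 1 else (1.0 - p1)
        if score > best_score:
            best, best_score = k, score
    return best

# Toy binary data, made up for illustration.
X = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 1, 0),
     (0, 0, 1), (0, 0, 0), (1, 0, 1), (0, 1, 1)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
prior, cond = fit_naive_bayes(X, y)
print(predict((1, 1, 0), prior, cond), predict((0, 0, 1), prior, cond))
```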
23.7 Trees
Trees are classification methods that partition the covariate
space $\mathcal{X}$ into disjoint pieces and then classify the observations
according to which partition element they fall in. As the name
implies, the classifier can be represented as a tree.

For illustration, suppose there are two covariates, $X_1$ = age
and $X_2$ = blood pressure. Figure 23.3 shows a classification tree
using these variables.

The tree is used in the following way. If a subject has Age $\ge$ 50
then we classify him as Y = 1. If a subject has Age < 50
then we check his blood pressure. If blood pressure is < 100
then we classify him as Y = 1, otherwise we classify him as
Y = 0. Figure 23.4 shows the same classifier as a partition of
the covariate space.

FIGURE 23.3. A simple classification tree.
Here is how a tree is constructed. For simplicity, we focus on
the case in which $\mathcal{Y} = \{0, 1\}$. First, suppose there is only a
single covariate X. We choose a split point t that divides the
real line into two sets $A_1 = (-\infty, t]$ and $A_2 = (t, \infty)$. Let $\hat{p}_s(j)$
be the proportion of observations in $A_s$ such that $Y_i = j$:
$$\hat{p}_s(j) = \frac{\sum_{i=1}^n I(Y_i = j, X_i \in A_s)}{\sum_{i=1}^n I(X_i \in A_s)} \tag{23.28}$$
for s = 1, 2 and j = 0, 1. The impurity of the split t is defined
to be
$$I(t) = \sum_{s=1}^2 \gamma_s \tag{23.29}$$
where
$$\gamma_s = 1 - \sum_{j=0}^1 \hat{p}_s(j)^2. \tag{23.30}$$

FIGURE 23.4. Partition representation of classification tree.

This measure of impurity is known as the Gini index. If a
partition element $A_s$ contains all 0's or all 1's, then $\gamma_s = 0$.
Otherwise, $\gamma_s > 0$. We choose the split point t to minimize the
impurity. (Other indices of impurity can be used besides
the Gini index.)

When there are several covariates, we choose whichever covariate and split that leads to the lowest impurity. This process
is continued until some stopping criterion is met. For example,
we might stop when every partition element has fewer than $n_0$
data points, where $n_0$ is some fixed number. The bottom nodes
of the tree are called the leaves. Each leaf is assigned a 0 or 1
depending on whether there are more data points with Y = 0
or Y = 1 in that partition element.

This procedure is easily generalized to the case where $Y \in
\{1, \ldots, K\}$. We simply define the impurity by
$$\gamma_s = 1 - \sum_{j=1}^K \hat{p}_s(j)^2 \tag{23.31}$$
where $\hat{p}_s(j)$ is the proportion of observations in the partition
element $A_s$ for which Y = j.
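The split search for a single covariate can be sketched as below. The sketch follows the definition in the text, where the impurity of a split is the unweighted sum of the two Gini indices. Restricting candidate split points to midpoints between consecutive observed values is an assumption made here for convenience, and the data and function names are made up.

```python
def gini(labels):
    """gamma = 1 - sum_j p(j)^2 for the labels falling in one partition element."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(j) / n) ** 2 for j in set(labels))

def impurity(xs, ys, t):
    """I(t) = gamma_1 + gamma_2 for A1 = (-inf, t] and A2 = (t, inf)."""
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    return gini(left) + gini(right)

def best_split(xs, ys):
    """Minimize I(t) over midpoints between consecutive distinct x values."""
    pts = sorted(set(xs))
    candidates = [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    return min(candidates, key=lambda t: impurity(xs, ys, t))

# Toy data: a single covariate (think of age) and a binary outcome.
xs = [25, 32, 41, 47, 52, 58, 63, 70]
ys = [0, 0, 0, 0, 1, 0, 1, 1]
t_hat = best_split(xs, ys)
print(t_hat, impurity(xs, ys, t_hat))
```

With several covariates, the same search would be repeated for each covariate and the covariate and split with the lowest impurity would be kept.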
Example 23.13 Figure 23.5 shows a classification tree for the
heart disease data. The misclassification rate is .21. Suppose we
do the same example but only using two covariates: tobacco and
age. The misclassification rate is then .29. The tree is shown in
Figure 23.6. Because there are only two covariates, we can also
display the tree as a partition of $\mathcal{X} = \mathbb{R}^2$ as shown in Figure
23.7.

Our description of how to build trees is incomplete. If we keep
splitting until there are few cases in each leaf of the tree, we are
likely to overfit the data. We should choose the complexity of
the tree in such a way that the estimated true error rate is low.
In the next section, we discuss estimation of the error rate.
FIGURE 23.5. A classification tree for the heart disease data.

FIGURE 23.6. A classification tree for the heart disease data using two covariates.

FIGURE 23.7. The tree pictured as a partition of the covariate space (axes: age and tobacco).

23.8 Assessing Error Rates and Choosing a Good Classifier
How do we choose a good classifier? We would like to have a
classifier h with a low true error rate L(h). Usually, we can't use
the training error rate $\hat{L}_n(h)$ as an estimate of the true error rate
because it is biased downward.

Example 23.14 Consider the heart disease data again. Suppose
we fit a sequence of logistic regression models. In the first model we
include one covariate. In the second model we include two covariates and so on. The ninth model includes all the covariates.
We can go even further. Let's also fit a tenth model that includes all nine covariates plus the first covariate squared. Then
we fit an eleventh model that includes all nine covariates plus the
first covariate squared and the second covariate squared. Continuing this way we will get a sequence of 18 classifiers of increasing complexity. The solid line in Figure 23.8 shows the observed classification error which steadily decreases as we make
the model more complex. If we keep going, we can make a model
with zero observed classification error. The dashed line shows the
cross-validation estimate of the error rate (to be explained
shortly) which is a better estimate of the true error rate than the
observed classification error. The estimated error decreases for
a while then increases. This is the bias-variance tradeoff phenomenon we have seen before.

There are many ways to estimate the error rate. We'll consider
two: cross-validation and probability inequalities.
Cross-Validation. The basic idea of cross-validation, which
we have already encountered in curve estimation, is to leave out
some of the data when fitting a model. The simplest version of
cross-validation involves randomly splitting the data into two
pieces: the training set $\mathcal{T}$ and the validation set $\mathcal{V}$. Often,
about 10 per cent of the data might be set aside as the validation set. The classifier h is constructed from the training set.
We then estimate the error by
$$\hat{L}(h) = \frac{1}{m} \sum_{X_i \in \mathcal{V}} I(h(X_i) \neq Y_i) \tag{23.32}$$
where m is the size of the validation set. See Figure 23.9.

FIGURE 23.8. Solid line is the observed error rate and dashed line is the cross-validation estimate of true error rate (horizontal axis: number of terms in the model; vertical axis: error rate).

FIGURE 23.9. Cross-validation. The data are split into training data $\mathcal{T}$, used to construct $\hat{h}$, and validation data $\mathcal{V}$, used to compute $\hat{L}$.

Another approach to cross-validation is K-fold cross-validation
which is obtained from the following algorithm.
K-fold cross-validation.

1. Randomly divide the data into K chunks of approximately
equal size. A common choice is K = 10.

2. For k = 1 to K do the following:
(a) Delete chunk k from the data.
(b) Compute the classifier $\hat{h}_{(k)}$ from the rest of the data.
(c) Use $\hat{h}_{(k)}$ to predict the data in chunk k. Let $\hat{L}_{(k)}$
denote the observed error rate.

3. Let
$$\hat{L}(h) = \frac{1}{K} \sum_{k=1}^K \hat{L}_{(k)}. \tag{23.33}$$

This is how the dashed line in Figure 23.8 was computed. An alternative approach to K-fold cross-validation is to split the data
into two pieces called the training set and the test set. All
the models are fit on the training set. The models are then used
to do predictions on the test set and the errors are estimated
from these predictions. When the classification procedure is time
consuming and there are lots of data, this approach is preferred
to K-fold cross-validation.
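The K-fold algorithm can be sketched generically as a function that accepts any fitting routine. In the sketch below, the nearest-mean classifier, the simulated data and all names are hypothetical stand-ins chosen only so that the example runs on its own.

```python
import random

def k_fold_error(X, y, fit, K=10, seed=0):
    """K-fold cross-validation estimate of the error rate, equation (23.33).

    `fit(X_train, y_train)` must return a function h with h(x) in {0, 1}.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    chunks = [idx[k::K] for k in range(K)]  # K chunks of nearly equal size
    fold_errors = []
    for k in range(K):
        held_out = set(chunks[k])
        train = [i for i in idx if i not in held_out]           # (a) delete chunk k
        h = fit([X[i] for i in train], [y[i] for i in train])   # (b) fit on the rest
        wrong = sum(1 for i in chunks[k] if h(X[i]) != y[i])    # (c) predict chunk k
        fold_errors.append(wrong / len(chunks[k]))
    return sum(fold_errors) / K

def fit_nearest_mean(X, y):
    """Stand-in classifier for one covariate: pick the class with the closer mean."""
    m0 = sum(x for x, yi in zip(X, y) if yi == 0) / y.count(0)
    m1 = sum(x for x, yi in zip(X, y) if yi == 1) / y.count(1)
    return lambda x: 1 if abs(x - m1) < abs(x - m0) else 0

# Simulated data (an assumption): X | Y = k is N(2k, 1) with balanced classes.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [rng.gauss(2.0 * yi, 1.0) for yi in y]
cv_error = k_fold_error(X, y, fit_nearest_mean, K=10)
print("10-fold CV error:", cv_error)
```

To compare models of different complexity, one would call `k_fold_error` once per candidate fitting routine and plot the results against complexity.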
Cross-validation can be applied to any classification method.
To apply it to trees, one begins by fitting an initial tree. Smaller
trees are obtained by pruning the tree, that is, removing splits from
the tree. We can do this for trees of various sizes where size refers
to the number of terminal nodes on the tree. Cross-validation is
then used to estimate error rate as a function of tree size.

Example 23.15 Figure 23.10 shows the cross-validation plot for
a classification tree fitted to the heart disease data. The error
is shown as a function of tree size. Figure 23.11 shows the tree
with the size chosen by minimizing the cross-validation error.

Probability Inequalities. Another approach to estimating the error rate is to find a confidence interval for the true error rate $L(\hat{h})$, centered at $\hat{L}_n(\hat{h})$, using
probability inequalities. This method is useful in the context of
empirical risk minimization.

Let $\mathcal{H}$ be a set of classifiers, for example, all linear classifiers.
Empirical risk minimization means choosing the classifier $\hat{h} \in \mathcal{H}$
to minimize the training error $\hat{L}_n(h)$, also called the empirical
risk. Thus,
$$\hat{h} = \mathrm{argmin}_{h \in \mathcal{H}} \hat{L}_n(h) = \mathrm{argmin}_{h \in \mathcal{H}} \left( \frac{1}{n} \sum_i I(h(X_i) \neq Y_i) \right). \tag{23.34}$$
Typically, $\hat{L}_n(\hat{h})$ underestimates the true error rate $L(\hat{h})$ because
$\hat{h}$ was chosen to make $\hat{L}_n(\hat{h})$ small. Our goal is to assess how
much underestimation is taking place. Our main tool for this
analysis is Hoeffding's inequality. Recall that if $X_1, \ldots, X_n \sim$ Bernoulli(p), then, for any $\epsilon > 0$,
$$P(|\hat{p} - p| > \epsilon) \le 2 e^{-2 n \epsilon^2} \tag{23.35}$$
where $\hat{p} = n^{-1} \sum_{i=1}^n X_i$.
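First, suppose that $\mathcal{H} = \{h_1, \ldots, h_m\}$ consists of finitely many
classifiers. For any fixed h, $\hat{L}_n(h)$ converges almost surely
to L(h) by the law of large numbers. We will now establish a
stronger result.

Theorem 23.16 (Uniform Convergence.) Assume $\mathcal{H}$ is finite
and has m elements. Then,
$$P\left( \max_{h \in \mathcal{H}} |\hat{L}_n(h) - L(h)| > \epsilon \right) \le 2 m e^{-2 n \epsilon^2}.$$

Proof. We will use Hoeffding's inequality and we will also
use the fact that if $A_1, \ldots, A_m$ is a set of events then $P(\bigcup_{i=1}^m A_i) \le \sum_{i=1}^m P(A_i)$. Now,
$$P\left( \max_{h \in \mathcal{H}} |\hat{L}_n(h) - L(h)| > \epsilon \right) = P\left( \bigcup_{h \in \mathcal{H}} \left\{ |\hat{L}_n(h) - L(h)| > \epsilon \right\} \right) \le \sum_{h \in \mathcal{H}} P\left( |\hat{L}_n(h) - L(h)| > \epsilon \right) \le \sum_{h \in \mathcal{H}} 2 e^{-2 n \epsilon^2} = 2 m e^{-2 n \epsilon^2}.$$

FIGURE 23.10. Cross validation plot (horizontal axis: size; vertical axis: score).

FIGURE 23.11. Smaller classification tree with size chosen by cross-validation.

Theorem 23.17 Let
$$\epsilon = \sqrt{\frac{2}{n} \log\left( \frac{2m}{\alpha} \right)}.$$
Then $\hat{L}_n(\hat{h}) \pm \epsilon$ is a $1 - \alpha$ confidence interval for $L(\hat{h})$.

Proof. This follows from the fact that
$$P(|\hat{L}_n(\hat{h}) - L(\hat{h})| > \epsilon) \le P\left( \max_{h \in \mathcal{H}} |\hat{L}_n(h) - L(h)| > \epsilon \right) \le 2 m e^{-2 n \epsilon^2} \le \alpha.$$
(With this choice of $\epsilon$, $2 m e^{-2 n \epsilon^2} = 2m (\alpha / 2m)^4$, which is at most $\alpha$. The smaller value $\epsilon = \sqrt{\log(2m/\alpha) / (2n)}$ makes the bound exactly $\alpha$.)

When $\mathcal{H}$ is large the confidence interval for $L(\hat{h})$ is large.
The more functions there are in $\mathcal{H}$ the more likely it is we have
"overfit" which we compensate for by having a larger confidence
interval.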
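To get a feel for the size of these intervals, the following sketch evaluates the $\epsilon$ of Theorem 23.17 and checks it against the bound of Theorem 23.16. The values of n, m and $\alpha$ are hypothetical (n = 462 simply echoes the size of the heart disease data), and the function names are illustrative.

```python
import math

def finite_class_epsilon(n, m, alpha):
    """epsilon = sqrt((2/n) log(2m/alpha)), as in Theorem 23.17."""
    return math.sqrt((2.0 / n) * math.log(2.0 * m / alpha))

def uniform_bound(n, m, eps):
    """Right-hand side of Theorem 23.16: 2 m exp(-2 n eps^2)."""
    return 2.0 * m * math.exp(-2.0 * n * eps ** 2)

# Hypothetical setting: 462 observations, 1000 candidate classifiers.
n, m, alpha = 462, 1000, 0.05
eps = finite_class_epsilon(n, m, alpha)
print("half-width:", eps)
print("bound on the failure probability:", uniform_bound(n, m, eps))
```

Because m enters only through a logarithm, multiplying the number of classifiers by ten widens the interval only modestly.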
In practice we usually use sets $\mathcal{H}$ that are infinite, such as the
set of linear classifiers. To extend our analysis to these cases we
want to be able to say something like
$$P\left( \sup_{h \in \mathcal{H}} |\hat{L}_n(h) - L(h)| > \epsilon \right) \le \text{something not too big}.$$
All the other results followed from this inequality. One way
to develop such a generalization is by way of the Vapnik-Chervonenkis or VC dimension. We now consider the main
ideas in VC theory.

Let $\mathcal{A}$ be a class of sets. Given a finite set $F = \{x_1, \ldots, x_n\}$ let
$$N_{\mathcal{A}}(F) = \# \left\{ F \cap A : A \in \mathcal{A} \right\} \tag{23.36}$$
be the number of subsets of F "picked out" by $\mathcal{A}$. Here #(B)
denotes the number of elements of a set B. The shatter coefficient is defined by
$$s(\mathcal{A}, n) = \max_{F \in \mathcal{F}_n} N_{\mathcal{A}}(F) \tag{23.37}$$
where $\mathcal{F}_n$ consists of all finite sets of size n. Now let $X_1, \ldots, X_n \sim P$ and let
$$P_n(A) = \frac{1}{n} \sum_i I(X_i \in A)$$
denote the empirical probability measure. The following remarkable theorem bounds the distance between P and $P_n$.

Theorem 23.18 (Vapnik and Chervonenkis (1971).) For any
P, n and $\epsilon > 0$,
$$P\left( \sup_{A \in \mathcal{A}} |P_n(A) - P(A)| > \epsilon \right) \le 8 s(\mathcal{A}, n) e^{-n \epsilon^2 / 32}. \tag{23.38}$$

The proof, though very elegant, is long and we omit it. If $\mathcal{H}$
is a set of classifiers, define $\mathcal{A}$ to be the class of sets of the form
$\{x : h(x) = 1\}$. We then define $s(\mathcal{H}, n) = s(\mathcal{A}, n)$.
Theorem 23.19
$$P\left( \sup_{h \in \mathcal{H}} |\hat{L}_n(h) - L(h)| > \epsilon \right) \le 8 s(\mathcal{H}, n) e^{-n \epsilon^2 / 32}.$$
A $1 - \alpha$ confidence interval for $L(\hat{h})$ is $\hat{L}_n(\hat{h}) \pm \epsilon_n$ where
$$\epsilon_n^2 = \frac{32}{n} \log\left( \frac{8 s(\mathcal{H}, n)}{\alpha} \right).$$

These theorems are only useful if the shatter coefficients do
not grow too quickly with n. This is where VC dimension enters.

Definition 23.20 The VC (Vapnik-Chervonenkis) dimension of a class of sets $\mathcal{A}$ is defined as follows. If $s(\mathcal{A}, n) =
2^n$ for all n set $VC(\mathcal{A}) = \infty$. Otherwise, define $VC(\mathcal{A})$
to be the largest k for which $s(\mathcal{A}, k) = 2^k$.

Thus, the VC-dimension is the size of the largest finite set
F that can be shattered by $\mathcal{A}$, meaning that $\mathcal{A}$ picks out each
subset of F. If $\mathcal{H}$ is a set of classifiers we define $VC(\mathcal{H}) = VC(\mathcal{A})$
where $\mathcal{A}$ is the class of sets of the form $\{x : h(x) = 1\}$ as h
varies in $\mathcal{H}$. The following theorem shows that if $\mathcal{A}$ has finite
VC-dimension then the shatter coefficients grow as a polynomial
in n.
Theorem 23.21 If $\mathcal{A}$ has finite VC-dimension v, then
$$s(\mathcal{A}, n) \le n^v + 1.$$

Example 23.22 Let $\mathcal{A} = \{(-\infty, a]; \ a \in \mathbb{R}\}$. Then $\mathcal{A}$ shatters
every one point set $\{x\}$ but it shatters no set of the form $\{x, y\}$.
Therefore, $VC(\mathcal{A}) = 1$.

Example 23.23 Let $\mathcal{A}$ be the set of closed intervals on the real
line. Then $\mathcal{A}$ shatters $S = \{x, y\}$ but it cannot shatter sets with
3 points. Consider $S = \{x, y, z\}$ where x < y < z. One cannot
find an interval A such that $A \cap S = \{x, z\}$. So, $VC(\mathcal{A}) = 2$.

Example 23.24 Let $\mathcal{A}$ be all linear half-spaces on the plane. Any
three point set (not all on a line) can be shattered. No 4 point
set can be shattered. Consider, for example, 4 points forming a
diamond. Let T be the left and rightmost points. This can't be
picked out. Other configurations can also be seen to be unshatterable. So $VC(\mathcal{A}) = 3$. In general, halfspaces in $\mathbb{R}^d$ have VC
dimension d + 1.

Example 23.25 Let $\mathcal{A}$ be all rectangles on the plane with sides
parallel to the axes. A suitably placed 4 point set (for example, the corners of a diamond) can be shattered. Let S be
a 5 point set. There is one point that is not leftmost, rightmost,
uppermost, or lowermost. Let T be all points in S except this
point. Then T can't be picked out. So $VC(\mathcal{A}) = 4$.

Theorem 23.26 Let x have dimension d and let $\mathcal{H}$ be the set of
linear classifiers. The VC dimension of $\mathcal{H}$ is d + 1. Hence, a $1 - \alpha$ confidence interval for the true error rate is $\hat{L}_n(\hat{h}) \pm \epsilon_n$ where
$$\epsilon_n^2 = \frac{32}{n} \log\left( \frac{8 (n^{d+1} + 1)}{\alpha} \right).$$
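Shattering can be checked by brute force for simple classes. The sketch below does this for the closed intervals of Example 23.23: it enumerates every subset of a finite point set that some interval picks out. It relies on the observation, assumed here, that an interval picks out exactly the contiguous runs of the sorted points, plus the empty set. The function names are illustrative.

```python
def picked_out_by_intervals(points):
    """All subsets of `points` of the form [a, b] intersected with `points`."""
    pts = sorted(points)
    subsets = {frozenset()}  # an interval that misses every point
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            subsets.add(frozenset(pts[i:j + 1]))  # a contiguous run of points
    return subsets

def is_shattered(points):
    """True if intervals pick out all 2^n subsets of the n points."""
    return len(picked_out_by_intervals(points)) == 2 ** len(points)

print(is_shattered([1.0, 2.0]))        # two points
print(is_shattered([1.0, 2.0, 3.0]))   # three points
print(len(picked_out_by_intervals([1.0, 2.0, 3.0, 4.0])))
```

For n distinct points the count is n(n + 1)/2 + 1, which stays below the bound $n^2 + 1$ that Theorem 23.21 gives for VC dimension 2.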
23.9 Support Vector Machines
In this section we consider a class of linear classifiers called support vector machines. Throughout this section, we assume
that Y is binary. It will be convenient to label the outcomes as
$-1$ and $+1$ instead of 0 and 1. A linear classifier can then be
written as
$$h(x) = \mathrm{sign}(H(x))$$
where $x = (x_1, \ldots, x_d)$,
$$H(x) = a_0 + \sum_{i=1}^d a_i x_i$$
and
$$\mathrm{sign}(z) = \begin{cases} -1 & \text{if } z < 0 \\ 0 & \text{if } z = 0 \\ 1 & \text{if } z > 0. \end{cases}$$
First, suppose that the data are linearly separable, that
is, there exists a hyperplane that perfectly separates the two
classes.
Lemma 23.27 The data can be separated by some hyperplane if
and only if there exists a hyperplane $H(x) = a_0 + \sum_{i=1}^d a_i x_i$ such
that
$$Y_i H(x_i) \ge 1, \quad i = 1, \ldots, n. \tag{23.39}$$

Proof. Suppose the data can be separated by a hyperplane
$W(x) = b_0 + \sum_{i=1}^d b_i x_i$. It follows that there exists some constant
c > 0 such that $Y_i = 1$ implies $W(X_i) \ge c$ and $Y_i = -1$ implies
$W(X_i) \le -c$. Therefore, $Y_i W(X_i) \ge c$ for all i. Let $H(x) =
a_0 + \sum_{i=1}^d a_i x_i$ where $a_j = b_j / c$. Then $Y_i H(X_i) \ge 1$ for all i. The
reverse direction is straightforward.

In the separable case, there will be many separating hyperplanes. How should we choose one? Intuitively, it seems reasonable to choose the hyperplane "furthest" from the data in the
sense that it separates the +1's and -1's and maximizes the distance to the closest point. This hyperplane is called the maximum margin hyperplane. The margin is the distance from
the hyperplane to the nearest point. Points on the boundary of
the margin are called support vectors. See Figure 23.12.
Theorem 23.28 The hyperplane $\hat{H}(x) = \hat{a}_0 + \sum_{i=1}^d \hat{a}_i x_i$
that separates the data and maximizes the margin is given by minimizing
$(1/2) \sum_{j=1}^d a_j^2$ subject to (23.39).

It turns out that this problem can be recast as a quadratic
programming problem. Recall that $\langle X_i, X_k \rangle = X_i^T X_k$ is the inner product of $X_i$ and $X_k$.

Theorem 23.29 Let $\hat{H}(x) = \hat{a}_0 + \sum_{i=1}^d \hat{a}_i x_i$ denote the optimal
(largest margin) hyperplane. Then, for j = 1, ..., d,
$$\hat{a}_j = \sum_{i=1}^n \hat{\alpha}_i Y_i X_j(i)$$
where $X_j(i)$ is the value of the covariate $X_j$ for the ith data
point, and $\hat{\alpha} = (\hat{\alpha}_1, \ldots, \hat{\alpha}_n)$ is the vector that maximizes
$$\sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{k=1}^n \alpha_i \alpha_k Y_i Y_k \langle X_i, X_k \rangle \tag{23.40}$$
subject to
$$\alpha_i \ge 0$$
and
$$0 = \sum_i \alpha_i Y_i.$$
The points $X_i$ for which $\hat{\alpha}_i \neq 0$ are called support vectors. $\hat{a}_0$
can be found by solving
$$\hat{\alpha}_i \left( Y_i (X_i^T \hat{a} + \hat{a}_0) - 1 \right) = 0$$
for any support point $X_i$. $\hat{H}$ may be written as
$$\hat{H}(x) = \hat{a}_0 + \sum_{i=1}^n \hat{\alpha}_i Y_i \langle x, X_i \rangle.$$

FIGURE 23.12. The hyperplane $H(x) = a_0 + a^T x = 0$ has the largest margin of all
hyperplanes that separate the two classes.
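The relationship between the dual solution and the hyperplane can be checked on a problem small enough to solve by hand. In the sketch below the two data points and the value of $\hat{\alpha}$ are assumptions made for illustration: with one point per class, the constraint $\sum_i \alpha_i Y_i = 0$ forces $\alpha_1 = \alpha_2 = \alpha$, and (23.40) reduces to $2\alpha - 4\alpha^2$, which is maximized at $\alpha = 1/4$. The intercept is recovered from $Y_i H(X_i) = 1$ at a support vector. No quadratic programming solver is used here.

```python
def inner(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha, X, Y):
    """Objective (23.40) evaluated at alpha."""
    n = len(X)
    quad = sum(alpha[i] * alpha[k] * Y[i] * Y[k] * inner(X[i], X[k])
               for i in range(n) for k in range(n))
    return sum(alpha) - 0.5 * quad

def hyperplane_from_alpha(alpha, X, Y):
    """Recover (a0, a): a_j = sum_i alpha_i Y_i X_j(i), then a0 from a support vector."""
    d = len(X[0])
    a = [sum(alpha[i] * Y[i] * X[i][j] for i in range(len(X))) for j in range(d)]
    s = next(i for i, al in enumerate(alpha) if al > 0)  # index of a support vector
    a0 = Y[s] - inner(a, X[s])  # from Y_s (X_s^T a + a0) = 1 with Y_s in {-1, +1}
    return a0, a

# Two-point toy problem (made up); alpha_hat was worked out by hand as described above.
X = [(1.0, 1.0), (-1.0, -1.0)]
Y = [1, -1]
alpha_hat = [0.25, 0.25]
a0, a = hyperplane_from_alpha(alpha_hat, X, Y)
margins = [Y[i] * (a0 + inner(a, X[i])) for i in range(len(X))]
print(a0, a, margins)
```

For real data one would hand (23.40) and its constraints to a quadratic programming routine rather than solve it by hand.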
There are many software packages that will solve this problem
quickly. If there is no perfect linear classifier, then one allows
overlap between the groups by replacing the condition (23.39)
with
$$Y_i H(x_i) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n. \tag{23.41}$$
The variables $\xi_1, \ldots, \xi_n$ are called slack variables.
We now maximize (23.40) subject to
$$0 \le \alpha_i \le c, \quad i = 1, \ldots, n$$
and
$$\sum_{i=1}^n \alpha_i Y_i = 0.$$
The constant c is a tuning parameter that controls the amount
of overlap.
23.10 Kernelization
There is a trick for improving a computationally simple classifier
h called kernelization. The idea is to map the covariate X,
which takes values in $\mathcal{X}$, into a higher dimensional space $\mathcal{Z}$ and
apply the classifier in the bigger space $\mathcal{Z}$. This can yield a more
flexible classifier while retaining computational simplicity.

The standard example of this idea is illustrated in Figure
23.13. The covariate is $x = (x_1, x_2)$. The $Y_i$'s can be separated
into two groups using an ellipse. Define a mapping $\phi$ by
$$z = (z_1, z_2, z_3) = \phi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2).$$
This maps $\mathcal{X} = \mathbb{R}^2$ into $\mathcal{Z} = \mathbb{R}^3$. In the higher dimensional
space $\mathcal{Z}$, the $Y_i$'s are separable by a linear decision boundary.
In other words, a linear classifier in a higher dimensional space
corresponds to a non-linear classifier in the original space.
FIGURE 23.13. Kernelization. Mapping the covariates into a higher
dimensional space can make a complicated decision boundary into a
simpler decision boundary.
There is a potential drawback. If we significantly expand the
dimension of the problem, we might increase the computational
burden. For example, if x has dimension d = 256 and we wanted
to use all fourth order terms, then $z = \phi(x)$ has dimension
183,181,376. What saves us is two observations. First, many
classifiers do not require that we know the values of the individual points but, rather, just the inner product between pairs
of points. Second, notice in our example that the inner product
in $\mathcal{Z}$ can be written
$$\langle z, \tilde{z} \rangle = \langle \phi(x), \phi(\tilde{x}) \rangle = x_1^2 \tilde{x}_1^2 + 2 x_1 \tilde{x}_1 x_2 \tilde{x}_2 + x_2^2 \tilde{x}_2^2 = (\langle x, \tilde{x} \rangle)^2 \equiv K(x, \tilde{x}).$$
Thus, we can compute $\langle z, \tilde{z} \rangle$ without ever computing $Z_i =
\phi(X_i)$.
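Both observations are easy to verify numerically. The sketch below checks, for one arbitrary pair of points, that the inner product of the mapped points equals the squared inner product of the originals, and it reproduces the dimension quoted above as the number of degree-4 monomials in 256 variables. The test points and function names are illustrative.

```python
import math

def phi(x):
    """Feature map from the text: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def kernel(x, x_tilde):
    """K(x, x~) = <x, x~>^2, computed without ever forming phi."""
    return dot(x, x_tilde) ** 2

x, x_tilde = (1.5, -2.0), (0.5, 3.0)
lhs = dot(phi(x), phi(x_tilde))   # inner product in the bigger space
rhs = kernel(x, x_tilde)          # same number from the original space
print(lhs, rhs)

# Number of monomials of degree exactly 4 in d = 256 variables.
dimension = math.comb(256 + 4 - 1, 4)
print(dimension)
```

The kernel evaluation costs one inner product in $\mathbb{R}^d$ no matter how large the implicit feature space is, which is what makes the trick worthwhile.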
To summarize, kernelization means involves nding a mapping : X ! Z and a classier such that:
1.
Z has higher dimension than X
of classiers.
and so leads a richer set
2. The classier only requires computing inner products.
3. There is a function K , called a kernel, such that h(x); (xe)i =
K (x; xe).
4. Everywhere the term hx; xei appears in the algorithm, replace it with K (x; xe).
In fact, we never need to construct the mapping $\phi$ at all. We only need to specify a kernel $K(x, \tilde x)$ that corresponds to $\langle \phi(x), \phi(\tilde x)\rangle$ for some $\phi$. This raises an interesting question: given a function of two variables $K(x, y)$, does there exist a function $\phi(x)$ such that $K(x, y) = \langle \phi(x), \phi(y)\rangle$? The answer is provided by Mercer's theorem which says, roughly, that if $K$ is positive definite, meaning that
$$\int\!\!\int K(x, y) f(x) f(y)\, dx\, dy \ge 0$$
for square integrable functions $f$, then such a $\phi$ exists. Examples of commonly used kernels are:
polynomial: $K(x, \tilde x) = (\langle x, \tilde x\rangle + a)^r$
sigmoid: $K(x, \tilde x) = \tanh(a\langle x, \tilde x\rangle + b)$
Gaussian: $K(x, \tilde x) = \exp\left(-\|x - \tilde x\|^2 / (2\sigma^2)\right)$
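As a rough illustration, the three kernels can be written as plain functions, and the positive definiteness condition can be probed with a finite sum in place of the double integral. The default parameter values, the random points and the helper `quadratic_form` are assumptions made for this sketch only.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(x, y, a=1.0, r=2):
    return (dot(x, y) + a) ** r

def sigmoid_kernel(x, y, a=1.0, b=0.0):
    return math.tanh(a * dot(x, y) + b)

def gaussian_kernel(x, y, sigma=1.0):
    sq_dist = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def quadratic_form(kernel, points, c):
    """Discrete analogue of the double integral: sum_i sum_j c_i c_j K(x_i, x_j)."""
    return sum(ci * cj * kernel(xi, xj)
               for ci, xi in zip(c, points)
               for cj, xj in zip(c, points))

rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(8)]
c = [rng.uniform(-1, 1) for _ in range(8)]

q_gauss = quadratic_form(gaussian_kernel, points, c)
q_poly = quadratic_form(polynomial_kernel, points, c)
print(q_gauss, q_poly)   # both should be non-negative, up to rounding
```

A non-negative value for one choice of coefficients does not prove positive definiteness; it is only a sanity check of the condition in Mercer's theorem.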
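Let us now see how we can use this trick in LDA and in support vector machines.
Recall that the Fisher linear discriminant method replaces $X$ with $U = w^T X$ where $w$ is chosen to maximize the Rayleigh coefficient
$$J(w) = \frac{w^T S_B w}{w^T S_W w},$$
where
$$S_B = (\overline{X}_0 - \overline{X}_1)(\overline{X}_0 - \overline{X}_1)^T$$
and
$$S_W = \frac{(n_0 - 1) S_0}{(n_0 - 1) + (n_1 - 1)} + \frac{(n_1 - 1) S_1}{(n_0 - 1) + (n_1 - 1)}.$$
In the kernelized version, we replace $X_i$ with $Z_i = \phi(X_i)$ and we find $w$ to maximize
$$J(w) = \frac{w^T \widetilde{S}_B w}{w^T \widetilde{S}_W w} \tag{23.42}$$
where
$$\widetilde{S}_B = (\overline{Z}_0 - \overline{Z}_1)(\overline{Z}_0 - \overline{Z}_1)^T$$
and
$$\widetilde{S}_W = \frac{(n_0 - 1) \widetilde{S}_0}{(n_0 - 1) + (n_1 - 1)} + \frac{(n_1 - 1) \widetilde{S}_1}{(n_0 - 1) + (n_1 - 1)}.$$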
Here, $\widetilde{S}_j$ is the sample covariance of the $Z_i$'s for which $Y = j$. However, to take advantage of kernelization, we need to re-express this in terms of inner products and then replace the inner products with kernels.
It can be shown that the maximizing vector $w$ is a linear combination of the $Z_i$'s. Hence we can write
$$w = \sum_{i=1}^n \alpha_i Z_i.$$
Also,
$$\overline{Z}_j = \frac{1}{n_j}\sum_{i=1}^n \phi(X_i) I(Y_i = j).$$
Therefore,
$$\begin{aligned}
w^T \overline{Z}_j &= \left(\sum_{i=1}^n \alpha_i Z_i\right)^T \left(\frac{1}{n_j}\sum_{s=1}^n \phi(X_s) I(Y_s = j)\right) \\
&= \frac{1}{n_j}\sum_{i=1}^n \sum_{s=1}^n \alpha_i I(Y_s = j)\, Z_i^T \phi(X_s) \\
&= \frac{1}{n_j}\sum_{i=1}^n \alpha_i \sum_{s=1}^n I(Y_s = j)\, \phi(X_i)^T \phi(X_s) \\
&= \frac{1}{n_j}\sum_{i=1}^n \alpha_i \sum_{s=1}^n I(Y_s = j)\, K(X_i, X_s) \\
&= \alpha^T M_j
\end{aligned}$$
where $M_j$ is a vector whose $i$th component is
$$M_j(i) = \frac{1}{n_j}\sum_{s=1}^n K(X_i, X_s) I(Y_s = j).$$
It follows that
$$w^T \widetilde{S}_B w = \alpha^T M \alpha$$
where $M = (M_0 - M_1)(M_0 - M_1)^T$. By similar calculations, we can write
$$w^T \widetilde{S}_W w = \alpha^T N \alpha$$
where
$$N = K_0\left(I - \frac{1}{n_0}\mathbf{1}\right)K_0^T + K_1\left(I - \frac{1}{n_1}\mathbf{1}\right)K_1^T,$$
$I$ is the identity matrix, $\mathbf{1}$ is a matrix of all ones, and $K_j$ is the $n \times n_j$ matrix with entries $(K_j)_{rs} = K(x_r, x_s)$ with $x_s$ varying over the observations in group $j$. Hence, we now find $\alpha$ to maximize
$$J(\alpha) = \frac{\alpha^T M \alpha}{\alpha^T N \alpha}.$$
Notice that all the quantities are expressed in terms of the kernel. Formally, the solution is $\alpha = N^{-1}(M_0 - M_1)$. However, $N$ might be non-invertible. In this case one replaces $N$ by $N + bI$ for some constant $b$. Finally, the projection onto the new subspace can be written as
$$U = w^T \phi(x) = \sum_{i=1}^n \alpha_i K(x_i, x).$$
The support vector machine can similarly be kernelized. We simply replace $\langle X_i, X_j\rangle$ with $K(X_i, X_j)$. For example, instead of maximizing (23.40), we now maximize
$$\sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i=1}^n \sum_{k=1}^n \alpha_i \alpha_k Y_i Y_k K(X_i, X_k). \tag{23.43}$$
The hyperplane can be written as $\widehat{H}(x) = \widehat{a}_0 + \sum_{i=1}^n \widehat{\alpha}_i Y_i K(x, X_i)$.
23.11 Other Classifiers
There are many other classifiers and space precludes a full discussion of all of them. Let us briefly mention a few.
The k-nearest-neighbors classifier is very simple. Given a point $x$, find the $k$ data points closest to $x$. Classify $x$ using the majority vote of these $k$ neighbors. Ties can be broken randomly. The parameter $k$ can be chosen by cross-validation.
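A minimal sketch of the rule just described, assuming Euclidean distance (the text does not fix a distance) and a small made-up training set:

```python
import math
import random
from collections import Counter

def knn_classify(x, data, labels, k, rng=random):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    order = sorted(range(len(data)), key=lambda i: math.dist(x, data[i]))
    votes = Counter(labels[i] for i in order[:k])
    top = max(votes.values())
    winners = sorted(label for label, count in votes.items() if count == top)
    return rng.choice(winners)   # ties are broken at random

# Hypothetical training data: two well separated groups.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_classify((0.15, 0.15), train_x, train_y, k=3))
print(knn_classify((0.95, 1.0), train_x, train_y, k=3))
```

Choosing $k$ by cross-validation would wrap this function in a loop over candidate values of $k$ and held-out folds, which is omitted here.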
Bagging is a method for reducing the variability of a classifier. It is most helpful for highly non-linear classifiers such as a tree. We draw $B$ bootstrap samples from the data. The $b$th bootstrap sample yields a classifier $h_b$. The final classifier is
$$\widehat{h}(x) = \begin{cases} 1 & \text{if } \frac{1}{B}\sum_{b=1}^B h_b(x) \ge \frac{1}{2} \\ 0 & \text{otherwise.} \end{cases}$$
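The bagging rule can be sketched as follows. The base classifier here is a one-split tree (a stump) on a single covariate, chosen only to keep the example short; the data are simulated and all names are illustrative.

```python
import random

def fit_stump(xs, ys):
    """Best one-split rule on 1-D data: predict `above` when x > c, else 1 - above."""
    best = None
    for c in sorted(set(xs)):
        for above in (0, 1):
            errors = sum((above if x > c else 1 - above) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, c, above)
    _, c, above = best
    return lambda x: above if x > c else 1 - above

def bagged_classifier(xs, ys, B, rng):
    """Fit one stump per bootstrap sample; predict 1 when the average vote is >= 1/2."""
    n = len(xs)
    stumps = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]   # n draws with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: 1 if sum(h(x) for h in stumps) / B >= 0.5 else 0

rng = random.Random(1)
xs = [rng.uniform(0, 1) for _ in range(200)]
ys = [1 if x > 0.6 else 0 for x in xs]      # simulated labels with a split at 0.6
h_hat = bagged_classifier(xs, ys, B=25, rng=rng)
print(h_hat(0.1), h_hat(0.95))
```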
Boosting is a method for starting with a simple classifier and gradually improving it by refitting the data giving higher weight to misclassified samples. Suppose that $\mathcal{H}$ is a collection of classifiers, for example, trees with only one split. Assume that $Y_i \in \{-1, 1\}$ and that each $h$ is such that $h(x) \in \{-1, 1\}$. We usually give equal weight to all data points in the methods we have discussed. But one can incorporate unequal weights quite easily in most algorithms. For example, in constructing a tree, we could replace the impurity measure with a weighted impurity measure. The original version of boosting, called AdaBoost, is as follows.
1. Set the weights $w_i = 1/n$, $i = 1, \ldots, n$.
2. For $j = 1, \ldots, J$, do the following steps:
(a) Construct a classifier $h_j$ from the data using the weights $w_1, \ldots, w_n$.
(b) Compute the weighted error estimate:
$$\widehat{L}_j = \frac{\sum_{i=1}^n w_i I(Y_i \neq h_j(X_i))}{\sum_{i=1}^n w_i}.$$
(c) Let $\alpha_j = \log((1 - \widehat{L}_j)/\widehat{L}_j)$.
(d) Update the weights:
$$w_i \leftarrow w_i\, e^{\alpha_j I(Y_i \neq h_j(X_i))}.$$
3. The final classifier is
$$\widehat{h}(x) = \text{sign}\left(\sum_{j=1}^J \alpha_j h_j(x)\right).$$
There is now an enormous literature trying to explain and improve on boosting. Whereas bagging is a variance reduction technique, boosting can be thought of as a bias reduction technique. We start with a simple (and hence highly biased) classifier, and we gradually reduce the bias. The disadvantage of boosting is that the final classifier is quite complicated.
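The AdaBoost steps can be sketched in a few lines. This is an illustration under assumptions that are not in the text: a single covariate, one-split classifiers as the collection $\mathcal{H}$, simulated data, and a small guard that keeps the weighted error away from 0 and 1 so the logarithm is defined.

```python
import math
import random

def fit_weighted_stump(xs, ys, w):
    """One-split classifier with values in {-1, +1} minimizing the weighted error."""
    best = None
    for c in sorted(set(xs)):
        for s in (-1, 1):
            err = sum(wi for x, y, wi in zip(xs, ys, w) if (s if x > c else -s) != y)
            if best is None or err < best[0]:
                best = (err, c, s)
    _, c, s = best
    return lambda x: s if x > c else -s

def adaboost(xs, ys, J):
    n = len(xs)
    w = [1.0 / n] * n                                        # step 1
    ensemble = []
    for _ in range(J):                                       # step 2
        h = fit_weighted_stump(xs, ys, w)                    # (a)
        miss = [h(x) != y for x, y in zip(xs, ys)]
        L = sum(wi for wi, m in zip(w, miss) if m) / sum(w)  # (b)
        L = min(max(L, 1e-10), 1 - 1e-10)                    # guard for log; not in the text
        alpha = math.log((1 - L) / L)                        # (c)
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]  # (d)
        ensemble.append((alpha, h))
    # step 3: sign of the weighted vote (a vote of exactly 0 is sent to +1)
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

rng = random.Random(0)
xs = [rng.uniform(-2, 2) for _ in range(150)]
ys = [1 if abs(x) < 1 else -1 for x in xs]   # no single split separates these labels
h_hat = adaboost(xs, ys, J=20)
train_error = sum(h_hat(x) != y for x, y in zip(xs, ys)) / len(xs)
print(train_error)
```

No single stump can classify these labels well, since the positive class is an interval, so any improvement in training error comes from combining the weighted stumps.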
An excellent reference on classification is Hastie, Tibshirani and Friedman (2001). For more on the theory, see Devroye, Györfi and Lugosi (1996) and Vapnik (2001).
23.13 Exercises
3. Download the spam data from:
http://www-stat.stanford.edu/tibs/ElemStatLearn/index.html
The data file can also be found on the course web page. The data contain 57 covariates relating to email messages. Each email message was classified as spam (Y=1) or not spam (Y=0). The outcome Y is the last column in the file. The goal is to predict whether an email is spam or not.
(a) Construct classification rules using (i) LDA, (ii) QDA, (iii) logistic regression and (iv) a classification tree. For each, report the observed misclassification error rate and construct a 2-by-2 table of the form

          $\widehat{h}(x) = 0$    $\widehat{h}(x) = 1$
Y = 0
Y = 1

Hint: Here are some things to remember when using R.
(i) To fit a logistic regression and get the classified values:
d <- data.frame(x,y)
n <- nrow(x)
out <- glm(y ~ .,family=binomial,data=d)
p <- out$fitted.values
pred <- rep(0,n)
pred[p > .5] <- 1
table(y,pred)
(ii) To build a tree you must turn y into a factor and you
must do it before you make the data frame:
y <- as.factor(y)
d <- data.frame(x,y)
library(tree)
out <- tree(y ~ .,data=d)
print(summary(out))
plot(out,type="u",lwd=3)
text(out)
pred <- predict(out,d,type="class")
table(y,pred)
(b) Use 5-fold cross-validation to estimate the prediction accuracy of LDA and logistic regression.
(c) Sometimes it helps to reduce the number of covariates. One strategy is to compare $X_i$ for the spam and email group. For each of the 57 covariates, test whether the mean of the covariate is the same or different between the two groups. Keep the 10 covariates with the smallest p-values. Try LDA and logistic regression using only these 10 variables.
4. Let $\mathcal{A}$ be the set of two-dimensional spheres. That is, $A \in \mathcal{A}$ if $A = \{(x, y) : (x - a)^2 + (y - b)^2 \le c^2\}$ for some $a, b, c$. Find the VC dimension of $\mathcal{A}$.
5. Classify the spam data using support vector machines. Free software for the support vector machine is at
http://svmlight.joachims.org/
It is very easy to use. After installing, do the following. First, you must recode the $Y_i$'s as +1 and -1. Then type
svm_learn data output
where "data" is the file with the data. It expects the data in the following form:
1 1:3.4 2:.17 3:129 etc
where the first number is Y (+1 or -1) and the rest of the line means: X1 is 3.4, X2 is .17, etc. If the file "test" contains test data, then type
svm_classify test output predictions
to get the predictions on the test data. There are many options available, as explained on the web site.
6. Use VC theory to get a confidence interval on the true error rate of the LDA classifier in question 3. Suppose that the covariate $X = (X_1, \ldots, X_d)$ has $d$ dimensions. Assume that each variable has a continuous distribution.
7. Suppose that $X_i \in \mathbb{R}$ and that $Y_i = 1$ whenever $|X_i| \le 1$ and $Y_i = 0$ whenever $|X_i| > 1$. Show that no linear classifier can perfectly classify these data. Show that the kernelized data $Z_i = (X_i, X_i^2)$ can be linearly separated.
8. Repeat question 5 using the kernel $K(x, \tilde x) = (1 + x^T \tilde x)^p$. Choose $p$ by cross-validation.
9. Apply the k nearest neighbors classifier to the "iris data." Get the data in R from the command data(iris). Choose k by cross-validation.
10. (Curse of Dimensionality.) Suppose that $X$ has a uniform distribution on the $d$-dimensional cube $[-1/2, 1/2]^d$. Let $R$ be the distance from the origin to the closest neighbor. Show that the median of $R$ is
$$\left(\frac{1 - \left(\frac{1}{2}\right)^{1/n}}{v_d(1)}\right)^{1/d}$$
where
$$v_d(r) = r^d\, \frac{\pi^{d/2}}{\Gamma((d/2) + 1)}$$
is the volume of a sphere of radius $r$. When does $R$ exceed the edge of the cube when $n = 100, 1000, 10000$?
11. Fit a tree to the data in question 3. Now apply bagging and report your results.
12. Fit a tree that uses only one split on one variable to the data in question 3. Now apply boosting.
13. Let $r(x) = \mathbb{P}(Y = 1 | X = x)$ and let $\widehat{r}(x)$ be an estimate of $r(x)$. Consider the classifier
$$h(x) = \begin{cases} 1 & \text{if } \widehat{r}(x) \ge 1/2 \\ 0 & \text{otherwise.} \end{cases}$$
Assume that $\widehat{r}(x) \approx N(r(x), \sigma^2(x))$ for some functions $r(x)$ and $\sigma^2(x)$. Show that
$$\mathbb{P}(Y \neq h(X)) \approx 1 - \Phi\left(\frac{\text{sign}(\widehat{r}(x) - (1/2))\,(r(x) - (1/2))}{\sigma(x)}\right)$$
where $\Phi$ is the standard Normal cdf. Regard $\text{sign}(\widehat{r}(x) - (1/2))(r(x) - (1/2))$ as a type of bias term. Explain the implications for the bias-variance trade-off in classification (Friedman, 1997).
24 Stochastic Processes
24.1 Introduction
Most of this book has focused on iid sequences of random variables. Now we consider sequences of dependent random variables. For example, daily temperatures will form a sequence of time-ordered random variables and clearly the temperature on one day is not independent of the temperature on the previous day.
A stochastic process $\{X_t : t \in T\}$ is a collection of random variables. We shall sometimes write $X(t)$ instead of $X_t$. The variables $X_t$ take values in some set $\mathcal{X}$ called the state space. The set $T$ is called the index set and for our purposes can be thought of as time. The index set can be discrete, $T = \{0, 1, 2, \ldots\}$, or continuous, $T = [0, \infty)$, depending on the application.
Example 24.1 (iid observations.) A sequence of iid random variables can be written as $\{X_t : t \in T\}$ where $T = \{1, 2, 3, \ldots\}$.
Thus, a sequence of iid random variables is an example of a stochastic process.
Example 24.2 (The Weather.) Let $\mathcal{X} = \{\text{sunny}, \text{cloudy}\}$. A typical sequence (depending on where you live) might be
sunny, sunny, cloudy, sunny, cloudy, cloudy, ...
This process has a discrete state space and a discrete index set.
Example 24.3 (Stock Prices.) Figure 24.1 shows the price of a stock over time. The price is monitored continuously so the index set $T$ is not discrete. Price is discrete but, for practical purposes, we can imagine treating it as a continuous variable.
Example 24.4 (Empirical Distribution Function.) Let $X_1, \ldots, X_n \sim F$ where $F$ is some cdf on [0,1]. Let
$$\widehat{F}_n(t) = \frac{1}{n}\sum_{i=1}^n I(X_i \le t)$$
be the empirical cdf. For any fixed value $t$, $\widehat{F}_n(t)$ is a random variable. But the whole empirical cdf
$$\left\{\widehat{F}_n(t) : t \in [0, 1]\right\}$$
is a stochastic process with a continuous state space and a continuous index set.
We end this section by recalling a basic fact. If $X_1, \ldots, X_n$ are random variables then we can write the joint density as
$$f(x_1, \ldots, x_n) = f(x_1) f(x_2 | x_1) \cdots f(x_n | x_1, \ldots, x_{n-1}) = \prod_{i=1}^n f(x_i | \text{past}_i) \tag{24.1}$$
where $\text{past}_i$ refers to all the variables before $X_i$.
FIGURE 24.1. Stock price over ten week period. (Axes: time, price.)
24.2 Markov Chains
The simplest stochastic process is a Markov chain in which the distribution of $X_t$ depends only on $X_{t-1}$. In this section we assume that the state space is discrete, either $\mathcal{X} = \{1, \ldots, N\}$ or $\mathcal{X} = \{1, 2, \ldots\}$, and that the index set is $T = \{0, 1, 2, \ldots\}$. Most authors write $X_n$ instead of $X_t$ when discussing Markov chains and I will do so as well.
Definition 24.5 The process $\{X_n : n \in T\}$ is a Markov chain if
$$\mathbb{P}(X_n = x | X_0, \ldots, X_{n-1}) = \mathbb{P}(X_n = x | X_{n-1}) \tag{24.2}$$
for all $n$ and for all $x \in \mathcal{X}$.
For a Markov chain, equation (24.1) simplifies to
$$f(x_1, \ldots, x_n) = f(x_1) f(x_2 | x_1) f(x_3 | x_2) \cdots f(x_n | x_{n-1}).$$
A Markov chain can be represented by the following DAG:
$$X_1 \to X_2 \to X_3 \to \cdots \to X_n$$
Each variable has a single parent, namely, the previous observation.
The theory of Markov chains is very rich and complex. We have to get through many definitions before we can do anything interesting. Our goal is to answer the following questions:
1. When does a Markov chain "settle down" into some sort of equilibrium?
2. How do we estimate the parameters of a Markov chain?
3. How can we construct Markov chains that converge to a given equilibrium distribution and why would we want to do that?
We will answer questions 1 and 2 in this chapter. We will answer question 3 in the next chapter. To understand question 1, look at the two chains in Figure 24.2. The first chain oscillates all over the place and will continue to do so forever. The second chain eventually settles into an equilibrium. If we constructed a histogram of the first process, it would keep changing as we got more and more observations. But a histogram from the second chain would eventually converge to some fixed distribution.
Transition Probabilities. The key quantities of a Markov chain are the probabilities of jumping from one state into another state.
Definition 24.6 We call
$$\mathbb{P}(X_{n+1} = j | X_n = i) \tag{24.3}$$
the transition probabilities. If the transition probabilities do not change with time, we say the chain is homogeneous. In this case we define $p_{ij} = \mathbb{P}(X_{n+1} = j | X_n = i)$. The matrix $\mathbf{P}$ whose $(i, j)$ element is $p_{ij}$ is called the transition matrix.
We will only consider homogeneous chains. Notice that $\mathbf{P}$ has two properties: (i) $p_{ij} \ge 0$ and (ii) $\sum_j p_{ij} = 1$. Each row is a probability mass function. A matrix with these properties is called a stochastic matrix.
Example 24.7 (Random Walk With Absorbing Barriers.) Let $\mathcal{X} = \{1, \ldots, N\}$. Suppose you are standing at one of these points. Flip a coin with $\mathbb{P}(\text{Heads}) = p$ and $\mathbb{P}(\text{Tails}) = q = 1 - p$. If it is heads, take one step to the right. If it is tails, take one step to the left. If you hit one of the endpoints, stay there. The transition matrix is
$$\mathbf{P} = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
q & 0 & p & 0 & \cdots & 0 & 0 & 0 \\
0 & q & 0 & p & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & q & 0 & p \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{bmatrix}.$$
FIGURE 24.2. The first chain does not settle down into an equilibrium. The second does. (Both panels plot x against time.)
Example 24.8 Suppose the state space is $\mathcal{X} = \{\text{sunny}, \text{cloudy}\}$.
Then $X_1, X_2, \ldots$ represents the weather for a sequence of days. The weather today clearly depends on yesterday's weather. It might also depend on the weather two days ago but as a first approximation we might assume that the dependence is only one day back. In that case the weather is a Markov chain and a typical transition matrix might be

          Sunny   Cloudy
Sunny      0.4     0.6
Cloudy     0.8     0.2

For example, if it is sunny today, there is a 60 per cent chance it will be cloudy tomorrow.
Let
$$p_{ij}(n) = \mathbb{P}(X_{m+n} = j | X_m = i) \tag{24.4}$$
be the probability of going from state $i$ to state $j$ in $n$ steps. Let $\mathbf{P}_n$ be the matrix whose $(i, j)$ element is $p_{ij}(n)$. These are called the n-step transition probabilities.
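Theorem 24.9 (The Chapman-Kolmogorov equations.) The n-step probabilities satisfy
$$p_{ij}(m + n) = \sum_k p_{ik}(m) p_{kj}(n). \tag{24.5}$$
Proof. Recall that, in general,
$$\mathbb{P}(X = x, Y = y) = \mathbb{P}(X = x)\,\mathbb{P}(Y = y | X = x).$$
This fact is true in the more general form
$$\mathbb{P}(X = x, Y = y | Z = z) = \mathbb{P}(X = x | Z = z)\,\mathbb{P}(Y = y | X = x, Z = z).$$
Also, recall the law of total probability:
$$\mathbb{P}(X = x) = \sum_y \mathbb{P}(X = x, Y = y).$$
Using these facts and the Markov property we have
$$\begin{aligned}
p_{ij}(m + n) &= \mathbb{P}(X_{m+n} = j | X_0 = i) \\
&= \sum_k \mathbb{P}(X_{m+n} = j, X_m = k | X_0 = i) \\
&= \sum_k \mathbb{P}(X_{m+n} = j | X_m = k, X_0 = i)\,\mathbb{P}(X_m = k | X_0 = i) \\
&= \sum_k \mathbb{P}(X_{m+n} = j | X_m = k)\,\mathbb{P}(X_m = k | X_0 = i) \\
&= \sum_k p_{ik}(m) p_{kj}(n).
\end{aligned}$$
Look closely at equation (24.5). This is nothing more than the equation for matrix multiplication. Hence we have shown that
$$\mathbf{P}_{m+n} = \mathbf{P}_m \mathbf{P}_n. \tag{24.6}$$
By definition, $\mathbf{P}_1 = \mathbf{P}$. Using the above theorem, $\mathbf{P}_2 = \mathbf{P}_{1+1} = \mathbf{P}_1 \mathbf{P}_1 = \mathbf{P}\mathbf{P} = \mathbf{P}^2$. Continuing this way, we see that
$$\mathbf{P}_n = \mathbf{P}^n \equiv \underbrace{\mathbf{P} \times \mathbf{P} \times \cdots \times \mathbf{P}}_{\text{multiply the matrix } n \text{ times}}. \tag{24.7}$$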
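A small numerical check of (24.6) and (24.7), using the weather chain of Example 24.8. Matrices are stored as lists of rows and the helper names are illustrative.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(P, n):
    """n-step transition matrix P_n = P^n, built by repeated multiplication."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Weather chain from Example 24.8: state 0 = sunny, state 1 = cloudy.
P = [[0.4, 0.6],
     [0.8, 0.2]]

P2, P3, P5 = mat_pow(P, 2), mat_pow(P, 3), mat_pow(P, 5)
lhs = P5                 # P_{m+n} with m = 2 and n = 3
rhs = mat_mul(P2, P3)    # P_m P_n
print(P2)
print(lhs)
print(rhs)
```

The entry `P2[0][0]` is the probability that it is sunny two days after a sunny day, $0.4 \times 0.4 + 0.6 \times 0.8 = 0.64$.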
Let $\mu_n = (\mu_n(1), \ldots, \mu_n(N))$ be a row vector where
$$\mu_n(i) = \mathbb{P}(X_n = i) \tag{24.8}$$
is the marginal probability that the chain is in state $i$ at time $n$. In particular, $\mu_0$ is called the initial distribution. To simulate a Markov chain, all you need to know is $\mu_0$ and $\mathbf{P}$. The simulation would look like this:
Step 1: Draw $X_0 \sim \mu_0$. Thus, $\mathbb{P}(X_0 = i) = \mu_0(i)$.
Step 2: Suppose the outcome of step 1 is $i$. Draw $X_1 \sim \mathbf{P}$. In other words, $\mathbb{P}(X_1 = j | X_0 = i) = p_{ij}$.
Step 3: Suppose the outcome of step 2 is $j$. Draw $X_2 \sim \mathbf{P}$. In other words, $\mathbb{P}(X_2 = k | X_1 = j) = p_{jk}$.
And so on.
It might be difficult to understand the meaning of $\mu_n$. Imagine simulating the chain many times. Collect all the outcomes at time $n$ from all the chains. This histogram would look approximately like $\mu_n$. A consequence of Theorem 24.9 is the following.
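Lemma 24.10 The marginal probabilities are given by
$$\mu_n = \mu_0 \mathbf{P}^n.$$
Proof.
$$\mu_n(j) = \mathbb{P}(X_n = j) = \sum_i \mathbb{P}(X_n = j | X_0 = i)\,\mathbb{P}(X_0 = i) = \sum_i \mu_0(i) p_{ij}(n),$$
which is the $j$th element of $\mu_0 \mathbf{P}^n$.
Summary
1. Transition matrix: $\mathbf{P}(i, j) = \mathbb{P}(X_{n+1} = j | X_n = i)$.
2. n-step matrix: $\mathbf{P}_n(i, j) = \mathbb{P}(X_{n+m} = j | X_m = i)$.
3. $\mathbf{P}_n = \mathbf{P}^n$.
4. Marginal: $\mu_n(i) = \mathbb{P}(X_n = i)$.
5. $\mu_n = \mu_0 \mathbf{P}^n$.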
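The simulation recipe and the formula for the marginal can be compared directly. In this sketch the chain is the weather chain of Example 24.8, the initial distribution puts all mass on sunny, and the number of replications is an arbitrary choice.

```python
import random

def draw(probs, rng):
    """Draw an index i with probability probs[i]."""
    u, total = rng.random(), 0.0
    for i, p in enumerate(probs):
        total += p
        if u < total:
            return i
    return len(probs) - 1

def simulate_chain(mu0, P, n, rng):
    """Return X_0, ..., X_n: draw X_0 from mu0, then each X_t from row X_{t-1} of P."""
    path = [draw(mu0, rng)]
    for _ in range(n):
        path.append(draw(P[path[-1]], rng))
    return path

def marginal(mu0, P, n):
    """mu_n = mu_0 P^n, computed by n vector-matrix products."""
    mu = list(mu0)
    for _ in range(n):
        mu = [sum(mu[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
    return mu

P = [[0.4, 0.6], [0.8, 0.2]]   # state 0 = sunny, state 1 = cloudy
mu0 = [1.0, 0.0]               # start sunny (an arbitrary choice)
n, reps = 3, 20000
rng = random.Random(42)

hits = sum(simulate_chain(mu0, P, n, rng)[-1] == 0 for _ in range(reps))
estimate = hits / reps          # fraction of simulated chains that are sunny at time n
exact = marginal(mu0, P, n)[0]  # mu_n(sunny) from the lemma
print(estimate, exact)
```

The simulated fraction should be close to the exact value, with the gap shrinking as the number of replications grows.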
States. The states of a Markov chain can be classified according to various properties.
Definition 24.11 We say that $i$ reaches $j$ (or $j$ is accessible from $i$) if $p_{ij}(n) > 0$ for some $n$, and we write $i \to j$. If $i \to j$ and $j \to i$ then we write $i \leftrightarrow j$ and we say that $i$ and $j$ communicate.
Theorem 24.12 The communication relation satisfies the following properties:
1. $i \leftrightarrow i$.
2. If $i \leftrightarrow j$ then $j \leftrightarrow i$.
3. If $i \leftrightarrow j$ and $j \leftrightarrow k$ then $i \leftrightarrow k$.
4. The set of states $\mathcal{X}$ can be written as a disjoint union of classes $\mathcal{X} = \mathcal{X}_1 \cup \mathcal{X}_2 \cup \cdots$ where two states $i$ and $j$ communicate with each other if and only if they are in the same class.
If all states communicate with each other then the chain is called irreducible. A set of states is closed if, once you enter that set of states you never leave. A closed set consisting of a single state is called an absorbing state.
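Example 24.13 Let $\mathcal{X} = \{1, 2, 3, 4\}$ and
$$\mathbf{P} = \begin{pmatrix}
\frac{1}{3} & \frac{2}{3} & 0 & 0 \\
\frac{2}{3} & \frac{1}{3} & 0 & 0 \\
\frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\
0 & 0 & 0 & 1
\end{pmatrix}.$$
The classes are $\{1, 2\}$, $\{3\}$ and $\{4\}$. State 4 is an absorbing state.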
Suppose we start a chain in state $i$. Will the chain ever return to state $i$? If so, that state is called persistent or recurrent.
Definition 24.14 State $i$ is recurrent or persistent if
$$\mathbb{P}(X_n = i \text{ for some } n \ge 1 \mid X_0 = i) = 1.$$
Otherwise, state $i$ is transient.
Theorem 24.15 A state $i$ is recurrent if and only if
$$\sum_n p_{ii}(n) = \infty. \tag{24.9}$$
A state $i$ is transient if and only if
$$\sum_n p_{ii}(n) < \infty. \tag{24.10}$$
Proof. Define
$$I_n = \begin{cases} 1 & \text{if } X_n = i \\ 0 & \text{if } X_n \neq i. \end{cases}$$
The number of times that the chain is in state $i$ is $Y = \sum_{n=0}^\infty I_n$. The mean of $Y$, given that the chain starts in state $i$, is
$$E(Y | X_0 = i) = \sum_{n=0}^\infty E(I_n | X_0 = i) = \sum_{n=0}^\infty \mathbb{P}(X_n = i | X_0 = i) = \sum_{n=0}^\infty p_{ii}(n).$$
Define $a_i = \mathbb{P}(X_n = i \text{ for some } n \ge 1 \mid X_0 = i)$. If $i$ is recurrent, $a_i = 1$. Thus, the chain will eventually return to $i$. Once it does return to $i$, we argue again that since $a_i = 1$, the chain will return to state $i$ again. By repeating this argument, we conclude that $E(Y | X_0 = i) = \infty$. If $i$ is transient, then $a_i < 1$. When the chain is in state $i$, there is a probability $1 - a_i > 0$ that it will never return to state $i$. Thus, the probability that the chain is in state $i$ exactly $n$ times is $a_i^{n-1}(1 - a_i)$. This is a geometric distribution which has finite mean.
Theorem 24.16 Facts about recurrence.
1. If state $i$ is recurrent and $i \leftrightarrow j$ then $j$ is recurrent.
2. If state $i$ is transient and $i \leftrightarrow j$ then $j$ is transient.
3. A finite Markov chain must have at least one recurrent state.
4. The states of a finite, irreducible Markov chain are all recurrent.
Theorem 24.17 (Decomposition Theorem.) The state space $\mathcal{X}$ can be written as the disjoint union
$$\mathcal{X} = \mathcal{X}_T \cup \mathcal{X}_1 \cup \mathcal{X}_2 \cup \cdots$$
where $\mathcal{X}_T$ are the transient states and each $\mathcal{X}_i$ is a closed, irreducible set of recurrent states.
Convergence of Markov Chains. To discuss the convergence of chains, we need a few more definitions. Suppose that $X_0 = i$. Define the recurrence time
$$T_{ij} = \min\{n > 0 : X_n = j\} \tag{24.11}$$
assuming $X_n$ ever reaches state $j$, otherwise define $T_{ij} = \infty$. The mean recurrence time of a recurrent state $i$ is
$$m_i = E(T_{ii}) = \sum_n n f_{ii}(n) \tag{24.12}$$
where
$$f_{ij}(n) = \mathbb{P}(X_1 \neq j, X_2 \neq j, \ldots, X_{n-1} \neq j, X_n = j \mid X_0 = i).$$
A recurrent state is null if $m_i = \infty$; otherwise it is called non-null or positive.
Lemma 24.18 If a state is null and recurrent, then $p_{ii}(n) \to 0$.
Lemma 24.19 In a finite state Markov chain, all recurrent states are positive.
Consider a three state chain with transition matrix
$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$
Suppose we start the chain in state 1. Then we will be in state 3 at times 3, 6, 9, .... This is an example of a periodic chain. Formally, the period of state $i$ is $d$ if $p_{ii}(n) = 0$ whenever $n$ is not divisible by $d$ and $d$ is the largest integer with this property. Thus, $d = \gcd\{n : p_{ii}(n) > 0\}$ where gcd means "greatest common divisor." State $i$ is periodic if $d(i) > 1$ and aperiodic if $d(i) = 1$.
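The gcd definition of the period can be evaluated numerically by scanning the diagonal of successive matrix powers. This sketch truncates the scan at a finite horizon `max_n`, which is an assumption that is harmless for small chains like the two below; states are indexed from 0.

```python
from math import gcd

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def period(P, i, max_n=50):
    """gcd of all n <= max_n with p_ii(n) > 0 (returns 0 if no return is seen)."""
    d, Pn = 0, P
    for n in range(1, max_n + 1):
        if Pn[i][i] > 0:
            d = gcd(d, n)       # gcd(0, n) = n, so the first hit initializes d
        Pn = mat_mul(Pn, P)
    return d

cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]             # the three state chain above
weather = [[0.4, 0.6],
           [0.8, 0.2]]          # the weather chain of Example 24.8

print(period(cycle, 0), period(weather, 0))
```

The cyclic chain returns to its starting state only at multiples of 3, while the weather chain can stay put in one step and is therefore aperiodic.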
Lemma 24.20 If state $i$ has period $d$ and $i \leftrightarrow j$ then $j$ has period $d$.
Definition 24.21 A state is ergodic if it is recurrent, non-null and aperiodic. A chain is ergodic if all its states are ergodic.
Let $\pi = (\pi_i : i \in \mathcal{X})$ be a vector of non-negative numbers that sum to one. Thus $\pi$ can be thought of as a probability mass function.
Definition 24.22 We say that $\pi$ is a stationary (or invariant) distribution if $\pi = \pi \mathbf{P}$.
Here is the intuition. Draw $X_0$ from distribution $\pi$ and suppose that $\pi$ is a stationary distribution. Now draw $X_1$ according to the transition probability of the chain. The distribution of $X_1$ is then $\mu_1 = \mu_0 \mathbf{P} = \pi \mathbf{P} = \pi$. The distribution of $X_2$ is $\pi \mathbf{P}^2 = (\pi \mathbf{P})\mathbf{P} = \pi \mathbf{P} = \pi$. Continuing this way, we see that the distribution of $X_n$ is $\pi \mathbf{P}^n = \pi$. In other words:
If at any time the chain has distribution $\pi$ then it will continue to have distribution $\pi$ forever.
Definition 24.23 We say that a chain has limiting distribution if
$$\mathbf{P}^n \to \begin{bmatrix} \pi \\ \pi \\ \vdots \\ \pi \end{bmatrix}$$
for some $\pi$. In other words, $\pi_j = \lim_{n \to \infty} \mathbf{P}^n_{ij}$ exists and is independent of $i$.
Here is the main theorem.
Theorem 24.24 An irreducible, ergodic Markov chain has a unique stationary distribution $\pi$. The limiting distribution exists and is equal to $\pi$. If $g$ is any bounded function, then, with probability 1,
$$\lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^N g(X_n) = E_\pi(g) \equiv \sum_j g(j)\pi_j. \tag{24.13}$$
The last statement of the theorem is the law of large numbers for Markov chains. It says that sample averages converge to their expectations. Finally, there is a special condition which will be useful later. We say that $\pi$ satisfies detailed balance if
$$\pi_i p_{ij} = p_{ji}\pi_j. \tag{24.14}$$
Detailed balance guarantees that $\pi$ is a stationary distribution.
Theorem 24.25 If $\pi$ satisfies detailed balance then $\pi$ is a stationary distribution.
Proof. We need to show that $\pi \mathbf{P} = \pi$. The $j$th element of $\pi \mathbf{P}$ is $\sum_i \pi_i p_{ij} = \sum_i \pi_j p_{ji} = \pi_j \sum_i p_{ji} = \pi_j$.
The importance of detailed balance will become clear when we discuss Markov chain Monte Carlo methods.
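As a small illustration of stationarity and detailed balance, the sketch below approximates $\pi$ for the weather chain of Example 24.8 by repeatedly applying $\mu \leftarrow \mu \mathbf{P}$, then checks both conditions. Iterating in this way is an assumption of the sketch; it is justified here because this chain is irreducible and ergodic, so the limiting distribution exists.

```python
def step(mu, P):
    """One step of the marginal: returns the row vector mu P."""
    return [sum(mu[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

def stationary_by_iteration(P, iters=200):
    """Approximate pi by iterating mu <- mu P from the uniform distribution."""
    mu = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        mu = step(mu, P)
    return mu

def satisfies_detailed_balance(pi, P, tol=1e-9):
    """Check pi_i p_ij = p_ji pi_j for every pair (i, j)."""
    n = len(P)
    return all(abs(pi[i] * P[i][j] - P[j][i] * pi[j]) < tol
               for i in range(n) for j in range(n))

P = [[0.4, 0.6],
     [0.8, 0.2]]                 # state 0 = sunny, state 1 = cloudy

pi = stationary_by_iteration(P)
print(pi)                                  # close to (4/7, 3/7)
print(satisfies_detailed_balance(pi, P))
print(step(pi, P))                         # pi P, which should reproduce pi
```

For this chain $\pi = (4/7, 3/7)$, and detailed balance reads $(4/7)(0.6) = (3/7)(0.8)$.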
Warning! Just because a chain has a stationary distribution does not mean it converges.
Example 24.26 Let
$$\mathbf{P} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$
Let $\pi = (1/3, 1/3, 1/3)$. Then $\pi \mathbf{P} = \pi$ so $\pi$ is a stationary distribution. If the chain is started with the distribution $\pi$ it will stay in that distribution. Imagine simulating many chains and checking the marginal distribution at each time $n$. It will always be the uniform distribution $\pi$. But this chain does not have a limit. It continues to cycle around forever.
Examples of Markov Chains
Example 24.27 (Random Walk.) Let $\mathcal{X} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$ and suppose that $p_{i,i+1} = p$, $p_{i,i-1} = q = 1 - p$. All states communicate, hence either all the states are recurrent or all are transient. To see which, suppose we start at $X_0 = 0$. Note that
$$p_{00}(2n) = \binom{2n}{n} p^n q^n \tag{24.15}$$
since the only way to get back to 0 is to have $n$ heads (steps to the right) and $n$ tails (steps to the left). We can approximate this expression using Stirling's formula which says that
$$n! \sim n^n \sqrt{n}\, e^{-n} \sqrt{2\pi}.$$
Inserting this approximation into (24.15) shows that
$$p_{00}(2n) \sim \frac{(4pq)^n}{\sqrt{n\pi}}.$$
It is easy to check that $\sum_n p_{00}(n) < \infty$ if and only if $\sum_n p_{00}(2n) < \infty$. Moreover, $\sum_n p_{00}(2n) = \infty$ if and only if $p = q = 1/2$. By Theorem 24.15, the chain is recurrent if $p = 1/2$; otherwise it is transient.
Example 24.28 Let $\mathcal{X} = \{1, 2, 3, 4, 5, 6\}$. Let
$$\mathbf{P} = \begin{bmatrix}
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0 \\
\frac{1}{4} & \frac{3}{4} & 0 & 0 & 0 & 0 \\
\frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & 0 & 0 \\
\frac{1}{4} & 0 & \frac{1}{4} & \frac{1}{4} & 0 & \frac{1}{4} \\
0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \\
0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2}
\end{bmatrix}.$$
Then $C_1 = \{1, 2\}$ and $C_2 = \{5, 6\}$ are irreducible closed sets. States 3 and 4 are transient because of the path $3 \to 4 \to 6$ and once you hit state 6 you cannot return to 3 or 4. Since $p_{ii}(1) > 0$, all the states are aperiodic. In summary, 3 and 4 are transient while 1, 2, 5 and 6 are ergodic.
Example 24.29 (Hardy-Weinberg.) Here is a famous example from
genetics. Suppose a gene can be type A or type a. There are three types of people (called genotypes): AA, Aa and aa. Let $(p, q, r)$ denote the fraction of people of each genotype. We assume that everyone contributes one of their two copies of the gene at random to their children. We also assume that mates are selected at random. The latter is not realistic; however, it is often reasonable to assume that you do not choose your mate based on whether they are AA, Aa or aa. (This would be false if the gene was for eye color and if people chose mates based on eye color.) Imagine if we pooled everyone's genes together. The proportion of A genes is $P = p + (q/2)$ and the proportion of a genes is $Q = r + (q/2)$. A child is AA with probability $P^2$, Aa with probability $2PQ$ and aa with probability $Q^2$. Thus, the fraction of A genes in this generation is
$$P^2 + PQ = \left(p + \frac{q}{2}\right)^2 + \left(p + \frac{q}{2}\right)\left(r + \frac{q}{2}\right).$$
However, $r = 1 - p - q$. Substitute this in the above equation and you get $P^2 + PQ = P$. A similar calculation shows that the fraction of a genes is $Q$. We have shown that the proportion of type A and type a is $P$ and $Q$ and this remains stable after the first generation. The proportion of people of type AA, Aa, aa is thus $(P^2, 2PQ, Q^2)$ from the second generation and on. This is called the Hardy-Weinberg law.
Assume everyone has exactly one child. Now consider a fixed person and let $X_n$ be the genotype of their $n$th descendant. This is a Markov chain with state space $\mathcal{X} = \{AA, Aa, aa\}$. Some basic calculations will show you that the transition matrix is
$$\begin{bmatrix}
P & Q & 0 \\
\frac{P}{2} & \frac{P + Q}{2} & \frac{Q}{2} \\
0 & P & Q
\end{bmatrix}.$$
The stationary distribution is $\pi = (P^2, 2PQ, Q^2)$.
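The claim that $(P^2, 2PQ, Q^2)$ is stationary can be checked numerically for any particular genotype fractions. The values of $(p, q, r)$ below are illustrative only.

```python
def step(mu, T):
    """Row vector times matrix: returns mu T."""
    return [sum(mu[i] * T[i][j] for i in range(len(T))) for j in range(len(T))]

p, q, r = 0.5, 0.3, 0.2     # fractions of genotypes AA, Aa, aa (illustrative values)
P_A = p + q / 2             # proportion of A genes, called P in the text
Q_a = r + q / 2             # proportion of a genes, called Q in the text

# Transition matrix over the states (AA, Aa, aa).
T = [[P_A,     Q_a,             0.0],
     [P_A / 2, (P_A + Q_a) / 2, Q_a / 2],
     [0.0,     P_A,             Q_a]]

pi = [P_A ** 2, 2 * P_A * Q_a, Q_a ** 2]
print(pi)
print(step(pi, T))          # should reproduce pi
```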
Inference for Markov Chains. Consider a chain with finite state space $\mathcal{X} = \{1, 2, \ldots, N\}$. Suppose we observe $n$ observations $X_1, \ldots, X_n$ from this chain. The unknown parameters of a Markov chain are the initial probabilities $\mu_0 = (\mu_0(1), \mu_0(2), \ldots)$ and the elements of the transition matrix $\mathbf{P}$. Each row of $\mathbf{P}$ is a multinomial distribution. So we are essentially estimating $N$ distributions (plus the initial probabilities). Let $n_{ij}$ be the observed number of transitions from state $i$ to state $j$. The likelihood function is
$$\mathcal{L}(\mu_0, \mathbf{P}) = \mu_0(x_0) \prod_{r=1}^n p_{X_{r-1}, X_r} = \mu_0(x_0) \prod_{i=1}^N \prod_{j=1}^N p_{ij}^{n_{ij}}.$$
There is only one observation on $\mu_0$ so we can't estimate that. Rather, we focus on estimating $\mathbf{P}$. The mle is obtained by maximizing $\mathcal{L}(\mu_0, \mathbf{P})$ subject to the constraint that the elements are non-negative and the rows sum to 1. The solution is
$$\widehat{p}_{ij} = \frac{n_{ij}}{n_i}$$
where $n_i = \sum_{j=1}^N n_{ij}$. Here we are assuming that $n_i > 0$. If not, then we set $\widehat{p}_{ij} = 0$ by convention.
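The estimator amounts to counting transitions and normalizing each row. The sketch below applies it to a path simulated from the weather chain of Example 24.8, so that the estimate can be compared with a known truth; states are coded 0 and 1 and the path length is arbitrary.

```python
import random

def transition_mle(path, num_states):
    """p_hat[i][j] = n_ij / n_i, with the row set to 0 when n_i = 0."""
    counts = [[0] * num_states for _ in range(num_states)]
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1                  # n_ij: observed transitions from i to j
    p_hat = []
    for row in counts:
        n_i = sum(row)
        p_hat.append([c / n_i if n_i > 0 else 0.0 for c in row])
    return p_hat

# Simulate a long path from a known two state chain.
P = [[0.4, 0.6],
     [0.8, 0.2]]
rng = random.Random(7)
path = [0]
for _ in range(20000):
    path.append(0 if rng.random() < P[path[-1]][0] else 1)

p_hat = transition_mle(path, 2)
print(p_hat)
```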
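Theorem 24.30 (Consistency and Asymptotic Normality of the mle.) Assume that the chain is ergodic. Let $\widehat{p}_{ij}(n)$ denote the mle after $n$ observations. Then $\widehat{p}_{ij}(n) \xrightarrow{P} p_{ij}$. Also,
$$\left[\sqrt{N_i(n)}\,(\widehat{p}_{ij} - p_{ij})\right] \rightsquigarrow N(0, \Sigma)$$
where the left hand side is a matrix, $N_i(n) = \sum_{r=1}^n I(X_r = i)$ and
$$\Sigma_{ij,k\ell} = \begin{cases}
p_{ij}(1 - p_{ij}) & (i, j) = (k, \ell) \\
-p_{ij} p_{i\ell} & i = k,\ j \neq \ell \\
0 & \text{otherwise.}
\end{cases}$$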
24.3 Poisson Processes
One of the most studied and useful stochastic processes is the Poisson process. It arises when we count occurrences of events over time, for example, traffic accidents, radioactive decay, arrival of email messages etc. As the name suggests, the Poisson process is intimately related to the Poisson distribution. Let's first review the Poisson distribution.
Recall that $X$ has a Poisson distribution with parameter $\lambda$, written $X \sim \text{Poisson}(\lambda)$, if
$$\mathbb{P}(X = x) \equiv p(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
Also recall that $E(X) = \lambda$ and $V(X) = \lambda$. If $X \sim \text{Poisson}(\lambda)$, $Y \sim \text{Poisson}(\nu)$ and $X \amalg Y$, then $X + Y \sim \text{Poisson}(\lambda + \nu)$. Finally, if $N \sim \text{Poisson}(\lambda)$ and $Y | N = n \sim \text{Binomial}(n, p)$, then the marginal distribution of $Y$ is $Y \sim \text{Poisson}(\lambda p)$.
Now we describe the Poisson process. Imagine you are at your computer. Each time a new email message arrives you record the time. Let $X_t$ be the number of messages you have received up to and including time $t$. Then, $\{X_t : t \in [0, \infty)\}$ is a stochastic process with state space $\mathcal{X} = \{0, 1, 2, \ldots\}$. A process of this form is called a counting process. A Poisson process is a counting process that satisfies certain conditions. In what follows, we will sometimes write $X(t)$ instead of $X_t$. Also, we need the following notation. Write $f(h) = o(h)$ if $f(h)/h \to 0$ as $h \to 0$. This means that $f(h)$ is smaller than $h$ when $h$ is close to 0. For example, $h^2 = o(h)$.
Definition 24.31 A Poisson process is a stochastic process $\{X_t : t \in [0, \infty)\}$ with state space $\mathcal{X} = \{0, 1, 2, \ldots\}$ such that
1. $X(0) = 0$.
2. For any $0 = t_0 < t_1 < t_2 < \cdots < t_n$, the increments
$$X(t_1) - X(t_0),\ X(t_2) - X(t_1),\ \ldots,\ X(t_n) - X(t_{n-1})$$
are independent.
3. There is a function $\lambda(t)$ such that
$$\mathbb{P}(X(t + h) - X(t) = 1) = \lambda(t)h + o(h) \tag{24.16}$$
$$\mathbb{P}(X(t + h) - X(t) \ge 2) = o(h). \tag{24.17}$$
We call $\lambda(t)$ the intensity function.
The last condition means that the probability of an event in $[t, t + h]$ is approximately $h\lambda(t)$ while the probability of more than one event is small.
Theorem 24.32 If $X_t$ is a Poisson process with intensity function $\lambda(t)$, then
$$X(s + t) - X(s) \sim \text{Poisson}(m(s + t) - m(s))$$
where
$$m(t) = \int_0^t \lambda(s)\, ds.$$
In particular, $X(t) \sim \text{Poisson}(m(t))$. Hence, $E(X(t)) = m(t)$ and $V(X(t)) = m(t)$.
Definition 24.33 A Poisson process with intensity function $\lambda(t) \equiv \lambda$ for some $\lambda > 0$ is called a homogeneous Poisson process with rate $\lambda$. In this case,
$$X(t) \sim \text{Poisson}(\lambda t).$$
Let $X(t)$ be a homogeneous Poisson process with rate $\lambda$. Let $W_n$ be the time at which the $n$th event occurs and set $W_0 = 0$. The random variables $W_0, W_1, \ldots$ are called waiting times. Let $S_n = W_{n+1} - W_n$. Then $S_0, S_1, \ldots$ are called sojourn times or interarrival times.
Theorem 24.34 The sojourn times $S_0, S_1, \ldots$ are iid random variables. Their distribution is exponential with mean $1/\lambda$, that is, they have density
$$f(s) = \lambda e^{-\lambda s}, \quad s \ge 0.$$
The waiting time $W_n \sim \text{Gamma}(n, 1/\lambda)$, i.e. it has density
$$f(w) = \frac{1}{\Gamma(n)} \lambda^n w^{n-1} e^{-\lambda w}.$$
Hence, $E(W_n) = n/\lambda$ and $V(W_n) = n/\lambda^2$.
Proof. First, we have
$$\mathbb{P}(S_1 > t) = \mathbb{P}(X(t) = 0) = e^{-\lambda t}$$
which shows that the cdf for $S_1$ is $1 - e^{-\lambda t}$. This shows the result for $S_1$. Now,
$$\begin{aligned}
\mathbb{P}(S_2 > t \mid S_1 = s) &= \mathbb{P}(\text{no events in } (s, s + t] \mid S_1 = s) \\
&= \mathbb{P}(\text{no events in } (s, s + t]) \quad \text{(increments are independent)} \\
&= e^{-\lambda t}.
\end{aligned}$$
Hence, $S_2$ has an exponential distribution and is independent of $S_1$. The result follows by repeating the argument. The result for $W_n$ follows since a sum of exponentials has a Gamma distribution.
Example 24.35 Figures 24.3 and 24.4 show requests to a WWW
server in Calgary.¹ Assuming that this is a homogeneous Poisson process, $N \equiv X(T) \sim \text{Poisson}(\lambda T)$. The likelihood is
$$\mathcal{L}(\lambda) \propto e^{-\lambda T} (\lambda T)^N$$
which is maximized at
$$\widehat{\lambda} = \frac{N}{T} = 48.0077$$
in units per minute.
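To see the estimator at work, one can simulate a homogeneous Poisson process by accumulating iid exponential sojourn times and then compute $N/T$. The rate and time horizon below are illustrative values and have nothing to do with the Calgary data.

```python
import random

def simulate_event_times(rate, T, rng):
    """Event times W_1 < W_2 < ... in [0, T], built from iid exponential sojourn times."""
    times, t = [], rng.expovariate(rate)
    while t <= T:
        times.append(t)
        t += rng.expovariate(rate)   # next sojourn time, with mean 1 / rate
    return times

rate, T = 48.0, 100.0                # illustrative rate (events per minute) and horizon
rng = random.Random(3)
events = simulate_event_times(rate, T, rng)

N = len(events)                      # N = X(T)
rate_hat = N / T                     # maximum likelihood estimate of the rate
print(N, rate_hat)
```

Since $N \sim \text{Poisson}(\lambda T)$, the estimate has standard deviation $\sqrt{\lambda / T}$, which is about 0.7 for these values.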
In the preceding example we might want to check if the assumption that the data follow a Poisson process is reasonable. To proceed, let us first note that if a Poisson process is observed on $[0, T]$ then, given $N = X(T)$, the event times $W_1, \ldots, W_N$ are a sample from a Uniform$(0, T)$ distribution. To see intuitively why this is true, note that the probability of finding an event in a small interval $[t, t+h]$ is proportional to $h$. But this doesn't depend on $t$. Since the probability is constant over $t$ it must be uniform.
Figure 24.5 shows a histogram and some density estimates. The density does not appear to be uniform. To test this formally, let us divide $[0, T]$ into 4 equal length intervals $I_1, I_2, I_3, I_4$. Let $p_i$ be the probability of a point being in $I_i$. The null hypothesis is that $p_1 = p_2 = p_3 = p_4$. We can test this hypothesis using either a likelihood ratio test or a $\chi^2$ test. The latter is
$$\sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}$$
1 See http://ita.ee.lbl.gov/html/contrib/Calgary-HTTP.html for more information.
FIGURE 24.3. Hits on a web server. First plot is first 20 events. Second plot is several hours worth of data. (Horizontal axis: time.)
FIGURE 24.4. X(t) for hits on a web server. First plot is first 20 events. Second plot is several hours worth of data. (Axes: X(t) versus time in seconds.)
where $O_i$ is the number of observations in $I_i$ and $E_i = n/4$ is the expected number under the null. This yields $\chi^2 = 252$ with a p-value of essentially 0. This is strong evidence against the null.
24.4 Hidden Markov Models
24.5
This is standard material and there are many good references. In this chapter, I followed these references quite closely: Probability and Random Processes by Grimmett and Stirzaker (1982), An Introduction to Stochastic Modeling by Taylor and Karlin (1998), Stochastic Modeling of Scientific Data by Guttorp (1995) and Probability Models for Computer Science by Ross (2002). Some of the exercises are from these texts.
24.6 Exercises
1. Let $X_0, X_1, \ldots$ be a Markov chain with states $\{0, 1, 2\}$ and transition matrix
$$P = \begin{bmatrix} 0.1 & 0.2 & 0.7 \\ 0.9 & 0.1 & 0.0 \\ 0.1 & 0.8 & 0.1 \end{bmatrix}.$$
Assume that $\mu_0 = (0.3, 0.4, 0.3)$. Find $P(X_0 = 0, X_1 = 1, X_2 = 2)$ and $P(X_0 = 0, X_1 = 1, X_2 = 1)$.
2. Let $Y_1, Y_2, \ldots$ be a sequence of iid observations such that $P(Y = 0) = 0.1$, $P(Y = 1) = 0.3$, $P(Y = 2) = 0.2$, $P(Y = 3) = 0.4$. Let $X_0 = 0$ and let
$$X_n = \max\{Y_1, \ldots, Y_n\}.$$
Show that $X_0, X_1, \ldots$ is a Markov chain and find the transition matrix.
FIGURE 24.5. Density estimate of event times. (Panels: Histogram of w/60; density(x = w/60); density(x = w/60, bw = "ucv").)
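3. Consider a two state Markov chain with states $\mathcal{X} = \{1, 2\}$ and transition matrix
$$P = \begin{bmatrix} 1-a & a \\ b & 1-b \end{bmatrix}$$
where $0 < a < 1$ and $0 < b < 1$. Prove that
$$\lim_{n \to \infty} P^n = \begin{bmatrix} \frac{b}{a+b} & \frac{a}{a+b} \\ \frac{b}{a+b} & \frac{a}{a+b} \end{bmatrix}.$$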
4. Consider the chain from question 3 and set $a = .1$ and $b = .3$. Simulate the chain. Let
$$\hat p_n(1) = \frac{1}{n} \sum_{i=1}^n I(X_i = 1), \qquad \hat p_n(2) = \frac{1}{n} \sum_{i=1}^n I(X_i = 2)$$
be the proportion of times the chain is in state 1 and state 2. Plot $\hat p_n(1)$ and $\hat p_n(2)$ versus $n$ and verify that they converge to the values predicted from the answer in the previous question.
5. An important Markov chain is the branching process which is used in biology, genetics, nuclear physics and many other fields. Suppose that an animal has $Y$ children. Let $p_k = P(Y = k)$. Hence, $p_k \ge 0$ for all $k$ and $\sum_{k=0}^\infty p_k = 1$. Assume each animal has the same lifespan and that they produce offspring according to the distribution $p_k$. Let $X_n$ be the number of animals in the $n$th generation. Let $Y_1^{(n)}, \ldots, Y_{X_n}^{(n)}$ be the offspring produced in the $n$th generation. Note that
$$X_{n+1} = Y_1^{(n)} + \cdots + Y_{X_n}^{(n)}.$$
Let $\mu = E(Y)$ and $\sigma^2 = V(Y)$. Assume throughout this question that $X_0 = 1$. Let $M(n) = E(X_n)$ and $V(n) = V(X_n)$.
(a) Show that $M(n+1) = \mu M(n)$ and $V(n+1) = \sigma^2 M(n) + \mu^2 V(n)$.
(b) Show that $M(n) = \mu^n$ and that $V(n) = \sigma^2 \mu^{n-1} (1 + \mu + \cdots + \mu^{n-1})$.
(c) What happens to the variance if $\mu > 1$? What happens to the variance if $\mu = 1$? What happens to the variance if $\mu < 1$?
(d) The population goes extinct if $X_n = 0$ for some $n$. Let us thus define the extinction time $N$ by
$$N = \min\{n : X_n = 0\}.$$
Let $F(n) = P(N \le n)$ be the cdf of the random variable $N$. Show that
$$F(n) = \sum_{k=0}^{\infty} p_k \big(F(n-1)\big)^k, \quad n = 1, 2, \ldots$$
Hint: Note that the event $\{N \le n\}$ is the same as the event $\{X_n = 0\}$. Thus, $P(\{N \le n\}) = P(\{X_n = 0\})$. Let $k$ be the number of offspring of the original parent. The population becomes extinct at time $n$ if and only if each of the $k$ sub-populations generated from the $k$ offspring goes extinct in $n - 1$ generations.
(e) Suppose that $p_0 = 1/4$, $p_1 = 1/2$, $p_2 = 1/4$. Use the formula from (5d) to compute the cdf $F(n)$.
6. Let
$$P = \begin{bmatrix} 0.40 & 0.50 & 0.10 \\ 0.05 & 0.70 & 0.25 \\ 0.05 & 0.50 & 0.45 \end{bmatrix}.$$
Find the stationary distribution $\pi$.
7. Show that if $i$ is a recurrent state and $i \leftrightarrow j$, then $j$ is a recurrent state.
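8. Let
$$P = \begin{bmatrix} \frac13 & 0 & \frac13 & 0 & 0 & \frac13 \\ \frac12 & \frac14 & \frac14 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ \frac14 & \frac14 & \frac14 & 0 & 0 & \frac14 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$
Which states are transient? Which states are recurrent?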
9. Let
$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
Show that $\pi = (1/2, 1/2)$ is a stationary distribution. Does this chain converge? Why/why not?
10. Let $0 < p < 1$ and $q = 1 - p$. Let
$$P = \begin{bmatrix} q & p & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 \\ q & 0 & 0 & p & 0 \\ q & 0 & 0 & 0 & p \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
Find the limiting distribution of the chain.
11. Let $X(t)$ be an inhomogeneous Poisson process with intensity function $\lambda(t) > 0$. Let $\Lambda(t) = \int_0^t \lambda(u)\,du$. Define $Y(s) = X(t)$ where $s = \Lambda(t)$. Show that $Y(s)$ is a homogeneous Poisson process with intensity $\lambda = 1$.
12. Let $X(t)$ be a Poisson process with intensity $\lambda$. Find the conditional distribution of $X(t)$ given that $X(t+s) = n$.
13. Let $X(t)$ be a Poisson process with intensity $\lambda$. Find the probability that $X(t)$ is odd, i.e. $P(X(t) = 1, 3, 5, \ldots)$.
14. Suppose that people logging in to the University computer system is described by a Poisson process $X(t)$ with intensity $\lambda$. Assume that a person stays logged in for some random time with cdf $G$. Assume these times are all independent. Let $Y(t)$ be the number of people on the system at time $t$. Find the distribution of $Y(t)$.
15. Let $X(t)$ be a Poisson process with intensity $\lambda$. Let $W_1, W_2, \ldots$ be the waiting times. Let $f$ be an arbitrary function. Show that
$$E\left( \sum_{i=1}^{X(t)} f(W_i) \right) = \lambda \int_0^t f(w)\,dw.$$
16. A two dimensional Poisson point process is a process of random points on the plane such that (i) for any set $A$, the number of points falling in $A$ is Poisson with mean $\lambda \mu(A)$ where $\mu(A)$ is the area of $A$, (ii) the number of events in nonoverlapping regions is independent. Consider an arbitrary point $x_0$ in the plane. Let $X$ denote the distance from $x_0$ to the nearest random point. Show that
$$P(X > t) = e^{-\lambda \pi t^2}$$
and
$$E(X) = \frac{1}{2\sqrt{\lambda}}.$$
25 Simulation Methods
In this chapter we will see that by generating data in a clever way, we can solve a number of problems such as integrating or maximizing a complicated function.1 For integration, we will study three methods: (i) basic Monte Carlo integration, (ii) importance sampling and (iii) Markov chain Monte Carlo (MCMC).
1 My main source for this chapter is Monte Carlo Statistical Methods by C. Robert and G. Casella.
25.1 Bayesian Inference Revisited
Simulation methods are especially useful in Bayesian inference so let us briefly review the main ideas in Bayesian inference. Given a prior $f(\theta)$ and data $X^n = (X_1, \ldots, X_n)$ the posterior density is
$$f(\theta \mid X^n) = \frac{L(\theta) f(\theta)}{\int L(u) f(u)\,du}$$
where $L(\theta)$ is the likelihood function. The posterior mean is
$$\bar\theta = \int \theta f(\theta \mid X^n)\,d\theta = \frac{\int \theta L(\theta) f(\theta)\,d\theta}{\int L(\theta) f(\theta)\,d\theta}.$$
If $\theta = (\theta_1, \ldots, \theta_k)$ is multidimensional, then we might be interested in the posterior for one of the components, $\theta_1$, say. This marginal posterior density is
$$f(\theta_1 \mid X^n) = \int \int \cdots \int f(\theta_1, \ldots, \theta_k \mid X^n)\,d\theta_2 \cdots d\theta_k$$
which involves high dimensional integration.
You can see that integrals play a big role in Bayesian inference. When $\theta$ is high dimensional, it may not be feasible to calculate these integrals analytically. Simulation methods will often be very helpful.
25.2 Basic Monte Carlo Integration
Suppose you want to evaluate the integral $I = \int_a^b h(x)\,dx$ for some function $h$. If $h$ is an "easy" function like a polynomial or trigonometric function then we can do the integral in closed form. In practice, $h$ can be very complicated and there may be no known closed form expression for $I$. There are many numerical techniques for evaluating $I$ such as Simpson's rule, the trapezoidal rule, Gaussian quadrature and so on. In some cases these techniques work very well. But other times they may not work so well. In particular, it is hard to extend them to higher dimensions. Monte Carlo integration is another approach to evaluating $I$ which is notable for its simplicity, generality and scalability.
Let us begin by writing
$$I = \int_a^b h(x)\,dx = \int_a^b h(x)(b-a) \frac{1}{b-a}\,dx = \int_a^b w(x) f(x)\,dx$$
where $w(x) = h(x)(b-a)$ and $f(x) = 1/(b-a)$. Notice that $f$ is the density for a uniform random variable over $(a, b)$. Hence,
$$I = E_f(w(X))$$
where $X \sim \text{Unif}(a, b)$. Suppose we generate $X_1, \ldots, X_N \sim \text{Unif}(a, b)$ where $N$ is large. By the law of large numbers
$$\hat I \equiv \frac{1}{N} \sum_{i=1}^N w(X_i) \xrightarrow{p} E(w(X)) = I.$$
This is the basic Monte Carlo integration method. We can also compute the standard error of the estimate
$$\widehat{se} = \frac{s}{\sqrt{N}}$$
where
$$s^2 = \frac{\sum_i (Y_i - \hat I)^2}{N - 1}$$
where $Y_i = w(X_i)$. A $1 - \alpha$ confidence interval for $I$ is $\hat I \pm z_{\alpha/2}\,\widehat{se}$. We can take $N$ as large as we want and hence make the length of the confidence interval very small.
Example 25.1 Let's try this on an example where we know the true answer. Let $h(x) = x^3$. Hence, $I = \int_0^1 x^3\,dx = 1/4$. Based on $N = 10{,}000$ I got $\hat I = .2481101$ with a standard error of .0028. Not bad at all.
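The calculation in this example can be sketched as follows. The seed and the variable names are my own choices, so the digits will not match the value quoted above exactly.

```r
# Sketch: basic Monte Carlo estimate of the integral of x^3 over (0, 1); true value 1/4.
set.seed(3)
N <- 10000
a <- 0; b <- 1
h <- function(x) x^3

X <- runif(N, a, b)
Y <- h(X) * (b - a)                 # Y_i = w(X_i) = h(X_i)(b - a)
I.hat  <- mean(Y)
se.hat <- sd(Y) / sqrt(N)           # s / sqrt(N); sd() uses the N - 1 divisor
ci <- I.hat + c(-1, 1) * qnorm(0.975) * se.hat
print(c(estimate = I.hat, se = se.hat, lower = ci[1], upper = ci[2]))
```

Increasing `N` by a factor of 100 shrinks the standard error by a factor of about 10.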
A simple generalization of the basic method is to consider integrals of the form
$$I = \int h(x) f(x)\,dx$$
where $f(x)$ is a probability density function. Taking $f$ to be a Unif$(a, b)$ gives us the special case above. The only difference is that now we draw $X_1, \ldots, X_N \sim f$ and take
$$\hat I \equiv \frac{1}{N} \sum_{i=1}^N h(X_i)$$
as before.
Example 25.2 Let
$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$
be the standard Normal pdf. Suppose we want to compute the cdf at some point $x$:
$$I = \int_{-\infty}^x f(s)\,ds = \Phi(x).$$
Of course, we could look this up in a table or use the pnorm command in R. But suppose you don't have access to either. Write
$$I = \int h(s) f(s)\,ds$$
where
$$h(s) = \begin{cases} 1 & s < x \\ 0 & s \ge x. \end{cases}$$
Now we generate $X_1, \ldots, X_N \sim N(0, 1)$ and set
$$\hat I = \frac{1}{N} \sum_i h(X_i) = \frac{\text{number of observations} \le x}{N}.$$
For example, with $x = 2$, the true answer is $\Phi(2) = .9772$ and the Monte Carlo estimate with $N = 10{,}000$ yields .9751. Using $N = 100{,}000$ I get .9771.
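A minimal sketch of this estimate of $\Phi(2)$ is given below. Here `pnorm` is used only to display the true value for comparison; the seed is arbitrary, so the estimate will differ slightly from the figures quoted above.

```r
# Sketch: Monte Carlo estimate of the standard Normal cdf at x = 2.
set.seed(4)
x <- 2
N <- 100000
X <- rnorm(N)
I.hat  <- mean(X < x)                       # proportion of draws below x
se.hat <- sqrt(I.hat * (1 - I.hat) / N)     # binomial standard error
print(c(estimate = I.hat, se = se.hat, truth = pnorm(x)))
```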
Example 25.3 (Two Binomials) Let $X \sim \text{Binomial}(n, p_1)$ and $Y \sim \text{Binomial}(m, p_2)$. We would like to estimate $\delta = p_2 - p_1$. The mle is $\hat\delta = \hat p_2 - \hat p_1 = (Y/m) - (X/n)$. We can get the standard error $\widehat{se}$ using the delta method (remember?) which yields
$$\widehat{se} = \sqrt{\frac{\hat p_1 (1 - \hat p_1)}{n} + \frac{\hat p_2 (1 - \hat p_2)}{m}}$$
and then construct a 95 per cent confidence interval $\hat\delta \pm 2\,\widehat{se}$.
Now consider a Bayesian analysis. Suppose we use the prior $f(p_1, p_2) = f(p_1) f(p_2) = 1$, that is, a flat prior on $(p_1, p_2)$. The posterior is
$$f(p_1, p_2 \mid X, Y) \propto p_1^X (1 - p_1)^{n - X} p_2^Y (1 - p_2)^{m - Y}.$$
The posterior mean of $\delta$ is
$$\bar\delta = \int_0^1 \int_0^1 \delta(p_1, p_2) f(p_1, p_2 \mid X, Y)\,dp_1\,dp_2 = \int_0^1 \int_0^1 (p_2 - p_1) f(p_1, p_2 \mid X, Y)\,dp_1\,dp_2.$$
If we want the posterior density of $\delta$ we can first get the posterior cdf
$$F(c \mid X, Y) = P(\delta \le c \mid X, Y) = \int_A f(p_1, p_2 \mid X, Y)\,dp_1\,dp_2$$
where $A = \{(p_1, p_2) : p_2 - p_1 \le c\}$. The density can then be obtained by differentiating $F$.
To avoid all these integrals, let's use simulation. Note that $f(p_1, p_2 \mid X, Y) = f(p_1 \mid X) f(p_2 \mid Y)$ which implies that $p_1$ and $p_2$ are independent under the posterior distribution. Also, we see that $p_1 \mid X \sim \text{Beta}(X + 1, n - X + 1)$ and $p_2 \mid Y \sim \text{Beta}(Y + 1, m - Y + 1)$. Hence, we can simulate $(P_1^{(1)}, P_2^{(1)}), \ldots, (P_1^{(N)}, P_2^{(N)})$ from the posterior by drawing
$$P_1^{(i)} \sim \text{Beta}(X + 1, n - X + 1)$$
$$P_2^{(i)} \sim \text{Beta}(Y + 1, m - Y + 1)$$
for $i = 1, \ldots, N$. Now let $\delta^{(i)} = P_2^{(i)} - P_1^{(i)}$. Then,
$$\bar\delta \approx \frac{1}{N} \sum_i \delta^{(i)}.$$
We can also get a 95 per cent posterior interval for $\delta$ by sorting the simulated values, and finding the .025 and .975 quantile. The posterior density $f(\delta \mid X, Y)$ can be obtained by applying density estimation techniques to $\delta^{(1)}, \ldots, \delta^{(N)}$ or, simply by plotting a histogram.
For example, suppose that $n = m = 10$, $X = 8$ and $Y = 6$. The R code is:
x <- 8; y <- 6
n <- 10; m <- 10
B <- 10000
p1 <- rbeta(B,x+1,n-x+1)
p2 <- rbeta(B,y+1,m-y+1)
delta <- p2 - p1
print(mean(delta))
left <- quantile(delta,.025)
right <- quantile(delta,.975)
print(c(left,right))
I found that $\bar\delta \approx -0.17$ and a 95 per cent posterior interval of (-0.52, 0.20). The posterior density can be estimated from a histogram of the simulated values as shown in Figure 25.1.
FIGURE 25.1. Posterior of $\delta$ from simulation. (Axes: f(delta | data) versus delta = p2 − p1.)
Example 25.4 (Dose Response) Suppose we conduct an experiment by giving rats one of ten possible doses of a drug, denoted by $x_1 < x_2 < \cdots < x_{10}$. For each dose level $x_i$ we use $n$ rats and we observe $Y_i$, the number that die. Thus we have ten independent binomials $Y_i \sim \text{Binomial}(n, p_i)$. Suppose we know from biological considerations that higher doses should have higher probability of death. Thus, $p_1 \le p_2 \le \cdots \le p_{10}$. Suppose we want to estimate the dose at which the animals have a 50 per cent chance of dying. This is called the LD50. Formally, $\delta = x_j$ where
$$j = \min\{i : p_i \ge .50\}.$$
Notice that $\delta$ is implicitly a (complicated) function of $p_1, \ldots, p_{10}$ so we can write $\delta = g(p_1, \ldots, p_{10})$ for some $g$. This just means that if we know $(p_1, \ldots, p_{10})$ then we can find $\delta$. The posterior mean of $\delta$ is
$$\int \int \cdots \int_A g(p_1, \ldots, p_{10}) f(p_1, \ldots, p_{10} \mid Y_1, \ldots, Y_{10})\,dp_1\,dp_2 \cdots dp_{10}.$$
The integral is over the region
$$A = \{(p_1, \ldots, p_{10}) : p_1 \le \cdots \le p_{10}\}.$$
Similarly, the posterior cdf of $\delta$ is
$$F(c \mid Y_1, \ldots, Y_{10}) = P(\delta \le c \mid Y_1, \ldots, Y_{10}) = \int \int \cdots \int_B f(p_1, \ldots, p_{10} \mid Y_1, \ldots, Y_{10})\,dp_1\,dp_2 \cdots dp_{10}$$
where
$$B = A \cap \{(p_1, \ldots, p_{10}) : g(p_1, \ldots, p_{10}) \le c\}.$$
We would need to do a 10 dimensional integral over a restricted region $A$. Instead, let's use simulation.
Let us take a flat prior truncated over $A$. Except for the truncation, each $P_i$ has once again a Beta distribution. To draw from the posterior we do the following steps:
(1) Draw $P_i \sim \text{Beta}(Y_i + 1, n - Y_i + 1)$, $i = 1, \ldots, 10$.
(2) If $P_1 \le P_2 \le \cdots \le P_{10}$ keep this draw. Otherwise, throw it away and draw again until you get one you can keep.
(3) Let $\delta = x_j$ where
$$j = \min\{i : P_i > .50\}.$$
We repeat this $N$ times to get $\delta^{(1)}, \ldots, \delta^{(N)}$ and take
$$E(\delta \mid Y_1, \ldots, Y_{10}) \approx \frac{1}{N} \sum_i \delta^{(i)}.$$
$\delta$ is a discrete variable. We can estimate its probability mass function by
$$P(\delta = x_j \mid Y_1, \ldots, Y_{10}) \approx \frac{1}{N} \sum_{i=1}^N I(\delta^{(i)} = x_j).$$
For example, suppose that $x = (1, 2, \ldots, 10)$. Figure 25.2 shows some data with $n = 15$ animals at each dose. The R code is a bit tricky.
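## assume x (the doses) and y (the number of deaths at each dose) are given
n <- rep(15,10)
B <- 10000
p <- matrix(0,B,10)
check <- rep(0,B)
for(i in 1:10){
p[,i] <- rbeta(B, y[i]+1, n[i]-y[i]+1)   ## unconstrained posterior draws
}
for(i in 1:B){
check[i] <- min(diff(p[i,]))   ## >= 0 exactly when p1 <= ... <= p10
}
good <- (1:B)[check >= 0]
p <- p[good,,drop=FALSE]   ## keep only the draws that satisfy the constraint
B <- nrow(p)
delta <- rep(0,B)
for(i in 1:B){
j <- (1:10)[p[i,] > .5]   ## doses whose death probability exceeds .5
j <- min(j)   ## first such dose; assumes at least one exists
delta[i] <- x[j]
}
print(mean(delta))
left <- quantile(delta,.025)
right <- quantile(delta,.975)
print(c(left,right))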
FIGURE 25.2. Dose response data. (Panels: proportion who died versus dose; posterior draws t(p) versus x; f(delta | data) versus LD50.)
The posterior draws for $p_1, \ldots, p_{10}$ are shown in the second panel of Figure 25.2. We find that $\bar\delta = 4.04$ with a 95 per cent interval of (3, 5). The third panel shows the posterior for $\delta$ which is necessarily a discrete distribution.
25.3 Importance Sampling
Consider the integral $I = \int h(x) f(x)\,dx$ where $f$ is a probability density. The basic Monte Carlo method involves sampling from $f$. However, there are cases where we may not know how to sample from $f$. For example, in Bayesian inference, the posterior density is obtained by multiplying the likelihood $L(\theta)$ times the prior $f(\theta)$. There is no guarantee that $f(\theta \mid x)$ will be a known distribution like a Normal or Gamma or whatever.
Importance sampling is a generalization of basic Monte Carlo which overcomes this problem. Let $g$ be a probability density that we know how to simulate from. Then
$$I = \int h(x) f(x)\,dx = \int \frac{h(x) f(x)}{g(x)} g(x)\,dx = E_g(Y)$$
where $Y = h(X) f(X) / g(X)$ and the expectation $E_g(Y)$ is with respect to $g$. We can simulate $X_1, \ldots, X_N \sim g$ and estimate $I$ by
$$\hat I = \frac{1}{N} \sum_i Y_i = \frac{1}{N} \sum_i \frac{h(X_i) f(X_i)}{g(X_i)}.$$
This is called importance sampling. By the law of large numbers, $\hat I \xrightarrow{p} I$. However, there is a catch. It's possible that $\hat I$ might have an infinite standard error. To see why, recall that $I$ is the mean of $w(x) = h(x) f(x) / g(x)$. The second moment of this quantity is
$$E_g(w^2(X)) = \int \left( \frac{h(x) f(x)}{g(x)} \right)^2 g(x)\,dx = \int \frac{h^2(x) f^2(x)}{g(x)}\,dx.$$
If $g$ has thinner tails than $f$, then this integral might be infinite. To avoid this, a basic rule in importance sampling is to sample from a density $g$ with thicker tails than $f$. Also, suppose that $g(x)$ is small over some set $A$ where $f(x)$ is large. Again, the ratio $f/g$ could be large, leading to a large variance. This implies that we should choose $g$ to be similar in shape to $f$. In summary, a good choice for an importance sampling density $g$ should be similar to $f$ but with thicker tails. In fact, we can say what the optimal choice of $g$ is.
Theorem 25.5 The choice of $g$ that minimizes the variance of $\hat I$ is
$$g(x) = \frac{|h(x)| f(x)}{\int |h(s)| f(s)\,ds}.$$
PROOF. The variance of $w = fh/g$ is
$$E_g(w^2) - (E_g(w))^2 = \int w^2(x) g(x)\,dx - \left( \int w(x) g(x)\,dx \right)^2$$
$$= \int \frac{h^2(x) f^2(x)}{g^2(x)} g(x)\,dx - \left( \int \frac{h(x) f(x)}{g(x)} g(x)\,dx \right)^2$$
$$= \int \frac{h^2(x) f^2(x)}{g^2(x)} g(x)\,dx - \left( \int h(x) f(x)\,dx \right)^2.$$
The second integral does not depend on $g$ so we only need to minimize the first integral. Now, from Jensen's inequality we have
$$E_g(W^2) \ge (E_g(|W|))^2 = \left( \int |h(x)| f(x)\,dx \right)^2.$$
This establishes a lower bound on $E_g(W^2)$. However, for the choice of $g$ in the theorem, $|W| = |h| f / g$ is constant, so $E_g(W^2)$ equals this lower bound, which proves the claim.
This theorem is interesting but it is only of theoretical interest. If we did not know how to sample from $f$ then it is unlikely that we could sample from $|h(x)| f(x) / \int |h(s)| f(s)\,ds$. In practice, we simply try to find a thick tailed distribution $g$ which is similar to $f|h|$.
Example 25.6 (Tail Probability) Let's estimate $I = P(Z > 3) = .0013$ where $Z \sim N(0, 1)$. Write $I = \int h(x) f(x)\,dx$ where $f(x)$ is the standard Normal density and $h(x) = 1$ if $x > 3$ and 0 otherwise. The basic Monte Carlo estimator is $\hat I = N^{-1} \sum_i h(X_i)$. Using $N = 100$ we find (from simulating many times) that $E(\hat I) = .0015$ and the standard deviation of $\hat I$ is .0039. Notice that most observations are wasted in the sense that most are not near the right tail. We will estimate this with importance sampling taking $g$ to be a Normal(4, 1) density. We draw values from $g$ and the estimate is now $\hat I = N^{-1} \sum_i f(X_i) h(X_i) / g(X_i)$. In this case we find that $E(\hat I) = .0011$ and the standard deviation of $\hat I$ is .0002. We have reduced the standard deviation by a factor of 20.
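The comparison in this example can be sketched as follows. The number of repetitions `reps` and the seed are my own choices, so the simulated means and standard deviations will only roughly match the figures quoted above.

```r
# Sketch: estimate I = P(Z > 3) for Z ~ N(0,1) two ways and compare their spread.
set.seed(5)
N    <- 100      # draws per estimate, as in the example
reps <- 2000     # hypothetical number of repeated simulations

# Basic Monte Carlo: sample from f = N(0, 1).
basic <- replicate(reps, mean(rnorm(N) > 3))

# Importance sampling: sample from g = N(4, 1) and reweight by f / g.
importance <- replicate(reps, {
  X <- rnorm(N, mean = 4, sd = 1)
  w <- (X > 3) * dnorm(X) / dnorm(X, mean = 4)   # h(X) f(X) / g(X)
  mean(w)
})

truth <- pnorm(3, lower.tail = FALSE)
print(rbind(basic      = c(mean = mean(basic),      sd = sd(basic)),
            importance = c(mean = mean(importance), sd = sd(importance)),
            truth      = c(mean = truth,            sd = NA)))
```

The importance sampling estimates cluster much more tightly around the true value because nearly every draw from $g$ lands in the region where $h$ is nonzero.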
Example 25.7 (Measurement Model With Outliers) Suppose we have measurements $X_1, \ldots, X_n$ of some physical quantity $\theta$. We might model this as
$$X_i = \theta + \epsilon_i.$$
If we assume that $\epsilon_i \sim N(0, 1)$ then $X_i \sim N(\theta, 1)$. However, when taking measurements, it is often the case that we get the occasional wild observation, or outlier. This suggests that a Normal might be a poor model since Normals have thin tails which implies that extreme observations are rare. One way to improve the model is to use a density for $\epsilon_i$ with a thicker tail, for example, a t-distribution with $\nu$ degrees of freedom which has the form
$$t(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \left( 1 + \frac{x^2}{\nu} \right)^{-(\nu+1)/2}.$$
Smaller values of $\nu$ correspond to thicker tails. For the sake of illustration we will take $\nu = 3$. Suppose we observe $X_i = \theta + \epsilon_i$, $i = 1, \ldots, n$ where $\epsilon_i$ has a t distribution with $\nu = 3$. We will take a flat prior on $\theta$. The likelihood is $L(\theta) = \prod_{i=1}^n t(X_i - \theta)$ and the posterior mean of $\theta$ is
$$\bar\theta = \frac{\int \theta L(\theta)\,d\theta}{\int L(\theta)\,d\theta}.$$
We can estimate the top and bottom integral using importance sampling. We draw $\theta_1, \ldots, \theta_N \sim g$ and then
$$\bar\theta \approx \frac{\frac{1}{N} \sum_{j=1}^N \frac{\theta_j L(\theta_j)}{g(\theta_j)}}{\frac{1}{N} \sum_{j=1}^N \frac{L(\theta_j)}{g(\theta_j)}}.$$
To illustrate the idea, I drew $n = 2$ observations. The posterior mean (computed numerically) is -0.54. Using a Normal importance sampler $g$ yields an estimate of -0.74. Using a Cauchy (t-distribution with 1 degree of freedom) importance sampler yields an estimate of -0.53.
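Here is a sketch of the ratio estimator with a Cauchy importance sampler. The two observations in `x.obs` are made up for illustration (the data used in the example are not given), and `integrate` is used only as a numerical check on the answer.

```r
# Sketch: posterior mean under a t(3) measurement model via importance sampling.
set.seed(6)
x.obs <- c(-1.2, 0.3)        # hypothetical data, n = 2
nu <- 3

# Likelihood L(theta) = prod_i t(X_i - theta), vectorized over theta.
lik <- function(theta) sapply(theta, function(t) prod(dt(x.obs - t, df = nu)))

N <- 50000
theta <- rcauchy(N)                    # draws from the Cauchy sampler g
w <- lik(theta) / dcauchy(theta)       # L(theta_j) / g(theta_j)
post.mean.is <- sum(theta * w) / sum(w)

# Numerical check of the same ratio of integrals.
num <- integrate(function(t) t * lik(t), -Inf, Inf)$value
den <- integrate(lik, -Inf, Inf)$value
print(c(importance = post.mean.is, numerical = num / den))
```

The Cauchy has thicker tails than the posterior, so the weights `w` stay bounded; that is the property the thick-tail rule above asks for.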
25.4 MCMC Part I: The Metropolis-Hastings Algorithm
Consider again the problem of estimating the integral $I = \int h(x) f(x)\,dx$. In this chapter we introduce Markov chain Monte Carlo (MCMC) methods. The idea is to construct a Markov chain $X_1, X_2, \ldots$ whose stationary distribution is $f$. Under certain conditions it will then follow that
$$\frac{1}{N} \sum_{i=1}^N h(X_i) \xrightarrow{p} E_f(h(X)) = I.$$
This works because there is a law of large numbers for Markov chains called the ergodic theorem.
The Metropolis-Hastings algorithm is a specific MCMC method that works as follows. Let $q(y \mid x)$ be an arbitrary, friendly distribution (i.e. we know how to sample from $q(y \mid x)$). The conditional density $q(y \mid x)$ is called the proposal distribution. The Metropolis-Hastings algorithm creates a sequence of observations $X_0, X_1, \ldots$ as follows.
Metropolis-Hastings Algorithm. Choose $X_0$ arbitrarily. Suppose we have generated $X_0, X_1, \ldots, X_i$. To generate $X_{i+1}$ do the following:
(1) Generate a proposal or candidate value $Y \sim q(y \mid X_i)$.
(2) Evaluate $r \equiv r(X_i, Y)$ where
$$r(x, y) = \min\left\{ \frac{f(y)}{f(x)} \frac{q(x \mid y)}{q(y \mid x)}, 1 \right\}.$$
(3) Set
$$X_{i+1} = \begin{cases} Y & \text{with probability } r \\ X_i & \text{with probability } 1 - r. \end{cases}$$
REMARK 1: A simple way to execute step (3) is to generate $U \sim \text{Unif}(0, 1)$. If $U < r$ set $X_{i+1} = Y$, otherwise set $X_{i+1} = X_i$.
REMARK 2: A common choice for $q(y \mid x)$ is $N(x, b^2)$ for some $b > 0$. This means that the proposal is drawn from a Normal, centered at the current value.
REMARK 3: If the proposal density $q$ is symmetric, $q(y \mid x) = q(x \mid y)$, then $r$ simplifies to
$$r = \min\left\{ \frac{f(Y)}{f(X_i)}, 1 \right\}.$$
The Normal proposal distribution mentioned in Remark 2 is an example of a symmetric proposal density.
By construction, $X_0, X_1, \ldots$ is a Markov chain. But why does this Markov chain have $f$ as its stationary distribution? Before we explain why, let us first do an example.
Example 25.8 The Cauchy distribution has density
$$f(x) = \frac{1}{\pi} \frac{1}{1 + x^2}.$$
Our goal is to simulate a Markov chain whose stationary distribution is $f$. As suggested in the remark above, we take $q(y \mid x)$ to be a $N(x, b^2)$. So in this case,
$$r(x, y) = \min\left\{ \frac{f(y)}{f(x)}, 1 \right\} = \min\left\{ \frac{1 + x^2}{1 + y^2}, 1 \right\}.$$
So the algorithm is to draw $Y \sim N(X_i, b^2)$ and set
$$X_{i+1} = \begin{cases} Y & \text{with probability } r(X_i, Y) \\ X_i & \text{with probability } 1 - r(X_i, Y). \end{cases}$$
The R code is very simple:
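metrop <- function(N,b){
x <- rep(0,N)   ## chain starts at x[1] = 0
for(i in 2:N){
y <- rnorm(1,x[i-1],b)   ## proposal centered at the current value
r <- (1+x[i-1]^2)/(1+y^2)   ## f(y)/f(x); values above 1 always accept
u <- runif(1)
if(u < r)x[i] <- y else x[i] <- x[i-1]
}
return(x)
}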
The simulator requires a choice of $b$. Figure 25.3 shows three chains of length $N = 1000$ using $b = .1$, $b = 1$ and $b = 10$. The histograms of the simulated values are shown in Figure 25.4. Setting $b = .1$ forces the chain to take small steps. As a result, the chain doesn't "explore" much of the sample space. The histogram from the sample does not approximate the true density very well. Setting $b = 10$ causes the proposals to often be far in the tails, making $r$ small and hence we reject the proposal and keep the chain at its current position. The result is that the chain "gets stuck" at the same place quite often. Again, this means that the histogram from the sample does not approximate the true density very well. The middle choice avoids these extremes and results in a Markov chain sample that better represents the density sooner. In summary, there are tuning parameters and the efficiency of the chain depends on these parameters. We'll discuss this in more detail later.
If the sample from the Markov chain starts to "look like" the target distribution $f$ quickly, then we say that the chain is "mixing well." Constructing a chain that mixes well is somewhat of an art.
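One simple diagnostic for the effect of $b$ is the fraction of proposals that are accepted. The sketch below re-implements the Cauchy sampler with an acceptance counter; the function name `metrop.cauchy`, the chain length and the seed are my own choices, not part of the example above.

```r
# Sketch: acceptance rates of the random-walk Metropolis sampler for a Cauchy target.
set.seed(7)
metrop.cauchy <- function(N, b) {
  x <- numeric(N)                  # chain starts at 0
  accepted <- 0
  for (i in 2:N) {
    y <- rnorm(1, x[i - 1], b)                    # proposal
    r <- min((1 + x[i - 1]^2) / (1 + y^2), 1)     # acceptance probability
    if (runif(1) < r) {
      x[i] <- y
      accepted <- accepted + 1
    } else {
      x[i] <- x[i - 1]
    }
  }
  list(x = x, rate = accepted / (N - 1))
}

rates <- sapply(c(0.1, 1, 10), function(b) metrop.cauchy(5000, b)$rate)
names(rates) <- c("b=0.1", "b=1", "b=10")
print(rates)
```

Small steps are accepted almost always and large steps rarely, which matches the slow exploration and the "stuck" behaviour described above.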
Why It Works. Recall from the previous chapter that a distribution $\pi$ satisfies detailed balance for a Markov chain if
$$p_{ij} \pi_i = p_{ji} \pi_j.$$
We then showed that if $\pi$ satisfies detailed balance, then it is a stationary distribution for the chain.
Because we are now dealing with continuous state Markov chains, we will change notation a little and write $p(x, y)$ for the probability of making a transition from $x$ to $y$. Also, let's use $f(x)$ instead of $\pi$ for a distribution. In this new notation, $f$ is a stationary distribution if $f(x) = \int f(y) p(y, x)\,dy$ and detailed balance holds for $f$ if
$$f(x) p(x, y) = f(y) p(y, x).$$
Detailed balance implies that $f$ is a stationary distribution since, if detailed balance holds, then
$$\int f(y) p(y, x)\,dy = \int f(x) p(x, y)\,dy = f(x) \int p(x, y)\,dy = f(x)$$
FIGURE 25.3. Three Metropolis chains corresponding to b = .1, b = 1, b = 10.
FIGURE 25.4. Histograms from the three chains. The true density is also plotted.
which shows that $f(x) = \int f(y) p(y, x)\,dy$ as required. Our goal is to show that $f$ satisfies detailed balance which will imply that $f$ is a stationary distribution for the chain.
Consider two points $x$ and $y$. Either
$$f(x) q(y \mid x) < f(y) q(x \mid y) \quad \text{or} \quad f(x) q(y \mid x) > f(y) q(x \mid y).$$
We will ignore ties which we can do in the continuous setting. Without loss of generality, assume that $f(x) q(y \mid x) > f(y) q(x \mid y)$. This implies that
$$r(x, y) = \frac{f(y)}{f(x)} \frac{q(x \mid y)}{q(y \mid x)}$$
and that $r(y, x) = 1$. Now $p(x, y)$ is the probability of jumping from $x$ to $y$. This requires two things: (i) the proposal distribution must generate $y$ and (ii) you must accept $y$. Thus,
$$p(x, y) = q(y \mid x) r(x, y) = q(y \mid x) \frac{f(y)}{f(x)} \frac{q(x \mid y)}{q(y \mid x)} = \frac{f(y)}{f(x)} q(x \mid y).$$
Therefore,
$$f(x) p(x, y) = f(y) q(x \mid y). \qquad (25.1)$$
On the other hand, $p(y, x)$ is the probability of jumping from $y$ to $x$. This requires two things: (i) the proposal distribution must generate $x$ and (ii) you must accept $x$. This occurs with probability $p(y, x) = q(x \mid y) r(y, x) = q(x \mid y)$. Hence,
$$f(y) p(y, x) = f(y) q(x \mid y). \qquad (25.2)$$
Comparing (25.1) and (25.2), we see that we have shown that detailed balance holds.
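The argument can be checked numerically on a small discrete state space, where the Metropolis-Hastings transition matrix can be written out in full. The target `f` and the randomly generated proposal matrix `q` below are arbitrary choices made for this sketch.

```r
# Sketch: Metropolis-Hastings on a 4-state space, checking detailed balance.
f <- c(0.1, 0.2, 0.3, 0.4)                 # hypothetical target distribution
K <- length(f)
set.seed(8)
q <- matrix(runif(K * K), K, K)            # hypothetical proposal; q[x, y] = q(y | x)
q <- q / rowSums(q)

P <- matrix(0, K, K)
for (x in 1:K) {
  for (y in 1:K) {
    if (x != y) {
      r <- min(f[y] * q[y, x] / (f[x] * q[x, y]), 1)
      P[x, y] <- q[x, y] * r               # propose y, then accept it
    }
  }
}
diag(P) <- 1 - rowSums(P)                  # rejected proposals keep the chain at x

flow <- f * P                              # flow[x, y] = f(x) p(x, y)
print(max(abs(flow - t(flow))))            # detailed balance: should be about 0
print(max(abs(as.vector(f %*% P) - f)))    # stationarity: should be about 0
```

Both printed quantities are zero up to rounding error for any strictly positive proposal matrix, which is the content of the argument above.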
25.5 MCMC Part II: Different Flavors
There are different types of MCMC algorithm. Here we will consider a few of the most popular versions.
Random-walk-Metropolis-Hastings. In the previous section we considered drawing a proposal $Y$ of the form
$$Y = X_i + \epsilon_i$$
where $\epsilon_i$ comes from some distribution with density $g$. In other words, $q(y \mid x) = g(y - x)$. We saw that in this case, when $g$ is symmetric about 0,
$$r(x, y) = \min\left\{ 1, \frac{f(y)}{f(x)} \right\}.$$
This is called a random-walk-Metropolis-Hastings method. The reason for the name is that, if we did not do the accept-reject step, we would be simulating a random walk. The most common choice for $g$ is a $N(0, b^2)$. The hard part is choosing $b$ so that the chain mixes well. A good rule of thumb is: choose $b$ so that you accept the proposals about 50 per cent of the time.
Warning: This method doesn't make sense unless $X$ takes values on the whole real line. If $X$ is restricted to some interval then it is best to transform $X$, say $Y = m(X)$, where $Y$ takes values on the whole real line. For example, if $X \in (0, \infty)$ then you might take $Y = \log X$ and then simulate the distribution for $Y$ instead of $X$.
Independence-Metropolis-Hastings. This is like an importance-sampling version of MCMC. In this we draw the proposal from a fixed distribution $g$. Generally, $g$ is chosen to be an approximation to $f$. The acceptance probability becomes
$$r(x, y) = \min\left\{ 1, \frac{f(y)}{f(x)} \frac{g(x)}{g(y)} \right\}.$$
Gibbs Sampling. The two previous methods can be easily adapted, in principle, to work in higher dimensions. In practice, tuning the chains to make them mix well is hard. Gibbs sampling is a way to turn a high-dimensional problem into several one dimensional problems.
Here's how it works for a bivariate problem. Suppose that $(X, Y)$ has density $f_{X,Y}(x, y)$. First, suppose that it is possible to simulate from the conditional distributions $f_{X \mid Y}(x \mid y)$ and $f_{Y \mid X}(y \mid x)$. Let $(X_0, Y_0)$ be starting values. Assume we have drawn $(X_0, Y_0), \ldots, (X_n, Y_n)$. Then the Gibbs sampling algorithm for getting $(X_{n+1}, Y_{n+1})$ is:
The $n + 1$ step in Gibbs sampling:
$$X_{n+1} \sim f_{X \mid Y}(x \mid Y_n)$$
$$Y_{n+1} \sim f_{Y \mid X}(y \mid X_{n+1})$$
This generalizes in the obvious way to higher dimensions.
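As a small illustration of the two-step update, here is a sketch for a standard bivariate Normal with correlation $\rho$, where both conditionals are known: $X \mid Y = y \sim N(\rho y, 1 - \rho^2)$ and symmetrically for $Y$. The target, the value of `rho` and the chain length are assumptions made for this sketch only.

```r
# Sketch: Gibbs sampling for a standard bivariate Normal with correlation rho.
set.seed(9)
rho <- 0.8          # hypothetical correlation
N <- 5000
x <- numeric(N); y <- numeric(N)   # starting values (X_0, Y_0) = (0, 0)
s <- sqrt(1 - rho^2)               # conditional standard deviation

for (n in 1:(N - 1)) {
  x[n + 1] <- rnorm(1, rho * y[n], s)       # X_{n+1} ~ f(x | Y_n)
  y[n + 1] <- rnorm(1, rho * x[n + 1], s)   # Y_{n+1} ~ f(y | X_{n+1})
}
print(c(mean.x = mean(x), mean.y = mean(y), cor = cor(x, y)))
```

Note that the update for `y[n + 1]` uses the freshly drawn `x[n + 1]`, not `x[n]`.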
Example 25.9 (Normal Hierarchical Model) Gibbs sampling is very useful for a class of models called hierarchical models. Here is a simple case. Suppose we draw a sample of $k$ cities. From each city we draw $n_i$ people and observe how many people $Y_i$ have a disease. Thus, $Y_i \sim \text{Binomial}(n_i, p_i)$. We are allowing for different disease rates in different cities. We can also think of the $p_i$'s as random draws from some distribution $F$. We can write this model in the following way:
$$P_i \sim F$$
$$Y_i \mid P_i = p_i \sim \text{Binomial}(n_i, p_i).$$
We are interested in estimating the $p_i$'s and the overall disease rate $\int p\,dF(p)$.
To proceed, it will simplify matters if we make some transformations that allow us to use some Normal approximations. Let $\hat p_i = Y_i / n_i$. Recall that $\hat p_i \approx N(p_i, s_i^2)$ where $s_i = \sqrt{\hat p_i (1 - \hat p_i) / n_i}$. Let $\psi_i = \log(p_i / (1 - p_i))$ and define $Z_i \equiv \hat\psi_i = \log(\hat p_i / (1 - \hat p_i))$. By the delta method,
$$\hat\psi_i \approx N(\psi_i, \sigma_i^2)$$
where $\sigma_i^2 = 1 / (n_i \hat p_i (1 - \hat p_i))$. Experience shows that the Normal approximation for $\psi$ is more accurate than the Normal approximation for $p$ so we shall work with $\psi$. We shall treat $\sigma_i$ as known. Furthermore, we shall take the distribution of the $\psi_i$'s to be Normal. The hierarchical model is now
$$\psi_i \sim N(\mu, \tau^2)$$
$$Z_i \mid \psi_i \sim N(\psi_i, \sigma_i^2).$$
As yet another simplification we take $\tau = 1$. The unknown parameters are $\theta = (\mu, \psi_1, \ldots, \psi_k)$. The likelihood function is
$$L(\theta) \propto \prod_i f(\psi_i \mid \mu) \prod_i f(Z_i \mid \psi_i) \propto \prod_i \exp\left\{ -\frac{1}{2} (\psi_i - \mu)^2 \right\} \exp\left\{ -\frac{1}{2\sigma_i^2} (Z_i - \psi_i)^2 \right\}.$$
If we use the prior $f(\mu) \propto 1$ then the posterior is proportional to the likelihood. To use Gibbs sampling, we need to find the conditional distribution of each parameter conditional on all the others. Let us begin by finding $f(\mu \mid \text{rest})$ where "rest" refers to all the other variables. We can throw away any terms that don't involve $\mu$. Thus,
$$f(\mu \mid \text{rest}) \propto \prod_i \exp\left\{ -\frac{1}{2} (\psi_i - \mu)^2 \right\} \propto \exp\left\{ -\frac{k}{2} (\mu - b)^2 \right\}$$
where
$$b = \frac{1}{k} \sum_i \psi_i.$$
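Hence we see that $\mu \mid \text{rest} \sim N(b, 1/k)$. Next we will find $f(\psi_i \mid \text{rest})$. Again, we can throw away any terms not involving $\psi_i$ leaving us with
$$f(\psi_i \mid \text{rest}) \propto \exp\left\{ -\frac{1}{2} (\psi_i - \mu)^2 \right\} \exp\left\{ -\frac{1}{2\sigma_i^2} (Z_i - \psi_i)^2 \right\} \propto \exp\left\{ -\frac{1}{2 d_i^2} (\psi_i - e_i)^2 \right\}$$
where
$$e_i = \frac{\frac{Z_i}{\sigma_i^2} + \mu}{1 + \frac{1}{\sigma_i^2}} \quad \text{and} \quad d_i^2 = \frac{1}{1 + \frac{1}{\sigma_i^2}}$$
and so $\psi_i \mid \text{rest} \sim N(e_i, d_i^2)$. The Gibbs sampling algorithm then involves iterating the following steps $N$ times:
draw $\mu \sim N(b, v^2)$
draw $\psi_1 \sim N(e_1, d_1^2)$
...
draw $\psi_k \sim N(e_k, d_k^2)$
where $v^2 = 1/k$. It is understood that at each step, the most recently drawn version of each variable is used.
Let's now consider a numerical example. Suppose that there are $k = 20$ cities and we sample $n = 20$ people from each city. The R function for this problem is: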
gibbs.fun <- function(y,n,N){
### y = disease counts, n = sample sizes, N = number of Gibbs iterations
k <- length(y)
p.hat <- y/n
Z <- log(p.hat/(1-p.hat))   ### requires 0 < y < n in every city
sigma <- sqrt(1/(n*p.hat*(1-p.hat)))
v <- sqrt(1/k)   ### sd of mu given rest when tau = 1
mu <- rep(0,N)
psi <- matrix(0,N,k)
for(i in 2:N){
### draw mu given rest
b <- mean(psi[i-1,])
mu[i] <- rnorm(1,b,v)
### draw psi given rest
e <- (Z/sigma^2 + mu[i])/(1 + (1/sigma^2))
d <- sqrt(1/(1 + (1/sigma^2)))
psi[i,] <- rnorm(k,e,d)
}
list(mu=mu,psi=psi)
}
After running the chain, we can convert each $\psi_i$ back into $p_i$
by way of $p_i = e^{\psi_i}/(1 + e^{\psi_i})$. The raw proportions are shown in
Figure 25.6. Figure 25.5 shows "trace plots" of the Markov chain
for $p_1$ and $\mu$. (We should also look at the plots for $p_2, \ldots, p_{20}$.)
Figure 25.6 shows the posterior for $\mu$ based on the simulated
values. The second panel of Figure 25.6 shows the raw proportions and the Bayes estimates. Note that the Bayes estimates
are "shrunk" together. The parameter $\tau$ controls the amount of
shrinkage. We set $\tau = 1$ but in practice, we should treat $\tau$ as another unknown parameter and let the data determine how much
shrinkage is needed.
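As a hedged illustration of the conversion and the shrinkage, the sketch below simulates data for k = 20 cities with n = 20 people each, runs a compact sampler built from the conditionals derived in the text (with tau = 1), and converts the draws of psi back to proportions. The true proportions, the seed, the burn-in length, and the helper name `run.gibbs` are assumptions made for illustration only.

```r
set.seed(1)                                # assumed seed, for reproducibility
k <- 20; n <- 20; N <- 1000
p.true <- runif(k, 0.3, 0.7)               # assumed true proportions
y <- rbinom(k, n, p.true)
y <- pmin(pmax(y, 1), n - 1)               # keep 0 < y < n so the log-odds are finite

run.gibbs <- function(y, n, N) {           # hypothetical helper name
  k <- length(y)
  p.hat <- y / n
  Z <- log(p.hat / (1 - p.hat))
  sigma2 <- 1 / (n * p.hat * (1 - p.hat))
  mu <- rep(0, N)
  psi <- matrix(0, N, k)
  for (i in 2:N) {
    mu[i] <- rnorm(1, mean(psi[i - 1, ]), sqrt(1 / k))   # mu | rest
    e <- (Z / sigma2 + mu[i]) / (1 + 1 / sigma2)         # mean of psi_i | rest
    d <- sqrt(1 / (1 + 1 / sigma2))                      # sd of psi_i | rest
    psi[i, ] <- rnorm(k, e, d)
  }
  list(mu = mu, psi = psi, p.hat = p.hat)
}

out <- run.gibbs(y, n, N)
keep <- 101:N                              # assumed burn-in of 100 draws
p.draws <- exp(out$psi[keep, ]) / (1 + exp(out$psi[keep, ]))
p.bayes <- colMeans(p.draws)               # posterior mean of each p_i
print(round(c(raw.sd = sd(out$p.hat), bayes.sd = sd(p.bayes)), 3))
```

The Bayes estimates will typically be less spread out than the raw proportions, which is the shrinkage described in the text.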
So far we assumed that we know how to draw samples from
the conditionals $f_{X|Y}(x \mid y)$ and $f_{Y|X}(y \mid x)$. If we don't know how,
we can still use the Gibbs sampling algorithm by drawing each
observation using a Metropolis-Hastings step. Let $q$ be a proposal distribution for $x$ and let $\tilde{q}$ be a proposal distribution for
$y$. When we do a Metropolis step for $X$ we treat $Y$ as fixed.
Similarly, when we do a Metropolis step for $Y$ we treat $X$ as
fixed. Here are the steps:
[Figure: trace plots of the simulated values of p1 (top) and of mu (bottom) against iteration, 0 to 1000.]
FIGURE 25.5. Simulated values of $p_1$ and $\mu$.
[Figure: histogram of mu (top); raw estimates and Bayes estimates (bottom).]
FIGURE 25.6. Top panel: posterior histogram of $\mu$.
Lower panel: raw proportions and the Bayes posterior estimates.
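Metropolis within Gibbs:
(1a) Draw a proposal $Z \sim q(z \mid X_n)$.
(1b) Evaluate
$$r = \min\left\{\frac{f(Z, Y_n)}{f(X_n, Y_n)}\,\frac{q(X_n \mid Z)}{q(Z \mid X_n)},\ 1\right\}.$$
(1c) Set
$$X_{n+1} = \begin{cases} Z & \text{with probability } r \\ X_n & \text{with probability } 1 - r. \end{cases}$$
(2a) Draw a proposal $Z \sim \tilde{q}(z \mid Y_n)$.
(2b) Evaluate
$$r = \min\left\{\frac{f(X_{n+1}, Z)}{f(X_{n+1}, Y_n)}\,\frac{\tilde{q}(Y_n \mid Z)}{\tilde{q}(Z \mid Y_n)},\ 1\right\}.$$
(2c) Set
$$Y_{n+1} = \begin{cases} Z & \text{with probability } r \\ Y_n & \text{with probability } 1 - r. \end{cases}$$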
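The steps above can be sketched in R as follows. This is a minimal illustration, not a definitive implementation: the target is an assumed toy density (a bivariate Normal with correlation 0.5, known only up to a constant), both proposals are symmetric Normal random walks so the proposal ratio in (1b) and (2b) cancels, and the seed, step size, burn-in, and the helper name `mwg` are all assumptions.

```r
set.seed(2)                                   # assumed seed
rho <- 0.5                                    # assumed correlation of the toy target
## log of the unnormalized target density f(x, y)
log.f <- function(x, y) -(x^2 - 2 * rho * x * y + y^2) / (2 * (1 - rho^2))

mwg <- function(N, step = 1) {                # hypothetical helper name
  X <- numeric(N); Y <- numeric(N)
  for (n in 1:(N - 1)) {
    ## (1a)-(1c): update X with Y held fixed
    z <- rnorm(1, X[n], step)
    r <- min(exp(log.f(z, Y[n]) - log.f(X[n], Y[n])), 1)
    X[n + 1] <- if (runif(1) < r) z else X[n]
    ## (2a)-(2c): update Y with the new X held fixed
    z <- rnorm(1, Y[n], step)
    r <- min(exp(log.f(X[n + 1], z) - log.f(X[n + 1], Y[n])), 1)
    Y[n + 1] <- if (runif(1) < r) z else Y[n]
  }
  cbind(X = X, Y = Y)
}

chain <- mwg(20000)
burned <- chain[-(1:1000), ]                  # assumed burn-in of 1000 draws
print(round(cor(burned), 2))                  # off-diagonal should be near rho
```

With a non-symmetric proposal, the ratio $q(X_n \mid Z)/q(Z \mid X_n)$ would have to be kept in the acceptance probability.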
Again, this generalizes to more than two dimensions.

Many inferential problems can be identified as being one of three types:
estimation, confidence sets, or hypothesis testing.

Point estimation refers to providing a single "best guess" of some quantity
of interest. The quantity of interest could be a parameter in a parametric
model, a CDF $F$, a probability density function $f$, a regression function $r$, or
a prediction for a future value $Y$ of some random variable.
By convention, we denote a point estimate of $\theta$ by $\hat\theta$.
Remember that $\theta$ is a fixed, unknown quantity. The estimate $\hat\theta$ depends on the
data so $\hat\theta$ is a random variable.

A confidence set $C_n$ is a set that contains a quantity of interest $\theta$ with
some prescribed probability $1 - \alpha$. For example, we might want to construct
an interval $C_n = (a, b)$ that traps an unknown parameter 95 percent of the
time.

In hypothesis testing, we start with some default theory, called a null
hypothesis, and we ask if the data provide sufficient evidence to reject the
theory. If not we retain the null hypothesis.
Note: The term "retaining the null hypothesis" is due to Chris Genovese.
Other terminology is "accepting the null" or "failing to reject the null."

Example. Suppose $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$ denote $n$ independent
coin flips. A reasonable point estimate of $p$ is $\hat p_n = n^{-1}\sum_{i=1}^n X_i$. A 95
percent confidence interval for $p$ is $C_n = (\hat p_n - \epsilon_n, \hat p_n + \epsilon_n)$, where the
half-width $\epsilon_n$ is computed from the data. To test the null hypothesis that the coin is fair, that
is, $p = 1/2$, we can reject the null hypothesis when $|\hat p_n - 1/2|$
is larger than a threshold. The justification for these
procedures will be the subject of the next few chapters.
Point Estimation

Let $X_1, \ldots, X_n$ be $n$ iid data points from some distribution $F$. A point
estimator $\hat\theta_n$ of a parameter $\theta$ is some function of $X_1, \ldots, X_n$:
$$\hat\theta_n = g(X_1, \ldots, X_n).$$
We define
$$\text{bias}(\hat\theta_n) = E_\theta(\hat\theta_n) - \theta$$
to be the bias of $\hat\theta_n$. We say that $\hat\theta_n$ is unbiased if $E(\hat\theta_n) = \theta$. Unbiasedness
used to receive much attention but these days it is not considered very
important; many of the estimators we will use are biased. A point estimator
$\hat\theta_n$ of a parameter $\theta$ is consistent if $\hat\theta_n \xrightarrow{P} \theta$. Consistency is a reasonable
requirement for estimators. The distribution of $\hat\theta_n$ is called the sampling
distribution. The standard deviation of $\hat\theta_n$ is called the standard error,
denoted by se:
$$\text{se} = \text{se}(\hat\theta_n) = \sqrt{V(\hat\theta_n)}.$$
Often, it is not possible to compute the standard error but usually we can
estimate the standard error. The estimated standard error is denoted by $\widehat{\text{se}}$.

Example. Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$ and let $\hat p_n = n^{-1}\sum_i X_i$. Then
$E(\hat p_n) = n^{-1}\sum_i E(X_i) = p$ so $\hat p_n$ is unbiased. The standard error is
$\text{se} = \sqrt{V(\hat p_n)} = \sqrt{p(1-p)/n}$. The estimated standard error is
$\widehat{\text{se}} = \sqrt{\hat p_n(1-\hat p_n)/n}$.
The quality of a point estimate is sometimes assessed by the mean
squared error, or MSE, defined by
$$\text{MSE} = E_\theta(\hat\theta_n - \theta)^2.$$
Recall that $E_\theta(\cdot)$ refers to expectation with respect to the distribution
$$f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^n f(x_i; \theta)$$
that generated the data. It does not mean we are averaging over a density
for $\theta$.

Theorem. The MSE can be written as
$$\text{MSE} = \text{bias}^2(\hat\theta_n) + V_\theta(\hat\theta_n).$$

Proof. Let $\bar\theta_n = E_\theta(\hat\theta_n)$. Then
$$
\begin{aligned}
E_\theta(\hat\theta_n - \theta)^2 &= E_\theta(\hat\theta_n - \bar\theta_n + \bar\theta_n - \theta)^2 \\
&= E_\theta(\hat\theta_n - \bar\theta_n)^2 + 2(\bar\theta_n - \theta)E_\theta(\hat\theta_n - \bar\theta_n) + E_\theta(\bar\theta_n - \theta)^2 \\
&= (\bar\theta_n - \theta)^2 + E_\theta(\hat\theta_n - \bar\theta_n)^2 \\
&= \text{bias}^2(\hat\theta_n) + V_\theta(\hat\theta_n).
\end{aligned}
$$

Theorem. If bias $\to 0$ and se $\to 0$ as $n \to \infty$ then $\hat\theta_n$ is consistent, that
is, $\hat\theta_n \xrightarrow{P} \theta$.

Proof. If bias $\to 0$ and se $\to 0$ then, by the preceding theorem,
MSE $\to 0$. It follows that $\hat\theta_n \xrightarrow{qm} \theta$. (Recall the definition of convergence in
quadratic mean.) The result follows because convergence in quadratic mean
implies convergence in probability.

Example. Returning to the coin flipping example, we have that $E(\hat p_n) = p$
so that the bias $= p - p = 0$ and $\text{se} = \sqrt{p(1-p)/n} \to 0$. Hence,
$\hat p_n \xrightarrow{P} p$, that is, $\hat p_n$ is a consistent estimator.

Consider data pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$ from a regression model
$$Y = r(X) + \epsilon$$
where $E(\epsilon) = 0$ and $r$ is an unknown regression function. An estimate of $r$
is
$$\hat r_n(x) = \text{average of all } Y_i \text{ such that } |X_i - x| \le h$$
for some $h$. Notice that for every $x$ we have an unknown parameter
$r(x)$ and an estimator $\hat r_n(x)$. The integrated mean squared error is defined by
$$\text{IMSE} = \int \text{MSE}(\hat r_n(x))\,dx = \int \text{bias}^2(\hat r_n(x))\,dx + \int V(\hat r_n(x))\,dx
= \text{integrated bias}^2 + \text{integrated variance}.$$
Small values of $h$ cause small bias but large variance. Large values of $h$ cause
large bias but small variance. This leads to the bias-variance tradeoff that
we will discuss in detail when we cover nonparametric regression.

In many cases, a point estimate has an asymptotic Normal distribution,
that is,
$$\hat\theta_n \approx N(\theta, \text{se}^2).$$
This is a fact we will often make use of.
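The mean squared error decomposition and the standard error formula for the coin flipping example can be checked by simulation. The sketch below is illustrative only: the values of p, n, the number of replications, the seed, the contrasting biased estimator `p.shr`, and the helper name `decompose` are all assumptions, not part of the text.

```r
set.seed(3)                                   # assumed seed
p <- 0.3; n <- 25; reps <- 20000              # assumed values for illustration
x.sum <- rbinom(reps, n, p)                   # sum of n Bernoulli(p) draws, repeated
p.hat <- x.sum / n                            # the unbiased estimator from the text
p.shr <- (x.sum + 1) / (n + 2)                # an assumed biased estimator, for contrast

decompose <- function(est, truth) {           # hypothetical helper name
  bias <- mean(est) - truth
  variance <- mean((est - mean(est))^2)       # divide-by-reps variance
  c(mse = mean((est - truth)^2), bias2.plus.var = bias^2 + variance)
}

print(rbind(p.hat = decompose(p.hat, p), p.shr = decompose(p.shr, p)))
print(c(simulated.se = sd(p.hat), theory.se = sqrt(p * (1 - p) / n)))
```

The two columns agree for both estimators, and the simulated standard deviation of `p.hat` should be close to $\sqrt{p(1-p)/n}$.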
Confidence Sets

A $1 - \alpha$ confidence interval for a parameter $\theta$ is an interval $C_n = (a, b)$
where $a = a(X_1, \ldots, X_n)$ and $b = b(X_1, \ldots, X_n)$ are functions of the data
such that
$$P_\theta(\theta \in C_n) \ge 1 - \alpha, \quad \text{for all } \theta \in \Theta.$$
In words, $(a, b)$ traps $\theta$ with probability $1 - \alpha$. We call $1 - \alpha$ the coverage
of the confidence interval. Commonly, people use 95 percent confidence
intervals, which corresponds to choosing $\alpha = 0.05$. Note: $C_n$ is random
and $\theta$ is fixed! If $\theta$ is a vector then we use a confidence set (such as a sphere
or an ellipse) instead of an interval.

Warning! There is much confusion about how to interpret a confidence
interval. A confidence interval is not a probability statement about $\theta$ since
$\theta$ is a fixed quantity, not a random variable. Some texts interpret confidence
intervals as follows: if I repeat the experiment over and over, the interval will
contain the parameter 95 percent of the time. This is correct but useless since
we rarely repeat the same experiment over and over. A better interpretation
is this:

On day 1, you collect data and construct a 95 percent confidence
interval for a parameter $\theta_1$. On day 2, you collect new data and
construct a 95 percent confidence interval for an unrelated parameter
$\theta_2$. On day 3, you collect new data and construct a 95 percent
confidence interval for an unrelated parameter $\theta_3$. You continue this
way constructing confidence intervals for a sequence of unrelated
parameters $\theta_1, \theta_2, \ldots$. Then 95 percent of your intervals will trap the
true parameter value. There is no need to introduce the idea of
repeating the same experiment over and over.

Example. Every day, newspapers report opinion polls. For example, they
might say that "83 percent of the population favor arming pilots with guns."
Usually, you will see a statement like "this poll is accurate to within 4 points
95 percent of the time." They are saying that $83 \pm 4$ is a 95 percent
confidence interval for the true but unknown proportion $p$ of people who favor
arming pilots with guns. If you form a confidence interval this way every day
for the rest of your life, 95 percent of your intervals will contain the true
parameter. This is true even though you are estimating a different quantity
(a different poll question) every day.

Later, we will discuss Bayesian methods in which we treat $\theta$ as if it were
a random variable and we do make probability statements about $\theta$. In particular,
we will make statements like "the probability that $\theta$ is in $C_n$, given the
data, is 95 percent." However, these Bayesian intervals refer to degree-of-belief
probabilities. These Bayesian intervals will not, in general, trap the
parameter 95 percent of the time.
Example. In the coin flipping setting, let $C_n = (\hat p_n - \epsilon_n, \hat p_n + \epsilon_n)$ where
$\epsilon_n^2 = \log(2/\alpha)/(2n)$. From Hoeffding's inequality it follows that
$$P(p \in C_n) \ge 1 - \alpha$$
for every $p$. Hence, $C_n$ is a $1 - \alpha$ confidence interval.

As mentioned earlier, point estimators often have a limiting Normal
distribution, meaning that $\hat\theta_n \approx N(\theta, \widehat{\text{se}}^2)$. In this
case we can construct (approximate) confidence intervals as follows.

Theorem (Normal-based Confidence Interval). Suppose that
$\hat\theta_n \approx N(\theta, \widehat{\text{se}}^2)$. Let $\Phi$ be the CDF of a standard Normal and let
$z_{\alpha/2} = \Phi^{-1}(1 - (\alpha/2))$, that is, $P(Z > z_{\alpha/2}) = \alpha/2$ and
$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$ where $Z \sim N(0, 1)$. Let
$$C_n = (\hat\theta_n - z_{\alpha/2}\,\widehat{\text{se}},\ \hat\theta_n + z_{\alpha/2}\,\widehat{\text{se}}).$$
Then
$$P_\theta(\theta \in C_n) \to 1 - \alpha.$$

Proof. Let $Z_n = (\hat\theta_n - \theta)/\widehat{\text{se}}$. By assumption $Z_n \rightsquigarrow Z$ where
$Z \sim N(0, 1)$. Hence,
$$
\begin{aligned}
P_\theta(\theta \in C_n) &= P_\theta\left(\hat\theta_n - z_{\alpha/2}\,\widehat{\text{se}} < \theta < \hat\theta_n + z_{\alpha/2}\,\widehat{\text{se}}\right) \\
&= P_\theta\left(-z_{\alpha/2} < \frac{\hat\theta_n - \theta}{\widehat{\text{se}}} < z_{\alpha/2}\right) \\
&\to P(-z_{\alpha/2} < Z < z_{\alpha/2}) \\
&= 1 - \alpha.
\end{aligned}
$$

For 95 percent confidence intervals, $\alpha = 0.05$ and $z_{\alpha/2} = 1.96 \approx 2$, leading
to the approximate 95 percent confidence interval $\hat\theta_n \pm 2\,\widehat{\text{se}}$. We will discuss
the construction of confidence intervals in more generality in the rest of the
book.
Example. Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$ and let $\hat p_n = n^{-1}\sum_{i=1}^n X_i$. Then
$$V(\hat p_n) = n^{-2}\sum_{i=1}^n V(X_i) = n^{-2}\sum_{i=1}^n p(1-p) = n^{-2}\,n\,p(1-p) = p(1-p)/n.$$
Hence, $\text{se} = \sqrt{p(1-p)/n}$ and $\widehat{\text{se}} = \sqrt{\hat p_n(1-\hat p_n)/n}$. By the Central
Limit Theorem, $\hat p_n \approx N(p, \widehat{\text{se}}^2)$. Therefore, an approximate $1 - \alpha$ confidence
interval is
$$\hat p_n \pm z_{\alpha/2}\,\widehat{\text{se}} = \hat p_n \pm z_{\alpha/2}\sqrt{\frac{\hat p_n(1-\hat p_n)}{n}}.$$
Compare this with the confidence interval in the previous example. The
Normal-based interval is shorter but it only has approximately (large sample)
correct coverage.
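The comparison between the Hoeffding-based interval and the Normal-based interval can be explored by simulation. The sketch below is only an illustration under assumed values of p, n, alpha, the number of replications, and the seed; it estimates the coverage and the half-width of both intervals.

```r
set.seed(4)                                   # assumed seed
p <- 0.4; n <- 100; alpha <- 0.05; reps <- 10000   # assumed values
p.hat <- rbinom(reps, n, p) / n               # one p.hat per simulated data set

eps <- sqrt(log(2 / alpha) / (2 * n))         # Hoeffding half-width
z <- qnorm(1 - alpha / 2)                     # z_{alpha/2}, about 1.96
se.hat <- sqrt(p.hat * (1 - p.hat) / n)       # estimated standard error

cover.hoeff <- mean(abs(p.hat - p) < eps)         # coverage of the Hoeffding interval
cover.norm <- mean(abs(p.hat - p) < z * se.hat)   # coverage of the Normal-based interval
print(c(hoeffding = cover.hoeff, normal = cover.norm))
print(c(hoeffding.halfwidth = eps, normal.halfwidth = mean(z * se.hat)))
```

Under these assumed settings the Hoeffding interval is wider and covers at least 95 percent of the time, while the Normal-based interval is shorter with coverage only approximately 95 percent.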

Hypothesis Testing

Your colleague has written an algorithm for simulating flips of a fair coin.
You're suspicious of your colleague's programming skills. Let $X_1, \ldots, X_n$
denote $n$ simulated coin flips and let $p = P(X_i = 1)$. Let $H_0$ denote the
hypothesis that the simulator is correct and let $H_1$ denote the hypothesis
that the simulator is not fair. $H_0$ is called the null hypothesis and $H_1$ is
called the alternative hypothesis. We can write the hypotheses as
$$H_0: p = 1/2 \quad \text{versus} \quad H_1: p \ne 1/2.$$
If we generate many simulated flips and all of them are 1, we might reasonably take this
as evidence against $H_0$. This is an example of hypothesis testing. Later
we will see that common practice is to reject $H_0$ when $|\hat p_n - 1/2|$
exceeds a threshold.
We will discuss this and other hypothesis tests in detail in a later chapter.
Our definition of confidence interval requires that $P_\theta(\theta \in C_n) \ge 1 - \alpha$
for all $\theta \in \Theta$. A pointwise asymptotic confidence interval requires that
$\liminf_{n\to\infty} P_\theta(\theta \in C_n) \ge 1 - \alpha$ for all $\theta \in \Theta$. A uniform asymptotic
confidence interval requires that $\liminf_{n\to\infty} \inf_{\theta \in \Theta} P_\theta(\theta \in C_n) \ge 1 - \alpha$. The
approximate Normal-based interval is a pointwise asymptotic confidence interval.
In general, it might not be a uniform asymptotic confidence interval.