Colloquium Biometricum, 40

The Polish Biometric Society
The 40th International Biometrical Colloquium
and
Second Polish-Portuguese Workshop on Biometry
in honour of Prof. J.T. Mexia
Abstracts
29 August – 2 September 2010
Będlewo/Poznań, Poland
Contents
Katarzyna Ambroży, Iwona Mejza
Balance and Efficiency of some Augmented Split-Block-Plot Design
Ewa Bakinowska
Application of logistic models to the analysis of field experiments
Ewa Bakinowska, Anna Szczepańska
Detection of change point in the experiment with the winter wheat
Bilal Ahmad Bhat
On Gamma distribution with an application using S-Plus Software
Tadeusz Caliński
On the analysis of experiments in nested block designs
Elisabete Carolino, Isabel Barão
Comparison of Acceptance Sampling Plans for non-Gaussian variables with Acceptance
Sampling Plans for Gaussian variables (obtained by Box-Cox transformation)
Carlos A. Coelho, Joao T. Mexia
The Distribution of the Test Statistic for Testing the Equality of two Generalized Variances
in the Non-central Linear Case. An Example of Application of the results on the
Distribution of the Product of Independent Gamma-Ratio Random Variables
Anita Dobek
Different approaches to genetic epistasis
Miguel Fonseca
Linear models, analysis of variance and more
Bożena Gładyszewska, Izabela Kuna–Broniowska, Anna Ciupak
Analysis of Poisson’s ratio variation of tomato fruit peel
Janusz Gołaszewski, Anna Zaręba, Dariusz Załuski, Anna Imiołek,
Aneta Stawiana-Kosiorek, Tomasz Bieńkowski
A procedure for testing crop production technology
M. Ivette Gomes, M. Manuela Neves
Estimation of parameters of extreme events for random censored data
Darek Gozdowski, Wiesław Mądry, Adriana Derejko, Jan Rozbicki
Inference from a multiple series of two-factor experiments
in a split-block design
Dariusz Gozdowski, Stanisław Samborski, Eike Stefan Dobers
Evaluation of methods for the detection of spatial outliers in the yield data of winter wheat
Jolanta Grala-Michalak, Katarzyna Kaźmierczak
Discriminant analysis for Kraft’s classes of trees
Luís M. Grilo, Helena L. Grilo, António de Oliveira
Quantifying the (dis)agreement between two medicine measurements methods
Abdollah Hajivandi
Determining years of life lost (YLL) of leading death causes responsible for reduction
in life expectancy (Boushehr – Iran)
Zofia Hanusz, Joanna Tarasińska
Simulation study on multivariate normality based on Shapiro – Wilk statistics
Anna Imiołek, Janusz Gołaszewski, Dariusz Załuski
Surveys as a source of information on key agrotechnical factors
in winter rye (Secale cereale L.) production
Katarzyna Kaźmierczak, Witold Pazdrowski, Agnieszka Jędraszak,
Marek Szymański, Marcin Nawrot
Crown width of a tree and its relationships with age, height and diameter at breast
height based on common oak (Quercus robur L.)
Andrzej Kornacki, Katarzyna Ostroga
Application of the Akaike criterion to the selection of a normal distribution
Marcin Kozak, Agnieszka Wnuk, Dariusz Gozdowski, Zdzisław Wyszyński
Visualizing bivariate relationships with hexagonally binned data
Katarzyna Marczyńska, Stanisław Mejza
Unreplicated experiments in early stage breeding programs
João Tiago Mexia
Models and inference – the normal case
Amilcar Oliveira, Teresa Oliveira
Using R packages in experimental design
Dariusz Parys
Type I error rates in multiple testing
Wiesław Pilarczyk, Anna Fraś
On the precision of winter rape variety testing trials in Poland
Stanisław Pluta, Agnieszka Masny, Wiesław Mądry, Edward Żurawicz
Fruit crop breeding with using biometrical methods
Paulo C. Rodrigues, Ep Heuvelink, Marco Bink, Leo Marcelis, Fred van Eeuwijk
Crop growth modelling and QTL analysis of multilocation trials
Alicja Szabelska, Michał Siatkowski, Teresa Goszczurna, Joanna Zyprych
Overview of growth models in R
Agnieszka Tomkowiak, Alicja Szabelska, Joanna Zyprych, Zbigniew Broda,
Idzi Siatkowski
Analysis of the genetic diversity of white clover (Trifolium repens L.)
cultivars and clones using molecular markers
Joanna Ukalska, Krzysztof Ukalski, Jakub Borkowski
An application of the generalized linear models for an examination of the phenotypic
quality of roe deer
Dorota Weigt, Alicja Szabelska, Joanna Zyprych, Idzi Siatkowski,
Zbigniew Broda
Morphological analysis of inflorescence mutants in alfalfa (Medicago sativa L. s.l.)
with respect to seed yield traits
Bogna Zawieja, Wiesław Pilarczyk, Bogna Kowalczyk
Comparisons of uniformity decisions based on Coyu and Bennett’s methods –
simulated data
Joanna Zyprych, Alicja Szabelska, Idzi Siatkowski
Gene’s selection based on statistical tests
Balance and Efficiency of some Augmented Split-Block-Plot Design
Katarzyna Ambroży, Iwona Mejza
Department of Mathematical and Statistical Methods
Poznan University of Life Sciences
The paper presents a procedure for constructing an augmented split-block-plot design with
control subplot treatments. The modelling of the data takes into account the structure of the
experimental material and a four-step randomization scheme. The resulting randomization
model, with six strata, is analysed using the approach typical of multistratum experiments
with orthogonal block structure. A numerical example illustrates the construction method,
the statistical properties of the final design and their consequences for the analysis.
Application of logistic models to the analysis of field experiments
Ewa Bakinowska
Department of Mathematical and Statistical Methods
Poznań University of Life Sciences
Breeding new cereal varieties is a costly and lengthy (multi-year) process.
Initially, work focuses on obtaining new genetic material and multiplying it.
Comparative trials of the new genotypes are then carried out to detect lines
that show promise as new cereal varieties. These comparative trials consist of
unreplicated experiments with a large number of genotypes. At the stage of
unreplicated experiments a strict selection is made, and about 20-40% of the
genotypes are chosen for further experimental study. Subsequent experiments, with
fewer genotypes, are carried out in replicated experimental designs.
The main trait of the lines considered when selecting for further study is yield.
Nevertheless, for genotypes yielding at the same level, other, visually assessed traits play
an important role. These include plant height, uniformity, powdery mildew infection,
lodging, brown rust infection and leaf spot.
The aim of the work is to answer the question of how traits other than yield influence the
selection of lines for further (replicated) pre-preliminary and preliminary trials.
The experimental material consisted of spring barley data from an unreplicated
experiment conducted at the Plant Breeding Station
„Modzurów” (Szelejewo Group) in 2006. A logistic model was used for the analysis.
Detection of change point in the experiment with the winter wheat
Ewa Bakinowska, Anna Szczepańska
Department of Mathematical and Statistical Methods
Poznań University of Life Sciences
The aim is to present the estimation of a change point in the growth of biomass. A
nonparametric regression model was applied to the analysis of the experiment. The change
point was treated as an abrupt change in the response function. It was determined using the
theory presented by Paul L. Speckman (1994).
References
Paul L. Speckman (1994). Detection of change-points in the nonparametric regression.
On Gamma distribution with an application using S-Plus Software
Bilal Ahmad Bhat
Division of Sericulture
Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir
Mirgund, (J&K) India
The Gamma distribution is a natural extension of the exponential distribution and has appeared
in the literature since the early 1800s. Johnson and Kotz (1970) discuss this distribution and
include 130 references. It is one of the most commonly used statistical distributions
in practice, and a number of generalizations of it exist in the literature. In this
paper, another approach to deriving the Gamma distribution is suggested. Finally, numerical
illustrations using S-Plus software are given for real time data.
On the analysis of experiments in nested block designs
Tadeusz Caliński
Department of Mathematical and Statistical Methods
Poznań University of Life Sciences, Poland
Nested block designs are quite often used in practice, particularly in agricultural
experimentation. Their statistical properties have been considered in many papers, as
reviewed by Bailey (1999). Of special interest are those nested block designs which satisfy
the general balance property introduced by Nelder (1965) and discussed by several authors,
by Bailey (1994) and by Bogacka and Mejza (1994) in particular. The purpose of the present
paper is to give explicit formulae for analyzing an experiment carried out in a nested block
design having the general balance property. They follow from a randomization-derived mixed
model, decomposed into stratum submodels. Of particular interest is the combined analysis
allowing the information from higher strata to be recovered. The paper is essentially an
extension of some results presented in Caliński and Kageyama (2000).
References
Bailey, R. A. (1994). General balance: Artificial theory or practical relevance?
In: Caliński, T., Kala, R. (Eds.), Proc. Int. Conf. on Linear Statist. Inference
LINSTAT’93 (pp. 171-184). Kluwer Acad. Publ., Dordrecht.
Bailey, R. A. (1999). Choosing designs for nested blocks. Listy Biometryczne – Biometr.
Lett. 36, 85-126.
Bogacka, B. and Mejza, S. (1994). Optimality of generally balanced experimental block
designs. In: Caliński, T., Kala, R. (Eds.), Proc. Int. Conf. on Linear Statist. Inference
LINSTAT’93 (pp. 185-194). Kluwer Acad. Publ., Dordrecht.
Caliński, T. and Kageyama, S. (2000). Block Designs: A Randomization Approach,
Volume I: Analysis. Lecture Notes in Statistics, Volume 150. Springer, New York.
Nelder, J. A. (1965). The analysis of randomized experiments with orthogonal block
structure. Proc. Roy. Soc. Lond. Ser. A 283, 147-178.
Comparison of Acceptance Sampling Plans for non-Gaussian variables with
Acceptance Sampling Plans for Gaussian variables (obtained by Box-Cox
transformation)
Elisabete Carolino1, Isabel Barão2
1 ESTeSL, IPL, Portugal
2 DEIO, FCUL, Portugal
In the quality control of a production process (of goods and services), from a statistical point
of view, focus is either on the process itself with application of Statistical Process Control, or
on its frontiers, with application of Acceptance Sampling (AS) – studied here – and
Experimental Design. AS is used to inspect either the output process – final product – or the
input – initial product. The purpose of AS is to determine a course of action, not to estimate
lot quality. AS prescribes a procedure that, if applied to a series of lots, will give a specified
risk of accepting lots of given quality. In other words, AS yields quality assurance. An AS
plan merely accepts and rejects lots, considering sampling information. AS by variables is
based on the hypothesis that the observed quality characteristics follow a known distribution,
namely the Gaussian distribution (the classical case of AS by variables, treated in classical
standards). This is sometimes, however, an unwarranted assumption that leads to wrong decisions.
AS for non-Gaussian, mainly asymmetrical, variables is thus relevant. When we have a
non-Gaussian distribution we can build specific AS plans associated with that distribution. If the
real distribution of the data is very asymmetric and/or has heavy tails, but we are able to
adequately model the data and estimate its parameters, which is usually not easy, we can use
those specific AS plans. Alternatively, we can transform the original data
into normal values through a transformation of the Box-Cox type, which requires no prior
modelling of the data, and then use AS plans for the classical (Gaussian) case.
In this work we will address the problem of determining AS plans by variables for
Exponential distribution, Gamma distribution and Extreme Value distributions. Considering
the same sample, the acceptance sampling plans specific to each non-Gaussian variable will
be compared with acceptance sampling plans for Gaussian variables (after Box-Cox
transformation) in terms of acceptance rate of the lot. The results show advantages in
applying the Box-Cox transformations to normalize the data and then applying the acceptance
sampling plans for Gaussian variables.
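The Box-Cox route described above can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the acceptability constant k, the value of lambda and the sample are hypothetical, and in practice lambda would be estimated from the data and k taken from a sampling standard. The sketch transforms the data for a given lambda and then applies a one-sided variables acceptance rule (unknown standard deviation) on the transformed scale.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def accept_lot(sample, upper_limit, k, lam):
    """Accept the lot when mean + k * sd <= upper limit, all on the transformed scale.

    k and lam are assumed to be given (hypothetical inputs for this sketch).
    """
    y = box_cox(sample, lam)
    n = len(y)
    mean = sum(y) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    # The transform is increasing, so the specification limit is transformed the same way.
    limit = box_cox([upper_limit], lam)[0]
    return mean + k * sd <= limit
```

Because the Box-Cox transform is monotone increasing for every lambda, an upper specification limit on the original scale remains an upper limit after transformation, which is what makes the Gaussian rule applicable to the transformed data.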
The Distribution of the Test Statistic for Testing the Equality of two
Generalized Variances in the Non-central Linear Case
An Example of Application of the results on the Distribution of the
Product of Independent Gamma-Ratio Random Variables
Carlos A. Coelho, Joao T. Mexia
Mathematics Department - Faculdade de Ciencias e Tecnologia - Universidade Nova de
Lisboa
In this presentation we will study in detail the exact distribution of the likelihood ratio test
statistic for testing the equality of two generalized variances in the non-central linear case and
will also consider in detail some near-exact distributions for this same statistic. The results
obtained are based on the recent book "Product and Ratio of Generalized Gamma-Ratio
Random Variables: Exact and Near-exact Distributions - Applications" by the same authors.
Simulations and numerical studies are used to show the usefulness of the near-exact
distributions in handling such complicated distributions, as well as the closeness of these
distributions to the exact distribution.
Reference
Carlos A. Coelho, Joao T. Mexia (2010). Product and Ratio of Generalized Gamma-Ratio
Random Variables: Exact and Near-exact Distributions - Applications, LAP - Lambert
Academic Publishing AG & Co. KG, Saarbrücken, Germany (ISBN: 978-3-8383-5846-8).
Different approaches to genetic epistasis
Anita Dobek
Department of Mathematical and Statistical Methods
Poznań University of Life Sciences
In recent years much effort has been devoted to identifying genes that are responsible
for different quantitative traits, especially in medicine and biology. A very important
problem is the detection of genes which alone have a small influence on the phenotype. The
problem may be solved by analysing the interaction of such a gene with another one. On the
other hand, it is well known that gene-gene interaction, as well as gene-environment
interaction, plays a pivotal role in the development of an organism.
Due to the importance of this problem there is a huge literature dealing with genetic
epistasis. However, scientists representing different disciplines use different
definitions and terminology. Consequently, it is difficult to compare the statistical
tools proposed for the identification and estimation of gene-gene interaction effects.
A presentation of some important interpretations may facilitate the proper choice
of a statistical method used in this context.
Linear models, analysis of variance and more
Miguel Fonseca
In this work, I will discuss the work developed jointly with Prof. João Tiago Mexia.
Starting with linear mixed models, there were many incursions into estimation, hypothesis
testing and the construction of confidence regions for parameters in these models. Results on
mixed linear models will be presented, with emphasis on orthogonal models.
References
[1] Fonseca M., Mexia J.T., Zmyślony R. (2008). Inference in normal models with commutative
orthogonal block structure. Acta et Commentationes Universitatis Tartuensis de Mathematica
12, 3-16.
[2] Fonseca M., Mexia J., Zmyślony R. (2006). Binary operations on Jordan algebras and
orthogonal normal models. Linear Algebra and Its Applications 417, 75-86.
[3] Fonseca M., Mexia J.T., Zmyślony R. (2003). Exact Distributions for the Generalized F
Statistic. Discussiones Mathematicae – Probability and Statistics 22, 37-51.
[4] Fonseca M., Mexia J.T., Zmyślony R. (2003). Estimating and Testing of Variance
Components: an Application to a Grapevine Experiment. Biometrical Letters 40, 1-7.
[5] Fonseca M., Mexia J.T., Zmyślony R. (2003). Estimators and Tests for Variance
Components in Cross Nested Orthogonal Models. Discussiones Mathematicae – Probability
and Statistics 23, 173-201.
Analysis of Poisson’s ratio variation of tomato fruit peel
Bożena Gładyszewska1, Izabela Kuna–Broniowska2, Anna Ciupak1
1 Department of Physics
2 Department of Applied Mathematics and Informatics
University of Life Sciences of Lublin
The paper presents the results of studies on the effects of storage time and temperature on the
variation of Poisson's ratio of the skin of two varieties of greenhouse tomato: Admiro and Encore.
Poisson's ratio is one of the most important parameters determining the strength of a
material. It was noted that Poisson's ratio of the fruit peel of Admiro stored at 13 °C, the
temperature recommended by the Polish Standard, ranged from 0.7 to 0.8 in the initial period,
declined to about 0.6 after 10 days and remained at that level until the end of the experiment.
By contrast, the variety Encore was characterized by lower values and lower variability
of Poisson's ratio, which ranged from 0.4 to 0.5 during the storage period. The higher storage
temperature, 21 °C, reduced the duration of the investigation to 12 days, because
after this period it was not possible to separate a sample owing to the state of the structure
of the fruit surface. The value of Poisson's ratio for both varieties stored at room temperature
fluctuated around 0.5 throughout the entire experiment.
A procedure for testing crop production technology
Janusz Gołaszewski1, Anna Zaręba1, Dariusz Załuski1, Anna Imiołek1,
Aneta Stawiana-Kosiorek1, Tomasz Bieńkowski2
1 Department of Plant Breeding and Seed Production
University of Warmia and Mazury in Olsztyn, Poland
2 Household Agricultural Production Seed Central Ltd.
Profitable crop production requires a quick adaptation of technology to market demands.
Thus, a traditional technology ought to be modified or developed anew, which means that the key
agrotechnical factors which shape high yield or a property of the yield will be changed.
For testing a new technology we propose a three-stage approach:
1) detection of the key technology factors on the basis of survey results and
advanced experimental designs (FD, FFD),
2) implementation of those factors in a series of FFD on-farm experiments, and
3) estimation of the efficiency of the new technology as well as an economic analysis of profitability.
The approach is illustrated by empirical data obtained from testing green pea production
technology in north-eastern Poland. Based on the estimates of main and interaction effects,
three factors and their levels were implemented in three types of FFDs generated from
a 2^3 FD, which were finally located on six farms. The synthesis of the data gave
information on the statistical efficiency and economic profitability of changing each tested
agrotechnological factor. It was concluded that the suggested procedure may be implemented
in testing other crop production technologies, allowing for the specificity of a given crop.
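As an illustration of how fractions can be generated from a 2^3 design, the following minimal sketch enumerates the full factorial and splits it into its two half-fractions. It assumes that FD and FFD denote full and fractional factorial designs; the abstract does not say which fractions were used on the farms, so the defining relation I = ±ABC and all function names are assumptions made for this example only.

```python
from itertools import product


def full_factorial(k):
    """All 2**k runs of a two-level design, with levels coded -1 and +1."""
    return list(product((-1, 1), repeat=k))


def half_fraction(runs, sign):
    """Runs whose coded levels multiply to `sign` (for k = 3: I = sign * ABC)."""
    selected = []
    for run in runs:
        word = 1
        for level in run:
            word *= level
        if word == sign:
            selected.append(run)
    return selected


runs = full_factorial(3)                 # 8 runs of the 2^3 design
principal = half_fraction(runs, +1)      # 4 runs with ABC = +1
complementary = half_fraction(runs, -1)  # 4 runs with ABC = -1
```

Each half-fraction needs only four treatment combinations, which is the kind of saving that makes on-farm implementation feasible, at the cost of aliasing each main effect with a two-factor interaction.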
Estimation of parameters of extreme events
for random censored data
M. Ivette Gomes1, M. Manuela Neves2
1 Universidade de Lisboa, Faculdade de Ciências, DEIO e CEAUL
2 Universidade Técnica de Lisboa, ISA, e CEAUL
In the area of Statistics of Extremes we deal essentially with the estimation
of parameters of extreme events, like the probability of exceedance of a high level
or a high quantile, situated on the border of or even beyond the range of the available data.
The most common assumption on any set of univariate data is that of either independent,
identically distributed or weakly dependent and stationary complete samples, from an unknown
distribution function F. However, in the analysis of lifetime data, observations are usually
censored. We shall now assume the case of random censorship, where apart from a recent
paper by Einmahl et al. (2008) and another by Gomes and Neves (2010), there is only, as far
as we know, a brief reference to the topic in Reiss and Thomas (1997, Section 6.1) and a
paper by Beirlant et al. (2007). In such a context of random censorship, as in all applications
of extreme value theory, the estimation of the extreme value index (EVI) is of primordial
importance. This parameter measures the heaviness of the tail and has been widely studied
in the literature. For heavy tails we mention the classical Hill estimator (Hill, 1975) and the
most recent minimum-variance reduced-bias estimators of the EVI (Caeiro et al., 2005;
Gomes et al., 2007; Gomes et al., 2008) and of extreme quantiles (Gomes and Pestana, 2007).
For general EVI estimation, we mention the moment estimator of Dekkers et al. (1989) and
the “maximum likelihood” estimator (Smith, 1997; Drees et al., 2004). We shall give here
special attention to such estimation, as well as to the associated high quantile estimation under
random censoring, making use of a recent general EVI estimator, the mixed moment
estimator in Fraga Alves et al. (2009). We shall illustrate the results with simulations and with
the application of the methodology to a set of survival data.
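For orientation, the classical Hill estimator mentioned above can be sketched in a few lines. The second function shows one simple adaptation to random censoring that is assumed here for illustration: dividing the Hill estimate computed from the observed values by the proportion of uncensored observations among the k largest. This is not the mixed moment estimator used by the authors, and the function names and inputs are hypothetical.

```python
import math


def hill_estimator(sample, k):
    """Classical Hill estimate of a positive EVI from the k largest observations."""
    x = sorted(sample)
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = x[n - k - 1]  # order statistic X_{n-k:n}
    if threshold <= 0:
        raise ValueError("the threshold must be positive")
    # Mean log-excess of the k largest values over the threshold.
    return sum(math.log(x[n - i] / threshold) for i in range(1, k + 1)) / k


def censored_hill(observed, uncensored, k):
    """Hill estimate divided by the share of uncensored values among the k largest.

    `uncensored` holds one flag per observation (True when the value is not censored).
    """
    pairs = sorted(zip(observed, uncensored))
    top = pairs[len(pairs) - k:]
    p_hat = sum(1 for _, flag in top if flag) / k
    if p_hat == 0:
        raise ValueError("all of the k largest observations are censored")
    return hill_estimator(observed, k) / p_hat
```

The choice of k governs the usual bias-variance trade-off: small k uses only the most extreme observations and gives a volatile estimate, while large k moves the threshold into the bulk of the data and increases bias.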
Inference from a multiple series of two-factor experiments
in a split-block design
Darek Gozdowski1, Wiesław Mądry1, Adriana Derejko1, Jan Rozbicki2
1 Department of Experimental Design and Bioinformatics
2 Department of Agronomy
Warsaw University of Life Sciences
The aim of the work is to formulate a combined analysis of variance for data from a series of
two-factor experiments in Post-Registration Variety Testing (Porejestracyjne Doświadczalnictwo
Odmianowe, PDO). The focus is on the means for the main effects of varieties and of crop
management method and on the interaction between these factors, with a view to assessing the
effect of two crop management methods on mean wheat yield. This assessment will be shown for
different environments, on average over varieties, and for the variety x crop management
interaction. Finally, the effect of crop management is shown for each of the studied varieties,
on average for each location. The experiment is planned in a split-block design;
the series of experiments comprises 25 varieties, 2 crop management methods and 8 years.
Evaluation of methods for the detection of spatial outliers in the yield data
of winter wheat
Dariusz Gozdowski1, Stanisław Samborski2, Eike Stefan Dobers3
1 Department of Experimental Design and Bioinformatics, Warsaw University of Life Sciences
2 Department of Agronomy, Warsaw University of Life Sciences
3 Faculty of Geoscience and Geography, Georg-August University
Yield maps are a valuable source of spatial data in precision agriculture, but only if they
report crop yields close to the actual yields. Unfortunately, devices used to monitor crop
yields quite often register data significantly different from actual yield values. The number of
such incorrect data (spatial outliers) recorded depends on the presence of obstacles in the field,
stops of the harvester, etc. The share of spatial outliers usually ranges from 10 to as much as
50%. It is difficult and laborious to identify the outliers based on raw yield data and visual
assessment of yield maps. Statistical methods that could help to detect such outliers are very
desirable.
This work presents an evaluation of three methods of spatial outlier detection in yield data.
The raw yield data used for the analyses came from a field cropped with winter wheat in 2009,
located in the north of Poland. Three methods were used for spatial outlier detection:
one based on a histogram and two based on the spatial autocorrelation coefficient
(Moran’s I). Each method detected a different percentage of outliers,
and the correspondence between the methods was quite weak.
The study showed that the use of the autocorrelation coefficient Moran’s I alone is not
an objective method for the detection of spatial outliers within raw yield data. Detection
of spatial outliers based on a negative value of Moran’s I was not sufficient, and many outliers
identified earlier by the histogram method were not detected.
It has been observed that not only a negative value of Moran’s I but also
a very high value can indicate an outlier.
The process of detecting spatial outliers should consist of classical methods (e.g. removing
very high and very low values of grain yield) and complementary methods based
on the autocorrelation coefficient as a final step in the creation of reliable yield maps.
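To make the role of the autocorrelation coefficient concrete, here is a minimal sketch of a local Moran's I computed for yield points. It assumes row-standardised weights over all neighbours within a fixed radius; the abstract does not state which variant or neighbourhood the authors used, and the function name, the radius and the data are hypothetical.

```python
import math


def local_morans_i(points, radius):
    """Local Moran's I for a list of (x, y, value) points.

    Neighbours are all other points within `radius`; weights are row-standardised,
    so the spatial lag is the mean deviation of the neighbours.
    """
    n = len(points)
    mean = sum(v for _, _, v in points) / n
    z = [v - mean for _, _, v in points]
    m2 = sum(d * d for d in z) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    result = []
    for i, (xi, yi, _) in enumerate(points):
        neighbours = [j for j, (xj, yj, _) in enumerate(points)
                      if j != i and math.hypot(xi - xj, yi - yj) <= radius]
        lag = sum(z[j] for j in neighbours) / len(neighbours) if neighbours else 0.0
        result.append(z[i] * lag / m2)
    return result


# Hypothetical transect: one yield reading of 0 among readings of 5.
transect = [(float(x), 0.0, v) for x, v in enumerate([5, 5, 5, 0, 5, 5, 5])]
scores = local_morans_i(transect, radius=1.0)
```

In this toy transect the erroneous point gets the most negative score, but its two neighbours also come out negative, which is one way to see why the abstract recommends combining the coefficient with classical screening of extreme values rather than relying on its sign alone.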
Discriminant analysis for Kraft’s classes of trees
Jolanta Grala-Michalak1, Katarzyna Kaźmierczak2
1 Faculty of Mathematics and Computer Science
Adam Mickiewicz University
2 Department of Forest Management
Poznań University of Life Sciences
The paper presents the results of a discriminant analysis for Kraft’s classes of trees.
Kraft’s classification is based on a tree’s position in the social structure of the stand and on
the development and extent of its crown. Membership of a given social class reflects the position
of a tree in a stand and, through this, its growth potential. The aim of the analysis was to choose
the variables which most strongly determine the Kraft’s class of a tree and to construct
discriminant functions which classify the data into Kraft’s classes well.
Quantifying the (dis)agreement between two medicine
measurements methods
Luís M. Grilo1, Helena L. Grilo2, António de Oliveira3
1 Mathematics Department, Polytechnic Institute of Tomar, Portugal
2 Mathematics Department, Polytechnic Institute of Tomar, Portugal
3 Medical Expert, Portugal
To analyze the serum levels of folic acid in a blood sample we use two different medical
measurement methods, which usually do not produce exactly the same results. In order to
replace the old method by the new one without causing problems in clinical interpretation,
we need to assess the agreement of the available data, which in this case presents a complex
variation across the range of measurement. To do so, we estimate the 95% limits of
agreement, before and after logarithmic transformation, and we also consider an appropriate
use of regression. These two statistical techniques are very useful and
easy for medical researchers to interpret.
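The 95% limits of agreement mentioned above can be sketched as the mean of the paired differences plus or minus 1.96 standard deviations, computed either on the raw scale or on the log scale. The function name and the data are hypothetical, and this is a generic illustration rather than the authors' analysis.

```python
import math


def limits_of_agreement(method_a, method_b, log_scale=False):
    """95% limits of agreement for paired measurements from two methods.

    With log_scale=True the limits are computed on log-transformed values and
    back-transformed, so they describe the ratio a/b instead of the difference a - b.
    """
    if log_scale:
        method_a = [math.log(v) for v in method_a]
        method_b = [math.log(v) for v in method_b]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    lower, upper = mean - 1.96 * sd, mean + 1.96 * sd
    if log_scale:
        return math.exp(lower), math.exp(upper)
    return lower, upper
```

The log-scale version is the usual choice when the spread of the differences grows with the magnitude of the measurement, since the resulting limits are then expressed as a proportional disagreement between the two methods.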
Determining years of life lost (YLL) of leading death causes responsible for
reduction in life expectancy (Boushehr – Iran)
Abdollah Hajivandi
Biostatistics and Epidemiology Department
Isfahan University of Medical Sciences, Isfahan, Iran
Introduction: The causes of death with the highest rates are not always the major factors
responsible for the reduction of life expectancy; rather, those with the highest YLL (years of
life lost), that is, life expectancy at the age at death, are responsible. This subject is
investigated for mortality data of the population of Boushehr province, located in the south of Iran.
Methodology: Years of life lost (YLL) for the three leading causes of death in the province are
computed for both genders separately, based on life expectancies, age and number of deaths due
to each cause, using mortality data.
Results: In both gender groups, heart diseases are the leading cause of death, but only among
women does this cause contribute the highest YLL. In men the number of deaths from accidents
is about half of that due to heart diseases, but the years of life lost (YLL) due to accidents are
2.5 times more than the YLL due to heart disease. The mean ages at death in the accident groups
of both sexes are the lowest among the means for the three leading causes of death.
Conclusion: Driving accidents have the highest influence on the reduction in life expectancy
compared with other leading causes of death in the community, especially in men, because of the
low mean age of people who die in driving accidents. This survey shows that the years of life
lost (YLL) due to causes of death are more important than the number of deaths from different
causes in projects promoting life expectancy in the community.
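The YLL computation described in the methodology can be sketched as the sum, over all deaths from a cause, of the remaining life expectancy at the age of death. The life-table values and death counts below are invented for illustration only and are not the study's data; the function and variable names are likewise hypothetical.

```python
def years_of_life_lost(deaths_by_age, remaining_life_expectancy):
    """Sum of (number of deaths at an age) x (remaining life expectancy at that age)."""
    return sum(count * remaining_life_expectancy[age]
               for age, count in deaths_by_age.items())


# Hypothetical abridged life table: remaining life expectancy at ages 20 and 70.
life_table = {20: 55.0, 70: 12.0}

# Hypothetical counts: fewer accident deaths, but at a much younger age.
heart_disease_deaths = {70: 100}
accident_deaths = {20: 50}

yll_heart = years_of_life_lost(heart_disease_deaths, life_table)
yll_accidents = years_of_life_lost(accident_deaths, life_table)
```

With these made-up figures, half as many accident deaths produce more than twice the YLL of heart disease, which mirrors the qualitative pattern the abstract reports for men.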
Simulation study on multivariate normality
based on Shapiro – Wilk statistics
Zofia Hanusz, Joanna Tarasińska
Department of Applied Mathematics and Computer Science
University of Life Sciences in Lublin
The paper concerns three tests for multivariate normality based on the
Shapiro – Wilk W statistic for the principal components of a covariance matrix. Two of them
were proposed by Srivastava and Hui (1987), the third was introduced by Hanusz
and Tarasińska (2008b). The type I errors of these tests at significance levels 0.1, 0.05
and 0.01 are evaluated both for the sample and residuals in the two data groups. The powers
of the tests under consideration against chosen alternative distributions are also presented
in both the sample and residual cases.
Surveys as a source of information on key agrotechnical factors
in winter rye (Secale cereale L.) production
Anna Imiołek, Janusz Gołaszewski, Dariusz Załuski
Department of Plant Breeding and Seed Production
University of Warmia and Mazury in Olsztyn, Poland
Surveys are a commonly used method of collecting data in market or sociological
research. In studies of crop agrotechnology, surveys are used relatively rarely,
even though a properly constructed questionnaire can provide valuable information
about the technology and the possibilities for introducing technological innovation.
The authors' own survey aimed to identify the key elements of crop production
technology. The test crop was winter rye (Secale cereale L.) grown for grain.
The survey covered most rye grain producers in north-eastern Poland
cultivating an area larger than 1 ha. The questionnaire contained
questions on the general characteristics of the farm, technological production factors,
assessment of (agrotechnical) energy consumption and the structure of inputs. The coded
data on production factors served as variables of a general linear model. The analysis of the
results covered the production technology and the detection of key factors of the
cultivation technology using type III sums of squares in ANOVA
and the estimated eta-squared effect sizes of the factors.
Crown width of a tree and its relationships with age, height and diameter at
breast height based on common oak (Quercus robur L.)
Katarzyna Kaźmierczak1, Witold Pazdrowski2, Agnieszka Jędraszak2,
Marek Szymański2, Marcin Nawrot2
1 Department of Forest Management
2 Department of Forest Utilization
Poznań University of Life Sciences
The paper presents results of studies on the dependence of crown width on basic
measurements of the tree. The aim of the analysis was to determine the strength of the
relationship between the crown diameter of a tree and its diameter at breast height, height
and age. Moreover, regression equations were developed for the estimation of crown
width. The experimental material comprised measurement data of 33 oaks (aged from 41 to
148 years). Crown projection area was established for each tree on the basis of characteristic
points of the tree crown projected using a crown projector. Crowns were projected in eight
geographical directions. Equation parameters were estimated using the method of least
squares and the strength of the relationship was established on the basis of empirical data. A
function most accurately explaining the dependence of crown width on diameter at breast
height was selected. Moreover, the distribution of measurement data was compared with the
normal distribution and their basic statistical characteristics were established. The strength of the
relationship between crown width and diameter at breast height, height and age of a tree was
evaluated using a correlation coefficient for a linear dependence. In turn, the strength of the
dependence between crown width and diameter at breast height was evaluated by a correlation
ratio for curvilinear functions. In view of the statistically significant dependence between
crown width of oaks and the measured tree traits, a regression analysis was conducted,
taking the investigated traits (age, height and diameter at breast height) as independent
variables. Forward stepwise regression was applied. Crown width of a tree may be
determined on the basis of information on the age, height and diameter at breast height
of a tree. Diameter at breast height was the best among measurable traits of a tree for the
estimation of crown width. This is shown by the strength of the dependence between the two traits
and by the ease of measuring diameter at a height of 1.3 m.
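The least-squares fit and the linear correlation coefficient mentioned in this abstract can be sketched as follows. This is only an illustration in Python using the standard library: the function name linear_fit and the five (diameter, crown width) pairs are invented for the example and are not the oak measurements from the study, and the abstract does not state which functional form the authors finally selected.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return intercept, slope, r

# Hypothetical values (NOT from the study): dbh in cm, crown width in m.
dbh_cm = [20.0, 30.0, 40.0, 50.0, 60.0]
crown_width_m = [3.1, 4.4, 5.2, 6.6, 7.4]
intercept, slope, r = linear_fit(dbh_cm, crown_width_m)
print(f"crown width = {intercept:.2f} + {slope:.3f} * dbh, r = {r:.3f}")
```

A curvilinear candidate can be handled by the same function after transforming the predictor (for example, passing the logarithm of the diameter instead of the diameter).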
Application of the Akaike criterion to the selection of the normal distribution
Andrzej Kornacki1, Katarzyna Ostroga2
1 Department of Applied Mathematics and Computer Science
2 Department of Agricultural Machinery Science
University of Life Sciences in Lublin
The normal distribution is commonly used in the natural and technical sciences as well
as in the humanities and social sciences. The literature describes many tests
of the normality of the distribution of a studied trait, in both the univariate
and the multivariate case (references). All of them require the use of appropriate
tables of critical values.
In this paper we propose using the Akaike information criterion to select between the normal
and the log-normal distribution. This criterion originates
from information theory. The suggested method was applied to data obtained in an experiment
on sugar beet yields. The conclusions drawn from the Akaike criterion were confirmed
by the classical Shapiro-Wilk test.
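The selection between the normal and the log-normal distribution described in this abstract can be sketched with the usual definition AIC = 2k - 2 ln L, where both candidates have k = 2 parameters estimated by maximum likelihood. The code below is a minimal Python illustration under that assumption; the function names and the sample values are hypothetical and are not the sugar beet data.

```python
import math

def _normal_loglik(values):
    """Maximised normal log-likelihood (mean and ML variance plugged in)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n  # ML estimate, divisor n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def aic_normal(data):
    """AIC of the normal model with k = 2 parameters."""
    return 2 * 2 - 2 * _normal_loglik(data)

def aic_lognormal(data):
    """AIC of the log-normal model; data must be strictly positive."""
    logs = [math.log(v) for v in data]
    # Log-normal log-likelihood = normal log-likelihood of the logs
    # minus the sum of the logs (Jacobian of the transformation).
    loglik = _normal_loglik(logs) - sum(logs)
    return 2 * 2 - 2 * loglik

# Hypothetical, strongly right-skewed sample (NOT data from the paper).
sample = [math.exp(v) for v in (-2.0, -1.0, 0.0, 1.0, 2.0)]
better = "log-normal" if aic_lognormal(sample) < aic_normal(sample) else "normal"
print("model with the lower AIC:", better)
```

The model with the smaller AIC is preferred; no table of critical values is needed.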
Visualizing bivariate relationships with hexagonally binned data
Marcin Kozak1, Agnieszka Wnuk1, Dariusz Gozdowski1, Zdzisław Wyszyński2
1 Department of Experimental Design and Bioinformatics
2 Department of Agronomy
Warsaw University of Life Sciences
Scatterplots are very widely used in numerous branches of science. Unfortunately,
they fail when the number of points in a plot is very large. Several graphical techniques are
available to overcome such problems, and one of them is graphing hexagonally binned data.
The algorithm for hexagonal binning is quite complex, but the resulting graph is easy to
understand: the color of each hexagonal symbol represents the number of observations that fell
in the area covered by the corresponding bin. Unfortunately, this technique still has not
gained popularity among researchers, and the aim of our work was to present the usefulness
of graphing hexagonally binned data. This is done for a three-year field experiment
with spring barley. The display is compared with regular scatterplots and it is shown that it
provides information about a relationship when regular scatterplots fail, even when
very small symbols and jittering are employed.
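The binning step behind such a plot can be sketched as follows. The abstract does not say which implementation the authors used; the code below shows one common approach, in which hexagon centres lie on two staggered rectangular lattices and each point is assigned to the nearer of its two candidate centres. The function name, the bin width dx and the three points are assumptions made for the illustration.

```python
import math
from collections import Counter

def hexbin_counts(points, dx):
    """Count points per hexagonal bin.

    Hexagon centres lie on two staggered rectangular lattices:
    lattice "A" at (i*dx, j*dy) and lattice "B" at ((i+0.5)*dx, (j+0.5)*dy),
    with dy = sqrt(3)*dx. Each point goes to the nearer candidate centre.
    """
    dy = math.sqrt(3.0) * dx
    counts = Counter()
    for x, y in points:
        ia, ja = round(x / dx), round(y / dy)            # nearest A centre
        ib, jb = math.floor(x / dx), math.floor(y / dy)  # nearest B centre
        dist_a = (x - ia * dx) ** 2 + (y - ja * dy) ** 2
        dist_b = (x - (ib + 0.5) * dx) ** 2 + (y - (jb + 0.5) * dy) ** 2
        key = ("A", ia, ja) if dist_a <= dist_b else ("B", ib, jb)
        counts[key] += 1
    return counts

# Hypothetical points: two fall near the origin, one near (0.5, 0.866).
demo = hexbin_counts([(0.05, 0.0), (-0.1, 0.1), (0.5, 0.87)], dx=1.0)
for bin_key, n_obs in sorted(demo.items()):
    print(bin_key, n_obs)
```

In the plot, each occupied bin is drawn as a hexagon whose color encodes its count.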
Unreplicated experiments in early stage breeding programs
Katarzyna Marczyńska, Stanisław Mejza
Department of Mathematical and Statistical Methods
Poznań University of Life Sciences
In plant breeding trials, during the early stages of the improvement process, it is not possible
to use an experimental design that satisfies the requirement of replicating all the treatments
because of the large number of genotypes involved, the small amount of seed and the low
availability of resources. Hence, unreplicated designs are used for early generation testing
when hundreds or even thousands of new genotypes need evaluation in the same trial using
a limited amount of seed that is enough for one replicate only. To control the real
or potential heterogeneity of experimental units, control (check) plots are arranged in the trial.
There are many methods of using information resulting from check plots. In the paper
the main tool for exploring this information is based on response surface methodology
(RSM). First, we try to identify the response surface characterizing the experimental
environment. The obtained response surface is then used to adjust the observations
for genotypes. Finally, the adjusted data are used for inference concerning the next steps
of the breeding program. The theoretical considerations are illustrated with an example
dealing with spring barley.
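The idea of adjusting genotype observations by a surface fitted to the check plots can be sketched as below. This is a simplified illustration, not the authors' procedure: it fits only a first-order surface (a plane) in the row and column coordinates, whereas response surface methodology would normally also consider second-order terms. All names, coordinates and yields are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(checks):
    """Least-squares fit of z = a + b*row + c*col to check plots.

    checks: list of (row, col, z). Solves the normal equations
    (X'X) beta = X'z by Cramer's rule.
    """
    design = [(1.0, float(r), float(c)) for r, c, _ in checks]
    z = [v for _, _, v in checks]
    xtx = [[sum(u[i] * u[j] for u in design) for j in range(3)] for i in range(3)]
    xtz = [sum(u[i] * zi for u, zi in zip(design, z)) for i in range(3)]
    d = det3(xtx)
    beta = []
    for k in range(3):
        # Replace column k of X'X by X'z.
        mk = [[xtz[i] if j == k else xtx[i][j] for j in range(3)] for i in range(3)]
        beta.append(det3(mk) / d)
    return tuple(beta)

def adjust(observed, row, col, beta, checks):
    """Remove the fitted local trend, measured relative to the check mean."""
    a, b, c = beta
    check_mean = sum(z for _, _, z in checks) / len(checks)
    return observed - ((a + b * row + c * col) - check_mean)

# Hypothetical check plots lying exactly on z = 10 + 1*row + 2*col.
checks = [(0, 0, 10.0), (0, 4, 18.0), (4, 0, 14.0), (4, 4, 22.0), (2, 2, 16.0)]
beta = fit_plane(checks)
adjusted = adjust(20.0, 2, 3, beta, checks)
```

A genotype plot in a favourable part of the field (fitted trend above the check mean) is adjusted downwards, and a plot in a poor part is adjusted upwards.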
Models and inference – the normal case
João Tiago Mexia
Faculdade de Ciências e Tecnologia. Universidade Nova de Lisboa, 2825 Monte da
Caparica, Portugal
Normality is a key feature when making inference in many classes of models. In this
paper we present several classes of normal models and obtain inference for them, namely mixed
models, models with orthogonal block structure, and models with commutative orthogonal
block structure. The assumption of normality in these families of models leads to estimators,
tests and confidence regions with optimal properties. Some operations with these models, such
as crossing and nesting, are discussed.
Using R packages in experimental design
Amilcar Oliveira1, Teresa Oliveira2
1 Center of Statistics and Applications, University of Lisbon
2 Department of Sciences and Technology, Universidade Aberta, Portugal
In Experimental Design, besides the crucial role of new methodological advances
and new areas of application, there is an urgent need to prepare future researchers
by developing the skills needed to take advantage of the latest tools. In the twenty-first century
new challenges have emerged from software development, which has allowed great advances
in all areas of research, particularly in Statistics and Experimental Design.
Until recently, computer programs such as STATISTICA, SPSS or SAS were commonly used
in teaching and research. More recently, R has emerged as the program
attracting the greatest investment from the statistical community. R is user-friendly,
free, open to the community and adaptable to the specific needs of each situation.
In this work we present a retrospective of the main functions and packages used in problems
involving Experimental Design. We also seek to point out directions for
the development of topics of interest in this area which until now have not been adequately
addressed in R.
References
Atkinson, A.C. and Donev, A.N. (1992). Optimum Experimental Designs. Oxford: Clarendon Press.
Bailey, R.A. (1981). A unified approach to design of experiments. Journal of the Royal
Statistical Society, Series A 144, 214-223.
Box, G.E.P., Hunter, W.C. and Hunter, J.S. (2005). Statistics for Experimenters
(2nd edition). New York: Wiley.
Fox, J. (2005). The R Commander: A Basic-Statistics Graphical User Interface to R. Journal
of Statistical Software 14, Issue 9.
Groemping, U. (2009). Design of Experiments in R. Presentation at UseR! 2009 in Rennes, France.
Lenth, R.V. (1989). Quick and Easy Analysis of Unreplicated Factorials. Technometrics 31, 469-473.
Type I error rates in multiple testing
Dariusz Parys
University of Lodz
In multiple hypothesis testing there are many possible definitions of the Type I error rate.
In this paper we consider the family-wise error rate (FWER), the generalized family-wise
error rate (gFWER), the per-family error rate (PFER), the per-comparison error rate (PCER),
the median-based per-family error rate (mPFER) and the quantile number of false positives
(QNFP). We treat the Type I error rate as a parameter $\theta_n(F_{U_n,R_n})$ of the joint
distribution $F_{U_n,R_n}$ of the numbers of Type I errors $E_n$.
We also consider the class of multiple testing procedures that control a given Type I error rate
at an acceptable level α, with remarks on the power of these procedures.
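Several of the error rates listed in this abstract can be illustrated as summaries of the distribution of the number of false positives V. The sketch below uses the commonly cited definitions (FWER = P(V ≥ 1), gFWER(k) = P(V > k), PFER = E[V], PCER = E[V]/m, mPFER = median of V) and estimates them from repeated experiments. The function name, the value of k and the counts are hypothetical, and the abstract's own formal definitions may differ in detail.

```python
import statistics

def error_rates(false_positives, m, k=1):
    """Empirical Type I error rates.

    false_positives: number of false rejections V in each repeated experiment
    m: number of hypotheses tested in each experiment
    k: tolerated number of false positives for the generalized FWER
    """
    n = len(false_positives)
    total = sum(false_positives)
    return {
        "FWER": sum(1 for v in false_positives if v >= 1) / n,  # P(V >= 1)
        "gFWER": sum(1 for v in false_positives if v > k) / n,  # P(V > k)
        "PFER": total / n,                                      # E[V]
        "PCER": total / (n * m),                                # E[V] / m
        "mPFER": statistics.median(false_positives),            # median of V
    }

# Hypothetical false-positive counts from 8 repeated experiments, m = 10 tests each.
rates = error_rates([0, 0, 1, 2, 0, 3, 0, 1], m=10, k=1)
for name, value in rates.items():
    print(name, value)
```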
On the precision of winter rape variety testing trials in Poland
Wiesław Pilarczyk1,2, Anna Fraś2
1 Department of Mathematical and Statistical Methods, Poznań University of Life Sciences
2 The Research Centre for Cultivar Testing
Winter rape is an important crop in Poland. Every new variety is therefore carefully examined
by The Research Centre for Cultivar Testing before being entered in the National List. The performance of
varieties is checked in numerous VCU (value for cultivation and use) trials performed at
experimental stations characterized by different soil and climatic conditions. Winter rape is
very sensitive to extreme climatic conditions, e.g. frost in winter and periods of drought in
the growing season. Such factors influence not only the performance of individual varieties but
also the overall precision of trials. Precision is often identified with the average value of the least
significant difference expressed as a percentage of the general mean (LSD%). The precision of VCU
trials on cereals in Poland has been reported in the papers by Pilarczyk [1987, 2008] and by
Pilarczyk and Fraś [2008]. In this research, using similar methodology, the precision of winter
rape trials is investigated using extensive data from trials performed in the period between
2000 and 2008.
References
Pilarczyk W., 1987, Precision of field trials in incomplete block designs for several species,
Biuletyn Oceny Odmian 18-19, pp. 161-169.
Pilarczyk W., 2008, Confidence bounds for precision in cereal trials in Poland, Biometricke
Metody a Modely v Podohospodarskej Vede, Vyskume a Vyucbe, pp. 139-145.
Pilarczyk W., Fraś A., 2008, Ocena precyzji doświadczeń rejestrowych zbóż w Polsce,
Biuletyn IHAR 249, pp. 19-27.
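The precision measure LSD% used in this abstract can be illustrated with the textbook formula for a trial with equal replication, LSD = t · sqrt(2 · MSE / r), expressed as a percentage of the general mean. This is a sketch under that assumption only: the trials discussed may use incomplete block designs, for which the variance of a difference is computed differently. The function name and all numbers are hypothetical, and the critical t value is passed in rather than computed.

```python
import math

def lsd_percent(mse, replicates, grand_mean, t_crit):
    """LSD% = 100 * t * sqrt(2 * MSE / r) / grand mean (equal replication)."""
    lsd = t_crit * math.sqrt(2.0 * mse / replicates)
    return 100.0 * lsd / grand_mean

# Hypothetical trial: error mean square 4.0, 2 replicates,
# general mean 50.0, critical t value 2.0 (taken as given).
precision = lsd_percent(mse=4.0, replicates=2, grand_mean=50.0, t_crit=2.0)
print(f"LSD% = {precision:.1f}")
```

A smaller LSD% means that smaller differences between varieties can be detected, i.e. the trial is more precise.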
Fruit crop breeding using biometrical methods
Stanisław Pluta1, Agnieszka Masny1, Wiesław Mądry2, Edward Żurawicz1
1 Research Institute of Pomology and Floriculture
2 Warsaw University of Life Sciences
The basic method of fruit crop breeding is the conventional approach, which involves crossing
different parents and then positive selection among the obtained offspring (seedlings) of
the F1 generation. However, producing new cultivars in this way takes a long time and requires
substantial funding and a lot of work. Applying molecular techniques as well as biometric
methods makes it possible to shorten the breeding cycle and increase breeding efficiency. The
selection of parental genotypes for crossing programs can be based on their
assessment for phenotypic traits (per se) in the cultivar collection (gene bank). Crossing
programs can be more effective when the breeding value of parental genotypes is
assessed on the basis of the general and specific combining ability effects for important traits.
Combining abilities of parents can be assessed with mating designs, including diallel and
factorial mating designs. Crossing of selected parents is carried out in controlled conditions
and in the field. Seedlings are produced in the winter-spring season in a heated greenhouse
and then, after hardening in the second half of May they are planted in the selection
fields. The assessment and selection of the breeding material are done in a few steps: at the
seedling level, on advanced clones and in multi-environment cultivar trials. This takes from 8
to 15 years, depending on the fruit crop. In the breeding process, knowledge of the variation,
heritability and correlation among important traits within the gene pool plays a substantial role.
To learn these population characteristics, multivariate statistical methods should be
used. The most often used are PCA and cluster analysis. The band pattern of the best
advanced breeding clones is obtained by DNA fingerprinting. It permits
identification of the genotype and confirmation of its origin. The most valuable breeding clones
are submitted for the registration trials conducted by COBORU for their final evaluation.
Two kinds of multi-environment trials are conducted in parallel. In the first
kind, the new genotypes are tested for DUS. In the second, the
VCU is assessed. On the basis of the results of both trials, the best breeding clones are entered
in the National Register as original cultivars.
Crop growth modelling and QTL analysis of multilocation trials
Paulo C. Rodrigues1,2, Ep Heuvelink3, Marco Bink2, Leo Marcelis3, 4,
Fred van Eeuwijk2,5
1 CMA and Department of Mathematics, Faculty of Sciences and Technology, Nova University of Lisbon
2 Biometrics, Wageningen University and Research Centre
3 Horticultural Supply Chains, Wageningen University and Research Centre
4 Greenhouse Horticulture, Wageningen University and Research Centre
5 Centre for Biosystems Genomics, P.O. Box 98, 6700 AB Wageningen, The Netherlands
A different response of genotypes across environments is frequent in multi-location trials and
is known as genotype by environment interaction (GEI). The study and understanding of these
interactions is a major challenge for breeders and agronomic researchers. However, for the
last two decades, molecular markers and mapping techniques have allowed researchers to go
one step further and analyse the whole genome to detect specific genes which influence a
quantitative trait such as yield. These specific locations (loci) are called quantitative trait loci
(QTL), and the "new" challenge for breeders is the analysis of QTL by environment
interaction (QEI).
In this paper we use an adaptation of the LINTUL (light interception and utilization
simulator) crop growth model with 7 physiological parameters, to simulate two-way
phenotypic data tables. To each of these 7 parameters, a number of QTL are assigned in order
to study different genetic architectures. Considering θ the vector of the 7 parameters and f(.) a
nonlinear function, the phenotypic realisations (e.g. yield for genotype i and environment j)
can be written as
$$Phe_{i,j} = f(\theta)_i + \varepsilon_{i,j}, \quad \text{with} \quad \theta = \mathrm{QTL} + \varepsilon_{\theta} \qquad (1)$$
The objectives of this simulation study are: (i) to determine whether it is possible to generate
realistic GEI and QEI, including crossovers, using a simple crop growth model with 7
physiological parameters without GEI; (ii) to determine whether the QTL for physiological
parameters are found in a QTL analysis of the two-way table of phenotypic data; and (iii) to
explore and compare different genetic architectures underlying yield simulated by crop
growth modelling. GEI and QEI for yield of sweet pepper (Capsicum annuum L.) were
simulated through the crop growth model. In this case study, the QTL assigned to some of the
physiological parameters matched the QTL detected for yield.
Overview of growth models in R
Alicja Szabelska1, Michał Siatkowski2, Teresa Goszczurna1, Joanna Zyprych1
1 Department of Mathematical and Statistical Methods
2 Department of Agricultural Engineering
Poznan University of Life Sciences
Growth rate models are a useful tool for modelling natural events that involve investigating
changes of a response over time. With a given growth rate model we can treat time
as a continuous variable instead of a discrete one. Depending on the type of process modelled,
a given model may fit the empirical data better or worse. The considered models
differ from each other. However, for specific values of the parameters they can produce growth
curves of similar shape. For most processes it is possible to find an appropriate model.
In some cases there is more than one suitable model.
This poster presents five growth models: exponential, logistic, log-logistic, Gompertz and
Weibull. For each model the formula, the graphical representation and tools for estimating
the parameters in R are described. Next, the Akaike Information Criterion
and the Bayesian Information Criterion are presented as criteria that permit comparing the models
and choosing the best one.
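Two of the five growth curves and the information criteria mentioned in this abstract can be sketched as follows. The poster itself works in R; this illustration uses Python, one common parameterisation of each curve (asymptote K, rate r, inflection time t0), and the least-squares forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln n, which assume Gaussian errors and drop an additive constant. Function names and numbers are hypothetical.

```python
import math

def logistic(t, K, r, t0):
    """Logistic growth curve; equals K/2 at t = t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def gompertz(t, K, r, t0):
    """Gompertz growth curve; equals K/e at t = t0."""
    return K * math.exp(-math.exp(-r * (t - t0)))

def aic_bic(observed, fitted, n_params):
    """AIC and BIC from the residual sum of squares (Gaussian errors assumed).

    The error variance is counted as one extra estimated parameter.
    """
    n = len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    k = n_params + 1
    base = n * math.log(rss / n)
    return base + 2 * k, base + k * math.log(n)

# Hypothetical observations and fitted values (NOT from the poster).
aic, bic = aic_bic([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], n_params=2)
print(f"AIC = {aic:.3f}, BIC = {bic:.3f}")
```

When the same data are fitted with several curves, the curve with the smallest AIC or BIC is preferred.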
Analysis of the genetic diversity of white clover (Trifolium repens L.)
cultivars and clones using molecular markers
Agnieszka Tomkowiak1, Alicja Szabelska2, Joanna Zyprych2, Zbigniew Broda1,
Idzi Siatkowski2
1 Department of Genetics and Plant Breeding
2 Department of Mathematical and Statistical Methods
Poznań University of Life Sciences
In recent years a gradual increase of interest in the cultivation of legumes has been
observed, resulting from the replacement of animal protein in livestock feeding
with feeds from high-protein plants.
The aim of the study was to analyse the genetic diversity of white clover cultivars and clones
using the RAPD-PCR technique, and to determine the contribution of individual clones to
the formation of cultivars on the basis of genetic similarity determined with
molecular markers. The plant material used in the study consisted of four white clover
cultivars and one cultivar propagated generatively through a selection and propagation
nursery. To visualise the genetic distances for the studied coefficients,
a method of grouping elements into relatively homogeneous classes was applied. The
grouping is based on the similarity between elements, expressed by the Euclidean
metric. To determine groups of similar cultivars, a hierarchical clustering algorithm
was used: the algorithm builds a hierarchy of classifications for a set of objects, starting from
a partition in which each object forms a separate cluster and ending with
a partition in which all objects belong to one cluster. The average linkage method
was used in the clustering process. To compare the results for each coefficient,
the Mantel test and Spearman's correlation coefficient were used.
On the basis of the analyses it was found that the selected primers
generated polymorphism which made it possible to choose components for crosses in order to
test combining abilities. Dendrograms based on the Nei and Ochiai
coefficients form similarity groups comprising cultivars together with their
clones, and thus group the forms most precisely with respect to origin. The
Simple Matching, Hamann, and Rogers and Tanimoto coefficients form groups that often contain
cultivars and clones not belonging to a given cultivar, and so are not useful for choosing
components for crosses.
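The similarity coefficients compared in this abstract can be computed from binary band profiles (1 = band present, 0 = band absent) as sketched below. The formulas are the standard textbook definitions; it is an assumption that the "Nei" coefficient in the abstract is the Nei and Li (Dice) coefficient. The function names and the two band profiles are hypothetical, and both profiles are assumed to contain at least one band.

```python
import math

def band_counts(x, y):
    """Return (a, b, c, d): bands shared, only in x, only in y, absent in both."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d

def similarities(x, y):
    """Similarity coefficients for two binary band profiles of equal length."""
    a, b, c, d = band_counts(x, y)
    n = a + b + c + d
    return {
        "simple_matching": (a + d) / n,
        "rogers_tanimoto": (a + d) / (a + d + 2 * (b + c)),
        "hamann": ((a + d) - (b + c)) / n,
        "ochiai": a / math.sqrt((a + b) * (a + c)),
        "nei_li": 2 * a / (2 * a + b + c),  # assumed meaning of "Nei"
    }

# Hypothetical RAPD band profiles of two forms (NOT data from the study).
profile_1 = [1, 1, 0, 0, 1]
profile_2 = [1, 0, 0, 1, 1]
sim = similarities(profile_1, profile_2)
```

Simple Matching, Hamann, and Rogers and Tanimoto count joint absences (d) as evidence of similarity, while Ochiai and Nei and Li ignore them, which is one reason the resulting dendrograms can differ.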
An application of the generalized linear models for an examination of the
phenotypic quality of roe deer
Joanna Ukalska1, Krzysztof Ukalski1, Jakub Borkowski 2
1 Biometry Division, Department of Econometrics and Statistics, Warsaw University of Life Sciences
2 Department of Forest Ecology, Forest Research Institute, Raszyn
The influence of forest environment and climatic factors on roe deer antler asymmetry in two
age classes of roe deer males was studied. Two habitats were compared: forest regenerating after
a 1992 forest fire and covered with young stands (low quality deer habitat), and unburned forest
of diversified stand age classes (high quality deer habitat). The climatic factors were the mean
temperature and the total number of days with snow cover in January and February. Data were
collected by local hunters from 366 males shot during 1998-2007. We applied 4 generalized linear
models: Poisson model, Poisson adjusted for overdispersion, negative binomial and negative
binomial with log canonical link function. Goodness-of-fit statistics were checked as well as
residual plots. There was a significant difference in the incidence of roe deer antler asymmetry
between age classes for both considered habitats, while weather conditions did not influence roe
deer antler asymmetry.
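One simple diagnostic that motivates moving from the plain Poisson model to the overdispersion-adjusted or negative binomial variants is the Pearson dispersion statistic. The sketch below computes it for an intercept-only Poisson model, where every fitted value equals the sample mean; the models in the study also included habitat, age class and climatic covariates, which are not represented here. The function name and the counts are hypothetical.

```python
def pearson_dispersion(counts):
    """Pearson chi-square divided by residual degrees of freedom
    for an intercept-only Poisson model (fitted value = sample mean)."""
    n = len(counts)
    mean = sum(counts) / n
    chi2 = sum((y - mean) ** 2 / mean for y in counts)
    return chi2 / (n - 1)

# Hypothetical counts (NOT data from the study).
dispersion = pearson_dispersion([0, 1, 2, 3, 4])
print(f"dispersion = {dispersion:.2f}")
```

Values well above 1 suggest overdispersion relative to the Poisson assumption; values near 1 are consistent with it.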
Morphological analysis of inflorescence mutants in
alfalfa (Medicago sativa L. s.l.) with respect to seed yield traits
Dorota Weigt1, Alicja Szabelska2, Joanna Zyprych2, Idzi Siatkowski2,
Zbigniew Broda1
1 Department of Genetics and Plant Breeding
2 Department of Mathematical and Statistical Methods
Poznan University of Life Sciences
The research was performed on lucerne plants (Medicago sativa L.) representing
three types of inflorescence mutation: a mutant with a long peduncle inflorescence (lp),
a mutant with a branched raceme inflorescence (br), and a mutant with a top flowering
inflorescence (tf). The Radius variety, which has an inflorescence typical of Medicago species, was
used as a control.
The material was analyzed with respect to 5 qualitative features that are the main components
of the seed yield: number of racemes per shoot, flower number per raceme, pod number per
raceme, seed number per pod, and number of embryos per ovary.
The results obtained from the biometric measurements constituted the starting material
for the statistical analysis. The features of the mutants were presented graphically
using boxplots. To investigate the dissimilarities in the seed yield traits between
different forms of lucerne, multivariate analysis of variance (MANOVA) was performed.
Taking into account the size of the data set and the number of forms, the assumption of normality
(using Shapiro-Wilk and Shapiro-Francia tests) and the assumption of equality of variances
(using Bartlett and Levene tests) were verified. In MANOVA four parametric tests were used:
the Hotelling-Lawley, Pillai, Wilks and Roy tests. In addition, nonparametric MANOVA
was performed using a permutation test. Since the results of each test revealed differences in
the seed yield traits, each feature was analyzed separately using parametric and nonparametric
ANOVA. Tukey’s test was used to investigate significant differences in the seed yield
between the analyzed forms.
Comparisons of uniformity decisions based on COYU
and Bennett’s methods – simulated data
Bogna Zawieja1, Wiesław Pilarczyk1,2, Bogna Kowalczyk2
1 Department of Mathematical and Statistical Methods, Poznań University of Life Sciences
2 The Research Centre for Cultivar Testing
Uniformity decisions concerning new varieties of plants are based both on quantitative
characteristics and on qualitative characteristics. Decision rules for qualitative characteristics
(usually “qualitative” is equivalent to “visually assessed”) are rather simple. Namely,
for every new variety the number of non-typical plants in a sample of fixed size is counted
and if it is larger than the threshold value (established by crop experts), the variety is treated
as non-uniform. A more complicated procedure is applied for quantitative characteristics.
Decisions are based on comparisons of the standard deviation of the candidate variety with the average
value of the standard deviations of so-called reference varieties. A special procedure called
COYU (combined over years uniformity) was elaborated by member states of UPOV
(International Union for Protection of New Varieties of Plants) for this purpose (Talbot,
2000). The COYU method is, to some degree, an officially promoted method. But some
other methods are still under consideration. One such method uses the Bennett test
for coefficients of variation. The details of this new approach are given in the papers by Zawieja
and Pilarczyk (2005, 2006, 2007) and by Zawieja, Pilarczyk and Kowalczyk (2009).
Some comparisons of uniformity decisions concerning winter wheat and oilseed rape varieties
based on COYU and Bennett’s test are also included in the papers mentioned. During the annual
session of the Technical Working Party on Automation and Computer Programs (held in
Alexandria, Virginia in June 2009) it was suggested to compare decisions on uniformity
of varieties using simulated data based on real measurements. So in the present paper
this problem is reconsidered using real data for oilseed varieties (reference set) and simulated
data (candidate varieties).
Gene selection based on statistical tests
Joanna Zyprych, Alicja Szabelska, Idzi Siatkowski
Department of Mathematical and Statistical Methods
Poznań University of Life Sciences
Microarray technology allows the investigation of thousands and millions of genes
at the same time. It provides information about the expression profiles of genes.
Statistical analysis is widely used in searching for over- and under-expressed genes.
There are many statistical tests for verifying the hypotheses of interest. The classic example
is the group of tests verifying the equality of means of expression levels.
A researcher can often be unsure which test is most appropriate
for a given investigation. The presented poster is intended to help in solving this problem.
Firstly, within the group of tests verifying the equality of means (Brown-Forsythe test,
F-ANOVA test and Kruskal-Wallis test), the efficiency of these tests
was analysed with respect to classification of the differential genes. Secondly, the analogous
analysis was carried out for tests concerning the equality of variances, i.e. the Bartlett test,
Fligner-Killeen test and Levene test. Thirdly, with the previously selected genes
as the training set, the prediction of the chosen sample with the remaining genes was tested
by applying several machine learning methods. As the results of this analysis
we present values of misclassified genes. With this approach we can determine the most
differential genes. The aim is to compare several statistical tests and to review
the usefulness of these tests in the selection of genes from microarray experiments.
All the computations were performed using the R platform, version 2.10.0.