Article in English (PDF format)
Nauki ścisłe priorytetem społeczeństwa opartego na wiedzy
Articles for the CMS platform

Dr Anna Rybak
Institute of Computer Science, University in Białystok

Ways of presenting problems in statistics

Introduction

This article presents the basics of descriptive statistics, which deals with describing, presenting in various forms and analysing the results of research conducted on a random sample. It demonstrates methods of arranging and visualising statistical data (frequency distributions, histograms and other diagrams), as well as descriptive statistics and the examination of correlation between statistical features. These topics go beyond the core curriculum in mathematics, but they are not difficult, and with the help of appropriate computer visualisation a student can understand them easily.

Problem

A survey was conducted on a sample of 40 students. Each student answered the question: "How many books did you read last month?" Here are the students' answers:

5, 1, 2, 0, 5, 4, 4, 1, 1, 1, 2, 0, 0, 0, 3, 1, 1, 2, 5, 4, 6, 4, 0, 1, 2, 3, 5, 2, 1, 2, 3, 0, 2, 4, 3, 2, 2, 3, 0, 1.

What can be inferred about reading in this group of young people?

Theoretical introduction

Statistics is a branch of mathematics that deals with statistical inference, that is, formulating and verifying general conclusions (statistical hypotheses) on the basis of a finite number of results of random observations. When statistical research is carried out on a certain collectivity (population), a representative group called a sample is chosen. The sample is the subject of direct examination, and the results are generalised to the whole population. The phenomenon being examined is called a statistical feature (the name "variable" may also be used; it is common in statistical software), and the results of research conducted on a sample are called feature values.
The reliability of such research depends mainly on the sample chosen.

Statistics is divided into two branches: descriptive statistics, which deals with processing, presenting in different forms and analysing the results of research conducted on a random sample, and mathematical statistics, which deals with drawing conclusions about the distribution of feature values in the whole population on the basis of the results of sample research.

Results obtained from research on a sample may be presented in various graphical ways (tables, various diagrams) and analysed with so-called numerical statistics. The mass media very often show the effects of applying methods of descriptive statistics to visualise the results of research on various social, political, economic, cultural and other phenomena.

Suitable software is very useful when processing statistical data (especially for producing diagrams and carrying out complex mathematical operations). The program used here to consider statistical issues is Statistics and Probability (Statystyka i prawdopodobieństwo). Its English demo version may be downloaded from www.vusoft2.nl (file vustatengdemo.zip). Any spreadsheet may also be used.

Project co-financed by the European Union under the European Social Fund.

Examination of the issue

In the problem presented above, all students of a certain school may be taken as the population. The research sample comprises forty chosen students, and the feature examined is the number of books read. The question asked, "What can be said about reading in this group of young people?", should be made more specific. Which specific questions may be asked in order to obtain as much information as possible, describing the phenomenon as precisely as possible?

First of all, it should be noticed that the data are not arranged, so they are hard to read.
Any analysis or drawing of conclusions is hindered. With the help of the program Statistics and Probability (Statystyka i prawdopodobieństwo), the data may be arranged in a table called a number board, using the option Statistics/Tables/Number board. The table is shown below:

No of books   No
0-1           16
2-3           14
4-5            9
6-7            1
Total         40

The left column contains the so-called classes of feature values. Data are grouped in this way when the number of observations is large. If we want every value of the feature examined (that is, every number of books read) listed in the left column, we use the same option again with the Classes button, setting the number of classes to 7 (the number of distinct values of the feature examined):

No of books   No
0              7
1              9
2             10
3              4
4              5
5              4
6              1
Total         40

If the Percentage field is ticked, the following table is obtained:

No of books   No        %
0              7    17.50
1              9    22.50
2             10    25.00
3              4    10.00
4              5    12.50
5              4    10.00
6              1     2.50
Total         40   100%

What questions can be asked on the basis of the above tables? For instance: What number of books was read by the most students? What percentage of the students surveyed is that? What number of books was read by the fewest students? What percentage of the students surveyed is that? Are there students who do not read at all? How many of them are there? What percentage of the students surveyed is that? Answer the above questions. Perhaps you can think of other questions that could be asked?

A table is not the most vivid way of presenting data. The data will now be shown in the form of different types of diagrams. This can be done using the option Statistics/Diagrams and choosing an appropriate type of diagram or Show all.
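The frequency tables above can also be reproduced without the program. The following is a minimal Python sketch, not the program's own method: the function names frequency_table and grouped_counts are made up for this illustration, and the class width of 2 is an assumption chosen to match the classes 0-1, 2-3, 4-5 and 6-7. Note that the answers as printed in the problem contain nine 2s and five 3s, whereas the tables report ten and four. The total for the class 2-3 agrees (14), so one printed answer presumably differs from the data the program was given.

```python
from collections import Counter

# Answers of the 40 students, copied from the problem statement.
answers = [
    5, 1, 2, 0, 5, 4, 4, 1, 1, 1,
    2, 0, 0, 0, 3, 1, 1, 2, 5, 4,
    6, 4, 0, 1, 2, 3, 5, 2, 1, 2,
    3, 0, 2, 4, 3, 2, 2, 3, 0, 1,
]


def frequency_table(values):
    """Return (value, count, percentage) rows for every value from min to max."""
    counts = Counter(values)
    total = len(values)
    return [
        (v, counts[v], 100.0 * counts[v] / total)
        for v in range(min(values), max(values) + 1)
    ]


def grouped_counts(values, width):
    """Count values in classes of equal width: 0..width-1, width..2*width-1, ..."""
    grouped = Counter(v // width for v in values)
    return {
        (k * width, k * width + width - 1): grouped[k]
        for k in range(max(values) // width + 1)
    }


if __name__ == "__main__":
    # Ungrouped table with counts and percentages.
    print("No of books   No      %")
    for value, count, percent in frequency_table(answers):
        print(f"{value:<13} {count:>2} {percent:>6.2f}")
    print(f"{'Total':<13} {len(answers):>2} {100:>6.2f}")
    # Grouped table with classes of width 2 (assumed, see the text above).
    for (low, high), count in grouped_counts(answers, 2).items():
        print(f"{low}-{high}: {count}")
```

Counting with Counter keeps the sketch short. Listing every value from the minimum to the maximum means that a value nobody reported would still appear in the table with a count of zero.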
The option Data diagrams from the main menu may also be used. In this way a bar graph showing the number of books read may be obtained:

[Figure: bar graph of the number of books read]

Or a graph showing the percentage of each number of books read in reading overall:

[Figure: bar graph of the percentage of each number of books read]

What conclusions can you draw on the basis of the graph? It is clearly visible that most of the students surveyed read few books (0, 1 or 2). It is also possible to prepare a pie diagram, which is extremely useful when examining the structure of a phenomenon:

[Figure: pie diagram of the number of books read]

The examination of reading among students could be more thorough if the students surveyed were also asked about other features. The research can be extended so that reading can be seen in the context of other features. To do that, the students are asked about their sex and their mark in Polish.

Sex is coded as follows: the number 0 denotes male and 1 denotes female.

What questions can now be asked about the issue? For instance: Who reads more books, males or females? Who has better marks in Polish, males or females? Is there any connection (in statistics it is called correlation) between the mark in Polish and the number of books read?

In order to answer the questions about the difference in reading between males and females, the data will be divided into two groups according to the sex variable. The option Statistics/Data/Divide is used, with the variable sex chosen as the division variable. The data have been divided into two classes: males and females.
This division is reflected in the number table:

               sex 0            sex 1
No of books    No       %       No       %      Total
0               6   25.00        1    6.25         7
1               9   37.50        0    0.00         9
2               4   16.67        6   37.50        10
3               3   12.50        1    6.25         4
4               1    4.17        4   25.00         5
5               0    0.00        4   25.00         4
6               1    4.17        0    0.00         1
Total          24  100%         16  100%          40

as well as in the graphs, whose form we can choose:

[Figure: graphs of the number of books read, divided by sex]

It is possible to evaluate which graph is more readable and more convenient for making comparisons. The graphs above illustrate the reading level of males and females. If you want to show Polish grades, you must select the appropriate variable before making a graph. Then you may obtain a graph like this:

[Figure: graph of Polish grades]

In order to examine whether there is any correlation between the Polish grade and the number of books read, the option Statistics/Diagrams/Scatterplot can be used. On the basis of the graph and the coefficients given in the table above the diagrams, we can conclude whether the variables are correlated. If (in the case of a linear model) the points on the graph lie close to a straight line, there is a correlation. The presence of correlation can also be deduced from the correlation coefficient, which is a number from the interval [-1, 1]. The higher the absolute value of this coefficient, the stronger the correlation.

The data from a piece of research have been processed and analysed graphically. The research was then extended so that the phenomenon examined (reading level) could be analysed more comprehensively.

Conclusions:

Reading among students has not been well developed.
Females read more books than males.
Females have better grades in Polish than males.
There is a strong correlation between the grade in Polish and the number of books read, and it is stronger in males than in females.

These conclusions are generalisations concerning the whole population, drawn on the basis of observing a group. They are statistical hypotheses, with the exception of the first conclusion, which has been formulated very imprecisely: we do not know what "reading has not been well developed" means in a mathematical sense. A statistical hypothesis may be verified (checked for whether it is right or wrong) using the methods of mathematical statistics. The program Statistics and Probability (Statystyka i prawdopodobieństwo) can also help with that. It is not, however, a topic for secondary school.
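Two of the conclusions can be checked numerically. The sketch below is a minimal illustration in Python, not the program's own computation, and the function names mean_from_counts and pearson are made up for it. The per-sex counts are taken from the number table divided by sex. The Polish grades are not reproduced in the article, so the correlation function is run only on invented, perfectly linear values that serve as a sanity check and say nothing about the survey.

```python
from math import sqrt

# Numbers of students who read 0, 1, ..., 6 books, from the table divided
# by sex (sex 0 = male, sex 1 = female).
males = [6, 9, 4, 3, 1, 0, 1]
females = [1, 0, 6, 1, 4, 4, 0]


def mean_from_counts(counts):
    """Mean of a feature whose value i occurs counts[i] times."""
    total = sum(counts)
    return sum(value * n for value, n in enumerate(counts)) / total


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print("mean books read, males:  ", mean_from_counts(males))
    print("mean books read, females:", mean_from_counts(females))

    # HYPOTHETICAL values, not survey data: points lying exactly on a
    # straight line, so the coefficient should be 1 (or -1 when reversed).
    xs = [0, 1, 2, 3, 4]
    ys = [2, 3, 4, 5, 6]
    print("r for increasing line:", pearson(xs, ys))
    print("r for decreasing line:", pearson(xs, list(reversed(ys))))
```

With the counts from the table, the mean number of books read is 1.5 for males and about 3.19 for females, which agrees with the conclusion that females read more. A difference found in a sample is still only a hypothesis about the population until it is verified with the methods of mathematical statistics.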