Marek Walesiak*

author: Andrzej Dudek ([email protected])
Data Analysis in R Environment
Notes for Students:
1. R is available under the GPL (free) licence from http://cran.r-project.org/. RStudio can be downloaded from http://www.rstudio.com/
2. The main condition for passing this course is preparing a project in one of the following fields: cluster analysis, multidimensional scaling, factor analysis, linear ordering, decision trees, neural networks
3. Projects can be prepared individually or in groups of at most two people
4. Each student (group) should prepare at least one project; for a better grade you may prepare more (at most three)
5. Each project should contain at least three parts:
a) data in csv format
b) R language scripts (in .r format)
c) a short description of the data used and the goals achieved, in *.doc, *.odt or *.pdf format
6. The deadline for preparing the project and sending it to the teacher is January 15th, 2017
7. Please sign the project(s) you send with your first name and surname (not only an email address)
8. Emails with preliminary acceptance of projects will be sent within seven days
9. The main language of the labs is English
10. Additional materials and help files for the course will be available at http://wgrit.ae.jgora.pl/ad
11. My email address is [email protected]. My mobile number is 601 790 753 (emergencies only, use with care :-)
References:
[1] Dennis B. (2012), The R Student Companion, Chapman & Hall/CRC Press, Boca Raton, FL. ISBN 978-14398-7540-7.
[2] Walesiak M., Gatnar E. (eds.) (2009), Statystyczna analiza danych z wykorzystaniem programu R, PWN, Warszawa.
[3] Gatnar E., Walesiak M. (eds.) (2011), Analiza danych jakościowych i symbolicznych z wykorzystaniem programu R, C.H. Beck, Warszawa.
[4] R Development Core Team (2012), R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, URL http://www.R-project.org.
[5] Walesiak M., Dudek A. (2012), clusterSim package, URL http://www.R-project.org.
Schedule
24.10.16 17.00 Introduction
28.11.16 17.00 R Environment, data visualisation
12.12.16 17.00 Cluster analysis, multidimensional scaling
19.12.16 17.00 Factor analysis, linear ordering or decision trees or neural networks (this lab will probably be rescheduled)
12.01.17 16.00 Independent work: finishing projects (3 hours)
Editor: commands stay in the script and can be re-run at any time by selecting the lines and clicking Run (Ctrl+R)
Command line: a command is executed and then disappears (you can bring it back from the History tab or with the up and down arrow keys)
Environment: variables in the workspace, data import
Utilities (installed libraries, chart plots, help)
• Calculate 2+2
• Calculate 17/4
• Calculate the remainder of 1345 / 11
• Calculate 1345 / 11 as an integer (integer quotient)
• Calculate 45 squared
• Calculate the cube root of 343 (hint: the n-th root of a is a^(1/n))
• Print the numbers from 1 to 20
• Print the numbers from 15 down to 9
• Print the even numbers from 2 to 22
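The arithmetic and sequence tasks above map onto a handful of base R operators. The following is a minimal sketch; the variable names are chosen here only for illustration.

```r
# Basic arithmetic in base R
total     <- 2 + 2          # addition
ratio     <- 17 / 4         # ordinary (floating-point) division
remainder <- 1345 %% 11     # remainder of integer division
quotient  <- 1345 %/% 11    # integer part of the quotient
square    <- 45^2           # power operator
cube_root <- 343^(1/3)      # n-th root written as a power, a^(1/n)

# Sequences
one_to_twenty <- 1:20                 # ascending sequence
fifteen_to_9  <- 15:9                 # the colon operator also counts down
even_numbers  <- seq(2, 22, by = 2)   # sequence with a step of 2

print(c(total, ratio, remainder, quotient, square, cube_root))
print(even_numbers)
```

Because roots are computed in floating point, it is safer to compare such results with all.equal() than with ==.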
• Insert the matrix:
   15   20   -4  -19   -2
   19   -5   12   -1   10
   14    2    3    4    6
  -11   -9   -7  -12  -17
  -10  -15  -13    5    7
Calculate its determinant, transpose and inverse.
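One possible way to enter a matrix and obtain its determinant, transpose and inverse is sketched below. The 2 x 2 matrix A is a hypothetical example, not the matrix from the task; the same calls should work for a larger square matrix as long as it is non-singular.

```r
# Illustrative 2 x 2 matrix (hypothetical, not the matrix from the exercise);
# byrow = TRUE fills the matrix row by row, as it is written on paper
A <- matrix(c(2, 1,
              5, 3), nrow = 2, byrow = TRUE)

det_A <- det(A)      # determinant
A_t   <- t(A)        # transposed matrix
A_inv <- solve(A)    # inverse matrix (requires det(A) != 0)

print(det_A)
print(A_t)
print(A_inv)

# Sanity check: A multiplied by its inverse should give the identity matrix
check <- A %*% A_inv
print(round(check, 10))
```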
• Load the package clusterSim
• Display the help for the function cluster.Gen
• Search the help system at http://www.r-project.org/ for everything related to the k-means method (kmeans)
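The package and help tasks above could be approached roughly as follows. This is a sketch that assumes clusterSim may or may not be installed, so the package is attached only when it is available; RSiteSearch() is used here as one way of searching the R project site and is only called in an interactive session because it opens a web browser.

```r
# install.packages("clusterSim")   # one-time installation, needs internet access

# Attach the package only if it is installed, so the script does not stop otherwise
if (requireNamespace("clusterSim", quietly = TRUE)) {
  library(clusterSim)
  help("cluster.Gen")              # same as typing ?cluster.Gen in the console
}

# Search the locally installed documentation for a keyword
local_hits <- help.search("kmeans")

# Search the R project site (opens a web browser)
if (interactive()) RSiteSearch("kmeans")
```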
• Solve the systems of equations:
   1x - 1y + 5z =
   4x + 3y - 2z = 11
  -4x      + 2z = -8

   3x + 1y - 4z = -10
   5x - 5y + 1z = -24
  -3x + 4y - 2z = 16
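A linear system can be written as A x = b and solved with solve(A, b). The sketch below uses a hypothetical system chosen only for illustration (it is not one of the systems from the task); the names A, b and solution are likewise illustrative.

```r
# Hypothetical system (not taken from the exercise):
#    2x +  y -  z =   8
#   -3x -  y + 2z = -11
#   -2x +  y + 2z =  -3
A <- matrix(c( 2,  1, -1,
              -3, -1,  2,
              -2,  1,  2), nrow = 3, byrow = TRUE)   # coefficients, row by row
b <- c(8, -11, -3)                                    # right-hand sides

solution <- solve(A, b)                # solves A %*% x = b for x
names(solution) <- c("x", "y", "z")
print(solution)

# Check: substituting the solution back should reproduce b
residual <- A %*% solution - b
print(max(abs(residual)))
```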
• Insert the matrix
x_ij =
  1 2 4 3
  3 2 5 4
  2 1 4 5
  1 4 3 1
Calculate the sum of the elements in the third column, the mean of the elements in the second row, and the maximum value of each column
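Column and row summaries of this kind rely on matrix indexing and apply(). A minimal sketch on a hypothetical 3 x 4 matrix X (not the matrix from the task):

```r
# Hypothetical 3 x 4 matrix filled row by row with the numbers 1..12
X <- matrix(1:12, nrow = 3, byrow = TRUE)
print(X)

third_col_sum   <- sum(X[, 3])        # X[, 3] selects the whole third column
second_row_mean <- mean(X[2, ])       # X[2, ] selects the whole second row
col_max         <- apply(X, 2, max)   # MARGIN = 2 applies max to each column

print(third_col_sum)
print(second_row_mean)
print(col_max)
```

Using MARGIN = 1 in apply() would give the same kind of summary for each row instead.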
• The sheets matrix1 and matrix2 of the workbook do_r.xls contain two matrices to be multiplied. Save the worksheets as csv files. Import the data into R and multiply the matrices. Save the result in a csv file (write.table(..., sep = ";", dec = ",")). Place the result in the worksheet result of the Excel workbook.
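The import, multiply and export steps might look like the sketch below. Since do_r.xls is not available here, two small temporary csv files stand in for the exported worksheets; their contents, the file names and the assumption that the files have no header row are all hypothetical.

```r
# Hypothetical stand-ins for the csv files saved from the two worksheets
f1  <- tempfile(fileext = ".csv")
f2  <- tempfile(fileext = ".csv")
out <- tempfile(fileext = ".csv")
write.table(matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE), f1,
            sep = ";", dec = ",", row.names = FALSE, col.names = FALSE)
write.table(matrix(c(0.5, 1, 1.5, 2), nrow = 2, byrow = TRUE), f2,
            sep = ";", dec = ",", row.names = FALSE, col.names = FALSE)

# Import: semicolon-separated values with a decimal comma, no header row
A <- as.matrix(read.table(f1, sep = ";", dec = ",", header = FALSE))
B <- as.matrix(read.table(f2, sep = ";", dec = ",", header = FALSE))

# Matrix multiplication uses %*% (a plain * would multiply element by element)
C <- A %*% B
print(C)

# Export in the same format, so that Excel with a Polish locale can open it
write.table(C, out, sep = ";", dec = ",", row.names = FALSE, col.names = FALSE)
```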
• EXAMPLE
The basic type of graph in R is a scatter plot of metric data (ratio or interval scale), drawn with the plot function. This example shows how to build a multi-panel figure with the mfrow parameter, in which the panels contain different types of charts or a fitted regression line (function lm). First read the file Dane_3_1.csv.
x <- Dane_3_1  # data frame assumed to be already imported from Dane_3_1.csv
attach(x)
# Create a plotting area for four figures
par(mfrow=c(2,2), pty="s")
# Scatter plot with points
plot(x4, x5, lwd=2)
title("a)", font.main=1)
# Scatter plot with points and a regression line
plot(x4, x5, lwd=2)
abline(lm(x5~x4), lwd=2)
title("b)", font.main=1)
# Scatter plot with points and object numbers
plot(x4, x5, lwd=2)
text(x4, x5, pos=1)
title("c)", font.main=1)
# Scatter plot with vertical lines
plot(x4, x5, type="h", lwd=2)
title("d)", font.main=1)
detach(x)
• Generate a data set consisting of three classes with means at (0, 2), (-2, 2), (2, -2) and a given variance/covariance matrix. Plot this set so that each class is drawn in a different colour (clusterSim, plot)
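The idea behind this task can be sketched in base R, without clusterSim, so that the example runs on any installation; the task itself points to clusterSim, whose cluster.Gen function is meant for this purpose. The sketch makes several assumptions: the class means are read as (0, 2), (-2, 2) and (2, -2), the covariance matrix is taken to be the identity, each class has 50 objects, and all variable names are illustrative.

```r
set.seed(1)                        # reproducible random numbers

n     <- 50                        # assumed number of objects per class
means <- rbind(c(0, 2),            # assumed class means, one row per class
               c(-2, 2),
               c(2, -2))
sigma <- diag(2)                   # assumed variance/covariance matrix (identity)

# Standard normal draws z are turned into draws with covariance sigma
# by multiplying with the Cholesky factor, then shifted to the class mean
R <- chol(sigma)
data <- do.call(rbind, lapply(seq_len(nrow(means)), function(k) {
  z <- matrix(rnorm(n * 2), ncol = 2)
  sweep(z %*% R, 2, means[k, ], "+")
}))
cls <- rep(seq_len(nrow(means)), each = n)   # class label of every object

# One colour per class: col accepts the integer labels 1, 2, 3
plot(data, col = cls, pch = 19, xlab = "x1", ylab = "x2")
```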
• Graphs of mathematical functions are mainly prepared with the curve function (see a) to c)), although the plot function can also be used (see e.g. d)).
options(OutDec =",")
# Functions sin(x) and ln(x) in one figure
curve(sin(x), col="red", xlim=c(0,4*pi), ylim=c(-1,log(4*pi)),
ylab="sin(x) and ln(x)")
curve(log(x), col="blue", add=TRUE, xlim=c(0,4*pi), ylab="sin(x) and ln(x)")
title("a)", font.main=1)
dev.new()  # open a new graphics window (windows() exists only on Windows)
# Function sqrt(x*(1-x))
curve(sqrt(x*(1-x)), 0, 1, ylab=expression(sqrt(x(1-x))))
title("b)", font.main=1)
dev.new()
# Power function
curve(2*x^0.4, ylab=expression(2*x^0.4), 0, 4)
title("c)", font.main=1)
dev.new()
# Graph of y=1/x drawn with the plot function
x <- seq(0, 1, 0.01)
y <- 1/x
plot(x, y, type="l")
text(0.2, 40, labels="y=1/x")
title("d)", font.main=1)
• Plot the function x^3 - 3x^2 + 1 on the interval [-5, 7]
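One way to approach this task is to define the polynomial as an R function and pass it to curve(). This is a sketch only; the names f and pts and the dashed zero line are illustrative additions.

```r
# The polynomial from the task, written as an R function
f <- function(x) x^3 - 3 * x^2 + 1

# curve() evaluates f on a grid between 'from' and 'to' and draws the line;
# it also returns the evaluated points invisibly
pts <- curve(f(x), from = -5, to = 7, ylab = "f(x)")

# Dashed horizontal line at zero, to make the roots easier to see
abline(h = 0, lty = 2)
```

Note that multiplication has to be written explicitly in R: 3 * x^2, not 3x^2.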