22 November 2013
Today I reviewed the code examples for boxplot, mean, and the psych package, and studied some new code: ggplot2. ggplot2 is a data visualization package for the statistical programming language R. It is based on the grammar of graphics, and it tries to keep the good parts of base and lattice graphics while leaving out the bad parts. It takes care of many of the fiddly details that make plotting a hassle, and it provides a powerful model of graphics that makes it easy to produce complex multi-layered graphics.
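To make the idea of "multi-layered graphics" concrete, here is a minimal sketch that builds one plot layer by layer. It was not part of the class code; it assumes ggplot2 is already installed, and the object name p is made up for illustration.

```r
# Assumes ggplot2 is already installed: install.packages("ggplot2")
library(ggplot2)

# Base: the data and the aesthetic mapping (which column goes where)
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species))

# Layer 1: draw the observations as points
p <- p + geom_point()

# Layer 2: add one straight-line fit per species on top of the points
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p)  # the plot is drawn only when the object is printed
```

Each `+` adds one layer to the same plot object, so a layer can be added or removed without touching the others.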
Code
library(datasets)
?iris
names(iris)  # list the variable names in iris

# copy the iris data into the workspace
iris.df <- iris
iris_df <- iris

boxplot(iris.df)
?boxplot

# horizontal box-and-whisker plot, one rainbow colour per column
boxplot(iris.df, horizontal = TRUE, col = rainbow(ncol(iris.df)))

mean(iris.df$Sepal.Length)
mean(iris.df$Sepal.Width)
mean(iris.df$Petal.Length)
mean(iris.df$Petal.Width)

# to get the full descriptive statistics
# 1. install the package (needed only once)
install.packages("psych")
# 2. load the package and describe the data
library(psych)
describe(iris.df)

# QUESTION
# Is the sepal length statistically longer than the petal length?
# If so, at what level of confidence?
# We are comparing the mean of Sepal.Length against the mean of Petal.Length.
# Is this a "paired" t-test or an "independent samples" t-test?
#
# General forms (y1, y2, y and x are placeholders, not real objects):
#   t.test(y1, y2)                 # two independent samples (Welch test by default)
#   t.test(y1, y2, paired = TRUE)  # paired t-test
#   t.test(y ~ x)                  # independent samples, grouped by a two-level factor x
#
# ANSWER = paired t-test, because both lengths are measured on the same flower.
# Note: the call below omits paired = TRUE, so R runs the Welch two-sample
# t-test; the fractional df in the output shows this.
t.test(iris.df$Sepal.Length, iris.df$Petal.Length)
# Output
# t = 13.0984, df = 211.543, p-value < 2.2e-16

# The p-value is the probability of seeing a difference at least this large
# if the null hypothesis (no difference in means) were true.
# A small p-value lets us reject the null hypothesis
# (a.k.a. accept the alternative hypothesis).
# alternative hypothesis: true difference in means is not equal to 0
# "true" = considering the whole population (all the irises in the world)
# "difference in means" = Sepal.Length - Petal.Length
#   or
# "difference in means" = Petal.Length - Sepal.Length
# "not equal to 0" means > or < zero:
# Sepal.Length > Petal.Length or Sepal.Length < Petal.Length

# In p-value < 2.2e-16, what is e-16?
# 2.2e-16 = 2.2 * 10^(-16) = 2.2 * 0.0000000000000001
# 2.2e-16 = 0.00000000000000022
#
# As a percentage that is less than 0.000000000000022%.
# If the two mean lengths were really equal, a difference this large would
# almost never appear by chance, so we conclude that
# Sepal.Length is not equal to Petal.Length.
# Normally we use a 95% confidence level / 0.05 significance level.
# Therefore at 95% CL, if p-value < 0.05, we can reject the null hypothesis.
# Or at 99% CL, if p-value < 0.01, we can reject the null hypothesis.

# What if we test the difference in means of
# Sepal.Length between different species?
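install.packages("ggplot2")
library(ggplot2)
names(iris)

# qplot() is deprecated since ggplot2 3.4.0; it still runs, with a warning
qplot(Sepal.Length, Petal.Length, data = iris.df)
?qplot
qplot(Sepal.Length, Petal.Length, data = iris.df, colour = Species)

# t.test() with a formula needs exactly two groups, but Species has three
# levels, so compare two species at a time
# (setosa and versicolor are chosen here only as an example)
two.species <- droplevels(subset(iris.df, Species %in% c("setosa", "versicolor")))
t.test(Sepal.Length ~ Species, data = two.species)

mtcars
?mtcars
qplot(mpg, hp, data = mtcars, colour = am)
t.test(mtcars$mpg ~ mtcars$am)
with(mtcars, t.test(mpg ~ am))  # same test, shorter to write
cars <- mtcars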
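The two testing questions raised in the code can be checked directly. The following is a minimal sketch, not part of the original class code. It assumes that each row of iris is one flower, so the sepal and petal lengths form a pair, and it uses a one-way ANOVA followed by pairwise t-tests as one common way to compare all three species at once. The object names welch, paired, fit and pw are made up for illustration.

```r
library(datasets)

# Question 1: paired or independent?
# Without paired = TRUE, t.test(x, y) runs the Welch two-sample test.
welch  <- t.test(iris$Sepal.Length, iris$Petal.Length)
paired <- t.test(iris$Sepal.Length, iris$Petal.Length, paired = TRUE)

welch$method      # name of the test that was actually run
paired$method
paired$parameter  # degrees of freedom = number of flowers - 1

# Question 2: does mean Sepal.Length differ between species?
# Species has three levels, so a single formula t-test does not apply.
# A one-way ANOVA tests all three group means at once.
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)

# Pairwise t-tests show which species differ from which;
# p-values are adjusted for multiple comparisons (Holm method by default).
pw <- pairwise.t.test(iris$Sepal.Length, iris$Species)
pw$p.value
```

Comparing welch$method with paired$method shows which test each call runs, which is an easy check to make before interpreting a p-value.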
————————————————————————————————————————————————-
SUPAPON PUNPOTHA (541610143)
Faculty of Economics Chiangmai University
E-mail : Sp.evening@gmail.com