:: packages.ggplot2 ::

22 November 2013

Today I reviewed the code examples for boxplot, mean, and the psych package, and studied new code: ggplot2. ggplot2 is a data visualization package for the statistical programming language R. It is based on the grammar of graphics, and aims to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle, and provides a powerful model of graphics that makes it easy to produce complex multi-layered graphics.
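To make "multi-layered" concrete, here is a minimal sketch that uses the full ggplot() interface on the built-in iris data. The choice of layers (points plus a straight-line fit per species) is only an illustration, not something from the class notes.

```r
library(ggplot2)

# ggplot() sets the data and the aesthetic mappings shared by all layers
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +                                           # layer 1: one point per flower
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # layer 2: a fitted line per species

print(p)
```

Each `+` adds one layer on top of the previous ones, so a plot can be built up step by step.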

Code

library(datasets)
?iris
names(iris) # list the variable names in iris
iris.df <- iris # copy the built-in iris data into the workspace
iris_df <- iris # the same copy under an underscore name (not used below)
boxplot(iris.df)
?boxplot
# horizontal boxplot/whisker plot
# col needs real colours; rainbow(n) returns n of them, one per column
boxplot(iris.df, horizontal = TRUE, col = rainbow(ncol(iris.df)))
mean(iris.df$Sepal.Length)
mean(iris.df$Sepal.Width)
mean(iris.df$Petal.Length)
mean(iris.df$Petal.Width)
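The four mean() calls can also be written as one call. This is a small sketch using base R's colMeans() on the numeric columns only; the names `measurements` and `col_means` are made up for this example.

```r
# Keep the numeric columns only: Species is a factor, so it is left out
measurements <- iris[, sapply(iris, is.numeric)]

# One mean per column, returned as a named vector
col_means <- colMeans(measurements)
print(round(col_means, 3))
```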
# to get the whole set of descriptive statistics
# 1. install the package (needed only once)
install.packages("psych")
# 2. load the package
library(psych)
describe(iris.df)
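# QUESTION
# Is the sepal length statistically longer than the petal length?
# If so, at what level of confidence?
# We are comparing the mean of Sepal.Length against the mean of Petal.Length.
# Is this a paired t-test or an independent-samples t-test?
# t.test(y1, y2, paired = TRUE) # paired t-test
# t.test(y1, y2)                # independent samples, Welch t-test (the default)
# t.test(y ~ x)                 # independent samples, grouped by a two-level factor x
# ANSWER: both lengths are measured on the same flowers, so the design is paired.
# Note that the call below omits paired = TRUE, so R runs the Welch two-sample test.
t.test(iris.df$Sepal.Length, iris.df$Petal.Length)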
# Output
# t = 13.0984, df = 211.543, p-value < 2.2e-16
# The p-value is the probability of seeing a difference at least this large
# if the null hypothesis (no difference in means) were true.
# A small p-value is evidence against the null hypothesis and for the alternative.
# alternative hypothesis: true difference in means is not equal to 0
# "true" = considering the whole population (all the irises in the world)
# "difference in means" = Sepal.Length - Petal.Length
# or
# "difference in means" = Petal.Length - Sepal.Length
# which is > or < zero, that is,
# Sepal.Length > Petal.Length or Sepal.Length < Petal.Length
# In p-value < 2.2e-16, what does e-16 mean?
# 2.2e-16 = 2.2 * 10^(-16) = 2.2 * 0.0000000000000001
# 2.2e-16 = 0.00000000000000022
#
# If the two means were truly equal, a difference this large would occur
# with probability below 2.2e-16, that is, below 0.000000000000022% (2.2e-14 %),
# so we conclude that mean Sepal.Length is not equal to mean Petal.Length.
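The printed bound 2.2e-16 is not the p-value itself. R prints "< 2.2e-16" whenever the p-value is smaller than the machine epsilon for double-precision numbers. A short sketch to check this (the object name `res` is only for this example):

```r
res <- t.test(iris$Sepal.Length, iris$Petal.Length)

res$p.value          # the stored p-value, much smaller than the printed bound
.Machine$double.eps  # about 2.22e-16, the bound used when printing
res$p.value < .Machine$double.eps

# Show 2.2e-16 in fixed notation instead of scientific notation
format(2.2e-16, scientific = FALSE)
```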
# Normally we use a 95% confidence level, i.e. a 0.05 significance level.
# At the 95% CL, if the p-value < 0.05, we can reject the null hypothesis.
# At the 99% CL, if the p-value < 0.01, we can reject the null hypothesis.
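The t.test() call above has no paired = TRUE, so it runs the Welch two-sample test, which is why the output shows a fractional df = 211.543. The sketch below runs both versions side by side and applies the decision rule; `alpha`, `welch`, and `paired` are names chosen for this example.

```r
alpha <- 0.05  # significance level for a 95% confidence level

welch  <- t.test(iris$Sepal.Length, iris$Petal.Length)                 # default: Welch test
paired <- t.test(iris$Sepal.Length, iris$Petal.Length, paired = TRUE)  # same flowers, paired

welch$parameter   # fractional degrees of freedom (Welch approximation)
paired$parameter  # n - 1 = 149 degrees of freedom

# Decision rule: reject the null hypothesis when the p-value is below alpha
paired$p.value < alpha
```

Since each flower contributes both a sepal and a petal measurement, the paired version probably matches the design better, but both tests reject the null hypothesis here.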
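# What if we test the difference in mean Sepal.Length between species?
install.packages("ggplot2")
library(ggplot2)
names(iris)
qplot(Sepal.Length, Petal.Length, data = iris.df)
?qplot
qplot(Sepal.Length, Petal.Length, data = iris.df, colour = Species)
# t.test() with a formula needs a grouping factor with exactly two levels,
# and Species has three, so compare two species at a time.
two.species <- droplevels(subset(iris.df, Species != "virginica"))
t.test(Sepal.Length ~ Species, data = two.species)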
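A t-test compares only two groups, and Species has three levels. As a sketch of one way to compare all three at once, the block below swaps in a one-way ANOVA (aov) and then pairwise t-tests with adjusted p-values. These are different techniques from the two-group t-test in the notes, shown here only as an option.

```r
# One-way ANOVA: does mean Sepal.Length differ between the three species?
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)

# Pairwise t-tests between species, with Holm-adjusted p-values (the default)
pw <- pairwise.t.test(iris$Sepal.Length, iris$Species)
pw
```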
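mtcars
?mtcars
qplot(mpg, hp, data = mtcars, colour = factor(am)) # am is coded 0/1, so treat it as a factor
t.test(mtcars$mpg ~ mtcars$am)
with(mtcars, t.test(mpg ~ am)) # the same test, shorter to write
cars <- mtcars # note: this name masks the built-in datasets::cars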
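qplot() has been deprecated since ggplot2 3.4.0, so here is a sketch of the same mtcars plot and test with ggplot() and the data argument of t.test(). The name `cars.df` and the labels "automatic" and "manual" (taken from the 0/1 coding described in ?mtcars) are choices made for this example.

```r
library(ggplot2)

# Work on a copy and turn the 0/1 transmission code into a labelled factor
cars.df <- mtcars
cars.df$am <- factor(cars.df$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Scatter plot of horsepower against miles per gallon, coloured by transmission
p <- ggplot(cars.df, aes(x = mpg, y = hp, colour = am)) +
  geom_point()
print(p)

# Welch two-sample t-test of mpg between the two transmission types
tt <- t.test(mpg ~ am, data = cars.df)
tt$estimate  # mean mpg in each group
tt$p.value
```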

————————————————————————————————————————————————-
SUPAPON  PUNPOTHA (541610143)
Faculty of Economics Chiangmai University
E-mail : Sp.evening@gmail.com
