Basic Data Analysis through R/R Studio
Hey Readers,
TASH here!
In this blog, I'll build a basic data analysis program in R using R Studio, and use R's plotting functions to create some visual representations of the data. We will go through the following steps to achieve our goal.
- Installing R.
- Installing R Studio.
- Downloading/importing data in R.
- Transforming Data / Running queries on data.
- Basic data analysis using statistical averages.
- Plotting data distribution.
Let's go over the tutorial by performing one step at a time.
1. Installing R:
The following website provides the R installer for Windows XP/Vista/7/8/8.1/10, 32-bit and 64-bit (https://cran.r-project.org/bin/windows/base/).
* Direct link to download the R installer (https://cran.r-project.org/bin/windows/base/R-3.2.5-win.exe)
2. Installing R Studio:
The following website provides R Studio for Windows XP/Vista/7/8/8.1/10, 32-bit and 64-bit (https://www.rstudio.com/products/rstudio/download/).
* Direct link to download R Studio for the Windows platform (https://download1.rstudio.org/RStudio-0.99.896.exe)
3. Downloading/importing data in R:
For this tutorial we will use the sample census data set ACS (http://stat511.cwick.co.nz/homeworks/acs_or.csv). There are two ways to import this data in R.
The first way is to import the data programmatically by executing the following command in the console window of R Studio:
> acs_or <- read.csv(url("http://stat511.cwick.co.nz/homeworks/acs_or.csv"))
Once this command is executed by pressing Enter, the dataset will be downloaded from the internet, read as a CSV file and assigned to the variable acs_or.
The second way is to click IMPORT DATASET in R Studio and provide the URL. After setting the separator ( , ), the name (the table name, acs_or in this tutorial) and the other parameters, click on the Import button. The dataset will be imported in R Studio and assigned to the variable name you set.
Any dataset can be viewed by executing the following line:
> View(acs_or)
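If the download is not available, the same import-and-inspect steps can be tried on a tiny inline table. This is only a sketch: acs_demo and its values are made up for illustration, with column names borrowed from the commands used in this tutorial.

```r
# Hypothetical miniature of the census file; the values are invented.
csv_text <- "age_husband,age_wife,number_children,bedrooms
35,33,2,3
52,54,0,2
41,41,1,4
29,27,3,3"

# read.csv() can parse inline text through its 'text' argument.
acs_demo <- read.csv(text = csv_text)

dim(acs_demo)      # number of rows and columns
names(acs_demo)    # column names
str(acs_demo)      # column types and a preview of the values
head(acs_demo, 2)  # first two rows
```

These console functions are handy next to View() because they also work outside R Studio.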
4. Transforming Data:
You can use the various transformation features of R to manipulate the data. Let's learn a few of the basic data access techniques.
To access a particular column (age_husband in our case):
> acs_or$age_husband
To access a single value by its row and column index (here row 1, column 3):
> acs_or[1,3]
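The access patterns above can be tried side by side on a small example. The data frame acs_demo below is hypothetical (invented values, column names taken from this tutorial), so treat it as a sketch rather than output from the real census file.

```r
# Hypothetical data frame; column names follow the tutorial, values are invented.
acs_demo <- data.frame(
  age_husband = c(35, 52, 41, 29),
  age_wife = c(33, 54, 41, 27),
  number_children = c(2, 0, 1, 3)
)

ages <- acs_demo$age_husband       # one column, returned as a vector
same <- acs_demo[["age_husband"]]  # same column, selected by name as a string
cell <- acs_demo[1, 3]             # row 1, column 3: a single value
row2 <- acs_demo[2, ]              # every column of row 2: a one-row data frame
two_cols <- acs_demo[, c("age_husband", "age_wife")]  # several columns by name
```

Leaving the row or the column position empty inside the square brackets means "all rows" or "all columns".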
You can use the subset function of R to select the rows that meet a condition. Suppose we want those rows from the dataset in which age_husband is greater than age_wife. For this we'll run the following command in the console:
> a <- subset(acs_or, age_husband > age_wife)
The above statement returns the set of rows in which age_husband is greater than age_wife and assigns those rows to a.
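The same row filter can also be written with plain logical indexing. Here is a minimal sketch on a hypothetical data frame (acs_demo and its values are invented for illustration) that compares the two forms.

```r
# Hypothetical data frame; values are invented.
acs_demo <- data.frame(
  age_husband = c(35, 52, 41, 29),
  age_wife = c(33, 54, 41, 27)
)

# subset() evaluates the condition inside the data frame,
# so the columns can be named without the acs_demo$ prefix.
older_a <- subset(acs_demo, age_husband > age_wife)

# Equivalent logical indexing: keep the rows where the condition is TRUE.
older_b <- acs_demo[acs_demo$age_husband > acs_demo$age_wife, ]

all.equal(older_a, older_b)  # compares the two results
```

One difference to keep in mind: when the condition is NA for a row, subset() drops that row, while logical indexing returns a row filled with NA.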
5. Basic data analysis using statistical averages:
The following functions can be used to calculate summary statistics of a column of the dataset:
a. For the mean (add up all the numbers, then divide by how many numbers there are) of a column:
> mean(acs_or$age_husband)
b. For the median (the middle value when the numbers are sorted) of a column:
> median(acs_or$age_husband)
c. For the quantiles (cut points that divide the sorted observations into equal-sized groups; by default R returns the minimum, the three quartiles and the maximum) of a column:
> quantile(acs_or$age_husband)
d. For the variance (a measure of the spread of the numbers around their mean) of a column:
> var(acs_or$age_husband)
e. For the standard deviation (the square root of the variance, expressed in the same units as the data) of a column:
> sd(acs_or$age_husband)
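To see what these functions compute, the definitions can be worked out by hand and compared with the built-ins. This is a sketch on a short hypothetical vector x (the ages are invented); note that var() and sd() in R use the sample formula, which divides by n - 1.

```r
x <- c(29, 35, 41, 52, 63)  # hypothetical ages, already sorted

n <- length(x)
mean_by_hand <- sum(x) / n                          # sum divided by count
median_by_hand <- sort(x)[(n + 1) / 2]              # middle value; works because n is odd
var_by_hand <- sum((x - mean_by_hand)^2) / (n - 1)  # sample variance
sd_by_hand <- sqrt(var_by_hand)                     # square root of the variance

all.equal(mean_by_hand, mean(x))
all.equal(median_by_hand, median(x))
all.equal(var_by_hand, var(x))
all.equal(sd_by_hand, sd(x))

# A missing value makes the result NA unless na.rm = TRUE is passed.
mean(c(x, NA))
mean(c(x, NA), na.rm = TRUE)
```

If a column of the real dataset contains missing values, passing na.rm = TRUE to these functions is usually what you want.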
You can also get a statistical summary by running summary on either a single column or the complete dataset:
> summary(acs_or)
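Here is a small sketch of what quantile() and summary() return for a numeric vector. The vector x is hypothetical (invented ages), and the probs values are just an example of asking for other cut points.

```r
x <- c(29, 35, 41, 52, 63)  # hypothetical ages

q <- quantile(x)                       # default cut points: 0%, 25%, 50%, 75%, 100%
q_tails <- quantile(x, probs = c(0.1, 0.9))  # custom cut points

s <- summary(x)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s[["Median"]]     # individual entries can be picked out by name
s[["Mean"]]
```

On a whole data frame, summary() reports these values for each numeric column, so it is a quick first look at the data.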
To perform analysis on a small range of rows from a huge dataset:
> s <- acs_or[1:100,]
This creates a new dataset containing rows 1 to 100 and all columns, and stores it in s.
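Taking the first 100 rows is the simplest choice, but if a file happens to be sorted, those rows may not be typical of the whole. As a sketch, here are two ways to take the first rows plus a random sample; the data frame big is hypothetical and stands in for any large dataset.

```r
# Hypothetical large data frame standing in for a real dataset.
big <- data.frame(id = 1:250, value = (1:250) * 2)

first_100 <- big[1:100, ]    # rows 1 to 100, all columns
same_rows <- head(big, 100)  # head() returns the same first 100 rows

set.seed(42)                 # fixed seed so the sample is reproducible
random_100 <- big[sample(nrow(big), 100), ]  # 100 rows drawn without replacement
```

Which one to use depends on the analysis; the random sample is an alternative, not something the steps in this tutorial require.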
6. Plotting data distribution:
A well-liked feature of R Studio is its built-in plot viewer for R. Any dataset imported in R can be visualized using plot and several other functions of R. For example:
a. To create a scatter plot of a dataset, you can run the following command in the console:
> plot(x = s$age_husband, y = s$age_wife, type = 'p')

Here s is the subset of the original dataset created earlier, and type = 'p' sets the plot type to points.

You can also choose lines or other plot types by changing the type argument, for example to 'l':
> plot(x = s$age_husband, y = s$age_wife, type = 'l')
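A scatter plot is easier to read with a title and axis labels, and it can be saved to a file instead of the Plots pane. The sketch below uses a hypothetical stand-in for s (invented values), and the titles and the reference line are my own additions for illustration.

```r
# Hypothetical stand-in for the subset s; values are invented.
s <- data.frame(
  age_husband = c(29, 35, 41, 52, 63),
  age_wife = c(27, 33, 41, 54, 60)
)

out_file <- tempfile(fileext = ".pdf")  # temporary file for the plot
pdf(out_file)                           # open a PDF graphics device
plot(x = s$age_husband, y = s$age_wife, type = "p",
     main = "Husband vs. wife age",
     xlab = "Age of husband", ylab = "Age of wife")
abline(a = 0, b = 1, lty = 2)  # dashed line where both ages are equal
invisible(dev.off())           # close the device so the file is written

file.exists(out_file)
```

With type = 'l', the points are joined in row order, so sorting the rows by the x column first usually gives a cleaner line.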

b. To draw a histogram of a column, you can run the command:
> hist(acs_or$number_children)

For bar plots, run the following set of commands:
> counts <- table(acs_or$bedrooms)
> barplot(counts, main = "BedRooms Distribution", xlab = "Number of BedRooms")
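Both plots are built from counts, and those counts can be inspected without drawing anything. A minimal sketch, using a hypothetical bedrooms vector with invented values:

```r
bedrooms <- c(2, 3, 3, 4, 3, 2, 5, 3)  # hypothetical values

# table() counts how often each distinct value occurs;
# this is what barplot() draws as bar heights.
counts <- table(bedrooms)
counts[["3"]]  # number of homes with 3 bedrooms

# hist() groups values into bins; plot = FALSE returns the bins
# and their counts without opening a plot.
h <- hist(bedrooms, plot = FALSE)
h$breaks  # bin edges
h$counts  # number of values in each bin
```

Looking at counts and h$counts first is a quick way to check that a plot shows what you expect.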


I hope this gives you a basic idea of how to do simple statistics in R.