R Data Frame Tutorial

R Data Frame Tutorial

By Debasis Das (29-Aug-2016)

In this R Data Frame tutorial we will load a data frame from a csv file and work on the data frame

Our csv has the following columns

  1. geo
  2. country
  3. product
  4. price
  5. unitsales
  6. revenue

You can download the csv used in this tutorial here UnitSales
R Data Frame CSV Snapshot

Check the current working directory

> getwd()
[1] "/Users/debasisdas"

Change the working directory

#set the working directory to the place where the csv is located
> setwd("/Users/debasisdas/Technology/R/")
> getwd()
[1] "/Users/debasisdas/Technology/R"

Load a data frame from a csv

> unitsales <- read.csv("UnitSales.csv") 
> unitsales
        geo      country     product price salesunit revenue
1  americas          usa    computer   800       100   80000
2  americas          usa     speaker    50        20    1000
3  americas          usa     monitor   150       200   30000
4  americas          usa accessories    30       100    3000
5  americas       mexico    computer   800       200  160000
6  americas       mexico     speaker    50        32    1600
7  americas       mexico     monitor   150       212   31800
8  americas       mexico accessories    30       432   12960
9  americas       brazil    computer   800       456  364800
10 americas       brazil     speaker    50       777   38850
11 americas       brazil     monitor   150       876  131400
12 americas       brazil accessories    30       987   29610
13     asia        india    computer   800       654  523200
14     asia        india     speaker    50       456   22800
15     asia        india     monitor   150       678  101700
16     asia        india accessories    30       102    3060
17     asia        china    computer   800       111   88800
18     asia        china     speaker    50        21    1050
19     asia        china     monitor   150        32    4800
20     asia        china accessories    30        32     960
21     asia rest of asia    computer   800        32   25600
22     asia rest of asia     speaker    50        11     550
23     asia rest of asia     monitor   150        11    1650
24     asia rest of asia accessories    30        11     330

Check the class of the object created by reading from csv

> class(unitsales)
[1] "data.frame"

Check the dimensions of the unitsales object

> dim(unitsales)
[1] 24  6
> length(unitsales)
[1] 6
> attributes(unitsales)
$names
[1] "geo"       "country"   "product"   "price"     "salesunit" "revenue"  

$class
[1] "data.frame"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Check the number of rows and number of columns in a data frame

> nrow(unitsales)
[1] 24
> ncol(unitsales)
[1] 6

Data frame head function

> #Instead of printing out the entire data frame, it is often desirable to preview it with the head function beforehand.
> head(unitsales)
       geo country     product price salesunit revenue
1 americas     usa    computer   800       100   80000
2 americas     usa     speaker    50        20    1000
3 americas     usa     monitor   150       200   30000
4 americas     usa accessories    30       100    3000
5 americas  mexico    computer   800       200  160000
6 americas  mexico     speaker    50        32    1600

Get specific columns from the data frame

> #print only specific columns
> unitsales[c("geo","country","product","salesunit")]
        geo      country     product salesunit
1  americas          usa    computer       100
2  americas          usa     speaker        20
3  americas          usa     monitor       200
4  americas          usa accessories       100
5  americas       mexico    computer       200
6  americas       mexico     speaker        32
7  americas       mexico     monitor       212
8  americas       mexico accessories       432
9  americas       brazil    computer       456
10 americas       brazil     speaker       777
11 americas       brazil     monitor       876
12 americas       brazil accessories       987
13     asia        india    computer       654
14     asia        india     speaker       456
15     asia        india     monitor       678
16     asia        india accessories       102
17     asia        china    computer       111
18     asia        china     speaker        21
19     asia        china     monitor        32
20     asia        china accessories        32
21     asia rest of asia    computer        32
22     asia rest of asia     speaker        11
23     asia rest of asia     monitor        11
24     asia rest of asia accessories        11

Summary of data frame

> #print the summary of the unitsales data frame
> summary(unitsales)
       geo             country         product      price         salesunit        revenue      
 americas:12   brazil      :4   accessories:6   Min.   : 30.0   Min.   : 11.0   Min.   :   330  
 asia    :12   china       :4   computer   :6   1st Qu.: 45.0   1st Qu.: 32.0   1st Qu.:  1638  
               india       :4   monitor    :6   Median :100.0   Median :106.5   Median : 24200  
               mexico      :4   speaker    :6   Mean   :257.5   Mean   :272.6   Mean   : 69147  
               rest of asia:4                   3rd Qu.:312.5   3rd Qu.:456.0   3rd Qu.: 82200  
               usa         :4                   Max.   :800.0   Max.   :987.0   Max.   :523200  

Subset of data frame

> #subset
> #get the data only for usa
> subset(unitsales, country=="usa")
       geo country     product price salesunit revenue
1 americas     usa    computer   800       100   80000
2 americas     usa     speaker    50        20    1000
3 americas     usa     monitor   150       200   30000
4 americas     usa accessories    30       100    3000

Subset a data frame based on a condition

> #get the data only for usa and computers
> subset(unitsales, country=="usa" & product == "computer")
       geo country  product price salesunit revenue
1 americas     usa computer   800       100   80000


> #get the total revenue for all geos
> sum(unitsales["revenue"])
[1] 1659520

> #get the total revenue for geo = americas
> sum(subset(unitsales,geo=="americas")["revenue"])
[1] 885020

Aggregate a data frame

> #aggregate by country and revenue
> aggregate(unitsales$revenue, by = list(country=unitsales$country), FUN=sum)
       country      x
1       brazil 564660
2        china  95610
3        india 650760
4       mexico 206360
5 rest of asia  28130
6          usa 114000


> #aggregate by geo and revenue
> aggregate(unitsales$revenue, by = list(geo=unitsales$geo), FUN=sum)
       geo      x
1 americas 885020
2     asia 774500

Posted in R Programming Tagged with: , ,

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

Recent Posts


Hit Counter provided by technology news