Vision Igniter: September 2014

Quiz 1 from Coursera's "Getting and Cleaning Data" course.
It is a simple quiz with short code, but I am keeping it here for the record.

Question 1

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv
and load the data into R. The code book, describing the variable names is here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf
How many properties are worth $1,000,000 or more?

A1.

fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"

download.file(fileUrl, destfile="./quiz1.csv")

list.files("./")

d <- read.table("./quiz1.csv", sep=",", header=TRUE)  # empty fields in numeric columns are read as NA

head(d)

sum(d$VAL > 23, na.rm=TRUE)  # count the properties in the top VAL bracket ($1,000,000 or more)
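The counting step can be checked on a toy data frame. This is only a minimal sketch: the data frame `toy` and its values are made up for illustration, and it assumes, as in the answer above, that VAL codes above 23 mark the top price bracket.

```r
# Hypothetical toy data: VAL holds coded price brackets, NA means no value recorded.
toy <- data.frame(VAL = c(24, 12, NA, 24, 23, NA, 24))

# A comparison against NA yields NA, so na.rm = TRUE drops those rows from the count.
n_top <- sum(toy$VAL > 23, na.rm = TRUE)
print(n_top)

# table() shows the same count alongside every other code, including the NAs.
print(table(toy$VAL, useNA = "ifany"))
```

Summing a logical vector counts its TRUE entries, which answers "how many" directly instead of listing the matching values.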

Question 3

Download the Excel spreadsheet on Natural Gas Acquisition Program here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx

Read rows 18-23 and columns 7-15 into R and assign the result to a variable called:

dat

What is the value of:

 sum(dat$Zip*dat$Ext,na.rm=T)

(original data source: http://catalog.data.gov/dataset/natural-gas-acquisition-program)

A3.

install.packages("xlsx")

library(xlsx)

fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx"

download.file(fileUrl, destfile="./quiz2.xlsx", mode="wb")  # binary mode keeps the workbook intact on Windows

colIndex <- 7:15

rowIndex <- 18:23

dat <- read.xlsx("./quiz2.xlsx", sheetIndex=1, colIndex=colIndex, rowIndex=rowIndex)

sum(dat$Zip*dat$Ext,na.rm=T)
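To see what the final expression computes, here is a small sketch on invented numbers. The data frame below is a hypothetical stand-in for the spreadsheet range, not the real data, so the total it prints is not the quiz answer.

```r
# Hypothetical stand-in for the spreadsheet range; the values are invented.
dat <- data.frame(
  Zip = c(74136, 30329, NA, 80203),
  Ext = c(0, 1, 5, NA)
)

# Zip * Ext multiplies row by row; any row containing an NA gives NA.
products <- dat$Zip * dat$Ext
print(products)

# na.rm = TRUE skips the NA rows instead of returning NA for the whole sum.
total <- sum(products, na.rm = TRUE)
print(total)
```

Without `na.rm = TRUE`, a single missing Zip or Ext would make the whole sum NA.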

Question 4

Read the XML data on Baltimore restaurants from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml

How many restaurants have zipcode 21231?

A4.

install.packages("XML")

library(XML)

fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"

download.file(fileUrl, destfile="./quiz4.xml")

doc <- xmlTreeParse("./quiz4.xml", useInternalNodes=TRUE)  # internal nodes are required for XPath queries

rootNode <- xmlRoot(doc)

xmlName(rootNode)

zipcode <- xpathSApply(rootNode, "//zipcode", xmlValue)

sum(zipcode == "21231")  # count the restaurants with zipcode 21231
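The XPath extraction can be tried without downloading anything. The sketch below parses an invented XML string with the same XML package; the element layout and the names and zipcodes in it are assumptions made for illustration, not the real feed.

```r
library(XML)

# Hypothetical miniature of the restaurant feed; names and zipcodes are invented.
xml_text <- "<response><row>
  <row><name>A</name><zipcode>21231</zipcode></row>
  <row><name>B</name><zipcode>21201</zipcode></row>
  <row><name>C</name><zipcode>21231</zipcode></row>
</row></response>"

# asText = TRUE parses the string itself rather than treating it as a file name.
doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)
zipcode <- xpathSApply(xmlRoot(doc), "//zipcode", xmlValue)

# Option 1: extract every zipcode, then count the matches in R.
n_match <- sum(zipcode == "21231")

# Option 2: let XPath do the filtering and count the returned nodes.
n_nodes <- length(getNodeSet(doc, "//zipcode[text()='21231']"))

print(c(n_match, n_nodes))
```

Both options should agree; the first keeps all zipcodes around for further tabulation, while the second pushes the filter into the query.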

Question 5

With the data loaded into an R object named DT, which of the following is the fastest way to calculate the average value of the variable pwgtp15 broken down by sex using the data.table package?

A5.

fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"

download.file(fileUrl, destfile="./quiz5.csv")

DT <- read.table("./quiz5.csv", sep=",", header=TRUE)

head(DT)

sapply(split(DT$pwgtp15,DT$SEX),mean)
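Since the question mentions the data.table package, here is a hedged sketch of the grouped mean written in data.table syntax next to the base R call used above. It assumes data.table is installed and uses random stand-in data rather than the survey file, so it only shows that the two approaches return the same group means.

```r
library(data.table)

# Hypothetical stand-in for the survey file; the values are random.
set.seed(1)
DT <- data.table(
  SEX = sample(1:2, 1000, replace = TRUE),
  pwgtp15 = rnorm(1000, mean = 100, sd = 20)
)

# Base R approach from the answer above: split by SEX, then average each piece.
base_means <- sapply(split(DT$pwgtp15, DT$SEX), mean)

# data.table grouping: one mean per SEX value, sorted by SEX for comparison.
dt_means <- DT[, .(avg = mean(pwgtp15)), by = SEX][order(SEX)]

print(base_means)
print(dt_means)
```

Wrapping either expression in `system.time()` and repeating it many times is one way to compare speed on the real file.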


Tuesday, September 9, 2014
