R and Eurostat bulk data
In this exercise I test the Eurostat bulk data source and plot the data as a heatmap. Let’s try with this dataset:
“Harmonised unemployment rates (%) – monthly data (ei_lmhr_m)”
# The Eurostat bulk data source is available here:
http://epp.eurostat.ec.europa.eu/portal/page/portal/statistics/bulk_download
# I plan to automate the downloading and extracting; for now the process is semi-automatic
# create download directory and set it
.exdir = 'c:/data/tmp2' # put there your own data folder
dir.create(.exdir)
.file = file.path(.exdir, 'ei_lmhr_m.tsv.gz') # change this
# download file
url = 'http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data%2Fei_lmhr_m.tsv.gz'
download.file(url, .file)
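# decompress it (Note: untar() fails here with "Error in getOct(block, 100, 8) : invalid octal digit",
# most likely because the download is a plain gzip-compressed file and not a tar archive)
con <- gzfile(.file, 'rt')
writeLines(readLines(con), file.path(.exdir, 'ei_lmhr_m.tsv'))
close(con)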
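As an aside, read.table() also accepts a connection, so a gzip-compressed file can be read without writing an uncompressed copy first. The sketch below is only an illustration: the miniature file and its two month columns are made up, and the real Eurostat file still has the comma-packed first column that is dealt with further down.

```r
# Hypothetical miniature of a bulk file, written to a temporary .gz for illustration
tmp <- tempfile(fileext = ".tsv.gz")
con <- gzfile(tmp, "wt")
writeLines(c("geo\t2012M05\t2012M04", "FI\t8.5\t8.3", "SE\t:\t7.9"), con)
close(con)

# Open the compressed file as a text connection and let read.table() parse it
con <- gzfile(tmp, "rt")
mini <- read.table(con, header = TRUE, sep = "\t", na.strings = ":", strip.white = TRUE)
close(con)

# ":" has become NA and the month columns are numeric
print(mini)
```

Note that read.table() prefixes column names that start with a digit with "X", so 2012M05 turns into X2012M05.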
# Argh... the data still needs manual editing before it can be read in. First I remove the commas
# from the very first variables, and so on. I always use NoteTab Light as a text editor for this kind of task.
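The manual comma editing could probably be done in R as well. The following is a minimal sketch under assumptions: the toy rows are made up, and I assume the first field bundles the keys in the order s_adj, indic, geo.time (the column names used later in this post). Check the header of the real file before relying on that order.

```r
# Hypothetical raw rows: the first field bundles three keys separated by commas
raw <- data.frame(key = c("NSA,LM-UN-T-TOT,FI", "SA,LM-UN-T-TOT,SE"),
                  X2012M05 = c(8.5, 7.9),
                  stringsAsFactors = FALSE)

# Split each key on the commas and bind the pieces into a three-column matrix
parts <- do.call(rbind, strsplit(raw$key, ",", fixed = TRUE))
colnames(parts) <- c("s_adj", "indic", "geo.time")  # assumed order of the keys

# Replace the bundled column with the three separate ones
tidy <- cbind(as.data.frame(parts, stringsAsFactors = FALSE), raw[-1])
print(tidy)
```

After this step the filters on indic and s_adj can be applied to tidy in the same way as to the hand-edited file.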
# Read the file into R. Point this at your own data folder.
input <- read.table("c:/data/tmp2/ei_lmhr_m.tsv", header=TRUE, sep="\t", na.strings=":", dec=".", strip.white=TRUE)
#just checking
head(input)
# LM-UN-T-TOT = Unemployment rate according to ILO definition - Total rate
# NSA = not seasonally adjusted
input <- input[which(input$indic == "LM-UN-T-TOT"), ]
input <- input[which(input$s_adj == "NSA"), ]
# give the rows appropriate names for the heatmap (without this manoeuvre only the row ids would be shown)
row.names(input) <- input$geo.time
#just checking
head(input)
# Column selection: keep the data for the period 05/2008 - 05/2012
input2 <- input[,5:53]
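Selecting by position (5:53) breaks as soon as Eurostat adds a new month to the file. A possible alternative is to select by column name. This is only a sketch: the toy data frame is made up, and the X2012M05 style of the names is an assumption about how read.table() renames the month columns.

```r
# Hypothetical data frame with one key column and four month columns
toy <- data.frame(geo.time = c("FI", "SE"),
                  X2012M06 = c(8.0, 7.5),
                  X2012M05 = c(8.5, 7.9),
                  X2008M05 = c(6.1, 5.8),
                  X2008M04 = c(6.3, 6.0))

# Pick out the columns whose names look like a year and a month
months <- grep("^X[0-9]{4}M[0-9]{2}$", names(toy), value = TRUE)

# Zero-padded names sort like dates, so a plain string comparison gives the period
keep <- months[months >= "X2008M05" & months <= "X2012M05"]
toy2 <- toy[, keep]
print(toy2)
```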
# the data frame must be converted into a numeric matrix to produce the heatmap
input_matrix <- data.matrix(input2)
#heatmap is almost here
input_heatmap <- heatmap(input_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column",
                         margins=c(5,10), xlab = "Harmonised unemployment rates (%) - monthly data",
                         ylab= "Country or Area")
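A word on scale="column": with this setting heatmap() centres and scales the values within each column, so the colours compare countries within one month and not across months. The toy matrix below is made up, and scale() is used here only to mimic that step as I understand it.

```r
# Hypothetical matrix: three areas, two months with very different levels
m <- matrix(c(5, 10, 15, 20, 40, 60), nrow = 3,
            dimnames = list(c("A", "B", "C"), c("M1", "M2")))

# Centre each column on its mean and divide by its standard deviation
z <- scale(m)
print(z)
```

Both columns end up with the same scaled values, so they would get the same colours even though the rates in M2 are four times higher. If the aim is to see the overall level change over time, scale="none" may be worth trying.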
#saving heatmap into folder
jpeg("G:/data/home/2012/marko/blogi_rbloggerqvist/data/eurostat/Harmonised unemployment rates percent monthly data.jpg")
input_heatmap <- heatmap(input_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column",
                         margins=c(5,10), xlab = "Harmonised unemployment rates (%) - monthly data",
                         ylab= "Country or Area")
dev.off()
Have fun,
Marko