Finding Centroid point from shape file and labelling map
Finding Centroid point from shape file and map data labelling
In this test I try to find centroid point in shp file. In additionally I will show how to use this centroid point information when to labelling area name into map.
R Library PBSmapping offers wide range of functions to do that job.
Needed Library
require(rgdal)
## Loading required package: rgdal
## Loading required package: sp
## Warning: package 'sp' was built under R version 2.15.1
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.9.0, released 2011/12/29 Path to GDAL shared
## files: C:/Program Files/R/R-2.14.2/library/rgdal/gdal Loaded PROJ.4
## runtime: Rel. 4.7.1, 23 September 2009, [PJ_VERSION: 470] Path to PROJ.4
## shared files: C:/Program Files/R/R-2.14.2/library/rgdal/proj
require(PBSmapping)
## Loading required package: PBSmapping
## ----------------------------------------------------------- PBS Mapping
## 2.62.34 -- Copyright (C) 2003-2012 Fisheries and Oceans Canada
##
## PBS Mapping comes with ABSOLUTELY NO WARRANTY; for details see the file
## COPYING. This is free software, and you are welcome to redistribute it
## under certain conditions, as outlined in the above file.
##
## A complete user guide 'PBSmapping-UG.pdf' is located at C:/Program
## Files/R/R-2.14.2/library/PBSmapping/doc/PBSmapping-UG.pdf
##
## To see demos, type '.PBSfigs()'.
##
## Packaged on 2012-03-01 Pacific Biological Station, Nanaimo
## -----------------------------------------------------------
require(maptools)
## Loading required package: maptools
## Loading required package: foreign
## Loading required package: lattice
## Checking rgeos availability: TRUE
Importing Shapefile
FIN_adm2.pb <- importShapefile("FIN_adm2")
ImportShapefile reads the .prj file if it exists, but it
does not adopt the proj4 format
proj.abbr <- attr(FIN_adm2.pb, "projection")
# abbreviated projection info
## ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 NAME_2 NL_NAME_2
## 1 78 FIN Finland 1 Eastern Finland 1 North Karelia <NA>
## 2 78 FIN Finland 1 Eastern Finland 2 North Savonia <NA>
## 3 78 FIN Finland 1 Eastern Finland 3 Päijänne Tavastia <NA>
## 4 78 FIN Finland 1 Eastern Finland 4 Southern Savonia <NA>
## 5 78 FIN Finland 2 Lapland 5 Lapland <NA>
## 6 78 FIN Finland 3 Oulu 6 Kainuu <NA>
## VARNAME_2 TYPE_2 ENGTYPE_2
## 1 Pohjois-Karjala|Norra Karelen Maakunta|landskap Region
## 2 Pohjois-Savo|Norra Savolax Maakunta|landskap Region
## 3 Päijät-Häme|Päijänne Tavastland Maakunta|landskap Region
## 4 Etelä-Savo|Södra Savolax Maakunta|landskap Region
## 5 Lappi|Lappland Maakunta|landskap Region
## 6 Kainuu|Kajanaland Maakunta|landskap Region
merging shape files area name and centroid points (x and y coordinates) labelData as data frame
Within couple of days I have been testing DEA modelling (Data Envelopment Analysis) with different R-package. Finally,…. I have found such a comprehensive way to calculate Malmquist indices.
Chart 1
My primary purpose is to show how to use nonparametric methods for measuring efficiency and productivity by using R-programs nonparaeff -package.
At this same time I will show you how to present Malmquist indices within googleVis -world map and finally I will introduse how to make forecast chart for productivity (Chart 1), effectiveness and technical effectiveness indices. In this work I used R-program forecast -library.
First of all I would like to thank Author Dong-hyun Oh for nice work with this nonparaeff-package.
In this working paper I will use example of faremalm2. Like nonpraeff -package documentation we calculate Malmquist productivity growth index of OECD countries
(productivity, technical efficiency and efficiency). As data source we have used Penn World Table (like original sources) with following version:
OECD Timeseries 1980-1990 version pwt5.6
OECD Timeseries 1990-2009 version pwt7.0
In pwt7.0 I cannot find capital stock data, so I downloaded it from here (http://www.ifw-kiel.de/forschung/datenbanken/netcap)
Productivity calculation 1980-1990 As output variables is used Total GDP of a country. This variables is calculated using GDP per capita (rgdpl)
and amount of total population (pop). As input variables is used Total labor force (gdp/rgdpwok) and Total capital stock (kapw * labor)
In second productivity computing calculation 1990-2009 I only use one input and one output variables (labor and gdp). This is because there is no
capital stock data available in pwt7.0 (please correct this If I missed something). I will re-calculate this period asap when I get capital data from all countries.
In third productivity computing 1992-2002 I use pwt7.0 and get capital stock data from different OECD datasource
(http://www.ifw-kiel.de/forschung/datenbanken/netcap). As input I used labor and capital variables and as output variables gdp.
When use faremalm2 function you will get the following outputs:
pc.c –> Productivity growth for each country
pc.y –> A trend of productivity growth of SELECTED countries (in this work we use OECD)
tc.c –> Technical efficiency for each country
tc.y –> A trend of technical efficiency growth of SELECTED countries
ec.c –> Efficiency change for each country
ec.y –> A trend of efficiency change of SELECTED countries
Useage of faremalm2
faremalm2(dat = NULL, noutput = 1, id = “id”, year = “year”)
dat –> The format of this data frame is data.frame(id, year, outputs, inputs).
noutput –> number of outputs
id –> column name for DMU:s
year –> column name for the time index
## Malmquist productivity growth index of OECD countries
#First install data from OECD
install.packages("pwt") # (installing a package can take a couple of minutes)
#Setting the library into use
library(pwt) ## Use Penn World Table
library(nonparaeff) # DEA modelling
More information about pwt data you can find there (http://pwt.econ.upenn.edu/php_site/pwt_index.php)
#my.dat <- pwt5.6 #used in 1980-1990
#my.dat <- pwt6.3 #
my.dat <- pwt7.0 #used in 1990-2009 and 1992-2002
#head(my.dat)
summary(my.dat)
#my.dat$country
#capital stock data missing so we get it from there
# http://www.ifw-kiel.de/forschung/datenbanken/netcap
#picking up OECD countries
my.oecd.ctry <- c("AUS", "AUT", "BEL", "CAN", "CHE", "DNK", "ESP", "FIN", "FRA", "GBR", "GER", "GRC", "IRL", "ISL", "ITA", "JPN", "KOR", "LUX", "MEX", "NLD", "NOR", "NZL", "PRT", "SWE", "TUR", "USA", "DEU")
#NOTE WPCODE USED IN OLDER DATASET
#adding country code
#my.dat <- my.dat[my.dat$wbcode %in% my.oecd.ctry,] #use this with pwt5.6
my.dat <- my.dat[my.dat$isocode %in% my.oecd.ctry,] #use this with pwt7.0
summary(my.dat)
Note! I encounter problem (NA or other reason) thats why I do this out of box. my.dat is transformed into data.frame and after
that modified with Excel. This is only used in timeseries 1992-2002 productivity calculation.
#INSERTING some variables to my.dat (ie. INPUT and OUTPUT -variables calculation)
my.dat$rgdpl <- as.numeric(my.dat$rgdpl) ## GDP per capita
my.dat$pop <- as.numeric(my.dat$pop) ## total population (1000)
my.dat$rgdpwok <- as.numeric(my.dat$rgdpwok) ## GDP per labor
my.dat$gdp <- my.dat$rgdpl * my.dat$pop ## Total GDP of a country
my.dat$labor <- with(my.dat, gdp/rgdpwok) ## Total labor force
my.dat$capital <- with(my.dat, kapw * labor) ## Total capital stock MISSING pwt7 note: this is used in pwt5.6
#my.dat$capital <- as.numeric(my.dat$capital)
## Total capital stock ADDED FROM DIFF. SOURCE now used in time
series 1993-2002
my.dat$kapw <- as.numeric(my.dat$kapw) ## Capital stock per labor MISSING pwt7 note: this is used in pwt5.6
#my.dat$kapw <- with(my.dat, my.dat$capital/my.dat$labor) ##
Capital stock per labor now used in time series 1993-2002
#variable used in 1990-2009 Malmquist productivity calculation
#1980-1990
#1990-2009
#1992-2002
#these variables used in timeseries 1980-1990 and 1992-2002
#oecd <- my.dat[, c("country", "wbcode", "year", "gdp", "labor", "capital")] #used in time series 1980-1990
oecd <- my.dat[, c("country", "isocode", "year", "gdp", "labor")] # 1990-2009
#oecd <- my.dat[, c("country", "isocode", "year", "gdp", "labor", "capital")] #used in time series 1992-2002
#Now calcuatel productivity (pc), effiency (ec) and technical efficiency (tc)
#huom. tämä on tehty eri aikasarjoille ja hyvä niin koska mahdollistaa eri aikajaksojen tarkastelun
library(nonparaeff) # DEA modelling
re.oecd <- faremalm2(dat = oecd, noutput = 1, id = "isocode", year = "year")
#re.oecd <- faremalm2(dat = oecd, noutput = 1, id = "wbcode", year = "year")
summary(re.oecd)
######################################################
#ISOCODE USED IN #1990-2009 and 1992-2002
#note:
## productivity growth for each country
pc.c <- tapply(re.oecd$pc, re.oecd$isocode, geometric.mean)
#pc.c <- tapply(re.oecd$pc, re.oecd$wbcode, geometric.mean)
## a trend of productivity growth of SELECTED countries
pc.y <- tapply(re.oecd$pc, re.oecd$year, geometric.mean)
## technical efficiency for each country
tc.c <- tapply(re.oecd$tc, re.oecd$isocode, geometric.mean)
#tc.c <- tapply(re.oecd$tc, re.oecd$wbcode, geometric.mean)
## a trend of technical efficiency growth of SELECTED countries
tc.y <- tapply(re.oecd$tc, re.oecd$year, geometric.mean)
## efficiency change for each country
ec.c <- tapply(re.oecd$ec, re.oecd$isocode, geometric.mean)
#ec.c <- tapply(re.oecd$ec, re.oecd$wbcode, geometric.mean)
## a trend of efficiency change of SELECTED countries
ec.y <- tapply(re.oecd$ec, re.oecd$year, geometric.mean)
######################################################
pc.c
#1980-1990, 1990-2009
#country -variable(we use it in google vis plotting)
## productivity growth for each country
pc.c <- tapply(re.oecd$pc, re.oecd$country, geometric.mean)
## technical efficiency for each country
tc.c <- tapply(re.oecd$tc, re.oecd$country, geometric.mean)
## efficiency change for each country
ec.c <- tapply(re.oecd$ec, re.oecd$country, geometric.mean)
#missing time series 1992-2002 TURKEY, KOREA, LUXEMBOURG AND MEKSIKO capital stock data
#VUOSITTAINEN startyear-endyear KESKIARVO TEHOKKUUSKERTOIMESSA (MALMQUIST - FARREL)
#ec: efficiency change #TEHOKKUUSMUUTOS
ec.c_df <- as.data.frame(ec.c)
summary(ec.c_df)
ec.c_df
#tc: technical change #TEKNISEN TEHOKKUUDEN MUUTOS
tc.c_df <- as.data.frame(tc.c)
#pc: productivity change #TUOTTAVUUDEN MUUTOS
pc.c_df <- as.data.frame(pc.c)
#replace some country name with new one…
ec.c_df3 <- gsub(“United States of America”, “United States”, ec.c_df2$Country)
ec.c_df3 <- gsub(“Germany, West”, “Germany”, ec.c_df3)
ec.c_df3 <- gsub(“Korea, Republic”, “South Korea”, ec.c_df3)
#as a dataframe MUUNNETAAN DATA FRAMEKSI
ec.c_df3 <- as.data.frame(ec.c_df3)
tc.c_df3 <- as.data.frame(tc.c_df3)
pc.c_df3 <- as.data.frame(pc.c_df3)
#check total number of rows KATSOTAAN RIVINUMEROINNIT MERGELLE
summary(ec.c_df2)
nrow(ec.c_df2)
nrow(ec.c_df3)
summary(tc.c_df2)
nrow(tc.c_df2)
nrow(tc.c_df3)
summary(pc.c_df2)
nrow(pc.c_df2)
nrow(pc.c_df3)
#used in time series
#1980-1990
#1990-2009
#giving row id ...NUMEROIDDAAN ID MERGEÄ VARTEN
ec.c_df2$id1=c(1:26)
ec.c_df3$id1=c(1:26)
tc.c_df2$id1=c(1:26)
tc.c_df3$id1=c(1:26)
pc.c_df2$id1=c(1:26)
pc.c_df3$id1=c(1:26)
#...and for time series
#1991-2002
#giving row id ...NUMEROIDDAAN ID MERGEÄ VARTEN
ec.c_df2$id1=c(1:22)
ec.c_df3$id1=c(1:22)
tc.c_df2$id1=c(1:22)
tc.c_df3$id1=c(1:22)
pc.c_df2$id1=c(1:22)
pc.c_df3$id1=c(1:22)
#merge data table and renamed country by id .....
ec.c_df4 <-merge(ec.c_df2, ec.c_df3, by.x="id1", by.y="id1", all = TRUE)
tc.c_df4 <-merge(tc.c_df2, tc.c_df3, by.x="id1", by.y="id1", all = TRUE)
pc.c_df4 <-merge(pc.c_df2, pc.c_df3, by.x="id1", by.y="id1", all = TRUE)
#check the data
head(ec.c_df4)
head(tc.c_df4)
head(pc.c_df4)
summary(ec.c_df4)
#naming trick
names(ec.c_df5)<- c("Country", "Effiency change in OECD 1980-1990")
names(tc.c_df5)<- c("Country", "Technical Effiency change in OECD 1980-1990")
names(pc.c_df5)<- c("Country", "Productivity change in OECD 1980-1990")
#naming time series 1991-2009
names(ec.c_df5)<- c("Country", "Effiency change in OECD 1991-2009")
names(tc.c_df5)<- c("Country", "Technical Effiency change in OECD 1991-2009")
names(pc.c_df5)<- c("Country", "Productivity change in OECD 1991-2009")
#1991-2002
names(ec.c_df5)<- c("Country", "Effiency change in OECD 1991-2002")
names(tc.c_df5)<- c("Country", "Technical Effiency change in OECD 1991-2002")
names(pc.c_df5)<- c("Country", "Productivity change in OECD 1991-2002")
head(ec.c_df5)
head(tc.c_df5)
head(pc.c_df5)
#set library
library(googleVis)
#1980-1990
#me/230712
#1
#KMEAN VALUE OF MALMQUIST INDICES 1980-1990 Efficency change
Geo1=gvisGeoMap(ec.c_df5, locationvar="Country", numvar="Effiency change in OECD 1980-1990", options=list (height=450, width=800, dataMode='regions'))
#plot(Geo1)
cat(Geo1$html$chart, file=".../oecd_mean_eff_1980_1990_malmquist2.html")
#2
#KESKIARVO MEAN VALUE OF MALMQUIST INDICES 1980-1990 Efficency change
Geo2=gvisGeoMap(tc.c_df5, locationvar="Country", numvar="Technical Effiency change in OECD 1980-1990", options=list(height=450, width=800, dataMode='regions'))
#plot(Geo2)
cat(Geo2$html$chart, file=".../oecd_mean_tech_eff_1980_1990_malmquist2.html")
#3
#KESKIARVO MEAN VALUE OF MALMQUIST INDICES 1980-1990 Efficency change
Geo3=gvisGeoMap(pc.c_df5, locationvar="Country", numvar="Productivity change in OECD 1980-1990", options=list(height=450, width=800, dataMode='regions'))
#plot(Geo3)
cat(Geo3$html$chart, file=".../oecd_mean_prod_1980_1990_malmquist2.html")
#1991-2009 (capital var dropped)
#me/240712
#1
#KESKIARVO MEAN VALUE OF MALMQUIST INDICES
Geo1=gvisGeoMap(ec.c_df5, locationvar="Country", numvar="Effiency change in OECD 1991-2009", options=list (height=450, width=800, dataMode='regions'))
plot(Geo1)
cat(Geo1$html$chart, file=".../oecd_mean_eff_1991_2009_malmquist2.html")
#2
#VALUE OF MALMQUIST INDICES
Geo2=gvisGeoMap(tc.c_df5, locationvar="Country", numvar="Technical Effiency change in OECD 1991-2009", options=list(height=450, width=800, dataMode='regions'))
plot(Geo2)
cat(Geo2$html$chart, file=".../oecd_mean_tech_eff_1991_2009_malmquist2.html")
#3
#KESKIARVO MEAN VALUE OF MALMQUIST INDICES
Geo3=gvisGeoMap(pc.c_df5, locationvar="Country", numvar="Productivity change in OECD 1991-2009", options=list(height=450, width=800, dataMode='regions'))
plot(Geo3)
cat(Geo3$html$chart, file=".../oecd_mean_prod_1991_2009_malmquist2.html")
#1991-2002 (capital as input)
#me/240712
#1
#KESKIARVO MEAN VALUE OF MALMQUIST INDICES
Geo1=gvisGeoMap(ec.c_df5, locationvar="Country", numvar="Effiency change in OECD 1991-2002", options=list (height=450, width=800, dataMode='regions'))
plot(Geo1)
cat(Geo1$html$chart, file=".../oecd_mean_eff_1991_2002_malmquist2.html")
#2
#KESKIARVO MEAN VALUE OF MALMQUIST INDICES
Geo2=gvisGeoMap(tc.c_df5, locationvar="Country", numvar="Technical Effiency change in OECD 1991-2002", options=list(height=450, width=800, dataMode='regions'))
plot(Geo2)
cat(Geo2$html$chart, file=".../oecd_mean_tech_eff_1991_2002_malmquist2.html")
#3
#KESKIARVO MEAN VALUE OF MALMQUIST INDICES
Geo3=gvisGeoMap(pc.c_df5, locationvar="Country", numvar="Productivity change in OECD 1991-2002", options=list(height=450, width=800, dataMode='regions'))
plot(Geo3)
cat(Geo3$html$chart, file=".../oecd_mean_prod_1991_2002_malmquist2.html")
Now we plot productivity, effectiveness and technical effectiveness mean values historical trend and forecast by using
The cubic smoothing spline model. It is equivalent to an ARIMA(0,2,2) model.
What is the most informative way to present statistical dataset or actually time series data? Most convient way is to animate this data but how to do that with fastest and easiest way? Now I want to present one way to do that. Rob J Hyndman presented this data and animation technology in his blog on October 2010. Professor Hyndman presented French mortality data and how to utilize demography R-library.
In this example I will introduce how to present mortality, fertility and migration data from Finland. As data source I will use data from here: http://www.mortality.org/
To get data you have to register as a user in The Human Mortality Database (HMD).
Data source: Human Mortality Database. University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Available at www.mortality.org or www.humanmortality.de (data downloaded on 17.07.2012).
Finnish death rate (mortality 1x1) data table name is Mx_1x1.txt
and address is http://www.mortality.org/hmd/FIN/STATS/Mx_1x1.txt
Of course you could also use hmd function to get data from HMD (included in demographic library) ie.
norway <- hmd.mx("NOR", "user", "pass", "Norway")
See more informaton about this from there.
Any way I downloaded Finnish data (23 Mb) as bulk transfer after registering above mentioned MHD service. In this case I do the following things:
nyears <- length(fin_mortality$year) for(i in 1:nyears)
{ pdf(paste("fimale",i,".pdf",sep=""),height=4,width=6.5)
x <- fin_mortality
if(i<nyears) x$
rate$male[,(i+1):nyears] <- NA
plot(x,series="male",ylim=c(-9.5,1.5),
main=paste("Finland: male mortality (",fin_mortality$year[1]-1+i,")",
sep=""))
if(i>1)
x$rate$male[,1:(i-1)] <- NA
lines(x,series='male',lwd=2,col=1) dev.off() }
#NEXT NOW WE WILL DO SEVERAL PDF FILES….AGAIN. 🙂
#FEMALE
nyears <- length(fin_mortality$year) for(i in 1:nyears)
{ pdf(paste("fifemale",i,".pdf",sep=""),height=4,width=6.5)
x <- fin_mortality if(i<nyears)
x$rate$female[,(i+1):nyears] <- NA
plot(x,series="female",ylim=c(-9.5,1.5),
main=paste("Finland: female mortality (",fin_mortality$year[1]-1+i,")",
sep="")) if(i>1) x$rate$female[,1:(i-1)] <- NA
lines(x,series='female',lwd=2,col=1) dev.off() }
#OK NOW LETS DO SOW TRICKS WITH LATEX
STEP 3
Note! run this latex file at the same working directory as previous R step.
In this example I use TexMakerX -program’s quick build –> and after that you will find animated pdf file from your working director.
%Save this in the same working dir Note: replace fifemale text with fimale to
produce male mortality animation.
\documentclass{article}
\usepackage{animate}
\usepackage{graphicx}
\begin{document}
\begin{center}
\centerline{\animategraphics[controls,buttonsize=0.3cm,width=12.5cm]
{6}{"fifemale"}{1}{132}}
\end{center}
\end{document}
How to make sense with line colours? Please read this article from Hyndman and Shang. http://robjhyndman.com/papers/Rainbow5.pdf “… the colors of the curves follow the order of a rainbow, with the oldest data in red and the most recent data in violet….”
STEP 5
For Web publishing it is better to do gif animation: R and ImageMagick Note: install also these package if you encounter problems Microsoft Visual C++ 2010 Redistributable Package (x86) and Microsoft Visual C++ 2010 Redistributable Package (x64) package But before do that try to do following: Note for some reason (I do not know why) I have to set my path variables like this: C:\Program Files\ImageMagick; So, I create ImageMagick -folder and copy convert.exe from folder C:\Program Files (x86)\ImageMagick-6.7.8-Q16 …and everything go fine now.
#In working director create a new folder and set the working directory to the new folder.
dir.create(“examples”)
setwd(“examples”)
library(animation)
library(demography)
#let’s check how many years we have nyears length(fin_mortality$year)
#now we will do as many frames as years we have
#MALE
makeplot <- function(){ for(i in 1:nyears) { x <- fin_mortality if(i<nyears) x$rate$male[,(i+1):nyears] <- NA plot(x,series="male",ylim=c(-9.5,1.5), main=paste("Finland: male mortality (",fin_mortality$year[1]-1+i,")",sep="")) if(i>1) x$rate$male[,1:(i-1)] <- NA
lines(x,series='male',lwd=2,col=1) } }
oopt = ani.options(interval = 0, nmax = nyears) saveMovie(makeplot(),interval = 0.1, width = 580, height = 400) ani.options(oopt)
As energy engineer I have sometimes need to visualize energy balance data using Sankey diagram. In this example I will introduce how to do that with R.
As a data source I will use Central Finlands energy balance 2010 data.
To produce Sankey with R we use Function SankeyR (you will find same source as I use in this example from here)
We run first SankeyR function. Copy following code into your RGui.
SankeyR <- function(inputs, losses, unit, labels, format="plot"){
########################
# SankeyR version 1.01 (updated August 10, 2010)
# is a function for creating Sankey Diagrams in R.
# See http://www.sankey-diagrams.com for excellent examples of Sankey Diagrams.
#
# OPTIONS:
# 'inputs' is a vector of input values
# 'losses' is a vector of loss values
# 'unit' is a string of the unit
# 'labels' is a vector of the labels for inputs and losses
# 'format' is the type of plotting:
# The default is "plot," which produces a plot in the R graphics device.
# Current alternate options include "pdf" and "bmp," which produce
# those file types under the name "Sankey.xxx" in the current directory.
#
# Inputs do not need to equal losses. Any difference will be displayed
# as a discrepancy in the height of the left and right sides of the diagram.
# This capability enables the developer to examine imbalances in flows.
# Percentages are a proportion of the inputs (so, the outputs might not equal 100%).
#
# EXAMPLE:
# Try using these values for the global carbon cycle, from Schlesinger (1997):
# inputs = c(120,92)
# losses = c(45,75,90,1,6)
# unit = "GtC/yr"
# labels = c("GPP","Ocean assimilation","Ra","Rh","Ocean loss","LULCC","Fossil fuel emissions")
# SankeyR(inputs,losses,unit,labels)
#
# UPDATES:
# 8/10/10 - Added drawing for only one input.
#
# CREDITS:
# Created for R by Aaron BERDANIER
# send questions or comments to
# aaron.berdanier@gmail.com
#
# SankeyR is based strongly on drawSankey for Matlab,
# from James SPELLING, KTH-EGI-EKV (spelling@kth.se)
# http://leniwiki.epfl.ch/index.php/DrawSankey
#
# Distributed under Creative Commons Attribution Non-Commercial.
# Licensees may copy, distribute, display, and perform the work and make
# derivative works based on it only for noncommercial purposes.
#
# Aaron would appreciate notification if you modify or improve this function.
########################
# Save new position of arrow top
limTop = limTop - frLosses[i]
# Advance to new separation point
newPos = posTop + rE + 0.01
# Draw top line to new separation point
lines(c(posTop,newPos), c(limTop,limTop), lwd=w)
# Save new advancement point
posTop = newPos
# SEPARATION LINES - not implemented yet
}
###
# Push the arrow forwards a little after all side-arrows drawn
newPos = max(posTop, posBot) + max(0.05*limTop,0.05)
# Draw lines to this new position
lines(c(posTop,newPos), c(limTop,limTop),lwd=w)
lines(c(posMid,newPos), c(limTop-frLosses[length(frLosses)],limTop-frLosses[length(frLosses)]),lwd=w)
# Draw final arrowhead for the output
lines(c(newPos,newPos,newPos+max(0.04,0.8*(frLosses[length(frLosses)])),newPos,newPos),
c(limTop,limTop+max(0.015,abs(frLosses[length
#saving into file jpg. note: picture quality not so good
jpeg(".../energybalance.jpg")
SankeyR(inputs,losses,unit,labels)
dev.off()
#saving into file note: picture quality not so good
png(".../energybalance.png")
SankeyR(inputs,losses,unit,labels)
dev.off()
#saving into file note: note: picture quality not so good
bmp(".../energybalance.bmp")
SankeyR(inputs,losses,unit,labels)
dev.off()
#saving into file note: picture quality excellent
pdf(".../energybalance.pdf")
SankeyR(inputs,losses,unit,labels)
dev.off()
Result is ok, but just now I have no idea how to produce “sub-branches” of losses. Anyway very nice routine to produce Sankey….And here you will find Sankey diagram in pdf format energybalance.
In this Exercise I am testing Eurostat bulk data source and plot these data into Heatmap. Let’s try with this data:
“Harmonised unemployment rates (%) – monthly data (ei_lmhr_m)”
#I will automate this data downloading and extracting, but just now this is semiautomatic
# create download directory and set it
.exdir = 'c:/data/tmp2' # put there your own data folder
dir.create(.exdir)
.file = file.path(.exdir, 'ei_lmhr_m.tsv.gz') # change this
# download file
url = 'http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data
%2Fei_lmhr_m.tsv.gz'
download.file(url, .file)
# untar it (Note: I do not know why I got error message: Error in getOct(block, 100, 8) : invalid octal digit)
untar(.file, compressed = 'gzip', exdir = path.expand(.exdir))
# Argh...something going wrong with this step, so I have to manipulate just downloaded data. First I remove comma
# from very first variables and etc... I always use Notetab light as a Text editor in this kind of task.
# Reading file into R. Please refer here your own data folder...
input <- read.table("c:/data/tmp2/ei_lmhr_m.tsv", header=TRUE, sep="\t", na.strings=":", dec=".", strip.white=TRUE)
#just checking
head(input)
# LM-UN-T-TOT = Unemployment rate according to ILO definition - Total rate
# NSA = not seasonally adjusted
input<- input[which(input$indic=="LM-UN-T-TOT"),]
input<- input[which(input$s_adj=="NSA"),]
#giving appropriate names in to the heatmap (without this manouver there will be only row id)
row.names(input) <- input$geo.time
#just checking
head(input)
#Column selection. We will get data between time period 05/2008 - 05/2012
input2 <- input[,5:53]
# data frame must change into data matrix to produce heatmap.
input_matrix <- data.matrix(input2)
#heatmap is almost here
input_heatmap <- heatmap(input_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column", margins=c
(5,10), xlab = "Harmonised unemployment rates (%) - monthly data", ylab= "Country or Area")
#saving heatmap into folder
jpeg("G:/data/home/2012/marko/blogi_rbloggerqvist/data/eurostat/Harmonised unemployment rates percent
monthly data.jpg")
input_heatmap <- heatmap(input_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column", margins=c
(5,10), xlab = "Harmonised unemployment rates (%) - monthly data", ylab= "Country or Area")
dev.off()
Have fun,
Marko