R and data visualization, R and maps, R sample code with data

R and population maps in Finland 31.5.2012

June 29, 2012 by admin - No comments

This example source is almost same like this. I used at first time sorvi-package. In this example I modified little bit original and try to compare MML and GADM shape – files. As we know there is some need to updates into GADM shapefiles (as you will see). Datafiles source date is 31.05.2012 from Väestörekisterikeskus. Thanks to Louhos -people and their invaluable work within open -data project.

# library sorvi You will find there

library(sorvi)
library(rgeos)
library(rgdal)
if (!gpclibPermit()) { gpclibPermit() }

# Finland map and municipility data at gadm-format
gadm <- GetGADM(“FIN_adm”, “Kunta”)

##now we use also MML as data source (shape) because of compare shape files
# (C) MML 2011
data(MML)
sp <- MML[[“1_milj_Shape_etrs_shape”]][[“kunta1_p”]]

# vaestorekisteri population data from data source vrk.fi 31.05.2012
vrek <- GetPopulationRegister(“http://vrk.fi/default.aspx?docid=6706&site=3&id=0”)

# Attach vrk data into map object and
# set population as zero where na
gadm$asukkaita <- log10(rowSums(vrek[gadm$Kunta, c(“Miehet”, “Naiset”)]))
gadm$asukkaita[is.na(gadm$asukkaita)] <- 0

sp$asukkaita <- log10(rowSums(vrek[sp$Kunta, c(“Miehet”, “Naiset”)]))
sp$asukkaita[is.na(sp$asukkaita)] <- 0

# male and female share
gadm$miehet.osuus <- vrek[gadm$Kunta, “Miehet”]/vrek[gadm$Kunta, “Yhteensa”]
gadm$naiset.osuus <- vrek[gadm$Kunta, “Naiset”]/vrek[gadm$Kunta, “Yhteensa”]

gadm_summary_male <- summary(vrek[gadm$Kunta, “Miehet”]/vrek[gadm$Kunta, “Yhteensa”])
gadm_summary_female <- summary(vrek[gadm$Kunta, “Naiset”]/vrek[gadm$Kunta, “Yhteensa”])

sp$miehet.osuus <- vrek[sp$Kunta, “Miehet”]/vrek[sp$Kunta, “Yhteensa”]
sp$naiset.osuus <- vrek[sp$Kunta, “Naiset”]/vrek[sp$Kunta, “Yhteensa”]

sp_summary_male <- summary(vrek[sp$Kunta, “Miehet”]/vrek[sp$Kunta, “Yhteensa”])
sp_summary_female <- summary(vrek[sp$Kunta, “Naiset”]/vrek[sp$Kunta, “Yhteensa”])

#some summary stat

gadm_summary_male
gadm_summary_female
sp_summary_male
sp_summary_female

hist_gadm_male <- hist(gadm$miehet.osuus)
hist_gadm_female <- hist(gadm$naiset.osuus)
hist_sp_male <- hist(sp$miehet.osuus)
hist_sp_female <- hist(sp$naiset.osuus)

plot(density(gadm$miehet.osuus,na.rm=TRUE))
plot(density(gadm$naiset.osuus,na.rm=TRUE))
plot(density(sp$miehet.osuus,na.rm=TRUE))
plot(density(sp$naiset.osuus,na.rm=TRUE))

#saving hist figure into file …\r_maps
jpeg(“…/hist.jpg”)
layout(matrix(1:4,2,2))
hist_gadm_male <- hist(gadm$miehet.osuus)
hist_gadm_female <- hist(gadm$naiset.osuus)
hist_sp_male <- hist(sp$miehet.osuus)
hist_sp_female <- hist(sp$naiset.osuus)
dev.off()

#saving density figre into file …\r_maps (replace … as your fav. folder)
jpeg(“…/density.jpg”)
layout(matrix(1:4,2,2))
plot(density(gadm$miehet.osuus,na.rm=TRUE))
plot(density(gadm$naiset.osuus,na.rm=TRUE))
plot(density(sp$miehet.osuus,na.rm=TRUE))
plot(density(sp$naiset.osuus,na.rm=TRUE))
dev.off()

# set share 50% male/female
# where na
gadm$miehet.osuus[is.na(gadm$miehet.osuus)] <- 0.5
gadm$naiset.osuus[is.na(gadm$naiset.osuus)] <- 0.5

#same with MML

sp$miehet.osuus[is.na(sp$miehet.osuus)] <- 0.5
sp$naiset.osuus[is.na(sp$naiset.osuus)] <- 0.5

# border of interval GADM
varname1 <- “naiset.osuus”
interval1 <- max(abs(gadm[[varname1]] – 0.5))
at1 <- seq(0.5 – interval1, 0.5 + interval1, length = 100)

#border of interval MML

varname2 <- “naiset.osuus”
interval2 <- max(abs(sp[[varname2]] – 0.5))
at2 <- seq(0.5 – interval2, 0.5 + interval2, length = 100)

#PLOTTING INTO FILE

jpeg(“j:/todo/r_maps/gadm_280612.jpg”)
q1 <- PlotShape(gadm, varname1, type = “twoway”, at = at1, main = “Share of women (red) and men (blue) 31.05.2012 n=437 (GADM)”)
dev.off()

jpeg(“j:/todo/r_maps/mml_280612.jpg”)
q2 <- PlotShape(sp, varname2, type = “twoway”, at = at2, main = “Share of women (red) and men (blue) 31.05.2012 n=337 (MML)”)
dev.off()

#Which one is best?

GADM projection is not like “normal” and MML coastline is little bit odd, but I am patiently looking forward to improvement into this issue….

ME/290612

Tagged with: Finland map population maps R

R and data visualization, R and maps

R and VIX index

June 27, 2012 by admin - No comments

Using R as visualizing Volatility Index VIX

On September 22, 2003, the CBOE began disseminating price level information using revised methodology for the CBOE Volatility Index, VIX. A spreadsheet with more than 13 years of price history data using this new methodology is now available.

VIX?

In 1993, the Chicago Board Options Exchange® (CBOE®) introduced the CBOE Volatility Index®, VIX®, and it quickly became the benchmark for stock market volatility. It is widely followed and has been cited in hundreds of news articles in the Wall Street Journal, Barron’s and other leading financial publications. Since volatility often signifies financial turmoil, VIX is often referred to as the “investor fear gauge”.
Source: http://www.cboe.com/micro/vix/faq.aspx#1

So, I found this example from there. I change it little bit. In this exampele it use whole time series you will found there. Now let’s look at closer this amazing approach…
#required library require(quantmod) require(ggplot2) require(reshape2) require(plyr) require(scales)# download data source input <- read.table("http://www.cboe.com/publish/ScheduledTask/MktData/datahouse/vixcurrent.csv", header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)# creating dataframe dat<-data.frame(date=index(VIX),VIX) #look what we get head(dat)#date conversation input$date <- as.Date(input$Date, "%m/%d/%Y")## below some example do to that ## read in date/time info in format 'm/d/y h:m:s' ## dates <- c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92") ## times <- c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26") ## x <- paste(dates) ## y <- strptime(x, "%m/%d/%y")# extract year input$year<-as.numeric(as.POSIXlt(input$date)$year+1900)# and the month too OK input$month<-as.numeric(as.POSIXlt(input$date)$mon+1)# monts into right order and giving factor name input$monthf<-factor(input$month,levels=as.character(1:12),labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"),ordered=TRUE)# week days input$weekday = as.POSIXlt(input$date)$wday# week day ordering and gibing factor name (weekdays) input$weekdayf<-factor(input$weekday,levels=rev(1:7),labels=rev(c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")),ordered=TRUE)# yearmonth input$yearmonth<-as.yearmon(input$date) input$yearmonthf<-factor(input$yearmonth)# week of year by rows input$week <- as.numeric(format(input$date,"%W"))# making week to start at number 1 input<-ddply(input,.(yearmonthf),transform,monthweek=1+week-min(week))# Ok, Now we are ready to plotP<- ggplot(input, aes(monthweek, weekdayf, fill = VIX.Close)) + geom_tile(colour = "white") + facet_grid(year~monthf) + scale_fill_gradient(low="yellow", high="red") + opts(title = "Heatmap - view of future expected stock market volatility (risk)") + xlab("Week of Month") + ylab("")#show the plot P# save the picture # example - output graph to jpeg file jpeg("heatmaps.jpg") P dev.off()...and output will look like this:

That’s it,

Cheers, Marko

Tagged with: R volatility index

R and maps, R sample code with data

R and Earthquake data

June 24, 2012 by admin - No comments

Earth quakes of the last 30 days

These example come from here. I follow instruction how to plot earthquage data into world map between 24.5. – 23.6.2012. My first test is not succesfull, so I make my copy paste tricks. Final result show now “right” magnitude.

## Plot world wide earth quakes of the last 30 days with magnitude >= 4.0 library(googleVis) library(XML) ## Get earthquake data of the last 30 days ## That is ok, put for some reason I do not get right magnitude into picture. eq=readHTMLTable(readLines("http://www.iris.edu/seismon/last30.html")) eq <- eq[[2]] ## extract the eq table summary(eq) #something wrong with this last30 file #I have to copy and paste web-page data into Notepad and saved this into file earthquake.txt #eq <- read.table("j:/todo/R_geodata/earthquake.txt", header=TRUE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE) #WEB-server datafile note I change MAG into magni eq <- read.table("http://ekqvist.goeuropeinfo.com/rbloggerqvist/data/earthq/earthquake.txt", header=TRUE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)

# lat and lon must be the same variable eq$loc=paste(eq$LAT, eq$LON, sep=":") ## create a lat:long location variable

# making geomap earthq <- gvisGeoMap(eq, locationvar="loc", numvar="magni", hovervar="DATE", options=list(dataMode="markers"))

#plotting results into html file #plot(earthq) cat(earthq$html$chart, file="j:/todo/R_geodata/earthq.html")

By clicking this link you will see the result file.

Please, be patient because it takes about 30sec to open….

Greetings,

Marko

Tagged with: earth quakes googleVis R

R and gVisGeoMap

June 23, 2012 by admin - No comments

Helppoa kuin heinänteko

Here in Finland we have phrase: ” Helppoa kuin heinänteko”. Now my purpose is to indicate what is the relation between this phrase and using R library googleVis.
In this example we use UN datasets from this source.

All datasets used in this examples You could download bloggerqvist server. Of course I encourage You to find appropriate UN dataset and find data within your interest.

Example1: Coke Oven gas production 2006-2009 (TJ)

################################################################ UN data source: http://data.un.org/ Coke oven gas - production ################################################################ # setting amazing library library(googleVis) ################################################################ # download data input <- read.table("http://energy.goeuropeinfo.com/data/un_data/UNdata_Export_20120622_210218958.csv", header=TRUE, sep=";", na.strings="NA", dec=",", strip.white=TRUE)

#selecting yearly data select09<- input[which(input$Year=="2009"),] select08<- input[which(input$Year=="2008"),] select07<- input[which(input$Year=="2007"),] select06<- input[which(input$Year=="2006"),]#selecting variables Map09<- data.frame(select09$Country.or.Area, select09$Quantity) Map08<- data.frame(select08$Country.or.Area, select08$Quantity) Map07<- data.frame(select07$Country.or.Area, select07$Quantity) Map06<- data.frame(select06$Country.or.Area, select06$Quantity)#change variable name names(Map09)<- c("Country", "Coke-oven gas prod. TJ") names(Map08)<- c("Country", "Coke-oven gas prod. TJ") names(Map07)<- c("Country", "Coke-oven gas prod. TJ") names(Map06)<- c("Country", "Coke-oven gas prod. TJ")#year 2009 unmap09=gvisGeoMap(Map09, locationvar="Country", numvar="Coke-oven gas prod. TJ", options=list(height=350, dataMode='regions', chartid="Coke-oven prod 2009")) #year 2008 unmap08=gvisGeoMap(Map08, locationvar="Country", numvar="Coke-oven gas prod. TJ", options=list(height=350, dataMode='regions')) #year 2007 unmap07=gvisGeoMap(Map07, locationvar="Country", numvar="Coke-oven gas prod. TJ", options=list(height=350, dataMode='regions')) #year 2006 unmap06=gvisGeoMap(Map06, locationvar="Country", numvar="Coke-oven gas prod. TJ", options=list(height=350, dataMode='regions'))#testing that all is ok plot(unmap09) plot(unmap08) plot(unmap07) plot(unmap06)# saving just created map into html-file cat(unmap09$html$chart, file="j:/todo/UN/undata_cokeoven_prod_TJ_2009.html") cat(unmap08$html$chart, file="j:/todo/UN/undata_cokeoven_prod_TJ_2008.html") cat(unmap07$html$chart, file="j:/todo/UN/undata_cokeoven_prod_TJ_2007.html") cat(unmap06$html$chart, file="j:/todo/UN/undata_cokeoven_prod_TJ_2006.html")

Results file looks like this:
Map 2009, 2008, 2007 and 2006

Have fun,
Marko

…..oops Here Is also an another example….

Example2: Energy use (kg oil eqv per capita)

################################################################ UN data source: http://data.un.org/ energy use ################################################################ library(googleVis) input <- read.table("http://energy.goeuropeinfo.com/data/UNdata_Export_20120623_070334184_energyuse_kg_eqv_oil_per_capita.txt", header=TRUE, sep=";", na.strings="NA", dec=",", strip.white=TRUE) # select year into matrices select09<- input[which(input$Year=="2009"),] select08<- input[which(input$Year=="2008"),] select07<- input[which(input$Year=="2007"),] select06<- input[which(input$Year=="2006"),]#select area and value field Map09<- data.frame(select09$Country.or.Area, select09$Value) Map08<- data.frame(select08$Country.or.Area, select08$Value) Map07<- data.frame(select07$Country.or.Area, select07$Value) Map06<- data.frame(select06$Country.or.Area, select06$Value)names(Map09)<- c("Country", "Energy Use Eqv oil kg per capita") names(Map08)<- c("Country", "Energy Use Eqv oil kg per capita") names(Map07)<- c("Country", "Energy Use Eqv oil kg per capita") names(Map06)<- c("Country", "Energy Use Eqv oil kg per capita")unmap09=gvisGeoMap(Map09, locationvar="Country", numvar="Energy Use Eqv oil kg per capita", options=list(height=350, dataMode='regions')) unmap08=gvisGeoMap(Map08, locationvar="Country", numvar="Energy Use Eqv oil kg per capita", options=list(height=350, dataMode='regions')) unmap07=gvisGeoMap(Map07, locationvar="Country", numvar="Energy Use Eqv oil kg per capita", options=list(height=350, dataMode='regions')) unmap06=gvisGeoMap(Map06, locationvar="Country", numvar="Energy Use Eqv oil kg per capita", options=list(height=350, dataMode='regions', chartid="06"),chartid="06")

plot(unmap09)
plot(unmap08)
plot(unmap07)
plot(unmap06)

cat(unmap09$html$chart, file="j:/todo/UN/undata_energyuse_eqvkgpercapita_2009.html") cat(unmap08$html$chart, file="j:/todo/UN/undata_energyuse_eqvkgpercapita_2008.html") cat(unmap07$html$chart, file="j:/todo/UN/undata_energyuse_eqvkgpercapita_2007.html") cat(unmap06$html$chart, file="j:/todo/UN/undata_energyuse_eqvkgpercapita_2006.html")

that’s it…

Tagged with: Coke Oven gas production energy use gVisGeoMap R

R and data visualization, R sample code with data

R and GoogleVision

June 21, 2012 by admin - No comments

Inspired with TED and Hans Rosling

I found this idea from TED.com where Hans Rosling gave an inspiring talk at TED about social and economic developments in the world over the last 50 years. Rosling visualise his talk in amazing way using animated bubble charts. After that (2010) this technology is also available all others thanks to Google and R-package developers. You only need R version 2.11.0 or higher and googleVis -library >= 0.2.4 to do you own “bubble chart”.

Short introduction

googleVis is an R package providing an interface between R and the Google Visualisation API. All output of googleVis is html code that contains the data and references to JavaScript functions hosted by Google. You only need to upload this html file into your operators server and open your webbrowser. Voilaa…after that you could “rattle” your data with very new way.
Please, hold onto your hat now we go forward within this issue….

I was so facinated about this “bubble charting” technology that I decided to make also my own test. I have always want to know how energy production, air emissions, and GPD link together…so I collect data from Eurostat and UN to produce my own “bubble chart” from all EU-countries. Of course I calculated all dependent variables per population.

We need only few rows of code:
#first activate library in R library(googleVis)

#download data from server. Note you have to first "manipulate" your data to appropriate format.

energia <- read.table("http://energy.goeuropeinfo.com/data/CEIP_CLRTAP_UN_UNFCCC_EUROSTAT_me_edited.csv", header=TRUE, sep=";", na.strings="NA", dec=",", strip.white=TRUE) #lets check that everythin is ok data(energia)

#after that just write code below to produce your bubble energiadat <- gvisMotionChart(energia, "Maa", "Vuosi", options=list(width=730, height=540)) plot(energiadat) cat(energiadat$html$chart, file="eu_ente_emissions.html")

#upload html file into your service provider server

Open Motion chart from here

In Motion chart you could choose four variables to present at the same time. To “run” this chart just push the play button.

By the way…I have also made one oral presentation in January by using motion chartting and belive me…audience ask me to show more and more motion…..

Have Fun,
Marko

Source cite:
1 Markus Gesmann and Diego de Castillo. Using the Google Visualisation API with R. The R Journal, 3(2):40-44, December 2011.

2 CEIP, CLRTAP, UN, UNFCCC and Eurostat emissons data.

Tagged with: animation bubble charts motion chart R TED

R sample code with data

Using R with examining apartment house energy consumption effectiveness (DEA effectiveness)

June 17, 2012 by admin - No comments

This example is test only. Data itself is “real” data and I think you could do that example also with own data if available. In this example I used library benchmarking.

First set the library and download data from source

#################################
library(Benchmarking)
#################################
#DEA 1
#output haitakkeet huoneiston suhteen vesi, lämpö ja sähkö
#input vuokransuuruus, asuntojen lukumäärä
#/*lukee suoraan sas-ohjelman csv exportatun datan*/
deamk <- read.table(“http://ekqvist.goeuropeinfo.com/rbloggerqvist/data/deamk.csv”, header=TRUE, sep=”;”, na.strings=”NA”, dec=”,”,
strip.white=TRUE)

#dea1
x1 <- with(deamk, cbind(vuokra_e_nelio_kk, asuntoja)) #INPUT PANOKSET
x1 <- with(deamk, cbind(asuntoja, vuokra_e_nelio_kk)) #INPUT PANOKSET

x1 <- with(deamk, cbind(vuokra_e_nelio_kk)) #INPUT PANOKSET
y1 <- with(deamk, cbind(vesi_vrk_as_litraa, sahko_MWh, lampo_MWh)) # OUTPUT TUOTOKSET

dea.plot.frontier(x1,y1,txt=TRUE, main=”DEA Tehokkuusrintama HeKa aineistosta 2011″)
dea.plot(x1,y1)

#If you want graph into local folder change appropriate folder name below
# example – output graph to jpeg file
jpeg(“i:/todo/asuntodata_dea/deamk.jpg”)
dea.plot.frontier(x1,y1,txt=TRUE, main=”DEA Tehokkuusrintama HeKa aineistosta 2011″)
dev.off()

#note before that install “benchmarking” library
e_dea2011out <- dea(x1,y1, RTS=”vrs”, ORIENTATION=”out”)
e_dea2011in <- dea(x1,y1, RTS=”vrs”, ORIENTATION=”in”) #TÄTÄ KÄYTETTY ANALYYSISITEKSTISSÄ

#saving results into file
#tehokkuusluvut tiedostoon
write.table(eff(e_dea2011out), file = “i:/todo/asuntodata_dea/e_dea2011out.txt”, sep = “\t”, col.names = NA, qmethod = “double”)
write.table(eff(e_dea2011in), file = “i:/todo/asuntodata_dea/e_dea2011in.txt”, sep = “\t”, col.names = NA, qmethod = “double”)

#..more informative by using summary
sink(“i:/todo/asuntodata_dea/summary_deamk.txt”)
summary(e_dea2011in)
sink()

#plotting effectiveness data with density graph
jpeg(“i:/todo/asuntodata_dea/dea1_density_mkkiinteistot.jpg”)
eff.dens.plot(e_dea2011in , main=”DEA tiheysjakauma HeKa 2011″)
dev.off()
#and same pic here

#070512

#we are also interested which variable explain most best effectiveness..

#Meitä kiinnostaa myös mitkä valituista muuttujista selittävät parhaiten
# tehokkuutta
# vesi, lämpö ja sähkö, vuokransuuruus
#testataan tätä lineaarisella regressiolla
# yhdistetään eff tiedostoon alkuperäiset muuttujat
#luetaan eff tiedosto sellainsenaan edellä ajetusta

#here we read above effectiveness data table (note 1= best unit)
effvalmis <- read.table(“http://ekqvist.goeuropeinfo.com/rbloggerqvist/data/eff_dea2011valmis.txt”, header=TRUE, sep=”\t”, na.strings=”NA”, dec=”.”, strip.white=TRUE)
#in this operation we could use merge and unique link variable “id”
yhdista1 <- merge(deamk, effvalmis, by.x = “id”, by.y = “id”, all = TRUE)
attach(yhdista1)
names(yhdista1)
summary(yhdista1)

#Examine variables more details – TUTKITAAN MUUTTUJIA TARKEMMIN
hist(yhdista1$x)
hist(yhdista1$vuokra_e_nelio_kk)
hist(yhdista1$vesi_vrk_as_litraa)
hist(yhdista1$sahko_MWh)
hist(yhdista1$lampo_MWh)

#TUTKITAAN MUUTTUJIA TARKEMMIN
jpeg(“j:/todo/asuntodata_dea/hist_matriisikuva.jpg”)
layout(matrix(1:6,2,3))
hist(yhdista1$x)
hist(yhdista1$vuokra_e_nelio_kk)
hist(yhdista1$vesi_vrk_as_litraa)
hist(yhdista1$sahko_MWh)
hist(yhdista1$lampo_MWh)
dev.off()

#DENSITY
jpeg(“j:/todo/asuntodata_dea/density_matriisikuva.jpg”)
layout(matrix(1:6,2,3))
plot(density(yhdista1$x,na.rm=TRUE))
plot(density(yhdista1$vuokra_e_nelio_kk,na.rm=TRUE))
plot(density(yhdista1$vesi_vrk_as_litraa,na.rm=TRUE))
plot(density(yhdista1$sahko_MWh,na.rm=TRUE))
plot(density(yhdista1$lampo_MWh,na.rm=TRUE))
dev.off()

#SORTED VERSIO
jpeg(“j:/todo/asuntodata_dea/sorted_matriisikuva.jpg”)
layout(matrix(1:6,2,3))
plot(sort(yhdista1$x),pch=”1″)
plot(sort(yhdista1$vuokra_e_nelio_kk),pch=”2″)
plot(sort(yhdista1$vesi_vrk_as_litraa),pch=”3″)
plot(sort(yhdista1$sahko_MWh),pch=”4″)
plot(sort(yhdista1$lampo_MWh),pch=”5″)
dev.off()

#valitaan vain tietyt muuttujat pairs kuvaajaan
yhdista2 <- with(yhdista1, cbind(x, vuokra_e_nelio_kk, vesi_vrk_as_litraa, sahko_MWh, lampo_MWh)) #INPUT PANOKSET

#pairs
jpeg(“j:/todo/asuntodata_dea/pairs_matriisikuva.jpg”)
pairs(yhdista2)
dev.off()

#selitetään tehokkuuskerrointa panos ja tuotosmuuttujilla
sink(“i:/todo/asuntodata_dea/summary_deamk_fit_kaikki_muuttujat.txt”)
lm.r = lm(x ~ vuokra_e_nelio_kk + vesi_vrk_as_litraa + sahko_MWh + lampo_MWh)
summary(lm.r)
sink()

#vain tuotosmuuttujat
sink(“i:/todo/asuntodata_dea/summary_deamk_fit_vain_tuotos_muuttujat.txt”)
lm0.r = lm(x ~ vesi_vrk_as_litraa + sahko_MWh + lampo_MWh)
summary(lm0.r)
sink()

#kertoimet
coef(lm.r) # gives the model’s coefficients

#jäännöstermit eli kuinka paljon annetuilla arvoilla ennuste poikkeaa toteumasta
resid(lm.r) # gives the residual errors in Y

fitted(lm.r) # gives the predicted values for Y

#RESIDUAALIEN TUTKIMINEN

layout(matrix(1:4,2,2))
plot(lm.r)

#KAIKKI
jpeg(“i:/todo/asuntodata_dea/dea_residuaalitutkinta_mkkiinteistot_kaikki.jpg”)
layout(matrix(1:4,2,2))
plot(lm.r)
dev.off()

#VAIN TUOTOSMUUTTUJAT
jpeg(“i:/todo/asuntodata_dea/dea_residuaalitutkinta_mkkiinteistot_tuotosmuuttujat.jpg”)
layout(matrix(1:4,2,2))
plot(lm0.r)
dev.off()

#VUOKRA
jpeg(“i:/todo/asuntodata_dea/dea_residuaalitutkinta_mkkiinteistot_vuokra.jpg”)
layout(matrix(1:4,2,2))
plot(lm1.r)
dev.off()

#VESI
jpeg(“i:/todo/asuntodata_dea/dea_residuaalitutkinta_mkkiinteistot_vesi.jpg”)
layout(matrix(1:4,2,2))
plot(lm2.r)
dev.off()

#SÄHKÖ
jpeg(“i:/todo/asuntodata_dea/dea_residuaalitutkinta_mkkiinteistot_sahko.jpg”)
layout(matrix(1:4,2,2))
plot(lm3.r)
dev.off()

#LÄMPÖ
jpeg(“i:/todo/asuntodata_dea/dea_residuaalitutkinta_mkkiinteistot_lampo.jpg”)
layout(matrix(1:4,2,2))
plot(lm4.r)
dev.off()

#below in Finnish only

ennen kuin regressiomallin tulokset voisi hyväksyä, tulisi arvioida sen sopivuus
selittää dataa. yksi tapa toteuttaa tämä on tutkia residuaaleja visuaalisesti.
mikäli malli on soveltuva, tällöin residuaalivirheet tulisi olla satunnaisia ja
normaalisti jakautuneita.

Lisäksi, poistamalla “tapaus” one case ei tulisi merkittävästi vaikuttaa
mallin sopivuuteen. R mahdollistaa neljän graafisen esityksen tämän arvoimiseksi
käyttämällä plot komentoa.

upper left corner /vasen yläkulma
residuaalit vs mallilla sovitetut arvot. residuaalien tulisi olla satunnaisesti jakautuneita keskiviivan ympärillä. keskiviiva kuvaa virhearvoa nolla. pisteiden suhteen ei siis tulisi olla merkittävää hajontaa tämän viivan ympärillä.

lower left corner / vasen alakulma
standardi Q-Q kuvaaja, tämän kuvaajan tulisi vihjata ovatko residuaalivirheet
normaalisti jakautuneita.

upper right corner /oikea yläkulma
scale-location standardisoitujen residuaalien neliöjuuri (virhetermien tai virhee neliöjuuri) sovituksen tai mallin arvojen funktiona. tässäkin tapauksessa tässä ei tulisi olla tai nähtävissä selkeää trendiä. mikäli on, niin malli tulisi tarkistaa tai mahdollisesti poistaa outlier havaintoja.

lower right corner / oikea alakulma
näyttää jokaisen havainnon mitan eli “vipuvaikutuksen” tai “voiman” kun määritetään regression tuottamia tuloksia. Cooks etäisyys kuvataan kahdella viivalla, joka on eräs mittari kuvaamaan havaintojen merkitsevyyttä regressiomallissa. pieni etäisyys tarkoittaa sitä, että havainnon / havaintojen poistamisella on vain pieni vaikutus regression tuloksiin. Etäisyydet, jotka ylittävät arvon yksi ovat kyseenalaisia ja viittaavat aineistossa mahdollisesti oleviin outliereihin tai muuten vääränlaiseen malliin (malliasetelmaan).

#Using the Results of a Regression to Make Predictions

vuokra_e_nelio_kk_new=c(3,4,4.5,4.6,4.5,5.5,6.9,5.8,3.6,4.7,5.3,6.2,5.1,3.9,4.3);vuokra_e_nelio_kk_new
predict(lm.r,data.frame(vuokra_e_nelio_kk = vuokra_e_nelio_kk_new), level = 0.9, interval = “confidence”)
where ‘lwr’ is the lower limit of the confidence interval and ‘upr’ is the upper limit of the confidence interval. R does not contain a feature for finding the confidence intervals for predicted values of the independent variable for specified values of dependent variables.

# data plotting as scatter plot/ pistekuvaaja näitä voidaan tehdä niin monta kuin on muuttujia

#SCATTER PLOT

jpeg(“j:/todo/asuntodata_dea/scatterplot_withoutfitline_rent.jpg”)
plot(x, vuokra_e_nelio_kk) dev.off()
plot(x, vesi_vrk_as_litraa)
plot(x, sahko_MWh)
plot(x, lampo_MWh)
plot(x, asuntoja)

#FITLINE / KUVAAJA FIT JA TOTEUTUNEET
lm1.r = lm(x ~ vuokra_e_nelio_kk)
plot(x ~ vuokra_e_nelio_kk, yhdista1)
abline(lm1.r$coef,lty=5)

lm2.r = lm(x ~ vesi_vrk_as_litraa)
plot(x ~ vesi_vrk_as_litraa, yhdista1)
abline(lm2.r$coef,lty=5)

lm3.r = lm(x ~ sahko_MWh)
plot(x ~ sahko_MWh, yhdista1)
abline(lm3.r$coef,lty=5)

lm4.r = lm(x ~ lampo_MWh )
plot(x ~ lampo_MWh, yhdista1)
abline(lm4.r$coef,lty=5)

lm5.r = lm(x ~ asuntoja)
plot(x ~ asuntoja, yhdista1)
abline(lm5.r$coef,lty=5)

# example – output graph to jpeg file
jpeg(“i:/todo/asuntodata_dea/fitline_vuokra.jpg”)
lm1.r = lm(x ~ vuokra_e_nelio_kk)
plot(x ~ vuokra_e_nelio_kk, yhdista1)
abline(lm1.r$coef,lty=5)
dev.off()

# example – output graph to jpeg file
jpeg(“i:/todo/asuntodata_dea/fitline_vesi.jpg”)
lm2.r = lm(x ~ vesi_vrk_as_litraa)
plot(x ~ vesi_vrk_as_litraa, yhdista1)
abline(lm2.r$coef,lty=5)
dev.off()

# example – output graph to jpeg file
jpeg(“i:/todo/asuntodata_dea/fitline_sahko.jpg”)
lm3.r = lm(x ~ sahko_MWh)
plot(x ~ sahko_MWh, yhdista1)
abline(lm3.r$coef,lty=5)
dev.off()

# example – output graph to jpeg file
jpeg(“i:/todo/asuntodata_dea/fitline_lampo.jpg”)
lm4.r = lm(x ~ lampo_MWh )
plot(x ~ lampo_MWh, yhdista1)
abline(lm4.r$coef,lty=5)
dev.off()

# example – output graph to jpeg file
jpeg(“i:/todo/asuntodata_dea/fitline_asuntoja.jpg”)
lm5.r = lm(x ~ asuntoja)
plot(x ~ asuntoja, yhdista1)
abline(lm5.r$coef,lty=6)
dev.off()

#IN THE SAME PICTURE / SAMAAN KUVAAN
jpeg(“i:/todo/asuntodata_dea/fitline_matriisikuva.jpg”)
layout(matrix(1:6,2,3))
lm1.r = lm(x ~ vuokra_e_nelio_kk)
plot(x ~ vuokra_e_nelio_kk, yhdista1)
abline(lm1.r$coef,lty=5)
lm2.r = lm(x ~ vesi_vrk_as_litraa)
plot(x ~ vesi_vrk_as_litraa, yhdista1)
abline(lm2.r$coef,lty=5)
lm3.r = lm(x ~ sahko_MWh)
plot(x ~ sahko_MWh, yhdista1)
abline(lm3.r$coef,lty=5)
lm4.r = lm(x ~ lampo_MWh )
plot(x ~ lampo_MWh, yhdista1)
abline(lm4.r$coef,lty=5)
lm5.r = lm(x ~ asuntoja)
plot(x ~ asuntoja, yhdista1)
abline(lm5.r$coef,lty=5)
dev.off()

#MUUTTUJAT SUHTEESSA TEHOKKUUSKERTOIMEEN
#LISÄKSI PIIRRETÄÄN KUNKIN MUUTTUJAN FIT-VIIVA
jpeg(“j:/todo/asuntodata_dea/fitline_kaikki_tuotosmuuttujat_samassakuvassa2.jpg”)
matplot(cbind(lampo_MWh,sahko_MWh,vesi_vrk_as_litraa),x,xlab=”x”,ylab=”y”)
#matplot(cbind(x,(lampo_MWh,sahko_MWh,vesi_vrk_as_litraa),xlab=”x”,ylab=”y”)
abline(lm4.r,lty=1)#LÄMPÖ
abline(lm3.r,lty=2)#SÄHKÖ
abline(lm2.r,lty=3)#VESI
dev.off()

Figure : Original x shown with “1”, with small error as “2” and with large error as “3”. The regression lines for the no measurement error, small error and large error are
shown as solid, dotted and dashed lines respectively.

#DENSITY / JAKAUMAT
jpeg(“i:/todo/asuntodata_dea/density_matriisikuva.jpg”)
layout(matrix(1:4,2,2))
plot(density(lampo_MWh,na.rm=TRUE))
plot(density(sahko_MWh,na.rm=TRUE))
plot(density(vesi_vrk_as_litraa,na.rm=TRUE))
dev.off()

#EXAMINING CORRELATION / KATSOTAAN VIELÄ KORRELAATIOT
data <- with(yhdista1, cbind(x, asuntoja, vuokra_e_nelio_kk, vesi_vrk_as_litraa, sahko_MWh, lampo_MWh))

# First Correlogram Example
jpeg(“i:/todo/asuntodata_dea/correlogram1_.jpg”)
library(corrgram)
corrgram(data, order=TRUE, lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main=”DEA tehokkuuskertoimen (x) ja mallin muuttujien välinen korrelaatio”)
dev.off()

# Second Correlogram Example
# library(corrgram)
jpeg(“i:/todo/asuntodata_dea/correlogram2_.jpg”)
corrgram(data, order=TRUE, lower.panel=panel.ellipse,
upper.panel=panel.pts, text.panel=panel.txt,
diag.panel=panel.minmax,
main=”DEA tehokkuuskertoimen (x) ja mallin muuttujien välinen korrelaatio”)
dev.off()

# Third Correlogram Example
jpeg(“i:/todo/asuntodata_dea/correlogram3_.jpg”)
library(corrgram)
corrgram(data, order=NULL, lower.panel=panel.shade,
upper.panel=NULL, text.panel=panel.txt,
main=”DEA tehokkuuskertoimen (x) ja mallin muuttujien välinen korrelaatio”)
dev.off()

library(psych)
library(corrgram)

#PIIRRETÄÄN KUVAA
jpeg(“i:/todo/asuntodata_dea/matrix_scatterplots_.jpg”)
pairs(data, main=”DEA tehokkuuskertoimen (x) ja mallin muuttujien välinen korrelaatio”) #draw a matrix of scatter plots for the first 5 variables
dev.off()

jpeg(“i:/todo/asuntodata_dea/histograms_for_all_variables_.jpg”)
multi.hist(data) #draw the histograms for all variables
dev.off()

jpeg(“i:/todo/asuntodata_dea/scatter_histograms_correlations_for_all_variables_.jpg”)
pairs.panels(data) #draws scatterplots, histograms, and shows correlations (source found in the psych package)
dev.off()

That’s it…..

Tagged with: benchmarking DEA R

R testing

Welcome

June 15, 2012 by admin - No comments

Welcome to My very best collection of R world.

Tagged with: R

Month: June 2012