Sunday, April 19, 2015

R Tips - SQLDF, Lists

R is a wonderful language, and I hope more and more enthusiasts contribute to it. However like me, almost everybody is stuck with small irritating issues.

  • Joining/Filtering tables using SQLDF in R
SQLDF is a very powerful module. It brings the power of SQL to the already statistically powerful R

install.packages("sqldf")
library(sqldf)
crestValues = subset(testdata,Direction=='Crest')  

# Simple Select Query with SQLDF in R
out=sqldf("select Date,Close,JulianDate from crestValues order by JulianDate asc")

#  Inner Join  with SQLDF in R

finaltables11 = sqldf("select out2.lowJDate,out2.highJDate from tables11 inner join out2 where tables11.s2==out2.s2)

# Left Outer Join with SQLDF in R
tempt=sqldf("select t1.Close,max(t2.Close),t1.JulianDate as lowJDate,t2.JulianDate as highJDate from out t1 left outer join out t2 where t2.JulianDate > t1.JulianDate group by t1.Close")


# Fetch First Rows in a SQLDF R Query
tables11 = sqldf("select sum(highJDate-lowJDate),Count,s2 from tables2 group by Count,s2 limit 1")
  
  •   Accessing Data Frames using strings
There might be occassions where we have made a lot of data frames and we would like to access them based on user input/flow logic. Surprisingly this was a difficult task

Suppose we have 5 data frames created  
tables5 = sqldf("select count(s5) as Count,s5,lowJDate,highJDate from out2 group by s5 having count(s5) = 2 order by count(s5) desc")
  tables4 = sqldf("select count(s4) as Count,s4,lowJDate,highJDate from out2 group by s4 having count(s4) = 2 order by count(s4) desc")
  tables3 = sqldf("select count(s3) as Count,s3,lowJDate,highJDate from out2 group by s3 having count(s3) = 2 order by count(s3) desc")
  tables2 = sqldf("select count(s2) as Count,s2,lowJDate,highJDate from out2 group by s2 having count(s2) = 2 order by count(s2) desc")
  tables1 = sqldf("select count(s1) as Count,s1,lowJDate,highJDate from out2 group by s1 having count(s1) = 2 order by count(s1) desc")

  

Now I want to access each of the data frames in a for loop. The way I did it was

mylist = list(tables1,tables2,tables3,tables4,tables5)
for(k in 5:1)
  {
    remove(tables11,finaltables11)
    #tables11 = sqldf("select sum(highJDate-lowJDate),Count,s1 from tables1 group by Count,s1 limit 1")
    if(nrow(mylist[[k]])==0)
    {
      next;
    }

....
}

No comments: