Monday, December 26, 2016

Twiddle Algorithm

The Twiddle algorithm is one of the simplest "intelligent" brute-force search algorithms out there. I have written some simple code to demonstrate how the algorithm works, and as a sample data set I have taken daily stock data for the past 5 years.


# Implementation of a simple Twiddle Algorithm with Simple stock data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


# Let us get the stock prices of five companies which we think might have some relation to each other
# TATA STEEL
# ONGC
# DLF
# SBI
# MMTC
import pandas as pd
TATA=pd.read_csv('http://real-chart.finance.yahoo.com/table.csv?s=TATASTEEL.NS&d=11&e=26&f=2016&g=d&a=8&b=17&c=2011&ignore=.csv')
ONGC=pd.read_csv('http://real-chart.finance.yahoo.com/table.csv?s=ONGC.NS&d=11&e=26&f=2016&g=d&a=8&b=17&c=2011&ignore=.csv')
DLF=pd.read_csv('http://real-chart.finance.yahoo.com/table.csv?s=DLF.NS&d=11&e=26&f=2016&g=d&a=8&b=17&c=2011&ignore=.csv')
SBI=pd.read_csv('http://real-chart.finance.yahoo.com/table.csv?s=SBIN.NS&d=11&e=26&f=2016&g=d&a=8&b=17&c=2011&ignore=.csv')
MMTC=pd.read_csv('http://real-chart.finance.yahoo.com/table.csv?s=MMTC.NS&d=11&e=26&f=2016&g=d&a=8&b=17&c=2011&ignore=.csv')

# Prepare the data frame
data=pd.DataFrame(data={'tata':TATA['Close'],'ongc':ONGC['Close'],'dlf':DLF['Close'],'sbi':SBI['Close'],'mmtc':MMTC['Close']})

# Convert the data to returns
data=data.apply(lambda x : x/x.shift(1) -1,axis=0)
data=data[1:len(data)]
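As a small aside, the conversion to returns above can be checked on a toy example. This is only a sketch: the prices below are made-up numbers, and the comparison assumes that pandas' built-in pct_change computes the same simple return as the manual formula.

```python
import pandas as pd

# Hypothetical closing prices, used only for illustration
prices = pd.DataFrame({'a': [100.0, 110.0, 99.0], 'b': [50.0, 50.0, 55.0]})

# Manual form used above: today's price over the previous row's price, minus one
manual = (prices / prices.shift(1) - 1).iloc[1:]

# Built-in equivalent; the first row is NaN and is dropped
builtin = prices.pct_change().dropna()

print(manual)
print(builtin)
```

Both forms work row by row, so the rows need to be in chronological order for the result to be a day-over-day return.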


We will now implement two functions:
1) errorFunction
This returns the sum of squared errors between the weighted combination of the other four stocks and the TATA STEEL returns
2) twiddle
This is the actual implementation of the twiddle algorithm
 
   

# We will now be implementing the Twiddle algorithm
import numpy as np

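def errorFunction(data,vectorArray):
    # Sum of squared errors between the weighted sum of the four explanatory
    # return series and the TATA STEEL return series
    squares=(np.dot(data[['dlf','mmtc','ongc','sbi']],np.array(vectorArray).T) - data['tata'] ) ** 2
    return np.sum(squares)

def twiddle(data,weights,dweights,weightPosDelta,weightNegDelta,debug=False):
    # weights        : starting weights, one per explanatory variable (modified in place)
    # dweights       : starting step size for each weight (modified in place)
    # weightPosDelta : fractional growth of the step after an improvement
    # weightNegDelta : fractional shrinkage of the step when neither direction helps
    counter=0
    weightHistory=pd.DataFrame(columns=('dlf','mmtc','ongc','sbi'))
    # Tune one weight at a time, holding the others fixed
    for i in range(len(weights)):
        bestError=errorFunction(data,weights)
        while True:
            counter=counter+1
            weightHistory.loc[counter]=weights
            # Try a step in the positive direction
            weights[i]=weights[i] + dweights[i]
            error=errorFunction(data,weights)
            # Stop tuning this weight once the error has stopped changing
            if(abs(error - bestError) < 0.0001):
                break
            if(error < bestError):
                # Improvement: keep the step and enlarge it
                bestError=error
                dweights[i]=dweights[i]*(1+weightPosDelta)
            else:
                # No improvement: try the same step in the negative direction
                weights[i]=weights[i] - (2 * dweights[i])
                error=errorFunction(data,weights)
                if(error > bestError):
                    # Neither direction helped: restore the weight and shrink the step
                    weights[i]=weights[i] + dweights[i]
                    dweights[i]=dweights[i] * (1-weightNegDelta)
                else:
                    bestError=error
            if(debug):
                print("The weights are {} and the error is {} ".format(weights,error))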
        
    # We will plot the individual weights  
    plt.figure(figsize=(20,10))
    plt.plot(weightHistory['dlf'],color='b',label='DLF')
    plt.plot(weightHistory['mmtc'],color='r',label='MMTC')
    plt.plot(weightHistory['ongc'],color='y',label='ONGC')
    plt.plot(weightHistory['sbi'],color='g',label='SBI')
    plt.legend()
    plt.ylabel('Weights of the explanatory variables')
    plt.xlabel('Counter Index')
    plt.title('Variation of weights')
    plt.show()
    # We will plot the final curve
    plt.figure(figsize=(20,10))
    plt.plot(data['tata'],color='b',label='Actual')
    plt.plot(np.dot(data[['dlf','mmtc','ongc','sbi']],np.array(weights).T),color='r',label='Twiddle')
    plt.legend()
    plt.ylabel('Daily return TATA STEEL')
    plt.xlabel('Date Index')
    plt.title('Actual vs Predicted')
    plt.show()
    return(weights)


When we run the function, we get the following output:
1) Plot with the trend of how the individual weights evolve
2) Plot with the predicted(Twiddle) vs actual values
3) The final weights 
   


weights=[0,0,0,0]   # starting weights (assumed values; any initial guess can be used)
dweights=[1,1,1,1]
twiddle_weights=twiddle(data,weights,dweights,0.1,0.1)
print(twiddle_weights)
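Since errorFunction is a plain sum of squared errors of a linear model without an intercept, one way to sanity-check the twiddle weights is to compare them with an ordinary least-squares fit. The sketch below is only an illustration on synthetic data: X, y and true_w are made-up stand-ins, not the stock returns used above.

```python
import numpy as np

# Synthetic stand-in data: 200 rows, 4 explanatory columns (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_w = np.array([0.5, -0.2, 0.1, 0.3])
y = X @ true_w + rng.normal(scale=0.01, size=200)

# Least-squares weights minimise the same sum of squared errors
ols_w, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Sum of squared errors at the least-squares solution
sse = np.sum((X @ ols_w - y) ** 2)
print(ols_w, sse)
```

On the real data, X would be data[['dlf','mmtc','ongc','sbi']] and y would be data['tata'], assuming there are no missing values. The twiddle error should end up close to, but never below, this least-squares error.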





We can see how the weights are recalculated at each step. It is not a very robust algorithm, but it can be tweaked to suit your needs.
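One possible tweak, sketched here under my own assumptions, is the variant of twiddle that is often described elsewhere: it cycles through all the parameters on every pass instead of finishing one weight before moving to the next, and it stops when the sum of the step sizes falls below a tolerance. The names twiddle_sketch, bowl, grow and shrink are hypothetical and are not part of the notebook.

```python
def twiddle_sketch(cost, params, steps, tol=1e-6, grow=1.1, shrink=0.9):
    """Coordinate-wise twiddle search; returns (best parameters, best cost)."""
    params = list(params)   # copy so the caller's lists are left untouched
    steps = list(steps)
    best = cost(params)
    while sum(steps) > tol:
        for i in range(len(params)):
            # Try a step in the positive direction
            params[i] += steps[i]
            err = cost(params)
            if err < best:
                best = err
                steps[i] *= grow
                continue
            # Otherwise try the same step in the negative direction
            params[i] -= 2 * steps[i]
            err = cost(params)
            if err < best:
                best = err
                steps[i] *= grow
            else:
                # Neither direction helped: restore the value and shrink the step
                params[i] += steps[i]
                steps[i] *= shrink
    return params, best


def bowl(p):
    # Toy cost function with its minimum at (3, -1)
    return (p[0] - 3) ** 2 + (p[1] + 1) ** 2


found, err = twiddle_sketch(bowl, [0.0, 0.0], [1.0, 1.0])
print(found, err)
```

The same function could be pointed at the stock data by passing lambda w: errorFunction(data, w) as the cost.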

Play with it. A notebook is available on my GitHub:
https://github.com/anantguptadbl/utilities/blob/master/Twiddle%20Algorithm.ipynb

Thursday, June 16, 2016

Sparse and Dense Arrays/Matrices in Python

Most of us have heard about sparse and dense matrices in Python. I tried to compare the space usage of the two.

The code is as follows:

  
         
import numpy as np
import scipy.sparse as sps
import sys

def createGeometric(start,ratio,lengthofSequence):
    # Yield start, start*ratio, start*ratio^2, ...
    i=0
    while i < lengthofSequence:
        yield(start * pow(ratio,i))
        i=i+1

numpySizeList=[]
scipySizeList=[]
lengthList=[]
for value in createGeometric(1,3,10):
    a=np.random.rand(value)
    b=sps.csr_matrix(np.random.rand(value))
    # Record what sys.getsizeof reports for each object
    numpySizeList.append(sys.getsizeof(a))
    scipySizeList.append(sys.getsizeof(b))
    lengthList.append(value)

import matplotlib.pyplot as plt
import math
plt.figure()
# Take logs; lists are built explicitly because map() returns an iterator in Python 3
numpySizeList=[math.log(x) for x in numpySizeList]
scipySizeList=[math.log(x) for x in scipySizeList]
lengthList=[math.log(x) for x in lengthList]
# Colours are set explicitly so that they match the legend below
plt.plot(numpySizeList,color='b')
plt.plot(scipySizeList,color='g')
plt.plot(lengthList,color='r')
plt.show()
        
I plotted the space usage


RED : Log of Length of Array
GREEN : Log of Size of Sparse Array
BLUE : Log of Size of Dense Array

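The difference looks immense. Note, however, that sys.getsizeof on a SciPy sparse matrix counts only the Python wrapper object, not the data, indices and indptr arrays that hold the values, so the green line understates the real footprint. I am still trying to understand how the underlying data structure works for these matrices.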
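As a starting point for that, here is a rough sketch of how the real footprint of a CSR matrix might be measured, by adding up the three NumPy arrays that back it. The helper csr_nbytes and the 1% fill level are my own assumptions for illustration.

```python
import sys
import numpy as np
import scipy.sparse as sps

def csr_nbytes(m):
    """Bytes held by the three arrays backing a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

n = 10000
dense_full = np.random.rand(n)      # (almost surely) every entry non-zero
mostly_zero = np.zeros(n)
mostly_zero[::100] = 1.0            # only 1% of the entries are non-zero

for name, arr in [('all non-zero', dense_full), ('1% non-zero', mostly_zero)]:
    sparse = sps.csr_matrix(arr)
    # dense bytes, real sparse bytes, and what sys.getsizeof reports
    print(name, arr.nbytes, csr_nbytes(sparse), sys.getsizeof(sparse))
```

CSR stores only the non-zero values plus their column indices, so it saves space when most entries are zero. For an array of random numbers, where nearly nothing is zero, it should come out larger than the dense array.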