Thursday, June 16, 2016

Sparse and Dense Arrays/Matrices in Python

You have probably heard of sparse and dense matrices in Python (SciPy sparse matrices versus plain NumPy arrays). I tried to compare how much space each of them uses.
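To make the distinction concrete, here is a minimal sketch using NumPy and SciPy, the same libraries as the code that follows. A dense array stores every entry, zeros included, while a SciPy CSR matrix stores only the non-zero entries. The small 4-by-5 matrix is made up purely for illustration.

```python
import numpy as np
import scipy.sparse as sps

# A made-up 4x5 matrix in which most entries are zero.
dense = np.array([
    [0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 2],
    [0, 0, 0, 4, 0],
], dtype=np.float64)

sparse = sps.csr_matrix(dense)

# The dense array keeps all 20 entries, zeros included.
print(dense.size)   # 20
# The CSR matrix keeps only the 4 non-zero entries.
print(sparse.nnz)   # 4
```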

The code is as follows:

  
         
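import sys
import math

import numpy as np
import scipy.sparse as sps
import matplotlib.pyplot as plt


def createGeometric(start, ratio, lengthofSequence):
    # Yield the first lengthofSequence terms of start, start*ratio, start*ratio**2, ...
    i = 0
    while i < lengthofSequence:
        yield start * pow(ratio, i)
        i = i + 1


numpySizeList = []
scipySizeList = []
lengthList = []
for value in createGeometric(1, 3, 10):
    a = np.random.rand(value)                  # dense 1-D array of `value` random floats
    b = sps.csr_matrix(np.random.rand(value))  # 1 x value CSR matrix; practically every entry is non-zero
    numpySizeList.append(sys.getsizeof(a))
    scipySizeList.append(sys.getsizeof(b))
    lengthList.append(value)

# map() returns an iterator in Python 3, so build real lists for plt.plot.
# The colours are fixed explicitly so they match the key below on any matplotlib version.
plt.figure()
plt.plot([math.log(x) for x in numpySizeList], color='b', label='log size, dense (NumPy)')
plt.plot([math.log(x) for x in scipySizeList], color='g', label='log size, sparse (SciPy CSR)')
plt.plot([math.log(x) for x in lengthList], color='r', label='log length')
plt.legend()
plt.show()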
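One caveat about what sys.getsizeof measures. For a NumPy array that owns its data, sys.getsizeof includes the data buffer. As far as I can tell, SciPy's sparse matrix classes do not override __sizeof__, so sys.getsizeof on a csr_matrix reports only the small Python wrapper object, not the NumPy arrays it points to. The sketch below compares the two; csr_nbytes is a hypothetical helper of my own (not a SciPy function), and n = 10_000 is an arbitrary size.

```python
import sys

import numpy as np
import scipy.sparse as sps


def csr_nbytes(m):
    """Bytes held by the three NumPy arrays backing a CSR matrix (hypothetical helper)."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


n = 10_000
a = np.random.rand(n)
b = sps.csr_matrix(a)  # a 1 x n matrix; practically every entry is non-zero

# ndarray: getsizeof counts the object header plus the data buffer it owns.
print("dense :", sys.getsizeof(a), "bytes via getsizeof,", a.nbytes, "bytes of data")
# csr_matrix: compare getsizeof with the bytes in data + indices + indptr.
print("sparse:", sys.getsizeof(b), "bytes via getsizeof,", csr_nbytes(b), "bytes in its arrays")
```

With random data, the CSR arrays hold every value plus a column index for each one, so they end up larger than the dense array rather than smaller.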
        
I plotted the space usage on a log scale:


RED: log of the array length
GREEN: log of the size of the sparse matrix (as reported by sys.getsizeof)
BLUE: log of the size of the dense array (as reported by sys.getsizeof)

The difference is immense. I am still trying to understand how the underlying data structures of these matrices work.
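Here is a minimal sketch of the CSR (compressed sparse row) layout that sps.csr_matrix uses, on a small made-up 4-by-5 matrix. The matrix stores three arrays: data holds the non-zero values row by row, indices holds the column of each value, and row i occupies data[indptr[i]:indptr[i+1]]. get_entry is a hypothetical helper that looks up a single entry using only those three arrays; SciPy's own indexing is implemented differently.

```python
import numpy as np
import scipy.sparse as sps

# A made-up 4x5 matrix with four non-zero entries.
dense = np.array([
    [0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 2],
    [0, 0, 0, 4, 0],
], dtype=np.float64)
m = sps.csr_matrix(dense)

print(m.data)     # [3. 1. 2. 4.]  non-zero values, row by row
print(m.indices)  # [2 0 4 3]      column of each stored value
print(m.indptr)   # [0 1 1 3 4]    row 1 is empty, so indptr[1] == indptr[2]


def get_entry(m, i, j):
    """Look up m[i, j] using only data, indices and indptr (hypothetical helper)."""
    start, end = m.indptr[i], m.indptr[i + 1]
    for k in range(start, end):      # scan only the values stored for row i
        if m.indices[k] == j:
            return m.data[k]
    return 0.0                       # an entry that is not stored is zero


print(get_entry(m, 2, 4))  # 2.0
print(get_entry(m, 1, 3))  # 0.0
```

So a CSR matrix needs nnz values, nnz column indices and (rows + 1) row pointers, whereas a dense array needs rows x cols values. It only saves memory when a large fraction of the entries are zero.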