Thursday, December 25, 2014

Cricket Stats

I have been playing around with cricket data for some time now, and there are some interesting observations.

1) Average runs scored per match, year over year (YOY)

2) Average runs scored when tier-2 teams won the toss and elected to bat

3) Average runs scored when tier-2 teams (Kenya, Zimbabwe) won the toss and elected to bowl
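The three graphs above are all the same calculation with a different filter on the toss. Here is a minimal sketch of that grouping, assuming each match is stored as a hypothetical `(year, toss_winner_tier, toss_decision, match_total_runs)` tuple. The post does not describe its actual data layout, and the records below are toy values, not real matches.

```python
from collections import defaultdict
from statistics import mean

def average_runs_by_year(matches, tier=None, decision=None):
    """Average match total per year, optionally filtered on the toss.

    Each match is a hypothetical tuple:
    (year, toss_winner_tier, toss_decision, match_total_runs)
    """
    totals_by_year = defaultdict(list)
    for year, toss_tier, toss_decision, total in matches:
        # Skip matches that do not fit the requested toss scenario.
        if tier is not None and toss_tier != tier:
            continue
        if decision is not None and toss_decision != decision:
            continue
        totals_by_year[year].append(total)
    return {year: mean(totals) for year, totals in sorted(totals_by_year.items())}

# Toy records for illustration only, not real match data.
toy = [
    (2013, 1, "bat", 480),
    (2013, 2, "bat", 420),
    (2014, 2, "bowl", 510),
    (2014, 2, "bat", 460),
]
print(average_runs_by_year(toy))                           # {2013: 450, 2014: 485}
print(average_runs_by_year(toy, tier=2, decision="bat"))   # {2013: 420, 2014: 460}
print(average_runs_by_year(toy, tier=2, decision="bowl"))  # {2014: 510}
```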

The above stats suggest that 2014 may have been a good year for cricket, but we have forgotten one important measure: the number of matches. The more matches there are, the more the extremes get averaged out (normalization).
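One way to make "more matches, more normalization" concrete is the standard error of the mean: the spread of match totals divided by the square root of the number of matches. This is a minimal sketch on toy totals (not the post's data), showing that the same spread gives a much less certain average when there are only a few matches.

```python
from math import sqrt
from statistics import mean, stdev

def mean_and_standard_error(totals):
    """Return (average match total, standard error of that average)."""
    n = len(totals)
    if n < 2:
        raise ValueError("need at least two matches to estimate the spread")
    # The standard error shrinks like 1/sqrt(n), so a year with few matches
    # gives a much noisier average than a year with many.
    return mean(totals), stdev(totals) / sqrt(n)

# Toy totals with the same spread but different match counts.
few = [400, 500]
many = [400, 500] * 8
print(mean_and_standard_error(few))
print(mean_and_standard_error(many))
```

Both toy years average 450 runs, but the two-match year has a standard error of 50 runs against roughly 13 for the sixteen-match year, so a jump in the average for a thin year is much weaker evidence.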

In the following graphs, I plotted the average runs and the number of matches YOY for all 3 forms of cricket.

1) ODI

2) T20

We see that in 2014 there is a surge in the number of runs scored in all 3 versions of the game. However, 2014 has fewer matches in the data, so statistically we cannot say for sure that it has been a very good year.
I had always assumed that the number of runs scored keeps increasing over the years, but that is not what we observe, especially between 2009 and 2013. Let us dig deeper.

We see that in 2014 there were more than 30 matches with a total above 500 runs, which drove the average up. From the graph we can make some more observations.
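A quick way to check the claim above is to count, per year, the matches whose total crosses 500 runs, and to compare the mean with the median, since a batch of very high totals moves the mean far more than the median. A minimal sketch, assuming matches are stored as hypothetical `(year, match_total_runs)` tuples; the values below are toy data.

```python
from collections import Counter
from statistics import mean, median

def high_scoring_matches_per_year(matches, threshold=500):
    """Count matches per year whose match total exceeds `threshold` runs.

    Each match is a hypothetical (year, match_total_runs) tuple.
    """
    return Counter(year for year, total in matches if total > threshold)

def mean_vs_median(totals):
    """A few very high totals pull the mean up much more than the median."""
    return mean(totals), median(totals)

# Toy data for illustration only.
toy = [(2013, 450), (2014, 520), (2014, 540), (2014, 300)]
print(high_scoring_matches_per_year(toy))    # Counter({2014: 2})
print(mean_vs_median([300, 320, 340, 600]))  # (390, 330.0)
```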

Next, I will apply similar stats to the following factors:
1) Wickets fallen in a match
2) Runs scored until the fall of the first wicket
3) Highest partnership as a % of total runs
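All three factors can be derived from an innings' fall-of-wickets scores. This is a minimal sketch, assuming a hypothetical input format: the team score at which each wicket fell, in order, plus the final total. It works per innings; for "wickets fallen in a match", sum the first value over both innings.

```python
def innings_metrics(fall_of_wickets, total):
    """Return (wickets fallen, runs at first wicket, highest partnership % of total).

    fall_of_wickets: team scores at which each wicket fell, in order
                     (hypothetical format, e.g. [20, 150, 180]).
    total: final innings total.
    """
    wickets = len(fall_of_wickets)
    # Runs until the first wicket; if no wicket fell, that is the whole total.
    first_wicket_runs = fall_of_wickets[0] if fall_of_wickets else total
    # Partnerships are the gaps between successive fall-of-wicket scores,
    # plus the unbroken stand at the end of the innings.
    scores = [0] + list(fall_of_wickets) + [total]
    partnerships = [later - earlier for earlier, later in zip(scores, scores[1:])]
    highest_share = 100 * max(partnerships) / total if total else 0.0
    return wickets, first_wicket_runs, highest_share

print(innings_metrics([20, 150, 180], 250))  # (3, 20, 52.0)
```

In the example, the partnerships are 20, 130, 30 and an unbroken 70, so the highest stand of 130 is 52% of the total.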

I will come up with the details soon.

Potholes at Traffic Signals

Everybody has at some point been stuck at a traffic signal, idly watching the smoke coming out of a nearby truck. We keep looking ahead to spot the reason the traffic is held up, and when we finally reach the signal, we find it was a pothole. How could one pothole hold up the entire traffic? It very well can.


Average vehicle length: 2.5
Average speed of vehicles (km/hr): 30
Time for which the signal is open (seconds): 60
Time for which the signal is closed (seconds): 180
Average idling losses (mL/hr): 500
Road length (m): 500

The above chart calculates the loss of petrol in litres per crossing per period (signal open + signal closed), assuming 3 lanes. The x-axis is the average speed of vehicles, and the y-axis shows the fuel lost to idling.
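The exact formula behind the chart is not given, so here is one simplified queue model of my own that reproduces its shape; treat every modelling choice as an assumption. It assumes the road segment is fully queued bumper to bumper, that each lane discharges vehicles at (speed ÷ vehicle length) while the signal is open, that every queued vehicle idles through the red phase, and that vehicles which do not clear also idle through the green phase. The defaults are the parameters listed above, with the vehicle length taken to be in metres.

```python
def idling_loss_litres(avg_speed_kmh, vehicle_length_m=2.5, green_s=60, red_s=180,
                       idle_ml_per_hr=500, road_length_m=500, lanes=3):
    """Petrol (litres) burnt idling at one crossing over one signal period.

    Simplified hypothetical model, not necessarily the one behind the chart.
    """
    # Assumption: the whole road segment is queued, bumper to bumper.
    queued = lanes * road_length_m / vehicle_length_m
    # Assumption: during green, each lane discharges at speed / vehicle length.
    speed_ms = avg_speed_kmh * 1000 / 3600
    cleared = min(queued, lanes * speed_ms * green_s / vehicle_length_m)
    # All queued vehicles idle through red; the ones left behind also idle through green.
    idle_vehicle_seconds = queued * red_s + (queued - cleared) * green_s
    return idle_vehicle_seconds / 3600 * idle_ml_per_hr / 1000

for speed in (10, 15, 20, 30):
    print(speed, round(idling_loss_litres(speed), 2))
```

With these parameters, 600 vehicles fit on the road, and at 30 km/h all of them clear in one 60-second green phase, so the loss bottoms out at 15 litres per period. Slowing the queue to 15 km/h (say, because of a pothole) leaves half the vehicles idling for another green phase and raises the loss to 17.5 litres.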

The following link gives the vehicle population of India, and a graphical representation of the same is as follows.
We see a huge increase in 2-wheelers and small 4-wheelers over the last 2 decades. Assuming that 50% of these vehicles get stuck at traffic signals (which is a highly optimistic assumption), the loss of fuel runs into millions per year.

With the following assumptions:

Idling losses (mL/hr): 200
Average time wasted at traffic/potholes (hr/day): 0.5

The total cost would currently stand at no less than 100 million INR. The real figure is almost certainly much higher, but I tried to arrive at a calculated value with a conservative approach.
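The arithmetic behind this kind of estimate can be sketched as follows, using the 50% share, 0.5 hr/day and 200 mL/hr assumptions above. The vehicle count and the petrol price are not stated in this text (they come from the linked data), so they are left as required arguments. Treating the total as yearly (365 days) is my assumption.

```python
def annual_idling_cost(vehicles, price_per_litre, share_stuck=0.5,
                       hours_per_day=0.5, idle_ml_per_hr=200, days=365):
    """Rough yearly fuel (litres) and cost lost to idling at signals/potholes.

    `vehicles` and `price_per_litre` must be supplied by the reader;
    the remaining defaults mirror the assumptions listed in the post.
    """
    # Vehicles stuck per day x hours idling x mL per hour, converted to litres.
    litres_per_day = vehicles * share_stuck * hours_per_day * idle_ml_per_hr / 1000
    litres = litres_per_day * days
    return litres, litres * price_per_litre

# Placeholder inputs for illustration: 1,000 vehicles, price of 1 currency unit per litre.
print(annual_idling_cost(1000, 1.0))
```

With these placeholder inputs, 1,000 vehicles lose 50 litres a day, or 18,250 litres a year; the result scales linearly with the vehicle count and the price.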

Wednesday, December 24, 2014

Crime Against Women

Crime against women is on the rise, but the violent attacks we read about in the newspapers are only the tip of the iceberg. People might think molestation/rape is the foremost crime, but it is not. According to the data pulled from

1) Dowry is one of the foremost causes of crime against women, and Andhra Pradesh accounts for a huge chunk of it. The numbers include cases registered under the Dowry Prohibition Act.

2) The following graph sums up all the registered cases by state. Uttar Pradesh shows a low share here, but we can likely attribute that to a low level of awareness about filing cases. I am trying to incorporate such factors but will need more data.
3) The following graph looks at the age group of the people who commit crimes. As expected, most of them fall in the 18-30 range.
4) When I checked whether the rural/urban split plays a part, the rural graph showed a stronger correlation than the urban graph.

5) I regressed the crime occurrences against the following factors

The following are the results of the regression. The R-squared is too low to say anything definite about the relationship between crime and geographical boundaries; other factors are likely at play.

Age Group    R-Squared
< 18         5.82%
18 to 30     16.76%
30 to 45     14.86%
45 to 60     13.97%
> 60         11.77%

Age Group    R-Squared
< 18         4.55%
18 to 30     7.38%
30 to 45     5.95%
45 to 60     5.70%
> 60         3.34%
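For reference, R-squared is the share of the variation in the outcome that the fitted line explains, so the best value in these tables (16.76%) still leaves over 80% of the variation unexplained. This is a minimal single-predictor sketch in plain Python. The factor list used in the regression above is not given here, and the actual regression may have used several predictors or a different tool. With one predictor, R-squared equals the square of the Pearson correlation, which links it to the correlation comparison in point 4.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y):
    """Share of the variance in y explained by the fitted line (0 to 1)."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

def pearson_r(x, y):
    """Pearson correlation coefficient; with one predictor, r ** 2 equals R-squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy data for illustration only, not the crime data.
x, y = [1, 2, 3, 4], [2, 4, 5, 8]
print(r_squared(x, y), pearson_r(x, y) ** 2)
```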