Saturday, March 28, 2015

Can we predict a Financial Crisis: Equity Correlation

Can we predict a financial crisis? Can we look at the stocks at any moment and say whether the market is moving towards another boom? Everybody wants to have such an analysis in their bag.
This will not only help an organization avoid market losses but also let it use that information to its advantage.

A financial crisis is preceded by a financial bubble, where speculation reaches its maximum. In other words, the stocks become highly correlated, so it does not matter where you are investing: every stock would have a positive return.

1) Multiple Correlation Coefficient
This is the basis of my analysis. On a particular exchange, if I observe the correlation statistics of all stocks and can show that the correlations are converging, then we can raise the flag. However, with hundreds of stocks it is difficult to analyse the correlation matrix, so we have to come up with a single number. I used the Multiple Correlation Coefficient. You can read more about it at http://en.wikipedia.org/wiki/Multiple_correlation

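Let us assume that we have 100 stock prices. We can make the first stock price the dependent variable $y$ and the remaining 99 stocks the independent variables $x_1, \dots, x_{99}$.
We then calculate the correlation matrix for the 99 stocks, which gives us the matrix R below

$$R = \begin{pmatrix} r_{x_1 x_1} & r_{x_1 x_2} & \cdots & r_{x_1 x_{99}} \\ r_{x_2 x_1} & r_{x_2 x_2} & \cdots & r_{x_2 x_{99}} \\ \vdots & \vdots & \ddots & \vdots \\ r_{x_{99} x_1} & r_{x_{99} x_2} & \cdots & r_{x_{99} x_{99}} \end{pmatrix}$$

We then calculate the matrix C (a column vector), which holds the correlation between the dependent stock and each of the 99 independent stocks

$$C = \begin{pmatrix} r_{x_1 y} & r_{x_2 y} & \cdots & r_{x_{99} y} \end{pmatrix}^{T}$$

The square of the Multiple Correlation Coefficient (MCC) is now simply

$$\mathrm{MCC}^2 = C^{T} R^{-1} C$$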
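The calculation described above can be tried out in R. What follows is only a minimal sketch under stated assumptions: the returns are simulated rather than taken from BSE, the function name `multiple_correlation` is made up for illustration, and only 10 stocks are used to keep it small. It uses the definition from the Wikipedia page linked above (squared coefficient = C' R^-1 C) and cross-checks the result against the R-squared of an ordinary regression.

```r
# Minimal sketch: multiple correlation coefficient of one stock against the rest.
# The returns are simulated (hypothetical data), not BSE prices.
set.seed(42)
n_days   <- 500
n_stocks <- 10   # the post uses 100; 10 keeps the example small

# Simulate returns that share a common market factor, so they are correlated
market  <- rnorm(n_days)
returns <- sapply(seq_len(n_stocks), function(i) 0.6 * market + rnorm(n_days))
colnames(returns) <- paste0("S", seq_len(n_stocks))

multiple_correlation <- function(x, dependent = 1) {
  full <- cor(x)
  R <- full[-dependent, -dependent, drop = FALSE]  # correlations among the independent stocks
  C <- full[-dependent, dependent]                 # correlations with the dependent stock
  r_squared <- as.numeric(t(C) %*% solve(R, C))    # C' R^-1 C
  sqrt(r_squared)
}

mcc <- multiple_correlation(returns)

# Cross-check: the same squared value comes out of an ordinary least squares fit
fit <- lm(returns[, 1] ~ returns[, -1])
print(c(mcc_squared = mcc^2, lm_r_squared = summary(fit)$r.squared))
```

Computing this number over a rolling window of dates would give a time series of the kind plotted in the graphs, although the exact windowing used for those graphs is not stated in the post.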



I took the BSE (Bombay Stock Exchange) as my initial sample space. I have two graphs.

[Graph: Adjusted Correlation Coefficient, 2000 to 2010]

[Graph: Adjusted Correlation Coefficient, 2000 to 2014]

However, the above method is essentially brute force. We would like to select the independent variables intelligently. I plan to use PCA and Cholesky for further analysis and compare the results.

2) Principal Component Analysis


3) Cholesky Transformation

Sunday, March 22, 2015

Historical Finance News Feed

I scraped some financial news data for the year 2015. The file is present here. I am scraping data for the past 15 to 20 years for all subject lines (Commodities, FX, Bonds etc.)

Scraping the data is not enough. We would like to link the news to some effect on the stock prices so that we can use it for prediction/forecasting.

ANALYSIS 1
I took the stock prices and computed the difference between the High and Opening prices. Any day on which this difference is more than 3 SDs away from its mean (the threshold used in the code) can be marked as a potential news day. We will then take the news from these days and mark it as having POSITIVE sentiment on the stock.

Code
# Accessing the news feed content in R
library(XML)
library(RCurl)
# Find the big deviations and find if there are related news and vice versa

# We will read data from the master source file of RICS
Tickers = read.csv('C:\\Anant\\MyLearning\\Statistics\\SpreadAnalysis\\WorldTickerList.csv')

# For now we will take the example of GOOGLE in that list
Tickers = Tickers[Tickers$TICKER=='GOOG',]

# We will use the Ticker value and download data from Yahoo Finance
# You can also customise the date ranges
URL = paste(c('http://real-chart.finance.yahoo.com/table.csv?s=',as.character(Tickers$TICKER),'&a=00&b=01&c=2015&d=08&e=30&f=2015&g=d&ignore=.csv'),collapse="")
GOOG = read.csv(URL)

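# Daily spreads: how far the high moved above the open, and the full daily range
GOOG$OpenHighSpread = GOOG$High - GOOG$Open
GOOG$LowHighSpread = GOOG$High - GOOG$Low
# Standardise each spread into a z-score (distance from the mean in SDs)
GOOG$OpenHigh = (GOOG$OpenHighSpread - mean(GOOG$OpenHighSpread))/sd(GOOG$OpenHighSpread)
GOOG$LowHigh = (GOOG$LowHighSpread - mean(GOOG$LowHighSpread))/sd(GOOG$LowHighSpread)

# Keep the days where the Open-High z-score is more than 3 SDs from the mean
MajorPoints = GOOG[abs(GOOG$OpenHigh) > 3,]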

# We see that there were 4 dates when there was a lot of deviation in the Open High
# There must have been some news around these dates

#############################################################################################
# Source 1 : GOOGLE

#############################################################################################
# Source 2 : Reuters
reutersNews = list()
for(dateValue in as.character(MajorPoints$Date))
{
  newsDateURL = paste(c('http://www.reuters.com/finance/stocks/companyNews?symbol=GOOG.O&date=',format(as.Date(dateValue,"%Y-%m-%d"),"%m%d%Y")),collapse="")
  #newsDateURL = paste(c(newsURL,'&startdate=',dateValue,'&enddate=',dateValue),collapse="")
  print(newsDateURL)
  doc = getURL(newsDateURL)
  doc = htmlParse(doc)
  # Store the paragraph text per date instead of overwriting it on every iteration
  reutersNews[[dateValue]] = xpathSApply(doc,'//div[@id = "companyNews"]/div/div/div/p',xmlValue)
}

#############################################################################################
# Source 3 : Yahoo Finance
yahooNews = list()
for(dateValue in as.character(MajorPoints$Date))
{
  # '&t=' passes the date as a query parameter (the '=' was missing)
  newsDateURL = paste(c('http://finance.yahoo.com/q/h?s=',as.character(Tickers$TICKER),'&t=',dateValue),collapse="")
  #newsDateURL = paste(c(newsURL,'&startdate=',dateValue,'&enddate=',dateValue),collapse="")
  print(newsDateURL)
  doc = getURL(newsDateURL)
  doc = htmlParse(doc)
  # Store the headline text per date instead of overwriting it on every iteration
  yahooNews[[dateValue]] = xpathSApply(doc,'//div[@class = "mod yfi_quote_headline withsky"]/ul/li//a',xmlValue)
}
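The flagging step in the code above can be tried without any downloads. This is a minimal sketch on simulated prices, not the GOOG data: the function name `flag_major_moves`, the price series and the two artificial spike days are all assumptions made for illustration. Only the z-score rule with a threshold of 3 is taken from the code above.

```r
# Minimal sketch of the spike-flagging step on simulated prices (hypothetical data).
set.seed(1)
n <- 250
prices <- data.frame(
  Date = as.character(seq(as.Date("2015-01-01"), by = "day", length.out = n)),
  Open = 500 + cumsum(rnorm(n)),
  stringsAsFactors = FALSE
)
prices$High <- prices$Open + abs(rnorm(n, sd = 2))
# Plant two artificial "news" days with an unusually large Open-High move
prices$High[c(40, 180)] <- prices$Open[c(40, 180)] + 25

flag_major_moves <- function(df, threshold = 3) {
  spread <- df$High - df$Open
  z <- (spread - mean(spread)) / sd(spread)   # z-score of the Open-High spread
  df$OpenHigh <- z
  df[abs(z) > threshold, ]                    # keep only the extreme days
}

major <- flag_major_moves(prices)
print(major[, c("Date", "OpenHigh")])
```

The dates in `major$Date` are the ones that would be passed to the news lookups.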


The next step is to do some language processing on this data.

Thursday, March 19, 2015

Finding monthly Patterns in Stocks

There are some stocks that perform well at a certain time of the year due to the nature of the business. I tried to isolate those stocks. I took BSE as my base and got data for the past 12 years

The graph below is for VOLTAS.BO ( Sum of Returns for the entire month vs Month ) across 12 years

The code for obtaining the above graph can be found here

I then took other exchanges around the world to find patterns in stock prices. I calculated the standard deviation of the monthly returns for each month across the years. The stocks with the lowest standard deviations were the ones which saw a similar increase or decrease in returns during that month across the years. The code for doing it can be found here
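The standard-deviation screen described above can be sketched in a few lines of R. This is a minimal sketch and not the linked code: the tickers are hypothetical, the monthly returns are simulated, and one ticker is deliberately given a stable April return so that the screen has something to find.

```r
# Minimal sketch: find (ticker, month) pairs whose monthly return is stable across years.
# Tickers and returns are simulated (hypothetical data).
set.seed(7)
years   <- 2003:2014
tickers <- c("AAA.BO", "BBB.BO", "CCC.BO")

# One row per ticker, year and month; Return is the total return for that month
grid <- expand.grid(Year = years, Month = 1:12, Ticker = tickers,
                    stringsAsFactors = FALSE)
grid$Return <- rnorm(nrow(grid), mean = 0, sd = 0.08)

# Give AAA.BO a stable April return so the pattern exists in the data
seasonal <- grid$Ticker == "AAA.BO" & grid$Month == 4
grid$Return[seasonal] <- rnorm(sum(seasonal), mean = 0.10, sd = 0.005)

# Standard deviation of each month's return across the years, per ticker
monthly_sd <- aggregate(Return ~ Ticker + Month, data = grid, FUN = sd)
names(monthly_sd)[3] <- "SD"
monthly_sd <- monthly_sd[order(monthly_sd$SD), ]

# The lowest SDs are the candidates for a repeating monthly pattern
print(head(monthly_sd, 3))
```

A low standard deviation only says the return was consistent in that month; whether it was consistently up or down still has to be read from the returns themselves.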

Within the code you can replace the line
dataFile <- read.csv('C:\\Anant\\MyLearning\\Statistics\\StockData\\WorldTickerList.csv')
with your own ticker list. I have a reference to the world ticker list present in my earlier blog post
http://simplyanant.blogspot.in/2015/03/download-all-tickers-for.html

 For BSE I got the following results. You can use the commented ggplot command in the code to plot the graph and see for yourself. There will be a month where the returns follow a pattern
[1] "TURBO.BO"
[1] "KEDIAVA.BO"
[1] "JRIIIL.BO"
[1] "JPPOWER4.BO"
[1] "JOLLYRID.BO"
[1] "JUMBFNL.BO"
[1] "TURBO.BO"
[1] "KEDIAVA.BO"
[1] "JAGSONFI.BO"
[1] "JRIIIL.BO"
[1] "JPPOWER4.BO"
[1] "JOLLYRID.BO"
[1] "JUMBFNL.BO"
[1] "RAJOIL.BO"

Best of luck hunting stock patterns :) Any suggestions are welcome

Obviously the following will give you a hint :)





Tuesday, March 17, 2015

Mumbai Weather Analysis

After staying in Mumbai for the past 4.5 years, I have come to the conclusion that the weather is not quite what I had expected. I had expected moderate weather with lots of wind, but sadly that is not the case. It might be because of the high-rise construction or the traffic, but who knows.

I collected data available for a Mumbai weather station. The data is available in the form of an Excel sheet here

Let us start with max temperature

In the summer season, the max temperature has actually decreased from 2010 onwards. It rose to its maximum in the month of May

We can see that November, considered a winter month everywhere else, is not one in Mumbai

October sees a sharp rise in temperature as the monsoon recedes


Let us come now to Minimum temperatures

Surprisingly, there is a huge shift between the temperatures of March and April (the onset of summer).
Rain keeps the climate moderate, with smaller differences between the max and min temperatures, which can be seen here

There is again a huge difference between the min temperature for the month of Nov and that of Dec

Monday, March 9, 2015

Download all tickers for Stocks,ETF,Mutual Fund etc from all over the world

Sadly there is no easy way to download all the ticker information. The only way I could do it was to scrape the Yahoo Finance web pages and parse them

Download the file

The above file has the timestamp of 15th March. I used Perl to scrape the web pages. If somebody wants the Perl code to obtain this information, just let me know. It is pretty simple, although it will take 5 to 6 hours to run

Sunday, March 8, 2015

Pulling Stock Data through R/YQL

After I got the list of all tickers of products around the world, I started pulling data and doing some analysis. The easiest analyses were
1) Finding the spread
2) Finding the correlated products

I chose three data sets
a) BSE Stocks from India
b) German Stocks from the DE Bourse
c) Currency data all over the world

Here are a few interesting points
In the analysis of data from 2013 onwards, the currencies CAD and GTQ had the strongest negative correlation. In order to verify it, I pulled data from the website and it matches

On the flip side, the most highly correlated currencies were DKK and XAF
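Finding the most negatively and most positively correlated pair can be sketched as follows. This is a minimal sketch and not the linked script: the currency codes are placeholders, the series are simulated, and the variable names are made up for illustration.

```r
# Minimal sketch: find the most negatively and most positively correlated pair.
# Currency codes and series are simulated (hypothetical data).
set.seed(3)
n    <- 300
base <- rnorm(n)
rates <- cbind(
  AAA = base + rnorm(n, sd = 0.1),
  BBB = base + rnorm(n, sd = 0.1),    # moves with AAA
  CCC = -base + rnorm(n, sd = 0.1),   # moves against AAA and BBB
  DDD = rnorm(n)                      # unrelated to the others
)

corr <- cor(rates)
corr[lower.tri(corr, diag = TRUE)] <- NA   # keep each pair only once

# Flatten the matrix into one row per pair
pair_table <- as.data.frame(as.table(corr), stringsAsFactors = FALSE)
pair_table <- pair_table[!is.na(pair_table$Freq), ]
names(pair_table) <- c("First", "Second", "Correlation")

most_negative <- pair_table[which.min(pair_table$Correlation), ]
most_positive <- pair_table[which.max(pair_table$Correlation), ]
print(most_negative)
print(most_positive)
```

With real data, `rates` would hold one column per currency, aligned on the same dates.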


One more observation is that most of the OPEC countries have their currency highly correlated to the dollar, so finding a correlation between them and other currencies is similar to finding correlations between USD and other currencies

The R scripts which I used to find these are located here and here. The first file calculates the spread/deviation and the second downloads the data. The scripts are pretty simple, though you will need to change the directories for your own use

If you want to use Excel to extract the stock information, then the best method would be YQL. It is easy to use and you have all sorts of information in a structured format

If you want to pull out historical prices with the following specifications
Symbol/Ticker : symbol
StartDate
EndDate

then the corresponding weblink to extract this info would be (the below is in Excel VBA format)
"http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20in%20%28%27" & symbol & "%27%29%20and%20startDate%20=%20%27" & startDate & "%27%20and%20endDate%20=%20%27" & endDate & "%27&diagnostics=true&env=store://datatables.org/alltableswithkeys"

I have created an Excel sheet where you can give a list of the stock tickers and the start and end date and it will download the necessary information within the Excel sheet

You can download the sheet here and tweak it as per your requirements. Enjoy
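For readers who prefer R to Excel, the YQL link described above can be assembled in the same way. This is a minimal sketch: `build_yql_url` is a made-up helper name, and the function only builds the URL string without calling the service. It uses `URLencode` from base R, which also encodes `*` and `=`, so the string differs slightly from the VBA version while describing the same query.

```r
# Minimal sketch: build the YQL historical-data URL for one symbol and date range.
# build_yql_url is a hypothetical helper; it does not send any request.
build_yql_url <- function(symbol, start_date, end_date) {
  query <- sprintf(
    "select * from yahoo.finance.historicaldata where symbol in ('%s') and startDate = '%s' and endDate = '%s'",
    symbol, start_date, end_date
  )
  paste0(
    "http://query.yahooapis.com/v1/public/yql?q=",
    URLencode(query, reserved = TRUE),   # percent-encode spaces, quotes, brackets etc.
    "&diagnostics=true&env=store://datatables.org/alltableswithkeys"
  )
}

yql_url <- build_yql_url("GOOG", "2015-01-01", "2015-03-01")
print(yql_url)
```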