Anyone who studies data science feels the urge, at least once, to take on the giant of all prediction games and attempt the unachievable: predicting the stock market.
I was drawn to it during my own data science learning journey. In the process, I came to realize that the stock market is risky because it depends heavily on sentiment. This article is a short exercise in technical analysis and is for educational purposes only. Admittedly, sentiment analysis is now often treated as part of technical analysis too, but I am still at the beginner’s stage and not yet proficient with it.
The data for this study was extracted from the Yahoo Finance API (there are plenty of other sources as well).
The daily stock data of JPMorgan Chase (ticker: JPM) from 2010 to 2017 was extracted for this purpose.
By default the data has six features: Open, High, Low, Close, Adjusted Close and Volume. These were saved in the data frame df_JPM, which has a total of 2014 entries. Because it has no NULL values, it does not need any cleaning.
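The check described above can be sketched as follows. This is only an illustration: the small frame is made-up stand-in data, not the real JPM download, and the column label "Adj Close" is an assumption about how the adjusted close is named.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded data: five business days of made-up values.
dates = pd.bdate_range("2010-01-04", periods=5)
df_JPM = pd.DataFrame(
    {
        "Open": [41.8, 42.9, 43.5, 43.6, 44.5],
        "High": [42.9, 43.8, 43.9, 44.6, 44.8],
        "Low": [41.7, 42.8, 43.0, 43.3, 44.2],
        "Close": [42.8, 43.7, 43.9, 44.5, 44.7],
        "Adj Close": [30.1, 30.7, 30.9, 31.3, 31.4],  # assumed column label
        "Volume": [35460500, 41208300, 27729000, 44864700, 33110100],
    },
    index=dates,
)

expected_columns = ["Open", "High", "Low", "Close", "Adj Close", "Volume"]
missing_columns = [c for c in expected_columns if c not in df_JPM.columns]

# Count missing values per column; cleaning is needed only if any count is non-zero.
null_counts = df_JPM.isnull().sum()
needs_cleaning = bool(null_counts.any())

print(df_JPM.shape)
print(null_counts)
print("needs cleaning:", needs_cleaning)
```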
Getting Started with Feature Engineering:
To analyse a dataset, we sometimes need to create additional features to gain more insight and to get a hint of what the data might be pointing to.
SMA is the simple moving average, calculated by adding up the closing prices of the stock over a given number of periods (20 days in our case) and dividing by that number. EMA is the exponential moving average, which gives more weight to recent prices. Both are important because many of the other features are built on them.
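A minimal sketch of both averages with pandas, assuming a data frame with a "Close" column. The helper name add_moving_averages, the output column names, and the choice of the same 20-day span for the EMA are illustrative assumptions, not necessarily what was used for df_JPM.

```python
import pandas as pd

def add_moving_averages(df, window=20):
    """Return a copy of df with SMA and EMA columns computed from 'Close'."""
    out = df.copy()
    # Simple moving average: plain mean of the last `window` closing prices.
    out["SMA"] = out["Close"].rolling(window=window).mean()
    # Exponential moving average: recent prices weigh more (span is an assumption).
    out["EMA"] = out["Close"].ewm(span=window, adjust=False).mean()
    return out

# Made-up closing prices 1.0, 2.0, ..., 30.0 just to exercise the function.
demo = add_moving_averages(pd.DataFrame({"Close": [float(x) for x in range(1, 31)]}))
print(demo.tail())
```

The first 19 SMA values are NaN because a full 20-day window is not yet available, so those rows need to be dropped or ignored before modelling.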
From the ‘Close’ feature we can compute Gain and Loss by comparing each day’s closing price with the previous day’s.
Relative Strength (RS) tells us about the stock’s price performance. It is the ratio of Average Gain to Average Loss, where each is the 14-day SMA of Gain and Loss respectively.
From Relative Strength we derive the Relative Strength Index, RSI = 100 - 100 / (1 + RS), which measures the speed and change of stock price movements.
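The Gain, Loss, RS and RSI steps can be sketched as below. The helper name add_rsi and the column names Average_Gain and RS are assumptions made for illustration; the averages are plain 14-day SMAs as described above, rather than the smoothed variant some charting tools use.

```python
import pandas as pd

def add_rsi(df, window=14):
    """Return a copy of df with Gain, Loss, their averages, RS and RSI."""
    out = df.copy()
    change = out["Close"].diff()              # today's close minus yesterday's
    out["Gain"] = change.clip(lower=0)        # positive moves, zero otherwise
    out["Loss"] = (-change).clip(lower=0)     # negative moves as positive numbers
    out["Average_Gain"] = out["Gain"].rolling(window).mean()
    out["Average_Loss"] = out["Loss"].rolling(window).mean()
    # RS is infinite when there were no losses in the window, which gives RSI = 100.
    out["RS"] = out["Average_Gain"] / out["Average_Loss"]
    out["RSI"] = 100 - 100 / (1 + out["RS"])
    return out

# Made-up prices: alternating +2 and -1 moves, so 7 gains and 7 losses in 14 days.
closes = [100.0]
for i in range(14):
    closes.append(closes[-1] + (2.0 if i % 2 == 0 else -1.0))

demo = add_rsi(pd.DataFrame({"Close": closes}))
print(demo[["Close", "Gain", "Loss", "RS", "RSI"]].tail(3))
```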
Since df_JPM is our main data frame, we did most of the calculations in df_temp and updated df_JPM only once we were sure of the results.
The Money Flow Index (MFI) is an oscillator that uses both price and volume to measure buying and selling pressure. MFI is also known as volume-weighted RSI.
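One common way to compute MFI is sketched here, using the typical price (the mean of High, Low and Close) and a 14-day window. Both the window length and the helper name money_flow_index are assumptions; the article's exact settings are not stated.

```python
import pandas as pd

def money_flow_index(df, window=14):
    """Money Flow Index from High, Low, Close and Volume columns."""
    typical_price = (df["High"] + df["Low"] + df["Close"]) / 3
    raw_flow = typical_price * df["Volume"]
    direction = typical_price.diff()
    # Money flow counts as positive when the typical price rose, negative when it fell.
    # The first row has no previous day, so it stays NaN.
    positive_flow = raw_flow.where(direction > 0, 0.0).where(direction.notna())
    negative_flow = raw_flow.where(direction < 0, 0.0).where(direction.notna())
    money_ratio = positive_flow.rolling(window).sum() / negative_flow.rolling(window).sum()
    return 100 - 100 / (1 + money_ratio)

# Made-up data with High = Low = Close so the typical price is easy to follow.
prices = [10.0, 11.0, 10.0, 12.0, 11.0]
demo = pd.DataFrame({"High": prices, "Low": prices, "Close": prices, "Volume": [100] * 5})
mfi = money_flow_index(demo, window=3)
print(mfi)
```

The structure mirrors RSI, except that each day's move is weighted by traded volume, which is why MFI is described as a volume-weighted RSI.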
Moving Average Convergence Divergence (MACD) tracks whether two moving averages of the price, a shorter one and a longer one, are coming closer together or moving farther apart, and both movements matter to us. When the shorter average pulls further above the longer one, upside momentum is increasing; when it pulls further below, downside momentum is increasing. MACD reveals changes in the strength, direction, momentum and duration of a trend in a stock’s price.
Calculating MACD tells us whether a day was tradeable or not. If the day was tradeable, the MACD entry is assigned 1; otherwise it is assigned 0.
From here on, our analysis keeps MACD at the center. Its binary values make it easy to use as the target when training our machine learning algorithms.
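The exact rule for marking a day as tradeable is not spelled out here, so the sketch below makes an explicit assumption: a day gets 1 when the MACD line (12-day EMA minus 26-day EMA of the close) is above its 9-day signal line, and 0 otherwise. The 12/26/9 spans are the commonly used defaults, and the helper name add_macd_label is hypothetical.

```python
import pandas as pd

def add_macd_label(df, fast=12, slow=26, signal=9):
    """Return a copy of df with a binary 'MACD' column (assumed labelling rule)."""
    out = df.copy()
    ema_fast = out["Close"].ewm(span=fast, adjust=False).mean()
    ema_slow = out["Close"].ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    # Assumption: "tradeable" means the MACD line sits above its signal line.
    out["MACD"] = (macd_line > signal_line).astype(int)
    return out

# Made-up steadily rising prices: momentum stays positive after the first day.
demo = add_macd_label(pd.DataFrame({"Close": [float(x) for x in range(1, 41)]}))
print(demo["MACD"].value_counts())
```

A different labelling rule, for example MACD line above zero, would only change the single comparison line.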
Before feeding our data into any machine learning algorithm, it is important to understand how the features relate to each other and to the target. Principal Component Analysis (PCA) is one technique that helps with this.
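To give an idea of what PCA does, here is a small sketch that standardizes the features and projects them onto the first two principal components. It uses NumPy's SVD directly rather than any particular machine learning library, and the random data and the function name pca_project are purely illustrative.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Standardize the columns, then project onto the top principal components."""
    X = np.asarray(features, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (S ** 2 / np.sum(S ** 2))[:n_components]
    return X @ components.T, components, explained_ratio

# Made-up features: the first two columns are strongly related, the third is noise.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
features = np.column_stack(
    [base, 2 * base + rng.normal(scale=0.1, size=200), rng.normal(size=200)]
)

projected, components, explained_ratio = pca_project(features)
print(projected.shape, explained_ratio)
```

Plotting the two projected columns against each other, colored by the target, gives the kind of two-color scatter chart often used to see whether the classes separate.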
The chart below shows the data points with MACD value 0 (blue) and 1 (yellow).
The PCA step also gives us a heat-map displaying the correlation amongst the features. Volume, Loss and Average_Loss are negatively correlated and may hamper the performance of the algorithms, so removing them is recommended. The data was then split into train_set and test_set. train_set includes the MACD feature so that we can train our algorithms on it, while test_set has it removed.
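The feature removal and the split can be sketched as follows. The 80/20 chronological split, the helper name prepare_sets and the random demo data are assumptions. Keeping the held-out MACD labels in a separate series is also an addition for illustration, so that predictions on test_set can be scored later.

```python
import numpy as np
import pandas as pd

def prepare_sets(df, target="MACD",
                 drop_cols=("Volume", "Loss", "Average_Loss"), train_fraction=0.8):
    """Compute correlations, drop the unwanted features, and split by time order."""
    corr = df.corr()  # correlation matrix that a heat-map would display
    kept = df.drop(columns=[c for c in drop_cols if c in df.columns])
    split_at = int(len(kept) * train_fraction)  # assumed chronological 80/20 split
    train_set = kept.iloc[:split_at]            # keeps the MACD target
    test_labels = kept[target].iloc[split_at:]  # held back for scoring only
    test_set = kept.iloc[split_at:].drop(columns=[target])
    return corr, train_set, test_set, test_labels

# Made-up data with the column names used in this article.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "Close": rng.normal(100, 5, 50),
    "RSI": rng.uniform(0, 100, 50),
    "Volume": rng.integers(1_000_000, 5_000_000, 50),
    "Loss": rng.uniform(0, 2, 50),
    "Average_Loss": rng.uniform(0, 1, 50),
    "MACD": rng.integers(0, 2, 50),
})

corr, train_set, test_set, test_labels = prepare_sets(demo)
print(corr["MACD"].sort_values())
print(train_set.shape, test_set.shape)
```

A chronological split is used here because shuffling time-series rows can leak future information into training; a random split is the other obvious choice.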
A comparison of the outputs obtained from the various algorithms.
I hope you have found this post helpful.