Downloading Historical Stock Data With Pandas

This is a short post quickly outlining the Python module Pandas, which has been a great find. Pandas makes the manipulation of labelled, multi-dimensional data very easy indeed, allowing operations that might be more familiar to users of a typical spreadsheet app to be coded up in Python in no time.
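
To make the spreadsheet comparison concrete, here is a minimal sketch using a small, made-up holdings table. The row labels and numbers are purely illustrative. Each derived column behaves like a formula filled down a spreadsheet column, and the total behaves like a SUM() cell.

```python
import pandas as pd

# Hypothetical holdings table; the labels, share counts and prices are made up
holdings = pd.DataFrame(
    {"shares": [10, 5, 20], "price": [30.0, 120.0, 15.5]},
    index=["AAA", "BBB", "CCC"],
)

# Like filling "=shares*price" down a column: computed row by row, matched on labels
holdings["value"] = holdings["shares"] * holdings["price"]

# Like a SUM() cell at the bottom of the column
total = holdings["value"].sum()

# Like a percentage-of-total column
holdings["weight"] = holdings["value"] / total

print(holdings)
print("Total:", total)
```

Because rows are matched by their labels rather than by position, the same column arithmetic keeps working when rows are reordered or filtered.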

Installing Pandas and the associated Pandas DataReader module is as simple as this:

python -m pip install pandas
python -m pip install pandas-datareader

I’m not going to cover the details of the core Pandas feature set; rather, I wanted to quickly show how the DataReader module makes the automatic collection of historical stock data very easy indeed. For example, this code snippet shows how to download daily prices for Microsoft stock over a roughly three-year time-frame and save the closing prices to a CSV file.

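import datetime
import pandas_datareader.data as pdr_data
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2013, 1, 27)
# Download daily prices for MSFT (the 'google' source has since been deprecated in pandas-datareader)
data = pdr_data.DataReader('MSFT', 'google', start, end)
# Keep just the closing prices and save them to CSV
data['Close'].to_csv('msft_close.csv')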
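
Once the closing prices are in a pandas Series, typical spreadsheet-style calculations become one-liners. The sketch below uses a few hypothetical prices in place of a real download so that it runs offline; `pct_change` and `rolling` are standard pandas methods, while the prices and the three-day window are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical closing prices standing in for data['Close'] from the download
close = pd.Series(
    [10.0, 11.0, 9.9, 9.9, 10.89],
    index=pd.date_range("2010-01-04", periods=5, freq="B"),  # 5 business days
    name="Close",
)

# Day-on-day fractional change, like "=(B3-B2)/B2" filled down a column
daily_return = close.pct_change()

# Three-day simple moving average of the closing price
sma3 = close.rolling(window=3).mean()

summary = pd.DataFrame({"Close": close, "Return": daily_return, "SMA3": sma3})
print(summary)
```

The first return and the first two moving-average values are NaN because there is not yet enough history to compute them, much like a spreadsheet formula that refers to cells above the first row.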

TensorFlow Setup For Windows 10

This is more a summary of my own experience getting TensorFlow up and running on my Windows 10 PC than a complete setup guide, but it might help people find solutions to a few of the problems I encountered along the way, including myself in future if I do this again on a fresh PC.

Before starting, I’d advise reviewing the official TensorFlow installation documentation, and possibly the beginners’ user guide too.

Step 1 is to install the NVIDIA CUDA Toolkit and NVIDIA cuDNN. I already had CUDA v8.0 installed, which is exactly what we need, so we were good so far, but I didn’t have the cuDNN library installed. My first reaction was to download and try to set up v6.0, but I later found that v5.1 was needed, so you need to download that version specifically. The official setup guide asks you to put the cuDNN library somewhere and then add that location to the PATH environment variable. I’ll come back to that later.

I’m planning to use TensorFlow from Python. I had Python 3.5.1 installed, but for some reason the available tensorflow package wasn’t happy with that, and I had to downgrade to Python 3.5.0 before it would install. Old versions of Python are available from python.org. To complete the downgrade, I also had to change the PATH environment variable so that the new (older) version was used in preference to the newer one.

The Python version can easily be checked like this:

python --version
Python 3.5.0
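
`python --version` doesn’t say whether the interpreter is 32-bit or 64-bit, which also matters here: as far as I know, TensorFlow’s Windows packages at the time were built only for 64-bit Python 3.5. A small script can check both. This is a sketch; the helper names are hypothetical, and the required version is an assumption to check against the current install documentation.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Return (version tuple, pointer size in bits) for the running interpreter."""
    bits = struct.calcsize("P") * 8  # 64 on a 64-bit Python, 32 on a 32-bit one
    return tuple(sys.version_info[:3]), bits


def check_for_tensorflow(version, bits, required=(3, 5)):
    """Return a list of problems; `required` is an assumed (major, minor) requirement."""
    problems = []
    if tuple(version[:2]) != tuple(required):
        problems.append("Python %d.%d found, %d.%d expected"
                        % (version[0], version[1], required[0], required[1]))
    if bits != 64:
        problems.append("%d-bit Python found, 64-bit expected" % bits)
    return problems


if __name__ == "__main__":
    version, bits = describe_interpreter()
    # sys.executable shows which python.exe PATH actually resolved to
    print("Python %s (%d-bit) at %s" % (platform.python_version(), bits, sys.executable))
    for problem in check_for_tensorflow(version, bits):
        print("Problem:", problem)
```

Printing `sys.executable` is handy after a downgrade, since it confirms which of several installed interpreters PATH is picking up.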

I also had to upgrade pip before the package would install, which can be done like this:

python -m pip install --upgrade pip

Finally, I was able to install the package. Use the first of the following commands for the CPU implementation or the second for the GPU implementation; I chose GPU.

python -m pip install --upgrade tensorflow
python -m pip install --upgrade tensorflow-gpu

Having done all that, it still wouldn’t work for me, and now it was complaining about missing DLLs?!
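
Before chasing redistributables, it can help to check which of the GPU DLLs can actually be found on PATH. This is a sketch: the DLL names are assumptions based on CUDA 8.0 and cuDNN 5.1 (for example `cudnn64_5.dll`), and `find_on_path` and `missing_dlls` are hypothetical helpers, not part of TensorFlow.

```python
import os

# Assumed DLL names for CUDA 8.0 and cuDNN 5.1; adjust for other versions
EXPECTED_DLLS = ["cudart64_80.dll", "cudnn64_5.dll", "cupti64_80.dll"]


def find_on_path(filename, path=None):
    """Return the first PATH directory containing `filename`, or None if absent."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        # Skip empty entries produced by stray separators
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


def missing_dlls(names=EXPECTED_DLLS, path=None):
    """List the entries of `names` that cannot be found on PATH."""
    return [name for name in names if find_on_path(name, path) is None]


if __name__ == "__main__":
    for name in EXPECTED_DLLS:
        print(name, "->", find_on_path(name) or "NOT FOUND")
```

A DLL reported as NOT FOUND points at a PATH or installation problem rather than a missing Visual C++ runtime.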

Quite a few forums suggested that these errors meant I was missing the “Visual C++ 2015 Redistributable Update 3”. As it happens, I was missing that, so I spent some time trying to install it. Eventually I realized that the Visual C++ 2017 redistributable is a binary-compatible, in-place upgrade of the Visual C++ 2015 version, so what I already had installed should meet TensorFlow’s requirements. This wasn’t the problem.

Fixing this problem actually meant ignoring the official setup instructions: instead of using PATH to locate the cuDNN library, I found it only worked once I merged the files from the cuDNN download into the CUDA installation itself.

With that done, I could run the sample from the beginners’ guide without error!

I’m new to this, but the first thing I did was modify the sample so the network looked a bit more like the networks I’ve built by hand (in C++) to learn to read the MNIST samples: I added a hidden layer with 100 neurons and randomized the initial weights, so the code near the top ends up looking like this.

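import tensorflow as tf  # written against the TensorFlow 1.x API (tf.placeholder, tf.random_normal)

# Input: a batch of flattened 28x28 MNIST images
x = tf.placeholder(tf.float32, [None, 784])
# Weights for the hidden layer (784 -> 100) and output layer (100 -> 10), small random values
W1 = tf.Variable(tf.random_normal([784, 100], stddev=0.01))
W2 = tf.Variable(tf.random_normal([100, 10], stddev=0.01))
b1 = tf.Variable(tf.zeros([100]))
b2 = tf.Variable(tf.zeros([10]))
# Hidden layer with ReLU activation
y = tf.matmul(x, W1) + b1
y = tf.nn.relu(y)
# Output layer: unnormalised scores (logits) for the 10 digits
y = tf.matmul(y, W2) + b2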
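
To see what that graph computes, here is the same forward pass sketched in NumPy. This is an illustration only; nothing is trained. It uses the same shapes and initialisation as the TensorFlow code (784 inputs, a 100-unit ReLU hidden layer and 10 outputs), followed by a softmax that turns the 10 scores into probabilities, much as the beginners’ sample does when computing its loss. The random inputs are fake images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same shapes and initialisation as the TensorFlow graph
W1 = rng.normal(0.0, 0.01, size=(784, 100))
b1 = np.zeros(100)
W2 = rng.normal(0.0, 0.01, size=(100, 10))
b2 = np.zeros(10)


def forward(x):
    """x has shape (batch, 784); returns logits of shape (batch, 10)."""
    hidden = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    return hidden @ W2 + b2                 # unnormalised class scores


def softmax(logits):
    """Per-row probabilities; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


x = rng.random((3, 784))  # three fake 28x28 images flattened to 784 values
probs = softmax(forward(x))
print(probs.shape)
print(probs.argmax(axis=1))  # predicted digit for each fake image
```

Without the ReLU, the two matrix multiplications would collapse into a single linear layer, so the activation is what lets the hidden layer add capacity over the guide’s original single-layer model.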

Alongside that, I also increased the number of training batches to 20,000, which gets me a test accuracy of 98.0%!
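
To put 20,000 batches in context, a quick bit of arithmetic helps. Assuming the sample’s batch size of 100, the 55,000-image MNIST training split it loads, and the 1,000 steps the original sample ran, the longer run corresponds to roughly 36 passes over the training data rather than fewer than two. The `epochs` helper is hypothetical.

```python
def epochs(num_batches, batch_size=100, train_images=55000):
    """Full passes over the training set made by `num_batches` batches.
    The defaults are assumptions based on the beginners' MNIST sample."""
    return num_batches * batch_size / train_images


print("1,000 batches: %.2f epochs" % epochs(1000))
print("20,000 batches: %.2f epochs" % epochs(20000))
```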

UPDATE: Trying to run some of the other samples this morning, I got crashes in the Python process along with error messages relating to cupti64_80.dll. I eventually figured out that this DLL comes with the CUDA Toolkit and that I already had it installed under CUDA\v8.0\extras\CUPTI\libx64, though that location was not on the PATH. Adding it to the PATH resolved the problem.
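
Changing the system PATH is the permanent fix. For a single script, an alternative is to prepend the CUPTI folder to PATH inside the process before importing TensorFlow; child processes inherit the change too. This is a sketch: the folder assumes the default CUDA 8.0 install location, and `prepend_to_path` is a hypothetical helper. Note that from Python 3.8 on Windows, DLL dependencies of extension modules are no longer searched for on PATH, and `os.add_dll_directory` is the documented replacement there.

```python
import os

# Assumed default CUDA 8.0 location on Windows; adjust to your installation
CUPTI_DIR = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64"


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH unless already present; return the new PATH."""
    env = os.environ if env is None else env
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.insert(0, directory)
    env["PATH"] = os.pathsep.join(entries)
    return env["PATH"]


if __name__ == "__main__" and os.name == "nt":
    prepend_to_path(CUPTI_DIR)  # affects this process and its children only
    # import tensorflow as tf   # import TensorFlow only after PATH is updated
```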