NASDAQ 100 stock data

Download

You can download NASDAQ100 stock data from here.

Full dataset

Description

The NASDAQ 100 stock dataset consists of the stock prices of 104 corporations in the NASDAQ 100 and the value of the NASDAQ 100 index. The data is collected at one-minute frequency and covers the period from July 26, 2016 to April 28, 2017, 191 days in total.

Each day contains 391 data points for every corporation and 390 data points for the NASDAQ 100 index, from the opening to the closing of the market. Data points 2 to 391 of each corporation correspond to data points 1 to 390 of the NASDAQ 100 index.
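The one-point offset described above can be handled by dropping the first stock data point of a day before pairing the two series. The following is a minimal sketch, not part of the dataset's own tooling: the function name align_stock_to_index is hypothetical, the values are synthetic placeholders, and it assumes a day with exactly one more stock point than index points.

```python
def align_stock_to_index(stock_day, index_day):
    """Pair one day's stock points with the same day's index points.

    Assumes stock point 2 corresponds to index point 1 (1-based), so the
    stock series is exactly one point longer than the index series.
    """
    if len(stock_day) != len(index_day) + 1:
        raise ValueError("expected exactly one more stock point than index points")
    # Drop the first stock point so both series have the same length.
    return list(zip(stock_day[1:], index_day))


# Synthetic placeholders: each value is simply its 1-based position in the day.
stock_day = list(range(1, 392))  # stock data points 1..391
index_day = list(range(1, 391))  # index data points 1..390

pairs = align_stock_to_index(stock_day, index_day)
print(len(pairs), pairs[0], pairs[-1])
```

Whether the same offset applies on the shortened trading day should be checked against the files themselves.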

The ticker symbols of the stocks are in the file stock_name.txt.

Symbol Lookup: http://www.google.com/finance/.

Files

Each file in the directory /nasdaq100/full/separate/ contains the stock prices of one corporation for every day.

Columns in the file = “index, date, close, high, low, open, volume”.

On November 25, 2016, every stock has only 210 data points.

The file full_non_padding.csv combines the close prices of all the stocks in the directory /nasdaq100/full/separate/ and the NASDAQ 100 index into a single “.csv” file, with missing data denoted as NaN.

The order of the ticker names in the columns is the same as that in the file stock_name.txt.

Note!!!

Linear Technology (LLTC) has only 157 days of data because Shire PLC (SHPG) replaced it in the NASDAQ 100 index. The close price of LLTC in the file full_non_padding.csv is marked as 0 after 158 days.
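Because the LLTC column uses 0 rather than NaN once the stock leaves the index, it may be convenient to convert those zeros to NaN before analysis. The sketch below shows one way to do this with pandas. It runs on a tiny synthetic frame, the helper name zeros_to_nan is hypothetical, and it assumes the CSV header carries ticker names such as LLTC.

```python
import numpy as np
import pandas as pd


def zeros_to_nan(frame, column):
    """Return a copy of `frame` with zeros in `column` replaced by NaN."""
    out = frame.copy()
    out[column] = out[column].replace(0, np.nan)
    return out


# Synthetic placeholder values, not real prices.
demo = pd.DataFrame({
    "LLTC": [1.0, 2.0, 0.0, 0.0],
    "SHPG": [np.nan, np.nan, 3.0, 4.0],
})

cleaned = zeros_to_nan(demo, "LLTC")
print(cleaned)
```

The original frame is left unchanged, so the raw values remain available for comparison.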

Small dataset

Description

This dataset is a subset of the full NASDAQ 100 stock dataset and is the one used in [1]. It includes 105 days of stock data, from July 26, 2016 to December 22, 2016. Each day contains 390 data points, except for November 25 (210 data points) and December 22 (180 data points).

Some of the corporations in the NASDAQ 100 are not included in this dataset because too much of their data is missing. The dataset contains 81 major corporations in total, and the missing data is filled in by linear interpolation.
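As an illustration of linear interpolation over missing values, the sketch below uses pandas DataFrame.interpolate on a small synthetic frame. It is not necessarily the exact procedure used to build this dataset. The column name AAA and its values are placeholders.

```python
import numpy as np
import pandas as pd

# Synthetic series with interior gaps; values are placeholders, not real prices.
prices = pd.DataFrame({"AAA": [10.0, np.nan, 12.0, np.nan, np.nan, 15.0]})

# method="linear" treats the rows as equally spaced, which matches
# data sampled at a fixed one-minute interval.
filled = prices.interpolate(method="linear")
print(filled["AAA"].tolist())  # [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
```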

In [1], the first 35,100 data points are used as the training set and the following 2,730 data points are used as the validation set. The last 2,730 data points are used as the test set.
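The split sizes above add up to the total number of data points in the small dataset (103 full days of 390 points plus the two short days). The following sketch checks that arithmetic and applies the same chronological split to any sliceable sequence of rows. The function name chronological_split is hypothetical, and the rows here are a synthetic stand-in for the real file.

```python
TRAIN_SIZE = 35_100
VAL_SIZE = 2_730
TEST_SIZE = 2_730

# 105 days: 103 days of 390 points, plus 210 (Nov 25) and 180 (Dec 22).
TOTAL_POINTS = 103 * 390 + 210 + 180


def chronological_split(rows):
    """Split rows in time order into train, validation, and test parts."""
    val_end = TRAIN_SIZE + VAL_SIZE
    test_end = val_end + TEST_SIZE
    return rows[:TRAIN_SIZE], rows[TRAIN_SIZE:val_end], rows[val_end:test_end]


rows = list(range(TOTAL_POINTS))  # synthetic stand-in for the data rows
train, val, test = chronological_split(rows)
print(TOTAL_POINTS, len(train), len(val), len(test))  # 40560 35100 2730 2730
```

Integer slices on a pandas DataFrame also select rows by position, so the same function can be applied to a frame loaded with pd.read_csv.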

File

The file nasdaq100_padding.csv contains the close prices of the 81 stocks and the NASDAQ 100 index (the last column).

The order of the ticker names in the columns is the same as that in the file small_stock_name.txt.

Extended dataset

Description

We also collected the stock prices of 10 corporations that newly joined the NASDAQ 100 index. This data covers the period from March 29, 2017 to April 28, 2017, 23 days in total. The data is again collected at one-minute frequency.

Files

Each file in the directory /nasdaq100/extended/separate/ contains the stock prices of one corporation for every day.

Columns in the file = “index, date, close, high, low, open, volume”.

The file extended_non_padding.csv combines the close prices of all the stocks in the directory /nasdaq100/extended/separate/ into a single “.csv” file, with missing data denoted as NaN.

Columns in the file:
“CTAS, GOOG, HAS, HOLX, IDXX, JBHT, KLAC, LILA, LILAK, SHPG”.

Citation

The dataset can be used for time series prediction and stock market analysis.
If you are going to use this dataset, please cite the following paper:

A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
Qin, Y., Song, D., Cheng, H., Cheng, W., Jiang, G., Cottrell, G.
International Joint Conference on Artificial Intelligence (IJCAI), 2017

Code

You can read the data in the .csv file into a pandas data frame with the following code:

import pandas as pd
data = pd.read_csv('./nasdaq100/small/nasdaq100_padding.csv')


If you have any questions about this dataset, please contact Yao Qin or Dongjin Song.