NASDAQ Data Link#

NASDAQ Data Link (formerly Quandl) provides access to a wide variety of financial and alternative datasets.

Getting Started#

  1. Create an account at data.nasdaq.com/sign-up - choose Academic if you’re a student

  2. Set up two-factor authentication (Google Authenticator recommended)

  3. Get your API key from data.nasdaq.com/account/profile

  4. Store the key as a GitHub Secret (recommended) or in a .env file

Install the package:

pip install nasdaq-data-link

Fig. 37 NASDAQ Data Link homepage.#

Set-up#

import nasdaqdatalink
import os
from dotenv import load_dotenv

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline
# Load API key
load_dotenv()
NASDAQ_API_KEY = os.getenv('NASDAQ_API_KEY')

# Configure the API
nasdaqdatalink.ApiConfig.api_key = NASDAQ_API_KEY
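
Here, load_dotenv() reads a plain-text .env file in the project root. If you go that route (rather than a GitHub Secret), the file just needs one line defining the same variable name the code looks up, for example:

NASDAQ_API_KEY=your_actual_key_here

Keep the .env file out of version control (add it to .gitignore) so your key is never committed.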

Secure API Key Usage#

For GitHub Codespaces, use secrets:

# Recommended approach: read the key from a Codespaces secret
import nasdaqdatalink as ndl
import os

# 'NASDAQ' is whatever name you gave the secret in your repository's
# Codespaces settings
NASDAQ_API_KEY = os.environ.get('NASDAQ')
ndl.ApiConfig.api_key = NASDAQ_API_KEY

Exploring Available Data#

Click EXPLORE on the main data page to see all available datasets. Much of the data is premium (paid), but you can filter for free data.


Fig. 38 Exploring NASDAQ data options.#

Example: Zillow Housing Data#

NASDAQ provides Zillow real estate data for free.


Fig. 39 Zillow housing data on NASDAQ.#

The data is organized into tables:

  • ZILLOW/DATA - The actual housing values

  • ZILLOW/REGIONS - Region identifiers and descriptions

  • ZILLOW/INDICATORS - Indicator descriptions

Understanding the Data Structure#

Before downloading data, you need to understand what filters are available. Check the documentation for filtering options.
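
For example, the ZILLOW/INDICATORS table listed above describes every indicator code. A minimal sketch, using the same get_table call we use for the regions table below:

# Pull the indicator descriptions to see which indicator codes are available
indicators = nasdaqdatalink.get_table('ZILLOW/INDICATORS', paginate=True)
indicators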

Exploring Regions#

Let’s first pull the regions table to understand what geographic areas are available.

regions = nasdaqdatalink.get_table('ZILLOW/REGIONS', paginate=True)
regions

Note

The paginate=True parameter is important! Without it, you only get the first 10,000 rows. With pagination, the limit extends to 1,000,000 rows.
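
Before filtering, it helps to see which region types the table contains. A quick sketch using pandas value_counts:

# Count the regions of each type (e.g., city, county)
regions['region_type'].value_counts()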

# Filter for just cities
cities = regions[regions.region_type == 'city']
cities
cities.info()
# Filter for counties
counties = regions[regions.region_type == 'county']
counties

Finding Specific Regions#

You can search the text in a column to find specific regions. Let’s find all NC counties:

nc_counties = counties[counties['region'].str.contains(';NC')]
nc_counties
nc_counties.info()

North Carolina has 100 counties, so if the filter returns 100 rows, it worked.

Downloading Housing Data#

Now we can use these region IDs to pull actual housing data. First, convert to a list:

nc_county_list = nc_counties['region_id'].to_list()

Now pull the data for a specific indicator. ZATT is one of the Zillow housing indicators; its full description is in the ZILLOW/INDICATORS table.

Warning

Be careful with large downloads! They can exceed API limits and cause timeouts.

# Example: Pull NC county data (commented to avoid API calls during build)
# zillow_nc = nasdaqdatalink.get_table(
#     'ZILLOW/DATA', 
#     indicator_id='ZATT', 
#     paginate=True, 
#     region_id=nc_county_list,
#     qopts={'columns': ['indicator_id', 'region_id', 'date', 'value']}
# )
# zillow_nc.head(25)
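
Once downloaded, zillow_nc is a long table with one row per indicator, region, and date. A natural next step is to pivot it so each county becomes a column and then plot the series. A minimal sketch, also commented out because it depends on the download above:

# zillow_wide = zillow_nc.pivot_table(index='date', columns='region_id', values='value')
# zillow_wide.plot(legend=False, title='ZATT across NC counties')
# plt.show()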

Exporting Data#

You can export to CSV for exploration in Excel:

# Export counties to CSV
counties.to_csv('counties.csv', index=True)

Using Rapid API#

Rapid API is a marketplace for thousands of APIs - markets, sports, weather, and more. Many have free tiers.


Fig. 40 Rapid API marketplace.#

REST APIs#

Most APIs on Rapid API are REST APIs: a standardized way for programs to request data over the web, with responses returned in common formats like JSON.

Learn more at the API Learn page.
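
Each API's page on Rapid API generates a request snippet for you. In Python it typically boils down to a URL plus a headers dictionary holding your key (the exact header names appear in the snippet Rapid API shows you). A minimal sketch, where the endpoint and host below are placeholders rather than a real API:

import os

# Placeholder endpoint - copy the real one from the API's page on Rapid API
url = 'https://example-api.p.rapidapi.com/markets'

# Rapid API authenticates with request headers; keep the key in an
# environment variable (named however you like) or a secret, just like the NASDAQ key earlier
headers = {
    'X-RapidAPI-Key': os.environ.get('RAPIDAPI_KEY'),
    'X-RapidAPI-Host': 'example-api.p.rapidapi.com',
}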

Working with JSON Data#

API responses typically come in JSON format. Here’s how to handle them:

import requests
import pandas as pd

# Make the API request (url and headers come from the specific API you're calling)
response = requests.get(url, headers=headers)

# Convert the response body to JSON (nested dictionaries and lists)
data_json = response.json()

# Flatten the nested JSON into a DataFrame
df = pd.json_normalize(data=data_json)

Fig. 41 JSON structure - nested dictionaries and lists.#

For deeply nested JSON, use the record_path parameter:

# Flatten nested 'events' key
df = pd.json_normalize(data=data_json, record_path=['events'])
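
As a concrete illustration, here is a minimal sketch with a made-up nested structure; the 'events' key and its fields are hypothetical:

# A small, made-up response with a nested 'events' list
data_json = {
    'league': 'Example League',
    'events': [
        {'home': 'Team A', 'away': 'Team B', 'score': 3},
        {'home': 'Team C', 'away': 'Team D', 'score': 1},
    ],
}

# record_path turns each item in 'events' into its own row;
# meta carries top-level fields (like 'league') along as columns
df = pd.json_normalize(data=data_json, record_path=['events'], meta=['league'])
df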

Data on Kaggle#

Kaggle is another great source for datasets. Search their datasets for finance-related data.

Examples:

  • Consumer Finance Complaints

  • Stock price histories

  • Economic indicators

Kaggle data is usually pre-cleaned for competitions, but you’ll still need to do some preparation work.


By Prof. Adam Aiken