WRDS (Wharton Research Data Services)#

WRDS is the gold standard for academic financial research. It provides access to dozens of research databases used in peer-reviewed finance and accounting research.

What is WRDS?#

WRDS is a data platform maintained by the Wharton School at the University of Pennsylvania. It hosts:

  • CRSP - Stock prices, returns, and market data back to 1926

  • Compustat - Financial statements for public companies

  • IBES - Analyst earnings forecasts

  • TAQ - High-frequency trade and quote data

  • Thomson Reuters - Institutional holdings (13F filings)

  • And many more…

Getting Access: Student Class Accounts#

WRDS class accounts let you work with real financial data to complete assignments. You’ll need a Class Code from your instructor.

New to WRDS? How to Enroll#

  1. Go to the WRDS Registration form

  2. Enter your identifying information

  3. Select Elon University from the Subscriber drop-down

  4. Your User type should be Class - Students with Code (selected by default)

  5. Enter the Class Code provided by your instructor

  6. Click Register for WRDS

Note

WRDS requires two-factor authentication. Install the Duo Mobile app on your smartphone before registering. You’ll use this for all future logins.

Already Have a WRDS Account?#

If you already have a WRDS account from a previous class:

  1. Log in at wrds-www.wharton.upenn.edu

  2. Go to Your AccountYour Account Info (top right)

  3. Scroll to the Your Classes table

  4. Click Enroll in a Class

  5. Enter your new Class Code and click Submit

Tip

If your previous class has expired, you’ll see a message when you try to log in. Just enter your new Class Code in the box provided and confirm enrollment.

Need Help?#

If you have trouble setting up your account, contact WRDS Support. Use the email associated with your WRDS account when opening a support ticket.

Data Strategy: What’s Available and How to Combine It#

Before diving into code, think about what data you need and where to get it. WRDS is powerful, but it’s just one piece of your data toolkit. With Claude Code, you don’t need to memorize SQL syntax—focus on understanding what data exists and how to link it together.

Elon’s WRDS Subscriptions#

Database

Library

What It Contains

Key Identifiers

CRSP

crsp

Stock prices, returns, volume (back to 1926)

permno, cusip

Compustat

comp

Financial statements (income, balance sheet, cash flow)

gvkey, tic (ticker)

IBES

ibes

Analyst earnings forecasts and recommendations

ticker

Dow Jones

djones

Dow Jones Index historical data

date

Note

WRDS also offers many free datasets including Dow Jones Averages, Penn World Tables, and others. Use db.list_libraries() to see everything available.

The Identifier Problem#

Different databases use different identifiers for the same company:

Source

Identifier

Example (Apple)

CRSP

permno

14593

Compustat

gvkey

1690

Compustat

tic

AAPL

Bloomberg

Ticker

AAPL US Equity

Yahoo Finance

Ticker

AAPL

SEC EDGAR

CIK

0000320193

Ticker symbols are the easiest way to link datasets, but be careful:

  • Tickers change (Google was GOOG, then added GOOGL)

  • Tickers get reused (old companies’ tickers assigned to new ones)

  • Different sources format tickers differently

Combining WRDS with Other Data Sources#

You’ll often need to combine WRDS data with other sources:

Data Need

WRDS Source

External Source

Link By

Stock returns + financials

CRSP + Compustat

tic (ticker) in pandas

Fundamentals + real-time prices

Compustat

Yahoo Finance, Bloomberg

Ticker symbol

Historical returns + ESG scores

CRSP

Bloomberg, Refinitiv

Ticker symbol

Analyst forecasts + stock performance

IBES + CRSP

tickertic

Macro data + stock returns

CRSP

FRED API

Date alignment

Practical Workflow#

Since you have Claude Code, here’s a realistic workflow:

  1. Pull historical data from WRDS (CRSP for returns, Compustat for financials)

  2. Export to CSV or keep in DataFrame

  3. Pull complementary data from Bloomberg, Yahoo Finance, or FRED

  4. Merge in pandas using ticker symbol and date

# Example: Merge WRDS data with Yahoo Finance data
import yfinance as yf

# Get fundamentals from Compustat (via WRDS)
compustat_data = db.raw_sql("""
    SELECT tic, datadate, at, ni, sale
    FROM comp.funda
    WHERE tic = 'AAPL' AND datafmt = 'STD'
""")

# Get recent prices from Yahoo Finance
aapl = yf.download('AAPL', start='2023-01-01')

# Now you have both historical fundamentals AND current prices

What to Ask Claude Code#

When working with WRDS, tell Claude Code:

  • What data you need (e.g., “daily stock returns for Apple from 2020-2023”)

  • Which database to use (e.g., “use CRSP for returns, Compustat for financials”)

  • What identifiers you have (e.g., “I have ticker symbols” or “I have permno values”)

  • How you want to combine data (e.g., “merge with Yahoo Finance data by ticker”)

Claude Code will handle the SQL syntax—you focus on the research question.

Set-up#

Install the WRDS Python library:

pip install wrds

Connecting to WRDS#

import wrds

# First time: will prompt for username/password
# Creates a .pgpass file for future connections
db = wrds.Connection()

After your first connection, credentials are stored locally and you won’t need to enter them again.

How WRDS Data is Organized#

Before querying data, it helps to understand WRDS’s structure:

  • Libraries - Top-level data providers (e.g., crsp, comp, djones)

  • Tables - Datasets within each library (e.g., dsf, funda, djdaily)

  • Variables - Column headers within each table (e.g., date, ret, prc)

Think of it like: library.tablecrsp.dsf (CRSP Daily Stock File)

Discovering What’s Available#

import wrds
db = wrds.Connection()

# List all available libraries
sorted(db.list_libraries())

# List tables in a specific library
db.list_tables(library='crsp')

# Describe a table's columns (variables)
db.describe_table(library='crsp', table='dsf')

# Get the row count for a table
db.get_row_count(library='djones', table='djdaily')

Tip

You can also browse the WRDS Dataset List online to explore available data without writing code.

Warning

Library and table names must be all lowercase. Using CRSP instead of crsp will cause errors.

Querying WRDS Data#

The wrds module provides two main methods for retrieving data:

Method

Best For

Flexibility

get_table()

Simple queries, quick exploration

Lower

raw_sql()

Complex queries, joins, filters

Higher

All query results are returned as pandas DataFrames, ready for analysis.

Using get_table()#

The simplest way to retrieve data—good for exploring datasets:

# Get first 10 rows of Dow Jones data
data = db.get_table(library='djones', table='djdaily', obs=10)

# Select specific columns only
data = db.get_table('djones', 'djdaily', 
                    columns=['date', 'dji'], 
                    obs=10)

Key parameters:

  • library - The library to query (required)

  • table - The dataset to query (required)

  • columns - List of columns to include (optional)

  • obs - Number of observations to return (optional)

Using raw_sql()#

For more control, write SQL queries directly. This is what you’ll use for most research:

# Basic SQL query
data = db.raw_sql("""
    SELECT date, dji 
    FROM djones.djdaily 
    LIMIT 10
""", date_cols=['date'])

# Query with filters
data = db.raw_sql("""
    SELECT permno, date, prc, ret
    FROM crsp.dsf
    WHERE permno = 14593
    AND date BETWEEN '2023-01-01' AND '2023-12-31'
""", date_cols=['date'])

Key parameters:

  • sql - The SQL query string (required)

  • date_cols - List of columns to parse as dates (recommended)

Important

SQL Basics:

  • SELECT specifies which columns you want

  • FROM library.table specifies the data source

  • WHERE filters rows based on conditions

  • LIMIT restricts the number of rows returned

Always Limit Results While Developing#

WRDS datasets can be huge. Always limit your results while testing:

# With get_table(): use obs parameter
data = db.get_table('crsp', 'dsf', obs=100)

# With raw_sql(): use LIMIT clause
data = db.raw_sql("SELECT * FROM crsp.dsf LIMIT 100")

Once your query is working correctly, remove the limit to get the full dataset.

Warning

Running queries without limits on large tables (like crsp.dsf) can take a very long time and may timeout. Start small!

Example Workflow: Exploring CRSP Data#

Let’s walk through a typical research workflow using the CRSP Daily Stock File (crsp.dsf). This mirrors how you’d approach a real analysis.

Step 1: Explore the Data Structure#

import wrds
db = wrds.Connection()

# What tables are in CRSP?
db.list_tables('crsp')

# What variables are in the Daily Stock File?
db.describe_table('crsp', 'dsf')

Step 2: Preview the Data#

Always look at a small sample first:

# Get first 100 rows to see what we're working with
sample = db.get_table('crsp', 'dsf', obs=100)
sample.head()

Step 3: Select Specific Columns#

Based on our preview, let’s focus on the variables we need:

# Get specific columns only
data = db.raw_sql("""
    SELECT cusip, permno, date, bidlo, askhi, prc, ret
    FROM crsp.dsf 
    LIMIT 100
""", date_cols=['date'])

Step 4: Filter for Specific Securities#

Query data for specific companies using permno (CRSP’s permanent identifier):

# Query multiple stocks for a single day
data = db.raw_sql("""
    SELECT cusip, permno, date, bidlo, askhi
    FROM crsp.dsf
    WHERE permno IN (14593, 90319, 12490, 17778)
    AND date = '2013-01-04'
""", date_cols=['date'])

Step 5: Filter by Date Range and Conditions#

Add more sophisticated filters:

# Find high ask prices between 2010-2013
data = db.raw_sql("""
    SELECT cusip, permno, date, bidlo, askhi 
    FROM crsp.dsf 
    WHERE permno IN (14593, 90319, 12490, 17778) 
    AND date BETWEEN '2010-01-01' AND '2013-12-31' 
    AND askhi > 2000
""", date_cols=['date'])

Step 6: Aggregate and Analyze#

Use SQL aggregation functions:

# Find all stocks that ever had ask price > $2000
data = db.raw_sql("""
    SELECT DISTINCT permno 
    FROM crsp.dsf 
    WHERE askhi > 2000
""")

# Find the highest ask price ever recorded
data = db.raw_sql("""
    SELECT permno, askhi, date 
    FROM crsp.dsf 
    WHERE askhi > 2000 
    ORDER BY askhi DESC 
    LIMIT 1
""", date_cols=['date'])

Step 7: Work with Results in Pandas#

Once you have your data, use pandas for further analysis:

# Query returns data
returns = db.raw_sql("""
    SELECT permno, date, ret
    FROM crsp.dsf
    WHERE permno = 14593
    AND date BETWEEN '2023-01-01' AND '2023-12-31'
""", date_cols=['date'])

# Set date as index
returns = returns.set_index('date')

# Calculate cumulative returns
returns['cum_ret'] = (1 + returns['ret']).cumprod() - 1

# Plot
returns['cum_ret'].plot(title='Apple Cumulative Returns 2023')

Key CRSP Variables#

Variable

Description

permno

Permanent security identifier

cusip

CUSIP identifier

date

Trading date

prc

Closing price (negative = bid/ask average)

ret

Holding period return

bidlo

Lowest bid price

askhi

Highest ask price

vol

Trading volume (shares)

shrout

Shares outstanding (thousands)

Example: Compustat Financial Data#

Compustat (comp) contains accounting data from financial statements.

# Query annual financial data for Apple
query = """
SELECT gvkey, datadate, conm, tic, at, sale, ni, ceq
FROM comp.funda
WHERE tic = 'AAPL'
AND datafmt = 'STD'
AND indfmt = 'INDL'
AND consol = 'C'
AND popsrc = 'D'
ORDER BY datadate
"""

aapl_financials = db.raw_sql(query, date_cols=['datadate'])

Note

The filters datafmt = 'STD', indfmt = 'INDL', consol = 'C', and popsrc = 'D' are standard Compustat filters that select standardized, consolidated, domestic data. Always include them.

Key Compustat Variables#

Variable

Description

gvkey

Global company key (Compustat identifier)

tic

Ticker symbol

conm

Company name

datadate

Fiscal period end date

at

Total assets

sale

Net sales/revenue

ni

Net income

ceq

Common equity

lt

Total liabilities

Joining Tables#

You can join multiple tables within the same library in a single query:

# Join Compustat fundamentals with monthly pricing
query = """
SELECT a.gvkey, a.datadate, a.tic, a.conm, a.at, a.lt, 
       b.prccm, b.cshoq
FROM comp.funda a
JOIN comp.secm b 
    ON a.gvkey = b.gvkey 
    AND a.datadate = b.datadate
WHERE a.tic = 'IBM' 
AND a.datafmt = 'STD' 
AND a.consol = 'C' 
AND a.indfmt = 'INDL'
"""

ibm_data = db.raw_sql(query, date_cols=['datadate'])

Warning

Joining large datasets can be slow. Always filter by date range or specific securities.

Note

Some institutions subscribe to the CRSP/Compustat Merged database, which provides a linking table to connect stock returns (CRSP) with financial statements (Compustat). Check with your instructor about which databases are available through your class subscription.

Passing Python Variables to SQL#

You can pass Python variables to your SQL queries. This is cleaner than building strings manually:

# Define parameters as a dictionary
params = {
    "tickers": ("AAPL", "MSFT", "GOOGL", "AMZN"),
    "start_date": "2020-01-01"
}

# Use %(param_name)s syntax in SQL
data = db.raw_sql("""
    SELECT gvkey, datadate, tic, at, ni
    FROM comp.funda
    WHERE tic IN %(tickers)s
    AND datafmt = 'STD'
    AND datadate >= %(start_date)s
""", params=params, date_cols=['datadate'])

Another Example#

# Pass a list of CRSP identifiers
params = {"permnos": (14593, 10107, 93436)}  # AAPL, MSFT, AMZN

data = db.raw_sql("""
    SELECT permno, date, ret 
    FROM crsp.dsf
    WHERE permno IN %(permnos)s
    AND date BETWEEN '2023-01-01' AND '2023-12-31'
""", params=params, date_cols=['date'])

This is especially useful when your list of securities comes from another data source or earlier analysis.

Managing Your Connection#

WRDS limits you to 5 simultaneous connections. Always close your connection when done:

import wrds

# Connect
db = wrds.Connection()

# Do your work
data = db.raw_sql("SELECT date, dji FROM djones.djdaily LIMIT 10")

# Close when finished
db.close()

You can still use your data DataFrame after closing—only the database link is severed.

Resources#

WRDS Documentation#

Dataset Manuals#

Getting Help#