WRDS (Wharton Research Data Services)

WRDS (Wharton Research Data Services)#

WRDS is the gold standard for academic financial research. It provides access to dozens of research databases used in peer-reviewed finance and accounting research.

What is WRDS?#

WRDS is a data platform maintained by the Wharton School at the University of Pennsylvania. It hosts:

CRSP - Stock prices, returns, and market data back to 1926
Compustat - Financial statements for public companies
IBES - Analyst earnings forecasts
TAQ - High-frequency trade and quote data
Thomson Reuters - Institutional holdings (13F filings)
And many more…

Getting Access: Student Class Accounts#

WRDS class accounts let you work with real financial data to complete assignments. You’ll need a Class Code from your instructor.

New to WRDS? How to Enroll#

Go to the WRDS Registration form
Enter your identifying information
Select Elon University from the Subscriber drop-down
Your User type should be Class - Students with Code (selected by default)
Enter the Class Code provided by your instructor
Click Register for WRDS

Note

WRDS requires two-factor authentication. Install the Duo Mobile app on your smartphone before registering. You’ll use this for all future logins.

Already Have a WRDS Account?#

If you already have a WRDS account from a previous class:

Log in at wrds-www.wharton.upenn.edu
Go to Your Account → Your Account Info (top right)
Scroll to the Your Classes table
Click Enroll in a Class
Enter your new Class Code and click Submit

Tip

If your previous class has expired, you’ll see a message when you try to log in. Just enter your new Class Code in the box provided and confirm enrollment.

Need Help?#

If you have trouble setting up your account, contact WRDS Support. Use the email associated with your WRDS account when opening a support ticket.

Data Strategy: What’s Available and How to Combine It#

Before diving into code, think about what data you need and where to get it. WRDS is powerful, but it’s just one piece of your data toolkit. With Claude Code, you don’t need to memorize SQL syntax—focus on understanding what data exists and how to link it together.

Elon’s WRDS Subscriptions#

Database	Library	What It Contains	Key Identifiers
CRSP	`crsp`	Stock prices, returns, volume (back to 1926)	`permno`, `cusip`
Compustat	`comp`	Financial statements (income, balance sheet, cash flow)	`gvkey`, `tic` (ticker)
IBES	`ibes`	Analyst earnings forecasts and recommendations	`ticker`
Dow Jones	`djones`	Dow Jones Index historical data	`date`

Note

WRDS also offers many free datasets including Dow Jones Averages, Penn World Tables, and others. Use db.list_libraries() to see everything available.

The Identifier Problem#

Different databases use different identifiers for the same company:

Source	Identifier	Example (Apple)
CRSP	`permno`	14593
Compustat	`gvkey`	1690
Compustat	`tic`	AAPL
Bloomberg	Ticker	AAPL US Equity
Yahoo Finance	Ticker	AAPL
SEC EDGAR	CIK	0000320193

Ticker symbols are the easiest way to link datasets, but be careful:

Tickers change (Google was GOOG, then added GOOGL)
Tickers get reused (old companies’ tickers assigned to new ones)
Different sources format tickers differently

Combining WRDS with Other Data Sources#

You’ll often need to combine WRDS data with other sources:

Data Need	WRDS Source	External Source	Link By
Stock returns + financials	CRSP + Compustat	—	`tic` (ticker) in pandas
Fundamentals + real-time prices	Compustat	Yahoo Finance, Bloomberg	Ticker symbol
Historical returns + ESG scores	CRSP	Bloomberg, Refinitiv	Ticker symbol
Analyst forecasts + stock performance	IBES + CRSP	—	`ticker` → `tic`
Macro data + stock returns	CRSP	FRED API	Date alignment

Practical Workflow#

Since you have Claude Code, here’s a realistic workflow:

Pull historical data from WRDS (CRSP for returns, Compustat for financials)
Export to CSV or keep in DataFrame
Pull complementary data from Bloomberg, Yahoo Finance, or FRED
Merge in pandas using ticker symbol and date

# Example: Merge WRDS data with Yahoo Finance data
import yfinance as yf

# Get fundamentals from Compustat (via WRDS)
compustat_data = db.raw_sql("""
    SELECT tic, datadate, at, ni, sale
    FROM comp.funda
    WHERE tic = 'AAPL' AND datafmt = 'STD'
""")

# Get recent prices from Yahoo Finance
aapl = yf.download('AAPL', start='2023-01-01')

# Now you have both historical fundamentals AND current prices

What to Ask Claude Code#

When working with WRDS, tell Claude Code:

What data you need (e.g., “daily stock returns for Apple from 2020-2023”)
Which database to use (e.g., “use CRSP for returns, Compustat for financials”)
What identifiers you have (e.g., “I have ticker symbols” or “I have permno values”)
How you want to combine data (e.g., “merge with Yahoo Finance data by ticker”)

Claude Code will handle the SQL syntax—you focus on the research question.

Set-up#

Install the WRDS Python library:

pip install wrds

Connecting to WRDS#

import wrds

# First time: will prompt for username/password
# Creates a .pgpass file for future connections
db = wrds.Connection()

After your first connection, credentials are stored locally and you won’t need to enter them again.

How WRDS Data is Organized#

Before querying data, it helps to understand WRDS’s structure:

Libraries - Top-level data providers (e.g., crsp, comp, djones)
Tables - Datasets within each library (e.g., dsf, funda, djdaily)
Variables - Column headers within each table (e.g., date, ret, prc)

Think of it like: library.table → crsp.dsf (CRSP Daily Stock File)

Discovering What’s Available#

import wrds
db = wrds.Connection()

# List all available libraries
sorted(db.list_libraries())

# List tables in a specific library
db.list_tables(library='crsp')

# Describe a table's columns (variables)
db.describe_table(library='crsp', table='dsf')

# Get the row count for a table
db.get_row_count(library='djones', table='djdaily')

Tip

You can also browse the WRDS Dataset List online to explore available data without writing code.

Warning

Library and table names must be all lowercase. Using CRSP instead of crsp will cause errors.

Querying WRDS Data#

The wrds module provides two main methods for retrieving data:

Method	Best For	Flexibility
`get_table()`	Simple queries, quick exploration	Lower
`raw_sql()`	Complex queries, joins, filters	Higher

All query results are returned as pandas DataFrames, ready for analysis.

Using get_table()#

The simplest way to retrieve data—good for exploring datasets:

# Get first 10 rows of Dow Jones data
data = db.get_table(library='djones', table='djdaily', obs=10)

# Select specific columns only
data = db.get_table('djones', 'djdaily', 
                    columns=['date', 'dji'], 
                    obs=10)

Key parameters:

library - The library to query (required)
table - The dataset to query (required)
columns - List of columns to include (optional)
obs - Number of observations to return (optional)

Using raw_sql()#

For more control, write SQL queries directly. This is what you’ll use for most research:

# Basic SQL query
data = db.raw_sql("""
    SELECT date, dji 
    FROM djones.djdaily 
    LIMIT 10
""", date_cols=['date'])

# Query with filters
data = db.raw_sql("""
    SELECT permno, date, prc, ret
    FROM crsp.dsf
    WHERE permno = 14593
    AND date BETWEEN '2023-01-01' AND '2023-12-31'
""", date_cols=['date'])

Key parameters:

sql - The SQL query string (required)
date_cols - List of columns to parse as dates (recommended)

Important

SQL Basics:

SELECT specifies which columns you want
FROM library.table specifies the data source
WHERE filters rows based on conditions
LIMIT restricts the number of rows returned

Always Limit Results While Developing#

WRDS datasets can be huge. Always limit your results while testing:

# With get_table(): use obs parameter
data = db.get_table('crsp', 'dsf', obs=100)

# With raw_sql(): use LIMIT clause
data = db.raw_sql("SELECT * FROM crsp.dsf LIMIT 100")

Once your query is working correctly, remove the limit to get the full dataset.

Warning

Running queries without limits on large tables (like crsp.dsf) can take a very long time and may timeout. Start small!

Example Workflow: Exploring CRSP Data#

Let’s walk through a typical research workflow using the CRSP Daily Stock File (crsp.dsf). This mirrors how you’d approach a real analysis.

Step 1: Explore the Data Structure#

import wrds
db = wrds.Connection()

# What tables are in CRSP?
db.list_tables('crsp')

# What variables are in the Daily Stock File?
db.describe_table('crsp', 'dsf')

Step 2: Preview the Data#

Always look at a small sample first:

# Get first 100 rows to see what we're working with
sample = db.get_table('crsp', 'dsf', obs=100)
sample.head()

Step 3: Select Specific Columns#

Based on our preview, let’s focus on the variables we need:

# Get specific columns only
data = db.raw_sql("""
    SELECT cusip, permno, date, bidlo, askhi, prc, ret
    FROM crsp.dsf 
    LIMIT 100
""", date_cols=['date'])

Step 4: Filter for Specific Securities#

Query data for specific companies using permno (CRSP’s permanent identifier):

# Query multiple stocks for a single day
data = db.raw_sql("""
    SELECT cusip, permno, date, bidlo, askhi
    FROM crsp.dsf
    WHERE permno IN (14593, 90319, 12490, 17778)
    AND date = '2013-01-04'
""", date_cols=['date'])

Step 5: Filter by Date Range and Conditions#

Add more sophisticated filters:

# Find high ask prices between 2010-2013
data = db.raw_sql("""
    SELECT cusip, permno, date, bidlo, askhi 
    FROM crsp.dsf 
    WHERE permno IN (14593, 90319, 12490, 17778) 
    AND date BETWEEN '2010-01-01' AND '2013-12-31' 
    AND askhi > 2000
""", date_cols=['date'])

Step 6: Aggregate and Analyze#

Use SQL aggregation functions:

# Find all stocks that ever had ask price > $2000
data = db.raw_sql("""
    SELECT DISTINCT permno 
    FROM crsp.dsf 
    WHERE askhi > 2000
""")

# Find the highest ask price ever recorded
data = db.raw_sql("""
    SELECT permno, askhi, date 
    FROM crsp.dsf 
    WHERE askhi > 2000 
    ORDER BY askhi DESC 
    LIMIT 1
""", date_cols=['date'])

Step 7: Work with Results in Pandas#

Once you have your data, use pandas for further analysis:

# Query returns data
returns = db.raw_sql("""
    SELECT permno, date, ret
    FROM crsp.dsf
    WHERE permno = 14593
    AND date BETWEEN '2023-01-01' AND '2023-12-31'
""", date_cols=['date'])

# Set date as index
returns = returns.set_index('date')

# Calculate cumulative returns
returns['cum_ret'] = (1 + returns['ret']).cumprod() - 1

# Plot
returns['cum_ret'].plot(title='Apple Cumulative Returns 2023')

Key CRSP Variables#

Variable	Description
`permno`	Permanent security identifier
`cusip`	CUSIP identifier
`date`	Trading date
`prc`	Closing price (negative = bid/ask average)
`ret`	Holding period return
`bidlo`	Lowest bid price
`askhi`	Highest ask price
`vol`	Trading volume (shares)
`shrout`	Shares outstanding (thousands)

Example: Compustat Financial Data#

Compustat (comp) contains accounting data from financial statements.

# Query annual financial data for Apple
query = """
SELECT gvkey, datadate, conm, tic, at, sale, ni, ceq
FROM comp.funda
WHERE tic = 'AAPL'
AND datafmt = 'STD'
AND indfmt = 'INDL'
AND consol = 'C'
AND popsrc = 'D'
ORDER BY datadate
"""

aapl_financials = db.raw_sql(query, date_cols=['datadate'])

Note

The filters datafmt = 'STD', indfmt = 'INDL', consol = 'C', and popsrc = 'D' are standard Compustat filters that select standardized, consolidated, domestic data. Always include them.

Key Compustat Variables#

Variable	Description
`gvkey`	Global company key (Compustat identifier)
`tic`	Ticker symbol
`conm`	Company name
`datadate`	Fiscal period end date
`at`	Total assets
`sale`	Net sales/revenue
`ni`	Net income
`ceq`	Common equity
`lt`	Total liabilities

Joining Tables#

You can join multiple tables within the same library in a single query:

# Join Compustat fundamentals with monthly pricing
query = """
SELECT a.gvkey, a.datadate, a.tic, a.conm, a.at, a.lt, 
       b.prccm, b.cshoq
FROM comp.funda a
JOIN comp.secm b 
    ON a.gvkey = b.gvkey 
    AND a.datadate = b.datadate
WHERE a.tic = 'IBM' 
AND a.datafmt = 'STD' 
AND a.consol = 'C' 
AND a.indfmt = 'INDL'
"""

ibm_data = db.raw_sql(query, date_cols=['datadate'])

Warning

Joining large datasets can be slow. Always filter by date range or specific securities.

Note

Some institutions subscribe to the CRSP/Compustat Merged database, which provides a linking table to connect stock returns (CRSP) with financial statements (Compustat). Check with your instructor about which databases are available through your class subscription.

Passing Python Variables to SQL#

You can pass Python variables to your SQL queries. This is cleaner than building strings manually:

# Define parameters as a dictionary
params = {
    "tickers": ("AAPL", "MSFT", "GOOGL", "AMZN"),
    "start_date": "2020-01-01"
}

# Use %(param_name)s syntax in SQL
data = db.raw_sql("""
    SELECT gvkey, datadate, tic, at, ni
    FROM comp.funda
    WHERE tic IN %(tickers)s
    AND datafmt = 'STD'
    AND datadate >= %(start_date)s
""", params=params, date_cols=['datadate'])

Another Example#

# Pass a list of CRSP identifiers
params = {"permnos": (14593, 10107, 93436)}  # AAPL, MSFT, AMZN

data = db.raw_sql("""
    SELECT permno, date, ret 
    FROM crsp.dsf
    WHERE permno IN %(permnos)s
    AND date BETWEEN '2023-01-01' AND '2023-12-31'
""", params=params, date_cols=['date'])

This is especially useful when your list of securities comes from another data source or earlier analysis.

Managing Your Connection#

WRDS limits you to 5 simultaneous connections. Always close your connection when done:

import wrds

# Connect
db = wrds.Connection()

# Do your work
data = db.raw_sql("SELECT date, dji FROM djones.djdaily LIMIT 10")

# Close when finished
db.close()

You can still use your data DataFrame after closing—only the database link is severed.

Using a Context Manager (Recommended)#

Python’s with statement automatically closes the connection:

import wrds

with wrds.Connection() as db:
    data = db.raw_sql("SELECT date, dji FROM djones.djdaily")
# Connection closes automatically here

Tip

If you get “too many connections” errors:

Close any open connections with db.close()
Restart your Jupyter kernel
Wait a few minutes for old connections to timeout

Resources#

WRDS Documentation#

Dataset Manuals#

Getting Help#

WRDS Support Contact
Python inline help: help(db.raw_sql) or help(db.get_table)

WRDS (Wharton Research Data Services)

Contents

WRDS (Wharton Research Data Services)#

What is WRDS?#

Getting Access: Student Class Accounts#

New to WRDS? How to Enroll#

Already Have a WRDS Account?#

Need Help?#

Data Strategy: What’s Available and How to Combine It#

Elon’s WRDS Subscriptions#

The Identifier Problem#

Combining WRDS with Other Data Sources#

Practical Workflow#

What to Ask Claude Code#

Set-up#

Connecting to WRDS#

How WRDS Data is Organized#

Discovering What’s Available#

Querying WRDS Data#

Using get_table()#

Using raw_sql()#

Always Limit Results While Developing#

Example Workflow: Exploring CRSP Data#

Step 1: Explore the Data Structure#

Step 2: Preview the Data#

Step 3: Select Specific Columns#

Step 4: Filter for Specific Securities#

Step 5: Filter by Date Range and Conditions#

Step 6: Aggregate and Analyze#

Step 7: Work with Results in Pandas#

Key CRSP Variables#

Example: Compustat Financial Data#

Key Compustat Variables#

Joining Tables#

Passing Python Variables to SQL#

Another Example#

Managing Your Connection#

Using a Context Manager (Recommended)#

Resources#

WRDS Documentation#

Dataset Manuals#

Getting Help#