WRDS (Wharton Research Data Services)#
WRDS is the gold standard for academic financial research. It provides access to dozens of research databases used in peer-reviewed finance and accounting research.
What is WRDS?#
WRDS is a data platform maintained by the Wharton School at the University of Pennsylvania. It hosts:
CRSP - Stock prices, returns, and market data back to 1926
Compustat - Financial statements for public companies
IBES - Analyst earnings forecasts
TAQ - High-frequency trade and quote data
Thomson Reuters - Institutional holdings (13F filings)
And many more…
Getting Access: Student Class Accounts#
WRDS class accounts let you work with real financial data to complete assignments. You’ll need a Class Code from your instructor.
New to WRDS? How to Enroll#
Go to the WRDS Registration form
Enter your identifying information
Select Elon University from the Subscriber drop-down
Your User type should be Class - Students with Code (selected by default)
Enter the Class Code provided by your instructor
Click Register for WRDS
Note
WRDS requires two-factor authentication. Install the Duo Mobile app on your smartphone before registering. You’ll use this for all future logins.
Already Have a WRDS Account?#
If you already have a WRDS account from a previous class:
Log in at wrds-www.wharton.upenn.edu
Go to Your Account → Your Account Info (top right)
Scroll to the Your Classes table
Click Enroll in a Class
Enter your new Class Code and click Submit
Tip
If your previous class has expired, you’ll see a message when you try to log in. Just enter your new Class Code in the box provided and confirm enrollment.
Need Help?#
If you have trouble setting up your account, contact WRDS Support. Use the email associated with your WRDS account when opening a support ticket.
Data Strategy: What’s Available and How to Combine It#
Before diving into code, think about what data you need and where to get it. WRDS is powerful, but it’s just one piece of your data toolkit. With Claude Code, you don’t need to memorize SQL syntax—focus on understanding what data exists and how to link it together.
Elon’s WRDS Subscriptions#
Database |
Library |
What It Contains |
Key Identifiers |
|---|---|---|---|
CRSP |
|
Stock prices, returns, volume (back to 1926) |
|
Compustat |
|
Financial statements (income, balance sheet, cash flow) |
|
IBES |
|
Analyst earnings forecasts and recommendations |
|
Dow Jones |
|
Dow Jones Index historical data |
|
Note
WRDS also offers many free datasets including Dow Jones Averages, Penn World Tables, and others. Use db.list_libraries() to see everything available.
The Identifier Problem#
Different databases use different identifiers for the same company:
Source |
Identifier |
Example (Apple) |
|---|---|---|
CRSP |
|
14593 |
Compustat |
|
1690 |
Compustat |
|
AAPL |
Bloomberg |
Ticker |
AAPL US Equity |
Yahoo Finance |
Ticker |
AAPL |
SEC EDGAR |
CIK |
0000320193 |
Ticker symbols are the easiest way to link datasets, but be careful:
Tickers change (Google was
GOOG, then addedGOOGL)Tickers get reused (old companies’ tickers assigned to new ones)
Different sources format tickers differently
Combining WRDS with Other Data Sources#
You’ll often need to combine WRDS data with other sources:
Data Need |
WRDS Source |
External Source |
Link By |
|---|---|---|---|
Stock returns + financials |
CRSP + Compustat |
— |
|
Fundamentals + real-time prices |
Compustat |
Yahoo Finance, Bloomberg |
Ticker symbol |
Historical returns + ESG scores |
CRSP |
Bloomberg, Refinitiv |
Ticker symbol |
Analyst forecasts + stock performance |
IBES + CRSP |
— |
|
Macro data + stock returns |
CRSP |
FRED API |
Date alignment |
Practical Workflow#
Since you have Claude Code, here’s a realistic workflow:
Pull historical data from WRDS (CRSP for returns, Compustat for financials)
Export to CSV or keep in DataFrame
Pull complementary data from Bloomberg, Yahoo Finance, or FRED
Merge in pandas using ticker symbol and date
# Example: Merge WRDS data with Yahoo Finance data
import yfinance as yf
# Get fundamentals from Compustat (via WRDS)
compustat_data = db.raw_sql("""
SELECT tic, datadate, at, ni, sale
FROM comp.funda
WHERE tic = 'AAPL' AND datafmt = 'STD'
""")
# Get recent prices from Yahoo Finance
aapl = yf.download('AAPL', start='2023-01-01')
# Now you have both historical fundamentals AND current prices
What to Ask Claude Code#
When working with WRDS, tell Claude Code:
What data you need (e.g., “daily stock returns for Apple from 2020-2023”)
Which database to use (e.g., “use CRSP for returns, Compustat for financials”)
What identifiers you have (e.g., “I have ticker symbols” or “I have permno values”)
How you want to combine data (e.g., “merge with Yahoo Finance data by ticker”)
Claude Code will handle the SQL syntax—you focus on the research question.
Set-up#
Install the WRDS Python library:
pip install wrds
Connecting to WRDS#
import wrds
# First time: will prompt for username/password
# Creates a .pgpass file for future connections
db = wrds.Connection()
After your first connection, credentials are stored locally and you won’t need to enter them again.
How WRDS Data is Organized#
Before querying data, it helps to understand WRDS’s structure:
Libraries - Top-level data providers (e.g.,
crsp,comp,djones)Tables - Datasets within each library (e.g.,
dsf,funda,djdaily)Variables - Column headers within each table (e.g.,
date,ret,prc)
Think of it like: library.table → crsp.dsf (CRSP Daily Stock File)
Discovering What’s Available#
import wrds
db = wrds.Connection()
# List all available libraries
sorted(db.list_libraries())
# List tables in a specific library
db.list_tables(library='crsp')
# Describe a table's columns (variables)
db.describe_table(library='crsp', table='dsf')
# Get the row count for a table
db.get_row_count(library='djones', table='djdaily')
Tip
You can also browse the WRDS Dataset List online to explore available data without writing code.
Warning
Library and table names must be all lowercase. Using CRSP instead of crsp will cause errors.
Querying WRDS Data#
The wrds module provides two main methods for retrieving data:
Method |
Best For |
Flexibility |
|---|---|---|
|
Simple queries, quick exploration |
Lower |
|
Complex queries, joins, filters |
Higher |
All query results are returned as pandas DataFrames, ready for analysis.
Using get_table()#
The simplest way to retrieve data—good for exploring datasets:
# Get first 10 rows of Dow Jones data
data = db.get_table(library='djones', table='djdaily', obs=10)
# Select specific columns only
data = db.get_table('djones', 'djdaily',
columns=['date', 'dji'],
obs=10)
Key parameters:
library- The library to query (required)table- The dataset to query (required)columns- List of columns to include (optional)obs- Number of observations to return (optional)
Using raw_sql()#
For more control, write SQL queries directly. This is what you’ll use for most research:
# Basic SQL query
data = db.raw_sql("""
SELECT date, dji
FROM djones.djdaily
LIMIT 10
""", date_cols=['date'])
# Query with filters
data = db.raw_sql("""
SELECT permno, date, prc, ret
FROM crsp.dsf
WHERE permno = 14593
AND date BETWEEN '2023-01-01' AND '2023-12-31'
""", date_cols=['date'])
Key parameters:
sql- The SQL query string (required)date_cols- List of columns to parse as dates (recommended)
Important
SQL Basics:
SELECTspecifies which columns you wantFROM library.tablespecifies the data sourceWHEREfilters rows based on conditionsLIMITrestricts the number of rows returned
Always Limit Results While Developing#
WRDS datasets can be huge. Always limit your results while testing:
# With get_table(): use obs parameter
data = db.get_table('crsp', 'dsf', obs=100)
# With raw_sql(): use LIMIT clause
data = db.raw_sql("SELECT * FROM crsp.dsf LIMIT 100")
Once your query is working correctly, remove the limit to get the full dataset.
Warning
Running queries without limits on large tables (like crsp.dsf) can take a very long time and may timeout. Start small!
Example Workflow: Exploring CRSP Data#
Let’s walk through a typical research workflow using the CRSP Daily Stock File (crsp.dsf). This mirrors how you’d approach a real analysis.
Step 1: Explore the Data Structure#
import wrds
db = wrds.Connection()
# What tables are in CRSP?
db.list_tables('crsp')
# What variables are in the Daily Stock File?
db.describe_table('crsp', 'dsf')
Step 2: Preview the Data#
Always look at a small sample first:
# Get first 100 rows to see what we're working with
sample = db.get_table('crsp', 'dsf', obs=100)
sample.head()
Step 3: Select Specific Columns#
Based on our preview, let’s focus on the variables we need:
# Get specific columns only
data = db.raw_sql("""
SELECT cusip, permno, date, bidlo, askhi, prc, ret
FROM crsp.dsf
LIMIT 100
""", date_cols=['date'])
Step 4: Filter for Specific Securities#
Query data for specific companies using permno (CRSP’s permanent identifier):
# Query multiple stocks for a single day
data = db.raw_sql("""
SELECT cusip, permno, date, bidlo, askhi
FROM crsp.dsf
WHERE permno IN (14593, 90319, 12490, 17778)
AND date = '2013-01-04'
""", date_cols=['date'])
Step 5: Filter by Date Range and Conditions#
Add more sophisticated filters:
# Find high ask prices between 2010-2013
data = db.raw_sql("""
SELECT cusip, permno, date, bidlo, askhi
FROM crsp.dsf
WHERE permno IN (14593, 90319, 12490, 17778)
AND date BETWEEN '2010-01-01' AND '2013-12-31'
AND askhi > 2000
""", date_cols=['date'])
Step 6: Aggregate and Analyze#
Use SQL aggregation functions:
# Find all stocks that ever had ask price > $2000
data = db.raw_sql("""
SELECT DISTINCT permno
FROM crsp.dsf
WHERE askhi > 2000
""")
# Find the highest ask price ever recorded
data = db.raw_sql("""
SELECT permno, askhi, date
FROM crsp.dsf
WHERE askhi > 2000
ORDER BY askhi DESC
LIMIT 1
""", date_cols=['date'])
Step 7: Work with Results in Pandas#
Once you have your data, use pandas for further analysis:
# Query returns data
returns = db.raw_sql("""
SELECT permno, date, ret
FROM crsp.dsf
WHERE permno = 14593
AND date BETWEEN '2023-01-01' AND '2023-12-31'
""", date_cols=['date'])
# Set date as index
returns = returns.set_index('date')
# Calculate cumulative returns
returns['cum_ret'] = (1 + returns['ret']).cumprod() - 1
# Plot
returns['cum_ret'].plot(title='Apple Cumulative Returns 2023')
Key CRSP Variables#
Variable |
Description |
|---|---|
|
Permanent security identifier |
|
CUSIP identifier |
|
Trading date |
|
Closing price (negative = bid/ask average) |
|
Holding period return |
|
Lowest bid price |
|
Highest ask price |
|
Trading volume (shares) |
|
Shares outstanding (thousands) |
Example: Compustat Financial Data#
Compustat (comp) contains accounting data from financial statements.
# Query annual financial data for Apple
query = """
SELECT gvkey, datadate, conm, tic, at, sale, ni, ceq
FROM comp.funda
WHERE tic = 'AAPL'
AND datafmt = 'STD'
AND indfmt = 'INDL'
AND consol = 'C'
AND popsrc = 'D'
ORDER BY datadate
"""
aapl_financials = db.raw_sql(query, date_cols=['datadate'])
Note
The filters datafmt = 'STD', indfmt = 'INDL', consol = 'C', and popsrc = 'D' are standard Compustat filters that select standardized, consolidated, domestic data. Always include them.
Key Compustat Variables#
Variable |
Description |
|---|---|
|
Global company key (Compustat identifier) |
|
Ticker symbol |
|
Company name |
|
Fiscal period end date |
|
Total assets |
|
Net sales/revenue |
|
Net income |
|
Common equity |
|
Total liabilities |
Joining Tables#
You can join multiple tables within the same library in a single query:
# Join Compustat fundamentals with monthly pricing
query = """
SELECT a.gvkey, a.datadate, a.tic, a.conm, a.at, a.lt,
b.prccm, b.cshoq
FROM comp.funda a
JOIN comp.secm b
ON a.gvkey = b.gvkey
AND a.datadate = b.datadate
WHERE a.tic = 'IBM'
AND a.datafmt = 'STD'
AND a.consol = 'C'
AND a.indfmt = 'INDL'
"""
ibm_data = db.raw_sql(query, date_cols=['datadate'])
Warning
Joining large datasets can be slow. Always filter by date range or specific securities.
Note
Some institutions subscribe to the CRSP/Compustat Merged database, which provides a linking table to connect stock returns (CRSP) with financial statements (Compustat). Check with your instructor about which databases are available through your class subscription.
Passing Python Variables to SQL#
You can pass Python variables to your SQL queries. This is cleaner than building strings manually:
# Define parameters as a dictionary
params = {
"tickers": ("AAPL", "MSFT", "GOOGL", "AMZN"),
"start_date": "2020-01-01"
}
# Use %(param_name)s syntax in SQL
data = db.raw_sql("""
SELECT gvkey, datadate, tic, at, ni
FROM comp.funda
WHERE tic IN %(tickers)s
AND datafmt = 'STD'
AND datadate >= %(start_date)s
""", params=params, date_cols=['datadate'])
Another Example#
# Pass a list of CRSP identifiers
params = {"permnos": (14593, 10107, 93436)} # AAPL, MSFT, AMZN
data = db.raw_sql("""
SELECT permno, date, ret
FROM crsp.dsf
WHERE permno IN %(permnos)s
AND date BETWEEN '2023-01-01' AND '2023-12-31'
""", params=params, date_cols=['date'])
This is especially useful when your list of securities comes from another data source or earlier analysis.
Managing Your Connection#
WRDS limits you to 5 simultaneous connections. Always close your connection when done:
import wrds
# Connect
db = wrds.Connection()
# Do your work
data = db.raw_sql("SELECT date, dji FROM djones.djdaily LIMIT 10")
# Close when finished
db.close()
You can still use your data DataFrame after closing—only the database link is severed.
Using a Context Manager (Recommended)#
Python’s with statement automatically closes the connection:
import wrds
with wrds.Connection() as db:
data = db.raw_sql("SELECT date, dji FROM djones.djdaily")
# Connection closes automatically here
Tip
If you get “too many connections” errors:
Close any open connections with
db.close()Restart your Jupyter kernel
Wait a few minutes for old connections to timeout
Resources#
WRDS Documentation#
Dataset Manuals#
Getting Help#
Python inline help:
help(db.raw_sql)orhelp(db.get_table)