Backtesting with the BT Package#
These notes are based, in part, on Chapter 3.3 of Efficiently Inefficient by Lasse Pedersen. It is a nice introduction to more sophisticated investing strategies, like quantitative equity and fixed income arbitrage strategies. I provide many more resources below.
Quantitative strategies, regardless of security type, usually have a small amount of alpha per trade. These are not “big swings”, like betting against housing in 2007. The model identifies entry and exit points across a large number of securities, leading to significant diversification, or at least that’s the hope.
If you can identify something in the data, then you can trade on it. This might be traditional measures, like value and momentum. Other accounting data gets used to create measures like earnings quality. You can try to quantify management quality. You can look at flow in and out of a security, or trade volume.
Using the BT Package#
I am going to walk through the basic example using the BT package. You should check out their support page for more. From their page:
bt is a flexible backtesting framework for Python used to test quantitative trading strategies. Backtesting is the process of testing a strategy over a given data set. This framework allows you to easily create strategies that mix and match different Algos. It aims to foster the creation of easily testable, re-usable and flexible blocks of strategy logic to facilitate the rapid development of complex trading strategies. The goal: to save quants from re-inventing the wheel and let them focus on the important part of the job - strategy development
You will need to install the BT package using pip.
pip install bt
The example comes from the author’s github page.
Note that the bt package uses the ffn package, which contains a lot of nice finance-helper functions written by the same author. For example, we’re going to use it to get data below. You can also read more about it.
When we bring in bt, we are also bringing in ffn.
There are many, many options that you can include as you develop your strategy. If you want details, check out this page.
# The usual type of set-up.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import bt
import ffn
# Include this to have plots show up in your Jupyter notebook.
%matplotlib inline
# As of Dec 2022, looks like yfinance broke the ffn/bt data import. Add this to get it to work. See https://github.com/pmorissette/ffn/issues/185
import yfinance as yf
Step 1 - Get your data#
The first step is to get the appropriate data. Now, we are going to bring in some basic stock and bond data. This is silly, but instructive. You’ll likely spend a lot of time here dealing with APIs to bring in more unusual data. We’re going to look at a strategy that just uses prices. What about fundamental data, like earnings quality? How about text? This is why the data engineering steps are so important.
Note that ffn.get works as well to get data from Yahoo! Finance.
tickers = ['SPY', 'AGG']
data = yf.download(tickers, start='2010-01-01')
data = data['Close'].copy()
data = data.dropna()
data.head()
| Date | AGG | SPY |
|---|---|---|
| 2010-01-04 | 66.067169 | 85.027962 |
| 2010-01-05 | 66.367699 | 85.253021 |
| 2010-01-06 | 66.329369 | 85.313049 |
| 2010-01-07 | 66.252693 | 85.673164 |
| 2010-01-08 | 66.290993 | 85.958275 |
Step 2 - Define your strategy#
The bt package has you create a strategy object. This object has a name and contains various characteristics for your strategy, such as how often to run your strategy, the securities to select from your data, your weighting scheme, and your rebalancing frequency.
The code example below runs monthly, selects all of the securities from the data given to it, weights the securities equally, and then rebalances to get back to those target weights.
Our strategy is going to be called ew, since this strategy equally weights all of the securities each month. Not much of an algo there! It then gets saved as a strategy object also called ew.
Note that this is not a “buy-and-hold” strategy, since we are rebalancing back to equally weighted each month. There’s a bt.algos.RunOnce for that.
ew = bt.Strategy('ew', [bt.algos.RunMonthly(),
bt.algos.SelectAll(),
bt.algos.WeighEqually(),
bt.algos.Rebalance()])
print(type(ew))
<class 'bt.core.Strategy'>
Now, note that there’s no real trading strategy yet. We haven’t included any trading logic. What is an algorithm, or algo, in quant trading? Basically, it is a series of rules to get you a boolean (1/0) output. Do you include this security in your portfolio? Do you exclude it? Or, you can add a -1 outcome and ask if you short it.
What do I mean by rules? This is the trading strategy! Does the security have positive trend? Momentum? Meet some quality measure? Is it being impacted by retail trading flow and will mean revert in the next hour? Are you predicting that the firm will be a target of a shareholder activist? These are all rules that might lead you to go long or short the security.
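To make the 1/0/-1 idea concrete, here is a minimal sketch in plain pandas. The prices and the reference level are made-up numbers, and the "rule" is just a comparison. This is only an illustration of the output format, not how bt represents algos internally.

```python
import numpy as np
import pandas as pd

# Made-up prices and a made-up reference level for four hypothetical securities
price = pd.Series([101.0, 99.0, 100.0, 103.0], index=["A", "B", "C", "D"])
reference = pd.Series([100.0, 100.0, 100.0, 100.0], index=["A", "B", "C", "D"])

# Long-only rule: include the security (1) or exclude it (0)
long_only = (price > reference).astype(int)

# Long-short rule: +1 above the reference, -1 below it, 0 when equal
long_short = np.sign(price - reference).astype(int)

print(pd.DataFrame({"long_only": long_only, "long_short": long_short}))
```

Any rule you can write as a comparison on data gives you a table like this, one row per security.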
Let’s create some actual, simple trading logic. I’ll follow the moving average example from the bt author. We can then compare strategies.
I’ll use a 50-day moving average, like the author. We’ll then compare our two securities to their moving average. If the price is above the moving average, let’s call that positive trend and buy. If not, we’ll exclude the security from the portfolio. Simple boolean logic.
Again, this is a very simple strategy on a stock and a bond index! We’ll make things more interesting later in this chapter.
sma = data.rolling(50).mean()
What did I just create? Let’s look.
sma
| Date | AGG | SPY |
|---|---|---|
| 2010-01-04 | NaN | NaN |
| 2010-01-05 | NaN | NaN |
| 2010-01-06 | NaN | NaN |
| 2010-01-07 | NaN | NaN |
| 2010-01-08 | NaN | NaN |
| ... | ... | ... |
| 2026-03-05 | 100.007900 | 688.298997 |
| 2026-03-06 | 100.020520 | 688.049996 |
| 2026-03-09 | 100.039540 | 687.856196 |
| 2026-03-10 | 100.047574 | 687.592196 |
| 2026-03-11 | 100.046943 | 687.283396 |
4071 rows × 2 columns
See how the first rows are missing? This is a 50-day moving average, so the first 49 rows are NaN. You don’t get an average until 50 prices are available.
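The warm-up behavior is easier to see with a tiny window. This sketch uses a made-up six-day price series and a 3-day window, so the numbers are easy to check by hand.

```python
import pandas as pd

# Toy price series (made-up numbers) on six business days
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
                   index=pd.bdate_range("2024-01-01", periods=6))

window = 3
sma_toy = prices.rolling(window).mean()

# The first (window - 1) rows do not have enough observations yet
n_missing = int(sma_toy.isna().sum())

print(sma_toy)
print(f"Rows without an average: {n_missing}")
```

With a window of 3, two rows are missing. With a window of 50, it is 49.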
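The bt package gives us a merge function and the resulting DataFrame has a plot method. bt.merge comes from ffn and combines the two DataFrames column by column. The .plot call is the usual pandas plotting method, which draws with matplotlib. This just makes things easier for us.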
plot = bt.merge(data, sma).plot(figsize=(15, 5))
Now, let’s use that Boolean logic to create our signal. When do we buy? When the price is above the 50-day moving average. Just compare the two DataFrames and get a new one full of True and False values! We’ll call that DataFrame signal.
signal = data > sma
print(type(signal))
signal
<class 'pandas.core.frame.DataFrame'>
| Date | AGG | SPY |
|---|---|---|
| 2010-01-04 | False | False |
| 2010-01-05 | False | False |
| 2010-01-06 | False | False |
| 2010-01-07 | False | False |
| 2010-01-08 | False | False |
| ... | ... | ... |
| 2026-03-05 | True | False |
| 2026-03-06 | True | False |
| 2026-03-09 | True | False |
| 2026-03-10 | True | False |
| 2026-03-11 | False | False |
4071 rows × 2 columns
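We can now create a new strategy with that logic. The new strategy is called above50sma and is saved in the sma50 object. Notice that the first algo is now bt.algos.SelectWhere. This is going to select securities where signal == True on each date. There is no RunMonthly algo in this one, so the strategy is evaluated every trading day rather than once a month.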
sma50 = bt.Strategy('above50sma', [bt.algos.SelectWhere(signal),
bt.algos.WeighEqually(),
bt.algos.Rebalance()])
Take a look at this discussion on strategy trees for more on how to use bt to combine different algorithms.
A real strategy might incorporate hundreds of signals from all types of data sources.
Step 3 - Backtest your strategy#
Once you have your strategy object, you can backtest it on your data. You feed the function bt.Backtest your strategy and your data. This creates a test object. Then, you use bt.run on that object to get your results.
One nice feature of bt: you can run multiple backtests at once and compare them side by side. Let’s do that with our equal-weight and moving average strategies.
test_ew = bt.Backtest(ew, data)
test_sma50 = bt.Backtest(sma50, data)
# Run both backtests together
res = bt.run(test_ew, test_sma50)
print(type(res))
<class 'bt.backtest.Result'>
Note the arguments for the backtest function. You give it the strategy and your data. Remember, your data are prices for the securities. The backtest uses these prices to generate returns. We pass both backtests into bt.run together so that we can compare them easily.
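Here is a rough sketch of the price-to-return arithmetic, using made-up prices for two securities. It assumes the portfolio is reset to 50/50 every day, which is simpler than what our monthly strategy does. bt itself tracks positions and cash for you, so treat this as an illustration of the idea only.

```python
import pandas as pd

# Made-up closing prices for two securities over three business days
prices = pd.DataFrame({"SPY": [100.0, 102.0, 101.0],
                       "AGG": [50.0, 50.5, 50.5]},
                      index=pd.bdate_range("2024-01-01", periods=3))

# Simple daily returns: today's price divided by yesterday's, minus 1
returns = prices.pct_change().dropna()

# Assumption: weights are reset to 50/50 each day, so the portfolio
# return is the plain average of the two security returns
port_ret = returns.mean(axis=1)

# Portfolio value, indexed to 100 at the start
value = 100 * (1 + port_ret).cumprod()

print(returns)
print(value)
```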
With the results object, we can plot the value of both strategies (starting at 100) and look at some descriptive statistics side by side.
res.plot(figsize=(12, 5), title='Strategy Comparison: Equal Weight vs. 50-Day SMA');
res.display()
Stat ew above50sma
------------------- ---------- ------------
Start 2010-01-03 2010-01-03
End 2026-03-11 2026-03-11
Risk-free rate 0.00% 0.00%
Total Return 263.79% 113.83%
Daily Sharpe 0.95 0.64
Daily Sortino 1.49 0.95
CAGR 8.31% 4.81%
Max Drawdown -19.87% -21.94%
Calmar Ratio 0.42 0.22
MTD -1.50% -1.37%
3m -0.77% -5.11%
6m 1.97% -3.93%
YTD -0.44% -4.21%
1Y 13.65% 7.03%
3Y (ann.) 12.82% 10.33%
5Y (ann.) 6.60% 3.63%
10Y (ann.) 8.37% 5.10%
Since Incep. (ann.) 8.31% 4.81%
Daily Sharpe 0.95 0.64
Daily Sortino 1.49 0.95
Daily Mean (ann.) 8.38% 5.01%
Daily Vol (ann.) 8.80% 7.78%
Daily Skew -0.54 -0.91
Daily Kurt 14.30 10.07
Best Day 5.08% 3.48%
Worst Day -6.63% -4.35%
Monthly Sharpe 1.06 0.63
Monthly Sortino 1.95 1.05
Monthly Mean (ann.) 8.45% 5.03%
Monthly Vol (ann.) 7.97% 7.99%
Monthly Skew -0.35 -0.52
Monthly Kurt 1.06 1.03
Best Month 7.57% 6.68%
Worst Month -6.68% -7.97%
Yearly Sharpe 0.94 0.46
Yearly Sortino 2.12 0.84
Yearly Mean 8.10% 4.60%
Yearly Vol 8.65% 9.99%
Yearly Skew -1.41 -1.13
Yearly Kurt 2.39 0.47
Best Year 19.64% 16.05%
Worst Year -15.30% -17.78%
Avg. Drawdown -0.86% -1.22%
Avg. Drawdown Days 14.86 34.68
Avg. Up Month 1.88% 1.89%
Avg. Down Month -1.93% -1.72%
Win Year % 81.25% 75.00%
Win 12m % 91.30% 76.63%
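A few of these statistics are tied together by simple formulas, and it is worth checking them by hand. The sketch below plugs in numbers copied from the ew column above. The 365.25-day year is my assumption about how the year fraction is measured.

```python
from datetime import date

# Figures copied from the 'ew' column of the table above
total_return = 2.6379       # 263.79%
daily_mean_ann = 0.0838     # Daily Mean (ann.)
daily_vol_ann = 0.0880      # Daily Vol (ann.)
max_drawdown = -0.1987      # Max Drawdown
risk_free = 0.0             # Risk-free rate shown in the table
start, end = date(2010, 1, 3), date(2026, 3, 11)

# Assumption: a year is measured as 365.25 calendar days
years = (end - start).days / 365.25

# CAGR: the constant annual growth rate that produces the total return
cagr = (1 + total_return) ** (1 / years) - 1

# Sharpe: annualized excess return per unit of annualized volatility
sharpe = (daily_mean_ann - risk_free) / daily_vol_ann

# Calmar: CAGR divided by the size of the worst drawdown
calmar = cagr / abs(max_drawdown)

print(f"CAGR   {cagr:.2%}")
print(f"Sharpe {sharpe:.2f}")
print(f"Calmar {calmar:.2f}")
```

These come out at about 8.31%, 0.95, and 0.42, in line with the CAGR, Daily Sharpe, and Calmar Ratio rows.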
Let’s also look at drawdowns. A drawdown measures the decline from a portfolio’s peak value. This is one of the most important risk measures – investors feel losses more than gains. The maximum drawdown tells you the worst peak-to-trough decline over the backtest period. We can calculate drawdowns from the strategy’s price series.
# Calculate and plot drawdowns manually
# Drawdown = (current price - running max) / running max
drawdowns = res.prices / res.prices.cummax() - 1
fig, ax = plt.subplots(figsize=(12, 5))
(drawdowns * 100).plot(ax=ax)  # scale to percent so the values match the axis label
ax.set_title('Strategy Drawdowns')
ax.set_ylabel('Drawdown (%)')
ax.legend()
plt.tight_layout()
plt.show()
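To see the drawdown formula at work on numbers you can check by hand, here is a small sketch with a made-up portfolio value series. It also pulls out the maximum drawdown and the peak and trough dates.

```python
import pandas as pd

# Made-up portfolio values on six business days
value = pd.Series([100.0, 110.0, 99.0, 104.5, 121.0, 115.0],
                  index=pd.bdate_range("2024-01-01", periods=6))

# Highest value seen so far at each date
running_max = value.cummax()

# Drawdown: how far below the running peak we are (0 at a new high)
dd = value / running_max - 1

# Maximum drawdown is the most negative value, reached at the trough
max_dd = dd.min()
trough_date = dd.idxmin()

# The peak is the highest value on or before the trough
peak_date = value.loc[:trough_date].idxmax()

print(dd)
print(f"Max drawdown {max_dd:.1%} from {peak_date.date()} to {trough_date.date()}")
```

The drop from 110 to 99 is the worst one here, a 10% drawdown.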
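We can also look at the security weights over time. This shows you how the portfolio allocation changes at each rebalance. Note that get_security_weights() returns the weights for the first backtest passed to bt.run (ew here) unless you name another one, for example res.get_security_weights('above50sma'). For the equal-weight strategy, you’ll see it stays roughly 50/50. For the SMA strategy, securities drop out when they fall below their moving average.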
weights = res.get_security_weights()
weights = weights.iloc[1:]
fig, ax = plt.subplots(figsize=(12, 8))
weights.plot.area(ax=ax)
ax.set_title('Security Weights')
ax.set_ylabel('Weight')
ax.legend(loc='best');
You’d have been much better off equally weighting the two indices and rebalancing each month! The SMA strategy spent a lot of time out of the market, missing rallies, and that hurt performance.
But this was a very simple example with just two securities. Let’s make things more realistic and explore more of what the bt package can do.
Backtesting Pitfalls#
Before we build more strategies, let’s talk about what can go wrong. Backtesting is seductive – you can always find a strategy that looks great on historical data. The hard part is finding one that works going forward. Here are the most common pitfalls.
Look-Ahead Bias#
Look-ahead bias occurs when your backtest uses information that would not have been available at the time of the trade. For example, if you use tomorrow’s closing price to make today’s trading decision, your backtest will look fantastic, but you’d never be able to replicate it in real time. This also happens with fundamental data – quarterly earnings aren’t available on the last day of the quarter. They’re reported weeks later.
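Signals built from rolling windows of historical prices (like our moving average) are backward-looking: the value on a given date only uses prices up to and including that date. Even so, our signal is computed from the same closing price at which the backtest trades, and SelectWhere looks up the signal on the current date. A stricter test would delay the signal by one day, for example with signal.shift(1). Be careful when constructing your own signals, because it’s easy to accidentally peek into the future.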
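One common way to guard against this is to lag the signal by one row, so the decision on day t only uses data through day t-1. This is a minimal sketch with made-up prices and a 2-day average. It is a general pandas pattern, not something taken from the bt examples.

```python
import pandas as pd

# Made-up closing prices on six business days
prices = pd.Series([10.0, 11.0, 9.0, 12.0, 13.0, 8.0],
                   index=pd.bdate_range("2024-01-01", periods=6))

# Short window so the numbers are easy to follow
sma_toy = prices.rolling(2).mean()

# Same-day signal: uses today's close to decide today's position
signal_same_day = prices > sma_toy

# Lagged signal: today's position is based on yesterday's comparison.
# The first row has no history, so it is filled with False.
signal_lagged = signal_same_day.shift(1, fill_value=False)

print(pd.DataFrame({"same_day": signal_same_day, "lagged": signal_lagged}))
```

The lagged column is the same-day column moved down one row. If a backtest looks much worse after this change, the original result was leaning on information you would not have had in time.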
Survivorship Bias#
Survivorship bias happens when your data set only includes securities that still exist today. If a stock went bankrupt in 2015, it won’t show up in today’s Yahoo Finance data. Your backtest never sees the losers, making performance look better than it actually would have been. This is a huge problem with stock data.
For our examples using ETFs like SPY, this isn’t as big a deal, since these are diversified index funds. But if you were backtesting a strategy that picks individual stocks, you’d need a database that includes delisted securities. That kind of data costs money.
Overfitting#
Overfitting is the cardinal sin of backtesting. If you try enough parameter combinations, you’ll find something that works beautifully on your historical data but fails on new data. Did you try a 50-day moving average and it didn’t work, so you tried 47 days and that looked better? Congratulations, you may have just overfit your model.
A good rule of thumb: the simpler the strategy, the less likely it is to be overfit. Strategies with fewer parameters are more robust. If your strategy requires 15 precisely tuned parameters, you’re probably fitting noise, not signal.
Transaction Costs#
Every trade costs money. There’s the bid-ask spread (you buy at the ask and sell at the bid), commissions (smaller these days, but not zero for all brokers), and market impact (your trade moves the price, especially in less liquid markets). A strategy that rebalances daily with 50 securities will generate a lot of trades. Even small transaction costs compound quickly and can turn a profitable backtest into a losing real-world strategy.
We’ll see how to add transaction costs to bt backtests below.
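A back-of-the-envelope calculation shows how quickly costs add up. Every number in this sketch is hypothetical: a 10 bps cost per dollar traded, 40% of the portfolio traded each month, and an 8% gross annual return.

```python
# All inputs are hypothetical, chosen only to illustrate the arithmetic
cost_per_dollar_traded = 0.001   # 10 bps
monthly_turnover = 0.40          # dollars traded each month / portfolio value
gross_annual_return = 0.08

# Approximate annual cost drag: trades per year times cost per trade
annual_drag = 12 * monthly_turnover * cost_per_dollar_traded
net_annual_return = gross_annual_return - annual_drag

# Compare growth of $1 over ten years, before and after costs
years = 10
gross_growth = (1 + gross_annual_return) ** years
net_growth = (1 + net_annual_return) ** years

print(f"Annual drag: {annual_drag:.2%}")
print(f"Growth of $1 over {years} years: gross {gross_growth:.3f}, net {net_growth:.3f}")
```

The drag here is about 48 bps per year. That sounds small, but it comes straight out of whatever alpha the strategy has.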
Data Snooping#
Related to overfitting, data snooping (or p-hacking) happens when you test many strategies on the same data set. If you test 100 strategies that have no real edge, about 5 will “work” at the 5% significance level purely by chance. The academic literature has grappled with this problem extensively; see the Cam Harvey papers referenced above.
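The arithmetic behind that claim is short. This sketch assumes the 100 tests are independent and that none of the strategies has any real edge. The Bonferroni threshold at the end is one standard and fairly conservative correction, shown here only as an example.

```python
n_tests = 100
alpha = 0.05   # significance level for each individual test

# Expected number of strategies that look significant by chance alone
expected_false_positives = n_tests * alpha

# Probability that at least one looks significant (assumes independent tests)
p_at_least_one = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: a stricter per-test threshold
bonferroni_alpha = alpha / n_tests

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one false positive): {p_at_least_one:.3f}")
print(f"Bonferroni per-test threshold: {bonferroni_alpha:.4f}")
```

So with 100 tries you are almost certain (about 99%) to find at least one "winner", even if nothing works.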
Regime Changes#
Markets change over time. Interest rates, volatility, correlations, regulations, and market microstructure all evolve. A strategy calibrated to the low-volatility, low-rate environment of 2012-2019 may not work in a rising rate environment. This is why out-of-sample testing is so important – and why you should be skeptical of any backtest that only covers a single market regime.
A More Realistic Example: Trend Following with Sector ETFs#
Let’s build something more interesting. We’ll use a set of sector ETFs and apply a trend-following strategy. This is closer to what a real quantitative strategy might look like, at least in spirit. The universe is larger, we’ll compare multiple signal definitions, and we’ll consider transaction costs.
Our investment universe will be the major U.S. sector ETFs. These are liquid, have long histories, and each represents a different part of the economy. A trend-following strategy will try to identify which sectors have positive momentum and overweight or buy those, while avoiding sectors in a downtrend.
# Sector ETFs - these cover the major S&P 500 sectors
sector_tickers = ['XLK', # Technology
'XLF', # Financials
'XLV', # Health Care
'XLE', # Energy
'XLY', # Consumer Discretionary
'XLP', # Consumer Staples
'XLI', # Industrials
'XLU', # Utilities
'XLB', # Materials
'XLRE'] # Real Estate (started in 2015)
sector_data = yf.download(sector_tickers, start='2015-01-01')
sector_data = sector_data['Close'].copy()
# Drop rows with any NaN values. This trims the sample to start when XLRE begins trading (October 2015).
sector_data = sector_data.dropna()
print(f"Data shape: {sector_data.shape}")
print(f"Date range: {sector_data.index[0].strftime('%Y-%m-%d')} to {sector_data.index[-1].strftime('%Y-%m-%d')}")
sector_data.head()
Data shape: (2620, 10)
Date range: 2015-10-08 to 2026-03-11
| Date | XLB | XLE | XLF | XLI | XLK | XLP | XLRE | XLU | XLV | XLY |
|---|---|---|---|---|---|---|---|---|---|---|
| 2015-10-08 | 17.947430 | 22.906584 | 15.673985 | 44.282917 | 18.358349 | 37.508656 | 21.185572 | 15.831942 | 57.437981 | 34.710800 |
| 2015-10-09 | 17.939297 | 22.758039 | 15.573471 | 44.432743 | 18.438360 | 37.592281 | 21.150509 | 15.756383 | 57.699337 | 34.751045 |
| 2015-10-12 | 17.784895 | 22.464231 | 15.586875 | 44.441063 | 18.460587 | 37.691082 | 21.283751 | 15.896710 | 57.851082 | 34.916481 |
| 2015-10-13 | 17.715822 | 22.229847 | 15.466256 | 43.966599 | 18.420582 | 37.463058 | 21.150509 | 15.860726 | 57.126053 | 34.728706 |
| 2015-10-14 | 17.858030 | 22.421316 | 15.338930 | 43.492149 | 18.385021 | 37.029831 | 21.150509 | 15.857132 | 57.016460 | 34.375435 |
Strategy 1: Buy-and-Hold Benchmark#
Every strategy comparison needs a benchmark. The simplest benchmark is buy-and-hold: buy equal weights of all sectors on day one and never rebalance. This is your baseline. If your fancy trading strategy can’t beat buy-and-hold, why bother?
In bt, we use bt.algos.RunOnce() to set up a buy-and-hold strategy. The algo runs only once at the start, selects all securities, weights them equally, and then sits there.
# Buy-and-hold: invest once, never rebalance
benchmark = bt.Strategy('buy_and_hold', [bt.algos.RunOnce(),
bt.algos.SelectAll(),
bt.algos.WeighEqually(),
bt.algos.Rebalance()])
Strategy 2: Monthly Rebalanced Equal Weight#
Like our earlier example, but across all sectors. Rebalance to equal weight each month. This is an active strategy in the sense that it sells winners and buys losers each month to maintain equal weights – a mild contrarian tilt.
# Monthly rebalanced equal weight
ew_monthly = bt.Strategy('ew_monthly', [bt.algos.RunMonthly(),
bt.algos.SelectAll(),
bt.algos.WeighEqually(),
bt.algos.Rebalance()])
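Here is the "sell winners, buy losers" mechanic with two positions and made-up numbers. One sector gains 10% over the month and the other is flat, so the weights drift away from 50/50 and the rebalance trades them back.

```python
import pandas as pd

# Start the month at equal weight: $50 in each of two sectors
start_value = pd.Series({"XLK": 50.0, "XLU": 50.0})

# Made-up returns for the month
monthly_return = pd.Series({"XLK": 0.10, "XLU": 0.00})

# Position values and weights after the month
end_value = start_value * (1 + monthly_return)
drifted_weights = end_value / end_value.sum()

# Dollar target for each position at equal weight
target_value = end_value.sum() / len(end_value)

# Trades needed to get back to target: positive = buy, negative = sell
trades = target_value - end_value

print(drifted_weights.round(4))
print(trades)
```

The winner (XLK) is trimmed by $2.50 and the laggard (XLU) is topped up by the same amount.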
Strategy 3: SMA Crossover Trend Following#
Now for our actual trading strategy. This is a classic moving average crossover: buy a sector when its short-term moving average crosses above its long-term moving average. This is a smoother trend signal than just comparing the price to a single moving average, because the short-term average filters out daily noise.
We’ll use a 50-day and 200-day moving average. When the 50-day crosses above the 200-day, the sector is in an uptrend (sometimes called a “golden cross” in technical analysis). When it crosses below, the sector is in a downtrend (“death cross”). This is one of the oldest technical indicators out there.
Sectors that pass the filter get equally weighted. Sectors that don’t pass get excluded, and WeighEqually splits the whole portfolio among the sectors that remain. The strategy only sits in cash when no sector passes the filter.
# Calculate short and long moving averages
sma_short = sector_data.rolling(50).mean()
sma_long = sector_data.rolling(200).mean()
# Signal: short MA above long MA = uptrend
crossover_signal = sma_short > sma_long
# Let's look at the signal for one ETF to see what's happening
print("XLK trend signal (last 10 rows):")
print(crossover_signal['XLK'].tail(10))
XLK trend signal (last 10 rows):
Date
2026-02-26 True
2026-02-27 True
2026-03-02 True
2026-03-03 True
2026-03-04 True
2026-03-05 True
2026-03-06 True
2026-03-09 True
2026-03-10 True
2026-03-11 True
Name: XLK, dtype: bool
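The signal above tells you the state (uptrend or not) on each day. If you want the actual crossing dates, you can compare each day with the day before. This sketch uses a made-up boolean series rather than the real XLK signal.

```python
import pandas as pd

# Made-up trend signal on six business days
sig = pd.Series([False, False, True, True, False, True],
                index=pd.bdate_range("2024-01-01", periods=6))

# Yesterday's state. The first row has no history and is treated as False,
# so a series that starts at True would count as a cross on day one.
prev = sig.shift(1, fill_value=False).astype(bool)

# Golden cross: False yesterday, True today
golden_cross = sig & ~prev

# Death cross: True yesterday, False today
death_cross = ~sig & prev

golden_dates = sig.index[golden_cross.to_numpy()]
death_dates = sig.index[death_cross.to_numpy()]

print("Golden crosses:", [d.date().isoformat() for d in golden_dates])
print("Death crosses: ", [d.date().isoformat() for d in death_dates])
```

Counting these events per year gives you a quick feel for how often the strategy would flip a position.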
Let’s visualize the moving averages and signal for one sector to see what’s going on. When the 50-day line is above the 200-day line, that’s our buy signal, and the chart shades those periods in green.
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(sector_data['XLK'], label='XLK Price', alpha=0.5, linewidth=0.8)
ax.plot(sma_short['XLK'], label='50-Day SMA', linewidth=1.2)
ax.plot(sma_long['XLK'], label='200-Day SMA', linewidth=1.2)
# Shade the regions where we'd be invested
ax.fill_between(crossover_signal.index,
ax.get_ylim()[0], ax.get_ylim()[1],
where=crossover_signal['XLK'],
alpha=0.1, color='green', label='Long Signal')
ax.set_title('XLK: 50/200 Day Moving Average Crossover')
ax.set_ylabel('Price ($)')
ax.legend(loc='upper left')
plt.tight_layout()
plt.show()
Now let’s build the bt strategy. We’ll rebalance monthly, since we don’t need to trade every day just because we have a daily signal. Monthly rebalancing is more realistic and keeps transaction costs lower. With its default settings, the RunMonthly algo fires on the first trading day of each new month and reads the signal on that date.
# SMA crossover: only buy sectors in an uptrend (50-day > 200-day)
sma_crossover = bt.Strategy('sma_crossover', [bt.algos.RunMonthly(),
bt.algos.SelectWhere(crossover_signal),
bt.algos.WeighEqually(),
bt.algos.Rebalance()])
Strategy 4: Momentum (Return-Based Signal)#
Let’s try a different kind of trend signal. Instead of moving averages, we’ll use past returns directly. This is a momentum strategy: buy the sectors that have done well over the past few months. We’ll rank sectors by their trailing 6-month return and only hold the top half.
Why the top half? In a pure long-only context, we want to concentrate on winners. In a long-short strategy, you’d go long the top and short the bottom. We’ll keep it long-only here.
The bt package has SelectMomentum for this – it ranks securities by their trailing return over a lookback period and selects the top n. We use SelectAll first to give it the full universe to rank from.
# How many sectors do we want to hold? Top half.
n_sectors = len(sector_data.columns)
n_hold = n_sectors // 2
print(f"Universe: {n_sectors} sectors, holding top {n_hold}")
# Momentum strategy: buy the top N sectors by trailing 6-month return
# SelectAll first gives SelectMomentum the full universe to rank
momentum = bt.Strategy('momentum', [bt.algos.RunMonthly(),
bt.algos.SelectAll(),
bt.algos.SelectMomentum(n=n_hold,
lookback=pd.DateOffset(months=6)),
bt.algos.WeighEqually(),
bt.algos.Rebalance()])
Universe: 10 sectors, holding top 5
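To see what "rank by trailing return and keep the top n" means, here is the same idea done by hand in pandas. The prices are made up and there are only four sectors. This mirrors the description above rather than bt's internal code.

```python
import pandas as pd

# Made-up prices at the start and end of a six-month lookback window
prices = pd.DataFrame(
    {"XLK": [100.0, 112.0], "XLE": [100.0, 97.0],
     "XLU": [100.0, 104.0], "XLF": [100.0, 108.0]},
    index=pd.to_datetime(["2024-01-31", "2024-07-31"]))

# Trailing total return over the lookback window
trailing_return = prices.iloc[-1] / prices.iloc[0] - 1

# Keep the top half, ranked from best to worst
n_top = len(trailing_return) // 2
ranked = trailing_return.sort_values(ascending=False)
selected = ranked.index[:n_top].tolist()

# Equal weight across the selected sectors
weights = pd.Series(1.0 / n_top, index=selected)

print(ranked)
print(weights)
```

XLK and XLF have the best trailing returns, so each gets half the portfolio until the next monthly ranking.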
Running All Strategies#
Now let’s run all four strategies and compare them. This is where bt really shines – you can run multiple backtests in a single call and get side-by-side comparisons.
# Create backtests for all strategies
test_bh = bt.Backtest(benchmark, sector_data)
test_ew_m = bt.Backtest(ew_monthly, sector_data)
test_cross = bt.Backtest(sma_crossover, sector_data)
test_mom = bt.Backtest(momentum, sector_data)
# Run them all at once
sector_res = bt.run(test_bh, test_ew_m, test_cross, test_mom)
sector_res.plot(figsize=(12, 6), title='Sector ETF Strategy Comparison');
sector_res.display()
Stat buy_and_hold ew_monthly sma_crossover momentum
------------------- -------------- ------------ --------------- ----------
Start 2015-10-07 2015-10-07 2015-10-07 2015-10-07
End 2026-03-11 2026-03-11 2026-03-11 2026-03-11
Risk-free rate 0.00% 0.00% 0.00% 0.00%
Total Return 229.63% 231.51% 178.43% 183.42%
Daily Sharpe 0.76 0.77 0.69 0.68
Daily Sortino 1.17 1.18 1.04 1.05
CAGR 12.12% 12.18% 10.32% 10.51%
Max Drawdown -35.71% -36.88% -34.67% -35.83%
Calmar Ratio 0.34 0.33 0.30 0.29
MTD -2.75% -3.27% -3.30% -3.65%
3m 1.80% 4.31% 3.62% 2.48%
6m 5.55% 7.08% 5.74% 4.43%
YTD 2.61% 4.79% 4.08% 3.28%
1Y 20.29% 18.70% 14.79% 15.16%
3Y (ann.) 17.12% 16.12% 13.26% 13.14%
5Y (ann.) 11.31% 11.62% 10.12% 9.15%
10Y (ann.) 12.48% 12.53% 10.78% 10.98%
Since Incep. (ann.) 12.12% 12.18% 10.32% 10.51%
Daily Sharpe 0.76 0.77 0.69 0.68
Daily Sortino 1.17 1.18 1.04 1.05
Daily Mean (ann.) 12.93% 12.95% 11.18% 11.42%
Daily Vol (ann.) 17.03% 16.83% 16.29% 16.67%
Daily Skew -0.49 -0.58 -0.59 -0.45
Daily Kurt 19.01 19.74 20.40 20.61
Best Day 10.04% 10.18% 9.71% 10.62%
Worst Day -12.05% -11.95% -11.81% -11.98%
Monthly Sharpe 0.86 0.85 0.79 0.77
Monthly Sortino 1.49 1.49 1.39 1.37
Monthly Mean (ann.) 12.31% 12.40% 10.81% 11.06%
Monthly Vol (ann.) 14.34% 14.55% 13.72% 14.30%
Monthly Skew -0.49 -0.46 -0.28 -0.36
Monthly Kurt 1.09 1.58 1.49 0.79
Best Month 11.82% 12.99% 13.36% 11.82%
Worst Month -13.41% -14.63% -12.42% -12.44%
Yearly Sharpe 0.94 1.00 0.92 1.01
Yearly Sortino 3.07 4.43 9.26 4.29
Yearly Mean 12.09% 12.03% 10.25% 10.38%
Yearly Vol 12.93% 12.00% 11.16% 10.32%
Yearly Skew -0.61 -0.14 0.79 -0.06
Yearly Kurt -0.01 -0.15 -0.26 0.01
Best Year 29.44% 31.37% 31.14% 27.26%
Worst Year -12.17% -7.49% -3.51% -8.03%
Avg. Drawdown -1.80% -1.77% -1.76% -2.36%
Avg. Drawdown Days 18.24 18.08 18.98 24.35
Avg. Up Month 3.20% 3.29% 3.15% 3.42%
Avg. Down Month -3.59% -3.43% -2.84% -2.95%
Win Year % 81.82% 81.82% 81.82% 81.82%
Win 12m % 88.70% 89.57% 88.70% 92.17%
Let’s look at the drawdowns. This tells us about the worst periods for each strategy. A key question: does the trend-following strategy protect you during drawdowns? That’s one of the main selling points of trend following – it gets you out before the big losses.
# Calculate drawdowns for sector strategies
sector_dd = sector_res.prices / sector_res.prices.cummax() - 1
fig, ax = plt.subplots(figsize=(12, 6))
(sector_dd * 100).plot(ax=ax)  # scale to percent so the values match the axis label
ax.set_title('Strategy Drawdowns')
ax.set_ylabel('Drawdown (%)')
ax.legend()
plt.tight_layout()
plt.show()
Now let’s look at how the portfolio weights change over time for the SMA crossover strategy. This shows which sectors the strategy is invested in at each point. When a sector’s 50-day average falls below its 200-day average, it gets excluded and the remaining sectors share the portfolio equally. By default, get_security_weights returns the first backtest passed to bt.run (buy_and_hold here), so we ask for sma_crossover by name.
weights = sector_res.get_security_weights('sma_crossover')
weights = weights.iloc[1:]
fig, ax = plt.subplots(figsize=(12, 10))
weights.plot.area(ax=ax)
ax.set_title('Security Weights')
ax.set_ylabel('Weight')
ax.legend(loc='best');
Adding Transaction Costs#
So far, our backtests assume zero transaction costs. That’s not realistic. Every time you buy or sell, you incur costs. For liquid ETFs, these costs are small but not zero. A reasonable estimate might be 10 basis points (0.10%) per trade for ETFs, accounting for bid-ask spreads and any commissions. For less liquid securities, costs could be much higher.
The bt.Backtest function accepts a commissions argument. This is specified as a function that takes the quantity and price and returns the commission cost. We can use a simple percentage-based cost.
# Define a simple commission function: 10 bps (0.10%) per trade
def commission_10bps(q, p):
    return abs(q) * p * 0.001
# Need new strategy objects -- bt doesn't let you reuse them
benchmark_tc = bt.Strategy('buy_and_hold', [bt.algos.RunOnce(), bt.algos.SelectAll(),
bt.algos.WeighEqually(), bt.algos.Rebalance()])
ew_monthly_tc = bt.Strategy('ew_monthly', [bt.algos.RunMonthly(), bt.algos.SelectAll(),
bt.algos.WeighEqually(), bt.algos.Rebalance()])
sma_crossover_tc = bt.Strategy('sma_crossover', [bt.algos.RunMonthly(),
bt.algos.SelectWhere(crossover_signal),
bt.algos.WeighEqually(), bt.algos.Rebalance()])
momentum_tc = bt.Strategy('momentum', [bt.algos.RunMonthly(), bt.algos.SelectAll(),
bt.algos.SelectMomentum(n=n_hold,
lookback=pd.DateOffset(months=6)),
bt.algos.WeighEqually(), bt.algos.Rebalance()])
# Re-run our strategies with transaction costs
test_bh_tc = bt.Backtest(benchmark_tc, sector_data, commissions=commission_10bps)
test_ew_tc = bt.Backtest(ew_monthly_tc, sector_data, commissions=commission_10bps)
test_cross_tc = bt.Backtest(sma_crossover_tc, sector_data, commissions=commission_10bps)
test_mom_tc = bt.Backtest(momentum_tc, sector_data, commissions=commission_10bps)
sector_res_tc = bt.run(test_bh_tc, test_ew_tc, test_cross_tc, test_mom_tc)
sector_res_tc.plot(figsize=(12, 6), title='Strategy Comparison with 10 bps Transaction Costs');
sector_res_tc.display()
Stat buy_and_hold ew_monthly sma_crossover momentum
------------------- -------------- ------------ --------------- ----------
Start 2015-10-07 2015-10-07 2015-10-07 2015-10-07
End 2026-03-11 2026-03-11 2026-03-11 2026-03-11
Risk-free rate 0.00% 0.00% 0.00% 0.00%
Total Return 229.29% 230.12% 169.01% 169.05%
Daily Sharpe 0.76 0.77 0.67 0.65
Daily Sortino 1.17 1.18 1.01 1.00
CAGR 12.11% 12.14% 9.96% 9.96%
Max Drawdown -35.71% -36.88% -34.67% -35.83%
Calmar Ratio 0.34 0.33 0.29 0.28
MTD -2.75% -3.27% -3.32% -3.69%
3m 1.80% 4.30% 3.55% 2.31%
6m 5.55% 7.06% 5.64% 4.13%
YTD 2.61% 4.78% 4.01% 3.11%
1Y 20.28% 18.66% 14.32% 14.41%
3Y (ann.) 17.12% 16.08% 12.83% 12.51%
5Y (ann.) 11.30% 11.59% 9.73% 8.58%
10Y (ann.) 12.48% 12.49% 10.40% 10.41%
Since Incep. (ann.) 12.11% 12.14% 9.96% 9.96%
Daily Sharpe 0.76 0.77 0.67 0.65
Daily Sortino 1.17 1.18 1.01 1.00
Daily Mean (ann.) 12.92% 12.91% 10.85% 10.92%
Daily Vol (ann.) 17.03% 16.83% 16.30% 16.67%
Daily Skew -0.49 -0.58 -0.59 -0.45
Daily Kurt 19.01 19.74 20.38 20.59
Best Day 10.04% 10.18% 9.71% 10.62%
Worst Day -12.05% -11.95% -11.81% -11.98%
Monthly Sharpe 0.86 0.85 0.76 0.74
Monthly Sortino 1.49 1.48 1.34 1.30
Monthly Mean (ann.) 12.31% 12.36% 10.47% 10.56%
Monthly Vol (ann.) 14.34% 14.55% 13.70% 14.30%
Monthly Skew -0.49 -0.46 -0.28 -0.35
Monthly Kurt 1.09 1.58 1.48 0.78
Best Month 11.82% 12.99% 13.36% 11.82%
Worst Month -13.41% -14.63% -12.42% -12.44%
Yearly Sharpe 0.94 1.00 0.88 0.96
Yearly Sortino 3.07 4.40 7.93 3.77
Yearly Mean 12.09% 12.00% 9.91% 9.86%
Yearly Vol 12.93% 12.00% 11.22% 10.27%
Yearly Skew -0.61 -0.14 0.80 -0.08
Yearly Kurt -0.01 -0.15 -0.23 0.06
Best Year 29.44% 31.33% 31.07% 26.58%
Worst Year -12.17% -7.52% -3.89% -8.66%
Avg. Drawdown -1.81% -1.79% -1.78% -2.46%
Avg. Drawdown Days 18.36 18.32 19.85 25.66
Avg. Up Month 3.20% 3.29% 3.12% 3.42%
Avg. Down Month -3.59% -3.43% -2.86% -2.93%
Win Year % 81.82% 81.82% 81.82% 81.82%
Win 12m % 88.70% 89.57% 87.83% 90.43%
Compare these results to the zero-cost versions above. Transaction costs hurt the more active strategies (momentum, SMA crossover) more than buy-and-hold. This is a fundamental insight: the more you trade, the more you pay. Your strategy needs to generate enough alpha to overcome those costs. The buy-and-hold strategy barely notices, since it only trades once.
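We can put numbers on that comparison using the CAGR rows from the two tables. The figures below are copied from the output above. Since the tables round to two decimals, each result could be off by about 1 bp.

```python
# CAGR copied from the zero-cost table
cagr_gross = {"buy_and_hold": 0.1212, "ew_monthly": 0.1218,
              "sma_crossover": 0.1032, "momentum": 0.1051}

# CAGR copied from the 10 bps table
cagr_net = {"buy_and_hold": 0.1211, "ew_monthly": 0.1214,
            "sma_crossover": 0.0996, "momentum": 0.0996}

# Annual cost drag in basis points (1 bp = 0.01%)
drag_bps = {name: round((cagr_gross[name] - cagr_net[name]) * 10_000)
            for name in cagr_gross}

# Print from least to most affected
for name, bps in sorted(drag_bps.items(), key=lambda kv: kv[1]):
    print(f"{name:<14} {bps:>3} bps per year")
```

Buy-and-hold gives up about 1 bp a year and monthly equal weight about 4 bps, while the SMA crossover loses roughly 36 bps and momentum roughly 55 bps.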
In-Sample vs. Out-of-Sample Testing#
One of the most important concepts in backtesting is the distinction between in-sample and out-of-sample data.
In-sample (training): the data you use to develop and calibrate your strategy. You look at the data, spot patterns, and design your rules.
Out-of-sample (testing): data that you set aside and don’t look at until your strategy is finalized. This tells you how the strategy would have performed on data it’s never seen.
Why does this matter? Because it’s easy to design a strategy that works on data you’ve already seen. That’s just curve fitting. The real test is whether your strategy generalizes to new data.
Let’s split our sector data in half and see how our strategies perform in each period. We’ll use the first half to “develop” our strategy (in-sample) and the second half to test it (out-of-sample). Note that our simple moving average strategy doesn’t have parameters that we optimized on the in-sample data – we picked 50 and 200 days based on convention, not because we searched for the best values. But even so, it’s instructive to see how performance differs across time periods.
# Split data roughly in half
midpoint = sector_data.index[len(sector_data) // 2]
print(f"Split date: {midpoint.strftime('%Y-%m-%d')}")
print(f"In-sample: {sector_data.index[0].strftime('%Y-%m-%d')} to {midpoint.strftime('%Y-%m-%d')}")
print(f"Out-of-sample: {midpoint.strftime('%Y-%m-%d')} to {sector_data.index[-1].strftime('%Y-%m-%d')}")
# Note: label-based .loc slicing includes both endpoints, so the split date appears in both halves
data_in = sector_data.loc[:midpoint].copy()
data_out = sector_data.loc[midpoint:].copy()
Split date: 2020-12-21
In-sample: 2015-10-08 to 2020-12-21
Out-of-sample: 2020-12-21 to 2026-03-11
We need to redefine our strategies for each sub-period. The bt package requires new strategy objects for each backtest (you can’t reuse the same strategy object). Note that for the SMA crossover strategy, the signal is recalculated from the data in each sub-period.
# Recalculate signals for each sub-period
signal_in = data_in.rolling(50).mean() > data_in.rolling(200).mean()
signal_out = data_out.rolling(50).mean() > data_out.rolling(200).mean()
# In-sample strategies
bh_in = bt.Strategy('buy_and_hold', [bt.algos.RunOnce(), bt.algos.SelectAll(),
                                     bt.algos.WeighEqually(), bt.algos.Rebalance()])
cross_in = bt.Strategy('sma_crossover', [bt.algos.RunMonthly(), bt.algos.SelectWhere(signal_in),
                                         bt.algos.WeighEqually(), bt.algos.Rebalance()])
# Out-of-sample strategies
bh_out = bt.Strategy('buy_and_hold', [bt.algos.RunOnce(), bt.algos.SelectAll(),
                                      bt.algos.WeighEqually(), bt.algos.Rebalance()])
cross_out = bt.Strategy('sma_crossover', [bt.algos.RunMonthly(), bt.algos.SelectWhere(signal_out),
                                          bt.algos.WeighEqually(), bt.algos.Rebalance()])
# Run backtests
res_in = bt.run(bt.Backtest(bh_in, data_in), bt.Backtest(cross_in, data_in))
res_out = bt.run(bt.Backtest(bh_out, data_out), bt.Backtest(cross_out, data_out))
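One caveat with recomputing the signal inside each sub-period: a 200-day average is undefined until 200 prices are available, so the comparison is False for the first 199 rows of each half and the crossover strategy has nothing to select during that stretch. A possible alternative, sketched below on simulated prices (the tickers `AAA` and `BBB` and all values are hypothetical), is to compute the signal once on the full history and then slice it. The averages only use prices that were already known on each date, so this should not introduce look-ahead bias.

```python
import numpy as np
import pandas as pd

# Simulated prices standing in for sector_data (hypothetical tickers and values)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2016-01-01", periods=1000)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, size=(1000, 2)), axis=0)),
    index=dates,
    columns=["AAA", "BBB"],
)

midpoint = prices.index[len(prices) // 2]
prices_out = prices.loc[midpoint:]

# Option 1: recompute inside the sub-period (what the cell above does)
slow_out = prices_out.rolling(200).mean()
signal_recomputed = prices_out.rolling(50).mean() > slow_out

# Option 2: compute on the full history, then slice to the sub-period
slow_full = prices.rolling(200).mean()
signal_full = prices.rolling(50).mean() > slow_full
signal_sliced = signal_full.loc[midpoint:]

# Rows at the start of the sub-period where the slow average is not yet defined
warmup_rows = int(slow_out["AAA"].isna().sum())
print(f"Warm-up rows when recomputing: {warmup_rows}")
print(f"Warm-up rows when slicing:     {int(slow_full.loc[midpoint:, 'AAA'].isna().sum())}")
```

Either choice is defensible. Recomputing keeps the two halves fully independent, while slicing gives the out-of-sample test a usable signal from its first day.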
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
res_in.plot(ax=axes[0])
axes[0].set_title('In-Sample Performance')
res_out.plot(ax=axes[1])
axes[1].set_title('Out-of-Sample Performance')
plt.tight_layout()
plt.show()
print("=== In-Sample Results ===")
res_in.display()
print("\n=== Out-of-Sample Results ===")
res_out.display()
=== In-Sample Results ===
Stat buy_and_hold sma_crossover
------------------- -------------- ---------------
Start 2015-10-07 2015-10-07
End 2020-12-21 2020-12-21
Risk-free rate 0.00% 0.00%
Total Return 79.76% 56.92%
Daily Sharpe 0.70 0.59
Daily Sortino 1.04 0.86
CAGR 11.92% 9.04%
Max Drawdown -35.71% -34.67%
Calmar Ratio 0.33 0.26
MTD 1.41% 0.98%
3m 12.09% 10.18%
6m 18.60% 21.65%
YTD 10.49% 6.72%
1Y 10.96% 7.14%
3Y (ann.) 10.18% 9.35%
5Y (ann.) 12.60% 9.43%
10Y (ann.) - -
Since Incep. (ann.) 11.92% 9.04%
Daily Sharpe 0.70 0.59
Daily Sortino 1.04 0.86
Daily Mean (ann.) 13.01% 10.16%
Daily Vol (ann.) 18.58% 17.27%
Daily Skew -0.77 -0.79
Daily Kurt 23.61 29.38
Best Day 10.04% 9.71%
Worst Day -12.05% -11.81%
Monthly Sharpe 0.83 0.74
Monthly Sortino 1.35 1.18
Monthly Mean (ann.) 11.99% 9.60%
Monthly Vol (ann.) 14.39% 13.02%
Monthly Skew -0.65 -0.61
Monthly Kurt 2.69 2.85
Best Month 11.82% 10.15%
Worst Month -13.41% -12.42%
Yearly Sharpe 1.01 0.81
Yearly Sortino 4.74 6.36
Yearly Mean 12.86% 9.98%
Yearly Vol 12.74% 12.35%
Yearly Skew -0.53 0.49
Yearly Kurt 1.18 -1.47
Best Year 28.74% 26.74%
Worst Year -6.07% -3.51%
Avg. Drawdown -1.91% -1.61%
Avg. Drawdown Days 18.81 17.88
Avg. Up Month 2.80% 2.77%
Avg. Down Month -3.77% -2.32%
Win Year % 80.00% 80.00%
Win 12m % 96.15% 86.54%
=== Out-of-Sample Results ===
Stat buy_and_hold sma_crossover
------------------- -------------- ---------------
Start 2020-12-20 2020-12-20
End 2026-03-11 2026-03-11
Risk-free rate 0.00% 0.00%
Total Return 89.63% 38.29%
Daily Sharpe 0.91 0.50
Daily Sortino 1.47 0.78
CAGR 13.04% 6.41%
Max Drawdown -17.56% -16.89%
Calmar Ratio 0.74 0.38
MTD -2.69% -3.30%
3m 5.04% 3.62%
6m 8.13% 5.74%
YTD 5.75% 4.08%
1Y 19.88% 14.79%
3Y (ann.) 15.74% 13.26%
5Y (ann.) 11.57% 6.70%
10Y (ann.) - -
Since Incep. (ann.) 13.04% 6.41%
Daily Sharpe 0.91 0.50
Daily Sortino 1.47 0.78
Daily Mean (ann.) 13.40% 7.28%
Daily Vol (ann.) 14.77% 14.47%
Daily Skew -0.14 -0.25
Daily Kurt 6.45 6.34
Best Day 7.56% 6.89%
Worst Day -6.32% -5.90%
Monthly Sharpe 0.90 0.52
Monthly Sortino 1.73 0.95
Monthly Mean (ann.) 12.96% 7.10%
Monthly Vol (ann.) 14.32% 13.67%
Monthly Skew -0.31 0.15
Monthly Kurt -0.04 1.43
Best Month 10.52% 13.36%
Worst Month -9.37% -10.41%
Yearly Sharpe 0.99 1.15
Yearly Sortino 5.67 9.32
Yearly Mean 11.46% 5.65%
Yearly Vol 11.61% 4.89%
Yearly Skew 0.46 0.22
Yearly Kurt 1.74 1.11
Best Year 30.59% 13.27%
Worst Year -4.95% -1.48%
Avg. Drawdown -1.73% -2.21%
Avg. Drawdown Days 17.35 26.48
Avg. Up Month 3.63% 3.47%
Avg. Down Month -3.36% -2.38%
Win Year % 83.33% 83.33%
Win 12m % 84.91% 83.02%
Look at the Sharpe ratios and total returns across the two periods. Does the SMA crossover strategy perform consistently? Or does it do well in one period and poorly in another? Here it trails buy-and-hold in both halves, and the gap widens out-of-sample: the daily Sharpe is 0.59 versus 0.70 in-sample and 0.50 versus 0.91 out-of-sample. This kind of analysis is essential before putting real money behind a strategy. If performance is wildly different across periods, your strategy may be capturing noise rather than a real signal.
In practice, more sophisticated approaches use walk-forward analysis: you train on a rolling window of data, test on the next period, then roll forward. This gives you a series of out-of-sample results that better reflect how the strategy would have performed in real time.
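The rolling train/test scheme can be sketched with a small generator. This is a minimal illustration, not a bt feature: `walk_forward_splits` is a hypothetical helper, and the window sizes (three years of training, one year of testing, on roughly ten years of daily data) are assumptions.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n_obs: int, train_size: int, test_size: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) positional ranges that roll forward by one test window."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(train.stop, train.stop + test_size)  # test starts right after train
        yield train, test
        start += test_size


# Assumed sizes: ~10 years of daily data, 3-year training window, 1-year test window
splits = list(walk_forward_splits(n_obs=2520, train_size=756, test_size=252))
for train, test in splits:
    print(f"train rows {train.start}-{train.stop - 1}, test rows {test.start}-{test.stop - 1}")
```

With a pandas DataFrame, each window could be selected with `data.iloc[train.start:train.stop]` and `data.iloc[test.start:test.stop]`. The test windows never overlap, so stitching their results together gives one continuous out-of-sample track record.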
Working with BT Results#
The bt results object contains a lot of useful data beyond the summary statistics. You can access the underlying return series, look at individual strategy stats as DataFrames, and create custom analyses.
Accessing Returns and Prices#
You can pull out the strategy’s daily prices and returns as pandas objects for your own analysis.
# The .prices attribute gives you the daily portfolio value (indexed to 100)
prices = sector_res.prices
print(type(prices))
prices.tail()
<class 'pandas.core.frame.DataFrame'>
| | buy_and_hold | ew_monthly | sma_crossover | momentum |
|---|---|---|---|---|
| 2026-03-05 | 334.078618 | 336.501165 | 282.621794 | 286.749326 |
| 2026-03-06 | 330.070466 | 333.166376 | 279.821003 | 284.422649 |
| 2026-03-09 | 331.929283 | 334.341190 | 280.807624 | 285.353957 |
| 2026-03-10 | 330.659170 | 332.845379 | 279.551334 | 283.412666 |
| 2026-03-11 | 329.627855 | 331.513076 | 278.432352 | 283.424268 |
# Get the stats as a DataFrame for custom analysis
stats_df = sector_res.stats
stats_df.loc[['total_return', 'cagr', 'daily_sharpe', 'max_drawdown', 'daily_vol']]
| | buy_and_hold | ew_monthly | sma_crossover | momentum |
|---|---|---|---|---|
| total_return | 2.296279 | 2.315131 | 1.784324 | 1.834243 |
| cagr | 0.12121 | 0.121824 | 0.103204 | 0.105086 |
| daily_sharpe | 0.759433 | 0.769503 | 0.686576 | 0.684898 |
| max_drawdown | -0.357105 | -0.368802 | -0.34666 | -0.358264 |
| daily_vol | 0.170263 | 0.168333 | 0.16289 | 0.166687 |
Rolling Sharpe Ratio#
A strategy’s Sharpe ratio can change dramatically over time. Looking at a rolling Sharpe ratio helps you understand whether the strategy’s risk-adjusted performance is stable or if it goes through long stretches of underperformance. Let’s calculate a 1-year rolling Sharpe for each strategy using the daily returns.
# Calculate daily returns from the strategy prices
daily_returns = sector_res.prices.pct_change().dropna()
# Rolling 1-year (252 trading days) Sharpe ratio
rolling_sharpe = (daily_returns.rolling(252).mean() / daily_returns.rolling(252).std()) * np.sqrt(252)
fig, ax = plt.subplots(figsize=(12, 5))
rolling_sharpe.plot(ax=ax, linewidth=0.8)
ax.axhline(y=0, color='black', linestyle='--', linewidth=0.5)
ax.set_title('Rolling 1-Year Sharpe Ratio')
ax.set_ylabel('Sharpe Ratio')
ax.legend(loc='lower left')
plt.tight_layout()
plt.show()
Notice how much the Sharpe ratio bounces around. A strategy with a great overall Sharpe might have long periods where it’s negative. This is the reality of trading – even good strategies go through painful drawdowns. The question is whether you have the discipline (and the capital) to stick with it.
Return Distributions#
Let’s look at how the monthly returns are distributed for each strategy. This gives you a sense of the shape of returns – are they symmetric? Do they have fat tails? Strategies with similar average returns can have very different distributions.
# Monthly returns for each strategy
monthly_returns = sector_res.prices.resample('ME').last().pct_change().dropna()
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
for ax, col in zip(axes.flat, monthly_returns.columns):
    ax.hist(monthly_returns[col], bins=30, edgecolor='black', alpha=0.7)
    ax.set_title(col)
    ax.set_xlabel('Monthly Return')
    ax.axvline(x=0, color='red', linestyle='--', linewidth=0.8)
plt.suptitle('Distribution of Monthly Returns', y=1.02)
plt.tight_layout()
plt.show()
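The histograms can be paired with a few summary numbers that answer the same questions about symmetry and fat tails. The sketch below uses simulated monthly returns (the column names and distributions are hypothetical) so that it runs on its own. With the real results, the same calls could be applied to the `monthly_returns` DataFrame above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simulated monthly returns: one roughly normal series, one with heavier tails
monthly = pd.DataFrame({
    "symmetric": rng.normal(0.01, 0.04, 120),
    "fat_tailed": rng.standard_t(df=3, size=120) * 0.02 + 0.01,
})

# pandas reports excess kurtosis, so a normal distribution is close to 0
shape = pd.DataFrame({
    "mean": monthly.mean(),
    "std": monthly.std(),
    "skew": monthly.skew(),
    "excess_kurtosis": monthly.kurt(),
    "worst_month": monthly.min(),
})
print(shape.round(3))
```

Negative skew means the left tail is longer (occasional large losses), and a large positive excess kurtosis points to fat tails.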
Key Takeaways#
Let’s summarize what we’ve learned about backtesting.
Always compare to a benchmark. It doesn’t matter if your strategy makes money in absolute terms. What matters is whether it makes money relative to doing nothing (or something simple, like buy-and-hold). If your complex strategy can’t beat equal weight, it’s not worth the complexity.
Transaction costs matter. A strategy that looks great with zero costs may look mediocre or bad once you account for real-world trading frictions. Always include realistic cost assumptions.
Test out-of-sample. In-sample performance tells you how well your strategy fits the past. Out-of-sample performance tells you whether it might work in the future. The gap between the two is a rough gauge of how much you’ve overfit, though part of it can simply reflect a different market environment.
Simpler is usually better. Strategies with fewer parameters are harder to overfit and more likely to work going forward. A 50/200-day moving average crossover has two parameters, but neither was tuned to our data (those numbers are industry convention, not optimized), so there is little room to overfit.
Look beyond total return. Sharpe ratio, maximum drawdown, turnover, and how performance varies over time all matter. A strategy that returns 15% per year with a 50% max drawdown is very different from one that returns 10% with a 15% max drawdown.
Be skeptical. Most backtested strategies don’t work in real life. Survivorship bias, look-ahead bias, transaction costs, and regime changes all conspire against you. If a strategy seems too good to be true, it probably is.
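The "look beyond total return" comparison can be put into numbers with the Calmar ratio, which divides annualized return by the size of the maximum drawdown. This is a minimal sketch: `max_drawdown` and `calmar` are hypothetical helpers written for illustration, and the short price path is made up.

```python
def max_drawdown(prices):
    """Largest peak-to-trough decline, as a negative fraction."""
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)                 # highest value seen so far
        worst = min(worst, p / peak - 1.0)  # decline from that peak
    return worst


def calmar(cagr, max_dd):
    """Annualized return per unit of maximum drawdown."""
    return cagr / abs(max_dd)


# Made-up price path: peaks at 120, falls to 90, recovers, then dips again
path = [100, 120, 90, 110, 130, 104]
mdd = max_drawdown(path)
print(f"Max drawdown: {mdd:.2%}")

# The two strategies from the takeaway above
print(f"15% return, 50% drawdown: Calmar {calmar(0.15, -0.50):.2f}")
print(f"10% return, 15% drawdown: Calmar {calmar(0.10, -0.15):.2f}")
```

The lower-return strategy earns more than twice as much per unit of drawdown. As a sanity check against the tables above, the in-sample buy-and-hold row reports a CAGR of 11.92% and a max drawdown of -35.71%, and 11.92 / 35.71 is about 0.33, which matches its reported Calmar ratio.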
Other Tools#
The bt package is just one of many backtesting frameworks out there. I chose it because the declarative, algo-tree approach lets you focus on what a strategy does rather than wiring up an event loop. That makes it a good fit for learning. But it’s worth knowing what else is available.
Zipline#
Zipline was originally built by Quantopian, a now-defunct platform that let retail investors build and test trading strategies. It is now maintained as zipline-reloaded. Zipline uses an event-driven architecture: your code processes data bar-by-bar, giving you much more granular control over order types, slippage models, and execution logic. The trade-off is a steeper learning curve and heavier dependencies. Related tools from the same ecosystem include alphalens for evaluating how well a factor predicts returns, and pyfolio for creating performance and risk tear sheets.
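To give a feel for what "event-driven" means, here is a toy bar-by-bar loop in plain Python. It is not Zipline's API. The `Portfolio` class, `handle_bar` callback, and price list are all hypothetical, and for simplicity it fills orders at the same close that generated the signal, which a real framework would not do.

```python
from dataclasses import dataclass, field


@dataclass
class Portfolio:
    cash: float
    shares: int = 0
    history: list = field(default_factory=list)


def handle_bar(portfolio, closes_so_far, fast=3, slow=5):
    """Hypothetical per-bar callback: hold the asset when the fast average is above the slow one."""
    if len(closes_so_far) < slow:
        return  # not enough history yet
    price = closes_so_far[-1]
    fast_avg = sum(closes_so_far[-fast:]) / fast
    slow_avg = sum(closes_so_far[-slow:]) / slow
    if fast_avg > slow_avg and portfolio.shares == 0:
        portfolio.shares = int(portfolio.cash // price)  # buy whole shares only
        portfolio.cash -= portfolio.shares * price
    elif fast_avg <= slow_avg and portfolio.shares > 0:
        portfolio.cash += portfolio.shares * price       # sell everything
        portfolio.shares = 0


def run_event_loop(closes, starting_cash=10_000.0):
    portfolio = Portfolio(cash=starting_cash)
    for i in range(len(closes)):
        handle_bar(portfolio, closes[: i + 1])  # the callback only sees data up to bar i
        portfolio.history.append(portfolio.cash + portfolio.shares * closes[i])
    return portfolio


closes = [10, 10, 10, 10, 10, 11, 12, 13, 12, 10, 9, 8]
result = run_event_loop(closes)
print(f"Final cash: {result.cash:.2f}, shares held: {result.shares}")
```

In bt, the same idea is a list of algos. Here you write the loop, the order sizing, and the bookkeeping yourself, which is where both the extra control and the extra work come from.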
Backtrader#
Backtrader is another event-driven framework, similar in philosophy to Zipline. It is more flexible than bt for complex order logic and supports live trading connections to brokers like Interactive Brokers. It has good documentation and an active community, but the API is more complex.
Backtesting.py#
There is also a package simply called Backtesting. It is lightweight and easy to get started with, though less feature-rich than the others.
How Do They Compare?#
| Feature | bt | Zipline | Backtrader |
|---|---|---|---|
| Approach | Declarative (algo trees) | Event-driven | Event-driven |
| Best for | Asset allocation, strategy comparison | Granular trade simulation | Complex strategies, live trading |
| Learning curve | Low | High | Medium |
| Live trading | No | No | Yes (Interactive Brokers) |
| Trade-level control | Limited | Extensive | Extensive |
For this course, bt is the right choice — the simple API lets us focus on the ideas behind backtesting rather than implementation details. If you move into more serious quantitative work, Zipline or Backtrader give you the control you would need.
Trading firms and hedge funds will have developed most of this in-house. For example, Goldman Sachs has their own tools.