2.10. Using ChatGPT

There are now many AI-based tools to help us code. These tools can create code outlines for us, check our own code, write functions, and explain the steps. I’m going to show you a few examples using the free OpenAI ChatGPT tools.

If you use GitHub and VS Code, you can sign up for Copilot.

This paper by Liu, Miller, and Niu (2023) has some great examples of how to use ChatGPT in a course like this one.

Note

There’s a trade-off when using AI code assistants. They will make you faster and check your syntax. They work well for simple tasks in an introductory course like this. But, at some point, you want to understand the code. You’ll need to tweak it to get it to work. And, if you’re doing more interesting coding, they may be of limited use. So use them, but with these caveats in mind!

2.10.1. Conversations

I think the best way to approach AI code assistance is as a conversation. Using a tool like GitHub Copilot or OpenAI ChatGPT, you can convert natural language to code. For example, you can code by “talking” to your notebook:

I would like to use pandas to import an Excel file from the following URL…. Skip the first 3 rows of the Excel file, since they are empty. Make sure you include any external libraries that I need.

Take my data frame called loans and drop the following columns…. Remove any text from the column called Terms and convert it into an integer.

Create a table of summary statistics for all of the numerical columns in the data frame called loans.
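To make this concrete, here is a minimal sketch of the kind of code the second and third prompts might produce. The small `loans` data frame and every column name other than `Terms` are hypothetical stand-ins that I made up for illustration; your own data and the tool's actual output will differ.

```python
import pandas as pd

# Hypothetical stand-in for the loans data frame.
# Only the column name "Terms" comes from the prompt; the rest are made up.
loans = pd.DataFrame({
    "LoanID": [1, 2, 3],
    "Notes": ["a", "b", "c"],
    "Terms": ["36 months", "60 months", "36 months"],
    "Amount": [10000.0, 25000.0, 7500.0],
})

# Prompt 2a: drop columns we do not need (hypothetical column names)
loans = loans.drop(columns=["LoanID", "Notes"])

# Prompt 2b: remove any non-digit text from Terms and convert to an integer
loans["Terms"] = loans["Terms"].str.replace(r"\D", "", regex=True).astype(int)

# Prompt 3: summary statistics for the numerical columns
summary = loans.describe()
print(summary)
```

Notice that each prompt maps to one or two lines of pandas. That is about the right size for a single request.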

Each of those conversations will result in some Python code. The keys to using these tools:

  • Understand the tasks you need to complete. Pay attention to the order of tasks. Have a plan.

  • Be able to read the output. It may not work exactly the way you want it to, and “tweaking” the code is sometimes faster than asking the AI tools again.

  • You still need to understand basic Python and computer science terms and logic, such as arrays and data frames. If you don’t, you won’t be able to read the code or understand its output.

  • Be specific. These tools are great for common tasks. Maybe even some more difficult tasks. They will struggle with larger chains of logic, where the LLM has to make assumptions about exactly what you want.

2.10.2. An Example: Using GPT to Write a Function For You

Let’s go to Chapter 7, where I show you how to write a function to generate asset paths using geometric Brownian motion. We use the Nasdaq API to bring in some Bitcoin data in that example. Now, I’m going to use ChatGPT to write the same function. I gave GPT-4 the following prompt:

Write a function in Python that simulates geometric brownian motion for any number of assets. Create the appropriate inputs as arrays. The inputs should be the number of assets, the mean for each asset, the standard deviation for each asset, the number of periods, the initial value of each asset, and the number of simulations. The output should be an array that contains all of the simulations for each asset. These are uncorrelated assets, so don’t worry about covariance.

Here’s the code that it gave me:

import numpy as np

def geometric_brownian_motion(num_assets, means, std_devs, num_periods, initial_values, num_simulations):
    if len(means) != num_assets or len(std_devs) != num_assets or len(initial_values) != num_assets:
        raise ValueError("The length of means, std_devs, and initial_values should be equal to the number of assets")

    # Initialize an array to store the simulations
    simulations = np.zeros((num_simulations, num_periods + 1, num_assets))

    # Set the initial values for each simulation
    simulations[:, 0, :] = initial_values

    # Time step
    dt = 1.0

    # Loop through each simulation
    for sim in range(num_simulations):
        # Loop through each time period
        for t in range(1, num_periods + 1):
            # Loop through each asset
            for asset in range(num_assets):
                # Generate a random normal value (using mean=0 and std_dev=1) for the brownian motion
                random_normal = np.random.normal(0, 1)

                # Calculate the change in asset value using geometric brownian motion
                change = means[asset] * dt + std_devs[asset] * np.sqrt(dt) * random_normal

                # Update the asset value
                simulations[sim, t, asset] = simulations[sim, t - 1, asset] * np.exp(change)

    return simulations

# Example usage
num_assets = 3
means = np.array([0.05, 0.06, 0.07])
std_devs = np.array([0.1, 0.12, 0.14])
num_periods = 100
initial_values = np.array([100, 120, 140])
num_simulations = 1000

simulations = geometric_brownian_motion(num_assets, means, std_devs, num_periods, initial_values, num_simulations)
print(simulations)
[[[1.00000000e+02 1.20000000e+02 1.40000000e+02]
  [1.02929664e+02 1.05463766e+02 1.88347739e+02]
  [1.26298445e+02 1.15429411e+02 1.90038318e+02]
  ...
  [3.13810593e+04 9.62854679e+03 1.13608126e+04]
  [3.35416446e+04 1.07412723e+04 1.25039299e+04]
  [3.63093825e+04 1.02685077e+04 1.26789660e+04]]

 [[1.00000000e+02 1.20000000e+02 1.40000000e+02]
  [1.34586418e+02 1.49023965e+02 1.83480526e+02]
  [1.22884532e+02 1.33513847e+02 1.71822420e+02]
  ...
  [7.79701603e+03 9.76412191e+03 4.15240009e+04]
  [8.03426865e+03 9.77732865e+03 4.18765411e+04]
  [8.70305331e+03 9.79035293e+03 4.77476589e+04]]

 [[1.00000000e+02 1.20000000e+02 1.40000000e+02]
  [1.23740412e+02 1.10775749e+02 1.47609104e+02]
  [1.27686679e+02 1.24284840e+02 1.63613068e+02]
  ...
  [6.05788912e+04 1.08258874e+04 1.07767911e+06]
  [7.82820954e+04 9.89829508e+03 1.01638615e+06]
  [7.77441925e+04 1.17681785e+04 1.01079320e+06]]

 ...

 [[1.00000000e+02 1.20000000e+02 1.40000000e+02]
  [9.65061130e+01 1.73906626e+02 1.77412340e+02]
  [1.09171705e+02 2.29343378e+02 1.60437272e+02]
  ...
  [2.94910767e+04 1.07819190e+05 5.86923425e+05]
  [3.42201494e+04 1.60067287e+05 5.81312160e+05]
  [3.93512326e+04 1.70581662e+05 6.48619485e+05]]

 [[1.00000000e+02 1.20000000e+02 1.40000000e+02]
  [1.05347370e+02 1.25786943e+02 1.45148661e+02]
  [1.05093070e+02 1.60792124e+02 1.56386570e+02]
  ...
  [9.85797276e+04 6.05975929e+04 6.05000631e+05]
  [1.02473546e+05 7.55685129e+04 4.47534496e+05]
  [1.13262780e+05 8.85127220e+04 4.85552367e+05]]

 [[1.00000000e+02 1.20000000e+02 1.40000000e+02]
  [8.13095906e+01 1.34684534e+02 1.89935393e+02]
  [9.86859785e+01 1.36430551e+02 2.23001443e+02]
  ...
  [9.87015199e+03 4.05276592e+04 2.05093735e+05]
  [9.76601708e+03 4.83110767e+04 1.95330188e+05]
  [8.98441218e+03 4.96800238e+04 2.00710016e+05]]]

Here’s the GPT explanation of the code.

This function accepts the following inputs:

num_assets: the number of assets
means: a NumPy array containing the mean return for each asset
std_devs: a NumPy array containing the standard deviation of return for each asset
num_periods: the number of time periods
initial_values: a NumPy array containing the initial value for each asset
num_simulations: the number of simulations to run

The function returns a 3D NumPy array containing the simulations for each asset, with dimensions (num_simulations, num_periods + 1, num_assets).

Let’s look a bit more at the output. I’ll get its .shape attribute.

type(simulations)
numpy.ndarray
simulations.shape
(1000, 101, 3)

The output is a three-dimensional array. We can pull out all of the simulations for one asset, as well as a single simulation for one asset. Or anything else.
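Because each axis has a fixed meaning (simulation, period, asset), other slices follow the same pattern. Here is a minimal sketch on a tiny stand-in array, so the results are easy to verify by eye. The name `toy` and its values are made up for illustration and are not part of the simulation output.

```python
import numpy as np

# Tiny stand-in array with the same axis order: (simulation, period, asset)
# 2 simulations, 4 periods, 3 assets
toy = np.arange(24).reshape(2, 4, 3)

# Last period for every simulation and every asset
terminal_values = toy[:, -1, :]

# Average across simulations, for each period and asset
mean_path = toy.mean(axis=0)

# Full path of the second asset in the first simulation
second_asset_first_sim = toy[0, :, 1]

print(terminal_values.shape)            # (2, 3)
print(mean_path.shape)                  # (4, 3)
print(second_asset_first_sim.tolist())  # [1, 4, 7, 10]
```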

# All simulations for the first asset.
simulations[:,:,0]
array([[1.00000000e+02, 1.02929664e+02, 1.26298445e+02, ...,
        3.13810593e+04, 3.35416446e+04, 3.63093825e+04],
       [1.00000000e+02, 1.34586418e+02, 1.22884532e+02, ...,
        7.79701603e+03, 8.03426865e+03, 8.70305331e+03],
       [1.00000000e+02, 1.23740412e+02, 1.27686679e+02, ...,
        6.05788912e+04, 7.82820954e+04, 7.77441925e+04],
       ...,
       [1.00000000e+02, 9.65061130e+01, 1.09171705e+02, ...,
        2.94910767e+04, 3.42201494e+04, 3.93512326e+04],
       [1.00000000e+02, 1.05347370e+02, 1.05093070e+02, ...,
        9.85797276e+04, 1.02473546e+05, 1.13262780e+05],
       [1.00000000e+02, 8.13095906e+01, 9.86859785e+01, ...,
        9.87015199e+03, 9.76601708e+03, 8.98441218e+03]])
# All simulations for the second asset.
simulations[:,:,1]
array([[1.20000000e+02, 1.05463766e+02, 1.15429411e+02, ...,
        9.62854679e+03, 1.07412723e+04, 1.02685077e+04],
       [1.20000000e+02, 1.49023965e+02, 1.33513847e+02, ...,
        9.76412191e+03, 9.77732865e+03, 9.79035293e+03],
       [1.20000000e+02, 1.10775749e+02, 1.24284840e+02, ...,
        1.08258874e+04, 9.89829508e+03, 1.17681785e+04],
       ...,
       [1.20000000e+02, 1.73906626e+02, 2.29343378e+02, ...,
        1.07819190e+05, 1.60067287e+05, 1.70581662e+05],
       [1.20000000e+02, 1.25786943e+02, 1.60792124e+02, ...,
        6.05975929e+04, 7.55685129e+04, 8.85127220e+04],
       [1.20000000e+02, 1.34684534e+02, 1.36430551e+02, ...,
        4.05276592e+04, 4.83110767e+04, 4.96800238e+04]])
# First simulation for the first asset

simulations[0,:,0]
array([  100.        ,   102.92966436,   126.29844544,   125.06446567,
         125.8846719 ,   122.01999728,   127.53250484,   138.86461519,
         140.14709667,   121.51665954,   117.40928719,   125.59585797,
         134.65948361,   131.26578467,   160.72625291,   173.34587625,
         200.6082308 ,   205.80934545,   221.6795055 ,   267.77390481,
         315.33186488,   342.43948443,   397.75136654,   420.03316103,
         413.75256123,   336.72431995,   379.54642894,   360.22696672,
         416.98583138,   445.96922938,   437.78019888,   430.32205609,
         433.59812355,   563.7043857 ,   634.19149713,   647.65177431,
         744.03592062,   768.93838162,   829.49885747,   742.75018452,
         742.42475219,   692.36726696,   767.67080147,   737.28814109,
         860.63084797,   929.27159292,  1139.40639117,  1056.49158184,
        1053.40395538,  1210.08048538,  1270.54434994,  1325.35351794,
        1202.55139323,  1344.60209607,  1337.12847311,  1595.03626945,
        1838.07214469,  1933.55941212,  2014.60873846,  2069.04462498,
        2269.82899669,  2177.62086932,  2661.07384962,  2464.60986131,
        3095.2927805 ,  3224.60493706,  3194.78077101,  3242.91330503,
        3329.49918969,  3489.31369805,  3694.99750235,  4170.94396775,
        3856.39233328,  4208.02438681,  4801.77617056,  5073.14993445,
        5652.2993424 ,  6872.19153543,  6853.21345241,  8298.70885542,
        7053.93167436,  7844.45811302, 10109.38090569, 10070.03143486,
       12353.09781736, 12615.46527693, 13000.87901153, 14347.14052342,
       14892.62313947, 16486.15337724, 18060.33570179, 21537.09666169,
       20072.09544931, 22184.94800886, 26521.09016383, 28310.414426  ,
       27658.11973861, 29665.25475573, 31381.05927638, 33541.64456044,
       36309.38252996])

Compare this code to the one we used in Chapter 7. What’s different? What stands out to you? What assumptions did it make? What’s up with these numbers?

GPT is quite good at writing code for tasks that have been done in a lot of places. It was trained on millions of online tutorials and examples. However, you still want to follow the logic and check for errors. There will be errors.
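One way to check generated code is to compare its output against a number you can work out by hand. The sketch below is my own vectorized rewrite of the same update rule (log change = mean + std_dev × shock, with dt = 1), not code from ChatGPT or from Chapter 7. The function name `simulate_paths` and the `seed` argument are assumptions for illustration. Under this rule, the average log growth over T periods should be close to mean × T.

```python
import numpy as np

def simulate_paths(means, std_devs, num_periods, initial_values,
                   num_simulations, seed=0):
    """Vectorized version of the same update rule (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    std_devs = np.asarray(std_devs, dtype=float)
    initial_values = np.asarray(initial_values, dtype=float)

    # One standard normal draw per simulation, period, and asset
    shocks = rng.standard_normal((num_simulations, num_periods, len(means)))

    # Per-period log change, with dt = 1 as in the generated code
    log_changes = means + std_devs * shocks

    paths = np.empty((num_simulations, num_periods + 1, len(means)))
    paths[:, 0, :] = initial_values
    # Cumulative sum of log changes along the time axis gives each path
    paths[:, 1:, :] = initial_values * np.exp(np.cumsum(log_changes, axis=1))
    return paths

means = np.array([0.05, 0.06, 0.07])
std_devs = np.array([0.1, 0.12, 0.14])
initial_values = np.array([100, 120, 140])
num_periods = 100

paths = simulate_paths(means, std_devs, num_periods, initial_values, 1000)

# Sanity check: simulated average log growth vs. what the rule implies
avg_log_growth = np.log(paths[:, -1, :] / initial_values).mean(axis=0)
expected_log_growth = means * num_periods

# Expected terminal value implied by this rule (mean of a lognormal)
expected_terminal = initial_values * np.exp(
    (means + 0.5 * std_devs**2) * num_periods
)

print(avg_log_growth)
print(expected_log_growth)
print(expected_terminal)
```

With these inputs, a mean of 0.05 to 0.07 per period compounds over 100 periods, so the update rule implies log growth of about 5 to 7. That is consistent with the very large values in the output. Note also that the textbook geometric Brownian motion solution uses (mu - 0.5 * sigma^2) * dt as the drift inside the exponential, while the generated code uses mu * dt. In other words, it treats the mean as the mean of the log return. Whether that is what you want depends on how you define your inputs.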

As mentioned above, you can have a “dialog” with these tools. Look at the code. Is it what you want? You can ask it to change it for you.

What does that leave for us to do? Use GPT to start sketching your code. Figure out the questions that you’re trying to answer. What’s the problem that you’re trying to solve? Who are you solving it for? What will the answer look like? What will it tell you?

Using tools like GPT, GitHub Copilot, etc. is like having a team of junior programmers working for you. They can make the job easier, but it is still up to you to know what’s an interesting question in the first place. You still need to use your domain expertise and creativity to come up with solutions to problems that aren’t in an online Python tutorial.

2.10.3. Using ChatGPT to Check Our Code

Below, I have borrowed some code from our portfolio optimization work. However, there is one small problem, based on one I encountered in class. It’s one of those tiny errors that can be so frustrating! Do you see it?

# Read in some eod prices
import numpy as np
import pandas as pd

from pylab import mpl, plt

import scipy.optimize as sco

raw = pd.read_csv('https://raw.githubusercontent.com/aaiken1/fin-data-analysis-python/main/data/tr_eikon_eod_data.csv',
                  index_col=0, parse_dates=True).dropna() 

symbols = ['AAPL.O', 'MSFT.O', 'SPY', 'GLD'] #two stocks and two ETFs

noa = len(symbols) #noa = number of assets

data = raw[symbols]

rets = data.pct_change().dropna()

noa = 4

weights = np.random.random(noa)
weights /= np.sum(weights)

ann_rets = rets.mean() * 252

def port_ret(weights):
    return np.sum(ann_rets * weights)

def port_vol(weights):
    return np.sort(np.dot(weights.T, np.dot(rets.cov() * 252, weights)))

def sharpe(weights):  
    return port_ret(weights) / port_vol(weights)  

sharpe(weights)
---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
Input In [7], in <cell line: 36>()
     33 def sharpe(weights):  
     34     return port_ret(weights) / port_vol(weights)  
---> 36 sharpe(weights)

Input In [7], in sharpe(weights)
     33 def sharpe(weights):  
---> 34     return port_ret(weights) / port_vol(weights)

Input In [7], in port_vol(weights)
     30 def port_vol(weights):
---> 31     return np.sort(np.dot(weights.T, np.dot(rets.cov() * 252, weights)))

File <__array_function__ internals>:180, in sort(*args, **kwargs)

File /opt/anaconda3/lib/python3.9/site-packages/numpy/core/fromnumeric.py:1004, in sort(a, axis, kind, order)
   1002 else:
   1003     a = asanyarray(a).copy(order="K")
-> 1004 a.sort(axis=axis, kind=kind, order=order)
   1005 return a

AxisError: axis -1 is out of bounds for array of dimension 0

Huh… Let’s see if ChatGPT can help us out. I’m going to copy and paste the code and ask it to find the mistake. It does!

../_images/17-gpt-find-error.png

Fig. 2.40 You can use ChatGPT to find errors in your code.
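For reference, the traceback points at `np.sort` inside `port_vol`. The quantity being passed in is a single number (the portfolio variance), and NumPy cannot sort a zero-dimensional array, which is why we get the AxisError. Portfolio volatility is the square root of that variance, so the intended call was presumably `np.sqrt`. Here is a minimal sketch of the corrected functions. To keep it self-contained, I use randomly generated returns and equal weights as hypothetical stand-ins for the price data above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the daily returns of four assets
rng = np.random.default_rng(42)
rets = pd.DataFrame(
    rng.normal(0.0005, 0.01, size=(500, 4)),
    columns=["A", "B", "C", "D"],
)

noa = rets.shape[1]               # number of assets
weights = np.full(noa, 1 / noa)   # equal weights for illustration

ann_rets = rets.mean() * 252              # annualized mean returns
ann_cov = (rets.cov() * 252).to_numpy()   # annualized covariance matrix

def port_ret(weights):
    return np.sum(ann_rets * weights)

def port_vol(weights):
    # np.sqrt, not np.sort: volatility is the square root of w' * Cov * w
    return np.sqrt(np.dot(weights.T, np.dot(ann_cov, weights)))

def sharpe(weights):
    return port_ret(weights) / port_vol(weights)

print(sharpe(weights))
```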