Packages#

Packages, like numpy and pandas, expand what Python can do. These two come with the Anaconda install of Python. Packages are sometimes called libraries.

Other packages, like nasdaq-data-link, do not come with Anaconda. We are going to need to go out and get them.

This link from the developers of Python explains how to get packages installed in complete detail on both Mac and Windows machines.

pip and installing packages in VS Code#

If you are in VS Code in GitHub Codespaces, things are pretty straightforward. Inside of a Python code cell, you can type:

pip install package-name

where you fill in package-name. Run that cell and pip will go out, find that package, and install it. You can also run the same command in the terminal window in your Codespace.

pip stands for “pip installs packages”. That’s a computer joke about recursion.

Note

By importing a package, you get access to all of the classes and methods that come with that package. These are important concepts from software engineering and make your code much easier to read and use. Essentially, you get built-in modularity – you don’t have to reinvent the wheel and define, say, an array every time you use one. We’ll talk a bit more about this when we get to object-oriented programming and related concepts.

You will also see many packages that say to use pip to install them. This is where it can get tricky – packages have dependencies, meaning other packages that the package we want relies on. pip may or may not be able to install every package dependency, though it will try.
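To make the idea of dependencies concrete, here is a minimal sketch that lists what an installed package says it needs. It uses importlib.metadata from the standard library (Python 3.8 or later). The helper name declared_dependencies is made up for this example, and pandas is only used because it is assumed to be installed.

```python
from importlib import metadata


def declared_dependencies(package_name):
    """Return the requirement strings a package declares, or [] if none are found."""
    try:
        requirements = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        return []
    # requires() returns None when the package declares no dependencies.
    return requirements or []


# Assumes pandas is installed; prints nothing if it is not.
for requirement in declared_dependencies("pandas"):
    print(requirement)
```

Each line printed is another package, often with a version range, that pip will try to install alongside the one you asked for.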

But, where is it installing the packages? Which version of Python will it be associated with? This is the tricky part and one of the frustrating aspects of Python.

In the terminal in VS Code, you can use the following to see which pip and which Python your Codespace is currently looking at.

which -a pip
which -a python
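You can ask the same question from inside Python. The sketch below prints the interpreter that is actually running your code and the first python and pip found on your PATH. The variable names are just for illustration.

```python
import shutil
import sys

# The interpreter that is running this code.
current_python = sys.executable

# The first "python" and "pip" found on the PATH, or None if there is none.
python_on_path = shutil.which("python")
pip_on_path = shutil.which("pip")

print("Running interpreter:", current_python)
print("python on PATH:     ", python_on_path)
print("pip on PATH:        ", pip_on_path)
```

If the running interpreter and the pip on your PATH live in different folders, packages installed with that pip may not be visible to the Python you are actually using.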

Checking what’s installed#

Before installing a package, you might want to check if it’s already there. Here are some useful commands you can run in the terminal:

See all installed packages:

pip list

This shows every package installed in your environment, along with version numbers. The list can be long!
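If you would rather stay in a notebook cell, something similar can be done in Python. This is a rough sketch using importlib.metadata from the standard library, not a replacement for pip list. It only prints the first ten entries.

```python
from importlib import metadata

# Collect (name, version) pairs for every installed distribution.
installed = sorted(
    (
        (dist.metadata["Name"], dist.version)
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    ),
    key=lambda pair: pair[0].lower(),  # sort by name, ignoring case
)

for name, version in installed[:10]:  # first ten only, the full list can be long
    print(f"{name}=={version}")
```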

Check if a specific package is installed:

pip show pandas

If pandas is installed, you’ll see information about it: version, location, dependencies, etc. If it’s not installed, you’ll see a warning that the package was not found.
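A related check you can run in a code cell is whether Python can find a module at all, without actually importing it. Here is a minimal sketch. The helper name is_importable is made up for this example, and it is meant for top-level names like pandas.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can find a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


print(is_importable("pandas"))
```

Note that this takes the name you would use with import, which is not always the same as the name you use with pip install.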

Check the version of an installed package:

import pandas as pd
print(pd.__version__)

This is useful when documentation or tutorials mention a specific version.
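Not every package defines __version__, so here is an alternative sketch that reads the version from the installed package metadata instead. It also avoids importing the package. The helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version as a string, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("pandas"))
```

Here the argument is the name you would give to pip install, so this should also work for packages whose import name is different.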

Packages and Codespaces#

Warning

Each Codespace is a fresh environment. If you create a new Codespace for a different assignment, you may need to reinstall packages that don’t come pre-installed. This is normal! Just run pip install package-name again.

The good news: GitHub Codespaces comes with many common data science packages already installed, including numpy, pandas, matplotlib, and seaborn. You can start using these immediately with import — no installation needed.

If you get an error like ModuleNotFoundError: No module named 'package-name', that just means you need to install it first with pip install.
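Here is a small sketch of what catching that error looks like in code. The helper try_import is made up for this example. It returns the module if the import works and prints a hint if it does not.

```python
import importlib


def try_import(module_name):
    """Import a module by name, or return None with a hint if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as error:
        # error.name is the module that could not be found.
        print(f"Could not import '{error.name}'. Try: pip install <package-name>")
        return None


pd = try_import("pandas")
```

One thing to watch for: the name in the error message is the import name, which can differ from the name pip uses. For example, as far as I know, nasdaq-data-link is installed with that name but imported as nasdaqdatalink.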

An example from NASDAQ#

Let’s use pip to install the data package for the NASDAQ. You can set up an academic account with them before doing this.

With my account set up, I can install the package by running the following in the terminal of a Codespace or in a Python cell.

pip install nasdaq-data-link

This pip method should work for all of the packages that we’re using. You can use the pip method in any Jupyter notebook, including in VS Code. But every time you run that code cell, pip is going to try to install the package, unless you comment it out. This is why I like using the terminal or command line to install packages when needed.
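If you do want the install to live in your notebook, one option is to only call pip when the package is actually missing. This is a sketch, not the only way to do it. The helper ensure_installed and its run parameter are made up for this example, and the import name nasdaqdatalink is an assumption you should check against the package documentation.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(module_name, package_name, run=subprocess.check_call):
    """Install package_name with pip only if module_name cannot be found.

    Returns True if an install was attempted, False if nothing was needed.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    # Use the running interpreter's own pip so the package lands in this environment.
    run([sys.executable, "-m", "pip", "install", package_name])
    return True


# Assumption: nasdaq-data-link is imported as "nasdaqdatalink".
# Uncomment the next line to install it only when it is missing.
# ensure_installed("nasdaqdatalink", "nasdaq-data-link")
```

Calling pip through sys.executable also helps with the "which Python?" problem, since the package goes to the same interpreter that is running the notebook.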

GitHub Codespaces is a browser-based, remote environment. It comes with some packages, such as numpy and pandas, already installed, and it also lets you install others. In other words, it already knows about numpy and pandas, so you can go ahead and use import.

../_images/02-xkcd.png

Fig. 24 Managing Python installs and where everything is located is notoriously difficult. This is one reason why we are using Codespaces.#

Here’s a nice group of new libraries. We’ll be sticking to the basics, like pandas, in this course.