polars: A fast, fancy pandas alternative
Most data folks use pandas. However, there is an alternative that I just wanted to bring to your attention. polars is a faster and, perhaps, more modern way to handle data in Python. Still, pandas is ubiquitous, so I wanted to start with that.
Here’s the user’s guide. I’m not going to go through every command here. As always, think about what you want to do. Sketch it out. Then, look for the syntax to do the job.
Why not use pandas? Here’s the author of pandas on why pandas isn’t always the best tool for data manipulation. We’re getting more advanced here, worrying about speed, being closer to the “metal”, etc.
Some people, especially those coming to Python from other languages, are suggesting that you just start with polars instead.
Coding for Economists discusses alternatives to pandas, like polars.
We can install polars using pip in the usual way. Don't forget to use !pip in Google Colab.
pip install polars
You’ll see my basic import statement below.
# Set-up
import polars as pl
import numpy as np
import pandas as pd
df = pl.read_csv('https://raw.githubusercontent.com/aaiken1/fin-data-analysis-python/main/data/ncbreweries.csv')
type(df)
polars.dataframe.frame.DataFrame
See what I did there? That was pl.read_csv from polars. I’ve created a polars DataFrame.
Now, you can read the manual to find out all of the things that you can do!
df.describe()
| statistic | Name | City | Type | Beer Count | Est | Status | URL |
|---|---|---|---|---|---|---|---|
| str | str | str | str | f64 | f64 | str | str |
| "count" | "251" | "251" | "251" | 251.0 | 251.0 | "251" | "251" |
| "null_count" | "0" | "0" | "0" | 0.0 | 0.0 | "0" | "0" |
| "mean" | null | null | null | 32.960159 | 2012.155378 | null | null |
| "std" | null | null | null | 43.723385 | 8.749158 | null | null |
| "min" | "217 Brew Works" | "Aberdeen" | "Brewpub" | 1.0 | 1900.0 | "Active" | "https://www.ratebeer.com//brew… |
| "25%" | null | null | null | 10.0 | 2011.0 | null | null |
| "50%" | null | null | null | 18.0 | 2014.0 | null | null |
| "75%" | null | null | null | 39.0 | 2016.0 | null | null |
| "max" | "Zebulon Artisan Ales" | "Winston-Salem" | "Microbrewery" | 424.0 | 2018.0 | "Closed" | "https://www.ratebeer.com//brew… |
Looks a little different. I like it.
You can also select certain columns, filter rows, and do “group bys”: all the usual things.
df.select(
    pl.col(['Name', 'City'])
)
| Name | City |
|---|---|
| str | str |
| "217 Brew Works" | "Wilson" |
| "3rd Rock Brewing Company" | "Trenton" |
| "7 Clans Brewing" | "Cherokee" |
| "Andrews Brewing Company" | "Andrews" |
| "Angry Troll Brewing" | "Elkin" |
| … | … |
| "Sweet Taters" | "Rocky Mount" |
| "Triangle Brewing Company" | "Durham" |
| "White Rabbit Brewing (NC)" | "Angier" |
| "Williamsville Brewery (formerl… | "Farmville" |
| "Wolf Beer Company" | "Wilmington" |
df.filter(
    pl.col("Beer Count").is_between(10, 100)
)
| Name | City | Type | Beer Count | Est | Status | URL |
|---|---|---|---|---|---|---|
| str | str | str | i64 | i64 | str | str |
| "217 Brew Works" | "Wilson" | "Microbrewery" | 10 | 2017 | "Active" | "https://www.ratebeer.com//brew… |
| "3rd Rock Brewing Company" | "Trenton" | "Microbrewery" | 12 | 2016 | "Active" | "https://www.ratebeer.com//brew… |
| "Andrews Brewing Company" | "Andrews" | "Microbrewery" | 18 | 2014 | "Active" | "https://www.ratebeer.com//brew… |
| "Appalachian Mountain Brewery" | "Boone" | "Microbrewery" | 78 | 2013 | "Active" | "https://www.ratebeer.com//brew… |
| "Archetype Brewing" | "Asheville" | "Microbrewery" | 15 | 2017 | "Active" | "https://www.ratebeer.com//brew… |
| … | … | … | … | … | … | … |
| "Heinzelmannchen Brewery" | "Sylva" | "Microbrewery" | 18 | 2005 | "Closed" | "https://www.ratebeer.com//brew… |
| "Hosanna Brewing Company" | "Fuqauy Varina" | "Brewpub" | 12 | 2013 | "Closed" | "https://www.ratebeer.com//brew… |
| "Jack of the Wood Brewpub" | "Asheville" | "Brewpub" | 13 | 2004 | "Closed" | "https://www.ratebeer.com//brew… |
| "Triangle Brewing Company" | "Durham" | "Microbrewery" | 21 | 2007 | "Closed" | "https://www.ratebeer.com//brew… |
| "White Rabbit Brewing (NC)" | "Angier" | "Microbrewery" | 19 | 2013 | "Closed" | "https://www.ratebeer.com//brew… |
df.filter(
    (pl.col('Beer Count') <= 10) & (pl.col('Status') != "Closed")
)
| Name | City | Type | Beer Count | Est | Status | URL |
|---|---|---|---|---|---|---|
| str | str | str | i64 | i64 | str | str |
| "217 Brew Works" | "Wilson" | "Microbrewery" | 10 | 2017 | "Active" | "https://www.ratebeer.com//brew… |
| "7 Clans Brewing" | "Cherokee" | "Client Brewer" | 1 | 2018 | "Active" | "https://www.ratebeer.com//brew… |
| "Angry Troll Brewing" | "Elkin" | "Microbrewery" | 8 | 2017 | "Active" | "https://www.ratebeer.com//brew… |
| "Bear Creek Brews" | "Bear Creek" | "Microbrewery" | 6 | 2012 | "Active" | "https://www.ratebeer.com//brew… |
| "Beech Mountain Brewing Company" | "Beech Mountain" | "Microbrewery" | 7 | 2014 | "Active" | "https://www.ratebeer.com//brew… |
| … | … | … | … | … | … | … |
| "Valley River Brewery & Eatery" | "Murphy" | "Brewpub" | 8 | 2017 | "Active" | "https://www.ratebeer.com//brew… |
| "Vicious Fishes Brewery" | "Angier" | "Microbrewery" | 1 | 2017 | "Active" | "https://www.ratebeer.com//brew… |
| "Waterline Brewing Company" | "Wilmington" | "Microbrewery" | 6 | 2015 | "Active" | "https://www.ratebeer.com//brew… |
| "Winding Creek Brewing Company" | "Columbus" | "Microbrewery" | 9 | 2017 | "Active" | "https://www.ratebeer.com//brew… |
| "York Chester Brewing Company" | "Belmont" | "Microbrewery" | 8 | 2016 | "Active" | "https://www.ratebeer.com//brew… |
# Note: newer versions of polars use group_by() instead of groupby(),
# and GroupBy.count() has been renamed to GroupBy.len()
df.group_by("Type").agg(pl.len().alias("count")).sort(by="count", descending=True)
| Type | count |
|---|---|
| str | u32 |
| "Microbrewery" | 165 |
| "Brewpub/Brewery" | 41 |
| "Brewpub" | 33 |
| "Client Brewer" | 9 |
| "Commercial Brewery" | 3 |
Not bad!