4.8. polars: A fast, fancy pandas alternative
Most data folks use pandas. However, there is an alternative that I wanted to bring to your attention. polars is a faster and, perhaps, more modern way to handle data in Python. Still, pandas is ubiquitous, so I wanted to start with that.
Here’s the user’s guide. I’m not going to go through every command here. As always, think about what you want to do. Sketch it out. Then, look for the syntax to do the job.
Why not use pandas? Here’s the author of pandas on why pandas isn’t always the best tool for data manipulation. We’re getting more advanced here, worrying about speed, being closer to the “metal”, etc.
Some people, especially those coming to Python from other languages, suggest that you just start with polars instead.
[Coding for Economists] discusses alternatives to pandas, like polars.
We can install polars using pip the usual way. Don’t forget to use !pip in Google Colab.
pip install polars
You’ll see my basic import statement below.
# Set-up
import polars as pl
import numpy as np
import pandas as pd
df = pl.read_csv('https://raw.githubusercontent.com/aaiken1/fin-data-analysis-python/main/data/ncbreweries.csv')
type(df)
polars.dataframe.frame.DataFrame
See what I did there? That was pl.read_csv from polars. I’ve created a polars DataFrame.
Now, you can read the manual to find out all of the things that you can do!
df.describe()
describe | Name | City | Type | Beer Count | Est | Status | URL |
---|---|---|---|---|---|---|---|
str | str | str | str | f64 | f64 | str | str |
"count" | "251" | "251" | "251" | 251.0 | 251.0 | "251" | "251" |
"null_count" | "0" | "0" | "0" | 0.0 | 0.0 | "0" | "0" |
"mean" | null | null | null | 32.960159 | 2012.155378 | null | null |
"std" | null | null | null | 43.723385 | 8.749158 | null | null |
"min" | "217 Brew Works... | "Aberdeen" | "Brewpub" | 1.0 | 1900.0 | "Active" | "https://www.ra... |
"max" | "Zebulon Artisa... | "Winston-Salem" | "Microbrewery" | 424.0 | 2018.0 | "Closed" | "https://www.ra... |
"median" | null | null | null | 18.0 | 2014.0 | null | null |
Looks a little different. I like it.
You can select certain columns, as well. You can filter, do “group bys”. All the usual things.
df.select(
    pl.col(['Name', 'City'])
)
Name | City |
---|---|
str | str |
"217 Brew Works... | "Wilson" |
"3rd Rock Brewi... | "Trenton" |
"7 Clans Brewin... | "Cherokee" |
"Andrews Brewin... | "Andrews" |
"Angry Troll Br... | "Elkin" |
"Appalachian Mo... | "Boone" |
"Archetype Brew... | "Asheville" |
"Asheville Brew... | "Asheville" |
"Ass Clown Brew... | "Cornelius" |
"Aviator Brewin... | "Fuquay Varina" |
"Barking Duck B... | "Mint Hill" |
"Barrel Culture... | "Durham" |
… | … |
"Greenshields B... | "Raleigh" |
"Hams Restauran... | "Greenville" |
"Heinzelmannche... | "Sylva" |
"High Tide Brew... | "Jacksonville" |
"Hosanna Brewin... | "Fuqauy Varina" |
"Jack of the Wo... | "Asheville" |
"Loe's Brewing ... | "Hickory" |
"Sweet Taters" | "Rocky Mount" |
"Triangle Brewi... | "Durham" |
"White Rabbit B... | "Angier" |
"Williamsville ... | "Farmville" |
"Wolf Beer Comp... | "Wilmington" |
df.filter(
    pl.col("Beer Count").is_between(10, 100)
)
Name | City | Type | Beer Count | Est | Status | URL |
---|---|---|---|---|---|---|
str | str | str | i64 | i64 | str | str |
"217 Brew Works... | "Wilson" | "Microbrewery" | 10 | 2017 | "Active" | "https://www.ra... |
"3rd Rock Brewi... | "Trenton" | "Microbrewery" | 12 | 2016 | "Active" | "https://www.ra... |
"Andrews Brewin... | "Andrews" | "Microbrewery" | 18 | 2014 | "Active" | "https://www.ra... |
"Appalachian Mo... | "Boone" | "Microbrewery" | 78 | 2013 | "Active" | "https://www.ra... |
"Archetype Brew... | "Asheville" | "Microbrewery" | 15 | 2017 | "Active" | "https://www.ra... |
"Asheville Brew... | "Asheville" | "Brewpub" | 87 | 2003 | "Active" | "https://www.ra... |
"Aviator Brewin... | "Fuquay Varina" | "Microbrewery" | 59 | 2008 | "Active" | "https://www.ra... |
"Barking Duck B... | "Mint Hill" | "Microbrewery" | 16 | 2014 | "Active" | "https://www.ra... |
"Barrel Culture... | "Durham" | "Microbrewery" | 29 | 2017 | "Active" | "https://www.ra... |
"Bayne Brewing ... | "Cornelius" | "Microbrewery" | 16 | 2014 | "Active" | "https://www.ra... |
"BearWaters Bre... | "Canton" | "Microbrewery" | 39 | 2012 | "Active" | "https://www.ra... |
"Beer Army Comb... | "Trenton" | "Microbrewery" | 11 | 2012 | "Active" | "https://www.ra... |
… | … | … | … | … | … | … |
"Chesapeake Bay... | "Raleigh" | "Microbrewery" | 14 | 1999 | "Closed" | "https://www.ra... |
"Craggie Brewin... | "Asheville" | "Microbrewery" | 30 | 2009 | "Closed" | "https://www.ra... |
"Draft Line Bre... | "Fuquay-Varina" | "Microbrewery" | 19 | 2014 | "Closed" | "https://www.ra... |
"Four Friends B... | "Charlotte" | "Microbrewery" | 11 | 2009 | "Closed" | "https://www.ra... |
"G2B Gastropub ... | "Durham" | "Brewpub/Brewer... | 18 | 2015 | "Closed" | "https://www.ra... |
"Greenshields B... | "Raleigh" | "Microbrewery" | 15 | 1999 | "Closed" | "https://www.ra... |
"Hams Restauran... | "Greenville" | "Brewpub" | 26 | 2003 | "Closed" | "https://www.ra... |
"Heinzelmannche... | "Sylva" | "Microbrewery" | 18 | 2005 | "Closed" | "https://www.ra... |
"Hosanna Brewin... | "Fuqauy Varina" | "Brewpub" | 12 | 2013 | "Closed" | "https://www.ra... |
"Jack of the Wo... | "Asheville" | "Brewpub" | 13 | 2004 | "Closed" | "https://www.ra... |
"Triangle Brewi... | "Durham" | "Microbrewery" | 21 | 2007 | "Closed" | "https://www.ra... |
"White Rabbit B... | "Angier" | "Microbrewery" | 19 | 2013 | "Closed" | "https://www.ra... |
df.filter(
    (pl.col('Beer Count') <= 10) & (pl.col('Status') != "Closed")
)
Name | City | Type | Beer Count | Est | Status | URL |
---|---|---|---|---|---|---|
str | str | str | i64 | i64 | str | str |
"217 Brew Works... | "Wilson" | "Microbrewery" | 10 | 2017 | "Active" | "https://www.ra... |
"7 Clans Brewin... | "Cherokee" | "Client Brewer" | 1 | 2018 | "Active" | "https://www.ra... |
"Angry Troll Br... | "Elkin" | "Microbrewery" | 8 | 2017 | "Active" | "https://www.ra... |
"Bear Creek Bre... | "Bear Creek" | "Microbrewery" | 6 | 2012 | "Active" | "https://www.ra... |
"Beech Mountain... | "Beech Mountain... | "Microbrewery" | 7 | 2014 | "Active" | "https://www.ra... |
"Bill's Front P... | "Wilmington" | "Brewpub/Brewer... | 10 | 2016 | "Active" | "https://www.ra... |
"Biltmore Brewi... | "Asheville" | "Client Brewer" | 4 | 2010 | "Active" | "https://www.ra... |
"BottleTree Bee... | "Tryon" | "Client Brewer" | 2 | 2010 | "Active" | "https://www.ra... |
"Bright Light B... | "Fayetteville" | "Microbrewery" | 5 | 2018 | "Active" | "https://www.ra... |
"Broomtail Craf... | "Wilmington" | "Microbrewery" | 10 | 2014 | "Active" | "https://www.ra... |
"Bull City Cide... | "Durham" | "Commercial Bre... | 9 | 2014 | "Active" | "https://www.ra... |
"Bull Durham Be... | "Durham" | "Microbrewery" | 7 | 2015 | "Active" | "https://www.ra... |
… | … | … | … | … | … | … |
"Slammin' Sam B... | "Pinehurst" | "Client Brewer" | 1 | 2012 | "Active" | "https://www.ra... |
"Southern Range... | "Monroe" | "Microbrewery" | 6 | 2016 | "Active" | "https://www.ra... |
"Tarboro Brewin... | "Tarboro" | "Microbrewery" | 8 | 2016 | "Active" | "https://www.ra... |
"Tek Mountain B... | "Wilmington" | "Microbrewery" | 7 | 2016 | "Active" | "https://www.ra... |
"The Mason Jar ... | "Fuquay Varina" | "Microbrewery" | 5 | 2017 | "Active" | "https://www.ra... |
"Thristy Souls ... | "Mount Airy" | "Brewpub/Brewer... | 10 | 2018 | "Active" | "https://www.ra... |
"Tobacco Road S... | "Raleigh" | "Brewpub" | 7 | 2017 | "Active" | "https://www.ra... |
"Valley River B... | "Murphy" | "Brewpub" | 8 | 2017 | "Active" | "https://www.ra... |
"Vicious Fishes... | "Angier" | "Microbrewery" | 1 | 2017 | "Active" | "https://www.ra... |
"Waterline Brew... | "Wilmington" | "Microbrewery" | 6 | 2015 | "Active" | "https://www.ra... |
"Winding Creek ... | "Columbus" | "Microbrewery" | 9 | 2017 | "Active" | "https://www.ra... |
"York Chester B... | "Belmont" | "Microbrewery" | 8 | 2016 | "Active" | "https://www.ra... |
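# group_by is the current spelling of the older groupby; pl.len() counts the rows in each group
df.group_by("Type").agg(pl.len().alias("count")).sort(by="count", descending=True)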
Type | count |
---|---|
str | u32 |
"Microbrewery" | 165 |
"Brewpub/Brewer... | 41 |
"Brewpub" | 33 |
"Client Brewer" | 9 |
"Commercial Bre... | 3 |
Not bad!