{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using SQL in Python\n",
    "\n",
    "SQL stands for [Structured Query Language](https://en.wikipedia.org/wiki/SQL) and is perhaps the number one tool that any data analyst needs to know. Well, after Excel I guess. Everyone assumes that you know Excel.\n",
    "\n",
    "You can use SQL everywhere. This is the OG database language. There are many different iterations. But, they all let you create databases and do the standard ETL (extract, transform, and load) [processes](https://blog.panoply.io/sql-and-etl-the-dynamic-data-duo). They key - there are standard SQL ways to do things. If you know the terms and ideas, you can use them across a variety of platforms and languages.\n",
    "\n",
    "SQL is used for [relational, structured databases](https://en.wikipedia.org/wiki/Relational_database), where there are keys that uniquely identify observations and where the data is the traditional column/variable format. Basically, the data that we are using.\n",
    "\n",
    "```{note}\n",
    "If you're doing data science, business analytics, anything that involves something bigger than a financial model in Excel, learn SQL.\n",
    "```\n",
    "\n",
    "However, databases have changed a lot the last 25 years or so, as companies deal with more and more data with limited structure. These tools, broadly speaking, are sometimes called [NoSQL](https://en.wikipedia.org/wiki/NoSQL). This is beyond the scope of what we're doing, but would be taught in any data engineering or databases course.\n",
    "\n",
    "If you'd like to learn more about SQL, there are also some [DataCamp](https://www.datacamp.com/learn/sql) tutorials. \n",
    "\n",
    "And here's the [cheat sheet](https://res.cloudinary.com/dyd911kmh/image/upload/v1675360372/Marketing/Blog/SQL_Basics_For_Data_Science.pdf).\n",
    "\n",
    "I found this [page helpful for some syntax](https://towardsdatascience.com/pandasql-interesting-way-to-run-sql-queries-in-python-18a4fc36406a). \n",
    "\n",
    "[Coding for Economists](https://aeturrell.github.io/coding-for-economists/data-advanced.html) discusses alternatives to `pandas`.\n",
    "\n",
    "## Pandas SQL\n",
    "\n",
    "We are going to use [pandassql](https://pypi.org/project/pandasql/) to use SQL in Python. Using this method will let use use **standard SQL queries** with our `pandas` DataFrames. If you already know SQL, you might like this better!\n",
    "\n",
    "I'll use `pip` to install `pandassql`. Don't forget to use `! pip` in Google Colab.\n",
    "\n",
    "```\n",
    "pip install pandasql \n",
    "```\n",
    "\n",
    "This is a much \"lighter weight\" way to use SQL. There are others, like [pyspark](https://spark.apache.org/docs/latest/api/python/index.html) from [Apache Spark](https://spark.apache.org) that are whole set of data engineering tools.\n",
    "\n",
    "You'll see my import statement below. \n",
    "\n",
    "I'll also bring in that NC Brewery data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set-up\n",
    "\n",
    "from pandasql import sqldf\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "nc_df = pd.read_csv('https://raw.githubusercontent.com/aaiken1/fin-data-analysis-python/main/data/ncbreweries.csv')\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From here, you can use the standard set of SQL queries. I won't go over all of them. Let me give you a few examples, though, so that you can see what things look like. \n",
    "\n",
    "Let's start by turning this DataFrame into a SQL database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>City</th>\n",
       "      <th>Type</th>\n",
       "      <th>Beer Count</th>\n",
       "      <th>Est</th>\n",
       "      <th>Status</th>\n",
       "      <th>URL</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>217 Brew Works</td>\n",
       "      <td>Wilson</td>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>10</td>\n",
       "      <td>2017</td>\n",
       "      <td>Active</td>\n",
       "      <td>https://www.ratebeer.com//brewers/217-brew-wor...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3rd Rock Brewing Company</td>\n",
       "      <td>Trenton</td>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>12</td>\n",
       "      <td>2016</td>\n",
       "      <td>Active</td>\n",
       "      <td>https://www.ratebeer.com//brewers/3rd-rock-bre...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7 Clans Brewing</td>\n",
       "      <td>Cherokee</td>\n",
       "      <td>Client Brewer</td>\n",
       "      <td>1</td>\n",
       "      <td>2018</td>\n",
       "      <td>Active</td>\n",
       "      <td>https://www.ratebeer.com//brewers/7-clans-brew...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Andrews Brewing Company</td>\n",
       "      <td>Andrews</td>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>18</td>\n",
       "      <td>2014</td>\n",
       "      <td>Active</td>\n",
       "      <td>https://www.ratebeer.com//brewers/andrews-brew...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Angry Troll Brewing</td>\n",
       "      <td>Elkin</td>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>8</td>\n",
       "      <td>2017</td>\n",
       "      <td>Active</td>\n",
       "      <td>https://www.ratebeer.com//brewers/angry-troll-...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>246</th>\n",
       "      <td>Sweet Taters</td>\n",
       "      <td>Rocky Mount</td>\n",
       "      <td>Brewpub/Brewery</td>\n",
       "      <td>9</td>\n",
       "      <td>2016</td>\n",
       "      <td>Closed</td>\n",
       "      <td>https://www.ratebeer.com//brewers/sweet-taters...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>247</th>\n",
       "      <td>Triangle Brewing Company</td>\n",
       "      <td>Durham</td>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>21</td>\n",
       "      <td>2007</td>\n",
       "      <td>Closed</td>\n",
       "      <td>https://www.ratebeer.com//brewers/triangle-bre...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>248</th>\n",
       "      <td>White Rabbit Brewing (NC)</td>\n",
       "      <td>Angier</td>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>19</td>\n",
       "      <td>2013</td>\n",
       "      <td>Closed</td>\n",
       "      <td>https://www.ratebeer.com//brewers/white-rabbit...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>249</th>\n",
       "      <td>Williamsville Brewery (formerly Wilmington Bre...</td>\n",
       "      <td>Farmville</td>\n",
       "      <td>Brewpub</td>\n",
       "      <td>5</td>\n",
       "      <td>2000</td>\n",
       "      <td>Closed</td>\n",
       "      <td>https://www.ratebeer.com//brewers/williamsvill...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>250</th>\n",
       "      <td>Wolf Beer Company</td>\n",
       "      <td>Wilmington</td>\n",
       "      <td>Client Brewer</td>\n",
       "      <td>2</td>\n",
       "      <td>2009</td>\n",
       "      <td>Closed</td>\n",
       "      <td>https://www.ratebeer.com//brewers/wolf-beer-co...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>251 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                  Name         City  \\\n",
       "0                                       217 Brew Works       Wilson   \n",
       "1                             3rd Rock Brewing Company      Trenton   \n",
       "2                                      7 Clans Brewing     Cherokee   \n",
       "3                              Andrews Brewing Company      Andrews   \n",
       "4                                  Angry Troll Brewing        Elkin   \n",
       "..                                                 ...          ...   \n",
       "246                                       Sweet Taters  Rocky Mount   \n",
       "247                           Triangle Brewing Company       Durham   \n",
       "248                          White Rabbit Brewing (NC)       Angier   \n",
       "249  Williamsville Brewery (formerly Wilmington Bre...    Farmville   \n",
       "250                                  Wolf Beer Company   Wilmington   \n",
       "\n",
       "                Type  Beer Count   Est  Status  \\\n",
       "0       Microbrewery          10  2017  Active   \n",
       "1       Microbrewery          12  2016  Active   \n",
       "2      Client Brewer           1  2018  Active   \n",
       "3       Microbrewery          18  2014  Active   \n",
       "4       Microbrewery           8  2017  Active   \n",
       "..               ...         ...   ...     ...   \n",
       "246  Brewpub/Brewery           9  2016  Closed   \n",
       "247     Microbrewery          21  2007  Closed   \n",
       "248     Microbrewery          19  2013  Closed   \n",
       "249          Brewpub           5  2000  Closed   \n",
       "250    Client Brewer           2  2009  Closed   \n",
       "\n",
       "                                                   URL  \n",
       "0    https://www.ratebeer.com//brewers/217-brew-wor...  \n",
       "1    https://www.ratebeer.com//brewers/3rd-rock-bre...  \n",
       "2    https://www.ratebeer.com//brewers/7-clans-brew...  \n",
       "3    https://www.ratebeer.com//brewers/andrews-brew...  \n",
       "4    https://www.ratebeer.com//brewers/angry-troll-...  \n",
       "..                                                 ...  \n",
       "246  https://www.ratebeer.com//brewers/sweet-taters...  \n",
       "247  https://www.ratebeer.com//brewers/triangle-bre...  \n",
       "248  https://www.ratebeer.com//brewers/white-rabbit...  \n",
       "249  https://www.ratebeer.com//brewers/williamsvill...  \n",
       "250  https://www.ratebeer.com//brewers/wolf-beer-co...  \n",
       "\n",
       "[251 rows x 7 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nc = sqldf(\"SELECT * FROM nc_df\")\n",
    "\n",
    "nc"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Can't tell a difference from the output, right? If you've used SQL before, though, you should start to feel at home. \n",
    "\n",
    "Here's a standard query. Again, I'm not going over all of SQL here, but you can probably read this and figure out what it is doing. I print the query at the end. It is traditional in SQL to put the **actions** in all caps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SELECT \n",
      "      Type as brewery_type, City as city, AVG(\"Beer Count\") as avg_beer_count \n",
      "      FROM \n",
      "         nc \n",
      "      GROUP BY \n",
      "         City\n",
      "      ORDER BY\n",
      "         City;\n"
     ]
    }
   ],
   "source": [
    "q = \"\"\"SELECT \n",
    "      Type as brewery_type, City as city, AVG(\"Beer Count\") as avg_beer_count \n",
    "      FROM \n",
    "         nc \n",
    "      GROUP BY \n",
    "         City\n",
    "      ORDER BY\n",
    "         City;\"\"\"\n",
    "\n",
    "print(q)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then run this query and save it to a new DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>brewery_type</th>\n",
       "      <th>city</th>\n",
       "      <th>avg_beer_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>Aberdeen</td>\n",
       "      <td>9.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>Andrews</td>\n",
       "      <td>9.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>Angier</td>\n",
       "      <td>10.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>Apex</td>\n",
       "      <td>12.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Brewpub</td>\n",
       "      <td>Arden</td>\n",
       "      <td>5.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>97</th>\n",
       "      <td>Brewpub/Brewery</td>\n",
       "      <td>West Jefferson</td>\n",
       "      <td>15.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98</th>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>Whitsett</td>\n",
       "      <td>12.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99</th>\n",
       "      <td>Brewpub/Brewery</td>\n",
       "      <td>Wilmington</td>\n",
       "      <td>15.357143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>Wilson</td>\n",
       "      <td>10.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>101</th>\n",
       "      <td>Microbrewery</td>\n",
       "      <td>Winston-Salem</td>\n",
       "      <td>50.400000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>102 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        brewery_type            city  avg_beer_count\n",
       "0       Microbrewery        Aberdeen        9.000000\n",
       "1       Microbrewery         Andrews        9.666667\n",
       "2       Microbrewery          Angier       10.000000\n",
       "3       Microbrewery            Apex       12.333333\n",
       "4            Brewpub           Arden        5.000000\n",
       "..               ...             ...             ...\n",
       "97   Brewpub/Brewery  West Jefferson       15.000000\n",
       "98      Microbrewery        Whitsett       12.000000\n",
       "99   Brewpub/Brewery      Wilmington       15.357143\n",
       "100     Microbrewery          Wilson       10.000000\n",
       "101     Microbrewery   Winston-Salem       50.400000\n",
       "\n",
       "[102 rows x 3 columns]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nc2 = sqldf(q)\n",
    "nc2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(nc2)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In short, if you have SQL that you want to run, it is easy enough to copy and paste it into Python!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "40d3a090f54c6569ab1632332b64b2c03c39dcf918b08424e98f38b5ae0af88f"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}