The Basics#
This section covers some ideas that I call CompSci 101. These are the sorts of topics that come up in any Intro to Computer Science class. Let’s cover the basics, so that we can have some idea of what’s going on with our data.
I can’t let you get out of this course without seeing some of this stuff. I’d feel bad.
Note
I am borrowing heavily from Chapter 1 of Python for Data Science and Chapter 3 of Python for Finance, 2e. You’ll also find this in Coding Basics from Coding for Economists.
There are additional resources on Datacamp, as well. Here is a DataCamp tutorial on Python data structures and here is a DataCamp tutorial on Python strings, or dealing with text.
Chapter 3 of Python for Data Analysis, 3E also covers data structures and functions.
You can find more on the basics of Python in the Python tutorial.
Lists, dictionaries, and tuples are also covered in detail in the Python tutorial.
Do not feel like you need to look at this stuff and understand everything all at once. The key is to know that these ideas and tools exist, try them, get an error message, and iterate.
Data types#
Computers think of data, or a value, as a type. For example, in Python, there are three types of numbers: integers, floats, and complex. A variable is a name that refers to a value. Python lets you create any variable name as long as it begins with a letter or an underscore, so no numbers to start. It should also not be what is called a reserved word in Python, such as `for`, `while`, or `class`. All programming languages have special, reserved words that they don’t want us using as variable names. The interpreter would get confused.
A common metaphor is to think of a variable as a box that holds some information (like a number, a vector, or a string). We use the assignment operator `=` to assign a value to a variable.
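The naming rules above can be checked directly in Python. This is a minimal sketch using the standard-library `keyword` module and the string method `isidentifier()`; the example names (`price`, `_price`, `2price`) are made up for illustration.

```python
import keyword

# Hypothetical names, chosen only to illustrate the rules.
starts_with_letter = "price".isidentifier()       # letters are fine
starts_with_underscore = "_price".isidentifier()  # so are underscores
starts_with_digit = "2price".isidentifier()       # a leading digit is not allowed

# Reserved words look like valid names, but Python keeps them for itself.
for_is_reserved = keyword.iskeyword("for")
class_is_reserved = keyword.iskeyword("class")

print(starts_with_letter, starts_with_underscore, starts_with_digit)
print(for_is_reserved, class_is_reserved)
```

Running `print(keyword.kwlist)` shows the full list of reserved words for your version of Python.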
Common built-in Python data types and structures#
English name | Type name | Type Category | Description | Example |
---|---|---|---|---|
integer | `int` | Numeric Type | positive/negative whole numbers | `42` |
floating point number | `float` | Numeric Type | real number in decimal form | `0.35` |
boolean | `bool` | Boolean Values | true or false | `True` |
string | `str` | Sequence Type | text | `'elon university'` |
list | `list` | Sequence Type | a collection of objects - mutable & ordered | `[1, 2, 'THREE']` |
tuple | `tuple` | Sequence Type | a collection of objects - immutable & ordered | `(1, 2, 'THREE')` |
dictionary | `dict` | Mapping Type | mapping of key-value pairs | `{'Name': 'Adam Aiken'}` |
none | `NoneType` | Null Object | represents no value | `None` |
Arithmetic operators#
You can do all of the arithmetic that you’d expect.
Operator | Description |
---|---|
`+` | addition |
`-` | subtraction |
`*` | multiplication |
`/` | division |
`**` | exponentiation |
`//` | integer division / floor division |
`%` | modulo |
Integers#
We can assign the value 10 to the variable `a` using `=`. We can then use the `type` function to see what data type `a` is.
a = 10
type(a)
int
Python can be used as a calculator.
1 / 4
0.25
type(1/4)
float
By the way, we just created our first variable, `a`. Variable names even have rules associated with them.
Floats#
Floats are the other way Python stores numbers. The book goes into more detail about the way computers represent numbers internally, but just know that you may sometimes need to be aware of precision. See below.
b = 0.35
type(b)
float
b + 0.1
0.44999999999999996
I guess that’s close, right? By the way, if you run the `b + 0.1` cell before the cell that defines `b`, you’ll get an error. Why? The variable `b` hasn’t been defined until you run `b = 0.35`. Also, click on Jupyter: Variables below. You’ll see the types and values for `a` and `b`.
You can click restart above to clear all of the variables out of memory.
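Because of the precision issue in the `b + 0.1` result, comparing floats with `==` can surprise you. Here is a small sketch of two common workarounds, `math.isclose` and `round`; the variable names are just for illustration.

```python
import math

b = 0.35
total = b + 0.1  # stored as 0.44999999999999996, not exactly 0.45

# Direct equality is fragile with floats.
exactly_equal = (total == 0.45)

# Comparing within a small tolerance is safer.
close_enough = math.isclose(total, 0.45)

# Rounding is usually enough when you only need to display the value.
rounded = round(total, 2)

print(exactly_equal, close_enough, rounded)
```

The exact comparison fails while the tolerance-based one succeeds, which is why tolerance checks are the usual choice when floats are involved.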
Also, operations may change one type to another. For example, dividing one `int` by another with `/` gives back a `float`. Floor division with `//` rounds down and, when both numbers are integers, retains the `int` type.
type(2 / 2)
float
type(2 // 2)
int
The modulo operator `%` gives the remainder.
5 % 2
1
Booleans#
Booleans are `True` or `False`. We’ll see relational operators, like `>`, `<`, `==`, `<=`, `>=`, and `!=`. We can also use `and`, `or`, and `not`. These three are keywords, as are `True` and `False`, which means that we can’t use them as variable names.
We can compare objects using comparison operators, and we’ll get back a Boolean result:
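Operator | Description |
---|---|
`x == y` | is `x` equal to `y`? |
`x != y` | is `x` not equal to `y`? |
`x > y` | is `x` greater than `y`? |
`x >= y` | is `x` greater than or equal to `y`? |
`x < y` | is `x` less than `y`? |
`x <= y` | is `x` less than or equal to `y`? |
`x is y` | is `x` the same object as `y`? |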
Boolean operators also evaluate to either `True` or `False`:
Operator | Description |
---|---|
`x and y` | are `x` and `y` both `True`? |
`x or y` | is at least one of `x` and `y` `True`? |
`not x` | is `x` `False`? |
Source: Chapter 1 of Python for Data Science
42 > 23
True
42 >= 42
True
42 == 42
True
# Nope! Need to use ==.
# 42 = 42
# Common way to say "not equal"
42 != 42
False
# Can make compound statements too. See why this is true?
(4 == 3) or (2 != 3)
True
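The same comparisons work on variables, and `and`, `or`, and `not` let you combine them. A minimal sketch follows; `price`, `floor`, and `cap` are hypothetical names and values used only for illustration.

```python
price = 42
floor = 23   # hypothetical lower bound
cap = 100    # hypothetical upper bound

# Both comparisons must be True for `and` to return True.
within_bounds = (price > floor) and (price < cap)

# `not` flips a Boolean.
outside_bounds = not within_bounds

# Only one side needs to be True for `or`.
either = (price == floor) or (price != cap)

# Python also lets you chain comparisons, like in math.
chained = floor < price < cap

print(within_bounds, outside_bounds, either, chained)
```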
This is also a good time to point out that Python is case sensitive.
x = 23
X = 42
print(x)
23
print(X)
42
# Nope!
# Print(x)
Strings#
Strings are text. We could spend half this semester or more just dealing with text, regular expressions, and natural language processing (NLP). Our Hull textbook has a chapter on dealing with text. To start, though, we need to know that strings are a basic and essential data type across all programming languages.
You can use either `'` or `"` around text. This is helpful when the string has a `'` in it.
# Define our string. Check the Jupyter:Variables in your VS Code! Note the size.
txt = 'elon university'
print(txt)
elon university
txt2 = "Prof Aiken's Class"
print(txt2)
Prof Aiken's Class
There are many different string methods. Being able to deal with text is a crucial part of data wrangling, or cleaning. And, text is usually part of what people refer to as unstructured data. For example, could you write code to read 10-K filings? Yes! How about using the news to predict stock returns? Maybe! Lots of people are trying.
txt.capitalize()
'Elon university'
txt.split()
['elon', 'university']
txt.replace(' ', '******')
'elon******university'
Casting#
Sometimes we need to explicitly cast a value from one type to another. We can do this using functions like `str()`, `int()`, and `float()`. Python tries to do the conversion, or throws an error if it can’t.
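Here is a short sketch of casting in both directions, including what happens when a conversion fails. The `try`/`except` block is only there so the example keeps running after the error; the values are made up for illustration.

```python
as_int = int("42")        # string -> int
as_float = float("3.14")  # string -> float
as_str = str(10)          # int -> string
truncated = int(2.9)      # float -> int drops the decimal part (no rounding)

# A conversion that makes no sense raises a ValueError.
try:
    int("forty-two")
    failed = False
except ValueError as err:
    failed = True
    print("Could not convert:", err)

print(as_int, as_float, as_str, truncated)
```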
Data structures#
We just saw data types. There are also data structures. Of the basic data structures, I think we’ll deal with lists the most. We’ll see arrays and data frames in the next few chapters. We’ll use those two and their associated methods all of the time.
Lists#
Lists allow us to store multiple things (“elements”) in a single object. The elements are ordered. Lists are defined with square brackets `[]`.
They can hold different data types. They can even hold other lists.
my_list = [1, 2, "THREE", 4, "Elon"]
my_list
[1, 2, 'THREE', 4, 'Elon']
type(my_list)
list
We can `append` something to a list, like another list. We can also `extend`, `insert`, and `remove` items.
my_list.append([4, 3])
my_list
[1, 2, 'THREE', 4, 'Elon', [4, 3]]
my_list.extend([1.0, 1.5, 2.0])
my_list
[1, 2, 'THREE', 4, 'Elon', [4, 3], 1.0, 1.5, 2.0]
my_list.insert(1, 'insert')
my_list
[1, 'insert', 2, 'THREE', 4, 'Elon', [4, 3], 1.0, 1.5, 2.0]
my_list.remove('THREE')
my_list
[1, 'insert', 2, 4, 'Elon', [4, 3], 1.0, 1.5, 2.0]
len(my_list)
9
We can access values inside a list, tuple, or string using square bracket syntax. Python uses zero-based indexing, which means the first element of the list is in position 0, not position 1.
my_list[0]
1
We can use a `:` to slice a list. Note that the start of the slice is inclusive and the end is exclusive. So, you start counting at 0… 0, 1, 2, and the element at index 2 is the integer 2. Then, keep going… 3, 4, 5. The element at index 5 is another list, `[4, 3]`, and it is not included. So, the element at index 4, the string “Elon”, is the last element sliced.
my_list[2:5]
[2, 4, 'Elon']
We can use negative indices to count backwards from the end of the list.
my_list[-1]
2.0
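The same square-bracket indexing and slicing works on strings and tuples, too. A small sketch follows; the `letters` tuple is made up for illustration.

```python
txt = 'elon university'

first_word = txt[0:4]   # characters at index 0, 1, 2, 3
last_char = txt[-1]     # negative index counts from the end
from_five = txt[5:]     # leaving off the end means "to the end"

letters = ('a', 'b', 'c', 'd', 'e')  # hypothetical tuple
middle = letters[1:4]       # index 1 up to, but not including, index 4
every_other = letters[::2]  # a third number sets the step size

print(first_word, last_char, from_five)
print(middle, every_other)
```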
Dictionaries, sets, and tuples#
These are three other data types that we won’t use as much, but they do appear in the DataCamp assignments.
Python dictionaries are key:value pairs. They associate a key with a value, in other words. You can change them and they do not allow duplicate keys (values can repeat). You can create one using curly braces `{}`, like this:
d = {
'Name' : 'Adam Aiken',
'University' : 'Elon University',
'Department' : 'Finance',
'PhD Program' : 'Arizona State'
}
type(d)
dict
You can then print items from the dictionary, as well as show the keys and the values.
print(d['Name'], d['University'])
Adam Aiken Elon University
d.keys()
dict_keys(['Name', 'University', 'Department', 'PhD Program'])
d.values()
dict_values(['Adam Aiken', 'Elon University', 'Finance', 'Arizona State'])
d.items()
dict_items([('Name', 'Adam Aiken'), ('University', 'Elon University'), ('Department', 'Finance'), ('PhD Program', 'Arizona State')])
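Since dictionaries can be changed, here is a brief sketch of adding, overwriting, and safely looking up entries. The `'Office'` key is hypothetical and deliberately missing from the dictionary.

```python
d = {'Name': 'Adam Aiken', 'University': 'Elon University'}

# Assigning to a new key adds an entry.
d['Department'] = 'Finance'

# Assigning to an existing key overwrites its value (keys stay unique).
d['University'] = 'Elon'

# `in` checks whether a key exists.
has_phd = 'PhD Program' in d

# .get() returns a default instead of raising a KeyError for a missing key.
office = d.get('Office', 'unknown')

print(d)
print(has_phd, office)
```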
Python sets let you store unordered values in a single variable. There’s no relationship between the items, and duplicates are dropped. You can’t change an item in place, though you can add and remove items.
adam_set = {"Adam", "Aiken", "Elon", "Arizona State"}
print(adam_set)
{'Adam', 'Arizona State', 'Aiken', 'Elon'}
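A quick sketch of adding and removing set items, and of using a set to drop duplicates. The added values are for illustration only.

```python
adam_set = {"Adam", "Aiken", "Elon", "Arizona State"}

adam_set.add("Finance")   # add a new item
adam_set.add("Elon")      # already in the set, so nothing changes
adam_set.remove("Adam")   # remove an item

# Converting a list to a set keeps only the unique values.
unique_numbers = set([1, 2, 2, 3, 3, 3])

print(len(adam_set), unique_numbers)
```

The printed order of a set can differ from the order you typed, so don’t rely on it.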
Finally, a Python tuple is ordered, like a list, but immutable: once you create it, you can’t change it. Unlike a set, the order does matter and duplicates are allowed. You define these with `()`, instead of `{}`.
adam_tuple = ("Adam", "Aiken", "Elon", "Arizona State")
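Here is a minimal sketch of what you can and can’t do with a tuple. The `try`/`except` is only there to catch the error so the example keeps running.

```python
adam_tuple = ("Adam", "Aiken", "Elon", "Arizona State")

# Indexing works just like a list.
first = adam_tuple[0]

# Trying to change an item raises a TypeError.
try:
    adam_tuple[0] = "Prof"
    changed = True
except TypeError as err:
    changed = False
    print("Can't change a tuple:", err)

# Unpacking assigns each item to its own variable, in order.
first_name, last_name, school, phd = adam_tuple

print(first, changed, school)
```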
To summarize, we’ve seen four ways to store data in Python: lists, dictionaries, sets, and tuples. We’ll use lists the most. But, we’re going to need other ways. This is where we get to `numpy` arrays and `pandas` DataFrames.
Syntax in Python#
Syntax, or the way you write your code, is really important. As mentioned, `=` and `==` are not the same thing. Python is case sensitive, as we saw.
If you’re coming from another programming language, you might have also noticed that you don’t need a semi-colon `;` to end a line. However, you can use a `;` to separate different statements on the same line.
You’ll see below that we end the first line of a conditional statement with a `:`.
Most importantly, we don’t use curly braces in Python to tell our code what statements go with which control structure. Instead, we use indentation. Let me show you what I mean.
Control structures#
Control structures allow you to determine the flow of your code. We’ll start with conditional statements. Conditional statements make it so that only certain blocks of code will run (i.e. get executed), depending, or conditional, on the state of the code at that time (i.e. what is true). This is where `if`, `elif`, and `else` come in. You’ve probably used something like this in Excel.
We will also see two types of loops. You can create a loop using `for` that will run the code included in the loop once for each value in a sequence, such as a list. There are also `while` loops, where the loop will keep running until a condition specified by the code is no longer met. There are subtle differences between the two. `while` loops check a Boolean condition before each pass and keep going as long as it is `True`. `for` loops go until the end of the sequence or range is reached. In practice, a `for` loop is often a little faster than the equivalent `while` loop, because Python handles the counting for you instead of running a separate check and update on each pass.
In general, loops can slow down your code. Functional programming can speed things up. The book mentions this. We will get to it later.
You can find more on control structures in the Python tutorial.
Let’s start with conditional statements and the humble `if`.
Conditional statements introduce if/then/else-style logic. The main points to notice:
- Use keywords `if`, `elif` and `else`
- As with `for` and `while`, a colon `:` ends each conditional expression
- Indentation (by 4 empty spaces) defines code blocks. Very important!
- In an `if` statement, the first block whose conditional statement returns `True` is executed and the program exits the `if` block
- `if` statements don’t necessarily need `elif` or `else`
- `elif` lets us check several conditions
- `else` lets us evaluate a default block if all other conditions are `False`
- the end of the entire `if` statement is where the indentation returns to the same level as the first `if` keyword
Let’s check if some numbers are even or odd. The modulo operator `%` gives us the remainder from division. We’ll check and see if 7 is even or odd.
i = 7
if i % 2 == 0:
    print("%d is even" % i)
else:
    print("%d is odd" % i)
7 is odd
This is the basic set-up for if/else. Note the format - you need those `:` and the indentation. Check out the text for string replacement to see what the `print("%d is even" % i)` code is doing. In short, the code is substituting `i` into the string for `%d`. Our text does this all of the time. Also note how there is no condition after the `else`. The `else` block runs when the condition above it is `False`.
Note
When creating more complicated control structures, I suggest going step-by-step on a piece of paper. What does the computer “know” at any point in the sequence? What are the values of your variables? What will it do? Does it do what you expect it to do?
Let’s turn to loops. We’ll also put some `if`, `elif`, and `else` logic inside of a loop.
Each line of code has some logic. For example, we are using `for element in num_list[0:3]:` below in our first example. Let’s parse that:
- `for` means that Python is going to work across a certain number of elements from something, like a `list`.
- `element` is going to represent an item from the `list`, like a single integer. But, it doesn’t have to be an integer.
- `num_list` is our list. In this case, we are slicing to only use the three elements at index 0, 1, and 2. Remember, slicing is inclusive of the first element and exclusive of the last.
- We then end the line of logic with a `:`. This is really important. `for`, `while`, `if`, `elif`, and `else` all need to end with a `:`.
- Indentation matters in Python. To do the indentation, you want to hit tab. The indentation tells Python which lines of code go with which lines of logic. See our examples below and in the text.
Note
You can find more on loops and functions in Chapter 2 of Python Programming for Data Science.
num_list = [1,2,3,4,5]
num_list
[1, 2, 3, 4, 5]
Here’s a basic `for` loop. Note the indexing on the list. The slice starts at index 0 (the 1st item) and goes up to index 3 (the 4th item), but doesn’t include it, and stops.
for element in num_list[0:3]:
    print(element ** 2)
1
4
9
Each item in the slice gets put into the variable `element`. That number is then squared and printed. The `for` loop then moves on to the next item. The loop will exit after the last item in the slice has been processed.
As noted above, you can use `elif` to test multiple conditions. This example, from Chapter 3 of Python for Finance, 2e, uses the `range` function to create the numbers 1 through 9 (not 10 though!), tests two conditions, and ends with the `else`. The `range` supplies the numbers that the `for` loop steps through, and we have now combined that loop with if/elif/else statements. Note that only the first matching branch runs, which is why 6 prints as even and not as a multiple of 3.
for i in range(1, 10):
    if i % 2 == 0:
        print("%d is even" % i)
    elif i % 3 == 0:
        print("%d is multiple of 3" % i)
    else:
        print("%d is odd" % i)
1 is odd
2 is even
3 is multiple of 3
4 is even
5 is odd
6 is even
7 is odd
8 is even
9 is multiple of 3
Again, note the `:` to end the first line of each control structure, as well as the four-space indentation. Try deleting the indentation before the first `if` and running this code. What happens?
`for` loops are one type of loop. We can also use `while` loops. These are slightly different in their logic. Check out the graphic above.
Here’s a simple example of a `while` loop.
i = 0
while i < 4:
    print(i)
    i += 1
0
1
2
3
The variable `i` starts at 0, gets printed, and then has 1 added to it. The loop then returns to the top and the condition is evaluated again. The loop will exit when `i` equals 4, because `4 < 4` is `False`.
Let’s look at a `while` loop and some if-else logic together. We again see the use of the `print` function. I am taking the integer number and casting it as a string to be included in the `print` output. The `+` operator with strings means concatenation in Python.
# Take user input
number = 2
# Condition of the while loop
while number < 5:
    # Find the mod of 2
    if number % 2 == 0:
        print("The number " + str(number) + " is even")
    else:
        print("The number " + str(number) + " is odd")
    # Increment `number` by 1
    number = number + 1
The number 2 is even
The number 3 is odd
The number 4 is even
I will again point out the `:` and the indentation. If your control structures are giving you error messages, those are the first two things to check.
Functions and functional programming#
I just want to introduce the idea of functions and functional programming. You can, of course, write your own functions in Python. Functions take input and give you an output.
See the example below for the basic syntax. You call the function with `function()`. In the example, I define a function `square` that takes an argument, or input, `x` and raises it to the power of 2.
Always pick a good name for your functions!
Note
This is just a first look at writing functions. We’ll do more later.
Like with control structures, we have a `:` after the first line defining the function. We then have **four-space indentation** to indicate what code is part of the function.
You can again find more at this DataCamp tutorial.
def square(x):
    return x ** 2
square(2)
4
We can also print the Fibonacci Sequence up to some term `n`, which can be an input into your function.
def fib(n):
    a = 0
    b = 1
    if n == 1:
        print(a)
    else:
        print(a)
        print(b)
        for i in range(2, n):
            c = a + b
            a = b
            b = c
            print(c)
fib(10)
0
1
1
2
3
5
8
13
21
34
Functional programming is a style where you tell the computer what to do by applying and combining functions, rather than writing out each loop step by step. This is the world of lambda functions, and `map()`, `filter()`, and `reduce()`.
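As a first taste, here is a minimal sketch of those three tools applied to a small list. `map()` and `filter()` are built in; `reduce()` lives in the standard-library `functools` module. The lambda functions are just short, unnamed functions written for this example.

```python
from functools import reduce

num_list = [1, 2, 3, 4, 5]

# map() applies a function to every element.
squares = list(map(lambda x: x ** 2, num_list))

# filter() keeps only the elements where the function returns True.
evens = list(filter(lambda x: x % 2 == 0, num_list))

# reduce() combines the elements into a single value, two at a time.
total = reduce(lambda acc, x: acc + x, num_list)

print(squares, evens, total)
```

Each of these replaces a short `for` loop: one that squares, one that tests with `if`, and one that keeps a running sum.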
Printing and formatting#
We’ve used the `print` function a few times now. Something like this:
i = 3
print("%d is odd" % i)
3 is odd
This code is printing the text inside of the quotes. The `%d` inside the string is a placeholder for a variable. What variable? The one that comes after the `%` operator that follows the string, inside of the `print` function. In this case, that variable is called `i`.
What is the `d` doing next to the `%`? This is not a variable. This is telling Python how to format the variable inside of the string. `d` stands for a decimal (base-10) integer. We can also format numbers as floats and specify decimal places. For example,
i = 42.23456
print("%.2f is a float" % i)
42.23 is a float
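You will also run into two other ways to do the same formatting in modern Python code: f-strings and the `str.format()` method. These are not used in the examples above, so treat this as a side-by-side sketch rather than the style our text follows.

```python
i = 42.23456

# The %-style formatting used above.
old_style = "%.2f is a float" % i

# An f-string puts the variable and its format right inside the braces.
f_string = f"{i:.2f} is a float"

# str.format() fills the braces with the arguments you pass in.
with_format = "{:.2f} is a float".format(i)

n = 3
parity = f"{n} is odd"

print(old_style)
print(f_string)
print(with_format)
print(parity)
```

All three float versions produce the same text, so which one you use is mostly a matter of readability.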
The Python tutorial has much more on how to format output.