2. Introduction to Python: Lists, Tuples, Dictionaries, Arrays#
Variables#
A variable is a container for storing a data value. Each variable has a name (label) that refers to an object in memory.
# define a variable
x = 50
# prints the specified message to the screen
print(x)
50
# In a Jupyter Notebook, the result of the last expression in a cell is displayed automatically, without calling print()
x
50
# Return the type of the object stored in the variable
type(x)
int
Python Data Types#
There are four key Python data types: int, float, str (string), and bool (boolean).
Integers: Positive or negative whole numbers (without fractions or decimals).
Float: Real numbers in floating-point notation, written with a decimal point.
String: A sequence of characters enclosed in single, double, or triple quotes.
Boolean: One of two values, True or False. In conditions, objects of other types are also evaluated as truthy or falsy.
a = 30
type(a)
int
b = 30.3
type(b)
float
c = "30"
type(c)
str
d = False
type(d)
bool
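Values can be converted from one of these types to another with the built-in functions int(), float(), str(), and bool(). The following is a minimal sketch; the variable names are illustrative and not part of the course code.

```python
# Convert between the four basic types (all names here are illustrative)
text_number = "30"

as_int = int(text_number)        # string -> integer
as_float = float(text_number)    # string -> float
as_text = str(30.3)              # float -> string
as_bool = bool(0)                # zero converts to False

print(as_int, type(as_int))      # 30 <class 'int'>
print(as_float, type(as_float))  # 30.0 <class 'float'>
print(as_text, type(as_text))    # 30.3 <class 'str'>
print(as_bool, type(as_bool))    # False <class 'bool'>
```

Note that "30" and 30 are different objects: the first is text, the second is a number, so "30" + "30" gives "3030" while 30 + 30 gives 60.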
String#
A string is a sequence of characters.
String Exploration#
name = 'Jose Felipe'
print(name)
Jose Felipe
# Function len() returns the length of a string, including all spaces
len(name)
11
# Indexing operation: returns the character at the specified position in the string
name[3]
'e'
# First character (in python, indexing starts at 0)
name[0]
'J'
# Last character (in python, negative indexing is used to access elements from the end of a sequence)
name[-1]
'e'
# Slicing the 1st to 3rd characters (extract a portion of the string)
# In Python, slicing includes the starting index (0 in this case) but excludes the ending index (3 in this case).
print(name[0:3])
Jos
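Slices can also omit the start or the end, use negative positions, or take a step. A short sketch using the same string; the variable names on the left are illustrative only.

```python
name = 'Jose Felipe'

first_name = name[:4]     # positions 0 to 3 (start omitted means "from the beginning")
last_name = name[5:]      # position 5 to the end (end omitted means "to the end")
last_three = name[-3:]    # the last three characters
every_other = name[::2]   # every second character, starting at position 0

print(first_name)   # Jose
print(last_name)    # Felipe
print(last_three)   # ipe
print(every_other)  # Js eie
```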
Other String Operations#
# Double quotes are useful when the string itself contains single quotes, and vice versa
my_string1 = " la casa rosada 'SBA' "
print(my_string1)
my_string2 = ' la casa rosada "SBA" '
print(my_string2)
la casa rosada 'SBA'
la casa rosada "SBA"
my_string = ''' Hello$%&15 '''
print(my_string , "\n") # "\n" is used to create a new line
Hello$%&15
str_1 = "The name"
str_2 = "of the course"
str_3 = "is Introduction to Python"
course_name = str_1 + str_2 + str_3
course_name # concatenation of the strings (no spaces are added between them)
'The nameof the courseis Introduction to Python'
print(str_1, str_2, str_3) # it does not concatenate the strings; instead, it separates them with spaces when printing
The name of the course is Introduction to Python
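# The variable is named data_dir because dir would shadow Python's built-in dir() function
data_dir = "C:/Users/Alexander/Dropbox/Pakistan_ID_DIME/Data/Data_requests/"
print(data_dir + "data_1")
print(data_dir + "data_2")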
C:/Users/Alexander/Dropbox/Pakistan_ID_DIME/Data/Data_requests/data_1
C:/Users/Alexander/Dropbox/Pakistan_ID_DIME/Data/Data_requests/data_2
Bool#
The Python Boolean type is one of Python’s built-in data types. It’s used to represent the truth value of an expression. For example, the expression 1 <= 2 is True, while the expression 0 == 1 is False.
print(10 == 9)
False
response = 10 > 9
response
True
# True is equal to the integer value 1 when compared with numbers
print(True == 1)
print(True == 0)
True
False
# False is equal to the integer value 0 when compared with numbers
print(False == 0)
print(False == 1)
True
False
# True / True evaluates to 1.0, False is equivalent to 0. So, the final result of the expression is 1.0.
(True / True) + False
1.0
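Because True and False behave like 1 and 0 in arithmetic, summing a list of comparisons counts how many of them are True. A small sketch; the scores list and the passing threshold of 11 are hypothetical values chosen for illustration.

```python
# Hypothetical scores and passing threshold, used only for this example
scores = [15, 18, 16, 5, 8]
threshold = 11

passed = [score >= threshold for score in scores]  # list of booleans
n_passed = sum(passed)                             # each True counts as 1
share_passed = n_passed / len(scores)

print(passed)        # [True, True, True, False, False]
print(n_passed)      # 3
print(share_passed)  # 0.6
```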
Lists#
A list is an ordered and mutable Python container. Its items are ordered and changeable, and it allows duplicate values and objects of different types. Because lists have a defined order, every item has an index.
# We use brackets to create a list
# A list can contain elements of various data types
my_list = [ 18, 20 , 30, "alex" ]
my_list
[18, 20, 30, 'alex']
Method | Definition |
---|---|
type() | Built-in function that returns the class type of an object. |
copy() | Returns a copy of the list. |
sort() | Sorts the list in ascending order. |
append() | Adds a single element to the end of a list. |
extend() | Adds multiple elements to the end of a list. |
index() | Returns the index of the first occurrence of the specified value. |
grades = [15, 18, 16, 5, 8 ]
grades
[15, 18, 16, 5, 8]
Copy#
# This is useful if you want to perform operations on the new list without affecting the original list
new_grades = grades.copy()
new_grades
[15, 18, 16, 5, 8]
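The difference between copying a list and simply assigning it to a second name can be sketched as follows. The names original, alias, and independent are illustrative and not part of the course code.

```python
original = [15, 18, 16, 5, 8]

alias = original               # a second name for the SAME list object
independent = original.copy()  # a new list object with the same items

alias.append(20)               # also changes original
independent.append(0)          # does not change original

print(original)                # [15, 18, 16, 5, 8, 20]
print(independent)             # [15, 18, 16, 5, 8, 0]
print(alias is original)       # True
print(independent is original) # False
```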
Sort#
new_grades.sort()
new_grades
[5, 8, 15, 16, 18]
Append#
new_grades.append( 20 )
new_grades
[5, 8, 15, 16, 18, 20]
Extend#
other_grades = [ 4, 14, 15 ]
other_grades
[4, 14, 15]
new_grades.extend( other_grades )
new_grades
[5, 8, 15, 16, 18, 20, 4, 14, 15]
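The difference between append() and extend() is easiest to see when the argument is itself a list. A minimal sketch with illustrative names:

```python
base = [5, 8, 15]

appended = base.copy()
appended.append([4, 14])   # adds the whole list as ONE new element

extended = base.copy()
extended.extend([4, 14])   # adds each item of the list separately

print(appended, len(appended))  # [5, 8, 15, [4, 14]] 4
print(extended, len(extended))  # [5, 8, 15, 4, 14] 5
```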
Index#
# if the value appears more than once in the list, return the first occurrence
new_grades.index(15)
2
Function | Definition |
---|---|
max(list) | It returns an item from the list with max value. |
min(list) | It returns an item from the list with min value. |
len(list) | It gives the total length of the list. |
list(seq) | Converts a tuple into a list. |
Max#
max( new_grades )
20
Min#
min( new_grades )
4
Len#
len (new_grades )
9
List#
my_tuple = ( 1, 3, 5, 7 )
type(my_tuple)
tuple
my_list = list( my_tuple )
my_list
[1, 3, 5, 7]
Tuple#
A tuple is an ordered and immutable Python container: its items cannot be changed, added, or removed after the tuple has been created. Tuples allow duplicate values.
# A tuple can contain elements of various data types
new_tuple = ('alex', 5, True)
new_tuple
('alex', 5, True)
tuple1 = (1, 3, 3, 5, 10)
tuple1[1]
3
# I want to change the value in the index 1
tuple1[1] = 4
# It is not possible to change values
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [119], in <cell line: 2>()
1 # I want to change the value in the index 1
----> 2 tuple1[1] = 4
TypeError: 'tuple' object does not support item assignment
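Assigning to an item of a tuple raises a TypeError. When a changed version is needed, one common workaround is to convert the tuple to a list, modify the list, and convert it back. The sketch below illustrates this; the names as_list and tuple2 are illustrative.

```python
tuple1 = (1, 3, 3, 5, 10)

# Item assignment on a tuple fails with a TypeError
try:
    tuple1[1] = 4
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Workaround: build a NEW tuple from a modified list
as_list = list(tuple1)
as_list[1] = 4
tuple2 = tuple(as_list)

print(tuple1)  # (1, 3, 3, 5, 10)  the original is unchanged
print(tuple2)  # (1, 4, 3, 5, 10)
```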
Method | Definition |
---|---|
count() | Returns the number of times a specified value occurs in the tuple |
index() | Searches the tuple for a specified value and returns the index of its first occurrence |
tuple1 = (1, 3, 3, 5, 10, 5, 5)
Count#
tuple1.count( 5 )
3
Index#
tuple1.index( 3 )
1
Function | Definition |
---|---|
max(tuple) | It returns an item from the tuple with max value. |
min(tuple) | It returns an item from the tuple with min value. |
len(tuple) | It gives the total length of the tuple. |
tuple( list ) | Converts a list into a tuple. |
my_tuple = ( 1, 2, 3, 4, 5, 10 )
Len#
# Length
len( my_tuple )
6
Tuple#
# Tuple
my_list = [ 1, 3, 5, 7]
my_tuple = tuple( my_list )
my_tuple
(1, 3, 5, 7)
Nested Tuple#
# Creates a tuple new_tuple with elements 1, 3, 3, 5, 10, and a nested tuple (5, 6, 7)
# Nested tuple: tuple written inside another tuple
new_tuple = ( 1, 3, 3, 5, 10, (5, 6, 7) )
print(new_tuple)
print(new_tuple[-1])
(1, 3, 3, 5, 10, (5, 6, 7))
(5, 6, 7)
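Items inside a nested tuple can be reached by chaining index operations, and a tuple can be unpacked into separate variables. A brief sketch; the names inner, a, b, and c are illustrative.

```python
new_tuple = (1, 3, 3, 5, 10, (5, 6, 7))

inner = new_tuple[-1]         # the nested tuple itself
middle = new_tuple[-1][1]     # second item of the nested tuple
a, b, c = inner               # unpacking: one variable per item

print(len(new_tuple))  # 6  (the nested tuple counts as a single item)
print(middle)          # 6
print(a, b, c)         # 5 6 7
```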
Dictionaries#
A dictionary is an ordered (Python >= 3.7) and mutable Python container that stores key-value pairs. Keys must be unique: duplicate keys are not allowed.
alexander = { 'lastname': "Quispe,", 'age': 28, 'birth_place': "SJL", 'male' : True }
maria = { 'lastname': "rojas", 'age': 27, 'birth_place': "SMP", 'male' : False }
print(alexander)
print(maria)
{'lastname': 'Quispe,', 'age': 28, 'birth_place': 'SJL', 'male': True}
{'lastname': 'rojas', 'age': 27, 'birth_place': 'SMP', 'male': False}
From list to dictionary#
# Defines two lists, lastname and ages, containing strings and integers, respectively
lastname = ["Quispe", "Rojas", "Rodriguez"]
ages = [28, 29, 30]
# A dictionary dict_1 is created with keys "lastname" and "ages",
# where each key is associated with its respective list
dict_1 = { "lastname" : lastname, "ages": ages}
dict_1
{'lastname': ['Quispe', 'Rojas', 'Rodriguez'], 'ages': [28, 29, 30]}
type( dict_1 )
dict
# Using the key "lastname" to access the corresponding value in the dictionary
dict_1["lastname"]
['Quispe', 'Rojas', 'Rodriguez']
Method | Definition |
---|---|
clear() | Removes all the elements from the dictionary |
copy() | Returns a copy of the dictionary |
fromkeys() | Returns a dictionary with the specified keys and value |
get() | Returns the value of the specified key (None, or a given default, if the key is missing) |
items() | Returns a view containing a tuple for each key-value pair |
keys() | Returns a view containing the dictionary's keys |
pop() | Removes the element with the specified key and returns its value |
popitem() | Removes and returns the last inserted key-value pair |
setdefault() | Returns the value of the specified key; if the key does not exist, inserts it with the specified value |
update() | Updates the dictionary with the specified key-value pairs |
values() | Returns a view of all the values in the dictionary |
# Dictionary containing the population of the 5 largest German cities
population = {'Berlin': 3748148, 'Hamburg': 1822445, 'Munich': 1471508, 'Cologne': 1085664, 'Frankfurt': 753056 }
population
{'Berlin': 3748148,
'Hamburg': 1822445,
'Munich': 1471508,
'Cologne': 1085664,
'Frankfurt': 753056}
Copy#
pop_2 = population.copy()
pop_2
{'Berlin': 3748148,
'Hamburg': 1822445,
'Munich': 1471508,
'Cologne': 1085664,
'Frankfurt': 753056}
Clear#
pop_2.clear()
pop_2
{}
Get, items, keys#
# Get information from key
population.get('Munich')
1471508
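The practical difference between get() and square-bracket access shows up when the key is missing: get() returns None (or a default you supply), while square brackets raise a KeyError. A small sketch on a hypothetical two-city dictionary, pop_example, so that the population dictionary above is left untouched:

```python
# Hypothetical small dictionary used only for this example
pop_example = {'Berlin': 3748148, 'Munich': 1471508}

found = pop_example.get('Munich')          # key exists
missing = pop_example.get('Lima')          # key missing -> None
with_default = pop_example.get('Lima', 0)  # key missing -> default value

# Square brackets raise KeyError for a missing key
try:
    pop_example['Lima']
    raised = False
except KeyError:
    raised = True

print(found)         # 1471508
print(missing)       # None
print(with_default)  # 0
print(raised)        # True
```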
# Get all key-value pairs
population.items()
dict_items([('Berlin', 3748148), ('Hamburg', 1822445), ('Munich', 1471508), ('Cologne', 1085664), ('Frankfurt', 753056)])
population.keys()
dict_keys(['Berlin', 'Hamburg', 'Munich', 'Cologne', 'Frankfurt'])
Pop#
# Drop a key
population.pop("Frankfurt")
population
{'Berlin': 3748148, 'Hamburg': 1822445, 'Munich': 1471508, 'Cologne': 1085664}
Update#
stadiums = { "munich":"Alianz Arena","dormundt": "SIP", "ulm": "VFLULM", "shalke": "GAZPROM"}
population.update( {"stadiums": stadiums} )
population
{'Berlin': 3748148,
'Hamburg': 1822445,
'Munich': 1471508,
'Cologne': 1085664,
'stadiums': {'munich': 'Alianz Arena',
'dormundt': 'SIP',
'ulm': 'VFLULM',
'shalke': 'GAZPROM'}}
Pop item#
# Drop an item (the last inserted key-value pair: stadiums)
population.popitem( )
population
{'Berlin': 3748148, 'Hamburg': 1822445, 'Munich': 1471508, 'Cologne': 1085664}
Add new items#
population.update( { "Bonn" : 327258 } )
population.update( { "Ulm" : 100000 } )
population
{'Berlin': 3748148,
'Hamburg': 1822445,
'Munich': 1471508,
'Cologne': 1085664,
'Bonn': 327258,
'Ulm': 100000}
population.update( { "Bonn" : {"population":100 , "km2" : 500, "president" : "Anzony"} } )
print( population )
{'Berlin': 3748148, 'Hamburg': 1822445, 'Munich': 1471508, 'Cologne': 1085664, 'Bonn': {'population': 100, 'km2': 500, 'president': 'Anzony'}, 'Ulm': 100000}
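When a value is itself a dictionary, its items are reached by chaining keys. The sketch below uses a hypothetical dictionary, city_info, that mirrors the structure printed above.

```python
# Hypothetical dictionary with the same structure as the example above
city_info = {
    'Ulm': 100000,
    'Bonn': {'population': 100, 'km2': 500, 'president': 'Anzony'},
}

bonn = city_info['Bonn']                 # the nested dictionary
bonn_area = city_info['Bonn']['km2']     # chained access
bonn_keys = list(city_info['Bonn'].keys())

print(type(bonn).__name__)  # dict
print(bonn_area)            # 500
print(bonn_keys)            # ['population', 'km2', 'president']
```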
# Get all keys
population.keys()
dict_keys(['Berlin', 'Hamburg', 'Munich', 'Cologne', 'Bonn', 'Ulm'])
# Get all values from all keys
population.values()
dict_values([3748148, 1822445, 1471508, 1085664, {'population': 100, 'km2': 500, 'president': 'Anzony'}, 100000])
From lists to dictionaries#
# keys
cities = ['Fray Martin','Santa Rosa de Puquio','Cuchicorral','Santiago de Punchauca',
'La Cruz (11 Amigos)','Cerro Cañon','Cabaña Suche','San Lorenzo',
'Jose Carlos Mariategui','Pascal','La Esperanza','Fundo Pancha Paula','Olfa',
'Rio Seco','Paraiso','El Rosario','Cerro Puquio','La Campana','Las Animas',
'Vetancio','Roma Alta','San Jose','San Pedro de Carabayllo','Huacoy',
'Fundo Pampa Libre','Ex Fundo Santa Ines','Reposo','Carmelito','Santa Elena','Don Luis','Santa Ines Parcela','Asociacion Santa Ines','Roma Baja','Residencial Santa Lucia','San Francisco','Santa Margarita - Molinos','Sipan Peru','Fundo Cuadros','Bello Horizonte','El Hueco','Ex Fundo Mariategui','Naranjito','Vista Hermosa','El Sabroso de Jose Carlos Mariategui','Granja Carabayllo','Agropecuario Valle el Chillon','Camino Real','Copacabana','El Trebol','Tablada la Virgen','San Fernando de Carabayllo','San Fernando de Copacabana','La Manzana','Chacra Grande','Torres de Copacabana','San Pedro de Carabayllo','San Lorenzo','Chaclacayo','Chorrillos','Cieneguilla','Lindero','Pichicato','San Isidro','San Vicente','Piedra Liza','Santa Rosa de Chontay (Chontay)','La Libertad','El Agustino','Independencia','Jesus Maria','La Molina','La Victoria','Lince','Las Palmeras','Chosica','Lurin','Los Almacigos','Rinconada del Puruhuay','Fundo Santa Genoveva','Los Maderos','Casco Viejo','Vista Alegre','Buena Vista Alta','Lomas Pucara','Fundo la Querencia','Magdalena del Mar','Pueblo Libre','Miraflores','Pachacamac','Puente Manchay','Tambo Inga','Pampa Flores','Manchay Alto Lote B','Invasion Cementerio','Manchay Bajo','Santa Rosa de Mal Paso','Cardal','Jatosisa','Tomina','Pucusana','Honda','Quipa','Los Pelicanos','Playa Puerto Bello','Ñaves','Granja Santa Elena','Alvatroz II','Poseidon - Lobo Varado','Playa Minka Mar','Playa Acantilado','Puente Piedra','Punta Hermosa','Capilla Lucumo','Cucuya','Pampapacta','Avicola San Cirilo de Loma Negra - 03','Avicola San Cirilo de Loma Negra - 02','Avicola San Cirilo de Loma Negra - 01','Pampa Mamay','Cerro Botija','Agricultores y Ganaderos','Pampa Malanche Avicola Puma','Punta Negra','Chancheria','Rimac','San Bartolo','Plantel 41','Granja 4','Granja 5','Granja 07','Granja 44','Granja 47','Santa Maria I','Las Torres Santa Fe','San Francisco de Borja','San Isidro','San Juan de Lurigancho','Ciudad de Dios','San Luis','Barrio Obrero Industrial',
'San Miguel','Santa Anita - los Ficus','Santa Maria del Mar','Don Bruno','Santa Rosa','Santiago de Surco','Surquillo','Villa el Salvador','Villa Maria del Triunfo', 'Pueblo libre']
# values
postal_code = [15001,15003,15004,15006,15018,15019,15046,15072,15079,15081,15082,15083,15088,15123,15004,15011,15012,15019,15022,15023,15026,15476,15479,15483,15487,15491,15494,15498,15047,15049,15063,15082,15083,15121,15122,15313,15316,15318,15319,15320,15321,15324,15320,15320,15320,15320,15320,15320,15121,15320,15320,15121,15320,15320,15121,15121,15122,15122,15121,15121,15121,15320,15320,15320,15320,15320,15320,15121,15121,15121,15320,15121,15319,15121,15121,15121,15320,15320,15121,15121,15121,15121,15320,15320,15320,15122,15122,15122,15122,15122,15122,15122,15122,15121,15121,15122,15122,15121,15121,15122,15122,15121,15122,15122,15122,15472,15476,15054,15056,15057,15058,15063,15064,15066,15067,15593,15594,15593,15593,15593,15593,15593,15593,15593,15311,15312,15313,15314,15316,15324,15326,15327,15328,15332,15003,15004,15006,15007,15008,15009,15011,15018,15022,15311,15328,15331,15332,15333,15046, 15001]
len(cities)
150
len(postal_code)
150
list(zip( cities , postal_code ))
[('Fray Martin', 15001),
('Santa Rosa de Puquio', 15003),
('Cuchicorral', 15004),
('Santiago de Punchauca', 15006),
('La Cruz (11 Amigos)', 15018),
('Cerro Cañon', 15019),
('Cabaña Suche', 15046),
('San Lorenzo', 15072),
('Jose Carlos Mariategui', 15079),
('Pascal', 15081),
('La Esperanza', 15082),
('Fundo Pancha Paula', 15083),
('Olfa', 15088),
('Rio Seco', 15123),
('Paraiso', 15004),
('El Rosario', 15011),
('Cerro Puquio', 15012),
('La Campana', 15019),
('Las Animas', 15022),
('Vetancio', 15023),
('Roma Alta', 15026),
('San Jose', 15476),
('San Pedro de Carabayllo', 15479),
('Huacoy', 15483),
('Fundo Pampa Libre', 15487),
('Ex Fundo Santa Ines', 15491),
('Reposo', 15494),
('Carmelito', 15498),
('Santa Elena', 15047),
('Don Luis', 15049),
('Santa Ines Parcela', 15063),
('Asociacion Santa Ines', 15082),
('Roma Baja', 15083),
('Residencial Santa Lucia', 15121),
('San Francisco', 15122),
('Santa Margarita - Molinos', 15313),
('Sipan Peru', 15316),
('Fundo Cuadros', 15318),
('Bello Horizonte', 15319),
('El Hueco', 15320),
('Ex Fundo Mariategui', 15321),
('Naranjito', 15324),
('Vista Hermosa', 15320),
('El Sabroso de Jose Carlos Mariategui', 15320),
('Granja Carabayllo', 15320),
('Agropecuario Valle el Chillon', 15320),
('Camino Real', 15320),
('Copacabana', 15320),
('El Trebol', 15121),
('Tablada la Virgen', 15320),
('San Fernando de Carabayllo', 15320),
('San Fernando de Copacabana', 15121),
('La Manzana', 15320),
('Chacra Grande', 15320),
('Torres de Copacabana', 15121),
('San Pedro de Carabayllo', 15121),
('San Lorenzo', 15122),
('Chaclacayo', 15122),
('Chorrillos', 15121),
('Cieneguilla', 15121),
('Lindero', 15121),
('Pichicato', 15320),
('San Isidro', 15320),
('San Vicente', 15320),
('Piedra Liza', 15320),
('Santa Rosa de Chontay (Chontay)', 15320),
('La Libertad', 15320),
('El Agustino', 15121),
('Independencia', 15121),
('Jesus Maria', 15121),
('La Molina', 15320),
('La Victoria', 15121),
('Lince', 15319),
('Las Palmeras', 15121),
('Chosica', 15121),
('Lurin', 15121),
('Los Almacigos', 15320),
('Rinconada del Puruhuay', 15320),
('Fundo Santa Genoveva', 15121),
('Los Maderos', 15121),
('Casco Viejo', 15121),
('Vista Alegre', 15121),
('Buena Vista Alta', 15320),
('Lomas Pucara', 15320),
('Fundo la Querencia', 15320),
('Magdalena del Mar', 15122),
('Pueblo Libre', 15122),
('Miraflores', 15122),
('Pachacamac', 15122),
('Puente Manchay', 15122),
('Tambo Inga', 15122),
('Pampa Flores', 15122),
('Manchay Alto Lote B', 15122),
('Invasion Cementerio', 15121),
('Manchay Bajo', 15121),
('Santa Rosa de Mal Paso', 15122),
('Cardal', 15122),
('Jatosisa', 15121),
('Tomina', 15121),
('Pucusana', 15122),
('Honda', 15122),
('Quipa', 15121),
('Los Pelicanos', 15122),
('Playa Puerto Bello', 15122),
('Ñaves', 15122),
('Granja Santa Elena', 15472),
('Alvatroz II', 15476),
('Poseidon - Lobo Varado', 15054),
('Playa Minka Mar', 15056),
('Playa Acantilado', 15057),
('Puente Piedra', 15058),
('Punta Hermosa', 15063),
('Capilla Lucumo', 15064),
('Cucuya', 15066),
('Pampapacta', 15067),
('Avicola San Cirilo de Loma Negra - 03', 15593),
('Avicola San Cirilo de Loma Negra - 02', 15594),
('Avicola San Cirilo de Loma Negra - 01', 15593),
('Pampa Mamay', 15593),
('Cerro Botija', 15593),
('Agricultores y Ganaderos', 15593),
('Pampa Malanche Avicola Puma', 15593),
('Punta Negra', 15593),
('Chancheria', 15593),
('Rimac', 15311),
('San Bartolo', 15312),
('Plantel 41', 15313),
('Granja 4', 15314),
('Granja 5', 15316),
('Granja 07', 15324),
('Granja 44', 15326),
('Granja 47', 15327),
('Santa Maria I', 15328),
('Las Torres Santa Fe', 15332),
('San Francisco de Borja', 15003),
('San Isidro', 15004),
('San Juan de Lurigancho', 15006),
('Ciudad de Dios', 15007),
('San Luis', 15008),
('Barrio Obrero Industrial', 15009),
('San Miguel', 15011),
('Santa Anita - los Ficus', 15018),
('Santa Maria del Mar', 15022),
('Don Bruno', 15311),
('Santa Rosa', 15328),
('Santiago de Surco', 15331),
('Surquillo', 15332),
('Villa el Salvador', 15333),
('Villa Maria del Triunfo', 15046),
('Pueblo libre', 15001)]
# Build a dictionary from the two lists (cities as keys, postal codes as values)
ct_pc = dict( zip( cities , postal_code ) )
ct_pc
{'Fray Martin': 15001,
'Santa Rosa de Puquio': 15003,
'Cuchicorral': 15004,
'Santiago de Punchauca': 15006,
'La Cruz (11 Amigos)': 15018,
'Cerro Cañon': 15019,
'Cabaña Suche': 15046,
'San Lorenzo': 15122,
'Jose Carlos Mariategui': 15079,
'Pascal': 15081,
'La Esperanza': 15082,
'Fundo Pancha Paula': 15083,
'Olfa': 15088,
'Rio Seco': 15123,
'Paraiso': 15004,
'El Rosario': 15011,
'Cerro Puquio': 15012,
'La Campana': 15019,
'Las Animas': 15022,
'Vetancio': 15023,
'Roma Alta': 15026,
'San Jose': 15476,
'San Pedro de Carabayllo': 15121,
'Huacoy': 15483,
'Fundo Pampa Libre': 15487,
'Ex Fundo Santa Ines': 15491,
'Reposo': 15494,
'Carmelito': 15498,
'Santa Elena': 15047,
'Don Luis': 15049,
'Santa Ines Parcela': 15063,
'Asociacion Santa Ines': 15082,
'Roma Baja': 15083,
'Residencial Santa Lucia': 15121,
'San Francisco': 15122,
'Santa Margarita - Molinos': 15313,
'Sipan Peru': 15316,
'Fundo Cuadros': 15318,
'Bello Horizonte': 15319,
'El Hueco': 15320,
'Ex Fundo Mariategui': 15321,
'Naranjito': 15324,
'Vista Hermosa': 15320,
'El Sabroso de Jose Carlos Mariategui': 15320,
'Granja Carabayllo': 15320,
'Agropecuario Valle el Chillon': 15320,
'Camino Real': 15320,
'Copacabana': 15320,
'El Trebol': 15121,
'Tablada la Virgen': 15320,
'San Fernando de Carabayllo': 15320,
'San Fernando de Copacabana': 15121,
'La Manzana': 15320,
'Chacra Grande': 15320,
'Torres de Copacabana': 15121,
'Chaclacayo': 15122,
'Chorrillos': 15121,
'Cieneguilla': 15121,
'Lindero': 15121,
'Pichicato': 15320,
'San Isidro': 15004,
'San Vicente': 15320,
'Piedra Liza': 15320,
'Santa Rosa de Chontay (Chontay)': 15320,
'La Libertad': 15320,
'El Agustino': 15121,
'Independencia': 15121,
'Jesus Maria': 15121,
'La Molina': 15320,
'La Victoria': 15121,
'Lince': 15319,
'Las Palmeras': 15121,
'Chosica': 15121,
'Lurin': 15121,
'Los Almacigos': 15320,
'Rinconada del Puruhuay': 15320,
'Fundo Santa Genoveva': 15121,
'Los Maderos': 15121,
'Casco Viejo': 15121,
'Vista Alegre': 15121,
'Buena Vista Alta': 15320,
'Lomas Pucara': 15320,
'Fundo la Querencia': 15320,
'Magdalena del Mar': 15122,
'Pueblo Libre': 15122,
'Miraflores': 15122,
'Pachacamac': 15122,
'Puente Manchay': 15122,
'Tambo Inga': 15122,
'Pampa Flores': 15122,
'Manchay Alto Lote B': 15122,
'Invasion Cementerio': 15121,
'Manchay Bajo': 15121,
'Santa Rosa de Mal Paso': 15122,
'Cardal': 15122,
'Jatosisa': 15121,
'Tomina': 15121,
'Pucusana': 15122,
'Honda': 15122,
'Quipa': 15121,
'Los Pelicanos': 15122,
'Playa Puerto Bello': 15122,
'Ñaves': 15122,
'Granja Santa Elena': 15472,
'Alvatroz II': 15476,
'Poseidon - Lobo Varado': 15054,
'Playa Minka Mar': 15056,
'Playa Acantilado': 15057,
'Puente Piedra': 15058,
'Punta Hermosa': 15063,
'Capilla Lucumo': 15064,
'Cucuya': 15066,
'Pampapacta': 15067,
'Avicola San Cirilo de Loma Negra - 03': 15593,
'Avicola San Cirilo de Loma Negra - 02': 15594,
'Avicola San Cirilo de Loma Negra - 01': 15593,
'Pampa Mamay': 15593,
'Cerro Botija': 15593,
'Agricultores y Ganaderos': 15593,
'Pampa Malanche Avicola Puma': 15593,
'Punta Negra': 15593,
'Chancheria': 15593,
'Rimac': 15311,
'San Bartolo': 15312,
'Plantel 41': 15313,
'Granja 4': 15314,
'Granja 5': 15316,
'Granja 07': 15324,
'Granja 44': 15326,
'Granja 47': 15327,
'Santa Maria I': 15328,
'Las Torres Santa Fe': 15332,
'San Francisco de Borja': 15003,
'San Juan de Lurigancho': 15006,
'Ciudad de Dios': 15007,
'San Luis': 15008,
'Barrio Obrero Industrial': 15009,
'San Miguel': 15011,
'Santa Anita - los Ficus': 15018,
'Santa Maria del Mar': 15022,
'Don Bruno': 15311,
'Santa Rosa': 15328,
'Santiago de Surco': 15331,
'Surquillo': 15332,
'Villa el Salvador': 15333,
'Villa Maria del Triunfo': 15046,
'Pueblo libre': 15001}
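The cities list contains repeated names (for example, 'San Lorenzo' appears twice with different postal codes), so the resulting dictionary has fewer entries than the 150 pairs produced by zip(). Because keys must be unique, the last value for a repeated key overwrites the earlier one while the key keeps its original position. A reduced sketch of that behaviour, using three pairs taken from the lists above; the names keys, values, pairs, and mapping are illustrative.

```python
# Three pairs taken from the lists above; 'San Lorenzo' is repeated
keys = ['San Lorenzo', 'Pascal', 'San Lorenzo']
values = [15072, 15081, 15122]

pairs = list(zip(keys, values))   # zip keeps every pair
mapping = dict(pairs)             # dict keeps one entry per unique key

print(len(pairs))     # 3
print(len(mapping))   # 2
print(mapping)        # {'San Lorenzo': 15122, 'Pascal': 15081}
```

Comparing len(cities) with len(ct_pc) is a quick way to detect repeated keys before relying on the dictionary.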
Exercises#
Write a Python script to check whether Lima is a key of ct_pc.
Write a Python script to join two Python dictionaries.
Write a Python script to add a key to a dictionary.
Numpy#
NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. If you are already familiar with MATLAB, you might find this tutorial useful to get started with NumPy.
Arrays#
A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
import numpy as np
a = np.array( [1, 2, 3, 4, 5] )
a
# 1D array
a = np.array( [1, 2, 3, 4, 5] )
print(a)
# 2D array
M = np.array( [ [1, 2, 3], [4, 5, 6] ] )
print(M)
X = np.array( [ [1, 2, 3, 4], [4, 5, 6, 7] ] )
X
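The rank, shape, and element indexing described above can be inspected directly through array attributes. A minimal sketch using the same 2D array:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [4, 5, 6, 7]])

print(X.ndim)    # 2       number of dimensions (rank)
print(X.shape)   # (2, 4)  2 rows and 4 columns
print(X.size)    # 8       total number of elements
print(X[1, 2])   # 6       row 1, column 2 (indexing starts at 0)
print(X[0])      # [1 2 3 4]  the first row
```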
Function | Description |
---|---|
np.array(a) | Create an N-dimensional NumPy array from sequence a |
np.linspace(a,b,N) | Create a 1D NumPy array with N equally spaced values from a to b (inclusive) |
np.arange(a,b,step) | Create a 1D NumPy array with values from a to b (exclusive), incremented by step |
np.zeros(N) | Create a 1D NumPy array of zeros of length N |
np.zeros((n,m)) | Create a 2D NumPy array of zeros with n rows and m columns |
np.ones(N) | Create a 1D NumPy array of ones of length N |
np.ones((n,m)) | Create a 2D NumPy array of ones with n rows and m columns |
np.eye(N) | Create a 2D NumPy array with N rows and N columns, with ones on the diagonal (the identity matrix) |
np.concatenate( ) | Join a sequence of arrays along an existing axis |
np.hstack( ) | Stack arrays in sequence horizontally (column-wise) |
np.vstack( ) | Stack arrays in sequence vertically (row-wise) |
np.column_stack( ) | Stack 1-D arrays as columns into a 2-D array |
np.random.normal() | Draw random samples from a normal (Gaussian) distribution |
np.linalg.inv() | Compute the (multiplicative) inverse of a matrix |
np.dot() / @ | Matrix multiplication |
# Create a 1D NumPy array with 11 equally spaced values from 0 to 1:
x = np.linspace( 0, 1, 11 )
print(x)
# Create a 1D NumPy array with values from 0 to 20 (exclusively) incremented by 1:
y = np.arange( 0, 20, 1 )
print(y)
# Create a 1D NumPy array of zeros of length 5:
z = np.zeros(5)
print(z)
# Create a 2D NumPy array of zeros of shape ( 5, 10 ) :
M = np.zeros( (5, 10) )
print(M)
# Create a 1D NumPy array of ones of length 7:
w = np.ones(7)
print(w)
# Create a 2D NumPy array of ones with 5 rows and 5 columns:
N = np.ones( (5, 5) )
print(N)
np.eye(5)
# Create the identity matrix of size 10:
I = np.eye(10)
print(I)
# Shape
print( I.shape )
# Size
print(I.size)
# Concatenate
g = np.array([[5,6],[7,8]])
g
h = np.array([[1,2]])
h
print(g, "\n")
print(h , "\n")
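g.shape
h.shape
# Reshape h from shape (1, 2) to shape (2, 1) so it can be added to g as a new column
h_2 = h.reshape(2, 1)
h_2
# Concatenate along axis 1 (columns)
g_h = np.concatenate((g, h_2), axis = 1)
g_h
# hstack gives the same result as concatenating along axis 1
g_h_stacked = np.hstack((g,h_2))
g_h_stacked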
# vstack
x = np.array([1,1,1])
y = np.array([2,2,2])
z = np.array([3,3,3])
vstacked = np.vstack( (x, y, z) )
vstacked
vstacked = np.vstack((x,y,z))
print(vstacked)
# hstack
hstacked = np.hstack((x,y,z))
print(hstacked)
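For 1D inputs, vstack places each array in its own row, hstack joins them end to end into one longer 1D array, and column_stack (listed in the table but not shown above) places each array in its own column. A short sketch comparing the three on the same inputs; the result names rows, flat, and cols are illustrative.

```python
import numpy as np

x = np.array([1, 1, 1])
y = np.array([2, 2, 2])
z = np.array([3, 3, 3])

rows = np.vstack((x, y, z))         # each array becomes a row
flat = np.hstack((x, y, z))         # arrays joined end to end
cols = np.column_stack((x, y, z))   # each array becomes a column

print(rows.shape)  # (3, 3)
print(flat.shape)  # (9,)
print(cols)
# [[1 2 3]
#  [1 2 3]
#  [1 2 3]]
print(np.array_equal(cols, rows.T))  # True: columns are the transposed rows
```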
OLS with Numpy#
# X data generation
n_data = 200
x1 = np.linspace(200, 500, n_data)
x0 = np.ones(n_data)
# reshape(-1, 1) turns the 1D array into a column vector of shape (n_data, 1)
x0.reshape(-1, 1).shape
X = np.hstack(( x0.reshape(-1, 1 ) , x1.reshape(-1, 1 ) ))
X.shape
# select parameters
beta = np.array([5, -2]).reshape(-1, 1 )
beta.shape
# true y (without noise)
y_true = X @ beta
y_true.shape
y_true
y_true + (np.random.normal(0, 1, n_data) * 20).reshape(-1, 1)
# add random normal noise
sigma = 20
y_actual = y_true + (np.random.normal(0, 1, n_data) * sigma).reshape(-1, 1)
print(y_actual[0:4, :])
The matrix equation for the estimated linear parameters is: $$ \hat{\beta} = (X^{T}X)^{-1}X^{T}y $$
# estimations
beta_estimated = np.linalg.inv(X.T @ X) @ X.T @ y_actual
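Computing the explicit inverse follows the formula literally. As a cross-check, the same estimate can be obtained by solving the normal equations with np.linalg.solve or by calling np.linalg.lstsq, both of which avoid forming the inverse. The sketch below regenerates data like the example above so that it runs on its own; the fixed random seed and the names rng, beta_inv, beta_solve, and beta_lstsq are assumptions made for illustration.

```python
import numpy as np

# Regenerate data like the example above (the seed is an assumption, for reproducibility)
rng = np.random.default_rng(0)
n_data = 200
x1 = np.linspace(200, 500, n_data)
X = np.column_stack((np.ones(n_data), x1))
beta = np.array([5.0, -2.0]).reshape(-1, 1)
y_actual = X @ beta + rng.normal(0, 20, size=(n_data, 1))

# 1) Literal formula with an explicit inverse
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y_actual

# 2) Solve the normal equations (X'X) b = X'y without inverting
beta_solve = np.linalg.solve(X.T @ X, X.T @ y_actual)

# 3) Least-squares solver applied directly to X and y
beta_lstsq, *_ = np.linalg.lstsq(X, y_actual, rcond=None)

print(np.allclose(beta_inv, beta_solve))  # the three estimates should agree
print(np.allclose(beta_inv, beta_lstsq))
print(beta_lstsq.ravel())                 # estimated slope should be close to -2
```

For small, well-conditioned problems like this one the three approaches give the same numbers; the solver-based versions are generally preferred when the columns of X are nearly collinear.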
import matplotlib.pyplot as plt
plt.plot(x1, y_actual, 'o')
plt.plot(x1, y_true, '-', c = 'black')
Calculate the sum of squared residual errors: $$ RSS = y^{T}y - y^{T}X(X^{T}X)^{-1}X^{T}y $$
y_actual
RSS = ( y_actual.T @ y_actual - y_actual.T @ X @ np.linalg.inv(X.T @ X) @ X.T @ y_actual )
Calculate the Total Sum of Squares, the spread of the actual (noisy) values around their mean: $$ TSS = (y-\bar{y})^{T}(y-\bar{y}) = y^{T}y - 2y^{T}\bar{y} + \bar{y}^{T}\bar{y} $$
y_mean = ( np.ones(n_data) * np.mean(y_actual) ).reshape( -1 , 1 )
TSS = (y_actual - y_mean).T @ (y_actual - y_mean)
TSS
# get predictions
y_pred = X @ beta_estimated
Calculate the Sum of Squares of the spread of the predictions around their mean: $$ ESS = (\hat{y}-\bar{y})^{T}(\hat{y}-\bar{y}) = \hat{y}^{T}\hat{y} - 2\hat{y}^{T}\bar{y} + \bar{y}^{T}\bar{y} $$
ESS = (y_pred - y_mean).T @ (y_pred - y_mean)
ESS
TSS, ESS + RSS
Get $R^2$: $$ R^2 = 1 - \frac{RSS}{TSS} $$
1 - RSS / TSS
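When the model includes an intercept, the three sums of squares satisfy TSS = ESS + RSS, so 1 - RSS/TSS and ESS/TSS give the same $R^2$. The self-contained sketch below checks this numerically on regenerated data; the fixed seed and the lowercase scalar names (rss, tss, ess) are assumptions for illustration and differ from the 1x1 arrays computed above.

```python
import numpy as np

# Regenerate data like the example above (the seed is an assumption)
rng = np.random.default_rng(1)
n_data = 200
x1 = np.linspace(200, 500, n_data)
X = np.column_stack((np.ones(n_data), x1))
y = X @ np.array([[5.0], [-2.0]]) + rng.normal(0, 20, size=(n_data, 1))

# Fit by least squares and compute predictions
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
y_bar = y.mean()

# Sums of squares as plain floats
rss = float(((y - y_hat) ** 2).sum())      # residual sum of squares
tss = float(((y - y_bar) ** 2).sum())      # total sum of squares
ess = float(((y_hat - y_bar) ** 2).sum())  # explained sum of squares

r2_from_rss = 1 - rss / tss
r2_from_ess = ess / tss

print(np.isclose(tss, ess + rss))               # decomposition holds
print(np.isclose(r2_from_rss, r2_from_ess))     # both R^2 formulas agree
```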
Standard error of regression#
Calculate the standard error of the regression. We divide by $(n-2)$ because the expectation of the residual sum of squares is $(n-2)\sigma^2$.
sr2 = ( (1 / (n_data - 2)) * (y_pred - y_actual).T @ (y_pred - y_actual))
sr = np.sqrt(sr2)
sr
Get variance and covariance Matrix#
In order to get the standard errors for our linear parameters, we use the matrix formula below: $$ Var(\hat{\beta}) = \sigma^2 (X^{T}X)^{-1} $$
var_beta = sr2 * np.linalg.inv(X.T @ X)
var_beta
print(
f'Std Error for b0 {np.sqrt(var_beta[0, 0])}, \nStd Error for b1 {np.sqrt(var_beta[1, 1])}'
)