More NumPy¶

Let's talk about some more we can do with NumPy.

import numpy as np
import matplotlib.pyplot as plt

Exercise 1: Write code that will take a function f and use the trapezoidal rule to approximate the integral of f with N points from 0 to 1.

The trapezoial rule is something you may have learned in a calculus course, if you've taken one. The way it works is that we approximate our function by finding the value at N points and drawing line segments between these points. The area underneath a line segment is a trapezoid, which we can compute using the formula A = 1/2(b1 + b2)h. This gives us an estimate for the area underneath the graph of f, which is the integral of f.

This time, we will be writing a function to compute this for us. We'll have each subpart of the problem be its own function.

Part 1: Write a function called sample_points which takes a function f and returns a NumPy vector of f evaluated at N points evenly spaced between 0 and 1.

Hint: Normally we want to avoid looping over a NumPy vector, but here we will have to.

(solution below)

.

def sample_points(f, N):
    pts = np.linspace(0,1,N)
    for i in range(len(pts)):
        pts[i] = f(pts[i])
    return pts

Part 2: Write a function called compute_area which computes the area under the trapezoids defined by the sampled points.

For evenly spaced points (which we have), this has a simple formula. It is
1/2(value of first point) + (value of second point) + (value of third point) + ... + (value of second to last point) + 1/2(value of last point) (this is the sum of all the point values, except only half for the first and last point) and all of that multipled by the distance between the sampled points.

(solution below)

.

def compute_area(pts):
    area = np.sum(pts) - 0.5*pts[0] - 0.5*pts[-1]
    return area/(len(pts)-1)

After this, we have our solution:

def trapezoidal_rule(f, N):
    pts = sample_points(f, N)
    return compute_area(pts)

Part 3: Use the above functions and Pyplot to plot y = sin(pi x) and the trapezoial rule approximation. Also, approximate the integral.

(solution below)

.

f = lambda x : np.sin(np.pi*x)
N = 5

x = np.linspace(0,1,1000)
y = f(x)
x_samp = np.linspace(0,1,N)
y_samp = sample_points(f, N)

plt.plot(x,y)
plt.plot(x_samp,y_samp)

plt.show()
print(f'Area: {trapezoidal_rule(f,N)}')

Area: 0.6035533905932737

Exercise 2: Download the files cal_cities_lat_long.csv and cal_cities_names.txt from CCLE or my website. Imagine we're trying to pick a city to place something (such as a utility, or a concert hall, or a business, etc.). We want it to be as close as possible to all of California. By this I mean that we want the furthest California city from our city to be as close as possible.

We'll plot the cities as latitude and longitude on the X-Y plane and use the distance there as the distance between cities. This isn't completely accurate, but it's good enough.

First, we'll get the data from the files. Later on in class, we'll learn a better package to load data from files, but NumPy works just fine.

cal_cities = np.genfromtxt('cal_cities_lat_long.csv', delimiter=',')
print(cal_cities.shape)

(459, 2)

This is the latitude and longitude of 459 California cities.

Part 1: Write a function max_dist that takes in a NumPy array pts with two columns and a NumPy vector v and returns the maximum distance between any point in pts and v.

(solution below)

.

def max_dist(pts, v):
    return max(np.sqrt((pts[:,0] - v[0])**2 + (pts[:,1] - v[1])**2))

Part 2: Write a function called choose_city that uses the above to choose the city we want. Return the index of the city.

(Hint: again, we are going to need to loop over our NumPy array, after this I will explain how we can vectorize our code.)

(solution below)

.

def choose_city(pts):
    min_dist = max_dist(pts, pts[0,:])
    ind = 0
    for i in range(1,len(pts)):
        d = max_dist(pts, pts[i,:])
        if d < min_dist:
            ind = i
            min_dist = d
    return ind

To vectorize the code properly, the best way I know how to do is to create a 2D array of distances in a vectorized way, and then use NumPy methods to find the maximum of each column and then then the minimum of the maximums. This requires a little bit more knowledge of broadcasting, however, which you don't quite know yet.

Part 3: Use these two functions to find our choosen city. Then load cal_cities_names.txt to get the name of the city. The list of cities are in the same order as the latitude and longitude and separated by lines. So the 0-th index row in cal_cities is the first line of cal_cities_names.txt.

(solution below)

.

city_ind = choose_city(cal_cities)
with open('cal_cities_names.txt') as f:
    name = str(f.read()).splitlines()[city_ind]

print(name)

Madera

Part 4: Plot all the cities of California, with our chosen city in a different color.

(solution below)

.

plt.scatter(cal_cities[:,1], cal_cities[:,0])
plt.scatter(cal_cities[city_ind,1], cal_cities[city_ind, 0])

plt.show()