1) Plot the airports

There are three airports in New York:

  • Newark, at 40.69°, -74.174°
  • JFK, at 40.641°, -73.778°
  • LaGuardia, at 40.777°, -73.874°

Modify your taxi_plot.py script (example here) to add the three New York City airports to the plot it produces.
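One possible way to organize this is sketched below. It is only a sketch: the helper name add_airports is made up for illustration, and it assumes your script already has a Matplotlib Axes object (called ax here) with longitude on the x-axis and latitude on the y-axis.

```python
# Airport coordinates from the list above, stored as (latitude, longitude).
AIRPORTS = {
    "Newark": (40.69, -74.174),
    "JFK": (40.641, -73.778),
    "LaGuardia": (40.777, -73.874),
}


def add_airports(ax, airports=AIRPORTS):
    """Mark each airport on an existing Matplotlib Axes (hypothetical helper)."""
    for name, (lat, lon) in airports.items():
        # Assumes longitude is on the x-axis and latitude is on the y-axis.
        ax.plot(lon, lat, marker="*", color="red", markersize=12, linestyle="none")
        # Put the airport name a few points up and to the right of the marker.
        ax.annotate(name, xy=(lon, lat), xytext=(5, 5), textcoords="offset points")
```

In taxi_plot.py you would call add_airports(ax) on the Axes that holds the taxi data, before the figure is saved.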

2) Parsing command-line arguments

Right now, we run python3 taxi_plot.py, and it already knows where the data file is, and where the output should go. What if, however, you got the same data file every day, and needed to make a new plot every day?

Modify your script so that it takes two filenames, one for the input data file and one for the plot output file, so that you can run it like this:

python3 taxi_plot.py data/taxirides.csv.zip data/taxi_rides.png

I suggest reading the documentation for the argparse module, and using that. You will need to import argparse in your taxi_plot.py file, and then use argparse.ArgumentParser to parse arguments, as shown in the second example in the argparse tutorial.
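As a starting point, here is a minimal sketch of two positional arguments with argparse. The argument names input_file and output_file are my own choice, not something the assignment requires.

```python
import argparse


def parse_args(argv=None):
    """Parse the two filenames; argv=None means 'read the real command line'."""
    parser = argparse.ArgumentParser(description="Plot taxi ride data.")
    # Positional arguments are required and are matched in order.
    parser.add_argument("input_file", help="path to the input data file")
    parser.add_argument("output_file", help="path for the plot image")
    return parser.parse_args(argv)


# Passing a list simulates the command line shown above.
args = parse_args(["data/taxirides.csv.zip", "data/taxi_rides.png"])
print(args.input_file, args.output_file)
```

In the real script you would call parse_args() with no argument, and then use args.input_file and args.output_file wherever the filenames are currently hard-coded.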

3) Trip Data Analysis

In this problem we want to take a closer look at the trip distance data. Load the taxi data set as data. The table comes with information about pickup/dropoff locations as long/lat pairs, as well as the trip distance logged by the taxi.

We want to compare the distance logged by the taxi in data.trip_distance to two different ways of computing distances from the pickup/dropoff location.

  • The first way of computing a distance is simply to draw a straight line between the pickup and dropoff locations. This distance is approximately the Euclidean distance between the two points, neglecting the curvature of the earth. You can compute the Euclidean distance between two long/lat pairs by using the following function:
import numpy as np

def euclidean_dist(long0, long1, lat0, lat1, unit='miles'):
    '''approximate the euclidean (direct) distance between two lat/long pairs in miles or km'''
    #scale longitudes by cosine of mean latitudes
    s_long0 = long0*np.cos((lat0+lat1)/2./180.*np.pi)
    s_long1 = long1*np.cos((lat0+lat1)/2./180.*np.pi)
    #angular distance between points on surface of earth
    d = np.sqrt((s_long1-s_long0)**2+(lat1-lat0)**2)
    if unit == 'km':
        #there are approximately 111km between points on the surface of the earth if they are separated by 1 degree
        d *= 111.
    elif unit == 'miles':
        #there are 0.621371 miles per km
        d *= 111.*0.621371
    else:
        print("function can only use 'km' or 'miles' as units")
        return

    return d
  • Another way of computing a distance is by using the Manhattan distance between the dropoff and pickup locations. This distance is just the sum of the absolute values of horizontal and vertical differences between the two points. You can compute the Manhattan distance between two long/lat pairs by using the following function:
def manhattan_dist(long0, long1, lat0, lat1, unit='miles'):
    '''approximate the Manhattan distance between two lat/long pairs in miles or km'''
    #scale longitudes by cosine of mean latitudes
    s_long0 = long0*np.cos((lat0+lat1)/2./180.*np.pi)
    s_long1 = long1*np.cos((lat0+lat1)/2./180.*np.pi)
    #angular distance between points on surface of earth
    d = np.abs(s_long1-s_long0) + np.abs(lat1-lat0)
    if unit == 'km':
        #there are approximately 111km between points on the surface of the earth if they are separated by 1 degree
        d *= 111.
    elif unit == 'miles':
        #there are 0.621371 miles per km
        d *= 111.*0.621371
    else:
        print("function can only use 'km' or 'miles' as units")
        return

    return d

1) Create the columns data.euclid_dist and data.manhattan_dist by applying the above defined functions to your dropoff/pickup data.

  • Note: data.trip_distance is a "shortcut" for accessing the column data['trip_distance']. You can't use that shortcut to create a new column; you will need to use the notation data['euclid_dist'] = … instead.
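To illustrate the bracket notation, here is a small sketch on a toy table. The four coordinate column names are assumptions (check data.columns for the real ones), and the function is a condensed, miles-only copy of euclidean_dist so that the example runs on its own.

```python
import numpy as np
import pandas as pd


def euclidean_dist(long0, long1, lat0, lat1):
    """Condensed, miles-only copy of the function above."""
    scale = np.cos((lat0 + lat1) / 2.0 / 180.0 * np.pi)
    d = np.sqrt(((long1 - long0) * scale) ** 2 + (lat1 - lat0) ** 2)
    return d * 111.0 * 0.621371


# Toy table built from the airport coordinates; the column names are assumed.
data = pd.DataFrame({
    "pickup_longitude": [-73.778, -73.874],
    "pickup_latitude": [40.641, 40.777],
    "dropoff_longitude": [-73.874, -74.174],
    "dropoff_latitude": [40.777, 40.69],
})

# Bracket notation creates the column; the NumPy math works on whole columns at once.
data["euclid_dist"] = euclidean_dist(
    data["pickup_longitude"], data["dropoff_longitude"],
    data["pickup_latitude"], data["dropoff_latitude"],
)

# Once the column exists, the attribute shortcut can read it.
print(data.euclid_dist)
```

Because the distance functions only use NumPy operations, you can pass entire columns to them and do not need a loop over rows.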

2) Plot a histogram of each of the three distance columns: data.trip_distance, data.euclid_dist, and data.manhattan_dist. Choose appropriate bins as we did in class (exclude trips over 100 miles, or over a cutoff of your choosing). Can you see a difference between the three distributions? What are their mean and median distances?
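If you want to check your binning before plotting, the sketch below uses np.histogram on made-up numbers (random values, not real taxi data). The bin width and the 100-mile cutoff are just example choices. The same bins array can be passed to Matplotlib's hist function.

```python
import numpy as np

# Toy stand-in for a distance column, in miles (not real taxi data).
rng = np.random.default_rng(0)
distances = rng.exponential(scale=3.0, size=1000)
distances[:5] = 500.0  # a few implausible outliers

# Fixed bin edges: 50 bins of 2 miles each, covering 0 to 100 miles.
bins = np.linspace(0, 100, 51)
counts, edges = np.histogram(distances, bins=bins)

# Values outside the bin range are not counted, so apply the same
# cutoff before computing summary statistics.
kept = distances[distances <= 100]
print("mean:", kept.mean(), "median:", np.median(kept))
```

Using the same bin edges for all three columns makes the histograms directly comparable.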

3) Let's compare the two distance functions to the taxi-logged distance.

  • Focus on only very short trips (less than 3 miles). Which way of computing the distance between the two lat/long pairs is closest to the taxi-logged distance?

  • Focus on only long trips (more than 20 miles). Which way of computing the distance between the two lat/long pairs is closest to the taxi-logged distance?

Can you explain your findings from above?
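One way to make "closest" concrete is the mean absolute difference on a subset of trips. The sketch below shows the mechanics only: the numbers are arbitrary, chosen to keep the example small, and they say nothing about what you will find in the real data. The helper name mean_abs_diff is made up.

```python
import numpy as np

# Arbitrary distances in miles for five trips (illustrative only, not real data).
logged = np.array([1.2, 2.5, 0.8, 25.0, 31.0])
euclid = np.array([0.9, 1.9, 0.6, 21.0, 27.5])
manhattan = np.array([1.3, 2.6, 0.8, 29.0, 38.0])


def mean_abs_diff(estimate, reference, mask):
    """Average absolute gap between estimate and reference on the selected trips."""
    return np.mean(np.abs(estimate[mask] - reference[mask]))


# Boolean masks select the short and the long trips.
short = logged < 3
long_trips = logged > 20

for label, mask in [("short", short), ("long", long_trips)]:
    print(label,
          "euclid:", mean_abs_diff(euclid, logged, mask),
          "manhattan:", mean_abs_diff(manhattan, logged, mask))
```

The same boolean masks work on DataFrame columns, so you can apply this idea to data directly.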

Bonus

The avenues of Manhattan are at an angle of 29° from true North (source). Update your Manhattan distance function to account for this.
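A hedged sketch of one approach: project the displacement onto the avenue direction and the cross-street direction, then add the absolute values. It assumes the 29° is measured clockwise from north (that is, the avenues point slightly east of north), and it returns miles only. The function name and the angle_deg parameter are my own.

```python
import numpy as np


def rotated_manhattan_dist(long0, long1, lat0, lat1, angle_deg=29.0):
    """Manhattan distance in miles on a grid rotated angle_deg clockwise from north."""
    # East-west and north-south displacements in degrees, as in manhattan_dist.
    dx = (long1 - long0) * np.cos((lat0 + lat1) / 2.0 / 180.0 * np.pi)
    dy = lat1 - lat0
    theta = np.radians(angle_deg)
    # Project the displacement onto the avenue and cross-street directions.
    along_avenue = dx * np.sin(theta) + dy * np.cos(theta)
    along_street = dx * np.cos(theta) - dy * np.sin(theta)
    # Convert degrees to km (111 per degree), then km to miles.
    return (np.abs(along_avenue) + np.abs(along_street)) * 111.0 * 0.621371


# Example: JFK to LaGuardia on the rotated grid.
print(rotated_manhattan_dist(-73.778, -73.874, 40.641, 40.777))
```

With angle_deg=0 this reduces to the original manhattan_dist in miles, which is a quick way to check your own version.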

Double Bonus

In Python, the standard way to handle errors is to raise an exception. Modify your functions to raise a ValueError if the value of unit is not understood.
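The general pattern looks like the sketch below. The helper unit_factor is hypothetical and only there to keep the example short; in your solution the raise statement would go inside the distance functions themselves.

```python
def unit_factor(unit="miles"):
    """Return the degrees-to-distance factor (hypothetical helper)."""
    if unit == "km":
        return 111.0
    if unit == "miles":
        return 111.0 * 0.621371
    # Raising stops execution here; the caller can catch it with try/except.
    raise ValueError(f"unit must be 'km' or 'miles', got {unit!r}")


try:
    unit_factor("furlongs")
except ValueError as err:
    print("caught:", err)
```

Unlike printing a message and returning None, an exception cannot be silently ignored by the calling code.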

Triple Bonus

WARNING: This is really above and beyond anything we've covered!

Use the quantities module to convert distances into any distance unit the user specifies.

The quantities module does not come with Anaconda, and will need to be installed separately. To install it, go to the terminal and run this command:

pip install quantities

After that, import quantities