There are three airports in New York:
Modify your taxi_plot.py script (example here) so that the plot it produces also shows the three New York City airports.
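A minimal sketch of one way to mark the airports. It assumes that taxi_plot.py draws on a Matplotlib Axes with longitude on the x-axis and latitude on the y-axis. Two further assumptions should be checked against your own source: that the three airports are JFK, LaGuardia (LGA) and Newark (EWR), and that the rounded coordinates below are accurate. The names AIRPORTS and add_airports are hypothetical.

```python
# ASSUMPTION: the three airports and their approximate (longitude, latitude)
# coordinates; verify them against your own source before relying on them.
AIRPORTS = {
    "JFK": (-73.7781, 40.6413),
    "LGA": (-73.8740, 40.7769),
    "EWR": (-74.1745, 40.6895),
}


def add_airports(ax, airports=AIRPORTS):
    """Mark each airport on an existing Matplotlib Axes (x = longitude, y = latitude)."""
    for name, (lon, lat) in airports.items():
        # a large star drawn above the taxi points (higher zorder)
        ax.scatter([lon], [lat], marker="*", s=200, color="red", zorder=3)
        # label slightly offset from the marker, in screen points
        ax.annotate(name, xy=(lon, lat), xytext=(5, 5), textcoords="offset points")
    return ax
```

Call add_airports(ax) after drawing the taxi points and before saving the figure. If your script sets tight axis limits around Manhattan, Newark may fall outside the visible area, so you may need to widen the limits.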
Right now, we run python3 taxi_plot.py, and the script already knows where the data file is and where the output should go. What if, however, you received a new version of the data file every day and needed to make a new plot each day?
Modify your script so that it takes two filenames as command-line arguments, one for the input data file and one for the output plot file, so that you can run it like this:
python3 taxi_plot.py data/taxirides.csv.zip data/taxi_rides.png
I suggest reading the documentation for the argparse module and using it. You will need to import argparse in your taxi_plot.py file and then use argparse.ArgumentParser to parse the arguments, as shown in the second example in the argparse tutorial.
In this problem we want to take a closer look at the trip distance data. Load the taxidata set as data. The taxidata table contains the pickup and dropoff locations as long/lat pairs, as well as the trip distance logged by the taxi.
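A minimal argparse sketch with two positional arguments. The argument names input_file and output_file and the helper parse_args are hypothetical. The optional argv parameter exists only so that the function can be tried without a real command line; when it is None, argparse reads sys.argv.

```python
import argparse


def parse_args(argv=None):
    """Parse the input data path and the output plot path from the command line."""
    parser = argparse.ArgumentParser(description="Plot NYC taxi ride data.")
    # positional arguments: both are required, in this order
    parser.add_argument("input_file",
                        help="path to the taxi data file, e.g. data/taxirides.csv.zip")
    parser.add_argument("output_file",
                        help="path for the plot image, e.g. data/taxi_rides.png")
    return parser.parse_args(argv)
```

In taxi_plot.py you would call args = parse_args() near the top. Then use args.input_file wherever the data path was hard-coded, and args.output_file in the call that saves the figure. If the script is run with the wrong number of arguments, argparse prints a usage message and exits.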
We want to compare the distance logged by the taxi in data.trip_distance to two different ways of computing distances from the pickup/dropoff location.
import numpy as np


def euclidean_dist(long0, long1, lat0, lat1, unit='miles'):
    '''approximate the euclidean (direct) distance between two lat/long pairs in miles or km'''
    # scale longitudes by cosine of mean latitudes
    s_long0 = long0*np.cos((lat0+lat1)/2./180.*np.pi)
    s_long1 = long1*np.cos((lat0+lat1)/2./180.*np.pi)
    # angular distance (in degrees) between points on surface of earth
    d = np.sqrt((s_long1-s_long0)**2+(lat1-lat0)**2)
    if unit == 'km':
        # there are approximately 111km between points on the surface of the earth if they are separated by 1 degree
        d *= 111.
    elif unit == 'miles':
        # there are 0.621371 miles per km
        d *= 111.*0.621371
    else:
        print("function can only use 'km' or 'miles' as units")
        return
    return d


def manhattan_dist(long0, long1, lat0, lat1, unit='miles'):
    '''approximate the Manhattan distance between two lat/long pairs in miles or km'''
    # scale longitudes by cosine of mean latitudes
    s_long0 = long0*np.cos((lat0+lat1)/2./180.*np.pi)
    s_long1 = long1*np.cos((lat0+lat1)/2./180.*np.pi)
    # angular distance (in degrees) between points on surface of earth
    d = np.abs(s_long1-s_long0) + np.abs(lat1-lat0)
    if unit == 'km':
        # there are approximately 111km between points on the surface of the earth if they are separated by 1 degree
        d *= 111.
    elif unit == 'miles':
        # there are 0.621371 miles per km
        d *= 111.*0.621371
    else:
        print("function can only use 'km' or 'miles' as units")
        return
    return d
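Both functions treat a small patch of the Earth as flat. Longitude differences are shrunk by the cosine of the mean latitude, and one degree is taken to be about 111 km. Written out, with λ for longitude and φ for latitude in degrees (the code converts to radians before taking the cosine), the code above computes:

```latex
\bar{\varphi} = \frac{\varphi_0 + \varphi_1}{2}, \qquad
\Delta x = (\lambda_1 - \lambda_0)\cos\bar{\varphi}, \qquad
\Delta y = \varphi_1 - \varphi_0

d_{\mathrm{euclid}} = k\,\sqrt{\Delta x^2 + \Delta y^2}, \qquad
d_{\mathrm{manhattan}} = k\,\bigl(|\Delta x| + |\Delta y|\bigr)
```

Here k = 111 km per degree, or 111 × 0.621371 ≈ 68.97 miles per degree. Since |Δx| + |Δy| ≥ √(Δx² + Δy²), the Manhattan distance is never smaller than the Euclidean one for the same pair of points.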
1) Create the columns data.euclid_dist and data.manhattan_dist by applying the functions defined above to the pickup/dropoff coordinates in your data.
data.trip_distance is a "shortcut" for accessing the column data['trip_distance']. You can't use that shortcut to create a new column; instead, use the bracket notation data['euclid_dist'] = ….
2) Histogram all three distance columns: data.trip_distance, data.euclid_dist, and data.manhattan_dist. Choose appropriate bins as we did in class (for example, exclude trips longer than 100 miles, or use a cutoff of your choice). Can you tell a difference between the three distributions? What are their mean and median distances?
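A sketch of parts 1) and 2), assuming pandas and NumPy. The pickup/dropoff column names below are hypothetical, so check data.columns and adjust them. To keep the block standalone, add_distance_columns takes the distance functions as arguments, and you would pass euclidean_dist and manhattan_dist from above. The helper names are also hypothetical. The 100-mile cutoff follows the suggestion in the exercise and is applied to each column separately.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: hypothetical column names; replace them with the ones in your data.
PICKUP_LON, PICKUP_LAT = "pickup_longitude", "pickup_latitude"
DROPOFF_LON, DROPOFF_LAT = "dropoff_longitude", "dropoff_latitude"
DIST_COLS = ["trip_distance", "euclid_dist", "manhattan_dist"]


def add_distance_columns(data, euclid_fn, manhattan_fn):
    """Create the two computed-distance columns using bracket notation."""
    # argument order matches the functions above: (long0, long1, lat0, lat1)
    coords = (data[PICKUP_LON], data[DROPOFF_LON], data[PICKUP_LAT], data[DROPOFF_LAT])
    data["euclid_dist"] = euclid_fn(*coords)
    data["manhattan_dist"] = manhattan_fn(*coords)
    return data


def summarize_distances(data, max_miles=100.0):
    """Mean and median of each distance column, ignoring values above max_miles."""
    stats = {}
    for col in DIST_COLS:
        kept = data.loc[data[col] <= max_miles, col]
        stats[col] = {"mean": kept.mean(), "median": kept.median()}
    # one row per distance column, with columns "mean" and "median"
    return pd.DataFrame(stats).T


def plot_distance_histograms(ax, data, max_miles=100.0, n_bins=100):
    """Overlay step histograms of the three columns on shared bins."""
    bins = np.linspace(0.0, max_miles, n_bins + 1)
    for col in DIST_COLS:
        ax.hist(data.loc[data[col] <= max_miles, col], bins=bins,
                histtype="step", label=col)
    ax.set_xlabel("distance (miles)")
    ax.set_ylabel("number of trips")
    ax.legend()
    return bins
```

The shared bins keep the three histograms directly comparable, and histtype="step" keeps overlapping distributions readable. With fine bins you may prefer to zoom the x-axis in on short trips, where most of the data lies.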
3) Let's compare the two distance functions to the taxi-logged distance.
Consider only very short trips (less than 3 miles). Which way of computing the distance between the two lat/long pairs is closest to the taxi-logged distance?
Consider only long trips (more than 20 miles). Which way of computing the distance between the two lat/long pairs is closest to the taxi-logged distance?
Can you explain your findings from above?
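One way to make "closest" concrete is to compute the mean absolute difference between each computed distance and the logged one, restricted to a range of logged distances. The sketch below assumes that the columns from part 1) exist and that "short" and "long" are judged by data.trip_distance. The function name is hypothetical.

```python
import pandas as pd


def mean_abs_diff_vs_logged(data, min_miles=None, max_miles=None,
                            cols=("euclid_dist", "manhattan_dist")):
    """Mean |computed - logged| distance, restricted to a range of logged distances."""
    logged = data["trip_distance"]
    mask = pd.Series(True, index=data.index)
    if min_miles is not None:
        mask = mask & (logged > min_miles)
    if max_miles is not None:
        mask = mask & (logged < max_miles)
    subset = data[mask]
    return {col: float((subset[col] - subset["trip_distance"]).abs().mean())
            for col in cols}
```

Use mean_abs_diff_vs_logged(data, max_miles=3) for the short trips and mean_abs_diff_vs_logged(data, min_miles=20) for the long ones. Also looking at the signed difference (drop the .abs()) tells you whether a formula systematically under- or over-estimates the logged distance.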
The avenues of Manhattan run at an angle of 29° from true North (source). Update your Manhattan distance function to account for this.
In Python, the standard way to handle errors is to raise an exception. Modify your functions to raise a ValueError if the value of unit is not understood.
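A sketch of a rotated Manhattan distance. It assumes a flat local frame with x pointing east and y pointing north, and it assumes that the 29° is measured clockwise from true north, so that the avenues run north-north-east. The displacement is projected onto the avenue direction and the perpendicular cross-street direction before the absolute values are added. The function name and the angle_deg parameter are hypothetical. The unit handling mirrors the functions above, except that it raises ValueError instead of printing.

```python
import numpy as np

KM_PER_DEGREE = 111.0   # approximate km per degree, as in the functions above
MILES_PER_KM = 0.621371


def manhattan_dist_rotated(long0, long1, lat0, lat1, unit="miles", angle_deg=29.0):
    """Manhattan distance along a street grid rotated angle_deg east of true north."""
    mean_lat = np.radians((lat0 + lat1) / 2.0)
    # local east (dx) and north (dy) displacement, in scaled degrees
    dx = (long1 - long0) * np.cos(mean_lat)
    dy = lat1 - lat0
    theta = np.radians(angle_deg)
    # component along the avenues, unit vector (sin theta, cos theta)
    along_avenue = dx * np.sin(theta) + dy * np.cos(theta)
    # component along the cross streets, unit vector (cos theta, -sin theta)
    along_street = dx * np.cos(theta) - dy * np.sin(theta)
    d = np.abs(along_avenue) + np.abs(along_street)
    if unit == "km":
        return d * KM_PER_DEGREE
    if unit == "miles":
        return d * KM_PER_DEGREE * MILES_PER_KM
    raise ValueError(f"unit must be 'km' or 'miles', got {unit!r}")
```

With angle_deg=0 this reduces to manhattan_dist above. A trip that runs straight along an avenue now gets the same length as its straight-line distance. Without the rotation, the grid would inflate that length.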
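For example, euclidean_dist could end by raising the error instead of printing. The module-level constants are introduced only for readability. manhattan_dist would change in the same way.

```python
import numpy as np

KM_PER_DEGREE = 111.0   # approximate km per degree on the Earth's surface
MILES_PER_KM = 0.621371


def euclidean_dist(long0, long1, lat0, lat1, unit="miles"):
    '''approximate the euclidean (direct) distance between two lat/long pairs in miles or km'''
    mean_lat = (lat0 + lat1) / 2. / 180. * np.pi
    # scale longitudes by cosine of mean latitude
    s_long0 = long0 * np.cos(mean_lat)
    s_long1 = long1 * np.cos(mean_lat)
    d = np.sqrt((s_long1 - s_long0)**2 + (lat1 - lat0)**2)
    if unit == "km":
        return d * KM_PER_DEGREE
    if unit == "miles":
        return d * KM_PER_DEGREE * MILES_PER_KM
    # unknown unit: signal the problem to the caller instead of returning None
    raise ValueError(f"unit must be 'km' or 'miles', got {unit!r}")
```

Calling code can then handle a bad unit with try/except ValueError. In the original version, the printed message was easy to miss and left the caller holding None.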
WARNING: This is really above and beyond anything we've covered!
Use the quantities module to convert distances into any distance unit the user specifies.
The quantities module does not come with Anaconda, and will need to be installed separately. To install it, go to the terminal and run this command:
pip install quantities
After that, you can import quantities in your script.
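A minimal sketch, assuming quantities is installed as above. The function computes the distance in kilometres as before, wraps it in a pq.Quantity, and lets rescale convert it to whatever unit string the caller passes, for example "m", "km" or "ft". The default "mi" assumes that quantities registers that name for the mile, so check the quantities documentation for the exact unit names. The function name is hypothetical, and the 111 km per degree approximation is the one used earlier.

```python
import numpy as np
import quantities as pq

KM_PER_DEGREE = 111.0  # same flat-Earth approximation as the functions above


def euclidean_dist_q(long0, long1, lat0, lat1, unit="mi"):
    """Euclidean distance between two long/lat pairs, returned as a Quantity in `unit`."""
    mean_lat = np.radians((lat0 + lat1) / 2.0)
    dx = (long1 - long0) * np.cos(mean_lat)
    dy = lat1 - lat0
    d_km = pq.Quantity(np.sqrt(dx**2 + dy**2) * KM_PER_DEGREE, "km")
    # rescale converts to any length unit that quantities knows about
    return d_km.rescale(unit)
```

The result's .magnitude attribute gives the bare NumPy number(s) without units, which is convenient when storing the values in a DataFrame column.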