Ride Aggregation Efficiency
Alan Parker Lue
26 February 2018
Abstract. I propose a method to quantify the notion of efficiency when aggregating rides in ridesharing settings, where passengers on two or more distinct trips may occupy the same vehicle at the same time. Given a set of trips that overlap in time but with potentially different origins and destinations, a measure of ride aggregation efficiency can help ridesharing system operators decide how best to allocate riders to vehicles.
In the course of fulfilling ride requests in a ridesharing system, riders can often be efficiently allocated to vehicles by identifying the sets of rides with the greatest space–time overlap. Rides with high overlap have high ride aggregation efficiency, meaning that such rides tend to travel in the same direction and occur at similar times. In general there are two approaches:
- Ex ante heuristics
- Ex post measurement
Ex post measurement (i.e., of actual or simulated aggregated ride outcomes) simultaneously measures both pure ride aggregation efficiency and the effectiveness of the ride routing algorithm itself—e.g., the rides in a zone may be intrinsically highly aggregable, but an ineffective routing algorithm could lead to very low aggregation. If we assume that the algorithm is efficient, then we could consider the ratio of passenger-minutes to driver-minutes:
\begin{equation} f(t_0, T) = \frac{\sum_{i=1}^n \int_{t_0}^T p_i(t)\,dt}{\sum_{i=1}^n \int_{t_0}^T \mathbb{1}_i(t)\,dt}, \end{equation}where \( p_i(t) \) is the number of passengers in vehicle \( i \) at time \( t \), \( \mathbb{1}_i(t) \) is an indicator function for whether vehicle \( i \) is driving at least one passenger at time \( t \), and \( n = n(t_0, T) \) is the number of vehicles operating during the interval \( (t_0, T) \). This ratio equals \( 1 \) for completely independent trips and grows large when many rides aggregate into few vehicles, so higher values indicate greater efficiency. We avoid assigning high efficiency to perverse outcomes (e.g., a rider suffering a long detour) by assuming that the ride‐routing algorithm itself is efficient.
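As a minimal sketch of this ratio, suppose we log per-minute passenger counts for each vehicle over a window (the array name and values here are hypothetical):

```python
import numpy as np

# Hypothetical per-minute occupancy counts for three vehicles over a
# 10-minute window; 0 means the vehicle carries no passenger that minute.
occupancy = np.array([
    [1, 1, 2, 2, 2, 1, 1, 0, 0, 0],  # vehicle 1
    [0, 0, 1, 1, 1, 1, 0, 0, 0, 0],  # vehicle 2
    [2, 2, 2, 3, 3, 3, 2, 2, 1, 1],  # vehicle 3
])

passenger_minutes = occupancy.sum()     # discretized sum_i int p_i(t) dt
driver_minutes = (occupancy > 0).sum()  # discretized sum_i int 1_i(t) dt
ratio = passenger_minutes / driver_minutes
print(ratio)  # exceeds 1 whenever rides share vehicles
```

Here vehicle 3 frequently carries multiple parties at once, so the ratio exceeds 1.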
In the absence of a particular ride‐routing algorithm, I propose measuring ride aggregation efficiency with an ex ante heuristic that uses publicly available data on unaggregated trips.
1 Ex ante heuristic for ride aggregation efficiency
I propose a ride aggregation efficiency metric that yields a nonnegative number whose magnitude summarizes how many times a trip can be grouped with other trips, accounting for trip duration. For example,
- a trip whose aggregation efficiency is 0 is unaggregable, whereas
- a trip whose aggregation efficiency is 4 can group its trajectory with nearby trips four times over.
For zones, aggregation efficiency in a given time window is the average of the aggregation efficiencies of the trips within the zone and time window.
I measure ride aggregation efficiency by considering fields of aggregation compatibility around the trajectories of past taxi trips and by quantifying the amount of overlap in those fields. The core idea is that if there is a large amount of overlap between the aggregation compatibility fields of two taxi trips, then the trips can be easily aggregated.
I make the following assumptions:
- Vehicles travel at constant velocity in straight-line trajectories from their origins to their destinations.
- Aggregation compatibility is a decreasing function of straight-line distance between two vehicles.
Based on these assumptions, I posit the following model and efficiency metrics:
- A vehicle's aggregation compatibility field at each point in time can be represented by a bivariate Gaussian distribution.
- Over the course of their trajectories, the amount of overlap between the fields of two vehicles measures their mutual aggregation compatibility.
- I calculate the aggregation compatibility of two vehicle trajectories (i.e., trips) by integrating the overlap of their aggregation compatibility fields across space and time.
- I calculate the trip aggregation efficiency by (1) summing the pairwise aggregation compatibilities of that trip with all other trips in a given time window and zone and (2) dividing by the duration of the trip.
- I calculate the zone aggregation efficiency by taking the mean of the zone's trip aggregation efficiencies in a given time window.
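Steps (1) and (2) of the trip and zone calculations above can be sketched with a hypothetical symmetric matrix of pairwise compatibilities (all values illustrative; computing the compatibilities themselves requires the overlap integral described next):

```python
import numpy as np

# Hypothetical pairwise aggregation compatibilities for three trips;
# compat[i, j] is the compatibility of trips i and j (zero on the diagonal).
compat = np.array([
    [0.0, 3.2, 0.4],
    [3.2, 0.0, 1.1],
    [0.4, 1.1, 0.0],
])
durations = np.array([4.0, 5.0, 3.0])  # trip durations in minutes

# Step (1): sum each trip's compatibilities; step (2): divide by duration.
trip_aggeff = compat.sum(axis=1) / durations
# Zone efficiency: mean of the trip efficiencies in the window.
zone_aggeff = trip_aggeff.mean()
print(trip_aggeff, zone_aggeff)
```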
The core calculation is the integration of aggregation compatibility overlap across space and time:
\begin{equation} \int_{t_0}^T \int_{x_0}^X \int_{y_0}^Y \mathbb{1}(x, y; \mu_1(t), n\Sigma)\, \mathbb{1}(x, y; \mu_2(t), n\Sigma)\, \frac{c}{\sqrt{(2\pi)^2 |\Sigma|}}\, f(x, y, t) \,dy\,dx\,dt, \end{equation}where \begin{equation} f(x, y, t) = \min{\left(\exp{\left\{-\frac{1}{2}[(x, y) - \mu_1(t)]^T \Sigma^{-1}[(x, y) - \mu_1(t)]\right\}}, \exp{\left\{-\frac{1}{2}[(x, y) - \mu_2(t)]^T \Sigma^{-1}[(x, y) - \mu_2(t)]\right\}}\right)}, \end{equation}\( t \) is time, \( x \) is longitude, \( y \) is latitude, \( \mathbb{1}(x, y; \mu_i(t), n\Sigma) \) indicates whether \( (x, y) \) lies within the truncation region around \( \mu_i(t) \), \( \mu_i(t) \) is the position of vehicle \( i \) at time \( t \), \( \Sigma \) is the covariance matrix for aggregation compatibility, \( n \) is the number of standard deviations from the mean at which the distribution is truncated, and \( c \) is the normalizing constant for the truncated bivariate normal distribution.
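A Riemann-sum approximation of the spatial part of this integral for a single time slice, assuming an isotropic \( \Sigma \) (all positions and parameter values here are hypothetical), illustrates the computation; by construction the result is 1 when the two vehicles coincide and decays as they separate:

```python
import numpy as np

sigma = 0.0027                        # compatibility std dev, degrees
Sigma = np.diag([sigma**2, sigma**2])  # isotropic covariance (assumption)
n = 3                                  # truncate at n standard deviations
# Normalizer c: a bivariate normal puts mass 1 - exp(-n^2/2) within
# n std devs, so c rescales the truncated density to integrate to 1.
c = 1.0 / (1.0 - np.exp(-n**2 / 2.0))

mu1 = np.array([-73.9276, 40.7759])  # (lon, lat) of vehicle 1 at time t
mu2 = np.array([-73.9269, 40.7757])  # (lon, lat) of vehicle 2 at time t

xs = np.linspace(min(mu1[0], mu2[0]) - n * sigma,
                 max(mu1[0], mu2[0]) + n * sigma, 400)
ys = np.linspace(min(mu1[1], mu2[1]) - n * sigma,
                 max(mu1[1], mu2[1]) + n * sigma, 400)
X, Y = np.meshgrid(xs, ys)

def mahalanobis_sq(mu):
    # [(x, y) - mu]^T Sigma^{-1} [(x, y) - mu] in the isotropic case
    return ((X - mu[0])**2 + (Y - mu[1])**2) / sigma**2

q1, q2 = mahalanobis_sq(mu1), mahalanobis_sq(mu2)
inside = (q1 <= n**2) & (q2 <= n**2)          # both truncation indicators
height = np.minimum(np.exp(-q1 / 2), np.exp(-q2 / 2))
density = c / np.sqrt((2 * np.pi)**2 * np.linalg.det(Sigma))
overlap = (inside * height).sum() * density * (xs[1] - xs[0]) * (ys[1] - ys[0])
print(overlap)  # near 1 for nearly coincident positions, near 0 when far apart
```

The full metric repeats this spatial sum at each time step as the two vehicles move along their trajectories, accumulating the result over \( (t_0, T) \).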
In order to operationalize this model, I implement the following features:
- Truncated bivariate normal distributions to represent aggregation compatibility (see classes TruncBiNorm and TruncBiNormOverlap in taxi.py)
- Use of a trip's pick-up time, origin, and destination to instantiate a trip object with the corresponding trajectory and aggregation compatibility field (see class Trip in taxi.py)
This model has three parameters that impact the efficiency calculation for a given set of taxi trip data:
- Vehicle velocity. Assume 15 km/h (about 9.3 mph), or 250 m/min.
- 0.00225 lat/min
- 0.00296 lon/min
- Standard deviation of aggregation compatibility. Assume 300 m (just over 1 Manhattan avenue).
- 0.00270 lat
- 0.00355 lon
- Limits of bivariate Gaussian truncation. Assume 3 standard deviations.
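The degree conversions above can be sanity-checked assuming roughly 111 km per degree of latitude and scaling longitude by the cosine of Astoria's latitude; small differences from the figures above likely reflect a slightly different reference latitude or meters-per-degree constant:

```python
import math

M_PER_DEG_LAT = 111_000.0  # approximate meters per degree of latitude
lat = 40.77                # Astoria's latitude (assumption)
m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(lat))

speed_m_per_min = 250.0    # 15 km/h
sigma_m = 300.0            # aggregation-compatibility std dev

print(round(speed_m_per_min / M_PER_DEG_LAT, 5))  # ~0.00225 lat/min
print(round(speed_m_per_min / m_per_deg_lon, 5))  # ~0.00297 lon/min
print(round(sigma_m / M_PER_DEG_LAT, 5))          # ~0.00270 lat
print(round(sigma_m / m_per_deg_lon, 5))          # ~0.00357 lon
```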
1.1 Example: (Roughly) overlapping rides
import numpy as np
import pandas as pd
from bokeh.io import output_notebook  # inline plotting in the notebook

import taxi

COLNAMES = [
    'pick_time',
    'drop_time',
    'pick_lat',
    'pick_lon',
    'drop_lat',
    'drop_lon',
    'n_passengers',
]
Consider two rides in Astoria, NY.
trips = pd.DataFrame(
    [
        (pd.Timestamp('2016-06-15 12:00:00'), np.nan,
         40.7759, -73.9276, 40.7665, -73.9217, 1),
        (pd.Timestamp('2016-06-15 12:00:03'), np.nan,
         40.7757, -73.9269, 40.7673, -73.9209, 1),
    ],
    columns=COLNAMES,
)
output_notebook()
taxi.plot_zones_and_trips(['Astoria'], trips)
These two rides have high mutual aggregation efficiencies, as they are close in both space and time.
zae = taxi.zone_aggeffs(trips)
print('Trip aggregation efficiencies:', zae)
print('Mean efficiency for zone: {}'.format(np.mean(zae)))

Trip aggregation efficiencies: [0.81878319 0.8921752 ]
Mean efficiency for zone: 0.8554791973249489
However, if the second ride starts three minutes later, their aggregation efficiencies fall.
trips = pd.DataFrame(
    [
        (pd.Timestamp('2016-06-15 12:00:00'), np.nan,
         40.7759, -73.9276, 40.7665, -73.9217, 1),
        (pd.Timestamp('2016-06-15 12:03:03'), np.nan,
         40.7757, -73.9269, 40.7673, -73.9209, 1),
    ],
    columns=COLNAMES,
)
zae = taxi.zone_aggeffs(trips)
print('Trip aggregation efficiencies:', zae)
print('Mean efficiency for zone: {}'.format(np.mean(zae)))

Trip aggregation efficiencies: [0.07696682 0.08386577]
Mean efficiency for zone: 0.08041629551941824
1.2 Example: Multiple partially overlapping rides
For another example, the aggregation efficiency of the second of the following three trips reflects its space–time overlap with the first and third rides.
trips = pd.DataFrame(
    [
        (pd.Timestamp('2016-06-15 12:00:00'), np.nan,
         40.7759, -73.9276, 40.7665, -73.9217, 1),
        (pd.Timestamp('2016-06-15 12:01:00'), np.nan,
         40.7710, -73.9240, 40.7630, -73.9190, 1),
        (pd.Timestamp('2016-06-15 12:02:00'), np.nan,
         40.7651, -73.9200, 40.7608, -73.9170, 1),
    ],
    columns=COLNAMES,
)
output_notebook()
taxi.plot_zones_and_trips(['Astoria'], trips)

zae = taxi.zone_aggeffs(trips)
print('Trip aggregation efficiencies:', zae)
print('Mean efficiency for zone: {}'.format(np.mean(zae)))

Trip aggregation efficiencies: [0.48252141 0.71270788 0.55640971]
Mean efficiency for zone: 0.583879668259483
1.3 Example: Unaggregable rides
Finally, unaggregable rides do not impact the aggregation efficiencies of other trips, but they do reduce the zone-level aggregation efficiency.
trips = pd.DataFrame(
    [
        (pd.Timestamp('2016-06-15 12:00:00'), np.nan,
         40.7759, -73.9276, 40.7665, -73.9217, 1),
        (pd.Timestamp('2016-06-15 12:01:00'), np.nan,
         40.7710, -73.9240, 40.7630, -73.9190, 1),
        (pd.Timestamp('2016-06-15 12:02:00'), np.nan,
         40.7651, -73.9200, 40.7608, -73.9170, 1),
        (pd.Timestamp('2016-06-15 12:02:00'), np.nan,
         40.7670, -73.9100, 40.7600, -73.9100, 1),
    ],
    columns=COLNAMES,
)
output_notebook()
taxi.plot_zones_and_trips(['Astoria'], trips)

zae = taxi.zone_aggeffs(trips)
print('Trip aggregation efficiencies:', zae)
print('Mean efficiency for zone: {}'.format(np.mean(zae)))

Trip aggregation efficiencies: [0.48252141 0.71270788 0.55640971 0.        ]
Mean efficiency for zone: 0.4379097511946123
2 Simulation and analysis
I calculate aggregation efficiencies for individual trips across various zones and time windows. I take Midtown to represent Manhattan and examine the following zone combinations:
- Astoria
- Astoria–Midtown
- Astoria–Midtown–LGA
- UES
- UES–Midtown
I perform the calculation over six different times of the week:
- Weekday morning (Wednesday starting at 08:00)
- Weekday afternoon (Wednesday starting at 12:00)
- Weekday evening (Wednesday starting at 18:00)
- Weekend morning (Saturday starting at 08:00)
- Weekend afternoon (Saturday starting at 12:00)
- Weekend evening (Saturday starting at 18:00)
Because the aggregation efficiency calculation is computationally intensive, I use time windows ranging from 1 to 120 minutes to limit the number of trips to a manageable level. For each time window with more than 30 trips, I randomly sample 30 trips for the aggregation efficiency calculation. The table below shows the zone‐level aggregation efficiencies.
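The windowing-and-capping step might be sketched as follows (the function name is an assumption; the column layout mirrors COLNAMES from the examples):

```python
import pandas as pd

def sample_window(trips, start, minutes, cap=30, seed=0):
    """Trips picked up in [start, start + minutes), capped at `cap` rows."""
    end = start + pd.Timedelta(minutes=minutes)
    window = trips[(trips['pick_time'] >= start) & (trips['pick_time'] < end)]
    if len(window) > cap:
        # Randomly downsample busy windows to keep the pairwise
        # compatibility calculation tractable.
        window = window.sample(n=cap, random_state=seed)
    return window
```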
import pickle

# Load aggregation efficiency calculation layout
ae_layout = pd.read_csv('aggeff_layout.csv')

# Load trip data and aggregation efficiency results
trips, zae = [], []
for i in ae_layout.index:
    with open('results/trips_{}.pkd'.format(i), 'rb') as f:
        trips.append(pickle.load(f))
    with open('results/zae_{}.pkd'.format(i), 'rb') as f:
        zae.append(pickle.load(f))
pd.concat(
    [
        ae_layout[['zone_label', 'time_label', 'start_time', 'duration']],
        pd.DataFrame(
            [(len(td), np.mean(ae)) for td, ae in zip(trips, zae)],
            columns=['n_trips', 'efficiency'],
        ),
    ],
    axis=1,
).sort_values(['zone_label', 'start_time'])
№ | zone_label | time_label | start_time | duration (min) | n_trips | efficiency |
---|---|---|---|---|---|---|
0 | Astoria | Weekday morning | 2016-06-15 08:00 | 60 | 34 | 0.081083 |
5 | Astoria | Weekday afternoon | 2016-06-15 12:00 | 60 | 34 | 0.127128 |
10 | Astoria | Weekday evening | 2016-06-15 18:00 | 60 | 58 | 0.180803 |
15 | Astoria | Weekend morning | 2016-06-18 08:00 | 60 | 19 | 0.063788 |
20 | Astoria | Weekend afternoon | 2016-06-18 12:00 | 60 | 40 | 0.116931 |
25 | Astoria | Weekend evening | 2016-06-18 18:00 | 60 | 59 | 0.252794 |
1 | Astoria-Midtown | Weekday morning | 2016-06-15 08:00 | 90 | 11 | 0.368971 |
6 | Astoria-Midtown | Weekday afternoon | 2016-06-15 12:00 | 120 | 8 | 0.106940 |
11 | Astoria-Midtown | Weekday evening | 2016-06-15 18:00 | 60 | 11 | 0.473867 |
16 | Astoria-Midtown | Weekend morning | 2016-06-18 08:00 | 60 | 15 | 0.275607 |
21 | Astoria-Midtown | Weekend afternoon | 2016-06-18 12:00 | 60 | 13 | 0.124903 |
26 | Astoria-Midtown | Weekend evening | 2016-06-18 18:00 | 60 | 21 | 0.212501 |
2 | Astoria-Midtown-LGA | Weekday morning | 2016-06-15 08:00 | 3 | 17 | 3.110343 |
7 | Astoria-Midtown-LGA | Weekday afternoon | 2016-06-15 12:00 | 5 | 21 | 2.762013 |
12 | Astoria-Midtown-LGA | Weekday evening | 2016-06-15 18:00 | 5 | 14 | 1.312172 |
17 | Astoria-Midtown-LGA | Weekend morning | 2016-06-18 08:00 | 10 | 23 | 0.999410 |
22 | Astoria-Midtown-LGA | Weekend afternoon | 2016-06-18 12:00 | 10 | 27 | 1.846878 |
27 | Astoria-Midtown-LGA | Weekend evening | 2016-06-18 18:00 | 15 | 14 | 0.339333 |
3 | UES | Weekday morning | 2016-06-15 08:00 | 5 | 115 | 2.966929 |
8 | UES | Weekday afternoon | 2016-06-15 12:00 | 5 | 122 | 2.983721 |
13 | UES | Weekday evening | 2016-06-15 18:00 | 5 | 125 | 3.674477 |
18 | UES | Weekend morning | 2016-06-18 08:00 | 5 | 34 | 1.031728 |
23 | UES | Weekend afternoon | 2016-06-18 12:00 | 5 | 68 | 1.804801 |
28 | UES | Weekend evening | 2016-06-18 18:00 | 5 | 74 | 2.187959 |
4 | UES-Midtown | Weekday morning | 2016-06-15 08:00 | 1 | 35 | 3.023929 |
9 | UES-Midtown | Weekday afternoon | 2016-06-15 12:00 | 2 | 42 | 2.836140 |
14 | UES-Midtown | Weekday evening | 2016-06-15 18:00 | 1 | 25 | 1.526476 |
19 | UES-Midtown | Weekend morning | 2016-06-18 08:00 | 2 | 12 | 0.928378 |
24 | UES-Midtown | Weekend afternoon | 2016-06-18 12:00 | 2 | 43 | 2.458989 |
29 | UES-Midtown | Weekend evening | 2016-06-18 18:00 | 2 | 30 | 1.305169 |
2.1 Assess efficiency of aggregating rides within Astoria
After calculating the aggregation efficiencies for the various zone–time combinations, I pool the trip‐by‐trip aggregation efficiencies by zone to compare the overall efficiency quantiles and sample statistics. The numbers below reflect individual trip aggregation efficiencies across weekday‐weekend and morning‐afternoon‐evening combinations.
def aggeff_stats_by_zone(zones):
    d = {
        z: np.concatenate(
            [zae[i] for i in ae_layout.index[ae_layout['zone_label'] == z]]
        )
        for z in zones
    }
    return pd.DataFrame([
        pd.concat([pd.Series([z], index=['zone']), pd.Series(d[z]).describe()])
        for z in zones
    ])

aggeff_stats_by_zone(
    ['Astoria', 'Astoria-Midtown', 'Astoria-Midtown-LGA', 'UES', 'UES-Midtown']
)
№ | zone | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|---|
0 | Astoria | 169.0 | 0.141859 | 0.248959 | 0.0 | 0.000000 | 0.003992 | 0.211858 | 1.200064 |
1 | Astoria-Midtown | 79.0 | 0.257558 | 0.276845 | 0.0 | 0.032165 | 0.166450 | 0.420903 | 1.255381 |
2 | Astoria-Midtown-LGA | 116.0 | 1.783201 | 1.346923 | 0.0 | 0.696486 | 1.429303 | 2.829936 | 5.210652 |
3 | UES | 180.0 | 2.441603 | 1.805567 | 0.0 | 1.003623 | 2.101666 | 3.500081 | 10.291479 |
4 | UES-Midtown | 157.0 | 2.153052 | 1.526896 | 0.0 | 0.998387 | 2.040324 | 3.093651 | 6.243472 |
taxi.plot_zones_and_trips(
    ['Astoria'],
    pd.concat(
        [trips[i] for i in ae_layout.index[ae_layout['zone_label'] == 'Astoria']]
    ),
)
2.2 Comparing zones
The preceding table shows that ride aggregation is a much more efficient process in the Upper East Side (UES) than in Astoria: comparing means shifted by 1 (so that a zone of wholly unaggregable trips scores 1 rather than 0), UES is about
\begin{equation*} \frac{1 + 2.44}{1 + 0.14} = 3.02 \end{equation*}times as efficient.
A histogram shows that ride aggregation efficiency has a much greater range in UES.
import matplotlib.pyplot as plt

ae_astoria = np.concatenate(
    [zae[i] for i in ae_layout.index[ae_layout['zone_label'] == 'Astoria']]
)
ae_ues = np.concatenate(
    [zae[i] for i in ae_layout.index[ae_layout['zone_label'] == 'UES']]
)
plt.hist(ae_astoria, alpha=0.5, label='Astoria')
plt.hist(ae_ues, alpha=0.5, label='UES')
plt.legend(loc='upper left')
plt.show()
taxi.plot_zones_and_trips(
    ['Upper East Side', taxi.get_manhattan_neighborhoods()],
    pd.concat(
        [trips[i] for i in ae_layout.index[ae_layout['zone_label'] == 'UES']]
    ),
)
2.3 Scope of service
Ride aggregation efficiency can function as a decision variable for determining scope of service. For example, ride aggregation efficiency is generally higher between Astoria and Manhattan than within Astoria, suggesting that interzone service could be feasible in this case.
taxi.plot_zones_and_trips(
    ['Astoria', 'Midtown', taxi.get_manhattan_neighborhoods()],
    pd.concat(
        [trips[i] for i in ae_layout.index[ae_layout['zone_label'] == 'Astoria-Midtown']]
    ),
)
2.4 Service scheduling
The following table of statistics suggests that ride aggregation efficiencies in the area of study are comparable between weekdays and weekends and, moreover, that efficiencies tend to increase over the course of the day. Since optimal service scheduling depends more heavily on ride demand than on ride aggregability, a scheduling solution would most likely involve modulating driver supply according to diurnal demand variation.
def filter_stats_by_time(times, zone):
    d = {
        t: np.concatenate([
            zae[i]
            for i in ae_layout.index[
                (ae_layout['time_label'] == t)
                & (ae_layout['zone_label'] == zone)
            ]
        ])
        for t in times
    }
    return pd.DataFrame([
        pd.concat([pd.Series([t], index=['time']), pd.Series(d[t]).describe()])
        for t in times
    ])

filter_stats_by_time(
    [
        'Weekday morning',
        'Weekday afternoon',
        'Weekday evening',
        'Weekend morning',
        'Weekend afternoon',
        'Weekend evening',
    ],
    'Astoria',
)
№ | time | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|---|
0 | Weekday morning | 30.0 | 0.081083 | 0.175967 | 0.0 | 0.000000 | 0.000000 | 0.022029 | 0.692394 |
1 | Weekday afternoon | 30.0 | 0.127128 | 0.261474 | 0.0 | 0.000000 | 0.000019 | 0.153857 | 0.989229 |
2 | Weekday evening | 30.0 | 0.180803 | 0.205144 | 0.0 | 0.000000 | 0.139620 | 0.292863 | 0.781462 |
3 | Weekend morning | 19.0 | 0.063788 | 0.159064 | 0.0 | 0.000000 | 0.000000 | 0.023260 | 0.551952 |
4 | Weekend afternoon | 30.0 | 0.116931 | 0.231197 | 0.0 | 0.000000 | 0.005462 | 0.064922 | 0.882757 |
5 | Weekend evening | 30.0 | 0.252794 | 0.354266 | 0.0 | 0.005164 | 0.053167 | 0.440431 | 1.200064 |