1 Preparing Datasets

Datasets:

Dataset: Aggregated travel times between every zone pair in the city
Time Period: Q1 2018
City: Delhi, Mumbai, Bangalore
Source: Uber Movement
suppressPackageStartupMessages(library(tidyverse))
new_delhi_wards <- read.csv("Data/uber_delhi_ward_ids.csv", stringsAsFactors = FALSE)
inter_ward_times <- data.table::fread("Data/delhi_hod_times.csv", stringsAsFactors = FALSE)

1.1 Preparing the distance components

Inter-ward distance was calculated in QGIS using the wards GeoJSON file downloaded from the Uber Movement website for Delhi. The process was:

  • Find the center of each polygon
  • Calculate a distance matrix from each polygon center to every other polygon center
inter_ward_distance <- data.table::fread("Data/spatial/delhi_distance_matrix.csv", stringsAsFactors = FALSE, data.table = FALSE)
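
The two QGIS steps could also be approximated directly in R. The sketch below is not the workflow used here: it assumes the ward centroids are already available as longitude/latitude pairs (the `centroids` values are made up for illustration), uses a hypothetical helper `haversine_m`, and measures great-circle distance, which may differ slightly from what QGIS reports depending on the layer's projection. The output column names mirror the ones used in the join in section 1.2.

```r
# Great-circle (haversine) distance in metres between two lon/lat points.
# haversine_m is a hypothetical helper, not part of the original analysis.
haversine_m <- function(lon1, lat1, lon2, lat2, radius_m = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_m * asin(pmin(1, sqrt(a)))
}

# Made-up centroids for three wards (assumption: real ones would come from the GeoJSON)
centroids <- data.frame(
  ward_id = c(1L, 2L, 3L),
  lon = c(77.10, 77.20, 77.30),
  lat = c(28.60, 28.65, 28.70)
)

# Every ordered pair of wards, dropping same-ward pairs
idx <- expand.grid(src = seq_len(nrow(centroids)), dst = seq_len(nrow(centroids)))
idx <- idx[idx$src != idx$dst, ]

dist_long <- data.frame(
  InputID = centroids$ward_id[idx$src],
  TargetID = centroids$ward_id[idx$dst],
  distance_metre = haversine_m(
    centroids$lon[idx$src], centroids$lat[idx$src],
    centroids$lon[idx$dst], centroids$lat[idx$dst]
  )
)
print(dist_long)
```

Same-ward pairs are dropped in the sketch so that, as in the data used here, no distance exists for rides within a single zone.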

1.2 Merging Distance and Time components

inter_ward_matrix <-
  dplyr::left_join(
    inter_ward_times,
    inter_ward_distance,
    by = c('sourceid' = 'InputID', 'dstid' = 'TargetID')
  )
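
Because this is a left join, any time record without a matching distance row keeps its travel time but gets a missing distance. A small sketch with made-up rows shows how one might count such rows before moving on. Base R `merge` is used in place of `dplyr::left_join` only to keep the example dependency-free, and the `times` and `distances` data frames are invented stand-ins.

```r
# Toy stand-ins for the two inputs; the values are invented for illustration
times <- data.frame(
  sourceid = c(1L, 1L, 2L),
  dstid = c(1L, 2L, 1L),
  hod = c(8L, 8L, 8L),
  mean_travel_time = c(300, 900, 950)
)
distances <- data.frame(
  InputID = c(1L, 2L),
  TargetID = c(2L, 1L),
  distance_metre = c(6000, 6000)
)

# all.x = TRUE keeps every time record, like a left join
merged <- merge(
  times, distances,
  by.x = c("sourceid", "dstid"),
  by.y = c("InputID", "TargetID"),
  all.x = TRUE
)

# Rows with no distance: here only the same-zone pair (1, 1)
unmatched <- merged[is.na(merged$distance_metre), ]
print(unmatched)
```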

1.3 Converting distance to kilometer

inter_ward_matrix$distance_km <- inter_ward_matrix$distance_metre/1000

2 Hourly Analysis of time taken and distance covered

2.1 Time per km ranges at every Hour of the day

The objective of this analysis is to focus on vehicular congestion on city routes. Since the current dataset from Uber gives us only travel times, it is important to normalise them by distance, focusing on the time taken to cover each km of a route and treating it as a proxy for how busy that route is. The assumption is that every km of a route takes an equal amount of time, which is certainly not the case, but it should be good enough for an EDA.

inter_ward_matrix$time_per_km <- inter_ward_matrix$mean_travel_time/inter_ward_matrix$distance_km
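
As a worked example, take the route from ward 34 to ward 32 at hour 18, which appears in the table in section 5: a mean travel time of 8968.17 s over roughly 32.72 km. The snippet only redoes that arithmetic on a single row; the 3600 / TPK conversion to km/h is an added convenience and not something the original code computes.

```r
mean_travel_time <- 8968.17       # seconds, ward 34 to ward 32 at hod 18
distance_km <- 32.7232962606117   # centre-to-centre distance in km

time_per_km <- mean_travel_time / distance_km
speed_kmh <- 3600 / time_per_km   # implied average speed (added for illustration)

round(time_per_km, 1)  # about 274.1 s per km
round(speed_kmh, 1)    # about 13.1 km/h
```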

Let’s now look at the density plots for time per km at every hour of the day. This will help us see the range of times and check whether the flow rate of city traffic is constant at every hour or varies across routes.

Rides within the same Movement zone are excluded, as the distance is not calculated for such cases.
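
The plotting code is not shown in the text. A minimal sketch of how such a faceted density plot might be drawn with ggplot2 follows. The `toy` data frame is simulated, and the styling simply copies the bar chart used in section 2.3, so none of it should be read as the original code.

```r
library(ggplot2)

# Simulated time-per-km values for a quiet hour (3) and a busy hour (18)
set.seed(1)
toy <- data.frame(
  hod = rep(c(3L, 18L), each = 200),
  time_per_km = c(rnorm(200, mean = 120, sd = 15), rnorm(200, mean = 250, sd = 60))
)
toy$time_per_km[1] <- NA  # stands in for a same-zone ride with no distance

# Drop rows without a time per km before plotting
plot_df <- toy[!is.na(toy$time_per_km), ]

# One density panel per hour of the day
p <- ggplot(plot_df, aes(x = time_per_km)) +
  geom_density(alpha = .6, fill = "#FF6666") +
  xlab('Time per km (in seconds)') + ylab('Density') +
  facet_wrap(~ hod, ncol = 2) + theme_minimal()
print(p)
```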

Results:

  • The peaks are high from 0-6 hours, as congestion seems to be lower at these times and most rides are able to maintain a good flow rate: 75% of these rides cover a km in less than 150 seconds
  • The peaks turn flatter after 7 AM and their width increases: as congestion increases, so does the variability in time per km.

2.2 Time taken by the majority

Let’s look at the 90th percentile of time taken per km at every hour

Observations:

There are two peaks, both during office rush hours:

  • The first peak is around 11-13, where TPK (time per km) is close to 300 s
  • The second peak is around 17-20, where TPK goes up to almost 350 s
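
The per-hour 90th percentile can be computed with a grouped quantile. The sketch below uses a made-up `toy` data frame with only two hours; on the real data the same call would presumably be applied to `inter_ward_matrix$time_per_km` and `inter_ward_matrix$hod`.

```r
# Invented values: 11 rides at hour 8 and 11 rides at hour 18
toy <- data.frame(
  hod = rep(c(8L, 18L), each = 11),
  time_per_km = c(seq(10, 110, by = 10), seq(100, 300, by = 20))
)

# 90th percentile of time per km within each hour; na.rm skips same-zone rows
p90_by_hour <- tapply(toy$time_per_km, toy$hod, quantile, probs = 0.9, na.rm = TRUE)
print(p90_by_hour)

# Hour with the highest 90th percentile
peak_hour <- names(which.max(p90_by_hour))
print(peak_hour)
```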

2.3 How far can you go in an hour

A prominent indicator of congestion in any city is the variation in average speed of vehicles over time. The distance travelled in an hour at various times of the day should be a good proxy for the flow rates of different routes, or of the cities themselves. Having analysed Delhi’s traffic throughout the day and on different routes, let’s compare it with other cities, i.e. Mumbai and Bangalore. For this analysis, we download the data from Uber Movement for these cities for the same time period, i.e. Q1 2018. Our objective here is to see how far we can travel in an hour at every hour of the day (0 to 23). The results will help us compare congestion in Delhi with these cities, so let’s prepare the data and look at the results.
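
The code that builds `hourly_spider` is not shown. One plausible construction, offered purely as an assumption, is to average time per km by city and hour and convert it to distance covered in an hour as 3600 / TPK. The `toy` rows below are invented, and the object is named `hourly_spider_sketch` to keep it distinct from the real data.

```r
# Invented time-per-km values (seconds) for two city/hour combinations
toy <- data.frame(
  city = c("Bangalore", "Bangalore", "Delhi", "Delhi"),
  hod = c(18L, 18L, 3L, 3L),
  time_per_km = c(280, 296, 100, 120)
)

# Mean time per km for each city and hour of the day
hourly_spider_sketch <- aggregate(time_per_km ~ city + hod, data = toy, FUN = mean)

# Distance covered in one hour (3600 s) at that mean pace, in km
hourly_spider_sketch$distance_per_hour <- 3600 / hourly_spider_sketch$time_per_km
print(hourly_spider_sketch)
```

Note that inverting a mean pace is not the same as averaging per-ride speeds, so a different aggregation order would give somewhat different numbers.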

ggplot(hourly_spider, aes(x = city,y = distance_per_hour)) +
  geom_bar(alpha=.6, fill="#FF6666",stat = 'identity') +
  ylab('Distance per hour (In Km)') + xlab('City') +
  facet_wrap(~ hod, nrow=8) + theme_minimal()

Observations:

  • Bangalore looks the most congested of the three cities at all hours
  • Delhi is the least congested at night and in the AM peak hours, i.e. from 10 PM till 8 AM
  • Mumbai is the least congested at peak hours, i.e. from 9 AM till 8 PM

Out of the three cities:

  • The maximum you can travel in an hour is 32.8 km, at 3 AM in Delhi
  • The minimum you can travel in an hour is 12.5 km, at 6 PM in Bangalore

Yes, Bangalore traffic looks far worse than that of Delhi or Mumbai.

Until now we were only working with hod (hour of the day). Now let’s take the route into consideration as well.

3 Busiest routes throughout the day (Top 10):

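# Mean time per km for every route (source-destination pair), skipping rows without a distance
x <- inter_ward_matrix %>%
  filter(!is.na(time_per_km)) %>%
  group_by(sourceid, dstid) %>%
  summarise(mean_tpk = mean(time_per_km))
# Attach the route distance and list the 10 routes with the highest mean TPK
x <- dplyr::left_join(x, inter_ward_distance, by = c('sourceid' = 'InputID', 'dstid' = 'TargetID'))
x <- x[order(-x$mean_tpk), ]
x[1:10, c(1, 2, 3, 5)]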

Shorter routes tend to have a higher TPK: on longer routes the metric has room to come down, because they include more lightly congested segments than shorter routes do.

Let’s look at the same table again, but now only for routes that are at least 1 km long.

x <- x[x$distance_metre >= 1000, ]
x <- x[order(-x$mean_tpk), ]
x[1:10,c(1,2,3,5)]

Observations:

These are actually pretty congested areas:

  • Though the distances are short, the average time is ~15 mins
  • The time taken between these wards was verified on Google Maps and the results were pretty close
  • These areas lie in regions with a high population and high commercial activity, not-so-good roads and heavy traffic at pretty much any time of the day

Satellite view of one of the busiest routes (181 - 178): [image: busy_routes]

4 Congested Airport rides

The Uber Movement web UI is amazing. It’s good from a single user’s perspective, but if a city planner wants to compare multiple routes at once, it can be a bit difficult. You are always tied to a source and a destination for certain analyses, and though the average time to every ward from a source can be identified in the map view, the same cannot be done for a destination.

Let’s fix our destination at the Delhi Airport ward (there is no dedicated ward for this, so we take the closest one, Ward No 5, as a proxy) and look at the average time per km (TPK) from every other ward.

dst5 <- inter_ward_matrix[inter_ward_matrix$dstid==5,]
dst5 %>% group_by(sourceid) %>% summarise(mean_tpt = mean(time_per_km)) %>% top_n(5, mean_tpt)
# A tibble: 5 x 2
  sourceid mean_tpt
     <int>    <dbl>
1      102     320.
2      201     410.
3      202     351.
4      285     273.
5      288     272.

The results are the same as above, with short routes having a higher TPK, but the last two sources are interesting.

  • Both of them are over 10 km away.
  • Source 288 lies in between Source 255 and the airport
  • The route starts from the eastern parts of Delhi and ends in the southern part (people coming to the airport along these routes should definitely be wary of this fact)

5 Longest Journeys (in terms of time taken to complete) in a city

library(knitr)
inter_ward_matrix[is.na(inter_ward_matrix)] <- ''
inter_ward_matrix %>%
  top_n(10, mean_travel_time) %>%
  select(sourceid, dstid, hod, mean_travel_time, distance_km, time_per_km) %>%
  arrange(desc(mean_travel_time)) %>%
  kable()


| sourceid| dstid| hod| mean_travel_time|distance_km      |time_per_km      |
|--------:|-----:|---:|----------------:|:----------------|:----------------|
|        5|     5|  21|         10107.75|                 |                 |
|      193|   193|  21|          9314.33|                 |                 |
|        3|     3|  21|          9141.33|                 |                 |
|      223|   223|  21|          9111.50|                 |                 |
|       34|    32|  18|          8968.17|32.7232962606117 |274.060715906386 |
|      119|    32|  18|          8690.43|32.9882471123653 |263.440187361228 |
|       89|   173|  18|          8503.17|30.4132457825518 |279.58771848279  |
|       40|    32|  18|          8179.67|30.722681697032  |266.242057925243 |
|      108|   163|  18|          8137.57|26.9138651818746 |302.356051240099 |
|      191|   284|  18|          8094.67|29.0002459934917 |279.124184043702 |

These are the routes with the highest travel times in Delhi. Some observations:

  • As observed in the earlier analysis as well, 21 and 18 are the hours that contribute the highest times; congestion is at its peak during these hours.
  • We can eliminate distance as a factor for the routes that start and end in the same ward (the top 4 in this case). Wards 5, 3 and 223 are all close to the airport and therefore have the highest journey times at these hours, which is definitely something for the traffic authorities to look at
  • Route No 2 (193) is entirely contained within Shahdara, which is indeed considered to be one of the most congested areas in Delhi; per the table, ride times within the region can go up to about 2.5 hours (9314 s)
  • Ward 32 (METRO MALL, Pocket 1, Sector 14 Dwarka, Dwarka, New Delhi): there are 3 routes in the top 10 with this as the destination. What these routes have in common is a ride distance of more than 30 km, yet their TPK is still close to 4.5 minutes (about 13 km/h), which is on the slower side

6 Time deviations on the same route

We know that there are peak times in a day when the travel time increases; usually it increases by around 15-20 mins, and that’s what the data says as well.

route_deviations_df <- readRDS("Data/route_devialtions.rds")
route_deviations_df <- route_deviations_df[!is.na(route_deviations_df$route_deviation), ]
summary(route_deviations_df$route_deviation)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
   1.633  385.871  579.841  592.093  773.665 5305.606 

For more than 80% of routes, the time deviation is at most about 820 s (under 14 mins).

But the maximum deviation for a route is more than 5000 s (5305 s, almost an hour and a half). This is where we want to focus our analysis, so let’s find the small share of routes (around 0.5%, if any) where the deviation is more than 30 mins (1800 s).
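
How `route_deviation` was computed is not shown, since it is read from a saved file. The sketch below assumes, purely for illustration, that it is the gap between the slowest and fastest hourly mean travel time on a route, and then applies the 1800 s cut-off. The `toy` rows are invented.

```r
# Invented hourly mean travel times (seconds) for three routes
toy <- data.frame(
  sourceid = c(1L, 1L, 1L, 2L, 2L, 3L),
  dstid = c(2L, 2L, 2L, 3L, 3L, 3L),
  hod = c(3L, 9L, 18L, 3L, 18L, 21L),
  mean_travel_time = c(1200, 2400, 3300, 900, 1500, 4000)
)

# Assumed definition: slowest minus fastest hourly mean per route
dev <- aggregate(
  mean_travel_time ~ sourceid + dstid,
  data = toy,
  FUN = function(t) max(t) - min(t)
)
names(dev) <- c("route_start", "route_end", "route_deviation")

# Keep routes deviating by more than 30 mins, largest first
large <- dev[dev$route_deviation > 1800, ]
large <- large[order(-large$route_deviation), ]
print(large)
```

Under this definition a route observed at a single hour gets a deviation of 0, which is one reason the number of hourly observations per route matters when comparing routes.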



| route_start| route_end| route_deviation|
|-----------:|---------:|---------------:|
|         133|       133|        5305.606|
|         228|       228|        5080.668|
|         166|       166|        4975.585|
|         264|       264|        4972.658|
|         219|       219|        4953.778|
|         226|       226|        4899.359|
|         121|       121|        4803.214|
|           5|         5|        4732.191|
|         161|       161|        4726.952|
|          12|        12|        4603.527|

So, there are a total of 111 such routes. Let’s see if we can find some patterns here:

  • Looking at the dataset for the route with the most deviation (133), we found that there are only 2 rows in the base dataset, which means that data is available for only a couple of hours. This won’t be a fair comparison with other routes where data is available for all 24 hours. So let’s remove such routes from this dataset and look at these observations again
# Number of hourly observations available for each route
number_instances <-
  inter_ward_matrix %>% group_by(sourceid, dstid) %>% summarise(total_hod = length(hod))

route_deviations_df <-
  dplyr::left_join(
    route_deviations_df,
    number_instances,
    by = c('route_start' = 'sourceid', 'route_end' = 'dstid')
  )

# Keep routes with a deviation above 30 mins and more than 6 hourly observations
route_deviations_df %>%
  filter(route_deviation > 1800 & total_hod > 6) %>%
  arrange(desc(route_deviation)) %>%
  kable()


| route_start| route_end| route_deviation| total_hod|
|-----------:|---------:|---------------:|---------:|
|         119|       140|        1892.420|         9|
|          40|        32|        1862.699|         8|
|         192|       139|        1841.018|         7|
|          41|        32|        1803.767|        11|

We now have 4 routes with more than 6 observations each. Some observations:

  • These routes are affected the most by congestion, as the same ride can take almost double the time to complete in peak hours
  • All of them have a deviation of close to 30 mins
ggplot(
  data = inter_ward_matrix[inter_ward_matrix$sourceid == 119 & inter_ward_matrix$dstid == 140, ],
  aes(hod, mean_travel_time)
) +
  geom_point(shape = 16, size = 5) +
  xlab('Hour of the day') + ylab('Mean travel time (in seconds)') +
  theme_minimal()

  • These are longer routes (~25-30 km)
  • As the scatter plot shows, these routes suffer the most between 3 PM and 9 PM
  • These routes can be termed end-to-end routes, as they mostly start and end at points on the perimeter of Delhi. Most of them start in the eastern part and end in the western part, moving through Central Delhi.
[image: route_deviations]

