-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathAnalysis of Bicyle Thefts in Toronto.Rmd
681 lines (570 loc) · 29.3 KB
/
Analysis of Bicyle Thefts in Toronto.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
---
title: "How to Avoid Losing Your Bike to Thieves in Toronto"
author: "Zeeshan Gazi and Khushil Nagda"
date: "`r format(Sys.Date(), '%B %d, %Y')`" # This will output the current date
output: pdf_document
fontsize: 12pt
geometry: margin=1in
header-includes:
- \usepackage{booktabs}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(warning = FALSE)
## IMPORTING ALL THE RELEVANT LIBRARIES
library(ggplot2)
library(tidyverse)
library(dplyr)
library(ggtext)
library(ggimage)
library(knitr)
library(gt)
#library(sf)
#library(RColorBrewer)
#library(readr)
#library(lubridate)
```
```{r readfile, echo=FALSE}
## READING THE DATA
bicycle_thefts <- read_csv("Bicycle_Thefts_Open_Data.csv", show_col_types = FALSE)
```
# What happened to bikes in Toronto from 2014-2023?
Bike theft is a major issue in Toronto. It is not uncommon to see a bicycle
wheel or lock just hanging on a sidewalk railing as you walk through the city.
After all, all one needs to steal a bike is a bike cutter, hacksaw, or even an
angle grinder which can be easily and discreetly stored in a backpack. Numerous
victims report stolen bikes every year but there is not much being done to
combat these crimes. Though the punishment of the thefts can mean time in
prison, the amount of money that can be made from selling these stolen bikes
and their parts are deemed to be worth it by these thieves.
The goal of this article is to delve deeper into the factors which impact the
risk of bicycle theft in Toronto. These factors include: month, day of the
week, time of day, location of bike, neighborhood, bike make, type, and
lastly average price. The data that will be utilized is from the Toronto
Police Service Public Safety Data Portal. This data contains over 34000
recorded bike thefts in the city of Toronto from 2014 to 2023, along with
corresponding information. This article will break down this data into an
informative analysis which will hopefully provide some meaningful insights to
Toronto cyclists and those who are interested in purchasing a bike later on.
```{r toppremises, echo=FALSE}
bicycle_thefts$REPORT_MONTH <- as.character(bicycle_thefts$REPORT_MONTH)
thefts_by_month_premise <- bicycle_thefts %>%
group_by(OCC_MONTH, PREMISES_TYPE) %>%
summarise(count = n(), .groups = 'drop') %>%
mutate(MONTH = factor(OCC_MONTH, levels = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")))
## Adding a season column to the data frame
thefts_by_month_premise <- thefts_by_month_premise %>%
mutate(SEASON = case_when(
OCC_MONTH %in% c("December", "January", "February") ~ "Winter",
OCC_MONTH %in% c("March", "April", "May") ~ "Spring",
OCC_MONTH %in% c("June", "July", "August") ~ "Summer",
OCC_MONTH %in% c("September", "October", "November") ~ "Fall",
TRUE ~ NA_character_
))
```
```{r toppremise, echo=FALSE}
# EXTRACTING THE TOP PREMISES IN THE DATASET
top_premises <- bicycle_thefts %>%
count(PREMISES_TYPE) %>%
top_n(3, wt = n) %>%
pull(PREMISES_TYPE)
# thefts_by_month_premise <- thefts_by_month_premise %>%
# mutate(Highlight = ifelse(PREMISES_TYPE %in% top_premises, PREMISES_TYPE, "Other"))
thefts_by_month_premise_top_3 <- thefts_by_month_premise %>%
filter(PREMISES_TYPE %in% top_premises)
# Reorder PREMISES_TYPE within thefts_by_month_premise_top_3 based on count
thefts_by_month_premise_top_3 <- thefts_by_month_premise_top_3 %>%
mutate(PREMISES_TYPE = fct_reorder(PREMISES_TYPE, count))
```
```{r topmonth2, echo=FALSE, fig.width=10, fig.height=6}
# Stacked bar plot
ggplot(thefts_by_month_premise, aes(x = MONTH, y = count, fill = PREMISES_TYPE)) +
geom_bar(stat = "identity") +
scale_fill_brewer(palette = "Set3") +
labs(title = "Where Thefts Occur Each Month",
x = "Month",
y = "Count",
fill = "Premise Type") +
theme_minimal() +
theme(axis.title.x = element_text(face = "bold", family = "Helvetica", size = 18),
axis.title.y = element_text(face = "bold", family = "Helvetica", size = 18),
plot.title = element_text(face = "bold", family = "Helvetica", size = 18, hjust=0.5),
axis.text.x = element_text(face = "bold", angle = 45, hjust = 1, family = "Helvetica", size = 14),
axis.text.y = element_text(family = "Helvetica", size = 14))
```
```{r topmonth, echo=FALSE, fig.width=10, fig.height=6}
## Displaying the same stacked barchart but with only the top 3 locations now
ggplot(thefts_by_month_premise_top_3, aes(x = MONTH, y = count, fill = PREMISES_TYPE)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c(setNames(c("#7ee0d2", "#f27168", "#fab941"), top_premises))) +
geom_text(aes(label = count), position = position_stack(vjust = 0.5), color = "white", size = 3.5) +
labs(title = "Closer look at Top 3 Theft Locations Each Month",
x = "Month",
y = "Count",
fill = "") +
theme_minimal() +
theme(axis.title.x = element_text(face = "bold", family = "Helvetica", size = 18),
axis.title.y = element_text(face = "bold", family = "Helvetica", size = 18),
plot.title = element_text(face = "bold", family = "Helvetica", size = 18, hjust=0.5),
axis.text.x = element_text(face = "bold", angle = 45, hjust = 1, family = "Helvetica", size = 14),
axis.text.y = element_text(family = "Helvetica", size = 14),
legend.text = element_text(size = 14),
legend.position = "top")
```
```{r topseason, echo=FALSE}
# Calculate yearly count per premises type
yearly_thefts_by_premises <- thefts_by_month_premise_top_3 %>%
group_by(PREMISES_TYPE) %>%
summarise(yearly_count = sum(count), .groups = 'drop')
# Calculate seasonal count and then join with yearly count
thefts_percentage_by_season <- thefts_by_month_premise_top_3 %>%
group_by(PREMISES_TYPE, SEASON) %>%
summarise(seasonal_count = sum(count), .groups = 'drop') %>%
left_join(yearly_thefts_by_premises, by = "PREMISES_TYPE") %>%
mutate(`% of Yearly Count` = round(seasonal_count / yearly_count, 2))
thefts_percentage_by_season_table <- gt(thefts_percentage_by_season) %>%
tab_header(
title = "Seasonal Theft Counts of Top Three Locations"
) %>%
cols_label(
PREMISES_TYPE = "Premises Type",
SEASON = "Season",
seasonal_count = "Seasonal Count",
yearly_count = "Yearly Count",
`% of Yearly Count` = "% of Yearly Count"
) %>%
fmt_percent(
columns = vars(`% of Yearly Count`),
decimals = 2
) %>%
tab_style(
style = cell_fill(color = "gray95"),
locations = cells_body()
) %>%
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels()
) %>%
tab_style(
style = cell_text(weight = "bold", color = "black"), # Make text bold and change color to blue for Summer rows
locations = cells_body(
rows = SEASON == "Summer"
)
)
# Display the table
thefts_percentage_by_season_table
```
# Insight 1: Which Months and Where do Thefts Occur the Most?
Most thefts occur at residences, specifically apartments, and outside, and on
the other hand the least thefts occur on the transit. Transit being the least
common theft location makes sense since cyclists are most likely keeping an eye
on their property such that it would not be stolen. However in apartments as
well as outside, bikes are probably stolen because no one is keeping an eye.
In apartments, there are bike storage which can be found in the parking lot
inside or outside of the building. Once thieves are able to access these
locations they can choose whatever bike they like since there are going to be
more options than at a home. As long as the lock of the bike is breakable by a
lock cutter, any bike is prone to be stolen. This also applies to bikes stored
outside. If there is no one nearby during the weekdays, it is easy to break the
lock and ride away quickly without being noticed.
Another insight that can be taken from this graph is that thefts most commonly
occur during July and the hotter, summer season. During the early spring, late
fall, and winter, people do not tend to ride their bikes due to the weather,
thus they keep it indoors, locked and properly stored. However, as the weather
gets better, people start to take their bikes out more often and will tend to
leave them outside when they are not at home. This causes thefts to be more easy
and common.
Nonetheless, in order for the crime to go unnoticed this theft should occur
during the afternoon and during the weekdays when everyone is busy and are not
paying attention to their bikes whereabouts. The next section will delve deeper
into this topic.
```{r, fig.width=8, fig.height=6, echo=FALSE}
# Calculate the number of thefts per day of the week
thefts_by_day_of_week <- bicycle_thefts %>%
group_by(OCC_DOW) %>%
summarise(count = n(), .groups = 'drop') %>%
# Shorten the days of the week
mutate(OCC_DOW = factor(OCC_DOW,
levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))) %>%
arrange(count) %>%
mutate(OCC_DOW = factor(OCC_DOW, levels = unique(OCC_DOW)))
# Generate the horizontal bar plot with arranged order
ggplot(thefts_by_day_of_week, aes(x = OCC_DOW, y = count, fill = OCC_DOW)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("Sun" = "#4B0082", # Golden
"Mon" = "#fc0a67", # Light Salmon
"Tue" = "#f5c15b", # Salmon
"Wed" = "#f5935b", # OrangeRed
"Thu" = "#f04c1f", # Crimson
"Fri" = "#e8ca07", # Maroon
"Sat" = "#fc0ad4")) + # Indigo
geom_text(data = subset(thefts_by_day_of_week, OCC_DOW %in% c("Fri", "Sun")),
aes(label = count), hjust=2.0,vjust = 0.5, color = "white", size = 5) +
labs(title = "Bikes Stolen each Day | Highest to Lowest",
x = "",
y = "") +
theme_minimal() +
theme(axis.title.x = element_text(face = "bold", size = 20, color="white"),
axis.title.y = element_text(face = "bold", size = 14),
plot.title = element_text(face = "bold", size = 18, hjust = 0.5, color="white"),
plot.background = element_rect(fill = "black", colour = "black"), # Set plot background to black
axis.text.x = element_text(face = "bold", size = 16, color="#e0e0e0"),
axis.text.y = element_text(face = "bold", size = 16, color="#e0e0e0"),
legend.position = "none", # Remove legend
panel.grid.major = element_blank(), # Removes major grid lines
panel.grid.minor = element_blank()) + # Removes minor grid lines
coord_flip() + # Makes the plot horizontal
geom_image(data = thefts_by_day_of_week, aes(x = OCC_DOW, y = count, image = "bike-128.ico"),size = 0.09, hjust = 0) # Adjust size and position as needed
#aes(label = count), position = position_stack(vjust = 0.5), color = "white", size = 3.5
```
# Insight 2: Which Day do Thefts Occur the Most?
Given this graph, it is evident that the weekdays experience the most thefts,
specifically Friday while the weekend experiences the least thefts,
specifically Sunday. Though there is a difference present between the weekdays
and weekends, the difference between Saturday and Sunday is quite small, as
well as the difference between Wednesday and Thursday for example. Therefore,
it is important to understand what differentiates the weekend and the weekdays.
One reason could be because people are at home or are not busy during the
weekends, thus thieves do not want to be caught by people walking around
whether at home or outside. During the weekdays, people are most likely to be
at work and busy. As a result, there are less people paying attention to their
bikes and less people catching a thief stealing a bike as well. In addition, as
previously mentioned the thieves are most likely to attack during the afternoon,
the next section will determine whether or not this is true.
```{r toptimeofday, fig.width=12, fig.height=6, echo=FALSE}
bicycle_thefts_data <- read_csv("Bicycle_Thefts_Open_Data.csv", show_col_types = FALSE)
bicycle_thefts_data <- bicycle_thefts_data %>%
filter(OCC_YEAR >= 2014 & OCC_YEAR <= 2023) %>%
mutate(part_of_day = case_when(
OCC_HOUR >= 0 & OCC_HOUR < 6 ~ "Night (0:00-6:00)",
OCC_HOUR >= 6 & OCC_HOUR < 12 ~ "Morning (6:00-12:00)",
OCC_HOUR >= 12 & OCC_HOUR < 18 ~ "Afternoon (12:00-18:00)",
OCC_HOUR >= 18 & OCC_HOUR < 24 ~ "Evening (18:00-24:00)"
)) %>%
group_by(OCC_YEAR, part_of_day) %>%
summarise(count = n(), .groups = 'drop')
ggplot(bicycle_thefts_data, aes(x = OCC_YEAR, y = count, fill = part_of_day)) +
geom_area(position = 'stack') +
scale_fill_manual(
values = c("Morning (6:00-12:00)" = "#c7eaf2",
"Afternoon (12:00-18:00)" = "#0a9cfc",
"Evening (18:00-24:00)" = "#a00afc",
"Night (0:00-6:00)" = "#eec9f5"),
labels = c("Morning (6:00-12:00)" = "<span style='color:#c7eaf2;'>Morning (6am to 12pm)</span>",
"Afternoon (12:00-18:00)" = "<span style='color:#0a9cfc;'>Afternoon (12pm to 6pm)</span>",
"Evening (18:00-24:00)" = "<span style='color:#a00afc;'>Evening (6pm to 12am)</span>",
"Night (0:00-6:00)" = "<span style='color:#eec9f5;'>Night (12am to 6am)</span>")
) +
labs(title = "What Time of Day do Thefts Occur?", x = "Year", y = "Count", fill = "Part of Day") +
scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
theme_minimal() +
theme(
plot.background = element_rect(fill = "black", colour = NA),
legend.background = element_rect(fill = "black", colour = NA),
axis.title.x = element_text(face = "bold", family = "Helvetica", size =24, color="white",),
axis.title.y = element_text(face = "bold", family = "Helvetica", size = 18, color="white"),
plot.title = element_text(face = "bold", family = "Helvetica", size = 18, hjust = 0.5,color="white"),
axis.text.x = element_text(hjust = 1, family = "Helvetica", size = 12,color="white"),
axis.text.y = element_text(face = "bold", family = "Helvetica", size = 12,color="white"),
legend.text = element_markdown(size=12, face="bold"),
legend.position = "top",
panel.grid.major = element_line(colour = "black"),
panel.grid.minor = element_line(colour = "black"),
panel.ontop = TRUE)
```
```{r toptime, echo=FALSE}
# Read the data from the CSV file
bicycle_thefts <- read_csv("Bicycle_Thefts_Open_Data.csv", show_col_types = FALSE)
# Preprocess the data
bicycle_thefts_data <- bicycle_thefts %>%
filter(OCC_YEAR == 2023) %>%
mutate(part_of_day = case_when(
OCC_HOUR >= 0 & OCC_HOUR < 6 ~ "Night (0:00-6:00)",
OCC_HOUR >= 6 & OCC_HOUR < 12 ~ "Morning (6:00-12:00)",
OCC_HOUR >= 12 & OCC_HOUR < 18 ~ "Afternoon (12:00-18:00)",
OCC_HOUR >= 18 & OCC_HOUR < 24 ~ "Evening (18:00-24:00)"
)) %>%
group_by(OCC_YEAR, part_of_day) %>%
summarise(count = n(), .groups = 'drop')
# Now, create the gt table
library(gt)
nice_table <- gt(bicycle_thefts_data) %>%
tab_header(
title = "Bicycle Thefts by Part of Day and Year",
subtitle = "Data from 2014 to 2023"
) %>%
cols_label(
OCC_YEAR = "Year",
part_of_day = "Part of Day",
count = "Theft Count"
) %>%
fmt_number(
columns = vars(count),
decimals = 0
) %>%
tab_style(
style = cell_fill(color = "lightgray"),
locations = cells_body()
) %>%
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels()
)
# Display the table
nice_table
```
# Insight 2b: Do Thieves Prefer a Certain Time of Day?
In 2023, thefts were considered to be similarly probable during the evening
(18:00-24:00) and during the afternoon (12:00-18:00), as well as similarly
probable during the night (0:00-6:00) and morning (6:00-12:00). These parts of
the day can also be grouped by times when there is daylight (morning and
afternoon) and where there is no daylight (night and evening). Given these two
groupings, the count of thefts seems to be quite similar, indicating that there
might not be specific preference by these thieves. Nonetheless, in 2023 the
afternoon period experienced the most thefts as was predicted due to most people
being occupied during those times.
Another insight that is provided by this visualization is the number of thefts
that occur per year. As of 2023, thefts have considerably decreased since the
past 9 years. This is a positive sign for Toronto as it allows cyclists to be
comfortable in purchasing and/or storing their bikes. The late 2010s
experienced the most thefts which also represents the time during the pandemic.
The reason for this could be because a lot of people decided to stay at home
during the lockdowns and thus thieves took this as an opportunity to steal more
since there would be less people outside to catch them.
Given these details, it is important to consider the neighborhoods in which
these crimes occurred since it is evident that the presence of daylight does not
really affect the probability of thefts. The next section will provide some
more information on this matter.
```{r, fig.width=11, fig.height=8, echo=FALSE}
# Count thefts per neighbourhood per year
thefts_per_neighbourhood_year <- bicycle_thefts %>%
group_by(NEIGHBOURHOOD_158, REPORT_YEAR) %>%
summarise(count = n(), .groups = 'drop')
# Calculate average thefts per neighborhood across all years
avg_thefts_per_neighbourhood <- thefts_per_neighbourhood_year %>%
group_by(NEIGHBOURHOOD_158) %>%
summarise(avg_count = mean(count), .groups = 'drop') %>%
arrange(desc(avg_count)) %>%
slice_max(order_by = avg_count, n = 30) # Get the top 30 neighbourhoods by average theft count
# Calculate the min and max thefts per neighbourhood across all years
min_max_thefts_per_neighbourhood <- thefts_per_neighbourhood_year %>%
group_by(NEIGHBOURHOOD_158) %>%
summarise(
min_theft = min(count),
max_theft = max(count),
.groups = 'drop'
)
# Join the average thefts with their min and max to get the top neighbourhoods' range
top_thefts_stats <- avg_thefts_per_neighbourhood %>%
inner_join(min_max_thefts_per_neighbourhood, by = "NEIGHBOURHOOD_158")
# Calculate overall mean for the vertical line across all neighborhoods
overall_mean <- mean(avg_thefts_per_neighbourhood$avg_count)
# Create the lollipop chart with min and max values on the sticks
ggplot(top_thefts_stats, aes(x = reorder(NEIGHBOURHOOD_158, avg_count), y = avg_count)) +
geom_segment(aes(xend = NEIGHBOURHOOD_158, yend = min_theft), color = "#f56433", size=2) +
geom_segment(aes(xend = NEIGHBOURHOOD_158, yend = max_theft), color = "#57cfff", size=2) +
geom_point(size = 5) + # Head of the lollipop (average count)
geom_hline(yintercept = 75, color = "black", size=1.0) +
annotate("text", x = Inf, y = 75, label = "Mean Thefts", hjust = 0.47, vjust = 0.8, color = "black", size = 4, fontface='bold') +
coord_flip() + # Flip coordinates to get a horizontal lollipop chart
theme_minimal() +
labs(title = "Average Thefts per Neighbourhood each Year | 2014 to 2023",
x = "Neighbourhood",
y = "Count") +
theme(axis.title.x = element_text(face = "bold", size = 18, color="black"),
axis.title.y = element_text(face = "bold", size = 18, color="black"),
plot.title = element_text(face = "bold", size = 18, hjust = 0.5, color="black"),
plot.background = element_rect(fill = "#fcf7f2", colour = "white"), # Set plot background to black
axis.text.x = element_text(face = "bold", size = 12, color="black"),
axis.text.y = element_text(face = "bold", size = 10, color="black"),
panel.grid.major = element_line(size = 1, color = "white"), # Adjust major grid line size
panel.grid.minor = element_line(size = 1, color = "white"))
```
# Insight 3: Where do Thieves Attack the Most?
Given this visualization it is evident that the Yonge-Bay Corridor experiences
the most amount of thefts in Toronto, followed by Downtown Yonge East,
Wellington Place, St.Lawrence-East Bayfront-The Islands, and Annex. What is
common among these neighborhoods is the fact that they are all located in the
south of the city in the heart of Downtown Toronto. All of these places are
quite close to popular attractions including the CN Tower and Harbourfront.
It is quite surprising that these locations are vulnerable to these crimes given
the fact that these regions contain very high-end restaurants, hotels, and large
corporation offices. However, since these areas are considered to be “rich” as a
result of the high-end places in the neighborhoods, that could also suggest
that the people also possess quite some money to spend on things such as bikes.
Thus, thieves might consider these neighborhoods as ideal spots to steal
high-end bikes as well. The next section will verify whether or not the bikes
stolen are actually high value or not.
```{r topmake, echo=FALSE}
# Read the data from the CSV file
bicycle_thefts_data <- read_csv("Bicycle_Thefts_Open_Data.csv", show_col_types = FALSE)
# Consolidate "OT" into "OTHER" for BIKE_MAKE
bicycle_thefts_data$BIKE_MAKE <- ifelse(bicycle_thefts_data$BIKE_MAKE == "OT", "OTHER", bicycle_thefts_data$BIKE_MAKE)
bicycle_thefts_data$BIKE_MAKE <- dplyr::recode(bicycle_thefts_data$BIKE_MAKE,
"GI" = "GIANT",
"NO" = "NORCO",
"TR" = "TREK",
"UK" = "UNKNOWN",
.default = bicycle_thefts_data$BIKE_MAKE)
# Filter out records without bike cost and then compute the necessary aggregations
bike_make_stats <- bicycle_thefts_data %>%
filter(!is.na(BIKE_COST) & BIKE_COST > 0) %>%
group_by(BIKE_MAKE) %>%
summarise(
Count = n(),
Avg_Cost = mean(BIKE_COST, na.rm = TRUE)
) %>%
arrange(desc(Count)) %>%
top_n(5, Count)
# Plot the data with modified aesthetics
ggplot(bike_make_stats, aes(y = Avg_Cost, x = BIKE_MAKE)) +
geom_point(aes(size = Count, color = BIKE_MAKE), alpha = 0.7) +
geom_text(aes(label = Count), vjust = 0.5, color = "white", size = 3) + # Adjust text size and position
scale_size_area(max_size = 30, name = "Count", breaks = bike_make_stats$Count) + # Make bubbles bigger
scale_color_manual(values = c("black", "black", "black", "black", "black"), guide = FALSE) + # Remove legend for make
scale_y_continuous(breaks = seq(500, 1500, by = 100), limits = c(500, 1500)) + # Adjust y-axis limits
theme_minimal() +
labs(title = "Top 5 Stolen Bike Makes and Their Average Costs",
y = "Average Cost ($)",
x = "Bike Make",
size = "Number of Thefts") +
theme(legend.position = "bottom")
```
# Insight 4: What is the Average Cost and Make of the Bikes Stolen?
The most stolen bike brand is GIANT, of which 2660 of them were stolen. Among
GIANT were other brands including NORCO and TREK which also experienced a
considerable amount of thefts. One thing to note is the average cost of these
bikes were all in a similar range of around $1000. To put this into
context, one of the cheapest bikes stolen was $50, and one of the most expensive
bikes stolen was $10000. Therefore, on average the thieves are stealing bikes
that are more expensive, which makes sense given the neighborhoods in which they
are mostly being stolen from. However, it might also be important to note the
type of bikes that are being stolen which will be explained in the next section.
```{r topbrand, echo=FALSE, message=FALSE}
# Read the data from the CSV file
bicycle_thefts_data <- read_csv("Bicycle_Thefts_Open_Data.csv", show_col_types = FALSE)
# Recode BIKE_TYPE abbreviations to full names
bicycle_thefts_data$BIKE_TYPE <- recode(bicycle_thefts_data$BIKE_TYPE,
"MT" = "MOUNTAIN",
"EL" = "ELECTRIC",
"OT" = "OTHER",
"RC" = "RACER",
"RG" = "REGULAR",
.default = bicycle_thefts_data$BIKE_TYPE)
# Group and summarize the data by BIKE_TYPE, then calculate the count
bike_type_stats <- bicycle_thefts_data %>%
group_by(BIKE_TYPE) %>%
summarise(Count = n()) %>%
arrange(desc(Count)) %>%
top_n(5, Count) # Select the top 5 bike types based on theft count
# Create the bar chart
ggplot(bike_type_stats, aes(x = BIKE_TYPE, y = Count)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(title = "Top 5 Bike Types by Theft Count",
x = "Bike Type",
y = "Count of Thefts") +
theme_minimal() +
theme(legend.position = "none") # Hide the legend since the fill color is uniform
```
# Insight 5: What Type of Bikes do Thieves Prefer?
Thieves have most commonly targeted MOUNTAIN and REGULAR bikes compared to
other types such as ELECTRIC and RACER. This makes sense given the previous
graph which showed that the average cost of the bikes stolen was around $1000.
The prices of electric and racer bikes are much more than $1000, thus it would
not make sense for a lot of them to be stolen. Even though thieves might want
to take a more expensive item, this is not always the case. One of the
reasons is because the thieves do not always keep the bikes they steal. Instead
the bikes are sold, or taken apart to sell in pieces. The odds of finding
customers for pricey bikes are not common, thus it is ideal to steal bikes that
are decently priced such that there are more customers willing to buy, and the
thieves are able to make a good amount of money.
```{r pichart, echo=FALSE, message=FALSE}
# Read the dataset
bicycle_thefts <- read_csv("Bicycle_Thefts_Open_Data.csv")
# Calculate the number of thefts by status
thefts_by_status <- bicycle_thefts %>%
group_by(STATUS) %>%
summarise(count = n())
# Create the pie chart
ggplot(thefts_by_status, aes(x = "", y = count, fill = STATUS)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start = 0) +
theme_void() +
labs(title = "The Condition of Recovery") +
theme(legend.title = element_blank()) +
scale_fill_manual(values = c("RECOVERED" = "#c7f28a", "STOLEN" = "#f27868", "UNKNOWN" = "black"))
```
```{r, echo=FALSE}
# Read the data from the CSV file
bicycle_thefts_data <- read_csv("Bicycle_Thefts_Open_Data.csv", show_col_types = FALSE)
# Calculate the total count and average bike cost by status, ensuring bike cost is present and positive
average_cost_by_status <- bicycle_thefts_data %>%
filter(!is.na(BIKE_COST) & BIKE_COST > 0) %>%
group_by(STATUS) %>%
summarise(
Total_Count = n(),
Average_Cost = mean(BIKE_COST, na.rm = TRUE)
) %>%
ungroup() %>%
mutate(
Percentage = Total_Count / sum(Total_Count) * 100
)
# Create the gt table and format it
average_cost_by_status %>%
gt() %>%
tab_header(title = "Average Bike Cost by Status") %>%
fmt_currency(
columns = vars(Average_Cost),
currency = "CAD"
) %>%
fmt_number(
columns = vars(Percentage),
decimals = 2,
suffixing = TRUE
) %>%
cols_label(
STATUS = "Status",
Total_Count = "Total Count",
Average_Cost = "Average Cost (CAD)",
Percentage = "Percentage (%)"
)
```
# Summary of Findings
To end off, this chart highlights the fact that most bikes that are stolen are
never recovered. Assumptions such as very expensive bikes are more likely to be
recovered due to the high value or the presence of a tracker can be made, yet
further analysis shows the average price of stolen, recovered, and unknown
status bikes are all relatively the same. Therefore, it is important to gather
other information about these thefts such that proactive actions can be taken
to avoid these crimes. This article has gathered the following information
about the most common type of bike thefts from 2014-2023:
Day: Friday (Weekdays) \
Month: July (Summer Season) \
Time of Day: Anytime \
Premises type: Apartment and Outside \
Neighborhood: Yonge-Bay Corridor (Downtown Toronto) \
\newpage
Make: GIANT \
Type: Mountain \
Average Cost: $1086 \
Given these characteristics of the most common bike thefts, cyclists should
take the necessary precautions to avoid these crimes. This could mean buying a
better lock, or just being weary of the location of where the bike is stored and
its security.
It is important for everyone to look into this data as cycling is a great way
of commuting. It avoids using cars and public transit which are contributors to
pollution and as the planet is being cared for, cyclists are able to enjoy some
good exercise and fresh air while riding. The goal of this article is to not
prevent people from buying bicycles but to promote an idea of increased
awareness when storing and securing them.
To end off, even though this article provides a lot of useful information from
the data provided by the Toronto Police Service, there is much more extra data
that can be used to make this analysis more holistic. For example, analyzing
the types of locks that were used to secure the bike can give more insights as
to how easy or difficult the theft is, and how long it takes. The presence of
security cameras could also play a role in the thieves inclination to commit a
crime. If there are multiple thieves involved this could prevent bystanders
from intervening. Thus, cyclists can do more research into finding out the best
ways to stay safe from these unfortunate crimes.
Data: https://data.torontopolice.on.ca/pages/bicycle-thefts