A quick analysis of Baltimore crime
========================================================
```{r setup, echo=FALSE}
knitr::opts_chunk$set(cache=TRUE)
```
I'm going to do a very simple analysis of Baltimore crime to show off R. We'll use data downloaded from Baltimore City's awesome open data site (this was downloaded a couple of years ago so if you download now, you will get different results).
### Getting data
* Arrest data: https://data.baltimorecity.gov/Crime/BPD-Arrests/3i3v-ibrt
* CCTV data: https://data.baltimorecity.gov/Crime/CCTV-Locations/hdyb-27ak
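If you want to pull fresh copies yourself, something like the following should work. This is a sketch only: the export URLs assume Socrata's standard CSV endpoint for the dataset IDs above, and both the endpoints and the data may have changed since this was written.
```{r download, eval=FALSE}
# Sketch (not run): assumed Socrata CSV export endpoints for the datasets linked above
download.file("https://data.baltimorecity.gov/api/views/3i3v-ibrt/rows.csv?accessType=DOWNLOAD",
              destfile="BPD_Arrests.csv")
download.file("https://data.baltimorecity.gov/api/views/hdyb-27ak/rows.csv?accessType=DOWNLOAD",
              destfile="CCTV_Locations.csv")
```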
Let's load the data:
```{r}
arrest_tab=read.csv("BPD_Arrests.csv", stringsAsFactors=FALSE)
cctv_tab=read.csv("CCTV_Locations.csv", stringsAsFactors=FALSE)
# these columns are mislabeled, so fix them
tmp=arrest_tab$sex
arrest_tab$sex=arrest_tab$race
arrest_tab$race=tmp
```
### Exploring data
```{r}
# dimension of table (data.frame)
dim(arrest_tab)
# what are the columns
names(arrest_tab)
# what is the average arrest age?
mean(arrest_tab$age)
# the range of arrest ages
range(arrest_tab$age)
# how many arrests per sex
table(arrest_tab$sex)
# what are the most common offenses
head(sort(table(arrest_tab$incidentOffense),decreasing=TRUE))
# what are the offenses that only happen once
tab <- table(arrest_tab$incidentOffense)
tab[tab == 1]
# range of arrest ages after removing those w/ age==0
range(arrest_tab$age[arrest_tab$age>0])
```
Offenses by sex
```{r}
tab <- table(arrest_tab$incidentOffense, arrest_tab$sex)
```
Let's see a table of arrests by sex and race
```{r}
table(sex=arrest_tab$sex,race=arrest_tab$race)
```
A histogram of age
```{r}
hist(arrest_tab$age,nc=100)
with(arrest_tab,hist(age[sex=="M"],nc=100)) # males only
with(arrest_tab,hist(age[sex=="F"],nc=100)) # females only
```
### Are males and females arrested at different ages on average?
Let's take a look at how age depends on sex. Let's plot age as a function of sex first (notice how we indicate that sex is a `factor`).
```{r}
plot(arrest_tab$age~factor(arrest_tab$sex))
```
One of the neat things about R is that statistical model building and testing is built-in. The model we use is $y_i=\beta_0+\beta_1 x_i$ where $y_i$ is age of sample (example) $i$ and $x_i$ is an indicator variable $x_i \in \{0,1\}$ with $x_i=1$ if the $i$-th record (example) is male. You can check that $\beta_1$ is the difference in mean age between females and males.
We use the formula syntax to build a linear regression model.
```{r}
# let's ignore those records with missing sex
fit=lm(age~factor(sex),data=arrest_tab,subset=arrest_tab$sex %in% c("M","F"))
summary(fit)
```
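As a quick sanity check of the claim that $\beta_1$ is the difference in group means (a minimal sketch, not part of the model output):
```{r beta1_check}
# the male/female difference in mean age should match the factor(sex)M coefficient above
grp_means <- with(subset(arrest_tab, sex %in% c("M","F")), tapply(age, sex, mean))
grp_means["M"] - grp_means["F"]
```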
We see that $\beta_1 \approx -0.2$, meaning that the average arrest age for males is about 2.5 months younger than for females. So there is very little difference in average age (which is what the linear model is testing), but the probability of observing a difference this large in a sample of this size **when there is truly no difference in average age** is small, $p \approx 0.01$. Since we have a very large number of examples, or records, this testing framework will declare very small differences as *statistically significant*. We'll return to this theme later in class.
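One way to see the effect of the large sample size is how narrow the confidence interval around this small estimate is (a quick sketch):
```{r beta1_confint}
# 95% confidence intervals for the intercept and the sex coefficient
confint(fit)
```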
### Geographic distribution of arrests.
First we need to extract latitude and longitude from the location field; we'll use some string functions to do this.
```{r}
tmp=gsub("\\)","",gsub("\\(","",arrest_tab$Location))
tmp=strsplit(tmp,split=",")
arrest_tab$lon=as.numeric(sapply(tmp,function(x) x[2]))
arrest_tab$lat=as.numeric(sapply(tmp,function(x) x[1]))
```
Now let's plot
```{r}
plot(arrest_tab$lon, arrest_tab$lat, xlab="Longitude", ylab="Latitude", main="Arrests in Baltimore")
```
We can also use density estimates to make this nicer:
```{r}
smoothScatter(arrest_tab$lon, arrest_tab$lat, xlab="Longitude", ylab="Latitude", main="Arrests in Baltimore")
```
Let's make this fancier using the `ggplot2` graphics system and the `maps` package, which contains map data.
```{r}
library(maps)
library(ggplot2)
balto_map = subset(map_data("county", region="maryland"),subregion=="baltimore city")
plt=ggplot()
plt=plt+geom_polygon(data=balto_map,aes(x=long,y=lat),color="white",fill="gray40")
plt=plt+geom_point(data=arrest_tab,aes(x=lon,y=lat),color="blue",alpha=.1)
print(plt)
```
Now let's add CCTV cameras.
```{r}
tmp=gsub("\\)","",gsub("\\(","",cctv_tab$Location))
tmp=strsplit(tmp,split=",")
cctv_tab$lon=as.numeric(sapply(tmp,function(x) x[2]))
cctv_tab$lat=as.numeric(sapply(tmp,function(x) x[1]))
plt=ggplot()
plt=plt+geom_polygon(data=balto_map,aes(x=long,y=lat),color="white",fill="gray40")
plt=plt+geom_point(data=arrest_tab,aes(x=lon,y=lat),color="blue",alpha=.1)
plt=plt+geom_point(data=cctv_tab,aes(x=lon,y=lat),color="red")
print(plt)
```
### A challenge
Is there any relationship between the number of CCTV cameras and the number of arrests? Divide the city into a grid and plot the number of CCTV cameras vs. the number of arrests.
```{r}
# step 1: divide city into a grid for the arrest data
# step 1a: find the range of latitude and longitude
latRange=range(arrest_tab$lat,na.rm=TRUE)
lonRange=range(arrest_tab$lon,na.rm=TRUE)
# step 1b: discretize latitude into 50 bins
latGrid=seq(min(latRange),max(latRange),len=50)
latFactor=cut(arrest_tab$lat,breaks=latGrid)
# now longitude
lonGrid=seq(min(lonRange),max(lonRange),len=50)
lonFactor=cut(arrest_tab$lon,breaks=lonGrid)
# step 1c: make a factor indicating geographic grid location
gridFactor=factor(paste(lonFactor,latFactor,sep=":"))
# step 2: do the same for the cctv data
latFactor=cut(cctv_tab$lat,breaks=latGrid)
lonFactor=cut(cctv_tab$lon,breaks=lonGrid)
cctvGridFactor=factor(paste(lonFactor,latFactor,sep=":"))
arrestTab=table(gridFactor)
cctvTab=table(cctvGridFactor)
m=match(names(cctvTab),names(arrestTab))
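# step 3: boxplot of arrest counts grouped by the number of cameras (cells with at least one camera)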
plot(arrestTab[m]~factor(cctvTab),xlab="Number of CCTV cameras", ylab="Number of Arrests")
```
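As a quick follow-up (a sketch, limited to grid cells that contain at least one camera), we can put a number on the relationship with a rank correlation:
```{r cctv_arrest_cor}
# Spearman correlation between camera count and arrest count over matched grid cells
m <- match(names(cctvTab), names(arrestTab))
ok <- !is.na(m)
cor(as.numeric(cctvTab[ok]), as.numeric(arrestTab[m[ok]]), method="spearman")
```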
### Extra analyses
As part of Project 1 you will add to this analysis. Please use the following template:
#### Mihai Sirbu
What question are you asking?:
I am trying to answer: at what time are most people arrested?
For this preliminary analysis, I plan on making a plot where
hour is the x-axis and the number of arrests is the y-axis. This
will produce an "Arrest Timeseries".
What is the code you use to answer it?:
```{r sirbu}
# parse the arrest time and keep the hour of day
time <- strptime(arrest_tab$arrestTime, "%H:%M")
arrest_tab$hours <- as.numeric(format(time, "%H"))
# count arrests per hour
hours_df <- as.data.frame(table(arrest_tab$hours))
names(hours_df) <- c("hour", "count")
g <- ggplot(hours_df, aes(hour, count, group=1)) + geom_line(color="blue") + geom_point(color="blue")
g <- g + labs(title="Arrest Timeseries", x="Time of Day", y="Num of Arrests")
g <- g + scale_x_discrete(breaks=as.character(seq(0, 23, 2)))
g <- g + theme(plot.title=element_text(size=16, face="bold"),
               axis.title.x=element_text(size=16, face="bold"),
               axis.title.y=element_text(size=16, face="bold"))
g
```
What did you observe?
I had originally thought that there would be very few arrests until 8 pm, at which point there would be a giant spike from 8 pm to 5 am. But that was not the case. Instead, the two biggest arrest hours were 6 pm followed by 10 am (!!). At this point, I'm not entirely sure why that might be. I would be surprised, however, if all offenses followed this exact same pattern.
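As a quick check of that last point (a sketch, with the top offenses simply chosen by overall count), we can look at the hourly pattern separately for the most common offenses:
```{r hours_by_offense}
# hourly arrest counts for the four most common offenses, one panel each
top_off <- names(head(sort(table(arrest_tab$incidentOffense), decreasing=TRUE), 4))
off_sub <- subset(arrest_tab, incidentOffense %in% top_off & !is.na(hours))
off_hours <- as.data.frame(table(offense=off_sub$incidentOffense, hour=off_sub$hours))
ggplot(off_hours, aes(x=as.numeric(as.character(hour)), y=Freq)) +
  geom_line(color="blue") +
  facet_wrap(~offense, scales="free_y") +
  labs(x="Time of Day", y="Num of Arrests")
```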
#### Aaron Dugatkin
What question are you asking?: I am trying to find out how cameras affect the sorts of crimes in their area, both in reducing certain types of crime and in leading to more of other types of crime being detected.
What is the code you use to answer it?:
```{r aarondugatkin}
# modified code from above, to create factors, but remove NA
# added by HCB to restore original arrest table
arrest_tab_original = arrest_tab
#
arrest_tab = arrest_tab[!is.na(arrest_tab$lat) & !is.na(arrest_tab$lon),]
latRange=range(arrest_tab$lat,na.rm=TRUE)
lonRange=range(arrest_tab$lon,na.rm=TRUE)
latGrid=seq(min(latRange),max(latRange),len=50)
latFactor=cut(arrest_tab$lat,breaks=latGrid)
lonGrid=seq(min(lonRange),max(lonRange),len=50)
lonFactor=cut(arrest_tab$lon,breaks=lonGrid)
gridFactor=factor(paste(lonFactor,latFactor,sep=":"))
latFactor=cut(cctv_tab$lat,breaks=latGrid)
lonFactor=cut(cctv_tab$lon,breaks=lonGrid)
cctvGridFactor=factor(paste(lonFactor,latFactor,sep=":"))
arrestTab=table(gridFactor)
cctvTab=table(cctvGridFactor)
# count crimes in areas with and without cameras
arrestOnCamera = gridFactor %in% names(cctvTab)
count_crime_tab <- table(arrest_tab$incidentOffense, arrestOnCamera)
#merge the two tables, and calculate the difference in crime frequency in the two situations
crime_tab <- data.frame(count_crime_tab[,1], count_crime_tab[,2])
colnames(crime_tab)[1] <- "noCamCrimes"
colnames(crime_tab)[2] <- "camCrimes"
crime_tab$names <- rownames(crime_tab)
crime_tab$campct <- crime_tab$camCrimes/sum(crime_tab$camCrimes)*100
crime_tab$nocampct <- crime_tab$noCamCrimes/sum(crime_tab$noCamCrimes)*100
crime_tab$pctchange <- crime_tab$campct - crime_tab$nocampct
#display the change in crime frequency with crime name in descending order, with the most increased (caught) crimes first
crime_tab <- crime_tab[with(crime_tab, order(-pctchange)), ]
options(scipen=999)
subset(crime_tab, select=c("pctchange"))
# added by HCB to restore original arrest table
arrest_tab = arrest_tab_original
```
What did you observe? The results were interesting. We see a large increase in narcotics charges, which may be due to camera surveillance. We also see a decrease in assault, which may be due to perpetrators realizing the danger of committing such crimes in front of a camera. However, the vast majority of crimes do not even see a 1% change between the two situations, so it would appear that, overall, cameras do not have a major effect on criminal activity.
#### Anna Petrone
What question are you asking?:
Which neighborhoods in Baltimore have the highest number of arrests?
What is the code you use to answer it?:
Load libraries
```{r AnnaPetrone:libs}
library(rgdal) # needed for reading shape files
library(plyr) # needed for rename function
library(sp) # needed for point.in.polygon function
library(ggmap) # could use for geocoding addresses
library(ggplot2) # needed for plotting
```
Find the number of arrests for which geographic coordinates weren't given
```{r AnnaPetrone:nogeo}
no.geo.idx = nchar(arrest_tab$Location.1) == 0
n.geo.missing = sum( no.geo.idx )
narrests = dim(arrest_tab)[1]
n.geo.missing/narrests*100 # 39%
```
Find the number of incidents that don't have geocode info but do have an incidentLocation provided
```{r AnnaPetrone:nogeo_butloc}
has.location = nchar(arrest_tab$incidentLocation) > 0
sum(no.geo.idx & has.location)
```
```{r AnnaPetrone:geocode}
#tmp = paste(arrest_tab$incidentLocation[no.geo.idx & has.location], "Baltimore, MD")
#gc = geocode(tmp[1:2490]) # restricted to 2500 api requests per day
```
Get the 2010 statistical community area boundaries [here](http://bniajfi.org/mapping-resources/).
Download the shapefiles and extract them from the .zip file (a hypothetical download sketch follows).
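This sketch is not runnable as-is: the placeholder URL must be replaced with the actual .zip link from the page above.
```{r getshape_download, eval=FALSE}
# Hypothetical: substitute the real .zip URL from the mapping-resources page
download.file("http://bniajfi.org/<path-to>/csa_2010_boundaries.zip",
              destfile="csa_2010_boundaries.zip")
unzip("csa_2010_boundaries.zip", exdir="csa_2010_boundaries")
```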
```{r AnnaPetrone:getshape}
setwd("csa_2010_boundaries/")
csa = readOGR(dsn=".",layer="CSA_NSA_Tracts")
csa.df = fortify(csa) # fortify turns the shape data into a data.frame
csa.df = rename(csa.df, c("long"="X.feet","lat"="Y.feet")) # MD uses State Plane coords instead of lat/lon (see comments section)
convert = FALSE
if (convert){ # write a file to send to matlab code (described in comments section)
write.csv(csa.df[,c("Y.feet","X.feet")], "csa-df.txt", quote=FALSE, na="", row.names=FALSE) # note: 'long'/'lat' were renamed to X.feet/Y.feet above
}
csa.converted.df = read.csv("csa-df_converted.txt",header=FALSE) # output of the matlab code, converted to lat/lon
setwd("..")
csa.converted.df = rename(csa.converted.df, c("V1"="lat","V2"="lon"))
csa.df = cbind(csa.df, csa.converted.df)
```
Now assign each of the arrest records to a neighborhood;
this is only possible for the records that have geo info.
This step takes a little while to run (roughly 15-30 seconds).
```{r AnnaPetrone:assign}
ncsa = dim(csa)[1]
arrest_tab_geo = arrest_tab[!no.geo.idx,]
narrests.geo = dim(arrest_tab_geo)[1]
arrest_nbhd_id = vector(length = narrests.geo)
for (j in 1:ncsa) { # takes about 30 sec
idx = csa.df$id == j-1
polyx = csa.df$lon[idx]
polyy = csa.df$lat[idx]
in.poly = point.in.polygon(arrest_tab_geo$lon, arrest_tab_geo$lat, polyx,polyy)
in.poly= as.logical(in.poly)
arrest_nbhd_id[in.poly] = j - 1
}
arrest_tab_geo = cbind(arrest_tab_geo, arrest_nbhd_id)
```
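An alternative to the explicit loop (a sketch only, and not what was used here): `sp::over` can do the point-in-polygon assignment in one call, assuming the shapefile's `.prj` gives `csa` a valid State Plane CRS so it can be reprojected to lon/lat.
```{r over_sketch, eval=FALSE}
# Sketch: assign each arrest to a polygon index with sp::over (NA if outside all polygons)
pts <- SpatialPoints(cbind(arrest_tab_geo$lon, arrest_tab_geo$lat),
                     proj4string=CRS("+proj=longlat +datum=WGS84"))
csa_ll <- spTransform(csa, CRS("+proj=longlat +datum=WGS84"))
nbhd_idx <- over(pts, as(csa_ll, "SpatialPolygons"))
```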
For each neighborhood, count the number of arrests, using the table function
```{r AnnaPetrone:narrests}
nbhd.narrests = as.data.frame(table(arrest_nbhd_id))
nbhd.narrests = rename(nbhd.narrests, c("arrest_nbhd_id"="id", "Freq"="narrests"))
nbhd.names= as.vector(csa$Neigh)
nbhd.narrests = cbind(nbhd.names, nbhd.narrests)
head(nbhd.narrests)
```
Merge the arrest counts with the geometry data
```{r AnnaPetrone:narrests.merge}
csa.df = merge(csa.df, nbhd.narrests, by="id",all.x=TRUE)
```
Make a plot colored by number of arrests
```{r AnnaPetrone_plot}
g = ggplot(csa.df,aes(x=lon,y=lat,group=group))
g = g + geom_polygon(aes(fill=narrests)) + scale_fill_gradient(low="slategray1",high="slateblue4") # color the nbhds by narrests
g = g + geom_path(colour="gray75",size=.1) # draw lines separating the neighborhoods
g = g + ggtitle("Baltimore City Arrests 2011 - 2012") # add a title
g = g + theme(axis.ticks = element_blank(), axis.text.x = element_blank(), axis.text.y = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank()) # remove axis labels
print(g)
```
What did you observe?
First, it should be noted that out of the 104,528 arrest records, 40,636 of them (about 39%) did not have geocoded locations (latitude and longitude). Some of them (7,650) did have an address in the incidentLocation field, so it would be possible to geocode these, though it would not contribute enormously. (I did take a look at the ggmap library, which can convert an address string into lat and lon by calling the Google Maps geocoding API; however, it only allows 2,500 requests per day.) Therefore my analysis only includes the 61% of records which provided geocoded information.
Second, I need to note that it was a pain converting from the [MD State Plane coordinates](http://en.wikipedia.org/wiki/State_Plane_Coordinate_System) into longitude and latitude. I ended up using [an external matlab function](http://www.mathworks.com/matlabcentral/fileexchange/26413-sp-proj) to do the conversion, since it seemed really confusing to do in R.
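For reference, an all-in-R alternative would be to reproject with `rgdal`'s `spTransform` (a sketch only, assuming the shapefile ships with a valid `.prj` describing the State Plane CRS):
```{r spTransform_sketch, eval=FALSE}
# Sketch: reproject the State Plane polygons directly to lon/lat, no external tools needed
csa_ll <- spTransform(csa, CRS("+proj=longlat +datum=WGS84"))
csa_ll.df <- fortify(csa_ll)  # 'long' and 'lat' are now degrees
```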
The results: As could probably be expected, the highest number of arrests occurred in the downtown area, with the northwest area being notably high as well. The inner harbor neighborhood is among the lowest, which makes sense as this area is more touristy. The central northern neighborhoods are also on the low end (I don't know Baltimore but I'm guessing these are higher income neighborhoods).
For future analysis, it would be good to create a similar plot where the colors represent neighborhood income level. I would also like to add a layer showing the locations of transit stations, since these are commonly believed to attract crime.
#### Raul Alfaro
What question are you asking?: Which is the most common crime per race?
What is the code you use to answer it?:
```{r RaulAlfaro}
head(sort(table(arrest_tab$incidentOffense[arrest_tab$race=="A"]),decreasing=TRUE))
head(sort(table(arrest_tab$incidentOffense[arrest_tab$race=="B"]),decreasing=TRUE))
head(sort(table(arrest_tab$incidentOffense[arrest_tab$race=="H"]),decreasing=TRUE))
head(sort(table(arrest_tab$incidentOffense[arrest_tab$race=="I"]),decreasing=TRUE))
head(sort(table(arrest_tab$incidentOffense[arrest_tab$race=="U"]),decreasing=TRUE))
head(sort(table(arrest_tab$incidentOffense[arrest_tab$race=="W"]),decreasing=TRUE))
```
What did you observe?
I observed that, aside from Unknown Offenses, the most common crime for every race except one was Narcotics; the remaining race category had Common Assault as its most common crime.
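An equivalent, less repetitive way to get the same tables (a sketch; the blank race code is included as its own group):
```{r RaulAlfaro_alt}
# top 3 offenses for each race code in one pass
lapply(split(arrest_tab$incidentOffense, arrest_tab$race),
       function(x) head(sort(table(x), decreasing=TRUE), 3))
```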
#### Kim St. Andrie, Rain Surasorn
What question are you asking?:
Which year had the largest number of arrests?
What is the code you use to answer it?:
```{r KimRain}
a <- data.frame(id = arrest_tab$arrest, year = substr(arrest_tab$arrestDate,7,11))
head(sort(table(a$year), decreasing=TRUE),10)
```
What did you observe?
2011 had the largest number of arrests, but there was no dramatic difference between the number of arrests in each year. There were 52,868 arrests in 2011, which was 1,208 more than in 2012.
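The same counts can be obtained with proper date parsing instead of `substr` (a sketch; the `%m/%d/%Y` format matches how `arrestDate` is parsed elsewhere in this document):
```{r KimRain_alt}
# arrests per year via as.Date/format
table(format(as.Date(arrest_tab$arrestDate, "%m/%d/%Y"), "%Y"))
```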
#### Rentao Wu
What question are you asking?:
I wanted to know if the ratio of female to male crime rates is similar across the different races.
What is the code you use to answer it?:
```{r RentaoWu}
mytab = table(race=arrest_tab$race, sex=arrest_tab$sex)
mydf = as.data.frame.matrix(mytab)
# ratio of female to male arrests per race, rounded
mydf$ratio = round(mydf$F/mydf$M, 3)
# drop the blank race category
mydf = mydf[-1,]
mydf
mydf$race <- rownames(mydf)
ggplot(data=mydf, aes(x=race, y=ratio, fill=race)) + geom_bar(stat="identity")
```
What are your observations?
I found that for most races, the ratio of female to male arrests is about 0.2; that is, there is about 1 female arrest for every 5 male arrests. I also saw that the female to male ratio for the white population is about 0.43, which is much higher than the others.
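As a rough follow-up (a sketch; the expected counts for the rarer race codes are small, so read the result loosely), a chi-squared test of whether the female/male split differs by race:
```{r RentaoWu_test}
# drop the blank race category and test the F/M breakdown across races
chisq.test(mytab[rownames(mytab) != "", c("F", "M")])
```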
#### Krishna Pai
What question are you asking?: Do police officers go out of their way to arrest more black people than white people?
What is the code you use to answer it?:
```{r kpai}
library(ggplot2)
# added by HCB to not dirty global environment
kpai=function()
{
arrest_tab=read.csv("BPD_Arrests.csv", stringsAsFactors=FALSE)
tmp=arrest_tab$sex
arrest_tab$sex=arrest_tab$race
arrest_tab$race=tmp
police_tab=read.csv("Police_Stations.csv", stringsAsFactors=FALSE)
tmp=gsub("\\)","",gsub("\\(","",arrest_tab$Location))
tmp=strsplit(tmp,split=",")
arrest_tab$lon=as.numeric(sapply(tmp,function(x) x[2]))
arrest_tab$lat=as.numeric(sapply(tmp,function(x) x[1]))
tmp=gsub("\\)","",gsub("\\(","",police_tab$Location))
tmp=strsplit(tmp,split=",")
police_tab$lon=as.numeric(sapply(tmp,function(x) x[2]))
police_tab$lat=as.numeric(sapply(tmp,function(x) x[1]))
plt=ggplot()
plt=plt+geom_point(data=arrest_tab[arrest_tab$race=='B',],aes(x=lon,y=lat),color="black",alpha=.1)
plt=plt+geom_point(data=arrest_tab[arrest_tab$race=='W',],aes(x=lon,y=lat),color="white",alpha=.1)
plt=plt+geom_point(data=police_tab,aes(x=lon,y=lat),color="red")
print(plt)
}
kpai()
```
What did you observe? I was surprised to find that most of the arrests made some distance away from the cluster of police stations were for white people. It would be interesting to investigate what kinds of crimes might have been committed so far away from the stations, and why white people stand out as being arrested at that distance, especially in the east.
#### Joe Downs, Michael Kleyman, Anumeet Nepaul
What question are you asking?:
We were curious about how temperature relates to crime. Does crime count
correlate with temperature? We used average temperature values for Baltimore
from wunderground and normalized our counts for each temperature by the number
of days that experienced that temperature. It seems that there should be more
crimes in warmer weather because both criminals and victims should be more
likely to spend time outside when it is warmer.
What is the code you use to answer it?:
```{r jdowns_mkleyman_anepaul}
# load weather data from csv file
baltimore_weather <- read.csv("baltimore_weather.csv", stringsAsFactors=FALSE)
# keep an abbreviated weather table (date, mean temperature) and format the date
tempAbr <- baltimore_weather[,c("EST","Mean.TemperatureF")]
tempAbr$EST <- as.Date(baltimore_weather$EST,"%Y-%m-%d")
# copy arrest data and add formatting for date
Date_Arrests <- arrest_tab
Date_Arrests$arrestDate <- as.Date(Date_Arrests$arrestDate,"%m/%d/%Y")
# join weather and crime tables to match days together
tempMerge<-merge(x=Date_Arrests,y=tempAbr, by.x ="arrestDate", by.y="EST" )
tableCrime <- table(tempMerge$Mean.TemperatureF)
tableTemp <- table(baltimore_weather$Mean.TemperatureF)
# plot crime count versus temperature, normalized by day count for each temperature
plot(tableCrime/tableTemp, xlab="Temperature (F)", ylab="Normalized Crime Count")
divTable = tableCrime/tableTemp
divtmatrix = as.data.frame(divTable)
# cast factors as numeric values for correlation
divtmatrix[,1] <- as.numeric(as.character(divtmatrix[,1]))
divtmatrix[,2] <- as.numeric(divtmatrix[,2])
cor(divtmatrix[,1],divtmatrix[,2])
```
What did you observe?:
While we suspected that crime count would correlate with temperature, we found that the correlation coefficient was only 0.3673, suggesting only a weak linear relationship between crime and temperature. There also seems to be no clear non-linear dependence of crime on temperature in our data. Compared to [this analysis of Chicago](http://crime.static-eric.com/), temperature appears to relate to crime in Baltimore to a much smaller degree.
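To eyeball the claim about non-linear dependence, a quick smoother over the normalized counts (a sketch):
```{r jdowns_smooth}
# lowess smoother over normalized crime count vs. temperature
ok <- complete.cases(divtmatrix)
plot(divtmatrix[ok, 1], divtmatrix[ok, 2], xlab="Temperature (F)", ylab="Normalized Crime Count")
lines(lowess(divtmatrix[ok, 1], divtmatrix[ok, 2]), col="red", lwd=2)
```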