Data
Traffic Violations Dataset
We first found the data on Kaggle.com and later located the source data on the dataMontgomery website. The source is updated frequently; our copy is from November 3rd, 2023. The data was originally collected by the Montgomery County Police Station in Maryland for the purpose of observing drug and traffic violations and creating a government and police record of violations. It was compiled from all electronic traffic violations issued in Montgomery County.
The data file was originally a CSV file. The original dataset had 43 variables, but because it was large, we kept only the 14 most relevant variables in our cleaned dataset: Date of stop, Time of stop, Description, Location, Accident, Fatal, Alcohol, Driver’s License State, Year, Make, Model, Violation type, Race, and Arrest type. Description shows the specific charge, such as exceeding the posted speed limit. Location indicates the specific street in Montgomery County where the violation occurred, while Driver’s License State shows the state where the driver is from, which is not necessarily Maryland. Make and Model refer to the car that was being driven. Fatal and Alcohol take only the values Yes or No, while Violation Type and Arrest Type provide more specific descriptions of the violation that occurred and its outcome, respectively.
For preliminary cleaning, our cleaning script removed columns that we believed would not be helpful or relevant to our data analysis, such as the latitude and longitude coordinates. No additional R packages were required for this step, and no other datasets were merged with the data for the exploratory data analysis. We later merged the crash reporting dataset summarized below with this dataset for our further analysis and statistical modeling.
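The column selection described above can be sketched as follows. This is only an illustration in Python using the standard library, not our actual R cleaning script. The header names in `KEEP`, the helper `select_columns`, and the one-row sample are all hypothetical and stand in for the real CSV.

```python
import csv
import io

# Hypothetical header names; the real CSV headers may be spelled differently.
KEEP = ["Date Of Stop", "Time Of Stop", "Description", "Location", "Fatal", "Alcohol"]


def select_columns(rows, keep):
    """Yield one dict per row containing only the columns named in `keep`."""
    for row in rows:
        yield {name: row[name] for name in keep}


# Made-up one-row sample standing in for the downloaded CSV file.
sample = io.StringIO(
    "Date Of Stop,Time Of Stop,Description,Location,Latitude,Longitude,Fatal,Alcohol\n"
    "11/03/2023,08:15:00,EXCEEDING THE POSTED SPEED LIMIT,EXAMPLE RD,39.05,-77.05,No,No\n"
)

# Columns not listed in KEEP (here Latitude and Longitude) are dropped.
cleaned = list(select_columns(csv.DictReader(sample), KEEP))
```

Listing the columns to keep, rather than the columns to drop, makes the cleaned dataset's variables explicit in one place, which is convenient when only a small share of the original variables is retained.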
Crash Reporting Dataset
The second dataset we chose for our analysis was found on the same website as our original dataset, dataMontgomery. This dataset also covers Montgomery County, Maryland, making it a relevant addition to our preliminary dataset and giving a more comprehensive perspective on accidents in this county and the patterns underlying them. The data was collected through the Automated Crash Reporting System (ACRS) of the Maryland State Police. It was then reported by the Montgomery County Police, Gaithersburg Police, Rockville Police, or the Maryland-National Capital Park Police. The reports were created for the purpose of addressing public safety. For our data analysis, we use only the data reported by the Montgomery County Police, to fit with our traffic violations data.
The crash reporting incidents data provides information on reported collisions in Montgomery County and their relevant details. This includes crashes that occurred on both county and local roadways in the county. It has approximately 96,801 rows of data and 43 columns, originally in the form of a CSV. The columns hold details about each reported collision, including report number, local case number, date and time, collision type, and weather. The other major columns are the surface condition of the road, latitude and longitude, and whether the crash occurred at a junction or intersection. For our preliminary data cleaning, we began by condensing the dataset, removing columns we did not believe to be necessary for our analysis. These included ACRS Report Type, since we already knew we were focusing on collisions, and Agency Name, since it was Montgomery County Police for every data point. This dataset was then joined with the traffic violations dataset using longitude and latitude.
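A join on coordinates like the one described above can be sketched as follows. This is a minimal Python illustration rather than our actual R code. It assumes both tables still carry `Latitude` and `Longitude` columns at the time of the join. The rounding to four decimals, the `crash_` prefix, the helper names, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict


def coord_key(record, digits=4):
    """Build a join key from a record's coordinates.

    Rounding is an assumption of this sketch; the text does not say how
    coordinates were matched.
    """
    return (round(float(record["Latitude"]), digits),
            round(float(record["Longitude"]), digits))


def join_on_coords(violations, crashes, digits=4):
    """Inner join: pair each violation with every crash at the same key."""
    crashes_by_coord = defaultdict(list)
    for crash in crashes:
        crashes_by_coord[coord_key(crash, digits)].append(crash)

    joined = []
    for violation in violations:
        for crash in crashes_by_coord.get(coord_key(violation, digits), []):
            merged = dict(violation)
            # Prefix crash columns (hypothetical naming) to avoid name clashes.
            merged.update({f"crash_{name}": value
                           for name, value in crash.items()
                           if name not in ("Latitude", "Longitude")})
            joined.append(merged)
    return joined


# Made-up records standing in for rows of the two cleaned datasets.
violations = [
    {"Description": "EXCEEDING THE POSTED SPEED LIMIT",
     "Latitude": "39.0500", "Longitude": "-77.0500"},
    {"Description": "EXAMPLE CHARGE",
     "Latitude": "39.1000", "Longitude": "-77.2000"},
]
crashes = [
    {"Collision Type": "EXAMPLE TYPE", "Weather": "CLEAR",
     "Latitude": "39.05", "Longitude": "-77.05"},
]

joined = join_on_coords(violations, crashes)
```

Because this is an inner join, violations with no crash at the same coordinates are dropped from the result, so the joined table can be much smaller than either input.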