Cover image of road-accidents-fr

Motivation

Every year, approximately 1.3 million people die as a result of road accidents (source: WHO).

The aim of this data science project is to investigate which factors influence the occurrence and severity of accidents. The knowledge gained can support decision-makers and thereby help prevent fatal accidents.

The input data

The raw data is provided by www.data.gouv.fr. It contains information on every traffic accident involving personal injury since 2005.

One limitation of the dataset is that not all of the collected data is released (e.g., the alcohol levels of the people involved). In addition, one important variable is unavailable by its nature: the speed of the vehicles prior to the accident.

Status of the project

The current state of the project is captured in four Jupyter notebooks:

  • In notebook 1, I import and clean the data.
  • In notebook 2, I visualize some correlations between the variables and the geographic distribution of the accidents.
  • In notebook 3, I train different ensemble learning models (XGBoost, Random Forest) to predict the severity of traffic accidents. The models are then interpreted to find out which variables are particularly significant for the severity of accidents.
  • In notebook 4, I train artificial neural networks with TensorFlow/Keras and Coral-Ordinal.
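
Accident severity is an ordered target, which is why an ordinal method such as Coral-Ordinal is a natural fit for notebook 4. As a rough illustration of the label encoding behind the CORAL approach, the following plain-Python sketch turns a severity class into a set of binary "greater than level k" targets and back. It is neither the notebook's code nor the library's API: the function names `encode_ordinal` and `decode_ordinal` and the four severity labels are assumptions made up for this example.

```python
# Illustrative severity scale, ordered from least to most severe.
# These labels are an assumption for the example, not taken from the dataset.
SEVERITY_LEVELS = ["unharmed", "slightly injured", "hospitalized", "killed"]


def encode_ordinal(level: int, num_classes: int) -> list[int]:
    """Turn the class index `level` into num_classes - 1 binary targets.

    Target k answers the question "is the severity greater than level k?".
    """
    if not 0 <= level < num_classes:
        raise ValueError(f"level must be in [0, {num_classes - 1}], got {level}")
    return [1 if level > k else 0 for k in range(num_classes - 1)]


def decode_ordinal(probabilities: list[float], threshold: float = 0.5) -> int:
    """Recover a class index by counting the binary outputs above `threshold`."""
    return sum(1 for p in probabilities if p > threshold)


def describe(level: int) -> str:
    """Return a readable line showing a severity label and its encoding."""
    encoded = encode_ordinal(level, len(SEVERITY_LEVELS))
    return f"{SEVERITY_LEVELS[level]}: {encoded}"
```

Calling `describe(2)` shows that the third level is encoded as two ones followed by a zero. A network trained on such targets produces one probability per threshold, and `decode_ordinal` maps those probabilities back to a single severity class.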

In addition, I created a Streamlit app for the geographic plots, which can be found here:

https://langhammer-road-accidents-fr-streamlit-app-xnqwbs.streamlit.app/

The notebooks are tested for functionality with a test run on every push to GitHub.
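
How this test run is implemented is not described here. One possible way to check notebooks for functionality is to execute each one from top to bottom with `jupyter nbconvert --execute` and collect the ones that fail. The sketch below assumes that approach. The helper names `build_command` and `run_all`, and the idea of passing a notebook directory, are hypothetical and not taken from the repository.

```python
import subprocess
from pathlib import Path


def build_command(notebook: Path, output_dir: Path) -> list[str]:
    """Build the nbconvert call that executes one notebook end to end."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",      # write the result as a notebook again
        "--execute",             # run all cells before converting
        "--output-dir", str(output_dir),
        str(notebook),
    ]


def run_all(notebook_dir: Path, output_dir: Path) -> list[Path]:
    """Execute every *.ipynb file in `notebook_dir`; return the failing ones."""
    failed = []
    for notebook in sorted(notebook_dir.glob("*.ipynb")):
        result = subprocess.run(build_command(notebook, output_dir), check=False)
        if result.returncode != 0:
            failed.append(notebook)
    return failed
```

A continuous integration job could call `run_all` after installing the project's dependencies and mark the run as failed whenever the returned list is not empty.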

Agenda

  • Time series analysis starting in 2005
  • Identify the variables that influence the number of accidents in a given city / department

Kay Langhammer

Data Scientist

Kay Langhammer 2023