Modeling Spatiotemporal Pedestrian-Environment Interactions for Predicting Pedestrian Crossing Intention from the Ego-View
For pedestrians and autonomous vehicles (AVs) to co-exist harmoniously and safely in the real-world, AVs will need to not only react to pedestrian actions, but also anticipate their intentions. In this thesis, we propose to use rich visual and pedestrian-environment interaction features to improve pedestrian crossing intention prediction from the ego-view. We do so by combining visual feature extraction, graph modeling of scene objects and their relationships, and feature encoding as comprehensive inputs for an LSTM encoder-decoder network.
Pedestrians react and make decisions based on their surrounding environment, and the behaviors of other road users around them. The human-human social relationship has already been explored for pedestrian trajectory prediction from the bird’s eye view in stationary cameras. However, context and pedestrian-environment relationships are often missing in current research into pedestrian trajectory, and intention prediction from the ego-view. To map the pedestrian’s relationship to its surrounding objects we use a star graph with the pedestrian in the center connected to all other road objects/agents in the scene. The pedestrian and road objects/agents are represented in the graph through visual features extracted using state of the art deep learning algorithms. We use graph convolutional networks, and graph autoencoders to encode the star graphs in a lower dimension. Using the graph en- codings, pedestrian bounding boxes, and human pose estimation, we propose a novel model that predicts pedestrian crossing intention using not only the pedestrian’s action behaviors (bounding box and pose estimation), but also their relationship to their environment.
Through tuning hyperparameters, and experimenting with different graph convolutions for our graph autoencoder, we are able to improve on the state of the art results. Our context- driven method is able to outperform current state of the art results on benchmark dataset Pedestrian Intention Estimation (PIE). The state of the art is able to predict pedestrian crossing intention with a balanced accuracy (to account for dataset imbalance) score of 0.61, while our best performing model has a balanced accuracy score of 0.79. Our model especially outperforms in no crossing intention scenarios with an F1 score of 0.56 compared to the state of the art’s score of 0.36. Additionally, we also experiment with training the state of the art model and our model to predict pedestrian crossing action, and intention jointly. While jointly predicting crossing action does not help improve crossing intention prediction, it is an important distinction to make between predicting crossing action versus intention.