Purdue University Graduate School
posted on 2020-06-16, 17:53 authored by Jiannan Cai

The motivation for this research stems from the promise of coupling multi-sensory systems with advanced data analytics to enhance holistic situational awareness and thus prevent fatal accidents in the construction industry. Construction is one of the most dangerous industries in the U.S. and worldwide. The Occupational Safety and Health Administration (OSHA) reports that the construction sector employs only 5% of the U.S. workforce but accounted for 21.1% (1,008 deaths) of all worker fatalities in 2018. Struck-by accidents are one of the leading causes; they alone led to 804 fatalities between 2011 and 2015. A critical contributing factor to struck-by accidents is the lack of holistic situational awareness, which arises from the complex and dynamic nature of the construction environment. In the context of construction site safety, situational awareness consists of three progressive levels: perception, i.e., perceiving the status of construction entities on the jobsite; comprehension, i.e., understanding the ongoing construction activities and the interactions among entities; and projection, i.e., predicting the future status of entities on the dynamic jobsite. In this dissertation, holistic situational awareness refers to achievement at all three levels. It is critical because, in its absence, construction workers may fail to recognize potential hazards or to anticipate their severe consequences; either failure puts workers in great danger and may result in accidents. While existing studies have been at least partially successful in improving the perception of real-time states on construction sites, such as the locations and movements of jobsite entities, they overlook the capability to understand the jobsite context and to predict entity behavior (i.e., movement), both of which are needed for holistic situational awareness.
This represents a missed opportunity to prevent construction accidents and save hundreds of lives every year. There is therefore a critical need to develop holistic situational awareness of complex and dynamic construction sites by accurately perceiving the states of individual entities, understanding jobsite contexts, and predicting entity movements.

The overarching goal of this research is to minimize the risk of struck-by accidents on construction jobsites by enhancing holistic situational awareness of the unstructured and dynamic construction environment through a novel data-driven approach. Toward that end, three fundamental knowledge gaps have been identified, and each is addressed by a specific objective of this research.

The first knowledge gap is the lack of methods for fusing heterogeneous data from multimodal sensors to accurately perceive the dynamic states of construction entities. The congested and dynamic nature of construction sites poses great challenges, such as signal interference and line-of-sight occlusion, to any single sensor modality, which is bounded by its own limitations in perceiving site dynamics. The research hypothesis is that combining data from multimodal sensors that complement one another improves the accuracy of perceiving the dynamic states of construction entities. This research proposes a hybrid framework that leverages vision-based localization and radio-based identification for robust 3D tracking of multiple construction workers. It treats vision-based tracking as the main source of object trajectories and radio-based tracking as a supplementary source of reliable identity information. Fusing visual and radio data increased the overall accuracy of 3D multi-worker tracking from 88% to 95% in the first experiment and from 87% to 90% in the second, and it was more robust than the vision-based approach alone because it could recover the same entity ID after track fragmentation.
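One way to picture the fusion idea (vision as the trajectory source, radio as the identity source) is to label each vision tracklet with the radio tag whose positions lie closest to it over their shared timestamps; tracklets that receive the same tag are then treated as one worker, which is how an ID can be recovered after fragmentation. The Python sketch below is a hypothetical illustration only. It assumes both sensors report positions in a common site coordinate frame on shared integer timestamps, and the functions `assign_identities` and `merge_by_identity`, the nearest-mean-distance matching rule, and the `max_gap` tolerance are all assumptions, not the dissertation's actual association method.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical layout (assumption): each track maps an integer timestamp to an
# (x, y, z) position in a shared site coordinate frame.
Track = Dict[int, Tuple[float, ...]]


def mean_distance(a: Track, b: Track) -> float:
    """Mean Euclidean distance between two tracks over their shared timestamps."""
    shared = a.keys() & b.keys()
    if not shared:
        return math.inf
    return sum(math.dist(a[t], b[t]) for t in shared) / len(shared)


def assign_identities(vision: Dict[str, Track], radio: Dict[str, Track],
                      max_gap: float = 2.0) -> Dict[str, Optional[str]]:
    """Label each vision tracklet with the closest radio tag, or None if every
    tag is farther than max_gap (hypothetical tolerance, in meters)."""
    labels: Dict[str, Optional[str]] = {}
    for vid, vtrack in vision.items():
        best_tag, best_dist = None, math.inf
        for tag, rtrack in radio.items():
            d = mean_distance(vtrack, rtrack)
            if d < best_dist:
                best_tag, best_dist = tag, d
        labels[vid] = best_tag if best_dist <= max_gap else None
    return labels


def merge_by_identity(labels: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group tracklets that share a radio tag, re-linking fragments of one worker."""
    merged: Dict[str, List[str]] = defaultdict(list)
    for vid, tag in labels.items():
        if tag is not None:
            merged[tag].append(vid)
    return {tag: sorted(vids) for tag, vids in merged.items()}


vision = {
    "v1": {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0)},    # worker A before occlusion
    "v2": {3: (3.0, 0.0, 0.0), 4: (4.0, 0.0, 0.0)},    # worker A under a new visual ID
    "v3": {0: (10.0, 5.0, 0.0), 1: (10.0, 6.0, 0.0)},  # worker B
}
radio = {
    "tagA": {t: (t + 0.3, 0.2, 0.0) for t in range(5)},
    "tagB": {t: (10.2, 5.0 + t, 0.0) for t in range(5)},
}
labels = assign_identities(vision, radio)
print(labels)                     # {'v1': 'tagA', 'v2': 'tagA', 'v3': 'tagB'}
print(merge_by_identity(labels))  # {'tagA': ['v1', 'v2'], 'tagB': ['v3']}
```

Here the occlusion split worker A into two visual tracklets (`v1`, `v2`), and the shared radio tag re-links them under one identity. A real system would likely use a globally optimal assignment and account for the coarser, noisier radio positions; the greedy per-tracklet rule is kept only for brevity.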

The second knowledge gap is the missing link between entity interaction patterns and the diverse activities on the jobsite. Because multiple construction workers and pieces of equipment coexist and interact on the jobsite to conduct various activities, it is extremely difficult to automatically recognize ongoing activities using pre-defined rules that consider only the spatial relationships between entities, as most existing studies have done. The research hypothesis is that incorporating additional features, such as attentional cues, better represents entity interactions, and that advanced deep learning techniques can automatically learn the complex interaction patterns underlying diverse activities. This research proposes a two-step long short-term memory (LSTM) approach that integrates positional and attentional cues to identify working groups and recognize the corresponding group activities. A series of positional and attentional cues is modeled to represent the interactions among entities, and the LSTM network is designed to (1) classify whether two entities belong to the same group and (2) recognize the activities they are involved in. Leveraging both positional and attentional cues increased the accuracy from 85% to 95% compared with using positional cues alone. Moreover, dividing group activity recognition into a two-step cascading process improved the precision and recall of specific activities by about 3% to 12% compared with a one-step activity recognition.
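The data flow of the two-step cascade can be illustrated with a simplified Python sketch. Step 1 scores every pair of entities using one positional cue (distance) and one attentional cue (whether either entity is facing the other), links the pairs judged to be in the same group, and forms working groups as connected components; step 2 then assigns one activity label per group. The thresholds in the hypothetical `same_group` function and the lookup rule in the hypothetical `recognize_activity` function stand in for the two trained LSTM networks, the activity labels are invented for illustration, and the cues are taken from a single frame rather than a time sequence. The sketch therefore shows only the structure of the cascade, not the learned models.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical single-frame state: (x, y, heading in radians, entity type).
Entity = Tuple[float, float, float, str]


def facing(a: Entity, b: Entity, fov: float = math.radians(60)) -> bool:
    """Attentional cue: True if b lies within a's field of view of width fov."""
    bearing = math.atan2(b[1] - a[1], b[0] - a[0])
    diff = (bearing - a[2] + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return abs(diff) <= fov / 2


def same_group(a: Entity, b: Entity, max_dist: float = 3.0) -> bool:
    """Step 1 stand-in for the pairwise LSTM classifier (hypothetical rule):
    positional cue (distance) plus attentional cue (either one faces the other)."""
    close = math.dist(a[:2], b[:2]) <= max_dist
    return close and (facing(a, b) or facing(b, a))


def find_groups(entities: Dict[str, Entity]) -> List[List[str]]:
    """Merge same-group pairs into working groups (connected components)."""
    parent = {e: e for e in entities}

    def root(e: str) -> str:
        while parent[e] != e:
            e = parent[e]
        return e

    for u, v in combinations(sorted(entities), 2):
        if same_group(entities[u], entities[v]):
            parent[root(u)] = root(v)
    groups: Dict[str, List[str]] = {}
    for e in sorted(entities):
        groups.setdefault(root(e), []).append(e)
    return sorted(groups.values())


def recognize_activity(members: List[str], entities: Dict[str, Entity]) -> str:
    """Step 2 stand-in for the activity LSTM: hypothetical labels and rule."""
    types = {entities[m][3] for m in members}
    if len(members) == 1:
        return "individual work"
    return "equipment-assisted work" if "excavator" in types else "crew work"


entities = {
    "w1": (0.0, 0.0, 0.0, "worker"),          # faces +x toward w2
    "w2": (2.0, 0.0, math.pi, "worker"),      # faces -x toward w1
    "w3": (1.0, 2.0, math.pi / 2, "worker"),  # nearby but looking away
    "e1": (10.0, 0.0, math.pi, "excavator"),
    "w4": (12.0, 0.0, math.pi, "worker"),     # faces the excavator
}
for group in find_groups(entities):
    print(group, recognize_activity(group, entities))
```

In this example `w3` stands within 3 m of both `w1` and `w2` but is not grouped with them because neither attentional check fires, a crude rule-based analogue of why attentional cues can add information beyond position alone.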

The third knowledge gap is the undetermined role of jobsite context in entity movements. Worker behavior on a construction site is goal-based and purposeful, motivated and influenced by the jobsite context, including the activities workers are involved in and the status of other entities. Construction workers constantly adjust their movements in the unstructured and dynamic workspace, making it challenging to reliably predict a worker's trajectory from previous movement patterns alone. The research hypothesis is that combining the movement patterns of the target entity with the jobsite context predicts the entity's trajectory more accurately. This research proposes a context-augmented LSTM method that incorporates both individual movement and workplace contextual information for better trajectory prediction. Contextual information on the movements of neighboring entities, working group membership, and potential destinations is concatenated with the movements of the target entity and fed into an LSTM network with an encoder-decoder architecture to predict the trajectory over multiple time steps. Integrating contextual information with target movement information was found to yield a smaller final displacement error than considering previous movement alone, especially when the prediction horizon is longer than the observation window. Insights are also provided on selecting appropriate methods.
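To make the input construction and the evaluation metric more concrete, the Python sketch below assembles one feature row per observed time step by concatenating the target's displacement with three hypothetical context features (mean neighbor displacement, a same-group flag, and the offset to an assumed destination), then computes the final displacement error (FDE) named above, together with the commonly paired average displacement error (ADE). The feature choices, their layout, and the function names are illustrative assumptions; the encoder-decoder LSTM itself is not reproduced, only the sequences it would consume and the metric used to compare its predictions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def displacements(traj: Sequence[Point]) -> List[Point]:
    """Per-step (dx, dy) movement along a trajectory."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(traj, traj[1:])]


def context_features(target: Sequence[Point],
                     neighbors: Sequence[Sequence[Point]],
                     same_group: bool,
                     destination: Point) -> List[List[float]]:
    """Build one encoder input row per observed step by concatenating the
    target's movement with hypothetical context features. Neighbor
    trajectories are assumed to share the target's timestamps."""
    target_d = displacements(target)
    neigh_d = [displacements(n) for n in neighbors]
    rows = []
    for t, (dx, dy) in enumerate(target_d):
        # Context 1: mean movement of neighboring entities at this step.
        if neigh_d:
            ndx = sum(n[t][0] for n in neigh_d) / len(neigh_d)
            ndy = sum(n[t][1] for n in neigh_d) / len(neigh_d)
        else:
            ndx = ndy = 0.0
        # Contexts 2 and 3: group flag and offset to a potential destination.
        x, y = target[t + 1]
        rows.append([dx, dy, ndx, ndy, float(same_group),
                     destination[0] - x, destination[1] - y])
    return rows


def ade_fde(pred: Sequence[Point], truth: Sequence[Point]) -> Tuple[float, float]:
    """Average and final displacement error between aligned trajectories."""
    errors = [math.dist(p, g) for p, g in zip(pred, truth)]
    return sum(errors) / len(errors), errors[-1]


target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
neighbors = [[(0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]]
for row in context_features(target, neighbors, True, (5.0, 0.0)):
    print(row)
# [1.0, 0.0, 1.0, 0.0, 1.0, 4.0, 0.0]
# [1.0, 0.0, 1.0, 1.0, 1.0, 3.0, 0.0]
print(ade_fde([(3.0, 0.0), (4.0, 1.0)], [(3.0, 0.0), (4.0, 0.0)]))  # (0.5, 1.0)
```

Because FDE only looks at the last predicted point, it is the metric most sensitive to drift over long prediction horizons, which is consistent with the abstract reporting the largest benefit of context when the prediction length exceeds the observation length.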

The results and findings of this dissertation will automatically augment the holistic situational awareness of site entities, enabling them to better understand the ongoing jobsite context and to predict future states more accurately, which in turn allows proactive detection of potential collisions.


Degree Type

  • Doctor of Philosophy


  • Civil Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Hubo Cai

Additional Committee Member 2

Dulcy Abraham

Additional Committee Member 3

Mary Comer

Additional Committee Member 4

Phillip Dunston

Additional Committee Member 5

Ayman Habib
