Software Systems for Large-Scale Retrospective Video Analytics
Pervasive cameras are generating videos at an unprecedented pace, making videos the new frontier of big data. As processors such as CPUs and GPUs become increasingly powerful, cloud and edge nodes can generate useful insights from colossal video data. However, while research in computer vision (CV) is advancing rapidly, the systems side has remained a blind spot in CV research. With colossal video data generated by cameras every day and limited compute resource budgets, how can we design software systems that generate insights from video data efficiently?
Designing cost-efficient video analytics software systems is challenging because of the expensive computation of vision operators, the colossal data volume, and the scarce wireless bandwidth of surveillance cameras. To address these challenges, this thesis proposes three software systems. The first system, VStore, is a data store that supports fast, resource-efficient analytics over large archival videos. VStore manages video ingestion, storage, retrieval, and consumption, and controls video formats through backward derivation of configuration: in the opposite direction along the video data path, VStore passes the video quantity and quality expected by analytics backward to retrieval, to storage, and to ingestion. VStore derives an optimal set of video formats, optimizes for different resources in a progressive manner, and runs queries at up to 362x video realtime. The second system, DIVA, is a camera/cloud runtime that supports querying cold videos distributed on low-cost wireless cameras. DIVA is built upon a novel zero-streaming paradigm: to save wireless bandwidth, a camera builds sparse yet accurate landmark frames at capture time without uploading any video data; when executing a query, a camera processes frames in multiple passes with increasingly expensive operators. On diverse queries over 15 videos, DIVA runs at more than 100x realtime and substantially outperforms competitive alternatives. The third system, Clique, is a practical object re-identification (ReID) engine that builds upon two unconventional techniques. First, Clique assesses target occurrences by clustering unreliable object features extracted by ReID algorithms, with each cluster representing the general impression of a distinct object to be matched against the input. Second, to search across camera videos, Clique samples cameras to maximize the spatiotemporal coverage and incrementally adds cameras for processing on demand.
In an evaluation on 25 hours of traffic videos from 25 cameras, Clique reaches a recall at 5 of 0.87 across 70 queries and runs at 830x video realtime while achieving this accuracy.
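To make the first Clique technique more concrete, the following is a minimal Python sketch of clustering noisy object features and matching each cluster against a query feature. It is an illustration under stated assumptions, not the thesis's implementation: the greedy threshold clustering, the cosine similarity measure, the threshold value, and the function names cluster_features and rank_clusters are all hypothetical choices made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_features(features, threshold=0.9):
    """Greedy clustering (an assumed stand-in for the real algorithm).

    Each feature joins the existing cluster whose centroid is most
    similar, provided the similarity reaches `threshold`; otherwise
    it starts a new cluster.
    """
    clusters = []
    for feature in features:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(feature, centroid(cluster))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([feature])
        else:
            best.append(feature)
    return clusters


def rank_clusters(query, clusters):
    """Return (similarity, cluster_index) pairs, most similar first.

    The centroid plays the role of the "general impression" of one
    distinct object, which is compared against the query feature.
    """
    scored = [(cosine(query, centroid(c)), i) for i, c in enumerate(clusters)]
    return sorted(scored, reverse=True)


# Toy 2-D features: two noisy views each of two distinct objects.
features = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.98]]
clusters = cluster_features(features)
ranking = rank_clusters([1.0, 0.1], clusters)
```

Matching the query against cluster centroids rather than individual features is what lets a single unreliable feature be outvoted by the other observations of the same object; a real ReID engine would use high-dimensional learned features and a more careful clustering method.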