Developing unsupervised vision systems in Dynamic Environments is one of the next
challenges in Computer Vision. In Dynamic Environments, we usually lack complete
domain knowledge of the operating environment before deployment, and computation is
limited both by the need for prompt reaction and by on-board computational capacity. This
thesis studies a series of key Computer Vision problems in Dynamic Environments.
First, we propose a stream clustering algorithm and a number of variants for unsupervised feature learning and object discovery. These possess several crucial characteristics
required by applications in Dynamic Environments, e.g. fully progressive operation, support for arbitrary similarity measures, and object matching while the feature space grows. We give strong
provable statistical guarantees on the clustering accuracy. Based on these approaches, we tackle the problem of discovering aerial objects on-the-fly, where we assume
all of the objects are unknown at the beginning of the deployment. The vision system is
required to discover everything from low-level features to salient objects on-the-fly, without any
supervision. We propose a number of approaches for object proposal, tracking, recognition, and localization that achieve real-time performance. Extensive experiments
on widely used aerial video datasets show that the approaches efficiently and accurately
discover salient ground objects.
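As an illustration only, the progressive, similarity-agnostic behaviour described above can be sketched with a generic single-pass clustering routine. This is not the algorithm proposed in the thesis and carries none of its guarantees; the class name `StreamClusterer`, the `threshold` parameter, and the choice of cosine similarity are assumptions made for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors (one possible measure)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

class StreamClusterer:
    """Generic single-pass clustering: each sample is seen once, then discarded."""

    def __init__(self, similarity=cosine_similarity, threshold=0.9):
        self.similarity = similarity   # any callable (a, b) -> float
        self.threshold = threshold     # hypothetical novelty threshold
        self.centers = []              # running mean of each cluster
        self.counts = []               # number of samples seen per cluster

    def update(self, x):
        """Assign x to a cluster (creating one if needed) and return its index."""
        x = np.asarray(x, dtype=float)
        if self.centers:
            sims = [self.similarity(x, c) for c in self.centers]
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                # Incremental mean update; no past samples are stored.
                self.counts[best] += 1
                self.centers[best] += (x - self.centers[best]) / self.counts[best]
                return best
        # No sufficiently similar cluster exists: start a new one.
        self.centers.append(x.copy())
        self.counts.append(1)
        return len(self.centers) - 1

# Usage: feed samples one at a time, as they arrive.
sc = StreamClusterer(threshold=0.9)
labels = [sc.update(v) for v in ([1, 0], [0.99, 0.05], [0, 1], [0.02, 1.0])]
```

Because the similarity function is passed in as a callable, any other measure can be substituted without changing the update loop.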
To explore complex and deep architectures in Dynamic Environments, we propose Unsupervised Deep Encoding, which unifies traditional Visual Encoding and Convolutional
Neural Networks. We found strong relationships between single-layer Neural Networks
and Clustering, and thus performed unsupervised feature learning at each layer on the feature maps of the previous layer. We replaced the dot product inside each neuron with
a similarity measure, which is also used in unsupervised feature learning. The weight
vectors of our network are initialized with cluster centers. Therefore, each feature map is
a visual encoding of the previous feature map. We applied this mechanism to pre-training
Convolutional Neural Networks for image classification. Extensive experiments show that this pre-training gives the network more reliable learning dynamics (e.g. fast
convergence without Batch Normalization) and better classification accuracy.
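The idea of a layer whose weight vectors are cluster centers and whose neurons compute a similarity instead of a dot product can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: plain batch k-means stands in for the clustering step, a Gaussian similarity on Euclidean distance stands in for the similarity measure, and the helper names `kmeans` and `similarity_layer`, the parameter `sigma`, and the synthetic patches are all hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain batch k-means; returns k cluster centers (stand-in clustering step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):           # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def similarity_layer(X, W, sigma=1.0):
    """One 'neuron' per row of W: Gaussian similarity replaces the dot product."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Synthetic stand-in for flattened 3x3x3 image patches.
rng = np.random.default_rng(0)
patches = rng.normal(size=(500, 27))

W1 = kmeans(patches, k=8)                              # weights = cluster centers
features = similarity_layer(patches, W1, sigma=5.0)    # encoding, shape (500, 8)
```

Each column of `features` measures how close an input is to one cluster center, so the output can be read as a soft visual encoding of the input. Stacking would repeat the same two steps on patches taken from `features`.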