Lightweight and Sufficient Two-Viewpoint Connections for Augmented Reality
Augmented Reality (AR) is a powerful computer-to-human visual interface that displays data overlaid onto the user's view of the real world. Compared to conventional visualization on a computer display, AR has the advantage of saving the user the cognitive effort of mapping the visualization to the real world. For example, a user wearing AR glasses can find a destination in an urban setting by following a virtual green line drawn by the AR system on the sidewalk, which is easier than relying on navigational directions displayed on a phone. Similarly, a surgeon looking at an operating field through an AR display can see graphical annotations authored by a remote mentor as if the mentor had actually drawn on the patient's body.
However, several challenges remain to be addressed before AR can reach its full potential. This research contributes solutions to four such challenges. A first challenge is achieving visualization continuity for AR displays. Since truly transparent displays are not feasible, AR relies on simulating transparency by showing a live video on a conventional display. For correct transparency, the display should show exactly what the user would see if the display were not there. Since the video is not captured from the user viewpoint, simply displaying each frame as acquired results in visualization discontinuity and redundancy. A second challenge is providing the remote mentor with an effective visualization of the mentee's workspace in AR telementoring. Acquiring the workspace with a camera built into the mentee's AR headset is appealing since it captures the workspace from the mentee's viewpoint and requires no external hardware. However, the workspace visualization is unstable, as it changes frequently, abruptly, and substantially with each mentee head motion. A third challenge is occluder removal in diminished reality. Whereas in conventional AR the user's visualization of a real-world scene is augmented with graphical annotations, diminished reality aims to aid the user's understanding of complex real-world scenes by removing objects from the visualization. The challenge is to paint over occluder pixels using auxiliary videos acquired from different viewpoints, in real time, and with good visual quality. A fourth challenge is to acquire scene geometry from the user viewpoint, as needed in AR, for example, to integrate virtual annotations seamlessly into the real-world scene through accurate depth compositing and through the casting and receiving of shadows and reflections.
Our solutions are based on the thesis that images acquired from different viewpoints should not always be connected by computing a dense, per-pixel set of correspondences, but rather by devising custom, lightweight, yet sufficient connections between them, for each unique context. We have developed a self-contained phone-based AR display that aligns the phone camera view with the user's view, reducing visualization discontinuity to less than 5% for scene distances beyond 5m. We have developed, and validated in user studies, an effective workspace visualization method that stabilizes the mentee's first-person video feed through reprojection onto a planar proxy of the workspace. We have developed a real-time occluder inpainting method for diminished reality based on a two-stage, coarse-then-fine mapping between the user view and the auxiliary view. The mapping is established in time linear in the length of the occluder contour, and it achieves good continuity across the occluder boundary. We have developed a method for 3D scene acquisition from the user viewpoint based on single-image triangulation of correspondences between left and right eye corneal reflections. The method relies on a subpixel-accurate calibration of the catadioptric imaging system defined by the two corneas and the camera, which enables the extension of conventional epipolar geometry for a fast connection between corneal reflections.
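To make the planar-proxy stabilization concrete, the sketch below illustrates the underlying geometry, not the dissertation's actual implementation: for scene points on a plane, pixels in the moving mentee (source) view map to a fixed reference view through a plane-induced homography. All function and variable names here are hypothetical; the sketch assumes calibrated cameras (intrinsics K), a known relative pose (R, t with X_dst = R X_src + t), and a workspace plane n.X = d expressed in the source camera frame.

```python
import numpy as np

def plane_induced_homography(K_src, K_dst, R, t, n, d):
    """Homography mapping source-view pixels to reference-view pixels,
    valid for scene points on the plane n . X = d (source camera frame).
    For such points, X_dst = R X + t = (R + t n^T / d) X."""
    H = K_dst @ (R + np.outer(t, n) / d) @ np.linalg.inv(K_src)
    return H / H[2, 2]  # normalize so H[2, 2] = 1

def warp_pixel(H, u, v):
    """Apply homography H to pixel (u, v) in homogeneous coordinates."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]
```

Warping each incoming headset frame with such a homography into a fixed reference view is one simple way to remove the apparent motion of the (near-planar) workspace caused by mentee head movement.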
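The two-view triangulation underlying the corneal-reflection method can likewise be sketched in its simplest form. The snippet below is an illustrative midpoint triangulation of two calibrated rays, not the dissertation's catadioptric formulation: it assumes the calibration has already converted a pair of corresponding corneal reflections into two 3D rays (origin o, direction d), and it returns the midpoint of the shortest segment between the rays. All names are hypothetical.

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint triangulation: find the point halfway along the shortest
    segment between rays o1 + s1*d1 and o2 + s2*d2."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = o2 - o1
    c = d1 @ d2  # cosine of the angle between the rays
    denom = 1.0 - c * c  # near zero for (near-)parallel rays
    s1 = ((d1 @ b) - c * (d2 @ b)) / denom
    s2 = (c * (d1 @ b) - (d2 @ b)) / denom
    return 0.5 * ((o1 + s1 * d1) + (o2 + s2 * d2))
```

For exactly intersecting rays the midpoint coincides with the intersection; for noisy correspondences it is a standard least-squares compromise between the two rays.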