Binocular 3D vision in a nutshell

Consider two cameras with overlapping fields of view.

The first camera is modeled by its sensor plane S1 and optical center O1, the second by S2 and O2.

Given a pixel P1 observed on S1, the image of a point X in the scene, we want to determine the distance d1 between O1 and X. This is the so-called ‘3D information’ of X, loosely referred to as its ‘z coordinate’ in the first camera’s coordinate system.

To that end, we need to identify the pixel P2 on S2, the image of the same point X in the second camera. Thanks to epipolar geometry, this can be done by searching for the match of P1 along a single line (called the epipolar line) instead of over the whole image plane.
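The search along an epipolar line can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes that calibration has produced a 3x3 fundamental matrix F (a standard way to encode the epipolar geometry, though the text above does not name it), and the function names, the `cost` callback and the toy images are all hypothetical.

```python
def epipolar_line(F, p1):
    """Return (a, b, c) such that a*x + b*y + c = 0 is the epipolar
    line in image 2 of pixel p1 = (u, v) in image 1 (line = F * p1)."""
    u, v = p1
    x = (u, v, 1.0)  # homogeneous coordinates of P1
    return tuple(sum(F[i][j] * x[j] for j in range(3)) for i in range(3))


def pixels_on_line(line, width, height):
    """List the integer pixels of a width x height image lying on the line."""
    a, b, c = line
    if a == 0 and b == 0:
        raise ValueError("degenerate epipolar line")
    pixels = []
    if abs(b) >= abs(a):
        # Line is closer to horizontal: step along x, solve for y.
        for x in range(width):
            y = round(-(a * x + c) / b)
            if 0 <= y < height:
                pixels.append((x, y))
    else:
        # Line is closer to vertical: step along y, solve for x.
        for y in range(height):
            x = round(-(b * y + c) / a)
            if 0 <= x < width:
                pixels.append((x, y))
    return pixels


def match_along_epipolar_line(F, p1, width, height, cost):
    """Return the pixel on the epipolar line of p1 with the lowest cost,
    or None if the line does not cross the image.
    cost(p1, p2) is a user-supplied dissimilarity (hypothetical here)."""
    candidates = pixels_on_line(epipolar_line(F, p1), width, height)
    return min(candidates, key=lambda p2: cost(p1, p2), default=None)


# Assumed example: F for an ideal rectified pair, where the epipolar
# line of (u, v) is simply the row y = v in the second image.
F_RECTIFIED = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]

# Toy 4x4 grayscale images (made-up values, for illustration only).
IMG1 = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 99, 0, 0]]
IMG2 = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [10, 20, 99, 40]]


def intensity_cost(p1, p2):
    """Absolute intensity difference between single pixels."""
    return abs(IMG1[p1[1]][p1[0]] - IMG2[p2[1]][p2[0]])


best = match_along_epipolar_line(F_RECTIFIED, (1, 3), 4, 4, intensity_cost)
```

The point of the sketch is that only the handful of pixels on one line are scored, not all width x height pixels. A real matcher would compare windows around the pixels rather than single intensities.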

The correspondence between the epipolar lines of image planes S1 and S2 is established during the calibration process, which consists of capturing a calibration plate of known dimensions across the system’s whole field of view.

Once a candidate P2’ is found with enough confidence to assume P2’ = P2, the distance d1 can be computed with the law of sines (the sine formula):
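In the triangle O1 O2 X, write b for the baseline length |O1 O2|, a1 for the angle at O1 between the ray towards X and the baseline, and a2 for the corresponding angle at O2 (these symbols are introduced here for illustration and are not part of the original notation). The angle at X is then pi - a1 - a2, and the law of sines gives d1 = b * sin(a2) / sin(a1 + a2). A minimal Python sketch of this relation follows, assuming the two angles have already been derived from P1, P2 and the calibration data; the function name is hypothetical.

```python
import math


def distance_from_law_of_sines(baseline, angle_at_o1, angle_at_o2):
    """Distance d1 = |O1 X| in the triangle O1 O2 X.

    baseline    -- length |O1 O2|, in any unit (d1 is returned in that unit)
    angle_at_o1 -- angle at O1 between the ray O1->X and the baseline, radians
    angle_at_o2 -- angle at O2 between the ray O2->X and the baseline, radians
    """
    angle_at_x = math.pi - angle_at_o1 - angle_at_o2
    if angle_at_o1 <= 0 or angle_at_o2 <= 0 or angle_at_x <= 0:
        # The two rays are parallel or diverging: no triangle, no depth.
        raise ValueError("rays do not intersect in front of the cameras")
    # Law of sines: d1 / sin(angle at O2) = baseline / sin(angle at X)
    return baseline * math.sin(angle_at_o2) / math.sin(angle_at_x)


# Assumed example: O1 = (0, 0), O2 = (1, 0), X = (1, 1), so the angle
# at O1 is 45 degrees, the angle at O2 is 90 degrees, and |O1 X| = sqrt(2).
d1 = distance_from_law_of_sines(1.0, math.pi / 4, math.pi / 2)
```

As the angle at X shrinks (a distant point or a short baseline), sin(a1 + a2) approaches zero and small angular errors produce large errors in d1.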