This invention provides a system and method for training and performing runtime 3D pose determination of an object using a plurality of camera assemblies in a 3D vision system. The camera assemblies are arranged at different orientations with respect to a scene so as to acquire contemporaneous images of an object, both at training time and at runtime. Each camera assembly includes a non-perspective lens that acquires a respective non-perspective image for use in the process. The object features located by search in one of the acquired non-perspective images can be used to define the expected locations of object features in the second (or subsequent) non-perspective images based upon an affine transform, which is computed based upon at least a subset of the intrinsics and extrinsics of each camera. The locations of features in the second and subsequent non-perspective images can be refined by searching within the expected locations in those images. This approach can be used in training, to generate the training model, and at runtime, operating on acquired images of runtime objects. The non-perspective cameras can employ telecentric lenses.
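As an illustration of the affine image-to-image mapping described above, the following is a minimal sketch and not the claimed implementation. It assumes that the object features lie on a common plane (world Z = 0) and that each non-perspective camera is modeled as an affine map x = A [X, Y]^T + t, where the 2x2 matrix A and the offset t are taken to be derived from that camera's calibrated intrinsics and extrinsics. The function names, the camera values, and the search-window helper are hypothetical and chosen only for this example.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("camera linear part is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p, q):
    """Multiply two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec_2(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def project(cam, plane_xy):
    """Project a point on the assumed Z = 0 plane into a camera image."""
    A, t = cam
    p = matvec_2(A, plane_xy)
    return [p[0] + t[0], p[1] + t[1]]


def image_to_image_affine(cam1, cam2):
    """Affine map taking pixel coordinates in image 1 to image 2.

    From x1 = A1 X + t1 and x2 = A2 X + t2 it follows that
    x2 = L x1 + (t2 - L t1), with L = A2 * inverse(A1).
    """
    A1, t1 = cam1
    A2, t2 = cam2
    L = matmul_2x2(A2, invert_2x2(A1))
    Lt1 = matvec_2(L, t1)
    return L, [t2[0] - Lt1[0], t2[1] - Lt1[1]]


def predict(affine, point):
    """Expected location in image 2 of a feature found in image 1."""
    L, offset = affine
    p = matvec_2(L, point)
    return [p[0] + offset[0], p[1] + offset[1]]


def search_window(center, half_size):
    """Axis-aligned window (xmin, ymin, xmax, ymax) for the refinement search."""
    return (center[0] - half_size, center[1] - half_size,
            center[0] + half_size, center[1] + half_size)


# Hypothetical calibrated cameras: (A, t) pairs, values invented for the example.
CAM1 = ([[2.0, 0.0], [0.0, 2.0]], [100.0, 100.0])
CAM2 = ([[1.5, 0.5], [-0.5, 1.5]], [50.0, 80.0])

feature_in_image1 = project(CAM1, [10.0, 20.0])   # feature found by search in image 1
affine_1_to_2 = image_to_image_affine(CAM1, CAM2)
expected_in_image2 = predict(affine_1_to_2, feature_in_image1)
window = search_window(expected_in_image2, 5.0)   # region searched to refine the location
```

In this sketch the predicted location coincides with the direct projection of the same plane point into the second camera. In practice, calibration error and features lying off the assumed plane would displace it, which is why the location is refined by a search restricted to a window around the expected position.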