In this paper, a novel deep convolutional neural network (CNN) based high-level multi-task control architecture is proposed to address the visual guide-and-pick control problem of an omnidirectional mobile manipulator platform. The proposed mobile manipulator control system uses only a stereo camera as its sensing device to accomplish the visual guide-and-pick control task. After the stereo camera captures a stereo image of the scene, the proposed CNN-based high-level multi-task controller directly predicts the best motion guidance and picking action of the omnidirectional mobile manipulator from the captured stereo image. To collect the training dataset, we manually controlled the mobile manipulator to navigate an indoor environment, approaching and picking up an object-of-interest (OOI), while recording all of the captured stereo images and the corresponding robot control commands during this manual teaching stage.
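To make the two-headed prediction concrete, the following is a minimal sketch of such a multi-task controller, assuming the stereo pair is stacked into a six-channel input tensor and both outputs are discrete command classes; the module name, layer sizes, and command counts here are illustrative assumptions, not the architecture reported in the paper.

```python
# Minimal sketch of a CNN-based multi-task guide-and-pick controller.
# Assumptions (not from the paper): stereo frames stacked as 6 channels,
# discrete motion/pick command classes, and illustrative layer sizes.
import torch
import torch.nn as nn

class MultiTaskGuidePickCNN(nn.Module):  # hypothetical name
    def __init__(self, num_motion_cmds=5, num_pick_actions=2):
        super().__init__()
        # Shared convolutional backbone over the stacked stereo pair.
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific heads: motion guidance and picking action.
        self.motion_head = nn.Linear(128, num_motion_cmds)
        self.pick_head = nn.Linear(128, num_pick_actions)

    def forward(self, stereo_pair):
        feats = self.backbone(stereo_pair)  # shared visual features
        return self.motion_head(feats), self.pick_head(feats)
```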
In the training stage, we employed an end-to-end multi-task imitation learning technique to train the proposed CNN model, learning the desired motion and picking control strategies from prior expert demonstrations for visually guiding the mobile platform and then visually picking up the OOI.
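A minimal behavioral-cloning view of this training stage is sketched below, assuming the recorded demonstrations yield (stereo image, motion command, picking action) tuples; the function name, joint cross-entropy loss, and loss weighting are illustrative assumptions rather than the paper's exact training procedure.

```python
# Sketch of one epoch of multi-task imitation learning (behavioral cloning).
# Assumptions (not from the paper): cross-entropy losses on both heads and
# a simple weighted sum; optimizer settings are left to the caller.
import torch.nn as nn

def train_epoch(model, demo_loader, optimizer, pick_loss_weight=1.0):
    ce = nn.CrossEntropyLoss()
    for stereo_pair, motion_label, pick_label in demo_loader:
        motion_logits, pick_logits = model(stereo_pair)
        # Joint multi-task loss: imitate both the expert's motion command
        # and the expert's picking action for each recorded frame.
        loss = ce(motion_logits, motion_label) \
             + pick_loss_weight * ce(pick_logits, pick_label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```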
Experimental results show that the proposed visually guided picking control system achieves an average picking success rate of 78.2%.