This paper proposes a novel deep convolutional neural network (CNN) based high-level multi-task control architecture to address the visual guide-and-pick control problem of an omnidirectional mobile manipulator platform. The proposed mobile manipulator control system uses only a stereo camera as its sensing device to accomplish the visual guide-and-pick control task. Once the stereo camera captures a stereo image of the scene, the proposed CNN-based high-level multi-task controller directly predicts the best motion guidance and picking action of the omnidirectional mobile manipulator from that image. To collect the training dataset, we manually controlled the mobile manipulator to navigate in an indoor environment, approach an object-of-interest (OOI), and pick it up. During this manual teaching stage, we recorded all captured stereo images and the corresponding robot control commands. In the training stage, we employed an end-to-end multi-task imitation learning technique to train the proposed CNN model, which learns the desired motion and picking control strategies from these expert demonstrations in order to visually guide the mobile platform and then visually pick up the OOI. Experimental results show that the proposed visually guided picking control system achieves an average picking success rate of about 78.2%.
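
To make the multi-task imitation learning objective concrete, the following is a minimal sketch of how a combined loss over the two outputs could be computed from recorded demonstrations. It is not the formulation used in this work: the command layout (`vx`, `vy`, `omega`), the binary picking label, the choice of mean squared error plus binary cross-entropy, and the `pick_weight` parameter are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Demonstration:
    """One recorded teaching sample (hypothetical layout, not from the paper)."""
    motion_cmd: Sequence[float]  # assumed (vx, vy, omega) for the omnidirectional base
    pick_cmd: int                # assumed label: 1 = trigger picking, 0 = keep moving


def motion_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and demonstrated motion commands."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def pick_loss(pred_prob: float, target: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between predicted pick probability and the label."""
    p = min(max(pred_prob, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def multitask_loss(pred_motion, pred_pick_prob, demos, pick_weight: float = 1.0) -> float:
    """Average of motion loss plus weighted picking loss over a batch of demonstrations."""
    total = 0.0
    for m, p, d in zip(pred_motion, pred_pick_prob, demos):
        total += motion_loss(m, d.motion_cmd) + pick_weight * pick_loss(p, d.pick_cmd)
    return total / len(demos)


# Example batch: one "keep moving" sample and one "pick now" sample (made-up values).
demos = [
    Demonstration(motion_cmd=(0.2, 0.0, 0.1), pick_cmd=0),
    Demonstration(motion_cmd=(0.0, 0.0, 0.0), pick_cmd=1),
]
good = multitask_loss([(0.2, 0.0, 0.1), (0.0, 0.0, 0.0)], [0.0, 1.0], demos)
bad = multitask_loss([(0.0, 0.0, 0.0), (0.3, 0.0, 0.0)], [0.9, 0.1], demos)
```

In this sketch, predictions that match the demonstrated commands give a loss near zero, while predictions that deviate in either the motion or the picking output are penalized, so a single network trained on the sum is pushed to imitate both behaviors at once.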