Robotic manipulators are expected to function like human hands and, like our hands, they should have an intelligent grasping ability in order to perform complex manipulation tasks. However, executing an intelligent and optimal grasp efficiently, the way humans grasp objects, is quite challenging for robots. The reason is that we acquire this skill by spending a lot of time in childhood trying and failing to pick things up, and learning from our mistakes. For robots, we cannot wait through the equivalent of an entire robotic childhood. To streamline the process, in the present investigation we propose deep learning and machine learning based techniques that help robots quickly learn how to generate and execute appropriate grasps. In this context, for vision-based object detection, we have designed an effective loss function, Absolute Intersection over Union (AIoU), for faster and better bounding box regression, which has been verified using the You Only Look Once version 3 (YOLOv3) and Single Shot MultiBox Detector (SSD) algorithms. Subsequently, for grasp generation on the detected objects, we develop a genetic algorithm based grasp position estimator combined with a deep reinforcement learning based grasp orientation estimator using a Grasp Deep Q-Network (GDQN). Since all deep learning and reinforcement learning techniques are data hungry, and sufficient labelled data is scarce, we try to overcome this challenge by proposing a hybrid (discriminative-generative) model based on the Vector Quantized Variational Autoencoder (VQ-VAE). More specifically, we develop two state-of-the-art models.
The first is a Generative Inception Neural Network (GI-NNet) model, capable of generating antipodal robotic grasps on seen as well as unseen objects. It is trained on the Cornell Grasping Dataset (CGD) and attains 98.87% grasp pose accuracy when detecting grasp poses from RGB-Depth (RGB-D) images of regular as well as irregular shaped objects, while requiring only one third of the trainable network parameters of State-Of-The-Art (SOTA) approaches. For the second model, we integrate VQ-VAE with GI-NNet and name the result Representation based GI-NNet (RGI-NNet). This model has been trained on various splits of the available CGD to test the learning ability of our architecture, ranging from only 10% labelled data with the latent embedding of VQ-VAE to 90% labelled data with the latent embedding. The grasp pose accuracy of RGI-NNet varies between 92.13% and 97.75%, which is far better than many existing SOTA models trained with only labelled data. To verify the performance of all the proposed grasp pose estimation models, we use the Anukul (Baxter) cobot, and we observe that our models perform significantly better in real-time tabletop grasp executions. Since the development of a complete cobotics (collaborative robotics) framework requires seamless human-robot interaction, we also develop a fusion model that utilizes multiple modes of communication, such as speech and gesture, using Long Short Term Memory (LSTM), Convolutional Neural Network (CNN) and 3-D CNN on a humanoid robot framework, NAO. Finally, we want cobots to be able to execute grasps based on learning; therefore, we also address robot grasping manipulation at the execution level, such as solving the inverse kinematics problem using reinforcement learning techniques.