The Microsoft Common Objects in COntext (MS COCO) dataset is a large-scale dataset for scene understanding. The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms.
In this project, I used the dataset of image-caption pairs to train a CNN-RNN model to automatically generate image captions from images
For this training, gpu is used. Some result of my model are shown below.
A man riding a wave on top of a surfboard .
A man riding on the back of an elephant .