Image Captioning

The Microsoft Common Objects in Context (MS COCO) dataset is a large-scale dataset for scene understanding. It is commonly used to train and benchmark object detection, segmentation, and captioning algorithms.

[Image: example image-caption pairs from the MS COCO dataset]
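As a minimal sketch of how the caption annotations can be loaded, the snippet below uses torchvision's `CocoCaptions` dataset (which requires `pycocotools`). The directory and annotation-file paths are placeholders, not the paths used in this project, and the transform simply matches standard ImageNet preprocessing.

```python
import torchvision.transforms as T
from torchvision.datasets import CocoCaptions

# Standard ImageNet preprocessing, matching a pretrained CNN encoder.
transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

# Placeholder paths: point these at the local COCO images and annotation JSON.
train_data = CocoCaptions(
    root="coco/train2014",
    annFile="coco/annotations/captions_train2014.json",
    transform=transform,
)

image, captions = train_data[0]   # image tensor and a list of caption strings
print(len(train_data), captions[0])
```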

In this project, I used the MS COCO image-caption pairs to train a CNN-RNN model that automatically generates captions for images.

[Image: CNN encoder to RNN decoder architecture]
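The sketch below shows one way such a CNN-RNN encoder-decoder can be written in PyTorch. The ResNet-50 backbone, the layer sizes, and the teacher-forcing setup are illustrative assumptions (and the `weights=` argument assumes a recent torchvision), not the exact configuration used in this project.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Pretrained CNN that maps an image to a fixed-size embedding."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Drop the final classification layer; keep the convolutional features.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        with torch.no_grad():            # keep the pretrained backbone frozen
            features = self.backbone(images)
        features = features.flatten(1)   # (batch, 2048)
        return self.embed(features)      # (batch, embed_size)

class DecoderRNN(nn.Module):
    """LSTM decoder that generates a caption conditioned on the image embedding."""
    def __init__(self, embed_size, hidden_size, vocab_size, num_layers=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Teacher forcing: prepend the image embedding, drop the last token.
        embeddings = self.embedding(captions[:, :-1])
        inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.fc(hiddens)          # (batch, seq_len, vocab_size)
```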

Training was performed on a GPU. Some results of my model are shown below.

A man riding a wave on top of a surfboard .
A man riding on the back of an elephant .
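Captions like these can be produced with a simple greedy decoding loop. The sketch below assumes the `DecoderRNN` above and a hypothetical `vocab` object with an `idx2word` mapping and an `<end>` token; both names are illustrative, and the model is assumed to already be on the chosen device.

```python
def sample(encoder, decoder, image, vocab, max_len=20, device="cuda"):
    """Greedy decoding: feed the image embedding, then the previous predicted word."""
    encoder.eval(); decoder.eval()
    with torch.no_grad():
        features = encoder(image.unsqueeze(0).to(device))  # (1, embed_size)
        inputs = features.unsqueeze(1)                      # (1, 1, embed_size)
        states = None
        words = []
        for _ in range(max_len):
            hiddens, states = decoder.lstm(inputs, states)  # one decoding step
            scores = decoder.fc(hiddens.squeeze(1))         # (1, vocab_size)
            predicted = scores.argmax(dim=1)                # most likely next word
            word = vocab.idx2word[predicted.item()]         # assumed vocab mapping
            if word == "<end>":
                break
            words.append(word)
            inputs = decoder.embedding(predicted).unsqueeze(1)
    return " ".join(words)
```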