Skip to content

Image Captioning

The Microsoft Common Objects in COntext (MS COCO) dataset is a large-scale dataset for scene understanding. The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms.

coco examples

In this project, I used the dataset of image-caption pairs to train a CNN-RNN model to automatically generate image captions from images

encoder-decoder

For this training, gpu is used. Some result of my model are shown below.

A man riding a wave on top of a surfboard .
A man riding on the back of an elephant .

Purpose : Using MS COCO dataset, creating image captioning deep learning model.

Contribution : Using CNN-RNN model to obtain great accuracy on the testset.

Completion Date : 22.03.2020