"Generating poetry from image is a special task of text generation from image. There have been many studies in this area. However, most of them focus on image captioning rather than literature creation. Only few of previous systems addressed the problem of generating poems from images. There have also many studies and systems for generating poetry. In most cases, a system is provided with a few keywords and is required to compose a poem containing or relating to the keywords."
"Given an image, we first extract a few keywords representing objects and sentiments perceived from the image. These keywords are then expanded to related ones based on their associations in human written poems. Finally, verses are generated gradually from the keywords using recurrent neural networks trained on existing poems.
"For the image query, object and sentiment detection are used to extract appropriate nouns, such as city and street, and adjectives, such as busy, as initial keyword set.... We propose detecting objects and sentiments from each image with two parallel convolutional neural networks (CNN), which share the same network architecture but with different parameters. Specifically, one network learns to describe objects by the output of noun words, and the other learns to understand the sentiments by the output of adjective words. The two CNNs are pre-trained on ImageNet and fine-tuned on noun and adjective categories, respectively. After filtering out words with low confidence and rare words, keyword expansion will be applied to construct a keyword set. Next, each keyword is regarded as an initial seed for each sentence in the poem generation process. For example, the first sentence is generated from the seed city. A hierarchical recurrent neural network is proposed for modifying the structure between words and between sentences. We follow the recurrent neural network language model (RNNLM) to predict text sequence. Each word is predicted sequentially by the previous word sequence... Finally we apply a fluency checker to automatically detect low quality sentences early on and re-generate them. We use Long-Short Term Memory (LSTM) for RNN... In the poetry generation model, the recurrent hidden layers for the sentence level and poem level both contain 3 layers and 1024 hidden units for each layer. The sentence encoder dimensionality is 64."
"As a training corpus, we collect 2,027 modern Chinese poems that are composed of 45,729 sentences from shigeku.org. The character vocabulary size is 4,547. For the training of word based model, word segmentation are applied on the corpus. The size of word vocabulary is 54,318. For the model optimization experiment, 100 public domain images are crawled from Bing image search by searching 60 randomly sampled nouns and adjectives in our predefined categories. We focus on 45 images recognized as views for optimizing our model."
The system was deployed in XiaoIce in 2017, and within a year 12 million poems had been generated. A book of 139 generated poems was published in 2017.