We, as human beings, are still feeling our way through the new world of Artificial Intelligence (AI). Even though the field has been around for some time, its philosophical underpinnings are still in a nascent stage. Perhaps it is worthwhile to remember something the late Stephen Hawking once said, “We should shift the goal of AI from creating pure undirected artificial intelligence to creating beneficial intelligence. It might take decades to figure out how to do this, so let’s start researching this today.”
We believe an effective fusion of AI, culture, and storytelling will help diminish bias in algorithmic identification and train AI software to be much more inclusive, and we are asking you to help us explore the possibilities. We are sharing our early approach here and looking for your feedback.
AI is being used more and more to drive interactive and timely content. The New York Times, the Washington Post, and the Associated Press all rely on machines to help meet our ever-growing appetite for on-demand content. Yet the data that drives our narratives represents only a fraction of our story.
Today, world cultures, history, and traditions are being left out of the narrative, research, and datasets. We need systems that empower communities to participate in the emerging digital narrative. AI story narration is the craft of making machines learn human storytelling techniques.
Several groundbreaking works on AI, culture, and storytelling prove that such an endeavor is possible and critical, including Rafael Pérez y Pérez’s MEXICA project; Boyang “Albert” Li and Mark Riedl’s Scheherazade system; Mark Finlayson’s annotated corpus of Russian folktales; Wolfgang Victor Yarlott’s Old Man Coyote Stories; and D. Fox Harrell’s Imagination, Computation, and Expression Laboratory.
At IVOW, we have been seeking advice from several AI experts as we develop our prototype focusing on images and extended captions. In addition, our academic partnership with Morgan State University has allowed us to collaborate with Professor Mahmudur Rahman on image recognition and captioning work in AI. It’s important to note that we are an early-stage startup, but we are confident that together with our partners and advisors we can incorporate cultural data into future automated stories.
To do this work, we are relying on these main components:
- Story narration templates
- A culturally sensitive image-captioning Deep Learning model
- Natural Language Processing (NLP)
Story Narration Templates
Cultural story narrations are not straightforward like stories related to weather and sports. Every different culture or ethnicity not only has its own style of storytelling, but each has different histories, experiences, and traditions that must be incorporated. This would require multiple story narration templates for a particular culture that would include different ways to start and end a story. Our team of journalists and international partners will collaborate on the story narration templates.
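As a rough illustration of this idea, a template store could map each culture to several possible openings and closings, chosen per story so the same culture yields varied framings. Everything below is a hypothetical sketch, not IVOW's actual templates: the culture key, the template strings, and the `frame_story` helper are all placeholders.

```python
import random

# Hypothetical template store: each culture maps to multiple openings and
# closings, curated by journalists and international partners.
STORY_TEMPLATES = {
    "indo-hispanic": {
        "openings": [
            "In the heart of the American Southwest, {event} begins.",
            "Generations have gathered for {event}.",
        ],
        "closings": [
            "And so the tradition of {event} lives on.",
            "The memory of {event} is passed to the next generation.",
        ],
    },
}

def frame_story(culture, event, body, rng=random):
    """Wrap a generated story body with a culture-specific opening and closing."""
    templates = STORY_TEMPLATES[culture]
    opening = rng.choice(templates["openings"]).format(event=event)
    closing = rng.choice(templates["closings"]).format(event=event)
    return " ".join([opening, body, closing])
```

A curated store like this keeps the cultural framing editable by writers without retraining any model.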
Deep Learning Model to Caption Cultural Images
The first step in creating a culturally sensitive story based on a photo is coming up with tags for the image. Those can be explicit to the specific culture and event being depicted: Indo-Hispanic, dance, feathered costume, headpiece, Festival of Our Lady of Guadalupe, etc. Those tags can be combined to start the caption-making process. We first trained our deep learning model with thousands of generic images to help it come up with sensible captions. These are some real caption examples that our Deep Learning model produced for the given images.
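One simple way to begin the caption-making process from tags is to merge culture-specific tags with the generic tags a model produces, remove duplicates, and keep the culture-specific tags first. This is a hypothetical sketch of that combining step only, not IVOW's trained model:

```python
def seed_caption(culture_tags, generic_tags):
    """Merge culture-specific and generic tags into a caption seed phrase.

    Culture-specific tags come first; duplicates (case-insensitive) are dropped.
    """
    seen = set()
    ordered = []
    for tag in list(culture_tags) + list(generic_tags):
        key = tag.lower()
        if key not in seen:
            seen.add(key)
            ordered.append(tag)
    return ", ".join(ordered)
```

Prioritizing the curated cultural tags means a generic model's output can only add detail, never displace the culturally specific vocabulary.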
We started training our Culturally Sensitive Deep Learning model with a relatively small set of images related to Hispanic culture, provided by world-renowned ethnographer and photographer Miguel Gandert. Gandert has spent his career documenting the various cultures and traditions of the Indo-Hispanic community in the American Southwest. Although this dataset of only hundreds of images is small, accurately tagging each photo lets us train our model to produce culturally sensitive captions.
At IVOW, story generation with tags is a work in progress. We believe an extended caption can be the first form of story we tell. The cultural examples shown here were generated from images alone; that is, the model can produce sentences, or an entire story, with an image as the only input. We are working on accepting tags as an extra input, which can potentially make story generation more robust.
Here is an early example produced by our model:
Note: Punctuation is filtered out by the current tokenizer settings but can easily be added back. The “word-by-word” approach learns a distribution over all possible captions, which allows the language to vary as more data comes in.
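A word-by-word model predicts, at each step, a distribution over the next word given the words generated so far (and, in a captioning model, the image features). The sketch below stands in for the neural network with simple bigram counts over a toy caption corpus, just to show how greedy word-by-word decoding works; it is not IVOW's model, and the sample captions are invented.

```python
from collections import Counter, defaultdict

def train_bigrams(captions):
    """Count next-word frequencies; <s> and </s> mark caption boundaries."""
    counts = defaultdict(Counter)
    for caption in captions:
        words = ["<s>"] + caption.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, max_len=20):
    """Greedy word-by-word decoding: always take the most frequent next word."""
    word, out = "<s>", []
    for _ in range(max_len):
        if not counts[word]:
            break
        word = counts[word].most_common(1)[0][0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)
```

In a real captioner the next-word distribution comes from a trained network rather than counts, and sampling from it (instead of taking the argmax) is what produces the language variation the note describes.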
Natural Language Processing (NLP)
We will use natural language processing algorithms to generate short sentences, using tags provided either by the user or generated via automated AWS services. Here is an example that shows how tags can be converted into brief sentences.
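A minimal, hypothetical version of that conversion: sort tags into rough grammatical buckets using a small lexicon, then fill a sentence frame. The lexicon and frame below are illustrative assumptions; a production system would use real NLP (e.g., part-of-speech tagging or a knowledge base) rather than a hand-built word list.

```python
# Hypothetical lexicon: in practice an NLP tagger would classify the tags.
SUBJECTS = {"dancers", "musicians", "children"}
ACTIONS = {"dance": "dancing", "sing": "singing", "march": "marching"}

def tags_to_sentence(tags, event):
    """Turn a flat tag list into one brief sentence about an event."""
    subjects = [t for t in tags if t in SUBJECTS]
    actions = [ACTIONS[t] for t in tags if t in ACTIONS]
    attributes = [t for t in tags if t not in SUBJECTS and t not in ACTIONS]
    subject = " and ".join(subjects) or "people"
    action = " and ".join(actions) or "gathering"
    detail = f" in {attributes[0]}" if attributes else ""
    return f"At {event}, {subject} are {action}{detail}."
```

For example, the tags `["dancers", "dance", "feathered costumes"]` for the Festival of Our Lady of Guadalupe yield a single brief sentence ready to be arranged into a narrative.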
We are considering using EventRank or PlotShot to identify normal, optional, and conditional events that can be marked by typicality. This can help in selecting, omitting, and arranging sentences: a short summary of a situation is created by choosing the most typical events. This method can be applied to cultural narratives. Multiple short sentences will be generated using several methods:
- The story will start by drawing on available story templates for a specific culture, provided by our journalists.
- Captions are generated from images by our Deep Learning model.
- Sentences are generated with Natural Language Processing based on user-generated tags.
- The story will end by drawing on available story templates for a specific culture, provided by our journalists.
Sentences generated by the above methods can be arranged using algorithms such as EventRank or PlotShot, which we will explore in detail, to give a sensible flow to the narration.
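We cannot speak to EventRank's or PlotShot's exact formulations here, but the typicality idea can be sketched simply: score each candidate event by how often it appears across example stories of the same situation, then keep the most typical events ordered by their average narrative position. The events and stories below are invented for illustration.

```python
from collections import Counter

def typical_events(example_stories, k=3):
    """Pick the k most typical events and order them by average position.

    A toy stand-in for typicality ranking, not EventRank or PlotShot itself:
    typicality = frequency across example stories of the same situation.
    """
    freq = Counter()
    positions = {}
    for story in example_stories:
        for i, event in enumerate(story):
            freq[event] += 1
            positions.setdefault(event, []).append(i)
    top = [event for event, _ in freq.most_common(k)]
    # Arrange the selected events by where they usually occur in a story.
    return sorted(top, key=lambda e: sum(positions[e]) / len(positions[e]))
```

Rare events (a one-off "blessing," say) fall out of the summary, while the events most stories share survive in their usual order, which is the selecting, omitting, and arranging behavior described above.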
We’d love to hear from you. Critique us, share your feedback — email [email protected]. Our IVOW colleagues Vishal Raj, Ziheng Lin, Robert Malesky and Kee Malesky contributed to this report.
Originally published at medium.com