A new machine learning model hallucinates an image of what a sentence describes. According to a recent research paper, the technique then uses that visualization and other clues to aid translation into another language. It’s part of a growing movement to use AI to understand language.

“How people talk and write is unique because we all have slightly different tones and styles,” Beth Cudney, a professor of data analytics at Maryville University who was not involved in the research, told Lifewire in an email interview. “Understanding context is difficult because it is like dealing with unstructured data. This is where natural language processing (NLP) is useful. NLP is a branch of AI that addresses the differences in how we communicate using machine reading comprehension. The key difference is that NLP, as a branch of AI, does not focus simply on the literal meanings of the words we speak or write. It looks at the meaning.”

Go Ask Alice

The new AI system, called VALHALLA, created by researchers from MIT, IBM, and the University of California at San Diego, was designed to mimic the way humans perceive language. According to the scientists, pairing sensory information such as imagery with new and unfamiliar words, much like flashcards with pictures, improves language acquisition and retention.

The team claims their method improves the accuracy of machine translation over text-only translation. The scientists used an encoder-decoder architecture with two transformers, a type of neural network suited to sequence-dependent data such as language, which can pay attention to keywords and the semantics of a sentence. One transformer generates a visual hallucination, and the other performs multimodal translation using the first transformer’s outputs.

“In real-world scenarios, you might not have an image with respect to the source sentence,” Rameswar Panda, one of the research team members, said in a news release. “So, our motivation was basically: Instead of using an external image during inference as input, can we use visual hallucination—the ability to imagine visual scenes—to improve machine translation systems?”
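To make that architecture concrete, below is a minimal, hypothetical sketch in PyTorch of a two-transformer setup along the lines described: one module “hallucinates” a set of visual feature tokens from the source sentence, and a second module translates using the text together with those hallucinated features. This is not the VALHALLA code; the class names, dimensions, and wiring are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch only (not the authors' implementation): one transformer
# "hallucinates" visual feature tokens from the source text, and a second
# transformer decodes the translation against the text plus those features.

class VisualHallucinator(nn.Module):
    def __init__(self, vocab_size=8000, d_model=256, n_visual_tokens=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Learned queries that get decoded into "hallucinated" visual tokens.
        self.visual_queries = nn.Parameter(torch.randn(n_visual_tokens, d_model))
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, src_tokens):
        memory = self.encoder(self.embed(src_tokens))            # (B, S, D)
        queries = self.visual_queries.unsqueeze(0).expand(src_tokens.size(0), -1, -1)
        return self.decoder(queries, memory)                     # (B, V, D)


class MultimodalTranslator(nn.Module):
    def __init__(self, vocab_size=8000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens, visual_feats, tgt_tokens):
        # Join the text encoding with the hallucinated visual features and
        # decode the target sentence against that combined memory.
        # (A causal mask on tgt_tokens is omitted for brevity.)
        text_memory = self.encoder(self.embed(src_tokens))
        memory = torch.cat([text_memory, visual_feats], dim=1)
        hidden = self.decoder(self.embed(tgt_tokens), memory)
        return self.out(hidden)                                  # (B, T, vocab)


# Toy usage with random token IDs.
hallucinator = VisualHallucinator()
translator = MultimodalTranslator()
src = torch.randint(0, 8000, (2, 10))    # two source sentences
tgt = torch.randint(0, 8000, (2, 12))    # shifted target tokens (teacher forcing)
visual = hallucinator(src)               # hallucinated visual features
logits = translator(src, visual, tgt)
print(logits.shape)                      # torch.Size([2, 12, 8000])
```

Presumably, the real system trains the hallucination stage against features of actual images paired with sentences; this sketch only shows how the two stages could be wired together.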

AI Understanding

Considerable research is focused on advancing NLP, Cudney pointed out. For example, Elon Musk co-founded OpenAI, which developed GPT-3, a model that can converse with a human and is savvy enough to generate software code in Python and Java. Google is also developing conversational AI with its LaMDA system, and Meta is working on similar technology. “These systems are increasing the power of chatbots that are currently only trained for and capable of specific conversations, which will likely change the face of customer support and help desks,” Cudney said.

Aaron Sloman, the co-founder of CLIPr, an AI tech company, said in an email that large language models like GPT-3 can learn from very few training examples and, with human feedback, improve at tasks such as summarizing text. For instance, he said, you can give a large language model a math problem and ask the AI to think step by step (a prompting pattern sketched in the example at the end of this article). “We can expect greater insights and reasoning to be extracted from large language models as we learn more about their abilities and limitations,” Sloman added. “I also expect these language models to create more human-like processes as modelers develop better ways to fine-tune the models for specific tasks of interest.”

Georgia Tech computing professor Diyi Yang predicted in an email interview that we will see more use of NLP systems in our daily lives, ranging from NLP-based personalized assistants that help with emails and phone calls to knowledgeable dialogue systems for information-seeking in travel or healthcare. “As well as fair AI systems that can perform tasks and assist humans in a responsible and bias-free manner,” Yang added.

Enormous AI models with billions of parameters, such as GPT-3 and DeepText, will continue to work toward a single model for all language applications, predicted Stephen Hage, a machine learning engineer at Dialexa, in an email interview. He said there will also be new types of models created for specific uses, such as voice-commanded online shopping. “An example might be a shopper saying ‘Show me this eyeshadow in midnight blue with more halo,’ to show that shade on the person’s eyes with some control over how it’s applied,” Hage added.
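As a small illustration of the step-by-step prompting Sloman describes, here is a brief sketch using the open-source Hugging Face transformers library. The tiny GPT-2 checkpoint is only a stand-in for a large model such as GPT-3, which follows this kind of instruction far more reliably, and the math problem is invented for the example.

```python
# Sketch of "think step by step" prompting with the Hugging Face
# transformers library. The small "gpt2" checkpoint is a stand-in;
# the prompting pattern is what this example demonstrates.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Q: A store sells pencils in packs of 12. A teacher buys 4 packs "
    "and hands out 30 pencils. How many pencils are left?\n"
    "A: Let's think step by step."
)

# Greedy decoding keeps the continuation deterministic for this demo.
output = generator(prompt, max_new_tokens=60, do_sample=False)
print(output[0]["generated_text"])
```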