Academic Lecture: Repulsive Attention and Its Applications
Author: Administrator  Source: website  Time: 2021-06-04 04:15:53
Time: 11:00 a.m.-12:30 p.m., Monday, June 7, 2021
Place: Tencent Conference (Conference ID: 882927322, Conference Password: 583679)
Topic: Repulsive Attention and Its Applications
Speaker: Chen Changyou, Assistant Professor, Department of Computer Science and Engineering, University at Buffalo, State University of New York
Abstract:
The multi-head attention mechanism is an important technique used in many state-of-the-art deep neural network architectures, including Transformer, BERT, GPT, CLIP, and DALL-E. A potential deficiency of multi-head attention is what we call attention collapse, a phenomenon in which some or all attention heads become identical after learning. We tackle this problem from a Bayesian perspective and propose an interpretation that treats attention heads as samples from a distribution. Based on our new particle-optimization framework for Bayesian inference, we develop efficient algorithms to jointly optimize all attention heads. Experiments on a number of problems, including sentence embedding, cyberbully detection, machine translation, language representation learning, and graph writer, demonstrate the effectiveness of our method.
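To make the idea of attention collapse and kernel-based repulsion concrete, here is a minimal illustrative sketch, not the speaker's actual algorithm: each attention head's parameters are treated as a particle, and a pairwise RBF-kernel term (of the kind used in particle-optimization methods such as Stein variational gradient descent) measures how close the heads are to one another. The function names and the bandwidth value are hypothetical choices for illustration only.

```python
import numpy as np

def rbf_kernel(x, y, h=1.0):
    # RBF kernel between two flattened head-parameter vectors;
    # equals 1.0 when the vectors are identical, and decays with distance.
    return np.exp(-np.sum((x - y) ** 2) / h)

def repulsive_term(heads, h=1.0):
    # Average pairwise kernel value over all distinct head pairs.
    # Treating each head as a particle, minimizing this term pushes
    # heads apart, discouraging them from collapsing into one head.
    n = len(heads)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += rbf_kernel(heads[i], heads[j], h)
    return total / (n * (n - 1))

rng = np.random.default_rng(0)
collapsed = [np.ones(8)] * 4                    # four identical heads
diverse = [rng.normal(size=8) for _ in range(4)]  # four distinct heads

# Identical heads attain the maximal value 1.0, flagging collapse;
# diverse heads score strictly lower.
print(repulsive_term(collapsed))  # -> 1.0
print(repulsive_term(diverse) < repulsive_term(collapsed))  # -> True
```

In a training loop, such a term would be added to the task loss as a penalty so that gradient updates jointly optimize all heads while keeping them mutually repelled.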
Brief introduction:
Chen Changyou is an assistant professor in the Department of Computer Science and Engineering at the University at Buffalo, State University of New York. His research focuses on Bayesian machine learning, deep learning, and deep reinforcement learning, as well as various applications in computer vision and natural language processing. He was previously a research assistant professor and postdoctoral fellow in the Department of Electrical and Computer Engineering at Duke University. He received his PhD from the School of Engineering and Computer Science at the Australian National University, and his Master's and Bachelor's degrees from Fudan University.