Neural Machine Translation with Latent Context
Gonçalo Correia (Ph.D. Student)
Transformers have become ubiquitous in generative models and, most notably, in NLP, where large pre-trained all-purpose models are released frequently and achieve new state-of-the-art results. These architectures consist mainly of multiple layers of attention mechanisms, which not only yield powerful representations of the input but also pave the way to interpretability through analysis of the attention maps. In this project, we focus on using pre-trained models effectively, on using sparsity to improve the interpretability of Transformers, and on using latent variables to summarize the context in which the input was created. First, we investigate how pre-trained Transformer models can be used to train an Automatic Post-Editing (APE) system with very little data, while obtaining results comparable to models trained on large amounts of APE data, which is expensive to create. Second, we use sparse normalizing functions to parameterize the attention mechanisms of Transformers, resulting in improved interpretability of this architecture compared to softmax. Third, we propose a method to train discrete and structured latent variable models with these sparse functions; the method requires no sampling and thus improves the training stability of generative models that use this type of latent variable. Lastly, to be able to use a large context in document-level machine translation, we propose a variational method in which only the latent variable has access to the context in which a sentence was translated, while the translation model generates each target sentence independently given this latent variable. To achieve this, we efficiently use pre-trained Transformers in the generative and inference models, taking inspiration from our work on APE.
We also apply our proposed technique for training discrete latent variables to obtain an interpretable rationale for which sentences of the context are useful for each translation.
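To illustrate the sparse normalizing functions mentioned above, the sketch below implements sparsemax, the simplest member of this family, alongside softmax. This is a minimal NumPy illustration of the general idea, not the project's code: sparsemax projects the attention scores onto the probability simplex, so low-scoring positions receive exactly zero weight, which is what makes the resulting attention maps easier to interpret.

```python
import numpy as np

def softmax(z):
    """Dense normalizer: every score receives strictly positive probability."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): Euclidean projection of the
    scores onto the probability simplex; low scores get exactly zero mass."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum    # positions inside the support
    k_max = k[support].max()               # support size
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)

scores = np.array([3.0, 1.0, 0.2])
print(softmax(scores))    # all entries > 0
print(sparsemax(scores))  # [1. 0. 0.] -- low-scoring tokens drop to exactly zero
```

Because the output distribution has small support, expectations over a discrete latent variable can be computed exactly by summing only over the nonzero entries, which is the property that removes the need for sampling in the third contribution.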
Primary Host: André Martins (University of Lisbon)
Exchange Host: Vlad Niculae (University of Amsterdam)
PhD Duration: 01 April 2018 - 30 April 2022
Exchange Duration: 01 April 2020 - 30 June 2020; 01 October 2021 - 31 March 2022