S2ST Survey & History
End-to-End Speech Translation Progress
Data
Corpus | Direction | Target | Duration | License |
---|---|---|---|---|
CoVoST 2 | {Fr, De, Es, Ca, It, Ru, Zh, Pt, Fa, Et, Mn, Nl, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} -> En and En -> | Text | 2880h | CC0 |
CVSS | {Fr, De, Es, Ca, It, Ru, Zh, Pt, Fa, Et, Mn, Nl, Tr, Ar, Sv, Lv, Sl, Ta, Ja, Id, Cy} -> En | Text & Speech | 1900h | CC BY 4.0 |
mTEDx | {Es, Fr, Pt, It, Ru, El} -> En, {Fr, Pt, It} -> Es, Es -> {Fr, It}, {Es, Fr} -> Pt | Text | 765h | CC BY-NC-ND 4.0 |
CoVoST | {Fr, De, Nl, Ru, Es, It, Tr, Fa, Sv, Mn, Zh} -> En | Text | 700h | CC0 |
MuST-C & MuST-Cinema | En -> | Text | 504h | CC BY-NC-ND 4.0 |
How2 | En -> Pt | Text | 300h | YouTube & CC BY-SA 4.0 |
Augmented LibriSpeech | En -> Fr | Text | 236h | CC BY 4.0 |
Europarl-ST | {En, Fr, De, Es, It, Pt, Pl, Ro, Nl} -> | Text | 280h | CC BY-NC 4.0 |
Kosp2e | Ko -> En | Text | 198h | Mixed CC |
Fisher + Callhome | Es -> En | Text | 160h+20h | LDC |
MaSS | parallel among En, Es, Eu, Fi, Fr, Hu, Ro and Ru | Text & Speech | 172h | Bible.is |
LibriVoxDeEn | De -> En | Text | 110h | CC BY-NC-SA 4.0 |
Prabhupadavani | parallel among En, Fr, De, Gu, Hi, Hu, Id, It, Lv, Lt, Ne, Fa, Pl, Pt, Ru, Sl, Sk, Es, Se, Ta, Te, Tr, Bg, Hr, Da and Nl | Text | 94h | |
BSTC | Zh -> En | Text | 68h | |
LibriS2S | De <-> En | Text & Speech | 52h/57h | CC BY-NC-SA 4.0 |
Toolkit
This repository collects the toolkits, common datasets and paper list related to research on Simultaneous Translation. It is continuously updated…
I would be honored if this repository helps or serves as a reference for your research :blush: If you have any suggestions, feel free to contact me: Shaolei Zhang (zhangshaolei20z@ict.ac.cn).
Toolkits
- Fairseq: a sequence modeling toolkit, covering the machine translation, speech translation and simultaneous translation (both text-to-text and speech-to-text).
- SimulEval: a general evaluation framework for simultaneous translation on text and speech.
Datasets
- Conventional text-to-text translation datasets:
- IWSLT15 English-Vietnamese: 133K sentence pairs. [Link]
- WMT15 German-English: 4.5M sentence pairs. [Link]
- WMT14 English-French: 36.3M sentence pairs. [Link]
- Conventional speech-to-text translation datasets:
- MuST-C: multilingual speech-to-text translation corpus with 8 language pairs. [Link]
- Conventional speech-to-speech translation datasets:
- CVSS: massively multilingual-to-English speech-to-speech translation corpus. [Link]
- Simultaneous interpretation datasets:
- BSTC Chinese-English: 68 hours. [Link]
- NAIST-SIC English-Japanese: 22 hours. [Link]
Tutorials & Talks
PACLIC 2016: The Challenge of Simultaneous Speech Translation. Anoop Sarkar. [Link]
EMNLP 2020: Simultaneous Translation. Liang Huang, Colin Cherry, Mingbo Ma, Naveen Arivazhagan, and Zhongjun He. [Link]
AMTA 2020: Simultaneous Speech Translation in Google Translate. Jeff Pitman. [Link]
Paper List
This is a paper list of Simultaneous Translation, organized by publication year.
We also collect a paper list organized by different categories. Refer to Here.
2002 | 2006 | 2007 | 2009 | 2010 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024
2002
- Translation Unit Concerning Timing of Simultaneous Translation. LREC 2002. [PDF]
2006
- Simultaneous English-Japanese Spoken Language Translation Based on Incremental Dependency Parsing and Transfer. ACL 2006. [PDF]
2007
- Simultaneous translation of lectures and speeches. Mach Translat 2007. [PDF]
2009
- End-to-End Evaluation in Simultaneous Translation. EACL 2009. [PDF]
2010
-
Stream-based Translation Models for Statistical Machine Translation. NAACL 2010. [PDF]
-
Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation. LREC 2010. [PDF]
2012
- Real-time Incremental Speech-to-Speech Translation of Dialogs. NAACL 2012. [PDF]
2013
- Incremental Segmentation and Decoding Strategies for Simultaneous Translation. IJCNLP 2013. [PDF]
2014
-
Optimizing Segmentation Strategies for Simultaneous Speech Translation. ACL 2014. [PDF]
-
Collection of a Simultaneous Translation Corpus for Comparative Analysis. LREC 2014. [PDF]
-
Don’t Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation. EMNLP 2014. [PDF]
-
Towards Simultaneous Interpreting: the Timing of Incremental Machine Translation and Speech Synthesis. IWSLT 2014. [PDF]
-
Segmentation Strategies for Streaming Speech Translation. NAACL 2014. [PDF]
2015
-
Automated Simultaneous Interpretation: Hints of a Cognitive Framework for Machine Translation. HyTra 2015. [PDF]
-
Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents. ACL 2015. [PDF]
-
Syntax-based Rewriting for Simultaneous Machine Translation. EMNLP 2015. [PDF]
-
Improved Speech-to-Text Translation with the Fisher and Callhome Spanish–English Speech Translation Corpus. IWSLT 2015. [PDF]
2016
-
An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation. WAT 2016. [PDF]
-
Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation. NAACL 2016. [PDF] [Code]
-
Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons. LREC 2016. [PDF]
-
A Prototype Automatic Simultaneous Interpretation System. COLING 2016. [PDF]
-
Simultaneous Machine Translation using Deep Reinforcement Learning. ICML 2016. [PDF]
-
Can neural Machine Translation do Simultaneous Translation? Arxiv 2016. [PDF]
-
Listen and translate: A proof of concept for end-to-end speech-to-text translation. NIPS Workshop 2016. [PDF]
-
An Attentional Model for Speech Translation Without Transcription. NAACL 2016. [PDF]
2017
-
Online and Linear-Time Attention by Enforcing Monotonic Alignments. ICML 2017. [PDF] [Code]
-
Learning to Translate in Real-time with Neural Machine Translation. EACL 2017. [PDF] [Code]
-
Sequence-to-Sequence Models Can Directly Translate Foreign Speech. INTERSPEECH 2017. [PDF]
-
Structured-based Curriculum Learning for End-to-end English-Japanese Speech Translation. INTERSPEECH 2017. [PDF]
-
Towards speech-to-text translation without speech recognition. EACL 2017. [PDF]
2018
-
Simultaneous Translation using Optimized Segmentation. AMTA 2018. [PDF]
-
Automatic Estimation of Simultaneous Interpreter Performance. ACL 2018. [PDF] [Code]
-
Incremental Decoding and Training Methods for Simultaneous Translation in Neural Machine Translation. NAACL 2018. [PDF] [Code]
-
Statistical Analysis of Missing Translation in Simultaneous Interpretation Using A Large-scale Bilingual Speech Corpus. LREC 2018. [PDF]
-
Prediction Improves Simultaneous Neural Machine Translation. EMNLP 2018. [PDF] [Code]
-
KIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning. COLING 2018. [PDF]
-
How2: A Large-scale Dataset for Multimodal Language Understanding. NIPS 2018. [PDF]
-
End-to-End Speech Translation with the Transformer. IberSPEECH 2018. [PDF]
-
Low-Resource Speech-to-Text Translation. INTERSPEECH 2018. [PDF]
-
Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation. LREC 2018. [PDF]
-
Tied multitask learning for neural speech translation. NAACL 2018. [PDF]
-
End-to-End Automatic Speech Translation of Audiobooks. ICASSP 2018. [PDF]
2019
-
Monotonic Infinite Lookback Attention for Simultaneous Machine Translation. ACL 2019. [PDF]
-
STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework. ACL 2019. [PDF]
-
Simultaneous Translation with Flexible Policy via Restricted Imitation Learning. ACL 2019. [PDF]
-
Lost in Interpretation: Predicting Untranslated Terminology in Simultaneous Interpretation. NAACL 2019. [PDF] [Code]
-
Simpler and Faster Learning of Adaptive Policies for Simultaneous Translation. EMNLP 2019. [PDF]
-
Speculative Beam Search for Simultaneous Translation. EMNLP 2019. [PDF]
-
Thinking Slow about Latency Evaluation for Simultaneous Machine Translation. Arxiv 2019. [PDF]
-
DuTongChuan: Context-aware Translation Model for Simultaneous Interpreting. Arxiv 2019. [PDF]
-
Simultaneous Neural Machine Translation using Connectionist Temporal Classification. Arxiv 2019. [PDF]
-
One-To-Many Multilingual End-to-end Speech Translation. ASRU 2019. [PDF]
-
Multilingual End-to-End Speech Translation. ASRU 2019. [PDF]
-
Speech-to-speech Translation between Untranscribed Unknown Languages. ASRU 2019. [PDF]
-
A Comparative Study on End-to-end Speech to Text Translation. ASRU 2019. [PDF]
-
Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade. IWSLT 2019. [PDF]
-
On Using SpecAugment for End-to-End Speech Translation. IWSLT 2019. [PDF]
-
End-to-End Speech Translation with Knowledge Distillation. INTERSPEECH 2019. [PDF]
-
Adapting Transformer to End-to-end Spoken Language Translation. INTERSPEECH 2019. [PDF]
-
Direct speech-to-speech translation with a sequence-to-sequence model. INTERSPEECH 2019. [PDF]
-
Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation. ACL 2019. [PDF]
-
Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation. ACL 2019. [PDF]
-
Pre-training on High-Resource Speech Recognition Improves Low-Resource Speech-to-Text Translation. NAACL 2019. [PDF]
-
MuST-C: a Multilingual Speech Translation Corpus. NAACL 2019. [PDF]
-
Fluent Translations from Disfluent Speech in End-to-End Speech Translation. NAACL 2019. [PDF]
-
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation. ICASSP 2019. [PDF]
-
Towards unsupervised speech-to-text translation. ICASSP 2019. [PDF]
-
Towards End-to-end Speech-to-text Translation with Two-pass Decoding. ICASSP 2019. [PDF]
2020
-
Towards Multimodal Simultaneous Neural Machine Translation. WMT 2020. [PDF] [Code]
-
Opportunistic Decoding with Timely Correction for Simultaneous Translation. ACL 2020. [PDF]
-
Simultaneous Translation Policies: From Fixed to Adaptive. ACL 2020. [PDF]
-
SimulSpeech: End-to-End Simultaneous Speech to Text Translation. ACL 2020. [PDF]
-
Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus. ACL 2020. [PDF]
-
Speech Translation and the End-to-End Promise: Taking Stock of Where We Are. ACL 2020 Theme. [PDF]
-
Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation. ACL 2020. [PDF]
-
Phone Features Improve Speech Translation. ACL 2020. [PDF]
-
Curriculum Pre-training for End-to-End Speech Translation. ACL 2020. [PDF]
-
ESPnet-ST: All-in-One Speech Translation Toolkit. ACL 2020 Demo. [PDF]
-
Learning Adaptive Segmentation Policy for Simultaneous Translation. EMNLP 2020. [PDF]
-
Simultaneous Machine Translation with Visual Context. EMNLP 2020. [PDF] [Code]
-
Direct Segmentation Models for Streaming Speech Translation. EMNLP 2020. [PDF] [Code]
-
Effectively pretraining a speech translation decoder with Machine Translation data. EMNLP 2020. [PDF]
-
SIMULEVAL: An Evaluation Toolkit for Simultaneous Translation. EMNLP 2020 Demo. [PDF] [Code]
-
Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework. EMNLP 2020 Findings. [PDF]
-
Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training. EMNLP 2020 Findings. [PDF]
-
Adaptive Feature Selection for End-to-End Speech Translation. EMNLP 2020 Findings. [PDF]
-
A General Framework for Adaptation of Neural Machine Translation to Simultaneous Translation. AACL 2020. [PDF]
-
SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation. AACL 2020. [PDF] [Code]
-
fairseq S2T: Fast Speech-to-Text Modeling with fairseq. AACL 2020 Demo. [PDF]
-
Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation. AAAI 2020. [PDF]
-
Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding. AAAI 2020. [PDF]
-
Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation. ICASSP 2020. [PDF]
-
Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates. ICASSP 2020. [PDF]
-
Instance-Based Model Adaptation For Direct Speech Translation. ICASSP 2020. [PDF]
-
Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning. ICASSP 2020. [PDF]
-
Analyzing ASR pretraining for low-resource speech-to-text translation. ICASSP 2020. [PDF]
-
End-to-End Speech Translation with Self-Contained Vocabulary Manipulation. ICASSP 2020. [PDF]
-
Efficient Wait-k Models for Simultaneous Machine Translation. InterSpeech 2020. [PDF] [Code]
-
Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection. InterSpeech 2020. [PDF]
-
Relative Positional Encoding for Speech Recognition and Direct Translation. InterSpeech 2020. [PDF]
-
Contextualized Translation of Automatically Segmented Speech. InterSpeech 2020. [PDF]
-
Self-Training for End-to-End Speech Translation. InterSpeech 2020. [PDF]
-
Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation. InterSpeech 2020. [PDF]
-
Self-Supervised Representations Improve End-to-End Speech Translation. InterSpeech 2020. [PDF]
-
Investigating Self-Supervised Pre-Training for End-to-End Speech Translation. InterSpeech 2020. [PDF]
-
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus. LREC 2020. [PDF]
-
MuST-Cinema: a Speech-to-Subtitles corpus. LREC 2020. [PDF]
-
MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible. LREC 2020. [PDF]
-
LibriVoxDeEn: A Corpus for German-to-English Speech Translation and Speech Recognition. LREC 2020. [PDF]
-
On Target Segmentation for Direct Speech Translation. AMTA 2020. [PDF]
-
Consistent Transcription and Translation of Speech. TACL 2020. [PDF]
-
Presenting Simultaneous Translation in Limited Space. Arxiv 2020. [PDF]
-
Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS. Arxiv 2020. [PDF]
-
Low Latency ASR for Simultaneous Speech Translation. Arxiv 2020. [PDF]
-
Bridging the Modality Gap for Speech-to-Text Translation. Arxiv 2020. [PDF]
-
CSTNet: Contrastive Speech Translation Network for Self-Supervised Speech Representation Learning. Arxiv 2020. [PDF]
-
On Knowledge Distillation for Direct Speech Translation. CLiC-IT 2020. [PDF]
-
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation. COLING 2020. [PDF]
-
Breeding Gender-aware Direct Speech Translation Systems. COLING 2020. [PDF]
2021
-
Monotonic Simultaneous Translation with Chunk-wise Reordering and Refinement. WMT 2021. [PDF]
-
Simultaneous Neural Machine Translation with Constituent Label Prediction. WMT 2021. [PDF]
-
Future-Guided Incremental Transformer for Simultaneous Translation. AAAI 2021. [PDF]
-
Studying The Impact Of Document-level Context On Simultaneous Neural Machine Translation. Machine Translation 2021. [PDF]
-
Beyond Sentence-Level End-to-End Speech Translation: Context Helps. ACL 2021. [PDF] [Code]
-
RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer. ACL 2021 Findings. [PDF]
-
Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR. ACL 2021 Findings. [PDF]
-
Multilingual Simultaneous Neural Machine Translation. ACL 2021 Findings. [PDF]
-
Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy. EMNLP 2021. [PDF] [Code]
-
Cross Attention Augmented Transducer Networks for Simultaneous Translation. EMNLP 2021. [PDF] [Code]
-
Translation-based Supervision for Policy Generation in Simultaneous Neural Machine Translation. EMNLP 2021. [PDF] [Code]
-
Improving Simultaneous Translation by Incorporating Pseudo-References with Fewer Reorderings. EMNLP 2021. [PDF]
-
A Generative Framework for Simultaneous Machine Translation. EMNLP 2021. [PDF]
-
It Is Not As Good As You Think! Evaluating Simultaneous Machine Translation on Interpretation Data. EMNLP 2021. [PDF] [Code]
-
Stream-level Latency Evaluation for Simultaneous Machine Translation. EMNLP 2021 Findings. [PDF] [Code]
-
MiSS: An Assistant for Multi-Style Simultaneous Translation. EMNLP 2021 Demo. [PDF]
-
Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning. EACL 2021. [PDF] [Code]
-
Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation. EACL 2021. [PDF] [Code]
-
An Empirical Study Of End-To-End Simultaneous Speech Translation Decoding Strategies. ICASSP 2021. [PDF]
-
Streaming Simultaneous Speech Translation With Augmented Memory Transformer. ICASSP 2021. [PDF]
-
Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation. Interspeech 2021. [PDF]
-
Visualization: the missing factor in Simultaneous Speech Translation. CLIC-it 2021. [PDF]
-
UniST: Unified End-to-end Model for Streaming and Non-streaming Speech Translation. Arxiv 2021. [PDF]
-
Faster Re-translation Using Non-Autoregressive Model For Simultaneous Neural Machine Translation. Arxiv 2021. [PDF]
-
Learning to Use Future Information in Simultaneous Translation. Arxiv 2021. [PDF]
-
Simultaneous Multi-Pivot Neural Machine Translation. Arxiv 2021. [PDF]
-
Full-Sentence Models Perform Better in Simultaneous Translation Using the Information Enhanced Decoding Strategy. Arxiv 2021. [PDF]
-
Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems. Arxiv 2021. [PDF]
-
Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention. Arxiv 2021. [PDF]
-
SimulSLT: End-to-End Simultaneous Sign Language Translation. Arxiv 2021. [PDF]
-
Efficient Transformer for Direct Speech Translation. Arxiv 2021. [PDF]
-
Zero-shot Speech Translation. Arxiv 2021. [PDF]
-
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates. ASRU 2021. [PDF]
-
Assessing Evaluation Metrics for Speech-to-Speech Translation. ASRU 2021. [PDF]
-
Enabling Zero-shot Multilingual Spoken Language Translation with Language-Specific Encoders and Decoders. ASRU 2021. [PDF]
-
Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation. ICNLSP 2021. [PDF]
-
Speechformer: Reducing Information Loss in Direct Speech Translation. EMNLP 2021. [PDF]
-
Is “moby dick” a Whale or a Bird? Named Entities and Terminology in Speech Translation. EMNLP 2021. [PDF]
-
Mutual-Learning Improves End-to-End Speech Translation. EMNLP 2021. [PDF]
-
End-to-end Speech Translation via Cross-modal Progressive Training. Interspeech 2021. [PDF]
-
CoVoST 2 and Massively Multilingual Speech-to-Text Translation. Interspeech 2021. [PDF]
-
The Multilingual TEDx Corpus for Speech Recognition and Translation. Interspeech 2021. [PDF]
-
Large-Scale Self-and Semi-Supervised Learning for Speech Translation. Interspeech 2021. [PDF]
-
Kosp2e: Korean Speech to English Translation Corpus. Interspeech 2021. [PDF]
-
AlloST: Low-resource Speech Translation without Source Transcription. Interspeech 2021. [PDF]
-
SpecRec: An Alternative Solution for Improving End-to-End Speech-to-Text Translation via Spectrogram Reconstruction. Interspeech 2021. [PDF]
-
Optimally Encoding Inductive Biases into the Transformer Improves End-to-End Speech Translation. Interspeech 2021. [PDF]
-
ASR Posterior-based Loss for Multi-task End-to-end Speech Translation. Interspeech 2021. [PDF]
-
Simultaneous Speech Translation for Live Subtitling: from Delay to Display. AMTA 2021. [PDF]
-
Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders. ACL 2021. [PDF]
-
Multilingual Speech Translation with Efficient Finetuning of Pretrained Models. ACL 2021. [PDF]
-
Lightweight Adapter Tuning for Multilingual Speech Translation. ACL 2021. [PDF]
-
Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference? ACL 2021. [PDF]
-
Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task. ACL 2021. [PDF]
-
AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation. ACL 2021 Findings. [PDF]
-
Learning Shared Semantic Space for Speech-to-Text Translation. ACL 2021 Findings. [PDF]
-
Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation. ACL 2021 Findings. [PDF]
-
How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation. ACL 2021 Findings. [PDF]
-
NeurST: Neural Speech Translation Toolkit. ACL 2021 Demo. [PDF]
-
Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation. ICML 2021. [PDF]
-
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation. NAACL 2021. [PDF]
-
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks. NAACL 2021. [PDF]
-
BSTC: A Large-Scale Chinese-English Speech Translation Dataset. NAACL AutoSimTrans 2021. [PDF]
-
Highland Puebla Nahuatl–Spanish Speech Translation Corpus for Endangered Language Documentation. AmericasNLP 2021. [PDF]
-
Task Aware Multi-Task Learning for Speech to Text Tasks. ICASSP 2021. [PDF]
-
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks. ICASSP 2021. [PDF]
-
Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder. ICASSP 2021. [PDF]
-
Cascaded Models With Cyclic Feedback For Direct Speech Translation. ICASSP 2021. [PDF]
-
Jointly Trained Transformers models for Spoken Language Translation. ICASSP 2021. [PDF]
-
Efficient Use of End-to-end Data in Spoken Language Processing. ICASSP 2021. [PDF]
-
CTC-based Compression for Direct Speech Translation. EACL 2021. [PDF]
-
Streaming Models for Joint Speech Recognition and Translation. EACL 2021. [PDF]
-
mintzai-ST: Corpus and Baselines for Basque-Spanish Speech Translation. IberSPEECH 2021. [PDF]
-
Consecutive Decoding for Speech-to-text Translation. AAAI 2021. [PDF]
-
UWSpeech: Speech to Speech Translation for Unwritten Languages. AAAI 2021. [PDF]
-
“Listen, Understand and Translate”: Triple Supervision Decouples End-to-end Speech-to-text Translation. AAAI 2021. [PDF]
-
Tight Integrated End-to-End Training for Cascaded Speech Translation. SLT 2021. [PDF]
-
Transformer-based Direct Speech-to-speech Translation with Transcoder. SLT 2021. [PDF]
2022
-
Modeling Dual Read/Write Paths for Simultaneous Machine Translation. ACL 2022. [PDF] [Code]
-
Reducing Position Bias in Simultaneous Machine Translation with Length-Aware Framework. ACL 2022. [PDF]
-
From Simultaneous to Streaming Machine Translation by Leveraging Streaming History. ACL 2022. [PDF]
-
Learning When to Translate for Streaming Speech. ACL 2022. [PDF] [Code]
-
Learning Adaptive Segmentation Policy for End-to-End Simultaneous Translation. ACL 2022. [PDF]
-
Gaussian Multi-head Attention for Simultaneous Machine Translation. ACL 2022 Findings. [PDF] [Code]
-
Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation. ACL 2022. [PDF]
-
UniST: Unified End-to-end Model for Streaming and Non-streaming Speech Translation. ACL 2022. [PDF]
-
Direct speech-to-speech translation with discrete units. ACL 2022. [PDF]
-
STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation. ACL 2022. [PDF]
-
End-to-End Speech Translation for Code Switched Speech. ACL 2022 Findings. [PDF]
-
Language Model Augmented Monotonic Attention for Simultaneous Translation. NAACL 2022. [PDF]
-
Textless Speech-to-Speech Translation on Real Data. NAACL 2022. [PDF]
-
Information-Transport-based Policy for Simultaneous Translation. EMNLP 2022. [PDF] [Code]
-
Wait-info Policy: Balancing Source and Target at Information Level for Simultaneous Machine Translation. EMNLP 2022 Findings. [PDF] [Code]
-
Turning Fixed to Adaptive: Integrating Post-Evaluation into Simultaneous Machine Translation. EMNLP 2022 Findings. [PDF] [Code]
-
Does Simultaneous Speech Translation need Simultaneous Models? EMNLP 2022 Findings. [PDF] [Code]
-
RedApt: An Adaptor for WAV2VEC 2 Encoding Faster and Smaller Speech Translation without Quality Compromise. EMNLP 2022 Findings. [PDF]
-
Revisiting End-to-End Speech-to-Text Translation From Scratch. ICML 2022. [PDF]
-
Translatotron 2: Robust direct speech-to-speech translation. ICML 2022. [PDF]
-
Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation. InterSpeech 2022. [PDF] [Code]
-
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation. InterSpeech 2022. [PDF]
-
Multilingual Simultaneous Speech Translation. InterSpeech 2022. [PDF] [Code]
-
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation. InterSpeech 2022. [PDF] [Code]
-
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation. InterSpeech 2022. [PDF]
-
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers. InterSpeech 2022. [PDF]
-
Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation. InterSpeech 2022. [PDF]
-
Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation. InterSpeech 2022. [PDF]
-
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation. InterSpeech 2022. [PDF]
-
M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation. InterSpeech 2022. [PDF]
-
Supervised Visual Attention for Simultaneous Multimodal Machine Translation. JAIR 2022. [PDF]
-
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation. LREC 2022. [PDF]
-
LibriS2S: A German-English Speech-to-Speech Translation Corpus. LREC 2022. [PDF]
-
Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques. ICASSP 2022. [PDF]
-
Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement. AAAI 2022. [PDF]
-
Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing. Neural Networks 2022. [PDF]
-
Comprehension of Subtitles from Re-Translating Simultaneous Speech Translation. Arxiv 2022. [PDF]
-
Data-Driven Adaptive Simultaneous Machine Translation. Arxiv 2022. [PDF]
-
Simultaneous Translation for Unsegmented Input: A Sliding Window Approach. Arxiv 2022. [PDF]
-
MT Metrics Correlate with Human Ratings of Simultaneous Speech Translation (Technical Report). Arxiv 2022. [PDF]
-
Attention as a guide for Simultaneous Speech Translation. Arxiv 2022. [PDF]
-
AdaTranS: Adapting with Boundary-based Shrinking for End-to-End Speech Translation. Arxiv 2022. [PDF]
-
Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features. Arxiv 2022. [PDF]
-
ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English. Arxiv 2022. [PDF]
-
Prabhupadavani: A Code-mixed Speech Translation Data for 25 Languages. Arxiv 2022. [PDF]
2023
-
Tuning Large language model for End-to-end Speech Translation. Arxiv 2023. [PDF]
-
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning. Arxiv 2023. [PDF]
-
Multilingual Speech-to-Speech Translation into Multiple Target Languages. Arxiv 2023. [PDF]
-
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition. ICCV 2023. [PDF]
-
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation. InterSpeech 2023. [PDF]
-
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer. InterSpeech 2023. [PDF]
-
Joint Speech Translation and Named Entity Recognition. InterSpeech 2023. [PDF]
-
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation. InterSpeech 2023. [PDF]
-
Knowledge Distillation on Joint Task End-to-End Speech Translation. InterSpeech 2023. [PDF]
-
GigaST: A 10,000-hour Pseudo Speech Translation Corpus. InterSpeech 2023. [PDF]
-
Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation. InterSpeech 2023. [PDF]
-
HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation. InterSpeech 2023. [PDF]
-
Pre-training for Speech Translation: CTC Meets Optimal Transport. ICML 2023. [PDF]
-
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units. ACL 2023. [PDF]
-
Simple and effective unsupervised speech translation. ACL 2023. [PDF]
-
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric. ACL 2023. [PDF]
-
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations. ACL 2023. [PDF]
-
Understanding and Bridging the Modality Gap for Speech Translation. ACL 2023. [PDF]
-
Back Translation for Speech-to-text Translation Without Transcripts. ACL 2023. [PDF]
-
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation. ACL 2023. [PDF]
-
WACO: Word-Aligned Contrastive Learning for Speech Translation. ACL 2023. [PDF]
-
Speech-to-Speech Translation for a Real-world Unwritten Language. ACL 2023 Findings. [PDF]
-
CKDST: Comprehensively and Effectively Distill Knowledge from Machine Translation to End-to-End Speech Translation. ACL 2023 Findings. [PDF]
-
Duplex Diffusion Models Improve Speech-to-Speech Translation. ACL 2023 Findings. [PDF]
-
DUB: Discrete Unit Back-translation for Speech Translation. ACL 2023 Findings. [PDF]
-
Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data. ACL 2023 Findings. [PDF]
-
Textless Direct Speech-to-Speech Translation with Discrete Speech Representation. ICASSP 2023. [PDF]
-
M3ST: Mix at Three Levels for Speech Translation. ICASSP 2023. [PDF]
-
Generating Synthetic Speech from SpokenVocab for Speech Translation. EACL 2023 Findings. [PDF]
-
Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data. AAAI 2023. [PDF]
-
Improving Simultaneous Machine Translation with Monolingual Data. AAAI 2023. [PDF] [Code]
-
Hidden Markov Transformer for Simultaneous Machine Translation. ICLR 2023. [PDF] [Code]
-
Rethinking the Reasonability of the Test Set for Simultaneous Machine Translation. ICASSP 2023. [PDF]
-
LEAPT: Learning Adaptive Prefix-to-prefix Translation For Simultaneous Machine Translation. ICASSP 2023. [PDF]
-
Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks. ACL 2023. [PDF][Code]
-
Learning Optimal Policy for Simultaneous Machine Translation via Binary Search. ACL 2023. [PDF][Code]
-
Better Simultaneous Translation with Monotonic Knowledge Distillation. ACL 2023. [PDF][Code]
-
Attention as a Guide for Simultaneous Speech Translation. ACL 2023. [PDF][Code]
-
End-to-End Simultaneous Speech Translation with Differentiable Segmentation. ACL 2023 Findings. [PDF][Code]
-
Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation. ACL 2023 Findings. [PDF][Code]
-
Japanese-to-English Simultaneous Dubbing Prototype. ACL 2023 Demo. [PDF]
-
AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation. InterSpeech 2023. [PDF]
-
Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models. InterSpeech 2023. [PDF][Code]
-
LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers. InterSpeech 2023. [PDF]
-
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff. InterSpeech 2023. [PDF]
-
Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation. ICML 2023. [PDF][Code]
-
Non-autoregressive Streaming Transformer for Simultaneous Translation. EMNLP 2023. [PDF][Code]
-
Adaptive Policy with Wait-k Model for Simultaneous Translation. EMNLP 2023. [PDF][Code]
-
Simultaneous Machine Translation with Tailored Reference. EMNLP 2023 Findings. [PDF][Code]
-
Enhanced Simultaneous Machine Translation with Word-level Policies. EMNLP 2023 Findings. [PDF][Code]
-
Long-form Simultaneous Speech Translation: Thesis Proposal. AACL 2023. [PDF]
-
Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach. ASRU 2023. [PDF]
-
Average Token Delay: A Latency Metric for Simultaneous Translation. Arxiv 2023. [PDF]
-
Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference. Arxiv 2023. [PDF]
-
End-to-End Evaluation for Low-Latency Simultaneous Speech Translation. Arxiv 2023. [PDF][Code]
-
Simultaneous Machine Translation with Large Language Models. Arxiv 2023. [PDF]
-
CBSiMT: Mitigating Hallucination in Simultaneous Machine Translation with Weighted Prefix-to-Prefix Training. Arxiv 2023. [PDF]
-
Context Consistency between Training and Testing in Simultaneous Machine Translation. Arxiv 2023. [PDF][Code]
-
Seamless: Multilingual Expressive and Streaming Speech Translation. Arxiv 2023. [PDF][Code]
-
Unified Segment-to-Segment Framework for Simultaneous Sequence Generation. NeurIPS 2023. [PDF][Code]
-
Efficient Monotonic Multihead Attention. Arxiv 2023. [PDF][Code]
2024
-
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning. ACL 2024. [PDF][Code][Project]
-
Decoder-only Streaming Transformer for Simultaneous Translation. ACL 2024. [PDF][Code]
-
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation. ACL 2024. [Code][Project]
-
Self-Modifying State Modeling for Simultaneous Machine Translation. ACL 2024. [PDF][Code]
-
Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models. Arxiv 2023. [PDF][Code]
-
Glancing Future for Simultaneous Machine Translation. ICASSP 2024. [PDF][Code]
-
Language Model is a Branch Predictor for Simultaneous Machine Translation. ICASSP 2024. [PDF][Code]
-
R-BI: Regularized Batched Inputs enhance Incremental Decoding Framework for Low-Latency Simultaneous Speech Translation. Arxiv 2024. [PDF]
-
SimulTron: On-Device Simultaneous Speech to Speech Translation. Arxiv 2024. [PDF]
-
Recent Advances in End-to-End Simultaneous Speech Translation. Arxiv 2024. [PDF]
-
Simultaneous Masking, Not Prompting Optimization: A Paradigm Shift in Fine-tuning LLMs for Simultaneous Translation. Arxiv 2024. [PDF]
-
SiLLM: Large Language Models for Simultaneous Machine Translation. Arxiv 2024. [PDF][Code]
-
Conversational SimulMT: Efficient Simultaneous Translation with Large Language Models. Arxiv 2024. [PDF]
-
TransLLaMa: LLM-based Simultaneous Translation System. Arxiv 2024. [PDF]
2025
-
Rethinking Cascaded Speech-to-Text Translation. Arxiv 2025. [PDF]
-
High-Fidelity Simultaneous Speech-To-Speech Translation. Arxiv 2025. [PDF]
-
SpeechT: Findings of the First Mentorship in Speech Translation. Arxiv 2025. [PDF]
-
InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model. Arxiv 2025. [PDF]
-
Direct Speech to Speech Translation: A Review. Arxiv 2025. [PDF]
-
Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation. Arxiv 2025. [PDF]
-
AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation. Arxiv 2025. [PDF]
-
Efficient and Adaptive Simultaneous Speech Translation with Fully Unidirectional Architecture. Arxiv 2025. [PDF]
-
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs. Arxiv 2025. [PDF]
-
Large Model Empowered Streaming Semantic Communications for Speech Translation. Arxiv 2025. [PDF]
-
Speech Translation Refinement using Large Language Models. Arxiv 2025. [PDF]
-
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation. Arxiv 2025. [PDF]
-
Bemba Speech Translation: Exploring a Low-Resource African Language. Arxiv 2025. [PDF]
-
Language translation, and change of accent for speech-to-speech translation. Arxiv 2025. [PDF]
-
Improve Speech Translation Through Text Rewrite. COLING Industry 2025. [PDF]
Tutorial
- INTERSPEECH 2019 survey talk: Spoken Language Translation
- ACL 2020 Theme paper: Speech Translation and the End-to-End Promise: Taking Stock of Where We Are
- EACL 2021 tutorial: Speech Translation
- Blog: Getting Started with End-to-End Speech Translation
Workshops
IWSLT 2020
-
ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020. [PDF]
-
Start-Before-End and End-to-End: Neural Speech Translation by AppTek and RWTH Aachen University. [PDF]
-
KIT’s IWSLT 2020 SLT Translation System. [PDF]
-
End-to-End Simultaneous Translation System for IWSLT2020 Using Modality Agnostic Meta-Learning. [PDF]
-
ELITR Non-Native Speech Translation at IWSLT 2020. [PDF]
-
Re-translation versus Streaming for Simultaneous Translation. [PDF]
-
Towards Stream Translation: Adaptive Computation Time for Simultaneous Machine Translation. [PDF]
-
Neural Simultaneous Speech Translation Using Alignment-Based Chunking. [PDF]
AutoSimTrans 2020
- Dynamic Sentence Boundary Detection for Simultaneous Translation. [PDF]
ASLTRW 2021
-
Operating a Complex SLT System with Speakers and Human Interpreters. [PDF]
-
Simultaneous Speech Translation for Live Subtitling: from Delay to Display. [PDF]
IWSLT 2021
-
The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021. [PDF]
-
NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task. [PDF]
-
The University of Edinburgh’s Submission to the IWSLT21 Simultaneous Translation Task. [PDF]
-
Without Further Ado: Direct and Simultaneous Speech Translation by AppTek in 2021. [PDF]
-
The Volctrans Neural Speech Translation System for IWSLT 2021. [PDF]
-
Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data. [PDF]
-
Towards the evaluation of automatic simultaneous speech translation from a communicative perspective. [PDF]
-
Tag Assisted Neural Machine Translation of Film Subtitles. [PDF]
AutoSimTrans 2021
-
ICT’s System for AutoSimTrans 2021: Robust Char-Level Simultaneous Translation. [PDF]
-
BIT’s system for AutoSimulTrans2021. [PDF]
-
XMU’s Simultaneous Translation System at NAACL 2021. [PDF]
-
System Description on Automatic Simultaneous Translation Workshop. [PDF]
-
BSTC: A Large-Scale Chinese-English Speech Translation Dataset. [PDF]
IWSLT 2022
-
Simultaneous Neural Machine Translation with Prefix Alignment. [PDF]
-
Anticipation-Free Training for Simultaneous Machine Translation. [PDF]
-
The AISP-SJTU Simultaneous Translation System for IWSLT 2022. [PDF]
-
The Xiaomi Text-to-Text Simultaneous Speech Translation System for IWSLT 2022. [PDF]
-
The HW-TSC’s Simultaneous Speech Translation System for IWSLT 2022 Evaluation. [PDF]
-
MLLP-VRAIN UPV systems for the IWSLT 2022 Simultaneous Speech Translation and Speech-to-Speech Translation tasks. [PDF]
-
CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022. [PDF]
-
NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022. [PDF]
AutoSimTrans 2022
-
Over-Generation Cannot Be Rewarded: Length-Adaptive Average Lagging for Simultaneous Speech Translation. [PDF]
-
System Description on Automatic Simultaneous Translation Workshop. [PDF]
-
System Description on Third Automatic Simultaneous Translation Workshop. [PDF]
-
End-to-End Simultaneous Speech Translation with Pretraining and Distillation: Huawei Noah’s System for AutoSimTranS 2022. [PDF]
-
BIT-Xiaomi’s System for AutoSimTrans 2022. [PDF]
-
USST’s System for AutoSimTrans 2022. [PDF]
IWSLT 2023
-
Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023. [PDF]
-
MT Metrics Correlate with Human Ratings of Simultaneous Speech Translation. [PDF]
-
CMU’s IWSLT 2023 Simultaneous Speech Translation System. [PDF]
-
NAIST Simultaneous Speech-to-speech Translation System for IWSLT 2023. [PDF]
-
Language Model Based Target Token Importance Rescaling for Simultaneous Neural Machine Translation. [PDF]
-
Tagged End-to-End Simultaneous Speech Translation Training Using Simultaneous Interpretation Data. [PDF]
-
The HW-TSC’s Simultaneous Speech-to-Text Translation System for IWSLT 2023 Evaluation. [PDF]
-
The HW-TSC’s Simultaneous Speech-to-Speech Translation System for IWSLT 2023 Evaluation. [PDF]
-
Towards Efficient Simultaneous Speech Translation: CUNI-KIT System for Simultaneous Track at IWSLT 2023. [PDF]
-
The Xiaomi AI Lab’s Speech Translation Systems for IWSLT 2023 Offline Task, Simultaneous Task and Speech-to-Speech Task. [PDF]