Summary

Summary english essay

Pages: 12
Uploaded on: 27-04-2025
Written in: 2024/2025

The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally.

Institution
Freshman / 9th Grade
Course
English language and composition

Content preview

The University of Edinburgh's Submissions to the WMT19
News Translation Task

Rachel Bawden, Nikolay Bogoychev, Ulrich Germann, Roman Grundkiewicz, Faheem Kirefu, Antonio
Valerio Miceli Barone, Alexandra Birch

arXiv:1907.05854v1

Generated on April 27, 2025


Abstract
The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six
language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English,
German-to-English, and English-to-Czech. For all translation directions, we created or used
back-translations of monolingual data in the target language as additional synthetic training data. For
English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training,
and translation pivoting through Hindi. For translation to and from Chinese, we investigated
character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we
studied the impact of vast amounts of back-translated training data on translation quality, gaining a few
additional insights over Edunov et al. (2018). For English-to-Czech, we compared different
pre-processing and tokenisation regimes.

School of Informatics, University of Edinburgh, Scotland

1 Introduction

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English–Gujarati (EN↔GU), English–Chinese (EN↔ZH), German–English (DE→EN) and English–Czech (EN→CS). All our systems are neural machine translation (NMT) systems trained in constrained data conditions with the Marian¹ toolkit (Junczys-Dowmunt et al., 2018). The different language pairs pose very different challenges, due to the characteristics of the languages involved and, arguably more importantly, due to the amount of training data available.
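The character-based tokenisation vs. subword segmentation comparison mentioned in the abstract can be illustrated with a minimal sketch. The subword vocabulary below is a toy, hand-picked set, not a trained model (real systems learn merge units from data, e.g. with a BPE or SentencePiece model), and the greedy longest-match segmenter is a simplification of how learned vocabularies are applied:

```python
# Character-level tokenisation vs. toy subword segmentation for Chinese.
def char_tokenise(text: str) -> list:
    """Character-level: every Chinese character is its own token."""
    return list(text)

def subword_tokenise(text: str, vocab: set) -> list:
    """Greedy longest-match segmentation against a subword vocabulary;
    characters not covered by the vocabulary fall back to single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest span first
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

toy_vocab = {"机器", "翻译", "大学"}            # hypothetical learned units
print(char_tokenise("机器翻译"))                # ['机', '器', '翻', '译']
print(subword_tokenise("机器翻译", toy_vocab))  # ['机器', '翻译']
```

Character-level input gives longer sequences but no segmentation ambiguity; subword units give shorter sequences at the cost of a learned vocabulary, which is the trade-off the EN↔ZH experiments probe.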
Pre-processing  For EN↔ZH, we investigate character-level pre-processing for Chinese compared with subword segmentation. For EN→CS, we show that it is possible in high-resource settings to simplify pre-processing by removing steps.

Exploiting non-parallel resources  For all language directions, we create additional, synthetic parallel training data. For the high-resource language pairs, we look at ways of effectively using large quantities of back-translated data. For example, for DE→EN, we investigated the most effective way of combining genuine parallel data with larger quantities of synthetic parallel data, and for CS→EN, we filter back-translated data by rescoring translations using the MT model for the opposite direction. The challenge for our low-resource pair, EN↔GU, is producing sufficiently good models for back-translation, which we achieve by training semi-supervised MT models with cross-lingual language model pre-training (Lample and Conneau, 2019). We use the same technique to translate additional data from a related language, Hindi.

NMT Training settings  In all experiments, we test state-of-the-art training techniques, including using ultra-large mini-batches for DE→EN and EN↔ZH, implemented as optimiser delay.

Results summary  Official automatic evaluation results for all final systems on the WMT19 test set are summarised in Table 1. Throughout the paper, BLEU is calculated using SacreBLEU (Post, 2018) unless otherwise

¹ https://marian-nmt.github.io
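The back-translation pipeline described above can be sketched minimally: a reverse-direction model produces synthetic source sentences for monolingual target data, and the resulting pairs are filtered by rescoring with the opposite-direction model (as done for CS→EN). Both model functions here are illustrative stand-ins, not real NMT systems; a real setup would score with the reverse model's length-normalised log-probability rather than the crude length-agreement heuristic used below:

```python
def reverse_translate(target: str) -> str:
    """Stand-in for a trained target-to-source NMT model."""
    return "synthetic({})".format(target)

def opposite_model_score(source: str, target: str) -> float:
    """Stand-in scorer: crude length agreement between source and target.
    Higher means the pair looks better; a real scorer would use the
    opposite-direction model's probability of source given target."""
    return min(len(source), len(target)) / max(len(source), len(target), 1)

def build_synthetic_corpus(monolingual_target, keep_fraction=0.5):
    """Back-translate monolingual target text, then keep only the
    best-scoring fraction of the synthetic (source, target) pairs."""
    pairs = [(reverse_translate(t), t) for t in monolingual_target]
    pairs.sort(key=lambda p: opposite_model_score(*p), reverse=True)
    n_keep = max(1, int(len(pairs) * keep_fraction))
    return pairs[:n_keep]

mono = ["a clean target sentence", "x"]
corpus = build_synthetic_corpus(mono, keep_fraction=0.5)
```

The filtering step matters most when the back-translation model is weak, which is exactly the situation the paper describes for the low-resource EN↔GU pair.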

