Summary

Summary english essay

Pages: 12
Uploaded on: 27-04-2025
Written in: 2024/2025

The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally.

Institution
Freshman / 9th Grade
Course
English language and composition

Content preview

The University of Edinburgh's Submissions to the WMT19
News Translation Task

Rachel Bawden, Nikolay Bogoychev, Ulrich Germann, Roman Grundkiewicz, Faheem Kirefu, Antonio
Valerio Miceli Barone, Alexandra Birch

arXiv:1907.05854v1

Generated on April 27, 2025


Abstract
The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six
language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English,
German-to-English, and English-to-Czech. For all translation directions, we created or used
back-translations of monolingual data in the target language as additional synthetic training data. For
English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training,
and translation pivoting through Hindi. For translation to and from Chinese, we investigated
character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we
studied the impact of vast amounts of back-translated training data on translation quality, gaining a few
additional insights over Edunov et al. (2018). For English-to-Czech, we compared different
pre-processing and tokenisation regimes.

School of Informatics, University of Edinburgh, Scotland

1 Introduction

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English–Gujarati (EN↔GU), English–Chinese (EN↔ZH), German–English (DE→EN) and English–Czech (EN→CS). All our systems are neural machine translation (NMT) systems trained in constrained data conditions with the Marian¹ toolkit (Junczys-Dowmunt et al., 2018). The different language pairs pose very different challenges, due to the characteristics of the languages involved and, arguably more importantly, due to the amount of training data available.
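The character-based tokenisation vs. subword segmentation comparison mentioned in the abstract can be illustrated with a minimal sketch. The subword vocabulary below is a toy, hand-picked set, not a trained model (real systems learn merge units from data, e.g. with a BPE or SentencePiece model), and the greedy longest-match segmenter is a simplification of how learned vocabularies are applied:

```python
# Character-level tokenisation vs. toy subword segmentation for Chinese.
def char_tokenise(text: str) -> list:
    """Character-level: every Chinese character is its own token."""
    return list(text)

def subword_tokenise(text: str, vocab: set) -> list:
    """Greedy longest-match segmentation against a subword vocabulary;
    characters not covered by the vocabulary fall back to single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest span first
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

toy_vocab = {"机器", "翻译", "大学"}            # hypothetical learned units
print(char_tokenise("机器翻译"))                # ['机', '器', '翻', '译']
print(subword_tokenise("机器翻译", toy_vocab))  # ['机器', '翻译']
```

Character-level input gives longer sequences but no segmentation ambiguity; subword units give shorter sequences at the cost of a learned vocabulary, which is the trade-off the EN↔ZH experiments probe.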
Pre-processing  For EN↔ZH, we investigate character-level pre-processing for Chinese compared with subword segmentation. For EN→CS, we show that it is possible in high-resource settings to simplify pre-processing by removing steps.

Exploiting non-parallel resources  For all language directions, we create additional, synthetic parallel training data. For the high-resource language pairs, we look at ways of effectively using large quantities of back-translated data. For example, for DE→EN, we investigated the most effective way of combining genuine parallel data with larger quantities of synthetic parallel data, and for CS→EN, we filter back-translated data by rescoring translations using the MT model for the opposite direction. The challenge for our low-resource pair, EN↔GU, is producing sufficiently good models for back-translation, which we achieve by training semi-supervised MT models with cross-lingual language model pre-training (Lample and Conneau, 2019). We use the same technique to translate additional data from a related language, Hindi.

NMT Training settings  In all experiments, we test state-of-the-art training techniques, including using ultra-large mini-batches for DE→EN and EN↔ZH, implemented as optimiser delay.

Results summary  Official automatic evaluation results for all final systems on the WMT19 test set are summarised in Table 1. Throughout the paper, BLEU is calculated using SacreBLEU (Post, 2018) unless otherwise

¹ https://marian-nmt.github.io
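The back-translation pipeline described above can be sketched minimally: a reverse-direction model produces synthetic source sentences for monolingual target data, and the resulting pairs are filtered by rescoring with the opposite-direction model (as done for CS→EN). Both model functions here are illustrative stand-ins, not real NMT systems; a real setup would score with the reverse model's length-normalised log-probability rather than the crude length-agreement heuristic used below:

```python
def reverse_translate(target: str) -> str:
    """Stand-in for a trained target-to-source NMT model."""
    return "synthetic({})".format(target)

def opposite_model_score(source: str, target: str) -> float:
    """Stand-in scorer: crude length agreement between source and target.
    Higher means the pair looks better; a real scorer would use the
    opposite-direction model's probability of source given target."""
    return min(len(source), len(target)) / max(len(source), len(target), 1)

def build_synthetic_corpus(monolingual_target, keep_fraction=0.5):
    """Back-translate monolingual target text, then keep only the
    best-scoring fraction of the synthetic (source, target) pairs."""
    pairs = [(reverse_translate(t), t) for t in monolingual_target]
    pairs.sort(key=lambda p: opposite_model_score(*p), reverse=True)
    n_keep = max(1, int(len(pairs) * keep_fraction))
    return pairs[:n_keep]

mono = ["a clean target sentence", "x"]
corpus = build_synthetic_corpus(mono, keep_fraction=0.5)
```

The filtering step matters most when the back-translation model is weak, which is exactly the situation the paper describes for the low-resource EN↔GU pair.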

