Summary

Summary: English essay

Pages
12
Uploaded on
27-04-2025
Written in
2024/2025

The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally.

Institution
Freshman / 9th Grade
Course
English language and composition












Content preview

The University of Edinburgh's Submissions to the WMT19
News Translation Task

Rachel Bawden, Nikolay Bogoychev, Ulrich Germann, Roman Grundkiewicz, Faheem Kirefu, Antonio
Valerio Miceli Barone, Alexandra Birch

arXiv (arXiv: 1907.05854v1)

Generated on April 27, 2025



Abstract
The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six
language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English,
German-to-English, and English-to-Czech. For all translation directions, we created or used
back-translations of monolingual data in the target language as additional synthetic training data. For
English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training,
and translation pivoting through Hindi. For translation to and from Chinese, we investigated
character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we
studied the impact of vast amounts of back-translated training data on translation quality, gaining a few
additional insights over Edunov et al. (2018). For English-to-Czech, we compared different
pre-processing and tokenisation regimes.
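Back-translation, as described above, turns target-language monolingual text into synthetic (source, target) training pairs by running it through a reverse-direction model. A minimal sketch of that recipe follows, with a toy word-lookup function standing in for a trained reverse model; the lexicon, function names, and mixing ratio are illustrative assumptions, not the authors' Marian setup:

```python
# Back-translation sketch: genuine parallel data is augmented with synthetic
# pairs whose source side is machine-translated from target-language
# monolingual text. reverse_model is a toy stand-in for a real MT system.

def reverse_model(target_sentence: str) -> str:
    """Toy EN->DE 'model': word-for-word lookup, unknown words kept as-is."""
    lexicon = {"the": "die", "cat": "Katze", "sleeps": "schlaeft"}
    return " ".join(lexicon.get(w, w) for w in target_sentence.split())

def back_translate(monolingual_target: list[str]) -> list[tuple[str, str]]:
    """Return (synthetic_source, genuine_target) training pairs."""
    return [(reverse_model(t), t) for t in monolingual_target]

def mix_corpora(parallel, synthetic, synthetic_ratio=1.0):
    """Combine genuine and synthetic pairs; the ratio caps synthetic volume,
    mirroring the question of how much back-translated data to add."""
    n = int(len(parallel) * synthetic_ratio)
    return parallel + synthetic[:n]

mono = ["the cat sleeps"]
synthetic = back_translate(mono)
corpus = mix_corpora([("die Katze schlaeft", "the cat sleeps")], synthetic)
```

The target side of each synthetic pair is genuine text, which is why training on such pairs improves fluency in the target language even when the synthetic source side is noisy.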

1 Introduction

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-Gujarati (EN↔GU), English-Chinese (EN↔ZH), German-English (DE→EN) and English-Czech (EN→CS). All our systems are neural machine translation (NMT) systems trained in constrained data conditions with the Marian toolkit[1] (Junczys-Dowmunt et al., 2018). The different language pairs pose very different challenges, due to the characteristics of the languages involved and, arguably more importantly, due to the amount of training data available.
Pre-processing. For EN↔ZH, we investigate character-level pre-processing for Chinese compared with subword segmentation. For EN→CS, we show that it is possible in high resource settings to simplify pre-processing by removing steps.

Exploiting non-parallel resources. For all language directions, we create additional, synthetic parallel training data. For the high resource language pairs, we look at ways of effectively using large quantities of back-translated data. For example, for DE→EN, we investigated the most effective way of combining genuine parallel data with larger quantities of synthetic parallel data, and for CS→EN, we filter back-translated data by re-scoring translations using the MT model for the opposite direction. The challenge for our low resource pair, EN↔GU, is producing sufficiently good models for back-translation, which we achieve by training semi-supervised MT models with cross-lingual language model pre-training (Lample and Conneau, 2019). We use the same technique to translate additional data from a related language, Hindi.

NMT training settings. In all experiments, we test state-of-the-art training techniques, including using ultra-large mini-batches for DE→EN and EN↔ZH, implemented as optimiser delay.

Results summary. Official automatic evaluation results for all final systems on the WMT19 test set are summarised in Table 1. Throughout the paper, BLEU is calculated using SacreBLEU (Post, 2018) unless otherwise stated.

[1] https://marian-nmt.github.io
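Optimiser delay is gradient accumulation: gradients from several consecutive mini-batches are summed and the optimiser steps only once, which behaves like a single ultra-large mini-batch. A toy sketch on a one-parameter least-squares model follows; the SGD setup and all names are illustrative assumptions, not the authors' Marian configuration:

```python
# Gradient-accumulation sketch of "optimiser delay": accumulate gradients
# over `delay` mini-batches, then take one averaged optimiser step,
# simulating a mini-batch `delay` times larger.

def grad(w: float, batch: list[tuple[float, float]]) -> float:
    """Gradient of mean squared error for the model y = w*x on one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(batches, delay: int, lr: float = 0.1, w: float = 0.0) -> float:
    accum, seen = 0.0, 0
    for batch in batches:
        accum += grad(w, batch)        # delay the optimiser: just accumulate
        seen += 1
        if seen == delay:              # one step per `delay` mini-batches
            w -= lr * (accum / delay)  # average, as one large batch would
            accum, seen = 0.0, 0
    return w

# Two mini-batches with delay=2 match one batch of double the size.
batches = [[(1.0, 2.0)], [(2.0, 4.0)]]
w_delayed = train(batches, delay=2)
w_large = train([[(1.0, 2.0), (2.0, 4.0)]], delay=1)
```

The payoff is a larger effective batch without extra memory: only the running gradient sum is kept between mini-batches, not the batches themselves.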