Summary english essay


The vast majority of the popular English named entity recognition (NER) datasets contain American or British English data, despite the existence of many global varieties of English. As such, it is unclear whether they generalize for analyzing use of English globally.


Written for

Institution: Freshman / 9th grade
Subject: English language and composition
School year: 1

Document information

Uploaded on: 27 April 2025
Number of pages: 12
Written in: 2024/2025
Type: Summary

Preview of the content

The University of Edinburgh's Submissions to the WMT19
News Translation Task

Rachel Bawden, Nikolay Bogoychev, Ulrich Germann, Roman Grundkiewicz, Faheem Kirefu, Antonio Valerio Miceli Barone, Alexandra Birch
School of Informatics, University of Edinburgh, Scotland

arXiv:1907.05854v1

Generated on April 27, 2025



Abstract
The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six
language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English,
German-to-English, and English-to-Czech. For all translation directions, we created or used
back-translations of monolingual data in the target language as additional synthetic training data. For
English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training,
and translation pivoting through Hindi. For translation to and from Chinese, we investigated
character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we
studied the impact of vast amounts of back-translated training data on translation quality, gaining a few
additional insights over Edunov et al. (2018). For English-to-Czech, we compared different
pre-processing and tokenisation regimes.

1 Introduction

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-Gujarati (EN↔GU), English-Chinese (EN↔ZH), German-English (DE→EN) and English-Czech (EN→CS). All our systems are neural machine translation (NMT) systems trained in constrained data conditions with the Marian toolkit (https://marian-nmt.github.io; Junczys-Dowmunt et al., 2018). The different language pairs pose very different challenges, due to the characteristics of the languages involved and, arguably more importantly, due to the amount of training data available.

Pre-processing
For EN↔ZH, we investigate character-level pre-processing for Chinese compared with subword segmentation. For EN→CS, we show that it is possible in high resource settings to simplify pre-processing by removing steps.
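
As a rough illustration of the two regimes, the sketch below contrasts character-level tokenisation of a Chinese sentence with greedy segmentation against a toy sub-word vocabulary; the vocabulary is invented for the example and stands in for a learned BPE/unigram segmenter, not the paper's actual pre-processing pipeline.

```python
# Sketch: character-based tokenisation vs. sub-word segmentation of Chinese.
# The sub-word vocabulary is a hand-made toy stand-in for a learned model.

def char_tokenise(sentence: str) -> list:
    """Character-level pre-processing: every non-space character is a token."""
    return [ch for ch in sentence if not ch.isspace()]

def subword_tokenise(sentence: str, vocab: set) -> list:
    """Greedy longest-match segmentation against a (toy) sub-word vocabulary."""
    tokens, i = [], 0
    while i < len(sentence):
        for j in range(len(sentence), i, -1):          # try the longest piece first
            if sentence[i:j] in vocab or j == i + 1:   # fall back to a single char
                tokens.append(sentence[i:j])
                i = j
                break
    return tokens

toy_vocab = {"爱丁堡", "大学", "新闻", "翻译", "任务"}    # assumed, not learned
sentence = "爱丁堡大学参加了新闻翻译任务"

print(char_tokenise(sentence))                  # one token per character
print(subword_tokenise(sentence, toy_vocab))    # multi-character sub-word units
```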

Exploiting non-parallel resources
For all language directions, we create additional, synthetic parallel training data. For the high resource language pairs, we look at ways of effectively using large quantities of back-translated data. For example, for DE→EN, we investigated the most effective way of combining genuine parallel data with larger quantities of synthetic parallel data, and for CS→EN, we filter back-translated data by re-scoring translations using the MT model for the opposite direction. The challenge for our low resource pair, EN↔GU, is producing sufficiently good models for back-translation, which we achieve by training semi-supervised MT models with cross-lingual language model pre-training (Lample and Conneau, 2019). We use the same technique to translate additional data from a related language, Hindi.

NMT training settings
In all experiments, we test state-of-the-art training techniques, including using ultra-large mini-batches for DE→EN and EN↔ZH, implemented as optimiser delay.
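
Optimiser delay amounts to accumulating gradients over several mini-batches before applying a single parameter update, so the effective batch is many times larger; the generic PyTorch sketch below illustrates the idea with placeholder model and data (Marian implements the mechanism natively, so this is only an illustration of the concept).

```python
# Sketch: optimiser delay (gradient accumulation). With delay = 16, parameters
# are updated once per 16 mini-batches, as if one 16x-larger batch were used.
import torch

model = torch.nn.Linear(512, 512)                       # stand-in for an NMT model
optimiser = torch.optim.Adam(model.parameters(), lr=3e-4)
delay = 16                                              # illustrative delay factor

batches = [(torch.randn(32, 512), torch.randn(32, 512)) for _ in range(64)]

for step, (x, y) in enumerate(batches):
    loss = torch.nn.functional.mse_loss(model(x), y) / delay
    loss.backward()                                     # gradients accumulate
    if (step + 1) % delay == 0:
        optimiser.step()                                # one update per 16 batches
        optimiser.zero_grad()
```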

Results summary
Official automatic evaluation results for all final systems on the WMT19 test set are summarised in Table 1. Throughout the paper, BLEU is calculated using SACREBLEU (Post, 2018) unless otherwise
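
As a small aside on how such scores can be computed, SACREBLEU exposes a Python API as in the snippet below; the hypothesis and reference sentences are invented for illustration and are not WMT19 system output.

```python
# Sketch: corpus-level BLEU with SACREBLEU. Toy sentences; in the paper the
# hypotheses would be system translations of the WMT19 test set.
import sacrebleu

hypotheses = ["The University of Edinburgh took part in the shared task."]
references = [["The University of Edinburgh participated in the shared task."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```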