Evaluating the diversity of scientific discourse on
twenty-one multilingual Wikipedias using citation analysis
Michael Taylor, Roisi Proven, Carlos Areia
arXiv (arXiv: 2501.09666v1)
Generated on April 27, 2025
Abstract
INTRODUCTION: Wikipedia is a major source of information, particularly for medical and health
content, citing over 4 million scholarly publications. However, the representation of research-based
knowledge across different languages on Wikipedia has been underexplored. This study analyses the
largest database of Wikipedia citations collected to date, examining the uniqueness of content and
research representation across languages. METHOD: The study included nearly 3.5 million unique
research articles and their Wikipedia mentions from 21 languages. These were categorized into three
groups: Group A (publications uniquely cited by a single non-English Wikipedia), Group B (co-cited by
English and non-English Wikipedias), and Group C (co-cited by multiple non-English Wikipedias).
Descriptive and comparative statistical analyses were conducted by Wikipedia language, group, and discipline.
RESULTS: Significant differences were found between twenty non-English languages and English
Wikipedia (p<0.001). While English Wikipedia is the largest, non-English Wikipedias cite an additional
1.5 million publications. CONCLUSION: English Wikipedia should not be seen as a comprehensive
body of information. Non-English Wikipedias cover unique subjects and disciplines, offering a more
complete representation of research collectively. The uniqueness of voice in non-English Wikipedias
correlates with their size, though other factors may also influence these differences.
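The three-group categorization described in the METHOD statement can be illustrated with a minimal sketch. It assumes each publication comes with the set of Wikipedia language editions that cite it. The function name, the language codes such as "en" and "de", and the handling of publications cited only by English Wikipedia are illustrative assumptions, not the authors' implementation.

```python
from typing import Iterable, Optional


def classify_publication(languages: Iterable[str]) -> Optional[str]:
    """Return "A", "B" or "C" for a publication, given the codes of the
    Wikipedia language editions that cite it (hypothetical input format)."""
    langs = set(languages)
    non_english = langs - {"en"}

    if not non_english:
        # Cited only by English Wikipedia (or not at all): treated here as
        # outside the three groups. This handling is an assumption.
        return None
    if "en" in langs:
        # Group B: co-cited by English and at least one non-English Wikipedia.
        return "B"
    if len(non_english) == 1:
        # Group A: uniquely cited by a single non-English Wikipedia.
        return "A"
    # Group C: co-cited by multiple non-English Wikipedias, not by English.
    return "C"
```

For example, a publication cited only by German Wikipedia would fall into Group A, while one cited by both French and Japanese Wikipedia but not English would fall into Group C.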
Michael Taylor (University of Wolverhampton, Digital Science), corresponding author, 61 Home
Close, Oxford OX2 8PT; m.taylor@digital-science.com; Roisi Proven (ex-Altmetric); Carlos Areia
(Digital Science, University of Coventry). 0000-0002-4668-7069
Conflicts of interest: Both Michael Taylor and Carlos Areia are employed by Digital Science, which
owns Altmetric and Dimensions. Roisi Proven was employed by Altmetric at the time of the analysis.
Contributions: MT, RP and CS conceived the project; MT and CS developed the methodology; MT, RP
and CS wrote the article; CS created the images.
Data availability statement: Summarized data and statistics will be made available on Figshare.
Introduction
Wikipedia, an internet encyclopaedia, was launched in 2001 (Wikipedia, 2022a) and by 2009 held
3 million articles in English, maintained by just under 500,000 editors (Arthur, 2009).
Wikipedia was recorded as the seventh most visited website in the world in April 2023 (Top