GSA Today
Volume 36, Issue 6
Lost in Non-Translation: AI Translation Could Bring Non-English Literature into Modern Geoscience Research
Science

Elena Robakiewicz et al.

Authors

Elena Robakiewicz,*,1 Walter Alvarez,2 Lung Sang Chan,2,3 and Tadesse Alemu4

Abstract

Over the past several decades, English has emerged as the dominant language across the geological sciences. Despite this growth of a “unifying” language, there is still substantial research, particularly from before the 1970s, that is available only in other languages. This results in unfamiliar concepts as well as regions of the world whose geology and geological history are not well incorporated in modern scientific literature, often causing them to be overlooked or understudied. Human translation of texts is not always possible due to high time and resource costs. This article, therefore, highlights the potential use of free, publicly available artificial intelligence (AI) tools to recover earlier findings and reincorporate them into scientific conversations. This paper is a call for lost geoscience papers in languages other than English. We aim to collect papers “lost in non-translation” containing pivotal scientific ideas that can be incorporated into the English scientific literature with the use of AI translation. We seek identification in particular of two kinds of legacy research: (1) papers that should be recognized historically as early proposals of hypotheses or theories that are now widely accepted or are currently debated (such as the metacraton, or diwa, concept from the Chinese literature); and (2) papers containing data that could contribute to current research (such as data from a Russian paper about what lies buried beneath the Aswan High Dam and Lake Nasser in Egypt). While AI cannot replace translations by scientifically trained experts, these services can be powerful tools for scientific discovery, and we urge our scientific colleagues to help us rediscover the potential of non–English language papers.


* erobakie@uni-koeln.de

1 Institute for Geography Education, University of Cologne, Germany
2 Department of Earth and Planetary Science, University of California Berkeley, USA
3 Department of Earth Sciences, The University of Hong Kong, Hong Kong
4 Geology and Environmental Science Department, University of Wisconsin Eau Claire, USA

5 Supplemental Material. Table S1. Comparison of English versus non-English science articles from 2006 to 2015. Please visit https://doi.org/10.1130/GSAT.S.32316507 to access the supplemental material; contact editing@geosociety.org with any questions.

CITATION: Robakiewicz, E., et al., 2026, Lost in non-translation: AI translation could bring non-English literature into modern geoscience research:
GSA Today, v. 36, p. 4–9, https://doi.org/10.1130/GSATG642A.1.

© 2026 The Authors. Gold Open Access: This paper is published under the terms of the CC-BY-NC license. Printed in the USA.

Introduction

A striking feature of 20th century science is how dominant English has become in scientific communication. International scientific journals are increasingly published in English. English is the language of international conferences and day-to-day communications among groups of scientists who have no other common language. The benefits of having a common language are obvious, but it also causes problems, such as disenfranchising and putting more pressure on non-native English speakers. In this paper we specifically address another problem—the loss of important scientific “legacy research” papers with key concepts published in other languages and never translated into English. We focus specifically on such lost concepts in earth sciences.

From the Scientific Revolution until the end of the 20th century, scientific research was discussed and published in many different languages (e.g., Kaplan, 2001; Ferguson, 2007). The two senior authors of this paper (WA and LC) recall a time when it was necessary to learn other languages to communicate globally.

Until 1914, German was the predominant international language of science. After 1914, however, its dominance declined due to post-war banishment of Germany from international scientific conferences (Ammon, 2001). After World War II, the United States rapidly expanded its scientific research, particularly compared to its academic rivals, Germany and France, who were rebuilding after the war and had also lost much of their scientific community due to emigration during the National Socialist regime.

Over decades, the Cold War stimulus to scientific research, the development of computer technology, the growth of large research-oriented universities, and the increased political and ideological isolation of countries like China and the Soviet Union all contributed to an increase in the United States’ share of the world’s research output (Ferguson, 2007), thus increasing the volume of English-language research. The dominance of the English language also benefitted from centuries of Great Britain’s imperial expansion, which had spread the English language across the globe.

Since the 1900s, a steady decline has been observed in the percentage of non-English scientific papers published, with the exception of the periods of World War I and World War II. During wartime, the total volume of academic papers published decreased, particularly the share of French, German, and Russian papers, which did not rebound until reconstruction in the 1960s (in 1969 German represented 6.17%, French 5.38%, and Russian 5.06%; Fig. 1; Liu, 2017). According to the Science Citation Index Expanded (1900–2015), in the 21st century, <5% of papers published per year are in languages other than English (Fig. 1). An estimated 97% of all scientific papers in 2015 and 98.5% of all geoscience-related papers between 2006 and 2015 were published in English (Table 1; Liu, 2017; see Table S1 in the Supplemental Material5).

Figure 1. Percent of papers not in English in Science Citation Index Expanded (1900–2015), based on data reported by Liu (2017).

Table 1. Comparison of earth science disciplines by the number of English versus non-English articles and reviews in each field (according to Web of Science) from 2006 to 2015, based on Liu (2017). Complete data are included in Table S1.


In this paper we call attention to the potential that the modern dominance of English in geosciences may have obscured a wealth of older papers published in other languages with important observations, ideas, and hypotheses that have not been incorporated in the modern lingua franca framework. While individual researchers may have unearthed many historically important but obscure papers in other languages (e.g., Şengör and Bach, 2025; History of Earth Sciences Society), we hope to provide a pathway for a more systematic approach. In this paper, we discuss two exemplary cases of important lost geological concepts: one published in Russian, the other in Chinese. We then highlight the benefits and harms of monolingual science to the geosciences and modern researchers. We conclude by highlighting the potential of modern translation technologies to recover historic texts from other languages.

This paper acts as a call across the geoscience community to bring forward historic papers that may have been “lost in non-translation” to ensure a preservation of valuable scientific ideas that may have been overlooked due to the dominance of English. We further suggest that it would be valuable to establish a repository for machine translations of important older papers.

DISCUSSION

Impacts of a Lingua Franca on the Geosciences

A dominant scientific language has the potential to unify scientific communities. For those who have learned it, a common scientific language brings them into the fold of scientific discussion (Steigerwald et al., 2022), expanding their research beyond the size of their linguistic community. Large interdisciplinary projects and teams like the International Ocean Discovery Program, International Continental Scientific Drilling Program, Greenland Ice Core Project, or Scientific Committee on Antarctic Research require standardized communication methods, and a single common language is necessary.

However, English as the common language of science harms researchers who are not fluent. Conducting research and publishing in English is associated with academic prestige, leading to greater career and scientific mobility and opportunities (e.g., Hwang, 2005; López-Navarro et al., 2015; Huttner-Koros and Perera, 2016). Non-native speakers must therefore put additional time and resources into conducting reputable work (e.g., Vasconcelos et al., 2008; Ramírez-Castañeda, 2020; Amano et al., 2023), causing them to feel increased pressure. This pressure is especially strong for individuals whose language is highly divergent from English or who are from areas with poorer English proficiency (Hwang, 2005; Amano et al., 2016). Because English dominates scientific publishing, native English-speaking scientists, particularly those from North America, inevitably receive preferential treatment over their non-native English-speaking colleagues (O’Neil, 2018). This can result in non-native speakers publishing in lesser-known journals or regional non-English journals, reducing the visibility and prestige of their work (Mur Dueñas, 2012).

A lingua franca in science also negatively impacts the quality of the science itself. The dominance of English in academia diminishes the diversity of perspectives which can be vital in constructing robust and innovative scientific knowledge (UNESCO, 2021). Inhibiting diverse points of view to fit within the structure and vocabulary of a single language hinders scholarly discourse and scientific creativity (Suzina, 2021). Constraining global scientific discussions to a single language limits who builds, has access to, and communicates scientific knowledge to the broader public and invested local communities (e.g., Meneghini and Packer, 2007; Nguyen and Tran, 2019; Márquez and Porras, 2020). This is particularly relevant within rapidly changing fields such as environmental, conservation, and climate sciences whose knowledge greatly impacts the public.

Lastly, as highlighted by the Aswan High Dam and diwa examples, monolinguistic science can cause knowledge generated in other languages to be lost. Exclusive use of English during literature searches, which is amplified by language biases in search engines (Rovira et al., 2021), often creates gaps within global databases and reviews (e.g., Konno et al., 2020; Zenni et al., 2023; Hannah et al., 2024).

Revisiting Our Untranslated Examples

Our examples of untranslated geoscience publications from other languages underscore the cost of overlooking important scientific knowledge. In Chumakov’s example, an astonishing discovery, the buried Aswan Gorge, was virtually unknown to the English-speaking scientific community. Without the serendipitous discovery of the work from DSDP Leg 13 by Chumakov, that information might still be unknown, and many details from his 1967 publication would likely remain hidden.

In the case of diwa, the divergence between the Chinese and English languages makes original works in Chinese less accessible to the broader scientific community compared to those in the Romance or Germanic languages (Ren and Rousseau, 2002). China’s historic isolation from World War II to the early 1980s has left many early Chinese publications at risk of being lost, despite the recent increase in Chinese publications over the past few decades (Faghri and Bergman, 2024). The lack of effective translation may contribute to the permanent loss of early significant geological concepts. It is therefore critical to be proactive in finding and translating scientific ideas lost over the past century.

Translation can be a substantial undertaking, even for geologists fluent in multiple languages. As of 2025, despite many researchers, particularly senior ones, being aware of important ideas published in foreign languages, valuable insights and details of the original observations remain hidden from much of the scientific community, hindering the development of the geosciences as a whole.

Potential of AI in Translating Texts

The above discussion highlights the need to find and translate foreign publications that may contain important concepts in science and also the practical challenges in doing so. Translating academic texts is a large task for researchers, whether in devoting the time to translate a text or finding the money to hire another researcher to translate. Hannah et al. (2024) found that, in ecological sciences, the lack of language skills, limited funds, and time constraints are key factors that cause non-English literature to be overlooked in literature reviews. We therefore propose utilizing advanced neural translation technologies, specifically large language models (LLMs) and neural machine translation (NMT), to help preserve and disseminate concepts that are on the brink of being lost. While neural translation technologies are evolving rapidly, these neural translators probably represent the most advanced ones at the time of publication.

LLMs such as ChatGPT are not designed specifically to translate text, but they can be prompted to translate with decent accuracy. They are trained on large volumes of text, much of it openly available online, and piece together the most likely response based on patterns in those data. While some studies have claimed LLMs have reached human parity between high-resource languages like English and Chinese, which have an abundance of online content available, others claim that those findings are probably overstated (Läubli et al., 2020), especially since LLMs were created for content analysis, making translations less direct and more susceptible to biases present within the training set. Regardless, LLMs are generally believed to be weaker than NMTs, particularly in translating low-resource languages, although future multilingual language models will potentially begin to address some concerns.

NMTs, such as Google Translate or DeepL, are specifically designed to create accurate and realistic-sounding translations. Created around 2014, NMTs first translate words and phrases in chunks. After initial translation by an “encoder,” the full context of a sentence is used by a “decoder” to create a more natural-sounding translation. Rather than just translating words or phrases, NMTs explore words within their context, making all information within a sentence, weighted based on a source-language attention model, relevant to the translation (Way, 2018; Mohamed et al., 2021). The strength of these resources in replacing human translation has been debated (e.g., Castilho et al., 2017; Comelles and Laso, 2025), and ultimately NMTs have lower-quality translations for low-resource languages.
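The idea of weighting all information in a sentence by attention can be illustrated with a toy calculation. The sketch below is a minimal example of dot-product attention, one common form of the mechanism; it is not the implementation used by any particular translation service, and the vectors, function names, and dimensions are made-up assumptions chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_query, encoder_states):
    """Weight every source-word vector by its similarity to the decoder query.

    Returns the attention weights and the weighted average ("context")
    that a decoder could use when producing the next target word.
    """
    # Similarity of the query to each source position (dot product).
    scores = [
        sum(q * h for q, h in zip(decoder_query, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context

# Hypothetical two-dimensional vectors for a three-word source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_query = [1.0, 0.0]

weights, context = attend(decoder_query, encoder_states)
print("attention weights:", [round(w, 3) for w in weights])
print("context vector:", [round(c, 3) for c in context])
```

Every source word contributes to the context vector, but words more similar to the current decoder state receive larger weights, which is the sense in which the whole sentence remains relevant to each translated word.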

Beyond potential language errors, such as poorly translated scientific or technical terms and general mistranslations (Wan et al., 2022), AI translation technologies strive for idiomatic and highly fluent translations, which can limit accuracy and change the scientific meaning. It is therefore critical that these tools be used as a starting point rather than an end point. Understanding the broader scientific implications of papers requires human post-editing to achieve textual cohesion and higher standards of quality (e.g., Castilho et al., 2017; Läubli et al., 2020; Orrego-Carmona, 2022). While these technologies are useful for giving an overview of a paper, there is no assurance on the scientific accuracy of the translated text.

CALL FOR “LOST IN NON-TRANSLATION” PAPERS

How can we save papers on the brink of being lost in non-translation? AI translations can aid in identifying and promoting overlooked concepts, such as the Aswan buried gorge or diwa. With the help of AI translation, we believe the solution has two components. The first is sharing papers that have nearly been “lost in non-translation.” This could be accomplished through a community repository for older papers, even without translation. The second is then translating these papers and hosting them in a forum where translations can be shared, commented on, and edited between members of the community.

We recommend building an open database within an already established hub, such as Zenodo, Pangaea, or ResearchGate, where it is possible to host a community or create a forum to share texts “lost in non-translation.” These texts can then be living documents where a community of interested researchers can discuss issues in translations and utilize skillsets to provide live updates to increase the accuracy of translations.
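One way to picture such a living document is as a record that keeps the original citation, the unedited machine translation, and a running list of community revisions. The sketch below is a hypothetical data structure, not a schema used by Zenodo, Pangaea, or ResearchGate; all class names, field names, and placeholder strings are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Revision:
    """One community edit to a translation (hypothetical structure)."""
    editor: str
    comment: str
    text: str

@dataclass
class TranslationRecord:
    """A paper "lost in non-translation" plus its evolving translation."""
    citation: str
    source_language: str
    machine_translation: str   # raw AI output, kept unchanged for reference
    tool: str                  # which translator produced the raw output
    human_reviewed: bool = False
    revisions: List[Revision] = field(default_factory=list)

    def current_text(self) -> str:
        """Return the latest revision, or the raw machine output if none."""
        if self.revisions:
            return self.revisions[-1].text
        return self.machine_translation

    def add_revision(self, editor: str, comment: str, text: str) -> None:
        """Append a human post-edit and flag the record as reviewed."""
        self.revisions.append(Revision(editor, comment, text))
        self.human_reviewed = True

# Placeholder content only; no real translation text is reproduced here.
record = TranslationRecord(
    citation="Chumakov, I.S., 1967",
    source_language="Russian",
    machine_translation="<raw machine translation goes here>",
    tool="<name of NMT or LLM tool>",
)
record.add_revision(
    editor="<reviewer name>",
    comment="Corrected a stratigraphic term",
    text="<post-edited translation goes here>",
)
print(record.current_text())
```

Keeping the raw machine output alongside the revision history lets readers see what was changed by human post-editing and judge how far a given translation has been checked.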

We recognize that the recommended method for translating texts is not peer-reviewed or published in a traditional sense. There will be shared materials with errors or poor translations. Yet we believe that the creation of a community interested in these texts and distributing them to a broader swath of earth scientists benefits the science far more than any potential harm. Through this call, we hope to recover papers on the brink of being lost and restore them as living documents at the center of new scientific discussions. We invite interested researchers to contact the corresponding author to continue this discussion and explore the use of new AI technologies and open data repositories to recover pivotal science from the past.

ACKNOWLEDGMENTS

We would like to especially acknowledge the guidance of Dr. Mohamed Abdelsalam as we planned this paper. His contributions to our discussions on African geology were invaluable and we greatly miss him. We would also like to thank the two reviewers who provided important feedback towards the completion of this manuscript.

REFERENCES CITED

  1. Abdelsalam, M.G., Liégeois, J.P., and Stern, R.J., 2002, The Saharan Metacraton: Journal of African Earth Sciences, v. 34, no. 3–4, p. 119–136, https://doi.org/10.1016/S0899-5362(02)00013-1.
  2. Amano, T., González-Varo, J.P., and Sutherland, W.J., 2016, Languages are still a major barrier to global science: PLoS Biology, v. 14, no. 12, https://doi.org/10.1371/journal.pbio.2000933.
  3. Amano, T., Ramírez-Castañeda, V., Berdejo-Espinola, V., Borokini, I., Chowdhury, S., Golivets, M., González-Trujillo, J.D., Montaño-Centellas, F., Paudel, K., White, R.L., and Veríssimo, D., 2023, The manifold costs of being a non-native English speaker in science: PLoS Biology, v. 21, no. 7, https://doi.org/10.1371/journal.pbio.3002184.
  4. Ammon, U., ed., 2001, The Dominance of English as a Language of Science: Effects on Other Languages and Language Communities: Berlin, De Gruyter Brill, https://doi.org/10.1515/9783110869484.
  5. Castilho, S., Moorkens, J., Gaspari, F., Calixto, I., Tinsley, J., and Way, A., 2017, Is neural machine translation the new state of the art?: The Prague Bulletin of Mathematical Linguistics, v. 108, p. 109–120, https://doi.org/10.1515/pralin-2017-0013.
  6. Chen, G., 1956, Examples of activizing regions in the Chinese Platform with special reference to the Cathaysian problem: Acta Geologica Sinica, v. 36, no. 3, p. 239–271.
  7. Chen, G., 2009, 陈国达全集 [The complete works of Chen Guoda]: Changsha, Central South University Press.
  8. Chumakov, I.S., 1967, Плиоценовые и плейстоценовые отложения долины Нила в Нубии и Верхнем Египте [Pliocene and Pleistocene deposits of the Nile Valley in Nubia and Upper Egypt]: Academy of Sciences of the U.S.S.R., Geol. Institute Trans., Moscow, v. 170, no. 5.
  9. Chumakov, I.S., 1973, Pliocene and Pleistocene deposits of the Nile Valley in Nubia and Upper Egypt (abstract), in Kaneps, A.G., ed., 1973, Initial Reports of the Deep Sea Drilling Project, Volume XIII, Part I: Washington, D.C., U.S. Government Printing Office, https://doi.org/10.2973/dsdp.proc.13.144-3.1973.
  10. Comelles, E., and Laso, N.J., 2025, The impact of MT as a writing tool on EFL academic writing: A qualitative linguistic analysis: Journal of Second Language Writing, v. 69, https://doi.org/10.1016/j.jslw.2025.101231.
  11. Faghri, A., and Bergman, T.L., 2024, Highly Ranked Scholars and the influence of countries and regions in research fields, disciplines, and specialties: Quantitative Science Studies, v. 5, no. 2, p. 464–483, https://doi.org/10.1162/qss_a_00291.
  12. Ferguson, G., 2007, The global spread of English, scientific communication and ESP: Questions of equity, access and domain loss: Ibérica: Revista de la Asociación Europea de Lenguas para Fines Específicos, v. 13, p. 7–38.
  13. Hannah, K., Haddaway, N.R., Fuller, R.A., and Amano, T., 2024, Language inclusion in ecological systematic reviews and maps: Barriers and perspectives: Research Synthesis Methods, v. 15, no. 3, p. 466–482, https://doi.org/10.1002/jrsm.1699.
  14. History of Earth Sciences Society, 2022, About Us: https://historyearthscience.org/about-us (accessed May 2026).
  15. Huttner-Koros, A., and Perera, S., 2016, Communicating science in English: A preliminary exploration into the professional self-perceptions of Australian scientists from language backgrounds other than English: Journal of Science Communication, v. 15, https://doi.org/10.22323/2.15060203.
  16. Hwang, K., 2005, The inferior science and the dominant use of English in knowledge production: A case study of Korean science and technology: Science Communication, v. 26, no. 4, p. 390–427, https://doi.org/10.1177/1075547005275428.
  17. Kaplan, R.B., 2001, English—the accidental language of science, in Ammon, U., ed., The Dominance of English as a Language of Science: Berlin, De Gruyter Brill, p. 3–26, https://doi.org/10.1515/9783110869484.3.
  18. Konno, K., Akasaka, M., Koshida, C., Katayama, N., Osada, N., Spake, R., and Amano, T., 2020, Ignoring non‐English‐language studies may bias ecological meta‐analyses: Ecology and Evolution, v. 10, no. 13, p. 6373–6384, https://doi.org/10.1002/ece3.6368.
  19. Läubli, S., Castilho, S., Neubig, G., Sennrich, R., Shen, Q., and Toral, A., 2020, A set of recommendations for assessing human–machine parity in language translation: Journal of Artificial Intelligence Research, v. 67, https://doi.org/10.1613/jair.1.11371.
  20. Liu, W., 2017, The changing role of non-English papers in scholarly communication: Evidence from Web of Science’s three journal citation indexes: Learned Publishing, v. 30, no. 2, p. 115–123, https://doi.org/10.1002/leap.1089.
  21. López-Navarro, I., Moreno, A.I., Quintanilla, M.Á., and Rey-Rocha, J., 2015, Why do I publish research articles in English instead of my own language? Differences in Spanish researchers’ motivations across scientific domains: Scientometrics, v. 103, p. 939–976, https://doi.org/10.1007/s11192-015-1570-1.
  22. Lung, S.C., 1986, 地洼学说奖励基金简介 [Introduction to the Geodesic Doctrine Incentive Fund]: Geotectonics and Mineralogy.
  23. Márquez, M.C., and Porras, A.M., 2020, Science communication in multiple languages is critical to its effectiveness: Frontiers in Communication, v. 5, https://doi.org/10.3389/fcomm.2020.00031.
  24. Meneghini, R., and Packer, A.L., 2007, Is there science beyond English?: Initiatives to increase the quality and visibility of non‐English publications might help to break down language barriers in scientific communication: EMBO Reports, v. 8, p. 112–116, https://doi.org/10.1038/sj.embor.7400906.
  25. Mohamed, S.A., Elsayed, A.A., Hassan, Y.F., and Abdou, M.A., 2021, Neural machine translation: past, present, and future: Neural Computing & Applications, v. 33, p. 15919–15931, https://doi.org/10.1007/s00521-021-06268-0.
  26. Mur Dueñas, P., 2012, Getting research published internationally in English: An ethnographic account of a team of Finance Spanish scholars’ struggles: Ibérica: Revista de la Asociación Europea de Lenguas para Fines Específicos, v. 24, p. 139–155.
  27. Nguyen, A., and Tran, M., 2019, Science journalism for development in the Global South: A systematic literature review of issues and challenges: Public Understanding of Science, v. 28, no. 8, p. 973–990, https://doi.org/10.1177/0963662519875447.
  28. O’Neil, D., 2018, English as the lingua franca of international publishing: World Englishes, v. 37, no. 2, p. 146–165, https://doi.org/10.1111/weng.12293.
  29. Orrego-Carmona, D., 2022, La traducció automàtica en mans de tots: Adopció i canvis entre els usuaris generals de TA [Machine translation in everyone’s hands: Adoption and changes among general users of MT]: Tradumàtica: Tecnologies de la Traducció, p. 322–339, https://doi.org/10.5565/rev/tradumatica.324.
  30. Pavlovsky, E. V., 1953, О некоторых общих закономерностях развития фауны [On some general regularities of fauna development]: Proceedings of the USSR Academy of Sciences, Geological Series, no. 5.
  31. Ramírez-Castañeda, V., 2020, Disadvantages in preparing and publishing scientific papers caused by the dominance of the English language in science: The case of Colombian researchers in biological sciences: PLoS One, v. 15, no. 9, https://doi.org/10.1371/journal.pone.0238372.
  32. Ren, S., and Rousseau, R., 2002, International visibility of Chinese scientific journals: Scientometrics, v. 53, no. 3, p. 389–405, https://doi.org/10.1023/A:1014877130166.
  33. Rovira, C., Codina, L., and Lopezosa, C., 2021, Language bias in the Google Scholar ranking algorithm: Future Internet, v. 13, no. 2, https://doi.org/10.3390/fi13020031.
  34. Şengör, A.M.C., and Bach, T., 2025, Ernst Haeckel’s present of the first two volumes of Das Antlitz der Erde by Eduard Suess to his friend Dr. Paul von Ritter: Why is it important?: International Journal of Earth Sciences, v. 114, p. 323–332, https://doi.org/10.1007/s00531-024-02478-8.
  35. Steigerwald, E., Ramírez-Castañeda, V., Brandt, D.Y.C., Báldi, A., Shapiro, J.T., Bowker, L., and Tarvin, R.D., 2022, Overcoming language barriers in academia: Machine translation tools and a vision for a multilingual future: Bioscience, v. 72, no. 10, p. 988–998, https://doi.org/10.1093/biosci/biac062.
  36. Suzina, A.C., 2021, English as lingua franca or the sterilisation of scientific work: Media Culture & Society, v. 43, no. 1, p. 171–179, https://doi.org/10.1177/0163443720957906.
  37. UNESCO, 2021, UNESCO Recommendation on Open Science, 34 p., https://doi.org/10.54677/MNMH8546.
  38. Vasconcelos, S.M.R., Sorenson, M.M., Leta, J., Sant’Ana, M.C., and Batista, P.D., 2008, Researchers’ writing competence: a bottleneck in the publication of Latin‐American science?: EMBO Reports, v. 9, p. 700–702, https://doi.org/10.1038/embor.2008.143.
  39. Wan, Y., Yang, B., Wong, D.F., Chao, L.S., Yao, L., Zhang, H., and Chen, B., 2022, Challenges of neural machine translation for short texts: Computational Linguistics, v. 48, no. 2, p. 321–342, https://doi.org/10.1162/coli_a_00435.
  40. Way, A., 2018, Chapter 3.3: Machine translation: Where are we at today, in Angelone, E., Ehrensberger-Dow, M., and Massey, G., eds., The Bloomsbury Companion to Language Industry Studies, New York, Bloomsbury, p. 311–332.
  41. Yan, J., Yan, P., Chen, Y., Li, J., Zhu, X., and Zhang, Y., 2026, Benchmarking LLMs against human translators: A comprehensive evaluation across languages, domains, and expertise levels: IEEE Transactions on Big Data, v. 12, no. 3, p. 801–813, https://doi.org/10.1109/TBDATA.2025.3644594.
  42. Zenni, R.D., Barlow, J., Pettorelli, N., Stephens, P., Rader, R., Siqueira, T., Gordon, R., Pinfield, T., and Nuñez, M.A., 2023, Multi‐lingual literature searches are needed to unveil global knowledge: Journal of Applied Ecology, v. 60, no. 3, p. 380–383, https://doi.org/