
entity types. For example, IODP Site 1340 (IODP stands for Integrated Ocean Drilling Program) refers to a certain location, but the recognizer identified only 1340 and incorrectly classified it as a DATE. The recognizer also missed the term Pliocene, which denotes the epoch of the geologic time scale that extends from 5.333 million to 2.58 million years B.P. Likewise, Ma in geoscience articles usually means “million years ago,” but the CoreNLP NER did not identify it as TIME.

To recognize expressions that were not identified by CoreNLP or spaCy, we used the Odin event extraction framework and rule language (Valenzuela-Escárcega et al., 2016; henceforth, Odin) and added custom rules to capture geoscience-specific expressions. In particular, we developed rules to capture two kinds of expressions: temporal information and site information.

Temporal Information

As mentioned, we initially used the named entity recognition tool in Stanford’s CoreNLP (Manning et al., 2015; henceforth, CoreNLP) to identify time information. However, because CoreNLP was trained on general text data, it does not recognize geological temporal expressions such as Paleocene or Jurassic. In addition, geoscience papers contain abbreviations such as M.y.r. and M.a., which mean millions of years (a duration) and million years ago (an absolute time), respectively. Thus, we wrote custom rules to recognize geological temporal expressions and built a custom time normalizer that converts numeric times (e.g., 170 M.y.r. or 1.5 million years ago) to the corresponding unit of the geological time scale (e.g., Jurassic, Quaternary) (see supplemental document 1 [see footnote 1] for specific details on these rules).

Site Information

As with temporal information, there were domain-specific spatial expressions that existing NER tools such as CoreNLP could not capture. Further, some of these expressions contained no information about the actual locations they indicate. Thus, we wrote scripts to extract spatial expressions, disambiguate geoscience-specific spatial expressions (e.g., IODP Site U1360), and normalize these expressions by aligning them with latitude-longitude bounding boxes that indicate their actual locations on the world map (see supplemental document 2 [see footnote 1]).

CLASSIFYING THE SUPPORT FOR THE RESEARCH QUESTION OF INTEREST

Even though these spatial and temporal expressions are important for contextualizing the findings of a publication, they provide no information on our key research question: whether volcanism affected climate change. To predict whether a given paper supports or negates the relationship between volcanism and climate change, it is necessary to build a machine learning classifier that infers, from the text of each publication, whether this relationship is supported.

Among the wide variety of text classification methods, we focused on four that have been shown to perform well for text classification, spanning “traditional” statistical methods as well as deep learning. To represent the traditional “camp,” we used Support Vector Machines (SVMs; Cortes and Vapnik, 1995) and Naïve-Bayes SVMs (NB-SVMs; Wang and Manning, 2012). To represent deep learning, we implemented a Multi-Layer Perceptron (henceforth, MLP) that operates on the same features as the SVM variants above. Last, we implemented an ensemble strategy that combines the outputs of these three individual models.

To prevent the classifiers from overfitting the training data, we used L2 regularization when training the classifiers that support it (i.e., the SVM, NB-SVM, and MLP classifiers). Intuitively, L2 regularization penalizes large feature weights, shrinking the weights of features that are not critical to the task toward zero; this reduces the potential for overfitting, or “hallucinating a classifier” (Domingos, 2015). All document classification routines are detailed in supplemental document 3 (see footnote 1).

Data Annotation

Data annotation was performed via FindingFive. Two hundred papers were randomly chosen from the set of 1157 downloaded papers, and the title, abstract, introduction, and conclusion/discussion sections of these 200 papers were then presented to the two

Figure 1. (A) Topographic map of Europe with circles representing the most frequent location found in each paper where the relationship between volcanism and climate change has been tested for the Cenozoic. Light blue circles indicate locations where the impact of volcanism on climate change was verified, and pink circles indicate locations where previous research negated the relationship between volcanism and climate change. The size of each circle represents its frequency, i.e., the number of publications supporting it. (B) Topographic map of North America with circles representing the three most frequent locations found in each paper where the relationship between volcanism and climate change has been tested for the Cenozoic. (C) Topographic map of northern Europe with circles representing the most frequent location found in each paper where the relationship between volcanism and climate change has been tested for the Phanerozoic. (D) Topographic map of Europe and Asia with circles representing the three most frequent locations found in each paper where the relationship between volcanism and climate change has been tested for the Cenozoic. (Continued on following page.)
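The custom time normalizer for temporal information is specified in supplemental document 1 and is not reproduced in this section. As a rough, hypothetical illustration, the Python sketch below extracts numeric expressions written with Ma, M.a., M.y.r., or “million years (ago)” and maps each value to a Phanerozoic period. It also looks up named units such as Pliocene. The regular expressions, the helper names (`period_for_age`, `find_named_units`, `normalize_temporal`), and the truncated period table are our assumptions, not the authors’ Odin rules. The boundary ages are approximate values from the ICS International Chronostratigraphic Chart and should be checked against the current chart before reuse.

```python
import re
from typing import List, Optional, Tuple

# Approximate base ages (Ma) of selected Phanerozoic periods, youngest first.
# Illustrative values only; verify against the current ICS chart.
PERIOD_BASES_MA = [
    ("Quaternary", 2.58),
    ("Neogene", 23.03),
    ("Paleogene", 66.0),
    ("Cretaceous", 145.0),
    ("Jurassic", 201.4),
    ("Triassic", 251.902),
]

# Named units that a general-purpose NER tends to miss: Cenozoic epochs plus the periods above.
NAMED_UNITS = ("Paleocene", "Eocene", "Oligocene", "Miocene", "Pliocene",
               "Pleistocene", "Holocene") + tuple(name for name, _ in PERIOD_BASES_MA)
_NAME_RE = re.compile(r"\b(" + "|".join(NAMED_UNITS) + r")\b")

# Hypothetical pattern: a number followed by Ma, M.a., M.y.r., or "million years (ago)".
# The negative lookahead keeps "Ma" from matching the start of words such as "Mar".
_AGE_RE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>M\.?a\.?|M\.y\.r\.|million years(?: ago)?)(?![A-Za-z])"
)


def period_for_age(age_ma: float) -> Optional[str]:
    """Map an age in Ma to the period whose [top, base) interval contains it."""
    if age_ma < 0:
        return None
    for name, base in PERIOD_BASES_MA:
        if age_ma < base:
            return name
    return None  # older than the table covers


def find_named_units(text: str) -> List[str]:
    """Return named time-scale units (e.g., Pliocene) in order of appearance."""
    return _NAME_RE.findall(text)


def normalize_temporal(text: str) -> List[Tuple[float, str, Optional[str]]]:
    """Return (value, 'age' or 'duration', period) for each numeric time expression."""
    results = []
    for m in _AGE_RE.finditer(text):
        value = float(m.group("value"))
        kind = "duration" if m.group("unit").startswith("M.y") else "age"
        # Following the examples in the text, durations are also mapped to a period.
        results.append((value, kind, period_for_age(value)))
    return results
```

For example, `normalize_temporal("dated at 170 M.y.r. and 1.5 million years ago")` returns `[(170.0, 'duration', 'Jurassic'), (1.5, 'age', 'Quaternary')]`, matching the two examples given for the normalizer. Keeping the duration/age distinction in the output leaves room for treating M.y.r. differently from M.a. downstream.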
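The site normalization scripts are described in supplemental document 2 rather than in this section. The sketch below shows one possible shape for that step: pull IODP-style site mentions out of text, then turn each into a latitude-longitude bounding box using a caller-supplied gazetteer. The pattern, the helper names (`extract_sites`, `box_around`, `normalize_sites`), and the fixed half-width box are hypothetical. The gazetteer is an assumed input, and no real site coordinates are included here.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern for mentions such as "IODP Site U1360" or "Site 1340";
# an optional trailing hole letter (e.g., U1360A) is matched but dropped.
_SITE_RE = re.compile(r"\b(?:(IODP|ODP|DSDP)\s+)?Site\s+(U?\d{3,4})[A-Z]?\b")


@dataclass(frozen=True)
class BBox:
    """Latitude-longitude box in decimal degrees (does not handle the antimeridian)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def box_around(lat: float, lon: float, half_deg: float = 0.5) -> BBox:
    """Square box of +/- half_deg around a point, clamped to valid coordinate ranges."""
    return BBox(max(-90.0, lat - half_deg), max(-180.0, lon - half_deg),
                min(90.0, lat + half_deg), min(180.0, lon + half_deg))


def extract_sites(text: str) -> List[Tuple[Optional[str], str]]:
    """Return (program or None, site identifier) for each site mention."""
    return [(m.group(1), m.group(2)) for m in _SITE_RE.finditer(text)]


def normalize_sites(text: str, gazetteer: Dict[str, Tuple[float, float]],
                    half_deg: float = 0.5) -> Dict[str, Optional[BBox]]:
    """Map each mentioned site to a bounding box, or None if the gazetteer lacks it."""
    boxes: Dict[str, Optional[BBox]] = {}
    for _program, site in extract_sites(text):
        coords = gazetteer.get(site)
        boxes[site] = box_around(coords[0], coords[1], half_deg) if coords is not None else None
    return boxes
```

Unresolved sites map to `None` instead of raising an error, so a paper that mentions an unknown site can still contribute its other locations. In practice the gazetteer would be filled from the drilling program’s own site records; the coordinates used in any quick test should be treated as placeholders.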
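NB-SVMs (Wang and Manning, 2012) scale binarized word features by a Naive Bayes log-count ratio before a linear SVM is fit on them. The pure-Python sketch below computes that ratio, r = log((p/‖p‖₁)/(q/‖q‖₁)), where p and q are add-α smoothed counts of each token in positive and negative documents. The label encoding (+1 = supports, −1 = negates), the toy tokens, and the helper names are illustrative assumptions. Wang and Manning’s interpolation step is omitted, and this is not the authors’ code, which is described in supplemental document 3.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fit_log_count_ratio(docs: Sequence[Sequence[str]], labels: Sequence[int],
                        alpha: float = 1.0) -> Tuple[List[str], Dict[str, float]]:
    """NB log-count ratio r = log((p/|p|_1) / (q/|q|_1)) over binarized features.

    labels: +1 for "supports", -1 for "negates" (an assumed encoding).
    alpha: additive smoothing applied to every vocabulary entry.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    p = {tok: alpha for tok in vocab}  # smoothed counts in positive documents
    q = {tok: alpha for tok in vocab}  # smoothed counts in negative documents
    for doc, y in zip(docs, labels):
        counts = p if y == 1 else q
        for tok in set(doc):  # binarized: each token counted once per document
            counts[tok] += 1.0
    p_total, q_total = sum(p.values()), sum(q.values())
    r = {tok: math.log((p[tok] / p_total) / (q[tok] / q_total)) for tok in vocab}
    return vocab, r


def nb_features(doc: Sequence[str], vocab: List[str], r: Dict[str, float]) -> List[float]:
    """Binarized bag of words scaled element-wise by r; this vector feeds a linear SVM."""
    present = set(doc)
    return [r[tok] if tok in present else 0.0 for tok in vocab]
```

Tokens that occur mainly in supporting papers receive positive weights, tokens that occur mainly in negating papers receive negative weights, and tokens that occur evenly in both stay near zero. The SVM therefore starts from features that already carry class evidence.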
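This section does not state how the ensemble combines the outputs of the SVM, NB-SVM, and MLP. If a simple majority vote over the three models’ labels is assumed (our assumption, not the authors’ documented method), the combination step could look like the sketch below. With three voters and two labels, a tie cannot occur.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model labels by majority vote.

    predictions[m][i] is model m's label for document i (e.g., +1 supports,
    -1 negates). Ties go to the label seen first, i.e., the earliest model in
    `predictions`, because Counter.most_common keeps first-encountered order
    for equal counts.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_docs = len(predictions[0])
    if any(len(p) != n_docs for p in predictions):
        raise ValueError("all models must label the same documents")
    return [Counter(p[i] for p in predictions).most_common(1)[0][0]
            for i in range(n_docs)]
```

For example, with SVM labels `[1, -1, 1, 1]`, NB-SVM labels `[1, 1, -1, 1]`, and MLP labels `[-1, -1, -1, 1]`, the vote yields `[1, -1, -1, 1]`. Averaging predicted probabilities instead of labels would be an equally plausible design if all three models expose calibrated scores.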
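The sketch below illustrates what L2 regularization does. It trains a linear SVM by full-batch subgradient descent on the L2-regularized hinge loss, (λ/2)‖w‖² + (1/n) Σᵢ max(0, 1 − yᵢ(w·xᵢ + b)). This is a from-scratch illustration, not the authors’ training code (described in supplemental document 3). The learning rate, epoch count, function name, and toy data are assumptions. Raising λ shrinks the learned weights toward zero, but unlike an L1 penalty, it does not generally force them to be exactly zero.

```python
from typing import List, Sequence, Tuple


def train_l2_svm(X: Sequence[Sequence[float]], y: Sequence[int], lam: float,
                 lr: float = 0.01, epochs: int = 2000) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on (lam/2)*||w||^2 + mean hinge loss.

    y must be +1/-1. The bias b is left unregularized, as is common.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge term is active; its subgradient is -y_i * x_i
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy, linearly separable data; the second feature carries no class signal.
X = [[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w_weak, _ = train_l2_svm(X, y, lam=0.01)   # first weight ends near 0.5 (max-margin value)
w_strong, _ = train_l2_svm(X, y, lam=10.0)  # first weight shrinks to about 0.2
```

With weak regularization the informative weight settles near the hard-margin value of 0.5. With λ = 10 the same weight shrinks to about 0.2 but remains nonzero, so the classifier still separates the classes, only with less confidence.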
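The random selection of 200 of the 1157 downloaded papers for annotation can be made reproducible by fixing a seed. The source does not say how the draw was implemented. In the sketch below, the seed is arbitrary, papers are identified by stand-in integer indices, and the function name is hypothetical.

```python
import random
from typing import List


def draw_annotation_sample(n_papers: int = 1157, n_sample: int = 200,
                           seed: int = 0) -> List[int]:
    """Return sorted indices of papers drawn uniformly without replacement.

    The seed is arbitrary; fixing it makes the same draw repeatable.
    """
    rng = random.Random(seed)  # private generator, independent of global random state
    return sorted(rng.sample(range(n_papers), n_sample))
```

Using a dedicated `random.Random` instance keeps the sample stable even if other code consumes the module-level random state.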

1 Supplemental Material. Supplemental Documents 1–3. Go to https://doi.org/10.1130/GSAT.S.20030015 to access the supplemental material; contact editing@geosociety.org with any questions.

         6  GSA TODAY  |  November 2022