
SUPPORT, and pink for NEGATE). Figure 1A shows the most frequent locations during the Cenozoic in Europe, and Figure 1B shows the top three most frequent locations during the Cenozoic in North America. When manually inspecting the machine prediction results from the MLP model, the domain experts observed that 11 out of 17 data points within the North American continent were correctly identified and visualized on the world map. Of the six errors, four data points were from simulation papers, and two data points were based on incorrect predictions by the MLP classifier, as identified by the domain experts. For example, one pink circle (i.e., the corresponding paper was classified as not supporting the observation that volcanism impacts climate change) was incorrectly predicted when the actual paper was unrelated to this observation.

These figures immediately highlight several important observations:

• Our data processing reduces the search space by almost two orders of magnitude (from ~1,000 papers that are shallowly related to the topic of interest to 17 that validate/invalidate the current observation that volcanism affects climate change), while our visualizations allow the scientist to quickly draw important conclusions that would not be easily available otherwise. For example, our figures show that while the majority of publications support the investigated hypothesis that volcanism impacts climate change, not all do.
• Similarly, this bird’s-eye view of a scientific question allows one to quickly identify “blank spaces” in research, i.e., topics that are insufficiently investigated. For example, our visualizations show that while support for our research question is well represented for the North American continent, it is scarce in other continents.
• Further, this work allows one to identify (potential) contradictions in scientific findings quickly, which provides opportunities for better science. For example, Figure 1B shows apparent contradictions in findings from the East coast of the North American continent in the Cenozoic.
• Lastly, the fact that only 11 out of the 17 identified papers are correctly classified is not surprising considering that none of the automated components (i.e., the module that extracts temporal and spatial context, and the research question classifier) is perfect. However, this result emphasizes that the human/machine interaction must continue if this system is to be improved.

All in all, this experiment finds strong support for feedbacks between volcanism and climate change. However, the precise correlation is not a simple one. Our literature parsing system suggests that we do not yet have a clear and complete understanding of how volcanic events affect climate change.

CONCLUSIONS

This preliminary work introduced a methodology to automatically provide a global review of the geoscientific literature and to evaluate the impact of specific research questions (i.e., understand if the question is [mostly] supported or rejected by the literature), in this case the causal relationship between volcanism and climate change. We show the promises and limitations of this approach to the geoscience literature with this admittedly simplistic example. This approach helps us process and interpret a large number of published scientific papers, without the need for human annotators to invest time in reading and parsing all of the papers. In addition, with the visualization, researchers are able to investigate chronological changes in the relationship between volcanism and climate change. This approach could be expanded to any number of queries in the geoscience literature for the systematic analysis of various observations and ideas by examining a large body of previously published papers. Results can be further plotted at reconstructed sample or study locations using paleogeographic maps.

It is vital to emphasize that the proposed methodology is hybrid, requiring direct collaboration between humans and machines. For example, geoscientists were required to provide training data for our research question classifier. Further, as discussed, our resulting classifier is only ~80% accurate, which means that it needs continuous feedback from the scientists using it in order to improve. Longer term, we envision a community-wide effort in which such classifiers are created and deployed in the cloud to mine an arbitrary number of observations and are continuously improved over time by their human end users.

REFERENCES CITED

Badgley, C., Smiley, T.M., Terry, R., Davis, E.B., DeSantis, L.R.G., Fox, D.L., Hopkins, S.S.B., Jezkova, T., Matocq, M.D., Matzke, N., McGuire, J.L., Mulch, M., Riddle, B.R., Roth, V.L., Samuels, J.X., Strömberg, C.A.E., and Yanites, B.J., 2017, Biodiversity and topographic complexity: Modern and geohistorical perspectives: Trends in Ecology & Evolution, v. 32, no. 3, p. 211–226, https://doi.org/10.1016/j.tree.2016.12.010.
Cohen, J., 1968, Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit: Psychological Bulletin, v. 70, no. 4, p. 213–220, https://doi.org/10.1037/h0026256.
Cortes, C., and Vapnik, V., 1995, Support-vector networks: Machine Learning, v. 20, no. 3, p. 273–297, https://doi.org/10.1007/BF00994018.
Domingos, P., 2015, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World: New York, Basic Books, 352 p.
Gernon, T.M., Hincks, T.K., Merdith, A.S., Rohling, E.J., Palmer, M.R., Foster, G.L., Bataille, C.P., and Müller, R.D., 2021, Global chemical weathering dominated by continental arcs since the mid-Palaeozoic: Nature Geoscience, v. 14, p. 690–696, https://doi.org/10.1038/s41561-021-00806-0.
Herman, F., Seward, D., Valla, P.G., Carter, A., Kohn, B., Willett, S.D., and Ehlers, T.A., 2013, Worldwide acceleration of mountain erosion under a cooling climate: Nature, v. 504, p. 423, https://doi.org/10.1038/nature12877.
Landis, J.R., and Koch, G.G., 1977, The measurement of observer agreement for categorical data: Biometrics, v. 33, no. 1, p. 159–174, https://doi.org/10.2307/2529310.
Lee, C-T., and Dee, S., 2019, Does volcanism cause warming or cooling?: Geology, v. 47, no. 7, p. 687–688, https://doi.org/10.1130/focus072019.1.
Honnibal, M., and Montani, I., 2017, spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing: https://spacy.io/.
Manning, C.D., 2015, Computational linguistics and deep learning: Computational Linguistics, v. 41, no. 4, p. 701–707, https://doi.org/10.1162/COLI_a_00239.
Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., and McClosky, D., 2014, The Stanford CoreNLP Natural Language Processing Toolkit: https://doi.org/10.3115/v1/p14-5010.
Raymo, M.E., and Ruddiman, W.F., 1992, Tectonic forcing of late Cenozoic climate: Nature, v. 359, p. 117–122.
Valenzuela-Escárcega, M.A., Hahn-Powell, G., and Surdeanu, M., 2016, Odin’s Runes: A rule language for information extraction, in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, https://aclanthology.org/L16-1050.
Wang, S., and Manning, C.D., 2012, Baselines and bigrams: Simple, good sentiment and topic classification, in 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012, Proceedings of the Conference, https://dl.acm.org/doi/10.5555/2390665.2390688.
Zhang, P., Molnar, P., and Downs, W.R., 2001, Increased sedimentation rates and grain sizes 2–4 Myr ago due to the influence of climate change on erosion rates: Nature, v. 410, p. 891–897, https://doi.org/10.1038/35073504.

Manuscript received 16 Nov. 2021
Revised manuscript received 6 May 2022
Manuscript accepted 23 May 2022



         8  GSA TODAY  |  November 2022