Page 9 - i1052-5173-31-5

our lithology and outlier filtering methods removed most U data because they were inappropriate for reconstructing trends in mudstone geochemistry through time, that same data would be especially useful for other questions, such as determining the variability of heat production within shales. This sort of filtering is a fixture of scientific research (e.g., geochemists will consider whether samples are diagenetically altered when measuring them for isotopic data) and, likewise, should be viewed as a necessary step in the analysis of large datasets.

As our workflow demonstrates, filtering often requires multiple steps, some automatic (e.g., cutoffs that exclude vast amounts of data in one fell swoop or algorithms to determine the "outlierness" of data; see Ptáček et al., 2020) and others manual (e.g., examining source literature to determine whether an anomalous value is, in fact, meaningful). Each procedure, along with any assumptions and/or justifications, must be documented clearly (and code included and/or stored in a publicly accessible repository) by researchers so that others may reproduce their results and/or build upon their conclusions with increasingly larger datasets.

Along with documentation of data processing, filtering, and sampling, it is important for researchers also to leverage sensitivity analyses to understand how parameter choices may impact resulting trends. Here, through the analysis of various spatial and temporal parameter values, we demonstrate that, while the spread of data varies based on the prescribed values of scale_spatial and scale_temporal, the averaged resampled trend does not (Fig. S7 [see footnote 1]). At the same time, we see that trends are directly influenced by the use (or lack thereof) of Ca and P₂O₅ and outlier filtering. For example, the record of U in mudstones becomes overprinted by anomalously large values when carbonate samples are not excluded (Fig. S7B).

CONCLUSIONS

Large datasets can provide increasingly valuable insights into the ancient Earth system. However, to extract meaningful trends, these datasets must be cultivated, curated, and processed with an emphasis on data quality, uncertainty propagation, and transparency. Charles Darwin once noted that the "natural geological record [is] a history of the world imperfectly kept" (Darwin, 1859, p. 310), a reality that is the result of both geological and sociological causes. But while the data are biased, they also are tractable. As we have demonstrated here, the challenges of dealing with this imperfect record (and, by extension, the large datasets that document it) certainly are surmountable.

ACKNOWLEDGMENTS

We thank everyone who contributed to the SGP database, including T. Frasier (YGS). BGS authors (JE, PW) publish with permission of the Executive Director of the British Geological Survey, UKRI. We would like to thank the editor and one anonymous reviewer for their helpful feedback.

REFERENCES CITED

Algeo, T.J., and Li, C., 2020, Redox classification and calibration of redox thresholds in sedimentary systems: Geochimica et Cosmochimica Acta, v. 287, p. 8–26, https://doi.org/10.1016/j.gca.2020.01.055.
Bai, Y., Jacobs, C.A., Kwan, M., and Waldmann, C., 2017, Geoscience and the technological revolution [perspectives]: IEEE Geoscience and Remote Sensing Magazine, v. 5, no. 3, p. 72–75, https://doi.org/10.1109/MGRS.2016.2635018.
Chan, D., Kent, E.C., Berry, D.I., and Huybers, P., 2019, Correcting datasets leads to more homogeneous early-twentieth-century sea surface warming: Nature, v. 571, no. 7765, p. 393–397, https://doi.org/10.1038/s41586-019-1349-2.
Chen, L., and Wang, L., 2018, Recent advances in Earth observation big data for hydrology: Big Earth Data, v. 2, no. 1, p. 86–107, https://doi.org/10.1080/20964471.2018.1435072.
Darwin, C., 1859, On the Origin of Species by Means of Natural Selection, or Preservation of Favoured Races in the Struggle for Life: London, John Murray, 490 p.
Faghmous, J.H., and Kumar, V., 2014, A big data guide to understanding climate change: The case for theory-guided data science: Big Data, v. 2, no. 3, p. 155–163, https://doi.org/10.1089/big.2014.0026.
Gandomi, A., and Haider, M., 2015, Beyond the hype: Big data concepts, methods, and analytics: International Journal of Information Management, v. 35, no. 2, p. 137–144, https://doi.org/10.1016/j.ijinfomgt.2014.10.007.
Granitto, M., Giles, S.A., and Kelley, K.D., 2017, Global Geochemical Database for Critical Metals in Black Shales: U.S. Geological Survey Data Release, https://doi.org/10.5066/F71G0K7X.
IEEE, 2019, IEEE Standard for Floating-Point Arithmetic: IEEE Std 754-2019 (Revision of IEEE 754-2008), p. 1–84, https://doi.org/10.1109/IEEESTD.2008.4610935.
Keller, C.B., and Schoene, B., 2012, Statistical geochemistry reveals disruption in secular lithospheric evolution about 2.5 Gyr ago: Nature, v. 485, no. 7399, p. 490–493, https://doi.org/10.1038/nature11024.
Nolet, G., 2012, Seismic tomography: With applications in global seismology and exploration geophysics: Berlin, Springer, v. 5, 386 p., https://doi.org/10.1007/978-94-009-3899-1.
Ogg, J.G., Ogg, G.M., and Gradstein, F.M., 2016, A concise geologic time scale 2016: Amsterdam, Elsevier, 240 p.
Partin, C.A., Bekker, A., Planavsky, N.J., Scott, C.T., Gill, B.C., Li, C., Podkovyrov, V., Maslov, A., Konhauser, K.O., Lalonde, S.V., Love, G.D., Poulton, S.W., and Lyons, T.W., 2013, Large-scale fluctuations in Precambrian atmospheric and oceanic oxygen levels from the record of U in shales: Earth and Planetary Science Letters, v. 369, p. 284–293, https://doi.org/10.1016/j.epsl.2013.03.031.
Peters, S.E., and McClennen, M., 2016, The paleobiology database application programming interface: Paleobiology, v. 42, no. 1, p. 1–7, https://doi.org/10.1017/pab.2015.39.
Peters, S.E., Zhang, C., Livny, M., and Re, C., 2014, A machine reading system for assembling synthetic paleontological databases: PLOS One, v. 9, no. 12, e113523, https://doi.org/10.1371/journal.pone.0113523.
Peters, S.E., Husson, J.M., and Czaplewski, J., 2018, Macrostrat: A platform for geological data integration and deep-time Earth crust research: Geochemistry Geophysics Geosystems, v. 19, no. 4, p. 1393–1409, https://doi.org/10.1029/2018GC007467.
Potter, P.E., Maynard, J.B., and Depetris, P.J., 2005, Mud and Mudstones: Introduction and Overview: Berlin, Springer, 297 p.
Ptáček, M.P., Dauphas, N., and Greber, N.D., 2020, Chemical evolution of the continental crust from a data-driven inversion of terrigenous sediment compositions: Earth and Planetary Science Letters, v. 539, p. 116090.
Reinhard, C.T., Planavsky, N.J., Gill, B.C., Ozaki, K., Robbins, L.J., Lyons, T.W., Fischer, W.W., Wang, C., Cole, D.B., and Konhauser, K.O., 2017, Evolution of the global phosphorus cycle: Nature, v. 541, no. 7637, p. 386–389, https://doi.org/10.1038/nature20772.
Rock, N.M.S., Webb, J.A., McNaughton, N.J., and Bell, G.D., 1987, Nonparametric estimation of averages and errors for small data-sets in isotope geoscience: A proposal: Chemical Geology, Isotope Geoscience Section, v. 66, no. 1–2, p. 163–177.
Sarbas, B., 2008, The Georoc database as part of a growing geoinformatics network, in Geoinformatics 2008—Data to Knowledge: U.S. Geological Survey, p. 42–43.
Schoene, B., 2014, U-Th-Pb geochronology, in Holland, H.D., and Turekian, K.K., eds., Treatise on Geochemistry (Second Edition): Oxford, UK, Elsevier, p. 341–378.
Sperling, E.A., Tecklenburg, S., and Duncan, L.E., 2019, Statistical inference and reproducibility in geobiology: Geobiology, v. 17, no. 3, p. 261–271, https://doi.org/10.1111/gbi.12333.
Walker, J.D., Lehnert, K.A., Hofmann, A.W., Sarbas, B., and Carlson, R.W., 2005, EarthChem: International collaboration for solid Earth geochemistry in geoinformatics: AGUFM, v. 2005, IN44A-03.
Woodcock, N.H., 2004, Life span and fate of basins: Geology, v. 32, no. 8, p. 685–688, https://doi.org/10.1130/G20598.1.
Young, G.M., and Nesbitt, H.W., 1998, Processes controlling the distribution of Ti and Al in weathering profiles, siliciclastic sediments and sedimentary rocks: Journal of Sedimentary Research, v. 68, no. 3, p. 448–455.

Manuscript received 28 Sept. 2020
Revised manuscript received 2 Dec. 2020
Manuscript accepted 20 Feb. 2021
www.geosociety.org/gsatoday