Page 9 - i1052-5173-31-5
our lithology and outlier filtering methods removed most U data because they were inappropriate for reconstructing trends in mudstone geochemistry through time, that same data would be especially useful for other questions, such as determining the variability of heat production within shales. This sort of filtering is a fixture of scientific research (e.g., geochemists will consider whether samples are diagenetically altered when measuring them for isotopic data) and, likewise, should be viewed as a necessary step in the analysis of large datasets.

As our workflow demonstrates, filtering often requires multiple steps, some automatic (e.g., cutoffs that exclude vast amounts of data in one fell swoop or algorithms to determine the "outlierness" of data; see Ptáček et al., 2020) and others manual (e.g., examining source literature to determine whether an anomalous value is, in fact, meaningful). Each procedure, along with any assumptions and/or justifications, must be documented clearly (and code included and/or stored in a publicly accessible repository) by researchers so that others may reproduce their results and/or build upon their conclusions with increasingly larger datasets.

Along with documentation of data processing, filtering, and sampling, it is important for researchers also to leverage sensitivity analyses to understand how parameter choices may impact resulting trends. Here, through the analysis of various spatial and temporal parameter values, we demonstrate that, while the spread of data varies based on the prescribed values of scale_spatial and scale_temporal, the averaged resampled trend does not (Fig. S7 [see footnote 1]). At the same time, we see that trends are directly influenced by the use (or lack thereof) of Ca and P₂O₅ and outlier filtering. For example, the record of U in mudstones becomes overprinted by anomalously large values when carbonate samples are not excluded (Fig. S7B).

CONCLUSIONS

Large datasets can provide increasingly valuable insights into the ancient Earth system. However, to extract meaningful trends, these datasets must be cultivated, curated, and processed with an emphasis on data quality, uncertainty propagation, and transparency. Charles Darwin once noted that the "natural geological record [is] a history of the world imperfectly kept" (Darwin, 1859, p. 310), a reality that is the result of both geological and sociological causes. But while the data are biased, they also are tractable. As we have demonstrated here, the challenges of dealing with this imperfect record (and, by extension, the large datasets that document it) certainly are surmountable.

ACKNOWLEDGMENTS

We thank everyone who contributed to the SGP database, including T. Frasier (YGS). BGS authors (JE, PW) publish with permission of the Executive Director of the British Geological Survey, UKRI. We would like to thank the editor and one anonymous reviewer for their helpful feedback.

REFERENCES CITED

Algeo, T.J., and Li, C., 2020, Redox classification and calibration of redox thresholds in sedimentary systems: Geochimica et Cosmochimica Acta, v. 287, p. 8–26, https://doi.org/10.1016/j.gca.2020.01.055.

Bai, Y., Jacobs, C.A., Kwan, M., and Waldmann, C., 2017, Geoscience and the technological revolution [perspectives]: IEEE Geoscience and Remote Sensing Magazine, v. 5, no. 3, p. 72–75, https://doi.org/10.1109/MGRS.2016.2635018.

Chan, D., Kent, E.C., Berry, D.I., and Huybers, P., 2019, Correcting datasets leads to more homogeneous early-twentieth-century sea surface warming: Nature, v. 571, no. 7765, p. 393–397, https://doi.org/10.1038/s41586-019-1349-2.

Chen, L., and Wang, L., 2018, Recent advances in Earth observation big data for hydrology: Big Earth Data, v. 2, no. 1, p. 86–107, https://doi.org/10.1080/20964471.2018.1435072.

Darwin, C., 1859, On the Origin of Species by Means of Natural Selection, or Preservation of Favoured Races in the Struggle for Life: London, John Murray, 490 p.

Faghmous, J.H., and Kumar, V., 2014, A big data guide to understanding climate change: The case for theory-guided data science: Big Data, v. 2, no. 3, p. 155–163, https://doi.org/10.1089/big.2014.0026.

Gandomi, A., and Haider, M., 2015, Beyond the hype: Big data concepts, methods, and analytics: International Journal of Information Management, v. 35, no. 2, p. 137–144, https://doi.org/10.1016/j.ijinfomgt.2014.10.007.

Granitto, M., Giles, S.A., and Kelley, K.D., 2017, Global Geochemical Database for Critical Metals in Black Shales: U.S. Geological Survey Data Release, https://doi.org/10.5066/F71G0K7X.

IEEE, 2019, IEEE Standard for Floating-Point Arithmetic: IEEE Std 754-2019 (Revision of IEEE 754-2008), p. 1–84, https://doi.org/10.1109/IEEESTD.2008.4610935.

Keller, C.B., and Schoene, B., 2012, Statistical geochemistry reveals disruption in secular lithospheric evolution about 2.5 Gyr ago: Nature, v. 485, no. 7399, p. 490–493, https://doi.org/10.1038/nature11024.

Nolet, G., 2012, Seismic tomography: With applications in global seismology and exploration geophysics: Berlin, Springer, v. 5, 386 p., https://doi.org/10.1007/978-94-009-3899-1.

Ogg, J.G., Ogg, G.M., and Gradstein, F.M., 2016, A concise geologic time scale 2016: Amsterdam, Elsevier, 240 p.

Partin, C.A., Bekker, A., Planavsky, N.J., Scott, C.T., Gill, B.C., Li, C., Podkovyrov, V., Maslov, A., Konhauser, K.O., Lalonde, S.V., Love, G.D., Poulton, S.W., and Lyons, T.W., 2013, Large-scale fluctuations in Precambrian atmospheric and oceanic oxygen levels from the record of U in shales: Earth and Planetary Science Letters, v. 369, p. 284–293, https://doi.org/10.1016/j.epsl.2013.03.031.

Peters, S.E., and McClennen, M., 2016, The paleobiology database application programming interface: Paleobiology, v. 42, no. 1, p. 1–7, https://doi.org/10.1017/pab.2015.39.

Peters, S.E., Zhang, C., Livny, M., and Re, C., 2014, A machine reading system for assembling synthetic paleontological databases: PLOS One, v. 9, no. 12, e113523, https://doi.org/10.1371/journal.pone.0113523.

Peters, S.E., Husson, J.M., and Czaplewski, J., 2018, Macrostrat: A platform for geological data integration and deep-time Earth crust research: Geochemistry Geophysics Geosystems, v. 19, no. 4, p. 1393–1409, https://doi.org/10.1029/2018GC007467.

Potter, P.E., Maynard, J.B., and Depetris, P.J., 2005, Mud and Mudstones: Introduction and Overview: Berlin, Springer, 297 p.

Ptáček, M.P., Dauphas, N., and Greber, N.D., 2020, Chemical evolution of the continental crust from a data-driven inversion of terrigenous sediment compositions: Earth and Planetary Science Letters, v. 539, p. 116090.

Reinhard, C.T., Planavsky, N.J., Gill, B.C., Ozaki, K., Robbins, L.J., Lyons, T.W., Fischer, W.W., Wang, C., Cole, D.B., and Konhauser, K.O., 2017, Evolution of the global phosphorus cycle: Nature, v. 541, no. 7637, p. 386–389, https://doi.org/10.1038/nature20772.

Rock, N.M.S., Webb, J.A., McNaughton, N.J., and Bell, G.D., 1987, Nonparametric estimation of averages and errors for small data-sets in isotope geoscience: A proposal: Chemical Geology, Isotope Geoscience Section, v. 66, no. 1–2, p. 163–177.

Sarbas, B., 2008, The Georoc database as part of a growing geoinformatics network, in Geoinformatics 2008—Data to Knowledge: U.S. Geological Survey, p. 42–43.

Schoene, B., 2014, U-Th-Pb geochronology, in Holland, H.D., and Turekian, K.K., eds., Treatise on Geochemistry (Second Edition): Oxford, UK, Elsevier, p. 341–378.

Sperling, E.A., Tecklenburg, S., and Duncan, L.E., 2019, Statistical inference and reproducibility in geobiology: Geobiology, v. 17, no. 3, p. 261–271, https://doi.org/10.1111/gbi.12333.

Walker, J.D., Lehnert, K.A., Hofmann, A.W., Sarbas, B., and Carlson, R.W., 2005, EarthChem: International collaboration for solid Earth geochemistry in geoinformatics: AGUFM, v. 2005, IN44A-03.

Woodcock, N.H., 2004, Life span and fate of basins: Geology, v. 32, no. 8, p. 685–688, https://doi.org/10.1130/G20598.1.

Young, G.M., and Nesbitt, H.W., 1998, Processes controlling the distribution of Ti and Al in weathering profiles, siliciclastic sediments and sedimentary rocks: Journal of Sedimentary Research, v. 68, no. 3, p. 448–455.

Manuscript received 28 Sept. 2020
Revised manuscript received 2 Dec. 2020
Manuscript accepted 20 Feb. 2021
www.geosociety.org/gsatoday 9
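
The sensitivity analysis described in the discussion on this page (varying the spatial and temporal scale parameters and checking whether the averaged resampled trend stays stable while the spread changes) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, the one-dimensional `position` coordinate stands in for true geographic distance, and the function names and the inverse-square proximity score are hypothetical rather than the authors' published code.

```python
import numpy as np


def proximity_score(position, age, scale_spatial, scale_temporal):
    """Score each sample by how crowded its neighbourhood is.

    Hypothetical form: each pair contributes a value near 1 when the two
    samples are close relative to the scale, and near 0 when far apart.
    """
    d_space = np.abs(position[:, None] - position[None, :])
    d_time = np.abs(age[:, None] - age[None, :])
    near_space = 1.0 / ((d_space / scale_spatial) ** 2 + 1.0)
    near_time = 1.0 / ((d_time / scale_temporal) ** 2 + 1.0)
    return (near_space + near_time).sum(axis=1)


def resampled_trend(value, age, score, bin_edges, n_boot, rng):
    """Weighted bootstrap in which crowded samples are drawn less often.

    Returns the per-bin mean of the bootstrap means and their spread
    (standard deviation across bootstrap replicates).
    """
    prob = 1.0 / score
    prob /= prob.sum()
    n_bins = len(bin_edges) - 1
    means = np.full((n_boot, n_bins), np.nan)
    for b in range(n_boot):
        idx = rng.choice(value.size, size=value.size, replace=True, p=prob)
        bin_id = np.digitize(age[idx], bin_edges) - 1
        for k in range(n_bins):
            drawn = value[idx][bin_id == k]
            if drawn.size:
                means[b, k] = drawn.mean()
    return np.nanmean(means, axis=0), np.nanstd(means, axis=0)


def run_sensitivity(spatial_scales, temporal_scales, seed=0):
    """Repeat the resampling for every combination of scale parameters."""
    rng = np.random.default_rng(seed)
    # Synthetic, deliberately clustered samples; not real geochemical data.
    n = 400
    age = np.concatenate([rng.uniform(0, 1000, n // 2),
                          rng.normal(500, 20, n // 2)])
    age = np.clip(age, 0, 999.9)
    position = np.concatenate([rng.uniform(0, 180, n // 2),
                               rng.normal(40, 2, n // 2)])
    value = 5.0 + 0.01 * age + rng.normal(0, 1.0, n)
    bin_edges = np.linspace(0, 1000, 6)  # five equal-width age bins

    results = {}
    for s in spatial_scales:
        for t in temporal_scales:
            score = proximity_score(position, age, s, t)
            results[(s, t)] = resampled_trend(value, age, score,
                                              bin_edges, 200, rng)
    return results
```

Calling `run_sensitivity([1.0, 10.0], [10.0, 100.0])` returns one `(trend, spread)` pair per parameter combination. Comparing the `trend` arrays across combinations shows how much the averaged result depends on the chosen scales, and comparing the `spread` arrays shows how the scatter responds. Whether a real dataset behaves like this synthetic one has to be checked case by case.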