Large datasets increasingly provide critical insights into crustal and surface processes on
Earth. These data come in the form of published and contributed observations, which often
include associated metadata. Even in the best-case scenario of a carefully curated dataset,
extracting meaningful analyses from such compilations may be non-trivial, and choices made with
respect to filtering, resampling, and averaging can affect the resulting trends and their
interpretation. As a result, a thorough understanding of how to digest, process, and
analyze large data compilations is required. Here, we present a generalizable workflow developed
using the Sedimentary Geochemistry and Paleoenvironments Project database. We demonstrate the
effects of filtering and weighted resampling on Al₂O₃ and U contents, two
representative geochemical components of interest in sedimentary geochemistry (one major and one
trace element, respectively). Through our analyses, we highlight several methodological
challenges in a “bigger data” approach to Earth science. We suggest that, with slight
modifications to our workflow, researchers can confidently use large collections of observations
to gain new insights into processes that have shaped Earth’s crustal and surface environments.
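
The filtering and weighted resampling steps named above can be illustrated with a minimal sketch. It is not the workflow's actual implementation: the function names, the age-proximity kernel, the `age_scale` parameter, and the synthetic sample values are all assumptions chosen for illustration. The idea shown is that samples from densely sampled intervals receive smaller weights, so that a bootstrap draw is not dominated by the most heavily studied units.

```python
import random
from statistics import mean


def filter_samples(samples, lo, hi):
    """Keep samples whose value lies within [lo, hi] (a simple plausibility screen)."""
    return [s for s in samples if lo <= s["value"] <= hi]


def proximity_weights(samples, age_scale):
    """Weight each sample inversely to the number of samples close to it in age.

    The kernel 1 / (1 + (dt / age_scale)^2) is an assumed, illustrative choice.
    """
    weights = []
    for s in samples:
        # Each sample contributes 1.0 for itself, so density is always >= 1.
        density = sum(
            1.0 / (1.0 + ((s["age"] - t["age"]) / age_scale) ** 2)
            for t in samples
        )
        weights.append(1.0 / density)
    return weights


def weighted_bootstrap_means(samples, weights, n_iter, rng):
    """Draw n_iter weighted resamples (with replacement) and return their means."""
    means = []
    for _ in range(n_iter):
        draw = rng.choices(samples, weights=weights, k=len(samples))
        means.append(mean(s["value"] for s in draw))
    return means


if __name__ == "__main__":
    # Synthetic data (hypothetical): ten clustered samples, one isolated sample,
    # and one implausible value that the filter should remove.
    data = [{"age": 100, "value": 10} for _ in range(10)]
    data.append({"age": 300, "value": 1000})
    data.append({"age": 500, "value": 20})

    kept = filter_samples(data, lo=0, hi=100)
    w = proximity_weights(kept, age_scale=50)
    boot = weighted_bootstrap_means(kept, w, n_iter=200, rng=random.Random(0))

    print("unweighted mean:", mean(s["value"] for s in kept))
    print("weighted bootstrap mean:", mean(boot))
```

In this toy case the isolated sample is drawn far more often than it would be under uniform resampling, which pulls the resampled mean away from the value of the densely sampled cluster. This is the kind of sensitivity to filtering and weighting choices that the abstract describes.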
Manuscript received 28 Sept. 2020. Revised manuscript received 2 Dec. 2020.
Manuscript accepted 20 Feb. 2021. Posted 24 Mar. 2021.
© The Geological Society of America, 2021. CC-BY-NC.