Skip to content

Concepts

This page explains how Scholar Data calculates the S-index and Dataset Index (D-index).

The Problem

Despite growing adoption of data sharing in research, there is no standardized, transparent, or equitable way to measure and reward it. Publication metrics like the h-index are well established, but datasets, which often drive discovery just as much as papers, have no equivalent. Researchers who invest in sharing high-quality data have had little way to demonstrate that impact.

The S-index was developed as part of the NIH S-index Challenge to address this gap.

Proposed S-index

We introduce a framework in which a dataset is assigned a Dataset Index that combines FAIRness, scholarly citations, and alternative mentions. A researcher's S-index is then defined as the sum of the Dataset Indices of their datasets:

Here, is the FAIR score between 0 and 1 (or 0 to 100%) of dataset . and are logarithmic time-weighted counts of the citations and alternative mentions to dataset :

Here, represents the time in years between the citation/mention event and the dataset’s publication. This weighting is designed to reward sustained reuse: a citation/mention on the day of dataset publication counts as 1, while one 20 years later counts as 2.

, , and are normalization factors intended to control for differences in field size, data reuse culture, and changing practices over time. They are calculated as 3-year moving medians based on datasets from the same field.

An example of S-index calculation is provided in Fig. 1. Calculation only requires dataset metadata from existing infrastructure, enabling large-scale calculation regardless of dataset size, format, reuse license, and access conditions.

Example calculation S-indexFigure 1. Example of the calculation of the Dataset Index of a dataset and the S-index of its author. Steps 1-5 need to be repeated for each dataset of the researcher to calculate their S-index. Steps 3B to 6 need to be repeated periodically (e.g., monthly) to account for new citations and mentions. The FAIR score will change if there are major updates to the data repository’s metadata practices, and normalization factors will change if we identify citations or mentions we may have missed in prior years, so they could be updated as well

Documentation written with assistance from Claude by Anthropic.