S-index Calculation

This page describes how S-index scores were calculated for the autogenerated researcher profiles shown on Scholar Data.

These S-index scores were computed automatically for demo and validation purposes only. They are not intended to represent a researcher's definitive S-index. For an accurate S-index, researchers should create a profile, claim their datasets, and let Scholar Data compute their score from a verified, curated dataset list. The process for computing the S-index for researcher-created profiles is the same.

Building Author Profiles

A DuckDB table was constructed by expanding the dataset corpus by author, producing a table of 216M+ rows, one row per dataset per author. Authors were then regrouped across datasets using the following strategy:

ORCID or other persistent identifier: used where available for unambiguous author matching
Name and affiliation set: used for authors without a persistent identifier

Authors listed as organizations in the DataCite creators field were excluded. After regrouping and deduplication, 1,032,546 unique authors were identified.

Grouping Method	Authors	Share
Identifier (e.g. ORCID)	346,524	33.6%
Name / affiliation set	686,022	66.4%
Total	1,032,546	100%
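
The regrouping strategy described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used by Scholar Data: the row fields (`identifier`, `name`, `affiliations`, `is_organization`, `dataset_id`) are hypothetical, and matching on an exact, case-normalized name plus affiliation set is an assumption, since this page does not specify how names and affiliations were compared.

```python
from collections import defaultdict


def author_key(row):
    """Build a grouping key for one (dataset, author) row.

    Field names are hypothetical; the real table schema is not documented here.
    """
    identifier = (row.get("identifier") or "").strip()
    if identifier:
        # A persistent identifier (e.g. ORCID) gives an unambiguous match.
        return ("identifier", identifier)
    # Fallback: normalized name plus the set of affiliations (assumed exact match).
    name = " ".join(row["name"].lower().split())
    affiliations = frozenset(
        a.strip().lower() for a in row.get("affiliations", []) if a.strip()
    )
    return ("name_affiliation", name, affiliations)


def group_authors(rows):
    """Map each author key to the set of dataset IDs attributed to that author."""
    groups = defaultdict(set)
    for row in rows:
        if row.get("is_organization"):
            continue  # organizations in the creators field are excluded
        groups[author_key(row)].add(row["dataset_id"])
    return dict(groups)


# Example rows: one row per dataset per author.
rows = [
    {"dataset_id": "d1", "name": "Ada Lovelace", "identifier": "0000-0001-2345-6789"},
    {"dataset_id": "d2", "name": "A. Lovelace", "identifier": "0000-0001-2345-6789"},
    {"dataset_id": "d3", "name": "Jane  Doe", "affiliations": ["Univ X"]},
    {"dataset_id": "d4", "name": "jane doe", "affiliations": ["univ x"]},
    {"dataset_id": "d5", "name": "Some Institute", "is_organization": True},
]
groups = group_authors(rows)
```

In this sketch, the two rows sharing an identifier collapse into one author even though the names differ, while the two rows without an identifier collapse only because both the normalized name and the affiliation set agree.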

Each author was assigned a primary research field based on the research field of the majority of their datasets.
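
As a rough sketch of this assignment step, the function below picks the most frequent research field among an author's datasets. The function name and input shape are hypothetical, and two details are assumptions not stated on this page: "majority" is interpreted as the most frequent field, and ties are broken alphabetically.

```python
from collections import Counter


def primary_field(dataset_fields):
    """Return the most frequent research field among an author's datasets.

    Datasets with no field (None or empty string) are ignored.
    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    counts = Counter(field for field in dataset_fields if field)
    if not counts:
        return None  # no dataset carries a research field
    top = max(counts.values())
    return min(field for field, n in counts.items() if n == top)


# Example: one research field label per dataset for a single author.
field = primary_field(["Biology", "Physics", "Biology", None])
```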