
Auto-Generated Profiles

Scholar Data includes millions of auto-generated researcher profiles built by analyzing author metadata across the datasets in the Scholar Data database. These profiles are intended to help researchers explore and understand the S-index during this early stage of the project, while the number of user-created profiles is still growing.

Auto-generated profiles are not a substitute for a user-created profile. For an accurate S-index, researchers should create a profile, claim their datasets, and let Scholar Data compute their score.

How Profiles Are Built

Authors are identified and grouped across datasets using two strategies:

  • Persistent identifier (e.g. ORCID): used where available for unambiguous matching
  • Name and affiliation: used for authors without a persistent identifier

Authors listed as organizations are excluded. Each profile is assigned a primary research field based on the field of the majority of its datasets.
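
The grouping steps above can be illustrated with a minimal sketch. This is not Scholar Data's actual implementation: the record layout (`authors`, `orcid`, `name`, `affiliation`, `is_organization`, `field`) and the function names are hypothetical, the name normalization is a simple assumption, and the handling of ties between fields is not specified in this documentation.

```python
from collections import Counter, defaultdict


def author_key(author):
    """Return a grouping key for one author record (hypothetical layout).

    A persistent identifier is preferred when present; otherwise the key
    falls back to a normalized name plus affiliation.
    """
    orcid = author.get("orcid")
    if orcid:
        return ("orcid", orcid.strip())
    # Assumed normalization: lowercase and collapse repeated whitespace.
    name = " ".join(author["name"].lower().split())
    affiliation = " ".join(author.get("affiliation", "").lower().split())
    return ("name_affiliation", name, affiliation)


def build_profiles(datasets):
    """Group datasets by author key, skipping authors listed as organizations."""
    profiles = defaultdict(list)
    for dataset in datasets:
        for author in dataset["authors"]:
            if author.get("is_organization"):
                continue
            profiles[author_key(author)].append(dataset)
    return dict(profiles)


def primary_field(datasets):
    """Return the most common field among a profile's datasets.

    Ties are resolved by first occurrence here, which is an assumption.
    """
    counts = Counter(d["field"] for d in datasets)
    return counts.most_common(1)[0][0]
```

In this sketch, two records that share an ORCID land in the same profile even when the name is spelled differently, while records without an identifier are merged only if their normalized name and affiliation match exactly.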

S-index Calculation

Each auto-generated profile's S-index is calculated as the sum of the D-index scores of all datasets attributed to that researcher, following the S-index formula.
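
As a rough illustration of the summation described above, the sketch below adds up per-dataset D-index scores. The function name and the example scores are hypothetical, and how each D-index is computed is outside the scope of this page.

```python
import math


def s_index(d_index_scores):
    """Sum the D-index scores of the datasets attributed to one profile."""
    # math.fsum limits floating-point rounding error when adding many scores.
    return math.fsum(d_index_scores)


# Hypothetical D-index scores for three datasets on one profile.
example_scores = [1.5, 2.0, 0.5]
example_total = s_index(example_scores)
```

A profile with no attributed datasets would get a score of zero under this sketch.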

Limitations

Because profiles are assembled automatically from metadata, they carry inherent limitations. For instance, name-and-affiliation matching can conflate different researchers with similar names, or split one researcher's work across multiple profiles.

Researchers who create a profile and manually claim their datasets will get a more accurate and up-to-date S-index that reflects their actual data sharing impact.

Documentation written with assistance from Claude by Anthropic.