Citations
This page describes how Scholar Data identifies citations to the datasets it indexes. Three complementary sources are combined to make coverage as complete as possible.
Sources
Make Data Count (MDC) Data Citation Corpus
The Make Data Count (MDC) initiative is an international collaboration, including DataCite, the California Digital Library, and the Wellcome Trust, that focuses on developing standardized metrics for research data. Its core output is the Data Citation Corpus, a centralized, publicly accessible (CC0-licensed) resource that aggregates dataset references from multiple sources, including DataCite events, the Chan Zuckerberg Initiative (CZI) Science Knowledge Graph, Aligning Science Across Parkinson's (ASAP), and Europe PubMed Central.
Unlike general bibliographic databases, the MDC corpus is designed to track citations to any dataset, including datasets identified by accession numbers rather than DOIs. This makes it essential for capturing citations to datasets in repositories such as the Electron Microscopy Data Bank (EMDB).
OpenAlex
OpenAlex is a fully open (CC0-licensed) global catalog of scholarly works, authors, and institutions, maintained by the nonprofit OurResearch. Launched in 2022 as a successor to the Microsoft Academic Graph, it indexes over 450 million scholarly works and tracks the citations between them.
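OpenAlex stores references on the citing work rather than on the cited one, so finding citations to a particular dataset means inverting those reference lists. The sketch below is a minimal illustration, not necessarily how Scholar Data processes OpenAlex: it assumes one JSON work record per line with the `id` and `referenced_works` fields used in OpenAlex work records, and the function name and sample IDs are hypothetical.

```python
import json
from collections import defaultdict
from typing import Iterable


def citations_to_targets(lines: Iterable[str], target_ids: set[str]) -> dict[str, set[str]]:
    """Map each target ID (e.g. a dataset's OpenAlex ID) to the IDs of works citing it.

    Assumes each line is one JSON work record with an "id" and a
    "referenced_works" list, as in OpenAlex work records. Hypothetical helper.
    """
    cited_by: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        work = json.loads(line)
        # A work cites every ID in its reference list; keep only the targets we track.
        for ref in work.get("referenced_works") or []:
            if ref in target_ids:
                cited_by[ref].add(work["id"])
    return dict(cited_by)


if __name__ == "__main__":
    # Hypothetical sample records; W9 stands in for a dataset.
    records = [
        '{"id": "https://openalex.org/W1", "referenced_works": ["https://openalex.org/W9"]}',
        '{"id": "https://openalex.org/W2", "referenced_works": []}',
        '{"id": "https://openalex.org/W3", "referenced_works": ["https://openalex.org/W9", "https://openalex.org/W8"]}',
    ]
    print(citations_to_targets(records, {"https://openalex.org/W9"}))
```

The OpenAlex snapshot is distributed as gzip-compressed JSON Lines, so in practice the `lines` argument could come from `gzip.open(path, "rt")` for each snapshot file.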
DataCite
For datasets registered with DataCite, the dataset metadata itself lists citing works under a `citations` key. These citations are used as the third source.
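As a sketch, assuming a record follows the JSON:API layout of the DataCite REST API response for a single DOI (`data` → `relationships` → `citations` → `data`, a list of objects with an `id`), the citing identifiers can be extracted as below. The exact path is an assumption to check against real records, and the helper name and sample DOIs are hypothetical.

```python
from typing import Any


def datacite_citing_ids(record: dict[str, Any]) -> list[str]:
    """Return the identifiers listed under a DataCite record's citations key.

    Assumes the JSON:API shape data -> relationships -> citations -> data,
    where each entry has an "id". Missing or null keys yield an empty list.
    """
    data = record.get("data") or {}
    relationships = data.get("relationships") or {}
    citations = (relationships.get("citations") or {}).get("data") or []
    # DOIs are case-insensitive, so lower-case them for later matching.
    return [entry["id"].lower() for entry in citations if entry.get("id")]


if __name__ == "__main__":
    # Hypothetical record for illustration only.
    record = {
        "data": {
            "id": "10.1000/dataset.a",
            "relationships": {
                "citations": {"data": [{"id": "10.1000/Paper.1", "type": "dois"}]}
            },
        }
    }
    print(datacite_citing_ids(record))
```

Using `or {}` at each level keeps the function tolerant of records where a key is present but null, which is a defensive choice rather than a documented property of DataCite responses.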
Coverage
Citations are deduplicated across all three sources, so a citation reported by more than one source is counted only once. Scholar Data currently tracks over 7.6 million unique citations to the datasets it indexes.
| Source | Version / last updated | Unique citations to Scholar Data-indexed datasets |
|---|---|---|
| Make Data Count Data Citation Corpus | v4.1 (August 2025) | 1,480,491 |
| OpenAlex | November 2025 snapshot | 2,754,157 |
| DataCite | September 30, 2025 | 4,641,366 |
| Total (deduplicated) | | 7,669,263 |
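One way to picture deduplication is as the union of (citing work, cited dataset) pairs across sources after normalizing identifiers. The sketch below illustrates that idea under stated assumptions: the normalization rules (trimming, lower-casing, stripping a DOI resolver or `doi:` prefix), the function names, and the sample identifiers are all hypothetical, not Scholar Data's actual matching logic, which may also need to reconcile OpenAlex IDs, DOIs, and accession numbers that refer to the same work.

```python
from typing import Iterable

# Hypothetical prefixes stripped so that different spellings of one DOI compare equal.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_id(identifier: str) -> str:
    """Trim, lower-case, and strip a leading DOI resolver prefix if present."""
    ident = identifier.strip().lower()
    for prefix in _DOI_PREFIXES:
        if ident.startswith(prefix):
            return ident[len(prefix):]
    return ident


def deduplicate(*sources: Iterable[tuple[str, str]]) -> set[tuple[str, str]]:
    """Union (citing_id, cited_id) pairs from all sources, normalizing both ends."""
    unique: set[tuple[str, str]] = set()
    for source in sources:
        for citing, cited in source:
            unique.add((normalize_id(citing), normalize_id(cited)))
    return unique


if __name__ == "__main__":
    # Hypothetical records: the same two citations reported in different spellings.
    mdc = [("10.1000/Paper1", "EMD-1234")]
    openalex = [
        ("https://doi.org/10.1000/paper1", "emd-1234"),
        ("10.1000/paper2", "10.1000/dataset.x"),
    ]
    datacite = [("doi:10.1000/paper2", "10.1000/DATASET.X")]
    print(len(deduplicate(mdc, openalex, datacite)))  # 2
```

Applied to the table, and assuming the deduplicated total is exactly the size of such a union, the per-source counts sum to 1,480,491 + 2,754,157 + 4,641,366 = 8,876,014, so 8,876,014 - 7,669,263 = 1,206,751 source records were dropped as duplicates of a citation another source had already reported.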
Why Might a Citation Be Missing?
- The citing work was published after the most recent update of the relevant source (see the table above)
- The citing work, or the citation itself, is not captured by any of the three sources