This page contains links to selected dataset collections that I’ve found. Feel free to email me if you have any suggestions!
[SNAP is the best!] A substantial collection of data sets describing large networks.
Microblogging networks, patent data set, online social networks, knowledge linking dataset, mobile dataset, etc.
“The first interactive data and network repository with real-time analytics.”
“KONECT (the Koblenz Network Collection) is a project to collect large network datasets of all types in order to perform research in network science and related fields, collected by the Institute of Web Science and Technologies at the University of Koblenz–Landau. KONECT contains over a hundred network datasets of various types, including directed, undirected, bipartite, weighted, unweighted, signed and rating networks.” (From the website.)
Network datasets collected from famous websites including BlogCatalog, Buzznet, Delicious, Digg, Douban, Flickr, Flixster, Last.fm, Twitter, YouTube and so on. Some datasets contain both the contact network and selected group membership information. (Most datasets contain around 100k nodes.)
Datasets collected by Tore Opsahl (in tnet format, and some also in UCINET format). The collection contains some small networks (number of nodes: 32 to 16,726).
OSN dataset collection of the BGU Social Networks Security Research Group. It contains directed networks (Anybeat, Academia.edu, Google+), undirected networks (TheMarker Cafe), multigraph networks (Students Network, WikiTree), and some other Facebook datasets.
- Flickr, LiveJournal, Orkut, YouTube (user, links, groups, group members):
- The famous Lalonde dataset: available almost everywhere. For example, it can be loaded directly in R.
- Right Heart Catheterization Dataset
- Datasets for the Atlantic Causal Inference Conference Competition (2016/2017): The GitHub repository contains code to generate the datasets.
- National Collaborative Perinatal Project: A study that was conducted on pregnant women and their children with the aim of identifying causal factors leading to developmental disorders. There are 6,700 data items on the approximately 58,000 study pregnancies.
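
As a rough illustration of the Lalonde item above, here is one way the data might be loaded in R. This is only a sketch under an assumption: it uses the copy bundled with the MatchIt package, which is my choice for illustration and not necessarily the package originally intended here. Other packages ship their own variants of the data, and those can differ in sample and columns.

```r
# Assumption: the MatchIt package is installed, e.g. via
# install.packages("MatchIt"). Other packages bundle their own variants.
data("lalonde", package = "MatchIt")  # creates a data frame named `lalonde`

str(lalonde)          # inspect variable names and types
table(lalonde$treat)  # sizes of the treated and control groups
```

Check the package's help page (`?MatchIt::lalonde`) to see which subsample you are getting before comparing results with published numbers.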
Last updated: 2019/5/21