Yishi Lin

Datasets

This page contains links to selected dataset collections that I’ve found. Feel free to email me if you have any suggestions!

Social Network Analysis

Stanford Large Network Dataset Collection

[SNAP is the best!] A substantial collection of data sets describing large networks.

Datasets for Social Network Analysis (Aminer.org)

Microblogging networks, patent data set, online social networks, knowledge linking dataset, mobile dataset, etc.

Network Data Repository

“The first interactive data and network repository with real-time analytics.”

konect - The Koblenz Network Collection

“KONECT (the Koblenz Network Collection) is a project to collect large network datasets of all types in order to perform research in network science and related fields, collected by the Institute of Web Science and Technologies at the University of Koblenz–Landau. KONECT contains over a hundred network datasets of various types, including directed, undirected, bipartite, weighted, unweighted, signed and rating networks.” (From the website.)

Social Computing Data Repository at ASU - Datasets

Network datasets collected from famous websites including BlogCatalog, Buzznet, Delicious, Digg, Douban, Flickr, Flixster, Last.fm, Twitter, YouTube and so on. Some datasets contain both the contact network and selected group membership information. (Most datasets contain around 100k nodes.)

Datasets | Tore Opsahl

Datasets collected by Tore Opsahl (in tnet format, some also in UCINET format). The networks are small, ranging from 32 to 16,726 nodes.

BGU Social Networks Security Research Group

The online social network (OSN) dataset collection of the BGU Social Networks Security Research Group. It contains directed networks (Anybeat, Academia.edu, Google+), undirected networks (TheMarker Cafe), multigraph networks (Students Network, WikiTree), and several Facebook datasets.

Social Computing Research @ MPI-SWS

  • Flickr, LiveJournal, Orkut, and YouTube datasets (users, links, groups, group members)

Other sources of network data

  • Mark Newman’s list
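
Several of these collections (SNAP, for example) distribute graphs as plain-text edge lists: one edge per line, two whitespace-separated node IDs, with comment lines at the top. Below is a minimal sketch of reading such a file into an adjacency map using only the Python standard library. The function name `read_edge_list`, the default comment prefixes, and the sample data are assumptions made for illustration; check each dataset's documentation for its actual layout.

```python
from io import StringIO


def read_edge_list(lines, comment_prefixes=("#", "%"), directed=True):
    """Build {node: set(neighbors)} from an iterable of edge-list lines.

    Assumes each data line starts with two whitespace-separated node IDs;
    any extra columns (weights, timestamps) are ignored.
    """
    adjacency = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith(comment_prefixes):
            continue
        source, target = line.split()[:2]
        adjacency.setdefault(source, set()).add(target)
        # Register the target too, so nodes without out-edges still appear.
        neighbors_of_target = adjacency.setdefault(target, set())
        if not directed:
            neighbors_of_target.add(source)
    return adjacency


# Hypothetical sample in the style of a commented edge-list file.
sample = StringIO("# FromNodeId\tToNodeId\n0\t1\n0\t2\n1\t2\n")
graph = read_edge_list(sample)
num_edges = sum(len(neighbors) for neighbors in graph.values())
print(len(graph), "nodes,", num_edges, "edges")
```

For a real file, pass an open file handle instead of the `StringIO` sample, for example `with open(path) as f: graph = read_edge_list(f)`. Node IDs are kept as strings; convert them with `int` if the dataset uses numeric IDs and memory matters.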

Causal Inference

  • The famous Lalonde dataset: available almost everywhere. For example, load it in R with data(lalonde, package="MatchIt").
  • Right Heart Catheterization Dataset
  • Datasets for the Atlantic Causal Inference Conference Competition (2016/2017): The GitHub repository contains code to generate the datasets.
  • National Collaborative Perinatal Project: A study that was conducted on pregnant women and their children with the aim of identifying causal factors leading to developmental disorders. There are 6,700 data items on the approximately 58,000 study pregnancies.

Last updated: 2019/5/21

© 2013 – 2021 Yishi Lin