Datasets

This page contains links to network dataset collections that I’ve found.

Stanford Large Network Dataset Collection

A substantial collection of data sets describing large networks. List copied from the website:

  • Social networks: online social networks, edges represent interactions between people
  • Networks with ground-truth communities: ground-truth network communities in social and information networks
  • Communication networks: email communication networks with edges representing communication
  • Citation networks: nodes represent papers, edges represent citations
  • Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)
  • Web graphs: nodes represent webpages and edges are hyperlinks
  • Amazon networks: nodes represent products and edges link commonly co-purchased products
  • Internet networks: nodes represent computers and edges represent communication
  • Road networks: nodes represent intersections and edges represent roads connecting the intersections
  • Autonomous systems: graphs of the internet
  • Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)
  • Location-based online social networks: Social networks with geographic check-ins
  • Wikipedia networks and metadata: Talk, editing and voting data from Wikipedia
  • Twitter and Memetracker: Memetracker phrases, links and 467 million Tweets
  • Online communities: Data from online communities such as Reddit and Flickr
  • Online reviews: Data from online review systems such as BeerAdvocate and Amazon
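Many files in collections like this one are distributed as plain-text edge lists. The sketch below shows one way to load such a file into an adjacency map. It assumes one whitespace-separated "source target" pair per line and comment lines starting with "#"; the function name `read_edge_list` and the sample data are hypothetical, so check the format notes of each dataset before relying on it.

```python
from typing import Dict, Iterable, Set


def read_edge_list(lines: Iterable[str], directed: bool = True) -> Dict[str, Set[str]]:
    """Build an adjacency map from 'source target' lines (assumed format)."""
    adjacency: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment/header lines.
        if not line or line.startswith("#"):
            continue
        # Use only the first two columns; extra columns (e.g. weights) are ignored.
        source, target = line.split()[:2]
        adjacency.setdefault(source, set()).add(target)
        # Make sure the target also appears as a node, even with no out-edges.
        adjacency.setdefault(target, set())
        if not directed:
            adjacency[target].add(source)
    return adjacency


# Made-up sample data, not taken from any real dataset.
sample = [
    "# FromNodeId ToNodeId",
    "0 1",
    "0 2",
    "1 2",
]
graph = read_edge_list(sample)
print(len(graph), "nodes")
```

For a real file, passing an open file handle works the same way, for example `read_edge_list(open(path))`, since the function only iterates over lines.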

Datasets for Social Network Analysis (Aminer.org)

Microblogging networks, patent data set, online social networks (friendship and contents/user behaviors), knowledge linking dataset, mobile dataset, etc.

Network Data Repository

“The first interactive data and network repository with real-time analytics.”

konect - The Koblenz Network Collection

“KONECT (the Koblenz Network Collection) is a project to collect large network datasets of all types in order to perform research in network science and related fields, collected by the Institute of Web Science and Technologies at the University of Koblenz–Landau. KONECT contains over a hundred network datasets of various types, including directed, undirected, bipartite, weighted, unweighted, signed and rating networks.” – From the website.

Social Computing Data Repository at ASU - Datasets

Network datasets collected from popular websites, including BlogCatalog, Buzznet, Delicious, Digg, Douban, Flickr, Flixster, Last.fm, Twitter and YouTube. Some datasets contain both the contact network and selected group membership information. (Most datasets contain around 100k nodes.)

Datasets | Tore Opsahl

Datasets collected by Tore Opsahl (in tnet format, and some also in UCINET format). The networks are relatively small, ranging from 32 to 16,726 nodes.

BGU Social Networks Security Research Group

The online social network (OSN) dataset collection of the BGU Social Networks Security Research Group. It contains directed networks (Anybeat, Academia.edu, Google+), undirected networks (TheMarker Cafe), multigraph networks (Students Network, WikiTree), and several Facebook datasets.

Social Computing Research @ MPI-SWS

Other sources of network data

Last updated: 2017/06/07