Learning Influence Probabilities In Social Networks
In my research on influence maximization, I need datasets with learned influence probabilities. This post covers three things.
- Tools that I have been using.
- Related datasets.
- How I obtain the influence probabilities on edges.
GitHub repository: yishilin14/learn-influence-prob (code and datasets)
Background
Goal: Learn influence probabilities in social networks.
Paper: Goyal, A., Bonchi, F., & Lakshmanan, L. V. (2010, February). Learning influence probabilities in social networks. In _Proceedings of the third ACM international conference on Web search and data mining_ (pp. 241-250). ACM.
Source code:
- http://www.cs.ubc.ca/~goyal/code-release.php
- Download the code for this paper: Amit Goyal, Francesco Bonchi, Laks V.S. Lakshmanan, _A Data-based Approach to Social Influence Maximization_, PVLDB 2012.
- For more details about the software, refer to the readme file inside. I will focus only on how to use the tool to learn influence probabilities on edges.
Compilation:
- Run `make` (I am using Arch Linux and GCC 5.3.0).
- Solution to the error “‘getpid’ was not declared in this scope”: add `#include <unistd.h>` to the offending source file.
Bernoulli distribution under the static model:
- Static model: independent of time and simple to learn.
- $p_{v,u} = A_{v2u} / A_{v}$, where
  - $p_{v,u}$: the learned influence probability of v on u
  - $A_{v2u}$: the number of actions that propagated from v to u
  - $A_{v}$: the number of actions v performed
- The first scan of the tool outputs $A_{v2u}$ and $A_{v}$.
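As a quick sanity check, the formula can be evaluated directly from the two counts. A minimal sketch (`static_prob` is my own name, not part of the tool):

```python
def static_prob(a_v2u, a_v):
    """Bernoulli static model: p_{v,u} = A_{v2u} / A_{v}."""
    # Guard against nodes that performed no actions.
    return float(a_v2u) / a_v if a_v else 0.0

# Example: v performed 5 actions, 2 of which propagated to u.
print(static_prob(2, 5))  # -> 0.4
```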
Now, we try to learn influence probabilities for some public datasets (with quick-and-dirty scripts).
Flixster Dataset
- Download here: http://www.cs.ubc.ca/~jamalim/datasets/ (Update on June 6, 2017: the link is no longer available. If you know the new link, please let me know. I put the dataset that I downloaded before in my Git repo.)
- Download the main dataset
- Download “The ratings in Flixster are associated with timestamps”
- Prepare the file “graph.txt”
  - Append a zero column to links.txt.
  - Desired format: Each line contains “user_from user_to 0”.
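The zero-column step can be scripted in a few lines. A sketch, assuming links.txt is whitespace-separated (the function name and default paths are mine):

```python
def add_zero_column(in_path="links.txt", out_path="graph.txt"):
    """Rewrite each "user_from user_to" line as "user_from user_to 0"."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            parts = line.split()
            if len(parts) >= 2:  # skip blank or malformed lines
                fout.write("%s %s 0\n" % (parts[0], parts[1]))
```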
- Prepare the file “actions_log.txt”
  - Preprocess Ratings.timed.txt as follows:
    - Remove the first line.
    - Remove the third column (the rating).
    - Convert the date column (the last column) to a column of timestamps.
    - Sort the action log by action id, with the tuples of each action ordered chronologically.
  - Desired format: Each line contains “user_id action_id timestamp”.
- Prepare the file “actions_ids.txt”
- Desired format: Each line contains an action id
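The id file can be derived from the prepared action log. A sketch, assuming the log's second column is the action id, as in the format above (`write_action_ids` is my own helper):

```python
def write_action_ids(log_path="actions_log.txt", out_path="actions_ids.txt"):
    """Write each distinct action id from the log, one per line."""
    seen, order = set(), []
    with open(log_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2 and parts[1] not in seen:
                seen.add(parts[1])
                order.append(parts[1])
    with open(out_path, "w") as f:
        f.write("\n".join(order) + "\n")
```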
- Misc.
  - Clean the file: iconv -f utf-8 -t utf-8 -c Ratings.timed.txt > ratings.timed.txt
The preprocessing step is implemented as a Python script; the full version is in my Git repo.
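The steps above can be sketched roughly as follows. Assumptions (mine, not verified against the raw file): whitespace-separated columns user_id, movie_id, rating, date, with the date formatted as “YYYY-MM-DD HH:MM:SS” — check the raw data before reusing this.

```python
#!/usr/bin/env python
import time

def preprocess(in_path="ratings.timed.txt", out_path="actions_log.txt"):
    """Turn Flixster ratings into "user_id action_id timestamp" lines."""
    rows = []
    with open(in_path) as f:
        next(f)  # remove the first (header) line
        for line in f:
            parts = line.split()
            if len(parts) < 4:
                continue
            user, movie = parts[0], parts[1]  # parts[2], the rating, is dropped
            # Convert the date (the last fields) to a Unix timestamp.
            date_str = " ".join(parts[3:])
            ts = int(time.mktime(time.strptime(date_str, "%Y-%m-%d %H:%M:%S")))
            rows.append((int(movie), ts, user))  # action id assumed numeric
    rows.sort()  # by action id, then chronologically within each action
    with open(out_path, "w") as f:
        for movie, ts, user in rows:
            f.write("%s %d %d\n" % (user, movie, ts))
```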
Converting the output file
- Run: ./InfluenceModels -c config.txt
- Then convert the output file “edgesCounts.txt” into “inf_prob.txt”, a file containing a set of directed edges with probabilities.
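The conversion itself is just the static-model division. A sketch, assuming (unverified — check the tool's readme) that each line of edgesCounts.txt reads “v u A_v2u A_v”:

```python
def convert(in_path="edgesCounts.txt", out_path="inf_prob.txt"):
    """Emit "v u p" lines with p = A_v2u / A_v under the static model."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            parts = line.split()
            if len(parts) < 4:
                continue
            v, u = parts[0], parts[1]
            a_v2u, a_v = float(parts[2]), float(parts[3])
            if a_v > 0:  # edges with no observed actions are dropped
                fout.write("%s %s %.6f\n" % (v, u, a_v2u / a_v))
```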
My config file for InfluenceModels sets up Scan 1 (“learning parameters”).
Now, we build the weighted directed graph, where the weights are the learned influence probabilities. Note that I only keep the largest weakly connected component.
This is done by another Python script in my Git repo.
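Restricting to the largest weakly connected component can be sketched with a small union-find over the undirected version of the graph (`largest_wcc` is my own helper operating on (v, u, weight) tuples, not the script from the repo):

```python
from collections import Counter

def largest_wcc(edges):
    """Keep only edges whose endpoints lie in the largest weakly connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for v, u, _ in edges:
        parent[find(v)] = find(u)  # union the two endpoints
    sizes = Counter(find(x) for x in list(parent))  # component sizes by root
    root = sizes.most_common(1)[0][0]
    return [e for e in edges if find(e[0]) == root]

edges = [("a", "b", 0.5), ("b", "c", 0.1), ("d", "e", 0.9)]
print(largest_wcc(edges))  # the a-b-c component (3 nodes) beats d-e (2 nodes)
```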
Other datasets
Here are several datasets for which both user links and user action logs are available. You will find the corresponding scripts in my GitHub repo.