Yishi Lin

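Causal Inference Study Notes (1): Trying a Classic Method, Propensity Score Matching (Lalonde's Dataset)

Posted on 2018-11-15 Edited on 2018-12-23 In Causal Inference, Study Notes

This post commemorates a lovely evening: off work at 8 pm, a cup of guilty-pleasure milk tea in hand, studying until the middle of the night. (Very sleepy ramblings at 2:30 am, 2018/11/15.)

I started learning causal inference in order to estimate causal effects. More precisely, assuming the confounding variables are known, the goal is to estimate the effect of a treatment on an outcome.

The purpose of this document is to collect and try out a range of classic, basic causal inference methods and get a feel for how they differ. The dataset is Lalonde, which nearly every introductory tutorial seems to use.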
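As a rough illustration of what propensity score matching does, the sketch below pairs each treated unit with the control unit whose propensity score is closest (matching with replacement) and averages the outcome differences to estimate the effect on the treated. The function name, the (score, outcome) tuples, and the numbers are all made up for illustration. They are not the Lalonde data, and a real analysis would first have to estimate the scores, for example with logistic regression.

```python
def att_nearest_neighbor(treated, control):
    """Estimate the average treatment effect on the treated (ATT).

    treated, control: lists of (propensity_score, outcome) tuples.
    Each treated unit is matched, with replacement, to the control unit
    whose propensity score is closest.
    """
    if not treated or not control:
        raise ValueError("need at least one treated and one control unit")
    diffs = []
    for score_t, outcome_t in treated:
        # Pick the control unit with the smallest score distance.
        _, outcome_c = min(control, key=lambda c: abs(c[0] - score_t))
        diffs.append(outcome_t - outcome_c)
    return sum(diffs) / len(diffs)


# Hypothetical (propensity score, outcome) pairs, not real data.
treated_units = [(0.80, 10.0), (0.60, 7.0)]
control_units = [(0.75, 8.0), (0.50, 6.0), (0.20, 3.0)]
att = att_nearest_neighbor(treated_units, control_units)
print(f"ATT estimate: {att:.2f}")
```

Matching with replacement keeps the sketch short; it lets one control unit serve several treated units, which lowers bias but raises variance compared with one-to-one matching.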

Still being updated. Keep going!

Read more »

Causal Inference Paper Reading (1): Comparison of Approaches to Ad Measurement (Facebook's paper)

Posted on 2018-11-04 In Causal Inference, Paper Reading

Paper Information

Gordon, B. R., Zettelmeyer, F., Bhargava, N., & Chapsky, D. (2017). A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook. SSRN. https://doi.org/10.2139/ssrn.3033144

  • Link: https://www.kellogg.northwestern.edu/faculty/gordon_b/files/fb_comparison.pdf
  • Slides: https://www.ftc.gov/system/files/documents/public_events/945353/zettelmeyer_fb_fcc_11-3-2016_fz_slides_0.pdf

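Summary

When I read this paper I had not yet read the Imbens & Rubin causal inference textbook, yet I could still get through it smoothly, which shows how well the authors write and how clearly they explain things.

Some takeaways:

  1. Section 4, Observational Approaches, gives concrete definitions and the pros and cons of many classic observational methods; it can serve as a reference manual.
  2. Some tricks
    • User-activity variables were all converted to deciles. The paper does not explain why. My guess is that activity metrics tend to follow long-tailed distributions, and without this step the propensity score estimation can run into problems, much like the feature preprocessing needed before fitting a logistic regression in machine learning.
    • Some variables, such as age, were one-hot encoded.
    • LASSO was used to drop some variables (the approach in the Imbens & Rubin textbook).
    • Samples with a propensity score < 0.05 or > 0.95 were dropped (the approach in the Imbens & Rubin textbook).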
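Two of the tricks above, decile bucketing and propensity-score trimming, can be sketched in a few lines. This is only an illustration under assumptions: the function names, the `propensity` field, and the activity counts are hypothetical, and the paper's actual preprocessing pipeline is not shown in the excerpt.

```python
from bisect import bisect_right


def decile_cutpoints(values):
    """Return 9 cut points that split the sorted values into 10 roughly equal bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * k) // 10] for k in range(1, 10)]


def to_decile(value, cutpoints):
    """Map a raw value to a decile index in 1..10."""
    return bisect_right(cutpoints, value) + 1


def trim_by_propensity(rows, low=0.05, high=0.95):
    """Keep only rows whose estimated propensity score lies in [low, high]."""
    return [r for r in rows if low <= r["propensity"] <= high]


# Hypothetical long-tailed activity counts.
activity = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34,
            55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
cuts = decile_cutpoints(activity)
deciles = [to_decile(a, cuts) for a in activity]

# Hypothetical rows with already-estimated propensity scores.
rows = [
    {"user": "a", "propensity": 0.01},
    {"user": "b", "propensity": 0.50},
    {"user": "c", "propensity": 0.99},
]
kept = trim_by_propensity(rows)
print(deciles)
print([r["user"] for r in kept])
```

Bucketing replaces a value such as 4181 with the bin index 10, so a handful of extremely active users can no longer dominate the fitted propensity model.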

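Some reflections:

  1. Causal inference is not an all-purpose fortune-telling machine; its ability to infer causality, or in other words "experimental results", is quite limited.
  2. In causal inference, I feel that finding covariates that satisfy unconfoundedness is harder than devising sophisticated inference algorithms.

Excerpts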

Summary of Contributions

  1. Shed light on whether - as is thought in the industry - observational methods using good individual-level data are “good enough” for ad measurement, or whether even good data prove inadequate to yield reliable estimates of advertising effects. Our results support the latter.
    • Generally, the observational methods overestimate ad effectiveness relative to the RCT, although in some cases, they significantly underestimate effectiveness.
    • These biases persist even after conditioning on a rich set of observables and using a variety of flexible estimation methods.
  2. Characterize the nature of the unobservable needed to use observational methods successfully to estimate ad effectiveness.
    • Obtaining such data is likely not trivial.
  3. The third contribution of our paper is to the literature on observational versus experimental approaches to causal measurement.
    • We analyzed whether the improvements in observational methods for causal inference are sufficient for replicating experimentally generated results in a large industry where such methods are commonly used. We found they do not—at least not with the data at our disposal.
Read more »

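Causal Inference Study Notes: Collected Resources

Posted on 2018-11-04 Edited on 2019-05-23 In Causal Inference, Study Notes

A few causal inference tutorials pulled me down this rabbit hole, and I find the topic quite interesting. This note collects the resources I have found; it is still being updated.

Introductory Articles

  1. 统计之都 (Capital of Statistics) hosts a very well-written series on causal inference
    • Introduction to Causal Inference, Part 1: Starting from Yule-Simpson’s Paradox
    • Introduction to Causal Inference, Part 2: The Rubin Causal Model (RCM) and Randomized Experiments
    • Introduction to Causal Inference, Part 3: The Disagreement between R. A. Fisher and J. Neyman
    • Introduction to Causal Inference, Part 4: Observational Studies, Ignorability, and the Propensity Score
    • Introduction to Causal Inference, Part 5: Causal Diagrams
    • Introduction to Causal Inference, Part 6: Instrumental Variables
    • Introduction to Causal Inference, Part 7: Lord’s Paradox
    • Introduction to Causal Inference, Part 8: Does Smoking Cause Lung Cancer? Fisher versus Cornfield

Textbooks

  1. Causality: Models, Reasoning and Inference. Judea Pearl. Leans more toward causal diagrams.
  2. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Guido W. Imbens and Donald B. Rubin. Fairly practical, in my experience.
  3. Mostly Harmless Econometrics: An Empiricist’s Companion. Angrist, J. D., & Pischke, J. S. Practical. The authors clearly adore The Hitchhiker's Guide to the Galaxy.
  4. Causal Inference: The Mixtape. Scott Cunningham. A PDF can be downloaded from the professor's homepage. Self-contained and comprehensive.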
Read more »

Hexo Deployment with GitLab CD

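Posted on 2018-11-04 In Tinkering

Ultimate Goal

Goal: after I edit the Hexo source files and push to GitLab, the site is built and deployed to the VPS automatically. With automatic deployment I can edit the blog from anywhere git is available, without installing a pile of tooling on every machine!

Ingredients

  1. One Hexo site, with its source in a private GitLab repo
  2. One VPS, to which I previously rsynced the generated site files by hand

Ramblings

To be honest, I will probably still update the blog only from my Mac, and nine times out of ten I will still build and preview locally. Anyway, tinkering with this at least lets me try out GitLab CI/CD!

Configuring Hexo Deploy

First, configure hexo deploy by editing _config.yml as described in the Hexo Deploy - Rsync documentation. Because I had been syncing files with rsync all along, this step is simple and needs no extra installation or configuration.

After changing the configuration, run hexo d once to confirm that it works.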

Using SSH keys with GitLab CI/CD

Related Documentation

  1. GitLab Continuous Integration (GitLab CI/CD)
  2. Using SSH keys with GitLab CI/CD
  3. node - Docker

Steps

  1. Generate a new SSH key pair (an existing key also works) and put the public key on the server.
  2. Follow document 2 to complete the step "Add the private key as a variable to your project".
  3. Prepare .gitlab-ci.yml.

Pitfalls

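  1. After setting the SSH_PRIVATE_KEY variable in the project settings, CI kept failing with the error below. After much fiddling, it turned out that I had carelessly ticked "protected" when creating the variable, which makes the variable visible only to protected branches and tags.

    $ echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add - > /dev/null
    Enter passphrase for (stdin): ERROR: Job failed: exit code 1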
  2. hexo d failed during rsync with the error below. It turned out that rsync has to be installed in the Docker image.

    $ hexo d
    INFO Deploying: rsync
    FATAL Something's wrong. Maybe you can find the solution here: http://hexo.io/docs/troubleshooting.html
    Error: spawn rsync ENOENT
    at Process.ChildProcess._handle.onexit (internal/child_process.js:238:19)
    at onErrorNT (internal/child_process.js:413:16)
    at process.internalTickCallback (internal/process/next_tick.js:72:19)
    ERROR: Job failed: exit code 1

Click "Read more" to see the complete .gitlab-ci.yml.

Read more »

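Upgrading Hexo and NexT

Posted on 2018-11-04 In Tinkering

I want to start writing the blog again, and as someone who is compulsive about upgrading things, I am taking this chance to upgrade Hexo and the theme.

The upgrade mostly followed the article "升级Hexo及NexT主题笔记" (Notes on Upgrading Hexo and the NexT Theme). These are rambling notes on the process, to make the next upgrade easier (if there ever is a next time, lol).

Switching the npm Registry

I had been using my university's mirror, but today it stopped working for some reason, so I tried the fabled "Taobao mirror". I installed cnpm, which can then be used directly in place of npm.

npm install -g cnpm --registry=https://registry.npm.taobao.org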

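Backing Up the Old Directory

Even though everything is in git, I still set the old directory aside so that it is easy to move configuration files back and forth later.

mv dango.rocks dango.rocks.old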

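Installing Hexo

npm install hexo-cli -g
hexo init dango.rocks
cd dango.rocks
npm install

After hexo init I watched a show for a while, and quite some time later it was still stuck at npm install. I could not tell whether it was still installing, so I killed it and ran cnpm install instead, which finished in seconds.

Start the Hexo server (hexo s --debug), then open http://localhost:4000 in a browser to see the freshly initialized blog.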

Read more »

iOS App Extension for Password Manager

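Posted on 2017-06-08 In Tinkering - iOS

Recently some friends and I have been writing a password manager app called Pass - Passwordstore. To celebrate its launch on the App Store, I plan to tackle two requests in the issue tracker about iOS integration and a Safari extension (Safari Extension, Integrate it better into iOS).

Getting Started

First of all, I did not even know that 1Password and LastPass could fill in passwords in Safari. Fortunately, the person who filed the issue gave 1Password as an example. Here are the official (nicely illustrated) 1Password and LastPass documents on this feature.

  • 1Password: Use the 1Password extension to fill in Safari and apps on your iPhone, iPad, and iPod touch
  • LastPass: LastPass mobile - iOS - Autofilling with LastPass

In short, usage takes two steps:

  • Set up: enable 1Password/LastPass in the share sheet of Safari/Chrome
  • Use: on a login page in Safari, Chrome, or a supported app, tap the Share icon, choose 1Password/LastPass, and pick one of the listed passwords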
Read more »

Scanning QR codes to import keys

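Posted on 2017-06-06 In Tinkering

A note recording the problems I ran into while implementing QR-code import of PGP keys and SSH keys in the app.

The first problem is that private keys can be long and may not fit in a single QR code, so they have to be split across several QR codes. I thought of two solutions.

  • The user holds the phone and scans the QR codes manually, one by one; the scan results are concatenated into a complete key.
  • Support scanning a gif that plays the QR codes in order on a loop. The advantage is that scanning is smoother: just point the phone at the gif and wait while the frames are scanned automatically one after another (beep beep beep beep beep beep, done).

In the end I chose the second approach because it is more pleasant to use. Moreover, assuming the first and last QR codes can be recognized automatically, implementing the second approach also quietly supports the first, since manually tapping "next" to switch QR codes is equivalent to playing a non-looping gif. This was also a chance to learn the PGP and SSH key formats (fun but useless knowledge?).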
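The looping-gif idea can be sketched as a small state machine. This is not the app's implementation (which is not shown here); it is a Python illustration under stated assumptions: the key is ASCII-armored text, the first segment is recognized by a leading "-----BEGIN ", the last by a trailing "-----END ...-----" line, frames arrive in order, and the camera may decode the same frame several times in a row. The class name and the fake key are hypothetical.

```python
import re

ARMOR_HEAD = "-----BEGIN "
ARMOR_TAIL = re.compile(r"-----END [A-Z ]+-----\s*$")


class KeyScanAssembler:
    """Collect decoded QR payloads until a complete armored key is seen."""

    def __init__(self):
        self.segments = []
        self.complete = False

    def feed(self, payload):
        """Feed one decoded QR payload; return True once the key is complete."""
        if self.complete:
            return True
        if not self.segments:
            # The gif loops, so scanning may start mid-key: wait for frame one.
            if not payload.startswith(ARMOR_HEAD):
                return False
        elif payload == self.segments[-1]:
            # The same frame was decoded again before the gif advanced.
            return False
        self.segments.append(payload)
        self.complete = bool(ARMOR_TAIL.search(self.key()))
        return self.complete

    def key(self):
        return "".join(self.segments)


# Hypothetical key body, split into 20-character "QR frames".
fake_key = (
    "-----BEGIN PGP PUBLIC KEY BLOCK-----\n\n"
    "mQENBFakeFakeFake\n=AbCd\n"
    "-----END PGP PUBLIC KEY BLOCK-----\n"
)
frames = [fake_key[i:i + 20] for i in range(0, len(fake_key), 20)]

# Scanning starts at frame 3, the gif loops, and every frame is decoded twice.
stream = frames[2:] + [f for frame in frames for f in (frame, frame)]
assembler = KeyScanAssembler()
for payload in stream:
    if assembler.feed(payload):
        break
print(assembler.key() == fake_key)
```

Feeding frames by hand in order goes through the same `feed` calls, which is why the manual "next" flow comes for free. The sketch does not handle a skipped frame; a real scanner would need sequence numbers or a checksum for that.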

PGP Keys

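Only QR codes generated from ASCII-armored keys are considered here. According to the "Forming ASCII Armor" section of OpenPGP Message Format (RFC 4880), keys produced by OpenPGP implementations should all look like this.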
Public keys:

Read more »

Learning Influence Probabilities In Social Networks

Posted on 2017-02-08 In Network Science

In my research on influence maximization, I need datasets with learned influence probabilities. This post covers three things.

  • Tools that I have been using.
  • Related datasets.
  • How I obtain the influence probabilities on edges.

GitHub repository: yishilin14/learn-influence-prob (code and datasets)

Background

Goal: Learn influence probabilities in social networks.
Paper: Goyal, A., Bonchi, F., & Lakshmanan, L. V. (2010, February). Learning influence probabilities in social networks. In _Proceedings of the third ACM international conference on Web search and data mining_ (pp. 241-250). ACM.

Source code:

  • http://www.cs.ubc.ca/~goyal/code-release.php
  • Download the code for this paper: Amit Goyal, Francesco Bonchi, Laks V.S. Lakshmanan, _A Data-based approach to Social Influence Maximization_, To appear in PVLDB 2012.
  • For more details about the software, please refer to the readme file inside. I will only focus on how to use the tool to learn influence probabilities on edges.

Compilation:

  • Run make (I am using Arch Linux and GCC 5.3.0)
  • Solution to the error “‘getpid’ was not declared in this scope”: add #include <unistd.h>

Bernoulli distribution under the static model:

  • Static model: independent of time and simple to learn
  • Bernoulli distribution under the static model: $p_{v,u}= A_{v2u} / A_{v}$

    • $p_{v,u}$: learned influence probability of v on u
    • $A_{v2u}$: the number of actions propagated from v to u
    • $A_{v}$: the number of actions v performed
    • The first scan of the tool outputs $A_{v2u}$ and $A_{v}$.
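The formula above is a simple ratio of two counts. The sketch below is not the released tool (which is a separate program); it is a Python illustration that assumes the two counts from the first scan have already been loaded into dictionaries. The function name, the dictionary layout, and the numbers are hypothetical.

```python
def learn_static_bernoulli(actions_by_user, propagated):
    """Compute p_{v,u} = A_{v2u} / A_v for every edge with a propagation count.

    actions_by_user: {v: A_v}, the number of actions v performed.
    propagated: {(v, u): A_v2u}, the number of actions propagated from v to u.
    """
    probabilities = {}
    for (v, u), a_v2u in propagated.items():
        a_v = actions_by_user.get(v, 0)
        if a_v > 0:  # skip users with no recorded actions to avoid dividing by zero
            probabilities[(v, u)] = a_v2u / a_v
    return probabilities


# Hypothetical counts: alice performed 10 actions, 3 of which propagated to bob.
actions_by_user = {"alice": 10, "bob": 4}
propagated = {("alice", "bob"): 3, ("bob", "alice"): 1}
influence = learn_static_bernoulli(actions_by_user, propagated)
for edge, p in sorted(influence.items()):
    print(edge, p)
```

Edges whose source user has no recorded actions are left out rather than assigned a probability, since the ratio is undefined for them.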

Now let us learn influence probabilities for some public datasets (with quick-and-dirty code).

Read more »

My paper templates and tricks

Posted on 2017-02-08 In Tinkering

This post keeps what I have Googled and what my friends have shared with me.

My latest paper templates are here: https://github.com/yishilin14/paper-template

Maintaining a paper and its technical report (full version)

In most cases, in order to “tweak” a paper to fit within the page limit, I have to move some parts of the paper to a technical report (the full version of the paper). It is hard to maintain and proofread two versions of the same paper in two separate sets of files.

I define some commands so that I don’t have to maintain two sets of LaTeX files. The “\ignorespaces” command prevents LaTeX from inserting spurious spaces after these commands.

Read more »

Facebook blueprint course notes

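Posted on 2017-01-10 In Casual Browsing

Glossaries of ad terms provided by Facebook

  • Glossary of ad terms
  • General glossary of terms
  • Facebook for Business

I picked out the parts of Facebook Blueprint that interested me most. I read the English version to pick up new terminology. Quotes without a stated source are from Facebook Blueprint.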

Introduction to Facebook Apps and Services

Audience Network: Extend the Reach of Your Facebook Campaigns

Audience Network helps advertisers extend the reach of their Facebook ad campaigns by delivering them in third-party mobile apps and mobile websites.

How It Works:
When you create an ad on Facebook for News Feed, your ads will also be delivered in Audience Network. Eligible ads are converted into different formats including native, banner, interstitial, or video ads on a growing number of Facebook-approved publishers.

Read more »
© 2013 – 2021 Yishi Lin