Hello! I’m George (a.k.a. eigenfoo)

2022

Data Collection is Hard. You Should Try It.

5 minute read

No, seriously: collecting data is a good idea, even for selfish reasons!

2021

Streaming Data with Tornado and WebSockets

7 minute read

Using WebSockets with the Tornado web framework is a simple, robust way to handle streaming data. I walk through a minimal example and discuss why these tools are good for the job.

Joining Flatiron Health

less than 1 minute read

An exciting professional update: I’ve joined Flatiron Health as a data scientist!

`cryptics.eigenfoo.xyz` — A Dataset of Cryptic Crossword Clues

1 minute read

cryptics.eigenfoo.xyz is a dataset of cryptic crossword clues, collected from various blogs and publicly available digital archives.

2020

What I Wish Someone Had Told Me About Tensor Computation Libraries

13 minute read

In this blog post, we’ll break down what tensor computation libraries actually are, and how they differ. We’ll take a detailed look at some popular libraries, and end with an observation on the future of Theano in the context o...

`littlemcmc` — A Standalone HMC and NUTS Sampler in Python

1 minute read

Introducing littlemcmc — a lightweight and performant implementation of HMC and NUTS in Python, spun out of the PyMC project.

Floating-Point Formats and Deep Learning

10 minute read

Floating-point format is not a crucial consideration in deep learning, but it can make a significant difference. What is floating-point, why should you (a deep learning practitioner) care, and what can you do about it?

Transformers in Natural Language Processing — A Brief Survey

13 minute read

I’ve recently had to learn a lot about natural language processing — specifically, Transformer-based models. I thought I’d write up my reading and research and post it.

Adventures in Manipulating Python ASTs

6 minute read

I explored the possibility of simplifying PyMC4’s model specification API by manipulating the Python abstract syntax tree (AST) of the model code.

2019

Benchmarks for Mass Matrix Adaptation

9 minute read

I benchmarked various mass matrix adaptation methods in PyMC3. Sane defaults are easy to take for granted: it’s more nuanced than I initially expected!

Introducing `stan-vim`

less than 1 minute read

A Vim plugin for Stan, offering syntax highlighting, automatic indentation and code folding. Check it out!

Anatomy of a Probabilistic Programming Framework

13 minute read

In this blog post, we’ll break down what probabilistic programming frameworks are made up of, and how the various pieces are organized and structured. We’ll take a look at some open source frameworks as examples.

Graduated Cooper Union, Joining Point72

less than 1 minute read

Some exciting personal news: I’ve graduated from The Cooper Union and I’m joining Point72 Asset Management as a data scientist/research analyst!

Python Port of Common Statistical Tests are Linear Models

less than 1 minute read

I ported Jonas Lindeløv’s post, Common Statistical Tests are Linear Models, from R to Python. Check it out on my blog, GitHub, or Binder!

Decaying Evidence and Contextual Bandits — Bayesian Reinforcement Learning (Part 2)

8 minute read

In this blog post, we’ll take a look at two extensions to the multi-armed bandit. The first allows the bandit to model nonstationary rewards distributions, whereas the second allows the bandit to model context.

Autoregressive Models in Deep Learning — A Brief Survey

11 minute read

My current project involves working with a class of fairly niche and interesting neural networks that aren’t usually seen on a first pass through deep learning. I thought I’d write up my reading and research and post it.

Modern Computational Methods for Bayesian Inference — A Reading List

5 minute read

An annotated reading list on modern computational methods for Bayesian inference — Markov chain Monte Carlo (MCMC), variational inference (VI) and some other (more experimental) methods.

2018

Modelling Hate Speech on Reddit — A Three-Act Play (Slide Deck)

1 minute read

A talk I gave about a recent project to model hate speech on Reddit. In this blog post, I describe the thought processes behind the project, and the stumbling blocks encountered along the way.

Probabilistic and Bayesian Matrix Factorizations for Text Clustering

7 minute read

This blog post summarizes some literature on probabilistic and Bayesian matrix factorization methods, keeping an eye out for applications to one specific task in NLP: text clustering.

Multi-Armed Bandits and Conjugate Models — Bayesian Reinforcement Learning (Part 1)

8 minute read

In this blog post I hope to show that there is more to Bayesianism than just MCMC sampling and suffering, by demonstrating a Bayesian approach to a classic reinforcement learning problem: the multi-armed bandit.

Cookbook — Bayesian Modelling with PyMC3

24 minute read

This is a compilation of notes, tips, tricks and recipes for Bayesian modelling that I’ve collected from everywhere: papers, documentation, and peppering my more experienced colleagues with questions.

Understanding Hate Speech on Reddit through Text Clustering

14 minute read

A recent project on trying to model hate speech on Reddit through text clustering — from ‘nimble navigators’ to ‘swamp creatures’, ‘spezzes’ to the ‘Trumpire’.

Why Latent Dirichlet Allocation Sucks

11 minute read

Latent Dirichlet allocation is a well-known and popular model in machine learning and natural language processing, but it really sucks sometimes. Here’s why.

Fruit Loops and Learning - The LUPI Paradigm and SVM+

12 minute read

What is learning using privileged information (LUPI), how do I do it, and why should I care? A brief introduction to LUPI and SVM+.

George Ho

2017

Linear Discriminant Analysis for Starters

Portfolio Risk Analytics and Performance Attribution with Pyfolio

Hello World!