February 3, 2015
There is an interesting new article out, which looks at some of the commonly used algorithms in data mining, and finds that they are generally not very accurate, or even reproducible.
Specifically, the study by Lancichinetti et al. (2015) looks at automated topic classification using the commonly used latent Dirichlet allocation algorithm (LDA), a machine learning process which uses a probabilistic approach to categorise and filter large
November 27, 2013
There are lies, damn lies, and statistics
It’s easy to knock statistics for being misleading, or even misused to support spurious findings. In fact, there seems to be a growing backlash at the automatic way that significance tests in scientific papers are assumed to be the basis for proving findings (an article neatly rebutted here in the aptly named post “Give p a chance!”). However, I think most of the time statistics are