-
To build an ML strange loop
Part 2 of N. Timothy Hanson, January 2026. Abstract: Springtail was founded to investigate open-ended model induction for science, a problem that is both primal and unsolved. This document is both a summary of the work done at Springtail…
-
Sample efficiency, part 1: MLPs and Transformers
On sample efficiency – MLPs and Transformers. Timothy Hanson, August 10 2024 (revised October 23 2025). Abstract: This is an experimental examination of the sample efficiency of MLPs and transformers. We show that while MLPs can be ‘perfectly’ sample efficient in terms of interpolation, transformers suffer from over-functionalization with…
-
From Pairwise to Higher Order Tensor Operations on GPUs
Anosha Rahim & Timothy Hanson, September 30, 2025. Pairwise primitives are a primary operational pattern in deep learning. They take two inputs and fuse them into a single output. Matrix multiplication, dot product, and element-wise or Hadamard product are all examples…
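To make the pattern concrete, here is a minimal NumPy sketch (mine, not code from the post) of the three named primitives, each fusing two inputs into a single output:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))
B = rng.standard_normal((8, 4))
v = rng.standard_normal(8)
w = rng.standard_normal(8)

C = A @ B   # matrix multiplication: fuses (4, 8) and (8, 4) inputs into a (4, 4) output
s = v @ w   # dot product: fuses two length-8 vectors into a single scalar
h = v * w   # element-wise (Hadamard) product: fuses two vectors into one vector
```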
-
To make a (ML) strange loop
Introduction: A key component of science is model induction – the translation of observation into models. In an analogy (which can be formalized, to some degree – see [1]) to statistical mechanics & thermodynamics, one can think of this process as ‘pumping computational…
-
Guided Discrete Diffusion for Constraint Satisfaction Problems
Justin Jung — January 10, 2025. Introduction: AI for constraint satisfaction problems is an important field that has been researched for more than half a century. Sudoku, a puzzle in which no row, column, or block may contain the same number twice, is a popular benchmark…
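As a concrete rendering of that constraint, here is a small Python check (a sketch of mine, not code from the post) that tests whether a 9×9 grid, with 0 marking empty cells, satisfies the no-repeats rule over rows, columns, and 3×3 blocks:

```python
import numpy as np

def is_valid_sudoku(grid: np.ndarray) -> bool:
    """True if no row, column, or 3x3 block of the 9x9 grid
    contains the same number twice (zeros are empty cells)."""
    def no_repeats(cells):
        filled = cells[cells > 0]
        return len(filled) == len(np.unique(filled))

    for i in range(9):
        if not no_repeats(grid[i, :]) or not no_repeats(grid[:, i]):
            return False
    for r in range(0, 9, 3):
        for c in range(0, 9, 3):
            if not no_repeats(grid[r:r + 3, c:c + 3].ravel()):
                return False
    return True

print(is_valid_sudoku(np.zeros((9, 9), dtype=int)))  # True: an empty grid is trivially valid
```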
-
Geometric intuitions behind generalization
Timothy Hanson — January 13, 2025. Deep learning models generalize surprisingly well despite being overparameterized. Traditional measures of capacity, such as VC-dimension or Rademacher complexity, suggest that overparameterization should lead to overfitting – but it usually doesn’t. Prominent explanations for this phenomenon include that stochastic gradient…
-
Response to “Machines of Loving Grace” by Dario Amodei
“Machines of Loving Grace” is a very well-written, thoughtful, and interesting bit of prognostication on the future of AI. (Apparently, “Amodei” means “one whom God loves” – which makes the blog post title apt, even though the content has the directionality reversed: he, or we, love AI.) While long, it is not vacuously so; the…
-
Thoughts on Wolfram’s “What’s Really Going On in Machine Learning?”
Preface: At a conference this past weekend I had the luck of meeting & having a good discussion with Stephen Wolfram. This mathematician / physicist / scientist has been a personal hero for many years – indeed, after A New Kind of Science came out, as part of a college course I made a VLSI…
-
Active learning for program induction
Timothy Hanson & Justin Jung, May 10 2024. Abstract: This post goes into more detail on what is meant by active learning and how it relates to program induction. We discuss the use of a simulator for running a program (∼ compressed model), and…
-
Fast linear transforms using Butterfly Factorizations
This post discusses an older paper showing how a clever idea (butterfly factorizations) can be learned in a typical deep-learning pipeline. It concludes with some conjecture on how these fast algorithms could be applied to another slow, data-intensive algorithm: attention. Learning fast algorithms for linear transforms with butterfly factorizations. Tri Dao (author of FlashAttention), Albert…
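For readers who have not seen the structure, here is a minimal NumPy sketch (mine, not the paper’s code) of a butterfly-factored matrix-vector product: log2(n) sparse factors, each with two nonzeros per row, give an O(n log n) transform in place of a dense O(n^2) one. In the paper, the 2×2 blocks are the free parameters learned by gradient descent.

```python
import numpy as np

def random_butterfly_factors(n, rng):
    # One factor per level; each factor is n//2 independent 2x2 blocks,
    # so each factor has exactly two nonzeros per row.
    levels = int(np.log2(n))
    return [rng.standard_normal((n // 2, 2, 2)) for _ in range(levels)]

def apply_butterfly(factors, x):
    # Apply the product of log2(n) sparse factors to x: O(n log n) total work.
    n = len(x)
    y = x.copy()
    for level, blocks in enumerate(factors):
        s = 2 ** level              # pairing stride at this level
        out = np.empty_like(y)
        b = 0
        for start in range(0, n, 2 * s):
            for i in range(start, start + s):
                j = i + s           # partner index, as in an FFT butterfly
                m = blocks[b]
                out[i] = m[0, 0] * y[i] + m[0, 1] * y[j]
                out[j] = m[1, 0] * y[i] + m[1, 1] * y[j]
                b += 1
        y = out
    return y

rng = np.random.default_rng(0)
n = 8
factors = random_butterfly_factors(n, rng)
x = rng.standard_normal(n)
print(apply_butterfly(factors, x))
```

The pairing pattern is exactly the FFT’s data flow; what makes the factorization learnable is that the 2×2 blocks are free parameters rather than fixed twiddle factors.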