“Machines of Loving Grace” is a very well-written, thoughtful, and interesting bit of prognostication on the future of AI. [1: Apparently, “Amodei” means “one whom god loves” – which makes the blog post title apt, even though the content has the directionality reversed: he (or we) love AI.] While long, it is not vacuously so; the details are informative. I agree with most of the predictions, but take issue with three points:
- It lacks a mechanistic model of what intelligence is and of what is required to apply it to science; I argue you need greater model-inference sample efficiency and reentrance – mechanistic consciousness.
- In biology and technology, the essay draws untempered extrapolations of AI’s impacts, whereas its civil and social extrapolations are cautious.
- The role of objective + learning algorithm is overstated or misplaced in the context of neuroscience.
1 Mechanistic Intelligence
Mechanistic intelligence can be taken as the operationalization of past experience (data) for the selection and generation of behavior (policy) toward a particular goal (objective). Operationalization is equivalent to functionalization [2: Neither of these terms means exactly what I want, but bear with me. (“Functionalization”, e.g., means something very particular in chemistry.)] – converting the data into computable functions that can then be used to predict the future and, most importantly, to reduce the search space over solutions. MI is inherently bidirectional and recursive – policies may select sub-goals, goals may be to generate more data, or goals may be to invent new, efficient ways of searching (which deep learning itself is).
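To make “operationalization” slightly more concrete, here is a minimal sketch (Python, with a toy domain and names invented purely for illustration – not anyone’s actual system): past data is compressed into a cheap predictive function, which is then used to prune an expensive search over candidate behaviors.

```python
# Toy sketch of "operationalization": turn past (action, outcome) data into a
# cheap predictive function, then use it to prune an expensive search over
# candidate actions. The domain and all names here are invented for illustration.
import random

random.seed(0)

def run_experiment(action: float) -> float:
    """The 'real world': expensive to query, unknown to the agent."""
    return -(action - 7.3) ** 2 + random.gauss(0, 0.1)

# 1. Past experience (data): a handful of (action, outcome) pairs.
data = [(a, run_experiment(a)) for a in (0.0, 2.0, 5.0, 9.0, 12.0)]

# 2. Operationalization: a cheap surrogate model built from the data
#    (here, trivial nearest-neighbour lookup).
def predict(action: float) -> float:
    nearest = min(data, key=lambda p: abs(p[0] - action))
    return nearest[1]

# 3. Use the surrogate to shrink the search space: keep only the most
#    promising ~10% of candidate actions instead of trying them all.
candidates = [i * 0.1 for i in range(150)]
shortlist = sorted(candidates, key=predict, reverse=True)[: len(candidates) // 10]

# 4. Spend the expensive real-world budget only on the shortlist.
best = max(shortlist, key=run_experiment)
print(f"best action found: {best:.1f}")  # lands near the true optimum of 7.3
```

The interesting cases, of course, are the recursive ones – where the surrogate is itself used to decide what data to gather next, as discussed below.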
Current LLMs have operationalized much of the public data on the internet, and have done so in a compositional manner, such that they do generalize out-of-domain – they offer processing gain, as discussed in the response to Stephen Wolfram. There is little doubt that this compression + functionalization will increase in efficiency (through search by the large companies), which will improve the fidelity, utility, and reasoning ability of their products. Likely this will require a step toward bidirectional inference; that is as viable a path as what Springtail is attempting, but more on that another time.
For the open-ended scientific problems discussed in Dario’s essay, existing data is insufficient; straightforward use of existing LLMs is also insufficient, as the models they build (in their latent spaces, attentional matrices, and token streams) are a subset of those required to model biological data and hence exploit it for prediction. [3: Andrew K Richardson believes that, given sufficient context and chain-of-thought prompting, LLMs will be able to emulate the bidirectional inference used for bespoke model building, as traces of such occur in the training data. I think it is far more interesting and far less limiting to attack the problem directly.] Hence AlphaFold and AlphaProteo – models explicitly domain-engineered by humans. But even these required billions of dollars and man-centuries of investment (PDB, UniProt), since they are not really data-efficient. The meta-problem of building such models through AI agents is no easier, as it also requires exemplary data efficiency: there are only a handful of examples of development trajectories.
‘Trajectories’ because these open-ended problems require many rounds of iteration:
- experiments to learn objectives (what is working, what is not),
- application of those directions to gather data,
- building models to predict the data,
- selection of experiments to invalidate and improve the models,
- examination of the data not just to predict but to infer causality, which then can be used to drive interventions,
- (and, particularly in computer science and conscious thought,) re-examination of computational traces to extract and amortize search – ad infinitum.
This in-the-wild scientific method – free-form, reentrant data compression – is data-efficient to the point that it’s capable of, well, science. The same loop is inherent to many other instances of problem solving, including software development and debugging.
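As a minimal sketch of that loop (Python; the “world” and model class are toy stand-ins invented for illustration, not a real artificial-scientist system): fit a model to the observations so far, pick the experiment the model is most likely to be wrong about, run it, check the prediction, and fold the result back in.

```python
# A minimal sketch of the experiment -> model -> experiment loop described above.
# The "world" and the model class are toy stand-ins invented for illustration.

def world(x: float) -> float:
    """Ground truth the artificial scientist is trying to model (unknown to it)."""
    return 1.5 * x - 0.1 * x * x

def fit_line(pts):
    """Least-squares line y = m*x + b over the observations so far (the 'model')."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return m, (sy - m * sx) / n

observations = [(x, world(x)) for x in (0.0, 1.0)]  # seed experiments
candidates = [i * 0.5 for i in range(21)]           # experiments we could run next

for round_ in range(5):
    m, b = fit_line(observations)                          # build a model of the data
    # Select the experiment most likely to break the model: here, naively,
    # the candidate farthest from anything already observed.
    x_next = max(candidates,
                 key=lambda c: min(abs(c - x) for x, _ in observations))
    y_next = world(x_next)                                 # run the experiment
    err = abs((m * x_next + b) - y_next)                   # try to invalidate the model
    print(f"round {round_}: tested x={x_next:.1f}, model error {err:.2f}")
    observations.append((x_next, y_next))                  # fold the data back in
```

The selection rule and model class here are deliberately dumb; the point is only the shape of the loop – model, experiment, invalidation, more data – not any particular instantiation.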
1.1 Pumping
For many years I’ve been using the useful but as-yet approximate analogy of ‘computational entropy pumping’. [4: Hat-tip, Daniel Dennett. The analogy occurred while visiting a friend in Phoenix – very hot! Much regular thermodynamic entropy had to be pumped to stay frosty.] The scientific method is such a pump: it moves information from one place (the real world) to another (a model) while compressing and cooling it (reduced Kolmogorov complexity / MDL). Backpropagation and evolution are other computational entropy pumps; the absolute dumbest pump, and the one that started it all, is search or enumeration. The scientific method is unique in that it pumps information back into itself to continually improve the speed and quality of the pumping mechanism; without such internal loops, search seems to succumb to combinatorial explosion. This requires reentrancy, or Hofstadter-style self-reference: the loop that closes all other loops.
Therefore, to get the bounty-of-gods scientific results that Dario envisions, you will need the machines to be conscious – at least mechanistically. [5: To add more flavor to the pumping analogy: if thermodynamic entropy is first-order, i.e. concerned with the attributes of individual particles (temperature, momentum, chemical state (enthalpy)), computational entropy measures the relations between individual particles – and the relations between relations, or relations which create new relations, etc. To this end, George et al. at Symbolica are working on hypergraph-rewriting systems.]
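One conventional way to quantify the “compressing and cooling” above is a two-part description length, L(model) + L(data | model): a good pump moves bits out of the raw data and into reusable structure. The sketch below is only illustrative – the encoder, model class, and numbers are invented stand-ins, not a rigorous MDL code.

```python
# Scoring a pump's output with a two-part code: L(model) + L(data | model), in bits.
# A smaller total means more entropy has been pumped out of the raw data and into
# reusable structure. Encoder, model class, and all numbers are invented stand-ins.
import math

def bits_to_encode(value: float, precision: float = 0.01) -> float:
    """Crude cost, in bits, to write down one number to a fixed precision."""
    return math.log2(2 * abs(value) / precision + 2)

# Observations that secretly follow y = 3x + 1 plus a tiny wobble.
data = [(x, 3.0 * x + 1.0 + 0.05 * ((-1) ** x)) for x in range(20)]

# Option A: no model at all -- encode every y value verbatim.
raw_bits = sum(bits_to_encode(y) for _, y in data)

# Option B: the model "y ~ 3x + 1" -- pay ~64 bits for two parameters,
# then encode only the (small) residuals the model fails to explain.
model_bits = 2 * 32
residual_bits = sum(bits_to_encode(y - (3.0 * x + 1.0)) for x, y in data)

print(f"raw encoding:      {raw_bits:6.1f} bits")
print(f"model + residuals: {model_bits + residual_bits:6.1f} bits")  # smaller total
```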
2 Societal Implications
I’ll preface this by saying that I think Dario’s extrapolations are far better calibrated (and more likely) than the potentially inflammatory things I’ll say here. Still, I think the asymptotic behavior ought to be examined.
If much intellectual knowledge is operationalized, i.e. made actionable,
and
if all new data can be operationalized through the application of cheap computational-entropy pumps / artificial scientists,
then our economic utility as computational entropy pumps falls to the market rate, i.e. the cost of hardware + electricity.
Unless people keep their information off the cloud and out of big data centers (as if that will happen!), this also entails a centralization of power. Information was always power, but now central actors are even more empowered to act upon it – not just the despots mentioned in the essay, but also the tech firms. [6: Which, so far, has not been so bad?]
Fortunately, we still have a trove of personal, non-centralizable knowledge in our heads that ought to act as ballast against this tendency, and indeed history is rife with episodes of centralization that were eventually corrected. Like Dario, I am hopeful that the arc of history bends toward justice and equity. TBD.
Regarding white-collar / blue-collar work, common sense and the internet generally agree: Anthropic et al. are automating themselves out of jobs (modulo the above), while blue-collar manual jobs will persist. To push this, too, to an uncomfortable asymptote: we’ll mostly be like Uber drivers are now – meat-puppets centrally controlled by a mechanistic intelligence.
Dario references the socialist calculation debate when suggesting that AI won’t be used for governance. If it’s much better at aggregating information and making decisions (‘policy’ again), then why not? [7: Using even ‘mechanistically’ conscious systems here is admittedly quite a complicating factor.] It seems like the next step of our evolution: it took a billion years to develop a nervous system for smart local behavior generation; now we’ll add a layer on top of it, and thus individually become more like the cells in our body.
3 Neuroscience
Finally, a smaller bone to pick with his neuroscience commentary: the machine-learning delineation of objective function, learning algorithm, and memory substrate does not seem to apply directly to the brain. Many brilliant minds have looked for backprop or clean objective functions there, with sparse supporting evidence for either. Something far more interesting and intricate is going on, which of course may be germane to the discussion above. Deep learning is one remarkably general entropy pump, but it is certainly neither the only one nor the most powerful.
That said, certainly the ‘blessing of dimensionality’ and scaling results are deeply meaningful and essential to understanding the emergent behavior of the brain. Yet it’s unclear whether these results alone are sufficient to create a data-efficient learning substrate for the bootstrapped active learning of an artificial scientist; something more must be going on with navigability and discrete search in model space.
Again, alas, TBD.
Tim Hanson
October 15 2024
Appendix
Roughly, we are attempting an end-run around the data requirements of deep learning: train networks to solve small problems by amortizing backtracking search, so that they internalize general rules of inference which can be applied to successively larger problems. Hence, while the immediate substrate of learning (the weights) changes somewhat slowly (gradient descent), the activations and intermediate variables are updated quickly through learned functions.
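A cartoon of that idea, with the strong caveat that this is not Springtail’s actual method or code – the domain (N-queens), the features (column offsets), and every name below are invented for illustration: run plain backtracking on small instances, record which branch choices lay on successful paths, and reuse those statistics to order branches on a larger instance.

```python
# A cartoon of amortized backtracking search (not Springtail's actual code; the
# N-queens domain, the offset features, and every name here are invented).
# Solve small instances with plain backtracking, tally which branch choices lay
# on successful paths, then reuse that tally as a learned branch ordering.
from collections import Counter

def solve(n, order, stats=None):
    """First-solution N-queens by backtracking; returns (solution, nodes expanded)."""
    nodes, cols = 0, []
    def ok(c):
        r = len(cols)
        return all(c != pc and abs(c - pc) != r - pr for pr, pc in enumerate(cols))
    def rec():
        nonlocal nodes
        if len(cols) == n:
            return True
        prev = cols[-1] if cols else None
        for c in order(n, cols):
            nodes += 1
            if ok(c):
                cols.append(c)
                if rec():
                    if stats is not None and prev is not None:
                        stats[c - prev] += 1   # remember offsets on the successful path
                    return True
                cols.pop()
        return False
    rec()
    return list(cols), nodes

naive = lambda n, cols: range(n)  # fixed left-to-right branch ordering

# 1. "Slow" phase: solve small instances, recording which column offsets worked.
offsets = Counter()
for n in range(4, 9):
    solve(n, naive, stats=offsets)

# 2. "Fast" phase: a learned ordering that tries historically successful offsets first.
def learned(n, cols):
    if not cols:
        return range(n)
    return sorted(range(n), key=lambda c: -offsets[c - cols[-1]])

# 3. Apply both orderings to a larger instance and compare the search effort.
_, nodes_naive = solve(20, naive)
_, nodes_learned = solve(20, learned)
print(f"nodes expanded on 20-queens: naive={nodes_naive}, learned={nodes_learned}")
```

Here the “learned function” is just a frequency table rather than a network, but the shape is the same: slow accumulation of search statistics, fast reuse of them to steer inference on bigger problems.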
We aim to prove that the pump works starting from zero, and so handicap ourselves by not leveraging the inference policies implicit in human data – i.e. LLMs – but in exchange we get to use much smaller models that are orders of magnitude faster and cheaper to iterate on.
Adding reentrancy is a separate and much harder problem.