AlphaEvolve + Lean = theorem prover
I don't mean to start a new project with every post, but here we are
(I still owe you some updates about language growing, or, as I’ve been calling it in my head recently, “Bayesian linguistics”, but I think I need to mess around with it a bit more before any update is interesting enough.)
Google DeepMind recently announced AlphaEvolve, which is essentially an LLM-driven evolutionary algorithm for improving arbitrary algorithms. One of their examples is improving the multiplication of 4x4 complex-valued matrices, but the setup is general enough for a multitude of problems:
· The user marks up the provided code into parts that AlphaEvolve may and may not change, and supplies an evaluator function that scores how “good” the current code is according to some metric (for example, how fast it runs);
· AlphaEvolve then generates many candidate solutions and evolves them according to their evaluation scores, balancing exploration against exploitation (a rough sketch of what this setup looks like follows below).
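For concreteness, here is roughly what I understand that setup to look like; the marker comments and the `evaluate` signature are my paraphrase of the paper's examples for illustration, not a verified copy of the actual interface.

```python
import time

# EVOLVE-BLOCK-START  (marker syntax paraphrased, not the exact API)
def multiply(a, b):
    # Naive matrix multiplication; everything between the markers is fair
    # game for the evolutionary search to rewrite.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]
# EVOLVE-BLOCK-END


def evaluate() -> dict[str, float]:
    """User-supplied scoring function; higher is better."""
    a, b = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
    expected = [[19, 22], [43, 50]]
    start = time.perf_counter()
    ok = multiply(a, b) == expected
    elapsed = time.perf_counter() - start
    return {"correctness": 1.0 if ok else 0.0, "speed": 1.0 / (elapsed + 1e-9)}
```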
Having a separate AlphaEvolve setup for improving each specific algorithm strikes me as a curious failure of imagination. I wonder whether it could be used to simulate something more general, namely mathematical progress.
***
As I mentioned in my theorem marketplace post, formal theorem proving is one of those domains where coming up with a proof is very hard and verifying it is easy. LLMs represent a breakthrough there, as in many others — at least you get *better* candidates to discard, and your search time decreases — but they don't currently seem capable of the planning, pivoting, and dead-end handling required to create an actually complex proof.
Until AlphaEvolve, that is.
Proofs in Lean are code, after all.
Lean, as is known, has a magic keyword `sorry` that serves as a placeholder for the parts of a proof that haven't been written yet, without causing an error. This mirrors how proofs are actually developed: you work on the parts one by one, and Lean keeps accepting the file until the whole structure is finished.
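A minimal illustration, with a toy statement I made up: this file elaborates fine, and Lean only warns that the declaration uses `sorry`.

```lean
-- Toy example: the `have` step is left as a stub, yet the file still
-- elaborates; Lean only warns that the declaration uses `sorry`.
theorem toy (a b : Nat) : a + b + (a + b) = b + a + (b + a) := by
  have h : a + b = b + a := by
    sorry            -- this sub-proof is yet to be written
  rw [h]             -- the rest of the argument already goes through
```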
What I have in mind is using AlphaEvolve to imitate this process of developing proofs. The initial state, scored 0, would just be the statement of the theorem followed by a `sorry`, and the final state, awarded the top score, would be a complete proof with no `sorry` left in it. The idea is that, given the initial state, its different descendants would be different attempts to break the proof down into sub-parts and work on them (potentially breaking those into sub-sub-parts, etc.), using the genetic-programming setup to simulate the chaotic, distributed nature of mathematical research.
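Sketched in Lean, with another invented toy statement, a trajectory might look like this:

```lean
-- Generation 0: just the statement, fully stubbed out.
theorem target (n : Nat) : 0 + n + 0 = n := by
  sorry

-- A later descendant: the proof has been broken into sub-parts, some of
-- which are already closed while others are still left as `sorry`.
theorem target' (n : Nat) : 0 + n + 0 = n := by
  have h : 0 + n = n := by
    sorry                     -- sub-goal still open in this candidate
  rw [Nat.add_zero, h]        -- the surrounding structure already works
```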
Setting up the evaluator function would be a challenge. Calls to another model to estimate how far along the proof is may be necessary, or perhaps some heuristics would do for starters. I'd say it's a challenge worth taking on.
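As a starting point for the heuristic route, here is a minimal sketch of what I have in mind; the file name, the `lake env lean` invocation, and the scoring formula are all assumptions of mine, not a tested pipeline: give a malformed candidate 0, reward candidates with fewer remaining `sorry`s, and reserve the top score for a complete proof.

```python
import re
import subprocess

def evaluate(lean_file: str = "Candidate.lean") -> dict[str, float]:
    """Heuristic fitness for a candidate proof file (a sketch, assuming the
    file lives in a lake project so `lake env lean <file>` can elaborate it).
    """
    result = subprocess.run(["lake", "env", "lean", lean_file],
                            capture_output=True, text=True)
    if result.returncode != 0:
        # The candidate doesn't even elaborate: errors, broken syntax, etc.
        return {"score": 0.0}

    with open(lean_file) as f:
        source = f.read()
    # Crude progress measure: count the remaining `sorry` placeholders.
    # A real evaluator might instead ask another model how promising the
    # current decomposition looks.
    sorries = len(re.findall(r"\bsorry\b", source))
    return {"score": 1.0 / (1.0 + sorries)}  # 1.0 only when no `sorry` is left
```

Note that this naive count actually penalizes breaking one `sorry` into several smaller ones, even though that can be genuine progress; that is exactly where a model-based judge (or a smarter heuristic) would come in.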
***
While I was sitting on this idea, I discovered that the AlphaEvolve team has closed their expression-of-interest form (I don't know when, but I think it happened quite recently).
I think this endeavour is non-trivial enough that we should ping them and ask if they’d be interested in a project like this; they are seeking interesting and impactful applications, and I think this could be one.
So, could anyone give me a warm intro to someone on the AlphaEvolve team? (Such as the authors of this paper.) I'm also trying to find a referral among my friends — I think that makes more sense than trying to re-implement something like this ourselves.
Because building a better theorem prover would be pretty cool.