sega_sai 5 days ago

I think the interface of LLM with formalized languages is really the future. Because here you can formally verify every statement and deal with hallucinations.

  • thesz 4 days ago

    So, you looking for Cyc [1], practically.

    [1] https://en.wikipedia.org/wiki/Cyc

    • trenchgun 4 days ago

      >The Cyc knowledge base involving ontological terms was largely created by hand axiom-writing

      >An inference engine is a computer program that tries to derive answers from a knowledge base. The Cyc inference engine performs general logical deduction.[8] It also performs inductive reasoning, statistical machine learning and symbolic machine learning, and abductive reasoning. The Cyc inference engine separates the epistemological problem from the heuristicproblem. For the latter, Cyc used a community-of-agents architecture in which specialized modules, each with its own algorithm, became prioritized if they could make progress on the sub-problem.

      Not really.

    • notTooFarGone 4 days ago

      You reminded me of my pains with OpenCyc in my student job. Thanks :D

    • auggierose 4 days ago

      Does Cyc have proofs?

      • cess11 4 days ago

        In a sense, yes, since it has a foundation in Prolog style facts and rules, and supposedly can output its reasoning.

        • auggierose 4 days ago

          Ok, sounds like in principle you could have proofs, but in practice, you don't?

          Are there any checks for the consistency of all facts?

          • cess11 4 days ago

            How could there be? Gödel gets in the way.

            • auggierose 4 days ago

              In most interactive theorem provers the check is simple: you single out the axioms you believe, and all other stuff you do is guaranteed to preserve consistency. That works in an ITP system because there are only a few axioms.

              If Cyc has 100000 axioms/facts, then that's a problem.

              • cess11 4 days ago

                It's not a theorem prover, it's more like a support system for decision making.

                They can probably isolate 100k facts from some domain for you if you pay for it, but their idea is to run inference on millions of "common sense" and everyday facts and have been for decades.

                • auggierose 4 days ago

                  That sounds to me like something that will be subsumed by the combination of ITP and AI.

                  • cess11 3 days ago

                    They've refused to die for many, many years by now. Not sure what you mean by ITP but if it's thrombocytopenia, they claim to have a lot of customers in medicine already.

              • thesz 4 days ago

                If axioms, facts and rules have types, these types can guide choice.

                • auggierose 4 days ago

                  Not a believer in types here, sorry. I'd rather have simple theorems instead without wading through a swamp of types first. And I am sure AI will be able to handle theorems just fine.

  • samweb3 4 days ago

    I am building Memelang (memelang.net) to help with this as well. I'd love your thoughts if you have a moment!

    • thesz 4 days ago

      You are building an SQL in disguise.

      First, you need to encode "memes" and relations between them at scale. This is not a language problem, it is data handling problem.

      Second, at some point of time you will need to query memes and relations between them, again, at scale. While expression of queries is a language problem, an implementation will heavily use what SQL engines does use.

      And you need to look at Cyc: https://en.wikipedia.org/wiki/Cyc

      It does what you are toing do for 40 (forty) years now.

      • jamilton 4 days ago

        The PHP implementation works by converting it to SQL, so I'm sure they're aware.

    • chvid 4 days ago

      Maybe include a section titled 'how is this different from and how is it related to' ie. relational algebra?

    • meindnoch 4 days ago

      Looks like an incomplete Prolog.

  • est 4 days ago

    > formalized languages is really the future

    Hmm, maybe it's time for symbolism to shine?

  • nybsjytm 4 days ago

    If this were the case, I don't see why we'd need to wait for an AI company to make a breakthrough in math research. The key issue instead is how to encode 'real-life' statements in a formal language - which to me seems like a ludicrous problem, just complete magical thinking.

    For example, how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language? What about "The theoretical basis of the October Revolution lay in a development of Marxism, but this development occurred through three successive rounds of theoretical debate"?

    Or have I totally misunderstood what people mean when they say that developments in automatic theorem proving will solve LLM's hallucination problem?

    • nine_k 4 days ago

      You can't talk about formally verifiable truthiness until you solve epistemology. This can be achieved formally in mathematics, with known principal limitations. Here strict theorem-proving, Lean-style, is viable.

      It can also be achieved informally and in a fragments way in barely-mathematical disciplines, like biology, linguistics, and even history. We have chains of logical conclusions that do not follow strictly, but with various probabilistic limitations, and under modal logic of sorts. Several contradictory chains follow under the different (modal) assumptions / hypotheses, and often both should be considered. This is where probabilistic models like LLMs could work together with formal logic tools and huge databases of facts and observations, being the proverbial astute reader.

      In some more relaxed semi-disciplines, like sociology, psychology, or philosophy, we have a hodgepodge of contradictory, poorly defined notions and hand-wavy reasoning (I don't speak about Wittgenstein here, but more about Freud, Foucault, Derrida, etc.) Here, I think, the current crop of LLMs is applicable most directly, with few augmentations. Still a much, much wider window of context might be required to make it actually productive, by the standards of the field.

      • nybsjytm 4 days ago

        Hmmm I think even in something very nominally nearby like theoretical physics, there's very little that's similar to theorem proving. I don't see how AlphaProof could be a stepping stone to anything like what you're describing.

        Generally, I think many people who haven't studied mathematics don't realize how huge the gulf is between "being logical/reasonable" and applying mathematical logic as in a complicated proof. Neither is really of any help for the other. I think this is actually the orthodox position among mathematicians; it's mostly people who might have taken an undergraduate math class or two who might think of one as a gateway to the other. (However there are certainly some basic commonalities between the two. For example, the converse error is important to understand in both.)

        • auggierose 4 days ago

          I don't think that mathematicians are actually in the best position to judge how math/logic might help in practical applications, because they are usually not interested in practical applications at all (at least for the last 70 years or so). Especially, they are also not interested in logic at all.

          But logic is very relevant to "being logical/reasonable", and seeing how mathematicians apply logic in their proofs is very relevant, and a starting point for more complex applications. I see mathematics as the simplest kind of application of logic you can have if you use only your brain for thinking, and not also a computer.

          "Being logical/reasonable" also contains a big chunk of intuition/experience, and that is where machine learning will make a big difference.

    • benlivengood 4 days ago

      Probabilistic reasoning is possible in a formal setting; It produces a probability distribution over answers. To ground probabilistic logic itself I'm not aware of much progress beyond the initial idea of logical induction[0].

      [0] https://arxiv.org/abs/1609.03543

      • nybsjytm 4 days ago

        This takes for granted a formal setting, which is what I'm questioning in any of these 'real world' contexts.

        • benlivengood 4 days ago

          > For example, how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language? What about "The theoretical basis of the October Revolution lay in a development of Marxism, but this development occurred through three successive rounds of theoretical debate"?

          > This takes for granted a formal setting, which is what I'm questioning in any of these 'real world' contexts.

          A formal model of semantics would likely be a low-level physical representation of possible states augmented with sound definitions of higher-level concepts and objects. I don't think humans are capable of developing a formal semantics that would work for your sentences (it's taken us hundreds of years to approach formalization of particle physics), but I think that an automated prover with access to physical experiments and an LLM could probably start building a more comprehensive semantics.

    • lukeschlather 4 days ago

      People talk about how LLMs need "more data" but data is really sort of a blunt instrument to try and impart the LLM a lot of experience. Unfortunately data isn't actually experience, it's a record of an experience.

      Now what we've seen with e.g. Chess and Go, is that when you can give a tensor model "real experience" at the speed of however many GPUs you have, the model is quickly capable of superhuman performance. Automatic theorem proving means you can give the model "real experience" without armies of humans writing down proofs, you can generate and validate proofs as fast as a computer will let you and the LLM has "real experience" with every new proof it generates along with an objective cost function, was the proof correct?

      Now, this might not let us give the LLMs real experience with being a teacher or dealing with Marxist revolutionaries, but it would be a sea change in the ability of LLMs to do math, and it probably would let us give LLMs real experience in writing software.

    • ianhorn 4 days ago

      This is a thing I'm working on, so I have some potentially useful thoughts. tl;dr, it doesn't have to be about encoding arbitrary real life statements to be super duper useful today.

      > how might an arbitrary statement like "Scholars believe that professional competence of a teacher is a prerequisite for improving the quality of the educational process in preschools" be put in a lean-like language?

      Totally out of scope in the any near future for me at least. But that doesn't prevent it from being super useful for a narrower scope. For example:

      - How might we take a statement like "(x + 1) (x - 5) = 0" and encode it formally?

      - Or "(X X^T)^-1 X Y = B"?

      - Or "6 Fe_2 + 3 H_2 O -> ?"?

      We can't really do this for a huge swath of pretty narrow applied problems. In the first, what kind of thing is X? Is it an element of an algebraically closed field? In the second, are those matrices of real numbers? In the third, is that 6 times F times e_2 or 6 2-element iron molecules?

      We can't formally interpret those as written, but you and I can easily tell what's meant. Meanwhile, current ways of formally writing those things up is a massive pain in the ass. Anything with a decent amount of common sense can tell you what is probably meant formally. We know that we can't have an algorithm that's right 100% of the time for a lot of relevant things, but 99.99% is pretty useful. If clippy says 'these look like matrices, right?' and is almost always right, then it's almost always saving you lots of time and letting lots more people benefit from formal methods with a much lower barrier to entry.

      From there, it's easy to imagine coverage and accuracy of formalizable statements going up and up and up until so much is automated that it looks kinda like 'chatting about real-life statements' again. I doubt that's the path, but from a 'make existing useful formal methods super accessible' lens it doesn't have to be.

  • tucnak 4 days ago

    Grammar sampling has been around for months, and remained largely unexplored. Don't fall into the common trap of thinking in a fixed language, rather think about a superset of possible languages (grammars) and how they could evolve from one another. I bet, if there's a breakthrough, it's probably in "differential grammars," or whatever it would be called: plug that into the backtracking sampler, & you have your System 2.

  • gosub100 4 days ago

    Ok so Claude says that the Riemann hypothesis is proven and gives you 1200 pages of math symbols backing it up. Now what?

    You probably say "now a human can verify it" but what if the humans are wrong? What is the source of truth?

    • Turneyboy 3 days ago

      The nice thing about formal verification is exactly that. You have a separate tool that's very much like a compiler that can check those 1200 pages and tell you that it's true.

      The source of truth here is the code we wrote for the formal verification system.

      • gosub100 3 days ago

        Could formal verification prove the nexus between the Taniyama-Shimura conjecture and Fermat's last theorem? I don't claim to know what the former is, but I am skeptical that a system that can only validate what we already know would have any luck with sudden leaps between disparate corners of the field. Or worse, what if we had this verification 30 years ago, Wiles gets excited thinking he proved a 300 year old mystery, only to be told by the algorithm, "sorry son, this code doesn't verify. Try again next time!"

  • raincole 5 days ago

    It's obviously not the future (outside of mathematics research). The whole LLM boom we've seen in the past two years comes from one single fact: peopel don't need to learn a new language to use it.

    • seizethecheese 5 days ago

      Both comments can be right. People don’t need to know HTML to use the internet.

    • nickpsecurity 5 days ago

      Natural language -> Formal Language with LLM-assisted tactics/functions -> traditional tools (eg provers/planners) -> expert-readable outputs -> layperson-readable results.

      I can imagine many uses for flows where LLM’s can implement the outer layers above.

  • Groxx 5 days ago

    The difficulty then will be figuring out if the proof is relevant to what you want, or simply a proof of 1=1 in disguise.

chvid 4 days ago

Mathematicians have been using computers, programming languages, and proof engines for over half a century; however breakthroughs in mathematics are still made by humans in any meaningful sense, even though the tools they use and make are increasingly complex.

But as things look now, I will be willing to bet that the next major breakthrough in maths will be touted as being AI/LLMs and coming out of one of the big US tech companies rather than some German university.

Why? Simply, the money is much bigger. Such an event would pop the market value of the company involved by a hundred billion - plenty of incentive right there to paint whatever as AI and hire whoever.

  • kzrdude 4 days ago

    But, these AI solutions are trying to solve math problems to prove their AI capabilities, not because they care about mathematics.

    • staunton 4 days ago

      Sure. Why do you say "but"? Solving such a math problem (while perhaps massively overstating the role AI actually played in the solution) would be great PR for everyone involved.

chompychop 4 days ago

Is it currently possible to reliably limit the cut-off knowledge of an LLM (either during training or inference)? An interesting experiment would be to feed an LLM mathematical knowledge only up to the year of proving a theorem, and then see if it can actually come up with the novel techniques used in the proof. For example, having only access to papers prior to 1993, can an LLM come up with Wiles' proof of FLT?

  • ogrisel 4 days ago

    That should be doable, e.g. by semi-automated curation of the pre-training dataset. However, since curating such large datasets and running pre-training runs is so expensive, I doubt that anybody will run such an experiment. Especially since would have to trust that the curation process was correct enough for the end-result to be meaningful. Checking that the curation process is not flawed is probably as expensive as running it in the first place.

  • n4r9 4 days ago

    There's the Frontier Math benchmarks [0] demonstrating that AI is currently quite far from human performance at research-level mathematics.

    [0] https://arxiv.org/abs/2411.04872

    • data_maan 4 days ago

      They didn't demonstrate anything. They haven't even released their dataset, nor mentioned how big it is.

      It's just hot air, just like the AlphaProof announcement, where very little is know about their system.

      • n4r9 3 days ago

        They won't publish the problem set for obvious reasons. And I doubt it's hot air, given the mathematicians involved in creating it.

nybsjytm 5 days ago

Why have they still not released a paper aside from a press release? I have to admit I still don't know how auspicious it is that running google hardware for three days apiece was able to find half-page long solutions, given that the promise has always been to solve the Riemann hypothesis with the click of a button. But of course I do recognize that it's a big achievement relative to previous work in automatic theorem proving.

  • whatshisface 5 days ago

    I don't know why so few people realize this, but by solving any of the problems their performance is superhuman for most reasonable definitions of human.

    Talking about things like solving the Reimman hypothesis in so many years assumes a little too much about the difficulty of problems that we can't even begin to conceive of a solution for. A better question is what can happen when everybody has access to above average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.

    • zeroonetwothree 5 days ago

      Well, multiply two large numbers instantly is a superhuman feat a calculator can do. I would hope we are going for a higher bar, like “useful”. Let’s see if this can provide proofs of novel results.

      • whatshisface 5 days ago

        The ability to multiply two numbers superhumanly has already transformed society.

      • rowanG077 5 days ago

        That superhuman capability of "multiplying two large numbers instantly" has transformed human society like not even plumbing has. I really can't see how this you could not consider this useful.

      • nybsjytm 5 days ago

        It's worth emphasizing that it's been possible for years to use an automatic theorem prover to prove novel results. The whole problem is to get novel interesting results.

    • GregarianChild 5 days ago

      We know that any theorem that is provable at all (in the chosen foundation of mathematics) can be found by patiently enumerating all possible proofs. So, in order to evaluate AlphaProof's achievements, we'd need to know how much of a shortcut AlphaProof achieved. A good proxy for that would be the total energy usage for training and running AlphaProof. A moderate proxy for that would be the number of GPUs / TPUs that were run for 3 days. If it's somebody's laptop, it would be super impressive. If it's 1000s of TPUs, then less so.

      • Onavo 5 days ago

        > We know that any theorem that is provable at all (in the chosen foundation of mathematics) can be found by patiently enumerating all possible proofs.

        Which computer science theorem is this from?

        • kadoban 5 days ago

          It's a direct consequence of the format of a proof. They're finite-length sequences of a finite alphabet of symbols, so of course they're enumerable (the same algorithm you use to count works to enumerate them).

          If you want a computer to be able to tell that it found a correct proof once it enumerates it, you need a bit more than that (really just the existence of automated proof checkers is enough for that).

        • zeroonetwothree 5 days ago

          It’s just an obvious statement. If a proof exists, you will eventually get to it.

          • j16sdiz 4 days ago

            Only if we take AC, I guess?

            • Tainnor 4 days ago

              No, this has nothing to do with choice.

        • E_Bfx 5 days ago

          I guess it is tautological from the definition of "provable". A theorem is provable by definition if there is a finite well-formulated formula that has the theorem as consequence (https://en.wikipedia.org/wiki/Theorem paragraph theorem in logic)

          • Xcelerate 5 days ago

            Not sure it’s a tautology. It’s not obvious that a recursively enumerable procedure exists for arbitrary formal systems that will eventually reach all theorems derivable via the axioms and transformation rules. For example, if you perform depth-first traversal, you will not reach all theorems.

            Hilbert’s program was a (failed) attempt to determine, loosely speaking, whether there was a process or procedure that could discover all mathematical truths. Any theorem depends on the formal system you start with, but the deeper implicit question is: where do the axioms come from and can we discover all of them (answer: “unknown” and “no”)?

            • Tainnor 4 days ago

              It's "obvious" in the sense that it's a trivial corollary of the completeness theorem (so it wouldn't be true for second order logic, for example).

              Hilbert's program failed in no contradiction to what GP wrote because the language of FOL theorems is only recursively enumerable and not decidable. It's obvious that something is true if you've found a proof, but if you haven't found a proof yet, is the theorem wrong or do you simply have to wait a little longer?

            • Turneyboy 3 days ago

              Yeah but width-first immediately gives you the solution for any finite alphabet. So in that sense it is trivial.

      • Davidzheng 4 days ago

        The shortcut vs enumeration is definitely enormous right? just take average shannon entropy to the exponent of the length for example will be probably > heat death (or whatever death) of universe on all of human compute (I'm assuming I didn't bother to check but it seems intuitively true by a margin)

    • nybsjytm 5 days ago

      > A better question is what can happen when everybody has access to above average reasoning. Our society is structured around avoiding confronting people with difficult questions, except when they are intended to get the answer wrong.

      What does this have to do with a hypothetical automatic theorem prover?

      • whatshisface 5 days ago

        Logic is pretty much absent from our culture and daily life, but that could be due to its limited supply.

        • nybsjytm 5 days ago

          Being logical in social life is pretty much completely different from being logical in a mathematical argument, especially in a formal theorem proving environment. (Just try to write any kind of cultural proposition in a formal language!)

          • whatshisface 5 days ago

            That's the way things are now, but this regime came about when proving things took intense concentration and specialized skills that very few people had. Contrast going to look something up in a library with googling something during a conversation.

  • GregarianChild 5 days ago

    Google stopped publishing interesting AI work since they had their AI lead taken away by OpenAI, and mostly with tech that was pioneered, but not monetised by Google like transformers.

    I imagine they are under pressure not to make this mistake again.

throwaway713 5 days ago

Anyone else feel like mathematics is sort of the endgame? I.e., once ML can do it better than humans, that’s basically it?

  • awanderingmind 4 days ago

    The end of an era perhaps, but not 'the end' - another commenter has already mentioned Moravec's paradox: https://en.wikipedia.org/wiki/Moravec%27s_paradox

    It will be interesting if/when these models start proving major open problems, e.g. the Riemann Hypothesis. The sociological impact on the mathematical community would certainly be acute, and likely lead to a seismic shift in the understanding of what research-level mathematics is 'for'. This discussion already appears to be in progress. As an outsider I have no idea what the timeline is for such things (2 years? 10? 100?).

    On the plus side, AlphaProof has the benefit over ordinary LLMs in their current form in that it does not pollute our common epistemological well, and its output is eminently interrogable (if you know Lean at last).

  • jdietrich 4 days ago

    Humans are terrible at anything you learn at university and incredibly good at most things you learn at trade school. In absolute terms, mathematics is much easier than laying bricks or cutting hair.

    https://en.wikipedia.org/wiki/Moravec%27s_paradox

    • youoy 4 days ago

      I would say that "narrow" mathematics (finding a proof of a given statement that we suspect has a proof using a formal language) is much easier that "generally" laying brick or cutting hair.

      But I cannot see how consistently doing general mathematics (as in finding interesting and useful statements to proof, and then finding the proofs) is easier than consistently cutting hair/driving a car.

      We might get LLM level mathematics, but not Human level mathematics, in the same way that we can get LLM level films (something like Avengers, or the final season of GoT), but we are not going to get Human level films.

      I suspect that there are no general level mathematics without the geometric experience of humans, so for general level mathematics one has to go through perceptions and interactions with reality first. In that case, general mathematics is one level over "laying bricks or cutting hair", so more complex. And the paradox is only a paradox for superficial reasoning.

      • staunton 4 days ago

        > But I cannot see how consistently doing general mathematics (as in finding interesting and useful statements to proof, and then finding the proofs) is easier than consistently cutting hair/driving a car.

        The main "absolute" difficulty there is in understanding and shaping what the mathematical audience thinks is "interesting". So it's really a marketing problem. Given how these tools are being used for marketing, I would have high hopes, at least for this particular aspect...

        • youoy 4 days ago

          Is it really marketing in general? I can agree with some of it, but for me the existence of the term "low hanging fruit" to describe some statements says otherwise...

    • looofooo0 4 days ago

      Sure but checking everything is correctly wired, plug-in, cut etc. Everything needes is thought of? There is plenty of things an AI could do to help a trades man.

  • GuB-42 4 days ago

    Not the endgame by far. Maybe the endgame for LLMs, and I am not even convinced.

    Maths is detached from reality. An AI capable of doing math better than humans may not be able do drive a car, as driving a car requires a good understanding of the world, it has to recognize object and understand their behavior, for example, understanding that a tree won't move but a person might, but it will move slower than another car. It has to understand the physics of the car: inertia, grip, control,... It may even have to understand human psychology and make ethical choices.

    Fully autonomous robots would be the endgame.

    • ykonstant 4 days ago

      >An AI capable of doing math better than humans may not be able do drive a car,

      Noo, but my excuse for being unable to drive a car is precisely that I am a quirky mathematician focused on research!

  • raincole 5 days ago

    If "better than humans" means when you give it a real world problem, it gives you a mathematical model to describe it (and does it better than human experts), then yes, it's the end game.

    If it just solves a few formalized problems with formalized theorems, not so much. You can write a program that solves ALL the problems under formalized theorems already. It just runs very slowly.

    • titanomachy 4 days ago

      I don’t think you can gloss over the importance of computational tractability here. A human could also start enumerating every possible statement in ZFC, but clearly that doesn’t make them a mathematician.

  • margorczynski 5 days ago

    I doubt it. Math has the property that you have a way to 100% verify that what you're doing is correct with little cost (as it is done with Lean). Most problems don't have anything close to that.

    • AlotOfReading 4 days ago

      Math doesn't have a property that you can verify everything you're doing is correct with little cost. Humans simply tend to prefer theorems and proofs that are simpler.

      • thrance 4 days ago

        You can, in principle, formalize any correct mathematical proof and verify its validity procedurally with a "simple" algorithm, that actually exists (See Coq, Lean...). Coming up with the proof is much harder, and deciding what to attempt to prove even harder, though.

        • AlotOfReading 4 days ago

          You can verify it with a simple algorithm, but that verification won't always be cheap. If it was, curry-howard would be incredibly uninteresting. It only seems cheap because we usually have little interest in exploring the expensive theorems. Sometimes we do though and get things like the 4 color theorem whose proof verification amounts to combinatorial search.

    • exe34 5 days ago

      to be fair, humans also have to run experiments to discover whether their models fit nature - AI will do it too.

      • margorczynski 5 days ago

        These kind of experiments are many times orders of magnitude more costly (time, energy, money, safety, etc.) than verifying a mathematical proof with something like Lean. That's why many think math will be one of the first to crack with AI as there is a relatively cheap and fast feedback loop available.

  • empath75 4 days ago

    Computers have been better than us at calculation since about a week after computers were invented.

    If a computer proves the Reimann Hypothesis, someone will say "Oh of course, writing a proof doesn't require intelligence, it's merely the dumb application of logical rules, but only a human could have thought of the conjecture to begin with."

  • roncesvalles 2 days ago

    I still think humans understand less than half of all the physics there is to understand.

  • AlexCoventry 4 days ago

    There are important forms of discernment and judgement which aren't captured by mathematics.

  • technotony 4 days ago

    Yes, because if AI can do maths then it can use that to improve the efficiency/quality of it's algorithms to self improve...

    • nybsjytm 4 days ago

      The quality of AI algorithms is not based on formal mathematics at all. (For example, I'm unaware of even one theorem relevant to going from GPT-1 to GPT-4.) Possibly in the future it'll be otherwise though.

    • SkiFire13 4 days ago

      ... or it might prove that it's impossible to self-improve given the current constraits

  • abrookewood 4 days ago

    I mean ... calculators can do better at mathematics than most of us. I don't think they are going to threaten us anytime soon.

wslh 5 days ago

If you were to bet on solving problems like "P versus NP" using these technologies combined with human augmentation (or vice versa), what would be the provable time horizon for achieving such a solution? I think we should assume that the solution is also expressible in the current language of math/logic.

  • zarzavat 5 days ago

    Probably a bad example, P vs NP is the most likely of the millennium problems to be unsolvable, so the answer may be "never".

    I'll bet the most technical open problems will be the ones to fall first. What AIs lack in creativity they make up for in ability to absorb a large quantity of technical concepts.

    • wslh 5 days ago

      Thank you for the response. I have a follow-up question: Could these AIs contribute to advancements in resolving the P vs NP problem? I recall that the solution to Fermat’s Last Theorem relied on significant progress in elliptic curves. Could we now say that these AI systems might play a similar role in advancing our understanding of P vs NP?

      • ccppurcell 5 days ago

        Just my guess as a mathematician. But if LLMs are good for anything it will be for finding surprising connections and applying our existing tools in ways beyond human search. There's a huge space of tools and problems, and human intuition and brute force searching can only go so far. I can imagine that LLMs might start to find combinatorial proofs of topological theorems, maybe even novel theorems. Or vice versa. But I find it difficult to imagine them inventing new tools and objects that are really useful.

        • nemonemo 5 days ago

          Do you have a past example where this already-proven theorem/new tools/objects would have been only possible by human but not AI? Any such example would make your arguments much more approachable by non-mathematicians.

    • meindnoch 4 days ago

      Ok, then the AI should formally prove that it's "unsolvable" (however you meant it).

      • less_less 4 days ago

        Unsolvable would mean that no proof exists, and no disproof exists, in whatever axiom system (eg ZF). While in some cases you can hack around this by proving eg "no proof and no disproof exist in ZF, if ZF is consistent", IIUC that's not always possible. It's like asking "when will AI be strong enough that it can always decide correctly whether a given computer program halts?"

        Then again, it's fairly likely that P=?NP is decidable but we just don't have any idea how to prove it. In that case the question is more or less "what's the time horizon until AI is vastly better than humans at formal math?", to which the answer is certainly "we don't know, there may be obstacles, just wait and see".

  • uptownfunk 5 days ago

    The hard part is in the creation of new math to solve these problems not in the use of existing mathematics. So new objects (groups rings fields) etc have to be theorized, their properties understood, and then that new machinery used to crack the existing problems. I think we will get to a place (around 5 years) where AI will be able to solve these problems and create these new objects. I don’t think it’s one of technology I think it’s more financial. Meaning, there isn’t much money to be made doing this (try and justify it for yourself) and so the lack of focus here. I think this is a red herring and there is a gold mine in there some where but it will likely take someone with a lot of cash to fund it out of passion (Vlad Tenev / Harmonic, or Zuck and Meta AI, or the Google / AlphaProof guys) but in the big tech world, they are just a minnow project in a sea of competing initiatives. And so that leaves us at the mercy of open research, which if it is a compute bound problem, is one that may take 10-20 years to crack. I hope I see a solution to RH in my lifetime (and in language that I can understand)

    • wslh 5 days ago

      I understand that a group of motivated individuals, even without significant financial resources, could attempt to tackle these challenges, much like the way free and open-source software (FOSS) is developed. The key ingredients would be motivation and intelligence, as well as a shared passion for advancing mathematics and solving foundational problems.

      • uptownfunk 5 days ago

        Ok but how do you get around needing a 10k or 100k h100 cluster

        • wslh 5 days ago

          It is well known that cloud services like Google Cloud subsidizes some projects and we don't even know if in a few years improvements will arise.

          • uptownfunk 4 days ago

            Possible but unlikely given how much demand there is and the pressure to deliver returns to shareholders, however sure it is possible. Right now search is very inefficient, the search space is massive. That is the main problem. You can have many sequences of text that sound plausible, but of them a much smaller number will be logically valid. This is the main challenge. Once we can search efficiently not just in semantically valid space but I suppose what you can call syntactically valid space then we will be able to crack this.

    • empath75 4 days ago

      > I think we will get to a place (around 5 years) where AI will be able to solve these problems and create these new objects.

      For all we know, buried deep in AlphaProof's attempts to solve these toy problems, it already tried and discarded several new ideas.

    • Davidzheng 4 days ago

      I think there's significant financial incentives for big tech given the scarcity of benchmarks for intelligence which are not saturated

  • hiddencost 5 days ago

    No one is focused on those. They're much more focused on more rote problems.

    You might find them used to accelerate research math by helping them with lemmas and checking for errors, and formalizing proofs. That seems realistic in the next couple of years.

    • nybsjytm 5 days ago

      There are some AI guys like Christian Szegedy who predict that AI will be a "superhuman mathematician," solving problems like the Riemann hypothesis, by the end of 2026. I don't take it very seriously, but that kind of prognostication is definitely out there.

      • Davidzheng 4 days ago

        link to this prediction? The famous old prediction of Szegedy was IMO gold by 2026 and that one is basically confirmed right? I think 2027/2028 personally is a breakeven bet for superhuman mathematician.

        • nybsjytm 4 days ago

          I consider it unconfirmed until it happens! No idea where I saw it but it was probably on twitter.

      • titanomachy 4 days ago

        I’m sure AI can “solve” the Riemann hypothesis already, since a human proved it and the proof is probably in its training data.

        • nybsjytm 4 days ago

          No, nobody has proved it.

          Side point, there is no existing AI which can prove - for example - the Poincaré conjecture, even though that has already been proved. The details of the proof are far too dense for any present chatbot like ChatGPT to handle, and nothing like AlphaProof is able either since the scope of the proof is well out of the reach of Lean or any other formal theorem proving environment.

          • Davidzheng 4 days ago

            what does this even mean? Surely an existing AI could reguritate all of Perelman's arxiv papers if we trained them to do that. Are you trying to make a case that the AI doesn't understand the proof it's giving? Because then I think there's no clear goal-line.

            • nybsjytm 4 days ago

              You don't even need AI to regurgitate Perelman's papers, you can do that in three lines of python.

              What I meant is that there's no AI you can ask to explain the details of Perelman's proof. For example, if there's a lemma or a delicate point in a proof that you don't understand, you can't ask an AI to clarify it.

sincerely 5 days ago

in the first question, why do they even specify ⌊n⌋ (and ⌊2n⌋ and so on) when n is an integer?

  • rishicomplex 5 days ago

    Alpha need not be an integer, we have to prove that it is

    • sincerely 5 days ago

      Should have read more carefully, thank you!

Robotenomics 5 days ago

“Only 5/509 participants solved P6”

  • nybsjytm 5 days ago

    This has to come with an asterisk, which is that participants had approximately 90 minutes to work on each problem while AlphaProof computed for three days for each of the ones it solved. Looking at this problem specifically, I think that many participants could have solved P6 without the time limit.

    (I think you should be very skeptical of anyone who hypes AlphaProof without mentioning this - which is not to suggest that there's nothing there to hype)

    • letitgo12345 4 days ago

      Think more is made of this asterix than necessary. Quite possible adding 10x more GPUs would have allowed it to solve it in the time limit.

      • nybsjytm 4 days ago

        Very plausible, but that would also be noteworthy. As I've mentioned in some other comments here, (as far as I know) we outside of DeepMind don't know anything about the computing power required to run alphaproof, and the tradeoff between computing power required and the complexity of problems it can address is really key to understanding how useful it might be.

    • auggierose 5 days ago

      Certainly an interesting information that AlphaProof needed three days. But does it matter for evaluating the importance of this result? No.

      • nybsjytm 5 days ago

        I agree that the result is important regardless. But the tradeoff of computing time/cost with problem complexity is hugely important to think about. Finding a proof in a formal language is trivially solvable in theory since you just have to search through possible proofs until you find one ending with the desired statement. The whole practical question is how much time it takes.

        Three days per problem is, by many standards, a 'reasonable' amount of time. However there are still unanswered questions, notably that 'three days' is not really meaningful in and of itself. How parallelized was the computation; what was the hardware capacity? And how optimized is AlphaProof for IMO-type problems (problems which, among other things, all have short solutions using elementary tools)? These are standard kinds of critical questions to ask.

        • auggierose 4 days ago

          > Finding a proof in a formal language is trivially solvable in theory since you just have to search through possible proofs until you find one ending with the desired statement. The whole practical question is how much time it takes.

          No, in my experience the whole practical question is, can it be found automatically, or can it not be found automatically? Because there is an exponential search space that conventional automated methods will not be able to chew through, it either works, or it doesn't. AlphaProof shows that for some difficult IMO problems, it works.

        • dash2 5 days ago

          Though, if you start solving problems that humans can't or haven't solved, then questions of capacity won't matter much. A speedup in the movement of the maths frontier would be worth many power stations.

          • nybsjytm 5 days ago

            For some time a 'superhuman math AI' could be useful for company advertising and getting the attention of VCs. But eventually it would be pretty clear that innovative math research, with vanishingly few exceptions, isn't very useful for making revenue. (I am a mathematician and this is meant with nothing but respect for math research.)

            • empath75 4 days ago

              The big exception being predicting market movements, and I can't imagine how much money the hedge funds are spending on this right now.

            • airstrike 5 days ago

              "Making revenue" is far from being the only metric by which we deem something worthy.

              • nybsjytm 5 days ago

                As a mathematician, of course I agree. But in a sentence like:

                > A speedup in the movement of the maths frontier would be worth many power stations

                who is it 'worth' it to? And to what end? I can say with some confidence that many (likely most, albeit certainly not all) mathematicians do not want data centers and power stations to guzzle energy and do their math for them. It's largely a vision imposed from without by Silicon Valley and Google research teams. What do they want it for and why is it (at least for now) "worth" it to them?

                Personally, I don't believe for a second that they want it for the good of the mathematical community. Of course, a few of their individual researchers might have their own personal and altruistic motivations; however I don't think this is so relevant.

                • auggierose 4 days ago

                  There is something called applied math, and there is a big gulf between applied math and pure math. This new technology has the potential of making use of much more "pure math" for applied math, unifying much of pure math, applied math, and programming. I don't really care about the RH, I care about that.

      • andrepd 5 days ago

        More or less. Modern theorem provers, even fully automatic ones, can prove incredibly difficult problems if given enough time. With 3 days and terabytes of memory, perhaps they could? Would be interesting to compare Alphaproof with a standard theorem prover that is given similarly astronomical computing resources.

        • auggierose 5 days ago

          That is an interesting thought. But I doubt that standard automated theorem proving techniques would have solved this in a 100 years, even with immense resources, as all they do is (basically) brute force search, and it remains an exponential search space. Therefore 3 days does not really matter, indeed, 3 months would still be a result that 10 years ago I would have thought impossible.