obblekk 4 days ago

The open source ranking library is really interesting. It uses a type of merge sort where the comparator is an LLM, but comparing batches of more than 2 items per call so it needs fewer calls.
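
Something like this shape, I think (a toy sketch, not raink's actual code; llm_rank stands in for one model call that returns a small batch sorted best-first):

    import random

    def batched_rank(items, llm_rank, k=8, passes=4):
        # Score each item by its summed position across several passes
        # of shuffled, size-k batches; each batch costs one LLM call.
        scores = {item: 0.0 for item in items}
        for _ in range(passes):
            pool = list(items)
            random.shuffle(pool)
            for i in range(0, len(pool), k):
                for pos, item in enumerate(llm_rank(pool[i:i + k])):
                    scores[item] += pos
        return sorted(items, key=lambda it: scores[it])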

Reducing problems to document ranking is effectively a type of test-time search - also very interesting!

I wonder if this approach could be combined with GRPO to create more efficient chain of thought search...

https://github.com/BishopFox/raink?tab=readme-ov-file#descri...

  • rahimnathwani 4 days ago

    The article introducing the library explains that pairwise comparisons (i.e. for each pair of items you ask an LLM which it prefers) are the most reliable but computationally expensive, while a single LLM call ("rank these items in order") is much cheaper but much less reliable. So they do something in between that yields enough pairwise signal to produce a more reliable list.
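
    Back-of-the-envelope on why the middle ground matters (my numbers, purely illustrative):

        from math import comb

        n = 100                   # items to rank
        print(comb(n, 2))         # exhaustive pairwise: 4950 LLM calls
        k, passes = 10, 5         # shuffled batches of 10, several passes
        print(passes * (n // k))  # the in-between approach: 50 calls
        # vs. a single listwise call over all 100: 1 call, far less reliable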

    https://news.ycombinator.com/item?id=43175658

hexator 4 days ago

This furthers an idea I've had recently: we (and the media) are focusing too much on creating value by building ever more complex LLMs, and vastly underestimating creative applications of current-generation AI.

  • crazygringo 3 days ago

    Why not both?

    The LLM companies work on the LLMs, while tens of thousands of startups and established companies work on applying what already exists.

    It's not either/or.

  • barrenko 3 days ago

    Currently we're using MLLMs like Lego blocks to build Lego-powered devices.

antirez 4 days ago

One interesting thing about LLMs, also related to why chain of thought works so well, is that they are good at sampling (saying a lot of things about a problem) and, when shown N solutions, good at pointing to the potentially best one. They do both of these things better than zero-shot "tell me how to do that". So CoT is basically searching inside the representation space plus ranking. This idea leverages something LLMs are clearly able to do pretty well.
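
In code, that sample-then-rank loop is something like this (a toy sketch; sample and rank stand in for model calls, not anyone's actual implementation):

    def best_of_n(problem, sample, rank, n=8):
        # sample(problem) -> one candidate solution (one LLM call)
        # rank(problem, candidates) -> candidates sorted best-first
        candidates = [sample(problem) for _ in range(n)]
        return rank(problem, candidates)[0]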

noperator 4 days ago

A concept that I've been thinking about a lot lately: transforming complex problems into document ranking problems to make them easier to solve. LLMs can assist greatly here, as I demonstrated at the inaugural DistrictCon this past weekend.

  • lifeisstillgood 4 days ago

    So would this be 1600 commits, one of which fixes the bug (which might be easier with commit messages?), or a diff between two revisions with 1600 chunks, each chunk a “document”?

    I am trying to grok why we want to find the fix - is it to understand what was done so we can exploit unpatched instances in the wild?

    Also also

    “identifying candidate functions for fuzzing targets” - if every function is a document, I get where the list of documents comes from, but what is the query? How do I say “find me the function most suitable for fuzzing”?

    Apologies if that’s brusque - trying to fit new concepts in my brain :-)

    • noperator 4 days ago

      Great questions. For commits or revision diffs as documents—either will work. Yes, I've applied this to N-day vulnerability identification to support exploit development and offensive security testing. And yes, for fuzzing, a sensible approach would be to dump the exported function attributes (names, source/disassembled code, other relevant context, etc.) from a built shared library, and ask, "Which of these functions most likely parses complex input and may be a good candidate for fuzzing?" I've had some success with that specific approach already.
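
      The shape of that fuzzing workflow, as I understand it (a rough sketch, not noperator's actual tooling; libtarget.so and the prompt wording are made up):

          import subprocess

          def exported_functions(lib_path):
              # `nm -D --defined-only` lists defined dynamic symbols;
              # "T" marks functions in the text (code) section.
              out = subprocess.run(["nm", "-D", "--defined-only", lib_path],
                                   capture_output=True, text=True, check=True)
              return [line.split()[-1] for line in out.stdout.splitlines()
                      if " T " in line]

          funcs = exported_functions("libtarget.so")  # hypothetical library
          prompt = ("Which of these functions most likely parses complex input "
                    "and may be a good candidate for fuzzing?\n" + "\n".join(funcs))
          # ...then feed `prompt` (ideally with per-function source or
          # disassembly attached) into a ranking pass.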

rfurmani 4 days ago

Very cool! This is also one of my beliefs in building tools for research: if you can solve the problem of predicting and ranking the top references for a given idea, then you've learned a lot about problem solving and about decomposing problems into their ingredients. I've been pleasantly surprised by how well LLMs can rank relevance compared to supervised training of a relevancy score. I'll read the linked paper (shameless plug: here it is on my research tools site: https://sugaku.net/oa/W4401043313/)

mskar 4 days ago

Great article, I’ve had similar findings! LLM-based “document-chunk” ranking is a core feature of PaperQA2 (https://github.com/Future-House/paper-qa) and part of why it works so well for scientific Q&A compared to traditional embedding-ranking-based RAG systems.

  • noperator 4 days ago

    That's awesome. Will take a closer look!

tbrownaw 3 days ago

So instead of testing each patch, it's faster to "read" it and see if it looks like the right kind of change to be fixing a particular bug. Neat.

adamkhakhar 4 days ago

I'm curious - why is LLM ranking preferred over cosine similarity from an embedding model (in the context of this specific problem)?

  • panarky 4 days ago

    Because the question "does Diff A fix Vuln B" is not answered by the cosine distance between vector(Diff A) and vector(Vuln B).

    • janalsncm 3 days ago

      You can learn a function that embeds diffs with vulnerability A near each other, diffs with vulnerability B near each other, etc., which is much more efficient than asking an LLM about hundreds of chunks one at a time.

      Maybe you even use the LLM to find vulnerable snippets at the beginning, but a multi-class classifier or embedding model will be way better at runtime.

      • telotortium 3 days ago

        Perhaps you can learn such a function, but it may be hard to learn a suitable embedding space directly, so it makes sense to lean on the more general capabilities of an LLM (perhaps fine-tuned and distilled for more efficiency).

        • janalsncm 2 days ago

          In principle, there is no reason why an LLM should be able to do better than a more focused model, and a lot of reasons why it will be worse. You’re wasting a ton of parameters memorizing the capital of France and what the powerhouse of a cell is.

          If data is the issue you can probably even generate vulnerabilities to create a synthetic dataset.

      • jasonjmcghee 3 days ago

        I've thought about this and am very interested in this problem. Specifically, how can you efficiently come up with a kernel function that maps a "classic" embedding space to answer a specific ranking problem?

        With enough data, you could train a classic ML model, or you could keep the LLM in the inference pipeline, but is there another way?

        • janalsncm 2 days ago

          The typical methods would be:

          1. Train an embedding model that forces “similar” inputs close together using triplet loss (see the sketch after this list). Here “similar” can mean anything, but you would probably want to mark similar vulnerabilities as being similar.

          2. If you have a fixed set of N vulnerabilities, you can train a multi-class classifier. Of course it’s a pain in the ass to add a new class later on.

          3. For any particular vulnerability you could train a ranking model using hinge loss. This is what most industrial ranking and recommendation systems do.
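
          A minimal sketch of option 1, assuming PyTorch and some upstream featurizer that turns a diff into a 768-dim vector (all names here are illustrative, not from any real pipeline):

              import torch
              import torch.nn as nn

              # Stand-in encoder over pre-computed diff features.
              encoder = nn.Sequential(
                  nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 128))
              loss_fn = nn.TripletMarginLoss(margin=1.0)
              opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

              def train_step(anchor, positive, negative):
                  # anchor/positive share a vulnerability class; negative doesn't.
                  # Each argument is a (batch, 768) float tensor.
                  opt.zero_grad()
                  loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
                  loss.backward()
                  opt.step()
                  return loss.item()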

    • adamkhakhar 3 days ago

      "does Diff A fix Vuln B" is not the ranking solution proposed by the author. The ranking setup is the same as in the embedding case.

    • daralthus 4 days ago

      What if you embed `askLLM("5 things this diff could fix" + chunk)` instead of `chunk`? That should be closer in the latent space.
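
      i.e. something like this (ask_llm and embed are stand-ins for your model calls):

          def index_chunk(chunk, ask_llm, embed):
              # Embed an LLM expansion of the chunk (a HyDE-style trick)
              # instead of the raw chunk itself.
              expansion = ask_llm("5 things this diff could fix:\n" + chunk)
              return embed(expansion)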

patapong 3 days ago

Interesting insight, and funny in a way, since LLMs themselves can be seen as a specific form of document ranking, i.e. ranking a list of tokens by appropriateness as a continuation of a text sequence.

Everdred2dx 4 days ago

Very interesting application of LLMs. Thanks for sharing!

jasonjmcghee 3 days ago

I see in the readme that you investigated tournament-style ranking, but I didn't see results.

How'd it perform compared to listwise?

Also curious whether you tried schema-based querying of the LLM (function calling / structured output). I recently tried to have a discussion about this exact topic with someone who posted about pairwise ranking with LLMs.

https://lobste.rs/s/yxlisx/llm_sort_sort_input_lines_semanti...

marcosdumay 4 days ago

Hum... The gotcha is that LLMs can rank for subject relevance, but not for most other kinds of quality.

  • ambicapter 4 days ago

    What other kinds of quality are you thinking of?

    • o11c 4 days ago

      I'll be happy when I meet an LLM that doesn't randomly inject/ignore the word "not".

m3kw9 4 days ago

That title hurts my head to read

moralestapia 4 days ago

Minor nitpick:

It should be "document ranking reduces to these hard problems".

I never knew why the convention was like that; it seems backwards to me as well, but that's how it is.

  • dwringer 4 days ago

    "Document ranking reduces to these hard problems" would imply that document ranking is itself an instance of a certain group of hard problems. That's not what the article is saying.

    • moralestapia 4 days ago

      I know it's counterintuitive, as I explained in my comment, but that's the correct terminology in the CS world.

      • markerz 4 days ago

        I want to hear more about your point of view, because I disagree and am curious if there's another definition of "reduce". In my CS world, reduce is a term that you use to take a list of stuff and return a smaller list or instance of stuff. For example: [1, 2, 3].reduce(+) => 6. The title would go like [hardProblem1, hardProblem2, hardProblem3].reduce(...) => documentRanking. I think this mental model works for the non-CS world. So I'm curious what your viewpoint is.

        • moralestapia 4 days ago

          In (Theoretical) Computer Science it is sometimes helpful to be able to say "Any instance of an A-type Problem can be transformed into an instance of a B-type Problem by applying this polynomial-time procedure".

          Say you have a problem that you know reasonably well (A-type) and another one that you're studying (B-type), intuitively, you'd say "If I transform B to A and I know the solution to A, then I solved B" but what you actually need to do is to transform A to B, this is called "reducing A to B", for some reason, and then you can say things like "B is at least as complex as A" and "I can solve some instances of B the way I solve the general case of A".

          This doesn't really apply here since neither the "hard problems" TFA mentions nor "document ranking" are canonical problems that you would typically use in these proofs, but since he's borrowing the term from this part of CS I wanted to make that remark on its proper use. Hence why I wrote "minor nitpick".

          The reduce operation that you mentioned doesn't make sense within the context of the article.

          • riwsky 4 days ago

            Wikipedia: “Intuitively, problem A is reducible to problem B, if an algorithm for solving problem B efficiently (if it existed) could also be used as a subroutine to solve problem A efficiently.”

            The article takes for granted that LLM-driven listwise comparisons efficiently solve document ranking (problem B), and then shows this can also be used as a subroutine to solve various hard problems like vulnerability analysis (problems A) efficiently.

            • moralestapia 4 days ago

              Hmm, the discussion here requires a deeper understanding of these concepts, much deeper than a casual read of one sentence picked from Wikipedia.

              I wouldn't have written that sentence to introduce people to reduction, because it misses a very important property of the operation that changes the whole thing. That sentence could lead you to think that reducing A to B (A <= B) is the same as reducing B to A (B <= A), which is not always true. To see why, try to understand [1].

              There's a reason why the reduction equivalence classes form a preorder, as stated on the Wikipedia page you quoted.

              1: https://news.ycombinator.com/item?id=43179918

              • riwsky 4 days ago

                > When this is true, solving A cannot be harder than solving B. "Harder" means having a higher estimate of the required computational resources in a given context (e.g., higher time complexity, greater memory requirement, expensive need for extra hardware processor cores for a parallel solution compared to a single-threaded solution, etc.). The existence of a reduction from A to B can be written in the shorthand notation A ≤m B, usually with a subscript on the ≤ to indicate the type of reduction being used (m : mapping reduction, p : polynomial reduction).

                The article indeed argues that solving n-day vulnerability discovery is no harder than document ranking. It does not argue that document ranking is no harder than n-day discovery, because it assumes most people would already assume that; nor does it set out to disprove it.

                • moralestapia 3 days ago

                  To be honest, even the article is not a formal attempt at proving all of this, nor does it have to be; it's a bit of a conjecture, but anyway.

                  My reasoning goes along this line: which of the following two sentences do you think is more likely to be true?

                  A. All n-day vulnerability discovery problems can be mapped to a document rerank problem.

                  or

                  B. Some n-day vulnerability discovery problems can be mapped to a document rerank problem.

                  I lean towards B, without any proof for it. Hence why I think the correct way is to reduce reranks into n-days (and all the other "hard problems").

                  If you think A is true, you still have to show that all reranks can be reduced to n-days, to be rigorous and able to say that your proposed algorithm works in both domains.

                  In the end it could be that both alternatives are equivalent, but it's easier to just say "reranks can be reduced to n-days" and because of this "some n-days can be adequately solved by my algorithm that works in reranks".

                  • riwsky 3 days ago

                    Even if you believe B, that’s still not reducing document ranking to n-day vulnerability discovery; it’s “reducing some n-day discovery to document ranking”.

                    A does not require demonstrating doc-rank problems map to n-day problems, since reduction isn’t required to be symmetric.

                    Where you might be getting caught up is in the mapping of problems vs the mapping of solutions. “Nday reduces to docrank” implies we can turn every nday problem into a docrank problem, and every docrank solution into an nday solution, but does not say anything about whether we can turn docrank problems into nday problems, or nday solutions into docrank solutions.

                    • moralestapia 3 days ago

                      >Even if you believe B, that’s still not reducing [...]

                      I agree.

                      >A does not require demonstrating doc-rank problems map to n-day problems

                      Of course you have to, that's pretty much the whole operation of reducing A to B. You may be doing it implicitly but you're doing it for sure.

                      >since reduction isn’t required to be symmetric

                      That's exactly my point. A <= B is not necessarily the same as B <= A, even though in some cases, at a first glance, it seems to be the case. Your choice of which one is A or B will change how you construct whatever proof you want to come up with.

                      I would choose to do "reduce rerank to n-day (and all others)", because it feels like it would be easier down the road, but also because one typically reduces the problem that one knows best (more general, known bounds solutions, etc...) into the one that you're studying. That's why I wrote "minor nitpick".

                      Think about the textbook example of 3-SAT and all the other problems it reduces to: clique, vertex cover, independent set, ... One does not reduce these problems into 3-SAT; it's the other way around. That doesn't mean you couldn't (some of them may be completely equivalent to 3-SAT), but it's just easier to work everything out if you go from 3-SAT to the rest. My argument is the same: rerank is the thing that you reduce to all the others.

                      • riwsky 2 days ago

                        > > A does not require demonstrating doc-rank problems map to n-day problems

                        > Of course you have to, that's pretty much the whole operation of reducing A to B. You may be doing it implicitly but you're doing it for sure.

                        But the article isn’t mapping docrank to nday, nor is it claiming to reduce docrank to nday. Its choice of A and B is clear.

                        Nday didn’t have an understood solution to leverage like docrank did.

                        > one typically reduces the problem that one knows best (more general, known bounds solutions, etc...) into the one that you're studying

                        Wikipedia: “”” There are two main situations where we need to use reductions:

                        First, we find ourselves trying to solve a problem that is similar to a problem we've already solved. In these cases, often a quick way of solving the new problem is to transform each instance of the new problem into instances of the old problem, solve these using our existing solution, and then use these to obtain our final solution. This is perhaps the most obvious use of reductions. “””

                        This is what the article does, reducing the new nday problem into the known docrank one.

                        “”” Second: suppose we have a problem that we've proven is hard to solve, and we have a similar new problem. We might suspect that it is also hard to solve. We argue by contradiction: suppose the new problem is easy to solve. Then, if we can show that every instance of the old problem can be solved easily by transforming it into instances of the new problem and solving those, we have a contradiction. This establishes that the new problem is also hard. “””

                        This is your ‘transform A to B, this is called "reducing A to B", for some reason, and then you can say things like "B is at least as complex as A" and "I can solve some instances of B the way I solve the general case of A’. “Nday discovery is at least as complicated as document ranking” is obvious, though, which is why it’s not the subject of a blog post.

                        • moralestapia 2 days ago

                          >Nday didn’t have an understood solution to leverage like docrank did.

                          Hence why you reduce docrank to n-day. You typically map the known into the unknown. That's the point I'm trying to make.

                          I'm not saying the alternative is wrong, though, it's just not what one usually does.

                          (btw, even though this format makes everything look confrontational, I'm actually enjoying this thread a lot :D)

                          • riwsky 2 days ago

                            "can map A to B, solve B, and use that to solve A" establishes the pre-order A ≤ B. You seem to pronounce "A ≤ B" as "B reduces to A", which indeed lines up with the wikipedia page for preorders in general: "when a≤b, one may say that b covers a or that a precedes b or that b reduces to a".

                            But compsci pronounces it the other way around, as per the Reduction (complexity) wiki page: "The existence of a reduction from A to B can be written in the shorthand notation A ≤m B, usually with a subscript on the ≤ to indicate the type of reduction being used (m : mapping reduction, p : polynomial reduction)".

                            As inelegant as the discrepancy may be, the compsci conjugation is much more common than the category-theoretic one, and this is a programming forum. Just think of it like an irregular verb.

      • Ar-Curunir 3 days ago

        If A reduces to B, it means that an algorithm implementing B can be used (with some pre- and post-processing) to solve A.

        If A reduces to B, it means that B is at least as hard as A.

        This is the standard terminology in theoretical computer science; see for example the DPV textbook, page 210: https://github.com/eherbold/berkeleytextbooks/blob/master/Al...
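
        A toy instance of the direction (my example, not from the textbook): finding the minimum reduces to sorting, because a sorter can be used as a subroutine to find the minimum; sorting is at least as hard as finding the minimum, not the other way around.

            def solve_min(xs, sort):
                # "min reduces to sort": pre-process (nothing to do), call
                # the B-solver (sort), post-process (take the first element).
                return sort(xs)[0]

            print(solve_min([3, 1, 2], sorted))  # -> 1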

        • moralestapia 3 days ago

          Isn't that what I wrote on [1]?

          Do you have something to add or is it just ... a confirmation?

          Weird.

          1: https://news.ycombinator.com/item?id=43179918

          • iamnotempacc 3 days ago

            Are you trolling or what? Let's see what's written in the comment above.

            > If A reduces to B, it means that an algorithm implementing B can be used (with some pre- and post-processing) to solve A.

            Here we say that we can solve a "hard problem" if we can express it in terms of the "Document Ranking" problem.

            Let's rewrite that quoted sentence:

            An algorithm implementing "Document Ranking" can be used (with some pre- and post-processing) to solve "Hard problem".

            Let's do substitution in the first part of the sentence, "If A reduces to B", where A is "hard problem" and B is "Document Ranking":

            Hard problem reduces to Document Ranking.

            That means EXACTLY that we can USE Document Ranking to SOLVE the Hard Problem. Just as we wanted.

            • moralestapia 3 days ago

              Ar-Curunir: If A reduces to B, it means that an algorithm implementing B can be used (with some pre- and post-processing) to solve A.

              moralestapia: If I transform B to A and I know the solution to A, then I solved B.

              Ar-Curunir: If A reduces to B, it means that B is at least as hard as A.

              moralestapia: [...] B is at least as complex as A

              Can you even read?

          • Ar-Curunir 3 days ago

            No, that is not what you wrote. You wrote “document ranking reduces to these hard problems", which means that document ranking can be solved with an algorithm for one of those hard problems. The article discusses the opposite: those hard problems can be solved by using algorithms for document ranking (which is itself a non-trivial problem)

  • bubblyworld 3 days ago

    Not quite - in complexity theory you say problem A reduces to problem B if an oracle for problem B can be used to solve problem A. So the title of the article is correct, as an oracle for document ranking (LLMs in this case) can be used to solve a list of hard problems (given in the article).

    • moralestapia 3 days ago

      Wrong.

      At least bother to read the discussion in the sibling comments.

      • bubblyworld 3 days ago

        It's standard terminology. I'm not going to waste time arguing about it.

        • moralestapia 3 days ago

          Define what an oracle is to you; that's a concept that's not even needed for this discussion.

          I don't think you understand what is being talked about here.