xnhbx 2 days ago

> DualPipe was created and developed by Jiashi Li and Chengqi Deng and Wenfeng Liang.

A CEO who codes.

  • anonzzzies 2 days ago

    When my company was still working closely with CN factories a few years ago (before the bans / clients no longer wanting to work with companies working with China, etc.), the CEOs of the factories we worked with had all been electronic engineers at that company or another before; they could all jump in, debug schematics, solder, and write firmware themselves. And they did. These were places with massive campuses and towering buildings full of robots, with a few (relative to the massive space) employees doing maintenance, prototyping, etc.

    • larodi 2 days ago

      It sounds so much more reasonable to have a director who is actually technical, doesn't it? I'm absolutely amazed at how this (in the East) contrasts with the understanding (in the West) that directors need to know finance, strategic planning, and marketing rather than the actual nuances of the work.

      • tway223 2 days ago

        To be blunt, this is exactly what is wrong with the “leadership” mindset in the West: decisions are often made without understanding the “nuances,” yet leaders are confident they will work.

  • tantalor 2 days ago

    "developed" and "codes" have different meanings.

    • ikeashark 2 days ago

      Yes, but in this context they are very close to each other in meaning.

      Besides, Liang does indeed code a significant amount and has contributed to almost all of their published papers.

danielhanchen 2 days ago

I attached all 3 algorithms 1F1B (1 forward 1 backward), ZB1P (zero bubble pipeline parallelism) and DualPipe as a picture here: https://x.com/danielhanchen/status/1894937006352031832 for those interested :)
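For readers who want to see what these schedules look like in code, here is a minimal sketch of the generic 1F1B pattern per pipeline stage. This is an illustration, not code from the DualPipe repo; the stage and micro-batch counts are made up.

```python
# Illustrative sketch of a 1F1B (one-forward-one-backward) pipeline
# schedule. "F3" means "run the forward pass for micro-batch 3".
# Stage/micro-batch counts are made up, not from DualPipe itself.

def one_f_one_b(num_stages: int, num_microbatches: int, rank: int):
    """Return the op sequence for one pipeline stage."""
    # warm-up: earlier stages start more forwards before any backward
    warmup = min(num_stages - 1 - rank, num_microbatches)
    ops = [f"F{i}" for i in range(warmup)]
    fwd, bwd = warmup, 0
    # steady state: alternate one forward with one backward
    while fwd < num_microbatches:
        ops.append(f"F{fwd}"); fwd += 1
        ops.append(f"B{bwd}"); bwd += 1
    # cool-down: drain the remaining backwards
    while bwd < num_microbatches:
        ops.append(f"B{bwd}"); bwd += 1
    return ops

for rank in range(4):
    print(rank, one_f_one_b(4, 8, rank))
```

ZB1P and DualPipe refine this basic pattern further (e.g. by splitting the backward pass and overlapping computation with communication) to shrink the idle "bubble" slots in the picture.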

puppycodes 2 days ago

Sorry, for us utter simpletons, can someone explain what it does?

  • fasterergpes 2 days ago

    It makes it so that having more GPUs actually makes inference run faster. The worst case has been that you can only use their memory and gain no speed at all.
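    As a rough illustration of that worst case: the standard back-of-envelope analysis for a 1F1B pipeline schedule (not specific to DualPipe; the numbers below are illustrative) puts the idle "bubble" fraction at (p - 1) / (m + p - 1) for p pipeline stages and m micro-batches.

```python
# Back-of-envelope pipeline "bubble" (idle) fraction for a 1F1B
# schedule: (p - 1) / (m + p - 1), with p stages and m micro-batches.
# Illustrative numbers, not measurements from DualPipe.

def bubble_fraction(stages: int, microbatches: int) -> float:
    return (stages - 1) / (microbatches + stages - 1)

print(bubble_fraction(8, 8))   # few micro-batches: ~47% of time idle
print(bubble_fraction(8, 64))  # many micro-batches: ~10% of time idle
```

    Schedules like DualPipe exist to push that idle fraction down further than naive pipelining allows.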

  • qrios 2 days ago

    In very simple words: it is one way to reduce the white squares in the picture from @danielhanchen[1].

    In more complex words: imagine a processor that needs 10 clock cycles to process each instruction, but can accept a new input on every clock cycle and push it through a pipeline. You have to wait ten clock cycles for the first result, but if you keep feeding the input line every cycle, you also get an output every cycle from then on.

    In the case of GPUs, it is not just a single pipeline but many running in parallel. Depending on your data and algorithm, it can be thousands.

    [1] https://x.com/danielhanchen/status/1894937006352031832
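    The processor analogy above fits in a few lines of code; a toy model, with the 10-cycle depth taken from the analogy:

```python
# Toy model of the pipelining analogy: each instruction takes DEPTH
# cycles, but a pipelined processor accepts a new input every cycle,
# so after the initial latency it finishes one result per cycle.

DEPTH = 10  # cycles per instruction, as in the analogy above

def total_cycles(n_inputs: int, pipelined: bool) -> int:
    """Cycles to finish n_inputs instructions."""
    if pipelined:
        # first result after DEPTH cycles, then one more per cycle
        return DEPTH + n_inputs - 1
    # unpipelined: wait for each instruction to fully finish
    return DEPTH * n_inputs

print(total_cycles(100, pipelined=False))  # 1000
print(total_cycles(100, pipelined=True))   # 109
```

    The white squares in the schedule picture are the moments when a pipeline stage is stuck in the "waiting for the first result" state instead of streaming work through.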

optimalplusone 2 days ago

I hope all the open sourcing DeepSeek is doing encourages American labs to do more of the same. Surely they'll realize their momentum is more of a moat than their tech at any one point in time.

jpcom 2 days ago

Does this remind anyone else of the Pied Piper compression algorithm?

snake_doc 2 days ago

Hmm, wasn’t there also supposed to be SM re-allocation? It doesn’t look like it was included; I may be misremembering the explanation.