etiam 12 hours ago

First presented in 2014/2015, to the best of my knowledge.

Guess that means it's been at least a decade now, so it's strange that there hasn't been much public discussion of this sort of use.

The article's snippet from Yann LeCun is somewhat perplexing to me. There's no way he hasn't been aware of the possibility, and I have a hard time believing it's been beyond the means of Facebook's R&D division to implement it.

  • fspeech 9 hours ago

    The drive so far has been to scale up to push performance, not to reduce cost. Although everyone offers cheaper versions of their models, they don't base their marketing on these. Now that the momentum of scaling up training is slowing down, they are pushing inference-time compute, and efficiency is coming to the forefront.