eigenvalue 4 hours ago
  • mchiang 2 hours ago

    I’m one of the maintainers of Ollama.

    It’s amazing to see others build on top of open-source projects. Forks like RamaLama are exactly what open source is all about. Developers with different design philosophies can still collaborate in the open for everyone’s benefit.

    Some folks on the Ollama team have contributed directly to the OCI spec, so naturally we started with tools we know best. But we made a conscious decision to deviate because AI models are massive in size - on the order of gigabytes - and we needed performance optimizations that the existing approaches didn’t offer.

    We have not forked llama.cpp. We are a project written in Go, so naturally we built our own server-side serving in server.go. Now we are beginning to hit performance, reliability, and model-support problems. This is why we have begun the transition to Ollama’s new engine, which will utilize multiple engine designs, with Ollama responsible for portability between the different engines.

    I did see the complaint about Ollama not using Jinja templates. Ollama is written in Go. I’m listening, but it seems to me that it makes perfect sense to support Go templates.
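
    For anyone wondering what that looks like in practice, here is a minimal sketch of a chat prompt rendered with Go's standard text/template package (the tag names and template text are purely illustrative, not Ollama's actual built-in templates):

        package main

        import (
            "os"
            "text/template"
        )

        func main() {
            // Parse a toy chat template; Execute fills in the
            // {{ .System }} and {{ .Prompt }} placeholders.
            tmpl := template.Must(template.New("chat").Parse(
                "{{ if .System }}<|system|>\n{{ .System }}<|end|>\n{{ end }}" +
                    "<|user|>\n{{ .Prompt }}<|end|>\n<|assistant|>\n"))
            _ = tmpl.Execute(os.Stdout, map[string]string{
                "System": "You are a helpful assistant.",
                "Prompt": "Why is the sky blue?",
            })
        }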

    We are only a couple of people, building in the open. If this sounds like vendor lock-in, I'm not sure what vendor lock-in is.

    You can check the source code: https://github.com/ollama/ollama

    • zozbot234 an hour ago

      These comments seem reasonable to me. Could you clarify the Ollama maintainers' POV wrt. the recent discussion of Ollama Vulkan support at https://news.ycombinator.com/item?id=42886680 ? Many people seem to be upset that the PR has gotten zero acknowledgment from the Ollama folks, even with so many users being quite interested in it for obvious reasons. (To be clear, I'm not sure that the PR is in a mergeable state as-is, so I would disagree with many of those comments. But this is just my personal POV - and with no statement on the matter from the Ollama maintainers, users will be confused.)

      EDIT: I'm seeing a newly added comment in the Vulkan PR GitHub thread, at https://github.com/ollama/ollama/pull/5059#issuecomment-2628... . Quite overdue, but welcome nonetheless!

  • mohsen1 3 hours ago

    I see! Now I understand why I need to create those useless `Modelfile` files...

    I'm glad there is a more open source alternative to Ollama now.

    • zozbot234 an hour ago

      I don't get it. The 'Modelfile' files are used to save and restore chat history, set custom system prompts, and do lots of other stuff that would require custom coding with most other local AI frameworks. Llama.cpp certainly doesn't offer anything like that out of the box. Those sorts of complaints seem pointless to me.
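
      For anyone who hasn't looked at one: a Modelfile is just a small declarative file. From memory (check the Ollama docs for the exact syntax), something roughly like:

          FROM llama3.2
          SYSTEM """You are a terse assistant that answers in one sentence."""
          PARAMETER temperature 0.3
          MESSAGE user Please use metric units.
          MESSAGE assistant Understood, I will use metric units.

      The MESSAGE lines are how prior conversation turns get baked in, which is what makes saving and restoring a session possible without any custom code.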

  • ericyd 3 hours ago

    I wish this were on the readme. Or if it already is, I wish it were significantly higher up.

  • buyucu 4 hours ago

    Thanks for this context, I will give RamaLama a try!

mckirk 4 hours ago

This looks great!

While we're at it, is there already some kind of standardized local storage location/scheme for LLM models? If not, this project could be a great place to set an example that others can follow if they want. I've been playing with different runtimes (Ollama, vLLM) over the last few days, and I really would have appreciated better interoperability in terms of shared model storage, instead of everybody defaulting to downloading everything all over again.

  • svilen_dobrev 10 minutes ago

    I just started to play with ollama and ramalama on Linux. The models are quite a few gigabytes each, so it's not pretty to keep N copies.

    ollama stores things under ~/.ollama/models/blobs/ named sha256-whatevershaisit

    ramalama stores things under ~/.local/share/ramalama/repos/ollama/blobs/ named sha256:whatevershaisit

    Note the ":" in the ramalama names instead of the "-"; that may not fly under Windows.

    If one cross-links the ramalama blobs over to ollama with that slight rename, ollama will remove them, since they were not pulled via ollama itself and have no metadata attached.

    I guess vLLM and everybody else has yet another scheme and/or metadata format.

    BTW, Arch-Linux-wise there is currently llm-manager (pointing to https://github.com/xyproto/llm-manager ), but it's made dependent on some of the ollama packages and can't be installed just by itself (without forcing it).

  • ggerganov 3 hours ago

    The llama.cpp tools and examples download models by default to an OS-specific cache folder [0]. We try to follow the HF standard (as discussed in the linked thread), though the layout of the llama.cpp cache is not the same atm. Not sure about the plans for RamaLama, but it might be something worth considering.

    [0] https://github.com/ggerganov/llama.cpp/issues/7252
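
    For reference, the Hugging Face hub cache layout looks roughly like this (from memory; <org>, <repo> and <commit> are placeholders):

        ~/.cache/huggingface/hub/
          models--<org>--<repo>/
            blobs/               <- content-addressed files (sha256)
            refs/main            <- text file holding a commit hash
            snapshots/<commit>/  <- human-readable filenames, symlinked into blobs/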

    • sitkack 2 hours ago

      I think it would be the most important thing to consider, because the biggest thing RamaLama's predecessor provided was a way to download a model (and run it).

      If there was a contract about how models were laid out on disk, then downloading, managing and tracking model weights could be handled by a different tool or subsystem.

jerrygenser an hour ago

122 points in 2 hours, yet this is currently at #38 and not on the front page.

Strange. At the same time I see numerous items on the front page posted 2 hours ago or earlier with fewer points.

I'm willing to take a reputation hit on this meta post. I wonder why this got demoted from the front page so quickly despite people clearly voting on it. I wonder if it has anything to do with being backed by YC.

I sincerely hope it's just my misunderstanding of the HN algorithm, though.

  • mchiang an hour ago

    Can confirm it doesn't. Many Ollama posts get pushed off the front page too, despite having hundreds of points. Over time I came to understand. If they did this for YC companies, it would ruin the trust in HN, in YC, and, probably most important to YC companies, the reputation of the startup itself.

    • zozbot234 20 minutes ago

      I assume this is what happens when many HN users just flag every AI- and LLM-related post out of sheer frustration with the reality distortion field around this particular topic.

2mlWQbCK 3 hours ago

What benefit does Ollama (or RamaLama) offer over just plain llama.cpp or llamafile? The only thing I understand is that there is automatic downloading of models behind the scenes, but a big reason for me to want to use local models at all is that I want to know exactly what files I use and keep them sorted and backed up properly, so a tool that automatically downloads models and dumps them in some cache directory just sounds annoying.

  • rahimnathwani 2 hours ago

    IIRC it makes things a little easier, e.g. you don't need to specify a CLI flag to set how many layers to offload to the GPU, and it provides an API that other programs on your system can use (e.g. openwebui).
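
    For example, another local program can hit that API with a plain HTTP request. A minimal sketch in Go, assuming Ollama's default port 11434 and a model you've already pulled (llama3.2 here is just an example name):

        package main

        import (
            "bytes"
            "fmt"
            "io"
            "net/http"
        )

        func main() {
            // /api/generate takes a model name and a prompt;
            // "stream": false asks for one JSON object instead of streamed chunks.
            body := []byte(`{"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": false}`)
            resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
            if err != nil {
                panic(err)
            }
            defer resp.Body.Close()
            out, _ := io.ReadAll(resp.Body)
            fmt.Println(string(out))
        }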

    It's been a while since I used llama.cpp directly, and I don't know whether I'm correct about its current scope.

pzo 4 hours ago

To really make AI boring, all these projects need to be more approachable to non-tech-savvy people, e.g. with some minimal GUI for searching, listing, deleting, and installing AI models. I wish, e.g., this or ollama could work more as an invisible AI model dependency manager. Right now every app that wants STT like Whisper will bundle such a model inside, so users waste storage and have to wait for big model downloads. We had similar problems with static libraries and then moved to dynamically linked libraries.

I wish apps could declare a model as a dependency and, on install, download it only if that model is not already available locally. They could also check whether ollama is installed and only bootstrap it if it doesn't already exist on the drive. Maybe with some nice interface for the user to confirm the download, and nice onboarding.

wsintra2022 2 hours ago

I’m using openwebui, can this replace ollama in my setup?

Y_Y 3 hours ago

So it's a replacement for Ollama?

The killer features of Ollama for me right now are the nice library of quantized models and the ability to automatically start and stop serving models in response to incoming requests and timeouts. The first seems to be solved by reusing the Ollama models, but from my cursory look I can't see whether the on-demand serving is possible.

  • maxamillion 3 hours ago

    ramalama can just pull (almost) any arbitrary model off huggingface and run it ... you're not limited to just what ollama has repackaged into their non-standard format

baron-bourbon 3 hours ago

Does this provide an Ollama-compatible API endpoint? I've got at least one other project running that only supports Ollama's API or OpenAI's hosted solution (i.e. the API endpoint isn't configurable to use llama.cpp and friends).

glitchc 3 hours ago

Great, finally an alternative to ollama's convenience.

  • jniles 3 hours ago

    It sounds like this project isn't addressing the user convenience aspect of ollama, but rather the developer convenience.

    Hopefully both will be easy for users to play around with, but RamaLama should make it easier to get your PR merged as a developer and to swap out different registries. Vendor lock-in is rarely a good thing in the world of open source.

guerrilla 4 hours ago

> Running in containers eliminates the need for users to configure the host system for AI.

When is that a problem?

Based on the linked issue in eigenvalue's comment[1], this seems like a very good thing. It sounds like ollama is up to no good and this is a good drop-in replacement. What is the deeper problem being solved here though, about configuring the host? I've not run into any such issue.

1. https://news.ycombinator.com/item?id=42888129

  • sitkack 2 hours ago

    So you have never hit the issue so no one else has?

    • guerrilla an hour ago

      ... orrrrr I have never hit the issue, so that's why I'm asking.

      Calm down. It's Friday, time to relax, my friend. ;)

esafak 4 hours ago

Is this useful? Can someone help me see the value add here?

  • BubbleRings 4 hours ago

    Well, if you aren’t that great with Docker but you want to try out a variety of LLMs under Docker, how much would this help you? How much trouble is it to enable an LLM to reach outside of a container to make use of your GPU? How much does this tool help with that?

  • maxamillion 3 hours ago

    ramalama can just pull (almost) any arbitrary model off huggingface and run it ... you're not limited to just what ollama has repackaged into their non-standard format