Ask HN: Best LLM Stack for Q&A over Internal PDFs?
I'm looking to build an LLM-based chatbot that can answer questions using a set of internal PDF documents. Has anyone worked on a similar use case with good success? What approach and LLM stack did you use to solve this - RAG (Retrieval-Augmented Generation), fine-tuning, or embedding-based search?
RAG and embedding-based search are basically the same thing AFAIK - embedding search is the retrieval step of RAG.
My approach is to stuff as many documents as possible directly into the context. The context windows of frontier models are large enough for my use case of ~20-40 documents: 128K tokens for gpt-4o, 200K for o1/o3, and 1M for Gemini.
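A minimal sketch of the stuffing approach, assuming the OpenAI Python SDK and pypdf; the folder name, model, and prompts are placeholders:

    # pip install openai pypdf
    from pathlib import Path
    from openai import OpenAI
    from pypdf import PdfReader

    def pdf_text(path: Path) -> str:
        # Pull the plain text out of every page of one PDF.
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)

    docs = [pdf_text(p) for p in sorted(Path("internal_pdfs").glob("*.pdf"))]
    context = "\n\n---\n\n".join(docs)  # every document in one blob

    question = "Based on these documents, infer the industries where X technique may be useful."
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # any long-context model
        messages=[
            {"role": "system", "content": "Answer using only the provided documents."},
            {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {question}"},
        ],
    )
    print(resp.choices[0].message.content)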
When stuffing them all into one query isn't possible, split the documents across multiple queries and aggregate the answers.
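A crude map-reduce over batches, reusing client/docs/question from the sketch above; the batch size is a guess you'd tune so each batch fits the window:

    def ask(context: str, question: str) -> str:
        # One LLM call over a subset of the documents.
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Answer using only the provided documents."},
                {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return resp.choices[0].message.content

    batch_size = 10  # pick so each batch stays under the context window
    partials = [ask("\n\n---\n\n".join(docs[i:i + batch_size]), question)
                for i in range(0, len(docs), batch_size)]

    # Reduce step: merge the per-batch answers into a single answer.
    print(ask("\n\n---\n\n".join(partials),
              f"These are partial answers to '{question}'. Combine them into one final answer."))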
I've tried RAG, but matching query embeddings to chunk embeddings isn't as straightforward as it sounds: relevant content was missed even with my modest number of documents. Semantic matching on query embeddings is one level above dumb keyword matching but one level below asking the LLM directly.
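For context, the plain embedding-search setup I mean looks roughly like this (OpenAI embeddings plus cosine similarity; the chunk size and embedding model are arbitrary choices, and docs is the list of document strings from the sketch above):

    # pip install numpy
    import numpy as np

    def embed(texts: list[str]) -> np.ndarray:
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    # Naive fixed-size chunking; overlap or sentence-aware splitting usually helps.
    chunks = [doc[i:i + 1500] for doc in docs for i in range(0, len(doc), 1500)]
    chunk_vecs = embed(chunks)

    def top_k(query: str, k: int = 5) -> list[str]:
        # Cosine similarity between the query embedding and every chunk embedding.
        q = embed([query])[0]
        sims = (chunk_vecs @ q) / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
        return [chunks[i] for i in np.argsort(-sims)[:k]]

    relevant_chunks = top_k(question)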
Direct LLM queries seem to perform best, especially when some intermediate reasoning is required (like "Based on these documents, infer the industries where X technique may be useful"). That isn't possible with simple embedding search unless some of the documents specifically use the umbrella word "industry" or a close synonym.
Embedding search can probably be improved - like generating a synthetic answer and matching that answer's embedding to chunk embeddings. But I haven't tried such techniques.
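That idea is roughly what HyDE (Hypothetical Document Embeddings) does. I haven't tested it, but a sketch on top of the top_k helper above would be:

    def hyde_top_k(question: str, k: int = 5) -> list[str]:
        # Step 1: let the LLM draft a hypothetical answer. Hallucination is fine here;
        # the draft is only used as a richer query for embedding search.
        draft = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user",
                       "content": f"Write a short passage that answers: {question}"}],
        ).choices[0].message.content
        # Step 2: retrieve chunks similar to the draft answer rather than the raw question.
        return top_k(draft, k)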
LangChain was the OG for PDF RAG. You don't need fine-tuning or anything; it does embedding-based search right out of the box.
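Something like this. LangChain's API moves fast, so treat the imports as approximate for recent package versions; the file name and question are placeholders:

    # pip install langchain langchain-openai langchain-community langchain-text-splitters faiss-cpu pypdf
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_openai import OpenAIEmbeddings, ChatOpenAI
    from langchain_community.vectorstores import FAISS
    from langchain.chains import RetrievalQA

    # Load one PDF, split it into overlapping chunks, and index the chunks in FAISS.
    pages = PyPDFLoader("internal_doc.pdf").load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(pages)
    store = FAISS.from_documents(chunks, OpenAIEmbeddings())

    # Retrieval-augmented Q&A over the indexed chunks.
    qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(model="gpt-4o-mini"),
                                     retriever=store.as_retriever())
    print(qa.invoke({"query": "What does the policy say about remote work?"})["result"])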
Microsoft Copilot does this out of the box.
Just upload your documents to OneDrive, SharePoint, or a Teams site that you have access to and start asking questions.