These are potentially confusing terms and concepts that deserve careful articulation. This article grew out of a detailed conversation in the OpenSiders Slack channel, intended to clarify many of the underlying elements of GPT when used in chained processes to introduce information that OpenAI’s LLMs are incapable of knowing.
There is a misconception that LangChain’s purpose is simply to make it easier to chain the processing steps from the GPT query to the final inference output. It can be used that way, of course, but that is not a thorough representation of the bigger picture.
While LangChain can be used to overcome LLM context limitations by extending the ability of conversations to occur over multiple interactions, that was not its core intention. Instead, the core intention was to make it possible to do precisely what GPT Plugins make possible - the ability to introduce real-time information into responses, something LLMs are incapable of doing on their own because of their fixed training termination date. In LangChain, this capability is known as an Agent.
To be very clear (super duper necessary clarification) - I used the term “multiple interactions” in the previous paragraph with great precision. By definition, I mean -
Chain points that augment the inputs with intermediate outputs, producing the additional inputs that may be needed for the remaining steps in the inferencing process.
Hold that thought.
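To make that definition concrete, here is a minimal sketch of a single chain point. Everything in it is hypothetical - the function names, the order-status lookup, and the prompt format are illustrative stand-ins, not any real API:

```python
# Hypothetical sketch of a "chain point": one step's output augments
# the inputs available to the next step. All names are illustrative.

def lookup_order_status(order_id):
    # Stand-in for a real-time data source the LLM cannot know about.
    return {"order_id": order_id, "status": "shipped"}

def build_augmented_prompt(user_query, context):
    # The retrieved facts are folded back in as new input for the model.
    facts = ", ".join(f"{k}={v}" for k, v in context.items())
    return f"Context: {facts}\nQuestion: {user_query}"

# Step 1: the user's query alone is not answerable by the model.
query = "Where is my order #1234?"
# Step 2: a chain point fetches live data and augments the input.
context = lookup_order_status("1234")
# Step 3: the augmented prompt is what finally reaches the model.
prompt = build_augmented_prompt(query, context)
```

The point is the shape of the flow, not the details: an intermediate output (the order status) becomes an input to the remaining steps.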
The creators of LangChain make it clear that “retrieval” is the core objective because …
… [there is] a fundamental limitation of ChatGPT - it doesn’t know about YOUR data, and most people would find it more useful if it did.
They go on to describe the core elements of retrieval augmentation that can only occur in a chaining of steps. Retrieval augmentation is at the core of what they do well.
The main way of doing this is through a process commonly referred to as “Retrieval Augmented Generation”. In this process, rather than just passing a user question directly to a language model, the system “retrieves” any documents that could be relevant in answering the question, and then passes those documents (along with the original question) to the language model for a “generation” step.
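That two-step shape - retrieve, then generate - can be sketched in a few lines. The retriever below is deliberately naive (keyword overlap rather than semantic search), and the documents, function names, and prompt format are all hypothetical:

```python
# A minimal Retrieval Augmented Generation sketch. The "retrieval" here
# is naive keyword overlap; real systems use semantic search. All names
# and documents are hypothetical, not LangChain's API.

DOCUMENTS = [
    "Acme's refund policy allows returns within 30 days.",
    "Acme ships internationally to 42 countries.",
]

def retrieve(question, docs, k=1):
    # "Retrieval": rank documents by words shared with the question.
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_generation_prompt(question, docs):
    # "Generation": the documents AND the original question go to the
    # language model together.
    joined = "\n".join(docs)
    return f"Answer using these documents:\n{joined}\n\nQuestion: {question}"

question = "What is the refund policy?"
prompt = build_generation_prompt(question, retrieve(question, DOCUMENTS))
```

Swap the keyword scorer for an embedding comparison and you have the semantic-search variant described next.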
It’s probably wise to use these terms and concepts with discrete articulation because they are new ideas, not yet fully understood by the general development community. Bear with me on this journey as we allow the creators of LangChain to set the table concerning vectors and the retrieval objective:
The main way most people - including us at LangChain - have been doing retrieval is by using semantic search. In this process, a numerical vector (an embedding) is calculated for all documents, and those vectors are then stored in a vector database (a database optimized for storing and querying vectors). Incoming queries are then vectorized as well, and the documents retrieved are those that are closest to the query in embedding space.
This helps us all understand vectors, which are not so easily managed in something like Airtable; you need a database that knows how to quickly perform a semantic search across a vector space of tuples, each a 1536-element array if you use the Ada model. At scale, databases explicitly designed for vector search (like Pinecone and Weaviate) are recommended, although in this post I make it clear that there isn’t that much magic involved. You are free to plow these vectors into a table; searching them quickly is another can of worms.
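To show there really isn’t much magic, here is “closest in embedding space” in plain Python. The vectors are made-up three-element toys standing in for the 1536-dimensional Ada embeddings, and the document names are hypothetical:

```python
import math

# Toy nearest-neighbour search in embedding space. Real Ada embeddings
# are 1536-dimensional; three dimensions here keep the idea readable.

def cosine_similarity(a, b):
    # Standard cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend these vectors came back from an embedding model.
doc_vectors = {
    "doc_about_cats": [0.9, 0.1, 0.0],
    "doc_about_finance": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # the vectorized incoming query

# Retrieve the document whose embedding is closest to the query.
best = max(
    doc_vectors,
    key=lambda name: cosine_similarity(query_vector, doc_vectors[name]),
)
```

A brute-force loop like this is fine for a few thousand rows; the dedicated vector databases earn their keep when you need approximate nearest-neighbour indexes over millions of vectors.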
In mid-March 2023 (which roughly coincides with the announcement of GPT Plugins), the creators of LangChain realized they had misled us:
We realized we made a mistake - by making our abstractions centered around VectorDBQA we were limiting the use of our chains, making them hard to use (1) for users who wanted to experiment with other retrieval methods, (2) for users who created a retriever outside the LangChain ecosystem.
Two specific problems best summarize this:
- There are a lot of different variations in how you do this retrieval step. People want to do things beyond semantic search. Ergo, vectors and Pinecone may be irrelevant in many cases.
- Developers may want to construct retrievers outside of LangChain; the ChatGPT Retrieval Plugin is one example.
So yeah, while LangChain was seen as this darling tech that seamlessly integrated vectors for semantic search, that was only the tip of the iceberg in the grand scheme of business requirements, a fact that wasn’t clear to most developers until OpenAI pushed the GPT Plugin baby out of the nest.
This concept was clear to me a year ago when I created my first fully chained GPT inference that also made calls to corporate databases on the journey from query to inference. My client has been enjoying, for 12 months now, the benefits of what has since been ostensibly bundled and marketed as a GPT Plugin. This example also underscores the likelihood that GPT Plugins may not be as advantageous as everyone thinks. Certainly, they will be used a bunch, but for what purposes [exactly] is explored in this hastily banged out electronic missive (Cannoli or Cake?).