Suddenly, Rows Can Now Have Meaning

We all know about ChatGPT. It’s profoundly expanding the possibility of creating some very smart systems. Pervasive and near-free access to LLMs (large language models) inches us closer to AGI (artificial general intelligence), and these models can be applied to apps and data in several ways.

Airtable, of course, is readily able to enjoy the benefits that services such as OpenAI provide to developers. Integrating the power of LLMs for text and code completion is almost trivial. These are magical capabilities, but they aren’t the only capabilities.

Most AI experts and analysts agree - AI will become pervasive in all solutions, and the ones that create the greatest customer value will blend application data, user context, and LLMs to create extremely relevant and powerful outcomes.

Rows That Have Meaning

Airtable search is not a pleasant experience at all. The findability of discrete records in a table is terrible. Locating key data across multiple tables and bases is almost impossible. I have explored this challenge with several clients, and I’m kinda thrilled to say all of that work is now obsolete. This paper needs to be burned.

Imagine if we could quickly capture the meaning of a row.

LLM embeddings make this possible. Embeddings are vectors, a fancy term for long arrays of numbers. It’s possible to get a vector for an Airtable record. The vector is a formidable representation of meaning because the model places your data in the same numeric space it learned from its training text, so records with similar meanings end up with similar vectors.

By building a simple string of key field values in a table row and passing it to an embedding model such as OpenAI’s text-embedding-ada-002, you get a vector that captures the meaning of that row. But to turn this approach into a solution, you need one more piece of machinery: a vector datastore.
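To make the "simple string of key field values" idea concrete, here is a minimal sketch. The field names (`Location`, `Metric`, and so on) and both helper names are hypothetical, and the `"name: value | ..."` layout is just one reasonable choice, not the format used in the solution described here. The OpenAI call assumes the `openai` Python package (v1 or later) and an `OPENAI_API_KEY` in the environment; it is never invoked in the example itself.

```python
from typing import Mapping, Sequence


def row_to_text(fields: Mapping[str, object], key_fields: Sequence[str]) -> str:
    """Join the chosen field values into one string, skipping empty fields."""
    parts = []
    for name in key_fields:
        value = fields.get(name)
        if value is None or value == "":
            continue
        parts.append(f"{name}: {value}")
    return " | ".join(parts)


def embed_with_openai(text: str, model: str = "text-embedding-ada-002") -> list[float]:
    """Request one embedding vector for the text (network call, not run below)."""
    from openai import OpenAI  # imported lazily so row_to_text works without the package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding


# Hypothetical record: the field names are invented for illustration.
record = {
    "Location": "I-15 @ Sahara",
    "Metric": "daily count",
    "Direction": "northbound",
    "Date": "2023-03-13",
    "Notes": "",
}
text = row_to_text(record, ["Location", "Metric", "Direction", "Date", "Notes"])
print(text)
```

Only the fields you list end up in the vector, so choosing them is the real design decision: include what users will ask about and leave out noise.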

Vector databases (like Pinecone and Weaviate) have been around for a while. Still, you’ll soon hear a lot more about them because they are necessary to store the natural language essence of any information.

Opinion: If Airtable were on its game, it would already have a vector data store baked into its architecture, but sadly, I predict it will try to solve the search and findability crisis with a Lucene-like architecture that I said should now be burned.

I’m using Airtable data, vectors, and LLMs like this. It’s a bold and profoundly powerful experience when users can employ natural language to locate their own information. Or to discover related information without describing relationships through linked records.

Is this a solution that you have built and is working right now today? Finding information with natural language across multiple tables? :exploding_head:

The concept of vector databases has also just blown my mind…

Yeah, in production since late 2022.

Yeah, if two records from two different tables have similar vectors, they will score well when you vectorize a common natural language reference and perform a vector query. The vector store doesn’t care that two records are from two different tables or even different tables in different bases. Given a vector query like "show me records about client (x)", it has pointers to client (x) and meta values that identify each source with base, table, and record IDs.

You sound like you’ve already applied this to your bases. Do you have a concrete example where search was used and was more efficient? I’m curious!

Yes. Just three bases so far.

No. Not without breaching NDAs. But, let’s be clear, you don’t have to be good at search to be more efficient than Airtable’s current search feature. Here’s as concrete as I can get.

Imagine a record contains four fields with these exact terms:

  1. I-15 @ Sahara (option list)
  2. daily count (option list)
  3. northbound (option list)
  4. 2023-03-13 (date selector)

And my search query is:

Show me today’s volume for Interstate 15 north.

It will link me to that exact record. How does it do this? Simple - with word vectors. The record itself was automatically vectorized in the vector database when it was last changed. The query was vectorized when it was asked. The query’s vectors were compared with all the vectors in the vector database, and the top scores were presented to me much the way Google search does. The record I sought had the highest score, and the inference was 86% confidence.
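The compare-and-rank step described above can be sketched in a few lines. This is an illustration under loud assumptions, not the production code: the three-number vectors are toy stand-ins for real embeddings (text-embedding-ada-002 returns 1,536 numbers per text), the `Entry` class and `top_matches` function are hypothetical names, and the base, table, and record IDs are invented. A real vector database does the same comparison with indexes so it does not have to scan every entry.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    """One stored vector plus the meta values that point back to its source."""
    vector: list[float]
    base_id: str
    table_id: str
    record_id: str


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vector, entries, k=3):
    """Score every entry against the query and return the k best (score, entry) pairs."""
    scored = [(cosine(query_vector, e.vector), e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors and invented IDs; entries may come from different tables and bases.
store = [
    Entry([0.9, 0.1, 0.0], "appTraffic", "tblCounts", "recA"),
    Entry([0.1, 0.9, 0.0], "appTraffic", "tblCounts", "recB"),
    Entry([0.0, 0.2, 0.9], "appClients", "tblContacts", "recC"),
]
query = [0.8, 0.2, 0.1]  # stands in for the vectorized search question

results = top_matches(query, store, k=2)
for score, entry in results:
    print(f"{score:.2f}", entry.base_id, entry.table_id, entry.record_id)
```

The meta values are what turn a score into a link: the top hit carries everything needed to open the exact record.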

This approach sidesteps full-text and wildcard searches by adding inferred similarities that are impossible with inverted index architectures. This is why Google’s search business is at risk and why embeddings will likely put a big dent in findability across many apps.

Makes me feel like AI will change the web interface too. We don’t need web pages for people to read and search. They have been designed for people to look at them. And we don’t need directories with search results like Google. Alta Vista baby. We just want the one answer (okay, maybe 3 top choices). You are probably right that today’s advertisement-based search business model will be gone, and this is giving Google shivers.

This is correct. We live in a world that started with paper (~5,000BC), and despite many attempts to stop killing trees to make paper, we still consume more paper every year. But we are a digital species now, and the web reflects our apparent affinity for reading words.

Much like the web, roadways have been designed for humans with eyes. We use signage, curbs, and lane indicators to navigate much the way we use words to navigate the web. When cars reach full self-driving capability, will we rip up the roads? Um, no.

When AGI can extract almost everything we need from the web, will we dismantle it?

A few questions from afar, and by that, I mean the mother of another community. :wink:

I wonder if it’ll be possible to create vector representations within Airtable Tables itself using a script.

Absolutely. It’s just an API call to an embedding service using an LLM.

… vector DBs are pretty complex.

But not as complex as you might think. You can actually use a one-line JavaScript function to compare two vectors for similarity. This is what a vector DB does at scale. Read this.
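For a sense of how small that comparison is, here is a sketch in Python rather than JavaScript. It uses cosine similarity, which is a common choice for embeddings; whether the function referred to above uses cosine or a plain dot product is an assumption on my part, and the name `similarity` is made up.

```python
import math


def similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


print(similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(similarity([1.0, 0.0], [0.0, 1.0]))  # at right angles
```

Vectors that point the same way score 1.0 and vectors at right angles score 0.0. The sketch assumes neither vector is all zeros.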

… if a script can create vector representations of data, I wonder if I could set GPT to search more accurately within Airtable.

You can. Should you? That’s another debate.

In this sneak peek at a future post for my paid subscribers, I explore the future of tags as they succumb to the pressure on all vendors to abandon legacy search index architectures and lean into not only semantic search based on embeddings but reasoned search based on field-level attributes.

Yet another interesting comment from afar, this time from afar.

My thoughts (which seem to be in line with the creator’s thoughts):

  1. This is a good learning moment.
  2. It costs about a half-a-buck per search; an embedding architecture would be about 1/60th of a penny.
  3. It is slow; search should respond within the Doherty threshold (about 400 ms) or less.
  4. It replicates a lot of information in adjacent fields.
  5. It’s an impressive array of steps; a Make poster child for complex solutions.
  6. Proof that if it can be done, it should not necessarily be done.
  7. Embeddings; 'nuff said.

I like this concept. In my current tests with Airtable and Qdrant, using Flowise to upsert information, I am running into the question of how to update or replace existing vectors when an Airtable dataset is updated. So far, I can only create new vectors. How do you solve that @bfrench ?

A vector based on data that no longer exists is useless. You must create new vectors for new records and for old records that have changed, whenever the changed fields are among those the vector was built from.

It’s not a matter of solving anything. It’s simply a matter of sustaining accurate vectors in a constantly shifting data set.
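One way to tell whether a changed record actually needs a new vector is to fingerprint only the fields that feed the embedding and compare against the fingerprint saved last time. This is a sketch of that idea, not the method used in the thread; the function names, the field names, and the choice of SHA-256 over a JSON dump are all assumptions.

```python
import hashlib
import json


def fingerprint(fields, vector_fields):
    """Hash only the fields that feed the embedding, in a stable order."""
    relevant = {name: fields.get(name) for name in sorted(vector_fields)}
    payload = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_new_vector(fields, vector_fields, stored_fingerprint):
    """True when a field the vector depends on differs from what was embedded."""
    return fingerprint(fields, vector_fields) != stored_fingerprint


VECTOR_FIELDS = ["Model", "Price"]  # hypothetical: the fields used to build the vector

original = {"Model": "Sedan", "Price": "20k USD", "Internal note": "call dealer"}
saved = fingerprint(original, VECTOR_FIELDS)  # kept alongside the vector

note_changed = {**original, "Internal note": "called"}
price_changed = {**original, "Price": "18k USD"}

print(needs_new_vector(note_changed, VECTOR_FIELDS, saved))   # False
print(needs_new_vector(price_changed, VECTOR_FIELDS, saved))  # True
```

Edits to fields outside the vector cost nothing, and every edit that matters triggers exactly one new embedding.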

So that means you would update/upsert the vector or delete the old one while creating a new one? The question is: How can the system identify the vector, if the vector IDs are not available in the dataset?

Leaving the old vector would lead to wrong information.

You don’t need a vector ID if it is dependent on a database row. You need a new vector stored into the row when the dependent fields change.

If you are not storing the vectors in Airtable, that’s an entirely different approach.

No, I am not storing vectors in Airtable. Airtable is just my source of data, which is then stored in a vector database (I am using qdrant.com).

:point_right:That is exactly what I’m doing. When an airtable row is updated, a new vector is created. However, since the old vector is still in qdrant at this time, there are now 2 versions of the airtable data available. An old one (which is not valid anymore) and a new one (which was just created after the airtable update).

My question still remains: how can I (automatically) “delete” the old vector from my qdrant database when a new one is created?

Example:

Airtable data last week:

  • car price was 20k USD
  • stored in vector 123

Airtable data this week:

  • car price went down to 18k USD
  • stored in a new vector 234

When I now query the car price from the database, the answer is not clear, or is it? Or does a similarity search in the vectors always use the latest information?

Hope that makes it a bit clearer.
Thanks again for helping.

I was unaware that you were storing vectors outside of Airtable, but had I been, I would have said this:

To maintain parity between your data updates and your vector store, you have two options:

  1. Update the vector store with your Airtable record data;
  2. Update your Airtable records with the vector database IDs.

This is the usual and customary software architecture: either way, one side holds a key to the other, so the stale vector can be found and replaced whenever the data changes.
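A variation that avoids storing anything extra in Airtable is to derive the vector ID from the record itself, so writing the same record again overwrites its old vector instead of adding a second one. The sketch below uses a plain dictionary in place of a real vector database. As far as I know, Qdrant accepts UUIDs as point IDs and an upsert with an existing ID replaces that point, but check the Qdrant documentation before relying on it. The namespace string, function name, and IDs are hypothetical.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "airtable-vectors")


def point_id(base_id, table_id, record_id):
    """Same Airtable record in, same vector ID out, every time."""
    return str(uuid.uuid5(NAMESPACE, f"{base_id}/{table_id}/{record_id}"))


class InMemoryStore:
    """Stand-in for a vector database: upsert replaces any point with the same ID."""

    def __init__(self):
        self.points = {}

    def upsert(self, pid, vector, payload):
        self.points[pid] = {"vector": vector, "payload": payload}


store = InMemoryStore()
pid = point_id("appCars", "tblInventory", "rec001")  # invented IDs

store.upsert(pid, [0.1, 0.2], {"price": "20k USD"})  # last week
store.upsert(pid, [0.1, 0.3], {"price": "18k USD"})  # this week, same ID

print(len(store.points), store.points[pid]["payload"]["price"])  # 1 18k USD
```

The other route, option 2 above, is to let the vector database assign the ID and write it back to a field on the Airtable record, then delete or overwrite that ID on the next change.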