Suddenly, Rows Can Now Have Meaning

We all know about ChatGPT. It’s profoundly expanding the possibility of creating some very smart systems. Pervasive and near-free access to LLMs (large language models) inches us closer to AGI (artificial general intelligence), and LLMs can be applied to apps and data in several ways.

Airtable, of course, is readily able to enjoy the benefits that services such as OpenAI provide to developers. Integrating the power of LLMs for text and code completion is almost trivial. These are magical capabilities, but they aren’t the only capabilities.

Most AI experts and analysts agree - AI will become pervasive in all solutions, and the ones that create the greatest customer value will blend application data, user context, and LLMs to create extremely relevant and powerful outcomes.

Rows That Have Meaning

Airtable search is not a pleasant experience at all. The findability of discrete records in a table is terrible. Locating key data across multiple tables and bases is almost impossible. I have explored this challenge with several clients, and I’m kinda thrilled to say all of that work is now obsolete. This paper needs to be burned.

Imagine if we could quickly capture the meaning of a row.

LLM embeddings make this possible. Embeddings are vectors, a fancy term for complex numeric tuples or arrays. It’s possible to get a vector for an Airtable record. The vector is a formidable representation of meaning because it is derived by associating your data with all the vectors in an LLM.

By building a simple string of key field values from a table row and passing it to an embedding model such as OpenAI’s text-embedding-ada-002, you get a vector that captures the meaning of that row. But to turn this approach into a solution, you need one more piece of machinery: a vector datastore.
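Here is a minimal sketch of the "simple string of key field values" step. The field names (Location, Metric, Direction, Date, Notes) and the helper `row_to_text` are hypothetical, made up for illustration; the resulting string is what you would hand to whichever embedding model you use.

```python
def row_to_text(fields: dict, key_fields: list[str]) -> str:
    """Join the chosen field values into one string suitable for embedding."""
    parts = []
    for name in key_fields:
        value = fields.get(name)
        if value is None or value == "":
            continue  # skip empty cells so they add no noise to the text
        if isinstance(value, (list, tuple)):
            # multi-select style values become a comma-separated phrase
            value = ", ".join(str(v) for v in value)
        parts.append(f"{name}: {value}")
    return "; ".join(parts)


# Hypothetical record; the field names are assumptions, not a real schema.
record = {
    "Location": "I-15 @ Sahara",
    "Metric": "daily count",
    "Direction": "northbound",
    "Date": "2023-03-13",
    "Notes": "",
}
text = row_to_text(record, ["Location", "Metric", "Direction", "Date", "Notes"])
print(text)
```

Including the field name next to each value is a design choice, not a requirement. It gives the model a little more context than bare values would, at the cost of a slightly longer string.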

Vector databases (like Pinecone and Weaviate) have been around for a while. Still, you’ll soon hear a lot more about them because they are necessary to store the natural language essence of any information.

Opinion: If Airtable were on its game, it would already have a vector data store baked into its architecture, but sadly, I predict it will try to solve the search and findability crisis with a Lucene-like architecture that I said should now be burned.

I’m using Airtable data, vectors, and LLMs like this. It’s a bold and profoundly powerful experience when users can employ natural language to locate their own information. Or to discover related information without describing relationships through linked records.

Is this a solution that you have built and is working right now today? Finding information with natural language across multiple tables? :exploding_head:

The concept of vector databases has also just blown my mind…

Yeah, in production since late 2022.

Yeah, if two records from two different tables have similar vectors, they will score well when you vectorize a common natural language reference and perform a vector query. The vector store doesn’t care that two records are from two different tables or even different tables in different bases. Given a vector query like "show me records about client (x)", it has pointers to client (x) and meta values that identify each source with base, table, and record IDs.
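To make the cross-table point concrete, here is a rough sketch of a tiny in-memory stand-in for a vector store. Everything in it is an assumption for illustration: the `Entry` shape, the made-up IDs, and the hand-written three-number vectors (real embeddings have far more dimensions and come from the model). A real vector database would do the same job at scale.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    vector: list[float]  # embedding of the row text
    base_id: str         # metadata pointing back to the source record
    table_id: str
    record_id: str


def dot(a, b):
    """Plain dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


def query(store, query_vector, top_k=2):
    """Score every entry against the query and return the best matches."""
    scored = [(dot(e.vector, query_vector), e) for e in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors and hypothetical IDs; two bases and three tables share one store.
store = [
    Entry([0.9, 0.1, 0.0], "appSales", "tblClients", "rec001"),
    Entry([0.8, 0.2, 0.1], "appSupport", "tblTickets", "rec042"),
    Entry([0.0, 0.1, 0.9], "appSales", "tblProducts", "rec007"),
]

hits = query(store, [1.0, 0.0, 0.0])
for score, entry in hits:
    print(round(score, 2), entry.base_id, entry.table_id, entry.record_id)
```

The two best hits come from different tables in different bases, and the search itself never looks at that; the metadata is only used afterwards to link back to the source record.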

You sound like you’ve already applied this to your bases. Do you have a concrete example where search was used and was more efficient? I’m curious!

Yes. Just three bases so far.

No. Not without breaching NDAs. But, let’s be clear, you don’t have to be good at search to be more efficient than Airtable’s current search feature. Here’s as concrete as I can get.

Imagine a record contains four fields with these exact terms:

  1. I-15 @ Sahara (option list)
  2. daily count (option list)
  3. northbound (option list)
  4. 2023-03-13 (date selector)

And my search query is:

Show me today’s volume for Interstate 15 north.

It will link me to that exact record. How does it do this? Simple - with word vectors. The record itself was automatically vectorized in the vector database when it was last changed. The query was vectorized when it was asked. The query’s vector was compared with all the vectors in the vector database, and the top scores were presented to me much the way Google search presents results. The record I sought had the highest score, and the match was inferred at 86% confidence.
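The comparison step can be sketched with cosine similarity, a common way to score how closely two embedding vectors point in the same direction. This is an assumption on my part; the post does not say which metric the vector database uses. The two-number vectors and record IDs below are toy values, not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank(query_vec, record_vecs, top_k=3):
    """Return (record_id, score) pairs, highest score first."""
    scores = [
        (record_id, cosine_similarity(query_vec, vec))
        for record_id, vec in record_vecs.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy stand-ins for stored record vectors (hypothetical IDs).
record_vecs = {
    "recA": [1.0, 0.0],
    "recB": [1.0, 1.0],
    "recC": [0.0, 1.0],
}

ranked = rank([2.0, 0.0], record_vecs)
for record_id, score in ranked:
    print(record_id, round(score, 3))
```

Note that the query vector is twice as long as recA's vector and still scores a perfect match. Cosine similarity ignores length and compares direction only, which is why it is a popular choice for comparing embeddings.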

This approach sidesteps full-text and wildcard searches by adding inferred similarities that are impossible with inverted index architectures. This is why Google’s search business is at risk and why embeddings will likely put a big dent in findability across many apps.

Makes me feel like AI will change the web interface too. We don’t need web pages for people to read and search. They have been designed for people to look at. And we don’t need directories with search results like Google. Alta Vista baby. We just want the one answer (okay, maybe 3 top choices). You are probably very much right that today’s advertisement-based search business model will be gone, and this is giving Google shivers.

This is correct. We live in a world that started with paper (~5,000BC), and despite many attempts to stop killing trees to make paper, we still consume more paper every year. But we are a digital species now, and the web reflects our apparent affinity for reading words.

Much like the web, roadways have been designed for humans with eyes. We use signage, curbs, and lane indicators to navigate much the way we use words to navigate the web. When cars reach full self-driving capability, will we rip up the roads? Um, no.

When AGI can extract almost everything we need from the web, will we dismantle it?