Data Interpretation with AI

Note to @ScottWorld - still looking for that AI category to post these AI topics into. :wink:

On the crappy forum, a really interesting question was posed by @omf. I had penned an excellent assessment of the pitfalls of trying to marry survey data with a natural language interface only to have Khoros blow it away. Apparently, it doesn’t save drafts like every other online content authoring application.

The original question piqued my interest:

I’m trying to use ChatGPT to analyze written data from airtable surveys, but ChatGPT is unable to access the information from a csv file. Is it possible to connect ChatGPT to Airtable in order to give it access to specific tables? The idea would be to be able to ask ChatGPT questions such as “What are the top 3 things guests enjoy about this trip?” based on the survey responses.

As you know, I deeply disdain writing anything in Khoros, so I'll expand on my recent thoughts here. I'll start by recapping a few points from that response thread.

Me: I would love to see how [exactly] GPT can give you this outcome.

The Author: It's easy to do this by first creating a summary table that can aggregate and consolidate the survey responses. This can be accomplished by utilizing rollup fields, which allow us to group and merge data from multiple records in the survey response table.

At the outset, no one wants to reshape their data to chat with it. Adding data complexity places a tax on AI adoption, and few have the budget for new taxes.

The Author: Once the survey responses have been merged and consolidated in the summary table, a formula field can be added to concatenate a prompt to the consolidated responses. This will enhance the clarity and cohesiveness of the final message that is to be sent to ChatGPT.

Aggregation Is the Right Approach

But it has some gotchas.

The complexity in this approach won't scale. Imagine you have a survey with 50 questions. Even if you could distill this data through a Goldbergian process, the results would be too large to fit into the prompt of a few-shot learner like GPT.

@ScottWorld has the right idea - aggregations are required before GPT can help us with a natural language interface. However, while pushing the aggregation off-Airtable into Make may work, sending the summary data back will add latency, and in chat interfaces, latency is to be avoided. Plus, you must send ALL the data to Make to create the aggregation. This seems nutty to me.

One might then assume that the glue factory is where the aggregation AND inference should be performed, and now you have business logic moving to the Make environment. This is not ideal, but it will certainly work if the definition of “work” is slow.

AI Interface Principles

In my view, there are four AI interface principles that apply to this user’s requirement.

  1. Aggregations should be dynamic. They should seamlessly happen regardless of the shape of the data or the number of fields.
  2. AI interfaces are generally not singular in dimension (i.e., send some data in a prompt and get an answer through a completion API). They require steps including, but not limited to, giving the model time to compute the answer.
  3. Chat conversations should have at least a short-term memory. If you ask a question, the next question should know about the previous question(s) and utilize the previous answer(s) if relevant. Users will find your UX annoying if they must restate anything. (See the sketch just after this list.)
  4. Prompts engineered from aggregations must intelligently scale. AI should be used to build the prompt where the variety of classes and/or the number of fields is high.
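
Principle #3 in a minimal sketch. The askWithMemory() and sendToModel() functions below are hypothetical stand-ins for whatever wrapper and completion endpoint you use; the only point is that each turn is appended to a running transcript that rides along with the next question.

//
// short-term chat memory (sketch)
//

// running conversation history
const transcript = [];

// sendToModel() is a hypothetical wrapper around your completion endpoint
const askWithMemory = async (question) => {
  transcript.push({ role: "user", content: question });
  const answer = await sendToModel(transcript); // model sees the prior turns
  transcript.push({ role: "assistant", content: answer });
  return answer;
};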

To principle #4, imagine a question like this in a sea of database fields where maintenance is but one of 45 fields.

What are the top three preferences for maintenance?

A generalized aggregation passed into the GPT prompt would include 44 irrelevant fields. AI [embeddings] is how you avoid this. This speaks to principle #2 as well. The interface needs to cut away aggregation data that would only confuse the model or overwhelm the token limit.

The complexity grows when you have two fields related to the same topic. Imagine a survey where we ask:

  • Select your preferred maintenance provider for CyberLandr.
  • Select the maintenance provider closest to you.

How will the interface know which metric the query is intended for? The only answer is an inference algorithm, which is made possible with embeddings.
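
Here's a rough sketch of that inference step. It assumes a hypothetical embed() helper that returns a vector for any string (any embeddings endpoint will do); field names and descriptions are embedded, the question is embedded, and cosine similarity decides which aggregations earn a spot in the prompt.

//
// embedding-based field selection (sketch)
//

// cosine similarity between two equal-length vectors
const cosine = (a, b) => {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const mag = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (mag(a) * mag(b));
};

// rank fields against the question; embed() is hypothetical
const rankFields = async (question, fields) => {
  const qVec = await embed(question);
  const scored = await Promise.all(
    fields.map(async (f) => ({
      field: f.name,
      score: cosine(qVec, await embed(`${f.name}: ${f.description}`)),
    }))
  );
  return scored.sort((a, b) => b.score - a.score);
};

A question that mentions "preferences" should then rank the preferred-provider field above the closest-provider field, provided the field descriptions carry that distinction, and anything below a similarity threshold simply never reaches the prompt.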

Dealing with “Other” Comments

In surveys, customers often express written sentiments. How will your AI interface deal with these unstructured values? Too deep to cover here, but my architectural approach embraces this challenge. Using NLP, users can glean data insights from complex written comments, even across thousands of responses.
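
One hedged way to fold those free-text values into the same pipeline: batch the written responses, wrap them in an instruction, and let the model surface the themes. Again, callModel() is a hypothetical stand-in, not a real API.

//
// summarize unstructured "Other" comments (sketch)
//

// otherComments is an array of free-text survey responses
const summarizeComments = async (otherComments, question) => {
  const prompt = [
    `Survey question: ${question}`,
    "Written responses:",
    ...otherComments.map((c, i) => `${i + 1}. ${c}`),
    "Summarize the top three themes in these responses.",
  ].join("\n");
  return await callModel(prompt); // hypothetical completion wrapper
};

At thousands of responses you would summarize in batches and then summarize the summaries; embeddings earn their keep again by clustering similar comments before the model ever sees them.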

AI: It’s Easy!

I love it when purveyors of AI say, “It’s easy! Just use this!” In most cases, it’s not. If it’s that easy and effortless, we should see a live demonstration, eh? :wink:

Dynamic Aggregations

Perhaps you've not seen a dynamic JSON aggregation that is blisteringly fast. Here's one.

//
// dynamic aggregator
//

// create the intermediate data frame
const intermediate = surveyHash.reduce((acc, curr) => {
  Object.entries(curr).forEach(([k,v]) => {
    if (k !== "id") {
      if (!acc[k]) {
        acc[k] = [];
      }
      acc[k].push(v);
    }
  });
  return acc;
}, {});

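// shape the final aggregation: count how many times each distinct value appears per field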
const final = Object.entries(intermediate).reduce((acc, [k, v]) => {
  const accumulated = v.reduce((inner_acc, inner_val) => {
    if (inner_val === null) {
      inner_val = "empty";
    }
    if (!inner_acc[inner_val]) {
      inner_acc[inner_val] = 0;
    }
    inner_acc[inner_val] += 1;
    return inner_acc;
  }, {});    
  acc.push({
    [k]: accumulated,
  });
  return acc;
}, []);

This process first produces a data frame much like a Pandas data frame. The second phase shapes the aggregation, as you might expect. Weigh this against a massive collection of new rollup fields and formulas or a Make recipe. At just 22 lines of code, able to aggregate a thousand responses in a few seconds, there is no reason to use more complex or higher-latency approaches.
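
For context, here is what the aggregator produces from a hypothetical surveyHash of three responses:

//
// example input and output (hypothetical data)
//

const surveyHash = [
  { id: "rec1", provider: "Acme",   rating: 5 },
  { id: "rec2", provider: "Acme",   rating: 4 },
  { id: "rec3", provider: "Zenith", rating: null },
];

// final becomes:
// [
//   { provider: { Acme: 2, Zenith: 1 } },
//   { rating: { "4": 1, "5": 1, empty: 1 } }
// ]

Each field becomes a single object of value counts - exactly the kind of compact summary that fits comfortably in a prompt.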

You'll find a little more information and some screenshots here.


@bfrench Just amazed at how you put your puzzle together. My guess is you are going to use GPT as a customization tool for buyers to customize their CyberLandr.

Possibly, but CyberLandr is intentionally designed to have few options.

We have big plans for AGI in the total customer experience, though. If you need to know how long your water will last using just your voice, AGI brings this to reality.


Makes sense. Makes me think that at some point soon somebody will give Airtable a run for its money. If the AI models are capable enough, why would I need to know how to manually sort things or create columns of data with formulas and link it all through automations or Make/Zapier? It would feel like the 1990s. I don't sort incoming email into folders or add tags anymore - haven't done that for ages - why would I, unless I have some kind of sorting obsession?

Well, you're right - this is the knee of the curve: the inflection point where the very idea of managing data becomes something that could, over time, be replaced.

Hmmm… perhaps! Could be a good idea!