Note to @ScottWorld - still looking for that AI category to post these AI topics into.
On the crappy forum, a really interesting question was posed by @omf. I had penned an excellent assessment of the pitfalls of trying to marry survey data with a natural language interface, only to have Khoros blow it away. Apparently, it doesn't save drafts like every other online content-authoring application.
The original question piqued my interest:
I'm trying to use ChatGPT to analyze written data from airtable surveys, but ChatGPT is unable to access the information from a csv file. Is it possible to connect ChatGPT to Airtable in order to give it access to specific tables? The idea would be to be able to ask ChatGPT questions such as "What are the top 3 things guests enjoy about this trip?" based on the survey responses.
As you know, I deeply disdain writing anything in Khoros, so I'll expand on my recent thoughts here. I'll start by recapping points about this response thread.
Me: I would love to see how [exactly] GPT can give you this outcome.
The Author: It's easy to do this by first creating a summary table that can aggregate and consolidate the survey responses. This can be accomplished by utilizing rollup fields, which allow us to group and merge data from multiple records from the survey response table.
At the outset, no one wants to reshape their data to chat with it. Adding data complexity places a tax on AI adoption, and few have the budget for new taxes.
The Author: Once the survey responses have been merged and consolidated in the summary table, a formula field can be added to concatenate a prompt to the consolidated responses. This will enhance the clarity and cohesiveness of the final message that is to be sent to ChatGPT.
Aggregation Is the Right Approach
But it has some gotchas.
The complexity in this approach won't scale. Imagine you have a survey with 50 questions. Even if you could distill this data through a Goldbergian process, the results would be too big to fit into the compact prompt that a few-shot learner like GPT requires.
@ScottWorld has the right idea - aggregations are required before GPT can help us with a natural language interface. However, while pushing the aggregation off-Airtable into Make may work, sending the summary data back will add latency, and in chat interfaces, latency is to be avoided. Plus, you must send ALL the data to Make to create the aggregation. This seems nutty to me.
One might then assume that the glue factory is where the aggregation AND inference should be performed, and now you have business logic moving to the Make environment. This is not ideal, but it will certainly work if the definition of "work" is slow.
AI Interface Principles
In my view, there are four AI interface principles that apply to this userâs requirement.
- Aggregations should be dynamic. They should seamlessly happen regardless of the shape of the data or the number of fields.
- AI interfaces are generally not singular in dimension (i.e., send some data in a prompt and get an answer through a collection API). They require steps including, but not limited to, giving the model time to compute the answer.
- Chat conversations should have at least a short-term memory. If you ask a question, the next question should know about the previous question(s) and utilize the previous answer(s) if relevant. Users will find your UX annoying if they must restate anything.
- Prompts engineered from aggregations must intelligently scale. AI should be used to build the prompt where the variety of classes and/or the number of fields is high.
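Principle #3 can be sketched as a rolling message buffer that ships with every prompt. This is a minimal illustration, not a full implementation; the `MAX_TURNS` cap and the system message text are hypothetical stand-ins for a real token-budget policy.

```javascript
// Minimal sketch of short-term chat memory (principle #3).
// Every turn is appended, and the trimmed history accompanies each
// new question so the model can resolve references to earlier answers.

const MAX_TURNS = 10; // illustrative cap, not an official limit
const history = [];

function remember(role, content) {
  history.push({ role, content });
  // keep only the most recent turns to stay within a token budget
  while (history.length > MAX_TURNS * 2) {
    history.shift();
  }
}

function buildMessages(userQuestion) {
  remember("user", userQuestion);
  return [
    { role: "system", content: "You answer questions about survey aggregates." },
    ...history,
  ];
}
```

With this in place, a follow-up like "And the bottom three?" arrives with the prior question and answer in context, so the user never restates anything.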
Regarding principle #4, imagine a question like this in a sea of database fields, where maintenance is but one of 45 fields:
What are the top three preferences for maintenance?
A generalized aggregation passed into the GPT prompt would include 44 irrelevant fields. Embeddings are how you avoid this. This speaks to principle #2 as well. The interface needs to cut away all the aggregation data that would serve only to confuse the model or overwhelm the token limitations.
The complexity grows when you have two fields related to the same topic. Imagine a survey where we ask:
- Select your preferred maintenance provider for CyberLandr.
- Select the maintenance provider closest to you.
How will the interface know which metric the query is intended for? The only answer is an inference algorithm, which is made possible with embeddings.
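The routing step can be sketched with cosine similarity over embedding vectors. Assume the field descriptions and the user's question have already been embedded (e.g., by an embeddings API); the field names and the toy three-dimensional vectors below are hypothetical, chosen only to make the example self-contained.

```javascript
// Sketch: route a question to the most relevant field via embeddings.
// Real embedding vectors have hundreds of dimensions; these toy vectors
// stand in for precomputed embeddings of each field's description.

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// hypothetical fields with (toy) precomputed description embeddings
const fields = [
  { name: "preferred_maintenance_provider", vec: [0.9, 0.1, 0.0] },
  { name: "closest_maintenance_provider",   vec: [0.2, 0.9, 0.1] },
];

// pick the field whose description is semantically closest to the query
function bestField(queryVec) {
  return fields
    .map((f) => ({ ...f, score: cosine(queryVec, f.vec) }))
    .sort((a, b) => b.score - a.score)[0].name;
}
```

The same ranking trims a 45-field aggregation down to the one or two fields worth sending in the prompt, which is exactly the pruning principle #4 calls for.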
Dealing with "Other" Comments
In surveys, customers often express written sentiments. How will your AI interface deal with these unstructured values? Too deep to cover here, but my architectural approach embraces this challenge. Using NLP, users can glean data insights from complex written comments, even across thousands of responses.
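One minimal first step, under the assumption that the free-text comments have already been pulled from the table: fold them into a single analysis prompt with a size cap. The `charBudget` parameter below is an illustrative stand-in for a real token budget, and the helper name is hypothetical.

```javascript
// Sketch: fold free-text "Other" comments into one analysis prompt,
// truncating when the (illustrative) character budget is exhausted.

function buildCommentPrompt(comments, question, charBudget = 6000) {
  const lines = [];
  let used = 0;
  for (const c of comments) {
    const line = `- ${c.trim()}`;
    if (used + line.length > charBudget) break; // stop before overflowing
    lines.push(line);
    used += line.length;
  }
  return `Survey comments:\n${lines.join("\n")}\n\nQuestion: ${question}`;
}
```

At thousands of responses this naive truncation loses data, which is why the fuller architecture leans on embeddings to select the comments relevant to the question instead of sending everything.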
AI: It's Easy!
I love it when purveyors of AI say, "It's easy! Just use this!" In most cases, it's not. If it's that easy and effortless, we should see a live demonstration, eh?
Dynamic Aggregations
Perhaps you've not seen a dynamic JSON aggregation that is blisteringly fast. Here's one.
//
// dynamic aggregator
//
// phase 1: create the intermediate data frame — one array of values per field
const intermediate = surveyHash.reduce((acc, curr) => {
  Object.entries(curr).forEach(([k, v]) => {
    if (k !== "id") {
      if (!acc[k]) {
        acc[k] = [];
      }
      acc[k].push(v);
    }
  });
  return acc;
}, {});
// phase 2: shape the aggregation — one counts object per field
const final = Object.entries(intermediate).reduce((acc, [k, v]) => {
  const accumulated = v.reduce((inner_acc, inner_val) => {
    if (inner_val === null) {
      inner_val = "empty";
    }
    if (!inner_acc[inner_val]) {
      inner_acc[inner_val] = 0;
    }
    inner_acc[inner_val] += 1;
    return inner_acc;
  }, {});
  acc.push({
    [k]: accumulated,
  });
  return acc;
}, []);
This process first produces a data frame much like a Pandas data frame. The second phase shapes the aggregation, as you might expect. Weigh this against a massive collection of new rollup fields and formulas, or a Make recipe. With just 22 lines of code, Airtable can aggregate a thousand responses in a few seconds; there is no reason to use more complex or higher-latency approaches.
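To see the two-phase reduction in action, here it is wrapped as a function and run on toy survey records. The field names and values are invented for the demonstration; the logic is the same as the aggregator above.

```javascript
// The two-phase reduction, wrapped as a function for a quick demonstration.

function aggregate(surveyHash) {
  // phase 1: intermediate data frame — field name -> array of values
  const intermediate = surveyHash.reduce((acc, curr) => {
    Object.entries(curr).forEach(([k, v]) => {
      if (k !== "id") {
        (acc[k] = acc[k] || []).push(v);
      }
    });
    return acc;
  }, {});
  // phase 2: one counts object per field, nulls bucketed as "empty"
  return Object.entries(intermediate).reduce((acc, [k, v]) => {
    const counts = v.reduce((inner, val) => {
      if (val === null) val = "empty";
      inner[val] = (inner[val] || 0) + 1;
      return inner;
    }, {});
    acc.push({ [k]: counts });
    return acc;
  }, []);
}

// toy survey records (hypothetical field names)
const sample = [
  { id: "rec1", fav_meal: "dinner", rating: 5 },
  { id: "rec2", fav_meal: "dinner", rating: 4 },
  { id: "rec3", fav_meal: "breakfast", rating: null },
];

const agg = aggregate(sample);
// agg → [ { fav_meal: { dinner: 2, breakfast: 1 } },
//         { rating: { "5": 1, "4": 1, empty: 1 } } ]
```

The output is already prompt-ready: each field collapses to a compact histogram, no matter how many fields the survey has or how many responses come in.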
You'll find a little more information and some screenshots here.