We’re probably weeks away from some clever GPT-4 approach to ending the deduping madness that exists in all database systems worldwide.
Using this basic approach, I’d be willing to bet a nice sub sandwich that spotting records that are very close to duplicates is possible with very elegant code that doesn’t need to know anything about the fields or their values.
Fuzzy matching is really just another way of saying “inference”.
The beauty of embedding vectors is that they act as a search index for likeness: similar things land close together, so the nearest vectors are the closest matches.
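To make the near-duplicate idea concrete, here’s a rough sketch rather than anything production-ready. It flattens each record to text without looking at field names, turns that text into a vector, and flags pairs whose cosine similarity clears a threshold. Everything named here is my own assumption: `toy_embed` is a hashed character-trigram stand-in so the example runs offline, where a real setup would call an embedding model instead, and `record_to_text`, `near_duplicates`, the sample records, and the 0.8 threshold are all hypothetical.

```python
import math
import zlib
from itertools import combinations

DIM = 1024  # size of the toy vectors (an arbitrary choice for this sketch)


def toy_embed(text):
    """Stand-in for a real embedding model (assumption): hashed
    character-trigram counts, scaled to unit length."""
    vec = [0.0] * DIM
    padded = "  " + text.lower() + "  "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def record_to_text(record):
    """Flatten a record with no schema knowledge: values joined in key order."""
    return " | ".join(str(record[key]) for key in sorted(record))


def cosine(a, b):
    """Cosine similarity; a plain dot product because both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def near_duplicates(records, threshold=0.8):
    """Return (i, j, score) for every pair of records scoring at or above threshold."""
    vectors = [toy_embed(record_to_text(r)) for r in records]
    pairs = []
    for i, j in combinations(range(len(records)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    # Made-up sample data: the first two differ by a single letter in the name.
    records = [
        {"name": "Jonathan Smith", "email": "jon.smith@example.com", "city": "Denver"},
        {"name": "Jonathon Smith", "email": "jon.smith@example.com", "city": "Denver"},
        {"name": "Maria Garcia", "email": "mgarcia@example.org", "city": "Austin"},
    ]
    for i, j, score in near_duplicates(records):
        print(f"records {i} and {j} look like duplicates (score {score:.2f})")
```

The all-pairs loop is fine for a few thousand records but grows quadratically, so a larger table would want a vector index or approximate nearest-neighbour search. The threshold is something you would tune against your own data.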