Hi,
I am trying to find a way to identify vendors based on unique words or phrases on receipts, extracted with OCRTEXT. Essentially, I want to find a store number, address, phone number, etc.: anything that is unique to that vendor at that location. Because OCRTEXT isn’t always accurate, I’m having trouble. On some receipts, for example, a period is read instead of a comma in an address, so INTERSECT doesn’t return the address as a common value.
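To illustrate the punctuation problem, here is a minimal sketch in Python rather than an AppSheet expression. It assumes phrases can be normalized (lowercased, punctuation turned into spaces) before they are intersected; the `normalize` helper and the sample phrases are made up for illustration.

```python
import re

def normalize(phrase: str) -> str:
    """Lowercase the phrase and replace each run of non-alphanumerics with one space."""
    lowered = phrase.lower()
    return re.sub(r"[^a-z0-9]+", " ", lowered).strip()

# Two hypothetical OCR readings of the same receipt header.
receipt_a = ["123 Main St, Springfield", "Store #0457", "Tel (555) 010-2000"]
receipt_b = ["123 Main St. Springfield", "STORE #0457", "Tel (555) 010-2000"]

# A plain intersection only keeps the phrases that were read identically.
raw_common = set(receipt_a) & set(receipt_b)

# After normalizing, the comma/period and upper/lower case variants match too.
normalized_common = {normalize(p) for p in receipt_a} & {normalize(p) for p in receipt_b}
```

Here `raw_common` contains only the phone number line, while `normalized_common` contains all three phrases. The trade-off is that normalizing also erases punctuation that might be meaningful, such as the decimal point in a price.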
I tried adding sets of 5 intersections of random receipts from that vendor, and then using INTERSECT between those groups. I thought this would remove the problem, since it makes it more likely that two receipts with the same OCRTEXT error get compared with each other, so that both versions are added as key phrases.
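One possible reading of this grouping scheme, sketched in Python rather than AppSheet: each group is the union of 5 intersections of random receipt pairs, and the final key phrases are whatever all groups share. The function name, the parameters and the sample receipts are hypothetical, and the exact grouping used in the app may differ.

```python
import random

def grouped_key_phrases(receipts, per_group=5, n_groups=3, seed=0):
    """Intersect groups, where each group is a union of random pairwise intersections."""
    if len(receipts) < 2 or n_groups < 1:
        return set()
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        group = set()
        for _ in range(per_group):
            # Pick two distinct receipts and keep the phrases they share.
            first, second = rng.sample(receipts, 2)
            group |= first & second
        groups.append(group)
    # Keep only phrases that turned up in every group.
    return set.intersection(*groups)

# Hypothetical receipts: two OCR variants of the address, unique totals.
receipts = [
    {"123 main st, springfield", "store #0457", "total 12.99"},
    {"123 main st, springfield", "store #0457", "total 4.50"},
    {"123 main st. springfield", "store #0457", "total 7.25"},
    {"123 main st. springfield", "store #0457", "total 30.00"},
]
key_phrases = grouped_key_phrases(receipts)
```

The store number is on every receipt, so it always survives, and the one-off totals never do. Whether both address variants survive depends on which pairs happen to be drawn, which is why more receipts and more intersections per group should help.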
The most straightforward approach is taking all phrases found on receipts from that vendor and then subtracting all phrases found on receipts from other vendors. This actually lets AppSheet recognize the vendor fairly well, but it includes a lot of things I don’t want that INTERSECT would remove (prices, dates, barcodes, etc.). If those are left in, there would likely be overlap with another vendor at some point, so it isn’t very robust.
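A rough Python sketch of the subtraction approach, with made-up sample data. The `min_count` parameter is an assumption that is not part of the approach described above: setting it above 1 keeps only phrases seen on several of the vendor's receipts, which is one way to drop one-off prices and dates without requiring a phrase to appear on every receipt.

```python
from collections import Counter

def vendor_only_phrases(vendor_receipts, other_receipts, min_count=1):
    """Phrases seen on at least min_count vendor receipts and on no other vendor's."""
    # Count how many of the vendor's receipts each phrase appears on.
    counts = Counter(p for receipt in vendor_receipts for p in set(receipt))
    other_phrases = set().union(*other_receipts)
    return {p for p, n in counts.items() if n >= min_count and p not in other_phrases}

# Hypothetical receipts for the target vendor and one other vendor.
vendor = [
    {"store #0457", "total 12.99", "thank you"},
    {"store #0457", "total 4.50", "thank you"},
]
others = [{"store #0112", "total 8.00", "thank you"}]

everything = vendor_only_phrases(vendor, others)               # plain subtraction
repeated = vendor_only_phrases(vendor, others, min_count=2)    # must repeat on the vendor's receipts
```

With plain subtraction, `everything` still contains both totals alongside the store number. With `min_count=2`, `repeated` contains only the store number.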
I originally used single words rather than phrases, but addresses, store numbers, etc. work better with phrases, though I could use a combination of both.
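A combination of words and phrases could look something like the Python sketch below. It assumes a "phrase" means a run of consecutive words on one line of OCR text, which may not match how the phrases are actually split in the app; the function name and `max_len` are hypothetical.

```python
def words_and_phrases(line: str, max_len: int = 3) -> set:
    """Return every run of 1 to max_len consecutive words from a line of text."""
    words = line.split()
    grams = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams

# Single words and two-word phrases from one hypothetical receipt line.
tokens = words_and_phrases("store 0457 springfield", max_len=2)
```

Here `tokens` holds the three single words plus "store 0457" and "0457 springfield", so a single misread word only spoils the phrases that contain it.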
If you have any thoughts, I’d be glad to hear them!