What benefits does a ref column bring?

Colleagues,
I’m working on performance. It feels like a literal game of chess, which is probably why I’m so passionate about AppSheet.

To the point: while thinking of ways to improve sync performance, I began to reflect on what the Ref column is really for—what benefit does it provide?

The Ref column automatically creates a virtual field in the parent table called [Related_Children]. This virtual column is recalculated during every sync. This isn’t a big deal in small apps with security filters—but what happens when the number of records explodes, and on top of that, many tables have tons of these cross-references?

Personally, I don’t use system-generated views because if I change the app structure or regenerate columns, those views “reset,” which forces me to redo them. So I usually duplicate them and work on my own views, which remain unchanged. Along those lines, I also don’t use “inline” views. Normally, I keep the “Related” fields hidden (I still use them when needed, but many times they’re there and I never touch them).

If I mark a Ref field as Is a part of, I create a deeper dependency that enables cascading deletions (i.e. deleting a parent deletes all its children)—but that doesn’t apply to updates (updating a parent doesn’t update its children). Since those updates don’t cascade, there are two paths: either build a virtual column or create the logic using bots—which I find more appropriate.

At this point I thought: if I need a bot to update all child records, I could just as well use that same logic for deletions. It may be a bit more complex (just a little), but it could lead to 100% optimized sync performance.

Let me know your thoughts—maybe I’m overlooking something important that could make all the difference.

2 Likes

A REF column is analogous to a Foreign Key in the database world - basically it is a pointer to a row in another table allowing you to easily access any OTHER column on that row.

Correct! And this is actually advantageous IF you utilize the [Related_Children] column. I believe there is some optimizations in reading the associated tables.

If you do NOT use the [Related…] columns you can prevent them from being created. Instead of defining the column as REF, define it as ENUM with a Base Type of REF. You can still get REF column behavior but the associated {Related…] column will not be created in referenced table.

I don’t understand this one. I use the system generated views and only add other views when needed. I am constantly adding new columns and regenerating the tables. My views are not impacted. If you are finding that your views need re-created after a regeneration of the table, then I would raise this to AppSheet Support. That should not be happening.

Correct. This is by design. Typically the Children update the Parent. The whole idea is to gather all the common data amongst the rows and place that into a Parent table row. The Children then hold details that, collectively, are unique in the Children table. There are occasions when it is helpful to replicate Parent info on the Children rows but then those values should be grabbed using the REF column to the Parent so any changes made are automatically updated on all the Children rows.

Of course, there are use cases that transcend these standard implementations that add complexity. For example, there are times when details needs to be remembered at time of entry, say the cost of items at the time an Order is placed. For that, values need to stored in their own columns and not retrieved through dereference - though they can be initialized by a dereference.

I’d be curious to know what updates you need pushed down to Children rows. Maybe a design change can solve the problem?

If you do not need the [Related…] column, you could redefine it as described above to an ENUM/REF. This would prevent the “Is part of” setting being available so Yes you would. need a Bot to delete all Children rows. And yes this would save Sync performance on that {Related…] column.

I hope this helps!

2 Likes

Thanks for the response—super clear and helpful. What led me to consider switching from REF to ENUM with a base type of REF wasn’t just avoiding the [Related_...] column, but also a potential improvement in sync times, especially in apps with many inter-table relationships.

I understand that the [Related_Children] column can be useful in some cases, but in my app I don’t use it directly, and each of those extra columns impacts sync performance, since AppSheet has to resolve and calculate relationships that I don’t actually need.

That’s why, in heavily queried tables or those with many children, I’m leaning toward using ENUM base REF—keeping the reference behavior when I need it (like dereferencing), while avoiding the unnecessary overhead of [Related...].

I also have quite a bit of logic automated through bots, which lets me manually replicate the “IsPartOf” behavior (like cascading deletes), without paying the cost of having all that preloaded into memory or recalculated on every sync.

For the time being, something I’m testing is that in some of my modules that have read-only tables, instead of using the same REF configuration they have in the app where changes are made, I use enum base ref, and this frees up the read-only tables from generating the relation again.

Regarding cascading deletion, I don’t always allow it (sometimes I do, and I’ve been using ref is a part of), and it works. When I update the children, the parents aren’t updated unless there’s a bot that does it (I’ve always set it up that way).

It might seem like a micro-optimization, but in a large app, every detail counts. Have you noticed any measurable difference in sync performance on large apps when keeping [Related...] active? I’m curious whether the AppSheet engine optimizes them when unused, or if it always loads them anyway.

“I’d be curious to know what updates you need pushed down to Children rows. Maybe a design change can solve the problem?”

Regarding your query, for example, if I have a code in the child rows that depends on the code of the parent rows since it’s a concatenation (parent = 001, children 001-001, 001-002, 001-003), the code should be updated because there’s a ‘parent-child is a part of’ relationship. However, this relationship seems to be configured only to act on deletions (a case that isn’t too complex to handle manually with bots to improve future synchronization). But since a bot needs to be made for updating, I would also make it for deleting, and I would save myself the excessive calculation of some virtual columns (I say this because they really become bottlenecks).

I’ve tested this in some of my apps, and what I’m saying works. In fact, I’ve reserved virtual columns only for some cases, such as columns that display inline actions. Now, the synchronization seems to be working better, and also, the latest performance profile report is from a much earlier version, and since there are no more virtual columns, the performance profile isn’t providing data.

I’ll keep testing and let you know.

@Gustavo_Eduardo ,

Just to add , you may wish to take a look at the following detailed post that discusses app performance and reverse reference columns.

There Praveen ( @pravse ) who was CEO of AppSheet before AppSheet became part of Google Workspace, explains that system generated reverse reference columns seldom cause performance issues.

Solved: Improving performance by getting rid of unnecessar… - Google Cloud Community

Here are the excerpt from that post where Praveen describes it in detail.

The above is one of the best discussions post about the sync performance in AppSheet community and I believe one can bookmark it for frequent reference.

In my own experience also, system generated reverse reference rarely cause performance issues, unless there are several cross table cycles that involve computation based on system generated reverse columns spanning multiple tables.

This means one computes say total inventory in a VC called say [Total__inventory] by an expression something like [Related Inventorys][Available Inventory] in a parent table and this [Total_inventory] column in turn is used in another table’s VC for some heavy multirow computation again, then likely the performance can degrade. But restricting to just these reverse reference columns and some associated computation in the same table itself is less likely to degrade the performance. For example an expression like [Related Inventorys][Available Inventory] in itself is less likely to cause performance issues.

2 Likes

Thanks so much for your input, Suvrutt — it’s always great to have your insight, and especially helpful to hear the deeper perspective on how sync performance is actually handled under the hood.

I completely agree with the point you made: the Performance Profiler can sometimes mislead us into thinking virtual columns are the bottleneck, when really it’s just how the fetch sequence and compute time are reported. And yes, REF_ROWS() is generally well-optimized and efficiently handled server-side — which makes it a solid default tool in many apps.

That said, I’d love to add a small note from a design and structure perspective, especially when working with large-scale or distributed apps, where the priority isn’t only performance — it’s control, predictability, and maintainability as well.

Ref vs Enum base Ref – A structural comparison



Feature



Ref (standard)



Enum base type Ref



Generates [Related…] virtual column



Yes



No



Auto-creates inline views



Yes



No



Auto-generates actions



Yes



No



Manual control over side-effects



Limited



Full



Requires full Ref setup (Key/Label)



Yes



No



Potential sync overhead in scale



Possible



Reduced



Supports cascading deletes (IsPartOf)



Yes



No (requires bots/automation)

So why choose Enum base Ref?

In cases where:

  • Inline views aren’t needed
  • Auto-actions aren’t used
  • You don’t want to trigger auto-generated reverse references
  • You’re optimizing for sync clarity, structural simplicity, and performance

…using Enum base Ref allows you to:

  • Prevent automatic structures you don’t need
  • Minimize load during sync
  • Keep a cleaner, more intentional schema

In summary

It’s not that Ref is “bad” — rather, it’s about when and why you choose one over the other.

This approach is not about replacing Ref, but rather about offering a more controlled alternative in contexts where auto-generated structures create overhead or add unintended complexity. Especially in performance-critical apps, this can make a big difference.

But of course, there are absolutely cases where the automatic behavior of Ref is a real advantage — and I’d love to hear about situations where that has helped simplify your workflow or improve performance. Always happy to learn from your experience!

Would love to hear your thoughts, and thank you again for always helping push the conversation forward!

1 Like

Hi @Gustavo_Eduardo ,

I think @WillowMobileSys has responded in detail to major points on the topic.

Also your main question/topic in the first post in this thread was about system generated references and whether they cause issues with higher number of records. I believe I was responding to that.

I will let other community members opine on your recent thread about pure reference vs Enum with base as reference.

1 Like

Hi all, and thanks so much for all the thoughtful responses — @WillowMobileSys and @Suvrutt_Gurjar , I really appreciate the time you both put into this thread.

I wanted to clarify just a little where my line of thinking diverges — not to disagree, but to frame things from a slightly different lens:


:magnifying_glass_tilted_left: Functional behavior vs structural optimization

You’re both absolutely right that from a functional point of view, using a Ref column gives you:

  • The [Related...] virtual column

  • Auto-generated inline views

  • Action behaviors

  • Cascading deletes (via IsPartOf)

And if you’re using those things, they are great helpers.

But what I’m exploring is what happens in large apps when you don’t need them.
In those cases, having the system automatically create all those virtual elements — views, actions, relationships — ends up adding structure and computation that you’re not using.

So instead of removing them manually, I started using Enum base Ref — which gives me:

  • The same dereference behavior ([Parent].[Column])

  • Manual control over child fetching (FILTER() or REF_ROWS())

  • No auto-generated structures I don’t need

And most importantly: a cleaner sync, fewer auto-calculated VCs, and nothing being “watched” unless I explicitly design it that way.


:bullseye: Not just about what happens — but when it happens

When we say “performance optimization”, most people think of formula runtime or sync duration.
But there’s a more subtle kind: structural optimization — reducing the automatic architecture that scales poorly over time in large apps.

In other words, I’m not saying Ref is bad — I use it when I need the full toolkit it comes with.
But if all I need is a clean relationship — no virtuals, no inline views, no extra actions — I prefer the minimalism of Enum base Ref.


? Observation, not resistance

I’m not trying to resist AppSheet’s default behavior — I’m just being deliberate about when I lean into it and when I opt out.

My goal is simply: build only what I use, and use only what I build.

Thanks again for letting me add this angle. I really value the discussion, and I’m always open to seeing when the tradeoff isn’t worth it — or when I’ve missed something entirely. :folded_hands:

By the way, anyone who can add something to this topic will always have their input read and clearly valued. In any case, I thank you all again. Please don’t take my words as if I’m contradicting you; that’s not my intention. My intention is to learn and explore. Greetings, have a good day.

2 Likes

Your instincts are sharp, especially regarding sync performance and how Refs and system-generated virtual columns can bloat recalculation times as your dataset scales.

Let’s break it down.

  1. Ref Columns and [Related_Children] Virtual Columns

Ref Columns:

Great for modeling relationships.

Enable dereferencing ([Parent].[Name]) and make building UIs more intuitive.

Automatically generate [Related_Children] via LIST(SELECT(…)), which recalculates every sync.

Performance Pitfall:

That recalculated virtual column can be brutal at scale. If you have 10,000 parent records and each recalculates a related list of 50 children—do the math. And this happens on every sync, regardless of whether you use that related list or not.

What You’re Doing (hiding unused [Related_*] cols) = Smart. If you don’t need them, hide them or remove them entirely to reduce compute load.

  1. “Is a part of” and Cascading Deletes

Is a part of:

Helpful for edit contexts and allows cascading deletes.

But yes, it doesn’t cascade updates, which makes it only a partial relationship manager.

Your Bot-Based Strategy:

By using bots for both cascading deletions and cascading updates, you’re essentially removing the need for Is a part of, and even the Ref in many cases.

This lets you:

Avoid auto-virtual columns like [Related_*].

Completely control what gets triggered and when.

Possibly eliminate expensive recalculations in favor of on-demand logic.

It’s not just good—it’s elite-level optimization.

  1. Ref vs. Manual Management

Let’s compare:

TL;DR: If you’re not using [Related_*] or dereferencing in views, you’re better off skipping Refs in high-scale scenarios.

  1. My Take

You’re absolutely not overlooking anything important. You’re refining your architecture the way a backend engineer would in traditional development:

Reduce real-time computed fields.

Shift to event-based processing (bots).

Keep UI logic separate from data logic.

Build resilience against automatic resets (e.g. avoiding system views).

In fact, I’d go a step further and recommend:

Audit your entire schema: See which Refs you can remove or replace with LOOKUPs or key-matching SELECTs.

Group child tables by parent IDs in sheets for easy SELECT filtering, minimizing bot processing time.

Use security filters to further reduce client-side burden.

Wow, thank you so much for this breakdown — honestly, it feels great to know the approach I’ve been exploring isn’t just “creative” or “odd,” but actually aligned with how backend engineers think and build scalable systems.

I started down this path mostly by questioning what was happening under the hood, noticing the cost of virtual recalculations and wondering if there was a leaner way to structure things. I’m learning by doing (and by breaking things!), so your message really helped me feel like I’m not crazy for digging into these structural details.

The recommendation to shift to event-driven processing, keep UI logic decoupled from data logic, and audit all Refs — that resonates deeply with how I’ve been refining my apps lately. And yes, I’ve definitely run into some of those auto-generated views and columns being reset or recalculated unnecessarily, which pushed me toward manual management.

Thank you again for this. It’s the kind of validation that fuels the learning process and keeps me curious to keep building smarter.

1 Like