AppSheet Parser Suite: Analyze and Clean Up Your Apps

Hi Kirk,

thank you for your investigation and clarification. And thanks to Claude for the nice overview. Still, the most valuable input for me was to use a different command (without knowing exactly what this means for the company restrictions):

py -m pip install beautifulsoup4

This worked with the same settings, i.e. without any additional admin rights.

4 Likes

Great! Glad it worked. I hope the scripts work well for you. :slight_smile:

P.S. I see now that you had already written that. Sorry to have missed that.

3 Likes

What does py -m pip install beautifulsoup4 do?

The command py -m pip install beautifulsoup4 installs the Beautiful Soup 4 library, a Python tool for parsing HTML and XML documents that is widely used for web scraping.

Here’s a breakdown of the command:

  • py: On Windows, this runs the Python launcher, which executes the command with the version of Python installed on your system. On other operating systems, you would typically use python3 instead.
  • -m: This flag tells Python to run the following module as a script.
  • pip: The package installer for Python, which manages and installs libraries not included in the standard Python installation.
  • install: The specific pip command to download and install a package.
  • beautifulsoup4: The name of the package to be installed from the Python Package Index (PyPI). The library is then imported in your code from the bs4 module.
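
If you want to confirm that the install succeeded without running pip again, here is a small, illustrative check using only Python's standard library. (Note that the package installs as beautifulsoup4 but imports as bs4.)

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python's import system can find the module."""
    return importlib.util.find_spec(module_name) is not None

# Beautiful Soup 4 is imported as "bs4", not "beautifulsoup4"
print("bs4 installed:", is_installed("bs4"))
```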

What Beautiful Soup is used for

  • Web Scraping: It helps extract information from web pages by parsing the raw HTML content.
  • Data Extraction: It creates a “parse tree” from the HTML, which makes it easy to navigate, search for, and extract specific data, such as a title, paragraph, or specific text.
  • Handles Imperfect HTML: It is particularly good at parsing poorly formatted HTML. Its name nods to the “Beautiful Soup” poem in Lewis Carroll’s Alice’s Adventures in Wonderland and to “tag soup,” a term for messy, malformed markup.

Example usage
After installing, you can import and use the library in a Python script to parse HTML:

from bs4 import BeautifulSoup
import requests

# Fetch the HTML content from a website
response = requests.get("https://news.ycombinator.com/")
html_content = response.content

# Create a BeautifulSoup object to parse the HTML
soup = BeautifulSoup(html_content, "html.parser")

# Find the first title tag
title_tag = soup.find("title")

# Print the text content of the title tag
print(title_tag.text) 


5 Likes

I recently got a question on GitHub from someone who didn’t know where to find the HTML documentation for an app. I have a text explanation here, but I neglected to provide a screenshot. Fortunately, @Steve has already provided a clear visual, so I’ll share that here:

3 Likes

:puzzle_piece: Analyze your CSV files in Google Sheets

If you’d like to use Google Sheets functions such as QUERY(), FILTER(), or REGEXMATCH() to search and analyze the CSV files produced by the AppSheet Parser Suite, this script can save you a lot of manual work.

:light_bulb: Why It’s Useful

AppSheet’s HTML export, once parsed by the Parser Suite, produces several CSVs (appsheet_columns.csv, appsheet_views.csv, etc.).
Analyzing relationships or searching across them inside Google Sheets can be powerful, but importing each file by hand is tedious and error-prone.

This Google Apps Script automatically creates a Google Sheets file containing the parsed data from your app. Expressions are brought into the file as text to prevent errors and ensure that they will be searchable. Once converted to a Google Sheets file, you can use spreadsheet tools (QUERY, FILTER, BYROW, etc.) to explore your app.


:card_index_dividers: Preparation

Before running the script, upload your parsed CSV files to Google Drive:

  1. Locate the folder on your computer that contains the CSV files produced by the AppSheet Parser Suite.
  2. Open https://drive.google.com in your browser.
  3. Drag and drop that folder into Google Drive.
    • Drive will upload the entire folder and preserve its name.
  4. After the upload finishes, open the folder in Drive.
  5. Copy the URL from your browser’s address bar — it should look like this:
    https://drive.google.com/drive/folders/1A2b3C4D5E6F7G8H9I0J
    
  6. Before you run the script, you’ll need to paste this URL into it, replacing the placeholder text inside this line:
    const folderInput = "PASTE_YOUR_FOLDER_URL_OR_ID_HERE";
    

:brain: Script

// --- begin script ---

/**
 * Import all CSV files from a Google Drive folder into one spreadsheet.
 * - Spreadsheet name = folder name (exactly)
 * - Sheet names = CSV file names (without .csv)
 * - All values imported as literal text
 * - Result link shown in Execution Log
 */

function importCSVsFromFolderDirect() {
  const folderInput = "PASTE_YOUR_FOLDER_URL_OR_ID_HERE";
  const folderId = extractFolderId(folderInput);
  if (!folderId) {
    Logger.log("❌ Invalid folder URL or ID. Please check the line near the top of the script.");
    return;
  }
  const link = importCSVsFromFolder(folderId);
  Logger.log("✅ Done! Created spreadsheet: " + link);
}

function importCSVsFromFolder(folderId) {
  const folder = DriveApp.getFolderById(folderId);
  const folderName = folder.getName();
  const ss = SpreadsheetApp.create(folderName);
  const files = folder.getFiles();
  let csvCount = 0;

  while (files.hasNext()) {
    const file = files.next();
    const name = file.getName();
    if (!/\.csv$/i.test(name)) continue;

    const text = stripBOM(file.getBlob().getDataAsString('UTF-8'));
    const delimiter = detectDelimiter(text);
    const rows = Utilities.parseCsv(text, delimiter) || [];

    // Treat all cells as literal text
    const safeRows = rows.map(row =>
      row.map(cell => {
        let value = String(cell);
        if (/^[=+]/.test(value)) value = "'" + value; // escape formulas
        return value;
      })
    );

    const sheetName = makeUniqueSheetName(ss, sanitizeSheetName(name.replace(/\.csv$/i, '')));
    const sheet = ss.insertSheet(sheetName);

    const range = sheet.getRange(1, 1, Math.max(1, safeRows.length), Math.max(1, safeRows[0]?.length || 1));
    range.setNumberFormat('@');
    if (safeRows.length > 0 && safeRows[0].length > 0) {
      range.setValues(safeRows);
    }
    csvCount++;
  }

  const defaultSheet = ss.getSheets()[0];
  // Delete the empty default sheet only if at least one CSV sheet was added
  // (a spreadsheet must always keep at least one sheet).
  if (csvCount > 0 && defaultSheet.getDataRange().getValue() === '') ss.deleteSheet(defaultSheet);
  if (csvCount === 0) Logger.log("⚠️ No CSV files found in this folder.");
  return ss.getUrl();
}

/** ---------- Helpers ---------- **/

function extractFolderId(input) {
  const urlMatch = input.match(/\/folders\/([a-zA-Z0-9_-]+)/);
  if (urlMatch) return urlMatch[1];
  const qMatch = input.match(/[?&]id=([a-zA-Z0-9_-]+)/);
  if (qMatch) return qMatch[1];
  if (/^[a-zA-Z0-9_-]{20,}$/.test(input)) return input;
  return null;
}

function stripBOM(text) {
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

function detectDelimiter(text) {
  const firstLine = (text.split(/\r?\n/).find(l => l.trim().length) || '');
  const counts = { ',': (firstLine.match(/,/g) || []).length, '\t': (firstLine.match(/\t/g) || []).length, ';': (firstLine.match(/;/g) || []).length };
  let best = ',', max = -1;
  for (const d in counts) if (counts[d] > max) { max = counts[d]; best = d; }
  return best;
}

function sanitizeSheetName(name) {
  let n = name.replace(/[\[\]\:\*\?\/\\]/g, ' ');
  if (n.length > 100) n = n.slice(0, 100);
  return n.trim() || 'Sheet';
}

function makeUniqueSheetName(ss, base) {
  let name = base, i = 2;
  while (ss.getSheetByName(name)) {
    const suffix = ' (' + i + ')';
    const head = base.length + suffix.length > 100 ? base.slice(0, 100 - suffix.length) : base;
    name = head + suffix;
    i++;
  }
  return name;
}

// --- end script ---

:rocket: How to Use

  1. Open https://script.google.com and click New project.
  2. Delete any placeholder code and paste the complete script above.
  3. Click :floppy_disk: Save, then give the project a name such as CSV Folder Importer.
  4. At the top of the script, find this line:
    const folderInput = "PASTE_YOUR_FOLDER_URL_OR_ID_HERE";
    
    Paste your Google Drive folder URL or ID between the quotes.
    • Example:
      https://drive.google.com/drive/folders/1A2b3C4D5E6F7G8H9I0J
  5. From the function dropdown, choose importCSVsFromFolderDirect, then click the :play_button: Run button.
    • The first time you run it, you’ll be asked to authorize access to your Google Drive and Sheets.
  6. After it finishes, open the Execution log:
    • Menu: View → Execution log, or use the Logs panel at the bottom.
    • You’ll see a message like:
      ✅ Done! Created spreadsheet: https://docs.google.com/spreadsheets/d/xxxxxxxxxxxx/edit
      
  7. Click the link. You’ll find:
    • A new Google Sheet named exactly like your Drive folder
    • One tab for each CSV file in that folder
    • All content imported as plain text (no formulas or dates converted)

:abacus: Next Steps

Once you have the combined Google Sheet, you can use spreadsheet formulas to explore your AppSheet data.
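
For example (the tab names and column letters below are placeholders; adjust them to match the headers in your own parsed files), you could pull out a subset of columns or search all view definitions for a keyword:

```
=QUERY(appsheet_columns!A:Z, "select * where B contains 'Virtual'", 1)
=FILTER(appsheet_views!A:C, REGEXMATCH(appsheet_views!B:B, "Dashboard"))
```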

:speech_balloon: Tip: Once you’ve verified that everything imports correctly, you can reuse this script anytime—just change the folder URL near the top and run it again to generate a fresh Google Sheet for new projects or new versions of the same project.

P.S. To save time, I used ChatGPT to prepare this post and the script. I have confirmed that the script works with my own data and I have read and revised the text of this post to my satisfaction. Any remaining problems are my fault, of course, not the responsibility of any AI tool I used.

5 Likes

You are really addressing a need here with this app/scripts. Thanks @Kirk_Masden !

At my first attempt today, everything seemed good up to Phase 4, where the parser failed!?

Any idea on how I can fix this would be greatly appreciated.

============================================================

PHASE 4: Parsing Actions

:open_file_folder: Loading slice mapping from appsheet_slices.csv
:white_check_mark: Loaded 27 slice mappings
:high_voltage: Extracting actions…

:open_file_folder: Loading system action data from actions.txt
:bar_chart: Found 343 actions in text file (102 system-generated)
:cross_mark: Actions parser failed: sequence item 0: expected str instance, NoneType found

============================================================================================================

Parsing complete!

:bar_chart: Summary of data to be analyzed:
:white_check_mark: Data Sources: 60 total
├─ Tables: 33
└─ Slices: 27
:white_check_mark: Columns: 616 total
├─ Physical: 569
└─ Virtual: 47
:white_check_mark: Format rules: 37
:cross_mark: Actions: Failed
Error: sequence item 0: expected str instance, NoneType found
:white_check_mark: Views: 122 total
├─ System-generated: 72
└─ User-created: 50
:white_check_mark: Navigation targets: 0

1 Like

hi @shimodabt !

Thank you for using the scripts. Apologies for the error you encountered. I think the script is probably not robust enough to deal with your particular configuration.

I wonder if you would be willing to share your data with me so that I can try to parse it and investigate the problem.

Unless I’m mistaken, private messaging is not enabled on this forum, so we need to contact each other via email. kirkmasden@hotmail.com is not my regular email account (I rarely check it) but I’ll try to check it over the next week. If you contact me there, we can discuss how to share data confidentially.

Email sent @Kirk_Masden

1 Like

Thanks! I sent a reply and Cc’d the Gmail account I would like to use to collaborate on this. I’m looking forward to finding the problem and improving the scripts. :slight_smile:

1 Like

Gmail put my message in the “Spam” folder. So, you may need to look there to find it. :slight_smile:

I found the source of the problem!

I was able to use the data you kindly shared with me to track down exactly what was happening. The issue occurs when an app has an action configured as “Execute an action on a set of rows” but the target action hasn’t been specified yet — in the JSON, this shows up as "ReferencedAction":null. The parser wasn’t handling that null value properly, which caused the crash.

If you want to fix this yourself right away, open actions_parser.py and find line 708. Change:

                if 'ReferencedAction' in data:

to:

                if 'ReferencedAction' in data and data['ReferencedAction'] is not None:

That’s it — just adding that extra check prevents the error.
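
In case it helps anyone reading along, here is a minimal, illustrative sketch (not the actual parser code) of why this particular message appears: Python's str.join() raises exactly this TypeError when the sequence contains None, which is what an unguarded "ReferencedAction" check lets through.

```python
# Illustrative sketch, not the actual parser code: an action whose target
# has not been specified appears in the JSON as "ReferencedAction": null,
# which Python loads as None.
data = {"ReferencedAction": None}

# Joining a list that contains None reproduces the reported error.
try:
    ", ".join([data["ReferencedAction"]])
except TypeError as e:
    print(e)  # sequence item 0: expected str instance, NoneType found

# The extra guard skips the None case entirely.
if "ReferencedAction" in data and data["ReferencedAction"] is not None:
    print("process the referenced action")
else:
    print("skipped: no target action configured")
```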

I’ll publish the corrected script to the GitHub repository in the coming days.

Thanks again for sharing your files and helping me debug this!

P.S. I used Claude Opus 4.5 to debug this. Otherwise, all of this would be completely over my head.

1 Like

Super @Kirk_Masden! Thanks for fixing this issue.

Now, it looks like I have a lot of cleanup to do. :smiley:

1 Like

Thank you for your help Kirk!

I cannot wait to dig into the results.

1 Like

@shimodabt and all,

In a private email to me, @shimodabt kindly provided feedback about false positives for orphan actions (actions that were actually in use) and pointed out some other issues that require fixing. I hope to be able to address these issues within a week or so.

Little by little, with feedback from people who use the scripts to reduce cruft and address other issues, I hope to make them more robust and effective. :slight_smile:

1 Like

Thanks to the data that @shimodabt kindly shared with me, I was able to do the following:

Bug Fixes:

  • actions_parser.py: Fixed crash when “Execute an action on a set of rows” has no target action specified (null ReferencedAction)
  • actions_orphan_detector.py: Fixed false positives for navigation actions with Display Overlay prominence in table views
  • views_parser.py: Fixed dashboard views not being recognized as root views (position field was storing “laterType: dashboard” instead of “later”)

Improvements:

  • actions_orphan_detector.py & column_orphan_detector.py: Added notes explaining that items appearing only on unreachable views will be flagged as potential orphans
  • master_parser_and_orphan_detector.py: Improved “run later” instructions to use absolute paths tailored to the user’s directory

I spent a lot of time debugging with my own apps but @shimodabt’s configurations were different from mine, so they helped me find problems I was unaware of. I hope that others of you who encounter problems will let me know so that I can make the suite of scripts more robust.

The GitHub repository has been updated: GitHub - KirkMasden/appsheet_parser_and_orphan_detector: Python tools for AppSheet developers: Convert HTML Application Documentation to CSV format, analyze component dependencies, and identify unused (orphan) elements for cleanup.

I also improved the documentation on my website:

  • Step 5 updated: Removed hardcoded path that didn’t work for ZIP downloads
  • New section added: “Understanding Orphan Detection” - explains how view reachability affects detection and why detection is iterative

Full documentation: AppSheet Parser Suite - Technical Documentation

Thanks again to @shimodabt! I suspect other issues will need to be addressed in the future. When that time comes, I will provide updates here.

2 Likes

I noticed recently that my app was running slowly. The performance monitor showed that one virtual field was the main issue. To improve speed, I replaced it with a sheet field using a formula. The challenge, of course, is remembering where that virtual field is used in the app so nothing breaks. To solve this, I used Kirk’s parser, which reported that the column field was referenced in four views. That insight let me quickly update those views with the new column field. Thanks @Kirk_Masden for the great tool — it saved me time and kept my app running smoothly!

3 Likes

Thanks so much, @shimodabt ! I’m very glad that my scripts have been of some help, even though they’re not as robust as they should be.

One of several things you helped me realize through our direct exchanges is that I should not have excluded settings tables (the system-created tables that allow individual users to control settings for their apps). In my own app building I had not used these and so I thought I could exclude them. Your app, however, showed me that they should be included and I am currently working on fixing that problem.

I’m rather slow. I have other work to do and the script adjustments often take me a while, even with the help of AI. But, I’m pretty confident that I can continue to make the scripts more robust (that is, get them to accommodate apps with configurations that I did not foresee). So, I hope that you and others will continue to post about problems you come across. :slight_smile:

Thanks to issues that @shimodabt kindly alerted me to, I’ve made several improvements to the parser suite. These changes should improve results for creators who use User Settings in their apps. However, more testing is needed to make sure everything is working properly.

New Features:

  • USERSETTINGS() Reference Parsing: The parser now captures USERSETTINGS("ColumnName") references, storing them as _Per User Settings[ColumnName] in the referenced_columns field. This enables proper dependency tracking for User Settings columns.
  • [_THISUSER].[ColumnName] Syntax: Added support for the alternative syntax for accessing User Settings.
  • User Settings Orphan Detection: Detects configured User Settings columns that are never referenced via USERSETTINGS(). Skips system columns (_EMAIL, _NAME, etc.), Show type columns, and unconfigured default slots (Option 1, Option 2, etc.).
  • Broken USERSETTINGS Reference Detection: Flags when USERSETTINGS("X") references a column that doesn’t exist in _Per User Settings. This helped me find an actual bug in @shimodabt’s app, which I have been using for testing!
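
To illustrate the kind of matching involved (this is an illustrative sketch, not the actual parser code), both reference styles can be captured with regular expressions and normalized to the _Per User Settings[ColumnName] form:

```python
import re

# Sketch only: capture both ways an AppSheet expression can reference
# a User Settings column.
USERSETTINGS_RE = re.compile(r'USERSETTINGS\(\s*"([^"]+)"\s*\)', re.IGNORECASE)
THISUSER_RE = re.compile(r'\[_THISUSER\]\.\[([^\]]+)\]', re.IGNORECASE)

def find_user_settings_refs(expression):
    """Return referenced columns as '_Per User Settings[Col]' strings."""
    cols = USERSETTINGS_RE.findall(expression) + THISUSER_RE.findall(expression)
    return [f"_Per User Settings[{c}]" for c in cols]

print(find_user_settings_refs('IF(USERSETTINGS("Dark mode"), [_THISUSER].[Region], "")'))
# → ['_Per User Settings[Dark mode]', '_Per User Settings[Region]']
```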

Bug Fixes:

  • _Per User Settings Table Included: Previously, the parser excluded this system table. It is now included so that User Settings columns can be analyzed for orphans and dependencies.
  • JSON Array False Positive: Fixed a bug where EnumList values like ["MOCM","MASS"] were incorrectly captured as column references.
  • CONTEXT() Case Sensitivity: The phantom view detector now uses case-sensitive matching for CONTEXT("View") comparisons, matching AppSheet’s actual behavior. (AppSheet’s CONTEXT() and FIND() functions are the only case-sensitive functions.)

The GitHub repository has been updated: GitHub - KirkMasden/appsheet_parser_and_orphan_detector

As always, please let me know if you encounter any issues!

2 Likes