Automating resource discovery: Identifying untagged and unlabeled assets with Cloud Asset Inventory

Harshapriya · November 4, 2024, 9:04pm

Overview

Efficient resource management in Google Cloud Platform (GCP) is essential for cost control, access management, and automation. Tags and labels serve as powerful tools to achieve this.

Tags allow you to enforce policies with fine-grained access control, while labels help in cost attribution and resource management.

Maintaining consistent tagging and labeling in large-scale environments is challenging, often resulting in missing or inconsistent metadata.

This blog focuses on automating the process of identifying untagged or unlabeled resources using the Cloud Asset Inventory API.

Importance of tagging and labeling in GCP

Organization and categorization
- Tags and labels help organize cloud resources, making it easier to locate, filter, and manage them, especially in large environments.
- Proper tagging and labeling ensure clarity, reduce management complexity, and improve resource visibility, allowing teams to quickly find and act on specific resources.
Cost allocation and analysis
- Assigning tags and labels allows for accurate cost tracking by project, department, or application, so you can see where your cloud spend is going.
Automation
- Tags can trigger automated actions, such as starting or stopping instances or applying policies to the resources that carry a specific tag.

Understanding tags and labels

Now that we have explored why tagging and labeling are crucial for resource management, let’s dive into a deeper understanding of how they work in GCP.

Labels

Labels are also key-value pairs, but they are primarily used for grouping resources and managing costs at scale. Label information is forwarded to the billing system, which lets you break down your billed charges by label.

Check the supported resources to learn how to apply labels and to what you can apply them. For instance, BigQuery lets you add labels to your datasets, tables, and views, while Cloud Storage allows you to add labels to buckets. You can add labels to projects but not to folders.

For example, to identify which team is associated with specific Google Cloud resources and their costs, you can create a label key called team and apply the labels team:alpha, team:beta, and team:delta to the resources.

Difference between tags and labels

Tags	Labels
Tag keys, tag values, and tag bindings are all discrete resources	Not a resource in itself, but metadata for resources
Can be defined at the organization, folder, or project level	Can be applied at the project or resource level
Requires defining the tag key and value first	Can be added directly to resources
Can be used in IAM conditions	No IAM policy support
Inherited by descendants within the resource hierarchy	Not inherited

Use cases for tags and labels

Cost management: Track and allocate cloud costs based on teams, applications, or environments.
Policy enforcement: Use tags to restrict access or apply security policies to specific resources.
Resource grouping: Organize resources by application, department, or environment for easier management.
Automation: Trigger automated actions based on tags or labels, such as starting/stopping instances or applying security policies.
Monitoring and logging: Filter logs and metrics based on tags or labels to gain insights into specific resources or groups of resources.

Challenges - Inconsistent tagging and labeling

Lack of standardization

Inconsistent naming conventions: Different teams or individuals may use varying naming conventions, leading to confusion and difficulties in resource management.

Manual effort

Time-consuming processes: Adding tags and labels manually can be time-intensive and prone to errors, especially when dealing with numerous resources.

Scaling challenges

Keeping up with changes: As Google Cloud’s services evolve and new resource types are introduced, adapting your tagging and labeling strategy to ensure consistency across all resource types can be an ongoing challenge.
Resource sprawl: Rapid resource creation can lead to backlogs in tagging efforts.

Reporting and analysis complications

Data quality issues: Inconsistent tagging results in inaccurate reporting, making it difficult to derive meaningful and valuable insights.

Given these challenges, leveraging the right tools becomes essential. One such tool is Google Cloud’s Asset Inventory, which simplifies the management of resources through automated discovery and tracking.

Leveraging Cloud Asset Inventory

Cloud Asset Inventory is a fully managed metadata inventory service that allows you to view, monitor, analyze, and gain insights into your entire Google Cloud Platform (GCP) environment.

An asset can be a resource, such as a Compute Engine Instance or a BigQuery table, or a policy, such as an IAM Policy or Organization Policy. It can also include runtime information from VMs, such as system updates.

Metadata refers to the configuration and properties of these assets. Cloud Asset Inventory operates at the organization, folder, and project levels, with fine-grained controls provided by IAM Policies.

Key features of Cloud Asset Inventory

Resource discovery: It automatically discovers resources across your Google Cloud environment, including VMs, storage buckets, projects, and more, providing a centralized view.
Metadata tracking: It stores metadata for each asset, such as its location, configurations, tags, labels, and resource hierarchy information (organization, folder, project level).
Historical data: It provides asset history, enabling users to see how resources change over time.
Monitor real-time configuration changes: A real-time notification feature supports continuous monitoring. Cloud Asset Inventory lets you monitor assets and sends a Pub/Sub message when a change occurs.
Full change history: View a complete history of changes made to resources and their configurations over time.
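The Pub/Sub notifications can also drive a labeling check. The following is a minimal sketch, assuming the feed is configured to include resource content and that each message body is JSON with the asset under an "asset" key and the resource's own representation under "asset.resource.data". The function name is hypothetical, and the field names should be confirmed against a real message from your feed before relying on them.

```python
import json


def unlabeled_asset_from_message(message_data: bytes):
    """Return (asset_type, name) if the changed asset has no labels, else None.

    Assumption: the message body is JSON shaped like
    {"asset": {"name": ..., "assetType": ..., "resource": {"data": {...}}}}.
    """
    payload = json.loads(message_data.decode("utf-8"))

    # Assumption: deleted assets are flagged with "deleted"; nothing to label then.
    if payload.get("deleted"):
        return None

    asset = payload.get("asset", {})
    data = asset.get("resource", {}).get("data", {})

    # A missing or empty "labels" mapping is treated as unlabeled.
    if data.get("labels"):
        return None
    return asset.get("assetType"), asset.get("name")
```

A Pub/Sub subscriber callback could pass its message data to this function and alert on any non-None result.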

Identifying untagged/unlabeled resources

Proper tagging and labeling are essential for resource organization, cost allocation, and access control. Cloud Asset Inventory makes it easy to identify resources that lack proper tags or labels.

Cloud Asset Inventory provides multiple ways to interact with asset data, such as the REST API, client libraries, and the gcloud command-line tool.

To automate the identification of untagged/unlabeled resources, this blog uses the Cloud Asset Inventory API via the Python client library. While you can interact with Google Cloud APIs directly, client libraries simplify the process, reducing the amount of code you need to write.

Here’s an example of how to use the Cloud Asset Inventory Python client library to find untagged/unlabeled resources:

1. Enable Cloud Asset Inventory API in the project where the following code is executed

gcloud services enable cloudasset.googleapis.com

2. Install the required Python package

pip install --upgrade google-cloud-asset

3. Import the library and instantiate the Cloud Asset Inventory Client

from google.cloud import asset_v1
client = asset_v1.AssetServiceClient()

from google.cloud import asset_v1 - imports the asset_v1 module from the google-cloud-asset package installed in the previous step. This module provides the classes and functions needed to work with the Cloud Asset Inventory API.

Use the AssetServiceClient() class to connect to the Cloud Asset Inventory API. It handles the underlying communication and authentication with the service.

4. Function to retrieve projects without tags

The below function in Python helps identify projects within a specific folder that do not have any tags assigned.

# This function searches for projects within a specific Google Cloud folder that do not have any tags assigned.
def list_projects_without_tags(folder_id):

    # Send the request to Cloud Asset Inventory using the AssetServiceClient instantiated in the previous step.
    response = client.search_all_resources(
        request={
            "scope" : f"folders/{folder_id}",
            "asset_types" : ["cloudresourcemanager.googleapis.com/Project"],
            "query" : "-tagKeys:*",
            "read_mask" : "project,folders,organization,name"
        }
    )

    # Collect the results into a list so the caller can iterate over them again after printing.
    projects = list(response)

    # Loop through each project in the results and print relevant details.
    for resource in projects:
        print(f"Organization: {resource.organization}")
        print(f"Folder: {resource.folders}")
        print(f"Project ID: {resource.project}")
        print(f"Project Name: {resource.name}")

    return projects

client.search_all_resources() - The search_all_resources() method sends a structured request to Cloud Asset Inventory and returns a response containing the matching resources.

Here, we provide the request object that specifies the search parameters:

scope
- Defines the search boundary or where to look for.
- In this case it’s looking within a folder, which is specified by folder_id. This can be modified to search across an organization or project.
asset_types
- This specifies the type of resource to look for.
- Here it’s set to cloudresourcemanager.googleapis.com/Project, so the function will only search for projects.
- The Cloud Asset Inventory documentation lists the supported asset types.
query
- This refines the search so that only projects without tags are returned. It uses a query language that supports a variety of operations and functions.
- The minus sign (-) before tagKeys is the negation operator. It tells Cloud Asset Inventory to find resources that don’t match the condition that follows.
- Examples:
  - -tagKeys:* : Finds resources that do not have any tags assigned.
  - labels.env:production : Finds resources with a label named “env” that has the value “production”.
  - name:my-instance-* : Finds resources with names starting with “my-instance-”.
  - location:us-central1 : Finds resources located in the us-central1 region.
read_mask
- Defines the attributes/metadata we want in the response.
- In this case, we are interested in the project, folders, organization, and name fields.
- This ensures that the output includes all key metadata about the untagged projects.

For each project returned in the response, we then print out the details.
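The four request parameters described above can be assembled by a small helper so the same search code can be reused across scopes. This is a minimal sketch, not part of the client library: build_search_request and search are hypothetical names, the folder ID is a placeholder, and read_mask is passed as a comma-separated string, mirroring the function above.

```python
def build_search_request(scope, asset_types, query, read_mask):
    """Assemble the request dictionary passed to search_all_resources()."""
    # The scope must name an organization, a folder, or a project.
    if not scope.startswith(("organizations/", "folders/", "projects/")):
        raise ValueError(f"unsupported scope: {scope!r}")
    return {
        "scope": scope,
        "asset_types": list(asset_types),
        "query": query,
        "read_mask": ",".join(read_mask),
    }


def search(request):
    """Run the search and return every matching resource as a list."""
    # Imported here so the request builder works without the library installed.
    from google.cloud import asset_v1

    client = asset_v1.AssetServiceClient()
    return list(client.search_all_resources(request=request))


untagged_projects_request = build_search_request(
    scope="folders/123456789012",  # hypothetical folder ID
    asset_types=["cloudresourcemanager.googleapis.com/Project"],
    query="-tagKeys:*",
    read_mask=["project", "folders", "organization", "name"],
)
```

Calling search(untagged_projects_request) requires credentials with permission to search resources in the chosen scope. Changing only the scope argument moves the same search to an organization or a single project.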

5. Function to retrieve resource instances without labels

The Python function below helps identify Compute Engine instances and Cloud Storage buckets within a specific project that haven’t been assigned any labels.

# This function searches for resources (Compute Engine instances and Cloud Storage buckets) in a specific Google Cloud project that do not have any labels assigned.
def list_resources_without_labels(project_id):

    # Send the request to Cloud Asset Inventory using the AssetServiceClient instantiated in a previous step.
    response = client.search_all_resources(
        request={
            # Scope defines the project to search within
            "scope" : f"projects/{project_id}",

            # Specify the asset types: Compute Engine instances and Cloud Storage buckets
            "asset_types" : [
                "compute.googleapis.com/Instance",
                "storage.googleapis.com/Bucket"
            ],

            # Query to find resources without labels
            "query" : "-labels:*",
            "read_mask" : "name,asset_type,project,folders,organization,display_name,location"
        }
    )

    return response

scope
- Is restricted to the project level, specified by projects/{project_id}.
- The search for asset data is performed only within the specified project.
asset_types
- The function searches for the following asset types:
  - "compute.googleapis.com/Instance" → Compute Engine instance
  - "storage.googleapis.com/Bucket" → Cloud Storage bucket
query
- The query “-labels:*” finds resources that do not have any labels attached to them. This is the key to identifying the unlabeled resources.
- The query can be modified to find resources that lack a particular label key by naming that key after “labels.” in the query.
- For example, to find the resources that don’t have a label with the key environment, you would use the following:

```
"query" : "-labels.environment:*"
```
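When several label keys are mandatory, the per-key query can be generated rather than typed by hand. The sketch below uses hypothetical helper names and builds one query per required key, each to be run as a separate search. The key pattern is a simplified assumption about label key syntax, intended only to keep malformed keys out of the query string.

```python
import re

# Simplified assumption: a key starts with a lowercase letter, followed by
# lowercase letters, digits, underscores, or dashes, up to 63 characters.
_LABEL_KEY = re.compile(r"[a-z][a-z0-9_-]{0,62}")


def missing_label_query(key=None):
    """Return the query for resources with no labels, or with no label `key`."""
    if key is None:
        return "-labels:*"
    if not _LABEL_KEY.fullmatch(key):
        raise ValueError(f"invalid label key: {key!r}")
    return f"-labels.{key}:*"


def missing_label_queries(required_keys):
    """Map each required label key to the query that finds resources lacking it."""
    return {key: missing_label_query(key) for key in required_keys}
```

Each generated string can be used as the "query" value in the request shown in the function above.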

Conclusion

We have explored the importance of maintaining consistent tagging and labeling practices in GCP for better resource management, cost tracking and automation. By leveraging Google Cloud Asset Inventory, you can automate the process of identifying resources that lack proper tags or labels and ensure better compliance and organization across your environment.

Effectively managing cloud resources is a continuous journey. The next step is to adapt these automation strategies to your specific needs and environment.

While we demonstrated how to identify missing metadata for projects, Compute Engine instances, and Cloud Storage buckets, you can extend these methods to other asset types.
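Whatever asset types you search for, the results usually need to reach the teams that own the resources. As one possible sketch, the helper below writes search results to CSV. write_report and REPORT_FIELDS are hypothetical names, and the code only assumes that each result exposes the attributes requested in the read_mask (asset_type, name, project, location).

```python
import csv

# Columns of the report; each must be an attribute requested in the read_mask.
REPORT_FIELDS = ["asset_type", "name", "project", "location"]


def write_report(resources, stream):
    """Write one CSV row per resource and return the number of rows written."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(REPORT_FIELDS)
    count = 0
    for resource in resources:
        # Attributes missing from a result are written as empty cells.
        writer.writerow([getattr(resource, field, "") for field in REPORT_FIELDS])
        count += 1
    return count
```

For example, opening a file with open("unlabeled.csv", "w", newline="") and passing it as stream, together with the results of a search, produces a file that can be shared or loaded into a spreadsheet.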

By continuously improving your tagging and labeling processes, you not only gain better control over your cloud resources but also set up your organization for success in cost management, security, and compliance.

Madhu_kumar_v · November 5, 2024, 2:29pm

Iniyaraja · November 8, 2024, 4:27am

Excellent guide on using tags and labels in GCP for better resource management, cost tracking, and automation.

Pragathi · November 8, 2024, 4:41am

Great insights on streamlining resource management in GCP

Nalinibabu · November 8, 2024, 4:54am

Good information about labels and tags
