This post gives a high-level overview of the Google Cloud DLP API and shows how to use it from Java to redact sensitive PII. In today's data-driven world, protecting sensitive data is essential for preserving user privacy and preventing security breaches. The Cloud DLP API is a powerful tool for discovering and protecting common kinds of PII such as credit card numbers, phone numbers, email addresses, and street addresses, to name a few.
The sections below walk through calling the DLP service from Java client code to handle sensitive data.
Prerequisites
- A GCP project with billing enabled: this is required before you can use the DLP API.
- The DLP API enabled: you can enable it from the GCP console.
- A Java environment: a working Java Development Kit (JDK 8 or later) and a build tool like Maven or Gradle.
Setting Up Your Java Project
To use the Cloud DLP API, you need to add the official Google Cloud client library to your project. If you’re using Maven, add the following dependency to your pom.xml:
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-dlp</artifactId>
  <version>3.33.0</version>
</dependency>
Setting Up DLP Service Client
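You can create a DlpServiceClient with either default authentication (calling DlpServiceClient.create() with no arguments picks up Application Default Credentials) or an explicit service account JSON key. The snippet below uses a JSON key file.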
// Load the service account key and scope it to the Cloud Platform API.
GoogleCredentials credentials = GoogleCredentials
    .fromStream(new FileInputStream("path-to-json-file"))
    .createScoped(Lists.newArrayList("https://www.googleapis.com/auth/cloud-platform"));
// Hand the credentials to the client through its settings.
DlpServiceSettings settings = DlpServiceSettings.newBuilder()
    .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
    .build();
// try-with-resources closes the client when the block exits.
try (DlpServiceClient dlpServiceClient = DlpServiceClient.create(settings)) {
    ///... other stuff here ...///
}
Setting Up DLP Configuration
This step defines which types of sensitive data the DLP API should look for. InfoType detectors are predefined patterns for common sensitive data types, such as EMAIL_ADDRESS, CREDIT_CARD_NUMBER, or US_SOCIAL_SECURITY_NUMBER. The full list is in the Cloud DLP infoType detector reference. Setting setIncludeQuote(true) asks the API to return the matched text along with each finding.
List<InfoType> infoTypes = Stream.of(
        "PERSON_NAME", "PHONE_NUMBER", "EMAIL_ADDRESS", "STREET_ADDRESS", "CREDIT_CARD_NUMBER")
    .map(it -> InfoType.newBuilder().setName(it).build())
    .collect(Collectors.toList());
InspectConfig inspectConfig = InspectConfig.newBuilder()
    .addAllInfoTypes(infoTypes)
    .setIncludeQuote(true)
    .build();
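Before redacting anything, it can help to see what the detectors actually match. The following is a minimal sketch of an inspection call that reuses the same kind of InspectConfig and prints each finding. The class name InspectSketch, the helper buildRequest, the project ID, and the sample text are all hypothetical, and the client is created with Application Default Credentials rather than a key file.

```java
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.LocationName;
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, used only for this sketch.
final class InspectSketch {

  // Builds the inspection request. No network call happens here.
  static InspectContentRequest buildRequest(String projectId, String text) {
    List<InfoType> infoTypes = Stream.of("PERSON_NAME", "PHONE_NUMBER", "EMAIL_ADDRESS")
        .map(name -> InfoType.newBuilder().setName(name).build())
        .collect(Collectors.toList());

    InspectConfig inspectConfig = InspectConfig.newBuilder()
        .addAllInfoTypes(infoTypes)
        .setIncludeQuote(true) // return the matched text with each finding
        .build();

    return InspectContentRequest.newBuilder()
        .setParent(LocationName.of(projectId, "global").toString())
        .setInspectConfig(inspectConfig)
        .setItem(ContentItem.newBuilder().setValue(text).build())
        .build();
  }

  public static void main(String[] args) throws IOException {
    String projectId = "my-project-id"; // assumption: replace with your own project ID
    String text = "Contact Jane Doe at jane.doe@example.com"; // made-up sample input

    // create() with no arguments relies on Application Default Credentials.
    try (DlpServiceClient client = DlpServiceClient.create()) {
      InspectContentResponse response = client.inspectContent(buildRequest(projectId, text));
      for (Finding finding : response.getResult().getFindingsList()) {
        System.out.println(finding.getInfoType().getName()
            + " (" + finding.getLikelihood() + "): " + finding.getQuote());
      }
    }
  }
}
```

Keeping the request construction in its own method means you can check the configuration without touching the network, and only main needs valid credentials.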
Redacting PII Using the Defined Configuration
DLP not only identifies sensitive data but also lets you act on it by redacting, masking, or tokenizing it. The snippet below builds a de-identification request that redacts every finding. Note that RedactConfig removes the matched text outright; to substitute a placeholder such as [REDACTED], use ReplaceValueConfig instead.
// RedactConfig removes each finding from the text.
RedactConfig redactConfig = RedactConfig.newBuilder().build();
PrimitiveTransformation transformation = PrimitiveTransformation.newBuilder()
    .setRedactConfig(redactConfig)
    .build();
// With no infoTypes listed, the transformation applies to every finding.
InfoTypeTransformation infoTypeTransformation = InfoTypeTransformation.newBuilder()
    .setPrimitiveTransformation(transformation)
    .build();
InfoTypeTransformations transformations = InfoTypeTransformations.newBuilder()
    .addTransformations(infoTypeTransformation)
    .build();
DeidentifyConfig deidentifyConfig = DeidentifyConfig.newBuilder()
    .setInfoTypeTransformations(transformations)
    .build();
// INPUT_TEXT and projectId are assumed to be defined elsewhere in your class.
ContentItem contentItem = ContentItem.newBuilder().setValue(INPUT_TEXT).build();
DeidentifyContentRequest request = DeidentifyContentRequest.newBuilder()
    .setParent(LocationName.of(projectId, "global").toString())
    .setInspectConfig(inspectConfig)
    .setDeidentifyConfig(deidentifyConfig)
    .setItem(contentItem)
    .build();
DeidentifyContentResponse response = dlpServiceClient.deidentifyContent(request);
System.out.println("Original Text: " + INPUT_TEXT);
System.out.println("Redacted Text: " + response.getItem().getValue());
The Google Cloud DLP API is an essential tool for any developer working with potentially sensitive data on GCP. With just a few lines of Java, you can build powerful inspection and de-identification capabilities directly into your applications, helping you stay secure and compliant.
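RedactConfig, as used in the redaction snippet, removes matched text rather than substituting anything for it. If you would rather see a visible marker such as [REDACTED] in the output, one option is ReplaceValueConfig. Here is a minimal sketch of that variant; the class name PlaceholderSketch and the helper placeholderConfig are hypothetical, and only the configuration is built here, with no API call.

```java
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.PrimitiveTransformation;
import com.google.privacy.dlp.v2.ReplaceValueConfig;
import com.google.privacy.dlp.v2.Value;

// Hypothetical class name, used only for this sketch.
final class PlaceholderSketch {

  // Builds a de-identification config that swaps every finding for the given placeholder.
  static DeidentifyConfig placeholderConfig(String placeholder) {
    ReplaceValueConfig replaceConfig = ReplaceValueConfig.newBuilder()
        .setNewValue(Value.newBuilder().setStringValue(placeholder).build())
        .build();

    PrimitiveTransformation transformation = PrimitiveTransformation.newBuilder()
        .setReplaceConfig(replaceConfig)
        .build();

    // No infoTypes are listed, so the transformation applies to all findings.
    InfoTypeTransformation infoTypeTransformation = InfoTypeTransformation.newBuilder()
        .setPrimitiveTransformation(transformation)
        .build();

    return DeidentifyConfig.newBuilder()
        .setInfoTypeTransformations(
            InfoTypeTransformations.newBuilder()
                .addTransformations(infoTypeTransformation)
                .build())
        .build();
  }

  public static void main(String[] args) {
    // Prints the config in protobuf text format; nothing is sent to the API.
    System.out.println(placeholderConfig("[REDACTED]"));
  }
}
```

To use it, pass the result to setDeidentifyConfig(...) on the DeidentifyContentRequest builder in place of the RedactConfig-based configuration.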
From here, you can explore more advanced features like:
- Inspecting data in Cloud Storage, BigQuery, and Datastore.
- Creating custom InfoType detectors for your specific data patterns.
- Using more complex de-identification techniques like masking and tokenization.
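As a starting point for two of these, here is a rough sketch of a regex-based custom InfoType detector and a character-masking configuration. The EMPLOYEE_ID detector name, its EMP-123456 style pattern, and the class and method names are all made up for illustration; adjust them to your own data. Only the configuration objects are built, with no API call.

```java
import com.google.privacy.dlp.v2.CharacterMaskConfig;
import com.google.privacy.dlp.v2.CustomInfoType;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.Likelihood;
import com.google.privacy.dlp.v2.PrimitiveTransformation;

// Hypothetical class name, used only for this sketch.
final class AdvancedSketch {

  // Custom detector: matches a made-up employee ID format such as EMP-123456.
  static InspectConfig customInspectConfig() {
    CustomInfoType employeeId = CustomInfoType.newBuilder()
        .setInfoType(InfoType.newBuilder().setName("EMPLOYEE_ID").build())
        .setRegex(CustomInfoType.Regex.newBuilder().setPattern("EMP-\\d{6}").build())
        .setLikelihood(Likelihood.LIKELY) // likelihood reported for each regex match
        .build();

    return InspectConfig.newBuilder()
        .addCustomInfoTypes(employeeId)
        .build();
  }

  // Masking: replaces the characters of each finding with the given mask character.
  static DeidentifyConfig maskingConfig(String maskingCharacter) {
    // numberToMask is left unset, so every character of a finding is masked.
    CharacterMaskConfig maskConfig = CharacterMaskConfig.newBuilder()
        .setMaskingCharacter(maskingCharacter)
        .build();

    PrimitiveTransformation transformation = PrimitiveTransformation.newBuilder()
        .setCharacterMaskConfig(maskConfig)
        .build();

    InfoTypeTransformation infoTypeTransformation = InfoTypeTransformation.newBuilder()
        .setPrimitiveTransformation(transformation)
        .build();

    return DeidentifyConfig.newBuilder()
        .setInfoTypeTransformations(
            InfoTypeTransformations.newBuilder()
                .addTransformations(infoTypeTransformation)
                .build())
        .build();
  }

  public static void main(String[] args) {
    // Prints both configs in protobuf text format; nothing is sent to the API.
    System.out.println(customInspectConfig());
    System.out.println(maskingConfig("#"));
  }
}
```

Both objects plug into the same DeidentifyContentRequest builder shown earlier, through setInspectConfig(...) and setDeidentifyConfig(...).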
Happy coding, and stay secure!