Google Cloud Data Loss Prevention Extension

Version: 1.3.0

Use this extension to obscure sensitive data from content and images. For example, you might redact credit card numbers, names, and social security numbers.

Prerequisites

This content provides reference for configuring and using this extension. Before using the extension from an API proxy using the ExtensionCallout policy, you must:

  1. Enable the Google Cloud DLP API for your project.

  2. Grant permission for the level of access you'll want for the extension.

  3. Use the GCP Console to generate a key for the service account.

  4. Use the contents of the resulting key JSON file when adding and configuring the extension using the configuration reference.

About Cloud Data Loss Prevention (DLP)

Cloud Data Loss Prevention (DLP) is an API for inspecting text, images, and other data to identify and manage sensitive data.

For more, see the DLP overview. For reference to the API that this extension exposes, see Cloud Data Loss Prevention (DLP) API.

Samples

The following examples illustrate how to configure support for Cloud DLP extension actions using the ExtensionCallout policy.

To make trying this sample code easier, these examples use an AssignMessage policy to set flow variable values and to retrieve extension response values for display in the Trace tool.

Mask with stars

This example uses the deidentifyWithMask action to mask the specified types of text with a character specified in the policy -- here, the * character.

The following AssignMessage policy sets the request.content variable for illustration purposes. Ordinarily, you'd retrieve the request content from the client's request.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage async="false" continueOnError="false" enabled="true" name="Set-Variable">
    <DisplayName>Set Variable</DisplayName>
    <AssignTo type="response" createNew="false"/>
    <AssignVariable>
        <Name>request.content</Name>
        <Value>Visit my site at https://example.com. Or contact me at gladys@example.com.</Value>
    </AssignVariable>
</AssignMessage>

The following ExtensionCallout policy retrieves the request.content variable value and passes it to a Cloud DLP extension (here, called example-dlp). That extension has been configured to mask values based on the URL and EMAIL_ADDRESS infoTypes.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ConnectorCallout async="false" continueOnError="true" enabled="true" name="Data-Loss-Extension-Callout">
    <DisplayName>Data Loss Prevention Extension Callout</DisplayName>
    <Connector>example-dlp</Connector>
    <Action>deidentifyWithMask</Action>
    <Input><![CDATA[{
        "text" : "{request.content}",
        "mask" : "*"
    }]]></Input>
    <Output>masked.output</Output>
</ConnectorCallout>

The following AssignMessage policy retrieves the extension's output for display in the Trace tool.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage async="false" continueOnError="false" enabled="true" name="Get-DLP-Output">
    <DisplayName>Get DLP Output</DisplayName>
    <AssignTo type="response" createNew="false"/>
    <Set>
        <Payload contentType="application/json">{masked.output}</Payload>
    </Set>
</AssignMessage>

The following is an example of output from this code.

{"text":"Visit my site at ******************* Or contact me at *****************."}

Mask with name

This example uses the deidentifyWithType action to mask the specified types of text with the infotype name itself. For example, it would replace the email address gladys@example.com with EMAIL_ADDRESS.

The following AssignMessage policy sets the request.content variable for illustration purposes. Ordinarily, you'd retrieve the request content from the client's request.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage async="false" continueOnError="false" enabled="true" name="Set-Variable">
    <DisplayName>Set Variable</DisplayName>
    <AssignTo type="response" createNew="false"/>
    <AssignVariable>
        <Name>request.content</Name>
        <Value>Visit my site at https://example.com. Or contact me at gladys@example.com.</Value>
    </AssignVariable>
</AssignMessage>

The following ExtensionCallout policy retrieves the request.content variable value and passes it to a Cloud DLP extension (here, called example-dlp). That extension has been configured to mask values based on the URL and EMAIL_ADDRESS infoTypes.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ConnectorCallout async="false" continueOnError="true" enabled="true" name="Data-Loss-Extension-Callout">
    <DisplayName>Data Loss Prevention Extension Callout</DisplayName>
    <Connector>example-dlp</Connector>
    <Action>deidentifyWithType</Action>
    <Input><![CDATA[{
        "text" : "{request.content}"
    }]]></Input>
    <Output>masked.output</Output>
</ConnectorCallout>

The following AssignMessage policy retrieves the extension's output for display in the Trace tool.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<AssignMessage async="false" continueOnError="false" enabled="true" name="Get-DLP-Output">
    <DisplayName>Get DLP Output</DisplayName>
    <AssignTo type="response" createNew="false"/>
    <Set>
        <Payload contentType="application/json">{masked.output}</Payload>
    </Set>
</AssignMessage>

The following is an example of output from this code.

{"text":"Visit my site at [URL] Or contact me at [EMAIL_ADDRESS]."}

Actions

deidentifyWithMask

De-identify the sensitive data from text and mask the data with the mask character. This action masks those parts of text specified by the infoTypes property in extension configuration.

Masking sensitive data replaces characters with a symbol, such as an asterisk (*) or hash (#). The types of the sensitive data can be configured in extension configuration.

Syntax

<Action>deidentifyWithMask</Action>
<Input><![CDATA[{
  "text" : "text-to-deidentify",
  "mask" : "masking-character"
}]]></Input>

Example

In the following example, the input to mask is an email address stored in an input.email.address flow variable. To support this specific example, this extension must have been configured to support the EMAIL_ADDRESS infoType. For a list of infoTypes, see InfoType Detector Reference.

<Action>deidentifyWithMask</Action>
<Input><![CDATA[{
    "text" : "{input.email.address}",
    "mask" : "*"
}]]></Input>
<Output>masked.output</Output>

The output for this example would be the following:

{"text":"*****************"}

Request parameters

Parameter Description Type Default Required
text Text to de-identify. String None. Yes.
mask Character to use to mask sensitive data. String None. Yes.

Response

The input text with values of the specified infoTypes replaced with the specified character. For example,

{"text":"*********"}

deidentifyWithTemplate

De-identify sensitive data in text content using a template that configures what text to de-identify and how to handle it.

Templates are useful for decoupling configuration such as what you inspect for and how you de-identify it from the implementation of your API calls. Templates provide a way to re-use configuration and enable consistency across users and data sets.

In your template, you'll specify infoTypes representing the content to de-identify. For a list of infoTypes, see InfoType detector reference. De-identifying masks those parts of text specified by the infoTypes property in the template.

Syntax

<Action>deidentifyWithTemplate</Action>
<Input><![CDATA[{
  "text" : "text-to-deidentify"
  "templateName" : "path-to-template"
}]]></Input>

Example

In the following example, the input to de-identify is the request body carried by the request.content flow variable.

<Action>deidentifyWithTemplate</Action>
<Input><![CDATA[{
    "text" : "{request.content}"
    "templateName" : "projects/[PROJECT_ID]/deidentifyTemplates/1231258663242"
}]]></Input>

The output for this example would be the de-identified request content.

Request parameters

Parameter Description Type Default Required
text The text to de-identify. This is what the de-identification process operates on. Object None. Yes.
templateName The template to use. This will be a path to the template in the following form: projects or organizations/PROJECT_ID/deidentifyTemplates/TEMPLATE_ID. When you create the template with the Google API, use the response's name property value as the templateName. String None. Yes.

Response

The input text with values of the specified infoTypes replaced with the infoType names.

deidentifyWithType

De-identify sensitive data in text content, replacing each matched value with the name of the infoType. For a list of infoTypes, see InfoType detector reference. This action masks those parts of text specified by the infoTypes property in extension configuration.

In the following example, the phone number is recognized by the service, then replaced with the name of the infoType itself.

  • Input text:

    John Smith, 123 Main St, Seattle, WA 98122, 206-555-0123.

  • Result text:

    John Smith, 123 Main St, Seattle, WA 98122, PHONE_NUMBER.

Syntax

<Action>deidentifyWithType</Action>
<Input><![CDATA[{
  "text" : "text-to-deidentify"
}]]></Input>

Example

In the following example, the input to mask is an email address stored in an input.email.address flow variable. To support this specific example, this extension must have been configured to support the EMAIL_ADDRESS infoType. For a list of infoTypes, see InfoType Detector Reference.

<Action>deidentifyWithType</Action>
<Input><![CDATA[{
    "text" : "{input.email.address}"
}]]></Input>

The output for this example would be the following:

{"text":"EMAIL_ADDRESS"}

Request parameters

Parameter Description Type Default Required
text The text to de-identify. String None. Yes.

Response

The input text with values of the specified infoTypes replaced with the infoType names. For example,

{"text":"EMAIL_ADDRESS"}

redactImage

Redact text that falls into one of the infoType categories. The redacted content is detected and obscured with an opaque rectangle. This action masks those parts of image_data specified by the infoTypes property in extension configuration.

For a list of infoTypes, see InfoType detector reference.

Request parameters

<Action>redactImage</Action>
<Input><![CDATA[{
  "image_data" : "base64-encoded-image-to-analyze",
  "image_type" : "type-of-image"
}]]></Input>
Parameter Description Type Default Required
image_data The image data encoded in base64. String None. Yes.
image_type Constant of the image type. Available values are IMAGE_JPEG, IMAGE_BMP, IMAGE_PNG, IMAGE_SVG. String None. Yes.

Response

The image with text redacted.

Configuration Reference

Use the following when you're configuring and deploying this extension for use in API proxies. For steps to configure an extension using the Apigee console, see Adding and configuring an extension.

Common extension properties

The following properties are present for every extension.

Property Description Default Required
name Name you're giving this configuration of the extension. None. Yes.
packageName Name of the extension package as given by Apigee Edge. None. Yes.
version Version number for the extension package from which you're configuring an extension. None. Yes.
configuration Configuration value specific to the extension you're adding. See Properties for this extension package None. Yes.

Properties for this extension package

Specify values for the following configuration properties specific to this extension.

Property Description Default Required
projectId The GCP project ID for which the Cloud Data Loss Prevention API is enabled. None. Yes.
infoTypes Info types of the sensitive data. If omitted, the service will detect all built-in types. For a list of infoTypes supported by the Google Cloud DLP service, see InfoType Detector Reference. None. No.
credentials When entered in the Apigee Edge console, this is the contents of your service account key file. When sent via the management API, it is a base64-encoded value generated from the service account key file. None. Yes.