PII Pseudonymization Enrichment
Summary¶
The PII Enrichment helps Snowplow users protect the privacy rights of data subjects, which aids compliance with regulatory measures.
Overview¶
As more regulation is introduced worldwide to protect individuals with regard to the behavioural and personal data that is collected, stored and processed about them, Snowplow wants to give our users more control over how that data is handled.
This enrichment builds on the ability to pseudonymize certain fields collected using Snowplow trackers. Its configuration specifies which fields to hash, along with other settings related to the hashing itself.
To read more detail on this enrichment go here.
For help setting up this enrichment for your pipeline, please contact us at support@snowplowanalytics.com.
Example¶
{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/2-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "pii_enrichment_config",
    "emitEvent": true,
    "enabled": true,
    "parameters": {
      "pii": [
        {
          "pojo": {
            "field": "user_id"
          }
        },
        {
          "pojo": {
            "field": "user_fingerprint"
          }
        },
        {
          "json": {
            "field": "unstruct_event",
            "schemaCriterion": "iglu:com.mailchimp/subscribe/jsonschema/1-*-*",
            "jsonPath": "$.data.['email', 'ip_opt']"
          }
        }
      ],
      "strategy": {
        "pseudonymize": {
          "hashFunction": "SHA-1",
          "salt": "pepper123"
        }
      }
    }
  }
}
The configuration above is for a Snowplow pipeline that is receiving events from the Snowplow JavaScript Tracker, plus a Mailchimp webhook integration:
The Snowplow JavaScript Tracker has been configured to emit events which include the user_id and user_fingerprint fields
The Mailchimp webhook (available since release 0.9.11) is emitting subscribe events (among other events, ignored for the purpose of this example)
With the above PII Enrichment configuration, then, you are specifying that:
You wish for the user_id and user_fingerprint fields from the Snowplow canonical event model to be hashed (the full list of supported fields for pseudonymization is viewable in the enrichment configuration schema)
You wish for the data.email and data.ip_opt fields from the Mailchimp subscribe event to be hashed, but only if the schema version begins with 1-
You wish to use the SHA-1 variant of the algorithm for the pseudonymization
You wish for the re-identification events to be emitted to the pii stream (see stream enrich configuration for configuring the stream)
You wish for the salt value pepper123 to be used in hashing all the values
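To make the effect of these settings more concrete, here is a minimal Python sketch of salted hashing and of the schema version check. It is an illustration only, not the enrichment's actual implementation. The function names pseudonymize and matches_criterion are hypothetical, and the sketch assumes that the salt is appended to the field value before hashing and that the digest is written out as a hex string; the real enrichment may combine or encode them differently.

```python
import hashlib


def pseudonymize(value: str, salt: str, hash_function: str = "SHA-1") -> str:
    """Hash a field value together with a salt and return the hex digest.

    Assumption: the salt is appended to the value (value + salt).
    """
    # Map the configuration name (e.g. "SHA-1") to a hashlib name ("sha1").
    algorithm = hash_function.replace("-", "").lower()
    return hashlib.new(algorithm, (value + salt).encode("utf-8")).hexdigest()


def matches_criterion(schema_uri: str, criterion: str) -> bool:
    """Check an Iglu schema URI against a criterion such as ".../1-*-*".

    A "*" in the criterion's version matches any number in that position.
    """
    *uri_prefix, uri_version = schema_uri.split("/")
    *crit_prefix, crit_version = criterion.split("/")
    if uri_prefix != crit_prefix:
        return False
    uri_parts = uri_version.split("-")
    crit_parts = crit_version.split("-")
    if len(uri_parts) != len(crit_parts):
        return False
    return all(c == "*" or c == u for u, c in zip(uri_parts, crit_parts))


if __name__ == "__main__":
    criterion = "iglu:com.mailchimp/subscribe/jsonschema/1-*-*"
    # The same input and salt always produce the same pseudonym.
    print(pseudonymize("jane.doe@example.com", "pepper123"))
    # Only schema versions starting with 1- qualify for hashing.
    print(matches_criterion("iglu:com.mailchimp/subscribe/jsonschema/1-0-2", criterion))
    print(matches_criterion("iglu:com.mailchimp/subscribe/jsonschema/2-0-0", criterion))
```

Because the hash is deterministic for a given salt, the same user_id always maps to the same pseudonym, so events can still be joined on the hashed value without exposing the original.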