SI-19: De-identification

Baselines:

  • Low

    N/A

  • Moderate

    N/A

  • High

    N/A

  • Privacy
    • SI-19

Control is new to this version of the control set.

Control Statement

  1. Remove the following elements of personally identifiable information from datasets: [Assignment: organization-defined elements of personally identifiable information]; and
  2. Evaluate [Assignment: organization-defined frequency] for effectiveness of de-identification.
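The two parts of the control statement can be sketched in Python. This is a minimal illustration that assumes records are plain dictionaries and that the organization-defined elements arrive as a set of field names. `PII_ELEMENTS`, `remove_pii_elements`, and `residual_pii_fields` are hypothetical names, not part of any standard API.

```python
from typing import Iterable

# Hypothetical organization-defined elements of PII (the [Assignment] in item 1).
PII_ELEMENTS = frozenset({"name", "ssn", "date_of_birth", "place_of_birth"})


def remove_pii_elements(
    records: Iterable[dict], elements: frozenset = PII_ELEMENTS
) -> list[dict]:
    """Return copies of the records with the configured PII fields removed."""
    return [
        {key: value for key, value in record.items() if key not in elements}
        for record in records
    ]


def residual_pii_fields(
    records: Iterable[dict], elements: frozenset = PII_ELEMENTS
) -> set[str]:
    """One piece of a periodic check (item 2): report configured PII fields still present."""
    found: set[str] = set()
    for record in records:
        # Iterating a dict yields its keys, so this intersects with the field names.
        found.update(elements.intersection(record))
    return found
```

Removing fields alone is rarely sufficient. Remaining attributes such as ZIP code and age can still single people out, which is why item 2 calls for evaluating effectiveness over time rather than only checking that the listed fields are gone.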

Supplemental Guidance

De-identification is the general term for the process of removing the association between a set of identifying data and the data subject. Many datasets contain information about individuals that can be used to distinguish or trace an individual's identity, such as name, Social Security number, date and place of birth, mother's maiden name, or biometric records. Datasets may also contain other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information. Personally identifiable information is removed from datasets by trained individuals when such information is not (or is no longer) necessary to satisfy the requirements envisioned for the data. For example, if the dataset is only used to produce aggregate statistics, the identifiers that are not needed for producing those statistics are removed. Removing identifiers improves privacy protection because information that has been removed cannot be inadvertently disclosed or improperly used. Organizations may be subject to specific de-identification definitions or methods under applicable laws, regulations, or policies.

Re-identification is a residual risk with de-identified data. Re-identification attacks vary. For example, they may combine the de-identified data with new datasets or exploit other improvements in data analytics. Maintaining awareness of potential attacks and evaluating the effectiveness of de-identification over time support the management of this residual risk.
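One common way to evaluate residual re-identification risk is to measure k-anonymity. Records are grouped by quasi-identifiers, meaning attributes such as ZIP code or birth year that are not direct identifiers but can be linked to outside data, and the size of the smallest group is k. The sketch below assumes dictionary records and a hypothetical choice of quasi-identifier columns and threshold. k-anonymity is only one of several measures and does not stop every attack; for example, it gives no protection when everyone in a group shares the same sensitive value.

```python
from collections import Counter
from typing import Iterable, Sequence


def equivalence_class_sizes(
    records: Iterable[dict], quasi_identifiers: Sequence[str]
) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r.get(q) for q in quasi_identifiers) for r in records)


def k_anonymity(records: Iterable[dict], quasi_identifiers: Sequence[str]) -> int:
    """Return k, the size of the smallest equivalence class (0 for an empty dataset)."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values(), default=0)


def at_risk_classes(
    records: Iterable[dict], quasi_identifiers: Sequence[str], k_threshold: int = 5
) -> dict:
    """Return equivalence classes smaller than a hypothetical organization-chosen threshold."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return {key: n for key, n in sizes.items() if n < k_threshold}
```

Running such a check on each scheduled evaluation, and again whenever new outside datasets become available, is one way to track whether the residual risk has grown.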

Control Enhancements

SI-19(1): Collection

Baseline(s):

(Not part of any baseline)

De-identify the dataset upon collection by not collecting personally identifiable information.
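De-identifying at collection means the intake path simply has no place to put personally identifiable information. A minimal sketch, assuming submissions arrive as dictionaries: the hypothetical `SurveyResponse` schema defines only the non-identifying fields that are needed, and the hypothetical `collect` function rejects any submission that carries other fields instead of storing them. The field names are illustrative only.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SurveyResponse:
    """Hypothetical intake schema: only non-identifying fields are defined."""

    age_band: str
    region: str
    satisfaction: int


ALLOWED_FIELDS = frozenset(f.name for f in fields(SurveyResponse))


def collect(submission: dict) -> SurveyResponse:
    """Build a record from a submission, refusing any field outside the schema."""
    unexpected = set(submission) - ALLOWED_FIELDS
    if unexpected:
        # Rejecting (rather than silently dropping) makes upstream forms
        # that still ask for PII visible.
        raise ValueError(f"fields not collected by design: {sorted(unexpected)}")
    return SurveyResponse(**submission)
```

Whether to reject or silently discard unexpected fields is a design choice. Rejecting surfaces misconfigured forms early, while discarding is more forgiving but can hide the fact that PII is still being requested from individuals.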

SI-19(2): Archiving

Baseline(s):

(Not part of any baseline)

Prohibit archiving of personally identifiable information elements if those elements in a dataset will not be needed after the dataset is archived.
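A prohibition is easiest to enforce as a check inside the archiving path itself. The sketch below assumes a hypothetical policy that lists which fields count as PII (`PII_ELEMENTS`) and which of them are still needed after archiving (`NEEDED_AFTER_ARCHIVE`). The hypothetical `archive` function refuses to write a dataset that carries any other PII element. Writing JSON Lines to a local path stands in for whatever archive store an organization actually uses.

```python
import json
from pathlib import Path

# Hypothetical policy inputs.
PII_ELEMENTS = frozenset({"name", "ssn", "email", "date_of_birth"})
NEEDED_AFTER_ARCHIVE = frozenset({"date_of_birth"})


class ArchivePolicyError(Exception):
    """Raised when a dataset would archive PII that is no longer needed."""


def prohibited_elements(records: list[dict]) -> set[str]:
    """Return PII fields present in the records that are not needed after archiving."""
    present: set[str] = set()
    for record in records:
        present.update(PII_ELEMENTS.intersection(record))
    return present - NEEDED_AFTER_ARCHIVE


def archive(records: list[dict], path: Path) -> None:
    """Write records as JSON Lines only if no prohibited PII element is present."""
    blocked = prohibited_elements(records)
    if blocked:
        # Fail before opening the file so nothing partial is written.
        raise ArchivePolicyError(f"remove before archiving: {sorted(blocked)}")
    with path.open("w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")
```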

SI-19(3): Release

Baseline(s):

(Not part of any baseline)

Remove personally identifiable information elements from a dataset prior to its release if those elements in the dataset do not need to be part of the data release.
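For a release, an allowlist of the fields the release actually needs fails safe: a column added to the source data later stays out of the release until someone decides it belongs there. A minimal sketch, assuming dictionary records; `prepare_release` and the field names in the usage are hypothetical. The function also returns the set of withheld fields so that a reviewer can confirm the release contains only what it should.

```python
from typing import Iterable


def prepare_release(
    records: Iterable[dict], release_fields: Iterable[str]
) -> tuple[list[dict], set[str]]:
    """Project records onto the fields approved for release.

    Returns the released records and the set of fields that were withheld.
    """
    allowed = frozenset(release_fields)
    released: list[dict] = []
    withheld: set[str] = set()
    for record in records:
        released.append({k: v for k, v in record.items() if k in allowed})
        withheld.update(k for k in record if k not in allowed)
    return released, withheld
```

For example, `prepare_release(rows, ["county", "visits"])` keeps only those two columns and reports every other column it saw, including any PII column nobody expected to be present.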

SI-19(5): Statistical Disclosure Control

Baseline(s):

(Not part of any baseline)

Manipulate numerical data, contingency tables, and statistical findings so that no individual or organization is identifiable in the results of the analysis.
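Cell suppression is one common statistical disclosure control technique for contingency tables: counts below a threshold are withheld because a small cell can point to a specific person. The sketch assumes dictionary records and a threshold of 5, a value chosen here only for illustration; organizations set their own rules. It applies primary suppression only. When row or column totals are also published, a suppressed cell can often be recovered by subtraction, so real releases typically add complementary suppression or other methods such as rounding or noise.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence

SUPPRESSION_THRESHOLD = 5  # illustrative assumption, not a mandated value


def contingency_table(records: Iterable[dict], columns: Sequence[str]) -> Counter:
    """Cross-tabulate records by the given categorical columns."""
    return Counter(tuple(r[c] for c in columns) for r in records)


def suppress_small_cells(
    table: Counter, threshold: int = SUPPRESSION_THRESHOLD
) -> dict[tuple, Optional[int]]:
    """Replace any count below the threshold with None (reported as suppressed)."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}
```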

SI-19(6): Differential Privacy

Baseline(s):

(Not part of any baseline)

Prevent disclosure of personally identifiable information by adding non-deterministic noise to the results of mathematical operations before the results are reported.
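The Laplace mechanism is the textbook way to add such noise to a numeric query. If a query's result changes by at most Δ (its sensitivity) when one person's record is added or removed, adding noise drawn from a Laplace distribution with scale Δ/ε gives ε-differential privacy. The sketch below applies it to a counting query, where Δ = 1. The ε value is an assumption to be set by policy, and `dp_count` is a hypothetical name. This naive floating-point sampler is for illustration only: floating-point implementations of the Laplace mechanism are known to leak information, so production systems should use a vetted differential privacy library.

```python
import random
from typing import Callable, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    # random.expovariate takes a rate, so rate = 1 / scale gives mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(
    records: Iterable[dict],
    predicate: Callable[[dict], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A count has sensitivity 1 (one person's record changes it by at most 1),
    so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        # OS-backed randomness by default; pass a seeded Random only for testing.
        rng = random.SystemRandom()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Smaller ε means more noise and stronger protection. Each released answer consumes privacy budget, so repeated queries against the same data need their ε values accounted for together.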

SI-19(8): Motivated Intruder

Baseline(s):

(Not part of any baseline)

Perform a motivated intruder test on the de-identified dataset to determine whether identified data remains or whether the de-identified data can be re-identified.
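Part of a motivated intruder test can be automated as a linkage attack. The de-identified dataset is joined to an auxiliary dataset that an intruder could plausibly obtain, such as a public register, on shared quasi-identifiers, and any record that maps to exactly one named individual is flagged. A minimal sketch, assuming both datasets are lists of dictionaries; `linkage_attack` and the `auxiliary` data are hypothetical stand-ins. A real motivated intruder test also involves people trying other sources and lines of reasoning, which code like this does not capture.

```python
from collections import Counter, defaultdict
from typing import Sequence


def linkage_attack(
    deidentified: list[dict],
    auxiliary: list[dict],
    quasi_identifiers: Sequence[str],
    identity_field: str = "name",
) -> dict[int, str]:
    """Return {row index in deidentified: identity} for rows that can be re-identified.

    A row counts as re-identified when its quasi-identifier values are unique
    within the de-identified data and match exactly one person in the
    auxiliary data.
    """

    def key(record: dict) -> tuple:
        return tuple(record.get(q) for q in quasi_identifiers)

    # Who in the auxiliary data has each quasi-identifier combination.
    candidates: defaultdict = defaultdict(set)
    for person in auxiliary:
        candidates[key(person)].add(person[identity_field])

    row_counts = Counter(key(row) for row in deidentified)

    reidentified: dict[int, str] = {}
    for index, row in enumerate(deidentified):
        matches = candidates.get(key(row), set())
        if row_counts[key(row)] == 1 and len(matches) == 1:
            reidentified[index] = next(iter(matches))
    return reidentified
```

A non-empty result means the de-identification failed the test for those rows. An empty result is weaker evidence, since it reflects only the auxiliary data that was tried.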