
Disable Immuta from Sampling Raw Data

To disable the metadata collection that requires sampling data, you must complete all of the following steps:

  1. Disable Immuta fingerprinting.
  2. Stop all data source health checks.
  3. Add the Skip Stats Job tag to all data sources.

Together, these steps ensure that Immuta does not query your data under any circumstances. Without sample data, however, some Immuta features are unavailable: sensitive data discovery (SDD) cannot automatically detect sensitive data in your data sources, and the following masking policies will not work:

  • Format-preserving masking
  • Masking with k-anonymization
  • Masking with randomized response

Advanced Configuration

Fingerprinting

To stop Immuta from running fingerprints on all data sources,

  1. Navigate to the App Settings page, and scroll to the Advanced Configuration section.
  2. Enter the following YAML:

    fingerprints:
      uri: null
      classification:
        enabled: false
    
  3. Click Save.
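To double-check the setting before saving, the following minimal Python sketch mirrors the YAML above as a nested dictionary (the form a YAML parser produces, with `null` becoming `None`) and reports whether fingerprinting is turned off. The function `fingerprinting_disabled` is hypothetical and not part of Immuta; it only inspects a local copy of the configuration.

```python
from typing import Any, Mapping


def fingerprinting_disabled(config: Mapping[str, Any]) -> bool:
    """Return True if the parsed Advanced Configuration turns fingerprinting off.

    Hypothetical helper: assumes `config` is the YAML already parsed into
    nested dicts, so YAML `null` appears as Python None.
    """
    fingerprints = config.get("fingerprints") or {}
    classification = fingerprints.get("classification") or {}
    # Both settings from the YAML must be present:
    # an explicit `uri: null` and `classification.enabled: false`.
    return (
        "uri" in fingerprints
        and fingerprints["uri"] is None
        and classification.get("enabled") is False
    )


# Dict equivalent of the YAML entered in the Advanced Configuration section.
advanced_config = {
    "fingerprints": {
        "uri": None,
        "classification": {"enabled": False},
    }
}

print(fingerprinting_disabled(advanced_config))  # True
print(fingerprinting_disabled({}))  # False
```

Requiring an explicit `uri: null` (rather than a missing key) reflects the YAML above, which sets the value deliberately instead of omitting it.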

Data Source Health Checks

To stop Immuta from running data source health checks on all data sources,

  1. Navigate to the App Settings page, and scroll to the Advanced Configuration section.
  2. Enter the following YAML:

    plugins:
      snowflakeHandler:
        config:
          healthCheckQuery: null
      redshiftHandler:
        config:
          healthCheckQuery: null
      trinoHandler:
        config:
          healthCheckQuery: null
      databricksHandler:
        config:
          healthCheckQuery: null
      asaHandler:
        config:
          healthCheckQuery: null
    
  3. Click Save.
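Because the YAML must list every handler, it is easy to miss one. The following minimal Python sketch, assuming the Advanced Configuration has been parsed into nested dicts (YAML `null` as `None`), lists the handlers from the YAML above whose health check would still run. The function `handlers_with_health_checks` is hypothetical and not an Immuta API.

```python
from typing import Any, List, Mapping

# Handler names taken from the YAML above.
HANDLERS = (
    "snowflakeHandler",
    "redshiftHandler",
    "trinoHandler",
    "databricksHandler",
    "asaHandler",
)


def handlers_with_health_checks(config: Mapping[str, Any]) -> List[str]:
    """List handlers whose healthCheckQuery is not explicitly null.

    Hypothetical helper: assumes `config` is the Advanced Configuration
    YAML already parsed into nested dicts.
    """
    plugins = config.get("plugins") or {}
    remaining = []
    for name in HANDLERS:
        handler_config = (plugins.get(name) or {}).get("config") or {}
        # Only an explicit `healthCheckQuery: null` counts as disabled;
        # a missing key is treated as "health check still runs".
        explicitly_null = (
            "healthCheckQuery" in handler_config
            and handler_config["healthCheckQuery"] is None
        )
        if not explicitly_null:
            remaining.append(name)
    return remaining


# Dict equivalent of the YAML above, with every handler disabled.
advanced_config = {
    "plugins": {name: {"config": {"healthCheckQuery": None}} for name in HANDLERS}
}
print(handlers_with_health_checks(advanced_config))  # []

# Dropping one handler from the YAML leaves its health check in place.
del advanced_config["plugins"]["asaHandler"]
print(handlers_with_health_checks(advanced_config))  # ['asaHandler']
```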

Skip Stats Job Tag

Tag each data source with the seeded Skip Stats Job tag to stop Immuta from collecting a sample and running table stats on the sample. You can tag data sources as you create them in the UI or via the Immuta API.

Note that even without the Skip Stats Job tag, data sources automatically skip the stats job at registration as long as no active policy requires stats. The following policies require stats:

  • Column masking with randomized response
  • Column masking with format-preserving masking
  • Column masking with k-anonymization
  • Column masking with rounding
  • Column masking with reversibility
  • Row minimization
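The rule on this page can be restated as a small decision function. This is a minimal Python sketch under stated assumptions: the function name and the snake_case policy identifiers are hypothetical labels for the policies listed above, not Immuta API values, and the logic only restates this page (the seeded tag stops sampling outright; otherwise the stats job runs at registration only when an active policy needs stats).

```python
from typing import Iterable

SKIP_STATS_TAG = "Skip Stats Job"

# Hypothetical identifiers for the policies listed above that require stats.
STATS_POLICIES = frozenset(
    {
        "randomized_response",
        "format_preserving_masking",
        "k_anonymization",
        "rounding",
        "reversibility",
        "row_minimization",
    }
)


def stats_job_runs_at_registration(
    tags: Iterable[str], active_policies: Iterable[str]
) -> bool:
    """Restate this page's rule for whether a data source is sampled for stats."""
    # The seeded tag stops the sample and the stats job outright.
    if SKIP_STATS_TAG in set(tags):
        return False
    # Without the tag, the job is skipped unless an active policy needs stats.
    return any(policy in STATS_POLICIES for policy in active_policies)


print(stats_job_runs_at_registration([], []))  # False
print(stats_job_runs_at_registration([], ["rounding"]))  # True
print(stats_job_runs_at_registration(["Skip Stats Job"], ["rounding"]))  # False
```

The tag check comes first because, per the steps at the top of this page, tagging every data source is what guarantees no sampling regardless of which policies are active.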