Google BigQuery Integration
Private preview
This integration is available to select accounts. Reach out to your Immuta representative for details.
Getting started with the Google BigQuery integration
The Google BigQuery integration allows users to query policy-protected data directly in BigQuery through secure views within an Immuta-created dataset. Immuta controls who can see what within the views, allowing data governors to create complex attribute-based access control (ABAC) policies and data users to query the right data in the BigQuery console.
Configuration
The Google BigQuery integration is configured through the Immuta console and a script provided by Immuta. While you can complete some steps in the Google Cloud console, the easiest approach is to use the gcloud CLI and the Immuta script.
- Create a custom role and grant it to a Google Cloud service account that Immuta will use as its system account.
- Enable the integration in the Immuta console.
Protect your data
Once the Google BigQuery integration has been configured, BigQuery admins can start creating subscription and data policies to meet compliance requirements, and users can start querying policy-protected data directly in BigQuery.
- Create a global subscription or supported data policy.
- Register your BigQuery tables and views in Immuta as data sources.
- Revoke user access to the original datasets and grant users access to the Immuta-created datasets in BigQuery.
- Users query data from the Immuta-created datasets directly in BigQuery.
FAQs
- What permissions will Immuta have in my BigQuery environment?
  - The permissions granted to the custom Immuta role are listed under Create a service account and role by using Google Cloud console.
- What integration features will Immuta support for BigQuery?
  - For private preview, Immuta supports a basic version of the BigQuery integration. In this version, Immuta enforces the policies listed under Supported policies on data in a single BigQuery project. At this time, workspaces, tag ingestion, user impersonation, native query audit, and multiple integrations are not supported.
Google BigQuery integration conceptual overview
In this policy push integration, Immuta creates views that contain all policy logic. Each view has a 1-to-1 relationship with the original table. Access controls are applied in the view, allowing customers to leverage Immuta’s powerful set of attribute-based policies and query data directly in BigQuery.
BigQuery is organized by projects (which can be thought of as databases), datasets (which can be compared to schemas), tables, and views. When you enable the integration, an Immuta dataset is created in BigQuery that contains the Immuta-required user entitlements information. These objects within the Immuta dataset are intended to only be used and altered by the Immuta application.
After data sources are registered, Immuta uses the custom service account and role, which are created before the integration is enabled, to push the Immuta data sources as views into a dataset that mirrors the original table's dataset. Immuta manages grants on the created views to ensure that only users subscribed to the Immuta data source can see the data.
Secure views
The Immuta integration uses a mirrored dataset approach. For example, suppose the source dataset is named mydataset and _secure is the Dataset Suffix configured for the integration. In that case, Immuta will create a dataset named mydataset_secure. This mirrored dataset is an authorized dataset, which allows it to access the data of the original dataset. It contains the Immuta-managed views, and each view has the same name as the original table it is based on.
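The naming convention above can be expressed as a small shell helper, which is convenient when you script queries against the secure views. This is a sketch under stated assumptions. The function name secure_view_ref is hypothetical, and so are the example project, dataset and table names. The _secure default only mirrors the example suffix above, so pass the Dataset Suffix you actually configure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Immuta): builds the fully qualified name of
# the Immuta-managed view that mirrors PROJECT.DATASET.TABLE.
# Mirrored dataset = source dataset name + Dataset Suffix; the view keeps the
# original table name.
secure_view_ref() {
  local project="$1" dataset="$2" table="$3"
  local suffix="${4:-_secure}"  # example suffix from the text; pass your own
  printf '%s.%s%s.%s\n' "$project" "$dataset" "$suffix" "$table"
}

# Example usage (requires the bq CLI and access to the mirrored dataset):
#   bq query --use_legacy_sql=false \
#     "SELECT * FROM \`$(secure_view_ref my-project mydataset customers)\` LIMIT 10"
```

For example, secure_view_ref my-project mydataset customers prints my-project.mydataset_secure.customers.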
Managing access
Following the principle of least privilege, Immuta does not have permission to manage Google Cloud Platform users' access. In particular, it cannot grant or deny access to a project and its datasets. Data governors should therefore restrict user access to the original datasets, so that data users access data through the Immuta-created views rather than the backing tables. Only the credentials used to register the tables in Immuta need access to the backing tables.
Additionally, a data governor must grant users access to the mirrored datasets that Immuta creates and populates with views. The best practice recommended by Immuta and BigQuery is to grant this access through groups in Google Cloud Platform. Users must still be registered in Immuta and subscribed to an Immuta data source to query Immuta views. For that reason, all Immuta users can be granted access to the mirrored datasets that Immuta creates.
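One way to carry out both steps is with BigQuery's SQL data control language, using GRANT and REVOKE on a dataset, which BigQuery SQL calls a SCHEMA. The sketch below only prints the statements so they can be reviewed before anything is run. The helper name dataset_access_sql, the example group, and the choice of roles/bigquery.dataViewer are all illustrative assumptions. REVOKE only removes a dataset-level grant of that one role. Access inherited from project-level IAM roles, such as roles/bigquery.dataViewer granted on the whole project, must be removed separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints BigQuery DCL statements that revoke a group's
# read access to an original dataset and grant it read access to the
# Immuta-created mirrored dataset (source dataset name + Dataset Suffix).
dataset_access_sql() {
  local project="$1" dataset="$2" group="$3"
  local suffix="${4:-_secure}"  # assumption: replace with your Dataset Suffix
  # Remove the dataset-level grant on the backing tables.
  printf 'REVOKE `roles/bigquery.dataViewer` ON SCHEMA `%s`.%s FROM "group:%s";\n' \
    "$project" "$dataset" "$group"
  # Grant read access to the mirrored dataset that holds the Immuta views.
  printf 'GRANT `roles/bigquery.dataViewer` ON SCHEMA `%s`.%s%s TO "group:%s";\n' \
    "$project" "$dataset" "$suffix" "$group"
}

# Example: review the output, then run each statement with the bq CLI.
#   dataset_access_sql my-project sales analysts@example.com |
#     while IFS= read -r stmt; do bq query --use_legacy_sql=false "$stmt"; done
```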
Limitations
- This integration can only be enabled through a manual bootstrap using the Immuta API.
- This integration can only be enabled to work in a single region.
Supported policies
This integration supports the following policy types:
- Column masking
  - Mask using hashing (SHA256())
  - Mask by making NULL
  - Mask using constant
  - Mask using a regular expression
  - Mask by date rounding
  - Mask by numeric rounding
  - Mask using custom functions
- Row-level masking
  - Row visibility based on user attributes and/or object attributes
  - Only show rows that fall within a given time window
  - Minimize rows
  - Filter rows using custom WHERE clause
  - Always hide rows
Additional resources
See the resources below to start implementing and using the BigQuery integration:
- Configuring the Google BigQuery integration
- Creating BigQuery data sources
- Building global subscription and data policies to govern data
- Creating projects to collaborate
Configure the Google BigQuery integration
Follow this guide to connect your Google BigQuery data warehouse to Immuta.
Prerequisites
- Immuta SaaS or Immuta v2023.1 or newer with the Google BigQuery integration (PrPr) enabled.[1]
- Google BigQuery ODBC driver uploaded on the Immuta app settings page.[5]
- Immuta role with SYSTEM_ADMIN permissions and an API key.
- The gcloud CLI installed.[2]
Google Cloud service account and role used by Immuta to connect to Google BigQuery
The Google BigQuery integration requires you to create a Google Cloud service account and role that will be used by Immuta to
- create a Google BigQuery dataset that will be used to store a table of user entitlements, UDFs for policy enforcement, etc.[3]
- manage the table of user entitlements via updates when entitlements change in Immuta.
- create datasets and secure views with access control policies enforced, which mirror tables inside of datasets you ingest as Immuta data sources.
You have two options to create the required Google Cloud service account and role:
The Immuta script
The bootstrap.sh script is a shell script provided by Immuta that creates prerequisite Google Cloud IAM objects for the integration to connect. When you run this script from your command line, it will create the following items, scoped at the project level[4]:
- A new Google Cloud IAM role
- A new Google Cloud service account, which will be granted the newly-created role
- A JSON keyfile for the newly-created service account
You will need to use the objects created in these steps to enable the Google BigQuery integration.
Google Cloud IAM roles required to run the script
To execute bootstrap.sh from your command line, you must be authenticated to the gcloud CLI utility as a user with all of the following roles:
- roles/iam.roleAdmin
- roles/iam.serviceAccountAdmin
- roles/serviceusage.serviceUsageAdmin
These three roles are the least-privilege set of Google Cloud IAM roles required to run the bootstrap.sh script successfully from your command line. However, having either of the following Google Cloud IAM roles will also allow you to run the script successfully:
- roles/editor
- roles/owner
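Before running the script, you can check the active account's project-level roles against the requirement above. In this sketch the function has_bootstrap_roles is hypothetical. It reads role names from standard input, so it can be fed by gcloud projects get-iam-policy, as shown in the comment. Roles granted through groups, or inherited from a folder or organization, do not appear in that filtered output. A failed check is therefore a prompt to investigate, not proof that the script will fail.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: reads role names (one per line) on stdin and
# succeeds if they satisfy the bootstrap.sh requirements described above.
has_bootstrap_roles() {
  local roles
  roles="$(cat)"
  # Either basic role is sufficient on its own.
  for broad in roles/owner roles/editor; do
    if printf '%s\n' "$roles" | grep -qxF "$broad"; then
      return 0
    fi
  done
  # Otherwise all three least-privilege roles are required.
  for required in roles/iam.roleAdmin roles/iam.serviceAccountAdmin roles/serviceusage.serviceUsageAdmin; do
    if ! printf '%s\n' "$roles" | grep -qxF "$required"; then
      echo "missing role: $required" >&2
      return 1
    fi
  done
  return 0
}

# Example: check the currently active gcloud account on PROJECT_ID.
#   gcloud projects get-iam-policy PROJECT_ID \
#     --flatten="bindings[].members" \
#     --filter="bindings.members:user:$(gcloud config get-value account)" \
#     --format="value(bindings.role)" | has_bootstrap_roles && echo "roles look sufficient"
```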
Create a service account and role by running the script provided by Immuta
- Install gcloud.
- Set the account property in the core section of the Google Cloud CLI to the account gcloud should use for authentication. (You can run gcloud auth list to see your currently available accounts.)
  gcloud config set account ACCOUNT
- In Immuta, navigate to the App Settings page and click the Integrations tab.
- Click Add Native Integration and select Google BigQuery from the dropdown menu.
- Click Select Authentication Method and select Key File.
- Click Download Script(s).
- Before you run the script, update its permissions so that you can execute it:
  chmod 755 <path to downloaded script>
- Run the script with the following options:
  - --project PROJECT_ID: the Google Cloud Platform project to operate on.
  - --role ROLE_ID: the name of the custom role to create.
  - --service_account NAME: creates a service account with the provided name.
  - --keyfile OUTPUT_FILE: the path where the resulting private key should be written. File system write permission is checked on the specified path before the key is created.
  - --undelete-role (optional): undeletes the custom role in the project. Roles that have been deleted for a long time can't be undeleted. This option can fail for the following reasons:
    - The specified role does not exist.
    - The active user does not have permission to access the given role.
  - --enable-api (optional): enables the Google BigQuery API, provided you have been granted permission to enable it.
  $ ./bootstrap.sh \
      --project PROJECT_ID \
      --role ROLE_ID \
      --service_account NAME \
      --keyfile OUTPUT_FILE \
      [--undelete-role] \
      [--enable-api]
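After the script finishes, a quick sanity check of the key file can prevent a failed upload later. The function keyfile_check below is hypothetical and is not part of bootstrap.sh. It assumes the key is a standard Google Cloud JSON service account key. Such a key contains "type": "service_account" and a client_email of the form NAME@PROJECT_ID.iam.gserviceaccount.com.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the key file written by bootstrap.sh:
# prints the service account email if the file looks like a JSON service
# account key for the expected project, otherwise fails.
keyfile_check() {
  local keyfile="$1" expected_project="$2" email
  if [ ! -r "$keyfile" ]; then
    echo "cannot read key file: $keyfile" >&2
    return 1
  fi
  # A service account key declares its type explicitly.
  if ! grep -q '"type":[[:space:]]*"service_account"' "$keyfile"; then
    echo "not a service account key: $keyfile" >&2
    return 1
  fi
  # Extract the service account email from the client_email field.
  email="$(sed -n 's/.*"client_email":[[:space:]]*"\([^"]*\)".*/\1/p' "$keyfile" | head -n 1)"
  case "$email" in
    *@"$expected_project".iam.gserviceaccount.com)
      printf '%s\n' "$email" ;;
    *)
      echo "unexpected client_email: ${email:-<none>}" >&2
      return 1 ;;
  esac
}

# Example: confirm the key, then look up the created service account and role.
#   sa_email="$(keyfile_check OUTPUT_FILE PROJECT_ID)" &&
#     gcloud iam service-accounts describe "$sa_email" --project=PROJECT_ID &&
#     gcloud iam roles describe ROLE_ID --project=PROJECT_ID
```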
Create a service account and role by using Google Cloud console
Alternatively, you may use the Google Cloud Console to create the prerequisite role, service account, and private key file for the integration to connect to Google BigQuery.
- Create a custom role in the console with the following permissions:
  bigquery.datasets.create
  bigquery.datasets.delete
  bigquery.datasets.get
  bigquery.datasets.update
  bigquery.jobs.create
  bigquery.jobs.get
  bigquery.jobs.list
  bigquery.jobs.listAll
  bigquery.routines.create
  bigquery.routines.delete
  bigquery.routines.get
  bigquery.routines.list
  bigquery.routines.update
  bigquery.tables.create
  bigquery.tables.delete
  bigquery.tables.export
  bigquery.tables.get
  bigquery.tables.getData
  bigquery.tables.list
  bigquery.tables.setCategory
  bigquery.tables.update
  bigquery.tables.updateData
  bigquery.tables.updateTag
- Create a service account and grant it the custom role you created in the previous step.
- Create a JSON private key file for the service account and download it. You will upload this file when you enable the integration.
- Enable the Google BigQuery API.
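If you prefer the command line, you can create the same objects with standard gcloud commands. This sketch is an equivalent of the console steps, not the contents of bootstrap.sh. The function names and example values are hypothetical, and by default the script only prints the commands; set APPLY=1 to run them. Custom role IDs may contain only letters, digits, underscores, and periods.

```shell
#!/usr/bin/env bash
# Sketch: command-line equivalent of the console steps above.
# Prints each gcloud command by default; set APPLY=1 to execute them instead.
IMMUTA_PERMISSIONS="
bigquery.datasets.create bigquery.datasets.delete bigquery.datasets.get
bigquery.datasets.update bigquery.jobs.create bigquery.jobs.get
bigquery.jobs.list bigquery.jobs.listAll bigquery.routines.create
bigquery.routines.delete bigquery.routines.get bigquery.routines.list
bigquery.routines.update bigquery.tables.create bigquery.tables.delete
bigquery.tables.export bigquery.tables.get bigquery.tables.getData
bigquery.tables.list bigquery.tables.setCategory bigquery.tables.update
bigquery.tables.updateData bigquery.tables.updateTag
"

# Hypothetical wrapper: echo the command unless APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Hypothetical function; arguments: PROJECT_ID ROLE_ID SA_NAME KEY_FILE
create_immuta_objects() {
  local project="$1" role="$2" sa="$3" keyfile="$4"
  local sa_email="${sa}@${project}.iam.gserviceaccount.com"
  local perms
  # Join the whitespace-separated permission list with commas.
  perms="$(printf '%s\n' $IMMUTA_PERMISSIONS | paste -s -d ',' -)"
  run gcloud iam roles create "$role" --project="$project" \
    --title="Immuta BigQuery integration" --permissions="$perms"
  run gcloud iam service-accounts create "$sa" --project="$project"
  run gcloud projects add-iam-policy-binding "$project" \
    --member="serviceAccount:${sa_email}" --role="projects/${project}/roles/${role}"
  run gcloud iam service-accounts keys create "$keyfile" --iam-account="$sa_email"
  run gcloud services enable bigquery.googleapis.com --project="$project"
}

# Example (prints the commands):
#   create_immuta_objects my-project immuta_role immuta-sa ./immuta-key.json
```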
Enable the Google BigQuery integration
Once the Google Cloud IAM custom role and service account are created, you can enable the Google BigQuery integration. This section illustrates how to enable the integration on the Immuta app settings page. To configure this integration via the Immuta API, see the Configure a Google BigQuery integration API guide.
- In Immuta, navigate to the App Settings page and click the Integrations tab.
- Click Add Native Integration and select Google BigQuery from the dropdown menu.
- Click Select Authentication Method and select Key File.
- Upload your GCP Service Account Key File. This is the private key file generated when you created the Google Cloud service account and role that Immuta uses to connect to Google BigQuery. Uploading this file will auto-populate the following fields:
  - Project Id: The Google Cloud Platform project to operate on, where your Google BigQuery data warehouse is located. A new dataset will be provisioned in this Google BigQuery project to store the integration configuration.
  - Service Account: The service account you created for Immuta to use to connect to Google BigQuery.
- Complete the following fields:
  - Immuta Dataset: The name of the Google BigQuery dataset to provision in the project. Important: If you are using multiple environments in the same Google BigQuery project, this dataset name must be unique across environments.
  - Immuta Role: The custom role you created for Immuta to use to connect to Google BigQuery.
  - Dataset Suffix: The suffix appended to the name of each dataset Immuta creates to store secure views. Immuta creates one such dataset for each dataset containing a table you register as an Immuta data source. Important: If you are using multiple environments in the same Google BigQuery project, this suffix must be unique across environments.
  - GCP Location: The dataset’s location. After a dataset is created, its location can't be changed. Note: If you choose EU for the dataset location, your Core BigQuery Customer Data resides in the EU.
    GCP location must match dataset region: The region set for GCP Location must match the region of your datasets. Set GCP Location to a general region (for example, US) to include child regions.
- Click Save.
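To confirm the region requirement, you can read each dataset's location with the bq CLI and compare it with the GCP Location value. The helpers below are hypothetical. dataset_location assumes the pretty-printed JSON that bq show --format=prettyjson emits, with one field per line, and the example dataset IDs are placeholders. The helpers print the locations for you to compare; they do not decide whether a location is covered by a general region.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extracts the top-level "location" field from the JSON
# that `bq show --format=prettyjson PROJECT:DATASET` prints for a dataset.
dataset_location() {
  sed -n 's/^[[:space:]]*"location":[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: prints "DATASET LOCATION" for each dataset ID given
# after the project ID, for comparison with the GCP Location setting.
show_dataset_locations() {
  local project="$1"
  shift
  for ds in "$@"; do
    printf '%s %s\n' "$ds" \
      "$(bq show --format=prettyjson "${project}:${ds}" | dataset_location)"
  done
}

# Example (requires the bq CLI):
#   show_dataset_locations my-project sales marketing
```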
Disable the Google BigQuery integration
You can disable the Google BigQuery integration automatically or manually.
Automatically disable integration
- Click the App Settings icon, and then click the Integrations tab.
- Select the Google BigQuery integration you would like to disable, and select the Disable Integration checkbox.
- Click Save.
Manually disable integration
The Google Cloud IAM roles required to run the cleanup script are the same as those required to run the bootstrap.sh script.
- Click the App Settings icon, and then click the Integrations tab.
- Select the Google BigQuery integration you would like to disable, and click Download Scripts.
- Click Save. Wait until Immuta has finished saving your configuration changes before proceeding.
- Before you run the script, update its permissions so that you can execute it:
  chmod 755 <path to downloaded script>
- Run the cleanup script.
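After the cleanup script completes, you can list any datasets that still carry the Dataset Suffix. The helper leftover_secure_datasets is hypothetical. The example pipeline assumes jq is installed and that bq ls --format=json returns dataset objects with a datasetReference.datasetId field. A user-created dataset whose name happens to end with the same suffix would also be listed, so review the output before deleting anything.

```shell
#!/usr/bin/env bash
# Hypothetical post-cleanup check: reads dataset IDs (one per line) on stdin
# and prints those ending with the configured Dataset Suffix, i.e. mirrored
# datasets that may still exist after the cleanup script has run.
leftover_secure_datasets() {
  local suffix="${1:-_secure}"  # assumption: replace with your Dataset Suffix
  while IFS= read -r ds; do
    case "$ds" in
      *"$suffix") printf '%s\n' "$ds" ;;
    esac
  done
}

# Example (assumes jq is installed):
#   bq ls --format=json --max_results=1000 --project_id=PROJECT_ID |
#     jq -r '.[].datasetReference.datasetId' | leftover_secure_datasets _secure
```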
Next steps
- Create Google BigQuery data sources
- Build global subscription policies and data policies
- Create projects to securely collaborate on analytical workloads
[1] Reach out to your Immuta account manager to get the Google BigQuery integration (PrPr) enabled.
[2] Only required if you create the service account and role by running the Immuta script.
[3] The rest of the objects created and managed by the integration are internal to its implementation and are not user-facing. Reach out to your Immuta account manager if you have any questions about these objects from an InfoSec perspective.
[4] As opposed to scoped at the organization level, to adhere to the principle of least privilege.
[5] Not required for customers using Immuta SaaS.