Databricks and Apache Iceberg

Starting with the dbt-databricks 1.9.0 release, dbt supports materializing Iceberg tables in Unity Catalog through a catalog integration for two Databricks materializations.

Databricks Iceberg tables

Databricks is built on Delta Lake and stores data in the Delta table format.

Databricks supports two methods for creating Iceberg tables in its data catalog, Unity Catalog:

  • Creating Unity Catalog managed Iceberg tables. Databricks Runtime 16.4 LTS and later support this feature.
  • Enabling Iceberg reads on Delta tables. These tables still use the Delta file format, but generate both Delta and Iceberg-compatible metadata. Databricks Runtime 14.3 LTS and later support this feature.
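As a rough illustration of the difference (the table and column names are hypothetical, and the exact DDL may vary by Databricks Runtime version), the two approaches look roughly like this in Databricks SQL:

-- Unity Catalog managed Iceberg table (Databricks Runtime 16.4 LTS and later);
-- catalog, schema, table, and columns are illustrative
CREATE TABLE main.analytics.orders_iceberg (
  order_id BIGINT,
  order_ts TIMESTAMP
)
USING ICEBERG;

-- Delta table with Iceberg reads enabled (UniForm, Databricks Runtime 14.3 LTS and later)
CREATE TABLE main.analytics.orders_delta (
  order_id BIGINT,
  order_ts TIMESTAMP
)
TBLPROPERTIES (
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);

When you materialize models through dbt, the catalog integration described below generates the corresponding DDL, so you typically won't write these statements by hand.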

External Iceberg compute engines can read from and write to these Iceberg tables using Unity Catalog's Iceberg REST API endpoint. However, Databricks only supports reading from external Iceberg catalogs.

To set up Databricks to read and query external Iceberg tables, configure Lakehouse Federation and register the external catalog as a foreign catalog. This configuration happens outside of dbt; once complete, the foreign catalog behaves like any other database you can query.
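The exact connection type and options depend on the external catalog you are federating to; as a purely hypothetical sketch of the general shape (the connection type, names, and options below are placeholders, not a tested configuration — consult the Databricks Lakehouse Federation documentation for your source):

-- Create a connection to the external source (type and options are placeholders)
CREATE CONNECTION my_external_catalog_conn
  TYPE <connection_type>
  OPTIONS (<connection options, such as host and credentials>);

-- Register it in Unity Catalog as a foreign catalog
CREATE FOREIGN CATALOG my_foreign_catalog
  USING CONNECTION my_external_catalog_conn
  OPTIONS (<catalog options>);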

dbt does not yet support enabling Iceberg v3 on managed Iceberg tables.

Creating Iceberg tables

To configure dbt models to materialize as Iceberg tables, you can use a catalog integration with table_format: iceberg (see dbt Catalog integration configurations for Databricks).

External tables

dbt also supports creating externally-managed Iceberg tables using the model configuration location_root. Databricks' DDL for creating tables requires a fully qualified location. dbt defines this parameter on the user's behalf to streamline usage and enforce basic isolation of table data:

  • When you set a location_root string, dbt generates a location string of the form {{ location_root }}/{{ model_name }}. If you set the configuration option include_full_name_in_path to true, dbt instead generates {{ location_root }}/{{ database_name }}/{{ schema_name }}/{{ model_name }}, as shown in the example below.
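For example, a model configured with location_root might look like the following sketch (the object-storage path and upstream model name are illustrative):

{{
    config(
        materialized = 'table',
        location_root = 's3://my-bucket/iceberg-tables',
        include_full_name_in_path = true
    )
}}

select * from {{ ref('some_upstream_model') }}

With include_full_name_in_path set to true, dbt would place this model's data under s3://my-bucket/iceberg-tables/<database>/<schema>/<model_name>; with the default setting, it would use s3://my-bucket/iceberg-tables/<model_name>.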

dbt Catalog integration configurations for Databricks

Note

On Databricks, if a model sets catalog_name in its model config, that catalog name becomes the catalog part of the model's fully qualified name (FQN). For example, if the catalog is named my_database, a model with catalog_name='my_database' is materialized as my_database.<schema>.<model>.

Configure catalog integration for Iceberg tables

  1. Create a catalogs.yml file at the top level of your dbt project (at the same level as dbt_project.yml).

    An example of Unity Catalog as the catalog:

catalogs:
  - name: unity_catalog
    active_write_integration: unity_catalog_integration
    write_integrations:
      - name: unity_catalog_integration
        table_format: iceberg
        catalog_type: unity
        file_format: delta
  2. Add the catalog_name config parameter in either a config block (inside the .sql model file), properties YAML file (model folder), or your project YAML file (dbt_project.yml).

An example of iceberg_model.sql:


{{
    config(
        materialized = 'table',
        catalog_name = 'unity_catalog'
    )
}}

select * from {{ ref('jaffle_shop_customers') }}
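Alternatively, the same catalog_name can be applied to a whole folder of models from dbt_project.yml; a minimal sketch, assuming a project named my_project with models under models/iceberg/:

models:
  my_project:
    iceberg:
      +materialized: table
      +catalog_name: unity_catalog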

  3. Execute the dbt model with dbt run -s iceberg_model.
