Iceberg Catalog Connector
Connect to an Iceberg catalog provider and query Iceberg tables.
Configuration
catalogs:
  - from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/my_catalog
    name: ice # tables from this catalog will be available in the "ice" catalog in Spice
    include:
      - '*.my_table_name' # include only the "my_table_name" tables
    params:
      iceberg_token: ${secrets:iceberg_token} # Optional. Bearer token value to use for Authorization header.
      iceberg_oauth2_credential: ${secrets:client_id}:${secrets:client_secret} # Optional. Credential to use for OAuth2 client credential flow when initializing the catalog. Separated by a colon as <client_id>:<client_secret>.
      iceberg_oauth2_scope: catalog # Optional. Scope to use for OAuth2 client credential flow when initializing the catalog (default: catalog).
      iceberg_oauth2_server_url: https://iceberg-catalog-host.com/oauth2/token # Optional. URL of the OAuth2 server tokens endpoint for the client credential flow.
      iceberg_s3_endpoint: http://localhost:9000 # Optional. S3-compatible endpoint where the Iceberg tables are stored.
      iceberg_s3_region: us-west-2 # Optional. Region of the S3-compatible endpoint.
      iceberg_s3_access_key_id: ${secrets:aws_access_key_id} # Optional. Access key ID for the S3-compatible endpoint.
      iceberg_s3_secret_access_key: ${secrets:aws_secret_access_key} # Optional. Secret access key for the S3-compatible endpoint.
      iceberg_s3_session_token: ${secrets:aws_session_token} # Optional. Session token for the S3-compatible endpoint.
      iceberg_s3_role_arn: arn:aws:iam::123456789012:role/my-role # Optional. ARN of the IAM role to assume when accessing the S3-compatible endpoint.
      iceberg_s3_role_session_name: my-session # Optional. Session name to use when assuming the IAM role.
      iceberg_s3_connect_timeout: 60 # Optional. Connection timeout in seconds for the S3-compatible endpoint (default: 60).
  # AWS Glue Catalog
  - from: iceberg:https://glue.us-east-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces
    name: glue
    params:
      iceberg_sigv4_enabled: true
from
The from field is used to specify the catalog provider. For Iceberg, use iceberg:<namespace_path>. The namespace_path is the URL to the Iceberg namespace in the catalog provider to load the tables from. It is formatted as http[s]://<iceberg_catalog_host>/v1/{prefix}/namespaces/<namespace_name>.
For AWS Glue catalogs, the URL format is https://glue.<region>.amazonaws.com/iceberg/v1/catalogs/<account_id>/namespaces, where <account_id> is your AWS account ID.
The selected namespace must have sub-namespaces where the tables are stored.
Example: With this Iceberg catalog structure:
.
├── blockchain
│   └── eth
│       ├── blocks
│       └── transactions
├── spice
│   ├── tpch
│   │   ├── orders
│   │   └── customers
│   ├── info
│   └── extra
│       └── tpch_orders_metadata
└── unity
    └── very
        └── nested
            └── namespace
                └── foobar
A valid from value would be iceberg:https://iceberg-catalog-host.com/v1/namespaces/spice, and would load the following tables:
<name>.tpch.orders
<name>.tpch.customers
<name>.extra.tpch_orders_metadata
To load a multi-part namespace, separate the namespace parts with %1F, the URL-encoded unit separator character. For example, /v1/namespaces/unity%1Fvery%1Fnested would load the foobar table from the unity/very/nested/namespace namespace as <name>.namespace.foobar.
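As a sketch of the multi-part form, the following configuration selects the unity/very/nested namespace from the example structure above. The host and the catalog name ice are placeholders carried over from the earlier examples, not values the connector requires.

```yaml
catalogs:
  # %1F joins the namespace parts unity, very, and nested
  - from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/unity%1Fvery%1Fnested
    name: ice # placeholder catalog name
```

With this configuration, the foobar table should be queryable as ice.namespace.foobar.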
name
The name field specifies the name of the catalog in Spice. Tables from the Iceberg catalog are available under the catalog with this name in Spice, and the schema hierarchy of the external catalog is preserved.
include
Use the include field to specify which tables to include from the catalog. The include field supports glob patterns to match multiple tables. For example, *.my_table_name would include all tables named my_table_name from any schema in the catalog. Multiple include patterns can be specified; they are OR'ed together, so a table is included if it matches any of them.
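A minimal sketch of combining include patterns, using the spice namespace from the example catalog structure. The second pattern follows the <schema>.<table> form documented here; the first pattern, tpch.*, assumes that the glob can also wildcard the table part, which this page does not state explicitly.

```yaml
catalogs:
  - from: iceberg:https://iceberg-catalog-host.com/v1/namespaces/spice
    name: ice
    include:
      - 'tpch.*' # assumption: matches every table in the tpch schema
      - '*.tpch_orders_metadata' # tables named tpch_orders_metadata in any schema
```

Because the patterns are OR'ed, a table is loaded if it matches either one. Under the assumption above, this would expose ice.tpch.orders, ice.tpch.customers, and ice.extra.tpch_orders_metadata.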
params
The following parameters are supported for configuring the connection to the Iceberg catalog, file, or S3 storage:
Parameter Name | Description |
---|---|
iceberg_token | Bearer token value to use for Authorization header. |
iceberg_oauth2_credential | Credential to use for OAuth2 client credential flow when initializing the catalog. Separated by a colon as <client_id>:<client_secret>. |
iceberg_oauth2_token_url | The URL to use for OAuth2 token endpoint. |
iceberg_oauth2_scope | The scope to use for OAuth2 token endpoint (default: catalog). |
iceberg_oauth2_server_url | URL of the OAuth2 server tokens endpoint. |
iceberg_sigv4_enabled | Enable SigV4 authentication for the catalog (for connecting to AWS Glue). |
iceberg_signing_region | The region to use when signing the request for SigV4. Defaults to the region in the catalog URL if available. |
iceberg_signing_name | The name to use when signing the request for SigV4. Default: glue. |
iceberg_s3_endpoint | Configure an alternative endpoint for the S3 service. This can be any S3-compatible object storage service (e.g., MinIO, R2). |
iceberg_s3_access_key_id | The AWS access key ID to use for S3 storage. |
iceberg_s3_secret_access_key | The AWS secret access key to use for S3 storage. |
iceberg_s3_session_token | Configure the static session token used for S3 storage. |
iceberg_s3_region | The AWS S3 region to use. |
iceberg_s3_role_session_name | An optional identifier for the assumed role session for auditing purposes. |
iceberg_s3_role_arn | The Amazon Resource Name (ARN) of the role to assume. If provided instead of iceberg_s3_access_key_id and iceberg_s3_secret_access_key, temporary credentials will be fetched by assuming this role. |
iceberg_s3_connect_timeout | Configure socket connection timeout, in seconds (default: 60). |
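To illustrate how the SigV4 and role parameters from the table might be combined, here is a hedged sketch for an AWS Glue catalog. The account ID, region, role ARN, and session name are the placeholder values used elsewhere on this page, and whether a given deployment needs the explicit signing parameters is an assumption; the table notes that they default to the region in the URL and to glue.

```yaml
catalogs:
  - from: iceberg:https://glue.us-east-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces
    name: glue
    params:
      iceberg_sigv4_enabled: true
      iceberg_signing_region: us-east-1 # optional; defaults to the region in the catalog URL if available
      iceberg_signing_name: glue # optional; glue is the documented default
      iceberg_s3_region: us-east-1
      # Assume a role instead of supplying a static access key pair
      iceberg_s3_role_arn: arn:aws:iam::123456789012:role/my-role
      iceberg_s3_role_session_name: my-session
```

Supplying iceberg_s3_role_arn without iceberg_s3_access_key_id and iceberg_s3_secret_access_key means temporary credentials are fetched by assuming the role, as described in the table.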
The Iceberg Catalog Connector supports both REST Catalog and Hadoop Catalog endpoints. Hadoop Catalog endpoints use file://, s3://, or s3a:// URLs to specify the warehouse path for the catalog.
Example using Hadoop Catalog with a local warehouse:
catalogs:
  - from: iceberg:file:///tmp/hadoop_warehouse/
    name: local_hadoop
Example using Hadoop Catalog with S3:
catalogs:
  - from: iceberg:s3a://my-bucket/hadoop_warehouse/
    name: s3_hadoop
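When the S3 warehouse requires explicit credentials, a configuration along these lines may work. This is a sketch that assumes the iceberg_s3_* parameters listed above also apply to the warehouse storage of a Hadoop Catalog; the bucket name, region, and secret names are placeholders.

```yaml
catalogs:
  - from: iceberg:s3://my-bucket/hadoop_warehouse/
    name: s3_hadoop
    params:
      # Assumption: these storage parameters are honored for Hadoop Catalog warehouses
      iceberg_s3_region: us-west-2
      iceberg_s3_access_key_id: ${secrets:aws_access_key_id}
      iceberg_s3_secret_access_key: ${secrets:aws_secret_access_key}
```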
Cookbook
- Iceberg Catalog Connector: A cookbook recipe to configure Iceberg as a catalog connector in Spice.