Version: Next

GraphQL Data Connector Deployment Guide

Production operating guide for the GraphQL data connector covering authentication, pagination, and operational tuning.

Authentication & Secrets

Authentication is endpoint-specific. The connector supports bearer tokens, custom headers via graphql_auth_header, and HTTP Basic Auth:

| Parameter | Description |
|---|---|
| graphql_auth_header | Custom authorization header name. The value of graphql_auth_token is sent as this header's value. |
| graphql_auth_token | Bearer token for GraphQL requests, typically "${secrets:api_token}". |
| graphql_auth_user | Username for HTTP Basic Auth. |
| graphql_auth_pass | Password for HTTP Basic Auth. |
| graphql_query | The GraphQL query to execute. |
| json_pointer | RFC 6901 JSON pointer to the row collection inside the response (e.g. /data/repository/issues/nodes). |
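Before configuring a dataset, it can help to check a json_pointer value against a captured response. The following is a minimal RFC 6901 resolver written for illustration; resolve_pointer is a hypothetical helper, not part of the connector, and the sample payload is made up.

```python
import json
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON pointer against a parsed JSON document.

    Hypothetical helper for checking a json_pointer value offline; it is not
    the connector's own implementation.
    """
    if pointer == "":
        return document  # The empty pointer refers to the whole document.
    if not pointer.startswith("/"):
        raise ValueError(f"pointer must start with '/': {pointer!r}")
    current = document
    for raw_token in pointer[1:].split("/"):
        # Unescape in the order RFC 6901 requires: ~1 -> '/', then ~0 -> '~'.
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            # Array indexes must be plain decimal numbers without leading zeros.
            if not (token.isascii() and token.isdigit()) or (len(token) > 1 and token.startswith("0")):
                raise KeyError(f"invalid array index {token!r}")
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(f"cannot descend into {type(current).__name__} at {token!r}")
    return current


# Illustrative response shape only; real payloads depend on the upstream schema.
response = json.loads("""
{"data": {"repository": {"issues": {"nodes": [
  {"number": 1, "title": "First issue"},
  {"number": 2, "title": "Second issue"}
]}}}}
""")

rows = resolve_pointer(response, "/data/repository/issues/nodes")
print(isinstance(rows, list), len(rows))  # True 2
```

If the pointer resolves to an object or a scalar instead of an array, it is stopping one level too early or too late.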

In production, source tokens and passwords from a secret store rather than writing them into the configuration in plain text.
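As an illustration of how the authentication parameters could map onto HTTP headers, here is a sketch. build_auth_headers is hypothetical; the default "Authorization: Bearer" form when graphql_auth_header is unset and the rejection of mixed token/Basic configuration are assumptions, not documented connector behavior. An environment variable stands in for the secret store.

```python
import base64
import os
from typing import Dict, Optional


def build_auth_headers(
    token: Optional[str] = None,
    header_name: Optional[str] = None,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> Dict[str, str]:
    """Map graphql_auth_* style settings to HTTP headers (illustrative assumptions only)."""
    if token and (user or password):
        # Assumption: treat mixing a token with Basic Auth as a configuration error.
        raise ValueError("configure either a token or Basic Auth, not both")
    if token:
        if header_name:
            # Per the parameter table: the token is sent as the custom header's value.
            return {header_name: token}
        # Assumption: without a custom header name, the token is sent as a bearer token.
        return {"Authorization": f"Bearer {token}"}
    if user is not None and password is not None:
        # Standard HTTP Basic Auth: base64("user:password").
        credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {credentials}"}
    return {}


# Stand-in for a secret store: API_TOKEN is a hypothetical environment variable.
token = os.environ.get("API_TOKEN", "example-token")
print(build_auth_headers(token=token))
```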

TLS

Use HTTPS endpoints in production. If an endpoint uses a self-signed certificate, add the issuing CA to the trust store of the container or host OS.

Resilience Controls

Retry Behavior

HTTP-level retries cover 408 (Request Timeout), 5xx (server errors), and transient network errors. 429 (Too Many Requests) responses are not retried; the built-in rate limiter handles them proactively instead. Retries use Fibonacci backoff, with a maximum of 5 attempts.
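The retry policy above can be sketched as follows. This is not the connector's code: the base delay, the Fibonacci sequence starting at 1, 1, and counting the first request toward the 5-attempt cap are assumptions, and transient network exceptions are left out for brevity.

```python
import time
from typing import Callable, Iterator

MAX_ATTEMPTS = 5  # From the text: at most 5 attempts (assumed to include the first request).


def is_retriable(status: int) -> bool:
    """408 and 5xx are retried; 429 is left to the rate limiter and not retried."""
    return status == 408 or 500 <= status <= 599


def fibonacci_delays(base_seconds: float, count: int) -> Iterator[float]:
    """Yield base x 1, 1, 2, 3, 5, ... (assumed starting terms)."""
    a, b = 1, 1
    for _ in range(count):
        yield base_seconds * a
        a, b = b, a + b


def send_with_retries(send: Callable[[], int], base_seconds: float = 0.1,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns a non-retriable status or the attempts run out."""
    status = send()
    # One initial request plus at most MAX_ATTEMPTS - 1 retries, each after a backoff sleep.
    for delay in fibonacci_delays(base_seconds, MAX_ATTEMPTS - 1):
        if not is_retriable(status):
            break
        sleep(delay)
        status = send()
    return status


# Simulated upstream: two server errors, then success. Sleeps are recorded, not performed.
responses = iter([503, 502, 200])
slept = []
status = send_with_retries(lambda: next(responses), base_seconds=0.5, sleep=slept.append)
print(status, slept)  # 200 [0.5, 0.5]
```

Injecting sleep keeps the sketch testable; a real client would also cap the maximum delay.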

Pagination

The connector supports cursor-based pagination. Each page is fetched with a separate HTTP request, and an error on any page fails the entire refresh. Use json_pointer to select the row collection, and configure the pagination variables to match the cursor fields in the upstream schema.
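A cursor-pagination loop with all-or-nothing failure semantics looks roughly like this. The sketch assumes the upstream follows the Relay cursor-connection convention (nodes plus pageInfo { hasNextPage endCursor }); fetch_page is a hypothetical stand-in for one HTTP request, and the in-memory pages are made up.

```python
from typing import Callable, Dict, List, Optional

PageFetcher = Callable[[Optional[str]], dict]


def fetch_all_rows(fetch_page: PageFetcher) -> List[dict]:
    """Follow cursors until hasNextPage is false; any failure aborts the whole refresh.

    fetch_page(cursor) is hypothetical: it represents one request with the
    cursor bound to the query's 'after' variable (assumed Relay-style schema).
    """
    rows: List[dict] = []
    cursor: Optional[str] = None
    while True:
        connection = fetch_page(cursor)  # An exception here propagates: no partial result.
        rows.extend(connection["nodes"])
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return rows
        cursor = page_info["endCursor"]


# In-memory pages standing in for the upstream API, keyed by the 'after' cursor.
PAGES: Dict[Optional[str], dict] = {
    None: {"nodes": [{"id": 1}, {"id": 2}], "pageInfo": {"hasNextPage": True, "endCursor": "c2"}},
    "c2": {"nodes": [{"id": 3}], "pageInfo": {"hasNextPage": False, "endCursor": "c3"}},
}

print([row["id"] for row in fetch_all_rows(lambda cursor: PAGES[cursor])])  # [1, 2, 3]
```

Because rows are returned only after the last page succeeds, a failure on page 2 discards page 1 as well, which matches the all-or-nothing refresh behavior described above.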

Server Rate Limits

GraphQL APIs (GitHub, Shopify, etc.) typically enforce rate limits based on query cost rather than request count. When a query returns a cost or rate-limit error, the connector surfaces it immediately. Reduce refresh frequency or narrow the query to stay within budget.
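To pick a refresh frequency that fits a cost budget, divide the usable budget by the cost of one full refresh. The numbers below (1 point per page, a 5,000-point hourly budget, 80% headroom) are hypothetical; take real values from the provider's rate-limit documentation or response metadata.

```python
def min_refresh_interval_seconds(cost_per_page: float, pages_per_refresh: int,
                                 budget_points: float, window_seconds: float,
                                 headroom_percent: float = 80) -> float:
    """Shortest refresh interval that keeps query cost within a share of the budget."""
    cost_per_refresh = cost_per_page * pages_per_refresh
    # Leave headroom for other clients sharing the same token or budget.
    usable_budget = budget_points * headroom_percent / 100
    refreshes_per_window = usable_budget / cost_per_refresh
    return window_seconds / refreshes_per_window


# 50 pages at 1 point each against a hypothetical 5,000-point hourly budget.
interval = min_refresh_interval_seconds(cost_per_page=1, pages_per_refresh=50,
                                        budget_points=5000, window_seconds=3600)
print(interval)  # 45.0
```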

Capacity & Sizing

  • Throughput: Bounded by the upstream rate limit; typical GraphQL endpoints allow hundreds to thousands of requests per minute.
  • Query cost: Design graphql_query to request only the fields you need; fewer nested fields mean a lower query cost.
  • Pagination depth: Large datasets requiring hundreds of pages extend refresh duration linearly; plan refresh intervals accordingly.
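The linear growth in refresh duration can be estimated with a rough lower bound. This sketch assumes pages are fetched one after another (each page needs the previous page's cursor), so each page costs at least the larger of the average latency and the minimum spacing the rate limit allows; the inputs are hypothetical.

```python
def estimate_refresh_seconds(pages: int, requests_per_minute: float,
                             avg_latency_seconds: float) -> float:
    """Rough lower bound on refresh time for sequential, cursor-dependent page fetches."""
    spacing = 60.0 / requests_per_minute  # Minimum gap between requests under the limit.
    return pages * max(avg_latency_seconds, spacing)


# 300 pages, a 120 requests/minute limit, 0.25 s average latency.
print(estimate_refresh_seconds(300, 120, 0.25))  # 150.0
```

If the estimate approaches the refresh interval, lengthen the interval or reduce the number of pages.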

Metrics

When used as a dataset connector, the GraphQL connector exposes per-origin HTTP rate-control metrics under the graphql component. These metrics can be enabled per dataset:

| Metric Name | Type | Description |
|---|---|---|
| inflight_operations | Gauge | Current number of HTTP requests holding a rate-control permit. |
| rate_control_max_concurrent_requests | Gauge | Configured maximum concurrent HTTP requests for this upstream origin; 0 means disabled. |
| rate_control_requests_per_second_limit | Gauge | Configured HTTP request-per-second limit for this upstream origin; 0 means disabled. |
| rate_control_requests_per_minute_limit | Gauge | Configured HTTP request-per-minute limit for this upstream origin; 0 means disabled. |
| rate_control_jitter_min_ms | Gauge | Configured minimum rate-control jitter (ms) before HTTP requests. |
| rate_control_jitter_max_ms | Gauge | Configured maximum rate-control jitter (ms) before HTTP requests. |
| rate_control_available_permits | Gauge | Current available permits in the HTTP request concurrency semaphore; 0 when concurrency is disabled. |
| rate_control_acquisitions_total | Counter | Total HTTP request rate-control permits acquired. |
| rate_control_acquire_errors_total | Counter | Total HTTP request rate-control permit acquisition errors. |
| rate_control_wait_duration_ms | Counter | Cumulative time (ms) spent waiting for HTTP rate-control permits, quotas, and jitter. |
| rate_limit_retry_after_updates_total | Counter | Total upstream cooldown hints accepted from Retry-After or RateLimit reset headers. |
| rate_limit_retry_after_waits_total | Counter | Total waits caused by Retry-After or RateLimit reset headers. |
| rate_limit_retry_after_wait_duration_ms | Counter | Cumulative time (ms) spent waiting because of Retry-After or RateLimit reset headers. |
| rate_limit_retry_after_remaining_ms | Gauge | Current remaining Retry-After / RateLimit cooldown (ms) for this upstream origin. |

Enable component metrics in the dataset's metrics section. See Component Metrics for general configuration.
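Because the counters in the table are cumulative, averages are most useful as deltas between two scrapes. The sketch below uses the metric names from the table; representing a scrape as a flat dict of name to value is an assumption for illustration, and the sample values are made up.

```python
from typing import Dict

Scrape = Dict[str, float]


def summarize_rate_control(current: Scrape, previous: Scrape) -> Dict[str, float]:
    """Derive per-interval averages from two scrapes of the graphql component's counters."""
    def delta(name: str) -> float:
        return current[name] - previous[name]

    acquisitions = delta("rate_control_acquisitions_total")
    retry_after_waits = delta("rate_limit_retry_after_waits_total")
    return {
        # Average local wait (permits, quotas, jitter) per acquired permit.
        "avg_wait_ms_per_request": (delta("rate_control_wait_duration_ms") / acquisitions
                                    if acquisitions else 0.0),
        # Average length of each wait imposed by upstream Retry-After / RateLimit headers.
        "avg_retry_after_wait_ms": (delta("rate_limit_retry_after_wait_duration_ms") / retry_after_waits
                                    if retry_after_waits else 0.0),
        "acquire_errors": delta("rate_control_acquire_errors_total"),
    }


previous = {
    "rate_control_acquisitions_total": 1000,
    "rate_control_wait_duration_ms": 20000,
    "rate_limit_retry_after_waits_total": 2,
    "rate_limit_retry_after_wait_duration_ms": 1000,
    "rate_control_acquire_errors_total": 0,
}
current = {
    "rate_control_acquisitions_total": 1200,
    "rate_control_wait_duration_ms": 25000,
    "rate_limit_retry_after_waits_total": 6,
    "rate_limit_retry_after_wait_duration_ms": 3000,
    "rate_control_acquire_errors_total": 0,
}
summary = summarize_rate_control(current, previous)
print(summary)  # {'avg_wait_ms_per_request': 25.0, 'avg_retry_after_wait_ms': 500.0, 'acquire_errors': 0}
```

A growing count of Retry-After waits suggests the upstream is asking the connector to back off, which usually calls for a lower refresh frequency or a cheaper query.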

For broader observability, also monitor:

  • Spice query execution metrics (query_duration_ms, query_processed_rows, query_failures_total) from runtime.metrics.
  • HTTP response status distribution via the shared resilient_http instrumentation.
  • The upstream GraphQL provider's rate-limit dashboards.

Task History

GraphQL requests participate in task history through the HTTP client's span. Each page fetch is a child of the enclosing sql_query or accelerated_table_refresh task.

Known Limitations

  • Read-only: Only GraphQL queries (not mutations or subscriptions) are supported.
  • Single query per dataset: Each dataset maps to exactly one GraphQL query. To fetch data with several queries, define a separate dataset for each one.
  • Schema inference: The connector infers the schema from the first response; schemas with deeply nested optional fields may require an explicit dataset schema override.
  • Batching: GraphQL query batching (multiple operations in one HTTP request) is not exposed.
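The schema-inference limitation can be illustrated with a simplified model. In this sketch, column types come only from non-null values in the first response, so a field that is null there surfaces later as a new column. infer_schema and schema_drift are hypothetical helpers, not the connector's inference logic, and the rows are made up.

```python
from typing import Dict, List


def infer_schema(first_response_rows: List[dict]) -> Dict[str, str]:
    """Simplified model: column types come from non-null values in the first response only."""
    schema: Dict[str, str] = {}
    for row in first_response_rows:
        for key, value in row.items():
            if value is not None:
                schema.setdefault(key, type(value).__name__)
    return schema


def schema_drift(schema: Dict[str, str], later_rows: List[dict]) -> Dict[str, str]:
    """Report columns in later rows that are missing from, or typed differently than, the schema."""
    drift: Dict[str, str] = {}
    for row in later_rows:
        for key, value in row.items():
            if value is None:
                continue
            observed = type(value).__name__
            expected = schema.get(key)
            if expected is None:
                drift[key] = f"new column ({observed})"
            elif expected != observed:
                drift[key] = f"{expected} -> {observed}"
    return drift


first_refresh = [{"number": 1, "title": "a", "milestone": None}]
second_refresh = [{"number": 2, "title": "b", "milestone": "v1.0"}]

schema = infer_schema(first_refresh)
print(schema)  # {'number': 'int', 'title': 'str'}
print(schema_drift(schema, second_refresh))  # {'milestone': 'new column (str)'}
```

Declaring the dataset schema explicitly removes this dependence on whichever values happen to appear in the first response.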

Troubleshooting

| Symptom | Likely cause | Resolution |
|---|---|---|
| 401 Unauthorized | Wrong or expired token in graphql_auth_token. | Rotate the token and verify the header format (for example, the Bearer prefix). |
| Rows missing from the dataset | Incorrect json_pointer. | Inspect the response payload; json_pointer must point to the array of rows. |
| Refresh fails mid-pagination | Rate limit or transient network failure. | The connector retries retriable errors automatically; if failures persist, reduce refresh frequency or narrow the query. |
| Query cost exceeded | The query requests too many nested fields. | Simplify the query to fetch only the required fields. |
| Inferred schema differs between refreshes | Optional fields appear or disappear in responses. | Provide an explicit dataset schema to lock down types. |