# DNSZone Operator Consolidation Migration Troubleshooting
This guide helps troubleshoot issues during and after the DNSZone operator consolidation migration (Phases 1-8, January 2026).
## Pre-Migration Checklist

Before upgrading, verify the following (a command sketch follows this list):

1. Export existing resources.
2. Check the Bindy operator version.
3. Verify all zones are healthy before the upgrade.
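A minimal pre-upgrade sketch covering these three checks; the resource plurals, backup file names, and the `dns-system`/`bindy-operator` names follow the commands used later in this guide and may differ in your cluster:

```bash
# 1. Export existing resources as a backup
kubectl get dnszones -A -o yaml > dnszones-backup.yaml
kubectl get bind9instances -A -o yaml > bind9instances-backup.yaml

# 2. Check the currently deployed Bindy operator image
kubectl get deployment -n dns-system bindy-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# 3. List zones that are NOT Ready (output should be empty before upgrading)
kubectl get dnszones -A -o json | jq -r \
  '.items[]
   | select(([.status.conditions[]? | select(.type=="Ready" and .status=="True")] | length) == 0)
   | "\(.metadata.namespace)/\(.metadata.name)"'
```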
## Common Migration Issues
### Issue 1: DNSZone Status Fields Missing After Upgrade
**Symptoms:**

- `status.syncStatus[]` is empty or missing
- `status.syncedInstancesCount` is null
- `status.totalInstancesCount` is null

**Root Cause:**

These fields were removed as part of the consolidation. The new architecture uses only `status.instances[]`.
**Resolution:**

1. Verify the new operator is running:

   ```bash
   kubectl get pods -n dns-system -l app=bindy-operator
   kubectl logs -n dns-system -l app=bindy-operator --tail=100
   ```

2. Check the `status.instances[]` field instead.
3. If `status.instances[]` is empty, trigger a reconciliation (see the sketch below).
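A sketch of steps 2 and 3. The annotation key below is arbitrary and hypothetical; updating any annotation generates a watch event, though whether the operator reconciles on it depends on its event filters:

```bash
# Step 2: inspect the new status.instances[] field
kubectl get dnszone <zone-name> -n <namespace> -o json | jq '.status.instances'

# Step 3: if it is empty, touch the resource to nudge a reconciliation
# (annotation key is arbitrary/hypothetical)
kubectl annotate dnszone <zone-name> -n <namespace> \
  bindy.firestoned.io/force-reconcile="$(date +%s)" --overwrite
```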
**Expected Status:**

```yaml
status:
  instances:
    - apiVersion: bindy.firestoned.io/v1beta1
      kind: Bind9Instance
      name: primary-dns-0
      namespace: dns-system
      status: Configured
      lastReconciledAt: "2026-01-06T10:00:00Z"
```
### Issue 2: Bind9Instance Missing selectedZones Field
**Symptoms:**

- `Bind9Instance.status.selectedZones[]` is empty or missing
- Monitoring dashboards showing zero zones per instance

**Root Cause:**

The `selectedZones` reverse reference was removed. This field created circular dependencies and is no longer maintained.
**Resolution:**

1. Update monitoring queries to use DNSZone status instead:

   ```bash
   # OLD (broken):
   kubectl get bind9instance primary-dns-0 -o jsonpath='{.status.selectedZones}'

   # NEW (correct):
   kubectl get dnszones -A -o json | jq -r \
     '.items[] | select(.status.instances[]?.name == "primary-dns-0") | .metadata.name'
   ```

2. Update dashboards to query DNSZone resources for instance relationships.

**Migration Note:** This is an intentional breaking change. The DNSZone operator now owns the instance-zone relationship.
### Issue 3: Zones Not Synchronizing to Instances
**Symptoms:**

- `status.instances[].status` is `Claimed` or `Failed` instead of `Configured`
- Zones missing from the BIND9 instance configuration
- `Ready` condition is `False`
**Diagnosis** (commands sketched below):

1. Check the DNSZone status.
2. Look for `Failed` instances.
3. Check error messages.
4. Check operator logs.
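One way to run these checks, sketched against the status layout shown in Issue 1; the exact location of per-instance error messages is an assumption, so the conditions are inspected as well:

```bash
# 1. Check the overall DNSZone status
kubectl get dnszone <zone-name> -n <namespace> -o yaml

# 2. Look for Failed instances
kubectl get dnszone <zone-name> -n <namespace> -o json \
  | jq '.status.instances[]? | select(.status == "Failed")'

# 3. Check condition messages for error details
kubectl get dnszone <zone-name> -n <namespace> -o json | jq '.status.conditions'

# 4. Check operator logs for entries about this zone
kubectl logs -n dns-system -l app=bindy-operator --tail=200 | grep -i <zone-name>
```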
**Common Root Causes:**
#### A. Bindcar API Unavailable
**Error Message:** `HTTP 500: bindcar API unavailable`
**Resolution** (commands sketched below):

1. Verify the Bind9Instance pod is running.
2. Check the bindcar container logs.
3. Verify the bindcar service is accessible.
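A sketch of these checks; the pod name, service name, port, and health path are placeholders/assumptions, while the `bindcar` container name comes from the step above:

```bash
# 1. Verify the Bind9Instance pod is running
kubectl get pods -n <namespace> | grep <instance-name>

# 2. Check the bindcar container logs
kubectl logs -n <namespace> <bind9-pod-name> -c bindcar --tail=100

# 3. Verify the bindcar API answers from inside the cluster
#    (service name, port, and /health path are assumptions)
kubectl run bindcar-check --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -sS http://<bindcar-service>.<namespace>.svc:8080/health
```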
#### B. Instance Not Selected (Wrong Labels/ClusterRef)
**Error:** Instance remains in `Claimed` state indefinitely
**Resolution:**

1. Check whether the instance matches the DNSZone selector:

   ```bash
   # Check DNSZone selectors
   kubectl get dnszone <zone-name> -n <namespace> -o yaml | grep -A10 "bind9InstancesFrom:"

   # Check instance labels
   kubectl get bind9instance <instance-name> -n <namespace> -o jsonpath='{.metadata.labels}' | jq .
   ```

2. Verify `clusterRef` if used (see the sketch below).
3. If the labels/`clusterRef` don't match, the instance won't be selected; this is correct behavior.
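A sketch for step 2; the exact spec field holding the cluster reference on each resource is an assumption, so substitute whatever your CRDs actually define:

```bash
# Cluster reference on the DNSZone (field path is an assumption)
kubectl get dnszone <zone-name> -n <namespace> -o jsonpath='{.spec.clusterRef}'; echo

# Cluster the Bind9Instance belongs to (field path is an assumption)
kubectl get bind9instance <instance-name> -n <namespace> -o jsonpath='{.spec.clusterRef}'; echo
```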
#### C. Operator RBAC Permissions Missing
Error: "Forbidden: User system:serviceaccount:dns-system:bindy-operator cannot update resource..."
**Resolution:**

1. Verify RBAC is deployed:

   ```bash
   kubectl get clusterrole bindy-operator
   kubectl get clusterrolebinding bindy-operator
   kubectl get serviceaccount -n dns-system bindy-operator
   ```

2. Redeploy RBAC if needed (see the sketch below).
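If any of those objects are missing, re-apply the RBAC manifests; the manifest path and Helm release details below are hypothetical and depend on how the operator was installed:

```bash
# Re-apply the RBAC manifests shipped with the operator (path is hypothetical)
kubectl apply -f deploy/rbac/

# Or, for a Helm-based install (release/chart names are assumptions)
helm upgrade bindy-operator <chart> -n dns-system --reuse-values
```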
### Issue 4: DNSZone Ready Condition is False
**Symptoms:**

- `status.conditions[?(@.type=='Ready')].status` is `"False"`
- Some instances show `Configured`, others show `Failed`
**Diagnosis** (commands sketched below):

1. Check the `Ready` condition details.
2. Count `Configured` vs. total instances.
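Both checks, sketched with `jq` against the status fields shown in Issue 1:

```bash
# 1. Inspect the Ready condition (reason and message explain why it is False)
kubectl get dnszone <zone-name> -n <namespace> -o json \
  | jq '.status.conditions[]? | select(.type == "Ready")'

# 2. Count Configured vs. total instances
kubectl get dnszone <zone-name> -n <namespace> -o json \
  | jq '{configured: ([.status.instances[]? | select(.status == "Configured")] | length),
         total: ((.status.instances // []) | length)}'
```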
**Resolution:**

The `Ready` condition is `True` ONLY when ALL instances are in `Configured` status. If any instance is `Failed` or `Claimed`, `Ready` will be `False`.

1. Identify failed instances and check their error messages (see Issue 3).
2. Fix the root cause for each failed instance.
3. The operator will automatically retry and update the status.
### Issue 5: Duplicate Instances in Status
**Symptoms:**

- Same instance appears multiple times in `status.instances[]`
- Instance count is higher than expected

**Root Cause:**

Instance matches BOTH `clusterRef` AND `bind9InstancesFrom` selectors, but deduplication failed.
**Diagnosis:**

```bash
kubectl get dnszone <zone-name> -n <namespace> -o json \
  | jq '.status.instances | group_by(.name) | map(select(length > 1))'
```
**Resolution:** This should not happen (deduplication is automatic), but if it does:

1. Check the operator version (the bug may be fixed in a newer version).
2. Force a reconciliation (see the annotation sketch under Issue 1).
3. If the issue persists, file a bug report with operator logs.
### Issue 6: Old ZoneSync Operator Still Running
**Symptoms:**

- Two operators reconciling the same DNSZone
- Conflicting status updates
- `status.syncStatus[]` is being updated (should not exist)
Diagnosis:
# Check for multiple bindy operator pods
kubectl get pods -n dns-system -l app=bindy-operator
# Check operator version
kubectl get deployment -n dns-system bindy-operator -o jsonpath='{.spec.template.spec.containers[0].image}'
**Resolution** (commands sketched below):

1. Verify the correct image version.
2. Force a rollout.
3. Verify old pods are terminated.
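A sketch of these three steps; compare the image shown in step 1 against the release you intended to deploy:

```bash
# 1. Verify the deployment references the consolidated operator image
kubectl get deployment -n dns-system bindy-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# 2. Force a rollout so every pod runs the new image
kubectl rollout restart deployment/bindy-operator -n dns-system
kubectl rollout status deployment/bindy-operator -n dns-system

# 3. Verify old pods are terminated and only the new ReplicaSet remains
kubectl get pods -n dns-system -l app=bindy-operator
```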
## Rollback Procedure
If the migration fails and you need to roll back (commands sketched below):

1. Restore the previous operator version.
2. Restore the old CRDs (if the CRD update was applied).
3. Restore DNSZone resources from backup.
4. Verify zones are working.
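A rollback sketch; the image tag, container name, CRD manifest location, and backup file names are assumptions (the backup file matches the pre-migration checklist above):

```bash
# 1. Restore the previous operator version (tag and container name are assumptions)
kubectl set image deployment/bindy-operator -n dns-system \
  bindy-operator=<registry>/bindy-operator:<previous-tag>

# 2. Restore the old CRDs, only if the CRD update was applied (path is hypothetical)
kubectl apply -f <old-crds-manifest>.yaml

# 3. Restore DNSZone resources from the pre-migration backup
kubectl apply -f dnszones-backup.yaml

# 4. Verify zones are serving
kubectl get dnszones -A
dig @<dns-server-ip> <zone-name> SOA +short
```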
## Post-Migration Validation
After a successful migration, verify the following (commands sketched below):

1. All zones show `Ready=True` (output should be empty).
2. All instances are `Configured` (all counts should be 0).
3. No legacy status fields exist (output should be empty).
4. Zone queries work.
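A validation sketch for all four checks, based on the status fields described in this guide; the DNS endpoint in the last step is a placeholder:

```bash
# 1. All zones show Ready=True (output should be empty)
kubectl get dnszones -A -o json | jq -r \
  '.items[]
   | select(([.status.conditions[]? | select(.type=="Ready" and .status=="True")] | length) == 0)
   | .metadata.name'

# 2. All instances are Configured (every count should be 0)
kubectl get dnszones -A -o json | jq -r \
  '.items[] | "\(.metadata.name): \([.status.instances[]? | select(.status != "Configured")] | length)"'

# 3. No legacy status fields exist (output should be empty)
kubectl get dnszones -A -o json | jq -r \
  '.items[] | select(.status.syncStatus != null or .status.syncedInstancesCount != null) | .metadata.name'

# 4. Zone queries resolve
dig @<dns-server-ip> <zone-name> SOA +short
```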
## Getting Help
If issues persist:

1. Collect diagnostic information (see the sketch below).
2. File a GitHub issue with:
   - Migration step where failure occurred
   - Error messages from operator logs
   - DNSZone YAML showing problematic status
   - Bind9Instance YAML for affected instances
3. Check known issues:
   - GitHub Issues
   - CHANGELOG.md
   - DNSZone Consolidation Roadmap
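A diagnostic-collection sketch for step 1; file names and namespaces are assumptions, so adjust them to your cluster before attaching the bundle to an issue:

```bash
mkdir -p bindy-diagnostics

kubectl get dnszones -A -o yaml > bindy-diagnostics/dnszones.yaml
kubectl get bind9instances -A -o yaml > bindy-diagnostics/bind9instances.yaml
kubectl logs -n dns-system -l app=bindy-operator --tail=1000 > bindy-diagnostics/operator.log
kubectl get events -n dns-system --sort-by=.lastTimestamp > bindy-diagnostics/events.txt

tar czf bindy-diagnostics.tar.gz bindy-diagnostics/
```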