Building a highly available DNS using NixOS and CoreDNS
Why?
Why not? When George Mallory was asked why he wanted to climb Mount Everest, he said "because it's there". DNS is there. Ever looming. Ever the cause of issues. I'd like to finally be at least 50% certain it's not DNS, probably.
You've got my attention, what do I need?
- At least two machines, preferably in separate failure zones
- A BGP capable router
- Knowledge of NixOS
- Preferably a separate subnet/vlan with at least 4 free IPs
- A deep-seated appreciation for NixOS
- DNS issues
The stack and architecture
For this project I (obviously) went with NixOS to manage multiple nodes simultaneously and ensure consistent state between them. The nodes run CoreDNS because it's simple, lightweight, and rock solid. For BGP announcements FRR is a solid choice.
Things I had to solve
Suboptimal domain partitioning
I own and use the cows.homes domain for my internal network. However, I assigned the root domain .cows.homes. and *.cows.homes. to K8S-hosted services, while devices on my network use *.<vlan_name>.cows.homes., so the device names sit inside the zone that belongs to the cluster.
I have an authoritative DNS server deployed to my K8S cluster to manage the root domain and the subdomains specific to the cluster services. My UniFi Gateway is also my DHCP server and is aware of the device FQDNs.
That leaves the question: How do I forward queries for cluster names to the nameservers in the cluster, keep resolution of network devices working, AND avoid loops in the DNS resolution?
The compromise
I ultimately settled on forwarding .cows.homes. to my K8S cluster's DNS. But to keep resolution of internal device FQDNs I decided to periodically scrape my Gateway's API for IPs and FQDNs.
Sounds simple enough, right?
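To make the intended lookup order concrete, here is a minimal Python sketch of how a query would be routed under this compromise. It is not the CoreDNS configuration itself. The function name `route`, the `hosts` set, and the three backend labels are made up for illustration, and it assumes that names found in the scraped hosts data win over the cluster forward.

```python
def route(qname, hosts, zone="cows.homes."):
    """Return which backend would answer qname: 'hosts', 'cluster' or 'upstream'."""
    # Normalize to a lowercase FQDN with exactly one trailing dot.
    name = qname.lower().rstrip(".") + "."
    if name in hosts:
        return "hosts"  # scraped device names are answered locally
    if name == zone or name.endswith("." + zone):
        return "cluster"  # everything else in the zone goes to the K8S DNS
    return "upstream"  # the rest of the internet


hosts = {"argon.servers.cows.homes."}  # "servers" is a hypothetical VLAN name
print(route("argon.servers.cows.homes", hosts))
print(route("grafana.cows.homes", hosts))
print(route("example.com", hosts))
```

Because device names are answered before the zone-wide forward is considered, the cluster DNS never has to know about them, and nothing is ever forwarded back to the router.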
Excursion 1: The frustrating reality of using UniFi devices :(
I went to the official API documentation and immediately saw GET /v1/sites/{siteId}/clients. Sounds perfect, doesn't it? So I tried it on my local network...
    # Query the official integration API for the connected clients
    http get -k --headers {X-API-Key: $env.API_KEY} $"https://($env.IP)/proxy/network/integration/v1/sites/($env.SITE_ID)/clients"
    | get data
    | first

    ╭────────────────┬──────────────────────────────────────╮
    │ type           │ WIRED                                │
    │ id             │ b94f34b2-917d-3ddf-9e7f-6701ad2c7357 │
    │ name           │ argon dd:04                          │
    │ connectedAt    │ 2026-03-28T15:45:58Z                 │
    │ ipAddress      │ 10.30.0.153                          │
    │ macAddress     │ bc:24:11:e3:dd:04                    │
    │ uplinkDeviceId │ 17662d09-0bde-3ad3-9d78-df4053f45a45 │
    │                │ ╭──────┬─────────╮                   │
    │ access         │ │ type │ DEFAULT │                   │
    │                │ ╰──────┴─────────╯                   │
    ╰────────────────┴──────────────────────────────────────╯

I can't get a clean hostname, so I'd have to rely on the IP to derive the proper FQDN. I also have zero access to any IPv6 information.
That's... suboptimal.
But there's a saving grace: the unofficial legacy API, specifically the GET /proxy/network/api/s/default/stat/alluser endpoint.
    # Log in to obtain a session cookie and a CSRF token
    let auth = { username: $env.UNIFI_USER, password: $env.UNIFI_PASSWORD }
    let login_full = ($auth
        | http post
            --full
            --insecure
            --content-type application/json
            $"https://($env.IP)/api/auth/login"
    )
    # Keep only the "name=value" part of the Set-Cookie header
    let cookie = ($login_full.headers.response
        | where name == "set-cookie"
        | get value.0
        | split row ";"
        | get 0
    )
    let csrf = ($login_full.headers.response
        | where name == "x-csrf-token"
        | get value.0
    )
    # Fetch all known clients, keeping those with a hostname and a fixed IP
    http get -k --headers { "Cookie": $cookie, "X-CSRF-Token": $csrf } $"https://($env.IP)/proxy/network/api/s/default/stat/alluser"
    | get data
    | where hostname? != null
    | where ($it.use_fixedip? == true)
    | select last_ip last_ipv6? hostname
    | first

    ╭───────────┬────────────────────────╮
    │ last_ip   │ 10.30.0.153            │
    │           │ ╭───┬────────────────╮ │
    │ last_ipv6 │ │ 0 │ fd::10:30:0:dd │ │
    │           │ ╰───┴────────────────╯ │
    │ hostname  │ argon                  │
    ╰───────────┴────────────────────────╯

That's complicated, but I can work with that! With some prioritization of existing FQDNs, VLAN information, and culling of old entries, I can generate a pretty good hosts file using a single, not overly complicated nushell script.
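As a rough illustration of that last step, here is a Python sketch of the idea, not the actual nushell script. It only covers the VLAN lookup and the culling of old entries, and leaves out the prioritization of existing FQDNs. The `VLANS` mapping, the VLAN name `servers`, the `last_seen` field, and the 30-day cut-off are all assumptions made for the example.

```python
import ipaddress

# Hypothetical subnet -> VLAN name mapping; the real names are not in the post.
VLANS = {
    ipaddress.ip_network("10.30.0.0/24"): "servers",
}
MAX_AGE = 30 * 24 * 3600  # assumed cut-off for "old" entries, in seconds


def hosts_lines(clients, now, domain="cows.homes"):
    """Turn stat/alluser-style records into hosts-file lines."""
    lines = []
    for c in clients:
        # Cull clients that have not been seen recently (last_seen is assumed).
        if now - c.get("last_seen", now) > MAX_AGE:
            continue
        ip = ipaddress.ip_address(c["last_ip"])
        vlan = next((name for net, name in VLANS.items() if ip in net), None)
        if vlan is None:
            continue  # not on a known VLAN, so no FQDN can be derived
        fqdn = f"{c['hostname']}.{vlan}.{domain}"
        # One line per address: the IPv4 address first, then any IPv6 addresses.
        for addr in [c["last_ip"], *c.get("last_ipv6", [])]:
            lines.append(f"{addr} {fqdn}")
    return lines
```

Feeding it the record from the output above, plus a recent `last_seen` timestamp, would yield one line for 10.30.0.153 and one for fd::10:30:0:dd, both pointing at the same FQDN.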
Excursion 2: I have a hosts file already, what if...
I just add blocklists to it? That shouldn't be too hard. And it isn't. I can just use a similar nushell script to scrape some blocklist sources and write them to a hosts file. For this to work properly I'm having the lease scraper write to /var/lib/coredns/router.hosts and the blocklist scraper to /var/lib/coredns/blocklist.hosts.
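The core of such a blocklist scraper is small. Below is a minimal Python sketch of the normalization step, again an illustration rather than the nushell script itself. The function name `blocklist_to_hosts` and the choice of 0.0.0.0 as the sink address are assumptions. For hosts-style input it simply takes the last field of each line, which is a simplification for lines that list several names.

```python
def blocklist_to_hosts(text, sink="0.0.0.0"):
    """Normalize a downloaded blocklist into hosts-file lines pointing at sink."""
    seen = set()
    lines = []
    for raw in text.splitlines():
        entry = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not entry:
            continue
        # Hosts-style lists are "<ip> <domain>"; plain lists are just "<domain>".
        domain = entry.split()[-1].lower()
        if domain in seen or domain in ("localhost", "localhost.localdomain"):
            continue  # skip duplicates and never block localhost
        seen.add(domain)
        lines.append(f"{sink} {domain}")
    return lines
```

Deduplicating across sources keeps the merged file from growing needlessly when several lists overlap.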
Whenever either script runs, it updates its own file and then merges both files into a coredns.hosts, which CoreDNS then serves via the hosts plugin.
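The merge step could look roughly like the following Python sketch. The function name `merge_hosts` is made up, and writing through a temporary file is my assumption about how to do this safely, not necessarily what the real scripts do. The idea is that the hosts plugin re-reads the file on its own schedule, so an atomic rename keeps it from ever seeing a half-written file.

```python
import os
import tempfile
from pathlib import Path


def merge_hosts(sources, target):
    """Concatenate the existing source files into target, replacing it atomically."""
    target = Path(target)
    chunks = []
    for src in map(Path, sources):
        if src.exists():  # one of the scrapers may not have run yet
            chunks.append(src.read_text().rstrip("\n"))
    # Write to a temp file in the same directory, then rename it over the target.
    fd, tmp = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(chunks) + "\n")
    os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600; keep it readable
    os.replace(tmp, target)
```

A call would then be something like `merge_hosts(["/var/lib/coredns/router.hosts", "/var/lib/coredns/blocklist.hosts"], "/var/lib/coredns/coredns.hosts")`. The order of the list decides the order in the merged file, so any file that should come first simply goes first.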
Hot take, I mean Excursion 3: Systemd
What might not be obvious is that these scripts don't loop. They just write the file and exit. To solve this, we can set up the scripts as systemd services and use timers to run them every so often.
For instance, the blocklist updater runs once a day, roughly.
Now back to the nix of it all
With all the information I've gathered so far, I can start creating the dns nodes.
I use my own nix-scaffold flake module to help keep clutter out of my nix configuration and allow me to focus on just having implementation in my repo. For the DNS I created a new template that will be shared by both nodes. Think of templates as very dumb NixOS modules: you can allocate one per machine, and it semantically brings in everything that a DNS, Workstation, or Server might need. Kind of like a profile, or a kind of "type".
The template for the DNS nodes has three distinct areas:
- Static files that are necessary, like the scripts and static hosts
- CoreDNS configuration
- FRR configuration
Static files
I use a static.hosts file that maps nameserver IPs to hostnames. I also define the hostname for my K8S cluster's VIP here. This file is simply prepended to the coredns.hosts file generated by the scripts from the excursions.
I attempted to let Windows automatically detect that my DNS is DoH compatible via Discovery of Designated Resolvers (DDR). I'm not sure if I configured the zone file wrong or if Windows is weird, but it ultimately didn't work.
If anyone has an idea why this doesn't work, please contact me and let me know :)
CoreDNS
The CoreDNS configuration is relatively straightforward.
This configuration enables DoT, DoH, DoQ, and plain DNS on ports 53/tcp and 53/udp, then sets up health checks (important for FRR later) and Prometheus endpoints, and includes the forwards we went over in "Things I had to solve".
You may notice that I use Cloudflare DoT as my upstream resolver. I'm not sure yet whether that's a good idea; the latency is definitely noticeable. Maybe I'll change it, maybe not. I'm also omitting the ACME configuration here, since it's fairly standard and simple.
FRR
Now the magic sauce: ✨✨ BGP ✨✨. The point of using BGP is twofold:
- I want to keep the logical IP(s) for the nameserver(s) separate from the actual hosts
- I want to be able to reboot or even pull the plug on one host without (much of a) disruption
To accomplish this, the nodes announce the nameserver IPs to my router with very tight BGP timers (3 second keepalive, 9 second hold time).
I also created a job that continuously monitors the aforementioned CoreDNS health endpoint and withdraws the announced routes within a second should CoreDNS have issues on a node.
It might seem redundant to announce two IPs per address family, but this ensures that no "bad" secondary DNS gets configured by whatever OS you're using, which might otherwise lead to issues with resolving internal domains.
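The health-watching job boils down to a small state machine. Here is a Python sketch of that logic, not the actual job. It assumes the CoreDNS health plugin is listening on its default address, :8080/health. The `announce` and `withdraw` callbacks are hypothetical hooks for whatever mechanism actually adds and removes the routes in FRR, which is deliberately left out here.

```python
import time
import urllib.request


def coredns_healthy(url="http://127.0.0.1:8080/health", timeout=0.5):
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False  # any failure (refused, timeout, HTTP error) is unhealthy


def watch(check, announce, withdraw, interval=0.5, rounds=None, sleep=time.sleep):
    """Call withdraw() when check() starts failing and announce() when it recovers."""
    announced = None  # unknown until the first check has run
    n = 0
    while rounds is None or n < rounds:
        healthy = check()
        if healthy and announced is not True:
            announce()
            announced = True
        elif not healthy and announced is not False:
            withdraw()
            announced = False
        n += 1
        sleep(interval)
```

With a half-second interval and timeout, a failing CoreDNS is noticed in about a second. The BGP timers cover the other case, where the whole node disappears: the router drops the routes once the 9 second hold time expires without a keepalive.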
Conclusion time
I have had this setup running for a few weeks now and did some fault tolerance testing. I restarted nodes freely, killed VMs abruptly, and pulled the plug on the other host. All without even noticing it.
I feel confident that I will continue using this or a similar architecture for my DNS for the foreseeable future, and will probably expand this concept to other core infrastructure services where it might seem useful :)