No migration survives first contact unchanged. Despite the careful planning behind the Traefik migration and DNS updates (covered in Parts 1 and 2), a redirect loop appeared the very next day. This post details the troubleshooting process and the multi-part fix it required.
The Problem: Redirect Loop Detected
The morning after completing the DNS migration, I attempted to access adguardhome.internal.domain and was greeted with the dreaded "too many redirects" error. The browser was caught in an infinite redirect loop, bouncing between URLs without ever loading the service.
- Browser error: "ERR_TOO_MANY_REDIRECTS"
- Service appears unreachable despite being online
- Other services on the same Traefik instance work fine
This was frustrating because the service had been working perfectly before the migration, and all my testing the previous day had shown everything functional. Something about the new setup wasn't quite right.
Diagnosis: Following the Redirect Chain
To understand what was happening, I needed to see the actual redirect chain. Using curl with the -I flag (send a HEAD request and show only the response headers) and --resolve to pin the hostname to Traefik's IP, bypassing DNS and any cached lookups, I could trace exactly what was happening:
curl -I --resolve adguardhome.internal.domain:443:192.168.50.21 \
https://adguardhome.internal.domain
The output revealed the problem: Traefik was redirecting the request, but the redirect was pointing back to itself, creating an infinite loop. The issue was that adguardhome.internal.domain didn't have a proper route defined in Traefik's configuration.
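The "follow the chain manually" diagnosis can also be scripted. Below is a minimal sketch of the same idea in Python: follow Location headers one hop at a time and report a loop as soon as a URL repeats, rather than failing opaquely with "too many redirects". It uses plain HTTP and compares paths only for brevity, which is enough to spot a single-host loop; the function name is my own, not part of any tool mentioned here.

```python
# Sketch: follow redirects one hop at a time and detect a loop.
# Compares paths only, which suffices when the loop stays on one host.
import http.client

def trace_redirects(host, port, path="/", max_hops=10):
    """Return (chain_of_paths, looped) for an HTTP endpoint."""
    chain = [path]
    seen = {path}
    for _ in range(max_hops):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("HEAD", chain[-1])
        resp = conn.getresponse()
        location = resp.getheader("Location")
        conn.close()
        if resp.status not in (301, 302, 307, 308) or location is None:
            return chain, False   # reached a final response: no loop
        chain.append(location)
        if location in seen:
            return chain, True    # same URL seen twice: loop detected
        seen.add(location)
    return chain, True            # hop budget exhausted: treat as loop
```

Against the broken setup, this would have printed the chain bouncing between the same URLs instead of a bare browser error.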
Understanding the Root Cause
During the original DNS setup months ago, I had created records for both adguardhome.internal.domain and adguard.internal.domain pointing to the same service. This redundancy seemed harmless at the time, but it became problematic in the new architecture.
The new Traefik configuration only had a route defined for adguard.internal.domain, not adguardhome.internal.domain. When requests came in for the latter, Traefik didn't know what to do with them, leading to the redirect loop.
The Solution: A Three-Part Fix
Resolving this required coordinated changes across both DNS and Traefik configuration:
Part 1: DNS Restructuring
Instead of having two separate A records pointing to the same IP, I restructured the DNS to use a CNAME:
- Kept adguard.internal.domain as an A record pointing to 192.168.50.21
- Changed adguardhome.internal.domain from an A record to a CNAME pointing to adguard.internal.domain
This approach is cleaner because it establishes adguard.internal.domain as the canonical name, with adguardhome as an alias. Any future IP changes only require updating one record.
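In BIND-style zone-file notation, the restructured records would look roughly like this (illustrative only; the exact syntax depends on which DNS server the homelab runs):

```
; Canonical name: a single A record holds the IP
adguard.internal.domain.      IN  A      192.168.50.21

; Alias: follows the canonical name automatically on IP changes
adguardhome.internal.domain.  IN  CNAME  adguard.internal.domain.
```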
Part 2: Traefik Middleware
Even with the CNAME in place, I wanted to be explicit about the redirect behavior. I added a redirect middleware to Traefik's dynamic.yml:
http:
  middlewares:
    adguardhome-redirect:
      redirectRegex:
        regex: "^https://adguardhome\\.internal\\.domain/(.*)"
        replacement: "https://adguard.internal.domain/${1}"
        permanent: true
This middleware uses a regex to match any request to adguardhome.internal.domain and issues a permanent (301) redirect to adguard.internal.domain, preserving any path or query parameters.
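The rewrite the middleware performs can be sanity-checked outside Traefik. Here is the same substitution expressed with Python's re module (Traefik's Go regexp references the capture group as ${1}; Python's equivalent is \g<1>), showing that path and query string survive intact:

```python
# Sketch: the redirectRegex rewrite, reproduced in Python to show that
# the captured path and query parameters are preserved.
import re

pattern = r"^https://adguardhome\.internal\.domain/(.*)"
replacement = r"https://adguard.internal.domain/\g<1>"

url = "https://adguardhome.internal.domain/control/status?details=1"
print(re.sub(pattern, replacement, url))
# → https://adguard.internal.domain/control/status?details=1
```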
Part 3: Secure Transport Configuration
AdGuard Home runs its own web interface with HTTPS on port 443 of the media server (192.168.50.20). However, it uses a self-signed certificate, which Traefik would normally reject.
To allow Traefik to proxy requests to AdGuard's HTTPS endpoint without certificate validation, I configured an insecureTransport:
http:
  routers:
    adguard:
      rule: "Host(`adguard.internal.domain`)"
      service: adguard-service
      tls: {}

  services:
    adguard-service:
      loadBalancer:
        servers:
          - url: "https://192.168.50.20:443"
        serversTransport: insecureTransport

  serversTransports:
    insecureTransport:
      insecureSkipVerify: true
insecureSkipVerify is generally not recommended for production environments. However, in a homelab context where we control both endpoints and the traffic never leaves our internal network, this is an acceptable compromise for usability.
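For intuition, here is roughly what insecureSkipVerify amounts to on the client side, sketched with Python's ssl module: hostname checking and chain verification are switched off, so a self-signed certificate is accepted. This mirrors the trade-off Traefik makes for the AdGuard backend; it is not Traefik's actual implementation.

```python
# Sketch: a TLS client context that accepts self-signed certificates,
# analogous to Traefik's insecureSkipVerify. Order matters: hostname
# checking must be disabled before verification is turned off.
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False        # don't require the SAN to match
ctx.verify_mode = ssl.CERT_NONE   # don't verify the certificate chain

# ctx could now be passed to http.client.HTTPSConnection(..., context=ctx)
```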
Applying and Testing the Fix
With all three parts of the configuration in place, I reloaded Traefik. The file provider watches dynamic.yml for changes and applies them automatically, but I restarted the container to be certain:
sudo docker restart traefik
Then came the validation. I used curl to verify each part of the fix:
1. Verify the Redirect
curl -I --resolve adguardhome.internal.domain:443:192.168.50.21 \
https://adguardhome.internal.domain
# Expected: 301 redirect to adguard.internal.domain
2. Verify the Target Service
curl -I --resolve adguard.internal.domain:443:192.168.50.21 \
https://adguard.internal.domain
# Expected: 200 OK with AdGuard Home headers
3. Check Traefik's Router Status
curl http://192.168.50.21:8080/api/http/routers | jq '.[] | select(.name | contains("adguard"))'
# Verify the router is loaded and active
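The jq filter from step 3 can be expressed just as easily in Python, run here over a small sample of the shape of JSON that Traefik's /api/http/routers endpoint returns (fields abbreviated; the sample data is illustrative, not captured output):

```python
# Sketch: filter Traefik's router list for adguard routers, the Python
# equivalent of the jq select() above. Sample data is illustrative.
import json

sample = json.loads("""
[
  {"name": "adguard@file",   "rule": "Host(`adguard.internal.domain`)", "status": "enabled"},
  {"name": "dashboard@file", "rule": "Host(`traefik.internal.domain`)", "status": "enabled"}
]
""")

adguard_routers = [r for r in sample if "adguard" in r["name"]]
for r in adguard_routers:
    print(r["name"], r["status"])  # expect "enabled" after the fix
```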
All three checks passed. The redirect was working, the target service was accessible, and Traefik showed the router as active. The fix was successful.
Why This Approach Works
This solution addresses the problem at multiple layers:
- DNS Layer: Using a CNAME establishes a clear hierarchy and reduces configuration duplication
- Application Layer: The Traefik middleware provides explicit redirect behavior that's easy to understand and maintain
- Transport Layer: The insecure transport configuration allows proxying to self-signed certificates without warnings
Each layer reinforces the others, creating a robust configuration that's both functional and maintainable.
Lessons for Future Troubleshooting
This troubleshooting experience reinforced several valuable practices:
1. Use Command-Line Tools for Diagnosis
Browser caching and DNS caching can hide problems or show stale results. Using curl with --resolve bypasses these caches and shows exactly what's happening at the protocol level.
2. Layer Your Solutions
Rather than trying to solve everything at one layer, think about how different layers (DNS, application, transport) can work together. This multi-layered approach provides redundancy and makes troubleshooting easier.
3. Document Configuration Decisions
Using a file-based configuration (dynamic.yml) naturally creates documentation. Each middleware and router has a clear purpose that's evident from the configuration itself.
4. Test Incrementally
Rather than making all three changes at once and hoping for the best, I applied them incrementally and tested after each step. This made it clear which change fixed which part of the problem.
5. Understand Your Edge Cases
Services with self-signed certificates or non-standard configurations will require special handling. Documenting these edge cases in your configuration makes future troubleshooting much faster.
Final Thoughts
What started as a simple port conflict (Part 1) evolved into a major infrastructure migration (Part 2) and required careful troubleshooting to resolve edge cases (Part 3). The entire process took two days but resulted in a more stable, better-documented, and more maintainable homelab infrastructure.
The homelab, as always, continues to evolve.
Tools & Workflow
Troubleshooting the redirect loop demonstrated how .md-based context enables faster problem resolution:
- Gemini CLI & CodeX: Executed diagnostic curl commands, analyzed Traefik logs, modified configuration files, and validated fixes across services. The AI tools referenced recent operations in GEMINI.md to understand what had changed.
- Context Continuity: Because GEMINI.md logged the previous day's migration work, when I returned to troubleshoot, the AI tools immediately understood the current architecture, recent changes, and validation approaches used. No need to re-explain the entire setup.
- Claude: Solely responsible for this blog series and website—transforming the technical work into readable documentation.
The .md file approach shines during multi-day projects: context persists between sessions, successful commands are preserved for reuse, and the file itself becomes valuable documentation for future maintenance. It's like having a technical journal that AI tools can read and contribute to.