High Availability Architecture
This document explains how HA works in MaksIT.CertsUI after moving mutable ACME coordination state to PostgreSQL.
Goals
- Run multiple server replicas without ACME race conditions.
- Keep HTTP-01 challenge tokens coherent across replicas.
- Ensure startup/bootstrap and renewal loops do not run in parallel on every pod.
- Expose health endpoints suitable for Kubernetes probes.
Runtime model
- Shared source of truth: PostgreSQL stores ACME challenge rows and runtime leases.
- Per-instance identity: each running server process gets one canonical InstanceId (IRuntimeInstanceId singleton).
- Lease holder: mutating ACME paths acquire a PostgreSQL lease row (app_runtime_leases) with a TTL.
- Challenge reads: /.well-known/acme-challenge/{token} reads the token value from PostgreSQL and materializes a short-lived file in /acme for compatibility.
- Background coordination: bootstrap and renewal hosted services use named leases to avoid duplicate work.
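The background coordination described above can be illustrated with a small sketch. It is not the project's code: `InMemoryLeases` and `run_guarded` are hypothetical names, the lease store is a plain dictionary standing in for PostgreSQL, and releasing the lease right after the work finishes is an assumption (the real services may instead let the TTL lapse).

```python
class InMemoryLeases:
    """Dictionary-backed stand-in for the PostgreSQL lease table (illustration only)."""

    def __init__(self):
        self._rows = {}  # lease_name -> (holder_id, expires_at)

    def try_acquire(self, name, holder, now, ttl):
        current = self._rows.get(name)
        # Acquire when the row is missing, expired, or already held by this caller.
        if current is None or current[1] <= now or current[0] == holder:
            self._rows[name] = (holder, now + ttl)
            return True
        return False

    def release(self, name, holder):
        current = self._rows.get(name)
        # Delete only when both the lease name and the holder match.
        if current is not None and current[0] == holder:
            del self._rows[name]
            return True
        return False


def run_guarded(leases, lease_name, instance_id, now, ttl, work):
    """Run `work` only if this instance obtains the named lease; return True if it ran."""
    if not leases.try_acquire(lease_name, instance_id, now, ttl):
        return False  # another replica is doing this work; skip this tick
    try:
        work()
    finally:
        # Assumption: release eagerly so the next tick does not wait for the TTL.
        leases.release(lease_name, instance_id)
    return True
```

With two replicas ticking at the same moment, only the one that wins the lease runs the renewal pass; the other returns immediately and tries again on its next tick.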
Lease design
- Lease table key: lease_name.
- Lease owner: holder_id (instance id).
- Acquire semantics:
  - insert new row if missing;
  - steal only when expired;
  - renew when current holder matches.
- Release semantics:
  - delete only when lease_name and holder_id both match.
This is implemented as an optimistic single-statement INSERT ... ON CONFLICT ... DO UPDATE ... WHERE ... flow in PostgreSQL.
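To make the single-statement flow concrete, here is a runnable sketch that uses Python's built-in sqlite3 module, whose upsert syntax closely follows PostgreSQL's. It is an illustration under assumptions, not the statement used by RuntimeLeaseServiceNpgsql: the `expires_at` column, the numeric clock, and the follow-up SELECT (PostgreSQL code would more likely inspect the affected row count or use RETURNING) are all hypothetical.

```python
import sqlite3

# Only lease_name and holder_id are named in this document;
# expires_at is an assumed column that carries the TTL.
SCHEMA = """
CREATE TABLE app_runtime_leases (
    lease_name TEXT PRIMARY KEY,
    holder_id  TEXT NOT NULL,
    expires_at REAL NOT NULL
)
"""

# Insert if missing; on conflict, update only when the existing row
# is expired (steal) or already belongs to the caller (renew).
ACQUIRE = """
INSERT INTO app_runtime_leases (lease_name, holder_id, expires_at)
VALUES (:name, :holder, :now + :ttl)
ON CONFLICT (lease_name) DO UPDATE
SET holder_id = excluded.holder_id,
    expires_at = excluded.expires_at
WHERE app_runtime_leases.expires_at <= :now
   OR app_runtime_leases.holder_id = excluded.holder_id
"""

RELEASE = """
DELETE FROM app_runtime_leases
WHERE lease_name = :name AND holder_id = :holder
"""


def try_acquire(db, name, holder, now, ttl):
    """Return True if `holder` owns the lease after the upsert."""
    db.execute(ACQUIRE, {"name": name, "holder": holder, "now": now, "ttl": ttl})
    row = db.execute(
        "SELECT holder_id FROM app_runtime_leases WHERE lease_name = ?", (name,)
    ).fetchone()
    return row is not None and row[0] == holder


def release(db, name, holder):
    """Delete the lease only when both name and holder match; True if a row was removed."""
    return db.execute(RELEASE, {"name": name, "holder": holder}).rowcount == 1


def open_demo_db():
    """Create an in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db
```

Because the conflict branch carries its own WHERE clause, a competing replica's statement turns into a no-op while the lease is live, so no separate read-then-write step is needed to decide who wins.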
HTTP-01 coherence design
- NewOrderAsync stores challenge tokens in acme_http_challenges via UpsertAsync.
- Challenge handler (AcmeChallengeAsync) reads the token value from the DB, writes /acme/{token}, and returns the value.
- Fallback: if the DB row is missing, the legacy on-disk token read remains available for migration compatibility.
- Cleanup: the auto-renewal loop calls DeleteOlderThanAsync(TimeSpan.FromDays(10)).
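The read path above (database first, file materialized for compatibility, legacy disk read as fallback, periodic cleanup) might look roughly like the following sketch. `ChallengeStore` and `serve_challenge` are hypothetical names, the store is an in-memory stand-in for the acme_http_challenges table, and the token check is an added precaution that this document does not describe.

```python
import os
import tempfile
import time


class ChallengeStore:
    """In-memory stand-in for the acme_http_challenges table (illustration only)."""

    def __init__(self):
        self._rows = {}  # token -> (value, created_at)

    def upsert(self, token, value, now=None):
        self._rows[token] = (value, time.time() if now is None else now)

    def get(self, token):
        row = self._rows.get(token)
        return row[0] if row else None

    def delete_older_than(self, max_age_seconds, now=None):
        """Remove rows older than max_age_seconds; return how many were removed."""
        now = time.time() if now is None else now
        stale = [t for t, (_, created) in self._rows.items()
                 if now - created > max_age_seconds]
        for token in stale:
            del self._rows[token]
        return len(stale)


def serve_challenge(store, acme_dir, token):
    """Return the challenge value for `token`, or None when it is unknown."""
    # Assumed safeguard: never let a token escape the challenge directory.
    if not token or "/" in token or os.sep in token or token in (".", ".."):
        return None
    path = os.path.join(acme_dir, token)
    value = store.get(token)
    if value is not None:
        # Materialize the file so file-based consumers keep working.
        with open(path, "w", encoding="utf-8") as f:
            f.write(value)
        return value
    # Fallback: legacy token written to disk before the migration.
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as acme_dir:
        store = ChallengeStore()
        store.upsert("token-1", "token-1.thumbprint")
        print(serve_challenge(store, acme_dir, "token-1"))
```

Since every replica reads the same row, it does not matter which pod receives the validation request from the ACME server.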
Kubernetes behavior
- server can run with replicaCount >= 2 when your storage/network setup allows it.
- Server readiness and liveness probes are wired to:
  - GET /health/ready (DB roundtrip check),
  - GET /health/live (process liveness).
- Helm now sets POD_NAME from metadata.name for stable per-pod identity.
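As a rough illustration of how the pod identity and the two probes relate, the sketch below derives an instance id from POD_NAME and maps a database ping to a readiness status. The function names, the hostname-based fallback, and the 200/503 status codes are assumptions for illustration; the document does not specify what RuntimeInstanceIdProvider or the health endpoints do internally.

```python
import os
import socket
import uuid


def resolve_instance_id(env=None):
    """Prefer POD_NAME (set by Helm from metadata.name); otherwise use an assumed fallback."""
    env = os.environ if env is None else env
    pod_name = env.get("POD_NAME", "").strip()
    if pod_name:
        return pod_name
    # Hypothetical fallback for non-Kubernetes runs: hostname plus a random suffix.
    return f"{socket.gethostname()}-{uuid.uuid4().hex[:8]}"


def readiness_status(db_ping):
    """Readiness: succeed only if a database roundtrip works."""
    try:
        db_ping()
    except Exception:
        return 503  # not ready: Kubernetes stops routing traffic to this pod
    return 200


def liveness_status():
    """Liveness: the process is up and able to answer."""
    return 200
```

Keeping the database check out of the liveness probe means a short PostgreSQL outage takes pods out of rotation without causing them to be restarted.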
Current non-goals and boundaries
- The agent remains single-instance by design, deployed near the edge proxy.
- Only HTTP-01 challenge type is supported currently.
- Optional split of ACME worker into a dedicated workload is not implemented yet.
Files involved
Core coordination contracts
- src/MaksIT.CertsUI.Engine/RuntimeCoordination/IRuntimeInstanceId.cs
- src/MaksIT.CertsUI.Engine/RuntimeCoordination/RuntimeLeaseNames.cs
- src/MaksIT.CertsUI.Engine/Infrastructure/IRuntimeLeaseService.cs
- src/MaksIT.CertsUI.Engine/Persistance/Services/IAcmeHttpChallengePersistenceService.cs
PostgreSQL implementation
- src/MaksIT.CertsUI.Engine/Infrastructure/RuntimeLeaseServiceNpgsql.cs
- src/MaksIT.CertsUI.Engine/Persistance/Services/Linq2Db/AcmeHttpChallengePersistenceServiceLinq2Db.cs
- src/MaksIT.CertsUI.Engine/Data/CertsLinq2DbMapping.cs
- src/MaksIT.CertsUI.Engine/FluentMigrations/20260425130000_AcmeChallengesAndRuntimeLeases.cs
- src/MaksIT.CertsUI.Engine/Infrastructure/SchemaSyncService.cs
Runtime usage in app flows
- src/MaksIT.CertsUI.Engine/DomainServices/CertsFlowDomainService.cs
- src/MaksIT.CertsUI/HostedServices/InitializationHostedService.cs
- src/MaksIT.CertsUI/HostedServices/AutoRenewal.cs
- src/MaksIT.CertsUI/Infrastructure/RuntimeInstanceIdProvider.cs
- src/MaksIT.CertsUI/Program.cs
- src/MaksIT.CertsUI/Controllers/WellKnownController.cs
- src/MaksIT.CertsUI/Services/CertsFlowService.cs
Helm and deployment wiring
- src/helm/values.yaml
- src/helm/templates/deployments.yaml
- src/helm/templates/poddisruptionbudget.yaml
Tests
- src/MaksIT.CertsUI.Tests/Services/CertsFlowServiceTests.cs