Complete realtime architecture for a Kahoot-scale quiz platform — WebSockets, Redis pub/sub, distributed game state, anti-cheat, and horizontal scaling.
Socket.io with automatic fallback to long-polling. Binary message packing via msgpack for 40% smaller payloads. HTTP/3 for initial handshake.
All hot game state lives in Redis Cluster with TTL. PostgreSQL for durable history. Redis Streams for event sourcing. No single point of failure.
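Stateless Node.js pods behind HAProxy with sticky sessions (Socket.io requires them across multiple nodes whenever the long-polling transport is enabled). Redis Adapter for Socket.io cross-pod pub/sub. K8s HPA triggers at 60% CPU. Zero-downtime rolling deploys.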
End-to-end request flow from client to database layer with all intermediate services.
Complete bidirectional event contract between client and server. All events use msgpack binary encoding.
| Event | Direction | Payload | Description |
|---|---|---|---|
| connection | C→S | {token, clientVersion, region} | Initial WS handshake. Server validates JWT, sets up socket metadata. |
| session:ack | S→C | {socketId, serverTime, pingInterval} | Server confirms session. Client syncs clock using serverTime offset. |
| reconnect:attempt | C→S | {gameCode, playerId, sessionToken} | Player reconnects mid-game. Server restores state from Redis. |
| reconnect:state | S→C | {gameState, currentQ, timeRemaining, score} | Full game state snapshot sent to reconnecting player. |
| ping | C→S | {t: Date.now()} | Heartbeat every 5s. Used for RTT measurement and connection health. |
| pong | S→C | {t, serverTime} | Echo with server timestamp. Client estimates clock offset = serverTime - (t + tRecv) / 2, where tRecv is the local receive time (RTT = tRecv - t). |

Room events:

| Event | Direction | Payload | Description |
|---|---|---|---|
| room:create | C→S | {quizId, settings: {timePerQ, maxPlayers}} | Host creates a game room. Server generates a 6-char PIN, initializes Redis state. |
| room:created | S→C | {gameCode, roomId, shareUrl} | Room creation confirmed. Host displays PIN to players. |
| room:join | C→S | {gameCode, playerName, avatarId} | Player joins via PIN. Server validates code, adds to Redis set, emits join broadcast. |
| room:join:ack | S→C | {playerId, gameCode, players[], quizMeta} | Sent only to joining player with full lobby snapshot. |
| room:player:joined | ⟳ Broadcast | {player: {id, name, avatar}, totalCount} | Broadcast to all room members when a new player joins lobby. |
| room:player:left | ⟳ Broadcast | {playerId, reason: 'disconnect'\|'kick'} | Broadcast when a player disconnects or is kicked, so the host sees a live roster. |
| room:kick | C→S (host) | {targetPlayerId} | Host-only. Removes player from room. Validates host role server-side. |
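A minimal sketch of the 6-character PIN step follows. It makes three assumptions. First, the alphabet is uppercase letters and digits with look-alikes (0/O, 1/I) removed. Second, the alphabet and the `generateGameCode` name are illustrative, not part of the original design. Third, the caller supplies an `isTaken` check. In production that check would be the atomic `NX` write on `room:{PIN}` itself rather than a separate lookup.

```typescript
import { randomInt } from 'node:crypto';

// Hypothetical alphabet: 24 letters + 8 digits = 32 symbols, so 32^6 ≈ 1.07e9 codes.
const PIN_ALPHABET = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';

// Generate a game code with a CSPRNG, retrying on collision.
// `isTaken` stands in for a Redis existence / NX check (assumption).
function generateGameCode(
  isTaken: (code: string) => boolean,
  length = 6,
  maxAttempts = 10,
): string {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    let code = '';
    for (let i = 0; i < length; i++) {
      code += PIN_ALPHABET[randomInt(PIN_ALPHABET.length)];
    }
    if (!isTaken(code)) return code;
  }
  throw new Error('Could not allocate a free game code');
}

// Usage with an in-memory registry of active codes
const active = new Set<string>();
const code = generateGameCode((c) => active.has(c));
active.add(code);
```

Retries should be rare. With 32 symbols there are about a billion codes, so collisions stay unlikely until a very large number of rooms are live at once.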

Game flow events:

| Event | Direction | Payload | Description |
|---|---|---|---|
| game:start | C→S (host) | {gameCode} | Host triggers game start. Server locks room (no new joins), begins countdown. |
| game:countdown | ⟳ Broadcast | {startsAt: epochMs, count: 3} | Countdown broadcast with absolute server timestamp. Clients sync to serverTime. |
| game:question:show | ⟳ Broadcast | {qIndex, question, answers[], endsAt: epochMs} | Question payload. Correct answer NOT included. endsAt is absolute server time. |
| game:answer:submit | C→S | {gameCode, qIndex, answerIdx, clientTime, answerHash} | Player submits answer. Server validates timing, qIndex match, and HMAC hash. |
| game:answer:ack | S→C | {received: true, serverTime} | Immediate acknowledgment. Does NOT reveal correct answer yet. |
| game:answer:progress | ⟳ Broadcast | {answeredCount, totalCount, distribution: [n,n,n,n]} | Throttled to 1/s. Shows how many players have answered and the per-option counts, without revealing the correct option or who picked what. |
| game:question:reveal | ⟳ Broadcast | {correctIndex, pointsMap: {playerId: pts}, explanationText} | Reveal broadcast after timer expires or all players answer. Includes earned points. |
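| game:leaderboard | ⟳ Broadcast | {rankings: [{id,name,score,delta,streak}], yourRank} | Sorted leaderboard snapshot after each question, read with ZREVRANGE in O(log N + M) for the top M entries. yourRank differs per player, so it is added per socket rather than in the room-wide payload. |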
| game:end | ⟳ Broadcast | {finalRankings[], xpGained:{}, badges:{}, gameId} | Final results. Server persists to PostgreSQL, clears Redis room after TTL. |

Host control events:

| Event | Direction | Payload | Description |
|---|---|---|---|
| host:pause | C→S (host) | {gameCode} | Pause timer. Server stores remaining time in Redis. Broadcast pause state. |
| host:resume | C→S (host) | {gameCode} | Resume with new endsAt = now + remainingMs. Clients re-sync timer. |
| host:skip:question | C→S (host) | {gameCode, reason} | Skip to reveal phase immediately. Server scores based on current answers. |
| host:extend:time | C→S (host) | {gameCode, addSeconds: 10} | Add time to current question. Server updates endsAt, broadcasts new value. |
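The host controls reduce to arithmetic on the stored deadline. This is a minimal sketch with two assumptions:

- The deadline is the `questionEndsAt` field of the room hash.
- A hypothetical `pausedRemainingMs` field holds the frozen remainder while paused. The data model only shows `pausedAt`.

All function names are illustrative.

```typescript
// Timer state for the current question (pausedRemainingMs is hypothetical).
interface QuestionTimer {
  questionEndsAt: number;           // epoch ms deadline while running
  pausedRemainingMs: number | null; // frozen remainder while paused, else null
}

// host:pause: freeze the remaining time (never negative).
function pause(t: QuestionTimer, now: number): QuestionTimer {
  if (t.pausedRemainingMs !== null) return t; // already paused
  return { ...t, pausedRemainingMs: Math.max(0, t.questionEndsAt - now) };
}

// host:resume: new endsAt = now + remainingMs.
function resume(t: QuestionTimer, now: number): QuestionTimer {
  if (t.pausedRemainingMs === null) return t; // not paused
  return { questionEndsAt: now + t.pausedRemainingMs, pausedRemainingMs: null };
}

// host:extend:time: push the deadline, or the frozen remainder, further out.
function extend(t: QuestionTimer, addSeconds: number): QuestionTimer {
  const addMs = addSeconds * 1000;
  return t.pausedRemainingMs === null
    ? { ...t, questionEndsAt: t.questionEndsAt + addMs }
    : { ...t, pausedRemainingMs: t.pausedRemainingMs + addMs };
}

let t: QuestionTimer = { questionEndsAt: 20_000, pausedRemainingMs: null };
t = pause(t, 5_000);   // 15 s left, frozen
t = resume(t, 60_000); // deadline becomes 75_000
t = extend(t, 10);     // deadline becomes 85_000
```

Pausing and extending both move `questionEndsAt`. Any score that infers the question's start from `questionEndsAt - duration` therefore drifts after these calls. Tracking the accumulated running time separately avoids that.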
LOBBY: Room initialised in Redis and a PIN generated. Each joining player is added to the players set (SADD) and gets a per-player metadata hash (HSET). The Socket.io room is created.
STARTING: Host fires game:start. The room is locked (no new joins) and a 3-2-1 countdown is broadcast with absolute epoch timestamps.
ACTIVE: Question loop running. Answers are written to Redis, the timer is tracked server-side, and anti-cheat validates each submission.
REVEAL: Timer expired or all players answered. The correct answer is broadcast, scores are calculated, and the leaderboard is updated in the Redis ZSET.
ENDED: Final results persisted to PostgreSQL and XP awarded. The Redis TTL is set to 300s for the reconnect window, after which the room is purged.
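These five phases correspond to the `state` values in the room hash (LOBBY, STARTING, ACTIVE, REVEAL, ENDED). A small transition table lets every handler reject out-of-order events before touching Redis. This is a sketch with two assumptions about how the loop cycles:

- REVEAL goes back to ACTIVE for the next question.
- The only exit from REVEAL besides that is ENDED.

```typescript
type GameState = 'LOBBY' | 'STARTING' | 'ACTIVE' | 'REVEAL' | 'ENDED';

// Allowed transitions (assumed): REVEAL loops back to ACTIVE until the last question.
const TRANSITIONS: Record<GameState, readonly GameState[]> = {
  LOBBY: ['STARTING'],
  STARTING: ['ACTIVE'],
  ACTIVE: ['REVEAL'],
  REVEAL: ['ACTIVE', 'ENDED'],
  ENDED: [],
};

function canTransition(from: GameState, to: GameState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Throws instead of silently writing an impossible state to the room hash.
function transition(from: GameState, to: GameState): GameState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition ${from} -> ${to}`);
  }
  return to;
}

// Only LOBBY accepts room:join; the room is locked from game:start onward.
const acceptsJoins = (s: GameState): boolean => s === 'LOBBY';
```

In production the read-check-write should be atomic, since two pods can handle events for the same room concurrently. A Lua script or WATCH/MULTI on `room:{gameCode}` would do this.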
```text
# Room metadata (Hash)
HSET room:{gameCode}
  hostId         "uid_abc123"
  quizId         "quiz_789"
  state          "ACTIVE"        # LOBBY|STARTING|ACTIVE|REVEAL|ENDED
  currentQ       3
  totalQ         10
  questionEndsAt 1703123456789   # epoch ms
  pausedAt       ""
  podId          "pod-7f3k"      # owning pod
EXPIRE room:{gameCode} 7200

# Players in room (Set)
SADD room:{gameCode}:players uid_p1 uid_p2 ...

# Per-player metadata (Hash)
HSET room:{gameCode}:player:{uid}
  name      "Blaze99"
  avatar    "🦊"
  score     4250
  streak    3
  connected 1
  socketId  "sck_xyz"

# Live leaderboard (Sorted Set, O(log N) updates)
ZADD room:{gameCode}:lb 4250 uid_p1
ZADD room:{gameCode}:lb 3800 uid_p2

# Answers for current question (Hash)
HSET room:{gameCode}:q:{idx}:answers
  uid_p1 "2:1703123442100"   # answerIdx:submittedAt
  uid_p2 "0:1703123443200"

# Answer distribution (for progress bar)
INCR room:{gameCode}:q:{idx}:dist:2   # answer index 2

# Anti-cheat: seen hashes (prevent replay)
SETEX ac:{gameCode}:{uid}:{qIdx} 30 "hash_xyz"
```
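`HGETALL` returns every field as a string, and an empty object for a missing key. That is why the handlers below sprinkle `+room.currentQ` coercions. A small parser can centralize this and turn a missing room into `null`. It is sketched here with hypothetical names (`Room`, `parseRoom`) and uses the field names from the room hash above.

```typescript
type RoomState = 'LOBBY' | 'STARTING' | 'ACTIVE' | 'REVEAL' | 'ENDED';

interface Room {
  hostId: string;
  quizId: string;
  state: RoomState;
  currentQ: number;
  totalQ: number;
  questionEndsAt: number; // epoch ms
  pausedAt: number | null;
  podId: string;
}

const STATES: readonly string[] = ['LOBBY', 'STARTING', 'ACTIVE', 'REVEAL', 'ENDED'];

// Convert the raw string map from HGETALL room:{gameCode} into a typed Room.
// Returns null when the key does not exist (HGETALL yields {}).
function parseRoom(raw: Record<string, string>): Room | null {
  if (!raw.state) return null;
  if (!STATES.includes(raw.state)) throw new Error(`Unknown state ${raw.state}`);
  return {
    hostId: raw.hostId,
    quizId: raw.quizId,
    state: raw.state as RoomState,
    currentQ: Number(raw.currentQ),
    totalQ: Number(raw.totalQ),
    questionEndsAt: Number(raw.questionEndsAt),
    pausedAt: raw.pausedAt ? Number(raw.pausedAt) : null, // "" means not paused
    podId: raw.podId,
  };
}
```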
```typescript
import type { Socket } from 'socket.io-client';

// Client-side clock sync using ping/pong
class ClockSync {
  private offset = 0; // ms delta to server
  private rtt = 0;    // round-trip time

  sync(socket: Socket) {
    const t0 = Date.now();
    socket.emit('ping', { t: t0 });
    socket.once('pong', ({ serverTime }: { serverTime: number }) => {
      const t3 = Date.now();
      this.rtt = t3 - t0;
      // Assumes the server stamped serverTime halfway through the round trip
      this.offset = serverTime - (t0 + t3) / 2;
    });
  }

  // Convert server timestamp → local display time
  serverToLocal(serverMs: number): number {
    return serverMs - this.offset;
  }

  // Get remaining ms until server deadline
  msUntil(endsAt: number): number {
    const serverNow = Date.now() + this.offset;
    return Math.max(0, endsAt - serverNow);
  }
}

// Usage: question timer driven by server epoch (`socket` is the connected client)
const clock = new ClockSync();
clock.sync(socket);

socket.on('game:question:show', ({ endsAt }: { endsAt: number }) => {
  const remaining = clock.msUntil(endsAt);
  // Every client counts down to the same server deadline, regardless of join latency
  timerBar.start(remaining); // timerBar: the app's countdown UI component
});
```
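A single ping/pong sample is sensitive to one slow round trip. A common refinement, in the spirit of NTP, is to collect several samples and keep the offset from the one with the lowest RTT. That sample's midpoint estimate has the tightest error bound (±RTT/2). The sketch below is a pure function over already-collected samples. The `Sample` shape and the function names are assumptions, not part of `ClockSync` above.

```typescript
// One ping/pong exchange: t0 = local send, t3 = local receive, server = serverTime.
interface Sample {
  t0: number;
  t3: number;
  server: number;
}

// Offset estimate for one sample: server clock minus the local midpoint.
function sampleOffset(s: Sample): number {
  return s.server - (s.t0 + s.t3) / 2;
}

// Keep the sample with the smallest round trip; its error is at most rtt / 2.
function bestOffset(samples: Sample[]): { offset: number; rtt: number } {
  if (samples.length === 0) throw new Error('need at least one sample');
  let best = samples[0];
  for (const s of samples) {
    if (s.t3 - s.t0 < best.t3 - best.t0) best = s;
  }
  return { offset: sampleOffset(best), rtt: best.t3 - best.t0 };
}
```

For example, take three samples with RTTs of 300, 40 and 120 ms. The function keeps the 40 ms one. A stray slow exchange therefore cannot skew every client's countdown.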
```typescript
// Server is single source of truth for time.
// Assumes `redis` (ioredis), `io` (Socket.io Server), `revealQueue` (Bull queue)
// and getQuiz() are initialised elsewhere.
async function startQuestion(gameCode: string, qIdx: number) {
  const room = await redis.hgetall(`room:${gameCode}`);
  const quiz = await getQuiz(room.quizId); // served from the Redis quiz cache
  const q = quiz.questions[qIdx];
  const endsAt = Date.now() + q.timeMs; // absolute epoch

  // Store deadline in Redis for reconnect recovery
  await redis.hset(`room:${gameCode}`, {
    state: 'ACTIVE',
    currentQ: qIdx,
    questionEndsAt: endsAt,
  });

  // Broadcast WITHOUT the answer: clients can't cheat
  io.to(gameCode).emit('game:question:show', {
    qIndex: qIdx,
    question: q.text,
    answers: q.options, // shuffled, no correct flag (server keeps the mapping)
    endsAt,             // epoch ms: clients sync to this
    totalQuestions: quiz.questions.length,
  });

  // Server-side timer: authoritative end
  await scheduleReveal(gameCode, qIdx, q.timeMs);
}

// Bull queue as a reliable timer: the delayed job lives in Redis, so it survives a pod crash
async function scheduleReveal(gameCode: string, qIdx: number, delayMs: number) {
  await revealQueue.add(
    { gameCode, qIdx },
    {
      delay: delayMs,
      attempts: 3,
      jobId: `reveal:${gameCode}:${qIdx}`, // a job with an existing id is not added again
    }
  );
  // The reveal worker must be idempotent: triggerEarlyReveal() may already have run.
}
```
```typescript
// Answer submission handler: full validation pipeline.
// hmac() and safeEqual() are app helpers (HMAC-SHA256 with the session secret,
// constant-time comparison); triggerEarlyReveal() must be idempotent.
socket.on('game:answer:submit', async ({ gameCode, qIndex, answerIdx, clientTime, answerHash }) => {
  const playerId = socket.data.playerId; // set from JWT on connect
  // clientTime is kept for diagnostics only; scoring uses server time

  // 1. Validate input and room state (4 options assumed, matching dist:0..3)
  if (!Number.isInteger(qIndex) || !Number.isInteger(answerIdx)) return;
  if (answerIdx < 0 || answerIdx > 3) return;
  const room = await redis.hgetall(`room:${gameCode}`);
  if (room.state !== 'ACTIVE' || +room.currentQ !== qIndex) return;

  // 2. Anti-cheat: cheap early exit if this question was already answered
  const alreadyAnswered = await redis.hexists(
    `room:${gameCode}:q:${qIndex}:answers`, playerId
  );
  if (alreadyAnswered) return;

  // 3. Validate server-side deadline (client can't fake timing)
  const serverNow = Date.now();
  if (serverNow > +room.questionEndsAt + 500) return; // 500ms grace

  // 4. Validate HMAC hash (anti-tampering), compared in constant time
  const expected = hmac(`${gameCode}:${playerId}:${qIndex}:${answerIdx}`);
  if (!safeEqual(answerHash, expected)) return;

  // 5. Atomic store: HSETNX lets only the first write win
  const stored = await redis.hsetnx(
    `room:${gameCode}:q:${qIndex}:answers`,
    playerId,
    `${answerIdx}:${serverNow}` // store server time, not client time
  );
  if (!stored) return; // lost a race with a concurrent submission

  // 6. Increment distribution counter; INCR returns the new answered total
  const [, answered] = await Promise.all([
    redis.incr(`room:${gameCode}:q:${qIndex}:dist:${answerIdx}`),
    redis.incr(`room:${gameCode}:q:${qIndex}:totalAnswered`),
  ]);

  // 7. Ack immediately (before reveal)
  socket.emit('game:answer:ack', { received: true, serverTime: serverNow });

  // 8. Check if all players answered → early reveal
  const total = await redis.scard(`room:${gameCode}:players`);
  if (answered >= total) triggerEarlyReveal(gameCode, qIndex);
});
```
Correct answers are never sent to the client before reveal. Score calculation only happens server-side. Client-reported scores are ignored entirely.
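Each answer submission includes an HMAC-SHA256 signature of gameCode:playerId:qIdx:answerIdx using a per-session secret. A packet altered in transit, or forged without the secret for another player or question, fails verification. The MAC for a given tuple is deterministic, so replaying a captured packet is stopped by the once-per-question HSETNX rather than by the signature itself.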
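Below is a minimal sketch of signing and verifying with Node's built-in `crypto` module. It assumes a per-session secret issued at connect time; how that secret is provisioned is not covered here. Verification uses `timingSafeEqual` instead of `!==`, so response timing does not reveal how much of a guessed MAC matched. `signAnswer` and `verifyAnswer` are hypothetical names.

```typescript
import { Buffer } from 'node:buffer';
import { createHmac, timingSafeEqual } from 'node:crypto';

// HMAC-SHA256 over gameCode:playerId:qIdx:answerIdx, hex-encoded (64 chars).
function signAnswer(
  secret: string,
  gameCode: string,
  playerId: string,
  qIdx: number,
  answerIdx: number,
): string {
  return createHmac('sha256', secret)
    .update(`${gameCode}:${playerId}:${qIdx}:${answerIdx}`)
    .digest('hex');
}

// Constant-time check. timingSafeEqual throws on unequal lengths, so compare lengths first.
function verifyAnswer(
  secret: string,
  gameCode: string,
  playerId: string,
  qIdx: number,
  answerIdx: number,
  answerHash: string,
): boolean {
  const expected = Buffer.from(signAnswer(secret, gameCode, playerId, qIdx, answerIdx), 'hex');
  const received = Buffer.from(answerHash, 'hex'); // malformed hex yields a shorter buffer
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```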
Server records submission time using its own clock. Client-sent timestamps are ignored for scoring. 500ms grace period for network jitter. No client can fake speed.
Redis HSETNX gives at-most-once storage per player per question. The first write wins atomically and later submissions are silently dropped, so double-answering is impossible.
Redis sliding window rate limiter: max 2 answer events per question per player. Bot detection via submission pattern analysis. Auto-kick after 5 violations.
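A sliding-window limiter is typically built on a per-player sorted set of event timestamps. Each event drops entries older than the window, counts what is left, and is admitted only if the count is under the limit. The sketch below models that logic in memory so it can be tested. In Redis the same three steps would be `ZREMRANGEBYSCORE`, `ZCARD` and `ZADD` on a key such as `rl:{gameCode}:{uid}` (a hypothetical name), ideally inside a Lua script so they run atomically. The limit of 2 and the 5-violation kick follow the text. The window length is an assumption.

```typescript
type LimitResult = 'ok' | 'reject' | 'kick';

// In-memory model of a per-player Redis sorted-set sliding window.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private violations = 0;
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly maxViolations: number;

  constructor(limit: number, windowMs: number, maxViolations = 5) {
    this.limit = limit;                 // e.g. 2 answer events
    this.windowMs = windowMs;           // e.g. the question duration (assumption)
    this.maxViolations = maxViolations; // auto-kick threshold
  }

  hit(now: number): LimitResult {
    // ZREMRANGEBYSCORE key -inf now-windowMs (drops scores <= now - windowMs)
    this.timestamps = this.timestamps.filter((t) => t > now - this.windowMs);
    // ZCARD key: events still inside the window
    if (this.timestamps.length >= this.limit) {
      this.violations++;
      return this.violations >= this.maxViolations ? 'kick' : 'reject';
    }
    // ZADD key now now: record the admitted event
    this.timestamps.push(now);
    return 'ok';
  }
}
```

The violation counter is never reset in this sketch. A player who was kicked stays flagged even after the window slides.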
Server verifies submitted qIndex matches current question. Players can't pre-answer future questions or re-answer past ones.
```typescript
// Assumes `redis` (ioredis) and getQuestionDuration() are defined elsewhere.
interface ScoringConfig {
  basePoints: number;       // 1000
  maxTimeBonus: number;     // 500, for answering instantly
  streakMultiplier: number; // 1.5x for 3+ streak
  wrongPenalty: number;     // 0 (no negative) or -50 for hard mode
}

async function calculateScore(
  gameCode: string,
  playerId: string,
  qIdx: number,
  correctIdx: number,
  config: ScoringConfig
): Promise<number> {
  // Retrieve player's answer (stored with server timestamp)
  const raw = await redis.hget(`room:${gameCode}:q:${qIdx}:answers`, playerId);
  if (!raw) return 0; // didn't answer

  const [answerIdx, submittedAt] = raw.split(':');
  if (+answerIdx !== correctIdx) return config.wrongPenalty;

  // Time bonus: full bonus for instant, zero at deadline.
  // Question start is derived as endsAt - duration. host:pause and host:extend:time
  // move endsAt, so the fraction is clamped to [0, 1] to keep the bonus within maxTimeBonus.
  const room = await redis.hgetall(`room:${gameCode}`);
  const qDuration = await getQuestionDuration(room.quizId, qIdx);
  const elapsed = +submittedAt - (+room.questionEndsAt - qDuration);
  const timeFrac = Math.min(1, Math.max(0, 1 - elapsed / qDuration));
  const timeBonus = Math.floor(config.maxTimeBonus * timeFrac);

  // Streak bonus from player metadata
  const streak = await redis.hget(`room:${gameCode}:player:${playerId}`, 'streak');
  const mult = Number(streak) >= 3 ? config.streakMultiplier : 1;

  return Math.floor((config.basePoints + timeBonus) * mult);
}
```
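Pulling the arithmetic out of `calculateScore` into a pure function makes the scoring rules unit-testable without Redis. The sketch below uses the same `ScoringConfig` fields and the values from their comments: 1000 base, 500 max bonus, 1.5× at a 3+ streak, 0 penalty. It makes two assumptions:

- `elapsedMs` is measured from question start using server timestamps.
- `streak` is the count before this answer.

The `scoreAnswer` name is hypothetical.

```typescript
interface ScoringConfig {
  basePoints: number;
  maxTimeBonus: number;
  streakMultiplier: number;
  wrongPenalty: number;
}

// Same formula as calculateScore, with every input already fetched.
function scoreAnswer(
  correct: boolean,
  elapsedMs: number,  // server submit time minus question start
  durationMs: number,
  streak: number,     // streak before this answer (assumption)
  cfg: ScoringConfig,
): number {
  if (!correct) return cfg.wrongPenalty;
  // Clamp to [0, 1]: early or late timestamps can't push the bonus out of range
  const timeFrac = Math.min(1, Math.max(0, 1 - elapsedMs / durationMs));
  const timeBonus = Math.floor(cfg.maxTimeBonus * timeFrac);
  const mult = streak >= 3 ? cfg.streakMultiplier : 1;
  return Math.floor((cfg.basePoints + timeBonus) * mult);
}

// Values from the ScoringConfig comments
const cfg: ScoringConfig = {
  basePoints: 1000,
  maxTimeBonus: 500,
  streakMultiplier: 1.5,
  wrongPenalty: 0,
};
```

For example, a correct answer 5 s into a 20 s question earns 1000 + 375 = 1375 points. With a 3-answer streak it earns floor(1375 × 1.5) = 2062.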
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: quizblaze-ws
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: quizblaze-ws
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    # Pods metrics come from the custom metrics API (e.g. Prometheus Adapter)
    - type: Pods
      pods:
        metric:
          name: ws_connections_per_pod
        target:
          type: AverageValue
          averageValue: "2800" # scale before 3K
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Pods
          value: 5 # at most +5 pods per period
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300 # slow drain
```
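The HPA's arithmetic works in four steps:

1. For each metric, compute `desiredReplicas = ceil(currentReplicas × currentValue / targetValue)`.
2. Ignore a metric whose ratio is within the default 10% tolerance.
3. Take the largest result across metrics.
4. Apply the `behavior` rate limits.

The sketch below reproduces that arithmetic for the two metrics above so the thresholds can be sanity-checked. It simplifies the real controller: there is no stabilization window and no handling of missing metrics. The function names and sample numbers are illustrative.

```typescript
interface MetricReading {
  current: number; // e.g. average CPU % or average connections per pod
  target: number;  // e.g. 60 (%) or 2800 connections
}

// Per-metric proposal: ceil(replicas * current / target), unless within tolerance.
function desiredForMetric(replicas: number, m: MetricReading, tolerance = 0.1): number {
  const ratio = m.current / m.target;
  if (Math.abs(ratio - 1) <= tolerance) return replicas; // close enough: no change
  return Math.ceil(replicas * ratio);
}

// Largest proposal wins, then the scale-up policy (+5 pods per period) and min/max bounds.
function desiredReplicas(
  replicas: number,
  metrics: MetricReading[],
  minReplicas = 3,
  maxReplicas = 50,
  maxScaleUpPods = 5,
): number {
  const proposal = Math.max(...metrics.map((m) => desiredForMetric(replicas, m)));
  const rateLimited = Math.min(proposal, replicas + maxScaleUpPods);
  return Math.min(maxReplicas, Math.max(minReplicas, rateLimited));
}
```

Consider 10 pods averaging 3,500 connections against the 2,800 target. The connection metric alone asks for ceil(10 × 1.25) = 13 pods. CPU at 63% (within tolerance of 60%) proposes no change.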
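```typescript
import { createServer } from 'node:http';
import { setTimeout as sleep } from 'node:timers/promises';
import { Server } from 'socket.io';
import { createAdapter } from '@socket.io/redis-adapter';
import { createClient } from 'redis';

let shuttingDown = false;

// Readiness probe (path illustrative); Socket.io serves its own /socket.io/ path
const httpServer = createServer((req, res) => {
  res.writeHead(req.url !== '/readyz' ? 404 : shuttingDown ? 503 : 200).end();
});
const io = new Server(httpServer);

// Two Redis clients: one pub, one sub
const pubClient = createClient({ url: process.env.REDIS_URL });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);

io.adapter(createAdapter(pubClient, subClient));
// Now io.to(roomId).emit() works across ALL pods:
// Pod-1 can broadcast to players on Pod-7, Pod-23, etc.

httpServer.listen(3000); // port illustrative

// Graceful shutdown: drain first, then disconnect
process.on('SIGTERM', async () => {
  shuttingDown = true; // readiness fails, so no new clients are routed here
  await sleep(5000);   // let in-flight events complete
  io.close();          // disconnects remaining clients; they auto-reconnect to another pod
  process.exit(0);
});
```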
| Scale Tier | Concurrent Players | WS Pods | Redis | PostgreSQL | Estimated Cost/mo |
|---|---|---|---|---|---|
| Starter | 1K | 2 pods (2vCPU/4GB) | Single node 4GB | Single db.t3.medium | ~$180 |
| Growth | 10K | 5 pods (4vCPU/8GB) | Cluster 3×6GB | Primary + 1 read replica | ~$850 |
| Scale | 100K | 34 pods (4vCPU/8GB) | Cluster 6×16GB | Primary + 2 replicas + RDS Proxy | ~$6,400 |
| Kahoot-level | 10M+ | Multi-region, 500+ pods | Enterprise cluster + ElastiCache | Aurora Global + Citus sharding | ~$80K+ |
```text
# Hash tags force related keys into the same slot (and therefore the same node).
# All keys for a room route together via {gameCode}:
room:{ABC123}
room:{ABC123}:players
room:{ABC123}:lb
room:{ABC123}:q:3:answers
room:{ABC123}:player:uid_xyz

# All room operations stay local: no cross-slot commands.
# ZADD, HSET, HSETNX, INCR are each atomic, and because the keys share a slot,
# MULTI/EXEC transactions and Lua scripts may span them.

# Leaderboard sorted set: O(log N) insertion, O(log N + M) range query for M entries
ZADD      room:{ABC123}:lb NX 0 playerId     # init score 0
ZINCRBY   room:{ABC123}:lb 850 playerId      # add points atomically
ZREVRANGE room:{ABC123}:lb 0 49 WITHSCORES   # top 50 (ZRANGE ... REV on Redis 6.2+)
```

```typescript
// Throttled leaderboard broadcast: push at most every 2s, not on every answer.
// The lock key carries the same hash tag, so it lives on the room's shard.
const lbKey = `room:{${gameCode}}:lb_throttle`;
const shouldSend = await redis.set(lbKey, 1, 'EX', 2, 'NX'); // 'OK' or null
if (shouldSend) broadcastLeaderboard(gameCode);
```
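To confirm that the hash tag really pins a room's keys together, the slot function from the Redis Cluster specification fits in a few lines. It takes CRC16-XMODEM of the key, or of the first non-empty `{...}` section, modulo 16384. This is a sketch for checking key designs, not a replacement for `CLUSTER KEYSLOT`; `crc16` and `keySlot` are hypothetical names.

```typescript
import { TextEncoder } from 'node:util';

// CRC16-XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses.
function crc16(bytes: Uint8Array): number {
  let crc = 0;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i] << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// Hash only the text inside the first {...} when it is non-empty; otherwise the whole key.
function keySlot(key: string): number {
  const open = key.indexOf('{');
  if (open !== -1) {
    const close = key.indexOf('}', open + 1);
    if (close > open + 1) key = key.slice(open + 1, close);
  }
  return crc16(new TextEncoder().encode(key)) % 16384;
}
```

Only `ABC123` is hashed, so every `room:{ABC123}:*` key lands in the same slot. Multi-key commands across them therefore avoid `CROSSSLOT` errors.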
Socket.io detects disconnect via heartbeat timeout (30s). Server marks player as connected: 0 in Redis. Game continues — no pause.
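Client retries with exponential backoff: 1s → 2s → 4s → 8s → 16s (max). The Socket.io client does this automatically when configured with reconnectionDelay: 1000 and reconnectionDelayMax: 16000 (the default cap is 5s), and it adds random jitter to each delay. The new connection can land on any pod.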
Client sends reconnect:attempt with JWT + gameCode + playerId. Any pod can validate since state is in Redis.
Server fetches full room state from Redis: current question, time remaining, player's score, leaderboard. Sends as reconnect:state snapshot.
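The retry schedule is exponential backoff with a cap. Jitter matters here: without it, thousands of clients dropped by the same pod would reconnect in lockstep. The sketch below shows the arithmetic with the 1 s base and 16 s cap from the text and an injectable random source for testing. It is not Socket.io's internal implementation, which applies its jitter slightly differently. `reconnectDelay` is a hypothetical name.

```typescript
// Exponential backoff: baseMs * 2^attempt, capped at maxMs, then +/- jitter.
function reconnectDelay(
  attempt: number,   // 0 for the first retry
  baseMs = 1000,
  maxMs = 16000,
  jitter = 0.5,      // fraction of the delay, like Socket.io's randomizationFactor
  rand: () => number = Math.random,
): number {
  const capped = Math.min(maxMs, baseMs * 2 ** attempt);
  // rand() in [0, 1) maps to a spread in [-jitter, +jitter) of the capped delay,
  // so a jittered delay can exceed maxMs by up to jitter * maxMs.
  const spread = capped * jitter * (rand() * 2 - 1);
  return Math.round(capped + spread);
}
```

Passing `jitter = 0` reproduces the exact 1s → 2s → 4s → 8s → 16s ladder.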
```typescript
// Server: handle reconnection from any pod
socket.on('reconnect:attempt', async (data) => {
  const { gameCode, playerId, sessionToken } = data;

  // Validate session token (Redis blacklist check); it should also be bound to playerId
  const valid = await validateSession(sessionToken);
  if (!valid) { socket.emit('error:session'); return; }

  // Fetch current game state from Redis
  const [room, player, lb] = await Promise.all([
    redis.hgetall(`room:${gameCode}`),
    redis.hgetall(`room:${gameCode}:player:${playerId}`),
    redis.zrevrange(`room:${gameCode}:lb`, 0, 49, 'WITHSCORES'),
  ]);

  // HGETALL returns {} (not null) for a missing key, so test a field instead
  if (!room.state || room.state === 'ENDED') { socket.emit('error:game_ended'); return; }
  if (!player.name) { socket.emit('error:session'); return; } // not a member of this room

  // Rejoin Socket.io room (new pod, same logical room)
  await socket.join(gameCode);

  // Update player connection status
  await redis.hset(`room:${gameCode}:player:${playerId}`, {
    connected: 1,
    socketId: socket.id,
  });

  // Build state snapshot for client
  socket.emit('reconnect:state', {
    gameState: room.state,
    currentQ: +room.currentQ,
    questionEndsAt: +room.questionEndsAt, // client re-syncs timer
    timeRemaining: Math.max(0, +room.questionEndsAt - Date.now()),
    myScore: +player.score,
    myStreak: +player.streak,
    leaderboard: parseLeaderboard(lb),
    serverTime: Date.now(), // for clock sync
  });
});
```
| Failure | Impact | Detection | Mitigation | RTO |
|---|---|---|---|---|
| WS Pod crashes | ~3K players disconnect | K8s liveness probe (5s) | Clients auto-reconnect to other pods. State in Redis. New pod spins in 30s. | ~5s reconnect |
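| Redis primary fails | Operations on that primary's slots stall | Redis Cluster failure detection (cluster-node-timeout) | The cluster promotes the failed primary's replica in <30s. Clients see brief errors; retries succeed once the slot map refreshes. | ~30s |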
| PostgreSQL failure | Auth + persistence fails | Healthcheck + RDS Multi-AZ | RDS Multi-AZ automatic failover. Read replicas absorb reads. | ~60s |
| Load balancer down | All traffic fails | Cloud provider monitor | HAProxy in active-passive pair. DNS failover to secondary. | ~60s |
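| Network partition | Split-brain risk | Redis Cluster voting | Failover requires a majority of primaries to agree. A primary isolated in a minority partition stops accepting writes after cluster-node-timeout, which bounds (but does not eliminate) lost writes. | Auto |
| Memory leak in pod | Degraded performance | Prometheus memory alert | Alert below the 8GB limit and recycle the pod (e.g. rolling restart) so the SIGTERM handler drains it first. If the limit is reached, the container is OOM-killed with SIGKILL and no drain runs. | ~60s |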
```sql
-- gen_random_uuid() is built in from PostgreSQL 13 (pgcrypto before that)

-- Users
CREATE TABLE users (
  id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  username       VARCHAR(32) UNIQUE NOT NULL,
  email          VARCHAR(255) UNIQUE NOT NULL,
  password_hash  TEXT NOT NULL,
  avatar_id      SMALLINT DEFAULT 0,
  xp             INT DEFAULT 0,
  level          SMALLINT DEFAULT 1,
  streak         SMALLINT DEFAULT 0,
  streak_last_at TIMESTAMPTZ,
  badges         JSONB DEFAULT '[]',
  settings       JSONB DEFAULT '{}',
  created_at     TIMESTAMPTZ DEFAULT NOW(),
  last_seen_at   TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_users_xp ON users (xp DESC);

-- Quizzes
CREATE TABLE quizzes (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  creator_id  UUID REFERENCES users(id) ON DELETE CASCADE,
  title       VARCHAR(120) NOT NULL,
  description TEXT,
  category    VARCHAR(40),
  difficulty  SMALLINT CHECK (difficulty BETWEEN 1 AND 3),
  is_public   BOOLEAN DEFAULT TRUE,
  play_count  INT DEFAULT 0,
  avg_score   NUMERIC(5,2),
  created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- Questions (one row per question; the full set is copied into Redis when a room is created)
CREATE TABLE questions (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  quiz_id     UUID REFERENCES quizzes(id) ON DELETE CASCADE,
  position    SMALLINT NOT NULL,
  question    TEXT NOT NULL,
  options     JSONB NOT NULL,    -- ["opt0","opt1","opt2","opt3"]
  correct_idx SMALLINT NOT NULL, -- NEVER exposed via API pre-reveal
  explanation TEXT,
  time_ms     INT DEFAULT 20000,
  points      SMALLINT DEFAULT 1000,
  UNIQUE (quiz_id, position)     -- ordered lookup by quiz; also indexes the FK
);

-- Game sessions (persistent record)
CREATE TABLE game_sessions (
  id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  game_code    CHAR(6) NOT NULL,
  quiz_id      UUID REFERENCES quizzes(id),
  host_id      UUID REFERENCES users(id),
  player_count SMALLINT,
  final_state  JSONB,            -- full leaderboard snapshot
  started_at   TIMESTAMPTZ,
  ended_at     TIMESTAMPTZ,
  duration_ms  INT
);
CREATE INDEX idx_sessions_quiz ON game_sessions (quiz_id, started_at DESC);

-- Player game results
CREATE TABLE player_results (
  id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  session_id    UUID REFERENCES game_sessions(id) ON DELETE CASCADE,
  player_id     UUID REFERENCES users(id),
  rank          SMALLINT,
  score         INT,
  correct_count SMALLINT,
  best_streak   SMALLINT,
  xp_earned     INT,
  answers       JSONB,           -- [{qIdx, answerIdx, correct, pts, ms}]
  created_at    TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_results_player ON player_results (player_id, created_at DESC);
CREATE INDEX idx_results_session ON player_results (session_id, rank);
```
Replace JSON with msgpack binary serialization. Answer events drop from ~180 bytes to ~40 bytes. Leaderboard broadcasts shrink 60%. Critical at 100K concurrent senders.
Leaderboard updates throttled to 1 per 2s using Redis NX lock. Answer progress (distribution bars) throttled separately. Full snapshot only on question reveal. Reduces Redis pub/sub pressure 10×.
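A `SET ... NX EX` lock gives leading-edge throttling: the first update in each window goes out and later ones are dropped. That is acceptable for the leaderboard, whose full snapshot arrives at reveal anyway. The answer-progress bar, however, should also show the last count in each window. The sketch below is a trailing-edge variant written as a pure class with an injected clock. `TrailingThrottle` is a hypothetical name, and the per-room timer that calls `tick` is not shown.

```typescript
// Emits at most once per interval, but never loses the latest value.
class TrailingThrottle<T> {
  private lastSent = Number.NEGATIVE_INFINITY;
  private pending: T | undefined = undefined;
  private readonly intervalMs: number;

  constructor(intervalMs: number) {
    this.intervalMs = intervalMs;
  }

  // Call on every update; returns the value to broadcast now, or undefined.
  offer(value: T, now: number): T | undefined {
    if (now - this.lastSent >= this.intervalMs) {
      this.lastSent = now;
      this.pending = undefined;
      return value;
    }
    this.pending = value; // keep only the newest value
    return undefined;
  }

  // Call from a periodic timer; flushes the pending value once the interval has passed.
  tick(now: number): T | undefined {
    if (this.pending === undefined || now - this.lastSent < this.intervalMs) return undefined;
    const value = this.pending;
    this.pending = undefined;
    this.lastSent = now;
    return value;
  }
}
```

With a 1 s interval, updates at 0, 200 and 400 ms send the first immediately and the third at the 1 s tick. The second is superseded and never sent.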
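Answer processing pipelines 4 Redis commands (HSETNX, INCR, INCR, HGET) into a single round-trip. The INCRs should run only when HSETNX succeeds, and neither a pipeline nor MULTI/EXEC can branch on an earlier reply, so that step belongs in a Lua script (EVAL), which still costs one round-trip. Scoring pipelines 6 commands. Reduces Redis RTT overhead by 80% at peak load.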
Route53 latency-based routing directs players to nearest region: US-East, EU-West, AP-Southeast. Average RTT drops from 180ms to 35ms for EU players. Edge terminates TLS.
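Initial connection uses HTTP/3 over QUIC. QUIC connection migration keeps a session alive across network changes (WiFi → cellular), and 0-RTT resumption speeds up reconnects to a known server. WebTransport is the long-term WS replacement: multiplexed streams without head-of-line blocking.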
Full quiz loaded from PostgreSQL into Redis on room creation. During gameplay, every question read hits Redis only (~0.3ms). No DB queries during active gameplay. Questions invalidated after game ends.
```js
// next.config.js: CDN asset configuration
/** @type {import('next').NextConfig} */
const nextConfig = {
  assetPrefix: process.env.CDN_URL, // https://cdn.quizblaze.com
  images: {
    // remotePatterns replaces the deprecated images.domains option
    remotePatterns: [{ protocol: 'https', hostname: 'cdn.quizblaze.com' }],
    formats: ['image/avif', 'image/webp'],
  },
  // Aggressive caching: JS/CSS chunks are content-hashed.
  // (Next.js already sends this header for /_next/static in production.)
  async headers() {
    return [{
      source: '/_next/static/:path*',
      headers: [{
        key: 'Cache-Control',
        value: 'public, max-age=31536000, immutable', // 1 year
      }],
    }];
  },
};

module.exports = nextConfig;
```

```text
# CloudFront cache behavior for game assets
# TTL: 1yr for hashed assets, 0 for API routes, 5min for HTML
# Compress: gzip + brotli enabled
# HTTP/3: enabled on all distributions
# Origin shield: us-east-1 (reduces origin hits 90%)
# Price class: PriceClass_All (all edge locations)

# WS connections bypass the CDN: direct to the LB via a separate subdomain
# wss://ws.quizblaze.com → HAProxy → WS pods
# https://quizblaze.com  → CloudFront → Next.js → API
```