Microservices API Gateway

Production-Grade Service Mesh Entry Point

Completed January 2024

Project Overview

Production-grade API gateway service providing a unified entry point for 25+ microservices. Handles authentication, rate limiting, request routing, and transformation for a high-traffic enterprise application. Built with Node.js and deployed on Kubernetes for high availability and scalability.

Timeline

September 2023 - January 2024

Role

Lead Developer

Type

Infrastructure / Gateway

Status

Production

Tech Stack

Node.js 20 · Express.js · Redis 7 · Docker · Kubernetes · OAuth 2.0 · JWT · Azure

Results & Impact

50M+ requests served
99.95% uptime
45% reduction in response time
70% reduction in authentication overhead

Key Features

Architecture

Request Flow

  1. TLS Termination: HTTPS connections terminated at load balancer
  2. Authentication: JWT validation with Redis-cached public keys
  3. Rate Limiting: Check user/IP rate limits in Redis
  4. Service Discovery: Consul lookup for healthy service instances
  5. Circuit Breaker Check: Verify downstream service health
  6. Request Forwarding: Proxy to target service with added headers
  7. Response Processing: Transform and return to client
  8. Logging: Async logging to Elasticsearch
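The flow above can be sketched as an ordered chain of stages, where any stage may short-circuit with an early response. This is a minimal synchronous sketch; the stage names and `ctx` shape are illustrative, not the gateway's actual code:

```javascript
// Minimal pipeline sketch: each stage inspects the request context and either
// returns a response (short-circuit) or returns nothing to pass control on.
function compose(stages) {
  return function handle(ctx) {
    for (const stage of stages) {
      const early = stage(ctx);
      if (early) return early; // e.g. 401 from auth, 429 from rate limiting
    }
    return ctx.response;
  };
}

// Illustrative stages mirroring steps 2, 3, and 6 of the flow.
const authenticate = (ctx) =>
  ctx.token ? undefined : { status: 401, body: 'missing token' };
const rateLimit = (ctx) =>
  ctx.remainingQuota > 0 ? undefined : { status: 429, body: 'rate limited' };
const forward = (ctx) => {
  ctx.response = { status: 200, body: `proxied to ${ctx.targetService}` };
};

const handle = compose([authenticate, rateLimit, forward]);
```

In the real gateway these stages are asynchronous Express middleware; the synchronous version above only illustrates the ordering and short-circuit behavior.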

High Availability Setup

Technical Highlights

Smart Token Caching

JWT validation is CPU-intensive. Solution: Cache validation results in Redis with TTL matching token expiry. Result: 70% reduction in authentication overhead.

```javascript
// Sketch (Redis client + jsonwebtoken; hash() is illustrative):
// cache validated claims until the token itself expires.
async function validateToken(token) {
  const key = `token:${hash(token)}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const claims = await jwt.verify(token, publicKey);

  // `exp` is an absolute Unix timestamp; convert to a relative TTL for Redis.
  const ttl = claims.exp - Math.floor(Date.now() / 1000);
  if (ttl > 0) await redis.setex(key, ttl, JSON.stringify(claims));
  return claims;
}
```

Circuit Breaker Pattern

Prevents cascading failures when downstream services are unhealthy. After 5 consecutive failures, the gateway returns a cached response or a friendly error for 30 seconds before retrying.
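The behavior described above can be sketched as a small state machine. This is an illustration with an injectable clock for clarity, not the deployed code; the production gateway presumably wraps something like this around each downstream call:

```javascript
// Circuit breaker sketch: open after 5 consecutive failures,
// stay open for 30 s, then allow a trial request (half-open).
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetTimeoutMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }
  canRequest(now = Date.now()) {
    if (this.openedAt === null) return true;            // closed: proceed
    return now - this.openedAt >= this.resetTimeoutMs;  // half-open after 30 s
  }
  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }
  recordFailure(now = Date.now()) {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) this.openedAt = now;
  }
}
```

While `canRequest` returns false, the gateway serves the cached response or friendly error instead of calling the unhealthy service.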

Distributed Rate Limiting

Uses atomic Redis operations (INCR with EXPIRE) to enforce limits consistently across multiple gateway instances. A sliding-window algorithm provides smooth rate limiting without the burst allowances that fixed windows permit at window boundaries.
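The windowing logic can be sketched in-process. In the real gateway the per-key timestamp log would live in Redis (e.g. a sorted set) so that all instances share one count; this in-memory version is only an illustration of why a sliding window avoids boundary bursts:

```javascript
// Sliding-window log sketch: keep the timestamps of recent hits per key,
// drop those older than the window, and compare the remainder to the limit.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = new Map(); // key -> array of hit timestamps (ms)
  }
  allow(key, now = Date.now()) {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) || []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the limit for this window
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

Because the window slides with each request, a client cannot send a full quota at the end of one window and another full quota at the start of the next.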

Production Incident Example

In November 2023, a downstream payment service had a database deadlock issue causing 30-second response times. The circuit breaker detected this and started returning cached responses, preventing user-facing timeouts. The issue was isolated to the payment service while other features remained functional. Total user impact: <2 minutes vs. potential 45+ minutes of site-wide slowness.

Challenges & Solutions

Challenge: Token Validation Performance

Problem: JWT signature verification was the bottleneck at 10k+ req/min.

Solution: Implemented Redis caching layer. Also moved to RS256 (asymmetric crypto) and cached public keys. Reduced CPU usage by 65%.

Challenge: Rate Limit Synchronization

Problem: Multiple gateway instances had inconsistent rate limit counts.

Solution: Centralized rate limiting state in Redis cluster. Used Lua scripts for atomic increment+check operations.
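An atomic increment-and-check can be sketched as a small Lua script run via EVAL. This is a sketch assuming the ioredis client; the writeup does not show the actual script. Running the increment, expiry, and comparison inside one script means no other gateway instance can interleave between the steps:

```javascript
// Fixed-window increment+check, executed atomically inside Redis.
// KEYS[1] = counter key, ARGV[1] = limit, ARGV[2] = window in seconds.
const RATE_LIMIT_SCRIPT = `
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return count <= tonumber(ARGV[1]) and 1 or 0
`;

// Returns true when the request is within the limit.
// `redis` is assumed to be an ioredis client.
async function checkRateLimit(redis, key, limit, windowSec) {
  const allowed = await redis.eval(RATE_LIMIT_SCRIPT, 1, key, limit, windowSec);
  return allowed === 1;
}
```

Because the counter lives in the Redis cluster, every gateway instance sees the same count for a given user or IP.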

Challenge: Service Discovery Latency

Problem: Looking up service endpoints on every request added 5-10ms latency.

Solution: Local caching with background refresh every 30s. Active health checks update cache immediately on service changes.
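The local cache with background refresh can be sketched as below. The `lookup` function stands in for the Consul query and is an assumption; the refresh interval mirrors the 30 s mentioned above:

```javascript
// Local service-discovery cache: requests read from memory,
// a background timer refreshes entries every 30 s.
class ServiceCache {
  constructor(lookup, refreshMs = 30000) {
    this.lookup = lookup;       // async (serviceName) => [endpoints]
    this.refreshMs = refreshMs;
    this.entries = new Map();   // service name -> healthy endpoints
  }
  async refresh(service) {
    const endpoints = await this.lookup(service);
    this.entries.set(service, endpoints);
    return endpoints;
  }
  async resolve(service) {
    // Served from memory: only the very first request pays lookup latency.
    if (this.entries.has(service)) return this.entries.get(service);
    return this.refresh(service);
  }
  start(services) {
    this.timer = setInterval(
      () => services.forEach((s) => this.refresh(s)),
      this.refreshMs
    );
    this.timer.unref?.(); // don't keep the process alive for the timer
  }
  stop() {
    clearInterval(this.timer);
  }
}
```

Health-check callbacks would call `refresh` directly, which is how the cache can be updated immediately when a service changes rather than waiting for the next interval.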

Lessons Learned

Performance Metrics
