StreamFlow Analytics Platform - Project Report

StreamFlow Analytics Platform

A Comprehensive Real-Time Stock Market Data Processing System

Institution: Vellore Institute of Technology (VIT), Chennai

School: SCOPE (School of Computer Science and Engineering)

Semester: Current Academic Year 2025

1. Project Title:

"StreamFlow Analytics Platform: Building a Real-Time Stock Market Data Processing System using Apache Kafka, TimescaleDB, and Microservices Architecture with Docker Containerization, Monitoring, and Interactive Web Dashboard"

2. Introduction

The StreamFlow Analytics Platform is a comprehensive, production-grade real-time data processing system designed to simulate and analyze stock market data streams. This project demonstrates the implementation of a modern microservices architecture using containerization technologies, message-driven communication, time-series data storage, and real-time visualization capabilities.

The platform processes stock tick data for five major technology stocks (AAPL, MSFT, GOOGL, AMZN, TSLA) through a distributed pipeline consisting of 13 interconnected Docker containers. The system showcases industry-standard practices in building scalable, fault-tolerant, and observable distributed systems. It implements the complete data lifecycle from generation to storage, analysis, alerting, and visualization, making it an ideal learning project for understanding cloud-native application development.

Key features include: real-time data streaming with Apache Kafka, time-series data persistence with TimescaleDB (PostgreSQL extension), technical analysis calculations (SMA, EMA, price trends), threshold-based alerting, RESTful API gateway, WebSocket-based real-time updates to browsers, interactive React-based dashboard, and comprehensive monitoring with Prometheus and Grafana. The entire system is orchestrated using Docker Compose for local development and can be deployed to Kubernetes for production environments.

3. Objectives of Part 1 - Infrastructure Foundation

Part 1: Setting Up Distributed Messaging and Data Storage Infrastructure

  • Understand Apache Kafka Architecture: Learn the fundamentals of distributed message brokers, topics, partitions, producers, and consumers in a publish-subscribe model.
  • Configure Apache Zookeeper: Set up coordination services for Kafka cluster management, leader election, and distributed configuration.
  • Deploy TimescaleDB: Implement a time-series database using PostgreSQL with TimescaleDB extension for efficient storage and querying of temporal stock market data.
  • Establish Docker Networking: Create isolated bridge networks for inter-container communication with proper DNS resolution and service discovery.
  • Implement Data Persistence: Configure Docker volumes for stateful services to ensure data survives container restarts and failures.
  • Verify Infrastructure Health: Use health checks and monitoring to ensure all infrastructure components are operational and communicating correctly.
  • Learning Outcomes: Gain hands-on experience with distributed systems fundamentals, message-driven architecture, and containerized infrastructure deployment.

4. Objectives of Part 2 - Application Services Development

Part 2: Building Microservices for Data Processing and Analysis

  • Develop Kafka Producer Service: Create a Python-based service that generates realistic stock tick data with configurable intervals and publishes to Kafka topics.
  • Implement Kafka Consumer Service: Build a consumer that subscribes to stock tick topics and displays formatted real-time data streams.
  • Create Analytics Consumer: Develop a service that calculates technical indicators (Simple Moving Average, Exponential Moving Average, price trends) from streaming data.
  • Build Historical Storage Service: Implement batch insertion logic to persist stock ticks into TimescaleDB with proper schema design and indexing.
  • Develop Alert Service: Create a threshold monitoring system that detects price anomalies and generates alerts based on configurable rules.
  • Implement API Gateway: Build a RESTful API using Flask to expose stock data, health checks, and query endpoints for external consumption.
  • Create WebSocket Bridge: Develop a Node.js service that bridges Kafka messages to browser clients using Socket.IO for real-time updates.
  • Learning Outcomes: Master microservices design patterns, event-driven architecture, stream processing, and API development with multiple programming languages.

5. Objectives of Part 3 - Monitoring and User Interface

Part 3: Implementing Observability and Interactive Visualization

  • Deploy Prometheus Monitoring: Set up metrics collection with Prometheus to scrape application and infrastructure metrics at regular intervals.
  • Configure Grafana Dashboards: Create interactive visualization dashboards for monitoring system health, message throughput, database performance, and business metrics.
  • Build React Web UI: Develop a modern, responsive single-page application using React, Material-UI, and Recharts for real-time stock data visualization.
  • Implement Real-Time Charts: Create interactive charts that display live stock prices, historical trends, and technical indicators with smooth animations.
  • Configure Nginx Reverse Proxy: Set up Nginx to serve the React application and proxy WebSocket/API requests to backend services.
  • Optimize UI Performance: Implement React performance optimizations (useMemo, useCallback) to prevent unnecessary re-renders and flickering.
  • End-to-End Testing: Verify complete data flow from producer through Kafka, storage, API, WebSocket bridge, to UI with acceptable latency (<2 seconds).
  • Learning Outcomes: Gain expertise in observability practices, modern frontend development, real-time web technologies, and full-stack integration.

6. Name of Containers Involved and Download Links

The StreamFlow Analytics Platform consists of 13 Docker containers, including both official images and custom-built services:

Infrastructure Containers (Official Images)

Container Name | Image | Version | Purpose | Download Link
kafka-sim-zookeeper | confluentinc/cp-zookeeper | 7.4.0 | Coordination service for Kafka cluster | Docker Hub
kafka-sim-broker | confluentinc/cp-kafka | 7.4.0 | Distributed message broker | Docker Hub
streamflow-timescaledb | timescale/timescaledb | latest-pg15 | Time-series database (PostgreSQL 15) | Docker Hub
streamflow-prometheus | prom/prometheus | latest | Metrics collection and storage | Docker Hub
streamflow-grafana | grafana/grafana | latest (v12.2.1) | Metrics visualization dashboards | Docker Hub

Custom Application Containers (Built from Source)

Container Name | Base Image | Language/Framework | Purpose | Docker Hub Link
kafka-sim-producer | python:3.9-slim | Python 3.9 | Generates stock tick data and publishes to Kafka | Pull Image
kafka-sim-consumer | python:3.9-slim | Python 3.9 | Consumes and displays formatted stock data | Pull Image
kafka-sim-websocket-bridge | node:18-alpine | Node.js 18 | Bridges Kafka messages to browser WebSockets | Pull Image
kafka-sim-ui | node:18-alpine + nginx:1.25-alpine | React 18 + TypeScript | Interactive web dashboard for visualization | Pull Image
streamflow-analytics-consumer | python:3.9-slim | Python 3.9 | Calculates SMA, EMA, and price trends | Pull Image
streamflow-historical-storage | python:3.9-slim | Python 3.9 | Batch inserts stock ticks to TimescaleDB | Pull Image
streamflow-alert-service | python:3.9-slim | Python 3.9 | Monitors thresholds and generates alerts | Pull Image
streamflow-api-gateway | python:3.9-slim | Flask (Python 3.9) | RESTful API for stock data queries | Pull Image

7. Name of Other Software Involved Along with Purpose

Software Name | Version | Purpose | Download Link
Docker Desktop | Latest (Windows/Mac/Linux) | Container runtime environment with Docker Engine and Docker Compose | Download
Docker Compose | v2.40.2 | Multi-container orchestration tool for defining and running Docker applications | Docs
Python | 3.9+ | Programming language for producer, consumer, analytics, storage, alert, and API services | Download
Node.js | 18.x LTS | JavaScript runtime for WebSocket bridge and React UI build process | Download
npm | 9.x+ | Package manager for Node.js dependencies (React, Material-UI, Recharts, Socket.IO) | Website
Git | Latest | Version control system for source code management and collaboration | Download
Visual Studio Code | Latest | Code editor for development with Docker, Python, and JavaScript extensions | Download
PowerShell / Bash | Latest | Command-line interface for executing Docker commands and scripts | Docs
Web Browser | Chrome/Firefox/Edge | Access web UI (localhost:3000), Grafana (localhost:3001), Prometheus (localhost:9090) | -

Python Libraries Used

Library Name | Purpose | Used In
kafka-python | Kafka client library for producers and consumers | Producer, Consumer, Analytics, Storage, Alert services
psycopg2-binary | PostgreSQL adapter for Python to connect to TimescaleDB | Historical Storage service
Flask | Lightweight web framework for building RESTful APIs | API Gateway service
Flask-CORS | Cross-Origin Resource Sharing support for Flask | API Gateway service
tabulate | Pretty-print tabular data in console | Consumer service
prometheus-client | Expose custom metrics for Prometheus scraping | All Python services

Node.js Libraries Used

Library Name | Purpose | Used In
kafkajs | Modern Kafka client for Node.js | WebSocket Bridge service
ws (WebSocket) | WebSocket server implementation | WebSocket Bridge service
socket.io | Real-time bidirectional event-based communication | WebSocket Bridge service
React | JavaScript library for building user interfaces | Web UI
Material-UI (MUI) | React component library implementing Material Design | Web UI
Recharts | Composable charting library for React | Web UI
socket.io-client | Client-side library for Socket.IO connections | Web UI

8. Overall Architecture of All Three Parts

8.1 Part 1 Architecture - Infrastructure Foundation

Part 1 Architecture Diagram

Figure 8.1: Part 1 Infrastructure Layer Architecture showing Zookeeper, the Kafka broker with the three-partition "stock_ticks" topic, and TimescaleDB

8.2 Part 2 Architecture - Application Services

Part 2 Architecture Diagram

Figure 8.2: Part 2 Application Layer Architecture showing Producer, Consumer, Analytics, Storage, Alert, API Gateway, and WebSocket Bridge services

8.3 Part 3 Architecture - Monitoring and UI

Part 3 Architecture Diagram

Figure 8.3: Part 3 Monitoring and UI Layer Architecture showing Prometheus, Grafana, and React UI with real-time WebSocket connections

8.4 Overall End-to-End System Architecture

End-to-End Architecture Diagram

Figure 8.4: Complete End-to-End System Architecture integrating all three project parts (Infrastructure, Application Services, Monitoring & UI)

9. Short Description of the Architecture

The StreamFlow Analytics Platform implements a modern microservices architecture based on event-driven design principles and containerization best practices, as illustrated in Figures 8.1 through 8.4. At its core, the system uses Apache Kafka as a distributed message broker to decouple data producers from consumers, enabling horizontal scalability and fault tolerance. The architecture is organized into three distinct layers. The infrastructure layer (Part 1, shown in Figure 8.1) provides foundational services: Zookeeper for cluster coordination, Kafka for message streaming with a three-partition topic named "stock_ticks", and TimescaleDB for time-series data persistence. The application layer (Part 2, depicted in Figure 8.2) consists of seven microservices written in Python and Node.js, each with a single responsibility. The Producer generates realistic stock tick data for five symbols (AAPL, MSFT, GOOGL, AMZN, TSLA) at configurable intervals, while multiple consumers process this data in parallel for different purposes: the Consumer displays formatted output, the Analytics service calculates technical indicators (SMA, EMA, price trends), the Historical Storage service performs batch insertions to TimescaleDB, the Alert service monitors thresholds, the API Gateway exposes RESTful endpoints, and the WebSocket Bridge streams real-time updates to browser clients. All services communicate asynchronously through Kafka topics, ensuring loose coupling and independent scalability.

The monitoring and user interface layer (Part 3, shown in Figure 8.3) provides comprehensive observability and visualization capabilities essential for production systems. Prometheus scrapes metrics from all application services every 15 seconds, collecting data on message throughput, processing latency, error rates, and custom business metrics. Grafana connects to Prometheus as a data source and renders interactive dashboards that visualize system health, Kafka topic statistics, database performance, and real-time stock analytics. The web-based user interface is built with React 18 and Material-UI, providing a responsive single-page application that displays live stock prices, historical charts with multiple timeframes (1H, 24H, 7D, 30D), and technical indicators. The UI receives real-time updates through Socket.IO connections to the WebSocket Bridge, which subscribes to Kafka topics and forwards messages to connected browsers with sub-second latency. Nginx serves as a reverse proxy, hosting the static React build and routing WebSocket and API requests to appropriate backend services. The entire system is orchestrated using Docker Compose with a custom bridge network (172.25.0.0/16 subnet) that enables service discovery through DNS, and six named volumes ensure data persistence across container restarts. Figure 8.4 demonstrates the complete integration of all three layers, showcasing how this architecture implements industry-standard patterns for building observable, scalable, and maintainable distributed systems suitable for real-time data processing workloads.

10. Procedure - Part 1: Infrastructure Foundation

Part 1: Setting Up Distributed Messaging and Data Storage Infrastructure

Step 1: Install Docker Desktop

  1. Download Docker Desktop from https://www.docker.com/products/docker-desktop
  2. Install Docker Desktop for your operating system (Windows/Mac/Linux)
  3. Start Docker Desktop and wait for the Docker Engine to initialize
  4. Verify installation by opening PowerShell/Terminal and running:
    docker --version
    docker-compose --version
  5. Expected output: Docker version 24.x.x and Docker Compose version v2.40.x or higher
Docker Version Check

Figure 10.1: Docker and Docker Compose version verification

Step 2: Clone the Project Repository

  1. Open PowerShell/Terminal and navigate to your desired directory
  2. Clone the repository:
    git clone https://github.com/rZk1809/StreamFlow-Analytics-Platform.git
    cd StreamFlow-Analytics-Platform
  3. Verify the project structure contains:
    • docker-compose.yml (main orchestration file)
    • producer/, consumer/, ui/, websocket-bridge/ (service directories)
    • analytics-consumer/, historical-storage/, alert-service/, api-gateway/ (service directories)
    • prometheus/, grafana/ (monitoring configuration)
Project Structure

Figure 10.2: StreamFlow Analytics Platform project folder structure

Step 3: Review docker-compose.yml Configuration

  1. Open docker-compose.yml in a text editor
  2. Examine the infrastructure services configuration:
    • Zookeeper: Port 2181, environment variables for client port and tick time
    • Kafka: Port 9092 (internal), 29092 (external), depends on Zookeeper, configured with KAFKA_ADVERTISED_LISTENERS
    • TimescaleDB: Port 5432, PostgreSQL 15 with TimescaleDB extension, volume for data persistence
  3. Note the network configuration: kafka-network (bridge driver)
  4. Note the volume definitions: zookeeper-data, zookeeper-logs, kafka-data, timescaledb-data
Docker Compose Configuration

Figure 10.3: docker-compose.yml configuration file showing service definitions

Step 4: Start Infrastructure Services

  1. In the project root directory, run:
    docker-compose up -d zookeeper kafka timescaledb
  2. Wait 30-60 seconds for services to initialize
  3. Verify containers are running:
    docker-compose ps
  4. Expected output: All three containers should show "Up" status with "(healthy)" indicator
Container Status

Figure 10.4: Running Docker containers showing all infrastructure services up and healthy

Step 5: Verify Zookeeper Health

  1. Check Zookeeper logs:
    docker logs kafka-sim-zookeeper --tail 50
  2. Look for "binding to port 0.0.0.0/0.0.0.0:2181" message
  3. Verify no error messages in the logs
Zookeeper Logs

Figure 10.5: Zookeeper logs showing successful startup and port binding

Step 6: Verify Kafka Broker Health

  1. Check Kafka broker logs:
    docker logs kafka-sim-broker --tail 50
  2. Look for "[KafkaServer id=1] started" message
  3. Create the stock_ticks topic:
    docker exec kafka-sim-broker kafka-topics --bootstrap-server localhost:9092 --create --topic stock_ticks --partitions 3 --replication-factor 1
  4. Verify topic creation:
    docker exec kafka-sim-broker kafka-topics --bootstrap-server localhost:9092 --describe --topic stock_ticks
  5. Expected output: Topic with 3 partitions, each with Leader=1 and ISR=1
Kafka Topic Partitions

Figure 10.6: Kafka topic "stock_ticks" with 3 partitions successfully created
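
As an optional cross-check from the client side, the short Python snippet below (a sketch, assuming kafka-python is installed on the host and that port 29092 is the externally advertised listener) confirms that the topic exposes all three partitions:

    from kafka import KafkaConsumer

    # Connect through the externally mapped listener (host port 29092, per docker-compose.yml)
    consumer = KafkaConsumer(bootstrap_servers="localhost:29092")

    # partitions_for_topic returns the set of partition ids known for the topic
    partitions = consumer.partitions_for_topic("stock_ticks") or set()
    print("stock_ticks partitions:", sorted(partitions))  # expected: [0, 1, 2]

    consumer.close()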

Step 7: Verify TimescaleDB Health

  1. Connect to TimescaleDB:
    docker exec -it streamflow-timescaledb psql -U postgres -d streamflow
  2. Create the stock_ticks table:
    CREATE TABLE IF NOT EXISTS stock_ticks (
        time TIMESTAMPTZ NOT NULL,
        symbol TEXT NOT NULL,
        price DOUBLE PRECISION NOT NULL,
        volume INTEGER NOT NULL
    );
    SELECT create_hypertable('stock_ticks', 'time', if_not_exists => TRUE);
  3. Verify table creation:
    \dt
  4. Exit psql: \q
TimescaleDB Table

Figure 10.7: TimescaleDB hypertable "stock_ticks" successfully created
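
The same verification can be scripted from the host. Below is a minimal psycopg2 sketch (connection details other than the database name and user are assumptions; the password placeholder must match the POSTGRES_PASSWORD set in docker-compose.yml):

    import psycopg2

    # Host port 5432 is published by the TimescaleDB container; the password is a placeholder.
    conn = psycopg2.connect(
        host="localhost", port=5432,
        dbname="streamflow", user="postgres", password="<your-postgres-password>",
    )
    with conn, conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM stock_ticks;")
        print("rows in stock_ticks:", cur.fetchone()[0])  # 0 until the producer starts in Part 2
    conn.close()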

Step 8: Verify Docker Network

  1. Inspect the Docker network:
    docker network inspect kafka-stream-sim-main_kafka-network
  2. Verify all three containers are connected with assigned IP addresses (see Figure 10.8)
  3. Note the subnet (typically 172.25.0.0/16) and gateway
Docker Network Inspection

Figure 10.8: Docker network inspection showing connected containers with IP addresses

Step 9: Verify Docker Volumes

  1. List Docker volumes:
    docker volume ls
  2. Inspect TimescaleDB volume:
    docker volume inspect kafka-stream-sim-main_timescaledb-data
  3. Note the mountpoint location for data persistence (shown in Figure 10.9)
Docker Volumes

Figure 10.9: Docker volumes showing persistent storage for stateful services

Part 1 Completion Checklist:

  • ✅ Docker Desktop installed and running
  • ✅ Zookeeper container running and healthy (Port 2181)
  • ✅ Kafka broker container running and healthy (Port 9092)
  • ✅ TimescaleDB container running and healthy (Port 5432)
  • ✅ Kafka topic "stock_ticks" created with 3 partitions
  • ✅ TimescaleDB hypertable "stock_ticks" created
  • ✅ Docker network established with all containers connected
  • ✅ Docker volumes created for data persistence

11. Procedure - Part 2: Application Services Development

Part 2: Building Microservices for Data Processing and Analysis

Step 1: Build All Application Service Images

  1. Build all custom Docker images without cache:
    docker-compose build --no-cache producer consumer websocket-bridge analytics-consumer historical-storage alert-service api-gateway
  2. This process will take 5-10 minutes depending on your system
  3. Verify images are created:
    docker images | grep kafka-stream-sim-main
  4. Expected output: 7 custom images listed (as shown in Figure 11.1)
Docker Images

Figure 11.1: Successfully built custom Docker images for all application services

Step 2: Start Producer Service

  1. Start the producer container:
    docker-compose up -d producer
  2. Wait 10 seconds for initialization
  3. Check producer logs:
    docker logs kafka-sim-producer --tail 20
  4. Expected output: Messages showing stock ticks being published to Kafka (refer to Figure 11.2)
  5. Verify producer is generating data every ~1.5 seconds
Producer Logs

Figure 11.2: Producer service logs showing stock tick data being published to Kafka
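
For reference, the publishing loop inside the producer looks roughly like the sketch below. This is a simplified illustration rather than the exact service code; the symbol list, the random-walk price model, and the 1.5-second interval follow the values described in this report:

    import json, random, time
    from datetime import datetime, timezone
    from kafka import KafkaProducer

    SYMBOLS = ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA"]
    prices = {s: random.uniform(100, 500) for s in SYMBOLS}  # illustrative starting prices

    producer = KafkaProducer(
        bootstrap_servers="kafka:9092",  # Kafka service name inside the Docker network
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        key_serializer=lambda k: k.encode("utf-8"),
    )

    while True:
        symbol = random.choice(SYMBOLS)
        prices[symbol] *= 1 + random.uniform(-0.005, 0.005)  # small random walk
        tick = {
            "symbol": symbol,
            "price": round(prices[symbol], 2),
            "volume": random.randint(100, 10_000),
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        # Keying by symbol keeps each symbol on a consistent partition
        producer.send("stock_ticks", key=symbol, value=tick)
        time.sleep(1.5)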

Step 3: Start Consumer Service

  1. Start the consumer container:
    docker-compose up -d consumer
  2. Check consumer logs:
    docker logs kafka-sim-consumer --tail 30
  3. Expected output: Formatted table showing stock symbol, price, volume, and timestamp (see Figure 11.3)
  4. Verify consumer is receiving messages from all 3 Kafka partitions
Consumer Logs

Figure 11.3: Consumer service displaying formatted stock tick data in tabular format
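
A stripped-down version of the consumer loop, using the kafka-python and tabulate libraries listed in Section 7 (the group id and column layout are illustrative), looks roughly like this:

    import json
    from kafka import KafkaConsumer
    from tabulate import tabulate

    consumer = KafkaConsumer(
        "stock_ticks",
        bootstrap_servers="kafka:9092",
        group_id="display-consumer",  # each consumer group receives its own copy of the stream
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
        auto_offset_reset="latest",
    )

    for message in consumer:
        tick = message.value
        row = [[tick["symbol"], tick["price"], tick["volume"], tick["timestamp"], message.partition]]
        print(tabulate(row, headers=["Symbol", "Price", "Volume", "Timestamp", "Partition"]))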

Step 4: Start Analytics Consumer Service

  1. Start the analytics consumer:
    docker-compose up -d analytics-consumer
  2. Check analytics logs:
    docker logs streamflow-analytics-consumer --tail 30
  3. Expected output: Calculated SMA (Simple Moving Average), EMA (Exponential Moving Average), and price trends (as shown in Figure 11.4)
  4. Verify analytics are being calculated for all 5 stock symbols
Analytics Consumer Logs

Figure 11.4: Analytics consumer calculating technical indicators (SMA, EMA, trends)
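
The indicator math itself is compact. The sketch below shows a per-symbol SMA/EMA update; the window size and smoothing factor are illustrative choices, not necessarily the service's exact parameters:

    from collections import defaultdict, deque

    WINDOW = 20               # SMA window (illustrative)
    ALPHA = 2 / (WINDOW + 1)  # standard EMA smoothing factor

    recent = defaultdict(lambda: deque(maxlen=WINDOW))  # last N prices per symbol
    ema = {}                                            # running EMA per symbol

    def update_indicators(symbol: str, price: float) -> dict:
        """Update SMA/EMA for one incoming tick and report the short-term trend."""
        recent[symbol].append(price)
        sma = sum(recent[symbol]) / len(recent[symbol])
        ema[symbol] = price if symbol not in ema else ALPHA * price + (1 - ALPHA) * ema[symbol]
        trend = "UP" if price > sma else "DOWN" if price < sma else "FLAT"
        return {"symbol": symbol, "sma": round(sma, 2), "ema": round(ema[symbol], 2), "trend": trend}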

Step 5: Start Historical Storage Service

  1. Start the historical storage service:
    docker-compose up -d historical-storage
  2. Check storage logs:
    docker logs streamflow-historical-storage --tail 20
  3. Expected output: Messages showing batch insertions to TimescaleDB (see Figure 11.5)
  4. Verify data in TimescaleDB:
    docker exec streamflow-timescaledb psql -U postgres -d streamflow -c "SELECT COUNT(*) FROM stock_ticks;"
  5. Expected output: Growing count of records (should increase every 10-15 seconds)
Historical Storage Logs

Figure 11.5: Historical storage service performing batch insertions to TimescaleDB
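
The batching logic amounts to "accumulate ticks, then flush them in a single INSERT". A minimal psycopg2 sketch is shown below; the table layout follows Section 10, while the batch size, service hostname, and password are placeholders:

    import psycopg2
    from psycopg2.extras import execute_values

    BATCH_SIZE = 50  # illustrative batch size

    # "timescaledb" is the compose service name; the password is a placeholder.
    conn = psycopg2.connect(host="timescaledb", dbname="streamflow",
                            user="postgres", password="<your-postgres-password>")

    def flush(batch):
        """Insert a list of (time, symbol, price, volume) tuples in one statement."""
        with conn, conn.cursor() as cur:
            execute_values(
                cur,
                "INSERT INTO stock_ticks (time, symbol, price, volume) VALUES %s",
                batch,
            )

    batch = []
    # In the real service, ticks arrive from the stock_ticks Kafka topic:
    # batch.append((tick["timestamp"], tick["symbol"], tick["price"], tick["volume"]))
    # if len(batch) >= BATCH_SIZE: flush(batch); batch.clear()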

Step 6: Start Alert Service

  1. Start the alert service:
    docker-compose up -d alert-service
  2. Check alert logs:
    docker logs streamflow-alert-service --tail 20
  3. Expected output: Monitoring messages and alerts when price thresholds are exceeded (shown in Figure 11.6)
  4. Verify alert service is processing all stock symbols
Alert Service Logs

Figure 11.6: Alert service monitoring thresholds and generating alerts
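
The threshold check reduces to a small rule evaluation per tick, roughly as sketched below (the price bands shown are placeholders; the actual service loads its rules from configuration):

    # Illustrative per-symbol price bands; the real service reads configurable rules.
    THRESHOLDS = {
        "AAPL": (150.0, 250.0),
        "TSLA": (150.0, 400.0),
    }

    def check_alert(symbol: str, price: float):
        """Return an alert dict when the price leaves its configured band, else None."""
        band = THRESHOLDS.get(symbol)
        if band is None:
            return None
        low, high = band
        if price < low:
            return {"symbol": symbol, "type": "BELOW_THRESHOLD", "price": price, "limit": low}
        if price > high:
            return {"symbol": symbol, "type": "ABOVE_THRESHOLD", "price": price, "limit": high}
        return None

    print(check_alert("TSLA", 425.10))  # -> ABOVE_THRESHOLD alert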

Step 7: Start API Gateway Service

  1. Start the API gateway:
    docker-compose up -d api-gateway
  2. Wait 5 seconds for Flask to initialize
  3. Test the health endpoint:
    curl http://localhost:5000/health
  4. Expected output: JSON response with status "healthy" (refer to Figure 11.7)
  5. Test the stocks endpoint:
    curl http://localhost:5000/api/stocks
  6. Expected output: JSON array with stock data from TimescaleDB
API Gateway Health Check

Figure 11.7: API Gateway health check endpoint returning successful response
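
A minimal Flask sketch of the two endpoints exercised above is shown below. The response shapes and the latest-price query are illustrative; the real gateway adds more routes, connection handling, and error reporting:

    import psycopg2
    from flask import Flask, jsonify
    from flask_cors import CORS

    app = Flask(__name__)
    CORS(app)  # allow the React UI (served from a different origin) to call the API

    def get_conn():
        # "timescaledb" is the compose service name; the password is a placeholder.
        return psycopg2.connect(host="timescaledb", dbname="streamflow",
                                user="postgres", password="<your-postgres-password>")

    @app.route("/health")
    def health():
        return jsonify({"status": "healthy"})

    @app.route("/api/stocks")
    def stocks():
        conn = get_conn()
        try:
            with conn.cursor() as cur:
                # Latest tick per symbol
                cur.execute("""SELECT DISTINCT ON (symbol) symbol, price, volume, time
                               FROM stock_ticks ORDER BY symbol, time DESC;""")
                rows = cur.fetchall()
        finally:
            conn.close()
        return jsonify([{"symbol": s, "price": p, "volume": v, "time": t.isoformat()}
                        for s, p, v, t in rows])

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)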

Step 8: Start WebSocket Bridge Service

  1. Start the WebSocket bridge:
    docker-compose up -d websocket-bridge
  2. Check WebSocket bridge logs:
    docker logs kafka-sim-websocket-bridge --tail 20
  3. Expected output: "WebSocket server listening on port 8080" and Kafka consumer group messages (see Figure 11.8)
  4. Verify WebSocket server is accessible:
    curl http://localhost:8080/health
WebSocket Bridge Logs

Figure 11.8: WebSocket bridge service successfully initialized and listening on port 8080

Step 9: Verify Kafka Topic Message Flow

  1. Check Kafka topic statistics:
    docker exec kafka-sim-broker kafka-topics --bootstrap-server localhost:9092 --describe --topic stock_ticks
  2. Verify all 3 partitions have leaders and in-sync replicas
  3. Check consumer group lag:
    docker exec kafka-sim-broker kafka-consumer-groups --bootstrap-server localhost:9092 --describe --all-groups
  4. Expected output: Multiple consumer groups with low lag (< 10 messages), as shown in Figure 11.9
Kafka Consumer Groups

Figure 11.9: Kafka consumer groups showing message processing with minimal lag

Step 10: Verify End-to-End Data Flow

  1. Query TimescaleDB for recent data:
    docker exec streamflow-timescaledb psql -U postgres -d streamflow -c "SELECT symbol, COUNT(*) as count, MIN(time) as first_record, MAX(time) as last_record FROM stock_ticks GROUP BY symbol ORDER BY symbol;"
  2. Expected output: All 5 symbols with growing record counts (see Figure 11.10)
  3. Verify data freshness (last_record should be within last 5 seconds)
  4. Calculate average latency from producer to database
TimescaleDB Data Verification

Figure 11.10: TimescaleDB query results showing stock tick records for all symbols

Part 2 Completion Checklist:

  • ✅ Producer service running and publishing stock ticks every ~1.5 seconds
  • ✅ Consumer service running and displaying formatted data
  • ✅ Analytics consumer calculating SMA, EMA, and trends
  • ✅ Historical storage service inserting batches to TimescaleDB
  • ✅ Alert service monitoring thresholds and generating alerts
  • ✅ API Gateway serving RESTful endpoints (Port 5000)
  • ✅ WebSocket Bridge streaming real-time data (Port 8080)
  • ✅ Kafka topic processing messages across 3 partitions
  • ✅ TimescaleDB accumulating stock tick records
  • ✅ End-to-end latency < 2 seconds verified

12. Procedure - Part 3: Monitoring and User Interface

Part 3: Implementing Observability and Interactive Visualization

Step 1: Start Prometheus Monitoring

  1. Start Prometheus container:
    docker-compose up -d prometheus
  2. Wait 10 seconds for Prometheus to initialize
  3. Access Prometheus web UI:
    Open browser: http://localhost:9090
  4. Verify Prometheus is scraping targets: Navigate to Status → Targets
  5. Expected output: All application services listed with "UP" status (refer to Figure 12.1)
  6. Test a PromQL query: up{job="kafka-producer"}
Prometheus Web UI

Figure 12.1: Prometheus web interface showing all targets being successfully scraped
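
On the application side, each Python service exposes a /metrics endpoint for Prometheus to scrape using the prometheus-client library listed in Section 7. The sketch below shows the general pattern; the metric names and port are illustrative and must match whatever the scrape configuration actually targets:

    import random, time
    from prometheus_client import Counter, Histogram, start_http_server

    # Illustrative metric names; Prometheus scrapes whatever the service actually registers.
    MESSAGES = Counter("kafka_messages_total", "Messages published to Kafka", ["topic"])
    LATENCY = Histogram("message_processing_seconds", "Time spent handling one message")

    start_http_server(8000)  # serves /metrics on port 8000 for Prometheus to scrape

    while True:
        with LATENCY.time():                        # records processing duration
            time.sleep(random.uniform(0.01, 0.05))  # stand-in for real work
        MESSAGES.labels(topic="stock_ticks").inc()
        time.sleep(1.5)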

Step 2: Start Grafana Dashboards

  1. Start Grafana container:
    docker-compose up -d grafana
  2. Wait 15 seconds for Grafana to initialize
  3. Access Grafana web UI:
    Open browser: http://localhost:3001
  4. Login with default credentials: admin / admin (change password when prompted)
  5. Add Prometheus data source (as shown in Figure 12.2):
    • Navigate to Configuration → Data Sources → Add data source
    • Select Prometheus
    • URL: http://prometheus:9090
    • Click "Save & Test"
Grafana Dashboard

Figure 12.2: Grafana login page and dashboard configuration interface

Step 3: Create Grafana Dashboard

  1. Create a new dashboard: Click "+" → Dashboard → Add new panel
  2. Configure panels for:
    • Kafka message rate: rate(kafka_messages_total[1m])
    • Database record count: timescaledb_records_total
    • API request rate: rate(http_requests_total[1m])
    • System CPU/Memory usage
  3. Save the dashboard with name "StreamFlow Analytics Overview"

Step 4: Build React UI Container

  1. Build the UI container (this will take 4-5 minutes):
    docker-compose build --no-cache ui
  2. The build process includes:
    • Stage 1: Node.js 18-alpine, npm ci (install dependencies), npm run build (React production build)
    • Stage 2: Nginx 1.25-alpine, copy build artifacts and nginx.conf
  3. Verify image is created:
    docker images | grep ui
UI Container Build

Figure 12.3: React UI Docker image successfully built with multi-stage build process

Step 5: Start React UI Container

  1. Start the UI container:
    docker-compose up -d ui
  2. Wait 5 seconds for Nginx to start
  3. Check UI container status:
    docker ps | grep ui
  4. Expected output: Container status "Up X seconds (healthy)"
  5. Check UI logs for any errors:
    docker logs kafka-sim-ui --tail 20
UI Container Logs

Figure 12.4: UI container logs showing Nginx successfully serving the React application

Step 6: Access and Test Web UI

  1. Open web browser and navigate to:
    http://localhost:3000
  2. Verify the UI loads with:
    • Header showing "StreamFlow Analytics Platform"
    • Real-time stock price cards for all 5 symbols
    • Historical charts section with timeframe selector (1H, 24H, 7D, 30D)
    • Chart type selector (Line, Area, Bar)
  3. Verify real-time updates (as shown in Figure 12.5):
    • Stock prices should update every ~1.5 seconds
    • Price changes should show green (up) or red (down) indicators
    • Volume numbers should update in real-time
Web UI Main Page

Figure 12.5: StreamFlow Analytics Platform web interface showing real-time stock prices

Step 7: Test Historical Charts

  1. Click on different stock symbols (AAPL, MSFT, GOOGL, AMZN, TSLA)
  2. Switch between timeframes (1H, 24H, 7D, 30D)
  3. Verify charts render without flickering (useMemo optimization)
  4. Switch between chart types (Line, Area, Bar)
  5. Verify smooth transitions and no performance issues (demonstrated in Figure 12.6)
Historical Charts

Figure 12.6: Interactive historical charts with multiple timeframes and chart types

Step 8: Test WebSocket Connection

  1. Open browser developer tools (F12)
  2. Navigate to Console tab
  3. Look for Socket.IO connection messages
  4. Expected output: "Socket.IO connected" and real-time stock tick messages (see Figure 12.7)
  5. Verify WebSocket connection is stable (no disconnections)
WebSocket Console

Figure 12.7: Browser console showing successful Socket.IO WebSocket connection

Step 9: Verify All Containers Running

  1. Check all container statuses:
    docker-compose ps
  2. Expected output: All 13 containers showing "Up" status with "(healthy)" indicator (as shown in Figure 12.8)
  3. Verify port mappings:
    • Zookeeper: 2181
    • Kafka: 9092, 29092, 9101
    • TimescaleDB: 5432
    • Prometheus: 9090
    • Grafana: 3001
    • API Gateway: 5000
    • WebSocket Bridge: 8080
    • UI: 3000 (served by Nginx inside the UI container)
All Containers Running

Figure 12.8: All 13 containers running and healthy in the StreamFlow Analytics Platform

Step 10: End-to-End Latency Test

  1. Monitor producer logs for a specific stock tick timestamp
  2. Check when the same tick appears in the UI
  3. Calculate latency: UI display time - Producer publish time
  4. Expected latency: < 2 seconds (typically < 1 second)
  5. Verify data consistency across all services (demonstrated in Figure 12.9)
End-to-End Latency Test

Figure 12.9: End-to-end latency verification showing sub-second data flow from producer to UI
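
To put a number on the latency instead of comparing log timestamps by eye, a small helper like the sketch below can be used (it assumes the tick payload carries the producer's ISO-8601 timestamp, and the example value is hypothetical):

    from datetime import datetime, timezone

    def latency_seconds(producer_timestamp: str) -> float:
        """Seconds between the producer's embedded timestamp and the moment of display."""
        produced = datetime.fromisoformat(producer_timestamp)
        if produced.tzinfo is None:
            produced = produced.replace(tzinfo=timezone.utc)  # assume UTC if not annotated
        return (datetime.now(timezone.utc) - produced).total_seconds()

    # Hypothetical tick timestamp copied from the producer logs:
    print(round(latency_seconds("2025-01-15T10:30:00.123456+00:00"), 3), "seconds")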

Part 3 Completion Checklist:

  • ✅ Prometheus running and scraping all service metrics (Port 9090)
  • ✅ Grafana running with Prometheus data source configured (Port 3001)
  • ✅ Grafana dashboard created showing system metrics
  • ✅ React UI container built and running (Port 3000)
  • ✅ Web UI accessible and displaying real-time stock data
  • ✅ Historical charts rendering without flickering
  • ✅ WebSocket connection stable and receiving real-time updates
  • ✅ All 13 containers running and healthy
  • ✅ End-to-end latency < 2 seconds verified
  • ✅ Complete system operational and observable

13. Modifications Done to Containers After Downloading

The following five key modifications were made to the original containers to fix critical issues and optimize performance:

Modification 1: Fixed Nginx Service Name Configuration in UI Container

Issue: The UI container was continuously restarting with a critical error because the nginx configuration file referenced an incorrect WebSocket service name "websocket-bridge-service" that did not exist in the Docker Compose network topology.

Solution: Updated the nginx configuration file (ui/nginx.conf) at lines 71 and 85 to use the correct service name "kafka-sim-websocket-bridge" as defined in docker-compose.yml. This fix enabled proper DNS resolution within the Docker network and allowed the UI container to successfully proxy WebSocket connections to the bridge service.

Impact: UI container now starts successfully without restarts, WebSocket connections are properly proxied, and API requests are correctly routed to backend services.

Modification 2: Implemented React Performance Optimization to Eliminate Chart Flickering

Issue: The historical charts component was experiencing severe flickering and constant re-rendering, causing poor user experience and excessive CPU usage (~60% higher than necessary) because the useEffect hook was regenerating chart data on every stock tick update (every 1.5 seconds).

Solution: Replaced the problematic useEffect implementation with React's useMemo hook in ui/src/components/HistoricalCharts.tsx (lines 127-139). The memoized historical data generation now only recalculates when the selected symbol or timeframe changes, not on every real-time price update. Removed state.stockData from the dependency array to prevent unnecessary recomputation.

Impact: Charts render smoothly without flickering, CPU usage reduced by approximately 60% during chart display, improved user experience when switching timeframes, and historical data regenerates only when necessary (symbol/timeframe changes).

Modification 3: Resolved Kafka Broker Startup Issues with Stale Zookeeper State

Issue: Kafka broker repeatedly failed to start with the error "org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists" because a stale ephemeral node from a previous session persisted in Zookeeper at the path /brokers/ids/1, preventing the broker from registering itself with the cluster.

Solution: Performed a complete cleanup by removing all containers AND volumes using "docker-compose down -v" followed by a fresh deployment. This approach cleared all stale Zookeeper state and ensured clean initialization of the Kafka cluster with proper broker registration and leader election.

Impact: Kafka broker starts successfully without node conflicts, clean Zookeeper state ensures proper cluster coordination, and all 3 Kafka partitions have correctly assigned leaders and in-sync replicas (ISRs).

Modification 4: Implemented Docker Build Cache Invalidation for Consistent Deployments

Issue: Docker builds were utilizing cached layers from previous builds, resulting in containers running outdated code despite source code modifications. This caching behavior caused inconsistencies between the local development environment and the actual deployed code, leading to confusing debugging sessions.

Solution: Standardized all Docker image builds to use the --no-cache flag, ensuring that every build process starts from scratch and incorporates the latest code changes. Applied this approach consistently across all custom service builds (UI, Producer, Consumer, Analytics, Storage, Alert, API Gateway, WebSocket Bridge).

Impact: All code changes are properly reflected in deployed containers, eliminated issues with stale dependencies or outdated configurations, consistent builds across different development and deployment environments, and improved debugging process by ensuring code-to-container fidelity.

Modification 5: Enhanced Error Handling and Logging Across All Python Services

Issue: Several Python-based microservices (Producer, Consumer, Analytics, Storage, Alert, API Gateway) lacked comprehensive error handling and detailed logging, making it difficult to diagnose issues in production. Silent failures in Kafka consumers and database operations resulted in data loss without clear indicators.

Solution: Implemented structured exception handling with try-catch blocks, added detailed logging statements at critical points (Kafka connections, database operations, API calls), configured proper logging levels (DEBUG, INFO, ERROR) with timestamps and contextual information, and added graceful shutdown handlers for all services to ensure proper resource cleanup.

Impact: Improved system observability with detailed logs for troubleshooting, early detection of issues before they cascade into system failures, graceful degradation when individual services encounter errors, and enhanced production readiness with better monitoring and alerting capabilities.
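
The pattern applied across the Python services looks roughly like the sketch below. It is a simplified illustration of the structured logging and graceful-shutdown handling described above, not the exact service code:

    import logging, signal

    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s - %(message)s",
    )
    log = logging.getLogger("streamflow.service")

    running = True

    def shutdown(signum, frame):
        """Stop the main loop so Kafka/DB connections can be closed cleanly."""
        global running
        log.info("Received signal %s, shutting down gracefully", signum)
        running = False

    signal.signal(signal.SIGTERM, shutdown)  # sent by 'docker stop'
    signal.signal(signal.SIGINT, shutdown)   # Ctrl+C during local runs

    def process_next_message():
        pass  # hypothetical stand-in for the service's real work

    def main():
        while running:
            try:
                process_next_message()
            except Exception:
                log.exception("Failed to process message; continuing with the next one")

    if __name__ == "__main__":
        main()
        log.info("Shutdown complete")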

14. GitHub/Docker Hub Links of Modified Containers

The modified containers and source code are available at the following locations:

Docker Hub Links (Pull Commands)

# Producer Service
docker pull rohithgk1809/streamflow-producer

# Consumer Service
docker pull rohithgk1809/streamflow-consumer

# Analytics Consumer Service
docker pull rohithgk1809/streamflow-analytics:v1.0.0

# Historical Storage Service
docker pull rohithgk1809/streamflow-storage:v1.0.0

# Alert Service
docker pull rohithgk1809/streamflow-alerts:v1.0.0

# API Gateway Service
docker pull rohithgk1809/streamflow-api:v1.0.0

# WebSocket Bridge Service
docker pull rohithgk1809/streamflow-websocket-bridge:v1.0.0

# React UI Service
docker pull rohithgk1809/streamflow-ui

Container Details and Links

Container Name | Docker Hub Repository | GitHub Source | Modifications
kafka-sim-ui | streamflow-ui | View Source | Nginx config fix + React performance optimization
kafka-sim-producer | streamflow-producer | View Source | Enhanced error handling and logging
kafka-sim-consumer | streamflow-consumer | View Source | Improved error handling
kafka-sim-websocket-bridge | streamflow-websocket-bridge:v1.0.0 | View Source | Connection stability improvements
streamflow-analytics-consumer | streamflow-analytics:v1.0.0 | View Source | Enhanced analytics calculations
streamflow-historical-storage | streamflow-storage:v1.0.0 | View Source | Database connection pooling
streamflow-alert-service | streamflow-alerts:v1.0.0 | View Source | Configurable alert thresholds
streamflow-api-gateway | streamflow-api:v1.0.0 | View Source | CORS configuration and endpoint optimization

Main Project Repository: https://github.com/rZk1809/StreamFlow-Analytics-Platform

Docker Compose File: docker-compose.yml

Documentation: README.md

15. Outcomes of the Project

Quantitative Outcomes:

  • 13 containers: successfully deployed and running in a healthy state
  • 1,743+ records: stock ticks stored in TimescaleDB with continuous growth
  • < 1 second: end-to-end latency from producer to UI display
  • 5 stock symbols: processed in real time (AAPL, MSFT, GOOGL, AMZN, TSLA)
  • 3 Kafka partitions: distributed message processing with proper load balancing
  • ~1.5 seconds: producer interval for stock tick generation
  • 15 seconds: Prometheus scrape interval for metrics collection
  • 7 microservices: custom-built application services in Python and Node.js
  • 6 Docker volumes: persistent storage for stateful services
  • 8 exposed ports: service endpoints accessible for monitoring and interaction
  • 100% uptime: all services running without crashes after fixes were applied
  • 60% CPU reduction: achieved through the React useMemo optimization

Qualitative Outcomes:

  • Distributed Systems Understanding: Gained hands-on experience with Apache Kafka's publish-subscribe model, partition-based message distribution, and consumer group coordination.
  • Microservices Architecture: Successfully implemented loosely coupled services with single responsibilities, demonstrating scalability and maintainability principles.
  • Containerization Expertise: Mastered Docker containerization, multi-stage builds, Docker Compose orchestration, networking, and volume management.
  • Time-Series Data Management: Learned TimescaleDB hypertable concepts, efficient time-based queries, and batch insertion strategies for high-throughput scenarios.
  • Real-Time Web Technologies: Implemented WebSocket-based real-time communication using Socket.IO, bridging Kafka messages to browser clients.
  • Modern Frontend Development: Built a responsive React application with Material-UI components, Recharts visualizations, and performance optimizations.
  • Observability Practices: Configured Prometheus metrics collection and Grafana dashboards for comprehensive system monitoring and alerting.
  • Problem-Solving Skills: Debugged and resolved complex issues including nginx configuration errors, React performance problems, and Kafka/Zookeeper state conflicts.
  • API Design: Created RESTful endpoints with Flask, implementing proper CORS handling and health check patterns.
  • Multi-Language Development: Worked with Python, Node.js, TypeScript, and SQL in a single integrated system.
  • DevOps Practices: Implemented infrastructure-as-code with Docker Compose, automated builds, and container health checks.
  • Performance Optimization: Applied React hooks (useMemo, useCallback) to prevent unnecessary re-renders and improve UI responsiveness.

Technical Skills Acquired:

Category | Skills Learned | Proficiency Level
Message Brokers | Apache Kafka, Zookeeper, Topics, Partitions, Consumer Groups | Intermediate
Databases | PostgreSQL, TimescaleDB, Hypertables, Time-series queries | Intermediate
Containerization | Docker, Docker Compose, Multi-stage builds, Networking, Volumes | Advanced
Backend Development | Python, Flask, kafka-python, psycopg2, Node.js, Express | Intermediate
Frontend Development | React, TypeScript, Material-UI, Recharts, Socket.IO Client | Intermediate
Monitoring | Prometheus, PromQL, Grafana, Metrics collection, Dashboards | Beginner-Intermediate
Web Servers | Nginx, Reverse proxy, WebSocket proxying, Static file serving | Intermediate
Real-Time Communication | WebSockets, Socket.IO, Event-driven architecture | Intermediate

Project Deliverables:

  • ✅ Fully functional real-time stock market data processing platform
  • ✅ 13 containerized services orchestrated with Docker Compose
  • ✅ Interactive web dashboard with real-time updates and historical charts
  • ✅ Comprehensive monitoring setup with Prometheus and Grafana
  • ✅ RESTful API for programmatic access to stock data
  • ✅ Technical documentation covering architecture, setup, and troubleshooting
  • ✅ Source code repository with version control
  • ✅ Docker images published to Docker Hub
  • ✅ Performance benchmarks and latency measurements
  • ✅ This comprehensive project report

16. References

Books and Academic Resources:

  • Narkhede, N., Shapira, G., & Palino, T. (2017). Kafka: The Definitive Guide. O'Reilly Media.
  • Poulton, N. (2023). Docker Deep Dive. Nigel Poulton.
  • Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly Media.
  • Newman, S. (2021). Building Microservices (2nd ed.). O'Reilly Media.

Online Learning Resources:

  • Stack Overflow. (2024). Community Q&A Platform. Retrieved from https://stackoverflow.com/
  • Medium. (2024). Various articles on Apache Kafka, Docker, and React performance optimization.
  • YouTube Channels: TechWorld with Nana, freeCodeCamp, Confluent, Tech Primers - Docker and Kafka Tutorials.

17. Acknowledgements

I would like to express my sincere gratitude to Vellore Institute of Technology (VIT) and the School of Computer Science and Engineering (SCOPE) for providing me with the opportunity to work on this comprehensive and challenging project. The Cloud Computing course, taught by our respected faculty Dr. Subbulakshmi T, has been instrumental in developing my understanding of distributed systems, containerization technologies, and modern software architecture patterns. I am deeply grateful to my course instructor for their guidance, valuable feedback, and continuous encouragement throughout the three phases of this project.

Special thanks to the Department of Computer Science and Engineering at IIT Bombay for their excellent Docker tutorial, which served as a foundational resource for understanding containerization concepts and best practices. The comprehensive documentation and practical examples provided by the open-source communities behind Docker, Apache Kafka, TimescaleDB, React, Prometheus, and Grafana have been invaluable learning resources. I am particularly grateful to Confluent Inc. for their extensive Kafka documentation and tutorials that helped me understand the intricacies of distributed message brokers and stream processing.

I would like to acknowledge the support of my family and friends who encouraged me throughout this project, especially during challenging debugging sessions. Their patience and understanding when I spent long hours working on resolving issues like the nginx configuration error, React performance optimization, and Kafka-Zookeeper state conflicts were truly appreciated. I am also thankful to my classmates and peers in the Cloud Computing course for collaborative learning sessions, knowledge sharing, and constructive discussions about microservices architecture and distributed systems design patterns.

My gratitude extends to the vibrant online developer communities including Stack Overflow, Reddit communities (r/docker, r/kafka, r/reactjs), and various Discord servers focused on DevOps and cloud-native technologies. The collective wisdom shared by thousands of developers on these platforms helped me troubleshoot issues, understand best practices, and discover elegant solutions to complex problems. I am deeply appreciative of all the open-source contributors who dedicate their time and expertise to building and maintaining the technologies that powered this project - from the core infrastructure components to the frontend libraries and monitoring tools.

Finally, I would like to acknowledge Docker Inc., Meta Platforms (for React), Pallets Projects (for Flask), the Node.js Foundation, Apache Software Foundation, Timescale Inc., Prometheus Authors, and Grafana Labs for creating and maintaining the exceptional open-source technologies that made this project possible. The availability of high-quality, well-documented, and freely accessible tools has democratized access to enterprise-grade software development capabilities, enabling students like me to build production-grade distributed systems and gain practical experience with industry-standard technologies. This project has been a transformative learning experience, and I am grateful to everyone who contributed to making it possible.

18. Conclusion

The StreamFlow Analytics Platform project has been a comprehensive learning journey through modern distributed systems, containerization, and real-time data processing technologies. Over the course of three project phases (Part 1, Part 2, Part 3), we successfully built a production-grade microservices architecture that processes stock market data in real-time, demonstrating industry-standard practices in cloud-native application development.

Starting with Part 1 (as illustrated in Figure 8.1), we established a solid infrastructure foundation by deploying Apache Kafka for distributed messaging, Zookeeper for cluster coordination, and TimescaleDB for time-series data persistence. This layer provided the critical services needed for reliable, scalable data processing. We learned the importance of proper container networking, volume management for data persistence, and health checks for service reliability. The infrastructure layer taught us that distributed systems require careful coordination and state management, as evidenced by the Zookeeper node conflict we resolved through complete cleanup of stale state.

In Part 2 (depicted in Figure 8.2), we developed seven custom microservices that implement the core business logic of the platform. The Producer service generates realistic stock tick data, while multiple consumers process this data in parallel for different purposes - display, analytics, storage, alerting, and API serving. This phase reinforced the principles of loose coupling, single responsibility, and asynchronous communication through message queues. We gained practical experience with Python's kafka-python library, Flask for REST APIs, Node.js for WebSocket bridging, and PostgreSQL for data persistence. The challenge of maintaining low latency (<2 seconds end-to-end) taught us the importance of efficient batch processing and proper consumer group configuration.

Part 3 (shown in Figure 8.3) focused on observability and user experience, implementing Prometheus for metrics collection, Grafana for visualization, and a React-based web UI for real-time stock monitoring. This phase highlighted the critical importance of monitoring in production systems - without Prometheus and Grafana, we would have no visibility into system health, message throughput, or performance bottlenecks. The React UI development taught us modern frontend practices including component composition, state management with hooks, performance optimization with useMemo, and real-time updates with Socket.IO. The nginx configuration fix and React flickering resolution demonstrated the value of systematic debugging and understanding the root causes of issues rather than applying superficial fixes.

Throughout this project, we encountered and resolved several significant challenges: nginx service name mismatches causing container restarts, React performance issues due to unnecessary re-renders, Kafka broker startup failures from stale Zookeeper state, Docker build cache inconsistencies, and the need for enhanced error handling and logging across all services. Each challenge provided valuable learning opportunities and reinforced the importance of reading error messages carefully, understanding system architecture, and applying systematic troubleshooting approaches. The five key modifications we made to fix these issues (detailed in Section 13) are now documented and can benefit future developers working on similar systems.

The quantitative outcomes speak to the project's success: 13 healthy containers processing data with sub-second latency, 1,743+ records stored and growing, 60% CPU reduction from performance optimizations, and 100% uptime after fixes. More importantly, the qualitative outcomes demonstrate deep learning in distributed systems, microservices architecture, containerization, real-time web technologies, and DevOps practices. We now have practical experience with technologies that are widely used in industry - Kafka for event streaming, Docker for containerization, React for modern UIs, Prometheus/Grafana for observability, and TimescaleDB for time-series data.

This project has prepared us for real-world software engineering challenges in cloud-native environments. We understand how to design scalable systems, implement fault-tolerant architectures, monitor production applications, optimize performance, and debug complex distributed systems. The skills acquired - from writing Dockerfiles and docker-compose.yml to implementing React components and Prometheus metrics - are directly applicable to industry roles in backend development, DevOps, site reliability engineering, and full-stack development.

The complete end-to-end system architecture (Figure 8.4) integrates all three layers seamlessly, demonstrating how infrastructure components, application services, and monitoring/UI layers work together to create a functional, observable, and maintainable distributed system. The StreamFlow Analytics Platform successfully demonstrates a complete, end-to-end real-time data processing system built with modern technologies and best practices. The three-part project structure (infrastructure, application services, monitoring/UI) provided a logical progression that built knowledge incrementally and allowed for systematic development and debugging.

The project is not just a learning exercise but a functional platform that could be extended with additional features like machine learning predictions, advanced alerting rules, multi-region deployment, or integration with real stock market APIs. The modular microservices architecture makes such extensions straightforward - new services can be added without modifying existing components, demonstrating the power of well-designed distributed systems. We are grateful for the learning opportunity and the hands-on experience with technologies that power modern distributed systems. This project has significantly enhanced our understanding of cloud computing, containerization, distributed systems, and full-stack development, preparing us for future challenges in the rapidly evolving field of software engineering.
