StreamFlow Analytics Platform
A Comprehensive Real-Time Stock Market Data Processing System
Institution: Vellore Institute of Technology (VIT), Chennai
School: SCOPE (School of Computer Science and Engineering)
Academic Year: 2025
Table of Contents
- 1. Project Title
- 2. Introduction
- 3. Objectives of Part 1 - Infrastructure Foundation
- 4. Objectives of Part 2 - Application Services Development
- 5. Objectives of Part 3 - Monitoring and User Interface
- 6. Name of Containers Involved and Download Links
- 7. Name of Other Software Involved Along with Purpose
- 8. Overall Architecture of All Three Parts
- 9. Description of the Architecture
- 10. Procedure - Part 1: Infrastructure Foundation
- 11. Procedure - Part 2: Application Services Development
- 12. Procedure - Part 3: Monitoring and User Interface
- 13. Modifications Done to Containers After Downloading
- 14. GitHub/Docker Hub Links of Modified Containers
- 15. Outcomes of the Project
- 16. References
- 17. Acknowledgements
- 18. Conclusion
1. Project Title
"StreamFlow Analytics Platform: Building a Real-Time Stock Market Data Processing System using Apache Kafka, TimescaleDB, and Microservices Architecture with Docker Containerization, Monitoring, and Interactive Web Dashboard"
2. Introduction
The StreamFlow Analytics Platform is a comprehensive, production-grade real-time data processing system designed to simulate and analyze stock market data streams. This project demonstrates the implementation of a modern microservices architecture using containerization technologies, message-driven communication, time-series data storage, and real-time visualization capabilities.
The platform processes stock tick data for five major technology stocks (AAPL, MSFT, GOOGL, AMZN, TSLA) through a distributed pipeline consisting of 13 interconnected Docker containers. The system showcases industry-standard practices in building scalable, fault-tolerant, and observable distributed systems. It implements the complete data lifecycle from generation to storage, analysis, alerting, and visualization, making it an ideal learning project for understanding cloud-native application development.
Key features include: real-time data streaming with Apache Kafka, time-series data persistence with TimescaleDB (PostgreSQL extension), technical analysis calculations (SMA, EMA, price trends), threshold-based alerting, RESTful API gateway, WebSocket-based real-time updates to browsers, interactive React-based dashboard, and comprehensive monitoring with Prometheus and Grafana. The entire system is orchestrated using Docker Compose for local development and can be deployed to Kubernetes for production environments.
3. Objectives of Part 1 - Infrastructure Foundation
Part 1: Setting Up Distributed Messaging and Data Storage Infrastructure
- Understand Apache Kafka Architecture: Learn the fundamentals of distributed message brokers, topics, partitions, producers, and consumers in a publish-subscribe model.
- Configure Apache Zookeeper: Set up coordination services for Kafka cluster management, leader election, and distributed configuration.
- Deploy TimescaleDB: Implement a time-series database using PostgreSQL with TimescaleDB extension for efficient storage and querying of temporal stock market data.
- Establish Docker Networking: Create isolated bridge networks for inter-container communication with proper DNS resolution and service discovery.
- Implement Data Persistence: Configure Docker volumes for stateful services to ensure data survives container restarts and failures.
- Verify Infrastructure Health: Use health checks and monitoring to ensure all infrastructure components are operational and communicating correctly.
- Learning Outcomes: Gain hands-on experience with distributed systems fundamentals, message-driven architecture, and containerized infrastructure deployment.
4. Objectives of Part 2 - Application Services Development
Part 2: Building Microservices for Data Processing and Analysis
- Develop Kafka Producer Service: Create a Python-based service that generates realistic stock tick data with configurable intervals and publishes to Kafka topics.
- Implement Kafka Consumer Service: Build a consumer that subscribes to stock tick topics and displays formatted real-time data streams.
- Create Analytics Consumer: Develop a service that calculates technical indicators (Simple Moving Average, Exponential Moving Average, price trends) from streaming data.
- Build Historical Storage Service: Implement batch insertion logic to persist stock ticks into TimescaleDB with proper schema design and indexing.
- Develop Alert Service: Create a threshold monitoring system that detects price anomalies and generates alerts based on configurable rules.
- Implement API Gateway: Build a RESTful API using Flask to expose stock data, health checks, and query endpoints for external consumption.
- Create WebSocket Bridge: Develop a Node.js service that bridges Kafka messages to browser clients using Socket.IO for real-time updates.
- Learning Outcomes: Master microservices design patterns, event-driven architecture, stream processing, and API development with multiple programming languages.
5. Objectives of Part 3 - Monitoring and User Interface
Part 3: Implementing Observability and Interactive Visualization
- Deploy Prometheus Monitoring: Set up metrics collection with Prometheus to scrape application and infrastructure metrics at regular intervals.
- Configure Grafana Dashboards: Create interactive visualization dashboards for monitoring system health, message throughput, database performance, and business metrics.
- Build React Web UI: Develop a modern, responsive single-page application using React, Material-UI, and Recharts for real-time stock data visualization.
- Implement Real-Time Charts: Create interactive charts that display live stock prices, historical trends, and technical indicators with smooth animations.
- Configure Nginx Reverse Proxy: Set up Nginx to serve the React application and proxy WebSocket/API requests to backend services.
- Optimize UI Performance: Implement React performance optimizations (useMemo, useCallback) to prevent unnecessary re-renders and flickering.
- End-to-End Testing: Verify complete data flow from producer through Kafka, storage, API, WebSocket bridge, to UI with acceptable latency (<2 seconds).
- Learning Outcomes: Gain expertise in observability practices, modern frontend development, real-time web technologies, and full-stack integration.
6. Name of Containers Involved and Download Links
The StreamFlow Analytics Platform consists of 13 Docker containers, including both official images and custom-built services:
Infrastructure Containers (Official Images)
| Container Name | Image | Version | Purpose | Download Link |
|---|---|---|---|---|
| kafka-sim-zookeeper | confluentinc/cp-zookeeper | 7.4.0 | Coordination service for Kafka cluster | Docker Hub |
| kafka-sim-broker | confluentinc/cp-kafka | 7.4.0 | Distributed message broker | Docker Hub |
| streamflow-timescaledb | timescale/timescaledb | latest-pg15 | Time-series database (PostgreSQL 15) | Docker Hub |
| streamflow-prometheus | prom/prometheus | latest | Metrics collection and storage | Docker Hub |
| streamflow-grafana | grafana/grafana | latest (v12.2.1) | Metrics visualization dashboards | Docker Hub |
Custom Application Containers (Built from Source)
| Container Name | Base Image | Language/Framework | Purpose | Docker Hub Link |
|---|---|---|---|---|
| kafka-sim-producer | python:3.9-slim | Python 3.9 | Generates stock tick data and publishes to Kafka | Pull Image |
| kafka-sim-consumer | python:3.9-slim | Python 3.9 | Consumes and displays formatted stock data | Pull Image |
| kafka-sim-websocket-bridge | node:18-alpine | Node.js 18 | Bridges Kafka messages to browser WebSockets | Pull Image |
| kafka-sim-ui | node:18-alpine + nginx:1.25-alpine | React 18 + TypeScript | Interactive web dashboard for visualization | Pull Image |
| streamflow-analytics-consumer | python:3.9-slim | Python 3.9 | Calculates SMA, EMA, and price trends | Pull Image |
| streamflow-historical-storage | python:3.9-slim | Python 3.9 | Batch inserts stock ticks to TimescaleDB | Pull Image |
| streamflow-alert-service | python:3.9-slim | Python 3.9 | Monitors thresholds and generates alerts | Pull Image |
| streamflow-api-gateway | python:3.9-slim | Flask (Python 3.9) | RESTful API for stock data queries | Pull Image |
7. Name of Other Software Involved Along with Purpose
| Software Name | Version | Purpose | Download Link |
|---|---|---|---|
| Docker Desktop | Latest (Windows/Mac/Linux) | Container runtime environment with Docker Engine and Docker Compose | Download |
| Docker Compose | v2.40.2 | Multi-container orchestration tool for defining and running Docker applications | Docs |
| Python | 3.9+ | Programming language for producer, consumer, analytics, storage, alert, and API services | Download |
| Node.js | 18.x LTS | JavaScript runtime for WebSocket bridge and React UI build process | Download |
| npm | 9.x+ | Package manager for Node.js dependencies (React, Material-UI, Recharts, Socket.IO) | Website |
| Git | Latest | Version control system for source code management and collaboration | Download |
| Visual Studio Code | Latest | Code editor for development with Docker, Python, and JavaScript extensions | Download |
| PowerShell / Bash | Latest | Command-line interface for executing Docker commands and scripts | Docs |
| Web Browser | Chrome/Firefox/Edge | Access web UI (localhost:3000), Grafana (localhost:3001), Prometheus (localhost:9090) | - |
Python Libraries Used
| Library Name | Purpose | Used In |
|---|---|---|
| kafka-python | Kafka client library for producers and consumers | Producer, Consumer, Analytics, Storage, Alert services |
| psycopg2-binary | PostgreSQL adapter for Python to connect to TimescaleDB | Historical Storage service |
| Flask | Lightweight web framework for building RESTful APIs | API Gateway service |
| Flask-CORS | Cross-Origin Resource Sharing support for Flask | API Gateway service |
| tabulate | Pretty-print tabular data in console | Consumer service |
| prometheus-client | Expose custom metrics for Prometheus scraping | All Python services |
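To illustrate how prometheus-client is typically wired into a Python service, the sketch below exposes a hypothetical counter over an HTTP metrics endpoint; the metric name, label, and port are assumptions for illustration, not the exact ones used by the platform's services.

```python
# Minimal sketch: exposing a custom Prometheus metric from a Python service.
# The metric name (stock_ticks_published_total) and port 8000 are assumptions.
import random
import time

from prometheus_client import Counter, start_http_server

TICKS_PUBLISHED = Counter(
    "stock_ticks_published_total",
    "Number of stock ticks published to Kafka",
    ["symbol"],
)

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://<host>:8000/metrics
    symbols = ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA"]
    while True:
        TICKS_PUBLISHED.labels(symbol=random.choice(symbols)).inc()
        time.sleep(1.5)
```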
Node.js Libraries Used
| Library Name | Purpose | Used In |
|---|---|---|
| kafkajs | Modern Kafka client for Node.js | WebSocket Bridge service |
| ws (WebSocket) | WebSocket server implementation | WebSocket Bridge service |
| socket.io | Real-time bidirectional event-based communication | WebSocket Bridge service |
| React | JavaScript library for building user interfaces | Web UI |
| Material-UI (MUI) | React component library implementing Material Design | Web UI |
| Recharts | Composable charting library for React | Web UI |
| socket.io-client | Client-side library for Socket.IO connections | Web UI |
8. Overall Architecture of All Three Parts
8.1 Part 1 Architecture - Infrastructure Foundation
Figure 8.1: Part 1 Infrastructure Layer Architecture showing Zookeeper, Kafka Broker with 3 partitions, and TimescaleDB
8.2 Part 2 Architecture - Application Services
Figure 8.2: Part 2 Application Layer Architecture showing Producer, Consumer, Analytics, Storage, Alert, API Gateway, and WebSocket Bridge services
8.3 Part 3 Architecture - Monitoring and UI
Figure 8.3: Part 3 Monitoring and UI Layer Architecture showing Prometheus, Grafana, and React UI with real-time WebSocket connections
8.4 Overall End-to-End System Architecture
Figure 8.4: Complete End-to-End System Architecture integrating all three project parts (Infrastructure, Application Services, Monitoring & UI)
9. Description of the Architecture
The StreamFlow Analytics Platform implements a modern microservices architecture based on event-driven design principles and containerization best practices, as illustrated in Figures 8.1 through 8.4. At its core, the system uses Apache Kafka as a distributed message broker to decouple data producers from consumers, enabling horizontal scalability and fault tolerance. The architecture is organized into three distinct layers: the infrastructure layer (Part 1, shown in Figure 8.1) provides foundational services including Zookeeper for cluster coordination, Kafka for message streaming with a three-partition topic named "stock_ticks", and TimescaleDB for time-series data persistence. The application layer (Part 2, depicted in Figure 8.2) consists of seven microservices written in Python and Node.js, each with a single responsibility - the Producer generates realistic stock tick data for five symbols (AAPL, MSFT, GOOGL, AMZN, TSLA) at configurable intervals, while multiple consumers process this data in parallel for different purposes: the Consumer displays formatted output, the Analytics service calculates technical indicators (SMA, EMA, price trends), the Historical Storage service performs batch insertions to TimescaleDB, the Alert service monitors thresholds, the API Gateway exposes RESTful endpoints, and the WebSocket Bridge streams real-time updates to browser clients. All services communicate asynchronously through Kafka topics, ensuring loose coupling and independent scalability.
The monitoring and user interface layer (Part 3, shown in Figure 8.3) provides comprehensive observability and visualization capabilities essential for production systems. Prometheus scrapes metrics from all application services every 15 seconds, collecting data on message throughput, processing latency, error rates, and custom business metrics. Grafana connects to Prometheus as a data source and renders interactive dashboards that visualize system health, Kafka topic statistics, database performance, and real-time stock analytics. The web-based user interface is built with React 18 and Material-UI, providing a responsive single-page application that displays live stock prices, historical charts with multiple timeframes (1H, 24H, 7D, 30D), and technical indicators. The UI receives real-time updates through Socket.IO connections to the WebSocket Bridge, which subscribes to Kafka topics and forwards messages to connected browsers with sub-second latency. Nginx serves as a reverse proxy, hosting the static React build and routing WebSocket and API requests to appropriate backend services. The entire system is orchestrated using Docker Compose with a custom bridge network (172.25.0.0/16 subnet) that enables service discovery through DNS, and six named volumes ensure data persistence across container restarts. Figure 8.4 demonstrates the complete integration of all three layers, showcasing how this architecture implements industry-standard patterns for building observable, scalable, and maintainable distributed systems suitable for real-time data processing workloads.
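For concreteness, a representative stock-tick message flowing through the "stock_ticks" topic might look like the following; the field names and the epoch-seconds timestamp are illustrative assumptions rather than the authoritative producer schema.

```python
# Illustrative stock-tick payload (assumed field names, not the exact schema
# defined in the producer source code).
tick = {
    "symbol": "AAPL",
    "price": 189.42,
    "volume": 1200,
    "timestamp": 1736937127.512,  # seconds since the Unix epoch (UTC)
}
```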
10. Procedure - Part 1: Infrastructure Foundation
Part 1: Setting Up Distributed Messaging and Data Storage Infrastructure
Step 1: Install Docker Desktop
- Download Docker Desktop from https://www.docker.com/products/docker-desktop
- Install Docker Desktop for your operating system (Windows/Mac/Linux)
- Start Docker Desktop and wait for the Docker Engine to initialize
- Verify installation by opening PowerShell/Terminal and running:
docker --version
docker-compose --version
- Expected output: Docker version 24.x.x and Docker Compose version v2.40.x or higher
Figure 10.1: Docker and Docker Compose version verification
Step 2: Clone the Project Repository
- Open PowerShell/Terminal and navigate to your desired directory
- Clone the repository:
git clone https://github.com/rZk1809/StreamFlow-Analytics-Platform.git
cd StreamFlow-Analytics-Platform
- Verify the project structure contains:
- docker-compose.yml (main orchestration file)
- producer/, consumer/, ui/, websocket-bridge/ (service directories)
- analytics-consumer/, historical-storage/, alert-service/, api-gateway/ (service directories)
- prometheus/, grafana/ (monitoring configuration)
Figure 10.2: StreamFlow Analytics Platform project folder structure
Step 3: Review docker-compose.yml Configuration
- Open docker-compose.yml in a text editor
- Examine the infrastructure services configuration:
- Zookeeper: Port 2181, environment variables for client port and tick time
- Kafka: Port 9092 (internal), 29092 (external), depends on Zookeeper, configured with KAFKA_ADVERTISED_LISTENERS
- TimescaleDB: Port 5432, PostgreSQL 15 with TimescaleDB extension, volume for data persistence
- Note the network configuration: kafka-network (bridge driver)
- Note the volume definitions: zookeeper-data, zookeeper-logs, kafka-data, timescaledb-data
Figure 10.3: docker-compose.yml configuration file showing service definitions
Step 4: Start Infrastructure Services
- In the project root directory, run:
docker-compose up -d zookeeper kafka timescaledb
- Wait 30-60 seconds for services to initialize
- Verify containers are running:
docker-compose ps
- Expected output: All three containers should show "Up" status with "(healthy)" indicator
Figure 10.4: Running Docker containers showing all infrastructure services up and healthy
Step 5: Verify Zookeeper Health
- Check Zookeeper logs:
docker logs kafka-sim-zookeeper --tail 50
- Look for "binding to port 0.0.0.0/0.0.0.0:2181" message
- Verify no error messages in the logs
Figure 10.5: Zookeeper logs showing successful startup and port binding
Step 6: Verify Kafka Broker Health
- Check Kafka broker logs:
docker logs kafka-sim-broker --tail 50
- Look for "[KafkaServer id=1] started" message
- Create the stock_ticks topic:
docker exec kafka-sim-broker kafka-topics --bootstrap-server localhost:9092 --create --topic stock_ticks --partitions 3 --replication-factor 1
- Verify topic creation:
docker exec kafka-sim-broker kafka-topics --bootstrap-server localhost:9092 --describe --topic stock_ticks
- Expected output: Topic with 3 partitions, each with Leader=1 and ISR=1
Figure 10.6: Kafka topic "stock_ticks" with 3 partitions successfully created
Step 7: Verify TimescaleDB Health
- Connect to TimescaleDB:
docker exec -it streamflow-timescaledb psql -U postgres -d streamflow
- Create the stock_ticks table:
CREATE TABLE IF NOT EXISTS stock_ticks (
time TIMESTAMPTZ NOT NULL,
symbol TEXT NOT NULL,
price DOUBLE PRECISION NOT NULL,
volume INTEGER NOT NULL
);
SELECT create_hypertable('stock_ticks', 'time', if_not_exists => TRUE);
- Verify table creation:
\dt
- Exit psql:
\q
Figure 10.7: TimescaleDB hypertable "stock_ticks" successfully created
Step 8: Verify Docker Network
- Inspect the Docker network:
docker network inspect kafka-stream-sim-main_kafka-network
- Verify all three containers are connected with assigned IP addresses (see Figure 10.8)
- Note the subnet (typically 172.25.0.0/16) and gateway
Figure 10.8: Docker network inspection showing connected containers with IP addresses
Step 9: Verify Docker Volumes
- List Docker volumes:
docker volume ls
- Inspect TimescaleDB volume:
docker volume inspect kafka-stream-sim-main_timescaledb-data
- Note the mountpoint location for data persistence (shown in Figure 10.9)
Figure 10.9: Docker volumes showing persistent storage for stateful services
Part 1 Completion Checklist:
- ✅ Docker Desktop installed and running
- ✅ Zookeeper container running and healthy (Port 2181)
- ✅ Kafka broker container running and healthy (Port 9092)
- ✅ TimescaleDB container running and healthy (Port 5432)
- ✅ Kafka topic "stock_ticks" created with 3 partitions
- ✅ TimescaleDB hypertable "stock_ticks" created
- ✅ Docker network established with all containers connected
- ✅ Docker volumes created for data persistence
11. Procedure - Part 2: Application Services Development
Part 2: Building Microservices for Data Processing and Analysis
Step 1: Build All Application Service Images
- Build all custom Docker images without cache:
docker-compose build --no-cache producer consumer websocket-bridge analytics-consumer historical-storage alert-service api-gateway
- This process will take 5-10 minutes depending on your system
- Verify images are created:
docker images | grep kafka-stream-sim-main
- Expected output: 7 custom images listed (as shown in Figure 11.1)
Figure 11.1: Successfully built custom Docker images for all application services
Step 2: Start Producer Service
- Start the producer container:
docker-compose up -d producer
- Wait 10 seconds for initialization
- Check producer logs:
docker logs kafka-sim-producer --tail 20
- Expected output: Messages showing stock ticks being published to Kafka (refer to Figure 11.2)
- Verify producer is generating data every ~1.5 seconds
Figure 11.2: Producer service logs showing stock tick data being published to Kafka
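For reference, the core of the producer loop can be sketched as follows; the random-walk pricing, starting prices, and bootstrap address are simplifying assumptions, while the topic name, symbols, and ~1.5 second interval follow the configuration described above.

```python
# Minimal sketch of a stock-tick producer using kafka-python.
# Pricing logic and bootstrap address are simplified assumptions.
import json
import random
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

prices = {"AAPL": 190.0, "MSFT": 410.0, "GOOGL": 150.0, "AMZN": 180.0, "TSLA": 250.0}

while True:
    symbol = random.choice(list(prices))
    prices[symbol] *= 1 + random.uniform(-0.005, 0.005)  # small random walk
    tick = {
        "symbol": symbol,
        "price": round(prices[symbol], 2),
        "volume": random.randint(100, 5000),
        "timestamp": time.time(),
    }
    # Keying by symbol keeps each symbol's ticks on the same partition.
    producer.send("stock_ticks", key=symbol, value=tick)
    time.sleep(1.5)
```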
Step 3: Start Consumer Service
- Start the consumer container:
docker-compose up -d consumer
- Check consumer logs:
docker logs kafka-sim-consumer --tail 30
- Expected output: Formatted table showing stock symbol, price, volume, and timestamp (see Figure 11.3)
- Verify consumer is receiving messages from all 3 Kafka partitions
Figure 11.3: Consumer service displaying formatted stock tick data in tabular format
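A minimal version of the display consumer, assuming a hypothetical consumer group name, could look like the sketch below.

```python
# Sketch of the display consumer; the group id and table formatting are
# illustrative assumptions.
import json

from kafka import KafkaConsumer
from tabulate import tabulate

consumer = KafkaConsumer(
    "stock_ticks",
    bootstrap_servers="kafka:9092",
    group_id="display-consumer",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="latest",
)

for message in consumer:
    tick = message.value
    row = [[tick["symbol"], tick["price"], tick["volume"], message.partition]]
    print(tabulate(row, headers=["Symbol", "Price", "Volume", "Partition"]))
```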
Step 4: Start Analytics Consumer Service
- Start the analytics consumer:
docker-compose up -d analytics-consumer
- Check analytics logs:
docker logs streamflow-analytics-consumer --tail 30
- Expected output: Calculated SMA (Simple Moving Average), EMA (Exponential Moving Average), and price trends (as shown in Figure 11.4)
- Verify analytics are being calculated for all 5 stock symbols
Figure 11.4: Analytics consumer calculating technical indicators (SMA, EMA, trends)
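The indicator math can be summarized with the following simplified sketch; the 10-tick window and smoothing factor are illustrative assumptions rather than the service's actual configuration.

```python
# Simplified per-symbol SMA/EMA/trend calculation; window size is an assumption.
from collections import defaultdict, deque

WINDOW = 10
prices = defaultdict(lambda: deque(maxlen=WINDOW))
ema = {}

def update_indicators(symbol: str, price: float) -> dict:
    prices[symbol].append(price)
    sma = sum(prices[symbol]) / len(prices[symbol])
    alpha = 2 / (WINDOW + 1)  # standard EMA smoothing factor
    ema[symbol] = price if symbol not in ema else alpha * price + (1 - alpha) * ema[symbol]
    trend = "UP" if price > sma else "DOWN" if price < sma else "FLAT"
    return {"symbol": symbol, "sma": round(sma, 2), "ema": round(ema[symbol], 2), "trend": trend}
```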
Step 5: Start Historical Storage Service
- Start the historical storage service:
docker-compose up -d historical-storage
- Check storage logs:
docker logs streamflow-historical-storage --tail 20
- Expected output: Messages showing batch insertions to TimescaleDB (see Figure 11.5)
- Verify data in TimescaleDB:
docker exec streamflow-timescaledb psql -U postgres -d streamflow -c "SELECT COUNT(*) FROM stock_ticks;"
- Expected output: Growing count of records (should increase every 10-15 seconds)
Figure 11.5: Historical storage service performing batch insertions to TimescaleDB
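A simplified sketch of the batching logic is shown below; the connection parameters and the batch size of 50 are assumptions for illustration, not the service's exact configuration.

```python
# Sketch of batched inserts into the stock_ticks hypertable with psycopg2.
from datetime import datetime, timezone

import psycopg2
from psycopg2.extras import execute_values

BATCH_SIZE = 50  # assumed batch size
conn = psycopg2.connect(host="timescaledb", dbname="streamflow",
                        user="postgres", password="postgres")
batch = []

def add_tick(tick: dict) -> None:
    """Buffer one tick; flush to TimescaleDB once the batch is full."""
    ts = datetime.fromtimestamp(tick["timestamp"], tz=timezone.utc)
    batch.append((ts, tick["symbol"], tick["price"], tick["volume"]))
    if len(batch) >= BATCH_SIZE:
        with conn.cursor() as cur:
            execute_values(
                cur,
                "INSERT INTO stock_ticks (time, symbol, price, volume) VALUES %s",
                batch,
            )
        conn.commit()
        batch.clear()
```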
Step 6: Start Alert Service
- Start the alert service:
docker-compose up -d alert-service
- Check alert logs:
docker logs streamflow-alert-service --tail 20
- Expected output: Monitoring messages and alerts when price thresholds are exceeded (shown in Figure 11.6)
- Verify alert service is processing all stock symbols
Figure 11.6: Alert service monitoring thresholds and generating alerts
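The threshold check itself reduces to a small function like the sketch below, where the specific threshold values are illustrative assumptions rather than the configured rules.

```python
# Sketch of threshold-based alerting; thresholds are illustrative assumptions.
from typing import Optional

THRESHOLDS = {"AAPL": (150.0, 220.0), "TSLA": (180.0, 320.0)}  # (low, high) bounds

def check_alert(tick: dict) -> Optional[str]:
    low, high = THRESHOLDS.get(tick["symbol"], (float("-inf"), float("inf")))
    if tick["price"] >= high:
        return f"ALERT: {tick['symbol']} crossed above {high} (price={tick['price']})"
    if tick["price"] <= low:
        return f"ALERT: {tick['symbol']} dropped below {low} (price={tick['price']})"
    return None
```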
Step 7: Start API Gateway Service
- Start the API gateway:
docker-compose up -d api-gateway
- Wait 5 seconds for Flask to initialize
- Test the health endpoint:
curl http://localhost:5000/health
- Expected output: JSON response with status "healthy" (refer to Figure 11.7)
- Test the stocks endpoint:
curl http://localhost:5000/api/stocks
- Expected output: JSON array with stock data from TimescaleDB
Figure 11.7: API Gateway health check endpoint returning successful response
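A minimal sketch of the two endpoints exercised above is shown below; the connection details and the latest-price query are assumptions based on the schema created in Part 1, not the gateway's exact implementation.

```python
# Sketch of the API gateway's health and stocks endpoints using Flask.
import psycopg2
from flask import Flask, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # allow the UI origin to call the API

def get_conn():
    # Assumed connection parameters for the TimescaleDB service.
    return psycopg2.connect(host="timescaledb", dbname="streamflow",
                            user="postgres", password="postgres")

@app.route("/health")
def health():
    return jsonify({"status": "healthy"})

@app.route("/api/stocks")
def stocks():
    conn = get_conn()
    try:
        with conn.cursor() as cur:
            # Latest tick per symbol (PostgreSQL DISTINCT ON).
            cur.execute(
                "SELECT DISTINCT ON (symbol) symbol, price, volume, time "
                "FROM stock_ticks ORDER BY symbol, time DESC"
            )
            rows = cur.fetchall()
    finally:
        conn.close()
    return jsonify([
        {"symbol": s, "price": p, "volume": v, "time": t.isoformat()}
        for s, p, v, t in rows
    ])

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```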
Step 8: Start WebSocket Bridge Service
- Start the WebSocket bridge:
docker-compose up -d websocket-bridge
- Check WebSocket bridge logs:
docker logs kafka-sim-websocket-bridge --tail 20
- Expected output: "WebSocket server listening on port 8080" and Kafka consumer group messages (see Figure 11.8)
- Verify WebSocket server is accessible:
curl http://localhost:8080/health
Figure 11.8: WebSocket bridge service successfully initialized and listening on port 8080
Step 9: Verify Kafka Topic Message Flow
- Check Kafka topic statistics:
docker exec kafka-sim-broker kafka-topics --bootstrap-server localhost:9092 --describe --topic stock_ticks
- Verify all 3 partitions have leaders and in-sync replicas
- Check consumer group lag:
docker exec kafka-sim-broker kafka-consumer-groups --bootstrap-server localhost:9092 --describe --all-groups
- Expected output: Multiple consumer groups with low lag (< 10 messages), as shown in Figure 11.9
Figure 11.9: Kafka consumer groups showing message processing with minimal lag
Step 10: Verify End-to-End Data Flow
- Query TimescaleDB for recent data:
docker exec streamflow-timescaledb psql -U postgres -d streamflow -c "SELECT symbol, COUNT(*) as count, MIN(time) as first_record, MAX(time) as last_record FROM stock_ticks GROUP BY symbol ORDER BY symbol;"
- Expected output: All 5 symbols with growing record counts (see Figure 11.10)
- Verify data freshness (last_record should be within last 5 seconds)
- Calculate average latency from producer to database
Figure 11.10: TimescaleDB query results showing stock tick records for all symbols
Part 2 Completion Checklist:
- ✅ Producer service running and publishing stock ticks every ~1.5 seconds
- ✅ Consumer service running and displaying formatted data
- ✅ Analytics consumer calculating SMA, EMA, and trends
- ✅ Historical storage service inserting batches to TimescaleDB
- ✅ Alert service monitoring thresholds and generating alerts
- ✅ API Gateway serving RESTful endpoints (Port 5000)
- ✅ WebSocket Bridge streaming real-time data (Port 8080)
- ✅ Kafka topic processing messages across 3 partitions
- ✅ TimescaleDB accumulating stock tick records
- ✅ End-to-end latency < 2 seconds verified
12. Procedure - Part 3: Monitoring and User Interface
Part 3: Implementing Observability and Interactive Visualization
Step 1: Start Prometheus Monitoring
- Start Prometheus container:
docker-compose up -d prometheus
- Wait 10 seconds for Prometheus to initialize
- Access Prometheus web UI:
Open browser: http://localhost:9090
- Verify Prometheus is scraping targets: Navigate to Status → Targets
- Expected output: All application services listed with "UP" status (refer to Figure 12.1)
- Test a PromQL query:
up{job="kafka-producer"}
Figure 12.1: Prometheus web interface showing all targets being successfully scraped
Step 2: Start Grafana Dashboards
- Start Grafana container:
docker-compose up -d grafana
- Wait 15 seconds for Grafana to initialize
- Access Grafana web UI:
Open browser: http://localhost:3001
- Login with default credentials: admin / admin (change password when prompted)
- Add Prometheus data source (as shown in Figure 12.2):
- Navigate to Configuration → Data Sources → Add data source
- Select Prometheus
- URL: http://prometheus:9090
- Click "Save & Test"
Figure 12.2: Grafana login page and dashboard configuration interface
Step 3: Create Grafana Dashboard
- Create a new dashboard: Click "+" → Dashboard → Add new panel
- Configure panels for:
- Kafka message rate: rate(kafka_messages_total[1m])
- Database record count: timescaledb_records_total
- API request rate: rate(http_requests_total[1m])
- System CPU/Memory usage
- Save the dashboard with name "StreamFlow Analytics Overview"
Step 4: Build React UI Container
- Build the UI container (this will take 4-5 minutes):
docker-compose build --no-cache ui
- The build process includes:
- Stage 1: Node.js 18-alpine, npm ci (install dependencies), npm run build (React production build)
- Stage 2: Nginx 1.25-alpine, copy build artifacts and nginx.conf
- Verify image is created:
docker images | grep ui
Figure 12.3: React UI Docker image successfully built with multi-stage build process
Step 5: Start React UI Container
- Start the UI container:
docker-compose up -d ui
- Wait 5 seconds for Nginx to start
- Check UI container status:
docker ps | grep ui
- Expected output: Container status "Up X seconds (healthy)"
- Check UI logs for any errors:
docker logs kafka-sim-ui --tail 20
Figure 12.4: UI container logs showing Nginx successfully serving the React application
Step 6: Access and Test Web UI
- Open web browser and navigate to:
http://localhost:3000
- Verify the UI loads with:
- Header showing "StreamFlow Analytics Platform"
- Real-time stock price cards for all 5 symbols
- Historical charts section with timeframe selector (1H, 24H, 7D, 30D)
- Chart type selector (Line, Area, Bar)
- Verify real-time updates (as shown in Figure 12.5):
- Stock prices should update every ~1.5 seconds
- Price changes should show green (up) or red (down) indicators
- Volume numbers should update in real-time
Figure 12.5: StreamFlow Analytics Platform web interface showing real-time stock prices
Step 7: Test Historical Charts
- Click on different stock symbols (AAPL, MSFT, GOOGL, AMZN, TSLA)
- Switch between timeframes (1H, 24H, 7D, 30D)
- Verify charts render without flickering (useMemo optimization)
- Switch between chart types (Line, Area, Bar)
- Verify smooth transitions and no performance issues (demonstrated in Figure 12.6)
Figure 12.6: Interactive historical charts with multiple timeframes and chart types
Step 8: Test WebSocket Connection
- Open browser developer tools (F12)
- Navigate to Console tab
- Look for Socket.IO connection messages
- Expected output: "Socket.IO connected" and real-time stock tick messages (see Figure 12.7)
- Verify WebSocket connection is stable (no disconnections)
Figure 12.7: Browser console showing successful Socket.IO WebSocket connection
Step 9: Verify All Containers Running
- Check all container statuses:
docker-compose ps
- Expected output: All 13 containers showing "Up" status with "(healthy)" indicator (as shown in Figure 12.8)
- Verify port mappings:
- Zookeeper: 2181
- Kafka: 9092, 29092, 9101
- TimescaleDB: 5432
- Prometheus: 9090
- Grafana: 3001
- API Gateway: 5000
- WebSocket Bridge: 8080
- UI: 3000 (internal, accessed via Nginx)
Figure 12.8: All 13 containers running and healthy in the StreamFlow Analytics Platform
Step 10: End-to-End Latency Test
- Monitor producer logs for a specific stock tick timestamp
- Check when the same tick appears in the UI
- Calculate latency: UI display time - Producer publish time
- Expected latency: < 2 seconds (typically < 1 second)
- Verify data consistency across all services (demonstrated in Figure 12.9)
Figure 12.9: End-to-end latency verification showing sub-second data flow from producer to UI
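As a rough, scriptable approximation of this check, the sketch below measures the producer-to-consumer leg of the pipeline; it assumes the tick payload carries an epoch-seconds timestamp field, which is an illustrative assumption about the message schema.

```python
# Rough latency check: compares the tick's producer timestamp (assumed field
# "timestamp", epoch seconds) against the time the message is consumed.
import json
import time

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "stock_ticks",
    bootstrap_servers="localhost:29092",  # external listener per docker-compose.yml
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="latest",
)

for message in consumer:
    latency = time.time() - message.value["timestamp"]
    print(f"{message.value['symbol']}: producer-to-consumer latency ~ {latency:.3f}s")
```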
Part 3 Completion Checklist:
- ✅ Prometheus running and scraping all service metrics (Port 9090)
- ✅ Grafana running with Prometheus data source configured (Port 3001)
- ✅ Grafana dashboard created showing system metrics
- ✅ React UI container built and running (Port 3000)
- ✅ Web UI accessible and displaying real-time stock data
- ✅ Historical charts rendering without flickering
- ✅ WebSocket connection stable and receiving real-time updates
- ✅ All 13 containers running and healthy
- ✅ End-to-end latency < 2 seconds verified
- ✅ Complete system operational and observable
13. Modifications Done to Containers After Downloading
The following five key modifications were made to the original containers to fix critical issues and optimize performance:
Modification 1: Fixed Nginx Service Name Configuration in UI Container
Issue: The UI container was continuously restarting with a critical error because the nginx configuration file referenced an incorrect WebSocket service name "websocket-bridge-service" that did not exist in the Docker Compose network topology.
Solution: Updated the nginx configuration file (ui/nginx.conf) at lines 71 and 85 to use the correct service name "kafka-sim-websocket-bridge" as defined in docker-compose.yml. This fix enabled proper DNS resolution within the Docker network and allowed the UI container to successfully proxy WebSocket connections to the bridge service.
Impact: UI container now starts successfully without restarts, WebSocket connections are properly proxied, and API requests are correctly routed to backend services.
Modification 2: Implemented React Performance Optimization to Eliminate Chart Flickering
Issue: The historical charts component was experiencing severe flickering and constant re-rendering, causing poor user experience and excessive CPU usage (~60% higher than necessary) because the useEffect hook was regenerating chart data on every stock tick update (every 1.5 seconds).
Solution: Replaced the problematic useEffect implementation with React's useMemo hook in ui/src/components/HistoricalCharts.tsx (lines 127-139). The memoized historical data generation now only recalculates when the selected symbol or timeframe changes, not on every real-time price update. Removed state.stockData from the dependency array to prevent unnecessary recomputation.
Impact: Charts render smoothly without flickering, CPU usage reduced by approximately 60% during chart display, improved user experience when switching timeframes, and historical data regenerates only when necessary (symbol/timeframe changes).
Modification 3: Resolved Kafka Broker Startup Issues with Stale Zookeeper State
Issue: Kafka broker repeatedly failed to start with the error "org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists" because a stale ephemeral node from a previous session persisted in Zookeeper at the path /brokers/ids/1, preventing the broker from registering itself with the cluster.
Solution: Performed a complete cleanup by removing all containers AND volumes using "docker-compose down -v" followed by a fresh deployment. This approach cleared all stale Zookeeper state and ensured clean initialization of the Kafka cluster with proper broker registration and leader election.
Impact: Kafka broker starts successfully without node conflicts, clean Zookeeper state ensures proper cluster coordination, and all 3 Kafka partitions have correctly assigned leaders and in-sync replicas (ISRs).
Modification 4: Implemented Docker Build Cache Invalidation for Consistent Deployments
Issue: Docker builds were utilizing cached layers from previous builds, resulting in containers running outdated code despite source code modifications. This caching behavior caused inconsistencies between the local development environment and the actual deployed code, leading to confusing debugging sessions.
Solution: Standardized all Docker image builds to use the --no-cache flag, ensuring that every build process starts from scratch and incorporates the latest code changes. Applied this approach consistently across all custom service builds (UI, Producer, Consumer, Analytics, Storage, Alert, API Gateway, WebSocket Bridge).
Impact: All code changes are properly reflected in deployed containers, eliminated issues with stale dependencies or outdated configurations, consistent builds across different development and deployment environments, and improved debugging process by ensuring code-to-container fidelity.
Modification 5: Enhanced Error Handling and Logging Across All Python Services
Issue: Several Python-based microservices (Producer, Consumer, Analytics, Storage, Alert, API Gateway) lacked comprehensive error handling and detailed logging, making it difficult to diagnose issues in production. Silent failures in Kafka consumers and database operations resulted in data loss without clear indicators.
Solution: Implemented structured exception handling with try-catch blocks, added detailed logging statements at critical points (Kafka connections, database operations, API calls), configured proper logging levels (DEBUG, INFO, ERROR) with timestamps and contextual information, and added graceful shutdown handlers for all services to ensure proper resource cleanup.
Impact: Improved system observability with detailed logs for troubleshooting, early detection of issues before they cascade into system failures, graceful degradation when individual services encounter errors, and enhanced production readiness with better monitoring and alerting capabilities.
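A representative pattern for the structured logging, exception handling, and graceful shutdown described above is sketched below; it is not the exact service code, only the general shape applied across the Python services.

```python
# Sketch of the error-handling/logging/shutdown pattern used by the services.
import logging
import signal

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)
log = logging.getLogger("streamflow.service")

running = True

def handle_shutdown(signum, frame):
    """Flip the run flag so the main loop can exit and clean up resources."""
    global running
    log.info("Received signal %s, shutting down gracefully", signum)
    running = False

signal.signal(signal.SIGTERM, handle_shutdown)
signal.signal(signal.SIGINT, handle_shutdown)

def process(tick: dict) -> None:
    ...  # service-specific work (publish, store, alert, etc.)

def main(messages) -> None:
    """messages: any iterable of Kafka records (e.g. a KafkaConsumer)."""
    for message in messages:
        if not running:
            break
        try:
            process(message.value)
        except Exception:
            # Log with traceback but keep consuming instead of failing silently.
            log.exception("Failed to process message; skipping and continuing")
    log.info("Shutdown complete")
```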
14. GitHub/Docker Hub Links of Modified Containers
The modified containers and source code are available at the following locations:
Docker Hub Links (Pull Commands)
# Producer Service
docker pull rohithgk1809/streamflow-producer
# Consumer Service
docker pull rohithgk1809/streamflow-consumer
# Analytics Consumer Service
docker pull rohithgk1809/streamflow-analytics:v1.0.0
# Historical Storage Service
docker pull rohithgk1809/streamflow-storage:v1.0.0
# Alert Service
docker pull rohithgk1809/streamflow-alerts:v1.0.0
# API Gateway Service
docker pull rohithgk1809/streamflow-api:v1.0.0
# WebSocket Bridge Service
docker pull rohithgk1809/streamflow-websocket-bridge:v1.0.0
# React UI Service
docker pull rohithgk1809/streamflow-ui
Container Details and Links
| Container Name | Docker Hub Repository | GitHub Source | Modifications |
|---|---|---|---|
| kafka-sim-ui | streamflow-ui | View Source | Nginx config fix + React performance optimization |
| kafka-sim-producer | streamflow-producer | View Source | Enhanced error handling and logging |
| kafka-sim-consumer | streamflow-consumer | View Source | Improved error handling |
| kafka-sim-websocket-bridge | streamflow-websocket-bridge:v1.0.0 | View Source | Connection stability improvements |
| streamflow-analytics-consumer | streamflow-analytics:v1.0.0 | View Source | Enhanced analytics calculations |
| streamflow-historical-storage | streamflow-storage:v1.0.0 | View Source | Database connection pooling |
| streamflow-alert-service | streamflow-alerts:v1.0.0 | View Source | Configurable alert thresholds |
| streamflow-api-gateway | streamflow-api:v1.0.0 | View Source | CORS configuration and endpoint optimization |
Main Project Repository: https://github.com/rZk1809/StreamFlow-Analytics-Platform
Docker Compose File: docker-compose.yml
Documentation: README.md
15. Outcomes of the Project
Quantitative Outcomes:
- 13 Containers: successfully deployed and running in a healthy state
- 1,743+ Records: stock ticks stored in TimescaleDB, with continuous growth
- < 1 Second: end-to-end latency from producer to UI display
- 5 Stock Symbols: processed in real time (AAPL, MSFT, GOOGL, AMZN, TSLA)
- 3 Kafka Partitions: distributed message processing with proper load balancing
- ~1.5 Seconds: producer interval for stock tick generation
- 15 Seconds: Prometheus scrape interval for metrics collection
- 7 Microservices: custom-built application services in Python and Node.js
- 6 Docker Volumes: persistent storage for stateful services
- 8 Exposed Ports: service endpoints accessible for monitoring and interaction
- 100% Uptime: all services running without crashes after the fixes were applied
- 60% CPU Reduction: achieved through the React useMemo optimization
Qualitative Outcomes:
- Distributed Systems Understanding: Gained hands-on experience with Apache Kafka's publish-subscribe model, partition-based message distribution, and consumer group coordination.
- Microservices Architecture: Successfully implemented loosely coupled services with single responsibilities, demonstrating scalability and maintainability principles.
- Containerization Expertise: Mastered Docker containerization, multi-stage builds, Docker Compose orchestration, networking, and volume management.
- Time-Series Data Management: Learned TimescaleDB hypertable concepts, efficient time-based queries, and batch insertion strategies for high-throughput scenarios.
- Real-Time Web Technologies: Implemented WebSocket-based real-time communication using Socket.IO, bridging Kafka messages to browser clients.
- Modern Frontend Development: Built a responsive React application with Material-UI components, Recharts visualizations, and performance optimizations.
- Observability Practices: Configured Prometheus metrics collection and Grafana dashboards for comprehensive system monitoring and alerting.
- Problem-Solving Skills: Debugged and resolved complex issues including nginx configuration errors, React performance problems, and Kafka/Zookeeper state conflicts.
- API Design: Created RESTful endpoints with Flask, implementing proper CORS handling and health check patterns.
- Multi-Language Development: Worked with Python, Node.js, TypeScript, and SQL in a single integrated system.
- DevOps Practices: Implemented infrastructure-as-code with Docker Compose, automated builds, and container health checks.
- Performance Optimization: Applied React hooks (useMemo, useCallback) to prevent unnecessary re-renders and improve UI responsiveness.
Technical Skills Acquired:
| Category | Skills Learned | Proficiency Level |
|---|---|---|
| Message Brokers | Apache Kafka, Zookeeper, Topics, Partitions, Consumer Groups | Intermediate |
| Databases | PostgreSQL, TimescaleDB, Hypertables, Time-series queries | Intermediate |
| Containerization | Docker, Docker Compose, Multi-stage builds, Networking, Volumes | Advanced |
| Backend Development | Python, Flask, kafka-python, psycopg2, Node.js, Express | Intermediate |
| Frontend Development | React, TypeScript, Material-UI, Recharts, Socket.IO Client | Intermediate |
| Monitoring | Prometheus, PromQL, Grafana, Metrics collection, Dashboards | Beginner-Intermediate |
| Web Servers | Nginx, Reverse proxy, WebSocket proxying, Static file serving | Intermediate |
| Real-Time Communication | WebSockets, Socket.IO, Event-driven architecture | Intermediate |
Project Deliverables:
- ✅ Fully functional real-time stock market data processing platform
- ✅ 13 containerized services orchestrated with Docker Compose
- ✅ Interactive web dashboard with real-time updates and historical charts
- ✅ Comprehensive monitoring setup with Prometheus and Grafana
- ✅ RESTful API for programmatic access to stock data
- ✅ Technical documentation covering architecture, setup, and troubleshooting
- ✅ Source code repository with version control
- ✅ Docker images published to Docker Hub
- ✅ Performance benchmarks and latency measurements
- ✅ This comprehensive project report
16. References
Official Documentation and Tutorials:
- Docker Inc. (2024). Docker Documentation. Retrieved from https://docs.docker.com/
- IIT Bombay Department of Computer Science and Engineering. (2024). Docker Tutorial. Retrieved from https://spoken-tutorial.org/tutorial-search/?search_foss=Docker&search_language=English
- Apache Software Foundation. (2024). Apache Kafka Documentation. Retrieved from https://kafka.apache.org/documentation/
- Confluent Inc. (2024). Confluent Platform Documentation. Retrieved from https://docs.confluent.io/
- Timescale Inc. (2024). TimescaleDB Documentation. Retrieved from https://docs.timescale.com/
- Meta Platforms Inc. (2024). React Documentation. Retrieved from https://react.dev/
- MUI Team. (2024). Material-UI Documentation. Retrieved from https://mui.com/
- Prometheus Authors. (2024). Prometheus Documentation. Retrieved from https://prometheus.io/docs/
- Grafana Labs. (2024). Grafana Documentation. Retrieved from https://grafana.com/docs/
- Pallets Projects. (2024). Flask Documentation. Retrieved from https://flask.palletsprojects.com/
- Socket.IO. (2024). Socket.IO Documentation. Retrieved from https://socket.io/docs/
Books and Academic Resources:
- Narkhede, N., Shapira, G., & Palino, T. (2017). Kafka: The Definitive Guide. O'Reilly Media.
- Poulton, N. (2023). Docker Deep Dive. Nigel Poulton.
- Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly Media.
- Newman, S. (2021). Building Microservices (2nd ed.). O'Reilly Media.
Docker Hub Container Images:
- Confluent Inc. (2024). confluentinc/cp-kafka. Docker Hub. Retrieved from https://hub.docker.com/r/confluentinc/cp-kafka
- Confluent Inc. (2024). confluentinc/cp-zookeeper. Docker Hub. Retrieved from https://hub.docker.com/r/confluentinc/cp-zookeeper
- Timescale Inc. (2024). timescale/timescaledb. Docker Hub. Retrieved from https://hub.docker.com/r/timescale/timescaledb
- Prometheus Authors. (2024). prom/prometheus. Docker Hub. Retrieved from https://hub.docker.com/r/prom/prometheus
- Grafana Labs. (2024). grafana/grafana. Docker Hub. Retrieved from https://hub.docker.com/r/grafana/grafana
Online Learning Resources:
- Stack Overflow. (2024). Community Q&A Platform. Retrieved from https://stackoverflow.com/
- Medium. (2024). Various articles on Apache Kafka, Docker, and React performance optimization.
- YouTube Channels: TechWorld with Nana, freeCodeCamp, Confluent, Tech Primers - Docker and Kafka Tutorials.
17. Acknowledgements
I would like to express my sincere gratitude to Vellore Institute of Technology (VIT) and the School of Computer Science and Engineering (SCOPE) for providing me with the opportunity to work on this comprehensive and challenging project. The Cloud Computing course, taught by our respected faculty Dr. Subbulakshmi T, has been instrumental in developing my understanding of distributed systems, containerization technologies, and modern software architecture patterns. I am deeply grateful to my course instructor for her guidance, valuable feedback, and continuous encouragement throughout the three phases of this project.
Special thanks to the Department of Computer Science and Engineering at IIT Bombay for their excellent Docker tutorial, which served as a foundational resource for understanding containerization concepts and best practices. The comprehensive documentation and practical examples provided by the open-source communities behind Docker, Apache Kafka, TimescaleDB, React, Prometheus, and Grafana have been invaluable learning resources. I am particularly grateful to Confluent Inc. for their extensive Kafka documentation and tutorials that helped me understand the intricacies of distributed message brokers and stream processing.
I would like to acknowledge the support of my family and friends who encouraged me throughout this project, especially during challenging debugging sessions. Their patience and understanding when I spent long hours working on resolving issues like the nginx configuration error, React performance optimization, and Kafka-Zookeeper state conflicts were truly appreciated. I am also thankful to my classmates and peers in the Cloud Computing course for collaborative learning sessions, knowledge sharing, and constructive discussions about microservices architecture and distributed systems design patterns.
My gratitude extends to the vibrant online developer communities including Stack Overflow, Reddit communities (r/docker, r/kafka, r/reactjs), and various Discord servers focused on DevOps and cloud-native technologies. The collective wisdom shared by thousands of developers on these platforms helped me troubleshoot issues, understand best practices, and discover elegant solutions to complex problems. I am deeply appreciative of all the open-source contributors who dedicate their time and expertise to building and maintaining the technologies that powered this project - from the core infrastructure components to the frontend libraries and monitoring tools.
Finally, I would like to acknowledge Docker Inc., Meta Platforms (for React), Pallets Projects (for Flask), the Node.js Foundation, Apache Software Foundation, Timescale Inc., Prometheus Authors, and Grafana Labs for creating and maintaining the exceptional open-source technologies that made this project possible. The availability of high-quality, well-documented, and freely accessible tools has democratized access to enterprise-grade software development capabilities, enabling students like me to build production-grade distributed systems and gain practical experience with industry-standard technologies. This project has been a transformative learning experience, and I am grateful to everyone who contributed to making it possible.
18. Conclusion
The StreamFlow Analytics Platform project has been a comprehensive learning journey through modern distributed systems, containerization, and real-time data processing technologies. Over the course of three project phases (Part 1, Part 2, Part 3), we successfully built a production-grade microservices architecture that processes stock market data in real-time, demonstrating industry-standard practices in cloud-native application development.
Starting with Part 1 (as illustrated in Figure 8.1), we established a solid infrastructure foundation by deploying Apache Kafka for distributed messaging, Zookeeper for cluster coordination, and TimescaleDB for time-series data persistence. This layer provided the critical services needed for reliable, scalable data processing. We learned the importance of proper container networking, volume management for data persistence, and health checks for service reliability. The infrastructure layer taught us that distributed systems require careful coordination and state management, as evidenced by the Zookeeper node conflict we resolved through complete cleanup of stale state.
In Part 2 (depicted in Figure 8.2), we developed seven custom microservices that implement the core business logic of the platform. The Producer service generates realistic stock tick data, while multiple consumers process this data in parallel for different purposes - display, analytics, storage, alerting, and API serving. This phase reinforced the principles of loose coupling, single responsibility, and asynchronous communication through message queues. We gained practical experience with Python's kafka-python library, Flask for REST APIs, Node.js for WebSocket bridging, and PostgreSQL for data persistence. The challenge of maintaining low latency (<2 seconds end-to-end) taught us the importance of efficient batch processing and proper consumer group configuration.
Part 3 (shown in Figure 8.3) focused on observability and user experience, implementing Prometheus for metrics collection, Grafana for visualization, and a React-based web UI for real-time stock monitoring. This phase highlighted the critical importance of monitoring in production systems - without Prometheus and Grafana, we would have no visibility into system health, message throughput, or performance bottlenecks. The React UI development taught us modern frontend practices including component composition, state management with hooks, performance optimization with useMemo, and real-time updates with Socket.IO. The nginx configuration fix and React flickering resolution demonstrated the value of systematic debugging and understanding the root causes of issues rather than applying superficial fixes.
Throughout this project, we encountered and resolved several significant challenges: nginx service name mismatches causing container restarts, React performance issues due to unnecessary re-renders, Kafka broker startup failures from stale Zookeeper state, Docker build cache inconsistencies, and the need for enhanced error handling and logging across all services. Each challenge provided valuable learning opportunities and reinforced the importance of reading error messages carefully, understanding system architecture, and applying systematic troubleshooting approaches. The five key modifications we made to fix these issues (detailed in Section 13) are now documented and can benefit future developers working on similar systems.
The quantitative outcomes speak to the project's success: 13 healthy containers processing data with sub-second latency, 1,743+ records stored and growing, 60% CPU reduction from performance optimizations, and 100% uptime after fixes. More importantly, the qualitative outcomes demonstrate deep learning in distributed systems, microservices architecture, containerization, real-time web technologies, and DevOps practices. We now have practical experience with technologies that are widely used in industry - Kafka for event streaming, Docker for containerization, React for modern UIs, Prometheus/Grafana for observability, and TimescaleDB for time-series data.
This project has prepared us for real-world software engineering challenges in cloud-native environments. We understand how to design scalable systems, implement fault-tolerant architectures, monitor production applications, optimize performance, and debug complex distributed systems. The skills acquired - from writing Dockerfiles and docker-compose.yml to implementing React components and Prometheus metrics - are directly applicable to industry roles in backend development, DevOps, site reliability engineering, and full-stack development.
The complete end-to-end system architecture (Figure 8.4) integrates all three layers seamlessly, demonstrating how infrastructure components, application services, and monitoring/UI layers work together to create a functional, observable, and maintainable distributed system. The StreamFlow Analytics Platform successfully demonstrates a complete, end-to-end real-time data processing system built with modern technologies and best practices. The three-part project structure (infrastructure, application services, monitoring/UI) provided a logical progression that built knowledge incrementally and allowed for systematic development and debugging.
The project is not just a learning exercise but a functional platform that could be extended with additional features like machine learning predictions, advanced alerting rules, multi-region deployment, or integration with real stock market APIs. The modular microservices architecture makes such extensions straightforward - new services can be added without modifying existing components, demonstrating the power of well-designed distributed systems. We are grateful for the learning opportunity and the hands-on experience with technologies that power modern distributed systems. This project has significantly enhanced our understanding of cloud computing, containerization, distributed systems, and full-stack development, preparing us for future challenges in the rapidly evolving field of software engineering.