Marco Chen
CS @ UWaterloo, Software Engineer
enjoy working on ML infra, distributed systems, and full-stack.
experience
— Built distributed systems and development tooling for a fleet management platform.
— Designed a RAG persistent memory system and a hierarchical memory retrieval strategy.
— Reviewed and curated AI training datasets for LLM Coding Agents.
— Built end-to-end compliance tests in C#; maintained and optimized the Geotab Drive App.
— Built the frontend of a cross-platform desktop application in Next.js, React, and Tauri.
projects
LLM Gateway
High-performance API gateway for large language models with dynamic batching, request prioritization, and real-time monitoring.
Go, gRPC, Redis, Kafka, PostgreSQL/pgvector, ClickHouse, Docker/Kubernetes, Prometheus/Grafana
Nano-vLLM
A minimal implementation of a transformer-based LLM inference engine, optimized for low latency and high throughput on GPU with smart memory management.
Python, CUDA, Multi-Head Attention, KV Cache, Tensor Parallelism, Prefix Cache
High Concurrency Cache System
In-memory cache system optimized for high-concurrency workloads, with sharding and lock-free data structures.
C++, Concurrency, LRU/LFU/ARC
SnapNote
AI-powered note-taking platform that turns handwritten notes into organized digital documents with RAG semantic search and agentic workflows.
Python, TypeScript, OCR, RAG, LangGraph, PostgreSQL/pgvector, React
WLP4 Compiler
A compiler for the WLP4 language (Simplified C) that generates MIPS assembly code, with the scanner, parser, semantic analyzer, and code generator implemented from scratch in C++.
C++, Compiler, MIPS
Biquadris
A multiplayer Tetris video game implemented in C++. Supports both CLI and GUI, manages resources with RAII, and applies software design patterns.
C++, RAII
now
— learning llm inference engines (building a nano vllm from scratch)
— exploring CUDA kernels and GPU programming
— reading Designing Data-Intensive Applications
updated apr 2026