Marco Chen

CS @ UWaterloo, Software Engineer

enjoy working with ML Infra, distributed systems, full-stack.

experience

Software Engineer Intern

Geotab, May 2026 - Aug 2026

— Built distributed systems and development tooling for a fleet management platform.

Game Engine Development Apprentice (AIGC)

Tencent Games, Jan 2026 - May 2026

— Designed a RAG-based persistent memory system and a hierarchical memory retrieval strategy.

Agentic AI Trainer

Shipd, Dec 2025 - Apr 2026

— Reviewed and curated AI training datasets for LLM coding agents.

Software Engineer Intern

Geotab, Sep 2025 - Dec 2025

— Built end-to-end compliance tests in C#; maintained and optimized the Geotab Drive App.

Software Engineer Intern

Octopodi Technologies, Jan 2025 - Apr 2025

— Built the frontend of a cross-platform desktop application in Next.js, React, and Tauri.

projects

LLM Gateway

High-performance API gateway for large language models with dynamic batching, request prioritization, and real-time monitoring.

Go, gRPC, Redis, Kafka, PostgreSQL/pgvector, ClickHouse, Docker/Kubernetes, Prometheus/Grafana

Nano-vLLM

A minimal implementation of a transformer-based LLM inference engine optimized for low latency and high throughput on GPU, with smart memory management.

Python, CUDA, Multi-Head Attention, KV Cache, Tensor Parallelism, Prefix Cache

High Concurrency Cache System

In-memory cache system optimized for high-concurrency workloads with sharding and lock-free data structures.

C++, Concurrency, LRU/LFU/ARC

SnapNote

AI-powered note-taking platform that turns handwritten notes into organized digital documents with RAG semantic search and agentic workflows.

Python, TypeScript, OCR, RAG, LangGraph, PostgreSQL/pgvector, React

WLP4 Compiler

A compiler for the WLP4 language (a simplified C) that generates MIPS assembly code, with the scanner, parser, semantic analyzer, and code generator implemented from scratch in C++.

C++, Compiler, MIPS

Biquadris

A multiplayer Tetris video game implemented in C++. Supports both CLI and GUI, manages resources with RAII, and applies software design patterns.

C++, RAII

now

— learning llm inference engines (building a nano vllm from scratch)
— exploring CUDA kernels and GPU programming
— reading Designing Data-Intensive Applications

updated apr 2026