
Skills

  • AI Development
  • Artificial Intelligence
  • Data Management
  • Data Modeling
  • DevOps
  • Machine Learning
  • Mathematics
  • SAS
  • Software Development

Services

  • DevOps SRE AI Development

    $10/hr · Starting at $100 · Ongoing

    Dedicated Resource

    I work at the intersection of DevOps, Site Reliability Engineering, and AI development, building systems that are designed to survive real-world pressure, not slide decks. My focus is on creating resilient,...

    AI Development, Artificial Intelligence, Data Management, Data Modeling, DevOps

About

I’m an AI Systems Architect and Senior Platform Engineer who enjoys tackling the kind of distributed systems problems most people run away from. With 24 years of experience across AI/ML engineering, cloud architecture, operating systems, SRE, and backend development, I’ve spent my career building platforms that need to be fast, predictable, scalable, and impossible to kill.

My background blends deep engineering with hands-on coding: Rust, Go, Python, PyTorch, CUDA, Kubernetes internals, compiler work, and OS-level performance tuning. I move comfortably between designing LLM platforms at scale and writing the low-level code that actually makes them efficient.

I completed my MEng at MIT, where I authored a 2018 research paper that later inspired foundational techniques used in the Gemini 3 family. Proud of that? Absolutely. But I stay grounded — in the end, the work speaks louder than the resume.

Over the years, I’ve built everything from internal orchestrators at Globo.com (pre-Kubernetes era) to modern multi-tenant AI platforms with vLLM, TensorRT-LLM, ONNX, DeepSpeed, and GPU-aware autoscaling. I’ve engineered predictive incident systems using Active Learning, anomaly detection, and telemetry embeddings that identify failures before traditional SRE signals even twitch. I’ve built a custom edge-optimized micro-OS to run inference close to users, integrated into a global traffic layer that balanced workloads across 280 CDN POPs.

I also extend Kubernetes with custom operators, controllers, and scheduling logic. I’ve written a Rust compiler plugin that optimizes runtime binaries using SIMD, constant folding, and ML-guided heuristics, squeezing real performance out of machines that should’ve died two cycles ago.

Where I shine is at the intersection of AI infrastructure, distributed systems, and platform engineering:
• LLM inference optimization (KV-cache, quantization, batching, speculative decoding)
• High-throughput data & event platforms (Flink, Debezium, Kafka, Iceberg, Redis/Dragonfly)
• Cloud-native architecture across AWS, GCP, Azure, OCI
• GitOps, SRE, reliability automation, chaos engineering
• Multi-tenant isolation, security, governance, compliance
• End-to-end ML/LLM lifecycle, evaluation, and observability

I lead teams with humor, clarity, and zero ego, and I’m known for keeping even the most complex systems understandable. My philosophy is simple: engineer things so cleanly that people forget how complicated they really are.
