Designing and deploying AI architectures that scale — from model training pipelines and inference infrastructure to agentic systems and LLM integrations.
I am an AI Engineer and Systems Architect focused on building production-grade infrastructure and machine learning systems. I bring innovation to traditional DevOps and Infrastructure Engineering by leveraging AI solutions and architectures that have proven reliable.
My work sits at the intersection of large language models, agentic systems, and distributed computing. I design pipelines that are fast, reliable, and built to evolve with the technology.
The AI systems I build are not just technically impressive; they are thoughtfully architected, maintainable, and grounded in real engineering principles.
Transformer design, attention mechanisms, context window engineering, and KV-cache optimisation for high-throughput inference.
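The core idea behind KV-cache optimisation can be shown in a few lines: cache each past token's key and value vectors so every decode step attends the new query over the cache instead of recomputing the full sequence. A minimal pure-Python sketch (the `KVCache` class and its two-dimensional toy vectors are illustrative, not any serving framework's API):

```python
import math

class KVCache:
    """Toy per-layer key/value cache for autoregressive decoding.

    Stores past keys and values so each decode step only computes
    attention for the newest query against the cached entries.
    """

    def __init__(self):
        self.keys = []    # one key vector per past token
        self.values = []  # one value vector per past token

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q):
        # Scaled dot-product attention of the new query over cached keys.
        d = len(q)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in self.keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of cached value vectors.
        return [sum(w * v[i] for w, v in zip(weights, self.values))
                for i in range(len(self.values[0]))]

cache = KVCache()
cache.append([1.0, 0.0], [1.0, 2.0])  # token 1
cache.append([0.0, 1.0], [3.0, 4.0])  # token 2
out = cache.attend([1.0, 1.0])        # decode step for token 3
```

Real engines add paging, eviction, and batched tensor layouts on top of this, but the memory/compute trade is the same: O(1) new-token work per step at the cost of storing every past key and value.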
End-to-end model training, distributed data parallelism, gradient checkpointing, and mixed-precision fine-tuning at scale.
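Mixed-precision training hinges on dynamic loss scaling: scale gradients up so small fp16 values don't underflow, unscale before the optimiser step, and back off on overflow. A minimal sketch of that bookkeeping (the `LossScaler` class, its growth interval, and the halve-on-overflow policy are illustrative choices, not any particular framework's API):

```python
import math

class LossScaler:
    """Toy dynamic loss scaler for mixed-precision training: skip the
    optimiser step and shrink the scale when gradients overflow, and
    cautiously grow the scale back after a run of clean steps."""

    def __init__(self, init_scale=2.0 ** 16, growth_interval=3):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def step(self, scaled_grads, apply_update):
        # Unscale gradients; inf/nan signals an fp16 overflow upstream.
        unscaled = [g / self.scale for g in scaled_grads]
        if any(math.isinf(g) or math.isnan(g) for g in unscaled):
            self.scale /= 2.0          # back off and skip this step
            self._good_steps = 0
            return False
        apply_update(unscaled)         # hand clean fp32 grads to the optimiser
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= 2.0          # grow the scale back over time
        return True

scaler = LossScaler(init_scale=4.0, growth_interval=2)
applied = []
ok = scaler.step([8.0, 4.0], applied.extend)       # clean step
skipped = scaler.step([float("inf")], applied.extend)  # overflow step
```

Framework scalers (e.g. PyTorch's `GradScaler`) follow the same skip/shrink/grow pattern, just fused with the autograd machinery.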
Vector store design, embedding pipelines, hybrid search, re-ranking, and context augmentation for grounded generation.
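One common way to combine lexical and vector rankings in hybrid search is Reciprocal Rank Fusion. A minimal sketch under the usual RRF formulation (the doc ids and `k=60` constant are illustrative; production systems typically re-rank the fused top-k with a cross-encoder afterwards):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ordered rankings of doc ids with Reciprocal Rank
    Fusion: each document scores sum(1 / (k + rank)) across the lists,
    so items ranked highly by either retriever float to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["d1", "d2", "d3"]   # e.g. BM25 order
vector  = ["d3", "d1", "d4"]   # e.g. cosine-similarity order
fused = reciprocal_rank_fusion([lexical, vector])
```

RRF needs no score calibration between the two retrievers, which is why it is a common default before the heavier re-ranking stage.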
Multi-agent orchestration, tool use, memory systems, and autonomous task planning using LangChain, LlamaIndex, and custom frameworks.
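At the bottom of every tool-using agent sits a dispatch step: parse the model's structured action, call the named tool, and return the observation for the next context. A minimal sketch (the `tool` registry decorator, the JSON action shape, and the `add` tool are all illustrative, not a specific framework's protocol):

```python
import json

TOOLS = {}

def tool(fn):
    """Register a function so the agent loop can dispatch to it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a, b):
    """A trivial example tool the model can invoke."""
    return a + b

def run_tool_call(message):
    """Dispatch one model-emitted action like
    {"tool": "add", "args": {"a": 2, "b": 3}} and return the
    observation string that gets appended to the agent's context."""
    call = json.loads(message)
    result = TOOLS[call["tool"]](**call["args"])
    return json.dumps({"tool": call["tool"], "result": result})

obs = run_tool_call('{"tool": "add", "args": {"a": 2, "b": 3}}')
```

Frameworks such as LangChain wrap this loop with schema validation, retries, and memory, but the plan–act–observe cycle reduces to this dispatch.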
Model serving with vLLM and TensorRT-LLM, containerised deployments, CI/CD for ML, and GPU cluster management on AWS/GCP.
Pinecone, Weaviate, Qdrant, and pgvector — indexing strategies, ANN search, and embedding dimension reduction for production retrieval.
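Embedding dimension reduction for retrieval often starts with a random projection: multiply each embedding by a fixed random Gaussian matrix, which approximately preserves pairwise distances at a fraction of the storage. A minimal sketch (the function name, seed, and toy dimensions are illustrative assumptions):

```python
import math
import random

def random_projection(vectors, out_dim, seed=0):
    """Project embeddings through a fixed random Gaussian matrix
    (a Johnson-Lindenstrauss-style reduction), scaled by 1/sqrt(out_dim)
    to roughly preserve vector norms."""
    rng = random.Random(seed)
    in_dim = len(vectors[0])
    # One random direction per output dimension, fixed for the whole index.
    proj = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
            for _ in range(out_dim)]
    scale = 1.0 / math.sqrt(out_dim)
    return [[scale * sum(p * x for p, x in zip(row, vec)) for row in proj]
            for vec in vectors]

reduced = random_projection([[1.0] * 64, [0.0] * 64], out_dim=8)
```

In practice a learned reduction (PCA, or Matryoshka-style truncation when the embedding model supports it) usually beats a random one, but the random projection is the zero-training baseline to compare against.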
Python for ML pipelines, Rust for performance-critical inference kernels and low-latency data processing layers.
Instruction tuning, DPO, PPO-based RLHF, LoRA and QLoRA adapters for efficient domain-specific model customisation.
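The LoRA forward pass is small enough to sketch directly: the frozen weight `W` is augmented with a trainable low-rank update `B @ A` of rank `r`, scaled by `alpha / r`, so only `r * (d_in + d_out)` adapter parameters are trained instead of `d_in * d_out`. A pure-Python sketch with toy matrices (helper names and the tiny shapes are illustrative):

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def lora_forward(x, W, A, B, alpha=16, r=2):
    """Compute W x + (alpha / r) * B (A x): the frozen base path plus
    the scaled low-rank adapter path. Only A (r x d_in) and B
    (d_out x r) would receive gradients during fine-tuning."""
    base = matvec(W, x)              # frozen path: W x
    delta = matvec(B, matvec(A, x))  # low-rank path: B (A x)
    scaling = alpha / r
    return [b + scaling * d for b, d in zip(base, delta)]

x = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0]]          # frozen base weight (identity here)
A = [[1.0, 0.0], [0.0, 1.0]]          # down-projection, rank 2
B = [[0.5, 0.0], [0.0, 0.5]]          # up-projection
out = lora_forward(x, W, A, B)
```

QLoRA keeps the same adapter maths but stores the frozen `W` in 4-bit quantised form, which is what makes single-GPU fine-tuning of large models practical.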
Every system I build is designed for reliability at scale — from ingestion and embedding through retrieval, generation, and evaluation loops that close the feedback cycle.