About the Role
We're an online gaming company using AI to power and personalize player experiences. This role sits within the AI Engineering team, which is responsible for taking AI capabilities into production. This role focuses primarily on agent systems, with model deployment and inference engineering as a secondary responsibility.
We do not train foundation models from scratch. Our focus is on production AI systems, model adaptation, inference optimization, and agentic applications.
Responsibilities
Agent Systems — Primary
- Design, build, and optimize LLM-powered agents, including planning, tool use, workflow orchestration, and multi-step reasoning
- Architect memory systems, including short-term memory, long-term memory, context management, and session state
- Build and optimize RAG pipelines for relevance, grounding, freshness, and retrieval quality
- Design and operate vector-store infrastructure (e.g., pgvector, Milvus, Qdrant, Weaviate)
- Define evaluation methodologies for agents, prompts, and workflows
- Optimize end-to-end agent quality, latency, reliability, and operating cost
Model Deployment & Inference — Secondary
- Build and operate production inference services that are low-latency, high-concurrency, and highly reliable
- Serve online-learning models (e.g., contextual bandits and reinforcement learning policies) with real-time inference and online parameter or weight updates
- Deploy and optimize AI inference systems for latency, throughput, reliability, and resource efficiency
- Analyze and resolve inference-serving bottlenecks
- Support deployment and serving of recommendation, ranking, and reinforcement learning models developed by research scientists
- Apply lightweight model adaptation techniques (e.g., LoRA, QLoRA, PEFT) when appropriate for domain-specific requirements
MLOps — Supporting Both
- Build and maintain deployment pipelines, observability systems, and tracing infrastructure for agents and serving endpoints
- Monitor for quality regressions, performance degradation, and model drift
- Maintain version control for models, prompts, datasets, and agent configurations
- Contribute to automated validation, testing, and CI/CD workflows for AI systems
Collaboration
- Partner with research scientists, backend engineers, and data scientists to integrate AI systems into production products
- Document systems, best practices, and internal tooling
- Contribute to engineering standards and operational excellence across AI initiatives
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field
- 3+ years of industry experience in Machine Learning Engineering or related roles
- Strong software and systems engineering experience, including building low-latency, reliable production services in languages such as Go, Rust, C++, or equivalent
- Experience building or supporting real-time inference systems for recommendation, ranking, contextual bandits, reinforcement learning, or similar adaptive machine learning applications
- Strong experience with PyTorch and the Hugging Face ecosystem
- Experience building production LLM or agent applications (e.g., LangGraph, LlamaIndex, or equivalent frameworks)
- Hands-on experience with RAG systems, embeddings, and vector databases
- Experience evaluating and monitoring LLM or agent systems in production
- Experience deploying and optimizing production machine learning or LLM systems
- Understanding of inference runtime behavior, resource utilization, latency optimization, and production serving performance
- Experience with Docker and Kubernetes
- Experience with cloud platforms such as AWS, GCP, or Azure
- Fluency in Mandarin Chinese
Preferred / Nice to Have
- Experience fine-tuning open-weight LLMs using LoRA, QLoRA, PEFT, or related approaches
- Familiarity with the underlying algorithms used in recommender systems, ranking systems, contextual bandits, or reinforcement learning
- Experience with custom GPU kernel development using CUDA or OpenAI Triton
- Experience with graph-level optimization and low-level inference performance tuning
- Experience with large-scale distributed training (e.g., FSDP, DeepSpeed, multi-GPU workloads)
- Experience deploying models to edge environments using TFLite, CoreML, or NPU accelerators
- Strong understanding of CI/CD principles and deployment workflows
- Background in gaming, gaming AI, or player personalization systems
- Experience with distributed systems, Spark, Hadoop, or large-scale data infrastructure
Pay: From $130,000.00 per year
Benefits:
- 401(k)
- 401(k) matching
- Dental insurance
- Health insurance
- Life insurance
- Paid time off
- Parental leave
- Retirement plan
- Vision insurance
Ability to Commute:
- Irvine, CA 92618 (Required)
Ability to Relocate:
- Irvine, CA 92618: Relocate before starting work (Required)
Work Location: In person