Basic Qualifications
- Ph.D. plus at least 6 years of experience in Applied Research or M.S. plus at least 8 years of experience in Applied Research
- At least 5 years of people leadership experience
Preferred Qualifications
- PhD in Computer Science, Machine Learning, Computer Engineering, Applied Mathematics, Electrical Engineering, or related fields
- LLM
  - PhD with a focus on NLP, or a Master's degree with 10 years of industrial NLP research experience
  - Core contributor to a team that has trained a large language model from scratch (10B+ parameters, 500B+ tokens)
  - Numerous publications at ACL, NAACL, EMNLP, NeurIPS, ICML, or ICLR on topics related to the pre-training of large language models (e.g., technical reports of pre-trained LLMs, self-supervised learning (SSL) techniques, model pre-training optimization)
  - Has worked on an LLM (open source or commercial) that is currently available for use
  - Demonstrated ability to guide the technical direction of a large-scale model training team
  - Experience working with 500+ node GPU clusters
  - Has worked on an LLM scaled to 70B parameters and 1T+ tokens
  - Experience with common training optimization frameworks (DeepSpeed, NeMo)
- Behavioral Models
  - PhD with a focus on topics in geometric deep learning (graph neural networks, sequential models, multivariate time series)
  - Member of the technical leadership for the deployment of a very large user behavior model
  - Multiple papers at KDD, ICML, NeurIPS, or ICLR on topics relevant to training models on graph and sequential data structures
  - Worked on scaling graph models to more than 50M nodes
  - Experience with large-scale, deep-learning-based recommender systems
  - Experience with real-time and streaming production environments
  - Contributions to common open source frameworks (PyTorch Geometric, DGL)
  - Proposed new methods for inference or representation learning on graphs or sequences
  - Worked with datasets of 100M+ users
- Optimization (Training & Inference)
  - PhD focused on topics related to optimizing the training of very large language models
  - 5+ years of experience and/or publications on one of the following topics: model sparsification, quantization, training parallelism/partitioning design, gradient checkpointing, model compression
- Fine-Tuning
  - PhD focused on topics related to adapting LLMs to downstream tasks (supervised fine-tuning, instruction tuning, dialogue fine-tuning, parameter tuning)
  - Demonstrated knowledge of the principles of transfer learning, model adaptation, and model guidance
  - Experience deploying a fine-tuned large language model
- Data Preparation
  - Numerous publications studying tokenization, data quality, dataset curation, or labeling
  - Leading contributions to one or more large open source corpora (1T+ tokens)
  - Core contributor to open source libraries for data quality, dataset curation, or labeling
Capital One will consider sponsoring a new qualified applicant for employment authorization for this position.