Roles and responsibilities
As a Machine Learning Intern, you will participate in exciting projects covering the end-to-end data science lifecycle - from cleaning and exploring raw data from primary and third-party systems, through to state-of-the-art data visualization and machine learning development.
You will work in a modern cloud-based data warehousing environment hosting Machine Learning models, alongside a team of diverse, intense and interesting co-workers. You will liaise with other departments - such as product & tech, the core business verticals, trust & safety, finance and others - to enable them to be successful.
In this role, you will:
- Query large datasets with SQL and feed ML models
- Perform data exploration to find patterns in the data and understand the state and quality of the data available
- Use Python to analyze data and build statistical models that solve specific business problems (see the sketch after this list)
- Evaluate ML models and fine-tune model parameters with the underlying business problem in mind
- Collaborate with senior peers to deploy ML models in production
- Build customer-facing reporting tools to provide insights and metrics which track system performance
- Contribute to a strong team culture and to the ambition to be on the cutting edge of big data
- Participate in the off-hours on-call stability rotation to support live ML models
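For a sense of what the SQL and Python work can look like day to day, here is a minimal sketch that pulls a table into pandas and fits a baseline model. The connection string, table and column names (orders, amount, is_fraud, and so on) are hypothetical placeholders, not our actual schema.

```python
# Minimal sketch: query a dataset with SQL and feed it to an ML model.
# The connection string, table and column names below are hypothetical.
import pandas as pd
from sqlalchemy import create_engine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

engine = create_engine("postgresql://user:password@host:5432/warehouse")  # placeholder DSN

# Pull a sample of rows into a DataFrame with plain SQL.
query = """
    SELECT amount, item_count, account_age_days, is_fraud
    FROM orders
    LIMIT 100000
"""
df = pd.read_sql(query, engine)

# Split features/target and fit a simple baseline model.
X = df[["amount", "item_count", "account_age_days"]]
y = df["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```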
Requirements
- Bachelor's degree in AI, Statistics, Math, Operations Research, Engineering, Computer Science, or a related quantitative field
- Solid grounding in statistical modelling and mathematics
- Basic knowledge of machine learning algorithms
- Basic knowledge of SQL
- Basic knowledge of visualization tools such as Periscope
- Excellent verbal and written communication skills
- Strong problem-solving skills
Desired candidate profile
1. Data Preparation and Preprocessing
- Data Cleaning: Assist in preparing and cleaning data for machine learning models. This could include handling missing values, removing outliers, and converting data into appropriate formats.
- Feature Engineering: Help with creating new features or transforming existing data to improve the performance of machine learning algorithms (a minimal sketch follows this section).
- Data Exploration: Perform exploratory data analysis (EDA) to understand data distributions, identify trends, and visualize relationships in the data.
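For illustration, a minimal sketch of these preparation steps using pandas; the file name (orders.csv) and columns (amount, signup_date, order_date) are hypothetical.

```python
# Minimal data-preparation sketch; file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("orders.csv", parse_dates=["signup_date", "order_date"])

# Data cleaning: drop duplicate rows, fill missing amounts with the median,
# and remove extreme outliers beyond the 99th percentile.
df = df.drop_duplicates()
df["amount"] = df["amount"].fillna(df["amount"].median())
df = df[df["amount"] <= df["amount"].quantile(0.99)]

# Feature engineering: derive account age (in days) at the time of the order.
df["account_age_days"] = (df["order_date"] - df["signup_date"]).dt.days

# Quick exploratory checks: distributions, missingness, and correlations.
print(df.describe())
print(df.isna().mean())
print(df.corr(numeric_only=True))
```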
2. Model Building and Evaluation
- Model Implementation: Assist with the implementation of machine learning models such as linear regression, decision trees, support vector machines, and neural networks, using tools like scikit-learn, TensorFlow, or PyTorch (see the sketch after this section).
- Model Training: Help train models using various datasets, tuning hyperparameters to optimize performance.
- Model Evaluation: Evaluate model performance using appropriate metrics like accuracy, precision, recall, F1 score, or AUC-ROC, and assist in interpreting results.
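For illustration, a minimal scikit-learn sketch covering implementation, hyperparameter tuning and evaluation; it runs on synthetic data rather than any real project dataset.

```python
# Minimal model-building and evaluation sketch using scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Tune a couple of hyperparameters with cross-validated grid search.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    scoring="roc_auc",
    cv=5,
)
grid.fit(X_train, y_train)

# Evaluate the best model with metrics suited to an imbalanced problem.
best = grid.best_estimator_
proba = best.predict_proba(X_test)[:, 1]
print(classification_report(y_test, best.predict(X_test)))  # precision, recall, F1
print("AUC-ROC:", roc_auc_score(y_test, proba))
```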
3. Algorithm Research and Testing
- Literature Review: Conduct research on the latest machine learning algorithms and approaches. You might be tasked with reviewing research papers and experiments to help implement cutting-edge methods in practice.
- Experimentation: Run experiments to test different machine learning algorithms, evaluate their performance, and understand how various approaches affect outcomes (see the sketch after this section).
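A minimal sketch of such an experiment, assuming a simple cross-validation comparison on synthetic data; the candidate models and scoring metric are illustrative choices only.

```python
# Minimal experimentation sketch: compare a few standard algorithms
# with the same cross-validation protocol on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
}

# Score each candidate with the same folds and metric so results are comparable.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```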
4. Collaboration and Reporting
- Team Collaboration: Work closely with senior data scientists, machine learning engineers, and other team members to develop machine learning models or contribute to data-driven projects.
- Documentation: Document your work, including code, findings, and explanations for model choices and outcomes, so that results can be easily interpreted and reproduced by others.
- Presentation: Present findings to the team, often through reports or short presentations, to share insights or progress on ongoing projects.
5. Tool and Software Usage
- Machine Learning Libraries: Gain experience using libraries and frameworks such as scikit-learn, TensorFlow, Keras, PyTorch, or XGBoost to implement and fine-tune machine learning models.
- Data Manipulation: Use tools like Pandas for data manipulation, NumPy for numerical computations, and Matplotlib or Seaborn for data visualization (see the sketch after this list).
- Version Control: Use Git and GitHub for code version control, helping ensure that your work can be tracked and shared efficiently with team members.
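For illustration, a short sketch of this toolkit working together; the data is synthetic and the column names are purely hypothetical.

```python
# Minimal sketch of the day-to-day toolkit: NumPy for numbers,
# pandas for tabular manipulation, Matplotlib for visualization.
# All data below is synthetic; column names are hypothetical.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "amount": rng.gamma(shape=2.0, scale=50.0, size=1000),
    "segment": rng.choice(["new", "returning"], size=1000),
})

# Summarize with pandas, then plot the distribution per segment.
print(df.groupby("segment")["amount"].describe())

df.hist(column="amount", by="segment", bins=30, sharex=True)
plt.suptitle("Order amount distribution by customer segment (synthetic data)")
plt.tight_layout()
plt.show()
```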