PySpark JD

Responsibilities

Data Pipeline Development: Design, develop, and maintain highly scalable and optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity and [...].
Data Ingestion: Implement and manage data ingestion processes from a variety of sources (relational databases, APIs, file systems) to the data lake or data warehouse on the Cloudera Data Platform.
Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical needs and business requirements.
Performance Optimization: Conduct performance tuning of PySpark code and Cloudera components, optimizing resource utilization and reducing the runtime of ETL processes.
Data Quality and Validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability throughout the pipeline.
Automation and Orchestration: Automate data workflows using tools like Apache Oozie, Airflow, or similar orchestration tools within the Cloudera ecosystem.
Monitoring and Maintenance: Monitor pipeline performance, troubleshoot issues, and perform routine maintenance on the Cloudera Data Platform and associated data processes.
Collaboration: Work closely with other data engineers, analysts, product managers, and other stakeholders to understand data requirements and support various data-driven initiatives.
Documentation: Maintain thorough documentation of data engineering processes, code, and pipeline configurations.

Education and Experience

Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field.
3 years of experience as a Data Engineer, with a strong focus on PySpark and the Cloudera Data Platform.

Technical Skills

PySpark: Advanced proficiency in PySpark, including working with RDDs, DataFrames, and optimization techniques.
Cloudera Data Platform: Strong experience with Cloudera Data Platform (CDP) components, including Cloudera Manager, Hive, Impala, HDFS, and [...].
Data Warehousing: Knowledge of data warehousing concepts and ETL best practices, and experience with SQL-based tools (Hive, Impala).
Big Data Technologies: Familiarity with Hadoop, Kafka, and other distributed computing tools.
Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar orchestration tools.
Scripting and Automation: Strong scripting skills in [...].

Soft Skills

Strong analytical and problem-solving skills.
Strong verbal and written communication skills.
Ability to work independently and collaboratively in a team.
Attention to detail and commitment to data quality.