Candidate's Name
Sr. Data Engineer
EMAIL AVAILABLE | PHONE NUMBER AVAILABLE

PROFESSIONAL SUMMARY:
- Over 7 years of hands-on expertise as a Senior Data Engineer specializing in Database Development, ETL Development, Data Modeling, Report Development, and Big Data Technologies.
- Proficient in programming languages including Python (Pandas, NumPy, PySpark, scikit-learn, PyTorch), SQL (including PL/SQL for Oracle), Scala, and PowerShell.
- Extensive experience with cloud platforms including AWS (S3, Redshift, RDS, DynamoDB, EMR, Glue, Data Pipeline, Kinesis, Athena, QuickSight, Lambda, CloudFormation, CodePipeline), Azure (ADF, SQL Server, Cosmos DB, Databricks, HDInsight, Blob Storage, Data Lake Storage), and Google Cloud Platform (BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage, Cloud SQL, Cloud Datastore, Apache Beam).
- Proficient in Apache Spark, Apache Airflow, Hadoop, Hive, Sqoop, Kafka, Impala, and Apache Beam for large-scale data processing and analytics.
- Experience with cloud data warehouses such as Redshift, Snowflake, and BigQuery for scalable data storage and retrieval.
- Skilled in data visualization tools such as Tableau, Power BI, Google Data Studio, and QuickSight for creating insightful reports and dashboards.
- Hands-on experience with data integration tools such as Informatica, Talend, and SSIS for seamless data flow across systems.
- In-depth knowledge of database systems including SQL Server, Cosmos DB, Oracle, PostgreSQL, Cassandra, MySQL, and DynamoDB for efficient data storage and retrieval.
- Proficient in handling data formats such as JSON, XML, and Avro for data interchange and storage.
- Familiar with containerization using Docker and orchestration with Kubernetes for scalable, manageable deployments.
- Experience with CI/CD pipelines using Azure DevOps, Jenkins, and AWS CodePipeline for automated software delivery and deployment.
- Proficient in advanced Excel functions, pivot tables, and VLOOKUPs for data analysis and reporting.
- Hands-on experience with AWS Glue and AWS Data Pipeline for ETL workflows and data processing.
- Familiar with version control systems such as Git, GitHub, and Bitbucket for collaborative development and code management.
- Strong understanding of security and access control principles including AWS IAM, AWS KMS, Azure Key Vault, Azure AD, SSL/TLS, and AES encryption standards.
- Proficient in project management and collaboration tools such as Bugzilla, Confluence, SharePoint, and JIRA, working within Agile, Scrum, and Kanban methodologies.
- Analyzed business needs, designed efficient processes, and managed development teams to deliver successful projects.
- Implemented data pipelines, ensured system integration, and adopted new technologies for continuous improvement.
- Collaborated with stakeholders at all levels to align project goals and ensure informed decision-making.
- Tackled complex data challenges with strong analytical skills and a drive to contribute to a dynamic and innovative environment.

TECHNICAL SKILLS:
Cloud Platforms: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP)
Big Data Processing and Analytics: Apache Spark, Apache Airflow, Hadoop, Hive, Sqoop, Kafka, Impala, Apache Beam
Programming and Scripting: Python (Pandas, NumPy, PySpark, scikit-learn, PyTorch), Java Spring Framework, SQL (including PL/SQL for Oracle), Scala, PowerShell
Data Integration and ETL Tools: AWS Glue, AWS Data Pipeline, Informatica, Talend, SSIS
Containerization and Orchestration: Docker, Kubernetes
Version Control and Collaboration: Git, GitHub, Bitbucket
CI/CD: Azure DevOps, Jenkins, AWS CodePipeline
Data Warehousing and Database Management: Redshift, Snowflake, BigQuery, SQL Server, Cosmos DB, Oracle, PostgreSQL, Cassandra, MySQL, DynamoDB
Data Visualization and BI Tools: Tableau, Power BI, Google Data Studio, QuickSight
Security and Access Control: AWS IAM, AWS KMS, Azure Key Vault, Azure AD, SSL/TLS, AES encryption standards
Miscellaneous Tools and Technologies: JSON, advanced Excel functions, pivot tables, VLOOKUPs, Bugzilla, Confluence, SharePoint, JIRA, Agile, Scrum, Kanban
Operating Systems: Windows, Linux, UNIX, macOS

EDUCATION:
Bachelor's

CERTIFICATIONS:
Amazon Web Services (AWS)
Java Spring Boot
Software Development Courses

WORK EXPERIENCE:

J.P. Morgan Chase & Co., Plano, TX
Sr. Data Engineer | Oct 2023 - Present
- Led the software development lifecycle (SDLC) for data engineering projects, from requirements gathering to deployment and maintenance, ensuring quality and efficiency throughout the process.
- Managed data storage and retrieval using Amazon S3, optimizing storage and access patterns for scalability and performance.
- Designed and implemented data warehouse solutions using Amazon Redshift, ensuring efficient data modeling and query performance for analytics.
- Managed relational databases using Amazon RDS, ensuring data integrity, availability, and performance.
- Implemented NoSQL database solutions using Amazon DynamoDB, enabling scalable and flexible data storage for various use cases.
- Utilized Amazon EMR for big data processing and analytics, leveraging HDFS, MapReduce, Hive, and Pig for distributed computing tasks.
- Implemented ETL processes using AWS Glue and AWS Data Pipeline, ensuring seamless data integration and transformation.
- Managed real-time data streams using Amazon Kinesis, enabling stream processing and real-time analytics.
- Utilized Amazon Athena for interactive query processing, enabling ad-hoc analysis of data stored in S3.
- Developed interactive dashboards and reports using Amazon QuickSight, providing business intelligence insights to stakeholders.
- Leveraged serverless computing with AWS Lambda for event-driven data processing and automation.
- Utilized Apache Spark for distributed data processing and analytics, optimizing data workflows and performance.
- Orchestrated data workflows using Apache Airflow, ensuring automation and scheduling of data pipelines.
- Applied SQL, Python, and PySpark for data manipulation, analysis, and machine learning model development, enhancing data processing capabilities.
- Utilized Scala for Spark programming, optimizing Spark code for performance and scalability.
- Managed and analyzed data using Pandas and NumPy, ensuring efficient data processing and analysis workflows.
- Implemented columnar storage using Apache Parquet, optimizing data storage and query performance.
- Processed and transformed XML data formats, enabling structured data processing and integration.
- Utilized ERwin for data modeling and database design, ensuring data integrity and consistency.
- Managed access control and encryption using AWS IAM and AWS KMS, ensuring data security and compliance.
- Implemented encryption standards including SSL/TLS and AES, ensuring data protection in transit and at rest.
- Implemented data anonymization techniques and data governance policies, ensuring data privacy and regulatory compliance.
- Monitored and managed AWS resources using AWS CloudWatch and AWS CloudTrail, ensuring performance optimization and security.
- Managed code repositories and collaborated with teams using Git, ensuring version control and code quality.
- Managed project workflows and tasks using JIRA, ensuring collaboration and alignment with project goals and timelines.
- Automated infrastructure deployment using AWS CloudFormation, ensuring consistent and scalable infrastructure configurations.
- Implemented continuous integration and continuous deployment (CI/CD) pipelines using AWS CodePipeline, ensuring automated and reliable software delivery.
- Containerized applications using Docker, enabling scalable and portable deployment of data solutions.
- Orchestrated containerized applications using Kubernetes, ensuring efficient management and scaling of containerized workloads.
- Contributed to Agile methodologies, participating in Scrum ceremonies and sprint planning to deliver data solutions iteratively and efficiently.
Tech Stack: AWS, Redshift, DynamoDB, EMR, AWS Glue, Kinesis, Athena, QuickSight, AWS Lambda, HDFS, MapReduce, Hive, Pig, Spark, Airflow, SQL, Python, PySpark, Scala, Parquet, XML, ERwin, IAM, KMS, CloudWatch, CloudTrail, Git, JIRA, AWS CloudFormation, Docker, Kubernetes, Agile (Scrum).
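
Representative code sample (illustrative only): a minimal sketch of the kind of PySpark batch job described above, reading raw JSON from S3 and writing date-partitioned Parquet. The bucket, path, and column names are hypothetical, and the job assumes a cluster (e.g. EMR) already configured with S3 access.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("trades-daily-batch").getOrCreate()

    # Read the day's raw events (hypothetical bucket and layout).
    raw = spark.read.json("s3://example-raw-bucket/trades/2024-01-01/")

    cleaned = (
        raw.dropDuplicates(["trade_id"])                     # hypothetical business key
           .withColumn("trade_date", F.to_date("trade_ts"))  # derive the partition column
           .filter(F.col("amount") > 0)                      # drop obviously bad records
    )

    # Columnar, date-partitioned output keeps downstream Athena scans cheap.
    (cleaned.write
            .mode("overwrite")
            .partitionBy("trade_date")
            .parquet("s3://example-curated-bucket/trades/"))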
American Airlines, DFW, TX
Data Engineer | Aug 2022 - Sep 2023
- Designed and implemented data integration workflows using Azure Data Factory (ADF), ensuring seamless data movement and transformation across on-premises and cloud environments.
- Managed and optimized SQL Server databases, ensuring data integrity, performance, and availability for business operations.
- Implemented Azure Cosmos DB for globally distributed, scalable NoSQL database solutions, ensuring high availability and low-latency data access.
- Utilized Snowflake for cloud-based data warehousing, enabling scalable and flexible analytics solutions.
- Implemented data processing and analytics workflows using Azure Databricks, leveraging Apache Spark for distributed computing and machine learning.
- Managed and optimized big data clusters using Azure HDInsight, ensuring efficient data processing and analytics capabilities.
- Utilized Azure Blob Storage and Azure Data Lake Storage for scalable, cost-effective data storage solutions.
- Automated tasks and workflows using PowerShell, streamlining data management and operations.
- Applied Python with Pandas, NumPy, and PyTorch for data manipulation, analysis, and machine learning model development, enhancing data processing capabilities.
- Implemented data processing pipelines using Spark, handling large-scale data processing and analytics tasks.
- Developed serverless functions using Azure Functions, enabling event-driven data processing and automation.
- Managed and optimized Hadoop clusters for distributed data processing and analytics, ensuring scalability and performance.
- Implemented Kafka for real-time data streaming and processing, enabling real-time analytics and event-driven architectures.
- Managed secrets and access control using Azure Key Vault and Azure Active Directory (Azure AD), ensuring data security and compliance.
- Processed and analyzed JSON data formats, enabling structured data processing and integration.
- Managed code repositories and collaborated with teams using Bitbucket, ensuring version control and code quality.
- Utilized Impala for interactive SQL queries and analytics on Hadoop-based data platforms.
- Implemented continuous integration and continuous deployment (CI/CD) pipelines using Azure DevOps, ensuring automated and reliable software delivery.
- Monitored and managed Azure resources using Azure Monitor and Azure Log Analytics, ensuring performance optimization and troubleshooting.
- Automated infrastructure deployment and management using Terraform, ensuring consistent and scalable infrastructure configurations.
- Containerized applications and services using Docker, enabling scalable and portable deployment of data solutions.
- Orchestrated containerized applications using Kubernetes, ensuring efficient management and scaling of containerized workloads.
- Developed and deployed interactive data visualizations using Power BI, enabling data-driven insights and decision-making.
- Contributed to Agile methodologies, participating in Scrum ceremonies and sprint planning to deliver data solutions iteratively and efficiently.
- Managed project workflows and tasks using JIRA, ensuring collaboration and alignment with project goals and timelines.
Tech Stack: ADF, SQL Server, Azure Cosmos DB, Snowflake, Azure Databricks, Azure HDInsight, Azure Blob Storage, Azure Data Lake Storage, PowerShell, Python, Spark, Azure Functions, Hadoop, Kafka, JSON, Bitbucket, Impala, Azure DevOps, Azure Monitor, Terraform, Docker, Kubernetes, Power BI, JIRA.
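
Representative code sample (illustrative only): a minimal Spark Structured Streaming sketch of the Kafka-to-data-lake pattern described above. The broker address, topic, schema, and storage paths are hypothetical, and the job assumes the spark-sql-kafka connector is available on the cluster (as it is on Databricks).

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("flight-status-stream").getOrCreate()

    # Hypothetical schema for the incoming JSON messages.
    schema = StructType([
        StructField("flight_id", StringType()),
        StructField("status", StringType()),
        StructField("delay_minutes", DoubleType()),
    ])

    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")  # illustrative broker
             .option("subscribe", "flight-status")              # illustrative topic
             .load()
             .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
             .select("e.*")
    )

    # Land the parsed stream in the lake; the checkpoint location gives
    # restartable, exactly-once file output.
    (events.writeStream
           .format("parquet")
           .option("path", "abfss://lake@exampleacct.dfs.core.windows.net/flight_status/")
           .option("checkpointLocation", "abfss://lake@exampleacct.dfs.core.windows.net/_chk/flight_status/")
           .start()
           .awaitTermination())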
Sony PlayStation, San Mateo, CA
Data Engineer | Oct 2021 - Aug 2022
- Implemented Pub/Sub and Cloud Storage for real-time data ingestion and storage, ensuring reliable and scalable data pipelines.
- Managed Cloud SQL and Cloud Datastore for structured and unstructured data storage, maintaining data integrity and accessibility.
- Designed and implemented data streaming pipelines using Cloud Pub/Sub, Apache Beam, and Apache Kafka, enabling real-time data processing and analytics.
- Leveraged Apache Spark and Hadoop for distributed data processing and analytics, handling large-scale datasets efficiently.
- Orchestrated data workflows using Apache Airflow, ensuring automation and scheduling of data pipelines for timely processing.
- Integrated data sources using Sqoop and Informatica, facilitating seamless data extraction, transformation, and loading processes.
- Utilized Python with Pandas and NumPy for data manipulation, analysis, and modeling, enhancing data processing capabilities.
- Developed data visualizations and dashboards using Data Studio and Google Analytics, providing actionable insights to stakeholders.
- Managed code repositories and collaborated with teams using GitHub, ensuring version control and code quality.
- Orchestrated containerized applications using Docker and Kubernetes, ensuring scalability and reliability of deployed data solutions.
- Automated infrastructure deployment and management using Terraform, optimizing resource utilization and cost efficiency.
- Implemented continuous integration and deployment pipelines using Jenkins, ensuring seamless delivery of data solutions.
- Utilized the ELK Stack (Elasticsearch, Logstash, Kibana) for log analysis and monitoring, ensuring data visibility and troubleshooting capabilities.
- Handled data serialization using Avro, ensuring efficient data storage and processing.
- Managed relational databases including PostgreSQL and NoSQL databases like Cassandra, ensuring data availability and performance.
- Developed and deployed machine learning models using TensorFlow, enhancing data analytics and predictive capabilities.
- Utilized VS Code for code development and debugging, ensuring code efficiency and reliability.
- Contributed to Agile and Kanban methodologies, participating in sprint planning, daily stand-ups, and backlog grooming to deliver data solutions efficiently.
- Collaborated and documented technical specifications using Confluence, ensuring knowledge sharing and documentation of data solutions.
Tech Stack: Apache Beam, Apache Spark, Apache Airflow, Hadoop, Kafka, Sqoop, Informatica, Python, Scala, Data Studio, Google Analytics, GitHub, Terraform, Jenkins, ELK Stack, Avro, PostgreSQL, Cassandra, TensorFlow, VS Code, Agile, Kanban, Confluence.
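
Representative code sample (illustrative only): a minimal Apache Beam streaming sketch of the Pub/Sub ingestion pattern described above, counting events per type in fixed one-hour windows and writing the counts to BigQuery. The project, subscription, and table names are hypothetical, and the pipeline assumes the apache-beam[gcp] package.

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms import window

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromPubSub(
                subscription="projects/example-project/subscriptions/telemetry-sub")
            | "Parse" >> beam.Map(json.loads)             # one JSON event per message
            | "KeyByType" >> beam.Map(lambda e: (e["event_type"], 1))
            | "Window" >> beam.WindowInto(window.FixedWindows(3600))
            | "Count" >> beam.CombinePerKey(sum)          # events per type per hour
            | "ToRow" >> beam.MapTuple(
                lambda event_type, n: {"event_type": event_type, "event_count": n})
            | "Write" >> beam.io.WriteToBigQuery(
                "example-project:telemetry.event_counts",
                schema="event_type:STRING,event_count:INTEGER")
        )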
Dun & Bradstreet (Cognizant), India
Data Engineer | Feb 2019 - Sep 2021
- Utilized Hadoop, Spark, and Hive to process and analyze large volumes of data, optimizing performance and scalability for big data applications.
- Implemented Sqoop for efficient data transfer between Hadoop and relational databases, ensuring seamless data integration and synchronization.
- Developed complex SQL and PL/SQL queries to extract, transform, and load data from diverse sources into data warehouses, improving data accessibility and analysis capabilities.
- Managed AWS resources including EC2 instances, S3 buckets, RDS databases, and Lambda functions, leveraging cloud services for scalable and cost-effective data solutions.
- Utilized Python with NumPy and Pandas for data manipulation, statistical analysis, and machine learning model development, enhancing data processing workflows.
- Maintained version control and collaborated with teams using Git, ensuring code quality and facilitating efficient project management.
- Designed and implemented ETL processes using SSIS (SQL Server Integration Services), ensuring data quality and consistency across data pipelines.
- Managed bug tracking and issue resolution using Bugzilla, ensuring data integrity and timely resolution of data-related issues.
- Contributed to Agile and Kanban methodologies, participating in sprint planning, daily stand-ups, and backlog grooming to deliver data solutions efficiently.
- Developed interactive dashboards and visualizations using Tableau, providing stakeholders with actionable insights and data-driven decision-making capabilities.
- Used SharePoint for document management and collaboration, ensuring data governance and compliance with organizational standards.
- Implemented data security measures and access controls, ensuring data privacy and compliance with regulatory requirements.
- Conducted performance tuning and optimization of database queries and processes, improving data processing efficiency and reducing latency.
- Participated in data architecture design and data modeling activities, ensuring scalability, flexibility, and performance of data solutions.
- Provided technical expertise and support to cross-functional teams, contributing to the successful delivery of data projects and initiatives.
Tech Stack: Hadoop, Spark, Hive, Sqoop, SQL, PL/SQL, AWS, EC2, S3, RDS, Lambda, Python (NumPy, Pandas), Git, SSIS, Bugzilla, Agile, Kanban, Tableau.
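
Representative code sample (illustrative only): a minimal Spark-on-Hive aggregation sketch of the kind described above. The database, table, and column names are hypothetical, and the job assumes a Spark build with Hive support and a reachable metastore.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("account-rollup")
             .enableHiveSupport()   # read/write Hive tables via the metastore
             .getOrCreate())

    # Aggregate transaction facts per account (hypothetical schema).
    rollup = spark.sql("""
        SELECT account_id,
               SUM(balance) AS total_balance,
               COUNT(*)     AS txn_count
        FROM   finance_db.transactions
        GROUP  BY account_id
    """)

    # Persist the aggregate back to Hive for downstream reporting.
    rollup.write.mode("overwrite").saveAsTable("finance_db.account_rollup")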
MINFY Technologies, India
Data Analyst/Engineer | June 2017 - Jan 2019
- Utilized Python, Pandas, NumPy, and scikit-learn for data cleaning, preprocessing, analysis, and machine learning model development, resulting in improved data quality and predictive accuracy.
- Leveraged SQL to query, manipulate, and extract insights from large datasets stored in Oracle databases, ensuring data integrity and optimizing data retrieval performance.
- Applied advanced Excel functions, pivot tables, and VLOOKUPs to create interactive dashboards and reports for stakeholders, facilitating data-driven decision-making.
- Implemented Apache Spark for big data processing and analytics, handling large-scale datasets efficiently and performing distributed computing tasks for faster data processing.
- Designed and implemented data integration workflows using Talend, ensuring seamless data flow between heterogeneous systems and maintaining data consistency across platforms.
- Managed version control and collaborated with teams using Git, tracking changes and resolving issues efficiently to maintain code quality and project progress.
- Utilized Bugzilla for bug tracking and issue management, ensuring timely resolution of data-related issues and maintaining data accuracy.
- Applied data engineering techniques in the Hadoop ecosystem, including HDFS, Hive, and HBase, to store, process, and analyze large volumes of structured and unstructured data.
- Collaborated with cross-functional teams to develop and deploy data pipelines and ETL processes, ensuring data availability and reliability for business analytics and reporting.
- Contributed to data governance initiatives by establishing data quality standards, monitoring data quality metrics, and implementing data cleansing and enrichment strategies.
Tech Stack: Python, Pandas, NumPy, scikit-learn, SQL, advanced Excel functions, pivot tables, VLOOKUPs, Spark, Talend, Oracle, Hadoop, Hive, HBase, Git, Bugzilla.
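
Representative code sample (illustrative only): a minimal Pandas and scikit-learn sketch of the cleaning-and-modeling flow described above. The file name, columns, and target label are hypothetical, and the remaining feature columns are assumed to be numeric after encoding.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Load and clean a hypothetical customer extract.
    df = pd.read_csv("customers.csv")
    df = df.drop_duplicates()
    df["age"] = df["age"].fillna(df["age"].median())  # simple median imputation
    df = pd.get_dummies(df, columns=["region"], drop_first=True)

    # Train a baseline classifier on a hypothetical churn label.
    X = df.drop(columns=["churned"])
    y = df["churned"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(f"holdout accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")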