Candidate's Name
Data Engineer | Location | Mobile: PHONE NUMBER AVAILABLE | Email: EMAIL AVAILABLE
SUMMARY
Accomplished Data Engineer with 11+ years of experience designing, implementing, and optimizing large-scale data infrastructure and pipelines. Proficient in leveraging a variety of data processing and analytics tools to transform raw data into actionable insights.
Demonstrates strong analytical, problem-solving, and collaboration skills, ensuring seamless integration and performance of data systems. Adept at working with cross-functional teams to deliver robust data solutions that drive business value.
Expertise in designing and managing data warehouses on platforms such as Amazon Redshift, Google BigQuery, and Snowflake.
Proficient in developing, scheduling, and maintaining ETL/ELT processes with tools such as Apache NiFi, Talend, and Informatica.
Demonstrated ability to optimize data infrastructure for performance and cost efficiency, using monitoring and logging tools to ensure high availability and reliability of data systems.
Worked for major healthcare providers including Cigna and Humana.
Hands-on experience with AWS data analytics services such as Athena, Glue, Data Catalog, and QuickSight.
Converted Hive/SQL queries into Spark transformations using Spark RDD and PySpark concepts (an illustrative sketch follows this summary).
Experienced in implementing machine learning pipelines and integrating advanced analytics solutions, contributing to predictive modelling and real-time decision-making processes.
Strong experience migrating other databases to Snowflake.
Extensive experience with Hadoop ecosystem tools, including HDFS, Hive, Pig, PySpark, and Spark, for large-scale data processing.
Proficient in analyzing large, complex datasets and presenting answers to business questions using tools and technologies such as T-SQL, Spark SQL, and Databricks.
Strong knowledge of relational and NoSQL databases such as MySQL, PostgreSQL, Cassandra, and MongoDB; skilled in deploying and managing data solutions on AWS, Azure, and Google Cloud Platform (GCP).
Proficient in Python, Java, and SQL for data manipulation, analysis, and automation.
Experience integrating diverse data sources, including APIs, flat files, and streaming data, using tools such as Apache Kafka, Apache Flink, Azure Databricks, and Spark SQL.
Wrote AWS Lambda functions in Python that invoke scripts to perform various transformations and analytics on large datasets in EMR clusters.
Strong abilities in creating logical and physical data models to support business intelligence and analytics.
Competent in using BI tools such as Tableau, Power BI, and Looker to create insightful data visualizations and reports.
Knowledgeable in implementing data governance and security best practices to ensure data quality and compliance.
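A minimal sketch of the Hive/SQL-to-PySpark conversion noted above, showing the same aggregation in both forms. The claims table and its columns are hypothetical placeholders, not taken from any employer's data.

# Hypothetical table/columns: the same aggregation expressed first as a
# Hive/SQL query, then as an equivalent PySpark DataFrame transformation.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-to-pyspark-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Hive/SQL form:
sql_df = spark.sql(
    "SELECT claim_status, COUNT(*) AS n "
    "FROM claims WHERE service_year = 2023 "
    "GROUP BY claim_status"
)

# Equivalent DataFrame-API form; Spark's optimizer plans both the same way.
df = (
    spark.table("claims")
    .filter(F.col("service_year") == 2023)
    .groupBy("claim_status")
    .agg(F.count("*").alias("n"))
)
df.show()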
TECHNICAL SKILLS
Programming Languages: Python, PowerShell, Bash, SQL
Methodologies: SDLC, Agile, Waterfall
Packages: PyTorch, NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn
Visualization Tools: Microsoft Power BI, Qlik Sense, Grafana, Tableau, Advanced Excel (Pivot Tables, VLOOKUP)
IDEs: Jupyter Notebook, Visual Studio Code, PyCharm
Databases: MongoDB, Redis, MySQL, PostgreSQL, Snowflake, BigQuery (on GCP), T-SQL, Azure Synapse
Other Technical Skills: Azure, Apache Kafka, Apache Spark, Azure Databricks, Apache NiFi, Flink, Prefect, Apache Beam, Apache Pulsar, AWS Glue, AWS Lambda, Azure Data Lake Storage, Azure Data Factory, Azure Active Directory, Apache Ranger, Apache Atlas, Prometheus, Splunk, Ansible, Puppet, Nagios, ELK Stack (Elasticsearch, Logstash, Kibana), Docker, Kubernetes, Terraform, SNMP, Data Quality and Governance, Machine Learning Algorithms, Natural Language Processing, Big Data, PySpark, Advanced Analytics, Statistical Methods, Data Mining, Data Visualization, Data Warehousing, Data Transformation, Critical Thinking, Communication Skills, Presentation Skills, Problem-Solving
Cloud Technologies: Microsoft Azure, AWS (Amazon Web Services), GCP (Google Cloud Platform)
Version Control Tools: Git, Jenkins, GitHub
Operating Systems: Windows, Linux, macOS

DOMAIN SKILLS
Data Engineering, ETL/ELT Processes, Big Data Technologies, Data Warehousing, Database Management, Cloud Platforms, Data Integration, Data Modelling, Data Visualization

EXPERIENCE
Sr. Data Engineer
The Cigna Group - Dallas, Texas | Sep 2023 - Present
Designing and implementing efficient ETL processes to extract, transform, and load large volumes of healthcare data from various sources, ensuring high data quality and performance.
Architecting scalable and resilient data infrastructure that supports complex analytics and reporting needs, aligning with healthcare compliance standards.
Establishing and enforcing data governance protocols, including data lineage, data cataloguing, and metadata management, to ensure data integrity and accessibility for clinical and operational analytics.
Developing and maintaining real-time data processing frameworks using technologies such as Apache Kafka and Spark, enabling timely decision-making for healthcare providers.
Implementing robust data security strategies, including encryption, access controls, and auditing mechanisms, to protect sensitive patient information and comply with regulatory requirements.
Moving data between GCP and Azure using Azure Data Factory.
Installed the C# libraries (Adjudication Engine) into the GAC on the application server, providing better visibility and access for internal teams at Humana.
Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery, and used Azure Data Factory and Google Cloud Dataflow to build scalable, automated data pipelines.
Working closely with data scientists, analysts, and healthcare professionals to understand data needs, translate requirements into technical solutions, and deliver actionable insights.
Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch (see the Boto3 sketch after this section).
Creating automated data integration workflows to streamline the ingestion and synchronization of disparate data sources, reducing manual intervention and errors by 40%.
Designed a library for emailing executive reports from the Tableau REST API using Python, Kubernetes, Git, AWS CodeBuild, dbt, Stitch Data, and Airflow.
Providing guidance to the development team using PySpark as the ETL platform.
Migrated objects from Teradata to Snowflake and created Snowpipe pipelines for continuous data loading.
Performing comprehensive data quality assessments, identifying anomalies, inconsistencies, and gaps in the data, and implementing corrective actions to maintain data accuracy and reliability.
Used the Snowflake Time Travel feature to access historical data.
Installed and upgraded MS SQL Server 2008 to SQL Server 2017 with an Always On Availability Group cluster on premises.
Used AWS AppSync (GraphQL) for web API creation and data synchronization into Aurora PostgreSQL and DynamoDB.
Design, development, and implementation of an Encounters system for companies such as Humana, Centreline, Integral, Village Care, and Extended Care.
Used Python Boto3 to configure AWS services including Glue, EC2, SNS, SQS, Elasticsearch, DynamoDB, Lambda, and S3.
Directing data migration efforts from legacy systems to modern data platforms, ensuring minimal disruption to ongoing operations and a seamless transition of historical data.
Continuously monitoring data pipeline performance and system health, using advanced monitoring tools and techniques to identify bottlenecks and optimize processing efficiency, improving overall system throughput by 30%.
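A minimal Boto3 sketch of the CloudWatch alarming described above: an alarm on Lambda errors that notifies an SNS topic. The function name, topic ARN, and region are hypothetical placeholders.

# Hypothetical names/ARNs: create a CloudWatch alarm that fires when a
# Lambda function reports any errors within a 5-minute window.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="etl-loader-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "etl-loader"}],
    Statistic="Sum",
    Period=300,                  # evaluate in 5-minute windows
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",  # no invocations is not an error
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:etl-alerts"],
)

The same pattern extends to Glue jobs (Glue namespace metrics) and EC2 hosts (CPU/status-check metrics) by swapping the namespace and dimensions.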
Data Engineer
Shell Oil - Houston, TX | Sep 2022 - Aug 2023
Responsibilities:
Worked effectively with SQL Profiler, Index Tuning Wizard, and estimated query plans to tune the performance of SQL queries and stored procedures.
Developed tools using Python and shell scripting to automate menial tasks.
Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB (see the sketch at the end of this section).
Experienced in developing web services with the Python programming language.
Used Spark and Scala to develop machine learning algorithms that analyze clickstream data.
Implemented a CI/CD pipeline using Jenkins and Airflow for Docker containers and Kubernetes.
Worked directly with the development and data teams to validate fields from the source DB to the target DB using the data mapping document.
Worked on complex SQL queries and PL/SQL procedures and converted them into ETL tasks.
Validated data feeds from the source systems to the Snowflake data warehouse cloud platform, and integrated and automated data workloads into the Snowflake warehouse.
Involved in the development of web services using SOAP for sending and receiving data from external interfaces in XML format.
Defined Snowflake virtual warehouse sizing for different types of workloads.
Designed dashboards on the information views created in an SAP BW on HANA environment.
Created a task-scheduling application to run on multiple servers in an EC2 environment.
Designed, built, and deployed a set of Python modelling APIs for customer analytics that integrate multiple machine learning techniques for user behaviour prediction and support multiple marketing segmentation programs.
Worked on the design of star and snowflake schema data models.
Performed data purging and applied changes using Databricks and Spark data analysis.
Extensively used Databricks notebooks for interactive analysis with Spark APIs.
Used a metadata tool to import metadata from the repository, add new job categories, and create new data elements.
Used Spark for data analysis and stored final computation results in HBase tables.
Developed and tested many dashboard features using Python, Java, Bootstrap, CSS, JavaScript, and jQuery.
Developed a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash.
Managed large datasets using PySpark, Pandas, and Dask DataFrames.
Wrote SQL queries effectively and efficiently, including inner/outer joins, inserts, and table creation.
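A minimal sketch of the S3 security-framework idea referenced earlier in this section: a Lambda handler consults a DynamoDB entitlements table before issuing a short-lived presigned URL. The table, bucket, and attribute names are hypothetical.

# Hypothetical entitlements model: each DynamoDB item maps a user to the
# S3 key prefixes that user may read.
import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
TABLE = dynamodb.Table("s3-entitlements")   # hypothetical table
BUCKET = "analytics-data"                   # hypothetical bucket

def handler(event, context):
    user = event["user_id"]
    key = event["object_key"]

    # Look up the caller's entitlement record.
    item = TABLE.get_item(Key={"user_id": user}).get("Item")
    if not item or not any(key.startswith(p) for p in item["allowed_prefixes"]):
        return {"statusCode": 403, "body": "access denied"}

    # Grant time-boxed access instead of handing out credentials.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=300,
    )
    return {"statusCode": 200, "body": url}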
Sr. Data Engineer
Microsoft Corp, Redmond, WA | Oct 2021 - Aug 2022
Designed and implemented scalable data pipelines to integrate loan and financial transaction records from various sources into the central data warehouse, ensuring high availability and performance.
Optimized ETL processes to improve data ingestion efficiency by 20%, reducing latency for real-time analytics on mortgage application statuses and loan processing metrics.
Developed and maintained data models for financial forecasting and risk assessment, increasing the accuracy of loan performance predictions by 25% and enhancing investment strategies.
Used the AWS Glue Data Catalog with crawlers to pull data from S3 and perform SQL query operations.
Created and managed data quality frameworks that identified and rectified inconsistencies in over 5 million records, improving overall data reliability by 30% for business intelligence purposes.
Utilized Azure Data Factory, the SQL API, and the MongoDB API to integrate data from sources such as MongoDB, MS SQL, and cloud storage (Blob, Azure SQL DB).
Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake schemas.
Collaborated with data scientists and analysts to design and deploy data solutions for customer segmentation and targeted marketing campaigns, boosting engagement and conversion rates by 20%.
Implemented data governance policies to ensure compliance with financial regulations and data privacy laws, safeguarding sensitive customer and loan information across 15 data sources.
Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark.
Integrated external financial data sources (such as credit score agencies and market data feeds) into the company's data ecosystem, enhancing the granularity and richness of loan analytics by 35%.
Hands-on experience with AWS services such as Lambda, Athena, DynamoDB, Step Functions, SNS, SQS, S3, and IAM.
Monitored containers on AWS EC2 machines using the Datadog API, and ingested and enriched data into the internal cache system.
Conducted performance tuning and optimization of SQL queries and database operations, reducing query response times by 50% and improving the user experience of loan application tracking tools.
Developed automated reporting tools for real-time monitoring of key performance indicators (KPIs) related to loan origination, processing, and servicing, reducing manual reporting time for stakeholders by 60%.
Built ETL data pipelines in Airflow on GCP using a range of Airflow operators, both legacy and newer (see the DAG sketch after this section).
Provided technical guidance and mentorship to a team of junior data engineers and analysts, fostering skill development and ensuring adherence to data engineering best practices within the organization.
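A minimal sketch of the kind of Airflow ETL DAG on GCP described above, loading daily files from GCS into BigQuery. The DAG id, bucket, dataset, and schedule are hypothetical, and the sketch assumes the apache-airflow-providers-google package is installed.

# Hypothetical daily load: GCS JSON files into a BigQuery table.
from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="loan_events_to_bq",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_events = GCSToBigQueryOperator(
        task_id="load_loan_events",
        bucket="loan-landing-zone",                   # hypothetical bucket
        source_objects=["events/{{ ds }}/*.json"],    # partitioned by run date
        destination_project_dataset_table="analytics.loan_events",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
    )

Additional tasks (quality checks, downstream transforms) would chain off load_events with the usual >> dependency syntax.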
Data Engineer
First American Title - Omaha, NE | Aug 2015 - Sep 2021
Developed and implemented robust data architecture solutions, incorporating best practices for scalability and performance to support complex financial models and real-time data processing needs.
Streamlined and enhanced ETL processes using advanced tools and techniques, ensuring efficient data flow from various sources and significantly reducing data integration latency.
Established and enforced rigorous data validation and cleansing procedures, using automated scripts and manual checks to significantly reduce data inconsistencies and enhance the reliability of financial data.
Oversaw the maintenance, optimization, and scaling of data warehousing solutions, ensuring secure and rapid retrieval of large, complex datasets critical for financial analysis and reporting.
Enforced strict data governance policies, including data lineage, metadata management, and access controls, ensuring compliance with regulatory standards and safeguarding sensitive financial information.
Engaged with finance analysts, data scientists, and business leaders to gather requirements, understand data needs, and deliver tailored data solutions supporting strategic financial initiatives.
Created and refined complex data models to support advanced analytics, predictive modelling, and machine learning projects, driving data-driven decision-making across the organization.
Designed and deployed automated reporting systems using tools such as Power BI and Tableau, reducing manual effort and providing real-time insight into financial performance and key business metrics.
Moved data between GCP and Azure using Azure Data Factory.
Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
Wrote Python DAGs in Airflow to orchestrate end-to-end data pipelines for multiple applications, and was involved in setting up the Apache Airflow service in GCP.
Developed Python scripts to back up EBS volumes using AWS Lambda and CloudWatch.
Conducted comprehensive performance tuning and optimization of databases and data processing workflows, achieving significant improvements in query execution times and overall system efficiency.
Provided guidance, training, and mentorship to junior data engineers, fostering a collaborative environment and promoting continuous learning and professional development within the team.

Associate SQL Developer
Landsky Engineers Pvt Ltd - Hyderabad, India | Jan 2012 - Dec 2013
Description: This project covers the complete implementation of the business needs of the client, a world-class industrial solutions provider for the material handling, control and automation, and automotive industries. It involves the implementation of document control, sales and marketing, and inventory modules per the client's requirements.
Responsibilities:
Extracted, transformed, and loaded data from flat files to the staging area.
Developed the presentation layer (UI).
Developed common utilities required by the application.
Developed stored procedures and functions, and optimized queries across the whole application.
Designed the database architecture for the Marketing, Stores, and Purchase departments' data; stores and purchasing information is frequently changing data provided by the Marketing department.
Developed functions and procedures to build default user IDs and passwords, obtaining credentials from module owners for authentication and authorization.
Maintained the above processes.
Environment: SSIS, SSRS, SQL Server 2005/2008/2008 R2

EDUCATION
B.Tech (ECE), JNTUH, 2011, India