
Candidate Information
Title: Sr. Data Analyst / Data Engineer
Target Location: US-VA-Leesburg
Candidate's Name
Senior Data Engineer
PHONE NUMBER AVAILABLE
EMAIL AVAILABLE

Professional Overview:
- Senior Data Analyst/Data Engineer with 9+ years in Hadoop ecosystems, AWS, and Airflow. Skilled at designing and implementing data infrastructure solutions, solving complex problems, and translating technical concepts into actionable insights.
- Proficient in Python and cloud-based architecture, with extensive experience in developing, debugging, and optimizing data pipelines and analytics solutions.
- Experienced in writing complex, performance-optimized SQL queries tailored to Snowflake's environment for data analysis and reporting.
- Monitored and optimized Airflow's performance, scalability, and reliability.
- Designed and implemented scalable, maintainable data pipelines using Apache Airflow, leveraging directed acyclic graphs (DAGs) to orchestrate complex workflows.
- Managed the end-to-end ETL project lifecycle, from requirements gathering and design to testing, deployment, and post-implementation support.
- Expertise in data pipeline automation and data lake construction using AWS services (Glue, S3, EC2, RDS, Lambda, SNS, CloudWatch, EMR, Athena) and tools such as PySpark, Hive, and Airflow.
- Designed and developed custom Boto3 scripts to automate provisioning, monitoring, and management of AWS resources, significantly reducing manual effort and operational costs (see the sketch following this overview).
- Proficient in version control and CI/CD practices using Git, GitHub, Bitbucket, AWS CodeCommit, Jenkins, Docker, and Terraform. Experienced in building ETL pipelines in GCP Airflow, utilizing diverse operators for streamlined data processing.
- Leveraged dbt features such as models, snapshots, and tests to orchestrate complex data transformations and ensure data quality and consistency.
- Designed and implemented Apache Airflow deployments on cloud infrastructure providers such as AWS, GCP, and Azure, leveraging managed services for scalability, reliability, and cost efficiency.
- Proficient in architecting and overseeing resilient, scalable data warehousing solutions using Snowflake, Redshift, and Teradata. Specializes in optimizing data storage, safeguarding data integrity, and streamlining data retrieval processes to bolster advanced analytics and business intelligence efforts.
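As a minimal sketch of the Boto3 automation highlighted in the overview, the following Python script snapshots tagged EBS volumes and publishes a completion notice to SNS. The region, tag values, and topic ARN are hypothetical placeholders assumed for illustration, not details taken from the resume.

import boto3

REGION = "us-east-1"  # assumed region for illustration
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:ops-alerts"  # placeholder ARN

ec2 = boto3.client("ec2", region_name=REGION)
sns = boto3.client("sns", region_name=REGION)

def snapshot_tagged_volumes(tag_key="Backup", tag_value="daily"):
    """Create EBS snapshots for every volume carrying the given tag."""
    volumes = ec2.describe_volumes(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    )["Volumes"]
    snapshot_ids = []
    for vol in volumes:
        snap = ec2.create_snapshot(
            VolumeId=vol["VolumeId"],
            Description=f"Automated snapshot of {vol['VolumeId']}",
        )
        snapshot_ids.append(snap["SnapshotId"])
    return snapshot_ids

def notify(snapshot_ids):
    """Publish a short completion notice so operators see the result."""
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject="EBS snapshot run complete",
        Message=f"Created {len(snapshot_ids)} snapshot(s): {snapshot_ids}",
    )

if __name__ == "__main__":
    notify(snapshot_tagged_volumes())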
Technical Skills:
Data Processing and Analytics: Hadoop Ecosystem, Airflow, PySpark, Hive, Spark-SQL
Cloud Platforms and Services: AWS (Glue, S3, RDS, Lambda, SNS, CloudWatch, EMR, Athena), Google Cloud Platform (GCP) with Airflow, Azure (Databricks, Azure SQL Data Warehouse), AWS Redshift, Google BigQuery, Azure SQL Data
Data Visualization and Reporting: Power BI, Tableau, SSIS, SSRS
Data Warehousing: Snowflake, Teradata
Database Management and Optimization: MySQL, Oracle DB, PostgreSQL
Machine Learning and Data Science: Scikit-Learn, TensorFlow, PyTorch, Keras
DevOps and Version Control: Git, GitHub, Bitbucket, AWS CodeCommit, Jenkins, Docker, Terraform
Data Integration and ETL: Sqoop, Databricks
Scripting and Programming: Python, Bash, Java, Django, C++
Data Governance and Metadata Management: Collibra DGC REST API

Certifications:
- AWS Certified Solutions Architect
- Microsoft Certified Azure Data Engineer

Professional Experience:

Berkadia, New York, NY | Sep 2023 - Present
Role: Senior Data Analyst / Data Engineer (Remote)
- Optimized Azure infrastructure, employing Terraform for automation, Datadog for comprehensive monitoring, and Azure Data Factory, Synapse, and Event Grid to refine data operations, achieving top-tier system efficiency and reliability.
- Spearheaded complex data migrations from heterogeneous databases (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake, leveraging Azure Data Factory for seamless data transitions and enhanced storage capabilities.
- Took full ownership of the ETL lifecycle, from migration and pipeline development to data transformation and modeling within Azure, ensuring impeccable data quality and integration.
- Conducted performance tuning and optimization of dbt models and queries to improve processing efficiency, reduce resource consumption, and enhance overall system performance.
- Used AWS SDKs (boto3 for Python) to interact with AWS services from within PySpark scripts and Airflow DAGs.
- Defined project scopes, milestones, and deliverables for dbt-based initiatives, managing resources and timelines effectively to deliver projects on time and within budget.
- Implemented error handling and monitoring within Airflow DAGs to track task execution status, handle failures, and send alerts or notifications when issues occurred.
- Utilized dbt to design and implement efficient data models, optimizing data transformations for improved analysis and reporting.
- Designed and implemented scalable ETL processes to efficiently extract, transform, and load large volumes of data from diverse sources into data warehouses and data lakes.
- Designed and implemented data models within Snowflake, including schema design, table structures, and relationships.
- Developed advanced analytical models and predictive analytics within Tableau to forecast trends and drive business strategy.
- Launched EMR clusters dynamically from Airflow to perform distributed data processing tasks using Apache Spark, Hadoop, and other frameworks.
- Configured and managed EC2 instances to run applications and workloads in the cloud, including selection of instance types, sizing, and capacity planning based on workload requirements.
- Integrated Kafka with Spark for efficient error logging in Postgres databases. Developed NiFi data pipelines within Docker containers, optimizing log file processing with Elasticsearch and Kafka.
- Orchestrated the integration of Apache Airflow with AWS S3, designing complex DAGs to optimize workflow management. Pioneered serverless ETL workflows with Glue, enhancing data pipeline automation and efficiency.
- Designed and implemented highly available and scalable AWS architectures using services such as EC2, S3, RDS, Lambda, and VPC, meeting stringent performance and security requirements.
- Designed and executed Python Spark AWS Glue ETL jobs to process, aggregate, and transform data into Parquet format, subsequently distributing processed data to S3 and DynamoDB for enhanced accessibility and analysis (see the sketch following this section).
- Designed and implemented IAM policies and security controls to ensure compliance with industry standards and regulations.
- Implemented infrastructure automation strategies to achieve Infrastructure as Code (IaC) maturity within the organization.
- Extensive experience in Jira administration, including configuring workflows, issue types, permission schemes, and custom fields to meet the unique needs of the organization.
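The Glue ETL bullet above describes a process-aggregate-write-Parquet flow; the following is a minimal AWS Glue (PySpark) job sketch of that pattern under assumed inputs. The Glue Data Catalog database and table names, the aggregation columns, and the S3 bucket are hypothetical placeholders, and the DynamoDB fan-out step is omitted for brevity.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw records registered in the Glue Data Catalog (placeholder names).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="transactions"
).toDF()

# Aggregate daily totals per account before writing out.
daily = (
    raw.groupBy("account_id", "txn_date")
       .agg(F.sum("amount").alias("total_amount"),
            F.count("*").alias("txn_count"))
)

# Persist the result as date-partitioned Parquet in S3 (placeholder bucket).
daily.write.mode("overwrite").partitionBy("txn_date").parquet(
    "s3://example-curated-bucket/daily_totals/"
)

job.commit()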
Senior AWS Data Analyst / Data Engineer
Client: Elevance Health, Alexandria, Virginia, United States | July 2021 - Aug 2023
- Spearheaded the strategic migration and optimization of data ingestion frameworks to AWS, employing CloudWatch, CloudTrail, and AWS Glue. Enhanced monitoring and ETL performance, embodying a cloud-first strategy.
- Triggered Airflow tasks based on messages in SQS queues using the SQS sensor and executed tasks in response to SNS notifications (see the sketch following this section).
- Implemented data pipelines for real-time and batch data ingestion from various sources into Snowflake, ensuring data consistency, accuracy, and timeliness.
- Leveraged Tableau's extensive features, such as calculated fields, parameters, sets, and filters, to create interactive visualizations.
- Collaborated closely with business stakeholders, data analysts, and data scientists to understand requirements, define data models, and deliver insights-driven solutions using dbt.
- Scheduled data loading and transformation tasks in Airflow to keep Redshift tables up to date.
- Created impactful visualizations and dashboards with Tableau and QuickSight, connecting to various data sources for comprehensive data analysis, and implemented advanced visualization techniques for insightful data presentation.
- Integrated Snowflake with relevant AWS services (e.g., S3 for data storage, IAM for access management) to create a robust and scalable data management ecosystem.
- Revolutionized AWS operations through advanced Python, PySpark, and Scala scripting for data automation, including snapshot generation and data ingestion from diverse sources. Developed sophisticated scripts for deep data analysis, significantly improving data processing efficiency.
- Worked on visualizing insights from data using SQL, Python, and visualization tools integrated with Airflow.
- Utilized AWS Redshift, S3, and Athena for executing queries on large-scale datasets, leading to the formation of a virtual data lake that modernized traditional ETL workflows. Ensured seamless integration between AWS Glue and Amazon S3 for optimal data lake functionality.
- Implemented performance tuning strategies in dbt to enhance query efficiency and reduce processing times.
- Orchestrated the integration of Apache Airflow with AWS S3, designing complex DAGs to optimize workflow management. Pioneered serverless ETL workflows with AWS Glue, enhancing data pipeline automation and efficiency.
- Applied security principles and best practices in cloud environments, implementing security controls, encryption, IAM policies, and network security configurations using Terraform.
- Identified performance bottlenecks in ETL workflows and implemented optimization techniques to enhance system throughput and resource utilization.
- Executed the deployment of cloud infrastructure components, including SNS, SQS, Lambda, and EMR, across multiple environments, and managed cloud resources with Terraform and Hortonworks Manager for Hadoop clusters.
- Integrated Kafka with Spark for efficient error logging in Postgres databases. Developed NiFi data pipelines within Docker containers, optimizing log file processing with Elasticsearch and Kafka.
- Performed data transfers with Sqoop and managed diverse NoSQL databases, including Cassandra and MongoDB. Leveraged AWS Glue Data Catalog for streamlined metadata management and querying.
- Implemented data quality checks and validation rules to ensure the accuracy, completeness, and consistency of data across the ETL pipeline.
- Used Python, PySpark, and Scala within the AWS ecosystem for automation, data ingestion, and complex data analysis.
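A minimal Airflow DAG sketch of the SQS-triggered pattern mentioned in the Elevance Health section follows, assuming Airflow 2.x with the Amazon provider package (apache-airflow-providers-amazon) installed. The DAG id, queue URL, schedule, and load logic are hypothetical placeholders.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.sensors.sqs import SqsSensor

def load_batch(**context):
    # Placeholder for the actual load step (e.g., a Snowflake or Redshift
    # COPY executed through the corresponding provider hook).
    messages = context["ti"].xcom_pull(task_ids="wait_for_file_event", key="messages")
    print(f"Processing {len(messages or [])} SQS message(s)")

with DAG(
    dag_id="sqs_triggered_ingest",          # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",            # assumed cadence
    catchup=False,
) as dag:
    # Wait for new-file events published to an SQS queue (placeholder URL).
    wait_for_file_event = SqsSensor(
        task_id="wait_for_file_event",
        sqs_queue="https://sqs.us-east-1.amazonaws.com/123456789012/ingest-events",
        max_messages=10,
    )

    ingest = PythonOperator(
        task_id="ingest_batch",
        python_callable=load_batch,
    )

    wait_for_file_event >> ingest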
Senior AWS Data Analyst / Data Engineer
Client: MasterCard, New York, United States | Jan 2019 - June 2021
- Integrated external vendor data into internal data warehousing systems, ensuring seamless data flow and storage. Employed advanced data cleansing, formatting, and quality assurance techniques to maintain data integrity before warehousing.
- Instituted robust security protocols within Snowflake data warehousing environments, including role-based access control, data encryption, and comprehensive auditing, to uphold data protection standards and regulatory compliance.
- Managed and maintained Aurora databases, including provisioning, scaling, and monitoring. Implemented database clustering and replication for high availability and disaster recovery.
- Implemented caching mechanisms, data partitioning strategies, and incremental loading techniques to optimize ETL job execution.
- Worked closely with business stakeholders to translate requirements into dbt models, facilitating data-driven decision-making.
- Designed and implemented Apache Airflow deployments on cloud infrastructure providers such as AWS, GCP, and Azure, leveraging managed services for scalability, reliability, and cost efficiency.
- Leveraged Tableau for sophisticated data analysis, identifying key trends and crafting compelling data narratives to support evidence-based decision-making by business stakeholders.
- Engineered interactive and visually engaging Tableau dashboards and reports, delivering critical business insights and enabling stakeholders to derive actionable intelligence.
- Wrote complex, performance-optimized SQL queries tailored to Snowflake's environment for data analysis and reporting.
- Implemented performance tuning strategies in dbt to enhance query efficiency and reduce processing times.
- Utilized AWS S3 in Airflow to trigger tasks based on file presence or modification in S3 buckets.
- Developed audit notebooks in Databricks using PySpark/Python for meticulous data validation, and automated daily job executions to ensure ongoing data accuracy and reliability (see the sketch following this section).
- Demonstrated proficiency with an array of AWS services, including S3, EC2, EMR, Lambda, Redshift, CloudWatch, SNS, and IAM, to support diverse data engineering tasks and cloud-based data solutions.
- Worked on visualizing insights from data using SQL, Python, and visualization tools integrated with Airflow.
- Designed and executed Python Spark AWS Glue ETL jobs to process, aggregate, and transform data into Parquet format, subsequently distributing processed data to S3 and DynamoDB for enhanced accessibility and analysis.
- Orchestrated the deployment and management of Apache Spark and Hadoop clusters via Amazon EMR, facilitating scalable data processing and advanced analytics across large datasets.
- Adopted CI/CD pipelines using Jenkins and GitHub Actions to automate the deployment of data applications and ETL jobs, significantly reducing deployment times and improving operational efficiency.
- Implemented real-time data streaming solutions using Apache Kafka and AWS Kinesis for immediate data ingestion and processing, enabling faster decision-making and dynamic data analysis.
- Utilized machine learning models in conjunction with PySpark for predictive analytics and trend forecasting, adding a layer of intelligence to data analysis efforts and driving forward-looking business strategies.
- Engaged in data lake architecture development leveraging AWS Lake Formation, streamlining data storage and management while facilitating seamless access and analysis of structured and unstructured data.
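The Databricks audit-notebook bullet above describes daily data-validation runs; the following is a minimal PySpark sketch of that kind of check. The table name, key columns, and check thresholds are hypothetical placeholders, and the failure behavior (raising an exception so a scheduler flags the run) is an assumed convention.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_audit").getOrCreate()

# Placeholder source table; in Databricks this would typically be a Delta table.
txns = spark.table("analytics.daily_transactions")

checks = {
    # Completeness: business keys must never be null.
    "null_account_ids": txns.filter(F.col("account_id").isNull()).count(),
    # Uniqueness: duplicate (account_id, txn_id) pairs indicate a bad load.
    "duplicate_keys": (
        txns.groupBy("account_id", "txn_id").count().filter("count > 1").count()
    ),
    # Freshness: rows must exist for the current processing date.
    "rows_for_today": txns.filter(F.col("txn_date") == F.current_date()).count(),
}

failures = [
    name for name, value in checks.items()
    if (name == "rows_for_today" and value == 0)
    or (name != "rows_for_today" and value > 0)
]

print(checks)
if failures:
    # Raising lets the scheduling layer (Airflow, Databricks Jobs) mark the run failed.
    raise RuntimeError(f"Audit checks failed: {failures}")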
Senior Data Analyst / Data Engineer
Client: All State / Atos Syntel (Offshore) | Sep 2016 - Dec 2017
- Used AWS CloudFormation for infrastructure as code, automating AWS resource management for improved efficiency.
- Orchestrated containerized workloads with Kubernetes, enabling seamless application deployment and scaling.
- Utilized Pig for advanced ETL transformations and Sqoop for importing data from relational databases into HDFS, streamlining data preparation for analysis.
- Designed and implemented hybrid cloud architectures using AWS and Azure, enhancing system interoperability and leveraging Azure Data Lake, Storage, SQL, and Data Warehouse for comprehensive data ingestion and processing.
- Developed efficient data pipelines within Azure utilizing Data Factory and Databricks, optimizing data flow and processing.
- Designed Azure IAM solutions to manage access controls and ensure compliance with security frameworks, supporting secure cloud operations.
- Deployed and refined Apache NiFi flows, complemented by Python scripts for data integrity checks, enhancing data management across platforms.
- Created scalable and fault-tolerant cloud architectures on Azure, ensuring high availability for business-critical applications.
- Implemented flexible parameterization in Azure Data Factory pipelines, accommodating changes in data sources and processing requirements efficiently.
- Configured and maintained Hadoop ecosystem components, including HDFS, MapReduce, Pig, Hive, and Sqoop, to support big data initiatives.
- Employed Hive for effective querying of structured log data and utilized Oozie for efficient job scheduling, streamlining data analysis workflows.

Data Analyst / Data Engineer
Client: Ctrl S, India | June 2013 - Aug 2016
- Ensured data accuracy and integrity by implementing data quality checks and measures, resulting in reliable data availability for downstream analytics and decision-making.
- Built ETL jobs with Pig scripts to perform transformations, joins, and pre-aggregations before storing the data onto HDFS (see the sketch following this section).
- Led the setup, configuration, and administration of large-scale Hadoop clusters, including HDFS, YARN, MapReduce, and Hadoop ecosystem components (e.g., Hive, HBase, Kafka), ensuring high availability, reliability, and performance.
- Implemented data enrichment and aggregation processes.
- Monitored and optimized the pipeline for low-latency data processing.
- Worked on scalable distributed data systems using the Hadoop ecosystem in AWS EMR and MapReduce.
- Utilized AWS Elastic MapReduce (EMR) to process and analyze large-scale datasets for a specific business use case.
- Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
- Architected and implemented data lake solutions using Hadoop HDFS and distributed file systems (e.g., Amazon S3, Data Lake Storage) to store, manage, and analyze structured and semi-structured data at scale.
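The Pig ETL bullet in the Ctrl S section describes a join-and-pre-aggregate flow before landing data on HDFS; the resume describes this work in Pig, but it is sketched below in PySpark to keep all examples in one language. The input paths, columns, and output location are hypothetical placeholders.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("join_and_preaggregate").getOrCreate()

# Raw event and reference data already landed on HDFS (placeholder paths).
events = spark.read.csv("hdfs:///data/raw/events/", header=True, inferSchema=True)
customers = spark.read.csv("hdfs:///data/raw/customers/", header=True, inferSchema=True)

# Join events to customer attributes, then pre-aggregate per customer and day,
# mirroring a Pig JOIN followed by GROUP BY ... FOREACH ... GENERATE flow.
enriched = events.join(customers, on="customer_id", how="inner")
daily_summary = (
    enriched.groupBy("customer_id", "region", "event_date")
            .agg(F.count("*").alias("event_count"),
                 F.sum("amount").alias("total_amount"))
)

# Store the pre-aggregated result back onto HDFS for downstream consumers.
daily_summary.write.mode("overwrite").parquet("hdfs:///data/curated/daily_summary/")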
Education:
- Master's in Data Science, Pace University, New York, Dec 2018
- Bachelor's in Computer Science, Gitam University, India, April 2013
