Candidate's Name
PHONE NUMBER AVAILABLE
EMAIL AVAILABLE

PROFESSIONAL SUMMARY
Strong experience in Data Engineering with Big Data and AWS technologies.
Good understanding of data governance principles, data quality, data stewardship practices, and regulatory compliance requirements.
Good understanding of data-related technologies and tools such as ETL, data warehouses, and data lakes.
Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
Expertise in working with different kinds of data files such as XML, JSON, Parquet, and Avro, as well as databases.
Experience in shell and Python scripting languages.
Experience in data management, metadata management, data governance, and information management in an environmental context.
Good experience working on both Hadoop distributions: Cloudera and Hortonworks.
Designed and developed a fully automated, self-service data ingestion application for a cloud Enterprise Data Lake.
Experience importing and exporting data between HDFS and databases such as MySQL, Oracle, and Teradata using Sqoop.
Hands-on experience working on Spark using Python.
Performed various actions and transformations on Spark RDDs and DataFrames (an illustrative sketch follows this summary).
Excellent analytical and problem-solving skills to identify and resolve data quality issues.
Proficient in data profiling and data analysis techniques using tools like SQL, Excel, or data profiling software.
Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster processing of data.
Skilled in using the boto3 package (Python) to interact with AWS services such as S3.
Hands-on experience using AWS EMR (Hive, Spark, Zeppelin, and other services).
Experience and knowledge of AWS Redshift.
Experienced with batch processing of data sources using Apache Spark.
Hands-on experience writing Python scripts for data extraction and data transfer from various data sources.
Experienced in implementing Spark RDD transformations and actions to support business analysis.
Good understanding of data visualization and reporting tools such as Qlik Sense.
Ability to determine work priorities and ensure proper completion of work assignments.
Worked on the Matillion and DBT ETL tools and created jobs and workflows.
Experience with Redshift Serverless, data sharing, and cluster maintenance.
Experience with agile software development methodologies (e.g., Scrum).
Experience with NiFi to create data flows for different sources and different types of data.
Experience with Snowflake and involved in migration activities.
Ability to work independently and to establish and maintain effective working relationships with others.
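To make the Spark-with-Python experience above concrete, here is a minimal, illustrative PySpark sketch; it is not taken from any of the projects below, and the S3 paths and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical sketch: read JSON events from S3, apply DataFrame transformations,
# and write the aggregated result back to S3 as partitioned Parquet.
spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

events = spark.read.json("s3://example-bucket/raw/events/")

daily_counts = (events
                .filter(F.col("status") == "ACTIVE")             # transformation
                .withColumn("event_date", F.to_date("event_ts"))
                .groupBy("event_date", "event_type")
                .count())

daily_counts.show(10)                                             # action
(daily_counts.write
             .mode("overwrite")
             .partitionBy("event_date")
             .parquet("s3://example-bucket/curated/daily_counts/"))

spark.stop()
```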

SOFTWARE EXPOSURE
Scripting Languages: Shell scripting, Python, Unix scripting
Big Data Technologies: HDFS, EMRFS, MapReduce, Hive, HQL, Pig, Sqoop, Flume, Spark, Zookeeper, Oozie, Kafka, HBase, EMR
Programming Languages: Python, SQL
Databases: Oracle 10g/9i, MS SQL Server 2000/2005/2008, MySQL, Teradata, MongoDB, Redshift, Snowflake
Data Warehousing and Reporting Tools: SSIS, SSRS, DBT, Matillion
IDEs: NetBeans, Eclipse, IDLE
Virtual Machines: VMware, VirtualBox
Operating Systems: CentOS 5.5, Unix, Red Hat Linux, Windows 7, Ubuntu

PROFESSIONAL EXPERIENCE

FOX Corporation - Los Angeles, CA    Sep 2018 - Present
Manager, Data Engineering
The DCG Data Services project provides for the data needs of all projects and applications across the FOX organization. The team is building a Redshift-based data mart, implementing ETL processes, integrating with various marketing platforms (SAS, Salesforce) and other systems, and building a data lake in S3 while maintaining, monitoring, and improving a real-time, scalable, fault-tolerant data processing pipeline.
Responsibilities:
Lead the data engineering team with management responsibilities, including role and task assignments, mentoring, professional development, training, performance evaluations, staffing, and developing operating procedures and policies.
Utilize data analytics capabilities to monitor KPIs and report to internal stakeholders, including operations and program leadership.
Engage with distributed cross-functional teams as the focal point for the technical performance of operations.
Worked on different POCs and onboarded new tools and technologies.
Hire, coach, and mentor individuals, including regular one-on-one discussions.
Worked on data governance and implemented CCPA and iOS delete processes.
Created data architecture/modeling designs and worked on putting the standards into practice.
Collaborated with the infrastructure and infosec teams on infrastructure provisioning and on architecture and security reviews.
Involved in building the Redshift-based data mart and creating the data lake in S3.
Implemented ETL processes using Matillion, worked with all of its components, and created workflows.
Worked closely with architects and business teams and provided general guidance, best practices, and design patterns.
Provided best-practice and error-handling techniques for S3 and Redshift.
Presented new ideas to automate deployments in Redshift and Matillion using version control (Git).
Worked on automation of jobs such as granting access to all schemas for users.
Created Spectrum external tables using the Glue catalog and used Glue ETL to convert data into different formats.
Developed a Glue framework to create multiple Spectrum tables by providing multiple S3 paths at once.
Worked with DBT and created ETL models.
Worked on EMR, using different cluster types and tuning for the best ETL performance in PySpark.
Created and worked on a framework to implement ETL using transient EMR clusters, Lambda, SQS, SNS, CloudWatch rules, Matillion, and Redshift (see the sketch after this section).
Created a dependency framework for ETL jobs so that target jobs run only after their source jobs complete, thereby avoiding missed SLAs and preventing partial data from being sent to other teams.
Worked on unloading raw data from clusters into a new S3 structure, gaining the benefit of additional capacity without having to resize the cluster.
For monitoring, created triggers and job alerts using SQS and SNS.
As Disney separated from FOX, worked on the migration and separation of components.
Worked on cross-account S3 buckets, moved buckets between accounts, and provided S3 policies and encryption.
Worked with the DevOps team to create CI/CD pipelines, Git integration, structured deployments, and Splunk dashboards.
Involved in creating a file ingestion framework, worked with different third-party integrations, and loaded data into Redshift and S3.
Involved in configuration and development of AWS cloud services such as Lambda, S3, EC2, EMR, and Redshift.
Worked on the dimensional data model and involved in data model discussions.
Worked on an automated unload and purging framework.
Involved in developing and deploying reports and Power BI dashboards to mobile platforms.
Worked on the automation of advanced VACUUM and sorting using Matillion.
Guided team members by offering support, advice, and best-practice recommendations throughout project implementations.
Worked on Snowflake and migrated some pipelines from Redshift to Snowflake; automated the Snowflake jobs using Snowpipe and Snowpark.
Worked on some pipelines using Airflow and orchestrated the jobs.
Embedded BI dashboards using Amazon QuickSight.
Involved in a BI consolidation project; created data shares for the required objects and provided the required permissions to the teams.
Environment: SQL, Redshift, S3, Glue, Matillion, Lambda, EC2, EMR, Git, AWS, Power BI, Athena, Spark, Python, Looker, Tableau, Amazon QuickSight, Snowflake, DBT, Airflow, Astronomer.
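As a rough illustration of the transient-EMR pattern referenced above, the sketch below uses boto3 to launch a short-lived EMR cluster that runs one PySpark step and then terminates, publishing a job alert to SNS. The cluster settings, script location, and SNS topic ARN are hypothetical placeholders; the actual framework also involved Lambda, SQS, CloudWatch rules, and Matillion, which are not shown here.

```python
import boto3

emr = boto3.client("emr", region_name="us-west-2")
sns = boto3.client("sns", region_name="us-west-2")

# Launch a transient cluster: it runs the submitted step and then terminates.
response = emr.run_job_flow(
    Name="transient-etl-sketch",
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,   # makes the cluster transient
    },
    Steps=[{
        "Name": "pyspark-etl-step",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/jobs/etl_job.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)

# Publish a job alert so downstream monitoring knows the run has started.
sns.publish(
    TopicArn="arn:aws:sns:us-west-2:123456789012:etl-job-alerts",
    Message=f"Transient EMR cluster {response['JobFlowId']} launched for ETL run.",
)
```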

American Water - Cherry Hill, NJ    Jan 2018 - Aug 2018
Sr. AWS Big Data Consultant
American Water is the largest and most geographically diverse publicly traded U.S. water and wastewater utility company. The company employs more than 6,900 dedicated professionals who provide regulated and market-based drinking water, wastewater, and other related services to an estimated 15 million people in 46 states and Ontario, Canada. American Water provides safe, clean, affordable, and reliable water services to its customers to keep their lives flowing.
Responsibilities:
Built NiFi processors for ingesting data from various sources (mostly HANA) into structured Hive tables.
Involved in analyzing data coming from various sources and creating meta-files and control files to ingest the data into the data lake.
Designed and developed using HDFS, Hive, Kafka, Ambari, Hive CLI, and Spark.
Developed complete end-to-end big data processing in the Hadoop ecosystem.
Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
Designed and developed NiFi flows to ingest time-series (SCADA, IoT) data (a PySpark streaming sketch follows this section).
Analyzed and translated the client's needs into an industry-leading enterprise big data architecture that seamlessly integrates the HANA system and data assets with the new strategy.
Responsible for designing, testing, deploying, and documenting the big data platform and analytics procedures and their outputs.
Architected, designed, and developed scalable and distributed applications using the Hadoop technology stack, including Kafka, Apache Hive, and HDFS.
Involved in configuration and development of the Hadoop environment on AWS cloud services such as Lambda, S3, EC2, EMR (Elastic MapReduce), and Redshift.
Worked on Glue to create Spectrum tables and triggered scheduled jobs.
Partnered with data scientists, analysts, and stewards to provide summary results of data analysis, which were used to make decisions on how to measure business rules and the quality of the data.
Involved in developing and deploying reports and Power BI dashboards to mobile platforms.
Employed extreme attention to detail and flexibility to adapt to dynamic client environments and changing business, operations, and technology priorities.
Environment: NiFi, SAP HANA, Apache Hadoop, HDFS, Hive, Spark, Scala, Python, HBase, Beeline, MapReduce, Kafka, Zookeeper, Git, AWS, Power BI.
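The streaming work in this role was written in Scala; purely as an illustration in Python (to match the other sketches here), the snippet below reads SCADA/IoT readings from a hypothetical Kafka topic with Spark Structured Streaming and lands them as Parquet files that a Hive external table could sit over. The broker address, topic, and S3 paths are assumed placeholders, and the Kafka connector package must be on the Spark classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("scada-stream-sketch")
         .getOrCreate())

# Read raw SCADA/IoT readings from a hypothetical Kafka topic
# (requires the spark-sql-kafka connector on the classpath).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "scada-readings")
       .load())

# Kafka delivers key/value as binary; cast the payload to string for downstream parsing.
readings = raw.select(col("value").cast("string").alias("payload"))

# Land the stream as Parquet files that an external Hive table could sit over.
query = (readings.writeStream
         .format("parquet")
         .option("path", "s3://example-lake/scada/raw/")
         .option("checkpointLocation", "s3://example-lake/checkpoints/scada/")
         .start())
query.awaitTermination()
```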

Aetna - West Hartford, CT    Jun 2016 - Dec 2017
Data Engineer
Aetna helps health care organizations enhance the patient experience at home, in the hospital, and in the physician's office through innovative solutions such as web self-service applications and technology that streamlines everyday patient interactions and improves patient flow through the health care process.
Responsibilities:
Worked as a Big Data Engineer on the team dealing with platform issues, providing data analysis for the team as well as developing enhancements.
Involved in creating Hive tables and in loading and analyzing data using Hive queries.
Developed Hive queries to process the data and generate data cubes for visualization.
Implemented schema extraction for Parquet and Avro file formats in Hive.
Involved in working with large big data sets consisting of various security logs.
Involved in developing Spark Streaming jobs by writing RDDs and developing DataFrames using Spark SQL as needed.
Built backend NiFi ETL transformations.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
Used Hive to analyze data ingested into HBase through Hive-HBase integration and compute various metrics for reporting on the dashboard.
Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response (see the sketch after this section).
Built and implemented ETL processes using big data tools such as Spark (Scala/Python) and NiFi.
Developed customized classes for serialization and deserialization in Hadoop.
Analyzed large data sets to determine the optimal way to aggregate and report on them.
Developed predictive analytics using Apache Spark Scala APIs.
Involved in big data analysis using Pig and user-defined functions (UDFs).
Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
Environment: PL/SQL, Python, Apache Hadoop, HDFS, Hive, Spark, AWS, Pig, Beeline, Sqoop, MapReduce, Kafka, Oozie, Impala, NiFi, Flume, Zookeeper, Git.
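As an illustrative sketch of the RDD-based in-memory computation described above (the log path and line layout are assumptions, not details from the project), the snippet loads security-log lines into an RDD, applies transformations, and runs actions to produce simple metrics.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("security-log-metrics-sketch").getOrCreate()
sc = spark.sparkContext

# Hypothetical log path; each line is assumed to look like "timestamp level component message".
logs = sc.textFile("hdfs:///data/security_logs/2017/")

# Transformations: parse, keep warnings/errors, key by component.
events = (logs.map(lambda line: line.split(" ", 3))
              .filter(lambda parts: len(parts) == 4 and parts[1] in ("WARN", "ERROR"))
              .map(lambda parts: (parts[2], 1)))

# Cache before reuse, then run actions to materialize the aggregated metrics in memory.
events.cache()
counts_by_component = events.reduceByKey(lambda a, b: a + b).collect()
total_alerts = events.count()

for component, n in sorted(counts_by_component, key=lambda kv: -kv[1]):
    print(component, n)
print("total WARN/ERROR events:", total_alerts)

spark.stop()
```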

IGATE Global Solutions, India    Apr 2013 - Dec 2015
SQL Developer
DLSS (Direct Loan Servicing System) receives all booked Direct Loans and maintains them for their duration. It services Direct Loans while students are in school, in grace, and in repayment periods. The DLSS also establishes payment plans for borrowers; maintains and updates borrowers' loans; provides customer service, billing, and collection services; and collects on delinquent loans. In addition, the DLSS grants borrowers' deferments, forbearances, and loan discharges; reports to the credit bureaus; and provides all other activities necessary to service Direct student loans properly.
Responsibilities:
Created complex stored procedures, functions, triggers, tables, indexes, views, SQL joins, and T-SQL queries to test and implement business rules.
Worked closely with the DBA team to regularly monitor the system for bottlenecks and implement appropriate solutions.
Created non-clustered indexes to improve query performance and optimization.
Maintained and managed databases and stored procedures using SQL Server tools such as the performance tuner and SQL Profiler.
Involved in package migration from DTS to SSIS: ran Upgrade Advisor against DTS packages before migration, troubleshot issues, and converted packages to SSIS through the wizard.
Extracted data from flat and Excel files and loaded it into the SQL Server database using Bulk Insert.
Created SQL queries to extract and compare data across the different sources.
Created SSRS reports from the SQL Server tables for different clients' data analysis needs.
Involved in data analysis, comparison, and validation.
Created ETL packages to validate, extract, transform, and load data into the data warehouse and data marts.
Developed SSIS packages for extracting data from the file system and transforming and loading it into OLAP.
Created reports using global variables, expressions, and functions in MS SQL Server Reporting Services.
Designed and delivered dynamic reporting solutions using MS SQL Server Reporting Services.
Applied conditional formatting in SSRS to highlight key areas in the report data.
Heavy integration of AS/400 IBM DB2 with SQL Server using the StarQuest replication tool.
Used Report Builder for ad-hoc reporting.
Developed various types of complex reports, such as drill-down, drill-through, gauge, pie chart, bar chart, and sub-reports.
Used Reporting Services to schedule various reports to be generated at predetermined times.
Created stored procedures for commonly used complex queries involving joins and unions of multiple tables.
Created views to enforce security and data customization.
Environment: SQL Server (SSIS, SSRS, SSAS) 2008 R2/2008/2005, DB2, HP Quality Center 10.0, Visual Studio 2005, XML, XSLT, MS Office, and Visual SourceSafe.

EDUCATION
Master's Degree in Information Assurance, Wilmington University, Delaware.

References available upon request