Candidate's Name
Email: EMAIL AVAILABLE
PH: PHONE NUMBER AVAILABLE
LinkedIn: LINKEDIN LINK AVAILABLE

Senior Data Engineer

Professional Summary
9+ years of experience in Big Data environments and the Hadoop ecosystem, including 4 years of experience on AWS and Azure. Interacts with business users to analyze business processes and requirements, transforming requirements into data warehouse designs, documenting, and rolling out the deliverables.
Strong working experience in cloud data migration using AWS and Snowflake.
Experience with data integration platforms such as AWS Glue.
Extensive knowledge of Spark Streaming, Spark SQL, and other Spark components such as accumulators, broadcast variables, the various levels of caching, and optimization techniques for Spark jobs.
Hands-on experience with big data ecosystem implementation, including Hadoop MapReduce, NoSQL, Informatica Cloud services, Fivetran, Apache Spark, PySpark, Python, Scala, Hive, Impala, Sqoop, Kafka, AWS, Azure, and Oozie.
Defined product requirements and created high-level architectural specifications to ensure that existing platforms are feasible and functional.
Benchmarked prototyped components and provided templates for development teams to test design solutions.
Familiar with data processing performance optimization techniques such as dynamic partitioning, bucketing, file compression, and cache management in Hive, Impala, and Spark (see the sketch after this summary).
Experience with data formats such as JSON, Avro, Parquet, RC, and ORC, and compression codecs such as Snappy and bzip2.
Successfully completed a proof of concept for an Azure implementation, with the larger goal of migrating on-premises servers and data to the cloud.
Working knowledge of the AWS environment, including Spark on AWS, Snowflake, Lambda, Redshift, DMS, EMR, RDS, and EC2, with strong experience across the AWS stack of cloud computing services.
Migrated an existing on-premises application to AWS; used services such as EC2 and S3 for processing and storing small data sets, and maintained the Hadoop cluster on AWS EMR.
Experience with data pipelines and the phases of ETL/ELT processing, converting big-data/unstructured data sets (JSON, log data) into structured data sets for product analysts and data scientists.
As a Data Engineer, responsible for data modeling, data migration, design, and ETL pipeline preparation for both cloud and Exadata platforms.
Extensive experience with Teradata, Oracle, SQL, PL/SQL, Informatica, UNIX shell scripts, SQL*Plus, and SQL*Loader for data warehouse ETL architecture and development.
Extensive experience integrating data sources such as SQL Server, DB2, PostgreSQL, Oracle, and Excel.
Strong data warehousing knowledge with Informatica, including considerable experience creating tasks, workflows, mappings, and mapplets, and scheduling workflows and sessions.
Experienced in applying object-oriented programming (OOP) concepts in Python.
Solid knowledge of usability engineering and user interface design and development.
Outstanding knowledge of reporting tools such as Power BI, Data Studio, and Tableau.
Strong back-end skills, including creating SQL objects such as tables, stored procedures, triggers, indexes, and views to facilitate data manipulation and consistency.
Expertise in applying SDLC and ITIL best practices.
Team handling experience, including work planning, allocation, tracking, and execution.
Relationship driven, results driven, and creative out-of-the-box thinking.
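Illustrative sketch only: a minimal PySpark example of the dynamic partitioning, bucketing, Snappy compression, and caching techniques named in the summary above. It is not code from any engagement listed below; the paths, the table name (analytics.events_bucketed), and the columns (event_date, user_id) are hypothetical.

```python
# Minimal sketch of partitioning, bucketing, compression, and caching in PySpark.
# All paths, table names, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("optimization-sketch")
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")  # dynamic partition overwrite
    .enableHiveSupport()
    .getOrCreate()
)

events = spark.read.json("s3://example-bucket/raw/events/")  # hypothetical source path

# Cache a frequently reused DataFrame at the default storage level
events.cache()

# Write partitioned, Snappy-compressed Parquet
(events.write
    .mode("overwrite")
    .partitionBy("event_date")
    .option("compression", "snappy")
    .parquet("s3://example-bucket/curated/events/"))

# Bucketed managed table to speed up joins on user_id
(events.write
    .mode("overwrite")
    .bucketBy(16, "user_id")
    .sortBy("user_id")
    .saveAsTable("analytics.events_bucketed"))
```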
Technical Skills:
Big Data Ecosystems: Hadoop, HDFS, MapReduce, Hive
Spark Ecosystem: Spark SQL, Spark Streaming (Scala)
Programming: Python
Data Warehousing: Informatica PowerCenter 9.x/8.x/7.x, Informatica Cloud, Fivetran, Talend
Applications: Salesforce, RightNow
Databases: Oracle (9i/10g/11g), SQL Server, MySQL
Cloud Platforms: AWS, Azure
BI Tools: Tableau 9.1, Power BI
Query Languages: SQL, PL/SQL, T-SQL
Scripting Languages: Unix, Python, Windows PowerShell
RDBMS Utilities: Toad, SQL*Plus, SQL*Loader
Scheduling Tools: Airflow, AutoSys, Windows Scheduler

Professional Experience

Senior Data Engineer
AIG, Houston, TX  May 2023 to Present
Responsibilities:
Creating and maintaining an optimal data pipeline architecture.
Responsible for loading data into S3 buckets from the internal server and the Snowflake data warehouse.
Built the framework for efficient extraction, transformation, and loading (ETL) of data from a variety of data sources.
Designed and developed ETL processes using Informatica 10.4 to load data from a wide range of sources such as Oracle, flat files, Salesforce, and the AWS cloud.
Launched Amazon EC2 cloud instances (Linux/Ubuntu) on AWS and configured the launched instances for specific applications.
Worked extensively on moving data from Snowflake to S3 for the TMCOMP/ESD feeds.
For code productionization, wrote data pipeline definitions in JSON format.
Used AWS Athena extensively to import structured data from S3 into multiple systems, including Redshift, and to generate reports.
Used Spark Streaming APIs to perform the necessary conversions and operations on the fly for the common learner data model, which obtains data from Kinesis in near real time.
Developed Snowflake views to load and unload data from and to an AWS S3 bucket (see the sketch after this section), and moved the code to production.
Extracted, transformed, and loaded data sources into CSV data files using Python programming and SQL queries.
Worked in a Hadoop and RDBMS environment, designing, developing, and maintaining data integration applications that worked with both traditional and non-traditional source systems, using RDBMS and NoSQL data stores for data access and analysis.
Facilitated data integration using AWS Glue and workflows in Informatica Cloud services.
Performed advanced activities such as text analytics and processing using Spark's in-memory computing capabilities.
Used Spark SQL queries over RDDs and DataFrames, mixing Hive queries with programmatic data manipulation in Scala and Python.
Analyzed Hive data using the Spark API on an EMR cluster running Hadoop YARN.
Enhanced existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Assisted with the creation of Hive tables and the loading and analysis of data using Hive queries.
Conducted exploratory data analysis and data visualization using Python (Matplotlib, NumPy, pandas, seaborn).
Environment: AWS S3, AWS Glue, Hadoop YARN, SQL Server, Spark, Spark Streaming, Scala, Kinesis, Python, Hive, Linux, Sqoop, Informatica, Tableau, Talend, Cassandra, Oozie, Control-M, Fivetran, EMR, EC2, RDS, DynamoDB, Oracle 12c.
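Illustrative sketch only: one way to unload a Snowflake view to an S3 bucket with the Snowflake Python connector, in the spirit of the Snowflake-to-S3 feed work described above. The connection parameters, the storage integration (s3_int), the view name, and the bucket path are hypothetical placeholders, not details from this role.

```python
# Sketch: unload a Snowflake view to S3 via the Snowflake Python connector.
# Credentials come from the environment; all object names are hypothetical.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="REPORTING",
)

unload_sql = """
    COPY INTO 's3://example-bucket/feeds/example_feed/'
    FROM (SELECT * FROM reporting.example_feed_v)
    STORAGE_INTEGRATION = s3_int
    FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP FIELD_OPTIONALLY_ENCLOSED_BY = '"')
    HEADER = TRUE
    OVERWRITE = TRUE
"""

try:
    conn.cursor().execute(unload_sql)  # Snowflake writes the CSV files directly to S3
finally:
    conn.close()
```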
Data Engineer
Charter Communications, Negaunee, MI  August 2021 to April 2023
Responsibilities:
Analyzed, created, and developed modern data solutions that enable data visualization using Azure PaaS services.
Contributed to the creation of PySpark DataFrames in Azure Databricks to read data from Data Lake or Blob storage and manipulate it using the Spark SQL context (see the sketch after this section).
Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, and to write data back.
Designed, developed, and implemented large, performant ETL pipelines using PySpark and Azure Data Factory.
Created technical specification documents, such as system design and detailed design documents, for the development of Informatica extraction, transformation, and loading (ETL) mappings to load data into various tables.
Installed, configured, customized, and implemented the Alation data catalog solution and facilitated its day-to-day operation.
Worked on a cloud POC to choose the optimal cloud vendor based on a set of strict success criteria.
Integrated Spark with data storage systems, particularly Azure Data Lake and Blob storage.
Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
Improved the performance of Hive and Spark jobs.
Developed Hive scripts from Teradata SQL scripts to process data in Hadoop.
Applied Hive partitioning and bucketing concepts, building both managed and external tables in Hive to maximize performance.
Created JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity.
Created Hive tables to store the processed results and wrote Hive scripts to transform and aggregate the disparate data.
Used Hive queries to analyze massive sets of structured, unstructured, and semi-structured data.
Used HQL in Hive to write and work with complex data types when storing and retrieving data.
Worked with structured data in Hive, using advanced techniques such as bucketing, partitioning, and optimized self-joins to increase performance.
Environment: Azure Data Lake, Azure SQL, Azure Data Factory (V2), Azure Databricks, Informatica, Python 2.0, SSIS, Azure Blob Storage, Spark 2.0, Hive, Fivetran.
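Illustrative sketch only: a minimal PySpark example of reading from ADLS Gen2 into a DataFrame and reshaping it through the Spark SQL context, in the spirit of the Databricks work described above. The storage account, containers, paths, and column names are hypothetical, and ADLS credentials are assumed to be configured on the cluster.

```python
# Sketch: read raw CSV from ADLS Gen2, aggregate via Spark SQL, write curated Parquet.
# Storage account, containers, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-transform-sketch").getOrCreate()

raw_path = "abfss://landing@examplestorage.dfs.core.windows.net/usage/2023/"
usage = spark.read.option("header", "true").csv(raw_path)

# Register the DataFrame so it can be manipulated through the Spark SQL context
usage.createOrReplaceTempView("usage_raw")

daily = spark.sql("""
    SELECT account_id,
           to_date(event_ts)        AS usage_date,
           SUM(CAST(bytes AS LONG)) AS total_bytes
    FROM usage_raw
    GROUP BY account_id, to_date(event_ts)
""")

# Persist the aggregate as Parquet for downstream ADF / reporting steps
(daily.write
    .mode("overwrite")
    .partitionBy("usage_date")
    .parquet("abfss://curated@examplestorage.dfs.core.windows.net/usage_daily/"))
```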
Data Engineer
Ford, Dearborn, MI  October 2018 to July 2021
Responsibilities:
Collaborated with business users, product owners, and developers to contribute to the analysis of functional requirements.
Handled the import of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
Moved data using Sqoop from HDFS to relational database systems and vice versa, including maintenance and troubleshooting.
Created a Python script that called the Cassandra REST API, transformed the data, and loaded it into Hive.
Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
Used Cloudera Manager for installation and management of the Hadoop cluster.
Tuned Hive and Pig to improve performance and resolved performance issues in both kinds of scripts.
Analyzed the Hadoop cluster using different big data analytic tools, including Pig, the HBase database, and Sqoop.
Continuously loaded and transformed data from Amazon S3 buckets into Snowflake using Snowpipe, and used the Spark connector.
Integrated data sources from Kafka (Producer and Consumer APIs) for data stream processing in Spark on the AWS network (see the sketch after this section).
Collaborated with product business owners to understand business needs, and automated business processes and data storytelling in Tableau.
Environment: Hadoop 3.0, Hive 2.1, Pig 0.16, Sqoop, NoSQL, Java, XML, Spark 1.9, PL/SQL, Snowflake, HDFS, JSON, AWS, Tableau, Kafka.
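Illustrative sketch only: consuming a Kafka topic from Spark and landing the parsed events on S3, in the spirit of the Kafka-to-Spark integration described above. It is shown with the Structured Streaming Kafka source rather than the older producer/consumer DStream API; the broker addresses, topic, schema, and paths are hypothetical, and the spark-sql-kafka connector is assumed to be available on the cluster.

```python
# Sketch: Spark Structured Streaming job reading a Kafka topic and writing Parquet.
# Brokers, topic, schema, and output paths are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", LongType()),
])

raw = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "device-events")
    .option("startingOffsets", "latest")
    .load())

# Kafka delivers the payload as bytes; decode and parse the JSON value
events = (raw
    .selectExpr("CAST(value AS STRING) AS json")
    .select(F.from_json("json", event_schema).alias("e"))
    .select("e.*"))

query = (events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/streams/device-events/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/device-events/")
    .outputMode("append")
    .start())

query.awaitTermination()
```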
Big Data Developer
Hudda Infotech Private Limited, Hyderabad, India  September 2016 to July 2018
Responsibilities:
Developed, refined, and scaled data management and analytics procedures, systems, workflows, and best practices.
Worked with product owners to establish the design of experiments and the measurement system for the effectiveness of product improvements.
Worked with project management to provide timely estimates, updates, and status.
Worked closely with data scientists to assist with feature engineering, model training frameworks, and model deployments at scale.
Worked with application developers and DBAs to diagnose and resolve query performance problems.
Performed development and operations duties, sometimes requiring support during off-work hours.
Worked with the product management and software teams to develop features for the growing Amazon business.
Environment: PL/SQL, Python, JSON, Data Modeling

Data Analyst
Cybage Software Private Limited, Hyderabad, India  October 2014 to August 2016
Responsibilities:
Worked with business analysts, helped represent the business domain details, and prepared low-level analysis documentation.
Created Hive tables and created Sqoop jobs to import data from Oracle/SQL Server to HDFS.
Developed Oozie workflows and scheduled them in Control-M as daily jobs to load incremental updates from the RDBMS source systems.
Wrote different Pig scripts to clean up the ingested data and created partitions for the daily data.
Prepared Pig scripts and Spark SQL to handle all the transformations specified in the S2TMs and to support SCD2 and SCD1 scenarios.
Wrote different UDFs in Java to convert the date format and to create a hash value using the MD5 algorithm (see the sketch after this section).
Implemented partitioning and bucketing in Hive based on the requirements.
Involved in converting Hive SQL queries into Spark transformations using Spark SQL and Scala.
Experienced in implementing Spark RDD transformations and actions to implement business analysis.
Created Sqoop import jobs to import source tables from Microsoft SQL Server.
Created Sqoop export jobs to export target tables to Teradata and make them available to the reporting layer.
Worked with the BI and QA teams to test the application and fixed defects immediately.
Leveraged the open-source monitoring toolkit Prometheus to capture pod metrics and built sample dashboards in Splunk.
Involved in unit and integration level testing and prepared supporting documents for deployment.
Environment: HDP 2.2.4, Hadoop 2.6, Hive 0.14, Pig 0.14, HBase, Spark 1.6, Scala, Kafka, Oozie, SQL Server, Jenkins, Nexus, Shell, Java, Eclipse.
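Illustrative sketch only: a PySpark analogue of the date-format and MD5 hashing UDFs described above (the originals were Hive UDFs written in Java), kept in Python for consistency with the other sketches. The input path and column names are hypothetical, and the row hash mirrors the kind of change detection used in SCD2 loads.

```python
# Sketch: PySpark UDFs for date-format conversion and MD5 hashing of a natural key.
# Input path and columns are hypothetical placeholders; the originals were Java Hive UDFs.
import hashlib
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# Custom UDF: convert dates from MM/dd/yyyy strings to ISO yyyy-MM-dd
@F.udf(returnType=StringType())
def to_iso_date(raw):
    if raw is None:
        return None
    month, day, year = raw.split("/")
    return f"{year}-{month.zfill(2)}-{day.zfill(2)}"

# Custom UDF: MD5 hash of a concatenated natural key, e.g. for SCD2 change detection
@F.udf(returnType=StringType())
def md5_hash(value):
    return hashlib.md5(value.encode("utf-8")).hexdigest() if value is not None else None

customers = spark.read.option("header", "true").csv("s3://example-bucket/raw/customers/")

hashed = (customers
    .withColumn("effective_date", to_iso_date("order_date"))
    .withColumn("row_hash", md5_hash(F.concat_ws("|", "customer_id", "name", "city"))))

hashed.show(5, truncate=False)
```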