Candidate's Name
Senior Big Data Engineer
San Marcos, TX
Email: EMAIL AVAILABLE
Contact: PHONE NUMBER AVAILABLE
LinkedIn: LINKEDIN LINK AVAILABLE

SUMMARY:
- Data Engineer with 10+ years of experience executing data-driven solutions that increase the efficiency, accuracy, and utility of internal data processing.
- Sound knowledge of Data Quality and Data Governance practices and processes.
- Experience developing machine learning models for classification, regression, clustering, and decision trees.
- Good experience developing web applications with Model-View-Controller (MVC) architecture using the Django, Flask, and Pyramid Python web frameworks.
- Experience working with a number of public and private cloud platforms, including Amazon Web Services (AWS) and Microsoft Azure.
- Strong experience implementing data warehouse solutions in Confidential Redshift; worked on various projects migrating data from on-premises databases to Confidential Redshift, RDS, and S3.
- Extensive experience analyzing, developing, managing, and implementing various stand-alone and client-server enterprise applications using Python and Django, and mapping the requirements to the systems.
- Well versed in Agile with Scrum, the Waterfall model, and Test-Driven Development (TDD) methodologies.
- Experience developing web applications using Python, Django, C++, XML, CSS, HTML, JavaScript, and jQuery.
- Experience analyzing data using Python, R, SQL, Microsoft Excel, Hive, PySpark, and Spark SQL for data mining, data cleansing, and machine learning.
- Managed metadata alongside the data for visibility into where data came from and its lineage, making it quick and efficient to find data for customer projects using an AWS data lake and services such as AWS Lambda and AWS Glue.
- Experience with cloud databases and data warehouses (SQL Azure and Confidential Redshift/RDS).
- Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
- Extensive experience with Amazon Web Services: Amazon EC2, Amazon S3, Amazon SimpleDB, Amazon RDS, Elastic Load Balancing, Elasticsearch, Amazon MQ, AWS Lambda, Amazon SQS, AWS Identity and Access Management, Amazon CloudWatch, Amazon EBS, and AWS CloudFormation.
- Proficient in SQLite, MySQL, and SQL databases with Python.
- Experienced in working with various Python IDEs: PyCharm, PyScripter, Spyder, PyStudio, PyDev, IDLE, NetBeans, and Sublime Text.
- Experience with the Requests, ReportLab, NumPy, SciPy, PyTables, cv2, imageio, Python-Twitter, Matplotlib, HTTPLib2, Urllib2, Beautiful Soup, and Pandas (DataFrame) Python libraries during the development lifecycle.
- Hands-on experience handling database issues and connections with SQL and NoSQL databases such as MongoDB, Cassandra, Redis, CouchDB, and DynamoDB by installing and configuring the relevant Python packages.
- Good knowledge of writing and building different kinds of tests, such as unit tests with Pytest.
- Experienced with version control systems such as Git, GitHub, CVS, and SVN to keep code versions and configurations organized.
- Experienced with containerization and orchestration services such as Docker and Kubernetes.
- Expertise in build automation and continuous integration tools such as Apache Ant, Maven, and Jenkins.
- Strong experience developing SOAP and RESTful web services with Python.
- Experienced in writing SQL queries, stored procedures, functions, packages, tables, views, and triggers in relational databases such as Oracle, DB2, MySQL, Sybase, PostgreSQL, and MS SQL Server.
- Experience using Docker and Ansible to fully automate the deployment and execution of the benchmark suite on a cluster of machines.
- Experience building applications on different operating systems, such as Linux (Ubuntu, CentOS, Debian) and Mac OS.
- Excellent interpersonal and communication skills, efficient time management and organization skills, and the ability to handle multiple tasks and work well in a team environment.

Education Details:
B.Tech in ECE, Anna University, 2013

TECHNICAL SKILLS:
Operating Systems: Windows, Mac OS, and Linux (CentOS, Debian, Ubuntu)
Programming Languages: Python, R, C, C++
Web Technologies: HTML/HTML5, CSS/CSS3, XML, jQuery, JSON, Bootstrap, Angular
Python Libraries/Packages: NumPy, SciPy, Boto, Pickle, PySide, PyTables, DataFrames, Pandas, Matplotlib, SQLAlchemy, HTTPLib2, Urllib2, Beautiful Soup, PyQuery
Statistical Analysis Skills: A/B Testing, Time Series Analysis, Markov models
IDE: PyCharm, PyScripter, Spyder, PyStudio, PyDev, IDLE, NetBeans, Sublime Text, Visual Studio Code
Machine Learning and Analytical Tools: Supervised Learning (Linear Regression, Logistic Regression, Decision Tree, Random Forest, SVM, Classification), Unsupervised Learning (Clustering, KNN, Factor Analysis, PCA), Natural Language Processing, Google Analytics, Fiddler, Tableau
Cloud Computing: AWS, Azure, Rackspace, OpenStack, Redshift, AWS Glue
AWS Services: Amazon EC2, Amazon S3, Amazon SimpleDB, Amazon MQ, Amazon ECS, AWS Lambda, Amazon SageMaker, Amazon RDS, Elastic Load Balancing, Elasticsearch, GCP Cloud Functions, Amazon SQS, AWS Identity and Access Management, Cloud Composer, Amazon CloudWatch, Amazon EBS, AWS CloudFormation
Databases/Servers: MySQL, SQLite3, Cassandra, Redis, PostgreSQL, CouchDB, MongoDB, Teradata, Apache Web Server 2.0, Nginx, Tomcat, JBoss, WebLogic
ETL: Informatica 9.6, DataStage, SSIS
Web Services/Protocols: TCP/IP, UDP, FTP, HTTP/HTTPS, SOAP, REST, RESTful
Miscellaneous: Git, GitHub, SVN, CVS
Build and CI Tools: Docker, Kubernetes, Maven, Gradle, Jenkins, Hudson, Bamboo
SDLC/Testing Methodologies: Agile, Waterfall, Scrum, TDD

PROFESSIONAL EXPERIENCE:

Client: BNY Mellon, NYC, NY (Oct 2022 - present)
Senior Big Data Engineer
Responsibilities:
- Developed a data platform from scratch and took part in the requirement gathering and analysis phase of the project, documenting the business requirements.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Developed scripts to load data into Hive from HDFS and ingested data into the data warehouse using various data loading techniques.
- Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue.
- Scheduled jobs using crontab, Rundeck, and Control-M.
- Built Cassandra queries for CRUD operations (create, read, update, delete), and used Bootstrap to manage and organize the HTML page layout.
- Developed entire frontend and backend modules using Python on the Django web framework and created the user interface (UI) using JavaScript, Bootstrap, and HTML5/CSS, with Cassandra and MySQL as data stores.
- Ran import and export jobs to copy data to and from HDFS using Sqoop, and developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
- Analyzed SQL scripts and designed PySpark implementations of the solutions.
- Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
- Used Spark SQL to load JSON data, create a schema RDD, and load it into Hive tables, and handled structured data with Spark SQL.
- Migrated the on-premises database structure to the Confidential Redshift data warehouse.
- Created data pipelines for a Kafka cluster, processed the data using Spark Streaming, and created AWS Glue jobs to load incremental data into the S3 staging and persistence areas (see the streaming sketch after this section).
- Developed REST APIs in Python with the Flask and Django frameworks, and integrated various data sources including Java/JDBC, RDBMS, shell scripts, spreadsheets, and text files.
- Worked with the Hadoop architecture and its daemons: NameNode, DataNode, JobTracker, TaskTracker, and ResourceManager.
- Developed data processing tasks using PySpark, such as reading data from external sources, merging and enriching data, and loading it into target destinations.
- Added support for Amazon S3 and RDS to host static/media files and the database in the Amazon cloud.
- Developed applications in a Linux environment, used Jenkins for continuous integration, and deployed the project through Jenkins with the Git version control system.
- Managed data imported from different sources, performed transformations using Hive, Pig, and MapReduce, and loaded the data into HDFS.
- Ran the Oozie workflow engine to execute multiple Hive and Pig jobs independently based on time and data availability, and developed an Oozie workflow triggered by transaction data availability.
- Used Docker coupled with the Nginx load balancer to achieve continuous delivery in a highly scalable environment.
- Used MongoDB to store data in JSON format, and developed and tested many dashboard features using Python, Bootstrap, CSS, and JavaScript.
Environment: Hadoop, Hive, Sqoop, Pig, Java, Django, Flask, XML, MySQL, MS SQL Server, Linux, Shell Scripting, MongoDB, SQL, Python 3.3, HTML5/CSS, Cassandra, JavaScript, PyCharm, Git, RESTful, Docker, Jenkins, JIRA, jQuery, Bootstrap, AWS, EC2, S3.
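
The Kafka-to-S3 flow above can be illustrated with a minimal PySpark Structured Streaming sketch. This is an assumed shape of the pipeline, not the production job: the broker address, topic name, event schema, and bucket paths are all placeholders, and the job needs the spark-sql-kafka connector on the classpath.

    # Minimal sketch: consume events from a Kafka topic and land them
    # incrementally in an S3 staging area as Parquet. All names below are
    # illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-s3-staging").getOrCreate()

    # Assumed event schema; replace with the real payload layout.
    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
           .option("subscribe", "transactions")                # placeholder topic
           .option("startingOffsets", "latest")
           .load())

    # Kafka delivers bytes; parse the JSON value into typed columns.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Write micro-batches to the S3 staging area; the checkpoint location
    # is what makes the load incremental and restartable.
    query = (events.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/staging/transactions/")
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/transactions/")
             .trigger(processingTime="1 minute")
             .start())

    query.awaitTermination()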
Client: Johnson & Johnson, Raritan, NJ (Apr 2020 - Oct 2022)
Role: Big Data Engineer
Responsibilities:
- Involved in the design, development, and support phases of the Software Development Life Cycle (SDLC).
- Performed ETL by collecting, exporting, merging, and massaging data from multiple sources and platforms, including SSRS/SSIS (SQL Server Integration Services) in SQL Server.
- Worked with cross-functional teams (including the data engineering team) to extract data rapidly from MongoDB through the MongoDB connector.
- Worked on Python OpenStack APIs and used several Python libraries such as wxPython, NumPy, and matplotlib.
- Used a JSON schema to define the table and column mapping from S3 data to Redshift.
- Performed data cleaning and feature selection using the scikit-learn package in Python.
- Partitioned hotels into 100 clusters with k-means using scikit-learn in Python, so that similar hotels for a search are grouped together (see the clustering sketch after this section).
- Advanced knowledge of Confidential Redshift and MPP database concepts.
- Used Python to perform ANOVA tests to analyze the differences among hotel clusters.
- Implemented various machine learning algorithms and statistical models, including decision trees, text analytics, sentiment analysis, Naive Bayes, logistic regression, and linear regression, in Python to determine the accuracy rate of each model.
- Applied linear regression, multiple regression, ordinary least squares, mean-variance analysis, the law of large numbers, logistic regression, dummy variables, residuals, the Poisson distribution, Bayes and Naive Bayes methods, function fitting, etc. to data with the help of the Scikit, SciPy, NumPy, and Pandas modules in Python.
- Worked with ARIMAX, Holt-Winters, and VARMAX models to predict sales at regular and seasonal intervals.
- Automated the provisioning of the AWS cloud using CloudFormation for ticket routing techniques.
- Worked with Amazon Redshift tools such as SQL Workbench/J, pgAdmin, DBHawk, and SQuirreL SQL.
- Determined the most accurate prediction model based on the accuracy rate.
- Used text mining on reviews to determine customers' concentrations.
- Delivered result analysis to the support team for hotel and travel recommendations.
- Designed Tableau bar graphs, scatter plots, and geographical maps to create detailed summary reports and dashboards.
- Developed a hybrid model to improve the accuracy rate.
Environment: Hadoop, Hive, Sqoop, MapReduce, Python, NumPy, matplotlib, Pandas, ETL, SQL Server, MongoDB, AWS cloud, Redshift, Tableau.
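
The k-means step above can be sketched with scikit-learn. This is a minimal illustration under assumed inputs: the CSV file and the feature columns are placeholders for the engineered search features used in practice.

    # Minimal sketch: group hotels into 100 clusters so that similar
    # hotels for a search land together. Input and column names are
    # illustrative placeholders.
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    hotels = pd.read_csv("hotel_features.csv")           # placeholder input
    features = hotels[["price", "rating", "distance"]]   # assumed feature columns

    # k-means is distance-based, so scale features to comparable ranges first.
    X = StandardScaler().fit_transform(features)

    # 100 clusters assumes the frame holds well over 100 hotels.
    km = KMeans(n_clusters=100, n_init=10, random_state=42)
    hotels["cluster"] = km.fit_predict(X)

    # Similar hotels now share a cluster id and can be served together.
    print(hotels.groupby("cluster").size().head())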
Client: JPMC, Plano, TX (Jun 2018 - Mar 2020)
Data Engineer
Responsibilities:
- Worked on Python OpenStack APIs and used Python scripts to update content in the database and manipulate files.
- Used AWS for Tableau Server scaling and secured Tableau Server on AWS to protect the Tableau environment using Amazon VPC, security groups, AWS IAM, and AWS Direct Connect.
- Configured EC2 instances, configured IAM users and roles, and created an S3 data pipe using the Boto API to load data from internal data sources (see the sketch after this section).
- Built a mechanism, an ingestion service, for automatically moving existing proprietary binary-format data files into HDFS.
- Performed data transformations in Hive and used partitions and buckets for performance improvements.
- Ingested data into Hadoop using Sqoop and applied data transformations using Pig and Hive.
- Used Python and Django for creating graphics, XML processing, data exchange, and business logic implementation.
- Used Git, GitHub, and Amazon EC2, deployed using Heroku, and carried out various mathematical operations on the extracted data using the NumPy and SciPy Python libraries.
- Developed a server-based web traffic statistical analysis tool with RESTful APIs using Flask and Pandas.
- Used the Pandas API to put the data into time-series and tabular formats for easy timestamp-based data manipulation and retrieval.
- Added support for Amazon S3 and RDS to host static/media files and the database in the Amazon cloud.
- Troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were a main source of data for both customers and the internal customer service team.
- Developed and executed complex SQL queries to pull data from sources such as SQL Server and Oracle databases; evaluated the Information Management System database to improve data quality issues using DQ Analyzer and other data preprocessing tools.
- Implemented data governance policies and procedures in the Students Information Management database.
Environment: Python 2.7, Hive, Oozie, Amazon S3, MySQL, HTML5, CSS, XML, Django, MS SQL Server, Git, Jenkins, JIRA, Cassandra, Pig, Hadoop, AWS CloudWatch, AWS Redshift, SQL, SOAP, REST APIs, AWS EC2, JavaScript, Linux, Shell Scripting, AJAX, MongoDB.
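
The Boto/Pandas flow above admits a short sketch: pull an extract from S3 with boto3 and index it as a time series for timestamp-based manipulation. The bucket, key, and column names are illustrative assumptions.

    # Minimal sketch: read a CSV extract from S3 and resample it by
    # timestamp. All names below are illustrative placeholders.
    import io
    import boto3
    import pandas as pd

    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket="internal-extracts",        # placeholder bucket
                        Key="traffic/2019-06.csv")          # placeholder key

    df = pd.read_csv(io.BytesIO(obj["Body"].read()), parse_dates=["timestamp"])

    # A DatetimeIndex enables resampling and range slicing by timestamp.
    ts = df.set_index("timestamp").sort_index()
    hourly = ts["requests"].resample("1H").sum()            # assumed metric column
    print(hourly.loc["2019-06-01":"2019-06-07"].describe())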
Client: Tracfone, Miami, FL (Nov 2016 - May 2018)
Role: Data Engineer/Data Scientist
Responsibilities:
- Applied concepts of probability, distributions, and statistical inference to the given dataset to unearth interesting findings through the use of comparisons, T-tests, F-tests, R-squared, P-values, etc.
- Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back.
- Designed and developed natural language processing models for sentiment analysis.
- Used predictive modeling tools in SAS, SPSS, R, and Python.
- Applied clustering algorithms such as hierarchical and k-means clustering with the help of scikit-learn and SciPy.
- Performed complex pattern recognition on financial time-series data and forecast returns through ARMA and ARIMA models and exponential smoothing for multivariate time series (see the forecasting sketch after this section).
- Pipelined (ingested, cleaned, munged, and transformed) data for feature extraction toward downstream classification.
- Built and analyzed datasets using R, SAS, MATLAB, and Python (in decreasing order of usage).
- Worked in large-scale database environments such as Hadoop and MapReduce, with a working knowledge of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
- Interfaced with a large-scale database system through an ETL server for data extraction and preparation.
- Identified patterns, data quality issues, and opportunities, and communicated the resulting insights and opportunities to business partners.
Environment: Machine learning, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (scikit-learn/SciPy/NumPy/Pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.
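
The ARMA/ARIMA forecasting described above can be sketched with statsmodels. The input series, the (p, d, q) order, and the horizon are illustrative assumptions; in practice, order selection would be driven by AIC/BIC comparison or ACF/PACF diagnostics.

    # Minimal sketch: fit an ARIMA model to a returns series and forecast
    # the next 10 periods. Input file and column names are placeholders.
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    returns = pd.read_csv("daily_returns.csv",              # placeholder input
                          parse_dates=["date"], index_col="date")["ret"]

    # Assumed ARIMA(2, 0, 1) order; choose the real order via AIC/BIC.
    model = ARIMA(returns, order=(2, 0, 1))
    fit = model.fit()

    # Forecast the next 10 periods with 95% confidence intervals.
    forecast = fit.get_forecast(steps=10)
    print(forecast.summary_frame(alpha=0.05))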
Client: Hitachi, India (Nov 2013 - Aug 2016)
Software Developer
Responsibilities:
- Developed entire frontend and backend modules using Python on the Django web framework.
- Used the Django framework for application development.
- Designed and developed the UI of the website using HTML, AJAX, CSS, and JavaScript.
- Worked with CSS Bootstrap to develop web applications.
- Designed an ETL process using Informatica to load data from flat files and Excel files into the target Oracle data warehouse database.
- Designed and developed web services using XML and jQuery.
- Built various graphs for business decision making using the Python matplotlib library (see the plotting sketch after this section).
- Developed applications in a UNIX environment and familiar with all its commands.
- Used NumPy for numerical analysis of insurance premiums.
- Implemented code in Python to retrieve and manipulate data.
Environment: Python, Django, MySQL, Linux, Informatica PowerCenter, PL/SQL, HTML, XHTML, CSS, AJAX, JavaScript, Apache Web Server, NoSQL, jQuery.
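
The NumPy/matplotlib reporting above can be sketched as follows. The premium figures and the trend-line summary are illustrative placeholders, not real data.

    # Minimal sketch: aggregate monthly insurance premiums and plot them
    # with a least-squares trend line for business review.
    import numpy as np
    import matplotlib.pyplot as plt

    months = np.arange(1, 13)
    premiums = np.array([112, 118, 121, 119, 125, 131,
                         128, 134, 140, 138, 145, 151])  # assumed totals, in $K

    # np.polyfit with deg=1 returns (slope, intercept) for a linear trend.
    slope, intercept = np.polyfit(months, premiums, deg=1)

    plt.plot(months, premiums, marker="o", label="Monthly premium")
    plt.plot(months, slope * months + intercept, linestyle="--", label="Trend")
    plt.xlabel("Month")
    plt.ylabel("Premium ($K)")
    plt.title("Insurance premium trend")
    plt.legend()
    plt.savefig("premium_trend.png")   # written to disk for the report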
