Sr Data Engineer Resume Chicago, IL
  Candidate's Name
AWS Certified Solution Architect
Sr. Cloud Data Engineer
Professional Summary
      Almost 9 years of extensive IT experience as a data engineer, with expertise in designing data-intensive applications using the Hadoop ecosystem, Big Data analytics, cloud data engineering (AWS, Azure), data visualization, data warehousing, reporting, and data quality solutions.
      Hands-on expertise with the Hadoop ecosystem, including strong knowledge of Big Data technologies such as HDFS, Spark, YARN, Kafka, MapReduce, Apache Cassandra, HBase, Zookeeper, Hive, Oozie, Impala, Pig, and Flume.
      Worked extensively with PySpark, using SparkContext, Spark SQL, the DataFrame API, Spark Streaming, and pair RDDs to improve the efficiency and optimization of existing Hadoop approaches.
      Good understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Driver Node, Worker Node, Stages, Executors, and Tasks.
      In-depth understanding and experience with real-time data streaming technologies such as Kafka and Spark Streaming.
      Hands-on experience on AWS components such as EMR, EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, Redshift, DynamoDB to ensure a secure zone for an organization in AWS public cloud.
      Skilled in setting up Kubernetes clusters using tools like kubeadm and kops, or managed Kubernetes services (e.g., Amazon EKS, Google GKE, Azure AKS).
      Proven experience deploying software development solutions for a wide range of high-end clients, including Big Data processing, ingestion, analytics, and cloud migration from on-premises to AWS Cloud.
      Expertise in Azure infrastructure management (Azure Web Roles, Worker Roles, SQL Azure, Azure Storage).
      Strong experience working with Informatica ETL, including Informatica PowerCenter Designer, Workflow Manager, Workflow Monitor, Informatica Server, and Repository Manager.
      Good understanding of Spark architecture with Databricks and Structured Streaming; experienced in setting up Databricks on AWS and Microsoft Azure, configuring Databricks workspaces for business analytics, managing clusters in Databricks, and managing the machine learning lifecycle.
      Demonstrated understanding of the Fact/Dimension data warehouse design model, including star and snowflake design methods.
      Experience in working with Waterfall and Agile development methodologies.
      Experience identifying data anomalies by performing statistical analysis and applying data mining techniques.
      Experience in Hadoop development and administration; proficient with Hadoop and its ecosystem components, including Hive, HDFS, Pig, Sqoop, HBase, Python, and Spark.
      Experience in developing custom UDFs for Pig and Hive.
      Experienced in designing and implementing scalable and efficient data warehousing solutions using Azure Synapse, including schema design, partitioning, and indexing strategies.
      Strong expertise in writing complex T-SQL queries, stored procedures, and user-defined functions within Azure Synapse to support data transformation and analytics requirements.
      Developed clear and testable hypotheses to guide statistical analyses and investigations.
      Applied Bayesian hypothesis testing methodologies to provide a probabilistic framework for decision making.
      Experienced in building Snowpipe pipelines, with in-depth knowledge of data sharing in Snowflake and of Snowflake database, schema, and table structures.
      Demonstrated ability to ensure high availability and fault tolerance by setting up Kubernetes clusters across multiple nodes.
      Designed and developed logical and physical data models that utilize concepts such as Star Schema, Snowflake Schema, and Slowly Changing Dimensions.
      Expertise in using Airflow and Oozie to create, debug, schedule, and monitor ETL jobs.
      Developed ETL pipelines using AWS Glue, Python, and PostgreSQL to extract, transform, and load data from various sources into PostgreSQL databases, enabling data integration and analysis.
      Developed Java-based ETL jobs using AWS Glue, automating data transformations and loading processes to ensure consistent and accurate data flows.
      Generated scripts in AWS Glue to transfer data and utilized AWS Glue to run ETL jobs and run aggregations in PySpark code.
      Experience with partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
      Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
      Hands-on experience handling database issues and connections with SQL and NoSQL databases such as MongoDB, HBase, and SQL Server; created Java applications to handle data in MongoDB and HBase.
TECHNICAL SKILLS
      Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Oozie, Flume, NiFi, Kafka, Zookeeper, YARN, Apache Spark, Mahout, Spark MLlib, Apache Druid
      Databases: Oracle, MySQL, SQL Server, Azure Synapse, MongoDB, Cassandra, DynamoDB, PostgreSQL, Teradata, Cosmos DB
      Programming: Python, PySpark, Scala, Java, C, C++, Shell scripting, Perl scripting, SQL, Splunk
      Cloud Technologies: AWS, Microsoft Azure, GCP
      Frameworks: Django REST Framework, MVC, Hortonworks
      Tools: PyCharm, Eclipse, Visual Studio, SQL*Plus, SQL Developer, TOAD, SQL Navigator, Query Analyzer, SQL Server Management Studio, SQL Assistant, Postman, NoSQL
      Versioning Tools: SVN, Git, GitHub
      Operating Systems: Windows 7/8/XP/2008/2012, Ubuntu Linux, macOS
      Network Security: Kerberos
      Database Modeling: Dimensional Modeling, ER Modeling, Star Schema Modeling, Snowflake Modeling
      Monitoring Tool: Apache Airflow
      Visualization/Reporting: Tableau, ggplot2, matplotlib, SSRS, and Power BI
      Machine Learning Techniques: Linear & Logistic Regression, Classification and Regression Trees, Random Forest, Association Rules, NLP, and Clustering
Professional Experience
Sr. Data Engineer							July 2021 to Present
Apple - Sunnyvale, CA
Responsibilities:
      Implemented solutions utilizing advanced AWS components (EMR, EC2, etc.) integrated with Big Data/Hadoop distribution frameworks (Hadoop YARN, MapReduce, Spark, Hive, etc.).
      Designed and implemented Azure infrastructure solutions using Azure Resource Manager (ARM) templates and the Azure CLI.
      Created on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark (a minimal sketch of this pattern appears after this list).
      Proficient in designing and implementing data models to support business requirements and improve data integrity and efficiency.
      Experience with Azure DevOps for continuous integration and continuous deployment (CI/CD) pipelines.
      Integrated Kubernetes with CI/CD pipelines, automating the deployment process using tools like Jenkins, GitLab CI/CD, or CircleCI.
      Demonstrated proficiency in parallel data processing techniques, utilizing frameworks such as Apache Hadoop, Apache Spark, or similar technologies to efficiently handle large datasets.
      Experienced in working with distributed file systems such as HDFS or similar solutions, understanding their architecture and effectively utilizing them for parallel data processing tasks.
      Strong understanding of container orchestration principles and experience in scaling, updating, and monitoring containerized applications using Kubernetes.
      Designed and implemented data extraction, transformation, and loading (ETL) processes with Talend to meet project requirements.
      Experienced in building parallel data processing pipelines for machine learning tasks, including feature extraction and model training, using distributed computing frameworks to handle large-scale datasets efficiently.
      Skilled in utilizing various data formats, including the Iceberg table format, to effectively communicate complex information to diverse audiences.
      Utilized Iceberg tables to analyze business data and identify underlying trends, patterns, and outliers, aiding informed decision-making processes.
      Conducted statistical tests to determine the significance (p-values) of observed differences in means and proportions.
      Applied significance tests to correlation coefficients to evaluate the strength and significance of relationships between variables.
      Proficient in using IBM InfoSphere DataStage for designing, developing, and maintaining ETL (extract, transform, load) processes to integrate and transform data from various sources.
      Implemented ETL pipelines using Apache Beam and Java, streamlining the processing of large volumes of client data.
      Experienced in feature engineering techniques to extract meaningful features from raw data, including numerical and text data, enhancing the predictive power of machine learning models.
      Implemented data structures using best practices in data modeling, ETL/ELT processes, SQL, and Python; established scalable, efficient, automated processes for large-scale data analyses.
      Experienced in integrating Azure Synapse with external data sources using PolyBase, enabling seamless querying and processing of data across on-premises and cloud environments.
      Leveraged Azure Synapse Analytics to seamlessly integrate big data processing and analytics capabilities, empowering data exploration and insight generation.
      Excellent experience with SSIS, creating ETL packages to validate, extract, transform, and load data into data warehouses, data marts, and Power BI.
      Developed custom-built ETL solutions, batch processing, and real-time data ingestion pipelines to move data in and out of the Hadoop cluster using PySpark and shell scripting.
      Experience in implementing and managing Azure Active Directory (Azure AD) for identity and access management.
      Developed custom solutions within AWS Connect to tailor the customer experience, including dynamic call routing based on customer attributes and personalized IVR menus.
      Experienced MLOps professional with a strong background in deploying and maintaining machine learning models in production environments.
      Implemented Prometheus monitoring solutions to track and collect metrics from various components within the infrastructure.
      Integrated Grafana with various data sources such as Prometheus, InfluxDB, and Elasticsearch to aggregate and display data from different systems.
      Proficient in designing, implementing, and managing NoSQL databases, particularly DynamoDB, with a focus on scalability, performance, and reliability.
      Skilled in designing efficient data models and schemas tailored to DynamoDB's key-value and document-based structure, optimizing for query performance and cost effectiveness.
      Proficient in Apache Spark and Databricks, including data processing, data manipulation, and data analysis using PySpark.
      Developed and optimized complex T-SQL queries to enhance database performance, reducing query execution time.
      Experienced in setting up data inputs in Splunk to collect data from various sources such as logs, metrics, events, and other machine-generated data.
      Involved in configuring Splunk to parse raw data and extract relevant fields using regular expressions, field extractions, and data transformations.
      Proficient in using Splunk's Search Processing Language to create complex queries and search for specific patterns, trends, or anomalies in large volumes of data.
      Utilized T-SQL to create parameterized stored procedures for dynamic data processing, ensuring flexibility and reusability.
      Involved in migrating a quality monitoring tool from AWS EC2 to AWS Lambda, and built logical datasets to administer quality monitoring on Snowflake warehouses.
      Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse, and created DAGs to run the Airflow workflows.
      Implemented robust security measures within AWS GovCloud, including encryption, access control, and multi-factor authentication, ensuring the protection of sensitive data and compliance with ITAR requirements.
      Collaborated with cross-functional teams to design and implement high-availability and disaster recovery solutions specific to AWS GovCloud, improving uptime for mission-critical applications.
      Hands-on experience with Docker for containerization, creating Docker images and optimizing image sizes for efficient deployments.
      In-depth understanding of Kubernetes concepts such as namespaces, labels, selectors, and resource management.
      Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
      Proficient in working with Parquet, a columnar storage file format designed for big data processing frameworks like Apache Hadoop and Apache Spark.
      Developed Java-based database connectors and drivers for various databases (MySQL, PostgreSQL, Oracle) to enable seamless interaction between data processing applications and database systems, ensuring data consistency and integrity.
      Used Airbyte to optimize data integration pipelines for improved performance, scalability, and efficiency.
      Proficient in working with Apache Iceberg, a table format that supports schema evolution, versioning, and time-travel queries, enabling easy and efficient data schema changes.
      Experienced in building and optimizing data pipelines in Databricks, leveraging Spark SQL and DataFrame APIs.
      Designed scalable and resilient AWS Connect architectures, leveraging services such as Amazon EC2, Amazon RDS, AWS Lambda, and Amazon S3 to ensure high availability and fault tolerance.
      Experience in leveraging Databricks for machine learning and implementing scalable ML models.
      Implemented ETL workflows on Databricks, integrating various data sources and transforming raw data into meaningful insights using Apache Spark libraries.
      Knowledge of Databricks clusters and their configurations for optimal performance and scalability.
      Automated resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production.
      Successfully designed, implemented, and optimized Druid clusters for real-time analytics and batch data processing.
      Demonstrated expertise in designing efficient Druid schemas to support analytical queries and optimize data storage.
      Developed real-time analytics solutions leveraging Druid's capabilities to provide instant insights into streaming data.
      Extensive experience working with GCP to design, deploy, and manage cloud infrastructure for various applications and workloads.
      Proficient in using GCP services such as Compute Engine, App Engine, Kubernetes Engine, Cloud Storage, and Cloud SQL for various projects.
      Hands-on experience with GCP tools and technologies such as Cloud SDK, Cloud Console, Cloud Shell, and Stackdriver for monitoring and troubleshooting.
      Orchestrated deployment pipelines for ML models, ensuring seamless integration with production systems.
      Collaborated with cross-functional teams, including data scientists, engineers, and business analysts, to streamline MLOps workflows.
      Used LoadRunner for performance testing to assess the performance and scalability of applications under various conditions.
      Queried both managed and external tables created by Hive using Impala.
      Monitored and controlled Local disk storage and Log files using Amazon CloudWatch.
      Played a key role in dynamic partitioning and Bucketing of the data stored in Hive Metadata.
      Experienced in performance tuning of Spark Applications for setting right Batch Interval time, correct level of Parallelism and memory tuning.
      Encoded and decoded JSON objects using PySpark to create and modify data frames in Apache Spark.
      Engineered end-to-end data pipelines for processing and storing large volumes of data in Azure Data Lake Storage.
      Proficient in writing Kusto Query Language (KQL) queries for data analysis and retrieval.
      Demonstrated proficiency in designing, implementing, and managing Azure Cosmos DB solutions.
      Designed effective data models tailored to specific application requirements in Azure Cosmos DB.
      Hands-on experience with Azure Data Explorer, utilizing Kusto for data exploration, analysis, and visualization.
      Designed and optimized data partitioning strategies in Azure Data Lake Storage for efficient data retrieval and storage cost reduction.
      Experience integrating Databricks with data sources such as Azure Data Lake Storage, Azure Blob Storage, and AWS S3.
      Developed and maintained robust, scalable web applications using Django, adhering to the Model-View-Controller architectural pattern.
      Built efficient backend services in Django handling authentication, authorization, and user management functionalities.
      Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data into the data warehouse.
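Illustrative example (not part of the original resume): a minimal PySpark sketch of the S3/Glue-style ETL pattern referenced above, reading raw JSON from S3, applying simple transformations, and writing partitioned Parquet back to a curated bucket. Bucket names, paths, and column names are hypothetical placeholders.

# Minimal sketch of a Glue-style PySpark ETL job (assumed names/paths).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-etl-sketch").getOrCreate()

raw = spark.read.json("s3://example-raw-bucket/events/")        # hypothetical source location

cleaned = (
    raw.dropDuplicates(["event_id"])                            # de-duplicate on a business key
       .withColumn("event_date", F.to_date("event_ts"))         # derive a partition column
       .filter(F.col("event_type").isNotNull())                 # basic validation
)

(cleaned.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-curated-bucket/events/"))        # curated zone, queryable via Athena / the Glue catalog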
Environment: Spark RDD, AWS Glue, Apache Kafka, Amazon S3, Java, SQL, T-SQL, Spark, AWS Cloud, AWS GovCloud, Azure Data Lake, Azure, Databricks, ETL, GCP, Kusto, NumPy, SciPy, pandas, Scikit-learn, Seaborn, NLTK, Spark 1.6/2.0 (PySpark, MLlib), EMR, EC2, Amazon RDS, data lake, Kubernetes, Docker, Python, Cloudera Stack, HBase, Hive, Impala, Pig, NiFi, Spark Streaming, Elasticsearch, Logstash, MicroStrategy, Apache Parquet, Apache Iceberg, Kibana, JAX-RS, Spring, Hibernate, Apache Airflow, Oozie, RESTful API, JSON, JAXB, XML, WSDL, MySQL, Talend, Cassandra, MongoDB, HDFS, ELK/Splunk, Athena, Tableau, Redshift, Scala, Snowflake.

Syntel/Cuna Mutual - Madison						October 2019 to June 2021
Sr. AWS Data Engineer
Responsibilities:
      Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and Spark YARN.
      Involved in file movements between HDFS and AWS S3; worked extensively with S3 buckets in AWS and converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
      Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations; imported data from different sources into Spark RDDs for processing, developed custom aggregate functions using Spark SQL, and performed interactive querying.
      Involved in converting Hive/SQL queries into Spark transformations using Spark.
      Used HBase for applications requiring low-latency access to large volumes of data, such as social media analytics, fraud detection, and monitoring systems.
      Utilized HBase for storing time-series data, such as sensor data, log data, and IoT telemetry data.
      Worked with RDDs and Scala, and used Sqoop for importing and exporting data between RDBMS and HDFS.
      Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
      Integrated AWS Connect with existing data pipelines, enabling seamless capture and analysis of call data for business intelligence and reporting purposes.
      Created an AWS Glue job for archiving data from Redshift tables to S3 (online to cold storage) per data retention requirements, and was involved in managing S3 data layers and databases including Redshift and Postgres.
      Loaded data into a MongoDB sink for further analysis, and worked on MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
      Developed a Python script to load CSV files into S3 buckets, created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket (see the loader sketch after this list).
      Experienced in building and optimizing data pipelines in Databricks, leveraging Spark SQL and DataFrame APIs.
      Developed comprehensive MongoDB database designs, including collections, indexes, and sharding strategies, to optimize query performance and enhance scalability.
      Collaborated with cross-functional teams to design and implement high-availability and disaster recovery solutions specific to AWS GovCloud, improving uptime for mission-critical applications.
      Implemented ETL workflows on Databricks, integrating various data sources and transforming raw data into meaningful insights using Apache Spark libraries.
      Developed Java-based database connectors and drivers for various databases (MySQL, PostgreSQL, Oracle) to enable seamless interaction between data processing applications and database systems, ensuring data consistency and integrity.
      Experience in leveraging Databricks for machine learning and implementing scalable ML models.
      Implemented efficient data ingestion processes to bring structured and unstructured data into Azure Data Lake Storage.
      Implemented data encryption and ensured data security using Azure Data Lake encryption and Azure Key Vault integration.
      Developed data processing workflows using Azure Data Factory for ETL operations, ensuring data quality, transformation, and enrichment.
      Worked with different file formats such as JSON, Avro, and Parquet and compression techniques such as Snappy; developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job, using Airflow for workflow management and automation.
      Extensive hands-on experience with Percona MySQL, including installation, configuration, and ongoing maintenance.
      Proven ability to optimize MySQL databases for higher performance, including query optimization, indexing strategies, and server parameter tuning.
      Managed the deployment and ongoing maintenance of Percona MySQL databases, ensuring high availability and optimal performance for critical applications.
      Expertise in implementing high-availability solutions such as MySQL replication, clustering, and failover mechanisms to ensure database reliability.
      Developed shell scripts for adding dynamic partitions to Hive staging tables, verifying JSON schema changes in source files, and verifying duplicate files in the source location.
      Worked with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
      Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive structured and unstructured data.
      Involved in writing scripts against Oracle, SQL Server, and Netezza databases to extract data for reporting and analysis, and worked on importing and cleansing high-volume data from various sources such as DB2, Oracle, and flat files onto SQL Server.
      Managed containers using Docker by writing Dockerfiles and setting up automated builds on Docker Hub, and installed and configured Kubernetes.
      Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and AWS cloud and making the data available in Athena and Snowflake.
      Extensively used Stash (Bitbucket) for code control, and worked on AWS components such as Airflow, Elastic MapReduce (EMR), Athena, and Snowflake.
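Illustrative example (not part of the original resume): a minimal Python/boto3 sketch of the kind of CSV-to-S3 loader described above. The bucket name, prefix, and local directory are hypothetical placeholders.

# Minimal sketch of a CSV-to-S3 loader (assumed bucket/prefix names).
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "example-landing-bucket"   # hypothetical bucket name

def upload_csv_files(local_dir: str, prefix: str) -> None:
    """Upload every .csv file in local_dir to s3://BUCKET/prefix/."""
    for name in os.listdir(local_dir):
        if name.endswith(".csv"):
            key = f"{prefix}/{name}"
            s3.upload_file(os.path.join(local_dir, name), BUCKET, key)
            print(f"uploaded {name} -> s3://{BUCKET}/{key}")

if __name__ == "__main__":
    upload_csv_files("./exports", "raw/csv")   # example invocation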
Environment: Spark, AWS, AWS GovCloud, Azure Data Lake, EC2, EMR, Hive, Java, SQL Workbench, Tableau, Kibana, Sqoop, Spark SQL, Spark Streaming, Scala, Python, Hadoop (Cloudera Stack), Informatica, NVIDIA Clara, Jenkins, Docker, Hue, Netezza, Kafka, HBase, HDFS, Pig, Oracle, ETL, AWS S3, AWS Glue, Git, Grafana.

Value Payment System - Nashville, TN					January 2018 to September 2019
Big Data Engineer
Responsibilities:
      Created Spark jobs by writing RDDs in Python, created data frames in Spark SQL to perform data analysis, and stored the results in Azure Data Lake.
      Engineered robust data ingestion pipelines using Azure Data Factory to efficiently bring diverse data sources into Azure Data Lake Storage.
      Implemented optimized data storage solutions within Azure Data Lake, including file formats, partitioning, and compression techniques, reducing storage costs and improving query performance.
      Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS using Scala (a PySpark sketch of this pattern appears after this list).
      Designed and implemented robust NoSQL data models specifically tailored for MongoDB, accommodating unstructured and semi-structured data while ensuring high performance and scalability.
      Developed comprehensive MongoDB database designs, including collections, indexes, and sharding strategies, to optimize query performance and enhance scalability.
      Developed Spark applications using Kafka and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
      Created various data pipelines using Spark, Scala and SparkSQL for faster processing of data.
      Designed batch processing jobs using Apache Spark to increase speed compared to that of MapReduce jobs.
      Wrote Spark SQL and embedded the SQL in Scala files to generate JAR files for submission onto the Hadoop cluster.
      Developed a data pipeline using Flume to ingest data and customer histories into HDFS for analysis.
      Executing Spark SQL operations on JSON, transforming the data into a tabular structure using data frames, and storing and writing the data to Hive and HDFS.
      Worked with the Hive data warehouse infrastructure: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HQL queries.
      Created Hive tables as per requirements, either internal or external, defined with appropriate static and dynamic partitions and bucketing for efficiency.
      Used Hive as an ETL tool for event joins, filters, transformations, and pre-aggregations.
      Involved in moving all log files generated from various sources to HDFS for further processing through Kafka.
      Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, and processing and storing the data.
      Used the Spark SQL Scala interface, which automatically converts RDDs of case classes to schema RDDs.
      Extracted source data from Sequential files, XML files, CSV files, transformed and loaded it into the target Data warehouse.
      Solid understanding of NoSQL databases (MongoDB and Cassandra).
      Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Scala; extracted large datasets from Cassandra and Oracle servers into HDFS and vice versa using Sqoop.
      Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
      Involved in migrating the platform from Cloudera to EMR.
      Developed analytical component using Scala, Spark and Spark Streaming.
      Worked on developing ETL processes to load data from multiple data sources to HDFS using FLUME and performed structural modifications using HIVE.
      Provided technical solutions on MS Azure HDInsight, Hive, HBase, MongoDB, Telerik, Power BI, Spotfire, Tableau, Azure SQL Data Warehouse data migration techniques using BCP and Azure Data Factory, and fraud prediction using Azure Machine Learning.
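Illustrative example (not part of the original resume): the Kafka-to-HDFS streaming work above was done in Scala; the following is a minimal PySpark Structured Streaming sketch of the same pattern. Broker addresses, the topic name, and paths are hypothetical placeholders, and the spark-sql-kafka connector package is assumed to be available.

# Minimal PySpark sketch: read a Kafka topic and persist it to HDFS as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

stream = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
               .option("subscribe", "payments")                     # hypothetical topic
               .load())

events = stream.select(F.col("value").cast("string").alias("payload"),
                       F.col("timestamp"))

query = (events.writeStream
               .format("parquet")
               .option("path", "hdfs:///data/raw/payments/")               # landing path
               .option("checkpointLocation", "hdfs:///checkpoints/payments/")
               .outputMode("append")
               .start())

query.awaitTermination()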
Environment: Hadoop, Hive, Azure Data Lake, Kafka, Snowflake, Spark, Scala, HBase, Cassandra, JSON, XML, UNIX Shell Scripting, Cloudera, MapReduce, Power BI, ETL, MySQL, NoSQL.

Big Data Engineer							September 2016 to November 2017
Renee Systems Inc - Hyderabad
Responsibilities:
      Collaborated with business user's/product owners/developers to contribute to the analysis of functional requirements.
      Implemented Spark SQL queries that combine hive queries with Python programmatic data manipulations supported by RDDs and data frames.
      Used Kafka streams and configured Spark Streaming to get the information and then store it in HDFS.
      Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data in the form of data frames, and saved the data in HDFS.
      Developed Spark scripts and UDFs using Spark SQL queries for data aggregation and querying, and wrote data back into RDBMS through Sqoop.
      Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
      Installed and configured Pig and wrote Pig Latin scripts.
      Wrote MapReduce jobs using Pig Latin.
      Worked on analyzing Hadoop clusters using different big data analytic tools including HBase database and Sqoop.
      Worked on importing and exporting data from Oracle, and DB2 into HDFS and HIVE using Sqoop for analysis, visualization, and generating reports.
      Created Hive tables and dynamically inserted data into them using partitioning and bucketing for EDW tables and historical metrics (see the sketch after this list).
      Experienced in handling large datasets using Partitions, Spark in Memory capabilities, Broadcasts in Spark, Effective & efficient Joins, Transformations, and others during the ingestion process itself.
      Created ETL packages with different data sources (SQL Server, Oracle, Flat files, Excel, DB2, and Teradata) and loaded the data into target tables by performing different kinds of transformations using SSIS.
      Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
      Created partitions and bucketing across state in Hive to handle structured data, using Elasticsearch.
      Performed Sqoop-based file transfers through HBase tables, processing data into several NoSQL databases: Cassandra and MongoDB.
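Illustrative example (not part of the original resume): a minimal PySpark sketch of the Hive dynamic-partitioning and bucketing pattern referenced above. Database, table, and column names are hypothetical placeholders.

# Minimal sketch: create a partitioned, bucketed Hive table and load it with
# dynamic partitions via Spark SQL (assumed schema/table names).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Allow fully dynamic partition inserts.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS edw.sales_metrics (
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC
""")

# The partition column comes last in the SELECT so it can be resolved dynamically.
spark.sql("""
    INSERT INTO TABLE edw.sales_metrics PARTITION (load_date)
    SELECT customer_id, amount, load_date
    FROM staging.sales_raw
""")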
Environment: Hadoop, MapReduce, HDFS, Hive, Python, Kafka, HBase, Sqoop, NoSQL, Spark 1.9, PL/SQL, Oracle, Cassandra, MongoDB, ETL, MySQL.

Data Analyst								May 2015 to August 2016
Concept IT INC, Noida
Responsibilities:
      Involved in designing physical and logical data models using the ERwin data modeling tool.
      Designed the relational data model for operational data store and staging areas, Designed Dimension & Fact tables for data marts.
      Extensively used ERwin data modeler to design Logical/Physical Data Models, relational database design.
      Created Stored Procedures, Database Triggers, Functions and Packages to manipulate the database and to apply the business logic according to the user's specifications.
      Created Triggers, Views, Synonyms, and Roles to maintain data integrity and database security (a simplified sketch appears after this list).
      Created database links to connect to other servers and access the required information.
      Integrity constraints, database triggers and indexes were planned and created to maintain data integrity and to facilitate better performance.
      Used Oracle Advanced Queuing for exchanging messages and communicating between different modules.
      Performed system analysis and design for enhancements; tested forms, reports, and user interaction.
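Illustrative example (not part of the original resume): the trigger/view work above was done in Oracle 9i PL/SQL; the following self-contained Python sketch uses sqlite3 purely to illustrate the same idea of enforcing an integrity rule with a trigger and exposing data through a view. All table, trigger, and view names are hypothetical.

# Simplified illustration (sqlite3 standing in for Oracle): a trigger enforcing
# an integrity rule and a view exposing a reporting-friendly projection.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")

# Trigger: reject non-positive order amounts before they are inserted.
cur.execute("""
    CREATE TRIGGER trg_orders_amount
    BEFORE INSERT ON orders
    WHEN NEW.amount <= 0
    BEGIN
        SELECT RAISE(ABORT, 'amount must be positive');
    END
""")

# View: only orders above a reporting threshold.
cur.execute("CREATE VIEW v_large_orders AS SELECT id, amount FROM orders WHERE amount > 1000")

cur.execute("INSERT INTO orders (amount) VALUES (1500)")
print(cur.execute("SELECT * FROM v_large_orders").fetchall())   # -> [(1, 1500.0)]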
Environment: Oracle 9i, SQL* Plus, PL/SQL, ERwin, TOAD, Stored Procedures.
