
Data and DevOps Engineer

$10/hr Starting at $100

Currently working for a global restaurant chain that owns 35,000+ stores worldwide and serves more than 3,000 users for its business-analytics needs. I own the Production, QA, and Development instances of the ETL processing, which includes all the MapReduce jobs on EMR clusters. My key responsibilities on the project fall broadly under Development, Environment Support, and RCA for the client: as part of the core team, I deploy code and perform RCA (Root Cause Analysis) for recurring issues. Because repetitive manual work increases the chance of human error and costs extra money and man-hours, I have also, beyond my day-to-day Big Data tasks, written a large number of shell-script automations so that our ecosystem runs smoothly and without manual dependencies. Below is a descriptive list of the tools I have worked with:

• Big Data Suite – Oozie for scheduling Hadoop and Spark jobs. Hive as a Hadoop-based data warehouse. YARN for managing resource allocation across jobs running in parallel or alone. HDFS for storage, with NameNodes and DataNodes. MapReduce, written in Java, for retrieving and crunching numbers from input files as requirements dictate (a sketch of such a job appears after this list). Spark for in-memory data processing.
• AWS Product Suite – EMR. S3 for file storage. EC2 for maintaining the Linux boxes under EMRFS that host the edge nodes, master nodes, and Landing Zone of our Hadoop ecosystem. Redshift for the data warehouse, and RDS for filewatcher audits and SQL Servers.
• Real-time processing – Spark Streaming for real-time events (see the second sketch below), Flume agents, RabbitMQ, Kinesis, Adobe Campaign Services, DataStax Cassandra (via Solr nodes as well as the Cassandra terminal).
• ETL – I have also worked closely with IBM InfoSphere and Talend, since all of our jobs fed these tools for further processing and loading into the data warehouse.
• Reporting – Tableau; ultimately, all the Hadoop and ETL jobs crunch volumes of input data into valuable insights delivered through these reports.
• Code maintenance – Atlassian Stash
• Issue tracking – Atlassian JIRA, McDonald’s Service-Now Café
• Documentation – Atlassian Confluence
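To give a concrete flavour of the MapReduce work described in the list above, here is a minimal Java sketch of that kind of job: summing sales amounts per store from pipe-delimited input. The class names, field positions, and delimiter are hypothetical, chosen only for illustration, and are not taken from the production code.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Hypothetical job: sum sales amounts per store from lines like "store-123|4200".
    // The pipe delimiter and field positions are assumptions for illustration only.
    public class StoreSalesJob {

        public static class SalesMapper
                extends Mapper<LongWritable, Text, Text, LongWritable> {
            private final Text storeId = new Text();
            private final LongWritable amount = new LongWritable();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\|");
                if (fields.length < 2) {
                    return; // skip malformed records rather than failing the job
                }
                storeId.set(fields[0]);
                amount.set(Long.parseLong(fields[1].trim()));
                context.write(storeId, amount);
            }
        }

        public static class SumReducer
                extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long total = 0;
                for (LongWritable v : values) {
                    total += v.get();
                }
                context.write(key, new LongWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "store-sales-rollup");
            job.setJarByClass(StoreSalesJob.class);
            job.setMapperClass(SalesMapper.class);
            job.setCombinerClass(SumReducer.class); // reducer doubles as combiner to cut shuffle traffic
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS or s3:// input path
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir must not already exist
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Reusing the reducer as a combiner is a common choice for purely additive aggregations like this, since it shrinks the data shuffled between map and reduce tasks.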
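On the real-time side, here is a comparable hedged sketch of a Spark Streaming job counting events per type over ten-second micro-batches. A local socket source stands in for the Kinesis/RabbitMQ/Flume feeds mentioned above; every name and the event format are illustrative assumptions, not the production pipeline.

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import scala.Tuple2;

    // Hypothetical sketch: count events per type over 10-second micro-batches.
    // A socket source stands in for the real Kinesis/RabbitMQ/Flume inputs.
    public class EventCountStream {
        public static void main(String[] args) throws InterruptedException {
            // Master URL is supplied by spark-submit in a real deployment.
            SparkConf conf = new SparkConf().setAppName("event-count-stream");
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

            // One event per line, e.g. "ORDER_PLACED|store-123|..."; format is an assumption.
            JavaDStream<String> lines = ssc.socketTextStream("localhost", 9999);

            JavaPairDStream<String, Long> countsByType = lines
                    .mapToPair(line -> new Tuple2<>(line.split("\\|")[0], 1L))
                    .reduceByKey(Long::sum);

            // In production the counts would be written to Cassandra or Redshift, not printed.
            countsByType.print();

            ssc.start();
            ssc.awaitTermination();
        }
    }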

About

$10/hr Ongoing


Skills & Expertise

AWS Redshift, Cassandra, DevOps, Java, Linux, SQL

0 Reviews

This freelancer has not received any feedback.
