July 1, 2019
As big data has matured, data engineering has emerged as a separate but related role that works in concert with data scientists. The big data engineer, or simply data engineer, is becoming an increasingly important role in big data organizations. The role resembles that of the ETL developer in data warehousing or the database developer in database development. However, it focuses more on scenarios in Applied Big Data. Here, Applied means putting big data to work in business use cases. A data engineer therefore has to both master the data and make it useful, much like a “Full Stack Developer” in big data.
A typical list of key skills for a data engineer is as follows (I’ll keep adding new necessary skills to it later on).
- Experienced in the Linux command line and shell scripting
- Experienced in programming in Python/Scala/Java
- Experienced in SQL (Hive/Spark/Impala)
- Experienced in distributed systems such as the Hadoop ecosystem, Redis, and NoSQL databases
- Experienced in building big data processing pipelines and frameworks
- Experienced in applied big data with messaging systems (such as Kafka) and REST services
- Experienced in batch and stream processing as well as big data warehousing
- Experienced in applying and deploying common machine learning algorithms
Here is a list of resources collected on GitHub for How to Become A Data Engineer.