Road to Success — Professional Data Engineer — Google Cloud Platform
Before I start writing about my experience, I would like to say that I did not study regularly; I studied against deadlines, and somehow I managed to meet them. I'm a father of two daughters with a full family life in Amsterdam, Netherlands, so as you can imagine I hardly got time to study during the day, or even on weekends. I therefore chose to study in the early morning hours. I set myself deadlines, for example "I will complete lectures 20 to 30 by such-and-such a day", and always tracked the achievement. I took notes from the video lectures with pen and paper and kept them in a notebook. I never imagined that these tiny notes were going to be helpful, but as a matter of fact they were the most valuable material I had: because of the points written in those notes, I successfully cleared the exam. I've scanned the handwritten notebook (my handwriting is horrible) and kept it on GitHub. If anyone wants it, please write in the comment box and I will send the link.
I advise you to start by creating a new Google account, logging in to the GCP console with your Google user ID and password, and memorizing all the options in the menu bar. For your understanding, let me write down the options and provide details about each of them; I list the functionality in parity with the actual GCP console.
Compute — App Engine (serverless application platform), Compute Engine (scalable, high-performance VMs), Kubernetes Engine (a secure way to run containerized apps), Cloud Functions (event-driven serverless functions). There is also a topic called Cloud GPUs.
Storage — Bigtable (wide-column NoSQL DB; HBase is modeled on it), Datastore (scalable NoSQL DB), Cloud Storage (object/blob storage), Cloud SQL (RDBMS/OLTP), Cloud Spanner (horizontally scalable RDBMS).
Big Data — Cloud Composer (managed Apache Airflow), Cloud Dataproc (managed Hadoop/Spark cluster that can use Cloud Storage as its file system), Cloud Pub/Sub (messaging, like Kafka), Cloud Dataflow (unified batch and stream data processing), BigQuery (Google's analytics engine and data warehouse)
Artificial Intelligence — Natural Language API, Vision API, Data Labeling, Translation API
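Everything in that menu can also be driven from code, which helps the navigation stick. Here is a minimal sketch using the Cloud Storage client library; the project ID and bucket name are hypothetical placeholders, and it assumes the google-cloud-storage package is installed and application-default credentials are set up (gcloud auth application-default login):

```python
# Sketch: the console's Storage menu, exercised via the Python client library.
from google.cloud import storage

client = storage.Client(project="my-project")   # hypothetical project ID
bucket = client.bucket("my-example-bucket")     # hypothetical, must already exist
blob = bucket.blob("notes/gcp-menu.txt")
blob.upload_from_string("Compute, Storage, Big Data, AI ...")
print(f"Uploaded gs://{bucket.name}/{blob.name}")
```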
Now, once you have memorized all these and you know where and how to navigate in your GCP console, start studying the theory, because the questions mostly cover theoretical aspects. Most questions are of the "how-to" type rather than the "what-is" type; no question will ask you the exact click-by-click steps for performing a certain operation. Most questions are use-case oriented, so it is very important to cover the use cases of each section. Around 90% of the questions come from the areas I describe here. In the rest of this article I'll share my experience with the GCP exam as well as the preparation.
The complete GCP certification will take you around 90 hours of study, including about 45 hours of tutorials, regardless of which course you take. Let me briefly describe the study path I took:
After completing the training course, these are the different Qwiklabs quests I did:
1. Advanced: Machine Learning APIs (9 labs)
2. Advanced: Data Science on the Google Cloud Platform (10 labs)
3. Advanced: Scientific Data Processing (8 labs)
4. Expert: Google Cloud Solutions II: Data and Machine Learning (10 labs)
All of the above took almost 3–4 months to complete. As I said, I did not study rigorously, since I work at Cognizant as a Data Architect and Consultant and always had delivery pressure and timelines. After completing all the previous training I took a break of nearly 3 months, during which I did not study at all. I started again at the beginning of September 2019 with a proper plan aimed at gaining the certificate. This time I studied regularly: 2 hours a day, 2 times a week. The course I took during this time was the Data Engineer course on Udemy, which I recommend to everyone who wants to gain this certificate. I went through the course very minutely and tried to absorb as much as possible. Since I already had good knowledge of the subject, I could relate to the Udemy course easily. I took notes again in the same notebook. You should also assess yourself regularly with the sample test provided by Google.
I also followed A Good Success Story from Medium. I found it the best write-up on the subject and bookmarked the URL on my computer.
About the Exam
Go to the Official Site and check out the syllabus. Also, go through the Practice Exam; you can attempt the sample exam as many times as you like.
Fact file
- The key to cracking this exam is having a complete 360-degree view of Google Cloud Platform and relating the Google Cloud services to the corresponding open-source technologies: for example, Bigtable with HBase, Cloud Dataproc with a Hadoop cluster, etc.
- Go through your notes every day and try to check your understanding against the various blogs, links and training material, which, in my opinion, will enrich your learning and understanding.
- Try to discuss different solutions to industry problems with like-minded friends. One problem can have multiple solutions, so try to draw the architecture diagram starting from data ingestion all the way to data publication.
- Machine Learning is a vast topic: DO NOT SPEND much time on it. The main thing you need to learn is how to apply a machine learning algorithm, e.g., what steps to follow in order to run a distributed job using PySpark (see the Dataproc/PySpark sketch after this list). In the exam you may be asked to provide the right solution using the open-source stack.
- Storage is one of the most important aspects of the GCP Data Engineering exam; you can expect 15–20 questions from this section. It may look easy compared with AI or Dataflow, but DO NOT TAKE THIS TOPIC LIGHTLY. This is the main section where you have to score 100%. Read everything in the Google documentation regarding BigQuery (everything !!). Learn the use cases where you would use BigQuery versus Bigtable (see the BigQuery sketch after this list). Learn Cloud Storage completely, including all the case studies published by Google Cloud related to it.
- Have a very good understanding of the Hadoop ecosystem: the different components that make it up (the MapReduce paradigm, columnar data storage, NoSQL DBs, Hive, HBase, jobs). You can expect around 10 questions from this section.
- Streaming data processing is one of the most difficult sections (in my opinion) in GCP. You should have a complete grip on windowing functionality and the basic usage of the different window types (see the windowing sketch after this list), and on combining Cloud Pub/Sub or Apache Kafka with Cloud Storage/BigQuery/Bigtable. You can expect 4–8 questions from this section.
- Learn the different use cases for migrating an on-premise Hadoop cluster to Google Cloud Platform. There can be multiple plausible answers for this type of question, so be careful when answering: the key is often hidden in the statement, and the statements for these questions are very long. If you're not sure, mark them for later review. There are different YouTube videos you can refer to. There are 3–7 questions from this section, so it is important to know the concepts of Dataproc.
- I could not find any question dealing only with App Engine or Compute Engine, but questions can come in combination with Cloud Dataproc: one master node with one specification and worker nodes with another. You could be asked, for a lift-and-shift Hadoop cluster, which configuration you would prefer (the Dataproc sketch after this list shows such a setup).
- Some questions deal with SSD vs HDD, persistent storage (ephemeral vs non-ephemeral), Nearline vs Coldline, and HDFS vs Cloud Storage (see the storage-class sketch after this list). There will definitely be a couple of questions on these.
- Understand the Vision API and the Natural Language API: how to call these APIs via service calls and what the different outputs are with different parameters (see the Vision sketch after this list). You can expect around 5 questions from this.
- IAM and Admin — Frankly speaking, I am not confident about this one. I just read the theory and never experimented in a real GCP environment, and I found this section a little difficult to understand. In my paper I got 5–7 questions on this, related to access, API keys and service accounts. Try to learn the different options you have for granting access (the bucket IAM sketch after this list shows one of them). There may not be any direct question, but questions can be asked in combination with Stackdriver Monitoring log files or Cloud Storage access at the project or account level.
- Stackdriver Monitoring (read this blog) and Google Datalab are two areas that relate to almost 10% of the questions in the sections above. There may not be any direct questions on them, but a very good understanding of these topics is essential in order to answer the questions that are asked.
- Memorize the components used in the typical Data Engineering architecture diagram, from ingestion through processing to storage and publication. Once you can relate the use cases to that diagram, you'll be ready for the exam.
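To make the fact file above concrete, here are a few minimal Python sketches of the services it mentions. They are illustrative only, not production code: the project ID "my-project", bucket names, region, job file and service-account address are all hypothetical placeholders, and each sketch assumes the corresponding google-cloud client library is installed and application-default credentials are configured.

BigQuery: running an analytical query with the Python client (against one of Google's public sample datasets):

```python
# Sketch: run an aggregate query against a public BigQuery dataset.
from google.cloud import bigquery

bq = bigquery.Client(project="my-project")  # hypothetical project ID
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""
for row in bq.query(query).result():  # blocks until the query job finishes
    print(row.name, row.total)
```

Windowing: a tiny Apache Beam pipeline showing fixed (tumbling) windows, the window type the exam likes to contrast with sliding and session windows:

```python
# Sketch: count events per key per one-minute fixed window.
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows, TimestampedValue

events = [("user1", 10), ("user2", 20), ("user1", 70)]  # (key, event time in s)

with beam.Pipeline() as p:  # DirectRunner locally; DataflowRunner on GCP
    (p
     | beam.Create(events)
     | beam.Map(lambda kv: TimestampedValue((kv[0], 1), kv[1]))
     | beam.WindowInto(FixedWindows(60))  # 60-second tumbling windows
     | beam.CombinePerKey(sum)            # per-key count within each window
     | beam.Map(print))
```

Dataproc: creating a small lift-and-shift style cluster (one master node, two worker nodes) and submitting a PySpark job to it; the cluster name, machine types and GCS path of the job file are made up:

```python
# Sketch: create a Dataproc cluster and submit a PySpark job to it.
from google.cloud import dataproc_v1

project, region = "my-project", "europe-west4"  # hypothetical
endpoint = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}

clusters = dataproc_v1.ClusterControllerClient(client_options=endpoint)
cluster = {
    "project_id": project,
    "cluster_name": "lift-and-shift-demo",
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
    },
}
clusters.create_cluster(
    request={"project_id": project, "region": region, "cluster": cluster}
).result()  # wait until the cluster is running

jobs = dataproc_v1.JobControllerClient(client_options=endpoint)
job = {
    "placement": {"cluster_name": "lift-and-shift-demo"},
    "pyspark_job": {"main_python_file_uri": "gs://my-example-bucket/wordcount.py"},
}
jobs.submit_job_as_operation(
    request={"project_id": project, "region": region, "job": job}
).result()  # wait for the Spark job to finish
```

Storage classes: the Nearline-vs-Coldline decision is just a property on the bucket:

```python
# Sketch: create a Nearline bucket for infrequently accessed data.
from google.cloud import storage

client = storage.Client(project="my-project")
bucket = client.bucket("my-archive-bucket")  # hypothetical name
bucket.storage_class = "NEARLINE"            # "COLDLINE" for even colder data
client.create_bucket(bucket, location="EU")
```

Vision API: a label-detection service call and its output fields:

```python
# Sketch: label detection on an image stored in Cloud Storage.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
image = vision.Image(
    source=vision.ImageSource(image_uri="gs://my-example-bucket/photo.jpg"))
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, label.score)  # e.g. "dog", 0.97
```

Bucket IAM: one of the several ways to grant access, here a role binding on a single bucket rather than at the project level:

```python
# Sketch: grant a service account read access to one bucket.
from google.cloud import storage

client = storage.Client(project="my-project")
bucket = client.get_bucket("my-example-bucket")
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {"serviceAccount:reader@my-project.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)
```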
Finally, I hope you like the article. Feel free to add comments if you can enrich the content. I wish you all the best of luck.
Published By
Biswarup Ghoshal — Professional Data Engineer (Google Cloud)
Originally published at https://www.linkedin.com.