Kubernetes (K8s), the open source container orchestration platform, is a big deal across the industry. And beyond container technology per se, K8s is really a cluster computing platform, which has made it increasingly important in the big data space. Meanwhile, the major cloud big data services, including Amazon Web Services’ (AWS’) Elastic MapReduce (EMR), Microsoft’s Azure HDInsight (HDI) and Google Cloud Dataproc, have heretofore each run Apache Spark on virtual machine-based Hadoop clusters. In this day and age, wouldn’t running Spark directly on K8s clusters make more sense?
Not surprisingly, Google, the company that created K8s, thinks the answer to that question is yes. And so, today, the company is announcing the Alpha release of Cloud Dataproc for Kubernetes (K8s Dataproc), allowing Spark to run directly on Google Kubernetes Engine (GKE)-based K8s clusters. The service promises to reduce the complexity that comes from inter-dependencies among open source data components, and to improve the portability of Spark applications. That should allow data engineers, analytics experts and data scientists to run their Spark workloads in a streamlined way, with fewer integration and versioning hassles.
To read the entire article, please click on https://www.zdnet.com/article/google-announces-alpha-of-cloud-dataproc-for-kubernetes/