
What is Apache Spark? The big data analytics platform explained

Submitted on November 14, 2017, 1:49 am

From its humble beginnings in the AMPLab at U.C. Berkeley in 2009, Apache Spark has become one of the key big data distributed processing frameworks in the world. Spark can be deployed in a variety of ways, provides native bindings for the Java, Scala, Python, and R programming languages, and supports SQL, streaming data, machine learning, and graph processing. You’ll find it used by banks, telecommunications companies, games companies, governments, and all of the major tech giants such as Apple, Facebook, IBM, and Microsoft.


Out of the box, Spark can run in a standalone cluster mode that simply requires the Apache Spark framework and a JVM on each machine in your cluster. However, it’s more likely you’ll want to take advantage of a resource or cluster management system to take care of allocating workers on demand for you. In the enterprise, this will normally mean running on Hadoop YARN (this is how the Cloudera and Hortonworks distributions run Spark jobs), but Apache Spark can also run on Apache Mesos, while work is progressing on adding native support for Kubernetes.
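As a rough illustration of the deployment options above, the cluster manager is typically chosen through the master URL passed to spark-submit. The sketch below only builds the command line; it does not launch anything. The helper names (master_url, spark_submit_command), the host names, and the application file name are hypothetical and chosen for illustration, while the URL forms (local[*], spark://host:port, yarn, mesos://host:port) follow the formats that spark-submit accepts.

```python
import shlex

# Master URL templates for each deployment option mentioned above.
# Hosts and ports are placeholders supplied by the caller.
MASTER_URLS = {
    "local": "local[*]",                    # single machine, all cores, no cluster
    "standalone": "spark://{host}:{port}",  # Spark's built-in cluster manager
    "yarn": "yarn",                         # cluster is located via Hadoop configuration
    "mesos": "mesos://{host}:{port}",       # Apache Mesos master
}


def master_url(manager, host="localhost", port=7077):
    """Return the master URL for the given cluster manager (hypothetical helper)."""
    try:
        template = MASTER_URLS[manager]
    except KeyError:
        raise ValueError(f"unknown cluster manager: {manager!r}")
    return template.format(host=host, port=port)


def spark_submit_command(manager, app, host="localhost", port=7077):
    """Build the spark-submit argument list as a list of strings (hypothetical helper)."""
    return ["spark-submit", "--master", master_url(manager, host, port), app]


if __name__ == "__main__":
    # Example hosts are assumptions, not real endpoints.
    examples = [
        ("local", "localhost", 7077),
        ("standalone", "master.example.com", 7077),
        ("yarn", "localhost", 7077),
        ("mesos", "mesos.example.com", 5050),
    ]
    for manager, host, port in examples:
        print(shlex.join(spark_submit_command(manager, "my_job.py", host, port)))
```

Keeping the master URL out of the application code means the same job can move from a laptop to a YARN or Mesos cluster by changing only the submit command.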

To read the entire article, visit https://www.itworld.com/article/3236869/analytics/what-is-apache-spark-the-big-data-analytics-platform-explained.html