Chapter 1: Introduction to Data Analysis with Spark
The components of Spark:
Spark Core: contains the basic functionality of Spark. Spark Core is also home to the API that defines RDDs (resilient distributed datasets).
Spark SQL (structured data): the package for working with structured data. It allows querying data via SQL as well as the Apache Hive variant of SQL, and it supports many sources of data, including Hive tables, Parquet, and JSON. It also allows developers to intermix SQL queries with the programmatic data manipulations supported by RDDs in Python, Java, and Scala.
Spark Streaming (real-time): enables processing of live streams of data.
MLlib (machine learning): a library of common machine learning functionality.
GraphX (graph processing): a library for manipulating graphs.
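The RDD API in Spark Core is built around functional transformations (such as map and filter) and actions (such as reduce or count). As a rough illustration of that chained style, here is the same pattern in plain Python only; this is an analogy, not actual Spark code:

```python
# Plain-Python analogy of an RDD pipeline (no Spark involved).
# In PySpark this would look like:
#   sc.parallelize(data).map(...).filter(...).reduce(...)
data = [1, 2, 3, 4, 5]

# "map": square each element
squared = [x * x for x in data]

# "filter": keep only the even values
evens = [x for x in squared if x % 2 == 0]

# "reduce"/action: combine the remaining elements into one result
total = sum(evens)

print(squared)  # [1, 4, 9, 16, 25]
print(evens)    # [4, 16]
print(total)    # 20
```

The key difference in real Spark is that the transformations run lazily and in parallel across a cluster, rather than eagerly on a local list.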
A Brief History of Spark
Spark is an open source project that has been built, and is maintained, by a thriving and diverse community of developers.
Chapter 2: Downloading Spark and Getting Started
This chapter walks through the process of downloading and running Spark in local mode on a single computer.
You don't need to master Scala, Java, or Python. Spark itself is written in Scala and runs on the Java Virtual Machine (JVM). To run Spark on either your laptop or a cluster, all you need is an installation of Java 6 or newer. If you wish to use the Python API, you will also need a Python interpreter (version 2.6 or newer). Spark does not yet work with Python 3.
Downloading Spark: select the "Pre-built for Hadoop 2.4 and later" package.
Tips:
Windows users may run into issues when installing. You can use a zip tool to untar the .tar file. Note: install Spark in a directory with no spaces in its path (e.g., C:\spark).
After you untar the file, you will get a new directory with the same name but without the final .tar suffix.
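The untar step can be sketched as follows. The Spark archive name below is only an example (yours will match the version you downloaded), and a stand-in archive is created first so the commands are self-contained:

```shell
# On the real download you would simply run (filename is an example):
#   tar -xf spark-1.2.0-bin-hadoop2.4.tgz
# Self-contained demonstration using a stand-in archive:
demo=/tmp/spark-untar-demo
rm -rf "$demo" && mkdir -p "$demo/spark-1.2.0-bin-hadoop2.4"
echo placeholder > "$demo/spark-1.2.0-bin-hadoop2.4/README.md"
tar -czf "$demo/spark-1.2.0-bin-hadoop2.4.tgz" -C "$demo" spark-1.2.0-bin-hadoop2.4
rm -r "$demo/spark-1.2.0-bin-hadoop2.4"

# Untarring recreates a directory with the same name, minus the .tgz suffix:
tar -xzf "$demo/spark-1.2.0-bin-hadoop2.4.tgz" -C "$demo"
ls "$demo"
```

On Windows, a zip tool performs the equivalent extraction through its GUI.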
Note:
Most of this book includes code in all of Spark’s languages, but interactive shells are
available only in Python and Scala. Because a shell is very useful for learning the API, we recommend using one of these languages for these examples even if you are a Java
developer. The API is similar in every language.
Change directory into the Spark directory and type bin\pyspark; you will see the Spark logo.
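Once the shell starts, a SparkContext is already available as the variable sc. A first session might look like this (illustrative transcript; it assumes a README.md file exists in the Spark directory):

```
>>> lines = sc.textFile("README.md")  # create an RDD of the file's lines
>>> lines.count()                     # action: count the lines in the file
```

The first line only defines the RDD; no work happens until the count() action is called.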