Setup Spark and Jupyter on Windows

Windows Terminal

Windows Terminal in Microsoft Store

Install Java

Checking version/availability of javac

Install Anaconda

Download Spark

Link to how to download Apache Spark

Install libraries to support Hadoop functionalities

Setup environment variables

> where.exe javac
> where.exe spark-shell
> where.exe winutils
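
The same check can be scripted from Python, which is handy once you are inside a notebook. This is only a sketch: the helper name `locate_tools` is hypothetical, and it assumes the three executables from the commands above are expected on `PATH`.

```python
import shutil


def locate_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Same executables that were checked with where.exe above.
for name, path in locate_tools(["javac", "spark-shell", "winutils"]).items():
    print("{}: {}".format(name, path or "NOT FOUND on PATH"))
```

Any entry reported as not found suggests the corresponding environment variable or `PATH` entry still needs attention.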

Setup Jupyter and pyspark

> conda create -y -n pyspark python=3.6
> conda init powershell
> conda activate pyspark
> conda install -y -c conda-forge findspark
> conda install -y ipykernel
> python -m ipykernel install --user --name=pyspark

Test Jupyter and pyspark

import os
import sys
spark_path = os.environ['SPARK_HOME']
sys.path.append(spark_path + "/bin")
sys.path.append(spark_path + "/python")
sys.path.append(spark_path + "/python/pyspark/")
sys.path.append(spark_path + "/python/lib")
sys.path.append(spark_path + "/python/lib/pyspark.zip")
sys.path.append(spark_path + "/python/lib/py4j-0.10.9-src.zip")
import findspark
findspark.init()
import pyspark
number_cores = 8
memory_gb = 16
conf = (pyspark.SparkConf()
        .setMaster('local[{}]'.format(number_cores))
        .set('spark.driver.memory', '{}g'.format(memory_gb)))
sc = pyspark.SparkContext(conf=conf)
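
The `py4j` archive name in the path list above carries a version number, so it can go stale when a different Spark release is installed. A minimal sketch of looking the file up instead, assuming the archive sits under `python/lib` as in the paths above; the helper name `find_py4j_zip` is hypothetical:

```python
import glob
import os


def find_py4j_zip(spark_home):
    """Return the first py4j-*-src.zip under <spark_home>/python/lib, or None."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None
```

If the function returns a path, it can be passed to `sys.path.append` in place of the hard-coded file name; a `None` result means no matching archive was found under that `SPARK_HOME`.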

textFile = sc.textFile("PATH_TO_DOWNLOADED_SHAKESPEARE_TEXT_FILE")
wordcount = (textFile.flatMap(lambda line: line.split(" "))
             .map(lambda word: (word, 1))
             .reduceByKey(lambda a, b: a + b))
wordcount.saveAsTextFile("output-wordcount-01")
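
To spot-check the Spark result on a small sample, the same counting logic can be written in plain Python. This is an illustrative sketch, not part of the Spark job; the function name `word_count` and the sample lines are made up for the example.

```python
from collections import Counter


def word_count(lines):
    """Count words as the Spark job does: split each line on single spaces."""
    counts = Counter()
    for line in lines:
        # flatMap + map: every token from split(" ") contributes one occurrence.
        # reduceByKey: Counter adds up the occurrences per word.
        counts.update(line.split(" "))
    return dict(counts)


sample = ["to be or not to be", "that is the question"]
print(word_count(sample))
```

Because both versions split on a single space, consecutive spaces produce empty-string "words" in either case, so the counts should agree on the same input.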
