EMR: AWS EMR is a managed service provided by AWS to run Spark, HDFS, Hive, and other select software. Protip: Start the EMR cluster only after you have your project set up, to prevent unnecessary cost. We will use EMR to run our Spark and HDFS cluster. Go to AWS Service -> EMR. Click on Create Cluster. Click on Go to advanced options. Select the shown options and copy-paste the config below into the Edit software settings section| www.startdataengineering.com
1. AWS account: Sign up for an AWS account at AWS Sign Up. You will be eligible for some free services on your first sign-up (ref: AWS Free Tier). Get your access key by clicking on your name -> My Security Credentials on the top pane and then clicking Create New Access Key. Download it to a safe location; you won't be able to see it a second time. Install the AWS CLI from AWS cli. Configure your CLI from the terminal by typing in aws configure, and use the access credentials from step 2 and for reg...| www.startdataengineering.com
Performance Tuning| spark.apache.org