개발잡부
Spark install
○ Download the file
First, download the installation archive from the Spark downloads page.
spark.apache.org/downloads.html
File to download:
https://dlcdn.apache.org/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
Install location & extraction
Copy the archive to the location of your choice, extract it, and rename the directory.
tar -xvf spark-3.2.1-bin-hadoop3.2.tgz
mv spark-3.2.1-bin-hadoop3.2 spark-3.2.1
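The same two steps can also be scripted. Below is a minimal sketch using only Python's standard library, assuming the archive has already been downloaded. The helper name `extract_and_rename` and its parameters are hypothetical and not part of Spark.

```python
import shutil
import tarfile
from pathlib import Path


def extract_and_rename(archive: Path, dest_dir: Path, new_name: str) -> Path:
    """Extract a .tgz into dest_dir and rename its single top-level directory."""
    target = dest_dir / new_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    dest_dir.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive, "r:gz") as tar:
        # Collect the top-level names, e.g. {"spark-3.2.1-bin-hadoop3.2"}
        top_level = set()
        for member in tar.getmembers():
            parts = Path(member.name).parts
            if parts:
                top_level.add(parts[0])
        if len(top_level) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_level)}")
        tar.extractall(dest_dir)  # only extract archives from a source you trust

    # Equivalent of: mv spark-3.2.1-bin-hadoop3.2 spark-3.2.1
    shutil.move(str(dest_dir / top_level.pop()), str(target))
    return target
```

For the archive in this post, the call would be `extract_and_rename(Path("spark-3.2.1-bin-hadoop3.2.tgz"), Path("."), "spark-3.2.1")`.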
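Configuration
Spark's configuration files live in $SPARK_HOME/conf, where SPARK_HOME is the extracted directory (spark-3.2.1 in this post). Copy each template to its real file name.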
cd $SPARK_HOME/conf
cp spark-defaults.conf.template spark-defaults.conf
cp spark-env.sh.template spark-env.sh
cp log4j.properties.template log4j.properties
log4j settings
Set the root logger level to WARN to cut down on console output.
vi log4j.properties
log4j.rootCategory=WARN, console
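The copy-and-edit steps above can be sketched as a small script. This is an illustration rather than anything Spark ships: the function name `prepare_conf` is hypothetical, and it assumes the template contains a `log4j.rootCategory=` line to rewrite.

```python
import shutil
from pathlib import Path

# Config files that are shipped as "<name>.template" in the conf directory
TEMPLATES = ("spark-defaults.conf", "spark-env.sh", "log4j.properties")


def prepare_conf(conf_dir: Path, level: str = "WARN") -> None:
    """Copy each template to its real name, then set the log4j root level."""
    for name in TEMPLATES:
        target = conf_dir / name
        if not target.exists():  # never overwrite an existing config file
            shutil.copy(conf_dir / f"{name}.template", target)

    log4j = conf_dir / "log4j.properties"
    # Rewrite only the root logger line; every other setting is left untouched.
    lines = [
        f"log4j.rootCategory={level}, console"
        if line.startswith("log4j.rootCategory=") else line
        for line in log4j.read_text().splitlines()
    ]
    log4j.write_text("\n".join(lines) + "\n")
```

With SPARK_HOME exported, this could be called as `prepare_conf(Path(os.environ["SPARK_HOME"]) / "conf")` after `import os`.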
Run
Check that Spark starts by launching the PySpark shell.
cd $SPARK_HOME/
./bin/pyspark
A successful start looks like this:
Python 3.9.7 (default, Sep 16 2021, 08:50:36)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
22/06/16 21:59:10 WARN Utils: Your hostname, iduhyeon-ui-MacBookPro-2.local resolves to a loopback address: 127.0.0.1; using 172.30.1.48 instead (on interface en0)
22/06/16 21:59:10 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/doo/spark/spark-3.2.1/jars/spark-unsafe_2.12-3.2.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/06/16 21:59:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.2.1
      /_/
Using Python version 3.9.7 (default, Sep 16 2021 08:50:36)
Spark context Web UI available at http://172.30.1.48:4040
Spark context available as 'sc' (master = local[*], app id = local-1655384352356).
SparkSession available as 'spark'.
>>>
○ Running an example
Let's try a basic example in the spirit of word count: counting the lines of README.md and filtering them.
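# Load README.md (relative to the directory where pyspark was started) as an RDD of lines
lines = sc.textFile("README.md")
lines.count()   # number of lines in the file
lines.first()   # first line of the file
# Keep only the lines that contain "Python"
pythonLines = lines.filter(lambda line: "Python" in line)
pythonLines.first()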
Type the lines one at a time and check each result.
>>> lines =sc.textFile("README.md")
>>> lines.count()
108
>>> lines.first()
'# Apache Spark'
>>> pythonLines = lines.filter(lambda line : "Python" in line)
>>> pythonLines.first()
'high-level APIs in Scala, Java, Python, and R, and an optimized engine that'
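For readers new to RDDs, it may help to see what these calls compute. Below is a rough plain-Python analogue with no Spark involved. The sample text is made up for illustration and is not the real README.md. In Spark, `textFile` and `filter` are lazy transformations, while `count()` and `first()` are actions that trigger the work; the generator below loosely mimics that laziness.

```python
# Made-up sample standing in for README.md (not the real file contents).
sample = """# Apache Spark

Spark is a unified analytics engine for large-scale data processing.
It provides high-level APIs in Scala, Java, Python, and R.
"""

lines = sample.splitlines()   # roughly sc.textFile(...): one element per line

count = len(lines)            # roughly lines.count()
first = lines[0]              # roughly lines.first()

# Roughly lines.filter(lambda line: "Python" in line): a lazy generator,
# so nothing is scanned until an element is requested.
python_lines = (line for line in lines if "Python" in line)
first_python = next(python_lines)   # roughly pythonLines.first()

print(count)
print(first)
print(first_python)
```

The analogue only covers the semantics on a single machine; Spark splits the file into partitions and evaluates the same operations in parallel.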
Next, let's put the same steps in a Python file.
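# test.py
from pyspark import SparkConf, SparkContext
# Run locally with a single worker thread; "test" is the application name
conf = SparkConf().setMaster("local").setAppName("test")
sc = SparkContext(conf=conf)
# The relative path is resolved against the directory spark-submit is run from
lines = sc.textFile("./README.md")
print("lines.count() :", lines.count())
print("lines.first() :", lines.first())
# Keep only the lines that contain "Python"
pythonLines = lines.filter(lambda line: "Python" in line)
print("pythonLines.first() :", pythonLines.first())
sc.stop()  # release the context once the results are printed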
Save the file, then run it with Spark.
./bin/spark-submit test.py
The results are buried among a lot of log lines and hard to spot, but they match the ones from the shell.
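One way to make the results easier to find is to filter the captured output for the labels that test.py prints. Below is a minimal sketch. The helper `pick_results` is hypothetical, and the sample text is abbreviated from the outputs shown earlier in this post rather than taken from a real spark-submit capture.

```python
# Labels used by the print() calls in test.py
LABELS = ("lines.count() :", "lines.first() :", "pythonLines.first() :")


def pick_results(output: str) -> list[str]:
    """Return only the lines that start with one of test.py's print labels."""
    return [line for line in output.splitlines() if line.startswith(LABELS)]


# Abbreviated sample: two log lines mixed in with the three printed results.
sample = """\
22/06/16 21:59:10 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
lines.count() : 108
lines.first() : # Apache Spark
22/06/16 21:59:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
pythonLines.first() : high-level APIs in Scala, Java, Python, and R, and an optimized engine that
"""

for line in pick_results(sample):
    print(line)
```

For a real run, the output could be captured with `subprocess.run(["./bin/spark-submit", "test.py"], capture_output=True, text=True)` and its `stdout` passed to `pick_results`. If the console appender in log4j.properties still targets stderr, as it does in the template, running `./bin/spark-submit test.py 2>/dev/null` should hide the log lines as well.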