Dead link handling |
Big Data Technology: Spark Fundamentals Explained (PDF Download)
Compiled by this site for download:
Link: https://pan.baidu.com/s/1OOzVirXhR1e8wV3T3vMBfw
Extraction code: h38x
Main content:
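Chapter 1 Spark Overview

1.1 What Is Spark

Spark is a fast, general-purpose, scalable engine for big data analytics. It originated in 2009 at the AMPLab at the University of California, Berkeley, was open-sourced in 2010, became an Apache incubator project in June 2013, and became an Apache top-level project in February 2014. The project is written in Scala.

1.2 Spark Built-in Modules

Spark Core: implements Spark's basic functionality, including task scheduling, memory management, fault recovery, and interaction with storage systems. Spark Core also contains the API definition of the Resilient Distributed Dataset (RDD).

Spark SQL: the package Spark uses to work with structured data. With Spark SQL, data can be queried using SQL or the Apache Hive dialect of SQL (HQL). Spark SQL supports many data sources, such as Hive tables, Parquet, and JSON.

Spark Streaming: the component Spark provides for stream processing of real-time data. It offers an API for manipulating data streams that corresponds closely to the RDD API in Spark Core.

Spark MLlib: a library of common machine learning (ML) functionality, including classification, regression, clustering, and collaborative filtering, plus supporting features such as model evaluation and data import.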
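Cluster managers: Spark is designed to scale computation efficiently from a single compute node to thousands of nodes. To meet this requirement while keeping maximum flexibility, Spark runs on a variety of cluster managers, including Hadoop YARN, Apache Mesos, and a simple scheduler that ships with Spark, called the standalone scheduler.

Spark is supported by many big data companies, including Hortonworks, IBM, Intel, Cloudera, MapR, Pivotal, Baidu, Alibaba, Tencent, JD.com, Ctrip, and Youku Tudou. Baidu uses Spark in services such as its main search, Zhidahao, and Baidu Big Data. Alibaba has used GraphX to build large-scale graph computation and graph mining systems that implement the recommendation algorithms of many production systems. Tencent's Spark cluster has reached 8,000 nodes, the largest Spark cluster known at the time of writing.

1.3 Spark Features

Fast: Compared with Hadoop MapReduce, Spark's in-memory computation can be more than 100 times faster, and its disk-based computation more than 10 times faster. Spark implements an efficient DAG execution engine and processes data flows efficiently by keeping intermediate results in memory.

Easy to use: Spark offers APIs in Java, Python, and Scala, along with more than 80 high-level operators, so users can build different kinds of applications quickly. Spark also supports interactive Python and Scala shells, which make it convenient to try out a solution against a Spark cluster.

General-purpose: Spark provides a unified solution. It can be used for batch processing, interactive queries (Spark SQL), real-time stream processing (Spark Streaming), machine learning (Spark MLlib), and graph computation (GraphX), and these different kinds of processing can be combined seamlessly in a single application. A unified platform is attractive because it reduces the labor cost of development and maintenance and the material cost of deploying multiple platforms.

Compatibility: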
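Spark integrates easily with other open-source products. For example, Spark can use Hadoop YARN or Apache Mesos as its resource manager and scheduler, and it can process all data supported by Hadoop, including HDFS, HBase, and Cassandra. This matters especially to users who have already deployed Hadoop clusters, because they can use Spark's processing power without migrating any data. Spark also does not have to depend on a third-party resource manager and scheduler: it implements Standalone as its built-in resource management and scheduling framework, which further lowers the barrier to entry so that anyone can deploy and use Spark easily. In addition, Spark provides tools for deploying a Standalone Spark cluster on EC2.

Chapter 2 Spark Run Modes

2.1 Spark Installation Addresses

1. Official website: http://spark.apache.org/
2. Documentation: https://spark.apache.org/docs/2.1.1/
3. Downloads: https://spark.apache.org/downloads.html

2.3 Local Mode

2.3.1 Overview

Local mode runs Spark on a single machine and is typically used for practice and testing on your own computer. The master can be set in the following ways:

local: all computation runs in a single thread with no parallelism. This is the usual choice for running test code or practicing locally.
local[K]: specifies how many threads run the computation; for example, local[4] runs 4 worker threads. The thread count is usually set to the number of CPU cores so that the CPU is fully used.
local[*]: automatically sets the number of threads to the number of CPU cores on the machine.

2.3.2 Installation and Use

1) Upload and unpack the Spark installation package

[itstar@bigdata111 sorfware]$ tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz -C /opt/module/
[itstar@bigdata111 module]$ mv spark-2.1.1-bin-hadoop2.7 spark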
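The three local master settings described in section 2.3.1 (local, local[K], local[*]) can be illustrated with a small Python sketch. The helper name local_thread_count is hypothetical and is not part of Spark; it only mirrors the rule stated in the text and ignores other master forms that Spark may accept.

```python
import os
import re


def local_thread_count(master: str) -> int:
    """Return the worker-thread count implied by a local master string.

    Hypothetical helper for illustration only; it is not a Spark API.
    """
    if master == "local":
        return 1  # a single thread, no parallelism
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        raise ValueError(f"not a local master string: {master!r}")
    value = match.group(1)
    if value == "*":
        # One thread per logical core reported by the OS (at least 1).
        return os.cpu_count() or 1
    count = int(value)
    if count < 1:
        raise ValueError("thread count must be at least 1")
    return count


if __name__ == "__main__":
    for m in ("local", "local[4]", "local[*]"):
        print(m, "->", local_thread_count(m))
```

On a machine with four cores, local[4] and local[*] would resolve to the same thread count, which is why the text recommends matching K to the number of cores.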
2) The official Pi example

[itstar@bigdata111 spark]$ bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --executor-memory 1G \
  --total-executor-cores 2 \
  ./examples/jars/spark-examples_2.11-2.1.1.jar \
  100

(1) Basic syntax

bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

(2) Parameter descriptions:

--master: the address of the master; defaults to local mode.
--class: the entry-point class of your application (for example, org.apache.spark.examples.SparkPi).
--deploy-mode: whether to deploy your driver on a worker node (cluster) or run it as a local client (client); the default is client.
--conf: an arbitrary Spark configuration property in key=value format. If the value contains spaces, wrap the pair in quotes: "key=value".
application-jar: the packaged application JAR, including its dependencies. The URL must be globally visible inside the cluster, for example an hdfs:// path on shared storage; if it is a file:// path, the same JAR must exist at that path on every node.
application-arguments: the arguments passed to the main() method of the entry-point class.
--executor-memory 1G: gives each executor 1 GB of memory.
--total-executor-cores 2: sets the total number of CPU cores used by all executors to 2.

3) Result

The example estimates Pi using the Monte Carlo method.
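To make the Monte Carlo idea concrete, here is a minimal single-process sketch in plain Python. It does not use Spark and is not the source of the bundled SparkPi class; the function name estimate_pi and the seed parameter are assumptions introduced for illustration. Points are sampled uniformly in the square [-1, 1] x [-1, 1], and the fraction that lands inside the unit circle approximates pi / 4.

```python
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Estimate pi by uniform sampling in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Count the point if it falls inside the unit circle.
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of circle / area of square = pi / 4.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(200_000))
```

The estimate tightens slowly as the sample count grows (the error shrinks roughly with the square root of the number of samples), which is why this kind of workload benefits from being split across many cores or executors.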
|