Spark源码剖析(一):如何将spark源码导入到IDEA中

由于近期准备深入研究一下Spark的核心源码,所以开了这一系列用来记录自己研究spark源码的过程!

想要读源码,那么第一步肯定导入spark源码啦(笔者使用的是IntelliJ IDEA),在网上找了一圈,尝试了好几种方法都没有成功,最终通过自己摸索出了一种非常简单的方式(只需要两步即可!

环境要求

  1. IntelliJ IDEA(Community版本即可)
  2. maven(当然jdk是不可少的)

具体信息如下:

C:\Users\Administrator>mvn -version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: D:\java\apache-maven-3.3.9\bin\..
Java version: 1.8.0_151, vendor: Oracle Corporation
Java home: D:\java\jdk-1.8u151\jre
Default locale: zh_CN, platform encoding: GBK
OS name: "windows 7", version: "6.1", arch: "amd64", family: "dos"

  

顺便贴一下maven的settings.xml

<?xml version="1.0" encoding="UTF-8"?>

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->

<!--
 | This is the configuration file for Maven. It can be specified at two levels:
 |
 |  1. User Level. This settings.xml file provides configuration for a single user,
 |                 and is normally provided in ${user.home}/.m2/settings.xml.
 |
 |                 NOTE: This location can be overridden with the CLI option:
 |
 |                 -s /path/to/user/settings.xml
 |
 |  2. Global Level. This settings.xml file provides configuration for all Maven
 |                 users on a machine (assuming they‘re all using the same Maven
 |                 installation). It‘s normally provided in
 |                 ${maven.conf}/settings.xml.
 |
 |                 NOTE: This location can be overridden with the CLI option:
 |
 |                 -gs /path/to/global/settings.xml
 |
 | The sections in this sample file are intended to give you a running start at
 | getting the most out of your Maven installation. Where appropriate, the default
 | values (values used when the setting is not specified) are provided.
 |
 |-->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <!-- localRepository
   | The path to the local repository maven will use to store artifacts.
   |
   | Default: ${user.home}/.m2/repository
  <localRepository>D:\Mavenworkspace\m2\repository</localRepository>
  -->
<localRepository>D:\java\mavenRepository</localRepository>
  <!-- interactiveMode
   | This will determine whether maven prompts you when it needs input. If set to false,
   | maven will use a sensible default value, perhaps based on some other setting, for
   | the parameter in question.
   |
   | Default: true
  <interactiveMode>true</interactiveMode>
  -->

  <!-- offline
   | Determines whether maven should attempt to connect to the network when executing a build.
   | This will have an effect on artifact downloads, artifact deployment, and others.
   |
   | Default: false
  <offline>false</offline>
  -->

  <!-- pluginGroups
   | This is a list of additional group identifiers that will be searched when resolving plugins by their prefix, i.e.
   | when invoking a command line like "mvn prefix:goal". Maven will automatically add the group identifiers
   | "org.apache.maven.plugins" and "org.codehaus.mojo" if these are not already contained in the list.
   |-->
  <pluginGroups>
    <!-- pluginGroup
     | Specifies a further group identifier to use for plugin lookup.
    <pluginGroup>com.your.plugins</pluginGroup>
    -->
  </pluginGroups>

  <!-- proxies
   | This is a list of proxies which can be used on this machine to connect to the network.
   | Unless otherwise specified (by system property or command-line switch), the first proxy
   | specification in this list marked as active will be used.
   |-->
  <proxies>
    <!-- proxy
     | Specification for one proxy, to be used in connecting to the network.
     |
    <proxy>
      <id>optional</id>
      <active>true</active>
      <protocol>http</protocol>
      <username>proxyuser</username>
      <password>proxypass</password>
      <host>proxy.host.net</host>
      <port>80</port>
      <nonProxyHosts>local.net|some.host.com</nonProxyHosts>
    </proxy>
    -->
  </proxies>

  <!-- servers
   | This is a list of authentication profiles, keyed by the server-id used within the system.
   | Authentication profiles can be used whenever maven must make a connection to a remote server.
   |-->
  <servers>
    <!-- server
     | Specifies the authentication information to use when connecting to a particular server, identified by
     | a unique name within the system (referred to by the ‘id‘ attribute below).
     |
     | NOTE: You should either specify username/password OR privateKey/passphrase, since these pairings are
     |       used together.
     |
    <server>
      <id>deploymentRepo</id>
      <username>repouser</username>
      <password>repopwd</password>
    </server>
    -->

    <!-- Another sample, using keys to authenticate.
    <server>
      <id>siteServer</id>
      <privateKey>/path/to/private/key</privateKey>
      <passphrase>optional; leave empty if not used.</passphrase>
    </server>
    -->
  </servers>

  <!-- mirrors
   | This is a list of mirrors to be used in downloading artifacts from remote repositories.
   |
   | It works like this: a POM may declare a repository to use in resolving certain artifacts.
   | However, this repository may have problems with heavy traffic at times, so people have mirrored
   | it to several places.
   |
   | That repository definition will have a unique id, so we can create a mirror reference for that
   | repository, to be used as an alternate download site. The mirror site will be the preferred
   | server for that repository.
   |-->
  <mirrors>
    <!-- mirror
     | Specifies a repository mirror site to use instead of a given repository. The repository that
     | this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used
     | for inheritance and direct lookup purposes, and must be unique across the set of mirrors.
     |
    <mirror>
      <id>mirrorId</id>
      <mirrorOf>repositoryId</mirrorOf>
      <name>Human Readable Name for this Mirror.</name>
      <url>http://my.repository.com/repo/path</url>
    </mirror>
     -->
     <mirror>
        <id>nexus-osc</id>
        <mirrorOf>central</mirrorOf>
        <name>Nexus osc</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </mirror>

    <mirror>
        <id>osc_thirdparty</id>
        <mirrorOf>thirdparty</mirrorOf>
        <url>http://maven.aliyun.com/nexus/content/repositories/central</url>
    </mirror>
  </mirrors>

  <!-- profiles
   | This is a list of profiles which can be activated in a variety of ways, and which can modify
   | the build process. Profiles provided in the settings.xml are intended to provide local machine-
   | specific paths and repository locations which allow the build to work in the local environment.
   |
   | For example, if you have an integration testing plugin - like cactus - that needs to know where
   | your Tomcat instance is installed, you can provide a variable here such that the variable is
   | dereferenced during the build process to configure the cactus plugin.
   |
   | As noted above, profiles can be activated in a variety of ways. One way - the activeProfiles
   | section of this document (settings.xml) - will be discussed later. Another way essentially
   | relies on the detection of a system property, either matching a particular value for the property,
   | or merely testing its existence. Profiles can also be activated by JDK version prefix, where a
   | value of ‘1.4‘ might activate a profile when the build is executed on a JDK version of ‘1.4.2_07‘.
   | Finally, the list of active profiles can be specified directly from the command line.
   |
   | NOTE: For profiles defined in the settings.xml, you are restricted to specifying only artifact
   |       repositories, plugin repositories, and free-form properties to be used as configuration
   |       variables for plugins in the POM.
   |
   |-->
  <profiles>
    <!-- profile
     | Specifies a set of introductions to the build process, to be activated using one or more of the
     | mechanisms described above. For inheritance purposes, and to activate profiles via <activatedProfiles/>
     | or the command line, profiles have to have an ID that is unique.
     |
     | An encouraged best practice for profile identification is to use a consistent naming convention
     | for profiles, such as ‘env-dev‘, ‘env-test‘, ‘env-production‘, ‘user-jdcasey‘, ‘user-brett‘, etc.
     | This will make it more intuitive to understand what the set of introduced profiles is attempting
     | to accomplish, particularly when you only have a list of profile id‘s for debug.
     |
     | This profile example uses the JDK version to trigger activation, and provides a JDK-specific repo.
    <profile>
      <id>jdk-1.4</id>

      <activation>
        <jdk>1.4</jdk>
      </activation>

      <repositories>
        <repository>
          <id>jdk14</id>
          <name>Repository for JDK 1.4 builds</name>
          <url>http://www.myhost.com/maven/jdk14</url>
          <layout>default</layout>
          <snapshotPolicy>always</snapshotPolicy>
        </repository>
      </repositories>
    </profile>
    -->
<profile>
      <profile>
        <id>jdk-1.8</id>
      <activation>
        <activeByDefault>true</activeByDefault>
        <jdk>1.8</jdk>
      </activation>
      <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <maven.compiler.compilerVersion>1.8</maven.compiler.compilerVersion>
      </properties>

        <repositories>

            <repository>
                <id>nexus</id>
                <name>local private nexus</name>
                <url>http://maven.aliyun.com/nexus/content/repositories/central</url>
                <releases>
                    <enabled>true</enabled>
                </releases>
                <snapshots>
                    <enabled>false</enabled>
                </snapshots>
            </repository>

        </repositories>

        <pluginRepositories>

            <pluginRepository>
                    <id>nexus</id>
                    <name>local private nexus</name>
                    <url>http://maven.aliyun.com/nexus/content/repositories/central</url>
                    <releases>
                        <enabled>true</enabled>
                    </releases>
                    <snapshots>
                        <enabled>false</enabled>
                    </snapshots>
            </pluginRepository>

        </pluginRepositories>
    </profile>

    <profile>
        <id>osc</id>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>

        <repositories>            

            <repository>
                <id>osc</id>
                <url>http://maven.aliyun.com/nexus/content/repositories/central</url>
            </repository>

            <repository>
                <id>osc_thirdparty</id>
                <url>http://maven.aliyun.com/nexus/content/repositories/central</url>
            </repository>

        </repositories>

        <pluginRepositories>

            <pluginRepository>
                <id>osc</id>
                <url>http://maven.aliyun.com/nexus/content/repositories/central</url>
            </pluginRepository>

        </pluginRepositories>
    </profile>

    </profile>

    <!--
     | Here is another profile, activated by the system property ‘target-env‘ with a value of ‘dev‘,
     | which provides a specific path to the Tomcat instance. To use this, your plugin configuration
     | might hypothetically look like:
     |
     | ...
     | <plugin>
     |   <groupId>org.myco.myplugins</groupId>
     |   <artifactId>myplugin</artifactId>
     |
     |   <configuration>
     |     <tomcatLocation>${tomcatPath}</tomcatLocation>
     |   </configuration>
     | </plugin>
     | ...
     |
     | NOTE: If you just wanted to inject this configuration whenever someone set ‘target-env‘ to
     |       anything, you could just leave off the <value/> inside the activation-property.
     |
    <profile>
      <id>env-dev</id>

      <activation>
        <property>
          <name>target-env</name>
          <value>dev</value>
        </property>
      </activation>

      <properties>
        <tomcatPath>/path/to/tomcat/instance</tomcatPath>
      </properties>
    </profile>
    -->
  </profiles>

  <!-- activeProfiles
   | List of profiles that are active for all builds.
   |
  <activeProfiles>
    <activeProfile>alwaysActiveProfile</activeProfile>
    <activeProfile>anotherAlwaysActiveProfile</activeProfile>
  </activeProfiles>
  -->
</settings>

settings.xml

好了,一旦环境准备就绪,那就速战速决吧!

第一步:从github上下载源代码

先选择你想要阅读的spark版本,笔者这里选择的是spark1.3版本

接着直接下载zip包到本地解压(当然也可以使用git拉下来啦)

第二步:使用IDEA导入spark源码

打开你的IntelliJ IDEA ,File -> Open 选中你源码解压后的文件夹即可! (不需要使用Import

到这里基本已经大功告成!接下来只需要等待maven解决各种依赖即可(大概需要半个小时,大家耐心一点)

成功后的界面如下(提示:可以使用ctrl + N 搜索你想要阅读的类文件):

原文地址:https://www.cnblogs.com/LiCheng-/p/8128003.html

时间: 2024-11-05 21:58:50

Spark源码剖析(一):如何将spark源码导入到IDEA中的相关文章

libgo 源码剖析(2. libgo调度策略源码实现)

本文将从源码实现上对 libgo 的调度策略进行分析,主要涉及到上一篇文章中的三个结构体的定义: 调度器 Scheduler(简称 S) 执行器 Processer(简称 P) 协程 Task(简称 T) 三者的关系如下图所示: 本文会列出类内的主要成员和主要函数做以分析. 1. 协程调度器:class Scheduler libgo/scheduler/scheduler.h class Scheduler{ public: /* * 创建一个调度器,初始化 libgo * 创建主线程的执行器

STL源码剖析——STL算法之merge合并算法

前言 由于在前文的<STL算法剖析>中,源码剖析非常多,不方便学习,也不方便以后复习,这里把这些算法进行归类,对他们单独的源码剖析进行讲解.本文介绍的STL算法中的merge合并算法.源码中介绍了函数merge.inplace_merge.并对这些函数的源码进行详细的剖析,并适当给出使用例子,具体详见下面源码剖析. merge合并算法源码剖析 // merge, with and without an explicitly supplied comparison function. //将两个

Redis源码剖析和注释(八)--- 对象系统(redisObject)

Redis 对象系统 1. 介绍 redis中基于双端链表.简单动态字符串(sds).字典.跳跃表.整数集合.压缩列表.快速列表等等数据结构实现了一个对象系统,并且实现了5种不同的对象,每种对象都使用了至少一种前面的数据结构,优化对象在不同场合下的使用效率. 双端链表源码剖析和注释 简单动态字符串(SDS)源码剖析和注释 字典结构源码剖析和注释 跳跃表源码剖析和注释 整数集合源码剖析和注释 压缩列表源码剖析和注释 快速列表源码剖析和注释 2. 对象的系统的实现 redis 3.2版本.所有注释在

Java HashSet和HashMap源码剖析

转自: Java HashSet和HashMap源码剖析 总体介绍 之所以把HashSet和HashMap放在一起讲解,是因为二者在Java里有着相同的实现,前者仅仅是对后者做了一层包装,也就是说HashSet里面有一个HashMap(适配器模式).因此本文将重点分析HashMap. HashMap实现了Map接口,允许放入null元素,除该类未实现同步外,其余跟Hashtable大致相同,跟TreeMap不同,该容器不保证元素顺序,根据需要该容器可能会对元素重新哈希,元素的顺序也会被重新打散,

Java ArrayList源码剖析

转自: Java ArrayList源码剖析 总体介绍 ArrayList实现了List接口,是顺序容器,即元素存放的数据与放进去的顺序相同,允许放入null元素,底层通过数组实现.除该类未实现同步外,其余跟Vector大致相同.每个ArrayList都有一个容量(capacity),表示底层数组的实际大小,容器内存储元素的个数不能多于当前容量.当向容器中添加元素时,如果容量不足,容器会自动增大底层数组的大小.前面已经提过,Java泛型只是编译器提供的语法糖,所以这里的数组是一个Object数组

Netty源码剖析与实战

课程目录:01.课程介绍02.内容综述03.揭开Netty面纱04.为什么舍近求远:不直接用JDKNIO?05.为什么孤注一掷:独选Netty?06.Netty的前尘往事07.Netty的现状与趋势08.Netty怎么切换三种I-O模式?09.源码剖析:Netty对I-O模式的支持10.Netty如何支持三种Reactor?11.源码剖析:Netty对Reactor的支持12.TCP粘包-半包Netty全搞定13.源码剖析:Netty对处理粘包-半包的支持14.常用的“二次”编解码方式15.源码

(升级版)Spark从入门到精通(Scala编程、案例实战、高级特性、Spark内核源码剖析、Hadoop高端)

本课程主要讲解目前大数据领域最热门.最火爆.最有前景的技术——Spark.在本课程中,会从浅入深,基于大量案例实战,深度剖析和讲解Spark,并且会包含完全从企业真实复杂业务需求中抽取出的案例实战.课程会涵盖Scala编程详解.Spark核心编程.Spark SQL和Spark Streaming.Spark内核以及源码剖析.性能调优.企业级案例实战等部分.完全从零起步,让学员可以一站式精通Spark企业级大数据开发,提升自己的职场竞争力,实现更好的升职或者跳槽,或者从j2ee等传统软件开发工程

《Apache Spark源码剖析》

Spark Contributor,Databricks工程师连城,华为大数据平台开发部部长陈亮,网易杭州研究院副院长汪源,TalkingData首席数据科学家张夏天联袂力荐1.本书全面.系统地介绍了Spark源码,深入浅出,细致入微2.提供给读者一系列分析源码的实用技巧,并给出一个合理的阅读顺序3.始终抓住资源分配.消息传递.容错处理等基本问题,抽丝拨茧4.一步步寻找答案,所有问题迎刃而解,使读者知其然更知其所以然 内容简介 书籍计算机书籍 <Apache Spark源码剖析>以Spark

Spark源码剖析——SparkContext的初始化(四)_TaskScheduler的启动

7. TaskScheduler的启动 第五节介绍了TaskScheduler的创建,要想TaskScheduler发挥作用,必须要启动它,代码: TaskScheduler在启动的时候,实际调用了backend的start方法,即同时启动了backend.local模式下,这里的backend是localSchedulerBackend.在TaskScheduler初始化时传入localSchedulerBackend.以LocalSchedulerBackend为例,启动LocalSched

Spark修炼之道(高级篇)——Spark源码阅读:第一节 Spark应用程序提交流程

作者:摇摆少年梦 微信号: zhouzhihubeyond spark-submit 脚本应用程序提交流程 在运行Spar应用程序时,会将spark应用程序打包后使用spark-submit脚本提交到Spark中运行,执行提交命令如下: root@sparkmaster:/hadoopLearning/spark-1.5.0-bin-hadoop2.4/bin# ./spark-submit --master spark://sparkmaster:7077 --class SparkWordC