返回顶部
首页 > 资讯 > 后端开发 > Python >Hadoop源码分析三启动及脚本剖析
  • 610
分享到

Hadoop源码分析三启动及脚本剖析

2024-04-02 19:04:59 610人浏览 独家记忆

Python 官方文档:入门教程 => 点击学习

摘要

目录1、 启动2、 脚本分析start-all.sh脚本内容如下:start-dfs.sh的内容如下:启动上述角色调用的hadoop-daemons.sh脚本内容如下:我们继续看ha

1、 启动

hadoop的启动是通过其sbin目录下的脚本来启动的。与启动相关的叫脚本有以下几个:

start-all.shstart-dfs.shstart-yarn.shhadoop-daemon.shyarn-daemon.sh

hadoop-daemon.sh是用来启动与hdfs相关的服务的

yarn-daemon.sh是用来启动和yarn相关的服务

start-dfs.sh是用来启动hdfs集群

start-yarn.sh是用来启动yarn集群

start-all.sh是用来启动yarn和hdfs集群的

这几个start开头的脚本都是通过调用那两个daemon脚本来启动的。

2、 脚本分析

这里先从start-all.sh开始分析,然后逐步分析其脚本的调用。

start-all.sh脚本内容如下:


#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional infORMation regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     Http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language Governing permissions and
# limitations under the License.


# Start all hadoop daemons.  Run this on master node.

echo "This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh"

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

# start hdfs daemons if hdfs is present
if [ -f "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh ]; then
  "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh --config $HADOOP_CONF_DIR
fi

# start yarn daemons if yarn is present
if [ -f "${HADOOP_YARN_HOME}"/sbin/start-yarn.sh ]; then
  "${HADOOP_YARN_HOME}"/sbin/start-yarn.sh --config $HADOOP_CONF_DIR
fi

这个脚本的重点在第31行到末尾,这里是两个if语句,第一个if语句里(第34行)调用的是start-dfs.sh脚本,第二个if语句里(第37行)调用的是start-yarn.sh。

然后我们以start-dfs.sh为例,继续向下分析。

start-dfs.sh的内容如下:


#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# Start hadoop dfs daemons.
# Optinally upgrade or rollback dfs state.
# Run this on master node.

usage="Usage: start-dfs.sh [-upgrade|-rollback] [other options such as -clusterId]"

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

# get arguments
if [[ $# -ge 1 ]]; then
  startOpt="$1"
  shift
  case "$startOpt" in
    -upgrade)
      nameStartOpt="$startOpt"
    ;;
    -rollback)
      dataStartOpt="$startOpt"
    ;;
    *)
      echo $usage
      exit 1
    ;;
  esac
fi

#Add other possible options
nameStartOpt="$nameStartOpt $@"

#---------------------------------------------------------
# namenodes

NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)

echo "Starting namenodes on [$NAMENODES]"

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  --config "$HADOOP_CONF_DIR" \
  --hostnames "$NAMENODES" \
  --script "$bin/hdfs" start namenode $nameStartOpt

#---------------------------------------------------------
# datanodes (using default slaves file)

if [ -n "$HADOOP_SECURE_DN_USER" ]; then
  echo \
    "Attempting to start secure cluster, skipping datanodes. " \
    "Run start-secure-dns.sh as root to complete startup."
else
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --script "$bin/hdfs" start datanode $dataStartOpt
fi

#---------------------------------------------------------
# secondary namenodes (if any)

SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null)

if [ -n "$SECONDARY_NAMENODES" ]; then
  echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"

  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
      --config "$HADOOP_CONF_DIR" \
      --hostnames "$SECONDARY_NAMENODES" \
      --script "$bin/hdfs" start secondarynamenode
fi

#---------------------------------------------------------
# quorumjournal nodes (if any)

SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir 2>&-)

case "$SHARED_EDITS_DIR" in
qjournal://g')
  echo "Starting journal nodes [$JOURNAL_NODES]"
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
      --config "$HADOOP_CONF_DIR" \
      --hostnames "$JOURNAL_NODES" \
      --script "$bin/hdfs" start journalnode ;;
esac

#---------------------------------------------------------
# ZK Failover controllers, if auto-HA is enabled
AUTOHA_ENABLED=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.ha.automatic-failover.enabled)
if [ "$(echo "$AUTOHA_ENABLED" | tr A-Z a-z)" = "true" ]; then
  echo "Starting ZK Failover Controllers on NN hosts [$NAMENODES]"
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --hostnames "$NAMENODES" \
    --script "$bin/hdfs" start zkfc
fi

# eof

 首先是第23行到30行,这里是在处理hadoop的路径。

然后是第32行到51行是在处理脚本传入的参数。最后就启动hdfs的各个角色了。

首先启动的是namenode,在第60行到63行,这段代码是调用了hadoop-daemons.sh脚本来启动,这个脚本和之前提到的hadoop-daemon.sh脚本的区别在于这个脚本可以在集群的其他机器上启动该启动的角色,而hadoop-daemon.sh只能启动当前机器上的角色。其实hadoop-daemons.sh也是通过调用hadoop-daemon.sh来启动的,这个稍后再分析。这个脚本还有几个参数,其中最重要的是:start namenode。它表示启动namenode。

然后是第65行到第76行,这里是在启动datanode,启动的方式和namenode相同。启动DataNode的代码在第73行。

然后是第78行到第90行,这里在启动secondarynamenode。如果是配置了namenode的高可用,secondarynamenode便不会启动。

然后是第92行到第105行,这里在启动journalnode。如果配置了namenode的高可用,journalnode才会启动。

最后是第107行到末尾,这里启动的是zkfc。同样这个也是要配置高可用才会启动。

如果按照文档(2)中配置的高可用来看,这里启动的角色应该为:namenode、datanode、journalnode、zkfc。

启动上述角色调用的hadoop-daemons.sh脚本内容如下:


#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Run a Hadoop command on all slave hosts.
usage="Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args..."
# if no args specified, show usage
if [ $# -le 1 ]; then
  echo $usage
  exit 1
fi
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"

这个脚本的重点在最后一行。这里有两个脚本slaves.sh和hadoop-daemon.sh。slaves.sh脚本会使用ssh登录到指定的服务器中然后执行其中的hadoop-daemon.sh脚本。这里就不分析slaves.sh脚本了。

我们继续看hadoop-daemon.sh脚本。

其内容如下:


#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


# Runs a Hadoop command as a daemon.
#
# Environment Variables
#
#   HADOOP_CONF_DIR  Alternate conf dir. Default is ${HADOOP_PREFIX}/conf.
#   HADOOP_LOG_DIR   Where log files are stored.  PWD by default.
#   HADOOP_MASTER    host:path where hadoop code should be rsync'd from
#   HADOOP_PID_DIR   The pid files are stored. /tmp by default.
#   HADOOP_IDENT_STRING   A string representing this instance of hadoop. $USER by default
#   HADOOP_NICENESS The scheduling priority for daemons. Defaults to 0.
##

usage="Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] [--script script] (start|stop) <hadoop-command> <args...>"

# if no args specified, show usage
if [ $# -le 1 ]; then
  echo $usage
  exit 1
fi

bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

# get arguments

#default value
hadoopScript="$HADOOP_PREFIX"/bin/hadoop
if [ "--script" = "$1" ]
  then
    shift
    hadoopScript=$1
    shift
fi
startStop=$1
shift
command=$1
shift

hadoop_rotate_log ()
{
    log=$1;
    num=5;
    if [ -n "$2" ]; then
	num=$2
    fi
    if [ -f "$log" ]; then # rotate logs
	while [ $num -gt 1 ]; do
	    prev=`expr $num - 1`
	    [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
	    num=$prev
	done
	mv "$log" "$log.$num";
    fi
}

if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
  export HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
  export HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
  export HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
  starting_secure_dn="true"
fi

#Determine if we're starting a privileged NFS, if so, redefine the appropriate variables
if [ "$command" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then
    export HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR
    export HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR
    export HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER
    starting_privileged_nfs="true"
fi

if [ "$HADOOP_IDENT_STRING" = "" ]; then
  export HADOOP_IDENT_STRING="$USER"
fi


# get log directory
if [ "$HADOOP_LOG_DIR" = "" ]; then
  export HADOOP_LOG_DIR="$HADOOP_PREFIX/logs"
fi

if [ ! -w "$HADOOP_LOG_DIR" ] ; then
  mkdir -p "$HADOOP_LOG_DIR"
  chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR
fi

if [ "$HADOOP_PID_DIR" = "" ]; then
  HADOOP_PID_DIR=/tmp
fi

# some variables
export HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log
export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-"INFO,RFA"}
export HADOOP_SECURITY_LOGGER=${HADOOP_SECURITY_LOGGER:-"INFO,RFAS"}
export HDFS_AUDIT_LOGGER=${HDFS_AUDIT_LOGGER:-"INFO,NullAppender"}
log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out
pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid
HADOOP_STOP_TIMEOUT=${HADOOP_STOP_TIMEOUT:-5}

# Set default scheduling priority
if [ "$HADOOP_NICENESS" = "" ]; then
    export HADOOP_NICENESS=0
fi

case $startStop in

  (start)

    [ -w "$HADOOP_PID_DIR" ] ||  mkdir -p "$HADOOP_PID_DIR"

    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo $command running as process `cat $pid`.  Stop it first.
        exit 1
      fi
    fi

    if [ "$HADOOP_MASTER" != "" ]; then
      echo rsync from $HADOOP_MASTER
      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_PREFIX"
    fi

    hadoop_rotate_log $log
    echo starting $command, logging to $log
    cd "$HADOOP_PREFIX"
    case $command in
      namenode|secondarynamenode|datanode|journalnode|dfs|dfsadmin|fsck|balancer|zkfc)
        if [ -z "$HADOOP_HDFS_HOME" ]; then
          hdfsScript="$HADOOP_PREFIX"/bin/hdfs
        else
          hdfsScript="$HADOOP_HDFS_HOME"/bin/hdfs
        fi
        nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
      ;;
      (*)
        nohup nice -n $HADOOP_NICENESS $hadoopScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
      ;;
    esac
    echo $! > $pid
    sleep 1
    head "$log"
    # capture the ulimit output
    if [ "true" = "$starting_secure_dn" ]; then
      echo "ulimit -a for secure datanode user $HADOOP_SECURE_DN_USER" >> $log
      # capture the ulimit info for the appropriate user
      su --shell=/bin/bash $HADOOP_SECURE_DN_USER -c 'ulimit -a' >> $log 2>&1
    elif [ "true" = "$starting_privileged_nfs" ]; then
        echo "ulimit -a for privileged nfs user $HADOOP_PRIVILEGED_NFS_USER" >> $log
        su --shell=/bin/bash $HADOOP_PRIVILEGED_NFS_USER -c 'ulimit -a' >> $log 2>&1
    else
      echo "ulimit -a for user $USER" >> $log
      ulimit -a >> $log 2>&1
    fi
    sleep 3;
    if ! ps -p $! > /dev/null ; then
      exit 1
    fi
    ;;

  (stop)

    if [ -f $pid ]; then
      TARGET_PID=`cat $pid`
      if kill -0 $TARGET_PID > /dev/null 2>&1; then
        echo stopping $command
        kill $TARGET_PID
        sleep $HADOOP_STOP_TIMEOUT
        if kill -0 $TARGET_PID > /dev/null 2>&1; then
          echo "$command did not stop gracefully after $HADOOP_STOP_TIMEOUT seconds: killing with kill -9"
          kill -9 $TARGET_PID
        fi
      else
        echo no $command to stop
      fi
      rm -f $pid
    else
      echo no $command to stop
    fi
    ;;

  (*)
    echo $usage
    exit 1
    ;;

esac

这段代码的重点在第131行到结束。这里是真正在启动服务的代码,这个文件在调用的时候,会传入两个重要的参数start/stop xxx。用于启动或停止某些服务。以启动服务为例,其重点在第153行,这里会执行一个hdfsScript脚本。这个参数的定义在第155行,

这里可以看见它实际是hadoop的bin目录下的hdfs文件

文件的内容如下:


#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Environment Variables
#
#   jsVC_HOME  home directory of jsvc binary.  Required for starting secure
#              datanode.
#
#   JSVC_OUTFILE  path to jsvc output file.  Defaults to
#                 $HADOOP_LOG_DIR/jsvc.out.
#
#   JSVC_ERRFILE  path to jsvc error file.  Defaults to $HADOOP_LOG_DIR/jsvc.err.

bin=`which $0`
bin=`dirname ${bin}`
bin=`cd "$bin" > /dev/null; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

function print_usage(){
  echo "Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND"
  echo "       where COMMAND is one of:"
  echo "  dfs                  run a filesystem command on the file systems supported in Hadoop."
  echo "  classpath            prints the classpath"
  echo "  namenode -format     format the DFS filesystem"
  echo "  secondarynamenode    run the DFS secondary namenode"
  echo "  namenode             run the DFS namenode"
  echo "  journalnode          run the DFS journalnode"
  echo "  zkfc                 run the ZK Failover Controller daemon"
  echo "  datanode             run a DFS datanode"
  echo "  dfsadmin             run a DFS admin client"
  echo "  haadmin              run a DFS HA admin client"
  echo "  fsck                 run a DFS filesystem checking utility"
  echo "  balancer             run a cluster balancing utility"
  echo "  jmxget               get JMX exported values from NameNode or DataNode."
  echo "  mover                run a utility to move block replicas across"
  echo "                       storage types"
  echo "  oiv                  apply the offline fsimage viewer to an fsimage"
  echo "  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage"
  echo "  oev                  apply the offline edits viewer to an edits file"
  echo "  fetchdt              fetch a delegation token from the NameNode"
  echo "  getconf              get config values from configuration"
  echo "  groups               get the groups which users belong to"
  echo "  snapshotDiff         diff two snapshots of a directory or diff the"
  echo "                       current directory contents with a snapshot"
  echo "  lsSnapshottableDir   list all snapshottable dirs owned by the current user"
  echo "						Use -help to see options"
  echo "  portmap              run a portmap service"
  echo "  nfs3                 run an NFS version 3 gateway"
  echo "  cacheadmin           configure the HDFS cache"
  echo "  crypto               configure HDFS encryption zones"
  echo "  storagepolicies      list/get/set block storage policies"
  echo "  version              print the version"
  echo ""
  echo "Most commands print help when invoked w/o parameters."
  # There are also debug commands, but they don't show up in this listing.
}

if [ $# = 0 ]; then
  print_usage
  exit
fi

COMMAND=$1
shift

case $COMMAND in
  # usage flags
  --help|-help|-h)
    print_usage
    exit
    ;;
esac

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$COMMAND" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
  if [ -n "$JSVC_HOME" ]; then
    if [ -n "$HADOOP_SECURE_DN_PID_DIR" ]; then
      HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
    fi

    if [ -n "$HADOOP_SECURE_DN_LOG_DIR" ]; then
      HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
      HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
    fi

    HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
    HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
    starting_secure_dn="true"
  else
    echo "It looks like you're trying to start a secure DN, but \$JSVC_HOME"\
      "isn't set. Falling back to starting insecure DN."
  fi
fi

# Determine if we're starting a privileged NFS daemon, and if so, redefine appropriate variables
if [ "$COMMAND" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then
  if [ -n "$JSVC_HOME" ]; then
    if [ -n "$HADOOP_PRIVILEGED_NFS_PID_DIR" ]; then
      HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR
    fi

    if [ -n "$HADOOP_PRIVILEGED_NFS_LOG_DIR" ]; then
      HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR
      HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
    fi

    HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER
    HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
    starting_privileged_nfs="true"
  else
    echo "It looks like you're trying to start a privileged NFS server, but"\
      "\$JSVC_HOME isn't set. Falling back to starting unprivileged NFS server."
  fi
fi

if [ "$COMMAND" = "namenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"
elif [ "$COMMAND" = "zkfc" ] ; then
  CLASS='org.apache.hadoop.hdfs.tools.DFSZKFailoverController'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_ZKFC_OPTS"
elif [ "$COMMAND" = "secondarynamenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_SECONDARYNAMENODE_OPTS"
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  if [ "$starting_secure_dn" = "true" ]; then
    HADOOP_OPTS="$HADOOP_OPTS -JVM server $HADOOP_DATANODE_OPTS"
  else
    HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
  fi
elif [ "$COMMAND" = "journalnode" ] ; then
  CLASS='org.apache.hadoop.hdfs.qjournal.server.JournalNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOURNALNODE_OPTS"
elif [ "$COMMAND" = "dfs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfsadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "haadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSHAAdmin
  CLASSPATH=${CLASSPATH}:${TOOL_PATH}
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "fsck" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSck
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "balancer" ] ; then
  CLASS=org.apache.hadoop.hdfs.server.balancer.Balancer
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_BALANCER_OPTS"
elif [ "$COMMAND" = "mover" ] ; then
  CLASS=org.apache.hadoop.hdfs.server.mover.Mover
  HADOOP_OPTS="${HADOOP_OPTS} ${HADOOP_MOVER_OPTS}"
elif [ "$COMMAND" = "storagepolicies" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.StoragePolicyAdmin
elif [ "$COMMAND" = "jmxget" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.JMXGet
elif [ "$COMMAND" = "oiv" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB
elif [ "$COMMAND" = "oiv_legacy" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer
elif [ "$COMMAND" = "oev" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer
elif [ "$COMMAND" = "fetchdt" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DelegationTokenFetcher
elif [ "$COMMAND" = "getconf" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.GetConf
elif [ "$COMMAND" = "groups" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.GetGroups
elif [ "$COMMAND" = "snapshotDiff" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.snapshot.SnapshotDiff
elif [ "$COMMAND" = "lsSnapshottableDir" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.snapshot.LsSnapshottableDir
elif [ "$COMMAND" = "portmap" ] ; then
  CLASS=org.apache.hadoop.portmap.Portmap
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_PORTMAP_OPTS"
elif [ "$COMMAND" = "nfs3" ] ; then
  CLASS=org.apache.hadoop.hdfs.nfs.nfs3.Nfs3
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NFS3_OPTS"
elif [ "$COMMAND" = "cacheadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.CacheAdmin
elif [ "$COMMAND" = "crypto" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.CryptoAdmin
elif [ "$COMMAND" = "version" ] ; then
  CLASS=org.apache.hadoop.util.VersionInfo
elif [ "$COMMAND" = "debug" ]; then
  CLASS=org.apache.hadoop.hdfs.tools.DebugAdmin
elif [ "$COMMAND" = "classpath" ]; then
  if [ "$#" -gt 0 ]; then
    CLASS=org.apache.hadoop.util.Classpath
  else
    # No need to bother starting up a JVM for this simple case.
    if $cygwin; then
      CLASSPATH=$(cygpath -p -w "$CLASSPATH" 2>/dev/null)
    fi
    echo $CLASSPATH
    exit 0
  fi
else
  CLASS="$COMMAND"
fi

# cygwin path translation
if $cygwin; then
  CLASSPATH=$(cygpath -p -w "$CLASSPATH" 2>/dev/null)
  HADOOP_LOG_DIR=$(cygpath -w "$HADOOP_LOG_DIR" 2>/dev/null)
  HADOOP_PREFIX=$(cygpath -w "$HADOOP_PREFIX" 2>/dev/null)
  HADOOP_CONF_DIR=$(cygpath -w "$HADOOP_CONF_DIR" 2>/dev/null)
  HADOOP_COMMON_HOME=$(cygpath -w "$HADOOP_COMMON_HOME" 2>/dev/null)
  HADOOP_HDFS_HOME=$(cygpath -w "$HADOOP_HDFS_HOME" 2>/dev/null)
  HADOOP_YARN_HOME=$(cygpath -w "$HADOOP_YARN_HOME" 2>/dev/null)
  HADOOP_MAPRED_HOME=$(cygpath -w "$HADOOP_MAPRED_HOME" 2>/dev/null)
fi

export CLASSPATH=$CLASSPATH

HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"

# Check to see if we should start a secure datanode
if [ "$starting_secure_dn" = "true" ]; then
  if [ "$HADOOP_PID_DIR" = "" ]; then
    HADOOP_SECURE_DN_PID="/tmp/hadoop_secure_dn.pid"
  else
    HADOOP_SECURE_DN_PID="$HADOOP_PID_DIR/hadoop_secure_dn.pid"
  fi

  JSVC=$JSVC_HOME/jsvc
  if [ ! -f $JSVC ]; then
    echo "JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run secure datanodes. "
    echo "Please download and install jsvc from http://arcHive.apache.org/dist/commons/daemon/binaries/ "\
      "and set JSVC_HOME to the directory containing the jsvc binary."
    exit
  fi

  if [[ ! $JSVC_OUTFILE ]]; then
    JSVC_OUTFILE="$HADOOP_LOG_DIR/jsvc.out"
  fi

  if [[ ! $JSVC_ERRFILE ]]; then
    JSVC_ERRFILE="$HADOOP_LOG_DIR/jsvc.err"
  fi

  exec "$JSVC" \
           -Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \
           -errfile "$JSVC_ERRFILE" \
           -pidfile "$HADOOP_SECURE_DN_PID" \
           -nodetach \
           -user "$HADOOP_SECURE_DN_USER" \
            -cp "$CLASSPATH" \
           $JAVA_HEAP_MAX $HADOOP_OPTS \
           org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"
elif [ "$starting_privileged_nfs" = "true" ] ; then
  if [ "$HADOOP_PID_DIR" = "" ]; then
    HADOOP_PRIVILEGED_NFS_PID="/tmp/hadoop_privileged_nfs3.pid"
  else
    HADOOP_PRIVILEGED_NFS_PID="$HADOOP_PID_DIR/hadoop_privileged_nfs3.pid"
  fi

  JSVC=$JSVC_HOME/jsvc
  if [ ! -f $JSVC ]; then
    echo "JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run privileged NFS gateways. "
    echo "Please download and install jsvc from http://archive.apache.org/dist/commons/daemon/binaries/ "\
      "and set JSVC_HOME to the directory containing the jsvc binary."
    exit
  fi

  if [[ ! $JSVC_OUTFILE ]]; then
    JSVC_OUTFILE="$HADOOP_LOG_DIR/nfs3_jsvc.out"
  fi

  if [[ ! $JSVC_ERRFILE ]]; then
    JSVC_ERRFILE="$HADOOP_LOG_DIR/nfs3_jsvc.err"
  fi

  exec "$JSVC" \
           -Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \
           -errfile "$JSVC_ERRFILE" \
           -pidfile "$HADOOP_PRIVILEGED_NFS_PID" \
           -nodetach \
           -user "$HADOOP_PRIVILEGED_NFS_USER" \
           -cp "$CLASSPATH" \
           $JAVA_HEAP_MAX $HADOOP_OPTS \
           org.apache.hadoop.hdfs.nfs.nfs3.PrivilegedNfsGatewayStarter "$@"
else
  # run it
  exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
fi

这段代码的重点在第134行到219行和第304行。首先看第134行到219行,这段代码虽然很长但全是一堆if else语句。其逻辑也很简单,就是根据传入的COMMAND的值来为CLASS和HADOOP_OPTS赋值。然后是第304行,执行CLASS类。以第134行的namenode类为例,这里CLASS的值为org.apache.hadoop.hdfs.server.namenode.NameNode,这是一个java类随后便会执行这个类启动namenode。

如果想要debug hdfs的源码,最好在这里设置远程调试。因为这里有单个的服务类与启动参数,可以准确的定位需要的服务。

以上就是Hadoop源码分析之启动及脚本剖析的详细内容,本系列下一篇文章传送门Hadoop源码分析四远程debug调试更多关于Hadoop源码分析分析的资料请继续关注编程网其它相关文章!

--结束END--

本文标题: Hadoop源码分析三启动及脚本剖析

本文链接: https://lsjlt.com/news/134461.html(转载时请注明来源链接)

有问题或投稿请发送至: 邮箱/279061341@qq.com    QQ/279061341

猜你喜欢
  • Hadoop源码分析三启动及脚本剖析
    目录1、 启动2、 脚本分析start-all.sh脚本内容如下:start-dfs.sh的内容如下:启动上述角色调用的hadoop-daemons.sh脚本内容如下:我们继续看ha...
    99+
    2024-04-02
  • Hadoop源码分析五hdfs架构原理剖析
    目录1、 hdfs架构如果在hadoop配置时写的配置文件不同,启动的服务也有所区别namenode的下方是三台datanode。namenode左右两边的是两个zkfc。namen...
    99+
    2024-04-02
  • Hadoop源码分析六启动文件namenode原理详解
    1、 namenode启动 在本系列文章三中分析了hadoop的启动文件,其中提到了namenode启动的时候调用的类为 org.apache.hadoop.hdfs.server...
    99+
    2024-04-02
  • Spring启动过程源码分析及简介
    目录1、BeanDefinition2、beanFactory3、BeanDefinitionReader4、ClassPathBeanDefinitionScanner5、Cond...
    99+
    2024-04-02
  • Linux启动脚本的示例分析
    这篇文章主要介绍Linux启动脚本的示例分析,文中介绍的非常详细,具有一定的参考价值,感兴趣的小伙伴们一定要看完!redhat的启动方式和执行次序是:加载内核执行init程序/etc/rc.d/rc.sysinit # 由init执行的**...
    99+
    2023-06-17
  • ContentProvider启动流程源码分析
    本文小编为大家详细介绍“ContentProvider启动流程源码分析”,内容详细,步骤清晰,细节处理妥当,希望这篇“ContentProvider启动流程源码分析”文章能帮助大家解决疑惑,下面跟着小编的思路慢慢深入,一起来学习新知识吧。C...
    99+
    2023-07-05
  • python源码剖析之PyObject的示例分析
    这篇文章主要介绍python源码剖析之PyObject的示例分析,文中介绍的非常详细,具有一定的参考价值,感兴趣的小伙伴们一定要看完!一、Python中的对象Python中一切皆是对象。————Guido van Rossum(1989)这...
    99+
    2023-06-15
  • SpringBoot启动流程SpringApplication源码分析
    这篇“SpringBoot启动流程SpringApplication源码分析”文章的知识点大部分人都不太理解,所以小编给大家总结了以下内容,内容详细,步骤清晰,具有一定的借鉴价值,希望大家阅读完这篇文章能有所收获,下面我们一起来看看这篇“S...
    99+
    2023-07-05
  • Netty源码分析NioEventLoop线程的启动
    目录NioEventLoop开启方法跟进 inEventLoop()方法跟一下addTask(task)回顾一下初始构造方法我们跟进doStartThread()方法中回顾...
    99+
    2024-04-02
  • Spring源码分析容器启动流程
    目录前言源码解析1、初始化流程流程分析核心代码剖析2、刷新流程流程分析核心代码剖析前言 本文基于 Spring 的 5.1.6.RELEASE 版本 Spring的启动流程可以归纳为...
    99+
    2024-04-02
  • Android okhttp的启动流程及源码解析
    目录前言 什么是OKhttp OkHttp是如何做网络请求的 1.它是如何使用的? 1.1 通过构造者模式添加 url,method,header,body 等完成一个请求的信息 R...
    99+
    2024-04-02
  • Hadoop源码分析四远程debug调试
    1、 hadoop远程debug 从文档(3)中可以知道hadoop启动服务的时候最终都是通过java命令来启动的,其本质是一个java程序。在研究源码的时候debug是一种很重要的...
    99+
    2024-04-02
  • 【SpringBoot3.0源码】启动流程源码解析 • 上
    文章目录 初始化 SpringBoot启动类: @SpringBootApplicationpublic class AppRun { public static void main(String[] args...
    99+
    2023-08-23
    spring boot java spring
  • 详解SpringBoot启动代码和自动装配源码分析
    目录一、SpringBoot启动代码主线分析二、SpringBoot自动装配原理分析1.自动装配的前置知识@Import2.@SpringApplication注解分析2.1@Spr...
    99+
    2024-04-02
  • Linux下启动伪分布式HADOOP与MySQL命令及脚本是什么
    这篇文章主要介绍“Linux下启动伪分布式HADOOP与MySQL命令及脚本是什么”,在日常操作中,相信很多人在Linux下启动伪分布式HADOOP与MySQL命令及脚本是什么问题上存在疑惑,小编查阅了各式资料,整理出简单好用的操作方法,希...
    99+
    2023-06-03
  • Vite的createServer启动源码解析
    目录启动Vite的createServer通过vite3安装一个vue的工程添加断点并开启调试边调试边理解代码启动Vite的createServer 为了能够了解vite里面运行了什...
    99+
    2024-04-02
  • Hadoop源码分析一架构关系简介
    1、 简介 Hadoop是一个由Apache基金会所开发的分布式系统基础架构 Hadoop起源于谷歌发布的三篇论文:GFS、MapReduce、BigTable。其中GFS是谷歌的分...
    99+
    2024-04-02
  • 怎么用Hadoop源码分析心跳机制
    这篇文章将为大家详细讲解有关怎么用Hadoop源码分析心跳机制,文章内容质量较高,因此小编分享给大家做个参考,希望大家阅读完这篇文章后对相关知识有一定的了解。一.心跳机制 hadoop集群是master/slave模式,master包括Na...
    99+
    2023-06-17
  • [图解]Android源码分析——Activity的启动过程
    Activity的启动过程一.Launcher进程请求AMSLauncher.java的startActivitySafely方法的执行过程:A...
    99+
    2022-06-06
    启动 activity Android
  • RocketMQ之Consumer整体介绍启动源码分析
    目录前言Consumer整体介绍Consumer实现类Consumer消费类型DefaultMQPushConsumer主要APIDefaultMQPushConsumer关键属性C...
    99+
    2023-05-19
    RocketMQ Consumer源码解析 RocketMQ Consumer
软考高级职称资格查询
编程网,编程工程师的家园,是目前国内优秀的开源技术社区之一,形成了由开源软件库、代码分享、资讯、协作翻译、讨论区和博客等几大频道内容,为IT开发者提供了一个发现、使用、并交流开源技术的平台。
  • 官方手机版

  • 微信公众号

  • 商务合作