CSE 124: Networked Services

Files


hadoop/
       bin/
           hadoop
           start-all.sh
           stop-all.sh
       conf/
            mapred-default.xml
            hadoop-default.xml
            hadoop-site.xml
            slaves

Purpose

  • hadoop/bin/hadoop

    Usage:

    hadoop [--config confdir] command
    Where command is one of:
    namenode -format         Format the NameNode
    namenode                 Run a NameNode
    datanode                 Run a DataNode
    dfs                      Run a DFS admin client
    fsck                     Run a DFS filesystem checking utility
    jobtracker               Run a MapReduce JobTracker node
    tasktracker              Run a MapReduce TaskTracker
    job                      Manipulate MapReduce jobs
      -status jobId            Display the status of MapReduce job jobId, where jobId is of the form job_####
      -kill jobId              Kill the MapReduce job jobId, where jobId is of the form job_####
    jar jar [MainClass] [args...]
                             Run a jar file. If MainClass is specified, run it; otherwise run the class specified by the manifest, with the arguments args
    distcp srcurl desturl    Copy files or directories recursively
    MainClass [args...]      Run the class named MainClass with the arguments args
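
    For example, the following session submits a job from a jar and then manages it (the jar name, class name, and job id are hypothetical):

        $ bin/hadoop jar wordcount.jar WordCount input output    # submit a job
        $ bin/hadoop job -status job_0001                        # check on its progress
        $ bin/hadoop job -kill job_0001                          # kill it if it misbehaves
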
  • hadoop/bin/start-all.sh

    Starts the NameNode and JobTracker on the local machine, and a DataNode and TaskTracker on each of the machines listed in conf/slaves.

  • hadoop/bin/stop-all.sh

    Stops the NameNode and JobTracker running on the local machine, and the DataNodes and TaskTrackers running on the machines listed in conf/slaves.
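
    A typical cluster session brackets job runs with these two scripts:

        $ bin/start-all.sh      # bring up the NameNode, JobTracker, DataNodes, and TaskTrackers
        $ bin/hadoop jar ...    # run one or more jobs
        $ bin/stop-all.sh       # shut the daemons back down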

  • hadoop/conf/mapred-default.xml

    Contains default settings for MapReduce jobs. There should be no need to edit this file.

  • hadoop/conf/hadoop-default.xml

    Contains default settings for Hadoop itself. There should be no need to edit this file.

  • hadoop/conf/hadoop-site.xml

    This configuration file is processed after conf/hadoop-default.xml. Any property element here overrides the property element of the same name in conf/hadoop-default.xml. Property settings should therefore be made in this file rather than in conf/hadoop-default.xml.

    The structure is the same as that of conf/hadoop-default.xml: an XML document whose top-level configuration element contains many property elements. Each property contains a name, a value, and an optional description, in that order. XML comments may also be included (<!-- this is a comment -->).

    If running multiple JobTrackers on one machine, make sure to override the listening ports (such as the webserver and TaskTracker listener ports), since two processes cannot listen on the same port.
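
    A minimal conf/hadoop-site.xml following this structure might look like the following (the host names and port numbers are illustrative; fs.default.name and mapred.job.tracker are standard Hadoop property names):

        <?xml version="1.0"?>
        <configuration>
          <!-- host and port of the DFS NameNode -->
          <property>
            <name>fs.default.name</name>
            <value>master.example.com:9000</value>
            <description>The name of the default file system.</description>
          </property>
          <!-- host and port of the MapReduce JobTracker -->
          <property>
            <name>mapred.job.tracker</name>
            <value>master.example.com:9001</value>
            <description>The host and port that the JobTracker runs at.</description>
          </property>
        </configuration>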

  • hadoop/conf/slaves

    A newline-delimited list of the names of other machines on which to start DataNodes and TaskTrackers. Used by the bin/start-all.sh and bin/stop-all.sh scripts.
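
    For example, a conf/slaves file for a three-machine cluster might look like the following (hostnames hypothetical):

        worker1.example.com
        worker2.example.com
        worker3.example.com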


Log Files

Log files are of the following form:
        logs/hadoop-$USER-task-$HOST.(out|log)
where task is one of the following:
  • namenode
  • datanode
  • jobtracker
  • tasktracker
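
For example, the JobTracker log for user alice on host node01 (both hypothetical) would be:

        logs/hadoop-alice-jobtracker-node01.log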

Status Pages

Note that these port numbers are configurable, in case you want to run multiple JobTrackers, TaskTrackers, NameNodes, or DataNodes per machine. The port numbers given below are the defaults specified in conf/hadoop-default.xml.

  • http://job_tracker_machine_name:50030/jobdetails.jsp?jobid=job_id

    A status page for the JobTracker.

    This page will give you the status of all your jobs as well as snippets of error logs for debugging purposes. Turn here if your command-line invocation reports something as nondescript as "JobFailed".
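
    For example, with a JobTracker running on node01 (hypothetical) and a job id of job_0001:

        http://node01:50030/jobdetails.jsp?jobid=job_0001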

  • http://task_tracker_machine_name:50060/

    A status page for the TaskTracker.

    This page will give details on which tasks the given node is working on, as well as access to the logs directory (which is simply the directory logs). If you have not redirected output to files in this directory (as the bin/start-all.sh script does), it will contain only the history of submitted jobs, under logs/history.

  • http://name_node_machine_name:50070/

    A status page for the NameNode.

    This page will give you the status of nodes in the DFS cluster. You can see how much space the entire file system is using, as well as how much each machine in the cluster is using.

  • http://data_node_machine_name:50075/

    A status page for the DataNode.

    This page will give you the status of the individual DataNode, including how much of its local space the DFS is using.

