CSE 124: Networked Services


Hadoop is a complicated system, and if you're not used to its error messages, they won't make much sense. This page addresses some of the more common issues faced by beginners. Please let the course instructor and TAs know if there's another issue that stumped you and that you would like to see documented here.


  • How do I log anything in Hadoop? Where does the output go?

    The easiest solution is to use System.out.println for debugging. The output gets logged to a file called stdout under logs/userlogs/task_id, where task_id is the identifier for that particular task. The web interface for your JobTracker can help you figure out which machine to inspect the logs on, and the relevant task identifier. It is usually easiest to view the logs directly through the Hadoop webserver on the JobTracker: http://job_tracker_machine_name:50030.
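    As a pure-Java illustration (no Hadoop involved, and the class name below is hypothetical), this sketch shows the same redirection mechanism Hadoop's child task uses: anything printed with System.out.println lands wherever stdout currently points, which under Hadoop is the logs/userlogs/task_id/stdout file.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class StdoutCapture {
    // Hadoop redirects each child task's stdout into
    // logs/userlogs/task_id/stdout. This sketch demonstrates the same
    // mechanism with an in-memory stream standing in for that file.
    static String captureDebugOutput() {
        PrintStream original = System.out;
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        System.setOut(new PrintStream(captured));
        try {
            System.out.println("debug: processed record 42"); // would land in .../stdout
        } finally {
            System.setOut(original); // always restore the real stdout
        }
        return captured.toString();
    }

    public static void main(String[] args) {
        System.out.print("captured -> " + captureDebugOutput());
    }
}
```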

  • I keep getting the "df" error below. What does it mean?

                2007-11-04 13:13:33,869 WARN org.apache.hadoop.mapred.TaskTracker:
                running child
                java.io.IOException: Cannot run program "df": java.io.IOException:
                Cannot allocate memory
                        at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
                        at java.lang.Runtime.exec(Runtime.java:593)
                        at java.lang.Runtime.exec(Runtime.java:466)
                        at org.apache.hadoop.fs.DF.doDF(DF.java:60)
                        at org.apache.hadoop.fs.DF.<init>(DF.java:53)

    There are several reasons why you might be getting this error. Here are some of the things you can try:

    • First, try your code on a small input set. For small input sets, memory should NEVER be a bottleneck. Once you have established the correctness of your basic code, you can work on optimizations to get around memory bottlenecks.
    • If you see this error during the map phase, try reducing the size of your keys and values. For instance, instead of using the full path to a file, you may want to just use the file name (the final segment of the absolute path).
    • If you see this error during the reduce phase, try increasing the number of reduce tasks to 2 or 4 (use jobconf.setNumReduceTasks).
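    To make the key-shrinking suggestion concrete, here is a minimal Hadoop-free sketch (class name and example path are hypothetical) of trimming a map-output key down to just the file name. Smaller keys mean less data to buffer and sort in memory.

```java
public class KeyShrink {
    // Emit only the final segment of an absolute path instead of the
    // whole path; shorter map-output keys reduce memory pressure during
    // Hadoop's in-memory sort and spill.
    static String fileNameOnly(String absolutePath) {
        int slash = absolutePath.lastIndexOf('/');
        return slash < 0 ? absolutePath : absolutePath.substring(slash + 1);
    }

    public static void main(String[] args) {
        // Hypothetical example path:
        System.out.println(fileNameOnly("/user/hadoop/input/part-00000"));
    }
}
```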
  • Help! My MapReduce failed! What do I do?

    Visit http://JobTracker_Machine_Name:50030/jobtracker.jsp. Locate the MapReduce job which failed (under Failed Jobs). Clicking on the job name should bring you to a page with an overview and links to failure pages. From the failure pages you should be able to see which tasks failed and why. Sometimes no reason is given; this usually means the process died before it could hand the exception off to the JobTracker. Don't worry: the retried task will likely hit the same exception, and one of the attempts will eventually get it listed. The exception should include a stack trace, which you can use to see where your task failed.

    Alternatively, you can search the log files. Look at the JobTracker page for the job to see which machines failed, and check the logs on those machines. There are a lot of log files, so you'll probably want to check the timestamps to see which were updated around the time of the failure. If you started your tasks manually and didn't redirect their output, errors will appear on stdout instead of in files under the logs directory. Either way, you should find a stack trace with the exception, even if it didn't show up on the JobTracker page.
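    If you'd rather script the "check the timestamps" step, a small helper like the one below (a sketch; the class name is hypothetical and the logs path is relative to wherever you run it) walks a log directory and lists files modified recently, newest first.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class RecentLogs {
    // Recursively collect files under dir modified at or after cutoffMillis,
    // sorted newest first -- handy when hunting for the log that captured
    // a task failure.
    static List<File> modifiedSince(File dir, long cutoffMillis) {
        List<File> hits = new ArrayList<>();
        File[] children = dir.listFiles();
        if (children == null) return hits; // not a directory, or unreadable
        for (File f : children) {
            if (f.isDirectory()) hits.addAll(modifiedSince(f, cutoffMillis));
            else if (f.lastModified() >= cutoffMillis) hits.add(f);
        }
        hits.sort((a, b) -> Long.compare(b.lastModified(), a.lastModified()));
        return hits;
    }

    public static void main(String[] args) {
        long oneHourAgo = System.currentTimeMillis() - 3600_000L;
        for (File f : modifiedSince(new File("logs"), oneHourAgo)) {
            System.out.println(f.getPath());
        }
    }
}
```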

  • I've got a nondescript error! (e.g. "Job failed!" was all the description I was given)

    If you get an error and your MapReduce job is unable to continue, the process which invoked the job (bin/hadoop classname) will report something like:

    06/09/31 10:42:56 INFO mapred.JobClient:  map 100%  reduce 100%
    Exception in thread "main" java.io.IOException: Job failed!
            at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
            at edu.ucsd.cs.fa07.cse124.proj2.examples.MainClass.main(MainClass.java:355)

    This is just the JobClient's "clean" exit. It means your job failed and continued to fail after multiple retries. Check the log files to see which specific error was the culprit; you should find a stack trace there. See above.

  • I keep getting the error below. What does it mean?
    java.io.IOException: wrong value class: blah_blah_as_string is not class blah_blah
            at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:171)
            at org.apache.hadoop.mapred.MapTask$2.collect(MapTask.java:147)
            at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:39)
            at org.apache.hadoop.mapred.CombiningCollector.flush(CombiningCollector.java:79)
            at org.apache.hadoop.mapred.CombiningCollector.collect(CombiningCollector.java:71)
            at edu.ucsd.cs.fa07.cse124.proj2.examples.Spellcheck$MapSpellcheck.map(Spellcheck.java:149)
            at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:196)
            at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1075)

    This error isn't in your code, so how could you have caused it? This particular error comes from setting an incompatible class with JobConf.setOutputValueClass(class_name). Similar errors can arise with any of the other set*Class methods. Check your code to make sure that your Map and Reduce classes emit objects of the types specified by your set*Class calls. Remember, the classes for {key, value} must implement {WritableComparable, Writable}, respectively.
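    The check that trips this exception can be sketched without Hadoop: the runtime class of each value your Mapper or Reducer emits is compared against the class you declared. Below is a loose, Hadoop-free imitation of that comparison (the class and method names are made up for illustration, not Hadoop's actual internals).

```java
import java.io.IOException;

public class ValueClassCheck {
    // Loosely mimics the check SequenceFile's writer performs: the runtime
    // class of every emitted value must match the class declared via
    // JobConf.setOutputValueClass.
    static void append(Class<?> declaredValueClass, Object value) throws IOException {
        if (value.getClass() != declaredValueClass) {
            throw new IOException("wrong value class: " + value
                    + " is not class " + declaredValueClass.getName());
        }
    }

    public static void main(String[] args) throws IOException {
        append(String.class, "ok");          // matches the declared class: fine
        try {
            append(Integer.class, "oops");   // declared one class, emitted another
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```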

  • What is a .() function? I didn't define anything like this, why is it trying to use a function I didn't write or call?
    java.lang.RuntimeException: java.lang.NoSuchMethodException: com.google.hadoop.examples.Simple$MyMapper.()
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:45)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:32)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:53)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1210)
    Caused by: java.lang.NoSuchMethodException: com.google.hadoop.examples.Simple$MyMapper.()
        at java.lang.Class.getConstructor0(Class.java:2705)
        at java.lang.Class.getDeclaredConstructor(Class.java:1984)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:41)
        ... 4 more

    This is actually the <init>() function, i.e. the constructor. The web page dumps the exception as plain text without escaping it for HTML, so your browser treats <init> as a (nonexistent) tag and hides it, leaving just ".()". Hadoop creates your Mapper and Reducer instances by reflection, which requires a no-argument constructor. A non-static inner class has no such constructor (its constructors take a hidden reference to the enclosing instance), so the lookup fails. Try adding the static keyword to your class declaration, i.e.:

    public static class MyMapper extends MapReduceBase implements Mapper {...}
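    You can reproduce the underlying reflection failure with nothing but the JDK (the class names below are hypothetical): the static nested class exposes a no-argument constructor for reflective instantiation to find, while the non-static inner class does not.

```java
import java.lang.reflect.Constructor;

public class InitDemo {
    class NonStaticMapper {}       // inner class: its constructor secretly takes an InitDemo
    static class StaticMapper {}   // static nested class: has a true no-arg constructor

    public static void main(String[] args) throws Exception {
        // Hadoop instantiates your Mapper/Reducer reflectively, roughly like this:
        Constructor<StaticMapper> ok = StaticMapper.class.getDeclaredConstructor();
        System.out.println("static nested class: found " + ok.getName() + "()");

        try {
            NonStaticMapper.class.getDeclaredConstructor(); // no no-arg constructor exists
        } catch (NoSuchMethodException e) {
            // The message contains the ".<init>()" that the web page mangles into ".()":
            System.out.println("inner class: " + e.getMessage());
        }
    }
}
```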

  • bin/hadoop dfs -put <input path> <output path> or bin/hadoop jar path/to/jar.jar class_name keeps failing with something similar to the following error:
    put: java.io.IOException: Failed to create file file_name on client namenode_host because this cluster has no datanodes

    Ensure that you've run bin/start-all.sh. If in doubt, run bin/stop-all.sh and then bin/start-all.sh to make sure all the necessary daemons are running.

    If the DataNodes start but terminate themselves shortly afterwards, they have crashed. The most likely cause is that the DataNodes have gotten confused about which version of the Distributed File System is running. It is unknown why this happens, and the only fix at this time is to delete the file system and recreate it (reformatting with bin/hadoop namenode -format alone doesn't seem to fix it). You'll have to delete /mnt/hadoop/dfs, then run bin/hadoop namenode -format again before restarting everything and trying again.

Status Pages

Note that these port numbers are configurable, in case you want to run multiple JobTrackers, TaskTrackers, NameNodes, or DataNodes per machine. The port numbers given here are the defaults specified in conf/hadoop-default.xml.

  • http://job_tracker_machine_name:50030/jobdetails.jsp?jobid=job_id

    A status page for the JobTracker.

    This page gives the status of all your jobs, as well as snippets of error logs for debugging purposes. Look here if you get something like the nondescript "Job failed!" from your invoking command line.

  • http://task_tracker_machine_name:50060/

    A status page for the TaskTracker.

    This page gives details on which tasks the given node is working on, as well as access to the logs directory (this is simply the logs directory of your Hadoop installation). If you have not redirected output to files in this directory (as the bin/start-all.sh script does), then you will only find logs describing the history of submitted jobs, under logs/history.

  • http://name_node_machine_name:50070/

    A status page for the NameNode.

    This page will give you the status of nodes in the DFS cluster. You can see how much space the entire file system is using, as well as how much each machine in the cluster is using.

  • http://data_node_machine_name:50075/

    A status page for the DataNode.

    This page gives the status of the individual DataNode, rather than the cluster as a whole, along with access to that node's log files.
