Major bug reported by Aditya Acharya and fixed by Aditya Acharya (client)<br>
<b>Introduce timeout for async polling operations in YarnClientImpl</b><br>
<blockquote>I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Waiting for application application_1389036507624_0018 to be killed."
The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated.
I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.</blockquote></li>
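For illustration, a minimal sketch of the kind of bound the patch introduces; the retry count and poll interval ({{maxKillRetries}}, {{killPollIntervalMs}}) and local names are assumptions for this example, not the exact patch:
{code}
// Hedged sketch: poll the RM a bounded number of times instead of forever.
KillApplicationRequest request = Records.newRecord(KillApplicationRequest.class);
request.setApplicationId(applicationId);
for (int attempt = 0; attempt < maxKillRetries; attempt++) {
  KillApplicationResponse response = rmClient.forceKillApplication(request);
  if (response.getIsKillCompleted()) {
    return; // the RM confirmed the application was killed
  }
  LOG.info("Waiting for application " + applicationId + " to be killed.");
  Thread.sleep(killPollIntervalMs); // assumed wait between retries
}
throw new YarnException("Timed out waiting for application "
    + applicationId + " to be killed.");
{code}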
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer</b><br>
<blockquote>This can occur when the second-to-last app in a queue's pending app list is made runnable. The app is pulled out from under the iterator. </blockquote></li>
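For illustration, the general hazard and the usual fix, removing through the iterator itself (a hedged sketch; {{canBeMadeRunnable}} and {{makeRunnable}} are hypothetical helpers, not the enforcer's actual methods):
{code}
// Removing an element behind an active iterator's back (e.g. via
// pendingApps.remove(app) from another code path) invalidates the iterator
// and can surface as IndexOutOfBoundsException or
// ConcurrentModificationException. Removal through the iterator is safe.
Iterator<FSSchedulerApp> it = pendingApps.iterator();
while (it.hasNext()) {
  FSSchedulerApp app = it.next();
  if (canBeMadeRunnable(app)) {
    it.remove();        // safe: the iterator stays consistent
    makeRunnable(app);
  }
}
{code}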
Major bug reported by Aditya Acharya and fixed by Aditya Acharya (scheduler)<br>
<b>QueuePlacementPolicy format is not easily readable via a JAXB parser</b><br>
<blockquote>The current format for specifying queue placement rules in the fair scheduler allocations file does not lend itself to easy parsing via a JAXB parser. In particular, relying on the tag name to encode information about which rule to use makes it very difficult for an xsd-based JAXB parser to preserve the order of the rules, which is essential.</blockquote></li>
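For illustration, encoding the rule in an attribute of a uniform element keeps rule order trivial for an xsd-based parser; a hedged sketch of such a format (see the JIRA for the format actually adopted):
{code:xml}
<queuePlacementPolicy>
  <rule name="specified"/>
  <rule name="user"/>
  <rule name="default"/>
</queuePlacementPolicy>
{code}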
Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Fix invalid RMApp transition from NEW to FINAL_SAVING</b><br>
<blockquote>YARN-891 augments the RMStateStore to store information on completed applications. In the process, it adds transitions from NEW to FINAL_SAVING. This leads to the RM trying to update entries in the state-store that do not exist. On ZKRMStateStore, this leads to the RM crashing.
Previous description:
ZKRMStateStore fails to handle updates to znodes that don't exist. For instance, this can happen when an app transitions from NEW to FINAL_SAVING. In these cases, the store should create the missing znode and handle the update.</blockquote></li>
Trivial improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>RMFatalEventDispatcher should log the cause of the event</b><br>
<blockquote>RMFatalEventDispatcher#handle() logs the receipt of an event and its type, but leaves out the cause. The cause captures why the event was raised and would help debugging issues. </blockquote></li>
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur <br>
<b>3rd party JARs are missing from hadoop-dist output</b><br>
<blockquote>With the build changes of YARN-888 we are leaving out all 3rd party JARs used directly by YARN under /share/hadoop/yarn/lib/.
We did not notice this when running the minicluster because they all happen to be on the classpath from hadoop-common and hadoop-yarn.
As 3rd party JARs are not 'public' interfaces, we cannot rely on them being provided to yarn by common and hdfs (i.e. if common and hdfs stop using a 3rd party dependency that yarn uses, this would break yarn if yarn does not pull that dependency explicitly).
Also, this will break the Bigtop hadoop build when they move to branch-2, as they expect to find jars in /share/hadoop/yarn/lib/</blockquote></li>
Blocker bug reported by Jason Lowe and fixed by Haohui Mai (resourcemanager)<br>
<b>RM does not startup when security is enabled without spnego configured</b><br>
<blockquote>We have a custom auth filter in front of our various UI pages that handles user authentication. However, the RM currently assumes that if security is enabled then the user must have configured spnego for the RM web pages as well, which is not true in our case.</blockquote></li>
Critical sub-task reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>Public localizer crashes with "Localized unkown resource"</b><br>
<blockquote>The public localizer can crash with the error:
{noformat}
2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
{noformat}</blockquote></li>
Blocker sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>RMDispatcher should be reset on transition to standby</b><br>
<blockquote>Currently, rmDispatcher has been moved out of ActiveService, but event dispatchers such as schedulerDispatcher and RMAppEventDispatcher are still registered when the ActiveService is initialized.
Since the ActiveService is re-initialized almost every time the RM transitions between Active and Standby, the same event dispatchers get registered again, causing each event to be handled several times.</blockquote></li>
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>ZK store should use a private password for root-node-acls</b><br>
<blockquote>Currently, when HA is enabled, ZK store uses cluster-timestamp as the password for root node ACLs to give the Active RM exclusive access to the store. A more private value like a random number might be better. </blockquote></li>
Trivial task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Rename clusterid to clusterId in ActiveRMInfoProto </b><br>
<blockquote>YARN-1029 introduces ActiveRMInfoProto - just realized it defines a field clusterid, which is inconsistent with other fields. Better to fix it immediately than leave the inconsistency. </blockquote></li>
Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE</b><br>
<blockquote>RMProxy#INSTANCE is a non-final static field and both ServerRMProxy and ClientRMProxy set it. This leads to races, as witnessed in YARN-1482.
Sample trace:
{noformat}
java.lang.IllegalArgumentException: RM does not support this client protocol
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at org.apache.hadoop.yarn.client.ClientRMProxy.checkAllowedProtocols(ClientRMProxy.java:119)
at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.init(ConfiguredRMFailoverProxyProvider.java:58)
at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:158)
at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:88)
at org.apache.hadoop.yarn.server.api.ServerRMProxy.createRMProxy(ServerRMProxy.java:56)
{noformat}</blockquote></li>
Blocker bug reported by Xuan Gong and fixed by Xuan Gong <br>
<b>WebAppProxyServer should not set localhost as YarnConfiguration.PROXY_ADDRESS by itself</b><br>
<blockquote>WebAppProxyServer::startServer() sets YarnConfiguration.PROXY_ADDRESS to localhost:9099 by itself. So, no matter what value we set for YarnConfiguration.PROXY_ADDRESS in the configuration, the proxy server will bind to localhost:9099.</blockquote></li>
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Enabling HA should verify that the RM service address configurations have been set for every RM id defined in RM_HA_IDs</b><br>
<blockquote>After YARN-1325, YarnConfiguration.RM_HA_IDS will contain multiple RM ids. We need to verify that the RM service address configurations have been set for all of them.</blockquote></li>
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong <br>
<b>WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM</b><br>
<blockquote>This way, even if an RM goes to standby mode, we can effect a redirect to the active RM. And more importantly, users will not suddenly see all their links stop working.</blockquote></li>
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Move internal services logic from AdminService to ResourceManager</b><br>
<blockquote>This is something I found while reviewing YARN-1318, but didn't halt that patch as many review cycles had already gone into it. Some top-level issues:
- Not easy to follow RM's service life cycle
-- RM adds only AdminService as its service directly.
-- Other services are added to RM when AdminService's init calls RM.activeServices.init()
- Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.</blockquote></li>
Minor bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>TestResourceManager relies on the scheduler assigning multiple containers in a single node update</b><br>
<blockquote>TestResourceManager relies on the Capacity Scheduler.
Specifically, it relies on a scheduler that assigns multiple containers in a single heartbeat, which not all schedulers do by default. It also relies on schedulers that don't consider CPU capacities. It would be simple to change the test to use multiple heartbeats and to increase the vcore capacities of the nodes in the test.</blockquote></li>
Major sub-task reported by Wangda Tan and fixed by Wangda Tan (api)<br>
<b>Common PB type definitions for container resizing</b><br>
<blockquote>As described in YARN-1197, we need to add some common PB types for container resource change, such as ResourceChangeContext. These types will be used by both the RM and NM protocols.</blockquote></li>
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Change killing application to wait until state store is done</b><br>
<blockquote>When user kills an application, it should wait until the state store is done with saving the killed status of the application. Otherwise, if RM crashes in the middle between user killing the application and writing the status to the store, RM will relaunch this application after it restarts.</blockquote></li>
Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (scheduler)<br>
<b>TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 </b><br>
<blockquote>QueueMetrics holds its data in a static variable, causing metrics to bleed over from test to test. clearQueueMetrics is to be called for tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, and in this case that makes the metrics unreliable.</blockquote></li>
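A hedged sketch of the kind of test setup this implies, resetting the static state before each test (the surrounding test class is assumed):
{code}
@Before
public void setUp() {
  // QueueMetrics keeps static state; clear anything left over from tests
  // that ran earlier in the same JVM so this test measures only its own
  // metrics.
  QueueMetrics.clearQueueMetrics();
  DefaultMetricsSystem.shutdown();
}
{code}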
<blockquote>When HA is turned on, {{YarnConfiguration#getSocketAddr()}} fetches rpc-addresses corresponding to the specified rm-id. This should only apply to RM rpc-addresses; other confs, like NM rpc-addresses, shouldn't be affected by this.
Currently, the NM address settings in yarn-site.xml aren't reflected in the actual ports.</blockquote></li>
Major bug reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA <br>
<b>NonAggregatingLogHandler can throw RejectedExecutionException</b><br>
<blockquote>This problem is caused by handling APPLICATION_FINISHED events after calling sched.shutdown() in NonAggregatingLogHandler#serviceStop(). org.apache.hadoop.mapred.TestJobCleanup can fail because of a RejectedExecutionException from NonAggregatingLogHandler.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>RM Web UI and REST APIs should uniformly use YarnApplicationState</b><br>
<blockquote>RMAppState isn't a public facing enum like YarnApplicationState, so we shouldn't return values or list filters that come from it. However, some Blocks and AppInfo are still using RMAppState.
It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change. The change would only reduce the set of possible strings that the API returns, so I think not. We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1).</blockquote></li>
Major sub-task reported by Yesha Vora and fixed by Jian He <br>
<b>RM hangs on shutdown if calling system.exit in serviceInit or serviceStart</b><br>
<blockquote>Enable yarn.resourcemanager.recovery.enabled=true and pass a local path to yarn.resourcemanager.fs.state-store.uri, such as "file:///tmp/MYTMP".
If the directory /tmp/MYTMP is not readable or writable, RM should crash and print a "Permission denied" error.
Currently, RM throws a "java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist" error. RM exits with status 1, but the RM process does not shut down.
Snapshot of the ResourceManager log:
{noformat}
2013-09-27 18:31:36,621 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for nm-tokens
2013-09-27 18:31:36,694 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(640)) - Failed to load/recover state
java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:379)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1478)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1518)
at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:188)
at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:112)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:635)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
2013-09-27 18:31:36,697 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
{noformat}</blockquote></li>
Major bug reported by Gera Shegalov and fixed by Gera Shegalov (nodemanager)<br>
<b>With zero sleep-delay-before-sigkill.ms, no signal is ever sent</b><br>
<blockquote>If you set in yarn-site.xml yarn.nodemanager.sleep-delay-before-sigkill.ms=0 then an unresponsive child JVM is never killed. In MRv1, TT used to immediately SIGKILL in this case. </blockquote></li>
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (applications/distributed-shell)<br>
<b>Distributed shell application master launched with debug flag can hang waiting for external ls process.</b><br>
<blockquote>Distributed shell launched with the debug flag will run {{ApplicationMaster#dumpOutDebugInfo}}. This method launches an external process to run ls and print the contents of the current working directory. We've seen that this can cause the application master to hang on {{Process#waitFor}}.</blockquote></li>
Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Allow sophisticated app-to-queue placement policies in the Fair Scheduler</b><br>
<blockquote>Currently the Fair Scheduler supports app-to-queue placement by username. It would be beneficial to allow more sophisticated policies that rely on primary and secondary groups and fallbacks.</blockquote></li>
<blockquote>YARN-1044 fixed the min/max/used resource display problem in the scheduler page, but the "Fair Share" display has the same problem and needs to be fixed as well.</blockquote></li>
Major improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (api)<br>
<b>RMWebServices should use ClientRMService for filtering applications</b><br>
<blockquote>YARN's REST API allows filtering applications; this logic should be moved to ClientRMService so that the Java API supports the same functionality.</blockquote></li>
Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>NodeManager mistakenly loses resources and relocalizes them</b><br>
<blockquote>When a local resource that should already be present is requested again, the nodemanager checks to see if it is still present. However, the method it uses to check for presence is File.exists(), executed as the user of the nodemanager process. If the resource was a private resource localized for another user, it will have been localized to a location that is not accessible by the nodemanager user. Therefore File.exists() returns false, the nodemanager mistakenly believes the resource is no longer available, and it proceeds to localize it over and over.</blockquote></li>
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Implement a RMStateStore cleaner for deleting application/attempt info</b><br>
<blockquote>Now that we are storing the final state of application/attempt instead of removing application/attempt info on application/attempt completion(YARN-891), we need a separate RMStateStore cleaner for cleaning the application/attempt state.</blockquote></li>
Blocker bug reported by Devaraj K and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Resource Manager fails to start due to ConcurrentModificationException</b><br>
<blockquote>Resource Manager is failing to start with the below ConcurrentModificationException.
{code:xml}
2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException
java.util.ConcurrentModificationException
at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby
2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby
{code}</blockquote></li>
Minor test reported by Chuan Liu and fixed by Chuan Liu (client)<br>
<b>TestYarnCLI fails on Windows due to line endings</b><br>
<blockquote>The unit test fails on Windows because incorrect line endings were used when comparing the command line output. Error messages are as follows.
{noformat}
junit.framework.ComparisonFailure: expected:<...argument for options[]
usage: application
...> but was:<...argument for options[
]
usage: application
...>
at junit.framework.Assert.assertEquals(Assert.java:85)
at junit.framework.Assert.assertEquals(Assert.java:91)
at org.apache.hadoop.yarn.client.cli.TestYarnCLI.testMissingArguments(TestYarnCLI.java:878)
{noformat}</blockquote></li>
Trivial bug reported by Konstantin Weitz and fixed by Konstantin Weitz (resourcemanager)<br>
<b>Invalid string format in Fair Scheduler log warn message</b><br>
<blockquote>While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file:
The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s's; this way the warning will contain both the cores and the memory.</blockquote></li>
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)<br>
<b>yarn.cmd does not support passthrough to any arbitrary class.</b><br>
<blockquote>The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the pre-defined sub-commands. The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument.</blockquote></li>
Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (resourcemanager)<br>
<b>NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs</b><br>
<blockquote>If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running.</blockquote></li>
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication</b><br>
<blockquote>FSSchedulerApp and FiCaSchedulerApp use duplicate code in a lot of places. They both extend SchedulerApplication. We can move a lot of this duplicate code into SchedulerApplication.</blockquote></li>
Minor improvement reported by Sandy Ryza and fixed by Sebastian Wong <br>
<b>In TestAMRMClient, replace assertTrue with assertEquals where possible</b><br>
<blockquote>TestAMRMClient uses a lot of "assertTrue(amClient.ask.size() == 0)" where "assertEquals(0, amClient.ask.size())" would make it easier to see why it's failing at a glance.</blockquote></li>
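The mechanical change, shown on one assertion from the description:
{code}
// Before: a failure only reports "expected true".
assertTrue(amClient.ask.size() == 0);
// After: a failure reports the expected and actual values.
assertEquals(0, amClient.ask.size());
{code}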
Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (client)<br>
<b>yarn.cmd exits with NoClassDefFoundError trying to run rmadmin or logs</b><br>
<blockquote>The yarn shell script was updated so that the rmadmin and logs sub-commands launch {{org.apache.hadoop.yarn.client.cli.RMAdminCLI}} and {{org.apache.hadoop.yarn.client.cli.LogsCLI}}. The yarn.cmd script also needs to be updated so that the commands work on Windows.</blockquote></li>
Major sub-task reported by Tsuyoshi OZAWA and fixed by Xuan Gong (resourcemanager)<br>
<b>Enabling HA should check Configuration contains multiple RMs</b><br>
<blockquote>Currently, we can enable the RM HA configuration without multiple RM ids (YarnConfiguration.RM_HA_IDS). This behaviour can lead to incorrect operation. The ResourceManager should verify that more than one RM id is specified in RM_HA_IDS.
One idea is to support "strict mode" to enforce this check as configuration(e.g. yarn.resourcemanager.ha.strict-mode.enabled).</blockquote></li>
Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (client)<br>
<b>NMTokenCache is a singleton, prevents multiple AMs running in a single JVM from working correctly</b><br>
<blockquote>NMTokenCache is a singleton. Because of this, when multiple AMs run in a single JVM, NMTokens for the same node from different AMs step on each other, and starting containers fails due to mismatched tokens.
The error observed in the client side is something like:
{code}
ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
NMToken for application attempt : appattempt_1382038445650_0002_000001 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_000001
{code}</blockquote></li>
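A hedged sketch of the per-instance usage this fix enables, following the pattern in the NMTokenCache javadoc: each AM in the JVM keeps its own cache and hands it to both of its clients.
{code}
NMTokenCache nmTokenCache = new NMTokenCache();         // one cache per AM
AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
rmClient.setNMTokenCache(nmTokenCache);                 // tokens land here
NMClient nmClient = NMClient.createNMClient();
nmClient.setNMTokenCache(nmTokenCache);                 // and are read from here
{code}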
Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Promote AdminService to an Always-On service and merge in RMHAProtocolService</b><br>
<blockquote>Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to make AdminService an Always-On service.</blockquote></li>
Trivial sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Fix app specific scheduler-events' names to be app-attempt based</b><br>
<blockquote>Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart.</blockquote></li>
Major sub-task reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (resourcemanager)<br>
<b>Rethink znode structure for RM HA</b><br>
<blockquote>Rethinking the znode structure for RM HA has been proposed in some JIRAs (YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222:
{quote}
We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isn't required anymore.
{quote}</blockquote></li>
Major sub-task reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (resourcemanager)<br>
<b>RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException</b><br>
<blockquote>When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException.
It's messy to analyse which keys are null, so we should handle the exception and log the names of the keys that are null.
A current log dump is as follows:
{code}
2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null
java.lang.IllegalArgumentException: Property value must not be null
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:816)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:798)
at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100)
at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105)
at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940)
{code}</blockquote></li>
Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)<br>
<b>Allow multiple commands separating with ";" in distributed-shell</b><br>
<blockquote>In a shell, we can run two commands at once with "ls; ls".
In distributed shell, this does not work. We should allow it; there are practical use cases I know of for running multiple commands or setting environment variables before a command.</blockquote></li>
Major bug reported by Ted Yu and fixed by Ted Yu <br>
<b>SLS tests fail because conf puts yarn properties in fair-scheduler.xml</b><br>
<blockquote>I was looking at https://builds.apache.org/job/PreCommit-YARN-Build/2165//testReport/org.apache.hadoop.yarn.sls/TestSLSRunner/testSimulatorRunning/
I am able to reproduce the failure locally.
I found that FairSchedulerConfiguration.getAllocationFile() doesn't read the yarn.scheduler.fair.allocation.file config entry from fair-scheduler.xml
This leads to the following:
{code}
Caused by: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Bad fair scheduler config file: top-level element not <allocations>
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.reloadAllocs(QueueManager.java:302)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.initialize(QueueManager.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1145)
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 0.114 sec <<< FAILURE!
junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:48)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertTrue(Assert.java:27)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
{code}</blockquote></li>
Major improvement reported by Wei Yan and fixed by Wei Yan <br>
<b>Let continuous scheduling achieve more balanced task assignment</b><br>
<blockquote>Currently, in continuous scheduling (YARN-1010), in each round the thread iterates over pre-ordered nodes and assigns tasks. This mechanism may overload the first several nodes, while later nodes get no tasks.
We should sort all nodes according to available resource. In each round, always assign tasks to nodes with larger capacity, which can balance the load distribution among all nodes.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Make Fair Scheduler ACLs more user friendly</b><br>
<blockquote>The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to "*". Now that YARN-1258 enables configuring the root queue, we should reverse this. This will also bring the Fair Scheduler in line with the Capacity Scheduler.
We should also stop trimming the acl strings, since trimming makes it impossible to specify only groups in an acl.</blockquote></li>
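For illustration of why trimming breaks group-only ACLs: in the "users groups" ACL syntax, the group list is distinguished by the space before it, so a leading space means "no users"; trimming erases that distinction (the queue name here is hypothetical):
{code:xml}
<queue name="analytics">
  <!-- " devgroup" (leading space) = no users, group devgroup; -->
  <!-- trimmed to "devgroup" it would mean user devgroup, no groups. -->
  <aclSubmitApps> devgroup</aclSubmitApps>
</queue>
{code}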
Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (nodemanager)<br>
<b>LCE: Race condition leaves dangling cgroups entries for killed containers</b><br>
<blockquote>When LCE & cgroups are enabled and a container is killed (in this case by its owning AM, an MRAM), there seems to be a race condition at the OS level between the SIGTERM/SIGKILL and the OS completing all the necessary cleanup.
The LCE code, after sending the SIGTERM/SIGKILL and getting the exit code, immediately attempts to clean up the cgroups entry for the container. But this fails with an error like:
{code}
2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1381179532433_0016_01_000011 is : 143
2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1381179532433_0016_01_000011 of type UPDATE_DIAGNOSTICS_MSG
2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_000011
{code}
CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM containers to avoid this problem. It seems this should be done for all containers.
Still, waiting an extra 500 ms seems too expensive.
We should look at a more time-efficient way of doing this, perhaps spinning with a minimal sleep and a timeout until the deleteCgroup() succeeds.</blockquote></li>
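A hedged sketch of the spin-with-timeout idea suggested above; the method name and constants are illustrative, not the committed fix:
{code}
// Retry the cgroup rmdir with a small sleep until it succeeds or a deadline
// passes, instead of a fixed 500 ms wait before a single attempt.
boolean deleteCgroupWithTimeout(File cgroupDir, long timeoutMs)
    throws InterruptedException {
  long deadline = System.currentTimeMillis() + timeoutMs;
  while (System.currentTimeMillis() < deadline) {
    if (cgroupDir.delete()) {  // succeeds once the kernel releases the cgroup
      return true;
    }
    Thread.sleep(20);          // minimal sleep between attempts
  }
  return false;                // caller logs the "Unable to delete" warning
}
{code}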
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fair Scheduler chokes on unhealthy node reconnect</b><br>
<blockquote>Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's in the RUNNING state. The FairScheduler doesn't guard against this.
I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it.</blockquote></li>
Trivial bug reported by Sandy Ryza and fixed by Robert Kanter (scheduler)<br>
<b>In Fair Scheduler web UI, queue num pending and num active apps switched</b><br>
<blockquote>The values returned in FairSchedulerLeafQueueInfo by numPendingApplications and numActiveApplications should be switched.</blockquote></li>
Blocker new feature reported by Alejandro Abdelnur and fixed by Roman Shaposhnik (nodemanager)<br>
<b>Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode</b><br>
<blockquote>When using cgroups we require LCE to be configured in the cluster to start containers.
LCE starts containers as the user that submitted the job. While this works correctly in a secure setup, in a non-secure setup it presents a couple of issues:
* LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
* Because users can impersonate other users, any user would have access to any local file of other users
Particularly, the second issue is not desirable, as a user could get access to the ssh keys of other users on the nodes or, if there are NFS mounts, to other users' data outside of the cluster.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>In Fair Scheduler, maxRunningApps does not work for non-leaf queues</b><br>
<blockquote>Setting the maxRunningApps property on a parent queue should ensure that the sum of running apps across all of its subqueues cannot exceed it.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
<b>Save version information in the state store</b><br>
<blockquote>When creating the root dir for the first time, we should write version 1. If the root dir exists, we should check that the version in the state store matches the version from the config.</blockquote></li>
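A minimal sketch of the proposed check, assuming hypothetical {{rootDirExists()}}, {{loadVersion()}} and {{storeVersion()}} helpers on the state store:
{code}
static final int CURRENT_VERSION = 1;

void checkVersion() throws IOException {
  if (!rootDirExists()) {
    storeVersion(CURRENT_VERSION);   // first start: stamp version 1
    return;
  }
  int stored = loadVersion();
  if (stored != CURRENT_VERSION) {   // refuse to load incompatible state
    throw new IOException("State store version " + stored
        + " does not match expected version " + CURRENT_VERSION);
  }
}
{code}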
Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
<b>Make improvements in ZKRMStateStore for fencing</b><br>
<blockquote>Use multi-operations for every ZK interaction.
In every operation, automatically create/delete a lock znode that is a child of the root znode. This achieves fencing by modifying the create/delete permissions on the root znode.</blockquote></li>
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
<b>During RM restart, RM should start a new attempt only when previous attempt exits for real</b><br>
<blockquote>When RM recovers, it can wait for existing AMs to contact RM back and then kill them forcefully before even starting a new AM. Worst case, RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help issues with downstream components like Pig, Hive and Oozie during RM restart.
Meanwhile, new apps will proceed as usual while existing apps wait for recovery.
This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with RM can continue to run and those that don't are guaranteed to be killed before starting a new attempt.</blockquote></li>
Major sub-task reported by Jason Lowe and fixed by Omkar Vinit Joshi (resourcemanager)<br>
<b>FileSystemRMStateStore can leave partial files that prevent subsequent recovery</b><br>
<blockquote>FileSystemRMStateStore writes directly to the destination file when storing state. However if the RM were to crash in the middle of the write, the recovery method could encounter a partially-written file and either outright crash during recovery or silently load incomplete state.
To avoid this, the data should be written to a temporary file and renamed to the destination file afterwards.</blockquote></li>
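A hedged sketch of the write-then-rename pattern described above, using the Hadoop FileSystem API; {{fs}}, {{nodePath}}, and {{data}} are assumed context:
{code}
Path tempPath = new Path(nodePath.getParent(), nodePath.getName() + ".tmp");
FSDataOutputStream out = fs.create(tempPath, true);  // overwrite stale temp file
try {
  out.write(data);
} finally {
  out.close();
}
// A crash before this point leaves only a .tmp file, which recovery can
// ignore; the destination file is always either absent or complete.
fs.rename(tempPath, nodePath);
{code}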
Major bug reported by Andrey Klochkov and fixed by Andrey Klochkov <br>
<b>MiniYARNCluster shutdown takes several minutes intermittently</b><br>
<blockquote>As described in MAPREDUCE-5501, sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in the MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends its last report, and then the app master keeps retrying for >6 minutes. In some cases this leads to failures in subsequent tests, and it affects the performance of tests as app masters eat resources.</blockquote></li>
Trivial bug reported by Thomas Graves and fixed by Chen He (capacityscheduler)<br>
<b>Update capacity scheduler docs to include types on the configs</b><br>
<blockquote>The capacity scheduler docs (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) don't include types for all the configs. For instance, minimum-user-limit-percent doesn't say it's an int. It is also the only setting among the Resource Allocation configs that is an int rather than a float.</blockquote></li>
Major bug reported by Yingda Chen and fixed by Chuan Liu (api)<br>
<b>yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows</b><br>
<blockquote>yarn-default.xml has the "yarn.application.classpath" entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*. This does not work on Windows and needs to be fixed.</blockquote></li>
Major test reported by Robert Parker and fixed by Mit Desai (resourcemanager)<br>
<b>Add ClusterMetrics checks to the TestRMNodeTransitions tests</b><br>
<blockquote>YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions.</blockquote></li>
Major bug reported by Robert Parker and fixed by Robert Parker (resourcemanager)<br>
<b>Active nodes can be decremented below 0</b><br>
<blockquote>The issue is in RMNodeImpl, where both the RUNNING and UNHEALTHY states that transition to a deactive state (LOST, DECOMMISSIONED, REBOOTED) use the same DeactivateNodeTransition class. The DeactivateNodeTransition class naturally decrements the active node count; however, in cases where the node has transitioned to UNHEALTHY, the active count has already been decremented.</blockquote></li>
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
<b>Separate out RM services into "Always On" and "Active"</b><br>
<blockquote>From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to be started on transitionToActive() and completely shutdown on transitionToStandby().
The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby.</blockquote></li>
Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Diagnostic message from ContainerExitEvent is ignored in ContainerImpl</b><br>
<blockquote>If the container launch fails then we send ContainerExitEvent. This event contains exitCode and diagnostic message. Today we are ignoring diagnostic message while handling this event inside ContainerImpl. Fixing it as it is useful in diagnosing the failure.</blockquote></li>
Critical bug reported by Sangjin Lee and fixed by Sangjin Lee (resourcemanager , scheduler)<br>
<b>used/min/max resources do not display info in the scheduler page</b><br>
<blockquote>Go to the scheduler page in RM, and click any queue to display the detailed info. You'll find that none of the resources entries (used, min, or max) would display values.
It is because the values contain brackets ("<" and ">") and are not properly html-escaped.</blockquote></li>
Major sub-task reported by Nemon Lou and fixed by Karthik Kambatla <br>
<b>Expose RM active/standby state to Web UI and REST API</b><br>
<blockquote>Both the active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
<b>Allow embedding leader election into the RM</b><br>
<blockquote>It should be possible to embed the common ActiveStandbyElector into the RM such that ZooKeeper-based leader election and notification is built in. In conjunction with a ZK state store, this configuration will be a simple deployment option.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
<b>Add FailoverProxyProvider like capability to RMProxy</b><br>
<blockquote>RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
<b>Implement RMHAProtocolService</b><br>
<blockquote>Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services.</blockquote></li>
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:249)
2013-08-03 20:01:34,460 INFO [AMRM Callback Handler Thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:275)</blockquote></li>
Major new feature reported by Wei Yan and fixed by Wei Yan (scheduler)<br>
<b>Yarn Scheduler Load Simulator</b><br>
<blockquote>The Yarn Scheduler is a fertile area of interest with different implementations, e.g., the Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workloads. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, it is currently non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time- and cost-consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm would work for some specific workload would be quite useful.
We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation.
The simulator will exercise the real Yarn ResourceManager removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AMs heartbeat events from within the same JVM.
To keep track of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler.
The simulator will produce real time metrics while executing, including:
* Resource usages for whole cluster and each queue, which can be utilized to configure cluster and queue's capacity.
* The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs' turnaround time, throughput, fairness, capacity guarantee, etc.).
* Several key metrics of the scheduler algorithm, such as the time cost of each scheduler operation (allocate, handle, etc.), which can be utilized by Hadoop developers to find code hot spots and scalability limits.
The simulator will provide real time charts showing the behavior of the scheduler and its performance.
A short demo is available at http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use the simulator to simulate the Fair Scheduler and the Capacity Scheduler.</blockquote></li>
Critical improvement reported by Alejandro Abdelnur and fixed by Wei Yan (scheduler)<br>
<b>FairScheduler: decouple container scheduling from nodemanager heartbeats</b><br>
<blockquote>Currently, scheduling for a node is done when the node heartbeats.
For large clusters where the heartbeat interval is set to several seconds, this delays scheduling of incoming allocations significantly.
We could have a continuous loop scanning all nodes and doing scheduling. If there is availability, AMs will get the allocation in the next heartbeat after the one that placed the request.</blockquote></li>
Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (nodemanager)<br>
<b>Nodemanager should log where a resource was localized</b><br>
<blockquote>When a resource is localized, we should log WHERE on the local disk it was localized. This helps in debugging afterwards (e.g. if the disk was to go bad).</blockquote></li>
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (documentation)<br>
<b>Document the meaning of a virtual core</b><br>
<blockquote>As virtual cores are a somewhat novel concept, it would be helpful to have thorough documentation that clarifies their meaning.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
<b>Store completed application information in RM state store</b><br>
<blockquote>Store completed application/attempt info in RMStateStore when an application/attempt completes. This solves some problems, like finished applications getting lost after RM restart, and some other races like YARN-1195.</blockquote></li>
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur <br>
<b>clean up POM dependencies</b><br>
<blockquote>Intermediate 'pom' modules define dependencies inherited by leaf modules.
This is causing issues in the IntelliJ IDE.
We should normalize the leaf modules as in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies.</blockquote></li>
<blockquote>getResources() should return a list of containers allocated by the RM. However, it currently returns null directly. Worse, if LOG.debug is enabled, it will definitely cause an NPE.</blockquote></li>
Major sub-task reported by Robert Parker and fixed by Robert Parker (nodemanager , resourcemanager)<br>
<b>ResourceManager and NodeManager should check for a minimum allowed version</b><br>
<blockquote>Our use case is that during an upgrade on a large cluster, several NodeManagers may not restart with the new version. Once the RM comes back up, the NodeManagers will re-register to the RM without issue.
The NM should report its version to the RM. The RM should have a configuration to disallow the check (default), or to require the NM version to be equal to the RM's (to prevent a config change for each release), equal to or greater than the RM's (to allow NM upgrades), or to match an explicit version or version range.
The RM should also have a configuration for how to treat a mismatch: REJECT, or REBOOT the NM.</blockquote></li>
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>When querying apps by queue, iterating over all apps is inefficient and limiting </b><br>
<blockquote>The question "which apps are in queue x" can be asked via the RM REST APIs, through the ClientRMService, and through the command line. In all these cases, the question is answered by scanning through every RMApp and filtering by the app's queue name.
All schedulers maintain a mapping of queues to applications. I think it would make more sense to ask the schedulers which applications are in a given queue. This is what was done in MR1. This would also have the advantage of allowing a parent queue to return all the applications on leaf queues under it, and allow queue name aliases, as in the way that "root.default" and "default" refer to the same queue in the fair scheduler.</blockquote></li>
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Expose application resource usage in RM REST API</b><br>
<blockquote>It might be good to require users to explicitly ask for this information, as it's a little more expensive to collect than the other fields in AppInfo.</blockquote></li>
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi (resourcemanager)<br>
<b>Slow or failing DelegationToken renewals on submission itself make RM unavailable</b><br>
<blockquote>This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as the RM may run out of RPC handlers due to blocked client submissions.</blockquote></li>
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
<b>Make container logs available over HTTP in plain text</b><br>
<blockquote>It would be good to make container logs available over the REST API for MAPREDUCE-4362, and so that they can be accessed programmatically in general.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Harshit Daga (scheduler)<br>
<b>In scheduler web UIs, queues unexpand on refresh</b><br>
<blockquote>In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time.</blockquote></li>
Major bug reported by Lohit Vijayarenu and fixed by Sandy Ryza (scheduler)<br>
<b>Allow disabling the Fair Scheduler event log</b><br>
<blockquote>Hadoop 1.0 supported an option to turn on/off FairScheduler event logging using mapred.fairscheduler.eventlog.enabled. In Hadoop 2.0, it looks like this option has been removed (or not ported?) which causes event logging to be enabled by default and there is no way to turn it off.</blockquote></li>
Major sub-task reported by Junping Du and fixed by Junping Du (api)<br>
<b>Add updateNodeResource in ResourceManagerAdministrationProtocol</b><br>
<blockquote>Add a fundamental RPC (ResourceManagerAdministrationProtocol) to support node resource changes. For design details, please refer to the parent JIRA: YARN-291.</blockquote></li>
<blockquote>As the first step, we go for resource change on the RM side and expose admin APIs (admin protocol, CLI, REST and JMX APIs) later. This JIRA contains only the scheduler changes.
The flow to update a node's resource and make resource scheduling aware of it is:
1. The resource update comes through the admin API to the RM and takes effect on RMNodeImpl.
2. When the next NM heartbeat for updating status comes, the RMNode's resource change is picked up and the delta resource is added to the schedulerNode's availableResource before actual scheduling happens.
3. The scheduler does resource allocation according to the new availableResource in the SchedulerNode.
For more design details, please refer to the proposal and discussions in the parent JIRA: YARN-291.</blockquote></li>
Critical bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu (resourcemanager)<br>
<b>Fair scheduler logs too many "Node offered to app:..." messages</b><br>
<blockquote>Running the fair scheduler in YARN shows that the RM log has lots of messages like the one below.
{noformat}
INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable: Node offered to app: application_1357147147433_0002 reserved: false
{noformat}
They don't seem to tell much, and the same line is dumped many times in the RM log. It would be good to improve it with node information, or to move it to some other logging level with enough debug information.</blockquote></li>
Major new feature reported by BitsOfInfo and fixed by Mariappan Asokan <br>
<b>FixedLengthInputFormat and FixedLengthRecordReader</b><br>
<blockquote>Addition of FixedLengthInputFormat and FixedLengthRecordReader in the org.apache.hadoop.mapreduce.lib.input package. These two classes can be used when you need to read data from files containing fixed-length (fixed-width) records. Such files have no CR/LF (or any combination thereof) and no delimiters; each record is a fixed length, extra data is padded with spaces, and the data is one gigantic line within a file. When creating a job that specifies this input format, the job must have the "mapreduce.input.fixedlengthinputformat.record.length" property set, as follows: myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
Please see javadoc for more details.</blockquote></li>
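A hedged usage sketch based on the description above; the input path and record length are illustrative:
{code}
Configuration conf = new Configuration();
// Convenience equivalent of setting
// "mapreduce.input.fixedlengthinputformat.record.length" directly.
FixedLengthInputFormat.setRecordLength(conf, 120);
Job job = Job.getInstance(conf, "read-fixed-width-records");
job.setInputFormatClass(FixedLengthInputFormat.class);
FileInputFormat.addInputPath(job, new Path("/data/fixed-width"));
{code}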
Major bug reported by Suresh Srinivas and fixed by Jing Zhao (namenode)<br>
<b>Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK</b><br>
<blockquote>Add a new editlog record (OP_ADD_BLOCK) that only records allocation of the new block instead of the entire block list, on every block allocation.</blockquote></li>
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Implement HTTP policy for Namenode and DataNode</b><br>
<blockquote>Add a new HTTP policy configuration. Users can use "dfs.http.policy" to control the HTTP endpoints for the NameNode and DataNode. Specifically, the following values are supported:
- HTTP_ONLY : Service is provided only on http
- HTTPS_ONLY : Service is provided only on https
- HTTP_AND_HTTPS : Service is provided both on http and https
hadoop.ssl.enabled and dfs.https.enabled are deprecated. When the deprecated configuration properties are still configured, the HTTP policy is decided based on the following rules:
1. If dfs.http.policy is set to HTTPS_ONLY or HTTP_AND_HTTPS, it picks the specified policy; otherwise it proceeds to 2~4.
2. It picks HTTPS_ONLY if hadoop.ssl.enabled equals true.
3. It picks HTTP_AND_HTTPS if dfs.https.enable equals true.
4. It picks HTTP_ONLY for other configurations.</blockquote></li>
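For example, to serve the NameNode and DataNode web endpoints over HTTPS only:
{code:xml}
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
{code}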
Major sub-task reported by Haohui Mai and fixed by Haohui Mai <br>
<b>Fix HTTPS support in HsftpFileSystem</b><br>
<blockquote>Fix the https support in HsftpFileSystem. With the change the client now verifies the server certificate. In particular, client side will verify the Common Name of the certificate using a strategy specified by the configuration property "hadoop.ssl.hostname.verifier".</blockquote></li>
Major bug reported by Colin Patrick McCabe and fixed by Colin Patrick McCabe (libhdfs)<br>
<b>libhdfs doesn't return correct error codes in most cases</b><br>
<blockquote>libhdfs now returns correct codes in errno. Previously, due to a bug, many functions set errno to 255 instead of the more specific error code.</blockquote></li>
Trivial improvement reported by Harsh J and fixed by Harsh J <br>
<b>DU refresh interval is not configurable</b><br>
<blockquote>The 'du' (disk usage command from Unix) script refresh monitor is now configurable in the same way as its 'df' counterpart, via the property 'fs.du.interval', the default of which is 10 minutes (in ms).</blockquote></li>
Blocker bug reported by Alejandro Abdelnur and fixed by Siddharth Seth (nodemanager)<br>
<b>LCE fails to run containers that don't have resources to localize</b><br>
<blockquote>LCE container launch assumes the usercache/USER directory exists and is owned by the user running the container process.
But the directory is created by the LCE localization command only if there are resources to localize. If there are no resources to localize, LCE localization never executes, and launching fails reporting exit code 255, with the NM logs showing something like:
{code}
2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 1
2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is llama
2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create directory llama in /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_000004 - Permission denied
{code}</blockquote></li>
<blockquote>The error is shown below in the comments.
MAPREDUCE-2374 fixed this by removing "-c" when running the container launch script. It looks like the "-c" got brought back during the windows branch merge, so we should remove it again.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Karthik Kambatla <br>
<b>TestApplicationCleanup relies on all containers assigned in a single heartbeat</b><br>
<blockquote>TestApplicationCleanup submits container requests and waits for allocations to come in. It sends only a single heartbeat to the node, expecting multiple containers to be assigned on that heartbeat, which not all schedulers do by default.
This is causing the test to fail when run with the Fair Scheduler.</blockquote></li>
Critical sub-task reported by Bikas Saha and fixed by Xuan Gong <br>
<b>NM silently ignores non-existent service in StartContainerRequest</b><br>
<blockquote>A container can set token service metadata for a service, say shuffle_service. If that service does not exist, the error is silently ignored. Later, when the next container wants to access data written to shuffle_service by the first task, it fails because the service does not have the token that was supposed to be set by the first task.</blockquote></li>
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
<b>NM is polluting container's credentials</b><br>
<blockquote>Before launching the container, the NM reuses the same credentials object, polluting what the container should see. We should fix this.</blockquote></li>
Major bug reported by Junping Du and fixed by Xuan Gong (applications/distributed-shell)<br>
<b>TestDistributedShell#TestDSShell failed with timeout</b><br>
<blockquote>TestDistributedShell#testDSShell has recently been failing consistently on trunk Jenkins.
The stacktrace is:
{code}
java.lang.Exception: test timed out after 90000 milliseconds
at com.google.protobuf.LiteralByteString.<init>(LiteralByteString.java:234)
at com.google.protobuf.ByteString.copyFromUtf8(ByteString.java:255)
at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getMethodNameBytes(ProtobufRpcEngineProtos.java:286)
at org.apache.hadoop.ipc.protobuf.ProtobufRpcEngineProtos$RequestHeaderProto.getSerializedSize(ProtobufRpcEngineProtos.java:462)
at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:84)
at org.apache.hadoop.ipc.ProtobufRpcEngine$RpcMessageWithHeader.write(ProtobufRpcEngine.java:302)
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:989)
at org.apache.hadoop.ipc.Client.call(Client.java:1377)
at org.apache.hadoop.ipc.Client.call(Client.java:1357)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy70.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:137)
at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at $Proxy71.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:195)
at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:622)
at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:597)
at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:125)
{code}</blockquote></li>
Major bug reported by Roman Shaposhnik and fixed by Roman Shaposhnik (nodemanager)<br>
<b>test-container-executor has gotten out of sync with the changes to container-executor</b><br>
<blockquote>If run under the super-user account test-container-executor.c fails in multiple different places. It would be nice to fix it so that we have better testing of LCE functionality.</blockquote></li>
Minor improvement reported by Arpit Gupta and fixed by Arpit Gupta <br>
<b>Log application status in the rm log when app is done running</b><br>
<blockquote>Since there is no YARN history server, it becomes difficult to determine the status of an old application. One has to be familiar with the state transitions in YARN to know what counts as a success.
We should add a log at info level that captures the finalStatus of an app. This would be helpful when debugging applications if the RM has restarted and we can no longer use the UI.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager)<br>
<b>FairScheduler setting queue name in RMApp is not working </b><br>
<blockquote>The fair scheduler sometimes picks a different queue than the one an application was submitted to, such as when user-as-default-queue is turned on. It needs to update the queue name in the RMApp so that this choice will be reflected in the UI.
This isn't working because the scheduler is looking up the RMApp by application attempt id instead of app id and failing to find it.</blockquote></li>
Blocker bug reported by Tassapol Athiapinya and fixed by Xuan Gong (nodemanager)<br>
<b>Define constraints on Auxiliary Service names. Change ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle.</b><br>
<blockquote>I ran a sleep job. If the AM fails to start, this exception can occur:
{code}
13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_000001 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_000001/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
': not a valid identifier
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.
{code}</blockquote></li>
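After this change, configurations that referenced the old service name need to be updated. A minimal sketch, assuming the usual yarn-site.xml keys for the shuffle auxiliary service:
{code}
import org.apache.hadoop.conf.Configuration;

public class ShuffleServiceConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Aux service names are now constrained to be valid environment variable
    // names, so the dot in "mapreduce.shuffle" becomes an underscore.
    conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");
    conf.set("yarn.nodemanager.aux-services.mapreduce_shuffle.class",
        "org.apache.hadoop.mapred.ShuffleHandler");
  }
}
{code}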
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Clean up Fair Scheduler configuration loading</b><br>
<blockquote>Currently the Fair Scheduler is configured in two ways
* An allocations file that has a different format than the standard Hadoop configuration file, which makes it easier to specify hierarchical objects like queues and their properties.
* With properties like yarn.scheduler.fair.max.assign that are specified in the standard Hadoop configuration format.
The standard and default way of configuring it is to use fair-scheduler.xml as the allocations file and to put the yarn.scheduler properties in yarn-site.xml.
It is also possible to specify a different file as the allocations file, and to place the yarn.scheduler properties in fair-scheduler.xml, which will be interpreted as in the standard Hadoop configuration format. This flexibility is both confusing and unnecessary.
Additionally, the allocation file is loaded as fair-scheduler.xml from the classpath if it is not specified, but is loaded as a File if it is. This causes two problems:
1. We see different behavior when not setting yarn.scheduler.fair.allocation.file and when setting it to fair-scheduler.xml, which is its default.
2. Classloaders may choose to cache resources, which can break the reload logic when yarn.scheduler.fair.allocation.file is not specified.
We should never allow the yarn.scheduler properties to go into fair-scheduler.xml. And we should always load the allocations file as a file, not as a resource on the classpath. To preserve existing behavior and allow loading files from the classpath, we can look for files on the classpath, but strip off their scheme and interpret them as Files.</blockquote></li>
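For reference, a sketch of the intended split, using the property names mentioned above: scheduler tuning stays in yarn-site.xml, and only queue definitions live in the allocations file:
{code}
import org.apache.hadoop.conf.Configuration;

public class FairSchedulerConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Point at the allocations file explicitly; after the cleanup it is
    // always loaded as a File rather than a classpath resource.
    conf.set("yarn.scheduler.fair.allocation.file",
        "/etc/hadoop/conf/fair-scheduler.xml");
    // yarn.scheduler.* properties belong here (yarn-site.xml), never in
    // the allocations file itself.
    conf.setInt("yarn.scheduler.fair.max.assign", 1);
  }
}
{code}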
Major bug reported by shanyu zhao and fixed by shanyu zhao (nodemanager)<br>
<b>FSDownload changes file suffix making FileUtil.unTar() throw exception</b><br>
<blockquote>While running a Hive join operation on YARN, I saw the exception described below. It is caused by FSDownload copying the file into a temp file and changing the suffix to ".tmp" before unpacking it. In unpack(), it uses FileUtil.unTar(), which determines whether the file is gzipped by looking at the file suffix.</blockquote></li>
Major bug reported by Chuan Liu and fixed by Chuan Liu (api)<br>
<b>Yarn URL should include userinfo</b><br>
<blockquote>In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have a userinfo part of the URL. When converting a {{java.net.URI}} object into the YARN URL object in the {{ConverterUtils.getYarnUrlFromURI()}} method, we set the uri host as the url host. If the uri has a userinfo part, the userinfo is discarded. This leads to information loss if the original uri has userinfo; e.g. foo://username:password@example.com will be converted to foo://example.com, and the username/password information is lost during the conversion.</blockquote></li>
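A quick standalone demonstration of the information being dropped, using only java.net.URI:
{code}
import java.net.URI;

public class UserInfoDemo {
  public static void main(String[] args) throws Exception {
    URI uri = new URI("foo://username:password@example.com");
    System.out.println(uri.getUserInfo()); // "username:password"
    System.out.println(uri.getHost());     // "example.com"
    // Before this fix, only getHost() made it into the YARN URL record,
    // so the userinfo printed above was silently lost.
  }
}
{code}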
Critical sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Register ClientToken MasterKey in SecretManager after it is saved</b><br>
<blockquote>Currently, the app attempt ClientToken master key is registered before it is saved. This can cause a problem: if the client gets the token and the RM crashes before the master key is saved, the RM cannot reload the master key after it restarts because it was never saved. As a result, the client is left holding an invalid token.
We can register the client token master key after it is saved in the store.</blockquote></li>
Major sub-task reported by Yesha Vora and fixed by Omkar Vinit Joshi <br>
<b>Need to add https port related property in Yarn</b><br>
<blockquote>There is no yarn property available to configure the https port for the ResourceManager, NodeManager and history server. Currently, Yarn services use the port defined for http [defined by 'mapreduce.jobhistory.webapp.address', 'yarn.nodemanager.webapp.address', 'yarn.resourcemanager.webapp.address'] when running services over the https protocol.
Yarn should have a list of properties to assign the https port for the RM, NM and JHS.</blockquote></li>
<blockquote>Submit a distributed shell application. Once the application reaches the RUNNING state, the app master host should not be empty. In reality, it is empty.
==console logs==
distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED,</blockquote></li>
Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (resourcemanager)<br>
<b>ResourceManager UI has invalid tracking URL link for distributed shell application</b><br>
<blockquote>Submit a YARN distributed shell application. Go to the ResourceManager web UI. The application definitely appears. In the Tracking UI column, there will be a history link. Click on that link. Instead of showing the application master web UI, an HTTP error 500 appears.</blockquote></li>
Major bug reported by Ramya Sunil and fixed by Xuan Gong <br>
<b>NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING</b><br>
<blockquote>When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown
{noformat}
2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118
2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/<host>_45454.tmp
2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118
2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_000004. Current good log dirs are /tmp/yarn/local
2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118
2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
at java.lang.Thread.run(Thread.java:662)
2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null
2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040
{noformat}</blockquote></li>
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Updating resource requests should be decoupled with updating blacklist</b><br>
<blockquote>Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty.</blockquote></li>
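A hedged sketch of the client-side pattern this affects, assuming the AMRMClient blacklist API of this era: an AM can change its blacklist without any new resource requests outstanding, and after the fix the update still reaches the scheduler on the next heartbeat (registration with the RM is omitted for brevity):
{code}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class BlacklistOnly {
  public static void main(String[] args) throws Exception {
    AMRMClient<AMRMClient.ContainerRequest> client = AMRMClient.createAMRMClient();
    client.init(new Configuration());
    client.start();
    // No container requests added: the allocate call below carries only
    // the blacklist change, which previously could be dropped.
    client.updateBlacklist(Arrays.asList("badnode.example.com"), null);
    client.allocate(0.0f);
    client.stop();
  }
}
{code}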
Minor sub-task reported by Tassapol Athiapinya and fixed by Siddharth Seth (client)<br>
<b>$yarn logs command should return an appropriate error message if YARN application is still running</b><br>
<blockquote>In the case where log aggregation is enabled, if a user submits a MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command returns no message and drops the user back to the shell. It would be nice to tell the user that log aggregation is in progress.
At the same time, if an invalid application ID is given, the YARN CLI should say that the application ID is incorrect rather than throwing a NoSuchElementException.</blockquote></li>
Major bug reported by Yesha Vora and fixed by Jian He <br>
<b>Job does not get into Pending State</b><br>
<blockquote>When there is no resource available to run a job, the next job should go into a pending state. The RM UI should show the next job as a pending app and the counter for pending apps should be incremented.
Currently, however, the next job stays in the ACCEPTED state, no AM is assigned to it, and the pending app count is not incremented.
Running 'job status <nextjob>' shows job state=PREP.
$ mapred job -status job_1377122233385_0002
13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at host1/ip1</blockquote></li>
Critical bug reported by Lohit Vijayarenu and fixed by Lohit Vijayarenu <br>
<b>NPE in RackResolve</b><br>
<blockquote>We found a case where our rack resolution script was not returning a rack due to a problem resolving the host address. The exception was seen in RackResolver.java as an NPE, ultimately caught in RMContainerAllocator.
{noformat}
2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM.
java.lang.NullPointerException
at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99)
at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681)
at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243)
{noformat}</blockquote></li>
Major sub-task reported by Sandy Ryza and fixed by Xuan Gong (scheduler)<br>
<b>Get queue administration ACLs working</b><br>
<blockquote>The Capacity Scheduler documents the yarn.scheduler.capacity.root.<queue-path>.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1.</blockquote></li>
Critical sub-task reported by Allen Wittenauer and fixed by Omkar Vinit Joshi (resourcemanager)<br>
<b>RM triggers web auth failure before first job</b><br>
<blockquote>On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors.</blockquote></li>
Blocker sub-task reported by Colin Patrick McCabe and fixed by Sanjay Radia (fs)<br>
<b>disable symlinks temporarily</b><br>
<blockquote>During review of symbolic links, many issues were found related to the impact on the semantics of existing APIs such as FileSystem#listStatus, FileSystem#globStatus, etc. Many issues were also brought up about the impact of symbolic links on the security and functionality of HDFS. All these issues will be addressed in the upcoming release 2.3. Until then the feature is temporarily disabled.</blockquote></li>
Blocker bug reported by Jason Lowe and fixed by Omkar Vinit Joshi <br>
<b>NMTokenSecretManagerInNM is not being told when applications have finished </b><br>
<blockquote>The {{appFinished}} method is not being called when applications have finished. This causes a couple of leaks as {{oldMasterKeys}} and {{appToAppAttemptMap}} are never being pruned.</blockquote></li>
{noformat}
java.lang.ClassCastException: java.util.Collections$UnmodifiableSet cannot be cast to java.util.NavigableSet
at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getContainersToPreempt(ProportionalCapacityPreemptionPolicy.java:403)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:202)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:173)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72)
at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82)
{noformat}
Blocker bug reported by Arun C Murthy and fixed by Binglin Chang <br>
<b>yarn proto definitions should specify package as 'hadoop.yarn'</b><br>
<blockquote>yarn proto definitions should specify package as 'hadoop.yarn' similar to protos with 'hadoop.common' & 'hadoop.hdfs' in Common & HDFS respectively.</blockquote></li>
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>Invalid key to HMAC computation error when getting application report for completed app attempt</b><br>
<blockquote>On a secure cluster, an invalid key to HMAC error is thrown when trying to get an application report for an application with an attempt that has unregistered.</blockquote></li>
Major improvement reported by Alejandro Abdelnur and fixed by Roman Shaposhnik (nodemanager)<br>
<b>Add support whitelist for system users to Yarn container-executor.c</b><br>
<blockquote>Currently container-executor.c has a banned set of users (mapred, hdfs & bin) and configurable min.user.id (defaulting to 1000).
This presents a problem for systems that run as system users (below 1000) if these systems want to start containers.
Systems like Impala fit in this category. A (local) 'impala' system user is created when installing Impala on the nodes.
Note that the same thing happens when installing system like HDFS, Yarn, Oozie, from packages (Bigtop); local system users are created.
For Impala to be able to run containers in a secure cluster, the 'impala' system user must be whitelisted.
For this, we add an 'allowed.system.users' option to container-executor.cfg, plus logic in container-executor.c that allows the usernames in that list.
Because system users are not guaranteed to have the same UID on different machines, the 'allowed.system.users' property should use usernames and not UIDs.</blockquote></li>
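A sample container-executor.cfg under this proposal might look like the following (the user names are illustrative):
{code}
yarn.nodemanager.linux-container-executor.group=yarn
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
# Usernames (never UIDs) below min.user.id that may still run containers.
allowed.system.users=impala
{code}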
Blocker bug reported by Omkar Vinit Joshi and fixed by Xuan Gong <br>
<b>By default yarn application -list should display all the applications in a state other than FINISHED / FAILED</b><br>
<blockquote>Today we are just listing applications in the RUNNING state by default for "yarn application -list". Instead we should show all the applications that are submitted/accepted/running.</blockquote></li>
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>Make ApplicationConstants.Environment.USER definition OS neutral</b><br>
<blockquote>In YARN-557, we added some code to give {{ApplicationConstants.Environment.USER}} an OS-specific definition in order to fix the unit test TestUnmanagedAMLauncher. In YARN-571, the relevant test code was corrected. In YARN-602, we now explicitly set the environment variables for the child containers. With these changes, I think we can revert the YARN-557 change and make {{ApplicationConstants.Environment.USER}} OS neutral. The main benefit is that we can use the same methods over the Enum constants. This should also fix the TestContainerLaunch#testContainerEnvVariables failure on Windows.</blockquote></li>
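A small illustration of the uniform usage this enables, assuming the Environment enum helpers (key() for the variable name, $() for the OS-specific expansion):
{code}
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;

public class EnvDemo {
  public static void main(String[] args) {
    // With an OS-neutral definition, the same calls work on all platforms.
    System.out.println(Environment.USER.key()); // "USER"
    System.out.println(Environment.USER.$());   // "$USER" or "%USER%"
  }
}
{code}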
Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)<br>
<b>Improve help message for $ yarn applications and $yarn node</b><br>
<blockquote>YARN-1080 standardized the help message for $ yarn logs. It would be nice to have similar changes for $ yarn application and $ yarn node.</blockquote></li>
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Populate AMRMTokens back to AMRMTokenSecretManager after RM restarts</b><br>
<blockquote>The AMRMTokens are now only saved in the RMStateStore and not populated back to the AMRMTokenSecretManager after the RM restarts. This is needed more now since the AMRMToken is also used in non-secure environments.</blockquote></li>
Major bug reported by Robert Parker and fixed by Robert Parker (resourcemanager)<br>
<b>Active nodes can be decremented below 0</b><br>
<blockquote>The issue is in RMNodeImpl, where both the RUNNING and UNHEALTHY states transition to a deactive state (LOST, DECOMMISSIONED, REBOOTED) using the same DeactivateNodeTransition class. The DeactivateNodeTransition class naturally decrements the active node count; however, in cases where the node has transitioned to UNHEALTHY, the active count has already been decremented.</blockquote></li>
Blocker task reported by Jaimin D Jetly and fixed by Omkar Vinit Joshi (nodemanager , resourcemanager)<br>
<b>Yarn and MRv2 should do HTTP client authentication in kerberos setup.</b><br>
<blockquote>In a kerberos setup, an HTTP client is expected to authenticate to kerberos before being allowed to browse any information.</blockquote></li>
Major bug reported by Yesha Vora and fixed by Zhijie Shen (resourcemanager)<br>
<b>ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval</b><br>
<blockquote>If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the node managers will be added to 'Lost Nodes'.
Instead, the ResourceManager should validate these properties and fail to start if the combination is invalid.</blockquote></li>
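A hedged sketch of a configuration the RM should reject at startup once this validation is in place (the heartbeat interval key shown is assumed to be the standard yarn.resourcemanager.nodemanagers.heartbeat-interval-ms):
{code}
import org.apache.hadoop.conf.Configuration;

public class LivenessConfigCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setLong("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms", 10000);
    // Expiry below the heartbeat interval: every NM would be marked lost.
    conf.setLong("yarn.nm.liveness-monitor.expiry-interval-ms", 5000);
  }
}
{code}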
Minor improvement reported by Tassapol Athiapinya and fixed by Akira AJISAKA (client)<br>
<b>Minor improvement to output header for $ yarn node -list</b><br>
<blockquote>Output of $ yarn node -list shows the number of running containers at each node. I found a case where a new user of YARN thought this was a container ID, used it later in other YARN commands, and got an error due to the misunderstanding.</blockquote></li>
Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)<br>
<b>Improve help message for $ yarn logs</b><br>
<blockquote>There are 2 parts I am proposing in this jira. They can be fixed together in one patch.
1. Standardize help message for required parameter of $ yarn logs
YARN CLI has a command "logs" ($ yarn logs). The command always requires the "-applicationId <arg>" parameter. However, the help message of the command does not make this clear: it lists -applicationId as an optional parameter, yet if I don't set it, the YARN CLI complains that it is missing. It would be better to use the standard required-parameter notation from other Linux commands in the help message, so that any user familiar with that convention can easily see that this parameter is needed.
{code}
-appOwner <arg> AppOwner (assumed to be current user if not
specified)
-containerId <arg> ContainerId (must be specified if node address is
specified)
-nodeAddress <arg> NodeAddress in the format nodename:port (must be
specified if container id is specified)
{code}
2. Add a description for the help command. As far as I know, a user cannot get logs for a running job. Since I spent some time trying to get logs of running applications, it would be nice to state this in the command description.
{code:title=proposed help}
Retrieve logs for completed/killed YARN application
{code}</blockquote></li>
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows</b><br>
<blockquote>The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name "localhost".
{noformat}
org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0_0000_01_000000 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345
{noformat}
{noformat}
testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec <<< FAILURE!
org.junit.ComparisonFailure: expected:<[localhost]:12345> but was:<[127.0.0.1]:12345>
at org.junit.Assert.assertEquals(Assert.java:125)
at org.junit.Assert.assertEquals(Assert.java:147)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at $Proxy26.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985)
{noformat}</blockquote></li>
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestContainerLaunch fails on Windows</b><br>
<blockquote>Several cases in this unit test fail on Windows. (Error logs appended at the end.)
testInvalidEnvSyntaxDiagnostics fails because of the difference between cmd and bash script error handling. If some command fails in a cmd script, cmd continues executing the rest of the script's commands; error handling needs to be carried out explicitly in the script file, and the error code of the last command is returned as the error code of the whole script. In this test, an error happens in the middle of the cmd script, and the test expects an exception and a non-zero error code. In the cmd script, the intermediate errors are ignored; the last command "call" succeeds and there is no exception.
testContainerLaunchStdoutAndStderrDiagnostics fails due to wrong cmd commands used by the test.
testContainerEnvVariables and testDelayedKill fail due to a regression from YARN-906.
{noformat}
testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 583 sec <<< FAILURE!
junit.framework.AssertionFailedError: Should catch exception
at junit.framework.Assert.fail(Assert.java:50)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:269)
...
testContainerLaunchStdoutAndStderrDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 561 sec <<< FAILURE!
junit.framework.AssertionFailedError: Should catch exception
at junit.framework.Assert.fail(Assert.java:50)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerLaunchStdoutAndStderrDiagnostics(TestContainerLaunch.java:314)
...
testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 4136 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<137> but was:<143>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at junit.framework.Assert.assertEquals(Assert.java:205)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:500)
...
testDelayedKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 2744 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<137> but was:<143>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at junit.framework.Assert.assertEquals(Assert.java:205)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testDelayedKill(TestContainerLaunch.java:601)
{noformat}</blockquote></li>
Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (client)<br>
<b>Clean up YARN CLI app list to show only running apps.</b><br>
<blockquote>Once a user brings up the YARN daemons and runs jobs, the jobs stay in the output returned by $ yarn application -list even after they have completed. We want the YARN command line to clean up this list. Specifically, we want to remove applications with FINISHED state (not Final-State) or KILLED state from the result.
{code}
[user1@host1 ~]$ yarn application -list
Total Applications:150
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
{code}</blockquote></li>
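Programmatically, the equivalent filtering can be done with the YarnClient API (a sketch, assuming the state-filtered getApplications overload):
{code}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListActiveApps {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    // Mirror the cleaned-up CLI default: only apps that are still active.
    System.out.println(client.getApplications(
        EnumSet.of(YarnApplicationState.SUBMITTED,
                   YarnApplicationState.ACCEPTED,
                   YarnApplicationState.RUNNING)));
    client.stop();
  }
}
{code}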
Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (api)<br>
<b>ContainerExistStatus should define a status for preempted containers</b><br>
<blockquote>With the current behavior it is impossible to determine whether a container has been preempted or lost due to a NM crash.
Adding a PREEMPTED exit status (-102) will help an AM determine that a container has been preempted.
Note the change of scope from the original summary/description. The original scope proposed API/behavior changes. Because we are past 2.1.0-beta, I'm reducing the scope of this JIRA.</blockquote></li>
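A minimal sketch of how an AM might consume the new status (ContainerExitStatus.PREEMPTED carries the -102 value mentioned above):
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

public class PreemptionCheck {
  // Distinguish preemption from a lost NM when processing completed
  // containers returned in an allocate response.
  static boolean wasPreempted(ContainerStatus status) {
    return status.getExitStatus() == ContainerExitStatus.PREEMPTED;
  }
}
{code}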
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager , resourcemanager)<br>
<b>ResourceManager and NodeManager do not load native libraries on Windows.</b><br>
<blockquote>ResourceManager and NodeManager do not have the correct setting for java.library.path when launched on Windows. This prevents the processes from loading native code from hadoop.dll. The native code is required for correct functioning on Windows (not optional), so this ultimately can cause failures.</blockquote></li>
Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (nodemanager)<br>
<b>MiniYARNCluster with multiple nodemanagers, all nodes have same key for allocations</b><br>
<blockquote>While the NMs are keyed using the NodeId, the allocation is done based on the hostname.
This makes the different nodes indistinguishable to the scheduler.
There should be an option to key allocations by host:port instead of just the host. The nodes reported to the AM should report the 'key' (host or host:port).</blockquote></li>
Major bug reported by Jian He and fixed by Xuan Gong <br>
<b>Nodes list web page on the RM web UI is broken</b><br>
<blockquote>The nodes web page, which lists all the connected nodes of the cluster, is broken:
1. The page is not shown in the correct format/style.
2. If we restart the NM, the node list is not refreshed; the newly started NM is just added to the list and the old NMs' information remains.</blockquote></li>
Blocker task reported by Srimanth Gunturi and fixed by Zhijie Shen (api)<br>
<b>YARN should provide per application-type and state statistics</b><br>
<blockquote>In Ambari we plan to show for MR2 the number of applications finished, running, waiting, etc. It would be efficient if YARN could provide per application-type and state aggregated counts.</blockquote></li>
<blockquote>ContainerImpl.getLocalizedResources() is called in ContainerLaunch.call(), which is scheduled on a separate thread. If the container is not at LOCALIZED (e.g. it is at KILLING, see YARN-906), an AssertionError will be thrown and fails the thread without notifying the NM. Therefore, the container cannot receive more events, which are supposed to be sent from ContainerLaunch.call(), and move towards completion.</blockquote></li>
Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Capacity Scheduler tries to reserve the memory more than what node manager reports.</b><br>
<blockquote>I have 2 node managers:
* one with 1024 MB of memory (nm1)
* a second with 2048 MB of memory (nm2)
I am submitting a simple map reduce application with 1 mapper and 1 reducer of 1024MB each. The steps to reproduce this are:
* Stop nm2, the one with 2048MB of memory. (This is done to make sure that this node's heartbeat doesn't reach the RM first.)
* Now submit the application. As soon as the RM receives the first node's (nm1) heartbeat, it tries to reserve memory for the AM container (2048MB). However, nm1 has only 1024MB of memory.
* Now start nm2 with its 2048 MB of memory.
It hangs forever. This exposes two potential issues:
* The scheduler should not try to reserve memory on a node manager that is never going to be able to provide the requested memory; i.e. the current max capability of nm1 is 1024MB but 2048MB is reserved on it. Yet it still does that.
* Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB of available memory. In this case, if the original request was made without any locality, the scheduler should unreserve the memory on nm1 and allocate the requested 2048MB container on nm2.</blockquote></li>
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>RM should validate the release container list before actually releasing them</b><br>
<blockquote>At present we are blindly passing the allocate request containing containers to be released to the scheduler. This may result in one application releasing another application's container.
{code}
@Override
@Lock(Lock.NoLock.class)
public Allocation allocate(ApplicationAttemptId applicationAttemptId,
{code}</blockquote></li>
Major improvement reported by Sandy Ryza and fixed by Alejandro Abdelnur (nodemanager)<br>
<b>Allow auxiliary services to listen for container starts and completions</b><br>
<blockquote>Making container start and completion events available to auxiliary services would allow them to be resource-aware. An auxiliary service would be able to notify a co-located service that opportunistically uses free capacity of allocation changes.</blockquote></li>
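A hedged sketch of what such a resource-aware auxiliary service could look like, assuming per-container hooks on AuxiliaryService (the hook and context names follow the proposal; the required application-level callbacks are stubbed out):
{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.yarn.server.api.*;

public class ResourceAwareAuxService extends AuxiliaryService {
  private long allocatedMb = 0;

  public ResourceAwareAuxService() { super("resource_aware"); }

  @Override
  public void initializeContainer(ContainerInitializationContext ctx) {
    allocatedMb += ctx.getResource().getMemory();   // container started
  }

  @Override
  public void stopContainer(ContainerTerminationContext ctx) {
    allocatedMb -= ctx.getResource().getMemory();   // container completed
    // Here the service could notify a co-located system that capacity freed up.
  }

  @Override public void initializeApplication(ApplicationInitializationContext c) {}
  @Override public void stopApplication(ApplicationTerminationContext c) {}
  @Override public ByteBuffer getMetaData() { return ByteBuffer.allocate(0); }
}
{code}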
Major bug reported by Abhishek Kapoor and fixed by Omkar Vinit Joshi (applications/distributed-shell)<br>
<b>DistributedShell throwing errors in logs after successful completion</b><br>
<blockquote>I have tried running DistributedShell and also used its ApplicationMaster for my test.
The application runs successfully, though it logs some errors that would be useful to fix.
Below are the log snippets from the NodeManager and the ApplicationMaster.
Log Snippet for NodeManager
=============================
{noformat}
2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1
2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586
2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570
2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of <memory:10240, vCores:8>
2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_000001 (auth:SIMPLE)
2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_000001 by user sunny
2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001
2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING
2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_000001 to application application_1373184544832_0001
2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from INITING to RUNNING
2013-07-07 13:39:35,528 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from NEW to LOCALIZING
2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from INIT to DOWNLOADING
2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1373184544832_0001_01_000001
2013-07-07 13:39:35,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_000001.tokens. Credentials list:
2013-07-07 13:39:35,694 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user sunny
2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_000001.tokens to /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000001.tokens
2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001 = file:/home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001
2013-07-07 13:39:36,136 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:36,406 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from DOWNLOADING to LOCALIZED
2013-07-07 13:39:36,409 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from LOCALIZING to LOCALIZED
2013-07-07 13:39:36,524 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from LOCALIZED to RUNNING
2013-07-07 13:39:36,692 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000001/default_container_executor.sh]
2013-07-07 13:39:37,144 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:38,147 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:39,151 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:39,209 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1373184544832_0001_01_000001
2013-07-07 13:39:39,259 WARN org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Unexpected: procfs stat file is not in the expected format for process with pid 11552
2013-07-07 13:39:39,264 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 29524 for container-id container_1373184544832_0001_01_000001: 79.9 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used
2013-07-07 13:39:39,645 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_000001 (auth:SIMPLE)
2013-07-07 13:39:39,651 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_000002 by user sunny
2013-07-07 13:39:39,651 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_000002 to application application_1373184544832_0001
2013-07-07 13:39:39,652 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from NEW to LOCALIZED
2013-07-07 13:39:39,660 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000002
2013-07-07 13:39:39,728 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from LOCALIZED to RUNNING
2013-07-07 13:39:39,873 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, -c, /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_000002/default_container_executor.sh]
2013-07-07 13:39:39,898 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1373184544832_0001_01_000002 succeeded
2013-07-07 13:39:39,899 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from RUNNING to EXITED_WITH_SUCCESS
2013-07-07 13:39:39,900 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1373184544832_0001_01_000002
2013-07-07 13:39:39,943 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000002 transitioned from EXITED_WITH_SUCCESS to DONE
2013-07-07 13:39:39,944 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1373184544832_0001_01_000002 from application application_1373184544832_0001
2013-07-07 13:39:40,155 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:40,157 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 2, }, state: C_COMPLETE, diagnostics: "", exit_status: 0,
2013-07-07 13:39:40,158 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container container_1373184544832_0001_01_000002
2013-07-07 13:39:40,683 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000002
2013-07-07 13:39:40,686 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:appattempt_1373184544832_0001_000001 (auth:TOKEN) cause:org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager
2013-07-07 13:39:40,687 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9993, call org.apache.hadoop.yarn.api.ContainerManagementProtocolPB.stopContainer from 127.0.0.1:51085: error: org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager
org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000002 is not handled by this NodeManager
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeGetAndStopContainerRequest(ContainerManagerImpl.java:614)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.stopContainer(ContainerManagerImpl.java:538)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.stopContainer(ContainerManagementProtocolPBServiceImpl.java:88)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:85)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1033)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1868)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1864)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1862)
2013-07-07 13:39:41,162 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_RUNNING, diagnostics: "", exit_status: -1000,
2013-07-07 13:39:41,691 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1373184544832_0001_01_000001 succeeded
2013-07-07 13:39:41,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from RUNNING to EXITED_WITH_SUCCESS
2013-07-07 13:39:41,692 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1373184544832_0001_01_000001
2013-07-07 13:39:41,714 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_000001 transitioned from EXITED_WITH_SUCCESS to DONE
2013-07-07 13:39:41,714 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1373184544832_0001_01_000001 from application application_1373184544832_0001
2013-07-07 13:39:42,166 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 1, cluster_timestamp: 1373184544832, }, attemptId: 1, }, id: 1, }, state: C_COMPLETE, diagnostics: "", exit_status: 0,
2013-07-07 13:39:42,166 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed container container_1373184544832_0001_01_000001
2013-07-07 13:39:42,191 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_000001 (auth:SIMPLE)
2013-07-07 13:39:42,195 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1373184544832_0001_01_000001
2013-07-07 13:39:42,196 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:appattempt_1373184544832_0001_000001 (auth:TOKEN) cause:org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager
2013-07-07 13:39:42,196 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9993, call org.apache.hadoop.yarn.api.ContainerManagementProtocolPB.stopContainer from 127.0.0.1:51086: error: org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager
org.apache.hadoop.yarn.exceptions.YarnException: Container container_1373184544832_0001_01_000001 is not handled by this NodeManager
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeGetAndStopContainerRequest(ContainerManagerImpl.java:614)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.stopContainer(ContainerManagerImpl.java:538)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.stopContainer(ContainerManagementProtocolPBServiceImpl.java:88)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:85)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1033)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1868)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1864)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1862)
2013-07-07 13:39:42,264 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1373184544832_0001_01_000002
2013-07-07 13:39:42,265 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1373184544832_0001_01_000002
2013-07-07 13:39:42,265 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1373184544832_0001_01_000001
2013-07-07 13:39:43,173 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2013-07-07 13:39:43,174 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1373184544832_0001
2013-07-07 13:39:43,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2013-07-07 13:39:43,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1373184544832_0001, with delay of 10800 seconds
{noformat}
Log Snippet for Application Manager
==================================
{noformat}
13/07/07 13:39:36 INFO client.SimpleApplicationMaster: Initializing ApplicationMaster
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Application master for app, appId=1, clustertimestamp=1373184544832, attemptId=1
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Starting ApplicationMaster
13/07/07 13:39:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/07 13:39:37 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
13/07/07 13:39:37 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Max mem capabililty of resources in this cluster 8192
13/07/07 13:39:37 INFO client.SimpleApplicationMaster: Requested container ask: Capability[<memory:100, vCores:0>]Priority[0]ContainerCount[1]
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Got response from RM for container ask, allocatedCnt=1
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Launching shell command on a new container., containerId=container_1373184544832_0001_01_000002, containerNode=sunny-Inspiron:9993, containerNodeURI=sunny-Inspiron:8042, containerResourceMemory1024
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Setting up container launch container for containerid=container_1373184544832_0001_01_000002
13/07/07 13:39:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1373184544832_0001_01_000002
13/07/07 13:39:39 INFO impl.ContainerManagementProtocolProxy: Opening proxy : sunny-Inspiron:9993
13/07/07 13:39:39 INFO client.SimpleApplicationMaster: Succeeded to start Container container_1373184544832_0001_01_000002
13/07/07 13:39:39 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1373184544832_0001_01_000002
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Got response from RM for container ask, completedCnt=1
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Got container status for containerID=container_1373184544832_0001_01_000002, state=COMPLETE, exitStatus=0, diagnostics=
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Container completed successfully., containerId=container_1373184544832_0001_01_000002
13/07/07 13:39:40 INFO client.SimpleApplicationMaster: Application completed. Stopping running containers</blockquote></li>
Major improvement reported by Trevor Lorimer and fixed by Trevor Lorimer (resourcemanager)<br>
<b>Enable multiple states to be specified in Resource Manager apps REST call</b><br>
<blockquote>Within the YARN Resource Manager REST API the GET call which returns all Applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps).
There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, multiple REST calls must be made (up to 7).
The proposal is to be able to specify multiple states in a single REST call.</blockquote></li>
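As a rough illustration only (the comma-separated "states" parameter name and the RM address below are assumptions, not taken from the patch), the proposed call could be exercised from Java like this:
{code}
// Hypothetical sketch: filtering the RM apps REST endpoint by several
// states at once via a comma-separated "states" query parameter.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MultiStateAppsQuery {
  public static void main(String[] args) throws Exception {
    URL url = new URL(
        "http://rmhost:8088/ws/v1/cluster/apps?states=FAILED,KILLED");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON list of apps in either state
      }
    }
  }
}
{code}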
Major bug reported by Jian He and fixed by Xuan Gong <br>
<b>WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition</b><br>
<blockquote>This jira is tracking why appToken and clientToAMToken are removed separately, and why they are spread across different transitions; ideally there should be a common place where these two tokens can be removed at the same time. </blockquote></li>
Major bug reported by Xuan Gong and fixed by Kenji Kikushima <br>
<b>NodeManager should mandatorily set some Environment variables into every container that it launches</b><br>
<blockquote>NodeManager should mandatorily set some Environment variables, such as Environment.user and Environment.pwd, into every container that it launches. If both the user and the NodeManager set those variables, the value set by the NM should be used. </blockquote></li>
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Expose a REST API for monitoring the fair scheduler</b><br>
<blockquote>The fair scheduler should have an HTTP interface that exposes information such as applications per queue, fair shares, demands, current allocations.</blockquote></li>
Critical sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Shared data structures in Public Localizer and Private Localizer are not Thread safe.</b><br>
<blockquote>PublicLocalizer
1) pending accessed by addResource (part of event handling) and run method (as a part of PublicLocalizer.run() ).
PrivateLocalizer
1) pending accessed by addResource (part of event handling) and findNextResource (i.remove()). The update method should also be fixed; it too shares the pending list.</blockquote></li>
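As an illustrative sketch of the locking pattern such a fix implies (the class and method names below are placeholders, not the actual YARN localizer code):
{code}
// Illustrative sketch only: guarding a pending list that is touched by both
// the event-handling thread (addResource) and the localizer's run() thread.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class PendingQueue<T> {
  private final List<T> pending = new ArrayList<T>();

  // Called from the dispatcher thread.
  synchronized void addResource(T req) {
    pending.add(req);
  }

  // Called from the localizer thread; removal happens under the same lock,
  // so the iterator is never pulled out from under a concurrent add.
  synchronized T findNextResource() {
    Iterator<T> i = pending.iterator();
    if (i.hasNext()) {
      T next = i.next();
      i.remove();
      return next;
    }
    return null;
  }
}
{code}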
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Race condition causing RM to potentially relaunch already unregistered AMs on RM restart</b><br>
<blockquote>When a job succeeds and successfully calls finishApplicationMaster, the RM shuts down and restarts; the dispatcher is stopped before it can process the REMOVE_APP event. The next time the RM comes back, it will reload the existing state files even though the job succeeded.</blockquote></li>
Major sub-task reported by Lohit Vijayarenu and fixed by Mayank Bansal <br>
<b>RM crash with NPE on NODE_REMOVED event with FairScheduler</b><br>
<blockquote>While running some tests and adding/removing nodes, we saw the RM crash with the exception below. We are testing with the fair scheduler on hadoop-2.0.3-alpha.
{noformat}
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node YYYY:55680 as it is now LOST
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: YYYY:55680 Node Transitioned from UNHEALTHY to LOST
2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
at java.lang.Thread.run(Thread.java:662)
2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@XXXX:50030
{noformat}</blockquote></li>
<blockquote>When the ResourceManager kills an application, it leaves the proxy URL redirecting to the original tracking URL for the application even though the ApplicationMaster is no longer there to service it. It should redirect it somewhere more useful, like the RM's web page for the application, where the user can find that the application was killed and links to the AM logs.
In addition, sometimes the AM during teardown from the kill can attempt to unregister and provide an updated tracking URL, but unfortunately the RM has "forgotten" the AM due to the kill and refuses to process the unregistration. Instead it logs:
{noformat}
2013-01-09 17:37:49,671 [IPC Server handler 2 on 8030] ERROR
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AppAttemptId doesnt exist in cache appattempt_1357575694478_28614_000001
{noformat}
It should go ahead and process the unregistration to update the tracking URL since the application offered it.</blockquote></li>
Major sub-task reported by Devaraj K and fixed by Zhijie Shen (resourcemanager)<br>
<b>ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt</b><br>
<blockquote>{code:xml}
2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_000001
2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525
java.lang.ArrayIndexOutOfBoundsException: 0
at java.util.Arrays$ArrayList.get(Arrays.java:3381)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
{noformat}</blockquote></li>
Major new feature reported by Jing Zhao and fixed by Jing Zhao <br>
<b>Provide testing support for DFSClient to drop RPC responses</b><br>
<blockquote>Used for testing when NameNode HA is enabled. Users can use a new configuration property "dfs.client.test.drop.namenode.response.number" to specify the number of responses that DFSClient will drop in each RPC call. This feature can help testing functionalities such as NameNode retry cache.</blockquote></li>
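A minimal sketch of how a test might set the property (only the property name comes from the description above; the scaffolding is assumed):
{code}
// Sketch: configure the client to drop the first 2 responses of each RPC
// call, forcing retries that exercise the NameNode retry cache.
import org.apache.hadoop.conf.Configuration;

public class DropResponseExample {
  public static Configuration create() {
    Configuration conf = new Configuration();
    conf.setInt("dfs.client.test.drop.namenode.response.number", 2);
    // ... create a DFSClient / FileSystem with this conf and run ops ...
    return conf;
  }
}
{code}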
<blockquote>Fix configs yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs} to have a *resourcemanager* only once, make them consistent with other such yarn configs and add entries in yarn-default.xml</blockquote></li>
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>Disable mem monitoring by default in MiniYARNCluster</b><br>
<blockquote>Have been running into this frequently in spite of MAPREDUCE-3709 on centos6 machines. However, when I try to run it independently on those machines, I have not been able to reproduce it.
{noformat}
2013-08-07 19:17:35,048 WARN [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(444)) - Container [pid=16556,containerID=container_1375928243488_0001_01_000001] is running beyond virtual memory limits. Current usage: 132.4 MB of 512 MB physical memory used; 1.2 GB of 1.0 GB virtual memory used. Killing container.
{noformat}</blockquote></li>
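A sketch of how memory monitoring can be switched off for tests, assuming the standard NM check properties; MiniYARNCluster would be expected to apply equivalent defaults after this change:
{code}
// Sketch: disable both physical- and virtual-memory enforcement so test
// containers are not killed by the ContainersMonitor.
import org.apache.hadoop.conf.Configuration;

public class NoMemCheckConf {
  public static Configuration create() {
    Configuration conf = new Configuration();
    conf.setBoolean("yarn.nodemanager.pmem-check-enabled", false);
    conf.setBoolean("yarn.nodemanager.vmem-check-enabled", false);
    return conf;
  }
}
{code}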
Major improvement reported by Siddharth Seth and fixed by Jian He <br>
<b>Improve toString implementation for PBImpls</b><br>
<blockquote>The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing "\n" and "\s" to generate a one line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}.
If we can get this into 2.1.0 - great, otherwise the next release.</blockquote></li>
Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>ContainerManagerImpl should enforce token on server. Today it is [TOKEN, SIMPLE]</b><br>
<blockquote>We should only accept SecurityAuthMethod.TOKEN for ContainerManagementProtocol. Today it also accepts SIMPLE for unsecured environment.</blockquote></li>
Blocker bug reported by Bikas Saha and fixed by Vinod Kumar Vavilapalli <br>
<b>AM register failing after AMRMToken</b><br>
<blockquote>{noformat}
2013-07-19 15:53:55,569 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54313: readAndProcess from client 127.0.0.1 threw exception [org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]]
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN]
at org.apache.hadoop.ipc.Server$Connection.initializeAuthContext(Server.java:1531)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1482)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:788)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:587)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:562)
{noformat}</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Karthik Kambatla <br>
<b>TestResourceLocalizationService.testLocalizationInit can fail on JDK7</b><br>
<blockquote>It looks like this is occurring when testLocalizationInit doesn't run first. Somehow yarn.nodemanager.log-dirs is getting set by one of the other tests (to ${yarn.log.dir}/userlogs), but yarn.log.dir isn't being set.</blockquote></li>
Major task reported by Bikas Saha and fixed by Bikas Saha <br>
<b>Change ContainerRequest to not have more than 1 container count and remove StoreContainerRequest</b><br>
<blockquote>The downside is having to use more than one container request when requesting more than one container at the same priority. For most other use cases, which have specific locations, we anyway need to make multiple container requests. This will also remove unnecessary duplication caused by StoredContainerRequest. It will make getMatchingRequest() always available and removeContainerRequest() easy to use.</blockquote></li>
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
<b>ContainerManagerProtocol APIs should take in requests for multiple containers</b><br>
<blockquote>AMs typically have to launch multiple containers on a node and the current single container APIs aren't helping. We should have all the APIs take in multiple requests and return multiple responses.
The client libraries could expose both the single and multi-container requests.</blockquote></li>
Minor bug reported by Mayank Bansal and fixed by Mayank Bansal <br>
<b>Document setting default heap sizes in yarn env</b><br>
<blockquote>Right now there are no defaults in the yarn env scripts for the resource manager and node manager, so if a user wants to override them, the user has to go to the documentation, find the variables, and change the script.
There is no straightforward way to change them in the script. This just updates the variables with defaults.</blockquote></li>
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701</b><br>
<blockquote>Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need ApplicationAttemptId in the RPC pay load. This is an API change, so doing it as a blocker for 2.1.0-beta.</blockquote></li>
Major bug reported by Bikas Saha and fixed by Mayank Bansal <br>
<b>Create exceptions package in common/api for yarn and move client facing exceptions to them</b><br>
<blockquote>Exceptions like InvalidResourceBlacklistRequestException, InvalidResourceRequestException, InvalidApplicationMasterRequestException etc are currently inside ResourceManager and not visible to clients.</blockquote></li>
Minor bug reported by Chuan Liu and fixed by Chuan Liu (nodemanager)<br>
<b>Disable TestLinuxContainerExecutorWithMocks on Windows</b><br>
<blockquote>This unit test tests a Linux specific feature. We should skip this unit test on Windows. A similar unit test 'TestLinuxContainerExecutor' was already skipped when running on Windows.</blockquote></li>
<blockquote>The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources.</blockquote></li>
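An illustrative sketch of the underlying TreeSet pitfall (plain Java, not the scheduler code): if an element's sort key changes while it sits in the set, ordering and lookups break, so the safe pattern is remove / mutate / re-add:
{code}
import java.util.Comparator;
import java.util.TreeSet;

class Queue {
  final String name;
  float usedCapacity;
  Queue(String name, float used) { this.name = name; this.usedCapacity = used; }
}

public class ReorderOnCompletion {
  public static void main(String[] args) {
    TreeSet<Queue> childQueues = new TreeSet<Queue>(new Comparator<Queue>() {
      public int compare(Queue a, Queue b) {
        int c = Float.compare(a.usedCapacity, b.usedCapacity);
        return c != 0 ? c : a.name.compareTo(b.name);
      }
    });
    Queue q = new Queue("a", 0.5f);
    childQueues.add(q);

    // WRONG: mutating the key in place corrupts the set's ordering invariant.
    // q.usedCapacity = 0.1f;

    // RIGHT: remove before the key changes, then re-insert.
    childQueues.remove(q);
    q.usedCapacity = 0.1f;   // container completed, capacity released
    childQueues.add(q);
  }
}
{code}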
Minor bug reported by Chuan Liu and fixed by Chuan Liu (nodemanager)<br>
<b>NodeHealthScriptRunner timeout checking is inaccurate on Windows</b><br>
<blockquote>In the {{NodeHealthScriptRunner}} method, we set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution.
Currently, we catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we also set the HealthChecker status to timeout.
We have following execution sequence in Shell:
1) In main thread, schedule a delayed timer task that will kill the original process upon timeout.
2) In main thread, open a buffered reader and feed in the process's standard input stream.
3) When timeout happens, the timer task will call {{Process#destroy()}}
to kill the main process.
On Linux, when the timeout happens and the process is killed, the buffered reader throws an IOException with the message "Stream closed" in the main thread.
On Windows, we don't get the IOException; only -1 is returned from the reader, indicating the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this.</blockquote></li>
Major sub-task reported by Junping Du and fixed by Junping Du (scheduler)<br>
<b>Allow for black-listing resources in FifoScheduler</b><br>
<blockquote>YARN-750 already addressed blacklist support in the YARN API and the CS scheduler; this jira adds the implementation for the FifoScheduler.</blockquote></li>
Major bug reported by Bikas Saha and fixed by Xuan Gong <br>
<b>Application can hang if AMRMClientAsync callback thread has exception</b><br>
<blockquote>Currently that thread will die and never call back, so the app can hang. A possible solution is to catch Throwable in the callback thread and then call client.onError().</blockquote></li>
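A minimal sketch of the suggested fix, not the actual patch: the callback loop catches Throwable so a failure in user callback code is reported via onError() instead of silently killing the thread:
{code}
public class CallbackThreadSketch {
  interface CallbackHandler {
    void onProgress();          // stand-in for the real AMRMClientAsync callbacks
    void onError(Throwable t);
  }

  static Thread newCallbackThread(final CallbackHandler handler) {
    return new Thread(new Runnable() {
      public void run() {
        while (!Thread.currentThread().isInterrupted()) {
          try {
            handler.onProgress();
          } catch (Throwable t) {
            // Without this catch the thread dies and the AM hangs waiting
            // for callbacks that will never arrive.
            handler.onError(t);
            return;
          }
        }
      }
    });
  }
}
{code}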
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>ResourceManagerAdministrationProtocol should neither be public(yet) nor in yarn.api</b><br>
<blockquote>This is an admin-only api that we don't yet know whether people can or should write new tools against. I am going to move it to yarn.server.api and make it @Private.</blockquote></li>
Blocker bug reported by Ramya Sunil and fixed by Omkar Vinit Joshi <br>
<b>App submission fails on secure deploy</b><br>
<blockquote>App submission on secure cluster fails with the following exception:
{noformat}
INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application applicationID failed 2 times due to AM Container for appattemptID exited with exitCode: -1000 due to: App initialization failed (255) with output: main : command provided 0
main : user is qa_user
javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. [Caused by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348)
Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.
at org.apache.hadoop.ipc.Client.call(Client.java:1298)
at org.apache.hadoop.ipc.Client.call(Client.java:1250)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)
at $Proxy7.heartbeat(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
{noformat}</blockquote></li>
Major bug reported by Devaraj K and fixed by Devaraj K (capacityscheduler)<br>
<b>maximum-am-resource-percent doesn't work after refreshQueues command</b><br>
<blockquote>If we update the yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent configuration and then do the refreshQueues, it uses the new config value to calculate Max Active Applications and Max Active Applications Per User. However, if we add a new node after issuing the 'rmadmin -refreshQueues' command, it uses the old maximum-am-resource-percent config value to calculate those limits. </blockquote></li>
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestAggregatedLogFormat.testContainerLogsFileAccess fails on Windows</b><br>
<blockquote>The YARN unit test case fails on Windows when comparing the expected message with the log message in the file. The expected message constructed in the test case has two problems: 1) it uses Path.separator to concatenate path strings. Path.separator is always a forward slash, which does not match the backslash used in the log message. 2) On Windows, the default file owner is the Administrators group if the file is created by an Administrators user. The test expects the owner to be the current user.</blockquote></li>
Major bug reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>Nodemanager does not register with RM using the fully qualified hostname</b><br>
<blockquote>If the hostname is misconfigured to not be fully qualified ( i.e. hostname returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering with the RM using only "foo". This can create problems if DNS cannot resolve the hostname properly.
Furthermore, HDFS uses fully qualified hostnames which can end up affecting locality matches when allocating containers based on block locations. </blockquote></li>
2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(832)) - Application appattempt_1371448527090_0844_000001 released container container_1371448527090_0844_01_000005 on node: host: hostXX:45454 #containers=4 available=2048 used=6144 with event: FINISHED
2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for application application_1371448527090_0844 on node: hostXX:45454
2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp (FiCaSchedulerApp.java:unreserve(435)) - Application application_1371448527090_0844 unreserved on node host: hostXX:45454 #containers=4 available=2048 used=6144, currently has 4 at priority 20; currentReservation <memory:6144, vCores:4>
2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for deactivate...
2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
at java.lang.Thread.run(Thread.java:662)
2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager (ResourceManager.java:run(426)) - Exiting, bbye..
2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@hostXX:8088
Major sub-task reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli <br>
<b>Annotate and document AuxService APIs</b><br>
<blockquote>For users writing their own AuxServices, these APIs should be annotated and need better documentation. Also, the classes may need to move out of the NodeManager.</blockquote></li>
Minor bug reported by Chuan Liu and fixed by Chuan Liu <br>
<b>TestContainerLaunch.testContainerEnvVariables fails on Windows</b><br>
<blockquote>The unit test case fails on Windows because the job id or container id is not printed out as part of the container script. Later, the test tries to read the pid from the output of the file, and fails.</blockquote></li>
Critical sub-task reported by Bikas Saha and fixed by Jian He <br>
<b>Need to make Resource arithmetic methods accessible</b><br>
<blockquote>org.apache.hadoop.yarn.server.resourcemanager.resource has classes like Resources and Calculators that help compare/add resources etc. Without these, users will be forced to replicate the logic, potentially incorrectly.</blockquote></li>
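A sketch of the kind of arithmetic users end up re-implementing when these helpers are not accessible (the Resource accessors below follow the public record in this release line; treat the helper itself as illustrative):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

public class ResourceMath {
  static Resource add(Resource a, Resource b) {
    Resource out = Records.newRecord(Resource.class);
    out.setMemory(a.getMemory() + b.getMemory());
    out.setVirtualCores(a.getVirtualCores() + b.getVirtualCores());
    return out;
  }

  // Easy to get wrong: "fits" must hold on every dimension, not just memory.
  static boolean fitsIn(Resource smaller, Resource bigger) {
    return smaller.getMemory() <= bigger.getMemory()
        && smaller.getVirtualCores() <= bigger.getVirtualCores();
  }
}
{code}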
Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Rename ApplicationToken to AMRMToken</b><br>
<blockquote>API change. At present this token is used on the scheduler api, AMRMProtocol. The current name is a little confusing, as it suggests the token is useful for the application to talk to the complete yarn system (RM/NM), but that is not the case after YARN-694. The NM will have a specific NMToken, so it is better to name this one AMRMToken.</blockquote></li>
Major sub-task reported by Hitesh Shah and fixed by Jian He <br>
<b>Difficult to diagnose a failed container launch when error due to invalid environment variable</b><br>
<blockquote>The container's launch script sets up environment variables, symlinks etc.
If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure.
To reproduce, set an env var where the value contains characters that throw syntax errors in bash. </blockquote></li>
Major bug reported by Wei Yan and fixed by Wei Yan (scheduler)<br>
<b>Fair scheduler queue metrics should subtract allocated vCores from available vCores</b><br>
<blockquote>The queue metrics of the fair scheduler don't subtract allocated vCores from available vCores, so the available vCores returned are incorrect.
This is happening because {code}QueueMetrics.getAllocateResources(){code} doesn't return the allocated vCores.</blockquote></li>
Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (scheduler)<br>
<b>Enable zero capabilities resource requests in fair scheduler</b><br>
<blockquote>Per discussion in YARN-689, reposting updated use case:
1. I have a set of services co-existing with a Yarn cluster.
2. These services run out of band from Yarn. They are not started as yarn containers and they don't use Yarn containers for processing.
3. These services use, dynamically, different amounts of CPU and memory based on their load. They manage their CPU and memory requirements independently. In other words, depending on their load, they may require more CPU but not memory or vice-versa.
By using YARN as the RM for these services I'm able to share and utilize the resources of the cluster appropriately and in a dynamic way. YARN keeps tabs on all the resources.
These services run an AM that reserves resources on their behalf. When this AM gets the requested resources, the services bump up their CPU/memory utilization out of band from YARN. If the YARN allocations are released/preempted, the services back off on their resource utilization. By doing this, YARN and these services correctly share the cluster resources, with the YARN RM being the only one that does the overall resource bookkeeping.
The services' AMs, so as not to break the container lifecycle, start containers in the corresponding NMs. These container processes basically sleep forever (i.e. sleep 10000d). They use almost no CPU or memory (less than 1MB). Thus it is reasonable to assume their required CPU and memory utilization is NIL (more on hard enforcement later). Because of this almost-NIL utilization of CPU and memory, it is possible to specify, when doing a request, zero as one of the dimensions (CPU or memory).
The current limitation is that the increment is also the minimum.
If we set the memory increment to 1MB, then when doing a pure-CPU request we would have to specify 1MB of memory. That would work, but it would allow discretionary memory requests without a desired normalization (increments of 256, 512, etc).
If we set the CPU increment to 1 CPU, then when doing a pure-memory request we would have to specify 1 CPU. CPU amounts are much smaller than memory amounts, and because we don't have fractional CPUs, all my pure-memory requests would waste 1 CPU, thus reducing the overall utilization of the cluster.
Finally, on hard enforcement.
* For CPU, hard enforcement can be done via a cgroups cpu controller. Using an absolute minimum of a few CPU shares (i.e. 10) in the LinuxContainerExecutor, we ensure there are enough CPU cycles to run the sleep process. This absolute minimum only kicks in if zero is allowed; otherwise it never kicks in, as the shares for 1 CPU are 1024.
* For memory, hard enforcement is currently done by ProcfsBasedProcessTree.java; using an absolute minimum of 1 or 2 MBs would take care of zero memory resources. Again, this absolute minimum only kicks in if zero is allowed; otherwise it never kicks in, as the memory increment is several MBs if not 1GB.</blockquote></li>
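For illustration, a pure-CPU ask with zero memory, the kind of request this change would let the fair scheduler accept and normalize (names follow the public Resource record; the example is not from the patch):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

public class ZeroCapabilityAsk {
  public static Resource pureCpuAsk() {
    Resource r = Records.newRecord(Resource.class);
    r.setMemory(0);        // zero memory: the service manages memory out of band
    r.setVirtualCores(1);  // only CPU is actually requested
    return r;
  }
}
{code}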
Critical improvement reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
<b>vcores-pcores ratio functions differently from vmem-pmem ratio in misleading way </b><br>
<blockquote>The vcores-pcores ratio functions differently from the vmem-pmem ratio in the sense that the vcores-pcores ratio has an impact on allocations and the vmem-pmem ratio does not.
If I double my vmem-pmem ratio, the only change that occurs is that my containers, after being scheduled, are less likely to be killed for using too much virtual memory. But if I double my vcore-pcore ratio, my nodes will appear to the ResourceManager to contain double the amount of CPU space, which will affect scheduling decisions.
The lack of consistency will exacerbate the already difficult problem of resource configuration.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Niranjan Singh (nodemanager)<br>
<b>NodeManager throws AvroRuntimeException on failed start</b><br>
<blockquote>NodeManager wraps exceptions that occur in its start method in AvroRuntimeExceptions, even though it doesn't use Avro anywhere else.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Bikas Saha <br>
<b>Rename AllocateResponse.reboot to AllocateResponse.resync</b><br>
<blockquote>For work-preserving RM restart, the AMs will be resyncing instead of rebooting. Rebooting is an action that currently satisfies the resync requirement; changing the name now means it will continue to make sense in the real resync case. </blockquote></li>
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (api , applications)<br>
<b>In AMRMClient, automatically add corresponding rack requests for requested nodes</b><br>
<blockquote>A ContainerRequest that includes node-level requests must also include matching rack-level requests for the racks that those nodes are on. When a node is present without its rack, it makes sense for the client to automatically add the node's rack.</blockquote></li>
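An illustrative use of AMRMClient's ContainerRequest with an explicit node (the node name is an assumption): after this change the matching rack-level ask can be derived by the client instead of being passed by the caller:
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

public class RackRequestExample {
  public static ContainerRequest newRequest() {
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(1024);
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(0);
    // Node-level ask; the corresponding rack request can now be filled in
    // by the client rather than supplied explicitly.
    return new ContainerRequest(capability,
        new String[] {"node1.example.com"},
        null /* racks resolved automatically */,
        priority);
  }
}
{code}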
Major sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
<b>rename Service.register() and Service.unregister() to registerServiceListener() & unregisterServiceListener() respectively</b><br>
<blockquote>Make it clear what you are registering on a {{Service}} by naming the methods {{registerServiceListener()}} & {{unregisterServiceListener()}} respectively.
This only affects a couple of production classes; {{Service.register()}} is used in some of the lifecycle tests of YARN-530. There are no tests of {{Service.unregister()}}, which is something that could be corrected.</blockquote></li>
Major bug reported by Kihwal Lee and fixed by Jason Lowe (nodemanager)<br>
<b>Log aggregation causes a lot of redundant setPermission calls</b><br>
<blockquote>In one of our clusters, namenode RPC is spending 45% of its time on serving setPermission calls. Further investigation has revealed that most calls are redundantly made on /mapred/logs/<user>/logs. Also mkdirs calls are made before this.</blockquote></li>
Major sub-task reported by Siddharth Seth and fixed by Omkar Vinit Joshi <br>
<b>NM startContainer should validate the NodeId</b><br>
<blockquote>The NM validates certain fields from the ContainerToken on a startContainer call. It should also validate the NodeId (which needs to be added to the ContainerToken).</blockquote></li>
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Add a multi-resource fair sharing metric</b><br>
<blockquote>Currently, at a regular interval, the fair scheduler computes a fair memory share for each queue and application inside it. This fair share is not used for scheduling decisions, but is displayed in the web UI, exposed as a metric, and used for preemption decisions.
With DRF and multi-resource scheduling, assigning a memory share as the fair share metric to every queue no longer makes sense. It's not obvious what the replacement should be, but probably something like fractional fairness within a queue, or distance from an ideal cluster state.</blockquote></li>
assertTrue("" + i, status.getDiagnostics().contains(
"Container killed by the ApplicationMaster."));
assertEquals(-1000, status.getExitStatus());
} catch (YarnRemoteException e) {
fail("Exception is not expected");
}
{code}
NMClientImpl#stopContainer returns, but the container hasn't been stopped immediately. ContainerManagerImpl implements stopContainer in an async style, so the container's status is in transition. NMClientImpl#getContainerStatus immediately after stopContainer will get either the RUNNING status or the COMPLETE one.
There is a similar problem w.r.t. NMClientImpl#startContainer.</blockquote></li>
Blocker sub-task reported by Siddharth Seth and fixed by Xuan Gong <br>
<b>ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter</b><br>
<blockquote>Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type.</blockquote></li>
Major sub-task reported by Siddharth Seth and fixed by Zhijie Shen <br>
<b>container-log4j.properties should not refer to mapreduce properties</b><br>
<blockquote>This refers to yarn.app.mapreduce.container.log.dir and yarn.app.mapreduce.container.log.filesize. These should either be moved into the MR codebase, or the parameters should be renamed.</blockquote></li>
Major sub-task reported by Jian He and fixed by Jian He <br>
<b>Copy BuilderUtil methods into token-related records</b><br>
<blockquote>This is separated from YARN-711, as after changing yarn.api.token from an interface to an abstract class, e.g. ClientTokenPBImpl would have to extend two classes, both TokenPBImpl and the ClientToken abstract class, which is not allowed in Java.
We may remove the ClientToken/ContainerToken/DelegationToken interfaces and just use the common Token interface. </blockquote></li>
Major bug reported by Siddharth Seth and fixed by Vinod Kumar Vavilapalli <br>
<b>TestDistributedShell and TestUnmanagedAMLauncher are failing</b><br>
<blockquote>Tests are timing out. Looks like this is related to YARN-617.
{code}
2013-05-21 17:40:23,693 ERROR [IPC Server handler 0 on 54024] containermanager.ContainerManagerImpl (ContainerManagerImpl.java:authorizeRequest(412)) - Unauthorized request to start container.
Expected containerId: user Found: container_1369183214008_0001_01_000001
2013-05-21 17:40:23,694 ERROR [IPC Server handler 0 on 54024] security.UserGroupInformation (UserGroupInformation.java:doAs(1492)) - PriviledgedActionException as:user (auth:SIMPLE) cause:org.apache.hado
Expected containerId: user Found: container_1369183214008_0001_01_000001
2013-05-21 17:40:23,695 INFO [IPC Server handler 0 on 54024] ipc.Server (Server.java:run(1864)) - IPC Server handler 0 on 54024, call org.apache.hadoop.yarn.api.ContainerManagerPB.startContainer from 10.
Expected containerId: user Found: container_1369183214008_0001_01_000001
org.apache.hadoop.yarn.exceptions.YarnRemoteException: Unauthorized request to start container.
Expected containerId: user Found: container_1369183214008_0001_01_000001
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:43)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeRequest(ContainerManagerImpl.java:413)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainer(ContainerManagerImpl.java:440)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagerPBServiceImpl.startContainer(ContainerManagerPBServiceImpl.java:72)
at org.apache.hadoop.yarn.proto.ContainerManager$ContainerManagerService$2.callBlockingMethod(ContainerManager.java:83)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
{code}</blockquote></li>
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Jian He <br>
<b>Copy BuilderUtil methods into individual records</b><br>
<blockquote>BuilderUtils is one giant utils class which has all the factory methods needed for creating records. It is painful for users to figure out how to create records. We are better off having the factories in each record, that way users can easily create records.
As a first step, we should just copy all the factory methods into individual classes, deprecate BuilderUtils and then slowly move all code off BuilderUtils.</blockquote></li>
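A before/after sketch of the pattern being introduced; the record-level newInstance factories shown here are the intended style, though exact signatures in this release line may differ:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class RecordFactoryExample {
  public static void main(String[] args) {
    // Old style: hunt through BuilderUtils for the right helper, e.g.
    //   ApplicationId appId = BuilderUtils.newApplicationId(ts, 1);
    // New style: the factory lives on the record itself.
    ApplicationId appId = ApplicationId.newInstance(1373184544832L, 1);
    ApplicationAttemptId attemptId =
        ApplicationAttemptId.newInstance(appId, 1);
    System.out.println(attemptId);
  }
}
{code}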
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Start using NMTokens to authenticate all communication with NM</b><br>
<blockquote>AM uses the NMToken to authenticate all the AM-NM communication.
The NM will validate the NMToken in the following manner:
* If NMToken is using current or previous master key then the NMToken is valid. In this case it will update its cache with this key corresponding to appId.
* If NMToken is using the master key which is present in NM's cache corresponding to AM's appId then it will be validated based on this.
* If NMToken is invalid then NM will reject AM calls.
Modifications for ContainerToken:
* At present, RPC validates AM-NM communication based on the ContainerToken. It will be replaced with the NMToken. From now on the AM will use one NMToken per NM (replacing the earlier behavior of one ContainerToken per container per NM).
* startContainer in a secured environment currently uses the ContainerToken from the UGI (YARN-617); however, after this change it will use the one from the payload (Container).
* ContainerToken will exist and it will only be used to validate the AM's container start request.</blockquote></li>
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Sending NMToken to AM on allocate call</b><br>
<blockquote>This is part of YARN-613.
As per the updated design, the AM will receive a per-NM NMToken in the following scenarios:
* The AM receives its first container on the underlying NM.
* The AM receives a container on the underlying NM after either the NM or the RM rebooted.
** After an RM reboot, as the RM doesn't remember (persist) the information about keys issued per AM per NM, it will reissue tokens when the AM gets a new container on the underlying NM. However, on the NM side, the NM will still retain the older token until it receives a new one, to support long running jobs (in a work-preserving environment).
** After an NM reboot, the RM will delete the token information corresponding to that NM for all AMs.
* The AM receives a container on the underlying NM after the NMToken master key is rolled over on the RM side.
In all cases, if the AM receives a new NMToken it is supposed to store it for future NM communication until it receives a newer one.
AMRMClient should expose these NMTokens to the client. </blockquote></li>
Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Creating NMToken master key on RM and sharing it with NM as a part of RM-NM heartbeat.</b><br>
<blockquote>This is related to YARN-613. Here we will be implementing NMToken generation on the RM side and sharing it with the NM during the RM-NM heartbeat. As a part of this JIRA, the master key will only be made available to the NM, but there will be no validation done until AM-NM communication is fixed.</blockquote></li>
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
<b>RM exits on token cancel/renew problems</b><br>
<blockquote>The DelegationTokenRenewer thread is critical to the RM. When a non-IOException occurs, the thread calls System.exit to prevent the RM from running w/o the thread. It should be exiting only on non-RuntimeExceptions.
The problem is especially bad in 23 because the yarn protobuf layer converts IOExceptions into UndeclaredThrowableExceptions (RuntimeException) which causes the renewer to abort the process. An UnknownHostException takes down the RM...</blockquote></li>
Major bug reported by Jian He and fixed by Jian He <br>
<b>Containers not cleaned up when NM received SHUTDOWN event from NodeStatusUpdater</b><br>
<blockquote>Currently, both the SHUTDOWN event from the NodeStatusUpdater and the CleanupContainers event happen to be on the same dispatcher thread, so the CleanupContainers event will not be processed until the SHUTDOWN event is processed; see the similar problem in YARN-495.
On normal NM shutdown, this is not a problem since normal stop happens on shutdownHook thread.</blockquote></li>
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api)<br>
<b>Flatten NodeReport</b><br>
<blockquote>The NodeReport returned by getClusterNodes or given to AMs in heartbeat responses includes both a NodeState (enum) and a NodeHealthStatus (object). As UNHEALTHY is already a NodeState, a separate NodeHealthStatus doesn't seem necessary. I propose eliminating NodeHealthStatus#getIsNodeHealthy and moving its two other methods, getHealthReport and getLastHealthReportTime, into NodeReport.</blockquote></li>
Major bug reported by Jason Lowe and fixed by Omkar Vinit Joshi (nodemanager)<br>
<b>NM fails to cleanup local directories for users</b><br>
<blockquote>YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>In scheduler UI, including reserved memory in "Memory Total" can make it exceed cluster capacity.</b><br>
<blockquote>"Memory Total" is currently a sum of availableMB, allocatedMB, and reservedMB. Including reservedMB in this sum can make the total exceed the capacity of the cluster. </blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Fair scheduler metrics should subtract allocated memory from available memory</b><br>
<blockquote>In the scheduler web UI, cluster metrics reports that the "Memory Total" goes up when an application is allocated resources.</blockquote></li>
Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
<b>Change ContainerManagerPBClientImpl and RMAdminProtocolPBClientImpl to throw IOException and YarnRemoteException</b><br>
<blockquote>YARN-632 and YARN-633 change the RMAdmin and ContainerManager apis to throw YarnRemoteException and IOException. RMAdminProtocolPBClientImpl and ContainerManagerPBClientImpl should make the same changes.</blockquote></li>
Major bug reported by Dapeng Sun and fixed by Dapeng Sun (documentation)<br>
<b>Some issues in Fair Scheduler's document</b><br>
<blockquote>Issues are found in the doc page for Fair Scheduler http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html:
1. In the section “Configuration”, it contains two properties named “yarn.scheduler.fair.minimum-allocation-mb”; the second one should be “yarn.scheduler.fair.maximum-allocation-mb”.
2. In the section “Allocation file format”, the document says “The format contains three types of elements”, but it lists four types of elements following that.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (api , resourcemanager)<br>
<b>Fix up /nodes REST API to have 1 param and be consistent with the Java API</b><br>
<blockquote>The code behind the /nodes RM REST API is unnecessarily muddled, logs the same misspelled INFO message repeatedly, and does not return unhealthy nodes, even when asked.</blockquote></li>
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>Restore RMDelegationTokens after RM Restart</b><br>
<blockquote>This was missed in YARN-581. After RM restart, RMDelegationTokens need to be added both in the DelegationTokenRenewer (addressed in YARN-581) and in the delegationTokenSecretManager.</blockquote></li>
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
<b>FS: maxAssign is not honored</b><br>
<blockquote>maxAssign limits the number of containers that can be assigned in a single heartbeat. Currently, FS doesn't keep track of the number of assigned containers to check this.</blockquote></li>
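For context, a sketch of the knob in question, assuming the fair scheduler's standard property name for maxAssign:
{code}
import org.apache.hadoop.conf.Configuration;

public class MaxAssignConf {
  public static Configuration create() {
    Configuration conf = new Configuration();
    // Assign at most 2 containers per node heartbeat once the fix makes
    // the scheduler actually track and honor this bound.
    conf.setInt("yarn.scheduler.fair.max.assign", 2);
    return conf;
  }
}
{code}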
Major sub-task reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Make YarnRemoteException not backed by PB and introduce a SerializedException</b><br>
<blockquote>LocalizationProtocol sends an exception over the wire. This currently uses YarnRemoteException. Post YARN-627, this needs to be changed and a new serialized exception is required.</blockquote></li>
Major sub-task reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Fix YarnException unwrapping</b><br>
<blockquote>Unwrapping of YarnRemoteExceptions (currently in YarnRemoteExceptionPBImpl, RPCUtil post YARN-625) is broken, and often ends up throwing UndeclaredThrowableException. This needs to be fixed.</blockquote></li>
Minor sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
<b>In unsecure mode, AM can fake resource requirements </b><br>
<blockquote>Without security, it is impossible to completely avoid AMs faking resources. We can at least make it as difficult as possible by using the same container tokens and the RM-NM shared key mechanism over the unauthenticated RM-NM channel.
At the minimum, this will avoid accidental bugs in AMs in unsecure mode.</blockquote></li>
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>ContainerLaunchContext.containerTokens should simply be called tokens</b><br>
<blockquote>ContainerToken is the name of the specific token that AMs use to launch containers on NMs, so we should rename CLC.containerTokens to be simply tokens.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Omkar Vinit Joshi <br>
<b>Create NM proxy per NM instead of per container</b><br>
<blockquote>Currently a new NM proxy has to be created per container since the secure authentication is using a containertoken from the container.</blockquote></li>
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Hook up cgroups CPU settings to the number of virtual cores allocated</b><br>
<blockquote>YARN-3 introduced CPU isolation and monitoring through cgroups. YARN-2 introduced CPU scheduling in the capacity scheduler, and YARN-326 will introduce it in the fair scheduler. The number of virtual cores allocated to a container should be used to weight the number of cgroups CPU shares given to it.</blockquote></li>
Major bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>Refactoring submitApplication in ClientRMService and RMAppManager</b><br>
<blockquote>Currently, ClientRMService#submitApplication calls RMAppManager#handle, and consequently calls RMAppManager#submitApplication directly, though the code looks like it is scheduling an APP_SUBMIT event.
In addition, the validation code before creating an RMApp instance is not well organized. Ideally, the dynamic validation, which depends on the RM's configuration, should be put in RMAppManager#submitApplication. RMAppManager#submitApplication is called by ClientRMService#submitApplication and RMAppManager#recover. Since the configuration may be changed after the RM restarts, the validation needs to be done again even in recovery mode. Therefore, resource request validation, which is based on min/max resource limits, should be moved from ClientRMService#submitApplication to RMAppManager#submitApplication. On the other hand, the static validation, which is independent of the RM's configuration, should be put in ClientRMService#submitApplication, because it only needs to be done once, during the first submission.
Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw: it is not synchronized. If two application submissions with the same application ID enter the function, and one progresses to the completion of RMApp instantiation while the other progresses to the completion of putting its RMApp instance into rmContext, the slower submission will cause an exception due to the duplicate application ID. However, the exception will cause the RMApp instance already in rmContext (belonging to the faster submission) to be rejected with the current code flow.</blockquote></li>
Major bug reported by Ivan Mitic and fixed by Ivan Mitic <br>
<b>TestFSDownload fails on Windows because of dependencies on tar/gzip/jar tools</b><br>
<blockquote>{{testDownloadArchive}}, {{testDownloadPatternJar}} and {{testDownloadArchiveZip}} fail with a similar Shell ExitCodeException:
{code}
testDownloadArchiveZip(org.apache.hadoop.yarn.util.TestFSDownload) Time elapsed: 480 sec <<< ERROR!
org.apache.hadoop.util.Shell$ExitCodeException: bash: line 0: cd: /D:/svn/t/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/TestFSDownload: No such file or directory
gzip: 1: No such file or directory
at org.apache.hadoop.util.Shell.runCommand(Shell.java:377)
at org.apache.hadoop.util.Shell.run(Shell.java:292)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:497)
at org.apache.hadoop.yarn.util.TestFSDownload.createZipFile(TestFSDownload.java:225)
at org.apache.hadoop.yarn.util.TestFSDownload.testDownloadArchiveZip(TestFSDownload.java:503)
{code}</blockquote></li>
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Refactor fair scheduler to use common Resources</b><br>
<blockquote>resourcemanager.fair and resourcemanager.resources have two copies of basically the same code for operations on Resource objects</blockquote></li>
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)<br>
<b>container launch on Windows does not correctly populate classpath with new process's environment variables and localized resources</b><br>
<blockquote>On Windows, we must bundle the classpath of a launched container in an intermediate jar with a manifest. Currently, this logic incorrectly uses the nodemanager process's environment variables for substitution. Instead, it needs to use the new environment for the launched process. Also, the bundled classpath is missing some localized resources for directories, due to a quirk in the way {{File#toURI}} decides whether or not to append a trailing '/'.</blockquote></li>
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>RM recovery related records do not belong to the API</b><br>
<blockquote>We need to move ApplicationStateData and ApplicationAttemptStateData out into the resourcemanager module. They are not part of the public API.</blockquote></li>
Major improvement reported by Vinod Kumar Vavilapalli and fixed by Mayank Bansal <br>
<b>Add an optional message to RegisterNodeManagerResponse as to why NM is being asked to resync or shutdown</b><br>
<blockquote>We should log such message in NM itself. Helps in debugging issues on NM directly instead of distributed debugging between RM and NM when such an action is received from RM.</blockquote></li>
Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Application cache files should be localized under local-dir/usercache/userid/appcache/appid/filecache</b><br>
<blockquote>Currently, application cache files are localized under local-dir/usercache/userid/appcache/appid/; however, they should be localized under the filecache subdirectory.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
<b>Restore appToken and clientToken for app attempt after RM restart</b><br>
<blockquote>These need to be saved and restored on a per app attempt basis. This is required only when work preserving restart is implemented for secure clusters. In non-preserving restart app attempts are killed and so this does not matter.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
<b>Test and verify that app delegation tokens are added to tokenRenewer after RM restart</b><br>
<blockquote>The code already saves the delegation tokens in AppSubmissionContext. Upon restart the AppSubmissionContext is used to submit the application again and so restores the delegation tokens. This jira tracks testing and verifying this functionality in a secure setup.</blockquote></li>
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Make ApplicationToken part of Container's token list to help RM-restart</b><br>
<blockquote>Container is already persisted for helping RM restart. Instead of explicitly setting ApplicationToken in AM's env, if we change it to be in Container, we can avoid env and can also help restart.</blockquote></li>
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi (nodemanager)<br>
<b>NodeManager should use SecureIOUtils for serving and aggregating logs</b><br>
<blockquote>Log servlets for serving logs and the ShuffleService for serving intermediate outputs both should use SecureIOUtils for avoiding symlink attacks.</blockquote></li>
Major sub-task reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>ApplicationReport does not provide progress value of application</b><br>
<blockquote>An application sends its progress % to the RM via AllocateRequest. This should be able to be retrieved by a client via the ApplicationReport.</blockquote></li>
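A hedged sketch of client-side retrieval once the field is exposed, assuming the YarnClient API available in this release line:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ProgressProbe {
  public static float getProgress(ApplicationId appId) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      ApplicationReport report = client.getApplicationReport(appId);
      return report.getProgress(); // 0.0f .. 1.0f as reported by the AM
    } finally {
      client.stop();
    }
  }
}
{code}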
Major bug reported by Hitesh Shah and fixed by Kenji Kikushima <br>
<b>RM should not allow registrations from NMs that do not satisfy minimum scheduler allocations</b><br>
<blockquote>If the minimum resource allocation configured for the RM scheduler is 1 GB, the RM should drop all NMs that register with a total capacity of less than 1 GB. </blockquote></li>
Major sub-task reported by Hitesh Shah and fixed by Omkar Vinit Joshi <br>
<b>User should not be part of ContainerLaunchContext</b><br>
<blockquote>Today, a user is expected to set the user name in the CLC when either submitting an application or launching a container from the AM. This does not make sense, as the user can be (and has been) identified by the RM as part of the RPC layer.
The solution would be to move the user information into either the Container object or directly into the ContainerToken, which can then be used by the NM to launch the container. This user information would be set into the container by the RM.</blockquote></li>
Preemption policies are by design pluggable, in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows:
# it gathers from the scheduler the state of the queues, in particular, their current capacity, guaranteed capacity and pending requests (*)
# if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**)
# it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round)
# it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order)
# it removes reservations from the most recently assigned app until the amount of resource to reclaim is obtained, or until no more reservations exist
# (if not enough) it issues preemptions for containers from the same applications (reverse chronological order, last assigned container first), again until the target is reached or until no containers except the AM container are left
# (if not enough) it moves on to unreserve and preempt from the next application
# containers that have been asked to preempt are tracked across executions; if a container remains among those to be preempted for more than a certain time, it is moved into the list of containers to be forcibly killed
Notes:
(*) at the moment, in order to avoid double-counting of the requests, we only look at the "ANY" part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not any.
(**) The ideal balanced state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among the queues that want more as a weighted fair share, where the weighting is based on the guaranteed capacity of each queue and the computation runs to a fixed point.
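As a rough illustration of that fixed point, here is a minimal sketch (invented names and tolerances, not the patch's code) that first satisfies guarantees and then repeatedly offers the spare capacity to still-hungry queues in proportion to their guarantees:
{code:java}
// Sketch of the ideal-balance computation: guarantees first, then spare capacity
// shared among still-hungry queues in proportion to their guaranteed capacity.
static double[] idealAssigned(double total, double[] guaranteed, double[] demand) {
  int n = guaranteed.length;
  double[] ideal = new double[n];
  double spare = total;
  for (int i = 0; i < n; i++) {
    ideal[i] = Math.min(guaranteed[i], demand[i]);   // guaranteed capacity first
    spare -= ideal[i];
  }
  while (spare > 1e-6) {                             // iterate to a fixed point
    double weightSum = 0;
    for (int i = 0; i < n; i++) {
      if (ideal[i] < demand[i]) weightSum += guaranteed[i];
    }
    if (weightSum <= 0) break;                       // no queue wants more
    double granted = 0;
    for (int i = 0; i < n; i++) {
      if (ideal[i] < demand[i]) {
        double offer = spare * guaranteed[i] / weightSum;
        double grant = Math.min(offer, demand[i] - ideal[i]);
        ideal[i] += grant;
        granted += grant;
      }
    }
    spare -= granted;
    if (granted <= 1e-9) break;                      // fixed point reached
  }
  return ideal;
}
{code}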
Tunables of the ProportionalCapacityPreemptionPolicy:
# observe-only mode (i.e., log the actions it would take, but behave as read-only)
# how frequently to run the policy
# how long to wait between preemption and kill of a container
# which fraction of the containers I would like to obtain should I preempt (has to do with the natural rate at which containers are returned)
# deadzone size, i.e., what % of over-capacity should I ignore (if we are off perfect balance by some small % we ignore it)
# overall amount of preemption we can afford for each run of the policy (in terms of total cluster capacity)
In our current experiments this set of tunables seems to be a good start to shape the preemption action properly. More sophisticated preemption policies could take into account different types of applications running, job priorities, cost of preemption, integral of capacity imbalance. This is very much a control-theory kind of problem, and some of the lessons on designing and tuning controllers are likely to apply.
Generality:
The monitor-based scheduler edit, and the preemption mechanisms we introduced here are designed to be more general than enforcing capacity/fairness, in fact, we are considering other monitors that leverage the same idea of "schedule edits" to target different global properties (e.g., allocate enough resources to guarantee deadlines for important jobs, or data-locality optimizations, IO-balancing among nodes, etc...).
Note that by default the preemption policy we describe is disabled in the patch.
Depends on YARN-45 and YARN-567, is related to YARN-568
Major improvement reported by Carlo Curino and fixed by Carlo Curino (scheduler)<br>
<b>FairScheduler: support for work-preserving preemption </b><br>
<blockquote>In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow us to run preemption checking more often, but kill less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45, is related to YARN-569.</blockquote></li>
Major sub-task reported by Carlo Curino and fixed by Carlo Curino (resourcemanager)<br>
<b>RM changes to support preemption for FairScheduler and CapacityScheduler</b><br>
<blockquote>A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler take opposite stances on how to achieve this.
The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work, but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity.
By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), and ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), together with a scheduler that can issue preemption requests (discussed in the separate JIRAs YARN-568 and YARN-569).
The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and are mostly propagation of preemption decisions through the ApplicationMastersService.</blockquote></li>
Major sub-task reported by Thomas Weise and fixed by Mayank Bansal <br>
<b>Add application type to ApplicationReport </b><br>
<blockquote>This field is needed to distinguish different types of applications (app master implementations). For example, we may run applications of type XYZ in a cluster alongside MR and would like to filter applications by type.</blockquote></li>
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>NM should reject containers allocated by previous RM</b><br>
<blockquote>It's possible that after RM shutdown, before the AM goes down, the AM still calls startContainer on the NM with containers allocated by the previous RM. When the RM comes back, the NM doesn't know whether a container launch request comes from the previous RM or the current one. We should reject containers allocated by the previous RM.</blockquote></li>
Major sub-task reported by Hitesh Shah and fixed by Xuan Gong <br>
<b>Nodemanager should set some key information into the environment of every container that it launches.</b><br>
<blockquote>Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched.
For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster.
At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy. </blockquote></li>
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (applications)<br>
<b>TestUnmanagedAMLauncher fails on Windows</b><br>
<blockquote>{{TestUnmanagedAMLauncher}} fails on Windows due to attempting to run a Unix-specific command in distributed shell and use of a Unix-specific environment variable to determine username for the {{ContainerLaunchContext}}.</blockquote></li>
A simplified way may be to have the GetNewApplicationResponse itself provide a helper method that builds a usable ApplicationSubmissionContext for us. Something like:
[The above method can also take an arg for the container launch spec, or perhaps pre-load defaults like min-resource, etc. in the returned object, aside of just associating the application ID automatically.]</blockquote></li>
Major sub-task reported by Zhijie Shen and fixed by Zhijie Shen <br>
<b>YarnClient.submitApplication should wait for application to be accepted by the RM</b><br>
<blockquote>Currently, when submitting an application, storeApplication will be called for recovery. However, it is a blocking API, and is likely to block concurrent application submissions. Therefore, it is good to make application submission asynchronous, and postpone storeApplication. YarnClient needs to change to wait for the whole operation to complete so that clients can be notified after the application is really submitted. YarnClient needs to wait for application to reach SUBMITTED state or beyond.</blockquote></li>
Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>Race condition in Public / Private Localizer may result into resource getting downloaded again</b><br>
<blockquote>Public Localizer :
At present when multiple containers try to request a localized resource
* If the resource is not present, it is first created and resource localization starts (LocalizedResource is in the DOWNLOADING state)
* Now if in this state multiple ResourceRequestEvents arrive then ResourceLocalizationEvents are sent for all of them.
Most of the time this does not result in a duplicate resource download, but there is a race condition. Inside ResourceLocalization (for public download), all requests are added to a local attempts map. When a new request comes in, it is first checked against this map before a new download starts; while a download is in progress, the request is present in the map, so a duplicate request for the same resource is rejected (the resource is already being downloaded). However, when the download completes, the request is removed from the map. If a LocalizerRequestEvent for the same resource arrives after this removal, it is no longer in the map and the resource is downloaded again.
PrivateLocalizer :
Here a different but similar race condition is present.
* Inside the findNextResource method call, each LocalizerRunner tries to grab a lock on the LocalizedResource. If the lock is not acquired, it keeps trying until the resource state changes to LOCALIZED. The lock is released by the LocalizerRunner when the download completes.
* If another ContainerLocalizer grabs the lock on a resource before the LocalizedResource state changes to LOCALIZED, the resource will be downloaded again.
In both places the root cause is the same: all threads try to acquire the lock on the resource without taking the current state of the LocalizedResource into consideration.</blockquote></li>
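A minimal sketch of the state-aware alternative (types and names invented for illustration): the download decision is made under the lock and driven by the resource state, so no thread can conclude "unlocked, therefore download" while another download is in flight or already finished:
{code:java}
// Hypothetical illustration only: make the download decision while holding the
// resource lock, based on the LocalizedResource state rather than on whether the
// lock was free, so two threads can never both start the same download.
class LocalizedResourceSketch {
  enum State { INIT, DOWNLOADING, LOCALIZED }
  private State state = State.INIT;
  private String localPath;

  synchronized String findOrStartDownload() {
    switch (state) {
      case LOCALIZED:
        return localPath;            // reuse the completed download
      case DOWNLOADING:
        return null;                 // in flight; caller waits for the event
      default:
        state = State.DOWNLOADING;   // claim atomically under the lock
        // caller starts the actual download exactly once
        return null;
    }
  }
}
{code}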
Major bug reported by Vinod Kumar Vavilapalli and fixed by Zhijie Shen <br>
<b>Change the default global AM max-attempts value to be not one</b><br>
<blockquote>Today, the global AM max-attempts is set to 1, which is a bad choice. AM max-attempts accounts for both AM-level failures and container crashes due to localization issues, lost nodes, etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retries.
I propose we change it to at least two. We could change it to 4 to match other retry configs.</blockquote></li>
Blocker bug reported by Krishna Kishore Bonagiri and fixed by Bikas Saha (resourcemanager)<br>
<b>getAllocatedContainers() is not returning all the allocated containers</b><br>
<blockquote>I am running an application that was written and working well with hadoop-2.0.0-alpha, but when I run the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse sometimes does not return all the allocated containers. For example, I request 10 containers and this method sometimes gives me only 9, and when I looked at the log of the Resource Manager, the 10th container was also allocated. It happens only sometimes, randomly, and works fine all other times. If I send one more request for the remaining container to the RM after it failed to give them the first time (and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another.
My main worry is that even though the RM's log says all 10 requested containers are allocated, the getAllocatedContainers() method returned only 9. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha.</blockquote></li>
Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
<b>LocalizedResources are leaked in memory in case resource localization fails</b><br>
<blockquote>If resource localization fails, the resource remains in memory and is
1) either cleaned up the next time cache cleanup runs and there is a space crunch (if sufficient space is available in the cache, it will remain in memory), or
2) reused if a LocalizationRequest comes again for the same resource.
I think that when resource localization fails, that event should be sent to the LocalResourceTracker, which will then remove the resource from its cache.</blockquote></li>
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>RM address DNS lookup can cause unnecessary slowness on every JHS page load </b><br>
<blockquote>When I run the job history server locally, every page load takes in the 10s of seconds. I profiled the process and discovered that all the extra time was spent inside YarnConfiguration#getRMWebAppURL, trying to resolve 0.0.0.0 to a hostname. When I changed my yarn.resourcemanager.address to localhost, the page load times decreased drastically.
There's no reason we need to perform this resolution on every page load.</blockquote></li>
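A memoization sketch along those lines (class name invented; it assumes the resolved URL can safely be computed once per process, with {{YarnConfiguration#getRMWebAppURL}} being the method named above):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Resolve the RM web-app URL once and reuse it, instead of performing a DNS
// lookup on every JHS page load.
public final class RmWebAppUrlCache {
  private static volatile String cached;

  public static String get(Configuration conf) {
    String url = cached;
    if (url == null) {
      synchronized (RmWebAppUrlCache.class) {
        if (cached == null) {
          cached = YarnConfiguration.getRMWebAppURL(conf); // the slow resolution
        }
        url = cached;
      }
    }
    return url;
  }

  private RmWebAppUrlCache() {}
}
{code}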
Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
<b>AM max attempts is not checked when RM restart and try to recover attempts</b><br>
<blockquote>Currently, AM max attempts is only checked when the current attempt fails, to decide whether to create a new attempt. If the RM restarts before the max attempt fails, it does not clean the state store; when the RM comes back, it will retry the attempt again.</blockquote></li>
Major bug reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>RMAdminProtocolPBClientImpl should implement Closeable</b><br>
<blockquote>Required for RPC.stopProxy to work. Already done in most of the other protocols. (MAPREDUCE-5117 addressing the one other protocol missing this)</blockquote></li>
<blockquote>The config yarn.scheduler.capacity.node-locality-delay doesn't change when you change the value in capacity_scheduler.xml and then run yarn rmadmin -refreshQueues.</blockquote></li>
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api)<br>
<b>Augment AM - RM client module to be able to request containers only at specific locations</b><br>
<blockquote>When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality</blockquote></li>
Major improvement reported by Dapeng Sun and fixed by Sandy Ryza (documentation)<br>
<b>Fair Scheduler's document link could be added to the hadoop 2.x main doc page</b><br>
<blockquote>Currently the doc page for Fair Scheduler looks good and it’s here, http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.
It would be better to add the document link to the YARN section of the Hadoop 2.x main doc page, so that users can easily find the doc and try the Fair Scheduler as an alternative to the Capacity Scheduler.</blockquote></li>
Blocker bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans <br>
<b>Node Manager not getting the master key</b><br>
<blockquote>On branch-2 the latest version I see the following on a secure cluster.
{noformat}
2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security enabled - updating secret keys now
2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as RM:PORT with total resource of <memory:12288, vCores:16>
2013-03-28 19:21:06,244 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started.
2013-03-28 19:21:06,245 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
2013-03-28 19:21:07,257 [Node Status Updater] ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407)
{noformat}
The NullPointerException just keeps repeating and all of the nodes end up being lost. It looks like the NM never gets the secret key when it registers.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Zhijie Shen (resourcemanager)<br>
<b>Delayed store operations should not result in RM unavailability for app submission</b><br>
<blockquote>Currently, app submission is the only store operation performed synchronously because the app must be stored before the request returns with success. This makes the RM susceptible to blocking all client threads on slow store operations, resulting in RM being perceived as unavailable by clients.</blockquote></li>
Minor bug reported by Jason Lowe and fixed by Maysam Yabandeh (nodemanager)<br>
<b>Log aggregation root directory check is more expensive than it needs to be</b><br>
<blockquote>The log aggregation root directory check first does an {{exists}} call followed by a {{getFileStatus}} call. That effectively stats the file twice. It should just use {{getFileStatus}} and catch {{FileNotFoundException}} to handle the non-existent case.
In addition we may consider caching the presence of the directory rather than checking it each time a node aggregates logs for an application.</blockquote></li>
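A sketch of the cheaper check (the helper name is invented; the FileSystem calls are standard):
{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// One getFileStatus call replaces exists() followed by getFileStatus(): the
// missing-directory case is signalled by FileNotFoundException, not a second stat.
static FileStatus statLogRoot(FileSystem fs, Path root) throws IOException {
  try {
    return fs.getFileStatus(root);
  } catch (FileNotFoundException e) {
    return null;                 // caller creates the directory when this is null
  }
}
{code}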
Minor bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Fair scheduler configs are refreshed inconsistently in reinitialize</b><br>
<blockquote>When FairScheduler#reinitialize is called, some of the scheduler-wide configs are refreshed and others aren't. They should all be refreshed.
Ones that are refreshed: userAsDefaultQueue, nodeLocalityThreshold, rackLocalityThreshold, preemptionEnabled
Ones that aren't: minimumAllocation, maximumAllocation, assignMultiple, maxAssign</blockquote></li>
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)<br>
<b>NodeManager job control logic flaws on Windows</b><br>
<blockquote>Both product and test code contain some platform-specific assumptions, such as availability of bash for executing a command in a container and signals to check existence of a process and terminate it.</blockquote></li>
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (applications/distributed-shell)<br>
<b>TestDistributedShell fails on Windows</b><br>
<blockquote>There are a few platform-specific assumption in distributed shell (both main code and test code) that prevent it from working correctly on Windows.</blockquote></li>
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (nodemanager)<br>
<b>TestDiskFailures fails on Windows due to path mishandling</b><br>
<blockquote>{{TestDiskFailures#testDirFailuresOnStartup}} fails due to insertion of an extra leading '/' on the path within {{LocalDirsHandlerService}} when running on Windows. The test assertions also fail to account for the fact that {{Path}} normalizes '\' to '/'.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Xuan Gong <br>
<b>Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land</b><br>
<blockquote>Currently, id, resource request, etc. need to be copied over from Container to ContainerLaunchContext. This can be brittle. It also leads to duplication of information (such as Resource from CLC and Resource from Container, and Container.tokens). Sending Container directly to startContainer solves these problems. It also keeps CLC clean by only having stuff in it that is set by the client/AM.</blockquote></li>
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>TestProcfsProcessTree#testProcessTree() doesn't wait long enough for the process to die</b><br>
<blockquote>TestProcfsProcessTree#testProcessTree fails occasionally with the following stack trace
{noformat}
Stack Trace:
junit.framework.AssertionFailedError: expected:<false> but was:<true>
        at org.apache.hadoop.util.TestProcfsBasedProcessTree.testProcessTree(TestProcfsBasedProcessTree.java)
{noformat}
kill -9 is executed asynchronously, the signal is delivered when the process comes out of the kernel (sys call). Checking if the process died immediately after can fail at times.</blockquote></li>
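A bounded-wait pattern along these lines would make the test robust (the helper name is illustrative):
{code:java}
import java.io.File;

// Poll procfs for process death instead of asserting right after kill -9,
// since signal delivery is asynchronous.
static boolean waitForProcessExit(String pid, long timeoutMs)
    throws InterruptedException {
  long deadline = System.currentTimeMillis() + timeoutMs;
  while (System.currentTimeMillis() < deadline) {
    if (!new File("/proc/" + pid).exists()) {
      return true;               // entry gone from procfs => process is dead
    }
    Thread.sleep(100);           // back off before re-checking
  }
  return false;                  // still alive after the timeout
}
{code}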
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
<b>FS: Extend SchedulingMode to intermediate queues</b><br>
<blockquote>FS allows setting {{SchedulingMode}} for leaf queues. Extending this to non-leaf queues allows using different kinds of fairness: e.g., root can have three child queues - fair-mem, drf-cpu-mem, drf-cpu-disk-mem taking different number of resources into account. In turn, this allows users to decide on the scheduling latency vs sophistication of the scheduling mode.</blockquote></li>
Major bug reported by Chris Riccomini and fixed by Chris Riccomini (client)<br>
<b>Add AM Host and RPC Port to ApplicationCLI Status Output</b><br>
<blockquote>Hey Guys,
I noticed that the ApplicationCLI is just randomly not printing some of the values in the ApplicationReport. I've added the getHost and getRpcPort. These are useful for me, since I want to make an RPC call to the AM (not the tracker call).</blockquote></li>
Major bug reported by Hitesh Shah and fixed by Jian He <br>
<b>NM retry behavior for connection to RM should be similar for lost heartbeats</b><br>
<blockquote>Regardless of connection loss at the start or at an intermediate point, NM's retry behavior to the RM should follow the same flow. </blockquote></li>
Minor bug reported by Jason Lowe and fixed by Sandy Ryza <br>
<b>ProcfsBasedProcessTree info message confuses users</b><br>
<blockquote>ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following:
{noformat}
2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim.
2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim.
2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim.
{noformat}
As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs. It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs.
We should either make this DEBUG or remove it entirely.</blockquote></li>
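If the DEBUG route is taken, the change is essentially a one-liner ({{LOG}} and the pid come from the surrounding class):
{code:java}
// Demote the message so it only appears when process-tree monitoring is debugged.
if (LOG.isDebugEnabled()) {
  LOG.debug("The process " + pid + " may have finished in the interim.");
}
{code}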
Major sub-task reported by Hitesh Shah and fixed by Hitesh Shah <br>
<b>Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment</b><br>
<blockquote>AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. </blockquote></li>
Major bug reported by Hitesh Shah and fixed by Zhijie Shen (capacityscheduler)<br>
<b>CapacityScheduler does not activate applications when maximum-am-resource-percent configuration is refreshed</b><br>
<blockquote>Submit 3 applications to a cluster where capacity scheduler limits allow only 1 running application. Modify capacity scheduler config to increase value of yarn.scheduler.capacity.maximum-am-resource-percent and invoke refresh queues.
The 2 applications not yet in running state do not get launched even though limits are increased.</blockquote></li>
Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (scheduler)<br>
<b>Make scheduling mode in FS pluggable</b><br>
<blockquote>Currently, scheduling mode in FS is limited to Fair and FIFO. The code typically has an if condition at multiple places to determine the correct course of action.
Making the scheduling mode pluggable helps in simplifying this process, particularly as we add new modes (DRF in this case).</blockquote></li>
Major sub-task reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi (nodemanager)<br>
<b>Jobs fail during resource localization when public distributed-cache hits unix directory limits</b><br>
<blockquote>If we have multiple jobs that use the distributed cache with small files, the directory limit is reached before the cache size limit, and the NM fails to create any more directories in the file cache (PUBLIC). Jobs then start failing with the exception below.
java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
We need a mechanism wherein we can create a directory hierarchy and limit the number of files per directory.</blockquote></li>
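A hypothetical sketch of such a scheme: derive a nested relative path from the localized-resource id so that no single directory exceeds a per-directory cap (the scheme and the cap are illustrative, not the actual patch):
{code:java}
// Map a monotonically increasing resource id to a nested path, one level per
// "digit" in base filesPerDir, so every directory stays under the cap.
static String hierarchicalPath(long id, int filesPerDir) {
  StringBuilder dirs = new StringBuilder();
  for (long q = id / filesPerDir; q > 0; q /= filesPerDir) {
    dirs.append(q % filesPerDir).append('/');
  }
  return dirs.append(id).toString();   // e.g. "23/1/12345" for id 12345, cap 100
}
{code}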
Blocker bug reported by Thomas Graves and fixed by Thomas Graves (capacityscheduler)<br>
<b>CS user left in list of active users for the queue even when application finished</b><br>
<blockquote>We have seen a user get left in the queue's list of active users even though the application was removed. This can cause everyone else in the queue to get fewer resources if using the minimum user limit percent config.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager , resourcemanager)<br>
<b>YARN daemon addresses must be placed in many different configs</b><br>
<blockquote>The YARN resourcemanager's address is included in four different configs: yarn.resourcemanager.scheduler.address, yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address, and yarn.resourcemanager.admin.address
A new user trying to configure a cluster needs to know the names of all four of these configs.
The same issue exists for nodemanagers.
It would be much easier if they could simply specify yarn.resourcemanager.hostname and yarn.nodemanager.hostname and default ports for the other ones would kick in.</blockquote></li>
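For illustration, with such hostname properties a minimal yarn-site.xml could look like this (values are examples; the per-service addresses would then default to this host plus their standard ports):
{code:xml}
<!-- One hostname entry per daemon instead of four explicit addresses; -->
<!-- scheduler, resource-tracker, admin, etc. default to this host plus -->
<!-- their standard ports. -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm.example.com</value>
</property>
<property>
  <name>yarn.nodemanager.hostname</name>
  <value>0.0.0.0</value>
</property>
{code}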
Major sub-task reported by Bikas Saha and fixed by Zhijie Shen <br>
<b>Define value for * in the scheduling protocol</b><br>
<blockquote>The ResourceRequest has a string field to specify node/rack locations. For the cross-rack/cluster-wide location (i.e., when there is no locality constraint) the "*" string is used everywhere. However, it's not defined anywhere, and each piece of code either defines a local constant or uses the string literal. Defining "*" in the protocol and removing the other local references from the code base would be good.</blockquote></li>
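For illustration, the fix could be as small as one shared constant on the API class (the constant name below is an assumption):
{code:java}
public abstract class ResourceRequest {
  /** Wildcard resource name: no locality constraint, i.e. any host or rack. */
  public static final String ANY = "*";
}
{code}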
Major bug reported by Kihwal Lee and fixed by Kihwal Lee (nodemanager)<br>
<b>Remove unnecessary hflush from log aggregation</b><br>
<blockquote>AggregatedLogFormat#writeVersion() calls hflush() after writing the version. Calling hflush does not seem to be necessary. It can add a lot of load to hdfs in a big busy cluster.</blockquote></li>
Major sub-task reported by Sandy Ryza and fixed by Sandy Ryza (api , applications/distributed-shell)<br>
<b>Move special container exit codes from YarnConfiguration to API</b><br>
<blockquote>YarnConfiguration currently contains the special container exit codes INVALID_CONTAINER_EXIT_STATUS = -1000, ABORTED_CONTAINER_EXIT_STATUS = -100, and DISKS_FAILED = -101.
These are not really related to configuration, and YarnConfiguration should not become a place to put miscellaneous constants.
Per discussion on YARN-417, appmaster writers need to be able to provide special handling for them, so it might make sense to move these to their own user-facing class.</blockquote></li>
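A sketch of such a user-facing class, reusing the values quoted above (the class and constant names are illustrative):
{code:java}
public final class ContainerExitStatus {
  public static final int INVALID = -1000;      // was INVALID_CONTAINER_EXIT_STATUS
  public static final int ABORTED = -100;       // was ABORTED_CONTAINER_EXIT_STATUS
  public static final int DISKS_FAILED = -101;  // container lost due to disk failure

  private ContainerExitStatus() {}              // constants only
}
{code}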
Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>Failure to download a public resource on a node prevents further downloads of the resource from that node</b><br>
<blockquote>If the NM encounters an error while downloading a public resource, it fails to empty the list of request events corresponding to the resource request in {{attempts}}. If the same public resource is subsequently requested on that node, {{PublicLocalizer.addResource}} will skip the download since it will mistakenly believe a download of that resource is already in progress. At that point any container that requests the public resource will just hang in the {{LOCALIZING}} state.</blockquote></li>
Minor bug reported by Roger Hoover and fixed by Roger Hoover (scheduler)<br>
<b>FifoScheduler incorrectly checking for node locality</b><br>
<blockquote>In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect as it should be checking hostname instead. The offending line of code is 455:
Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129
Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler, because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may adversely affect the reporting of job status by underreporting the number of tasks that were node-local.</blockquote></li>
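The nature of the fix is captured by this illustrative fragment (types and method names assumed):
{code:java}
// Key the locality check on the hostname, which is how requests are stored,
// not on the host:port node address.
static boolean hasNodeLocalRequest(AppSchedulingInfo app, Priority priority,
    SchedulerNode node) {
  // before: app.getResourceRequest(priority, node.getNodeAddress()) never
  // matched, since requests are stored under "host1.foo.com",
  // not "host1.foo.com:1234"
  return app.getResourceRequest(priority, node.getHostName()) != null;
}
{code}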
Major bug reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
<b>New lines in diagnostics for a failed app on the per-application page make it hard to read</b><br>
<blockquote>We need to fix the following issues on YARN web-UI:
- Remove the "Note" column from the application list. When a failure happens, this "Note" spoils the table layout.
- When the Application is still not running, the Tracking UI should be titled "UNASSIGNED"; for some reason it is titled "ApplicationMaster" but (correctly) links to "#".
- The per-application page has all the RM related information like version, start-time etc. Must be some accidental change by one of the patches.
- The diagnostics for a failed app on the per-application page don't retain new lines and wrap around, which makes them hard to read.</blockquote></li>
Critical bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>RM can return null application resource usage report leading to NPE in client</b><br>
<blockquote>RMAppImpl.createAndGetApplicationReport can return a report with a null resource usage report if full access to the app is allowed but the application has no current attempt. This leads to NPEs in client code that assumes an app report will always have at least an empty resource usage report.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Zhijie Shen <br>
<b>Rationalize AllocateResponse in RM scheduler API</b><br>
<blockquote>AllocateResponse contains an AMResponse and a cluster node count; AMResponse holds the rest of the data. Unless there is a good reason for this object structure, there should be either AMResponse or AllocateResponse, not both.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Sandy Ryza (resourcemanager)<br>
<b>Make it possible to specify hard locality constraints in resource requests</b><br>
<blockquote>Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app.</blockquote></li>
Trivial improvement reported by Steve Loughran and fixed by Steve Loughran (nodemanager)<br>
<b>detabify LCEResourcesHandler classes</b><br>
<blockquote>The LCEResourcesHandler classes from YARN-3 have some tab chars that have snuck into the source tree. Fix this before that code starts getting branched off and it's too late.</blockquote></li>
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)<br>
<b>ApplicationCLI and NodeCLI use hard-coded platform-specific line separator, which causes test failures on Windows</b><br>
<blockquote>{{ApplicationCLI}}, {{NodeCLI}}, and the corresponding test {{TestYarnCLI}} all use a hard-coded '\n' as the line separator. This causes test failures on Windows.</blockquote></li>
Blocker sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>Fix inconsistent protocol naming</b><br>
<blockquote>We now have different and inconsistent naming schemes for various protocols. It was hard to explain to users, mainly in direct interactions at talks/presentations and user group meetings, with such naming.
We should fix these before we go beta. </blockquote></li>
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (api)<br>
<b>ResourceRequestPBImpl's toString() is missing location and # containers</b><br>
<blockquote>ResourceRequestPBImpl's toString method includes priority and resource capability, but omits location and number of containers.</blockquote></li>
We did this because it directly sets the values in the original resource object passed in when the AM gets allocated, and without it the AM doesn't get the resource normalized correctly in the submission context. See YARN-370 for more details.
I think we should find a better way of doing this long term: first, so we don't have to keep adding things there when new resources are added; second, because it's a bit confusing as to what it's doing and prone to someone accidentally breaking it again in the future. Something closer to what Arun suggested in YARN-370 would be better, but we need to make sure all the places work and get some more testing on it before putting it in.</blockquote></li>
Major sub-task reported by xieguiming and fixed by Zhijie Shen (client , resourcemanager)<br>
<b>ApplicationMaster retry times should be set by Client</b><br>
<blockquote>We should support different ApplicationMaster retry times for different clients or users. That is to say, "yarn.resourcemanager.am.max-retries" should be settable by the client.</blockquote></li>
Blocker bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>Apps that have completed can appear as RUNNING on the NM UI</b><br>
<blockquote>On a busy cluster we've noticed a growing number of applications appear as RUNNING on a nodemanager's web pages even though the applications have long since finished. Looking at the NM logs, it appears the RM never told the nodemanager that the application had finished. This is also reflected in a jstack of the NM process, since many more log aggregation threads are running than one would expect from the number of actively running applications.</blockquote></li>
Major sub-task reported by Hitesh Shah and fixed by Mayank Bansal (resourcemanager)<br>
<b>Handle (or throw a proper error when receiving) status updates from application masters that have not registered</b><br>
<blockquote>Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped.
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
at java.lang.Thread.run(Thread.java:680)
ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases.</blockquote></li>
Minor bug reported by Jason Lowe and fixed by Ravi Prakash <br>
<b>Unexpected extra results when using webUI table search</b><br>
<blockquote>When using the search box on the web UI to search for a specific task number (e.g.: "0831"), sometimes unexpected extra results are shown. Using the web browser's built-in search-within-page does not show any hits, so these look like completely spurious results.
It looks like the raw timestamp value for time columns, which is not shown in the table, is also being searched with the search box.</blockquote></li>
Major improvement reported by Junping Du and fixed by Junping Du (client)<br>
<b>YARN CLI should show CPU info besides memory info in node status</b><br>
<blockquote>With YARN-2 checked in, CPU info is taken into consideration in resource scheduling. yarn node -status <NodeID> should show CPU used and capacity info just as it shows memory info.</blockquote></li>
Critical bug reported by Devaraj K and fixed by Robert Parker (nodemanager)<br>
<b>Many InvalidStateTransitonException errors for ApplicationImpl in Node Manager</b><br>
<blockquote>{code:xml}
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code}
{code:xml}
2013-01-17 04:03:46,726 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at APPLICATION_RESOURCES_CLEANINGUP
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code}
{code:xml}
2013-01-17 00:01:11,006 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHING_CONTAINERS_WAIT
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code}
{code:xml}
2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1358385982671_1304_01_000001 transitioned from NEW to DONE
2013-01-17 10:56:36,975 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at FINISHED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
2013-01-17 10:56:36,975 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null
{code}
{code:xml}
2013-01-17 10:56:36,026 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at FINISHED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
2013-01-17 10:56:36,026 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1358385982671_1304 transitioned from FINISHED to null
{code}</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Schedulers cannot control the queue-name of an application</b><br>
<blockquote>Currently, if an app is submitted without a queue, RMAppManager sets the RMApp's queue to "default".
A scheduler may wish to make its own decision on which queue to place an app in if none is specified. For example, when the fair scheduler user-as-default-queue config option is set to true, and an app is submitted with no queue specified, the fair scheduler should assign the app to a queue with the user's name.</blockquote></li>
Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Add multi-resource scheduling to the fair scheduler</b><br>
<blockquote>With YARN-2 in, the capacity scheduler has the ability to schedule based on multiple resources, using dominant resource fairness. The fair scheduler should be able to do multiple resource scheduling as well, also using dominant resource fairness.
More details to come on how the corner cases with fair scheduler configs such as min and max resources will be handled.</blockquote></li>
Major bug reported by shenhong and fixed by shenhong (resourcemanager , scheduler)<br>
<b>Submitting a job to a queue not allowed in fairScheduler causes the client to hang forever</b><br>
<blockquote>When the RM uses the FairScheduler and a client submits a job to a queue that does not allow the user to submit jobs to it, the client will hang forever.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Fair scheduler allows reservations that won't fit on node</b><br>
<blockquote>An application requests a container with 1024 MB. It then requests a container with 2048 MB. A node shows up with 1024 MB available. Even if the application is the only one running, neither request will be scheduled on it.</blockquote></li>
Major bug reported by Thomas Graves and fixed by Jason Lowe (resourcemanager)<br>
<b>Resource Manager not logging the health_check_script result when taking it out</b><br>
<blockquote>The Resource Manager does not log the health_check_script result when taking a node out. This was added to the jobtracker in 1.x with MAPREDUCE-2451; we should do the same thing for the RM.</blockquote></li>
Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (capacityscheduler)<br>
<b>Capacity Scheduler web page should show list of active users per queue like it used to (in 1.x)</b><br>
<blockquote>On the jobtracker, the web ui showed the active users for each queue and how much resources each of those users were using. That currently isn't being displayed on the RM capacity scheduler web ui.</blockquote></li>
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>RM should point tracking URL to RM web page when app fails to start</b><br>
<blockquote>Similar to YARN-165, the RM should redirect the tracking URL to the specific app page on the RM web UI when the application fails to start. For example, if the AM completely fails to start due to bad AM config or bad job config like invalid queuename, then the user gets the unhelpful "The requested application exited before setting a tracking URL".
Usually the diagnostic string on the RM app page has something useful, so we might as well point there.</blockquote></li>
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>Application expiration difficult to debug for end-users</b><br>
<blockquote>When an AM attempt expires the AMLivelinessMonitor in the RM will kill the job and mark it as failed. However there are no diagnostic messages set for the application indicating that the application failed because of expiration. Even if the AM logs are examined, it's often not obvious that the application was externally killed. The only evidence of what happened to the application is currently in the RM logs, and those are often not accessible by users.</blockquote></li>
Major bug reported by Bikas Saha and fixed by Zhijie Shen (capacityscheduler)<br>
<b>Capacity scheduler doesn't trigger app-activation after adding nodes</b><br>
<blockquote>Say application A is submitted, but at that time it does not meet the bar for activation because of resource limit settings for applications. After that, if more hardware is added to the system and the application becomes valid, it still remains in the pending state, likely forever.
This might be rare to hit in real life because enough NMs heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to repro. In RM restart scenarios, this will likely hit more often if it's implemented by re-playing events and re-submitting applications to the scheduler before the RPC to NMs is activated.</blockquote></li>
Major sub-task reported by Robert Joseph Evans and fixed by Ravi Prakash <br>
<b>yarn log does not output all needed information, and is in a binary format</b><br>
<blockquote>yarn logs does not output attemptid, nodename, or container-id. Missing these makes it very difficult to look through the logs for failed containers and tie them back to actual tasks and task attempts.
Also, the output currently includes several binary characters. This is OK for being machine-readable, but difficult for being human-readable, or even for using standard tools like grep.
The help message could also be more useful to users.</blockquote></li>
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
... 3 more
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
at $Proxy23.registerNodeManager(Unknown Source)
at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
... 5 more
Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
at org.apache.hadoop.ipc.Client.call(Client.java:1141)
at org.apache.hadoop.ipc.Client.call(Client.java:1100)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
at java.lang.Thread.run(Thread.java:619)
2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:9999
2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.
2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 24290
2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 24290
2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler is stopped.
Minor sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
<b>Make Yarn Client service shutdown operations robust</b><br>
<blockquote>Make the yarn client services more robust against being shut down while not started, or shutdown more than once, by null-checking fields before closing them, setting to null afterwards to prevent double-invocation. This is a subset of MAPREDUCE-3502</blockquote></li>
Minor sub-task reported by Steve Loughran and fixed by Steve Loughran <br>
<b>Make Yarn Node Manager services robust against shutdown</b><br>
<blockquote>Add the nodemanager bits of MAPREDUCE-3502 to shut down the Nodemanager services. This is done by checking for fields being non-null before shutting down/closing etc., and setting the fields to null afterwards, to be resilient against re-entrancy.
No tests other than manual review.</blockquote></li>
Major improvement reported by Steve Loughran and fixed by Steve Loughran <br>
<b>Enhance YARN service model</b><br>
<blockquote>Having played with the YARN service model, I have identified some issues
based on past work and initial use.
This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs.
h2. state model prevents stopped state being entered if you could not successfully start the service.
In the current lifecycle you cannot stop a service unless it was successfully started, but
* {{init()}} may acquire resources that need to be explicitly released
* if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources.
*Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non-null (a minimal sketch follows at the end of this section).
Before anyone points out that the {{stop()}} operations assume that all fields are valid; and if called before a {{start()}} they will NPE; MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix for this. It is independent of the rest of the issues in this doc but it will aid making {{stop()}} execute from all states other than "stopped".
MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take up; this can be done with issues linked to this one.
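Referring back to the fix above, a minimal sketch with invented field names: every resource is null-checked before release and nulled afterwards, so {{stop()}} is safe from any state and re-entrant calls are no-ops.
{code:java}
import java.io.Closeable;
import java.io.IOException;

// Sketch only: stop() tolerates never-started and already-stopped services.
class RobustService {
  private Closeable rpcServer;   // may still be null if init()/start() failed early
  private boolean stopped;

  synchronized void stop() throws IOException {
    if (stopped) {
      return;                    // duplicate stop() is a no-op
    }
    stopped = true;
    if (rpcServer != null) {
      rpcServer.close();
      rpcServer = null;          // prevent double-release on re-entry
    }
  }
}
{code}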
h2. AbstractService doesn't prevent duplicate state change requests.
The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} & {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this.
This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}}, something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers). A sketch of this pattern follows this entry.
h2. AbstractService state change doesn't defend against race conditions.
There are no concurrency locks on the state transitions. Whatever fix is added for wrong-state calls should also prevent re-entrancy, such as {{stop()}} being called from two threads.
h2. Static methods to choreograph lifecycle operations
Helper methods to move things through the lifecycle: init->start is common, stop-if-service-is-not-null is another. Some static methods could execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. They could be used by services that wrap other services, and help manage more robust shutdowns.
h2. state transition failures are something that registered service listeners may wish to be informed of.
When a state transition fails, a {{RuntimeException}} can be thrown, and the service listeners are not informed because the notification point isn't reached. They may wish to know this, especially for management and diagnostics.
*Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service, Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods); make it a no-op in the existing implementations of the interface.
h2. Service listener failures not handled
Is this an error or not? Log-and-ignore may not be what is desired.
*Proposed:* during {{stop()}} any exception by a listener is caught and discarded, to increase the likelihood of a better shutdown, but do not add try-catch clauses to the other state changes.
h2. Support static listeners for all AbstractServices
Add support to {{AbstractService}} that allows callers to register listeners for all instances. The existing listener interface could be used. This allows management tools to hook into the events.
The static listeners would be invoked for all state changes except creation (base class shouldn't be handing out references to itself at this point).
These static events could all be async, pushed through a shared {{ConcurrentLinkedQueue}}; failures logged at warn and the rest of the listeners invoked.
h2. Add some example listeners for management/diagnostics
* events to the commons log, for humans.
* events for machines, hooked up to the JSON logger.
* for testing: something that can be told to fail.
h2. Services should support signal interruptibility
The services would benefit from a way of shutting them down on a kill signal; this can be done via a runtime hook. It should not be automatic though, as composite services will get into a very complex state during shutdown. Better to provide a hook that lets you register/unregister services to terminate, and have the relevant {{main()}} entry points tell their root services to register themselves.</blockquote></li>
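The final-lifecycle-methods proposal above lends itself to a short sketch. The following is illustrative only, assuming a simplified state enum; the real AbstractService API differs:
{code:java}
// Lifecycle methods are final, perform state checks up front under a lock,
// then delegate to protected inner*() hooks.
public abstract class SketchService {
  public enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  public final synchronized void init() {
    if (state != State.NOTINITED) {
      throw new IllegalStateException("init() from " + state);
    }
    innerInit();
    state = State.INITED;
  }

  public final synchronized void start() {
    if (state != State.INITED) {
      throw new IllegalStateException("start() from " + state);
    }
    innerStart();
    state = State.STARTED;
  }

  /** stop() is valid from every state except STOPPED, so it can release
   *  resources even after a failed init() or a partial start(). */
  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return; // duplicate stop() is a no-op, not an error
    }
    innerStop(); // implementations must tolerate null fields
    state = State.STOPPED;
  }

  protected void innerInit() {}
  protected void innerStart() {}
  protected void innerStop() {}
}
{code}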
Major sub-task reported by Jason Lowe and fixed by Omkar Vinit Joshi (nodemanager)<br>
<b>Race in localization can cause containers to fail</b><br>
<blockquote>On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up.</blockquote></li>
Major bug reported by Jason Lowe and fixed by Mayank Bansal (nodemanager)<br>
<b>.tmp file is not deleted for localized archives</b><br>
<blockquote>When archives are localized they are initially created as a .tmp file and unpacked from that file. However the .tmp file is not deleted afterwards.</blockquote></li>
Major sub-task reported by Devaraj K and fixed by Omkar Vinit Joshi (nodemanager)<br>
<b>Jobs fail during resource localization when private distributed-cache hits unix directory limits</b><br>
<blockquote>If we have multiple jobs that use the distributed cache with small files, the directory limit is reached before the cache size is, and no more directories can be created in the file cache. The jobs start failing with the below exception.
{code}
java.io.IOException: mkdir of /tmp/nm-local-dir/usercache/root/filecache/1701886847734194975 failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{code}
We should have a mechanism to clean the cache files when the number of directories crosses a specified limit, similar to the existing cache-size limit.</blockquote></li>
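One conceivable way to avoid hitting the per-directory limit is sketched below, under the assumption of a fixed per-directory cap; the class, the CAP value, and the scheme are illustrative, not the actual YARN fix:
{code:java}
// Spread localized files across numbered subdirectories so no single
// directory holds more than a fixed number of entries.
public class CacheDirAllocator {
  private static final int CAP = 36; // assumed max entries per directory
  private long next = 0;

  /** Returns the relative subdirectory ("0", "1", ...) for the next cache file. */
  public synchronized String nextSubDir() {
    return Long.toString(next++ / CAP); // files 0..35 land in "0", 36..71 in "1", ...
    // A production scheme would also need to nest further once the parent
    // directory itself accumulates too many subdirectory entries.
  }
}
{code}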
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
<b>AM should not be able to abuse container tokens for repetitive container launches</b><br>
<blockquote>Clone of YARN-51.
ApplicationMaster should not be able to store container tokens and use the same set of tokens for repetitive container launches. The possibility of such abuse exists in the current code for a duration of 1d+10mins; we need to fix this.</blockquote></li>
Major sub-task reported by Chris Douglas and fixed by Carlo Curino (resourcemanager)<br>
<b>Scheduler feedback to AM to release containers</b><br>
<blockquote>The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed, or reserved, to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers.</blockquote></li>
Major bug reported by Daniel Dai and fixed by Arun C Murthy <br>
<b>Hadoop does not close output file / does not call Mapper.cleanup if exception in map</b><br>
<blockquote>Ensure that mapreduce APIs are semantically consistent with mapred API w.r.t Mapper.cleanup and Reducer.cleanup; in the sense that cleanup is now called even if there is an error. The old mapred API already ensures that Mapper.close and Reducer.close are invoked during error handling. Note that it is an incompatible change, however end-users can override Mapper.run and Reducer.run to get the old (inconsistent) behaviour.</blockquote></li>
Major bug reported by Karam Singh and fixed by Amar Kamat (contrib/gridmix)<br>
<b>GridMix emulated job tasks.resource-usage emulator for CPU usage throws NPE when Trace contains cumulativeCpuUsage value of 0 at attempt level</b><br>
<blockquote>Fixes NPE in cpu emulation in Gridmix</blockquote></li>
Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi <br>
<b>Gridmix throws NPE and does not simulate a job if the trace contains null taskStatus for a task</b><br>
<blockquote>Fixes NPE and makes Gridmix simulate succeeded-jobs-with-failed-tasks. All tasks of such simulated jobs (including the failed ones of the original job) will succeed.</blockquote></li>
Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (contrib/gridmix)<br>
<b>[Gridmix] Gridmix should give better error message when input-data directory already exists and -generate option is given</b><br>
<blockquote>Makes Gridmix emit out correct error message when the input data directory already exists and -generate option is used. Makes Gridmix exit with proper exit codes when Gridmix fails in args-processing, startup/setup.</blockquote></li>
Major improvement reported by Amar Kamat and fixed by Amar Kamat (contrib/gridmix)<br>
<b>[Gridmix] Improve STRESS mode</b><br>
<blockquote>JobMonitor can now deploy multiple threads for faster job-status polling. Use 'gridmix.job-monitor.thread-count' to set the number of threads. Stress mode now relies on the updates from the job monitor instead of polling for job status. Failures in job submission now get reported to the statistics module and ultimately reported to the user via summary.</blockquote></li>
Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (contrib/gridmix)<br>
<b>Gridmix simulated job's map's hdfsBytesRead counter is wrong when compressed input is used</b><br>
<blockquote>Makes Gridmix use the uncompressed input data size while simulating map tasks in the case where compressed input data was used in original job.</blockquote></li>
Minor improvement reported by Chris Nauroth and fixed by Chris Nauroth (namenode)<br>
<b>ClientProtocol#metaSave can be made idempotent by overwriting the output file instead of appending to it</b><br>
<blockquote>The dfsadmin -metasave command has been changed to overwrite the output file. Previously, this command would append to the output file if it already existed.</blockquote></li>
Blocker bug reported by Ralph Castain and fixed by Arpit Agarwal (namenode)<br>
<b>Protocol buffer support cannot compile under C</b><br>
<blockquote>The Protocol Buffers definition of the inter-namenode protocol required a change for compatibility with compiled C clients. This is a backwards-incompatible change. A namenode prior to this change will not be able to communicate with a namenode after this change.</blockquote></li>
Critical bug reported by Ravi Prakash and fixed by Ravi Prakash <br>
<b>Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave</b><br>
<blockquote>This change makes name node keep its internal replication queues and data node state updated in manual safe mode. This allows metrics and UI to present up-to-date information while in safe mode. The behavior during start-up safe mode is unchanged. </blockquote></li>
Major bug reported by Tian Hong Wang and fixed by Tian Hong Wang (datanode , test)<br>
<b>TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN: Double call countReplicas() to fetch corruptReplicas and liveReplicas is not needed</b><br>
Major sub-task reported by Brandon Li and fixed by Suresh Srinivas (namenode)<br>
<b>Provide a mapping from INodeId to INode</b><br>
<blockquote>This change adds support for referencing files and directories based on fileID/inodeID using a path /.reserved/.inodes/<inodeid>. With this change, creating a file or directory /.reserved is no longer allowed. Before upgrading to a release with this change, any existing file or directory named /.reserved needs to be renamed.</blockquote></li>
Minor bug reported by Todd Lipcon and fixed by Andrew Wang (namenode)<br>
<b>Add a configurable limit on number of blocks per file, and min block size</b><br>
<blockquote>This change introduces a maximum number of blocks per file, by default one million, and a minimum block size, by default 1MB. These can optionally be changed via the configuration settings "dfs.namenode.fs-limits.max-blocks-per-file" and "dfs.namenode.fs-limits.min-block-size", respectively.</blockquote></li>
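For reference, the two settings named above can be applied programmatically as well as in hdfs-site.xml; the values below simply restate the documented defaults:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class BlockLimitsExample {
  public static Configuration documentedDefaults() {
    Configuration conf = new Configuration();
    conf.setLong("dfs.namenode.fs-limits.max-blocks-per-file", 1000000L); // one million blocks
    conf.setLong("dfs.namenode.fs-limits.min-block-size", 1024L * 1024L); // 1MB
    return conf;
  }
}
{code}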
Major improvement reported by Eli Collins and fixed by Eli Collins <br>
<b>Increase the default block size</b><br>
<blockquote>The default block size prior to this change was 64MB. This jira changes the default block size to 128MB. To go back to the previous behavior, configure, in hdfs-site.xml, the configuration parameter "dfs.blocksize" to 67108864.</blockquote></li>
Minor new feature reported by Harsh J and fixed by Aaron T. Myers (datanode)<br>
<b>Add a new block-volume device choosing policy that looks at free space</b><br>
<blockquote>There is now a new option to have the DN take into account available disk space on each volume when choosing where to place a replica when performing an HDFS write. This can be enabled by setting the config "dfs.datanode.fsdataset.volume.choosing.policy" to the value "org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy".</blockquote></li>
Minor bug reported by Mark Miller and fixed by Mark Miller (security)<br>
<b>org.apache.hadoop.security.SecurityUtil calls toUpperCase(Locale.getDefault()) as well as toLowerCase(Locale.getDefault()) on hadoop.security.authentication value.</b><br>
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (fs)<br>
<b>FileUtil#createJarWithClassPath only substitutes environment variables from current process environment/does not support overriding when launching new process</b><br>
Major sub-task reported by Junping Du and fixed by Junping Du <br>
<b>Implementation of 4-layer subclass of NetworkTopology (NetworkTopologyWithNodeGroup)</b><br>
<blockquote>This patch should be checked in together (or after) with JIRA Hadoop-8469: https://issues.apache.org/jira/browse/HADOOP-8469</blockquote></li>
Major bug reported by Hitesh Shah and fixed by Siddharth Seth (nodemanager)<br>
<b>Support a way to disable resource monitoring on the NodeManager</b><br>
<blockquote>Currently, the memory management monitor's check is disabled when the maxMem is set to -1. However, the maxMem is also sent to the RM when the NM registers with it ( to define the max limit of allocate-able resources ).
We need an explicit flag to disable monitoring to avoid the problems caused by the overloading of the max memory value.</blockquote></li>
Blocker bug reported by Siddharth Seth and fixed by <br>
<b>HBase test failures when running against Hadoop 2</b><br>
<blockquote>Post YARN-429, unit tests for HBase continue to fail since the classpath for the MRAppMaster is not being set correctly.
Reverting YARN-129 may fix this, but I'm not sure that's the correct solution. My guess is, as Alejandro pointed out in YARN-129, maven classloader magic is messing up java.class.path.</blockquote></li>
Major improvement reported by Thomas Graves and fixed by Thomas Graves (nodemanager)<br>
<b>allow OS scheduling priority of NM to be different than the containers it launches</b><br>
<blockquote>It would be nice if we could have the nodemanager run at a different OS scheduling priority than the containers, so that you can still communicate with the nodemanager if the containers are out of control.
On linux we could launch the nodemanager at a higher priority, but then all the containers it launches would also be at that higher priority, so we need a way for the container executor to launch them at a lower priority.
I'm not sure how this applies to windows if at all.</blockquote></li>
Blocker bug reported by Siddharth Seth and fixed by Siddharth Seth (resourcemanager)<br>
<b>capacity-scheduler config missing from yarn-test artifact</b><br>
<blockquote>MiniYARNCluster and MiniMRCluster are unusable by downstream projects with the 2.0.3-alpha release, since the capacity-scheduler configuration is missing from the test artifact.
hadoop-yarn-server-tests-3.0.0-SNAPSHOT-tests.jar should include the default capacity-scheduler configuration. Also, this doesn't need to be part of the default classpath - and should be moved out of the top level directory in the dist package.</blockquote></li>
Major bug reported by Jason Lowe and fixed by Jason Lowe <br>
<b>AggregatedLogDeletionService can take too long to delete logs</b><br>
<blockquote>AggregatedLogDeletionService uses the yarn.log-aggregation.retain-seconds property to determine which logs should be deleted, but it uses the same value to determine how often to check for old logs. This means logs could actually linger up to twice as long as configured.</blockquote></li>
Critical bug reported by Daryn Sharp and fixed by Daryn Sharp <br>
<b>Allow apps to concurrently register tokens for renewal</b><br>
<blockquote>{{DelegationTokenRenewer#addApplication}} has an unnecessary {{synchronized}} keyword. This serializes job submissions and can add unnecessary latency and/or hang all submissions if there are problems renewing the token.</blockquote></li>
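A minimal sketch of the idea, assuming illustrative stand-in types: registration goes through a concurrent map instead of a synchronized method, so one slow renewal cannot serialize every submission:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentRenewerSketch {
  private final ConcurrentMap<String, String> appTokens = new ConcurrentHashMap<>();

  // Before: "public synchronized void addApplication(...)" -- one lock for all apps.
  // After: no method-level lock; the map provides per-key thread safety.
  public void addApplication(String appId, String token) {
    appTokens.putIfAbsent(appId, token);
    // renewal work can then be scheduled asynchronously, per application
  }
}
{code}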
Major bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
<b>App submission should not be synchronized</b><br>
<blockquote>MAPREDUCE-2953 fixed a race condition with querying of app status by making {{RMClientService#submitApplication}} synchronously invoke {{RMAppManager#submitApplication}}. However, the {{synchronized}} keyword was also added to {{RMAppManager#submitApplication}} with the comment:
bq. I made the submitApplication synchronized to keep it consistent with the other routines in RMAppManager although I do not believe it needs it since the rmapp datastructure is already a concurrentMap and I don't see anything else that would be an issue.
It's been observed that app submission latency is being unnecessarily impacted.</blockquote></li>
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
<b>RM app submission jams under load</b><br>
<blockquote>The RM performs a loopback connection to itself to renew its own tokens. If app submissions consume all RPC handlers for {{ClientRMProtocol}}, then app submissions block because it cannot loopback to itself to do the renewal.</blockquote></li>
Blocker bug reported by Liang Xie and fixed by Liang Xie <br>
<b>WebAppProxyServer exits immediately after startup</b><br>
<blockquote>Please see HDFS-4426 for details. I found that the YARN WebAppProxyServer is broken by HADOOP-9181 as well; here's the hot fix, which I verified manually in our test cluster.
I really apologize for bringing about such trouble...</blockquote></li>
Major bug reported by Thomas Graves and fixed by Xuan Gong (capacityscheduler)<br>
<b>Capacity Scheduler maximum-capacity value -1 is invalid</b><br>
<blockquote>I tried to start the resource manager using the capacity scheduler with a particular queue's maximum-capacity set to -1, which is supposed to disable it according to the docs, but I got the following exception:
java.lang.IllegalArgumentException: Illegal value of maximumCapacity -0.01 used in call to setMaxCapacity for queue foo
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.checkMaxCapacity(CSQueueUtils.java:31)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:220)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:191)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:310)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:325)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:232)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:202)</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Fair scheduler FIFO scheduling within a queue only allows 1 app at a time </b><br>
<blockquote>The fair scheduler allows apps to be scheduled in FIFO fashion within a queue. Currently, when this setting is turned on, the scheduler only allows one app to run at a time. While apps submitted earlier should get first priority for allocations, when there is space remaining, other apps should have a chance to get at it.</blockquote></li>
<blockquote>Seems to be timing related, as the container status RUNNING returned by the ContainerManager does not really indicate that the container task has been launched. A sleep of 5 seconds is not reliable.
testKillContainersOnShutdown(org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown) Time elapsed: 9283 sec <<< FAILURE!
junit.framework.AssertionFailedError: Did not find sigterm message
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.assertTrue(Assert.java:20)
at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.testKillContainersOnShutdown(TestNodeManagerShutdown.java:162)
Logs:
2013-01-09 14:13:08,401 INFO [AsyncDispatcher event handler] container.Container (ContainerImpl.java:handle(835)) - Container container_0_0000_01_000000 transitioned from NEW to LOCALIZING
2013-01-09 14:13:08,412 INFO [AsyncDispatcher event handler] localizer.LocalizedResource (LocalizedResource.java:handle(194)) - Resource file:hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/tmpDir/scriptFile.sh transitioned from INIT to DOWNLOADING
2013-01-09 14:13:08,412 INFO [AsyncDispatcher event handler] localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(521)) - Created localizer for container_0_0000_01_000000
2013-01-09 14:13:08,589 INFO [LocalizerRunner for container_0_0000_01_000000] localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(895)) - Writing credentials to the nmPrivate file hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/nmPrivate/container_0_0000_01_000000.tokens. Credentials list:
2013-01-09 14:13:08,628 INFO [LocalizerRunner for container_0_0000_01_000000] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createUserCacheDirs(373)) - Initializing user nobody
2013-01-09 14:13:08,781 INFO [LocalizerRunner for container_0_0000_01_000000] nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(99)) - Copying from hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/nmPrivate/container_0_0000_01_000000.tokens to hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown/nm0/usercache/nobody/appcache/application_0_0000/container_0_0000_01_000000.tokens</blockquote></li>
Blocker bug reported by Jason Lowe and fixed by Arun C Murthy (capacityscheduler)<br>
<b>RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing</b><br>
<blockquote>If a client calls getQueueInfo on a parent queue (e.g.: the root queue) and containers are completing, then the RM can deadlock. getQueueInfo() locks the ParentQueue and then calls the child queues' getQueueInfo() methods in turn. However, when a container completes, it locks the LeafQueue and then calls back into the ParentQueue. When the two mix, it's a recipe for deadlock.</blockquote></li>
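The deadlock recipe described above can be reproduced in miniature; the sketch below is self-contained and uses illustrative lock objects, not the actual scheduler classes:
{code:java}
// Thread 1 locks parent then child (the getQueueInfo path); thread 2 locks
// child then parent (the container-completion path). Opposite lock order
// eventually deadlocks.
public class QueueDeadlockDemo {
  private final Object parentQueue = new Object();
  private final Object leafQueue = new Object();

  void getQueueInfo() {                 // client path
    synchronized (parentQueue) {
      synchronized (leafQueue) { /* collect child info */ }
    }
  }

  void containerCompleted() {           // scheduler path
    synchronized (leafQueue) {
      synchronized (parentQueue) { /* update parent accounting */ }
    }
  }

  public static void main(String[] args) {
    QueueDeadlockDemo d = new QueueDeadlockDemo();
    new Thread(() -> { for (;;) d.getQueueInfo(); }).start();
    new Thread(() -> { for (;;) d.containerCompleted(); }).start();
    // With both threads running, the process eventually hangs in a deadlock.
  }
}
{code}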
Blocker bug reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
<b>RM should always be able to renew its own tokens</b><br>
<blockquote>YARN-280 introduced fast-fail for job submissions with bad tokens. Unfortunately, other stack components like oozie and customers are acquiring RM tokens with a hardcoded dummy renewer value. These jobs would fail after 24 hours because the RM token couldn't be renewed, but fast-fail is failing them immediately. The RM should always be able to renew its own tokens submitted with a job. The renewer field may continue to specify an external user who can renew.</blockquote></li>
Major bug reported by shenhong and fixed by shenhong (resourcemanager , scheduler)<br>
<b>Submitting a job to a queue not allowed by the fair scheduler causes the client to hang forever</b><br>
<blockquote>The RM uses the fair scheduler; when a client submits a job to a queue, but the queue does not allow the user to submit jobs to it, the client will hang forever.</blockquote></li>
Major bug reported by shenhong and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>After YARN-271, fair scheduler can loop infinitely and not schedule any application</b><br>
<blockquote>After YARN-271, when yarn.scheduler.fair.max.assign <= 0 and a node has been reserved, the fair scheduler will loop infinitely and not schedule any application.</blockquote></li>
Critical bug reported by Devaraj K and fixed by Robert Joseph Evans (nodemanager)<br>
<b>Node Manager leaks LocalizerRunner object for every Container </b><br>
<blockquote>Node Manager creates a new LocalizerRunner object for every container and puts in ResourceLocalizationService.LocalizerTracker.privLocalizers map but it never removes from the map.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fair scheduler queue doesn't accept any jobs when ACLs are configured.</b><br>
<blockquote>If a queue is configured with an ACL for who can submit jobs, no jobs are allowed, even if a user on the list tries.
This is caused by the scheduler thinking the user is "yarn", because it calls UserGroupInformation.getCurrentUser() instead of UserGroupInformation.createRemoteUser() with the given user name.</blockquote></li>
Major new feature reported by Tom White and fixed by Tom White (applications)<br>
<b>Add a YARN ApplicationClassLoader</b><br>
<blockquote>Add a classloader that provides webapp-style class isolation for use by applications. This is the YARN part of MAPREDUCE-1700 (which was already developed in that JIRA).</blockquote></li>
Major improvement reported by Derek Dagit and fixed by Derek Dagit <br>
<b>RM should be able to provide a tracking link for apps that have already been purged</b><br>
<blockquote>As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of applications remembered by the RM.
When a user clicks the History link for a job, the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed.
In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs.
We would like the RM to provide valid tracking links that persist, so that users are not frustrated by broken links.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Fair scheduler fails to get queue info without root prefix</b><br>
<blockquote>If queue1 exists, and a client calls "mapred queue -info queue1", an exception is thrown. If they use root.queue1, it works correctly.</blockquote></li>
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (resourcemanager)<br>
<b>RM does not reject app submission with invalid tokens</b><br>
<blockquote>The RM will launch an app with invalid tokens. The tasks will languish with failed connection retries, followed by task reattempts, followed by app reattempts.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fair scheduler maxRunningApps config causes no apps to make progress</b><br>
<blockquote>This occurs because the scheduler erroneously chooses apps to offer resources to that are not runnable, then later decides they are not runnable, and doesn't try to give the resources to anyone else.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
<b>Fair scheduler log messages try to print objects without overridden toString methods</b><br>
<blockquote>A lot of junk gets printed out like this:
2012-12-11 17:31:52,998 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: Application application_1355270529654_0003 reserved container org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl@324f0f97 on node host: c1416.hal.cloudera.com:46356 #containers=7 available=0 used=8192, currently has 4 at priority org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl@33; currentReservation 4096</blockquote></li>
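The junk appears because Java's default Object.toString() prints "ClassName@hexHashCode". A hedged sketch of the remedy, on an illustrative class:
{code:java}
public class ContainerInfo {
  private final String containerId;
  private final int memoryMb;

  public ContainerInfo(String containerId, int memoryMb) {
    this.containerId = containerId;
    this.memoryMb = memoryMb;
  }

  @Override
  public String toString() {
    // Log statements that interpolate this object now print something readable.
    return "container " + containerId + " (memory=" + memoryMb + "MB)";
  }
}
{code}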
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fair scheduler hits IllegalStateException trying to reserve different apps on same node</b><br>
<blockquote>After the fair scheduler reserves a container on a node, it doesn't check for reservations it just made when trying to make more reservations during the same heartbeat.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fix fair scheduler web UI</b><br>
<blockquote>The fair scheduler web UI was broken by MAPREDUCE-4720. The queues area is not shown, and changes are required to still show the fair share inside the applications table.</blockquote></li>
Critical bug reported by Ravi Prakash and fixed by Ravi Prakash (resourcemanager)<br>
<b>RM and JHS Web UIs are blank because AppsBlock is not escaping string properly</b><br>
<blockquote>e.g. Job names with a line feed "\n" are causing a line feed in the JSON array being written out (since we are only using StringEscapeUtils.escapeHtml()), and the Javascript parser complains that string quotes are unclosed.</blockquote></li>
Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
<b>y.s.rm.DelegationTokenRenewer attempts to renew token even after removing an app</b><br>
<blockquote>yarn.s.rm.security.DelegationTokenRenewer uses TimerTask/Timer. When such a timer task is canceled, already scheduled tasks run to completion. The task should check for such cancellation before running. Also, delegationTokens needs to be synchronized on all accesses.</blockquote></li>
Major bug reported by Ravi Prakash and fixed by Ravi Prakash (resourcemanager)<br>
<b>RM web page UI shows Invalid Date for start and finish times</b><br>
<blockquote>Whenever the number of jobs was greater than 100, two javascript arrays were being populated: appsData and appsTableData. appsData was winning out (because it was coming out later), and so renderHadoopDate was trying to render a <br title=""...> string.</blockquote></li>
Critical bug reported by Tom White and fixed by Tom White (nodemanager)<br>
<b>Container launch may fail if no files were localized</b><br>
<blockquote>This can be demonstrated with DistributedShell. The containers running the shell do not have any files to localize (if there is no shell script to copy) so if they run on a different NM to the AM (which does localize files), then they will fail since the appcache directory does not exist.</blockquote></li>
Major bug reported by Tom White and fixed by Tom White (resourcemanager)<br>
<b>Proxy URI generation fails for blank tracking URIs</b><br>
<blockquote>If the URI is an empty string (the default if not set), then a warning is displayed. A null URI displays no such warning. These two cases should be handled in the same way.</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Bikas Saha (resourcemanager)<br>
<b>Make changes for RM restart phase 1</b><br>
<blockquote>As described in YARN-128, phase 1 of RM restart puts in place mechanisms to save application state and read it back after restart. Upon restart, the NMs are asked to reboot and the previously running AMs are restarted.
After this is done, RM HA and work preserving restart can continue in parallel. For more details please refer to the design document in YARN-128</blockquote></li>
Major sub-task reported by Bikas Saha and fixed by Bikas Saha (resourcemanager)<br>
<b>Remove old code for restart</b><br>
<blockquote>Much of the code is dead/commented out and is not executed. Removing it will help with making and understanding new changes.</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Fair scheduler logs too many nodeUpdate INFO messages</b><br>
<blockquote>The RM logs are filled with an INFO message the fair scheduler logs every time it receives a nodeUpdate. It should be taken out or demoted to debug.</blockquote></li>
Critical bug reported by Radim Kolar and fixed by Radim Kolar <br>
<b>Change processTree interface to work better with native code</b><br>
<blockquote>The problem is that on every update of the process tree a new object is required. This is undesirable when working with a process-tree implementation in native code.
Replace ProcessTree.getProcessTree() with updateProcessTree(): no new object allocation is needed, and it simplifies application code a bit.</blockquote></li>
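A sketch of the proposed interface change, assuming an illustrative interface; the before/after contrast is in the comments:
{code:java}
public interface ProcessTreeSketch {
  // Before: ProcessTree t = ProcessTree.getProcessTree(old); // new object per update
  // After: the same instance refreshes its own snapshot -- friendlier to a
  // native-backed implementation that owns off-heap state.
  void updateProcessTree();

  long getCumulativeRssMem(); // read from the current snapshot
}
{code}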
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
<b>Fair scheduler should create queue for each user by default</b><br>
<blockquote>In MR1 the fair scheduler's default behavior was to create a pool for each user. The YARN fair scheduler has this capability, but it should be turned on by default, for consistency.</blockquote></li>
Critical sub-task reported by Robert Joseph Evans and fixed by Robert Joseph Evans (nodemanager)<br>
<b>NM should aggregate logs when application finishes.</b><br>
<blockquote>The NM should only aggregate logs when the application finishes. This will reduce the load on the NN, especially with respect to lease renewal.</blockquote></li>
Blocker bug reported by Devaraj K and fixed by Devaraj K (resourcemanager)<br>
<b>yarn rmadmin commands fail in secure cluster</b><br>
<blockquote>All the rmadmin commands fail in secure mode with the "protocol org.apache.hadoop.yarn.server.nodemanager.api.RMAdminProtocolPB is unauthorized" message in RM logs.</blockquote></li>
Major improvement reported by Todd Lipcon and fixed by Robert Joseph Evans <br>
<b>Remove jquery theming support</b><br>
<blockquote>As of today we have 9.4MB of JQuery themes in our code tree. In addition to being a waste of space, it's a highly questionable feature. I've never heard anyone complain that the Hadoop interface isn't themeable enough, and there's far more value in consistency across installations than there is in themeability. Let's rip it out.</blockquote></li>
Major bug reported by Jason Lowe and fixed by Jonathan Eagles (resourcemanager)<br>
<b>RMContainerImpl does not handle event EXPIRE at state RUNNING</b><br>
<blockquote>RMContainerImpl has a race condition where a container can enter the RUNNING state just as the container expires. This results in an invalid event transition error:
{noformat}
2012-11-11 05:31:38,954 [ResourceManager Event Processor] ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: EXPIRE at RUNNING
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:205)
at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:44)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.containerCompleted(SchedulerApp.java:203)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1337)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:739)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:659)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:340)
at java.lang.Thread.run(Thread.java:619)
{noformat}
EXPIRE needs to be handled (well at least ignored) in the RUNNING state to account for this race condition.</blockquote></li>
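A generic, self-contained illustration of the fix: accept EXPIRE while RUNNING and stay in RUNNING rather than throwing. YARN's StateMachineFactory applies the same principle through its transition table; the enums and switch below are illustrative:
{code:java}
public class ContainerStateSketch {
  enum State { RUNNING, COMPLETED }
  enum Event { FINISHED, EXPIRE }

  private State state = State.RUNNING;

  public synchronized void handle(Event event) {
    switch (state) {
      case RUNNING:
        if (event == Event.FINISHED) {
          state = State.COMPLETED;
        }
        // EXPIRE at RUNNING: deliberately ignored -- the container won the
        // race and is already running, so the expiry is stale information.
        break;
      case COMPLETED:
        break; // terminal state
    }
  }
}
{code}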
Blocker bug reported by Nathan Roberts and fixed by Nathan Roberts (nodemanager)<br>
<b>NM state machine ignores an APPLICATION_CONTAINER_FINISHED event when it shouldn't</b><br>
<blockquote>The NM state machines can make the following two invalid state transitions when a speculative attempt is killed shortly after it gets started. When this happens the NM keeps the log aggregation context open for this application and therefore chews up FDs and leases on the NN, eventually running the NN out of FDs and bringing down the entire cluster.
2012-11-07 05:36:33,774 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_CONTAINER_FINISHED at INITING
2012-11-07 05:36:33,775 [AsyncDispatcher event handler] WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [INIT_CONTAINER]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: INIT_CONTAINER at DONE</blockquote></li>
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee <br>
<b>Log Aggregation generates a storm of fsync() for namenode</b><br>
<blockquote>When log aggregation is on, every write to an aggregated container log causes hflush() to be called. For large clusters, this can create a lot of fsync() calls on the namenode.
We have seen 6-7x increase in the average number of fsync operations compared to 1.0.x on a large busy cluster. Over 99% of fsync ops were for log aggregation writing to tmp files.</blockquote></li>
Critical bug reported by Jason Lowe and fixed by Jason Lowe (capacityscheduler)<br>
<b>CapacityScheduler can take a very long time to schedule containers if requests are off cluster</b><br>
<blockquote>When a user runs a job where one of the input files is a large file on another cluster, the job can create many splits on nodes which are unreachable for computation from the current cluster. The off-switch delay logic in LeafQueue can cause the ResourceManager to allocate containers for the job very slowly. In one case the job was only getting one container every 23 seconds, and the queue had plenty of spare capacity.</blockquote></li>
- parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.getFinalApplicationStatus(RMAppAttemptImpl.java:295)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus(RMAppImpl.java:222)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getApps(RMWebServices.java:328)
at sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaM
...
...
..
"AsyncDispatcher event handler":
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.unregisterAttempt(ApplicationMasterService.java:307)
- waiting to lock <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(RMAppAttemptImpl.java:647)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:809)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(RMAppAttemptImpl.java:796)
at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
- locked <0x00002aabbb673090> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:478)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:81)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:436)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:417)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:619)
"IPC Server handler 36 on 8030":
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aabbc87b960> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.pullJustFinishedContainers(RMAppAttemptImpl.java:437)
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:285)
- locked <0x00002aab3d4cd698> (a org.apache.hadoop.yarn.api.records.impl.pb.AMResponsePBImpl)
at org.apache.hadoop.yarn.api.impl.pb.service.AMRMProtocolPBServiceImpl.allocate(AMRMProtocolPBServiceImpl.java:56)
at org.apache.hadoop.yarn.proto.AMRMProtocol$AMRMProtocolService$2.callBlockingMethod(AMRMProtocol.java:87)
at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:353)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1528)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1524)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1522)
Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
<b>Remove unnecessary locking in fair scheduler, and address findbugs excludes.</b><br>
<blockquote>In YARN-12, locks were added to all fields of QueueManager to address findbugs. In addition, findbugs exclusions were added in response to MAPREDUCE-4439, without a deep look at the code.</blockquote></li>
<blockquote>Eclipse doesn't seem to handle "testResources" which resolve to an absolute path. YARN-140 moved capacity-scheduler.cfg a couple of levels up to the hadoop-yarn project.</blockquote></li>
Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)<br>
<b>Capacity scheduler - containers that get reserved create container token too early</b><br>
<blockquote>The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default.
This means that by the time the NM frees up enough space on that node for the container to move to assigned the container token may have expired.</blockquote></li>
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (capacityscheduler)<br>
<b>Bunch of test failures on trunk</b><br>
<blockquote>{{CapacityScheduler.setConf()}} mandates a YarnConfiguration. It doesn't need to; throughout all of YARN, components depend only on Configuration and rely on the callers to provide the correct configuration.
This is causing multiple tests to fail.</blockquote></li>
Critical bug reported by Thomas Graves and fixed by Arun C Murthy (capacityscheduler)<br>
<b>CapacityScheduler - adding a queue while the RM is running has wacky results</b><br>
<blockquote>Adding a queue to the capacity scheduler while the RM is running, and then running a job in the newly added queue, results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount.
Looking at the RM logs, used memory can go negative but other logs show the number positive:</blockquote></li>
Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
<b>NodeManager stop() gets called twice on shutdown</b><br>
<blockquote>The stop method in the NodeManager gets called twice when the NodeManager is shut down via the shutdown hook.
The first is the stop that gets called directly by the shutdown hook. The second occurs when the NodeStatusUpdaterImpl is stopped. The NodeManager responds to the NodeStatusUpdaterImpl stop stateChanged event by stopping itself; this is so that the NodeStatusUpdaterImpl can notify the NodeManager to stop, by stopping itself in response to a request from the ResourceManager.
This could be avoided if the NodeStatusUpdaterImpl were to stop the NodeManager by calling its stop method directly.</blockquote></li>
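One way to make the double-stop harmless regardless of which path wins the race is a compare-and-set gate, sketched below; this is illustrative, not the actual NodeManager code:
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class StopOnceService {
  private final AtomicBoolean stopped = new AtomicBoolean(false);

  public void stop() {
    if (!stopped.compareAndSet(false, true)) {
      return; // second caller: already stopped, nothing to do
    }
    // release resources exactly once here
  }
}
{code}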
Minor improvement reported by Anthony Rojas and fixed by Anthony Rojas (nodemanager)<br>
<b>Update log4j.appender.EventCounter to use org.apache.hadoop.log.metrics.EventCounter</b><br>
<blockquote>We should update the log4j.appender.EventCounter in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/resources/container-log4j.properties to use *org.apache.hadoop.log.metrics.EventCounter* rather than *org.apache.hadoop.metrics.jvm.EventCounter* to avoid triggering the following warning:
{code}WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files{code}</blockquote></li>
<blockquote>1.x supports queue capacity < 1, but in 0.23 the capacity scheduler doesn't. This is an issue for us since we have a large cluster running 1.x that currently has a queue with capacity 0.5%.</blockquote></li>
Blocker improvement reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>RM should point tracking URL to RM web page for app when AM fails</b><br>
<blockquote>Currently when an ApplicationMaster fails the ResourceManager is updating the tracking URL to an empty string, see RMAppAttemptImpl.ContainerFinishedTransition. Unfortunately when the client attempts to follow the proxy URL it results in a web page showing an HTTP 500 error and an ugly backtrace because "http://" isn't a very helpful tracking URL.
It would be much more helpful if the proxy URL redirected to the RM webapp page for the specific application. That page shows the various AM attempts and pointers to their logs which will be useful for debugging the problems that caused the AM attempts to fail.</blockquote></li>
Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>Retrieving container log via NM webapp can hang with multibyte characters in log</b><br>
<blockquote>ContainerLogsBlock.printLogs currently assumes that skipping N bytes in the log file is the same as skipping N characters, but that is not true when the log contains multibyte characters. This can cause the loop that skips a portion of the log to try to skip past the end of the file and loop forever (or until Jetty kills the worker thread).</blockquote></li>
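The core of the fix is to treat the offset as a byte count and skip it on the raw stream before any character decoding; skipping on a Reader skips characters, which diverges from bytes once multibyte UTF-8 appears. A hedged sketch, not the actual NM webapp code:
{code:java}
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class LogTail {
  public static Reader openAtByteOffset(String path, long byteOffset) throws IOException {
    InputStream in = new FileInputStream(path);
    long remaining = byteOffset;
    while (remaining > 0) {
      long skipped = in.skip(remaining); // skip raw bytes, not characters
      if (skipped <= 0) break;           // EOF or no progress: stop, don't loop forever
      remaining -= skipped;
    }
    return new InputStreamReader(in, StandardCharsets.UTF_8);
  }
}
{code}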
Major bug reported by Chris Nauroth and fixed by Chris Nauroth (api)<br>
<b>Yarn Common has multiple compiler warnings for unchecked operations</b><br>
<blockquote>The warnings are in classes StateMachineFactory, RecordFactoryProvider, RpcFactoryProvider, and YarnRemoteExceptionFactoryProvider. OpenJDK 1.6.0_24 actually treats these as compilation errors, causing the build to fail.</blockquote></li>
Major bug reported by Thomas Graves and fixed by Thomas Graves (resourcemanager)<br>
<b>RM web ui applications page should be sorted to display last app first </b><br>
<blockquote>RM web ui applications page should be sorted to display the last app first.
It currently sorts with the smallest application id first, i.e. the first apps that were submitted. Once you have more than one page worth of apps, it's much more useful for it to sort such that the biggest appid (last submitted app) shows up first.</blockquote></li>
Major bug reported by Robert Joseph Evans and fixed by Ravi Prakash <br>
<b>Browser thinks RM main page JS is taking too long</b><br>
<blockquote>The main RM page with the default settings of 10,000 applications can cause browsers to think that the JS on the page is stuck and ask you if you want to kill it. This is a big usability problem.</blockquote></li>
Major bug reported by Bikas Saha and fixed by Bikas Saha <br>
<b>AppRejectedTransition does not unregister app from master service and scheduler</b><br>
<blockquote>AttemptStartedTransition() adds the app to the ApplicationMasterService and scheduler. When the scheduler rejects the app, AppRejectedTransition() forgets to unregister it from the ApplicationMasterService.</blockquote></li>
Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager)<br>
<b>Add a Web UI to the fair share scheduler</b><br>
<blockquote>The fair scheduler had a UI in MR1. Port the capacity scheduler web UI and modify appropriately for the fair share scheduler.</blockquote></li>
java.lang.IllegalArgumentException: Illegal capacity of -1 for queue root
{code}
Which basically arises from missing basic configurations that, in many cases, there is no need to provide explicitly; a default configuration would be sufficient. For example, to address the error above, the user needs to add a capacity of 100 to the root queue.
So, we need to add a capacity-scheduler-default.xml. This will be helpful to provide the basic set of default configurations required to run the capacity scheduler. The user can still override existing configurations or provide new ones in capacity-scheduler.xml. This is similar to *-default.xml vs *-site.xml for yarn, core, mapred, hdfs, etc.
Major bug reported by Nathan Roberts and fixed by Vinod Kumar Vavilapalli (api)<br>
<b>Interrupted Exception within AsyncDispatcher leads to user confusion</b><br>
<blockquote>Successful applications tend to get InterruptedExceptions during shutdown. The exception is harmless but it leads to lots of user confusion and therefore could be cleaned up.
at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:105)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:437)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:402)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:619)
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.mapreduce.v2.app.MRAppMaster is stopped.
2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye</blockquote></li>
Major bug reported by Thomas Graves and fixed by Ravi Prakash (resourcemanager)<br>
<b>update web services docs for RM clusterMetrics</b><br>
<blockquote>Looks like jira https://issues.apache.org/jira/browse/MAPREDUCE-3747 added in more RM cluster metrics but the docs didn't get updated: http://hadoop.apache.org/docs/r0.23.3/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Metrics_API</blockquote></li>
Major improvement reported by Tom White and fixed by Tom White (client)<br>
<b>Simplify classpath construction for mini YARN tests</b><br>
<blockquote>The test classpath includes a special file called 'mrapp-generated-classpath' (or similar in distributed shell) that is constructed at build time, and whose contents are a classpath with all the dependencies needed to run the tests. When the classpath for a container (e.g. the AM) is constructed the contents of mrapp-generated-classpath is read and added to the classpath, and the file itself is then added to the classpath so that later when the AM constructs a classpath for a task container it can propagate the test classpath correctly.
This mechanism can be drastically simplified by propagating the system classpath of the current JVM (read from the java.class.path property) to a launched JVM, but only if running in the context of the mini YARN cluster. Any tests that use the mini YARN cluster will automatically work with this change, although any that explicitly deal with mrapp-generated-classpath can be simplified.</blockquote></li>
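A minimal sketch of the simplification, assuming a hypothetical helper and env-variable name:
{code:java}
import java.util.Map;

public class TestClasspathHelper {
  public static void propagateTestClasspath(Map<String, String> containerEnv,
                                            boolean inMiniYarnCluster) {
    if (inMiniYarnCluster) {
      // The current JVM's system classpath already contains every dependency
      // the test needs, so hand it to the launched container verbatim.
      containerEnv.put("CLASSPATH", System.getProperty("java.class.path"));
    }
  }
}
{code}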
Major bug reported by xieguiming and fixed by xieguiming (resourcemanager)<br>
<b>RM is missing ability to add include/exclude files without a restart</b><br>
<blockquote>The "yarn.resourcemanager.nodes.include-path" default value is "", if we need to add an include file, we must currently restart the RM.
I suggest that for adding an include or exclude file, there should be no need to restart the RM. We may only execute the refresh command. The HDFS NameNode already has this ability.
Fix is to the modify HostsFileReader class instances:
Major improvement reported by Bikas Saha and fixed by Bikas Saha <br>
<b>Add a yarn AM - RM client module</b><br>
<blockquote>Add a basic client wrapper library to the AM RM protocol in order to prevent proliferation of code being duplicated everywhere. Provide helper functions to perform reverse mapping of container requests to RM allocation resource request table format.</blockquote></li>
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>Diagnostics missing from applications that have finished but failed</b><br>
<blockquote>If an application finishes in the YARN sense but fails in the app framework sense (e.g.: a failed MapReduce job) then diagnostics are missing from the RM web page for the application. The RM should be reporting diagnostic messages even for successful YARN applications.</blockquote></li>
Minor bug reported by Andy Isaacson and fixed by Hemanth Yamijala (nodemanager)<br>
<b>YARN local-dirs defaults to /tmp/nm-local-dir</b><br>
<blockquote>{{yarn.nodemanager.local-dirs}} defaults to {{/tmp/nm-local-dir}}. It should be {{hadoop.tmp.dir}}/nm-local-dir or similar. Among other problems, this can prevent multiple test clusters from starting on the same machine.
Thanks to Hemanth Yamijala for reporting this issue.</blockquote></li>
Major bug reported by Hitesh Shah and fixed by Sandy Ryza (nodemanager)<br>
<b>NM should handle cleaning up containers when it shuts down</b><br>
<blockquote>Ideally, when the NM gets a shutdown signal, it should wait a limited amount of time for existing containers to complete, and kill the containers (if we pick an aggressive approach) after this time interval.
For NMs which come up after an unclean shutdown, the NM should look through its directories for existing container.pids and try to kill any existing containers matching the pids found.</blockquote></li>
Blocker sub-task reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>Implement renewal / cancellation of Delegation Tokens</b><br>
<blockquote>Currently, delegation tokens issued by the RM and History server cannot be renewed or cancelled. This needs to be implemented.</blockquote></li>
testDecommissionWithIncludeHosts(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) Time elapsed: 0.086 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<0> but was:<1>
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:283)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:195)
at junit.framework.Assert.assertEquals(Assert.java:201)
at org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testDecommissionWithIncludeHosts(TestResourceTrackerService.java:90)</blockquote></li>
Major bug reported by Thomas Graves and fixed by Thomas Graves <br>
<b>TestCompositeService fails on jdk7</b><br>
<blockquote>The test TestCompositeService fails when run with JDK 7.
It appears to expect testCallSequence to be called first, with the sequence numbers starting at 0. On JDK 7 it is not called first, and the sequence number has already been incremented.</blockquote></li>
<blockquote>In FS, FSQueueSchedulable#updateDemand() limits the demand to maxTasks only after iterating through all the pools and computing the final demand.
By checking if the demand has reached maxTasks in every iteration, we can avoid redundant work, at the expense of one condition check every iteration.</blockquote></li>
Major new feature reported by Arun C Murthy and fixed by Arun C Murthy (capacityscheduler , scheduler)<br>
<b>Enhance CS to schedule accounting for both memory and cpu cores</b><br>
<blockquote>With YARN being a general purpose system, it would be useful for several applications (MPI et al) to specify not just memory but also CPU (cores) for their resource requirements. Thus, it would be useful to the CapacityScheduler to account for both.</blockquote></li>
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (applicationmaster , security)<br>
<b>Use token request messages defined in hadoop common </b><br>
<blockquote>The renewer field of the protobuf message GetDelegationTokenRequestProto is changed from optional to required. This change is not wire compatible with the older releases.</blockquote></li>
Major bug reported by Joshua Blatt and fixed by (balancer)<br>
<b>hdfs balancer command returns exit code 1 on success instead of 0</b><br>
<blockquote>This is an incompatible change from release 2.0.2-alpha and prior releases. Balancer tool exited with exit code 1 on success. It is changed to exit with exit code 0 on success. Non 0 exit code indicates failure.</blockquote></li>
Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (hdfs-client)<br>
<b>DFSClient can infer checksum type when not provided by reading first byte</b><br>
<blockquote>The HDFS implementation of getFileChecksum() can now operate correctly against earlier-version datanodes which do not include the checksum type information in their checksum response. The checksum type is automatically inferred by issuing a read of the first byte of each block.</blockquote></li>
Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>GetBlockKeysResponseProto does not handle null response</b><br>
<blockquote>The keys member of the protobuf message GetBlockKeysResponseProto is changed from required to optional so that null values can be passed over the wire. This is an incompatible wire protocol change, but it does not affect API backward compatibility.</blockquote></li>
Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas (namenode)<br>
<b>GetDataEncryptionKeyResponseProto does not handle null response</b><br>
<blockquote>The dataEncryptionKey member of the protobuf message GetDataEncryptionKeyResponseProto is changed from required to optional. This incompatible change is not likely to affect existing users (those using the HDFS FileSystem and other public APIs).</blockquote></li>
Blocker bug reported by Suresh Srinivas and fixed by Suresh Srinivas <br>
<b>GetLinkTargetResponseProto does not handle null path</b><br>
<blockquote>The targetPath member of the protobuf message GetLinkTargetResponseProto is changed from required to optional so that null values can be passed over the wire. This is an incompatible wire protocol change, but it does not affect API backward compatibility.</blockquote></li>
Major bug reported by Andrew Wang and fixed by Andrew Wang <br>
<b>Make enabling of stale marking on read and write paths independent</b><br>
<blockquote>This patch makes an incompatible configuration change, as described below:
In release 1.1.0 and other 1.1.x point releases, the configuration parameter "dfs.namenode.check.stale.datanode" could be used to turn on checking for stale nodes. This configuration is no longer supported from release 1.2.0 onwards and has been renamed to "dfs.namenode.avoid.read.stale.datanode".
How the feature works and how to configure it:
As described in the HDFS-3703 release notes, the datanode stale period can be configured using the parameter "dfs.namenode.stale.datanode.interval" in seconds (default value is 30 seconds). The NameNode can be configured to use this staleness information for reads using the configuration "dfs.namenode.avoid.read.stale.datanode". When this parameter is set to true, the namenode picks a stale datanode as the last target to read from when returning block locations for reads. Using staleness information for writes is described in the release notes of HDFS-3912.</blockquote></li>
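As a minimal sketch of wiring up the keys named in this note (the key names come from the note itself; the values shown are only examples, and in practice these settings would normally live in hdfs-site.xml), the stale-read behavior could be enabled programmatically:
{noformat}
import org.apache.hadoop.conf.Configuration;

public class StaleReadConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Interval after which a datanode is considered stale (per the note: seconds).
    conf.set("dfs.namenode.stale.datanode.interval", "30");
    // Have the NameNode pick stale datanodes last when serving read locations.
    conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
  }
}
{noformat}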
Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (datanode , hdfs-client , namenode)<br>
<b>Cleanup HDFS logs and reduce the size of logged messages</b><br>
<blockquote>The change from this jira changes the content of some of the log messages. No log messages are removed. Only the content of the log messages is changed to reduce the size. If you have a tool that depends on the exact content of the log, please look at the patch and make appropriate updates to the tool.</blockquote></li>
Minor sub-task reported by Jing Zhao and fixed by Jing Zhao (datanode , namenode)<br>
<b>Add number of stale DataNodes to metrics</b><br>
<blockquote>This jira adds a new metric named "StaleDataNodes", of type Gauge, under the metrics context "dfs". It tracks the number of DataNodes marked as stale. A DataNode is marked stale when the heartbeat message from the DataNode is not received within the time configured by "dfs.namenode.stale.datanode.interval".
Please see the hdfs-default.xml documentation for "dfs.namenode.stale.datanode.interval" for more details on how to configure this feature. When the feature is not configured, this metric returns zero.</blockquote></li>
Major improvement reported by nkeywal and fixed by Jing Zhao (datanode , namenode)<br>
<b>Decrease the datanode failure detection time</b><br>
<blockquote>This jira adds a new DataNode state called "stale" at the NameNode. A DataNode is marked as stale if it does not send a heartbeat message to the NameNode within the timeout configured via the parameter "dfs.namenode.stale.datanode.interval" in seconds (default value is 30 seconds). The NameNode picks a stale datanode as the last target to read from when returning block locations for reads.
This feature is turned *off* by default. To turn on the feature, set the HDFS configuration "dfs.namenode.check.stale.datanode" to true.</blockquote></li>
Minor test reported by Steve Loughran and fixed by Steve Loughran (fs , test)<br>
<b>Add test to FileSystemContractBaseTest to verify integrity of overwritten files</b><br>
<blockquote>The patch adds more tests to verify overwrites and more complex write-delete-overwrite operations. By using differently sized datasets with different data inside, these tests verify that the overwrite really did take place. While HDFS meets all these requirements directly, eventually consistent object stores may not, hence these tests.</blockquote></li>
Major improvement reported by Todd Lipcon and fixed by Robert Parker (ipc)<br>
<b>Allow configuration of IPC connect timeout</b><br>
<blockquote>This jira introduces a new configuration parameter "ipc.client.connect.timeout". This configuration defines the Hadoop RPC connection timeout in milliseconds for a client to connect to a server. For details see the description associated with this configuration in core-default.xml.</blockquote></li>
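For illustration only (the key name comes from the note above; the value is an example, and the setting would normally go in core-site.xml, with core-default.xml as the reference), the new timeout could be set from client code:
{noformat}
import org.apache.hadoop.conf.Configuration;

public class IpcConnectTimeoutSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hadoop RPC connect timeout, in milliseconds (example: 20 seconds).
    conf.setInt("ipc.client.connect.timeout", 20000);
  }
}
{noformat}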
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (ipc)<br>
<b>SASL negotiation is flawed</b><br>
<blockquote>The RPC SASL negotiation now always ends with final response. If the SASL mechanism does not have a final response (GSSAPI, PLAIN), then an empty success response is sent to the client. The client will now always expect a final response to definitively know if negotiation is complete/successful.</blockquote></li>
<blockquote>The default group mapping policy has been changed to JniBasedUnixGroupsNetgroupMappingWithFallback. This should maintain the same semantics as the prior default for most users.</blockquote></li>
Major improvement reported by Siddharth Seth and fixed by Siddharth Seth (scheduler)<br>
<b>Change the default scheduler to the CapacityScheduler</b><br>
<blockquote>There are some bugs in the FifoScheduler at the moment: it doesn't distribute tasks across nodes, and it has some headroom (available resource) issues.
That's not the best experience for users trying out the 2.0 branch. The CS with the default configuration of a single queue behaves the same as the FifoScheduler and doesn't have these issues.</blockquote></li>
Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>FSDownload can create cache directories with the wrong permissions</b><br>
<blockquote>When the cluster is configured with a restrictive umask, e.g.: {{fs.permissions.umask-mode=0077}}, the nodemanager can end up creating directory entries in the public cache with the wrong permissions. The permissions can end up where only the nodemanager user can access files in the public cache, preventing jobs from running properly.</blockquote></li>
Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>Nodemanager needs to set permissions of local directories</b><br>
<blockquote>If the nodemanager process is running with a restrictive default umask (e.g.: 0077) then it will create its local directories with permissions that are too restrictive to allow containers from other users to run.</blockquote></li>
Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>DefaultContainerExecutor can fail to set proper permissions</b><br>
<blockquote>{{DefaultContainerExecutor}} can fail to set the proper permissions on its local directories if the cluster has been configured with a restrictive umask, e.g.: fs.permissions.umask-mode=0077. The configured umask ends up defeating the permissions requested by {{DefaultContainerExecutor}} when it creates directories.</blockquote></li>
Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>NM ResourceLocalizationService does not set permissions of local cache directories</b><br>
<blockquote>{{ResourceLocalizationService}} creates a file cache and user cache directory when it starts up but doesn't specify the permissions for them when they are created. If the cluster configs are set to limit the default permissions (e.g.: fs.permissions.umask-mode=0077 instead of the default 0022), then the cache directories are created with too-restrictive permissions and no jobs are able to run.</blockquote></li>
Major improvement reported by Todd Lipcon and fixed by Arun C Murthy (capacityscheduler)<br>
<b>Support delay scheduling for node locality in MR2's capacity scheduler</b><br>
<blockquote>The capacity scheduler in MR2 doesn't support delay scheduling for achieving node-level locality. So, jobs exhibit poor data locality even if they have good rack locality. Especially on clusters where disk throughput is much better than network capacity, this hurts overall job performance. We should optionally support node-level delay scheduling heuristics similar to what the fair scheduler implements in MR1.</blockquote></li>
Major bug reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>RMContainer should handle a RELEASE event while RUNNING</b><br>
<blockquote>An AppMaster can send a container release at any point. Currently this results in an exception if it is done while the RM considers the container to be RUNNING.
The event not being processed correctly also implies that these containers do not show up in the Completed Container List seen by the AM (AMRMProtocol). MR-3902 depends on this set being complete. </blockquote></li>
Major bug reported by patrick white and fixed by Daryn Sharp (nodemanager)<br>
<b>NodeManager will refuse to shutdown indefinitely due to container log aggregation</b><br>
<blockquote>The nodemanager is able to get into a state where containermanager.logaggregation.AppLogAggregatorImpl will apparently wait indefinitely for log aggregation to complete for an application, even if that application has abnormally terminated and is no longer present.
Observed behavior is that an attempt to stop the nodemanager daemon will return but have no effect; the NM log continually displays messages similar to this:
Waiting for aggregation to complete for application_1345221477405_2733
The only recovery we found to work was to 'kill -9' the NM process.
What exactly causes the NM to enter this state is unclear, but we see this behavior reliably when the NM has run a task that failed. For example, when debugging Oozie distcp actions and having a distcp map task fail, the NM that was running the container enters this state, where a shutdown of said NM never completes; 'never' in this case meant waiting for 2 hours before killing the nodemanager process.</blockquote></li>
Critical bug reported by Thomas Graves and fixed by Thomas Graves (nodemanager)<br>
<b>aggregated logs permissions not set properly</b><br>
<blockquote>If the default file permissions are set to something restrictive, like 700, application logs get aggregated and created with those restrictive file permissions, which doesn't allow the history server to serve them up.
They need to be created group-readable, similar to how log aggregation sets up the directory permissions.</blockquote></li>
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
<b>RMNodeImpl is missing valid transitions from the UNHEALTHY state</b><br>
<blockquote>The ResourceManager isn't properly handling nodes that have been marked UNHEALTHY when they are lost or are decommissioned.</blockquote></li>
Blocker sub-task reported by Daryn Sharp and fixed by Vinod Kumar Vavilapalli (nodemanager)<br>
<b>NMs rejects all container tokens after secret key rolls</b><br>
<blockquote>The NM's token secret manager will reject all container tokens after the secret key is activated, which means the NM will not launch _any_ containers, including AMs. The whole YARN cluster becomes inoperable within a day.</blockquote></li>
Minor bug reported by Jason Lowe and fixed by Mayank Bansal (resourcemanager)<br>
<b>TestRMAppTransitions.testAppSubmittedKilled passes for the wrong reason</b><br>
<blockquote>TestRMAppTransitions#testAppSubmittedKilled causes an invalid event exception but the test doesn't catch the error since the final app state is still killed. Killed for the wrong reason, but the final state is the same.</blockquote></li>
Blocker bug reported by Eli Collins and fixed by Radim Kolar <br>
<b>branch-2.1.0-alpha doesn't build</b><br>
<blockquote>branch-2.1.0-alpha doesn't build due to the following. Per YARN-1 I updated the mvn version to be 2.1.0-SNAPSHOT; before I hit this issue it didn't compile due to the bogus version.
{noformat}
hadoop-branch-2.1.0-alpha $ mvn compile
[INFO] Scanning for projects...
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR] The project org.apache.hadoop:hadoop-yarn-project:2.1.0-SNAPSHOT (/home/eli/src/hadoop-branch-2.1.0-alpha/hadoop-yarn-project/pom.xml) has 1 error
[ERROR] 'dependencies.dependency.version' for org.hsqldb:hsqldb:jar is missing. @ line 160, column 17
{noformat}</blockquote></li>
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (client)<br>
<b>Add a yarn-client module</b><br>
<blockquote>I see that we are duplicating (some) code for talking to RM via client API. In this light, a yarn-client module will be useful so that clients of all frameworks can use/extend it.
And that same module can be the destination for all the YARN's command line tools.</blockquote></li>
Major bug reported by Ramya Sunil and fixed by Arun C Murthy <br>
<b>Failed refreshQueues due to misconfiguration prevents further refreshing of queues</b><br>
<blockquote>Stumbled upon this problem while refreshing queues with incorrect configuration. The exact scenario was:
1. Added a new queue "newQueue" without defining its capacity.
2. "bin/mapred queue -refreshQueues" fails correctly with "Illegal capacity of -1 for queue root.newQueue"
3. However, after defining the capacity of "newQueue", a second "bin/mapred queue -refreshQueues" throws "org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=newQueue already exists!" We also see the Hadoop:name=QueueMetrics,q0=root,q1=newQueue,service=ResourceManager metrics being available even though the queue was not added.
The expected behavior would be to refresh the queues correctly and allow addition of "newQueue". </blockquote></li>
Major bug reported by Thomas Graves and fixed by Robert Joseph Evans <br>
<b>remove old aggregated logs</b><br>
<blockquote>Currently the aggregated user logs under NM_REMOTE_APP_LOG_DIR are never removed. We should have a mechanism to remove them after a certain period.
It might make sense for job history server to remove them.</blockquote></li>
Minor bug reported by Eli Collins and fixed by Mayank Bansal <br>
<b>Using URI for yarn.nodemanager log dirs fails</b><br>
<blockquote>If I use URIs (eg file:///home/eli/hadoop/dirs) for yarn.nodemanager.log-dirs or yarn.nodemanager.remote-app-log-dir the container log servlet fails with an NPE (works if I remove the "file" scheme). Using a URI for yarn.nodemanager.local-dirs works.</blockquote></li>
Critical bug reported by Todd Lipcon and fixed by <br>
<b>Merge of yarn reorg into branch-2 copied trunk tree</b><br>
<blockquote>When the move of yarn from inside MR to the project root was merged into branch-2, it seems like the trunk code base was actually copied into the branch-2 branch, instead of a parallel move occurring. So, the poms in branch-2 show the version as 3.0.0-SNAPSHOT instead of a 2.x snapshot version. This is breaking the branch-2 build.</blockquote></li>
Major bug reported by Junping Du and fixed by Junping Du (scheduler)<br>
<b>Several Findbugs issues with new FairScheduler in YARN</b><br>
<blockquote>The FairScheduler feature was recently added to YARN. As shown by a recent PreCommit test run from MAPREDUCE-4309, Findbugs found several bugs related to FairScheduler:
Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerEventLog.logDisabled; locked 50% of time
Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.queueMaxAppsDefault; locked 50% of time
Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.userMaxAppsDefault; locked 50% of time
The details are in: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2612//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#DE_MIGHT_IGNORE</blockquote></li>
Critical bug reported by Ravi Prakash and fixed by Ravi Prakash (mrv2)<br>
<b>In mapred-default, mapreduce.map.maxattempts & mapreduce.reduce.maxattempts defaults are set to 4 as well as mapreduce.job.maxtaskfailures.per.tracker. </b><br>
Major bug reported by Anupam Seth and fixed by Anupam Seth (mrv2)<br>
<b>User set java.library.path seems to overwrite default creating problems native lib loading</b><br>
<blockquote>-Djava.library.path in mapred.child.java.opts can cause issues with native libraries. LD_LIBRARY_PATH through mapred.child.env should be used instead.</blockquote></li>
Trivial improvement reported by Koji Noguchi and fixed by Thomas Graves (jobhistoryserver , jobtracker)<br>
<b>Add jobname to jobsummary log</b><br>
<blockquote>The Job Summary log may contain commas in values that are escaped by a '\' character. This was true before, but is more likely to be exposed now. </blockquote></li>
Minor improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node , performance)<br>
<b>Enable fadvise readahead by default</b><br>
<blockquote>The datanode now performs 4MB readahead by default when reading data from its disks, if the native libraries are present. This has been shown to improve performance in many workloads. The feature may be disabled by setting dfs.datanode.readahead.bytes to "0".</blockquote></li>
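A rough sketch of tuning this behavior (the key name comes from the note above; the values are only examples, and the setting would normally go in hdfs-site.xml):
{noformat}
import org.apache.hadoop.conf.Configuration;

public class ReadaheadConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Disable datanode readahead entirely, as described in the note above...
    conf.setLong("dfs.datanode.readahead.bytes", 0);
    // ...or raise it above the 4MB default (example: 8MB).
    conf.setLong("dfs.datanode.readahead.bytes", 8 * 1024 * 1024);
  }
}
{noformat}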
Major bug reported by Brandon Li and fixed by Brandon Li (name-node)<br>
<b>If NN is in safemode, it should throw SafeModeException when getBlockLocations has zero locations</b><br>
<blockquote>getBlockLocations(), and hence open() for read, will now throw SafeModeException if the NameNode is still in safe mode and there are no replicas reported yet for one of the blocks in the file.</blockquote></li>
Trivial improvement reported by Harsh J and fixed by Harsh J <br>
<b>Make the replication and invalidation rates configurable</b><br>
<blockquote>This change adds two new configuration parameters.
# {{dfs.namenode.invalidate.work.pct.per.iteration}} for controlling deletion rate of blocks.
# {{dfs.namenode.replication.work.multiplier.per.iteration}} for controlling replication rate. This in turn allows controlling the time it takes for decommissioning.
Please see hdfs-default.xml for detailed description.</blockquote></li>
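As a hedged sketch (the key names come from this note; the values are examples only, and hdfs-default.xml remains the authoritative reference), the two new knobs could be set like so:
{noformat}
import org.apache.hadoop.conf.Configuration;

public class ReplicationRateConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Fraction of blocks considered for deletion per iteration (example value).
    conf.setFloat("dfs.namenode.invalidate.work.pct.per.iteration", 0.32f);
    // Multiplier controlling replication work per iteration (example value).
    conf.setInt("dfs.namenode.replication.work.multiplier.per.iteration", 2);
  }
}
{noformat}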
Major bug reported by Matthew Jacobs and fixed by Matthew Jacobs (name-node)<br>
<b>HostsFileReader silently ignores bad includes/excludes</b><br>
<blockquote>HDFS no longer silently ignores missing or unreadable host files specified by dfs.hosts or dfs.hosts.exclude. In order to specify that no hosts should be included or excluded, administrators should either refrain from setting the relevant config properties, or create an empty file in order to represent an empty list.</blockquote></li>
Major bug reported by Brahma Reddy Battula and fixed by Brandon Li (name-node)<br>
<b>During NameNode startup, it may pick the wrong storage directory inspector when the layout versions of the storage directories are different</b><br>
Major new feature reported by Aaron T. Myers and fixed by Todd Lipcon (name-node)<br>
<b>Add an admin command to trigger an edit log roll</b><br>
<blockquote>Introduced a new command, "hdfs dfsadmin -rollEdits" which requests that the active NameNode roll its edit log. This can be useful for administrators manually backing up log segments.</blockquote></li>
Major improvement reported by Todd Lipcon and fixed by Suresh Srinivas (data-node , name-node)<br>
<b>Remove DistributedUpgrade related code</b><br>
<blockquote>This jira removes functionality that has not been used/applicable since release 0.17. The incompatibility introduced by this change will not affect any HDFS users.</blockquote></li>
Major improvement reported by Jakob Homan and fixed by Jakob Homan (security)<br>
<b>Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution</b><br>
<blockquote>Due to the requirement that KSSL use weak encryption types for Kerberos tickets, HTTP authentication to the NameNode will now use SPNEGO by default. This will require users of previous branch-1 releases with security enabled to modify their configurations and create new Kerberos principals in order to use SPNEGO. The old behavior of using KSSL can optionally be enabled by setting the configuration option "hadoop.security.use-weak-http-crypto" to "true".</blockquote></li>
Major improvement reported by Eli Collins and fixed by Eli Collins (fs)<br>
<b>Remove ability for users to easily run the trash emptier</b><br>
<blockquote>The trash emptier may no longer be run using "hadoop org.apache.hadoop.fs.Trash". The trash emptier runs on the NameNode (if configured). Old trash checkpoints may be deleted using "hadoop fs -expunge".</blockquote></li>
Major bug reported by Arun A K and fixed by (util)<br>
<b>In TextInputFormat, when specifying textinputformat.record.delimiter, characters/character sequences in the data file similar to the starting character/character sequence of the delimiter were found missing, in certain cases, from the Map output</b><br>
Major bug reported by Robert Joseph Evans and fixed by John George (fs)<br>
<b>fs -mkdir creates parent directories without the -p option</b><br>
<blockquote>FsShell's "mkdir" no longer implicitly creates all non-existent parent directories. The command adopts the posix compliant behavior of requiring the "-p" flag to auto-create parent directories.</blockquote></li>
Major improvement reported by Eli Collins and fixed by Eli Collins (data-node)<br>
<b>Move DatanodeInfo#hostName to DatanodeID</b><br>
<blockquote>This change modifies DatanodeID, which is part of the client to server protocol, therefore clients must be upgraded with servers.</blockquote></li>
Major improvement reported by Eli Collins and fixed by Eli Collins (data-node)<br>
<b>Refactor DatanodeID#getName by use</b><br>
<blockquote>This change modifies DatanodeID, which is part of the client to server protocol, therefore clients must be upgraded with servers.</blockquote></li>
Major improvement reported by Eli Collins and fixed by Eli Collins <br>
<b>Move DatanodeInfo#ipcPort to DatanodeID</b><br>
<blockquote>This change modifies DatanodeID, which is part of the client to server protocol, therefore clients must be upgraded with servers.</blockquote></li>
Major improvement reported by Eli Collins and fixed by Eli Collins (name-node)<br>
<b>Bump LAST_UPGRADABLE_LAYOUT_VERSION to -16</b><br>
<blockquote>Upgrade from Hadoop versions earlier than 0.18 is not supported as of 2.0. To upgrade from an earlier release, first upgrade to 0.18, and then upgrade again from there.</blockquote></li>
Major improvement reported by Arpit Gupta and fixed by Arpit Gupta <br>
<b>add -nonInteractive and -force option to namenode -format command</b><br>
<blockquote>The 'namenode -format' command now supports the flags '-nonInteractive' and '-force' to improve usefulness without user input.</blockquote></li>
Major improvement reported by Eli Collins and fixed by Colin Patrick McCabe (name-node)<br>
<b>fsck move should be non-destructive by default</b><br>
<blockquote>The fsck "move" option is no longer destructive. It copies the accessible blocks of corrupt files to lost and found as before, but no longer deletes the corrupt files after copying the blocks. The original, destructive behavior can be enabled by specifying both the "move" and "delete" options. </blockquote></li>
Major new feature reported by Aaron T. Myers and fixed by Todd Lipcon (ha)<br>
<b>HA: Autopopulate standby name dirs if they're empty</b><br>
<blockquote>The HA NameNode may now be started with the "-bootstrapStandby" flag. This causes it to copy the namespace information and most recent checkpoint from its HA pair, and save it to local storage, allowing an HA setup to be bootstrapped without use of rsync or external tools.</blockquote></li>
Major improvement reported by Roman Shaposhnik and fixed by Mingjie Lai (build , scripts)<br>
<b>Unbundle jsvc</b><br>
<blockquote>To run secure Datanodes users must install jsvc for their platform and set JSVC_HOME to point to the location of jsvc in their environment.</blockquote></li>
Major improvement reported by Sanjay Radia and fixed by Sanjay Radia (ipc)<br>
<b>ProtoBuf RPC engine does not need it own reply packet - it can use the IPC layer reply packet.</b><br>
<blockquote>This change will affect the output of errors for some Hadoop CLI commands. Specifically, the name of the exception class will no longer appear, and instead only the text of the exception message will appear.</blockquote></li>
Major sub-task reported by Suresh Srinivas and fixed by Daryn Sharp (fs)<br>
<b>Handle paths using back slash as path separator for windows only</b><br>
<blockquote>This jira only allows providing paths using backslash as separator on Windows. The backslash on *nix systems will be used as an escape character. The support for paths using backslash as the path separator will be removed in HADOOP-8139 in release 0.23.3.</blockquote></li>
Major improvement reported by Patrick Hunt and fixed by Patrick Hunt (conf)<br>
<b>cap space usage of default log4j rolling policy </b><br>
<blockquote>Hadoop log files are now rolled by size instead of date (daily) by default. Tools that depend on the log file name format will need to be updated. Users who would like to maintain the previous settings of hadoop.root.logger and hadoop.security.logger can use their current log4j.properties files and update the HADOOP_ROOT_LOGGER and HADOOP_SECURITY_LOGGER environment variables to use DRFA and DRFAS respectively.</blockquote></li>
Major bug reported by Vinod Kumar Vavilapalli and fixed by Siddharth Seth (mrv2)<br>
<b>MR tasks failing due to changing timestamps on Resources to download</b><br>
<blockquote>Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertently changing the timestamps.</blockquote></li>
Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (name-node)<br>
<b>Secondary NN HTTPS address should be listed as a NAMESERVICE_SPECIFIC_KEY</b><br>
<blockquote>The configuration dfs.secondary.https.port has been renamed to dfs.namenode.secondary.https-port for consistency. The old configuration is still supported via a deprecation path.</blockquote></li>
Major sub-task reported by Daryn Sharp and fixed by Daryn Sharp (fs)<br>
<b>Add mkdir -p flag</b><br>
<blockquote>FsShell mkdir now accepts a -p flag. Like unix, mkdir -p will not fail if the directory already exists. Unlike unix, intermediate directories are always created, regardless of the flag, to avoid incompatibilities at this time.</blockquote></li>
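A small usage sketch of the new flag, driving FsShell programmatically (the path is hypothetical; the equivalent CLI would be "hadoop fs -mkdir -p /user/alice/data"):
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

public class MkdirPSketch {
  public static void main(String[] args) throws Exception {
    // -p: succeed even if the directory already exists.
    int rc = ToolRunner.run(new Configuration(), new FsShell(),
        new String[] {"-mkdir", "-p", "/user/alice/data"});
    System.exit(rc);
  }
}
{noformat}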
Critical bug reported by Siddharth Seth and fixed by Siddharth Seth (mr-am , mrv2)<br>
<b>If multiple hosts for a split belong to the same rack, the rack is added multiple times in the AM request table</b><br>
<blockquote>Changed MR AM to not add the same rack entry multiple times into the container request table when multiple hosts for a split happen to be on the same rack</blockquote></li>
Critical sub-task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)<br>
<b>Data Locality suffers if the AM asks for containers using IPs instead of hostnames</b><br>
<blockquote>Fixed MR AM to always use hostnames and never IPs when requesting containers so that the scheduler can give out data-local containers correctly.</blockquote></li>
Major improvement reported by Amar Kamat and fixed by Amar Kamat (contrib/gridmix)<br>
<b>[Gridmix] Improve STRESS mode</b><br>
<blockquote>JobMonitor can now deploy multiple threads for faster job-status polling. Use 'gridmix.job-monitor.thread-count' to set the number of threads. Stress mode now relies on the updates from the job monitor instead of polling for job status. Failures in job submission now get reported to the statistics module and ultimately reported to the user via summary.</blockquote></li>
Major bug reported by Ramya Sunil and fixed by Arun C Murthy (mrv2)<br>
<b>maxActiveApplications(|PerUser) per queue is too low for small clusters</b><br>
<blockquote>Fixed CapacityScheduler so that maxActiveApplication and maxActiveApplicationsPerUser per queue are not too low for small clusters. </blockquote></li>
Blocker bug reported by Jonathan Eagles and fixed by Jonathan Eagles (mrv2)<br>
<b>java.io.File.createTempFile fails in map/reduce tasks</b><br>
<blockquote>Fixed YARN+MR to allow MR jobs to use java.io.File.createTempFile to create temporary files as part of their tasks.</blockquote></li>
Blocker bug reported by Siddharth Seth and fixed by Arun C Murthy (mrv2 , resourcemanager)<br>
<b>Incorrect headroom reported to jobs</b><br>
<blockquote>Fixed the way head-room is allocated to applications by CapacityScheduler so that it deducts current-usage per user and not per-application.</blockquote></li>
Blocker sub-task reported by Siddharth Seth and fixed by Robert Joseph Evans (mrv2)<br>
<b>AppMaster recovery for Medium to large jobs take long time</b><br>
<blockquote>Fixed MR AM recovery so that only a single selected task's output is recovered, thus reducing the unnecessarily bloated recovery time.</blockquote></li>
Blocker sub-task reported by Arun C Murthy and fixed by Arun C Murthy (mrv2 , scheduler)<br>
<b>CapacityScheduler should be more conservative assigning off-switch requests</b><br>
<blockquote>Making CapacityScheduler more conservative so as to assign only one off-switch container in a single scheduling iteration.</blockquote></li>
Major improvement reported by Ravi Gummadi and fixed by Ravi Gummadi (tools/rumen)<br>
<b>Provide a way to access other info of history file from Rumentool</b><br>
<blockquote>Rumen now provides {{Parsed*}} objects. These objects provide extra information that are not provided by {{Logged*}} objects.</blockquote></li>
Critical bug reported by Karam Singh and fixed by Bhallamudi Venkata Siva Kamesh (mrv2 , nodemanager)<br>
<b>When 0 is provided as the port number in yarn.nodemanager.webapp.address, the NM's webserver component picks a random port, but the NM keeps reporting port 0 to the RM</b><br>
<blockquote>Modified NM to report correct http address when an ephemeral web port is configured.</blockquote></li>
Major bug reported by Siddharth Seth and fixed by Siddharth Seth (mr-am , mrv2)<br>
<b>The task timeout check interval should be configurable independent of mapreduce.task.timeout</b><br>
<blockquote>Fixed TaskHeartBeatHandler to use a new configuration for the thread loop interval separate from task-timeout configuration property.</blockquote></li>
Major improvement reported by Amar Kamat and fixed by Amar Kamat (contrib/gridmix)<br>
<b>[Gridmix] Improve STRESS mode locking</b><br>
<blockquote>Modified the Gridmix STRESS mode locking structure. The submission thread and the polling thread now run simultaneously without blocking each other.</blockquote></li>
Blocker sub-task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2 , nodemanager)<br>
<b>ContainerLocalizer should request new resources after completing the current one</b><br>
<blockquote>Modified ContainerLocalizer to send a heartbeat to NM immediately after downloading a resource instead of always waiting for a second.</blockquote></li>
Critical bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)<br>
<b>A tracking URL of N/A before the app master is launched breaks oozie</b><br>
<blockquote>Fixed AM's tracking URL to always go through the proxy, even before the job started, so that it works properly with oozie throughout the job execution.</blockquote></li>
Blocker bug reported by Ramgopal N and fixed by Siddharth Seth (mrv2)<br>
<b>Job hangs indefinitely if the child processes are killed on the NM; the KILL_CONTAINER event type is continuously sent to containers that do not exist</b><br>
<blockquote>Fixed MR AM to stop considering node blacklisting after the number of nodes blacklisted crosses a threshold.</blockquote></li>
Blocker bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (applicationmaster , mrv2)<br>
<b>MR AM for sort-job going out of memory</b><br>
<blockquote>Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes.</blockquote></li>
Major task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)<br>
<b>Move Log Related components from yarn-server-nodemanager to yarn-common</b><br>
<blockquote>Moved log related components into yarn-common so that HistoryServer and clients can use them without depending on the yarn-server-nodemanager module.</blockquote></li>
Major new feature reported by Hong Tang and fixed by Amar Kamat (tools/rumen)<br>
<b>[Rumen] Need a standalone JobHistory log anonymizer</b><br>
<blockquote>Added an anonymizer tool to Rumen. Anonymizer takes a Rumen trace file and/or topology as input. It supports persistence and plugins to override the default behavior.</blockquote></li>
Major improvement reported by Sanjay Radia and fixed by Jitendra Nath Pandey <br>
<b>Shortcut local client reads to a Datanode's files directly</b><br>
<blockquote>1. New configurations
a. dfs.block.local-path-access.user is the key in datanode configuration to specify the user allowed to do short circuit read.
b. dfs.client.read.shortcircuit is the key to enable short circuit read at the client side configuration.
c. dfs.client.read.shortcircuit.skip.checksum is the key to bypass checksum check at the client side.
2. By default none of the above are enabled and short circuit read will not kick in.
3. If security is on, the feature can be used only by users that have Kerberos credentials at the client; therefore MapReduce tasks cannot benefit from it in general.</blockquote></li>
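A minimal sketch of a configuration enabling the feature (the key names come from the list above; the user name is hypothetical, and these settings would normally live in hdfs-site.xml):
{noformat}
import org.apache.hadoop.conf.Configuration;

public class ShortCircuitReadSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Client side: enable short-circuit reads, keep checksum verification on.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", false);
    // Datanode side: user allowed to read block files directly (hypothetical).
    conf.set("dfs.block.local-path-access.user", "mapred");
  }
}
{noformat}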
Major sub-task reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client)<br>
<b>Switch default checksum to CRC32C</b><br>
<blockquote>The default checksum algorithm used on HDFS is now CRC32C. Data from previous versions of Hadoop can still be read backwards-compatibly.</blockquote></li>
Major sub-task reported by Todd Lipcon and fixed by Todd Lipcon (hdfs client , performance)<br>
<b>Simplify BlockReader to not inherit from FSInputChecker</b><br>
<blockquote>BlockReader has been reimplemented to use direct byte buffers. If you use a custom socket factory, it must generate sockets that have associated Channels.</blockquote></li>
Minor bug reported by Karim Saadah and fixed by Sho Shimauchi <br>
<b>dfs.blocksize accepts only absolute value</b><br>
<blockquote>The blocksize property 'dfs.blocksize' now accepts unit symbols instead of only a byte length. Values such as "10k", "128m", and "1g" may now be provided instead of just a number of bytes, as before.</blockquote></li>
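For illustration (the key name comes from the note; the values are examples), the same block size can now be expressed either way:
{noformat}
import org.apache.hadoop.conf.Configuration;

public class BlocksizeUnitsSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // New style: unit suffix.
    conf.set("dfs.blocksize", "128m");
    // Old style, still accepted: raw byte count (128 * 1024 * 1024).
    conf.set("dfs.blocksize", "134217728");
  }
}
{noformat}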
Critical improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (build)<br>
<b>Create hadoop-client and hadoop-minicluster artifacts for downstream projects </b><br>
<blockquote>Generate integration artifacts "org.apache.hadoop:hadoop-client" and "org.apache.hadoop:hadoop-minicluster" containing all the jars needed to use Hadoop client APIs, and to run Hadoop MiniClusters, respectively. Push these artifacts to the maven repository when mvn-deploy, along with existing artifacts. </blockquote></li>
Major improvement reported by XieXianshan and fixed by XieXianshan (fs)<br>
<b>Modify the option of FsShell getmerge from [addnl] to [-nl] for consistency</b><br>
<blockquote>The 'fs -getmerge' tool now uses a -nl flag to determine if adding a newline at end of each file is required, in favor of the 'addnl' boolean flag that was used earlier.</blockquote></li>
Blocker improvement reported by Todd Lipcon and fixed by Todd Lipcon (mrv2 , nodemanager)<br>
<b>MR2 memory limits should be pmem, not vmem</b><br>
<blockquote>Resource limits are now expressed and enforced in terms of physical memory, rather than virtual memory. The virtual memory limit is set as a configurable multiple of the physical limit. The NodeManager's memory usage is now configured in units of MB rather than GB.</blockquote></li>
Trivial bug reported by Hitesh Shah and fixed by Arun C Murthy (mrv2)<br>
<b>Change mode for hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/mock-container-executor to 755 </b><br>
Minor bug reported by Hitesh Shah and fixed by Vinod Kumar Vavilapalli (mrv2)<br>
<b>Simplify parameter passing to Application Master from Client. Simplify the approach to pass info such as appId, ClusterTimestamp and failcount required by the App Master.</b><br>
Blocker bug reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)<br>
<b>Enhance YARN Client-RM protocol to provide access to information such as cluster's Min/Max Resource capabilities similar to that of AM-RM protocol</b><br>
Major improvement reported by Abhijit Suresh Shingate and fixed by Abhijit Suresh Shingate (applicationmaster , jobhistoryserver , nodemanager , resourcemanager)<br>
Blocker sub-task reported by Luke Lu and fixed by Robert Joseph Evans (applicationmaster , mrv2 , security)<br>
<b>MRv2 WebApp Security</b><br>
<blockquote>A new server has been added to yarn. It is a web proxy that sits in front of the AM web UI. The server is controlled by the yarn.web-proxy.address config. If that config is set, and it points to an address that is different from the RM web interface, then a separate proxy server needs to be launched.
This can be done by running
yarn-daemon.sh start proxyserver
If a separate proxy server is needed, other configs may also need to be set, if security is enabled:
yarn.web-proxy.principal
yarn.web-proxy.keytab
The proxy server is stateless and should be able to support a VIP or other load balancing sitting in front of multiple instances of this server.</blockquote></li>
Major bug reported by Daryn Sharp and fixed by Owen O'Malley <br>
<b>Fix renewal of dfs delegation tokens</b><br>
<blockquote>Generalizes token renewal and canceling to a common interface and provides a plugin interface for adding renewers for new kinds of tokens. Hftp changed to store the tokens as HFTP and renew them over http.</blockquote></li>
Major task reported by Eli Collins and fixed by Eli Collins (jobtracker , tasktracker)<br>
<b>Remove unused contrib components dependent on MR1</b><br>
<blockquote>The pre-MR2 MapReduce implementation (JobTracker, TaskTracker, etc.) and contrib components are no longer supported. This implementation is currently supported in the 0.20.20x releases.</blockquote></li>
Major new feature reported by Sharad Agarwal and fixed by Hitesh Shah (mrv2)<br>
<b>MR-279: Write a shell command application</b><br>
<blockquote>Adding a simple, DistributedShell application as an alternate framework to MapReduce and to act as an illustrative example for porting applications to YARN.</blockquote></li>
Major bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)<br>
<b>MR279: Fate of finished Applications on RM</b><br>
<blockquote>New config added:
// the maximum number of completed applications the RM keeps
<name>yarn.server.resourcemanager.expire.applications.completed.max</name></blockquote></li>
Major bug reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota (contrib/gridmix)<br>
<b>Gridmix system tests are failing because high-RAM emulation is enabled by default for normal MR jobs in the trace, which exceeds the slot capacity.</b><br>
Major improvement reported by Arun C Murthy and fixed by Amar Kamat (benchmarks , contrib/gridmix)<br>
<b>Gridmix should notify job failures</b><br>
<blockquote>Gridmix now prints a summary information after every run. It summarizes the runs w.r.t input trace details, input data statistics, cli arguments, data-gen runtime, simulation runtimes etc and also the cluster w.r.t map slots, reduce slots, jobtracker-address, hdfs-address etc.</blockquote></li>
Critical bug reported by Binglin Chang and fixed by Binglin Chang (tasktracker)<br>
<b>Race condition in IndexCache (readIndexFileToCache, removeMap) corrupts the value of totalMemoryUsed, which may cause TaskTracker to continuously throw Exceptions</b><br>
Major task reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota (contrib/gridmix)<br>
<b>Porting Gridmix v3 system tests into trunk branch.</b><br>
<blockquote>Adds system tests to Gridmix. These system tests cover various features like job types (load and sleep), user resolvers (round-robin, submitter-user, echo) and submission modes (stress, replay and serial).</blockquote></li>
Major improvement reported by Robert Joseph Evans and fixed by Robert Joseph Evans (distributed-cache)<br>
<b>Make the distributed cache delete entries using LRU priority</b><br>
<blockquote>Added config option mapreduce.tasktracker.cache.local.keep.pct to the TaskTracker. It is the target percentage of the local distributed cache that should be kept in between garbage collection runs. In practice it will delete unused distributed cache entries in LRU order until the size of the cache is less than mapreduce.tasktracker.cache.local.keep.pct of the maximum cache size. This is a floating point value between 0.0 and 1.0. The default is 0.95.</blockquote></li>
Major new feature reported by Aaron T. Myers and fixed by Aaron T. Myers (jobtracker)<br>
<b>MR portion of HADOOP-7214 - Hadoop /usr/bin/groups equivalent</b><br>
<blockquote>Introduces a new command, "mapred groups", which displays what groups are associated with a user as seen by the JobTracker.</blockquote></li>
Major new feature reported by Ravi Gummadi and fixed by Amar Kamat (contrib/gridmix)<br>
<b>Make Gridmix emulate usage of data compression</b><br>
<blockquote>Emulates the MapReduce compression feature in Gridmix. By default, compression emulation is turned on. Compression emulation can be disabled by setting 'gridmix.compression-emulation.enable' to 'false'. Use 'gridmix.compression-emulation.map-input.decompression-ratio', 'gridmix.compression-emulation.map-output.compression-ratio' and 'gridmix.compression-emulation.reduce-output.compression-ratio' to configure the compression ratios at map input, map output and reduce output side respectively. Currently, compression ratios in the range [0.07, 0.68] are supported. Gridmix auto detects whether map-input, map output and reduce output should emulate compression based on original job's compression related configuration parameters.</blockquote></li>
Major improvement reported by Mahadev konar and fixed by Greg Roelofs (mrv2)<br>
<b>MR-279: Implement uber-AppMaster (in-cluster LocalJobRunner for MRv2)</b><br>
<blockquote>An efficient implementation of small jobs by running all tasks in the MR ApplicationMaster JVM, there-by affecting lower latency.</blockquote></li>
Major improvement reported by Ahmed Radwan and fixed by Ahmed Radwan <br>
<b>Allow setting of end-of-record delimiter for TextInputFormat</b><br>
<blockquote>TextInputFormat may now split lines with delimiters other than newline, by specifying a configuration parameter "textinputformat.record.delimiter"</blockquote></li>
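A short sketch of using the new parameter (the key name comes from the note; the delimiter value is hypothetical):
{noformat}
import org.apache.hadoop.conf.Configuration;

public class RecordDelimiterSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Split input records on a custom delimiter instead of newline.
    conf.set("textinputformat.record.delimiter", "</record>");
  }
}
{noformat}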
Major improvement reported by Ravi Gummadi and fixed by Rajesh Balamohan (tools/rumen)<br>
<b>Bring in more job configuration properties in to the trace file</b><br>
<blockquote>Adds job configuration parameters to the job trace. The configuration parameters are stored under the 'jobProperties' field as key-value pairs.</blockquote></li>
Major bug reported by Ravi Gummadi and fixed by Ravi Gummadi (contrib/gridmix)<br>
<b>Mapping between Gridmix jobs and the corresponding original MR jobs is needed</b><br>
<blockquote>New configuration properties gridmix.job.original-job-id and gridmix.job.original-job-name in the configuration of simulated job are exposed/documented to gridmix user for mapping between original cluster's jobs and simulated jobs.</blockquote></li>
Major improvement reported by Ranjit Mathew and fixed by Amar Kamat (contrib/gridmix)<br>
<b>Emulate Memory Usage of Tasks in GridMix3</b><br>
<blockquote>Adds total heap usage emulation to Gridmix. Also, Gridmix can configure the simulated task's JVM heap options with max heap options obtained from the original task (via Rumen). Use 'gridmix.task.jvm-options.enable' to disable the task max heap options configuration. </blockquote></li>
* CPU load [either at the time the data are taken, or exponentially smoothed]
* Memory load [also either at the time the data are taken, or exponentially smoothed]
This would be taken at intervals that depend on the task progress plateaus. For example, reducers have three progress ranges - [0-1/3], (1/3-2/3], and (2/3-3/3] - where fundamentally different activities happen. Mappers have different boundaries that are not symmetrically placed [0-9/10], (9/10-1]. Data capture boundaries should coincide with activity boundaries. For the state information capture [CPU and memory] we should average over the covered interval.
Major improvement reported by Scott Carey and fixed by Todd Lipcon (jobtracker , performance , tasktracker)<br>
<b>Lower default minimum heartbeat interval for tasktracker > Jobtracker</b><br>
<blockquote>The default minimum heartbeat interval has been dropped from 3 seconds to 300ms to increase scheduling throughput on small clusters. Users may tune mapreduce.jobtracker.heartbeats.in.second to adjust this value.</blockquote></li>
Major improvement reported by Rajesh Balamohan and fixed by Rajesh Balamohan (tools/rumen)<br>
<b>Feature to instruct rumen-folder utility to skip jobs worth of specific duration</b><br>
<blockquote>Added a '-starts-after' option to Rumen's Folder utility. The time duration specified after the '-starts-after' option is an offset with respect to the submit time of the first job in the input trace. Jobs in the input trace having a submit time (relative to the first job's submit time) less than the specified offset will be ignored.</blockquote></li>
Trivial bug reported by Amogh Vasekar and fixed by Harsh J <br>
<b>Chain APIs error misleading</b><br>
<blockquote>Fixed a misleading exception message for the case where chained Mappers have a mismatch in input/output Key/Value pairs between them.</blockquote></li>
Minor bug reported by Steve Loughran and fixed by Amar Kamat (contrib/streaming)<br>
<b>Stream test TestStreamingExitStatus fails with Out of Memory</b><br>
<blockquote>Fixed the streaming test TestStreamingExitStatus's failure due to an OutOfMemory error by reducing the testcase's io.sort.mb.</blockquote></li>
Major improvement reported by Arun C Murthy and fixed by (mrv2)<br>
<b>Map-Reduce 2.0</b><br>
<blockquote>MapReduce has undergone a complete re-haul in hadoop-0.23 and we now have, what we call, MapReduce 2.0 (MRv2).
The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs. The ResourceManager and per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
The ResourceManager has two main components:
* Scheduler (S)
* ApplicationsManager (ASM)
The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application. Also, it offers no guarantees on restarting failed tasks, whether due to application failure or hardware failures. The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so based on the abstract notion of a Resource Container which incorporates elements such as memory, cpu, disk, network etc.
The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications etc. The current Map-Reduce schedulers such as the CapacityScheduler and the FairScheduler would be some examples of the plug-in.
The CapacityScheduler supports hierarchical queues to allow for more predictable sharing of cluster resources.
The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.
The NodeManager is the per-machine framework agent that is responsible for launching the applications' containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the Scheduler.
The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.</blockquote></li>
Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node , performance)<br>
<b>Add HDFS support for fadvise readahead and drop-behind</b><br>
<blockquote>HDFS now has the ability to use posix_fadvise and sync_data_range syscalls to manage the OS buffer cache. This support is currently considered experimental, and may be enabled by configuring the following keys:
dfs.datanode.drop.cache.behind.writes - set to true to drop data out of the buffer cache after writing
dfs.datanode.drop.cache.behind.reads - set to true to drop data out of the buffer cache when performing sequential reads
dfs.datanode.sync.behind.writes - set to true to trigger dirty page writeback immediately after writing data
dfs.datanode.readahead.bytes - set to a non-zero value to trigger readahead for sequential reads</blockquote></li>
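A hedged configuration sketch for the experimental keys listed above (the key names come from this note; the values are only examples, and these settings would normally live in hdfs-site.xml):
{noformat}
import org.apache.hadoop.conf.Configuration;

public class FadviseConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.datanode.drop.cache.behind.writes", true);
    conf.setBoolean("dfs.datanode.drop.cache.behind.reads", true);
    conf.setBoolean("dfs.datanode.sync.behind.writes", true);
    // Non-zero enables readahead for sequential reads (example: 4MB).
    conf.setLong("dfs.datanode.readahead.bytes", 4 * 1024 * 1024);
  }
}
{noformat}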
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)<br>
<b>Federation: enable using the same configuration file across all the nodes in the cluster.</b><br>
<blockquote>This change allows, when running multiple namenodes on different hosts, sharing the same configuration file across all the nodes in the cluster (Datanodes, NameNode, BackupNode, SecondaryNameNode), without the need to define the dfs.federation.nameservice.id parameter.</blockquote></li>
Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (webhdfs)<br>
<b>Provide authentication to webhdfs using SPNEGO</b><br>
<blockquote>Added two new conf properties dfs.web.authentication.kerberos.principal and dfs.web.authentication.kerberos.keytab for the SPNEGO servlet filter.</blockquote></li>
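As a rough sketch (the two property names come from the note above; the principal and keytab path are hypothetical placeholders):
{noformat}
import org.apache.hadoop.conf.Configuration;

public class WebhdfsSpnegoSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hypothetical SPNEGO principal and keytab for the servlet filter.
    conf.set("dfs.web.authentication.kerberos.principal", "HTTP/_HOST@EXAMPLE.COM");
    conf.set("dfs.web.authentication.kerberos.keytab", "/etc/security/keytab/spnego.keytab");
  }
}
{noformat}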
Major new feature reported by Eric Payne and fixed by Eric Payne (balancer , data-node)<br>
<b>Changes to balancer bandwidth should not require datanode restart.</b><br>
<blockquote>New dfsadmin command added: [-setBalancerBandwidth <bandwidth>], where bandwidth is the max network bandwidth in bytes per second that the balancer is allowed to use on each datanode during balancing.
This is an incompatible change in 0.23. The versions of ClientProtocol and DatanodeProtocol are changed.</blockquote></li>
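Assuming the client-side counterpart of this dfsadmin command is DistributedFileSystem#setBalancerBandwidth (an assumption worth verifying against the patch), a usage sketch might look like:
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class BalancerBandwidthSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    if (fs instanceof DistributedFileSystem) {
      // Cap balancer traffic at 1 MB/s per datanode (example value, bytes/sec).
      ((DistributedFileSystem) fs).setBalancerBandwidth(1024L * 1024L);
    }
  }
}
{noformat}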
Minor improvement reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (balancer , data-node , hdfs client , name-node , security)<br>
Major bug reported by Daryn Sharp and fixed by Daryn Sharp (name-node)<br>
<b>mkdirs should use the supplied permission for all of the created directories</b><br>
<blockquote>A multi-level mkdir is now POSIX compliant. Instead of creating intermediate directories with the permissions of the parent directory, intermediate directories are created with permission bits of rwxrwxrwx (0777) as modified by the current umask, plus write and search permission for the owner.</blockquote></li>
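The permission rule for intermediate directories reduces to simple bit arithmetic; a small sketch, assuming a umask of 0277 to make the owner write-and-search clause visible:
<pre>
// (0777 masked by the umask), plus owner write and search permission.
public class MkdirsPermissionSketch {
  public static void main(String[] args) {
    int umask = 0277;                  // assumed process umask for this example
    int masked = 0777 & ~umask;        // 0500: owner r-x only
    int intermediate = masked | 0300;  // 0700: owner write + search restored
    System.out.printf("intermediate dirs: %o%n", intermediate);  // prints 700
  }
}
</pre>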
Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node)<br>
<b>Add a new DataTransferProtocol operation, Op.TRANSFER_BLOCK, instead of using RPC</b><br>
<blockquote>Add a new DataTransferProtocol operation, Op.TRANSFER_BLOCK, for transferring RBW/Finalized with acknowledgement and without using RPC.</blockquote></li>
Minor improvement reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (data-node)<br>
<b>When DataNode throws DiskOutOfSpaceException, it will be helpful to the user if we log the available volume size and configured block size.</b><br>
Minor sub-task reported by Tanping Wang and fixed by Tanping Wang (scripts)<br>
<b>HDFS federation: Improve start/stop scripts and add script to decommission datanodes</b><br>
<blockquote>The masters file is no longer used to indicate which hosts to start the 2NN on. The 2NN is now started on hosts when dfs.namenode.secondary.http-address is configured with a non-wildcard IP.</blockquote></li>
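For illustration, a hedged sketch of the non-wildcard setting that causes the 2NN to be started on a host; the host name is a placeholder, and 50090 is the conventional 2NN HTTP port.
<pre>
import org.apache.hadoop.conf.Configuration;

public class SecondaryNameNodeAddressSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // A concrete (non-wildcard) address; the host name is a placeholder.
    conf.set("dfs.namenode.secondary.http-address", "snn.example.com:50090");
    System.out.println(conf.get("dfs.namenode.secondary.http-address"));
  }
}
</pre>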
Major new feature reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE (data-node , hdfs client , name-node)<br>
<b>Provide a stronger data guarantee in the write pipeline</b><br>
<blockquote>Added two configuration properties, dfs.client.block.write.replace-datanode-on-failure.enable and dfs.client.block.write.replace-datanode-on-failure.policy. Added a new feature to replace datanode on failure in DataTransferProtocol. Added getAdditionalDatanode(..) in ClientProtocol.</blockquote></li>
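A hedged sketch of turning the feature on via the two new properties; the DEFAULT policy value shown is an assumed example of a commonly used setting.
<pre>
import org.apache.hadoop.conf.Configuration;

public class ReplaceDatanodeOnFailureSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean(
        "dfs.client.block.write.replace-datanode-on-failure.enable", true);
    // Policy value is an assumed example; DEFAULT only replaces failed
    // datanodes under certain pipeline conditions.
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy",
        "DEFAULT");
  }
}
</pre>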
Major bug reported by Devaraj K and fixed by Aaron T. Myers (name-node)<br>
<b>When the disk becomes full, the Namenode gets shut down and is not able to recover</b><br>
<blockquote>Implemented a daemon thread to monitor the disk usage periodically; if the disk usage reaches a threshold value, the name node is put into safe mode so that no modifications to the file system will occur. Once the disk usage falls below the threshold, the name node is taken out of safe mode. Both the threshold value and the interval at which to check the disk usage are configurable.</blockquote></li>
Minor bug reported by Todd Lipcon and fixed by Todd Lipcon (data-node)<br>
<b>dfs.data.dir permissions should default to 700</b><br>
<blockquote>The permissions on datanode data directories (configured by dfs.datanode.data.dir.perm) now default to 0700. Upon startup, the datanode will automatically change the permissions to match the configured value.</blockquote></li>
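A one-line sketch of overriding the setting; the value shown matches the new 700 default, so this is illustrative only.
<pre>
import org.apache.hadoop.conf.Configuration;

public class DataDirPermSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Octal permission string; "700" matches the new default.
    conf.set("dfs.datanode.data.dir.perm", "700");
  }
}
</pre>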
Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)<br>
<b>Improve decommission mechanism</b><br>
<blockquote>Summary of changes to the decommissioning process:
# After nodes are decommissioned, they are not shut down. The decommissioned nodes are not used for writes. For reads, the decommissioned nodes are given as the last location to read from.
# The numbers of live and dead decommissioned nodes are displayed in the namenode webUI.
# Decommissioned nodes' free capacity is not counted towards the cluster free capacity.</blockquote></li>
Major bug reported by Hairong Kuang and fixed by Hairong Kuang (hdfs client)<br>
<b>Dfs client name for a map/reduce task should have some randomness</b><br>
<blockquote>The client name now has the format DFSClient_applicationid_randomint_threadid, where applicationid = mapred.task.id if set, or "NONMAPREDUCE" otherwise.</blockquote></li>
Major new feature reported by Erik Steffl and fixed by Erik Steffl (tools)<br>
<b>Create multi-format parser for edits logs file, support binary and XML formats initially</b><br>
<blockquote>The Offline Edits Viewer feature adds an oev tool to the hdfs script. Oev makes it possible to convert edits logs to/from the native binary and XML formats. It uses the same framework as the Offline Image Viewer.</blockquote></li>
Major sub-task reported by Matt Foley and fixed by Matt Foley (data-node)<br>
<b>Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file</b><br>
<blockquote>Batch hardlinking during "upgrade" snapshots, cutting the time from approximately 8 minutes per volume to approximately 8 seconds. Validated on both Linux and Windows. Depends on prior integration with the patch for HADOOP-7133.</blockquote></li>
Major bug reported by Jakob Homan and fixed by Jim Plush (test)<br>
<b>HDFS javadocs hard-code references to dfs.namenode.name.dir and dfs.datanode.data.dir parameters</b><br>
<blockquote>Updated the JavaDocs to appropriately represent the new Configuration Keys that are used in the code. The docs did not match the code.</blockquote></li>
Minor bug reported by gary murry and fixed by Jim Plush (name-node)<br>
<b>If service port and main port are the same, there is no clear log message explaining the issue.</b><br>
<blockquote>Added a check to make sure the RPC and HTTP ports on the NameNode are not set to the same value; otherwise an IOException is thrown with an appropriate message.</blockquote></li>
Major improvement reported by Sanjay Radia and fixed by Todd Lipcon <br>
<b>Simpler model for Namenode's fs Image and edit Logs </b><br>
<blockquote>The NameNode's storage layout for its name directories has been reorganized to be more robust. Each edit now has a unique transaction ID, and each file is associated with a transaction ID (for checkpoints) or a range of transaction IDs (for edit logs).</blockquote></li>
Minor bug reported by Uma Maheswara Rao G and fixed by Uma Maheswara Rao G (io)<br>
<b>Fix the warning in writable classes [WritableComparable is a raw type. References to generic type WritableComparable<T> should be parameterized]</b><br>
Major improvement reported by Eli Collins and fixed by Eli Collins (scripts)<br>
<b>Don't add tools.jar to the classpath when running Hadoop</b><br>
<blockquote>The scripts that run Hadoop no longer automatically add tools.jar from the JDK to the classpath (if it is present). If your job depends on tools.jar in the JDK you will need to add this dependency in your job.</blockquote></li>
Major improvement reported by Daryn Sharp and fixed by Daryn Sharp (fs)<br>
<b>Refactor FsShell's du/dus/df</b><br>
<blockquote>The "Found X items" header on the output of the "du" command has been removed to more closely match unix. The displayed paths now correspond to the command line arguments instead of always being a fully qualified URI. For example, the output will have relative paths if the command line arguments are relative paths.</blockquote></li>
Minor improvement reported by Nicholas Telford and fixed by Nicholas Telford (io)<br>
<b>MapWritable violates contract of Map interface for equals() and hashCode()</b><br>
<blockquote>MapWritable now implements equals() and hashCode() based on the map contents rather than object identity in order to correctly implement the Map interface.</blockquote></li>
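A short sketch of the Map contract this change restores: two MapWritable instances with equal contents now compare equal and produce the same hash code.
<pre>
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;

public class MapWritableEqualitySketch {
  public static void main(String[] args) {
    MapWritable a = new MapWritable();
    MapWritable b = new MapWritable();
    a.put(new Text("key"), new Text("value"));
    b.put(new Text("key"), new Text("value"));
    // Content-based equals()/hashCode(), as the Map interface requires.
    System.out.println(a.equals(b));                   // true
    System.out.println(a.hashCode() == b.hashCode());  // true
  }
}
</pre>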
Major improvement reported by Matt Foley and fixed by Matt Foley (util)<br>
<b>CLONE to COMMON - HDFS-1445 Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it once per directory instead of once per file</b><br>
<blockquote>This is the COMMON portion of a fix requiring coordinated change of COMMON and HDFS. Please see HDFS-1445 for HDFS portion and release note.</blockquote></li>
Minor bug reported by Eli Collins and fixed by Eli Collins (scripts)<br>
<b>Fix link resolution logic in hadoop-config.sh</b><br>
<blockquote>Updates hadoop-config.sh to always resolve symlinks when determining HADOOP_HOME. Bash built-ins or POSIX:2001-compliant commands are now required.</blockquote></li>
Minor bug reported by Sanjay Radia and fixed by Sanjay Radia <br>
<b>RawLocalFileSystem#listStatus does not deal with a directory whose entries are changing (e.g. in a multi-threaded or multi-process environment)</b><br>
Major improvement reported by Navis and fixed by Matt Foley (io)<br>
<b>Reduces RPC packet size for primitive arrays, especially long[], which is used at block reporting</b><br>
<blockquote>Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5.
Introduces ArrayPrimitiveWritable for a much more efficient wire format to transmit arrays of primitives over RPC. ObjectWritable uses the new writable for array of primitives for RPC and continues to use existing format for on-disk data.</blockquote></li>
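A hedged sketch of the new writable round-tripping a primitive long[] (a stand-in for block-report data) through the compact wire format:
<pre>
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import org.apache.hadoop.io.ArrayPrimitiveWritable;

public class ArrayPrimitiveWritableSketch {
  public static void main(String[] args) throws Exception {
    long[] blocks = {1L, 2L, 3L};  // stand-in for block-report contents
    ArrayPrimitiveWritable out = new ArrayPrimitiveWritable(blocks);

    // Serialize and deserialize using the new compact format.
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    out.write(new DataOutputStream(buf));
    ArrayPrimitiveWritable in = new ArrayPrimitiveWritable();
    in.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));

    long[] roundTripped = (long[]) in.get();
    System.out.println(roundTripped.length);  // 3
  }
}
</pre>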