<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- Generated by Apache Maven Doxia at 2014-02-11 -->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop 2.3.0 - Fault Injection Framework and Development Guide</title>
<style type="text/css" media="all">
@import url("./css/maven-base.css");
@import url("./css/maven-theme.css");
@import url("./css/site.css");
</style>
<link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20140211" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xleft">
<a href="http://www.apache.org/" class="externalLink">Apache</a>
&gt;
<a href="http://hadoop.apache.org/" class="externalLink">Hadoop</a>
&gt;
<a href="../">Apache Hadoop Project Dist POM</a>
&gt;
Apache Hadoop 2.3.0
</div>
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://svn.apache.org/repos/asf/hadoop/" class="externalLink">SVN</a>
|
<a href="http://hadoop.apache.org/" class="externalLink">Apache Hadoop</a>
&nbsp;| Last Published: 2014-02-11
&nbsp;| Version: 2.3.0
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>General</h5>
<ul>
<li class="none">
<a href="../../index.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/SingleCluster.html">Single Node Setup</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/ClusterSetup.html">Cluster Setup</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CommandsManual.html">Hadoop Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/FileSystemShell.html">File System Shell</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Compatibility.html">Hadoop Compatibility</a>
</li>
</ul>
<h5>Common</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CLIMiniCluster.html">CLI Mini Cluster</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/NativeLibraries.html">Native Libraries</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Superusers.html">Superusers</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/SecureMode.html">Secure Mode</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/ServiceLevelAuth.html">Service Level Authorization</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/HttpAuthentication.html">HTTP Authentication</a>
</li>
</ul>
<h5>HDFS</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">HDFS User Guide</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html">High Availability With QJM</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html">High Availability With NFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/Federation.html">Federation</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">HDFS Snapshots</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">HDFS Architecture</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html">Edits Viewer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html">Image Viewer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html">Permissions and HDFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html">Quotas and HDFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/Hftp.html">HFTP</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/LibHdfs.html">C API libhdfs</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/WebHDFS.html">WebHDFS REST API</a>
</li>
<li class="none">
<a href="../../hadoop-hdfs-httpfs/index.html">HttpFS Gateway</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Short Circuit Local Reads</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html">Centralized Cache Management</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html">HDFS NFS Gateway</a>
</li>
</ul>
<h5>MapReduce</h5>
<ul>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">Compatibilty between Hadoop 1.x and Hadoop 2.x</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html">Encrypted Shuffle</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html">Pluggable Shuffle/Sort</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html">Distributed Cache Deploy</a>
</li>
</ul>
<h5>YARN</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YARN.html">YARN Architecture</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html">Writing YARN Applications</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">Capacity Scheduler</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/FairScheduler.html">Fair Scheduler</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html">Web Application Proxy</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html">YARN Commands</a>
</li>
<li class="none">
<a href="../../hadoop-sls/SchedulerLoadSimulator.html">Scheduler Load Simulator</a>
</li>
</ul>
<h5>YARN REST APIs</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html">Introduction</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html">Resource Manager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html">Node Manager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/MapredAppMasterRest.html">MR Application Master</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html">History Server</a>
</li>
</ul>
<h5>Auth</h5>
<ul>
<li class="none">
<a href="../../hadoop-auth/index.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-auth/Examples.html">Examples</a>
</li>
<li class="none">
<a href="../../hadoop-auth/Configuration.html">Configuration</a>
</li>
<li class="none">
<a href="../../hadoop-auth/BuildingIt.html">Building</a>
</li>
</ul>
<h5>Reference</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/releasenotes.html">Release Notes</a>
</li>
<li class="none">
<a href="../../api/index.html">API docs</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CHANGES.txt">Common CHANGES.txt</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/CHANGES.txt">HDFS CHANGES.txt</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-mapreduce/CHANGES.txt">MapReduce CHANGES.txt</a>
</li>
</ul>
<h5>Configuration</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/core-default.xml">core-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/hdfs-default.xml">hdfs-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml">mapred-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-common/yarn-default.xml">yarn-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/DeprecatedProperties.html">Deprecated Properties</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img alt="Built by Maven" src="./images/logos/maven-feather.png"/>
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!-- Licensed under the Apache License, Version 2.0 (the "License"); --><!-- you may not use this file except in compliance with the License. --><!-- You may obtain a copy of the License at --><!-- --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!-- --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. See accompanying LICENSE file. --><div class="section">
<h2>Fault Injection Framework and Development Guide<a name="Fault_Injection_Framework_and_Development_Guide"></a></h2>
<ul>
<li><a href="#Fault_Injection_Framework_and_Development_Guide">Fault Injection Framework and Development Guide</a>
<ul>
<li><a href="#Introduction">Introduction</a></li>
<li><a href="#Assumptions">Assumptions</a></li>
<li><a href="#Architecture_of_the_Fault_Injection_Framework">Architecture of the Fault Injection Framework</a>
<ul>
<li><a href="#Configuration_Management">Configuration Management</a></li>
<li><a href="#Probability_Model">Probability Model</a></li>
<li><a href="#Fault_Injection_Mechanism:_AOP_and_AspectJ">Fault Injection Mechanism: AOP and AspectJ</a></li>
<li><a href="#Existing_Join_Points">Existing Join Points</a></li></ul></li>
<li><a href="#Aspect_Example">Aspect Example</a></li>
<li><a href="#Fault_Naming_Convention_and_Namespaces">Fault Naming Convention and Namespaces</a></li>
<li><a href="#Development_Tools">Development Tools</a></li>
<li><a href="#Putting_It_All_Together">Putting It All Together</a>
<ul>
<li><a href="#How_to_Use_the_Fault_Injection_Framework">How to Use the Fault Injection Framework</a></li></ul></li>
<li><a href="#Additional_Information_and_Contacts">Additional Information and Contacts</a></li></ul></li></ul>
<div class="section">
<h3>Introduction<a name="Introduction"></a></h3>
<p>This guide provides an overview of the Hadoop Fault Injection (FI) framework for those who will be developing their own faults (aspects).</p>
<p>The idea of fault injection is fairly simple: errors and exceptions are infused into an application's logic in order to achieve higher test coverage and to exercise the fault tolerance of the system. Different implementations of this idea are available today. Hadoop's FI framework is built on top of Aspect-Oriented Programming (AOP), as implemented by the AspectJ toolkit.</p></div>
<div class="section">
<h3>Assumptions<a name="Assumptions"></a></h3>
<p>The current implementation of the FI framework assumes that the faults it emulates are non-deterministic in nature. That is, the moment at which a fault occurs isn't known in advance; it is decided by a coin flip.</p></div>
<div class="section">
<h3>Architecture of the Fault Injection Framework<a name="Architecture_of_the_Fault_Injection_Framework"></a></h3>
<p>(Figure: components layout of the FI framework)</p>
<div class="section">
<h4>Configuration Management<a name="Configuration_Management"></a></h4>
<p>This piece of the FI framework allows you to set expectations for faults to happen. The settings can be applied either statically (in advance) or at runtime. The desired level of faults in the framework can be configured in two ways:</p>
<ul>
<li>editing the <tt>src/aop/fi-site.xml</tt> configuration file, which follows the same format as other Hadoop configuration files</li>
<li>setting JVM system properties through VM startup parameters or in the <tt>build.properties</tt> file</li></ul>
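<p>For example, the following sketch shows what a static setting in <tt>src/aop/fi-site.xml</tt> might look like; the property name assumes the convention described under Fault Naming Convention and Namespaces:</p>
<div>
<pre> &lt;property&gt;
   &lt;name&gt;fi.hdfs.datanode.BlockReceiver&lt;/name&gt;
   &lt;value&gt;0.12&lt;/value&gt;
   &lt;description&gt;Fire this fault in roughly 12% of eligible calls&lt;/description&gt;
 &lt;/property&gt;</pre></div></div>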
<div class="section">
<h4>Probability Model<a name="Probability_Model"></a></h4>
<p>This is fundamentally a coin flipper. The methods of this class obtain a random number between 0.0 and 1.0 and then check whether it falls between 0.0 and the configured level for the fault in question. If that condition holds, the fault occurs.</p>
<p>Thus, to guarantee that a fault happens, set its level to 1.0. To completely prevent a fault from happening, set its probability level to 0.0.</p>
<p>Note: The default probability level is 0 (zero) unless the level is changed explicitly through the configuration file or at runtime. The configuration parameter for the default level is <tt>fi.*</tt>.</p>
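<p>The following is a minimal illustrative sketch of that coin-flip check (not the actual <tt>org.apache.hadoop.fi.ProbabilityModel</tt> source; the property lookup and defaulting shown here are assumptions):</p>
<div>
<pre> import java.util.Random;

 public class CoinFlipSketch {
   private static final Random RANDOM = new Random();

   // Fires the fault registered under the &quot;fi.&quot; prefix with the
   // configured probability; the level defaults to 0.0 (never fires).
   public static boolean injectCriteria(String faultName) {
     float level = Float.parseFloat(
         System.getProperty(&quot;fi.&quot; + faultName, &quot;0.0&quot;));
     return RANDOM.nextFloat() &lt; level;
   }
 }</pre></div></div>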
<div class="section">
<h4>Fault Injection Mechanism: AOP and AspectJ<a name="Fault_Injection_Mechanism:_AOP_and_AspectJ"></a></h4>
<p>The foundation of Hadoop's FI framework is the notion of cross-cutting concerns, as implemented by AspectJ. The following basic terms are important to remember:</p>
<ul>
<li>A cross-cutting concern is behavior, and often data, that is used across the scope of a piece of software</li>
<li>In AOP, aspects provide a mechanism by which a cross-cutting concern can be specified in a modular way</li>
<li>Advice is the code that is executed when an aspect is invoked</li>
<li>A join point (or pointcut) is a specific point within the application that may or may not invoke some advice</li></ul></div>
<div class="section">
<h4>Existing Join Points<a name="Existing_Join_Points"></a></h4>
<p>The following readily available join points are provided by AspectJ:</p>
<ul>
<li>Join when a method is called</li>
<li>Join during a method's execution</li>
<li>Join when a constructor is invoked</li>
<li>Join during a constructor's execution</li>
<li>Join during aspect advice execution</li>
<li>Join before an object is initialized</li>
<li>Join during object initialization</li>
<li>Join during static initializer execution</li>
<li>Join when a class's field is referenced</li>
<li>Join when a class's field is assigned</li>
<li>Join when a handler is executed</li></ul>
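<p>For reference, these correspond to AspectJ's primitive pointcut designators roughly as follows (a sketch; consult the AspectJ documentation for the authoritative syntax):</p>
<div>
<pre> call(MethodPattern)                    // a method is called
 execution(MethodPattern)               // during a method's execution
 call(ConstructorPattern)               // a constructor is invoked
 execution(ConstructorPattern)          // during a constructor's execution
 adviceexecution()                      // during aspect advice execution
 preinitialization(ConstructorPattern)  // before an object is initialized
 initialization(ConstructorPattern)     // during object initialization
 staticinitialization(TypePattern)      // during static initializer execution
 get(FieldPattern)                      // a class's field is referenced
 set(FieldPattern)                      // a class's field is assigned
 handler(TypePattern)                   // an exception handler is executed</pre></div></div></div>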
<div class="section">
<h3>Aspect Example<a name="Aspect_Example"></a></h3>
<div>
<pre> package org.apache.hadoop.hdfs.server.datanode;

 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.fi.ProbabilityModel;
 import org.apache.hadoop.hdfs.server.datanode.DataNode;
 import org.apache.hadoop.util.DiskChecker.*;

 import java.io.IOException;
 import java.io.OutputStream;
 import java.io.DataOutputStream;

 /**
  * This aspect takes care of faults injected into the
  * datanode.BlockReceiver class.
  */
 public aspect BlockReceiverAspects {
   public static final Log LOG = LogFactory.getLog(BlockReceiverAspects.class);

   public static final String BLOCK_RECEIVER_FAULT = &quot;hdfs.datanode.BlockReceiver&quot;;

   pointcut callReceivePacket() : call(* OutputStream.write(..))
     &amp;&amp; withincode(* BlockReceiver.receivePacket(..))
     // to further limit the application of this aspect a very narrow
     // 'target' can be used as follows:
     //   &amp;&amp; target(DataOutputStream)
     &amp;&amp; !within(BlockReceiverAspects+);

   before () throws IOException : callReceivePacket() {
     if (ProbabilityModel.injectCriteria(BLOCK_RECEIVER_FAULT)) {
       LOG.info(&quot;Before the injection point&quot;);
       Thread.dumpStack();
       throw new DiskOutOfSpaceException(&quot;FI: injected fault point at &quot; +
           thisJoinPoint.getStaticPart().getSourceLocation());
     }
   }
 }</pre></div>
<p>The aspect has two main parts:</p>
<ul>
<li>The pointcut callReceivePacket(), which serves as an identification mark of a specific point (in control and/or data flow) in the life of an application.</li>
<li>The advice - before () throws IOException : callReceivePacket() - whose body will be injected (see Putting It All Together) before that specific spot in the application's code.</li></ul>
<p>The pointcut identifies an invocation of the java.io.OutputStream class's write() method, with any number of parameters and any return type. The invocation must occur within the body of the method receivePacket() of class BlockReceiver; that method, too, can have any parameters and any return type. Invocations of the write() method anywhere within the aspect BlockReceiverAspects or its heirs are ignored.</p>
<p>Note 1: This short example doesn't illustrate the fact that you can have more than a single injection point per class. In such a case the names of the faults have to be different if a developer wants to trigger them separately.</p>
<p>Note 2: After the injection step (see Putting It All Together) you can verify that the faults were properly injected by searching for ajc keywords in a disassembled class file.</p>
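<p>For example, one way to do this is to disassemble the woven class with <tt>javap</tt> (a sketch; the classpath shown assumes the default ant build output directory):</p>
<div>
<pre> % javap -private -c -classpath build/classes \
     org.apache.hadoop.hdfs.server.datanode.BlockReceiver | grep ajc</pre></div></div>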
<div class="section">
<h3>Fault Naming Convention and Namespaces<a name="Fault_Naming_Convention_and_Namespaces"></a></h3>
<p>For the sake of a unified naming convention, the following two types of names are recommended when developing new aspects:</p>
<ul>
<li>Activity-specific notation (when the particular location of a fault's occurrence doesn't matter). In this case the name of the fault is rather abstract: fi.hdfs.DiskError</li>
<li>Location-specific notation. Here, the fault's name is mnemonic, as in: fi.hdfs.datanode.BlockReceiver[optional location details]</li></ul></div>
<div class="section">
<h3>Development Tools<a name="Development_Tools"></a></h3>
<ul>
<li>The Eclipse AspectJ Development Toolkit may help you when developing aspects</li>
<li>IntelliJ IDEA provides AspectJ weaver and Spring-AOP plugins</li></ul></div>
<div class="section">
<h3>Putting It All Together<a name="Putting_It_All_Together"></a></h3>
<p>Faults (aspects) have to be injected (or woven) into the code before they can be used. Follow these instructions:</p>
<ul>
<li>To weave aspects in place use:
<div>
<pre> % ant injectfaults</pre></div></li>
<li>If you misidentified the join point of your aspect you will see a warning (similar to the one shown here) when the 'injectfaults' target completes:
<div>
<pre> [iajc] warning at
src/test/aop/org/apache/hadoop/hdfs/server/datanode/ \
BlockReceiverAspects.aj:44::0
advice defined in org.apache.hadoop.hdfs.server.datanode.BlockReceiverAspects
has not been applied [Xlint:adviceDidNotMatch]</pre></div></li>
<li>This isn't an error; the build will still report success. To prepare a dev.jar file with all your faults woven in place (HDFS-475 pending), use:
<div>
<pre> % ant jar-fault-inject</pre></div></li>
<li>To create test jars use:
<div>
<pre> % ant jar-test-fault-inject</pre></div></li>
<li>To run HDFS tests with faults injected use:
<div>
<pre> % ant run-test-hdfs-fault-inject</pre></div></li></ul>
<div class="section">
<h4>How to Use the Fault Injection Framework<a name="How_to_Use_the_Fault_Injection_Framework"></a></h4>
<p>Faults can be triggered as follows:</p>
<ul>
<li>During runtime:
<div>
<pre> % ant run-test-hdfs -Dfi.hdfs.datanode.BlockReceiver=0.12</pre></div>
<p>To set a certain level, for example 25%, for all injected faults, use:</p>
<div>
<pre> % ant run-test-hdfs-fault-inject -Dfi.*=0.25</pre></div></li>
<li>From a program:
<div>
<pre> package org.apache.hadoop.fs;

 import org.junit.Test;
 import org.junit.Before;
 import org.junit.After;

 public class DemoFiTest {
   public static final String BLOCK_RECEIVER_FAULT = &quot;hdfs.datanode.BlockReceiver&quot;;

   @Before
   public void setUp() {
     // Set up the test's environment as required
   }

   @Test
   public void testFI() {
     // Trigger the fault, assuming that there's one called 'hdfs.datanode.BlockReceiver'
     System.setProperty(&quot;fi.&quot; + BLOCK_RECEIVER_FAULT, &quot;0.12&quot;);
     //
     // The main logic of your tests goes here
     //
     // Now set the level back to 0 (zero) to prevent this fault from happening again
     System.setProperty(&quot;fi.&quot; + BLOCK_RECEIVER_FAULT, &quot;0.0&quot;);
     // or delete its trigger completely
     System.getProperties().remove(&quot;fi.&quot; + BLOCK_RECEIVER_FAULT);
   }

   @After
   public void tearDown() {
     // Clean up the test environment
   }
 }</pre></div></li></ul>
<p>As you can see, these two methods do the same thing: they set the probability level of <tt>hdfs.datanode.BlockReceiver</tt> to 12%. The difference, however, is that the programmatic approach provides more flexibility and allows you to turn a fault off when a test no longer needs it.</p></div></div>
<div class="section">
<h3>Additional Information and Contacts<a name="Additional_Information_and_Contacts"></a></h3>
<p>These two sources of information are particularly interesting and worth reading:</p>
<ul>
<li><a class="externalLink" href="http://www.eclipse.org/aspectj/doc/next/devguide/">http://www.eclipse.org/aspectj/doc/next/devguide/</a></li>
<li>AspectJ Cookbook (ISBN-13: 978-0-596-00654-9)</li></ul>
<p>If you have additional comments or questions for the author check <a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-435">HDFS-435</a>.</p></div></div>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">&#169; 2014
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a></div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>
