<a href="../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">Compatibilty between Hadoop 1.x and Hadoop 2.x</a>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img alt="Built by Maven" src="./images/logos/maven-feather.png"/>
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<div class="section">
<h2>Hadoop HDFS over HTTP - Documentation Sets 2.3.0<a name="Hadoop_HDFS_over_HTTP_-_Documentation_Sets_2.3.0"></a></h2>
<p>HttpFS is a server that provides a REST HTTP gateway supporting all HDFS File System operations (read and write). It is interoperable with the <b>webhdfs</b> REST HTTP API.</p>
<p>HttpFS can be used to transfer data between clusters running different versions of Hadoop (overcoming RPC versioning issues), for example using Hadoop DistCP.</p>
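<p>As a sketch (the host names and paths below are placeholders), such an inter-cluster copy can be pointed at the HttpFS gateway through the <b>webhdfs</b> scheme:</p>
<pre>
# Copy from a source cluster, reached through its HttpFS gateway, into the
# destination cluster (hypothetical hosts: source-httpfs-host, dest-namenode-host)
$ hadoop distcp webhdfs://source-httpfs-host:14000/user/foo/data \
                hdfs://dest-namenode-host:8020/user/foo/data
</pre>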
<p>HttpFS can be used to access data in HDFS on a cluster behind a firewall (the HttpFS server acts as a gateway and is the only system that is allowed to cross the firewall into the cluster).</p>
<p>HttpFS can be used to access data in HDFS using HTTP utilities (such as curl and wget) and HTTP libraries from languages other than Java (for example, Perl).</p>
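<p>For instance, a file can be fetched with <tt>wget</tt> alone; in this sketch the host, path, and user are placeholders, and the <tt>user.name</tt> parameter assumes the server is configured for pseudo authentication:</p>
<pre>
# Download an HDFS file through the HttpFS gateway with plain wget
$ wget "http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt?op=OPEN&amp;user.name=foo" \
       -O README.txt
</pre>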
<p>The <b>webhdfs</b> client FileSystem implementation can be used to access HttpFS using the Hadoop filesystem shell (<tt>hadoop fs</tt>) command line tool as well as from Java applications using the Hadoop FileSystem Java API.</p>
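<p>A minimal sketch of shell access, assuming an HttpFS server at the hypothetical <tt>httpfs-host:14000</tt>:</p>
<pre>
# List an HDFS directory through HttpFS using the webhdfs:// scheme
$ hadoop fs -ls webhdfs://httpfs-host:14000/user/foo
# Copy a local file into HDFS through the gateway
$ hadoop fs -put data.txt webhdfs://httpfs-host:14000/user/foo/data.txt
</pre>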
<p>HttpFS has built-in security supporting Hadoop pseudo authentication, HTTP SPNEGO Kerberos authentication, and other pluggable authentication mechanisms. It also provides Hadoop proxy user support.</p>
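<p>As an illustration (not a complete configuration), these modes surface in client calls roughly as follows; the host and user names are placeholders:</p>
<pre>
# Pseudo authentication: identify the caller with the user.name query parameter
$ curl "http://httpfs-host:14000/webhdfs/v1/user/foo?op=GETFILESTATUS&amp;user.name=foo"

# Kerberos SPNEGO: let curl negotiate using an existing Kerberos ticket
$ curl --negotiate -u : "http://httpfs-host:14000/webhdfs/v1/user/foo?op=GETFILESTATUS"

# Proxy user: an authenticated superuser acts on behalf of another user via doas
$ curl "http://httpfs-host:14000/webhdfs/v1/user/bar?op=GETFILESTATUS&amp;user.name=superuser&amp;doas=bar"
</pre>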
<div class="section">
<h3>How Does HttpFS Work?<a name="How_Does_HttpFS_Works"></a></h3>
<p>HttpFS is a separate service from the Hadoop NameNode.</p>
<p>HttpFS itself is a Java web application and it runs using a preconfigured Tomcat bundled with the HttpFS binary distribution.</p>
<p>HttpFS HTTP web-service API calls are HTTP REST calls that map to an HDFS file system operation. For example, using the <tt>curl</tt> Unix command:</p>
<ul>
<li><tt>$ curl "http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt?op=OPEN"</tt> returns the contents of the HDFS <tt>/user/foo/README.txt</tt> file.</li>
<li><tt>$ curl "http://httpfs-host:14000/webhdfs/v1/user/foo?op=LISTSTATUS"</tt> returns the contents of the HDFS <tt>/user/foo</tt> directory in JSON format.</li>
<li><tt>$ curl -X PUT "http://httpfs-host:14000/webhdfs/v1/user/foo/bar?op=MKDIRS"</tt> creates the HDFS <tt>/user/foo/bar</tt> directory.</li></ul>
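<p>Writes follow the same pattern. As a sketch (the host, path, and user are placeholders, and pseudo authentication via <tt>user.name</tt> is assumed), creating a file is a two-step call in which the first request returns an HTTP 307 redirect to the URL that accepts the data:</p>
<pre>
# Step 1: ask where to send the data; the response is a 307 redirect (no file body yet)
$ curl -i -X PUT "http://httpfs-host:14000/webhdfs/v1/user/foo/data.txt?op=CREATE&amp;user.name=foo"

# Step 2: PUT the file to the Location URL returned above
$ curl -i -X PUT -T data.txt -H "Content-Type: application/octet-stream" \
       "http://httpfs-host:14000/webhdfs/v1/user/foo/data.txt?op=CREATE&amp;data=true&amp;user.name=foo"
</pre>
</div>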
<div class="section">
<h3>How Do HttpFS and Hadoop HDFS Proxy Differ?<a name="How_HttpFS_and_Hadoop_HDFS_Proxy_differ"></a></h3>
<p>HttpFS was inspired by Hadoop HDFS proxy.</p>
<p>HttpFS can be seen as a full rewrite of Hadoop HDFS proxy.</p>
<p>Hadoop HDFS proxy provides a subset of file system operations (read only), while HttpFS provides support for all file system operations.</p>
<p>HttpFS uses a clean HTTP REST API making its use with HTTP tools more intuitive.</p>
<p>HttpFS supports Hadoop pseudo authentication, Kerberos SPNEGO authentication, and Hadoop proxy users. Hadoop HDFS proxy did not.</p></div>
<div class="section">
<h3>User and Developer Documentation<a name="User_and_Developer_Documentation"></a></h3>
<ul>
<li><a href="./ServerSetup.html">HttpFS Server Setup</a></li>