In most examples of Hadoop code there is no reason to access a local file system; all of the data is passed through the standard map and reduce methods. Indeed, it is usually a bad idea to access the local file system on a slave processor, because data written there is not persisted from one processing step to the next. Sometimes, however, these rules have to change.
One case where access to the local file system is particularly necessary is when a critical step in the mapper or the reducer launches a separate process that expects certain local files to exist. Normally, when Hadoop uses an external process it uses Hadoop streaming, which assumes that the external process takes all of its data from standard input and sends all of its output to standard output. These assumptions can fail in several ways. First, the external process may require more than one input; for example, it may need one or more configuration files. Second, streaming assumes that the developer has enough control over the external process, and the way it functions, to make it compatible with Hadoop streaming.
In many cases these assumptions are unrealistic. Nothing, however, prevents a custom mapper or reducer from writing the files an external program needs to the local file system, launching the program, and, after it finishes, reading any output files it has written.
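The write, run, and read pattern can be sketched with plain JDK classes (java.nio.file and ProcessBuilder) rather than Hadoop's LocalFileSystem, which keeps the example self-contained. It writes the input and configuration files an external program expects, runs the program with a scratch directory as its working directory, waits for it to exit, and reads the file it produced. The class name, command, and file names are hypothetical; this is a sketch, not the post's own code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: stage files for an external program, run it, read its output file. */
public class ExternalStep {

    /**
     * @param workDir    local scratch directory (hypothetical layout)
     * @param command    command line of the external program (hypothetical)
     * @param inputs     file name to contents, written before the program starts
     * @param outputName file the program is expected to create in workDir
     * @return contents of the output file
     */
    public static String run(Path workDir, List<String> command,
                             Map<String, String> inputs, String outputName)
            throws IOException, InterruptedException {
        Files.createDirectories(workDir);
        // 1. write every input or configuration file the program expects
        for (Map.Entry<String, String> e : inputs.entrySet()) {
            Files.write(workDir.resolve(e.getKey()),
                        e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        // 2. launch the program with workDir as its current directory;
        //    stdout and stderr go to a log file so an unread pipe cannot block it
        Process p = new ProcessBuilder(command)
                .directory(workDir.toFile())
                .redirectErrorStream(true)
                .redirectOutput(workDir.resolve("external.log").toFile())
                .start();
        // 3. wait for the program to finish before touching its output
        int exitCode = p.waitFor();
        if (exitCode != 0) {
            throw new IOException("external program exited with code " + exitCode);
        }
        // 4. read the file the program wrote
        return new String(Files.readAllBytes(workDir.resolve(outputName)),
                          StandardCharsets.UTF_8);
    }
}
```

Sending the program's standard output and error to a log file, rather than leaving them as pipes that nothing reads, keeps a chatty program from stalling when a pipe buffer fills.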
There are two ways to get files onto the local file system of a slave processor. One is to use Hadoop's distributed cache, which sends files specified in the job configuration to each slave's local file system. The distributed cache will be the topic of another blog entry; this entry concentrates on reading and writing local files. The alternative is to have the slave processes write the files directly to the local file system. Files required during all steps of processing may be written during the setup phase. Files required only for a single step of processing may be written during that step and, if no longer needed, deleted at the end of it.
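A minimal sketch of that life cycle, assuming the org.apache.hadoop.mapreduce Mapper API: setup() writes a file that every call to map() needs, map() writes a per-record file and deletes it when done, and cleanup() removes the shared file. The class name, file names, and file contents are hypothetical, and the external-program call is left as a comment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/** Hypothetical mapper showing where local files are written and removed. */
public class LocalFilesMapper extends Mapper<LongWritable, Text, Text, Text> {
    // hypothetical file names, relative to the local file system's working directory
    private static final Path SHARED_CONFIG = new Path("shared.conf");
    private static final Path STEP_INPUT = new Path("step.input");

    private LocalFileSystem localFs;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        localFs = FileSystem.getLocal(context.getConfiguration());
        // needed by every call to map(): write it once per task
        write(SHARED_CONFIG, "threshold=10\n");   // hypothetical contents
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // needed only for this record: write it, use it, delete it
        write(STEP_INPUT, value.toString());
        // ... launch the external program here and read its output ...
        localFs.delete(STEP_INPUT, false);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        localFs.delete(SHARED_CONFIG, false);
    }

    /** Write text as UTF-8; create() overwrites any existing file at path. */
    private void write(Path path, String text) throws IOException {
        try (FSDataOutputStream out = localFs.create(path)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Writing and deleting a file per record adds local I/O to every map() call, so the per-step pattern suits steps that genuinely need a fresh file each time.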
Hadoop supplies a LocalFileSystem class which manages access to the local file system. The code below shows how to get a LocalFileSystem from a Hadoop context.
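// FileSystem.getLocal throws IOException, so call it from a method that declares it, such as setup
Configuration configuration = context.getConfiguration();
LocalFileSystem localFs = FileSystem.getLocal(configuration);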
LocalFileSystem has methods to create, delete, open, and append to files on the local file system. Each file is designated by a Path. In my work I have made these Paths relative, since I am uncertain where a program is running or what permissions are available on a slave processor.
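A relative Path is resolved against the file system's working directory, which for the local file system normally starts out as the JVM's current directory. The sketch below shows where a relative Path lands by printing makeQualified(path), then writes, reads back, and deletes the file. It builds its own Configuration so it can run outside a job; inside a task you would use context.getConfiguration() as above. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical demo of a relative Path on the local file system. */
public class RelativePathDemo {

    /** Writes text to relativeName, reads it back, deletes the file, returns what was read. */
    public static String roundTrip(String relativeName, String text) throws IOException {
        LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
        Path path = new Path(relativeName);          // relative: no scheme, no leading '/'
        // print the fully qualified location the relative path resolves to
        System.out.println("resolved to " + localFs.makeQualified(path));

        try (FSDataOutputStream out = localFs.create(path)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }

        byte[] bytes = new byte[(int) localFs.getFileStatus(path).getLen()];
        try (FSDataInputStream in = localFs.open(path)) {
            in.readFully(bytes);
        }

        localFs.delete(path, false);                 // non-recursive delete of a single file
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```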
The following code is a set of static utility routines that write to the local file system. I consider three cases. In the first, the data is a String, possibly the contents of a Text object passed in. In the second, the contents are a resource packaged in a custom jar file. Resources are very convenient when data must be passed to every instance and the data is not large relative to the size of the jar file. Both of these cases end up calling a routine which writes the contents of an InputStream to the local file system. This allows a third possibility, where the data source is anything that can supply an InputStream, such as a web service.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

// Holder class for the static routines; the class name is illustrative
public final class LocalFileUtilities {

    /**
     * write a resource to a LocalFileSystem
     * @param cls - class holding the resource
     * @param resourceName - !null name of the resource
     * @param localFs - !null file system
     * @param dstFile - !null local file name - this will become a path
     */
    public static void writeResourceAsFile(Class<?> cls, String resourceName, LocalFileSystem localFs, String dstFile) {
        InputStream inp = cls.getResourceAsStream(resourceName);
        if (inp == null) { // getResourceAsStream returns null rather than throwing
            throw new IllegalArgumentException("resource not found: " + resourceName);
        }
        writeStreamAsFile(localFs, dstFile, inp);
    }

    /**
     * Write the contents of a stream to the local file system
     * @param localFs - !null file system
     * @param dstFile - !null local file name - this will become a path
     * @param pInp - !null open Stream - it will be closed at the end
     */
    public static void writeStreamAsFile(final LocalFileSystem localFs, final String dstFile, final InputStream pInp) {
        Path path = new Path(dstFile);
        try {
            // create() overwrites any existing file at path
            FSDataOutputStream outStream = localFs.create(path);
            copyFile(pInp, outStream); // closes both streams
        }
        catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Write the contents of a String to the local file system
     * @param localFs - !null file system
     * @param dstFile - !null local file name - this will become a path
     * @param s - !null String, written as UTF-8
     */
    public static void writeStringAsFile(final LocalFileSystem localFs, final String dstFile, final String s) {
        // use an explicit charset rather than the platform default
        ByteArrayInputStream inp = new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
        writeStreamAsFile(localFs, dstFile, inp);
    }

    /**
     * copy an InputStream to an OutputStream
     * @param inp - !null open Stream - it will be closed at the end
     * @param outStream - !null open Stream - it will be closed at the end
     * @throws IOException if reading or writing fails; both streams are still closed
     */
    public static void copyFile(InputStream inp, OutputStream outStream) throws IOException {
        final int bufsize = 1024;
        // try-with-resources closes both streams even when the copy fails
        try (InputStream in = inp; OutputStream out = outStream) {
            byte[] buffer = new byte[bufsize];
            int bytesRead;
            while ((bytesRead = in.read(buffer, 0, bufsize)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
        }
    }
}
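The routines above only write. For the other half of the pattern, reading output an external program left behind and deleting files once a step is done, companion routines might look like the sketch below; the class and method names are hypothetical. One caveat worth checking against your Hadoop version: LocalFileSystem is a checksummed file system, so files it creates get hidden .crc companions. If an external program later rewrites such a file in place, reading it back through localFs may fail its checksum check; the unchecksummed file system returned by localFs.getRawFileSystem() avoids this.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical companions to the write routines: read and delete local files. */
public class LocalFileReader {

    /**
     * Read a local file, e.g. one written by an external program, as a UTF-8 String
     * @param localFs - !null file system
     * @param srcFile - !null local file name - this will become a path
     */
    public static String readFileAsString(final LocalFileSystem localFs, final String srcFile)
            throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = localFs.open(new Path(srcFile))) {
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Delete a local file once a step no longer needs it; returns false if it did not exist. */
    public static boolean deleteLocalFile(final LocalFileSystem localFs, final String file)
            throws IOException {
        return localFs.delete(new Path(file), false);
    }
}
```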
 
 
 