Thursday 3 September 2015

Hadoop Online Quiz Questions And Answers

31. What is a map-side join?
A. A map-side join is performed in the map phase and done in memory
B. A map-side join is a technique in which data is eliminated at the map step
C. A map-side join is a form of map-reduce API which joins data from different locations
D. None of these answers are correct
Answer: A
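
As an illustration of answer A, here is a minimal sketch in plain Python, not the Hadoop API. It assumes the smaller dataset fits in memory, so it can be loaded into a dictionary and each record of the larger dataset can be joined while it is being mapped. The function names and the sample data are hypothetical.

```python
# Illustrative simulation only; this is not the Hadoop API.

def build_lookup(small_records):
    """Load the small dataset into an in-memory dict keyed by the join key."""
    return {key: value for key, value in small_records}

def map_side_join(large_records, lookup):
    """Join each large-side record against the in-memory table during the map step."""
    for key, value in large_records:
        if key in lookup:  # inner join: records without a match are dropped
            yield key, (value, lookup[key])

# Hypothetical sample data: (department id, name) and (employee, department id).
departments = [(10, "Sales"), (20, "Engineering")]
employees = [("alice", 10), ("bob", 20), ("carol", 30)]

lookup = build_lookup(departments)
# Re-key the employees by department id so both sides share the join key.
joined = list(map_side_join(((dept, name) for name, dept in employees), lookup))
print(joined)
```

No shuffle or reduce step is involved: the join happens entirely inside the map function, which is why the small side must fit in memory.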

32. What is a reduce-side join?
A. Reduce-side join is a technique to eliminate data from the initial data set at the reduce step
B. Reduce-side join is a technique for merging data from different sources based on a specific key.
C. Reduce-side join is a set of API to merge data from different sources.
D. None of these answers are correct
Answer: B
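
The idea in answer B can be sketched in plain Python (again a simulation, not the Hadoop API): the map step tags every record with its source, the shuffle groups the records by key, and the reduce step pairs up records from the two sources that share a key. The source names and the sample data are hypothetical.

```python
from collections import defaultdict

def map_tag(records, source):
    """Map step: emit (key, (source, value)) so the reducer knows where a value came from."""
    for key, value in records:
        yield key, (source, value)

def shuffle(tagged):
    """Simulated shuffle: group all tagged values by key."""
    groups = defaultdict(list)
    for key, tagged_value in tagged:
        groups[key].append(tagged_value)
    return groups

def reduce_join(key, tagged_values):
    """Reduce step: pair every order with every customer sharing the key (inner join)."""
    order_items = [v for s, v in tagged_values if s == "orders"]
    customer_names = [v for s, v in tagged_values if s == "customers"]
    for item in order_items:
        for name in customer_names:
            yield key, (item, name)

# Hypothetical sample data keyed by customer id.
customers = [(1, "Ann"), (2, "Raj")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]

tagged = list(map_tag(customers, "customers")) + list(map_tag(orders, "orders"))
result = []
for key, values in sorted(shuffle(tagged).items()):
    result.extend(reduce_join(key, values))
print(result)
```

Because every record passes through the shuffle, this approach does not need either dataset to fit in memory, at the cost of moving all the data across the network.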

34. What is PIG?
A. Pig is a subset of the Hadoop API for data processing
B. Pig is a part of the Apache Hadoop project that provides a C-like scripting language interface for data processing
C. Pig is a part of the Apache Hadoop project. It is a "PL-SQL" interface for data processing in a Hadoop cluster
D. PIG is the third most popular form of meat in the US behind poultry and beef.
Answer: B

35. How can you disable the reduce step?
A. The Hadoop administrator has to set the number of reducer slots to zero on all slave nodes. This will disable the reduce step.
B. It is impossible to disable the reduce step since it is a critical part of the Map-Reduce abstraction.
C. A developer can always set the number of reducers to zero. That will completely disable the reduce step.
D. While you cannot completely disable reducers, you can set the output to one. There needs to be at least one reduce step in the Map-Reduce abstraction.
Answer: C
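
In Hadoop's Java API, the reducer count is set on the job, for example with job.setNumReduceTasks(0). The toy runner below is a plain-Python simulation of that behaviour, not Hadoop code: run_job, upper_mapper and count_reducer are hypothetical names. With zero reducers, the mapper output is returned directly and the shuffle, sort and reduce steps are skipped.

```python
from collections import defaultdict

def run_job(records, mapper, reducer=None, num_reducers=1):
    """Toy job runner: with num_reducers=0 the mapper output is the final output."""
    mapped = [pair for record in records for pair in mapper(record)]
    if num_reducers == 0:
        return mapped  # map-only job: no shuffle, sort, or reduce
    groups = defaultdict(list)
    for key, value in mapped:  # simulated shuffle: group values by key
        groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

def upper_mapper(line):
    """Emit (first field, upper-cased line) for each input line."""
    yield line.split(",")[0], line.upper()

def count_reducer(key, values):
    """Count how many values arrived for each key."""
    yield key, len(values)

lines = ["a,x", "b,y", "a,z"]
map_only = run_job(lines, upper_mapper, num_reducers=0)
with_reduce = run_job(lines, upper_mapper, count_reducer)
print(map_only)
print(with_reduce)
```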

36. Why would a developer create a map-reduce without the reduce step?
A. Developers should design Map-Reduce jobs without reducers only if no reduce slots are available on the cluster.
B. Developers should never design Map-Reduce jobs without reducers. An error will occur upon compile.
C. There is a CPU intensive step that occurs between the map and reduce steps. Disabling the reduce step speeds up data processing.
D. It is not possible to create a map-reduce job without at least one reduce step. A developer may decide to limit to one reducer for debugging purposes.
Answer: C

37. What is the default input format?
A. The default input format is XML. A developer can specify other input formats as appropriate if XML is not the correct input.
B. There is no default input format. The input format should always be specified.
C. The default input format is a sequence file format. The data needs to be preprocessed before using the default input format.
D. The default input format is TextInputFormat, with the byte offset as the key and the entire line as the value.
Answer: D
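
To make answer D concrete, the following plain-Python sketch mimics the records a mapper would receive under TextInputFormat: the key is the byte offset at which the line starts and the value is the line without its terminator. It is a simulation under simplifying assumptions (a single input, no input splits, UTF-8 text), and text_input_records is a hypothetical helper, not part of Hadoop.

```python
def text_input_records(data: bytes):
    """Yield (byte offset, line text) pairs, one per line of the input."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        # The value excludes the line terminator; the key is where the line begins.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # advance by the full line length, terminator included

sample = b"first line\nsecond\nthird"
records = list(text_input_records(sample))
print(records)
```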

38. How can you override the default input format?
A. In order to override the default input format, the Hadoop administrator has to change the default settings in the config file.
B. In order to override the default input format, a developer has to set the new input format on the job config before submitting the job to a cluster.
C. The default input format is controlled by each individual mapper and each line needs to be parsed individually.
D. None of these answers are correct.
Answer: B

39. What are the common problems with map-side join?
A. The most common problem with map-side joins is introducing a high level of code complexity.
This complexity has several downsides: increased risk of bugs and performance degradation.
Developers are cautioned to rarely use map-side joins.
B. The most common problem with map-side joins is a lack of available map slots, since map-side joins require a lot of mappers.
C. The most common problems with map-side joins are out of memory exceptions on slave nodes.
D. The most common problem with map-side joins is not clearly specifying the primary index in the join.
This can lead to very slow performance on large datasets.
Answer: C

40. Which is faster: Map-side join or Reduce-side join? Why?
A. Both techniques have about the same performance expectations.
B. Reduce-side join, because the join operation is done on HDFS.
C. Map-side join is faster because the join operation is done in memory.
D. Reduce-side join, because it is executed on the namenode, which will have a faster CPU and more memory.
Answer: C
