The point of this exercise is to create a system that allows the MapReduce magic of distributedprocessing of large amounts of data to happen closer to the data itself.
The two core components are the Hadoop Distributed File System for storing data and Hadoop MapReduce for writing parallel-processing jobs.
其中两个核心组件是用于存储数据的Hadoop Distributed File System (Hadoop分布式文件系统)和用于写入并行处理任务的Hadoop MapReduce。
3
A distributed file system like HDFS allows storing static files for batch processing. Effectively a system like this allows storing and processing historical data from the past.