The point of this exercise is to create a system that allows the MapReduce magic of distributedprocessing of large amounts of data to happen closer to the data itself.
The distributeddata query is the core of the distributed database managing system, but the optimized query algorithm is the key technology in query processing.
分布式数据查询是分布式数据库管理系统的核心,而查询优化算法又是查询处理中的关键技术。
3
A distributed file system like HDFS allows storing static files for batch processing. Effectively a system like this allows storing and processing historical data from the past.