MapReduce: Processing Data on Large Clusters [Foreign-Language Original and Translation].rar
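This document is a compressed archive.
Description
Original document published by member xiaowei
MapReduce: Processing Data on Large Clusters [Foreign-Language Original and Translation]
Contains the Chinese translation and the English original. The content is detailed and complete; recommended for download and reference.
Chinese: 16,573 characters
English: 34,600 characters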
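Abstract (from the Chinese translation)
MapReduce is a programming model, together with an associated implementation, for processing and generating large data sets. Users specify a map function that processes a key/value pair to produce a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. As the paper shows, many real-world tasks can be expressed in this model. Programs written in this functional style are automatically parallelized and run on a large cluster of commodity machines. The run-time system partitions the input data, schedules the program's execution across a set of machines, handles machine failures, and manages the communication between machines. This lets programmers with no experience in parallel and distributed systems easily use the resources of a large distributed system. Our MapReduce implementation runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented, and more than one thousand MapReduce jobs are executed on Google's clusters every day ......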
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day ......
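To make the map/reduce contract in the abstract concrete, here is a minimal single-process sketch in Python that counts words. It is only an illustration of the programming model, not the implementation the paper describes: there is no partitioning across machines, no scheduling, and no failure handling. The names run_mapreduce, map_word_count, reduce_word_count, and the sample documents are all hypothetical and chosen for this example.

```python
from collections import defaultdict


def map_word_count(doc_name, text):
    """Map: turn one (key, value) input pair into intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word, counts):
    """Reduce: merge all intermediate values that share one intermediate key."""
    return sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical in-memory driver (an assumption, not the paper's run-time).

    It applies map_fn to every input pair, groups the intermediate values
    by intermediate key, then applies reduce_fn once per key.
    """
    groups = defaultdict(list)
    for key, value in inputs:
        for inter_key, inter_value in map_fn(key, value):
            groups[inter_key].append(inter_value)
    return {k: reduce_fn(k, values) for k, values in groups.items()}


# Made-up sample input: (document name, document contents) pairs.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]

result = run_mapreduce(documents, map_word_count, reduce_word_count)
print(result)
```

In a real deployment the grouping step in run_mapreduce is what the run-time system would distribute: map calls run on many machines, intermediate pairs are shuffled by key, and reduce calls run in parallel on the grouped values. The user-supplied map and reduce functions stay the same.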