
python - why the get() in multiprocessing.Pool.map_async

CocoaChina 10-23

Am I doing something wrong here?

Don't worry, many users do quite the same - paying more than receiving.

This is a common case of a lecture, not about some "promising" syntax-constructor, but about the actual costs of using it.

The story is long, the effect is straightforward - you expected a low-hanging fruit, but instead had to pay immense costs of process-instantiation, work-package re-distribution and collection of results, all of that just for doing but a few rounds of func() calls.

Wow! Stop! I was told parallelisation will bring a SPEEDUP, wasn't I?!?

Well, who told you that any such ( potential ) speedup is for free?

Let's be quantitative and measure the actual code-execution times, instead of emotions, shall we?

Benchmarking is always a fair move.

It helps us, mortals, to escape from just-expectations

and get ourselves into quantitatively supported evidence-based knowledge:

from zmq import Stopwatch; aClk = Stopwatch ( ) # this is a handy tool to do so
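If pyzmq is not installed on your box, the same microsecond-resolution .start()/.stop() pattern can be approximated with the standard library (the StopwatchUS class below is an assumption of this sketch, not part of the original answer; zmq.Stopwatch itself is implemented in C and has lower call overhead):

```python
import time

class StopwatchUS:
    """A minimal stand-in for zmq.Stopwatch: .start() / .stop() in [us]."""
    def start( self ):
        self._t0 = time.perf_counter()
    def stop( self ):
        # returns elapsed time as integer microseconds, like zmq.Stopwatch does
        return int( ( time.perf_counter() - self._t0 ) * 1e6 )

aClk = StopwatchUS()
aClk.start()
_ = sum( i * i for i in range( 1000 ) )   # some tiny payload to time
elapsed_us = aClk.stop()
print( elapsed_us )
```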

An AS-IS test:

Before moving any further, one ought to record this pair:

>>> aClk.start(); _ = [ func( SEQi ) for SEQi in inp ]; aClk.stop() # [ SEQ ]
>>> HowMuchWillWePAY2RUN( func, 4, 100 )                            # [ RUN ]
>>> HowMuchWillWePAY2MAP( func, 4, 100 )                            # [ MAP ]

This will set the range of the performance envelope, from a pure-[ SERIAL ] [ SEQ ]-of-calls, to an un-optimised joblib.Parallel() or, should one wish to extend the experiment with any other tools, the said multiprocessing.Pool() or other.

Test-case A:

The intent:

So as to measure the { process | job }-instantiation costs, we need a NOP-work-package payload, that will spend almost nothing "there" but return "back", and will not require to pay any additional add-on costs (be it for any input-parameters' transmission or for returning any value).

def a_NOP_FUN( aNeverConsumedPAR ):
    """                                 __doc__
    The intent of this FUN() is indeed to do nothing at all,
    so as to be able to benchmark
    all the process-instantiation
    add-on overhead costs.
    """
    pass

So, here comes the setup add-on costs comparison:

#-------------------------------------------------------<function a_NOP_FUN
[ SEQ ]-pure-[ SERIAL ] worked within ~      37 ..     44 [ us ] on this localhost
[ MAP ]-just-[ CONCURRENT ] tool           2536 ..   7343 [ us ]
[ RUN ]-just-[ CONCURRENT ] tool         111162 .. 112609 [ us ]

The strategy of using a joblib.delayed() on a joblib.Parallel() task-processing:

def HowMuchWillWePAY2RUN( aFun2TEST     = a_NOP_FUN,
                          JOBS_TO_SPAWN = 4,
                          RUNS_TO_RUN   = 10
                          ):
    from zmq import Stopwatch; aClk = Stopwatch()
    import joblib                       # missing in the original listing
    try:
         aClk.start()
         joblib.Parallel(  n_jobs = JOBS_TO_SPAWN
                           )( joblib.delayed( aFun2TEST )( aFunPARAM )
                                  for aFunPARAM in range( RUNS_TO_RUN )
                              )
    except:
         pass
    finally:
         try:
              _ = aClk.stop()
         except:
              _ = -1
    pMASK = "CLK:: {0:_>24d} [ us ] @{1: >4d}-JOBs ran{2: >6d} RUNS {3:}"
    print( pMASK.format( _,
                         JOBS_TO_SPAWN,
                         RUNS_TO_RUN,
                         " ".join( repr( aFun2TEST ).split( " " )[:2] )
                         )
           )

The strategy of using a lightweight .map_async() method on a multiprocessing.Pool() instance:

def HowMuchWillWePAY2MAP( aFun2TEST          = a_NOP_FUN,
                          PROCESSES_TO_SPAWN = 4,
                          RUNS_TO_RUN        = 1
                          ):
    from zmq import Stopwatch; aClk = Stopwatch()
    try:
         import numpy           as np
         import multiprocessing as mp

         pool = mp.Pool( processes = PROCESSES_TO_SPAWN )
         inp  = np.linspace( 0.01, 1.99, 100 )

         aClk.start()
         for i in range( RUNS_TO_RUN ):  # xrange in the original Python 2 listing
             result = pool.map_async( aFun2TEST, inp )
             output = result.get()
    except:
         pass
    finally:
         try:
              _ = aClk.stop()
         except:
              _ = -1
    pMASK = "CLK:: {0:_>24d} [ us ] @{1: >4d}-PROCs ran{2: >6d} RUNS {3:}"
    print( pMASK.format( _,
                         PROCESSES_TO_SPAWN,
                         RUNS_TO_RUN,
                         " ".join( repr( aFun2TEST ).split( " " )[:2] )
                         )
           )
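For completeness, a minimal, self-contained round trip through .map_async() / .get() looks like this (the trivial square() payload is a name invented here for illustration); the .get() is exactly where the caller blocks until the pool has distributed the work and collected all remote results back:

```python
import multiprocessing as mp

def square( x ):
    # trivial payload - the "work" is negligible next to the pool overheads
    return x * x

output = None                                        # filled in below
if __name__ == "__main__":
    pool   = mp.Pool( processes = 2 )
    result = pool.map_async( square, range( 10 ) )   # returns immediately
    output = result.get()                            # blocks, collects results
    pool.close()
    pool.join()
    print( output )                                  # [0, 1, 4, ..., 81]
```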

So,

the first set of pains and surprises

comes right at the actual costs paid inside a concurrent joblib.Parallel() pool:

CLK:: __________________117463 [ us ] @   4-JOBs ran    10 RUNS <function a_NOP_FUN
CLK:: __________________111182 [ us ] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________110229 [ us ] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________110095 [ us ] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________111794 [ us ] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________110030 [ us ] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________110697 [ us ] @   3-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: _________________4605843 [ us ] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________336208 [ us ] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________298816 [ us ] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________355492 [ us ] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________320837 [ us ] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________308365 [ us ] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________372762 [ us ] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________304228 [ us ] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________337537 [ us ] @ 123-JOBs ran   100 RUNS <function a_NOP_FUN
CLK:: __________________941775 [ us ] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: __________________987440 [ us ] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: _________________1080024 [ us ] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: _________________1108432 [ us ] @ 123-JOBs ran 10000 RUNS <function a_NOP_FUN
CLK:: _________________7525874 [ us ] @ 123-JOBs ran100000 RUNS <function a_NOP_FUN

So, this scientifically fair and rigorous test has, starting from this simplest-ever case, already shown the benchmarked costs of all the associated code-execution processing-setup overheads - the smallest-ever joblib.Parallel() penalty sine qua non.

This forwards us into the direction where real-world algorithms live - best with next adding some bigger and bigger "payload"-sizes into the testing loop.
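One way to "grow the payload", as suggested above, is to parameterise the amount of work done per call (a_FMA_FUN and its LOOPS knob are this sketch's assumptions, not from the original benchmark); re-running HowMuchWillWePAY2RUN() / HowMuchWillWePAY2MAP() with growing LOOPS then shows where the per-call work finally starts to dominate the fixed overheads:

```python
def a_FMA_FUN( aNeverConsumedPAR, LOOPS = 1000 ):
    """A tunable CPU-bound payload: LOOPS-many multiply-add steps."""
    acc = 0.0
    for i in range( LOOPS ):
        acc = acc + 0.5 * i     # trivial, but measurable work
    return acc
```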

Now we know the penalty for entering into a "just"-[ CONCURRENT ] code-execution - and what next?

Using this systematic and lightweight approach, we may go forward, as we will also need to benchmark the add-on costs and the other Amdahl's Law indirect effects of { remote-job-PAR-XFER(s) | remote-job-MEM.alloc(s) | remote-job-CPU-bound processing | remote-job-fileIO(s) }.

A function template like this may help in re-testing (as you see, there will be a lot to re-run, while O/S noise and some additional artifacts will step into the actual cost-of-use patterns):

Test-case B:

Once we have paid the up-front costs, the next most common mistake is to forget the costs of memory allocations. So, let's test it:

def a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR( aNeverConsumedPAR,
                                         SIZE1D = 1000 ):
    """                                 __doc__
    The intent of this FUN() is to do nothing but
    a MEM-allocation
    so as to be able to benchmark
    all the process-instantiation
    add-on overhead costs.
    """
    import numpy as np              # yes, deferred import, libs do defer imports
    aMemALLOC = np.zeros( ( SIZE1D, #       so as to set
                            SIZE1D, #       realistic ceilings
                            SIZE1D, #       as how big the "Big Data"
                            SIZE1D  #       may indeed grow into
                            ),
                          dtype = np.float64,
                          order = 'F'
                          )                 # .ALLOC + .SET
    aMemALLOC[2,3,4,5] = 8.7654321          # .SET
    aMemALLOC[3,3,4,5] = 1.2345678          # .SET
    return aMemALLOC[2:3,3,4,5]

In case your platform stops being able to allocate the requested memory-blocks, we run into another kind of problem (a class of hidden glass-ceilings appears, if trying to go-parallel in a physical-resources-agnostic manner). One may edit the SIZE1D scaling, so as to at least fit into the platform's RAM addressing / sizing capabilities, yet the performance envelope of the real-world problem computing is still of our great interest here:
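The glass-ceiling mentioned above is easy to estimate up front: a 4-D float64 block of SIZE1D ** 4 cells costs 8 bytes per cell, and each concurrently spawned process pays this bill separately (a quick sanity check, using the shapes from the listing above; mem_footprint_GB is a helper invented here):

```python
def mem_footprint_GB( SIZE1D, BYTES_PER_CELL = 8 ):
    """RAM needed for a ( SIZE1D, SIZE1D, SIZE1D, SIZE1D ) float64 array, in [ GB ]."""
    return SIZE1D ** 4 * BYTES_PER_CELL / 1e9

print( mem_footprint_GB(  100 ) )   #  100**4 * 8 [ B ] ->    0.8 [ GB ]
print( mem_footprint_GB( 1000 ) )   # 1000**4 * 8 [ B ] -> 8000.0 [ GB ] - no localhost fits this
```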

>>> HowMuchWillWePAY2RUN ( a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR, 200, 1000 )

may yield

a cost-to-pay anywhere between 0.1 [ s ] and +9 [ s ] (!!)

just for still doing nothing, but now also without forgetting about some realistic MEM-allocation add-on costs "there"

CLK:: __________________116310 [ us ] @   4-JOBs ran   10 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________120054 [ us ] @   4-JOBs ran   10 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________129441 [ us ] @  10-JOBs ran  100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________123721 [ us ] @  10-JOBs ran  100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________127126 [ us ] @  10-JOBs ran  100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________124028 [ us ] @  10-JOBs ran  100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________305234 [ us ] @ 100-JOBs ran  100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________243386 [ us ] @ 100-JOBs ran  100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________241410 [ us ] @ 100-JOBs ran  100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________267275 [ us ] @ 100-JOBs ran  100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________244207 [ us ] @ 100-JOBs ran  100 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________653879 [ us ] @ 100-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________405149 [ us ] @ 100-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________351182 [ us ] @ 100-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________362030 [ us ] @ 100-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: _________________9325428 [ us ] @ 200-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________680429 [ us ] @ 200-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________533559 [ us ] @ 200-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: _________________1125190 [ us ] @ 200-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR
CLK:: __________________591109 [ us ] @ 200-JOBs ran 1000 RUNS <function a_NOP_FUN_WITH_JUST_A_MEM_ALLOCATOR

Test-case C:

Kindly read the tail sections of this post.

Test-case D:

Epilogue:

For each "promise", the fair best next step is first to cross-validate the actual code-execution costs, before starting any code re-engineering. Even though the original, overhead-naive Amdahl's Law might have promised some expected speedup effects, the sum of the real-world platform's add-on costs may devastate any of them.
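The epilogue's warning can be made quantitative with an overhead-aware re-formulation of Amdahl's Law (this simple additive setup-cost term oT is an assumption of the sketch, illustrating the overhead-strict view the answer argues for, not a formula from the original text):

```python
def speedup( p, N, T = 1.0, oT = 0.0 ):
    """
    Overhead-aware Amdahl speedup:
    p  - parallelisable fraction of the original runtime T
    N  - number of workers
    oT - add-on setup / teardown overhead, in the same units as T
    """
    return T / ( ( 1 - p ) * T + p * T / N + oT )

print( speedup( 0.95, 4 ) )             # classic, overhead-naive case: ~ 3.48 x
print( speedup( 0.95, 4, oT = 1.0 ) )   # same job + 1 T of overheads: < 1.0, a net slowdown
```

The same four workers that "promise" a ~3.5x speedup turn into a net slowdown once the process-instantiation and data-transfer overheads reach the scale of the payload itself - which is exactly what the NOP benchmarks above measured.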

As Mr. W. Edwards Deming has expressed many times, without DATA we make ourselves left to just OPINIONS.

A bonus part:

Having read up to here, one might have already found that there is not any kind of "drawback" or "error" in #Line2 per se, yet careful design practice will show any better syntax-constructor that spends less to achieve more (as the actual resources (CPU, MEM, IO, O/S) permit on the code-execution platform). Anything else is not principally different from blind fortune-telling.
