Allowing GPU memory growth

By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES). This is done to use the relatively precious GPU memory resources on the devices more efficiently by reducing memory fragmentation.
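If you want only some GPUs to be visible in the first place, you can set CUDA_VISIBLE_DEVICES before TensorFlow initializes CUDA. A minimal sketch (the device index "0" is an arbitrary illustrative choice):

import os

# Must be set before TensorFlow touches the driver, i.e. before
# the first `import tensorflow`.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf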

In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process. TensorFlow provides two Config options on the Session to control this.

The first is the allow_growth option, which attempts to allocate only as much GPU memory as the runtime allocations require: it starts out allocating very little memory, and as Sessions get run and more GPU memory is needed, we extend the GPU memory region used by the TensorFlow process. Note that we do not release memory, since that can lead to even worse memory fragmentation. To turn this option on, set it in the ConfigProto:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
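As a complete, runnable illustration (the 4096x4096 matmul is an arbitrary size chosen only to make the allocator grow):

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    # A reasonably large matmul forces the allocator to extend its
    # memory region; watching nvidia-smi during the run shows usage
    # starting small instead of claiming nearly all memory up front.
    a = tf.random_normal([4096, 4096])
    b = tf.random_normal([4096, 4096])
    sess.run(tf.matmul(a, b))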

The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. For example, you can tell TensorFlow to only allocate 40% of the total memory of each GPU by:

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process.
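The two options can also be combined; assuming the fraction acts as a hard cap, allow_growth then makes allocations within that cap happen on demand rather than up front. On a 12 GB card, for example, a fraction of 0.4 bounds the process at roughly 4.8 GB. A sketch:

import tensorflow as tf

# Cap the process at 40% of each visible GPU's memory, and grow
# allocations lazily within that cap instead of reserving it eagerly.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
config.gpu_options.allow_growth = True

session = tf.Session(config=config)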
