Timeout Detection & Recovery (TDR)

NVIDIA® Nsight™ Development Platform, Visual Studio Edition 2.2 User Guide 

TDR stands for Timeout Detection and Recovery, a feature of the Windows operating system that detects when a graphics card has stopped responding and recovers to a functional desktop by resetting the card. If the operating system does not receive a response from the graphics card within a set amount of time (2 seconds by default), it resets the card.

Before TDR existed, problems of this nature would freeze the system and require a reboot of the operating system. If TDR is enabled and you see the TDR error message, "Display driver stopped responding and has recovered," the Windows operating system has reset the display driver.

There are three different possible debugging configurations:

  • Local debugging with a single GPU,
  • Local debugging with multiple GPUs, or
  • Remote debugging.

Choose the one that most closely reflects your NVIDIA Nsight setup:

Local Debugging with a Single GPU

Disabling TDR removes a valuable layer of protection, so it is generally recommended that you keep it enabled.

However, setting the TDR delay too low can cause the debugger to fail for one of two reasons:

  • Debugging on some GPUs fails with a TDR delay of less than 10 seconds.
  • Debug builds of CUDA kernels run more slowly and may intrinsically require additional time to complete. If the TDR delay is too low, the kernels may not have enough time to complete.

Therefore, if you are using local debugging with a single GPU, it's recommended that you leave TDR enabled and set the delay to 10 seconds.

To enable TDR and change the delay, do the following:

  1. Right-click the Nsight Monitor icon in the system tray.
  2. Select Options.
  3. In the Options window on the General tab, set WDDM TDR enabled to True.
     Then change the WDDM TDR Delay from the default setting to 10.
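
For reference, Microsoft documents the TDR settings as the registry values TdrLevel and TdrDelay under the GraphicsDrivers key. The following minimal sketch only builds the text of a .reg file for the recommended single-GPU settings (TDR recovery on, 10-second delay); it does not touch the registry. That the Nsight Monitor options correspond to exactly these values is an assumption, and the function name tdr_reg_file is hypothetical.

```python
# Registry key that Microsoft documents for TDR settings.
GRAPHICS_DRIVERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
)

# Documented TdrLevel values: 0 disables detection, 3 recovers on timeout (default).
TDR_LEVEL_OFF = 0
TDR_LEVEL_RECOVER = 3


def tdr_reg_file(tdr_level: int, tdr_delay_seconds: int) -> str:
    """Return .reg file text that sets TdrLevel and TdrDelay (hypothetical helper)."""
    for value in (tdr_level, tdr_delay_seconds):
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("registry DWORD values must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        f'"TdrLevel"=dword:{tdr_level:08x}',
        f'"TdrDelay"=dword:{tdr_delay_seconds:08x}',
    ]
    return "\r\n".join(lines) + "\r\n"


# Assumed mapping: "WDDM TDR enabled = True" -> TdrLevel 3, "WDDM TDR Delay = 10" -> TdrDelay 10.
single_gpu_reg = tdr_reg_file(TDR_LEVEL_RECOVER, 10)
```

Changes to these registry values normally require administrator rights and a restart before they take effect, so the Nsight Monitor options remain the simpler route.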

Local Debugging with Multiple GPUs or Remote Debugging

When using either a local debugging configuration with multiple GPUs or a remote debugging configuration, it's important to disable TDR. With most CUDA applications, any debugging operation attempted after a TDR will fail: you will not be able to step, set breakpoints, or view variables. The application will receive a grid launch failure, and the CUcontext will begin to report errors.

Having TDR enabled can interfere with GPU debugging because the graphics card is perceived by the operating system as unresponsive when the execution of a target application is paused or when the debugger is performing certain operations.
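
The interaction between a paused debug session and the timeout can be illustrated with a deliberately simplified model. This sketch assumes the operating system compares the time the GPU has been unresponsive against the TDR delay; the real Windows logic is more involved, and the function name would_trigger_tdr is hypothetical.

```python
DEFAULT_TDR_DELAY_SECONDS = 2.0  # default delay stated above


def would_trigger_tdr(
    unresponsive_seconds: float,
    tdr_enabled: bool = True,
    tdr_delay_seconds: float = DEFAULT_TDR_DELAY_SECONDS,
) -> bool:
    """Simplified model: reset the GPU if it stays unresponsive past the delay."""
    return tdr_enabled and unresponsive_seconds > tdr_delay_seconds


# A kernel halted at a breakpoint for 30 seconds looks unresponsive to Windows.
breakpoint_pause = 30.0
reset_with_tdr = would_trigger_tdr(breakpoint_pause)                        # True
reset_without_tdr = would_trigger_tdr(breakpoint_pause, tdr_enabled=False)  # False
```

In this model, a pause at a breakpoint exceeds any practical delay, which is why these configurations disable TDR rather than lengthen the delay.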

To disable TDR, do the following:

  1. Right-click the Nsight Monitor icon in the system tray.
  2. Select Options.
  3. In the Options window on the General tab, set WDDM TDR enabled to False.
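
The recommendations for the three debugging configurations can be summarized in a small lookup. This is a sketch of the guidance on this page, not part of NVIDIA Nsight; the names TdrSettings and recommended_tdr_settings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TdrSettings:
    enabled: bool
    delay_seconds: Optional[int]  # None when TDR is disabled


def recommended_tdr_settings(remote: bool, gpu_count: int) -> TdrSettings:
    """Return the TDR setting this page recommends for a debugging setup."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    if remote or gpu_count > 1:
        # Remote debugging or local debugging with multiple GPUs: disable TDR.
        return TdrSettings(enabled=False, delay_seconds=None)
    # Local debugging with a single GPU: keep TDR enabled with a 10-second delay.
    return TdrSettings(enabled=True, delay_seconds=10)
```

For example, recommended_tdr_settings(remote=False, gpu_count=1) yields an enabled setting with a 10-second delay, while any remote or multi-GPU setup yields a disabled one.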

For more information about TDR, see:

http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
