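Note: apart from the examples and usage notes, which I wrote myself, much of this post is taken directly from Core Python Programming. The book's style should be recognizable, so I have not marked the source of each individual passage.
Threads (sometimes called lightweight processes) are similar to processes, except that all threads run inside the same process and share the same execution environment. They can be thought of as "mini-processes" running in parallel within the main process, or "main thread".
A thread has a beginning, an execution sequence, and an end. It has its own instruction pointer that records where it currently is in its execution. A running thread can be preempted (interrupted) or temporarily suspended (also called sleeping) so that other threads can run; this is called yielding. All threads in a process share the same data space, so they can share data and communicate with each other more easily than separate processes can. Threads generally execute concurrently, and it is this concurrency and data sharing that makes cooperation between multiple tasks possible. On a single-CPU system, however, true parallelism is impossible: each thread is scheduled to run for a short while and then gives up the CPU so that other threads can run. Over the lifetime of the process, each thread does its own job and shares its results with other threads when needed.
Of course, this sharing is not without danger. If multiple threads access the same data, the result can be inconsistent depending on the order of access. This is called a race condition. Fortunately, most thread libraries provide a set of synchronization primitives to control thread execution and data access. Another thing to watch for: some functions block until they complete, and unless they have been specifically modified for multithreading, such "greedy" functions skew the allocation of CPU time, so the running time each thread receives may be unequal and unfair.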
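As a minimal sketch of the race condition described above and of how a synchronization primitive avoids it, the following uses a `threading.Lock` to protect a shared counter. It assumes Python 3; the names `increment_many` and `run_counter` are made up for illustration.

```python
# Assumes Python 3. Function names are illustrative only.
import threading


def increment_many(counter, lock, n):
    """Add 1 to counter['value'] n times, holding the lock for each update."""
    for _ in range(n):
        with lock:  # only one thread at a time performs the read-modify-write
            counter["value"] += 1


def run_counter(num_threads=4, n=10000):
    """Run several threads that all update the same counter."""
    counter = {"value": 0}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=increment_many, args=(counter, lock, n))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(run_counter())
```

Without the lock, the `+=` is a separate read and write, so updates from different threads can in principle overwrite each other.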
For a definition of multithreading, see Wikipedia. For the advantages and disadvantages of multithreading, see Baidu Baike.
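Global Interpreter Lock (GIL)
Any discussion of Python multithreading has to mention the global interpreter lock (GIL). Execution of Python code is controlled by the Python virtual machine (also called the interpreter main loop). Python was designed so that only one thread executes in this main loop at a time, just as a single-CPU system can hold many programs in memory while only one of them runs on the CPU at any given moment. Likewise, although the Python interpreter can "run" multiple threads, only one thread is executing in the interpreter at any given moment.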
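In a multithreaded environment, the Python virtual machine executes as follows:
1. Set the GIL
2. Switch to a thread to run
3. Execute either:
a. a specified number of bytecode instructions, or
b. until the thread voluntarily yields control (it can call time.sleep(0))
4. Put the thread back to sleep
5. Unlock the GIL
6. Repeat all of the above steps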
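The steps above describe the interpreter the book was written for. In Python 3.2 and later, step 3a is governed by a time interval instead of a bytecode count. The sketch below, which assumes Python 3, only reads that interval with `sys.getswitchinterval()`.

```python
# Assumes Python 3.2+, where thread switching is time-based.
import sys

# How long (in seconds) a thread may run before the interpreter
# asks it to give other threads a chance to take the GIL.
interval = sys.getswitchinterval()
print("switch interval: %.3f s" % interval)
```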
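When external code (such as a C/C++ extension function) is called, the GIL stays locked until that function returns (no Python bytecode is executed in the meantime, so no thread switch takes place). Programmers writing extensions can release the GIL explicitly. As a Python developer, however, you do not need to worry about your Python code being locked out in these situations.
For example, in any I/O-oriented routine (one that invokes built-in operating system C code), the GIL is released before the I/O call so that other threads can run while this thread waits for I/O. A thread that does little I/O will hold the processor (and the GIL) for its entire time slice. In other words, I/O-bound Python programs benefit far more from a multithreaded environment than CPU-bound ones.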
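To illustrate why blocking calls overlap, here is a small sketch that uses `time.sleep` as a stand-in for an I/O wait (an assumption: real I/O calls release the GIL in the same way). It assumes Python 3; `fake_io` and `timed_run` are hypothetical names.

```python
# Assumes Python 3. time.sleep stands in for a blocking I/O call.
import threading
import time


def fake_io(seconds):
    """Pretend to wait for I/O; sleeping does not hold the GIL."""
    time.sleep(seconds)


def timed_run(num_threads=4, seconds=0.2):
    """Start num_threads waiting threads and return the total elapsed time."""
    threads = [
        threading.Thread(target=fake_io, args=(seconds,))
        for _ in range(num_threads)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


if __name__ == "__main__":
    print("elapsed: %.2f s" % timed_run())
```

Because the waits overlap, the total should be close to one wait, not to the sum of all of them.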
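Python Multithreading Modules
Python provides several modules for multithreaded programming, including thread, threading, and Queue (in Python 3 the first and last are named _thread and queue). The thread and threading modules let the programmer create and manage threads. The thread module provides basic thread and lock support, while threading provides higher-level, more powerful thread management. The Queue module lets you create a queue data structure that can be shared between multiple threads. We will look at each of these modules in turn, with some examples and a medium-sized application.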
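As a rough sketch of how a queue can be shared between threads, the following hands work to a few worker threads and collects the results. It assumes Python 3, where the module is imported as `queue`; the names `worker` and `square_all` and the `None` sentinel convention are illustrative choices.

```python
# Assumes Python 3 (module name is queue, not Queue).
import queue
import threading


def worker(tasks, results):
    """Take numbers from tasks until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            break
        results.put(item * item)


def square_all(numbers, num_workers=3):
    """Square every number in the list using a pool of worker threads."""
    tasks, results = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return sorted(results.get() for _ in numbers)


if __name__ == "__main__":
    print(square_all([1, 2, 3, 4]))
```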
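Why Not Use thread
Core tip: avoid the thread module. We do not recommend using the thread module, for the following reasons.
1. The higher-level threading module is more advanced and has more complete thread support, and using attributes of the thread module may conflict with threading.
2. The low-level thread module has very few synchronization primitives (in fact only one), while threading has many.
3. With thread you have no control over when your process should exit: when the main thread ends, all other threads are killed, without warning and without proper cleanup.
4. The thread module is recommended only for experienced experts who need access to the low-level thread structures.
5. The thread module does not support daemon threads. When the main thread exits, all child threads are forcibly terminated, whether or not they are still working.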
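Daemon Threads
The threading module supports daemon threads, which work like this: a daemon thread is typically a server that waits for client requests; if no client makes a request, it simply waits. Marking a thread as a daemon says that the thread is not important: when the process exits, it does not need to wait for this thread to finish.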
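If your main thread should exit without waiting for its child threads to finish, set the daemon flag on those threads. That is, before starting the thread (calling thread.start()), call setDaemon() to set its daemon flag (thread.setDaemon(True)), which marks the thread as "not important". If you do want to wait for a child thread to finish before exiting, do nothing, or explicitly call thread.setDaemon(False) to make sure its daemon flag is False. You can call thread.isDaemon() to check the value of the daemon flag. A new child thread inherits the daemon flag of its parent thread. The whole Python program exits only after all non-daemon threads have exited, that is, when no non-daemon threads remain in the process.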
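A short sketch of the daemon flag and its inheritance follows. It assumes Python 3, where the `daemon` attribute is the preferred spelling of setDaemon()/isDaemon(); `parent_body` and `daemon_flags` are hypothetical names.

```python
# Assumes Python 3; uses the daemon attribute instead of setDaemon()/isDaemon().
import threading


def parent_body(seen):
    """Runs inside the parent thread and creates a child thread there."""
    child = threading.Thread(target=lambda: None)
    seen["child_daemon"] = child.daemon  # inherited from the creating thread
    child.start()
    child.join()


def daemon_flags():
    """Return the daemon flags of a daemon parent and the child it creates."""
    seen = {}
    parent = threading.Thread(target=parent_body, args=(seen,))
    parent.daemon = True  # must be set before start()
    seen["parent_daemon"] = parent.daemon
    parent.start()
    parent.join()
    return seen


if __name__ == "__main__":
    print(daemon_flags())
```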
Threading FUNCTIONS
NAME
threading - Thread module emulating a subset of Java's threading model.
class Thread(_Verbose)
An object that represents a thread of execution.
| A class that represents a thread of control. This class can be safely subclassed in a limited fashion.
| Methods defined here:
|
| __init__(self, group=None, target=None, name=None, args=(), kwargs=None, verbose=None)
| This constructor should always be called with keyword arguments. Arguments are:
| *group* should be None; reserved for future extension when a ThreadGroup class is implemented.
| *target* is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.
| *name* is the thread name. By default, a unique name is constructed of the form "Thread-N" where N is a small decimal number.
| *args* is the argument tuple for the target invocation. Defaults to ().
| *kwargs* is a dictionary of keyword arguments for the target invocation. Defaults to {}.
| If a subclass overrides the constructor, it must make sure to invoke the base class constructor (Thread.__init__()) before doing anything else to the thread.
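The constructor rules above can be sketched in two ways: passing `target`, `args`, and `kwargs` directly, and subclassing with the base constructor called first. This assumes Python 3; `SquareThread` and `target_demo` are illustrative names.

```python
# Assumes Python 3. Class and function names are illustrative only.
import threading


class SquareThread(threading.Thread):
    """A Thread subclass that stores the square of a number."""

    def __init__(self, number):
        # Call the base constructor before touching the thread in any other way.
        super().__init__(name="square-%d" % number)
        self.number = number
        self.result = None

    def run(self):
        self.result = self.number ** 2


def target_demo():
    """Pass positional and keyword arguments to a target callable."""
    out = []

    def add(a, b=0):
        out.append(a + b)

    t = threading.Thread(target=add, args=(1,), kwargs={"b": 2})
    t.start()
    t.join()
    return out


if __name__ == "__main__":
    st = SquareThread(7)
    st.start()
    st.join()
    print(st.name, st.result, target_demo())
```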
| __repr__(self)
getName(self)
Return the thread's name.
isAlive(self)
Return whether the thread is alive. This method returns True just before the run() method starts until just after the run() method terminates. The module function enumerate() returns a list of all alive threads.
isDaemon(self)
Return the thread's daemon flag.
is_alive = isAlive(self)
join(self, timeout=None)
Suspend the caller until the thread finishes; if timeout is given, block for at most timeout seconds.
Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates -- either normally or through an unhandled exception or until the optional timeout occurs.
When the timeout argument is present and not None, it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). As join() always returns None, you must call isAlive() after join() to decide whether a timeout happened -- if the thread is still alive, the join() call timed out.
When the timeout argument is not present or None, the operation will block until the thread terminates.
A thread can be join()ed many times.
join() raises a RuntimeError if an attempt is made to join the current thread as that would cause a deadlock. It is also an error to join() a thread before it has been started and attempts to do so raises the same exception.
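A small sketch of the join() behavior described above: detecting a timeout by checking liveness afterwards, and the error raised when joining an unstarted thread. It assumes Python 3, where the method is spelled `is_alive()`; the function names are made up for illustration.

```python
# Assumes Python 3 (is_alive() instead of isAlive()).
import threading
import time


def join_with_timeout(work_seconds, timeout):
    """Return True if join(timeout) returned while the thread was still running."""
    t = threading.Thread(target=time.sleep, args=(work_seconds,))
    t.start()
    t.join(timeout)           # always returns None
    timed_out = t.is_alive()  # still alive means the join timed out
    t.join()                  # a thread can be joined many times
    return timed_out


def join_unstarted():
    """Return True if joining a thread that was never started raises RuntimeError."""
    t = threading.Thread(target=lambda: None)
    try:
        t.join()
    except RuntimeError:
        return True
    return False


if __name__ == "__main__":
    print(join_with_timeout(0.3, 0.05), join_with_timeout(0.01, 1.0))
    print(join_unstarted())
```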
run(self)
The function that defines what the thread does (usually overridden in a subclass; in practice it is overridden very often).
Method representing the thread's activity.
You may override this method in a subclass. The standard run() method invokes the callable object passed to the object's constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
setDaemon(self, daemonic)
Set the thread's daemon flag to daemonic (this must be called before start()).
setName(self, name)
Set the thread's name.
start(self)
Start the thread's execution.
Start the thread's activity.
It must be called at most once per thread object. It arranges for the object's run() method to be invoked in a separate thread of control.
This method will raise a RuntimeError if called more than once on the same thread object.
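The "at most once" rule for start() can be checked with a few lines. This sketch assumes Python 3; `start_twice` is a hypothetical helper name.

```python
# Assumes Python 3. start_twice is an illustrative helper.
import threading


def start_twice():
    """Start the same thread object twice; return the error message, if any."""
    t = threading.Thread(target=lambda: None)
    t.start()
    t.join()
    try:
        t.start()  # second start on the same object
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(start_twice())
```

To run the same work again, create a new Thread object.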
FUNCTIONS
BoundedSemaphore(*args, **kwargs)
Like Semaphore, except that its value is never allowed to exceed the initial value.
A factory function that returns a new bounded semaphore.
A bounded semaphore checks to make sure its current value doesn't exceed its initial value. If it does, ValueError is raised. In most situations semaphores are used to guard resources with limited capacity.
If the semaphore is released too many times it's a sign of a bug.
Like regular semaphores, bounded semaphores manage a counter representing the number of release() calls minus the number of acquire() calls, plus an initial value. The acquire() method blocks if necessary until it can return without making the counter negative. If not given, value defaults to 1.
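A minimal sketch of the over-release check, assuming Python 3; `over_release` is an illustrative name.

```python
# Assumes Python 3. over_release is an illustrative helper.
import threading


def over_release():
    """Return True if releasing more often than acquiring raises ValueError."""
    sem = threading.BoundedSemaphore(2)
    sem.acquire()
    sem.release()      # counter is back at its initial value of 2
    try:
        sem.release()  # one release too many
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    print(over_release())
```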
Condition(*args, **kwargs)
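A condition variable object lets a thread stop and wait until another thread satisfies some "condition", such as a change of state or of a value.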
Factory function that returns a new condition variable object.
A condition variable allows one or more threads to wait until they are notified by another thread.
If the lock argument is given and not None, it must be a Lock or RLock object, and it is used as the underlying lock. Otherwise, a new RLock object is created and used as the underlying lock.
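As a sketch of waiting on a condition variable, the consumer below waits until the producer has added an item. It assumes Python 3; `condition_demo` and the inner function names are illustrative.

```python
# Assumes Python 3. Names are illustrative only.
import threading


def condition_demo():
    """One thread waits on a condition until another thread supplies an item."""
    cond = threading.Condition()
    items = []
    received = []

    def consumer():
        with cond:
            while not items:  # re-check the predicate after every wakeup
                cond.wait()
            received.append(items.pop())

    def producer():
        with cond:
            items.append("ready")
            cond.notify()     # wake one waiting thread

    c = threading.Thread(target=consumer)
    p = threading.Thread(target=producer)
    c.start()
    p.start()
    c.join()
    p.join()
    return received


if __name__ == "__main__":
    print(condition_demo())
```

The `while not items` loop makes the result independent of which thread happens to run first.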
Event(*args, **kwargs)
A general form of condition variable. Multiple threads can wait for an event to occur, and all of them are awakened once it does.
A factory function that returns a new event.
Events manage a flag that can be set to true with the set() method and reset to false with the clear() method. The wait() method blocks until the flag is true.
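A brief sketch of several threads blocking on one Event and all being released by a single set() call. It assumes Python 3; `event_demo` and `waiter` are made-up names.

```python
# Assumes Python 3. Names are illustrative only.
import threading


def event_demo(num_waiters=3):
    """Start waiters that block on an Event, then release them all at once."""
    ready = threading.Event()
    woken = []
    lock = threading.Lock()

    def waiter(i):
        ready.wait()       # blocks until the flag becomes true
        with lock:
            woken.append(i)

    threads = [threading.Thread(target=waiter, args=(i,)) for i in range(num_waiters)]
    for t in threads:
        t.start()
    ready.set()            # sets the flag and wakes every waiter
    for t in threads:
        t.join()
    return sorted(woken)


if __name__ == "__main__":
    print(event_demo())
```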
Lock = allocate_lock(...)
Lock primitive object (the same as the lock object in the thread module).
allocate_lock() -> lock object
(allocate() is an obsolete synonym)
Create a new lock object. See help(LockType) for information about locks.
RLock(*args, **kwargs)
Reentrant lock object. It lets a single thread acquire again a lock it already holds (recursive locking).
Factory function that returns a new reentrant lock.
A reentrant lock must be released by the thread that acquired it. Once a thread has acquired a reentrant lock, the same thread may acquire it again without blocking; the thread must release it once for each time it has acquired it.
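The difference between a reentrant lock and a plain lock can be sketched as follows, assuming Python 3; both function names are illustrative.

```python
# Assumes Python 3. Names are illustrative only.
import threading


def reentrant_demo():
    """Acquire the same RLock recursively from a single thread."""
    rlock = threading.RLock()
    depth = []

    def recurse(n):
        with rlock:  # re-acquiring in the same thread does not block
            depth.append(n)
            if n > 0:
                recurse(n - 1)

    recurse(3)
    return depth


def plain_lock_second_acquire():
    """A plain Lock cannot be acquired again while it is already held."""
    lock = threading.Lock()
    lock.acquire()
    second = lock.acquire(blocking=False)  # non-blocking attempt
    lock.release()
    return second


if __name__ == "__main__":
    print(reentrant_demo(), plain_lock_second_acquire())
```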
Semaphore(*args, **kwargs)
Provides a "waiting room"-like structure for threads waiting on a lock.
A factory function that returns a new semaphore.
Semaphores manage a counter representing the number of release() calls minus the number of acquire() calls, plus an initial value. The acquire() method blocks if necessary until it can return without making the counter negative. If not given, value defaults to 1.
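One common use of a semaphore is to cap how many threads run a section at the same time. The sketch below records the peak number of threads inside the guarded section; it assumes Python 3, and `peak_concurrency` is a hypothetical name.

```python
# Assumes Python 3. peak_concurrency is an illustrative helper.
import threading
import time


def peak_concurrency(limit=2, num_threads=6):
    """Return the largest number of threads seen inside the guarded section."""
    sem = threading.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    state_lock = threading.Lock()

    def worker():
        with sem:  # at most `limit` threads get past this point at once
            with state_lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.02)  # simulated work
            with state_lock:
                state["active"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print(peak_concurrency())
```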
Timer(*args, **kwargs)
Like Thread, except that it waits for a period of time before it starts running.
Factory function to create a Timer object.
Timers call a function after a specified number of seconds:
t = Timer(30.0, f, args=[], kwargs={})
t.start()
t.cancel() # stop the timer's action if it's still waiting
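A runnable variant of the Timer snippet above, with short delays so that it finishes quickly. It assumes Python 3; `timer_demo` is an illustrative name.

```python
# Assumes Python 3. timer_demo is an illustrative helper.
import threading


def timer_demo():
    """Start two timers, cancel the slow one, and report which actions ran."""
    fired = []
    t1 = threading.Timer(0.05, fired.append, args=["ran"])
    t2 = threading.Timer(5.0, fired.append, args=["cancelled"])
    t1.start()
    t2.start()
    t2.cancel()  # t2 is still waiting, so its action never runs
    t1.join()
    t2.join()    # returns promptly because the timer was cancelled
    return fired


if __name__ == "__main__":
    print(timer_demo())
```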
activeCount()
Return the number of Thread objects currently alive.
The returned count is equal to the length of the list returned by enumerate().
active_count = activeCount()
currentThread()
Return the current Thread object, corresponding to the caller's thread of control.
If the caller's thread of control was not created through the threading module, a dummy thread object with limited functionality is returned.
current_thread = currentThread()
enumerate()
Return a list of all Thread objects currently alive.
The list includes daemonic threads, dummy thread objects created by current_thread(), and the main thread. It excludes terminated threads and threads that have not yet been started.
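The introspection functions above can be tried together in a short sketch. It assumes Python 3 and uses the underscore spellings (`active_count`, `current_thread`); `snapshot` and the thread names are made up for illustration.

```python
# Assumes Python 3 (active_count / current_thread spellings).
import threading


def snapshot():
    """Return the names of alive threads while one helper thread is running."""
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait, name="helper")
    # Created but never started, so enumerate() should not list it.
    threading.Thread(target=lambda: None, name="never-started")
    helper.start()
    names = [th.name for th in threading.enumerate()]
    count_matches = threading.active_count() == len(threading.enumerate())
    stop.set()
    helper.join()
    return names, count_matches, threading.current_thread().name


if __name__ == "__main__":
    print(snapshot())
```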
setprofile(func)
Set a profile function for all threads started from the threading module.
The func will be passed to sys.setprofile() for each thread, before its run() method is called.
settrace(func)
Set a trace function for all threads started from the threading module.
The func will be passed to sys.settrace() for each thread, before its run() method is called.
stack_size(...)
stack_size([size]) -> size
Return the thread stack size used when creating new threads. The optional size argument specifies the stack size (in bytes) to be used for subsequently created threads, and must be 0 (use platform or configured default) or a positive integer value of at least 32,768 (32k). If changing the thread stack size is unsupported, a ThreadError exception is raised. If the specified size is invalid, a ValueError exception is raised, and the stack size is unmodified. 32k bytes is currently the minimum supported stack size value to guarantee sufficient stack space for the interpreter itself.
Note that some platforms may have particular restrictions on values for the stack size, such as requiring a minimum stack size larger than 32kB or requiring allocation in multiples of the system memory page size - platform documentation should be referred to for more information (4kB pages are common; using multiples of 4096 for the stack size is the suggested approach in the absence of more specific information).
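A cautious sketch of reading and changing the stack size follows. It assumes Python 3, where an unsupported change raises RuntimeError instead of ThreadError; the 256 KiB value and the name `stack_size_demo` are arbitrary illustrative choices.

```python
# Assumes Python 3 (RuntimeError instead of ThreadError when unsupported).
import threading

NEW_SIZE = 256 * 1024  # arbitrary example value, a multiple of 4096


def stack_size_demo():
    """Try to change the stack size for new threads, then restore it."""
    original = threading.stack_size()  # 0 means the platform default
    try:
        threading.stack_size(NEW_SIZE)
    except (ValueError, RuntimeError):
        return original, None          # platform rejected the change
    changed = threading.stack_size()
    threading.stack_size(original)     # restore the previous setting
    return original, changed


if __name__ == "__main__":
    print(stack_size_demo())
```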
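Simple Examples
I plan to write a few small examples, all based on problems I have actually run into. Updated from time to time.
Use in a GUI
This is a problem I ran into earlier: how do you show, in real time, the messages you would normally print in a TextBrowser in the GUI? The answer is threading! Here is the code: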
    # -*- coding: utf-8 -*-
    import sys
    import time
    import datetime

    from PyQt4 import QtGui, QtCore


    class Main(QtGui.QWidget):
        def __init__(self, parent=None):
            super(Main, self).__init__(parent)
            self.init_ui()

        def init_ui(self):
            # Renamed from layout() so that QWidget.layout() is not shadowed
            self.setWindowTitle(u'The_Third_Wave: a simple threading tutorial')

            self.text_area_l = QtGui.QTextBrowser()
            self.thread_start_l = QtGui.QPushButton('Start Left')
            self.thread_start_l.clicked.connect(self.start_threads_l)
            self.text_area_r = QtGui.QTextBrowser()
            self.thread_start_r = QtGui.QPushButton('Start Right')
            self.thread_start_r.clicked.connect(self.start_threads_r)

            lef_layout = QtGui.QVBoxLayout()
            lef_layout.addWidget(self.text_area_l)
            button_layout = QtGui.QVBoxLayout()
            button_layout.addWidget(self.thread_start_l)
            button_layout.addWidget(self.thread_start_r)
            right_layout = QtGui.QVBoxLayout()
            right_layout.addWidget(self.text_area_r)

            grid_layout = QtGui.QGridLayout()
            grid_layout.addLayout(lef_layout, 0, 0)
            grid_layout.addLayout(button_layout, 0, 1)
            grid_layout.addLayout(right_layout, 0, 2)
            self.setLayout(grid_layout)

        def start_threads_l(self):
            thread = MyThread_l(self)                   # create the thread
            thread.trigger.connect(self.update_text_l)  # connect the signal!
            thread.setup(range(0, 10))                  # pass arguments
            thread.start()                              # start the thread

        def update_text_l(self, message):
            self.text_area_l.insertPlainText(message)

        def start_threads_r(self):
            thread = MyThread_r(self)                   # create the thread
            thread.trigger.connect(self.update_text_r)  # connect the signal!
            thread.setup(range(10, 20))                 # pass arguments
            thread.start()                              # start the thread

        def update_text_r(self, message):
            self.text_area_r.insertPlainText(message)


    class MyThread_l(QtCore.QThread):
        trigger = QtCore.pyqtSignal(str)  # the signal carries a string

        def __init__(self, parent=None):
            super(MyThread_l, self).__init__(parent)

        def setup(self, args):
            self.args = args

        def run(self):
            # run() usually has to be overridden; it is called automatically after start()
            self.my_function()

        def my_function(self):
            # Replace every print in the code with trigger.emit
            # print u"Thread started!"
            self.trigger.emit(u"Left thread started!\n")
            for i in self.args:
                time.sleep(3)
                self.trigger.emit(u"Current value: " + str(i) + "----" + str(datetime.datetime.now()) + "\n")
            self.trigger.emit(u"Left thread finished!\n")


    class MyThread_r(QtCore.QThread):
        trigger = QtCore.pyqtSignal(str)  # the signal carries a string

        def __init__(self, parent=None):
            super(MyThread_r, self).__init__(parent)

        def setup(self, args):
            self.args = args

        def run(self):
            # run() usually has to be overridden; it is called automatically after start()
            self.my_function()

        def my_function(self):
            # Replace every print in the code with trigger.emit
            self.trigger.emit(u"Right thread started!\n")
            for i in self.args:
                time.sleep(3)
                self.trigger.emit(u"Current value: " + str(i) + "----" + str(datetime.datetime.now()) + "\n")
            self.trigger.emit(u"Right thread finished!\n")


    if __name__ == "__main__":
        app = QtGui.QApplication(sys.argv)
        mainwindow = Main()
        mainwindow.show()
        sys.exit(app.exec_())
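The result: clicking each button starts a thread whose messages appear in the corresponding text browser one by one, while the window stays responsive.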
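The well-known complaint about Python multithreading is that threads never truly run in parallel. In GUI programs, however, you cannot do without threads: some operations are simply slow, and without a separate thread the GUI freezes while they run, which makes for a terrible user experience.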
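Multithreaded Crawler
To be continued... Updated from time to time. Questions are welcome; let's learn and improve together.
This article is an original work by @The_Third_Wave (blog: http://blog.csdn.net/zhanh1218). It is updated from time to time; please point out any errors.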