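On videobuf, that is, how V4L2 exchanges buffers with user space at high performance: this is probably the hardest part of V4L2 to understand.
See the kernel documentation: kernel/Documentation/video4linux/videobuf.
The videobuf layer acts as glue between a V4L2 driver and user space. It allocates and manages the buffers that store video frames. One set of functions can be used to implement many of the standard POSIX I/O system calls, including read(), poll(), and mmap(). Another set implements the V4L2 ioctl() calls related to streaming I/O, including buffer allocation, queueing, dequeueing, and streaming control.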
Buffer types
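Not all video devices use the same kind of buffer. In fact, there are (at least) three common variations:
- Buffers which are scattered in both the physical and (kernel) virtual address spaces. (Almost) all user-space buffers are like this, and it makes good sense to allocate kernel-space buffers this way as well when possible. Unfortunately, it is not always possible; working with this kind of buffer normally requires hardware which can do scatter/gather DMA operations.
- Buffers which are physically scattered but virtually contiguous; in other words, buffers allocated with vmalloc(). These buffers are just as hard to use for DMA, but they can be useful on systems where DMA is not available.
- Buffers which are physically contiguous. Allocating this kind of buffer can be unreliable on a fragmented system, but simpler DMA controllers cannot deal with anything else.
Videobuf can work with all three types of buffer, but the driver author must pick one at the outset.
[It is worth noting that there is a fourth kind of buffer: "overlay" buffers, which are located within the system's video memory. The overlay functionality is considered deprecated for most uses, but it still shows up occasionally in system-on-chip drivers where the performance benefits merit the technique. Overlay buffers can be handled as a form of scattered buffer, but there are very few implementations in the kernel, and a description of this technique is currently beyond the scope of this document.]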
Data structures, callbacks, and initialization
Depending on which type of buffer is being used, the driver should include the corresponding header file:
    <media/videobuf-dma-sg.h>        /* Physically scattered */
    <media/videobuf-vmalloc.h>       /* vmalloc() buffers */
    <media/videobuf-dma-contig.h>    /* Physically contiguous */
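The driver's data structure describing a V4L2 device should include a struct videobuf_queue instance for managing the buffer queue, along with a list_head for the queue of available buffers. It also needs an interrupt-safe spinlock to protect (at least) the queue.
The next step is to write four simple callbacks to help videobuf deal with the management of buffers: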
    struct videobuf_queue_ops {
        int (*buf_setup)(struct videobuf_queue *q,
                         unsigned int *count, unsigned int *size);
        int (*buf_prepare)(struct videobuf_queue *q,
                           struct videobuf_buffer *vb,
                           enum v4l2_field field);
        void (*buf_queue)(struct videobuf_queue *q,
                          struct videobuf_buffer *vb);
        void (*buf_release)(struct videobuf_queue *q,
                            struct videobuf_buffer *vb);
    };
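buf_setup() is called early in the I/O process, when streaming is being initiated; its purpose is to tell videobuf about the I/O stream. The count parameter is a suggested number of buffers to use; the driver should check it for rationality and adjust it if need be. As a practical rule, proper streaming needs at least two buffers, and the maximum cannot exceed 32. The size parameter should be set to the expected (maximum) size of each frame of data.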
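The count and size handling described above can be modelled in plain user-space C. This is only a sketch under stated assumptions: the setup_format structure, the setup_buf_setup() name, and the bytes-per-pixel size calculation are hypothetical and are not part of videobuf; only the 2 to 32 range comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Limits taken from the text: at least 2 buffers, at most 32. */
#define SETUP_MIN_BUFFERS 2u
#define SETUP_MAX_BUFFERS 32u

/* Hypothetical capture format, standing in for driver state. */
struct setup_format {
    unsigned int width;
    unsigned int height;
    unsigned int bytes_per_pixel;
};

/*
 * User-space model of a buf_setup()-style callback: clamp the suggested
 * buffer count and report the maximum frame size.
 * Returns 0 on success, -1 if the format is unusable.
 */
static int setup_buf_setup(const struct setup_format *fmt,
                           unsigned int *count, unsigned int *size)
{
    assert(fmt != NULL && count != NULL && size != NULL);

    if (fmt->width == 0 || fmt->height == 0 || fmt->bytes_per_pixel == 0)
        return -1;

    /* Assumed packed format: one frame is width * height * bpp bytes. */
    *size = fmt->width * fmt->height * fmt->bytes_per_pixel;

    if (*count < SETUP_MIN_BUFFERS)
        *count = SETUP_MIN_BUFFERS;
    if (*count > SETUP_MAX_BUFFERS)
        *count = SETUP_MAX_BUFFERS;
    return 0;
}
```

A real driver would typically pick a device-specific maximum below 32; the model only enforces the outer bounds.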
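Each buffer (in the form of a struct videobuf_buffer pointer) will be passed to buf_prepare(), which should set the buffer's size, width, height, and field members properly. If the buffer's state member is VIDEOBUF_NEEDS_INIT, the driver should pass it to: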
int videobuf_iolock(struct videobuf_queue* q, struct videobuf_buffer *vb, struct v4l2_framebuffer *fbuf);
Among other things, this call will usually allocate memory for the buffer. Finally, the buf_prepare() function should set the buffer's state to VIDEOBUF_PREPARED.
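The state handling in buf_prepare() can be illustrated with a small user-space model. Everything here is an assumption made for illustration: the prep_* names are hypothetical, prep_iolock() merely calls malloc() in place of videobuf_iolock(), and the enum only mirrors the state names mentioned in this document.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the videobuf states named in the text. */
enum prep_state {
    PREP_NEEDS_INIT,
    PREP_PREPARED,
    PREP_QUEUED,
    PREP_ACTIVE,
    PREP_DONE
};

struct prep_buffer {
    enum prep_state state;
    unsigned int width, height;
    size_t size;
    void *mem;              /* backing memory, allocated on first prepare */
};

/* Stand-in for videobuf_iolock(): in this model it only allocates memory. */
static int prep_iolock(struct prep_buffer *vb)
{
    vb->mem = malloc(vb->size);
    return vb->mem ? 0 : -1;
}

/*
 * Model of buf_prepare(): fill in the geometry, initialise the buffer if
 * it has never been set up, then mark it prepared.
 * Note: this sketch does not handle a format change on an already
 * initialised buffer; a real driver would have to release it first.
 */
static int prep_buf_prepare(struct prep_buffer *vb,
                            unsigned int width, unsigned int height,
                            unsigned int bytes_per_pixel)
{
    assert(vb != NULL);

    vb->width = width;
    vb->height = height;
    vb->size = (size_t)width * height * bytes_per_pixel;

    if (vb->state == PREP_NEEDS_INIT) {
        if (prep_iolock(vb) != 0)
            return -1;      /* state stays PREP_NEEDS_INIT on failure */
    }
    vb->state = PREP_PREPARED;
    return 0;
}
```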
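When a buffer is queued for I/O, it is passed to buf_queue(), which should put it onto the driver's list of available buffers and set its state to VIDEOBUF_QUEUED. Note that this function is called with the queue spinlock already held, so it must not try to acquire that lock again. Note also that videobuf may wait on the first buffer in the queue; placing other buffers in front of it could stall things again. So use list_add_tail() to enqueue buffers.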
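Why tail insertion matters can be seen in a user-space model of the available-buffer list. This is a sketch, not kernel code: fifo_node re-implements the behaviour of struct list_head and list_add_tail() for illustration, and the fifo_* names and the queued flag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on struct list_head. */
struct fifo_node {
    struct fifo_node *next, *prev;
};

static void fifo_init(struct fifo_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Same semantics as list_add_tail(): insert just before head,
 * which is the end of the queue. */
static void fifo_add_tail(struct fifo_node *item, struct fifo_node *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

struct fifo_buffer {
    struct fifo_node queue; /* first member, so the casts below are valid */
    int index;
    int queued;             /* stands in for state == VIDEOBUF_QUEUED */
};

/* Model of buf_queue(); the caller is assumed to hold the queue lock,
 * so no locking is done here. */
static void fifo_buf_queue(struct fifo_node *available, struct fifo_buffer *vb)
{
    assert(available != NULL && vb != NULL);
    fifo_add_tail(&vb->queue, available);
    vb->queued = 1;
}

/* The buffer videobuf would wait on: the oldest one, or NULL if none. */
static struct fifo_buffer *fifo_first(struct fifo_node *available)
{
    if (available->next == available)
        return NULL;
    return (struct fifo_buffer *)available->next;
}
```

Because new buffers always go to the tail, the buffer at the head never changes until it is removed, which is what a waiter on the first buffer relies on.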
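Finally, buf_release() is called when a buffer is no longer going to be used. The driver should ensure that no I/O is active on the buffer, then pass it to the appropriate free routine(s):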
    /* Scatter/gather drivers */
    int videobuf_dma_unmap(struct videobuf_queue *q,
                           struct videobuf_dmabuf *dma);
    int videobuf_dma_free(struct videobuf_dmabuf *dma);

    /* vmalloc drivers */
    void videobuf_vmalloc_free(struct videobuf_buffer *buf);

    /* Contiguous drivers */
    void videobuf_dma_contig_free(struct videobuf_queue *q,
                                  struct videobuf_buffer *buf);
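One way to ensure that a buffer is no longer under I/O is to pass it to:
    int videobuf_waiton(struct videobuf_buffer *vb, int non_blocking, int intr);
Here, vb is the buffer, non_blocking indicates whether the wait should be non-blocking (it should be zero in the buf_release() case), and intr controls whether an interruptible wait is used.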
File operations
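At this point, much of the work is done; most of what remains is slipping videobuf calls into the implementation of the other driver callbacks. The first step is the open() function, which must initialize the videobuf queue. The function to use depends on the type of buffer used: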
    void videobuf_queue_sg_init(struct videobuf_queue *q,
                                struct videobuf_queue_ops *ops,
                                struct device *dev,
                                spinlock_t *irqlock,
                                enum v4l2_buf_type type,
                                enum v4l2_field field,
                                unsigned int msize,
                                void *priv);

    void videobuf_queue_vmalloc_init(struct videobuf_queue *q,
                                     struct videobuf_queue_ops *ops,
                                     struct device *dev,
                                     spinlock_t *irqlock,
                                     enum v4l2_buf_type type,
                                     enum v4l2_field field,
                                     unsigned int msize,
                                     void *priv);

    void videobuf_queue_dma_contig_init(struct videobuf_queue *q,
                                        struct videobuf_queue_ops *ops,
                                        struct device *dev,
                                        spinlock_t *irqlock,
                                        enum v4l2_buf_type type,
                                        enum v4l2_field field,
                                        unsigned int msize,
                                        void *priv);
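In each case, the parameters are the same: q is the queue structure for the device, ops is the set of callbacks described above, dev is the device structure for this video device, and irqlock is an interrupt-safe spinlock that protects access to the data structures.
type is the buffer type used by the device (cameras will use V4L2_BUF_TYPE_VIDEO_CAPTURE, for example).
field describes which field is being captured (often V4L2_FIELD_NONE for progressive devices).
msize is the size of any containing structure used around struct videobuf_buffer.
priv is a private data pointer which shows up in the priv_data member of struct videobuf_queue.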
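The msize parameter is easier to picture with a concrete containing structure. The sketch below is a user-space illustration only: wrap_videobuf_buffer is a simplified stand-in for struct videobuf_buffer, and wrap_driver_buffer with its hw_descriptor member is a hypothetical driver-private wrapper. The pointer arithmetic follows the spirit of the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct videobuf_buffer. */
struct wrap_videobuf_buffer {
    unsigned int width, height;
    unsigned long size;
};

/* Hypothetical driver-private buffer wrapped around the generic one. */
struct wrap_driver_buffer {
    struct wrap_videobuf_buffer vb; /* generic part seen by videobuf */
    unsigned int hw_descriptor;     /* hypothetical per-buffer driver data */
};

/* The value such a driver would pass as msize. */
static size_t wrap_msize(void)
{
    return sizeof(struct wrap_driver_buffer);
}

/* Recover the wrapper from a pointer to the embedded generic buffer. */
static struct wrap_driver_buffer *
wrap_from_vb(struct wrap_videobuf_buffer *vb)
{
    assert(vb != NULL);
    return (struct wrap_driver_buffer *)
        ((char *)vb - offsetof(struct wrap_driver_buffer, vb));
}
```

With this layout, callbacks that receive only the generic buffer pointer can still reach the driver's own per-buffer data.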
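V4L2 capture drivers can be written to support either of two APIs: the read() system call and the rather more complicated streaming mechanism. As a general rule, it is necessary to support both so that all applications have a chance of working with the device. To implement read(), the driver need only call one of: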
    ssize_t videobuf_read_one(struct videobuf_queue *q,
                              char __user *data, size_t count,
                              loff_t *ppos, int nonblocking);

    ssize_t videobuf_read_stream(struct videobuf_queue *q,
                                 char __user *data, size_t count,
                                 loff_t *ppos, int vbihack, int nonblocking);
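Either of these functions will read frame data into data, returning the amount actually read. The difference is that videobuf_read_one() reads only a single frame, while videobuf_read_stream() reads multiple frames if they are needed to satisfy the count requested by the application. A typical driver read() implementation will start the capture engine, call one of the above functions, then stop the engine before returning (though a smarter implementation might leave the engine running for a little while in anticipation of another read() call happening in the near future).
The poll() function can usually be implemented with a direct call to: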
unsigned int videobuf_poll_stream(struct file *file, struct videobuf_queue *q, poll_table *wait);
Note that the actual wait queue eventually used will be the one associated with the first available buffer.
When streaming I/O is done to kernel-space buffers, the driver must support
the mmap() system call to enable user space to access the data. In many
V4L2 drivers, the often-complex mmap() implementation simplifies to a
single call to:
int videobuf_mmap_mapper(struct videobuf_queue *q, struct vm_area_struct *vma);
Everything else is handled by the videobuf code.
The release() function requires two separate videobuf calls:
    void videobuf_stop(struct videobuf_queue *q);
    int videobuf_mmap_free(struct videobuf_queue *q);
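The call to videobuf_stop() terminates any I/O in progress, though it is still up to the driver to stop the capture engine. The call to videobuf_mmap_free() will ensure that all buffers have been unmapped; if so, they will all be passed to the buf_release() callback. If buffers remain mapped, videobuf_mmap_free() returns an error code instead. The purpose is clearly to make closing the file descriptor fail if buffers are still mapped, but every driver in the 2.6.32 kernel cheerfully ignores its return value.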
ioctl() operations
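The V4L2 API includes a very long list of driver callbacks to respond to the many ioctl() commands. Some of these, those associated with streaming I/O, map onto the following videobuf helper functions: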
    int videobuf_reqbufs(struct videobuf_queue *q,
                         struct v4l2_requestbuffers *req);
    int videobuf_querybuf(struct videobuf_queue *q, struct v4l2_buffer *b);
    int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b);
    int videobuf_dqbuf(struct videobuf_queue *q, struct v4l2_buffer *b,
                       int nonblocking);
    int videobuf_streamon(struct videobuf_queue *q);
    int videobuf_streamoff(struct videobuf_queue *q);
Buffer allocation
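For buffer allocation, the driver can leave the job entirely to the videobuf layer; in that case, buffers are allocated as anonymous user-space pages and will be very scattered indeed. If the application is using user-space buffers, no allocation is needed; the videobuf layer will take care of calling get_user_pages() and filling in the scatterlist array.
If the driver needs to do its own memory allocation, it should be done in the vidioc_reqbufs() function, after calling videobuf_reqbufs(). The first step is a call to: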
struct videobuf_dmabuf *videobuf_to_dma(struct videobuf_buffer *buf);
The returned videobuf_dmabuf structure (defined in <media/videobuf-dma-sg.h>) includes a couple of relevant members:
    struct scatterlist *sglist;
    int sglen;
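The driver must allocate an appropriately sized scatterlist array and populate it with pointers to the pieces of the allocated buffer; sglen should be set to the length of the array.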
Drivers using the vmalloc() method need not (and cannot) concern themselves
with buffer allocation at all; videobuf will handle those details. The
same is normally true of contiguous-DMA drivers as well; videobuf will
allocate the buffers (with dma_alloc_coherent()) when it sees fit. That
means that these drivers may be trying to do high-order allocations at any
time, an operation which is not always guaranteed to work. Some drivers
play tricks by allocating DMA space at system boot time; videobuf does not
currently play well with those drivers.
As of 2.6.31, contiguous-DMA drivers can work with a user-supplied buffer,
as long as that buffer is physically contiguous. Normal user-space
allocations will not meet that criterion, but buffers obtained from other
kernel drivers, or those contained within huge pages, will work with these
drivers.
Filling the buffers
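The final step is to put frame data into the buffers, usually in response to interrupts from the device. The process works roughly as follows:
- Get the next available buffer and make sure that somebody is actually waiting for it.
- Get a pointer to the memory and put video data there.
- Mark the buffer as done and wake up the process waiting for it.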
Step (1) above is done by looking at the driver-managed list_head
structure - the one which is filled in the buf_queue() callback. Because
starting the engine and enqueueing buffers are done in separate steps, it's
possible for the engine to be running without any buffers available - in
the vmalloc() case especially. So the driver should be prepared for the
list to be empty. It is equally possible that nobody is yet interested in
the buffer; the driver should not remove it from the list or fill it until
a process is waiting on it. That test can be done by examining the buffer's
done field (a wait_queue_head_t structure) with waitqueue_active().
A buffer's state should be set to VIDEOBUF_ACTIVE before being mapped for
DMA; that ensures that the videobuf layer will not try to do anything with
it while the device is transferring data.
For scatter/gather drivers, the needed memory pointers will be found in the
scatterlist structure described above. Drivers using the vmalloc() method
can get a memory pointer with:
void *videobuf_to_vmalloc(struct videobuf_buffer *buf);
For contiguous DMA drivers, the function to use is:
dma_addr_t videobuf_to_dma_contig(struct videobuf_buffer *buf);
The contiguous DMA API goes out of its way to hide the kernel-space address
of the DMA buffer from drivers.
The final step is to set the size field of the relevant videobuf_buffer
structure to the actual size of the captured image, set state to
VIDEOBUF_DONE, then call wake_up() on the done queue. At this point, the
buffer is owned by the videobuf layer and the driver should not touch it
again.
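The whole buffer-filling sequence can be sketched as a user-space model. All names here are hypothetical: irq_buffer stands in for struct videobuf_buffer, the waiters counter stands in for the waitqueue_active() test on the done field, the woken flag stands in for wake_up(), and memcpy() stands in for whatever transfer the hardware performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum irq_state { IRQ_QUEUED, IRQ_ACTIVE, IRQ_DONE };

struct irq_buffer {
    struct irq_buffer *next;    /* singly linked list of available buffers */
    enum irq_state state;
    int waiters;                /* stand-in for waitqueue_active(&vb->done) */
    int woken;                  /* stand-in for wake_up(&vb->done) */
    unsigned char *mem;
    size_t capacity;
    size_t size;                /* bytes actually captured */
};

/*
 * Model of the frame-ready path. Returns the completed buffer, or NULL if
 * the list was empty or nobody was waiting on the first buffer.
 */
static struct irq_buffer *irq_frame_ready(struct irq_buffer **available,
                                          const unsigned char *frame,
                                          size_t frame_len)
{
    struct irq_buffer *vb;

    assert(available != NULL && frame != NULL);

    vb = *available;
    if (vb == NULL)             /* the engine may run with no buffers */
        return NULL;
    if (vb->waiters == 0)       /* nobody interested yet: leave it queued */
        return NULL;

    *available = vb->next;      /* take it off the driver's list */
    vb->next = NULL;
    vb->state = IRQ_ACTIVE;     /* mark busy before the transfer starts */

    if (frame_len > vb->capacity)
        frame_len = vb->capacity;
    memcpy(vb->mem, frame, frame_len);

    vb->size = frame_len;       /* actual size of the captured image */
    vb->state = IRQ_DONE;
    vb->woken = 1;              /* after this the driver must not touch vb */
    return vb;
}
```

In a real driver this logic would run under the queue spinlock in the interrupt handler; locking is omitted from the model.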
Developers who are interested in more information can go into the relevant
header files; there are a few low-level functions declared there which have
not been talked about here. Also worthwhile is the vivi driver
(drivers/media/video/vivi.c), which is maintained as an example of how V4L2
drivers should be written. Vivi only uses the vmalloc() API, but it's good
enough to get started with. Note also that all of these calls are exported
GPL-only, so they will not be available to non-GPL kernel modules.