The Human Visual System

Source: The Human Visual System (Display Interfaces)

Dynamic Range and Visual Response

At any given moment, the eye is capable of discriminating varying levels of luminance over a range of perhaps 100:1 or slightly higher. If the brightness of a given object in the visual field falls below the lower end of the range at any moment, it is simply seen as black. Similarly, those items outside the range at the high end are seen as “bright”, with no possibility of being distinguished from each other in terms of perceived brightness. However, as we know from personal experience, the eye is capable of adapting with time over a much wider absolute range. The opening and closing of the iris varies the amount of light admitted to the interior of the eye, and (along with other adaptive processes) permits us to see well in conditions varying from bright sunlight to nearly the darkest of nights. In terms of the total range that can be covered by human vision with this adaptation, the figure is more like 10,000,000:1. The lowest luminance generally considered visible under any conditions, to a fully dark-adapted eye, is about 0.0001-0.001 cd/m²; the greatest, at least in terms of what can be viewed without permanent damage to the eye, is on the order of 10,000 cd/m², a value achievable from a highly reflective white surface in direct sunlight. Adaptation of the eye to varying light levels within this range permits the 100:1 range of discrimination to be set anywhere within this total absolute range.

Within a given adapted range, however, the response of the eye is not linear. At any given instant, we are capable of better discrimination at the lower end of the eye’s range than at the higher – in other words, it is easier to tell the difference between similar dimly lit areas of a given scene than between similar bright areas. (This is again as might be expected; it is more important, as a survival trait, to be able to detect objects – or threats – in a dimly lit area such as a cave than to discriminate shadings on the same object in broad daylight.) The typical response curve is shown in Figure 2-8. This non-linear response has some significant implications for the realistic portrayal of images on electronic displays.

The non-linearity of the response also has an impact on the amount of information required to properly convey visual data. Given the ability to discriminate luminance over a range of only 100:1 or slightly higher, we are tempted to assume that only about 7-8 bits per sample would be required to encode luminance. Tests with 7-8 bits per sample of luminance with linear encoding, however, show clearly discernible bands (contouring), especially in the darker areas, due to the eye’s ability to discern finer differences at the low end of the luminance range. Ten to twelve bits of luminance information per sample, if linear encoding is to be used, is generally assumed to be required for the realistic portrayal of images. (Note, however, that this level of performance is very often well beyond the capability of many display and image-sampling devices; noise in these systems may limit the resolvable bits/sample to a lower value, especially for those operating at “video” (smooth-motion) sampling rates.) Encoding full-color images, as opposed to luminance information only, naturally increases the amount of data required still further.
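To make this concrete, here is a minimal sketch (in Python; the 0.4 exponent taken from the power function of Figure 2-8 and the 8-bit depth are illustrative assumptions) comparing how large one code step appears, in perceived brightness, under linear and power-law encoding:

```python
import numpy as np

bits = 8
levels = 2 ** bits          # 256 code values
exponent = 0.4              # perceived brightness modelled as B = Y**0.4

codes = np.arange(levels)

# Linear encoding: the code value is proportional to luminance Y,
# so the perceived brightness at each code is (code/255)**0.4.
B_linear = (codes / (levels - 1)) ** exponent

# Power-law ("perceptual") encoding: the code value is proportional to
# Y**0.4 itself, so every code step has the same perceptual size.
B_power = codes / (levels - 1)

print(f"linear, darkest step:   {B_linear[1] - B_linear[0]:.4f}")    # ~0.109
print(f"linear, brightest step: {B_linear[-1] - B_linear[-2]:.5f}")  # ~0.0016
print(f"power-law, any step:    {B_power[1] - B_power[0]:.5f}")      # ~0.0039
```

Under these assumptions, the darkest linear step is roughly 28 times the perceptual size of a power-law step – the visible banding described above – while the brightest linear steps are needlessly fine; this uneven allocation is why linear encoding needs 10-12 bits where a perceptually matched encoding needs far fewer.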

Figure 2-8 Typical normalized luminance response curve for the human eye, showing the nonlinear relationship between absolute luminance (within the current adapted range) and perceived brightness. This shows that the eye is more sensitive to luminance changes at the “dark” end of the current range than to similar-sized changes in “bright” areas. The response curve shown here is a power function wherein the perceived brightness is given as Y^(1/2.5), or equivalently Y^(0.4). Different standard models have used different values for the exponent in this function, ranging from about 0.333 to 0.450.

Chromatic Aberrations

Color affects our vision in at least one other, somewhat unexpected, manner. The lens of the eye is a simple double-convex type, but made of a clear, jellylike material rather than glass. In most optical equipment, focusing is achieved by varying the spacing of the optical elements (lenses, mirrors, etc.); in the eyes of living creatures, images are focused by altering the shape of the lens itself, and so its optical characteristics. (The curved surface of the transparent cornea also acts to bend light, and is a major contributor in focusing the image – however, its action is not variable.) However, simple lenses of any type suffer from a significant flaw with respect to color. The refractive index of any optical material, and so the degree to which light is “bent” at the interface of that material and air, varies with the frequency of the light. Higher-frequency light, toward the blue end of the spectrum, is refracted more strongly than lower-frequency light. If not compensated for, this has the effect of changing the focal length of the lens for the various colors of light. (In conventional lenses, this compensation comes in the form of an additional optical element with different dispersion characteristics, bonded to the original simple lens. Such a color-corrected lens is called an achromat.)

Figure 2-9 In a simple lens, higher-frequency light (i.e., blue) is refracted more strongly than lower-frequency light (red), and so comes to a focus closer to the lens. In the case of human vision, this causes the blue components of an image to behave as though they originated from a more distant object than the red components, leading to a false sense of depth induced by color (chromostereopsis). This also makes it very tiring to look at images containing areas of bright red and blue in close proximity, as the eye has a very difficult time focusing on both!

With the simple lens of the eye, this sort of chromatic aberration results in images of different colors being focused slightly differently. Pure fields of any given color can be brought into proper focus through the adaptive action of the lens, but if objects of very different colors are seen in close proximity, a problem arises. The problem is at its worst, of course, with colors at the extremes of the visual spectrum – blue and red. If bright examples of both colors are seen together, the eye cannot focus correctly on both; when the red image is in focus, the blue comes to a focus in front of the retina, behaving as though the blue object were located “behind” the red, as seen in Figure 2-9. Besides being a source of visual strain (as the eye/brain system attempts to resolve the conflict in focus), this also creates a false sense of depth. The blue object(s) are seen as behind the red, through chromostereopsis (the perception of depth resulting solely from color differences rather than actual differences in the physical distance between objects). Due to these problems, the use of such colors in close proximity – bright red text on a blue background, for instance – is to be avoided.

Stereopsis

Besides the false sense of visual depth mentioned above, human beings are, of course, very capable of seeing true depth – we have “three-dimensional,” or stereoscopic, vision. By this we mean that human beings can get a sense of the distance to various objects, and their relative relationships in terms of distance to the viewer, simply by looking at them. This ability comes primarily (but not exclusively!) from the fact that we have two eyes which act together, seeing in very nearly the same direction at all times, and a visual system in the brain which is capable of synthesizing depth information from these two “flat”, or two-dimensional, views of the world. In nature, stereo vision is most often found in creatures which are at least part-time hunters, and so need the ability to accurately judge the distance to prey (to leap the right distance, or to aim a spear, etc.). Most animal species which possess a sense of sight have two eyes (or at least two primary eyes), but relatively few have them properly located and working in concert so as to support stereo vision.

Perceiving depth visually (stereopsis, a general term covering such perception regardless of the basic process) is basically a matter of parallax. Both eyes focus on the same object but, because they are spaced slightly apart in the head, they do not see it from quite the same angle. The eye/brain system notes this difference, and uses it to produce a sense of the distance to the object. This can also be used to impart a sense of depth to two-dimensional images; if each eye is presented with a “flat” view of the same scene, but the two views differ in a manner similar to that which results from the difference in viewing angle in a “real” scene, the visual system will perceive depth in the image. This is the principle underlying stereoscopic viewers or displays, which are arranged so as to present “left-eye” and “right-eye” images separately to the two eyes.
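The parallax geometry is easy to quantify. The following sketch computes the angular disparity between the two eyes' views of points at different distances; the 63 mm interpupillary distance is an assumed typical adult value, not a figure from the text:

```python
import math

IPD = 0.063  # interpupillary distance in metres (assumed typical value)

def vergence(z: float) -> float:
    """Angle (radians) between the two lines of sight to a point
    straight ahead at distance z metres."""
    return 2.0 * math.atan((IPD / 2.0) / z)

def disparity_arcmin(z_near: float, z_far: float) -> float:
    """Relative binocular disparity between two points, in arc-minutes."""
    return math.degrees(vergence(z_near) - vergence(z_far)) * 60.0

# Two objects 10 cm apart in depth at about arm's length:
print(f"{disparity_arcmin(1.0, 1.1):.1f} arcmin")  # roughly 20 arcmin
```

A stereoscopic display reproduces depth by building exactly this kind of angular difference into the separately presented left-eye and right-eye images.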

However, this parallax effect is not the only means through which we perceive depth visually. Some people have only one working eye, and yet still function well in situations requiring an understanding of depth; they are able to compensate through reliance on other cues, several of which are described below. (There is also a small percentage of the population who have functional vision in both eyes, and yet do not perceive depth through the normal process. In these cases, the eye/brain system, for whatever reason, never gained the ability to synthesize depth information from the two different views. Such people often do not realize that their deficiency exists at all, until they are unable to see the “3-D” effect from a stereoscopic display or viewer.) Depth is also perceived through the changes required to focus on nearby vs. distant objects, from differences in the rate at which objects are passing through the visual field (rapidly moving objects are seen as being closer than slower or stationary objects, in the absence of other cues), and, curiously, through delays in processing a given image in one eye relative to the other. (This latter case is known as the Pulfrich effect, and may be produced simply by changing the luminance of the image presented to one eye relative to the other.)

Temporal Response and Seeing Motion

Our eyes have the ability to see motion, at least up to rates normally encountered in nature. This tells us that the mechanisms of vision work relatively quickly; it does not take an unreasonable amount of time for a given scene to be imaged on the retina, for the receptor cells to respond to the pattern of light making up the image, for the impulses to be passed to the visual centers of the brain, and for the information to be interpreted as the sensation we call vision. However, this action is not infinitely fast, nor is motion perceived in quite the way we might initially think.

Clearly, the perception of motion is going to be governed by how rapidly our eyes can process new images, or changes in the visual information presented to them. It takes time for the receptors to respond to a change in light level, and then time to “reset” themselves in order to be ready for the next change. It takes time for this information to be conveyed to the brain and to be processed. We can reasonably expect, then, that there will be a maximum rate at which such changes can be perceived at all, but that this rate will vary with certain conditions, such as the brightness or contrast of the changing area relative to the background, the size of the object within the visual field, and so forth.

We also should understand that the eye/brain system has evolved to track moving objects – to follow them and fixate upon them, even while they are moving – and how this occurs. Obviously, being able to accurately follow a moving object was a very important skill for creatures trying to be successful as hunters while not being successfully hunted themselves. So we (and other higher animals) evolved the ability to predict the path of a moving object quite well, as is demonstrated each time one catches a ball. But this does not mean that the eye itself tracks these objects via a smooth, fluid motion. That would not work well because, as mentioned above, the receptors take some finite time to respond. Instead, the eye moves in short, very rapid steps – called saccades – with the sense of vision effectively suppressed during these transitions. The eye captures a scene, moves slightly “ahead” such that the moving object will remain fixed within the field, then stops and captures the “new” scene. In one way, this is very similar to the action of a motion picture camera, which captures individual still images to show motion. In fact, it is practically impossible to consciously move one’s eyes in a smooth manner; almost invariably, the actual motion of the eye will be in a series of quick, short steps.

The temporal response of vision affects display system design primarily in two areas – ensuring that the display of moving objects will appear natural, and in making sure that the performance of certain display types (which do not behave as constant-luminance light sources) is acceptable. The term critical fusion frequency (CFF) is used to describe the rate at which, under a given set of conditions, the eye can be “fooled” into perceiving motion (from a series of still images) or luminance (from a varying source) as “smooth” or “constant.”

Flicker has always been one of the major concerns in the design and use of electronic displays, primarily because the dominant display type for years has been the cathode-ray tube, or CRT. In the CRT, the image is drawn by a scanning electron beam exciting phosphors whose light output decays rapidly, so the image must be continuously redrawn, or refreshed. If this process is not repeated often enough, the display appears to be rapidly flashing, an effect which is very annoying and fatiguing for the viewer. The key question, of course, is how often the refresh must occur in order to avoid this appearance – what is the critical fusion frequency for such a source?

The prediction of the CFF for displays in general is a fairly complex task. Factors affecting it include the luminance of the display in question, the amount of the visual field it occupies, the frequency, amplitude, decay characteristics, etc., of the variation in luminance, the average luminance of the surrounding environment, and of course the sensitivity of the individual viewer. Contrary to a popular misconception, display flicker is generally not the result of a “beat frequency” with flickering ambient lighting (the most common form of this myth involves fluorescent lights); flickering ambients can result in modulation of the contrast ratio of the display, but this is usually a relatively minor, second-order effect. The overall level of the ambient lighting does affect flicker, but only because it is the perceived brightness of the display relative to its surroundings which is important. (Of course, exactly how important this is depends on the amount of the visual field occupied by both the display and the surroundings.)

The mathematical models used to predict flicker come in large part from work done by Dr. Joyce Farrell and her team at Hewlett-Packard Laboratories (working with researchers from the University of California, Berkeley) in the 1980s [1,2]. This work became the basis for several standards regarding display flicker, notably the International Organization for Standardization’s ISO-9241-3 [3] set of ergonomic standards for CRT displays. A simplified form of the analysis, using assumptions appropriate for a typical CRT display in an office environment (specifically, a typical phosphor response with the display occupying about 70° of the visual field, in diagonal measurement), leads to an estimation of the mean CFF as a function of display luminance, as given in ISO-9241-3:

[ISO-9241-3 CFF equation not reproduced in this extract; it gives the mean CFF as increasing with the logarithm of the display luminance]

where Lt is the display luminance in cd/m².

Figure 2-10 Critical flicker-fusion frequencies (CFF) given by the ISO-9241-3 formula for a range of display luminance values. This calculation assumes a display occupying 70° of the visual field (diagonal measurement). Figures are given both for the mean CFF and for the CFF for the 95th percentile of the population, calculated as CFF(mean) + 1.65 × SD for the standard deviation values listed. The SD values in boldface are from the ISO-9241-3 standard; the remainder were derived via linear interpolation. Note that these CFF calculations apply only to a CRT display, or a similar display technology in which the actual duration of the image is relatively short compared to the refresh period. Such calculations do not apply to types such as the LCD, in which the display elements are illuminated at nearly their full intended value for most if not all of the frame time.

The distribution of CFF for the entire population has been shown to be essentially Gaussian, so to this mean one must add the appropriate multiple of the population’s standard deviation in order to determine the frequency at which the display would appear “flicker-free” to a given percentage of the population. For example, the frequency at which the display would appear flicker-free to 95% of the population would be found by determining the CFF based on the display luminance, and then adding 1.65 times the standard deviation at that luminance. Note that these formulas are based on assumptions regarding display luminance, size, and average viewing distance which correspond to typical desktop-monitor use. The formula suggests that, for a CRT-based computer display of 120 cd/m² luminance, used at normal viewing distances, the refresh rate should be set to at least 71.5 Hz to appear flicker-free to half the population (this is the mean CFF predicted by the formula), and to not less than 81 Hz to satisfy 95% of viewers. This is very typical for the desktop CRT monitor, and similar calculations have led to 85 Hz becoming a de-facto standard refresh rate to satisfy the “flicker-free” requirement of many ergonomic standards. A graph of the result of the formula for mean CFF vs. luminance is shown in Figure 2-10, along with the standard deviations for inter-individual differences as established by the ISO-9241-3 standard. (Television, while operating at higher typical luminances, can get away with lower refresh rates since the display typically occupies a much smaller portion of the visual field than is the case with a desktop monitor.)
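The percentile calculation itself is simple, as this sketch shows. The 71.5 Hz mean is taken from the text; the standard deviation of about 5.8 Hz is inferred here from the quoted 81 Hz figure, not taken from the standard:

```python
from statistics import NormalDist

mean_cff = 71.5   # Hz, mean CFF for a 120 cd/m² CRT (from the text)
sd_cff = 5.8      # Hz, assumed inter-individual standard deviation

# 1.65 is (approximately) the one-sided 95th-percentile point of a
# Gaussian distribution, which is why it appears in the calculation.
z95 = NormalDist().inv_cdf(0.95)      # ~1.645
cff_95 = mean_cff + 1.65 * sd_cff     # ~81 Hz
print(f"z(95%) = {z95:.3f}; flicker-free for 95% of viewers: {cff_95:.1f} Hz")
```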

The update rate required for the perception of “smooth” motion is, fortunately, similar to that required for the avoidance of flicker, and in general is even lower. It is affected by many of the same factors, although one important consideration is that viewers on average tend to accept poorer motion rendition more readily than flicker. Acceptable motion can often be realized with an image update rate of only a few new images per second. For example, most “cartoon” animation employs a rate of between 10 and 24 new frames per second. The standard for the theatrical display of motion pictures is 24 frames/s in North America (25 frames/s is the more common rate in Europe). Finally, television systems, which are generally seen as providing very realistic motion, use a rate of 50 or 60 new images per second. This is, of course, very close to the refresh rates (60-85 Hz) generally considered to be “flicker-free” in many display applications.

While the rates required for good motion portrayal and a “flicker-free” image are similar, some interesting problems can arise when these rates are not precisely matched to each other. Examples of situations where this can occur are common in the computer graphics field (where new images may not be generated by the computer hardware at the same rate as that at which the display is being refreshed), and in cases of mixing systems of differing standard rates. An example of the latter is the display of film-sourced material on television; in North America, for instance, films are normally shot at 24 frames/s, while television uses a refresh rate of roughly 60 Hz. To accomplish this, a technique called “3:2 pulldown” is used. One frame of the film is shown for three refreshes of the television display (“fields”), while the next appears for only two (Figure 2-11). This results in the frames of the film being unequal in duration as displayed, which can result in certain motion artifacts (known as “judder”) as seen by the viewer.

Figure 2-11 To show standard motion pictures (shot at 24 frames/s) on US standard television (approx. 60 fields/s), a technique known as “3:2 pulldown” is used. However, the uneven duration of the original frames, as seen now by the viewer, can result in certain objectionable motion artifacts.
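The 3:2 cadence is easy to see in a short sketch. This is a minimal illustration of the field sequence described above, not a model of any particular broadcast chain:

```python
def pulldown_32(frames):
    """Yield successive television fields for a sequence of film frames,
    holding frames alternately for three fields and for two."""
    for i, frame in enumerate(frames):
        for _ in range(3 if i % 2 == 0 else 2):   # 3, 2, 3, 2, ...
            yield frame

# Four film frames become ten fields (4 x 2.5), matching the 24:60 ratio.
fields = list(pulldown_32("ABCD"))
print("".join(fields))   # -> AAABBCCCDD
```

The unequal hold times this produces (three fields, about 50 ms, vs. two fields, about 33 ms, per film frame) are the source of the judder mentioned above.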

Figure 2-12 Effect of mismatched refresh and update rates. In this example, we assume that the display is being refreshed 60 times per second; however, new images are being created only 20 times per second. This results in frame A being displayed for three refresh cycles, followed by frame B for the next three. The visual effect is simulated in the image at the bottom. Since the eye “expects” smooth movement, the center of the field of view moves slightly along the expected track of the moving object – but since the object in question has not actually moved for two out of every three displayed frames, the appearance is that of a moving object with “ghosts” or “shadows” resulting from the eye motion.

The problems here again have to do with how the eye/brain system responds to moving objects. Again, the motion of the eye is not smooth – it occurs in quick, short saccades, based in large part on where the brain expects the object being tracked to appear. If the object does not appear in the expected position, its image now registers on a different part of the retina. A curious example of this may be seen when the image update rate is related to the display’s refresh rate but is not the same. If, for instance, the display is being refreshed at 60 Hz, but only 20 new images are being provided per second, the object “really” appears in the same location three times before moving to its next position. The visual system, however, since it is expecting “smooth” motion, moves slightly “ahead” during those two intermediate display refreshes. This results in the “stationary” image being seen by slightly different parts of the retina, and the object is seen as multiple copies along the direction of motion (Figure 2-12). In many applications, then, the perception of smooth motion will not depend as much on the absolute rate at which new images can be generated (at least above a certain minimum rate), but rather on making sure that this rate is kept constant and is properly matched to the display rate.
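A rough sketch of the geometry behind Figure 2-12 makes the "ghost" spacing concrete. The rates match the example above; the object velocity is an assumed value for illustration:

```python
refresh_hz = 60
update_hz = 20
velocity = 300.0                    # object speed in pixels/s (assumed)

hold = refresh_hz // update_hz      # refreshes per new image = 3

for k in range(hold):
    eye_pos = velocity * k / refresh_hz   # where the tracking eye expects the object
    obj_pos = 0.0                         # the displayed object has not moved yet
    print(f"refresh {k}: image falls {eye_pos - obj_pos:.0f} px behind the gaze point")
# -> offsets of 0, 5, and 10 px: three displaced copies per update cycle,
#    seen as the "ghosts" along the direction of motion.
```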

Display Ergonomics

Our desire, of course, is to produce display systems which are usable by the average viewer, and a large part of this means assuring that undue effort or stress is not required to use them. The field of properly matching machines to the capabilities and preferences of human beings is, of course, ergonomics, and the ergonomics of display systems has been a very important field in the past few decades. Many of the various international regulations and standards affecting display design have to do at least in part with ensuring proper display hardware and displayed image ergonomics.

Unfortunately, these factors were not always considered in the design and use of electronic displays, owing to a poor understanding of the field by early display system designers. This is not really the fault of those designers, as the widespread use of electronic displays was very new and the ergonomic factors themselves not yet researched in depth. However, today we have a far better understanding of these effects, and those wishing to implement successful display systems are well advised to be familiar with them. Not only will this lead to a product more acceptable to its intended users, but compliance with the various standards in this area is often mandatory for even being able to sell the product into a given market.

Besides the standards for flicker already mentioned, items commonly covered in ergonomic guidelines or requirements include minimums and maximums for luminance and contrast, minimum capabilities for positioning the display screen (such as horizontal and vertical tilt/swivel requirements and minimum screen height from the work surface), character size and readability, the use of color, positional stability of the image (e.g., freedom from “jitter”), uniformity of brightness and color, and requirements for minimizing reflections or “glare” from the screen surface. A summary of specifications regarding some of the more important of these, from the ISO 9241-3 standard, is given in Table 2-1.

Table 2-1 Summary of ISO-9241-3 Ergonomic Requirements for CRT Displays


Item                             ISO-9241-3 Ref.   Requirement
Design viewing distance          6.1               Min. 400 mm (300 mm in some cases)
Design line of sight angle       6.2               Horizontal to 60° below horizontal
Angle of view                    6.3               Legible up to 40° from the normal to the surface of the display
Displayed character height       6.4               Min. 16 minutes of arc; preferably 20-22
Character stroke width           6.5               1/6 to 1/12 of the character height
Character width/height ratio     6.6               0.5:1 to 1:1 allowed; 0.7:1 to 0.9:1 preferred
Between-word spacing             6.11              One character width (capital “N”)
Between-line spacing             6.12              One pixel
Display luminance                6.15              35 cd/m² minimum
Luminance contrast               6.16              Minimum 0.5 contrast modulation; minimum 3:1 contrast ratio
Luminance uniformity             6.20              Not to exceed 1.7:1, measured from the center to the edge of the display screen
Temporal instability (flicker)   6.23              Flicker-free to at least 90% of the user population
Spatial instability (jitter)     6.24              Maximum 0.0002 mm per mm viewing distance, 0.5-30 Hz

Note: This table is for example only; the complete ISO-9241-3 standard imposes specific measurement requirements and other conditions not detailed here.
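Several of these requirements are simple angular checks. As one example, this sketch tests the character-height requirement (ref. 6.4) for an assumed 3 mm character at the 400 mm minimum design viewing distance; both input values are illustrative, not from the standard:

```python
import math

def height_arcmin(char_height_mm: float, viewing_distance_mm: float) -> float:
    """Angular height of a character, in minutes of arc."""
    return math.degrees(2 * math.atan(char_height_mm / (2 * viewing_distance_mm))) * 60

h = height_arcmin(3.0, 400.0)   # 3 mm character seen from 400 mm
print(f"{h:.1f} arcmin -> {'meets 6.4' if h >= 16 else 'too small'}")  # ~25.8 arcmin
```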
