SURVEILLANCE VIDEO

BOSS dataset

Website:

Datasets are available from the project website: http://www.celtic-boss.org

Dataset:

The BOSS project aims at developing an innovative and bandwidth-efficient communication system to transmit high-data-rate communications between public transport vehicles and the wayside. In particular, the BOSS concepts are evaluated and demonstrated in the context of railway transport. Security issues, traditionally covered in stations by means of video surveillance, are clearly lacking on board trains, due to the absence of efficient transmission means from the train to a supervising control centre. Similarly, diagnostic and maintenance issues are generally handled when the train arrives in a station or during maintenance stops, which prevents proactive actions from being carried out.

The dataset includes 15 sequences shot with 9 cameras and 8 microphones, all synchronized together to allow 3D video/audio reconstruction.

In these datasets, we can find the following events:

- Cell phone theft (in Spanish).

- Check out - a passenger checking out another man's wife, then fighting (in French).

- Disease - a series of 3 passengers fainting, alone in the coach (both in French and Spanish).

- Disease in public (both in French and Spanish).

- Harass - 3 sequences in which a man harasses a woman. In "Harass2", there are other passengers in the coach.

- Newspaper - two sequences (one in French, one in Spanish) in which a passenger harasses another passenger for his newspaper, and ends up assaulting him.

- Panic (in French language) - a passenger notices a fire in the next coach, and everybody runs out of the train.

- Two more sequences are provided, containing no incidents whatsoever. They were shot to assess the robustness of incident detection software to false alarms.

- Other sequences are provided, which are not acted incidents but were used for specific incident detection tasks.

Metadata:

Events generated by the BOSS processing are given for some sequences, in a file called "nameofthesequence.xml", in the same directory as the dataset of the sequence itself. The format and types of the events are described in a PDF file.

Contextual info:

All the sequences were shot in a Madrid suburban train kindly lent by RENFE, who are gratefully acknowledged.
In order to allow as much flexibility as possible, all the video files are uncalibrated; the calibration files are provided along with each sequence, and a description of how to use them is given in calibTutorial.pdf. An associated Matlab library is provided in BOSScalibTutorial.zip.

Comments:

Copyrights:

The sequences are provided free of charge for academic research. For any other use, please ask the contact person. Should you publish these sequences or results obtained using them, please indicate their origin as "BOSS project" and mention the address of the project: http://www.celtic-boss.org
You are welcome to provide a link to the location of the sequences, but copying them to another web site is subject to prior consent of the contact person.

Contact:

[email protected]

EMAV 2009

Website:

Datasets are available here:

http://www.emav09.org/

The objective of the EMAV 2009 (European Micro Aerial Vehicle Conference and Flight Competition) conference is to provide an effective and established forum for discussion and dissemination of original and recent advances in MAV technology. The conference program will consist of a theoretical part and a flight competition. We aim for submission of papers that address novel, challenging and innovative ideas, concepts or systems. We particularly encourage papers that go beyond MAV hardware, and address issues such as the collaboration of multiple MAVs, applications of computer vision, and non-GPS based navigation.

Dataset:

For computer vision researchers, an image set is published. The set consists of photos taken with various MAV platforms at different locations. The photos are always stills from movies made by the platform. For this EMAV, there is no explicit assignment or competition linked to this dataset. However, possible tasks with the dataset are: segmentation of the images into meaningful entities, specific object recognition (cars / roads), construction of image mosaics on the basis of the films, etc.

Metadata:

Contextual info:

Comments:

Copyrights:

Contact:

info [-at-] emav2009.org

Caltech Pedestrian Dataset

Website:

Datasets are available here:

http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/

Dataset:

The Caltech Pedestrian Dataset consists of approximately 10 hours of 640x480, 30 Hz video taken from a vehicle driving through regular traffic in an urban environment. About 250,000 frames (in 137 approximately minute-long segments) with a total of 350,000 bounding boxes and 2,300 unique pedestrians were annotated.

Metadata:

The annotation includes temporal correspondence between bounding boxes and detailed occlusion labels. More information can be found in our CVPR09 paper.

Associated Matlab code is available. The annotations use a custom "video bounding box" (vbb) file format. The code also contains utilities to view seq files with the annotations overlaid, the evaluation routines used to generate all the ROC plots in the paper, and the vbb labeling tool used to create the dataset (a slightly outdated video tutorial of the labeler is also available).

Contextual info:

Comments:

Copyrights:

Contact:

pdollar[at]caltech.edu

NGSIM

Website:

Datasets are available here (registration is needed):

http://ngsim.fhwa.dot.gov/modules.php?op=modload&name=News&file=article&sid=4

Dataset:

Detailed vehicle trajectory data on parts of highways

Metadata:

Contextual info:

Comments:

Copyrights:

Need to register before using the NGSIM Data Sets.

Contact:

[email protected]

AMI Corpora

Website:

Datasets are available here (registration is needed)

http://corpus.amiproject.org/amicorpus/download/download

Dataset:

This dataset consists of meeting room scenarios, with two people sitting around meeting tables.

Around two-thirds of the data has been elicited using a scenario in which the participants play different roles in a design team, taking a design project from kick-off to completion over the course of a day. The rest consists of naturally occurring meetings in a range of domains.

Metadata:

Annotations are available for many different phenomena (dialog acts, head movement, etc.).

See the AMI Corpus website for more information.

Contextual info:

Comments:

Copyrights:

Contact:

[email protected]

MORYNE - Traffic scenes mobile video acquisition

Website:

http://www.fp6-moryne.org/

MORYNE aims at contributing to greater transport efficiency, increased transport safety and more environmentally friendly transport by improving traffic management in urban and suburban areas.

Dataset:

There are sequences from both demonstration buses of the MORYNE project.
Filenames explicitly provide the date and time of acquisition.

Metadata:

Ground truth is provided in XML format as follows:

<event>
    <time>2008-01-18T10:05:10.747209</time>
    <name>ODOINFO</name>
    <parameters>
        <sender>OBU</sender>
        <target>MVS</target>
        <starttime>2008-01-18T10:05:10.747209</starttime>
        <stoptime>2008-01-18T10:05:11.784436</stoptime>
        <distance>9.216714</distance>
    </parameters>
</event>

This file gives the distance covered by the bus during the interval starttime - stoptime.
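
As a rough illustration (a sketch, not code shipped with the dataset), the Python snippet below parses the sample event above and derives the bus's average speed over the interval. Only the element names visible in the sample are assumed; how events are wrapped in a complete ground-truth file is not specified here.

import xml.etree.ElementTree as ET
from datetime import datetime

fragment = """<event>
    <time>2008-01-18T10:05:10.747209</time>
    <name>ODOINFO</name>
    <parameters>
        <sender>OBU</sender>
        <target>MVS</target>
        <starttime>2008-01-18T10:05:10.747209</starttime>
        <stoptime>2008-01-18T10:05:11.784436</stoptime>
        <distance>9.216714</distance>
    </parameters>
</event>"""

params = ET.fromstring(fragment).find("parameters")
start = datetime.fromisoformat(params.findtext("starttime"))
stop = datetime.fromisoformat(params.findtext("stoptime"))
distance_m = float(params.findtext("distance"))

speed = distance_m / (stop - start).total_seconds()
print(f"average speed: {speed:.2f} m/s")   # about 8.89 m/s for the sample above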

Contextual info:

.idx files
----------
.idx files contain the date and time for each frame in the sequence. The structure of this file is:

- header of 12 bytes
- For each frame, a structure of 24 bytes

The structure contains:
- unsigned 32-bit integer: seconds since Epoch
- unsigned 32-bit integer: microseconds within the second
- unsigned 64-bit integer: offset in bytes in the .avi file
- unsigned 32-bit integer: frame number starting at 0
- unsigned 32-bit integer: frame type as defined by libavcodec (may be useless)

All integers are encoded in little endian.
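
For illustration, here is a minimal Python sketch of an .idx reader following this layout. The content of the 12-byte header is not documented here, so it is simply skipped (an assumption).

import struct

HEADER_SIZE = 12
RECORD = struct.Struct("<IIQII")   # little endian: sec, usec, offset, number, type
assert RECORD.size == 24

def read_idx(path):
    frames = []
    with open(path, "rb") as f:
        f.read(HEADER_SIZE)   # skip the 12-byte header (contents undocumented here)
        while (chunk := f.read(RECORD.size)) and len(chunk) == RECORD.size:
            sec, usec, offset, number, ftype = RECORD.unpack(chunk)
            frames.append({
                "timestamp": sec + usec / 1e6,   # seconds since Epoch
                "avi_offset": offset,            # byte offset in the .avi file
                "frame_number": number,          # starts at 0
                "frame_type": ftype,             # as defined by libavcodec
            })
    return frames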

Comments:

The material for camera calibration and bus speed/context metadata will be added as soon as possible.

Copyrights:

This folder contains a list of test sequences which have been recorded for the MORYNE project (http://www.fp6-moryne.org).
They can be used for non-commercial purposes only, if a reference to the MORYNE project is associated with their use (e.g. in publications, video demonstrations...).

Contact:

christophe.parisot(at)multitel.be

BEHAVE - Crowds

Website:

Datasets are available here:

http://groups.inf.ed.ac.uk/vision/BEHAVEDATA/CROWDS/index.html

Dataset:

Data for the real scene:

These are the smoothed flow sequences for the Waverly train station scene. There are 4 numbered files; file (002) is used for testing, and the remaining ones are used for training.

Data for the simulated scene

These are the smoothed flow sequences for the train station simulation. There are 30 files divided into the groups below. Use frames 1100 to 4000. The emergency occurs at frame 2000.

Group 1: Normal - Training

Group 2: Normal - Testing

Group 3: Emergency - Blocked exit at the bottom of the scene.

Metadata:

No ground truth available.

Contextual info:

Comments:

Copyrights:

Free download from website.

Contact:

Dimitrios Makris, [email protected]

CANTATA - Left Objects Dataset

Website:

http://www.multitel.be/~va/cantata/LeftObject/

Dataset:

A number of video clips were recorded acting out the scenario of interest: left objects. 31 two-minute sequences have been recorded, showing different left-object scenarios (1 or more objects, a person staying close to the left object, etc.).
The 31 scenarios have been recorded using 2 different cameras (not synchronised), with two different views:

- a Panasonic camera - miniDV, model NV-DS28EG (camera1)

- a Sony camera - miniDV, model DSR-PD170P (camera2)

The videos have the following characteristics:

- A resolution of 720x576 pixels

- 25 frames per second

- MPEG4 compression

- File sizes of 75 MB for camera1 and 65 MB for camera2.

Metadata:

All the sequences are annotated in XML format. Each sequence is associated with an annotation file with the same name, ending in .gt.xml.

For each left object, we can find in the XML (a reading sketch follows the list below):

- the exact time of the detection

- the position of the object in the image
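
A minimal Python sketch of reading such a .gt.xml file follows. The tag names used below (object, time, x, y) are hypothetical placeholders, since the exact schema is not documented here; adapt them to the real files.

import xml.etree.ElementTree as ET

def read_left_objects(path):
    root = ET.parse(path).getroot()
    for obj in root.iter("object"):        # hypothetical tag name
        yield {
            "time": obj.findtext("time"),  # exact time of the detection
            "x": float(obj.findtext("x")), # position of the object in the image
            "y": float(obj.findtext("y")),
        }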

Contextual info:

Comments:

In each sequence, nothing happens before 30 seconds or after 1m45s.

Copyrights:

Free download from website. If you publish results using the data, please acknowledge the data as coming from the CANTATA project, found at URL: http://www.hitech-projects.com/euprojects/cantata/. THE DATASET IS PROVIDED WITHOUT WARRANTY OF ANY KIND

Contact:

[email protected]

VISOR - Surveillance

Website:

Datasets are available here:

http://imagelab.ing.unimore.it/visor/

Dataset:

4 types of video clips. These sequences constitute a representative panel of different video surveillance areas.

They merge indoor and outdoor scenes, such as Indoor Domotic Unimore D.I.I. setup.

Metadata:

Object Detection and Tracking.

Contextual info:

Comments:

Mostly simple videos.

Copyrights:

Free download

Contact:

[email protected]

Traffic datasets from the Institut für Algorithmen und Kognitive Systeme

Website:

Sequences are available here:

http://i21www.ira.uka.de/image_sequences/

Dataset:

  • Traffic intersection sequence recorded at the Durlacher-Tor-Platz in Karlsruhe by a stationary camera (512 x 512 grayvalue images (GIF-format))
  • Traffic intersection sequence recorded at the Ettlinger-Tor in Karlsruhe by a stationary camera (512 x 512 grayvalue images (GIF-format))
  • Traffic intersection sequence recorded at the Nibelungen-Platz in Frankfurt by a stationary camera (720 x 576 grayvalue images (GIF-format))
  • Traffic  sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (740 x 560 grayvalue images (GIF-format))
  • Another traffic  sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (702 x 566 grayvalue images (PM-format))
  • Traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (768 x 576 grayvalue images (PGM-format),normal conditions)
  • Traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (768 x 576 grayvalue images (PGM-format),normal conditions)
  • Traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (768 x 576 color images (PPM-format),heavy fog)
  • Traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (768 x 576 color images (PPM-format),heavy snowfall)
  • Traffic sequence showing the intersection Karl-Wilhelm-/ Berthold-Straße in Karlsruhe, recorded by a stationary camera (768 x 576 color images (PPM-format),snow on lanes)
  • Traffic sequence showing an intersection at Rheinhafen, Karlsruhe (688 x 565 grayvalue images (PM.GZ-format))
  • Traffic sequence showing a taxi in Hamburg (256 x 191 grayvalue images (PGM-format))

Metadata:

Camera projection data in the file proj.dat, which uses the following format:

tx ty tz        # Translation vector Global <---> Camera Coordinates
r11 r12 r13     # \
r21 r22 r23     #  > 3x3 Rotation Matrix Global <---> Camera
r31 r32 r33     # /
fx              # Focal length x-direction (pixels)
fy              # Focal length y-direction (pixels, usually 4/3 * fx)
x0              # Image Center X (pixels)
y0              # Image Center Y (pixels)
1               # Sharp shadows visible (1=true, 0=false)
phi             # Azimuth angle for shadow
theta           # Polar angle for shadow
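
For illustration, a small Python sketch that loads proj.dat, assuming the values appear in the documented order, whitespace-separated, with '#' starting a comment:

import numpy as np

def load_proj(path):
    values = []
    with open(path) as f:
        for line in f:
            data = line.split("#", 1)[0]           # drop trailing comments
            values.extend(float(v) for v in data.split())
    t = np.array(values[0:3])                      # translation vector
    R = np.array(values[3:12]).reshape(3, 3)       # 3x3 rotation matrix
    fx, fy, x0, y0, shadows, phi, theta = values[12:19]
    return t, R, fx, fy, (x0, y0), bool(shadows), phi, theta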

Contextual info:

Different contexts: snow, fog, etc.

Comments:

Copyrights:

No license; free of cost.

Contact:

Sabri Boughorbel (mailto:[email protected])

TRAFICON - Traffic jam

Website:

No

Dataset:

Traffic jam.

Metadata:

Contextual info:

Camera height 12m, Camera: inch sensor, 4 mm lens.

Comments:

Period of road markings is 12m (9+3).

Copyrights:

No license; free of cost. When the dataset is used, refer to and give credit to Traficon N.V. as follows: "www.traficon.com".

Contact:

Wouter Favoreel, [email protected]

CANDELA - Surveillance

Website:

Datasets are available here:

http://www.multitel.be/~va/candela/

Dataset:

Two different scenarios were realized during the CANDELA project: "indoor abandoned object" and "road intersection".

o Scenario 1: Abandoned object. The detection of abandoned objects is essentially the detection of idle (stationary, non-moving) objects that remain stationary over a certain period of time. The period of time is adjustable. In several types of scenes, idle objects should be detected. In a parking lot, e.g., an idle object can be a parked car or a left suitcase. For this scenario we are not looking at the object types "person" or "car", but at unidentified objects, called "unknown objects". An unknown object is any object that is not a person or a vehicle. In general, unknown objects cannot move. What should be detected? Whenever an unknown object appears in the scene and remains stationary for some amount of time, an alarm needs to be generated. This alarm must remain active as long as the unknown object remains stationary.

o Scenario 2: Persons are allowed to cross the street at zebra crossings, i.e. crossings controlled with lights. Alarms should be generated when persons are not allowed to be on the crossing, or when dangerous situations occur (cars driving while people are crossing). Since the external signal from the traffic light is not available (when the crossing is regulated by traffic lights), detection needs to be done automatically. Detection of persons on the crossing itself is fairly easy, but alarms should only be given when persons are on the crossing and cars are driving.

Metadata:

Detailed information about the data and metadata can be found here:

http://www.hitech-projects.com/euprojects/candela/pr/scenario_description_document_v06.pdf

Contextual info:

Comments:

Copyrights:

Public domain

Contact:

Xavier Desurmont, [email protected]

OVVV - Virtual sequences

Website:

Datasets are available here:

http://development.objectvideo.com/

Dataset:

The ObjectVideo Virtual Video tool provides the ability to generate virtual video sequences. These video sequences can then be used to test VCA algorithms.

Metadata:

The automatically generated ground truth is stored in a proprietary binary format. The format is open, and a conversion program can be created to convert the metadata to any format. A simple bounding box scheme is available; for more powerful validation, a "blob" video can be created.

Contextual info:

Virtual environment, the user can make his own environment from the internet. Several camera settings can be changed to simulate real-world cameras more closely.

Comments:

This is not a dataset as such; but using these very powerful and tailored tools, test videos can be created.

Copyrights:

The ObjectVideo Virtual Video Tool is provided free for non-commercial use, for your own research and development purposes. If you publish or distribute images, videos or derivative results based on this software, you must acknowledge ObjectVideo by including "ObjectVideo Virtual Video Tool".

To use the ObjectVideo Virtual Video tool, a licence for the commercial game Half-Life 2 is needed (www.steampowered.com).

Contact:

Rick Koeleman, VDG-Security bv. [email protected]

IBM - Tracking

Website:

http://domino.research.ibm.com/comm/research_projects.nsf/pages/s3.performanceevaluation.html

Dataset:

4 outdoor clips (from PETS2001) of people and vehicles, and 11 indoor clips of people.

Metadata:

Motion detection and motion tracking

Contextual info:

Comments:

Copyrights:

Free download from website

Contact:

Dimitrios Makris, [email protected]

SPEVI: Multiple faces dataset

Website:

http://www.spevi.org

Dataset:

This is a dataset for multiple people/face visual detection and tracking. The dataset is composed of 3 sequences (same scenario); 4 targets repeatedly occlude each other while appearing and disappearing from the field of view of the camera. The sequence motinas_multi_face_frontal shows frontal faces only; in motinas_multi_face_turning the faces are frontal and rotated; in motinas_multi_face_fast the targets move faster than in the previous two sequences. Total number of images: 2769, DivX 6 compression, 640 x 480 pixels, 25 Hz.

Sensor details
- video camera: JVC GR-20EK

Metadata:

Contextual info:

Comments:

Copyrights:

Requested citation acknowledgment: E. Maggio, E. Piccardo, C. Regazzoni, A. Cavallaro. "Particle PHD filter for multi-target visual tracking", in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), Honolulu (USA), April 15-20, 2007

Contact:

Xavier Desurmont, [email protected]

SPEVI: Single face dataset

Website:

http://www.spevi.org

Dataset:

This is a dataset for single person/face visual detection and tracking. The dataset is composed of five sequences with different illumination conditions and resolutions. Three sequences (motinas_toni, motinas_toni_change_ill and motinas_nikola_dark) are shot with a hand-held camera (JVC GR-20EK). In motinas_toni the target moves under a constant bright illumination; in motinas_toni_change_ill the illumination changes from dark to bright; the sequence motinas_nikola_dark is constantly dark. Two sequences (motinas_emilio_webcam and motinas_emilio_webcam_turning) are shot with a webcam (Logitech Quickcam) under a fairly constant illumination. Total number of images: 3018, DivX 6 compression, 640 x 480 pixels at 25 Hz (motinas_toni, motinas_toni_change_ill, motinas_nikola_dark), 320 x 240 pixels at 10 Hz (motinas_emilio_webcam and motinas_emilio_webcam_turning).

Metadata:

The ground truth data is available in the .zip files for the sequences motinas_toni and motinas_emilio_webcam. In the ground truth files, each line of text describes the objects' position and size in a frame. The syntax of a line is the following: frame number_of_objects obj_1_name x y half_width half_height angle obj_2_name x y half_width half_height angle ...
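
For illustration, a minimal Python parser for one such line (a sketch assuming whitespace-separated tokens exactly in the order quoted above):

def parse_gt_line(line):
    tokens = line.split()
    frame, n_objects = int(tokens[0]), int(tokens[1])
    objects = []
    for i in range(n_objects):
        # 6 tokens per object: name, x, y, half_width, half_height, angle
        name, *fields = tokens[2 + 6 * i : 2 + 6 * (i + 1)]
        x, y, half_w, half_h, angle = map(float, fields)
        objects.append({"name": name, "x": x, "y": y,
                        "half_width": half_w, "half_height": half_h,
                        "angle": angle})
    return frame, objects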

Contextual info:

Comments:

Copyrights:

Requested citation acknowledgment E. Maggio, A. Cavallaro, "Hybrid particle filter and mean shift tracker with adaptive transition model", in Proc. of IEEE Int. Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Philadelphia, 19-23 March 2005, pp. 221 - 224.

Contact:

Xavier Desurmont, [email protected]

SPEVI: Audiovisual people dataset

Website:

http://www.spevi.org

Dataset:

This is a dataset for uni-modal and multi-modal (audio and visual) people detection and tracking. The dataset consists of three sequences recorded in different scenarios with a video camera and two microphones. Two sequences (motinas_Room160 and motinas_Room105) are recorded in rooms with reverberations. The third sequence (motinas_Chamber) is recorded in a room with reduced reverberations. The camera is placed in the centre of a bar that supports two microphones. Total number of images: 3271; format of images: 8-bit color AVI, 360 x 288 pixels, 25 fps; audio sampling rate: 44.1 kHz.

Sensor details
- The camera is placed in the centre of a bar that supports two microphones
- Distance between the microphones: 95 cm
- Microphones: Beyerdynamic MCE 530 condenser microphones
- Camera: KOBI KF-31CD analog CCD surveillance camera

Metadata:

The ground truth data are provided together with the sequences in the corresponding .zip file, as a list of XML files representing the positions of the objects in the field of view.

Contextual info:

Comments:

Copyrights:

Requested citation acknowledgment Courtesy of EPSRC funded MOTINAS project (EP/D033772/1)

Contact:

Xavier Desurmont, [email protected]

ETISEO - Surveillance

Website:

Datasets are available here: (registration is needed)

http://www-sop.inria.fr/orion/ETISEO/

Dataset:

86 video clips. These sequences constitute a representative panel of different video surveillance areas.

They merge indoor and outdoor scenes, corridors, streets, building entries, subway station... They also mix different types of sensors and complexity levels.

Metadata:

Several different levels: Object Detection, Object Localization, Object Tracking, Object Classification.

Contextual info:

Zone of interest, calibration matrix

Comments:

Copyrights:

Free download but registration and user agreement is required.

Contact:

[email protected]

SELCAT - Level Crossing

Website:

These datasets have been realized during the SELCAT project.

http://www.levelcrossing.net/

Datasets are available here:

http://www.multitel.be/~va/selcat

Dataset:

These datasets are composed of 24 hours of real sequences showing a level crossing where some vehicles stop due to its particular configuration: on the right side of the LC there is an avenue, parallel to the LC, so a traffic light is located just after the LC. Consequently, vehicles sometimes stop on the LC because of this traffic light. The total amount of data is about 7 GB.

Metadata:

For each video file, there is a corresponding ground truth file in XML that gives the timestamps of "stopped vehicle" events.

Contextual info:

Environment conditions (calibration, scene...)

Comments:

Copyrights:

Contact:

Caroline Machy, [email protected]

BEHAVE - INTERACTION

Website:

http://groups.inf.ed.ac.uk/vision/BEHAVEDATA/INTERACTIONS/

Dataset:

The dataset comprises two views of various scenarios of people acting out various interactions. Ten basic scenarios were acted out: InGroup (IG), Approach (A), WalkTogether (WT), Split (S), Ignore (I), Following (FO), Chase (C), Fight (FI), RunTogether (RT), and Meet (M). The data is captured at 25 frames per second. The resolution is 640x480. The videos are available either as AVIs or as a numbered set of JPEG single-image files.

Metadata:

Tracking, Event detection.

Contextual info:

3D coordinates of points for calibration purposes provided.

Comments:

The site will be updated when more of the ground truth becomes available.

Copyrights:

Free download from website.

Contact:

Dimitrios Makris, [email protected]

PETS - 2007 - REASON

Website:

Datasets are available here:

http://www.pets2007.net/

Dataset:

The datasets are multisensor sequences containing the following 3 scenarios, with increasing scene complexity: 1. loitering, 2. attended luggage removal (theft), 3. unattended luggage.

Metadata:

Event Detection

Contextual info:

Calibration provided

Comments:

Copyrights:

Free download from website. The UK Information Commissioner has agreed that the PETS 2007 datasets described here may be made publicly available for the purposes of academic research. The video sequences are copyright UK EPSRC REASON Project consortium and permission is hereby granted for free download for the purposes of the PETS 2007 workshop.

Contact:

Dimitrios Makris, [email protected]

PETS - 2006 - ISCAPS

Website:

Datasets are available here:

http://www.pets2006.net/

Dataset:

Surveillance of public spaces, detection of left luggage events. Scenarios of increasing complexity, captured using multiple sensors.

Metadata:

All scenarios come with two XML files. The first contains camera calibration parameters and is given in the sub-directory 'calibration'; see the previous section (Calibration Data) for information on this XML file format. The second XML file (given in the sub-directory 'xml') contains both configuration and ground-truth information.

Contextual info:

Calibration provided.

Comments:

Copyrights:

Free download from website. The UK Information Commissioner has agreed that the PETS 2006 datasets described here may be made publicly available for the purposes of academic research. The video sequences are copyright ISCAPS consortium and permission is hereby granted for free download for the purposes of the PETS 2006 workshop.

Contact:

Dimitrios Makris, [email protected]

PETS - 2005 - WAMOP

Website:

Datasets are available here: (registration is needed)

http://www.vast.uccs.edu/~tboult/PETS05/

Dataset:

Challenging detection/tracking scenes on water.

Metadata:

Object Detection/Tracking.

Contextual info:

Comments:

Copyrights:

Free download from website, but registration is required.

Contact:

Dimitrios Makris, [email protected]

PETS - ECCV'2004 - CAVIAR

Website:

http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/

or http://www-prima.inrialpes.fr/PETS04/caviar_data.html

Dataset:

A number of video clips were recorded acting out the different scenarios of interest. These include people walking alone, meeting with others, window shopping, fighting and passing out and, last but not least, leaving a package in a public place. All video clips were filmed with a wide-angle camera lens. The resolution is half-resolution PAL standard (384 x 288 pixels, 25 frames per second), compressed using MPEG2. The file sizes are mostly between 6 and 12 MB, a few up to 21 MB.

Metadata:

Person/Group Tracking, Person/Group Activity Recognition, Scenario/Situation Recognition

Contextual info:

3D coordinates of points for calibration purposes provided.

Comments:

Copyrights:

Free download from website. If you publish results using the data, please acknowledge the data as coming from the EC Funded CAVIAR project/IST 2001 37540, found at URL: http://www.dai.ed.ac.uk/homes/rbf/CAVIAR/

Contact:

Dimitrios Makris, [email protected]

PETS 2002

Website:

Datasets are available here:

http://www.cvg.cs.rdg.ac.uk/PETS2002/pets2002-db.html

Dataset:

Indoor people tracking (and counting). Two training and four testing sequences show people moving in front of a shop window. Sequences are provided both in MPEG movie format and as individual JPEG images.

Metadata:

People tracking, counting and activity recognition.

Contextual info:

No calibration provided

Comments:

Copyrights:

Free download from website

Contact:

Dimitrios Makris, [email protected]

PETS 2001

Website:

Datasets are available here:

http://www.cvg.cs.rdg.ac.uk/PETS2001/pets2001-dataset.html

http://www.cvg.cs.rdg.ac.uk/cgi-bin/PETSMETRICS/page.cgi?dataset

Dataset:

Outdoor people and vehicle tracking (two synchronised views; includes omnidirectional and moving camera). PETS'2001 consists of five separate sets of training and test sequences, i.e. each set consists of one training sequence and one test sequence. All the datasets are multi-view (2 cameras) and are significantly more challenging than PETS'2000 in terms of significant lighting variation, occlusion, scene activity and use of multi-view data.

Metadata:

Tracking information on image plane and ground plane can be found at:

http://www.cvg.cs.rdg.ac.uk/PETS2001/ANNOTATION/

Contextual info:

Camera Calibration provided

Comments:

Copyrights:

Free download from website

Contact:

Dimitrios Makris, [email protected]

PETS 2000

Website:

ftp://ftp.pets.rdg.ac.uk/pub/PETS2000/

Dataset:

Outdoor people and vehicle tracking (single camera).

Two sequences:

a) Training sequence of 3672 frames at 25 Hz (146.88 secs).

b) Test sequence of 1452 frames (58.08 secs).

The sequences are available in 2 formats:

a) QuickTime movie format with Motion JpegA compression (training.mov and test.mov).

b) Individual Jpeg files (training_images/*.jpg and test_9images/*.jpeg).

Metadata:

No Ground Truth provided.

Contextual info:

Camera Calibration provided.

Comments:

Copyrights:

Free download

Contact:

Dimitrios Makris, [email protected]

PETS

Website:

http://www.cvg.rdg.ac.uk/slides/pets.html

Dataset:

Each year PETS runs an evaluation framework on specific datasets with a specific objective; the theme and datasets change from year to year.

Metadata:

Ground truth depends on the theme of each year's workshop.

Contextual info:

Comments:

Copyrights:

Free download from website

Contact:

Dimitrios Makris, [email protected]

I-LIDS - Surveillance

Website:

http://scienceandresearch.homeoffice.gov.uk/hosdb/cctv-imaging-technology/video-based-detection-systems/i-lids/

Dataset:

4 scenarios (Parked Vehicle, Abandoned Package, Doorway Surveillance and Sterile Zone) x 2 datasets (training, testing) each. Each dataset contains about 24 hours of footage in a few different scenes.

Metadata:

Event-based Ground truth.

Contextual info:

Images of a pedestrian model in different positions are given for calibration purposes

Comments:

7 free clips for 2 scenarios (Parked Vehicle, Abandoned Package) are available from: http://www.elec.qmul.ac.uk/staffinfo/andrea/avss2007_d.html

Copyrights:

A user agreement and a payment (£500-£650 per dataset) are required to obtain each dataset. Datasets are provided on hard disks.

Contact:

Dimitrios Makris, [email protected]

MEDICAL

DDSM: Digital Database for Screening Mammography

Website:

Datasets are available here:

http://marathon.csee.usf.edu/Mammography/Database.html

Dataset:

The Digital Database for Screening Mammography (DDSM) is a resource for use by the mammographic image analysis research community. The database contains approximately 2620 cases available in 43 volumes (healthy and diseased).

Metadata:

Images containing suspicious areas have associated pixel-level "ground truth" information about the locations and types of suspicious regions.

Contextual info:

Each study includes two images of each breast, along with some associated patient information (age at time of study, ACR breast density rating, subtlety rating for abnormalities, ACR keyword description of abnormalities) and image information (scanner, spatial resolution, ...). A case consists of between 6 and 10 files. These are an "ics" file, an overview "16-bit PGM" file, four image files that are compressed with lossless JPEG encoding and zero to four overlay files. Normal cases will not have any overlay files.

Comments:

Copyrights:

If you use data from DDSM in publications:

Please credit the DDSM project as the source of the data, and reference: "The Digital Database for Screening Mammography", Michael Heath, Kevin Bowyer, Daniel Kopans, Richard Moore and W. Philip Kegelmeyer, in Proceedings of the Fifth International Workshop on Digital Mammography, M.J. Yaffe, ed., 212-218, Medical Physics Publishing, 2001, ISBN 1-930524-00-5; and "Current status of the Digital Database for Screening Mammography", Michael Heath, Kevin Bowyer, Daniel Kopans, W. Philip Kegelmeyer, Richard Moore, Kyong Chang, and S. MunishKumaran, in Digital Mammography, 457-460, Kluwer Academic Publishers, 1998 (Proceedings of the Fourth International Workshop on Digital Mammography). Also, please send a copy of your publication to Professor Kevin Bowyer / Computer Science and Engineering / University of Notre Dame / Notre Dame, Indiana 46530.

Contact:

Cedric Marchessoux, [email protected]

The Volume Library

Website:

Datasets are available here:

http://www9.informatik.uni-erlangen.de/External/vollib/

Dataset:

For each volume: name of the set, anatomy, resolution, number of bits.

Metadata:

Contextual info:

Environment conditions (calibration, scene...): scanning parameters

Comments:

Mainly CT, PET, MRI. Additional comments are available. Not all the datasets have medical content; you can find, for example, a scan of a bonsai. The raw data can be extracted easily using the PVM tools distributed with the V^3 volume rendering package available at http://www.stereofx.org/

Copyrights:

Commercial use is prohibited and no warranty whatsoever is expressed, credit should be given to the group who created the dataset.

Contact:

Stefan Roettger ([email protected]) or Cedric Marchessoux ([email protected])

DICOM sample image sets

Website:

http://pubimage.hcuge.ch:8080

http://pubimage.hcuge.ch/

Dataset:

DICOM sample image sets, listed with alias name, modality, and file size, plus a short description.

Metadata:

Contextual info:

Environment conditions (calibration, scene...)

Comments:

Mainly CT and MRI, more than 10 GB of data.

Copyrights:

Click on the thumbnail images to download the full set of corresponding DICOM images

Contact:

Cedric Marchessoux ([email protected])

MyPACS.net, reference case manager

Website:

Datasets are available here:

http://www.MyPACS.net

Dataset:

MyPACS.net is still free, and it now has over 16,500 teaching files contributed by 14,000 registered users. With 75,000 key images categorized by anatomy and pathology, you can quickly find examples of any disease. The web-based viewer has been improved with more PACS-like features, and it still works instantly in your browser, requiring nothing to download.

The datasets contain:

1. Cranium and Contents (1205)
2. Face and Neck (398)
3. Spine and Peripheral Nervous System (504)
4. Skeletal System (3433)
5. Heart (160)
6. Chest (894)
7. Gastrointestinal (1271)
8. Genitourinary (800)
9. Vascular/Lymphatic (416)
10. Breast (62)
11. Other (458)

Metadata:

Description of the pathology by medical doctors.

Contextual info:

Environment conditions (calibration, scene...): Medical modality described: Brand and acquisition conditions

Comments:

Copyrights:

MyPACS.net is still free; you need to be registered.

Contact:

Cedric Marchessoux ([email protected])

The NCIA (National Cancer Imaging Archive from National Cancer Institute) data base

Website:

Datasets are available here:

https://imaging.nci.nih.gov/ncia/

Dataset:

CT scans with XML files for the ground truth, and also other modalities.

Metadata:

Ground truth stored in XML.

Contextual info:

Environment conditions (calibration, scene...): X-ray scanner system: Brand and acquisition conditions

Comments:

Copyrights:

The user should ask for a login. You may browse, download, and use the data for non-commercial, scientific and educational purposes. However, you may encounter documents or portions of documents contributed by private institutions or organizations. Other parties may retain all rights to publish or produce these documents. Commercial use of the documents on this site may be protected under United States and foreign copyright laws. In addition, some of the data may be the subject of patent applications or issued patents, and you may need to seek a license for its commercial use. NCI does not warrant or assume any legal liability or responsibility for the accuracy, completeness or usefulness of any information in this archive.

Contact:

Cedric Marchessoux ([email protected])

Conventional x-ray mammography data base

Website:

No official website, via Elizabeth Krupinski ([email protected])

Dataset:

Real masses, micro-calcifications and backgrounds from conventional x-ray mammography; BMP images with a resolution of 256x256.

Metadata:

None; signals can be extracted by subtraction between backgrounds alone and background+signal at 100% density.

Contextual info:

Environment conditions (calibration, scene...): X-ray system

Comments:

See examples:
1. Backgrounds,
2. Signals: masses
3. Signals: micro calcifications

Copyrights:

Via Elizabeth Krupinski ([email protected]); free, but credit should be given to them in case of publication.

Contact:

Elizabeth Krupinski ([email protected]) or Cedric Marchessoux ([email protected])

JSRT - Standard Digital Image Database (X-RAY)

Website:

Datasets are available here:

http://www.jsrt.or.jp/web_data/english03.html

Dataset:

Around 5 datasets of 250 images: chest x-rays, healthy and diseased with nodules. 2048x2048 pixels, white is zero, big endian.
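
As an illustration, a minimal NumPy sketch for reading one raw image as described above. The 12-bit value range (0..4095) assumed for the inversion is not stated in this document and may need adjustment.

import numpy as np

def read_jsrt(path, size=2048, max_value=4095):
    # 16-bit big-endian raw pixels, row-major, no header
    img = np.fromfile(path, dtype=">u2").reshape(size, size)
    return max_value - img   # invert so that higher values appear brighter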

Metadata:

Per image, clinical metadata in a txt file with patient information (age, sex), and images in itf format with nodule, cancer and infection positions.

Contextual info:

Environment conditions (calibration, scene...): X-ray system

Comments:

The dataset should be ordered by email with a Visa card number. The dataset is delivered by post after one week. The price per dataset is more than reasonable.

Copyrights:

For publication credit should be given by citing in references the following article:
o J. Shiraishi et al. Development of a Digital Image Database for Chest Radiographs with and without a Lung Nodule: Receiver Operating Characteristic Analysis of Radiologists, Detection of Pulmonary Nodules. AJR, 174(1):71-74, 2000.

Contact:

Cedric Marchessoux ([email protected])

CONSUMER APPLICATIONS

ICCV 2007 - Optical Flow Performance Evaluation

Website:

Dataset can be found here: http://vision.middlebury.edu/flow/data/

Dataset:

The datasets are composed of sets of images to evaluate optical flow.

Sets are made of 2 or 8 images, for evaluation in color or gray-level format.

Metadata:

Ground truth is not provided for all datasets.

Contextual info:

Flow accuracy and interpolation evaluation

We report two measures of flow accuracy (angular and end-point error) and two measures of interpolation quality. For each of the 4 measures we report 8 error metrics, resulting in a total of 32 tables. Links to the 4 measures are included below, but the tables are also linked among each other. At this point we do not identify a "default" measure or metric, and thus we do not provide an overall ranking of methods.

Comments:

The ground-truth flow is provided in a .flo format. Information and C++ code are provided in flow-code.zip, which contains the file README.txt. A Matlab version is also available in flow-code-matlab.zip.
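
For readers not using the provided C++ or Matlab code, a minimal Python reader is sketched below. The layout assumed here (a float32 sanity-check tag of 202021.25, int32 width and height, then interleaved little-endian (u, v) float32 pairs in row-major order) follows the README distributed in flow-code.zip.

import numpy as np

def read_flo(path):
    with open(path, "rb") as f:
        tag = np.fromfile(f, "<f4", 1)[0]
        assert tag == 202021.25, "not a valid .flo file"
        w = int(np.fromfile(f, "<i4", 1)[0])
        h = int(np.fromfile(f, "<i4", 1)[0])
        data = np.fromfile(f, "<f4", 2 * w * h)
    return data.reshape(h, w, 2)   # [y, x] -> (u, v); very large values mark unknown flow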

Copyrights:

Thanks to Brad Hiebert-Treuer and Alan Lim, who spent countless hours creating the hidden texture datasets.

Contact:

Basket-ball - APIDIS

Website:

Sequences are available here: http://www.apidis.org/Public/

This page gives access to the first acquisition campaign of basket ball data during the APIDIS European project.

Dataset:

The dataset is composed of a basket ball game.

  • Seven 2-Mpixel color cameras around and on top of a basket ball court

Note: Due to bandwidth limitations, only a part of the basket ball game is available from this web site. Please contact us (see Contact below) for more data.

Metadata:

  • Time stamp for each frame (all cameras are captured by a single server at ~22 fps)
  • Manually annotated basket ball events
  • Manually annotated objects positions
  • Calibration data

Contextual info:

All cameras are Arecont Vision AV2100M IP cameras; the datasheets can be downloaded from the manufacturer's site.
Lenses: the fish-eye lenses used for the top-view cameras are Fujinon FE185C086HA-1 lenses.

Comments:

Copyrights:

This dataset is available for non-commercial research in video signal processing only. We kindly ask you to mention the APIDIS project when using this dataset (in publications, video demonstrations...).

Contact:

christophe.devleeschouwer(at)uclouvain.be or Damien.Delannay(at)uclouvain.be

Freesound

Website:

Datasets are available here:

http://freesound.iua.upf.edu/

Dataset:

The Freesound Project is a collaborative database of Creative Commons licensed sounds. Freesound focuses only on sound, not songs.

Metadata:

Contextual info:

Comments:

Copyrights:

Creative Commons

Contact:

The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) Project

Website:

Datasets are available here:

http://www.music-ir.org/evaluation/

Dataset:

The objective of the International Music Information Retrieval Systems Evaluation Laboratory project (IMIRSEL) is the establishment of the necessary resources for the scientifically valid development and evaluation of emerging Music Information Retrieval (MIR) and Music Digital Library (MDL) techniques and technologies.

Metadata:

Contextual info:

Comments:

Copyrights:

Available on request

Contact:

Public domain

Website:

Datasets are available here:

http://www.publicdomaintorrents.com/ (BitTorrent link)

Dataset:

10 movies (from 1930-1950, some more recent), most are in color

Metadata:

The databases can be shared and are available on the internet. No annotation or ground-truth is currently available. It will be added when available.

Contextual info:

Comments:

Copyrights:

All now fall in the public domain.

Contact:

Sabri Boughorbel

Phillips Internal dataset

Website:

none

Dataset:

Metadata:

We can provide metadata such as shot and scene cuts, face and eye positions, identity, etc.

Contextual info:

Comments:

Copyrights:

Contact:

Sabri Boughorbel

RWC Music Database

Website:

Datasets are available here:

http://staff.aist.go.jp/m.goto/RWC-MDB/

Dataset:

The RWC (Real World Computing) Music Database is a copyright-cleared music database (DB) that is available to researchers as a common foundation for research.

Metadata:

MIDI files, genre, lyrics

Contextual info:

Comments:

Copyrights:

Users who have submitted the Pledge and received authorization may freely use the database for research purposes without facing the usual copyright restrictions, but all of the copyrights and neighboring rights connected with this database belong to the National Institute of Advanced Industrial Science and Technology and are managed by the RWC Music Database Administrator. Persons or organizations that have not submitted a Pledge and that have not received authorization may not use the database.

Contact:

CVBASE - 2006

Website:

Datasets are available here:

http://vision.fe.uni-lj.si/cvbase06/downloads.html

Dataset:

Video data (.avi, DivX compressed). The dataset includes three types of sports: European (team) handball (3 synchronized videos, 10 min, 25 FPS, 384x288, DivX 5 AVI), squash (2 videos from 2 separate matches, 25 FPS, 384x288, DivX AVI), and basketball (videos only: 2 synchronized overhead videos in 2 quality modes, 368x288 at 25 FPS, 5 minutes each, and 720x576 at 25 FPS, 2 minutes each).

Metadata:

Annotations (individual player actions, group activity), suitable for use as a gold standard. Trajectories (player positions in court and camera coordinate systems); these are not intended to be used as a gold standard, since their accuracy is not particularly high.

Contextual info:

Comments:

Copyrights:

nothing defined from website

Contact:

Xavier Desurmont, [email protected]

VSPETS - 2003 - INMOVE

Website:

Datasets are available here:

ftp://ftp.cs.rdg.ac.uk/pub/VS-PETS/

Dataset:

Outdoor people tracking - football data (three synchronised views). The dataset consists of football players moving around a pitch.

Metadata:

Tracking information on image plane for camera 3 can be downloaded. An AVI file of the ground truth for camera view 3 is also available.

Contextual info:

Comments:

Copyrights:

Free download from website

Contact:

Dimitrios Makris, [email protected]

Trictrac

Website:

http://www.multitel.be/trictrac/?mod=3

Dataset:

HD progressive images in JPEG for a synthetic video sequence of soccer.

Metadata:

XML (2D and 3D positions of objects and camera).

Contextual info:

no

Comments:

The dataset is fully described in "TRICTRAC Video Dataset: Public HDTV Synthetic Soccer Video Sequences With Ground Truth", X. Desurmont, J-B. Hayet, J-F. Delaigle, J. Piater, B. Macq, Workshop on Computer Vision Based Analysis in Sport Environments (CVBASE), 2006.

Copyrights:

All data is publicly available and downloadable. If you publish results using the data, please acknowledge the data as coming from the TRICTRAC project, found at URL: http://www.multitel.be/trictrac. THE DATASET IS PROVIDED WITHOUT WARRANTY OF ANY KIND.

Contact:

Xavier Desurmont, [email protected]

OTHERS

PETS - 2009

Website:

The datasets are available here:

http://www.cvg.rdg.ac.uk/PETS2009/

Dataset:

PETS 2009: Eleventh IEEE International Workshop on Performance Evaluation of Tracking and Surveillance.

One-day workshop organised in association with CVPR 2009, supported by the EU project SUBITO.

The datasets for PETS 2009 consider crowd image analysis and include crowd count and density estimation, tracking of individual(s) within a crowd, and detection of separate flows and specific crowd events.

The dataset is organised as follows:

  • Calibration Data
  • S0: Training Data
    • contains sets background, city center, regular flow
  • S1: Person Count and Density Estimation
    • contains sets L1,L2,L3
  • S2: People Tracking
    • contains sets L1,L2,L3
  • S3: Flow Analysis and Event Recognition
    • contains sets Event Recognition and Multiple Flow

Metadata:

Contextual info:

Comments:

Copyrights:

Please e-mail [email protected] if you require assistance obtaining these datasets for the workshop.

Contact:

[email protected]

IPPR : contest motion segmentation dataset

Website:

Datasets are available here:

http://media.ee.ntu.edu.tw/Archer_contest/

Dataset:

3 different contexts of walking persons.

Metadata:

Segmentation of person is provided.

Contextual info:

Comments:

Copyrights:

Contact:

GavabDB : 3D face database

Website:

Datasets are available here:

http://gavab.escet.urjc.es/recursos_en.html

Dataset:

GavabDB is a 3D face database. It contains 549 three-dimensional images of facial surfaces. These meshes correspond to 61 different individuals (45 male and 16 female), with 9 images per person. All of the individuals are Caucasian and their age is between 18 and 40 years. Each image is given by a mesh of connected 3D points of the facial surface without texture. The database provides systematic variations with respect to pose and facial expression. In particular, the 9 images corresponding to each individual are: 2 frontal views with neutral expression, 2 x-rotated views (±30°, looking up and looking down respectively) with neutral expression, 2 y-rotated views (±90°, left and right profiles respectively) with neutral expression, and 3 frontal gesture images (laugh, smile and a random gesture chosen by the user, respectively).

Metadata:

Contextual info:

Comments:

Copyrights:

Publications that use this database must reference the following work: A.B. Moreno and A. Sanchez, "GavabDB: A 3D Face Database", in C. Garcia et al. (eds), Proc. 2nd COST Workshop on Biometrics on the Internet: Fundamentals, Advances and Applications, Ed. Univ. Vigo, pp. 77-82, 2004.

Contact:

3D_RMA : 3D database

Website:

Datasets are available here:

http://www.sic.rma.ac.be/~beumier/DB/3d_rma.html

Dataset:

120 persons were asked to pose twice in front of the system: in November 97 (session 1) and in January 98 (session 2). For each session, 3 shots were recorded with different (but limited) orientations of the head: straight forward, left or right, upward or downward.

Among the 120 people, two thirds were students with the same ethnic origins and nearly the same age. The last third consisted of people from the academy, all aged between 20 and 60.

Different problems encountered in the cooperative scenario were taken into account. People sometimes wore their spectacles, sometimes didn't. Beards and moustaches were represented. Some people smiled in some shots. Small up/down and left/right rotations of the head were requested. We regret that only a few (14) women were available.

Metadata:

Contextual info:

Comments:

Copyrights:

Contact:

[email protected]

Actions as Space-Time Shapes

Website:

Datasets are available here:

http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html

Dataset:

Human action in video sequences can be seen as silhouettes of a moving torso and protruding limbs undergoing articulated motion. We regard human actions as three-dimensional shapes induced by the silhouettes in the space-time volume. We adopt a recent approach by Gorelick et al. for analyzing 2D shapes and generalize it to deal with volumetric space-time action shapes. Our method utilizes properties of the solution to the Poisson equation to extract space-time features such as local space-time saliency, action dynamics, shape structure and orientation. We show that these features are useful for action recognition, detection and clustering. The method is fast, does not require video alignment and is applicable in (but not limited to) many scenarios where the background is known. Moreover, we demonstrate the robustness of our method to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action and low quality video.
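
As a rough 2D illustration of the Poisson-equation idea (a sketch, not the authors' implementation): inside the silhouette one solves the Poisson equation (Laplacian of U equal to -1) with U = 0 on the background, for instance by Jacobi iteration. The space-time version described above replaces the 2D Laplacian with a 3D one over the (x, y, t) volume.

import numpy as np

def poisson_field(mask, iterations=500):
    # mask: boolean array, True inside the silhouette
    U = np.zeros(mask.shape, dtype=np.float64)
    for _ in range(iterations):
        neighbours = (np.roll(U, 1, 0) + np.roll(U, -1, 0) +
                      np.roll(U, 1, 1) + np.roll(U, -1, 1))
        U = np.where(mask, (neighbours + 1.0) / 4.0, 0.0)   # enforce U = 0 outside
    return U   # large values concentrate in the torso, small values in the limbs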

Metadata:

Contextual info:

Comments:

Copyrights:

Contact:

[email protected]

KTH - Recognition of human actions

Website:

Datasets are available here:

http://www.nada.kth.se/cvap/actions/

Dataset:

The video database contains six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 25 subjects in four different scenarios: outdoors (s1), outdoors with scale variation (s2), outdoors with different clothes (s3) and indoors (s4). Currently the database contains 2391 sequences. All sequences were taken over homogeneous backgrounds with a static camera at a 25 fps frame rate. The sequences were downsampled to a spatial resolution of 160x120 pixels and have an average length of four seconds.

Metadata:

Contextual info:

Comments:

Copyrights:

Contact:

laptev(at)nada.kth.se

PLIA2

Website:

Datasets are available here:

http://architecture.mit.edu/house_n/data/PlaceLab/PLIA2.htm

Dataset:

The researcher was asked to perform a set of common household activities during the four-hour period using a set of instructions. Activities included the following: preparing a recipe, doing a load of dishes, cleaning the kitchen, doing laundry, making the bed, and light cleaning around the apartment. The volunteer determined the sequence, pace, and concurrency of these activities and also integrated additional household tasks. Our intent was to have a short test dataset of a manageable size that could be easily placed on the web without concerns about anonymity. We wanted this test dataset, however, to show a variety of activity types and activate as many sensors as possible, but in a natural way. In addition to the activities above, the researcher searches for items, uses appliances, talks on the phone, answers email, and performs other everyday tasks. The researcher wore five mobile accelerometers (one on each limb and one on the hip) and a Polar M32 wireless heart rate monitor. The researcher carried an SMT 5600 mobile phone running experience sampling software that beeped and presented a set of questions about her activities.

Metadata:

The dataset includes four hours of partially (and soon to be fully) annotated video. The annotation was done using custom annotation software called HandLense, written by Randy Rockinson and Leevar Williams of MIT House_n. This software is available for researchers to use to study this dataset.

The annotations include descriptors for body posture, type of activity, location, and social context.

Contextual info:

Comments:

Copyrights:

Contact:

MuHAVi: Multicamera Human Action Video Data

Website:

Datasets are available here:

http://dipersec.king.ac.uk/MuHAVi-MAS/

Dataset:

MuHAVi is a large body of human action video data collected using 8 cameras. There are 17 action classes performed by 14 actors. So far, videos corresponding to 7 actors have been processed in order to split the actions and provide the JPG image frames. However, some image frames before and after the actual action are included, for the purpose of background subtraction, tracking, etc. The longest pre-action frames correspond to the actor called Person1. Each actor performs each action several times in the action zone highlighted using white tape on the scene floor. As the actors were amateurs, the leader had to interrupt them in some cases and ask them to redo the action for consistency. 8 CCTV Schwan cameras were used, located at the 4 sides and 4 corners of a rectangular platform. Note that these cameras are not necessarily synchronised. We are working on improving the synchronisation between the images corresponding to different cameras.

Metadata:

Calibration information may be included here in the future. Meanwhile, one can use the patterns on the scene floor to calibrate the cameras of interest.
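
For illustration, a minimal OpenCV sketch of such a floor-pattern calibration: map the image coordinates of the taped action-zone corners to floor-plane coordinates with a homography. The point values below are hypothetical placeholders; the real ones must be measured in the actual frames.

import cv2
import numpy as np

# hypothetical pixel positions of the four tape corners in one camera view
image_pts = np.float32([[312, 410], [605, 395], [640, 225], [340, 230]])
# hypothetical floor coordinates of the same corners, in metres
floor_pts = np.float32([[0, 0], [4.0, 0], [4.0, 3.0], [0, 3.0]])

H, _ = cv2.findHomography(image_pts, floor_pts)

def image_to_floor(u, v):
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]   # floor-plane position in metres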

Contextual info:

Comments:

Copyrights:

Contact:

[email protected]

ViHASi: Virtual Human Action Silhouette Data

Website:

Datasets are available here:

http://dipersec.king.ac.uk/VIHASI/

Dataset:

This dataset provides a large body of synthetic video data generated for the purpose of evaluating different silhouette-based algorithms for human action recognition. The data consist of 20 action classes, 9 actors and up to 40 synchronised perspective camera views. It is well known that for action recognition algorithms which are purely based on human body masks, where other image properties such as colour and intensity are not used, it is important to obtain accurate silhouette data from video frames. This problem is not usually considered part of action recognition, but a lower-level problem in motion tracking and change detection. Hence, for researchers working on the recognition side, access to reliable Virtual Human Action Silhouette (ViHASi) data seems to be both a necessity and a relief, since such data provide a way of comprehensive experimentation and evaluation of the methods under study, which might even lead to their improvement.

Metadata:

Contextual info:

Comments:

Copyrights:

Contact:

[email protected]

Daimler - Pedestrian Dataset

Website:

Datasets are available here:

http://www.gavrila.net/Computer_Vision/Research/Pedestrian_Detection/DC_Pedestrian_Class__Benchmark/dc_pedestrian_class__benchmark.html

Dataset:

The dataset contains a collection of pedestrian and non-pedestrian images. It is made available for download on this site for benchmarking purposes, in order to advance research on pedestrian classification.

The dataset consists of two parts:

  • base data set. The base data set contains a total of 4000 pedestrian and 5000 non-pedestrian samples cut out from video images and scaled to a common size of 18x36 pixels. This data set has been used in Section VII-A of the paper referenced above.

    Pedestrian images were obtained by manually labeling and extracting the rectangular positions of pedestrians in video images. The video images were recorded at various (daytime) hours and locations with no particular constraints on pedestrian pose or clothing, except that pedestrians are standing upright and are fully visible. As non-pedestrian images, we used patterns representative of typical preprocessing steps within a pedestrian classification application, extracted from video images known not to contain any pedestrians: a shape-based pedestrian detector matching a given set of pedestrian shape templates to distance-transformed edge images, run with a comparatively relaxed matching threshold (see the sketch after this list).

  • additional non-pedestrian images. An additional collection of 1200 video images NOT containing any pedestrians, intended for the extraction of additional negative training examples. Section V of the paper referenced above describes two methods for increasing the training sample size from these images, and Section VII-B lists experimental results.
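The shape-based detector mentioned above is a chamfer-style matcher: template edge pixels are scored against a distance-transformed edge image, and a relaxed threshold keeps more candidates. A minimal sketch, with placeholder image and template names:

    # Minimal chamfer-matching sketch in the spirit of the shape-based
    # detector described above. "scene.png" and the template are placeholders.
    import cv2
    import numpy as np

    image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(image, 80, 160)

    # Distance transform of the *non-edge* pixels: each pixel holds the
    # distance to the nearest edge.
    dist = cv2.distanceTransform(255 - edges, cv2.DIST_L2, 3)

    template = cv2.imread("pedestrian_shape_template.png", cv2.IMREAD_GRAYSCALE)
    ty, tx = np.nonzero(template > 0)   # template edge pixels

    def chamfer_score(x, y):
        """Mean distance from template edges (placed at x, y) to image edges.
        Lower is better. Assumes the template fits inside the image at (x, y)."""
        return float(dist[ty + y, tx + x].mean())

    print(chamfer_score(10, 20))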

Metadata:

Contextual info:

Comments:

Copyrights:

This dataset is made available to the scientific community for non-commercial research purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use, copy, and distribute the data given.

Contact:

gavrila(at)science.uva.nl

TERRASCOPE

Website:

Datasets are available here:

http://www.metaverselab.org/datasets/terrascope/

Dataset:

The dataset was captured by nine different cameras deployed over several rooms and a hallway in a "laboratory/office" setting. Several different scenarios were collected from the cameras. A two-minute sequence was captured of researchers, staff and visitors going about their daily activities. In addition, three different scenarios were scripted so that particular behaviors were exhibited in the data.

During data collection, all cameras wrote raw (uncompressed) data at a resolution of 640x480. All machine clocks were synchronized via NTP. A timestamp was also recorded for each frame so that frames can be associated with one another across cameras.
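With NTP-synchronised clocks and per-frame timestamps, cross-camera association reduces to a nearest-timestamp lookup. A minimal sketch; the (timestamp, frame_id) list is a placeholder, not real data:

    # Sketch: associate a frame from one camera with the nearest-in-time
    # frame from another camera, using the synchronised timestamps.
    import bisect

    cam_b = [(0.000, 0), (0.033, 1), (0.066, 2), (0.100, 3)]  # sorted by time
    times_b = [t for t, _ in cam_b]

    def nearest_frame(t):
        """Return the index of the cam_b frame whose timestamp is closest to t."""
        i = bisect.bisect_left(times_b, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(cam_b)]
        return min(candidates, key=lambda j: abs(times_b[j] - t))

    print(cam_b[nearest_frame(0.050)])  # -> (0.066, 2)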

Selected Ground Truth (102 MB) - frames with hand-marked labels of individuals and objects

Scenario 1 (11.8 GB) - “Group Meeting”

Scenario 2 (11.2 GB) - “Group Exit and Intruder”

Scenario 3 (17.4 GB) - “Suspicious Behavior/Theft”

Unscripted Activities (59.6 GB) - natural behavior and activities

Subject Face/Gait Database (101 MB) - face pictures and video of subjects walking in front of the camera

Metadata:

Extensive ground truth is also provided: entrance and exit times for individuals in each camera, foreground segmentation, and activity labels are all part of the dataset.

Contextual info:

Comments:

Copyrights:

Public datasets

Contact:

OTCBVS Benchmark Dataset Collection

Website:

Datasets are available here:

http://www.cse.ohio-state.edu/otcbvs-bench/

Dataset:

This is a publicly available benchmark dataset for testing and evaluating novel and state-of-the-art computer vision algorithms. Several researchers and students have requested a benchmark of non-visible (e.g., infrared) images and videos. The benchmark contains videos and images recorded in and beyond the visible spectrum and is available free of charge to all researchers in the international computer vision community. It also allows a large spectrum of IEEE and SPIE vision conference and workshop participants to explore the benefits of the non-visible spectrum in real-world applications, contribute to the OTCBVS workshop series, and boost this research field significantly.

There are 7 datasets:

1) Dataset 01: OSU Thermal Pedestrian Database

2) Dataset 02: IRIS Thermal/Visible Face Database

3) Dataset 03: OSU Color-Thermal Database

4) Dataset 04: Terravic Facial IR Database

5) Dataset 05: Terravic Motion IR Database

6) Dataset 06: Terravic Weapon IR Database

7) Dataset 07: CBSR NIR Face Dataset

Metadata:

Contextual info:

Comments:

Copyrights:

Register (name, institution, email) to download the datasets.

Contact:

[email protected]

Eyes and faces dataset

Website:

Datasets are available here:

http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html

http://www.multitel.be/~va/cantata/EyesAndFaces/index.html

Dataset:

This is the eye ground truth, in ViPER format, for the YaleB face database, which contains 5760 single-light-source images of 10 subjects, each seen under 576 viewing conditions (9 poses x 64 illumination conditions), plus 650 ViPER files. The ground truth was developed by BARCO in the context of the CANTATA project.

Metadata:

All the images are annotated with ViPER XML files. Each ".bmp" image is associated with a ".xml" annotation file of the same name, containing the iris positions. Each position is marked as a cross. The path of the bmp image should be changed in the ViPER file to match your local copy.
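One low-risk way to retarget the ViPER files is a plain text substitution on the stored path, which avoids depending on the exact ViPER schema. Both path prefixes and the directory name below are made-up examples:

    # Sketch: point every ViPER XML file at your local copy of the images.
    # OLD_PREFIX is whatever path the shipped files contain (placeholder here).
    import glob

    OLD_PREFIX = "C:/original/path/to/yaleB"
    NEW_PREFIX = "/home/me/data/yaleB"

    for xml_path in glob.glob("yaleB_groundtruth/*.xml"):
        with open(xml_path, encoding="utf-8") as f:
            text = f.read()
        with open(xml_path, "w", encoding="utf-8") as f:
            f.write(text.replace(OLD_PREFIX, NEW_PREFIX))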

Contextual info:

For every subject in a particular pose, an image with ambient (background) illumination was also captured. Hence, the total number of images is in fact 5760+90=5850. The total size of the compressed database is about 1GB.

Comments:

The dataset already existed without the ground truth in ViPER format. The ground truth was either generated in, or converted to, ViPER format in the context of the CANTATA project. The metadata were generated by Arnaud Joubel.

Copyrights:

Dataset YaleB: You are free to use the Yale Face Database B for research purposes. If experimental results are obtained using images from within the database, all publications of these results should acknowledge the use of the "Yale Face Database B" and reference "Georghiades, A.S., Belhumeur, P.N. and Kriegman, D.J., From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose, IEEE Trans. Pattern Anal. Mach. Intelligence, 2001, vol. 23, no. 6, pp. 643-660".

Ground truth in Viper: Requested citation acknowledgment about the ground truth:
Courtesy of ITEA2 funded Cantata project

Contact:

Quentin Besnehard, [email protected] or Cedric Marchessoux, [email protected]

Anti Aliased Text Dataset

Website:

Datasets are available here:

http://www.multitel.be/~va/cantata/AntiAliased/index.html

Dataset:

A set of bitmap images containing anti-aliased text, developed by BARCO in the context of the CANTATA project. The archive contains 2400 images.

Metadata:

All the images are annotated with Viper XML files. Each “.bmp” image is associated with a “.grid.xml” annotation file with the same name. The annotation takes the form of a grid of 32x32 pixels bounding boxes. The path of the bmp image should be changed in the viper file if you want to open it in viper-gt.
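Since the annotation is a regular 32x32 grid, the cells can also be reconstructed directly from the image size rather than parsed out of the XML. A sketch with a placeholder file name:

    # Sketch: cut an annotated bitmap into the 32x32-pixel grid cells that
    # the ".grid.xml" bounding boxes describe. "text_sample.bmp" is a
    # placeholder file name.
    from PIL import Image

    img = Image.open("text_sample.bmp")
    w, h = img.size
    cells = [
        img.crop((x, y, x + 32, y + 32))
        for y in range(0, h - 31, 32)
        for x in range(0, w - 31, 32)
    ]
    print(f"{len(cells)} cells of 32x32 pixels")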

Contextual info:

The text is rendered in different color combinations: black on white, white on black, random dark color on white, white on random dark color, black on random light color, random light color on white, random dark color on random light color and, finally, random light color on random dark color.

Comments:

The dataset and the ground truth were generated by Quentin Besnehard and Arnaud Joubel. To obtain the complete dataset, send an e-mail to the contact person.

Copyrights:

The fonts used are available under the GNU General Public License version 2.0. These fonts are free clones of the original fonts provided by URW typeface foundry.

Requested citation acknowledgment about the dataset and the ground truth : Courtesy of ITEA2 funded Cantata project.

Contact:

Quentin Besnehard, [email protected] or Cedric Marchessoux, [email protected]

Aliased Text Dataset

Website:

Datasets are available here:

http://www.multitel.be/~va/cantata/Aliased

Dataset:

A set of bitmap images containing aliased text (2 colors), developed by BARCO in the context of the CANTATA project. The archive contains 1250 images.

Metadata:

All the images are annotated with Viper XML files. Each “.bmp” image is associated with a “.grid.xml” annotation file with the same name. The annotation takes the form of a grid of 32x32 pixels bounding boxes. The path of the bmp image should be changed in the viper file if you want to open it in viper-gt.

Contextual info:

The text is represented in different colors: black on white, white on black, random dark color on white, white on random dark color, black on random light color, random light color on white, random dark color on random light color and, finally, random light color on random dark color. Fonts used (from 7 to 42 points):

  • Helvetica
  • Optima
  • AvantGarde
  • Times
  • Palatino
  • Courier
  • Century

Comments:

The dataset and the ground truth were generated by Quentin Besnehard and Cédric Marchessoux.

Copyrights:

The fonts used are available under the GNU General Public License version 2.0. These fonts are free clones of the original fonts provided by URW typeface foundry. Requested citation acknowledgment about the data set and the ground truth: Courtesy of ITEA2 funded Cantata project

Contact:

Quentin Besnehard, [email protected]; C?dric Marchessoux, [email protected]

PETS - ICVS - 2003 - FGnet

Website:

Datasets are available here:

http://www.cvg.cs.rdg.ac.uk/PETS-ICVS/pets-icvs-db.html

Dataset:

A smart-meeting dataset that includes facial expressions, gaze and gesture/action. The environment consists of three cameras: one mounted on each of two opposing walls, and an omnidirectional camera positioned at the centre of the room. The dataset consists of four scenarios.

Metadata:

a) Eye positions of people in Scenarios A, B and D (every 10th frame is annotated).

b) Facial expression and gaze estimation for Scenarios A and D, Cameras 1-2.

c) Gesture/action annotations for Scenarios B and D, Cameras 1-2.
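Since only every 10th frame carries eye annotations, per-frame positions can be approximated by linear interpolation between annotated frames, a common way to densify sparse ground truth. A sketch with illustrative coordinates, not values from the actual ground-truth files:

    # Sketch: densify the every-10th-frame eye annotations by linear
    # interpolation. The coordinates below are placeholders.
    import numpy as np

    annotated = {0: (120.0, 85.0), 10: (124.0, 88.0), 20: (131.0, 90.0)}

    frames = sorted(annotated)
    xs = np.interp(range(frames[-1] + 1), frames, [annotated[f][0] for f in frames])
    ys = np.interp(range(frames[-1] + 1), frames, [annotated[f][1] for f in frames])

    print(xs[5], ys[5])  # estimated eye position at frame 5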

Contextual info:

Camera calibration is provided.

Comments:

Copyrights:

Free download

Contact:

Dimitrios Makris, [email protected]

RESOURCES AND LINKS

Medical datasets

Datasets are available here:

http://gdcm.sourceforge.net/wiki/index.php/Sample_DataSet#DataSet

This website contains multiple links to medical datasets.

TRECVID

The TRECVID conference series is sponsored by the National Institute of Standards and Technology (NIST) with additional support from other U.S. government agencies. The goal of the conference series is to encourage research in information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. In 2001 and 2002 the TREC series sponsored a video "track" devoted to research in automatic segmentation, indexing, and content-based retrieval of digital video. Beginning in 2003, this track became an independent evaluation (TRECVID) with a 2-day workshop taking place just before TREC.

Datasets are described here.

Image Datasets

Datasets are available here:

http://www.cs.bu.edu/groups/ivc/data.php

It contains various datasets like:

  • Image database used in shape-based retrieval experiments
  • Image databases used in deformable shape-based segmentation and retrieval experiments
  • Over 70 video sequences and ground truth used in evaluation of 3D head tracking
  • Labeled video sequences used as ground truth in skin color segmentation experiments
  • Hand image database with ground truth
  • Dynamic background sequences

Half-Life 2 mods

www.hl2mods.co.uk

More mods for the game engine.

Scenario game

A mod created by students in Toronto. It is a complete game, but maps can be used with the OVVV.

www.torontoconflict.com

The USC-SIPI Image Database

The USC-SIPI image database is a collection of digitized images. It is maintained primarily to support research in image processing, image analysis, and machine vision. The first edition of the USC-SIPI image database was distributed in 1977 and many new images have been added since then.

The database is divided into volumes based on the basic character of the pictures. Images in each volume are of various sizes such as 256x256, 512x512, or 1024x1024 pixels. All images are 8 bits/pixel for black-and-white images and 24 bits/pixel for color images. The following volumes are currently available:

  • Textures: Brodatz textures, texture mosaics, etc.
  • Aerials: high-altitude aerial images
  • Miscellaneous: Lena, the mandrill, and other favorites
  • Sequences: moving head, fly-overs, moving vehicle

http://sipi.usc.edu/database

Computer vision test images

http://www-2.cs.cmu.edu/~cil/v-images.html

CVonline: The Evolving, Distributed, Non-Proprietary, On-Line Compendium of Computer Vision

http://homepages.inf.ed.ac.uk/rbf/CVonline/
