The Latest Super-Simple Guide to torchvision

torchvision

https://pytorch.org/docs/stable/torchvision/index.html#module-torchvision

The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

torchvision.get_image_backend(): Gets the name of the package used to load images.

torchvision.set_image_backend(backend): Specifies the package used to load images.

torchvision.set_video_backend(backend): Specifies the package used to decode videos.

Datasets available in torchvision.datasets: MNIST, Fashion-MNIST, KMNIST, EMNIST, QMNIST, FakeData, COCO, LSUN, ImageFolder, DatasetFolder, ImageNet, CIFAR, STL10, SVHN, PhotoTour, SBU, Flickr, VOC, Cityscapes, SBD, USPS, Kinetics-400, HMDB51, UCF101.

Video

torchvision.io.read_video(filename, start_pts=0, end_pts=None, pts_unit='pts')

Reads a video from a file, returning both the video frames as well as the audio frames.

Classification

The models subpackage contains definitions for the following model architectures for image classification:

AlexNet

VGG

ResNet

SqueezeNet

DenseNet

Inception v3

GoogLeNet

ShuffleNet v2

MobileNet v2

ResNeXt

Wide ResNet

MNASNet

You can construct a model with random weights by calling its constructor:

import torchvision.models as models

resnet18 = models.resnet18()

alexnet = models.alexnet()

vgg16 = models.vgg16()

squeezenet = models.squeezenet1_0()

densenet = models.densenet161()

inception = models.inception_v3()

googlenet = models.googlenet()

shufflenet = models.shufflenet_v2_x1_0()

mobilenet = models.mobilenet_v2()

resnext50_32x4d = models.resnext50_32x4d()

wide_resnet50_2 = models.wide_resnet50_2()

mnasnet = models.mnasnet1_0()

torchvision also provides pre-trained models, loaded through torch.utils.model_zoo. These can be constructed by passing pretrained=True:

import torchvision.models as models

resnet18 = models.resnet18(pretrained=True)

alexnet = models.alexnet(pretrained=True)

squeezenet = models.squeezenet1_0(pretrained=True)

vgg16 = models.vgg16(pretrained=True)

densenet = models.densenet161(pretrained=True)

inception = models.inception_v3(pretrained=True)

googlenet = models.googlenet(pretrained=True)

shufflenet = models.shufflenet_v2_x1_0(pretrained=True)

mobilenet = models.mobilenet_v2(pretrained=True)

resnext50_32x4d = models.resnext50_32x4d(pretrained=True)

wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)

mnasnet = models.mnasnet1_0(pretrained=True)

Instantiating a pre-trained model downloads its weights to a cache directory. This directory can be set using the TORCH_MODEL_ZOO environment variable. See torch.utils.model_zoo.load_url() for details.

Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use model.train() or model.eval() as appropriate. See train() or eval() for details.

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

 

Semantic Segmentation

The models subpackage contains definitions for the following model architectures for semantic segmentation:

FCN ResNet101

DeepLabV3 ResNet101

As with image classification models, all pre-trained models expect input images normalized in the same way. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. They have been trained on images resized such that their minimum size is 520.

The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset. You can see more information on how the subset has been selected in references/segmentation/coco_utils.py. The classes that the pre-trained model outputs are the following, in order:

['__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
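The class order matters when turning model output into labels. The segmentation models return a dict whose 'out' entry has shape (N, 21, H, W). The sketch below uses a hand-made tensor of that shape in place of a real model output, so it runs without downloading weights; VOC_CLASSES is just a local name for the list above.

```python
import torch

VOC_CLASSES = [
    '__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
    'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
    'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]

# Stand-in for model(batch)['out']: one 4x4 "image" with 21 class scores per pixel.
logits = torch.zeros(1, len(VOC_CLASSES), 4, 4)
logits[0, VOC_CLASSES.index('cat'), :2, :] = 5.0   # top half scores highest for 'cat'
logits[0, 0, 2:, :] = 5.0                          # bottom half scores highest for background

# Per-pixel class index: the highest-scoring channel at each location.
class_map = logits.argmax(dim=1)                   # shape (N, H, W)

# Names of the classes that appear anywhere in the prediction.
present = [VOC_CLASSES[i] for i in torch.unique(class_map).tolist()]
print(present)
```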

 

Object Detection, Instance Segmentation and Person Keypoint Detection

The models subpackage contains definitions for the following model architectures for detection:

Faster R-CNN ResNet-50 FPN

Mask R-CNN ResNet-50 FPN

The pre-trained models for detection, instance segmentation and keypoint detection are initialized with the classification models in torchvision.

The models expect a list of Tensor[C, H, W], in the range 0-1. The models internally resize the images so that they have a minimum size of 800. This option can be changed by passing the option min_size to the constructor of the models.

For object detection and instance segmentation, the pre-trained models return the predictions of the following classes:

COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
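In evaluation mode the detection models return one dict per image with 'boxes', 'labels' and 'scores'. The sketch below shows one way to turn such a dict into readable results. The prediction is hand-made rather than produced by a model, filter_detections is a hypothetical helper, and names holds only the first four entries of the category list to keep the example short.

```python
import torch

def filter_detections(prediction, names, score_threshold=0.5):
    """Keep detections scoring at least score_threshold and attach class names."""
    keep = prediction["scores"] >= score_threshold
    labels = prediction["labels"][keep].tolist()
    scores = prediction["scores"][keep].tolist()
    boxes = prediction["boxes"][keep].tolist()
    return [(names[label], score, box)
            for label, score, box in zip(labels, scores, boxes)]

# First entries of COCO_INSTANCE_CATEGORY_NAMES (shortened for the example).
names = ['__background__', 'person', 'bicycle', 'car']

# Hand-made stand-in for one element of the model's output list.
prediction = {
    "boxes": torch.tensor([[10.0, 20.0, 100.0, 200.0],
                           [5.0, 5.0, 50.0, 50.0],
                           [0.0, 0.0, 30.0, 30.0]]),   # (x1, y1, x2, y2)
    "labels": torch.tensor([1, 3, 2]),
    "scores": torch.tensor([0.98, 0.75, 0.20]),
}

detections = filter_detections(prediction, names)
for name, score, box in detections:
    print(name, round(score, 2), box)
```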

For person keypoint detection, the pre-trained model returns the keypoints in the following order:

COCO_PERSON_KEYPOINT_NAMES = [
    'nose',
    'left_eye',
    'right_eye',
    'left_ear',
    'right_ear',
    'left_shoulder',
    'right_shoulder',
    'left_elbow',
    'right_elbow',
    'left_wrist',
    'right_wrist',
    'left_hip',
    'right_hip',
    'left_knee',
    'right_knee',
    'left_ankle',
    'right_ankle'
]
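The keypoint model reports, for each detected person, a tensor of shape (17, 3) holding x, y and a visibility flag per keypoint in the order above. The sketch below pairs such a tensor with the names. The tensor is hand-made and name_keypoints is a hypothetical helper.

```python
import torch

COCO_PERSON_KEYPOINT_NAMES = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]

def name_keypoints(keypoints):
    """Map one person's (17, 3) keypoint tensor to {name: (x, y, visible)}."""
    return {
        name: (x, y, bool(v))
        for name, (x, y, v) in zip(COCO_PERSON_KEYPOINT_NAMES, keypoints.tolist())
    }

# Hand-made stand-in for prediction['keypoints'][i]: only the nose is set.
keypoints = torch.zeros(17, 3)
keypoints[0] = torch.tensor([50.0, 40.0, 1.0])

named = name_keypoints(keypoints)
print(named['nose'])
```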

 

Video classification

We provide models for action recognition pre-trained on Kinetics-400. They have all been trained with the scripts provided in references/video_classification.

All pre-trained models expect input videos normalized in the same way, i.e. mini-batches of 3-channel RGB videos of shape (3 x T x H x W), where H and W are expected to be 112, and T is the number of video frames in a clip. The frames have to be loaded into a range of [0, 1] and then normalized using mean = [0.43216, 0.394666, 0.37645] and std = [0.22803, 0.22145, 0.216989].

NOTE

The normalization parameters are different from the image classification ones, and correspond to the mean and std from Kinetics-400.

NOTE

For now, normalization code can be found in references/video_classification/transforms.py; see the Normalize function there. Note that it differs from standard normalization for images because it assumes the video is 4d.
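A minimal sketch of channel-wise normalization for a 4d clip is shown below. It is not the code from references/video_classification/transforms.py; normalize_video is a hypothetical helper that broadcasts the per-channel mean and std over the T, H and W dimensions.

```python
import torch

KINETICS_MEAN = [0.43216, 0.394666, 0.37645]
KINETICS_STD = [0.22803, 0.22145, 0.216989]

def normalize_video(clip, mean, std):
    """Normalize a float clip of shape (C, T, H, W) channel by channel."""
    # Reshape to (C, 1, 1, 1) so the values broadcast over T, H and W.
    mean = torch.as_tensor(mean, dtype=clip.dtype).view(-1, 1, 1, 1)
    std = torch.as_tensor(std, dtype=clip.dtype).view(-1, 1, 1, 1)
    return (clip - mean) / std

# Random stand-in for a 16-frame 112x112 RGB clip with values in [0, 1].
clip = torch.rand(3, 16, 112, 112)
normalized = normalize_video(clip, KINETICS_MEAN, KINETICS_STD)
print(normalized.shape)
```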

Kinetics 1-crop accuracies for clip length 16 (16x112x112):

Network          Clip acc@1    Clip acc@5
ResNet 3D 18     52.75         75.45
ResNet MC 18     53.90         76.29
ResNet (2+1)D    57.50         78.81

torchvision.ops implements operators that are specific for Computer Vision.

Supported operators:

torchvision.ops.nms(boxes, scores, iou_threshold): Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).

torchvision.ops.roi_align(input, boxes, output_size, spatial_scale=1.0, sampling_ratio=-1): Performs the Region of Interest (RoI) Align operator described in Mask R-CNN.

torchvision.ops.roi_pool(input, boxes, output_size, spatial_scale=1.0): Performs the Region of Interest (RoI) Pool operator described in Fast R-CNN.

torchvision.utils.make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0): Makes a grid of images.

torchvision.utils.save_image(tensor, fp, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0, format=None): Saves a given Tensor into an image file.

Original article: https://www.cnblogs.com/jeshy/p/12048463.html

Time: 2024-10-07 17:38:21
