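In OpenStack, taking a snapshot of a virtual machine actually means saving the VM's disk as a new image, so the operation is essentially image creation. It can be triggered from the Dashboard or from the command line, both of which call the same API. The basic flow is:

1. Obtain a token (token API)

2. Query the VM status (query API)

3. Create the VM snapshot

A snapshot can be started from the OpenStack Dashboard or with the nova CLI. The command format is: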

nova image-create {server} {name}

The following command snapshots the instance with id=814a8ad8-9217-4c45-91c7-c2be2016e5da and names the snapshot snapshot1:

nova image-create 814a8ad8-9217-4c45-91c7-c2be2016e5da snapshot1

The corresponding API can also be called with curl:

curl -i http://186.100.8.214:8774/v2/814a8ad8-9217-4c45-91c7-c2be2016e5da/servers/6c2504f4-efa-47ec-b6f4-06a9fde8a00b/action \
    -X POST \
    -H "X-Auth-Project-Id: admin" \
    -H "User-Agent: python-novaclient" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    -H "X-Auth-Token: " \
    -d '{"createImage": {"name": "snapshot1", "metadata": {}}}'

As the request body shows, the action keyword that actually creates the snapshot is "createImage".
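The same request can be assembled in Python. The sketch below only builds the URL, headers and JSON body for the createImage action and sends nothing over the network. The helper name build_create_image_request and the placeholder endpoint, project, server and token values are assumptions made for illustration.

```python
import json


def build_create_image_request(endpoint, project_id, server_id, token,
                               name, metadata=None):
    """Return (url, headers, body) for a createImage server action.

    Hypothetical helper: it prepares the request but does not send it.
    """
    url = "%s/v2/%s/servers/%s/action" % (
        endpoint.rstrip("/"), project_id, server_id)
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "X-Auth-Token": token,
    }
    # The action name is the single top-level key of the body.
    body = json.dumps(
        {"createImage": {"name": name, "metadata": metadata or {}}})
    return url, headers, body


if __name__ == "__main__":
    # Placeholder values, not taken from a real deployment.
    url, headers, body = build_create_image_request(
        "http://nova.example:8774", "PROJECT_ID", "SERVER_ID",
        "TOKEN", "snapshot1")
    print(url)
    print(body)
```

The returned triple can be handed to any HTTP client; keeping the construction separate from the transport makes the body easy to inspect.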

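Offline snapshot of an instance booted from an image

1.1. nova-api

From the API exposed by nova-api and the curl call used above, it is easy to see that the snapshot entry point is ServersController._action_create_image in nova/api/openstack/compute/servers.py. The code is as follows: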

@wsgi.response(202)
@extensions.expected_errors((400, 403, 404, 409))
@wsgi.action('createImage')
@common.check_snapshots_enabled
@validation.schema(schema_servers.create_image, '2.0', '2.0')
@validation.schema(schema_servers.create_image, '2.1')
def _action_create_image(self, req, id, body):
    """Snapshot a server instance.

    Input parameters in this example:
    req = Request object carrying the context of this request
    id = 814a8ad8-9217-4c45-91c7-c2be2016e5da
    body = {u'createImage': {u'name': u'snapshot1', u'metadata': {}}}
    """
    # Get the request context and run the policy check
    context = req.environ['nova.context']
    context.can(server_policies.SERVERS % 'create_image')

    # Read the snapshot name and related properties from the body
    entity = body["createImage"]
    image_name = common.normalize_name(entity["name"])
    metadata = entity.get('metadata', {})
    snapshot_id = entity.get("snapshot_id", None)

    # Starting from microversion 2.39 we don't check quotas on createImage
    if api_version_request.is_supported(
            req, max_version=
            api_version_request.MAX_IMAGE_META_PROXY_API_VERSION):
        common.check_img_metadata_properties_quota(context, metadata)

    # Load the instance from the nova database, including metadata,
    # system_metadata, security_groups, info_cache, flavor and pci_devices,
    # and return an Instance V2 object
    instance = self._get_server(context, req, id)

    snapshot = snapshot_current(context, instance, self.compute_rpcapi)
    if snapshot:
        # If there are snapshots, create an image from the snapshot.
        if not snapshot_id:
            snapshot_id = snapshot["id"]
        image = snapshot_create_image(context, snapshot_id, instance,
                                      self.compute_rpcapi, entity)
    else:
        # Load all block devices attached to the instance from the
        # database; returns a BlockDeviceMappingList object
        bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
            context, instance.uuid)
        try:
            # Check whether the root disk is a volume; if it is, the
            # instance was booted from a volume
            if compute_utils.is_volume_backed_instance(context, instance,
                                                       bdms):
                context.can(server_policies.SERVERS %
                            'create_image:allow_volume_backed')
                image = self.compute_api.snapshot_volume_backed(
                    context,
                    instance,
                    image_name,
                    extra_properties=metadata)
            else:
                # An image-booted instance takes this branch, which calls
                # API.snapshot in nova/compute/api.py
                image = self.compute_api.snapshot(context, instance,
                                                  image_name,
                                                  extra_properties=metadata)
        except exception.InstanceUnknownCell as e:
            raise exc.HTTPNotFound(explanation=e.format_message())
        except exception.InstanceInvalidState as state_error:
            common.raise_http_conflict_for_instance_invalid_state(
                state_error, 'createImage', id)
        except exception.Invalid as err:
            raise exc.HTTPBadRequest(explanation=err.format_message())
        except exception.OverQuota as e:
            raise exc.HTTPForbidden(explanation=e.format_message())

    # Starting with microversion 2.45 we return a response body containing
    # the snapshot image id without the Location header.
    if api_version_request.is_supported(req, '2.45'):
        return {'image_id': image['id']}

    # build location of newly-created image entity
    image_id = str(image['id'])

    # Build the image URL from the glance configuration; in this example:
    # http://$glance_host:$glance_port/images/ffb841fd-d5f8-4146-bb29-b12eb5bbf6b2
    image_ref = glance.generate_image_url(image_id)

    resp = webob.Response(status_int=202)
    resp.headers['Location'] = image_ref
    return resp
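The microversion handling at the end of the handler can be illustrated in isolation. This is a minimal sketch, not nova code: build_snapshot_response, parse_microversion and the tuple-based comparison are assumptions, and the image URL prefix is a placeholder.

```python
def parse_microversion(text):
    """Turn '2.45' into (2, 45) so that versions compare numerically."""
    major, minor = text.split(".")
    return int(major), int(minor)


def build_snapshot_response(requested_version, image_id, image_url_prefix):
    """Return (status, headers, body) in the shape the handler produces.

    From 2.45 on, the image id is returned in the response body; earlier
    versions get a Location header pointing at the image instead.
    """
    if parse_microversion(requested_version) >= (2, 45):
        return 202, {}, {"image_id": image_id}
    location = "%s/images/%s" % (image_url_prefix.rstrip("/"), image_id)
    return 202, {"Location": location}, None
```

Comparing integer tuples rather than strings matters here: as strings, "2.5" sorts after "2.45", which would send an old client down the new code path.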

For an image-booted instance, the handler then calls the API.snapshot method in nova/compute/api.py. The annotated code is:

@check_instance_cell
@check_instance_state(vm_state=[vm_states.ACTIVE, vm_states.STOPPED,
                                vm_states.PAUSED, vm_states.SUSPENDED])
def snapshot(self, context, instance, name, extra_properties=None):
    """Snapshot the given instance.

    :param context: request context
    :param instance: InstanceV2 object
    :param name: snapshot name, 'snapshot1' in this example
    :param extra_properties: dict of extra image properties to include
                             when creating the image; {} in this example
    :returns: A dict containing image metadata
    """
    # _create_image adds an entry of type 'snapshot' to the glance database
    # (images table); each property is stored as one row in the
    # image_properties table. The returned metadata looks like:
    # {
    #     'status': u'queued',
    #     'name': u'snapshot1',
    #     'deleted': False,
    #     'container_format': u'bare',
    #     'created_at': datetime.datetime(2018,9,26,7,26,29,tzinfo=<iso8601.Utc>),
    #     'disk_format': u'raw',
    #     'updated_at': datetime.datetime(2018,9,26,7,26,29,tzinfo=<iso8601.Utc>),
    #     'id': u'ffb841fd-d5f8-4146-bb29-b12eb5bbf6b2',
    #     'owner': u'25520b29dce346d38bc4b055c5ffbfcb',
    #     'min_ram': 0,
    #     'checksum': None,
    #     'min_disk': 20,
    #     'is_public': False,
    #     'deleted_at': None,
    #     'properties': {
    #         u'image_type': u'snapshot',
    #         u'instance_uuid': u'814a8ad8-9217-4c45-91c7-c2be2016e5da',
    #         u'user_id': u'b652f9bd65844f739684a20ed77e9a0f',
    #         u'base_image_ref': u'e0cc468f-6501-4a85-9b19-70e782861387'
    #     },
    #     'size': 0
    # }
    #
    # Call the glance API to create the image entry, in preparation for
    # uploading the snapshot as an image later. Both regular images and
    # snapshots can be uploaded to glance and used to boot VMs, but to tell
    # them apart glance marks them with different types:
    # type=image and type=snapshot
    image_meta = self._create_image(context, instance, name, 'snapshot',
                                    extra_properties=extra_properties)

    # NOTE(comstud): Any changes to this method should also be made
    # to the snapshot_instance() method in nova/cells/messaging.py
    # Set the instance task state to "image snapshot pending"
    instance.task_state = task_states.IMAGE_SNAPSHOT_PENDING
    # (exception handling omitted here)
    instance.save(expected_task_state=[None])

    # Cast the 'snapshot_instance' message onto the message queue over RPC;
    # nova-compute receives the message and handles it
    self.compute_rpcapi.snapshot_instance(context, instance, image_meta['id'])

    return image_meta
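The check_instance_state decorator on API.snapshot is what rejects snapshot requests for instances in an unsuitable VM state. The following is a simplified sketch of that idea, not nova's implementation: the exception class, FakeInstance and FakeAPI are hypothetical stand-ins, and the lowercase state strings are assumed placeholders for the vm_states constants.

```python
import functools


class InstanceInvalidState(Exception):
    """Hypothetical stand-in for nova's exception of the same name."""


def check_instance_state(vm_state):
    """Reject the call unless instance.vm_state is in the allowed list."""
    allowed = set(vm_state)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, context, instance, *args, **kwargs):
            if instance.vm_state not in allowed:
                raise InstanceInvalidState(
                    "cannot %s while instance is %s"
                    % (func.__name__, instance.vm_state))
            return func(self, context, instance, *args, **kwargs)
        return wrapper
    return decorator


class FakeInstance(object):
    def __init__(self, vm_state):
        self.vm_state = vm_state
        self.task_state = None


class FakeAPI(object):
    @check_instance_state(
        vm_state=["active", "stopped", "paused", "suspended"])
    def snapshot(self, context, instance, name):
        # Only reached when the state check passed.
        instance.task_state = "image_snapshot_pending"
        return {"name": name, "status": "queued"}
```

In the real handler the resulting InstanceInvalidState is translated into an HTTP 409 response, which is why 409 appears in the expected_errors list of _action_create_image.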

When execution reaches self.compute_rpcapi.snapshot_instance(context, instance, image_meta['id']), an RPC call sends a snapshot message to the message queue. The RPC code is:

def snapshot_instance(self, ctxt, instance, image_id):
    version = '4.0'
    cctxt = self.router.client(ctxt).prepare(
            server=_compute_host(None, instance), version=version)
    cctxt.cast(ctxt, 'snapshot_instance',
               instance=instance,
               image_id=image_id)

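To recap the flow:

1. The user issues a create snapshot request;

2. The nova-api service receives the request and does the preliminary processing, i.e. the snapshot method in the API;

3. The actual snapshot has to run on a nova-compute node, so nova-api must send a message to nova-compute.

Because an OpenStack environment usually has many nova-compute services, server=_compute_host(None, instance) is used to find the host the VM runs on, and the message is sent to that host.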
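The routing described above, a fire-and-forget cast addressed to the one host that runs the instance, can be sketched with an in-memory queue per host. FakeBus is a hypothetical stand-in for the message queue and is not part of oslo.messaging; the dict-based instance record is likewise an assumption.

```python
import queue


class FakeBus(object):
    """Hypothetical in-memory stand-in for the message queue."""

    def __init__(self):
        self._queues = {}

    def queue_for(self, host):
        """Return the queue of one compute host, creating it on first use."""
        return self._queues.setdefault(host, queue.Queue())

    def cast(self, host, method, **kwargs):
        """Fire-and-forget: enqueue for one host and return immediately."""
        self.queue_for(host).put((method, kwargs))


def snapshot_instance(bus, instance, image_id):
    # The instance record carries the host it runs on; only that
    # compute service should receive the message.
    bus.cast(instance["host"], "snapshot_instance",
             instance_uuid=instance["uuid"], image_id=image_id)
```

Because cast does not wait for a reply, the API call returns as soon as the message is queued; the caller learns the outcome later from the instance and image state.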

1.2. nova-compute

When nova-compute receives the "snapshot_instance" request from nova-api, it handles it in ComputeManager.snapshot_instance in nova/compute/manager.py:

@wrap_exception()
@reverts_task_state
@wrap_instance_fault
@delete_image_on_error
def snapshot_instance(self, context, image_id, instance):
    """Snapshot an instance on this host.

    :param context: security context
    :param image_id: glance.db.sqlalchemy.models.Image.Id
    :param instance: a nova.objects.instance.Instance object

    The method is simple: after setting the instance task state, it hands
    the request to _snapshot_instance.
    """
    try:
        # Set the instance task state to "snapshotting"
        instance.task_state = task_states.IMAGE_SNAPSHOT
        instance.save(
            expected_task_state=task_states.IMAGE_SNAPSHOT_PENDING)
    except exception.InstanceNotFound:
        # possibility instance no longer exists, no point in continuing
        LOG.debug("Instance not found, could not set state %s "
                  "for instance.",
                  task_states.IMAGE_SNAPSHOT, instance=instance)
        return

    except exception.UnexpectedDeletingTaskStateError:
        LOG.debug("Instance being deleted, snapshot cannot continue",
                  instance=instance)
        return

    self._snapshot_instance(context, image_id, instance,
                            task_states.IMAGE_SNAPSHOT)
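The expected_task_state argument of instance.save acts as a compare-and-swap guard: the new task state is persisted only if the stored one is what the caller expects. A minimal sketch of that behaviour follows, assuming a hypothetical FakeInstance whose _stored_state attribute plays the role of the database row; the real check happens in nova's database layer.

```python
class UnexpectedTaskStateError(Exception):
    """Hypothetical stand-in for nova's exception of the same name."""


class FakeInstance(object):
    def __init__(self, task_state=None):
        self.task_state = task_state      # in-memory value
        self._stored_state = task_state   # what the "database" holds

    def save(self, expected_task_state=None):
        """Persist task_state if the stored value is an expected one.

        None means "no check"; a list means "any of these", so [None]
        expects the stored state to be None.
        """
        if expected_task_state is not None:
            expected = expected_task_state
            if not isinstance(expected, (list, tuple)):
                expected = [expected]
            if self._stored_state not in expected:
                raise UnexpectedTaskStateError(
                    "expected %r, found %r"
                    % (expected, self._stored_state))
        self._stored_state = self.task_state
```

If another operation, for example a delete, changed the task state in the meantime, the save fails instead of silently overwriting it, which is the case the except clauses in snapshot_instance handle.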

After this basic handling, snapshot_instance calls self._snapshot_instance(context, image_id, instance, task_states.IMAGE_SNAPSHOT) to do the actual work, shown below with exception handling removed:

def _snapshot_instance(self, context, image_id, instance, expected_task_state):
    context = context.elevated()

    # Get the power state of the VM
    instance.power_state = self._get_power_state(context, instance)
    instance.save()
    LOG.info('instance snapshotting', instance=instance)

    # Log a warning if the VM is not running
    if instance.power_state != power_state.RUNNING:
        state = instance.power_state
        running = power_state.RUNNING
        LOG.warning('trying to snapshot a non-running instance: '
                    '(state: %(state)s expected: %(running)s)',
                    {'state': state, 'running': running},
                    instance=instance)
    # Send a "snapshot.start" notification through the notifier; this
    # message is presumably consumed by ceilometer
    self._notify_about_instance_usage(context, instance, "snapshot.start")
    compute_utils.notify_about_instance_action(
        context, instance, self.host,
        action=fields.NotificationAction.SNAPSHOT,
        phase=fields.NotificationPhase.START)

    # Helper that updates the instance task state
    def update_task_state(task_state, expected_state=expected_task_state):
        instance.task_state = task_state
        instance.save(expected_task_state=expected_state)

    # Call LibvirtDriver.snapshot to perform the snapshot itself
    self.driver.snapshot(context, instance, image_id, update_task_state)

    # Reset the instance task state to None
    instance.task_state = None
    instance.save(expected_task_state=task_states.IMAGE_UPLOADING)

    # Send a "snapshot.end" notification through the notifier to tell
    # ceilometer that the snapshot has finished
    self._notify_about_instance_usage(context, instance, "snapshot.end")
    compute_utils.notify_about_instance_action(
        context, instance, self.host,
        action=fields.NotificationAction.SNAPSHOT,
        phase=fields.NotificationPhase.END)
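The update_task_state helper is a closure handed to the driver, so the driver can report progress without knowing how the state is stored. The sketch below replays that hand-off with a hypothetical fake driver and records the task states in order. The lowercase strings are assumed stand-ins for the task_states constants, and run_snapshot is an illustrative name, not a nova function.

```python
def run_snapshot(instance, driver_snapshot):
    """Drive a snapshot and return the task states the instance saw."""
    history = [instance["task_state"]]

    def update_task_state(task_state, expected_state="image_snapshot"):
        # Same contract as the helper in the listing: verify the current
        # state, then move on to the new one.
        if instance["task_state"] != expected_state:
            raise RuntimeError("expected %s, found %s"
                               % (expected_state, instance["task_state"]))
        instance["task_state"] = task_state
        history.append(task_state)

    driver_snapshot(update_task_state)

    # After the driver returns, the manager clears the task state.
    if instance["task_state"] != "image_uploading":
        raise RuntimeError("driver did not finish the upload")
    instance["task_state"] = None
    history.append(None)
    return history


def fake_driver_snapshot(update_task_state):
    """Hypothetical driver: reports progress through the callback only."""
    update_task_state(task_state="image_pending_upload")
    update_task_state(task_state="image_uploading",
                      expected_state="image_pending_upload")
```

The default value of expected_state is bound when the helper is defined, which is how the listing passes expected_task_state into the driver without adding a parameter to driver.snapshot.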

The code above shows that the snapshot itself is performed through libvirt, by calling self.driver.snapshot (LibvirtDriver.snapshot in nova/virt/libvirt/driver.py):

def snapshot(self, context, instance, image_id, update_task_state):
    """Create snapshot from a running VM instance.

    This command only works with qemu 0.14+
    """
    try:
        # Get the virDomain object of the instance through libvirt
        guest = self._host.get_guest(instance)
        virt_dom = guest._domain
    except exception.InstanceNotFound:
        raise exception.InstanceNotRunning(instance_id=instance.uuid)

    # Fetch the snapshot record from glance; it was written to the
    # database during the nova-api call
    snapshot = self._image_api.get(context, image_id)

    # Parse the instance's XML definition to get the disk information:
    # the disk path (disk_path) and the disk format
    # source_format is an on-disk format, e.g. raw
    disk_path, source_format = libvirt_utils.find_disk(guest)
    # source_type is a backend type: the backend storage type derived from
    # disk_path, e.g. rbd, or Sihua's flexblock
    source_type = libvirt_utils.get_disk_type_from_path(disk_path)
    LOG.info('disk_path: %s', disk_path)
    # Normalize the backend type and the snapshot disk format:
    # - if no backend type could be derived from the disk path, use the
    #   on-disk format as the backend type;
    # - use snapshot_image_format, or else the backend type, as the
    #   snapshot disk format;
    # - if that format is lvm or rbd, change it to raw
    if source_type is None:
        source_type = source_format
    image_format = CONF.libvirt.snapshot_image_format or source_type
    if (image_format == 'lvm' or image_format == 'rbd'
            or image_format == 'flexblock'):
        image_format = 'raw'
    # Build the snapshot property dict from the root-disk image properties,
    # the snapshot record and the snapshot disk format. It is used to
    # update the glance entry when the snapshot file is uploaded:
    # {
    #     'status': 'active',
    #     'name': u'snapshot1',
    #     'container_format': u'bare',
    #     'disk_format': 'raw',
    #     'is_public': False,
    #     'properties': {
    #         'kernel_id': u'',
    #         'image_location': 'snapshot',
    #         'image_state': 'available',
    #         'ramdisk_id': u'',
    #         'owner_id': u'25520b29dce346d38bc4b055c5ffbfcb'
    #     }
    # }
    metadata = self._create_snapshot_metadata(instance.image_meta, instance,
                                              image_format,
                                              snapshot['name'])
    # Name of the local temporary snapshot file
    snapshot_name = uuid.uuid4().hex
    # Get the instance power state, used to choose between a live and a
    # cold snapshot
    state = guest.get_power_state(self._host)

    # Decide between a live and a cold snapshot. A live snapshot requires
    # all of the following:
    #   1. QEMU >= 1.3 and libvirt >= 1.0.0
    #   2. the nova storage backend is not lvm or rbd
    #   3. ephemeral storage encryption is off:
    #      ephemeral_storage_encryption = False
    #   4. live snapshots are not disabled:
    #      disable_libvirt_livesnapshot = False
    if (self._host.has_min_version(hv_type=host.HV_DRIVER_QEMU)
            and source_type not in ('lvm', 'rbd', 'flexblock')
            and not CONF.ephemeral_storage_encryption.enabled
            and not CONF.workarounds.disable_libvirt_livesnapshot):
        live_snapshot = True
        # Abort is an idempotent operation, so make sure any block
        # jobs which may have failed are ended. This operation also
        # confirms the running instance, as opposed to the system as a
        # whole, has a new enough version of the hypervisor (bug 1193146).
        try:
            guest.get_block_device(disk_path).abort_job()
        except libvirt.libvirtError as ex:
            error_code = ex.get_error_code()
            if error_code == libvirt.VIR_ERR_CONFIG_UNSUPPORTED:
                live_snapshot = False
            else:
                pass
    else:
        # For example, with a ceph RBD backend the snapshot is a cold one
        live_snapshot = False

    # NOTE(rmk): We cannot perform live snapshots when a managedSave
    #            file is present, so we will use the cold/legacy method
    #            for instances which are shutdown.
    # Use a cold snapshot when the instance is shut down
    if state == power_state.SHUTDOWN:
        live_snapshot = False

    # With a hypervisor other than LXC, for a cold snapshot of a running or
    # paused instance, the PCI devices and SR-IOV ports must be detached
    # before the snapshot
    self._prepare_domain_for_snapshot(context, live_snapshot, state,
                                      instance)
    # _prepare_domain_for_snapshot checks the virtualization type and
    # handles the instance's devices:
    #
    #   def _prepare_domain_for_snapshot(self, context, live_snapshot,
    #                                    state, instance):
    #       if CONF.libvirt.virt_type != 'lxc' and not live_snapshot:
    #           if (state == power_state.RUNNING
    #                   or state == power_state.PAUSED):
    #               self.suspend(context, instance)
    #
    # It calls suspend, which detaches the PCI devices and SR-IOV ports:
    #
    #   def suspend(self, context, instance):
    #       """Suspend the specified instance."""
    #       guest = self._host.get_guest(instance)
    #       self._detach_pci_devices(
    #           guest, pci_manager.get_instance_pci_devs(instance))
    #       self._detach_direct_passthrough_ports(context, instance, guest)
    #       guest.save_memory_state()

    root_disk = self.image_backend.by_libvirt_path(instance, disk_path,
                                                   image_type=source_type)
    LOG.info('root_disk: %s', root_disk)

    # Log which kind of snapshot is starting
    if live_snapshot:
        LOG.info("Beginning live snapshot process", instance=instance)
    else:
        LOG.info("Beginning cold snapshot process", instance=instance)
    # driver.snapshot receives the helper update_task_state from its
    # caller. It is called here to move the instance task state to
    # IMAGE_PENDING_UPLOAD and then IMAGE_UPLOADING, after which the
    # metadata is updated.
    update_task_state(task_state=task_states.IMAGE_PENDING_UPLOAD)

    try:
        update_task_state(task_state=task_states.IMAGE_UPLOADING,
                          expected_state=task_states.IMAGE_PENDING_UPLOAD)
        metadata['location'] = root_disk.direct_snapshot(
            context, snapshot_name, image_format, image_id,
            instance.image_ref)
        self._snapshot_domain(context, live_snapshot, virt_dom, state,
                              instance)
        self._image_api.update(context, image_id, metadata,
                               purge_props=False)
    except (NotImplementedError, exception.ImageUnacceptable,
            exception.Forbidden) as e:
        if type(e) != NotImplementedError:
            LOG.warning('Performing standard snapshot because direct '
                        'snapshot failed: %(error)s', {'error': e})
        failed_snap = metadata.pop('location', None)
        if failed_snap:
            failed_snap = {'url': str(failed_snap)}
        root_disk.cleanup_direct_snapshot(failed_snap,
                                          also_destroy_volume=True,
                                          ignore_errors=True)
        update_task_state(task_state=task_states.IMAGE_PENDING_UPLOAD,
                          expected_state=task_states.IMAGE_UPLOADING)
        # TODO(nic): possibly abstract this out to the root_disk
        if source_type in ('rbd', 'flexblock') and live_snapshot:
            # On this fallback path (the direct snapshot failed), turn the
            # live snapshot into a cold one
            # Standard snapshot uses qemu-img convert from RBD which is
            # not safe to run with live_snapshot.
            live_snapshot = False
            # Suspend the guest, so this is no longer a live snapshot
            self._prepare_domain_for_snapshot(context, live_snapshot,
                                              state, instance)
        # Read the local snapshot directory from the configuration,
        # e.g. /opt/nova/data/nova/instances/snapshots
        snapshot_directory = CONF.libvirt.snapshots_directory
        fileutils.ensure_tree(snapshot_directory)
        # Then create a temporary directory inside it
        with utils.tempdir(dir=snapshot_directory) as tmpdir:
            try:
                # Build the full path of the snapshot file
                out_path = os.path.join(tmpdir, snapshot_name)
                LOG.info('out_path: %s', out_path)
                if live_snapshot:
                    # NOTE(xqueralt): libvirt needs o+x in the tempdir
                    # A live snapshot needs mode 701 on the temporary
                    # directory
                    os.chmod(tmpdir, 0o701)
                    self._live_snapshot(context, instance, guest,
                                        disk_path, out_path, source_format,
                                        image_format, instance.image_meta)
                else:
                    # Cold snapshot through the storage backend driver,
                    # Rbd.snapshot_extract, which runs 'qemu-img convert'
                    # to copy the root disk into out_path:
                    #   qemu-img convert -O raw
                    #     rbd:vms/814a8ad8-9217-4c45-91c7-c2be2016e5da_disk:id=cinder:conf=/etc/ceph/ceph.conf
                    #     /opt/stack/data/nova/instances/snapshots/tmptR6hog/e44639af86434069b38f835847083697
                    root_disk.snapshot_extract(out_path, image_format)
            finally:
                # The PCI devices and SR-IOV ports were detached above and
                # must be reattached once the snapshot is done
                self._snapshot_domain(context, live_snapshot, virt_dom,
                                      state, instance)
                LOG.info("Snapshot extracted, beginning image upload",
                         instance=instance)
            # Upload that image to the image service
            # Call the helper again to set the instance task state to
            # IMAGE_UPLOADING
            update_task_state(task_state=task_states.IMAGE_UPLOADING,
                              expected_state=task_states.IMAGE_PENDING_UPLOAD)
            # Final step: upload the snapshot file to the backend store
            # through the glance API, much like uploading an image
            with libvirt_utils.file_open(out_path, 'rb') as image_file:
                self._image_api.update(context, image_id, metadata,
                                       image_file)
    except Exception:
        with excutils.save_and_reraise_exception():
            LOG.exception(_("Failed to snapshot image"))
            failed_snap = metadata.pop('location', None)
            if failed_snap:
                failed_snap = {'url': str(failed_snap)}
            root_disk.cleanup_direct_snapshot(failed_snap,
                                              also_destroy_volume=True,
                                              ignore_errors=True)

    LOG.info("Snapshot image upload complete", instance=instance)
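Two early decisions in the driver method, which image format to use and whether a live snapshot is possible, are easier to follow as pure functions. The sketch below restates them under simplifying assumptions: the function names and boolean parameters are hypothetical, the hypervisor version check is reduced to a flag, and the abort_job probe that can still downgrade a live snapshot is left out.

```python
RAW_ONLY_BACKENDS = ("lvm", "rbd", "flexblock")


def pick_image_format(snapshot_image_format, source_type, source_format):
    """Choose the disk format of the snapshot image."""
    if source_type is None:
        # No backend type could be derived from the disk path.
        source_type = source_format
    image_format = snapshot_image_format or source_type
    if image_format in RAW_ONLY_BACKENDS:
        # Backend names are not image formats; such disks are stored raw.
        image_format = "raw"
    return image_format


def use_live_snapshot(hypervisor_new_enough, source_type,
                      encryption_enabled, live_snapshot_disabled,
                      is_shutdown):
    """Return True only when every live-snapshot condition holds."""
    if is_shutdown:
        # A shut-down instance always takes the cold path.
        return False
    return bool(hypervisor_new_enough
                and source_type not in RAW_ONLY_BACKENDS
                and not encryption_enabled
                and not live_snapshot_disabled)
```

For an rbd-backed instance with no snapshot_image_format configured, this yields the raw format and a cold snapshot, which matches the case analysed in this article.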

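That completes the analysis of the offline (cold) snapshot of an image-booted instance. In summary:

- The snapshot is first written to a temporary local file and then uploaded to glance, which is inefficient.

- During the snapshot, the instance goes through the following task states: None -> image snapshot pending (IMAGE_SNAPSHOT_PENDING) -> snapshotting (IMAGE_SNAPSHOT) -> pending image upload (IMAGE_PENDING_UPLOAD) -> uploading image (IMAGE_UPLOADING) -> None

- If nova uses lvm or ceph rbd as its storage backend, live snapshots are never supported.

- An OpenStack instance snapshot is stored in glance as an image, which differs from the usual notion of a snapshot used for data recovery.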
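The first summary point, extract to a local temporary file and then upload, is the fallback the driver takes when the backend cannot snapshot directly. A minimal sketch of that control flow follows, using only the standard library. FileBackend, the snapshot function and the upload callback are hypothetical stand-ins for the image backend and the glance client, and writing bytes to a file replaces the real 'qemu-img convert' step.

```python
import os
import tempfile


class FileBackend(object):
    """Hypothetical backend without direct snapshot support."""

    def __init__(self, disk_bytes):
        self.disk_bytes = disk_bytes

    def direct_snapshot(self, snapshot_name):
        raise NotImplementedError()

    def snapshot_extract(self, out_path):
        # Stands in for copying the root disk with 'qemu-img convert'.
        with open(out_path, "wb") as out:
            out.write(self.disk_bytes)


def snapshot(backend, snapshot_name, upload, snapshots_directory=None):
    """Try a direct snapshot; otherwise extract locally, then upload."""
    try:
        return {"location": backend.direct_snapshot(snapshot_name)}
    except NotImplementedError:
        pass
    # Fallback: the whole disk is copied to a temporary file first and
    # only then streamed to the image service.
    with tempfile.TemporaryDirectory(dir=snapshots_directory) as tmpdir:
        out_path = os.path.join(tmpdir, snapshot_name)
        backend.snapshot_extract(out_path)
        with open(out_path, "rb") as image_file:
            upload(image_file)
    return {"location": None}
```

The disk content is written once to local storage and read again for the upload, which is the double copy behind the efficiency remark in the summary.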

Original article: https://www.cnblogs.com/qianyeliange/p/9712853.html

Time: 2024-10-16 17:43:23
