Symptoms

The Operations Automation (OA) Provider or Customer Control Panel shows an incorrect location or status for a Cloud Infrastructure (CI) server (Virtual Environment, or VE).

For example, the Provider initiates a migration of a CI server between Hardware Nodes. The server is successfully migrated and is up and running on the target Node.

However, the OA Control Panel still shows the server on the source Hardware Node, and the CI server status is incorrect: "Migration failed", "Migration in progress", or "Failed to start".

A similar problem may be caused by CI server backup or restore operations, or by creating a VM image.

This situation may also be accompanied by the Console being unavailable.

Cause

There are several possible causes of this issue:

  • The OACI Instance Manager (IM) server did not receive a callback from the Hardware Node after the requested operation (backup, restore, or migration) completed, or could not process it (for example, the callback timed out).
  • The IM server was restarted or crashed while processing a callback from the corresponding Hardware Node.
  • A Hardware Node crashed, and the backend High Availability routines failed to relocate the VE.

Resolution

  1. LOCATION.

    OACI 17.12 and later

    Starting with OACI 17.12, Instance Manager automatically corrects the location and status of a VM that is up and running: the paci-agent module of the snmpd service on the Virtuozzo node sends performance statistics to Instance Manager, which processes them and corrects the location every 5 minutes.

    Thus, if OACI 17.12 or later is installed, the solution below can still be applied; however, the proper way to fix an incorrect location is to investigate why the location was not updated from the received performance statistics.
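
A natural first check during that investigation is whether the snmpd service, which hosts the paci-agent module, is running on the node at all. The following is a minimal sketch, assuming a systemd-based Virtuozzo node (for example, Virtuozzo 7); check_snmpd is a hypothetical helper name, not a product tool:

```shell
#!/bin/sh
# Minimal sketch (assumes a systemd-based Virtuozzo node).
# check_snmpd is a hypothetical helper: it reports whether the snmpd
# service, which hosts the paci-agent module, is active. If it is not,
# Instance Manager receives no performance statistics from this node
# and cannot correct the VE location.
check_snmpd() {
    if systemctl is-active --quiet snmpd; then
        echo "snmpd is active"
        return 0
    fi
    echo "snmpd is not active: no performance statistics are sent to Instance Manager"
    return 1
}
```

Source the snippet on the Hardware Node and run check_snmpd. An active snmpd does not prove that statistics reach Instance Manager; it only rules out the most obvious cause.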

    Before OACI 17.12

    To synchronize the location of the VE, move it manually to the Hardware Node to which it is bound in the Control Panel. Use Virtuozzo utilities to do this directly on the Hardware Node where the VE is actually located:

    • To relocate a container (CT):

      # pmigrate c [VEID|UUID] c [TARGET_NODE] --online
      

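      where [VEID|UUID] is either the container ID (displayed by vzlist) or the container UUID (displayed by prlctl list), and [TARGET_NODE] is the Hardware Node to which the container is bound in the Control Panel (for example, its IP address). The --online option initiates a "hot" migration to minimize the downtime of VE services.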

      Real-life example:

      # pmigrate c 101 c 192.168.0.10 --online
      
    • To relocate a virtual machine (VM):

      # pmigrate v [VM_NAME|UUID] v [TARGET_NODE]
      

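      where [VM_NAME|UUID] is either the name or the UUID of the virtual machine (displayed by prlctl list), and [TARGET_NODE] is the Hardware Node to which the VM is bound in the Control Panel (for example, its IP address).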

      Real-life example:

      # pmigrate v test_vm v 192.168.0.10
      

      Note: for virtual machines, online migration is used by default if the VM is running.
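
The two relocation commands differ only in the VE type letter and the --online flag. A minimal dry-run sketch that prints the matching command is shown below; the helper name pmigrate_cmd and its ct|vm first argument are hypothetical. It prints the command instead of running it, so the target node can be verified first:

```shell
#!/bin/sh
# Minimal dry-run sketch: print the pmigrate command for relocating a VE,
# following the syntax described above. pmigrate_cmd and its ct|vm
# argument are hypothetical; nothing is executed.
pmigrate_cmd() {
    ve_type=$1    # "ct" for a container, "vm" for a virtual machine
    ve_id=$2      # CT: VEID or UUID; VM: name or UUID
    target=$3     # Hardware Node to which the VE is bound in the Control Panel
    case "$ve_type" in
        ct) echo "pmigrate c $ve_id c $target --online" ;;
        vm) echo "pmigrate v $ve_id v $target" ;;  # running VMs migrate online by default
        *)  echo "unknown VE type: $ve_type (expected ct or vm)" >&2; return 1 ;;
    esac
}
```

For example, pmigrate_cmd ct 101 192.168.0.10 prints the container example above; after checking it, run the printed command on the node where the VE actually resides.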

  2. STATUS.

    To synchronize the status of the VE, use the Virtuozzo VE management utilities to bring the VE to the state displayed in the CP. If you have any difficulties, contact Odin Technical Support.

    If a VM has moved to the UNCERTAIN state, it is enough to press the action button in the CP that corresponds to the last actual state in the VE history: press 'start' if it was RUNNING, or 'stop' if it was STOPPED. The state is then synchronized, and no actual action is performed on the VE.

    NOTE: Do not change the VE state if it has moved to the UNCERTAIN state from "Creation in progress". This means that creation on the Virtuozzo host has not yet completed, and the VM will be deleted when the operation timeout is reached. In this case, contact CloudBlue support to find the root cause.
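
The UNCERTAIN-state rule above is a simple lookup. A minimal sketch follows, assuming a hypothetical helper cp_button_for_last_state and assuming the last actual state is read from the VE history as the literal strings RUNNING or STOPPED. Any other value, including "Creation in progress", is treated as "do not touch". The actual state on the node can be cross-checked beforehand with prlctl list -a.

```shell
#!/bin/sh
# Minimal sketch of the UNCERTAIN-state rule: map the last actual state
# from the VE history to the Control Panel button to press.
# The helper name and the accepted state strings are assumptions.
cp_button_for_last_state() {
    case "$1" in
        RUNNING) echo "start" ;;
        STOPPED) echo "stop" ;;
        *) echo "do not change the state; contact support"; return 1 ;;
    esac
}
```

Returning a non-zero status for every other state keeps the "Creation in progress" case from being handled by accident.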

  3. UUID MISMATCH.

    Sometimes, after manual actions on VEs on the Virtuozzo backend, a VE may end up with a UUID that differs from the one recorded by OACI IM. As a result, the following error is logged to /var/log/pa/vps.im.errors.log:

    2016-09-03 00:00:18,502 () ERROR SnmpManagerImpl [VF executor thread #28] - VE uuid doesn't match: 2a90296c-3f8a-4182-a9a6-00917e3be9cb != f2bcb6db-72bc-4913-917e-a89486b2c0ee for VeId [customerId=1010101, name=server-1002003], update ignored
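
The values needed for the fix can be pulled out of such log lines. A minimal sketch follows, assuming the message format shown above. Judging by the example, the first UUID is the one recorded by OACI IM, and it is the UUID that step 3 below registers the VM with. parse_uuid_mismatch is a hypothetical helper name.

```shell
#!/bin/sh
# Minimal sketch: read "VE uuid doesn't match" lines on stdin and print
# "<customerId>.<name> <UUID recorded by IM>" for each one.
# Assumes the exact message format shown above; the first UUID is taken
# as the one IM expects. parse_uuid_mismatch is a hypothetical name.
parse_uuid_mismatch() {
    sed -n "s/.*VE uuid doesn't match: \([0-9a-f-]*\) != [0-9a-f-]* for VeId \[customerId=\([0-9]*\), name=\([^]]*\)].*/\2.\3 \1/p"
}
```

Usage: grep "VE uuid doesn't match" /var/log/pa/vps.im.errors.log | parse_uuid_mismatch | sort -u. For the line above, this prints 1010101.server-1002003 2a90296c-3f8a-4182-a9a6-00917e3be9cb.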
    

    Follow these steps to fix the problem:

    1. At a convenient time, suspend the VM (all OS operations inside the VM will be frozen):

      # prlctl suspend 1010101.server-1002003
      
    2. Unregister it:

      # prlctl unregister 1010101.server-1002003
      
    3. Register the VM back with the required UUID:

      # prlctl register /pcs/1010101.server-1002003.pvm --uuid 2a90296c-3f8a-4182-a9a6-00917e3be9cb
      

      Note: for containers (CT technology type), the path contains the VEID, e.g. /pcs/123.

    4. Resume the VM:

      # prlctl resume 1010101.server-1002003
      

      Note: If the resume operation fails with the following output:

      Failed to resume the VM: Operation failed. Failed to execute the operation. (Details: operation failed: domain '1010101.server-1002003' already exists with uuid 2a90296c-3f8a-4182-a9a6-00917e3be9cb)
      

      rename config.sav in the VM home directory and run the resume command again:

      # mv /pcs/2a90296c-3f8a-4182-a9a6-00917e3be9cb/config.sav /pcs/2a90296c-3f8a-4182-a9a6-00917e3be9cb/config.sav-back 
      

      This is necessary because config.sav contains the incorrect VM UUID.
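
Because the VM stays suspended from step 1 until step 4, it can help to prepare the whole command sequence before starting. A minimal dry-run sketch follows. print_uuid_fix is a hypothetical helper that only prints the commands from the steps above. The bundle path is an assumption you must replace with the actual VM home path on the node (for containers, it contains the VEID).

```shell
#!/bin/sh
# Minimal dry-run sketch: print the suspend / unregister / register /
# resume sequence for one VM so it can be reviewed before it is run
# manually. print_uuid_fix is hypothetical; nothing is executed.
print_uuid_fix() {
    if [ $# -ne 3 ]; then
        echo "usage: print_uuid_fix VM_NAME UUID BUNDLE_PATH" >&2
        return 1
    fi
    vm_name=$1    # e.g. 1010101.server-1002003
    uuid=$2       # UUID expected by OACI IM
    bundle=$3     # e.g. /pcs/1010101.server-1002003.pvm
    printf '%s\n' \
        "prlctl suspend $vm_name" \
        "prlctl unregister $vm_name" \
        "prlctl register $bundle --uuid $uuid" \
        "prlctl resume $vm_name"
}
```

The config.sav workaround is deliberately left out: apply it by hand only if the resume step fails with the error quoted above.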
