HyperCloud upgrade to version 2.2 from before 2.1

Note

If upgrading from HyperCloud version < 2.0.2, you MUST first upgrade the BMC firmware on all nodes to BMC version >= 6.12. See Legacy firmware upgrade.

  1. Disable write caching on all SSDs by executing the following two commands from the dashboard:

    export SSH_AUTH_SOCK=/tmp/ssh-dashboard/agent.sock 
    for host in $(ceph osd tree | grep storage | awk '{print $NF}'); do ssh $host 'for drive in /dev/disk/by-id/ata-Micron_5*; do hdparm -W 0 ${drive}; hdparm -F ${drive}; sync; done' ; done
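
To confirm that the change took effect, the cache state can be read back with `hdparm -W <drive>` (no value given), which reports a `write-caching` line for each drive. The following is a minimal sketch under stated assumptions, not a HyperCloud tool: the function name `drives_with_write_cache_on` is made up for illustration, and it only filters hdparm output for drives whose reported value is not 0.

```shell
# Hypothetical helper (not a HyperCloud tool): reads `hdparm -W <drive>` output
# on stdin and prints every drive whose write-caching value is not 0.
drives_with_write_cache_on() {
    awk '
        /^\/dev\// { drive = $1; sub(/:$/, "", drive) }   # header line such as "/dev/sda:"
        /write-caching/ && $3 != "0" { print drive }      # line such as " write-caching =  1 (on)"
    '
}

# Example use from the dashboard, reusing the host loop from step 1:
#   for host in $(ceph osd tree | grep storage | awk '{print $NF}'); do
#     ssh "$host" 'for d in /dev/disk/by-id/ata-Micron_5*; do hdparm -W "$d"; done' \
#       | drives_with_write_cache_on | sed "s|^|$host: |"
#   done
```

If the example loop prints nothing, every matched drive reported write caching as off.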
    
  2. Download or otherwise copy the latest multiarch bundle into the /var/cores/ directory on the cluster.

  3. Next, SSH into each static node with:

    ### Replace X with the number of the static node
    
    SSH_AUTH_SOCK=/tmp/ssh-dashboard/agent.sock ssh -l root 10.0.0.X
    

    On each static node, run:

    hypercloud-apply-multiarch-bundle /var/cores/<bundle_name>.bundle
    

    The full path to the bundle must be used. Relative paths will not work.

    This will install the bundle to the static node.
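
Because relative bundle paths do not work, it can help to check the path before touching any node. The sketch below is a dry run under assumptions: the function names are hypothetical, the static node numbers are whatever your deployment uses, and passing `hypercloud-apply-multiarch-bundle` as a remote ssh command (rather than running it in an interactive session as described above) is an assumption that has not been verified here. It only prints the commands; it does not run them.

```shell
# Hypothetical helper: succeeds only for an absolute path ending in .bundle.
is_absolute_bundle_path() {
    case "$1" in
        /*.bundle) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical dry run: prints (does not execute) one command per static node number.
# Usage: print_apply_commands /var/cores/<bundle_name>.bundle 1 2 3
print_apply_commands() {
    bundle=$1
    shift
    if ! is_absolute_bundle_path "$bundle"; then
        echo "bundle path must be absolute and end in .bundle: $bundle" >&2
        return 1
    fi
    for n in "$@"; do
        echo "SSH_AUTH_SOCK=/tmp/ssh-dashboard/agent.sock ssh -l root 10.0.0.$n hypercloud-apply-multiarch-bundle $bundle"
    done
}
```

Review the printed lines, then run them one at a time so that a failure on one static node is noticed before moving to the next.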

  4. After the bundles have been applied, ensure the static nodes are in the enabled or on state.

    Warning

    If the cluster is nearly full, or there are not enough compute nodes to live migrate workloads, the procedure may hang indefinitely while rebooting compute nodes. Manual intervention (for example, manually moving or shutting down a workload) may be necessary.

    Return to the dashboard and run:

    hypercloud-reboot-cluster-live -all
    

    During the cluster reboot procedure, first the static nodes will be rebooted one-by-one, then the compute nodes will be rebooted while each VM is migrated among non-rebooting compute nodes, and finally the dashboard VM itself will reboot, which will terminate your SSH connection to it.

    At this point, your cluster is running HyperCloud version 2.2 software. The HyperCloud firmware has not yet been upgraded.

  5. Download the special HyperCloud version 2.2 firmware upgrade tarball to a non-HyperCloud computer which has access to the BMC Ethernet network: https://cdn.softiron.com/hypercloud/v2.2.1/hypercloud-2.2.1-firmware.tar.gz

    Note

    This firmware bundle is only needed for this one specific upgrade to HyperCloud version 2.2. It will never be needed in the future. HyperCloud version >= 2.1 has the ability to update its own firmware, but only after the firmware has been updated to support these features.

  6. Unpack the HyperCloud version 2.2 firmware upgrade tarball and change into the unpacked directory:

    tar -xf hypercloud-2.2.1-firmware.tar.gz
    cd hypercloud-2.2.1-firmware
    
  7. Back on the dashboard node, observe that there are now cluster-control facts visible from every HyperCloud node in /var/run/cluster-control/facts/ which document the serial number and BMC IP address of each node in the cluster. You will need the BMC IP address information for each node.

    To create a list mapping each cluster node to its BMC IP address, execute the following from the dashboard node:

    for i in /var/run/cluster-control/facts/*bmc-ip; do echo -n "$(basename "$i" | cut -f2 -d_): "; cat "$i"; done
    

    Copy the output from the above command to a text file you can reference later so you’ll know the BMC IP addresses for each node.
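
The same mapping can be wrapped in two small functions so that the list is saved once and looked up by hostname later. This is a minimal sketch with hypothetical function names (`bmc_map`, `bmc_ip_for`). It assumes, as the command above does, that the hostname is the second underscore-delimited field of each `*bmc-ip` file name.

```shell
# Hypothetical helper: prints "hostname: ip" for every *bmc-ip fact file.
# The directory argument defaults to the facts path used in this guide.
bmc_map() {
    facts_dir=${1:-/var/run/cluster-control/facts}
    for f in "$facts_dir"/*bmc-ip; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        printf '%s: %s\n' "$(basename "$f" | cut -f2 -d_)" "$(cat "$f")"
    done
}

# Hypothetical helper: prints the BMC IP saved for one hostname.
# Usage: bmc_ip_for <hostname> <map_file>
bmc_ip_for() {
    awk -v host="$1" -F': ' '$1 == host { print $2; exit }' "$2"
}

# Example use on the dashboard node:
#   bmc_map > bmc-map.txt
#   bmc_ip_for <hostname> bmc-map.txt
```

Keep a copy of the saved file on the computer that will run the firmware upgrade, since the dashboard is unavailable while the final node is being flashed.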

  8. For each HyperCloud node, verify that the BMC has an admin user password set. This can be done in two different ways:

    • Try to SSH to the BMC as admin@<BMC_IP> using the presumed BMC admin password.

    • Set a new BMC admin user password on each node by running the bmc_password command and then inputting a new BMC admin password twice.

  9. For each HyperCloud node, verify that network IPMI has been enabled and that you know the IPMI username and password. This can be done in two different ways:

    • Try to run ipmitool -H <BMC_IP> -U <ipmiusr> -P <ipmipasswd> mc info

    • Enable network IPMI and set the IPMI username and password for IPMI user number 2 on each node:

      ipmitool user enable 2
      ipmitool user set name 2 ipmiusr
      ipmitool user set password 2 <password>
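
To run the `mc info` check against every node in one pass, the BMC IP addresses can be fed through a loop. The following is a sketch under assumptions: `check_ipmi_access` is a hypothetical name, and the `IPMITOOL` variable is introduced here only so that the binary can be substituted for testing. The ipmitool flags are the ones shown above.

```shell
# Hypothetical helper: reads BMC IPs on stdin (one per line) and reports
# whether `ipmitool ... mc info` succeeds against each of them.
# Usage: check_ipmi_access <ipmiusr> <ipmipasswd> < list_of_ips
check_ipmi_access() {
    ipmi_user=$1
    ipmi_pass=$2
    while read -r ip; do
        [ -n "$ip" ] || continue   # ignore blank lines
        # </dev/null keeps the probe from consuming the list on stdin
        if "${IPMITOOL:-ipmitool}" -H "$ip" -U "$ipmi_user" -P "$ipmi_pass" mc info </dev/null >/dev/null 2>&1; then
            echo "$ip ok"
        else
            echo "$ip FAILED"
        fi
    done
}
```

Any address reported as FAILED needs network IPMI enabled or its credentials corrected before the rolling halt begins. Note that a password passed with `-P` is visible in the process list; ipmitool also accepts `-E`, which reads it from the `IPMI_PASSWORD` environment variable.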
      
  10. Perform a rolling halt of each node in the cluster so that the firmware can be updated. The nodes must be halted one-by-one in exactly this way in order to avoid impacting cluster services. DO NOT MANUALLY HALT ANY NODES.

    From the dashboard node, run:

    hypercloud-reboot-cluster-live -all -halt
    

    This will ensure the cluster is healthy and then halt just one node at a time.

    When each node is halted, the node’s hostname will be printed along with information about whether it has gone DOWN yet. Wait until the node is marked as DOWN.

    After the node is DOWN, find that node’s BMC IP address from the cluster-control facts files.

    Upgrade the firmware on this one node using the HyperCloud version 2.2 firmware upgrade tool that you previously downloaded and extracted on the computer with access to the BMC network. Specify the BMC admin password, IPMI username, and IPMI password:

    ./flash_all.sh -U <ipmiusr> -P <ipmipasswd> -S <adminpasswd> -F rootfs.ubifs -H <BMC_IP>
    

    This process of applying the firmware upgrade to a node will take about 5 minutes.

    While the firmware updates apply, the serial terminal (either USB console or IPMI serial over LAN) may be disconnected on the node being updated.

    Once the firmware upgrades are completed, the node will automatically boot back into HyperCloud and the hypercloud-reboot-cluster-live script on the dashboard will show that the node has come back UP.

    This process repeats until only the final node in the cluster remains to be halted. The final node is the compute node running the dashboard. When that compute node halts, the dashboard will exit; at that point you can upgrade the firmware on the compute node that was running the dashboard.
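
Since the flash command is repeated once per node, building it from the saved hostname-to-IP list reduces the chance of targeting the wrong BMC. This is a minimal sketch: `flash_command_for` is a hypothetical name, and the map file is assumed to hold `hostname: ip` lines as produced in step 7. It prints the command with the credential placeholders left in place; it does not run `flash_all.sh`.

```shell
# Hypothetical helper: prints the flash_all.sh command line for one halted node.
# Usage: flash_command_for <hostname> <map_file>
flash_command_for() {
    node=$1
    map_file=$2
    # Look up the BMC IP recorded for this hostname ("hostname: ip" per line)
    bmc_ip=$(awk -v host="$node" -F': ' '$1 == host { print $2; exit }' "$map_file")
    if [ -z "$bmc_ip" ]; then
        echo "no BMC IP recorded for $node" >&2
        return 1
    fi
    echo "./flash_all.sh -U <ipmiusr> -P <ipmipasswd> -S <adminpasswd> -F rootfs.ubifs -H $bmc_ip"
}
```

Run it with the hostname that the dashboard just reported as DOWN, fill in the credentials, and execute the result from the unpacked firmware directory.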

  11. Once all HyperCloud cluster nodes have had their firmware upgraded to HyperCloud version 2.2 firmware, the dashboard node will spawn again automatically.

  12. Wait for the final node to boot fully.

  13. SSH into the dashboard node again.

  14. Verify the currently running firmware versions on the cluster match the firmware from the multiarch bundle you downloaded to the /var/cores/ directory:

    hypercloud-check-firmware-versions -b /var/cores/<multiarch_bundle>
    

    Each HyperCloud node’s hostname will be printed along with an indication of which firmware is out of date on that node. If all firmware is up to date then “none” is printed for that node.
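
On a large cluster it may be easier to see only the nodes that still need attention. The sketch below rests on an assumption that is not confirmed here: that each node is reported on a single line, and that up-to-date nodes have the word `none` on that line. The function name `nodes_needing_firmware` is hypothetical; adjust the filter if the real output is laid out differently.

```shell
# Hypothetical filter: reads the check output on stdin and keeps only
# non-blank lines that do not contain the whole word "none".
# Assumes one line per node; this layout is an assumption, not documented behavior.
nodes_needing_firmware() {
    grep -v -w 'none' | grep -v '^[[:space:]]*$' || true
}

# Example use on the dashboard node:
#   hypercloud-check-firmware-versions -b /var/cores/<multiarch_bundle> | nodes_needing_firmware
```

Empty output from the example means every node reported `none`.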