
HyperCloud Configuration

Configure Interconnects

The interconnects shipped with HyperCloud currently run SONiC. This will change in a future release of the HyperCloud product. The interconnects are NOT intended to be used as standard switches.

Info

If any issues occur during the configuration of the interconnects, follow the link below to reset the interconnects to their default state and restart the configuration procedure:

[Enterprise SONiC] Reset default configuration

To check for errors, run echo $? after running the configuration scripts (see the sketch below). In the event of a non-zero exit status, the following steps will wipe the interconnect so that the process can be restarted:

sudo rm /etc/sonic/config_db.json
sudo config-setup factory
sudo reboot
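
For example, the exit-status check can be wrapped in a simple guard (a minimal sketch; up.sh stands in for whichever generated configuration script was just run):

./up.sh
if [ $? -ne 0 ]; then
    echo "Non-zero exit status - wipe the interconnect as above and restart the procedure"
fi
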
  1. Connect to the Management Interconnect via either the USB A or serial port on the left side of the device, using a serial terminal at a baud rate of 115200 (see the example below). The default credentials are: admin:YourPaSsWoRd.

    Mgmt Switch Ports
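
    For example, a serial session can be opened from a Linux laptop with a terminal program such as screen (a sketch; the device path /dev/ttyUSB0 is an assumption and will differ per host):

      screen /dev/ttyUSB0 115200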

  2. Verify the interconnect is running the latest approved SONiC firmware (verify the switch boots into the SONiC console, then log in and run show version).

    SONiC Firmware

  3. Leave default passwords in place (at least until handover)

  4. Confirm gathered [customer information]
  5. Untar the SI SW Interconnect Configurations Tarball locally and generate the initial configurations as detailed in the README.md file, found under the docs/ subdirectory in the tarball.

    Tarball contents

    ./
    ./opt/
    ./opt/softiron/
    ./opt/softiron/switch-config/
    ./opt/softiron/switch-config/docs/
    ./opt/softiron/switch-config/docs/README.md
    ./opt/softiron/switch-config/EXAMPLE/
    ./opt/softiron/switch-config/EXAMPLE/cluster.vars
    ./opt/softiron/switch-config/EXAMPLE/mgmt-switch-oob.vars
    ./opt/softiron/switch-config/EXAMPLE/high-speed-switch.vars
    ./opt/softiron/switch-config/edgecore/
    ./opt/softiron/switch-config/edgecore/4630/
    ./opt/softiron/switch-config/edgecore/4630/bin/
    ./opt/softiron/switch-config/edgecore/4630/bin/oob.sh
    ./opt/softiron/switch-config/edgecore/7326/
    ./opt/softiron/switch-config/edgecore/7326/bin/
    ./opt/softiron/switch-config/edgecore/7326/bin/high-speed-switch.sh
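
    For example, extraction and a first look at the README might be done as follows (a sketch; [tarball_file] is a placeholder for the actual tarball name):

      tar -xvf [tarball_file]
      cd opt/softiron/switch-config
      cat docs/README.md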
    
  6. Create the variable files

    • There are templates in the tarball that can be edited for each use case
    • Ensure that the variable files are in the same location as the OOB and High Speed Interconnect scripts

    Switch Config Scripts

    Cluster Configuration Variables:

    Cluster Config Vars

    Note

    Customer VLANs are listed as a space-separated list (see the sketch at the end of this step). If there is only one VLAN ID, make it the OOB VLAN ID.

    High Speed Configuration Variables:

    High Speed Config Vars

    Management Configuration Variables:

    Management Config Vars
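
    For example, a space-separated customer VLAN list in one of the variable files might look like the sketch below (the variable name is illustrative only; use the names from the EXAMPLE templates in the tarball):

      CUSTOMER_VLANS="100 200 300"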

Script compression and encoding

The steps below allow larger interconnect configurations to be compressed and sent over serial at a baud rate of 115200. Follow them for both the management and high speed interconnect configuration scripts.

  1. Generate the up.sh script locally before copying over to the Management Interconnect.

    • Execute the OOB script to generate the commands that will be used to configure the management interconnect.

      • The command below will create and store the commands in a new file that can then be copied to the management interconnect
      ./oob.sh up > up.sh
      
  2. Generate the SHA-256 hash of the script: sha256sum up.sh

  3. Compress and BASE64 encode the up.sh script via:
    cat up.sh | gzip -9c | openssl base64
    
  4. Copy the output
  5. Create a new up.b64 file on the interconnect and paste the string from above
  6. Decode and uncompress the up.b64 file with:
    cat up.b64 | base64 -d | gzip -d - > up.sh
    
  7. Generate the SHA-256 hash of the up.sh script on the interconnect:
    sha256sum up.sh
    
  8. Verify that the checksums from both the source and destination match EXACTLY

    If the file is not being transferred to the interconnect as above, its contents will need to be copied into a newly created file. Start by printing the contents to the screen with the command below:

    cat up.sh
    
    • On the management interconnect, create a new file, paste the contents from the cat command above, save, and make executable.

    To accomplish the steps above, run the following command to create the script file:

    vi up.sh
    

    PASTE (Ctrl-V or ⌘-V) the copied text, then press Escape, type :wq, and finally press Enter or Return.

  9. Make the script executable: chmod 755 up.sh

    If the console will not execute the commands, the privileges may need to be elevated with sudo; for example, sudo chmod 755 up.sh or sudo ./up.sh.

    • Example of contents below that will be pasted into the file:
      sudo sh -c 'cat /etc/rc.local | head -n -1 > /etc/rc.local.new'
      echo "/usr/sbin/ifconfig eth0 hw ether \$(/usr/sbin/ifconfig eth0 | grep ether | awk '{print \$2}' | awk -F: '{if (\$5==\"ff\") {\$5=\"00\"} else {\$5=sprintf(\"%02x\",(\"0x\"\$5)+1)} ; print \$1\":\"\$2\":\"\$3\":\"\$4\":\"\$5\":\"\$6\"\"}')" > /tmp/hwaddr-tmp-replace
      sudo sh -c 'cat /tmp/hwaddr-tmp-replace >> /etc/rc.local.new'
      sudo sh -c 'echo "exit 0" >> /etc/rc.local.new'
      sudo mv /etc/rc.local.new /etc/rc.local
      sudo chmod 755 /etc/rc.local
      rm -f /tmp/hwaddr-tmp-replace
      
      sudo config vlan add 1
      sudo config interface ip add eth0 10.1.2.23/24 10.1.2.1
      sudo config portchannel add PortChannel0 --fallback=true --lacp-key=1
      sudo config portchannel member add PortChannel0 Ethernet50
      sudo config portchannel member add PortChannel0 Ethernet51
      sudo config vlan member add -u 1 PortChannel0
      sudo config vlan member add -u 1 Ethernet0
      sudo config vlan member add -u 1 Ethernet1
      .
      .
      .
      sudo config vlan member add -u 1 Ethernet45
      sudo config vlan member add -u 1 Ethernet46
      sudo config vlan member add -u 1 Ethernet47
      sudo config save -y
      
  10. This newly created script can now be executed with ./up.sh; once it completes, the management interconnect will be configured.

    The console may report name resolution failures when configuring the interconnects; these can be ignored, as there is no DNS in place to resolve those targets.

    • Verify that the script ran with no errors by running echo $?; a result of 0 indicates success
    • Follow up with a sync and reboot of the interconnect:
      sync && sync && sync && sudo reboot
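
    After the interconnect comes back up, the applied configuration can be spot-checked with standard SONiC show commands (a sketch; output format varies by SONiC release):

      show vlan brief
      show interfaces portchannel
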
  11. The High Speed Interconnect configuration is next.

    • Ensure that the high-speed-switch.sh, high-speed-switch.vars, and the cluster.vars files are all in the same directory on the local machine.
    • Ensure the variables have been filled out with the correct information for the deployment.
    • One interconnect will be the primary and the other will be the secondary.
  12. Execute the script for the primary High Speed Interconnect to generate the commands that will configure the interconnect:

    • Again, the command below will create and store the commands in a new file that will then be copied onto the Primary High Speed Interconnect:

    ./high-speed-switch.sh up primary > primary-hss.sh

    cat primary-hss.sh

    • Or, on the Primary High Speed Interconnect, create a new file and paste the contents from the cat command above, save it, and make it executable.

    vi primary-hss.sh

    PASTE the copied commands, then press Escape, then :wq, followed by Enter

    chmod 755 primary-hss.sh

  13. Execute this new script on the interconnect, followed by sync and reboot:

    ./primary-hss.sh

    sync && sync && sync && sudo reboot

  14. Now configure the secondary High Speed Interconnect, following the same process as the primary:

    • Again, the command below will create and store the commands in a new file that will then be copied onto the Secondary High Speed Interconnect:

    ./high-speed-switch.sh up secondary > secondary-hss.sh

    cat secondary-hss.sh

    • On the Secondary High Speed Interconnect, create a new file and paste the contents from the cat command above, save it, and make it executable.

    vi secondary-hss.sh

    PASTE the copied commands, then press Escape, then :wq, followed by Enter

    chmod 755 secondary-hss.sh

  15. Execute this new script on the interconnect, followed by sync and reboot:

    ./secondary-hss.sh

    sync && sync && sync && sudo reboot

  16. Set customer uplink speed

    Only required if the customer uplink uses SFP28 ports and does not support 25 Gb/s, OR uses QSFP28 ports and does not support 100 Gb/s.

    • If using SFP28 ports and customer uplink does not support 25 Gb/s (i.e. 10 Gb/s only):
      sudo config interface breakout Ethernet43 '4x10G[1G]'

    This changes the port speed to 10 Gb/s on all child interfaces:
    - Child ports: Ethernet43 Ethernet45 Ethernet46 Ethernet47

    • If using QSFP28 ports and customer uplink does not support 100 Gb/s (i.e. 40 Gb/s only):
      sudo config interface speed Ethernet48 40000
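
    The resulting port speeds (including any breakout child ports) can be verified afterwards with the standard SONiC status command (a sketch; output format varies by release):

      show interfaces status
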
  17. Save the configuration

    sudo config save -y

  18. Reboot and ensure everything has been configured properly with no errors.

  19. Ensure everything is healthy so far (run the following on SONiC):

    • show mclag brief
    • show vlan config (or, show vlan brief)
  20. Back up the configurations to be stored in Git later
    • scp admin@[interconnect]:/etc/sonic/config_db.json [somewhere_local]
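
    For example (a sketch; the address reuses the example management IP from the script above, and the local path is illustrative):

      scp admin@10.1.2.23:/etc/sonic/config_db.json ./backups/mgmt-config_db.json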

Step 4 - Install the Storage Cluster

Warning

To allow faster cluster convergence on an initial build, it is recommended to set the hardware clocks of the nodes from the UEFI so that they are as close to the current date and time (and to each other) as possible. If the clocks are far in the past, for example because a CMOS battery has failed, cluster convergence may take several hours.

  1. Power up the first static node with the serial console cable attached, configured to a baud rate of 115200.

    • Choose to install static node 1

      Static Node 1

    • If this is the first time booting static node 1, the node will check the boot disk to determine whether the system time has ever been set. If it has not, the user will be prompted to respond to the query below:

      The current system time is:
      
      16:24:33 2023/10/25
      
      Please choose one of the following:
      1: Change the system time
      2: Keep the current system time
      > 1
      Enter a new system time with the format:
      
          hour:minute:seconds year/month/day
      
      For example:
      
          23:08:41 2023/09/21
      
      The system time must be UTC
      > 16:25:30 2023/10/25
      New system time is : 16:25:30 2023/10/25
      
      Would you like to apply the new system time?
      1: Apply the new system time
      2: Change the new system time
      > 1
      
    • Once the user either confirms the system time or manually inputs the time in the specified format, the system will record it and no longer prompt for this information in future deployments.

    • Wait for it to run through the remaining setup and checks.
    • Run lsb_release -a to verify release information.

    Primary Boot Complete

  2. Continue with the second static node

    • Choose to install static node 2

    Static Node 2

  3. Continue with the third static node

    • Choose to install static node 3

    Static Node 3

  4. Watch for the OSDs to be created

    Note

    Depending on how much time you have, you can now power on the rest of the storage nodes - make sure they have PXE boot configured correctly.

    Info

    If the node was part of a previous cluster build, HyperDrive or HyperCloud, then HyperCloud may refuse to wipe an existing data disk and ingest it into the cluster. If you are unable to boot another OS to [wipe the data disks], then you can create a cluster control fact to nuke the disks by running the following command: touch /var/run/cluster-control/facts/clusternode_NODENAME_nuke_disks, replacing NODENAME with the HyperCloud hostname of the machine in question.

    • Run ceph -s to verify

    Static Node OSDs
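
    OSD creation can also be followed as it progresses (a sketch; assumes the watch utility is available on the node):

      watch -n 10 ceph osd stat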

  5. Once all the storage nodes are up, run hypercloud-manage on any node and enter the following information:

        Compute (KVM) VLAN ID: 
        Storage VLAN ID:
        Dashboard system VLAN ID (optional): 
        Dashboard system IP Address (optional): 
        Dashboard system netmask: 
        Dashboard system default gateway (optional):
        Dashboard system DNS servers (optional):
        Dashboard system Syslog servers (optional): 
        Dashboard system NTP servers (optional):
    

    HyperCloud Cluster Manage 1

  6. Choose the remaining options in sequential order (2, 3, & 4) to show the proposed cluster changes (input values), commit the changes, and quit the HyperCloud Cluster Manager, respectively.

    Info

    If modifications are needed to the cluster variables, option 1 can be chosen again to re-enter the information, or option 4 can be chosen prior to committing the changes with option 3 in order to discard them.

Step 5 - Install the Compute Cluster

  1. Power up the nodes one by one, ensuring they PXE boot

    Compute Node

    Info

    There is no need to wait for them to boot fully.

    Completed Compute Primary

  2. Check that the HyperCloud dashboard comes up.

    Info

    • On the first compute node you should see the dashboard with virsh list.
    • You can connect to the command line of the dashboard with hypercloud-dashboard-console from that hostname.

    HyperCloud Dashboard Console

  3. You now have a running HyperCloud cluster

  4. You can pull the admin:password login from this dashboard node as well via:

    cat /var/lib/one/.one/one_auth

    Dashboard Password

  5. HyperCloud Dashboard can be configured via the Cloud Management GUI at https://<dashboard_ip>/cloudmanagement/, including adding custom SSH keys for authentication.

Step 6 - Connect to the Customer Infrastructure

  1. Plug in the customer uplinks
  2. Troubleshoot

    Warning

    This is typically the hardest part of the install, as it requires interfacing with the customer fabric. Up until this point, the entire HyperCloud installation is self-contained.

    If the customer side has not set up LACP properly, you may have to disconnect the uplink to the secondary switch to allow traffic to pass, as the primary switch allows for LACP bypass.

    • At this point, it would be VERY helpful to have a dump of the port configurations from the customer-side switch fabric.
    • Do they have LACP enabled on their side?
    • Is LACP on their side configured at “Rate: Slow”?
    • Are all VLANs trunked that are needed from the customer side?
    • Does the physical link layer come up? (i.e. Do you have a link light?)
      • If not, double-check interface speeds and/or FEC settings on both sides. On the Edge-Core side, see: SONiC Port Attributes
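
    A few SONiC commands can help answer these questions from the interconnect side (a sketch; availability and output format depend on the SONiC build):

      show interfaces status                  # link state, speed, and FEC per port
      show lldp table                         # what the customer fabric reports, if LLDP is enabled
      show interfaces transceiver presence    # confirm optics/DACs are detected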

Step 7 - Verify Customer can reach HyperCloud Cluster

Log in and help with first steps of having an empty cluster.

See HyperCloud Documentation - User Guide

Notes

  • To quickly set the BMC addresses (e.g.)

        ipmitool lan set 1 ipsrc static
        ipmitool lan set 1 ipaddr 10.11.1.NN
        ipmitool lan set 1 netmask 255.255.0.0
        ipmitool lan set 1 defgw ipaddr 10.11.0.251
    
  • To add extra VLANs on the SONiC switches (then set up the virtual net in the dashboard)

    • sudo config vlan add [VLAN_ID]
    • To add it tagged
      • sudo config vlan member add [VLAN_ID] PortChannelXX
        • where XX covers every port channel to every compute node and the uplink to the customer fabric
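
    For example, adding VLAN 250 tagged to two port channels and saving (a sketch; the VLAN ID and PortChannel names are illustrative):

        sudo config vlan add 250
        sudo config vlan member add 250 PortChannel1
        sudo config vlan member add 250 PortChannel2
        sudo config save -y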