HyperCloud Configuration
Configure Interconnects
The interconnects shipped run SONiC today. This will change in a future release of the HyperCloud product. The interconnects are NOT intended to be used as standard switches.
Info
If any issues occur during the configuration of the interconnects, follow the link below to reset the interconnects to their default state and restart the configuration procedure:
[Enterprise SONiC] Reset default configuration
To check for errors, run `echo $?` after running the configuration scripts. In the event of a non-zero exit status, the following steps will wipe the interconnect so the process can be restarted:
- Connect to the Management Interconnect via either the USB A or Serial port on the left side of the device with a serial baud rate of 115200 baud. The default credentials are `admin:YourPaSsWoRd`.
- Verify the interconnect is running the latest approved SONiC firmware (verify the switch boots into the SONiC console, then log in and run `show version`).
- If the interconnect is not running SONiC (boots to ONIE), OR if the interconnect is running an alternative release of SONiC, install the firmware image on the interconnect.
- Leave default passwords in place (at least until handover).
- Confirm gathered [customer information].
- Untar the SI SW Interconnect Configurations Tarball locally and generate the initial configurations as detailed in the `README.md` file, found under the `docs/` subdirectory in the tarball.

Tarball contents:
```
./
./opt/
./opt/softiron/
./opt/softiron/switch-config/
./opt/softiron/switch-config/docs/
./opt/softiron/switch-config/docs/README.md
./opt/softiron/switch-config/EXAMPLE/
./opt/softiron/switch-config/EXAMPLE/cluster.vars
./opt/softiron/switch-config/EXAMPLE/mgmt-switch-oob.vars
./opt/softiron/switch-config/EXAMPLE/high-speed-switch.vars
./opt/softiron/switch-config/edgecore/
./opt/softiron/switch-config/edgecore/4630/
./opt/softiron/switch-config/edgecore/4630/bin/
./opt/softiron/switch-config/edgecore/4630/bin/oob.sh
./opt/softiron/switch-config/edgecore/7326/
./opt/softiron/switch-config/edgecore/7326/bin/
./opt/softiron/switch-config/edgecore/7326/bin/high-speed-switch.sh
```
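Extraction can be sketched as follows; the tarball filename here is an assumption (use the delivered file's actual name). A dummy tarball matching the documented layout is built first, only so the commands run end to end:

```shell
# Build a stand-in tarball matching the documented layout (illustration only --
# in a real deployment you already have the delivered tarball).
mkdir -p opt/softiron/switch-config/docs
echo "configuration instructions" > opt/softiron/switch-config/docs/README.md
tar -czf switch-config.tar.gz opt
rm -rf opt

# The actual step: untar locally, then read the README under docs/.
tar -xzf switch-config.tar.gz
cat opt/softiron/switch-config/docs/README.md
```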
- Create the variable files
  - There are templates in the tarball that can be edited for each use case
  - Ensure that the variable files are in the same location as the OOB and High Speed Interconnect scripts
Cluster Configuration Variables:
Note
Customer VLANs are listed as a space-separated list. If there is only one VLAN ID, make it the OOB VLAN ID.
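As an illustration of the space-separated format (the variable name below is hypothetical; the real names are defined in the EXAMPLE templates shipped in the tarball):

```shell
# Hypothetical cluster.vars entry -- consult the EXAMPLE templates for the
# actual variable names. Space-separated customer VLAN IDs; if there is only
# one entry, it must be the OOB VLAN ID.
CUSTOMER_VLANS="100 200 300"
```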
High Speed Configuration Variables:
Management Configuration Variables:
Script compression and encoding
The steps below allow larger interconnect configurations to be compressed and sent over serial at a baud rate of 115200. Follow them for both the management and high speed interconnect configuration scripts.
- Generate the `up.sh` script locally before copying it over to the Management Interconnect.
- Execute the OOB script to generate the commands that will be used to configure the management interconnect.
  - The command will create and store the commands in a new file that can then be copied to the management interconnect.
- Generate the SHA-256 hash of the script: `sha256sum up.sh`
- Compress and BASE64 encode the `up.sh` script.
- Copy the output.
- Create a new `up.b64` file on the interconnect and paste the string from above.
- Decode and uncompress the `up.b64` file.
- Generate the SHA-256 hash of the `up.sh` script on the interconnect.
- Verify that the checksums from both the source and destination match EXACTLY.
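The exact compress/encode/decode commands are not reproduced here; a plausible round trip, assuming `gzip` and `base64` (both present on SONiC's Debian base), looks like the sketch below. A two-line script stands in for the generated `up.sh` so the sequence can be run end to end:

```shell
# Stand-in for the generated configuration script (illustration only).
printf '%s\n' '#!/bin/sh' 'echo configured' > up.sh

# On the source machine: hash, then compress and BASE64 encode.
sha256sum up.sh                        # note this hash
gzip -c up.sh | base64 > up.b64        # paste the contents of up.b64 over serial

# On the interconnect: decode and uncompress, then hash again.
base64 -d up.b64 | gunzip > up-decoded.sh
sha256sum up-decoded.sh                # must match the source hash exactly
```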
If not moving the file from above to the interconnect, the contents will need to be copied into a newly created file. Start by printing the contents to the screen via the command below:

```
cat up.sh
```

- On the management interconnect, create a new file, paste the contents from the `cat` command above, save, and make it executable.
  To accomplish this, run the following command to create the script file:

```
vi up.sh
```

  PASTE (`ctrl-v` or `⌘-v`) the copied text, then press Escape, then type `:wq`, and finally press Enter or Return.
- Make the script executable via `chmod 755 up.sh`. If the console will not execute the commands, the privileges may need to be elevated with `sudo`; in that case, execute the commands as `sudo chmod 755 up.sh`, `sudo ./up.sh`, etc.
- Example of contents below that will be pasted into the file:
```
sudo sh -c 'cat /etc/rc.local | head -n -1 > /etc/rc.local.new'
echo "/usr/sbin/ifconfig eth0 hw ether \$(/usr/sbin/ifconfig eth0 | grep ether | awk '{print \$2}' | awk -F: '{if (\$5==\"ff\") {\$5=\"00\"} else {\$5=sprintf(\"%02x\",(\"0x\"\$5)+1)} ; print \$1\":\"\$2\":\"\$3\":\"\$4\":\"\$5\":\"\$6\"\"}')" > /tmp/hwaddr-tmp-replace
sudo sh -c 'cat /tmp/hwaddr-tmp-replace >> /etc/rc.local.new'
sudo sh -c 'echo "exit 0" >> /etc/rc.local.new'
sudo mv /etc/rc.local.new /etc/rc.local
sudo chmod 755 /etc/rc.local
rm -f /tmp/hwaddr-tmp-replace
sudo config vlan add 1
sudo config interface ip add eth0 10.1.2.23/24 10.1.2.1
sudo config portchannel add PortChannel0 --fallback=true --lacp-key=1
sudo config portchannel member add PortChannel0 Ethernet50
sudo config portchannel member add PortChannel0 Ethernet51
sudo config vlan member add -u 1 PortChannel0
sudo config vlan member add -u 1 Ethernet0
sudo config vlan member add -u 1 Ethernet1
. . .
sudo config vlan member add -u 1 Ethernet45
sudo config vlan member add -u 1 Ethernet46
sudo config vlan member add -u 1 Ethernet47
sudo config save -y
```
- This newly created script can now be executed with `./up.sh`; afterwards the management interconnect will be configured. The console may report name-resolution failures when configuring the interconnects; this can be ignored, as there is no DNS in place to resolve these targets.
- It can be verified that the script ran with no errors with the command `echo $?` and a result of `0`.
- Follow up with a sync and reboot of the interconnect:

```
sync && sync && sync && sudo reboot
```
- The High Speed Interconnect configuration is next.
  - Ensure that the `high-speed-switch.sh`, `high-speed-switch.vars`, and `cluster.vars` files are all in the same directory on the local machine.
  - Ensure the variables have been filled out with the correct information for the deployment.
  - One interconnect will be the primary and the other will be the secondary.
- Execute the script for the primary High Speed Interconnect to generate the commands that will configure the interconnect.
  - Again, this command below will create and store the commands in a new file that will then be copied onto the Primary High Speed Interconnect:

```
./high-speed-switch.sh up primary > primary-hss.sh
cat primary-hss.sh
```

  - Or, on the Primary High Speed Interconnect, create a new file and paste the contents from the `cat` command above, save it, and make it executable:

```
vi primary-hss.sh
```

    PASTE the copied commands, then press Escape, then `:wq`, followed by Enter.

```
chmod 755 primary-hss.sh
```
- Execute this new script on the interconnect, followed by a sync and reboot:

```
./primary-hss.sh
sync && sync && sync && sudo reboot
```
- Now the secondary High Speed Interconnect will be configured, following the same process as the primary High Speed Interconnect.
  - Again, this command below will create and store the commands in a new file that will then be copied onto the Secondary High Speed Interconnect:

```
./high-speed-switch.sh up secondary > secondary-hss.sh
cat secondary-hss.sh
```

  - On the Secondary High Speed Interconnect, create a new file and paste the contents from the `cat` command above, save it, and make it executable:

```
vi secondary-hss.sh
```

    PASTE the copied commands, then press Escape, then `:wq`, followed by Enter.

```
chmod 755 secondary-hss.sh
```
- Execute this new script on the interconnect, followed by a sync and reboot:

```
./secondary-hss.sh
sync && sync && sync && sudo reboot
```
- Set customer uplink speed
  Only required if the customer uplink uses SFP28 ports and does not support 25 Gb/s, OR the customer uplink uses QSFP28 ports and does not support 100 Gb/s.
  - If using SFP28 ports and the customer uplink does not support 25 Gb/s (i.e. 10 Gb/s only):

```
sudo config interface breakout Ethernet43 '4x10G[1G]'
```

    This changes the port speed to 10 Gb/s on all child interfaces.
    - Child ports: `Ethernet43`, `Ethernet45`, `Ethernet46`, `Ethernet47`
  - If using QSFP28 ports and the customer uplink does not support 100 Gb/s (i.e. 40 Gb/s only):

```
sudo config interface speed Ethernet48 40000
```
- Save the configuration:

```
sudo config save -y
```

- Reboot and ensure everything has been configured properly with no errors.
- Ensure everything is healthy so far (i.e. on SONiC):

```
show mclag brief
show vlan config
```

  (or `show vlan brief`)
- Back up the configurations to be stored in Git later:

```
scp admin@[interconnect]:/etc/sonic/config_db.json [somewhere_local]
```
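To back up all the interconnects in one pass, a small helper script can be generated locally. The addresses below are placeholders, not the deployment's real management IPs:

```shell
# Generate a helper that pulls config_db.json from each interconnect.
# The three addresses are placeholders -- replace them with the actual
# management IPs of the management and high speed interconnects.
cat > backup-configs.sh <<'EOF'
#!/bin/sh
for sw in 10.1.2.23 10.1.2.24 10.1.2.25; do
  scp "admin@${sw}:/etc/sonic/config_db.json" "config_db_${sw}.json"
done
EOF
chmod 755 backup-configs.sh
```

Run `./backup-configs.sh` from the local machine, then commit the resulting files to Git.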
Step 4 - Install the Storage Cluster
Warning
To allow faster cluster convergence on an initial build, it is recommended to set the hardware clocks from the UEFI so that the nodes' times and dates are as close together as possible. If the clocks are far in the past, such as when a CMOS battery has failed, cluster convergence may take several hours.
- Power up the first static node with the serial console cable attached, configured to a baud rate of 115200.
- Choose to install static node 1.
- If this is the first time booting static node 1, the node will look at the boot disk and determine whether the system time has been set. If it is determined that the system time has never been set, the user will be prompted to respond to the query below:

```
The current system time is: 16:24:33 2023/10/25
Please choose one of the following:
1: Change the system time
2: Keep the current system time
> 1
Enter a new system time with the format: hour:minute:seconds year/month/day
For example: 23:08:41 2023/09/21
The system time must be UTC
> 16:25:30 2023/10/25
New system time is : 16:25:30 2023/10/25
Would you like to apply the new system time?
1: Apply the new system time
2: Change the new system time
> 1
```

- Once the user either confirms the system time or manually inputs the time in the specified format, the system will record it and no longer prompt for this information in future deployments.
- Wait for it to run through the remaining setup and checks.
- Run `lsb_release -a` to verify release information.
- Continue with the second static node
  - Choose to install static node 2
- Continue with the third static node
  - Choose to install static node 3
- Watch for the OSDs to be created
Note
Depending on how much time you have, you can now power on the rest of the storage nodes - make sure they have PXE boot configured correctly.
Info
If the node was part of a previous cluster build, HyperDrive or HyperCloud, then HyperCloud may refuse to wipe an existing data disk and ingest it into the cluster. If you are unable to boot another OS to [wipe the data disks], then you can create a cluster control fact to nuke the disks by running the following command, replacing `NODENAME` with the HyperCloud hostname of the machine in question:

```
touch /var/run/cluster-control/facts/clusternode_NODENAME_nuke_disks
```

- Run `ceph -s` to verify.
- Once all the storage nodes are up, run `hypercloud-manage` on any node and enter the following information:

```
Compute (KVM) VLAN ID:
Storage VLAN ID:
Dashboard system VLAN ID (optional):
Dashboard system IP Address (optional):
Dashboard system netmask:
Dashboard system default gateway (optional):
Dashboard system DNS servers (optional):
Dashboard system Syslog servers (optional):
Dashboard system NTP servers (optional):
```
- Choose the remaining options in sequential order (2, 3, & 4) to show the proposed cluster changes (input values), commit the changes, and quit the HyperCloud Cluster Manager, respectively.
Info
If modifications are needed to the cluster variables, option 1 can be chosen again to re-enter the information; alternatively, prior to committing the changes with option 3, option 4 can be chosen to discard them.
Step 5 - Install the Compute Cluster
- Power up the nodes one by one, ensuring they PXE boot.
Info
There is no need to wait for them to boot fully.
- Check that the HyperCloud dashboard comes up.
Info
  - On the first compute node you should see the dashboard with `virsh list`.
  - You can connect to the command line of the dashboard with `hypercloud-dashboard-console` from that hostname.
- You now have a running HyperCloud cluster.
- You can pull the admin:password login from this dashboard node as well via:

```
cat /var/lib/one/.one/one_auth
```

- HyperCloud Dashboard can be configured via the Cloud Management GUI at `https://<dashboard_ip>/cloudmanagement/`, including adding custom SSH keys for authentication.
Step 6 - Connect to the Customer Infrastructure
- Plug in the customer uplinks
- Troubleshoot
Warning
This is typically the hardest part of the install, as it requires interfacing with the customer fabric. Up until this point, the entire HyperCloud installation is self-contained.
If the customer side has not set up LACP properly, you may have to disconnect the uplink to the secondary switch to allow traffic to pass, as the primary switch allows for LACP bypass.
- At this point, it would be VERY helpful to have a dump of the port configurations from the customer-side switch fabric.
  - Do they have LACP enabled on their side?
  - Is LACP on their side configured at “Rate: Slow”?
  - Are all VLANs trunked that are needed from the customer side?
  - Does the physical link layer come up? (i.e. Do you have a link light?)
    - If not, double-check interface speeds and/or FEC settings on both sides. On the Edge-Core side, see: SONiC Port Attributes
Step 7 - Verify Customer can reach HyperCloud Cluster
Log in and help with first steps of having an empty cluster.
See HyperCloud Documentation - User Guide
Notes
- To quickly set the BMC addresses (e.g.)
- To add extra VLANs on the SONiC switches (then set up the virtual net in the dashboard):

```
sudo config vlan add [VLAN_ID]
```

  - To add it tagged:

```
sudo config vlan member add [VLAN_ID] PortChannelXX
```

    - XX meaning every port channel to every compute node and the uplink to the customer fabric
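Since the tagged-member command must be repeated for every port channel, the command list can be generated locally and reviewed before running it on the interconnect. The VLAN ID and port-channel names below are placeholders:

```shell
# Emit the SONiC commands to tag a VLAN (200 here, a placeholder) on each
# compute-facing port channel; the PortChannel names are also placeholders --
# substitute the deployment's actual port channels and customer uplink.
for pc in PortChannel1 PortChannel2 PortChannel3; do
  echo "sudo config vlan member add 200 ${pc}"
done > vlan-200-cmds.txt
cat vlan-200-cmds.txt   # review, then run these on the interconnect
```

Follow with `sudo config save -y` on the interconnect so the new VLAN membership persists across reboots.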