SAN Intro
Replacing Junos Pulse with OpenConnect
To avoid using the Juniper Pulse (now Pulse Secure) VPN client we tried OpenConnect, but found that DNS did not work correctly when connected to the VPN. That bug has recently been fixed, but the fix has not made its way into a new build; in fact there have been no releases for six months.
Luckily, OpenConnect was not too difficult to build from source.
Build OpenConnect on OS X

Remove the old openconnect and install deps:

brew remove openconnect
brew install libxml2 lzlib openssl libtool libevent

Build openconnect:

wget git.infradead.org/users/dwmw2/openconnect.git/snapshot/0f1ec30d17aa674142552e275bf3fac30d891b39.tar.gz
tar zxvf 0f1ec30d17aa674142552e275bf3fac30d891b39.tar.gz
cd openconnect-0f1ec30
LIBTOOLIZE=glibtoolize ./autogen.sh
PATH=/usr/local/opt/gettext/bin:$PATH ./configure
make
make install

To connect:

sudo openconnect --juniper -u myusername www.
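Once built, the connect command can be wrapped in a small script. The sketch below is only an illustration: the gateway hostname vpn.example.com and the username are placeholders, and it assumes make install put the binary in /usr/local/sbin (the default prefix) - adjust both for your environment.

#!/bin/sh
# connect-vpn.sh - hypothetical wrapper around the locally built openconnect.
# vpn.example.com and myusername are placeholders; substitute your own gateway and user.

OC=/usr/local/sbin/openconnect   # assumed install path from `make install` with the default prefix

# Confirm we are running the freshly built binary, not an old Homebrew copy.
"$OC" --version

# --juniper selects the Juniper/Pulse protocol supported by this build.
exec sudo "$OC" --juniper -u myusername vpn.example.com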
SSD Storage - Two Months In Production
Over the last two months I've been running selected IO-intensive servers off the SSD storage cluster. These hosts include (among others) our:
- Primary Puppetmaster
- Gitlab server
- Redmine app and database servers
- Nagios servers
- Several Docker database host servers

Reliability

We haven't had any software or hardware failures since commissioning the storage units.
During this time we have had 3 disk failures on our HP StoreVirtual SANs that required us to call the supporting vendor to replace the failed disks.
We have performed a great many live cluster failovers without any noticeable interruption to services and with no unexpected results.
OS X Software Update Channels For Betas
Set the update channel to receive developer beta updates:

sudo softwareupdate --set-catalog https://swscan.apple.com/content/catalogs/others/index-10.11seed-10.11-10.10-10.9-mountainlion-lion-snowleopard-leopard.merged-1.sucatalog.gz

Set the update channel to receive public beta updates:

sudo softwareupdate --set-catalog https://swscan.apple.com/content/catalogs/others/index-10.11beta-10.11-10.10-10.9-mountainlion-lion-snowleopard-leopard.merged-1.sucatalog.gz

List available updates:

sudo softwareupdate --list

Set the update channel back to the default, stable updates:

sudo softwareupdate --clear-catalog

Show the current setting:

defaults read /Library/Preferences/com.apple.SoftwareUpdate.plist

Write the setting manually:

defaults write /Library/Preferences/com.apple.SoftwareUpdate CatalogURL https://swscan.apple.com/content/catalogs/others/index-10.11beta-10.11-10.10-10.9-mountainlion-lion-snowleopard-leopard.merged-1.sucatalog.gz
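To avoid retyping the catalog URLs, the commands above can be wrapped in a small helper. The script below is just a sketch and reuses only the commands already shown; the script name is hypothetical.

#!/bin/sh
# set-update-channel.sh - hypothetical helper around the commands above.
# Usage: sudo ./set-update-channel.sh dev|beta|stable

SEED_URL="https://swscan.apple.com/content/catalogs/others/index-10.11seed-10.11-10.10-10.9-mountainlion-lion-snowleopard-leopard.merged-1.sucatalog.gz"
BETA_URL="https://swscan.apple.com/content/catalogs/others/index-10.11beta-10.11-10.10-10.9-mountainlion-lion-snowleopard-leopard.merged-1.sucatalog.gz"

case "$1" in
  dev)    softwareupdate --set-catalog "$SEED_URL" ;;
  beta)   softwareupdate --set-catalog "$BETA_URL" ;;
  stable) softwareupdate --clear-catalog ;;
  *)      echo "usage: $0 dev|beta|stable" >&2; exit 1 ;;
esac

# Show what the channel now points at (the CatalogURL key is absent on the stable channel).
defaults read /Library/Preferences/com.apple.SoftwareUpdate.plist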
iSCSI Benchmarking
The following are benchmarks from our testing of our iSCSI SSD storage.
67,300 read IOP/s on a VM on iSCSI (Disk -> LVM -> MDADM -> DRBD -> iSCSI target -> Network -> XenServer iSCSI Client -> VM). Per VM, and scales to 1,000,000 IOP/s total.

root@dev-samm:/mnt/pmt1 128 # fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=128 --size=2G --readwrite=read
test: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=128
2.0.8
Starting 1 process
bs: 1 (f=1): [R] [55.6% done] [262.1M/0K /s] [67.3K/0 iops] [eta 00m:04s]

38,500 random 4k write IOP/s on a VM on iSCSI (Disk -> LVM -> MDADM -> DRBD -> iSCSI target -> Network -> XenServer iSCSI Client -> VM). Per VM, and scales to 700,000 IOP/s total.

root@dev-samm:/mnt/pmt1 # fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=128 --size=2G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=128
2.
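To repeat these two runs on another host, something like the sketch below works. It assumes fio is installed and reuses the exact flags shown above, with /mnt/pmt1 (taken from the prompt above) standing in for whatever filesystem is under test.

#!/bin/sh
# fio-bench.sh - rerun the two fio tests shown above (4k sequential read, 4k random write).
# Assumes fio is installed and /mnt/pmt1 is the filesystem under test.

cd /mnt/pmt1 || exit 1

# 4k sequential reads, queue depth 128, 2G working set
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=seqread --filename=test --bs=4k --iodepth=128 --size=2G --readwrite=read

# 4k random writes, same parameters
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=randwrite --filename=test --bs=4k --iodepth=128 --size=2G --readwrite=randwrite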
Delayed Serial STONITH
A modified version of John Sutton’s rcd_serial cable coupled with our Supermicro reset switch hijacker:
This works with the rcd_serial fence agent plugin.
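On the Pacemaker side, a fence resource using this plugin would be declared once per node. The sketch below is only an outline: node-a and /dev/ttyS0 are placeholders, and the parameter names (hostlist, ttydev, a DTR/RTS selector, msduration) are my recollection of the cluster-glue plugin, so confirm them with `stonith -t rcd_serial -n` before copying anything.

# List the parameters the plugin actually expects on your cluster-glue build:
stonith -t rcd_serial -n

# Hypothetical crm configuration: one fence device per node, kept away from the node
# it is able to reset. Parameter names must match the output of the command above.
crm configure primitive fence-node-a stonith:rcd_serial \
    params hostlist="node-a" ttydev="/dev/ttyS0" dtr_rts="dtr" msduration="1000"
crm configure location fence-node-a-placement fence-node-a -inf: node-a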
Reasons rcd_serial makes for a very good STONITH mechanism:
- It has no dependency on power state.
- It has no dependency on network state.
- It has no dependency on node operational state.
- It has no dependency on external hardware.
- It costs less than $5 plus the time to build it.
- It is incredibly simple and reliable.

The most common type of STONITH agent in use is probably the kind that controls a UPS / PDU. While this sounds like a good idea in theory, there are a number of issues with relying on a UPS / PDU:
Video - Cluster Failover Performance Demo
CentOS 7 and HA
First some background…
One of the many lessons I've learnt from my Linux HA / storage clustering project is that the Debian HA ecosystem is essentially broken. We reached the point where packages were too old, too buggy or - in Debian 8's case - outright missing.
In the past I was very disappointed with RHEL/CentOS 5 / 6 and (until now) have been quite satisfied with Debian as a stable server distribution that historically shipped more modern packages and kernels.
I feel that CentOS / RHEL 7 has changed the game.*
(When combined with ElRepo or EPEL, which provide a wide array of modern packages.)
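For completeness, a minimal sketch of enabling both repositories on a fresh CentOS 7 host; the ELRepo release RPM URL follows the pattern published on elrepo.org and the exact package name should be checked against the site before use.

# EPEL ships in the CentOS extras repository:
yum install epel-release

# ELRepo - import the signing key, then install the release package
# (URL pattern from elrepo.org; verify the current release RPM on the site):
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm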
SSD Storage Cluster - Update and Diagram
Due to several recent events beyond my control I'm a bit behind on the project - hence the lack of updates, for which I apologise.
The good news is that I'm back working to finish off the clusters, and I'm happy to report that all is going to plan.
Here is the final diagram of the two-node cluster design:
Plain text version available here
This was generated with the LCMC tool (beware - it's Java!).
More on this soon…
Video - Storage Cluster Failover Demo
A brief demonstration of the failover and recovery process on the storage clusters I’ve been building.