Migrating Security Gateways from Checkpoint R65 to R71.10 on HP/Intel Hardware

Having completed this process several times, I thought other people might find the experience useful.

Having built a new SmartCenter on R71.10, this is the process I used to migrate R65 gateways to R71.10 with a minimum of hassle. All the gateways run SPLAT on late-model HP DL360 G5s with Intel/HP NICs. All the hardware IS on the HCL, but this doesn't seem to make a blind bit of difference, and we've had issues with duplex retention and negotiation since moving to R65 on kernel 2.6; R71.10 on 2.6 hasn't fixed this either, and I've yet to look at R75.

This process reuses the same R65 firewall object in the policy and upgrades it to R71.10. I've not had any issues with doing this, as new certificates are issued and it reduces the amount of configuration needed. I'm sure it's still better practice to recreate it from scratch, mind.

  1. Log onto the SmartCenter.
  2. Reset SIC for your device.
  3. Change the platform to the proper security blade license – I use a separate management server, so there's no need for the management products. Don't be tempted to tick any new feature boxes, as they'll mess with the license if you've upgraded it.
  4. Push policy
  5. Install R71.10 on the hardware, plus any other products you've paid for.
  6. Determine the interfaces and label them on paper. There is no rhyme or reason to the names or the order it assigns them, totally different from R65, and if you install it twice on the same hardware it'll still randomise them and they won't match up. All I do is unplug all of them, plug one in, scan during the initial install when it's looking for a management interface, note down its interface name, label the cable, plug in another, scan, note down the interface, ad infinitum. You can also do this from the shell once it's installed by unplugging all the NICs and getting it to flash the lights on each port (see the sketch after this list), but that seems like more aggro than it's worth; besides, you need to ID them before you can even get into the shell.
  7. Step through the install, giving the device its own internal management/LAN address as the default gateway (remove this from the web GUI later).
  8. Reboot
  9. Log on through a web browser and define the interfaces according to step 6.
  10. Define routes, deleting the default internal route and creating an external default route.
  11. Use Google or your provider's DNS (Google is 8.8.8.8 and 8.8.4.4) and an EXTERNAL NTP server for setting the time; I use pool.ntp.org as my primary and set it to sync every 60 seconds. Leave the rest pretty much on default.
  12. Allow it to build
  13. Reboot again
  14. Log back onto the SmartCenter.
  15. Establish SIC
  16. Go to topology and use Get Interfaces (don't use Get Interfaces with Topology, as it'll create a bunch of new objects which will break your existing firewall object in the policy).
  17. Check the interfaces look right and match up with steps 6 and 9, and make sure internal/external/networks behind are set correctly.
  18. Push policy – you should be online. Now comes all the stupid hoop-jumping to get the duplex to work, both initially and after a reboot, on HP/Intel NICs and others I suspect. Oh, by the way, thanks for fixing this, Checkpoint; it's only been a much-reported issue for three years with half the hardware on the HCL. Essentially, if you set the duplex from expert mode, it will be forgotten on restart, and don't even think about setting it in the GUI – the GUI duplex switches are about as much use as a chocolate fireguard.
  19. Log onto fw through ssh or console
  20. Enter expert mode
  21. Run mii-tool and review the interface list. If the duplex settings look good, try changing them to manually set and see if you get reconnected (obviously give them 60 seconds to re-establish); if not, leave them on auto. It seems that even with this new version of SPLAT, if you manually define a setting, sometimes the other side will not accept it despite defaulting to 100/full or 10/full when on auto… nice. Either way, this is the most important thing to get right in the process, as it will muck up performance for everything if you have any interfaces flapping around on half duplex on one side.
  22. To manually set a duplex from the expert shell: ethtool -s eth1 (or whatever) speed 100 duplex full autoneg off
  23. To reset an interface back to auto: ethtool -s eth1 (or whatever) autoneg on
  24. To review interface status: mii-tool
  25. To review an interface for collisions after changing it: ethtool -S eth1 (or whatever)
  26. To make the duplex persistent across reboots you need to enter the correct commands in /etc/rc.d/rc.local; see below for an example. To get there from the expert shell, type cd /etc/rc.d, then vi rc.local. If you haven't used vi before, look at: http://acms.ucsd.edu/info/vi_tutorial.shtml
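
As referenced in step 6, if you would rather identify the ports from the shell, the standard Linux tools are available in expert mode. A minimal sketch only, assuming your NIC driver supports LED blinking (the interface names are just examples):

    # List the interfaces the install created
    ifconfig -a | grep eth

    # Blink the port LED on a given interface for 10 seconds so you can label the cable
    ethtool -p eth0 10
    ethtool -p eth1 10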
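
And for step 26, this is the sort of thing that ends up in /etc/rc.d/rc.local to make the duplex stick across reboots. A sketch only: the interface names and the 100/full values are placeholders, so match them to whatever mii-tool told you:

    # Appended to /etc/rc.d/rc.local - force speed/duplex at boot
    # (eth1/eth2 and 100/full are examples; use your own interfaces and settings)
    ethtool -s eth1 speed 100 duplex full autoneg off
    ethtool -s eth2 speed 100 duplex full autoneg off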

Test the manual duplex setting with a reboot and then go back in and look at mii-tool. I don't use the web interface for this purpose, as the web interface tells LIES: it has no idea what the duplex/speed settings actually are, and when you change the settings it's like the buttons on one of those kiddies' helicopter rides at the supermarket – you can press them all day and it won't make a damn bit of difference; nothing happens.

License the fw from the User Center:

  1. Use software blades (R70 and above)
  2. Download and install from SmartUpdate.
  3. As soon as you install the license you'll probably lose connectivity. This is because a big red light just came on on someone's desk in Tel Aviv, notifying Checkpoint that you were using more cores/features than you have paid for, and the only way to restore connectivity is a reboot so the kernel can slam the door on all that processing power we don't get for only 35,000 USD per year (excluding hardware).
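
If you want to sanity-check what actually attached once the box is back up, the standard Checkpoint command-line tools on the gateway will tell you; a quick look from expert mode, nothing more:

    # List the licenses attached to this gateway and the blades they cover
    cplic print

    # Confirm the installed version while you're there
    fw ver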

Et Voila! A brand new shiny R71.10 firewall. Inspection throughput is massively improved in my situation, though we will only notice this on the UTM 270 in a branch office, as despite being only 12 months old its beating heart is a 600 MHz Celeron from ten years ago; it still cost 4,500 USD though. As the other sites are on Xeons and open servers, we're not even getting close to the performance ceiling of a single core (200 Mbps real world with sensible scanning and SD, according to www.cpug.org).

VM Cluster using HA StarWind iSCSI storage and you don't have NASA-style data centre redundancy? Don't bother yet. Major problems.

I'll leave this in place, but I should mention that the concerns I raise below are no longer valid as of version 5.7. See my follow-up post here: StarWind 5.7/5.6 and recovery from both nodes down.

__________________________________

It seems StarWind consider HA to be slightly more exclusive than their site and marketing blurb let on.

I understand that true HA means never, ever off, but even investment banks have occasional power-downs, just to prove they can start systems up again afterwards. Beware though: if you ever (and I mean EVER) want to contemplate turning your clustered storage off for a period of time due to a building power cut/act of god/whatever, then for now, pick another solution.

It works great if one node is up full time, which I suppose is possible if you are NASA, but it's good practice for all organizations to do an occasional power-off, and every so often, even in London, you have a long power outage or there is building maintenance.

Essentially, the issue is that if you power down both nodes of a storage cluster gracefully after powering down your Hyper-V/Xen/VMware cluster, you will not be able to get them up again without MANUALLY specifying the most recent copy of the data (a major issue if you get this wrong and are running any DB app) and then sitting through a FULL synchronisation. 200 GB took almost 12 hours in my test environment, during which the cluster was inaccessible as the storage was not accepting incoming connections. In production this would mean your supposedly HA environment would be offline until the storage had done a pointless full sync between nodes.

I checked out the StarWind forum, where they claim this is by design, which is totally ridiculous. There are degrees of HA, and it's not often a midsize company can afford separate power supply companies at either end of the building, which seems to be where most people lose out. For example, we planned to have redundant hosts, redundant storage units and redundant switches, all on redundant UPSs, but we only have one provider supplying electricity. To totally eliminate the viability of this platform by not implementing a last-write flag on the storage is insane.

Essentially this means a great product is ruined for a large number of its users. A real shame. There is a workaround, outlined in the link below, but it's risky and involves judging for yourself which replica is most current, deleting the targets, recreating them and then recreating ALL the iSCSI connections on the cluster. Absolutely crazy. In my test environment this took me almost an hour first time round.
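
For a sense of the legwork involved, the reconnection half of that workaround on a Windows/Hyper-V node boils down to pointing the Microsoft iSCSI initiator back at the recreated targets on each host. A rough sketch using the built-in iscsicli tool; the portal address and target IQN below are made-up placeholders, and the StarWind-side target recreation still has to be done in its own console first:

    REM Register the recreated StarWind portal with the initiator (address is a placeholder)
    iscsicli QAddTargetPortal 192.168.0.10

    REM List the targets the portal now exposes
    iscsicli ListTargets

    REM Log back in to a recreated target (IQN is a placeholder taken from ListTargets output)
    iscsicli QLoginTarget iqn.2008-08.com.starwindsoftware:target1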

Check this out:

http://www.starwindsoftware.com/forums/starwind-f5/full-sync-requirement-both-nodes-are-powered-off-t2132.html

If anyone else has had their implementation hobbled by this oversight, I'd love to hear from you. I'd also be keen to hear when this is addressed in a workable way by StarWind, as it does not seem to be a limitation they shout about in the marketing department.