When Gaia was released with R75.40 in 2012, we upgraded our Checkpoint firewalls to it right away. Since then we have upgraded to R77.10 and R77.20, and we are now planning the move to R77.30. The experience with the new versions has been quite good, but recently we have started to feel that the Gaia CLI and Portal are getting slower and slower.
Symptoms:
For example, the SSH login process takes a couple of minutes to show the prompt. The WebUI consistently reports a lost database connection when saving any change, and you have to log in to the WebUI again. SNMP monitoring shows the device as up and reachable by ping, but no SNMP information can be polled. After a couple of minutes, sometimes 10 minutes or longer, everything goes back to normal. It does not happen all the time, just a couple of times per day. Most of the time, login and SNMP access are fine.
Sometimes the save config command also fails with a database timeout:
FW-CP2> save config
NMSCFD0026 Timeout waiting for response from database server.
Solutions:
Checkpoint has a couple of SKs relating to this issue, such as sk104761. According to sk104761, each change made in Gaia Clish or in Gaia Portal is saved as a revision in the Gaia Database, which is the /config/db/initial_db file. Once this file becomes large, the confd process consumes more CPU to read from it or to save new data to it.
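Since the slowdown is tied to the size of that file, a quick size check against a threshold can help spot the growth early. The following is a minimal sketch, assuming a Bash shell in Expert mode; the function name `db_size_check` and the 100 MB default limit are illustrative choices of mine, not values taken from the SK.

```shell
# Sketch: warn when a file (for example the Gaia configuration database)
# grows past a chosen size. The default limit is an arbitrary example
# value, not a vendor figure.
db_size_check() {
    local db_file="$1"
    local limit_bytes="${2:-104857600}"   # hypothetical default: 100 MB

    if [ ! -f "$db_file" ]; then
        echo "missing: $db_file" >&2
        return 2
    fi

    # wc -c reports the size in bytes; tr strips padding some wc builds add
    local size_bytes
    size_bytes=$(wc -c < "$db_file" | tr -d ' ')

    if [ "$size_bytes" -gt "$limit_bytes" ]; then
        echo "WARNING: $db_file is $size_bytes bytes (limit $limit_bytes)"
        return 1
    fi
    echo "OK: $db_file is $size_bytes bytes"
    return 0
}

# Example: db_size_check /config/db/initial_db
```

The function only reads the file size and never modifies the database, so it should be safe to run from a scheduled job if you want a trend over time.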
[Expert@CP-DMZ-1:0]# cd /config/db
[Expert@CP-DMZ-1:0]# ls -l
total 218836
-rw-r--r-- 1 admin root 133250 Sep 20 13:28 initial
-rw-r--r-- 1 admin root 223720448 Sep 20 13:28 initial_db
The initial_db file has grown to about 220 MB. So what is initial_db, and can we delete it? The answer, of course, is no.
From sk101273, “The /config/db/initial file must be present and valid (in other words, not corrupted) at boot time for IP Series Appliance to get configured. Otherwise, the IP Series Appliance will go into first-time boot mode and attempt to configure itself using DHCP, or wait for the user to configure it through the serial console port.”
Let's take a look at what is inside:
[Expert@CP-1:0]# cat initial
# This file was AUTOMATICALLY GENERATED
# Generated by /bin/confd on Sun Sep 20 13:28:32 2015
#
# DO NOT EDIT
#
configurationChange t
centrallyManaged t
inactto:default 720
# DO NOT EDIT file was AUTOMATICALLY GENERATED by /bin/confd on Tue 6 16:16:12 NOT EDIT
resolv:resolver:1 8.8.8.8
ntp:server:10.9.16.5 t
ntp:server:10.9.16.5:version 1
ntp:server:10.9.16.5:iburst t
ntp:server:10.9.16.5:prefer t
ntp:server:10.4.4.27 t
ntp:server:10.4.4.27:version 1
ntp:server:10.4.4.27:iburst t
ntp:servers:primary 10.9.16.5
ntp:servers:secondary 10.4.4.27
dhcp:dhcpc:interface:eth3 t
dhcp:dhcpc:interface:eth3:timeout 60
dhcp:dhcpc:interface:eth3:retry 300
dhcp:dhcpc:interface:eth3:reboot 10
machine:hostname FW-GRU1-CP1
update_upgrade_info:set_counter f 5 17:23:44
installer:available_install_packages_number 4
installer:available_download_packages_number 7
installer:category_is_aligned:3 1
installer:category_is_aligned:5 1
installer:category_is_aligned:1 1
installer:category_is_aligned:4 0
installer:ftw_random_res 1
installer:d_weekday Saturday
installer:d_hours 17
installer:d_minutes 30
…
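The file reads as a flat list of colon-separated binding names, each followed by a value. Since it must not be edited, a read-only filter is a safer way to browse it than an editor. The following is a rough sketch; the function name `bindings_with_prefix` is my own, and it assumes the on-disk file holds one "name value" pair per line with "#" marking comment lines.

```shell
# Sketch: print the bindings in an "initial"-style file whose name starts
# with a given prefix. Read-only: the file itself is never modified.
# Assumes one "name value" pair per line and "#" comment lines.
bindings_with_prefix() {
    local file="$1"
    local prefix="$2"
    awk -v p="$prefix" '
        /^#/ || NF == 0 { next }       # skip comments and blank lines
        index($1, p) == 1 { print }    # binding name begins with the prefix
    ' "$file"
}

# Example: bindings_with_prefix /config/db/initial ntp:
```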
To check whether the system is affected:
- Log in to Expert mode.
- Back up the current Gaia configuration database:
[Expert@HostName]# cp /config/db/initial_db /config/db/initial_db_backup
- Connect to the Gaia configuration database:
[Expert@HostName]# sqlite3 /config/db/initial_db
- Query the database with SQLite to identify the issue:
sqlite> select * from revisions where time like "%1969%";
If any entries are returned, the system is likely experiencing this issue.
- Exit from SQLite:
sqlite> .exit
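The query step above can also be run without an interactive session by passing the SQL statement to sqlite3 as an argument, which is handy when checking several gateways. This is a minimal sketch under that assumption; the function name `count_1969_revisions` is hypothetical, and it relies only on the table and column names that appear in the query above (`revisions`, `time`).

```shell
# Sketch: count revisions whose timestamp contains 1969, non-interactively.
# Uses only the table and column from the query above (revisions, time).
count_1969_revisions() {
    local db_file="$1"

    if [ ! -f "$db_file" ]; then
        echo "missing: $db_file" >&2
        return 2
    fi

    # Single quotes mark a string literal in SQL; the shell passes the
    # whole statement to sqlite3 as one argument.
    sqlite3 "$db_file" "select count(*) from revisions where time like '%1969%';"
}

# Example, run against the backup copy made in the earlier step:
# count_1969_revisions /config/db/initial_db_backup
```

Pointing it at the backup copy rather than the live file avoids touching the database that confd is using. A non-zero count corresponds to "entries are returned" in the manual procedure.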
Running the query on one of our firewalls returned many such entries:
[Expert@FW-GRU1-CP1:0]# sqlite3 /config/db/initial_db
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> sqlite> select * from revisions where time like "%1969%";
Error: near "sqlite": syntax error
sqlite> select * from revisions where time like "%1969%";
cluster:shared_feature_lock:admin|0|||||1969-12-31 19:00:00|1
cluster:shared_feature_lock:cadmin|0|||||1969-12-31 19:00:00|1
cdm:per_exec|0|||||1969-12-31 19:00:00|1
cdm:total|0|||||1969-12-31 19:00:00|1
cdm:enable|0|||||1969-12-31 19:00:00|1
lcd:screensaver:mode|0|||||1969-12-31 19:00:00|1
lcd:screensaver:timeout|0|||||1969-12-31 19:00:00|1
lcd:backlight:support|0|||||1969-12-31 19:00:00|1
zoneinfo:Atlantic:Faroe|0|||||1969-12-31 19:00:00|1
zoneinfo:Atlantic:Stanley|0|||||1969-12-31 19:00:00|1
zoneinfo:Atlantic:Canary|0|||||1969-12-31 19:00:00|1
zoneinfo:Atlantic:St_Helena|0|||||1969-12-31 19:00:00|1
zoneinfo:Atlantic:South_Georgia|0|||||1969-12-31 19:00:00|1
……
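The 1969-12-31 19:00:00 values are what Unix time 0 looks like when displayed five hours behind UTC. That suggests these revisions carry a zero or unset timestamp, although this is my own inference and not something stated in the SKs quoted here. The sketch below shows the arithmetic; it assumes GNU date (the `-d @SECONDS` form), and the function name `epoch_zero_in` is illustrative.

```shell
# Sketch: render Unix time 0 in a given time zone (GNU date syntax).
# EST5 is a POSIX time zone string meaning a fixed offset of UTC-5.
epoch_zero_in() {
    local tz="$1"
    TZ="$tz" date -d @0 '+%Y-%m-%d %H:%M:%S'
}

# epoch_zero_in UTC    prints 1970-01-01 00:00:00
# epoch_zero_in EST5   prints 1969-12-31 19:00:00
```

This also explains why the SK's filter looks for "1969": on a system set to a time zone west of UTC, a zero timestamp falls on the last day of 1969.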
Once the cause is confirmed, contact Checkpoint Support to obtain a hotfix and apply it.
Other SKs, such as sk95238 and sk102988, describe a similar solution for this issue. Basically, the fix is included in a Jumbo Hotfix.
Reference:
sk95238 (‘confd’ daemon consumes the CPU up to 100% when using Gaia Portal)
sk102988 (‘monitord’ and ‘confd’ processes consume 100% CPU)
sk102994 (Clish in Gaia OS is very slow when making any changes in Gaia OS configuration)