Re: Ldap replica high cpu issue

I wanted to share these responses with the list. I received two replies (which were basically the same) sent directly to me rather than to the list, but I found them very helpful, so I want to make sure the information gets into the list archives:

Tim,

Check for users on ZCO (Zimbra Connector for Outlook) 5 and older ZCO 6 releases. In Zimbra 7 you can move them to a class of service (COS) that disables ZCO. I think ZCO 5 was especially bad at hammering the LDAP replica server. Moving those users cleared up most of our issues with high load on the LDAP replica.

To further improve autocomplete times, we also moved the GAL sync account and the logger host to their own dedicated mailstore with no user accounts on it. We did this for deployments with 50,000 or more users.

I'd bet that once you get ZCO 5 users to stop hammering you, you will see significantly better performance.

Also, here's a snippet from our support case with Zimbra on this a few months back:

Current set_cachesize value for the DB:

set_cachesize 0 424370176 0

Can we get it turned up to this?

set_cachesize 0 536870912 1
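For reference, the Berkeley DB `set_cachesize` directive used in slapd's DB_CONFIG takes three values: whole gigabytes, additional bytes, and the number of cache regions. The short Python sketch below (a hypothetical helper, not part of Zimbra) converts the two lines above to MiB so the size of the requested bump is easy to see.

```python
def parse_cachesize(line: str) -> tuple[int, int]:
    """Return (total_bytes, ncache) for a DB_CONFIG 'set_cachesize G B N' line."""
    parts = line.split()
    if len(parts) != 4 or parts[0] != "set_cachesize":
        raise ValueError(f"not a set_cachesize line: {line!r}")
    gbytes, nbytes, ncache = (int(p) for p in parts[1:])
    # Total cache = whole gigabytes (2**30 bytes each) plus the extra bytes.
    return gbytes * 2**30 + nbytes, ncache


def to_mib(nbytes: int) -> float:
    """Convert bytes to mebibytes (2**20 bytes)."""
    return nbytes / 2**20


current, _ = parse_cachesize("set_cachesize 0 424370176 0")
proposed, ncache = parse_cachesize("set_cachesize 0 536870912 1")
print(f"current:  {to_mib(current):.1f} MiB")
print(f"proposed: {to_mib(proposed):.1f} MiB in {ncache} region(s)")
```

This prints roughly 404.7 MiB for the current value and exactly 512.0 MiB for the proposed one. In Berkeley DB, an ncache of 0 or 1 both mean a single contiguous cache, so the change in the last field should not by itself alter the layout.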

----------
And this:

I just discussed the issue with a senior QA engineer. Apparently, with a large GAL it is very likely that we will see non-stop GAL sync requests, which lead to very high CPU utilization by slapd. You may also want to set zimbraGalSyncMaxConcurrentClients:

zimbraGalSyncMaxConcurrentClients
   Maximum number of concurrent GAL sync requests allowed on the system /
   domain.

   type : integer
   value :
   callback :
   immutable : false
   cardinality : single
   requiredIn :
   optionalIn : domain,globalConfig
   flags : domainAdminModifiable,domainInherited
   defaults : 2
   min :
   max :
   id : 1154
   requiresRestart :
   since : 7.0.0
   deprecatedSince :
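The idea behind a cap like zimbraGalSyncMaxConcurrentClients is ordinary admission control: at most N GAL sync requests run at once instead of every client hitting slapd simultaneously. The Python sketch below only illustrates that pattern with a semaphore sized to the attribute's documented default of 2. It is not Zimbra's implementation, `gal_sync` is a hypothetical stand-in, and whether Zimbra queues or rejects requests over the limit is an assumption not covered by the description above.

```python
import threading
import time

MAX_CONCURRENT = 2  # mirrors the attribute's documented default of 2

slots = threading.BoundedSemaphore(MAX_CONCURRENT)
lock = threading.Lock()
active = 0  # requests currently inside the protected section
peak = 0    # highest concurrency observed


def gal_sync() -> None:
    """Hypothetical GAL sync request; at most MAX_CONCURRENT run at a time."""
    global active, peak
    with slots:  # blocks (queues) until one of the slots is free
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # stand-in for the expensive LDAP search
        with lock:
            active -= 1


# Ten clients ask for a sync at roughly the same moment.
threads = [threading.Thread(target=gal_sync) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"peak concurrent GAL syncs: {peak}")
```

The trade-off is latency for load: the excess clients wait their turn, but slapd never serves more than the configured number of full GAL syncs at once.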

<Respondent's name withheld in case they prefer to keep a low profile>

From: "Tim Ross" <tross@calpoly.edu>
To: "zimbra-hied-admins" <zimbra-hied-admins@sfu.ca>
Sent: Tuesday, August 21, 2012 9:55:25 AM
Subject: Ldap replica high cpu issue

I have been working with Zimbra Support on this issue, but we haven't found the root cause yet. I wanted to throw this out to the list to see whether any of you have experienced this or have suggestions on what we might look at or which settings to tweak.

We just upgraded from ZCS 6.0.14 NE to 7.2.0. The Monday after the upgrade, users started reporting "Server Slow to Respond" warnings (a higher percentage of them IE users). With a little looking around, we found that the CPU load on our LDAP replica server had increased dramatically after the upgrade. Before the upgrade, the load stayed below 1 (as reported by top). After the upgrade, it ran at 5-7, with spikes up to 10 and 11. We created a GAL sync account on the mailstores, and that helped a little. We tweaked a couple of other settings, and those helped a minor amount as well. We now run at 3-4, with spikes up to 7-9 once or twice a day during heavy usage.

The LDAP replica server has 8 CPUs and 16 GB of RAM, so even with these loads we wouldn't really expect users to be receiving slow-server notices. We are running Red Hat 5, 64-bit. We have approximately 30,000 accounts, but closer to 2,000 moderate to heavy users. All our other servers show very low loads, including the master LDAP server. The slapd process (all threads combined) runs at 200-400% CPU (again, the %CPU column in top) most of the time.
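To put those numbers in perspective, it can help to normalize by core count: a load of 3-4 on 8 CPUs is roughly 0.4-0.5 per core, and 400% in top's %CPU column (default Irix mode, where 100% equals one fully busy core) is about four cores, i.e. half the box. The Python sketch below just does that arithmetic on the figures quoted above. Keep in mind that the Linux load average also counts tasks in uninterruptible (usually I/O) wait, so it is not a pure CPU measure.

```python
def load_per_core(load_avg: float, cpus: int) -> float:
    """Load average divided by core count; ~1.0 means every core is busy."""
    return load_avg / cpus


def top_pct_to_cores(pct_cpu: float) -> float:
    """Convert top's %CPU (Irix mode: 100% == one busy core) to cores."""
    return pct_cpu / 100.0


CPUS = 8  # the LDAP replica described above
samples = [
    ("pre-upgrade (<1)", 1),
    ("post-upgrade", 7),
    ("post-upgrade spike", 11),
    ("after tuning", 4),
]
for label, load in samples:
    print(f"{label:>20}: load {load:>2} -> {load_per_core(load, CPUS):.2f} per core")

for pct in (200, 400):
    cores = top_pct_to_cores(pct)
    print(f"slapd at {pct}% CPU ~= {cores:.0f} cores ({cores / CPUS:.0%} of the box)")
```

Only the spikes (10-11) push past one runnable task per core, which fits the observation that the box should have headroom most of the time.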

We have gone through the Zimbra Large Deployment Performance Guide and followed its settings advice as closely as we could. We also installed Patch 1 for ZCS 7.2.0, but that didn't resolve the issue.

Any ideas or suggestions? Any info I left out that would be useful?

Thanks,

Tim Ross
Application Administrator
Enterprise Applications Group
Cal Poly State University, San Luis Obispo