Friday, 28 December 2018

Alfresco server LB URL Access Slowness

Here are some of the points and findings from our investigation when accessing our application through the Load Balancer (LB) URL was slow compared to the node-specific URL:

1) The LB team only provides details/logs at runtime, while the issue is happening. They don't keep or track previous logs.
2) Check with the Alfresco product team/Ahmad for Alfresco performance benchmark points.
3) Try accessing the node-specific URL and compare its response time with that of the LB URL.
4) Keep checking the logs of all 4 servers: alf1, alf2, solr1, solr2.
5) Conduct performance testing (PT) with the PT team to measure performance with the node-specific URL and the LB URL.
6) Access the LB URL every 30 minutes or 1 hour and note exactly in which time window the slowness occurs.
7) Check the access logs in addition to the Alfresco server logs and Solr logs.
8) Check the Alfresco alfresco-global.properties files; compare the properties across all 4 nodes and validate them.
9) Check the Alfresco admin-summary page and the Repository Cluster settings: see whether only the alf nodes are listed in the cluster, or whether the Solr nodes are listed there too. Ideally, only the alf nodes should be listed under the cluster.
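
For item 8, one quick way to compare the four nodes is to diff their property files. The Python sketch below assumes each node's alfresco-global.properties has already been copied locally and read into a string; it uses a deliberately simplified parser (key=value lines only, no ':' separators or line continuations), so treat it as a starting point rather than a full .properties parser.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified .properties parser: only 'key=value' lines are read.

    Blank lines and comments starting with '#' or '!' are skipped. Java's
    ':' separator, whitespace separator and line continuations are NOT handled.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        props[key.strip()] = value.strip()
    return props


def diff_nodes(files: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Return {key: {node: value}} for every key that differs (or is missing) across nodes."""
    parsed = {node: parse_properties(text) for node, text in files.items()}
    all_keys = set()
    for props in parsed.values():
        all_keys.update(props)
    report: Dict[str, Dict[str, str]] = {}
    for key in sorted(all_keys):
        values = {node: props.get(key, "<missing>") for node, props in parsed.items()}
        if len(set(values.values())) > 1:  # at least two nodes disagree
            report[key] = values
    return report
```

For example, diff_nodes({"alf1": Path("alf1.properties").read_text(), "alf2": Path("alf2.properties").read_text(), ...}) (file names hypothetical) returns only the keys whose values differ, so identical settings don't clutter the comparison.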

10) Increase the JVM heap memory (-Xms and -Xmx).
11) Measure CPU usage and memory, and take thread dumps at the time of the slowness, with tools like New Relic. Also check the DB logs and DB memory/performance. If there is a big spike, check with the DB team and try to get the slow queries; they might be causing the slowness.
12) Have team members at different locations access the LB URL and check whether the issue is location-specific.
13) Disable everything that is not required, such as activity feeds, auditing and the transformation server; disable full-text indexing for nodes where it is not needed, keeping limited indexing for specific nodes; and disable background schedulers that are not needed.
14) Raise a ticket with the LB team if the issue seems to be with the LB.
15) Raise a ticket with the Alfresco product team if the issue seems to be with the Alfresco product itself.
16) Our experience with the Alfresco product so far: keep Alfresco only for storage, and don't use Alfresco Share as the direct client for users. It is mostly painful performance-wise, especially when Share is customized. Use CMIS queries to Alfresco from the front end whenever possible.
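
For item 16, a front end can send CMIS queries over plain HTTP using the CMIS 1.1 browser binding. The sketch below assumes the Alfresco 5.x browser binding endpoint path in CMIS_BROWSER_PATH and HTTP basic authentication; the host name and query in the usage example are hypothetical, and with SAML in front of the repository you may need a different authentication mechanism.

```python
import base64
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: Alfresco 5.x exposes the CMIS 1.1 browser binding at this path.
CMIS_BROWSER_PATH = "/alfresco/api/-default-/public/cmis/versions/1.1/browser"


def cmis_query_url(base_url: str, statement: str, max_items: int = 100, skip_count: int = 0) -> str:
    """Build a CMIS browser-binding query URL (cmisselector=query, q=<statement>)."""
    params = {
        "cmisselector": "query",
        "q": statement,
        "searchAllVersions": "false",
        "maxItems": max_items,      # page size, keeps each response small
        "skipCount": skip_count,    # offset for paging
        "succinct": "true",
    }
    return base_url.rstrip("/") + CMIS_BROWSER_PATH + "?" + urlencode(params)


def run_cmis_query(base_url: str, statement: str, user: str, password: str,
                   timeout: float = 30.0) -> dict:
    """Send the query with HTTP basic auth and return the decoded JSON response."""
    request = urllib.request.Request(cmis_query_url(base_url, statement))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, run_cmis_query("http://alf2.example.com:8080", "SELECT cmis:objectId, cmis:name FROM cmis:document WHERE cmis:name LIKE 'Invoice%'", "test1", "test1") asks the repository for a page of matching documents directly, without going through Share pages.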

17)
a) Remove LB, keep SAML
Bypass the Load Balancer and keep SAML enabled on Node 2.
This change tests whether the Load Balancer plays a key role in the slowness.
You will be able to log in through your SSO as usual, without any changes.
Use the alf2 link (http, port 8080) for direct node-specific access with your SSO.
To continue the node-specific testing, replace the https DNS URL with the http 8080 (node-specific) URL. (Remember that the node-specific link is http, whereas the LB link is https.)

b) Remove LB, remove SAML
Open one of the Solr nodes to test performance without the LB and SAML.
This change tests performance without any additional layer for authentication or redirection (the same way we access the node-specific URL with the admin user).
Access the node-specific URL and provide “test1/test1” as the user ID/password.
To continue the node-specific testing, replace the LB URL with the http solr2 8080 link. (Remember that the node-specific link is http, whereas the LB link is https.)
You should be asked for the Alfresco user “test1/test1”.
At the same time, the regular LB URL continues to work as usual.
Ask users to test both changes above against the LB URL and share their observations.
If a) is still slow, SAML rather than the LB is playing the key role in the slowness, and we need to focus in that direction. b) should be faster, as it has neither the LB nor SAML.
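
The decision rule above can be written down explicitly so everyone reads the test results the same way. This sketch takes response-time samples (in seconds) for the three access paths and compares their medians against the no-LB/no-SAML baseline from b); the 2x slow_factor threshold is an arbitrary assumption, not a value taken from our measurements.

```python
from statistics import median
from typing import Sequence


def diagnose(lb: Sequence[float], node_saml: Sequence[float],
             node_plain: Sequence[float], slow_factor: float = 2.0) -> str:
    """Apply the item 17 rule to response times (seconds) for three access paths.

    lb:          normal LB URL (LB + SAML)
    node_saml:   test a) node-specific URL, SAML still on
    node_plain:  test b) node-specific URL, no LB and no SAML (baseline)
    slow_factor: assumed threshold; 'slow' means median > slow_factor * baseline median
    """
    baseline = median(node_plain)

    def is_slow(samples: Sequence[float]) -> bool:
        return median(samples) > slow_factor * baseline

    if is_slow(node_saml):
        return "SAML"   # a) is still slow even without the LB
    if is_slow(lb):
        return "LB"     # a) is fast, only the LB path is slow
    return "none"       # no path is markedly slower than the baseline
```

Using medians rather than single timings keeps one unlucky request from deciding the outcome.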

18) If the node-specific URL works fine without slowness but the https LB URL is slow, there may be a problem with port 443 (https). Changing the Alfresco Tomcat server port to 443 in tomcat/conf/server.xml throws an exception (BindException: Permission denied), at least in Alfresco 5.1. On Linux, ports below 1024 can only be bound by root, and Tomcat usually runs as a non-root user.
Since this is not allowed, you can keep port 8443 in server.xml, set share.port=8443 and share.protocol=https in alfresco-global.properties, and change the Base URL and ACS URL on the SAML config API page to use https and 8443 instead of 443, then check again. This may help.
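
After the change in item 18, the share.protocol/share.port values and the SAML Base URL and ACS URL have to agree on scheme and port. The sketch below checks that consistency; the SAML URLs are whatever you entered on the SAML config page (the ones in the test are hypothetical), and a URL without an explicit port is treated as 80 for http and 443 for https.

```python
from typing import Dict, List
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Port in the URL, or the scheme's default port when none is given."""
    parts = urlsplit(url)
    return parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]


def check_share_vs_saml(props: Dict[str, str], saml_urls: Dict[str, str]) -> List[str]:
    """List mismatches between share.protocol/share.port and the SAML URLs."""
    missing = [key for key in ("share.protocol", "share.port") if key not in props]
    if missing:
        return [f"missing property: {key}" for key in missing]
    protocol = props["share.protocol"]
    port = int(props["share.port"])
    problems = []
    for name, url in saml_urls.items():
        scheme = urlsplit(url).scheme
        if scheme != protocol:
            problems.append(f"{name}: scheme {scheme} != share.protocol {protocol}")
        if effective_port(url) != port:
            problems.append(f"{name}: port {effective_port(url)} != share.port {port}")
    return problems
```

A typical miss this catches is a Base URL left as https://host/share (implicitly 443) after share.port has been moved to 8443.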

19)
a) Map the http alf2 8080 Share URL so users can continue testing, as a temporary solution until the LB URL issue is resolved. This gives you more breathing room to resolve the issue without impacting user testing or the go-live plan.
b) In parallel, try to have the F5 team available to work on this issue along with the application team. This should include the items below.
b.1) Make changes in F5 to disable node availability monitoring and map only one node, removing any choice. This checks whether direct access to a single node resolves the issue.
b.2) Try changing the node-specific URL access from port 8080 to 8443, i.e. https instead of http. This checks whether the https protocol/port has any issue when accessed node-specifically.
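
For item 19 b.2, and for the periodic checks in item 6, a small timing script makes the comparison between http 8080, https 8443 and the LB URL repeatable. This sketch times full GET requests with Python's standard library. Assumptions: the target URLs are up to you (the ones below are hypothetical), an unauthenticated request to an SSO-protected page measures the redirect path rather than a logged-in page, and a self-signed certificate on 8443 will make urlopen fail verification unless you configure an SSL context.

```python
import statistics
import time
import urllib.request
from typing import Callable, Dict


def fetch_status(url: str, timeout: float = 60.0) -> int:
    """GET the URL (following redirects), read the full body, return the final HTTP status."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


def time_urls(
    urls: Dict[str, str],
    repeats: int = 5,
    fetch: Callable[[str], int] = fetch_status,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, Dict[str, float]]:
    """Time `repeats` requests per URL and return median and max seconds per label."""
    results: Dict[str, Dict[str, float]] = {}
    for label, url in urls.items():
        samples = []
        for _ in range(repeats):
            start = clock()
            fetch(url)  # raises on network errors and on HTTP 4xx/5xx
            samples.append(clock() - start)
        results[label] = {
            "median_s": statistics.median(samples),
            "max_s": max(samples),
        }
    return results
```

For example, call time_urls({"node_http_8080": "http://alf2.example.com:8080/share/page", "node_https_8443": "https://alf2.example.com:8443/share/page", "lb_https": "https://alfresco.example.com/share/page"}) from a job scheduled every 30 minutes (for instance with cron) and log the results with a timestamp, so the slow time window stands out. The fetch and clock parameters exist only so the function can be exercised without a network.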

20) Check with the LB team whether there are any flaps or errors/exceptions in their logs at the time of the slowness, OR whether there is any issue in their configuration that routes the requests to alf1 and alf2.
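
Once the LB team shares their logs, lining them up with the times users reported slowness is mechanical. This sketch assumes both have already been parsed into datetime objects in the same time zone (parsing the F5 log format is not covered, and the event messages in the test are made up), and it reports every LB event within a configurable window, 5 minutes by default, of each slow observation.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def lb_events_near(
    slow_times: List[datetime],
    lb_events: List[Tuple[datetime, str]],
    window: timedelta = timedelta(minutes=5),
) -> List[Tuple[datetime, datetime, str]]:
    """Return (slow_time, event_time, message) for each LB event within `window` of a slow time.

    All datetimes must use the same time zone (or all be naive in the same zone).
    """
    matches = []
    for slow in slow_times:
        for when, message in lb_events:
            if abs(when - slow) <= window:  # event before or after the slow observation
                matches.append((slow, when, message))
    return matches
```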
