1) Frequency of documents coming into the system per day, with the approximate size of each document.
Max size of document, min size of document, and the MIME types allowed by the system; also whether zero-KB documents are to be allowed. Raise at the project beginning itself that 0 KB documents should not be allowed, because converting a 0 KB document to PDF in Alfresco generates an erroneous/corrupted PDF which nobody can open after download or view in the browser.
2) Whether any month-end spike in file uploads, metadata ingestion or heavy DB queries is expected.
3) If the combined file generated as a PDF (ex: a docpack) exceeds a certain limit (approx 200 MB), then think of another provision to store the document (instead of storing it in Alfresco) - probably a dedicated SFTP server, sized on the number of docpacks expected per day/month and the size of each docpack. Otherwise, downloading docpacks running into GBs won't be possible from Alfresco or from any other web portal designed on top of it.
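- If the combined docpack is generated in your own service, merge with a temp-file-backed memory setting so large inputs don't blow the heap; a minimal sketch with Apache PDFBox 2.x (input/output file names are placeholders):
import java.io.File;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

public class DocpackMerger {
    public static void main(String[] args) throws Exception {
        PDFMergerUtility merger = new PDFMergerUtility();
        merger.addSource(new File("part1.pdf"));      // placeholder input PDFs
        merger.addSource(new File("part2.pdf"));
        merger.setDestinationFileName("docpack.pdf"); // placeholder output
        // buffer pages on disk instead of heap so a very large docpack does not cause OutOfMemoryError
        merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
    }
}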
4) In a REST/HTTP call, never connect to FTP to read files, parse them and do some operation inline. Similarly, do not send email inside a REST/HTTP call wherever avoidable - make it asynchronous, at least at those places where multiple emails (ex: 100) are to be sent from inside a for loop (sketch below).
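- A minimal sketch of moving the email work off the request thread with Spring's @Async (the actual mail-sending call is omitted and the class/method names are illustrative):
import java.util.List;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
class AsyncConfig { }   // enables @Async processing

@Service
public class NotificationService {
    @Async  // runs on a separate executor thread, so the REST/HTTP call returns immediately
    public void sendEmails(List<String> recipients) {
        for (String to : recipients) {
            // call the actual mail sender here (e.g. JavaMailSender) - omitted in this sketch
        }
    }
}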
5) Remember: DB queries are generally much faster than REST calls. Wherever possible, store such values in DB tables instead of in the ECM system as files or folders.
6) When you design a database table structure with many columns, try to split the table into master and child tables (related with PK and FK) and fetch data from them through a JOIN query. This stores the data in a structured way, with less duplication, and it is fetched faster through a SELECT query (see the JDBC sketch after these sub-points).
- Always maintain a date-time (timestamp) column in the table, as it will be needed for fetching date-wise records for auditing purposes.
- Check the file that you are trying to store in the ECM; if it is not too big, consider storing it in a table column as a BLOB/CLOB instead.
- Keep a watch on the number of records in the tables and create indexes (on specific columns) for tables which grow rapidly, like ACT_RU_TASK, ACT_HI_PROCINST, ALF_NODE, etc.
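- A minimal JDBC sketch of reading a master/child split through a JOIN (table names, column names and the connection URL are made up for illustration):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class MasterChildJoin {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT m.doc_id, m.doc_name, c.field_name, c.field_value "
                   + "FROM doc_master m "
                   + "JOIN doc_metadata c ON c.doc_id = m.doc_id "   // child FK references master PK
                   + "WHERE m.created_on >= ?";
        try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/appdb", "app", "secret");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setDate(1, java.sql.Date.valueOf(java.time.LocalDate.now().minusDays(30)));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("doc_name") + " -> " + rs.getString("field_value"));
                }
            }
        }
    }
}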
7) If there is no requirement to persist the data (anywhere - either in the ECM or the DB), then think of a cache system like Redis or Memcached to store the data temporarily (see the Redis sketch below).
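- A minimal sketch of temporary storage with a TTL using the Jedis client (host, key name and the 900-second TTL are assumptions):
import redis.clients.jedis.Jedis;

public class TempCache {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // keep the value for 900 seconds only; Redis evicts it automatically afterwards
            jedis.setex("upload:status:12345", 900, "IN_PROGRESS");
            System.out.println(jedis.get("upload:status:12345"));
        }
    }
}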
8) If the data is to be stored in an unstructured manner (ex: JSON), and there is a lot of it, try going for a NoSQL database like MongoDB.
9) Maintain logs, service-health monitors and CPU/memory monitors:
- If possible, maintain success and error logs separately for each application; they will be needed to track the files processed, failed, etc.
- Keep a mechanism (maybe a shell script running periodically) which deletes old logs (example: logs older than a month, but validate this retention period with the client first) - see the cleanup sketch after this list.
- Keep a mechanism (maybe a shell script running periodically) which monitors your running applications. For example: a shell script running every 6 hours which checks whether your applications are up; if not, it brings the service up and sends an email to the concerned DL/user stating the same. Also handle (ignore) downtime windows where a patch or system upgrade (Windows 10 upgrade, etc.) is planned.
- Check for customer/client-provided software/tools which automatically monitor running services as well as CPU/memory usage and much more (for example: Nagios - https://en.wikipedia.org/wiki/Nagios, Monit, Monitorix, Netdata, etc.)
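- A minimal Java alternative to the log-cleanup shell script, assuming logs live under /opt/app/logs and a 30-day retention (both values are assumptions to validate with the client):
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.stream.Stream;

public class OldLogCleaner {
    public static void main(String[] args) throws IOException {
        Path logDir = Paths.get("/opt/app/logs");                  // assumed log location
        Instant cutoff = Instant.now().minus(30, ChronoUnit.DAYS); // assumed retention period
        try (Stream<Path> files = Files.walk(logDir)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> {
                     try {
                         return Files.readAttributes(p, BasicFileAttributes.class)
                                     .lastModifiedTime().toInstant().isBefore(cutoff);
                     } catch (IOException e) { return false; }
                 })
                 .forEach(p -> {
                     try { Files.delete(p); System.out.println("Deleted " + p); }
                     catch (IOException e) { System.err.println("Could not delete " + p); }
                 });
        }
    }
}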
10) In ECM systems, when you plan to trigger one or more rules (inbound or on-update) on arrival of a document in a folder, always keep in mind the concurrent users who will be working on documents under that folder.
- Triggering the rule (set on that folder) from concurrent operations/users can throw a concurrency update failure exception.
- So, try avoiding rules/behaviour policies at least where concurrent operations/users come into play, or wrap the update in a retrying transaction (see the sketch after this list).
- Content model: keep an additional parent folder type (named after the application) above the custom folder type you define. If multiple other custom folder types or custom document types come up in the future for the same application, you can add them under that parent type.
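- A minimal sketch of retrying such a conflicting update with Alfresco's RetryingTransactionHelper (assumes a repo-tier bean with TransactionService and NodeService injected; the property QName and value are placeholders):
import org.alfresco.repo.transaction.RetryingTransactionHelper.RetryingTransactionCallback;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.namespace.QName;
import org.alfresco.service.transaction.TransactionService;

public class SafeUpdate {
    private TransactionService transactionService;  // injected via Spring
    private NodeService nodeService;                // injected via Spring

    public void updateStatus(final NodeRef nodeRef, final String status) {
        RetryingTransactionCallback<Void> work = () -> {
            // placeholder custom property; replace with the QName from your content model
            nodeService.setProperty(nodeRef,
                    QName.createQName("http://example.com/model/1.0", "status"), status);
            return null;
        };
        // retries automatically on transient concurrency failures instead of failing the operation
        transactionService.getRetryingTransactionHelper().doInTransaction(work, false, true);
    }
}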
11) In ECM systems, keep a check on the number of folders/documents getting created inside a single folder. In the context of Alfresco, if the number of folders/documents in a single folder crosses 2000, search becomes slow.
- So keep the folder taxonomy like folder > YEAR > MONTH > folder_name/document_name where the number of documents is high and going to cross the 2000 mark soon (even if only next year), because the system design has to be made at the inception of the project. This taxonomy keeps the folders structured and keeps search speed optimum (a path-building sketch follows this list).
- Also, keep a provision for archival (i.e. moving documents/folders older than a fixed period, ex: 2 years, into an Archived folder). These are not going to be accessed frequently, and archiving prevents unnecessarily flooding the main folder with documents/folders.
- Also, if you enable partial search for your custom metadata, make sure you force the user to enter a minimum of 4 characters to get accurate search results. With fewer than 4 characters (example: 3), Alfresco gives unexpected search results.
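- A minimal sketch of deriving the YEAR/MONTH part of such a path from the upload date (the root folder name is a placeholder):
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class FolderTaxonomy {
    // builds e.g. "Invoices/2024/05" from the upload date
    public static String buildPath(String rootFolder, LocalDate uploadDate) {
        return rootFolder + "/" + uploadDate.getYear() + "/"
                + uploadDate.format(DateTimeFormatter.ofPattern("MM"));
    }

    public static void main(String[] args) {
        System.out.println(buildPath("Invoices", LocalDate.now()));
    }
}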
12) Backup and Restore
- If you are taking a backup of the application, make sure it is a workable backup. The backup of content, database and indexes, along with the code, should be maintained as a single consistent set, and should be verified periodically on a new/another instance by deploying the backed-up files there and checking that the backup actually restores.
13) Transformation to PDF
- At the beginning of the project, intimate the team/client which tool you will be using (example: Apache PDFBox, iText, etc.) for converting to PDF - if you are doing it in a Spring Boot/Spring Batch or any other external service (i.e. outside of Alfresco, without using Alfresco's OOTB transformation).
- At the same time, inform them about the limitations of the tool, e.g. which types of documents won't convert to PDF.
- If using Alfresco's OOTB transformation, share the list of MIME types that will be converted to PDF - not all document types are supported.
- Also, with any transformation you provide (Alfresco's OOTB or an external tool like iText or PDFBox), inform the client that documents converted to PDF can have cut-off images or some invalid characters which are not part of the original document. This depends on the implementation the tool's APIs/services provide.
- Highlight the limitation of the Alfresco transformation service for converting each MIME type to PDF: for a list of MIME types, the source size has to be within a particular limit for Alfresco to convert it to PDF. Example:
  • .txt – 5 MB
  • .docx – 768 KB
  • .doc – 10 MB
  • .ppt – 6 MB
  • .pptx – 4 MB
14) Pagination / items per page
- Clarify at the project beginning what the number of items per page is going to be.
- Whether it will be folder-wise (a different number of items per page for different folders) and logged-in-user-wise.
- Intimate them at the start itself, based on the product limitation, that in the current application (ex: ADF), when items are loaded and the next page is clicked, sorting will be applied only on that next set of data. If you want to see the latest or the oldest document first, for example, it won't work reliably when the total number of items is huge.
- Also, if you load all items of a folder in one go and the number of items is large, it will take a very long time to load and may sometimes result in a timeout. So it's better to use the product-provided paging feature itself rather than changing or customizing it.
- ADF 3.4 with Alfresco 5.2 repo as the back end does not have an option to load the document library page with all records of a single folder in one go. So when the document library page is loaded, only the number of items set in the 'Items per page' field is loaded; if it is 25, only 25 items are fetched from the repository. On moving to the next page, the next 25 records are fetched and sorting on any field (created date, modified date, name, title, etc.) is applied only on that set of data (items 26 to 50). Highlight this kind of issue at the project beginning; the fix can be applied if we upgrade to the Alfresco 6.1 repo version (see the paging sketch below).
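- A minimal sketch of server-side paging against the Alfresco v1 REST API with Java's HttpClient (host, credentials and the folder node id are placeholders; the orderBy field name assumes the v1 API's createdAt property):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class PagedChildren {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1";
        String nodeId = "some-folder-node-id";   // placeholder folder node id
        // ask the repository for items 26-50, already sorted by creation date on the server side
        String url = base + "/nodes/" + nodeId + "/children?skipCount=25&maxItems=25&orderBy=createdAt%20DESC";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Basic "
                        + Base64.getEncoder().encodeToString("admin:admin".getBytes()))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());     // JSON list entries plus pagination info
    }
}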
15) Ingestion of Files/Bulk Upload from 3rd party system to Alfresco or other system
- List of MIME types to be ingested into Alfresco/the other system
- List of MIME types to be blocked from ingestion - along with what to do with the blocked files
- Email notification with successful doc count, failed doc count and total docs processed
- Zero-byte files to be ignored
- A folder is to be picked for reading and processing only if a .tmp file does not exist in the folder, OR use some other trigger which tells your service that the folder is complete and ready to be ingested.
- A Java file watcher which monitors a directory/directories for newly arrived files (monitoring the sub-directories also) - see the sketch below.
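- A minimal sketch of such a watcher with java.nio WatchService (the drop folder and the upload.tmp marker name are placeholders; WatchService is not recursive, so each existing sub-directory is registered explicitly and newly created sub-directories would need registering too):
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class DropFolderWatcher {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path root = Paths.get("/data/dropzone");              // assumed ingestion drop folder
        WatchService watcher = FileSystems.getDefault().newWatchService();
        try (Stream<Path> dirs = Files.walk(root)) {
            dirs.filter(Files::isDirectory).forEach(d -> {
                try { d.register(watcher, StandardWatchEventKinds.ENTRY_CREATE); }
                catch (IOException e) { throw new RuntimeException(e); }
            });
        }
        while (true) {
            WatchKey key = watcher.take();                    // blocks until an event arrives
            Path dir = (Path) key.watchable();
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == StandardWatchEventKinds.OVERFLOW) continue;
                Path created = dir.resolve((Path) event.context());
                // skip zero-byte files and wait while a .tmp marker is still present in the folder
                boolean folderBusy = Files.exists(dir.resolve("upload.tmp"));
                if (!folderBusy && Files.isRegularFile(created) && Files.size(created) > 0) {
                    System.out.println("Ready to ingest: " + created);
                    // hand over to the ingestion/upload logic here
                }
            }
            key.reset();                                      // re-arm the key for further events
        }
    }
}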
16) Deployment (Manual / Automated - CI/CD)
- Always check which product the client is using or will be using for version control (Git/Bitbucket/SVN).
- Along with that, check which product the client has procured for deployment. If no CI/CD tool has been subscribed to, propose Docker, Ansible, Bamboo, Bitbucket Pipelines (for Bitbucket), Chef, etc.
- Present the supported stack of the CI/CD tool against the applications/software the client is using, along with the cost of the tool if it is not freeware.
17) Version control
- Always create three/four different branches - dev, qa, prod - based on the requirement, OR, if you are maintaining only a single branch, keep the folder containing the properties files separate per environment. Then anyone cloning the repository will be able to build and deploy without any hurdles.
- Keep dependencies on the properties files loosely bound (externalized); you should not be required to change a properties file manually every time you deploy code to another environment.
18) Time zone (server-side OR client-side) to be displayed in the portal page
- At the start of the project itself, get in a written email or in the documentation/requirements which time zone is to be displayed on the portal/ADF page for users to see.
- ECM systems like Alfresco mostly display the client-side time zone on all pages (the time when the user uploaded the document, as per his/her browser).
- But the client might require the server-side time zone to be displayed, to have a uniform time zone for all distributed users across the globe.
- In Alfresco, for doc/docx files, the default transformation size limit is 768 KB. So if you upload any doc/docx file larger than 768 KB, Alfresco will not convert it to PDF and an exception will be generated in the logs: org.alfresco.repo.content.transform.UnsupportedTransformationException: 11136194 Unsupported transformation: transformer.JodConverter application/vnd.openxmlformats-officedocument.wordprocessingml.document to application/pdf 1.4 MB > 768 KB
- You can increase this limit by setting the following parameter in alfresco-global.properties:
- content.transformer.OpenOffice.extensions.docx.pdf.maxSourceSizeKBytes=10240
- Above limit is 10 MB.
- Document in detail what the documentation deliverable will include. For example: the team will prepare a runbook - clarify each specific point as a sub-bullet, saying the runbook will include deployment steps, a user manual/user guide, or developer environment setup with snapshots, etc.
- Ask the customer to provide a wireframe of each screen and a list of business validations for each field.
- Agree upon whether there will be a KT (knowledge transfer) plan at the project end (phase 1 end, for example) and, if yes, the KT hours/weeks which will be provided to the client - and not more than that.
- Also mention whether a high-level KT covering functionality explanation OR a detailed code-level KT is expected.
- If code-level KT, then agree upon whether it will be class-level functionality explanation or method-level explanation.
- Agree upon the QA time (hours/weeks for which QA will be done). Do not allow QA to keep logging defects for as long as they want, even if they are from a different vendor; highlight this point to the respective stakeholders and update the JIRA stories regularly.
- While preparing the design document, agree that infra- and network-related points will be updated/added by the infra team or network team (whoever owns that point), not by the development team.
- Clearly distinguish the In Scope, Out of Scope and Assumptions sections.
- From an installation perspective (example: Alfresco installation), clarify whether the application team OR the infra team will do the setup of Alfresco.
- For preparation of RDS/Amazon Aurora or Postgres on AWS, ideally it is done by the AWS cloud team OR the client-side infra team OR the CIS team, but not by the application team.
- Similarly, the following activities would fall under the responsibilities of the infra/cloud team if AWS/Azure/GCP is selected:
- Security group configuration
- Firewall/port opening between instances
- SSL/HTTPS communication enabling with certificates
- S3 bucket configuration and data encryption on S3 using KMS (if that is a part of requirement). Only S3 bucket name, url configurations to be done from application side.
- SMTP/Email configuration
- Auto scaling
- Elastic load balancing
- Access to be given to devt team on DEV, QA, UAT, Prod envt for deployment
- Access to VDIs/Virtual desktops with LAN IDs created, RSA tokens created, separate client email ids created (in some cases), Amazon workspaces, etc
- Prerequisite software installed on dev machines and the DEV, UAT, PROD environments
- Ideal software needed on the development machine and other environments:
- Java - All envts
- Node.js - Development machine
- Windows VDI/Devt machine - with specified RAM like 16 GB
- Eclipse (with maven dependencies download enabled)
- Maven
- WinSCP and Putty
- Notepad++
- 7-zip
- Git bash
- VSCodeEditor
- Keytool generator (if keys to be generated for SSL)
- ActiveMQ
- Tomcat - on servers
- ACS / APS / AGS - on servers (based on requirement)
- Search services - on servers
- RDS installation - on servers
- Installing Alfresco as a distribution zip on a node and installing search services on the same or a different node can be done by the application team (provided they have the required access), but clarify whether it will be done by the infra team or the app team. Similarly, Alfresco installation with Docker can be done either by the infra team or by the app team, but this also needs clarification.
- Access to bit bucket or git repository for committing code
- Ideal in-scope activities
- Design of overall application architecture
- Design of the existing and to-be platform architecture (consult the hardware/infra team and incorporate their inputs)
- Platform installation, setup and configuration
- Development
- Integration with 3rd party systems if any
- Installation (of exactly which tools) and by which team it will be done
- Testing (with test cases or not) - and separate testing team will be involved or not
- Documentation
- Hypercare support (example 4 weeks after go live)
- LDAP, SSO integration
- Ideal out-of-scope activities
- Any data migration
- Any infra related activities like S3 config, firewall/security groups config, VPC setup, network security, access, etc
- Any infra or hardware setup like OS, database
- Disaster recovery setup
- UAT execution
- Performance testing
- Imaging or OCR tool integration or Barcode integration, etc
- Supporting existing functionalities of existing application
- Ideal assumptions
- The UI/Portal to be used by end users - Share or ADF or other UI
- Browser compatibility of application - as per product supported stack
- Any product license cost procurement to be done by customer
- Integration with 3rd party systems as per the product's exposed REST APIs and CMIS queries
- Deployment (Manual or Automated with CI/CD). If automated - which tool exactly will be used needs to be specified.
- Duration of SIT & UAT - 2 weeks (example) - depends on functionalities and project size
- Documentation - what will be covered in it
- Approx no. of content types, metadata, workflows
- Volume of data (daily, weekly, monthly, yearly) and their future plans
- 3 envts - dev, qa/uat, prod or 4 envts
- Total users, concurrent users accessing the application
- Reporting, auditing as per the product OOTB support or customization
- Duration of system testing, UAT and hypercare support.
- Clarity over - support from Alfresco product team (for HA, Clustering, scaling, license cost, etc) and support from AWS/Azure based on the support procured.
- Clarity over the subscribed cores of CPU by client (ex: 16 cores for ACS and 4 cores for index server, etc)
- Glacier needed for archival or not
- Inputs from the product team, infra team and client team are needed, but clarity is required over the type of EC2 instance needed, like m5.xlarge, m5.large, etc.
- Which DB is to be finalized - RDS Amazon Aurora/MySQL, Postgres, MSSQL, etc. - with the instance size (e.g. r5.large), and whether a master-slave architecture is needed or just a master DB. Postgres is already a proven solution in many implementations.
- Whether clustering is required, based on concurrency and the volume of data flowing. Ideally 1 app server + 1 index server in the 1st AZ and 1 app server + 1 index server in the 2nd AZ - but this depends on the client's procurement and available cores, and also on the volume of data and concurrency.
- Auto scaling not applicable for index server, but only for app servers
21) Sending Email
- Take a confirmation or sign-off on which port email will be sent over: the non-secured port 25 OR a secured port, 587 or 465.
- If it's going to be on the cloud, it mostly has to be secured with firewall protection, so tickets need to be raised with the infra team to open the port on the F5 as well as on the node (the instance where the code is deployed - dev/qa/prod).
- Also, if email is going to be sent on port 587, the developer needs test/functional email credentials which need to be passed in while authenticating and sending email (properties: mail.username, mail.password). Sending email on port 25 does not need this authentication (see the sketch below).
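- A minimal JavaMail sketch for authenticated sending on port 587 with STARTTLS (host, credentials and addresses are placeholders which would come from the properties file in a real setup):
import java.util.Properties;
import javax.mail.*;
import javax.mail.internet.*;

public class MailSender {
    public static void main(String[] args) throws MessagingException {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.com");    // placeholder SMTP host
        props.put("mail.smtp.port", "587");
        props.put("mail.smtp.auth", "true");                // port 587 needs authentication
        props.put("mail.smtp.starttls.enable", "true");     // not needed when sending on plain port 25

        Session session = Session.getInstance(props, new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // map these to the application's mail.username / mail.password properties
                return new PasswordAuthentication("svc-mailer@example.com", "changeit");
            }
        });

        Message msg = new MimeMessage(session);
        msg.setFrom(new InternetAddress("svc-mailer@example.com"));
        msg.setRecipients(Message.RecipientType.TO, InternetAddress.parse("user@example.com"));
        msg.setSubject("Ingestion report");
        msg.setText("Processed: 100, Failed: 2, Total: 102");
        Transport.send(msg);
    }
}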
22) Running into performance issues?
- Finding slow-running queries:
- Put the p6spy jar file in tomcat/lib, change db.url to the p6spy JDBC URL, and restart Alfresco. It will generate a p6spy.log file which can be analyzed by us or by Hyland support people to pull the slow-running queries out of it.
- Get an AWR report generated by the infra/AWS team (covering the window when the load/performance testing was done) and share it with the DB team to analyze; it should show the slow-running queries.
- Check the logs of repo, Solr, ActiveMQ, ATS and also the access logs for anything wrong or any errors pointing to the problem.
- For search or indexing slowness/issues, recommend contentless indexing instead of full content indexing when full-text search is not needed.
- Consider increasing the power of the instances, i.e. increase memory/RAM and CPU cores, and retest.
- Capture thread dumps when slowness occurs.
- Capture a JMX dump.
- For Alfresco performance:
- Solr is CPU intensive, so the transfer rate between the Solr instance and the EBS volume should be good.
- For content indexing (FTS, not contentless indexing), the ideal case is that the index size on the shards should be 1/3 to 1/2 of the content size in S3.
  - So if the S3 size of one environment is 500 GB, then the index size should be around 170 GB (i.e. one third) or at most 250 GB (half).
- The tlog folder size may increase while indexing is in progress, but it shrinks automatically once the transaction is indexed successfully and committed. So the recommendation is to keep disk space at least 3-4 times the actual index size to accommodate the spike in the tlog folder. How much it grows depends on how big a single transaction is: if a single transaction covers more than 100k or 1 million rows in the DB, it will have a huge number of ACLs or a large set of permissions and associations, and indexing such a transaction keeps increasing the tlog size. If there are no such big transactions, the tlog size may not increase much during indexing.
- Alfresco does not recommend keeping more than 60 million nodes on a single shard.
- With Search Services 2.0.2: while estimating the disk space for shards, estimate approx 3 times the size of the indexes. That is, if the index size on one shard is 50 GB, ask for at least 150-200 GB of space while procuring (this is for contentless indexing). For content indexing, 220 GB of indexes on one shard (with ACS 5.2, ASS 1.4), when moved to ACS 7.1 with ASS 2.0.2, took up to 1.4 TB while indexing and then shrank to 600-700 GB after optimization - so procure disk space accordingly. In the current project, the hardware procured was 378 GB RAM and 48 CPU cores; with that hardware, indexing time on the higher environment (50 GB on each shard, 12 shards in place) was 2 days (with contentless indexing).
- For Solr, Xms and Xmx should be set the same, since it is back-end activity with no UI involved. Whereas for the UI-facing nodes which users access, Xms and Xmx can differ because of the varying number of requests coming to them.
- ATS (Alfresco Transformation Services) is CPU intensive, so procure the hardware accordingly. Ideally 500 GB disk space, 60 GB RAM and 8 CPU cores is more than enough.
- Hyland recommendation:
- There is no direct way to reindex just the ACLs.
- The approach of running the index-checker program should ideally work.
- Use the index-checker Spring Boot application and modify the code to reindex the ACLs only.
- Disk compression should also be applied before starting reindexing, to save disk space.
- The suggester in Solr should be disabled to save disk space.
- ASS 2 uses less disk space than ASS 1.4, but different behaviour was found in the current project.
- Disable fingerprinting - it saves a lot of disk space.
- By default, no replication is enabled. Even if you see a green tick against replication in the Solr admin console, if you have not set up a slave it is equivalent to replication being disabled. You can define how many replicas you want to create for each shard, so if you lose one shard, you have a backup.
- Alfresco standard guidance:
  - 750 GB per shard
  - 50 million transactions per shard
- Alfresco benchmark: 60 million transactions per shard for DB_ID sharding.
- Optimization takes its own time
- After indexing:
- ACLs in Solr and ACLs in the DB should match.
- In fact, nodes in Solr and nodes in the DB should match.
- Basically, all metrics of the DB should match those shown on the Solr console.
- Our findings: content, folders, custom types, ACLs, thumbnails - all are indexed separately.
- Hyland says that even for max load, 16 GB RAM and 8 vCPU cores are fine; increasing beyond that won't improve performance.
- Trackers: Hyland says trackers are not necessary and can be omitted, at least in ACS 7.x. But if trackers improve performance, then definitely use them with ACS 7.x.
23) Solr sharding architecture with Alfresco
- Flow of sharding in Solr (when dynamic shard registration = true in alfresco-global.properties):
  - When Solr comes up, it looks up alf.host.
  - Alfresco acknowledges and makes an entry in the DB with shard 0 = IP_ADDRESS.
  - When Solr 2 comes up, it also looks up alf.host.
  - Alfresco acknowledges and makes an entry in the DB with shard 1 = IP_ADDRESS.
  - Now a user queries with some keyword.
  - Alfresco finds the shard numbers in the DB and searches shard 0 first and then shard 1.
  - In this process, the value we set on the /alfresco search-service page persists in the DB when saved, so it is reflected across all trackers/repo nodes and the search-service page.
24) Alfresco Solr search findings:
- You can do an open search using just a keyword like 1572, which will return results containing the term across all metadata, document name, description or content.
- You can also search using a string of words like '1572 signed' OR a partial search like '157*', which will return results containing the term across all metadata, document name, description or content.
- To search by document name, prefix the term with "name", e.g. name:"1572", which will return the list of documents whose name contains '1572'.
- To search within document content, prefix the term with "TEXT", e.g. TEXT:"1572", which will return the list of documents containing '1572' in their content.
- AND search - for example, typing the keywords 'Protocol AND Amendment' would return a list of documents that include both the word Protocol and the word Amendment.
- OR search - for example, typing the keywords 'Protocol Amendment OR Informed Consent Form' would return a list of documents that have an exact match for Protocol Amendment or an exact match for Informed Consent Form.
- FTS-Alfresco search query you can test in the JavaScript console in Alfresco:
var query = 'TEXT:"1572"';   // any FTS-Alfresco query string
var def = {
    query: query,
    store: "workspace://SpacesStore",
    language: "fts-alfresco"
};
var results = search.queryResultSet(def);
var totalRecords = results.meta.numberFound;
logger.log("totalRecords : " + totalRecords);
- Other findings:
- Legacy transformation does not work with ACS 7.x; you have to go with ATS.
- If the repo throws an ActiveMQ-related error on startup, set the property in alfresco-global.properties which makes ActiveMQ not compulsory; the repo will then start without ActiveMQ.
- Dedicated Tomcats for repo and Share are better, but it is fine to keep them on the same Tomcat, since Share is lightweight and occupies fewer resources.
- ACS and ARender running on the same instance use the same clustering port 5701 and cause issues when ARender is started first and then ACS (ACS 7.x also has this problem). So the recommendation is to bring ACS up first and then start ARender, OR change clustering port 5701 on either the Alfresco side or the ARender side.
- To get rid of the insufficient-cache-memory warnings during ACS server startup, add the following line in tomcat/conf/Catalina/localhost/share.xml: <Resources cachingAllowed="true" cacheMaxSize="300000" />
- If you upgrade from ACS 5.2 to ACS 7.1 and your old code has conn.commit(), i.e. explicit transaction commit code, then you need to comment out such lines, as an explicit commit is not needed in the new ACS; by default, autoCommit is set to true.
- To create solr core and start solr:
- ./solr/bin/solr start -a "-Dcreate.alfresco.defaults=alfresco,archive"
- OR
- ./solr start -a "-Dcreate.alfresco.defaults=alfresco,archive"
- If Solr search is not working, you can try the following to resolve such issues:
- Remove sharding and try pointing to a single Solr node
- Delete the alfresco and archive cores, and create new cores during Solr start
- Disable all safedx schedulers in alfresco-global.properties where enabled=true
- Uncomment these properties in shared.properties:
- solr.host=IP_ADDRESS
- solr.port=8983
- alfresco.cross.locale.datatype.0={http://www.alfresco.org/model/dictionary/1.0}text
- alfresco.cross.locale.datatype.1={http://www.alfresco.org/model/dictionary/1.0}content
- alfresco.cross.locale.datatype.2={http://www.alfresco.org/model/dictionary/1.0}mltext
- Versions: if they keep on growing, consider deleting older versions after a certain number, to improve performance and save disk space as well as indexing and DB data.
- Apply S3 storage-class/lifecycle policies for archival; consider Glacier.
- A cold backup is always preferred during an upgrade.
- If reindexing on prod is to be considered during the upgrade, allow at least a month for it (considering 70-80 million nodes in Alfresco; nodes meaning total entries in the alf_node table).
- Alfresco provides optimal benefit with a maximum of 16 vCPUs; beyond that there is diminishing benefit from additional CPU.
- Split Alfresco repo and Share into different Tomcat instances on the same machine. This provides a performance benefit during garbage collection, as each will have its own dedicated JVM.
- Heap size should be as small as possible while still supporting the application; tune it as per the application's needs.
- Clone the DB schemas on the same server or a new server and calculate the time required. Either clone the current prod DB and point Alfresco to it directly (keep a backup of the current prod DB for safety) OR confirm downtime with the customer.
Agile
------------------
- Agile: maintain a sprint retrospective at the end of go-live.
- In Agile, only 8 hours of development per day should be considered while calculating development effort; the remaining hour goes into lunch breaks, communication, calls and daily scrum meetings.
--------------------------------------------------------------------------------------------------------
ArchiMate:
- ArchiMate is a standard technique/tool to represent system architecture with standard notations, similar to UML.
- There are 3 layers in an enterprise architecture: business level, application level and technology level.
- There are specific notations for each object in an enterprise architecture, for example: actor, node, component, interface, aggregation, composition, specialization, etc.
- Versions of ArchiMate: 2.0 and 3.0.
- The view of this tool is similar to the Eclipse IDE.