Issues:
1. After full indexing/reindexing, few txns not indexed or skipped by solr due to timeout issue or some other discrepency.
In this case, it needs to be checked by generating solr report that what are the missing txns.
URL --> http://<IP>:8983/solr/admin/cores?action=REPORT&wt=xml
Basically, single txn can be reindexed by passing tx id in the following url:
http://<IP>:8983/solr/admin/cores?action=REINDEX&txid=<id>
But in cases where there are millions of txns missed or not indexed, that needs to be handled either through:
1. Alfresco/Hyland recomends to use the action=FIX url instead of REINDEX. Also if the solr report is not appropriate and also if the nodes are not reindexed, they recommend to upgrade to latest version of search-services-2.0.3 (as per April, 2022).
Tried with action=FIX but does not work. (If want to try and check, use the following url):
http://<IP_ADDRESS>:8983/solr/admin/cores?action=fix&core=alfresco-0&dryRun=true
To actually fix,
http://<IP_ADDRESS>:8983/solr/admin/cores?action=fix&core=alfresco-0&dryRun=false
But as above did not work,
Upgraded to 2.0.3 version and reindexed. Solr summary and report url worked but not accurately as expected. Few ACLs and nodes were not indexed, which were shown clean/zero in solr report when generated. So fixed them through action=REINDEX only by running the index-checker program (example jar running commands below):
For ACL check:
nohup java -jar index-checker-0.0.1-SNAPSHOT.jar --acl.checker=true --logging.file.name=acl-checker-<batchid>.log --solr.acl.check.batch.id=<batchid> --solr.reindex.local=true --solr.core=alfresco-0 &
For ACL reindex:
nohup java -jar index-checker-0.0.1-SNAPSHOT.jar --acl.reindex=true --solr.reindex.batch.id=<batchid> --logging.file.name=acl-reindexer-<batchid>.log --solr.reindex.sleep.time=100 --solr.reindex.local=true --solr.core=alfresco-1 &
For Nodes Validation:
nohup java -jar index-checker-0.0.1-SNAPSHOT.jar --nodes.validation=true --logging.file.name=node-indexer.log --solr.core=alfresco-1 --solr.validate.batch.id=<batchid> &
For Nodes reindex:
nohup java -jar index-checker-0.0.1-SNAPSHOT.jar --nodes.reindex=true --solr.reindex.batch.id=<batchid> --logging.file.name=node-reindexer.log --solr.reindex.local=true --solr.core=alfresco-1 &
NOTE: For making 2.0.3 work without https/mtls, renamed the security.json file as security.json.bak in solr to make it work without https or secret pwd.
NOTE: In higher envt with high volume of data, there is a bug with openjdk-11.0.2 which crashes the running alfresco servers. So recommended to upgrade to openjdk-11.0.4 atleast to fix this crash issue on trackers, repo nodes, etc where alfresco is running. Also another fix for this crash is to increase the maxThread count in tomcat's server.xml as follows:
<Connector port="8080" URIEncoding="UTF-8" protocol="HTTP/1.1" compressibleMimeType="text/html,text/xml,text/css,text/javascript, application/x-javascript,application/javascript" compression="on" compressionMinSize="128" noCompressionUserAgents="gozilla, traviata" connectionTimeout="300000" keepAliveTimeout="300000" redirectPort="8443" maxHttpHeaderSize="32768" maxThreads="2000" maxConnections="2000"/>
Enable MetadataTracker logs in solr to see the indexing logs coming.
2. Custom code like index-checker (https://github.com/AlfrescoLabs/index-checker) - which on running will print the missing DB IDs list. Those DB IDs need to be indexed. To fix those DB IDs, the property run.fix.actions needs to be set to true in application.properties.
When the property run.fix.actions in this properties file is set to true, fixing actions are applied to SOLR Index:
- DELETE documents (NODE or ACL) existing in SOLR but missing in DB
- REINDEX nodes (NODE or ACL) existing in DB but missing in SOLR
OR
3. By running above REINDEX urls in bulk through postman.
Steps to follow this 2nd approach are as follows:
Check-out and Modify (if needed) the index-checker code base to find the list of missing DB IDs first.
Details at: https://github.com/AlfrescoLabs/index-checker
Change the application.properties file to put your RDS details against spring.datasource.url, supply username, pwd, solr.url. Keep validation.nodes=true (to find missing DB IDs) and validation.permissions=true (to find missing ACL IDs). Keep report.detailed=true.
Keep run.fix.actions=false ( if you just want to analyze and print the missing IDs, and not fix/reindex them). If you want to fix/reindex also when you run this jar, keep this as true (i.e run.fix.actions=true)
Now, build the jar using maven in your local editor-eclipse or IntelliJIdea.
Upload the jar on the shard where you want to find the missing DB IDs.
When you run the program, it will print the DB IDs in the logs. Pick those DB IDs list (would be printed in an array ).
It will also print the ACL IDs in the logs. Pick those ACL IDs list (would be printed in an array).
To find the list of such IDs, download the nohup.out file and extract the ids.
For ACLs:
select * from ALF_NODE where ACL_ID IN (xx,xx,xx); // list of IDs you extracted from logs
select DISTINCT(type,qname_id) from ALF_NODE where ACL_ID IN (xx,xx,xx); - save this list - X
select * from alf_qname where id IN (X) - it will give types i.e unindexed nodes are of which types and how many.
select distinct(acl_change_set) from ALF_ACCESS_CONTROL_LIST where ID IN (xx,xx,xx); -
select * from ALF_ACCESS_CONTROL_LIST
These above queries are for ACLs. For Txn IDs, as below:
a. Derive/find on the total set of missing tx ids
b. Create a table of it by storing that list in a temp table (temp_tbl_solr). Export that list in an excel sheet.
c. Execute below query to get the list of tx ids on shard 0 (0th shard) :
select DISTINCT(n.transaction_id),t.shard from temp_tbl_solr t,alf_node n where t.shard=0
and n.id=t.id;d. Assuming that you get 25K IDs from above query, copy to an excel sheet.
e. Then select first 5K entries and copy those 5k IDs in notepad++ and using find/replace feature , create the following:
{"txid": "112354987"},
{"txid": "109027376"},
{"txid": "112450465"},
{"txid": "108974054"},
{"txid": "112367005"},
{"txid": "110708383"},
{"txid": "112269230"},
{"txid": "114433532"},
{"txid": "112366037"}
... upto 5K entries
Now, convert it to a json array as follows:
[
{"txid": "112354987"},
{"txid": "109027376"},
{"txid": "112450465"},
{"txid": "108974054"},
{"txid": "112367005"},
{"txid": "110708383"},
{"txid": "112269230"},
{"txid": "114433532"},
{"txid": "112366037"}
... upto 5K entries
]
Save this above json array as a json file - example: JSONArrayToBeReindexed.json.
f. Open Postman > Create a new collection > Add request to that collection > Enter req name, desc and select your created collection
g. Go to your request in postman, and provide the GET URL as --> http://<IP>:8983/solr/admin/cores?action=REINDEX&txid={{txid}}
Click Save to save this request.
Your request is ready and has got associated with your collection.
h. Create a new collection > Click the arrow which will show a Run button. Click the Run button to open a Collection runner window.
i. In Collection runner window, select/search for the collection you created first (which has your request stored). In file selector, upload the json file you created (JSONArrayToBeReindexed.json). In iteration , you can provide the number of IDs you have in the JSON, here 5k. (NOTE: Tested with 5K entries, with delay of 200 ms)
j. Click Run and allow it to complete the execution (NOTE: 5k txns reindexing might take 2 hours of time depending on the txn details - nodes, acls associated within that txn).
k. Keep the solr shard logs open to see if there are any errors coming. Note the errors and note down the node ids for which error comes. Example error in solr logs:
ERROR (org.alfresco.solr.AlfrescoCoreAdminHandler@4b960b5b_Worker-8) [ ] o.a.s.SolrInformationServer Node 5514691 index failed and skipped in Tx 116707794. See the stacktrace below for further details. java.lang.NullPointerException
So it means while executing your batch of 5k, solr could not index this node id 5514691 inside this txn - 116707794. Other nodes or acls lying in this txn might have been indexed successfully. So later on you can gather such node ids and reindex the node ids again.
l. Once the execution/run from postman is over, save/export the results for reference.
m. Open solradmin console of that shard. Get the list of those 5k IDs. Using notepad++ find/replace feature, convert those IDs to :
TXID : 114065553 OR TXID : 110516191 OR TXID : 107786742 OR TXID : 112279436 OR TXID : 110565804 OR TXID : 114469126 OR TXID : 111593637 ....upto 5k
n. Copy above string and paste it in the solradmin console's query > q window and execute. This should return numFound as 5k. If yes, then all tx ids are indexed successfully.
o. Now, proceed with another 5k batch from your excel that you created in point# 'd' above. Follow the same steps.
p. The node ids which you found in point# 'k', reindex them manually by hitting url from browser for such individual node id - http://<IP>:8983/solr/admin/cores?action=reindex&nodeid=
This should reindex those node ids which showed errors in solr logs.
EXTRA NOTE:
TYPE_QNAME_ID=149 means type 'deleted' i.e deleted contents.
Part 2:
It is possible that even after completing the execution of above 25K TXN ids, certain node ids or acl ids (which are part of these 25K txn ids) might not have indexed.
Again such node ids and acl ids have to be picked up and executed.
Extra notes:
To search ACLIDs in solradmin console in query box:
Example: ACLID:246585 OR OR ACLID:246587 OR OR ACLID:246589 ... (max 2k or 3k entries tested). Above this limit, execute will work but solr does not return results. Multiple attempts need to be made to get the results with above 3k or 4k entries.
----------------------
If you want to reindex a particular acltxid:
Get the first Txn id failed from solradmin report -example- 18xxxx41
http://IP_ADDRESS:8983/solr/admin/cores?action=REINDEX&aclTxId=18xxxx41
---------------------
Some additional URLs:
http://localhost:8983/solr/admin/cores?action=txReport&txid=(txid)
http://IP_ADDR:8983/solr/admin/cores?action=txReport&txid=1
http://IP_ADDR:8983/solr/admin/cores?action=report
http://IP_ADDR:8983/solr/admin/cores?action=report&toCalTx=xxxx
---------------------
To search DBIDs:
Example: DBID:5699943 OR DBID:5700156 OR DBID:5700729 OR DBID:5702259
Some useful DB queries to find missing ACLs, DBIds:
"Count of missing transactions from the Index">xxxxxx
select * from ALF_NODE where TRANSACTION_ID=xxxx;
select * from ALF_TRANSACTION WHERE ID=xxxxx;
select * from ALF_STORE;
select * from ALF_NODE;
select count(1) from ALF_NODE where STORE_ID=4 AND TYPE_QNAME_ID= xx;
select count(1) from ALF_NODE where STORE_ID=4 AND TYPE_QNAME_ID= xx;
select * from ALF_QNAME where ID IN (xx,xx);
select count(1) from ALF_NODE where STORE_ID=4 AND AUDIT_MODIFIED>'2022-03-10T11:51:43.219Z';
select * from ALF_NAMESPACE where id=102;
select count(*) from ALF_ACCESS_CONTROL_LIST;
select count(*) from ALF_ACCESS_CONTROL_ENTRY;
UPDATE alf_acl_change_set SET COMMIT_TIME_MS = 16453xxxx000 WHERE id in (xxxx,xxxx,xxxx);
commit;
select * from alf_acl_change_set where id IN (xxxx,xxxx,xxxx);
select * from alf_access_control_list where acl_change_set in(xxxx,xxxx,xxxx);
select acl_id,count(1) from alf_node where type_qname_id NOT IN (142) and acl_id in (xxxx,xxxx,xxxx) GROUP by acl_id;
No comments:
Post a Comment