Troubleshooting a full drive on a Service Operations Insight (SOI) management server often leads to the Apache ActiveMQ data store, specifically the kahadb directory. This article describes how one such case was resolved: the kahadb directory had grown to 52 GB of disk space and was degrading SOI performance.
Understanding the Problem: SOI and KahaDB
Service Operations Insight (SOI) relies on Apache ActiveMQ for message queuing, enabling communication and data transfer between various components. ActiveMQ utilizes KahaDB as its default persistent message store. While KahaDB is generally efficient, certain conditions can lead to excessive disk space consumption. A rapidly growing KahaDB indicates a potential bottleneck or issue within the SOI environment requiring immediate attention.
Initial Investigation and Troubleshooting Steps
The initial alert indicated a near-full drive on the SOI management server. Upon investigation, the C:\Program Files (x86)\CA\SOI\apache-activemq\data\kahadb directory was identified as the culprit, containing over 2900 files totaling 52GB. Preliminary checks ruled out database connectivity issues between the SOI management server and the SQL database. The SOI Manager debug page also reported successful database connection tests.
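The file count and total size quoted above can be checked with a short script. The following is a minimal sketch, not part of the original investigation; the function name summarize_dir is hypothetical, and the path is the one given in this article and may differ on other installations.

```python
from pathlib import Path


def summarize_dir(path):
    """Return (file_count, total_bytes) for all regular files under path."""
    count = 0
    total = 0
    for entry in Path(path).rglob("*"):
        if entry.is_file():
            count += 1
            total += entry.stat().st_size
    return count, total


if __name__ == "__main__":
    # Path taken from this article; adjust for your own installation.
    kahadb = Path(r"C:\Program Files (x86)\CA\SOI\apache-activemq\data\kahadb")
    if kahadb.is_dir():
        count, total = summarize_dir(kahadb)
        print(f"{count} files, {total / 1024 ** 3:.1f} GiB")
    else:
        print(f"Directory not found: {kahadb}")
```

Running this on a schedule and recording the numbers would show whether the directory is growing steadily or in bursts, which may help narrow down a cause.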
Further investigation explored the possibility of connector re-initialization or a large backlog of jobs in the queue. The Active Monitor queue in the SOI Manager debug page (http://:7090/sam/debug) was checked for pending jobs, but it was found to be empty. All connectors were online and functioning normally. Existing online resources regarding similar KahaDB issues provided no immediate solutions.
Solution: Full System Restart
After exhausting initial troubleshooting options, a full system shutdown and restart was performed. This involved stopping all SOI services and then clearing all log files on all SOI servers, including the db-##.log files located under C:\Program Files (x86)\CA\SOI\apache-activemq\data\kahadb.
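The cleanup of the db-##.log files can be sketched as a script that lists the files first and deletes them only when explicitly asked. This is an illustration under assumptions, not the procedure that was actually run: the function names are hypothetical, the pattern assumes the files are named db-<number>.log, and the script assumes all SOI services (including ActiveMQ) are already stopped. Note that in KahaDB these files are the message journal, so deleting them discards any persistent messages that have not yet been delivered.

```python
import re
from pathlib import Path

# Assumed naming pattern for the files the article calls db-##.log.
JOURNAL_NAME = re.compile(r"^db-\d+\.log$")


def find_journal_files(kahadb_dir):
    """Return the db-<number>.log files directly inside kahadb_dir, sorted by name."""
    root = Path(kahadb_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and JOURNAL_NAME.match(p.name)
    )


def remove_journal_files(kahadb_dir, dry_run=True):
    """Return the matching file names; delete the files only if dry_run is False."""
    names = []
    for path in find_journal_files(kahadb_dir):
        if not dry_run:
            path.unlink()
        names.append(path.name)
    return names


if __name__ == "__main__":
    # Path taken from this article; adjust for your own installation.
    kahadb = Path(r"C:\Program Files (x86)\CA\SOI\apache-activemq\data\kahadb")
    if kahadb.is_dir():
        # Dry run by default: only report what would be removed.
        for name in remove_journal_files(kahadb, dry_run=True):
            print("would remove", name)
    else:
        print(f"Directory not found: {kahadb}")
```

Defaulting to a dry run is a deliberate choice here: it lets the file list be reviewed before anything is deleted, and other files in the directory are left untouched.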
Additionally, the SQL Server service on the database server was stopped and restarted due to an observed issue with SQL Studio failing to connect. This step, while seemingly unrelated, ensured the proper functioning of the SQL database critical to SOI operations. Following the restarts, SQL Studio connected successfully, and the database was accessible.
Results and Conclusion
Upon restarting all SOI services, the system returned to normal operation. The kahadb directory shrank from more than 2900 files to four, indicating that the accumulated data had been purged. The root cause of the issue remains unknown, but the full system restart resolved the immediate problem. This highlights the importance of systematic troubleshooting and the potential for resolving complex issues through seemingly simple procedures. While a full restart might not always be the ideal solution, in this case it proved effective in restoring SOI functionality and addressing the excessive KahaDB growth.