Overview
IBM BAW 23.x/24.x introduced performance improvements through new configuration properties (defined in the XML configuration). However, these changes may lead to unexpected and unpredictable behavior in the Event Manager when processing Event Manager tasks.
Problem
After upgrading from an earlier version of IBM BAW to BAW 24.x, you may encounter an issue where tokens for in-flight instances become stuck for no apparent reason and with no errors in the logs. Even simple processes fail to complete.
If you monitor instrumentation, you may notice that Event Manager tasks take an unusually long time to complete. Similarly, on the database side, certain queries related to the Event Manager task table may remain stuck in execution indefinitely.
This issue affects both in-flight and new process instances. For example, I reproduced the problem with a simple process that calls a single service. The instance remained stuck in the Event Manager for over five minutes, even though it should have completed within seconds. Eventually, it was placed on hold, yet no errors appeared in the logs.
Explanation
IBM BAW 23.x/24.x introduced two new Event Manager-related properties that, under certain edge-case scenarios, may cause tasks to become stuck:
<em-thread-reuse>true</em-thread-reuse> <!-- set to true by default -->
<optimization-for-retry>true</optimization-for-retry> <!-- set to true by default -->
"em-thread-reuse"
This property is documented in IBM’s official documentation. A snippet from the documentation states:
When new Event Manager tasks are scheduled during the current task execution, this setting avoids the Event Manager thread switch within the specified time period from thread-reuse-duration for the following scenarios:
- Event Manager tasks associated with a BPD instance, but "Optimize Execution for Latency" is not selected in BPD.
- All Event Manager tasks not associated with any BPD instance.
"optimization-for-retry"
This property is not yet documented by IBM, but based on my analysis and my work with the IBM BAW support team:
- It is related to optimizing unversioned persistence objects in BAW (unversioned-po-optimization).
- It is enabled by default in the code (even though it’s not explicitly set in the configuration).
- It may cause thread deadlocks in specific edge-case scenarios.
See the Resolution section below for mitigation steps.
Resolution
To confirm that your issue is related to these properties, disable them one at a time and observe the impact.
Step 1: Disable optimization-for-retry first
This property is the most likely cause of the issue. Try disabling it first and then retest with both simple and complex workflows.
To disable optimization-for-retry, add the following custom XML snippet:
<server>
  <unversioned-po-optimization>
    <optimization-for-retry merge="replace">false</optimization-for-retry>
  </unversioned-po-optimization>
</server>
Step 2: If the issue persists, disable em-thread-reuse
If disabling optimization-for-retry does not resolve the issue, try disabling em-thread-reuse with a custom XML override as well, then retest.
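A minimal sketch of such an override is shown here. It assumes that em-thread-reuse sits under an event-manager element inside server, like other Event Manager settings. That parent element is an assumption and is not confirmed by this article, so check the nesting against IBM's documentation for your BAW version before applying it.

```xml
<server>
  <!-- Assumption: em-thread-reuse is a child of event-manager; verify for your BAW version -->
  <event-manager>
    <em-thread-reuse merge="replace">false</em-thread-reuse>
  </event-manager>
</server>
```

As with Step 1, retest both simple and complex workflows after the change.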
Step 3: Apply the IBM Fix
If disabling optimization-for-retry resolves the issue, note that IBM released a permanent fix in March 2025. At the time of writing, it is not yet publicly available on IBM Fix Central, but you can request it from IBM. The fix ID is DT422946.
Open a case with IBM and request this fix for BAW 24.x until it is officially available on Fix Central. Once you have obtained and installed the fix, remember to set the property back to true and retest. It should now work as expected without causing issues with Event Manager tasks.
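To restore the property after installing the fix, the Step 1 override can be reused with the value flipped. This is a sketch that assumes the same custom XML file and element nesting as in Step 1:

```xml
<server>
  <unversioned-po-optimization>
    <!-- Re-enable only after the fix (DT422946) has been installed -->
    <optimization-for-retry merge="replace">true</optimization-for-retry>
  </unversioned-po-optimization>
</server>
```

Because the property is enabled by default in the code, removing the Step 1 override entirely should have the same effect.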