Edward Archibald
I found the following deadlock which is, apparently, due to the concurrent execution of a task for a 'delayed' rule with a concurrently executing application thread attempting to get access to a 'global'. Any recommendations for avoiding this type of deadlock besides not using rules with 'duration()' etc., which cause asynchronous execution with respect to my main application thread?

This problem is somewhat difficult to reproduce on demand, but it does come up frequently when the 'delayed' rule "DETECT MONITORING HAS STOPPED" is activated as a result of the trigger conditions.

===================================================================================

This thread, my application's EnterprisePolicyManager thread, is attempting to get access to a global, policyMgr. It owns the 'ReteooStatefulSession.actionQueue' and is waiting for the 'lock.lock' on ReteooStatefulSession.

owns: java.util.LinkedList<E> (id=207)
    waited by: Thread [pool-3-thread-1] (Suspended)
owns: com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine (id=208)
    sun.misc.Unsafe.park(boolean, long) line: not available [native method] [local variables unavailable]
    java.util.concurrent.locks.LockSupport.park() line: 118 [local variables unavailable]
    java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() line: 681 [local variables unavailable]
    java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) line: 711
    java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquire(int) line: 1041
    java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() line: 184 [local variables unavailable]
    java.util.concurrent.locks.ReentrantLock.lock() line: 256 [local variables unavailable]
    org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).getGlobal(java.lang.String) line: 587
    com.continuent.tungsten.cluster.manager.policy.Rule_IF_IN_MAINTENANCE_MODE__CONSUME_ALL_NOTIFICATIONS_0Eval0Invoker.evaluate(org.drools.spi.Tuple, org.drools.rule.Declaration[], org.drools.WorkingMemory, java.lang.Object) line: not available
    org.drools.rule.EvalCondition.isAllowed(org.drools.spi.Tuple, org.drools.WorkingMemory, java.lang.Object) line: 117
    org.drools.reteoo.EvalConditionNode.assertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 180
    org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, org.drools.reteoo.LeftTuple) line: 117
    org.drools.reteoo.SingleLeftTupleSinkAdapter.propagateAssertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.reteoo.RightTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, boolean) line: 28
    org.drools.reteoo.JoinNode.assertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 175
    org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 42
    org.drools.reteoo.PropagationQueuingNode$AssertAction.execute(org.drools.reteoo.ObjectSinkPropagator, org.drools.common.InternalWorkingMemory) line: 326
    org.drools.reteoo.PropagationQueuingNode.propagateActions(org.drools.common.InternalWorkingMemory) line: 221
    org.drools.reteoo.PropagationQueuingNode$PropagateAction.execute(org.drools.common.InternalWorkingMemory) line: 394
    org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1486
    org.drools.common.NamedEntryPoint.insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation) line: 158
    org.drools.common.NamedEntryPoint.insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 122
    org.drools.common.NamedEntryPoint.insert(java.lang.Object) line: 80
    com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine.insertFact(com.continuent.tungsten.commons.cluster.resource.notification.NotificationStreamID, java.lang.Object, boolean) line: 162
    com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager.run() line: 249
    java.lang.Thread.run() line: 595

The rule implicated in the above thread is:

rule "IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS"
    salience 999
    when
        notification : ClusterResourceNotification() from entry-point "MONITORING"
        eval(policyMgr.getMode() == ClusterPolicyManagerMode.MAINTENANCE)
    then
        statistics.increment("IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS");
        retract(notification);
end

This other thread, apparently a scheduled thread for a rule with a 10 second duration, is attempting to insert a fact. It owns the 'lock.lock' on ReteooStatefulSession and is waiting for the 'ReteooStatefulSession.actionQueue'.

owns: org.drools.common.DefaultAgenda (id=4046)
waiting for: java.util.LinkedList<E> (id=207)
    org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1480
    org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation, org.drools.reteoo.ObjectTypeConf) line: 1051
    org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 1001
    org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object, boolean) line: 114
    org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object) line: 108
    com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0.consequence(org.drools.spi.KnowledgeHelper, com.continuent.tungsten.commons.cluster.resource.notification.DataServerNotification, org.drools.FactHandle, java.lang.String, org.drools.FactHandle, com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager, org.apache.log4j.Logger) line: not available
    com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0ConsequenceInvoker.evaluate(org.drools.spi.KnowledgeHelper, org.drools.WorkingMemory) line: not available
    org.drools.common.DefaultAgenda.fireActivation(org.drools.spi.Activation) line: 934
    org.drools.common.Scheduler$DuractionJob.execute(org.drools.time.JobContext) line: 70
    org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 132
    org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 110
    java.util.concurrent.FutureTask$Sync.innerRun() line: 269
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(java.util.concurrent.FutureTask<V>).run() line: 123
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 65
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 168
    java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) line: 650
    java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 675
    java.lang.Thread.run() line: 595

The rule for this task looks like:

rule "DETECT MONITORING HAS STOPPED"
    duration(10s)
    salience 1000
    when
        lastNotification : DataServerNotification($resourceName : resourceName)
            from entry-point "MONITORING"
        not (DataServerNotification(resourceName == $resourceName,
                                    this after [10s] lastNotification)
            from entry-point "MONITORING")
        not (ManagerFailedAlarm(expired == false,
                                resourceName == $resourceName))
        not (DataSource(name == $resourceName,
                        state == ResourceState.SHUNNED ||
                        state == ResourceState.FAILED))
    then
        Object[] params = {$resourceName};
        if (policyMgr.getMode() != ClusterPolicyManagerMode.MAINTENANCE)
        {
            lastNotification.setResourceState(ResourceState.UNKNOWN);
            ManagerFailedAlarm alarm =
                new ManagerFailedAlarm(lastNotification, "rule detected monitor stop",
                                       6, AlarmSeverity.FAULT);
            logger.info(alarm.toString());
            insert(alarm);
            update(lastNotification);
        }
end

_______________________________________________
rules-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/rules-dev
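
The two traces above have the shape of a lock-order inversion: one thread holds the actionQueue monitor and wants the session's ReentrantLock, while the other holds the ReentrantLock and wants the actionQueue monitor. The following is only a minimal Java sketch of that shape, not Drools code. The names LockOrderSketch, sessionLock, actionQueue and inversionBlocks are hypothetical stand-ins, and the "application" side uses tryLock with a timeout so that the demonstration terminates instead of hanging the way the real threads do.

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class LockOrderSketch {
    // Hypothetical stand-ins for the two locks named in the traces.
    static final ReentrantLock sessionLock = new ReentrantLock();
    static final LinkedList<Runnable> actionQueue = new LinkedList<Runnable>();

    /**
     * Returns true when the calling thread, while holding the actionQueue
     * monitor, fails to get sessionLock because another thread holds it.
     */
    static boolean inversionBlocks(long timeoutMillis) {
        final CountDownLatch timerHoldsLock = new CountDownLatch(1);
        final CountDownLatch appDone = new CountDownLatch(1);

        // Plays the scheduled 'duration' thread: takes sessionLock first.
        Thread timer = new Thread(() -> {
            sessionLock.lock();
            try {
                timerHoldsLock.countDown();
                // The real thread would now block on synchronized(actionQueue).
                // Here it simply keeps the lock until the other side is done.
                appDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                sessionLock.unlock();
            }
        });
        timer.start();

        try {
            timerHoldsLock.await();
            boolean acquired;
            // Plays the application thread: queue monitor first, then sessionLock.
            synchronized (actionQueue) {
                acquired = sessionLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
                if (acquired) {
                    sessionLock.unlock();
                }
            }
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            appDone.countDown();
            try {
                timer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked: " + inversionBlocks(200));
    }
}
```

With a plain lock() in place of tryLock, and the timer thread entering synchronized(actionQueue), neither thread could proceed. The usual remedies are to acquire both locks in the same order everywhere, or to remove one of the two locks from the picture.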
Greg Barton
Well, I'm not sure how to avoid the deadlock without changing the drools codebase. I was, however, able to change the type of AbstractWorkingMemory.actionQueue to java.util.concurrent.ConcurrentLinkedQueue and remove the synchronization over the queue with no apparent ill effects. (Two tests failed for drools-core, but they failed whether the change was made or not.)

Also I don't like the fact that the current code synchronizes on actionQueue, but then exposes it outside the class through the getActionQueue() method, where access can be unsynchronized. Changing it to ConcurrentLinkedQueue makes it safe to expose externally. (Not to mention that the lock can be stolen externally with the current code.)

Diff attached. If you can run drools compiled from trunk, apply the diff and see if it resolves the deadlock. If it does, it's up to the drools devs as to whether the change should be made. I'm just hacking about. :P

--- On Tue, 11/3/09, Edward Archibald <[hidden email]> wrote:

> From: Edward Archibald <[hidden email]>
> Subject: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
> To: "[hidden email]" <[hidden email]>
> Date: Tuesday, November 3, 2009, 9:41 PM
>
> [...]

Index: drools-core/src/main/java/org/drools/common/AbstractWorkingMemory.java
===================================================================
--- drools-core/src/main/java/org/drools/common/AbstractWorkingMemory.java (revision 29938)
+++ drools-core/src/main/java/org/drools/common/AbstractWorkingMemory.java (working copy)
@@ -31,6 +31,7 @@
 import java.util.Queue;
 import java.util.Map.Entry;
 import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentLinkedQueue;
 import java.util.concurrent.Executors;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicLong;
@@ -289,7 +290,7 @@
             this.initialFactHandle = initialFactHandle;
         }
 
-        this.actionQueue = new LinkedList<WorkingMemoryAction>();
+        this.actionQueue = new ConcurrentLinkedQueue<WorkingMemoryAction>();
 
         this.addRemovePropertyChangeListenerArgs = new Object[]{this};
         this.queryResults = Collections.EMPTY_MAP;
@@ -1556,7 +1557,7 @@
     public void executeQueuedActions() {
         try {
             startOperation();
-            synchronized ( this.actionQueue ) {
+            //synchronized ( this.actionQueue ) {
                 if ( !this.actionQueue.isEmpty() && !evaluatingActionQueue ) {
                     evaluatingActionQueue = true;
                     WorkingMemoryAction action = null;
@@ -1571,7 +1572,7 @@
                     }
                     evaluatingActionQueue = false;
                 }
-            }
+            //}
         } finally {
             endOperation();
         }
@@ -1582,7 +1583,7 @@
     }
 
     public void queueWorkingMemoryAction(final WorkingMemoryAction action) {
-        synchronized ( this.actionQueue ) {
+        //synchronized ( this.actionQueue ) {
             try {
                 startOperation();
                 this.actionQueue.add( action );
@@ -1590,7 +1591,7 @@
             } finally {
                 endOperation();
             }
-        }
+        //}
     }
 
     public void removeLogicalDependencies(final Activation activation,
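
The patch swaps the LinkedList for a ConcurrentLinkedQueue and comments out the synchronized blocks, so no thread ever holds the queue's monitor while waiting for the session lock. Below is a rough, self-contained Java sketch of draining a queue in that style. It is not the Drools implementation: QueueDrainSketch is a hypothetical class, Runnable stands in for WorkingMemoryAction, and the AtomicBoolean guard is an assumption of this sketch (the patch itself keeps the plain evaluatingActionQueue flag, which is no longer protected by any lock once the synchronized block is gone).

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

final class QueueDrainSketch {
    // Thread-safe queue: add() and poll() need no external monitor.
    private final Queue<Runnable> actionQueue = new ConcurrentLinkedQueue<Runnable>();

    // Assumed replacement for the unsynchronized boolean flag in the patch.
    private final AtomicBoolean evaluating = new AtomicBoolean(false);

    /** Enqueues an action without taking any lock. */
    public void queueAction(Runnable action) {
        actionQueue.add(action);
    }

    /**
     * Runs queued actions until the queue is empty and returns how many ran.
     * Returns 0 immediately if another call is already draining the queue.
     */
    public int executeQueuedActions() {
        if (!evaluating.compareAndSet(false, true)) {
            return 0;
        }
        int executed = 0;
        try {
            Runnable action;
            while ((action = actionQueue.poll()) != null) {
                action.run();
                executed++;
            }
        } finally {
            evaluating.set(false);
        }
        return executed;
    }

    public static void main(String[] args) {
        QueueDrainSketch sketch = new QueueDrainSketch();
        sketch.queueAction(() -> System.out.println("first action"));
        sketch.queueAction(() -> System.out.println("second action"));
        System.out.println("executed: " + sketch.executeQueuedActions());
    }
}
```

One trade-off of this design: an action enqueued just after poll() returns null, but before the guard is cleared, stays in the queue until the next call to executeQueuedActions(). Whether that is acceptable for the Drools action queue is a question for the Drools developers.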
Michael Neale
Ha - I was just musing with someone the other day about who uses "duration" anymore ;) I guess it's still useful to people!

I would say that the "duration" codebase is probably fairly "old", in the sense that it probably pre-dates the availability of j.u.concurrent (which was Java 5, I think?), so please try out that patch. If it works, we can probably pull it in (hoping Edson can take a look).

On Wed, Nov 4, 2009 at 4:43 PM, Greg Barton <[hidden email]> wrote:
> [...]

--
Michael D Neale
home: www.michaelneale.net
blog: michaelneale.blogspot.com
Edward Archibald
Hi Greg,
Thanks for the post. I'll give this a shot. Turns out that I can reproduce the issue often enough that I'll be able to see if this simple change resolves it.

Regards,
Edward

________________________________________
From: [hidden email] [[hidden email]] On Behalf Of Greg Barton [[hidden email]]
Sent: Tuesday, November 03, 2009 9:43 PM
To: Rules Dev List
Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

Well, I'm not sure how to avoid the deadlock without changing the drools codebase. I was, however, able to change the type of AbstractWorkingMemory.actionQueue to java.util.concurrent.ConcurrentLinkedQueue and remove the synchronization over the queue with no apparent ill effects. (Two tests failed for drools-core, but they failed whether the change was made or not.)

Also I don't like the fact that the current code synchronizes on actionQueue, but then exposes it outside the class through the getActionQueue() method, where access can be unsynchronized. Changing it to ConcurrentLinkedQueue makes it safe to expose externally. (Not to mention that the lock can be stolen externally with the current code.)

diff attached. If you can run drools compiled from trunk, apply the diff and see if it resolves the deadlock. If it does it's up to the drools devs as to whether the change should be made. I'm just hacking about. :P

_______________________________________________
rules-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/rules-dev
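For readers following along, here is a minimal Java sketch of the kind of change Greg describes: the action queue becomes a java.util.concurrent.ConcurrentLinkedQueue, so no thread holds a monitor on the queue while an action runs. This is not the attached diff and not Drools code. The class ActionQueueSketch and all of its methods are hypothetical stand-ins, and the single ReentrantLock only plays the role of the session's 'lock.lock' from the traces.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a working memory; NOT the Drools class.
final class ActionQueueSketch {
    // Stand-in for the session-wide lock ('lock.lock' in the traces).
    private final ReentrantLock lock = new ReentrantLock();
    // Lock-free queue, so no synchronized block is needed around it.
    private final Queue<Runnable> actionQueue = new ConcurrentLinkedQueue<>();
    private int executed = 0; // guarded by 'lock'

    void queueAction(Runnable action) {
        actionQueue.add(action);
    }

    // Drains the queue without holding any monitor while an action runs.
    void executeQueuedActions() {
        Runnable action;
        while ((action = actionQueue.poll()) != null) {
            action.run();
        }
    }

    // Like getGlobal() in the trace, this needs the session lock.
    void markExecuted() {
        lock.lock();
        try {
            executed++;
        } finally {
            lock.unlock();
        }
    }

    int executedCount() {
        lock.lock();
        try {
            return executed;
        } finally {
            lock.unlock();
        }
    }

    // Application-thread path: queue and drain without the session lock.
    void insertFromApplication() {
        queueAction(this::markExecuted);
        executeQueuedActions();
    }

    // Timer-thread path: takes the session lock first, then drains.
    void insertFromTimer() {
        lock.lock();
        try {
            queueAction(this::markExecuted);
            executeQueuedActions();
        } finally {
            lock.unlock();
        }
    }

    // Runs both paths concurrently. Returns the number of executed
    // actions, or -1 if the threads did not finish within the timeout.
    static int runDemo(int perThread) {
        ActionQueueSketch s = new ActionQueueSketch();
        Thread app = new Thread(() -> {
            for (int i = 0; i < perThread; i++) s.insertFromApplication();
        });
        Thread timer = new Thread(() -> {
            for (int i = 0; i < perThread; i++) s.insertFromTimer();
        });
        app.setDaemon(true);
        timer.setDaemon(true);
        app.start();
        timer.start();
        try {
            app.join(10_000);
            timer.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        if (app.isAlive() || timer.isAlive()) return -1;
        return s.executedCount();
    }

    public static void main(String[] args) {
        System.out.println("executed actions: " + runDemo(1000));
    }
}
```

With only one real lock left there is no second resource to form a cycle with, which is presumably why the patched build stops hanging. Whether the real code relies on the queue's monitor for anything else is a question for the Drools developers.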
||||||||||||||||
|
Edson Tirelli-4
|
Edward,

Are you able to provide us with a test case? That would help us ensure we fix this and prevent future regressions.

Thanks,
Edson

--
Edson Tirelli
JBoss Drools Core Development
JBoss by Red Hat @ www.jboss.com
||||||||||||||||
|
Edward Archibald
|
In reply to this post
by Michael Neale
Hello Michael and Greg,
I have pulled the drools head, made the patch that Greg suggested (thanks Greg!) and deployed the drools-core jar with my app. Prior to this change, I was able to reproduce the deadlock - verified in the debugger and in exactly the same place as my earlier post - roughly 50% of the time. I have tried the same test scenario, now, 10 times with no failures.

From what I can tell, this problem will easily happen every time that I am already in the 'delayed execution' code created by the rule with the 'duration()' qualifier and, at the same time, I get a new 'fact' and attempt to insert it. Knowing this, I can probably come up with a simpler test case. I'll give it a shot.

Thanks, again, to you both for the quick responses.

Edward

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Michael Neale
Sent: Wednesday, November 04, 2009 2:26 AM
To: Rules Dev List
Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

ha - was just musing with someone the other day who uses "duration" anymore ;) I guess it's still useful to people!

I would say that the "duration" codebase is probably fairly "old" - in the sense that it probably pre-dates the availability of j.u.concurrent (which was java 5 I think?) - so please try out that patch, if it works, we can probably pull it in (hoping Edson can take a look).

--
Michael D Neale
home: www.michaelneale.net
blog: michaelneale.blogspot.com
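The scenario Edward describes (a timer thread already inside the delayed-rule code while the application thread inserts a new fact) comes down to two threads taking the same two locks in opposite order. Below is a minimal standalone Java sketch of that pattern, offered only as a possible starting point for a simpler test case. It does not use Drools at all: LockOrderRepro, the LinkedList monitor and the ReentrantLock are hypothetical stand-ins for 'actionQueue' and 'lock.lock', and the latch is an assumption added to force the interleaving every time instead of roughly half the time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reproduction of the lock-order inversion; NOT Drools code.
final class LockOrderRepro {

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if the JVM reports a deadlock between the two threads.
    static boolean reproduce() {
        final LinkedList<Object> actionQueue = new LinkedList<>(); // used as a monitor
        final ReentrantLock sessionLock = new ReentrantLock();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Application thread: owns the queue monitor, then wants the lock.
        Thread app = new Thread(() -> {
            synchronized (actionQueue) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                sessionLock.lock(); // blocks: the timer thread owns it
                sessionLock.unlock();
            }
        }, "app-thread");

        // Timer thread: owns the lock, then wants the queue monitor.
        Thread timer = new Thread(() -> {
            sessionLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (actionQueue) {
                    // never reached: the application thread owns the monitor
                }
            } finally {
                sessionLock.unlock();
            }
        }, "timer-thread");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        app.setDaemon(true);
        timer.setDaemon(true);
        app.start();
        timer.start();

        // Poll the JVM's own deadlock detector for up to about 5 seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null && ids.length >= 2) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduce());
    }
}
```

ThreadMXBean.findDeadlockedThreads() covers cycles that mix object monitors and java.util.concurrent locks, which matches the mixed monitor/ReentrantLock cycle in the traces. A real regression test would still need to drive an actual session with a 'duration()' rule.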
||||||||||||||||
|
Edward Archibald
|
In reply to this post
by Edson Tirelli-4
Hi Edson,

Our emails just crossed paths. Yes, I believe that I can reproduce this. I’ll try to turn this around quickly.

Ed
||||||||||||||||
|
Michael Neale
|
In reply to this post
by Edward Archibald
That is great news. If there are other places we can move to
java.util.concurrent in older code, that would also be welcome, j.u.concurrent is awesome magic which saves so much pain. On Thu, Nov 5, 2009 at 12:06 PM, Edward Archibald <[hidden email]> wrote: > Hello Michael and Greg, > > I have pulled the drools head, made the patch that Greg suggested (thanks Greg!) and deployed the drools-core jar with my app. Prior to this change, I was able to reproduce the deadlock - verified in the debugger and in exactly the same place as my earlier post - roughly 50% of the time. I have tried the same test scenario, now, 10 times with no failures. > > >From what I can tell, this problem will easily happen every time that I am already in the 'delayed execution' code created by the rule with the 'duration()' qualifier and, at the same time, I get a new 'fact' and attempt to insert it. Knowing this, I can probably come up with a simpler test case. I'll give it a shot. > > Thanks, again, to you both for the quick responses. > > Edward > > -----Original Message----- > From: [hidden email] [mailto:[hidden email]] On Behalf Of Michael Neale > Sent: Wednesday, November 04, 2009 2:26 AM > To: Rules Dev List > Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution? > > ha - was just musing with someone the other day who uses "duration" > anymore ;) I guess its still useful to people ! > > I would say that the "duration" codebase is probably fairly "old" - in > the sense that it probably pre-dates the availability of > j.u.concurrent (which was java 5 I think? ) - so please try out that > patch, if it works, we can probably pull it in (hoping Edson can take > a look). > > On Wed, Nov 4, 2009 at 4:43 PM, Greg Barton <[hidden email]> wrote: >> Well, I'm not sure how to avoid the deadlock without changing the drools codebase. 
I was, however, able to change the type of AbstractWorkingMemory.actionQueue to java.util.concurrent.ConcurrentLinkedQueue and remove the synchronization over the queue with no apparent ill effects. (Two tests failed for drools-core, but they failed whether the change was made or not.) Also I don't like the fact that the current code synchronizes on actionQueue, but then exposes it outside the class through the getActionQueue() method, where access can be unsynchronized. Changing it to ConcurrentLinkedQueue makes it safe to expose externally. (Not to mention that the lock can be stolen externally with the current code.) >> >> diff attached. If you can run drools compiled from trunk, apply the diff and see if it resolves the deadlock. If it does it's up to the drools devs as to whether the change should be made. I'm just hacking about. :P >> >> --- On Tue, 11/3/09, Edward Archibald <[hidden email]> wrote: >> >>> From: Edward Archibald <[hidden email]> >>> Subject: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution? >>> To: "[hidden email]" <[hidden email]> >>> Date: Tuesday, November 3, 2009, 9:41 PM >>> >>> I found the following deadlock which is, apparently, due to >>> the concurrent execution >>> of a task for a 'delayed' rule with a concurrently >>> executing application thread attempting to get access to a >>> 'global'. Any recommendations for avoiding this type >>> of deadlock besides not using rules with 'duration()' etc. >>> which cause asynchronous execution with respect to my main >>> application thread? >>> >>> This problem is somewhat difficult to reproduce on demand >>> but it does come up frequently when the 'delayed' rule >>> "DETECT MONITORING HAS STOPPED" is activated as a result of >>> the trigger conditions. 
>>> >>> =================================================================================== >>> >>> This thread, my application's EnterprisePolicyManager >>> thread, is attempting to get access to a global, policyMgr, >>> and is waiting for >>> the 'lock.lock' on RetooStatefulSession >>> >>> It owns the 'ReteooStatefulSession.actionQueue' >>> and is waiting for the ReteooStatefulSession.lock.lock >>> >>> owns: java.util.LinkedList<E> (id=207) >>> waited by: Thread [pool-3-thread-1] (Suspended) >>> owns: >>> com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine >>> (id=208) >>> sun.misc.Unsafe.park(boolean, long) line: not available >>> [native method] [local variables unavailable] >>> java.util.concurrent.locks.LockSupport.park() line: 118 >>> [local variables unavailable] >>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() >>> line: 681 [local variables unavailable] >>> java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, >>> int) line: 711 >>> java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquire(int) >>> line: 1041 >>> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() >>> line: 184 [local variables unavailable] >>> java.util.concurrent.locks.ReentrantLock.lock() line: 256 >>> [local variables unavailable] >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).getGlobal(java.lang.String) >>> line: 587 >>> com.continuent.tungsten.cluster.manager.policy.Rule_IF_IN_MAINTENANCE_MODE__CONSUME_ALL_NOTIFICATIONS_0Eval0Invoker.evaluate(org.drools.spi.Tuple, >>> org.drools.rule.Declaration[], org.drools.WorkingMemory, >>> java.lang.Object) line: not available >>> org.drools.rule.EvalCondition.isAllowed(org.drools.spi.Tuple, >>> org.drools.WorkingMemory, java.lang.Object) line: 117 >>> 
org.drools.reteoo.EvalConditionNode.assertLeftTuple(org.drools.reteoo.LeftTuple, >>> org.drools.spi.PropagationContext, >>> org.drools.common.InternalWorkingMemory) line: 180 >>> org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(org.drools.spi.PropagationContext, >>> org.drools.common.InternalWorkingMemory, >>> org.drools.reteoo.LeftTuple) line: 117 >>> org.drools.reteoo.SingleLeftTupleSinkAdapter.propagateAssertLeftTuple(org.drools.reteoo.LeftTuple, >>> org.drools.reteoo.RightTuple, >>> org.drools.spi.PropagationContext, >>> org.drools.common.InternalWorkingMemory, boolean) line: 28 >>> org.drools.reteoo.JoinNode.assertObject(org.drools.common.InternalFactHandle, >>> org.drools.spi.PropagationContext, >>> org.drools.common.InternalWorkingMemory) line: 175 >>> org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(org.drools.common.InternalFactHandle, >>> org.drools.spi.PropagationContext, >>> org.drools.common.InternalWorkingMemory) line: 42 >>> org.drools.reteoo.PropagationQueuingNode$AssertAction.execute(org.drools.reteoo.ObjectSinkPropagator, >>> org.drools.common.InternalWorkingMemory) line: 326 >>> org.drools.reteoo.PropagationQueuingNode.propagateActions(org.drools.common.InternalWorkingMemory) >>> line: 221 >>> org.drools.reteoo.PropagationQueuingNode$PropagateAction.execute(org.drools.common.InternalWorkingMemory) >>> line: 394 >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() >>> line: 1486 >>> org.drools.common.NamedEntryPoint.insert(org.drools.common.InternalFactHandle, >>> java.lang.Object, org.drools.rule.Rule, >>> org.drools.spi.Activation) line: 158 >>> org.drools.common.NamedEntryPoint.insert(java.lang.Object, >>> boolean, boolean, org.drools.rule.Rule, >>> org.drools.spi.Activation) line: 122 >>> org.drools.common.NamedEntryPoint.insert(java.lang.Object) >>> line: 80 >>> 
--
Michael D Neale
home: www.michaelneale.net
blog: michaelneale.blogspot.com
_______________________________________________
rules-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/rules-dev
|
|
Greg Barton
|
I'll poke around.
--- On Wed, 11/4/09, Michael Neale <[hidden email]> wrote:

> From: Michael Neale <[hidden email]>
> Subject: Re: [rules-dev] Deadlock in the Drools core - suggested patch appears to resolve the issue
> To: "Rules Dev List" <[hidden email]>
> Date: Wednesday, November 4, 2009, 9:01 PM
>
> That is great news. If there are other places we can move to
> java.util.concurrent in older code, that would also be welcome,
> j.u.concurrent is awesome magic which saves so much pain.
>
> On Thu, Nov 5, 2009 at 12:06 PM, Edward Archibald <[hidden email]> wrote:
> > Hello Michael and Greg,
> >
> > I have pulled the drools head, made the patch that Greg suggested
> > (thanks Greg!) and deployed the drools-core jar with my app. Prior to
> > this change, I was able to reproduce the deadlock - verified in the
> > debugger and in exactly the same place as my earlier post - roughly 50%
> > of the time. I have tried the same test scenario, now, 10 times with no
> > failures.
> >
> > From what I can tell, this problem will easily happen every time that I
> > am already in the 'delayed execution' code created by the rule with the
> > 'duration()' qualifier and, at the same time, I get a new 'fact' and
> > attempt to insert it. Knowing this, I can probably come up with a
> > simpler test case. I'll give it a shot.
> >
> > Thanks, again, to you both for the quick responses.
> >
> > Edward
> >
> > -----Original Message-----
> > From: [hidden email] [mailto:[hidden email]] On Behalf Of Michael Neale
> > Sent: Wednesday, November 04, 2009 2:26 AM
> > To: Rules Dev List
> > Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
> >
> > ha - was just musing with someone the other day who uses "duration"
> > anymore ;) I guess its still useful to people !
> >
> > I would say that the "duration" codebase is probably fairly "old" - in
> > the sense that it probably pre-dates the availability of
> > j.u.concurrent (which was java 5 I think?) - so please try out that
> > patch, if it works, we can probably pull it in (hoping Edson can take
> > a look).
> >
> > On Wed, Nov 4, 2009 at 4:43 PM, Greg Barton <[hidden email]> wrote:
> >> Well, I'm not sure how to avoid the deadlock without changing the
> >> drools codebase. I was, however, able to change the type of
> >> AbstractWorkingMemory.actionQueue to
> >> java.util.concurrent.ConcurrentLinkedQueue and remove the
> >> synchronization over the queue with no apparent ill effects. (Two
> >> tests failed for drools-core, but they failed whether the change was
> >> made or not.) Also I don't like the fact that the current code
> >> synchronizes on actionQueue, but then exposes it outside the class
> >> through the getActionQueue() method, where access can be
> >> unsynchronized. Changing it to ConcurrentLinkedQueue makes it safe to
> >> expose externally. (Not to mention that the lock can be stolen
> >> externally with the current code.)
> >>
> >> diff attached. If you can run drools compiled from trunk, apply the
> >> diff and see if it resolves the deadlock. If it does it's up to the
> >> drools devs as to whether the change should be made. I'm just hacking
> >> about. :P
> >>
> >> --- On Tue, 11/3/09, Edward Archibald <[hidden email]> wrote:
> >>
> >>> From: Edward Archibald <[hidden email]>
> >>> Subject: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
> >>> To: "[hidden email]" <[hidden email]>
> >>> Date: Tuesday, November 3, 2009, 9:41 PM
> >>>
> >>> I found the following deadlock which is, apparently, due to the
> >>> concurrent execution of a task for a 'delayed' rule with a
> >>> concurrently executing application thread attempting to get access
> >>> to a 'global'. Any recommendations for avoiding this type of
> >>> deadlock besides not using rules with 'duration()' etc. which cause
> >>> asynchronous execution with respect to my main application thread?
|
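For anyone following the thread, here is a rough, self-contained sketch of the idea behind the suggested patch. It is not the Drools source and not the attached diff: `ActionQueueSketch`, `runScenario` and the latch names are hypothetical stand-ins, and the only things borrowed from the thread are the names `actionQueue`, `getGlobal` and `executeQueuedActions` plus the switch to `java.util.concurrent.ConcurrentLinkedQueue`. The sketch replays the reported interleaving: one thread holds the session lock and then drains the queue, while the other drains the queue and runs an action that needs the session lock. Because nothing synchronizes on the queue any more, the second thread only waits for the lock and never holds a resource the first thread needs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a rule session; none of this is Drools code.
class ActionQueueSketch {
    // Plays the role of the session-wide lock taken by getGlobal().
    private final ReentrantLock lock = new ReentrantLock();
    // Lock-free queue: no thread holds a monitor on it while an action runs.
    private final Queue<Runnable> actionQueue = new ConcurrentLinkedQueue<>();
    private final Map<String, Object> globals = new HashMap<>();

    void setGlobal(String name, Object value) {
        lock.lock();
        try { globals.put(name, value); } finally { lock.unlock(); }
    }

    Object getGlobal(String name) {
        lock.lock();
        try { return globals.get(name); } finally { lock.unlock(); }
    }

    void queueAction(Runnable action) { actionQueue.add(action); }

    // Drains the queue without synchronizing on it, so an action may block on `lock`.
    void executeQueuedActions() {
        Runnable action;
        while ((action = actionQueue.poll()) != null) {
            action.run();
        }
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Replays the interleaving from the report; returns true if both threads finish.
    static boolean runScenario() {
        final ActionQueueSketch session = new ActionQueueSketch();
        session.setGlobal("policyMgr", "MAINTENANCE");
        final CountDownLatch timerHoldsLock = new CountDownLatch(1);
        final CountDownLatch appInsideAction = new CountDownLatch(1);

        // Queued action that reads a global, like the eval() in the first rule.
        session.queueAction(() -> {
            appInsideAction.countDown();
            session.getGlobal("policyMgr");
        });

        // "Timer" thread: holds the session lock, then tries to drain the queue.
        Thread timer = new Thread(() -> {
            session.lock.lock();
            try {
                timerHoldsLock.countDown();
                await(appInsideAction);
                session.executeQueuedActions();
            } finally {
                session.lock.unlock();
            }
        });
        // "Application" thread: drains the queue; its action needs the session lock.
        Thread app = new Thread(() -> {
            await(timerHoldsLock);
            session.executeQueuedActions();
        });
        timer.setDaemon(true);
        app.setDaemon(true);
        timer.start();
        app.start();
        try {
            timer.join(10_000);
            app.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !timer.isAlive() && !app.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both threads finished: " + runScenario());
    }
}
```

If `executeQueuedActions()` instead wrapped its loop in `synchronized (actionQueue)`, the same interleaving would leave the application thread holding the queue monitor while waiting for the lock, and the timer thread holding the lock while waiting for the queue monitor, which is the cycle shown in the two stack traces.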