Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

Edward Archibald

Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?


I found the following deadlock, which apparently results from a task for a 'delayed' rule executing concurrently with an application thread that is attempting to get access to a 'global'.  Are there any recommendations for avoiding this type of deadlock, other than not using rules with 'duration()' etc., which cause execution that is asynchronous with respect to my main application thread?

This problem is somewhat difficult to reproduce on demand, but it comes up frequently when the 'delayed' rule "DETECT MONITORING HAS STOPPED" is activated by its trigger conditions.

===================================================================================

This thread, my application's EnterprisePolicyManager thread, is attempting to get access to a global, policyMgr.

It owns 'ReteooStatefulSession.actionQueue' and is waiting for the 'lock.lock' on ReteooStatefulSession.

owns: java.util.LinkedList<E>  (id=207)
waited by: Thread [pool-3-thread-1] (Suspended)
owns: com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine  (id=208)
sun.misc.Unsafe.park(boolean, long) line: not available [native method] [local variables unavailable]
java.util.concurrent.locks.LockSupport.park() line: 118 [local variables unavailable]
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() line: 681 [local variables unavailable]
java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) line: 711
java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquire(int) line: 1041
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() line: 184 [local variables unavailable]
java.util.concurrent.locks.ReentrantLock.lock() line: 256 [local variables unavailable]
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).getGlobal(java.lang.String) line: 587
com.continuent.tungsten.cluster.manager.policy.Rule_IF_IN_MAINTENANCE_MODE__CONSUME_ALL_NOTIFICATIONS_0Eval0Invoker.evaluate(org.drools.spi.Tuple, org.drools.rule.Declaration[], org.drools.WorkingMemory, java.lang.Object) line: not available
org.drools.rule.EvalCondition.isAllowed(org.drools.spi.Tuple, org.drools.WorkingMemory, java.lang.Object) line: 117
org.drools.reteoo.EvalConditionNode.assertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 180
org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, org.drools.reteoo.LeftTuple) line: 117
org.drools.reteoo.SingleLeftTupleSinkAdapter.propagateAssertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.reteoo.RightTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, boolean) line: 28
org.drools.reteoo.JoinNode.assertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 175
org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 42
org.drools.reteoo.PropagationQueuingNode$AssertAction.execute(org.drools.reteoo.ObjectSinkPropagator, org.drools.common.InternalWorkingMemory) line: 326
org.drools.reteoo.PropagationQueuingNode.propagateActions(org.drools.common.InternalWorkingMemory) line: 221
org.drools.reteoo.PropagationQueuingNode$PropagateAction.execute(org.drools.common.InternalWorkingMemory) line: 394
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1486
org.drools.common.NamedEntryPoint.insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation) line: 158
org.drools.common.NamedEntryPoint.insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 122
org.drools.common.NamedEntryPoint.insert(java.lang.Object) line: 80
com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine.insertFact(com.continuent.tungsten.commons.cluster.resource.notification.NotificationStreamID, java.lang.Object, boolean) line: 162
com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager.run() line: 249
java.lang.Thread.run() line: 595

The rule implicated in the above thread is:

rule "IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS"
salience 999
  when
    notification : ClusterResourceNotification() from entry-point "MONITORING"
    eval(policyMgr.getMode() == ClusterPolicyManagerMode.MAINTENANCE)
  then
    statistics.increment("IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS");
    retract(notification);
end



This other thread, apparently a scheduled thread for a rule with a 10-second duration,
is attempting to insert a fact. It owns the 'lock.lock' on ReteooStatefulSession and
is waiting for 'ReteooStatefulSession.actionQueue'.

owns: org.drools.common.DefaultAgenda  (id=4046)
waiting for: java.util.LinkedList<E>  (id=207)
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1480
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation, org.drools.reteoo.ObjectTypeConf) line: 1051
org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 1001
org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object, boolean) line: 114
org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object) line: 108
com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0.consequence(org.drools.spi.KnowledgeHelper, com.continuent.tungsten.commons.cluster.resource.notification.DataServerNotification, org.drools.FactHandle, java.lang.String, org.drools.FactHandle, com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager, org.apache.log4j.Logger) line: not available
com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0ConsequenceInvoker.evaluate(org.drools.spi.KnowledgeHelper, org.drools.WorkingMemory) line: not available
org.drools.common.DefaultAgenda.fireActivation(org.drools.spi.Activation) line: 934
org.drools.common.Scheduler$DuractionJob.execute(org.drools.time.JobContext) line: 70
org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 132
org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 110
java.util.concurrent.FutureTask$Sync.innerRun() line: 269
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(java.util.concurrent.FutureTask<V>).run() line: 123
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 65
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 168
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) line: 650
java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 675
java.lang.Thread.run() line: 595

The rule for this task looks like:
rule "DETECT MONITORING HAS STOPPED"
duration(10s)
salience 1000
  when
    lastNotification : DataServerNotification($resourceName : resourceName)
                       from entry-point "MONITORING"

    not (DataServerNotification(resourceName == $resourceName,
                                this after [10s] lastNotification)
         from entry-point "MONITORING")

    not (ManagerFailedAlarm(expired == false,
                            resourceName == $resourceName))

    not (DataSource(name == $resourceName,
                    state == ResourceState.SHUNNED ||
                    state == ResourceState.FAILED))

  then
    Object[] params = {$resourceName};
    if (policyMgr.getMode() != ClusterPolicyManagerMode.MAINTENANCE)
    {
      lastNotification.setResourceState(ResourceState.UNKNOWN);
      ManagerFailedAlarm alarm =
          new ManagerFailedAlarm(lastNotification, "rule detected monitor stop",
                                 6, AlarmSeverity.FAULT);
      logger.info(alarm.toString());
      insert(alarm);
      update(lastNotification);
    }
end
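
The two traces above describe a lock-order inversion: one thread takes the actionQueue monitor and then wants the session lock, while the other takes the session lock and then wants the actionQueue monitor. The following is a minimal Java sketch of that pattern in isolation. It is not Drools code; the class name LockOrderInversionDemo and the fields ACTION_QUEUE and SESSION_LOCK are hypothetical stand-ins for the two locks named in the traces. It uses tryLock with a timeout where the real application thread calls lock(), so the demo gives up and returns instead of hanging.

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionDemo {
    // Hypothetical stand-ins for actionQueue and the session lock.
    static final LinkedList<Runnable> ACTION_QUEUE = new LinkedList<>();
    static final ReentrantLock SESSION_LOCK = new ReentrantLock();

    /**
     * Returns true if the "application" thread could take SESSION_LOCK
     * while it holds the ACTION_QUEUE monitor.
     */
    static boolean applicationThreadAcquiresSessionLock() {
        final CountDownLatch appHoldsQueue = new CountDownLatch(1);
        final CountDownLatch timerHoldsLock = new CountDownLatch(1);

        // Plays the scheduled 'duration' thread: session lock first, queue second.
        Thread timer = new Thread(() -> {
            SESSION_LOCK.lock();
            try {
                timerHoldsLock.countDown();
                appHoldsQueue.await();
                synchronized (ACTION_QUEUE) { // blocks until the other thread leaves
                    ACTION_QUEUE.size();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                SESSION_LOCK.unlock();
            }
        }, "timer-thread");
        timer.start();

        boolean acquired = false;
        try {
            // Plays the application thread: queue first, session lock second.
            synchronized (ACTION_QUEUE) {
                appHoldsQueue.countDown();
                timerHoldsLock.await();
                // A plain lock() here would never return; tryLock times out instead.
                acquired = SESSION_LOCK.tryLock(200, TimeUnit.MILLISECONDS);
                if (acquired) {
                    SESSION_LOCK.unlock();
                }
            }
            timer.join(); // the timer thread can finish once the monitor is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println("acquired session lock: "
                + applicationThreadAcquiresSessionLock());
    }
}
```

Taking the two locks in the same order on both paths, or removing one of them from the picture, breaks the cycle.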

_______________________________________________
rules-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/rules-dev
Greg Barton

Re: Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

Well, I'm not sure how to avoid the deadlock without changing the Drools codebase.  I was, however, able to change the type of AbstractWorkingMemory.actionQueue to java.util.concurrent.ConcurrentLinkedQueue and remove the synchronization over the queue with no apparent ill effects. (Two tests failed for drools-core, but they failed whether or not the change was made.)  Also, I don't like the fact that the current code synchronizes on actionQueue but then exposes it outside the class through the getActionQueue() method, where access can be unsynchronized.  Changing it to ConcurrentLinkedQueue makes it safe to expose externally. (Not to mention that, with the current code, the lock can be stolen externally.)
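
To illustrate the point about the exposed queue: when a class synchronizes on an object that it also hands out through a getter, any caller can hold that monitor and stall the class's own methods. A small sketch, assuming a hypothetical LeakyQueueOwner class rather than AbstractWorkingMemory itself (the 200 ms wait is an arbitrary choice for the demo):

```java
import java.util.LinkedList;

class ExposedMonitorDemo {
    // Hypothetical class that locks on a queue it also exposes.
    static class LeakyQueueOwner {
        private final LinkedList<String> queue = new LinkedList<>();

        LinkedList<String> getQueue() {
            return queue; // leaks the object used as the lock
        }

        void enqueue(String item) {
            synchronized (queue) {
                queue.add(item);
            }
        }
    }

    /**
     * Returns true if enqueue() finished within 200 ms while the caller
     * was holding the leaked monitor.
     */
    static boolean enqueueFinishesWhileMonitorIsHeld() {
        LeakyQueueOwner owner = new LeakyQueueOwner();
        Thread worker = new Thread(() -> owner.enqueue("action"));
        boolean finished = false;
        try {
            synchronized (owner.getQueue()) { // external code "steals" the lock
                worker.start();
                worker.join(200);             // wait a little for the worker
                finished = !worker.isAlive(); // still blocked inside enqueue()
            }
            worker.join(); // completes once the monitor has been released
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return finished;
    }
}
```

Locking on a private object that is never returned to callers, or using a collection that needs no external lock, avoids this.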

Diff attached.  If you can run Drools compiled from trunk, apply the diff and see if it resolves the deadlock.  If it does, it's up to the Drools devs whether the change should be made.  I'm just hacking about. :P
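
For readers who want to see the idea without building Drools, here is a rough standalone sketch of draining a ConcurrentLinkedQueue with no synchronized block. It is not the patched AbstractWorkingMemory: the class ConcurrentActionQueueSketch and its methods are hypothetical, and the AtomicBoolean guard is an assumption of this sketch (the attached diff keeps the existing plain boolean flag).

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentActionQueueSketch {
    private final Queue<Runnable> actionQueue = new ConcurrentLinkedQueue<>();
    // Assumption of this sketch: an atomic flag lets only one thread drain
    // at a time without taking a monitor on the queue.
    private final AtomicBoolean evaluating = new AtomicBoolean(false);

    void queueAction(Runnable action) {
        actionQueue.add(action); // thread-safe without an external lock
    }

    void executeQueuedActions() {
        if (!evaluating.compareAndSet(false, true)) {
            return; // another thread, or a re-entrant call, is already draining
        }
        try {
            Runnable action;
            while ((action = actionQueue.poll()) != null) {
                action.run();
            }
        } finally {
            evaluating.set(false);
        }
    }

    /** Queues 4 x 1000 actions from four threads and returns how many ran. */
    static int runDemo() {
        ConcurrentActionQueueSketch memory = new ConcurrentActionQueueSketch();
        AtomicInteger executed = new AtomicInteger();
        Thread[] producers = new Thread[4];
        for (int t = 0; t < producers.length; t++) {
            producers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    memory.queueAction(executed::incrementAndGet);
                    memory.executeQueuedActions();
                }
            });
            producers[t].start();
        }
        try {
            for (Thread producer : producers) {
                producer.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        memory.executeQueuedActions(); // drain anything left behind by a skipped call
        return executed.get();
    }
}
```

Because a thread that loses the compareAndSet race returns immediately, an action can sit in the queue until the next drain, which is why the demo drains once more after joining the producers.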

--- On Tue, 11/3/09, Edward Archibald <[hidden email]> wrote:

> From: Edward Archibald <[hidden email]>
> Subject: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
> To: "[hidden email]" <[hidden email]>
> Date: Tuesday, November 3, 2009, 9:41 PM

Index: drools-core/src/main/java/org/drools/common/AbstractWorkingMemory.java
===================================================================
--- drools-core/src/main/java/org/drools/common/AbstractWorkingMemory.java (revision 29938)
+++ drools-core/src/main/java/org/drools/common/AbstractWorkingMemory.java (working copy)
@@ -31,6 +31,7 @@
 import java.util.Queue;
 import java.util.Map.Entry;
 import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentLinkedQueue;
 import java.util.concurrent.Executors;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicLong;
@@ -289,7 +290,7 @@
             this.initialFactHandle = initialFactHandle;
         }
 
-        this.actionQueue = new LinkedList<WorkingMemoryAction>();
+        this.actionQueue = new ConcurrentLinkedQueue<WorkingMemoryAction>();
 
         this.addRemovePropertyChangeListenerArgs = new Object[]{this};
         this.queryResults = Collections.EMPTY_MAP;
@@ -1556,7 +1557,7 @@
     public void executeQueuedActions() {
         try {
             startOperation();
-            synchronized ( this.actionQueue ) {
+            //synchronized ( this.actionQueue ) {
                 if ( !this.actionQueue.isEmpty() && !evaluatingActionQueue ) {
                     evaluatingActionQueue = true;
                     WorkingMemoryAction action = null;
@@ -1571,7 +1572,7 @@
                     }
                     evaluatingActionQueue = false;
                 }
-            }
+            //}
         } finally {
             endOperation();
         }
@@ -1582,7 +1583,7 @@
     }
 
     public void queueWorkingMemoryAction(final WorkingMemoryAction action) {
-        synchronized ( this.actionQueue ) {
+        //synchronized ( this.actionQueue ) {
             try {
                 startOperation();
                 this.actionQueue.add( action );
@@ -1590,7 +1591,7 @@
             } finally {
                 endOperation();
             }
-        }
+        //}
     }
 
     public void removeLogicalDependencies(final Activation activation,

Michael Neale

Re: Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

Ha, I was just musing with someone the other day about who uses "duration"
anymore ;) I guess it's still useful to people!

I would say that the "duration" code is probably fairly "old", in
the sense that it probably pre-dates the availability of
j.u.concurrent (which arrived in Java 5, I think?). So please try out that
patch; if it works, we can probably pull it in (hoping Edson can take
a look).

On Wed, Nov 4, 2009 at 4:43 PM, Greg Barton <[hidden email]> wrote:

--
Michael D Neale
home: www.michaelneale.net
blog: michaelneale.blogspot.com

_______________________________________________
rules-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/rules-dev
Edward Archibald

Re: Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

In reply to this post by Edward Archibald
Hi Greg,

Thanks for the post.  I'll give this a shot.  Turns out that I can reproduce the issue often enough that I'll be able to see if this simple change resolves it.

Regards,

Edward

________________________________________
From: [hidden email] [[hidden email]] On Behalf Of Greg Barton [[hidden email]]
Sent: Tuesday, November 03, 2009 9:43 PM
To: Rules Dev List
Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

Well, I'm not sure how to avoid the deadlock without changing the Drools codebase.  I was, however, able to change the type of AbstractWorkingMemory.actionQueue to java.util.concurrent.ConcurrentLinkedQueue and remove the synchronization over the queue with no apparent ill effects. (Two tests failed for drools-core, but they failed whether or not the change was made.)  I also don't like that the current code synchronizes on actionQueue but then exposes it outside the class through the getActionQueue() method, where access can be unsynchronized.  Changing it to ConcurrentLinkedQueue makes it safe to expose externally. (Not to mention that the lock can be stolen externally with the current code.)

Diff attached.  If you can run Drools compiled from trunk, apply the diff and see if it resolves the deadlock.  If it does, it's up to the Drools devs as to whether the change should be made.  I'm just hacking about. :P
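
To illustrate why dropping the monitor on the queue helps, here is a minimal sketch in plain Java. It is not the attached diff and not Drools code: the class ActionQueueSketch, its methods, and the latches are hypothetical stand-ins for the session lock and the action queue named in the traces. One thread runs a queued action that needs the session lock while a second thread holds that lock and drains the queue; because the queue is a ConcurrentLinkedQueue, the second thread never has to wait for a queue monitor and both threads finish.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a working memory; none of these names are Drools API.
class ActionQueueSketch {
    // Plays the role of the session-wide lock ("lock.lock" in the traces).
    private final ReentrantLock sessionLock = new ReentrantLock();
    // Lock-free queue: draining it does not require holding any monitor.
    private final Queue<Runnable> actionQueue = new ConcurrentLinkedQueue<>();

    void queueAction(Runnable action) {
        actionQueue.add(action);
    }

    // Drains the queue without synchronizing on it.
    void executeQueuedActions() {
        Runnable action;
        while ((action = actionQueue.poll()) != null) {
            action.run();
        }
    }

    // Mimics a getGlobal() call that takes the session lock.
    Object getGlobal(String name) {
        sessionLock.lock();
        try {
            return name; // placeholder value, the lookup itself is not modelled
        } finally {
            sessionLock.unlock();
        }
    }

    // Mimics the timer thread: holds the session lock, then drains the queue.
    void insertFromTimerThread(CountDownLatch lockHeld) {
        sessionLock.lock();
        try {
            lockHeld.countDown();
            executeQueuedActions();
        } finally {
            sessionLock.unlock();
        }
    }

    // Forces the interleaving from the traces; returns true if both threads finish.
    static boolean runScenario(long timeoutMillis) {
        final ActionQueueSketch session = new ActionQueueSketch();
        final CountDownLatch inAction = new CountDownLatch(1);
        final CountDownLatch lockHeld = new CountDownLatch(1);

        // The queued action needs the session lock, like the eval() on a global.
        session.queueAction(() -> {
            inAction.countDown();
            awaitQuietly(lockHeld);
            session.getGlobal("policyMgr");
        });

        Thread app = new Thread(session::executeQueuedActions, "app-thread");
        Thread timer = new Thread(() -> {
            awaitQuietly(inAction);
            session.insertFromTimerThread(lockHeld);
        }, "timer-thread");
        app.setDaemon(true);
        timer.setDaemon(true);
        app.start();
        timer.start();
        try {
            app.join(timeoutMillis);
            timer.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !app.isAlive() && !timer.isAlive();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("both threads finished: " + runScenario(2000));
    }
}
```

With a synchronized LinkedList in place of the concurrent queue, the same interleaving would leave each thread holding the lock the other one needs. Whether removing the synchronization is safe for every caller in the real codebase is a separate question that this sketch does not answer.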

Edson Tirelli-4

Re: Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?


   Edward,

   Are you able to provide us with a test case? That would help us ensure we fix this and prevent future regressions.

   Thanks,
      Edson


--
 Edson Tirelli
 JBoss Drools Core Development
 JBoss by Red Hat @ www.jboss.com

Edward Archibald

Re: Deadlock in the Drools core - suggested patch appears to resolve the issue

In reply to this post by Michael Neale
Hello Michael and Greg,

I have pulled the Drools head, applied the patch that Greg suggested (thanks Greg!) and deployed the drools-core jar with my app.  Prior to this change, I was able to reproduce the deadlock roughly 50% of the time, verified in the debugger and in exactly the same place as in my earlier post.  I have now tried the same test scenario 10 times with no failures.

From what I can tell, this problem will easily happen whenever I am already in the 'delayed execution' code created by the rule with the 'duration()' qualifier and, at the same time, I get a new 'fact' and attempt to insert it. Knowing this, I can probably come up with a simpler test case. I'll give it a shot.

Thanks, again, to you both for the quick responses.

Edward
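
As a rough idea of what a simpler test case could assert on, the following sketch reproduces the lock-order inversion described above with plain Java objects and detects it through the JVM's own deadlock detector. It does not use any Drools classes: DeadlockReproSketch, the LinkedList standing in for the action queue, and the ReentrantLock standing in for the session lock are assumptions made for illustration only. ThreadMXBean.findDeadlockedThreads() is the standard JDK call that reports cycles over both monitors and ownable synchronizers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical reproduction of the two-lock cycle; not Drools code.
class DeadlockReproSketch {

    // Returns true if the JVM reports a deadlock within the timeout.
    static boolean reproduceAndDetect(long timeoutMillis) {
        final LinkedList<Runnable> actionQueue = new LinkedList<>();
        final ReentrantLock sessionLock = new ReentrantLock();
        final CountDownLatch queueHeld = new CountDownLatch(1);
        final CountDownLatch lockHeld = new CountDownLatch(1);

        // Application thread: owns the queue monitor, then wants the session lock.
        Thread app = new Thread(() -> {
            synchronized (actionQueue) {
                queueHeld.countDown();
                awaitQuietly(lockHeld);
                sessionLock.lock(); // never acquired in this scenario
                sessionLock.unlock();
            }
        }, "app-thread");

        // Timer thread: owns the session lock, then wants the queue monitor.
        Thread timer = new Thread(() -> {
            sessionLock.lock();
            try {
                lockHeld.countDown();
                awaitQuietly(queueHeld);
                synchronized (actionQueue) { // never entered in this scenario
                    actionQueue.size();
                }
            } finally {
                sessionLock.unlock();
            }
        }, "timer-thread");

        // Daemon threads so the stuck pair does not keep the JVM alive.
        app.setDaemon(true);
        timer.setDaemon(true);
        app.start();
        timer.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findDeadlockedThreads();
            if (ids != null) {
                return true;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduceAndDetect(5000));
    }
}
```

A real regression test would replace the two stand-in threads with a session that fires a 'duration()' rule while the application thread inserts facts, and would then expect the detector to report nothing.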

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Michael Neale
Sent: Wednesday, November 04, 2009 2:26 AM
To: Rules Dev List
Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

Ha - I was just musing with someone the other day about who uses "duration"
anymore ;) I guess it's still useful to people!

I would say that the "duration" codebase is probably fairly "old", in
the sense that it probably pre-dates the availability of
java.util.concurrent (which arrived in Java 5, I think?), so please try out that
patch; if it works, we can probably pull it in (hoping Edson can take
a look).

>> line: 168
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable)
>> line: 650
>> java.util.concurrent.ThreadPoolExecutor$Worker.run() line:
>> 675
>> java.lang.Thread.run() line: 595
>>
>> The rule for this task looks like:
>> rule "DETECT MONITORING HAS STOPPED"
>> duration(10s)
>> salience 1000
>>   when
>>     lastNotification :
>> DataServerNotification($resourceName : resourceName)
>>
>>   from entry-point "MONITORING"
>>
>>     not (DataServerNotification(resourceName ==
>> $resourceName,
>>
>>
>>    this after [10s] lastNotification)
>>
>>   from entry-point "MONITORING")
>>
>>     not (ManagerFailedAlarm(expired == false,
>>
>>    resourceName == $resourceName))
>>
>>     not (DataSource(name == $resourceName,
>>
>>    state == ResourceState.SHUNNED ||
>>
>>    state == ResourceState.FAILED))
>>
>>   then
>>     Object[] params = {$resourceName};
>>     if (policyMgr.getMode() !=
>> ClusterPolicyManagerMode.MAINTENANCE)
>>     {
>>
>> lastNotification.setResourceState(ResourceState.UNKNOWN);
>>       ManagerFailedAlarm alarm =
>>
>>         new
>> ManagerFailedAlarm(lastNotification, "rule detected monitor
>> stop",
>>
>>
>>    6, AlarmSeverity.FAULT);
>>       logger.info(alarm.toString());
>>       insert(alarm);
>>       update(lastNotification);
>>      }
>> end
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> rules-dev mailing list
>> [hidden email]
>> https://lists.jboss.org/mailman/listinfo/rules-dev
>>
>
>
>
> _______________________________________________
> rules-dev mailing list
> [hidden email]
> https://lists.jboss.org/mailman/listinfo/rules-dev
>
>



--
Michael D Neale
home: www.michaelneale.net
blog: michaelneale.blogspot.com

Edward Archibald

Re: Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

In reply to this post by Edson Tirelli-4

Hi Edson,

Our emails just crossed paths.  Yes, I believe that I can reproduce this.  I’ll try to turn this around quickly.

Ed

From: [hidden email] [mailto:[hidden email]] On Behalf Of Edson Tirelli
Sent: Wednesday, November 04, 2009 5:06 PM
To: Rules Dev List
Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?

 


   Edward,

   Are you able to provide us with a test case? That would help us ensure we fix this and prevent future regressions.

   Thanks,
      Edson

2009/11/4 Edward Archibald <[hidden email]>

Hi Greg,

Thanks for the post.  I'll give this a shot.  Turns out that I can reproduce the issue often enough that I'll be able to see if this simple change resolves it.

Regards,

Edward

________________________________________
From: [hidden email] [[hidden email]] On Behalf Of Greg Barton [[hidden email]]
Sent: Tuesday, November 03, 2009 9:43 PM
To: Rules Dev List
Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?


Well, I'm not sure how to avoid the deadlock without changing the drools codebase.  I was, however, able to change the type of AbstractWorkingMemory.actionQueue to java.util.concurrent.ConcurrentLinkedQueue and remove the synchronization over the queue with no apparent ill effects. (Two tests failed for drools-core, but they failed whether the change was made or not.)  Also I don't like the fact that the current code synchronizes on actionQueue, but then exposes it outside the class through the getActionQueue() method, where access can be unsynchronized.  Changing it to ConcurrentLinkedQueue makes it safe to expose externally. (Not to mention that the lock can be stolen externally with the current code.)

diff attached.  If you can run drools compiled from trunk, apply the diff and see if it resolves the deadlock.  If it does it's up to the drools devs as to whether the change should be made.  I'm just hacking about. :P
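
For readers without the diff, here is a minimal sketch of the idea described above. It is not the actual patch. Pending actions sit in a ConcurrentLinkedQueue and are drained with poll(), so no monitor on the queue is held while an action takes the session lock. ActionQueueSketch, sessionLock, queueAction, drainHoldingSessionLock and runDemo are hypothetical stand-ins, not Drools classes or methods; only the names getGlobal and executeQueuedActions are borrowed from the stack traces, and their bodies here are invented.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a session; none of this is Drools code.
final class ActionQueueSketch {
    private final Queue<Runnable> actionQueue = new ConcurrentLinkedQueue<Runnable>();
    private final ReentrantLock sessionLock = new ReentrantLock();
    private final AtomicInteger executed = new AtomicInteger();

    // Stand-in for a lookup that needs the session lock.
    String getGlobal(String name) {
        sessionLock.lock();
        try {
            return "global:" + name;
        } finally {
            sessionLock.unlock();
        }
    }

    // Queue an action that, like the eval() in the trace, needs the session lock.
    void queueAction() {
        actionQueue.add(() -> {
            getGlobal("policyMgr");
            executed.incrementAndGet();
        });
    }

    // poll() is atomic and non-blocking, so there is no synchronized(actionQueue)
    // and no queue monitor is held while an action runs.
    void executeQueuedActions() {
        Runnable action;
        while ((action = actionQueue.poll()) != null) {
            action.run();
        }
    }

    // Stand-in for the timer thread: drains the queue while holding the session lock.
    void drainHoldingSessionLock() {
        sessionLock.lock();
        try {
            executeQueuedActions();
        } finally {
            sessionLock.unlock();
        }
    }

    // Two threads drain the same queue; returns how many actions were executed.
    static int runDemo(int actions) {
        final ActionQueueSketch session = new ActionQueueSketch();
        for (int i = 0; i < actions; i++) {
            session.queueAction();
        }
        Thread timer = new Thread(session::drainHoldingSessionLock);
        timer.start();
        session.executeQueuedActions(); // application thread drains concurrently
        try {
            timer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return session.executed.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(1000));
    }
}
```

runDemo(1000) should return 1000: each action is polled by exactly one of the two threads, and neither thread ever waits for the queue while holding the session lock.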





--
 Edson Tirelli
 JBoss Drools Core Development
 JBoss by Red Hat @ www.jboss.com


Michael Neale

Re: Deadlock in the Drools core - suggested patch appears to resolve the issue

In reply to this post by Edward Archibald
That is great news. If there are other places in older code that we can move to
java.util.concurrent, that would also be welcome;
j.u.concurrent is awesome magic which saves so much pain.
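
As a rough illustration of the lock-order inversion discussed in this thread, the sketch below has two threads take two locks in opposite order. It uses tryLock with a timeout from java.util.concurrent so that the demo reports the conflict instead of hanging. The two ReentrantLocks only stand in for the queue monitor and the session lock (the real queue lock is a synchronized monitor, which cannot be tried with a timeout), and every name here is made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo; not Drools code.
final class LockOrderSketch {

    // Takes 'first', waits until the other thread also holds its first lock,
    // then tries 'second' with a timeout instead of blocking forever.
    static boolean takeBoth(ReentrantLock first, ReentrantLock second,
                            CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean gotSecond = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (gotSecond) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await(); // keep 'first' held until the other attempt is over
            return gotSecond;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Returns {app thread got the session lock, timer thread got the queue lock}.
    static boolean[] runDemo() {
        final ReentrantLock queueLock = new ReentrantLock();
        final ReentrantLock sessionLock = new ReentrantLock();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        final CountDownLatch bothTried = new CountDownLatch(2);
        final boolean[] acquired = new boolean[2];

        // Application thread: queue first, then session lock.
        Thread app = new Thread(() ->
            acquired[0] = takeBoth(queueLock, sessionLock, bothHoldFirst, bothTried));
        // Timer thread: session lock first, then queue.
        Thread timer = new Thread(() ->
            acquired[1] = takeBoth(sessionLock, queueLock, bothHoldFirst, bothTried));
        app.start();
        timer.start();
        try {
            app.join();
            timer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired;
    }

    public static void main(String[] args) {
        boolean[] r = runDemo();
        System.out.println("app got session lock: " + r[0]
            + ", timer got queue lock: " + r[1]);
    }
}
```

Both attempts time out, because each thread holds the lock the other one needs. With plain blocking acquisition in place of tryLock, the same ordering would leave both threads waiting indefinitely.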



On Thu, Nov 5, 2009 at 12:06 PM, Edward Archibald
<[hidden email]> wrote:

> Hello Michael and Greg,
>
> I have pulled the drools head, applied the patch that Greg suggested (thanks Greg!) and deployed the drools-core jar with my app.  Prior to this change, I was able to reproduce the deadlock roughly 50% of the time (verified in the debugger, and in exactly the same place as in my earlier post).  I have now run the same test scenario 10 times with no failures.
>
> From what I can tell, this problem will easily happen every time that I am already in the 'delayed execution' code created by the rule with the 'duration()' qualifier and, at the same time, I get a new 'fact' and attempt to insert it. Knowing this, I can probably come up with a simpler test case. I'll give it a shot.
>
> Thanks, again, to you both for the quick responses.
>
> Edward
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Michael Neale
> Sent: Wednesday, November 04, 2009 2:26 AM
> To: Rules Dev List
> Subject: Re: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
>
> ha - was just musing with someone the other day about who uses "duration"
> anymore ;) I guess it's still useful to people!
>
> I would say that the "duration" codebase is probably fairly "old" - in
> the sense that it probably pre-dates the availability of
> j.u.concurrent (which was Java 5, I think?) - so please try out that
> patch; if it works, we can probably pull it in (hoping Edson can take
> a look).
>



--
Michael D Neale
home: www.michaelneale.net
blog: michaelneale.blogspot.com

Greg Barton

Re: Deadlock in the Drools core - suggested patch appears to resolve the issue

I'll poke around.

--- On Wed, 11/4/09, Michael Neale <[hidden email]> wrote:

> From: Michael Neale <[hidden email]>
> Subject: Re: [rules-dev] Deadlock in the Drools core - suggested patch appears to resolve the issue
> To: "Rules Dev List" <[hidden email]>
> Date: Wednesday, November 4, 2009, 9:01 PM
> That is great news. If there are
> other places we can move to
> java.util.concurrent in older code, that would also be
> welcome,
> j.u.concurrent is awesome magic which saves so much pain.
>
>
>
> On Thu, Nov 5, 2009 at 12:06 PM, Edward Archibald
> <[hidden email]>
> wrote:
> > Hello Michael and Greg,
> >
> > I have pulled the drools head, made the patch that
> Greg suggested (thanks Greg!) and deployed the drools-core
> jar with my app.  Prior to this change, I was able to
> reproduce the deadlock - verified in the debugger and in
> exactly the same place as my earlier post - roughly 50% of
> the time.  I have tried the same test scenario, now, 10
> times with no failures.
> >
> > >From what I can tell, this problem will easily
> happen every time that I am already in the 'delayed
> execution' code created by the rule with the 'duration()'
> qualifier and, at the same time, I get a new 'fact' and
> attempt to insert it. Knowing this, I can probably come up
> with a simpler test case. I'll give it a shot.
> >
> > Thanks, again, to you both for the quick responses.
> >
> > Edward
> >
> > -----Original Message-----
> > From: [hidden email]
> [mailto:[hidden email]]
> On Behalf Of Michael Neale
> > Sent: Wednesday, November 04, 2009 2:26 AM
> > To: Rules Dev List
> > Subject: Re: [rules-dev] Deadlock in the Drools core -
> Drools 5.0 - any suggestions for resolution?
> >
> > ha - was just musing with someone the other day who
> uses "duration"
> > anymore ;) I guess its still useful to people !
> >
> > I would say that the "duration" codebase is probably
> fairly "old" - in
> > the sense that it probably pre-dates the availability
> of
> > j.u.concurrent (which was java 5 I think? ) - so
> please try out that
> > patch, if it works, we can probably pull it in (hoping
> Edson can take
> > a look).
> >
> > On Wed, Nov 4, 2009 at 4:43 PM, Greg Barton <[hidden email]> wrote:
> >> Well, I'm not sure how to avoid the deadlock without changing the
> >> drools codebase.  I was, however, able to change the type of
> >> AbstractWorkingMemory.actionQueue to
> >> java.util.concurrent.ConcurrentLinkedQueue and remove the
> >> synchronization over the queue with no apparent ill effects. (Two
> >> tests failed for drools-core, but they failed whether the change was
> >> made or not.)  Also I don't like the fact that the current code
> >> synchronizes on actionQueue, but then exposes it outside the class
> >> through the getActionQueue() method, where access can be
> >> unsynchronized.  Changing it to ConcurrentLinkedQueue makes it safe
> >> to expose externally. (Not to mention that the lock can be stolen
> >> externally with the current code.)
> >>
> >> diff attached.  If you can run drools compiled from trunk, apply the
> >> diff and see if it resolves the deadlock.  If it does it's up to the
> >> drools devs as to whether the change should be made.  I'm just
> >> hacking about. :P
> >>
> >> --- On Tue, 11/3/09, Edward Archibald <[hidden email]> wrote:
> >>
> >>> From: Edward Archibald <[hidden email]>
> >>> Subject: [rules-dev] Deadlock in the Drools core - Drools 5.0 - any suggestions for resolution?
> >>> To: "[hidden email]" <[hidden email]>
> >>> Date: Tuesday, November 3, 2009, 9:41 PM
> >>>
> >>> I found the following deadlock which is, apparently, due to the
> >>> concurrent execution of a task for a 'delayed' rule with a
> >>> concurrently executing application thread attempting to get access
> >>> to a 'global'.  Any recommendations for avoiding this type of
> >>> deadlock besides not using rules with 'duration()' etc. which cause
> >>> asynchronous execution with respect to my main application thread?
> >>>
> >>> This problem is somewhat difficult to reproduce on demand but it
> >>> does come up frequently when the 'delayed' rule "DETECT MONITORING
> >>> HAS STOPPED" is activated as a result of the trigger conditions.
> >>>
> >>> ===================================================================================
> >>>
> >>> This thread, my application's EnterprisePolicyManager thread, is
> >>> attempting to get access to a global, policyMgr, and is waiting for
> >>> the 'lock.lock' on ReteooStatefulSession
> >>>
> >>> It owns the 'ReteooStatefulSession.actionQueue'
> >>> and is waiting for the ReteooStatefulSession.lock.lock
> >>>
> >>> owns: java.util.LinkedList<E>  (id=207)
> >>> waited by: Thread [pool-3-thread-1] (Suspended)
> >>> owns: com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine  (id=208)
> >>> sun.misc.Unsafe.park(boolean, long) line: not available [native method] [local variables unavailable]
> >>> java.util.concurrent.locks.LockSupport.park() line: 118 [local variables unavailable]
> >>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() line: 681 [local variables unavailable]
> >>> java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, int) line: 711
> >>> java.util.concurrent.locks.ReentrantLock$NonfairSync(java.util.concurrent.locks.AbstractQueuedSynchronizer).acquire(int) line: 1041
> >>> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() line: 184 [local variables unavailable]
> >>> java.util.concurrent.locks.ReentrantLock.lock() line: 256 [local variables unavailable]
> >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).getGlobal(java.lang.String) line: 587
> >>> com.continuent.tungsten.cluster.manager.policy.Rule_IF_IN_MAINTENANCE_MODE__CONSUME_ALL_NOTIFICATIONS_0Eval0Invoker.evaluate(org.drools.spi.Tuple, org.drools.rule.Declaration[], org.drools.WorkingMemory, java.lang.Object) line: not available
> >>> org.drools.rule.EvalCondition.isAllowed(org.drools.spi.Tuple, org.drools.WorkingMemory, java.lang.Object) line: 117
> >>> org.drools.reteoo.EvalConditionNode.assertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 180
> >>> org.drools.reteoo.SingleLeftTupleSinkAdapter.doPropagateAssertLeftTuple(org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, org.drools.reteoo.LeftTuple) line: 117
> >>> org.drools.reteoo.SingleLeftTupleSinkAdapter.propagateAssertLeftTuple(org.drools.reteoo.LeftTuple, org.drools.reteoo.RightTuple, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory, boolean) line: 28
> >>> org.drools.reteoo.JoinNode.assertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 175
> >>> org.drools.reteoo.SingleObjectSinkAdapter.propagateAssertObject(org.drools.common.InternalFactHandle, org.drools.spi.PropagationContext, org.drools.common.InternalWorkingMemory) line: 42
> >>> org.drools.reteoo.PropagationQueuingNode$AssertAction.execute(org.drools.reteoo.ObjectSinkPropagator, org.drools.common.InternalWorkingMemory) line: 326
> >>> org.drools.reteoo.PropagationQueuingNode.propagateActions(org.drools.common.InternalWorkingMemory) line: 221
> >>> org.drools.reteoo.PropagationQueuingNode$PropagateAction.execute(org.drools.common.InternalWorkingMemory) line: 394
> >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1486
> >>> org.drools.common.NamedEntryPoint.insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation) line: 158
> >>> org.drools.common.NamedEntryPoint.insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 122
> >>> org.drools.common.NamedEntryPoint.insert(java.lang.Object) line: 80
> >>> com.continuent.tungsten.cluster.manager.rules.engine.RulesEngine.insertFact(com.continuent.tungsten.commons.cluster.resource.notification.NotificationStreamID, java.lang.Object, boolean) line: 162
> >>> com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager.run() line: 249
> >>> java.lang.Thread.run() line: 595
> >>>
> >>> The rule implicated in the above thread is:
> >>>
> >>> rule "IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS"
> >>> salience 999
> >>>   when
> >>>     notification : ClusterResourceNotification() from entry-point "MONITORING"
> >>>     eval(policyMgr.getMode() == ClusterPolicyManagerMode.MAINTENANCE)
> >>>   then
> >>>     statistics.increment("IF IN MAINTENANCE MODE, CONSUME ALL NOTIFICATIONS");
> >>>     retract(notification);
> >>> end
> >>>
> >>>
> >>>
> >>> This other thread, apparently a scheduled thread for a rule with a
> >>> 10 second duration, is attempting to insert a fact and owns the
> >>> 'lock.lock' on ReteooStatefulSession and is waiting for the
> >>> 'ReteooStatefulSession.actionQueue'.
> >>>
> >>> owns: org.drools.common.DefaultAgenda  (id=4046)
> >>> waiting for: java.util.LinkedList<E>  (id=207)
> >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).executeQueuedActions() line: 1480
> >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(org.drools.common.InternalFactHandle, java.lang.Object, org.drools.rule.Rule, org.drools.spi.Activation, org.drools.reteoo.ObjectTypeConf) line: 1051
> >>> org.drools.reteoo.ReteooStatefulSession(org.drools.common.AbstractWorkingMemory).insert(java.lang.Object, boolean, boolean, org.drools.rule.Rule, org.drools.spi.Activation) line: 1001
> >>> org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object, boolean) line: 114
> >>> org.drools.base.DefaultKnowledgeHelper.insert(java.lang.Object) line: 108
> >>> com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0.consequence(org.drools.spi.KnowledgeHelper, com.continuent.tungsten.commons.cluster.resource.notification.DataServerNotification, org.drools.FactHandle, java.lang.String, org.drools.FactHandle, com.continuent.tungsten.cluster.manager.policy.EnterprisePolicyManager, org.apache.log4j.Logger) line: not available
> >>> com.continuent.tungsten.cluster.manager.policy.Rule_DETECT_MONITORING_HAS_STOPPED_0ConsequenceInvoker.evaluate(org.drools.spi.KnowledgeHelper, org.drools.WorkingMemory) line: not available
> >>> org.drools.common.DefaultAgenda.fireActivation(org.drools.spi.Activation) line: 934
> >>> org.drools.common.Scheduler$DuractionJob.execute(org.drools.time.JobContext) line: 70
> >>> org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 132
> >>> org.drools.time.impl.JDKTimerService$JDKCallableJob.call() line: 110
> >>> java.util.concurrent.FutureTask$Sync.innerRun() line: 269
> >>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(java.util.concurrent.FutureTask<V>).run() line: 123
> >>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 65
> >>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 168
> >>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) line: 650
> >>> java.util.concurrent.ThreadPoolExecutor$Worker.run() line: 675
> >>> java.lang.Thread.run() line: 595
> >>>
> >>> The rule for this task looks like:
> >>> rule "DETECT MONITORING HAS STOPPED"
> >>> duration(10s)
> >>> salience 1000
> >>>   when
> >>>     lastNotification : DataServerNotification($resourceName : resourceName)
> >>>       from entry-point "MONITORING"
> >>>     not (DataServerNotification(resourceName == $resourceName,
> >>>           this after [10s] lastNotification)
> >>>       from entry-point "MONITORING")
> >>>     not (ManagerFailedAlarm(expired == false,
> >>>           resourceName == $resourceName))
> >>>     not (DataSource(name == $resourceName,
> >>>           state == ResourceState.SHUNNED ||
> >>>           state == ResourceState.FAILED))
> >>>   then
> >>>     Object[] params = {$resourceName};
> >>>     if (policyMgr.getMode() != ClusterPolicyManagerMode.MAINTENANCE)
> >>>     {
> >>>       lastNotification.setResourceState(ResourceState.UNKNOWN);
> >>>       ManagerFailedAlarm alarm =
> >>>         new ManagerFailedAlarm(lastNotification, "rule detected monitor stop",
> >>>                                6, AlarmSeverity.FAULT);
> >>>       logger.info(alarm.toString());
> >>>       insert(alarm);
> >>>       update(lastNotification);
> >>>     }
> >>> end
> >>>
> _______________________________________________
> >>> rules-dev mailing list
> >>> [hidden email]
> >>> https://lists.jboss.org/mailman/listinfo/rules-dev
> >>>
> >>
> >
> >
> >
> > --
> > Michael D Neale
> > home: www.michaelneale.net
> > blog: michaelneale.blogspot.com
> >
>
>
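
The change Greg describes in the quoted message can be pictured with a
small, self-contained sketch. Everything in it is hypothetical:
SessionSketch, queueAction() and insertUnderLock() are illustration-only
names, not Drools classes or methods, and the real patch is the attached
diff, which is not reproduced here. The sketch only assumes what the thread
states: one code path drains the action queue and then needs the session
lock (the getGlobal() call made from the eval), while the timer thread holds
the session lock and then drains the queue. With a ConcurrentLinkedQueue
there is no monitor on the queue for a thread to hold, so one of the two
locks in the inversion goes away.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a working memory; NOT the Drools class. */
class SessionSketch {
    // Plays the role of the session-wide 'lock' seen in the stack traces.
    private final ReentrantLock lock = new ReentrantLock();
    // Lock-free queue: nothing synchronizes on it, so it cannot be "owned".
    private final Queue<Runnable> actionQueue = new ConcurrentLinkedQueue<Runnable>();
    // Counts executed actions so a caller can check that none were lost.
    final AtomicInteger executed = new AtomicInteger();

    void queueAction(Runnable action) {
        actionQueue.add(action);
    }

    /** Drains the queue without a synchronized block around it. */
    void executeQueuedActions() {
        Runnable action;
        while ((action = actionQueue.poll()) != null) {
            action.run();
            executed.incrementAndGet();
        }
    }

    /** Mirrors the shape of getGlobal(): it needs the session lock. */
    Object getGlobal(String name) {
        lock.lock();
        try {
            return name; // placeholder value; a real session would look the global up
        } finally {
            lock.unlock();
        }
    }

    /** Mirrors the timer thread: takes the session lock, then drains the queue. */
    void insertUnderLock() {
        lock.lock();
        try {
            executeQueuedActions();
        } finally {
            lock.unlock();
        }
    }
}
```

Running one thread through executeQueuedActions() and another through
insertUnderLock(), with queued actions that call getGlobal(), should let
both threads finish, since only one lock remains to wait on. The trade-off,
as far as this sketch goes, is that two threads may drain the queue at the
same time, so actions are no longer guaranteed to run in strict queue order.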


     

_______________________________________________
rules-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/rules-dev