Resource pool (QB 5.0.29) #2526

Jonathan · 1 decade ago

I like the node-based resource system in QuickBuild; it works well and covers 90% of what we want to do. However, we have a good use case for "resource pools". For example, three agents provide a single instance of ResourceA: a step that requires ResourceA can run on Agent1, 2 or 3, but only one instance of the step can run at a time. I don't want to bore you with the gory details of why we need this, but I would like some assistance with my implementation. Of course, if you would like to implement resource pools in an upcoming release, that would be much appreciated :)

In this case I want to use the "resource pool" idea to link three build agents, i.e. before a build agent can execute a step it first has to acquire a "mutex". Once Agent1 has the mutex, Agents 2 and 3 must wait before they can execute a step. Each build agent specifies the name of the mutex it requires via its user attributes. I have created a variable at the root of our configuration tree with the value 0 to represent the mutex. If the node chosen to run a step requires the mutex, it first increments the variable in its node selection script. If the variable did not have a value of 0, the script undoes the increment and returns false. In a post-execute script the value is decremented, i.e. the resource is freed.
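The counter protocol described above can be modelled in a few lines of Java, which reads much like the Groovy used in this thread. The counter starts at 0, is incremented on acquire and undone on failure, and is decremented on release. This is a sketch under assumptions: an `AtomicInteger` stands in for the root-level QuickBuild variable, and `CounterMutex` is a hypothetical name, not part of any QuickBuild API. The thread does not say whether `increase(true)` returns the value before or after the increment. The sketch checks the value before it (`getAndIncrement`), so the first caller sees 0 and keeps the lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the thread's mutex variable (NOT the QuickBuild API):
// an AtomicInteger stands in for the root-level variable that starts at 0.
class CounterMutex {
    private final AtomicInteger holders = new AtomicInteger(0);

    // Mirrors the node selection script: increment first, then back off
    // if somebody else already held the mutex (previous value was not 0).
    boolean tryAcquire() {
        int previous = holders.getAndIncrement();
        if (previous > 0) {
            holders.decrementAndGet(); // remove our reference again
            return false;
        }
        return true;
    }

    // Mirrors the post-execute script: free the resource.
    void release() {
        holders.decrementAndGet();
    }

    int value() {
        return holders.get();
    }
}
```

A caller that loses the race briefly pushes the counter above 1 before backing off. Another caller can therefore be refused while the lock is being freed, but two callers can never hold it at once.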

Node selection script:


groovy:
// First remove any nodes that don't have required resources.
String s = vars.getValue("resourceList") ?: "";
def list = s.tokenize(',');
for (resource in list) {
  if (!node.hasResource(resource.trim())) {
    return false;
  }
}
// Next: does this node require the mutex?
String mutex = node.getAttribute("qb.dbf.mutex");
if (mutex == null) {
  return true;
}
// If so increment the number of nodes locking the mutex.
def lock = vars.get(mutex).increase(true);
// If the mutex was already locked remove our reference and return false.
if (lock > 0) {
  vars.get(mutex).decrease(true);
  return false;
}
return true;

The above script seems to work: one node is able to acquire the mutex and everyone else has to wait.
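The first half of the node selection script filters nodes by a comma-separated `resourceList`. Here is a minimal Java sketch of that filter. A `Set<String>` of the node's resources stands in for `node.hasResource(...)`. `ResourceFilter` and `hasAllResources` are hypothetical names, and treating a missing list as "no requirements" is an assumption.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical stand-in for the resource check in the node selection script:
// a node is eligible only if it provides every resource named in the list.
final class ResourceFilter {
    private ResourceFilter() {}

    static boolean hasAllResources(String resourceList, Set<String> nodeResources) {
        if (resourceList == null) {
            return true; // assumption: no list configured means no requirements
        }
        return Arrays.stream(resourceList.split(","))
                .map(String::trim)
                .filter(r -> !r.isEmpty()) // Groovy's tokenize() also skips empty tokens
                .allMatch(nodeResources::contains);
    }
}
```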

Post-execute script:


groovy:
// Does the node require a mutex? If so, it must have locked the mutex to get here.
String mutex = node.getAttribute("qb.dbf.mutex");
if (mutex != null && vars.get(mutex).getIntValue() > 0) {
  // release our lock.
  vars.get(mutex).decrease(true);
}
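The post-execute script reads the variable with `getIntValue()` and then calls `decrease(true)`, which are two separate operations. A Java sketch of folding the `> 0` check and the decrement into one atomic update follows, assuming an `AtomicInteger` stands in for the variable. `GuardedRelease` and `releaseIfHeld` are hypothetical names. This only closes the gap between the read and the decrement. It would not help with the problem reported below, where the script sees 0 because the increment never became visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not QuickBuild code): check "> 0" and decrement
// in a single atomic step instead of a separate read followed by a write.
final class GuardedRelease {
    private GuardedRelease() {}

    // Returns true if a lock was actually released.
    static boolean releaseIfHeld(AtomicInteger mutex) {
        int previous = mutex.getAndUpdate(v -> v > 0 ? v - 1 : v);
        return previous > 0;
    }
}
```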

My configuration setup is as follows:
- Parallel step:
  - Sequential step:
    - Dummy step: acquires mutex (runs node selection script).
    - Ant step: runs build.
    - Dummy step: releases mutex (runs post-execute script).
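The sequential step above has an acquire, work, release shape. Below is a minimal Java sketch of that shape, with a few assumptions. An `AtomicInteger` stands in for the root-level variable. `compareAndSet(0, 1)` is a swapped-in alternative to the thread's increment-then-back-off check. `GuardedStep` and `runExclusively` are hypothetical names. The thread does not say whether the release step runs when the Ant step fails, so the sketch simply releases in a `finally` block.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the sequential step: acquire, run the build, release.
final class GuardedStep {
    private GuardedStep() {}

    // Returns false without running the build if the mutex is already held.
    static boolean runExclusively(AtomicInteger mutex, Runnable build) {
        if (!mutex.compareAndSet(0, 1)) {
            return false; // another agent holds the mutex; try again later
        }
        try {
            build.run();  // plays the role of the Ant step
            return true;
        } finally {
            mutex.set(0); // plays the role of the release step, even on failure
        }
    }
}
```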

In the post-execute script the value of the variable is always zero. Note that the node selection script and the post-execute script run in different steps. However, they are contained in the same sequential parent and are guaranteed to run on the same node.

I have added logging to these steps and can see the "resource" being acquired while all other parallel steps wait. I can see the post-execute code run on the correct node. However, the value of the variable is zero, so decrease() is never called and all other parallel steps keep waiting until the build times out.

Any help or thoughts on how to achieve the same result in a different way are appreciated.


robinshen ADMIN · 1 decade ago

This seems quite complex and error-prone to me. How about defining resourceA on a controlling agent (this can be a separate agent or any one of agent1, 2 and 3)? Then put the working step that requires resourceA inside a container, and configure the container to require resourceA instead. Node selection of the working step can then pick any one of agent1, 2 and 3. This way, once an instance of the working step is running, the controlling agent prevents any other instance from running.
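The suggested layout can be modelled in Java to show why at most one working step runs at a time. This is a sketch under assumptions, not QuickBuild code. Because resourceA exists only once, on the controlling agent, a single-permit `Semaphore` stands in for it. The round-robin choice of worker agent is hypothetical (in QuickBuild, node selection decides). `ControllingAgentModel` and its methods are illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: resourceA lives once on the controlling agent,
// so a single-permit Semaphore stands in for it.
final class ControllingAgentModel {
    private final Semaphore resourceA = new Semaphore(1);
    private final List<String> workers = Arrays.asList("agent1", "agent2", "agent3");
    private final Set<String> agentsUsed = ConcurrentHashMap.newKeySet();
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger maxRunning = new AtomicInteger();

    // One container instance: hold resourceA while the working step runs.
    void runContainer(int index) throws InterruptedException {
        resourceA.acquire();
        try {
            // The working step itself may land on any worker agent.
            agentsUsed.add(workers.get(index % workers.size()));
            maxRunning.accumulateAndGet(running.incrementAndGet(), Math::max);
            Thread.sleep(5); // pretend to build
            running.decrementAndGet();
        } finally {
            resourceA.release();
        }
    }

    // Start several container instances in parallel; returns peak concurrency.
    int runAll(int instances) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(instances);
        for (int i = 0; i < instances; i++) {
            final int index = i;
            pool.submit(() -> {
                runContainer(index);
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return maxRunning.get();
    }

    Set<String> agentsUsed() {
        return agentsUsed;
    }
}
```

The worker agents stay interchangeable, and the exclusion comes from the single resource on the controlling agent rather than from shared variables.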

Jonathan · 1 decade ago

I have created feature requests for resource pools and for virtual agents that can control Xen, ESXi, etc. snapshots:
http://track.pmease.com/browse/QB-1801
http://track.pmease.com/browse/QB-1802

These two features encapsulate what we are trying to achieve.

After doing some testing I have noticed something odd. Variables that are set or modified during a node selection script are not accessible in later steps or even in different parts of the same step. To re-create this:

1- Create a configuration with two Dummy steps (StepA and StepB), ensure that StepB runs after StepA.
2- Create a variable "_test" with the value zero.
3- In the node selection script for StepA use the following script.


groovy:
logger.info(node.getAddress() + ":1:" + vars.getValue("_test"));
vars.get("_test").increase(true);
logger.info(node.getAddress() + ":2:" + vars.getValue("_test"));
return true;

4- In the pre-execute script for StepA use the following script.


groovy:
logger.info(node.getAddress() + ":3:" + vars.getValue("_test"));

5- In the pre-execute script for StepB use the following script.


groovy:
logger.info(node.getAddress() + ":4:" + vars.getValue("_test"));
vars.get("_test").decrease(true);
logger.info(node.getAddress() + ":5:" + vars.getValue("_test"));

With two build agents (and cutting out all the extraneous entries) you will get the following output:


21:40:30,445 INFO  - Agent1:8811:1:0
21:40:30,453 INFO  - Agent1:8811:2:1
21:40:30,454 INFO  - Agent2:8811:1:1
21:40:30,462 INFO  - Agent2:8811:2:2
21:40:10,056 INFO  - Agent2:8811:3:0
21:40:13,734 INFO  - Agent2:8811:4:0
21:40:14,599 INFO  - Agent2:8811:5:-1

As you can see, the value of the variable _test is visible inside the node selection script, and it is incremented for each build agent as I would expect. However, the new value is not available outside of this step, or even in other parts of the same step. Now change the setup and move StepA's script from the node selection script to the execute condition script:


22:06:31,268 INFO  - Agent2:8811:1:0
22:06:31,276 INFO  - Agent2:8811:2:1
22:06:28,099 INFO  - Agent2:8811:3:1
22:06:31,721 INFO  - Agent2:8811:4:1
22:06:32,593 INFO  - Agent2:8811:5:0

Note the values of log entries 3, 4 and 5. The value of the variable is now available outside of the step.

Is this a bug? If not, it is extremely unfortunate and I would suggest that it be changed. Can you suggest a workaround, or let me know when this is likely to be fixed?

robinshen ADMIN · 1 decade ago

This appears to be a bug. Please file a bug report at track.pmease.com, and we will try to get it addressed in 5.1 rc1 in a few days.

Jonathan · 1 decade ago

I have created a track issue for this bug:

http://track.pmease.com/browse/QB-1806