[argobots-discuss] constraining overall stack memory allocation

16 Jun 2022

Hi all,

I was rummaging around in the code looking for ideas just now and 
figured I might save myself some time by asking on the list to see if 
anyone else has encountered this.

A quick review of the use case: we are using large stack sizes (2 MiB
right now; we could probably go lower, but it would still be much
larger than the ABT default).  We also create, execute, and complete a
large number of detached ULTs.  Only a very few are intentionally
long-lived.

Our current strategy is that a central producer (who drives network 
progress) creates ULTs that may be placed on other pools/ESs depending 
on configuration.

I had *thought* that the ULT stacks were not allocated until the ULT was 
selected for execution by a scheduler, but I see now that's not the 
case.  The stack is allocated up front at ABT_thread_create() time.  I'm 
kicking myself for not understanding that sooner.  It didn't matter so 
much when we used to use small stack sizes.

At any rate, this strategy has a few implications.  First, if the
ES schedulers don't retire old ULTs fast enough (even if they are very
"close" to completion), memory consumption can balloon even though our
actual concurrency doesn't look all that high, simply because we are
greedily allocating stacks for new ULTs without regard to ULT
completion.  Second, the one producer always pays the allocation cost,
and the memory is always local to that one core.
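To put rough numbers on the first point: stack memory tracks the number
of ULTs that have been created but not yet completed, not the number
actually running.  The counts below are hypothetical, chosen only to
illustrate the scaling with 2 MiB stacks:

```latex
M_{\mathrm{stack}} \approx (N_{\mathrm{running}} + N_{\mathrm{queued}}) \cdot S
% hypothetical: N_running = 8, N_queued = 1000, S = 2 MiB
(8 + 1000) \cdot 2\,\mathrm{MiB} = 2016\,\mathrm{MiB} \approx 1.97\,\mathrm{GiB}
```

With only 8 ULTs actually running, 8 x 2 MiB = 16 MiB would be enough;
the rest is stack memory for ULTs that are just sitting in pools.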

Ideally, ABT_thread_create() would somehow defer stack allocation, so
that a thread doesn't consume so much memory until a) it can really be
executed and b) the scheduler thinks it is a good idea to do so.  Even
better if the allocation were in the context of the ES that popped the
thread, rather than the ES that spawned the thread.
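A minimal sketch of the idea, written in plain C rather than against the
Argobots API: the hypothetical lazy_ult descriptor records only the
function and argument when the producer creates it, and the stack is
allocated by whichever worker dispatches it.  This only models the
memory accounting (fn is called directly instead of running on the new
stack); it is not how Argobots is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor: at "create" time only fn/arg are recorded. */
typedef struct lazy_ult {
    void (*fn)(void *);
    void *arg;
    void *stack;       /* stays NULL until a worker dispatches it */
    size_t stack_size;
} lazy_ult;

/* Stack bytes currently allocated, and the high-water mark. */
static size_t stack_bytes_in_use = 0;
static size_t stack_bytes_peak = 0;

/* Producer side: cheap, no stack memory is allocated here. */
static void lazy_ult_init(lazy_ult *u, void (*fn)(void *), void *arg,
                          size_t stack_size)
{
    u->fn = fn;
    u->arg = arg;
    u->stack = NULL;
    u->stack_size = stack_size;
}

/* Consumer side: runs on whichever worker popped the descriptor, so the
 * allocation cost is paid there rather than by the producer. */
static int lazy_ult_dispatch(lazy_ult *u)
{
    assert(u->stack == NULL);
    u->stack = malloc(u->stack_size);
    if (u->stack == NULL)
        return -1;
    stack_bytes_in_use += u->stack_size;
    if (stack_bytes_in_use > stack_bytes_peak)
        stack_bytes_peak = stack_bytes_in_use;

    u->fn(u->arg);     /* stand-in for running fn on the new stack */

    free(u->stack);    /* released as soon as the work completes */
    u->stack = NULL;
    stack_bytes_in_use -= u->stack_size;
    return 0;
}

/* Example work function: counts how many times it has been run. */
static void example_work(void *arg)
{
    ++*(int *)arg;
}
```

In this model, creating 1000 descriptors allocates no stack memory at
all, and dispatching them one after another never holds more than one
2 MiB stack at a time.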

Is this possible?

It would be neat if this could be done inside Argobots somehow, so that
it would be general rather than specific to my use case, but walking
through the code I have the sinking feeling that we need to do this
above Argobots (explicitly queueing up work and letting the "worker"
execution streams create their own ULTs to perform that work as needed,
rather than letting the ULT pools within Argobots serve double duty as
our work queue).
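A rough sketch of that workaround, again modeled in plain C with
hypothetical names (work_queue, worker_poll, worker_retire, QUEUE_CAP)
rather than real Argobots calls: the producer pushes cheap descriptors,
and each worker turns them into stack-bearing ULTs only while it is
under a per-worker in-flight budget.  In a real program, the marked spot
in worker_poll() is roughly where the worker would call
ABT_thread_create() on a pool it owns, and worker_retire() would run
when that ULT completes.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 1024   /* hypothetical queue capacity */

typedef struct work_item {
    void (*fn)(void *);
    void *arg;
} work_item;

/* FIFO ring buffer of pending work; no stacks are attached to entries. */
typedef struct work_queue {
    work_item items[QUEUE_CAP];
    size_t head, tail, count;
} work_queue;

static int wq_push(work_queue *q, work_item w)
{
    if (q->count == QUEUE_CAP)
        return -1;
    q->items[q->tail] = w;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    return 0;
}

static int wq_pop(work_queue *q, work_item *out)
{
    if (q->count == 0)
        return -1;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}

/* Per-worker accounting of in-flight ULTs and their stack bytes. */
typedef struct worker {
    size_t inflight, max_inflight;
    size_t stack_size, stack_bytes, stack_bytes_peak;
} worker;

/* Pop work only while under budget; returns how many items were spawned. */
static size_t worker_poll(worker *w, work_queue *q)
{
    size_t spawned = 0;
    work_item item;
    while (w->inflight < w->max_inflight && wq_pop(q, &item) == 0) {
        /* Real code would create a ULT here; we only charge its stack. */
        w->inflight++;
        w->stack_bytes += w->stack_size;
        if (w->stack_bytes > w->stack_bytes_peak)
            w->stack_bytes_peak = w->stack_bytes;
        item.fn(item.arg);   /* stand-in for the ULT body */
        spawned++;
    }
    return spawned;
}

/* Called when a spawned ULT completes: returns its stack budget. */
static void worker_retire(worker *w)
{
    assert(w->inflight > 0);
    w->inflight--;
    w->stack_bytes -= w->stack_size;
}

/* Example work function: counts how many items have been run. */
static void count_call(void *arg)
{
    ++*(int *)arg;
}
```

With a budget of 8 and 2 MiB stacks, peak stack memory per worker stays
at 16 MiB no matter how long the queue gets, and the stack cost is
charged to the worker rather than to the producer.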

I'm comfortable with custom pools and schedulers, but it looks like the
key step (stack allocation) is already out of our hands at ULT creation
time, so there isn't much a custom pool or scheduler could do.

Thanks for hearing me out, and thanks in advance for feedback (even if 
it takes the form of "that's a silly idea" :) ).

thanks,

-Phil

Phil Carns