GEAR: according to Volker's comment:
"""The memory consumption of the domain decomposition in the public gadget2
code scales badly with processor number, and I suspect this has a lot to
do with your trouble for large CPU number. You can try the following
change: In the routine "domain_topsplit_local()", you find a line
if(TopNodes[sub].Count > All.TotNumPart / (TOPNODEFACTOR * NTask * NTask))
Change the denominator to something like
(TOPNODEFACTOR * 8 * NTask)"""
We added the factor 8* to the denominator in domain_topsplit_local(). The 8 was missing!
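
To see what the change does to the split threshold, here is a minimal sketch in C that compares the two denominators. The function names and the value of TOPNODEFACTOR_SKETCH are assumptions made for illustration; they are not taken from the GEAR or Gadget-2 sources, where the real TOPNODEFACTOR is defined in the headers.

```c
#include <assert.h>

/* Hypothetical stand-in for TOPNODEFACTOR; check the real value in the headers. */
#define TOPNODEFACTOR_SKETCH 20.0

/* Threshold from the original line: denominator grows with NTask squared. */
double split_threshold_original(long long tot_num_part, int ntask)
{
  assert(ntask > 0);
  return tot_num_part / (TOPNODEFACTOR_SKETCH * ntask * ntask);
}

/* Threshold after the suggested change: denominator grows linearly with NTask. */
double split_threshold_modified(long long tot_num_part, int ntask)
{
  assert(ntask > 0);
  return tot_num_part / (TOPNODEFACTOR_SKETCH * 8 * ntask);
}
```

A top node is split when its particle count exceeds the threshold, so a larger threshold should mean fewer splits. The two expressions agree at NTask = 8. Above that, the modified threshold is larger by a factor of NTask / 8, which is where the reduced top-node count on many processors would come from. Below 8 tasks the modified threshold is the smaller of the two.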