Scalable Work-Stealing Load-Balancer for HPC Distributed Memory Systems