# Size- and Energy-aware Task Assignment in Server Farms

Esa Hyytiä http://www.netlab.tkk.fi/u/esa/ Docent, Senior Research Scientist, Department of Communications and Networking, Aalto University, Finland

Summary:

The tutorial focuses on task assignment (dispatching) problems arising e.g. in computer systems (web-server farms, super computing, data centres, routing). In the tutorial, we approach the task assignment problem in the framework of Markov decision processes and, in particular, we apply the (first) policy iteration step to derive efficient dispatching rules that take into account the current state and scheduling disciplines in the servers, as well as, the anticipated future arrivals. The tutorial is in large part based on the recent results presented in [1 - 11].

In task assignment problems, one assigns a server (a queue) for each job upon arrival. The assignment policy aims at minimizing the mean response time, a fairness criteria such as slowdown, energy related costs, or some other objective based on job-specific holding cost rates. The optimal assignment depends on the available information, the server system and the local scheduling disciplines in the servers. Only a few optimality results are however available. For minimizing the mean response time, round-robin is known to be the optimal when one only knows that all servers were in the same state initially [12 - 14]. On the other hand, Winston [15] has shown the optimality of the Join-the-Shortest-Queue (JSQ) in the setting of Poisson arrivals and exponentially distributed service times and identical servers. Similarly, the Size-Interval-Task-Assignment (SITA) policy, proposed by Harchol-Balter et al., can be shown to be the optimal static policy when the service time of the new job is known. However, in general one has to resort heuristic dispatching policies, e.g., if the job sizes are known (cf. file sizes) or energy consumption and setup delays are included.

In this tutorial, we show how near-optimal dispatching policies can be developed within the framework of Markov decision processes. To this end, one needs the so-called value function for the system (when its operating under some basic policy) w.r.t. the given objective (e.g., response time, energy). We derive the size-aware value functions for M/G/1 operating under FCFS, LCFS, SPT, SRPT, SPTP and PS, including also the setup delay if the server is switched off when idle. By carrying out the policy improvement step (or the so-called Lookahead), we obtain improved, and often near-optimal, operating policies that take into account the particular cost structure and the anticipated future arrivals. We also consider the joint-optimization of switching off servers and task assignment in a server farm.

Outline:

1. Task assignment problem

• Description of the problem setting (parallel servers, local scheduling disciplines)

• Different objectives (response time, slowdown, general holding costs, energy)

• Review of optimality results

2. Markov decision processes (MDP)}

• Introduction to MDPs, value functions and the first policy iteration (FPI)

• Past work utilizing FPI (no size information, e.g. [16 - 17])

3. Value functions for single-server M/G/1 queues with setup delay}

• Derivation of the value function for FCFS, LCFS, SPT, SRPT,SPTP and PS w.r.t. response time

• Optimality of scheduling disciplines: SPTP w.r.t. slowdown, SRPT w.r.t. response time

• Derivation of the value functions for energy (switching and processing costs), with setup delay

4. FPI-based dispatching policies

• Derivation of efficient dispatching policies:

• FPI step when applied to SITA is shown to yield ``SITA with dynamic thresholds''

• Applicability of the approach to heterogeneous systems:

5. Lookahead policies

• Introduction to the Lookahead approach (a top heuristic to this date, which also utilizes value functions)

• Derivation of the Lookahead policy for FCFS and LCFS servers

• Lookahead policy in terms of energy costs

6. Examples

• Near-optimal task assignment to parallel servers w.r.t. response time and slowdown

• Comparison of FPI and Lookahead to the optimal policy in case of fixed size jobs and heterogeneous servers

• Joint-optimization of switching off policy and routing of jobs in server farms (energy-aware)

References:

1. E. Hyytiä, ``Optimal routing of fixed size jobs to two parallel servers,'' INFOR: Information Systems and Operational Research}, 2014, to appear.

2. E. Hyytiä, R. Righter, and S. Aalto, ``Task assignment in a heterogeneous server farm with switching delays and general energy-aware cost structure,'' Performance Evaluation, 2014, to appear.

3. E. Hyytiä, and S. Aalto, ``Round-robin routing policy: Value functions and mean performance with job- and server-specific costs,'' in 7th Int. Conf. on Performance Evaluation Methodologies and Tools (ValueTools), Torino, Italy, Dec. 2013.

4. E. Hyytiä, ``Lookahead actions in dispatching to parallel queues,'' Performance Evaluation, vol. 70, no. 10, pp. 859--872, 2013

5. E. Hyytiä, and S.Aalto, ``To split or not to split: Selecting the right server with batch arrivals,'' Operations Research Letters, vol.41, no.4, pp. 325--330, Jul. 2013

6. E. Hyytiä, S. Aalto, and A. Penttinen, ``Minimizing slowdown in heterogeneous size-aware dispatching systems,'' ACM SIGMETRICS Performance Evaluation Review, vol. 40, pp. 29--40, Jun. 2012

7. E. Hyytiä, S. Aalto, A. Penttinen, and J. Virtamo, ``On the value function of the M/G/1 FCFS and LCFS queues,'' J. of Applied Probability, vol. 49, no. 4, pp. 1052--1071, Dec. 2012.

8. E. Hyytiä, A. Penttinen, and S. Aalto, ``Size- and state-aware dispatching problem with queue-specific job sizes,'' European J. of Operational Research, vol. 217, no. 2, pp. 357--370, Mar. 2012.

9. A. Penttinen, E. Hyytiä, and S. Aalto, ``Energy-aware dispatching in parallel queues with on-off energy consumption,'' in 30th IEEE Int. Performance Computing and Communications Conf. (IPCCC), Orlando, FL, USA, Nov. 2011.

10. E. Hyytiä, J. Virtamo, S. Aalto, and A. Penttinen, `` M/M/1-PS queue and size-aware task assignment,'' Performance Evaluation, vol. 68, no. 11, pp. 1136--1148, Nov. 2011

11. E. Hyytiä, A. Penttinen, S. Aalto, and J. Virtamo, ``Dispatching problem with fixed size jobs and processor sharing discipline,'' in 23rd Int. Teletraffic Congress (ITC'23), San Fransisco, USA, Sep. 2011, pp. 190--197.

12. A. Ephremides, P. Varaiya, and J. Walrand, ``A simple dynamic routing problem,'' IEEE Trans. on Automatic Control, vol. 25, no. 4, pp. 690--693, Aug. 1980.

13. Z. Liu and D. Towsley, ``Optimality of the round-robin routing policy,'' J. of Applied Probability, vol. 31, no. 2, pp. 466--475, Jun. 1994.

14. Z. Liu and R. Righter, ``Optimal load balancing on distributed homogeneous unreliable processors,'' Operations Research, vol. 46, no. 4, pp. 563--573, 1998.

15. W. Winston, ``Optimality of the shortest line discipline,'' J. of Applied Probability, vol. 14, pp. 181--189, 1977.

16. K. R. Krishnan, ``Joining the right queue: a state-dependent decision rule,'' IEEE Trans. on Automatic Control, vol. 35, no. 1, pp. 104--108, Jan. 1990.

17. K. R. Krishnan, ``Markov decision algorithms for dynamic routing,'' IEEE Communications Magazine, pp. 66--69, Oct. 1990.