curpriority_cmp, maybe_resched, resetpriority, roundrobin,
roundrobin_interval, sched_setup, schedclock, schedcpu, setrunnable,
updatepri - perform round-robin scheduling of runnable processes
#include <sys/param.h>
#include <sys/proc.h>
int
curpriority_cmp(struct proc *p);
void
maybe_resched(struct thread *td);
void
propagate_priority(struct proc *p);
void
resetpriority(struct ksegrp *kg);
void
roundrobin(void *arg);
int
roundrobin_interval(void);
void
sched_setup(void *dummy);
void
schedclock(struct thread *td);
void
schedcpu(void *arg);
void
setrunnable(struct thread *td);
void
updatepri(struct thread *td);
Each process has three different priorities stored in struct
proc: p_usrpri, p_nativepri,
and p_priority.
The p_usrpri member is the user priority of the process, calculated from
the process's estimated CPU time and nice level.
The p_nativepri member is the saved priority used by
propagate_priority(). When a process obtains a mutex, its priority is
saved in p_nativepri. While it holds the mutex, the process's priority
may be bumped by another process that blocks on the mutex. When the
process releases the mutex, its priority is restored to the value saved
in p_nativepri.
The p_priority member is the actual priority of the process and is used,
for example, to determine which runqueue(9) the process is placed on.
The curpriority_cmp() function compares the cached priority of the
currently running process with process p. If the currently running
process has a higher priority, then it will return a value less than
zero. If the current process has a lower priority, then it will return a
value greater than zero. If the current process has the same priority as
p, then curpriority_cmp() will return zero. The cached priority of the
currently running process is updated when a process resumes from
tsleep(9) or returns to userland in userret() and is stored in the
private variable curpriority.
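The sign convention above can be illustrated with a minimal sketch. It is not the kernel implementation: struct proc_sketch, curpriority_sketch and curpriority_cmp_sketch() are hypothetical stand-ins, and the sketch assumes the usual BSD convention that a numerically lower value means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct proc; only the field used here. */
struct proc_sketch {
	int p_priority;		/* assumption: lower value = higher priority */
};

/* Hypothetical stand-in for the private cached priority variable. */
int curpriority_sketch;

/*
 * Returns < 0 if the current process has the higher priority,
 * > 0 if it has the lower priority, and 0 if the two are equal.
 */
int
curpriority_cmp_sketch(const struct proc_sketch *p)
{
	assert(p != NULL);
	return (curpriority_sketch - p->p_priority);
}
```

With curpriority_sketch set to 100, a process at priority 120 yields a negative result, because the current process is the more important of the two under the stated convention.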
The maybe_resched() function compares the priorities of the current
thread and td. If td has a higher priority than the current thread, then
a context switch is needed, and KEF_NEEDRESCHED is set.
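The check described above might look roughly like the following sketch. The structure, the flag value and the choice to record the flag on the current thread are all assumptions made for illustration; in the kernel the flag lives in scheduler-private state. As before, a numerically lower value is assumed to mean a higher priority.

```c
#include <assert.h>
#include <stddef.h>

#define KEF_NEEDRESCHED_SKETCH	0x0001	/* hypothetical flag value */

/* Hypothetical stand-in for struct thread. */
struct thread_sketch {
	int td_priority;	/* assumption: lower value = higher priority */
	int td_flags;
};

/* Request a context switch if td outranks the current thread. */
void
maybe_resched_sketch(struct thread_sketch *cur,
    const struct thread_sketch *td)
{
	assert(cur != NULL && td != NULL);
	if (td->td_priority < cur->td_priority)
		cur->td_flags |= KEF_NEEDRESCHED_SKETCH;
}
```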
The propagate_priority() function looks at the process that owns the
mutex p is blocked on. That process's priority is bumped to the priority
of p if needed. If the process is currently running, then the function
returns. If the process is on a runqueue(9), then the process is moved to
the appropriate runqueue(9) for its new priority. If the process is
blocked on a mutex, its position in the list of processes blocked on the
mutex in question is updated to reflect its new priority. Then, the
function repeats the procedure using the process that owns the mutex just
encountered. Note that a process's priority is only bumped to the
priority of the original process p, not to the priority of the
previously encountered process.
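The chain walk described above can be sketched as follows. This is a simplified illustration, not the kernel code: the structures and field names are hypothetical, run queue and blocked-list maintenance are omitted, the "currently running" check is left out, and the walk is assumed to stop once an owner is already at least as important as p. A numerically lower value is assumed to mean a higher priority.

```c
#include <assert.h>
#include <stddef.h>

struct mtx_sketch;

/* Hypothetical stand-in for struct proc. */
struct proc_sketch {
	int p_priority;			/* lower value = higher priority */
	struct mtx_sketch *p_blocked;	/* mutex being waited on, or NULL */
};

/* Hypothetical stand-in for a mutex. */
struct mtx_sketch {
	struct proc_sketch *mtx_owner;	/* current holder, or NULL */
};

void
propagate_priority_sketch(struct proc_sketch *p)
{
	struct mtx_sketch *m;
	int pri;

	assert(p != NULL);
	pri = p->p_priority;	/* always lend the ORIGINAL priority */
	m = p->p_blocked;
	while (m != NULL && m->mtx_owner != NULL) {
		struct proc_sketch *owner = m->mtx_owner;

		if (owner->p_priority <= pri)
			break;	/* no bump needed; stop walking */
		owner->p_priority = pri;
		m = owner->p_blocked;	/* follow the chain */
	}
}
```

If A (priority 50) blocks on a mutex owned by B (priority 100), and B is in turn blocked on a mutex owned by C (priority 80), both B and C end up at priority 50.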
The resetpriority() function recomputes the user priority of the ksegrp
kg (stored in kg_user_pri) and calls maybe_resched() to force a
reschedule of each thread in the group if needed.
The roundrobin() function is used as a timeout(9) function to force a
reschedule every sched_quantum ticks.
The roundrobin_interval() function simply returns the number of clock
ticks in between reschedules triggered by roundrobin(). Thus, all it does
is return the current value of sched_quantum.
The sched_setup() function is a SYSINIT(9) that is called to start the
callout driven scheduler functions. It just calls the roundrobin() and
schedcpu() functions for the first time. After the initial call, the two
functions perpetuate themselves: each registers its callout event again
when it completes.
The schedclock() function is called by statclock() to adjust the priority
of the currently running thread's ksegrp. It updates the group's
estimated CPU time and then adjusts the priority via resetpriority().
The schedcpu() function updates all process priorities. First, it
updates statistics that track how long processes have been in various
process states. Secondly, it updates the estimated CPU time for each
process such that about 90% of the CPU usage is forgotten in 5 * load
average seconds. For example, if the load average is 2.00, then about 90%
of the estimated CPU time for the process should be based on the amount
of CPU time the process has had in the last 10 seconds. It then
recomputes the priority of the process and moves it to the appropriate
runqueue(9) if necessary. Thirdly, it updates the %CPU estimate used by
utilities such as ps(1) and top(1) so that 95% of the CPU usage is
forgotten in 60 seconds. Once all process priorities have been updated,
schedcpu() calls vmmeter() to update various other statistics including
the load average. Finally, it schedules itself to run again in hz clock
ticks.
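The "90% forgotten in 5 * load average seconds" behaviour can be checked numerically with a small sketch. It assumes the classic 4.4BSD digital decay filter, in which the estimate is multiplied once per second by (2 * load) / (2 * load + 1); this page does not state the formula, so treat it as an assumption. The nice term and the kernel's fixed-point arithmetic are left out, and both function names are hypothetical.

```c
#include <assert.h>

/* Assumed per-second decay factor: 2*load / (2*load + 1). */
double
decay_factor_sketch(double loadavg)
{
	assert(loadavg >= 0.0);
	return ((2.0 * loadavg) / (2.0 * loadavg + 1.0));
}

/* Fraction of an initial estimate still remembered after 'seconds' passes. */
double
remaining_after_sketch(double loadavg, int seconds)
{
	double frac = 1.0;
	int i;

	for (i = 0; i < seconds; i++)
		frac *= decay_factor_sketch(loadavg);
	return (frac);
}
```

For a load average of 2.00 the factor is 0.8, so after 10 passes about 0.8^10, or roughly 11%, of the old estimate remains. That matches the "about 90% forgotten" figure to within rounding.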
The setrunnable() function is used to change a process's state to be
runnable. The process is placed on a runqueue(9) if needed, and the
swapper process is woken up and told to swap the process in if the
process is swapped out. If the process has been asleep for at least one
run of schedcpu(), then updatepri() is used to adjust the priority of the
process.
The updatepri() function is used to adjust the priority of a process that
has been asleep. It retroactively decays the estimated CPU time of the
process for each schedcpu() event that the process was asleep. Finally,
it calls resetpriority() to adjust the priority of the process.
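The retroactive decay can be pictured as replaying one decay step for every schedcpu() pass that was missed. The sketch below is an illustration under stated assumptions, not the kernel routine: it uses integer arithmetic with a whole-number load, assumes the same (2 * load) / (2 * load + 1) factor as the classic 4.4BSD filter, and the function name and parameters are hypothetical.

```c
#include <assert.h>

/*
 * Apply one assumed decay step per missed schedcpu() pass.
 * estcpu:  estimated CPU time before the sleep
 * load:    whole-number load average (sketch assumes load > 0)
 * slptime: number of schedcpu() passes spent asleep
 */
unsigned int
updatepri_decay_sketch(unsigned int estcpu, unsigned int load,
    unsigned int slptime)
{
	unsigned int factor = 2 * load;

	assert(load > 0);
	while (slptime-- > 0 && estcpu > 0)
		estcpu = (estcpu * factor) / (factor + 1);
	return (estcpu);
}
```

The loop exits early once the estimate reaches zero, since further passes cannot change it.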
The curpriority variable really should be per-CPU. In addition,
maybe_resched() should compare the priority of td with that of each CPU,
and then send an IPI to the processor with the lowest priority to trigger
a reschedule if needed.
Priority propagation is broken and is thus disabled by default.
The p_nativepri variable is only updated if a process
does not obtain a sleep mutex on the first try. Also, if a process obtains
more than one sleep mutex in this manner, and had its priority bumped in
between, then p_nativepri will be clobbered.