Reward Schedule

What is a reward schedule?

A reward schedule is a plan that specifies how often, and under what conditions, a behaviour or response will be rewarded. To maintain our dogs' trained behaviours, we need a thorough understanding of how to use reward schedules.

What is a Continuous Reward Schedule (CRS)?

While a dog is learning a new behaviour, a continuous reward schedule is the most effective way to give the animal a clear understanding of the action or behaviour required to earn the reward. The Law of Effect underlies this learning process: when a response produces reinforcement, that response becomes more likely to occur again. All schedules of reinforcement depend on a correct response by the dog. On a continuous reward schedule, every correct response is rewarded. Imagine how much harder it would be for a dog to work out exactly which response is earning the reward if the reward came only on every second or third correct response.

What is Extinction?

Once a behaviour has been established on a continuous reward schedule, it is a very simple matter to extinguish the learned response: we simply stop rewarding it altogether. The behaviour will begin to diminish from the very first time it is not reinforced.

What is a Fixed Ratio Reward Schedule (FR)?

Behaviour learned on a continuous reward schedule will not become extinct on the first occasion that it is not rewarded; the animal will try it a number of times before giving up. This gives us room to reduce how often we reward. We might decide to reward every third correct response, a fixed ratio reward schedule of 3 (FR3). Once the animal is performing reliably at this level, understanding that it must give three responses to earn the reward, we could move on to rewarding, perhaps, every fifth correct response (FR5). This progression could continue gradually up to a considerably high ratio, depending on the response required and the individual animal involved.
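A fixed ratio schedule amounts to a counter of correct responses. The sketch below is only an illustration. The class `FixedRatioSchedule` and its methods are hypothetical names, and the code assumes that incorrect responses are neither counted nor rewarded. It rewards every `ratio`-th correct response (so FR1 is the same as a continuous schedule). It then moves from FR3 to FR5, as in the progression above.

```python
class FixedRatioSchedule:
    """Hypothetical FR schedule: reward every `ratio`-th correct response."""

    def __init__(self, ratio: int) -> None:
        if ratio < 1:
            raise ValueError("ratio must be at least 1")
        self.ratio = ratio
        self.count = 0  # correct responses since the last reward

    def respond(self, correct: bool = True) -> bool:
        """Record one response; return True if it earns a reward."""
        if not correct:
            return False  # assumption: incorrect responses are ignored
        self.count += 1
        if self.count >= self.ratio:
            self.count = 0  # start counting towards the next reward
            return True
        return False

    def set_ratio(self, ratio: int) -> None:
        """Move to a new ratio (e.g. FR3 -> FR5) once performance is reliable."""
        if ratio < 1:
            raise ValueError("ratio must be at least 1")
        self.ratio = ratio


fr = FixedRatioSchedule(3)
print([fr.respond() for _ in range(9)])
# [False, False, True, False, False, True, False, False, True]
fr.set_ratio(5)
print([fr.respond() for _ in range(10)])
# [False, False, False, False, True, False, False, False, False, True]
```

A real trainer decides when to step up by watching the dog. The sketch leaves that judgement outside the code and only shows what each ratio means.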

To be effective, the progression must be gradual. If we tried to jump straight from a Continuous Reward Schedule to an FR20, the response would become extinct before the first reward had been earned.

A good example of using a Fixed Ratio Reward Schedule is training a dog to appear to do mathematics. First, we train the dog to bark on command. On a Continuous Reward Schedule, every bark is rewarded. Once the dog is reliably barking on command, we withhold the reward on one occasion. The dog will most likely bark again in an attempt to get his reward, and we reward this second bark. The dog is now on an FR2. Once he is performing reliably, we progress to higher FR schedules, and the dog will continue to bark until rewarded. Because we decide how many barks earn the reward, he now appears to be doing mathematics.
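The trick works because the trainer, not the dog, sets the ratio. The function `bark_until_rewarded` below is a hypothetical, minimal sketch of one trial. It assumes, as the text does, that a dog on a well-established FR schedule keeps barking until the reward comes. The number of barks therefore equals whatever ratio the trainer chose.

```python
def bark_until_rewarded(ratio: int) -> int:
    """Simulate one trial on FR`ratio`; return how many barks the dog gives.

    Assumes the dog keeps barking until he is rewarded, as the text describes.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    barks = 0
    rewarded = False
    while not rewarded:
        barks += 1                  # the dog barks again, hoping for the reward
        rewarded = barks == ratio   # the trainer rewards only the ratio-th bark
    return barks


# "What is 2 + 3?" The trainer, who knows the answer, sets the ratio to 5.
answer = 2 + 3
print(f"The dog barks {bark_until_rewarded(answer)} times")  # The dog barks 5 times
```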

What is a Fixed Interval Reward Schedule (FI)?

The reward still depends on a correct response, but it is given only after a set period of time has elapsed. We use this schedule when training the Beagles to hold the sit position for longer and longer periods. This begins to simulate the airport situation, where the dog must hold its sit while the handler questions the passenger and searches the bag.
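On a fixed interval schedule, the clock rather than a counter decides when a correct response can pay off. The following is a minimal sketch under stated assumptions. `FixedIntervalSchedule` is a hypothetical name. Times are passed in explicitly, in seconds since the session started, so that no real clock is needed. The interval is timed from the last reward, and only the first correct response after it has elapsed is rewarded. For the sit exercise, `respond(t)` can be read as "the dog is still holding the sit at time t".

```python
class FixedIntervalSchedule:
    """Hypothetical FI schedule: reward the first correct response after `interval` seconds."""

    def __init__(self, interval: float, start: float = 0.0) -> None:
        self.interval = interval
        self.last_reward = start  # the interval is timed from the last reward

    def respond(self, t: float, correct: bool = True) -> bool:
        """Record a response at time t; return True if it earns a reward."""
        if not correct:
            return False
        if t - self.last_reward >= self.interval:
            self.last_reward = t  # restart the interval from this reward
            return True
        return False  # too early: the interval has not yet elapsed


fi = FixedIntervalSchedule(interval=10)
print([fi.respond(t) for t in (3, 9, 10, 15, 19, 21)])
# [False, False, True, False, False, True]
```

This sketch does not restart the interval when the dog breaks position. A trainer who wants that behaviour would add it explicitly.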

As with the FR schedule, progression must be gradual to be most effective. If we have been reinforcing sits as soon as they happen and then expect the dog to hold the sit for 30 seconds before rewarding, the dog will probably break position before the time has elapsed.

In the bark-on-command exercise, we could use a Fixed Interval Reward Schedule to get the dog to keep barking for a set length of time before the reward is given.

What is a Variable Ratio Reward Schedule (VR)?

On a variable ratio reward schedule, the number of responses required to earn the reward varies. For example, on a variable ratio schedule of 10 (VR10), the dog might be required to make 4 responses for the first reward, then 14, then 6, then 2, then 18, and so on. The value of the VR schedule is the average number of responses required to earn the reward.
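A variable ratio schedule works like a fixed ratio one, except that the target is redrawn after every reward. In this hypothetical sketch (`VariableRatioSchedule` is an illustrative name), each requirement is drawn uniformly from 1 to 2 × mean − 1. That range averages exactly `mean` and covers the text's example values for VR10. The uniform draw is an assumption; any distribution with the right average fits the definition.

```python
import random
from typing import Optional


class VariableRatioSchedule:
    """Hypothetical VR schedule: the responses needed per reward vary around `mean`."""

    def __init__(self, mean: int, seed: Optional[int] = None) -> None:
        if mean < 1:
            raise ValueError("mean must be at least 1")
        self.mean = mean
        self.rng = random.Random(seed)  # seeded so a session can be replayed
        self.count = 0                  # correct responses since the last reward
        self.required = self._draw()    # responses needed for the next reward

    def _draw(self) -> int:
        # Assumption: uniform over 1 .. 2*mean - 1 (inclusive), whose average is `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, correct: bool = True) -> bool:
        """Record one response; return True if it earns a reward."""
        if not correct:
            return False
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()  # the next requirement is unpredictable
            return True
        return False


# Record how many responses each reward took over a long simulated session.
vr = VariableRatioSchedule(mean=10, seed=1)
taken, run = [], 0
for _ in range(10_000):
    run += 1
    if vr.respond():
        taken.append(run)
        run = 0
print(taken[:5])                # responses needed for the first five rewards
print(sum(taken) / len(taken))  # close to 10, the value of the schedule
```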

When training dogs, we try to give the reward for the best responses, e.g. the straightest sit or the quickest drop. Progress gradually: start on a VR2 and work upwards.

Training our dog to keep barking on command until praised or rewarded, so that he appears to do mathematics, will ultimately require variable schedules of reward. If we were to reach an FR100, we would find that the dog's rate of response immediately after receiving a reward is greatly reduced, because he has learned that he must bark many times before the next reward will come. However, if the dog cannot predict when the next reward will happen, he will be keen to keep trying in case this is the occasion that pays off. Poker machines work on people by the same principle.

What is a Variable Interval Reward Schedule (VI)?

The reward depends on a correct response after a period of time has elapsed, but the length of that period varies from one reward to the next. For example, on a variable interval schedule of 5 minutes, the reward might become available after 4 minutes, then 2.5 minutes, then 20 seconds, then 6 minutes, then 5 minutes, then 8 minutes, and so on. The dog has no way of predicting when the reward will come.
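A variable interval schedule works like a fixed interval one, except that a fresh, unpredictable wait is drawn after every reward. This is a minimal, hypothetical sketch (`VariableIntervalSchedule` is an illustrative name). It assumes each wait is drawn uniformly between 0 and twice the mean, so the waits average the schedule's value. Times are passed in explicitly, in seconds.

```python
import random
from typing import Optional


class VariableIntervalSchedule:
    """Hypothetical VI schedule: reward the first correct response after an unpredictable wait."""

    def __init__(self, mean: float, start: float = 0.0, seed: Optional[int] = None) -> None:
        self.mean = mean
        self.rng = random.Random(seed)  # seeded so a session can be replayed
        self.available_at = start + self._draw()  # when the next reward becomes available

    def _draw(self) -> float:
        # Assumption: waits are uniform between 0 and 2 * mean, averaging `mean`.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, t: float, correct: bool = True) -> bool:
        """Record a response at time t; return True if it earns a reward."""
        if not correct or t < self.available_at:
            return False
        self.available_at = t + self._draw()  # time the next wait from this reward
        return True


vi = VariableIntervalSchedule(mean=300, seed=7)  # VI 5 minutes
# A dog that responds once a second for an hour:
rewards = [t for t in range(3600) if vi.respond(t)]
print(len(rewards), rewards[:4])  # the gaps between rewards vary unpredictably
```

The same class with `mean=5` gives the VI schedule of 5 seconds recommended as a starting point for the sit-stay.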

Progression must be gradual: start on a VI schedule of 5 seconds. After the initial learning, training a dog to hold the sit-stay position is most effective once we progress onto a Variable Interval Schedule.

While it may appear obvious that the variable schedules of reward will be most effective in maintaining behaviours, it is important to understand the progression to reach this point.

Reward Schedule Notes

Extinction takes effect very quickly when the dog has been on a continuous reward schedule. Once the dog has progressed onto an FI or FR schedule, more unrewarded responses are needed to produce extinction. On VI or VR schedules, many more are needed, and the higher the level the dog has been brought to, the more unrewarded responses extinction will take.

For the most effective initial learning, use a Continuous Reward Schedule. Then progress onto low levels of Fixed Ratio and Fixed Interval Schedules. Ultimately, for the strongest maintenance, use Variable Ratio and Variable Interval Schedules.

Patterns of Responding Using the Four Reward Schedules

In these cumulative records, the downward dashes mark reinforcements, and the rate of responding over time is shown by the slope of the curve: the steeper the slope, the faster the responding. Note that the variable schedules maintain steady rates of responding, while the fixed schedules produce a stop-start pattern, with a pause after each reinforcement.
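A cumulative record plots the total number of responses so far against time, so the slope between two points is the response rate. The helper functions `cumulative_record` and `rate` below are hypothetical illustrations of that idea. The session data is invented to mimic a pause after reinforcement on a fixed schedule. It is not a measurement.

```python
from typing import List, Tuple


def cumulative_record(response_times: List[float]) -> List[Tuple[float, int]]:
    """Turn response times into (time, total responses so far) points."""
    points: List[Tuple[float, int]] = [(0.0, 0)]
    for count, t in enumerate(sorted(response_times), start=1):
        points.append((t, count))
    return points


def rate(points: List[Tuple[float, int]], i: int, j: int) -> float:
    """Slope between points i and j: responses per unit of time."""
    (t0, n0), (t1, n1) = points[i], points[j]
    return (n1 - n0) / (t1 - t0)


# Hypothetical session: steady responding, a pause, then steady responding again.
times = [1, 2, 3, 4, 10, 11, 12, 13]
record = cumulative_record(times)
print(rate(record, 1, 4))  # 1.0 response per second while responding steadily
print(rate(record, 4, 5))  # about 0.17: the pause flattens the slope
```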

It is possible to maintain responding at several hundred correct responses per reinforcement; the trick is to reach this level gradually.