Overview of two scientists credited with the beginning of behavior change science, the differences between Positive Reinforcement, Negative Reinforcement, Positive Punishment, and Negative Punishment, and which types I use in my training techniques.
What are Classical Conditioning and Operant Conditioning?
There are two main people who have shaped (a pun, you’ll soon see in a couple of posts) how people see behavior and psychology. This can be the behavior of people, of dogs, of cats, of hawks, of sea lions, and even fish! First up is Ivan Pavlov.
Pavlov, a physiologist, discovered the tie between reflexes and an outside stimulus. In 1905, Pavlov did as all scientists do, and tested his hypothesis on dogs. Knowing food causes salivation, he thought that if a neutral sound preceded food, the neutral sound would eventually have the same physiological effect on the dogs (1). Pavlov’s hypothesis was supported, creating the Theory of Classical Conditioning! His study showed that salivating in the presence of food (an innate behavior, also known as an unconditioned response) can be brought on by the sound of a bell alone, without the presence of food.
Food (an unconditioned stimulus, UCS) leads to the innate behavior of salivation (unconditioned response, UCR), as most dog owners can attest. And most people understand that any random sound (or a neutral stimulus, NS) would not lead to this innate behavior. If the NS is heard before the arrival of the UCS, the dog would salivate once they saw the UCS. After repeated pairings of the NS with the UCS, the sound alone begins to elicit (or produce) salivation. The UCR (salivation) then changes to a conditioned response, CR (still salivation, just a different label). This means that every time the specific sound is heard, it elicits the CR. The sound’s label then changes from neutral stimulus to conditioned stimulus (CS).
Here’s another way to explain Classical Conditioning:
Food is presented (the UCS is presented) → Salivation occurs (the UCR occurs)
A bell is heard (the NS is presented) → No salivation occurs (no Response)
Bell is heard + Food Presented (NS + UCS) → Salivation occurs (UCR occurs)
This is done several times, leading to: Bell is heard → Salivation occurs
If the salivation follows the bell, then:
Bell is heard (CS presented) → Salivation occurs (CR occurs)
Classical Conditioning teaches us that a stimulus can cause a biological response. We see this in ourselves: your heart races when someone’s ringtone is your alarm sound, or you feel sensations of nostalgia when you smell a long-forgotten scent.
Next, we have B.F. Skinner.
Unlike Pavlov, who approached behavior as a physiologist, Skinner was a psychologist; he completed his psychology PhD in 1931. Skinner investigated how environmental experience leads to learning, and how learning changes behavior. Unlike the psychologists of the time who psychoanalyzed behavior, Skinner created an experimental type of study. This led to the Theory of Operant Conditioning.
Using pigeons and rats, Skinner built an Operant Conditioning chamber, or box, to control all external factors. His goals included teaching pigeons to spin in a circle and rats to press a lever, with each experiment using different consequences. He found that different stimuli would either increase or decrease behavior (2). Where Pavlov’s study showed an innate behavioral response in reaction to a paired stimulus (salivation elicited by the sound), Skinner’s showed the learning of consequences. A behavior leads to something happening, and the behavior changes based on what happened!
When the consequence leads the subject to repeat a behavior (or the likelihood of the behavior increases), Skinner labeled the consequence as reinforcement. When the consequence leads the subject to not repeat a behavior (or the likelihood of the behavior decreases), this is known as punishment. Additionally, the stimulus that serves as the consequence can be given or taken away, known as positive or negative, respectively.
Where Pavlov’s research showed how a neutral stimulus can become significant and cause a behavioral reaction, Skinner’s research shows we can use a stimulus to create, or change, a behavior.
So how do the terms in Operant Conditioning relate to each other? And how does this connect to animal training and behavior change?
Combining these terms gives the four quadrants of operant conditioning:
Positive Reinforcement (R+) : a stimulus is given, or added, that increases the likelihood of the behavior occurring.
Negative Reinforcement (R-) : a stimulus is removed, or negated, that increases the likelihood of the behavior occurring.
Positive Punishment (P+) : a stimulus is given, or added, that decreases the likelihood of the behavior occurring.
Negative Punishment (P-) : a stimulus is removed, or negated, that decreases the likelihood of the behavior occurring.
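For readers who think in code, the four definitions above boil down to two yes/no questions: was a stimulus added or removed, and did the behavior become more or less likely? Here is a minimal Python sketch of that lookup. The names `Quadrant` and `classify` are my own illustration, not terms from the research.

```python
from enum import Enum


class Quadrant(Enum):
    """The four quadrants, using the abbreviations from the list above."""
    POSITIVE_REINFORCEMENT = "R+"
    NEGATIVE_REINFORCEMENT = "R-"
    POSITIVE_PUNISHMENT = "P+"
    NEGATIVE_PUNISHMENT = "P-"


def classify(stimulus_added: bool, behavior_increases: bool) -> Quadrant:
    """Name the quadrant from the two yes/no questions.

    stimulus_added: True if a stimulus was given, False if it was removed.
    behavior_increases: True if the behavior becomes more likely,
    False if it becomes less likely.
    """
    if behavior_increases:
        # Reinforcement: the behavior becomes more likely.
        if stimulus_added:
            return Quadrant.POSITIVE_REINFORCEMENT
        return Quadrant.NEGATIVE_REINFORCEMENT
    # Punishment: the behavior becomes less likely.
    if stimulus_added:
        return Quadrant.POSITIVE_PUNISHMENT
    return Quadrant.NEGATIVE_PUNISHMENT


# A treat is given and sitting happens more often.
print(classify(stimulus_added=True, behavior_increases=True).value)
```

Notice that nothing in the function asks whether the stimulus feels good or bad to the animal. Only the direction of the stimulus and the direction of the behavior matter for the label.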
It’s important to note that the terms Positive and Negative do not refer to an emotional response or feeling. This is the most common mistake for everyone, from first-time dog owners to professional behavior consultants. The terms simply refer to the addition or removal of a stimulus.
Similarly, Reinforcement and Punishment do not always mean a good thing and a bad thing. They describe only whether the behavior is more or less likely to occur again. Negative Reinforcement relies on an aversive (‘bad thing’) being applied and then removed, while Negative Punishment removes an opportunity (‘good thing’).
Here’s an example of each quadrant to help note the differences:
1. R+ : the dog sits, receives a piece of food
Sitting is the behavior you want to happen more often (reinforcement). The consequence is that food is given (positive). The food fills a biological need, so the dog will be more likely to sit in order to receive more food.
2. R- : the dog sits, the pressure on the bum is lessened
Sitting is the behavior you want to happen more often (reinforcement). The consequence is removing the pressure (negative). Sitting removes the unpleasant/aversive pressure.
3. P+ : the dog barks, pressure is felt when their mouth is held closed
Barking is the behavior you want to stop (punishment). The consequence is adding pressure to the muzzle (positive). The pressure is unpleasant/painful/aversive, so the dog will stop barking.
4. P- : the dog barks, the person looks away and ignores the dog
Barking is the behavior you want to stop (punishment). The consequence is removing the attention (negative). The dog is barking to keep you engaged; by ignoring the dog, you take away the attention that meets its social need, leading to less barking.
Traditional dominance training relies on Negative Reinforcement and Positive Punishment. Both of these types of Operant Conditioning use an aversive stimulus. In R-, the dog’s behavior changes to stop the aversive. The aversive has to be applied prior to the behavior, then removed when the behavior is performed. In P+, the dog’s behavior changes to avoid the aversive. The aversive is applied after the behavior is performed. Either way, the end product is the same: increased fear, stress, and aggression between the animal and you (3, 4).
The two quadrants I use are Positive Reinforcement and Negative Punishment. Both of these types of Operant Conditioning use something the animal wants and have no aversive aspect (yes, even Negative Punishment, even though it sounds scary). In R+, a good thing is given after a behavior is performed. In P-, the opportunity for a good thing is delayed until a behavior decreases. The end product is encouraged learning, an increased ability to problem solve, and a stronger relationship between the animal and you.
1. The Nobel Prize (2021). Ivan Pavlov Biographical. 28 Jan 2021. https://www.nobelprize.org/prizes/medicine/1904/pavlov/biographical/
2. Harvard Brain Tour (2021). Skinner and Behaviorism. 28 Jan 2021. https://braintour.harvard.edu/archives/portfolio-items/skinner-and-behaviorism
3. Madson, C. (2019). Dog Training Aversives: What are They and Why Should You Avoid Them? https://www.preventivevet.com/dogs/dog-training-aversives
4. Pryor, K. (2005). Samples of Negative Reinforcement. https://www.clickertraining.com/node/274