This paper presents an OCPA (operant conditioning probabilistic automaton) bionic autonomous learning system based on Skinner's operant conditioning theory for solving the balance control problem of a two-wheeled flexible robot. The OCPA learning system consists of two stages: in the first stage, an operant action is selected stochastically from a set of operant actions and then used as the input of the control system; in the second stage, the learning system gathers the orientation information of the system and uses it for optimization until achieves control target. At the same time, the size of the operant action set can be automatically reduced during the learning process for avoiding little probability event. Theory analysis is made for the designed OCPA learning system in the paper, which theoretically proves the convergence of operant conditioning learning mechanism in OCPA learning system, namely the operant action entropy will converge to minimum with the learning process. And then OCPA learning system is applied to posture balanced control of two-wheeled flexible self-balanced robots. Robot does not have posutre balanced skill in initial state and the selecting probability of each operant in operant sets is equal. With the learning proceeding, the selected probabilities of optimal operant gradually tend to one and the operant action entropy gradually tends to minimum, and so robot gradually learned the posture balanced skill.
Operant conditioning is one of the fundamental mechanisms of animal learning, which suggests that the behavior of all animals, from protists to humans, is guided by its consequences. We present a new stochastic learning automaton called a Skinner au- tomaton that is a psychological model for formalizing the theory of operant conditioning. We identify animal operant learning with a thermodynamic process, and derive a so-called Skinner algorithm from Monte Carlo method as well as Metropolis algo- rithm and simulated annealing. Under certain conditions, we prove that the Skinner automaton is expedient, 6-optimal, optimal, and that the operant probabilities converge to the set of stable roots with probability of 1. The Skinner automaton enables ma- chines to autonomously learn in an animal-like way.