On this page you can find supplementary videos for the paper "Reinforcement Learning to Autonomously Prepare Floquet-Engineered States: Inverting the Quantum Kapitza Oscillator" [1].

The movies below show three stages of time evolution:

• (i) The oscillator is subject to the control field in the presence of the Floquet drive; the arrow indicates the direction of the horizontal control kick.
• (ii) Once the control stage is over, the Floquet drive is kept on but the control field is turned off.
• (iii) Both the Floquet drive and the control field are turned off, and the system evolves under the free oscillator Hamiltonian $H_0$ (see paper).
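The three-stage schedule above can be sketched as a simple piecewise switch. This is a minimal illustration, not code from the paper; the stage boundaries `t_control` and `t_floquet` are hypothetical names for the times at which stages (i) and (ii) end.

```python
def fields_on(t, t_control, t_floquet):
    """Return (control_on, drive_on) for the three-stage schedule:
    (i)   t < t_control:              control field + Floquet drive,
    (ii)  t_control <= t < t_floquet: Floquet drive only,
    (iii) t >= t_floquet:             free evolution under H_0.
    `t_control` and `t_floquet` are hypothetical stage-boundary times."""
    if t < t_control:
        return True, True
    elif t < t_floquet:
        return False, True
    return False, False
```

Any time-dependent simulation of the protocol can then query this switch to decide which terms of the Hamiltonian are active at a given time.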

Movie 1 shows the real-space probability distribution of being in the target state for the quantum Kapitza oscillator, following the best-encountered RL protocol. The control stage lasts $N_T=15$ drive cycles with $8$ protocol steps per cycle; the oscillator parameters are $\Omega/\omega_0=10$, $A=2$ and $m\omega_0=1$.

Movie 2 shows the real-space probability distribution of being in the target state for the quantum Kapitza oscillator, following the best Stochastic Descent protocol. The control stage lasts $N_T=15$ drive cycles with $8$ protocol steps per cycle; the oscillator parameters are $\Omega/\omega_0=10$, $A=2$ and $m\omega_0=1$.

Movie 3 shows the classical Kapitza pendulum dynamics, following the best-encountered RL protocol. The control stage lasts $N_T=4$ drive cycles with $8$ protocol steps per cycle; the pendulum parameters are $\Omega/\omega_0=10$, $A=2$ and $m\omega_0=1$.
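The drive-stabilized classical dynamics behind Movie 3 can be reproduced with a short integration. The sketch below assumes the common Kapitza convention $\ddot\theta = -[\omega_0^2 + A\,\omega_0\Omega \cos(\Omega t)]\sin\theta$ (the paper's exact drive convention may differ), with the parameters quoted above, $\omega_0=1$, $\Omega=10$, $A=2$; in the high-frequency limit this stabilizes the inverted position $\theta=\pi$ whenever $A>\sqrt{2}$.

```python
import math

def kapitza_max_deviation(A, omega0=1.0, Omega=10.0,
                          theta0=None, t_max=20.0, dt=1e-3):
    """Integrate theta'' = -[omega0^2 + A*omega0*Omega*cos(Omega*t)]*sin(theta)
    with a classic RK4 step, starting at rest near the inverted position
    theta = pi, and return the maximum deviation |theta - pi| encountered.
    The drive convention here is an assumption, not taken from the paper."""
    if theta0 is None:
        theta0 = math.pi - 0.05  # small tilt away from the inverted point

    def accel(t, th):
        return -(omega0**2 + A * omega0 * Omega * math.cos(Omega * t)) * math.sin(th)

    th, w, t = theta0, 0.0, 0.0
    max_dev = abs(th - math.pi)
    for _ in range(int(t_max / dt)):
        # 4th-order Runge-Kutta step for the pair (theta, angular velocity)
        k1t, k1w = w, accel(t, th)
        k2t, k2w = w + 0.5 * dt * k1w, accel(t + 0.5 * dt, th + 0.5 * dt * k1t)
        k3t, k3w = w + 0.5 * dt * k2w, accel(t + 0.5 * dt, th + 0.5 * dt * k2t)
        k4t, k4w = w + dt * k3w, accel(t + dt, th + dt * k3t)
        th += dt / 6 * (k1t + 2 * k2t + 2 * k3t + k4t)
        w += dt / 6 * (k1w + 2 * k2w + 2 * k3w + k4w)
        t += dt
        max_dev = max(max_dev, abs(th - math.pi))
    return max_dev
```

With the drive on ($A=2$) the pendulum stays pinned near $\theta=\pi$ apart from small micromotion, while with the drive off ($A=0$) the same initial condition falls away from the inverted point.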

References:
[1] M.B., arXiv:1808.08910 (2018).