Learning a Car Driving Simulator to enable Deep Reinforcement Learning

Student: Tim Nisslbeck
Title: Learning a Car Driving Simulator to enable Deep Reinforcement Learning
Type: Master's thesis
Advisors: Mutschler, C.; Edelhäußer, T.; Philippsen, M.
Status: completed on July 2, 2018

Programming in Python, C/C++, Reinforcement Learning


Over the past decades, autonomy has become a central goal in robotics. Think of a robot that performs some desired behavior by controlling itself without external control inputs (for instance, commands by a human). A policy usually makes those control decisions for the robot automatically. In control engineering, the traditional approach to creating such a policy is to design controllers based on expert knowledge of the dynamics associated with the desired behavior. Lately, policies for robots have been determined by applying machine learning methods, such as (deep) reinforcement learning. However, learning-based approaches require large training data sets to find satisfactory policies for a given task (the sizes depend on the complexity of the task and on the learning algorithm). Instead of collecting large amounts of data in the real world, new data streams can be generated automatically through the interaction between a simulator and a reinforcement learning agent (for instance, the robot). Thus, creating a simulator reduces the necessary amount of manual data collection and presents a cost-effective way to automate the process of policy learning.
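The interaction between a simulator and a learning agent described above can be sketched as a simple rollout loop that generates new training data. The `CarSimulator` and `random_policy` below are hypothetical placeholders with toy dynamics, not components of the thesis:

```python
import random

class CarSimulator:
    """Hypothetical stand-in for a learned simulator: maps (state, action) to next state."""
    def step(self, state, action):
        # Toy placeholder dynamics: the action nudges the velocity, which moves the position.
        pos, vel = state
        return (pos + vel * 0.1, vel + action * 0.1)

def random_policy(state):
    """Placeholder policy: random control command in [-1, 1]."""
    return random.uniform(-1.0, 1.0)

def generate_rollout(sim, policy, initial_state, horizon=100):
    """Let the policy interact with the simulator and record (s, a, s') transitions."""
    trajectory = []
    state = initial_state
    for _ in range(horizon):
        action = policy(state)
        next_state = sim.step(state, action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory

rollout = generate_rollout(CarSimulator(), random_policy, (0.0, 1.0))
```

In a reinforcement learning setting, the recorded transitions (plus rewards) would feed the policy update; here the loop only illustrates how a simulator replaces manual data collection.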

For a wide range of tasks, policies that model complex robot behaviors have already been learned, e.g., for flying a helicopter [1], for controlling cars [2-5], and for controlling the arms of a robot [6]. Previous research on policy learning for car driving relies on simulators that describe the vehicle physics [4, 7, 8]. Similar to the approaches for finding policies, these simulators can either be modeled with expert knowledge or be determined by means of machine learning. An example of the expert-based approach is The Open Racing Car Simulator (TORCS) [9], which can generate training data for policy learning. However, depending on the real-world environmental conditions and the attributes of specific vehicles, the modeled vehicle physics and the resulting data streams often diverge from those observed in the real world. While this is where learning-based simulators can close the gap between the real world and the simulated world, there is little such research on car driving. Santana & Hotz [10] proposed a framework to predict future video frames of car driving based on previous frames using several neural networks. In particular, they implemented a naive recurrent neural network to generate new data. Cutler & How [13] proposed a framework that finds control policies for car drifting. They start from randomly generated policies, then gradually improve them by applying shallow reinforcement learning before the policies are tested in simulators, which are based on their previous work [11, 12]. Finally, they evaluate the best control policies in the real world using a radio-controlled car. Sensor readings are then fed back into the learning process. In contrast to existing learning-based simulators [10-12, 14] that either apply shallow machine learning or predict image data streams, this thesis will apply deep learning to predict motion data streams.
More precisely, a deep neural network will learn to approximate the vehicle physics to predict the unknown state-transition dynamics of the real-world environment. The choice of using deep neural networks for this thesis is based on their success in various other fields such as computer vision and games [15-17].
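Approximating the state-transition dynamics from recorded motion data amounts to fitting a model f(s, a) → s' by minimizing the prediction error. The sketch below does this by gradient descent on the mean squared error; for brevity it uses a linear model on synthetic data in place of the deep network and real measurements the thesis calls for, and all names and the toy dynamics are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for recorded motion data: states s_t (e.g., pos, vel) and actions a_t.
# The "true" dynamics below exist only to generate data; in the thesis they are unknown.
A_true = np.array([[1.0, 0.1], [0.0, 0.95]])
b_true = np.array([0.0, 0.2])
states = rng.normal(size=(500, 2))
actions = rng.normal(size=(500, 1))
next_states = states @ A_true.T + actions * b_true

# Learn a linear transition model f(s, a) = [s, a] @ W by gradient descent on the MSE.
inputs = np.hstack([states, actions])  # shape (500, 3)
W = np.zeros((3, 2))
lr = 0.05
for _ in range(2000):
    pred = inputs @ W
    grad = inputs.T @ (pred - next_states) / len(inputs)
    W -= lr * grad

mse = np.mean((inputs @ W - next_states) ** 2)
```

Replacing the linear map with a deep network (and the analytic gradient with backpropagation) yields the setup the thesis describes; the training loop itself stays structurally the same.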

The thesis will adopt the following strategy for training and testing the simulator:

  • As it is necessary for training the simulator, it is a subtask of the thesis to manually collect data, i.e., correctly measure the state of a moving RC car over time to capture the vehicle dynamics of the car in its real-world environment. To do so, a Tamiya TT-02 car [18] will be manually maneuvered in a specified driving area that is covered by a motion capture system. In this thesis, the Advanced Realtime Tracking (ART) motion capture system [19] should estimate the car's absolute pose, i.e., its orientation and position. On-board sensors such as an accelerometer and a gyroscope should also be used as they provide additional relative motion data. Moreover, the control inputs that a human expert uses to perform several car maneuvers should be recorded. The thesis should argue for the selection of the maneuvers (as the maneuvers should cover a broad range of available states and transitions). Data from all sources then has to be fused into a single time series data stream with a unified time reference.
  • This thesis should then train the deep neural network representing the simulator by applying an appropriate learning algorithm, e.g., any gradient descent optimization algorithm. To validate the prediction model, its ability to generalize to unseen inputs has to be tested. The thesis chooses and adopts one of the following two ways to train and test the prediction model: (1) A common model validation technique is cross-validation, in which the collected data streams are split into training and test sets. The training sets are used to train the model before the model's generalization is evaluated on unseen test sets. This evaluates the simulator against the recorded data sets. (2) An alternative model validation technique evaluates the simulator implicitly. The thesis may learn policies for a number of car driving maneuvers using data generated by the simulator. These policies can then be executed with the Tamiya car in the real world. If the policies succeed in the real world, this not only shows that the simulator provides reliable state-transition dynamics but also that it generalizes to the real environment dynamics. The thesis should argue for the selected evaluation strategy and compare its advantages and disadvantages for the application.
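The sensor-fusion step in the data-collection task (mocap pose, IMU readings, and control inputs arriving at different rates) amounts to aligning each stream to one common time reference. A minimal sketch using nearest-timestamp alignment; the 60 Hz mocap and 100 Hz IMU rates and all names are hypothetical:

```python
import bisect

def align_to_reference(ref_times, stream):
    """For each reference timestamp, pick the stream sample whose timestamp is nearest.
    stream: list of (timestamp, value) pairs, sorted by timestamp."""
    times = [t for t, _ in stream]
    aligned = []
    for t in ref_times:
        i = bisect.bisect_left(times, t)
        # Choose the closer of the two neighbouring samples (guarding the ends).
        candidates = [j for j in (i - 1, i) if 0 <= j < len(stream)]
        j = min(candidates, key=lambda j: abs(times[j] - t))
        aligned.append(stream[j][1])
    return aligned

# Hypothetical 60 Hz mocap timestamps vs. a 100 Hz IMU stream (timestamps in seconds).
mocap_times = [k / 60 for k in range(6)]
imu = [(k / 100, k) for k in range(10)]
fused = align_to_reference(mocap_times, imu)
```

Interpolation between neighbouring samples (instead of nearest-sample lookup) would be a natural refinement once the clocks of all sources are synchronized.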
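For validation option (1), a plain random shuffle would leak temporally adjacent, near-identical samples from the recorded streams into the test set. A sketch of cross-validation over contiguous time blocks instead; the function name and fold count are illustrative:

```python
def blocked_kfold(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs where each test fold is one contiguous
    block of the time series, keeping temporally adjacent samples together."""
    fold = n_samples // k
    for i in range(k):
        lo = i * fold
        hi = (i + 1) * fold if i < k - 1 else n_samples
        test = list(range(lo, hi))
        train = list(range(0, lo)) + list(range(hi, n_samples))
        yield train, test

folds = list(blocked_kfold(10, k=5))
```

Averaging the prediction error of the trained simulator over all folds then gives the generalization estimate that option (1) calls for.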


  • Study related work on autonomous car driving
  • Study related work on driving simulators learned by means of machine learning
  • Study related work on (deep) reinforcement learning
  • Capture and analyze relevant sensor data to obtain the car’s state by manually driving the car
  • Implement and train a car driving simulator using deep learning
  • Evaluate the simulator by applying one of the described ways for model validation
  • Elaborate the results

[1] H. J. Kim, M. I. Jordan, S. Sastry, and A. Y. Ng: Autonomous helicopter flight via reinforcement learning. In Advances in neural information processing systems. pp. 799–806. 2004.
[2] A. Liniger, A. Domahidi, and M. Morari: Optimization-based autonomous racing of 1:43 scale RC cars. In Optimal Control Applications and Methods. Vol. 36, No. 5. pp. 628–647. 2015.
[3] J. Z. Kolter, C. Plagemann, D. T. Jackson, A. Y. Ng, and S. Thrun: A probabilistic approach to mixed open-loop and closed-loop control, with application to extreme autonomous driving. In Proc. IEEE International Conference on Robotics and Automation (ICRA) (Anchorage, Alaska). pp. 839–845. 2010.
[4] R. Y. Hindiyeh and J. C. Gerdes: A controller framework for autonomous drifting: Design, stability, and experimental validation. Journal of Dynamic Systems, Measurement, and Control. Vol. 136, No. 5. 2014.
[5] H. Xu, Y. Gao, F. Yu, and T. Darrell: End-to-end learning of driving models from large-scale video datasets. arXiv preprint arXiv:1612.01079. 2016.
[6] S. Levine, C. Finn, T. Darrell, and P. Abbeel: End-to-end training of deep visuomotor policies. Journal of Machine Learning Research. Vol. 17, No. 39. pp. 1–40. 2016.
[7] A. Ganesh, J. Charalel, M. D. Sarma, and N. Xu: Deep reinforcement learning for simulated autonomous driving. 2016.
[8] C. Chen, A. Seff, A. Kornhauser, and J. Xiao: DeepDriving: Learning affordance for direct perception in autonomous driving. In Proc. IEEE International Conference on Computer Vision. pp. 2722–2730. 2015.
[9] B. Wymann, C. Dimitrakakis, A. Sumner, and C. Guionneau: TORCS: The Open Racing Car Simulator. 2015.
[10] E. Santana and G. Hotz: Learning a driving simulator. arXiv preprint arXiv:1608.01230. 2016.
[11] M. Cutler, T. J. Walsh, and J. P. How: Real-world reinforcement learning via multifidelity simulators. IEEE Transactions on Robotics (T-RO). pp. 655–671. 2015.
[12] M. Cutler and J. P. How: Efficient reinforcement learning for robots using informative simulated priors. In Proc. IEEE International Conference on Robotics and Automation (ICRA). pp. 2605–2612. 2015.
[13] ——: Autonomous drifting using simulation-aided reinforcement learning. In Proc. IEEE International Conference on Robotics and Automation (ICRA) (Stockholm, Sweden). pp. 5442–5448. 2016.
[14] Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, and A. Farhadi: Target-driven visual navigation in indoor scenes using deep reinforcement learning. In Proc. IEEE International Conference on Robotics and Automation (ICRA). pp. 3357–3364. 2017.
[15] Y. LeCun, Y. Bengio, and G. Hinton: Deep learning. Nature. Vol. 521, No. 7553. pp. 436–444. 2015.
[16] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot: Mastering the game of go with deep neural networks and tree search. Nature. Vol. 529, No. 7587. pp. 484–489. 2016.
[17] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski: Human-level control through deep reinforcement learning. Nature. Vol. 518, No. 7540. pp. 529–533. 2015.
[18] Tamiya, "Tamiya TT-02 – Radio Control 4WD High Performance Racing Car," http://www.tamiya.com/english/rc/rcmanual/tt02.pdf, accessed: 2017-09-19.
[19] Advanced Realtime Tracking GmbH, "Advanced Realtime Tracking – Motion Capture System," http://www.ar-tracking.com/products/motion-capture/optical-target-set/, accessed: 2017-09-19.
[20] Y. Li: Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274, 2017.
