RuNN - Recurrent Neuronal Networks (RNNs) for Real-Time Estimation of Nonlinear Motion Models

Director: Philippsen, M.
Period: October 1, 2017 - October 1, 2020
Coworker: Feigl, T.

With the growing availability of information about an environment (e.g., the geometry of a gymnasium) and about the objects in it (e.g., the athletes in that gymnasium), there is increasing interest in combining this information profitably (so-called information fusion) and in processing it. For example, one would like to reconstruct physically correct animations (e.g., in virtual reality, VR) of complex and highly dynamic movements (e.g., in sports) in real time. Likewise, industrial manufacturing plants that suffer from unfavorable environmental conditions (e.g., magnetic field interference or a missing GPS signal) benefit from, e.g., high-precision localization of goods. Movements are typically described either by poses, each a "snapshot" of a state of motion (e.g., idle or standing still), or by a motion model that describes movement over time (e.g., walking or running). In addition, human movements can be identified, detected, and sensed by different sensors (e.g., worn on the body) and represented as poses and motion models. Different types of modern sensors (e.g., camera, radio, and inertial sensors) provide information of varying quality.

In principle, expensive high-precision measuring instruments make it possible to extract poses and motion models without errors on small tracking surfaces, e.g., from positions (positions are described by poses and motion models, or describe them). Camera-based sensors deliver the required high-frequency, high-precision reference measurements on small areas. However, as the size of the tracking surface increases, the usability of camera-based systems decreases (due to inaccuracies or occlusion). Likewise, radio and inertial sensors provide only noisy and inaccurate measurements on large areas. Although combining radio and inertial sensors with Bayesian filters achieves greater accuracy, it is still inadequate for precisely sensing human motion on large areas, e.g., in sports, because human movement changes abruptly and rapidly. Thus, the resulting motion models are inaccurate.
Furthermore, every human movement is highly nonlinear (or unpredictable). Today's motion models, as described, e.g., by Bayes filters, cannot map this nonlinearity correctly, because these (statistical) methods break a nonlinear problem down into linear subproblems, which in turn cannot physically represent the motion. In addition, current methods incur high latency when accuracy is required.
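The effect of a linear motion model on an abrupt change of direction can be illustrated with a small sketch. The following is not the project's filter: it is a minimal one-dimensional Kalman filter with a constant-velocity model, and the function name `kalman_cv_track`, the noise parameters `q` and `r`, the initial covariance, and the test trajectory are all assumptions chosen for illustration.

```python
def kalman_cv_track(measurements, dt=1.0, q=0.01, r=1.0):
    """Track 1-D positions with a linear constant-velocity Kalman filter.

    q (process noise), r (measurement noise) and the initial covariance
    are illustrative assumptions, not values from the project.
    """
    p, v = 0.0, 0.0                    # state: position and velocity
    p11, p12, p22 = 100.0, 0.0, 100.0  # symmetric 2x2 state covariance
    estimates = []
    for z in measurements:
        # Predict with the linear model: p' = p + v * dt, v' = v.
        p = p + v * dt
        n11 = p11 + 2.0 * dt * p12 + dt * dt * p22 + q
        n12 = p12 + dt * p22
        n22 = p22 + q
        # Update with the position measurement z.
        s = n11 + r                    # innovation variance
        k1, k2 = n11 / s, n12 / s      # Kalman gains for position, velocity
        residual = z - p
        p += k1 * residual
        v += k2 * residual
        p11 = (1.0 - k1) * n11
        p12 = (1.0 - k1) * n12
        p22 = n22 - k2 * n12
        estimates.append(p)
    return estimates


def reversal_trajectory(turn=20, length=40):
    """Positions of an object moving at +1 per step that reverses at `turn`."""
    return [k if k <= turn else 2 * turn - k for k in range(1, length + 1)]


truth = reversal_trajectory()
estimates = kalman_cv_track(truth)
errors = [abs(e - t) for e, t in zip(estimates, truth)]
print("error just before the turn:", errors[19])
print("error just after the turn: ", errors[20])
```

Even with noise-free measurements, the estimate overshoots right after the reversal, because the filter keeps extrapolating the old velocity until enough contradicting measurements arrive. Raising `q` shortens this lag at the cost of passing more measurement noise through, which is the accuracy and latency trade-off described above.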

Due to these three problems (inaccurate position data on large areas, nonlinearity, and latency), today's methods are unusable, e.g., for sports applications that require short response times. This project aims to counteract these nonlinearities by using machine learning methods. The project includes research on recurrent neural networks (RNNs) for the determination of nonlinear motion models. Modern Bayesian filtering methods (e.g., Kalman and particle filters) and other statistical methods can describe nonlinear human movements (e.g., the position of the head relative to the trunk during walking or running) only through their linear components, so the resulting description is not fully physically correct.
The core objective is therefore to evaluate how machine learning methods can be used to describe complex and nonlinear movements. The aim is to investigate whether RNNs describe the movements of an object in a physically correct way and how existing methods can be supported or replaced.

This project addresses three key topics:

I. A basic implementation investigates how and why methods of machine learning can be used to determine models of human movement.
In 2018, a deeper understanding of the initial situation and problem definition was first established. With the help of different basic implementations (different motion models) it was investigated (1) how different movements (e.g., humans - walk, run, slalom; vehicles - meander, zig-zag) affect measurement inaccuracies of different sensor families, (2) how measurement inaccuracies of different sensor families (e.g., visible orientation errors, audible noise, and deliberate artificial errors) affect human motion, and (3) how different filter methods for error correction (that balance accuracy and latency) affect both motion and sensing. In addition, it was shown (4) that measurement inaccuracies (due to the use of current Bayesian filtering techniques) correlate nonlinearly with human posture (e.g., gait apparatus) and that their effect on health (simulator sickness) can be predicted with machine learning.
Machine learning and deep learning methods for motion detection (human - head, body, upper and lower extremities; vehicle - single- and biaxial) and motion reconstruction (5) based on inertial, camera, and radio sensors were studied, along with various methods for feature extraction (e.g., SVM, DT, k-NN, VAE, 2D-CNN, 3D-CNN, RNN, LSTM, M/GRU). These were interconnected into different hybrid filter models to enrich the extracted features with temporal and context-sensitive motion information, potentially yielding more accurate, more robust, and close to real-time motion models. In this way, (6) motion models for multi-axle vehicles (e.g., forklifts) based on inertial, radio, and camera data were learned that generalize to different environments or tracking surfaces (with varying size, shape, and sensory structure, e.g., magnetic field, multipath, texturing, and illumination). Further, (7) the effects of motion models with non-constant acceleration on radio signals were investigated to gain a deeper understanding. On the basis of these findings, an LSTM model was trained that predicts different movement speeds and motion forms of a single-axis robot (i.e., Segway) close to real-time and more accurately than conventional methods.
In the future, these models should also predict human movement (the motion model) at runtime, either fully standalone or integrated as support points in localization estimators (e.g., in Pedestrian Dead Reckoning (PDR) methods), and should be tested in a large study.
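To make concrete how an LSTM carries motion information from one time step to the next, the following sketch implements the standard LSTM cell equations for scalar inputs in plain Python. It is not the model trained in the project, whose architecture and weights are not given here: the function names and the values in `EXAMPLE_WEIGHTS` are hypothetical and untrained.

```python
import math


def sigmoid(x):
    """Logistic function used for the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell.

    x: current input (e.g., one sensor sample), h_prev: previous hidden
    state, c_prev: previous cell state, w: dict of scalar weights.
    """
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add new content
    h = o * math.tanh(c)     # expose a gated view of the memory
    return h, c


def run_lstm(sequence, w):
    """Feed a sequence through the cell and return the hidden state per step."""
    h, c = 0.0, 0.0
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs


# Hypothetical, untrained weights chosen only so that the example runs.
EXAMPLE_WEIGHTS = {
    "wi": 0.5, "ui": 0.1, "bi": 0.0,
    "wf": 0.3, "uf": 0.2, "bf": 1.0,
    "wo": 0.4, "uo": 0.1, "bo": 0.0,
    "wg": 0.8, "ug": 0.3, "bg": 0.0,
}

print(run_lstm([0.1, 0.5, -0.3], EXAMPLE_WEIGHTS))
```

The nonlinear gates decide per time step how much of the past motion to retain, so the cell does not rely on a fixed linear motion model. A trained network would use vector-valued states and learned weight matrices instead of the scalars assumed here.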

II. Based on this, we try to find ways to optimize the basic implementation in terms of robustness, latency, and reusability.
In 2018, the findings from I. (1-7) were used to stabilize so-called (1) relative Pedestrian Dead Reckoning (PDR) methods using motion classifiers. In the future, these should enable a generalization to any environment. The deeper understanding of radio signals (2) allowed long-term errors to be mapped in RNN-based motion models, improving position accuracy and stability and enabling prediction in near real-time. In the future, a generalization to any environment or tracking area should be enabled. The robustness of the movement models (3) was shown in first experiments using different real movement trajectories (unknown to the models) for one- and two-axle vehicles. Further, we investigated (4) how hybrid filter models (e.g., interconnections of feature extractors such as 2D/3D-CNNs and time-series trackers such as RNNs/LSTMs) provide more accurate, more stable, and filtered (outlier-corrected) results.
The explainability, interpretability, and robustness of the models examined here, and their transferability to human movement, should be investigated in the future.
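The idea of stabilizing relative PDR with a motion classifier can be sketched as follows. This is an illustration under stated assumptions, not the project's method: the classifier output is taken as given, and the names `STEP_LENGTH_M` and `pdr_track` as well as the step lengths per motion class are hypothetical.

```python
import math

# Hypothetical step lengths in metres per motion class (not project values).
STEP_LENGTH_M = {"stand": 0.0, "walk": 0.7, "run": 1.2}


def pdr_track(start, steps):
    """Integrate a 2-D path from detected steps.

    start: (x, y) start position in metres.
    steps: iterable of (motion_class, heading_rad) pairs, one per detected
           step; motion_class is assumed to come from a motion classifier.
    Returns the list of positions, including the start position.
    """
    x, y = start
    path = [(x, y)]
    for motion_class, heading in steps:
        length = STEP_LENGTH_M[motion_class]  # class-dependent step length
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        path.append((x, y))
    return path


# Walk one step along each side of a square, then run two steps east.
square = [("walk", 0.0), ("walk", math.pi / 2),
          ("walk", math.pi), ("walk", 3 * math.pi / 2)]
print(pdr_track((0.0, 0.0), square)[-1])
print(pdr_track((0.0, 0.0), [("run", 0.0), ("run", 0.0)])[-1])
```

Because each position is the previous position plus one step, errors in step length or heading accumulate over time. Selecting the step length from the classified motion is one simple way in which a classifier can limit this drift.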

III. Finally, a demonstration of feasibility shall be tested.
In 2018, the world's largest virtual dinosaur museum was opened as part of a large-scale social science study. The study showed that (1) a pre-selected (application-optimized) model of human movement maps and predicts human motion robustly and accurately (meaning no significant impact on simulator sickness). In the future, this should serve as a baseline for comparison tests of other models (that are human-centered and generalize to different environments).
