Application of Deep Learning Methods to Process Natural Phenomena

Author: Thomas Altstidl
Title: Application of Deep Learning Methods to Process Natural Phenomena
Type: bachelor thesis
Supervisors: Feigl, T.; Mutschler, C.; Philippsen, M.
Status: completed on February 15, 2019

Geysers have long captured the minds of casual visitors and geoscience researchers alike. Ever since the first geysers were observed, eruption times have been dutifully recorded in logbooks. This collected data serves as the basis for studies on individual geyser behavior as well as the interaction among different geysers [7]. It can also be used to study the influence of other natural phenomena, such as earthquakes and the weather, on geyser patterns [7,8]. In fact, research on earthquake early warning systems often draws on knowledge and findings from the study of geysers [6]. A comprehensive dataset of geyser eruptions can also aid the assessment of geyser models and simulations by providing a real-world benchmark for comparison.

Manual collection of geyser eruption times in logbooks is laborious and requires the constant presence of a human observer, which is why automated methods are preferable for long-term studies. One popular method to automatically collect eruption data places a temperature sensor in the runoff channel of a geyser. Evaluating the temperature peaks that appear in the recorded temperature trace during an eruption allows inference of the eruption time and, in some cases, also the eruption duration. However, this method has its limitations. Interpretation of the logged data requires prior knowledge of the time delay between the start of the eruption and the time at which the sensor registers it, which in some instances differs between eruptions. Due to the unstable ground of geothermal areas, suitable placement of the temperature sensors as well as retrieval of recorded data can be difficult and is often done infrequently and at irregular intervals. Errors in the collection process, for example due to displacement of the sensor by wild animals, are thus often detected too late and can lead to the loss of valuable data. Also, due to the mentioned limitations on sensor placement, eruption durations often cannot be determined from the temperature trace for geysers with shorter durations, such as Old Faithful, a famous geyser in Yellowstone National Park.
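The temperature-based method described above can be illustrated with a minimal sketch. It assumes a simple fixed threshold and a constant, known delay between eruption start and sensor response; the function name, the threshold, the delay, and the synthetic trace are all hypothetical and not taken from any real logger setup.

```python
from typing import List, Tuple


def detect_eruptions(
    trace: List[Tuple[float, float]],  # (time in seconds, temperature in deg C)
    threshold_c: float,                # assumed fixed threshold separating peaks from baseline
    delay_s: float = 0.0,              # assumed constant delay until the sensor responds
) -> List[Tuple[float, float]]:
    """Return (estimated eruption start, peak duration) pairs, both in seconds."""
    events = []
    peak_start = None
    for t, temp in trace:
        if temp >= threshold_c and peak_start is None:
            peak_start = t  # trace rises above the threshold
        elif temp < threshold_c and peak_start is not None:
            # trace falls back below the threshold: close the peak
            events.append((peak_start - delay_s, t - peak_start))
            peak_start = None
    # A peak that is still open at the end of the trace is dropped.
    return events


# Synthetic trace sampled once per minute (made-up values).
trace = [(0, 12.0), (60, 12.5), (120, 41.0), (180, 55.0),
         (240, 38.0), (300, 14.0), (360, 12.0)]
events = detect_eruptions(trace, threshold_c=30.0, delay_s=45.0)
print(events)  # [(75.0, 180)]
```

The sketch also makes the stated limitation visible: the estimated start time is only as good as the assumed delay, and the peak width reflects the runoff rather than the eruption itself.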

Another sensor type that could be used to automatically determine geyser eruption times is the camera. The National Park Service provides a live stream of Old Faithful geyser [1] and, more generally, of the Upper Geyser Basin, which contains the world’s largest concentration of geysers, making it a popular subject for related research. Compared to temperature sensors, this alternative method uses visual information to infer eruption times and thus more closely matches the way human observers time eruptions, eliminating the difficulty in data interpretation. It also allows automated eruption timing and data retrieval from outside of hazardous areas. Other advantages include the possibility to more easily review suspicious data after its collection and to have (remote or local) human observers supervise the real-time analysis of geyser eruptions. This would reduce the likelihood of errors even further. Unfortunately, this approach also has its limitations. At night and on rainy or foggy days, the view of the geyser can be obscured, and it is then impossible to accurately detect eruptions.

This thesis builds the foundations required to extract geyser eruption times and durations from videos. Due to the absence of more traditional methods capable of coping with the variability in position, scale and lighting apparent in the webcam stream, as well as the recent successes of deep convolutional neural networks in visual image analysis, deep learning techniques are a prime candidate. In order to apply these, a labeled dataset of the Old Faithful webcam needs to be prepared based on human interpretation of the video data and on eruption times collected in the field, which are available on GeyserTimes. Archived footage of this webcam is readily available on the YouTube channel of the Geyser Observation and Study Association (GOSA) [2]. This work primarily focuses on day-time observations with good weather conditions. It is also limited to Old Faithful geyser eruptions, although the approach is expected to generalize to other geysers.
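As a rough illustration of the labeling step, the following sketch assigns an initial label to each frame timestamp from a list of logged eruptions. The data layout (start and duration in seconds relative to the video start) and the function name are assumptions made for this example; the third class, indiscernible, is left out here because it requires human review of the footage.

```python
from bisect import bisect_right
from typing import List, Tuple

IN_ERUPTION = "in eruption"
NOT_IN_ERUPTION = "not in eruption"


def label_frames(
    frame_times: List[float],             # frame timestamps in seconds
    eruptions: List[Tuple[float, float]], # sorted, non-overlapping (start_s, duration_s)
) -> List[str]:
    """Label each frame by whether it falls inside a logged eruption interval."""
    starts = [start for start, _ in eruptions]
    labels = []
    for t in frame_times:
        i = bisect_right(starts, t) - 1   # last eruption starting at or before t
        inside = i >= 0 and t < eruptions[i][0] + eruptions[i][1]
        labels.append(IN_ERUPTION if inside else NOT_IN_ERUPTION)
    return labels


# Made-up log entry: one eruption starting at 100 s and lasting 120 s.
eruptions = [(100.0, 120.0)]
frame_times = [0.0, 100.0, 219.0, 220.0, 300.0]
labels = label_frames(frame_times, eruptions)
print(labels)
```

Labels produced this way would only be a starting point; frames near interval boundaries and frames with poor visibility still need to be checked against the video.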

Goal and research focus: The overall goal is an end-to-end neural network that can accurately classify a sequence of frames taken from a video into connected scenes, considering both the short-term and long-term connections between the images. However, most state-of-the-art image classification algorithms only classify a single frame. Since humans observe the progression of events over time, a higher accuracy is expected when the temporal context of the images within the video is also taken into consideration. As such, the research focus of this thesis is the development and evaluation of a neural network architecture that makes use of these additional features in the time domain. The preparation and usage of the Old Faithful eruption dataset supports these goals, as it provides a good benchmark with a limited number of categories along with well-understood short-term (preplay versus eruption) and long-term (an eruption around every 94 minutes) dependencies between frames. As a reference and benchmark, this thesis develops a frame-by-frame approach that uses neural networks to process and classify natural phenomena such as eruptions. The final implementation yields a classification into three classes: in eruption, not in eruption and indiscernible (e.g. for foggy or dark frames). This thesis implements and studies three different neural network architectures and compares their accuracy and computational performance with respect to the natural phenomena event classification problem at hand:
1. A state-of-the-art convolutional neural network (CNN) is investigated. It is expected that an accurate classification depends on the temporal domain, since for example Old Faithful preplay is easily mistaken for an eruption if no additional context is available.
2. Therefore, this thesis also explores additional neural network architectures such as recurrent neural networks (RNN) that consider the temporal structure. The implementation will build upon the work presented in [3] and [4]. The thesis will demonstrate that the combination of CNN and RNN can model such time-based dependencies.
3. Another architecture that is investigated uses a three-dimensional CNN for adjacent frames as presented in [5].

The experimental results of these three architectures, including classification accuracy and computational performance, should be compared with respect to their ability to capture dependencies in the time domain, and possible improvements should be outlined. The final neural network architecture used for the eruption extraction should then be derived from this comparative analysis.
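One way to picture the difference between the three architectures is by the input each one sees: the 2D CNN receives a single frame, the CNN+RNN receives frames in temporal order, and the 3D CNN receives a short clip of adjacent frames around the frame to be classified. The sketch below only builds the frame indices of such clips. The clip radius and the handling of video boundaries (repeating the first or last frame) are assumptions for illustration, not choices documented for the thesis.

```python
from typing import List


def clip_indices(num_frames: int, radius: int) -> List[List[int]]:
    """For every frame, list the indices of its clip of 2 * radius + 1 adjacent frames.

    Indices that would fall outside the video are clamped to the first or
    last frame (an assumed padding strategy).
    """
    clips = []
    for center in range(num_frames):
        clip = [
            min(max(i, 0), num_frames - 1)  # clamp to the valid index range
            for i in range(center - radius, center + radius + 1)
        ]
        clips.append(clip)
    return clips


clips = clip_indices(num_frames=5, radius=1)
print(clips)  # [[0, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 4]]
```

With `radius=0` every clip degenerates to a single frame, which corresponds to the frame-by-frame reference setting.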

Finally, the thesis should evaluate the end-to-end system by providing a tool that takes a webcam video sequence as input and provides the extracted eruption times and durations for Old Faithful as output. These results should be compared to actual observations collected in the field. Potential sources of error should be broken down and analyzed as a basis for future research.
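The last step of such a tool, turning per-frame class labels into eruption times and durations and comparing them with field observations, could look roughly like the following sketch. It assumes a constant frame rate and takes the labels as given; the function names, the minimum-duration filter, and the matching tolerance are hypothetical.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def extract_eruptions(
    labels: List[str], fps: float, min_duration_s: float = 0.0
) -> List[Tuple[float, float]]:
    """Group consecutive 'in eruption' frames into (start_s, duration_s) events."""
    events = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label == "in eruption":
            duration_s = length / fps
            if duration_s >= min_duration_s:  # drop very short runs (assumed filter)
                events.append((index / fps, duration_s))
        index += length
    return events


def match_to_observations(
    events: List[Tuple[float, float]],
    observed_starts: List[float],
    tolerance_s: float,
) -> List[Optional[float]]:
    """For each observed start, return the signed error of the nearest extracted
    start, or None if no extracted start lies within the tolerance."""
    errors: List[Optional[float]] = []
    for obs in observed_starts:
        nearest = min((start for start, _ in events),
                      key=lambda s: abs(s - obs), default=None)
        if nearest is not None and abs(nearest - obs) <= tolerance_s:
            errors.append(nearest - obs)
        else:
            errors.append(None)
    return errors


# Made-up per-frame labels at 2 frames per second.
labels = (["not in eruption"] * 4 + ["in eruption"] * 6
          + ["indiscernible"] * 2 + ["not in eruption"] * 3)
events = extract_eruptions(labels, fps=2.0)
errors = match_to_observations(events, observed_starts=[1.5, 60.0], tolerance_s=5.0)
print(events)  # [(2.0, 3.0)]
print(errors)  # [0.5, None]
```

Observed eruptions without a match (the `None` entries) are one concrete way to break down sources of error, for example eruptions hidden by indiscernible frames.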

Timetable (5 months, i.e. 20 weeks full-time):

  • [4 weeks] Orientation and scientific literature research;
  • [4 weeks] Data pre-processing: preparation and labeling of publicly available datasets;
  • [8 weeks] Implementation and evaluation of the three neural network architectures (1) CNN, (2) CNN+RNN, and (3) 3D-CNN and the extraction tool;
  • [4 weeks] Final paper.

Expected results and contributions:

  • Preparation and labeling of publicly available video datasets (i.e. live streaming webcam of Old Faithful);
  • Implementation of three different neural network architectures for classification of video frames into one of three classes (in eruption, not in eruption, indiscernible) and evaluation of their success rates on test datasets;
  • Frame-by-frame image classification using a 2D CNN as reference [9];
  • State-based frame classification using a combination of 2D CNN and RNN [3, 4];
  • Combination of adjacent frames (past and future) using a 3D CNN [5];
  • Extraction tool for previously unseen datasets that yields Old Faithful eruption times and durations.
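The success rate of a classifier on a test dataset can be summarized, for instance, by overall accuracy and per-class recall over the three classes. The following is a minimal sketch of that computation; the choice of metrics is an assumption, and the label lists are invented for illustration and are not results.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

CLASSES = ("in eruption", "not in eruption", "indiscernible")


def evaluate(
    true_labels: List[str], predicted_labels: List[str]
) -> Tuple[float, Dict[str, Optional[float]]]:
    """Return overall accuracy and per-class recall for the three classes."""
    pairs = Counter(zip(true_labels, predicted_labels))  # confusion counts
    accuracy = sum(pairs[(c, c)] for c in CLASSES) / len(true_labels)
    recall: Dict[str, Optional[float]] = {}
    for c in CLASSES:
        support = sum(pairs[(c, p)] for p in CLASSES)  # frames whose true class is c
        recall[c] = pairs[(c, c)] / support if support else None
    return accuracy, recall


# Invented example labels for six frames.
true_labels = ["in eruption", "in eruption", "not in eruption",
               "not in eruption", "not in eruption", "indiscernible"]
predicted_labels = ["in eruption", "not in eruption", "not in eruption",
                    "not in eruption", "in eruption", "indiscernible"]
accuracy, recall = evaluate(true_labels, predicted_labels)
print(round(accuracy, 3), recall["in eruption"])  # 0.667 0.5
```

Per-class recall is reported alongside accuracy because eruption frames are likely to be far rarer than non-eruption frames, so accuracy alone can look high even when many eruptions are missed.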

The resulting neural network pipeline provides a complementary tool for the collection of geyser eruption records, thus increasing the overall quality of the dataset. It also aids the deeper understanding of the influence of the time domain and the frame’s context for video frame classification results.

[1] Webcams - Yellowstone National Park (U.S. National Park Service), 2018.
[2] Geyser Observation And Study Association – YouTube, 2018.
[3] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description. IEEE Trans. on Pattern Analysis and Machine Intelligence, 39(4):677–691, 2017.
[4] M. Liang and X. Hu. Recurrent convolutional neural network for object recognition. In Proc. 2015 IEEE Intl. Conf. Computer Vision and Pattern Recognition (CVPR), pp. 3367–3375. Boston, MA, 2015.
[5] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning Spatiotemporal Features with 3D Convolutional Networks. In Proc. 2015 IEEE Intl. Conf. Computer Vision (ICCV), pp. 4489–4497. Washington, DC, 2015.
[6] H. Wakita, Y. Nakamura, and Y. Sano. Short-term and intermediate-term geochemical precursors. Pure and Applied Geophysics, 126(2):267–278, 1988.
[7] S. Rojstaczer, D. L. Galloway, S. E. Ingebritsen, and D. M. Rubin. Variability in geyser eruptive timing and its causes: Yellowstone National Park. Geophysical Research Letters, 30(18), 2003.
[8] S. Hurwitz, A. G. Hunt, and W. C. Evans. Temporal variations of geyser water chemistry in the Upper Geyser Basin, Yellowstone National Park, USA. Geochemistry, Geophysics, Geosystems, 13(12), 2012.
[9] F. Chollet. Xception: Deep Learning with Depthwise Separable Convolutions. In Proc. 2017 IEEE Intl. Conf. Computer Vision and Pattern Recognition (CVPR), pp. 1800–1807. Honolulu, HI, 2017.
