Sussex-Huawei Locomotion Challenge
The goal of this machine learning/data science challenge is to recognize 8 modes of locomotion and transportation (activities) from the inertial sensor data of a smartphone. The dataset used for this challenge comprises 271 hours of training data and 95 hours of test data.
Participants will develop an algorithm pipeline that processes the sensor data, creates models, and outputs the recognized activities. The three best teams will also receive prizes!
Note: Prizes may increase, subject to additional sponsorship.
- Registration via email: as soon as possible, but not later than 20.06.2018
- Challenge duration: 01.06.2018 – 15.07.2018
- Submission deadline: 15.07.2018
- HASCA-SHL paper submission: 31.07.2018
- HASCA-SHL camera ready submission: 18.08.2018
- HASCA-SHL Workshop presentation at UbiComp in Singapore: 12.10.2018
- Release of the ground-truth of the test data: 15.10.2018
Each team should send a registration email to firstname.lastname@example.org as soon as possible, but not later than 20.06.2018, stating:
- The name of the team
- The names of the participants in the team
- The organization/company (individuals are also encouraged)
- The contact person and his/her email address
To be part of the final ranking, participants will be required to submit a detailed paper to the HASCA workshop. The paper should contain a technical description of the processing pipeline, the algorithms, and the results achieved during the development/training phase. The paper submission deadline is 31.07.2018. Submissions must follow the HASCA format (up to 10 pages).
Submission of predictions on the test dataset
The participants should submit a plain text predictions file for the test dataset, containing one predicted activity label for each sample of the test sensor data. The file should have the same structure as the label file of the training dataset, i.e., a matrix of 5698 lines x 6000 columns with one entry per sample in the test dataset.
The predictions file should be named “teamName_predictions.txt”, with teamName replaced by the name of the team.
An example of submission is given here.
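A minimal Python/NumPy sketch of producing such a file follows. It assumes that the label files contain whitespace-separated integers with one frame per line, which should be checked against Label.txt and the example submission; the helper names `expand_window_predictions` and `write_predictions` are hypothetical. Because the file is sample-wise, a model that predicts one label per shorter window must first repeat each window label over the samples it covers.

```python
import numpy as np


def expand_window_predictions(window_preds, n_samples=6000):
    """Repeat one label per window into one label per sample.

    window_preds: array of shape (n_frames, n_windows), where the windows are
    non-overlapping, equally sized and cover each frame in order.
    """
    window_preds = np.asarray(window_preds)
    _, n_windows = window_preds.shape
    if n_samples % n_windows != 0:
        raise ValueError("n_windows must divide n_samples")
    # Each window label is repeated for every sample of its window.
    return np.repeat(window_preds, n_samples // n_windows, axis=1)


def write_predictions(path, preds, n_frames=5698, n_samples=6000):
    """Write a sample-wise prediction matrix as space-separated integers."""
    preds = np.asarray(preds)
    if preds.shape != (n_frames, n_samples):
        raise ValueError(f"expected shape {(n_frames, n_samples)}, got {preds.shape}")
    if not np.isin(preds, np.arange(1, 9)).all():
        raise ValueError("all predictions must be class labels 1..8")
    # One line per test frame, one integer label per sample.
    np.savetxt(path, preds.astype(int), fmt="%d", delimiter=" ")
```

For example, `write_predictions("teamName_predictions.txt", expand_window_predictions(window_preds))`, where the hypothetical `window_preds` has 5698 rows, writes the full submission matrix after checking its shape and label range.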
The participants’ predictions should be submitted by email to email@example.com; the email should contain a link to the predictions file hosted on a file-sharing service such as Dropbox or Google Drive. Participants who cannot provide a link via a file-sharing service should contact the organizers at firstname.lastname@example.org, who will provide an alternative way to send the data.
To be part of the final ranking, participants will be required to publish a detailed paper in the proceedings of the HASCA workshop. The date for the paper submission is 31.07.2018.
All papers must use the ACM SIGCHI Extended Abstracts format (landscape). Submissions do not need to be anonymous.
Submission is electronic, via the Precision Conference submission system. The submission site is open at https://new.precisionconference.com/user/login (select SIGCHI / UbiComp 2018 / UbiComp 2018 Workshop – HASCA SHL and press the Go button).
Up to two submissions are allowed per team, provided that the team writes two papers for HASCA and the papers are substantially different (>70%).
Data and Activities
The data was recorded with a Huawei Mate 9 smartphone by a single participant over a period of 4 months. The participant performed the activities daily (approximately 5-8 hours per day) while the phone logged the sensor data; the phone was carried in the front right pocket, with no fixed orientation.
The following sensor data can be used to recognize the activities: accelerometer, gyroscope, magnetometer, linear acceleration, gravity, orientation (quaternions), and ambient pressure.
The following 8 activities have to be recognized: Car, Bus, Train, Subway, Walk, Run, Bike, and Still.
Every data sample is labeled with one of these 8 activities.
Dataset and Format
The data is divided into two parts: train and test. The training data contains the raw sensor data and the corresponding activity labels (class labels). The test data contains only the raw sensor data; its labels are withheld for evaluation and scoring. Participants are expected to use the training data to build an algorithm pipeline and a model that recognize the activities from the sensor data.
We use the data from the pocket phone (Hips position) of User 1, which covers 82 days: 62 days for training and 20 days for testing.
For both the training and the test dataset, the frames were generated by segmenting the whole data with a non-overlapping sliding window of 1-minute length. After segmentation, the order of the frames was randomly permuted, so consecutive frames in the files have no temporal dependency. This is intended to force participants to use a window length shorter than 1 minute. However, for reference, the original order of the frames in the training dataset is also provided (train_order.txt). This simple .txt file contains the original position of each training frame and can be used to sort the training frames chronologically.
If this order is added as an extra column to a data file and the frames (lines) are sorted by that column, the result is the training data in its original temporal order.
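One way to follow the intended use of windows shorter than 1 minute is to split every 6000-sample frame into non-overlapping sub-windows. The sketch below assumes a 5-second window (500 samples at 100 Hz), an arbitrary illustrative choice, and assigns each window the majority label of its samples; `split_frames` is a hypothetical helper, not part of the challenge material.

```python
import numpy as np


def split_frames(signal, labels, win=500):
    """Split (n_frames, n_samples) arrays into non-overlapping windows.

    Returns the signal windows, shape (n_frames * n_samples // win, win), and
    one majority label per window. Ties go to the smallest class label,
    because np.argmax returns the first maximum.
    """
    signal = np.asarray(signal)
    labels = np.asarray(labels).astype(int)
    if signal.shape != labels.shape:
        raise ValueError("signal and labels must have the same shape")
    _, n_samples = signal.shape
    if n_samples % win != 0:
        raise ValueError("win must divide the frame length")
    # Row-major reshape keeps the windows of each frame consecutive and in
    # order, and no window crosses a frame boundary.
    x = signal.reshape(-1, win)
    y_samples = labels.reshape(-1, win)
    y = np.array([np.bincount(row).argmax() for row in y_samples])
    return x, y
```

Since windows never cross frame boundaries, the random permutation of frames does not mix samples from different times within one window.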
Train data (~5.5 GB)
The training dataset contains 21 plain text files corresponding to the sensor channels and the labels, plus the frame-order file:
- Accelerometer: train_acc.zip (Acc_x.txt, Acc_y.txt, Acc_z.txt),
- Gyroscope: train_gyr.zip (Gyr_x.txt, Gyr_y.txt, Gyr_z.txt),
- Magnetometer: train_mag.zip (Mag_x.txt, Mag_y.txt, Mag_z.txt),
- Linear acceleration: train_lacc.zip (LAcc_x.txt, LAcc_y.txt, LAcc_z.txt),
- Gravity: train_gra.zip (Gra_x.txt, Gra_y.txt, Gra_z.txt),
- Orientation: train_ori.zip (Ori_w.txt, Ori_x.txt, Ori_y.txt, Ori_z.txt),
- Pressure: train_pressure.zip (Pressure.txt),
- Label: train_label.zip (Label.txt),
- Order: train_order.zip (train_order.txt).
Each sensor data file contains a matrix of size 16310 lines x 6000 columns, corresponding to 16310 frames of 6000 samples each (1 minute at a sampling rate of 100 Hz). The label file (Label.txt) has the same size (16310 x 6000) and gives the transportation activity of each sample. The 8 numbers in the label file correspond to the 8 activity classes: 1 – Still; 2 – Walk; 3 – Run; 4 – Bike; 5 – Car; 6 – Bus; 7 – Train; 8 – Subway.
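The sketch below loads one channel directly from its zip archive and checks the documented matrix size. It assumes that the .txt files contain whitespace-separated numbers with one frame per line, and it searches the archive by file name because the folder layout inside the zip files is not specified here; `load_channel` and `LABEL_NAMES` are hypothetical names. A full 16310 x 6000 channel occupies roughly 0.8 GB as float64.

```python
import io
import zipfile

import numpy as np

# Class numbering as documented for Label.txt.
LABEL_NAMES = {1: "Still", 2: "Walk", 3: "Run", 4: "Bike",
               5: "Car", 6: "Bus", 7: "Train", 8: "Subway"}


def load_channel(zip_path, file_name, n_frames=16310, n_samples=6000):
    """Load one channel (e.g. "Acc_x.txt") from a zip archive as a 2-D array."""
    with zipfile.ZipFile(zip_path) as zf:
        # Match on the base name, wherever the file sits inside the archive.
        matches = [m for m in zf.namelist() if m.rsplit("/", 1)[-1] == file_name]
        if len(matches) != 1:
            raise FileNotFoundError(
                f"expected one {file_name} in {zip_path}, found {len(matches)}")
        with io.TextIOWrapper(zf.open(matches[0]), encoding="utf-8") as text:
            # ndmin=2 keeps a 2-D shape even for a single-line file.
            data = np.loadtxt(text, ndmin=2)
    if data.shape != (n_frames, n_samples):
        raise ValueError(f"expected {(n_frames, n_samples)}, got {data.shape}")
    return data
```

For example, `load_channel("train_acc.zip", "Acc_x.txt")` returns the x-axis accelerometer matrix; for the test files, pass `n_frames=5698`.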
The original order of the permuted frames is given by train_order.txt, which contains a vector of size 16310 x 1; each element gives the original position of the corresponding frame. For example, the first element is 1575, which means that the first frame in the files is the 1575th frame of the original dataset.
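Following that description, the training frames can be put back into chronological order with an argsort of the order vector. A minimal NumPy sketch, assuming train_order.txt holds one integer per line (the helper name `restore_order` is hypothetical):

```python
import numpy as np


def restore_order(frames, order):
    """Reorder shuffled frames chronologically.

    frames: array whose first axis indexes the frames (e.g. 16310 x 6000).
    order:  order[i] is the 1-based original position of frame i.
    """
    frames = np.asarray(frames)
    order = np.asarray(order).ravel().astype(int)
    if frames.shape[0] != order.size:
        raise ValueError("one order entry is needed per frame")
    if not np.array_equal(np.sort(order), np.arange(1, order.size + 1)):
        raise ValueError("order must be a permutation of 1..N")
    # argsort(order)[k] is the index of the frame whose original position is k + 1.
    return frames[np.argsort(order)]
```

For example, with `order = np.loadtxt("train_order.txt")`, applying `restore_order(channel, order)` to every sensor channel and to the label matrix keeps them aligned in the original temporal order.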
Test data (~1.9 GB)
The test dataset has the same structure as the training dataset, except that each file contains 5698 lines x 6000 columns, corresponding to 5698 frames of 6000 samples each. In addition, the test dataset does not contain the label file; the test labels are used to evaluate the predictions.
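As a quick consistency check, the frame counts relate to the sampling rate and to the durations quoted at the top of this page as follows:

```latex
\begin{aligned}
\text{samples per frame} &= 60\,\text{s} \times 100\,\text{Hz} = 6000 \\
\text{training duration} &= 16310 \times 1\,\text{min} = 16310\,\text{min} \approx 271.8\,\text{h} \\
\text{test duration} &= 5698 \times 1\,\text{min} = 5698\,\text{min} \approx 95.0\,\text{h}
\end{aligned}
```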
Ground truth of the test data (released on 15.10.2018)
Evaluation
The evaluation metric is the F1-score averaged over all activities (i.e., the macro-averaged F1-score). The F1-score for one class is defined as F1 = 2*(recall*precision) / (recall+precision), where precision and recall are computed for that class.
A Matlab example of evaluation script is given here.
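For local validation on the training data, the metric can be sketched in Python as below. This is not the official Matlab script: it assumes sample-wise labels, averages the per-class F1-scores over the 8 classes, and sets a class's precision, recall, or F1 to 0 when the corresponding denominator is 0, a convention the official script may handle differently. The function name `macro_f1` is hypothetical.

```python
import numpy as np


def macro_f1(y_true, y_pred, classes=range(1, 9)):
    """Return the mean per-class F1-score and the list of per-class scores."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must contain the same number of samples")
    scores = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))  # correctly predicted as c
        fp = np.sum((y_pred == c) & (y_true != c))  # wrongly predicted as c
        fn = np.sum((y_pred != c) & (y_true == c))  # c samples that were missed
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        if recall + precision > 0:
            f1 = 2 * (recall * precision) / (recall + precision)
        else:
            f1 = 0.0
        scores.append(float(f1))
    return float(np.mean(scores)), scores
```

Because every class contributes equally to the average, a rare class that is poorly recognized lowers the score as much as a frequent one.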
The final evaluation will be performed at the end of the competition, when the final results and rankings will be revealed.
The main rules are summarized below; the detailed rules are given in a separate document.
- You must not work for or collaborate with the SHL project (http://www.shl-dataset.org/).
- If you submit an entry but are not qualified to enter the contest, the entry is voluntary. The organizers reserve the right to evaluate it for scientific purposes, but under no circumstances will such an entry qualify for sponsored prizes.
- Registration (see above): as soon as possible but not later than 20.06.2018.
- Challenge: Participants will submit prediction results on the test data.
- Workshop paper: To be part of the final ranking, participants will be required to publish a detailed paper in the proceedings of the HASCA workshop (http://hasca2018.hasc.jp/). The paper submission deadline is 31.07.2018.
- Submission: The participants’ predictions should be submitted by email to email@example.com, with a link to the predictions file hosted on a file-sharing service such as Dropbox or Google Drive. Participants who cannot provide such a link should contact the organizers at firstname.lastname@example.org, who will provide an alternative way to send the data.
- Up to two submissions are allowed per team, provided that the team writes two papers for HASCA and the papers are substantially different (>70%).
The final results and ranking of the teams will be announced at the HASCA 2018 Workshop at the UbiComp 2018 conference in Singapore on 12 October 2018.
All inquiries should be directed to email@example.com.
Organizers
- Dr. Hristijan Gjoreski, University of Sussex (UK) & Ss. Cyril and Methodius University (MK)
- Dr. Lin Wang, University of Sussex (UK)
- Dr. Daniel Roggen, University of Sussex (UK)
- Dr. Kazuya Murao, Ritsumeikan University (JP)
- Dr. Tsuyoshi Okita, Kyushu Institute of Technology (JP)
Final results
1. JSI-Deep: 93.86%
2. JSI-Classic: 92.41%
3. Tesaguri: 88.83%
4. S304: 87.46%
5. Confusin Matrix: 87.45%