Anomaly Detection in Proctoring Videos

Any behaviour that deviates from what is expected can be considered an anomaly. In proctored videos, an anomaly is flagged when a student’s behaviour is suspicious and could indicate a high chance of cheating in the examination. To detect anomalous behaviour in online exams and move towards automated proctoring, we design an autoencoder model based on human pose features that indicates how anomalous a video clip is. The possible actions of students taking an exam are classified as anomalous or non-anomalous.

Some examples of anomalous actions are using a phone, standing up and walking away from the laptop screen, or another person entering the camera’s view, as shown:

Non-anomalous (normal) behaviour expected from students taking an exam includes a few hand or neck movements, drinking water, reading or writing, or checking their watch for the time, as shown:

Proposed Pipeline

Collection of video datasets using webcam recordings, proctoring videos collected by CodeTantra and IIT Bombay, and the NTU RGB Dataset
Anonymization to ensure the confidentiality of the students
Frame capture at 30 fps
Video summarization to identify a small number of keyframes or video segments that retain as much of the original video's information as possible. We expect these to be the parts of the video with a high degree of anomaly. We consider both synchronous and asynchronous summarization so that the pipeline works in both online and offline settings (Video Maximal Marginal Relevance is used)
Key point detection in 4 steps: 1) pre-processing (background removal, bounding box creation), 2) feature extraction (Mobilenetv1_100 is used), 3) inference (confidence maps for every joint), and 4) post-processing (gives pose scores based on likelihood)
Human pose estimation to predict the positions of the body parts of the person or people in the video (Image posenet is used)
A vanilla autoencoder model with LSTM is used as a feature detector on the posenet features. The higher the reconstruction error, the more anomalous the frame
Evaluation is done by analyzing the degree of anomaly in the top 5% of frames with the highest reconstruction loss
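
The last two steps of the pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: each frame is assumed to be a flat list of pose keypoint coordinates, the autoencoder's reconstruction of each frame is assumed to be already available, and the reconstruction error is assumed to be the mean squared error. The function names frame_errors and top_fraction are hypothetical.

```python
import math

def frame_errors(originals, reconstructions):
    """Mean squared reconstruction error per frame (assumed error metric)."""
    errors = []
    for orig, recon in zip(originals, reconstructions):
        squared = [(a - b) ** 2 for a, b in zip(orig, recon)]
        errors.append(sum(squared) / len(squared))
    return errors

def top_fraction(errors, fraction=0.05):
    """Indices of the frames with the highest error, most anomalous first."""
    # Always keep at least one frame, even for very short clips.
    k = max(1, math.ceil(len(errors) * fraction))
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return order[:k]

# Toy data (made up): 4 frames, 2 coordinates each.
originals = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
reconstructions = [[0.0, 0.0], [1.0, 1.0], [2.0, 4.0], [3.0, 3.5]]

errors = frame_errors(originals, reconstructions)
print(errors)                              # [0.0, 0.0, 2.0, 0.125]
print(top_fraction(errors, fraction=0.5))  # [2, 3]
```

With the default fraction of 0.05, only the frames in the top 5% by reconstruction error are returned for manual inspection.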

Results:

We obtained the best results with the autoencoder shown, trained on the multiple video datasets with the ReLU activation function. After normalization, the final training loss is of the order of 10^-6. Frames with the lowest reconstruction loss (of the order of 10^-3 to 10^-2) came from the non-anomalous videos, while frames with high reconstruction loss (of the order of 1 to 100) came from the anomalous ones. The posenet and the test results for the example images shown above are:

Figure: Classified as anomalous with reconstruction loss in the order of 1 to 100

Figure: Classified as non-anomalous with reconstruction loss in the order of 10^-3 to 10^-2

Conclusion and Future Work:

Autoencoder training and testing on the NTU-RGB Dataset, with n degrees of anomaly in the actions, needs to be optimized, and testing should be turned into an n-class classification problem. Another way to approach this would be to rank the frames in descending order of reconstruction error (rank 1 is the frame that is most difficult to reconstruct and is therefore the most anomalous). We also need to consider the optical flow between frames so that we can comment on video segments instead of inferring only from individual frames.
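
The ranking idea could look roughly like the following sketch. It assumes the per-frame reconstruction errors have already been computed. The function name rank_frames and the error values are made up for illustration.

```python
def rank_frames(errors):
    """Return one rank per frame: rank 1 = highest reconstruction error."""
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    ranks = [0] * len(errors)
    for rank, frame_index in enumerate(order, start=1):
        ranks[frame_index] = rank
    return ranks

# Hypothetical per-frame errors, not measured results.
example_errors = [0.004, 12.5, 0.009, 87.0]
print(rank_frames(example_errors))  # [4, 2, 3, 1]
```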

Efficient anomaly detection and automated proctoring using posenet features needs to be implemented for different modes of exams. The synchronous setting needs to be explored further; Video-MMR could be a good starting point for this.
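
As a starting point, a greedy MMR-style keyframe selection might be sketched as below. This follows the general Maximal Marginal Relevance idea rather than the exact formulation used in the thesis. It assumes every frame is already represented by a feature vector, uses cosine similarity, takes a frame's relevance to be its mean similarity to the other unselected frames, and weights relevance against redundancy with an assumed parameter lam. All names here are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mmr_keyframes(features, k, lam=0.7):
    """Greedily pick up to k frame indices that are representative yet non-redundant."""
    selected = []
    candidates = list(range(len(features)))
    while candidates and len(selected) < k:
        best, best_score = None, -math.inf
        for i in candidates:
            others = [j for j in candidates if j != i]
            # Relevance: mean similarity to the frames not yet selected.
            relevance = (
                sum(cosine(features[i], features[j]) for j in others) / len(others)
                if others else 0.0
            )
            # Redundancy: highest similarity to any frame already selected.
            redundancy = max(
                (cosine(features[i], features[j]) for j in selected), default=0.0
            )
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy features (made up): frames 0 and 1 are identical, frame 2 differs.
toy_features = [[1, 0], [1, 0], [0, 1]]
print(mmr_keyframes(toy_features, k=2))  # [0, 2]
```

In the toy example the second pick skips frame 1 because it duplicates the already selected frame 0. In a synchronous setting the same scoring would have to be applied to a sliding buffer of recent frames, which this sketch does not attempt.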

Use of AI and Video Analytics for Anomaly Detection in Proctoring Videos

Bachelor's Thesis I
