Guy Torbet

Detecting Cheaters in Online Chess


TODOs

Why?

Online chess platforms have experienced significant growth, paralleled by an increase in cheating incidents that undermine the integrity of the game.

Existing detection systems often struggle to keep pace with the evolving methods of cheating, which range from using chess engines, consulting databases, or receiving external help during play.

There is a surprising lack of research detailing which features and methods are best suitable to detect use of computers in online chess.

Save losses/accs for each model in NPZ and model against Stanford

chess.com monitors if a user is changing tabs and making better moves.

Marking

The project is assessed on the basis of a written report which should typically contain:

Objective

The primary objective of the dissertation is to develop a model for detecting and classifying cheating in online chess. This involves:

  1. Analysing various cheating methods in online chess.
  2. Designing algorithms capable of detecting subtle and complex forms of cheating.
  3. Evaluating the effectiveness of these algorithms under different game conditions.

Research Questions

  1. What are the most prevalent methods of cheating in online chess as of the latest data?
  2. How can machine learning models be tailored to effectively recognise patterns indicative of cheating?
  3. What features or player behaviours are most indicative of cheating and how can they be quantitatively measured?

Methodology

The project will adopt a mixed-methods approach:

Expected Outcomes

The research is expected to contribute to the development of more sophisticated and fair cheat detection systems for online chess platforms. By improving cheat detection algorithms, we can enhance the credibility and competitive fairness of online chess.

Key Features

Questions

Datasets

Preprocessing

Board Representation

As per the original paper, the data will be formed into a nx8x8x6 tensor, where n is the number of moves in the game and the 6 channels represent each piece type, with 1 for white and -1 for black. Can also look into how Alpha-Zero represents the board.

Model Architecture

Binary cross-entropy loss

Benchmark with linear.

I plan to use convolutional neural networks as the 8x8 chess board lends itself nicely to this architecture.

If a chess board is an image, a sequence of moves is a video.

LSTM over a sequence of moves might be good, but…

Due to the massive number of possibilities a chess game has, machine learning models cannot use the raw moves of a game as its features. Otherwise, every game is like a new game, and there are no generalisable patterns to learn from.

Conv3d 3x3x3 convs Transformer With transformer we can pad and mask the input, so we can utilise all games and the extra empty board states won’t affect training as much.

Chess.com Method

Test Results

BCE Loss (20 epochs)

bce_10000

1000500010000
Dense171.0576.15
Dense372.976.1
Dense674.3577.18
Conv171.0574.15
Conv372.4575.15
Conv673.276.3
ConvLSTM74.0577.72
Conv3D75.2
ConvLSTMExtra84.987.98
Shit Transformer77.03
TranformerExtra83.38

Cross Entropy Loss (20 epochs)

ce_10000

100010000
Dense177.58
Dense378.95
Dense677.48
Conv176.85
Conv377.42
Conv677.42
ConvLSTM66.7578.03
Conv3D
ConvLSTMExtra89.18
ConvLSTMExtra (bidirectional)91.15
ConvLSTMExtra288.55
ConvLSTMExtra2 (bidirectional)87.8
Swin3D78.03
ViT3D53.25

Stanford Results

10000
Dense176.8
Dense376.8
Dense676.9
Conv177.5
Conv377.8
Conv678.4

Fun Ideas

Stanford Paper (75% Accuracy)

GitHub Repo (80% Accuracy)

Random Quotes

Cheating is essentially indistinguishable from normal play if it’s done cleverly enough at an elite level. This is one of the real challenges, because identifying games with cheating where the cheating consists of only 1 or 2 moves, might be impossible.

Let’s say we are playing a coin flip game. If the coin lands heads, I pay you $1; if tails, you pay me $1. We do this 100 times. If it is a fair coin, this game is statistically a draw. But let’s say I am cheating. The coin is fair, except on some of the flips, I use a radio controlled laser thingie to control how the coin will land. As a result, in our 100 flip game, we get 55 tails and I win $5 from you. How would you go about statistically detecting this cheating? It comes down to old fashioned hypothesis testing (H0 = the coin is fair and I’m not influencing it, compute the standard deviation yada yada). Basically you can catch me if and only if I do it so much that my win rate is unlikely to occur naturally. How can you tell which specific flips I’m controlling with my device, even if I do it quite often? You really can’t.

To detect such “selective cheating,” Chess.com utilizes sophisticated statistical methods that dig deep into the probability that any individual player could have achieved their results based on past performance. Our detection system requires robust methodologies beyond simply looking at best moves, player rating, and centipawn loss.
To effectively identify the vast majority of cheating, Chess.com computes an aggregate Strength Score. Strength Score is a measurement of the similarity between the moves made by the player, and the moves suggested as “strongest moves” by the chess engine. This Strength Score can show when a player is performing at a level above their actual chess strength, and on its own, our Strength Score is a helpful tool in successfully identifying cheating at nearly every level of play. Any player can have strong games of chess, but the Strength Score can tell us if continued strong play is legitimate or beyond the realm of statistical probability when compared to their overall skill level. Rating plays no part in Chess.com’s Strength Score, as players can significantly over-perform or underperform their rating. However, in our experience, human players generally cannot legitimately stray too far from their established Strength Score for long.

Cheater-Detection Agent

Creating a chess-playing agent (AI) that doesn’t just play to win, but also actively tries to identify if the opponent is a human or using computer assistance. The agent would do this by guiding the game into positions that make it easier to tell whether moves are “human-like” or “computer-like.”

The chess-playing agent would be designed to steer the game towards board positions that are complex, requiring deeper calculation. In such positions, humans might struggle or make mistakes, while computer-assisted players would more consistently find the optimal moves.

Can use reinforcement learning to maximise the chances of creating positions that reveal cheating behaviour. Rewards: Leading opponent into complex positions, determining how human or computer-like the opponents moves are in these positions. Can use already trained models for this.

Identify critical positions: analyse past chess games, identifying types of positions where humans tend to make mistakes or play sub-optimally, but computers handle well - can be used as target states.

References