Every time a player makes a move, compare it to the AI's choices.
For sigmoid output (softmax output needs renormalization first), set the target ELO for that move to the AI's output value for the player's chosen move * 800.
Average target ELO = sum of target ELO for every move / (number of moves + 1)
The +1 in the denominator penalizes abandoning games: the fewer moves a player makes, the more it drags the average down.
At the end of the game, new ELO = current ELO * .9 + average target ELO * .1
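The steps above can be sketched in Python as follows. This is only a minimal illustration, not a reference implementation: the function and constant names are made up here, and it assumes the AI's sigmoid output for each move the player chose is already available as a number between 0 and 1.

```python
from typing import Sequence

# Constants taken from the rules above; the names are hypothetical.
ELO_SCALE = 800.0   # AI output value * 800 gives the target ELO of a move
KEEP_WEIGHT = 0.9   # share of the current ELO that is kept
GAME_WEIGHT = 0.1   # share taken from this game's average target ELO


def average_target_elo(move_scores: Sequence[float]) -> float:
    """Average target ELO for one game.

    move_scores holds the AI's sigmoid output (assumed to be in [0, 1])
    for each move the player actually chose.
    """
    targets = [score * ELO_SCALE for score in move_scores]
    # Divide by (number of moves + 1) to penalize abandoned games.
    return sum(targets) / (len(targets) + 1)


def updated_elo(current_elo: float, move_scores: Sequence[float]) -> float:
    """New ELO at the end of the game."""
    return current_elo * KEEP_WEIGHT + average_target_elo(move_scores) * GAME_WEIGHT


if __name__ == "__main__":
    # A player at 0 ELO whose 79 moves all score 1.0 with the AI.
    print(updated_elo(0.0, [1.0] * 79))
    # A player at 500 ELO who leaves before making any move.
    print(updated_elo(500.0, []))
```

In this sketch a game with no moves has an average target ELO of 0, so a player who leaves immediately keeps only 90% of their current ELO.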
A beginner starting from 0 gains about 79 ELO from one game if the AI outputs for all of their moves are close to 1, but that's highly unlikely.
A novice player ruining the game does not punish the other players, as long as those players made the right moves.
Maximum ELO for any player is 799, because the average target ELO always stays below 800. You can't get thousands of ELO just by playing continuously.
It is harder to game the system to boost your ELO if you have no access to the AI's output.
It is not an accurate representation of skill, but it represents skill better than any other mechanism seen so far.
Having an AI player may also help in tuning other potential imbalances of the game.
The same ELO system can also be used for other games.