Plus petite compression du jeu d'échecs

39

Écrivez un algorithme ou un programme capable d’encoder et de décoder un échiquier. L'objectif est de faire la plus petite représentation possible d'un échiquier (une fois décodé) pour déterminer toutes les possibilités de déplacement pour un joueur de ce tour.

Le codage doit pouvoir montrer:

  • A qui c'est le tour.
  • Si le joueur peut chiner de chaque côté.
  • Si le joueur peut effectuer un en-passant, et si oui, lequel de ses pions?
  • Positions de toutes les pièces.

Remarque importante sur les castings: si les Blancs déplacent leur roi d’un tour, puis de le ramener d’un tour, il doit être clair qu’ils ne peuvent faire de château d’un côté ou de l’autre. Les mêmes iraient s'ils bougeaient leur tour gauche ou droite. Bien que la carte soit visuellement dans le même état qu’il ya deux tours, l’état du jeu a changé. Plus d'infos ici: http://en.wikipedia.org/wiki/Chess#Castling

Note importante sur en-passant: C'est aussi un coup sensible au tour par tour. Lisez les règles pour plus d'informations. http://en.wikipedia.org/wiki/Chess#En_passant

Déterminez les entrées et les sorties au besoin. Des accessoires majeurs à qui peut le compresser le plus!

Votre score est déterminé dans le pire des cas - taille maximale possible en bits. Assurez-vous de montrer comment vous avez calculé ce nombre et ce que vous avez comptabilisé. Tirez pour le pire des cas!

Seltz
la source
Qu'entendez-vous par "bitwise"?
Peter Taylor
Est-ce le plus petit code ou le plus compressé? Le plus compressé est plus intéressant.
Justin
Désolé de ne pas clarifier. Bitwise en ce sens que vous ne pouvez le compresser que si vous commencez à le représenter sous forme de bits, ce qui nécessitera une opération au niveau des bits. Mauvaise utilisation de ma part. Aussi le plus compressé, pas le plus petit code.
Seltzer
2
@ GeekWithALife Oui, il est possible d'avoir 18 reines sur le tableau en même temps. Suivez ce lien et cliquez sur le bouton de lecture pour un exemple.
ossifrage délicat
1
@AJMansfield, cela pourrait être utile pour les conseils avec environ 28 hommes, mais je calcule que 117 bits est suffisant pour les conseils avec les 32 hommes, ce qui représente environ 50 bits de moins que la cible. La complication est qu’une fois passé en dessous de 32 hommes, la promotion peut donner à un joueur plus d’évêques.
Peter Taylor

Réponses:

27

Min: 12 bits
Max:
Avg:

Avait et pensé hier soir que je pourrais éventuellement le rendre encore plus petit.

x   Colour to play next (0 -> Black, 1-> White)
1   Only King left?

00000 Position of White King (0 -> A1 ... 63 -> H8)
00000 Position of Black King

01 00000 11111  WK:A1, BK:H2 (Black to play)
11 00000 11111  WK:A1, BK:H2 (White to play)

Le résultat est une taille impressionnante de 12 bits !

Alors qu'en est-il K +1 autre type de pièce?

x
 0
   0
     000  +Pawn
     001  +Rook   
     010  +Knight
     011  +Bishop
     100  +Queen

Il 2 arrangement possible du sous-arbre.

   /\      /\
  +  K    K  +

Les deux résultent dans les mêmes tailles de bits pour toutes les pièces. Donc, cela ne fait aucune différence pour ce que nous utilisons, je vais choisir le premier.

x
 0
  0
   000
      1011001110000000000000000000000000000000000000000000000000000000000000
(+ 000) En-Passant (if >= 2 pawn & pawn in en-passant positions)
(+ 00 ) Castlings  (if >= 1 rook)
Min: 75 bit
Max: 109 bits

Donc sur King +2 autres types de pièces

x
 0
  1
   PRBNQ
   00011  +N +Q
   00101  +B +Q
   00110  +B +N
   01001  +R +Q
   01010  +R +N
   01100  +R +B
   10001  +P +Q
   10010  +P +N
   10100  +P +B
   11000  +P +R

Il y a 5 sous-arbres possibles (je vais utiliser 1 et 2 pour indiquer laquelle des pièces.)

   /\          /\       /\         /\          /\
  /  \        /  \     /  \       /  \        /  \
 K   /\      /\   2   /\   \     1   /\      /\   \
    1  2    K  1     K  2   1       K  2    1  2   K

Nous aurons donc besoin de 3 bits pour encoder le sous-arbre à utiliser.

x
 0
  1
   PRBNQ
         000  Sub Tree used

Min:= 11 = Header 
       6 = 2 * 3
       4 = 1 * 4
       4 = 1 * 4
      60 = 60    Empty
      --
      85 bits

Max:=  11 = Header
        4 =  2 * 4 Kings
       48 = 16 * 3 Pawns
       12 =  4 * 3 Rook
       42 = 42 * 1 Empty
        3 =  1 * 3 En-Passant
        2 =  1 * 2 Castlings
      ---
      122 bits

Toujours faire l'analyse pour plus de morceaux

+3 Autre

x
 0
  1
   PRBNQ
         0000  Sub Tree used (of 14 possible)

+4 Autre

x
 0
  1
   PRBNQ
         000000  Sub Tree used (of 42 possible)

+5 Autre

x
 0
  1
   PRBNQ
         0000000  Sub Tree used (of 132 possible)
 (+000)
 (+00)

Max: 208?


Est-il possible de coder tous ces sous-arbres sur 9 bits?

Si nous totalisons tous les sous-arbres possibles, nous obtenons 392 sous-arbres possibles.

 1  0
 2  2
 3  5
 4  14
 5  42
 6  132
    ---
    392  <= 2^9

Utilisation de Freq ID

Depuis 164603 fréquences de morceaux uniques .

Log2( 164603) = 17.3286110452
             ~ 18 bits

0
 0000 0000 0000 0000 00  Freq ID

(+000) (+00) Castling

Max: = 204 bits


rev 3

Min: 82 Max: 199 Moy.: 160

Enfin, nous avons effectué des analyses pour trouver la taille de bit maximale. Avec codage Huffman optimal pour chacune des fréquences de morceaux uniques .

               0   Player
              00  Castling
               0  En-Passant Possible
            ?000  En-Passant column (include if En-Passant Possible = 1
  0000 0000 0000  Tree Encoding ID
[Board Encoding]  Between 66 .. 180 bits 

Notez que c’est la plus mauvaise taille possible, celle des bits de la colonne En-Passant si le nombre de pions est supérieur à un. Quelles que soient la couleur et la position de ces pions, il est possible que certaines planches soient plus petites de 3 bits.

En outre, il n'y a que 144 tailles différentes (le pire des cas) pour la taille de la carte.


75 - 216 bits (v2) v1 La taille minimale est de 98 bits (12,25 octets), seuls les deux rois de la carte.

La taille maximale n'est que de 216 bits (27 octets).

  9 x Queens
  1 x King
  2 x Rooks
  2 x Knights
  2 x Bishops
on each side.

En moyenne, la taille sera d'environ 157 bits (19,625 octets).

Pièces

Pour encoder le tableau, j'utilise un schéma d'encodage d'arborescence binaire. Une case vide est la plus fréquente parmi toutes celles où entre 32 et 62 apparitions. Viennent ensuite les pions, puis les tours, les chevaliers, les évêques et les moins fréquents sont la reine et le roi.

0 - left node
1 - righ node

     /\
    e  \    e:= Empty Square
      B/\W  B:= Black ; W:= White
      /  \
     /    \
    /      \
   /\      /\
  p  \    p  \  p:= Pawn
     /\      /\
    /  \    /  \
   /\  /\  /\  /\
  r  b n \ r b n \  r:= Rook; b:= Bishop; n:= Knight
         /\      /\ 
        q  k    q  k  q:= Queen ; k:= King

La carte de départ peut être encodée en seulement 166 bits (20,75 octets)

  A     B     C      D      E     F     G     H
-----+-----+-----+------+------+-----+-----+------+
10100 10101 10110 101110 101111 10110 10101 10100 | 8 
  100   100   100    100    100   100   100   100 | 7
    0     0     0      0      0     0     0     0 | 6
    0     0     0      0      0     0     0     0 | 5
    0     0     0      0      0     0     0     0 | 4
    0     0     0      0      0     0     0     0 | 3
  110   110   110    110    110   110   110   110 | 2
11100 11101 11110 111110 111111 11110 11101 11100 | 1

Pour indiquer qui se déplace, il ne faut qu'un seul bit

0-> Black , 1-> White

Castling peut être codé en 4 bits.

 B  W
LR LR
00 00

J'ai donc utilisé 171 bits (21,375 octets)

En-Passe peut être codé en seulement 16 bits (2 octets)

Donc, au total, c'est 187 bits (23,375 octets).

Disposition

  bits    Encodes
 0 -  15  En-Passe
16 -  19  Castling
      20  Move 
21 -  23  Unused
24 -> ..  Board

Pas encore écrit de code.

Notez que 3 des bits non utilisés. Donc, le maximum est de 213 bits .


Améliorations possibles

1) Réduction de la forme de bloc d'en-tête de 24 à 8 bits (avec la suggestion de @Peter Taylor)

0 - 2 En-Passant
    3 Move
4 - 7 Castling
8 ... Board Pieces 

2) tête de longueur variable

Un petit en-tête fixe 4 bits

0 0 0 0
| | | |
| | | +-> En-Passant Block Present?
| | | 
| | +---> Pawns on board?
| |
| +-----> Castling still possible?
|                
+-------> Who's move? 0-Black 
                      1-White

Bloc suivant de bits supplémentaires (si le castling est toujours possible)

00 00
|| ||
|| |+-> White Castle Right
|| +--> White Castle Left
||
|+----> Black Castle Right
+-----> Black Castle Left

Bloc suivant de bits supplémentaires (si des pions sont présents)

000--> En-Passant column Position

Alors maintenant, j'ai un en-tête de longueur variable 4 - 11 bits


3) Utilisez un schéma de codage différent en fonction des pièces laissées sur le tableau.

En modifiant l’encodage de l’arbre en fonction des éléments présents sur le tableau et de leur fréquence.

Un encodage possible pour un état final (No Pawns)

        /\            
       e /\           
  Black /  \ White
       /    \
      /      \
     /        \       
    /\        /\
   /  \      /  \     
  /   /\    /   /\
 /\  / /\  /\  / /\   
r b n q k  r b n q k

Ce qui représente environ 4 bits par pièce.

Quels types de pièces sont présentes sur le plateau?

RBNQK Permutation
11111 (11111)

La permutation est de longueur variable 0-5 bits. S'il ne reste qu'un type de pièce, ne l'incluez pas.

Quelle permutation de ces pièces à utiliser pour l'arbre? Ceci est la factorielle du nombre de pièces dans l'exemple ci-dessus c'est 5 pièces, donc 120 permutations possibles qui peuvent être codées.

 #    !  bit 
 6  720  10  (If pawn included)
 5  120   6
 4   24   5
 3    6   3
 2    2   1  Don't include as of equal size.
 1    1   0  Don't include as its not needed.

Rappelez-vous qu’il existe des bits d’addition pour les carrés et les couleurs vides.


Exemples

Donnons un exemple de QK restant

RBNKQ
00011

  /\
 s  \
    /\
  B/  \W
  /\  /\
q  k q  k

101 100  0 x 60 110 111 ==> 60 + (2 x 6) = 60 + 12 = 72 bits for the board

0000 00011 Header ==> 9 bits

81 bits au total


Donnons et exemple de seulement les rois restants

 RBNQK
 00001 

  /\
 s  k
   / \
  B   W

 10 0 0 0 0 0 0 0   K... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 11  .... ...k

Mettre tous ensemble

 header  4   0 0 0 0
 pieces  5   0 0 0 0 1
 perm    0   - - - - - -
  board 66   10 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 11

Donc, je calcule le plus petit encodage pour la carte à 75bits (9 bits 3 bits)

Il reste encore à calculer comment ce schéma de codage affecte la taille maximale.


Amélioration 4

Réduisez le nombre de bits pour le castling à seulement 2 bits. Il suffit de lancer pour le joueur qui a son tour.

 0 Castling possible (from header block)
 LR 
 00

En y réfléchissant, il serait peut-être préférable d'inclure simplement les 2 bits à l'intérieur du bloc d'en-tête.

Adam Speight
la source
Vous n'avez pas besoin de 16 bits pour passer. Au plus un pion ayant bougé au dernier tour, quatre bits suffisent (par exemple, 1111pour "non en passant possible" ou la colonne sous la forme d'un nombre binaire sinon).
Peter Taylor
Pourquoi les pions ont-ils une meilleure position dans un arbre? Dans le cas le plus courant, elles sont les plus courantes, mais dans les cas extrêmes, R / B / N peut apparaître 10 fois.
Ugoren
@PeterTaylor, en fait, 3 bits suffisent pour passer. Si vous ne comptez que les colonnes avec un pion noir au 5ème rang (en supposant un déplacement blanc), 8 devient invalide.
Ugoren
1
Notez que votre pire des cas est en réalité impossible. La promotion est impossible sans captures (au moins une capture par 2 promotions est nécessaire).
Ugoren
2
Remarque Pour coder 64 positions (pour le roi blanc ou noir), vous avez besoin de 6 bits (2 ** 6 = 64).
lambruscoAcido
17

192 bits (pire des cas)

Voici un schéma de stockage très simple qui devrait faire face à des promotions de pions arbitraires, et ne nécessite jamais plus de 64 + 4 × 32 = 192 bits:

  • Les 64 premiers bits stockent un bitboard qui indique où sont les morceaux (mais pas ce qu’ils sont). C'est-à-dire que nous stockons un bit pour chaque carré de l'échiquier (en partant du carré a1, puis de b1, c1, etc. jusqu'au carré h8), de sorte qu'un carré vide est représenté par 0 et un carré occupé par 1.

  • Ensuite, pour chacune des cases marquées comme occupées sur le panneau d'affichage, nous stockons un quartet de 4 bits codant la pièce sur cette case. Le premier des quatre bits code la couleur de la pièce (0 = blanc, 1 = noir), tandis que les trois autres bits indiquent le type de la pièce:

    +-----+-----+-----+-----+
    |Black|   Piece Type    |
    +-----+-----+-----+-----+
       4     3     2     1    Bits
    

    Type de pièce

    0 = (normal) pion
    1 = (normal) tour
    2 = chevalier
    3 = évêque
    4 = reine
    5 = roi (du joueur à déplacer suivant)
    6 = roi (de l'autre joueur)

    Notez les deux encodages pour le roi, utilisés pour déterminer le tour de joueur à déplacer. (En fait, comme il y a toujours deux rois sur le tableau, ce qui permet quatre combinaisons des codes 5 et 6, nous pourrions coder assez facilement un second bit d'information ici.)

    Black Type Description
    +----+----+--------------------------------+
    |  0 | 5  | White King; White to move next |
    +----+----+--------------------------------+
    |  0 | 6  | White King                     |
    +----+----+--------------------------------+
    |  1 | 5  | Black King; Black to move next |
    +----+----+--------------------------------+
    |  1 | 6  | Black King                     |
    +----+----+--------------------------------+
    

    Pour encoder les informations supplémentaires nécessaires aux règles en passant et castling, nous introduisons un type de pièce supplémentaire, qui désigne un pion ou une tour, en fonction de la ligne sur laquelle il apparaît:

    7 (rangées 1 et 8) = une tour qui n'a jamais bougé, et dont le roi n'a jamais bougé non plus, et qui est donc éligible pour le castling
    7 (rangées 4 et 5) = un pion qui vient de progresser de deux cases, et peut donc être capturé en passant

Mettre tous ensemble:

     Hex Description
    +---+---------------------------------------------+
    | 0 | White Pawn (normal)                         |
    | 1 | White Rook (has moved)                      |
    | 2 | White Knight                                |
    | 3 | White Bishop                                |
    | 4 | White Queen                                 |
    | 5 | White King; White to move next              |
    | 6 | White King                                  |
    | 7 | White Rook (pre castle) / Pawn (en Passant) |
    | 8 | Black Pawn (normal)                         |
    | 9 | Black Rook (has moved)                      |
    | A | Black Knight                                |
    | B | Black Bishop                                |
    | C | Black Queen                                 |
    | D | Black King; Black to move next              |
    | E | Black King                                  |
    | F | Black Rook (pre castle) / Pawn (en Passant) |
    +---+---------------------------------------------+

Le nombre total de bits nécessaires pour coder l'état de la carte est donc de 64 + 4 × # de pièces à bord. Comme il ne peut jamais y avoir plus de 32 morceaux sur la carte, la longueur maximale de ce codage est de 192 bits.

Par exemple, en utilisant le codage décrit ci-dessus, l’état initial de la carte serait codé comme suit (espace inséré pour la lisibilité):

1111 1111 1111 1111 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111
0111 0010 0011 0100 0101 0011 0010 0111 0000 0000 0000 0000 0000 0000 0000 0000
1000 1000 1000 1000 1000 1000 1000 1000 1111 1010 1011 1100 1110 1011 1010 1111

ou, en hexadécimal:

FFFF 0000 0000 FFFF 7234 5327 0000 0000 8888 8888 FABC EBAF
Ilmari Karonen
la source
2
"Un pion qui a juste avancé de deux cases" et "une tour qui n'a jamais bougé" pourraient partager le même emplacement d'état, car ils s'excluent mutuellement en fonction de la ligne sur laquelle ils se trouvent. L'état extra libre pourrait être utilisé pour encoder "roi de couleur à qui c'est le tour"; vous devriez être capable de vous débarrasser de la question pendante de cette façon.
FireFly
You can also arguably save a bit by only storing 63 bits for the bitboard, and inferring the last bit from the number of men encoded. (It's not clear to me whether this is cheating or not, because it requires external encoding of the length of the bit sequence). And if you ditch the states 6 and 7 for the men you need to encode up to 6^32, which takes 82.7 bits; rounding up to 83 it's saving 13 bits over using states 6 and 7, and you can encode that same information in only 8 bits.
Peter Taylor
Thanks, @FireFly! I've implemented your suggestion. (Peter Taylor's suggestion is even more efficient, of course, but I haven't used it so far because I kind of like the simple binary encoding of the current scheme. You could always submit it as a separate entry...)
Ilmari Karonen
Okay -- this is mildly complicated. But hear me out. If you just take the 1 bit hit on a turn indicator, you could replace king with turn with a piece I call pawn1. Change pawn to pawn0. Now, whenever there is a (not en passant vulnerable) pawn -- you also gain one bit of information on the next piece (either a 0 for pawn0 or a 1 for pawn1). Because there are 16 pawns, you gain 16 bits less 1 for the turn indicator. Whenever there is a pawn that is vulnerable to en passant you have to add one bit back in after it. But as en passant must happen immediately, your minimum gains are 14 bits.
user5957401
You could also do a similar thing with something like bishops. Suppose that you have a bishop not in its a 'special zone' (the 10 spots in its corners and the center row on its side) is marked as special. Because you know its location -- you know its a bishop. Now you have two bishops and could give each of them a 0 or 1 on the next piece. This gives up to another 4 bits -- but the worst case has bishops all over the special zones, and no gain.
user5957401
14

160 bits worst case

After posting my previous answer 22 bytes, I began to wonder if we could get down to 21 bytes. However when I saw Peter Taylor's amazing 166 bytes I thought "Hang on, it looks like five 32-bit words could be possible!"

So after quite a lot of thought, I came up with this: 159.91936391 bytes (a pretty tight fit!) This level of compression would need a fairly complicated program but I have thought about how to make it run in reasonable time.

This is going to be a long post, so please bear with me, I will post what I can today and add a few bits of code soon.

So, here's how to do it:

En Passant and castling encoded by illegal positions (0 bits)

En Passant

As mentioned in other answers, there are a maximum of 5 possible squares on which a pawn vulnerable to en passant can stand. These are the squares next to the pawns of the player whose turn it is.

To encode this, the pawn vulnerable to en passant is exchanged with whatever is on one of the squares in the first or last row. This may be either a man or an empty square. This produces an illegal position, as pawns cannot be on these rows. The decoder must return the pawn to its correct position, extracting the en passant information.

In order for this to not interfere with the castling encoding, it is important that the squares on which the kings stand at the start of the game are not disturbed, and that the en passant encoding procedure does not place the kings next to each other, which would be an illegal king position. To satisfy the second of these points, the encoder has two choices as to which square it exchanges the en passant pawn. First choice square for each of the up to 5 pawns are A8,B8,C8,G8,H8. Second choice: A1,B1,C1,G1,H1.

Castling

A king who is allowed to castle is by definition, still on his initial square. With the white king on his initial square, there are a total of 63 squares where the black king can stand, 58 of which are legal (he is not allowed to move right next to the white king because he would put himself in check.) If the white king is allowed to castle, he is either allowed to castle with his left rook, his right rook, or both. There are thus 3x58=174 possibilities where the white king can castle, another 174 where the black king can castle, and a further 3x3=9 where both can castle, a total of 357.

There are 420 illegal arrangements of the two kings where they are on adjacent squares: 3x4=12 when the white king is in the corner, 5x24=120 when he is on the edge, and 8x36=288 when he is on another square. Therefore there are easily enough illegal positions to encode all possible castling possibilities.

If at least one king is allowed to castle, encoder would look up the castling data and the positional data of those kings not allowed to castle in a table (or alternatively, use an algorithm which I won't specify here) and produce an illegal position of the two kings. It would then exchange the kings with whatever happened to be on these squares.

It is important that this is encoded after and decoded before the en passant, otherwise there are some potential interferences.

Comparison

So, so far I have used no bits! Looking at Peter's answer, I still have the following to encode:

Whose turn is it?                                   1.000 bits
Which squares are occupied by men of which colour? 91.552 bits 
Subtotal                                          *92.552 bits* 
For the two colours, which men and which order?   *68.613 bits* 
GRAND TOTAL                                       161.165 bits

This is for the worst case of 29 men (see Peter's answer.) Below I will show how I make a marginal improvement (at least for the case of 29 men) on both of the points marked in **.

Which squares are occupied / whose turn is it?

The easy way to encode which squares are occupied is with a 64 bit grid. This also tells us how many squares are occupied. However it is somewhat wasteful because it is not possible for there to be more than 32 squares occupied. My solution is to use 1's to encode for the occupied squares when it is White's turn and 0's to encode for occupied squares when it is Black's turn. Now all combinations are used and there is no waste.

Thus we save on a bit for storing the turn: less than 32 1's, it is white's turn, more than 32 1's, it is black's turn. The only ambiguous case is when all the men are on the board and there are 32 1's and 32 0's. Therefore an extra bit is needed for this case only. As there can be no promotions until a capture has occurred, this extra bit does not affect the worst case overall (which occurs with 3 men captured and 29 remaining.)

Colour of the men occupying the squares

We know from the above how many men there are. The following extract of Pascal's triangle tells how many possibilities there are for the different distributions of black and white. For example, for 3 men, the possibilities are: 3 black men (1 permutation) 2 black, 1 white, (3 permutations), 1 black, 2 white (3 permutations), 3 white (1 permutation.) The total is 23=8. In general, for the lower numbers of men there are 2n possibilities. However the all black and all white possibilities are illegal (at least the king of each side must be on the board) so the actual number of legal permutations is 2n-2 (ignore the 1's on the Pascals triangle.)

For more than 16 total men, there is an additional constraint in that there can be no more than 16 men of each colour on the board. Therefore when all 32 men are on the board there must be 16 of each and the total number possibilities is 601080390 which is quite a bit less than 232.

1   1    1    1      1     1      1       1       1        1        1         1         1         1          1          1          1 
1   2    3    4     5      6      7       8       9       10       11        12        13        14         15         16         17
1   3    6   10    15     21     28      36      45       55       66        78        91       105        120        136        153
1   4   10   20    35     56     84     120     165      220      286       364       455       560        680        816        969
1   5   15   35    70    126    210     330     495      715     1001      1365      1820      2380       3060       3876       4845
1   6   21   56   126    252    462     792    1287     2002     3003      4368      6188      8568      11628      15504      20349
1   7   28   84   210    462    924    1716    3003     5005     8008     12376     18564     27132      38760      54264      74613
1   8   36  120   330    792   1716    3432    6435    11440    19448     31824     50388     77520     116280     170544     245157
1   9   45  165   495   1287   3003    6435   12870    24310    43758     75582    125970    203490     319770     490314     735471
1  10   55  220   715   2002   5005   11440   24310    48620    92378    167960    293930    497420     817190    1307504    2042975
1  11   66  286  1001   3003   8008   19448   43758    92378   184756    352716    646646   1144066    1961256    3268760    5311735
1  12   78  364  1365   4368  12376   31824   75582   167960   352716    705432   1352078   2496144    4457400    7726160   13037895
1  13   91  455  1820   6188  18564   50388  125970   293930   646646   1352078   2704156   5200300    9657700   17383860   30421755
1  14  105  560  2380   8568  27132   77520  203490   497420  1144066   2496144   5200300  10400600   20058300   37442160   67863915
1  15  120  680  3060  11628  38760  116280  319770   817190  1961256   4457400   9657700  20058300   40116600   77558760  145422675
1  16  136  816  3876  15504  54264  170544  490314  1307504  3268760   7726160  17383860  37442160   77558760  155117520  300540195
1  17  153  969  4845  20349  74613  245157  735471  2042975  5311735  13037895  30421755  67863915  145422675  300540195  601080390

The number of possibilities can be found by summing the "rows" of this extract of pascals's triangle (by which I mean the NE-SW diagonals of the table , because I rotated the triangle 45 degrees anticlockwise for convenient presentation. The number of bits needed to encode the turn, occupied squares and colour of the men is therefore as follows:

Up to 25 men: slightly less than 64+(number of men)
For more than 25 men:

men permutations  bits required  occupied sq+turn   
    of colours                   (bits required)  total bits
26   55791790     25.7335495      64              89.7335495
27  100960110     26.58921015     64              90.58921015
28  175844430     27.3897244      64              91.3897244
29  290845350     28.115677       64              92.115677   
30  445962870     28.73234836     64              92.73234836
31  601080390     29.16298271     64              93.16298271
32  601080390     29.16298271     65              94.16298271

For the two colours, which men and in which order?

As per previous answers, no pawns can be promoted until a capture has occurred, because each pawn is blocked by a pawn of the opposite colour on the same column. Peter's answer considered (as an upper bound) that every capture could lead to one promotion for the side being captured and two for the side capturing. However we can split this into several cases:

  1. Black pawn captures white pawn: Now the capturing pawn is free to promote as he is now on a different column. His colleague on the same column can also promote. The black pawn on the white pawn's original column can also promote. This is the only case that permits 3 promotions.

  2. Black pawn goes past white pawn on adjacent column and then captures white piece (other than pawn) behind. This enables the capturing pawn and the white pawn who was on the original column to promote. One promotion for each side.

  3. White pawn is captured by piece (other than pawn.) This would normally allow one promotion for Black only. The only exception is when it liberates a blocked up pawn formation which was already caused by several captures moving pawns onto the same column.

So, as a base case, we can consider that each capture permits one promotion each for both sides. In the case that the man captured is a pawn, there may be an additional promotion for the capturing side.

I have written a program similar to Peter's. It is somewhat cruder, but it has an important addition: it can calculate the number of permutations possible when a player starts with less than the normal 8 pawns. Here is some data produced by the program:

Max promotions   0            1            2             3             4              5 
8 PAWNS 
13 men    18725850    146911050    567991710    1373480394    2297173164     2902775304
14 men    36756720    339459120   1555313760    4501448952    9021804792    13325103792
15 men    60810750    660810150   3555401850   12144582450   28834205400    50030580600
16 men    64864800    843242400   5383778400   21810428640   61514893440    1.26476E+11
7 PAWNS                         
13 men    17760600    141003720    546949260    1321302840    2200401060     2761730400
14 men    30270240    287567280   1331890560    3852728880    7641553920    11068817760
15 men    32432400    372972600   2075673600    7209001800   17135118000    29315286000
6PAWNS                          
13 men    14054040    114594480    447026580    1069488420    1739577840     2113185360
14 men    15135120    151351200    718918200    2087805720    4073028960     5697051360                         
5 PAWNS                         
13 men     6486480     55135080    217297080     510630120     794233440      910235040

We can see that for a case like 8 pawns, 15 men, 0 promotions, the number of permutations is only slightly less than for 8 pawns 16 men, 0 promotions. However if we consider a case like 7 pawns, 15 men, 0 promotions (which is the same as considering that the man captured was definitely a pawn) we get about half the number of permutations.

So, For the case when Black has 16 men and white has 15 men, we can consider an upper bound estimate of 2 promotions for Black and one promotion for White:

5383778400 x 660810150 = 3.55766E+18 possibilities

However we can do better if we proceed as follows.

A. Consider one promotion each for Black and White assuming that the man White has lost could be of any type:

843242400 x 660810150 = 5.57223E+17 possibilities

B. Consider the additional possibilities for Black if he has two promotions, multiplied by only those possibilities for White in which he has lost a pawn.

(5383778400-843242400) x 372972600 = 1.6935 E+18 possibilities.

Adding these two together we get 2.25072E+18, which is a smaller number than 3.55766E+18. All the possibilities for up to 3 captures (29 men remaining) are listed below.

(Promotions, Pawns lost) possibilities

BLACK 16 MEN, WHITE 15 MEN. ESTIMATE   3.55766E+18 = 2^61.62563249
(1,0)   843242400 x (1,0)  660810150 = 5.57223E+17
(2,0)  4540536000 x (1,1)  372972600 = 1.6935 E+18
                               TOTAL   2.25072E+18 = 2^60.96509144


BLACK 16 MEN, WHITE 14 MEN. ESTIMATE   9.5675 E+19 = 2^66.3747752
(2,0)  5383778400 x (2,0) 1555313760 = 8.37346E+18
(3,0) 16426650240 x (2,1) 1331890560 = 2.18785E+19
(4,0) 39704464800 x (2,2)  718918200 = 2.85443E+19
                               TOTAL   5.87962E+19 = 2^65.67235739


BLACK 16 MEN, WHITE 13 MEN. ESTIMATE   2.69447E+20 = 2^67.86856193
(3,0) 21810428640 x (3,0) 1373480394 = 2.99562E+19
(4,0) 39704464800 x (3,1) 1321302840 = 5.24616E+19
(5,0) 64960896000 x (3,2) 1069488420 = 6.94749E+19
(6,0) 69702272640 x (3,3)  510630120 = 3.55921E+19
                               TOTAL   1.87485E+20 = 2^67.34533572


BLACK 15 MEN, WHITE 15 MEN. ESTIMATE   1.47491E+20 = 2^66.99918768
(2,0)  3555401850 x (2,0) 3555401850 = 1.26409E+19
(2,1)  2075673600 x (3,0) 8589180600 = 1.78283E+19
(3,0)  8589180600 x (2,1) 2075673600 = 1.78283E+19
(3,1)  5133328200 x (3,1) 5133328200 = 2.63511E+19
                  TOTAL BOTH COLUMNS   7.46486E+19 = 2^66.01674923


BLACK 15 MEN, WHITE 14 MEN. ESTIMATE   4.51366E+20 = 2^68.61286007      
(3,0) 12144582450 x (3,0) 4501448952 = 5.46682E+19
(3,1)  7209001800 x (4,0) 4520355840 = 3.25873E+19
(4,0) 16689622950 x (3,1) 3852728880 = 6.43006E+19
(4,1)  9926116200 x (4,1) 3788825040 = 3.76083E+19
(5,0) 21196375200 x (3,2) 2087805720 = 4.42539E+19
(5,1) 12180168000 x (4,2) 1985223240 = 2.41804E+19
                  TOTAL BOTH COLUMNS   2.57599E+20 = 2^67.80368692

So for the worst case of one side with 15 men and the other with 14 men, we need 67.804 bits.

Adding this to the 92.116 bits required to specify which squares and which colour, we get a total of 67.804+92.116=159.92 bits.

Level River St
la source
1
Many thanks to @Einacio for changing my decimal commas to decimal points. I did a lot of my tables on Excel on a Spanish computer, and posting this was a big job, so fixing this was something I left for later. As I say, I haven't finished this post yet, I will add my permutation counting program and some fragments of code about encoding / decoding when I have time. PS. I had no idea so many people were reading this :-)
Level River St
at the end you managed to tak of bytes instead of bits which is the thing you meant, that might cause some cinfusion to the readers
masterX244
13

177 bits worst case

This algoritm, while hardly simple, gives a 177 bits worst case (184b=23B in practice), 13b (16b=2B) best case scenario when there's just 2 kings left.

Bit     Description
  1     Turn (0=white 1=black)
  2-  7 White king position (2-4=letter, 5-7=number)
  8- 13 Black king position (8-10=letter, 11-13=number)
 14- 75 Which squares contain pieces (skipping the 2 king squares, so only 62)
        Ordered a1-h1,a2-h2,(...)
 76-105 Which color owns the square with their piece (0=white, 1=black)
        If there's LESS than 30 pieces (apart from kings), this area is
        smaller
106-end Square data

Square data has the following system:
Every square gets assigned a number which determines piece. Number is:
0 Queen
1 Rook
2 Bishop
3 Knight
4 Pawn OR allowed-castle rook depending on square
5 Pawn subject to potential enpassant

The first bits (max 13) is the potential enpassant slots from A-H, determined
from data of 1 + 14-105 for which of the squares has a piece, and which color
owns the piece and whose turn it is. For example, if turn is White (bit 1 is
0), all pieces on row 5 which is Black owned (determined from 14-105 metadata)
and has at least 1 adjacant (on the same row) square owned by White, is
explained in A-H order. A base 6 number is used which is converted to binary
for the storage. On reading, it's converted and read A-H according to the
numbers above (4 is obviously pawn in this case).
The second amount of bits takes care of the 1st and 8th row (not corners!)
in b1-g1,b8-g8. These only take up 2 bits since 4 or 5 is never needed
(pawn on 1st or 8th is invalid).
The third amount of bits takes care of the rest of the board, in the following
order: a1,h1,a2-h2,a3-h3,a4-h4,a5-h5,a6-h6,a7-h7,a8,h8 (skipping the
"enpassant" slots), in base 5 (since piece ID 0-4 are the only used) converted
to binary.

Best case: 13 bits (bit 1 for turn, bit 2-12 for kings)
Worst case: 177 bits
* 32 pieces with kings
* 5 viable enpassant pawns
* No pieces at 1st or 8th row (except if kings+rooks are at initial posions
whether or not they can castle)
In this case, the space as following:
  1   bit   turn
+ 12  bits  king positions
+ 62  bits  which squares have pieces
+ 30  bits  color of pieces
+ 13  bits  enpassant area
+ 0   bits  initial rows area
+ 59  bits  the rest of the area
= 177 bits  total

Potential optimizations but not really worth it IMO:
* Decrease average by make corners 2 bits as well if kings aren't at e1/e8
* Alter reading order to read b1-g1,b8-g8 last - decreases worst case to
  176 bits if the "which squares have pieces" area is cut off if 30 existing
  pieces has been defined already. Would actually save 8 bits on file but meh
FIQ
la source
Very nice. You can make this even more efficient by replacing bits 14-105 (92 bits) with an encoding based on multinomial coefficients. sum_{i=0}^{15} sum_{j=0}^{15} 62! / (i! j! (62-i-j)!) < 2^87.45.
Peter Taylor
The only thing I would change is to create a rather more simplified version for the enpassant area. For example: if you encode the 30 pieces in base 5, and there are a maximum of 5 enpassant positions, then you can have 5^31 < 2^72. The same as if you split them in enpassant (13) and non-enpassant(59), but without the extra complexity.
Alin Stoian
Doing that would actually use 1 extra bit. The reason is that there can (worst case) be 5 enpassant possibilities, but I still need to declare the possibility for "no enpassant", i.e. a 6th state. The 1 extra bit would in this case go to declaring enpassant possible or not (and with this approach I could as well use an even simpler approach with encoding the 30 pieces skipping the enpassant block, and use 3 bits seperately for enpassant check, which would also lead to +1 bit use). The following 5th row would enable 5 potential enpassants (White's turn): BWBBWBBW
FIQ
yes, you're right.
Alin Stoian
7

166 bits

  • 1 bit: whose turn is it?
  • 2 bits: which castling options are open? (NB on close reading of the question, it's only necessary to record castling options for the player whose turn it is).
  • lg 6 ~= 2.585 bits: which en passant options are open? (See my other answer)
  • lg sum_{i=1}^{16} sum_{j=1}^{16} 64! / (i! j! (64-i-j)! = lg 3629590441720924477681996172 ~= 91.552 bits: which squares are occupied by men of which colour?
  • At worst lg 451366131803622235200 ~= 68.613 bits to indicate which men and in which order (see below)

Using arithmetic encoding (since at each step we're applying a uniform distribution) we can achieve ceil(3 + 2.585 + 91.552 + 68.613) = 166 bits.

The encoding for the men: given that we know how many men of a given colour there are, we can easily enumerate all possible distributions/multisets of men (e.g. with 5 men we might have one King, one Queen, two Rooks, and a Pawn) and then we can consider all possible permutations of each distribution.

However, we can do even better by taking into account interdependencies. I'm only doing this on a very basic level: how many possible promotions? A pawn can only become "passed" and able to promote in three ways: it captures and so moves into a different column; or its opposing pawn captures and so moves into a different column; or its opposing pawn is captured. Thus a capture for white potentially creates two passed pawns for white and one for black.

We can build up a partial table of the upper bounds on promotions:

(Max white promos, max black promos):

           White men
           16      15      14      13
Black men
       16  (0, 0)  (1, 2)  (2, 4)  (3, 6)
       15  (2, 1)  (3, 3)  (4, 5)  (5, 7)
       14  (4, 2)  (5, 4)  (6, 6)  (7, 8)
       13  (6, 3)  (7, 5)  (8, 7)  (8, 8)

We can also calculate the number of permutations given that a player has N men and no more than P promoted pawns:

Num of permutations (cumulative):
    max promotions: 0              1              2              3              4              5              6              7              8
 1 men              1              1              1              1              1              1              1              1              1
 2 men             10             10             10             10             10             10             10             10             10
 3 men             72             75             75             75             75             75             75             75             75
 4 men            436            496            500            500            500            500            500            500            500
 5 men           2305           3025           3120           3125           3125           3125           3125           3125           3125
 6 men          10746          17106          18606          18744          18750          18750          18750          18750          18750
 7 men          44170          88795         106260         109179         109368         109375         109375         109375         109375
 8 men         159832         415360         575240         619200         624744         624992         625000         625000         625000
 9 men         509841        1721961        2884815        3398769        3504735        3515301        3515616        3515625        3515625
10 men        1447200        6258240       13063080       17697780       19260180       19510320       19530840       19531230       19531240
11 men        3706065       20021265       52183395       85007571      102173181      106786581      107369592      107409918      107410281
12 men        8678340       57101220      183088620      364510476      509818716      570620556      584017632      585352152      585430164
13 men       18725850      146911050      567991710     1373480394     2297173164     2902775304     3107861328     3143928216     3146014014
14 men       36756720      339459120     1555313760     4501448952     9021804792    13325103792    15664512864    16283899632    16360920576
15 men       60810750      660810150     3555401850    12144582450    28834205400    50030580600    66655789200    73588394880    74576231730
16 men       64864800      843242400     5383778400    21810428640    61514893440   126475789440   196178062080   240747386880   253686232800

Combining the two, we can get the number of bits of required to specify both permutations given the numbers of men on both sides:

           White men
           16      15      14      13      <13
Black men
       16  51.902  61.626  66.375  67.868  <=67.009
       15  --      67.000  68.613  67.534  <=65.243
       14  --      --      67.734  65.480  <=63.055
       13  --      --      --      63.102  <=60.676

If it's not in this section of the table, we can just assume that both sides have up to 8 promotions and we're still doing better than the worst case, which is 68.613 bits when one has 14 men and the other has 15 men.

Note that this is still a long way from being a perfect representation, because it allows many many illegal positions.

Code for calculating the permutation table:

import java.util.*;

public class ChessCombinatorics {
    public static void main(String[] args) {
        long[] f = new long[17];
        f[0] = 1;
        for (int i = 1; i < 17; i++) f[i] = i * f[i-1];

        // Indexed by num promotions, then total num men.
        long[][] distribs = new long[9][17];
        long[][] perms = new long[9][17];

        for (int promotedPawns = 0; promotedPawns < 9; promotedPawns++) {
            Map<Integer, Map<String, Long>> numCases = new HashMap<Integer, Map<String, Long>>();
            for (int i = 1; i < 17; i++) numCases.put(i, new HashMap<String, Long>());

            for (int extraQ = 0; extraQ <= promotedPawns; extraQ++) {
                for (int extraR = 0; extraR + extraQ <= promotedPawns; extraR++) {
                    for (int extraN = 0; extraN + extraR + extraQ <= promotedPawns; extraN++) {
                        int extraB = promotedPawns - extraN - extraR - extraQ;
                        int unpromotedPawns = 8 - promotedPawns;

                        // Promoted pawns should only count towards their new type if the existing ones are alive.
                        // Otherwise we double-count some cases.
                        int minQ, maxQ, minR, maxR, minN, maxN, minB, maxB;
                        if (extraQ == 0) {minQ = 0; maxQ = 1;} else {minQ = maxQ = 1 + extraQ;}
                        if (extraR == 0) {minR = 0; maxR = 2;} else {minR = maxR = 2 + extraR;}
                        if (extraN == 0) {minN = 0; maxN = 2;} else {minN = maxN = 2 + extraN;}
                        if (extraB == 0) {minB = 0; maxB = 2;} else {minB = maxB = 2 + extraB;}

                        for (int numQ = minQ; numQ <= maxQ; numQ++) {
                            for (int numR = minR; numR <= maxR; numR++) {
                                for (int numN = minN; numN <= maxN; numN++) {
                                    for (int numB = minB; numB <= maxB; numB++) {
                                        for (int numP = 0; numP <= unpromotedPawns; numP++) {
                                            // The number of possibilities at these values is (numK + numQ + numR + numN + numB + numP)! / (numK! numQ! numR! numN! numB! numP!)
                                            numCases.get(1+numQ+numR+numN+numB+numP).put(numQ+","+numR+","+numN+","+numB+","+numP, f[1 + numQ + numR + numN + numB + numP] / f[numQ] / f[numR] / f[numN] / f[numB] / f[numP]);
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }

            for (int numMen = 1; numMen < 17; numMen++) {
                distribs[promotedPawns][numMen] = numCases.get(numMen).size();
                if (distribs[promotedPawns][numMen] > 0) {
                    for (Long l : numCases.get(numMen).values()) perms[promotedPawns][numMen] += l;
                }
            }
        }

        System.out.println("Num of permutations (cumulative):");
        System.out.println("    max promotions: 0              1              2              3              4              5              6              7              8");
        for (int numMen = 1; numMen < 17; numMen++) {
            System.out.print(String.format("%2d men", numMen));
            long cumul = 0;
            for (int promotedPawns = 0; promotedPawns < 9; promotedPawns++) {
                cumul += perms[promotedPawns][numMen];
                System.out.print(String.format("%15d", cumul));
            }
            System.out.println();
        }

        System.out.println("Entropy of permutations:");
        System.out.println("    max promotions: 0              1              2              3              4              5              6              7              8");
        for (int numMen = 1; numMen < 17; numMen++) {
            System.out.print(String.format("%2d men", numMen));
            long cumul = 0;
            for (int promotedPawns = 0; promotedPawns < 9; promotedPawns++) {
                cumul += perms[promotedPawns][numMen];
                System.out.print(String.format("  %6.3f", Math.log(cumul) / Math.log(2)));
            }
            System.out.println();
        }

    }
}
Peter Taylor
la source
How do you deduce the positions of the kings? You use 15 men in your computation and no special bits for king positions.
Alin Stoian
@AlinStoian, oops. I had a < rather than <= in the output loop of my program. Thanks for pointing it out. I could still recover the previous score by special-casing all 32 men being on the board, but I won't do that right now.
Peter Taylor
Interesting data! Theoretical worst case with 3 men captured
Level River St
@steveverrill, what I would really like to do is encode the pawn positions and number of promotions in one "block" and then the piece positions and values. However, there are at least 2^38 pawn positions without taking into account promotions, and enumerating them efficiently has so far escaped me.
Peter Taylor
@petertaylor If you only have 16 pawns on the board, restricted to 48 squares, you have 48!/32!/8!/8!= 29019905518636890 possibilities. Slightly more than 2^54! Of these, some are illegal, you can't have all the pawns of one colour on one side of the board.
Level River St
5

178 bits (174 at a pinch!) worst case

Hi, just getting back into coding, which I haven't really done since college. I saw this site and thought this looked interesting. I did a little theroretical check and it appears at least 146 bits are needed for a perfect algorithm, probably quite a few more (I will explain in the comments when I have a moment.)

Anyway, this is the way I structure the data. The basic concept comes in at 178 bits, but with some jiggery pokery it can brought down to 174 (that's 21 3/4 bytes). 175 is slight easier to program, is more human readable, and still within 22 bytes.

A) Position of both kings: 6 bits each for white and black 12 BITS

B) Of the remaining 62 squares, which are occupied? A matrix of 62 BITS

C) Whose turn is it? 1 BIT

TOTAL SO FAR: 75 BITS

D) En Passant. If it is white’s turn to move, up to 5 black pawns may look like they could be captured En Passant. The black pawn has to be on row 5 (from bottom to top starting at zero), and have a white pawn next to it. One situation with maximum number of possible captures looks like this:

B W B B W B B W

If there were 6 black pawns on row 5, white would only have 2 squares to stand on and could only threaten 4 black pawns, so it is not possible to have more than 5 black pawns apparently under threat from En passant at the same time. So we need a number 1 to 5 indicating which of the (up to 5) pawns on row 5 that has a hostile (in this case white) pawn next to it was advanced 2 squares on the last turn (or, zero if no pawn in this situation was moved in this way on the last turn.)

E) Of the up to 30 occupied squares (not including kings) what do they contain?

There are 10 possibilities, each represented by a decimal number.

The least significant bit represents the colour.

Hence even numbers are white, odd numbers are black.

White/Black

Pawn 0/1 (or Rook that is allowed to castle*)

Knight 2/3

Bishop 4/5

Rook 6/7

Queen 8/9

*A rook that is allowed to castle (and therefore has never been moved from the first or last row) is represented by 0 or 1 instead of 6 or 7. It cannot be confused with a pawn, because pawns cannot be found on the first or last row.

This gives a decimal number of up to 30 digits, which we can multiply by 6 and then add the code for En passant. The resulting number will fit into 103 bits, which when added to the 75 mentioned above is 103+75=178 bits. Actually, if we simply multiply by 10 instead of 6, it makes no difference to the number of bits used, and decoding is easier.

This is just 2 bits more than 22 bytes. However we can push it down to 174 bits, as explained below.

If no piece has been captured, then it is impossible for a pawn to be promoted.

The proof is as follows. Imagine white is obsessed with promoting his pawn on (for example) column E, from the very start of the game. There is a black pawn opposite this pawn that is in the way. Therefore to promote this pawn, one of the following must happen:

1) The black pawn is captured.

2) The black pawn captures another piece and therefore moves out of the way.

3) the white pawn captures a pawn on an adjacent column such as column D.

4) the white pawn passes (or is passed by) a black pawn on an adjacent column and then captures a piece on that same adjacent column, causing the white pawn to change column.

Case 4 is the most interesting, because it is not just the white pawn that started on column E that now has a clear path to promotion. The black pawn on column that remains on column E can also promote. Therefore a single capture can clear the way for one pawn of each colour to promote.

Anyway, the fact that no pawn can promote until a piece is captured means that we do not have to store the 30th piece. We can work it out by elimination (or by subtraction, because the complete set of piece codes at the start of the game always adds up to the same amount =80.) One minor point is that we must ensure that the squares where the rooks stand at the start of the game are amongst the first scanned (because if they were last, we would not know if the rook could castle or not.) This is easily done by scanning row 0 and then rows 7 to 1: For r= 8 to 1 scan row [r mod 8].

So, the matrix of bits in (B) will tell us how many pieces there are (excluding kings.) If there are a full 30, ignore the last piece when encoding, the decoder will work out what it was. We now have an up to 29 digit decimal number, which we multiply by 6 and add to the En Passant code. The resulting number will just squeeze into 99 bits, giving a total of 99+75=174 bits.

As an example Here’s an actual position. White has just made his first move (advanced king’s pawn) and it is Black`s turn.

rnbqkbnr
pppppppp


    P

PPPP PPP
RNBQKBNR

A) Position of kings (White / Black in octal, 12 bits): 03 73 = 000011 111011

B) Which squares are occupied? Start with row zero (bottom row) then all other rows from top to bottom, skipping the kings:

1111 111

1111 111
11111111
00000000
00000000
00001000
00000000
11110111 

C) Black’s turn: Turn Bit = 1

D) En Passant. There is no white pawn next to next to a black pawn, therefore there is no pawn that can be taken en passant (even though this pawn did advance last move) so D=0. If, instead of considering only pawns which have a hostile pawn beside them, we consider all pawns that do not have friendly pieces beside them on both sides, then D would be 1, as there is one such pawn in this situation, and this particular pawn was indeed moved in the last turn.

E) Again, bottom row first, then all other rows from top to bottom, skipping the kings, and with the four uncastled rooks referred to as 0 or 1 (numbers normally reserved for pawns.)

RNBQ BNR =   0248 420
rnbq bnr =   1359 531
pppppppp =   11111111
PPPPPPPP = (0)0000000

The 30th digit (in brackets) can be discarded.

Though it is not very apparent here, the pawn that White has advanced is actually at one end of the list of pawns, because we scan row by row.

Our data now looks like this, with 29 codes for the content of squares, plus the En Passant code:

 (0 discarded) 0000000 11111111 1359531 0248420 (0 en passant)

It is best to scan from right to left when decoding and from left to right (reverse order) when encoding. This means that when there are fewer pieces we will have a smaller number, while retaining maximum consistency (i.e we want the blank space / zeroes to be leading, not trailing, to enable compression of sparsely occupied boards.) When we have only 2 kings on the board, we will have the 75 bits mentioned above, plus 3 bits to store en passant data = 78 bits in the best case. Each additional piece comes in slightly under 3.5 bits (2 pieces can be stored in 7 bits, because 100<128.)

There is a practical problem in that a 99 bit integer is too big to fit in a 64 bit integer variable, which means many programming languages do not provide support for it (you can't simply convert a string representation of a 29-30 digit number into an integer.) As an easy way of encoding for 22 bytes, we can break a 30 digit number (29 piece codes + en passant code) into two 15 digit numbers each of which will fit in 50 bits each (total 100 bits plus the 75 mentioned above makes 175 bits worst case.)

For maximum compression, as stated above, 29 decimal digits plus the En Passant code (6 possible values), will just about fit into 99 bits (for a total of 174 bits) but without support from the language for integers of this size it is complicated to program. It may be easier to separate out the 29 colour bits and work with the piece-type codes (5 possibilities) and En passant code (6 possibilities) separately from the colours (70 bits, almost fits into a 64 bit variable.)

Level River St
la source
Nice trick with the last man.
Peter Taylor
5

Here is a full solution, actual worst case 181 bits

The focus here is a simple program you can easily understand

Input is FEN, here is opening position, it has six fields (5 & 6 ignored):

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

First field (piece placement) is parsed

perl -pe 's/\d/"_"x$&/ge;s/\s.*//;s|/||g'

To produce:

rnbqkbnrpppppppp________________________________PPPPPPPPRNBQKBNR

Field one: encode the location of kings (12 bits):

printf("%b",index('k',$_))
printf("%b",index('K',$_))

Field two: encode pieces (up to 5 bits per piece):

s/_/0/g     Blank
s/P/100/g   From here, as normal chess meaning
s/p/101/g
s/Q/11000/g
s/q/11001/g
s/R/11010/g
s/r/11011/g
s/B/11100/g
s/b/11101/g
s/N/11110/g
s/n/11111/g
s/K//
s/k//

Field three: active color (1 bit)

s/w/0/
s/b/1/

Field four: castling availability (4 bits)

m/K/?1:0
m/k/?1:0
m/Q/?1:0
m/q/?1:0
  • FIQ shows this field could be removed altogether.

Field five: en passant (zero or 3 bits)

printf("%b",ord($1)-ord("a")) unless m/-/
// The EP's rank is 3 or 6 based on active color, only need to encode file

Naive worst case 200 bits

  • Two kings placements - 12 bits
  • Board
    • QRRBBNN QQQQQQQQ - 75 bits
    • qrrbbnn qqqqqqqq - 75 bits
    • Empty squares - 30 bits
  • Active color - 1 bit
  • Castling - 4 bits
  • En Passant - 3 bits

Actual worst case

Each player cannot promote all pawns without capturing other pieces. Here is the entropy effect of piece capture:

  • PpR (3+3+5 = 11 bits) => Qq_ (5+5+1 = 11 bits)
  • PPpp (3+3+3+3 = 12 bits) => QQq_ (5+5+5+1 = 16 bits)

So actually the worst case board is:

  • QRRBBNN QQQQQQQQ - 75 bits
  • qrrbbnn qqqq - 55 bits
  • Empty squares - 34 bits

Worst case is to promote all pieces rather than leave pawn for en passant.

TOTAL ACTUAL WORST CASE WITH SHOWN CODE 12+75+55+34+1+4 = 181 bits

FIQ shows two improvements to this simple scheme, but they are harder to code:

  • Remove bit 2 from piece encoding on rows 1 and 8 since pawns can't go there (up to 16 bit savings)
  • Use pawns to encode castable rooks (4 bit savings)

The only remaining code not shown in this answer (for brevity) is: breaking the input FEN in fields (split /\s/) and variable assignment.

William Entriken
la source
A player can promote all his pawns (Given that he is able to capture enemy pawns); Qn4QQ/Qb6/Qq1k4/Qr6/Qb6/Qr6/Qn4NK/RNB2B1R b - - 0 84
Krzysztof Szewczyk
@KrzysztofSzewczyk, yes that is noted above at PPpp => QQq_
William Entriken
4

Total data needs 33 bytes

(Thanks to someone in the comments I realised this doesn't work for pawn promotion. Will update this when I can solve it)

for the first byte we use five bits:

  • first bit: player's turn, 1=white
  • second bit: black king side castle, 1=can castle
  • third bit: black queen side castle, 1=can castle
  • fourth bit: white king side castle, 1=can castle
  • fifth bit: white queen side castle, 1=can castle

the next 32 bytes are used to represent each chess piece, in a predefined order

  • 3-bits: represent row
  • 3-bits: represent column
  • 1-bit: represent en-passant, 1=can en-passant
  • 1-bit: if "can en-passant" bit is 1: indicates which side, 0=left
    else represent whether it has been captured. 0=not captured
    (If it can en-passant then it is definitely not captured)

Some C code to represent this idea (which doesn't actually work)

int main() {
    char b, c[32], i;

    //decode:

    FILE *p=fopen("/path/to/file.csv","r");
    fscanf(p,"%d,",&b);
    for(i=0;i<31;i++) fscanf(p,"%d,",&c[i]);
    fscanf(p,"%d",&c[31]);
    fclose(p);
    if(b&16) /* white's turn */
    else /* black's turn */
    if(b&8) /* black king side can castle */
    if(b&4) /* black queen side can castle */
    if(b&2) /* white king side can castle */
    if(b&1) /* white queen side can castle */

    for(i=0;i<32;i++) {
        int row, column;
        row=c[i]&7;
        column=c[i]&56;
        if(c[i]&64 && isPawn(c[i])) { //can en-passant
            if(c[i]&128) //can en-passant to the right
            else //can en-passant to the left
        }
        if(!(c[i]&64)) {
            if(c[i]&128) //captured
            else //not captured
        }
    }

    //encode:

    p=fopen("/path/to/file.csv","w");

    if(b&16) b&=239;
    else b|=16;
    if(black_king_side_cannot_castle) b&=247;
    if(black_queen_side_cannot_castle) b&=251;
    if(white_king_side_cannot_castle) b&=253;
    if(white_queen_side_cannot_castle) b&=254;

    for(i=0;i<32;i++) {
        c[i]=row;
        c[i]+=column*8;
        if(isPawn(c[i]) && can_en_Passant) {
            c[i]|=64;
            if(can_en_Passant_left) c[i]&=127;
            else c[i]|=128;
        }
        if(!(c[i]&64)) {
            if(isCaptured(c[i])) c[i]|=128;
            else c[i]&=127;
        }
    }
    fprintf(p,"%d,",b);
    for(i=0;i<31;i++) fprintf(p,"%d,",c[i]);
    fprintf(p,"%d",c[31]);
    fclose(p);
    return 0;
}
pastebin.com slash 0mr8spkT
la source
-1, this is not an answer...
Doorknob
1
Your predefined order won't be of much use when a pawn reaches the eighth rank and is promoted to a knight, bishop, rook or queen.
squeamish ossifrage
@Doorknob of Snow ok I'm working on the algorithm, it's just a bit late at night and I'm tired, so I'm doing it slightly slower
pastebin.com slash 0mr8spkT
Well then why did you post this if you don't even have a solution?
Doorknob
3
if you people still think this answer is crap and has no value here then go ahead and vote to delete it and downvote it and you may as well delete my account here. i just want to share what i could think of, why do you guys have to be so mean?
pastebin.com slash 0mr8spkT
4

256 242 bits

Here's a basic compression algorithm that can probably be improved on because it doesn't exclude certain illegal positions from being represented.

The board starts with 5 bits of header info, as follows:

0 1 1 1 1
---------
1 2 3 4 5

1: Turn (black = 1, white = 0)
2: Black can castle queen-side
3: Black can castle king-side
4: White can castle queen-side
5: White can castle king-side

Then, a string of 12 bits representing the positions of the kings.

0 0 0 1 0 0 1 1 1 1 0 0
-----------------------
0 0 0 0 0 0 0 0 0 1 1 1
1 2 3 4 5 6 7 8 9 0 1 2

01 - 03: white king's rank
04 - 06: white king's file
07 - 09: white king's rank
10 - 12: white king's file

Then, a huge 64-digit number in base 11, which is then multiplied by 9 to add another digit at the end representing the state of en-passant. Each digit in base 11 represents a square on the board, with the following possible values:

0: empty

1: white pawn
2: white knight
3: white bishop
4: white rook
5: white queen

For the black equivalent of each white piece, add 5.

And the digits in base 9:

0: no en-passant possible
1 - 8: en-passant on rank 1 - 8

1164 × 9 is about 2224.57, which requires 225 bits to encode. Plus the 17 header bits at the top, is 242 bits total.


Thanks to ugoren for the improvements.

Joe Z.
la source
This is very similar to ace's algorithm, but it uses board positions instead of pieces in a predetermined order, which both excludes it from having the pawn promotion problem and allows the data to be chopped off a bit.
Joe Z.
You can save en-passant as one number between 0 and 8 (on which column can the current player capture en-passant?). 13^64 * 9 is 239.99, saving 11 bits. Save more by encoding king positions separately.
ugoren
According to Doorknob of Snow who commented on my post, this kind of answer is "not an answer". Just saying. FYI his comment was posted before I added the C code to my answer.
pastebin.com slash 0mr8spkT
@ugoren: I forgot about that. You're right, I forgot that only one pawn can be en-passant at the same time.
Joe Z.
@ace: Your answer isn't invalid because it doesn't include encoding and decoding code; it's invalid because it doesn't account for the pawn-promotion case (in which case your predefined order of pieces doesn't do anything). The problem, at its core, asks for a data-encoding scheme. The program is just something to interface with that.
Joe Z.
3

? bits

(≥ 217 worst-case, 17 best-case, 179 for initial board)


Encoding description

Extra metadata consists of whose turn it is (one bit) and castling (four bits, i.e. may white castle on kings' side? on queens' side? and similarly for black).

For the board position itself, we encode it as a set of active pieces. Well, actually, we make sure to enumerate the pieces in a particular order for reasons that I'll explain in a bit. For each piece we store its colour (one bit), its kind (three bits to encompass 6 kinds, plus one extra kind for "pawn that may be taken by en passant"), as well as its position.

Here's the interesting part: to encode the position of a piece, instead of storing it as a coordinate, we store its relative distance from the last piece when enumerating the pieces in left-to-right, top-to-down order (i.e. A8, B8, ..., G1, H1). In addition, we store the distance as a variable-length number, using 1 to mean that this piece is right next to the previous, 0xx for skipping 1-3 pieces, 000xxx for skipping 4-10 pieces, 000000xxxx for 11-25, 0000000000xxxxx for 26-56 and finally 000000000000000xxx for 57-62.

Examples

I made a gist of some positions coded with this encoding, and annotated the one for the initial position with some comments.

I don't know how to analyse the worst-case bit size, but going by the examples in the gist, I believe that the algorithm should be somewhat efficient at least.


Implementation of decoder

Below is a quick-and-dirty decoder for this encoding (taking as input the binary data encoded in text, as in the above annotated example, and skipping over things that aren't '0' or '1'). Produces a unicode chess board to stdout.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

char buf[1024];
int wi = 0, ri = 0;

int read_n(int n) {
  int res = 0;
  for (int i = 0; i < n; i++) {
    res = res << 1 | (buf[ri++] == '1');
  }
  return res;
}

int read_varnum() {
  int v, c = 0;

  for (int i = 1; i <= 5; i++) {
    v = read_n(i);
    if (v != 0) return c + v;
    c += (1 << i) - 1;
  }

  assert(false); /* Shouldn't happen */
}

char *piece_to_str(int piece, int color) {       /* ↓ pawn that may be taken with en passant */
  char *pieces[] = { "♙", "♘", "♗", "♖", "♕", "♔", "♙",
                     "♟", "♞", "♝", "♜", "♛", "♚", "♟" };
  return pieces[color * 7 + piece];
}

int main(void) {
  int ch;
  while (ch = getchar(), ch != EOF) {
    if (ch == '0' || ch == '1') buf[wi++] = ch;
  }

  int board[64];
  memset(board, -1, 64 * sizeof(int));

  /* Read metadata */
  int is_white = read_n(1);
  int castling = read_n(4);

  /* Read the board state */
  int bi = -1;
  while (ri != wi) {
    int color = read_n(1);
    int kind  = read_n(3);
    int delta = read_varnum();
    board[bi + delta] = color << 8 | kind;
    bi += delta;
  }

  /* Print metadata */
  printf("  Current turn: %s's turn to play.\n", is_white? "white" : "black");
  printf("  Castling: White may castle? %s %s\n",
         castling & 0x8? "left" : "", castling & 0x4? "right" : "");
  printf("            Black may castle? %s %s\n",
         castling & 0x2? "left" : "", castling & 0x1? "right" : "");
  printf("\n");

  /* Print the board out */
  printf("+");
  for (int x = 0; x < 8; x++) printf("--");
  printf("-+\n");

  for (int y = 0; y < 8; y++) {
    printf("|");
    for (int x = 0; x < 8; x++) {
      int piece = board[y*8 + x],
          color = piece >> 8,
          kind  = piece & 0xFF;

      if (piece == -1) printf("  ");
      else printf(" %s", piece_to_str(kind, color));
    }
    printf(" |\n");
  }

  printf("+");
  for (int x = 0; x < 8; x++) printf("--");
  printf("-+\n");

  return 0;
}
FireFly
la source
I have no idea what your worst-case bit count is, but I suspect it's considerably more than 179 bits. For example, how would your algorithm handle this layout? (It is valid, by the way; I made it using the editor at chess.com)
squeamish ossifrage
@squeamishossifrage that one seems to require 217 bits: gist.github.com/FireyFly/8639791 (thanks for the test-case, I wanted to try it with a trickier one!)
FireFly
Hey, that's not bad. Keep going!
squeamish ossifrage
2

Max: 184 bits, Min: 75 bits

I was inspired by @AdamSpeight's Huffman coding for pieces to try crafting my own scheme. I doubt this will win, but it does have calculable limits.

This scheme treats the chess pieces as though there are 11 different types of pieces. I approximately followed the Huffman coding algorithm to group these classes by their frequencies and real types to generate the following encodings:

 Piece Class                | Frequency Per Team | Encoding
============================+====================+==========
 Pawn, normal               | 0 - 8              | 0
 Pawn, jumped two last turn | 0 - 1              | 1111
 Knight                     | 0 - 2              | 1000
 Bishop                     | 0 - 2              | 1001
 Queen-side rook, unmoved   | 0 - 1              | 10100
 Queen-side rook, moved     | 0 - 1              | 10101
 King-side rook, unmoved    | 0 - 1              | 10110
 King-side rook, moved      | 0 - 1              | 10111
 Queen                      | 0 - 1              | 1110
 King, unmoved              | 0 - 1              | 1100
 King, moved                | 0 - 1              | 1101

Each piece's code can be preceded by two bits to show to which team it belongs (10 for White, 11 for Black). 0 can be used to encode empty spaces. These ideas allow us to build a square-by-square encoding for the whole board using whatever procedure we want. I will assume the iteration order a1, b1, c1, ... f8, g8, h8. This means that just listing these codes as shown above encodes all the information except whose turn it is. A very simple chess engine can use the "classes" for the pawns, rooks, and kings to determine whether castling and en passant are legal. Furthermore, this scheme easily handles pawn promotions. If a player promotes a pawn to a rook, either the king or queen-side codes may be used so long as the "moved" variant is chosen.

Excepting pathological positions that I do not think could ever happen, the worst-case scenario for this encoding is when all pieces are still on the board and the previous player moved a pawn forward two spaces. As an example, the board below encodes 1. e4:

1101010010100010100110111010110010100110100010101101001001001100100100100000000000000101111000000000000000000011011011011011011011011011101001110001110011111101111001110011110001110110
===========================================================================
                              Black's move
1110100 111000 111001 111110 111100 111001 111000 1110110 | r n b q k b n r
    110    110    110    110    110    110    110     110 | p p p p p p p p
      0      0      0      0      0      0      0       0 | . . . . . . . .
      0      0      0      0      0      0      0       0 | . . . . . . . .
      0      0      0      0 101111      0      0       0 | . . . . P . . .
      0      0      0      0      0      0      0       0 | . . . . . . . .
    100    100    100    110      0    100    100     100 | P P P P . P P P
1010100 101000 101001 101110 101100 101001 101000 1010110 | R N B Q K B N R

This means that the worst-case encoding has 184 bits: 1 for indicating the player, 32 for the empty spaces, 45 for the unmoved pawns, 6 for the two-space-jumping pawn, 24 for the knights, 24 for the bishops, 28 for the rooks, 12 for the queens, and 12 for the kings.

As pieces moved about and are captured, the size of the encoding will drop. The best case scenario is represented by two kings alone on the board: 1 bit for indicating the player + 62 bits for indicating the empty squares + 12 bits for indicating the kings gives a minimum length of 75 bits.

I will come back and edit this response with some code that demonstrates this encoding in action later today or tomorrow. For now, I am curious to see what other people think of this encoding.

sadakatsu
la source
1
You can save bits on the rooks. You can determine queen- and king-side based on position, and you don't need to know which side a moved-rook came from. so you just need one bit of information on the rook - moved or unmoved.
Not that Charles
... And actually, storing that data on a piece level is easier to encode/decode, but if you just stored a marker at the start, you may save bits overall worst case (e.g., the marker 11001 means B'S MOVE W CAN KSC W CANT QSC B CANT KSC B CAN QSC). That is 4 bits instead of the 5 bits per side in your encoding, or 3 bits per side if you eliminate the side marker on the rooks. (KSC = King's-side castle. QSC = Queen's-side castle)
Not that Charles
Also, due to promotions, it is quite possible to have up to 9 Queens or 10 Knights (or any other non-regal, non-pawn piece)
Not that Charles
1

184 bits = 23 bytes worst case, and not too complicated:

A. Which squares occupied: 64 bits = 8 bytes B. Which colors for the <=32 occupied squares: 32 bits = 4 bytes And now using only the <=30 squares occupied by non-kings: B. use PNBRQ encoded in radix 5, for 5^30 possibilities; and 32*31*2*4*4*9 for king positions, mover-color, white & black castling rights, en passant square (among 8 possibilities plus none, a 9th); this number 5^30 * 32 * 31 * 2 * 4*4*9 = 266075134277343750000000000 = 2^87.782 fits in 88 bits for a total of 64+32+88=184 bits for the whole encoding.

This can be reduced, e.g. 32*31*2*4*4*9=285696 can be reduced to 2*(32*31+31*3+31*3+3*3)*6=14244, by using fact at most 6 en passant victim-candidates (including none), and encoding castling rights and king positions inside same set using fact castling rights only matter when king on initial square. This saves 4 bits.

I have reason to believe that it is not possible to achieve 18 bytes, i.e. the total number of legal chess positions exceeds 2^144.

Your 160-bit scheme is very ingenious but I think it needs to be given as encode/decode programs before I'll be completely confident about it.

Warren D. Smith
la source
1

171 bits worst case:

I've combined a couple of ideas, as well as some of my own thoughts.

We are going to start with a 64 bit board. Each bit represents an occupied space on the board. They fill along rows. So the start looks like: 1111 1111 1111 1111 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111

Now, each piece will be represented by 4 bits. 1st bit: color (0=white, 1=black) 2nd-4th bits: One of 8 types.

0=king, 1=queen, 2=bishop0, 3=knight, 4=rook, 5=pawn0, 6=pawn1, 7=bishop1

At the end we will include a bit designating the turn. 0=white, 1=black.

4bits*32 pieces=128 bits and I've already got 64+1 from turn and board. That gives a total of 128+64+1=193 I haven't even started with en passant or castling. Well over my limit with nothing -- not even turns. This is where the tricks start.

Okay -- you see those types above? Bishop0 and Bishop1? Pawn0 and pawn1? Those types are so designated, because they tell us a bit to insert after this piece. So Bishop0 means that after it, there will be a 0 -- i.e that the next piece is white. Bishop1 tells us the next piece is black, and the pawn0 and pawn1 work the same. (If this piece is the last piece being enumerated, then it tells us about the next turn instead).

My worst case involves all the pieces on the board, so with 16 pawns and 4 bishops, this saves me 20 bits. I'm down to 173.

Okay. For another bit in my worst case -- once there are 16 of a color encoded, we stop encoding color -- as we know it going forwards. My worst case now involves a white piece making it to the far corner with no captures. There, I save only one bit. 172.

I'm going to now change the order in which I name the pieces. We will name them along columns starting at the outside moving in. The board named at the beginning will stay the same, but when I place pieces on it, i start from whites bottom right, and go up that column. Then I jump to black's bottom right, and go up that column. I jump to white's bottom right unknown cell, and so forth -- this means my worst case is again the start. My reason has to do with my en passant trick, the next two bits I lose, and castling.

Now, for a pawn to be promoted (as has been discussed at length) a piece must be captured. Thus, when we know there are 32 pieces, we only need to denote 31 of them. The last piece is uniquely identified. As it turns out, for me, this only saves 2 bits -- because my last piece might be a pawn/bishop (which normally costs me 3 bits because I save one on the next piece) who's color is determined already and so was only 2 bits. I'm down to 170 bits.

When pawns get promoted, they simply change their type. For each piece that goes off the board I rid myself of (minimum) 3 bits, and two pawn promotions cost me 2 bits, so I'm declining (slowly) in promotions.

Castling. For castling to happen, neither king nor relevant rook may have moved. Thus, when a rook is capable of castling both it and the king will be in their original places. So, rooks capable of castling will be listed in the same place as kings of that color. As this is illegal (two kings or three kings of the same color on the board) and only can happen in three possible setups for each color (left, right, both rooks are listed as kings) -- decoding is utterly possible. So, for no bits we have encoded castling.

En Passant Here we will also use illegal positions. Only one pawn can be in danger of en passant at a time. It must be in its fourth row. The pawn that is vulnerable will be 'flipped' back to its home row -- switched with whatever is there. As that pawn is in its own first row -- an illegal position, the situation will be obvious to the decoder -- it will reverse the positions, and mark the pawn as vulnerable to en passant.

We needed to muck about with the order because the last piece has to be 'uniquely identified'. In a standard order, we wouldn't be able to tell if the rook in the back corner could castle -- that's not known. When we work from the outside in, we guarantee that wherever that rook is 'listed' be it in its own corner, or in the middle of the board because it was switched with an en passant vulnerable pawn, there will be a piece listed after it -- so we are told the rook's type. We know there will be a piece after it because, for a pawn to be vulnerable there must be a pawn to its inside (which will crucially be encoded later as per the instructions above).

Oh, and to help make sure that the worst case involves all pieces on the board, once we have less than 32 pieces, we can use the 64 bit board to identify turn as well as occupied positions by using 0's to represent pieces when its white's turn and 1's when it is blacks turn.

So we got en passant and castling for free. We picked up the last piece for free, though it took some finagling with order to make that play nice with the en passant and castling rules. We ditched 20 bits on the standard pieces. I believe the worst case here involves a midde white pawn moved forward and a black piece in between it and its queen while all pieces are on the board. Quick double check: first piece is captured -- call it a pawn, 3 bits off the board in the pawn, 3 bits on the board in the form of a last piece, one bit off in the turn marker disappearing. Promote two pawns 2 bits back on the board. Ah damn, I'm at 171.

EDIT I've added code for a (working?) decoder -- in R -- below. It takes vectors of booleans as input -- (sorry -- I'm not capable of coding well in anything that would let me actually use the bits) I've also included the start position.

separate = function(vec){
    #Using a boolean vector (sorry R doesn't handle bits well and this will build quickest)
    board = matrix(vec[1:64],8,8,byrow=T)
    npieces = min(sum(board),64-sum(board))
    n = length(vec)
    a = vec[65:n]
    counter = 0
    pieces = list()
    white = 0
    Letters=c(letters,LETTERS)
    for (i in 1:npieces){
        col = ifelse(a[1],"black",{white=white+1;"white"})
        typ = a[2:4]
        a=a[-(1:4)]
        num = 4*typ[1] + 2*typ[2] + typ[3]
        type = switch(letters[num+1],a="king",b="queen",c="knight",d="rook",e="bishop0",f="bishop1",g="pawn0",h="pawn1")
        if (num > 3) {
            if(num%%2){
                a = c(T,a)
            } else {
                a = c(F,a)
            }
            type = substr(type,1,nchar(type)-1)
        }
        pieces[[Letters[i]]] = list(color=col,type=type)
        if (length(pieces)==31&&npieces==32) {
            col = ifelse(16-white,{white=white+1;"white"},"black")
            type = "TBD"
            pieces[[Letters[i+1]]] = list(color=col,type=type)
            break
        }
    }

    if (npieces==32) {
        f=function(y){sum(sapply(pieces,function(x)x$type==y))}
        if (f("pawn")<16) {pieces[[32]]$type="pawn"}
        if (f("bishop")<4) {pieces[[32]]$type="bishop"}
        if (f("knight")<4) {pieces[[32]]$type="knight"}
        if (f("queen")<2)  {pieces[[32]]$type="queen"}
        if (f("king")<2)   {pieces[[32]]$type="king"}
        if (f("rook")<(6-f("king"))) {pieces[[32]]$type="rook"}
    }
    return(list(board,pieces,turn=ifelse(a[length(a)],"black","white")))
}


fillboard = function(out) {
    board = out[[1]]
    pieces = out[[2]]
    turn = out[[3]]
    lpieces = lapply(pieces,function(x) paste(substr(x$color,1,1),x$type))
    game = matrix("     ",8,8)
    #Start with corners.
    a = c(1,57,8,64)
    #Then kings
    b = c(25,32)
    #Then rooks in en passant
    c = c(4,60,5,61)
    #Then kings in en passant
    d = 28:29
    exceptions = list(a,b,c,d)
    for (places in exceptions) {
        c= which(board[places])
        if (length(c)) {
            repl = lpieces[1:length(c)]
            game[places[c]] = unlist(repl)
            board[places] = F
            lpieces = lpieces[-(1:length(c))]
        }
    }
    #Loop through rows.
    for (i in c(1:4,8:5)) {
        j = which(board[i,])
        if (length(j)) {
            repl = lpieces[1:length(j)]
            game[i,j] = unlist(repl)
            board[i,j] = F
            lpieces = lpieces[-(1:length(j))]
        }
    }
    return(matrix(unlist(game),8,8,F))
}

swapillegal = function(matr) {
    mat = matr
    if (any(mat[8,]=="b pawn")) {
        j = which(mat[8,]=="b pawn")
        mat[8,j] = mat[5,j]
        mat[5,j] = "b pawn-e"
    }
    if (any(mat[1,]=="w pawn")) {
        j = which(mat[1,]=="w pawn")
        mat[1,j] = mat[4,j]
        mat[4,j] = "w pawn-e"
    }

    if (sum(mat[8,]=="b king") > 1) {
        j = which(mat[8,-4]=="b king")
        j[j==7] = 8
        mat[8,j] = "b rook-c"
    }
    if (sum(mat[1,]=="w king") >1) {
        j = which(mat[1,-4]=="w king")
        j[j==7] = 8
        mat[1,j] = "w rook-c"
    }
    return(mat)
}

decode = function(vec) {
    a = separate(vec)
    b = fillboard(a)
    c = swapillegal(b)
    list(board=c,turn=a[[3]])
}


startboard = c(rep(T,16),rep(F,32),rep(T,16))
#Re-ordering -- first spots will be the corners. Then kings. then en passant positions of those spots
pieces = c(F,F,T,T,F,F,T,T,T,F,T,T,T,F,T,T,F,F,F,F,T,F,F,F,F,F,T,F,F,T,F,F,F,F,T,F,T,F,F,F,T,F,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,T,F,T,F,T,T,F,T,F,F,T,T,T,F,T,F,T,F,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F)
########## w rook -w rook -B rook -B rook -W king -B king -w kni  -w bi 0 -w Q  -w Bi 0 -w kni-w p0   - 2   -   3 -  4  - 5   -  6  -  7  -w p1 -b kni-b bi1  -b q  -b bi1  -b kni-b p1   -2    - 3   - 4   - 5   - 6   - b p0- implicit b p0.
#After the kings come all the prior en passant positions. which are empty at start. Then we start at whites bottom right, and move across rows to middle. Then we go to blacks bottom left, and fill across to the middle.
#Changing start to properly encode rooks for castling
newpieces= c(F,F,F,F,F,F,F,F,T,F,F,F,T,F,F,F ,pieces[-(1:16)])
test2 = decode(c(startboard,newpieces))

This code builds 4 functions. One that does some basic separation of the pieces and the board structure, as well as figuring out piece type and any 'implicit' pieces. The next function fills in the board structure with those pieces in a slightly bizzarre (and different from my initial algorithm's) order [explained in code comments]. The next function takes the filled in board and detects any illegal positions -- it then fixes them and renames pawns that are vulnerable to en passant "x pawn-e" and any rooks that can castle "x rook-c". The final function is a wrapper that runs those functions in order and gives an output which is the current board as well as the turn.

I've also included the encoding of the start position (though to see it you will have to call c(startboard,newpieces) and the code calls the wrapper function on that position.

user5957401
la source
This is interesting. I'd love to see a working implementation as a proof of concept.
Mego
Implementation is usually a few steps beyond proof of concept, but I've added a decoder. It is in R and thus uses Booleans rather than bits (sorry -- didn't want to use too much time). But I believe it should be proof of concept.
user5957401
0

229 / 226 bits

This turns out not to be very successful, but might save other people going down the same path.

The simple version:

  • 1 bit for whose turn it is
  • 4 bits for the four castling possibilities
  • 3 bits for the en passant possibilities. This has more depth that I at first understood. En passant must be done by a pawn moving from the same rank (row) as the pawn which is captured. Case analysis indicates that once we know how many pawns of the colour which last moved have advanced exactly two squares, there will be at most 6 en passant cases (including the case that there is no pawn vulnerable to en passant). The worse case is 5 pawns (potentially all vulnerable: e.g. PpPPpPPp has five vulnerable Ps). With 6 pawns there are at most 2 enemy pawns in the same rank, each of which can threaten at most 2 pawns en passant. Therefore we need ceil(lg 6) = 3 bits here.

Then the board. The board has 64 squares, so a square index can be encoded in 6 bits. We list the men by rank, alternating colours, starting with the king.

  • 6 bits: position of white king. (Guaranteed to be on the board).
  • 6 bits: position of black king. (Guaranteed to be on the board. In the worst case that the white king is in a corner, there are 60 possible places he could be; in the best case that the white isn't on an edge, there are 55).
  • 6 bits: position of white queen. If there is no white queen, repeat the white king's position as a signal.
  • For each additional white queen, a 1 bit followed by 6 bits for the position.
  • A 0 bit.
  • Ditto for black queen(s).
  • Similar process for rooks, bishops, knights, and pawns, although we can skip the pawns for a colour if we already have 16 men of that colour accounted for.
  • Delete the final 0 bit.

This costs a certain 12 bits for the kings, and 2*7*5-1 = 69 bits for the other men. In the worst case that all 32 men are on the board, it costs 7 bits for each man other than the kings, for a total of 12 + 7*30 - 1 = 221 bits. So with the initial 8 bits for global state we have 229 bits.


The advanced version:

By using arithmetic coding we can operate with lg num_possibilities rather than ceil(lg num_possibilities) and just take one ceil at the end.

  • 1 bit for whose turn it is
  • 4 bits for the four castling possibilities
  • lg 6 bits for the en passant possibilities.
  • 6 bits for the white king
  • lg 60 bits (worst case) for the black king
  • lg 63 bits (because I don't want to complicate it to the level of looking at checks) for the white queen, using the white king's position if there is none
  • For each additional white queen, a 1 bit followed by lg 62, lg 61, etc. bits for her position.
  • A 0 bit.
  • lg 63 bits (or fewer, if there were any white queens) for the black queen.
  • etc.

The nth man who's actually present has 66-n possible values. If a type is absent for a colour, we spent 66-#men so far bits in recording that (plus one bit for the separator). The extreme cases are:

  1. All men present, including at least one unpromoted pawn from each side. We spend 5+lg 6 on global state, 6+lg 60 on the kings, 29 on separator bits, and SUM_{n=32}^{63} lg n bits on positions. Grand total: ceil(225.82) bits. Disappointing.
  2. Only unpromoted pawns left. We spend 5+lg 6 on global state, 6+lg 60 on the kings, 29 on separator bits, 8*lg 63 saying that there are no other pieces, and SUM_{n=48}^{63} lg n on positions of the pawns. Grand total: ceil(188.94) bits. Better - by saving the second rook, knight, and bishop for each side we've pulled ahead a bit.

So the worst case seems likely to be 226 bits, for a measly saving of 3.

We can definitely do better in the average case by encoding pawns before pieces, since they're restricted to 48 squares rather than the full 64. However, in the worst case that all men are on the board and all pawns have been promoted I think this would end up costing 2 bits more because we would need a "no pawns" flag rather than being able to count the men.

Peter Taylor
la source
0

This is a discussion topic in Chess circles.

Here is one very simple proof with 164 bits https://groups.google.com/forum/#!topic/rec.games.chess.computer/vmvI0ePH2kI 155 is shown here http://homepages.cwi.nl/~tromp/chess/chess.html

Over simplified strategy:

  • Limit pawns to the place where pawns may be found
  • Limit armies to consider original pieces and possible pawn promotions
  • Think hard about promotions and situations where promotions are not possible
William Entriken
la source
2
Link-only answers are not good answers. You should provide the complete answer within your post, not just some links to the answer. And if your answer is not your own work you should probably make your post a community wiki.
pastebin.com slash 0mr8spkT
Post is original. Adding details to answer.
William Entriken
1
This is an example of why you include the content in your answer. The 2nd link is dead. Is this it? tromp.github.io/chess/chess.html
mbomb007
-2

Min: 0 bits

Max: 1734 243 bits (4.335 4.401 bits/board amortized)

Expected: 351 177 bits (4.376 4.430 bits/board amortized)

Since I can determine the input and output however I want I decided to go with encoding the history of the game up until this point. One advantage is that the additional information of who's turn it is, en-passant, and who has the ability to castle where can be derived and not encoded.

Attempt 1:

Naively I thought that I can encode each move in 12 bits, 4 triplets of the form (start x, start y, end x, end y) where each is 3 bits.

We would assume the starting position and move pieces from there with white going first. The board is arranged such that ( 0 , 0 ) is white's lower left corner.

For example the game:

  e4    e5
 Nf3    f6
Nxe5  fxe5
...    ...

Would be encoded as:

100001 100010 100110 100100
110000 101010 101110 101101
101010 100100 101101 100100
...

This leads to an encoding of 12 m bits where m is the number of moves made

On one hand this could get really big, on the other hand you can consider each move to be it's own game so each encoding really encodes m "chess boards". If you amortized this you get that each "chess board" is 12 bits. But I feel this is a bit cheating...

Attempt 2:

I realized that each move in the previous attempt encodes many illegal moves. So I decided to only encode legal moves. We enumerate possible moves as follows, number each square such that ( 0 , 0 ) → 0 , ( 1 , 0 ) → 1 , ( x , y ) → x + 8 y . Iterate through the tiles and check if a piece is there and if it can move. If so add the positions it can go to to a list. Choose the list index that is the move you want to make. Add that number to the running total of moves weighted by 1 plus the number of possible moves.

Example as above: From the starting position the first piece that can move is the knight on square 1, it can move to square 16 or 18, so add those to the list [(1,16),(1,18)]. Next is the knight on square 6, add it's moves. Overall we get:

[(1,16),(1,18),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move ( 12 , 28 ) , we encode this as 13 in base 20 since there are 20 possible moves.

So now we get the game number g0 = 13

Next we do the same for black except we number the tiles in reverse (to make it easier, not required) to get the list of moves:

[(1,16),(1,18),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move ( 11 , 27 ) , we encode this as 11 in base 20 since there are 20 possible moves.

So now we get the game number g1 = ( 11 ⋅ 20 ) + 13 = 233

Next we get the following list of moves for white:

[(1,16),(1,18),(3,12),(3,21),(3,30),(3,39),(4,12),(5,12),(5,19),(5,26),(5,33),(5,40),(6,12),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27)(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move ( 6 , 21 ) , we encode this as 13 in base 29 since there are 29 possible moves.

So now we get the game number g2 = ( ( 13 ⋅ 20 ) + 11 ) 20 + 13 = 5433

Next we get the following list of moves for black: [(1,11),(1,16),(1,18),(2,11),(2,20),(2,29),(2,38),(2,47),(3,11),(4,11),(4,18),(4,25),(4,32),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move $(10, 18)$ ( 10 , 18 )

So now we get the game number g3 = ( ( ( 19 ⋅ 29 + 13 ) 20 ) + 11 ) 20 + 13 = 225833

And continue this process for all the remaining moves. You can think of g as the function g ( x , y , z ) = x y + z . Thus g0 = g ( 1 , 1 , 13 ) , g1 = g ( g ( 1 , 1 , 11 ) , 20 , 13 ) , g2 = g ( g ( g ( 1 , 1 , 13 ) , 20 , 11 ) , 20 , 13 ) , g3 = g ( g ( g ( g ( 1 , 1 , 19 ) , 29 , 13 ) , 20 , 11 ) , 20 , 13 )

To decode a game number g0, we start at the initial position and enumerate all possible moves. Then we compute g1 = g0 // l, m0 = g0 % l, where l is the number of possible moves, '//' is the integer division operator and '%' is the modulus operator. It should hold that g0 = g1 + m0. Next we make the move m0 and repeat.

From the example above if g0 = 225833 then g1 = 225833 // 20 = 11291 and m0 = 225833 % 20= 13. Next g2 = 11291 // 20 = 564 and m1 = 11291 % 20 = 11. Then g3 = 11291 // 20 = 564 and m2 = 11291 % 20 = 11. Therefore g4 = 564 // 29 = 19 and_m_3 = 564 % 29 = 13. Finally g5= 19 // 29 = 0 and m4 = 19 % 29 = 19.

So how many bits are used to encode a game this way?

For simplicity, let's say there are always 20 moves each turn and for the worst case scenario we always pick the biggest one, 19. The number we will get is 19 ⋅ 20m

+ 19 ⋅ 20m-1 + 19 ⋅ 20m-2 + ⋯ + 19 ⋅ 20 + 19 = 20m+1 − 1 where _m is the number of moves. To encode 20m+1 − 1 we need about log2 ( 20m+1 ) bits which is about ( m + 1 ) ∗ log2 ( 20 ) = 4.3219 ∗ ( m + 1 )

On average m = 80 (40 moves per player) so this would take 351 bits to encode. If we were recording many games we would need a universal encoding since we don't know how many bits each number will need

Worst case when m = 400 (200 moves per player) so this would take 1734 bits to encode.

Note that the position we want to encode must be given to us via the shortest path to get there by following the rules. For example the game theorized here doesn't need m = 11741 to encode the final position. Instead we run a Breadth-First search to find the shortest path to that position and encode that instead. I don't know how deep we would need to go to enumerate all chess positions, but I suspect that 400 is an overestimate.

Quick calculation:

There are 12 unique pieces or the square can be empty so to position them on a chessboard is 1364. This is a vast overestimate since it includes many invalid positions. When we are m moves into the game we have created about 20m positions. So we are looking for when 20m = 1364. Log both sides to get m = 64 * log20(13) = 54.797. This shows that we should be able to get to any position in 55 moves.

Now that I calculated the worst case to be m = 55 not m = 400 I will edit my results. To encode a position where m = 55 takes 243 bits. I'm going to also say that the average case is around m = 40 which takes 177 bits to encode.

If we use the amortization argument from earlier we are encoding 400 "chess boards" in 1734 bits so we get that each "chess board" takes up 4.335 bits in the worst case.

Note that g = 0 denotes a valid game, the one where the piece on the lowest square moves to the lowest square it can.

Additional Notes:

If you want to refer to a specific position in the game you may need to encode the index. This can be added either manually e.g concatenate the index to the game, or add an additional "end" move as the last possible move each turn. This can now account for players conceding, or 2 in a row to denote the players agreed to a draw. This is only necessary if the game did not end in a checkmate or stalemate based on the position, in this case it is implied. In this case it brings the number of bits needed on average to 356 and in the worst case 1762.

edggy
la source
1
Your “amortization” argument does not work. The goal is to encode a board given by the user. You do not get to divide the cost of encoding this board among the 400 auxiliary boards you might happen to generate along the way. (This would only be okay if those 400 boards were allowed to be independently chosen by the user, and not constrained by the requirement to form a chess game.) Also, a chess game can theoretically take many thousands of moves, and the OP is clear about being interested in the worst case.
Anders Kaseorg
@AndersKaseorg: Very true. It really depends on the objective. If you are trying to store an entire game with all of the other algorithms it would take m*c bytes where m is the number of moves and c is the size of their output. So if you are trying to store an entire 80 move game using the 160 bit solution it would take 12800 bits while mine would only take 351. For the sake of the competition I admit that you are correct, but I thought it was a nice property to point out since it is very common to want to store entire games and not just boards.
edggy
The objective is unambiguous. “The goal is to make the smallest representation of a chessboard that could be used (once decoded) to determine all movement possibilities for a player on that turn. … Your score is determined worst-case scenario - maximum possible size in bits.”
Anders Kaseorg
@AndersKaseorg: I never claimed it was ambiguous. I just decided to take a different approach. One thing to note is in order to find the smallest board using my algorithm I need the smallest path to get to this position. For example in the 11741 turn game that you linked to get to the same board position I don't need to follow that path if all we care about is the board. So in order to encode the linked game I just find the shortest game which is left with 2 kings on those squares which may only be 200 turns or less. This can be done with a depth-first search.
edggy
Using a shorter equivalent game is fine, if you can actually prove that every position is reachable in 200 turns or less (right now this looks like just a guess). You’ll also need to account for chess positions with more than 100 legal moves, unless you can prove that they don’t arise within your construction. Dividing your result by the number of moves in the game is still not allowed by the rules of this challenge.
Anders Kaseorg