Écrivez un algorithme ou un programme capable d’encoder et de décoder un échiquier. L'objectif est de faire la plus petite représentation possible d'un échiquier (une fois décodé) pour déterminer toutes les possibilités de déplacement pour un joueur de ce tour.
Le codage doit pouvoir montrer:
- A qui c'est le tour.
- Si le joueur peut chiner de chaque côté.
- Si le joueur peut effectuer un en-passant, et si oui, lequel de ses pions?
- Positions de toutes les pièces.
Remarque importante sur les castings: si les Blancs déplacent leur roi d’un tour, puis de le ramener d’un tour, il doit être clair qu’ils ne peuvent faire de château d’un côté ou de l’autre. Les mêmes iraient s'ils bougeaient leur tour gauche ou droite. Bien que la carte soit visuellement dans le même état qu’il ya deux tours, l’état du jeu a changé. Plus d'infos ici: http://en.wikipedia.org/wiki/Chess#Castling
Note importante sur en-passant: C'est aussi un coup sensible au tour par tour. Lisez les règles pour plus d'informations. http://en.wikipedia.org/wiki/Chess#En_passant
Déterminez les entrées et les sorties au besoin. Des accessoires majeurs à qui peut le compresser le plus!
Votre score est déterminé dans le pire des cas - taille maximale possible en bits. Assurez-vous de montrer comment vous avez calculé ce nombre et ce que vous avez comptabilisé. Tirez pour le pire des cas!
Réponses:
Min: 12 bits
Max:
Avg:
Avait et pensé hier soir que je pourrais éventuellement le rendre encore plus petit.
Le résultat est une taille impressionnante de 12 bits !
Alors qu'en est-il K +1 autre type de pièce?
Il 2 arrangement possible du sous-arbre.
Les deux résultent dans les mêmes tailles de bits pour toutes les pièces. Donc, cela ne fait aucune différence pour ce que nous utilisons, je vais choisir le premier.
Donc sur King +2 autres types de pièces
Il y a 5 sous-arbres possibles (je vais utiliser 1 et 2 pour indiquer laquelle des pièces.)
Nous aurons donc besoin de 3 bits pour encoder le sous-arbre à utiliser.
Toujours faire l'analyse pour plus de morceaux
+3 Autre
+4 Autre
+5 Autre
Max: 208?
Est-il possible de coder tous ces sous-arbres sur 9 bits?
Si nous totalisons tous les sous-arbres possibles, nous obtenons 392 sous-arbres possibles.
Utilisation de Freq ID
Depuis 164603 fréquences de morceaux uniques .
(+000) (+00) Castling
Max: = 204 bits
rev 3
Min: 82 Max: 199 Moy.: 160
Enfin, nous avons effectué des analyses pour trouver la taille de bit maximale. Avec codage Huffman optimal pour chacune des fréquences de morceaux uniques .
Notez que c’est la plus mauvaise taille possible, celle des bits de la colonne En-Passant si le nombre de pions est supérieur à un. Quelles que soient la couleur et la position de ces pions, il est possible que certaines planches soient plus petites de 3 bits.
En outre, il n'y a que 144 tailles différentes (le pire des cas) pour la taille de la carte.
75 - 216 bits (v2) v1 La taille minimale est de 98 bits (12,25 octets), seuls les deux rois de la carte.
La taille maximale n'est que de 216 bits (27 octets).
En moyenne, la taille sera d'environ 157 bits (19,625 octets).
Pièces
Pour encoder le tableau, j'utilise un schéma d'encodage d'arborescence binaire. Une case vide est la plus fréquente parmi toutes celles où entre 32 et 62 apparitions. Viennent ensuite les pions, puis les tours, les chevaliers, les évêques et les moins fréquents sont la reine et le roi.
La carte de départ peut être encodée en seulement 166 bits (20,75 octets)
Pour indiquer qui se déplace, il ne faut qu'un seul bit
Castling peut être codé en 4 bits.
J'ai donc utilisé 171 bits (21,375 octets)
En-Passe peut être codé en seulement 16 bits (2 octets)
Donc, au total, c'est 187 bits (23,375 octets).
Disposition
Pas encore écrit de code.
Notez que 3 des bits non utilisés. Donc, le maximum est de 213 bits .
Améliorations possibles
1) Réduction de la forme de bloc d'en-tête de 24 à 8 bits (avec la suggestion de @Peter Taylor)
2) tête de longueur variable
Un petit en-tête fixe 4 bits
Bloc suivant de bits supplémentaires (si le castling est toujours possible)
Bloc suivant de bits supplémentaires (si des pions sont présents)
Alors maintenant, j'ai un en-tête de longueur variable 4 - 11 bits
3) Utilisez un schéma de codage différent en fonction des pièces laissées sur le tableau.
En modifiant l’encodage de l’arbre en fonction des éléments présents sur le tableau et de leur fréquence.
Un encodage possible pour un état final (No Pawns)
Ce qui représente environ 4 bits par pièce.
Quels types de pièces sont présentes sur le plateau?
La permutation est de longueur variable 0-5 bits. S'il ne reste qu'un type de pièce, ne l'incluez pas.
Quelle permutation de ces pièces à utiliser pour l'arbre? Ceci est la factorielle du nombre de pièces dans l'exemple ci-dessus c'est 5 pièces, donc 120 permutations possibles qui peuvent être codées.
Rappelez-vous qu’il existe des bits d’addition pour les carrés et les couleurs vides.
Exemples
Donnons un exemple de QK restant
81 bits au total
Donnons et exemple de seulement les rois restants
Mettre tous ensemble
Donc, je calcule le plus petit encodage pour la carte à 75bits (9 bits 3 bits)
Il reste encore à calculer comment ce schéma de codage affecte la taille maximale.
Amélioration 4
Réduisez le nombre de bits pour le castling à seulement 2 bits. Il suffit de lancer pour le joueur qui a son tour.
En y réfléchissant, il serait peut-être préférable d'inclure simplement les 2 bits à l'intérieur du bloc d'en-tête.
la source
1111
pour "non en passant possible" ou la colonne sous la forme d'un nombre binaire sinon).192 bits (pire des cas)
Voici un schéma de stockage très simple qui devrait faire face à des promotions de pions arbitraires, et ne nécessite jamais plus de 64 + 4 × 32 = 192 bits:
Les 64 premiers bits stockent un bitboard qui indique où sont les morceaux (mais pas ce qu’ils sont). C'est-à-dire que nous stockons un bit pour chaque carré de l'échiquier (en partant du carré a1, puis de b1, c1, etc. jusqu'au carré h8), de sorte qu'un carré vide est représenté par 0 et un carré occupé par 1.
Ensuite, pour chacune des cases marquées comme occupées sur le panneau d'affichage, nous stockons un quartet de 4 bits codant la pièce sur cette case. Le premier des quatre bits code la couleur de la pièce (0 = blanc, 1 = noir), tandis que les trois autres bits indiquent le type de la pièce:
Notez les deux encodages pour le roi, utilisés pour déterminer le tour de joueur à déplacer. (En fait, comme il y a toujours deux rois sur le tableau, ce qui permet quatre combinaisons des codes 5 et 6, nous pourrions coder assez facilement un second bit d'information ici.)
Pour encoder les informations supplémentaires nécessaires aux règles en passant et castling, nous introduisons un type de pièce supplémentaire, qui désigne un pion ou une tour, en fonction de la ligne sur laquelle il apparaît:
Mettre tous ensemble:
Le nombre total de bits nécessaires pour coder l'état de la carte est donc de 64 + 4 × # de pièces à bord. Comme il ne peut jamais y avoir plus de 32 morceaux sur la carte, la longueur maximale de ce codage est de 192 bits.
Par exemple, en utilisant le codage décrit ci-dessus, l’état initial de la carte serait codé comme suit (espace inséré pour la lisibilité):
ou, en hexadécimal:
la source
160 bits worst case
After posting my previous answer 22 bytes, I began to wonder if we could get down to 21 bytes. However when I saw Peter Taylor's amazing 166 bytes I thought "Hang on, it looks like five 32-bit words could be possible!"
So after quite a lot of thought, I came up with this: 159.91936391 bytes (a pretty tight fit!) This level of compression would need a fairly complicated program but I have thought about how to make it run in reasonable time.
This is going to be a long post, so please bear with me, I will post what I can today and add a few bits of code soon.
So, here's how to do it:
En Passant and castling encoded by illegal positions (0 bits)
En Passant
As mentioned in other answers, there are a maximum of 5 possible squares on which a pawn vulnerable to en passant can stand. These are the squares next to the pawns of the player whose turn it is.
To encode this, the pawn vulnerable to en passant is exchanged with whatever is on one of the squares in the first or last row. This may be either a man or an empty square. This produces an illegal position, as pawns cannot be on these rows. The decoder must return the pawn to its correct position, extracting the en passant information.
In order for this to not interfere with the castling encoding, it is important that the squares on which the kings stand at the start of the game are not disturbed, and that the en passant encoding procedure does not place the kings next to each other, which would be an illegal king position. To satisfy the second of these points, the encoder has two choices as to which square it exchanges the en passant pawn. First choice square for each of the up to 5 pawns are A8,B8,C8,G8,H8. Second choice: A1,B1,C1,G1,H1.
Castling
A king who is allowed to castle is by definition, still on his initial square. With the white king on his initial square, there are a total of 63 squares where the black king can stand, 58 of which are legal (he is not allowed to move right next to the white king because he would put himself in check.) If the white king is allowed to castle, he is either allowed to castle with his left rook, his right rook, or both. There are thus 3x58=174 possibilities where the white king can castle, another 174 where the black king can castle, and a further 3x3=9 where both can castle, a total of 357.
There are 420 illegal arrangements of the two kings where they are on adjacent squares: 3x4=12 when the white king is in the corner, 5x24=120 when he is on the edge, and 8x36=288 when he is on another square. Therefore there are easily enough illegal positions to encode all possible castling possibilities.
If at least one king is allowed to castle, encoder would look up the castling data and the positional data of those kings not allowed to castle in a table (or alternatively, use an algorithm which I won't specify here) and produce an illegal position of the two kings. It would then exchange the kings with whatever happened to be on these squares.
It is important that this is encoded after and decoded before the en passant, otherwise there are some potential interferences.
Comparison
So, so far I have used no bits! Looking at Peter's answer, I still have the following to encode:
This is for the worst case of 29 men (see Peter's answer.) Below I will show how I make a marginal improvement (at least for the case of 29 men) on both of the points marked in **.
Which squares are occupied / whose turn is it?
The easy way to encode which squares are occupied is with a 64 bit grid. This also tells us how many squares are occupied. However it is somewhat wasteful because it is not possible for there to be more than 32 squares occupied. My solution is to use 1's to encode for the occupied squares when it is White's turn and 0's to encode for occupied squares when it is Black's turn. Now all combinations are used and there is no waste.
Thus we save on a bit for storing the turn: less than 32 1's, it is white's turn, more than 32 1's, it is black's turn. The only ambiguous case is when all the men are on the board and there are 32 1's and 32 0's. Therefore an extra bit is needed for this case only. As there can be no promotions until a capture has occurred, this extra bit does not affect the worst case overall (which occurs with 3 men captured and 29 remaining.)
Colour of the men occupying the squares
We know from the above how many men there are. The following extract of Pascal's triangle tells how many possibilities there are for the different distributions of black and white. For example, for 3 men, the possibilities are: 3 black men (1 permutation) 2 black, 1 white, (3 permutations), 1 black, 2 white (3 permutations), 3 white (1 permutation.) The total is 23=8. In general, for the lower numbers of men there are 2n possibilities. However the all black and all white possibilities are illegal (at least the king of each side must be on the board) so the actual number of legal permutations is 2n-2 (ignore the 1's on the Pascals triangle.)
For more than 16 total men, there is an additional constraint in that there can be no more than 16 men of each colour on the board. Therefore when all 32 men are on the board there must be 16 of each and the total number possibilities is 601080390 which is quite a bit less than 232.
The number of possibilities can be found by summing the "rows" of this extract of pascals's triangle (by which I mean the NE-SW diagonals of the table , because I rotated the triangle 45 degrees anticlockwise for convenient presentation. The number of bits needed to encode the turn, occupied squares and colour of the men is therefore as follows:
Up to 25 men: slightly less than 64+(number of men)
For more than 25 men:
For the two colours, which men and in which order?
As per previous answers, no pawns can be promoted until a capture has occurred, because each pawn is blocked by a pawn of the opposite colour on the same column. Peter's answer considered (as an upper bound) that every capture could lead to one promotion for the side being captured and two for the side capturing. However we can split this into several cases:
Black pawn captures white pawn: Now the capturing pawn is free to promote as he is now on a different column. His colleague on the same column can also promote. The black pawn on the white pawn's original column can also promote. This is the only case that permits 3 promotions.
Black pawn goes past white pawn on adjacent column and then captures white piece (other than pawn) behind. This enables the capturing pawn and the white pawn who was on the original column to promote. One promotion for each side.
White pawn is captured by piece (other than pawn.) This would normally allow one promotion for Black only. The only exception is when it liberates a blocked up pawn formation which was already caused by several captures moving pawns onto the same column.
So, as a base case, we can consider that each capture permits one promotion each for both sides. In the case that the man captured is a pawn, there may be an additional promotion for the capturing side.
I have written a program similar to Peter's. It is somewhat cruder, but it has an important addition: it can calculate the number of permutations possible when a player starts with less than the normal 8 pawns. Here is some data produced by the program:
We can see that for a case like 8 pawns, 15 men, 0 promotions, the number of permutations is only slightly less than for 8 pawns 16 men, 0 promotions. However if we consider a case like 7 pawns, 15 men, 0 promotions (which is the same as considering that the man captured was definitely a pawn) we get about half the number of permutations.
So, For the case when Black has 16 men and white has 15 men, we can consider an upper bound estimate of 2 promotions for Black and one promotion for White:
However we can do better if we proceed as follows.
A. Consider one promotion each for Black and White assuming that the man White has lost could be of any type:
B. Consider the additional possibilities for Black if he has two promotions, multiplied by only those possibilities for White in which he has lost a pawn.
Adding these two together we get 2.25072E+18, which is a smaller number than 3.55766E+18. All the possibilities for up to 3 captures (29 men remaining) are listed below.
So for the worst case of one side with 15 men and the other with 14 men, we need 67.804 bits.
Adding this to the 92.116 bits required to specify which squares and which colour, we get a total of 67.804+92.116=159.92 bits.
la source
177 bits worst case
This algoritm, while hardly simple, gives a 177 bits worst case (184b=23B in practice), 13b (16b=2B) best case scenario when there's just 2 kings left.
la source
sum_{i=0}^{15} sum_{j=0}^{15} 62! / (i! j! (62-i-j)!) < 2^87.45
.166 bits
1
bit: whose turn is it?2
bits: which castling options are open? (NB on close reading of the question, it's only necessary to record castling options for the player whose turn it is).lg 6 ~= 2.585
bits: which en passant options are open? (See my other answer)lg sum_{i=1}^{16} sum_{j=1}^{16} 64! / (i! j! (64-i-j)! = lg 3629590441720924477681996172 ~= 91.552
bits: which squares are occupied by men of which colour?lg 451366131803622235200 ~= 68.613
bits to indicate which men and in which order (see below)Using arithmetic encoding (since at each step we're applying a uniform distribution) we can achieve
ceil(3 + 2.585 + 91.552 + 68.613) = 166
bits.The encoding for the men: given that we know how many men of a given colour there are, we can easily enumerate all possible distributions/multisets of men (e.g. with 5 men we might have one King, one Queen, two Rooks, and a Pawn) and then we can consider all possible permutations of each distribution.
However, we can do even better by taking into account interdependencies. I'm only doing this on a very basic level: how many possible promotions? A pawn can only become "passed" and able to promote in three ways: it captures and so moves into a different column; or its opposing pawn captures and so moves into a different column; or its opposing pawn is captured. Thus a capture for white potentially creates two passed pawns for white and one for black.
We can build up a partial table of the upper bounds on promotions:
We can also calculate the number of permutations given that a player has
N
men and no more thanP
promoted pawns:Combining the two, we can get the number of bits of required to specify both permutations given the numbers of men on both sides:
If it's not in this section of the table, we can just assume that both sides have up to 8 promotions and we're still doing better than the worst case, which is 68.613 bits when one has 14 men and the other has 15 men.
Note that this is still a long way from being a perfect representation, because it allows many many illegal positions.
Code for calculating the permutation table:
la source
<
rather than<=
in the output loop of my program. Thanks for pointing it out. I could still recover the previous score by special-casing all 32 men being on the board, but I won't do that right now.178 bits (174 at a pinch!) worst case
Hi, just getting back into coding, which I haven't really done since college. I saw this site and thought this looked interesting. I did a little theroretical check and it appears at least 146 bits are needed for a perfect algorithm, probably quite a few more (I will explain in the comments when I have a moment.)
Anyway, this is the way I structure the data. The basic concept comes in at 178 bits, but with some jiggery pokery it can brought down to 174 (that's 21 3/4 bytes). 175 is slight easier to program, is more human readable, and still within 22 bytes.
A) Position of both kings: 6 bits each for white and black 12 BITS
B) Of the remaining 62 squares, which are occupied? A matrix of 62 BITS
C) Whose turn is it? 1 BIT
TOTAL SO FAR: 75 BITS
D) En Passant. If it is white’s turn to move, up to 5 black pawns may look like they could be captured En Passant. The black pawn has to be on row 5 (from bottom to top starting at zero), and have a white pawn next to it. One situation with maximum number of possible captures looks like this:
B W B B W B B W
If there were 6 black pawns on row 5, white would only have 2 squares to stand on and could only threaten 4 black pawns, so it is not possible to have more than 5 black pawns apparently under threat from En passant at the same time. So we need a number 1 to 5 indicating which of the (up to 5) pawns on row 5 that has a hostile (in this case white) pawn next to it was advanced 2 squares on the last turn (or, zero if no pawn in this situation was moved in this way on the last turn.)
E) Of the up to 30 occupied squares (not including kings) what do they contain?
There are 10 possibilities, each represented by a decimal number.
The least significant bit represents the colour.
Hence even numbers are white, odd numbers are black.
White/Black
Pawn 0/1 (or Rook that is allowed to castle*)
Knight 2/3
Bishop 4/5
Rook 6/7
Queen 8/9
*A rook that is allowed to castle (and therefore has never been moved from the first or last row) is represented by 0 or 1 instead of 6 or 7. It cannot be confused with a pawn, because pawns cannot be found on the first or last row.
This gives a decimal number of up to 30 digits, which we can multiply by 6 and then add the code for En passant. The resulting number will fit into 103 bits, which when added to the 75 mentioned above is 103+75=178 bits. Actually, if we simply multiply by 10 instead of 6, it makes no difference to the number of bits used, and decoding is easier.
This is just 2 bits more than 22 bytes. However we can push it down to 174 bits, as explained below.
If no piece has been captured, then it is impossible for a pawn to be promoted.
The proof is as follows. Imagine white is obsessed with promoting his pawn on (for example) column E, from the very start of the game. There is a black pawn opposite this pawn that is in the way. Therefore to promote this pawn, one of the following must happen:
1) The black pawn is captured.
2) The black pawn captures another piece and therefore moves out of the way.
3) the white pawn captures a pawn on an adjacent column such as column D.
4) the white pawn passes (or is passed by) a black pawn on an adjacent column and then captures a piece on that same adjacent column, causing the white pawn to change column.
Case 4 is the most interesting, because it is not just the white pawn that started on column E that now has a clear path to promotion. The black pawn on column that remains on column E can also promote. Therefore a single capture can clear the way for one pawn of each colour to promote.
Anyway, the fact that no pawn can promote until a piece is captured means that we do not have to store the 30th piece. We can work it out by elimination (or by subtraction, because the complete set of piece codes at the start of the game always adds up to the same amount =80.) One minor point is that we must ensure that the squares where the rooks stand at the start of the game are amongst the first scanned (because if they were last, we would not know if the rook could castle or not.) This is easily done by scanning row 0 and then rows 7 to 1: For r= 8 to 1 scan row [r mod 8].
So, the matrix of bits in (B) will tell us how many pieces there are (excluding kings.) If there are a full 30, ignore the last piece when encoding, the decoder will work out what it was. We now have an up to 29 digit decimal number, which we multiply by 6 and add to the En Passant code. The resulting number will just squeeze into 99 bits, giving a total of 99+75=174 bits.
As an example Here’s an actual position. White has just made his first move (advanced king’s pawn) and it is Black`s turn.
A) Position of kings (White / Black in octal, 12 bits): 03 73 = 000011 111011
B) Which squares are occupied? Start with row zero (bottom row) then all other rows from top to bottom, skipping the kings:
C) Black’s turn: Turn Bit = 1
D) En Passant. There is no white pawn next to next to a black pawn, therefore there is no pawn that can be taken en passant (even though this pawn did advance last move) so D=0. If, instead of considering only pawns which have a hostile pawn beside them, we consider all pawns that do not have friendly pieces beside them on both sides, then D would be 1, as there is one such pawn in this situation, and this particular pawn was indeed moved in the last turn.
E) Again, bottom row first, then all other rows from top to bottom, skipping the kings, and with the four uncastled rooks referred to as 0 or 1 (numbers normally reserved for pawns.)
The 30th digit (in brackets) can be discarded.
Though it is not very apparent here, the pawn that White has advanced is actually at one end of the list of pawns, because we scan row by row.
Our data now looks like this, with 29 codes for the content of squares, plus the En Passant code:
It is best to scan from right to left when decoding and from left to right (reverse order) when encoding. This means that when there are fewer pieces we will have a smaller number, while retaining maximum consistency (i.e we want the blank space / zeroes to be leading, not trailing, to enable compression of sparsely occupied boards.) When we have only 2 kings on the board, we will have the 75 bits mentioned above, plus 3 bits to store en passant data = 78 bits in the best case. Each additional piece comes in slightly under 3.5 bits (2 pieces can be stored in 7 bits, because 100<128.)
There is a practical problem in that a 99 bit integer is too big to fit in a 64 bit integer variable, which means many programming languages do not provide support for it (you can't simply convert a string representation of a 29-30 digit number into an integer.) As an easy way of encoding for 22 bytes, we can break a 30 digit number (29 piece codes + en passant code) into two 15 digit numbers each of which will fit in 50 bits each (total 100 bits plus the 75 mentioned above makes 175 bits worst case.)
For maximum compression, as stated above, 29 decimal digits plus the En Passant code (6 possible values), will just about fit into 99 bits (for a total of 174 bits) but without support from the language for integers of this size it is complicated to program. It may be easier to separate out the 29 colour bits and work with the piece-type codes (5 possibilities) and En passant code (6 possibilities) separately from the colours (70 bits, almost fits into a 64 bit variable.)
la source
Here is a full solution, actual worst case 181 bits
The focus here is a simple program you can easily understand
Input is FEN, here is opening position, it has six fields (5 & 6 ignored):
First field (piece placement) is parsed
To produce:
Field one: encode the location of kings (12 bits):
Field two: encode pieces (up to 5 bits per piece):
Field three: active color (1 bit)
Field four: castling availability (4 bits)
Field five: en passant (zero or 3 bits)
Naive worst case 200 bits
QRRBBNN QQQQQQQQ
- 75 bitsqrrbbnn qqqqqqqq
- 75 bitsActual worst case
Each player cannot promote all pawns without capturing other pieces. Here is the entropy effect of piece capture:
PpR
(3+3+5 = 11 bits) =>Qq_
(5+5+1 = 11 bits)PPpp
(3+3+3+3 = 12 bits) =>QQq_
(5+5+5+1 = 16 bits)So actually the worst case board is:
QRRBBNN QQQQQQQQ
- 75 bitsqrrbbnn qqqq
- 55 bitsWorst case is to promote all pieces rather than leave pawn for en passant.
TOTAL ACTUAL WORST CASE WITH SHOWN CODE 12+75+55+34+1+4 = 181 bits
FIQ shows two improvements to this simple scheme, but they are harder to code:
The only remaining code not shown in this answer (for brevity) is: breaking the input FEN in fields (
split /\s/
) and variable assignment.la source
PPpp
=>QQq_
Total data needs 33 bytes
(Thanks to someone in the comments I realised this doesn't work for pawn promotion. Will update this when I can solve it)
for the first byte we use five bits:
the next 32 bytes are used to represent each chess piece, in a predefined order
else represent whether it has been captured. 0=not captured
(If it can en-passant then it is definitely not captured)
Some C code to represent this idea (which doesn't actually work)
la source
256242 bitsHere's a basic compression algorithm that can probably be improved on because it doesn't exclude certain illegal positions from being represented.
The board starts with 5 bits of header info, as follows:
Then, a string of 12 bits representing the positions of the kings.
Then, a huge 64-digit number in base 11, which is then multiplied by 9 to add another digit at the end representing the state of en-passant. Each digit in base 11 represents a square on the board, with the following possible values:
And the digits in base 9:
1164 × 9 is about 2224.57, which requires 225 bits to encode. Plus the 17 header bits at the top, is 242 bits total.
Thanks to ugoren for the improvements.
la source
13^64 * 9
is 239.99, saving 11 bits. Save more by encoding king positions separately.? bits
(≥ 217 worst-case, 17 best-case, 179 for initial board)
Encoding description
Extra metadata consists of whose turn it is (one bit) and castling (four bits, i.e. may white castle on kings' side? on queens' side? and similarly for black).
For the board position itself, we encode it as a set of active pieces. Well, actually, we make sure to enumerate the pieces in a particular order for reasons that I'll explain in a bit. For each piece we store its colour (one bit), its kind (three bits to encompass 6 kinds, plus one extra kind for "pawn that may be taken by en passant"), as well as its position.
Here's the interesting part: to encode the position of a piece, instead of storing it as a coordinate, we store its relative distance from the last piece when enumerating the pieces in left-to-right, top-to-down order (i.e. A8, B8, ..., G1, H1). In addition, we store the distance as a variable-length number, using
1
to mean that this piece is right next to the previous,0xx
for skipping 1-3 pieces,000xxx
for skipping 4-10 pieces,000000xxxx
for 11-25,0000000000xxxxx
for 26-56 and finally000000000000000xxx
for 57-62.Examples
I made a gist of some positions coded with this encoding, and annotated the one for the initial position with some comments.
I don't know how to analyse the worst-case bit size, but going by the examples in the gist, I believe that the algorithm should be somewhat efficient at least.
Implementation of decoder
Below is a quick-and-dirty decoder for this encoding (taking as input the binary data encoded in text, as in the above annotated example, and skipping over things that aren't '0' or '1'). Produces a unicode chess board to stdout.
la source
Max: 184 bits, Min: 75 bits
I was inspired by @AdamSpeight's Huffman coding for pieces to try crafting my own scheme. I doubt this will win, but it does have calculable limits.
This scheme treats the chess pieces as though there are 11 different types of pieces. I approximately followed the Huffman coding algorithm to group these classes by their frequencies and real types to generate the following encodings:
Each piece's code can be preceded by two bits to show to which team it belongs (
10
for White,11
for Black).0
can be used to encode empty spaces. These ideas allow us to build a square-by-square encoding for the whole board using whatever procedure we want. I will assume the iteration ordera1, b1, c1, ... f8, g8, h8
. This means that just listing these codes as shown above encodes all the information except whose turn it is. A very simple chess engine can use the "classes" for the pawns, rooks, and kings to determine whether castling and en passant are legal. Furthermore, this scheme easily handles pawn promotions. If a player promotes a pawn to a rook, either the king or queen-side codes may be used so long as the "moved" variant is chosen.Excepting pathological positions that I do not think could ever happen, the worst-case scenario for this encoding is when all pieces are still on the board and the previous player moved a pawn forward two spaces. As an example, the board below encodes
1. e4
:This means that the worst-case encoding has 184 bits: 1 for indicating the player, 32 for the empty spaces, 45 for the unmoved pawns, 6 for the two-space-jumping pawn, 24 for the knights, 24 for the bishops, 28 for the rooks, 12 for the queens, and 12 for the kings.
As pieces moved about and are captured, the size of the encoding will drop. The best case scenario is represented by two kings alone on the board: 1 bit for indicating the player + 62 bits for indicating the empty squares + 12 bits for indicating the kings gives a minimum length of 75 bits.
I will come back and edit this response with some code that demonstrates this encoding in action later today or tomorrow. For now, I am curious to see what other people think of this encoding.
la source
11001
meansB'S MOVE
W CAN KSC
W CANT QSC
B CANT KSC
B CAN QSC
). That is 4 bits instead of the 5 bits per side in your encoding, or 3 bits per side if you eliminate the side marker on the rooks. (KSC
= King's-side castle.QSC
= Queen's-side castle)184 bits = 23 bytes worst case, and not too complicated:
A. Which squares occupied: 64 bits = 8 bytes B. Which colors for the <=32 occupied squares: 32 bits = 4 bytes And now using only the <=30 squares occupied by non-kings: B. use PNBRQ encoded in radix 5, for 5^30 possibilities; and 32*31*2*4*4*9 for king positions, mover-color, white & black castling rights, en passant square (among 8 possibilities plus none, a 9th); this number 5^30 * 32 * 31 * 2 * 4*4*9 = 266075134277343750000000000 = 2^87.782 fits in 88 bits for a total of 64+32+88=184 bits for the whole encoding.
This can be reduced, e.g. 32*31*2*4*4*9=285696 can be reduced to 2*(32*31+31*3+31*3+3*3)*6=14244, by using fact at most 6 en passant victim-candidates (including none), and encoding castling rights and king positions inside same set using fact castling rights only matter when king on initial square. This saves 4 bits.
I have reason to believe that it is not possible to achieve 18 bytes, i.e. the total number of legal chess positions exceeds 2^144.
Your 160-bit scheme is very ingenious but I think it needs to be given as encode/decode programs before I'll be completely confident about it.
la source
171 bits worst case:
I've combined a couple of ideas, as well as some of my own thoughts.
We are going to start with a 64 bit board. Each bit represents an occupied space on the board. They fill along rows. So the start looks like:
1111 1111 1111 1111 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111
Now, each piece will be represented by 4 bits. 1st bit: color (
0=white, 1=black
) 2nd-4th bits: One of 8 types.At the end we will include a bit designating the turn.
0=white, 1=black
.4bits*32 pieces=128 bits and I've already got 64+1 from turn and board. That gives a total of 128+64+1=193 I haven't even started with en passant or castling. Well over my limit with nothing -- not even turns. This is where the tricks start.
Okay -- you see those types above? Bishop0 and Bishop1? Pawn0 and pawn1? Those types are so designated, because they tell us a bit to insert after this piece. So Bishop0 means that after it, there will be a 0 -- i.e that the next piece is white. Bishop1 tells us the next piece is black, and the pawn0 and pawn1 work the same. (If this piece is the last piece being enumerated, then it tells us about the next turn instead).
My worst case involves all the pieces on the board, so with 16 pawns and 4 bishops, this saves me 20 bits. I'm down to 173.
Okay. For another bit in my worst case -- once there are 16 of a color encoded, we stop encoding color -- as we know it going forwards. My worst case now involves a white piece making it to the far corner with no captures. There, I save only one bit. 172.
I'm going to now change the order in which I name the pieces. We will name them along columns starting at the outside moving in. The board named at the beginning will stay the same, but when I place pieces on it, i start from whites bottom right, and go up that column. Then I jump to black's bottom right, and go up that column. I jump to white's bottom right unknown cell, and so forth -- this means my worst case is again the start. My reason has to do with my en passant trick, the next two bits I lose, and castling.
Now, for a pawn to be promoted (as has been discussed at length) a piece must be captured. Thus, when we know there are 32 pieces, we only need to denote 31 of them. The last piece is uniquely identified. As it turns out, for me, this only saves 2 bits -- because my last piece might be a pawn/bishop (which normally costs me 3 bits because I save one on the next piece) who's color is determined already and so was only 2 bits. I'm down to 170 bits.
When pawns get promoted, they simply change their type. For each piece that goes off the board I rid myself of (minimum) 3 bits, and two pawn promotions cost me 2 bits, so I'm declining (slowly) in promotions.
Castling. For castling to happen, neither king nor relevant rook may have moved. Thus, when a rook is capable of castling both it and the king will be in their original places. So, rooks capable of castling will be listed in the same place as kings of that color. As this is illegal (two kings or three kings of the same color on the board) and only can happen in three possible setups for each color (left, right, both rooks are listed as kings) -- decoding is utterly possible. So, for no bits we have encoded castling.
En Passant Here we will also use illegal positions. Only one pawn can be in danger of en passant at a time. It must be in its fourth row. The pawn that is vulnerable will be 'flipped' back to its home row -- switched with whatever is there. As that pawn is in its own first row -- an illegal position, the situation will be obvious to the decoder -- it will reverse the positions, and mark the pawn as vulnerable to en passant.
We needed to muck about with the order because the last piece has to be 'uniquely identified'. In a standard order, we wouldn't be able to tell if the rook in the back corner could castle -- that's not known. When we work from the outside in, we guarantee that wherever that rook is 'listed' be it in its own corner, or in the middle of the board because it was switched with an en passant vulnerable pawn, there will be a piece listed after it -- so we are told the rook's type. We know there will be a piece after it because, for a pawn to be vulnerable there must be a pawn to its inside (which will crucially be encoded later as per the instructions above).
Oh, and to help make sure that the worst case involves all pieces on the board, once we have less than 32 pieces, we can use the 64 bit board to identify turn as well as occupied positions by using 0's to represent pieces when its white's turn and 1's when it is blacks turn.
So we got en passant and castling for free. We picked up the last piece for free, though it took some finagling with order to make that play nice with the en passant and castling rules. We ditched 20 bits on the standard pieces. I believe the worst case here involves a midde white pawn moved forward and a black piece in between it and its queen while all pieces are on the board. Quick double check: first piece is captured -- call it a pawn, 3 bits off the board in the pawn, 3 bits on the board in the form of a last piece, one bit off in the turn marker disappearing. Promote two pawns 2 bits back on the board. Ah damn, I'm at 171.
EDIT I've added code for a (working?) decoder -- in R -- below. It takes vectors of booleans as input -- (sorry -- I'm not capable of coding well in anything that would let me actually use the bits) I've also included the start position.
This code builds 4 functions. One that does some basic separation of the pieces and the board structure, as well as figuring out piece type and any 'implicit' pieces. The next function fills in the board structure with those pieces in a slightly bizzarre (and different from my initial algorithm's) order [explained in code comments]. The next function takes the filled in board and detects any illegal positions -- it then fixes them and renames pawns that are vulnerable to en passant "x pawn-e" and any rooks that can castle "x rook-c". The final function is a wrapper that runs those functions in order and gives an output which is the current board as well as the turn.
I've also included the encoding of the start position (though to see it you will have to call
c(startboard,newpieces)
and the code calls the wrapper function on that position.la source
229 / 226 bits
This turns out not to be very successful, but might save other people going down the same path.
The simple version:
1
bit for whose turn it is4
bits for the four castling possibilities3
bits for the en passant possibilities. This has more depth that I at first understood. En passant must be done by a pawn moving from the same rank (row) as the pawn which is captured. Case analysis indicates that once we know how many pawns of the colour which last moved have advanced exactly two squares, there will be at most 6 en passant cases (including the case that there is no pawn vulnerable to en passant). The worse case is 5 pawns (potentially all vulnerable: e.g.PpPPpPPp
has five vulnerableP
s). With 6 pawns there are at most 2 enemy pawns in the same rank, each of which can threaten at most 2 pawns en passant. Therefore we needceil(lg 6) = 3
bits here.Then the board. The board has 64 squares, so a square index can be encoded in 6 bits. We list the men by rank, alternating colours, starting with the king.
6
bits: position of white king. (Guaranteed to be on the board).6
bits: position of black king. (Guaranteed to be on the board. In the worst case that the white king is in a corner, there are 60 possible places he could be; in the best case that the white isn't on an edge, there are 55).6
bits: position of white queen. If there is no white queen, repeat the white king's position as a signal.1
bit followed by 6 bits for the position.0
bit.0
bit.This costs a certain
12
bits for the kings, and2*7*5-1 = 69
bits for the other men. In the worst case that all 32 men are on the board, it costs7
bits for each man other than the kings, for a total of12 + 7*30 - 1 = 221 bits
. So with the initial8
bits for global state we have 229 bits.The advanced version:
By using arithmetic coding we can operate with
lg num_possibilities
rather thanceil(lg num_possibilities)
and just take oneceil
at the end.1
bit for whose turn it is4
bits for the four castling possibilitieslg 6
bits for the en passant possibilities.6
bits for the white kinglg 60
bits (worst case) for the black kinglg 63
bits (because I don't want to complicate it to the level of looking at checks) for the white queen, using the white king's position if there is none1
bit followed bylg 62
,lg 61
, etc. bits for her position.0
bit.lg 63
bits (or fewer, if there were any white queens) for the black queen.The nth man who's actually present has
66-n
possible values. If a type is absent for a colour, we spent66-#men so far
bits in recording that (plus one bit for the separator). The extreme cases are:5+lg 6
on global state,6+lg 60
on the kings,29
on separator bits, andSUM_{n=32}^{63} lg n
bits on positions. Grand total:ceil(225.82)
bits. Disappointing.5+lg 6
on global state,6+lg 60
on the kings,29
on separator bits,8*lg 63
saying that there are no other pieces, andSUM_{n=48}^{63} lg n
on positions of the pawns. Grand total:ceil(188.94)
bits. Better - by saving the second rook, knight, and bishop for each side we've pulled ahead a bit.So the worst case seems likely to be 226 bits, for a measly saving of 3.
We can definitely do better in the average case by encoding pawns before pieces, since they're restricted to 48 squares rather than the full 64. However, in the worst case that all men are on the board and all pawns have been promoted I think this would end up costing 2 bits more because we would need a "no pawns" flag rather than being able to count the men.
la source
This is a discussion topic in Chess circles.
Here is one very simple proof with 164 bits https://groups.google.com/forum/#!topic/rec.games.chess.computer/vmvI0ePH2kI 155 is shown here http://homepages.cwi.nl/~tromp/chess/chess.html
Over simplified strategy:
la source
Min: 0 bits
Max:
1734243 bits (4.3354.401 bits/board amortized)Expected:
351177 bits (4.3764.430 bits/board amortized)Since I can determine the input and output however I want I decided to go with encoding the history of the game up until this point. One advantage is that the additional information of who's turn it is, en-passant, and who has the ability to castle where can be derived and not encoded.
Attempt 1:
Naively I thought that I can encode each move in 12 bits, 4 triplets of the form (start x, start y, end x, end y) where each is 3 bits.
We would assume the starting position and move pieces from there with white going first. The board is arranged such that ( 0 , 0 ) is white's lower left corner.
For example the game:
Would be encoded as:
This leads to an encoding of 12 m bits where m is the number of moves made
On one hand this could get really big, on the other hand you can consider each move to be it's own game so each encoding really encodes m "chess boards". If you amortized this you get that each "chess board" is 12 bits. But I feel this is a bit cheating...
Attempt 2:
I realized that each move in the previous attempt encodes many illegal moves. So I decided to only encode legal moves. We enumerate possible moves as follows, number each square such that ( 0 , 0 ) → 0 , ( 1 , 0 ) → 1 , ( x , y ) → x + 8 y . Iterate through the tiles and check if a piece is there and if it can move. If so add the positions it can go to to a list. Choose the list index that is the move you want to make. Add that number to the running total of moves weighted by 1 plus the number of possible moves.
Example as above: From the starting position the first piece that can move is the knight on square 1, it can move to square 16 or 18, so add those to the list
[(1,16),(1,18)]
. Next is the knight on square 6, add it's moves. Overall we get:[(1,16),(1,18),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]
Since we want the move ( 12 , 28 ) , we encode this as 13 in base 20 since there are 20 possible moves.
So now we get the game number g0 = 13
Next we do the same for black except we number the tiles in reverse (to make it easier, not required) to get the list of moves:
[(1,16),(1,18),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]
Since we want the move ( 11 , 27 ) , we encode this as 11 in base 20 since there are 20 possible moves.
So now we get the game number g1 = ( 11 ⋅ 20 ) + 13 = 233
Next we get the following list of moves for white:
[(1,16),(1,18),(3,12),(3,21),(3,30),(3,39),(4,12),(5,12),(5,19),(5,26),(5,33),(5,40),(6,12),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27)(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]
Since we want the move ( 6 , 21 ) , we encode this as 13 in base 29 since there are 29 possible moves.
So now we get the game number g2 = ( ( 13 ⋅ 20 ) + 11 ) 20 + 13 = 5433
Next we get the following list of moves for black:
[(1,11),(1,16),(1,18),(2,11),(2,20),(2,29),(2,38),(2,47),(3,11),(4,11),(4,18),(4,25),(4,32),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]
Since we want the move $(10, 18)$ ( 10 , 18 )
So now we get the game number g3 = ( ( ( 19 ⋅ 29 + 13 ) 20 ) + 11 ) 20 + 13 = 225833
And continue this process for all the remaining moves. You can think of g as the function g ( x , y , z ) = x y + z . Thus g0 = g ( 1 , 1 , 13 ) , g1 = g ( g ( 1 , 1 , 11 ) , 20 , 13 ) , g2 = g ( g ( g ( 1 , 1 , 13 ) , 20 , 11 ) , 20 , 13 ) , g3 = g ( g ( g ( g ( 1 , 1 , 19 ) , 29 , 13 ) , 20 , 11 ) , 20 , 13 )
To decode a game number g0, we start at the initial position and enumerate all possible moves. Then we compute g1 = g0 // l, m0 = g0 % l, where l is the number of possible moves, '//' is the integer division operator and '%' is the modulus operator. It should hold that g0 = g1 + m0. Next we make the move m0 and repeat.
From the example above if g0 = 225833 then g1 = 225833 // 20 = 11291 and m0 = 225833 % 20= 13. Next g2 = 11291 // 20 = 564 and m1 = 11291 % 20 = 11. Then g3 = 11291 // 20 = 564 and m2 = 11291 % 20 = 11. Therefore g4 = 564 // 29 = 19 and_m_3 = 564 % 29 = 13. Finally g5= 19 // 29 = 0 and m4 = 19 % 29 = 19.
So how many bits are used to encode a game this way?
For simplicity, let's say there are always 20 moves each turn and for the worst case scenario we always pick the biggest one, 19. The number we will get is 19 ⋅ 20m
+ 19 ⋅ 20m-1 + 19 ⋅ 20m-2 + ⋯ + 19 ⋅ 20 + 19 = 20m+1 − 1 where _m is the number of moves. To encode 20m+1 − 1 we need about log2 ( 20m+1 ) bits which is about ( m + 1 ) ∗ log2 ( 20 ) = 4.3219 ∗ ( m + 1 )
On average m = 80 (40 moves per player) so this would take 351 bits to encode. If we were recording many games we would need a universal encoding since we don't know how many bits each number will need
Worst case when m = 400 (200 moves per player) so this would take 1734 bits to encode.
Note that the position we want to encode must be given to us via the shortest path to get there by following the rules. For example the game theorized here doesn't need m = 11741 to encode the final position. Instead we run a Breadth-First search to find the shortest path to that position and encode that instead. I don't know how deep we would need to go to enumerate all chess positions, but I suspect that 400 is an overestimate.
Quick calculation:
There are 12 unique pieces or the square can be empty so to position them on a chessboard is 1364. This is a vast overestimate since it includes many invalid positions. When we are m moves into the game we have created about 20m positions. So we are looking for when 20m = 1364. Log both sides to get m = 64 * log20(13) = 54.797. This shows that we should be able to get to any position in 55 moves.
Now that I calculated the worst case to be m = 55 not m = 400 I will edit my results. To encode a position where m = 55 takes 243 bits. I'm going to also say that the average case is around m = 40 which takes 177 bits to encode.
If we use the amortization argument from earlier we are encoding 400 "chess boards" in 1734 bits so we get that each "chess board" takes up 4.335 bits in the worst case.
Note that g = 0 denotes a valid game, the one where the piece on the lowest square moves to the lowest square it can.
Additional Notes:
If you want to refer to a specific position in the game you may need to encode the index. This can be added either manually e.g concatenate the index to the game, or add an additional "end" move as the last possible move each turn. This can now account for players conceding, or 2 in a row to denote the players agreed to a draw. This is only necessary if the game did not end in a checkmate or stalemate based on the position, in this case it is implied. In this case it brings the number of bits needed on average to 356 and in the worst case 1762.
la source