Pourquoi OOM-killer invoqué même quand une énorme quantité de RAM est disponible

8

Sur l'instance xlarge ubuntu EC2 soutenue par ebs, oom-killer est appelé. À partir de la sortie / var / log / syslog ci-dessous, il semble que ZONE_NORMAL manque de mémoire:

Node 0 Normal free:11344kB min:11556kB low:14444kB high:17332kB active_anon:10936284kB inactive_anon:144kB active_file:688kB inactive_file:740kB

mais pourquoi ZONE_NORMAL ne dispose-t-il que de 11 Mo de 15 Go de RAM totale? Ou y a-t-il une autre raison pour manquer de mémoire?

La RAM disponible sur la machine (instance xlarge) est de 15 Go. La somme de la colonne rss dans le journal ci-dessous est de 3,7 Go et celle de total_vm est de 11,4 Go.

Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456146] node invoked oom-killer: gfp_mask=0x84d0, order=0, oom_adj=0, oom_score_adj=0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456153] node cpuset=/ mems_allowed=0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456157] Pid: 639, comm: node Not tainted 2.6.38-1-virtual #28-Ubuntu
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456160] Call Trace:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456170]  [<ffffffff810b259d>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456189]  [<ffffffff81106a01>] ? dump_header+0x91/0x1e0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456196]  [<ffffffff81007e7f>] ? xen_restore_fl_direct_end+0x0/0x1
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456203]  [<ffffffff815c92de>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456208]  [<ffffffff8110705d>] ? oom_kill_process+0x8d/0x180
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456213]  [<ffffffff81107492>] ? out_of_memory+0x102/0x240
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456218]  [<ffffffff8110c209>] ? __alloc_pages_nodemask+0x709/0x7f0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456224]  [<ffffffff81105c78>] ? filemap_fault+0x398/0x490
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456230]  [<ffffffff811414bd>] ? alloc_pages_current+0x9d/0x110
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456235]  [<ffffffff8104338b>] ? pte_alloc_one+0x1b/0x50
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456239]  [<ffffffff8100530e>] ? xen_pud_val+0xe/0x10
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456245]  [<ffffffff811255c5>] ? __pte_alloc+0x35/0x100
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456250]  [<ffffffff81129559>] ? handle_mm_fault+0x129/0x250
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456257]  [<ffffffff815ccf82>] ? do_page_fault+0x1a2/0x540
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456262]  [<ffffffff8109aacb>] ? do_futex+0xbb/0x210
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456266]  [<ffffffff8109ac9b>] ? sys_futex+0x7b/0x180
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456274]  [<ffffffff814c2975>] ? sys_recvmsg+0x75/0x90
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456278]  [<ffffffff815c9a15>] ? page_fault+0x25/0x30
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456282] Mem-Info:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456284] Node 0 DMA per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456288] CPU    0: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456290] CPU    1: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456293] CPU    2: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456295] CPU    3: hi:    0, btch:   1 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456299] Node 0 DMA32 per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456303] CPU    0: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456307] CPU    1: hi:  186, btch:  31 usd:  81
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456310] CPU    2: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456314] CPU    3: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456317] Node 0 Normal per-cpu:
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456321] CPU    0: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456323] CPU    1: hi:  186, btch:  31 usd: 175
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456326] CPU    2: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456329] CPU    3: hi:  186, btch:  31 usd:   0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456334] active_anon:3726183 inactive_anon:44 isolated_anon:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456335]  active_file:214 inactive_file:237 isolated_file:64
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456336]  unevictable:0 dirty:1 writeback:0 unstable:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456337]  free:18955 slab_reclaimable:2811 slab_unreclaimable:7277
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456338]  mapped:256 shmem:101 pagetables:12634 bounce:0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456341] Node 0 DMA free:15880kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(
anon):0kB isolated(file):0kB present:15688kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456353] lowmem_reserve[]: 0 4024 15142 15142
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456360] Node 0 DMA32 free:48596kB min:4180kB low:5224kB high:6268kB active_anon:3968448kB inactive_anon:32kB active_file:168kB inactive_file:208kB unevic
table:0kB isolated(anon):0kB isolated(file):0kB present:4120800kB mlocked:0kB dirty:0kB writeback:0kB mapped:328kB shmem:80kB slab_reclaimable:2208kB slab_unreclaimable:8604kB kernel_stack:48kB pagetabl
es:9724kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:601 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456371] lowmem_reserve[]: 0 0 11117 11117
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456377] Node 0 Normal free:11344kB min:11556kB low:14444kB high:17332kB active_anon:10936284kB inactive_anon:144kB active_file:688kB inactive_file:740kB
unevictable:0kB isolated(anon):0kB isolated(file):256kB present:11384720kB mlocked:0kB dirty:4kB writeback:0kB mapped:696kB shmem:324kB slab_reclaimable:9036kB slab_unreclaimable:20504kB kernel_stack:20
16kB pagetables:40812kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2190 all_unreclaimable? yes
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456388] lowmem_reserve[]: 0 0 0 0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456393] Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15880kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456409] Node 0 DMA32: 715*4kB 525*8kB 343*16kB 245*32kB 147*64kB 50*128kB 16*256kB 5*512kB 2*1024kB 0*2048kB 1*4096kB = 48996kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456423] Node 0 Normal: 1919*4kB 5*8kB 2*16kB 2*32kB 1*64kB 1*128kB 2*256kB 2*512kB 0*1024kB 1*2048kB 0*4096kB = 11588kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456438] 768 total pagecache pages
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456440] 0 pages in swap cache
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456443] Swap cache stats: add 0, delete 0, find 0/0
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456445] Free swap  = 0kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.456447] Total swap = 0kB
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495285] 3934192 pages RAM
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495290] 91275 pages reserved
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495291] 1661 pages shared
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495293] 3822220 pages non-shared
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495296] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495310] [  228]     0   228     4297       71   2       0             0 upstart-udev-br
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495315] [  242]     0   242     5353      123   2     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495319] [  319]     0   319     5352      126   2     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495323] [  320]     0   320     5352      126   0     -17         -1000 udevd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495327] [  392]     0   392     1771      124   2       0             0 dhclient
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495332] [  467]     0   467    12366      140   3     -17         -1000 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495336] [  474]   101   474    47476      160   2       0             0 rsyslogd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495340] [  502]   102   502     5927      108   2       0             0 dbus-daemon
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495344] [  534]     0   534     1035       27   3       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495348] [  539]     0   539     1035       27   0       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495352] [  543]     0   543    13589     2510   1       0             0 supervisord
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495356] [  544]     0   544     1035       25   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495360] [  545]     0   545     1035       27   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495364] [  547]     0   547     1035       27   2       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495368] [  552]     0   552     5302       65   1       0             0 cron
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495372] [  553]     0   553     4753       44   1       0             0 atd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495376] [  565]     0   565     3946       46   2       0             0 irqbalance
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495380] [  571]   106   571    35760     3859   1       0             0 memcached
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495384] [  637]  1000   637   222128    36427   1       0             0 node
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495388] [  639]  1000   639   189634     4396   1       0             0 node
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495392] [  696]     0   696     1035       27   3       0             0 getty
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495396] [  755]     0   755    31312      372   2       0             0 console-kit-dae
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495401] [ 2404]     0  2404    35058     3644   1       0             0 carbon-cache.py
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495405] [ 2601]  1000  2601     5982      199   1       0             0 screen
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495409] [ 2602]  1000  2602     5576      766   0       0             0 bash
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495413] [ 2661]     0  2661     6801       67   1       0             0 sudo
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495417] [ 2663]     0  2663     7133      851   1       0             0 run-graphite-de
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495421] [ 2668]     0  2668     1056       25   1       0             0 sh
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495425] [ 2669]     0  2669    12906     3942   1       0             0 django-admin.py
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495429] [ 2670]     0  2670    50313    11546   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495433] [ 2785]     0  2785    15953      489   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495437] [ 3123]  1000  3123  1184911   266122   1       0             0 java
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495443] [18135]     0 18135  5884532     9128   1       0             0 mongod
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495449] [12574]  1000 12574   966781   932357   2       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495455] [30139]  1000 30139   759010   741642   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495461] [14186]    33 14186    19905     4657   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495465] [14187]    33 14187    19726     4527   1       0             0 nginx
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495470] [14190]  1000 14190    23767     8459   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495474] [14201]  1000 14201  1706261  1687237   1       0             0 python
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495478] [24553]     0 24553    19831      200   1       0             0 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495482] [24584]  1000 24584    19831      200   1       0             0 sshd
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495486] [24585]  1000 24585     6464     1655   1       0             0 bash
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495490] Out of memory: Kill process 14201 (python) score 439 or sacrifice child
Dec 11 08:53:53 ip-10-60-61-71 kernel: [19427969.495509] Killed process 14201 (python) total-vm:6825044kB, anon-rss:6748848kB, file-rss:100kB

Merci de votre aide!

tarkeshwar
la source
4
Parce que tu n'as pas désactivé oom-killer?
HopelessN00b
2
Vous devriez peut-être inclure des informations sur la façon de procéder.
mdpc
pmap -x <pid> affiche les valeurs rss en ko. Dans le rapport de plantage oom, la colonne rss est en nombre de pages. Multipliez-le par la taille de la page (4 Ko).
tarkeshwar

Réponses:

13

Cela ressemble à un MOO assez légitime pour moi:

  1. Vous avez un nœud NUMA (à savoir le nœud 0) qui a trois zones:
    1. DMA qui a 15880kB gratuit
    2. DMA32 qui a 48996 Ko gratuit
    3. Normal qui a 11588kB gratuit
  2. Presque toute votre mémoire 3968448kB + 10936284kBest dans la active_anonliste LRU.
  3. Les tailles de mémoire dans le tableau sont en pages (qui, selon votre architecture, sont probablement de 4 Ko), elles totalisent donc 14,8 Go, et non 3,7.
SaveTheRbtz
la source
0

Les deux ont raison 3,7 Go (4) = 14,8 Go. Chaque valeur rss doit être multipliée par la taille des pages pour que la taille réelle soit allouée en mémoire. Dans le descripteur de mémoire, le rss est juste le nombre de / * pages allouées * /

Cordialement

Samuel Paz Nuñez
la source