In particular, I'd suggest this bug:
https://jira.hpdd.intel.com/browse/LU-5392
There were several related issues fixed last year; I think this particular bug points at
both of the relevant patches.
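As a hedged sketch (not from the thread): one way to confirm that a lustre-release checkout already carries the LU-5392 fixes is to grep its git log for the ticket ID, since Lustre commit messages lead with it. A throwaway repo with a stand-in commit message is used below so the command runs as-is; point SRC at a real checkout instead.

```shell
# Stand-in repo so the check below is runnable; replace SRC with a real
# lustre-release checkout to test your actual tree.
SRC=$(mktemp -d)
git -C "$SRC" init -q
git -C "$SRC" -c user.email=you@example.com -c user.name=you \
    commit -q --allow-empty -m 'LU-5392 quota: stand-in commit message'
# The actual check: any output means a commit referencing the ticket
# exists in this tree.
git -C "$SRC" log --oneline --grep='LU-5392'
```

If the grep comes back empty on a real checkout, the running build predates the fixes and an upgrade (or backport) would be the next step.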
________________________________________
From: HPDD-discuss [hpdd-discuss-bounces@lists.01.org] on behalf of Dilger, Andreas
[andreas.dilger@intel.com]
Sent: Saturday, April 11, 2015 1:12 AM
To: Marc Boisis
Cc: HPDD-discuss@ml01.01.org
Subject: Re: [HPDD-discuss] MDS crash
This looks like a journal credit problem that was already fixed. Did you look in Jira for
this stack?
Cheers, Andreas
On Apr 9, 2015, at 04:43, Marc Boisis
<marc.boisis@univ-lr.fr> wrote:
Hi,
My Lustre servers are running Lustre 2.5.2, and when I try to walk through the entire
filesystem (24 TB) with a simple find, the MDS crashes with the kernel error below.
Do you have any idea?
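For reference, the workload described is nothing more exotic than a full metadata traversal. A hedged sketch follows; on the real system the path would be the Lustre client mount point (e.g. /mnt/lustre), but a seeded scratch directory is used here so the command runs as-is.

```shell
# Scratch directory standing in for the Lustre client mount point;
# substitute the real mount (e.g. /mnt/lustre) to reproduce the walk.
LUSTRE_MNT=$(mktemp -d)
touch "$LUSTRE_MNT/file1" "$LUSTRE_MNT/file2"
# A plain find of every regular file: each entry costs an MDS getattr,
# which is what drives the load described above.
find "$LUSTRE_MNT" -type f | wc -l
```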
<4>R13: ffff882018f2db58 R14: ffff8840214c7800 R15: 0000000000007000
<4>FS: 0000000000000000(0000) GS:ffff88011cc40000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000003200e73e90 CR3: 000000404fa2d000 CR4: 00000000001407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process mdt_rdpg00_002 (pid: 12981, threadinfo ffff882033eba000, task
ffff882036d79500)
<4>Stack:
<4> ffff88203267c438 ffffffffa0d55830 ffff882018f2db58 0000000000000000
<4><d> ffff882033ebb610 ffffffffa0d130eb ffff882033ebb600 ffffffff8109aeef
<4><d> ffff88401541ec40 ffff88203267c438 0000000000000400 ffff882018f2db58
<4>Call Trace:
<4> [<ffffffffa0d130eb>] __ldiskfs_handle_dirty_metadata+0x7b/0x100 [ldiskfs]
<4> [<ffffffff8109aeef>] ? wake_up_bit+0x2f/0x40
<4> [<ffffffffa0d48995>] ldiskfs_quota_write+0x165/0x210 [ldiskfs]
<4> [<ffffffff811ee86e>] write_blk+0x2e/0x30
<4> [<ffffffff811eee1a>] remove_free_dqentry+0x8a/0x140
<4> [<ffffffff811ef7c7>] do_insert_tree+0x317/0x3d0
<4> [<ffffffff811ef735>] do_insert_tree+0x285/0x3d0
<4> [<ffffffff811ef735>] do_insert_tree+0x285/0x3d0
<4> [<ffffffff811ef735>] do_insert_tree+0x285/0x3d0
<4> [<ffffffff811ef978>] qtree_write_dquot+0xf8/0x150
<4> [<ffffffff811eebee>] ? qtree_read_dquot+0x5e/0x200
<4> [<ffffffff811ee0c0>] v2_write_dquot+0x30/0x40
<4> [<ffffffff811ea270>] dquot_acquire+0xc0/0x140
<4> [<ffffffffa0d47b26>] ldiskfs_acquire_dquot+0x66/0xb0 [ldiskfs]
<4> [<ffffffff811ec25c>] dqget+0x2ac/0x390
<4> [<ffffffff811ec808>] dquot_initialize+0x98/0x240
<4> [<ffffffffa0d47d83>] ldiskfs_dquot_initialize+0x83/0xd0 [ldiskfs]
<4> [<ffffffffa0e12b7f>] osd_attr_set+0x12f/0x540 [osd_ldiskfs]
<4> [<ffffffffa099196b>] lod_attr_set+0x12b/0x450 [lod]
<4> [<ffffffffa0b5c9b1>] mdd_attr_set_internal+0x151/0x230 [mdd]
<4> [<ffffffffa0b5f30a>] mdd_attr_set+0x117a/0x1470 [mdd]
<4> [<ffffffffa0ad644c>] mdt_mfd_close+0x7ac/0x1bc0 [mdt]
<4> [<ffffffffa082ed65>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
<4> [<ffffffffa0855d26>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
<4> [<ffffffffa0682105>] ? class_handle2object+0x95/0x190 [obdclass]
<4> [<ffffffffa0ad8bf2>] mdt_close+0x642/0xa80 [mdt]
<4> [<ffffffffa0aae58a>] mdt_handle_common+0x52a/0x1470 [mdt]
<4> [<ffffffffa0aea735>] mds_readpage_handle+0x15/0x20 [mdt]
<4> [<ffffffffa083fbc5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
<4> [<ffffffffa05364ce>] ? cfs_timer_arm+0xe/0x10 [libcfs]
<4> [<ffffffffa05473cf>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
<4> [<ffffffffa08372a9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
<4> [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90
<4> [<ffffffffa0840f2d>] ptlrpc_main+0xaed/0x1740 [ptlrpc]
<4> [<ffffffffa0840440>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
<4> [<ffffffff8109ab56>] kthread+0x96/0xa0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109aac0>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: c6 9c 03 00 00 4c 89 f7 e8 b1 f1 3b e1 48 8b 33 ba 01 00 00 00 4c 89 e7 e8
11 ec ff ff 4c 89 f0 66 ff 00 66 66 90 e9 73 ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f
0b 66 0f 1f 84 00 00 00 00 00 eb f5
<1>RIP [<ffffffffa016b8ad>] jbd2_journal_dirty_metadata+0x10d/0x150 [jbd2]
<4> RSP <ffff882033ebb5b0>
Quotas are disabled:
[root@mds1 ~]# lctl get_param osd-*.*.quota_slave.info
osd-ldiskfs.led-MDT0000.quota_slave.info=
target name: led-MDT0000
pool ID: 0
type: md
quota enabled: none
conn to master: setup
space acct: ug
user uptodate: glb[0],slv[0],reint[0]
group uptodate: glb[0],slv[0],reint[0]
_______________________________________________
HPDD-discuss mailing list
HPDD-discuss@lists.01.org
https://lists.01.org/mailman/listinfo/hpdd-discuss