Hello

 

Do you think I can use fsck with the -o or -c options?

Is that a good idea?
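To be concrete, here is what I had in mind (only a sketch from my reading of the Lustre manual; the OST device path is a placeholder, and please correct me if the options are wrong):

```shell
# 1. Read-only e2fsck pass on the unmounted OST's backing ldiskfs first
#    (/dev/mapper/ost0001 is a placeholder for the real device)
e2fsck -fn /dev/mapper/ost0001

# 2. Distributed layout consistency check, started from the MDS
lctl lfsck_start -M archives-MDT0000 -t layout

# 3. Monitor progress
lctl get_param -n mdd.archives-MDT0000.lfsck_layout
```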

 

Thank you

 

David

 

From: HPDD-discuss [mailto:hpdd-discuss-bounces@lists.01.org] On behalf of David Roman
Sent: Thursday, June 18, 2015 14:49
To: 'Arman Khalatyan'
Cc: hpdd-discuss@ml01.01.org
Subject: Re: [HPDD-discuss] Data lost on OST

 

> did you try to find files based on ost ID?

> lfs find -O 1,2

 

Yes, I did (I just forgot to write lfs in my previous post):

find /path_to_directory -type f | wc -l gave me 240 files found

lfs find -O OST1 -O OST2 /path_to_directory gave me only 4 files found
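For the record, the quick loop I use to compare the per-OST counts in one go (just a sketch; the path is the same placeholder as above):

```shell
# Count regular files that have objects on each OST, then the total for comparison
for ost in archives-OST0001 archives-OST0002; do
    printf '%s: %s files\n' "$ost" "$(lfs find -O "$ost" /path_to_directory -type f | wc -l)"
done
find /path_to_directory -type f | wc -l
```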

 

 

[root@archives-mds ~]# lfs find  -O archives-OST0001 /ARCHIVES/spectre_hr_2/tigr_nc/

[root@archives-mds ~]# lfs find  -O archives-OST0002 /ARCHIVES/spectre_hr_2/tigr_nc/

/ARCHIVES/spectre_hr_2/tigr_nc//spi4a_1241_Day.nc

/ARCHIVES/spectre_hr_2/tigr_nc//spi4a_0121_Night.nc

/ARCHIVES/spectre_hr_2/tigr_nc//spi4a_2261_Night.nc

[root@archives-mds ~]# find /ARCHIVES/spectre_hr_2/tigr_nc/

/ARCHIVES/spectre_hr_2/tigr_nc/

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_0061_Night.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1050_Night.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_0261_Night.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_2161_Night.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1461_Night.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_0201_Night.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_2001_Day.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_2021_Day.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1641_Night.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_2001_Night.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_0341_Day.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1761_Day.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1221_Night.nc

… etc. (I truncated the rest of the output.)

 

This file, spi4a_1241_Day.nc, is OK:

==========================

[root@archives-mds ~]# lfs getstripe /ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1241_Day.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1241_Day.nc

lmm_stripe_count:   1

lmm_stripe_size:    1048576

lmm_pattern:        1

lmm_layout_gen:     0

lmm_stripe_offset:  2

                obdidx                 objid                    objid                    group

                     2              14458853            0xdc9fe5                     0
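(A side note in case someone wants to script this check: the obdidx column is what identifies the OST that actually holds the object. Below is a small Python sketch I use to pull it out of plain lfs getstripe output; the sample is just the output above pasted in:)

```python
import re

def parse_getstripe(text):
    """Extract (obdidx, objid) pairs from plain `lfs getstripe` output."""
    objects = []
    for line in text.splitlines():
        # Object rows look like: "     2    14458853    0xdc9fe5    0"
        m = re.match(r'\s*(\d+)\s+(\d+)\s+0x[0-9a-fA-F]+\s+\d+\s*$', line)
        if m:
            objects.append((int(m.group(1)), int(m.group(2))))
    return objects

sample = """/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1241_Day.nc
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  2
        obdidx           objid           objid           group
             2        14458853        0xdc9fe5               0
"""
print(parse_getstripe(sample))  # -> [(2, 14458853)]
```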

 

 

[root@archives-mds ~]# stat /ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1241_Day.nc

  File: « /ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1241_Day.nc »

  Size: 40020545                Blocks: 78168      IO Block: 4194304 fichier

Device: effa7236h/4026167862d              Inode: 144115373027889935  Links: 1

Access: (0660/-rw-rw----)  Uid: (11161/tournier)   Gid: (11434/     sps)

Access: 2015-05-13 12:13:32.000000000 +0200

Modify: 2014-04-01 19:03:10.000000000 +0200

Change: 2015-06-17 14:56:15.000000000 +0200

 

[root@archives-mds ~]# cp /ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1241_Day.nc /tmp/waste.nc

The copy completes immediately.

 

 

This file, spi4a_1941_Day.nc, is not OK:

==========================

[root@archives-mds ~]# lfs getstripe /ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1941_Day.nc

/ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1941_Day.nc

lmm_stripe_count:   1

lmm_stripe_size:    1048576

lmm_pattern:        1

lmm_layout_gen:     0

lmm_stripe_offset:  1

                obdidx                 objid                    objid                    group

                     1               9958236             0x97f35c                      0

 

 

[root@archives-mds ~]# stat /ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1941_Day.nc

  File: « /ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1941_Day.nc »

  Size: 40020545                Blocks: 78168      IO Block: 4194304 fichier

Device: effa7236h/4026167862d              Inode: 144115373027890005  Links: 1

Access: (0660/-rw-rw----)  Uid: (11161/tournier)   Gid: (11434/     sps)

Access: 2015-05-13 12:16:03.000000000 +0200

Modify: 2014-04-01 19:04:48.000000000 +0200

Change: 2015-06-17 14:58:03.000000000 +0200

 

[root@archives-mds ~]# cp /ARCHIVES/spectre_hr_2/tigr_nc/spi4a_1941_Day.nc /tmp/waste.nc

I waited several minutes for the shell to give me the prompt back. The copy did complete, but only after a long period of inactivity.
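To triage which files hang on read, without letting cat block my shell forever, I use a small helper like this (plain Python with a SIGALRM timeout, nothing Lustre-specific; the 10-second default is arbitrary, and SIGALRM is Unix-only):

```python
import signal

def readable_within(path, timeout_s=10):
    """Return True if the first 4 KiB of `path` can be read within timeout_s seconds."""
    def _on_alarm(signum, frame):
        raise TimeoutError(path)

    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout_s)
    try:
        with open(path, "rb") as f:
            f.read(4096)          # touching the first stripe is enough to detect a hang
        return True
    except TimeoutError:
        return False
    finally:
        signal.alarm(0)           # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

Anything that returns False is a candidate for a missing OST object.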

 

David

 

 

-----Original Message-----
From: Arman Khalatyan [mailto:arm2arm@gmail.com]
Sent: Thursday, June 18, 2015 11:50
To: David Roman
Cc: hpdd-discuss@ml01.01.org
Subject: Re: [HPDD-discuss] Data lost on OST

 

did you try to find files based on ost ID?

lfs find -O 1,2

also check stat on the files that you cannot copy:

lfs getstripe /lustre/filename

stat /lustre/filename

***********************************************************

 

Dr. Arman Khalatyan  eScience -SuperComputing  Leibniz-Institut für Astrophysik Potsdam (AIP)  An der Sternwarte 16, 14482 Potsdam, Germany

 

***********************************************************

 

 

On Thu, Jun 18, 2015 at 11:01 AM, David Roman <David.Roman@noveltis.fr> wrote:

> I am still trying to understand my problem, and I just made a surprising discovery ...

> A quick reminder:

> 

> 

> find /path_to_directory -type f | wc -l gave me 240 files found

> lfs find -O OST1 -O OST2 /path_to_directory gave me just 4 files found

> 

> I can read and copy these 4 files, but not the others.

> 

> If I do cp /path_to_directory/* /other_path, the command doesn't work.

> 

> But here is the really amazing thing: with the rsync command I can copy all the data!

> 

> David

> 

> 

> -----Original Message-----

> From: David Roman

> Sent: Wednesday, June 17, 2015 16:38

> To: hpdd-discuss@ml01.01.org

> Subject: RE: [HPDD-discuss] Data lost on OST

> 

> I made a mistake when I replied, so I am re-posting for the whole list.

> 

> 

> -----Original Message-----

> From: David Roman

> Sent: Wednesday, June 17, 2015 16:07

> To: 'Arman Khalatyan'

> Subject: RE: [HPDD-discuss] Data lost on OST

> 

> I already rebooted the servers this morning.

> 

> 

> -----Original Message-----

> From: David Roman

> Sent: Wednesday, June 17, 2015 16:03

> To: 'Arman Khalatyan'

> Subject: RE: [HPDD-discuss] Data lost on OST

> 

> No. I plan to use 3 OSS servers, each with 1 OST. First I deployed archives-mds (MDT0000) and archives-oss3 (OST0002); later I deployed archives-oss2 (OST0001). I have never used the third server, archives-oss1 (OST0000), so OST0000 does not exist.

> 

> I just ran a test ...

> I copied some data to my Lustre volume, about 106 GB.

> The copy operation reported no errors, but the problem is the same: I don't have all the data. I found data on OST0002, but nothing on OST0001.

> 

> 

> 

> -----Original Message-----

> From: Arman Khalatyan [mailto:arm2arm@gmail.com]

> Sent: Wednesday, June 17, 2015 15:53

> To: David Roman

> Subject: Re: [HPDD-discuss] Data lost on OST

> 

> What about

> OST0000             : Resource temporarily unavailable???

> Did you recently remove it from the MDS?

> Before lctl lfsck, try rebooting the MDS/OSS. After startup it usually starts an auto scrub.

> ***********************************************************

> 

>  Dr. Arman Khalatyan  eScience -SuperComputing  Leibniz-Institut für Astrophysik Potsdam (AIP)  An der Sternwarte 16, 14482 Potsdam, Germany

> 

> ***********************************************************

> 

> 

> On Wed, Jun 17, 2015 at 3:43 PM, David Roman <David.Roman@noveltis.fr> wrote:

>> Yes, if I do

>>         ls -l /directory/path

>>         find /directory/path

>>         lfs find /directory/path

>> 

>> I see my files (240)

>> 

>> If I do

>>         lfs find -O archives-OST0001 /directory/path ==> I see nothing

>>         lfs find -O archives-OST0002 /directory/path ==> I see only 4 files, and these are the only files I can read.

>> 

>> 

>> # lctl dl

>>   0 UP osd-ldiskfs archives-MDT0000-osd archives-MDT0000-osd_UUID 10

>>   1 UP mgs MGS MGS 51

>>   2 UP mgc MGC192.168.1.45@tcp 86f5008e-05e8-6d58-4fa6-64dfebed9dd8 5

>>   3 UP mds MDS MDS_uuid 3

>>   4 UP lod archives-MDT0000-mdtlov archives-MDT0000-mdtlov_UUID 4

>>   5 UP mdt archives-MDT0000 archives-MDT0000_UUID 53

>>   6 UP mdd archives-MDD0000 archives-MDD0000_UUID 4

>>   7 UP qmt archives-QMT0000 archives-QMT0000_UUID 4

>>   8 UP osp archives-OST0002-osc-MDT0000 archives-MDT0000-mdtlov_UUID 5

>>   9 UP osp archives-OST0001-osc-MDT0000 archives-MDT0000-mdtlov_UUID 5

>>  10 UP lwp archives-MDT0000-lwp-MDT0000 archives-MDT0000-lwp-MDT0000_UUID 5

>>  11 UP lov archives-clilov-ffff880029b0e000 3ba711ce-278f-bd95-e4be-9cae34c7a5ab 4

>>  12 UP lmv archives-clilmv-ffff880029b0e000 3ba711ce-278f-bd95-e4be-9cae34c7a5ab 4

>>  13 UP mdc archives-MDT0000-mdc-ffff880029b0e000 3ba711ce-278f-bd95-e4be-9cae34c7a5ab 5

>>  14 UP osc archives-OST0002-osc-ffff880029b0e000 3ba711ce-278f-bd95-e4be-9cae34c7a5ab 5

>>  15 UP osc archives-OST0001-osc-ffff880029b0e000 3ba711ce-278f-bd95-e4be-9cae34c7a5ab 5

>> ******************************************************************************

>> 

>> # lfs df

>> UUID                   1K-blocks        Used   Available Use% Mounted on

>> archives-MDT0000_UUID   366138224    10685360   330392200   3% /ARCHIVES[MDT:0]

>> OST0000             : Resource temporarily unavailable

>> archives-OST0001_UUID 42910527264 15650953496 25064315980  38% /ARCHIVES[OST:1]

>> archives-OST0002_UUID 42911625592 40387604352   377020420  99% /ARCHIVES[OST:2]

>> 

>> filesystem summary:  85822152856 56038557848 25441336400  69% /ARCHIVES

>> ******************************************************************************

>> 

>> # cat /proc/fs/lustre/lov/*-MDT0000-mdtlov/target_obd

>> 1: archives-OST0001_UUID ACTIVE

>> 2: archives-OST0002_UUID ACTIVE

>> ******************************************************************************

>> 


>> I found some Lustre errors in the logs, but I don't understand them.

>> 

>> For example:

>> 

>> messages-20150531:May 27 17:41:28 archives-oss2 kernel: LustreError: dumping log to /tmp/lustre-log.1432741288.2818

>> messages-20150531:May 27 17:41:28 archives-oss2 kernel: [<ffffffffa080bad1>] ? lustre_pack_reply_v2+0x1e1/0x280 [ptlrpc]

>> messages-20150531:May 27 17:41:28 archives-oss2 kernel: [<ffffffffa080bc1e>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]

>> messages-20150531:May 27 17:41:28 archives-oss2 kernel: [<ffffffffa080bad1>] ? lustre_pack_reply_v2+0x1e1/0x280 [ptlrpc]

>> messages-20150531:May 27 17:41:28 archives-oss2 kernel: [<ffffffffa080bc1e>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]

>> messages-20150531:May 27 17:41:28 archives-oss2 kernel: [<ffffffffa080bad1>] ? lustre_pack_reply_v2+0x1e1/0x280 [ptlrpc]

>> messages-20150531:May 27 17:41:28 archives-oss2 kernel: [<ffffffffa080bc1e>] ? lustre_pack_reply_flags+0xae/0x1f0 [ptlrpc]

>> 

>> Can I try to use lctl lfsck?

>> 

>> 

>> David

>> 

>> 

>> 

>> 

>> -----Original Message-----

>> From: Arman Khalatyan [mailto:arm2arm@gmail.com]

>> Sent: Wednesday, June 17, 2015 15:16

>> To: David Roman

>> Cc: hpdd-discuss@ml01.01.org

>> Subject: Re: [HPDD-discuss] Data lost on OST

>> 

>> Can you see with "ls -l" the file names?

>> Do you see any errors in the logs? What you can check is the connectivity from the client to the OSTs:

>> lctl dl

>> lfs df

>> or, on the MDS, run:

>> cat /proc/fs/lustre/lov/*-MDT0000-mdtlov/target_obd

>> 

>> ***********************************************************

>> 

>>  Dr. Arman Khalatyan  eScience -SuperComputing  Leibniz-Institut für Astrophysik Potsdam (AIP)  An der Sternwarte 16, 14482 Potsdam, Germany

>> 

>> ***********************************************************

>> 

>> 

>> On Wed, Jun 17, 2015 at 11:23 AM, David Roman <David.Roman@noveltis.fr> wrote:

>>> Hello,

>>> 

>>> 

>>> I use Lustre 2.6. I have one MDS and 2 OSS servers.

>>> When I do an ls command in a specific directory, I see my files. But when I try to read some of them with the cat command, the command hangs.

>>> 

>>> With lfs find -O <device> /my/directory I do not see all the files!

>>> 

>>> Could you help me, please?

>>> 

>>> 

>>> Thank you

>>> _______________________________________________

>>> HPDD-discuss mailing list

>>> HPDD-discuss@lists.01.org

>>> https://lists.01.org/mailman/listinfo/hpdd-discuss

> _______________________________________________

> HPDD-discuss mailing list

> HPDD-discuss@lists.01.org

> https://lists.01.org/mailman/listinfo/hpdd-discuss