On Jan 22, 2015, at 3:13 PM, "Kevin M. Hildebrand" <kevin(a)umd.edu>
wrote:
When I did the upgrade, the pre-upgrade hosts had bonded Ethernet
interfaces, which I believe had options lnet networks="o2ib0(ib0),tcp0(bond0)".
After the upgrade, that got broken, and for a short time I had eth0 in place of bond0 by
mistake (and eth0 has no configuration). I also had an old incorrect parameter in the
MDT config that 1.8.7 didn't seem to care about, but 2.5.3 refused to start with
(ost.quota_type=ug), and in clearing that I did a writeconf, probably while the
bond0 lnet config was still missing. That's the only reason I can think of that the
addresses got messed up.
I was wondering if something like that might have happened. At one point, I had a similar
situation where a node came up with a wrong NID, and this got embedded in the MGS logs
which propagated to the clients. I had to run a writeconf to fix things (which also
apparently solved your problem).
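
Since a writeconf records whatever NIDs the node has at that moment, it may be worth checking the lnet interfaces before running one. A minimal sketch, assuming a Linux host where interfaces appear under /sys/class/net; the function names lnet_ifaces and check_lnet_ifaces are hypothetical helpers, not part of Lustre:

```shell
# Hypothetical helper: print the interface names found in an lnet
# "networks" string, one per line.
lnet_ifaces() {
    printf '%s\n' "$1" | grep -o '([^)]*)' | tr -d '()' | tr ',' '\n'
}

# Hypothetical helper: warn about any listed interface that does not
# exist on this host (assumes /sys/class/net is available).
check_lnet_ifaces() {
    for ifc in $(lnet_ifaces "$1"); do
        [ -e "/sys/class/net/$ifc" ] || echo "missing interface: $ifc"
    done
}
```

For example, check_lnet_ifaces 'o2ib0(ib0),tcp0(bond0)' would print a warning if bond0 had not been set up. This only confirms the interface exists, not that it has an address, so once lnet is loaded, lctl list_nids should still be compared against the expected addresses.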
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu