Recently, “someone/somewhere” started migrating their storage to a PureStorage FlashArray. When doing this kind of thing we tend to follow the best practices dictated, in this case, by the storage vendor. Following best practices without carefully understanding them can have bad consequences. In this particular case, multiple Java applications stopped running after the migration!
After noticing the problem, we started digging deeper to understand it. Using strace and jstack we were able to isolate the cause: the Java programs were blocking on “/dev/random” because there was not enough entropy. The available entropy, as displayed in “/proc/sys/kernel/random/entropy_avail”, was lower than the threshold in “/proc/sys/kernel/random/read_wakeup_threshold”.
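As a rough illustration of that check, here is a minimal Python sketch that reads the two procfs files named above and compares them. The helper names (read_int, is_starved) are made up for this example, and the threshold file may not exist on every kernel version, so the code treats a missing file as "unknown" rather than failing.

```python
from pathlib import Path
from typing import Optional

# Directory holding the kernel RNG counters mentioned in the text.
RANDOM_DIR = Path("/proc/sys/kernel/random")


def read_int(path: Path) -> Optional[int]:
    """Return the integer stored in a procfs file, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def is_starved(avail: int, threshold: int) -> bool:
    """True when the available entropy is below the wakeup threshold."""
    return avail < threshold


if __name__ == "__main__":
    avail = read_int(RANDOM_DIR / "entropy_avail")
    threshold = read_int(RANDOM_DIR / "read_wakeup_threshold")
    if avail is None or threshold is None:
        # Hypothetical fallback: not every kernel exposes both files.
        print("entropy counters not available on this system")
    else:
        state = "starved" if is_starved(avail, threshold) else "ok"
        print(f"entropy_avail={avail} threshold={threshold} -> {state}")
```

Running it in a loop while the application hangs should show whether the pool stays below the threshold.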
But why was this happening?
Linux uses naturally occurring chaotic events on the local system to generate entropy for the kernel pool. These events, such as disk IO events, network packet arrival times, keyboard presses, and mouse movements, are the primary sources of entropy on many systems.
Hmm! One of PureStorage's recommendations was a setting that “eliminates the collection of entropy for the kernel random number generator, which has high CPU overhead when enabled for devices supporting high IOPS.”
REF: Linux_Recommended_Settings of PureStorage
We’ve got it! This setting reduced entropy collection, which caused the problem.
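The quote above does not name the setting. On Linux, the block-layer knob that controls whether a disk feeds the entropy pool is, as far as I know, the per-device queue/add_random attribute under /sys/block, so the sketch below assumes that is the one involved. The function name add_random_settings is invented for this example; it only reads the current values and changes nothing.

```python
from pathlib import Path
from typing import Dict


def add_random_settings(sys_block: Path = Path("/sys/block")) -> Dict[str, bool]:
    """Map each block device name to True if it contributes entropy (add_random == 1).

    Assumption: the setting in question is <device>/queue/add_random.
    """
    settings: Dict[str, bool] = {}
    for attr in sorted(sys_block.glob("*/queue/add_random")):
        device = attr.parent.parent.name  # .../<device>/queue/add_random
        try:
            settings[device] = attr.read_text().strip() == "1"
        except OSError:
            # Skip devices whose attribute cannot be read.
            continue
    return settings


if __name__ == "__main__":
    for device, enabled in add_random_settings().items():
        print(f"{device}: entropy collection {'on' if enabled else 'off'}")
```

If every disk reports "off" on a server with no keyboard or mouse, there is not much left to fill the pool.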
To resolve this problem we have multiple solutions:
- Re-enable entropy collection (this may add CPU overhead)
- Use /dev/urandom, which is non-blocking
- Use the Linux “rngd” service to feed data from a random number generator into the kernel’s entropy pool
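To illustrate the second option, here is a small Python sketch that reads random bytes from /dev/urandom instead of /dev/random. The function name read_nonblocking_random is hypothetical. For a JVM, a commonly used way to get a similar effect is the java.security.egd system property (for example -Djava.security.egd=file:/dev/./urandom); that detail is not from the original investigation, so check it against your Java version.

```python
def read_nonblocking_random(num_bytes: int, device: str = "/dev/urandom") -> bytes:
    """Read num_bytes from a random device that does not wait on the entropy estimate."""
    with open(device, "rb") as source:
        return source.read(num_bytes)


if __name__ == "__main__":
    # Print 16 random bytes as hex; this should return immediately
    # even when entropy_avail is low.
    print(read_nonblocking_random(16).hex())
```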
There are of course other solutions! But the basic idea here is to carefully test and understand recommendations and best practices before applying them!
That’s it 😀