Too many open files #4388

jintaeson ·

Hello,

Everything stopped when the following error occurred on our live master server.

2021-12-10 12:05:27,463 [qtp586965379-52] ERROR org.apache.wicket.pageStore.DiskDataStore - /home/suprem/quickbuild-80-9.0.27/temp/sessions/quickbuild-filestore/node0flxlem07muep71j923nmk5s42396/data (Too many open files)
java.io.FileNotFoundException: /home/suprem/quickbuild-80-9.0.27/temp/sessions/quickbuild-filestore/node0flxlem07muep71j923nmk5s42396/data (Too many open files)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at org.apache.wicket.pageStore.DiskDataStore$SessionEntry.getFileChannel(DiskDataStore.java:401)
at org.apache.wicket.pageStore.DiskDataStore$SessionEntry.savePage(DiskDataStore.java:328)
at org.apache.wicket.pageStore.DiskDataStore.storeData(DiskDataStore.java:176)
at org.apache.wicket.pageStore.AsynchronousDataStore.storeData(AsynchronousDataStore.java:232)
at org.apache.wicket.pageStore.DefaultPageStore.storePageData(DefaultPageStore.java:115)

I checked the master server (QB Java PID: 147465):

root@master:/home/suprem# while true; do cat /proc/sys/fs/file-nr; sleep 5; done
12656   0   26340735
12656   0   26340735


root@master:/home/suprem# lsof | awk '{ print $2; }' | uniq -c | sort -rn | head
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1001/gvfs
      Output information may be incomplete.
2491378 147465
 117832 81283
 111996 81862
 105696 82451
 105552 83100
 104390 83310
 104390 82369
 104390 81650
 102808 82296
 102808 82043

root@master: cat /etc/security/limits.conf
* hard nofile 500000
* soft nofile 500000

# End of file


root@master: cat /proc/147465/limits
Max open files            4096                 4096                 files

Why is the QB process limited to 4096 open files?
And could this be what is causing the problem?
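
The mismatch above (500000 in `/etc/security/limits.conf`, 4096 in `/proc/147465/limits`) is commonly seen when a process is not started from a PAM login session, since `limits.conf` is applied by `pam_limits` at login; whether that applies here is an assumption. A minimal sketch for checking what a running process actually has, using only `/proc`. The function names are hypothetical:

```shell
#!/bin/sh
# Print the soft and hard "Max open files" values from a
# /proc/<pid>/limits style listing read on stdin.
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }'
}

# Count the descriptors a process currently holds: /proc/<pid>/fd has one
# entry per open descriptor. Inspecting another user's process needs root.
open_fd_count() {
    ls "/proc/$1/fd" | wc -l
}

pid="${1:-$$}"   # defaults to this shell so the sketch runs without arguments
if [ -r "/proc/$pid/limits" ]; then
    echo "limits (soft hard): $(nofile_limits < "/proc/$pid/limits")"
    echo "open descriptors:   $(open_fd_count "$pid")"
fi
```

Run it as root with the QB PID as the only argument. If the descriptor count sits close to the soft limit, the limit is the likely cause of the `Too many open files` error.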

robinshen ADMIN ·

You may consult your system administrator on how to increase the open file limit of the QB process on your system. This is definitely necessary if the QB server has to handle many builds.

jintaeson ·

@robinshen
Isn't the limit taken from the system settings?
Or is it fixed at 4096 no matter what (when the QB system starts)?

robinshen ADMIN ·

QB alone cannot adjust this, as it requires root privileges. The command may also differ between systems.
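
For illustration, a minimal sketch of a wrapper that raises the soft limit before launching a process. It assumes a shell whose `ulimit` builtin supports `-n`, `-S` and `-H` (bash and dash do); `raise_nofile`, the target of 65536 and the launch path are hypothetical. An unprivileged process can raise its soft limit only up to the hard limit, so with a hard limit of 4096, as in the `/proc/147465/limits` output earlier in the thread, root or the service manager's own setting is still needed:

```shell
#!/bin/sh
# Raise the soft open-files limit of the current shell to at least $1,
# capped at the hard limit. Never lowers an already higher soft limit.
# Processes started from this shell afterwards inherit the result.
raise_nofile() {
    want="$1"
    soft="$(ulimit -Sn)"
    hard="$(ulimit -Hn)"
    # Nothing to do if the soft limit is already unlimited or high enough.
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$want" ]; then
        return 0
    fi
    # An unprivileged process cannot go past the hard limit.
    if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
        want="$hard"
    fi
    ulimit -Sn "$want"
}

raise_nofile 65536 || echo "could not raise the limit" >&2
echo "soft nofile limit: $(ulimit -Sn)"
# exec /path/to/server-start-script   # hypothetical: launch the server here
```

Because limits are inherited, the call has to happen in the same shell that starts the server; changing the limit of an unrelated shell has no effect on a process that is already running.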

drdt ·

Hi, I am following up on this issue because we are seeing insane numbers of file handles opened by the QB agent... upwards of 225k, in spite of our system limit being set to 65k. QB itself is not failing, but this may be the cause of intermittent failures of other processes running on the machine.

My system admin believes this is due to a leak in the code. We see this running the 14.0.23 agent with Java 21, but not with Java 8.

Do you have any advice or information about this possibility?

drdt ·

We see this running the 14.0.23 agent with Java 21, but not with Java 8.

Actually, we see this with both Java 8 and Java 21. It affects some machines but not others with no clear differentiator.

robinshen ADMIN ·

Are you able to find two agents, one with and one without this issue, both running the same Java version and doing the same thing (running the same builds, etc.)?

drdt ·

Yes, this number is arbitrarily high on multiple machines, and there seems to be no relationship to build activity. In fact I stopped and restarted an agent and it immediately came up with 225k handles.

The server also has a crazy-high number, but I understand that because the server has a lot to do.

We have now moved on and do not believe that this is related to the build failures we have seen. However, I am still curious why the agent is running with this many file handles, a number almost 4x the actual limit defined by the OS.
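
One possible explanation, offered as an assumption because the thread does not say how the 225k figure was measured: plain `lsof` output also lists entries that are not descriptors (current directory, program text, memory-mapped files), and some versions repeat entries per thread, so a line count can far exceed the descriptor limit. The first post shows the same effect, with about 2.49 million `lsof` lines for a process limited to 4096 descriptors. A minimal sketch that counts only numeric descriptors in `lsof -F f` field output; the function name and the sample data are hypothetical:

```shell
#!/bin/sh
# Count real file descriptors in `lsof -F f` field output read on stdin.
# In that format every file entry is a line starting with "f"; numeric
# values are descriptors, while cwd, rtd, txt, mem and similar are not.
count_numeric_fds() {
    awk '/^f[0-9]/ { n++ } END { print n + 0 }'
}

# Hypothetical sample standing in for: lsof -p <pid> -F f
printf 'p1234\nfcwd\nfrtd\nftxt\nfmem\nfmem\nf0\nf1\nf2\n' | count_numeric_fds
# prints 3
```

On a live system this would be `lsof -p <pid> -F f | count_numeric_fds`, and the result can be cross-checked against `ls /proc/<pid>/fd | wc -l`.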

robinshen ADMIN ·

Yes, this number is arbitrarily high on multiple machines, and there seems to be no relationship to build activity. In fact I stopped and restarted an agent and it immediately came up with 225k handles.

Are you running any build jobs on this agent? If so, please restart it again and temporarily avoid running any build jobs on it, to see whether the handle count still climbs that high.