Breaking issue: Too many files open. #4629

chrisnak ·

Hello,

At random times, or under circumstances we have not yet identified, our QB server starts melting down with the "Too many files open" issue.

We are trying to figure out why this happens, but we don't yet know what triggers it.
Most of the time a restart is sufficient to keep the server functional.
Today we experienced the issue twice in a row, even within a few minutes of a restart.

We will need some help to investigate why this happens.

Currently we have two theories:

a. Our large number of schedules triggering too often.
b. Our REST API, which receives a large number of connections from many users.

We have an lsof log that I'm currently digging into, with nothing interesting so far.
Any help would be appreciated.

robinshen ADMIN ·

For a server with many concurrent builds and API calls, the open file handle limit needs to be increased:

chrisnak ·

Thanks for your reply Robin.
We previously increased the open file handle limit, and that solved our issues until now.
This time I'm trying to find the root cause instead of treating the symptom.

So is there any information about limiting the QuickBuild processes that keep file handles open?

robinshen ADMIN ·

If this issue still happens even after the open file limit is increased, please run the command below and post the result:

lsof -p <QB JVM process id>

Note that the QB server has two processes: the wrapper process and the JVM process. Make sure to use the JVM process ID.
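
Since the raw lsof output can run to thousands of lines, it may help to group the handles by type before reading through them. A minimal sketch, assuming the default lsof column layout in which TYPE is the fifth column; the function name summarize_lsof_types is hypothetical and not part of QuickBuild:

```shell
# Count open handles per TYPE (REG, IPv4, FIFO, sock, ...) from
# `lsof -p <pid>` output read on stdin. Assumes the default layout:
# COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
summarize_lsof_types() {
    # Skip the header row, tally column 5, then sort by count (descending)
    # and by type name so the order is stable.
    awk 'NR > 1 { count[$5]++ }
         END { for (t in count) printf "%d %s\n", count[t], t }' |
        sort -k1,1nr -k2,2
}

# Usage (use the JVM process id, not the wrapper's):
#   lsof -p <QB JVM process id> | summarize_lsof_types
```

If one type dominates during a spike (for example sockets rather than regular files), that could hint at whether the REST API or the build workspaces are the more likely source.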

chrisnak ·

Yes, we are aware of that, and we have set an alert for the process count.
The latest incident was yesterday, when it hit the limit within 5 minutes.

So far I can't tell what happened because I don't have access to the server.
Can syncing data from Perforce increase the open file count that fast?

drdt ·

What platform/OS are you using for QuickBuild server?

My team is experiencing this problem with another application on arm64 platforms (as opposed to amd64), unrelated to QuickBuild. It is new behavior which might be related to recent security updates to the environment.

We have tried many things, including Robin's suggestions above, with no success.

chrisnak ·

Our QuickBuild server is running on amd64/Ubuntu.
I'm not sure if there was any security update on our end.

robinshen ADMIN ·

It should not, as all handles opened for P4 are carefully closed. Please run "lsof" as mentioned above at your convenience, as this will show me which handles are being opened.

drdt ·

We were having this issue on Red Hat arm64, and we were finally able to resolve it with these OS changes:

We first attempted a restart of the host, but that didn't resolve the issue.
We then:
Raised /proc/sys/fs/file-max from 13299136 to 1329913600
Raised DefaultLimitNOFILE in /etc/systemd/system.conf to 131072
Ran:

  • sudo systemctl daemon-reload
  • sudo systemctl daemon-reexec

Added the following to /etc/security/limits.conf:

  • @root soft nofile 131072
  • @root hard nofile 262144
  • * soft nofile 131072
  • * hard nofile 262144

Then we restarted the host again, and the error was resolved.

Again, this was not a QuickBuild server, and it is arm64 rather than amd64, but I hope it is helpful.
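
A note for anyone applying similar changes: /etc/security/limits.conf is applied through PAM to login sessions, while services started by systemd take their limit from DefaultLimitNOFILE or a unit's LimitNOFILE, so it can be worth confirming which limit a running process actually received. A minimal sketch that reads the "Max open files" row from /proc/<pid>/limits on Linux; the function name max_open_files is hypothetical:

```shell
# Print the soft and hard "Max open files" limits from a
# /proc/<pid>/limits style file read on stdin.
# Row layout: "Max open files  <soft>  <hard>  files"
max_open_files() {
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }'
}

# Usage:
#   max_open_files < /proc/<pid>/limits
```

If the values printed for the running process do not match what was configured, the process was most likely started before the change or through a path that does not read that file.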

chrisnak ·

Thanks for the info. We can increase the open files limit, but I don't think that's the solution for now.
The QB server stays at about 400 open files throughout the day, but we observe spikes that push the count up to 1K or 2K, and sometimes above the limit, which was previously set to 4096.

So we first need to identify what causes these spikes. If it's a process we need, then increasing the limit is the way to go.
If it's something nasty, we need to fix that.

In other words, increasing the limit may sweep problems under the rug and temporarily solve our issues until the server crashes again.
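
One way to catch such a spike is to poll the descriptor count and save an lsof snapshot only when it crosses a threshold. A rough sketch, assuming Linux with /proc and that the script runs as the QB user or root; the function names, the threshold of 1000 (chosen only because the baseline above is about 400), the 10-second interval, and the output directory are all illustrative assumptions:

```shell
# Count the entries in a directory such as /proc/<pid>/fd, where each
# entry is one open file descriptor of that process.
count_fds() {
    ls -A "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Poll the descriptor count and save an lsof snapshot whenever it exceeds
# the threshold, so the handles present during a spike can be inspected
# afterwards. Arguments: pid, threshold, interval (seconds), output dir.
# The output directory must already exist. Stops when the process exits.
watch_fds() {
    pid=$1; threshold=$2; interval=$3; outdir=$4
    while [ -d "/proc/$pid" ]; do
        n=$(count_fds "/proc/$pid/fd")
        if [ "$n" -gt "$threshold" ]; then
            lsof -p "$pid" > "$outdir/lsof-$(date +%Y%m%dT%H%M%S).txt"
        fi
        sleep "$interval"
    done
}

# Usage:
#   watch_fds <QB JVM process id> 1000 10 /tmp/qb-lsof
```

Comparing a snapshot taken during a spike against one from the quiet baseline should show which files or sockets account for the difference.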