
Problem with disk usage monitoring #2590

vanderhu ·
Lately we have been facing problems with disk usage monitoring on several nodes, which seem to be caused by duplicate entries already present in the database. How can we get rid of these entries?

2013-11-14 04:52:15,126 [qtp1674438074-1272226] ERROR org.hibernate.util.JDBCExceptionReporter - Duplicate entry '1384401134996-<node_name>:8811-disk.usage./' for key 'QB_TIMESTAMP'
2013-11-14 04:52:15,126 [qtp1674438074-1272226] ERROR org.hibernate.event.def.AbstractFlushingEventListener - Could not synchronize database state with session
org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:96)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:275)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:268)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:184)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:321)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:51)
at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1216)
at com.pmease.quickbuild.entitymanager.impl.DefaultMeasurementDataManager.flushAndClear(DefaultMeasurementDataManager.java:71)
at com.pmease.quickbuild.entitymanager.impl.DefaultMeasurementDataManager.save(DefaultMeasurementDataManager.java:98)
at com.pmease.quickbuild.plugin.measurement.core.reporter.MeasurementServerReporter.save(MeasurementServerReporter.java:105)
steveluo ADMIN ·
Can you tell me what happened on those nodes? Did the system time change? As you can see, we use a timestamp key here based on the system's current milliseconds, so it should be hard to produce duplicates.
vanderhu ·
Time didn't change on the machines. The only thing that changed is the regular expression that selects which disks should be monitored. Since that change I see these errors printed continuously in the audit log...
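Editor's note: a plausible reading of the error, not confirmed anywhere in this thread, is that the measurement key is built from the millisecond timestamp, the node address, and the metric name, so if a broadened mount-point regex makes two devices resolve to the same metric name (e.g. both reported as `disk.usage./`), two rows saved in the same millisecond collide on the unique key. A minimal sketch of that idea (the key format mirrors the error message; the method name is hypothetical):

```java
import java.util.*;

public class DuplicateKeyDemo {
    // Hypothetical key builder mirroring the format seen in the error:
    // '<millis>-<node>:<port>-disk.usage.<mountPoint>'
    static String measurementKey(long millis, String node, String metric) {
        return millis + "-" + node + "-" + metric;
    }

    public static void main(String[] args) {
        long now = 1387502845311L; // fixed millis for reproducibility
        // Assumption: two devices both map to the "/" metric name
        // after an overly broad mount-point regex:
        String[] metrics = {"disk.usage./", "disk.usage./"};

        Set<String> keys = new HashSet<>();
        for (String m : metrics) {
            String key = measurementKey(now, "pc-509943-vm1:8811", m);
            if (!keys.add(key)) {
                // Same millisecond + same node + same metric name -> same key,
                // which would violate a unique constraint like QB_TIMESTAMP.
                System.out.println("Duplicate entry '" + key + "' for key 'QB_TIMESTAMP'");
            }
        }
    }
}
```

Under this assumption, the collision is deterministic whenever two mount points share a metric name, which would explain why the errors appear continuously rather than as rare millisecond races.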
vanderhu ·
Any update on this, Steve? Currently I am not getting any measurements or alerts because of this.

Thanks,
Maikel
steveluo ADMIN ·
Hello Maikel,

I don't know why this occurred yet. When you say you don't have any measurements and alerts, do you mean all nodes or just some nodes? Could you try restarting the server and the nodes that failed with duplicate entries?
vanderhu ·
I don't have alerts for the nodes that are printing the error lines in the audit log. I will try restarting some nodes to see if the problem goes away; the main server will also be restarted when we perform the upgrade to 5.1.0 (great release!).
vanderhu ·
Restarted both the server and the agent that are posting the duplicate entries, but the audit log is still showing:
2013-12-20 02:27:25,369 [pool-1-thread-78946] ERROR org.hibernate.util.JDBCExceptionReporter - Duplicate entry '1387502845311-pc-509943-vm1:8811-disk.usage./' for key 'QB_TIMESTAMP'
2013-12-20 02:27:25,370 [pool-1-thread-78946] ERROR org.hibernate.event.def.AbstractFlushingEventListener - Could not synchronize database state with session
org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:96)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:275)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:268)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:184)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:321)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:51)
at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1216)
at com.pmease.quickbuild.entitymanager.impl.DefaultMeasurementDataManager.flushAndClear(DefaultMeasurementDataManager.java:71)
at com.pmease.quickbuild.entitymanager.impl.DefaultMeasurementDataManager.save(DefaultMeasurementDataManager.java:98)
at com.pmease.quickbuild.plugin.measurement.core.reporter.MeasurementServerReporter.save(MeasurementServerReporter.java:105)
at com.pmease.quickbuild.plugin.measurement.core.reporter.MeasurementServerReporter.access$100(MeasurementServerReporter.java:34)
at com.pmease.quickbuild.plugin.measurement.core.reporter.MeasurementServerReporter$NodeMetricsSender.execute(MeasurementServerReporter.java:93)
at com.pmease.quickbuild.grid.NodeJobExecuteJob.execute(NodeJobExecuteJob.java:26)
at com.pmease.quickbuild.grid.GridJob.run(GridJob.java:71)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.sql.BatchUpdateException: Duplicate entry '1387502845311-pc-509943-vm1:8811-disk.usage./' for key 'QB_TIMESTAMP'
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2056)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1467)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeBatch(NewProxyPreparedStatement.java:1723)
at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:70)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:268)
... 18 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry '1387502845311-pc-509943-vm1:8811-disk.usage./' for key 'QB_TIMESTAMP'
at sun.reflect.GeneratedConstructorAccessor162.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1041)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4190)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4122)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2570)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2818)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2157)
at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2460)
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2008)
... 22 more
steveluo ADMIN ·
Hi,

I have filed a ticket below to track this:
http://track.pmease.com/browse/QB-1894

I have also created a patch plugin for this (see the attachment on the page above); please download it and check whether it fixes the problem. To install the attached plugin:
1. Stop your QuickBuild server
2. Replace the old plugin: ${qb-home}/plugins/com.pmease.quickbuild.plugin.measurement.core_{version}.jar with the patched plugin com.pmease.quickbuild.plugin.measurement.core_5.1.3
3. Restart your QuickBuild server
steveluo ADMIN ·
Please also note that the plugin only works on QuickBuild 5.1.x. If you are not using QuickBuild 5.1.x, please upgrade to the latest version first.
vanderhu ·
I will give it a try tomorrow to see if it works. We are currently on QB version 5.1.3.
vanderhu ·
Installed the patched plugin in combination with QB 5.1.6; I no longer see the duplicate entries in the resources log of the main server. I will keep an eye on it and check again tomorrow around noon to be 100% sure.
vanderhu ·
Still haven't seen any duplicate entry problems in the log, so this can be considered fixed; the agents that had problems now have disk monitoring again. Please integrate the fix into the next official version.
steveluo ADMIN ·
Thank you very much for the update. We'll include this fix in the next patch release.
vanderhu ·
The problem is no longer showing up in the server log. However, I notice that disk monitoring is no longer working as it should: the measured disk usage is always the same value, while there is definitely fluctuation in the actual disk usage on the servers. Because of this flat line (for example, a constant 75%), we also don't get proper alerts when an agent's disk is almost full.
steveluo ADMIN ·
Hi,

I need some information about your issue:

1. a screenshot of the disk measurements on the affected agent (Grid -> Active Nodes -> the agent)
2. your disk monitoring settings for the affected agent (Administration -> Plugin Management -> Grid Measurement Plugin -> configure)
3. a search of your agent log for the string "Duplicated mount point", to see whether there are any matching entries

Drop me a mail (steve at pmease dot com) with this information.

steve
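Editor's note: the "Duplicated mount point" string Steve asks about suggests the patched plugin detects and collapses mount points that map to the same metric name before saving. A hedged sketch of that idea (the class and method names are hypothetical, not the plugin's actual code):

```java
import java.util.*;

public class MountPointDedupe {
    // Hypothetical: collapse mount points that would produce the same
    // metric name, logging the collisions (cf. "Duplicated mount point").
    static List<String> dedupe(List<String> mountPoints) {
        Set<String> seen = new LinkedHashSet<>(); // preserves first-seen order
        for (String mp : mountPoints) {
            if (!seen.add(mp)) {
                System.out.println("Duplicated mount point: " + mp);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Assumption: a broad regex matched "/" twice via different devices.
        List<String> result = dedupe(Arrays.asList("/", "/", "/home"));
        System.out.println(result); // prints [/, /home]
    }
}
```

If this is roughly what the patch does, each mount point is reported at most once per cycle, so the composite timestamp key can no longer collide.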
vanderhu ·
Supplied the data by mail as requested.