Datameer Metrics Monitoring

Introduction

Datameer X automatically tracks performance- and error-related metrics for the conductor. This information is written to the logs/metrics-jamon.log and logs/metrics.log files. The metrics can be configured in the default.properties and log4j properties files. (The log4j file varies depending on how you are running Datameer: 1) production - log4j-production.properties, 2) desktop production - log4j-desktop.properties, 3) development - log4j.properties.)

Metrics collection is enabled by default, but can be disabled via the default.properties file. The metrics are primarily collected using JAMon (http://www.jamonapi.com). JAMon can also be disabled via the default.properties file.

What is Being Monitored?

Time-based metrics track execution time (average, min, max, total), call count, and concurrency (average, max). Some metrics only track a count (for example, how many Hibernate cache hits there were). Each metric can be identified by a key in the log file. The list below maps each metric to its key label and also specifies whether it is a time-based metric or a count-based metric (units are "ms." or "count", respectively). All metrics below are stored in "logs/metrics-jamon.log" unless otherwise specified.

SQL 

All SQL keys start with "MonProxy-SQL" and track time (execution time in "ms."); a quick way to pull these entries out of the log is shown after the list.

    • All SQL queries - "MonProxy-SQL-Type: All". Total metrics for every SQL statement that was executed, including selects, inserts, deletes, creates, etc.
    • SQL queries by type - Tracks metrics for SQL types such as insert, select, delete. Example key: "MonProxy-SQL-Type: select"
    • Top 10 worst performing SQL statements by total time, average time, max time, and call count. Depending on whether the query was a PreparedStatement or a Statement, the key has one of the following formats:
      • "MonProxy-SQL-PreparedStatement: <sql statement>"
      • "MonProxy-SQL-Statement: <sql statement>"
    • Queries that exceed the size threshold configured in default.properties. This property reduces the amount of memory metrics tracking consumes, at the expense of not being able to track large queries by their unique SQL statement. Currently the default threshold is a query size of 5k.
      • "MonProxy-SQL-Type: sqlSizeExceedsThreshold" - Execution time metrics for queries exceeding the threshold size.
      • "MonProxy-SQL-sqlSizeExceedsThreshold.length" - SQL statement size metrics in bytes for queries exceeding the threshold. 
    • ResultSet Iteration - The key has different formats, but each includes "ResultSet.next()". It tracks how many rows are in the ResultSets as well as how long it takes to iterate them (in milliseconds). Note this is disabled by default.
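
Because each metric is written to logs/metrics-jamon.log as a CSV line containing its key as the label, a standard grep pulls out a given family of metrics. For example, to list the per-type SQL metrics (run from the Datameer X installation directory; adjust the path if your logs live elsewhere):

grep "MonProxy-SQL-Type" logs/metrics-jamon.log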

Page Requests

All page request keys start with "com.jamonapi.http.JAMonJettyHandler" and track time (execution time in "ms.").

    • All Page Requests - "com.jamonapi.http.JAMonJettyHandler.request.allPages". Total metrics for every page request.
    • Page requests by HTTP status code - Tracks execution time and frequency for pages that return the different HTTP status codes (200, 404, 500, etc.). Example key: "com.jamonapi.http.JAMonJettyHandler.response.getStatus().value: 200"
      • In addition, the HTTP error log (logs/httperror.log) contains the specific page request that resulted in the HTTP error status code.
    • Top 10 worst performing Page Requests by total time, average time, max time and call count. Example key: "com.jamonapi.http.JAMonJettyHandler.request.page.value: <page name>"
    • Page size in bytes - Metrics for the size of the returned pages. Example key: "com.jamonapi.http.JAMonJettyHandler.response.getContentCount()"

Hibernate

All Hibernate metric keys start with "hibernate." Some of the metrics tracked for Hibernate follow:

    • Global metrics: connectCount, sessionCount
    • Entity metrics (for each entity such as DataSourceConfiguration and WorkbookConfiguration): entityLoadCount, entityFetchCount, entityInsertCount, entityDeleteCount, entityUpdateCount
    • Collections: collectionLoadCount, collectionRemoveCount, collectionUpdateCount
    • Query: queryCacheHitCount, queryCacheMissCount, queryExecutionRowCount
    • Second Level Cache: secondLevelCacheElementCountInMemory, secondLevelCacheElementCountOnDisk, secondLevelCacheSizeInMemory, secondLevelCacheHitCount, secondLevelCacheMissCount

JVM

Metrics for the Conductor's JVM.

    • Used Memory - "memory.totalMemoryUsed". Total memory used in MB.
    • Percent free - "memory.percentFree". Percentage of free memory.

Garbage Collection

Garbage collection metrics for the Conductor.

    • For both minor and major collections, metrics for frequency and duration are gathered.
    • Metrics are gathered every minute and contain the delta (change) of Garbage Collection frequency and duration since the previous minute.
    • Frequency of garbage collection (count): "garbage_collection_metrics.ConcurrentMarkSweep: Par Eden Space|Par Survivor Space|CMS Old Gen|CMS Perm Gen","count"
    • Duration of garbage collection (ms.): "garbage_collection_metrics.ConcurrentMarkSweep: Par Eden Space|Par Survivor Space|CMS Old Gen|CMS Perm Gen","ms."

Version

The current version of the Conductor. Example: "datameer.version: 3.1.0"

Errors

Count of errors in the Conductor.

    • Total Exceptions - The key has different formats, but includes one of the following: "MonProxy-Exception", "com.jamonapi.Exceptions", "ServletException"
    • SQL Exception by type - Example: "MonProxy-Exception: Root cause exception=java.sql.SQLException,ErrorCode=-2,SQLState=08003"
    • log4j category count - A count of the different log4j categories called. Example: "com.jamonapi.log4j.JAMonAppender.ERROR"

Page Requests with Errors

Page requests that generate http error status codes are stored in the 'logs/httperror.log' file. The format of the file is explained in a later section.

Housekeeping

Metrics related to the housekeeping process are written to logs/metrics.log.

Hadoop Metrics

Metrics for Hadoop jobs (from imports, exports, and workbooks) are written to the "logs/hadoop-metrics.log" file. See a later section for example output. Some representative metrics follow:

  • The number of records read and written
  • The number of bytes read and written
  • Processing time 

Where Are the Metrics Stored?

Housekeeping statistics are written to logs/metrics.log. Hadoop metrics are written to logs/hadoop-metrics.log. Pages with HTTP errors are written to logs/httperror.log, and all other metrics are written to logs/metrics-jamon.log. All files roll over daily and at the time of this writing are retained for 21 days (the retention period is configurable from the log4j properties file). Datameer X provides an app that can import and analyze the data on a schedule, or clients can create their own app.

The metrics.log file is written to every time the housekeeper is run, and so contains all housekeeping data. The hadoop-metrics.log file is written to every time a Map/Reduce job is run, and the httperror.log file is written to whenever a page request returns an HTTP error code.

The metrics written to metrics-jamon.log are written every 5 minutes (configurable) and are aggregate numbers for that interval. However, if JAMon memory use exceeds its threshold (also configurable), the metrics might be written more often (the size is checked every minute by default). Hibernate statistics are captured every minute and then reset, but are saved every 5 minutes along with the other JAMon metrics. When the JAMon metrics are saved they are also reset, so the next collection interval only contains new statistics.
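
Because the counters are reset after every save, each line covers only one collection interval; figures for a longer window can be obtained by aggregating across lines. A minimal sketch, assuming the timestamp and label fields contain no embedded commas (adjust the field index otherwise), sums the Hits column (field 4) for all SQL statements over the whole log:

grep "MonProxy-SQL-Type: All" logs/metrics-jamon.log | awk -F',' '{sum += $4} END {print sum}'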

What is the data format of the metrics-jamon.log file?

The metrics-jamon.log file is a CSV file in which each line represents a specific metric, such as a SQL query or page request. It has the following fields:

  1. LogTime: The time that this log message was written.
  2. Label: Short text used to identify the monitor such as "MonProxy-SQL-Type: All".
  3. Units: Units for the metric such as "ms.", or "count".
  4. Hits: The count of how many times the metric was executed.
  5. Avg: The average value for the metric.
  6. Total: avg * hits. For example, the total execution time across all invocations of a SQL query.
  7. StdDev: Standard deviation for the metric.
  8. Min: The minimum value for the metric. For example, the fastest SQL query.
  9. Max: The maximum value for the metric. For example, the slowest SQL query.
  10. Active: The number of concurrent invocations for the metric occurring at the time of collection.
  11. AvgActive: The average number of concurrent invocations.
  12. MaxActive: The maximum number of concurrent invocations. For example, if the metric is for SQL queries, this value represents the maximum number of queries executing concurrently.
  13. CollectionTime: The time that this metric was collected.


Some representative lines for metrics-jamon.log follow:
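
(The lines below are constructed from the field list above and the sample status message shown later in this document; they illustrate the layout and are not taken from an actual installation.)

2013-05-24 11:08:46,"MonProxy-SQL-Type: All","ms.",5888,1,5888,0.8,0,10,0,0.3,4,2013-05-24 11:08:46
2013-05-24 11:08:46,"com.jamonapi.http.JAMonJettyHandler.request.allPages","ms.",3510,13,45630,52.1,0,576,1,0.6,12,2013-05-24 11:08:46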

What is the data format of the httperror.log file?

The httperror.log file is a CSV file in which each line represents a specific page request that generated an HTTP error. It has the following fields:

  1. LogTime: The time that the error was generated.
  2. HTTP status code: The error status code returned for the request.
  3. Page: The page that generated the error.
  4. Stacktrace: The Java stacktrace, if one occurred.

Some representative lines for httperror.log follow:
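
(The line below is constructed from the field list above to illustrate the layout; the timestamp and page are hypothetical, not actual output. A request that fails with a 500 would also carry the Java stacktrace in the last field.)

2013-05-24 11:12:03,404,/some/missing/page,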

What is the data format of the hadoop-metrics.log file?

Some representative lines for hadoop-metrics.log follow:
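
(The line below is a constructed illustration of the format described next; the job ID, job name, counter, and value are hypothetical, not actual output.)

2013-05-24 11:15:27, job_201305240900_0042, Workbook (123) : MyWorkbook#Sheet1, Map input records, 1000000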

The format of each line is:

TIMESTAMP, JOB_ID, JOB_NAME,  COUNTER_NAME, COUNTER_VALUE

The JOB_ID mentioned in this log file isn't the Datameer X job ID; it is the Hadoop MR job ID of the jobs triggered by a Datameer X job.

A Datameer X job (say, a workbook job) can trigger multiple Hadoop MR jobs on the cluster (one job per kept sheet).

The format of JOB_NAME is usually: <JOB_TYPE> (<JOB_EXECUTION_ID>) : <JOB_NAME>#<SHEET_NAME>

What kind of data is in the hadoop-metrics.log file?

Hadoop maintains counters for jobs as well as tasks. Datameer X collects this information for its own jobs in the hadoop-metrics.log file. Since these counters are built into Hadoop, you can find more information in the book "Hadoop: The Definitive Guide, Third Edition" by Tom White, Chapter 8.1, Built-in Counters, page 259 ff.

How Do I Configure Metrics Collection?

Metrics collection is highly configurable via the following files. The files themselves contain further information describing the configuration properties.

  • default.properties - The following can be configured via properties within this file: 1) run frequency, 2) save frequency, 3) the maximum amount of memory metrics collection can consume, 4) the maximum size of a SQL statement, 5) the number of top queries and page requests to track (top K), 6) enable/disable writing a status message on save, 7) ResultSet iteration.
  • log4j properties file - The retention period of metrics.log and metrics-jamon.log can be configured here.

Below is an example of the status message that is written to the conductor.log if the 'write status on save' property is enabled.

INFO [2013-05-24 11:08:46] (JamonStatisticsService.java:113) - Saving JAMon monitoring statistics to disk: Page Requests=[count=3510, avg=13 ms., max=576 ms.], SQL Queries=[count=5888, avg=1 ms., max=10 ms.], Memory=[memory used=73 MB, free memory percent=28%], Exception Count=0, Log4j Error Count=0

Viewing and Configuring the Metrics via the REST API

  • An HTML table of the metrics for the current collection interval can be viewed with the following: http://domain:port/rest/monitor/html
  • View the metrics in JSON format (a pretty-printing example follows this list): curl -s -S -u User:Password -X GET 'http://domain:port/rest/monitor/'
  • View the metrics in JSON format and put the label before each metric: curl -s -S -u User:Password -X GET 'http://domain:port/rest/monitor/prependLabel'
  • Disable JAMon - This allows clients to disable JAMon at runtime: curl -s -S -u User:Password -X PUT 'http://domain:port/rest/monitor/disable'
  • Enable JAMon - curl -s -S -u User:Password -X PUT 'http://domain:port/rest/monitor/enable'
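
For readability, the JSON output from the commands above can be piped through a pretty-printer. For example, assuming Python is available on the client machine:

curl -s -S -u User:Password -X GET 'http://domain:port/rest/monitor/' | python -m json.tool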