Cluster Monitoring

Monitoring with the cat API

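For ES cluster monitoring it is usually best to build your own. The official X-Pack covers authentication and authorization, monitoring, and more, and does all of it very well, but it is a paid product.
Building your own monitoring means writing an application on top of the ES APIs, for example a Java web application with its own front end. The program calls the back-end endpoints every few seconds, collects the monitoring data, and renders it on the front-end pages. The result is a visual monitoring console for the ES cluster.
Older ES versions had a very handy plugin called head, but from 5.x on that route is closed: plugins of this kind are no longer allowed, and Elastic promotes its own X-Pack instead, which is paid and meant to generate revenue.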
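The polling approach described above can be sketched in a few lines. The notes suggest a Java web application; Python is used here only to keep the sketch short. The names `fetch_cat` and `poll`, the 5-second interval, and the base URL in the usage note are assumptions for illustration, not part of any ES client library. The only ES feature relied on is the `format=json` parameter of the cat API.

```python
import json
import time
import urllib.request


def fetch_cat(base_url, endpoint):
    """Call one cat endpoint and return its rows as a list of dicts."""
    # format=json asks the cat API for JSON instead of a text table.
    url = f"{base_url}/_cat/{endpoint}?format=json"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll(fetch, endpoints, interval_seconds=5, rounds=None, sleep=time.sleep):
    """Yield one {endpoint: rows} snapshot per round; rounds=None polls forever.

    `fetch` is any callable taking an endpoint name, so the loop can be
    exercised without a running cluster.
    """
    done = 0
    while rounds is None or done < rounds:
        snapshot = {}
        for endpoint in endpoints:
            try:
                snapshot[endpoint] = fetch(endpoint)
            except OSError as exc:  # connection refused, timeout, HTTP error
                snapshot[endpoint] = {"error": str(exc)}
        yield snapshot
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_seconds)
```

A caller might iterate over `poll(lambda ep: fetch_cat("http://localhost:9200", ep), ["health", "nodes"])` and push each snapshot to the front end. Recording a failed call as an error entry, rather than letting it abort the loop, keeps the console alive while a node is unreachable.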
1. GET /_cat/aliases?v
Lists the index aliases defined in the cluster.
    alias  index filter routing.index routing.search
    alias1 test1 -      -             -
    alias2 test1 *      -             -
    alias3 test1 -      1             1
    alias4 test1 -      2             1,2
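2. GET /_cat/allocation?v
Shows how many shards are allocated to each node, along with the node's disk usage: space used, space available, total size, and percentage used.
    shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
    5      260b         47.3gb    43.4gb     100.7gb    46           127.0.0.1 127.0.0.1 CSUXak2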
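A monitoring console would typically turn the allocation output into an alert. The sketch below assumes the rows were fetched with `format=json`, in which case each row is a dict of strings keyed by column name. The function name and the 85 percent default are illustrative choices. The `hot-node` row is invented test data, and the `UNASSIGNED` row with a null percentage is an assumption about how unassigned shards may be reported.

```python
def nodes_over_disk_threshold(rows, threshold_percent=85):
    """Return (node, disk_percent) pairs at or above the threshold."""
    flagged = []
    for row in rows:
        percent = row.get("disk.percent")
        if percent is None:  # no disk figures for this row, skip it
            continue
        if int(percent) >= threshold_percent:
            flagged.append((row["node"], int(percent)))
    return flagged


# First row mirrors the sample output above; the other two are hypothetical.
sample_rows = [
    {"shards": "5", "disk.percent": "46", "node": "CSUXak2"},
    {"shards": "3", "disk.percent": None, "node": "UNASSIGNED"},
    {"shards": "2", "disk.percent": "91", "node": "hot-node"},
]
print(nodes_over_disk_threshold(sample_rows))
```

The cat API returns numbers as strings in JSON mode, hence the explicit `int()` conversion before comparing.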
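3. GET /_cat/count?v
Shows the document count for the whole cluster. To count a single index, add its name to the path: GET /_cat/count/<index>?v.
    epoch      timestamp count
    1475868259 15:24:20  120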
4. GET /_cat/fielddata?v
Shows how much JVM heap memory fielddata takes up on each node, per field. (Aggregating or sorting on an analyzed field relies on fielddata, a forward index held in the JVM heap.)
    id                     host      ip        node    field size
    Nqk-6inXQq-OxUfOUI8jNQ 127.0.0.1 127.0.0.1 Nqk-6in body  544b
    Nqk-6inXQq-OxUfOUI8jNQ 127.0.0.1 127.0.0.1 Nqk-6in soul  480b
5. GET /_cat/health?v
Gives a fairly complete picture of the overall health of an ES cluster. The main thing to check is whether the status is green, yellow, or red.
    epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
    1475871424 16:17:04  elasticsearch green  1          1         5      5   0    0    0        0             -                  100.0%
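If the console consumes the text form of the cat API rather than JSON, the header row printed by `?v` is enough to recover named fields. The parser below is a minimal sketch that assumes every cell is non-empty and contains no spaces. That holds for the health output above, but not for every cat endpoint, so `format=json` is usually the safer choice. The function name `parse_cat_table` is hypothetical.

```python
def parse_cat_table(text):
    """Turn `?v` cat output into a list of {column: value} dicts.

    Assumes whitespace-separated cells with no empty or multi-word values.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        cells = line.split()
        if len(cells) != len(header):
            raise ValueError(f"expected {len(header)} cells, got {len(cells)}")
        rows.append(dict(zip(header, cells)))
    return rows


SAMPLE = """\
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1475871424 16:17:04 elasticsearch green 1 1 5 5 0 0 0 0 - 100.0%
"""

health = parse_cat_table(SAMPLE)[0]
print(health["status"], health["active_shards_percent"])  # prints: green 100.0%
```

Raising on a cell-count mismatch is deliberate: silently mis-aligned columns would be worse than a visible failure in a monitoring tool.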
6. GET /_cat/indices?v
Shows the details of each index: how many shards it has, how many documents it holds, how many documents have been deleted, and how much disk space it uses.
    health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    yellow open   twitter u8FNjxh8Rfy_awN11oDKYQ 1   1   1200       0            88.1kb     88.1kb
7. GET /_cat/master?v
Shows which node is currently the master node.
    id                     host      ip        node
    YzWoH_2BT-6UjVGDyPdqYg 127.0.0.1 127.0.0.1 YzWoH_2
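8. GET /_cat/nodes?v
Shows the details of each node, such as JVM heap usage, RAM usage, CPU load, and the node's roles. In this sample the load_5m and load_15m columns are empty.
    ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
    127.0.0.1 65           99          42  3.07                     mdi       *      mJw06l1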
9. GET /_cat/pending_tasks?v
Shows the tasks that are still pending (queued but not yet executed) and which operation each one performs.
    insertOrder timeInQueue priority source
    1685        855ms       HIGH     update-mapping [foo][t]
    1686        843ms       HIGH     update-mapping [foo][t]
    1693        753ms       HIGH     refresh-mapping [foo][[t]]
    1688        816ms       HIGH     update-mapping [foo][t]
    1689        802ms       HIGH     update-mapping [foo][t]
    1690        787ms       HIGH     update-mapping [foo][t]
    1691        773ms       HIGH     update-mapping [foo][t]
10. GET /_cat/plugins?v&s=component&h=name,component,version,description
Lists the plugins installed in the cluster.
    name    component               version description
    U7321H6 analysis-icu            5.5.1   The ICU Analysis plugin integrates Lucene ICU module into elasticsearch, adding ICU relates analysis components.
    U7321H6 analysis-kuromoji       5.5.1   The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch.
    U7321H6 analysis-phonetic       5.5.1   The Phonetic Analysis plugin integrates phonetic token filter analysis with elasticsearch.
    U7321H6 analysis-smartcn        5.5.1   Smart Chinese Analysis plugin integrates Lucene Smart Chinese analysis module into elasticsearch.
    U7321H6 analysis-stempel        5.5.1   The Stempel (Polish) Analysis plugin integrates Lucene stempel (polish) analysis module into elasticsearch.
    U7321H6 analysis-ukrainian      5.5.1   The Ukrainian Analysis plugin integrates the Lucene UkrainianMorfologikAnalyzer into elasticsearch.
    U7321H6 discovery-azure-classic 5.5.1   The Azure Classic Discovery plugin allows to use Azure Classic API for the unicast discovery mechanism
    U7321H6 discovery-ec2           5.5.1   The EC2 discovery plugin allows to use AWS API for the unicast discovery mechanism.
    U7321H6 discovery-file          5.5.1   Discovery file plugin enables unicast discovery from hosts stored in a file.
    U7321H6 discovery-gce           5.5.1   The Google Compute Engine (GCE) Discovery plugin allows to use GCE API for the unicast discovery mechanism.
    U7321H6 ingest-attachment       5.5.1   Ingest processor that uses Apache Tika to extract contents
    U7321H6 ingest-geoip            5.5.1   Ingest processor that uses looksup geo data based on ip adresses using the Maxmind geo database
    U7321H6 ingest-user-agent       5.5.1   Ingest processor that extracts information from a user agent
    U7321H6 jvm-example             5.5.1   Demonstrates all the pluggable Java entry points in Elasticsearch
    U7321H6 lang-javascript         5.5.1   The JavaScript language plugin allows to have javascript as the language of scripts to execute.
    U7321H6 lang-python             5.5.1   The Python language plugin allows to have python as the language of scripts to execute.
    U7321H6 mapper-attachments      5.5.1   The mapper attachments plugin adds the attachment type to Elasticsearch using Apache Tika.
    U7321H6 mapper-murmur3          5.5.1   The Mapper Murmur3 plugin allows to compute hashes of a field's values at index-time and to store them in the index.
    U7321H6 mapper-size             5.5.1   The Mapper Size plugin allows document to record their uncompressed size at index time.
    U7321H6 store-smb               5.5.1   The Store SMB plugin adds support for SMB stores.
11. GET /_cat/recovery?v
Shows the progress and details of shard recoveries.
    index   shard time type  stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
    twitter 0     13ms store done  n/a         n/a         node0       node-0      n/a        n/a      0     0               100%          13          0     0               100%          9928        0            0                      100.0%
12. GET /_cat/repositories?v
Lists the repositories registered for snapshots.
    id    type
    repo1 fs
    repo2 s3
13. GET /_cat/thread_pool?v
Shows the state of each thread pool on each node. The default columns are the node name, the pool name, and the active, queue, and rejected counts.
    node_name name                active queue rejected
    Z6MkIvC   bulk                0      0     0
    Z6MkIvC   fetch_shard_started 0      0     0
    Z6MkIvC   fetch_shard_store   0      0     0
    Z6MkIvC   flush               0      0     0
    Z6MkIvC   force_merge         0      0     0
    Z6MkIvC   generic             0      0     0
    Z6MkIvC   get                 0      0     0
    Z6MkIvC   index               0      0     0
    Z6MkIvC   listener            0      0     0
    Z6MkIvC   management          1      0     0
    Z6MkIvC   refresh             0      0     0
    Z6MkIvC   search              0      0     0
    Z6MkIvC   snapshot            0      0     0
    Z6MkIvC   warmer              0      0     0
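In the thread pool output, the last number on each row is the one a monitoring console usually watches: a non-zero rejected count means the pool's queue filled up and work was turned away. The sketch below assumes rows fetched with `format=json`, keyed by the default column names `node_name`, `name`, `active`, `queue`, and `rejected`. The function name and the non-zero row in the sample data are hypothetical.

```python
def pools_with_rejections(rows):
    """Return (node_name, pool_name, rejected) for pools that rejected work."""
    hits = []
    for row in rows:
        rejected = int(row["rejected"])  # cat JSON values arrive as strings
        if rejected > 0:
            hits.append((row["node_name"], row["name"], rejected))
    return hits


# The first two rows mirror the sample above; the third is invented.
sample_rows = [
    {"node_name": "Z6MkIvC", "name": "bulk", "active": "0", "queue": "0", "rejected": "0"},
    {"node_name": "Z6MkIvC", "name": "management", "active": "1", "queue": "0", "rejected": "0"},
    {"node_name": "Z6MkIvC", "name": "search", "active": "4", "queue": "1000", "rejected": "17"},
]
print(pools_with_rejections(sample_rows))
```

Since the rejected count accumulates over the life of the node, a console would probably compare consecutive polls and alert on the increase rather than on the absolute value.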
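14. GET /_cat/shards?v
Shows the details of each shard: whether it is a primary or a replica, its state, its document count and size, and the node it lives on. An unassigned shard has no values after the state column.
    index   shard prirep state      docs store  ip            node
    twitter 0     p      STARTED    3014 31.1mb 192.168.56.10 H5dfFeA
    twitter 0     r      UNASSIGNED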
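Unassigned shards like the replica in the output above are a common cause of a yellow or red cluster, so a console may want to count them per index. This is a sketch under the assumption that rows come from `format=json` with the default column names `index`, `shard`, `prirep`, and `state`. The function name is made up for illustration.

```python
from collections import Counter


def unassigned_by_index(rows):
    """Count UNASSIGNED shards per (index, 'primary' | 'replica')."""
    counts = Counter()
    for row in rows:
        if row["state"] == "UNASSIGNED":
            kind = "primary" if row["prirep"] == "p" else "replica"
            counts[(row["index"], kind)] += 1
    return dict(counts)


# Rows mirroring the two-line sample above.
sample_rows = [
    {"index": "twitter", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "twitter", "shard": "0", "prirep": "r", "state": "UNASSIGNED"},
]
print(unassigned_by_index(sample_rows))
```

Separating primaries from replicas matters for severity: a missing replica reduces redundancy, while a missing primary means part of the index cannot be served.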
15. GET /_cat/segments?v
Shows each segment (the Lucene segment files of an index): which node it is on, how many documents it contains, how much disk space it uses, how much of it is held in memory, and whether it is searchable.
    index shard prirep ip        segment generation docs.count docs.deleted size size.memory committed searchable version compound
    test  3     p      127.0.0.1 _0      0          1          0            3kb  2042        false     true       6.5.1   true
    test1 3     p      127.0.0.1 _0      0          1          0            3kb  2042        false     true       6.5.1   true
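16. GET /_cat/snapshots?v&s=id
Lists the snapshots in a repository with their status, start and end times, duration, and shard counts. On 5.x the repository name has to be part of the path, for example GET /_cat/snapshots/repo1?v&s=id.
    id    status  start_epoch start_time end_epoch  end_time duration indices successful_shards failed_shards total_shards
    snap1 FAILED  1445616705  18:11:45   1445616978 18:16:18 4.6m     1       4                 1             5
    snap2 SUCCESS 1445634298  23:04:58   1445634672 23:11:12 6.2m     2       10                0             10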
17. GET /_cat/templates?v&s=name
Lists the existing index templates with their name, index pattern, order, and version.
    name      template order version
    template0 te*      0
    template1 tea*     1
    template2 teak*    2     7

Monitoring with the cluster API

1. GET _cluster/health
    {
      "cluster_name" : "testcluster",
      "status" : "yellow",
      "timed_out" : false,
      "number_of_nodes" : 1,
      "number_of_data_nodes" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 5,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 5,
      "delayed_unassigned_shards": 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch": 0,
      "task_max_waiting_in_queue_millis": 0,
      "active_shards_percent_as_number": 50.0
    }
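The health response above is the natural input for a traffic-light indicator on the monitoring console. The sketch below maps the status to a severity and collects human-readable reasons. The severity labels, the function name `assess_health`, and the choice of which fields to report are assumptions for illustration. The sample is a trimmed copy of the response shown above. In that sample, a single node holds 5 primaries and the 5 replicas have nowhere to go, which is why the status is yellow.

```python
import json

SEVERITY = {"green": "ok", "yellow": "warning", "red": "critical"}


def assess_health(health):
    """Map a _cluster/health response to (severity, list of reasons)."""
    severity = SEVERITY.get(health.get("status"), "unknown")
    reasons = []
    if health.get("timed_out"):
        reasons.append("health request timed out")
    if health.get("unassigned_shards", 0) > 0:
        reasons.append(f'{health["unassigned_shards"]} unassigned shards')
    if health.get("number_of_pending_tasks", 0) > 0:
        reasons.append(f'{health["number_of_pending_tasks"]} pending tasks')
    return severity, reasons


sample = json.loads("""
{
  "cluster_name": "testcluster",
  "status": "yellow",
  "timed_out": false,
  "unassigned_shards": 5,
  "number_of_pending_tasks": 0,
  "active_shards_percent_as_number": 50.0
}
""")
print(assess_health(sample))
```

Using `.get()` with defaults keeps the function tolerant of fields that differ between ES versions.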
2. GET _cluster/stats?human&pretty
    {
      "timestamp": 1459427693515,
      "cluster_name": "elasticsearch",
      "status": "green",
      "indices": {
        "count": 2,
        "shards": {
          "total": 10,
          "primaries": 10,
          "replication": 0,
          "index": {
            "shards": { "min": 5, "max": 5, "avg": 5 },
            "primaries": { "min": 5, "max": 5, "avg": 5 },
            "replication": { "min": 0, "max": 0, "avg": 0 }
          }
        },
        "docs": { "count": 10, "deleted": 0 },
        "store": {
          "size": "16.2kb",
          "size_in_bytes": 16684,
          "throttle_time": "0s",
          "throttle_time_in_millis": 0
        },
        "fielddata": {
          "memory_size": "0b",
          "memory_size_in_bytes": 0,
          "evictions": 0
        },
        "query_cache": {
          "memory_size": "0b",
          "memory_size_in_bytes": 0,
          "total_count": 0,
          "hit_count": 0,
          "miss_count": 0,
          "cache_size": 0,
          "cache_count": 0,
          "evictions": 0
        },
        "completion": { "size": "0b", "size_in_bytes": 0 },
        "segments": {
          "count": 4,
          "memory": "8.6kb",
          "memory_in_bytes": 8898,
          "terms_memory": "6.3kb",
          "terms_memory_in_bytes": 6522,
          "stored_fields_memory": "1.2kb",
          "stored_fields_memory_in_bytes": 1248,
          "term_vectors_memory": "0b",
          "term_vectors_memory_in_bytes": 0,
          "norms_memory": "384b",
          "norms_memory_in_bytes": 384,
          "doc_values_memory": "744b",
          "doc_values_memory_in_bytes": 744,
          "index_writer_memory": "0b",
          "index_writer_memory_in_bytes": 0,
          "version_map_memory": "0b",
          "version_map_memory_in_bytes": 0,
          "fixed_bit_set": "0b",
          "fixed_bit_set_memory_in_bytes": 0,
          "file_sizes": {}
        },
        "percolator": { "num_queries": 0 }
      },
      "nodes": {
        "count": { "total": 1, "data": 1, "coordinating_only": 0, "master": 1, "ingest": 1 },
        "versions": [ "5.5.1" ],
        "os": {
          "available_processors": 8,
          "allocated_processors": 8,
          "names": [ { "name": "Mac OS X", "count": 1 } ],
          "mem": {
            "total": "16gb",
            "total_in_bytes": 17179869184,
            "free": "78.1mb",
            "free_in_bytes": 81960960,
            "used": "15.9gb",
            "used_in_bytes": 17097908224,
            "free_percent": 0,
            "used_percent": 100
          }
        },
        "process": {
          "cpu": { "percent": 9 },
          "open_file_descriptors": { "min": 268, "max": 268, "avg": 268 }
        },
        "jvm": {
          "max_uptime": "13.7s",
          "max_uptime_in_millis": 13737,
          "versions": [
            {
              "version": "1.8.0_74",
              "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
              "vm_version": "25.74-b02",
              "vm_vendor": "Oracle Corporation",
              "count": 1
            }
          ],
          "mem": {
            "heap_used": "57.5mb",
            "heap_used_in_bytes": 60312664,
            "heap_max": "989.8mb",
            "heap_max_in_bytes": 1037959168
          },
          "threads": 90
        },
        "fs": {
          "total": "200.6gb",
          "total_in_bytes": 215429193728,
          "free": "32.6gb",
          "free_in_bytes": 35064553472,
          "available": "32.4gb",
          "available_in_bytes": 34802409472
        },
        "plugins": [
          {
            "name": "analysis-icu",
            "version": "5.5.1",
            "description": "The ICU Analysis plugin integrates Lucene ICU module into elasticsearch, adding ICU relates analysis components.",
            "classname": "org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin",
            "has_native_controller": false
          },
          {
            "name": "ingest-geoip",
            "version": "5.5.1",
            "description": "Ingest processor that uses looksup geo data based on ip adresses using the Maxmind geo database",
            "classname": "org.elasticsearch.ingest.geoip.IngestGeoIpPlugin",
            "has_native_controller": false
          },
          {
            "name": "ingest-user-agent",
            "version": "5.5.1",
            "description": "Ingest processor that extracts information from a user agent",
            "classname": "org.elasticsearch.ingest.useragent.IngestUserAgentPlugin",
            "has_native_controller": false
          }
        ]
      }
    }
3. GET _cluster/pending_tasks
    {
      "tasks": [
        {
          "insert_order": 101,
          "priority": "URGENT",
          "source": "create-index [foo_9], cause [api]",
          "time_in_queue_millis": 86,
          "time_in_queue": "86ms"
        },
        {
          "insert_order": 46,
          "priority": "HIGH",
          "source": "shard-started ([foo_2][1], node[tMTocMvQQgGCkj7QDHl3OA], [P], s[INITIALIZING]), reason [after recovery from shard_store]",
          "time_in_queue_millis": 842,
          "time_in_queue": "842ms"
        },
        {
          "insert_order": 45,
          "priority": "HIGH",
          "source": "shard-started ([foo_2][0], node[tMTocMvQQgGCkj7QDHl3OA], [P], s[INITIALIZING]), reason [after recovery from shard_store]",
          "time_in_queue_millis": 858,
          "time_in_queue": "858ms"
        }
      ]
    }
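For the pending-tasks response above, a console would most likely surface the task that has waited longest, since a growing queue time suggests the master is falling behind. The sketch below picks that task from a response shaped like the one above. The function name is hypothetical and the `source` strings in the sample are shortened.

```python
def longest_waiting_task(response):
    """Return the pending task with the largest time_in_queue_millis, or None."""
    tasks = response.get("tasks", [])
    if not tasks:
        return None
    return max(tasks, key=lambda task: task["time_in_queue_millis"])


# Trimmed copy of the sample response above.
sample = {
    "tasks": [
        {"insert_order": 101, "priority": "URGENT", "source": "create-index [foo_9], cause [api]", "time_in_queue_millis": 86},
        {"insert_order": 46, "priority": "HIGH", "source": "shard-started ([foo_2][1], ...)", "time_in_queue_millis": 842},
        {"insert_order": 45, "priority": "HIGH", "source": "shard-started ([foo_2][0], ...)", "time_in_queue_millis": 858},
    ]
}
worst = longest_waiting_task(sample)
print(worst["insert_order"], worst["time_in_queue_millis"])  # prints: 45 858
```

An empty task list returns `None`, which is the normal state of a healthy cluster and should not be treated as an error.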