[ELK] Things Worth Knowing About Elasticsearch in a Container Environment

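1. Installing Elasticsearch requires a CA certificate and HTTP/transport-layer keystores (key + certificate) for TLS

- When you install Elasticsearch, http_ca.crt, http.p12, and transport.p12 are generated in its configuration directory (under config/certs in the official Docker image). These files are used to connect a Kibana instance to the Elasticsearch cluster and to secure communication between the nodes of the cluster. The TLS settings themselves are configured in elasticsearch.yml.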

1) http_ca.crt : The CA certificate that is used to sign the certificates for the HTTP layer of this Elasticsearch cluster.

2) http.p12 : Keystore that contains the key and certificate for the HTTP layer for this node.

3) transport.p12 : Keystore that contains the key and certificate for the transport layer for all the nodes in your cluster.
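
To see how these files are referenced, it can help to pull only the security-related keys out of elasticsearch.yml. The helper below is a minimal sketch, not an official tool: the function name show_tls_settings is made up for this post, and the default path assumes the layout of the official Docker image.

```shell
# Print the xpack.security.* settings (and their indented child keys)
# found in an elasticsearch.yml file. Comment lines are skipped.
show_tls_settings() {
  # Assumed default: config path inside the official Docker image.
  local config_file="${1:-/usr/share/elasticsearch/config/elasticsearch.yml}"
  awk '
    /^[ \t]*#/           { next }                      # ignore comments
    /^xpack\.security/   { in_block = 1; print; next } # top-level security key
    in_block && /^[ \t]+[^ \t]/ { print; next }        # nested key of that block
    { in_block = 0 }                                   # anything else ends the block
  ' "$config_file"
}

# Example: inspect a bind-mounted config file on the host (path is hypothetical).
# show_tls_settings /path/to/local/config/elasticsearch.yml
```

On an auto-configured node this should surface the xpack.security.http.ssl and xpack.security.transport.ssl blocks, including the keystore.path entries that point at http.p12 and transport.p12.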

- You can manage secure settings with the bin/elasticsearch-keystore command.

# https://www.elastic.co/guide/en/elasticsearch/reference/8.6/elasticsearch-keystore.html

bin/elasticsearch-keystore
( [add <settings>] [-f] [--stdin]
| [add-file (<setting> <path>)+]
| [create] [-p]
| [has-passwd]
| [list]
| [passwd]
| [remove <setting>]
| [show [-o <output-file>] <setting>]
| [upgrade]
) [-h, --help] ([-s, --silent] | [-v, --verbose])

For example, the following command shows the password of the http.p12 keystore.

bin/elasticsearch-keystore show xpack.security.http.ssl.keystore.secure_password

* Reference: https://www.elastic.co/guide/en/elasticsearch/reference/8.6/docker.html#elasticsearch-security-certificates

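2. Use an enrollment token to add nodes to an Elasticsearch cluster

- A fresh setup starts with a single node. You can add more nodes with the enrollment token that is printed to the terminal on first start (valid for 30 minutes). If the existing node struggles while a new node joins (for example, its container exits), you can set the JVM heap size explicitly through the ES_JAVA_OPTS environment variable.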

# ES_JAVA_OPTS is optional: it sets the JVM heap size for the node being added.
docker run -e ENROLLMENT_TOKEN="<token>" \
  -e ES_JAVA_OPTS="-Xms1g -Xmx1g" \
  --name es02 \
  --net elastic \
  -it docker.elastic.co/elasticsearch/elasticsearch:8.6.2

- If you have lost the enrollment token or it has expired, generate a new one from a node that is already running:

docker exec -it es01 \
  /usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s node

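3. Requirements for running Elasticsearch in production

1) Set vm.max_map_count to at least 262144

- vm.max_map_count is a Linux kernel parameter that caps the number of memory map areas a single process may have. Elasticsearch memory-maps its index files, so a limit that is too low can lead to out-of-memory errors. Note that it limits the number of mappings, not the amount of memory Elasticsearch can use. The official Elasticsearch documentation asks for a value of at least 262144 in production, and the bootstrap checks can prevent a node in production mode from starting when the value is lower.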

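## Linux
# Check the persistent setting in /etc/sysctl.conf (expected output shown below the command).
grep vm.max_map_count /etc/sysctl.conf
vm.max_map_count=262144

# sysctl -w applies the value to the running system only and is lost on reboot, so set it in /etc/sysctl.conf first.
sysctl -w vm.max_map_count=262144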
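
Before starting a container it can be handy to verify the live kernel value rather than only the file. The following is a small sketch under stated assumptions: the function names meets_max_map_count and check_host_max_map_count are invented for this post, and the check reads /proc/sys/vm/max_map_count, which only exists on Linux.

```shell
# Minimum value named in the Elasticsearch Docker documentation.
REQUIRED_MAX_MAP_COUNT=262144

# Succeeds when the given number meets the minimum.
meets_max_map_count() {
  [ "$1" -ge "$REQUIRED_MAX_MAP_COUNT" ]
}

# Reads the live kernel value and reports whether it is sufficient.
# Returns 0 if OK, 1 if too low, 2 if the value cannot be read.
check_host_max_map_count() {
  local current
  current="$(cat /proc/sys/vm/max_map_count 2>/dev/null)" || {
    echo "cannot read vm.max_map_count (not a Linux host?)" >&2
    return 2
  }
  if meets_max_map_count "$current"; then
    echo "OK: vm.max_map_count=$current"
  else
    echo "TOO LOW: vm.max_map_count=$current (need >= $REQUIRED_MAX_MAP_COUNT)"
    return 1
  fi
}

# Usage on the Docker host:
# check_host_max_map_count || echo "fix the value before starting Elasticsearch"
```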

* Reference: https://www.elastic.co/guide/en/elasticsearch/reference/8.6/docker.html#_set_vm_max_map_count_to_at_least_262144

2) Make the configuration files readable by the `elasticsearch` user

- By default, Elasticsearch runs inside the container as the `elasticsearch` user with uid:gid 1000:0. (OpenShift is an exception because it assigns an arbitrary uid.) When you bind-mount a local directory or file into the Elasticsearch container, the `elasticsearch` user must be able to read it, and it also needs write access to the config (to create the keystore), data, and log directories.

1. Create the local directory.

2. Give the directory's group read, write, and execute access so that the `elasticsearch` user can use it through gid 0, for example chmod g+rwx /path/to/local/dir. (chmod 755 would only make the directory readable for the container user, which is not enough for the config, data, and log directories.)

3. Set the directory's group to gid 0, for example chgrp 0 /path/to/local/dir.
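
The three steps above can be wrapped into one helper. This is only a sketch: the function name prepare_es_dir and the ./es/... layout are hypothetical, and chgrp to gid 0 usually requires root, so the sketch warns instead of failing when it is not permitted.

```shell
# Create a host directory that the container's elasticsearch user (1000:0)
# can read and write through its group, then assign it to gid 0.
prepare_es_dir() {
  local dir="$1"
  mkdir -p "$dir" || return 1          # step 1: create the directory
  chmod g+rwx "$dir" || return 1       # step 2: group read/write/execute
  # step 3: hand the directory to gid 0; warn rather than abort if not allowed
  chgrp 0 "$dir" 2>/dev/null \
    || echo "warning: could not chgrp 0 $dir (try running with sudo)" >&2
}

# Hypothetical layout: one directory each for config, data, and logs.
# for d in config data logs; do prepare_es_dir "./es/$d"; done
```

Each prepared directory can then be bind-mounted with -v onto the matching path under /usr/share/elasticsearch in the container.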

3) Password-protect the keystore, which is effectively plain text by default

- I recommend protecting the keystore with a password from the initial installation. By default, the data in the keystore is only obfuscated, not encrypted, so anyone who gets hold of the file can recover the secrets with little effort. Encrypting the keystore keeps the secure settings safe even if the file is accessed from outside the cluster.

- Recent Elasticsearch releases, including the 8.6 line used in this post, can protect the keystore with a password. Once a password is set, it must be supplied every time Elasticsearch starts so that the secure settings can be decrypted. (Older releases may not support this, so check the documentation for your version.)

1. Create a local directory. It will be bind-mounted as the Elasticsearch config directory.

2. When you run the Elasticsearch Docker image, map the local directory to the container's config directory with the -v (--volume) option. For example:
docker run -d --name elasticsearch \
  -v /path/to/local/config:/usr/share/elasticsearch/config \
  elasticsearch:tag
3. Once the container is running, create the keystore with the elasticsearch-keystore tool:
docker exec -it elasticsearch \
  /usr/share/elasticsearch/bin/elasticsearch-keystore \
  create -p
The command prompts for a password and then creates the keystore file. If a keystore already exists, the tool asks whether to overwrite it; to keep the existing entries, set a password with the passwd subcommand instead. From then on, Elasticsearch loads the keystore at startup and needs the password each time it starts.
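
To confirm that the keystore really is password-protected, the has-passwd subcommand from the usage listing earlier can be checked by its exit status. The wrapper below is a hedged sketch: the function name keystore_is_protected is made up for this post, and the container name "elasticsearch" in the usage comment simply follows the example above.

```shell
# Succeeds (exit 0) when the keystore exists and is password-protected.
# The arguments are the command used to reach the tool; with no arguments
# the sketch assumes it is already running inside the container.
keystore_is_protected() {
  if [ "$#" -eq 0 ]; then
    set -- /usr/share/elasticsearch/bin/elasticsearch-keystore
  fi
  # has-passwd fails with a non-zero exit code when there is no password.
  "$@" has-passwd >/dev/null 2>&1
}

# From the Docker host, assuming a container named "elasticsearch":
# if keystore_is_protected docker exec elasticsearch \
#      /usr/share/elasticsearch/bin/elasticsearch-keystore; then
#   echo "keystore is password-protected"
# else
#   echo "keystore is missing or not password-protected"
# fi
```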

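4. How many shards should an Elasticsearch cluster use?

- An Elasticsearch index consists of one or more shards. Each shard is an instance of a Lucene index, which you can think of as the lowest-level search engine unit: it indexes and handles queries for a subset of the data in the cluster.

- As data is written to a shard, it is periodically published into new Lucene segments, at which point it becomes available for querying. This step is called a "refresh". When Elasticsearch searches an index, it sends the query to a copy of every shard that belongs to the index and then reduces the per-shard results into a global result set. (See the distributed search concept.)

- Because segments are immutable, Elasticsearch periodically merges them into larger segments as their number grows and then deletes the old ones. (Merging can consume a fair amount of disk I/O.)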

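a. How do search and document CRUD operations happen in (near) real time?
b. Why does deleting a document not free up disk space immediately?
c. What are the 'refresh', 'flush', and 'optimize' APIs (optimize was later renamed force merge), and when should you call them?

If these questions make you want to dig deeper into shards, see the link below.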
- https://www.elastic.co/guide/en/elasticsearch/guide/current/inside-a-shard.html

* Reference: https://www.elastic.co/kr/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
