feat: support host monitor #1890

Open
wants to merge 16 commits into base: main

Conversation

Abingcbc
Collaborator

  1. Get the process metadata collection pipeline working end to end

TODO:

  1. Support more fields
  2. Observability metrics

@Abingcbc force-pushed the host_monitor branch 6 times, most recently from 6150e52 to bfdd9c2 on November 19, 2024 06:27

int readCount = 0;
WalkAllProcess(PROCESS_DIR, [&](const std::string& dirName) {
if (++readCount > mProcessSilentCount) {
Collaborator

Does the count limit also apply to what is in the cache? Shouldn't it only limit direct interaction with the operating system?

Collaborator Author

The cache exists to compute incremental CPU usage. The top-N processes may differ between the previous collection and the current one, so all processes have to be kept.
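
The reasoning in this reply can be illustrated with a minimal sketch. It assumes each process exposes a cumulative CPU tick counter; `Sample`, `CpuUsageCache`, and `Update` are hypothetical names and not the PR's actual API. The cache stores the previous counter of every process rather than only the previous top N, because a process that enters the top N in this round still needs its earlier counter to produce a delta.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sample: cumulative CPU ticks of one process at collection time.
struct Sample {
    int pid;
    uint64_t cpuTicks;
};

class CpuUsageCache {
public:
    // Returns up to n (pid, tick delta) pairs for this round, largest delta first,
    // then replaces the cache with the counters of ALL processes seen this round.
    std::vector<std::pair<int, uint64_t>> Update(const std::vector<Sample>& samples, std::size_t n) {
        std::vector<std::pair<int, uint64_t>> deltas;
        std::unordered_map<int, uint64_t> next;
        for (const auto& s : samples) {
            auto it = mPrev.find(s.pid);
            // A process without a previous sample has no delta yet; it is only cached.
            // A counter that went backwards (for example, PID reuse) is skipped as well.
            if (it != mPrev.end() && s.cpuTicks >= it->second) {
                deltas.emplace_back(s.pid, s.cpuTicks - it->second);
            }
            next[s.pid] = s.cpuTicks;
        }
        std::sort(deltas.begin(), deltas.end(), [](const auto& a, const auto& b) {
            if (a.second != b.second) {
                return a.second > b.second; // larger delta first
            }
            return a.first < b.first; // stable order for equal deltas
        });
        if (deltas.size() > n) {
            deltas.resize(n);
        }
        mPrev = std::move(next);
        return deltas;
    }

private:
    std::unordered_map<int, uint64_t> mPrev; // pid -> cumulative ticks from the previous round
};
```

If the cache only held the previous top N, a process that was quiet last round and busy this round would have no baseline and could not be ranked until one round later.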

// See the License for the specific language governing permissions and
// limitations under the License.

#include "InputHostMeta.h"
Collaborator

Needs documentation.

ThreadPool mThreadPool;

mutable std::shared_mutex mRegisteredCollectorMapMutex;
std::unordered_map<std::string, bool> mRegisteredCollectorMap;
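Collaborator

std::unordered_map<std::string, bool> mRegisteredCollectorMap;
std::unordered_map<std::string, std::shared_ptr<BaseCollector>> mCollectorInstanceMap;

These two variables are not well designed. The issues include:
1. How HostMonitorInputRunner::GetCollector is used. Validity checking should be self-contained.
2. The relationship between the locks, and so on.
3. RegisterCollector operates on mCollectorInstanceMap, yet the variable it does not touch is the one named "registered".

Possible improvements:
1. Keep the current logic. One group of functionality should not be spread over several variables, so merge the two into a single map whose value is a struct, and tidy up the overall structure of the implementation.
2. Register the ProcessEntityCollector instance only when UpdateCollector is called, which saves one variable.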
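
Option 1 might look roughly like the sketch below. All names here (`CollectorEntry`, `CollectorRegistry`, `ProcessEntityCollectorStub`, and the reduced `BaseCollector`) are stand-ins invented for illustration; the real classes in the PR may differ. A plain `std::mutex` is used for brevity, and a `std::shared_mutex` as in the PR would fit the same shape. One map holds both the instance and its enabled flag under one lock, so the validity check lives inside the registry.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Reduced stand-in for the PR's BaseCollector; only what the sketch needs.
class BaseCollector {
public:
    virtual ~BaseCollector() = default;
    virtual const std::string& Name() const = 0;
};

// Hypothetical concrete collector used to exercise the registry.
class ProcessEntityCollectorStub : public BaseCollector {
public:
    const std::string& Name() const override { return mName; }

private:
    std::string mName = "process_entity";
};

// One value type instead of two parallel maps.
struct CollectorEntry {
    std::shared_ptr<BaseCollector> instance;
    bool enabled = false;
};

class CollectorRegistry {
public:
    // Registers (or replaces) a collector; it starts disabled.
    void Register(std::shared_ptr<BaseCollector> collector) {
        std::lock_guard<std::mutex> lock(mMutex);
        std::string name = collector->Name();
        CollectorEntry entry;
        entry.instance = std::move(collector);
        mCollectors[name] = std::move(entry);
    }

    // Returns false if the name was never registered.
    bool SetEnabled(const std::string& name, bool enabled) {
        std::lock_guard<std::mutex> lock(mMutex);
        auto it = mCollectors.find(name);
        if (it == mCollectors.end()) {
            return false;
        }
        it->second.enabled = enabled;
        return true;
    }

    // Returns the instance only if it is registered and enabled, nullptr otherwise.
    std::shared_ptr<BaseCollector> Get(const std::string& name) const {
        std::lock_guard<std::mutex> lock(mMutex);
        auto it = mCollectors.find(name);
        if (it == mCollectors.end() || !it->second.enabled) {
            return nullptr;
        }
        return it->second.instance;
    }

private:
    mutable std::mutex mMutex;
    std::unordered_map<std::string, CollectorEntry> mCollectors;
};
```

With this shape a caller never has to consult a second map to decide whether the pointer it got back is usable.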

Collaborator

image


const std::string ProcessEntityCollector::sName = "process_entity";

ProcessEntityCollector::ProcessEntityCollector() : mProcessSilentCount(INT32_FLAG(process_collect_silent_count)) {
Collaborator

image

Asynchronous task execution
Impact of timeouts
Different task types (e.g., Prometheus, host entity, process) should not affect one another

Collaborator Author

@Abingcbc Abingcbc Jan 6, 2025

  1. Submitting to the thread pool does not block, so the runner can hand a task to the pool as soon as it receives it.
  2. The next task is enqueued only after the current one finishes. If a task overruns, the next execution time is computed from the last execution time plus n intervals, which lands on the next point in the future at which it can run.
    nextExecTime = execTime + n * interval
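
The rule in point 2 can be sketched as a small helper. `NextExecTime` is a hypothetical name and the use of `std::chrono::steady_clock` is an assumption; the sketch only shows how n could be chosen so that the result is the first slot on the original schedule that lies strictly after the current time.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Returns execTime + n * interval for the smallest n >= 1 whose result is after `now`.
// Assumes interval > 0.
inline Clock::time_point NextExecTime(Clock::time_point execTime,
                                      Clock::duration interval,
                                      Clock::time_point now) {
    // Task finished within its period: the next slot is simply one interval later (n = 1).
    if (now < execTime + interval) {
        return execTime + interval;
    }
    // Task overran: skip the slots that are already in the past.
    // Dividing two durations yields a plain integer count of whole intervals.
    auto n = (now - execTime) / interval + 1;
    return execTime + n * interval;
}
```

For example, with a 60-second interval and a task that finishes 130 seconds after `execTime`, two whole intervals have elapsed, so n is 3 and the task next runs at `execTime` + 180 seconds.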

#else
ECSMeta ecsMeta;
if (FetchECSMeta(ecsMeta)) {
UpdateECSMetaAndHostid(ecsMeta);
}
#endif
updateHostId();
Collaborator

image
If this needs to change, 翊韬 should first write up the earlier design and explain the reasoning.

void DumpECSMeta();
#ifdef __ENTERPRISE__
Collaborator

Please don't do it in such an odd way.

```yaml
enable: true
inputs:
  - Type: input_host_meta
```
Collaborator

Keep this consistent with the others and fill it in completely.


event->SetContent(DEFAULT_CONTENT_KEY_FIRST_OBSERVED_TIME, processCreateTime);
event->SetContent(DEFAULT_CONTENT_KEY_LAST_OBSERVED_TIME, std::to_string(logtime));
auto interval = group.GetMetadata(EventGroupMetaKey::HOST_MONITOR_COLLECT_INTERVAL);
Collaborator

1. The interval in the collection config and host_monitor_default_interval do not look like fully independent parameters.

2. Every externally supplied parameter needs validity protection; here that means enforcing a minimum value.
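
Point 2 could be enforced with a guard along these lines. The function name, the fallback to the default, and the 5-second floor are all assumptions made for this example; the review does not say what the minimum should be.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical floor chosen for the example; the real minimum is not specified in the review.
constexpr int32_t kMinCollectIntervalSec = 5;

// Resolves the effective interval: a non-positive configured value falls back to the
// default, and the result is never allowed below the floor.
inline int32_t ResolveCollectInterval(int32_t configuredSec, int32_t defaultSec) {
    int32_t interval = configuredSec > 0 ? configuredSec : defaultSec;
    return std::max(interval, kMinCollectIntervalSec);
}
```

Resolving both sources in one place also makes the relationship raised in point 1 explicit: the default only applies when the collection config gives no usable value.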

// process entity
const std::string DEFAULT_CONTENT_VALUE_ENTITY_TYPE_ECS_PROCESS = "acs.ecs.process";
const std::string DEFAULT_CONTENT_VALUE_ENTITY_TYPE_HOST_PROCESS = "infra.host.process";
const std::string DEFAULT_CONTENT_KEY_PROCESS_PID = "pid";
Collaborator

How are these fields kept consistent with the ones used by security?


namespace logtail {

HostMonitorInputRunner::HostMonitorInputRunner() : mThreadPool(ThreadPool(3)) {
Collaborator

What is the justification for 3?

Collaborator

image

@@ -61,6 +61,8 @@ enum class EventGroupMetaKey {
PROMETHEUS_STREAM_ID,
PROMETHEUS_STREAM_TOTAL,

HOST_MONITOR_COLLECT_INTERVAL,
Collaborator

Every place that uses this falls within the input's scope of control, so it can be passed through HostMonitorInputRunner::UpdateCollector instead.

Collaborator

{"process_entity"},

RegisterCollector<ProcessEntityCollector>();
}

void HostMonitorInputRunner::UpdateCollector(const std::vector<std::string>& newCollectors,
Collaborator

Keep the meaning of variable names consistent; this one should be collectornames.

std::string& entityType,
std::string& hostEntityID,
std::string& hostEntityType) {
ECSMeta metaObj = HostIdentifier::Instance()->GetECSMeta();
Collaborator

Required before merging: the design and implementation around the use of GetECSMeta still need to be fleshed out.

@@ -165,6 +165,8 @@ DEFINE_FLAG_STRING(loong_collector_operator_service, "loong collector operator s
DEFINE_FLAG_INT32(loong_collector_operator_service_port, "loong collector operator service port", 8888);
DEFINE_FLAG_INT32(loong_collector_k8s_meta_service_port, "loong collector operator service port", 9000);
DEFINE_FLAG_STRING(_pod_name_, "agent pod name", "");
DEFINE_FLAG_INT32(host_monitor_default_interval, "default interval for host monitor", 60);
Collaborator

There should be a single entry point for adjusting this; remove the unnecessary ones.



4 participants