Switching From FluentD to Vector Log Aggregation Tool – DevOps.com

Log files are extremely important to the data analysis process, as they contain essential information about usage patterns, activities and operations within an operating system, application, server or device. This data is relevant to a number of use cases across an organization, from resource management, application troubleshooting, regulatory compliance and SIEM to business analytics and marketing insights. To make use of this wealth of data, log aggregation tools enable organizations to systematically collect and standardize log files. However, choosing the right tool can be quite challenging.
This blog will detail and compare two popular open source log aggregation tools: Fluentd and Vector.
When using orchestration tools like Kubernetes to deploy containers or other API resources, a log aggregator is needed to store the pod or node logs in a cloud platform. For one particular requirement, Fluentd was used as the log aggregator to push K8s pod logs to cloud storage buckets, with a sample configuration as shown below:
<match kubernetes.**>
  @type <cloud platform name>
  project <project name in cloud platform>
  keyfile <credential JSON to access the cloud storage>
  bucket <cloud storage bucket name>
  object_key_format <name for the file to be used>
  path <file prefix/path where the file has to be stored>
  <buffer tag,time>
    @type file
    path /var/log/fluent/gcs
    timekey 1m
    timekey_wait 30
    timekey_use_utc true
    flush_thread_count 16
    flush_at_shutdown true
    flush_mode interval
    flush_interval 1
    chunk_limit_size 10MB
    retry_max_interval 30
    retry_wait 60
  </buffer>
  <format>
    @type json
  </format>
</match>
With this configuration, Fluentd pushed only 47.62% of the total logs to cloud storage. Since more than half the logs were being lost, changes were made to the configuration. Most changes kept the efficiency somewhere between 40% and 50%, with a maximum of roughly 67% averaged over an entire day. Below are some of the changes made, along with the percentage of logs that were pushed to cloud storage:
<buffer tag,time>
  @type file
  path /var/log/fluent/gcs
  timekey 1m
  timekey_wait 30
  timekey_use_utc true
  flush_thread_count 16
  flush_at_shutdown true
  retry_max_interval 60
  retry_wait 30
</buffer>
Efficiency: 46.32%

<buffer tag,time>
  @type file
  path /var/log/fluent/gcs
  timekey 1m
  timekey_wait 30
  timekey_use_utc true
  flush_thread_count 16
  flush_at_shutdown true
</buffer>
Efficiency: 49.89%

<buffer tag,time>
  @type file
  path /var/log/fluent/gcs
  timekey 10m
  timekey_wait 0
  timekey_use_utc true
  flush_at_shutdown true
</buffer>
Efficiency: 37%

<buffer tag,time>
  @type file
  path /var/log/fluent/gcs
  timekey 30
  timekey_wait 0
  timekey_use_utc true
  flush_thread_count 15
  flush_at_shutdown true
</buffer>
Efficiency: 60.88%

<buffer tag,time>
  @type file
  path /var/log/fluent/gcs
  timekey 1
  timekey_wait 0
  timekey_use_utc true
  flush_thread_count 16
  flush_at_shutdown true
  flush_mode immediate
</buffer>
Efficiency: 66.77%
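The efficiency figures above are simply the ratio of log lines that reached cloud storage to log lines the pods emitted. A minimal sketch of that calculation (the function name and the sample counts are illustrative, not from the original setup):

```python
def delivery_efficiency(lines_pushed: int, lines_emitted: int) -> float:
    """Return the percentage of emitted log lines that reached cloud storage."""
    if lines_emitted <= 0:
        raise ValueError("lines_emitted must be positive")
    return round(100.0 * lines_pushed / lines_emitted, 2)

# Hypothetical counts matching the baseline run: 4,762 of 10,000 lines delivered.
print(delivery_efficiency(4762, 10000))  # → 47.62
```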
To improve this further, the open source Vector tool, maintained by Datadog, was also considered. It suits a K8s setup, supports a configuration similar to Fluentd's, and was installed on the nodes.
A Helm command was used to fetch the official chart repository onto the VMs; the configuration was changed as described below and Vector was installed as an agent. Vector has two working modes: Agent and aggregator. Agent is the plain mode that pushes logs/events from source to destination, while aggregator is used to transform and ship data collected by other agents (in this case, Vector agents).
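As a rough sketch of the two modes, an agent on each node can forward logs to an aggregator over Vector's native `vector` sink and source; the component ids, service name and port below are hypothetical, not taken from the original deployment:

```yaml
# Agent (runs per node): read pod logs and forward them to the aggregator.
sources:
  k8s_logs:
    type: kubernetes_logs
sinks:
  to_aggregator:
    type: vector                       # Vector-to-Vector transport
    inputs: [k8s_logs]
    address: "vector-aggregator:6000"  # hypothetical aggregator service

# Aggregator (runs centrally): receive from agents, then transform and
# ship to the final destination, e.g.:
#   sources:
#     from_agents:
#       type: vector
#       address: "0.0.0.0:6000"
```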
Installing this tool requires a Helm repository on the local machine to fetch the chart. Hence, the below commands were run in sequence before installing Vector in a K8s cluster:
helm repo add vector https://helm.vector.dev (adds the Vector repo to the Helm list)
helm repo update (updates the Helm repos)
helm fetch --untar vector/vector (clones the chart to the local machine)
data_dir: /vector-data-dir
sources:
  <custom source id>:
    type: kubernetes_logs (because Kubernetes is the source)
    exclude_paths_glob_patterns: <array of directories to be excluded when collecting logs from the nodes> (optional)
sinks:
  <custom sink id>:
    type: <destination cloud storage>
    inputs: <array of source ids whose logs have to be pushed>
    bucket: <bucket name of cloud storage>
    key_prefix: <path inside the bucket where the logs have to be collected> (optional)
    encoding:
      codec: <encoding of the log file> (optional)
Command to install Vector:
helm install vector . --namespace vector
After Vector was deployed and tested in the development environment, the efficiency was ~100% with negligible loss. The switch was then made to Vector, which was deployed in the production environment. Vector can ship up to 100,000 events (logs) per second, a very high throughput rate compared to other log aggregation tools. Vector achieved 99.98% to 100% efficiency even in the production Kubernetes cluster.
© 2022 Techstrong Group, Inc. All rights reserved.

