
High level architecture of centralized logging and retention strategy at ALight Technology And Services Limited

This post gives a general overview of how centralized logging has been implemented at ALight Technology And Services Limited and some of the tools used, while keeping costs low.

Having the ability to maintain logs is very important for software companies, even small startups. Centralized logging, monitoring, metrics and alerts are also important.

Log ingestion is done using FluentD. FluentD is installed on all the servers, and a Golden Base AMI has been created with FluentD pre-installed.

Grafana Loki is used as the log aggregation and storage server.

Grafana is the front-end for viewing the logs stored in Loki.

FluentD has been configured to set different labels for different log sources; the output is written both into Loki and into files.

The output files are zipped and uploaded into S3 with lifecycle policies. S3 buckets can also be configured to be immutable, i.e. once a file is uploaded, it cannot be deleted, overwritten or modified until a specified retention period elapses (S3 Object Lock).

Loki has been configured with a smaller retention period. I wish Grafana Loki supported something like retaining time slices. More on the concept of time slices later in this blog post.

Loki could be configured with a longer retention period, but that would incur unnecessary EBS storage costs; S3 Standard-Infrequent Access or S3 Glacier Instant Retrieval are much cheaper for archival data. Configure the system based on your needs.
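For illustration, a minimal S3 lifecycle configuration along these lines could look like the following; the rule ID, prefix, transition delay and expiry are placeholders to adjust for your own needs (GLACIER_IR is the API name for Glacier Instant Retrieval):

{
  "Rules": [
    {
      "ID": "archive-and-expire-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER_IR" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}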

A new component in C# is being developed to re-ingest archived logs into Loki on an as-needed basis. I will definitely post some sample code of the new component.

With the above configuration in place, logs become immutable within 6 minutes of being written. Let's say something happened and was noticed within 1 day: I can immediately increase Loki's retention period and keep the logs for a longer period. If I notice an abnormality but the logs are no longer available in Loki due to the shorter retention period, the new component would re-ingest the archived logs from S3 into Loki with the original timestamps. Under normal circumstances this wouldn't be required, but there is no point in having archived logs that cannot be ingested and searched when required.
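The component is still being developed, but as a rough sketch of the core idea, re-ingesting a line with its original timestamp boils down to a POST against Loki's push API. The labels and endpoint below are placeholder assumptions, and Loki must be configured to accept old samples (e.g. reject_old_samples: false in limits_config):

using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class LokiReIngester
{
    private static readonly HttpClient _http = new HttpClient();

    // Pushes a single archived log line to Loki with its original timestamp.
    // Loki's push API expects timestamps as nanoseconds since epoch, sent as strings.
    public static async Task PushAsync(DateTimeOffset originalTimestamp, string line)
    {
        var payload = new
        {
            streams = new[]
            {
                new
                {
                    stream = new { group = "archive", subgroup = "s3" }, // placeholder labels
                    values = new[]
                    {
                        new[]
                        {
                            (originalTimestamp.ToUnixTimeMilliseconds() * 1_000_000L).ToString(),
                            line
                        }
                    }
                }
            }
        };

        var content = new StringContent(
            JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json");

        var response = await _http.PostAsync("http://grafanaloki:3100/loki/api/v1/push", content);
        response.EnsureSuccessStatusCode();
    }
}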

Some sample config elements for FluentD:

Code block for ingesting logs from CloudWatch (I am ingesting CloudTrail logs this way; I will write a blog post or record a video about it sometime later):

<source>
  @id cloudwatch_logs
  @type cloudwatch_logs
  tag cloudwatch.cloudtrail
  log_group_name <LOG_GROUP_NAME>
  add_log_group_name false
  use_log_group_name_prefix true
  log_stream_name <LOG_STREAM_PREFIX>
  use_log_stream_name_prefix true
  region <AWS-REGION>
  include_metadata true
  <parse>
   @type json
  </parse>
  <storage>
    @type local
    path /var/log/td-agent/cloudwatch_cloudtrail.json
  </storage>
</source>

Sample for log files:

<source>
  @type tail
  @id grafana
  path /var/log/grafana/*.log
  pos_file /var/log/td-agent/grafana.pos
  tag software.grafana
  refresh_interval 5
  <parse>
    @type none
  </parse>
  read_from_head true
  pos_file_compaction_interval 1h
</source>

Sample filters for adding additional labels:

<filter **>
  @type record_transformer
  <record>
    tag_name ${tag}
    td_host GrafanaLoki
  </record>
</filter>

<filter cloudwatch.**>
  @type record_transformer
  <record>
    group cloud
    subgroup cloudwatch
  </record>
</filter>

Sample for outputting into files, archiving and ingesting into Loki:

<match **>
  @type copy
  @id copy
  <store>
    @id loki
    @type loki
    url "http://grafanaloki:3100"
    extra_labels {"env":"Grafana", "host":"Grafana"}
    flush_interval 1s
    flush_at_shutdown true
    buffer_chunk_limit 1m
    <label>
      tag_name
      td_host
      group
      subgroup
      level_three
    </label>
  </store>
  <store>
    @id file
    @type file
    path /var/log/fluentd/grafana_1/${tag}/file.GrafanaLoki.%Y-%m-%d_%H:%M:00
    append true
    <buffer tag, time>
      timekey 5m
      timekey_use_utc true
      timekey_wait 1m
    </buffer>
  </store>
</match>

The above configs are pretty much self-explanatory, and using Loki and Grafana is also very easy. Most importantly, configure Grafana with a 3rd-party login instead of just a username and password. I can't stress enough the importance of MFA; if possible, use YubiKey Bio. Most other forms of MFA have vulnerabilities and are hackable considering the advanced capabilities of the R&AW / Mafia / Anonymous hackers group equipment.

Metrics:

I am using collectd, Carbon and Grafana Cloud for metrics, i.e. all the servers run collectd, collectd ingests metrics into Carbon, and Carbon forwards those metrics into Grafana Cloud. Threshold alerts are set based upon the observed patterns. I am planning to ingest additional custom metrics, but that's planned for later; when I get to that phase, I will definitely write some blog posts.
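For completeness, the collectd-to-Carbon hop is just collectd's write_graphite plugin pointed at the Carbon listener. A minimal sketch, with the host, port and prefix as placeholder assumptions:

LoadPlugin write_graphite
<Plugin write_graphite>
  <Node "carbon">
    Host "localhost"
    Port "2003"
    Protocol "tcp"
    Prefix "collectd."
  </Node>
</Plugin>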

Alerts:

Considering the R&AW / Mafia / Anonymous hackers threat (the capabilities of their equipment), the most damage can happen if they log into the AWS Console or SSH into servers. I have written some custom code for a Lambda that parses CloudWatch logs looking for the AWS console login pattern and sends an alert; this Lambda runs once every minute. The anonymous hackers / spies / R&AW / Mafia might screenshot my AWS account, record video or even show my screen in a live video, but they can't log in because of biometric MFA authentication.
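The full code will come in a later post; as a minimal sketch of the approach, a function like the following could filter the CloudTrail log group for ConsoleLogin events from the last minute and publish an alert to SNS. The log group name and topic ARN are placeholders, and the function is assumed to be triggered by a one-minute schedule:

using System;
using System.Threading.Tasks;
using Amazon.CloudWatchLogs;
using Amazon.CloudWatchLogs.Model;
using Amazon.SimpleNotificationService;

public class ConsoleLoginMonitor
{
    private readonly AmazonCloudWatchLogsClient _logs = new AmazonCloudWatchLogsClient();
    private readonly AmazonSimpleNotificationServiceClient _sns = new AmazonSimpleNotificationServiceClient();

    // Scans the last minute of CloudTrail logs for AWS console logins.
    public async Task CheckAsync()
    {
        var response = await _logs.FilterLogEventsAsync(new FilterLogEventsRequest
        {
            LogGroupName = "<CLOUDTRAIL_LOG_GROUP>", // placeholder
            FilterPattern = "{ $.eventName = \"ConsoleLogin\" }",
            StartTime = DateTimeOffset.UtcNow.AddMinutes(-1).ToUnixTimeMilliseconds()
        });

        foreach (var logEvent in response.Events)
        {
            await _sns.PublishAsync(
                "<SNS_TOPIC_ARN>", // placeholder
                $"AWS Console login detected: {logEvent.Message}");
        }
    }
}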

Similarly, I have configured my servers to send an email alert as soon as an SSH login happens. I access my Linux servers from within the AWS website using EC2 Instance Connect rather than direct SSH. In other words, if anyone wants to access my Linux servers, they have to first log into the AWS console using YubiKey Bio – so no one else can log in as of now.
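For the SSH alert, one common approach, sketched here under the assumption that outbound mail is already working on the server and with a hypothetical script name, is a pam_exec hook in /etc/pam.d/sshd:

session optional pam_exec.so /usr/local/bin/ssh-login-alert.sh

#!/bin/bash
# /usr/local/bin/ssh-login-alert.sh (hypothetical name)
# pam_exec runs this for SSH sessions; PAM_USER, PAM_RHOST and PAM_TYPE are set by PAM.
if [ "$PAM_TYPE" = "open_session" ]; then
  echo "SSH login: $PAM_USER from $PAM_RHOST on $(hostname)" \
    | mail -s "SSH login alert" admin@alightservices.com
fi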

I can provide code samples for the above 2 activities in a later blog post.

TimeSlices:

Earlier, I mentioned a concept I call TimeSlices: I don't need all logs forever, but if I want a certain log stream during a certain period, I want to retain exactly those logs.

Similarly, another nice-to-have feature would be the ability to configure different retention periods for different types of logs: for example, remove traces after x days, remove debug after y days, remove info after z days, and retain Warn, Error and Critical for a longer period.
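As an aside, newer Loki versions do support per-stream retention overrides via the compactor, which can approximate this when the level is a stream label; a sketch, with the periods as placeholders:

limits_config:
  retention_period: 744h
  retention_stream:
  - selector: '{level="debug"}'
    priority: 1
    period: 24h
  - selector: '{level="info"}'
    priority: 1
    period: 72h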

I am hoping this blog post helps someone. If anyone needs any help with architecting, planning, designing or developing for horizontal and vertical scalability, or wants help with centralized logging or enterprise search using Solr or ElasticSearch, or wants to reduce costs by rightsizing, please do contact me. I offer a free consultation, and we can agree on the work that needs to be performed and the pricing.

Mr. Kanti Kalyan Arumilli

Arumilli Kanti Kalyan, Founder & CEO

B.Tech, M.B.A

Facebook

LinkedIn

Threads

Instagram

Youtube

Founder & CEO, Lead Full-Stack .Net developer

ALight Technology And Services Limited

ALight Technologies USA Inc

Youtube

Facebook

LinkedIn

Phone / SMS / WhatsApp on the following 3 numbers:

+91-789-362-6688, +1-480-347-6849, +44-07718-273-964

+44-33-3303-1284 (Preferred number if calling from U.K, No WhatsApp)

kantikalyan@gmail.com, kantikalyan@outlook.com, admin@alightservices.com, kantikalyan.arumilli@alightservices.com, KArumilli2020@student.hult.edu, KantiKArumilli@outlook.com and 3 more rarely used email addresses – hardly once or twice a year.


Some Monitoring & Alerting tips for AWS workloads

This blog post is about some monitoring and alerting tips for AWS workloads.

  1. AWS Console logins – root or IAM user
  2. SSH into an EC2 instance

The above two are considered primary. In addition, the following monitoring is necessary:

  3. Actions performed by users and/or AWS, such as launching EC2 instances (manually or via autoscaling) or configuring Route53, Security Groups, etc.
  4. Web logs, Load Balancer logs and CloudFront logs, for the rare cases of DDOS attacks by the baddies.
  5. Application logs
  6. Database logs
  7. System logs

In the next few weeks, I will write posts, or even record live videos / tutorials, on how to monitor and alert for 1, 2 and 3. Some of these are based on using existing systems; in some cases, I will show both manual and programmatic approaches (C# being my preferred language).

I will also share some blog posts on how to ingest logs into AWS CloudWatch (5 GB of ingestion free, plus some other costs) and Grafana Cloud (50 GB of ingestion free), and discuss the advantages and disadvantages of both.

As part of implementing the NIST cyber security framework at ALight Technology And Services Limited, I am implementing these. I like sharing my knowledge with others as I come across and learn new things, sometimes even existing knowledge when appropriate, or a blend of existing knowledge and new things.



Read Cloudwatch Logs Programmatically using C#

In AWS, CloudWatch is an extremely useful service for ingesting and retrieving logs, metrics, alarms, etc. This particular blog post is about how to retrieve logs from CloudWatch.

CloudWatch logs are organized in the following hierarchy: Region -> Log Group -> Streams. Inside each stream are log messages, which have 3 attributes: Message, Timestamp and Ingestion Time. Currently, I am using CloudWatch for various logs such as Linux syslogs, web server logs, CloudTrail events, etc. How to programmatically ingest logs from .Net applications directly, using NLog, or using the CloudWatch agent will be topics of future blog posts. This blog post assumes that the .Net code is running under a role that has the appropriate permissions.

Install the AWSSDK.CloudWatchLogs nuget package.

// Required usings:
// using System.Net;
// using Amazon;
// using Amazon.CloudWatchLogs;
// using Amazon.CloudWatchLogs.Model;

// Instantiating a CloudWatch Logs client for the eu-west-2 (London) region
AmazonCloudWatchLogsClient client = new AmazonCloudWatchLogsClient(RegionEndpoint.EUWest2);


// Getting Log Groups - Code snippet
var logGroupsresponse = await client.DescribeLogGroupsAsync();

if(logGroupsresponse.HttpStatusCode == HttpStatusCode.OK)
{
   foreach(var logGroup in logGroupsresponse.LogGroups)
   {
         // Process
   }
}


// Getting Streams - Code snippet
var streamResponse = await client.DescribeLogStreamsAsync(
    new DescribeLogStreamsRequest
    {
        LogGroupName = "LogGroupName"
    });

if(streamResponse.HttpStatusCode == HttpStatusCode.OK)
{
   foreach(var stream in streamResponse.LogStreams)
   {
         // Process
   }
}


// Getting Log Messages - Code snippet
var logEventsresponse = await client.GetLogEventsAsync(
    new GetLogEventsRequest
    {
        LogGroupName = "Log Group Name",

        LogStreamName = "Log Stream Name"
    });

if(logEventsresponse.HttpStatusCode == HttpStatusCode.OK)
{
   foreach(var logMessage in logEventsresponse.Events)
   {
         // Process
   }
}

The above code snippets show instantiating a client, retrieving the list of Log Groups, the Streams within a particular Log Group, and the events within a specified Log Group and Stream.

There are additional parameters that can be specified in the requests, the most important being StartTime and StartFromHead for GetLogEventsRequest, and NextToken, LogStreamNamePrefix and OrderBy for DescribeLogStreamsRequest.
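For example, GetLogEvents is paginated. A minimal sketch of reading a stream from the beginning (the group and stream names are placeholders, as above) keeps requesting with NextForwardToken until the token stops changing:

string token = null;

while (true)
{
    var response = await client.GetLogEventsAsync(
        new GetLogEventsRequest
        {
            LogGroupName = "Log Group Name",
            LogStreamName = "Log Stream Name",
            StartFromHead = true,
            NextToken = token
        });

    foreach (var logMessage in response.Events)
    {
          // Process
    }

    // GetLogEvents returns the same token once the end of the stream is reached
    if (response.NextForwardToken == token)
    {
        break;
    }

    token = response.NextForwardToken;
}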

Once all the important logs are ingested, an application can be built for monitoring threats, viewing logs, etc. As mentioned above, there will be more posts regarding CloudWatch, logs, monitoring, etc.