diff --git a/README.md b/README.md
index 46d16ae07f15141170c80b0c010f3ba66b944cfc..41cd9e6ee672e571a4464c721e74081408787186 100644
--- a/README.md
+++ b/README.md
@@ -3,6 +3,14 @@
 The Network Observability eBPF Agent allows collecting and aggregating all the ingress and
 egress flows on a Linux host (required a Kernel 4.18+ with eBPF enabled).
 
+* [How to compile](#how-to-compile)
+* [How to configure](#how-to-configure)
+* [How to run](#how-to-run)
+* [Development receipts](#development-receipts)
+* [Known issues](#known-issues)
+* [Frequently-asked questions](#frequently-asked-questions)
+* [Troubleshooting](#troubleshooting)
+
 ## How to compile
 
 ```
@@ -19,24 +27,56 @@ The eBPF Agent is configured by means of environment variables. Check the
 The NetObserv eBPF Agent is designed to run as a DaemonSet in OpenShift/K8s. It is triggered and
 configured by our [Network Observability Operator](https://github.com/netobserv/network-observability-operator).
 
-Anyway you can run it directly as an executable with administrative privileges:
+You can also run it directly as an executable from your command line:
 
 ```
 export FLOWS_TARGET_HOST=...
 export FLOWS_TARGET_PORT=...
 sudo -E bin/netobserv-ebpf-agent
 ```
+
 To deploy locally, use instructions from [flowlogs-dump (like tcpdump)](./examples/flowlogs-dump/README.md).
-To deploy it as a Pod, you can check the [deployment example](./examples/performance/deployment.yml).
+To deploy it as a Pod, you can check the [deployment examples](./deployments).
+
+The Agent needs to be executed either with:
+
+1. The following [Linux capabilities](https://man7.org/linux/man-pages/man7/capabilities.7.html)
+   (recommended way): `BPF`, `PERFMON`, `NET_ADMIN`, `SYS_RESOURCE`. If you
+   [deploy it in Kubernetes or OpenShift](./deployments/flp-daemonset-cap.yml),
+   the container running the Agent needs to define the following `securityContext`:
+   ```yaml
+   securityContext:
+     runAsUser: 0
+     capabilities:
+       add:
+         - BPF
+         - PERFMON
+         - NET_ADMIN
+         - SYS_RESOURCE
+   ```
+   (Please note that `runAsUser: 0` is still needed).
+2. Administrative privileges. If you
+   [deploy it in Kubernetes or OpenShift](./deployments/flp-daemonset.yml),
+   the container running the Agent needs to define the following `securityContext`:
+   ```yaml
+   securityContext:
+     privileged: true
+     runAsUser: 0
+   ```
+   This option is only recommended if your Kernel does not recognize some of the above capabilities.
+   We found some Kubernetes distributions (e.g. K3s) that do not recognize the `BPF` and
+   `PERFMON` capabilities.
+
+Here is a list of distributions where we tested both full privileges and capability approaches,
+and whether they worked (✅) or did not (❌):
+
+| Distribution                  | K8s Server version | Capabilities | Privileged |
+|-------------------------------|--------------------|--------------|------------|
+| Amazon EKS (Bottlerocket AMI) | 1.22.6             | ✅            | ✅          |
+| K3s (Rancher Desktop)         | 1.23.5             | ❌            | ✅          |
+| Kind                          | 1.23.5             | ❌            | ✅          |
+| OpenShift                     | 1.23.3             | ✅            | ✅          |
 
-## Where is the collector?
-
-As part of our Network Observability solution, the eBPF Agent is designed to send the traced
-flows to our [Flowlogs Pipeline](https://github.com/netobserv/flowlogs-pipeline) component.
-
-In addition, we provide a simple GRPC+Protobuf library to allow implementing your own collector.
-Check the [packet counter code](./examples/performance/server/packet-counter-collector.go)
-for an example of a simple collector using our library.
 
 ## Development receipts
 
@@ -62,7 +102,38 @@ Tested in Fedora 35 and Red Hat Enterprise Linux 8.
 
 ## Known issues
 
-## Extrenal Traffic in Openshift (OVN-Kubernetes CNI)
+### External Traffic in OpenShift (OVN-Kubernetes CNI)
 
 For egress traffic, you can see the source Pod metadata. For ingress traffic (e.g. an HTTP response),
-you see the destination **Host** metadata.
\ No newline at end of file
+you see the destination **Host** metadata.
+
+## Frequently-asked questions
+
+### Where is the collector?
+
+As part of our Network Observability solution, the eBPF Agent is designed to send the traced
+flows to our [Flowlogs Pipeline](https://github.com/netobserv/flowlogs-pipeline) component.
+
+In addition, we provide a simple GRPC+Protobuf library to allow implementing your own collector.
+Check the [packet counter code](./examples/performance/server/packet-counter-collector.go)
+for an example of a simple collector using our library.
+
+## Troubleshooting
+
+### Deployed as a Kubernetes Pod, the agent shows permission errors in the logs and can't start
+
+In your [deployment file](./deployments/flp-daemonset-cap.yml), make sure that the container runs as
+the root user (`runAsUser: 0`) and with the granted capabilities or privileges (see [how to run](#how-to-run) section).
+
+### The Agent doesn't work in my Amazon EKS cluster
+
+Although Amazon Linux 2 enables eBPF by default in EC2, the
+[EKS images are shipped with disabled eBPF](https://github.com/awslabs/amazon-eks-ami/issues/728).
+
+You need to either:
+
+1. Provide your own AMI configured to work with eBPF
+2. Use other Linux distributions that are shipped with eBPF enabled by default. We have successfully
+   tested the eBPF Agent in EKS with the [Bottlerocket](https://aws.amazon.com/es/bottlerocket/)
+   Linux distribution, without requiring any extra configuration.
+
diff --git a/deployments/README.md b/deployments/README.md
index 0dd0a27f507aa44533c9b8f881e003485b6c0284..2372305cb33623d646690e062bf094ccca55c2b3 100644
--- a/deployments/README.md
+++ b/deployments/README.md
@@ -6,5 +6,7 @@ but the files contained here are useful for documentation and manual testing.
 
 * `flp-daemonset.yml`, shows how to deploy/configure the Agent when Flowlogs Pipeline is deployed
   as daemonset, taking the target host configuration from the Host IP.
+* `flp-daemonset-cap.yml`, same as `flp-daemonset.yml`, but assigning individual capabilities instead
+  of deploying a fully-privileged container.
 * `flp-service.yml`, shows how to deploy/configure the Agent when Flowlogs Pipeline is deployed
   as a service, explicitly setting the host configuration as the service name.
\ No newline at end of file
diff --git a/deployments/flp-daemonset-cap.yml b/deployments/flp-daemonset-cap.yml
new file mode 100644
index 0000000000000000000000000000000000000000..1b234c549cf3d9588f896aa7c89acb0d4dbc137c
--- /dev/null
+++ b/deployments/flp-daemonset-cap.yml
@@ -0,0 +1,131 @@
+# Example deployment for manual testing with flp
+# It requires loki to be installed
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: netobserv-ebpf-agent
+  labels:
+    k8s-app: netobserv-ebpf-agent
+spec:
+  selector:
+    matchLabels:
+      k8s-app: netobserv-ebpf-agent
+  template:
+    metadata:
+      labels:
+        k8s-app: netobserv-ebpf-agent
+    spec:
+      serviceAccountName: netobserv-account
+      hostNetwork: true
+      dnsPolicy: ClusterFirstWithHostNet
+      containers:
+      - name: netobserv-ebpf-agent
+        image: quay.io/mmaciasl/netobserv-ebpf-agent:main
+        # imagePullPolicy: Always
+        securityContext:
+          capabilities:
+            add:
+              - BPF
+              - PERFMON
+              - NET_ADMIN
+              - SYS_RESOURCE
+          runAsUser: 0
+        env:
+          - name: FLOWS_TARGET_HOST
+            valueFrom:
+              fieldRef:
+                fieldPath: status.hostIP
+          - name: FLOWS_TARGET_PORT
+            value: "9999"
+---
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  name: flp
+  labels:
+    k8s-app: flp
+spec:
+  selector:
+    matchLabels:
+      k8s-app: flp
+  template:
+    metadata:
+      labels:
+        k8s-app: flp
+    spec:
+      containers:
+        - name: flowlogs-pipeline
+          image: quay.io/netobserv/flowlogs-pipeline:latest
+          ports:
+            - containerPort: 9999
+          args:
+          - --config=/etc/flp/config.yaml
+          volumeMounts:
+            - mountPath: /etc/flp
+              name: config-volume
+      volumes:
+        - name: config-volume
+          configMap:
+            name: flp-config
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: flp-config
+data:
+  config.yaml: |
+    log-level: debug
+    pipeline:
+      - name: ingest
+      - name: decode
+        follows: ingest
+      - name: enrich
+        follows: decode
+      - name: encode
+        follows: enrich
+      - name: loki
+        follows: encode
+    parameters:
+      - name: ingest
+        ingest:
+          type: grpc
+          grpc:
+            port: 9999
+      - name: decode
+        decode:
+          type: protobuf
+      - name: enrich
+        transform:
+          type: network
+          network:
+            rules:
+              - input: SrcAddr
+                output: SrcK8S
+                type: "add_kubernetes"
+              - input: DstAddr
+                output: DstK8S
+                type: "add_kubernetes"
+      - name: encode
+        encode:
+          type: none
+      - name: loki
+        write:
+          type: loki
+          loki:
+            type: loki
+            staticLabels:
+              app: netobserv-flowcollector
+            labels:
+              - "SrcK8S_Namespace"
+              - "SrcK8S_OwnerName"
+              - "DstK8S_Namespace"
+              - "DstK8S_OwnerName"
+              - "FlowDirection"
+            url: http://loki:3100
+            timestampLabel: TimeFlowEnd
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: netobserv-account
+
diff --git a/deployments/flp-daemonset.yml b/deployments/flp-daemonset.yml
index 87818e723757f158a4de8ca172a1c4b0cc3ddc01..cc05cdb9bbbc0afb032a1318192c998ebfc66d5e 100644
--- a/deployments/flp-daemonset.yml
+++ b/deployments/flp-daemonset.yml
@@ -24,6 +24,7 @@ spec:
         # imagePullPolicy: Always
         securityContext:
           privileged: true
+          runAsUser: 0
         env:
           - name: FLOWS_TARGET_HOST
             valueFrom:
@@ -124,18 +125,4 @@ apiVersion: v1
 kind: ServiceAccount
 metadata:
   name: netobserv-account
----
-apiVersion: security.openshift.io/v1
-kind: SecurityContextConstraints
-metadata:
-  name: example
-allowPrivilegedContainer: true
-allowHostDirVolumePlugin: true
-allowHostNetwork: true
-allowHostPorts: true
-runAsUser:
-  type: RunAsAny
-seLinuxContext:
-  type: RunAsAny
-users:
-  - system:serviceaccount:network-observability:netobserv-account
+
diff --git a/deployments/flp-service.yml b/deployments/flp-service.yml
index 2217f3ea0dbbcca8779bfe10c95d007705ceca34..72f1545d2a5e126919c9f9c8ac55c5e5a7954067 100644
--- a/deployments/flp-service.yml
+++ b/deployments/flp-service.yml
@@ -24,6 +24,7 @@ spec:
         # imagePullPolicy: Always
         securityContext:
           privileged: true
+          runAsUser: 0
         env:
           - name: FLOWS_TARGET_HOST
             value: "flp"
@@ -138,18 +139,3 @@ apiVersion: v1
 kind: ServiceAccount
 metadata:
   name: netobserv-account
----
-apiVersion: security.openshift.io/v1
-kind: SecurityContextConstraints
-metadata:
-  name: example
-allowPrivilegedContainer: true
-allowHostDirVolumePlugin: true
-allowHostNetwork: true
-allowHostPorts: true
-runAsUser:
-  type: RunAsAny
-seLinuxContext:
-  type: RunAsAny
-users:
-  - system:serviceaccount:network-observability:netobserv-account
diff --git a/pkg/agent/agent.go b/pkg/agent/agent.go
index 3831894d1ef652abb549e379dca376875d7a2cd8..b473f1d0d79fc13629caeb957e0310becdc977ec 100644
--- a/pkg/agent/agent.go
+++ b/pkg/agent/agent.go
@@ -98,6 +98,8 @@ func FlowsAgent(cfg *Config) (*Flows, error) {
 func (f *Flows) Run(ctx context.Context) error {
 	alog.Info("starting Flows agent")
 
+	systemSetup()
+
 	tracedRecords, err := f.interfacesManager(ctx)
 	if err != nil {
 		return err
diff --git a/pkg/agent/agent_darwin.go b/pkg/agent/agent_darwin.go
new file mode 100644
index 0000000000000000000000000000000000000000..284a70844da9a835acd7cb66a6ec94aa16b7644f
--- /dev/null
+++ b/pkg/agent/agent_darwin.go
@@ -0,0 +1,4 @@
+package agent
+
+func systemSetup() {
+}
diff --git a/pkg/agent/agent_linux.go b/pkg/agent/agent_linux.go
new file mode 100644
index 0000000000000000000000000000000000000000..4b3a0c2435b871e010a66f509b470d3e100b596c
--- /dev/null
+++ b/pkg/agent/agent_linux.go
@@ -0,0 +1,16 @@
+package agent
+
+import (
+	"github.com/cilium/ebpf/rlimit"
+	"github.com/sirupsen/logrus"
+)
+
+var slog = logrus.WithField("component", "systemSetup")
+
+// systemSetup holds some system-dependent initialization processes
+func systemSetup() {
+	if err := rlimit.RemoveMemlock(); err != nil {
+		slog.WithError(err).
+			Warn("can't remove mem lock. The agent might not be able to start eBPF programs")
+	}
+}
diff --git a/pkg/ebpf/tracer.go b/pkg/ebpf/tracer.go
index 6de1983892eb5ec0ce0454ebd3015b7faf7e8151..cea82e411c47b2ce88e809a30da07f6aef860b29 100644
--- a/pkg/ebpf/tracer.go
+++ b/pkg/ebpf/tracer.go
@@ -11,7 +11,6 @@ import (
 	"time"
 
 	"github.com/cilium/ebpf/ringbuf"
-	"github.com/cilium/ebpf/rlimit"
 	"github.com/netobserv/netobserv-ebpf-agent/pkg/flow"
 	"github.com/sirupsen/logrus"
 	"github.com/vishvananda/netlink"
@@ -53,11 +52,6 @@ func NewFlowTracer(iface string, sampling uint32) *FlowTracer {
 // before exiting.
 func (m *FlowTracer) Register() error {
 	ilog := log.WithField("iface", m.interfaceName)
-	// Allow the current process to lock memory for eBPF resources.
-	// TODO: manually invoke unix.Prlimit with lower/reasonable rlimit
-	if err := rlimit.RemoveMemlock(); err != nil {
-		return fmt.Errorf("removing mem lock: %w", err)
-	}
 	// Load pre-compiled programs and maps into the kernel, and rewrites the configuration
 	spec, err := loadBpf()
 	if err != nil {
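
Note on the memlock change: this patch turns a failed `rlimit.RemoveMemlock()` from a hard error in `Register()` into a warning in `systemSetup()`, so the agent may keep running with whatever limit the container already has. When debugging the permission errors described in the Troubleshooting section, it can help to print that limit. The program below is a minimal sketch and not part of this patch: it uses only the standard `syscall` package, `describeLimit` is a hypothetical helper, and the constant `8` for `RLIMIT_MEMLOCK` is an assumption that holds on Linux x86-64 and arm64.

```go
package main

import (
	"fmt"
	"syscall"
)

// rlimitMemlock is RLIMIT_MEMLOCK on Linux x86-64 and arm64 (assumption:
// other platforms may use a different number). The standard syscall
// package does not export this constant, so it is hard-coded here.
const rlimitMemlock = 8

// rlimInfinity is the "no limit" marker on 64-bit Linux (all bits set).
const rlimInfinity = ^uint64(0)

// describeLimit is a hypothetical helper that renders a limit in bytes,
// or "unlimited" when the value is the infinity marker.
func describeLimit(v uint64) string {
	if v == rlimInfinity {
		return "unlimited"
	}
	return fmt.Sprintf("%d bytes", v)
}

func main() {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(rlimitMemlock, &rl); err != nil {
		fmt.Println("can't read memlock limit:", err)
		return
	}
	// Cur is the soft limit that applies now; Max is the ceiling the
	// process may raise it to without extra privileges.
	fmt.Printf("memlock soft=%s hard=%s\n",
		describeLimit(uint64(rl.Cur)), describeLimit(uint64(rl.Max)))
}
```

Running it inside the agent container with the same `securityContext` shows whether the limit is already unlimited or whether raising it is what the capabilities are needed for.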