AIX

Sunday, December 1, 2013

0653-902 Cannot open the specified file for reading.

Question

My database is opening files in CIO mode and I can't copy the files. Why is this and how can I fix it?
Answer

INTRODUCTION - DIRECT I/O AND CONCURRENT I/O

To gain greater performance many databases will use raw logical volumes to store tablespaces and containers. The performance gain is mainly due to the fact that it bypasses filesystem cache and file locking mechanisms. However this makes backup and administration more difficult. So there are two options that AIX offers to give database users better performance when using JFS or JFS2 filesystems. These options are Concurrent I/O (CIO) and Direct I/O (DIO).

Direct I/O (DIO) bypasses the filesystem buffer cache and enables reading/writing directly to the filesystem. This is particularly useful when the application itself buffers reads and writes, avoiding a "double-buffer" situation and increasing I/O throughput. To avoid data consistency issues, if multiple processes open the same file and one did not use the O_DIRECT flag in their open(), then normal cached filesystem mode is used for the file. When the last process not using O_DIRECT closes, the file access is moved to Direct I/O mode.

Direct I/O also uses mandatory write-locking on a per-file or per-inode basis, as does the filesystem. This ensures that overlapping writes to the file do not occur, which would corrupt the file. This is also known as serialization of writes.

Concurrent I/O (CIO) for JFS2 filesystems gains the benefits of Direct I/O without the locking serialization. It allows a process writing to a file exclusive write access to that file.

TWO METHODS OF ENABLING DIRECT I/O AND CONCURRENT I/O

There are two different ways a sysadmin or database admin can set up Direct I/O or Concurrent I/O.

The first is to use a mount option to put the entire filesystem in either DIO or CIO mode:

# mount -o cio,dio /mountpoint /dev/lvname

If you wish to permanently have the filesystem mounted this way, use chfs to change the LVCB and the /etc/filesystems file:

# chfs -a options=cio,dio /mountpoint

The second method is for an application itself to open the file in DIO or CIO mode. This would be done via an option set in the open() system call, such as:

open( /path/to/file, O_DIRECT | O_CIO )

There are benefits to each method, and some restrictions to using the open() API to access files.

FILE ISSUES WHILE USING DIRECT I/O AND CONCURRENT I/O

If a database such as Oracle has opened a file with CIO or DIO, other processes may find themselves blocked if they are not using the same access methods. Some admins may not know that the database is opening the file using CIO or DIO, but usually a configuration directive can tell you.

In the case of Oracle, there is a variable called FILESYSTEMIO_OPTIONS, and if set to SETALL then both CIO and DIO will be enabled on any data file opened.

The effect of this is that opening a file with CIO and DIO may block other commands, such as cp, out of the file.

A typical error would be when trying to copy a database table using 'cp' this error is returned;

a system call received a parameter that is not valid.

While this is not obvious, here is the explanation, from the open() manual page of the AIX Technical Reference Manual:

Tuesday, October 16, 2012

AIX boot process

Loading the boot image of AIX

After POST, the firmware (System Read Only Storage) detects the 1st bootable device stored in the bootlist. (here it is hdisk0)
then the bootstrap code (software ROS) i.e. 1st 512 bytes of the hard disk loads to RAM.
Bootstrap code locates the Boot Logical Volume (BLV = hd5) from the harddisk
BLV contains AIX kernel, rc.boot Script, Reduced ODM and Boot commands.
Then BLV in the RAM uncompresses and Kernel releases from it.
Then AIX Kernel gets control.
AIX Kernel creates a RAM File System (Rootvg not activated now).
kernel starts init process from the BLV.
init executes rc.boot script from the BLV in the RAM.
Init with rc.boot 1 configures base devices.

rc.boot 1 in detail

init process from RAMFS executes rc.boot 1 (if any error LED=c06)
restbase command copies ODM from BLV to RAMFS.(success LED=510, error LED=548)
cfgmgr -f calls Config_rules ( which are phase=1) and activates all base devices.
run command bootinfo -b to check last boot device ( success LED=511).

Then

rc.boot 2 activates rootvg from hard disk.

rc.boot 2 in detail

rc.boot 2 (LED 551)
ipl_varyon to activate rootvg ( success LED= 517, error LED=552,554,556).
run command fsck -f /dev/hd4 to check whether "/" unmounted uncleanely in the last shutdown ( error LED=555).

mount /dev/hd4 (/) to RAMFS (error LED=557 due to corrupted jfslog..)

fsck -f /dev/hd2 i.e. "/usr" ( error LED=518).
mount /dev/hd2 in RAMFS.
fsck -f /dev/hd9var i.e. check "/var"
mount /var
copycore command checks whether dump occured. then copy dump from primary dump device paging space (/dev/hd6) to /var/adm/ras/.
unmount /var
swapon /dev/hd6 i.e. activate primary paging space.

Now the condition is /dev/hd4 is mounted on / in the RAMFS;
cfgmgr -f configured all base devices . so configuration data has been written to ODM of RAMFS.

mergedev is called and copy /dev from RAMFS to disk.
copy customized ODM from RAMFS to hard disk(at this stage both ODM from hd5 and hd4 are sync now)
mount /var.
Boot messages copy to file on the hard disk ( /var/adm/ras/bootlog) alog -t boot -o to view bootlog

Now / , /usr and /var are mounted in rootvg on the hard disk. Then

Kernel removes RAMFS
init process start from / in the rootvg

Here completes rc.boot 2, Now the condition is kernel removed RAMFS and accessing rootvg filesystems from hard disk. init from BLV replaced by init from hard disk

in rc.boot 3, init process /etc/inittab file and remaining devices are configured.

rc.boot 3 in detail

/etc/init starts and reads /etc/inittab ( LED=553)
runs /sbin/rc.boot 3
fsck -f /dev/hd3 i.e. check /tmp.
mount /tmp
sysncvg rootvg &; i.e. run syncvg in background and report stale PPs.
cfgmgr -P2 i.e. run cfgmgr in phase 2 in normal startup. (cfgmgr -P3 in service mode)
All remaining devices are configured now.
cfgcon configures console ( LED= c31 select console, c32 lft, c33 tty, c34 file on disk). If CDE mentioned in /etc/inittab we will get graphical console.
savebase calls to sync ODM from BLV with / FS (i.e. /etc/objrepos).
syncd daemon started. All data from cache memory to disk saves in every 60 seconds.
starts errdaemon for error logging.
LED display turned OFF.
rm /etc/nologin i.e. if the file is not removed, then login is not possible.
If any device are in missed state, (in Cudv chgstatus=3) display it.
Display "system initialization completed"

Then execute next line from /etc/inittab

DETAILED

=======

Boot process of AIX in detail

I. The boot process in AIX

As a system administrator you should have a general understanding of the boot process. This knowledge is useful to solve problems that can prevent a system from booting properly. These problems can be both software or hardware.We also recommend that you be familiar with the hardware configuration of your system.

Booting involves the following steps:

The initial step in booting a system is named Power On Self Test (POST). Its purpose is to verify that basic hardware is in functional state.The memory, keyboard, communication and audio devices are also initialized. You can see an image for each of these devices displayed on the screen. It is during this step that you can press a function key to choose a different boot list. The LED values displayed during this phase are model specific. Both hardware and software problems can prevent the system from booting.

System Read Only Storage (ROS) is specific to each system type. It is necessary for AIX 5L Version 5.3 to boot, but it does not build the data structures required for booting. It will locate and load bootstrap code. System ROS contains generic boot information and is operating system independent. Software ROS (also named bootstrap) forms an IPL control block which is compatible with AIX 5L Version 5.3, takes control and builds AIX 5L
specific boot information. A special file system located in memory and named RAMFS file system is created. Software ROS then locates, loads, and turns control over to AIX 5L Boot Logical Volume (BLV). Software ROS is AIX 5L information created based on machine type and is responsible for completing machine preparation to enable it to start AIX 5L kernel. A complete list of files that are part of the BLV can be obtained from directory /usr/lib/boot.

The most important components are the following:

- The AIX 5L kernel
- Boot commands called during the boot process such as bootinfo, cfgmgr
- A reduced version of the ODM. Many devices need to be configured hd4 (/) made available, so their corresponding methods have to be stored in the BLV. These devices are marked as base in PdDv.
- The rc.boot script

Note: Old systems based on MCI architecture execute an additional step before this, the so called Built In Self Test (BIST). This step is no longer required for systems based on PCI architecture.

The AIX 5L kernel is loaded and takes control. The system will display 0299 on the LED panel. All previous codes are hardware-related. The kernel will complete the boot process by configuring devices and starting the init process. LED codes displayed during this stage will be generic AIX 5L codes. So far, the system has tested the hardware, found a BLV, created the RAMFS, and started the init process from the BLV. The rootvg has not yet been activated. From now on the rc.boot script will be called three times, each timebeing passed a different parameter.

1.Boot phase 1

During this phase, the following steps are taken:

The init process started from RAMFS executes the boot script rc.boot

If init process fails for some reason, code c06 is shown on LED display. At this stage, the restbase command is called to copy a partial image of ODM from the BLV into the RAMFS. If this operation is successful LED display shows 510, otherwise LED code 548 is shown.

After this, the cfgmgr -f command reads the Config_Rules class from the reduced ODM. In this class, devices with the attribute phase=1 are considered base devices. Base devices are all devices that are necessary to access rootvg.
For example, if the rootvg is located on a hard disk all devices starting from motherboard up to the disk will have to be initialized.The corresponding methods are called so that rootvg can be activated in the nextboot phase 2. At the end of boot phase 1, the bootinfo -b command is called to determine the last boot device. At this stage, the LED shows 511.

2.Boot phase 2

In this phase , the rc.boot script is passed to the parameter 2. During this phase the following steps are taken.

a) The rootvg volume group is varied on with the special version of the varyonvg command ipl_varyon. If this command is successful the system displays 517. otherwise one of the following LED will appear 552,554,556 and the boot process is halted.

b) Root file system hd4 is checked using the fsck -f command. This will verify only whether the filesystem was unmounted cleanly before the last shutdown. If this command fails, the system will display code 555.

c) The root file system ( /dev/hd4 ) is mounted on a temporary mount point /mnt in RAMFS. If this fails 557 will appear in LED.

d) The /usr file system is verified using fsck -f command and then mounted. the copycore command checks if a dump occured. if it did, it is copied from default dump devices, /dev/hd6 to the default copy directory /var/adm/ras. After this /var is unmounted.

e) The primary pagingspace from rootvg, /dev/hd6 will be activated.

f) The mergedev process is called and /dev files from RAMFS are copied to disk.

g) All customized ODM files from the RAMFS are copied to disk.Both ODM versions from hd4 and hd5 are synchronized.

h) Finaly, the root file system from rootvg (disk) is mounted over the root file system from the RAMFS. The mount points for the root filesystems become available. now the /var and /usr file systems from the rootvg are mounted again on their ordinary mount points. There is no console available at this stage; so all boot messages will be copied to alog. The alog command maintains and manages logs.

3.Boot Phase 3

After phase 2 is completed rootvg is activated and the following steps are taken,

a. /etc/init process is started. It reads /etc/inittab file and calls rc.boot with argument 3

b. The /tmp filesystem is mounted.

c. The rootvg is synchronized by calling the synchvg command and launching it as background process. As a result all stale partitions from rootvg are updated.At this stage LED code 553 is shown.

d. At this stage the cfgmgr command is called.if the system is booted in normal mode the cfgmgr command is called with option -p2; in service mode, option -p3. The cfgmgr command reads the Config_rules files from ODM and calls all methods corresponding to either phase 2 or 3. All other devices that are not base devices are configured at this time.

e. Next the console is configured by calling the cfgcon command. After the configuration of the console , boot messages are send to the console if no STDOUT redirection is made. However all missed messages can be found in /var/adm/ras/conslog. LED codes that can be displayed at this time are :

c31 = console not yet configured.
c32 = console is an LFT terminal.
c33 = console is a TTY.
c34 = console is a file on the disk.

f. finally the synchronization of the ODM in the BLV with the ODM from the / (root) filesyatem is done by the savebase command.

g. The syncd daemon and errdaemon are started.

h. LED display is turned off.

i. if the /etc/nologin exists, it will be removed.

j. If there are devices marked as missing in CuDv a message is displayed on the console.

i. the message system initialization completed is send to the console. the execution of the rc.boot has completed. init process will continue processing the next command from /etc/inittab.

II. system initialization

During system startup, after the root file system has been mounted in the pre-initialization process, the following sequence of events occurs:

1. The init command is run as the last step of the startup process.
2. The init command attempts to read the /etc/inittab file.
3. If the /etc/inittab file exists, the init command attempts to locate an initdefauult entry in the /etc/inittab file.

a. If the initdefault entry exists, the init command uses the specified runlevel as the initial system run level.
b. If the initdefault entry does not exist, the init command requests that the user enter a run level from the system console (/dev/console).
c. If the user enters an S, s, M, or m run level, the init command enters the maintenance run level. These are the only runlevels that do not require a properly formatted /etc/inittab file.

4. If the /etc/inittab file does not exist, the init command places the system in the maintenance run level by default.
5. The init command rereads the /etc/inittab file every 60 seconds. If the /etc/inittab file has changed since the last time the init command read it, the new commands in the /etc/inittab file are executed.

III. The /etc/inittab file

The /etc/inittab file controls the initialization process.

The /etc/inittab file supplies the script to the init command's role as a general process dispatcher. The process that constitutes the majority of the init command's process dispatching activities is the /etc/getty line process, which initiates individual terminal lines. Other processes typically dispatched by the init command are daemons and the shell.

The /etc/inittab file is composed of entries that are position-dependent and have the following format,

/etc/inittab format = Identifier:RunLevel:Action:Command

Each entry is delimited by a newline character. A backslash (\) preceding a new line character indicated the continuation of an entry. There are no limits (other than maximum entry size) on the number of entries in the /etc/inittab file.

The maximum entry size is 1024 characters.

The entry fields are :

Identifier
A one to fourteen character field that uniquely identifies an object.

RunLevel
The run level at which this entry can be processed. The run level has the following attributes:

-Run levels effectively correspond to a configuration of processes in the system.

-Each process started by the init command is assigned one or more run levels in which it can exist.

-Run levels are represented by the numbers 0 through 9.

Eg: if the system is in run level 1, only those entries with a 1 in the run-level field are started.

-When you request the init command to change run levels, all processes without a matching entry in the run-level field for the target run level receive a warning signal (SIGTERM). There is a 20-second grace period before processes are forcibly terminated by the kill signal (SIGKILL).

-The run-level field can define multiple run levels for a process by selecting more than one run level in any combination from 0 through 9. If no run level is specified, the process is assumed to be valid at all run levels.

-There are four other values that appear in the run-level field, even though they are not true run levels: a, b, c and h. Entries that have these characters in the run level field are processed only when the telinit command requests them to be run (regardless of the current run level of the system). They differ from run levels in that the init command can never enter run level a, b, c or h. Also, a request for the execution of any of these processes does not change the current run level. Furthermore, a process started by an a, b, or c command is not killed when the init command changes levels. They are only killed if their line in the /etc/inittab file is marked off in the action field, their line is deleted entirely from /etc/inittab, or the init command goes into single-user mode.

Action
It tells the init command how to treat the process specified in the process field. The following actions are recognized by the init command:

respawn If the process does not exist, start the process. Do not wait for its termination (continue scanning the /etc/inittab file). Restart the process when it dies. If the process exists, do nothing and continue scanning the /etc/inittab file.

wait When the init command enters the run level that matches the entry's run level, start the process and wait for its termination. All subsequent reads of the /etc/inittab file, while the init command is in the same run level, will cause the init command to ignore this entry.

once When the init command enters a run level that matches the entry's run level, start the process, and do not wait for termination. When it dies, do not restart the process. When the system enters a new run level, and the process is still running from a previous run level change, the program will not be restarted.

boot Process the entry only during system boot, which is when the init command reads the /etc/inittab file during system startup. Start the process, do not wait for its termination, and when it dies, do not restart the process. In order for the instruction to be meaningful, the run level should be the default or it must match the init command's run level at boot time. This action is useful for an initialization function following a hardware reboot of the system.

bootwait Process the entry the first time that the init command goes from single-user to multi-user state after the system is booted. Start the process, wait for its termination, and when it dies, do not restart the process. If the initdefault is 2, run the process right after boot.

powerfail Execute the process associated with this entry only when the init command receives a power fail signal ( SIGPWR).

powerwait Execute the process associated with this entry only when the init command receives a power fail signal (SIGPWR), and wait until it terminates before continuing to process the /etc/inittab file.

off If the process associated with this entry is currently running, send the warning signal (SIGTERM), and wait 20 seconds before terminating the process with the kill signal (SIGKILL). If the process is not running, ignore this entry.

ondemand Functionally identical to respawn, except this action applies to the a, b, or c values, not to run levels.

initdefault An entry with this action is only scanned when the init command is initially invoked. The init command uses this entry, if it exists, to determine which run level to enter initially. It does this by taking the highest run level specified in the run-level field and using that as its initial state. If the run level field is empty, this is interpreted as 0123456789. therefore, the init command enters run level 9. Additionally, if the init command does not find an initdefault entry in the /etc/inittab file, it requests an initial run level from the user at boot time.

sysinit Entries of this type are executed before the init command tries to access the console before login. It is expected that this entry will only be used to initialize devices on which the init command might try to ask the run level question. These entries are executed and waited for before continuing.

Command
A shell command to execute. The entire command field is prefixed with exec and passed to a forked sh as sh -c exec command. Any legal sh command syntax can appear in this field. Comments can be inserted with the # comment syntax.

The getty command writes over the output of any commands that appear before it it in the /etc/inittab file. To record the output of these commands to the boot log, pipe their output to the alog -tboot command. The stdin, stdout, and stderr file descriptors may not be available while the init command is processing inittab entries. Any entries writing to stdout or stderr may not work predictably unless they redirect their output to a file or to /dev/console.
The following commands are the only supported methods for modifying the records in the /etc/inittab file.

lsitab Lists records in the /etc/inittab file.
mkitab Adds records to the /etc/inittab file.
chitab Changes records in the /etc/inittab file.
rmitab Removes records from the /etc/inittab file.

Eg:

If you want to add a record on the /etc/inittab file to run the find command on the run level 2 and start it again once it has finished:

1. Run the ps command and display only those processes that contain the word find:
# ps -ef | grep find
root 19750 13964 0 10:47:23 pts/0 0:00 grep find
#
2. Add a record named xcmd on the /etc/inittab using the mkitab command:
# mkitab "xcmd:2:respawn:find / -type f > /dev/null 2>&1"
3. Show the new record with the lsitab command:
# lsitab xcmd
xcmd:2:respawn:find / -type f > /dev/null 2>&1
#
4. Display the processes:
# ps -ef | grep find
root 25462 1 6 10:56:58 - 0:00 find / -type f
root 28002 13964 0 10:57:00 pts/0 0:00 grep find
#
5. Cancel the find command process:
# kill 25462
6. Display the processes:
# ps -ef | grep find
root 23538 13964 0 10:58:24 pts/0 0:00 grep find
root 28966 1 4 10:58:21 - 0:00 find / -type f
#

Since the action field is configured as respawn, a new process (28966 in this example) is started each time its predecessor finishes. The process will continue re-spawning, unless you change the action field,

Eg:

1. Change the action field on the record xcmd from respawn to once:
# chitab "xcmd:2:once:find / -type f > /dev/null 2>&1"
2. Display the processes:
# ps -ef | grep find
root 20378 13964 0 11:07:20 pts/0 0:00 grep find
root 28970 1 4 11:05:46 - 0:03 find / -type f
3. Cancel the find command process:
# kill 28970
4. Display the processes:
# ps -ef | grep find
root 28972 13964 0 11:07:33 pts/0 0:00 grep find
#

To delete this record from the /etc/inittab file, you use the rmitab command.

Eg:

# rmitab xcmd
# lsitab xcmd
#

Order of the /etc/inittab entries

The base process entries in the /etc/inittab file is ordered as follows:

1. initdefault
2. sysinit
3. Powerfailure Detection (powerfail)
4. Multiuser check (rc)
5. /etc/firstboot (fbcheck)
6. System Resource Controller (srcmstr)
7. Start TCP/IP daemons (rctcpip)
8. Start NFS daemons (rcnfs)
9. cron
10.pb cleanup (piobe)
11.getty for the console (cons)

The System Resource Controller (SRC) has to be started near the begining of the etc/inittab file since the SRC daemon is needed to start other processes. Since NFS requires TCP/IP daemons to run correctly, TCP/IP daemons are started ahead of the NFS daemons. The entries in the /etc/inittab file are ordered according to dependencies, meaning that if a process (process2) requires that another process (process1) be present for it to operate normally, then an entry for process1 comes before an entry for process2 in the /etc/inittab file.

Tuesday, October 9, 2012

RAID Levels

RAID levels

RAID level 0 – Striping

In a RAID 0 system data are split up in blocks that get written across all the drives in the array. By using multiple disks (at least 2) at the same time, this offers superior I/O performance. This performance can be enhanced further by using multiple controllers, ideally one controller per disk.

Advantages

RAID 0 offers great performance, both in read and write operations. There is no overhead caused by parity controls.
All storage capacity is used, there is no disk overhead.
The technology is easy to implement.

Disadvantages

RAID 0 is not fault-tolerant. If one disk fails, all data in the RAID 0 array are lost. It should not be used on mission-critical systems.

Applications

Pre-Press
Video editing and production
Image manipulation/editing
Downloading

RAID level 1 – Mirroring

Data are stored twice by writing them to both the data disk (or set of data disks) and a mirror disk (or set of disks) . If a disk fails, the controller uses either the data drive or the mirror drive for data recovery and continues operation. You need at least 2 disks for a RAID 1 array.

RAID 1 systems are often combined with RAID 0 to improve performance. Such a system is sometimes referred to by the combined number: a RAID 10 system.

Using RAID 1 with a separate controller for each disk is sometimes called
duplexing.

Advantages

RAID 1 offers excellent read speed and a write-speed that is comparable to that of a single disk.
In case a disk fails, data do not have to be rebuild, they just have to be copied to the replacement disk.
RAID 1 is a very simple technology.

Disadvantages

Inefficient use of disk space
High disk overhead
Doubles number of writes

Applications

Transaction,logging or record keeping applications

RAID level 3

Byte-level striping with dedicated parity.

On RAID 3 systems, data blocks are subdivided (striped) and written in parallel on two or more drives. An additional drive stores parity information (X-OR). You need at least 3 disks for a RAID 3 array. It combines 5 or 9 disks.

Advantages

RAID-3 provides high throughput (both read and write) for large data transfers.
Disk failures do not significantly slow down throughput.

Disadvantages

This technology is fairly complex and too resource intensive to be done in software.
Performance is slower for random, small I/O operations.

Ideal use

RAID 3 is not that common in purpose.

RAID level4

Block-level striping with dedicated parity.

RAID level 5

RAID 5 is the most common secure RAID level. It is similar to RAID-3 except that data are transferred to disks by independent read and write operations (not in parallel). The data chunks that are written are also larger. Instead of a dedicated parity disk, parity information is spread across all the drives. You need at least 3 disks for a RAID 5 array.
A RAID 5 array can withstand a single disk failure without losing data or access to data. Although RAID 5 can be achieved in software, a hardware controller is recommended. Often extra cache memory is used on these controllers to improve the write performance.

Advantages

Read data transactions are very fast while write data transaction are somewhat slower (due to the parity that has to be calculated).

Disadvantages

Disk failures have an effect on throughput, although this is still acceptable.
Like RAID 3, this is complex technology.

Ideal use

RAID 5 is a good all-round system that combines efficient storage with excellent security and decent performance. It is ideal for file and application servers.

RAID 6

Block-level striping with double distributed parity.

RAID 10 (or 1+0): Reliable, High Performing Mirrored Stripes
RAID 10 is a mirrored stripe. At its most simple, two disks are mirrored and then those two mirrors are striped together into one LUN, which is then presented out to a server. Due to the number of disks used, RAID 10 is a moderately high cost solution, however it does offer decent I/O rates.

Advantages

Highly fault tolerant
High data availability
Very good read / write performance

Disadvantages

Very expensive
Drive spindles must be synchronised
Not very scaleable

RAID 01 (or 0+1): High Performing Striped Mirrors
RAID 01 is a striped mirror. In its simplest form, two sets of two disks are striped, then then are mirrored. RAID 01 is an yields high performance but not maximum reliability.

Advantages

No parity generation
Easy to implement
Utilises full disk capacity
4 drives minimum
Higher performance than RAID 5

Disadvantages

Inefficient use of disk space
High disk overhead / Expensive
Costly to deploy

Thursday, September 27, 2012

tcmdump on AIX

tcpdump command

# at now
tcpdump -w /tmp/tcpdump -i en1 -s 1500 'port 8080'

-w -- file
-i -- interface
-s - MTU

For reading

tcpdump -nnr /tmp/tcpdump.27sep2012

tcpdump on a particular port(assume port is 1414)

at now
tcpdump -w /tmp/tcpdump -i en1 -s 1500 'port 1414'

how to read a tcpdump output

tcpdump -nnr /tmp/tcpdump

Capture packets for particular destination IP ( assume destination ip is 192.168.1.1)

tcpdump -w /tmp/tcpdump -i en2 -s 1500 dst 192.168.1.1

Tuesday, September 18, 2012

EtherChannel and IEEE 802.3ad Link Aggregation

EtherChannel and IEEE 802.3ad Link Aggregation are network port aggregation technologies that allow several Ethernet adapters to be aggregated together to form a single pseudo Ethernet device. For example, ent0 and ent1 can be aggregated into an EtherChannel adapter called ent3; interface en3 would then be configured with an IP address. The system considers these aggregated adapters as one adapter. Therefore, IP is configured over them as over any Ethernet adapter. In addition, all adapters in the EtherChannel or Link Aggregation are given the same hardware (MAC) address, so they are treated by remote systems as if they were one adapter. Both EtherChannel and IEEE 802.3ad Link Aggregation require support in the switch so it is aware which switch ports should be treated as one.

The main benefit of EtherChannel and IEEE 802.3ad Link Aggregation is that they have the network bandwidth of all of their adapters in a single network presence. If an adapter fails, network traffic is automatically sent on the next available adapter without disruption to existing user connections. The adapter is automatically returned to service on the EtherChannel or Link Aggregation when it recovers.
There are some differences between EtherChannel and IEEE 802.3ad Link Aggregation. Consider the differences given in Table 3 to determine which would be best for your situation.

Table 3. Differences between EtherChannel and IEEE 802.3ad Link Aggregation.
EtherChannel	IEEE 802.3ad
Requires switch configuration	Little, if any, configuration of switch required to form aggregation. Some initial setup of the switch may be required.
Supports different packet distribution modes	Supports only standard distribution mode

Beginning with AIX 5L with 5200-03, Dynamic Adapter Membership functionality is available. This functionality allows you to add or remove adapters from an EtherChannel without having to disrupt any user connections. For more details, see Dynamic Adapter Membership.

Supported Adapters

EtherChannel and IEEE 802.3ad Link Aggregation are supported on the following Ethernet adapters:

10/100 Mbps Ethernet PCI Adapter
Universal 4-Port 10/100 Ethernet Adapter
10/100 Mbps Ethernet PCI Adapter II
10/100/1000 Base-T Ethernet PCI Adapter
Gigabit Ethernet-SX PCI Adapter
10/100/1000 Base-TX Ethernet PCI-X Adapter
Gigabit Ethernet-SX PCI-X Adapter
2-port 10/100/1000 Base-TX Ethernet PCI-X Adapter
2-port Gigabit Ethernet-SX PCI-X Adapter

Only the basic EtherChannel functionality (operating exclusively in "standard" or "round-robin" mode without a backup) is supported in the following Ethernet adapters:

PCI Ethernet BNC/RJ-45 Adapter
PCI Ethernet AUI/RJ-45 Adapter

Unless the AIX Release Notes specify otherwise, support for new adapters will be provided as those adapters are released.

Note:

Mixing adapters of different speeds in the same EtherChannel, even if one of them is operating as the backup adapter, is not officially supported. This does not mean that such configurations will not work. The EtherChannel driver will make every reasonable attempt to work even in a mixed-speed scenario.

For information on configuring and using EtherChannel, see EtherChannel. For more information on configuring and using IEEE 802.3ad Link Aggregation, see IEEE 802.3ad Link Aggregation. For information on the different AIX and switch configuration combinations and the results they will produce, see Interoperability Scenarios.

EtherChannel

The adapters that belong to an EtherChannel must be connected to the same EtherChannel-enabled switch. This switch must be manually configured to treat the ports that belong to the EtherChannel as an aggregated link. Note that your switch documentation may refer to this capability as "link aggregation" or "trunking."
Traffic is distributed across the adapters in either the standard way (where the adapter over which the packets are sent is chosen depending on an algorithm) or on a round-robin basis (where packets are sent evenly across all adapters). Incoming traffic is distributed in accordance to the switch configuration and is not controlled by the EtherChannel operation mode.
In AIX, you can configure multiple EtherChannels per system, but it is required that all the links in one EtherChannel are attached to a single switch. Because the EtherChannel cannot be spread across two switches, the entire EtherChannel is lost if the switch is unplugged or fails. To solve this problem, a new backup option available in AIX 5.2 and later keeps the service running when the main EtherChannel fails. The backup and EtherChannel adapters should be attached to different network switches, which must be inter-connected for this setup to work properly. In the event that all of the adapters in the EtherChannel fail, the backup adapter will be used to send and receive all traffic. When any link in the EtherChannel is restored, the service is moved back to the EtherChannel.
For example, ent0 and ent1 could be configured as the main EtherChannel adapters, and ent2 as the backup adapter, creating an EtherChannel called ent3. Ideally, ent0 and ent1 would be connected to the same EtherChannel-enabled switch, and ent2 would be connected to a different switch. In this example, all traffic sent over en3 (the EtherChannel's interface) would be sent over ent0 or ent1 by default (depending on the EtherChannel's packet distribution scheme), whereas ent2 will be idle. If at any time both ent0 and ent1 fail, all traffic would be sent over the backup adapter, ent2. When either ent0 or ent1 recover, they will once again be used for all traffic.
Network Interface Backup, a mode of operation available for EtherChannel in AIX 4.3.3 and AIX 5.1, protects against a single point of Ethernet network failure. No special hardware is required to use Network Interface Backup, but the backup adapter should be connected a separate switch for maximum reliability. In Network Interface Backup mode, only one adapter at a time is actively used for network traffic. The EtherChannel tests the currently-active adapter and, optionally, the network path to a user-specified node. When a failure is detected, the next adapter will be used for all traffic. Network Interface Backup provides detection and failover with no disruption to user connections. Network Interface Backup was originally implemented as a mode in the EtherChannel SMIT menu. In AIX 5.2 and later, the backup adapter provides the equivalent function, so the mode was eliminated from the SMIT menu. To configure network interface backup in AIX 5.2 and later, see Configure Network Interface Backup.

Configuring EtherChannel

Follow these steps to configure an EtherChannel.

Considerations

You can have up to eight primary Ethernet adapters and only one backup Ethernet adapter per EtherChannel.
You can configure multiple EtherChannels on a single system, but each EtherChannel constitutes an additional Ethernet interface. The no command option, ifsize, may need to be increased to include not only the Ethernet interfaces for each adapter, but also any EtherChannels that are configured. In AIX 5.2 and earlier, the default ifsize is eight. In AIX 5.2 and later, the default size is 256.
You can use any supported Ethernet adapter in an EtherChannel (see Supported Adapters). However, the Ethernet adapters must be connected to a switch that supports EtherChannel. See the documentation that came with your switch to determine if it supports EtherChannel (your switch documentation may refer to this capability also as link aggregation or trunking).
All adapters in the EtherChannel should be configured for the same speed (100 Mbps, for example) and should be full duplex.
The adapters used in the EtherChannel cannot be accessed by the system after the EtherChannel is configured. To modify any of their attributes, such as media speed, transmit or receive queue sizes, and so forth, you must do so before including them in the EtherChannel.
The adapters that you plan to use for your EtherChannel must not have an IP address configured on them before you start this procedure. When configuring an EtherChannel with adapters that were previously configured with an IP address, make sure that their interfaces are in the detach state. The adapters to be added to the EtherChannel cannot have interfaces configured in the up state in the Object Data Manager (ODM), which will happen if their IP addresses were configured using SMIT. This may cause problems bringing up the EtherChannel when the machine is rebooted because the underlying interface is configured before the EtherChannel with the information found in ODM. Therefore, when the EtherChannel is configured, it finds that one of its adapters is already being used. To change this, before creating the EtherChannel, type smit chinet, select each of the interfaces of the adapters to be included in the EtherChannel, and change its state value to detach. This will ensure that when the machine is rebooted the EtherChannel can be configured without errors. For more information about ODM, see Object Data Manager (ODM) in AIX 5L Version 5.2 General Programming Concepts: Writing and Debugging Programs.
If you will be using 10/100 Ethernet adapters in the EtherChannel, you may need to enable link polling on those adapters before you add them to the EtherChannel. Type smit chgenet at the command line. Change the Enable Link Polling value to yes, and press Enter.
Note:

In AIX 5L with 5200-03 and later, enabling the link polling mechanism is not necessary. The link poller will be started automatically.
If you plan to use jumbo frames, you may need to enable this feature in every adapter before creating the EtherChannel and in the EtherChannel itself. Type smitty chgenet at the command line. Change the Enable Jumbo Frames value to yes and press Enter. Do this for every adapter for which you want to enable Jumbo Frames. You will enable jumbo frames in the EtherChannel itself later.
Note:

In AIX 5.2 and later, enabling the jumbo frames in every underlying adapter is not necessary once it is enabled in the EtherChannel itself. The feature will be enabled automatically if you set the Enable Jumbo Frames attribute to yes.

Configure an EtherChannel

Type smit etherchannel at the command line.
Select Add an EtherChannel / Link Aggregation from the list and press Enter.
Select the primary Ethernet adapters that you want on your EtherChannel and press Enter. If you are planning to use EtherChannel backup, do not select the adapter that you plan to use for the backup at this point. The EtherChannel backup option is available in AIX 5.2 and later.
Note:

The Available Network Adapters displays all Ethernet adapters. If you select an Ethernet adapter that is already being used (has an interface defined), you will get an error message. You first need to detach this interface if you want to use it.
Enter the information in the fields according to the following guidelines:
- EtherChannel / Link Aggregation Adapters: You should see all primary adapters that you are using in your EtherChannel. You selected these adapters in the previous step.
- Enable Alternate Address: This field is optional. Setting this to yes will enable you to specify a MAC address that you want the EtherChannel to use. If you set this option to no, the EtherChannel will use the MAC address of the first adapter.
- Alternate Address: If you set Enable Alternate Address to yes, specify the MAC address that you want to use here. The address you specify must start with 0x and be a 12-digit hexadecimal address (for example, 0x001122334455).
- Enable Gigabit Ethernet Jumbo Frames: This field is optional. In order to use this, your switch must support jumbo frames. This will only work with a Standard Ethernet (en) interface, not an IEEE 802.3 (et) interface. Set this to yes if you want to enable it.
- Mode: You can choose from the following modes:
  - standard: In this mode the EtherChannel uses an algorithm to choose which adapter it will send the packets out on. The algorithm consists of taking a data value, dividing it by the number of adapters in the EtherChannel, and using the remainder (using the modulus operator) to identify the outgoing link. The Hash Mode value determines which data value is fed into this algorithm (see the Hash Mode attribute for an explanation of the different hash modes). For example, if the Hash Mode is standard, it will use the packet's destination IP address. If this is 10.10.10.11 and there are 2 adapters in the EtherChannel, (1 / 2) = 0 with remainder 1, so the second adapter is used (the adapters are numbered starting from 0). The adapters are numbered in the order they are listed in the SMIT menu. This is the default operation mode.
  - round_robin: In this mode the EtherChannel will rotate through the adapters, giving each adapter one packet before repeating. The packets may be sent out in a slightly different order than they were given to the EtherChannel, but it will make the best use of its bandwidth. It is an invalid combination to select this mode with a Hash Mode other than default. If you choose the round-robin mode, leave the Hash Mode value as default.
  - netif_backup: This option is available only in AIX 5.1 and AIX 4.3.3. In this mode, the EtherChannel will activate only one adapter at a time. The intention is that the adapters are plugged into different Ethernet switches, each of which is capable of getting to any other machine on the subnet or network. When a problem is detected either with the direct connection (or optionally through the inability to ping a machine), the EtherChannel will deactivate the current adapter and activate a backup adapter. This mode is the only one that makes use of the Internet Address to Ping, Number of Retries, and Retry Timeout fields. Network Interface Backup Mode does not exist as an explicit mode in AIX 5.2 and later. To enable Network Interface Backup Mode in AIX 5.2 and later, you must configure one adapter in the main EtherChannel and a backup adapter. For more information, see Configure Network Interface Backup.
  - 8023ad: This options enables the use of the IEEE 802.3ad Link Aggregation Control Protocol (LACP) for automatic link aggregation. For more details about this feature, see IEEE 802.3ad Link Aggregation.
- Hash Mode: You can choose from the following hash modes, which will determine which data value will be used by the algorithm to determine the outgoing adapter:
  - default: In this hash mode the destination IP address of the packet will be used to determine the outgoing adapter. For non-IP traffic (such as ARP), the last byte of the destination MAC address is used to do the calculation. This mode will guarantee packets are sent out over the EtherChannel in the order they were received, but it may not make full use of the bandwidth.
  - src_port: In this hash mode the source UDP or TCP port value of the packet will be used to determine the outgoing adapter. If the packet is not UDP or TCP traffic, the last byte of the destination IP address will be used. If the packet is not IP traffic, the last byte of the destination MAC address will be used.
  - dst_port: In this hash mode the destination UDP or TCP port value of the packet will be used to determine the outgoing adapter. If the packet is not UDP or TCP traffic, the last byte of the destination IP will be used. If the packet is not IP traffic, the last byte of the destination MAC address will be used.
  - src_dst_port: In this hash mode both the source and destination UDP or TCP port values of the packet will be used to determine the outgoing adapter (specifically, the source and destination ports are added and then divided by two before being fed into the algorithm). If the packet is not UDP or TCP traffic, the last byte of the destination IP will be used. If the packet is not IP traffic, the last byte of the destination MAC address will be used. This mode can give good packet distribution in most situations, both for clients and servers.
    Note:
    
    It is an invalid combination to select a Hash Mode other than default with a Mode of round_robin.
  To learn more about packet distribution and load balancing, see Load-balancing options.
- Backup Adapter: This field is optional. Enter the adapter that you want to use as your EtherChannel backup. EtherChannel backup is available in AIX 5.2 and later.
- Internet Address to Ping: This field is optional and only takes effect if you are running Network Interface Backup mode or if you have only one adapter in the EtherChannel and a backup adapter. The EtherChannel will ping the IP address or host name that you specify here. If the EtherChannel is unable to ping this address for the Number of Retries times in Retry Timeout intervals, the EtherChannel will switch adapters.
- Number of Retries: Enter the number of ping response failures that are allowed before the EtherChannel switches adapters. The default is three. This field is optional and valid only if you have set an Internet Address to Ping.
- Retry Timeout: Enter the number of seconds between the times when the EtherChannel will ping the Internet Address to Ping. The default is one second. This field is optional and valid only if you have set an Internet Address to Ping.
Press Enter after changing the desired fields to create the EtherChannel.
Configure IP over the newly-created EtherChannel device by typing smit chinet at the command line.
Select your new EtherChannel interface from the list.
Fill in all the required fields and press Enter.

Configure Network Interface Backup

Network Interface Backup protects against a single point of network failure by providing failure detection and failover with no disruption to user connections. When operating in this mode, only one adapter is active at any given time. If the active adapter fails, another adapter in the EtherChannel will be used for all traffic. When operating in Network Interface Backup mode, it is not necessary to connect to EtherChannel-enabled switches.
The Network Interface Backup setup is most effective when the adapters are connected to different network switches, as this provides greater redundancy than connecting all adapters to one switch. When connecting to different switches, make sure there is a connection between the switches. This provides failover capabilities from one adapter to another by ensuring that there is always a route to the currently-active adapter.
In releases prior to AIX 5.2, Network Interface Backup mode was implemented as an explicit mode of operation in the EtherChannel SMIT menu. In AIX 5.2 and later, however, the backup adapter functionality provides the equivalent behavior, so the mode was eliminated from the SMIT menu.
Additionally, AIX 5.2 and later versions provide priority, meaning that the adapter configured in the primary EtherChannel will be used preferentially over the backup adapter. As long as the primary adapter is functional, it will be used. This contrasts from the behavior of Network Interface Backup mode in releases prior to AIX 5.2, where the backup adapter was used until it also failed, regardless of whether the primary adapter had already recovered.
For example, ent0 could be configured as the main adapter, and ent2 as the backup adapter, creating an EtherChannel called ent3. Ideally, ent0 and ent2 would be connected to two different switches. In this example, all traffic sent over en3 (the EtherChannel's interface) would be sent over ent0 by default, whereas ent2 will be idle. If at any time ent0 fails, all traffic would be sent over the backup adapter, ent2. When ent0 recovers, it will once again be used for all traffic.
While operating in Network Interface Backup Mode, it is also possible to configure the EtherChannel to detect link failure and network unreachability. To do this, specify the IP address or host name of a remote host where connectivity should always be present. The EtherChannel will periodically ping this host to determine whether there is still a network path to it. If a specified number of ping attempts go unanswered, the EtherChannel will fail over to the other adapter in the hope that there is a network path to the remote host through the other adapter. In this setup, not only should every adapter be connected to a different switch, but each switch should also have a different route to the host that is pinged.
This ping feature is only available in Network Interface Backup mode. However, in AIX 5.2 and later, if there is a failover due to unanswered pings on the primary adapter, the backup adapter will remain the active channel as long as it is working. There is no way of knowing, while operating on the backup adapter, whether it is possible to reach the host being pinged from the primary adapter. To avoid failing over back and forth between the primary and the backup, it will simply keep operating on the backup (unless the pings go unanswered on the backup adapter as well, or if the backup adapter itself fails, in which case it would fail over to the primary adapter). However, if the failover occurred because the primary adapter failed (not because the pings went unanswered), the EtherChannel will then come back to the primary adapter as soon it has come back up, as usual.
To configure Network Interface Backup in AIX 5.2, see Configure Network Interface Backup in AIX 5.2 and later. To configure Network Interface Backup in previous versions of AIX, see Appendix B. Configure Network Interface Backup in previous AIX versions

Configure Network Interface Backup in AIX 5.2 and later

With root authority, type smit etherchannel on the command line.
Select Add an EtherChannel / Link Aggregation from the list and press Enter.
Select the primary Ethernet adapter and press Enter. This is the adapter that will be used until it fails.
Note:

The Available Network Adapters displays all Ethernet adapters. If you select an Ethernet adapter that is already being used, you will get an error message and will need to detach this interface before you can use it. See the ifconfig command for information on how to detach an interface.
Enter the information in the fields according to the following guidelines:
- EtherChannel / Link Aggregation Adapters: You should see the primary adapter you selected in the previous step.
- Enable Alternate Address: This field is optional. Setting this to yes will enable you to specify a MAC address that you want the EtherChannel to use. If you set this option to no, the EtherChannel will use the MAC address of the primary adapter.
- Alternate Address: If you set Enable Alternate Address to yes, specify the MAC address that you want to use here. The address you specify must start with 0x and be a 12-digit hexadecimal address (for example 0x001122334455).
- Enable Gigabit Ethernet Jumbo Frames: This field is optional. In order to use this, your switch must support jumbo frames. This will only work with a Standard Ethernet (en) interface, not an IEEE 802.3 (et) interface. Set this to yes if you want to use it.
- Mode: It is irrelevant which mode of operation you select because there is only one adapter in the main EtherChannel. All packets will be sent over that adapter until it fails. There is no netif_backup mode because that mode can be emulated using a backup adapter.
- Hash Mode: It is irrelevant which hash mode you select because there is only one adapter in the main EtherChannel. All packets will be sent over that adapter until it fails.
- Backup Adapter: Enter the adapter that you want to be your backup adapter. After a failover, this adapter will be used until the primary adapter recovers. It is recommended to use the preferred adapter as the primary adapter.
- Internet Address to Ping: The field is optional. The EtherChannel will ping the IP address or host name that you specify here. If the EtherChannel is unable to ping this address for Number of Retries times in Retry Timeout intervals, the EtherChannel will switch adapters.
- Number of Retries: Enter the number of ping response failures that are allowed before the EtherChannel switches adapters. The default is three. This field is optional and valid only if you have set an Internet Address to Ping.
- Retry Timeout: Enter the number of seconds between the times when the EtherChannel will ping the Internet Address to Ping. The default is one second. This field is optional and valid only if you have set an Internet Address to Ping.
Press Enter after changing the desired fields to create the EtherChannel.
Configure IP over the newly-created interface by typing smit chinet at the command line.
Select your new EtherChannel interface from the list.
Fill in all the required fields and press Enter.

For additional tasks that can be performed after the EtherChannel is configured, see Managing EtherChannel and IEEE 802.3ad Link Aggregation.

Load-balancing options

There are two load balancing methods for outgoing traffic in EtherChannel, as follows: round-robin, which spreads the outgoing traffic evenly across all the adapters in the EtherChannel; and standard, which selects the adapter using an algorithm. The Hash Mode parameter determines which numerical value is fed to the algorithm.
The following table summarizes the valid load balancing option combinations offered.

Table 4. Mode and Hash Mode combinations and the outgoing traffic distributions each will produce.
Mode	Hash Mode	Outgoing Traffic Distribution
standard or 8023ad	default	The traditional AIX behavior. The adapter selection algorithm uses the last byte of the destination IP address (for TCP/IP traffic) or MAC address (for ARP and other non-IP traffic). This mode is typically a good initial choice for a server with a large number of clients.
standard or 8023ad	src_dst_port	The outgoing adapter path is selected by an algorithm using the combined source and destination TCP or UDP port values. Since each connection has a unique TCP or UDP port, the three port-based hash modes provide additional adapter distribution flexibility when there are several, separate TCP or UDP connections between an IP address pair.
standard or 8023ad	src_port	The adapter selection algorithm uses the source TCP or UDP port value. In the netstat -an command output, the port is the TCP/IP address suffix value in the `Local` column.
standard or 8023ad	dst_port	The outgoing adapter path is selected by the algorithm using the destination system port value. In the netstat -an command output, the TCP/IP address suffix in the `Foreign` column is the TCP or UDP destination port value.
round-robin	default	Outgoing traffic is spread evenly across all the adapter ports in the EtherChannel. This mode is the typical choice for two hosts connected back-to-back (without an intervening switch).

Round-Robin

All outgoing traffic is spread evenly across all of the adapters in the EtherChannel. It provides the highest bandwidth optimization for the AIX server system. While round-robin distribution is the ideal way to utilize all the links equally, consider that it also introduces the potential for out-of-order packets at the receiving system.
In general, round-robin mode is ideal for back-to-back connections running jumbo frames. In this environment, there is no intervening switch, so there is no chance that processing at the switch could alter the packet delivery time, order, or adapter path. On this direct cable network path, packets are received exactly as sent. Jumbo frames (9000 byte MTU) always yield better file transfer performance than traditional 1500 byte MTUs. In this case, however, they add another benefit. These larger packets take longer to send so it is less likely that the receiving host would be continuously interrupted with out-of-order packets.
Round-robin mode can be implemented in other environments but at increased risk of out-of-order packets at the receiving system. This risk is particularly high when there are few, long-lived, streaming TCP connections. When there are many such connections between a host pair, packets from different connections could be intermingled, thereby decreasing the chance of packets for the same connection arriving out-of-order. Check for out-of-order packet statistics in the tcp section of the netstat -s command output. A steadily-increasing value indicates a potential problem in traffic sent from an EtherChannel.
If out-of-order packets are a problem on a system that must use traditional Ethernet MTUs and must connected through a switch, try the various hash modes offered in standard mode operation. Each mode has a particular strength, but the default and src_dst_port modes are the logical starting points as they are more widely applicable.

Standard or 8032ad

Standard algorithm. The standard algorithm is used for both standard and IEEE 802.3ad-style link aggregations. AIX divides the last byte of the "numerical value" by the number of adapters in the EtherChannel and uses the remainder to identify the outgoing link. If the remainder is zero, the first adapter in the EtherChannel is selected; a remainder of one means the second adapter is selected, and so on (the adapters are selected in the order they are listed in the adapter_names attribute).
The Hash Mode selection determines the numerical value used in the calculation. By default, the last byte of the destination IP address or MAC address is used in the calculation, but the source and destination TCP or UDP port values may also be used. These alternatives allow you to fine-tune the distribution of outgoing traffic across the real adapters in the EtherChannel.
In default hash mode, the adapter selection algorithm is applied to the last byte of the destination IP address for IP traffic. For ARP and other non-IP traffic, the same formula is applied on the last byte of the destination MAC address. Unless there is an adapter failure which causes a failover, all traffic between a host pair in default standard mode goes out over the same adapter. The default hash mode may be ideal when the local host establishes connections to many different IP addresses.
If the local host establishes lengthy connections to few IP addresses, however, you will notice that some adapters carry a greater load than others, because all the traffic sent to a specific destination is sent over the same adapter. While this prevents packets from arriving out-of-order, it may not utilize bandwidth in the most effective fashion in all cases. The port-based hash modes still send packets in order, but they allow packets belonging to different UDP or TCP connections, even if they are sent to the same destination, to be sent over different adapters, thus utilizing better the bandwidth of all the adapters.
In src_dst_port hash mode, the TCP or UDP source and destination port values of the outgoing packet are added, then divided by two. The resultant whole number (no decimals) is plugged into the standard algorithm. TCP or UDP traffic is sent on the adapter selected by the standard algorithm and selected hash mode value. Non-TCP or UDP traffic will fall back to the default hash mode, meaning the last byte of either the destination IP address or MAC address. The src_dst_port hash mode option considers both the source and the destination TCP or UDP port values. In this mode, all of the packets in one TCP or UDP connection are sent over a single adapter so they are guaranteed to arrive in order, but the traffic is still spread out because connections (even to the same host) may be sent over different adapters. The results of this hash mode are not skewed by the connection establishment direction because it uses both the source and destination TCP or UDP port values.
In src_port hash mode, the source TCP or UDP port value of the outgoing packet is used. In dst_port hash mode, the destination TCP or UDP port value of the outgoing packet is used. Use the src_port or dst_port hash mode options if port values change from one connection to another and if the src_dst_port option is not yielding a desirable distribution.

Managing EtherChannel and IEEE 802.3ad Link Aggregation

This section will tell you how to perform the following tasks:

Listing EtherChannels or Link Aggregations

On the command line, type smit etherchannel.
Select List All EtherChannels / Link Aggregations and press Enter.

Changing the Alternate Address

This enables you to specify a MAC address for your EtherChannel or Link Aggregation.

On AIX 5.2 with 5200-01 and earlier, type ifconfig interface detach, where interface is your EtherChannel's or Link Aggregation's interface. (On AIX 5L with 5200-03 and later, you can change the alternate address of the EtherChannel without detaching its interface).
On the command line, type smit etherchannel.
Select Change / Show Characteristics of an EtherChannel and press Enter.
If you have multiple EtherChannels, select the EtherChannel for which you want to create an alternate address.
Change the value in Enable Alternate EtherChannel Address to yes.
Enter the alternate address in the Alternate EtherChannel Address field. The address must start with 0x and be a 12-digit hexadecimal address (for example, 0x001122334455).
Press Enter to complete the process.
Note:

Changing the EtherChannel's MAC address at runtime may cause a temporary loss of connectivity. This is because the adapters need to be reset so they learn of their new hardware address, and some adapters take a few seconds to be initialized.

Dynamic Adapter Membership

Prior to AIX 5L with 5200-03, in order to add or remove an adapter from an EtherChannel, its interface first had to be detached, temporarily interrupting all user traffic. To overcome this limitation, Dynamic Adapter Membership (DAM) was added in AIX 5L with 5200-03. It allows adapters to be added or removed from an EtherChannel without having to disrupt any user connections. A backup adapter can also be added or removed; an EtherChannel can be initially created without a backup adapter, and one can be added a later date if the need arises
Not only can adapters be added or removed without disrupting user connections, it is also possible to modify most of the EtherChannel attributes at runtime. For example, you may begin using the "ping" feature of Network Interface Backup while the EtherChannel is in use, or change the remote host being pinged at any point.
You may also turn a regular EtherChannel into an IEEE 802.3ad Link Aggregation (or vice versa), allowing users to experiment with this feature without having to remove and recreate the EtherChannel.
Furthermore, with DAM, you may choose to create a one-adapter EtherChannel. A one-adapter EtherChannel behaves exactly like a regular adapter; however, should this adapter ever fail, it would be possible to replace it at runtime without ever losing connectivity. To accomplish this, you would add a temporary adapter to the EtherChannel, remove the defective adapter from the EtherChannel, replace the defective adapter with a working one using Hot Plug, add the new adapter to the EtherChannel, and then remove the temporary adapter. During this process you would never notice a loss in connectivity. If the adapter had been working as a standalone adapter, however, it would have had to be detached before being removed using Hot Plug, and during that time any traffic going over it would simply have been lost.

Adding, removing, or changing adapters in an EtherChannel or Link Aggregation

There are two ways to add, remove, or change an adapter in an EtherChannel or Link Aggregation. One method requires the EtherChannel or Link Aggregation interface to be detached, while the other does not (using Dynamic Adapter Membership, which is available in AIX 5L with 5200-03 and later).

Making changes to an EtherChannel using Dynamic Adapter Membership

Making changes using Dynamic Adapter Membership does not require you to stop all traffic going over the EtherChannel by detaching its interface. Consider the following before proceeding:

Notes:

When adding an adapter at runtime, note that different Ethernet adapters support different capabilities (for example, the ability to do checksum offload, to use private segments, to do large send, and so forth). If different types of adapters are used in the same EtherChannel, the capabilities reported to the interface layer are those supported by all the adapters (for example, if all but one adapter supports the use of private segments, the EtherChannel will state it does not support private segments; if all adapters do support large send, the channel will state it supports large send). When adding an adapter to an EtherChannel at runtime, be sure that it supports at least the same capabilities as the other adapters already in the EtherChannel. If you attempt to add an adapter that does not support all the capabilities the EtherChannel supports, the addition will fail. Note, however, that if the EtherChannel's interface is detached, you may add any adapter (regardless of which capabilities it supports), and when the interface is reactivated the EtherChannel will recalculate which capabilities it supports based on the new list of adapters.
If you are not using an alternate address and you plan to delete the adapter whose MAC address was used for the EtherChannel (the MAC address used for the EtherChannel is "owned" by one of the adapters), the EtherChannel will use the MAC address of the next adapter available (in other words, the one that becomes the first adapter after the deletion, or the backup adapter in case all main adapters are deleted). For example, if an EtherChannel has main adapters ent0 and ent1 and backup adapter ent2, it will use by default ent0's MAC address (it is then said that ent0 "owns" the MAC address). If ent0 is deleted, the EtherChannel will then use ent1's MAC address. If ent1 is then deleted, the EtherChannel will use ent2's MAC address. If ent0 were later re-added to the EtherChannel, it will continue to use ent2's MAC address because ent2 is now the owner of the MAC address. If ent2 were then deleted from the EtherChannel, it would start using ent0's MAC address again. Deleting the adapter whose MAC address was used for the EtherChannel may cause a temporary loss of connectivity, because all the adapters in the EtherChannel need to be reset so they learn of their new hardware address. Some adapters take a few seconds to be initialized.
If your EtherChannel is using an alternate address (a MAC address you specified), it will keep using this MAC address regardless of which adapters are added or deleted. Furthermore, it means that there will be no temporary loss of connectivity when adding or deleting adapters because none of the adapters "owns" the EtherChannel's MAC address.
Almost all EtherChannel attributes can now be modified at runtime. The only exception is Enable Gigabit Ethernet Jumbo Frames. To modify the Enable Gigabit Ethernet Jumbo Frames attribute, you must first detach the EtherChannel's interface before attempting to modify this value.
For any attribute that cannot be changed at runtime (currently, only Enable Gigabit Ethernet Jumbo Frames), there is a field called Apply change to DATABASE only. If this attribute is set to yes, it is possible to change, at runtime, the value of an attribute that usually cannot be modified at runtime. With the Apply change to DATABASE only field set to yes the attribute will only be changed in the ODM and will not be reflected in the running EtherChannel until it is reloaded into memory (by detaching its interface, using rmdev -l EtherChannel_device and then mkdev -l EtherChannel_device commands), or until the machine is rebooted. This is a convenient way of making sure that the attribute is modified the next time the machine boots, without having to disrupt the running EtherChannel.

To make changes to the EtherChannel or Link Aggregation using Dynamic Adapter Membership, follow these steps:

At the command line, type smit etherchannel.
Select Change / Show Characteristics of an EtherChannel / Link Aggregation.
Select the EtherChannel or Link Aggregation that you want to modify.
Fill in the required fields according to the following guidelines:
- In the Add adapter or Remove adapter field, select the Ethernet adapter you want to add or remove.
- In the Add backup adapter or Remove backup adapter fields, select the Ethernet adapter you want to start or stop using as a backup.
- Almost all the EtherChannel attributes may be modified at runtime, although the Enable Gigabit Ethernet Jumbo Frames attribute cannot.
- To turn a regular EtherChannel into an IEEE 802.3ad Link Aggregation, change the Mode attribute to 8023ad. To turn an IEEE 802.3ad Link Aggregation into an EtherChannel, change the Mode attribute to standard or round_robin.
Fill in the necessary data, and press Enter.

Making changes on AIX 5.2 with 5200-01 and earlier

Follow these steps to detach the interface before making changes:

Type ifconfig interface detach, where interface is your EtherChannel's interface.
On the command line type, smit etherchannel.
Select Change / Show Characteristics of an EtherChannel / Link Aggregation and press Enter.
Select the EtherChannel or Link Aggregation that you want to modify.
Modify the attributes you want to change in your EtherChannel or Link Aggregation and press Enter.
Fill in the necessary fields and press Enter.

Remove an EtherChannel or Link Aggregation

Type ifconfig interface detach, where interface is your EtherChannel's interface.
On the command line type smit etherchannel.
Select Remove an EtherChannel / and press Enter.
Select the EtherChannel that you want to remove and press Enter.

Configure or remove a backup adapter on an existing EtherChannel or Link Aggregation

The following procedure configures or removes a backup adapter on an EtherChannel or Link Aggregation. This option is available only in AIX 5.2 and later.

Type ifconfig interface detach, where interface is your EtherChannel's or Link Aggregation's interface.
On the command line, type smit etherchannel.
Select Change / Show Characteristics of an EtherChannel / Link Aggregation.
Select the EtherChannel or Link Aggregation that you are adding or modifying the backup adapter on.
Enter the adapter that you want to use as your backup adapter in the Backup Adapter field, or select NONE if you wish to stop using the backup adapter.

Troubleshooting EtherChannel

If you are having trouble with your EtherChannel, consider the following:

Tracing EtherChannel

Use tcpdump and iptrace to troubleshoot the EtherChannel. The trace hook id for the transmission packets is 2FA and for other events is 2FB. You cannot trace receive packets on the EtherChannel as a whole, but you can trace each adapter's receive trace hooks.

Viewing EtherChannel Statistics

Use the entstat command to get the aggregate statistics of all the adapters in the EtherChannel. For example, entstat ent3 will display the aggregate statistics of ent3. Adding the -d flag will also display the statistics of each adapter individually. For example, typing entstat -d ent3 will show you the aggregate statistics of the EtherChannel as well as the statistics of each individual adapter in the EtherChannel.

Note:

In the General Statistics section, the number shown in Adapter Reset Count is the number of failovers. In EtherChannel backup, coming back to the main EtherChannel from the backup adapter is not counted as a failover. Only failing over from the main channel to the backup is counted. In the Number of Adapters field, the backup adapter is counted in the number displayed.

Improving Slow Failover

If the failover time when you are using network interface backup mode or EtherChannel backup is slow, verify that your switch is not running the Spanning Tree Protocol (STP). When the switch detects a change in its mapping of switch port to MAC address, it runs the spanning tree algorithm to see if there are any loops in the network. Network Interface Backup and EtherChannel backup may cause a change in the port to MAC address mapping.
Switch ports have a forwarding delay counter that determines how soon after initialization each port should begin forwarding or sending packets. For this reason, when the main channel is re-enabled, there is a delay before the connection is re-established, whereas the failover to the backup adapter is faster. Check the forwarding delay counter on your switch and make it as small as possible so that coming back to the main channel occurs as fast as possible.
For the EtherChannel backup function to work correctly, the forwarding delay counter must not be more than 10 seconds, or coming back to the main EtherChannel might not work correctly. Setting the forwarding delay counter to the lowest value allowed by the switch is recommended.

Adapters not Failing Over

If adapter failures are not triggering failovers and you are running AIX 5.2 with 5200-01 or earlier, check to see if your adapter card needs to have link polling enabled to detect link failure. Some adapters cannot automatically detect their link status. To detect this condition, these adapters must enable a link polling mechanism that starts a timer that periodically verifies the status of the link. Link polling is disabled by default. For EtherChannel to work correctly with these adapters, however, the link polling mechanism must be enabled on each adapter before the EtherChannel is created. If you are running AIX 5L with 5200-03 and later, the link polling is started automatically and this cannot be an issue.
Adapters that have a link polling mechanism have an ODM attribute called poll_link, which must be set to yes for the link polling to be enabled. Before creating the EtherChannel, use the following command on every adapter to be included in the channel:

smit chgenet

Change the Enable Link Polling value to yes and press Enter.

Using Jumbo Frames

For the jumbo frames option to work properly in AIX 5.2 and earlier, aside from enabling the use_jumbo_frame attribute on the EtherChannel, you must also enable jumbo frames on each adapter before creating the EtherChannel using the following command:

smitty chgenet

Change the Enable Jumbo Frames value to yes and press Enter. On AIX 5.2 and later, jumbo frames are enabled automatically in every underlying adapter when it is set to yes.

Remote Dump

Remote dump is not supported over an EtherChannel.

IEEE 802.3ad Link Aggregation

IEEE 802.3ad is a standard way of doing link aggregation. Conceptually, it works the same as EtherChannel in that several Ethernet adapters are aggregated into a single virtual adapter, providing greater bandwidth and protection against failures. For example, ent0 and ent1 can be aggregated into an IEEE 802.3ad Link Aggregation called ent3; interface en3 would then be configured with an IP address. The system considers these aggregated adapters as one adapter. Therefore, IP is configured over them as over any Ethernet adapter.
Like EtherChannel, IEEE 802.3ad requires support in the switch. Unlike EtherChannel, however, the switch does not need to be configured manually to know which ports belong to the same aggregation.
The advantages of using IEEE 802.3ad Link Aggregation instead of EtherChannel are that it creates the link aggregations in the switch automatically, and that it allows you to use switches that support the IEEE 802.3ad standard but do not support EtherChannel.
In IEEE 802.3ad, the Link Aggregation Control Protocol (LACP) automatically tells the switch which ports should be aggregated. When an IEEE 802.3ad aggregation is configured, Link Aggregation Control Protocol Data Units (LACPDUs) are exchanged between the server machine and the switch. LACP will let the switch know that the adapters configured in the aggregation should be considered as one on the switch without further user intervention.
Although the IEEE 802.3ad specification does not allow the user to choose which adapters are aggregated, the AIX implementation does allow the user to select the adapters. According to the specification, the LACP determines, completely on its own, which adapters should be aggregated together (by making link aggregations of all adapters with similar link speeds and duplexity settings). This prevents you from deciding which adapters should be used standalone and which ones should be aggregated together. The AIX implementation gives you control over how the adapters are used, and it never creates link aggregations arbitrarily.
To be able to aggregate adapters (meaning that the switch will allow them to belong to the same aggregation) they must be of the same line speed (for example, all 100 Mbps, or all 1 Gbps) and they must all be full duplex. If you attempt to place adapters of different line speeds or different duplex modes, the creation of the aggregation on the AIX system will succeed, but the switch may not aggregate the adapters together. If the switch does not successfully aggregate the adapters together, you may notice a decrease in network performance. For information on how to determine whether an aggregation on a switch has succeeded, see Troubleshooting IEEE 802.3ad.
According to the IEEE 802.3ad specification, packets going to the same IP address are all sent over the same adapter. Thus, when operating in 8023ad mode, the packets will always be distributed in the standard fashion, never in a round-robin fashion.
The backup adapter feature is available for IEEE 802.3ad Link Aggregations just as it is for EtherChannel. The backup adapter does not need to be connected to an IEEE 802.3ad-enabled switch, but if it is, the backup adapter will still follow the IEEE 802.3ad LACP.
You can also configure an IEEE 802.3ad Link Aggregation if the switch supports EtherChannel but not IEEE 802.3ad. In that case, you would have to manually configure the ports as an EtherChannel on the switch (just as if a regular EtherChannel had been created). By setting the mode to 8023ad, the aggregation will work with EtherChannel-enabled as well as IEEE 802.3ad-enabled switches. For more information about interoperability, see Interoperability Scenarios.

Note:

The steps to enable the use of IEEE 802.3ad varies from switch to switch. You should consult the documentation for your switch to determine what initial steps, if any, must be performed to enable LACP in the switch.

For information in how to configure an IEEE 802.3ad aggregation, see Configuring IEEE 802.3ad Link Aggregation.

Considerations

Consider the following before configuring an IEEE 802.3ad Link Aggregation:

Although not officially supported, the AIX implementation of IEEE 802.3ad will allow the Link Aggregation to contain adapters of different line speeds; however, you should only aggregate adapters that are set to the same line speed and are set to full duplex. This will help avoid potential problems configuring the Link Aggregation on the switch. Refer to your switch's documentation for more information on what types of aggregations your switch allows.
If you will be using 10/100 Ethernet adapters in the Link Aggregation on AIX 5.2 with 5200-01 and earlier, you need to enable link polling on those adapters before you add them to the aggregation. Type smitty chgenet at the command line. Change the Enable Link Polling value to yes, and press Enter. Do this for every 10/100 Ethernet adapter that you will be adding to your Link Aggregation.
Note:

In AIX 5L with 5200-03 and later, enabling the link polling mechanism is not necessary. The link poller will be started automatically.

Configuring IEEE 802.3ad Link Aggregation

Follow these steps to configure an IEEE 802.3ad Link Aggregation:

Type smit etherchannel at the command line.
Select Add an EtherChannel / Link Aggregation from the list and press Enter.
Select the primary Ethernet adapters that you want on your Link Aggregation and press Enter. If you are planning to use a backup adapter, do not select the adapter that you plan to use for the backup at this point. The backup adapter option is available in AIX 5.2 and later.
Note:

The Available Network Adapters displays all Ethernet adapters. If you select an Ethernet adapter that is already being used (has an interface defined), you will get an error message. You first need to detach these interfaces if you want to use them.
Enter the information in the fields according to the following guidelines:
- EtherChannel / Link Aggregation Adapters: You should see all primary adapters that you are using in your Link Aggregation. You selected these adapters in the previous step.
- Enable Alternate Address: This field is optional. Setting this to yes will enable you to specify a MAC address that you want the Link Aggregation to use. If you set this option to no, the Link Aggregation will use the MAC address of the first adapter.
- Alternate Address: If you set Enable Alternate Address to yes, specify the MAC address that you want to use here. The address you specify must start with 0x and be a 12-digit hexadecimal address (for example, 0x001122334455).
- Enable Gigabit Ethernet Jumbo Frames: This field is optional. In order to use this, your switch must support jumbo frames. This will only work with a Standard Ethernet (en) interface, not an IEEE 802.3 (et) interface. Set this to yes if you want to enable it.
- Mode: Enter 8023ad.
- Hash Mode: You can choose from the following hash modes, which will determine which data value will be used by the algorithm to determine the outgoing adapter:
  - default: In this hash mode the destination IP address of the packet will be used to determine the outgoing adapter. For non-IP traffic (such as ARP), the last byte of the destination MAC address is used to do the calculation. This mode will guarantee packets are sent out over the EtherChannel in the order they were received, but it may not make full use of the bandwidth.
  - src_port: In this hash mode the source UDP or TCP port value of the packet will be used to determine the outgoing adapter. If the packet is not UDP or TCP traffic, the last byte of the destination IP address will be used. If the packet is not IP traffic, the last byte of the destination MAC address will be used.
  - dst_port: In this hash mode the destination UDP or TCP port value of the packet will be used to determine the outgoing adapter. If the packet is not UDP or TCP traffic, the last byte of the destination IP will be used. If the packet is not IP traffic, the last byte of the destination MAC address will be used.
  - src_dst_port: In this hash mode both the source and destination UDP or TCP port values of the packet will be used to determine the outgoing adapter (specifically, the source and destination ports are added and then divided by two before being fed into the algorithm). If the packet is not UDP or TCP traffic, the last byte of the destination IP will be used. If the packet is not IP traffic, the last byte of the destination MAC address will be used. This mode can give good packet distribution in most situations, both for clients and servers.
  To learn more about packet distribution and load balancing, see Load-balancing options.
- Backup Adapter: This field is optional. Enter the adapter that you want to use as your backup. The backup adapter option is available in AIX 5.2 and later.
- Internet Address to Ping: This field is optional, and only available if you have only one adapter in the main aggregation and a backup adapter. The Link Aggregation will ping the IP address or host name that you specify here. If the Link Aggregation is unable to ping this address for the Number of Retries times in Retry Timeout intervals, the Link Aggregation will switch adapters.
- Number of Retries: Enter the number of ping response failures that are allowed before the Link Aggregation switches adapters. The default is three. This field is optional and valid only if you have set an Internet Address to Ping.
- Retry Timeout: Enter the number of seconds between the times when the Link Aggregation will ping the Internet Address to Ping. The default is one second. This field is optional and valid only if you have set an Internet Address to Ping.
Press Enter after changing the desired fields to create the Link Aggregation.
Configure IP over the newly-created Link Aggregation device by typing smit chinet at the command line.
Select your new Link Aggregation interface from the list.
Fill in all the required fields and press Enter.

Managing IEEE 802.3ad

For management tasks that can be performed on an IEEE 802.3ad Link Aggregation after configuration, see Managing EtherChannel and IEEE 802.3ad Link Aggregation.

Troubleshooting IEEE 802.3ad

If you are having trouble with your IEEE 802.3ad Link Aggregation, use the following command to verify the mode of operation of the Link Aggregation:

entstat -d device

where device is the Link Aggregation device.

This will also make a best-effort determination of the status of the progress of LACP based on the LACPDUs received from the switch. The following status values are possible:

Inactive: LACP has not been initiated. This is the status when a Link Aggregation has not yet been configured, either because it has not yet been assigned an IP address or because its interface has been detached.
Negotiating: LACP is in progress, but the switch has not yet aggregated the adapters. If the Link Aggregation remains on this status for longer than one minute, verify that the switch is correctly configured. For instance, you should verify that LACP is enabled on the ports.
Aggregated: LACP has succeeded and the switch has aggregated the adapters together.
Failed: LACP has failed. Some possible causes are that the adapters in the aggregation are set to different line speeds or duplex modes or that they are plugged into different switches. Verify the adapters' configuration. In addition, some switches allow only contiguous ports to be aggregated and may have a limitation on the number of adapters that can be aggregated. Consult the switch documentation to determine any limitations that the switch may have, then verify the switch configuration.

Note:

The Link Aggregation status is a diagnostic value and does not affect the AIX side of the configuration. This status value was derived using a best-effort attempt. To debug any aggregation problems, it is best to verify the switch's configuration.

Interoperability Scenarios

The following table shows several interoperability scenarios. Consider these scenarios when configuring your EtherChannel or IEEE 802.3ad Link Aggregation. Additional explanation of each scenario is given after the table.

Table 5. Different AIX and switch configuration combinations and the results each combination will produce.
EtherChannel mode	Switch configuration	Result
8023ad	IEEE 802.3ad LACP	OK - AIX initiates LACPDUs, which triggers an IEEE 802.3ad Link Aggregation on the switch.
standard or round_robin	EtherChannel	OK - Results in traditional EtherChannel behavior.
8023ad	EtherChannel	OK - Results in traditional EtherChannel behavior. AIX initiates LACPDUs, but the switch ignores them.
standard or round_robin	IEEE 802.3ad LACP	Undesirable - Switch cannot aggregate. The result may be poor performance as the switch moves the MAC address between switch ports

8023ad with IEEE 802.3ad LACP: This is the most common IEEE 802.3ad configuration. The switch can be set to passive or active LACP.
standard or round_robin with EtherChannel: This is the most common EtherChannel configuration.
8023ad with EtherChannel: In this case, AIX will send LACPDUs, but they will go unanswered because the switch is operating as an EtherChannel. However, it will work because the switch will still treat those ports as a single link.

Note:

In this case, the entstat -d command will always report the aggregation is in the Negotiating state.
standard or round_robin with IEEE 802.3ad LACP: This setup is invalid. If the switch is using LACP to create an aggregation, the aggregation will never happen because AIX will never reply to LACPDUs. For this to work correctly, 8023ad should be the mode set on AIX.

For more information visit :- http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.commadmn/doc/commadmndita/etherchannel_intro.htm