This page provides information on troubleshooting the Grid protocol implementations in Click. It assumes that you are trying to get two machines to talk to each other, and that you are familiar with network debugging tools like tcpdump.
This section provides some hints on troubleshooting the DSDV protocol. It assumes that you are using the wireless interface eth1 on both machines, and that you are running Linux.
Ping Test
Can you ping between the two machines using the Grid IP addresses (10.0.0.*)? If so, the protocol works and you are done.
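The ping test can be wrapped in a small shell function so it is easy to repeat after each troubleshooting step. This is only a sketch: check_grid_ping is a hypothetical helper name, 10.0.0.2 stands in for the other machine's Grid address, and the PING variable exists only so the command can be swapped for a stub.

```shell
# Send three pings to the peer's Grid address and report the result.
# PING may be set to another command (for example a stub) in place of ping.
check_grid_ping() {
    peer=$1
    if ${PING:-ping} -c 3 "$peer" >/dev/null 2>&1; then
        echo "ok: $peer answers on its Grid address"
    else
        echo "fail: no reply from $peer"
        return 1
    fi
}

# Example (hypothetical address): check_grid_ping 10.0.0.2
```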
DSDV Route Broadcasts
If the ping test doesn't work, check that DSDV protocol packets are travelling between the two machines. DSDV sends periodic route broadcasts that have the ethertype 0x7fff. Become root, and run tcpdump with a command line like this:
% su
% tcpdump -p -n -i eth1 ether proto 0x7fff
You should get output like this for each packet:
10:40:55.347260 0:90:27:e0:23:XX ff:ff:ff:ff:ff:ff 7fff 86: .... 0004 93e0 0000 2c00
The second and third columns of each line are the packet's ethernet source and destination addresses. The destination address of the route broadcasts will be ff:ff:ff:ff:ff:ff (the ethernet broadcast address).
Run tcpdump on both machines. Make sure DSDV is running on each machine (you will see DSDV broadcasts sent by that machine in its own tcpdump output), and check that each machine also sees the other's broadcasts. If you see no packets at all, DSDV is probably not running on either machine, or you are running tcpdump on the wrong interface.
If each machine can only see its own route broadcasts but not those of the other machine, go to the next step to make sure the network interfaces are even able to communicate.
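To see at a glance which machines' broadcasts reach an interface, the tcpdump output can be tallied by ethernet source address. This is a rough sketch that assumes the output format shown above (newer tcpdump versions may print the link-level header differently); count_broadcast_sources is a hypothetical helper name.

```shell
# Read tcpdump output on stdin and print "source-address count" for each
# machine whose 0x7fff route broadcasts were captured.
count_broadcast_sources() {
    awk '$3 == "ff:ff:ff:ff:ff:ff" && $4 == "7fff" { n[$2]++ }
         END { for (mac in n) print mac, n[mac] }' | sort
}

# Example, as root (stops after 20 packets; waits forever if none arrive):
#   tcpdump -p -n -l -i eth1 -c 20 ether proto 0x7fff | count_broadcast_sources
```

Two distinct source addresses mean the interface is hearing both machines; a single address means only one machine's broadcasts are visible there.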
If both machines can see each other's route broadcasts, DSDV should be working. If pings still fail, check for error messages from running Click on the DSDV configuration. At userlevel, look for messages printed by the Click program; for the kernel module, check the output of dmesg.
Basic Connectivity
If Click is running correctly on each machine, but they don't see each other's packets, you should check that the network interfaces on each machine can send packets between themselves without DSDV or Click.
Try to ping between the machines without using Click or DSDV. Stop DSDV and Click, and manually configure the interfaces using ifconfig on each machine, e.g. using the addresses 10.3.0.1 and 10.3.0.2. You'll need to be root:
% su
% ifconfig eth1 10.3.0.X netmask 255.255.255.0
You'll have to do this on both machines; replace the X in the IP address with 1 or 2, depending on which machine you are on. You should be able to ping between the machines using the 10.3.0.* addresses. If not, something is wrong with your network interfaces.
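To avoid typing the wrong address on one of the machines, the X substitution can be left to a small helper that only accepts 1 or 2. This is a sketch; baseline_ifconfig_cmd is a hypothetical name, and it only prints the command so you can inspect it before running it as root.

```shell
# Print the ifconfig command for machine number 1 or 2.
# Nothing is configured here; the command is only written to stdout.
baseline_ifconfig_cmd() {
    case $1 in
        1|2) echo "ifconfig eth1 10.3.0.$1 netmask 255.255.255.0" ;;
        *)   echo "usage: baseline_ifconfig_cmd 1|2" >&2
             return 1 ;;
    esac
}

# Example, as root on the first machine:
#   eval "$(baseline_ifconfig_cmd 1)"
```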
If you are using wireless interfaces, make sure they are in range: move the machines next to each other. Also, make sure each interface has the same wireless settings, using iwconfig:
% iwconfig eth1
eth1      IEEE 802.11-DS  ESSID:"my-grid-net"
          Mode:Ad-Hoc  Channel:2  Cell: CA:00:75:01:54:02
          Bit Rate=1Mb/s   Tx-Power=0 dBm (1 mW)   Sensitivity=0/65535
          Retry limit:16   RTS thr=2300 B   Fragment thr:off
          Power Management:off
          Link Quality:9/255  Signal level:-79 dBm  Noise level:-96 dBm
          Rx invalid nwid:11462022  Rx invalid crypt:0  Rx invalid frag:1
          Tx excessive retries:30984  Invalid misc:10891560   Missed beacon:0
Check that the interfaces on both machines are in Ad-Hoc mode, have the same ESSID (or Network Name), are using the same channel, and agree on which Cell they are using. These are the Mode, ESSID, Channel, and Cell fields in the iwconfig output above.
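One way to compare the settings is to reduce each machine's iwconfig output to the four fields that must agree, and then compare the results by eye or with diff. The sketch below assumes the output format shown above; wireless_settings is a hypothetical helper name, and other drivers or wireless-tools versions may label these fields differently.

```shell
# Read iwconfig output on stdin and print the ESSID, Mode, Channel and
# Cell fields, one per line, in the order they appear.
wireless_settings() {
    grep -Eo 'ESSID:"[^"]*"|Mode:[^ ]+|Channel:[^ ]+|Cell: *[0-9A-Fa-f:]+' |
        sed 's/^Cell: */Cell:/'   # drop the space iwconfig prints after "Cell:"
}

# Example, on each machine:
#   iwconfig eth1 | wireless_settings
```

The output should be identical on both machines; any line that differs points at the setting to fix.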
If you are using wired interfaces, make sure both machines are plugged into the same ethernet segment (same hub or switch, or physical cable if using coax).
This section provides some hints on troubleshooting the DSR protocol. It's not written yet.
If you're still stuck after following our instructions, you can email grid-hackers@pdos.lcs.mit.edu with a detailed description of the problem, including the troubleshooting steps that you tried.
Last modified: $Date: 2003/08/18 15:28:47 $