Saturday, December 11, 2004

How to use Java to connect with HTTP servers outside your corporate firewall

Summary
This tip will show you how to write Java applications that can get past your corporate proxy and access Web servers on the Internet. Adding proxy support to your Java applications involves writing just a few additional lines of code and doesn't rely on any security "loopholes."

Almost every company is concerned with protecting its internal network from hackers and thieves. One common security measure is to completely disconnect the corporate network from the Internet. If the bad guys can't connect to any of your machines, they can't hack into them. The unfortunate side effect of this tactic is that internal users can't access external Internet servers, like Yahoo or JavaWorld. To address this problem, network administrators often install something called a "proxy server." Essentially, a proxy is a service that sits between the Internet and the internal network and manages connections between the two worlds. Proxies help reduce outside security threats while still allowing internal users to access Internet services. While Java makes it easy to write Internet clients, these clients are useless unless they can get past your proxy. Fortunately, Java makes it easy to work with proxies -- if you know the magic words, that is.

The secret to combining Java and proxies lies in activating certain system properties in the Java runtime. These properties appear to be undocumented, and are whispered between programmers as part of the Java folklore. In order to work with a proxy, your Java application needs to specify information about the proxy itself as well as specify user information for authentication purposes. In your program, before you begin to work with any Internet protocols, you'll need to add the following lines:

System.getProperties().put( "proxySet", "true" );
System.getProperties().put( "proxyHost", "myProxyMachineName" );
System.getProperties().put( "proxyPort", "85" );

The first line above tells Java that you'll be using a proxy for your connections, the second line specifies the machine that the proxy lives on, and the third line indicates what port the proxy is listening on. Some proxies require a user to type in a username and password before Internet access is granted. You've probably encountered this behavior if you use a Web browser behind a firewall. Here's how to perform the authentication:

URLConnection connection = url.openConnection();
String password = "username:password";
String encodedPassword = base64Encode( password );
connection.setRequestProperty( "Proxy-Authorization", encodedPassword );

The idea behind the above code fragment is that you must adjust your HTTP header to send out your user information. This is achieved with the setRequestProperty() call. This method allows you to manipulate the HTTP headers before the request is sent out. HTTP requires the user name and password to be base64 encoded. Luckily, there are a couple of public domain APIs that will perform the encoding for you (see the Resources section).
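
If you'd rather not pull in a third-party encoder, newer JDKs (Java 8 and up) ship java.util.Base64, which can build the header value directly. A minimal sketch, assuming such a runtime (note the "Basic " prefix, which Dylan Walsh's follow-up tip below explains is required):

// java.util.Base64 is part of the JDK since Java 8 -- no third-party API needed.
String password = "username:password";
String encodedPassword = "Basic " + java.util.Base64.getEncoder().encodeToString( password.getBytes() );
connection.setRequestProperty( "Proxy-Authorization", encodedPassword );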

As you can see, there's not a whole lot to adding proxy support to your Java application. Given what you now know, and a little research (you'll have to find out how your proxy handles the protocol you're interested in and how to deal with user authentication), you can add proxy support for other protocols as well.
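
To tie it together for HTTP, here is a minimal end-to-end sketch. The proxy name and port are the placeholder values used above; note that on newer JVMs the documented property names are http.proxyHost and http.proxyPort rather than the bare proxyHost/proxyPort, so the sketch sets both:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class ProxyFetch {
    public static void main( String[] args ) throws Exception {
        // Legacy (JDK 1.1-era) property names used in this tip...
        System.getProperties().put( "proxySet", "true" );
        System.getProperties().put( "proxyHost", "myProxyMachineName" );
        System.getProperties().put( "proxyPort", "85" );
        // ...and the names documented for newer JVMs.
        System.setProperty( "http.proxyHost", "myProxyMachineName" );
        System.setProperty( "http.proxyPort", "85" );

        URL url = new URL( "http://www.javaworld.com/" );
        URLConnection connection = url.openConnection();

        // Read the response through the proxy.
        BufferedReader in = new BufferedReader(
                new InputStreamReader( connection.getInputStream() ) );
        String line;
        while ( ( line = in.readLine() ) != null ) {
            System.out.println( line );
        }
        in.close();
    }
}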

Proxying FTP
Scott D. Taylor sent in the magic incantation to deal with proxying the FTP protocol:

defaultProperties.put( "ftpProxySet", "true" );
defaultProperties.put( "ftpProxyHost", "proxy-host-name" );
defaultProperties.put( "ftpProxyPort", "85" );

You can then access file URLs using the "ftp" protocol via something like:

URL url = new URL("ftp://ftp.netscape.com/pub/navigator/3.04/windows/readme.txt" );
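
Putting Scott's incantation together with an actual read, a minimal sketch might look like the following (it assumes the same placeholder proxy name and port, and that defaultProperties above refers to the system properties; whether your proxy passes FTP at all is site-specific):

// Route "ftp:" URLs through the proxy, using the property names Scott reported.
System.getProperties().put( "ftpProxySet", "true" );
System.getProperties().put( "ftpProxyHost", "proxy-host-name" );
System.getProperties().put( "ftpProxyPort", "85" );

URL url = new URL( "ftp://ftp.netscape.com/pub/navigator/3.04/windows/readme.txt" );
BufferedReader in = new BufferedReader( new InputStreamReader( url.openStream() ) );
String line;
while ( ( line = in.readLine() ) != null ) {
    System.out.println( line );
}
in.close();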

If anybody has examples of using a proxy with other Internet protocols, I'd love to see them.

Note: This has only been tested with JDK 1.1.4.

Follow-up Tips!

from Marla Bonar:

For those still using JDK 1.1.7 (with WebSphere 3.0), setting the system properties for proxyHost and proxyPort does not work; conn.getInputStream() either fails with "Connection timed out" or "No route to host". However, I was able to work around this problem by using the URL constructor that takes the host and port as parameters (using my proxy host and port):

public URL(String protocol, String host, int port, String file).
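
If I read the workaround correctly, the idea is to point the URL object at the proxy itself and pass the full target address as the file argument, so the request line carries the absolute URL. A hedged sketch (proxy name, port, and target are placeholders):

// Aim the connection at the proxy; the "file" part is the complete target URL.
URL url = new URL( "http", "myProxyMachineName", 85, "http://java.sun.com/" );
URLConnection conn = url.openConnection();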

from Dylan Walsh:

The method for providing authentication via username and password does not work. "Basic " should be at the start of the authentication string; for example:

String encodedPassword = base64Encode( password );

should be:

String encodedPassword = "Basic " + base64Encode( password );

Also, you don't need a separate program to do the base64 encoding. You can use the sun.misc.BASE64Encoder class instead. Here is what the code looks like with both changes:

System.getProperties().put("proxySet", "true");
System.getProperties().put("proxyHost", proxyHost);
System.getProperties().put("proxyPort", proxyPort);
String authString = "userid:password";
String auth = "Basic " + new sun.misc.BASE64Encoder
().encode(authString.getBytes());
URL url = new URL("http://java.sun.com/");
URLConnection conn = url.openConnection();
conn.setRequestProperty("Proxy-Authorization", auth);

from Marcel Oerlemans:

Here's how you use SOCKS 4 proxy servers:

System.getProperties().put( "socksProxySet", "true" );
System.getProperties().put( "socksProxyHost", proxyHostName );
System.getProperties().put( "socksProxyPort", proxyPort );

Usually the proxy port for SOCKS 4 is 1080.

You can then make your connection through the SOCKS 4 proxy.
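
With those properties set, ordinary java.net.Socket connections should be tunneled through the SOCKS server by the JVM's default socket implementation. A minimal sketch (the target host and port are just placeholders):

// The JVM consults socksProxyHost/socksProxyPort when opening plain sockets.
Socket socket = new Socket( "ftp.netscape.com", 21 );
BufferedReader in = new BufferedReader( new InputStreamReader( socket.getInputStream() ) );
System.out.println( in.readLine() );   // print the server's greeting line
socket.close();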

Resources

Monday, December 06, 2004

E1 Technology

Q: What exactly is E1? What does it mean, and how does it work? Please explain briefly but completely.
A: E1 is the European equivalent to the American T1. Although both E1 and T1 use 64 kbps channels, they differ in many aspects. E1 is a point-to-point, dedicated, 2.048 Mbps communications circuit that carries 32 channels contrasted with T1's 24 channels. Of these 32 channels, 30 channels transmit voice and data. Unlike T1, E1 always provides clear channel 64 kbps channels.
Of the two remaining channels, one uses time slot 16 and is used for signaling and carrying line supervision (such as whether the telephones are on-hook or off-hook). The other remaining channel uses time slot 0, and is used for synchronization, channel control, and framing control.
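To put numbers on it: 32 timeslots x 64 kbps = 2,048 kbps = 2.048 Mbps, of which the 30 traffic-bearing channels provide 30 x 64 kbps = 1,920 kbps; by comparison, a T1 is 24 x 64 kbps = 1,536 kbps plus 8 kbps of framing, for 1.544 Mbps.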
There are two options for the physical media:
* 120 ohm twisted pair cabling, typically foil shielded. This is called a balanced interface and uses a DB-15 or 8-pin modular connector.
* 75 ohm coaxial cable. This is called an unbalanced interface because the two conductors do not have an equal impedance to ground, and uses a BNC connector.

Sunday, December 05, 2004

Supernetting

...Today I've decided to talk about "Supernetting". If you are working with BGP you may need to read this; even experts may want to review the basics.

Supernetting (also known as CIDR) allows the use of multiple IP networks on the same interface. It is the reverse of subnetting, which allows the use of a single IP network on multiple interfaces.
Officially, supernetting is the term used when multiple network addresses of the same Class are combined into blocks. If the IP networks are contiguous, you may be able to use a supernet. If the IP networks are not contiguous, you would need to use sub-interfaces. These are not currently supported on Compatible Systems routers but are supported on routers from Cisco Systems.
A prerequisite for supernetting is that the network addresses be consecutive and that they fall on the correct boundaries. To combine two Class C networks, the first address' third octet must be evenly divisible by 2. If you would like to supernet 8 networks, the mask would be 255.255.248.0 and the first address' third octet needs to be evenly divisible by 8. For example, 198.41.15.0 and 198.41.16.0 could NOT be combined into a supernet, but you would be able to combine 198.41.18.0 and 198.41.19.0 into a supernet.
An IP address is a 32-bit number (4 bytes, called "octets", separated by periods, commonly called "dots"). Supernetting is most often used to combine Class C addresses (the first octet has values from 192 through 223). A single Class C IP network has 24 bits for the network portion of the IP address, and 8 bits for the host portion of the IP address. This gives 256 addresses within a Class C IP network (2^8 = 256).
The subnet mask for a Class C IP network is normally 255.255.255.0. To use a supernet, the number of bits used for the subnet mask is REDUCED. For example, by using a 23 bit mask (255.255.254.0 -- 23 bits for the network portion of the IP network, and 9 bits for the host portion), you effectively create a single IP network with 512 addresses. Supernetting, or combining blocks of IP networks, is the basis for most routing protocols currently used on the Internet.
For Example: Two Class "C" network numbers of 198.41.78.0 and 198.41.79.0
The addresses pass the prerequisites. They are consecutive and the third octet of the first address is divisible by 2 (78 Mod 2 = 0). To further illustrate what is being done, let's look at the addresses in binary. The third octet of the first address (78) is 01001110. The second (79) is 01001111. The binaries are the same except for the last bit of the address (the 24th bit of the IP address). The 78 network is supernet 0 and the 79 network is supernet 1.
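As an illustration only, here is a small Java sketch of the same check (the class name and hard-coded values are made up for this example): it verifies the alignment rule and derives the supernet mask, network number, and broadcast address for a block of consecutive Class C networks.

public class SupernetCheck {
    public static void main( String[] args ) {
        int firstThirdOctet = 78;   // third octet of 198.41.78.0
        int blockSize = 2;          // number of consecutive Class C networks (a power of two)

        // Alignment rule: the first network's third octet must be evenly divisible by the block size.
        if ( firstThirdOctet % blockSize != 0 ) {
            System.out.println( "These networks cannot be combined into a supernet." );
            return;
        }

        // Borrow log2(blockSize) bits from the 24-bit Class C network portion.
        int borrowedBits = Integer.numberOfTrailingZeros( blockSize );   // 2 -> 1 bit, 8 -> 3 bits
        int maskBits = 24 - borrowedBits;                                // e.g. 23 for two networks
        int thirdOctetMask = ( 0xFF << borrowedBits ) & 0xFF;            // e.g. 254 for a /23

        System.out.println( "Supernet mask:     255.255." + thirdOctetMask + ".0 (/" + maskBits + ")" );
        System.out.println( "Network number:    198.41." + firstThirdOctet + ".0" );
        System.out.println( "Broadcast address: 198.41." + ( firstThirdOctet + blockSize - 1 ) + ".255" );
    }
}
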
The subnet mask for this example supernet is 23 bits, or 255.255.254.0. ALL devices on the network MUST be using this subnet mask. Any device that is not using this subnet mask would be unreachable.
The broadcast address for ALL devices on the example supernet is 198.41.79.255. Most modern devices don't require you to fill out the broadcast address, as it can be deduced from the IP address and the subnet mask. The broadcast address is used as a special destination signifying ALL hosts on the network.
As with any IP network, the first number in the range (.0 in a class "C") has special significance, and can't be assigned to any hosts on the network. The first number in the range is referred to as the "network number". Conversely, the last, or highest number in the range (.255 in a class "C") is called the broadcast address, and also can't be used by any host on the network.
Because of these unique addresses, it would probably be wise not to use the 198.41.78.255 and 198.41.79.0 addresses (in the above example), even though these SHOULD be perfectly legal addresses for hosts when using a supernet.
There is one additional prerequisite for supernetting: you MUST EITHER be running static routing EVERYWHERE or be using a classless routing protocol such as RIPv2 (or OSPF), which carries subnet mask information and can pass supernetting information. Standard RIP does not transmit subnet mask information.
If you are using Compatible Systems Routers then you should check that you are running a router ROM version later than 3.0.7 to have the supernetting feature fully implemented.

Thursday, December 02, 2004

Part II: Linux Boot Optimization: Results

Simply replace init with a shell script based on /etc/rc.d/rc.sysinit and tune it up from there.
The results are pretty good, I think; here is the general timeline, measured
with a wall clock:

00: exit grub; start booting the kernel
04: kernel prints audit()
11: initrd is mounted; Red Hat nash visible
mount / ro (normal initrd procedure)
13: start bootchart logging; start readahead of approx 193MB files
sleep until readahead is complete
24: readahead done; now
create /dev and modprobe (in background)
mount / rw, enable swap
start xfs
startx as user davidz in background
start messagebus
start hald
start acpid
start NetworkManager
32: X claims the display
34: GNOME desktop banner
40: GNOME desktop is usable (Nautilus desktop, panel fully populated)

Here is a bootchart made with the bootchart software from Ziga Mahkovec:

http://people.redhat.com/davidz/bootchart.png

Thanks to David; as I said before, I'm quoting from his write-up here:

"You may notice that you can also start firefox after login and it starts very
very fast that's because readahead loads all files used by Firefox
in earlier experiments. I've also added files from OpenOffice.org to
readahead and that meant I could start up OpenOffice.org Writer in about
three seconds. More below.

I've made the following observations:

1. The kernel patch, linux-2.6.3-printopen.patch, wasn't really working
well for me - it reported far too few files - instead I added a
printk() to fs/namei.c:link_path_walk()
(disclaimer: I don't know much about the kernel so there may be a
better solution than this).

2. The data captured from link_path_walk() was massaged into a list
of unique files to preload and sorted on sectors.

3. While capturing the data from link_path_walk() and before processing it,
I went through all the menus in the GNOME desktop (to make sure
their icon and desktop files would be added to the list) as well as
loading Firefox. The list contains 5189 unique files - 231 of these
from my home directory - 103 of these from gconf in my home
directory and 302 from gconf in /etc. 2267 were .png files and
814 of them were .desktop files. 488 files had ".so" in their name.
There was a total of 193MB of files (which says something about
the footprint of the GNOME desktop on Fedora :-/)

4. Doing the readahead really helped the time from startx till a
usable desktop - less than 10 seconds!

5. Doing readahead on the 5189 files took about 45 seconds on my
system, mostly because the files were scattered around the disk.
Since I had a spare 17GB partition, I did this:
a. format spare partition as ext3
b. copy all readahead files to spare partition (193MB)
c. copy rest of files from main partition to spare partition
(about 9GB)
Now the readahead is down to 11 seconds which averages out to
be 18MB/s. On the other hand, I can still see (using fileblock)
that the files in the readahead set are still scattered around and hdparm
says I should be able to get 33.87 MB/sec with no seeks.

6. I made a hack to cache /dev (a dev.tar file) and the list of modules
to load. This could be used in production if the kernel could give
us basically a hash value for the kobject hierarchy representing
the hardware (perhaps even a 'tree /sys |md5sum' would suffice).
This shaved some seconds off as well.

7. A number of things were started in parallel - I found that doing
readahead while running modprobe wasn't helping anything; in fact
it contributed negatively to performance (a bit to my surprise, I
guess because the kernel was busy).

8. readahead on the right files is A Good Thing(tm). Booting my system
without readahead on the partition with the readahead files scattered
took 58 seconds (compared to 39 with readahead on the optimized
partition)

http://people.redhat.com/davidz/bootchart-without-readahead-scattered.png

and without readahead on the optimized partition it took 43
seconds

http://people.redhat.com/davidz/bootchart-without-readahead-nonscattered.png

again compared to 39 seconds. As an added bonus, the readahead
makes sure that e.g. Firefox loads fast; all .png and .desktop files
are in place when using the menus. As mentioned, one could put
very big apps like e.g. OO.o in the readahead set.

So, I think these numbers are good and there's still some room for
improvement; e.g. it takes ten seconds from grub to when the initrd is
mounted - surely the kernel can boot faster? It's after all 25% of the
time spent from grub until I have a usable desktop.

The bad thing is that this approach is highly specific to my system (and
thus why I'm not posting an RPM with it :-), however I think it clearly
shows where improvements should be made; here are some random thoughts

a. We should keep track of files being loaded and maintain the
readahead fileset as appropriate. printk() doesn't seem like the
right solution; perhaps a system daemon using inotify or the
kernel events layer is the road ahead? This would enable us to
readahead the KDE stuff if the user is e.g. using KDE a lot.

b. ext3 should support operations for moving blocks around; e.g.
optimize around the readahead fileset - when idle the system should
rearrange the files to facilitate faster booting

c. the start_udev and kmodule process could be cached as I did above

d. The whole init(1) procedure seems dated; perhaps something more
modern built on top of D-BUS is the right choice - SystemServices
by Seth Nickell comes to mind [1]. Ideally services to be started
would have dependencies such as 1) don't start the gdm service
before /usr/bin/gdm is available; 2) the SSH service would only
be active when NetworkManager says there is a network connection;
/usr from LABEL=/usr would only be mounted when there is a volume
with that label and so forth. Also, such a system would of course
have support for LSB init scripts.
(This is probably a whole project on its own so I'm omitting
detailed thinking on it for now)
"
Thanks a lot to Ziga Mahkovec for the bootchart software - it's been
very useful.

Part I: Linux Boot Optimization: Introduction & Implementation Thoughts

... Somehow I got to thinking about the boot process on Windows and Linux, its timing and payload, and about comparing their "boot time" with the time until the OS is ready to respond to a request from the network, the console, and so on. Thanks to David Zeuthen & Owen Taylor; after all the tests there were many useful results which I would like to summarize briefly.

Currently, the time to boot the Linux desktop from the point where the power switch is turned on, to the point where the user can start doing work is roughly two minutes.
During that time, there are basically three resources being used: the hard disk, the CPU, and the natural latency of external systems - the time it takes a monitor to respond to a DDC probe, the time it takes for the system to get an IP via DHCP, and so forth.
Ideally, system boot would involve a 3-4 second sequential read of around 100 megabytes of data from the hard disk, CPU utilization would be parallelized with that, and all queries on external systems would be asynchronous ... startup continues and once the external system responds, the system state is updated. Plausibly the user could start work in under 10 seconds on this ideal system.
The challenge is to create a single poster showing graphically what is going on during the boot, what is the utilization of resources, how the current boot differs from the ideal world of 100% disk and CPU utilization, and thus, where are the opportunities for optimization.
So I had a brief look at shortening startup/login time and tried
disabling rhgb in favor of starting gdm early. It looks pretty
promising; here are some wall-clock numbers from two runs of each
configuration:

                      |  gdm_early  |  rhgb+gdm   |
----------------------+------+------+------+------+
GRUB timeout          | 0:00 | 0:00 | 0:00 | 0:00 |
Starting udev         | 0:13 | 0:13 | 0:13 | 0:14 |
HW init done          | 0:25 | 0:25 | 0:26 | 0:26 |
rhgb visible          | N/A  | N/A  | 0:36 | 0:35 |
gdm login visible     | 0:43 | 0:44 | 1:25 | 1:26 |
gdm login entered     | 0:52 | 0:52 | 1:31 | 1:32 |
GNOME banner visible  | 1:13 | 1:14 | 1:40 | 1:41 |
Nautilus background   | 1:33 | 1:32 | 1:51 | 1:52 |
Panel visible         | 1:43 | 1:43 | 2:02 | 2:02 |
HD activity off       | 1:59 | 1:56 | 2:13 | 2:14 |

The milestones should be pretty self-evident. This is on a stock FC3
system running on an IBM T41 1.6GHz (on AC power) with 512MB RAM,
without any services manually disabled.

In addition to starting gdm early, the modifications also start up a few
services - D-BUS, HAL and NetworkManager - that are critical to the GNOME
desktop.

Some random thoughts/observations:

- We get the gdm window 40 secs faster

- The 12 secs from "Starting udev" to "HW init done" can be mostly
shaved away/run in parallel

- Kernel bootstrap time (13 secs) can probably be much shorter
(that's what some kernel guys say anyway)

- With this hack we shave twenty secs off the booting time (i.e. from
GRUB until you can use your PC) but booting still feels much quicker
because of the interaction with gdm in the middle (YMMV; e.g. placebo
effect etc.)

- rhgb and gdm each spawn an X server, which is sort of stupid and unsafe
(or so some Xorg guys tell me). This solution, by design, avoids
doing that

- we don't get the kudzu screen, the fsck screens, or any other
console interactions. However, IMHO, such screens are not good UI
in the first place - we should instead have GUI replacements that
possibly notify you when you log into the desktop session (stuff
like NetworkManager and HAL alleviates such problems for networking
and storage devices)

- we don't get service startup notification, but, uhmm, is it really
useful learning that the "Console Mouse Service" or "Printing Sub-
system" have started? Instead, this stuff could just be put in gdm

- it could be interesting to make /sbin/init own a D-BUS service that
gdm and other stuff can query and interact with. Could also be fun
to completely replace it with something à la the SystemServices
prototype that Seth did last year; links:

http://www.osnews.com/story.php?news_id=4711
http://www.gnome.org/~seth/blog/2003/Sep/27

- Could be interesting to instrument the kernel with some pagefault
counters etc. and attempt to do more readahead on e.g. the GNOME libs
(both Windows XP and Mac OS X do all that; I think we do too but
I've been told it can be improved)

So, anyway, I think it could be interesting to discuss starting gdm
instead of rhgb. If you want to try out my crude hack, grab the file
here

http://people.redhat.com/davidz/newinit.sh

put it on your system as /newinit.sh, chmod a+x it, and change this
line in /etc/inittab

si::sysinit:/etc/rc.d/rc.sysinit

to these two lines

#si::sysinit:/etc/rc.d/rc.sysinit
si::sysinit:/newinit.sh

and you should be set to go! If it breaks you get to keep both pieces;
i.e. try this at your own risk [1].

Tuesday, November 30, 2004

IE "Save Picture As..." Image Download Spoofing

A vulnerability in Microsoft Internet Explorer has been discovered which can be exploited by malicious people to trick users into downloading malicious files. The vulnerability is caused by Internet Explorer using the file extension from the URL's filename when saving images with the "Save Picture As..." command, and by stripping the last file extension if multiple file extensions exist. This can be exploited by a malicious web site to cause a valid image with malicious, embedded script code to be saved with an arbitrary file extension. Successful exploitation may allow a malicious web site to trick users into downloading e.g. a malicious HTML Application (.hta) masqueraded as a valid image. However, exploitation requires that the "Hide extensions for known file types" option is enabled (the default setting). The vulnerability has been confirmed on a fully patched system with Internet Explorer 6.0 and Microsoft Windows XP SP2.
Solution: Disable the "Hide extensions for known file types" option.