screen scraping : Java Glossary
by Roedy Green ©1996-2008 Canadian Mind Products
screen scraping
Screen scraping runs scripted client software that interacts with legacy green-screen applications, e.g. CICS 3270 terminal apps, and through the scripting returns data to a host component. The host component can then make the data available to non-legacy apps through ODBC, JDBC, etc.

The screen scraper program has to fool the host into thinking it is talking to one of its usual hardware terminals with an operator sitting at it. It must compose queries in the format the usual hardware would produce, then parse the formatted data coming back, extracting the data and leaving the formatting behind.
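The parsing half of that job can be sketched as follows. This is a minimal illustration, assuming the terminal emulation layer has already delivered the screen as an array of text lines and that each field sits at a known, fixed row and column. The class name ScreenFieldExtractor, the Field type and the sample layout are all hypothetical, not part of any real emulator library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: pulls named fields out of a fixed-layout text screen. */
class ScreenFieldExtractor {

    /** Where one field lives on the screen: zero-based row and column, plus width. */
    static final class Field {
        final String name;
        final int row;
        final int col;
        final int width;

        Field(String name, int row, int col, int width) {
            this.name = name;
            this.row = row;
            this.col = col;
            this.width = width;
        }
    }

    /**
     * Extracts each field from the screen lines and trims the padding
     * the host uses to fill out the fixed-width layout.
     * Fields that fall off the end of a short line come back truncated or empty.
     */
    static Map<String, String> extract(String[] screen, Field... fields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Field f : fields) {
            String line = f.row < screen.length ? screen[f.row] : "";
            int start = Math.min(f.col, line.length());
            int end = Math.min(f.col + f.width, line.length());
            result.put(f.name, line.substring(start, end).trim());
        }
        return result;
    }
}
```

With a screen line such as "ACCOUNT: 12345     NAME: SMITH, J", a Field("account", 0, 9, 10) would yield "12345". A real 3270 data stream carries field attributes that locate the fields for you, so a production scraper would more likely read positions from the emulator than hard-code them.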

Before you leap into writing a screen scraper, investigate thoroughly all the possible terminals you might emulate that will work with the existing app. You might find some simpler to emulate than others.

Some of the old terminals had quite complex protocols, e.g. SDLC, so you usually don’t want to write that part from scratch. Look for a third-party library to handle the low-level protocol details.

Screen scraping can also refer to capturing a bitmap image of the screen the program is running on, using java.awt.Robot.createScreenCapture.
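A minimal sketch of such a capture might look like this. The class name ScreenGrabber and its two methods are illustrative. Clipping the requested region to the screen bounds is an added precaution of this sketch, not something Robot requires.

```java
import java.awt.AWTException;
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;

/** Hypothetical helper that grabs part of the primary screen as an image. */
class ScreenGrabber {

    /** Clips the wanted region to the screen so the capture stays on screen. */
    static Rectangle clamp(Rectangle wanted, Rectangle screen) {
        return wanted.intersection(screen);
    }

    /**
     * Captures the given region of the primary screen.
     * Needs a real display: Robot cannot be constructed in a headless JVM.
     * createScreenCapture rejects an empty region, so a region entirely
     * off screen results in an IllegalArgumentException.
     */
    static BufferedImage grab(Rectangle region) throws AWTException {
        if (GraphicsEnvironment.isHeadless()) {
            throw new IllegalStateException("screen capture needs a display");
        }
        Rectangle screen = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        return new Robot().createScreenCapture(clamp(region, screen));
    }
}
```

Calling ScreenGrabber.grab(new Rectangle(0, 0, 640, 480)) would return a BufferedImage of the top-left corner of the screen, ready for pixel-level inspection.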

Converting the pixels back to text is not quite as difficult as you might think. You can do a primitive OCR that just compares clip regions with a cast of prototype characters set in the same font and size, looking for an exact match. You might want to adjust colours to pure black and white before you start. This is quite a bit easier than real OCR, where you have to deal with imprecisely formed characters.
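The exact-match idea can be sketched roughly as below. The class name GlyphMatcher, the brightness threshold of 128 and the '?' returned for an unrecognised clip are assumptions made for illustration. The prototypes would have to be rendered beforehand in the same font and size as the captured text.

```java
import java.awt.image.BufferedImage;
import java.util.Arrays;
import java.util.Map;

/** Hypothetical primitive OCR: exact comparison against prototype glyphs. */
class GlyphMatcher {

    /** Reduces a clip region to pure black and white: true means ink (a dark pixel). */
    static boolean[][] binarize(BufferedImage img, int x, int y, int w, int h) {
        boolean[][] ink = new boolean[h][w];
        for (int row = 0; row < h; row++) {
            for (int col = 0; col < w; col++) {
                int rgb = img.getRGB(x + col, y + row);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Assumed threshold: darker than mid-grey counts as ink.
                ink[row][col] = (r + g + b) / 3 < 128;
            }
        }
        return ink;
    }

    /** Returns the character whose prototype matches the clip exactly, or '?' if none does. */
    static char match(boolean[][] clip, Map<Character, boolean[][]> prototypes) {
        for (Map.Entry<Character, boolean[][]> e : prototypes.entrySet()) {
            if (Arrays.deepEquals(clip, e.getValue())) {
                return e.getKey();
            }
        }
        return '?';
    }
}
```

The linear scan over all prototypes is the simplest thing that works. It is likely fast enough for a small character set, but it compares every pixel of every candidate in the worst case.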

To separate characters you have to look for a vertical strip of white. To rapidly find the matching character you could use several methods:
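As one illustration, the sketch below splits a black-and-white line of text at blank vertical strips and then looks each glyph up in a hash table keyed on its bit pattern. The hash lookup is an assumption of this sketch, not necessarily a method the author had in mind, and the class name GlyphSplitter and its key format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: splits a text line into glyphs and looks each one up. */
class GlyphSplitter {

    /** Returns {startCol, endColExclusive} for each run of columns that contain ink. */
    static List<int[]> splitColumns(boolean[][] ink) {
        List<int[]> spans = new ArrayList<>();
        int width = ink.length == 0 ? 0 : ink[0].length;
        int start = -1;
        // Run one column past the edge so a glyph touching the right border is closed.
        for (int col = 0; col <= width; col++) {
            boolean hasInk = false;
            if (col < width) {
                for (boolean[] row : ink) {
                    if (row[col]) {
                        hasInk = true;
                        break;
                    }
                }
            }
            if (hasInk && start < 0) {
                start = col;
            } else if (!hasInk && start >= 0) {
                spans.add(new int[] {start, col});
                start = -1;
            }
        }
        return spans;
    }

    /** Flattens the columns from..to of every row into a string usable as a hash key. */
    static String key(boolean[][] ink, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (boolean[] row : ink) {
            for (int col = from; col < to; col++) {
                sb.append(row[col] ? '#' : '.');
            }
            sb.append('/');
        }
        return sb.toString();
    }

    /** Reads a whole line; glyphs with no prototype come back as '?'. */
    static String read(boolean[][] ink, Map<String, Character> prototypes) {
        StringBuilder text = new StringBuilder();
        for (int[] span : splitColumns(ink)) {
            Character c = prototypes.get(key(ink, span[0], span[1]));
            text.append(c == null ? '?' : c);
        }
        return text.toString();
    }
}
```

Splitting on blank columns breaks down when a single character contains a vertical gap, such as a double quote, or when adjacent characters touch. A real scraper would need extra handling for those cases.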

Screen scraping also refers to extracting information from web pages. Unless the authors permit reuse, you are violating copyright by doing that. I got in trouble by screen scraping foreign exchange rates off the Oanda site. Even material that looks fair game for reuse, e.g. prices, is not necessarily so. It is a legal minefield. It seems that manually extracting information is considered less sinful than using a program to do it, but you can still get in trouble.

