This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.
This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point, or a series of ever more difficult versions of this project, and to research the information yourself to solve them.
Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or give you any additional materials. I have too many other projects of my own.
Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.
You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.
Please do not email me about this project without reading the disclaimer above.
This project is really two separate utilities, but since they have to mesh together I discuss them together. You could write either one and it would still be useful on its own. This project has several uses: HTML documents, such as this one, tend to grow fat and unwieldy. At some point they need to be split into smaller documents. Have a look at methods.html. It has been split up, with a menu put on the front to link to the various pieces. Breaking up even one big document manually is at least a day’s work, by the time you get all the minor HTML adjustments done and proofread.
I would like to automate, or at least semi-automate the process. Here is how it would work. You take your monster document and insert magic tags in it showing how you want it split up. Then you run a utility that creates the new files.
There are features to deal with boilerplate, such as standard headers and footers and for customising those standard headers and footers.
The utility also generates a menu for you that can be used to jump to all the different documents created.
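Here is a rough sketch of how the SPLIT side might look in Java. The essay does not fix a syntax for the magic tag, so the spelling <!--! SPLIT TO="swim" --> is purely an assumption for illustration, as are the class and method names. Text ahead of the first SPLIT tag is left out of the fragments, on the assumption that it belongs with the menu page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitSketch {
    // Hypothetical tag syntax: <!--! SPLIT TO="swim" --> starts the fragment swim.html.
    private static final Pattern SPLIT_TAG =
            Pattern.compile("<!--!\\s*SPLIT\\s+TO=\"([^\"]+)\"\\s*-->");

    /** Cuts the mother document at each SPLIT tag; returns fragment name -> fragment text. */
    static Map<String, String> split(String mother) {
        Map<String, String> fragments = new LinkedHashMap<>(); // keeps document order
        Matcher m = SPLIT_TAG.matcher(mother);
        String currentName = null;
        int bodyStart = 0;
        while (m.find()) {
            if (currentName != null) {
                // The previous fragment ends where this tag begins.
                fragments.put(currentName, mother.substring(bodyStart, m.start()));
            }
            currentName = m.group(1);
            bodyStart = m.end();
        }
        if (currentName != null) {
            // The last fragment runs to the end of the document.
            fragments.put(currentName, mother.substring(bodyStart));
        }
        return fragments;
    }

    /** Builds a plain HTML menu linking to every fragment file. */
    static String menu(Map<String, String> fragments) {
        StringBuilder sb = new StringBuilder("<ul>\n");
        for (String name : fragments.keySet()) {
            sb.append("<li><a href=\"").append(name).append(".html\">")
              .append(name).append("</a></li>\n");
        }
        return sb.append("</ul>\n").toString();
    }

    public static void main(String[] args) {
        String mother = "intro<!--! SPLIT TO=\"swim\" -->about swimming"
                + "<!--! SPLIT TO=\"run\" -->about running";
        Map<String, String> parts = split(mother);
        System.out.println(parts.keySet());
        System.out.print(menu(parts));
    }
}
```

A real utility would write each map entry to its own file and wrap it in the standard headers and footers. The sketch stops at the in-memory split so that it is easy to test.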
However, when you propagate boilerplate you want it slightly customised. You can arrange to get it customised by embedding magic comment tags that will be expanded when you run the REFRESH utility.
<!--! THIS -->
The name of the document often appears in the document itself. You might want it left the same to continue to refer to the mother document or you might want it changed to refer instead to the name of the child document. You mark such places with a <!--! THIS -->. The text <!--! THIS --> will be replaced by swim in the child fragment swim.html. This lets you generate strings like http://mindprod.com/swim.html or swim.cnt. The utility looks at the current directory to determine if extra levels of directory need to be included in the generated file name.
<!--! ICON -->
This works just like THIS so you can make references to the fragment’s icon in the standard headers or footers, or even the fragment body.
<!--! MOTHER -->
This works just like THIS so you can make references to the fragment’s mother file in the standard headers or footers, or even the fragment body. The mother file is the file from which this fragment was split off, or the file just above it in the document hierarchy.
<!--! BEGIN REFRESH FROM="fred" -->... <!--! END REFRESH FROM="fred" -->
Your boilerplate will contain THIS and MOTHER tags that will be expanded when the text between BEGIN REFRESH and END REFRESH is replaced. You must be sure these BEGIN REFRESH and END REFRESH tags are still in place after a regeneration so that future regenerations will still work. You could do that by including BEGIN REFRESH and END REFRESH as the first and last thing of every chunk of includable boilerplate as well as wherever you want that code included.
You thus have a choice between boilerplate with embedded REFRESH tags, which can be regenerated, and boilerplate without such tags, which will not be regenerated later, presumably because you intend to hand customise it and don’t want your customisations overwritten.
You might invent a shortcut
<!--! INCLUDE FROM="header2" -->
that acts like a BEGIN REFRESH / END REFRESH pair. You would use it to get the boilerplate included in the first place, when you are not using SPLIT. It would be deleted after processing.
During SPLIT processing, all tags except SPLIT are ignored. They are just treated as ordinary text. They can be expanded/refreshed later with a REFRESH run. The SPLIT process leaves behind a line like this in each file which is useful in expansion of tags:
You can manually edit the INFO tag generated by SPLIT or insert it manually. This enables you to use the REFRESH utility without ever using the SPLIT utility.
THIS, MOTHER and ICON tags that are not in boilerplate text are expanded once and cannot be refreshed, since there is no tag left behind. If you want them refreshable, they must appear in boilerplate text, enclosed in REFRESH tags. In a pinch you could create a boilerplate file to include that had almost nothing in it but a THIS or MOTHER tag. It will be expanded in the context of the file where it finally appears.
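The one-shot expansion of THIS, ICON and MOTHER could be sketched like this. The class name and the idea of handing in the values as a Map keyed by tag name are my assumptions for illustration; in the real utility the values would come from the INFO tag. A tag with no known value is left in place rather than guessed at.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExpander {
    // Matches <!--! THIS -->, <!--! ICON --> and <!--! MOTHER -->.
    private static final Pattern SIMPLE_TAG =
            Pattern.compile("<!--!\\s*(THIS|ICON|MOTHER)\\s*-->");

    /** Replaces each simple tag with its value from info (hypothetical keys THIS, ICON, MOTHER). */
    static String expand(String text, Map<String, String> info) {
        Matcher m = SIMPLE_TAG.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = info.get(m.group(1));
            // No value known for this tag: leave the tag untouched.
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement stops $ and \ in the value being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "http://mindprod.com/<!--! THIS -->.html";
        System.out.println(expand(line, Map.of("THIS", "swim")));
        // prints http://mindprod.com/swim.html
    }
}
```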
Your head is probably hurting by now. The basic problem is how to run REFRESH multiple times. You want to get rid of what was included/expanded earlier, before you re-expand the tags. You need a way of identifying where earlier expansion material started and ended. You also must leave embedded notes around about how to re-expand. For your first cut, only worry about running REFRESH once and leave no trace of the tags. Once you have that working you may feel ready to tackle the problem of running REFRESH multiple times to freshen your boilerplate expansions.
I wrote a pair of split/glue utilities like this to help me manage Pascal source code on the PDP-11 many years ago. Except for fixing NAME links, this project is actually easier to code than to explain.
REFRESH can be handled in a single pass. When you find an INFO tag you remember the information for use in later tag expansion. You discard the INFO tag itself. When you find an INCLUDE tag, you copy in data from the FROM file and recursively process it for embedded tags, including more INCLUDE tags. The only file you change is the one you are processing. You don’t refresh or expand the included boilerplate files themselves. You discard the INCLUDE tag itself. If you see a BEGIN REFRESH, you discard up to and including the corresponding END REFRESH. You may discard some nested BEGIN END pairs in the process! Then you treat the tag like an INCLUDE. Don’t automatically generate new BEGIN END REFRESH pairs. If they are wanted, they will be inside the included text. If you see a THIS, ICON or MOTHER, replace it from the information in the most recent INFO tag. Discard the tag itself. This way refreshable boilerplate can be composed of refreshable boilerplate. When you refresh your documents, the latest and greatest will be recursively refreshed.
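The single pass just described might be sketched in Java as below. This is one possible reading, not a finished design, and several things in it are assumptions. All class and method names are invented. The boilerplate comes from an in-memory Map rather than FROM files. The INFO values are handed in as a Map, because the layout of the INFO tag is not given here. One further rule is my own addition: when a boilerplate chunk is wrapped in its own BEGIN REFRESH / END REFRESH pair, the sketch keeps that pair and refreshes the inside. Without that rule the chunk would include itself forever.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RefreshSketch {
    // Group 1 is the tag keyword, group 2 the optional FROM name.
    private static final Pattern TAG = Pattern.compile(
            "<!--!\\s*(INCLUDE|BEGIN REFRESH|THIS|ICON|MOTHER)(?:\\s+FROM=\"([^\"]+)\")?\\s*-->");

    private final Map<String, String> info;        // assumed keys: THIS, ICON, MOTHER
    private final Map<String, String> boilerplate; // FROM name -> text; stands in for files
    private final Deque<String> active = new ArrayDeque<>(); // boilerplate being expanded now

    RefreshSketch(Map<String, String> info, Map<String, String> boilerplate) {
        this.info = info;
        this.boilerplate = boilerplate;
    }

    /** Returns text with every tag expanded; included boilerplate is processed recursively. */
    String refresh(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(text);
        int pos = 0;
        while (m.find(pos)) {
            out.append(text, pos, m.start()); // copy ordinary text before the tag
            pos = m.end();
            String tag = m.group();
            String keyword = m.group(1);
            String from = m.group(2);
            if (from == null) {
                // THIS, ICON or MOTHER; a tag with no known value is left as is.
                out.append(info.getOrDefault(keyword, tag));
            } else if (keyword.equals("INCLUDE")) {
                out.append(include(from)); // the INCLUDE tag itself is discarded
            } else if (keyword.equals("BEGIN REFRESH")) {
                Matcher end = endTag(from).matcher(text);
                if (!end.find(pos)) {
                    out.append(tag); // no matching END REFRESH: leave the tag as text
                } else if (active.contains(from)) {
                    // The chunk's own wrapper: keep the markers, refresh the inside.
                    out.append(tag)
                       .append(refresh(text.substring(pos, end.start())))
                       .append(end.group());
                    pos = end.end();
                } else {
                    // Discard the stale text through END REFRESH, then act like INCLUDE.
                    out.append(include(from));
                    pos = end.end();
                }
            } else {
                out.append(tag); // e.g. THIS with a stray FROM attribute
            }
        }
        out.append(text, pos, text.length());
        return out.toString();
    }

    private String include(String from) {
        String body = boilerplate.get(from);
        if (body == null) throw new IllegalArgumentException("no boilerplate named " + from);
        if (active.contains(from)) throw new IllegalStateException("include cycle at " + from);
        active.push(from);
        try {
            return refresh(body);
        } finally {
            active.pop();
        }
    }

    private static Pattern endTag(String from) {
        return Pattern.compile(
                "<!--!\\s*END REFRESH\\s+FROM=\"" + Pattern.quote(from) + "\"\\s*-->");
    }
}
```

If the boilerplate named fred begins and ends with its own BEGIN REFRESH / END REFRESH tags, the output still carries those markers, so a second run over the output produces the same text again. Boilerplate without the wrapper is expanded once and leaves no trace.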
As a first cut, you might use Funduc Search and Replace as your scanning and replacing engine. The key is the *[] regular expression marker that will match anything, e.g. the old boilerplate sandwiched between two markers and the binary replace mode that lets you insert arbitrary multiline text.
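If you would rather stay in Java for that first cut, java.util.regex can play the same role. The sketch below is not Funduc’s syntax. It uses a reluctant .*? with Pattern.DOTALL to match whatever sits between the two markers, even across lines, and swaps in new multiline text while keeping the markers. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SandwichReplace {
    /** Replaces the text between the BEGIN REFRESH and END REFRESH markers for name. */
    static String replaceBetween(String text, String name, String fresh) {
        String quoted = Pattern.quote(name); // treat the name literally, not as a regex
        Pattern sandwich = Pattern.compile(
                "(<!--! BEGIN REFRESH FROM=\"" + quoted + "\" -->)"
                + ".*?" // reluctant: stop at the first END marker
                + "(<!--! END REFRESH FROM=\"" + quoted + "\" -->)",
                Pattern.DOTALL); // let . match line breaks too
        // $1 and $2 put the markers back; quoteReplacement protects $ and \ in fresh.
        return sandwich.matcher(text)
                .replaceAll("$1" + Matcher.quoteReplacement(fresh) + "$2");
    }

    public static void main(String[] args) {
        String page = "top\n<!--! BEGIN REFRESH FROM=\"fred\" -->old\nfooter"
                + "<!--! END REFRESH FROM=\"fred\" -->\nbottom";
        System.out.println(replaceBetween(page, "fred", "new\nfooter"));
    }
}
```

This handles only a flat sandwich. It does not expand tags inside the new text and does not cope with nested pairs that share a name.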
I have also done some specific solutions to the splitter problem, to split up large glossaries, making each <dt> its own file.
This page is posted at http://mindprod.com/project/htmlsplitter.html
Please read the feedback from other visitors, or send your own feedback about the site. Contact Roedy. Please feel free to link to this page without explicit permission.