Simple Text Extract HTML (Internet HTML at Davar Web Site)


A simple text extract is one possible way to present a short text with a simple paragraph structure. Once I had developed, after multiple trial-and-error iterations, a structure that I found satisfactory, I tried to use it consistently throughout the entire web site. My first experiments were inevitably manual; later I wrote a program in PowerBASIC to convert scanned text into the HTML paragraph structure of a simple extract.

While each text extract is individual and for the most part requires some adjustment after automatic conversion, the basic principles of simple text extract presentation are the same for the format that I'm using. Some of them, such as the overall HTML layout, use of indentation, navigation bars, font usage, background setup, and the HTML HEAD structure, are common to almost all pages of this web site. These principles can be summarized as follows:

The text starts with a 3-line descriptive title (3 separate lines of HTML comment). This title is an internal text description that helps me when I'm switching in my QuickEdit session between a dozen texts (the way I'm doing it now while my train slowly approaches Hoboken). The internal title also helps me to handle the multitude of files comprising this web site (about 800 as of October 2000). I use ZTreeWin (a 32-bit shareware reincarnation of XTreeGold) to manage my web site files, and the power of 4DOS scripts to handle incremental uploads (via good old FTP). This choice of tools also calls for internal descriptive titles, which facilitate management of a large number of small files.

Title consists of an underlined title text, text file URL, and three dates:
- Text creation date
- Text last modification date
- Text content modification date
Modification dates are maintained manually when the text is changed (not often enough to become a burden). The second date reflects any change to the text, while the third date reflects content changes only and drives an automatic system that colors dates in various site navigation lists to highlight the most recently modified pages.
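As an illustration of how the content modification date could drive such coloring, here is a minimal JavaScript sketch. It is not the site's actual script: the age thresholds, the color names, and the function names `dateColor` and `coloredDate` are all assumptions made for this example.

```javascript
// Hypothetical thresholds and colors; the site's real values are not documented here.
const AGE_COLORS = [
  { maxDays: 30, color: "Red" },    // changed within the last month
  { maxDays: 90, color: "Maroon" }, // changed within the last three months
];
const DEFAULT_COLOR = "Black";
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Pick a color from the age (in whole days) of the content modification date.
function dateColor(contentDate, today) {
  const ageDays = Math.floor((today - contentDate) / MS_PER_DAY);
  for (const { maxDays, color } of AGE_COLORS) {
    if (ageDays <= maxDays) return color;
  }
  return DEFAULT_COLOR;
}

// Wrap an ISO date string (YYYY-MM-DD) in a FONT tag of the chosen color.
function coloredDate(isoDate, today) {
  const color = dateColor(new Date(isoDate), today);
  return "<FONT COLOR=" + color + ">" + isoDate + "</FONT>";
}
```

A navigation list generator could call `coloredDate` once per page entry, so that a single manually maintained date controls the highlighting everywhere the page is listed.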
TITLE in the HTML HEAD is displayed by the browser in the top bar of its window. Here it consists of the extract title followed by a reference to the section of the web site and the name of the site. TITLE is also what a search engine is most likely to display for your page when presenting it to a person browsing the search results. The title should be kept short and as precise as possible.
The LINK line that immediately follows the TITLE line specifies the site FavIcon image for the page (it's the same throughout the entire site).
The CharSet specification here matches the standard default for Windows. However, for a browser (Netscape Navigator at least) to interpret special symbols correctly, the CharSet should be specified explicitly. E.g., the symbol § is displayed correctly because the text you are looking at now has an explicit CharSet specification (to check this, hit [Ctrl]+[U] and see the HTML source line just below the TITLE line). Without an explicit CharSet specification the browser might fail to interpret special symbols, showing question marks instead. Another reason to have a CharSet specification in every HTML source is structural consistency with the Russian section of my web site, where a Cyrillic character set specification is a must.
The HEAD elements that follow provide information for search engine spiders. Good search engines retrieve this information themselves and ask you only for a URL when you register your site with them (less than good engines ask you to enter all or part of that data manually).
- Description is most likely to be displayed right after the title (whatever will fit into 1–2 lines) when your page gets found by the search engine. Description can be viewed as an abstract that presents the content of the individual page to the person browsing the search results. Description should be kept reasonably short and preferably as precise as possible. Since my site is a mixture of many things, I split my Description into Page: and Site: parts, so that, if Page: part is small, some of the Site: sections' list might get displayed.
- Keywords are used for the search itself: the search engine looks for the closest match between the requested set of keywords and the keywords from the page (retrieved by a spider and saved in the search engine database). A careful choice of keywords is very important. Ideally they should form a distinctive key that closely associates the page content with the intended target of a potential searcher. The following keywords reflect, IMHO, the content of the extract presented:
  - Davar Web Site — The unique keyword that I use for technical purposes — it enables a quick and simple check whether my site is in the database of a search engine.
  - Programming — The main topic of the presented extract.
  - Frederic Brooks — The author of the extract whose name seems to be forgotten by the "Brave New World" of PCs, but who is well known to any mainframe old-timer.
  - Mythical Man-Month — His book about software development, management of a development process, and other related topics.
  On the contrary, the word Joy, for example, though it is used several times in the text, including the text title, would be a wrong choice, because it would place the title and URL of this page in an avalanche of search results that have nothing to do with programming. The page would be lost anyway, since there are so many other "joys" that are much more popular than that of programming.
  
  I didn't use the word Mainframe either. Though the book was written by the manager of one of the largest mainframe projects ever, and to a great extent describes that project, the presented extract is about the craft of programming in general and makes no specific references to mainframes. The same applies to the entire book: while it is inevitably based on mainframe material, it is greatly generalized, and most of its insightful ideas can be used successfully as guidelines for development on any computer architecture and in any software environment.
  
  Note: Some search engines ignore Keywords altogether and try to derive them from the text of the page. The reason is that Keywords can easily be abused by the author of a web page: to get more traffic, an author might put in Keywords that don't reflect the content of the page. In the "best" traditions of our "politically correct" age, in order to prevent abuse by some, those search engines prohibit use by everybody. IMHO, it would be better (and not much trouble) to check whether the Keywords match the text and, if not, derive them from the text (better yet, reject the submission altogether, to punish only the guilty instead of punishing everybody for somebody's potential fraud attempt). I still believe in common sense, and do my best to put out Keywords that reflect the content of my pages (as shown above, that's not always straightforward).
- Next line identifies the author of this HTML page.
- Last block of lines of the HEAD is style information.
  - A: HOVER enables link highlighting by specifying attributes (foreground and background colors) for a link when mouse cursor moves over it (see Internet Code Patterns for more details).
  - P sets paragraph style with first line indentation of 50 pixels, and line justification within paragraphs.
The BODY line starts the HTML text body and sets the background, foreground, and hyperlink colors. There are browser defaults for all of them, but I prefer not to rely on these defaults, because I simply don't know what they are for each individual browser that is used to view my page. At the same time, colors are an essential part of page design (the user can still, if he/she wishes, override any document colors with the browser's own defaults by making the appropriate choices in the browser parameters).

The background color gets overridden by a background tiling image (the color is still specified in case the image is not found due to some mistake in site maintenance). If JavaScript is enabled (the standard default), the background image is selected at random from a pool of 9 backgrounds each time the page is initially retrieved or reloaded. If JavaScript is disabled, the browser uses a fixed background.
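The random selection could look roughly like the JavaScript sketch below. This is an illustration under stated assumptions, not the script actually used on the site: only PAPER001.JPG appears in the page source, so the names PAPER002.JPG through PAPER009.JPG are guesses, and the helper names `backgroundName`, `pickBackground`, and `bodyTag` are invented for the example.

```javascript
// Assumed naming scheme: PAPER001.JPG .. PAPER009.JPG (only 001 is confirmed).
function backgroundName(index) {
  return "PAPER" + String(index).padStart(3, "0") + ".JPG";
}

// Choose one of `count` backgrounds; `random` can be replaced for testing.
function pickBackground(count, random = Math.random) {
  const index = 1 + Math.floor(random() * count); // 1..count
  return backgroundName(index);
}

// Build a BODY tag with the same color attributes as the NOSCRIPT fallback.
function bodyTag(background) {
  return '<BODY BACKGROUND="../../' + background + '" BGCOLOR=White TEXT=Black' +
         " LINK=Blue ALINK=Fuchsia VLINK=Purple>";
}

// In a browser, emit the BODY tag while the page is still being parsed.
if (typeof document !== "undefined") {
  document.write(bodyTag(pickBackground(9)));
}
```

Because the choice is made each time the script runs, every initial retrieval or reload can produce a different background, while the NOSCRIPT branch keeps a fixed one for browsers with JavaScript disabled.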
The comment right after BODY replicates the Description. The reason is the unfortunate fact that some search engines ignore Description and put in its place the first couple of lines of the text itself (I guess they do this for the same reason they ignore Keywords). The texts of all my pages begin with a standard navigation bar, which would look senseless as a page abstract. Placing a comment that replicates the Description is my attempt (not always successful) to give a search engine what it expects to get from the starting lines of the text: a page abstract.
There are two JavaScript inserts, colored in blue to stand out, which enclose the entire page content. They represent the opening and closing HTML code blocks of a horizontal framing and centering table. This is necessary in order to narrow and/or center site content that was designed for a minimum 800x600 screen resolution. Excessively long text lines look awkward on big (and especially wide) modern screens, as do left-aligned lists. Narrowing and/or centering the content for such screens yields acceptable results while keeping intact the basic layout principles of the entire web site. This universal mass fix was applied to all web site pages in the middle of 2007, and as of March 2008 I still find from time to time individual pages that were distorted by this mass change and require individual adjustments.
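The content of the two inserts is not shown in the listing, so the following JavaScript is only a guess at the general idea: emit an opening table fragment before the page content and a matching closing fragment after it, but only when the screen is wider than the design width. The function names `frameOpen` and `frameClose`, the table attributes, and the use of the screen width as the deciding value are assumptions.

```javascript
const DESIGN_WIDTH = 800; // the site content was designed for a minimum 800x600 screen

// Opening fragment: a centered one-cell table of the design width (wide screens only).
function frameOpen(screenWidth) {
  if (screenWidth <= DESIGN_WIDTH) return ""; // narrow screen: leave the layout alone
  return "<TABLE ALIGN=Center WIDTH=" + DESIGN_WIDTH +
         " BORDER=0 CELLSPACING=0 CELLPADDING=0><TR><TD>";
}

// Closing fragment: must mirror frameOpen so the table stays balanced.
function frameClose(screenWidth) {
  if (screenWidth <= DESIGN_WIDTH) return "";
  return "</TD></TR></TABLE>";
}

// Browser usage (first insert at the top of BODY, second just before </BODY>):
//   document.write(frameOpen(screen.width));
//   ...page content...
//   document.write(frameClose(screen.width));
```

Keeping both fragments in shared functions means a layout change has to be made in one place only, which matters when the same fix is applied to hundreds of pages.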
FONT FACE and SIZE are specified for the same reason as colors and can be overridden by the user in the same way.
Top of the text Anchor enables jump from the text bottom to text top.
I've used Italicised text for the entire extract. It looks less formal to me that way.
Top navigation bar contains several essential links within the site to which the user can jump before reading the text. It is centered, and is separated from the text by a horizontal line and a line space. Top navigation bar provides hyperlinks to:
- Web site entry — top level of the site page hierarchy
- Site contents
- Site index
- Programming selection — site section where extract belongs
- Programming extracts — sub-section of the Programming section
- Text bottom
Extract title is CENTERed, Bolded and Underlined. It is COLORed Red and is one font SIZE step larger than the rest of the text. Title is followed by a line space.
Extract author name is CENTERed, Bolded and COLORed Navy.
The body of the extract is the series of Paragraphs. All paragraphs are presented in the same way:
- Paragraph sentences are separated by 2 spaces. Since an HTML browser collapses any number of successive spaces into one space, &nbsp; (non-breakable space) is appended to the end of every sentence to get an additional separating space.
- Paragraph text is justified (i.e., ALIGNed both on the Left and on the Right).
- Paragraph first line is indented by 7 &nbsp-s (non-breakable spaces). Mixture of &nbsp-s with regular spaces won't work properly for paragraph indentation, since paragraphs get justified, and I don't want the justification spaces to be inserted into the indent thus changing indent's size from paragraph to paragraph.
- The first letter of the first word of first paragraph line is Bolded and COLORed Red. I saw this style some time ago on a web site, liked it, and I'm using it ever since.
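The paragraph rules above lend themselves to automatic conversion. The sketch below shows the idea in JavaScript; it is not the PowerBASIC program mentioned earlier, and the function names, the naive sentence detection (a period, question mark, or exclamation mark followed by a space), and the exact tags used for the red initial letter are assumptions. Justification is assumed to come from the P style in the HEAD.

```javascript
const INDENT = "&nbsp;".repeat(7); // 7 non-breakable spaces for the first-line indent

// Escape characters that have a special meaning in HTML.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Append &nbsp; after sentence-ending punctuation so that two spaces survive
// the browser's collapsing of successive regular spaces.
function spaceSentences(html) {
  return html.replace(/([.?!]["')]?) (?=\S)/g, "$1&nbsp; ");
}

// Convert one paragraph of plain text into the extract's paragraph markup.
function formatParagraph(text) {
  const clean = text.trim().replace(/\s+/g, " ");       // normalize whitespace
  const first = escapeHtml(clean.charAt(0));             // initial letter, styled separately
  const rest = spaceSentences(escapeHtml(clean.slice(1)));
  return "<P>" + INDENT + "<B><FONT COLOR=Red>" + first + "</FONT></B>" + rest + "</P>";
}
```

A rule this simple will also add a space after abbreviations such as "Jr.", which is one reason why converted text still needs manual adjustment.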
The text extract is followed by the name of the author and the name of the book containing that extract. Normally I use a simple two-line DIVision with right ALIGNment for this. This specific extract, however, has a counterpart, The Woes of the Craft page, and it was quite natural to provide a direct link to it at the end of the extract text. I wanted to have the author name and the book name ALIGNed to the Right as usual, and to have the link ALIGNed to the Left (the default) while keeping it on the same line as the book name. So I use a TABLE with the corresponding left and right ALIGNment of data in its two columns. Please note that specifying WIDTH=100% is essential for the whole method to work as intended.
Bottom navigation bar contains several essential links within the site to which the user can jump after reading the text. It is centered and is separated from the text by a line space and a horizontal line. Bottom navigation bar provides hyperlinks to:
- Web site entry — top level of the site page hierarchy
- Site contents
- Site index
- Programming selection — site section where extract belongs
- Programming extracts — sub-section of Programming section
- Text top
The natural question that arises at this point is: why not replace those two navigation bars with a single one that is always present on the screen in a separate small horizontal frame, while the text scrolls through a big frame occupying the rest of the screen? Been there, tried that... I found frames to be too much trouble to maintain and to use for the minor convenience of having navigation always on the screen. And this convenience itself is somewhat questionable: even a small frame pinches the main content window, while for most of the time spent browsing the text it has no use. It seems to me that in the case of a simple text, frame usage involves additional complexity that is not justified by any significant improvement. Thus, I've ended up with navigation bars at both top and bottom as a compromise between convenience and complexity. This pair makes it possible to jump to something else at both critical points of page browsing: before reading the text and after it. If the user wants to get out somewhere in between, he/she has to scroll to either the text top or bottom; this is a slight inconvenience of the top/bottom bar scheme (on most modern browsers [Ctrl]+[Home]/[Ctrl]+[End] jumps directly to the text top/bottom). It's not much of a burden in any case, since most texts on my web site are short, and the [Back] button along with the History window is always close at hand. My own experiments with frames, as well as the opinions about them that I've picked up from the web, made me quite sceptical about their universal usefulness. I resort to them on my site only when there is a serious functional justification.
Bottom of the text Anchor enables jump from the text top to text bottom.
The last 3 lines close global text FONT, HTML text BODY, and the entire HTML itself.

Note: To maintain proper formatting of this document, some excessively long lines of code have been split into parts (for presentation purposes only). These split points are indicated either by a light green highlighted space " " (line splicing is optional) or by a light red highlighted space " " (line splicing is required for the code to be valid). Regardless of the indentation of the split line parts, splicing should be done in such a way that the first character of the next line part immediately follows the corresponding split point indicator (highlighted space) of the previous line part.

<HTML>
<HEAD>
<TITLE>The Joys of the Craft (Programming Extracts at Davar Web Site)</TITLE>
<LINK REL="shortcut icon" HREF="../../favicon.ico" TYPE="image/vnd.microsoft.icon">
<META HTTP-EQUIV="Content-Type" CONTENT="Text/HTML; CharSet=ISO-8859-1">
<META NAME = Description CONTENT ="Page: The Joys of the Craft [of Programming] from The Mythical Man-Month by Frederic Brooks. Site: Davar Web Site, Computer Science, Programming, Mainframe, UNIX, PC, Internet, Mathematics, Go, Zen, Quotations, Extracts, Humor, Russian.">
<META NAME = Keywords CONTENT ="Davar Web Site, Programming, Frederic Brooks, Mythical Man-Month">
<META NAME = Author CONTENT="Vladimir Veytsel">
<STYLE TYPE="Text/CSS">
A:HOVER {COLOR:Red; BACKGROUND:#FFFF66}
P {TEXT-INDENT:50px; TEXT-ALIGN:Justify}
</STYLE>
</HEAD>
<NOSCRIPT>
<BODY BACKGROUND="../../PAPER001.JPG" BGCOLOR=White TEXT=Black LINK=Blue ALINK=Fuchsia VLINK=Purple>
</NOSCRIPT>
<SCRIPT LANGUAGE=JavaScript>  </SCRIPT>
<SCRIPT LANGUAGE=JavaScript>  </SCRIPT>
<A NAME="Top"></A>
<CENTER>
Go to:  <A HREF="../../index.htm">Site entry</A> |
<A HREF="../../CNT.HTM">Site direct</A> |
<A HREF="../../IND.HTM">Site index</A> |
<A HREF="../PROGRAM.HTM">Programming</A> |
<A HREF="EXTRACTS.HTM">Program extracts</A> |
<A HREF="#Bottom">Text bottom</A>
<HR>
The  Joys  of  the  Craft
Frederic Brooks, Jr.
</CENTER>
Why is programming fun?  What delights may its practitioner expect as his reward?
First is the sheer joy of making things.  As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design.  I think this delight must be an image of God's delight in making things, a delight shown in the distinctness and newness of each leaf and each snowflake.
Second is the pleasure of making things that are useful to other people.  Deep within, we want others to use our work and to find it helpful.  In this respect the programming system is not essentially different from the child's first clay pencil holder "for Daddy's office."
Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning.  The programmed computer has all the fascination of the pinball machine or the jukebox mechanism, carried to the ultimate.
Fourth is the joy of always learning, which springs from the nonrepeating nature of the task.  In one way or another the problem is ever new, and its solver learns something: sometimes practical, sometimes theoretical, and sometimes both.
Finally, there is the delight of working in such a tractable medium.  The programmer, like the poet, works only slightly removed from pure thought-stuff.  He builds his castles in the air, from air, creating by exertion of the imagination.  Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures.  (As we shall see later, this very tractability has its own problems.)
Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself.  It prints results, draws pictures, produces sounds, moves arms.  The magic of myth and legend has come true in our time.  One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.
Programming then is fun because it gratifies creative longings built deep within us and delights sensibilities we have in common with all men.
<TABLE WIDTH=100% BORDER=0 CELLSPACING=0 CELLPADDING=0>
<TR>
<TD VALIGN=Bottom><A HREF="CRAFTWOE.HTM">The Woes of the Craft</A></TD>
<TD ALIGN=Right><A HREF="http://www.cs.unc.edu/~brooks">Frederic Brooks, Jr.</A> "<A HREF='http://en.wikipedia.org/wiki/The_Mythical_Man-Month'>The Mythical Man-Month</A>", 1975, 1995</TD>
</TR>
</TABLE>
<HR>
<CENTER>
Go to:  <A HREF="../../index.htm">Site entry</A> |
<A HREF="../../CNT.HTM">Site direct</A> |
<A HREF="../../IND.HTM">Site index</A> |
<A HREF="../PROGRAM.HTM">Programming</A> |
<A HREF="EXTRACTS.HTM">Program extracts</A> |
<A HREF="#Top">Text top</A>
</CENTER>
<A NAME="Bottom"></A>
<SCRIPT LANGUAGE=JavaScript>  </SCRIPT>
</BODY>
</HTML>

View The Joys of the Craft page or view [and save] CRAFTJOY.TXT text
(Use [Back] button or [Alt]+[CL] to return here from page/text view)
To make the text executable, rename it to CRAFTJOY.HTM and
make a global change of "&lt;" into "<" signs and "&amp;" into "&" signs.
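The global change above, read as replacing the `&lt;` entity with `<` and the `&amp;` entity with `&`, can be sketched in a few lines of JavaScript. The function name `txtToHtm` is hypothetical. The order of the two replacements matters: doing `&lt;` first means that an escaped entity such as `&amp;lt;` ends up as the literal text `&lt;` instead of being turned into a tag bracket.

```javascript
// Turn the escaped listing (as saved in the .TXT file) back into live HTML.
function txtToHtm(text) {
  return text
    .replace(/&lt;/g, "<")   // restore tag brackets first
    .replace(/&amp;/g, "&"); // then restore ampersands
}
```

For example, `txtToHtm("&lt;B>Joy &amp;amp; Woe&lt;/B>")` yields `<B>Joy &amp; Woe</B>`, which a browser then renders as bold "Joy & Woe".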
Copyright © 1998 – 2008 by