\overfullrule0pt

Project documentation

Links
web browser
Mikuláš Patočka, Martin Pergel, Petr Kulhavý, Karel Kulhavý

Contents

\maketoc

Introduction

This is the project documentation for the Links program. It describes the course of development, the history, the participants and the result of the project. The printed version of the documentation does not contain the project mail correspondence. The full documentation can be found in PDF format on the CD with the program.

Links is a web browser for the Unix and OS/2 operating systems. The browser can run in both text and graphics mode. In graphics mode it can run under the X Window System, SVGAlib, AtheOS and Pmshell. The browser supports javascript (version 1.1 by Netscape Corporation), the protocols HTTP 1.0 and 1.1, FTP, Finger and SSL, HTML 4.0 without CSS, and the image formats GIF, XBM, TIFF, JPEG and PNG.

The assignment

The objective of the project is a web browser written in the C language under the Linux operating system. The basic engine (FTP, HTTP requests, HTML formatter etc.) will be written by the project participants. Code written by other people may be used in the form of libraries for processing other formats (JPEG, PNG, MPEG, GIF, Ogg etc.). No libraries for working with HTTP and FTP will be used.

The browser should be stable and its output of high quality (easily readable) where possible. User comfort should be high enough that the browser can be used for everyday, fast and simple browsing of the Internet. The greatest emphasis will be put on stability, because a crash of the browser is a great nuisance for the user.

Medium emphasis will be put on output quality (both image quality and document formatting), because browsers are nowadays used for a relatively large share of the time spent at a computer, and eye comfort is a requirement for successful and comfortable work. Alongside output quality stands speed, which should be limited only by the typical transmission infrastructure and not by the speed of the CPU or the graphics subsystem. If a conflict between quality and speed occurs, quality will be preferred. However, should this bring an unjustified slowdown, speed will be preferred.

The last position is occupied by user comfort, which is going to be minimal but carefully designed, and in particular such that it does not slow down work with the browser and does not require lengthy study of the user interface documentation. We also count portability as part of comfort: porting the code should not bring significant problems, because transparent work in multiplatform environments gives the user high comfort without impinging on other functions of the browser.

Expected browser features:

Why did we select this project

There are already many web browsers, but none of them met our expectations regarding stability, security, output quality and speed. Moreover, the majority of them have overcomplicated controls, which makes working with them unpleasant.

Competing browsers aim more at using so-called progressive technologies and supporting the latest features than at stability, portability and speed. Regarding javascript interpretation, competing browsers again aim at supporting the latest features of the language (preferably ones that don't work in other browsers) and are very unstable. It could be said that javascript rules the user, instead of the user ruling javascript. For every competing browser (Netscape, Mozilla, Opera, Konqueror, Internet Explorer) there exists a piece of javascript code a few lines long that makes the browser unmanageable so that it has to be restarted, or even worse, that performs a DoS-type attack on the user (either by huge memory requirements, or by using up all CPU time).

Therefore we decided to write our own web browser. The goal of the project was to write a high-quality, fast, secure and stable, yet still simple, web browser. Stress was put especially on stability, output quality and speed. It should never crash, never prevent the user from controlling it, and never make unjustified demands on memory or CPU time. The browser should be portable to the majority of Unices and compilable with minimal requirements on the system.

Project participants

The developer team, under the supervision of Mgr. David Bednárek, consisted of the following members:

Originally Jakub Drnec was to be in the team, but he left the faculty, so only four people were left. During the first year the project was supervised by Petr Merta, but he left after one year and the project was handed over to Mgr. Bednárek.

Other people also participated in the project, especially those who translated texts into various foreign languages. These people are listed at the end of the Credits chapter.

Program Structure

In this chapter you can read about the rough division of the project into parts and who wrote which part. A more detailed description of the structure of the project is in the development documentation.

The core element of the browser is the scheduler, a.k.a. the select loop. All pending actions are planned here and the individual parts of the browser are called from here. The select loop is a cooperative scheduler. All parts of the browser are written so that control never stays in them for too long, because otherwise other parts could not run for a long time. This would show up, for example, as the browser not reacting to user input.
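A cooperative select loop of this kind can be sketched as follows. This is only a minimal illustration for POSIX systems, not the scheduler used in Links: the names `handler`, `register_handler`, `select_loop_step` and `count_bytes` are hypothetical. Each handler does a bounded amount of work and returns, so control always comes back to the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define MAX_HANDLERS 16

/* Hypothetical record: a descriptor and the callback to run when it is readable. */
struct handler {
    int fd;
    void (*on_read)(int fd, void *data);
    void *data;
};

static struct handler handlers[MAX_HANDLERS];
static size_t n_handlers;

/* Register a callback; returns 0 on success, -1 if the table is full. */
int register_handler(int fd, void (*on_read)(int, void *), void *data)
{
    assert(fd >= 0 && fd < FD_SETSIZE && on_read != NULL);
    if (n_handlers == MAX_HANDLERS)
        return -1;
    handlers[n_handlers].fd = fd;
    handlers[n_handlers].on_read = on_read;
    handlers[n_handlers].data = data;
    n_handlers++;
    return 0;
}

/* One pass of the loop: wait (up to timeout) for readable descriptors,
 * then call each ready handler exactly once.
 * Returns the number of handlers called, or -1 on error. */
int select_loop_step(struct timeval *timeout)
{
    fd_set rd;
    int maxfd = -1, called = 0;
    size_t i;

    FD_ZERO(&rd);
    for (i = 0; i < n_handlers; i++) {
        FD_SET(handlers[i].fd, &rd);
        if (handlers[i].fd > maxfd)
            maxfd = handlers[i].fd;
    }
    if (select(maxfd + 1, &rd, NULL, NULL, timeout) < 0)
        return -1;
    for (i = 0; i < n_handlers; i++) {
        if (FD_ISSET(handlers[i].fd, &rd)) {
            handlers[i].on_read(handlers[i].fd, handlers[i].data);
            called++;
        }
    }
    return called;
}

/* Example handler: reads one bounded chunk, adds its size to a counter
 * and returns, so it never holds control for long. */
void count_bytes(int fd, void *data)
{
    char buf[64];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0)
        *(int *)data += (int)n;
}
```

A caller would register its descriptors once and then call `select_loop_step` repeatedly; a handler that has more work than fits in one call simply waits to be called again on the next pass.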

Another very important part is the session, which ensures download management, document formatting and display, user control and movement around the document, and javascript execution. The session itself doesn't do anything; it only calls other parts which perform the actions. The session calls the object requester, javascript, the HTML parser and the part providing display and interaction with the user.

The object requester manages document download from the network. It calls the request scheduler. The file cache, which caches all files downloaded from the network, is connected to the request scheduler. The interfaces of the individual protocols (HTTP, HTTPS, FTP, finger and local disk access) are connected to the request scheduler too.

The HTML parser processes HTML documents and translates them into the internal structures of the browser. The view, a.k.a. the displayer, manages document display, line breaking and handling of events generated by the user. Aside from moving around the document, the user can use menus. The part handling menus is called menu. The displayer calls functions of the individual display drivers: terminal, X, SVGAlib, .... Graphics mode requires typesetting, which is handled by a part called fonty, which contains the font cache and graphics routines for processing and printing type.

Javascript consists of a lexical analyser, a syntax analyser, an interpreter and builtin functions. Between javascript and the session there is an interface called the javascript interface, which mediates javascript's access to the internal data structures of the browser (document, pictures, links, forms, ...). The javascript interface contains functions that start and end interpretation of javascript code, and similar functions. Event handlers relate to javascript. They are pieces of code that are called on various events, especially those caused by the user, for example clicking a button, moving the mouse, loading a document, pressing a key, changing a text field etc. Event handlers are built directly into the displayer.

Partitioning of work:

\penalty -10000

History

Development of "links-current" follows:

Project meetings occurred very often, usually in the queue in the school dining room, during lunch, in the computer laboratory and during 0verkill sessions; whenever we saw each other we discussed project problems, new things, what needed to be written and more. We also discussed many things using talk or e-mail. The project had three official meetings with the leader: the first when the project was announced, the second after half a year and the third after about another half year. We wrote the project by ourselves; only here and there did we need to consult the leader. We regularly wrote reports about the project status.

Problems and decisions

In this chapter we discuss the problems we hit during development and the important decisions we faced, including their solutions.

Scheduler

We decided to write our own scheduler instead of making a multithreaded or multiprocess web browser. Mikuláš once tried to write a multithreaded variant for about two days, but he didn't like it, so he deleted it and wrote his own scheduler. Having our own scheduler is more efficient, and as it is cooperative, the individual parts can switch between themselves more easily and don't need complicated waiting.

Graphics interfaces

When writing a driver for the X Window graphics system we could choose from a number of existing toolkits; however, we didn't use this possibility and wrote our own driver directly on top of the Xlib library. The reason was that there are many toolkits and every user uses a different one. The user could therefore be forced to install more libraries on his system, which would complicate installation. Moreover, with so many similar toolkits there is no reason to prefer a particular one. The only benefits of a toolkit would be similarity to other applications and easier programming. But we strove for maximum similarity to text-mode usage, which was not achievable with any toolkit.

We designed the graphics interface at a low level so that it is portable and simple and so that a new driver can be added easily. The interface contains elementary functions for printing a bitmap, filling an area with colour, drawing a line, scrolling and so on. These functions can be implemented very easily on any graphics interface.
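One common way to express such a low-level driver interface in C is a table of function pointers that each driver fills in. The sketch below is an assumption about the general shape only: the structure `graphics_driver`, its fields and the in-memory driver `mem_driver` are hypothetical and are not taken from the Links sources.

```c
#include <assert.h>

/* Hypothetical driver table: every driver supplies the same few primitives. */
struct graphics_driver {
    const char *name;
    /* Fill the half-open rectangle [x1,x2) x [y1,y2) with one colour. */
    void (*fill_area)(int x1, int y1, int x2, int y2, unsigned char color);
    /* Draw a horizontal line from x1 (inclusive) to x2 (exclusive). */
    void (*draw_hline)(int x1, int y, int x2, unsigned char color);
};

/* A toy driver that draws into a small in-memory bitmap. */
#define FB_W 8
#define FB_H 4
static unsigned char fb[FB_H][FB_W];

static void mem_fill_area(int x1, int y1, int x2, int y2, unsigned char color)
{
    int x, y;

    /* Clip the rectangle to the bitmap. */
    if (x1 < 0) x1 = 0;
    if (y1 < 0) y1 = 0;
    if (x2 > FB_W) x2 = FB_W;
    if (y2 > FB_H) y2 = FB_H;
    for (y = y1; y < y2; y++)
        for (x = x1; x < x2; x++)
            fb[y][x] = color;
}

static void mem_draw_hline(int x1, int y, int x2, unsigned char color)
{
    mem_fill_area(x1, y, x2, y + 1, color);
}

struct graphics_driver mem_driver = { "memory", mem_fill_area, mem_draw_hline };

/* Read back one pixel of the toy bitmap. */
unsigned char mem_pixel(int x, int y)
{
    assert(x >= 0 && x < FB_W && y >= 0 && y < FB_H);
    return fb[y][x];
}
```

The rest of the program would call only through a `struct graphics_driver *`, so adding a driver means writing one more table.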

Thanks to this design of the graphics interface, the browser uses only one window. The advantage is that window placement is predictable: a window can't get lost on a neighbouring screen, nor can it be overlaid by another window. This increases user comfort.

SVGAlib

Links may not work with older versions of SVGAlib, because they contain various bugs, such as bad code that draws garbage over the screen during graphical operations. Using Links with older SVGAlib versions can pose a security risk, because for example SVGAlib 1.2.10 gave up root privileges incorrectly.

This is not a mistake of Links, but of the SVGAlib designers. I (Karel Kulhavý) have an opinion that SVGAlib is broken by design and should be rewritten, which is however not within the scope of Links project.

Even the new SVGAlib has bugs in the function vga\_setlinearaddressing (which manifest for example on S3 Virge 3d). Due to these bugs this function had to be commented out in Links (the same is done for example in the program Quake). Due to this reason Links is slower in some circumstances, which is not a mistake of Links, but of SVGAlib.

Some graphics functions in SVGAlib were so buggy that they didn't work at all and had to be emulated using more primitive functions, for example filling a rectangular area using putpixel (2-color modes). Because even this interface drew garbage, and since 2-color modes cannot display any colours anyway, these modes were eventually removed from Links.

Links uses the acceleration primitives of SVGAlib when SVGAlib supports them. The problem is that SVGAlib doesn't support them for all cards, at least not for the ones the Links team had access to. Given the overall design of SVGAlib, it can be expected that they will be massively buggy and that on some cards Links will crash the system badly. This is not a problem of Links, but of SVGAlib.

Framebuffer

Writing the framebuffer driver on Linux was very difficult due to very bad or completely missing documentation. Therefore 100 percent reliability is not guaranteed, nor is it guaranteed that the driver will work with all cards on all computers. We were often forced to substitute for documentation by reverse engineering the source code of the Linux kernel, reading the graphics drivers for the individual graphics cards and reading the very sparse comments in the header files of the framebuffer interface. These conditions made our work very hard and unnecessarily extended the development time. Often we had to ask the authors of the Linux kernel directly, or try the behaviour empirically and judge from it how the interface should be used. We consider this approach very unclean, but given the absence of quality documentation for the interface we had no other option.

For the framebuffer driver we had to determine the mouse position. We used the gpm library for this, which is normally used to read the mouse state on a terminal (in text mode). The framebuffer is de facto a terminal with the possibility of graphics output. It doesn't offer any mouse interface of its own; it only allows graphics output to the screen. There is no other way of reading the mouse on a terminal, because an ordinary user does not have access to the mouse device (for example /dev/mouse ) on all computers. The gpm library reads the mouse device and allows user applications to determine the mouse position and button state through a socket. Fortunately the majority of Linux computers have gpm installed, so it can be used.

The problem is that gpm is strictly textual, so the cursor moves by characters and not by pixels. We tried to read the mouse by characters and fake a larger screen size by a dirty trick, but this didn't work with older versions of gpm . Therefore we implemented reading of relative mouse movements, which works with all versions. The problem then is that the mouse moves very slowly. In the end we had to multiply the relative movement by a constant anyway, so it is not possible to move the mouse smoothly, and there is a real danger that there will be a spot on the screen that the user cannot click on, however necessary it may be. This problem can be partially solved by enlarging the text and pictures, but unfortunately it can't be solved completely. We tried to alleviate the situation by making fine movement of the mouse possible using the F5 to F8 keys.
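The effect of multiplying relative movements by a constant can be illustrated with a small sketch. The names `cursor`, `move_cursor` and the scale value 3 are hypothetical, chosen only for this illustration; the actual constant used in Links is not stated here.

```c
#include <assert.h>

/* Hypothetical cursor position in pixels. */
struct cursor {
    int x, y;
};

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Apply one relative movement (dx, dy), multiplied by scale, and keep
 * the cursor inside a width x height screen. */
void move_cursor(struct cursor *c, int dx, int dy, int scale,
                 int width, int height)
{
    assert(scale > 0 && width > 0 && height > 0);
    c->x = clamp(c->x + dx * scale, 0, width - 1);
    c->y = clamp(c->y + dy * scale, 0, height - 1);
}
```

With a scale of 3 the mouse alone can only reach every third pixel in each direction, which is why some spots may be unreachable; a fine-movement key would call the same function with a scale of 1.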

Unfortunately the client application can't set the mouse sensitivity, because it is fixed by the administrator when the gpm daemon is started. Therefore we chose a compromise between sensitivity and accuracy, with which the cursor moves around the screen acceptably.

Theoretically it would be possible to add graphics support to gpm , because gpm is distributed under the GPL licence. Writing graphics support should not be a big problem, and there is even a certain chance that if the modified library were sent to the authors, they would publish it in the next version. The problem is that not everyone would have this library installed on his computer, so a user without administrator privileges still couldn't use a mouse on the framebuffer. This would rather be a long-term solution, relying on the graphics-capable version of gpm becoming common over time.

For users who have administrator privileges we made a patch that makes the mouse move smoothly over the screen. The patch is distributed together with the browser in the file PATCH-gpm-1.20.0-smooth-cursor . The patch was made for version 1.20.0 of gpm , but it will probably work with other versions as well, possibly after a small change.

We managed to implement the framebuffer driver only on the Intel platform. We also had access to computers of the Sparc architecture running Linux with framebuffer, but these computers don't have linear mapping of video memory, which complicates the implementation. As we didn't have documentation, we didn't implement support for nonlinear memory mapping.

Javascript

We decided to write the interpreter from scratch, because we came to the opinion that a javascript interpreter in Links may be useful. Moreover, Mgr. Bednárek said in the compilers lecture that few people ever get to write a compiler or interpreter from scratch, and we consider designing and building an interpreter an exceptionally interesting business, though a bit demanding.

Grammar

The first problem we encountered when writing javascript was obtaining a standard to write to. We tried hard for about a month to find one on the Internet, but we couldn't find anything. We managed to get the ECMA 262 standard describing the ECMAScript grammar, but at that time we didn't have the slightest notion that a Javascript 1.3 standard exists, nor that it is governed by the ECMAScript grammar. Then someone managed to find the Javascript 1.1 standard by Netscape (not on the Netscape web pages), according to which we proceeded. We learned about the newer standard about a year after finding the 1.1 standard and starting its implementation. Javascript 1.1 had a simpler grammar than ECMA 262: it contained no reduce-reduce conflicts and almost no shift-reduce conflicts. In the ECMA grammar there were tens to hundreds of reduce-reduce and shift-reduce conflicts.

We took the document object model from Netscape 2.0, according to documentation we found, and added some enhancements.

Parser

When designing the javascript parser we could either write the lexical and syntax analyzers ourselves, or use the existing tools Bison and Flex for generating syntax and lexical analyzers.

In the case of the syntax analyzer, machine generation won without doubt, because the grammar is too complicated to write by hand. The grammar contains 131 rules and the automaton generated by Bison has 212 states. The resulting syntax analyzer is therefore LALR(1). There are about 40 shift-reduce conflicts in the grammar, which are resolved in favour of the shift operation. After downloading the grammar we converted it mechanically into a form acceptable to Bison. We then wrote the reduction actions by hand.

We used Flex for lexical analysis, because writing an automaton by hand is lengthy and the resulting automaton usually contains lots of bugs. To use Flex it was necessary to redefine the functions getc and ungetc , because we didn't manage to find out how to force Flex to read its input from memory and not from a stream.
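Replacement functions of this kind can be very small. The following sketch shows memory-backed counterparts of getc and ungetc; the names `mem_input`, `mem_getc` and `mem_ungetc` are hypothetical, and the sketch assumes that only the character most recently read is ever pushed back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical input source: a buffer in memory and a read position. */
struct mem_input {
    const char *buf;
    size_t len;
    size_t pos;
};

/* Counterpart of getc: next byte as an unsigned char value, or EOF at the end. */
int mem_getc(struct mem_input *in)
{
    assert(in != NULL);
    if (in->pos >= in->len)
        return EOF;
    return (unsigned char)in->buf[in->pos++];
}

/* Counterpart of ungetc: steps back one byte. This assumes c is the byte
 * that was just read, so the buffer itself never needs to be modified. */
int mem_ungetc(int c, struct mem_input *in)
{
    assert(in != NULL);
    if (c == EOF || in->pos == 0)
        return EOF;
    in->pos--;
    return c;
}
```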

The decision to use Flex and Bison brings a problem: what if the user does not have these tools installed on his computer? These tools are not as common as, for example, a C compiler on a Unix system. Therefore we decided to include the machine-generated outputs of these tools in the source distribution. The original Flex and Bison sources are of course also provided in the source distribution, so it is possible to regenerate the automata at any time.

Intercode

Following advice from Mgr. Bednárek, the javascript interpreter was divided into a parser with an intercode generator and an intercode interpreter, because without this division it would be difficult to implement multitasking in the automata: it would be necessary to wait until the script ends and only then continue.

We chose a tree form of the intercode (now really a DAG), without doing any optimizations. Tree intercode was chosen because a tree is the direct output of syntax analysis. If we had to generate quadruple or triple intercode, its generation would take some time, which would delay interpretation of short scripts, which we wanted to be able to interpret quickly. After consultation with Jan Hubicka (of GCC fame), who said that tree intercode is best interpreted using a stack, we decided to write a stack-based interpreter.

Every node of the intercode tree carries information: the line number where the particular piece of code occurs, the operator, and space for 6 arguments. The arguments could be stored, for example, in a linked list, in an array of variable size or in a different data structure. As an operator can never have a variable number of arguments, we chose an array of fixed size, which has the advantage of constant access time.
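A node with a line number, an operator and a fixed array of 6 argument slots might look like the sketch below. The operator set, the `value` field and the function names are hypothetical, and the evaluator is a plain recursive one kept short for illustration; it is not the stack-based interpreter described above.

```c
#include <assert.h>
#include <stdlib.h>

#define NODE_ARGS 6

/* Hypothetical operator set, just enough for the example. */
enum op { OP_NUM, OP_ADD, OP_MUL };

struct node {
    enum op op;                   /* operator */
    int line;                     /* source line of this piece of code */
    long value;                   /* used by OP_NUM only (an addition for this sketch) */
    struct node *arg[NODE_ARGS];  /* fixed-size argument array, constant access time */
};

/* Allocate a node with up to two arguments; unused slots stay NULL. */
struct node *new_node(enum op op, int line, long value,
                      struct node *a, struct node *b)
{
    struct node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    n->op = op;
    n->line = line;
    n->value = value;
    n->arg[0] = a;
    n->arg[1] = b;
    return n;
}

/* Recursive evaluation of an arithmetic tree. */
long eval(const struct node *n)
{
    assert(n != NULL);
    switch (n->op) {
    case OP_NUM: return n->value;
    case OP_ADD: return eval(n->arg[0]) + eval(n->arg[1]);
    case OP_MUL: return eval(n->arg[0]) * eval(n->arg[1]);
    }
    return 0;
}

/* Free a tree; valid only while no node is shared (a true tree, not a DAG). */
void free_tree(struct node *n)
{
    int i;
    if (n == NULL)
        return;
    for (i = 0; i < NODE_ARGS; i++)
        free_tree(n->arg[i]);
    free(n);
}
```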

Names are translated into keys, the list of which is hashed. Identifiers have keys that are unique within one context. This solution was chosen to make the interpreter faster. The interface is designed so that the size of the "address spaces" can be changed; currently the size is 128 addresses, which is a compromise between size and number of collisions. According to theoretical computations, with a table of quadratic size the mean number of collisions is constant. I. e. the first 11 records should collide with a probability of $P\le50\%$, and the hash in the expected case fills up only after $128\log_2 128$ records, so that only after defining 896 variables the table "overflows". Moreover, in the linked lists leading from each hash table slot MFR (the move-to-front rule) is applied, which gives results at most $2\times$ worse than an optimal self-maintaining list (see \odkaz{RNDr. Václav Koubek: Datové struktury}).
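A hash table of 128 slots with move-to-front chains can be sketched as follows. The hash function, the structure `entry` and the function `name_to_key` are hypothetical and serve only to show the technique; they are not the Links implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS 128

/* Hypothetical chain entry: a name and the key assigned to it. */
struct entry {
    char *name;
    int key;
    struct entry *next;
};

static struct entry *table[SLOTS];
static int next_key;

/* Simple multiplicative string hash (an arbitrary choice for this sketch). */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % SLOTS;
}

/* Return the key for name, assigning a new one on first use.
 * A found entry is moved to the front of its chain (move-to-front rule).
 * Returns -1 if memory runs out. */
int name_to_key(const char *name)
{
    unsigned h;
    struct entry **pp, *e;

    assert(name != NULL);
    h = hash(name);
    for (pp = &table[h]; (e = *pp) != NULL; pp = &e->next) {
        if (strcmp(e->name, name) == 0) {
            *pp = e->next;        /* unlink from the current position */
            e->next = table[h];   /* relink at the head of the chain */
            table[h] = e;
            return e->key;
        }
    }
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->name = malloc(strlen(name) + 1);
    if (e->name == NULL) {
        free(e);
        return -1;
    }
    strcpy(e->name, name);
    e->key = next_key++;
    e->next = table[h];
    table[h] = e;
    return e->key;
}
```

Frequently used names drift to the front of their chains, so repeated lookups of the same identifier stay cheap even when a slot holds several entries.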

Identification by keys means a remarkable speed-up, but sometimes it is harmful, for example when an object defined in the document is to be searched for. In many cases it is nevertheless advantageous, because interpretation consists mainly of identifier searches, and with the settings prescribed by the standard at least the first search is always performed internally in javascript, and the second search too on at least every second access (for example T{document.object.}...).

Security

The javascript interface is designed so that it doesn't allow javascript to access foreign objects. Foreign objects are all objects in documents from servers other than the server of the document containing the accessing javascript. All upcalls that work with objects strictly test these access rights, so javascript doesn't see "foreign" objects at all and has the impression that they don't exist.
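A test of whether two documents come from the same server can be sketched by comparing the scheme and host parts of their URLs. This is an illustrative assumption about how such a check could look, not the check performed in Links; the function names are hypothetical, and the comparison is case-sensitive and ignores details such as default ports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the "scheme://host[:port]" prefix of url, or 0 if url has no "://". */
static size_t server_prefix_length(const char *url)
{
    const char *p = strstr(url, "://");
    if (p == NULL)
        return 0;
    p += 3;
    return (size_t)(p - url) + strcspn(p, "/?#");
}

/* Returns 1 if both URLs name the same scheme and server, 0 otherwise. */
int same_server(const char *a, const char *b)
{
    size_t la, lb;

    assert(a != NULL && b != NULL);
    la = server_prefix_length(a);
    lb = server_prefix_length(b);
    return la != 0 && la == lb && strncmp(a, b, la) == 0;
}
```

An upcall guarded this way would simply report the object as nonexistent when `same_server` returns 0.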

We were warned that many things can overflow in an interpreter. We therefore decided to write "memory accounting". The user can set the maximum amount of memory that javascript is allowed to allocate. When this limit is exceeded, the script (context) is killed at the end of an elementary javascript step and a "purge" is performed on it, so that the remaining contexts can continue. The accounting can be performed either for each context separately, or for all of them together. If we accounted each context separately, there would be a danger that a script would open too many contexts and the interpreter would exhaust the memory that way. Therefore we chose memory accounting for all contexts together.

The overflow could be detected directly in the allocation function, but a problem would arise: how to terminate the script from within the allocation function. Therefore the allocation function always allocates the memory, and the overflow check is performed only at the end of the function zvykni (zvykni is the Czech word for chew). As this function performs only elementary interpretation steps, there is no danger of allocating too big a chunk of memory in one step, so this method is safe. According to our computations, memory should grow at most linearly in one elementary step.
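The scheme of "allocate always, check at the end of the step" can be sketched as follows. All names here (`js_context`, `js_alloc`, `js_step`, `js_purge`, `greedy_step`) and the limit of 1024 bytes are hypothetical; the step callback stands in for the function zvykni, and one counter is shared by all contexts as described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for one allocation. */
struct block {
    struct block *next;
    void *mem;
    size_t size;
};

/* Hypothetical script context: its allocations and a "killed" flag. */
struct js_context {
    struct block *blocks;
    int killed;
};

static size_t js_allocated;        /* shared by all contexts */
static size_t js_limit = 1024;     /* example limit, normally set by the user */

/* Always allocates; never checks the limit itself. */
void *js_alloc(struct js_context *ctx, size_t size)
{
    struct block *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->mem = malloc(size);
    if (b->mem == NULL) {
        free(b);
        return NULL;
    }
    b->size = size;
    b->next = ctx->blocks;
    ctx->blocks = b;
    js_allocated += size;
    return b->mem;
}

/* Free everything the context owns and mark it as killed. */
static void js_purge(struct js_context *ctx)
{
    while (ctx->blocks != NULL) {
        struct block *b = ctx->blocks;
        ctx->blocks = b->next;
        js_allocated -= b->size;
        free(b->mem);
        free(b);
    }
    ctx->killed = 1;
}

/* Run one elementary step, then check the limit.
 * Returns 1 if the context may continue, 0 if it is (or was just) killed. */
int js_step(struct js_context *ctx, void (*step)(struct js_context *))
{
    assert(ctx != NULL && step != NULL);
    if (ctx->killed)
        return 0;
    step(ctx);
    if (js_allocated > js_limit) {
        js_purge(ctx);
        return 0;
    }
    return 1;
}

/* Example step that allocates 400 bytes each time it runs. */
void greedy_step(struct js_context *ctx)
{
    (void)js_alloc(ctx, 400);
}
```

Because the check happens only after the step returns, the script is ended at a point where the interpreter state is consistent, at the cost of briefly exceeding the limit by at most one step's worth of allocations.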

The memory complexity of javascript also brings the danger of a stack overflow, caused for example by infinite recursion. To prevent this we introduced a limit on the maximum recursion depth, which the user can again set in a menu.

Because the interpreter could starve the rest of the browser by running an infinite loop, it was decided that the interpreter must return control after a finite time and reschedule itself. Scheduling is therefore performed after intercode generation and then after every hundred calls of the function zvykni (100 is an empirical constant that gives a relatively good ratio of price to complexity); moreover, interpretation is also interrupted by certain upcalls. Only one thread may run within one context. Running in several contexts at a time is however not limited and even occurs very often. The number of steps until "preemption" occurs can be changed by a compile-time constant. It would be possible to let the user change this constant at runtime, but we feared that some users would not comprehend the meaning of this setting and would only try out what numbers can be entered. The interpreter would then interpret slowly, or would have too high a latency after a keypress.

After experience with other browsers we came to the conclusion that these security measures do not suffice. An attack can also be accomplished, for example, by printing a warning in an infinite loop, by permanently changing the URL and so on. This is a type of attack that no existing browser can defeat. For example, the following one-liner is enough:

while(1)alert("You are a BFU!");

This is not a typical DoS attack which would exhaust all memory or 100\% of the CPU time, and the authors of other browsers probably didn't think about it. In an attack like this, the user is forced to click on javascript windows endlessly, so he never gets to the other control elements of the browser. This attack is therefore not mounted against the computer, but against the user. Some browsers (for example Netscape Navigator) don't even wait for the user's reaction and keep opening more and more windows, which understandably prevents the user from using even other applications.

Therefore we decided on a solution that is simple but effective: to allow the user to kill the script from every window opened by javascript. In the case of a URL change or of opening or closing a browser window, the browser asks the user whether he allows the action. The user can permit it, refuse it or end the script. This prevents javascript from pestering the user with unsolicited URL changes, closing the browser window or flooding him with more and more new windows.

Errors and warnings

The javascript standard is very ambiguous about what is correct and what is incorrect. Therefore we ignore errors during interpretation where possible, and moreover we let the user set the level of tolerance to errors. We ignore some errors straight away and only notify the user about them with a warning. We allow all type conversions automatically, and we allow turning off warning messages, turning off javascript as a whole, or immediately stopping all currently running interpretations, not to mention the possibility of killing javascript from every message window (if it still makes sense).

An error means the definitive end of interpretation in the given context, i. e. all following scripts in that context are ignored. On badly written pages based on javascript the user therefore has no other option than to look into the page source, understand what the script was intended to do, interpret it himself and perform the required actions by hand (for example a URL change). Killing all scripts causes an error in all existing contexts, so on pages whose links work only through javascript it is absolutely unsuitable to ask for all scripts to be killed.

The decision not to continue interpretation after an error is supported by the fact that after a syntax error it would be necessary to invalidate some functions and remove part of the tree. Subsequent interpretations typically reference previous results, so it would ultimately lead to an error anyway. To continue after a semantic error it would be necessary to sort out the stacks of parents and arguments and to do magic with address spaces. It is enough that similar magic is necessary for the operations break , return , and continue . Errors "chained" to, or rather "introduced" by, previous ones would occur. Not continuing after an error is usual in other browsers too.

Because javascript authors often omit the object document when accessing its members, we decided to allow name resolution in the main address space, but only if the user asks for it. The user can tick global name resolution in a menu. This setting is on by default. We gave this choice to the user because global name resolution is slower than local resolution, yet some web pages require it.

Images

In the first implementation of images the image displayer (decoder and dithering engine) was not restartable, which proved to be a serious shortcoming, because displaying images was slow and blocked other actions of the browser. Therefore we had to rewrite the displayer into a restartable version, so that it is possible to reschedule in the middle of decoding and dithering.
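What "restartable" means can be shown on a toy decoder that keeps all of its state in a structure, so that it can stop after any chunk of input and continue later. The run-length format, the structure `rle_state` and the function `rle_feed` are hypothetical stand-ins chosen for brevity; real image decoders are far more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder state; zero-initialise before first use. */
struct rle_state {
    int have_count;           /* 1 if a count byte has been read and a value is expected */
    unsigned count;           /* pending repeat count */
    size_t out_len;           /* number of decoded bytes so far */
    unsigned char out[64];    /* decoded output (small fixed buffer for the sketch) */
};

/* Feed one chunk of input consisting of (count, value) byte pairs.
 * A pair may be split between two chunks; the state remembers where
 * decoding stopped, so the caller may do other work between calls.
 * Output beyond the buffer size is silently dropped in this sketch. */
void rle_feed(struct rle_state *st, const unsigned char *data, size_t len)
{
    size_t i;
    unsigned k;

    assert(st != NULL && (data != NULL || len == 0));
    for (i = 0; i < len; i++) {
        if (!st->have_count) {
            st->count = data[i];
            st->have_count = 1;
        } else {
            for (k = 0; k < st->count && st->out_len < sizeof st->out; k++)
                st->out[st->out_len++] = data[i];
            st->have_count = 0;
        }
    }
}
```

Between two calls of `rle_feed` the scheduler is free to run other parts of the program, which is exactly what a non-restartable decoder prevents.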

TIFF

We decided to support the TIFF graphics format. This format is not common on the web, but sometimes there are important documents that are not available in any other format. We used the libtiff library for decoding, because given the many variants and the diversity of the format, writing the decoder by hand would be too time-consuming. The TIFF format specifies that decoding may start only after the whole file is loaded. Therefore TIFFs won't display while being downloaded from the network. This is however not a bug of Links, but a feature of the TIFF format.

Other browsers do not support the TIFF format; they need a plugin to display it. This is a pity, because TIFF is not an unused format.

JPEG~2000

When we were writing decoders for various graphics formats, we also wanted to write support for the new standard JPEG 2000, which interested us for its high compression. We were of the opinion that this format will surely be used in the future, so its support would be a good investment. Unfortunately we didn't find any open source library supporting this format. Writing a wavelet decoder by hand would be a time-consuming job, so we dropped support for this format. Nevertheless, as soon as a platform-independent library supporting JPEG 2000 is available, it will not be a problem to add this format to the browser thanks to the flexible design of the image decoding interface.

Others

We decided to place all data structures and executable code into one binary file for portability reasons. This frees the user from the problem of where to place the browser data, and also solves the problem of finding these data (different users typically want to place their data into different directories, so after copying the binary file Links could not be run). This way, just copying the binary file onto a different computer with the same platform is sufficient and the browser will run there.

C was chosen as the programming language because C is a universal programming language characterized by economy of expression, modern flow control and data structures, and a wealth of operators. The generality of the C language makes it more suitable and more efficient for many tasks than other "more powerful" languages. We respected the ANSI C standard in order to reach portability to as many systems as possible.

Browsing History

The whole formatted page, including the cursor position and the contents of all forms, is stored in the history. Therefore if the user goes back in history, he gets to exactly the same place from which he went to the new page. The form content is not lost even on page reload, as happens in other browsers.

Portability

For the project to be good we had to ensure code portability. Therefore we wrote the code in ANSI C. Nevertheless, at least some parts had to be platform dependent, and even so we had to ensure portability. For this reason we used the tools Autoconf and Automake , with which it is possible to produce a makefile suited to a particular computer. Without these tools it would be very labour-intensive to make a portable Makefile and ensure compilation on various systems.

We provide a generated script configure , which serves the purpose of adaptiong configuration for a particular computer. Aside from it we also supply input files for programs Autoconf and Automake , so that the script can be regenerated at any time.

Translations

Links has its menu translated into many languages. We planned for this feature from the very beginning of the project, so we had to decide early how to make these translations. For string translation there is a tool called gettext, which we used during the development. Gettext however proved to be limiting, because it is not portable, doesn't know code pages other than ISO 8859-2, and contains many bugs in libc implementations other than glibc 2.

For this reason we dropped gettext and wrote our own system of language recoding. Languages are added at compile time. If a change is made, the developer has to run a script that regenerates the source files in the C language. The character set translation system works in a similar fashion.
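
The compile-time approach described above can be pictured roughly as follows. This is a minimal sketch and not the Links source: the names text_id, lang_id, translations and get_text, as well as the sample strings, are hypothetical, and the table is written by hand here, whereas Links regenerates its C sources with a script.

```c
#include <stddef.h>

/* Hypothetical text identifiers; a generator script would emit these. */
enum text_id { T_OK, T_CANCEL, T_GOTO_URL, T_COUNT };

/* Hypothetical language identifiers. */
enum lang_id { LANG_ENGLISH, LANG_CZECH, LANG_COUNT };

/* One row per language, one column per text. A NULL entry means
   "not translated"; the English text is used in that case. */
static const char *const translations[LANG_COUNT][T_COUNT] = {
    { "OK", "Cancel", "Go to URL" },
    { "OK", "Storno", NULL },
};

/* Look up a text; out-of-range arguments yield an empty string. */
const char *get_text(enum lang_id lang, enum text_id id)
{
    const char *s;
    if ((int)lang < 0 || (int)lang >= LANG_COUNT ||
        (int)id < 0 || (int)id >= T_COUNT)
        return "";
    s = translations[lang][id];
    return s ? s : translations[LANG_ENGLISH][id];
}
```

Because the table is ordinary constant data, it is linked into the single binary and needs no files at run time; the price is that adding a language requires recompilation.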

Control

We chose to control the browser through a system of simple interactive menus. In the beginning, when the browser still worked only in text mode, we were inspired by the text-mode browser Lynx, which is controlled using hot keys. This kind of control is not very clever, because the user has to remember a number of keys for all possible functions. We kept the control backward compatible with Lynx (the hot keys are therefore the same), because Lynx was widespread at that time and users were accustomed to its control. Moreover, we added simple interactive menus from which the user can invoke all the functions. Those who do not remember the many hot keys therefore have an easy way of control. In text mode we also made it possible to use the mouse (using the libgpm library), which makes control even easier. In graphics mode we changed the control somewhat and put more emphasis on the mouse; for example, movement over links using the keyboard doesn't work and the mouse must be used (translator's remark: now it works, it has been added as a feature).

We tried to make the controls consume a minimum of screen space (which is especially valuable in text mode, where the resolution is small); therefore the menu is not visible at first sight and is invoked by the Escape key. We don't like the overcomplicated controls of other browsers, in which we not only find it hard to orient ourselves, but which also consume a significant part of the screen area. In Links the screen is occupied only by the status line (at the bottom) and the page title (at the top). We categorically rejected a bar for entering the URL and a bar with icons, because they consume space and these functions can be invoked more easily by pressing a key than by trying to hit a button with the mouse.

Fonts

While writing the graphics part we faced the problem of which fonts to use for typesetting. The requirements were easy legibility and scalability. We basically had the possibility to use already existing fonts -- for example from X Window or Ghostscript -- or to distribute our own fonts. We chose our own fonts for reasons of portability and independence from other programs and from the settings of the user's computer.

Then we had a choice between vector fonts and bitmap fonts. We chose the bitmap variant mainly because, to antialias vector fonts sufficiently well, it would be necessary to render them at a much larger resolution than that of our bitmap fonts. This would consume a lot of time; moreover, the rendered bitmap would have to be resampled, which would also take a lot of time. To add or change a vector font, the user would need special typographic tools, whereas with bitmap fonts any graphics program or scanner suffices. If someone wanted to add a font from a book, he would need specialized software for vectorizing a raster image, whereas this way any graphics program (for example Gimp) suffices.

The bitmap files are part of the binary file, which again improves portability, because the fonts don't need to be searched for in various directories, and installation is easier too. The fonts are stored at a high resolution in PNG format (which substantially reduces their size), and during display they are antialiased so that they remain legible even at small sizes (tested in practice: at a size of 8 pixels the antialiased font is still legible, while an X font is already illegible). For better legibility we used the Computer Modern font from Ghostscript. The user can also set the type size smoothly, which people with sight impairment will definitely appreciate.
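
The idea of storing glyphs at a high resolution and antialiasing them on display can be illustrated with a simple box filter. This is only a sketch under stated assumptions: the function name shrink_glyph is hypothetical, the scale factor is restricted to an integer, and the averaging is done directly on the stored byte values; the scaler actually used in Links is not described in this chapter and may well differ.

```c
#include <stddef.h>

/* Shrink an 8-bit grayscale bitmap by an integer factor using a box
   filter: each output pixel is the rounded mean of a factor x factor
   block of input pixels, which turns hard glyph edges into gray levels.
   src holds src_w * src_h bytes, row by row; dst must hold
   (src_w / factor) * (src_h / factor) bytes.
   Returns 0 on success, -1 on invalid arguments. */
int shrink_glyph(const unsigned char *src, size_t src_w, size_t src_h,
                 size_t factor, unsigned char *dst)
{
    size_t dst_w, dst_h, x, y, i, j;
    unsigned long area;

    if (!src || !dst || factor == 0)
        return -1;
    dst_w = src_w / factor;
    dst_h = src_h / factor;
    area = (unsigned long)(factor * factor);

    for (y = 0; y < dst_h; y++) {
        for (x = 0; x < dst_w; x++) {
            unsigned long sum = 0;
            for (j = 0; j < factor; j++)
                for (i = 0; i < factor; i++)
                    sum += src[(y * factor + j) * src_w + x * factor + i];
            /* Add half the area before dividing to round to nearest. */
            dst[y * dst_w + x] = (unsigned char)((sum + area / 2) / area);
        }
    }
    return 0;
}
```

A block that is half black and half white comes out as mid-gray, which is exactly the intermediate value that keeps small type legible.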

Project Result

The result of the project is a single executable binary file that can work in text as well as in graphics mode. The code is written portably, so that it can be compiled without problems on all Unices and also on OS/2. In graphics mode the X-Window system is mainly used, which is the most widely used and most portable graphics system on these operating systems. It also allows the browser to be used across the network on a remote computer. The graphics systems SVGAlib (on Linux) and Pmshell (on OS/2) are supported as well.

All code is written in the ANSI C language so as to be portable across platforms. We recommend the GCC compiler and the GNU Make program for compilation; nevertheless, other compilers should work too. The tools Bison and Flex were used for making the JavaScript interpreter. For easier compilation and portability the programs Autoconf and Automake were used.

No terminal libraries were used for text output, because they are not portable. The output is therefore displayed using standard ANSI terminal escape sequences, with the possibility of switching on various nonstandard extensions in the menu, for example colours, various types of frames and cursor shape.
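
To give an idea of what producing output without a terminal library involves, here is a minimal sketch that builds two common ANSI escape sequences by hand. The function names ansi_goto and ansi_fg are hypothetical, and snprintf is a C99 function, so a strictly ANSI C89 program would have to format the numbers differently.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the cursor-position sequence ESC [ row ; col H into buf
   (row and col are 1-based). Returns the number of characters
   written, or -1 if buf is too small. */
int ansi_goto(char *buf, size_t size, int row, int col)
{
    int n = snprintf(buf, size, "\033[%d;%dH", row, col);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Write the set-foreground-colour sequence ESC [ 3c m into buf, where
   c is one of the eight basic colours 0..7. Returns the number of
   characters written, or -1 on a bad colour or a too small buffer. */
int ansi_fg(char *buf, size_t size, int colour)
{
    int n;
    if (colour < 0 || colour > 7)
        return -1;
    n = snprintf(buf, size, "\033[3%dm", colour);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The resulting strings are written to the terminal as they are; a terminal that does not understand colours is the reason such extensions have to be switchable.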

Javascript

Upcall and internal function implementation notes

This chapter describes deviations of the implementation from the javascript norm. In particular, you will find here a list of things that we implemented differently, and also of those that we implemented in addition to the Javascript 1.1 norm by Netscape Corporation.

In the grammar we additionally had to allow a statement not to end with a semicolon, which makes the source texts ambiguous, to allow mixing of the array operator and member operators (a$[\ ]$.b or a$[\ ][\ ]$ ), and to allow a comment to be opened in the source text (

None of the web browsers we had access to survived this script. All of them became unmanageable. Some of them ferociously created more and more windows, some waited for the user's click. But all of them had to be restarted to become usable again after running this script.

Links doesn't have the slightest problem with such scripts, because with every dialog created by javascript the user has the possibility to stop interpretation of all scripts. An alert appears, and after pressing "Kill script" the user can continue working with the browser.






This script mounts an attack on memory by continuous allocation. It attacks not only the browser, but the user's whole computer. With the Netscape 4.77 browser on Linux it causes an unpleasant quarter of an hour with the disk light on and very slow reactions of the computer. The remaining browsers also crashed or stopped responding and started to overload the system massively, but with Netscape 4.77 the results were the most fatal.

Links ends this script soon, because the script exhausts the memory limit assigned to javascript. The user can set this memory limit in the menu.






The above-mentioned script expects to be saved in the file again-and-again.html . The only tested web browser that neither crashed after interpreting this script nor became unmanageable was the Opera browser on the Linux operating system, which remained usable after slight difficulties.

This script again causes Links no difficulties, because at every URL change the user is asked whether he wishes to approve or dismiss the change. The user again has the possibility to stop interpretation of all scripts immediately in the dialog. Moreover, if the URL is the same as the URL of the current page, nothing is performed, so in this case nothing happens. An attacker can however simply bypass this arrangement by making two scripts that point to each other. But in this case the user is asked at every URL change, so he has the option to stop the "attack".






This script is a slight variation on the previous one; we suppose that the script is stored in the file window-still.html . The only difference is that the script keeps opening new windows, which clogs up the system, or at least makes the browser unusable. Not a single tested browser passed this test. In the case of IE, the Windows 2000 operating system had severe problems with so many windows, which left permanent consequences on a running task manager: it could not be closed, because its windows started to be displayed without the top quarter.

The script again caused no problems for our user, because before every opening or closing of a window the user is asked for confirmation. A small window was displayed asking whether the user wished to open a new window, and the user again has the choice to stop interpretation of the scripts.





}

Infinite recursion is already taken care of in newer browsers. The browsers Opera and Mozilla did nothing; the other browsers (Netscape, IE, Konqueror) crashed almost immediately.

If this script is run in Links, a dialog soon pops up saying that javascript exceeded the limit of maximum function recursion and that the script is being forcibly terminated. The limit can again be set by the user in a menu.
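
A recursion limit of this kind can be enforced with a simple depth counter in the interpreter. The following is a hedged sketch rather than the Links implementation: the structure js_context, its fields and all function names are assumptions made for illustration, and a C function that calls itself stands in for an endlessly recursive script function.

```c
/* Hypothetical interpreter state; the field names are assumptions. */
struct js_context {
    int depth;      /* current nesting of script function calls */
    int max_depth;  /* limit, configurable by the user */
    int killed;     /* set once the script has been terminated */
};

/* Called on entry to a script function. Returns 0 if the call may
   proceed, -1 if the limit was exceeded and the script was killed. */
int js_enter_call(struct js_context *ctx)
{
    if (ctx->killed)
        return -1;
    if (ctx->depth >= ctx->max_depth) {
        ctx->killed = 1;    /* a real browser would show a dialog here */
        return -1;
    }
    ctx->depth++;
    return 0;
}

/* Called on return from a script function. */
void js_leave_call(struct js_context *ctx)
{
    if (ctx->depth > 0)
        ctx->depth--;
}

/* Stand-in for a script function that calls itself without end.
   Returns the number of calls that were allowed to start. */
int run_endless_recursion(struct js_context *ctx)
{
    int calls;
    if (js_enter_call(ctx) != 0)
        return 0;
    calls = 1 + run_endless_recursion(ctx);
    js_leave_call(ctx);
    return calls;
}
```

With max_depth set to 100 the recursion stops after 100 nested calls and the script is marked as killed, so the machine stack of the browser itself is never exhausted.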

We consider the mentioned problems of browsers to be very severe security problems. We hold the opinion that this should not happen to a correct browser under any circumstances. We followed this philosophy from the very beginning of designing javascript, and thus we armoured the browser against all attacks from the javascript side. Javascript is written strictly with the goal that the user has full control over the script under all circumstances.

Used libraries

While writing the project we tried to use as few foreign libraries as possible, so that the code would be as portable as possible and the user would not first have to install megabytes of various libraries when he wants to use Links. We became convinced from our own experience that different versions of libraries are often mutually incompatible (even backwards). Libraries also contain various bugs and security holes, which would be brought into the Links browser this way.

Therefore we employed libraries only for image decoding (jpeg and png) and for work with the SSL protocol. These libraries are very common and relatively stable, and with respect to the complexity of the problems they solve it wouldn't pay off to write our own code. Our own code could contain more mistakes than library code proven by time and practice. Moreover, writing these libraries would require much time for studying the documentation describing these standards, not to mention the time necessary for testing the correctness of the implementation.

In text mode no special libraries are necessary; the standard libc library suffices. For compilation in graphics mode the following libraries are necessary:

The following libraries are not mandatory, but recommended. Without them the browser functions mentioned with each library won't work.

Development and testing

Development environment

Links was developed most of the time on the Linux operating system. Part of it (some code by Mikulas Patocka) was developed on OS/2. For compilation we used mainly the GCC compiler (mostly versions 2.95.x and 2.96.x) and the GNU Make tool (version 3.79). All the code was written in the VIM editor. Autoconf (version 2.13) and Automake (version 1.4) are among the support programs we used. The javascript parser was written using the tools Flex (version 2.5.2) and Bison (version 1.24). The X-Window interface was developed mainly on Linux with XFree86 versions 3.3.6 and 4.0.2. The SVGAlib interface was tested with SVGAlib 1.9.x and 1.4.x. The fonts were processed with the programs GIMP and ImageMagick.

Portability testing

We performed the portability tests on the Unices and machines available in the computer labs and on our own computers (and those accessible to us). We tested on Alpha through an account on www.testdriver.compaq.com.

Support for the other operating systems declared as supported is unfortunately based only on word of mouth, because we had no chance to test it ourselves. We learned about them from users who wrote to us, excited that Links runs even on such systems.

We tried to compile with compilers:

Testing the program, spotting bugs and optimization

During the development we used the method of regression testing. This method is successfully used, for example, in the development of the GCC compiler. In our case it consists of testing the browser and, for every problem (crash, memory leak, infinite loop, stack overwrite or any other error), determining the direct cause and storing the conditions under which the problem occurred. This means downloading the HTML page, storing it on disk, and storing the keys that were pressed. After that it is always possible to reproduce every situation in which some problem or error happened. This makes it possible to check that a given bug has really been removed and that it has not been reintroduced by later code changes. It is only necessary to run an automated script which tests the browser on the problematic inputs.

We introduced this procedure after we discovered that some bugs recur. We tested the browser very intensively, as we used practically no browser other than Links for web browsing.

To remove memory leaks and stray writes into memory we used our own techniques. We wrote wrappers around the functions malloc, free, realloc, ... These wrappers create a red zone around the allocated memory and record the line of code on which the memory was allocated and how many bytes were allocated. Newly allocated memory is filled with a pattern so that use of allocated but uninitialized memory is detected earlier (the pattern usually causes an error or a crash of the program). When memory is freed, the consistency of the red zone is tested and the freed memory is filled with a pre-arranged pattern; at the end of the program it is checked whether all memory blocks have been freed.
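
The wrapper technique described above can be sketched as follows. This is an illustration under assumptions, not the Links code: the names dbg_malloc, dbg_free, dbg_leaks and DBG_MALLOC, the fill bytes and the 16-byte red zone size are all made up for the example, realloc is left out, and 16 bytes is assumed to be a sufficient alignment for the user data.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RED_ZONE_SIZE 16
#define RED_ZONE_BYTE 0xAB  /* guard bytes before and after the block */
#define FRESH_BYTE    0xCD  /* fill for newly allocated memory */
#define FREED_BYTE    0xDD  /* fill for released memory */
#define ALIGNMENT     16    /* assumed sufficient for any data type */

struct dbg_header {
    size_t size;        /* bytes requested by the caller */
    const char *file;   /* where the block was allocated */
    int line;
};

/* Space in front of the user data: header plus front red zone,
   rounded up so that the user data stays aligned. */
#define PREFIX_SIZE \
    ((sizeof(struct dbg_header) + RED_ZONE_SIZE + ALIGNMENT - 1) \
     / ALIGNMENT * ALIGNMENT)

static long dbg_live_blocks = 0;  /* allocated and not yet freed */

/* Layout: [header .. padding][red zone][user data][red zone] */
void *dbg_malloc(size_t size, const char *file, int line)
{
    unsigned char *raw = malloc(PREFIX_SIZE + size + RED_ZONE_SIZE);
    struct dbg_header *h = (struct dbg_header *)raw;
    unsigned char *user;

    if (!raw)
        return NULL;
    h->size = size;
    h->file = file;
    h->line = line;
    user = raw + PREFIX_SIZE;
    memset(user - RED_ZONE_SIZE, RED_ZONE_BYTE, RED_ZONE_SIZE);
    memset(user, FRESH_BYTE, size);
    memset(user + size, RED_ZONE_BYTE, RED_ZONE_SIZE);
    dbg_live_blocks++;
    return user;
}

/* Returns 0 if the block was intact, -1 if a red zone was overwritten. */
int dbg_free(void *ptr)
{
    unsigned char *user = ptr;
    struct dbg_header *h;
    const unsigned char *front, *back;
    size_t i;
    int damaged = 0;

    if (!ptr)
        return 0;
    h = (struct dbg_header *)(user - PREFIX_SIZE);
    front = user - RED_ZONE_SIZE;
    back = user + h->size;
    for (i = 0; i < RED_ZONE_SIZE; i++)
        if (front[i] != RED_ZONE_BYTE || back[i] != RED_ZONE_BYTE)
            damaged = 1;
    if (damaged)
        fprintf(stderr, "red zone damaged: %lu bytes allocated at %s:%d\n",
                (unsigned long)h->size, h->file, h->line);
    memset(user, FREED_BYTE, h->size);  /* make use after free visible */
    dbg_live_blocks--;
    free(h);
    return damaged ? -1 : 0;
}

/* Number of blocks still allocated; nonzero at exit means a leak. */
long dbg_leaks(void)
{
    return dbg_live_blocks;
}

/* Records the allocation site automatically. */
#define DBG_MALLOC(n) dbg_malloc((n), __FILE__, __LINE__)
```

Writing one byte past a block obtained from DBG_MALLOC lands in the rear red zone, so the following dbg_free reports the damage together with the file and line of the allocation.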

With this mechanism we achieved very efficient detection and removal of memory leaks and of writes outside the allocated space (for example array overflows), as well as detection of the use of uninitialized or already deallocated memory. A tool for this purpose already exists -- the electric fence library -- but this library is very inefficient, slows the program down to the point of being unusable, and also introduces bugs into the program that would not exist without it. Therefore we basically did not use this library.

To achieve maximum throughput and optimal code (especially in the graphics routines) we used code profiling. We compiled the program with profiling support and then tested the browser on graphically very demanding inputs. Using the program gprof we then assessed how much time is spent in which function and how long each function takes. This gave us very valuable information for the subsequent intensive manual code optimization, based on knowledge of the compiler and of machine code. In this way we managed to optimize the code very efficiently.

Tools recommended for compilation

While writing the javascript interpreter we used the tools Flex and Bison to generate the automata for lexical and syntax analysis. These tools are not necessary during compilation, because the distribution already ships their outputs -- source code in the C language.

For compilation, the program Make and any ANSI C compiler (GCC recommended) are necessary.

To ensure portability we used the tools Autoconf and Automake. Their purpose is to generate the file Makefile. The programs Autoconf and Automake create a platform-independent script configure, which is run during installation and which generates the file Makefile. Neither Autoconf nor Automake is necessary for installation, as we ship an already generated script configure.

In addition we created a couple of scripts for generating fonts, translations and character set tables. These scripts should be portable across Unix systems, but again, they are not necessary for installation, because the distribution package already contains the C language source files generated this way.

Code taken over from other authors

None of the translations, with the exception of English and Czech, were written by anyone from the author team. The translations were taken over from enthusiasts who sent them to us.

Conclusion

We think that the project has fulfilled its purpose, because not only did we manage to write a quality web browser and learn and try a lot of interesting things, but above all we learned group communication, interface specification and documentation writing, which we think was the main purpose of this project.

The project improved our C programming and taught us how programming should and should not be done. We learned that the resulting code should run as fast as possible, should not contain any bugs, and should fulfil norms and standards; that the code should be portable; and that program function should be limited to the necessary minimum sufficient for effective work of the user. The program should be monolithic and heterogenous. It should be kept in mind during programming that any bug found in the program can be considered a total failure of the program.

The programmer should not yield to creeping featurism (creeping featurism means wrapping more and more functions around the existing program like snow on a snowball). Programming should not be done with bugs, that is, in such a way that the program is first written faulty and afterwards repaired on the basis of bug reports. Programming should not contradict RFCs and standards. Methodology and structure should not be employed where they harm speed and effectiveness. Code readability should not be enforced at the expense of speed, and bugs in other libraries should not be worked around where this collides with simplicity, reliability or speed. The program should not perform unnecessary operations which do nothing. Data should not be filtered through layers of bureaucratic interfaces, not even for the sake of methodology, program or design structure, or code legibility. Bugs in the program should not be thought of as permissible on the grounds that a built-in self-testing mechanism should be able to spot or remove them. Programming should not be done in a way unrelated to the functioning of a real computer. New functions should not be added to the program while it is not flawless. The programmer should not stop programming at the moment when the code works, but should check it after himself as thoroughly as possible. The programmer must not count on the existence of testers who will test the program and must not take the consequences of possible bugs in the program lightly. The program should not be rushed during programming and the programmer should not be forced into shortening the time to market.

Problems during development

We realized during the development of the program that honesty roughly doesn't pay off. Namely it was about adhering to standards. The browser worked exactly according to RFC and protocol specification, however as 90 \ it often happens that on such pages the browser displays less than other browsers, which are unstable and don't adhere to standards. Nevertheless we decided to honour standards anyway, because this is the only way to reach reasonable interoperability in principle, and a solution consisting of emulating the bugs of other programs proved unviable, because the bugs in other programs start to contradict one another soon after such a scheme is introduced.

This problem probably showed most during the development of the javascript interpreter, because javascript interpreters from various vendors differ significantly and the authors of web pages honour standards very little.

Plans for the future

We would like to develop Links further because we think that there is always something to improve. We are also convinced of this by the reactions of some of our users in the mailing list (links-list@linuxfromscratch.org ). For example we could support CSS in HTML and floating objects, and possibly enhance javascript with new constructions to display more pages that are written wrong.

We plan to implement the eval construction, deallocation of unused parts of the tree during runtime, and possibly compression of the source code kept in memory (for example holding code of the same wording only once). Aside from this we want to expand the existing document object model with, for example, the document.all and document.scripts functions and more.

Used materials

Javascript

  1. lecture notes from Data Structures
  2. lecture notes from Compilers
  3. lecture notes from Probabilistic Algorithms
  4. Proceedings of the SIGPLAN 82 Symposium on Compiler Construction, Vol. 17, Num. 6
  5. info flex, info bison
  6. Javascript 1.1 norm by Netscape Corporation, Javascript 1.3
  7. ECMA--262 norm
  8. consultations with Mgr. Mareš, Doc. Sgall, Mgr. Bednárek and colleague Hubička
  9. R. Motwani, P. Raghavan: Randomized Algorithms
  10. Aho, Sethi, Ullman: Compilers: Principles, Techniques and Tools
  11. K. Mehlhorn: Data structures and algorithms
  12. David Gries: Digital Computer Compilers

Graphics

  1. consultations with RNDr. Pelikán
  2. Josef Pelikán: Pokročilá 2D počítačová grafika (Advanced 2D Computer Graphics) (study texts for the lecture on MFF UK)
  3. Josef Pelikán: Počítačová grafika 1 (Computer Graphics 1) (study texts for the lecture on MFF UK)
  4. Charles Poynton: Gamma FAQ
    (http://www.informap.net/$\sim$poynton/GammaFAQ.html )
  5. Charles Poynton: Color FAQ
    (http://www.informap.net/$\sim$poynton/ColorFAQ.html )
  6. norms and standards: JFIF 1.02, ITU T.81, CCIR Recommendation 601, RFC 2083, ISO DIS 10918-1, IEC 61966-2-1, ISO 9241

Graphic drivers

  1. Adrian Nye: Xlib Programming Manual
  2. consultation with the authors of framebuffer and SVGAlib
  3. source code of the Linux kernel
  4. consultation with Mgr. Martin Beran

Credits

Here is a list of people who participated in some way in the development of the browser, mainly through translations into foreign languages. Many other people also took part in the project in their own way by notifying us about various bugs. They are not mentioned here, because they wouldn't fit on these pages. This however takes nothing away from the credit they deserve for testing the browser.

We would therefore like to thank everyone who participated in the development of the browser in any way, be it by testing, reporting or fixing bugs, translating into foreign languages, contributing their own code, or just sending ideas about what to improve.

\halign{ #\hfil&\quad#\hfil\cr
Unai Uribarri&History\cr
Uwe Hermann&Manual page, command line switch\cr
&"-version", opening a link in a new xterm\cr
Alexander Mai&Support for xterm under OS/2, fixing includes\cr
&for AIX, updating manual pages\cr
Dakshinamurthy Karra&Porting to Win NT, storing goto history\cr
Oleg Deribas&Window title and clipboard support in OS/2\cr
Arkadiusz Sochala&Polish translation\cr
Dmitrij M. Klimov&Frames in KOI8-R, Russian translation\cr
Jurij Raškovskij&Updating Russian translation\cr
beckers&German translation\cr
Armon Red&Icelandic translation\cr
Wojtek Bojd\o l&Updating the Polish translation\cr
Serge Winitzki&Updating the Russian translation\cr
Aurimas Mikalauskas&Lithuanian translation\cr
Martin Norback&Swedish translation\cr
Jimenez Martinez,&\cr
Angel Luis,&\cr
David Mediavilla,&\cr
Ezquibela&Spanish translation\cr
Suveg Gabor&Hungarian translation\cr
Gianluca Montecchi&Italian translation\cr
Sergej Boruševskij&No-proxy-for, Ctrl-W filling-in, SSL\cr
Fabrice Haberer-Proust&French translation\cr
Cristiano Guadagnino&Updated Italian translation\cr
Fabio Junior Beneditto&Translation into Brazilian Portuguese\cr
Kaloian Doganov&Bulgarian translation\cr
Baris Metin&Turkish translation\cr
Dmitrij Pinčukov&Ukrainian translation\cr
Taniel Kirikal&Estonian translation\cr
zas@norz.org&Updated French translation\cr
Alberto García&Galician translation\cr
Radovan Staš&Slovenian translation\cr
Marco Bodrato&Twinterm support\cr
Kaloian Doganov&Updating Bulgarian translation\cr
Olexander Kunytsa&Updating Ukrainian translation\cr
Mediavilla David&Updating Spanish translation\cr
Simos Xenitellis,&\cr
Alejandros Diamandidis&Greek codepages and translation\cr
Stefan de Groot&Dutch translation\cr
Carles&Catalan translation\cr
Ionel Mugurel Ciobîcă&Romanian translation\cr
Petr Baudiš&Using "imgtitle" when "alt" is not present,\cr
&adding "LISTING" tag, updating manual page\cr
Muhamad Faizal&Indonesian translation\cr
Peter Naulls&Support for RiscOS\cr
Jonas Fonseca&Danish translation\cr
Miroslav Rudišin&Updating Slovak translation\cr
}