MIDLANDS STATE UNIVERSITY

FACULTY OF SCIENCE AND TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION SYSTEMS

Name: Munyaradzi Zhou
Registration Number: R0433042
Module Title: Dissertation
Module Code: MIM 808
Level: 2
Year: 2010
Dissertation Title: An investigation of web page content's effects on web page load time over the internet. A Case Study of Midlands State University
Supervisor: Mr T. Tsokota ……………………………………………

This Dissertation is submitted in partial fulfilment of the requirements for the Master of Science Degree in Information Systems Management at Midlands State University.

ABSTRACT
This research focused on the effects of web page components, or objects, on web page load time over the internet, and on how they can be modified to reduce load time. The internet is of great importance to the current and future information society. People not on Asymmetric Digital Subscriber Line (ADSL) or faster connections have to spend more time searching for and accessing information online. Even broadband users can suffer from slow-loading web pages, hence there is a need to know how to cut web page content while still maintaining the desirable features of the web site. This research showed that Midlands State University's web site loads slowly, taking an average of 33.397 seconds, and may even fail to load, as indicated by its average downtime of 16.29%. For application performance across the internet to be acceptable, web site load time should average 8 seconds; meeting this threshold improves the image of the site and hence the perceived quality of the services or products offered, thus increasing stakeholders' satisfaction. Institutes whose stakeholders are globally dispersed rely more heavily on their web sites as a means of communication. These institutes need to maintain their market share through web page load time optimization and create a secure and fast flow of information to any user. The researcher used web site speed-measuring software tools (EValid and YSlow), online web site monitoring tools such as IWeb, and a mathematical model to make quantitative and qualitative analyses of web page load time in relation to web page content. The researcher recommends that web developers measure web page response time against threshold values, such as the 8-second rule, during the design and implementation of a web page, and adopt the identified techniques that reduce load time. The research shows that response time increases in direct proportion to the total size of web page content components, and that different content components have different effects on web page load time. The evidence in this study shows that web page load time is mainly determined by the characteristics of the page's content.


ACKNOWLEDGEMENTS

Firstly, I thank the Almighty for giving me the health and strength to do this work. I thank Mr T. Tsokota, my encouraging supervisor and coordinator, for the constructive criticism that shaped this thesis. This work would not have been possible without his generosity and support.

The comments received from the department members Miss T. G Gwanzura, Mr W Mtembo, Mr F Madzikanda, Mr M Giyane, Mr T Musiiwa, Mr T. G Rebanowako, Miss T. Tagwi and Miss A Mavhunga during the review process were also very helpful and detailed.

I also appreciate the advice, support, and contributions of my Executive Dean, Mr A. Mukwembi, towards this work.

I am also thankful for the support of all the members of my family. Finally, my beloved wife Beang-Tsohle who supported me morally and mentally – thank you.


DECLARATION

I, Zhou Munyaradzi, do hereby declare that this dissertation is the result of my own investigation and research, except to the extent indicated in the Acknowledgements, References and by comments included in the body of the report, and that it has not been submitted in part or in full for any other degree to any other university.

Student Signature: ........................................................    Date: ........................................................
M. Zhou

Supervisor Signature: ........................................................    Date: ........................................................
Mr T. Tsokota

Chairman Signature: ........................................................    Date: ........................................................
Mr A. Mukwembi

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
DECLARATION
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS AND ABBREVIATIONS
CHAPTER 1: INTRODUCTION
1.1 INTRODUCTION
1.2 BACKGROUND OF THE STUDY
1.3 PROBLEM STATEMENT
1.4 OBJECTIVES
1.5 RESEARCH QUESTIONS
1.6 METHODOLOGY
1.7 JUSTIFICATION
1.8 RESEARCH LIMITATIONS
1.9 CONCLUSION
CHAPTER 2: LITERATURE REVIEW
2.1 INTRODUCTION
2.2 USER'S PERCEPTION ON RESPONSE TIME AND THE 8 SECONDS RULE
2.3 PERFORMANCE METRICS
2.4 COMPUTATION OF PAGE LOAD TIME
2.4.1 Active/Synthetic Monitoring
2.4.2 Passive Monitoring/Real User Monitoring
2.4.3 Server-side Measurements
2.5 WEB SITE OPTIMIZATION
2.5.1 Server and Client Hardware
2.5.2 Content Distribution Network
2.5.3 Server software and web site programming software
2.5.4 Web Standards (HTTP 1.0/1.1)
2.5.5 Delayering and parallelization/Load balancing
2.5.6 Web Caching
2.5.7 Web performance tuning, configuration and acceleration
2.5.8 Browser technology
2.6 CONCLUSION
CHAPTER 3: RESEARCH METHODOLOGY
3.1 INTRODUCTION
3.2 RESEARCH PHILOSOPHY
3.2.1 Positivism
3.2.2 Interpretivism
3.3 RESEARCH FRAMEWORK AND APPROACH
3.3.1 Research Instruments
3.4 SAMPLING LOCATIONS
3.5 QUALITATIVE AND QUANTITATIVE DATA ANALYSIS
3.6 RESEARCH LIMITATIONS
3.7 CONCLUSION
CHAPTER 4: DATA ANALYSIS, FINDINGS AND DISCUSSION
4.1 INTRODUCTION
4.2 YSLOW AND EVALID EXPERIMENTS
4.2.1 JavaScripts position
4.2.2 Expire headers
4.2.3 Static components
4.2.4 Images
4.3 UPTIME AND DOWNTIME EXPERIMENTS USING UPTRENDS
4.4 WEB SITE LOAD TIME TEST RESULTS (24*7)
4.5 INTERNET SUPERVISION WEB SITE LOAD TIME TEST RESULTS
4.6 MULTI-BROWSER WEB SITE PERFORMANCE TEST RESULTS
4.7 DISCUSSION AND FINDINGS
4.8 CONCLUSION
CHAPTER 5: CONCLUSION, RECOMMENDATIONS AND FUTURE RESEARCH
5.1 INTRODUCTION
5.2 CONCLUSION
5.3 RECOMMENDATIONS
5.4 LIMITATIONS
5.5 FUTURE RESEARCH
REFERENCES
APPENDICES


LIST OF FIGURES

Figure 2.1 Savoia's response time formula
Figure 2.2 NetForecast's response time formula
Figure 2.3 Factors affecting response time
Figure 2.4 Browser market share for North America and Asia by June 2010
Figure 4.1 Uptime and downtime sample report of the web page before modification
Figure 4.2 Uptime and downtime sample report of the web page after modification
Figure 4.3 Active probing report using Site24x7

LIST OF TABLES

Table 2.1 User's view on response time
Table 2.2 Connections per server
Table 3.1 Countries and cities selected for monitoring
Table 4.1 Object type and size
Table 4.2 Effects of object size on response time
Table 4.3 External objects
Table 4.4 Browser and Location Response Time Differences (seconds)

LIST OF ACRONYMS AND ABBREVIATIONS

ADSL    Asymmetric Digital Subscriber Line
CDN     Content Delivery Network
CPU     Central Processing Unit
CSS     Cascading Style Sheets
DNS     Domain Name System
HTML    Hyper Text Mark-up Language
HTTP    Hyper Text Transfer Protocol
ICT     Information Communication Technology
IE      Internet Explorer
ISPs    Internet Service Providers
LAN     Local Area Network
MSU     Midlands State University
RAM     Random Access Memory
RTT     Round Trip Time
SEO     Search Engine Optimization
TCP     Transmission Control Protocol
W3C     World Wide Web Consortium

CHAPTER 1: INTRODUCTION

1.1 INTRODUCTION
Web sites play an important role in academic institutions, especially universities and colleges, which use them as a method of teaching and a means of online internationalization. Internet users demand the same level of performance that they experience when connecting to applications via the Local Area Network (LAN). Typically, web sites on a LAN load faster, mainly because the web page content is stored locally and requires less load time (Eland, 2008). Universities may have globally dispersed stakeholders who study at campuses or pursue their studies online, and such institutions now rely more on their web sites as a tool to convey information. The load time of a web site is one of the most important factors affecting its usability; most internet users will simply skip a site altogether if it fails to load within an average of 8 seconds (Zona Research Group, 2003). To increase its global market share, the institute must keep its web page load time within accepted standards.

1.2 BACKGROUND OF THE STUDY
Information Communication Technology is a strategic goal of Midlands State University (MSU), and the university now relies more on its web site and web portal. The web site has e-learning facilities where students access their learning materials (notes, assignments, continuous assessment marks, module course outlines, and so on), a chat panel between student and lecturer, and news and updates through a notice board for both staff and students. The university's degree programmes include conventional, parallel, block-release and visiting-student modes. Block-release and visiting school students usually access their e-learning accounts off campus, since they are employees in different regions of Zimbabwe and abroad; the same applies to international students and potential international students during their holidays. These users access the site over the internet, so the load time of the web site needs to be analysed.

1.3 PROBLEM STATEMENT
A one-second delay in web page response time leads to a 16% decrease in customer satisfaction (Aberdeen Research, 2010). Midlands State University's mission statement on exploiting ICT is being implemented, and there is a need to meet international standards, as the university uses the web site as a means of teaching, admission of international students, collaboration, e-learning and so on. The institute collaborates, and is in the process of collaborating, with different universities around the globe and with national colleges such as Trust Academy and Educare, all of which access the web site over the internet. MSU's web site is mainly used as a medium of communication. Various groups of students, such as visiting and block-release students, access the site externally, so web page load time is an important issue: these users report that the web site responds slowly and sometimes fails to load completely. Having more diversified web users means there is a need to meet more diversified user needs, which challenges the web developers and the system's capacity.

1.4 OBJECTIVES
From the problem statement, a study on how to optimize web page load time was done through an analysis of the following:
1. Identification of how to reduce web page load time through methods such as compression and web caching.


2. Maintaining fewer round trips for a loading page by reducing the number of HTTP requests.
3. Identification of the web page components and objects that affect web page load time.
4. Finding out whether web page load time differs between the two major browsers, Internet Explorer and Firefox.
5. Finding the relationship between web page size and its uptime/downtime.

1.5 RESEARCH QUESTIONS
The research questions answered in this research gave an in-depth analysis of how web page content characteristics affect web page load time over the internet. The following questions were addressed:
1. How can web site load time be reduced through techniques such as caching and compression?
2. Does the number of HTTP requests have a direct relationship with latency?
3. How do web page objects such as image files affect web site load time, and how can these objects be optimized to reduce load time?
4. Does web page load time differ between the two major browser technologies, Internet Explorer and Firefox?
5. How does web page weight relate to web page uptime/downtime?

1.6 METHODOLOGY
A convenience sampling method was used: the researcher used Midlands State University's web site (his workplace web site) as a case study since it is easily accessed over the intranet. The researcher used online web site monitoring tools to make quantitative and qualitative analyses of web page load time from locations including Los Angeles and New York (US), Gloucester (UK), Dortmund (Germany), St. Petersburg (Russia), Sydney (Australia) and so on. Mainly a quantitative research method was used, employing page-speed and page-test measurement tools. YSlow, a software tool, was used to analyse web page components, since it analyses web page performance and shows why performance is slow. The EValid software tool was used since it clearly shows the identifiable characteristics of a web page and their sizes in kilobytes. A mathematical model was developed and was verified using the online measurement tool IWeb.

1.7 JUSTIFICATION
This research will help to cement the institute's mission on how it can exploit Information Communication Technology on web sites. The Information Technology Services software development team will also benefit from the recommendations to improve the site's performance globally. At many organizations, web-based applications are the backbone of business-critical processes such as e-commerce operations, financial transactions, and media. End users connect directly to applications to initiate and complete transactions from their web browsers, so the success or failure of web-based applications depends on fast and reliable access for the end users. This is particularly critical for e-commerce applications, where even small differences in response times can have a dramatic effect on metrics such as page views, number of searches, and site revenue. Web site optimization will help enhance the organization's image and increase stakeholders' loyalty.


One of the benefits is helping to protect and even increase online revenue by preventing downtime for critical systems. Another related benefit of preventing downtime is preserving the reputation of the company.

1.8 RESEARCH LIMITATIONS
Freeware laboratory tools and demo software were used, since the researcher had no funding to buy premium tools. Active probing, which was used to carry out web site monitoring, may not accurately provide information such as the web browsers used by end-users, and the internet connection speeds of the synthetic clients may not adequately represent the internet connection speeds of the actual end-users. Despite these constraints, active probing still provides useful data for measurement and monitoring of web sites, in the sense that the information obtained is very similar to what might be experienced by the end-users.

1.9 CONCLUSION
The rest of this thesis is organized as follows. Chapter 2 presents the literature review, an analysis of all relevant research that bears on web page optimization. Chapter 3 presents the methodological framework for the research. Chapter 4 evaluates the work by reporting its results with reference to the methodology discussed in the previous chapter. Chapter 5 concludes the work, gives recommendations and discusses future extensions.


CHAPTER 2: LITERATURE REVIEW

2.1 INTRODUCTION
The emergence of the World Wide Web (Web) as a mainstream development platform has yielded many hard problems for application developers, who must provide a high quality of service to application users. Strategies for improving client performance include client-side caching and caching proxy servers (Williams, 1996). However, performance bottlenecks persist on the server side. These bottlenecks arise from factors such as inappropriate choice of concurrency and dispatching strategies, excessive file system access, and unnecessary data copying (Hu et al., 2007). According to King (2008), as bandwidth has increased (more than 63% of the US is now on broadband), so have user expectations towards response times. While average web page size and complexity have increased significantly since the 1990s, to over 315 KB and 50 total objects, users expect faster response times with their faster wireless and broadband connections. Current guidelines on response times fall into two categories: faster response times for broadband users (3-4 seconds) and slower response times for dial-up users (on the order of 8-10 seconds).

2.2 USER'S PERCEPTION ON RESPONSE TIME AND THE 8 SECONDS RULE
Gomez (2009) cited a direct correlation between web performance and business results. If page response time jumps from 2 seconds to 8 seconds, page abandonment increases by 33%. When web site performance is slow during peak periods, more than 75% of consumers said they would go to a competitor's site (Gomez and Equation Research, 2010).

According to two surveys conducted by Forrester Research and Gartner Group (2009), e-commerce sites are losing $1.1 to $1.3 billion in revenue each year due to customer click-away caused by slow-loading web sites. For best spider results, keep the web page load time below 12 seconds on a 56k modem. It is likely that if a site does not start to load within 8 seconds, a visitor will go to another web site. This means that a page should not total more than about 30 kilobytes, including text, graphics, HTML, JavaScript, etc. Table 2.1 shows different response times with respect to the user's view (Subraya, 2006).

Table 2.1 User's view on response time

Response time    User's view
<0.1 second      User feels that the system is reacting instantaneously.
<1.0 second      User experiences a slight delay but is still focused on the current web site.
<10 seconds      This is the maximum time a user keeps focus on the web site, but his attention is already in the distraction zone.
>10 seconds      User is most likely to be distracted from the current web site and loses interest.

Industry-wide annual losses due to violation of the eight-second rule show the concern over slow-downloading pages, as reported by Zona Research Group (2003). Web pages must load in less than 10 seconds; beyond that point, there seems to be a significant increase in user frustration, perception of poor site or product quality, and simply giving up on the site (Proctor, 2005; Subraya, 2006). When quality-of-service ratings are plotted as a function of page load time, there is a dramatic drop in the percentage of good ratings between 8 and 10 s, accompanied by a corresponding jump in the percentage of poor ratings (Proctor and Vu, 2005).

2.3 PERFORMANCE METRICS
The hit rate is the percentage of all requests that can be satisfied by searching the cache for a copy of the requested object. The byte hit rate represents the percentage of all data that is transferred directly from the cache rather than from the origin server. Another web performance metric is response time, or latency. Maximizing the hit rate reduces latency more effectively than policies designed to reduce response times (Cao and Irani, 1997). The term latency refers to the connection time from start to finish when a web page completely loads (Rodriguez et al., 2001). Response time, or latency, is the time elapsed between two related events, a start and a stop event. In web-based systems, there are two different definitions of response time from the user's perspective, distinguished by the descriptions of the stop event:

(a) the time elapsed from the moment the user requests a web page until the requested page is displayed in its entirety on the user's machine;
(b) the time elapsed between the start of the request and the beginning of the response, i.e. when the page starts displaying on the user's machine.

One of the issues about response time that draws the attention of web suppliers, especially e-commerce practitioners, is how long users are willing to wait for a web page to be downloaded before giving up (King, 2008; Pathan and Buyya, 2008).
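To make the hit rate and byte hit rate concrete, the short sketch below computes both from a hypothetical list of (served_from_cache, object_size_in_bytes) request records. The record format and the numbers are illustrative assumptions, not data from this study.

# Hypothetical request log: (served_from_cache, object_size_in_bytes).
requests = [
    (True, 12_000),   # cached style sheet
    (False, 95_000),  # image fetched from the origin server
    (True, 4_500),    # cached script
    (False, 48_000),  # HTML page fetched from the origin server
]

cache_hits = sum(1 for hit, _ in requests if hit)
bytes_from_cache = sum(size for hit, size in requests if hit)
total_bytes = sum(size for _, size in requests)

hit_rate = 100.0 * cache_hits / len(requests)           # % of requests answered from the cache
byte_hit_rate = 100.0 * bytes_from_cache / total_bytes  # % of bytes served from the cache

print(f"hit rate: {hit_rate:.1f}%, byte hit rate: {byte_hit_rate:.1f}%")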

2.4 COMPUTATION OF PAGE LOAD TIME
A rough approximation of the load time for a page can be calculated by dividing the total page size, in Kbytes, by the minimum bandwidth, in Kbytes/s, of the user's connection (Proctor and Vu, 2005; Souders, 2009). Quick-loading web sites depend partly on the response of the web server (including the server's web connection and the current load on the server), partly on the size of the web page asked for by the user, and partly on the user's web connection, processor, etc. So managing load time is complex (Snitker, 2004).
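As a quick illustration of this rough approximation, the sketch below estimates the raw transfer time of the 315 KB average page mentioned in Section 2.1 over a 56 kbit/s dial-up line and over an assumed 1 Mbit/s broadband line. The estimate deliberately ignores latency, turns and processing time, which the fuller formulas that follow take into account.

def rough_load_time(page_size_kbytes: float, bandwidth_kbits_per_s: float) -> float:
    """Rough load-time estimate: page size divided by minimum bandwidth (transfer time only)."""
    return (page_size_kbytes * 8) / bandwidth_kbits_per_s  # kilobytes -> kilobits, then seconds

# 315 KB average page (King, 2008) over dial-up versus an assumed 1 Mbit/s broadband link.
print(f"56 kbit/s modem: {rough_load_time(315, 56):.0f} s")    # roughly 45 s
print(f"1 Mbit/s link  : {rough_load_time(315, 1000):.1f} s")  # roughly 2.5 s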

According to Sevcik and Bartlett (2001), response time in seconds can be computed as follows:

R = 2(D + L + C) + (D + C/2)((T - 2)/M) + D ln((T - 2)/M + 1) + max(8P(1 + OHD)/B, DP/W) / (1 - sqrt(L))

where:
B = minimum line speed (bits per second)
C = Cc + Cs
Cc = client processing time (seconds)
Cs = server processing time (seconds)
D = round trip delay (seconds)
L = packet loss (fraction)
M = multiplexing factor
OHD = overhead (fraction)
P = payload (bytes)
R = response time (seconds)
T = application turns (count)
W = window size (bytes)

Savoia's contribution was in simplifying Sevcik and Bartlett's formula for page response time as experienced by the user. Figure 2.1, adapted from Savoia (2001), shows this simplification.

Figure 2.1 Savoia's response time formula

Savoia's formula makes several generalizations and assumptions, and its accuracy varies through the possible range of values (it tends to overestimate below eight seconds and underestimate over eight seconds). Even that explanation hides some further assumptions: his reference to eight seconds involves assumptions about typical connection latency and bandwidth, typical web page sizes, and the typical number of elements (separately downloadable files) that make up a web page. The factors in the simplified formula are described below; a small worked sketch of the estimate follows the list.

• Page size: Page size is measured in Kbytes, and on the surface the impact of this variable is pretty obvious: the larger the page, the longer it takes to download. When estimating page size, however, many people fail to consider all the components that contribute to page size: all images, Java and other applets, banners from third sources, etc.

• Minimum bandwidth: Minimum bandwidth is defined as the bandwidth of the smallest pipe between your content and the end user. Just as the strength of a chain is determined by its weakest link, the effective bandwidth between two end points is determined by the smallest bandwidth between them. Typically the limiting bandwidth is between the users and their ISPs.

• Round trip time: In the context of web page response time, round-trip time (RTT) indicates the latency, or time lag, between the sending of a request from the user's browser to the web server and the receipt of the first few bytes of data from the web server at the user's computer. RTT is important because every request/response pair (even for a trivially small file) has to pay this minimum performance penalty.

• Turns: A typical web page consists of a base page [or index page] and several additional objects such as graphics or applets. These objects are not transmitted along with the base page; instead, the base page HTML contains instructions for locating and fetching them. Unfortunately for end-user performance, fetching each of these objects requires a fair number of additional communication cycles between the user's system and the web site server, each of which is subject to the RTT delay.

• Server processing time: The last factor in the response time formula is the processing time required by the server and the client to put together [i.e. generate and render] the required page so it can be viewed by the requester. This can vary dramatically for different types of web pages. On the server side, pages with static content require minimal processing time and will cause negligible additional delay. Dynamically created pages (e.g., personalized home pages like my.yahoo.com) require a bit more server effort and computing time, and will introduce some delay. Finally, pages that involve complex transactions (e.g., credit card verification) may require very significant processing time and might introduce delays of several seconds.

• Client processing time: On the client side, the processing time may range from trivial (for a basic text-only page) to moderate (for a page with complex forms and tables) to extreme. If the page contains a Java applet, for example, the client's browser will have to load and run the Java interpreter, which can take several seconds (Savoia, 2001).
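The factors above can be combined into a simple additive estimate in the spirit of Savoia's simplified formula: payload transfer time, plus one round trip per turn, plus server and client processing time. The sketch below is an approximation under that assumption only, and the example values are invented for illustration.

def estimate_response_time(payload_kbytes: float,
                           bandwidth_kbits_per_s: float,
                           rtt_s: float,
                           turns: int,
                           server_time_s: float,
                           client_time_s: float) -> float:
    """Additive response-time estimate: transfer time + turns * RTT + processing times."""
    transfer_time_s = (payload_kbytes * 8) / bandwidth_kbits_per_s
    turn_time_s = turns * rtt_s
    return transfer_time_s + turn_time_s + server_time_s + client_time_s

# Invented example: a 300 KB page with 50 objects viewed over a 2 Mbit/s link with 80 ms RTT.
print(estimate_response_time(payload_kbytes=300, bandwidth_kbits_per_s=2000,
                             rtt_s=0.08, turns=50,
                             server_time_s=0.3, client_time_s=0.2))  # about 5.7 seconds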

NetForecast in September 2006 modified Savoia's formula by replacing the equals sign with a curly equals sign, signifying "is approximately equal to". Figure 2.2, adapted from NetForecast (2006), shows this modification of Savoia's formula.

Figure 2.2 NetForecast’s response time formula.


Figure 2.3, adapted from NetForecast (2006), shows six factors that affect response time.

Figure 2.3 Factors affecting response time

It is possible to evaluate the relative contributions of bandwidth and latency to overall response time by simply comparing the two factors [Payload/Bandwidth] and [Turns x RTT] in typical web environments. From an analysis of the equations, page size is the main attribute affecting web page load time, and it is the attribute on which the researcher focuses. Page load time measurement methods for a web site can be grouped into three categories:

• Active/Synthetic Monitoring,
• Passive Monitoring/Real User Monitoring, and
• Server-side measurement.

The three categories are described in detail below.

2.4.1 Active/Synthetic Monitoring
With active probing, a few geographically distributed synthetic clients, called agents, are used to periodically probe the server by requesting a set of web pages or operations (the agents collect performance data from scheduled tests that simulate the way users interact with web sites and view video content). The agents mimic users from different locations around the world. The measurements obtained are representative of the latencies that may be experienced by the end users. Active probing therefore produces real-world results from simulated end users. In order to produce results representative enough of the massive internet traffic from largely diversified users across the world with varied connection speeds, active probing requires machines with different capabilities to be set up in many different locations and large amounts of measurement to be taken daily, as is done by the American InternetSupervisionTM company (Gomez, 2009).

2.4.2 Passive Monitoring/Real User Monitoring
Client-side measurements make use of instrumentation such as scripting languages or specialized software at the client side to acquire the desired information. Cookies might be used to record information at the client side. A cookie is a small parcel of information issued by the web server to a web browser to uniquely and anonymously identify the user of that particular browser (Clickstream, 2003). Rajamony and Elnozahy (2001) used JavaScript to instrument hyperlinks in a set of web pages for measuring response time. When an instrumented link is activated, the current time is determined and remembered, and then the request is sent to the web server. After the requested page and its embedded elements have been fully loaded into the browser, the client browser computes the response time as the difference between the current time and the previously stored time. The response time can then be transmitted to a record-keeping web site on a different server from the originally responding web server for further analysis.

2.4.3 Server-side Measurements
Server-side measurement performs measurements of various performance metrics at the web server. Server log analysis is the most commonly used method. A web server log contains fields that describe each request a browser makes from the server (Rosenstein, 2000; Gomez, 2009).

2.5 WEB SITE OPTIMIZATION
To catch the attention of web site users, web page load time needs to be improved through the different approaches outlined below.

2.5.1 Server and Client Hardware
Hardware forms the structure on which the web operates. Hardware making up the client, server and network has an impact on response time. On the client and server sides, hardware aspects that are commonly related to response time include the processing power of the central processing unit (CPU), the capacity and speed of the memory (random access memory, RAM), cache, bus, and disk. The single biggest hardware issue affecting web server performance is RAM. A web server should never have to swap, as swapping increases the latency of each request beyond a point that users consider "fast enough". This causes users to hit stop and reload, further increasing the load. One can control the maximum clients setting so that the server does not spawn so many children that it starts swapping. The procedure for doing this is simple: determine the size of your average Apache process by looking at your process list via a tool such as top, and divide this into your total available memory, leaving some room for other processes. Beyond that, the rest is mundane: the web server host must acquire a fast enough CPU, a fast enough network card, and fast enough disks, where "fast enough" is something that needs to be determined by experimentation. Capacity testing is conducted under normal load to determine the spare capacity, whereas stress capacity is determined by overloading the system until it fails (a stress load), to determine the maximum capacity of a system (Subraya, 2006). A rough sizing calculation in this spirit is sketched below.
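The sketch below turns that rule of thumb into a small calculation. The memory figures are hypothetical, and the directive name MaxClients refers to the classic Apache prefork setting only as an example; the result should be treated as a starting point for experimentation rather than a definitive value.

def max_clients(total_ram_mb: float,
                reserved_for_other_processes_mb: float,
                avg_apache_process_mb: float) -> int:
    """Divide the memory left over for the web server by the average Apache process size."""
    available_mb = total_ram_mb - reserved_for_other_processes_mb
    return int(available_mb // avg_apache_process_mb)

# Hypothetical server: 2 GB of RAM, 512 MB reserved for the OS and other services,
# and an average Apache child process of 25 MB (as observed with a tool such as top).
print(max_clients(total_ram_mb=2048,
                  reserved_for_other_processes_mb=512,
                  avg_apache_process_mb=25))  # -> 61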

2.5.2 Content Distribution Network
The user's proximity to a web server has an impact on response times. Deploying content across multiple, geographically dispersed servers will make pages load faster from the user's perspective. A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users (Pallis and Vakali, 2006; Vakali and Pallis, 2003; Peng, 2003). The server selected for delivering content to a specific user is typically based on a measure of network proximity; for example, the server with the fewest network hops or the server with the quickest response time is chosen (Johnson et al., 2000). Some large internet companies own their own CDN, but it is often cost-effective to use a CDN service provider such as Akamai Technologies (the market leader, with 80%), Mirror Image Internet, or Limelight Networks. For start-up companies and private web sites, the cost of a CDN service can be prohibitive, but as the target audience grows larger and becomes more global, a CDN is necessary to achieve fast response times. Switching to a CDN is a relatively easy code change that will dramatically improve the speed of a web site, as a single CDN content distributor links the origin server in the US with surrogate servers in places such as Japan, the United Kingdom, and Australia. It also reduces the need to invest in more powerful and yet expensive servers or more bandwidth in order to cope with an increasing user population as well as more demanding applications and web content. Meanwhile, it also improves site availability by replicating static content in many distributed locations. A CDN is normally used to serve static content such as images or multimedia objects; however, the use of CDN techniques to serve dynamic data is increasing (Pallis and Vakali, 2005).

2.5.3 Server software and web site programming software
The performance of web servers is critical to the success of many corporations and organizations. On average, Apache spends about 20-25% of the total CPU time on user code, 35-50% on kernel system calls and 25-40% on interrupt handling. For systems with small RAM sizes, web server performance is limited by the disk bandwidth. For systems with reasonably large RAM sizes, the TCP/IP stack and the network interrupt handler are the major performance bottlenecks. Apache shows similar behavior on both uniprocessor and SMP systems (Hu et al., 2001). JAWS is competitive with state-of-the-art commercial web server implementations. The research revealed that JAWS does not perform as well as Enterprise or IIS for small files; however, as the file size grows, JAWS overtakes the other servers in performance. Two conclusions can be drawn from these results.

First, there are still performance issues beyond the scope of this paper that require research to determine how to improve JAWS performance for transferring small files. Second, it affirms our hypothesis that a web server can only achieve optimal performance by employing adaptive techniques (Hu et al., 2007). Many software technologies are involved in different aspects of the Web, including client and server operating systems, web browsers, server software, databases, middleware architectures, and scripting engines and languages. There are some relevant performance tips given by Killelea (2002):

• The web browser seldom becomes the bottleneck that causes lengthy response times, but its settings may well be tuned to achieve slightly better performance. For example, not verifying cached pages against the server makes retrieving the cached pages faster, even though it risks the user viewing out-of-date pages.

• On identical PC hardware, Linux generally gives better performance as a web client than Windows.

• UNIX is more stable and has better performance than other server operating systems because of its longer development history and open nature.

• To improve database performance, actions that can be taken include using precompiled SQL statements called prepared statements, caching the results of the most frequently used queries, and using a connection pool rather than setting up a connection for each database query.

2.5.4 Web Standards (HTTP 1.0/1.1)
According to Manjhi (2008), HTTP (Hyper Text Transfer Protocol) is an application-level protocol over which all web traffic flows. It defines how clients (browsers, spiders, etc.) request web pages from web servers and how the web servers transfer web pages to the clients.

All HTTP traffic takes place over TCP, a reliable transport-layer protocol. Each HTTP message is either a request or a response. HTTP/1.1, the latest version of HTTP, uses persistent TCP connections with pipelining. This makes it better than its previous versions in the following ways:

• Previous versions of HTTP used non-persistent connections, in which a separate HTTP connection was required for each object referenced in the HTML base page. Thus, each object suffered a minimum delay of two round-trip times; the minimum total delay for accessing a page that referenced ten inline images was therefore twenty RTTs. The problem was partially alleviated by using multiple parallel connections.

• TCP slow start further compounded the delay. The delay introduced assumed more importance in view of the small average size of web objects (roughly 4 KB).

HTTP/1.1 is currently the protocol of choice for web transactions. HTTP/1.1 includes a number of elements intended to make caching work as well as possible. The goal is to eliminate the need to send requests in many cases by using an expiration mechanism, and to minimize the need to send full responses in many other cases by using validations. The basic cache mechanisms in HTTP/1.1 (server-specified expiration times and validators) are implicit directives to caches. In addition, the server or client uses the Cache-Control header to provide explicit directives to HTTP caches; a small sketch for inspecting these caching headers on a live response is given below.
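As an illustration of the expiration and validation mechanisms just described, the sketch below fetches a URL and prints the response headers that govern HTTP/1.1 caching. The URL is a placeholder, and the script only reports what a server sends; it does not implement a cache itself.

import urllib.request

def show_caching_headers(url: str) -> None:
    """Print the HTTP/1.1 response headers that control caching for a single resource."""
    with urllib.request.urlopen(url) as response:
        for header in ("Cache-Control", "Expires", "Last-Modified", "ETag"):
            print(f"{header}: {response.headers.get(header, '(not sent)')}")

# Placeholder URL; replace it with the page under study.
show_caching_headers("http://www.example.com/")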

2.5.5 Delayering and parallelization/Load balancing
Web servers need to handle many requests concurrently and therefore need to perform multithreading or multitasking in order to achieve parallelism. Additional parallelism can be achieved by using multiple servers in conjunction with a load balancer. The function of a load balancer is to distribute requests among the servers. One method of load balancing requests to servers is via DNS servers (King, 2008; Souders, 2009).

2.5.6 Web Caching
Web caching systems can lead to significant bandwidth savings, server load balancing, perceived network latency reduction, and higher content availability (Barish, 2000). Caching is the most important and most widely used performance improvement technique for web-based systems (Killelea, 2002). The idea of caching is to keep frequently accessed data at locations close to the clients, such as the client browsers or web proxy servers. Retrieving data from these caching locations not only reduces transmission time across the internet, but also reduces the workload imposed on the web server. Thus, caching trades storage space and currency of content for access speed. According to Manjhi (2008), there are several incentives for having a caching system in a network, and these include:

Web caching reduces bandwidth consumption, thereby decreasing network traffic and lessening network congestion.



It reduces access latency due to two reasons:

(a) Frequently accessed documents are present in one of the nearby caches and thus can be retrieved faster (transmission delay is minimized). (b) Due to the previous reason, network traffic is reduced and the load on origin servers gets reduced; thus, documents not cached can also be retrieved relatively faster.

Web caching reduces the workload of the web server.



If the remote server is not available due to a server crash or network partitioning, the client can obtain a cached copy at the proxy. Thus, the robustness of the web service is enhanced.



It allows information to be distributed more widely at a low cost (as cheaper servers can be installed).

2.5.6.1 Types of Web caches

Browser caches
Browsers and other user agents benefit from having a built-in cache. A browser cache is limited to just one user, or at least one user agent; thus it gets hits only when the user revisits a page. When the web server sends an explicit expiration time for each file, caching can eliminate the request-and-response routine; otherwise the browser will need to validate its cache (Wessels, 2001; Addison, 2006).

Caching proxies
Unlike browser caches, proxy caches service many different users at once. Since many different users visit the same popular web sites, caching proxies usually have higher hit ratios than browser caches. Caching proxies are essential services for many organisations including ISPs, corporations and schools. Caching proxies are normally located near network gateways (i.e. routers) on the organization's side of its internet connection. In other words, a cache should be located to maximize the number of clients that can use it, but it should not be on the far side of a slow, congested network link. A proxy splits a web request into two separate TCP connections, one to the client and the other to the server. Network administrators like interception caching because it reduces their administrative burdens and increases the number of clients using the cache.

Surrogates
Surrogates are also called reverse proxies or server accelerators, and include other devices that pretend to be origin servers. CDNs use surrogates to replicate information at many different locations. Another common use for surrogates is to accelerate slow web servers; acceleration is accomplished simply by caching the server's responses. Surrogates are also often used to decrypt HTTP/TLS connections. Such decryption requires a fair amount of processing power; rather than putting that burden on the origin server itself, a surrogate encrypts and decrypts the traffic. Surrogates that cache origin server responses are not much different from caching proxies. The surrogate hit ratio is high, about 90% or more (Wessels, 2001).

2.5.7 Web performance tuning, configuration and acceleration
Web performance tuning is about getting the best possible performance from the web, and this includes tuning web server software, streamlining web content, getting optimal performance from a browser, tuning both client and server hardware, and maximizing the capacity of the network itself (Killelea, 2002). A technique for improving the performance of web sites is to cache data at the site so that frequently requested pages are served from a cache, which has significantly less overhead than a web server. Such caches are known as HTTP accelerators or web server accelerators. In certain situations, it is desirable to scale a web server accelerator to contain more than one processor. This may be desirable for several reasons:

• Multiple nodes provide more cache memory. Web server accelerators have to be extremely fast; in order to obtain optimal performance, all data should be cached in main memory instead of on disk. Multiple processors provide more main memory for caching data.

• Multiple processors provide higher throughputs than a single node.

• Multiple processors functioning as accelerators can offer high availability. If one accelerator processor fails, one or more other accelerator processors can continue to function.

• In some situations, it may be desirable to distribute an accelerator across multiple geographic locations (Song et al., 2007).

2.5.8 Browser technology
Browser compatibility is important for several reasons, and designers must design to accommodate the most popular and up-and-coming browsers. Web site visitors use different browsers such as Mozilla Firefox, Microsoft Internet Explorer, Apple Safari, and Google Chrome, and a web site may look fine in one browser but terrible in other client browsers. It is necessary to test how the web site displays in the major browsers mentioned. To help ensure proper rendering across all browsers, developers must adhere to supported W3C standards when they code. Designers who want to innovate can take advantage of emerging standards like HTML 5 and CSS3. However, this approach comes with additional risk and the need for additional testing, because these standards have not been locked down yet. For example, there is no agreed-upon standard for displaying video in HTML 5. Therefore, companies without strong depth on the HTML coding bench should let Adobe worry about cross-browser compatibility and build their innovative functionality using the Flex platform. Older versions of Netscape Navigator are slower at loading animated GIFs and graphics than Microsoft Internet Explorer.

Apparently, the newest version of Netscape Navigator has remedied this problem (Siskind et al., 2007; Manning et al., 2009). According to Gomez (2009), browser performance variation amounts to almost an 8-second difference in response time between the slowest and fastest browsers. Web site managers must optimize performance for the browsers that most of their end-users use, and the browsers in use can be established through online surveys in respect of the four major browsers. Figure 2.4, adapted from Gomez (2010), shows browser market share for North America and Asia by June 2010; generally, Internet Explorer has the largest market share.

Source: http://www.gs.statcounter.com

Figure 2.4 Browser market share for North America and Asia by June 2010


Table 2.2, adapted from Souders (2009), shows the relationship between browser technology and web standards.

Table 2.2 Connections per server

Browser                  HTTP/1.1    HTTP/1.0
Internet Explorer 6,7    2           4
Internet Explorer 8      6           6
Firefox 2                2           8
Firefox 3                6           6
Safari 3,4               4           4
Chrome 1,2               6           6
Opera 9,10               4           4

Users who access AOL and Wikipedia using IE 6 and 7 benefit from the decision to downgrade to HTTP/1.0: they get resources downloaded four at a time and still benefit from reusing TCP connections. Most other browsers do not increase the connections per server based on HTTP version, as shown in Table 2.2 above. If you have a large number of IE 6 and 7 users, you might want to consider downgrading to HTTP/1.0; this will increase parallel downloads (for IE 6 and 7) without the cost of an extra DNS lookup. To make all users benefit from increased parallelization, domain sharding is the preferred solution; a small sketch of the idea follows.
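A minimal sketch of domain sharding is shown below, assuming a site where static resources can be served from a small set of alias hostnames that all point at the same content. The shard hostnames are hypothetical; the point is only that a deterministic mapping lets a browser with a low per-host connection limit download more resources in parallel while each resource keeps a stable URL for caching.

import hashlib

# Hypothetical alias hostnames that all serve the same static content.
SHARDS = ["static1.example.com", "static2.example.com"]

def shard_url(path: str) -> str:
    """Map a resource path to a fixed shard so the same resource always uses the same hostname."""
    index = int(hashlib.md5(path.encode("utf-8")).hexdigest(), 16) % len(SHARDS)
    return f"http://{SHARDS[index]}{path}"

print(shard_url("/images/logo.png"))
print(shard_url("/css/site.css"))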

IE 8 and Firefox 3 both increase the number of connections per server from two to six. Striving to increase the number of parallel downloads for older browsers could result in too many parallel downloads for these next-generation browsers; if the browser opens too many connections, it could overload the server as well as degrade download efficiency on the client (Souders, 2009).

2.6 CONCLUSION
In this chapter, related literature was reviewed, in particular server and client considerations, web standards and caching, browser technology, and performance metrics such as response time. The next chapter focuses on the problems the research addressed and the methods used to address them.


CHAPTER 3: RESEARCH METHODOLOGY

3.1 INTRODUCTION
An exploratory research design was considered the most suitable approach in view of the nature of the problem being investigated. To determine the factors affecting web page load time over the internet, the researcher used software tools, online web page tools and mathematical modelling. The research methods used to conduct the research are also explained in this chapter. The research focuses mainly on response time measurement, analysis and web page optimization. The motivational literature clearly shows that most researchers studied web page load time in terms of its client-side and server-side determinants (Hu et al., 2007; King, 2008). E-commerce studies mainly focused on users' perception of response time (Gomez and Equation Research, 2010; Subraya, 2006; Forrester Research and Gartner Group, 2009). This research fills a gap on web page response time by investigating web page characteristics in relation to their type and size.

3.2 RESEARCH PHILOSOPHY
A research philosophy is a belief about the way in which data about a phenomenon should be gathered, analysed and used. The term epistemology (what is known to be true), as opposed to doxology (what is believed to be true), encompasses the various philosophies of research approach. The purpose of science, then, is the process of transforming things believed into things known: doxa to episteme. Two major research philosophies have been identified in the Western tradition of science, namely positivist (sometimes called scientific) and interpretivist (also known as antipositivist) (Galliers, 1991).

3.2.1 Positivism
Positivists believe that reality is stable and can be observed and described from an objective viewpoint (Levin, 1988), that is, without interfering with the phenomena being studied. They contend that phenomena should be isolated and that observations should be repeatable. This often involves manipulation of reality with variations in only a single independent variable so as to identify regularities in, and to form relationships between, some of the constituent elements of the social world. Predictions can be made on the basis of previously observed and explained realities and their inter-relationships. "Positivism has a long and rich historical tradition. It is so embedded in our society that knowledge claims not grounded in positivist thought are simply dismissed as ascientific and therefore invalid" (Hirschheim, 1985, p.33). This view is indirectly supported by Alavi and Carlson (1992) who, in a review of 902 Information Systems (IS) research articles, found that all the empirical studies were positivist in approach. Positivism has also had a particularly successful association with the physical and natural sciences. There has, however, been much debate on whether the positivist paradigm is entirely suitable for the social sciences (Hirschheim, 1985; Susman, 1985), with some authors calling for a more pluralistic attitude towards IS research methodologies (see for example Kuhn, 1970; Bjorn-Andersen, 1985; Remenyi and Williams, 1996). While we shall not elaborate on this debate further, it is germane to our study: Information Systems, dealing as it does with the interaction of people and technology, is considered to belong to the social sciences rather than the physical sciences (Hirschheim, 1985). Indeed, some of the difficulties experienced in IS research, such as the apparent inconsistency of results, may be attributed to the inappropriateness of the positivist paradigm for the domain.

Likewise, some variables or constituent parts of reality might previously have been thought immeasurable under the positivist paradigm, and hence went unresearched (after Galliers, 1991).

3.2.2 Interpretivism
Interpretivists contend that only through the subjective interpretation of, and intervention in, reality can that reality be fully understood. The study of phenomena in their natural environment is key to the interpretivist philosophy, together with the acknowledgement that scientists cannot avoid affecting the phenomena they study. They admit that there may be many interpretations of reality, but maintain that these interpretations are in themselves a part of the scientific knowledge they are pursuing. Interpretivism has a tradition that is no less glorious than that of positivism, nor is it shorter.

3.3 RESEARCH FRAMEWORK AND APPROACH
Web page response time is a quantitative performance metric, and thus measurement of its values is the most straightforward yet essential way to deal with it. Measurement produces a set of values that show what the response time of a web page is at a specific time. Mathematical modelling can be used to present response time at an abstract level. This research proposes a mathematical model that explains how response time is related to particular web page characteristics. If a model is derived from a set of measured data and summarises the dataset by some statistical parameters, it is called a descriptive model. If a model is a description of a process or system elements that could have ideally produced the dataset to be examined, then it is called a constructive model (Salamatian & Fdida, 2003). A descriptive model views the system as a black box, while a constructive model takes a white-box view of the system.

model is a useful complement to measurements as well as to simulation. For example, a descriptive model can be used to interpret data obtained from measurements, while a constructive model can be used to produce input data for simulations. In this research a descriptive model was used.

Web page content can be broken down into distinguishable characteristics such as HTML files, number of objects, number of images, internal/external CSS files, external scripts and HTML size. The researcher used these characteristics to derive a mathematical model of the response time of a web page. Total web page size (measured in bytes), which consists of embedded objects such as images, lines of code including comments (HTML), and elements such as CSS and JavaScript, is directly proportional to response time for a given client connection speed (the reference case being an average 56K modem connection with a response time of 12 seconds; see Section 2.2). The model provides a systematic way to check particular aspects that impact response time. The model was supported by the software tool YSlow, which checks a web page against the corresponding complexity dimension and thus identifies design deficiencies that cause the web page to deliver slow responses. The model can be summarised as follows: response time varies directly with the web page size (the sum of object sizes), and partly with the round trip latency and the change in packet loss. Using different object sizes and their corresponding response times obtained through YSlow experiments, the model equation

Response Time = (Object size / 4480) - 0.09728 ± 0.7 seconds (packet loss factor)

was derived.
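A minimal sketch of how the derived equation can be applied to a page's total object size follows; the function name and structure are illustrative only (the thesis used YSlow rather than a script), and the example weight of 146 744 bytes is the upgraded homepage size reported in Chapter 4.

def estimated_response_time(object_size_bytes, packet_loss_seconds=0.7):
    """Descriptive model: estimated response time in seconds for the given total
    object size, returned as a (low, high) band to reflect the +/- 0.7 second
    packet loss factor."""
    base = object_size_bytes / 4480.0 - 0.09728
    return base - packet_loss_seconds, base + packet_loss_seconds

# Example: the upgraded homepage weight of 146 744 bytes (see Chapter 4)
low, high = estimated_response_time(146744)
print(f"Estimated response time: {low:.2f} s to {high:.2f} s")  # about 31.96 s to 33.36 s

On a 56K connection this band brackets the 33.05-second load time reported for the upgraded homepage in Chapter 4, which is what the model is intended to predict.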

The model helps the web page developer check the design quality of web pages in terms of response time, and it was verified using YSlow. Once the web developer is satisfied with the web page, both in terms of functionality and estimated response time (amongst other non-functional requirements), the web page can be released and used as part of a functioning system. At that point, a practical way to observe the ongoing response times of specific in-use web pages is desirable and necessary. Continuous
monitoring can be carried out to observe response time over a period of time. Monitoring involves the extraction of data during program execution, and this was done using online monitoring tools to which the researcher subscribed. The benefits of monitoring are that it allows detection and identification of potential or existing problems or weaknesses in the monitored system, and that it becomes easier to propose the actions required to rectify the problems or remedy the weaknesses, and thus improve the performance of the monitored system.

Active probing, or synthetic monitoring, is a common method used for measurement and monitoring of web sites (see Section 2.4.1). Synthetic clients are placed in different places around the world to probe web servers for resources hosted by those servers in order to monitor the servers' availability, reliability and response time. Many companies provide monitoring services at premium, limited-account or professional level, including Uptrends (http://www.uptrends.com), Site24x7 (http://www.site24x7.com), InternetSupervision (http://www.internetsupervision.com), Gomez (http://www.gomez.com), Pingdom (http://www.pingdom.com) and IWeb (http://www.websitepulse.com). A common characteristic of the services provided by these companies is that the measurement and monitoring are done externally and independently of the measured/monitored servers. The measurement and monitoring results obtained are representative of much larger sets of transactions between clients and servers on the Internet.
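As an illustration of the kind of check such a synthetic client performs, the sketch below times a single HTTP GET against a monitored URL and flags slow or failed responses. It is a simplified stand-in, not one of the commercial services listed above; the URL and the 8-second threshold are used here only as examples.

import time
import urllib.request

def probe(url, timeout=30):
    """Fetch the URL once and return (HTTP status or None, elapsed seconds)."""
    start = time.time()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()                      # download the full body
            return response.status, time.time() - start
    except Exception:
        return None, time.time() - start         # any failure counts as downtime

status, elapsed = probe("http://www.msu.ac.zw/")
verdict = "OK" if status == 200 and elapsed <= 8 else "SLOW OR DOWN"
print(f"status={status}  response_time={elapsed:.2f}s  {verdict}")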

During the web page development phase, the page developer would like to have an idea of the expected response times of individual pages before they are published to the web site. The developer can use the online measurement tools proposed in this research to examine the response times of the individual pages. Before measuring the response times, the developer can first assign a maximal response time for the web site based on e-commerce web page load time standards. Alternatively, different maximal response times can be assigned to different pages based on their importance, criticality, and the complexity of the functionality encompassed within the page. After
the measurement process, the maximal response times can be compared with the measured response times. Web pages that exceed their assigned maximal response times can then be identified, and remedial action can be taken to modify them so that their response times improve. In order to modify the web pages to improve their response times, a web page designer or developer needs to understand and identify the individual characteristics of web pages that influence their response times. The mathematical model described earlier in this chapter and YSlow were important in this context, since they were used to explain the relationship between web page characteristics and response time. YSlow version 1.1 (developed by Yahoo) is a freeware extension for Firebug, a Firefox plug-in; it calculates response time, grades the web site against 13 best-practice guidelines, and analyses web page performance to indicate why it is slow. The tool can be accessed from http://www.developer.yahoo.com/yslow/. The web page designer or developer can check the web pages against the model to identify any design deficiencies that are known to lead to poor response times. Modification efforts can then be focused on those specific aspects to improve the response times.

Even though measurement during the web page development phase helps to reduce poorly designed web pages in terms of response time, it does not guarantee satisfactory web page response time during the day-to-day operation of the web site. During daily operation, the web pages may still exhibit slow response times due to other factors. Sometimes web pages which perform very well at launch develop poor response times as the system matures. Web site monitoring tools can be used to identify these web pages. If a web page consistently exhibits slow response time, regardless of factors such as the number of concurrent users accessing the web site, it may indicate that the web page needs to be
investigated and modified to achieve better response time. At this stage, the web pages can again be checked against the response time model to identify further areas for improvement.

Different software tools, namely EValid version 4.0.100.1190 (a demo version, produced by Software Research) and YSlow, were used to identify problems, delineate realistic response time expectations, highlight poorly responding pages and support effective remedial action in relation to the mathematical model. Web site monitoring was done to check the uptime and response time of web pages, which helps developers modify web page content to meet the desired standards. An Uptrends online tool, to which the researcher subscribed at a premium account level, was used to carry out the monitoring.

3.3.1 Research Instruments
The research instruments used to conduct the research were laboratory experiments (using software tools and online testing tools) and mathematical modeling.

Laboratory experiments
In order to minimize the impact on web page response time of factors other than web page characteristics, especially network conditions, experiments were conducted in a controlled environment, i.e. a LAN. Isolating the effects of these factors from web page characteristics as far as possible is important in identifying the significance of web page characteristics in affecting response time. YSlow and EValid were downloaded for data collection, performance monitoring and analysis.


Mathematical analysis
Empirical data obtained from the online experiments were analysed to find relationships between web page characteristics and response time. The mathematical model developed was used to describe these relationships in a clear yet simple way.

3.4 SAMPLING LOCATIONS
A stratified random sampling technique was used: major cities were selected in different continents from those supported by the online monitoring tools to which the researcher subscribed, while Harare and Johannesburg, which are not supported, were tested using the tracert and ping commands. The following instructions were given to the researcher's former classmates in Harare and Johannesburg to test the response time of the web page (a scripted version of this check is sketched after Table 3.1).

Using the ping command, on a PC go to:
Start -> Click Run -> Type "cmd" and press Enter -> Type "ping [hosting service URL]" and press Enter.
The response should include something like: Packets: Sent = 4, Received = 4, Lost = 0 (0% loss). If the response does not say "0% loss", there is no need to read the results any further - look for another host, because packets (data) are getting lost on the way. If the loss is 0%, look for the line that says: Minimum = [number] ms, Maximum = [number] ms, Average = [number] ms.

Using the tracert command, on a PC go to:
Start -> Click Run -> Type "cmd" and press Enter -> Type "tracert [hosting service URL]" and press Enter.

Table 3.1 Countries and cities selected for monitoring

Country         City
US              Los Angeles
US              Washington DC
UK              Gloucester
Germany         Dortmund
US              Chicago, IL
China           Beijing
Australia       Sydney
Chile           Santiago
South Africa    Johannesburg
Zimbabwe        Harare
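The manual check above can also be scripted. The sketch below is not part of the thesis procedure: it runs the Windows ping command and extracts the packet loss and average round-trip time from the output; the host name is a placeholder, and the -n flag is Windows-specific (Linux uses -c).

import re
import subprocess

def ping_summary(host, count=4):
    """Run ping and return (packet loss %, average round-trip ms), or None where absent."""
    output = subprocess.run(["ping", "-n", str(count), host],
                            capture_output=True, text=True).stdout
    loss = re.search(r"\((\d+)% loss\)", output)
    average = re.search(r"Average = (\d+)ms", output)
    return (int(loss.group(1)) if loss else None,
            int(average.group(1)) if average else None)

loss_pct, avg_ms = ping_summary("www.msu.ac.zw")
print(f"Packet loss: {loss_pct}%   Average round trip: {avg_ms} ms")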

3.5 QUALITATIVE AND QUANTITATIVE DATA ANALYSIS
Two approaches (quantitative and qualitative) were used in a mixed method to minimize weaknesses and reach a deeper understanding. Bryman and Bell (2007) state that mixed methods can be used when a researcher is not confident enough to rely on just one method alone. Triangulation was used to combine the quantitative and qualitative methods. It is generally accepted in action research that researchers should not rely on any single source of data, interview, observation, or instrument (Mills, 2003, p. 52). In research terms, this desire to use
multiple sources of data is referred to as triangulation. Golafshani (2003) made the point that in qualitative research reliability and validity are replaced by the idea of trustworthiness, which is defensible, and recommended triangulation as a way of establishing the trustworthiness of the results of a study. Quantitative analysis was mainly used to gather information about web page content in relation to load time and uptime. A qualitative approach was used for the interpretation of the findings.

3.6 RESEARCH LIMITATIONS
An objective approach was taken in the research, although there were challenges such as time constraints, since the researcher also had work commitments. The researcher had no funding to buy premium web page testing tools and therefore used demo software and limited-user online account subscriptions. The locations selected for monitoring and testing were mainly in developed countries, since that is where the facilities are currently offered.

3.7 CONCLUSION
The methodology used in this study was outlined and its advantages were stated. This chapter introduced the problems addressed by this research and the methodology used to address them; the essence of those problems is web page response time. The next chapter presents the interpretation and analysis of the data collected.


CHAPTER 4: DATA ANALYSIS, FINDINGS AND DISCUSSION

4.1 INTRODUCTION
This chapter focuses on the presentation, interpretation, analysis and discussion of data from the investigation carried out. Analysis and interpretation of the results focused on the research questions listed in Chapter One. The researcher attempted to link the research findings to the literature reviewed previously. Comparisons of web page content were made easier because experiments were done both before and after the upgrading of the homepage.

4.2 YSLOW AND EVALID EXPERIMENTS
The MSU homepage makes a total of 24 HTTP requests on average and has a total weight of 203.3 Kbytes. The number of HTTP requests must be minimized, since additional requests increase web page load time. The page contains 4 external JavaScript files; combining them into one would result in fewer HTTP requests. This effect can clearly be identified because there were no JavaScript files in the old homepage, where the average number of HTTP requests was 13. The average response time of the old homepage was 49.83 seconds, mainly because of the image moto.gif, which had a total weight of 178 200 bytes (75% of the total web page size) and a missing width attribute, both of which result in slow web page loading. Specifying the width and height attributes of an image allows the browser to lay out and load the other objects before the image is fully downloaded, hence reducing response time. The current page has a total of 24 components with a response time of 5.854 seconds. Six of the components need to be compressed with gzip, which would reduce the total page weight by 55 Kb and hence the response time.
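To illustrate the kind of saving gzip compression yields on text components, the sketch below compresses local copies of the page's text files and reports the reduction. The file names are taken from the component list later in this chapter and are assumed to be available locally; the 55 Kb figure above came from YSlow, not from this script.

import gzip

def gzip_saving(path):
    """Return (original bytes, gzipped bytes, percentage saved) for one file."""
    raw = open(path, "rb").read()
    packed = gzip.compress(raw)
    return len(raw), len(packed), 100.0 * (1 - len(packed) / len(raw))

for name in ["global.css", "loadtemp.js", "jquery-1.2.6.pack.js"]:
    original, compressed, saved = gzip_saving(name)
    print(f"{name}: {original} -> {compressed} bytes ({saved:.1f}% smaller)")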


Table 4.1 Object type and size

Object Type     Size (bytes) before upgrading     Size (bytes) after upgrading
HTML            4 313                             4 089
HTML Images     232 084                           117 928
CSS Images      0                                 0
Total Images    232 084                           117 928
JavaScript      0                                 23 703
CSS             589                               1 024
Multimedia      0                                 0
Other           0                                 0

Using the mathematical model, a summary of the increase/decrease in web page response time is tabulated below.

Table 4.2 Effects of object size on response time

Object          Increase/(Reduction) in response time as a percentage, with respect to the change in object size
HTML            (4.72%)
HTML Images     (45.91%)
JavaScript      552%
CSS             25%


Table 4.3 External objects

External Object       Quantity before upgrading     Quantity after upgrading
Total HTML            1                             1
Total HTML Images     10                            13
Total CSS images      0                             0
Total Images          10                            13
Total Scripts         1                             4
Total CSS imports     1                             1
Total frames          0                             0
Total Iframes         0                             0
Total objects         13                            19

4.2.1 JavaScript position
There are 3 JavaScript files in the head of the document:

• http://www.msu.ac.zw/msucopy/loadtemp.js
• http://www.msu.ac.zw/msucopy/jquery-1.2.6.pack.js
• http://www.msu.ac.zw/msucopy/MSUgallery.js

These JavaScript files need to be moved to the bottom of the page; keeping them in the head increases latency and hence web page load time.

4.2.2 Expire headers
There are 23 static components without a far-future expiration date:

• http://www.msu.ac.zw/msucopy/global.css
• http://www.msu.ac.zw/msucopy/loadtemp.js
• http://www.msu.ac.zw/msucopy/jquery-1.2.6.pack.js
• http://www.msu.ac.zw/msucopy/MSUgallery.js
• http://www.msu.ac.zw/msucopy/index_files/cbjscbindex.js
• http://www.msu.ac.zw/msucopy/images/t_bg.jpg
• http://www.msu.ac.zw/.../Midlands_State_University.jpg
• http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex1_0.gif
• http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex2_0.gif
• http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex3_0.gif
• http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex4_0.gif
• http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex5_0.gif
• http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex6_0.gif
• http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex7_0.gif
• http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex8_0.gif
• http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex9_0.gif
• http://www.msu.ac.zw/msucopy/slide/mainpic5.jpg
• http://www.msu.ac.zw/msucopy/left.gif
• http://www.msu.ac.zw/msucopy/play.gif
• http://www.msu.ac.zw/msucopy/right.gif
• http://10.10.1.10/newmsu/images/arrow1.jpg
• http://www.msu.ac.zw/msucopy/images/portal.jpg
• http://www.msu.ac.zw/msucopy/images/elearningr.jpg

Each of these components needs an explicit expiration date (day, month and year) specified so that the content becomes cacheable, hence reducing web page load time.

4.2.3 Static components
There are 23 static components that are not on a Content Delivery Network (CDN):

http://www.msu.ac.zw/msucopy/global.css
http://www.msu.ac.zw/msucopy/loadtemp.js
http://www.msu.ac.zw/msucopy/jquery-1.2.6.pack.js
http://www.msu.ac.zw/msucopy/MSUgallery.js
http://www.msu.ac.zw/msucopy/index_files/cbjscbindex.js
http://www.msu.ac.zw/msucopy/images/t_bg.jpg
http://www.msu.ac.zw/.../Midlands_State_University.jpg
http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex1_0.gif
http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex2_0.gif
http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex3_0.gif
http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex4_0.gif
http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex5_0.gif
http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex6_0.gif
http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex7_0.gif
http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex8_0.gif
http://www.msu.ac.zw/msucopy/index_files/ebbtcbindex9_0.gif
http://www.msu.ac.zw/msucopy/slide/mainpic5.jpg
http://www.msu.ac.zw/msucopy/left.gif
http://www.msu.ac.zw/msucopy/play.gif
http://www.msu.ac.zw/msucopy/right.gif
http://10.10.1.10/newmsu/images/arrow1.jpg
http://www.msu.ac.zw/msucopy/images/portal.jpg
http://www.msu.ac.zw/msucopy/images/elearningr.jpg

CDN hostnames need to be specified in different countries on every continent; this can dramatically improve the speed of the web site, since users are effectively brought nearer to the server.

4.2.4 Images
HTML images have a total size of 117 928 bytes, approximately 58% of the total web page size. These images can be converted from .jpeg to .gif or .png format, since .jpeg uses a lot of memory. Before the homepage was upgraded, 90% of the images were in .jpeg format, with a total size of 232 084 bytes, approximately twice the current total weight of images. The old homepage's average web page load time was 49.83 seconds, of which 48.25 seconds was attributable to the HTML images. An experiment carried out using the Yahoo! Smush.it tool resulted in a saving of 4.45%, or a 7.56 Kb reduction in the size of the images, which will correspondingly reduce web page load time from an average of 33.05 seconds on a 56K connection to 31.58 seconds for the current web page. The web page images had either a width or a height attribute missing, which causes poor response time. See the appendices for detailed experiment results.
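In the same spirit as the Smush.it experiment, the sketch below uses the Pillow imaging library (not used in the thesis) to re-save images as optimized PNGs and report the size change. The file names are examples from the component list, and the actual benefit depends on the image content, since photographs often compress better as JPEG.

import os
from PIL import Image

def optimize_to_png(src, dst):
    """Re-save an image as an optimized PNG and report the size change."""
    Image.open(src).save(dst, format="PNG", optimize=True)
    before, after = os.path.getsize(src), os.path.getsize(dst)
    print(f"{src}: {before} -> {after} bytes ({100.0 * (1 - after / before):.1f}% change)")

for image in ["slide/mainpic5.jpg", "images/portal.jpg"]:
    optimize_to_png(image, image.rsplit(".", 1)[0] + ".png")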


4.3 UPTIME AND DOWNTIME EXPERIMENTS USING UPTRENDS
An analysis of 14 Uptrends reports was done before and after the web page was upgraded. The MSU homepage's uptime was on average 40.28% before the upgrade and increased to 83.71% after the upgrade.

Daily Uptrends Management report for account 258548. Period: from 09/02/2010 to 09/03/2010.

Vital statistics in period
  Uptime:              22.32%
  Number of errors:    111
  Number of warnings:  114
  Number of alerts:    0
  Number of checks:    257

Figure 4.1 Uptime and downtime sample report of the web page before modification.

Daily Uptrends Management report for account 258548. Period: from 09/11/2010 to 09/12/2010.

Vital statistics in period
  Uptime:              80.59%
  Number of errors:    28
  Number of warnings:  34
  Number of alerts:    0
  Number of checks:    178

Figure 4.2 Uptime and downtime sample report of the web page after modification.

4.4 WEB SITE LOAD TIME TEST RESULTS (SITE24X7)
Ten online experiments were carried out on the monitoring site www.site24x7.com. An average response time of 26.23 seconds was recorded after the homepage was modified, while an average response time of 37.45 seconds was recorded before the modification. The homepage size before modification was 236 986 bytes and after modification 146 744 bytes, which shows that there is a direct relationship between web page size and load time. A sample report, recorded Sat, 11 September 2010 at 12:45:58 ("Midlands State University Up - [Site24x7]"), is shown in Figure 4.3.

URL is Available from September 11, 2010 2:42 PM EET.

Figure 4.3 Active probing report using Site24x7.

4.5 INTERNET SUPERVISION WEB SITE LOAD TIME TEST RESULTS
The monitoring site www.internetsupervision.com was used to carry out ten experiments from the different check points supported by the service after the upgrading of the site. The average load time by check point and connection speed was as follows:

Check point (connection speed)     Average load time (seconds)
Dortmund, Germany (5 Mbps)         34.72
Beijing, China (5 Mbps)            34.92
Sydney, Australia (5 Mbps)         34.70
Gloucester, UK (5 Mbps)            34.86
Chicago, IL (45 Mbps)              40.53
Washington, DC (3 Mbps)            34.06
Los Angeles, CA (1.5 Mbps)         34.03
Santiago, Chile (760 Kbps)         34.72

Ten experiments were done using the ping and tracert commands for Harare and Johannesburg. The average web page load time in Harare (Zimbabwe) was 22.65 seconds and in Johannesburg (South Africa) it was 28.78 seconds. Overall, the mean web page load time across the selected check points using the online monitoring tools is 33.397 seconds, which is 0.347 seconds more than the mean response time obtained using the EValid and YSlow tools.
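The overall mean can be verified directly from the ten check-point averages reported above (a quick arithmetic check, not part of the monitoring tools):

load_times = {
    "Dortmund": 34.72, "Beijing": 34.92, "Sydney": 34.70, "Gloucester": 34.86,
    "Chicago": 40.53, "Washington DC": 34.06, "Los Angeles": 34.03,
    "Santiago": 34.72, "Harare": 22.65, "Johannesburg": 28.78,
}
mean = sum(load_times.values()) / len(load_times)
print(f"Mean web page load time: {mean:.3f} seconds")   # 33.397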


4.6 MULTI-BROWSER WEB SITE PERFORMANCE TEST RESULTS
Five multi-browser web site performance experiments were carried out using the online tool at www.gomez.com; on average, Internet Explorer 7 was 1.427 seconds slower than Firefox 3.5.

Table 4.4 Browser and location response time differences (seconds)

Location           Internet Explorer 7     Firefox 3.5
New York, NY       22.341                  20.929
Seattle, WA        61.243                  61.014
Los Angeles, CA    22.392                  20.965
Chicago, IL        22.489                  21.618

4.7 DISCUSSION AND FINDINGS
The evidence in this study shows that web page load time is mainly affected by the web page's content rather than by client and web-server network conditions, as had been proposed by many researchers. There are 23 static components that are not on a CDN. The average load time by check point and connection speed shows that an increase in the distance between the client and the web server increases response time. There is a need to introduce CDNs (the server with the fewest network hops, or the server with the quickest response time, is chosen, which reduces response time), as noted in Section 2.5.2.

Web page content components have different effects on web page load time. JavaScript files increase the number of HTTP requests, which in turn increases web page load time. There are 3 JavaScript files in the head of the document; these need to be moved to the bottom of the page to reduce the latency that would otherwise increase web page load time.

There are 23 static components without a far-future expiration date. These components need an explicit expiration date (day, month and year) so that the content becomes cacheable, reducing web page load time. Caching is the most important and most widely used performance improvement technique for web-based systems, since retrieving data from caching locations reduces transmission time across the internet, as noted in the motivational literature (see Section 2.5.6).

The study revealed that on average Internet Explorer 7 is 1.427 seconds slower than Firefox 3.5. An increase in the number of connections per server that a browser accommodates reduces web page response time, which agrees with the web browser technology literature review: Firefox 3.5 allows an average of 6 connections per server, which increases parallel downloads, while Internet Explorer 7 allows an average of 3. The literature revealed that the larger the page, the longer it takes to load, as was confirmed by the mathematical model and the online experiments. However, the literature does not show the relationship between web page content characteristics and response time, which this study has clearly identified.

The web page images had either a width or a height attribute missing, which causes poor response time. Specifying the width and height attributes of an image allows the browser to lay out and load the other objects before the image is fully downloaded, hence reducing response time. HTML images have a total size of 117 928 bytes, approximately 58% of the total web page size, and these images can be converted from .jpeg to .gif or .png format, since .jpeg uses a lot of memory. The homepage size before modification was 236 986 bytes and after modification 146 744 bytes, which shows that there is a direct relationship between web page size and load time. MSU's homepage uptime was on average 40.28% before being
upgraded and increased to 83.71% after the upgrade, since the web page weight was considerably reduced, by 38.08%, making the page easier to load.

4.8 CONCLUSION
Web page content affects web page response time according to object type and size. A critical analysis of web page characteristics needs to be carried out by web developers to deploy a web site that complies with the 8-second rule from the Forrester Research Group. Load time increases as the geographical distance from the server to the client increases, assuming that clients have the same connection speed; the problem of distance can be solved by using CDNs. From the motivational literature, Internet Explorer has the largest market share, while the web page loads faster on Firefox than on Internet Explorer; hence developers must tailor the web page content to reduce the response time when Internet Explorer is used. The next chapter focuses on conclusions, recommendations and future research.


CHAPTER 5: CONCLUSION, RECOMMENDATIONS AND FUTURE RESEARCH

5.1 INTRODUCTION
This chapter reviews the context, scope, and focus of this thesis. The research has shown that it is possible to measure, model and monitor individual web page response times, which makes it possible to improve and optimize web pages in terms of response time. Future research is proposed at the end of this chapter. The thesis mainly focused on web page load time in relation to web page content and how it can be improved. Online monitoring tools and mathematical modeling were used during the operational period to quickly identify web pages exhibiting slow response times or poor responsiveness. It was also claimed in the thesis statement that it is possible to identify particular characteristics of the web pages themselves that influence response time and to provide advice to web developers on improving the performance of these individual pages.

5.2 CONCLUSION
During the design and implementation of a web page, web developers must measure its response time against threshold values such as the 8-second rule. If a web page exceeds the threshold value, it needs to be analyzed using monitoring tools and the mathematical model; otherwise the web site can be deployed. The evidence in this study shows that web page load time is mainly affected by its web page content characteristics.
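The decision procedure described above can be sketched as follows (the page names and measured figures are hypothetical; only the 8-second threshold comes from the thesis):

THRESHOLD_SECONDS = 8.0   # the 8-second rule; stricter limits may be set per page

measured = {"index.html": 5.854, "staff.html": 9.20}   # hypothetical measurements
for page, seconds in measured.items():
    if seconds <= THRESHOLD_SECONDS:
        print(f"{page}: {seconds:.2f}s - within threshold, can be deployed")
    else:
        print(f"{page}: {seconds:.2f}s - analyse with the model and monitoring tools, then modify")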


An increase in the total size of web page content components is directly proportional to an increase in response time. The web page content components which affect web page load time include the total number of HTML files, the total number of CSS files, the total number of objects, the total size of external scripts, the total weight of HTML images, the number of JavaScript files, the HTML file size and the total CSS size. Multi-browser web site performance experiments showed that on average IE 7 is 1.427 seconds slower than Firefox 3.5, while IE has the highest market share (see Section 2.5.8); hence web developers must tailor their sites to be fast on IE. MSU's web site needs to be further upgraded using the object-by-object recommendations in Section 5.3 to meet an average web page load time of 10 seconds.

5.3 RECOMMENDATIONS

Recommendations to the MSU web development team
MSU's site is using HTTP compression (content encoding using gzip), and the development team must continue exploiting its features to reduce latency. The image elearningr.jpg has both its height and width attributes missing, which increases load time since the other objects wait for the whole image to be loaded. Recommendations for the various objects to enhance response time are discussed below.

1. TOTAL_HTML - The total number of HTML files on this page (including the main HTML file) is 2. These files can be combined into 1, which most browsers can multithread.

2. TOTAL_OBJECTS - Caution: there are 19 total objects on this page. From 12 to 20 objects per page, the latency due to object overhead makes up 75% to 80% of the delay of the average web page. There is a need to reduce, eliminate and combine external
objects (graphics, CSS, JavaScript, iFrames and XHTML) to reduce the total number of objects, and thus of separate HTTP requests. Using CSS sprites will help to consolidate decorative images.

3. TOTAL_IMAGES - Caution: there is a moderate number of images on this page (13). There is a need to use fewer images on the site, or to reuse the same image on multiple pages to take advantage of caching. Using CSS techniques such as colored backgrounds, borders or spacing instead of graphic techniques can help reduce HTTP requests.

4. TOTAL_CSS - The total number of external CSS files on this page is 1. Since external CSS files must be in the HEAD of the HTML document, they must load before any BODY content displays. Although they are cached, CSS files slow down the initial display of the page. CSS files must be placed in the HEAD and JavaScript files at the end of the BODY to enable progressive display.

5. TOTAL_SIZE - Caution: the total size of this page is 146 744 bytes, which will load in about 33.05 seconds on a 56Kbps modem. It is necessary to reduce the total page size to less than 100K to achieve sub-20-second response times on 56K connections.

6. TOTAL_SCRIPT - Caution: the total number of external script files on the home page is 4; they must be reduced to one or two. Combining, minifying, merging and compressing the JavaScript files will reduce response time by 5.4 seconds. Consider suturing JavaScript files together at the server to minimize HTTP requests. Placing external JavaScript files at the bottom of the BODY, and CSS files in the HEAD, enables progressive display in XHTML web pages.

7. HTML_SIZE - The total size of the HTML file is 4 089 bytes, which is less than 50K. Assuming that the HEIGHT and WIDTH of images are specified, this size allows the HTML to display content within 10 seconds, which is the average time users are willing to wait for a page to display without feedback.

8. IMAGES_SIZE - Warning: the total size of images is 117 928 bytes, which is over 100K. Consider switching graphic formats to achieve smaller file sizes (from JPEG to PNG, for example, or using Smush.it). Finally, substitute CSS techniques for graphics techniques to create colored borders, backgrounds and spacing. These images can be merged into sprite sets or CSS-inlined to reduce response time by 12.2 seconds.

9. SCRIPT_SIZE - Warning: the total size of external scripts is 23 703 bytes, which is over 20K. Consider optimizing the JavaScript for size, combining the files, and using HTTP compression where appropriate for any scripts placed in the HEAD of the document. CSS menus can be a substitute for JavaScript-based menus to minimize, or even eliminate, the use of JavaScript.

10. CSS_SIZE - The total size of external CSS is 1 024 bytes, which is less than the 8K average standard.

11. MULTIM_SIZE - The total size of all external multimedia files is 0 bytes, which is less than the 10K average standard.

Recommendations to other web developers
Web pages with a large number of objects load slowly, since a large number of objects increases web page latency. There are several common ways to optimize web pages with a large number of objects, such as combining external objects and using CSS sprites where possible;
compressing objects and images with gzip; reducing the number of network requests; minifying JavaScript and CSS; optimizing database queries; and removing duplicate JavaScript and CSS files. Analyzing web page content characteristics is the key to creating a web page which meets the desired response time.

5.4 LIMITATIONS
The accuracy of the response time estimates from the monitoring tools depended on the synthetic clients in the different locations. Freeware and demo tools were used, hence the researcher could investigate only those parameters that were available free of charge.

5.5 FUTURE RESEARCH
The researcher was unable to study further work related to web site optimization due to time constraints. Researchers can also focus on how different programming languages, such as Delphi, Ruby and Java, affect web page load time, so that a distinct web page can be designed. There is a need to research how to promote web sites through the adoption of Search Engine Optimization (SEO) techniques; SEO directly addresses the need for a web site to attract new and targeted visitors, who in turn convert into customers. In addition, further research on upgrading the normal desktop web site version to a mobile site version, in relation to response time, is proposed.


REFERENCES

Aberdeen Research (2010) 'Customer Satisfaction on the Web'. Aberdeen Research Group Report, USA. [Accessed 1 June 2010].

Addison, D. R. (2006) Web Site Cookbook. 1st Ed, O'Reilly Media, Inc., Sebastopol, CA, USA, pg 233.

Alavi, M. and Carlson, P. (1992) 'A review of MIS research and disciplinary development'. Journal of Management Information Systems, pg 45-62. [Accessed 10 July 2010].

Barish, G. and Obraczka, K. (2000) 'World wide web caching: Trends and techniques'. USC Information Sciences Institute.

Bjorn-Andersen, N. (1985) Are 'Human Factors' Human? Maidenhead, England: Pergamon Infotech.

Bryman, A. and Bell, E. (2007) Business Research Methods. 2nd Ed, Oxford: Oxford University Press.

Cao, P. and Irani, S. (1997) 'Cost-Aware WWW Proxy Caching Algorithms'. Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS), Monterey, CA, pg 193-206.

Clickstream (2003) 'The Clickstream Advantage: Method and Benefits of Fully Automated, Multi-channel Digital Data Collection, Technical White Paper'. Clickstream Technologies Plc. Available from: http://www.clickstream.com/docs/pdf/cswhitepaper.pdf [Accessed 11 June 2010].

Eland, J. (2008) Web Stores Do-It-Yourself for Dummies. Wiley Publishing, Inc., Indianapolis, Indiana, pg 361.

Forrester Research and Gartner Group (2009) 'Two load time report surveys'. Forrester Research, Inc., 400 Technology Square, Cambridge, MA 02139, USA. [Accessed 8 August 2010].

Galliers, R. D. (1991) Developing Strategic Information Systems. British Academy of Management, Bath, UK.

Golafshani, N. (2003) 'Understanding reliability and validity in qualitative research'. The Qualitative Report, 8(4), 597-606. Available from: http://www.nova.edu/ssss/QR/QR8-4/golafshani.pdf [Accessed 26 June 2010].


Gomez (2009) 'Best of the Web - Gomez Web Performance Awards White Paper'. [Accessed 25 July 2010].

Gomez (2009) 'The Three Pillars of Web Performance Monitoring White Paper'. [Accessed 2 August 2010].

Gomez and Equation Research (2010) 'Best of the Web - 12 Tips to Assess Online Retail Readiness for the Holidays White Paper'. USA. [Accessed 10 August 2010].

Hirschheim, R. A. (1985) Office Automation: A Social and Organizational Perspective. Wiley, Chichester.

Hu, J. C., Pyarali, I. and Schmidt, D. C. (2007) 'Measuring the Impact of Event Dispatching and Concurrency Models on Web Server Performance Over High-speed Networks'. Department of Computer Science, Washington University, St. Louis, Missouri.

Hu, S. Y., Nanda, A. and Yang, Q. (2001) 'Measurement, Analysis and Performance Improvement of the Apache Web'.

Johnson, K. L., Carr, J. F., Day, M. S. and Kaashoek, M. F. (2000) 'The Measured Performance of Content Distribution Networks'. SightPath, Inc., 135 Beaver Street, Waltham, MA 02452, USA.

Killelea, P. (2002) Web Performance Tuning. 2nd Ed, O'Reilly & Associates, Inc., Sebastopol, CA, USA.

King, A. B. (2008) Web Site Optimization. 1st Ed, O'Reilly Media, Inc., Sebastopol, CA, USA, pg 155, 186, 243-244, 324, 330, 338.

Kuhn, T. S. (1970) The Structure of Scientific Revolutions. 2nd Ed, University of Chicago Press, Chicago.

Levin, D. M. (1988) The Opening of Vision: Nihilism and the Postmodern Situation. London: Routledge.

Manjhi, A. K. (2008) 'Technical report submitted in partial fulfillment of the course CS625: Advanced Computer Networks'. UK.

Manning, H., Gans, R., McLeish, S. and Zinser, R. (2009) 'How To Survive The Browser Wars: Top Considerations When Designing For Cross-Browser Interoperability'.

Mills, J. (2003) 'Polymer Processors. Second Workshop on Non-Silicon Computation'. Federated Computer Research Conference, San Diego, June 2003.

NetForecast (2006) 'Field Guide to Application Delivery Systems Report'. NetForecast, September 2006.

Pallis, G. and Vakali, A. (2005) 'Striking a balance between the costs for Web content providers and the quality of service for Web customers. Insight and Perspectives for Content Delivery Networks'. USA.

Pallis, G. and Vakali, A. (2006) 'Insight and Perspectives for Content Delivery Networks'. Communications of the ACM, Vol. 49, No. 1, ACM Press, NY, USA, pp. 101-106, January 2006.

Pathan, M. and Buyya, R. (2008) 'A Taxonomy of CDNs'.

Peng, G. (2003) 'CDN: Content Distribution Network'. Technical Report TR-125, Experimental Computer Systems Lab, Department of Computer Science, State University of New York, Stony Brook, NY. http://citeseer.ist.psu.edu/peng03cdn.html.

Proctor, R. W. and Vu, K. L. (2005) Handbook of Human Factors in Web Design. Lawrence Erlbaum Associates, Inc., Publishers, New Jersey, pg 108-109.

Rajamony, R. and Elnozahy, M. (2001) 'Measuring Client-Perceived Response Times on the WWW'. In the 3rd USENIX Symposium on Internet Technologies and Systems (USITS), San Francisco, 26-28 March 2001.

Remenyi, D. and Williams, B. (1996) 'The nature of research: Qualitative or quantitative, narrative or paradigmatic?' Information Systems Journal. [Accessed 10 July 2010].

Rodriguez, P., Spanner, C. and Biersack, E. W. (2001) 'Analysis of Web Caching Architectures: Hierarchical and Distributed Caching'. IEEE/ACM Transactions on Networking.

Rosenstein, M. (2000) 'What is Actually Taking Place on Web Sites: e-Commerce Lessons from Web Server Log'. In the 2nd ACM Conference on Electronic Commerce, Minneapolis, 17-20 October 2000.

Salamatian, K. and Fdida, S. (2003) 'A Framework for Interpreting Measurement over Internet'. In ACM SIGCOMM Workshop on Models, Methods and Tools for Reproducible Network Research, Karlsruhe, 25 August 2003.

Savoia, A. (2001) 'Web Page Response Time 101'.

Sevcik, P. and Bartlett, J. (2001) 'Understanding Web Performance Research Report'. NetForecast Inc., Business Communication Review (BCR), October 2001.

Siskind, G. H., Murray, D. and Klau, R. P. (2007) The Lawyer's Guide to Marketing on the Internet. American Bar Association, USA, pg 17.

Snitker, T. V. (2004) Breaking Through to the Other Side: Using the User Experience in the Web. 1st Ed, First Impression, pg 134-136.

Song, J., Levy-Abegnoli, E., Iyengar, A. and Dias, D. (2007) 'A Scalable and Highly Available Web Server Accelerator'. IBM T.J. Watson Research Center, Yorktown Heights, NY.

Souders, S. (2009) Even Faster Web Sites. 1st Ed, O'Reilly Media Inc., Sebastopol, CA, USA, pg 133, 167-181.

Subraya, B. M. (2006) Integrated Approach to Web Performance Testing: A Practitioner's Guide. Idea Group Inc., London, UK, pg 7.

Susman, G. (1983) 'Action research: a socio-technical systems perspective'. In G. Morgan (ed.), Beyond Method: Strategies for Social Research. Newbury Park: Sage.

Vakali, A. and Pallis, G. (2003) 'Content Delivery Networks: Status and Trends'. IEEE Internet Computing, IEEE Computer Society, pp. 68-74, November-December 2003.

Wessels, D. (2001) Web Caching. 1st Ed, O'Reilly and Associates, Inc., Sebastopol, CA, USA, pg 15-16.

Williams, S., Abrams, M., Standridge, C. R., Abdulla, G. and Fox, E. A. (1996) 'Removal Policies in Network Caches for World Wide Web Documents'. Proceedings of SIGCOMM '96, pages 293-305, Stanford, CA, August 1996. ACM.

Zona Research Group (2003) 'The Need for Speed'. Research Report: Zona Research Group, July 2003.


APPENDICES

APPENDIX A: EValid and YSlow Experiments

EValid Site Analysis Report Selection


YSlow Statistics report

Minified JavaScript for: http://www.msu.ac.zw/msucopy/

1. http://www.msu.ac.zw/msucopy/loadtemp.js
2. http://www.msu.ac.zw/msucopy/jquery-1.2.6.pack.js
3. http://www.msu.ac.zw/msucopy/MSUgallery.js
4. http://www.msu.ac.zw/msucopy/index_files/cbjscbindex.js
5. inline script block #1
6. inline script block #2

http://www.msu.ac.zw/msucopy/loadtemp.js


// loadtemp.js - AJAX helpers that load the student and staff login panels
var xmlHttp;

function loadstud() {
  xmlHttp = GetXmlHttpObject();
  if (xmlHttp == null) {
    alert("Your browser does not support AJAX!");
    return;
  }
  var url = "studentlogin.php?";
  url = url + "&sid=" + Math.random();
  document.getElementById("student").innerHTML = "WAIT Loading...";
  xmlHttp.onreadystatechange = stateChangedS;
  xmlHttp.open("GET", url, true);
  xmlHttp.send(null);
}

function stateChangedS() {
  if (xmlHttp.readyState == 4) {
    document.getElementById("student").innerHTML = xmlHttp.responseText;
  }
}

function loadstaff() {
  xmlHttp = GetXmlHttpObject();
  if (xmlHttp == null) {
    alert("Your browser does not support AJAX!");
    return;
  }
  var url = "stafflogin.php?";
  url = url + "&sid=" + Math.random();
  document.getElementById("staff").innerHTML = "WAIT Loading...";
  xmlHttp.onreadystatechange = stateChanged;
  xmlHttp.open("GET", url, true);
  xmlHttp.send(null);
}

function stateChanged() {
  if (xmlHttp.readyState == 4) {
    document.getElementById("staff").innerHTML = xmlHttp.responseText;
  }
}

// Cross-browser XMLHttpRequest factory (falls back to ActiveX for older IE)
function GetXmlHttpObject() {
  var xmlHttp = null;
  try {
    xmlHttp = new XMLHttpRequest();
  } catch (e) {
    try {
      xmlHttp = new ActiveXObject("Msxml2.XMLHTTP");
    } catch (e) {
      xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
    }
  }
  return xmlHttp;
}

http://www.msu.ac.zw/msucopy/jquery-1.2.6.pack.js


APPENDIX B: Smush.it Experiments
