mod_trace_output Server Monitor
Track the Web pages sent to your visitors by your Apache Web server.
"I hope you will enjoy using mod_trace_output as much as I enjoyed
developing it." (Gérard Materna).
mod_trace_output is an Apache plugin implementing server monitoring of output Web pages.
The mission of mod_trace_output is to supply the Qualiweb/WASA application with the content of dynamic web pages. The problem is that dynamic content is generated in real-time. Furthermore, with new programming techniques using XML and CSS/XSLT, there is nearly no semantic content left in the pages. The content therefore needs to be stored after it is generated.
Server monitoring can be compared with network monitoring. Network monitoring mines the output Web pages on the network wire. A network monitor could be implemented on the basis of a network packet capture library like ethereal. The implementation would listen to the network, get TCP/IP packets, rebuild the TCP streams, and extract HTML data from the HTTP protocol.
mod_trace_output is part of the Qualiweb/WASA application developed by Dr Jean-Pierre Norguet. WASA aims to analyse the semantic content viewed by a Web site's visitors. Such analysis needs to log not only URLs, but also the actual content seen by the visitors. More information on output page mining can be found in Dr Jean-Pierre Norguet's publications.
The mod_trace_output sources are available for download at http://sourceforge.net/project/showfiles.php?group_id=52230 .
mod_trace_output works as is and can be used freely by the community. However, a few improvements, a bug fix, and a port to Apache 2.0 would be welcome. The project is currently looking for a developer who is willing to evolve and maintain mod_trace_output.
mod_trace_output is an open source project hosted at sourceforge.net and released under the Apache Software License.
As I said before, mod_trace_output catches the content of the web pages sent by Apache after they have been generated. It works with most common modules, including mod_jk (which uses the URI translation phase to set r->handler to "jakarta-servlet" instead of the usual mechanism based on MIME types; this was quite a headache!).
It can store the visited pages in files in a specified directory or in a specified MySQL database. This content can also be gzipped in order to use less disk space and/or less bandwidth in the case of a remote MySQL database. Furthermore, mod_trace_output sends the data gzipped to browsers that support it.
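Deciding whether a browser supports gzip usually comes down to inspecting the Accept-Encoding request header. The following is a minimal stand-alone sketch of such a check, not the module's actual code: the function names client_accepts_gzip and token_equals are hypothetical, and the handling of the "q=" parameter is deliberately simplified.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive comparison of the n bytes at a with the string b.
 * Matches only if b is exactly n characters long. */
static int token_equals(const char *a, size_t n, const char *b)
{
    size_t i;

    assert(b != NULL);
    if (strlen(b) != n)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Returns 1 if the Accept-Encoding value lists gzip (or x-gzip) with a
 * non-zero quality, 0 otherwise. NULL means the header was absent. */
int client_accepts_gzip(const char *accept_encoding)
{
    const char *p = accept_encoding;

    if (p == NULL)
        return 0;

    while (*p != '\0') {
        const char *start, *end, *name_end, *q;
        size_t name_len;

        /* Isolate one comma-separated element: [start, end). */
        start = p;
        end = strchr(p, ',');
        if (end == NULL)
            end = p + strlen(p);
        p = (*end == ',') ? end + 1 : end;

        while (start < end && isspace((unsigned char)*start))
            start++;

        /* The coding name stops at ';' or at whitespace. */
        name_end = start;
        while (name_end < end && *name_end != ';'
               && !isspace((unsigned char)*name_end))
            name_end++;
        name_len = (size_t)(name_end - start);

        if (!token_equals(start, name_len, "gzip")
            && !token_equals(start, name_len, "x-gzip"))
            continue;

        /* Simplified scan for an optional "q=" parameter; a quality of
         * zero means the coding is explicitly refused. */
        q = name_end;
        while (q < end && *q != 'q' && *q != 'Q')
            q++;
        if (q + 1 < end && q[1] == '=' && strtod(q + 2, NULL) <= 0.0)
            continue;

        return 1;
    }
    return 0;
}
```

With this sketch, a header value such as "gzip, deflate" is accepted, while "deflate" alone or "gzip;q=0" is not, so the page would be sent uncompressed in those cases.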
The module is designed to run on Linux with Apache 1.3.23 and 1.3.24. My objective is to port it soon to Apache 2.0 (and maybe later to Windows).
It is written in C and uses the libmysqlclient (from MySQL) and libz (for gzip) libraries.
First, you have to know that mod_trace_output creates a subrequest to do the actual work and gets the data back from this subrequest.
mod_trace_output has a handler for the fixup phase, where it sets r->handler to "trace-output", but only if all of the following conditions are met:
-the current request is not a subrequest of the mod_trace_output handler (otherwise I would have an endless loop!);
-the request is not for headers only;
-r->filename is not a directory (mod_dir has to find the right directory index file before we do something);
-the type is listed in ToMimeTypesIn (see directives below): if r->content_type is NULL, the module checks for "HANDLER" in ToMimeTypesIn; otherwise it checks for r->content_type in ToMimeTypesIn.
So if any of these conditions is not met, the fixup handler declines and does not set r->handler = "trace-output", and mod_trace_output does not handle the request.
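The fixup conditions can be sketched as a small decision function. This is only an illustration under stated assumptions: struct fake_request is a hypothetical, simplified stand-in for the fields of Apache's request_rec, the names trace_fixup and list_contains are invented for the example, and the list matching is exact (no whitespace trimming, case-sensitive).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the request fields the fixup phase inspects. */
struct fake_request {
    int is_trace_subrequest;   /* created by the trace-output handler itself */
    int header_only;           /* request is for headers only */
    int is_directory;          /* r->filename names a directory */
    const char *content_type;  /* NULL when only a handler is set (mod_jk) */
    const char *handler;       /* set to "trace-output" when the fixup accepts */
};

enum { FIXUP_DECLINED = 0, FIXUP_TAKEN = 1 };

/* Returns 1 if item is one of the comma-separated entries of list. */
static int list_contains(const char *list, const char *item)
{
    size_t item_len = strlen(item);
    const char *p = list;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);

        if (len == item_len && memcmp(p, item, len) == 0)
            return 1;
        p += len;
        if (*p == ',')
            p++;
    }
    return 0;
}

/* Applies the four conditions in order; mime_types_in plays the role of
 * the ToMimeTypesIn value configured for the current location. */
int trace_fixup(struct fake_request *r, const char *mime_types_in)
{
    const char *key;

    assert(r != NULL && mime_types_in != NULL);

    if (r->is_trace_subrequest)   /* avoid the endless loop */
        return FIXUP_DECLINED;
    if (r->header_only)
        return FIXUP_DECLINED;
    if (r->is_directory)          /* let mod_dir pick the index file first */
        return FIXUP_DECLINED;

    /* No content type: fall back to the special "HANDLER" entry. */
    key = (r->content_type != NULL) ? r->content_type : "HANDLER";
    if (!list_contains(mime_types_in, key))
        return FIXUP_DECLINED;

    r->handler = "trace-output";
    return FIXUP_TAKEN;
}
```

Note that with the default ToMimeTypesIn value (the empty string) the sketch declines every request, which matches the documented default of handling no MIME types.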
If all the conditions are met, we have r->handler = "trace-output" and, at the content handling phase, the mod_trace_output handler is called first. It then creates a subrequest and adds a filter_callback function to it, so that it can get a copy of the data sent. Then it uses a little trick (many thanks to Ian McRae and to his mod_mmap_dynamic) and sets the client side connection to point to /dev/null, so that the subrequest doesn't actually send the data.
Then the subrequest runs, the module gets the copy of the data, gzips it (for browsers that support it), and sends it to the browser. The response created by the subrequest can be of arbitrary MIME type. If the response's MIME type is in the ToMimeTypesOut list (see the ToMimeTypesOut directive below), the module stores the content either in files or in the MySQL database, gzipped or not.
In previous versions, mod_trace_output had no handler for the fixup phase, so the module needed to be executed first; this is no longer a requirement.
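The "copy of the data" step can be pictured as a callback that appends every chunk written by the subrequest to a growing buffer. The sketch below shows only that accumulation step in plain C. It does not use the Apache API, and struct capture, capture_init, capture_append and capture_free are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer that collects the output of one subrequest. */
struct capture {
    char *data;   /* collected bytes, NULL while empty */
    size_t len;   /* number of bytes collected so far */
    size_t cap;   /* allocated size of data */
};

void capture_init(struct capture *c)
{
    assert(c != NULL);
    c->data = NULL;
    c->len = 0;
    c->cap = 0;
}

/* Appends n bytes of chunk to the buffer. Returns 0 on success and -1 if
 * memory could not be obtained, in which case the buffer is unchanged. */
int capture_append(struct capture *c, const char *chunk, size_t n)
{
    assert(c != NULL);
    if (n == 0)
        return 0;
    assert(chunk != NULL);

    if (n > c->cap - c->len) {
        size_t new_cap = (c->cap != 0) ? c->cap : 4096;
        char *grown;

        /* Double the capacity until the new chunk fits. */
        while (new_cap - c->len < n) {
            if (new_cap > (size_t)-1 / 2)
                return -1;
            new_cap *= 2;
        }
        grown = realloc(c->data, new_cap);
        if (grown == NULL)
            return -1;
        c->data = grown;
        c->cap = new_cap;
    }

    memcpy(c->data + c->len, chunk, n);
    c->len += n;
    return 0;
}

void capture_free(struct capture *c)
{
    assert(c != NULL);
    free(c->data);
    capture_init(c);
}
```

Once the subrequest has finished, the buffer holds the complete page, which can then be compressed, sent to the browser, and stored. Keeping the whole page in memory is a simplification; it is an assumption of this sketch, not a statement about how the module manages its memory.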
I'll give the installation steps for compiling mod_trace_output statically into Apache. You can also use DSO and apxs to compile it outside the Apache source tree.
Here we go. Create a directory mod_trace_output in the modules directory of the Apache source distribution, copy into it all the files provided in the mod_trace_output release (mod_trace_output.c, mod_trace_output.h, filter.c, gzipcomp.c, mysql.c and MakeFile.tmpl), then add the following line to the Configuration file.
AddModule modules/mod_trace_output/mod_trace_output.o
The module uses libmysqlclient (.a or .so) and libz, so you need to set extra options in the Configuration file.
Add to EXTRA_LDFLAGS either
-L/path/to/libmysqlclient/folder -lz -lmysqlclient
(if you have libmysqlclient.so) or
/path/to/libmysqlclient/folder/libmysqlclient.a -lz
(if you have libmysqlclient.a; the archive is given by its full path, because -L only accepts a directory).
Add -I/path/to/mysql/include to EXTRA_INCLUDES.
Then compile Apache as usual. See http://httpd.apache.org/ for more information on how to do it.
mod_trace_output is now statically linked into your Apache binary. You can configure it with the directives described below.
The following directives cannot be set in a <Directory> or a <Location> tag, so the scope of their value is the entire (virtual) server.
-ToMySQLInfo: takes three arguments: the MySQL database name, user, and password.
-ToMySQLHost: host of the MySQL database.
-ToMySQLPort: TCP/IP port used to connect to the MySQL database. You need this only if the host is not localhost.
-ToMySQLSocket: path to the socket (e.g. "/var/lib/mysql/mysql.sock") used to connect to the MySQL database. You need this only if the mysql.sock file is not in its default location.
-ToMySQLNonPersistent: flag to use non-persistent MySQL connections (values: ON/OFF). OFF is the best value for performance.
-ToLog: flag to enable or disable logging (values: ON/OFF).
-ToLogFile: file in which mod_trace_output actions are logged.
The following directives must be set in a <Directory> or a <Location> tag, so their scope is the specified directory or location.
-ToMimeTypesIn: MIME types of the requested files that mod_trace_output will handle (separated by commas). You can also specify "HANDLER" if you want to handle requests that have no content_type and only a handler (as in the case of mod_jk). The default is "": no MIME types are handled.
-ToMimeTypesOut: MIME types of the responses that mod_trace_output will store (separated by commas) (default = "text/html").
-ToDataGroup: string used to group data in the database, for example by directory (default = directory path).
-ToStoreInMySQL: flag to enable or disable storing of the pages in MySQL (values: ON/OFF).
-ToGzipInMySQL: flag to enable or disable gzip compression of the data in MySQL (values: ON/OFF).
-ToStoreInFiles: flag to enable or disable storing of the pages in a specified directory (values: ON/OFF).
-ToGzipInFiles: flag to enable or disable gzip compression of the data in stored files (values: ON/OFF).
-ToDir: directory where the pages are stored (default /tmp).
These directives let you specify a different configuration for each location: you can store the content from one location in a MySQL database while storing the content from another location in gzipped HTML files.
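One way to picture the per-location settings is as a small structure that starts from the documented defaults and is then updated by each directive found in the <Directory> or <Location> block. The sketch below is purely illustrative: the structure, its field names and the functions are hypothetical, the flags are assumed to default to OFF (the text above does not state their defaults), and ON/OFF is matched case-insensitively as a further assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-location settings; names are illustrative only. */
struct to_location_config {
    const char *mime_types_in;   /* ToMimeTypesIn, default "" */
    const char *mime_types_out;  /* ToMimeTypesOut, default "text/html" */
    const char *data_group;      /* ToDataGroup, NULL = use directory path */
    const char *dir;             /* ToDir, default "/tmp" */
    int store_in_mysql;          /* flags: assumed OFF by default */
    int gzip_in_mysql;
    int store_in_files;
    int gzip_in_files;
};

struct to_location_config to_location_defaults(void)
{
    struct to_location_config c;

    c.mime_types_in = "";
    c.mime_types_out = "text/html";
    c.data_group = NULL;
    c.dir = "/tmp";
    c.store_in_mysql = 0;
    c.gzip_in_mysql = 0;
    c.store_in_files = 0;
    c.gzip_in_files = 0;
    return c;
}

/* Returns 1 for ON, 0 for OFF, -1 for any other value. */
int parse_on_off(const char *v)
{
    assert(v != NULL);
    if (toupper((unsigned char)v[0]) == 'O'
        && toupper((unsigned char)v[1]) == 'N' && v[2] == '\0')
        return 1;
    if (toupper((unsigned char)v[0]) == 'O'
        && toupper((unsigned char)v[1]) == 'F'
        && toupper((unsigned char)v[2]) == 'F' && v[3] == '\0')
        return 0;
    return -1;
}

/* Applies one directive. The value pointer is stored as is, so it must
 * outlive the configuration. Returns 0 on success, -1 on an unknown
 * directive or an invalid flag value. */
int to_apply_directive(struct to_location_config *c,
                       const char *name, const char *value)
{
    int flag;

    assert(c != NULL && name != NULL && value != NULL);

    if (strcmp(name, "ToMimeTypesIn") == 0)  { c->mime_types_in = value;  return 0; }
    if (strcmp(name, "ToMimeTypesOut") == 0) { c->mime_types_out = value; return 0; }
    if (strcmp(name, "ToDataGroup") == 0)    { c->data_group = value;     return 0; }
    if (strcmp(name, "ToDir") == 0)          { c->dir = value;            return 0; }

    flag = parse_on_off(value);
    if (flag < 0)
        return -1;
    if (strcmp(name, "ToStoreInMySQL") == 0) { c->store_in_mysql = flag; return 0; }
    if (strcmp(name, "ToGzipInMySQL") == 0)  { c->gzip_in_mysql = flag;  return 0; }
    if (strcmp(name, "ToStoreInFiles") == 0) { c->store_in_files = flag; return 0; }
    if (strcmp(name, "ToGzipInFiles") == 0)  { c->gzip_in_files = flag;  return 0; }
    return -1;
}
```

Each location would own one such structure, which is how one location can end up storing pages in MySQL while another stores gzipped files.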
Example:
ToMySQLInfo dbname user password
ToMySQLHost localhost
ToMySQLPort 3306
ToMySQLSocket /var/lib/mysql/mysql.sock
ToMySQLNonPersistent "OFF"
ToLogFile "/etc/httpd/logs/mod_trace_output_log"
ToLog "ON"
<Location /directory_sql>
ToMimeTypesIn "text/html,HANDLER"
ToMimeTypesOut "text/html"
ToDataGroup "test NON gzipped"
ToStoreInMySQL "ON"
ToGzipInMySQL "OFF"
</Location>
<Location /directory_sql_gz>
ToMimeTypesIn "text/php"
ToMimeTypesOut "text/html"
ToDataGroup "test gzipped"
ToStoreInMySQL "ON"
ToGzipInMySQL "ON"
</Location>
<Location /directory_files>
ToMimeTypesIn "text/plain"
ToMimeTypesOut "text/plain"
ToDir "/var/wasa_store/"
ToStoreInFiles "ON"
ToGzipInFiles "OFF"
</Location>
<Location /directory_files_gz>
ToDir "/var/wasa_store"
ToStoreInFiles "ON"
ToGzipInFiles "ON"
</Location>
Here are some tests run with Apache Jakarta JMeter. I used 10 threads. Each thread iterates over a group of 4 pages with the following sizes: 1 KB, 20 KB, 40 KB, and 80 KB. The timer is set to 250 milliseconds, so each thread requests these 4 pages every second.
The time unit is the millisecond.
Sample 1: server without mod_trace_output.
Sample 2: server with mod_trace_output storing page content in files, not gzipped.
Sample 3: server with mod_trace_output storing page content in the MySQL database, not gzipped.
There is not much difference between these three. It seems mod_trace_output takes about 60 milliseconds to run, which is less than 2% of the request handling time. Of course, this has to be verified with more tests.
And here comes the power of gzip:
Sample 4: server with mod_trace_output storing page content in files, gzipped.
Sample 5: server with mod_trace_output storing page content in the MySQL database, gzipped.
Although there is a little more deviation (more extreme values compared to the average), the responses are 20 times faster. Of course, this has very little to do with mod_trace_output itself: the gain in speed comes from transferring gzipped data instead of non-gzipped data.
mod_trace_output performs very well. Data compression saves an impressive amount of bandwidth.
http://sourceforge.net/
http://www.linuxlinks.com/
http://httpd.apache.org/
http://freshmeat.net/
http://www.ethereal.com/
Gérard Materna was the main developer of the mod_trace_output project during his master's thesis. Gérard Materna is now CEO of Ubidata.
Dr Jean-Pierre Norguet obtained his Ph.D. in ICT from the University of Brussels. His work was funded by an FNRS research fellow grant. Dr Jean-Pierre Norguet provided assistance, guidance, and advice on the project.
Olivier Samyn graduated in 2001 from the University of Brussels as an engineer in networks and computer science. Olivier Samyn provided technical help on Linux for the mod_trace_output project.
Esteban Zimanyi is Professor of computer science at the University of Brussels [http://cs.ulb.ac.be/members/esteban/]. Prof. Zimanyi was the institutional advisor of the mod_trace_output project.