Search - s-lang

Search list

[ELanguage] S-Lang.rar

Description:
Platform: | Size: 958308 | Author: | Hits:

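[Internet-Network] An HTML File Analysis Program Written in Java

Description:

I. Overview

Handling Web pages comes down to correctly analyzing the tags in an HTML file, just as an interpreter for a programming language analyzes the reserved words in a source file before interpreting it. In practice we often need keyword analysis of some particular file type. For example, to download an HTML file together with the .gif, .class, and other files it references, we must separate out the tags in the HTML file and find the file names and directories we need. Before Java, this kind of work meant examining every character in the file to pick out the relevant parts, which took a lot of code and was error-prone. In a recent project the author used Java's input-stream class StreamTokenizer to analyze HTML files, with good results. The goal here is to download the HTML file of a known Web page, analyze it, and then download the files the page includes: HTML files (if the page uses frames), image files, and class (Java applet) files.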

    

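II. StreamTokenizer

StreamTokenizer turns an input stream into a stream of tokens. Tokens come in three kinds: words (multi-character tokens), single-character tokens, and whitespace (which includes Java and C/C++ style comments).

The constructor of the StreamTokenizer class is: StreamTokenizer(Reader r). The older form StreamTokenizer(InputStream in) is deprecated.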

    

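The class has several public instance variables: ttype, sval, and nval, which hold the token type, the current string value, and the current numeric value. To get the characters between delimiters (that is, the text of HTML tags), read sval. To advance to the next token, call nextToken(). nextToken() returns an int. For a single-character (ordinary) token it returns the character itself; otherwise it returns one of four constants:

StreamTokenizer.TT_NUMBER: the token is a number. Its value is a double and can be read from the instance variable nval.

StreamTokenizer.TT_WORD: the token is a non-numeric word (other characters may be part of it). The word can be read from the instance variable sval.

StreamTokenizer.TT_EOL: the token is an end of line (reported only when eolIsSignificant(true) has been called).

StreamTokenizer.TT_EOF: returned by nextToken() once the end of the stream has been reached.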
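
The return values listed above can be observed with a small sketch. The class name TokenTypesDemo, the describe method, and the sample input are assumptions made for this illustration; only the StreamTokenizer calls come from the JDK. The tokenizer is left with its default syntax table.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TokenTypesDemo {

    /** Tokenizes the input and returns one readable label per token. */
    static List<String> describe(String input) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.eolIsSignificant(true); // report line ends as TT_EOL instead of skipping them
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_NUMBER:
                    out.add("NUMBER:" + st.nval); // nval is a double
                    break;
                case StreamTokenizer.TT_WORD:
                    out.add("WORD:" + st.sval);
                    break;
                case StreamTokenizer.TT_EOL:
                    out.add("EOL");
                    break;
                default:
                    // Ordinary character: ttype holds the character itself.
                    out.add("CHAR:" + (char) st.ttype);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe("width 42 <\nend"));
    }
}
```

Note that the default syntax table also treats quote characters and '/' specially, so real HTML needs the custom table described next.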

    

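Before calling nextToken(), set up the stream's syntax table so that the tokenizer can tell the different characters apart. The method whitespaceChars(int low, int hi) defines the range of characters that carry no meaning (whitespace). The method wordChars(int low, int hi) defines the range of characters that make up words.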
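
A minimal sketch of how the syntax table changes tokenization, assuming a hypothetical class SyntaxTableDemo and method split that are not part of the article. After resetSyntax() every character is ordinary; marking 0..255 as word characters and then making '<' and '>' ordinary again leaves those two as the only delimiters.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SyntaxTableDemo {

    /** Splits HTML text into delimiter characters and the runs between them. */
    static List<String> split(String html) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(html));
        st.resetSyntax();      // clear the default table: all characters become ordinary
        st.wordChars(0, 255);  // treat every character as part of a word...
        st.ordinaryChar('<');  // ...except the two tag delimiters
        st.ordinaryChar('>');
        List<String> out = new ArrayList<>();
        int t;
        while ((t = st.nextToken()) != StreamTokenizer.TT_EOF) {
            // Words arrive in sval; delimiters arrive as the token value itself.
            out.add(t == StreamTokenizer.TT_WORD ? st.sval : String.valueOf((char) t));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(split("<p>Hi there</p>"));
    }
}
```

Because spaces are word characters under this table, "Hi there" comes back as a single token, which is why the article's class needs a whitespace filter.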

    

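III. Implementation

1. Implementing the HtmlTokenizer class

Before analyzing a token stream, first set up its syntax table. In this example, that means letting the program tell which words are HTML tags. Below is the token-stream class for the HTML tags we need. It is a subclass of StreamTokenizer: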

    

    

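    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.StreamTokenizer;

    class HtmlTokenizer extends StreamTokenizer {
        // Token codes. Only the tags this example needs are defined;
        // extend the list as required.
        static final int HTML_TEXT = -1;
        static final int HTML_UNKNOWN = -2;
        static final int HTML_EOF = -3;
        static final int HTML_IMAGE = -4;
        static final int HTML_FRAME = -5;
        static final int HTML_BACKGROUND = -6;
        static final int HTML_APPLET = -7;

        boolean outsideTag = true; // true while the tokenizer is not inside a tag

        // Constructor: defines the syntax table for this token stream.
        public HtmlTokenizer(BufferedReader r) {
            super(r);
            this.resetSyntax();     // reset the syntax table
            this.wordChars(0, 255); // every character may be part of a token
            this.ordinaryChar('<'); // delimiters on either side of an HTML tag
            this.ordinaryChar('>');
        } // end of constructor

        public int nextHtml() {
            int token; // the current token
            try {
                switch (token = this.nextToken()) {
                case StreamTokenizer.TT_EOF:
                    // End of the stream reached: report HTML_EOF.
                    return HTML_EOF;
                case '<': // entering a tag
                    outsideTag = false;
                    return nextHtml();
                case '>': // leaving a tag
                    outsideTag = true;
                    return nextHtml();
                case StreamTokenizer.TT_WORD:
                    // The token is a word: work out which tag it is.
                    if (allWhite(sval))
                        return nextHtml(); // skip whitespace-only tokens
                    else if (sval.toUpperCase().indexOf("FRAME") != -1 && !outsideTag)
                        return HTML_FRAME;      // FRAME tag
                    else if (sval.toUpperCase().indexOf("IMG") != -1 && !outsideTag)
                        return HTML_IMAGE;      // IMG tag
                    else if (sval.toUpperCase().indexOf("BACKGROUND") != -1 && !outsideTag)
                        return HTML_BACKGROUND; // BACKGROUND attribute
                    else if (sval.toUpperCase().indexOf("APPLET") != -1 && !outsideTag)
                        return HTML_APPLET;     // APPLET tag
                    else if (outsideTag)
                        return HTML_TEXT;       // plain text between tags
                    // Any other tag falls through and is reported as unknown.
                default:
                    System.out.println("Unknown tag: " + token);
                    return HTML_UNKNOWN;
                } // end of switch
            } catch (IOException e) {
                System.out.println("Error:" + e.getMessage());
            }
            return HTML_UNKNOWN;
        } // end of nextHtml

        // Returns true if the string consists only of whitespace.
        protected boolean allWhite(String s) {
            // The article omits this body; the line below is one minimal
            // implementation, supplied here as an assumption.
            return s.trim().length() == 0;
        } // end of allWhite

    } // end of class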

    

The approach above passed testing in a recent project, on Windows NT 4 with Inprise JBuilder 3 as the development tool.
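
As a rough illustration of the stated goal (finding the files a page refers to), the sketch below applies the same syntax table and then pulls attribute values out of each tag. It is not the author's code: the class name ResourceLister, the list method, and the regular expression are assumptions made for this example, and it only looks at SRC, BACKGROUND, and CODE attributes.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResourceLister {

    // Hypothetical helper pattern: src=..., background=... or code=...,
    // case-insensitive, with optional single or double quotes around the value.
    private static final Pattern ATTR = Pattern.compile(
            "(?i)\\b(src|background|code)\\s*=\\s*[\"']?([^\"'\\s>]+)");

    /** Returns the file names referenced inside tags, in document order. */
    static List<String> list(String html) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(html));
        st.resetSyntax();
        st.wordChars(0, 255);
        st.ordinaryChar('<');
        st.ordinaryChar('>');

        List<String> found = new ArrayList<>();
        boolean insideTag = false;
        int t;
        while ((t = st.nextToken()) != StreamTokenizer.TT_EOF) {
            if (t == '<') {
                insideTag = true;
            } else if (t == '>') {
                insideTag = false;
            } else if (t == StreamTokenizer.TT_WORD && insideTag) {
                // Only text inside a tag is searched for attributes.
                Matcher m = ATTR.matcher(st.sval);
                while (m.find()) {
                    found.add(m.group(2));
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(list("<body background=\"bg.gif\"><img src=logo.gif>"));
    }
}
```

Resolving the returned names against the page's URL and downloading them is left out, since the article does not show that step.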


Platform: | Size: 1066 | Author: tiberxu | Hits:

[ELanguage] S-Lang

Description: Source code for a C-style script-processing library that lets your C programs execute C-style script files.
Platform: | Size: 958575 | Author: 小草 | Hits:

[Other resource] Java 5.0 Chinese API documentation j2se_zh(new)

Description: This Java 5.0 API documentation, j2se_zh(new), is a CHM-format build made by an expert, for everyone's convenience, shortly after Sun's release on 2005/10/31. It is described as the newest Chinese API documentation available and well worth downloading; thanks go to its maker. Sun's Java API Docs are among the most frequently used references for learning and using Java, but for a long time they existed only in English, which was quite inconvenient for Java developers in China. Sun is now organizing efforts to translate the documentation into Chinese and has recently released the Chinese version of the java.lang and java.util class library API documentation on the Sun China technical community.
Platform: | Size: 2020760 | Author: JK | Hits:

[Other resource] 51 + CH375 ultra-minimal source program for reading and writing USB flash drives (啊雨)

Description: With about 180 lines of C, this program reads the root directory of a FAT16-formatted USB flash drive, lists the file names in the root directory, and displays the contents of the first file. The program is far from rigorous, has no error handling, and has poor drive compatibility; it is meant only for simple experiments and as a reference. It supports drives formatted as FAT16 by Windows. Because the program is so minimal, it is compatible with only a little over 50% of flash drive brands; switching to the CH375A chip raises compatibility to 85%, and using WCH's subroutine library or the official C source gives better compatibility still. Drives that passed testing: 郎科 (超稳经典64M, 超稳迷你128M, U160-64M, 超稳普及128M), 爱国者 (迷你王16M, 邮箱型), 黑匣子 (64M), 微闪 (64M), 飙王 (32M, 64M, 128M), 晶彩 (C200-64M), 新科 (256M), 昂达 (128M), and others; further test results are welcome. Drives that failed: 爱国者 (智慧棒128M), 清华普天 (USB2.0-128M); these do pass with WCH's subroutine library or with the CH375A.
Platform: | Size: 3834 | Author: 梁波 | Hits:

[Internet-Network] jabberx-cvs.tar

Description: JabberX is a Unix console Jabber client. Jabber is a cross-platform, open source, XML-based, distributed instant messaging system. Requirements: Iksemel library (from http://jabber-x.sourceforge.net); NCurses or S-Lang package. Autoconf and Automake packages are required for compiling from CVS. Perl is required for building Perl scripting support.
Platform: | Size: 108398 | Author: 雨笑 | Hits:

[Multimedia program] MFFMBitStream

Description: MFFM Bit Stream: a C++ hierarchy for reading and writing bit streams. Implemented for maximum efficiency and ease of use. Write or read bit streams for audio and video protocols such as MPEG (MP3), H.263, etc. Many parallel streams could be used in logic syntax streams as well. Operating System: OS Independent (Written in an interpreted language)
Platform: | Size: 59222 | Author: shan | Hits:

[Button control] MemoryMonitor

Description: MemoryMonitor demonstrates the use of the java.lang.management API in observing the memory usage of all memory pools consumed by the application. This simple demo program queries the memory usage of each memory pool and plots the memory usage history graph. To run the MemoryMonitor demo: java -jar <JDK_HOME>/demo/management/MemoryMonitor.jar. These instructions assume that this installation's version of the java command is in your path. If it isn't, then you should either specify the complete path to the java command or update your PATH environment variable as described in the installation instructions for the Java 2 SDK.
Platform: | Size: 14803 | Author: 向明建 | Hits:
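
The kind of query the MemoryMonitor description refers to can be sketched with the java.lang.management API. This is not the demo's source: the class name MemoryPoolSnapshot and the snapshot method are assumptions for illustration, and the sketch prints one line per pool instead of plotting a history graph.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class MemoryPoolSnapshot {

    /** Returns one line per memory pool with its current usage in bytes. */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // getUsage() returns null for a pool that is no longer valid
            }
            lines.add(pool.getName() + " (" + pool.getType() + "): used="
                    + usage.getUsed() + " bytes, committed="
                    + usage.getCommitted() + " bytes");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

Pool names and counts depend on the JVM and garbage collector in use, so the output differs between installations.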

[Linux-Unix] Ngrams

Description: A C++ N-grams Package 2.0. This is a simple C++ n-grams package that includes a header, the corresponding cpp file, and a sample driver program. It is a natural language processing tool for creating n-gram profiles for text documents. Usage details are documented in the header, right above each public function. This package is based on Dr. Vlado Keselj's Perl package Text::Ngrams, which is available on CPAN.
Platform: | Size: 6568 | Author: 郑乔鸿 | Hits:
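
For readers unfamiliar with the term, an n-gram profile is a table of how often each run of n consecutive units occurs in a text. The sketch below counts character n-grams; it is an illustration only, and the class NgramProfile and its profile method are hypothetical names that do not reflect the API of the C++ package or of Text::Ngrams.

```java
import java.util.Map;
import java.util.TreeMap;

public class NgramProfile {

    /** Counts every run of n consecutive characters in the text. */
    static Map<String, Integer> profile(String text, int n) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by n-gram
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(profile("banana", 2)); // prints {an=2, ba=1, na=2}
    }
}
```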

[Other resource] plsql-help

Description: The intended use of this help manual is as a quick reference guide, as it is not fully inclusive of all elements of the PL/SQL Programming Language. Please refer to the PL/SQL User's Guide and Reference for more information.
Platform: | Size: 344735 | Author: qlw | Hits:

[ELanguage] S-Lang

Description: Source code for a C-style script-processing library that lets your C programs execute C-style script files.
Platform: | Size: 958464 | Author: 小草 | Hits:

[JSP] Java 5.0 Chinese API documentation j2se_zh(new)

Description: This Java 5.0 API documentation, j2se_zh(new), is a CHM-format build made by an expert, for everyone's convenience, shortly after Sun's release on 2005/10/31. It is described as the newest Chinese API documentation available and well worth downloading; thanks go to its maker. Sun's Java API Docs are among the most frequently used references for learning and using Java, but for a long time they existed only in English, which was quite inconvenient for Java developers in China. Sun is now organizing efforts to translate the documentation into Chinese and has recently released the Chinese version of the java.lang and java.util class library API documentation on the Sun China technical community.
Platform: | Size: 2020352 | Author: | Hits:

[Internet-Network] jabberx-cvs.tar

Description: JabberX is a Unix console Jabber client. Jabber is a cross-platform, open source, XML-based, distributed instant messaging system. Requirements: Iksemel library (from http://jabber-x.sourceforge.net); NCurses or S-Lang package. Autoconf and Automake packages are required for compiling from CVS. Perl is required for building Perl scripting support.
Platform: | Size: 108544 | Author: 雨笑 | Hits:

[Other] ext-lang-zh_CN-GBK-min

Description: A DOS quick-start manual. Very good stuff; please download it.
Platform: | Size: 2048 | Author: 蒋峰 | Hits:

[MultiLanguage] lang

Description: Implemented in Java; normalizes the lexicon used for Chinese word segmentation. Shared with everyone.
Platform: | Size: 288768 | Author: 舒晓明 | Hits:

[WEB Code] vtigerCRM-5_0_3_lang-zh

Description: vTigerCRM is developed on the basis of the internationally known open source software SugarCRM. SugarCRM is a rising star in the international CRM field, headquartered in California, USA. It offers the well-known open source edition of SugarCRM; put simply, open source means the program code is open, development follows an open model, and the software is free for users. Like SugarCRM, vTigerCRM uses a browser/server (B/S) architecture with a MySQL database and the Apache web server. It meets the entry-level customer relationship management needs of many small and medium-sized enterprises in China. This is the Chinese localization pack for its latest version.
Platform: | Size: 178176 | Author: Henry | Hits:

[WEB Code] Lang-zh_cnModule-CE-5.5.0

Description: A very easy-to-use CRM system written in PHP; those interested can take a look.
Platform: | Size: 507904 | Author: wangyan | Hits:

[Linux-Unix] S-Lang

Description: Learning materials and installation package for the S-Lang language on Linux.
Platform: | Size: 3012608 | Author: cybaiu | Hits:

[JSP/Java] commons-lang-2.6-sources

Description: Commons Lang, a package of Java utility classes for the classes that are in java.lang's hierarchy, or are considered to be so standard as to justify existence in java.lang.
Platform: | Size: 372736 | Author: const | Hits:

[Other] lang-php

Description: This is a lightweight, low-level architecture for web sites and Web applications. It's designed to make your web site simpler, faster and more efficient.
Platform: | Size: 632832 | Author: xybrandon | Hits: