
[Internet-Network] Writing an HTML File Analyzer in Java

Description:

Writing an HTML File Analyzer in Java

 I. Overview

    

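    A core task of Web software such as a browser is to parse the tags in an HTML file correctly, and an interpreter for a programming language likewise analyzes the reserved words in a source file before interpreting it. In practice we also often need to analyze the keywords of some particular file type. For example, to download an HTML file together with its associated .gif, .class, and other files, we must separate out the tags in the HTML file and find the file names and directories they reference. Before Java, such work typically meant examining the file character by character to find the relevant parts, which required a lot of code and was error-prone. In a recent project I used Java's StreamTokenizer class to analyze HTML files, with good results. Here we download an HTML file from a known Web page, analyze it, and then download the HTML files it contains (if it uses frames), its image files, and its class (Java applet) files.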

    

    II. StreamTokenizer

    

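    StreamTokenizer (a tokenizing input stream) turns an input stream into a stream of tokens. The main kinds of entities it distinguishes are words (multi-character tokens), single-character ("ordinary") tokens, and whitespace, which, like Java- and C/C++-style comments, is skipped rather than returned.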
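A minimal sketch of how these categories show up in practice, assuming the default syntax table plus slashSlashComments(true) and slashStarComments(true) so that Java/C++-style comments are skipped. The class name TokenKindsDemo and the method describe are illustrative only. Note that with the default table, numbers come back as their own kind (TT_NUMBER) rather than as words.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class TokenKindsDemo {
    // Tokenizes text with the default syntax table and labels each token.
    static List<String> describe(String text) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.slashSlashComments(true); // skip "// ..." up to the end of the line
        st.slashStarComments(true);  // skip "/* ... */"
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    out.add("WORD:" + st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    out.add("NUMBER:" + st.nval);
                    break;
                default:
                    // a single-character token; ttype holds the character itself
                    out.add("ORDINARY:" + (char) st.ttype);
            }
        }
        return out; // whitespace and comments never appear here
    }

    public static void main(String[] args) throws IOException {
        // prints [WORD:int, WORD:x, ORDINARY:=, NUMBER:42.0, ORDINARY:;, WORD:y]
        System.out.println(describe("int x = 42; /* note */ y // trailing comment"));
    }
}
```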

    

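    The constructor to use is StreamTokenizer(Reader r). The older StreamTokenizer(InputStream is) has been deprecated since JDK 1.1; the preferred way to tokenize a byte stream is to convert it into a character stream first.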
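Because the preferred constructor takes a Reader, a byte stream such as the one returned by java.net.URL.openStream() has to be wrapped first. A sketch, assuming the page is UTF-8 encoded (a real downloader would take the charset from the HTTP headers or the page itself); the class ReaderConstructorDemo and its methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StreamTokenizer;
import java.nio.charset.StandardCharsets;

class ReaderConstructorDemo {
    // Wraps a byte stream in a buffered character Reader (assumed UTF-8),
    // which is what StreamTokenizer(Reader) expects.
    static StreamTokenizer fromStream(InputStream in) {
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return new StreamTokenizer(reader);
    }

    // Counts word tokens, just to show the tokenizer is usable.
    static int countWords(InputStream in) throws IOException {
        StreamTokenizer st = fromStream(in);
        int words = 0;
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                words++;
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        System.out.println(countWords(new ByteArrayInputStream(page))); // 2
    }
}
```

In a downloader, the argument to fromStream would be the stream of the fetched page; a ByteArrayInputStream stands in for it here so the example runs offline.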

    

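    The class has a few public instance variables, ttype, sval, and nval, holding the current token's type, its string value, and its numeric value. To get the text of a word token (in our case, the contents of an HTML tag), read sval. To advance to the next token, call nextToken(). It returns an int, the same value it stores in ttype; apart from the character code of a single-character token, the possible return values are: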

    

    StreamTokenizer.TT_NUMBER: the token is a number (only while number parsing is enabled). Its value, a double, is in the instance variable nval.

    

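    StreamTokenizer.TT_WORD: the token is a non-numeric word, made up of whatever characters the syntax table declares as word characters. The word is in the instance variable sval.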

    

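    StreamTokenizer.TT_EOL: the token is a line terminator. This is returned only after eolIsSignificant(true) has been called; otherwise line ends are treated as whitespace.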

    

    StreamTokenizer.TT_EOF: the end of the stream has been reached.
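The loop below is a minimal sketch of the usual nextToken() pattern, switching on the four constants above and treating anything else as a single-character token. It calls eolIsSignificant(true) so that TT_EOL actually appears; the class TokenTypeLoop and its W/N/L/E trace format are made up for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

class TokenTypeLoop {
    // Returns a compact trace: W(word) N(number) L (end of line) E (end of stream),
    // and C(x) for an ordinary single-character token.
    static String trace(String text) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.eolIsSignificant(true); // without this, line ends are just whitespace
        StringBuilder sb = new StringBuilder();
        while (true) {
            int type = st.nextToken(); // the same value is stored in st.ttype
            switch (type) {
                case StreamTokenizer.TT_NUMBER:
                    sb.append("N(").append(st.nval).append(") ");
                    break;
                case StreamTokenizer.TT_WORD:
                    sb.append("W(").append(st.sval).append(") ");
                    break;
                case StreamTokenizer.TT_EOL:
                    sb.append("L ");
                    break;
                case StreamTokenizer.TT_EOF:
                    sb.append("E");
                    return sb.toString();
                default:
                    sb.append("C(").append((char) type).append(") ");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // prints W(width) N(3.5) L W(height) N(2.0) L E
        System.out.println(trace("width 3.5\nheight 2\n"));
    }
}
```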

    

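    Before the first call to nextToken(), set up the stream's syntax table so that the tokenizer can tell the different characters apart. whitespaceChars(int low, int hi) marks a range of characters as whitespace, i.e. separators with no meaning of their own. wordChars(int low, int hi) marks a range of characters as word constituents. The program below also uses resetSyntax(), which makes every character ordinary (including digits, so numbers are no longer parsed), and ordinaryChar(int ch), which makes a single character come back as its own one-character token.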
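A small sketch of building a syntax table from scratch, the same way HtmlTokenizer does: resetSyntax() first makes every character ordinary, and the ranges are then reassigned. The class name SyntaxTableDemo is hypothetical. Because resetSyntax() also removes number parsing, digits come back as ordinary single-character tokens here.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SyntaxTableDemo {
    static List<String> tokens(String text) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();           // every character is now "ordinary"
        st.wordChars('a', 'z');     // letters build words
        st.wordChars('A', 'Z');
        st.whitespaceChars(0, ' '); // control characters and space are skipped
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            // a word, or the single ordinary character that was read
            out.add(st.ttype == StreamTokenizer.TT_WORD
                    ? st.sval
                    : String.valueOf((char) st.ttype));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // prints [Hello, ,, world, 4, 2]
        System.out.println(tokens("Hello, world 42"));
    }
}
```

The comma and each digit arrive as separate ordinary tokens because none of them was assigned to a word or whitespace range.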

    

    III. Implementation

    

    1. Implementing the HtmlTokenizer class

    

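    Before analyzing a token stream, we first have to set up its syntax table; in this example, that is what lets the program pick out which words are HTML tags. By making '<' and '>' the only ordinary characters, the full text of each tag comes back as a single word. Below is the definition of a token stream class for the HTML tags we need; it is a subclass of StreamTokenizer: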

    

    

    import java.io.*;

    class HtmlTokenizer extends StreamTokenizer {
        // Token codes returned by nextHtml(). Only the tags this example
        // needs are defined here; extend the list as required.
        static final int HTML_TEXT = -1;
        static final int HTML_UNKNOWN = -2;
        static final int HTML_EOF = -3;
        static final int HTML_IMAGE = -4;
        static final int HTML_FRAME = -5;
        static final int HTML_BACKGROUND = -6;
        static final int HTML_APPLET = -7;

        boolean outsideTag = true; // true while we are not between '<' and '>'

        // Constructor: sets up this token stream's syntax table.
        public HtmlTokenizer(BufferedReader r) {
            super(r);
            this.resetSyntax();     // clear the syntax table
            this.wordChars(0, 255); // every character can be part of a word...
            this.ordinaryChar('<'); // ...except the delimiters on both sides
            this.ordinaryChar('>'); // of an HTML tag
        } // end of constructor

        public int nextHtml() {
            int token;
            try {
                switch (token = this.nextToken()) {
                case StreamTokenizer.TT_EOF:
                    // end of the stream reached
                    return HTML_EOF;
                case '<': // entering a tag
                    outsideTag = false;
                    return nextHtml();
                case '>': // leaving a tag
                    outsideTag = true;
                    return nextHtml();
                case StreamTokenizer.TT_WORD:
                    // a word: decide which tag, if any, it is
                    if (allWhite(sval))
                        return nextHtml(); // skip runs of whitespace
                    if (outsideTag)
                        return HTML_TEXT;  // ordinary text between tags
                    String tag = sval.toUpperCase();
                    if (tag.indexOf("FRAME") != -1)           // FRAME tag
                        return HTML_FRAME;
                    else if (tag.indexOf("IMG") != -1)        // IMG tag
                        return HTML_IMAGE;
                    else if (tag.indexOf("BACKGROUND") != -1) // BACKGROUND attribute
                        return HTML_BACKGROUND;
                    else if (tag.indexOf("APPLET") != -1)     // APPLET tag
                        return HTML_APPLET;
                    return HTML_UNKNOWN;                      // some other tag
                default:
                    System.out.println("Unknown tag: " + token);
                    return HTML_UNKNOWN;
                } // end of switch
            } catch (IOException e) {
                System.out.println("Error: " + e.getMessage());
            }
            return HTML_UNKNOWN;
        } // end of nextHtml

        // Returns true if s consists only of whitespace.
        protected boolean allWhite(String s) {
            for (int i = 0; i < s.length(); i++) {
                if (!Character.isWhitespace(s.charAt(i)))
                    return false;
            }
            return true;
        } // end of allWhite
    } // end of class

    

    The code above was tested in a recent project on Windows NT 4 with Inprise JBuilder 3.
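The article stops once nextHtml() has classified a tag; to find the file to download, the tag text still held in sval must be searched for the relevant attribute (src for IMG and FRAME, background, or code for APPLET) and resolved against the page's URL. A sketch of that step, assuming sval holds the text between '<' and '>' and using java.net.URI for the resolution; LinkResolver, attribute, and resolve are hypothetical names, and a simple regular expression stands in for a real HTML attribute parser.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkResolver {
    // Finds name=value, name="value" or name='value' (case-insensitive)
    // inside the text of one tag, e.g. img src="logo.gif" width=10.
    static Optional<String> attribute(String tagText, String name) {
        Pattern p = Pattern.compile("(?i)(?:^|\\s)" + Pattern.quote(name)
                + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");
        Matcher m = p.matcher(tagText);
        if (!m.find()) {
            return Optional.empty();
        }
        // exactly one of the three alternatives (double, single, unquoted) matched
        for (int g = 1; g <= 3; g++) {
            if (m.group(g) != null) {
                return Optional.of(m.group(g));
            }
        }
        return Optional.empty();
    }

    // Turns a possibly relative file name into an absolute URI to download.
    static URI resolve(String pageUrl, String fileName) {
        return URI.create(pageUrl).resolve(fileName);
    }

    public static void main(String[] args) {
        String tag = "img src=\"images/logo.gif\" width=10"; // as left in sval
        attribute(tag, "src").ifPresent(f ->
            // prints http://example.com/docs/images/logo.gif
            System.out.println(resolve("http://example.com/docs/index.html", f)));
    }
}
```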


Platform: | Size: 1066 | Author: tiberxu | Hits:

[File Operate] FileStorage.v3.0.1

Description: The FileStorage component can upload and hold any data files within your Delphi/BCB forms (within the body of the EXE program). If your software requires additional files (.DLLs, .WAVs, .TXTs, etc.), these files can be uploaded straight onto your form and extracted from the executable at run time (AutoExtract property). You can also access stored files directly from memory without extracting them to disk (see example). Use this component if you want tight integration of your app with its stored files and access to them at run time, or if you want to supply your customers with just one single executable file.
Platform: | Size: 118596 | Author: hfb | Hits:

[Menu control] FormHelp.v3.7.1.For.Delphi5679.CR

Description: The FormHelp component adds context-sensitive help to your Delphi/BCB forms without any bulky help files. It traps context-sensitive help calls and creates its own popup windows from a control's hint. You can choose whether to interpret the hint string as plain text or as a kind of rich text that lets you apply different fonts, colors, styles, and line breaks. Don't worry about your hints: FormHelp uses the secondary part of a control's hint, separated by a vertical bar "|", so mouse hints still work as well. With FormHelp, neither help context numbers nor extra help files are required to display context-sensitive help. FormHelp's popup windows look and feel like native context help in standard Windows applications.
Platform: | Size: 219743 | Author: hfb | Hits:

[Other resource] JPIndustry

Description: Hi everyone, this is someone else's C++ program, and it's nice: a set of switch controls that I hope fellow developers building industrial-control interfaces can use. For the bitmap switch, replace the bitmap with your own to get a switch in your own style; the others are all drawn with GDI. Enjoy!
Platform: | Size: 98473 | Author: joey | Hits:

[Other resource] J2ME_IDE_Study

Description: J2ME Development Explained: Tools. An introduction to the phone development environments of Nokia, Siemens, and Motorola.
Platform: | Size: 191775 | Author: 阿木 | Hits:

[Speech/Voice recognition/combine] ma_by

Description: The Matlab functions and scripts in the MA toolbox are:
- ma_sone: wav (PCM) to sone (specific loudness sensation)
- ma_mfcc: wav (PCM) to MFCCs (Mel Frequency Cepstrum Coefficients)
- ma_sh: sone to Spectrum Histogram
- ma_ph: sone to Periodicity Histogram
- ma_fp: sone to Fluctuation Pattern
- ma_fc: frame-based representation (MFCCs or sone) to cluster model (Frame Clustering)
- ma_cms: cluster models to distance (Cluster Model Similarity)
- ma_kmeans: k-means clustering (used by "ma_fc")
- ma_cm_visu: visualize a cluster model (as returned by "ma_fc")
- ma_simple_eval: script for a simple evaluation of similarity measures
- ma_simple_iom: script for a simple Islands of Music interface
Platform: | Size: 24961 | Author: mesu | Hits:

[source in ebook] vc_jiqiaoshili_part13_14

Description: Visual C++ Programming Techniques and Examples, companion CD.
Chapter 13, Network Programming, contains 8 examples:
1. PowerNetConfig: change the host name, IP address, gateway, subnet mask, and proxy server under Windows 2000.
2. GetAllIP: get the multiple IP addresses of a multihomed host.
3. EnumHosts: enumerate the computers on the LAN.
4. GetMac: read the MAC address of the network card.
5. C_S Demo: a small company customer-service system (a client/server usage example).
6. Mount: map a network drive from within an application.
7. AddIEButton: add a custom icon to the IE toolbar.
8. MyBrowser: build your own browser with the WebBrowser control.
Chapter 14, Help System, contains two examples:
1. HlpDemo: how to make a traditional .hlp help file.
2. CHM: how to make a CHM help file.
This completes the upload of the book's source code. Sorry for splitting it into several uploads; the school's network is really poor.
Platform: | Size: 2324480 | Author: xixi | Hits:

[Education soft system] xue_sheng_zong_he_su_zhi_ping_ce_xitong

Description: This system is developed on the ASP.NET platform with SQL Server 2000 as the back-end database. It uses the B/S (browser/server) architecture and runs on a campus network: users and operators access the Web server through a browser, and the Web server accesses the database through ADO.NET as client requests require. The Web server is IIS 5.0 and the database is SQL Server 2000. The system is suitable for universities as well as secondary and primary schools. Its functions fall into four main categories: user management, for adding, deleting, and querying users; score management, for querying scores; and help information, which explains the system's operations in detail. The system aims to be easy to use, with good extensibility and maintainability.
Platform: | Size: 210944 | Author: 水依 | Hits:

[ELanguage] HI_TECH_PICC18_MANUAL

Description: The English edition of the HI-TECH PICC18 STD development manual, the most detailed and authoritative one available.
Platform: | Size: 6715392 | Author: 陈培国 | Hits:

[Other Embedded program] hi

Description: Development of a simple digital clock; ideal for anyone just starting out with microcontrollers!
Platform: | Size: 1024 | Author: inra | Hits:

[Linux-Unix] Hi_VOICE_CODEC_SDK_V1.0.2.0

Description: The audio codec library provided by HiSilicon for the Hi3510, which makes it easier for Hi3510 users to debug audio and video.
Platform: | Size: 1197056 | Author: 无名岛 | Hits:

[Software Engineering] Permanentmagnetsynchronousmotordirecttorquecontrol

Description: Based on the mathematical model of the permanent magnet synchronous motor (PMSM) and the basic principle of direct torque control (DTC), this paper designs a fully digital DTC system based on the TMS320F2812 DSP. The DTC system is first modeled and simulated in MATLAB/Simulink, yielding simulated waveforms of current, speed, torque, and flux linkage that show the system is designed correctly; the system is then studied experimentally. Keywords: direct torque control; permanent magnet synchronous motor; DSP
Platform: | Size: 189440 | Author: 张国辉 | Hits:

[Software Engineering] noveldirecttorquecontrolofpmsmbasedonexpectedvolta

Description: To address the large torque and flux ripple of conventional direct torque control (DTC) for permanent magnet synchronous motors (PMSM), a DTC scheme based on space vector modulation (SVM) is designed: SVM produces the expected stator voltage, and a PI controller replaces the hysteresis comparator of conventional DTC. In addition, the stator flux is observed with an estimator based on rotor position and stator current. Experimental results show that, compared with conventional DTC, the proposed method estimates the stator flux reliably and effectively, reduces the ripple of electromagnetic torque and stator flux, reduces current and speed fluctuation, and has good dynamic and static performance.
Platform: | Size: 501760 | Author: 张国辉 | Hits:

[SCM] hitechug

Description: HI-TECH PICC compiler user's guide, only for the pic10, 20 16 series.
Platform: | Size: 2340864 | Author: 绽放 | Hits:

[SCM] HI_TECH_PICC9.8(install_crack)

Description: This upload is the installer for HI-TECH PICC version 9.8, with a crack and detailed instructions. A very good compiler for PIC microcontrollers!
Platform: | Size: 6713344 | Author: LiDeSheng | Hits:

[Software Engineering] Earth-center-ed-FiX-Coordi-nateSECF

Description: This document is an abstract. Because the measurement equations of early-warning satellites are nonlinear, traditional methods such as the EKF inevitably introduce linearization error. The paper proposes a geometric-location Kalman filter (GL-KF) method: assuming that the boost-phase trajectory lies in a plane through the Earth's center, the missile position is solved from the angle measurements of two satellites, and the measurement equation is thereby transformed into a linear one, which reduces the linearization error of the measurement equation.
Platform: | Size: 167936 | Author: zhanglong | Hits:

[OpenCV] MTF

Description: The Kalman filter is a very widely used state estimation algorithm. Kalman filtering based on information fusion comprises two methods, state vector fusion and measurement fusion. The conventional Kalman method (TTF) has lower estimation error but a long computation time. The proposed model fusing the state and measurement vectors (MTF) uses local fusion information to give a better state estimate, with a shorter computation time and better performance than TTF.
Platform: | Size: 172032 | Author: 王佳 | Hits:

[Windows Develop] hi

Description: Choose a file path, then display the files under that path as a list on the page.
Platform: | Size: 49181696 | Author: sxj | Hits:

[Other] 49

Description: Hi, this is a sample file; please do not download it.
Platform: | Size: 1561600 | Author: nsrfth | Hits:

[Other] Hi-Grid T1 V200R010C00 Product Documentation (chm)

Description: Hi-Grid T1, Huawei's edge computing core board, is a basic open platform integrating control, management, computing, and communication functions. Hi-Grid T1 has a built-in container, supports developing and installing third-party apps, and provides rich local interfaces, including RS485, RS232, FE, DI, and PT100, which can connect various serial-port devices, Ethernet devices, environmental monitoring devices, and more.
Platform: | Size: 3716096 | Author: 礼物我 | Hits:

CodeBus www.codebus.net