Search - hi p

Search list

[Internet-Network] Writing an HTML File Analyzer in Java

Description:

Writing an HTML file analyzer in Java

    I. Overview

    

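    At the core of a Web browser is the correct analysis of the tags in an HTML file, just as the interpreter of a programming language analyzes the reserved words in a source file before interpreting it. In practice we often need to perform keyword analysis on files of a particular type. For example, to download an HTML file together with the .gif, .class, and other files it refers to, we must separate out the tags in the HTML file and find the required file names and directories. Before Java, this kind of work meant examining every character in the file to pick out the parts needed, which took a lot of code and was error-prone. In a recent project the author used Java's StreamTokenizer class to analyze HTML files, with good results. The goal here is to download an HTML file from a known Web page, analyze it, and then download the HTML files (if the page uses frames), image files, and class (Java applet) files that the page includes.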

    

    II. StreamTokenizer

    

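    StreamTokenizer turns an input stream into a stream of tokens. The tokens fall into three kinds: words (multi-character tokens), single-character tokens, and whitespace (which includes comments in the Java and C/C++ styles).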

    

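    The constructor of the StreamTokenizer class is: StreamTokenizer(InputStream in). This form has been deprecated since JDK 1.1 in favor of StreamTokenizer(Reader r), which is the form the HtmlTokenizer class in section III uses.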

    

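    The class has several public instance variables: ttype, sval, and nval, which hold the token type, the current string value, and the current numeric value. To get the characters between delimiters (here, the text inside or between HTML tags), read the variable sval. To advance to the next token, call nextToken(). Its return value is an int. An ordinary single character is returned as its own character code; apart from that there are four possible return values: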

    

    StreamTokenizer.TT_NUMBER: the token read is a number. Its value is a double and can be read from the instance variable nval.

    

    StreamTokenizer.TT_WORD: the token read is a non-numeric word (which may contain other characters as well). The word can be read from the instance variable sval.

    

    StreamTokenizer.TT_EOL: the token read is an end of line. It is returned only after eolIsSignificant(true) has been called.

    

    When the end of the stream has been reached, nextToken() returns TT_EOF.
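
    The return values listed above can be observed with a short sketch. It assumes the default syntax table and uses a hypothetical class name, TokenTypeDemo. It switches on eolIsSignificant(true) so that TT_EOL shows up, and treats any other return value as the code of an ordinary character.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the article's program.
class TokenTypeDemo {
    // Describes each token of the input using the default syntax table.
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.eolIsSignificant(true); // without this call TT_EOL is never returned
        List<String> out = new ArrayList<String>();
        try {
            int t;
            while ((t = st.nextToken()) != StreamTokenizer.TT_EOF) {
                switch (t) {
                case StreamTokenizer.TT_NUMBER:
                    out.add("NUMBER " + st.nval); // nval is a double
                    break;
                case StreamTokenizer.TT_WORD:
                    out.add("WORD " + st.sval);
                    break;
                case StreamTokenizer.TT_EOL:
                    out.add("EOL");
                    break;
                default:
                    out.add("CHAR " + (char) t); // ordinary character
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : describe("width = 42\nend")) {
            System.out.println(s);
        }
    }
}
```

    For the input "width = 42\nend" the method returns WORD width, CHAR =, NUMBER 42.0, EOL, WORD end.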

    

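    Before the first call to nextToken(), set up the syntax table of the input stream so that the tokenizer can tell the different kinds of characters apart. The method whitespaceChars(int low, int hi) defines the range of characters that carry no meaning. The method wordChars(int low, int hi) defines the range of characters that make up words.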
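
    As a minimal sketch of how the two range methods shape the syntax table, the example below splits a line into comma-separated fields. The class name SyntaxTableDemo and the comma-splitting task are assumptions chosen for illustration. Note the order of the calls: wordChars adds the word attribute to a range, while whitespaceChars overwrites whatever was set before.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the article's program.
class SyntaxTableDemo {
    // Splits a line at commas; spaces and digits stay inside the fields.
    static List<String> fields(String line) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.resetSyntax();             // clear the default table
        st.wordChars(0, 255);         // every character may be part of a word...
        st.whitespaceChars(',', ','); // ...except the comma, which only separates
        List<String> out = new ArrayList<String>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("a b,c-1,,d"));
    }
}
```

    Because resetSyntax() also removes number parsing, "c-1" comes back as a single word, and the two adjacent commas are skipped together as whitespace.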

    

    III. Implementation

    

    1. Implementing the HtmlTokenizer class

    

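    Before analyzing a token stream, first set up its syntax table. In this example that means letting the program work out which words are HTML tags. Below is the definition of the token stream class for the HTML tags we need. It is a subclass of StreamTokenizer: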

    

    

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.StreamTokenizer;

    class HtmlTokenizer extends StreamTokenizer {
        // Token codes. Only those needed for this example are defined;
        // extend them as required.
        static final int HTML_TEXT = -1;
        static final int HTML_UNKNOWN = -2;
        static final int HTML_EOF = -3;
        static final int HTML_IMAGE = -4;
        static final int HTML_FRAME = -5;
        static final int HTML_BACKGROUND = -6;
        static final int HTML_APPLET = -7;

        boolean outsideTag = true; // false while the reader is inside <...>

        // Constructor: defines the syntax table of this token stream.
        public HtmlTokenizer(BufferedReader r) {
            super(r);
            this.resetSyntax();     // reset the syntax table
            this.wordChars(0, 255); // every character may be part of a word
            this.ordinaryChar('<'); // delimiters on either side of an HTML tag
            this.ordinaryChar('>');
        } // end of constructor

        public int nextHtml() {
            int token; // the raw token
            try {
                switch (token = this.nextToken()) {
                case StreamTokenizer.TT_EOF:
                    // end of the stream reached
                    return HTML_EOF;
                case '<': // entering a tag
                    outsideTag = false;
                    return nextHtml();
                case '>': // leaving a tag
                    outsideTag = true;
                    return nextHtml();
                case StreamTokenizer.TT_WORD:
                    // The token is a word: decide which tag it is. The tests
                    // are substring matches, so FRAMESET and closing tags
                    // such as /FRAME match as well.
                    if (allWhite(sval))
                        return nextHtml(); // skip whitespace-only words
                    else if (sval.toUpperCase().indexOf("FRAME") != -1 && !outsideTag)
                        return HTML_FRAME;
                    else if (sval.toUpperCase().indexOf("IMG") != -1 && !outsideTag)
                        return HTML_IMAGE;
                    else if (sval.toUpperCase().indexOf("BACKGROUND") != -1 && !outsideTag)
                        return HTML_BACKGROUND;
                    else if (sval.toUpperCase().indexOf("APPLET") != -1 && !outsideTag)
                        return HTML_APPLET;
                    else
                        // plain text between tags, or a tag we do not handle
                        return outsideTag ? HTML_TEXT : HTML_UNKNOWN;
                default:
                    System.out.println("Unknown tag: " + token);
                    return HTML_UNKNOWN;
                } // end of switch
            } catch (IOException e) {
                System.out.println("Error:" + e.getMessage());
            }
            return HTML_UNKNOWN;
        } // end of nextHtml

        // True if s consists only of whitespace. The article omits the body;
        // trimming is one simple way to fill it in.
        protected boolean allWhite(String s) {
            return s.trim().length() == 0;
        } // end of allWhite

    } // end of class

    

    The approach above was tested successfully in a recent project, on Windows NT 4 with Inprise JBuilder 3 as the development tool.
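
    To show how the same syntax table can lead to the file names the overview asks for, here is a self-contained sketch. It is not the author's downloader: the class name ResourceLister, the regular expression, and the choice of the src, background, and code attributes are all assumptions made for illustration. It tokenizes with '<' and '>' as ordinary characters, exactly as HtmlTokenizer does, and then searches the text of each tag for attribute values.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the article's program.
class ResourceLister {
    // Assumed set of attributes that name a file worth downloading.
    private static final Pattern REF = Pattern.compile(
            "\\b(?:src|background|code)\\s*=\\s*[\"']?([^\"'\\s>]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the attribute values found inside tags, in document order.
    static List<String> list(String html) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(html));
        st.resetSyntax();       // same syntax table as HtmlTokenizer
        st.wordChars(0, 255);
        st.ordinaryChar('<');
        st.ordinaryChar('>');
        List<String> refs = new ArrayList<String>();
        boolean insideTag = false;
        try {
            int t;
            while ((t = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (t == '<') {
                    insideTag = true;
                } else if (t == '>') {
                    insideTag = false;
                } else if (t == StreamTokenizer.TT_WORD && insideTag) {
                    // sval holds the whole tag body, e.g. img src=logo.gif
                    Matcher m = REF.matcher(st.sval);
                    while (m.find()) {
                        refs.add(m.group(1));
                    }
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return refs;
    }

    public static void main(String[] args) {
        System.out.println(list("<img src=logo.gif><frame src='menu.html'>"));
    }
}
```

    Text outside tags is ignored even when it happens to contain something like src=..., which is the job the outsideTag flag does in HtmlTokenizer.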


Platform: | Size: 1066 | Author: tiberxu | Hits:

[Printing program] terrificskyfox

Description: Windows 2000 virtual printer driver programming.
Platform: | Size: 86645 | Author: 袭建帅 | Hits:

[Printing program] terrificskyfox

Description: Windows 2000 virtual printer driver programming.
Platform: | Size: 86016 | Author: 袭建帅 | Hits:

[ELanguage] picc8.05-pl

Description: HI-TECH PICC 8.05PL1
Platform: | Size: 7612416 | Author: 劉朝文 | Hits:

[SCM] ht-pic

Description: HI-TECH PICC portable ("green") version (V8.05PL2), for C language development on Microchip chips.
Platform: | Size: 4076544 | Author: banalsheep | Hits:

[Editor] PowerPdf09

Description: PowerPdf 0.9 Full Source. http://www.est.hi-ho.ne.jp/takeshi_kanno/powerpdf/ PowerPdf is a VCL to create PDF documents visually. Like QuickReport, you can design a PDF document easily in the Delphi IDE.
Platform: | Size: 550912 | Author: 瘦马 | Hits:

[J2ME] touch-Colorlinez

Description: A general-purpose Color Lines (five-in-a-row) mobile game written in J2ME. It supports both the keypad and the touch screen and adapts to the screen size, which makes it a good fit for iPhone clone handsets.
Platform: | Size: 20480 | Author: robustman | Hits:

[SCM] HCPICP-pro-9.60PL5

Description: PICC 9.60PL5 PRO edition, a very recent release.
Platform: | Size: 5969920 | Author: yanxinming | Hits:

[ELanguage] Salvo_RTOS_PIC_3.2.3_PRO

Description: Real Time Operating System for Hi-Tech C compiler.
Platform: | Size: 22983680 | Author: wykamakura | Hits:

[SCM] StdARapidLocalThreshold

Description: Abstract: Traditional edge detection algorithms usually apply a single global threshold to the whole picture, which tends to lose detail. This paper selects a local threshold within a small neighborhood of each pixel in the picture and obtains a better edge-detection effect. Simulation shows that the method can adaptively select the edge threshold from the gray-level information in the small neighborhood, and so detects clearer image edges.
Platform: | Size: 165888 | Author: christine | Hits:

[Software Engineering] Permanentmagnetsynchronousmotordirecttorquecontrol

Description: Abstract: According to the mathematical model of the permanent magnet synchronous motor and the principle of direct torque control (DTC), a fully digital DTC system based on the TMS320F2812 DSP is proposed in this paper. First, the control system is modeled and simulated in MATLAB/Simulink, giving simulated waveforms of current, speed, torque, and flux linkage and showing that the system is designed correctly. Then the control system is studied experimentally. Keywords: direct torque control; permanent magnet synchronous motor; DSP
Platform: | Size: 189440 | Author: 张国辉 | Hits:

[Software Engineering] noveldirecttorquecontrolofpmsmbasedonexpectedvolta

Description: Abstract: To reduce the large torque and flux linkage ripple produced by conventional direct torque control (DTC) of permanent magnet synchronous motors (PMSM), this paper designs a DTC scheme based on a space vector modulation (SVM) strategy. The expected stator voltage is produced by SVM, and PI controllers replace the hysteresis comparators of conventional DTC. For stator flux observation, an estimation method based on rotor position and stator current is used. Experimental results show that, compared with conventional DTC, the proposed method estimates the stator flux reliably and effectively, reduces the ripple in electromagnetic torque and stator flux, reduces current and speed fluctuation, and has good dynamic and steady-state performance.
Platform: | Size: 501760 | Author: 张国辉 | Hits:

[Software Engineering] HI-TECH-dsPICC-V9-50

Description: Visual PIC is a programming assistant for PIC microcontrollers. It lets you configure the internal modules (such as timers, SCI, and CCP), so you no longer have to remember the setting values by hand. Once the configuration is complete, it can generate C source code automatically. The interface is friendly and easy to use. Microcontroller enthusiasts are welcome to suggest better features.
Platform: | Size: 46080 | Author: jt | Hits:

[Other] matlab_FEM_dismesh

Description: DistMesh giving a singular FEM matrix? Hi, anyone here with some experience of using DistMesh in finite element code? I'm solving the scalar Helmholtz equation in an annular region in 2D: rho_1 < rho < rho_2, where rho_1 is the radius of a Perfect Electric Conducting cylinder and rho_2 is where the mesh is truncated using an Absorbing Boundary condition. Now, I've written code to mesh the region myself (by dividing the region into annular rings and picking a fixed number of points on each ring) and I've written another program which uses DistMesh to mesh the region. I get a nice solution using my own meshing code but the FEM matrix becomes singular when I use DistMesh. This is the DistMesh code I used:
    % Circle with hole
    rABC=1.5
    rCyl=0.5
    fdstring=sprintf('ddiff(dcircle(p,0,0,%f),dcircle(p,0,0,%f))',rABC,rCyl)
    fd=inline(fdstring,'p')
    box=[-2,-2;2,2]
    [p,N]=distmesh2d(fd,@huniform,0.04,box,[])
I tried changing box and the 0.04 value (initial edge length). But every time, the matrix becomes sin
Platform: | Size: 37888 | Author: skypigr | Hits:

[SCM] Hi-Tech_PIC_C-compiler_v9.60

Description: Hitech PIC MCU IDE for C coding and programming.
Platform: | Size: 9601024 | Author: Mars | Hits:

[Mathimatics-Numerical algorithms] modified-stfd_esprit

Description: An algorithm for estimating the direction of arrival (DOA) of wideband chirp signals, based on ESPRIT using the modified spatial time-frequency distribution (STFD) matrix, is presented. The modified STFD matrix, which has a mathematical structure similar to that of the covariance matrix, can be obtained from the cross Wigner-Ville distributions of the array outputs. Under the condition of a uniform linear array, the modified STFD matrix can be transformed into a matrix that has the rotational invariance property. ESPRIT can then be applied to DOA estimation.
Platform: | Size: 145408 | Author: fjp119 | Hits:

[Embeded-SCM Develop] Hi-Tech-PICC-Compiler-8.01-PL3-P-Salvo-RTOS-221.r

Description: Salvo 2.2.0 for PIC
Platform: | Size: 22612992 | Author: bypass | Hits:

[IOS] packlog

Description: A simple iPhone logging tool that uses XML and SQLite; worth studying. A few highlights: Custom UITableViewCells built using IB. TouchXML usage (http://code.google.com/p/touchcode/). SQLitePersistentObject usage (http://code.google.com/p/sqlitepersistentobjects/) (incomplete). Custom URI schemes. Editable UITableViewCells. PinchMedia Analytics (http://pinchmedia.com – Hi Greg!)
Platform: | Size: 229376 | Author: lin | Hits:

[Exploit] Imminent-Monitor-v2.0.1.9-Cracked-P-Stub-Source.r

Description: HI BUDDS THIS IM2 CRACKED FOR YOU
Platform: | Size: 3077120 | Author: James | Hits:

[matlab] ?? ?n t?p cu?i n?m l?p 2 (1)

Description: Hi all, I want to work with Minipar parser but the home page:
Platform: | Size: 183296 | Author: linhd13cn1 | Hits:

CodeBus www.codebus.net