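[Original] Analyzing foreign forum software by how hard it is to scrape topic links from its pages
I've been doing a lot of scraping lately and have scraped all kinds of forum software. On top of that, a while ago I applied for hosting with a foreign provider whose TOS said: The PHP script Discuz is not allowed, because it is bad written.
In other words, Discuz is written so badly that it may not be used on their hosting.
Now I have some thoughts on this, which I'll illustrate with the page markup of Invision, SMF, phpBB, phpwind and Discuz.
This may get tedious, and these are only personal impressions, so feel free to skip it if you're not interested.
First, look at the following chunks of page code that define the topic links: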
Invision Power Board v2.1.7
<!-- End Topic Entry 494 --><!-- Begin Topic Entry 44 -->
<tr>
<td align="center" class="row5" id='tid-folder-44' onclick='return topic_toggle_folder("44");'><img src='style_images/luminous/f_hot.gif' border='0'alt='Hot topic' /></td>
<td align="center" class="row9">&nbsp;</td>
<td class="row1" valign="middle">
<div style='float:right'></div>
<div>
<a href='http://www.webextension.net/forums/index.php?showtopic=44&view=getnewpost'><img src='style_images/luminous/newpost.gif' border='0'alt='Goto last unread' title='Goto last unread' hspace=2></a> <span id='tid-span-44'><a id="tid-link-44" href="http://www.webextension.net/forums/index.php?showtopic=44" title="This topic was started: Jul 4 2006, 07:23 PM">If you had 3 million dollars what would you buy?</a></span> &nbsp;<a href="javascript:multi_page_jump('http://www.webextension.net/forums/index.php?showtopic=44', 34, 20 );" title="multipage jump"><img src='style_images/luminous/pages_icon.gif' alt='*' border='0' /></a> <span class="minipagelink"><a href="http://www.webextension.net/forums/index.php?showtopic=44&st=0">1</a></span><span class="minipagelink"><a href="http://www.webextension.net/forums/index.php?showtopic=44&st=20">2</a></span>
<div class="desc"><span onclick='return span_desc_to_input("44");' id='tid-desc-44'></span></div>
</div>
</td>
<td align='center' class="row2">
<a href="javascript:who_posted(44);">33</a> </td>
<td align="center" class="row2"><a href='http://www.webextension.net/forums/index.php?showuser=11'>Matt</a></td>
<td align="center" class="row2">128</td>
<td class="row1"><span class="lastaction">Today, 10:37 AM<br /><a href="http://www.webextension.net/forums/index.php?showtopic=44&view=getlastpost">Last post by:</a> <b><a href='http://www.webextension.net/forums/index.php?showuser=154'>DangerMouse</a></b></span></td><td class="row6"><!-- no content --></td>
</tr>
<!-- End Topic Entry 44 --><!-- Begin Topic Entry 492 -->
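From a scraper's point of view, IPB gives each title link its own hook, `id="tid-link-44"`, so the "Goto last unread" icon link and the mini page links in the same cell never get in the way. Below is a minimal Python sketch, assuming pages follow the v2.1.7 markup above; the regex, the `extract_ipb_topics` name and the trimmed `SAMPLE` are hypothetical, and a regex is only a quick stand-in for a real HTML parser.

```python
import re

# Trimmed, hypothetical excerpt of the IPB 2.1.7 topic-list cell shown above.
SAMPLE = """<a href='http://www.webextension.net/forums/index.php?showtopic=44&view=getnewpost'><img src='style_images/luminous/newpost.gif' alt='Goto last unread' /></a>
<span id='tid-span-44'><a id="tid-link-44" href="http://www.webextension.net/forums/index.php?showtopic=44" title="This topic was started: Jul 4 2006, 07:23 PM">If you had 3 million dollars what would you buy?</a></span>
<span class="minipagelink"><a href="http://www.webextension.net/forums/index.php?showtopic=44&st=20">2</a></span>"""

# Only the title anchor carries id="tid-link-<topic id>", so one pattern
# skips the "last unread" icon link and the page links.
IPB_TOPIC_LINK = re.compile(r'<a id="tid-link-(\d+)" href="([^"]+)"[^>]*>(.*?)</a>', re.S)


def extract_ipb_topics(html):
    """Return (topic id, URL, title) tuples for every title link found."""
    return IPB_TOPIC_LINK.findall(html)


if __name__ == "__main__":
    for tid, url, title in extract_ipb_topics(SAMPLE):
        print(tid, url, title)
```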
SMF 1.1 RC2
<tr>
<td class="windowbg2" valign="middle" align="center" width="5%">
<img src="http://www.finespot.net/chinese/Themes/default/images/topic/normal_post.gif" alt="" />
</td>
<td class="windowbg2" valign="middle" align="center" width="4%">
<img src="http://www.finespot.net/chinese/Themes/default/images/post/xx.gif" alt="" />
</td>
<td class="windowbg" valign="middle" >
<span id="msg_201"><a href="http://www.finespot.net/chinese/index.php?topic=63.0">申请空间!!!</a></span>
<small id="pages201"></small>
</td>
<td class="windowbg2" valign="middle" width="14%">
<a href="http://www.finespot.net/chinese/index.php?action=profile;u=42" title="观看会员资料 会员: nappy2007">nappy2007</a>
</td>
<td class="windowbg" valign="middle" width="4%" align="center">
2
</td>
<td class="windowbg" valign="middle" width="4%" align="center">
34
</td>
<td class="windowbg2" valign="middle" width="22%">
<a href="http://www.finespot.net/chinese/index.php?topic=63.msg294#new"><img src="http://www.finespot.net/chinese/Themes/default/images/icons/last_post.gif" alt="最后回复" title="最后回复" style="float: right;" /></a>
<span class="smalltext">
八月 15, 2006, 10:18:29 pm<br />
作者 <a href="http://www.finespot.net/chinese/index.php?action=profile;u=42">nappy2007</a>
</span>
</td>
</tr>
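SMF is similarly easy to scrape: the title link sits alone inside `<span id="msg_201">`, and the status icons live in their own cells. A minimal sketch under the same caveats (hypothetical pattern, function and sample, based only on the 1.1 RC2 markup above). The span number (201) is not the topic id (63 in `?topic=63.0`), so the sketch reads both.

```python
import re

# Hypothetical excerpt of the SMF 1.1 RC2 title cell shown above.
SAMPLE = """<td class="windowbg" valign="middle" >
<span id="msg_201"><a href="http://www.finespot.net/chinese/index.php?topic=63.0">申请空间!!!</a></span>
<small id="pages201"></small>
</td>"""

# Groups: span number, full URL, topic id (nested inside the URL), title.
SMF_TOPIC_LINK = re.compile(
    r'<span id="msg_(\d+)"><a href="([^"]*[?;&]topic=(\d+)\.\d+)">(.*?)</a></span>', re.S
)


def extract_smf_topics(html):
    """Return (span number, topic id, URL, title) tuples."""
    return [
        (msg, topic, url, title)
        for msg, url, topic, title in SMF_TOPIC_LINK.findall(html)
    ]


if __name__ == "__main__":
    for row in extract_smf_topics(SAMPLE):
        print(row)
```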
phpBB
<tr>
<td class="row1" align="center" valign="middle" width="20"><img src="templates/subSilver/images/folder_new.gif" width="19" height="18" alt="New posts" title="New posts" /></td>
<td class="row1" width="100%"><span class="topictitle"><a href="vt-48124.html&view=newest"><img src="templates/subSilver/images/icon_newest_reply.gif" alt="View newest post" title="View newest post" border="0" /></a> <a href="vt-48124.html" class="topictitle">What famous do you want to come back to life?</a></span><span class="gensmall"><br />
</span></td>
<td class="row2" align="center" valign="middle"><span class="postdetails">19</span></td>
<td class="row3" align="center" valign="middle"><span class="name"><a href="profile.php?mode=viewprofile&u=9643">jaime</a></span></td>
<td class="row2" align="center" valign="middle"><span class="postdetails">76</span></td>
<td class="row3Right" align="center" valign="middle" nowrap="nowrap"><span class="postdetails">Thu Aug 17, 2006 12:52 pm<br /><a href="profile.php?mode=viewprofile&u=9852">Vrythramax</a> <a href="vp-398286.html#398286"><img src="templates/subSilver/images/icon_latest_reply.gif" alt="View latest post" title="View latest post" border="0" /></a></span></td>
</tr>
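In phpBB the title anchor is the only one with `class="topictitle"`. The "View newest post" icon link in the same cell has no class, so it drops out on its own. A minimal sketch with hypothetical names; it also assumes that the `vt-N.html` URLs in this sample come from a URL-rewriting setup rather than stock phpBB's `viewtopic.php?t=N`, and falls back to `None` when no id can be read.

```python
import re

# Hypothetical excerpt of the phpBB (subSilver) title cell shown above.
SAMPLE = """<td class="row1" width="100%"><span class="topictitle"><a href="vt-48124.html&view=newest"><img src="templates/subSilver/images/icon_newest_reply.gif" alt="View newest post" title="View newest post" border="0" /></a> <a href="vt-48124.html" class="topictitle">What famous do you want to come back to life?</a></span><span class="gensmall"><br />
</span></td>"""

# Only the title anchor carries class="topictitle".
PHPBB_TOPIC_LINK = re.compile(r'<a href="([^"]+)" class="topictitle">(.*?)</a>', re.S)
# Assumption: rewritten topic URLs of the form vt-<topic id>.html.
TOPIC_ID = re.compile(r"vt-(\d+)\.html")


def extract_phpbb_topics(html):
    """Return (topic id or None, URL, title) tuples."""
    topics = []
    for url, title in PHPBB_TOPIC_LINK.findall(html):
        match = TOPIC_ID.search(url)
        topics.append((match.group(1) if match else None, url, title))
    return topics


if __name__ == "__main__":
    print(extract_phpbb_topics(SAMPLE))
```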
PHPWind v4.3.2
<tr align=center class=t_two>
<td><a title='打开新窗口' href="read-htm-tid-237810.html" target=_blank><img src='images/wind/thread/topichot.gif' border=0></a></td>
<td class=t_one onMouseOver="this.className='t_two'" onMouseOut="this.className='t_one'" align=left style="padding-left:8px" id=''>
<img src='images/wind/file/headtopic_3.gif' alt='置顶帖标志'>
<a href="read-htm-tid-237810.html" id=''><b><font color=orange>“PHPWind激情夏日”系列活动拉开帷幕,数码相机等你拿</font></b></a> <img src='images/wind/file/img.gif' align='absbottom' border=0>
[ <img src='images/wind/file/multipage.gif' border=0><span style='font-size:7pt;font-family:verdana;'> <a href="read-htm-tid-237810-page-1-fpage-1.html">1</a> <a href="read-htm-tid-237810-page-2-fpage-1.html">2</a> <a href="read-htm-tid-237810-page-3-fpage-1.html">3</a> <a href="read-htm-tid-237810-page-4-fpage-1.html">4</a> <a href="read-htm-tid-237810-page-5-fpage-1.html">5</a> .. <a href="read-htm-tid-237810-page-21-fpage-1.html">21</a></span> ]
</div>
</td>
<td class=smalltxt>
<a href="profile-htm-action-show-uid-617.html">甜橙</a>
<br>2006-07-31</td>
<td class=t_one>200</td>
<td class=t_one>10485</td>
<td class=smalltxt>&nbsp;
<a href="read-htm-tid-237810-page-e-fpage-1.html#a">
2006-08-16 16:51
</a><br>
by: lele521</td></tr>
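PHPWind gives the title no distinctive class or id value, so it has to be told apart from the folder-icon link and the page links by its exact shape: `read-htm-tid-N.html` followed directly by `id=''`. Highlighted titles are also wrapped in `<b><font>`, which has to be stripped. A minimal sketch assuming the v4.3.2 markup above (pattern, names and sample are hypothetical).

```python
import re

# Hypothetical excerpt of the PHPWind 4.3.2 row shown above.
SAMPLE = """<td><a title='打开新窗口' href="read-htm-tid-237810.html" target=_blank><img src='images/wind/thread/topichot.gif' border=0></a></td>
<td class=t_one align=left style="padding-left:8px" id=''>
<img src='images/wind/file/headtopic_3.gif' alt='置顶帖标志'>
<a href="read-htm-tid-237810.html" id=''><b><font color=orange>“PHPWind激情夏日”系列活动拉开帷幕,数码相机等你拿</font></b></a> <img src='images/wind/file/img.gif' align='absbottom' border=0>
[ <a href="read-htm-tid-237810-page-1-fpage-1.html">1</a> <a href="read-htm-tid-237810-page-2-fpage-1.html">2</a> ]
</td>"""

# The folder link starts with title=... and page links add -page-N,
# so neither matches this exact shape.
PHPWIND_TOPIC_LINK = re.compile(r"""<a href="read-htm-tid-(\d+)\.html" id=''>(.*?)</a>""", re.S)
TAG = re.compile(r"<[^>]+>")


def extract_phpwind_topics(html):
    """Return (topic id, plain-text title) tuples, dropping <b>/<font> styling."""
    return [
        (tid, TAG.sub("", title).strip())
        for tid, title in PHPWIND_TOPIC_LINK.findall(html)
    ]


if __name__ == "__main__":
    print(extract_phpwind_topics(SAMPLE))
```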
Discuz! 5.0.0 RC1
<table cellspacing="1" class="f_row">
<tr>
<td class="f_folder"><a href="viewthread.php?tid=16724707" target="_blank"><img src="images/jgwy/hot_folder.gif" border="0" alt="" /></a></td>
<td class="f_icon">&nbsp;</td>
<td class="altbg2" onMouseOver="this.className='altbg1'" onMouseOut="this.className='altbg2'">
<img src="images/jgwy/digest.gif" class="absmiddle" alt="" /> 精华<b>
II</b>:&nbsp;
<a href="viewthread.php?tid=16724707&extra=page%3D1" style="font-weight: bold;color: blue">给做网站的新手们建议一条走的路子</a>
&nbsp; &nbsp;( <img src="images/jgwy/multipage.gif"border="0" alt="" /> <a href="viewthread.php?tid=16724707&extra=page%3D1&page=1">1</a> <a href="viewthread.php?tid=16724707&extra=page%3D1&page=2">2</a> <a href="viewthread.php?tid=16724707&extra=page%3D1&page=3">3</a> <a href="viewthread.php?tid=16724707&extra=page%3D1&page=4">4</a> <a href="viewthread.php?tid=16724707&extra=page%3D1&page=5">5</a> <a href="viewthread.php?tid=16724707&extra=page%3D1&page=6">6</a>.. <a href="viewthread.php?tid=16724707&page=7&extra=page%3D1">7</a> )</td><td class="f_author">
<a href="viewpro.php?uid=21476">cnrain</a>
<br><span class="smalltxt">2006-2-10</span></td>
<td class="f_replies">103</td>
<td class="f_views">6121</td>
<td class="f_last" nowrap>2006-8-10 00:41<br>by
<a href="viewpro.php?username=janglit">janglit</a>
&nbsp;<a href="redirect.php?tid=16724707&goto=lastpost#lastpost"><img src="images/jgwy/lastpost.gif" border="0" class="absmiddle" alt="" /></a>
</td></tr></table>
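At first glance Discuz's markup is the tidiest of the lot, which is one real improvement in version 5.0. But when I actually came to scrape the links, Discuz turned out to be the most troublesome. As the code shows, all kinds of icons and text, such as the "Pinned in this forum:" label, the digest (精华) icon and the "Attachment" marker, sit together with the title and its styling inside a single <td></td>. phpwind does the same. The three foreign forum programs, by contrast, put the text, the text styling, the icons, the topic status and so on in separate blocks, and SMF even declares all of these statuses at the start of the topic-link markup, ahead of the link itself.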
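To make the Discuz problem concrete: the title link has no id or class of its own, and the digest label, icons and page links all share its cell, so a scraper has to key on the exact query string. A minimal sketch assuming the 5.0.0 RC1 markup above (pattern, names and sample are hypothetical). It only accepts `viewthread.php?tid=N&extra=page%3D<n>` with nothing after the page number, which excludes the folder link, the page links and the "精华 II:" prefix; the downside is that any template change that reorders those parameters would break it.

```python
import re

# Hypothetical excerpt of the Discuz! 5.0.0 RC1 row shown above.
SAMPLE = """<td class="f_folder"><a href="viewthread.php?tid=16724707" target="_blank"><img src="images/jgwy/hot_folder.gif" border="0" alt="" /></a></td>
<td class="altbg2">
<img src="images/jgwy/digest.gif" class="absmiddle" alt="" /> 精华<b>
II</b>:&nbsp;
<a href="viewthread.php?tid=16724707&extra=page%3D1" style="font-weight: bold;color: blue">给做网站的新手们建议一条走的路子</a>
&nbsp; &nbsp;( <a href="viewthread.php?tid=16724707&extra=page%3D1&page=1">1</a> <a href="viewthread.php?tid=16724707&page=7&extra=page%3D1">7</a> )</td>"""

# tid must be followed directly by &extra=page%3D<n> and the closing quote:
# the folder link has no &extra, and page links have &page before or after it.
DISCUZ_TOPIC_LINK = re.compile(
    r'<a href="viewthread\.php\?tid=(\d+)&extra=page%3D\d+"[^>]*>(.*?)</a>', re.S
)
TAG = re.compile(r"<[^>]+>")


def extract_discuz_topics(html):
    """Return (topic id, plain-text title) tuples."""
    return [
        (tid, TAG.sub("", title).strip())
        for tid, title in DISCUZ_TOPIC_LINK.findall(html)
    ]


if __name__ == "__main__":
    print(extract_discuz_topics(SAMPLE))
```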
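That is the observation. To be fair, the two Chinese forums do this to make the interface friendlier, which is nothing bad, and in the end it is only HTML generated by PHP. Right, I think that part is fine. But it prompted me to look at their PHP source. I'm no PHP expert and maybe I have no standing to judge, but what I found is that the PHP of Invision, SMF and phpBB is as cleanly modular as the HTML it generates, while the latter two, like their HTML, are what I'd politely call "not very clearly separated". It feels as if one piece of code was written and then another feature's code was hastily pasted into it. That's the impression I got.
Still, this markup is only the tip of the iceberg of each program and may not prove anything. Experts, please just treat this as me grumbling...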
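[ Last edited by poison on 2006-8-17 21:45 ]
It only has to do with the templates; it has little to do with the code.
P.S.: dz5 is the worst version ever, the garbage of garbage.
I'm still prejudiced against scraping, so I won't comment.
Filler post.
dz5 has a lot of features that are more bother than they're worth.
D5... sigh...
Originally posted by winsock on 2006-8-17 21:10:
It only has to do with the templates; it has little to do with the code.
P.S.: dz5 is the worst version ever, the garbage of garbage.
Mm... I know. That's why I said its code turned out to be just as odd as the HTML it generates...
Programmers :D:D:D
How come nobody has praised the OP?
I still haven't quite figured it out, but it seems pretty good.
Patched-together code?
Developers with weak fundamentals?
Self-taught latecomers?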