orbiter
fffb91447a
fixed crawl queue delete function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7357 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b769cce433
- added a catch-all parser for all documents that cannot be parsed: they contribute only their document url to the search index
- enhanced the pdf and torrent parser: better document titles
- enhanced the ftp client: longer time-out
- fixed bugs in json for search results
- enhanced yacyinteractive.html: added a file type navigator and a download-script generator for search result files
Please have a look at yacyinteractive.html: this will become the hacker-download tool for 27c3!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7355 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
22453b13ad
implemented local host address discovery as posted in
http://forum.yacy-websuche.de/viewtopic.php?p=21310#p21310
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7351 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
cc6499bf8d
- added http://blekko.com as a search heuristic (like scroogle). This was easy since they also deliver their search results as an RSS feed
- renamed YaCy's search result modifier keywords RECENT, NEAR and language: to follow the blekko slashtag naming scheme. YaCy now supports the following blekko-like built-in slashtags:
/date - for search results ordered by date (most recent first)
/near - for search results where the search words appear near each other (closest first)
/language/<lang> - for sorting by language, where the wanted language comes first. Example: /language/de
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7350 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
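The slashtags introduced in cc6499bf8d can be illustrated with a small sketch that splits a query string into plain search words and modifiers. This is only an assumption about how such parsing might look; the class name SlashtagQuery and its fields are hypothetical and are not taken from the YaCy sources.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not YaCy's actual implementation.
final class SlashtagQuery {
    final String terms;     // search words with all slashtags removed
    final boolean byDate;   // true if /date was given
    final boolean near;     // true if /near was given
    final String language;  // value of /language/<lang>, or null if absent

    private SlashtagQuery(String terms, boolean byDate, boolean near, String language) {
        this.terms = terms;
        this.byDate = byDate;
        this.near = near;
        this.language = language;
    }

    static SlashtagQuery parse(String query) {
        final String langPrefix = "/language/";
        List<String> words = new ArrayList<String>();
        boolean byDate = false;
        boolean near = false;
        String language = null;
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("/date")) {
                byDate = true;
            } else if (token.equals("/near")) {
                near = true;
            } else if (token.startsWith(langPrefix) && token.length() > langPrefix.length()) {
                language = token.substring(langPrefix.length());
            } else if (token.length() > 0) {
                words.add(token); // ordinary search word
            }
        }
        // Re-join the remaining words with single spaces.
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(w);
        }
        return new SlashtagQuery(sb.toString(), byDate, near, language);
    }
}
```

With this sketch, SlashtagQuery.parse("yacy peer /near /language/de") yields the terms "yacy peer", near set, and language "de".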
orbiter
a9f754c45f
removed unused CR accumulation and distribution process
this was never used or extended in recent years. The resulting YBR ranking criterion
is still a good idea and will be used in the future. Possible generation methods for YBR
ranking are:
- "trust-rank" using the link structure as can be discovered in a single crawl (idea from FSCONS)
- "block-rank" calculated from the local link structure
- a distributed "block-rank" using the xml API to the link structure from other peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7349 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d4a1a1850b
removed warnings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7347 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
3b5830b7d4
*) Fixed typo.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7346 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
9b3fae9496
*) cleaning up the code a little bit
*) program to interface, not implementation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7345 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
7bb4b001ed
- view image files from cache
- fixed generic header settings; affects CORS functionality
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7344 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
e7552bd719
*) cleaning up the code a little bit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7343 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
737aaf6952
various small changes to ymarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7339 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
8a50670546
some code clean up for the last post
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7338 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
442497868d
another step towards an auto tagging function for YMarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7337 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
741a87a3e9
* make .yacy-domains crawlable (.yacy-domains are local domains, so only in custom networks/peers)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7334 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
dca9e16f51
* don't index redirecting pages twice
* therefore auto-redirection of HTTPClient is disabled for crawling and the old code is reactivated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7332 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
eb79b952ef
*) cleaner code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7331 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
38fdf43587
*) renamed classes according to standard Java coding conventions
*) String.isEmpty() was introduced in Java 1.6, but we still use Java 1.5
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7330 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
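The String.isEmpty() note in 38fdf43587 comes down to replacing the Java 1.6 method with a length check. A minimal sketch, assuming a hypothetical helper class StringCompat (the actual replacement in the YaCy sources may simply be an inline length check):

```java
// Hypothetical helper for illustration only; not taken from the YaCy sources.
final class StringCompat {
    private StringCompat() {}

    // Java 1.5-compatible equivalent of String.isEmpty(), which was added in Java 1.6.
    // Like the original, this throws a NullPointerException for a null argument.
    static boolean isEmpty(String s) {
        return s.length() == 0;
    }
}
```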
low012
025e3f4790
*) renamed classes according to standard Java coding conventions
*) removed unused code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7328 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
3b9aa0504e
*) removed unused code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7327 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
db3db0fdb9
*) trying to make this class less confusing (probably failing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7326 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
54e63b556e
intermediate step for a YMark auto-tagging function based on word frequencies.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7325 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
403ee9c014
added a drill-down for metadata and word count to /api/ymarks/test_treeview.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7324 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
11ae5b108e
enabled rebuildIndex for /Table_YMark_p.html (rebuilds the tags and folders index)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7320 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
94a9be18a4
added a ymark table administration: /Table_YMark_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7316 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
25339f93c7
more updates to ymarks
- working xbel import/export
- exported xbel includes yacy specific metadata but still validates against PUBLIC DTD
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7315 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
cdd65aca71
update to ymarks
- get_xbel.xml is almost working
- started ymark api documentation info.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7313 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
808edffaf6
ymarks
- some refactoring
- working xbel and html import (/api/ymarks/test_import.html)
- working treeview (/api/ymarks/test_treeview.html)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7312 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
2c539b514a
* add domaincheck (local/global/domainlist) to urlcleaner
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7311 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
117fc86b3d
fix for http://forum.yacy-websuche.de/viewtopic.php?p=21199#p21199
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7308 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
09badc697b
- low-memory patch for crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7304 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
becc463d8a
enhanced did-you-mean
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7300 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
43586a2ace
an update to ymarks (please test if you wish):
- import HTML (e.g. FF export) via /api/ymarks/import.html
- view your import via /api/ymarks/test.html
- get an xml list via /api/ymarks/get_ymark_list.xml?tags=&folders=
- delete bookmark tables via standard interface /Tables_p.html
it is still very experimental!!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7299 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
93c535d111
fixed http://forum.yacy-websuche.de/viewtopic.php?p=21113#p21113
fixed a concurrent modification exception during search and a time-out problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7298 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
4c72885cba
added a sitemap entry parser and loader for sitemaps
(a recursion if a sitemap refers to another sitemap)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7295 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
790e0b1894
- enhanced index deletion in IndexControlRWIs_p: delete also robots.txt database and cache if demanded
- added option for details of deletion
- added deletion to new ConfigHTCache_p servlet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7294 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
f5324b27f2
more updates to the new bookmarks (ymarks)....
- split YMarkTables and YMarkIndex in two different classes
- HTML import is working properly
- XBEL import is still broken
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7292 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
445619f3ec
added a submenu ConfigHTCache_p.html to set the size of the HTCache separately from the proxy configuration.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7291 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
acd93b1b31
* add failsafe mechanism to domainlist retrieval
the domainlist is saved locally; if none of the given urls in network.unit.domainlist
could be retrieved, the file from the last boot is used instead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7289 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
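The failsafe described in acd93b1b31 can be sketched as "try each url, cache the first success, fall back to the cache". The class DomainListLoader, the Fetcher interface and the cache-file handling below are assumptions made for illustration; they are not YaCy's actual classes, and the sketch uses java.nio.file, which needs a newer JDK than the Java 1.5 mentioned elsewhere in this log.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch for illustration only; not YaCy's actual implementation.
final class DomainListLoader {
    // Assumed abstraction over "download the content behind this url".
    interface Fetcher {
        String fetch(String url) throws IOException;
    }

    // Returns the domainlist from the first url that can be retrieved and saves it
    // to cacheFile. If no url works, the copy saved by an earlier run is returned;
    // an IOException is thrown if that copy does not exist either.
    static String load(List<String> urls, Fetcher fetcher, Path cacheFile) throws IOException {
        for (String url : urls) {
            String list;
            try {
                list = fetcher.fetch(url);
            } catch (IOException e) {
                continue; // this source failed, try the next one
            }
            Files.write(cacheFile, list.getBytes(StandardCharsets.UTF_8));
            return list;
        }
        // None of the urls could be retrieved: use the file from the last successful run.
        return new String(Files.readAllBytes(cacheFile), StandardCharsets.UTF_8);
    }
}
```

Keeping the download behind a small interface makes the fallback path easy to exercise without network access.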
orbiter
70c95608d4
Added CORS Access header for yacysearch.rss output
used some of the recommendations from Copro:
http://forum.yacy-websuche.de/viewtopic.php?p=21015#p21015
Original Request:
http://forum.yacy-websuche.de/viewtopic.php?p=20829#p20829
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7288 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
lotus
18729351e7
upnp: hint for wrongly detected local ip address
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7286 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
def4253555
* add option to network definition to provide a domainlist (syntax like in blacklists)
* crawler and search allow only urls matching one in domainlist (if list is provided)
* this may be useful to prevent dedicated networks from being "polluted"
* FilterEngine is an improved Blacklist object; Blacklist may inherit from FilterEngine in the future
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7285 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
ac6b503adf
untar files without gzip decompression even if the file has a gz extension. This is done when the decompression fails.
Already-decompressed files with a gz extension may appear if the server sets a gzip compression header
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7282 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
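The fallback from ac6b503adf, "try gzip decompression, and use the bytes as they are if that fails", might look roughly like the sketch below. The class GzipFallback and the method gunzipOrRaw are hypothetical names chosen for illustration; only java.util.zip.GZIPInputStream from the standard library is used, and the untar step itself is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch for illustration only; not YaCy's actual implementation.
final class GzipFallback {
    private GzipFallback() {}

    // Returns the gunzipped content, or the unchanged input if it is not valid gzip data
    // (for example because the transport layer has already decompressed it).
    static byte[] gunzipOrRaw(byte[] content) {
        try {
            InputStream in = new GZIPInputStream(new ByteArrayInputStream(content));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            in.close();
            return out.toByteArray();
        } catch (IOException e) {
            // Decompression failed: hand the bytes to the tar reader as they are.
            return content;
        }
    }
}
```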
apfelmaennchen
efe0667fdd
more new bookmark (ymarks) code with experimental html and xbel import
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7281 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
mikeworks
caabebf9be
Fixed spelling mistake omiting -> omitting in debug messages in ConfigUpdate_p.java and Switchboard.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7280 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
155d556568
- better memory protection
- more logging
- little bit of refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7278 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
7d8de34778
* add a bit of documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7276 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
25a8e55bc9
more logging about bad seeds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7275 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
959b8c6fa0
- allow greater seed size
- more logging for bad seeds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7274 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e103419a56
- removed <3 peers barrier for peer ping feedback
- more logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7273 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
apfelmaennchen
d0e6c03b51
some updates to the new bookmark code...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7272 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago