orbiter
bfe51c7228
added generation of domain-list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1112 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
0ec54d9c5f
enhanced CR-file handling and added first RCI-evaluation tests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1110 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
c2fe3a1670
*) Updating jMimeMagic Ruleset
- to detect some specially formatted html documents correctly
- adding rule to detect vCards
*) plasmaParser now supports parsing of files that have a supported fileExtension
but an unsupported mimeType because the webserver has set it incorrectly to text/plain
*) Adding new vCard Parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1107 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
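A minimal sketch of the fallback described in the plasmaParser entry above: when the server sends an unsupported MIME type such as text/plain, try the file extension instead. The class name, method name and the supported-type tables are hypothetical and only illustrate the idea; they are not plasmaParser's actual API.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: choose a parser by MIME type, falling back to the file extension.
class ParserSelection {
    // Illustrative tables only; YaCy's real lists of supported types differ.
    static final Set<String> SUPPORTED_MIME = Set.of("text/html", "application/pdf", "text/x-vcard");
    static final Set<String> SUPPORTED_EXT = Set.of("html", "htm", "pdf", "vcf");

    /** Returns "mime:<type>", "ext:<extension>", or null if neither is supported. */
    static String select(String mimeType, String fileExt) {
        String mime = mimeType == null ? "" : mimeType.toLowerCase(Locale.ROOT);
        int semi = mime.indexOf(';');                  // drop parameters like "; charset=..."
        if (semi >= 0) mime = mime.substring(0, semi);
        mime = mime.trim();
        if (SUPPORTED_MIME.contains(mime)) return "mime:" + mime;
        // Servers often mislabel files as text/plain, so trust a supported extension
        String ext = fileExt == null ? "" : fileExt.toLowerCase(Locale.ROOT);
        if (SUPPORTED_EXT.contains(ext)) return "ext:" + ext;
        return null;
    }
}
```

The MIME type is still checked first, so a correctly labelled document is never reinterpreted because of its name.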
orbiter
88e3234393
fine-tuning of rci-generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1105 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
a12759c1bf
first try to implement a rci-computation from cr-files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1103 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
4a8e8f269e
refactoring of cr-processing; new kelondro class to handle the attribute file format
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1100 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
24dc0e0760
implemented cr-file processing and further transmission steps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1099 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
9d9a87f445
limited htcache storage length
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1096 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
d0dfccdb77
*) Making CrawlStacker pool configurable via GUI and config file
See: http://www.yacy-forum.de/viewtopic.php?t=1448
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1087 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
3631cb1f6d
*) deleting empty entities during index selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1086 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
ca26aab9b1
*) More debugging output for migrateWords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1085 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
9b35ae9027
*) Correcting wrong % values on IndexTransfer_p page
See: http://www.yacy-forum.de/viewtopic.php?p=12646
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1084 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
e6bf9d90a5
*) Fixing Problems with MalformedURLs during Word Selection
- removing (lurl.toString() == null) comparison because toString() is never null
- adding (lurl.url() == null) condition because url() is null if we have selected a word entry with
a malformed URL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1083 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
86a9210264
*) indexing queue slots are now configurable via config file
See: http://www.yacy-forum.de/viewtopic.php?t=1480
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1081 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
3c11d7b81c
*) Bugfix for minimizeUrlDB
- function didn't work correctly because of new url hash structure
See: http://www.yacy-forum.de/viewtopic.php?p=12753#12753
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1080 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
orbiter
9913049009
fixed outOfMemory bug caused by loops in kelondroTree during enumeration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1079 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
bbb936b9ea
*) Bugfix for non-human-readable content of PDFs while viewing the URL Content via GUI
- This Bug also affects the snippet generation on non html/text documents
See: http://www.yacy-forum.de/viewtopic.php?t=1472
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1075 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
445e3a620f
*) Avoid rejection of html content by the crawler when the file extension is not set properly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1074 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
theli
444a5a9368
*) Bugfix for Entries with null url in GlobalQueue
See: http://www.yacy-forum.de/viewtopic.php?p=12675#12675
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1069 6c8d7289-2bf4-0310-a012-ef5d649a1542
19 years ago
borg-0300
ebac51df52
restore defaultRemoteProfile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1063 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
5778428455
move cutUrlText to nxTools,
max length of URLs (titles) on the search page is now 120 chars
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1060 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
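The log does not show how cutUrlText shortens a URL. A minimal sketch, assuming it keeps the beginning and the end of the text and replaces the middle with "..."; UrlText and this signature are hypothetical, not the nxTools API.

```java
// Hypothetical sketch; the real nxTools.cutUrlText may cut differently.
class UrlText {
    /** Shortens text to at most maxLength chars by replacing its middle with "...". */
    static String cutUrlText(String text, int maxLength) {
        if (text == null || text.length() <= maxLength) return text;
        if (maxLength <= 3) return text.substring(0, Math.max(0, maxLength));
        int keep = maxLength - 3;      // characters left for head + tail
        int head = (keep + 1) / 2;     // head gets the extra char when keep is odd
        int tail = keep - head;
        return text.substring(0, head) + "..." + text.substring(text.length() - tail);
    }
}
```

With the limit from the entry above, a search page would call something like cutUrlText(url, 120). Keeping both ends preserves the host and the file name, which are usually the most informative parts of a URL.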
borg-0300
9158845c3b
bugfix for snippet text null bytes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1059 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
f763923e0a
added missing files for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1057 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
79818a320f
introduced citation-rank transmission protocol and activated transport for anonymisation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1055 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
7e0647f692
*) Bugfix for userDB usage during authentication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1052 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
02f8013013
auto-delete of corrupted word files during word-migration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1047 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
d2731418bf
added creation of global ranking files and changed url normal form usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1046 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
6f9f8ed8f8
*) Automatic Reset of Stack Crawler DB on startup errors
See: http://www.yacy-forum.de/viewtopic.php?t=1432
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1045 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
fb766413d1
*) Changes to httpc dns caching
- Bugfix: old dns cache did not handle case-insensitive hostnames correctly.
- adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache
e.g. borg-300.dyndns.org
This can be done by setting the new httpc.nameCacheNoCachingPatterns property
- using httpc.dnsResolve wherever possible within the sourcecode
[httpd.java,plasmaCrawlStacker.java]
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
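A rough sketch of the two DNS cache changes above: hostnames are lower-cased before lookup, and names matching a no-caching pattern are resolved every time. It assumes httpc.nameCacheNoCachingPatterns holds comma-separated regular expressions; the class and the injected resolver function are hypothetical, not httpc's real code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical sketch of a hostname cache with case-insensitive keys and no-cache patterns.
class NameCache {
    private final Map<String, String> cache = new HashMap<>();
    private final List<Pattern> noCaching = new ArrayList<>();
    private final Function<String, String> resolver; // injected lookup, e.g. wrapping InetAddress

    NameCache(Function<String, String> resolver, String noCachingPatterns) {
        this.resolver = resolver;
        // Assumption: the property value is a comma-separated list of regular expressions
        if (noCachingPatterns != null) {
            for (String p : noCachingPatterns.split(",")) {
                if (!p.isBlank()) noCaching.add(Pattern.compile(p.trim(), Pattern.CASE_INSENSITIVE));
            }
        }
    }

    synchronized String resolve(String host) {
        String key = host.toLowerCase(Locale.ROOT); // hostnames are case-insensitive
        String cached = cache.get(key);
        if (cached != null) return cached;
        String address = resolver.apply(key);
        // Dynamic-DNS names (e.g. *.dyndns.org) change often, so never cache them
        boolean cacheable = noCaching.stream().noneMatch(p -> p.matcher(key).matches());
        if (address != null && cacheable) cache.put(key, address);
        return address;
    }
}
```

Injecting the resolver keeps the cache logic testable without network access.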
orbiter
bc420c62f6
fixed htcache path generation (never change a running system)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1041 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
dd24f0252f
*) Searchword highlighting for info page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1036 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
72cde1d894
getCachePath: no logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1033 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
1fbd72f9e0
rename "index.html" to "ndx"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1032 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
cd1107d85e
added support for URLs with '?&'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1030 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
5fb2b017cb
small change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1029 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
544e4ea90e
small change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1027 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
00ab4d8723
cleaned, small change, Properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1026 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
b8ceb1ffde
*) Adding better https support for crawler
- solving problems with unknown certificates by implementing a dummy trust manager
- adding https support to robots-parser
- Seed File can now be downloaded from https resources
- adapting plasmaHTCache.java to support https URLs properly
*) URL Normalization
- sub URLs are now normalized properly during indexing
- pointing urlNormalForm function of plasmaParser to htmlFilterContentScraper function
- normalizing URLs which were received by a crawlOrder request
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1024 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
e3179a6394
added getOwnSeedFile()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1022 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
a803a509ae
bugfix: port handling in HTCache
program flow cleaned up
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1021 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
hydrox
cb69047b91
*) cleanup of access to static methods and fields
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1016 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
hydrox
56b9f34411
*) removed unused imports
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1015 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
5f68b6886b
introduced new url-hashes for better ranking computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1013 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
aadace1285
fixed network image in search performance monitor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1012 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
bb369c98de
fixed search result ordering by date
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1011 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
b058ecf0bc
refactoring of image-generation; added experimental PNG encoder (not active now)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1008 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
d42531e1b2
added auto-reset for NURL-DBs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1004 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
allo
92c49b406b
adminAuth with userDB and adminAuthenticated (fix for statuspage)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1001 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
rramthun
27f180f24b
Update of YaWoStat to 0.2.
Now does not try to make 400000! operations to load a 4MB textfile :-/
Program is not finished yet.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1000 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
d656e2b433
added a memory-profile chart generation to database performance testing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@993 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
ec3af327f7
*) Bugfix for Proxy-Authentication against remote proxy
See: http://www.yacy-forum.de/viewtopic.php?p=11804#11804
*) Adding first version of db test for mysql
NOTES:
- db user + db + db table must be created before starting the test
- db table must be empty. Entries can not be updated at the moment
- db connection properties must be changed in the sourcecode at the moment
TODOs:
- accepting connection properties via command line
- implementing update + remove + read operations
- 'maybe' adding code to create db + table if they don't exist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@991 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
5b0911d7ea
added new performance menu for search sequence configuration and monitoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@990 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
allo
ada06b0674
bugfix for Networkimage from Hydrox
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@986 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
1aa4ba8b62
added post-search filtering of redundant urls (longer than existing cited)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@982 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
8d827cdb30
tried to fix problems with order of network list by last-seen (which could also improve the network picture)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@980 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
097009d910
experimental visualization of DHT access during global search (temporary)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@977 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
4dcbc26ef1
introduction of search profiles; very experimental
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@976 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
6c48c3ce39
*) Bugfix for ArithmeticException during IndexTransfer
See: http://www.yacy-forum.de/viewtopic.php?t=1362
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@974 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
525c8dcbd4
*) Adding Traffic Statistic for Crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@972 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
9a5ab62928
*) Adding yacy specific X-YACY-Index-Control header which can be used by clients
to prevent yacy from indexing the response that belongs to the request where
X-YACY-Index-Control is set to "no-index"
*) Bugfix for Seed-List download via Remote Proxy.
Now the pragma and cache-control http headers of the request are properly set to "no-cache"
See: http://www.yacy-forum.de/viewtopic.php?p=11639#11639
*) Bugfix for http-Proxy
yacy ignored "no-cache" pragma and cache-control http headers that were sent in requests.
Now, these request headers are evaluated properly
TODO: Missing evaluation of "no-store" request headers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@971 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
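A sketch of the request-header checks described in this entry, under the assumption that headers are available as a case-insensitive map. RequestCachePolicy and its methods are hypothetical. The no-cache test is a simple substring match, and no-store is not evaluated, in line with the TODO above.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the X-YACY-Index-Control and no-cache request checks.
class RequestCachePolicy {
    // HTTP header names are case-insensitive, so use a case-insensitive map
    static Map<String, String> headers(String... kv) {
        Map<String, String> m = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int i = 0; i + 1 < kv.length; i += 2) m.put(kv[i], kv[i + 1]);
        return m;
    }

    /** True if the client asked yacy not to index the response. */
    static boolean indexingForbidden(Map<String, String> req) {
        String v = req.get("X-YACY-Index-Control");
        return v != null && v.trim().equalsIgnoreCase("no-index");
    }

    /** True if the Pragma or Cache-Control request header contains "no-cache". */
    static boolean bypassCache(Map<String, String> req) {
        String pragma = req.getOrDefault("Pragma", "").toLowerCase(Locale.ROOT);
        String cc = req.getOrDefault("Cache-Control", "").toLowerCase(Locale.ROOT);
        return pragma.contains("no-cache") || cc.contains("no-cache");
    }
}
```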
theli
02d9af1a70
*) Restructuring and extending of Remote Proxy Support
- remote proxy configuration can now be "really" changed on the fly and takes effect immediately
- adding possibility to disable remote proxy usage for yacy->yacy communication
- adding possibility to disable remote proxy usage for ssl
- restructuring proxy configuration so that it is stored in a single place now
*) Adding possibility to import a foreign word DB (or even more of them in parallel)
at runtime into the peers DB
- this can be done by calling IndexImport_p.html
- ATTENTION: please note that at the moment this thread must be aborted via gui
before a normal server shutdown is done.
- TODO: integrating IndexImport Thread into normal server shutdown
- TODO: Adding possibility to import crawl-queues, etc. from foreign peers
- TODO: removing old import function from yacy.java and calling the new routines instead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@968 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
58b670201d
now, changes to HTCacheSize need no restart
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@961 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
40777556c5
*) Connection Tracking
- adding automatic refresh
- accepts new parameter nameLookup which can be used to deactivate
yacy-peer name lookup (because we have problems with this on large seed-dbs)
*) ViewFile
New page that can be used to view
- original content
- plain text content
- parsed content
- parsed sentences
of a webpage specified by its url hash
Mainly for debugging purposes at the moment
*) Robots.txt
Bugfix for if-modified-since usage
TODO: synchronization of downloads to avoid loading the same robots-file
multiple times in parallel by different threads
*) Shutdown
Better abortion of transferRWI and transferURL sessions on server shutdown
*) Status Page
Adding icon to start/stop crawling via status page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
rramthun
a98bafb939
Changes to german language file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@941 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
95abdeb685
*) Bugfix for nextElement function of URL Enumerator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@936 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
6260942590
changed search process: received indexes are now buffered and written to wordIndex after search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@934 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
7ee03acce0
new function cutUrlText added to shorten the URLs on IndexMonitor.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@931 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
bc56a88cc8
further refactoring of search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@925 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
d29dfb0a12
refactoring of search / preparation for better search methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@921 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
0ae166c522
*) Small changes to Index Transfer.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@919 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
461374e175
*) Restricting amount of files that yacy is allowed to open during index transfer/distribution
This option is configurable via config file and defaults to 800
See: http://www.yacy-forum.de/viewtopic.php?p=11137#11137
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@918 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
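One way to enforce the open-file limit from the entry above is a counting semaphore. This is an assumption about the mechanism, not the actual yacy implementation; OpenFileLimiter is a hypothetical name, and 800 is the default named in the log.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: cap the number of files open at once during index transfer.
class OpenFileLimiter {
    private final Semaphore permits;

    OpenFileLimiter(int maxOpenFiles) {   // e.g. 800, the default from the config file
        this.permits = new Semaphore(maxOpenFiles);
    }

    /** Tries to reserve a file handle; returns false if the limit is reached. */
    boolean tryOpen() { return permits.tryAcquire(); }

    /** Returns a handle to the pool after the file has been closed. */
    void closed() { permits.release(); }

    int available() { return permits.availablePermits(); }
}
```

A non-blocking tryOpen lets the selection loop stop early and transfer what it has, instead of waiting for handles held by other threads.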
theli
c8a35a0130
*) Adding new connection tracking page (currently only for incoming connections)
*) Displaying statistic for incoming connections on status page
*) Bugfix for Loop-Access Bug when trying to access the yacy page while yacy is configured as proxy
See: http://www.yacy-forum.de/viewtopic.php?p=6826
*) Bugfix for Referer Bug
See: http://www.yacy-forum.de/viewtopic.php?p=11098#11098
*) Adding reverse Name lookup for yacy-domain names (used by the connection tracking page)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@916 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
b80b2fbdcc
crawling peers now produce waves in network graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@912 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
10d3627c90
changed word cache flush scheduling and removed possible locks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@910 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
839db8869c
added high/low priority for index adding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@899 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
1688be8590
*) plasmaSwitchboard.java
adding more verbose logging output for db initialization
*) httpdFileHandler.java
adding cache for servlet response methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@897 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
e9eb5e4b56
refactoring of index-entity join methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@895 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
258fd9eb8e
adding missing file for websearch refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@894 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
77ae30063d
refactoring of websearch process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@893 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
579b22d8ff
small update to network drawing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@892 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
2b5829c3da
small fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@891 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
4c7918f5b5
added shutdown to crawl stacker (moved from 882)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@889 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
2851658c2a
re-integrated Martin's last change to crawl stacker from svn 882 that I had deleted accidentally
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@888 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
c83594528c
integrated crawl stacker into thread control
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@887 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
959eefbc4f
*) Robots.txt parser/ppt
cutting of comments at the line end
*) Adding Threadpool for stackCrawl Thread to speedup robots.txt download
and double url checks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@882 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
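A minimal sketch of "cutting of comments at the line end" in a robots.txt parser: everything from '#' onwards is dropped and lines left empty are skipped. RobotsLines is a hypothetical helper, not the yacy robots parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: strip end-of-line comments from robots.txt content.
class RobotsLines {
    static List<String> clean(String robotsTxt) {
        List<String> out = new ArrayList<>();
        for (String line : robotsTxt.split("\r?\n")) {
            int hash = line.indexOf('#');          // everything after '#' is a comment
            if (hash >= 0) line = line.substring(0, hash);
            line = line.trim();
            if (!line.isEmpty()) out.add(line);    // skip blank and comment-only lines
        }
        return out;
    }
}
```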
allo
f65c939a60
userDB Auth
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@874 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
1a5d98cd6d
better imagePainter example and fix for typo http://www.yacy-forum.de/viewtopic.php?p=10920#10920
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@868 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
f6cf3967de
fix for compile-bug in svn 583 (Martin, please check whether this is correct: fifo or filo stack?)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@854 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
a2fa75e688
*) Asynchronous queuing of crawl job URLs (stackCrawl)
various checks like the blacklist check or the robots.txt disallow check are now
done by a separate thread to unburden the indexer thread(s)
TODO: maybe we have to introduce a threadpool here if it turns out that this single
thread is a bottleneck because of the time consuming robots.txt downloads
*) improved index transfer
The index selection and transmission is done in parallel now to improve index
transfer performance.
TODO: maybe we could speed up performance by using multiple transmission threads in
parallel instead of only a single one.
*) gzip encoded post requests
it is now configurable whether a gzip encoded post request should be sent on
index transfer/distribution
*) storage Peer (very experimental and not optimized yet)
Now it's possible to send the result of the yacy indexer thread to a remote peer
instead of storing the indexed words locally.
This can be done by setting the property "storagePeerHash" in the yacy config file
- Please note that if the index transfer fails, the index is stored locally.
- TODO: currently this index transfer is done by the indexer thread.
To speed up the indexer
a) this transmission should be done in parallel and
b) multiple chunks should be bundled and transferred together
*) general performance improvements
- better memory cleanup after http request processing has finished
- replacing some string concatenations with stringBuffers
- replacing BufferedInputStreams with serverByteBuffer
- replacing vectors with arraylists wherever possible
- replacing hashtables with hashmaps wherever possible
This was done because function calls to vector or hashtable functions
take 3 times longer than calls to functions of arraylists or hashmaps.
TODO: we should take a look at the class serverObject which inherits from hashmap
Do we really need a synchronization for this class?
TODO: replace arraylists with linkedLists if random access to the list elements is not needed
*) Robots Parser supports if-modified-since downloads now
If the downloaded robots.txt file is older than 7 days the robots parser tries to
download the robots.txt with the if-modified-since header to avoid unnecessary downloads
if the file was not changed. Additionally the ETag header is used to detect changes.
*) Crawler: better handling of unsupported mimeTypes + FileExtension
*) Bugfix: plasmaWordIndexEntity was not closed correctly in
- query.java
- plasmaswitchboard.java
*) function minimizeUrlDB added to yacy.java
this function tests the current urlHashDB for unused urls
ATTENTION: please don't use this function at the moment because
it causes the wordIndexDB to flush all words into the
word directory!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
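A sketch of the robots.txt revalidation rule described above: a cached copy younger than 7 days is reused, an older one is re-requested with If-Modified-Since and, if known, the ETag in If-None-Match (the standard HTTP header for ETag revalidation). The class, method and return convention (null means "no download needed") are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical sketch of the 7-day robots.txt revalidation rule.
class RobotsRevalidation {
    static final long MAX_AGE_MS = 7L * 24 * 60 * 60 * 1000; // 7 days in milliseconds

    /**
     * Returns null if the cached robots.txt is still fresh (no download),
     * otherwise the conditional headers for the re-download request.
     */
    static Map<String, String> conditionalHeaders(long cachedAtMs, long lastModifiedMs,
                                                  String etag, long nowMs) {
        if (nowMs - cachedAtMs <= MAX_AGE_MS) return null;
        // HTTP date format, always expressed in GMT
        SimpleDateFormat fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        Map<String, String> h = new LinkedHashMap<>();
        h.put("If-Modified-Since", fmt.format(new Date(lastModifiedMs)));
        if (etag != null) h.put("If-None-Match", etag); // server may answer 304 if unchanged
        return h;
    }
}
```

A 304 Not Modified answer lets the parser keep its cached rules without transferring the file again.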
orbiter
6d5d0ac801
bugfix for startup problems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@850 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
0c3a20d44f
more and changed logging for better understanding of the outOfMemory bug and others
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@846 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
0fd9aa6c6e
*) Bugfix: supportedFileExt Function didn't detect the file extension correctly because of missing conversion to lower case
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@837 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
8a33c9b309
*) Bugfix: supportedFileExt Function didn't detect the file extension correctly if there was a dot
in one of the parent directories of the file.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@836 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
28c5687ff9
*) Bugfix for "download of non supported file content" via crawler
See: http://www.yacy-forum.de/viewtopic.php?p=10724#10724
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@835 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
2b3f964037
*) Bugfix: supportedFileExt Function didn't chop http parameters before trying to detect the file extension
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@834 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
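The three supportedFileExt fixes above (lower-casing the extension, ignoring dots in parent directories, chopping HTTP parameters) can be combined in one extension extractor. This is a hypothetical sketch, not the original method; FileExt is an illustrative name.

```java
import java.util.Locale;

// Hypothetical sketch of an extension extractor with the three fixes applied.
class FileExt {
    /** Returns the lower-case extension of the last path segment of a URL, or "". */
    static String of(String url) {
        String s = url;
        // Fix: chop HTTP parameters (and fragments) before looking for a dot
        int q = s.indexOf('?');
        if (q >= 0) s = s.substring(0, q);
        int f = s.indexOf('#');
        if (f >= 0) s = s.substring(0, f);
        // Skip "scheme://host[:port]" so the host name is never taken for a file name
        int scheme = s.indexOf("://");
        if (scheme >= 0) {
            int pathStart = s.indexOf('/', scheme + 3);
            s = pathStart >= 0 ? s.substring(pathStart) : "";
        }
        // Fix: look only at the last path segment, ignoring dots in parent directories
        String name = s.substring(s.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0) return "";
        // Fix: normalize case so ".PDF" and ".pdf" map to the same parser
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```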
allo
ff1d3d0680
Init of userDB
Page layout of User_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@822 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
9c4306e41e
fixed problem with htcache path
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@811 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
1669eaaa1a
fixed svn 805
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@807 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
ca82d690a9
changed one line too many in SVN 805
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@806 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
4bb1f849a0
Bugfix for http://www.yacy-forum.de/viewtopic.php?t=1233
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@805 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
2c7b490e30
memory-logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@804 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
7fc822a59b
changed handling of time-zones
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@801 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
9b7f37fc37
*) Minor changes
- more debugging output: storageTime for indexed document is logged now
- saving memory in plasmaParserDocument.java, plasmaWordIndexEntryContainer.java (not a big deal)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@798 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
b5a8992d29
*) Setting some object fields to final
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@796 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
023be89586
*) Bugfix for "Robots.txt is loaded again and again"
See: http://www.yacy-forum.de/viewtopic.php?p=10241#10233
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@794 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
35c6c5ead7
*) Bugfix for "Blacklist and crawling" bug.
- Crawling continued even if the URL was listed in the blacklist
See: http://www.yacy-forum.de/viewtopic.php?p=10279#10279
- missing return statement added. Thanks to allo for the
code review.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@793 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
9e2fc7e5fe
load balancing of crawl target domains
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@791 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
3fcc95a82c
integrated crawl-profiles db in memory-performance monitor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@788 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
fe6a6abc0b
*) Adding robots.txt db to Performance Settings for Memory menu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@785 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
3274ae725e
increased cache size of robots database; however, this should be integrated into new memory control
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@784 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
c6d2f50375
changed order of robots and double-check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@783 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
68d5ff2ef1
added stringbuffer in condenser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@782 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
495bc8bec6
removed cache-control from low and medium priority caches which reduces memory use and computation overhead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@774 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
18d9e1a256
fix for http://www.yacy-forum.de/viewtopic.php?p=10026#10026
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@768 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
07f30931ec
various configuration options in memory performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@763 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
b990dc1ad1
*) Replacing jsch 0.1.19 lib with newer version 0.1.21
*) Replacing PDFBox 0.7.1 lib with newer version 0.7.2
*) Refactoring of classes httpd/httpc/httpHeaders to
make many methods for httpHeader/Requestline parsing
reusable for new icap implementation
*) adding chunked input stream support
- needed by new icap implementation
- needed by future httpc HTTP/1.1 support
*) httpd.java
- moving all connection property constants to class httpHeader
- moving readHeader function to class httpHeader
- moving parseQuery function to class httpHeader
- moving handleTransparentProxy function to class httpHeader
*) httpHeader.java
- adding new function to parse the http response line
- adding new function to convert http headers to a string that
can be sent to the client
- adding a function that generates a proper url using all parsed
connection properties
*) ICAP Support
- yacy now supports handling of icap response modification requests
- this feature can be used by other icap enabled proxies to contact
yacy as icap server, and to hand over the downloaded content to yacy
for indexing
- functionality was successfully tested with squid 2.5Stable 10 + icap patch
- further icap services e.g. URL filtering based on yacy's blacklists are possible
*) plasmaSwitchboard.java
- htcache entries that are still needed for indexing are now properly registered
as in use after system restart
- extended logging: log message now shows parsing and indexing time for each sb. entry
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@757 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
6d1de8abfd
finals; cleaned;
Properties;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@756 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
14bc880fa4
fixed bug with crashed profile database
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@753 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
71a31f0902
integrated and extended new memory performance menu; found and fixed bug in DHT caching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@752 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
fb52a82008
added new performance page for memory settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@751 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
cddd9aaa33
fixed SERIOUS bug with kelondroStack; affected all stack processing since 729
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@732 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
416c126815
fix for a profile = null problem and new monitor in crawl queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@730 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
2148c0cf49
replaced kelondro storage core; far fewer objects in kelondro cache now; less IO from DB
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@724 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
beefddf0e8
*) Adding an option that allows an Index-Transfer without deleting the index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@722 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
rramthun
4036ee812a
Updated German language file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@721 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
40925f4fb7
*) Improving complete index transfer performance by automatically increasing the size of the transferred word chunk
for fast connections (similar to normal DHT behavior)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@719 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
91ab4d044b
*) Adding automatic retry functionality to complete index transfer function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@718 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
a62677f761
*) Adding additional logging output for complete index transfer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@717 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
b991d2e7dd
*) Additional logging message for complete index transfer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@712 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
3c00c5f6c7
*) Complete Index Transfer
See: http://www.yacy-forum.de/viewtopic.php?p=9622
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@711 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
2cb084d426
*) Complete Index Transfer
See: http://www.yacy-forum.de/viewtopic.php?p=9622
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@707 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
d1de71e9f6
*) Suppress stacktrace on proxy error for "No route to host Exception"
See: http://www.yacy-forum.de/viewtopic.php?t=1153
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@704 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
56160cbd01
*) Bugfix for "YaCy verzählt sich ..." ("YaCy miscounts ...") bug.
See: http://www.yacy-forum.de/viewtopic.php?p=9559
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@701 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
43b42854a0
fix for null-entries and http://www.yacy-forum.de/viewtopic.php?p=8649
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@699 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
3587407039
*) Fixing problems of list operation if index and queue size are both 0.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@687 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
51b48a10e8
*) Suppress stacktrace on proxy error for "ValidatorException: No trusted certificate found"
See: http://www.yacy-forum.de/viewtopic.php?t=1110
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@686 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
7fe8784231
*) URLs pointing to a server with a private IP address will no longer be indexed
See: http://www.yacy-forum.de/viewtopic.php?p=9408
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@682 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
0aafb83edc
*) Bugfix for robots.txt isDisallowed Check.
Setting path to "/" if it is null or empty.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@677 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
8260128ee9
changed getFreeSize();
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@675 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
f8ad65eae1
*) First trial implementation of robots.txt support
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@674 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
0a57fbcde5
Added new HashSet filesInUse;
Added new Function getFreeSize();
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@672 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
8cd6a52dd0
Convention
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@671 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
c0e3d18bbf
*) removed import java.lang
*) added super()
*) replaced startsWith()
*) cleaned
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@670 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
b1cd1fa917
cleaned
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@669 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
da9c6857fb
*) changed a misunderstanding, not a bug ;)
*) finals and other changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@668 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
fbac053c03
small change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@665 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
578f36ae18
*) Speedup of indexer. Proxy files will no longer be enqueued by the cachemanager
into the sb-queue if the mimeType or fileExtension is not supported
by the installed parsers.
- Advantage: avoids unnecessary enqueueing and dequeueing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@664 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
1219ef99f0
*) Bugfix for NullPointerException in yacyDebugMode Init
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@663 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
6c722706b7
*) Moving yacyDebugMode initialization to switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@660 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
4e07828807
*) httpdProxyHandler.java
- harmonizing proxy exception handling
- adding malformed URL + blacklist check for http head method
- adding malformed URL check to http post method
- chunked encoding is no longer used for http post if clients
are http/0.9 or http/1.0 clients (same behaviour as already implemented for get)
- now an exception will be thrown on internal httpc errors to force an error output
to the client or a connection close. This should help to fix the "binary data in browser window" bug
*) plasmaSwitchboard.java
- fixing the following Bug
E 2005/09/03 18:02:42 PLASMA Could not index URL http://mis04.de/FAIL/snot.php : null
java.lang.NullPointerException
at de.anomic.plasma.plasmaSwitchboard.processResourceStack(plasmaSwitchboard.java:1000)
at de.anomic.plasma.plasmaSwitchboard.deQueue(plasmaSwitchboard.java:625)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at de.anomic.server.serverInstantThread.job(serverInstantThread.java:95)
at de.anomic.server.serverAbstractThread.run(serverAbstractThread.java:243)
This bug could occur if the cached responseHeader is null
- getting the mimeType now from the parsed document instead of the responseHeader because the
mimeType could have been changed during content parsing (e.g. because of the mimetypeParser)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@656 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
81cb8feb15
back to 649 :/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@651 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
5194511e8e
*) attempt to find bug
See: http://www.yacy-forum.de/viewtopic.php?t=1121
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@650 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
6991b9e2b9
*) Suppress stacktrace on crawler error for "Connection reset"
See: http://www.yacy-forum.de/viewtopic.php?p=9071
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@645 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
a47f9238fe
*) Blacklist is now also used by the crawler
See: http://www.yacy-forum.de/viewtopic.php?t=1069
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@642 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
dc0a2d4c11
*) Bugfix for Loader Queue:
Job count was not displayed correctly
*) IndexingQueue:
- now it's possible to delete single entries from the queue
- now it's possible to clear the whole queue
See: http://www.yacy-forum.de/viewtopic.php?t=995
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@641 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
732a107160
*) Bugfix for "-UNRESOLVED_PATTERN-" Bug on IndexCreateWWWLocalQueue_p.html and "urlEntry.url() == null" Bug
- Logging message for "urlEntry.url() == null" is now displayed as info
- IndexCreateWWWLocalQueue_p.html now detects null entries while looping through the list and removes them automatically
See:
- http://www.yacy-forum.de/viewtopic.php?t=532#8781
- http://www.yacy-forum.de/viewtopic.php?t=639
- http://www.yacy-forum.de/viewtopic.php?t=1071
- http://www.yacy-forum.de/viewtopic.php?t=338
- http://www.yacy-forum.de/viewtopic.php?t=980
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@640 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
33aaffbfc6
*) Displaying content size of each entry in indexing queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@639 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
7626823519
BUGFIX for last 'commit'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@635 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
971756e8dd
the delete size is smaller
See: http://www.yacy-forum.de/viewtopic.php?t=1084
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@634 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
0471019606
*) IndexCreateIndexingQueue_p.html now also shows indexing jobs that are currently in process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@633 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
cc493ef8c1
Added change from Hermes
See: http://www.yacy-forum.de/viewtopic.php?t=1050
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@629 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
bead8a32aa
*) IndexCreate_p.java:
Crawler StartURLs will now also be added to the errorURL-DB if an error occurs on this URL
*) kelondroStack.java, plasmaSwitchboardQueue.java
Adding method which returns a list of all entries in the queue. This list is used by IndexCreate_p.java
instead of an iterator to display the indexing-list.
Advantages: avoid concurrent modifications of the list while displaying it.
Speedup because now we have to access only one sync function instead of multiple ones
(one for each entry)
*) IndexCreateIndexingQueue_p.java
Using new list() function of plasmaSwitchboardQueue
*) httpdFileHandler.java
If a servlet returns the special value "LOCATION", the httpFileHandler redirects
the browser to the URL specified by the servlet. This can e.g. be used when an http get request is
used instead of a post request, but a refresh should not be allowed.
*) IndexCreateWWWLocalQueue_p.html
Now it's possible to delete single entries of the local crawler queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@626 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
48aaf703cc
*) Adding additional logging output to detect crawling problems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@625 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
59b8a98c7e
*) Bugfix for suppressing of stacktrace in log on crawler error "MalformedURLException"
See: http://www.yacy-forum.de/viewtopic.php?p=8840
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@623 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
c1d7527929
better cache cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@621 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
2e6df95786
*) adding toString method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@620 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
4fd5b95b1f
*) Renaming Logger function names to reflect the proper Java Logging API Loglevels
- please use logFine instead of logDebug
- please use logSevere instead of logFailure and logError
See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@615 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
6adf8a4bde
*) Renaming Logger function names to reflect the proper Java Logging API Loglevels
- please use logFine instead of logDebug
- please use logFailure instead of logError
See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@614 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
f19c09b227
*) Suppress stacktrace on crawler error for "MalformedURLException"
See: http://www.yacy-forum.de/viewtopic.php?p=8733#8733
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@613 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
cc1df08069
*) Adding missing synchronized blocks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@608 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
borg-0300
bf14e6def5
*) proxyCache, proxyCacheSize can be changed under 'Proxy Indexing'
- paths are now absolute
*) move path check from plasmaHTCache to plasmaSwitchboard
- only one path check when starting
*) other small changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@606 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
9b818b1ce3
*) Pausing Crawlers if there is not enough space on disk
See: http://www.yacy-forum.de/viewtopic.php?p=8648
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@603 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
b33094e925
*) Trying to solve "Too many open files bug"
*) Temporary bugfix for "Bug in Index Restore"
See: http://www.yacy-forum.de/viewtopic.php?p=8647#8647
Orbiter: Please take a look
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@602 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
34790acf02
*) Bugfix for suppressing of stacktrace in log on crawler error "unknown host"
See: http://www.yacy-forum.de/viewtopic.php?p=8615#8615
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@600 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
af7b8f75bd
*) Making proxyAccessLogging configurable via yacy.logging file
- logging can be disabled now
- logging directory / filelimit / rotation count can be configured now
See: http://www.yacy-forum.de/viewtopic.php?t=965&postdays=0&postorder=asc&start=30#8280
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@595 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
2a081c9ee5
*) Adding additional logging message for "NURL.entry() == null" Bug
See: http://www.yacy-forum.de/viewtopic.php?p=8446
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@591 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
cb1f11c96b
*) Suppress stacktrace on crawler error for "Unknown Host"
See: http://www.yacy-forum.de/viewtopic.php?p=8431
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@590 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
e338a13de3
*) Suppress stacktrace on crawler error for "Read timed out"
See: http://www.yacy-forum.de/viewtopic.php?p=8433
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@589 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
2e43e744de
*) Suppress stacktrace on crawler error for "connect timed out"
See: http://www.yacy-forum.de/viewtopic.php?p=8420
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@588 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
36cbe04e3e
*) Bugfix for Crawler Redirection Bug
See: http://www.yacy-forum.de/viewtopic.php?p=8422
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@587 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
b70de495a0
*) Remembering Crawler-isPaused setting
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@586 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
e569a84dc0
*) Using the same configuration settings for all indexing threads on server startup
See: http://www.yacy-forum.de/viewtopic.php?p=8349
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@584 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
17be77a468
*) Bugfix for "Crawler data will not be removed from htcache if content parsing failed"
See: http://www.yacy-forum.de/viewtopic.php?t=965&highlight=ramdisk
*) Making ACCEPT_LANGUAGE configurable for crawler
See: http://www.yacy-forum.de/viewtopic.php?p=8327
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@583 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
5f55dff297
*) Bugfix for "Binäre Nullen auf der page: Index Creation: Indexing Queue" ("Binary zeros on the page: Index Creation: Indexing Queue")
See: http://www.yacy-forum.de/viewtopic.php?p=6877#6877
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@577 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
allo
eb6365c069
local Bootstrapping bug.
use yacyDebugMode=true to allow local bootstrapping
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@572 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
330eae7cf3
*) Normalizing CrawlerStartURL now before crawling is started
*) CrawlWorker also does a URL normalization now before following the redirection URL
*) CrawlWorker removes redirection URL correctly from noticeURL stack now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@571 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
ab894d26bc
*) Bugfix for "plasmaSwitchboard.deQueue: null" Bug (hopefully)
See: http://www.yacy-forum.de/viewtopic.php?p=8135#8135
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@570 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
eaf9f26cc3
*) Bugfix for NULL PROFILE HANDLE 'null' Bug:
See: http://www.yacy-forum.de/viewtopic.php?p=7855#7855
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@569 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
rramthun
4cb382decb
Adding changes by borg-0300 from http://www.yacy-forum.de/viewtopic.php?t=997
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@565 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
ec4c70d722
*) If there are at most 10 entries left while doing an index transfer, these entries will also be appended
to the index list
|> D 2005/08/18 10:00:02 PLASMA Selected partial index (33 from 37 URLs, 0 not bound) for word fSuQM0xAJK1G
See: http://www.yacy-forum.de/viewtopic.php?t=970
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@556 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
d4a045d7b1
*) Trying to solve "de.anomic.plasma.plasmaSwitchboard.deQueue': null" Bug
See: http://www.yacy-forum.de/viewtopic.php?p=7791
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@555 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
ea9a992f05
*) Before the crawler retries downloading a URL, it checks whether the server is already shutting down
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@554 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
ea26b84eed
*) Bugfix for http://www.yacy-forum.de/viewtopic.php?t=954
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@553 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
0c8a48e2cb
*) converting php Session ID to lower case in function isCGI
See: http://www.yacy-forum.de/viewtopic.php?p=7671#7671
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@552 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
e616395c3b
latest changes and cut for 0.40
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@548 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
c47bb1182d
bugfix for assortment initialization error
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@547 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
theli
4654eae4e2
*) adding php Session ID to argument in function isCGI
See: http://www.yacy-forum.de/viewtopic.php?p=7671#7671
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@546 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
25f632dbd9
more DHT bugfixes and better logging of DHT effects
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@542 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
5cb00889d9
enhancements to dht selection, search and search presentation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@540 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago
orbiter
ba0a486328
moved printStackTrace() to logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@539 6c8d7289-2bf4-0310-a012-ef5d649a1542
20 years ago