orbiter
35a9e8f307
- fixed network graphic
- debugged evaluation tables
- changed cache settings in template engine
- some speed hacks
- changed int angles for peer positions in network graphic to double angles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8124 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
Al Sutton
8993cac4d8
Initial performance improvements
13 years ago
orbiter
8895d8c1cd
removed unnecessary log entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8117 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
77a080ced9
smaller fixes for YMarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8105 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5a55397f99
some last-minute performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
dd1482aaf5
further update to YMarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8100 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c584db991f
creating a bookmark from the search results works again, now with the new YMarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8092 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
564374d1fe
- included YMarks in addition to old bookmarks in yacysearchitem.html; don't be confused by the old bookmark dialog: the YMark is added automatically and silently beforehand.
- reworked bookmark creation on crawlstart
- many smaller adjustments to ymarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8072 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c93f10417a
add a bookmark automatically each time a new crawl is started
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8063 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e4a82ddd8b
produce a bookmark entry from every crawl start. these bookmarks are always private.
these bookmarks will be used to get a source reference for the search in case of intranet or portal searches.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8062 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
6287c2b4a9
YMarks:
- introduced tag manager, quite a powerful tool (still not 100% stable, so be careful)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8060 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
cominch
2236e01137
Minor correction to prevent a useless comma at the beginning of a string created from a list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8059 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
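The comma fix comes down to emitting the separator only between elements, never before the first one. A minimal Java sketch, assuming a plain `List<String>` input; `CommaJoin` is a hypothetical name, not the class the commit touched. It also pre-sizes the `StringBuilder`, in the spirit of the later StringBuilder performance hacks:

```java
import java.util.List;

class CommaJoin {
    // Joins items with ',' and never emits a leading or trailing separator.
    static String join(List<String> items) {
        // Pre-size the builder: total item length plus one comma between each pair.
        int capacity = Math.max(0, items.size() - 1);
        for (String item : items) {
            capacity += item.length();
        }
        StringBuilder sb = new StringBuilder(capacity);
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                sb.append(','); // separator only between elements
            }
            sb.append(items.get(i));
        }
        return sb.toString();
    }
}
```

The JDK's `String.join(",", items)` gives the same result; the explicit loop only makes the separator rule visible.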
apfelmaennchen
5581be12fb
YMarks:
- added backend and api for tag management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8058 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
a3eebfdcba
YMarks:
- show active/running crawls
- execute crawls (works currently only if API entry is available)
- various smaller fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8056 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c50f8f9a06
code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8055 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
4f95f72124
YMarks:
- working direct importer for YaCy Crawl Starts
- working direct import for old bookmarks.db
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8052 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
aa322bc6d0
fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8050 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
97d1347adb
also added a default accept field to robots.txt downloads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8049 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
f183d3822c
added a default accept header to http requests since some http fraud detection functions check that this header field exists
see also: http://bad-behavior.ioerror.us/ in source file browser.inc.php
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8048 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
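A sketch of the idea behind the two accept-header commits: every outgoing request carries an Accept header, so fraud-detection checks like the one referenced in bad-behavior's browser.inc.php do not flag the crawler. It uses `java.net.http.HttpRequest` from JDK 11 purely for illustration; YaCy's own HTTP client at the time was different, and the default header value is an assumption, since the commits do not state it:

```java
import java.net.URI;
import java.net.http.HttpRequest;

class DefaultAcceptHeader {
    // Assumed browser-like default; the commits do not say which value YaCy sends.
    static final String DEFAULT_ACCEPT =
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

    // Builds a GET request that always carries an Accept header,
    // falling back to the default when the caller supplies none.
    static HttpRequest get(String url, String accept) {
        String value = (accept == null || accept.isEmpty()) ? DEFAULT_ACCEPT : accept;
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", value)
                .GET()
                .build();
    }
}
```

Building the request performs no network I/O; sending it would go through `HttpClient.send`.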
orbiter
06352b8d6b
more logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8047 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a99934226e
more logging for debugging of robots.txt
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8046 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
7a5841e061
fix for robot parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8045 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
458c20ff72
fix for robot parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8044 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
017a01714d
- enhanced logging in robots.txt parser for remote debugging
- robots.txt is now more robust against database operations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8043 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
a8dfe787ed
- updated to jquery flexigrid 1.1
- YMarks.html automatically recognizes if a bookmark is a crawl start
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8040 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
eb1c7c041d
write info about robots.txt evaluation into getpageinfo_p.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8038 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
abba31f02e
- bugfix for correctly sorting ymarks
- some tuning for the autotagger (still not perfect)
- /api/ymarks/get_metadata.xml now provides info for crawlstarts
- removed unused code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8036 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
775b44017e
refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8033 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
apfelmaennchen
5f7dbe1c42
- some refactoring (ymarks)
- improvement for autotagger (is now able to create/detect multi word tags e.g. 'open source')
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8031 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
78ce3b13be
typo
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8027 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
85d6bf4ac4
fixed urls to media content during indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8021 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0d858d48ec
replaced String with StringBuilder in suggestion process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8020 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
3a807e10cf
- added a cache for active crawl profiles to the crawl switchboard
- moved the domain cache for the domain counter from the crawl switchboard to the crawl profiles. The crawl domain counter is therefore now relative to each crawl start, not to the whole crawler.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8018 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
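Moving the domain counter into the crawl profiles can be pictured as each profile owning its own host-to-count map, so two crawl starts never share counts. A minimal sketch with a hypothetical `CrawlProfileSketch` class, not YaCy's `CrawlProfile`:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class CrawlProfileSketch {
    // Per-profile counter: host -> number of URLs seen for that host in this crawl start only.
    private final ConcurrentHashMap<String, AtomicInteger> domainCounter = new ConcurrentHashMap<>();

    // Atomically increments and returns the count for the host.
    int countDomain(String host) {
        return domainCounter.computeIfAbsent(host, h -> new AtomicInteger()).incrementAndGet();
    }

    // Returns the current count, 0 for hosts this profile has not seen.
    int domainCount(String host) {
        AtomicInteger c = domainCounter.get(host);
        return c == null ? 0 : c.get();
    }
}
```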
orbiter
37e35f2741
normalization of url using urlencoding/decoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8017 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
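The commit does not say which normalization rules are applied. One common, well-defined option is RFC 3986 percent-encoding normalization: decode escapes of unreserved characters and upper-case the hex digits of every other escape. The sketch below implements that rule as a stand-in; it is not necessarily what YaCy does, and the class name is hypothetical:

```java
class PercentEncodingNormalizer {
    // RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Strict ASCII hex digit value, -1 for anything else.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '%' && i + 2 < s.length()) {
                int hi = hexValue(s.charAt(i + 1));
                int lo = hexValue(s.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    int octet = (hi << 4) | lo;
                    if (isUnreserved(octet)) {
                        sb.append((char) octet);              // decode unreserved octet
                    } else {
                        sb.append('%')                        // keep escape, upper-case hex
                          .append(Character.toUpperCase(s.charAt(i + 1)))
                          .append(Character.toUpperCase(s.charAt(i + 2)));
                    }
                    i += 2;
                    continue;
                }
            }
            sb.append(ch); // ordinary char or malformed escape: copy unchanged
        }
        return sb.toString();
    }
}
```

Escapes of reserved characters such as `%2F` stay encoded, because decoding them could change how the URL is split into path segments.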
orbiter
1b86d06d1e
fix for http://bugs.yacy.net/view.php?id=62
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8004 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
9e4875230f
performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8001 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a9838f8b99
fix for http://bugs.yacy.net/view.php?id=59
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7997 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
a7df70221e
refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
cf4fd525ee
added directDocByURL attribute in crawl profile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7985 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c61e4cfd78
- fix for incomplete clear() in balancer
- renamed Parser Errors to Rejected URLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7984 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
813f297a95
another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7983 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
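Caching the isLocal verdict per host is one way to avoid repeated look-ups for hosts that were already seen. A minimal sketch, assuming that "local" means loopback, site-local, link-local or wildcard as classified by `java.net.InetAddress`; YaCy's own definition may differ, and the class name is hypothetical:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class LocalHostCache {
    private final ConcurrentHashMap<String, Boolean> known = new ConcurrentHashMap<>();
    final AtomicInteger resolutions = new AtomicInteger(); // counts real classifications

    // Returns the cached verdict if the host was seen before, otherwise classifies it once.
    boolean isLocal(String host) {
        Boolean cached = known.get(host);
        if (cached != null) return cached;
        boolean local = classify(host);
        known.putIfAbsent(host, local);
        return local;
    }

    private boolean classify(String host) {
        resolutions.incrementAndGet();
        try {
            InetAddress a = InetAddress.getByName(host); // literal IPs need no DNS query
            return a.isLoopbackAddress() || a.isSiteLocalAddress()
                    || a.isLinkLocalAddress() || a.isAnyLocalAddress();
        } catch (UnknownHostException e) {
            return false; // sketch: unresolvable hosts count as non-local
        }
    }
}
```

In this sketch unresolvable hosts stay cached as non-local forever; a production cache would need an expiry.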
orbiter
035ebfbf3b
- performance hacks (should affect the crawl balancer and reduce CPU load during crawl stack re-fill)
- this may have also (good) performance side effects on other parts of YaCy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7982 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
b250e6466d
implemented crawl restrictions for IP pattern and country lists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7980 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
f1ori
e207c41c8e
* fix urlproxy for urls containing dollar signs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7979 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
5ad7f9612b
added crawl settings for three new filters for each crawl:
must-match for IPs (IPs that are known after DNS resolving for each URL in the crawl queue)
must-not-match for IPs
must-match against a list of country codes (only allows loading from hosts that are hosted in the given countries)
note: this commit adds the settings and the input environment, but the values are not yet evaluated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7976 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
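The three filters listed above can be sketched as one check per resolved URL. This assumes full-string regex matching for the IP patterns and that the country code comes from a separate geo-IP lookup, which is not shown; all names are hypothetical:

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

class CrawlIpFilter {
    private final Pattern ipMustMatch;
    private final Pattern ipMustNotMatch;
    private final Set<String> countries; // upper-case country codes; empty = no restriction

    CrawlIpFilter(String ipMustMatch, String ipMustNotMatch, Set<String> countries) {
        this.ipMustMatch = Pattern.compile(ipMustMatch);
        this.ipMustNotMatch = Pattern.compile(ipMustNotMatch);
        this.countries = countries;
    }

    // ip: address known after DNS resolving; countryCode: assumed to come from a geo lookup.
    boolean accept(String ip, String countryCode) {
        if (!ipMustMatch.matcher(ip).matches()) return false;
        if (ipMustNotMatch.matcher(ip).matches()) return false;
        if (countries.isEmpty()) return true;
        return countryCode != null && countries.contains(countryCode.toUpperCase(Locale.ROOT));
    }
}
```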
orbiter
d2ea250d99
refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
low012
42b5f09f68
*) this should fix a bug in snippet creation (also cleaned up a little bit)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7972 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
6b22865dbc
- removed some warnings
- removed a dead update location
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7970 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
0c6d95e57b
- more tolerance against failure of table opening
- more connections for solrj
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7968 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
4f31869c5a
enhanced search result timing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7966 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
6b02b696b0
- add number of search results to end of rss and json output to reflect latest status of retrieval
- distinguish search access with different verify state in access of search cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7965 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
f1ori
87e6abd168
* fix urls containing a port number in urlproxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7964 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
f1ori
97045022fa
* pass cookies to Server Side Includes
* made User.html a bit more usable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7963 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
ce2a76d603
performance hack for search process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7961 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c4a672fe2
bugfixes and performance hacks for table index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7957 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
dad5b586a4
added a concurrent warm-up of Table data structures. This should speed up the start-up process but may also cause higher CPU load during that time.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7956 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
734059d33e
performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7955 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
23e81b28b2
synchronization enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7954 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
dd4635e323
patches
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7953 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
bb0c045036
fix for problem with relocation of network
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7944 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
85a5487d6d
YaCy can now use the solr index to compute text snippets. This makes search result preparation MUCH faster because no document fetching and parsing is necessary any more.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7943 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
52a2b3f110
try to fix bug http://bugs.yacy.net/view.php?id=26
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7941 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2cba860693
- fix for wrong entries in NOLOAD indexing queue (these caused urls to be indexed only based on their url, without being loaded)
- patch for better urls to solr admin interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7938 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
cec3836e73
added reference limitation to IndexControlRWIs_p.html servlet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7936 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
49e5ca579f
added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is now the default), all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored in a solr index, provided that one is enabled as well.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
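Reading the new flag with its stated default might look like this, assuming a `java.util.Properties`-style configuration store (YaCy has its own configuration classes). Only the property name comes from the commit; the class name is hypothetical:

```java
import java.util.Properties;

class CrawlerConfig {
    static final String EMBED_LINKS_AS_DOCUMENTS = "crawler.embedLinksAsDocuments";

    // A missing key falls back to "true", matching the commit's note that the flag is on by default.
    static boolean embedLinksAsDocuments(Properties config) {
        return Boolean.parseBoolean(config.getProperty(EMBED_LINKS_AS_DOCUMENTS, "true"));
    }
}
```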
f1ori
41e146116a
fixes size of document in case the server doesn't give the size in the header
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7929 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e1a3d609aa
moved merger object from Segment to IndexCell to enable a correct shutdown sequence. This solves a bug where yacy could not be shut down if an index merge was running during the shutdown phase.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7924 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
2cf61a40ce
fixed a bug from 7856, where Snippet returned an error by mistake when Metadata was found
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7921 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
610b01e1c3
- added an option 'add every media object linked in a html document as a new document' to the html parser. With it, every image, app, video or audio file that is linked in an html file is added as a document. In effect, parsing a single html document may insert a number of documents into the search index.
- some refactoring for mime type discovery
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7919 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
3da21c4266
protection against starting of a (second) yacy peer while another one is already running on the same port
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7917 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
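A common way to detect that another process already occupies the port is to try to bind it before starting the server. A minimal sketch with a hypothetical `PortGuard` class; the commit does not say how YaCy performs the check:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class PortGuard {
    // Returns true if something (e.g. an already running peer) listens on the port.
    static boolean isPortInUse(int port) {
        try (ServerSocket probe = new ServerSocket()) {
            probe.bind(new InetSocketAddress(port)); // wildcard address, as a server would bind
            return false;                            // bind succeeded: port is free (probe is closed)
        } catch (IOException e) {
            return true;                             // typically java.net.BindException
        }
    }
}
```

The check is only advisory: another process can grab the port between the probe and the real bind, so the server's own bind must still handle a `BindException`.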
orbiter
3e6767d66c
limitation of reference evaluation (protection against crawler pits)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7902 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
2c595a6a47
added new methods to count the number of objects in RWIs. lots of refactoring was necessary to introduce new Rating class and to unify naming of methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7896 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
9f9f634de2
fix in search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7894 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
5f8a5ca32d
- not doing merge-jobs while short on memory
- using configuration-values of crawling-max-filesize also for snippetfetching and loading files into Index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7893 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
22d69a6368
refactoring in cora: added sorting package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7890 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
51cf697acd
refactoring: moved all score-related classes to new ranking package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
169236c6d9
almost reverted the changes to this class from 7880 and 7882
since MemoryControl does handle negative value requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7887 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
4fec99115b
Implementation of strategies for controlling memory resources.
You can toggle between previous (standard) and new (generation) strategy at PerformanceMemory_p.html.
The generation memory strategy is implemented with the objective of running more robustly,
but at the cost of stopping some tasks (e.g. dht) early while running low on memory.
This new strategy respects the generational way the heap is organized on the most-used JVMs.
These changes have run fine on my 3 peers for weeks now, but as I'm human, I may have made mistakes.
Please be careful using the generation memory strategy and report errors, naming your
OS, jvm and java_args.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7886 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
c64faf41e2
addon to svn 7880
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7882 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
06408a9428
since many POST-requests come as gzip they report a contentlength of -1
requesting memory of -1 * 3 looks useless to me
so I added some megabytes to it; a correctly reported content length should not be harmed by this either
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7880 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
594d8f546a
#cccamp11 maintenance fix: anons may find up to 1000 items in interactive search (was: 100)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7866 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
eb14111200
encapsulate potentially expensive objects in TextSnippet so the GC can collect them asap
this reduces chance of OOMs at massive search & snippet-fetching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7865 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
orbiter
e3fc1efbef
performance hack and ensuring termination in serverAccessTracker. cause:
"Session_:53600#0_POST /yacy/hello.html HTTP/1.1" prio=10 tid=0x2322b000 nid=0x3ba7 runnable [0x03d3e000]
java.lang.Thread.State: RUNNABLE
at java.lang.Long.valueOf(Long.java:557)
at de.anomic.server.serverAccessTracker.clearTooOldAccess(serverAccessTracker.java:113)
at de.anomic.server.serverAccessTracker.cleanupAccessTracker(serverAccessTracker.java:75)
- locked <0x3bda2ae8> (a de.anomic.server.serverAccessTracker)
at de.anomic.server.serverAccessTracker.track(serverAccessTracker.java:125)
at de.anomic.server.serverSwitch.track(serverSwitch.java:542)
at de.anomic.http.server.HTTPDemon.parseRequestLine(HTTPDemon.java:641)
at de.anomic.http.server.HTTPDemon.POST(HTTPDemon.java:491)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at de.anomic.server.serverCore$Session.listen(serverCore.java:757)
at de.anomic.server.serverCore$Session.run(serverCore.java:651)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7862 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
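The thread dump shows a request thread inside `clearTooOldAccess` while holding the tracker's lock. One way to keep such a cleanup short and guaranteed to terminate, assuming accesses are keyed by timestamp, is to keep them in a sorted concurrent map and drop the expired prefix with a single range operation. This is a swapped-in technique, not necessarily the fix the commit applied, and the class name is hypothetical:

```java
import java.util.concurrent.ConcurrentSkipListMap;

class AccessTrackerSketch {
    // access time in ms -> number of accesses at that time, kept sorted by time
    private final ConcurrentSkipListMap<Long, Integer> accesses = new ConcurrentSkipListMap<>();

    void track(long timeMillis) {
        accesses.merge(timeMillis, 1, Integer::sum);
    }

    // Expired entries form a prefix of the sorted map, so one headMap(...).clear()
    // removes them without a lock and without scanning the remaining entries.
    void clearTooOldAccess(long nowMillis, long maxAgeMillis) {
        accesses.headMap(nowMillis - maxAgeMillis).clear();
    }

    int total() {
        int sum = 0;
        for (int n : accesses.values()) {
            sum += n;
        }
        return sum;
    }
}
```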
orbiter
44d74f8f89
performance hacks for seed generation (because thread dumps showed multiple occurrences at these code points)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7861 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
5cd07d7f84
free resources early when deleting an index reference if search verification fails (aka Switchboard.cleanupJob)
doing the same in other methods of the touched files as well
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7860 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
a311596881
finishing up my commits (7855-7858) which could be helpful for
not declaring inside loops (helps GC of some VMs)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7859 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
c0caca57e3
stopping threads that fetch search results when running short on memory
- in most cases at least one thread stays alive for getting the results
- fewer threads should do the work with fewer resources, but much more slowly then
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7857 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
ce248cc8dd
fewer byte-arrays of response content, fewer byte-array <-> stream conversions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7856 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
sixcooler
59b767eebd
stop loading via http at a defined maximum number of bytes, even if the size is unknown before loading
using max-file-size of type int for parsing documents
(since content is used as byte-arrays, 'integer' should be maximum)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542
13 years ago
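Enforcing a byte limit when the content length is unknown means counting bytes while reading and aborting as soon as the limit is passed. A minimal sketch with a hypothetical `LimitedReader`; it reads at most one byte past the limit to detect overflow and takes an `int` limit, in line with the commit's note that content is held in byte arrays:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class LimitedReader {
    // Reads the whole stream but aborts once more than maxBytes have arrived,
    // so a missing or wrong Content-Length cannot make the download unbounded.
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        if (maxBytes < 0) throw new IllegalArgumentException("maxBytes < 0");
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.min(maxBytes, 8192));
        byte[] buf = new byte[8192];
        long total = 0;
        while (true) {
            // request at most one byte beyond the limit; long math avoids int overflow
            int want = (int) Math.min(buf.length, (long) maxBytes + 1 - total);
            int n = in.read(buf, 0, want);
            if (n < 0) break; // end of stream reached within the limit
            total += n;
            if (total > maxBytes) {
                throw new IOException("content exceeds limit of " + maxBytes + " bytes");
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```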
f1ori
3a5fa73008
* revert parts of previous commit, because it breaks the trickle-feature
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7851 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
6e79675ff3
* use gzip-encoding in more cases
* send Expire-Header for static content
* should improve webserver-performance for slow connections
* fixes #37
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7850 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
aff875baef
smaller ping-entry @ ProfilingGraph
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7841 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1912d0cccc
changed handling of RowSet element retrieval: until now, all elements had been copied from the underlying byte[] arrays into a new Entry object that in turn held a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer, but it was almost never used. This commit changes an interface of the Row class so that callers must now state whether a copy is required. Fortunately the copy is only needed in very rare cases, so this change should cause much less memory allocation; the effect is expected to be strongest during search.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7840 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
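The copy-versus-reference choice described above can be sketched with a small entry class: with `forceCopy` the entry owns a private copy of its slice, otherwise it only stores a pointer (array, offset, length) into the shared byte[]. Names are hypothetical; the real Row/RowSet interfaces are not reproduced here:

```java
import java.util.Arrays;

class RowEntrySketch {
    private final byte[] data;   // either a private copy or the shared row-set array
    private final int offset;    // start of this entry inside data
    private final int length;    // entry width in bytes

    private RowEntrySketch(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    // forceCopy = true allocates and copies the slice; false just points into the shared array.
    static RowEntrySketch of(byte[] rows, int offset, int length, boolean forceCopy) {
        if (offset < 0 || length < 0 || offset + length > rows.length) {
            throw new IndexOutOfBoundsException("slice out of range");
        }
        if (forceCopy) {
            return new RowEntrySketch(Arrays.copyOfRange(rows, offset, offset + length), 0, length);
        }
        return new RowEntrySketch(rows, offset, length);
    }

    byte get(int i) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException("column " + i);
        return data[offset + i];
    }
}
```

The reference variant allocates nothing but sees later writes to the shared array, which is why a copy is still needed in the rare cases where the underlying array may change or be reused.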
orbiter
be15874be1
added request line in http which can support better debugging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7838 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
11dc653de3
added a visualization of peer pings to the performance graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7837 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3a191cdf14
because newbies are scared by the memory consumption shown in the performance graph, and because of arguments about high memory consumption that stem from poor knowledge of java garbage collection techniques, the memory display has been removed from the performance graph shown on the Status.html page. The memory graph can still be seen on the Performance page, where it is unchanged.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7836 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
cominch
09bb7a390c
do not replace malformed or invalid URLs in urlproxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7835 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
768c59740c
- replaced solrj 3.1 with solrj 3.3
- updated also slf4j
- added authentication for solrj
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7829 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
low012
c7b95e8c81
*) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly.
*) A corrupt crawlProfilesPassive.heap would cause crawlProfilesActive.heap to be deleted. Don't know if this ever happened, but it will not happen anymore.
*) Cleaned up a little bit.
*) Added some comments.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7827 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
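Detecting an invalid profile can be as simple as trying to compile its filters. A minimal sketch, assuming the must-match and must-not-match filters are Java regular expressions; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CrawlFilterValidator {
    // A filter is valid if java.util.regex can compile it.
    static boolean isValidFilter(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // A profile whose filters fail to compile would be moved to the invalid crawls.
    static boolean isValidProfile(String mustMatch, String mustNotMatch) {
        return isValidFilter(mustMatch) && isValidFilter(mustNotMatch);
    }
}
```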
orbiter
719777b2a7
replaced method to call getUsableSpace using reflection with direct call since we now use java 1.6
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7821 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2d4bb139d3
- added counting of links with noindex tag for solr index
- bugfixes for solr index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7820 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
892caccdca
added default configuration in ConfigurationSet in case of new values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7814 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bda3eec0ff
added parsing of canonical link element to html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7812 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b6f09a475d
- added an index profile editor in the /indexFederated_p.html servlet for solr indexes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7811 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
a17351dcfe
* navigation bar for filetype constraints
javascript interpreted backslashes from urlmask as escaping and didn't forward them to yacy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7806 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
96957375cc
* fix url proxy for relative links and chromium
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7805 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
9ebc75db4b
fix for channel authorization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7803 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
6d9e5865ee
faster appearance of search result page (but complete search time is the same)
this was inspired by http://bugs.yacy.net/view.php?id=37
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7801 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f7ca84cfc0
enhanced template engine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7800 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
84c9658644
added a file type navigator
added a protocol navigator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7795 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
31283ecd07
- added a search option to filter only specific network protocols, e.g. get only results from ftp servers. Just add '/ftp' to your search.
for example search for "passwd /ftp". This can also be done with /http /https and /smb
- fixed some search throttling processes that should protect your peer against search DoS or strong search load
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7794 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
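The `/ftp` modifier can be sketched as a tokenizer that pulls a known protocol token out of the query. This is an illustration with hypothetical names, covering only the four protocols the commit lists, not YaCy's query parser:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ProtocolModifier {
    static final Set<String> PROTOCOLS = Set.of("ftp", "http", "https", "smb");

    final String query;    // search words without the modifier
    final String protocol; // e.g. "ftp", or null if no protocol restriction was given

    private ProtocolModifier(String query, String protocol) {
        this.query = query;
        this.protocol = protocol;
    }

    static ProtocolModifier parse(String input) {
        List<String> words = new ArrayList<>();
        String protocol = null;
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            String lower = token.toLowerCase(Locale.ROOT);
            if (lower.length() > 1 && lower.charAt(0) == '/' && PROTOCOLS.contains(lower.substring(1))) {
                protocol = lower.substring(1); // in this sketch the last modifier wins
            } else {
                words.add(token);              // ordinary search word
            }
        }
        return new ProtocolModifier(String.join(" ", words), protocol);
    }
}
```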
orbiter
4b425ffdd2
fix for http://bugs.yacy.net/view.php?id=41
added another RSS channel "PROXY". The rss feed for peer news filters this channel unless access to that channel is authorized
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7792 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
7db208c992
performance hacks: more pre-allocated StringBuilder
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7790 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
87bd559c42
fixed warning
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7789 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
f30d36b101
enhanced template engine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7783 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
115abc8917
- more attributes for search progress bar
- moved cache strategy to cora package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
sixcooler
7bfa6bb4b6
prevent getting a yacySeed from zero-length-hash-string by chance
(e.g. proxy crawls were displayed as initiated by some other peer)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7776 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
bce280a308
update on options for interface graphics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7775 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
2683162ec5
- added more options to access grid picture, web structure picture and network graphics
- removed test class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7770 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0c1b29f3c9
- applied many small performance hacks
- added a memory limitation in the zip parser and the pdf parser
- added search throttling: if too many search queries are still to be computed, new requests are not accepted for some time. If after one second there is still no space to perform another search, the search terminates with no results. This case should only happen in DoS-like situations and under strong load on a peer, e.g. if it is integrated in metager.
- added a search cache deletion process that removes search requests when throttling happens
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7766 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
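The throttling rule above (wait up to one second for a free slot, otherwise return no results) maps naturally onto a counting semaphore. A minimal sketch with a hypothetical `SearchThrottle` and a configurable wait so it can be tested quickly; YaCy's actual mechanism may differ:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class SearchThrottle {
    private final Semaphore slots; // one permit per search that may run concurrently
    private final long waitMillis; // how long a new request may wait for a free slot

    SearchThrottle(int maxConcurrentSearches, long waitMillis) {
        this.slots = new Semaphore(maxConcurrentSearches);
        this.waitMillis = waitMillis;
    }

    // Runs the search if a slot frees up within waitMillis; otherwise returns no results.
    <T> List<T> search(Supplier<List<T>> query) {
        boolean acquired;
        try {
            acquired = slots.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Collections.emptyList();
        }
        if (!acquired) {
            return Collections.emptyList(); // overloaded: terminate with no results
        }
        try {
            return query.get();
        } finally {
            slots.release();
        }
    }
}
```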
f1ori
900dacbf97
* improve link rewriting in proxy-url
* only rewrites links that are in the current search domain
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7765 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
dc855d881b
* further improve proxyurl
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7762 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
a7a6b392f5
code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7760 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
fe0c08455b
more concurrency (enhancement) hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7759 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0e9a99cb05
another resource hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7758 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
535b6b953c
more hacks to omit superfluous string object allocation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7757 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
87082f407e
less String object creation during search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7756 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
ab5a16b957
less memory occupation during ranking and a faster host navigator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7755 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
1489ebeedf
one more hack to free ram for search events
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7753 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
f1ori
ddcc333acc
* fix negative result counts
results that were sorted out when being added to RankingProcess were
counted in the sortedout-counter, but were added to neither
remote_indexCount nor local_indexCount
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7749 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
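The accounting problem fixed in ddcc333acc can be illustrated with a small hypothetical counter: if a result is added to the sorted-out counter without first being added to one of the index counts, the difference can go negative. Counting every result first keeps it non-negative. Apart from the two index counts named in the commit message, all names here are illustrative.

```java
// Hypothetical sketch of the bookkeeping fixed in commit ddcc333acc; the field
// names mirror the commit message, but the class itself is illustrative.
final class ResultCounter {
    private int localIndexCount;
    private int remoteIndexCount;
    private int sortedOut;

    /** Every incoming result is counted first, then optionally marked as sorted out. */
    void add(boolean remote, boolean sortOut) {
        if (remote) {
            remoteIndexCount++;
        } else {
            localIndexCount++;
        }
        if (sortOut) {
            sortedOut++;
        }
    }

    /** Cannot become negative, because sortedOut never exceeds the sum of both index counts. */
    int available() {
        return localIndexCount + remoteIndexCount - sortedOut;
    }
}
```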
orbiter
fa734bdf9f
better memory protection in search logger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7748 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
dbea40d536
- changed snippet fetch strategy logic: do not check whether an entry is in the cache. This should reduce IO load on the HTCACHE, which is a showstopper when there is a large number of search requests
- when a search is started, a possible short-memory status is now forced to flush caches that may cause search heaps with resource contention effects
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7747 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
4bea3f9714
hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
used an ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes, which are pure ASCII (base64 hashes).
The new ASCII String <-> byte[] conversion methods have less computation overhead than the UTF8 conversion.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
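The idea in 4bea3f9714 is that pure-ASCII strings need no charset encoder at all: each char maps to exactly one byte. A minimal sketch, assuming the caller guarantees ASCII-only input; the class name is hypothetical.

```java
// Hypothetical sketch of an ASCII-only String <-> byte[] conversion in the spirit
// of commit 4bea3f9714. It is only correct for pure ASCII input such as base64
// hashes; non-ASCII characters would be silently corrupted.
final class AsciiCodec {
    private AsciiCodec() {}

    static byte[] getBytes(String s) {
        byte[] b = new byte[s.length()];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) s.charAt(i); // no charset lookup, no java.nio encoder
        }
        return b;
    }

    static String getString(byte[] b) {
        char[] c = new char[b.length];
        for (int i = 0; i < b.length; i++) {
            c[i] = (char) (b[i] & 0xFF); // mask to avoid sign extension
        }
        return new String(c);
    }
}
```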
orbiter
746e3c3b06
Replaced a widely used Properties object in the httpd with a HashMap<String, Object>, which, unlike Properties, is not synchronized.
Synchronization is not needed here and only added overhead to the httpd process; this overhead is now removed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7745 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
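Background for 746e3c3b06: java.util.Properties extends Hashtable, whose methods are synchronized, so every access pays for a lock even when only one thread uses the object. The sketch below assumes, for illustration only, that the object held per-request header fields owned by a single handler thread; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a header map owned by one request thread does not need
// the locking that Properties (a Hashtable) performs on every call.
final class HeaderParser {
    private HeaderParser() {}

    /** Parses "Name: value" lines into an unsynchronized map; lines without a colon are skipped. */
    static Map<String, Object> parseHeaders(String[] lines) {
        Map<String, Object> headers = new HashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return headers;
    }
}
```

If such a map ever had to be shared between threads, ConcurrentHashMap would be the usual replacement rather than going back to Properties.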
f1ori
14e1666b21
* fix replacing regexes in url proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7742 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e28bd0d038
fix for some possible causes of memory leaks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7741 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
09ba6814c0
- non-blocking word hash computation with dynamic digest object generation (this was important!)
- (very) small performance enhancement in did-you-mean
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7740 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
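One way to make hashing non-blocking, in the spirit of 09ba6814c0, is to avoid sharing a single synchronized MessageDigest and instead create digest objects on demand, reusing them through a lock-free pool. This sketch assumes an MD5 digest, lowercased words and a 12-character base64 prefix; YaCy uses its own base64 alphabet, so these hashes will not match real YaCy word hashes, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Locale;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: digest objects are generated dynamically when the pool
// is empty, so no thread ever waits for a shared, locked digest.
final class WordHasher {
    private static final ConcurrentLinkedQueue<MessageDigest> POOL = new ConcurrentLinkedQueue<>();

    private WordHasher() {}

    private static MessageDigest borrow() {
        MessageDigest md = POOL.poll(); // lock-free; null if the pool is empty
        if (md != null) {
            return md;
        }
        try {
            return MessageDigest.getInstance("MD5"); // create a new digest on demand
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Assumed scheme: MD5 of the lowercased word, URL-safe base64, first 12 characters. */
    static String wordHash(String word) {
        MessageDigest md = borrow();
        try {
            byte[] d = md.digest(word.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(d).substring(0, 12);
        } finally {
            POOL.offer(md); // digest() has already reset the object for reuse
        }
    }
}
```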
orbiter
10e2f588f8
- enhanced ybr ranking computation
- many speed/performance hacks
- added solr sharding and a new sharding web interface
- added option to switch off the yacy index when using solr
- added new fail-url categories, which are used to decide which fail-urls are sent to solr
- refactoring/renaming of some method names to distinguish host/url hashes better
- a large number of bug/npe fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
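For the solr sharding added in 10e2f588f8, a common approach is to pick a shard from a hash of the document's host, so that all documents of one host land on the same shard. This is a hypothetical sketch; YaCy's actual sharding methods and their configuration names are not shown in this log.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: deterministic host-based shard selection.
final class ShardSelector {
    private final int shards;

    ShardSelector(int shards) {
        if (shards < 1) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = shards;
    }

    /** Maps a host name to a shard index in [0, shards) using the first four MD5 bytes. */
    int select(String host) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(host.getBytes(StandardCharsets.UTF_8));
            int h = ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16) | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
            return Math.floorMod(h, shards); // floorMod keeps the index non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```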
orbiter
bd55dcee50
- commented out experimental distributed ranking loading
- fewer threads for blocking threads
- disabled all DHT transmission threads in networks with zero peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7737 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
d1dbbd956a
always use a template method cache, even if the template cache flag is set to false. This flag only enables dynamic updates to the template files, not dynamic updates to the rewrite methods (which are not possible without recompiling). Low memory usage is guaranteed by the use of soft references, which are cleared before an OOM is thrown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7735 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
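The soft-reference cache mentioned in d1dbbd956a relies on a JVM guarantee: all soft references to softly reachable objects are cleared before an OutOfMemoryError is thrown. A minimal generic sketch (the class name is hypothetical) that reloads an entry whenever its reference has been cleared:

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a memory-sensitive cache: values are held softly, so
// the garbage collector may drop them under memory pressure.
final class SoftCache<K, V> {
    private final ConcurrentHashMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if it was never loaded or has been cleared. */
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Two threads may occasionally load the same key; that only costs time.
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```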
orbiter
0d040ff6bb
fix for bug 0000036: no crawling of https pages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7734 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
3ed4a09368
small features, some bug fixes and performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7733 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
e55c254f7b
enhanced logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7732 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
b45701d20f
this is a re-implementation of the YaCy Block Rank feature
This time it works like this:
- each peer provides its ranking information using the yacy/idx.json servlet
- peers with more than 1 GB RAM load this information from all other peers, combine it into one ranking table and store it locally. This happens concurrently during the start-up of the peer. The newly generated file with the ranking information is at DATA/INDEX/<network>/QUEUES/hostIndex.blob
- this index is then used to compute a fresh ranking table. Peers that can calculate their own ranking table do so at every start-up to get the latest feature updates until the feature is stable
- I computed new ranking tables as part of the distribution and committed them here as well
- the YBR feature must be enabled manually by setting the YBR value in the ranking servlet to level 15. A default configuration for that is also in the commit, but it affects only fresh peers, not your current installation
- a recursive block rank refinement is implemented but disabled at this point. It needs more testing
Please play around with the ranking settings and see if this helped to make search results better.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7729 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
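The combination step of the block rank procedure in b45701d20f, merging the ranking information from all peers into one table, amounts to a union of partial backward-link indexes. A hypothetical sketch, assuming each peer reports a map from a target host to the hosts linking to it, and using the number of distinct referrers as a stand-in score (YaCy's actual YBR level computation is not described here):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: join partial backward-link indexes collected from peers.
final class BackwardIndexJoin {
    private BackwardIndexJoin() {}

    /** Unions the referrer sets per target host across all partial indexes. */
    static Map<String, Set<String>> join(List<Map<String, Set<String>>> partials) {
        Map<String, Set<String>> joined = new HashMap<>();
        for (Map<String, Set<String>> partial : partials) {
            partial.forEach((target, referrers) ->
                    joined.computeIfAbsent(target, k -> new HashSet<>()).addAll(referrers));
        }
        return joined;
    }

    /** Naive score: number of distinct referring hosts (an assumption, not YaCy's formula). */
    static Map<String, Integer> inlinkCounts(Map<String, Set<String>> joined) {
        Map<String, Integer> counts = new HashMap<>();
        joined.forEach((host, referrers) -> counts.put(host, referrers.size()));
        return counts;
    }
}
```

Because each partial index is already keyed by link target, joining is a cheap set union, which matches the argument in the commit message that no single peer has to invert the forward-link data.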
orbiter
021840e5ba
removed (almost) deadlocks and unnecessary CPU load
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7726 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
123375bfba
added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy.
This servlet currently serves only indexes of the web structure hosts. It can be tested by calling
http://localhost:8090/yacy/idx.json?object=host
This yacy protocol servlet is the first one that returns JSON and also shows index entries in a readable format. This will make the development of API applications much easier. It is also an example implementation for possible JSON versions of the other existing YaCy protocol interfaces.
The main purpose of this new feature is to provide a distributed block rank collection feature. Creating a block rank is very difficult if the forward-link data is collected first and then one peer must create a backward-link index. This interface already provides a partial backward index, so the collected indexes only need to be joined, which is very easy. The result should be that all peers can compute new block rank tables.
To reduce load on peers, this servlet buffers all data and refreshes it only once every 12 hours. This very slow update cycle is needed because the interface is called round-robin by all peers once after their start-up.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7724 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
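The 12-hour buffering described in 123375bfba can be sketched as a value that is rebuilt only after its time-to-live has expired. The class name is hypothetical, and the clock is injected so that the refresh logic can be tested without waiting.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: buffer an expensive result and rebuild it only after the
// time-to-live has passed. In production the clock would be System::currentTimeMillis.
final class TimedBuffer<T> {
    private final Supplier<T> producer;
    private final long ttlMillis;
    private final LongSupplier clock;
    private T value;
    private long producedAt;

    TimedBuffer(Supplier<T> producer, Duration ttl, LongSupplier clock) {
        this.producer = producer;
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /** Returns the buffered value, recomputing it only when it is missing or expired. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (value == null || now - producedAt >= ttlMillis) {
            value = producer.get();
            producedAt = now;
        }
        return value;
    }
}
```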
orbiter
1d8b0f74f4
one more fix for SVN 7713
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7716 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
0960261769
fix for svn 7713
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7715 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
5b579e21a3
code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7713 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago
orbiter
039126cfaf
better handling of on/off switched solr indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7709 6c8d7289-2bf4-0310-a012-ef5d649a1542
14 years ago