orbiter
154bbc3364
code cleanup: call static methods directly on the class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6155 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
222850414e
simplification of the code: removed unused classes, methods and variables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6154 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
93dfb51fd4
problems with code style
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6153 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
adf01c676e
reduce lookup time when merging a large number of BLOBs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6152 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9a674d8047
- After the removal of the Tree class, some code simplifications are possible. This mostly affects the Records class, which can be refactored; the refactoring results in a reduced number of classes.
- The EcoTable was renamed to Table.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6151 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c5122d6836
completed migration of BLOBTree to BLOBHeaps:
- removed migration code
- removed BLOBTree
after the removal of the BLOBTree, a lot of dead code appeared:
- removed dead code that was needed for BLOBTree
Some more classes may not have much use any more after the removal of BLOBTree, but they still have some components that are needed elsewhere. Additional refactoring steps are needed to clean up dependencies; after that, more unused code may appear that can be removed as well.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6150 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d1083a6913
maybe we have fewer problems with open connections to the server if we don't do BF forced sleeps (just a test)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6149 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
ebe6c823ac
*) changed svn properties again (hopefully doing it right this time)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6147 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
a80ac3a415
*) fixed wrong parser descriptions
*) changed svn properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6146 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
457b6c0d6d
*) updated Apache POI library to be able to parse Visio files
*) updated PPT and XLS parsers to use new Apache POI library
*) added new Visio (VSD) parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6145 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
a10c8022d1
DidYouMean:
- limit the number of consumer threads to available CPUs
- added some javadoc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6144 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
7eb3bff5b3
* workaround for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2220&hilit=#p16128
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6143 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
99fa265e1d
fix for search bug caused by tenant patch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6125 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
79875782af
be a bit more lazy when removing domain navigation entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6120 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
57af311627
fix for wrong urls in navigator when a tenant is used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6119 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
76b96337e2
just some chatty code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6118 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
91785d895c
*) minor changes in comments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6109 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bdda140c02
fix for json output (no double quotes any more, quoting of double quotes did not work)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6105 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
2f84736120
ignore signature files that cannot be downloaded because of failed encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6103 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
041d9c253e
some refactoring and more error-awareness in LogalizeHandler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6102 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6b307d6d59
more tolerance for corrupted index entries in exported row sets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6099 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
33aafa9b4b
better logging when writing merged dumps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6098 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
db70badcf0
possibility to set remote host on upnp device
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6097 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4d29e90708
uaeh
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6096 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3c3e6499ae
added more logging for merge operation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6095 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
15180fc95e
- patch for future computation in SplitTable
- added the same concurrent process for has() from SplitTable to ArrayStack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6093 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
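The concurrent has() mentioned in this commit can be pictured as asking every table at the same time and answering as soon as one of them reports a hit. A minimal sketch, assuming in-memory sets stand in for the on-disk tables; the class ConcurrentHas is hypothetical and is not the SplitTable or ArrayStack code.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;

/** Hypothetical sketch: ask several tables concurrently whether they contain a key. */
final class ConcurrentHas {
    static boolean has(List<Set<String>> tables, String key, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        CompletionService<Boolean> answers = new ExecutorCompletionService<>(pool);
        // one future per table; each lookup runs on the pool
        for (Set<String> table : tables) {
            answers.submit(() -> table.contains(key));
        }
        // consume answers in completion order; the first "true" decides
        for (int i = 0; i < tables.size(); i++) {
            if (answers.take().get()) {
                return true;
            }
        }
        return false;
    }
}
```

After a hit, the remaining lookups keep running in the pool; a real implementation might cancel them.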
orbiter
9a5ec20b3c
avoid merge during startup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6092 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
bf6b92343c
try to avoid stuck pdf parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6091 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
c695c7f512
try to remove hung swf parser from queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6090 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fc69a76197
update to web structure picture:
- allow bigger size
- better instructions for api usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6089 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ae015e8e98
refactoring of blob package classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6088 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8b8877c233
moved image collector
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6087 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
be1c7ddc64
refactoring of search classes -- moved Ranking Profile to search package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6086 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fd31a3616a
- more logging in server process
- fix for bad ascii in comment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6084 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5a7fd6b4c8
just some comment lines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6081 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
31f60a3b3e
when doing searches, also apply an online caution to DHT transmission and stop transmissions during heavy load caused by searching. This avoids the many requests to the URL database that are needed for DHT transfer, and it avoids collisions with the URL retrieval needed for search results.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6080 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
17dc6d4be5
small fix for new Logger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6079 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ce1adf9955
serialized all logging using concurrency:
High-performance search query situations, as seen in the yacy-metager integration, showed deadlocks caused by synchronization effects inside the sun.java code. It appears that the logger is not completely safe against deadlocks when it is called concurrently. One possible solution would be an outside synchronization with 'synchronized' statements, but that would add further blocking to all high-efficiency methods that call the logger. It is much better to do a non-blocking hand-over of log lines and to work off the log entries with a concurrent log writer. This also decouples IO operations from logging, since logging causes IO whenever a log is written to a file. This commit not only moves the logger from kelondro to yacy.logging, it also inserts the concurrency methods needed to realize non-blocking logging.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6078 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
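The non-blocking hand-over described above can be sketched as a queue between the calling threads and one writer thread: callers only enqueue a line and return, and the writer thread does all the IO. This is a minimal sketch, not the yacy.logging code; AsyncLogWriter, its methods and the 100 ms poll interval are hypothetical.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical non-blocking log hand-over: callers only enqueue, one thread does the IO. */
final class AsyncLogWriter {
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private volatile boolean running = true;
    private final Thread worker;

    AsyncLogWriter(Consumer<String> sink) {
        worker = new Thread(() -> {
            try {
                // drain until close() was called and no lines are left
                while (running || !queue.isEmpty()) {
                    String line = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (line != null) sink.accept(line); // the only place where IO happens
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "log-writer");
        worker.setDaemon(true);
        worker.start();
    }

    /** The caller only enqueues; it never waits for IO. */
    void log(String line) {
        queue.offer(line); // unbounded queue: offer() always succeeds immediately
    }

    /** Stops the writer after all queued lines have been written. */
    void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

For example, new AsyncLogWriter(System.err::println) moves console output off the calling threads; close() waits until every queued line is written.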
lotus
aec3e7995a
autoconfig.pac can be used to browse .yacy-domains only
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6077 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bc6dd8194b
refactoring: moved search query class to new search package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6075 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a4805defdd
added stub for new search process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6074 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b8e738a7be
a collection of
- small bug fixes
- better/more comments
- more asserts
- fixed synchronization
- test case enhancements
- code cleanup
- performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6073 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
39779e4796
DidYouMean: as I moved to only 8 consumer and 4 producer threads, I removed poison pills as it does not make sense anymore - threads are interrupted directly. Having a consumer thread per test case just didn't make sense either (see svn 6070) due to the massive overhead.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6072 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
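A rough sketch of the pipeline these DidYouMean commits describe: producers put spelling variants into a LinkedBlockingQueue, consumers check them against a dictionary, and once the producers are done and the queue is drained, the consumers are interrupted directly instead of being sent poison pills. The variant generators, the dictionary set and the class SuggestionPipeline are illustrative assumptions; only the queue type, the interrupt-based shutdown and the CPU-bounded consumer count come from the commit messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical DidYouMean-style pipeline: producers make variants, consumers test them. */
final class SuggestionPipeline {
    /** Producer: every variant with two consecutive letters reversed. */
    static void swapVariants(String word, LinkedBlockingQueue<String> queue) throws InterruptedException {
        for (int i = 0; i + 1 < word.length(); i++) {
            char[] c = word.toCharArray();
            char tmp = c[i];
            c[i] = c[i + 1];
            c[i + 1] = tmp;
            queue.put(new String(c));
        }
    }

    /** Producer: every variant with one letter deleted. */
    static void deletionVariants(String word, LinkedBlockingQueue<String> queue) throws InterruptedException {
        for (int i = 0; i < word.length(); i++) {
            queue.put(word.substring(0, i) + word.substring(i + 1));
        }
    }

    static Set<String> suggest(String word, Set<String> dictionary) throws InterruptedException {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Set<String> hits = ConcurrentHashMap.newKeySet();

        List<Thread> producers = List.of(
            new Thread(() -> {
                try { swapVariants(word, queue); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }),
            new Thread(() -> {
                try { deletionVariants(word, queue); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }));

        // consumer count bounded by the available CPUs
        List<Thread> consumers = new ArrayList<>();
        for (int i = 0; i < Runtime.getRuntime().availableProcessors(); i++) {
            consumers.add(new Thread(() -> {
                try {
                    while (true) {
                        String candidate = queue.take(); // waits for work
                        if (dictionary.contains(candidate)) hits.add(candidate);
                    }
                } catch (InterruptedException e) {
                    // interrupted by suggest(): no more work will come
                }
            }));
        }

        producers.forEach(Thread::start);
        consumers.forEach(Thread::start);
        for (Thread p : producers) p.join();
        while (!queue.isEmpty()) Thread.sleep(1); // let the consumers drain the queue
        consumers.forEach(Thread::interrupt);      // direct interrupt instead of poison pills
        for (Thread c : consumers) c.join();
        return hits;
    }
}
```

Interrupting only after the queue is empty matters: a consumer that has already taken a candidate still finishes checking it, because contains() and add() ignore interrupts, and it only stops at its next take().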
apfelmaennchen
c3c4dd0933
DidYouMean - changed to much simpler LinkedBlockingQueue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6071 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
01ac1b5d7e
- blocking queue implementation of DidYouMean
- timeout is set to 500ms
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6070 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b8bb1bb364
A join with a time-out does not stop the corresponding thread after the time-out; it only stops the waiting. Here we additionally need to signal the thread to stop after we have finished waiting.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6069 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
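In Java, Thread.join(millis) only limits how long the caller waits; the joined thread keeps running. The sketch below shows the join-then-signal pattern the commit describes, using Thread.interrupt() as the stop signal. That choice of signal is an assumption (the commit does not name one), it only works if the worker checks its interrupt status, and BoundedJoin is a hypothetical class.

```java
/** Hypothetical sketch: bounded wait for a worker, followed by an explicit stop signal. */
final class BoundedJoin {
    /** Waits up to timeoutMillis for t; if t is still alive, asks it to stop. Returns true if t finished in time. */
    static boolean joinThenStop(Thread t, long timeoutMillis) throws InterruptedException {
        t.join(timeoutMillis);   // ends only *our* waiting, not the thread itself
        if (t.isAlive()) {
            t.interrupt();       // the additional signal: tell the thread to stop
            return false;
        }
        return true;
    }

    /** A cooperative worker that loops until its interrupt flag is set. */
    static Thread cooperativeWorker() {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                Thread.onSpinWait(); // stand-in for one unit of work
            }
        });
        t.start();
        return t;
    }
}
```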
orbiter
b69f22e9ca
mistake in last commit: computation of loops in ReversingTwoConsecutiveLetters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6068 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3130334932
- start first with threads that run more loops
- join first with threads that run fewer loops
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6067 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
6cde7ebf16
DidYouMean
- without I/O intensive sorting by count
- but with multiple threads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6066 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f348190566
tried to add a database dump import method to the phpBB3 import function. Reason: imports of large database dumps cannot be handled with phpMyAdmin, and this should be an easy way to import the database dumps into a MySQL database, from which they can be exported again with the phpBB3 content integration adapter. Completion or removal of this function stub will follow before the next main release.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6065 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
945777aa80
replaced the rwi term counting method by one that computes the maximum over the blobs that contribute to the RWI. Adding up the blob sizes is incorrect and does not reflect the real size. Truncating the size to the maximum over all blobs is also incorrect, but not as wrong as the sum of all blob sizes, which double-counts many rwi entries.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6064 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
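A small worked example of why the maximum is the safer estimate: if one BLOB holds the references {u1, u2, u3} and another holds {u2, u3, u4}, the true number of distinct references is 4, the sum reports 6 and the maximum reports 3. The true count always lies between the maximum and the sum. The sketch below computes all three; the class RwiCount and the use of string sets as stand-ins for RWI entries are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: three ways to count a term's references spread over several BLOBs. */
final class RwiCount {
    /** Old approach: adds per-BLOB counts, double-counting entries stored in more than one BLOB. */
    static int sum(List<Set<String>> blobs) {
        int s = 0;
        for (Set<String> b : blobs) s += b.size();
        return s;
    }

    /** New approach: the largest per-BLOB count, a lower bound on the true count. */
    static int max(List<Set<String>> blobs) {
        int m = 0;
        for (Set<String> b : blobs) m = Math.max(m, b.size());
        return m;
    }

    /** Exact count for comparison: size of the union (requires reading all entries). */
    static int exact(List<Set<String>> blobs) {
        Set<String> union = new HashSet<>();
        for (Set<String> b : blobs) union.addAll(b);
        return union.size();
    }
}
```

The exact count is the expensive one, which is why a cheap bound is used; of the two bounds, the maximum errs by less whenever BLOBs overlap heavily.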
orbiter
7c4d1d471c
hand-over of more specific object
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6062 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
09acfa66d1
- improved "did you mean"
- added &meanCount= to query string
- &meanCount=0 ==> no suggestion, no performance loss
- sorting suggestions by sb.indexSegment.termIndex().count()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6059 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
da6ce37f7b
- fixed encoding problem
- added limit to 10 suggestions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6058 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
54a48b4184
- added "did you mean" to search page
- currently works for single word queries only!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6057 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
550312ac85
added a new command script to do an auto-update from the command line. This will make it easy to do mass auto-updates in private yacy clusters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6052 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0fc1168554
- reduced time-out for socket-connection communication from 20 seconds to 5 seconds. This is a test to find out if the time-out was a cause for problems in metager environments
- turned a fine log entry in case of rejected connections on the server socket into a warning. (look for 'exceeding limit')
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6051 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
28b86385cd
patch for badly behaving swf parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6050 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d58b395993
fix for http://forum.yacy-websuche.de/viewtopic.php?p=15693#p15693
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6049 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
733385cdd7
enhanced database access times by removing unnecessary synchronization.
also added more hacks that resulted from high-volume query testing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6047 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
398e210fef
removed synchronization in logging that causes deadlocks in high-performance environments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6044 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
db3a06dd81
removed cookie handling in httpc:
- no need to do cookie handling in proxy, this was switched off so far
- no need for cookies in crawler, this was switched on (by mistake)
This fix was needed for a case where a web server flooded the crawler with cookies and caused a complete blocking of the httpc.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6043 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1c54ae4a63
some small changes in HandleMap Testing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6042 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
2c5554c912
small enhancements in search result computation speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6039 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e0b3984805
added navigation keys for site and author facets to remote search interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6038 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
27fa6a66ad
- completed the author navigation
- removed some unused variables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6037 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a9a8b8d161
- added display of author navigation (usage of that navigator not yet implemented)
- added a synchronization in pdf parser which should help to avoid deadlocks that occur when displaying several search results pointing to pdf sources
- fixed smaller bugs in navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6036 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c879783008
added steering of navigator computation:
- by default the navigator computation is off for servlet yacysearch.html, but:
- the servlet is called by default with an option to switch navigator results on
this prevents metasearch users from getting slow results caused by unnecessary computations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6035 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c079b18ee7
- refactoring of IntegerHandleIndex and LongHandleIndex: both classes have been merged into the new HandleMap class, which handles (key<byte[]>,n-byte-long) pairs with arbitrary key and value length. This will be useful to get memory-enhanced/minimized database table indexing.
- added an analysis method that counts bytes that could be saved in case the new HandleMap can be applied in the most efficient way. Look for the log messages beginning with "HeapReader saturation": in most cases we could save about 30% RAM!
- removed the old FlexTable database structure. It was not used any more.
- removed memory statistics in PerformanceMemory about flex tables and node caches (node caches were used by Tree Tables, which are also not used any more)
- added a stub for steering of navigation functions. That should help to switch off navigation computation in cases where it is not requested by a client
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6034 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
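The RAM savings come from storing each value in only as many bytes as it needs instead of a fixed 8-byte long. A minimal sketch of n-byte big-endian packing; NByteLong and its methods are hypothetical and not the HandleMap code, and the byte order is an assumption.

```java
/** Hypothetical helpers: pack a long that fits into n bytes (1..8) in big-endian order. */
final class NByteLong {
    static void encode(long value, byte[] dst, int offset, int n) {
        if (n < 1 || n > 8) throw new IllegalArgumentException("n must be 1..8");
        // for n < 8 the value must be non-negative and have no bits above the n-th byte
        if (n < 8 && (value >>> (8 * n)) != 0) {
            throw new IllegalArgumentException("value does not fit in " + n + " bytes");
        }
        for (int i = n - 1; i >= 0; i--) {
            dst[offset + i] = (byte) value; // lowest byte goes last
            value >>>= 8;
        }
    }

    static long decode(byte[] src, int offset, int n) {
        long v = 0;
        for (int i = 0; i < n; i++) {
            v = (v << 8) | (src[offset + i] & 0xFF); // mask to undo sign extension
        }
        return v;
    }
}
```

For illustration, a row with a 12-byte key and a value that fits into 4 bytes takes 16 bytes instead of the 20 it would take with a full long, a saving of 20% for that row.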
orbiter
bead0006da
replaced tmp file extensions by prt
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6033 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3189f9cd39
fixed problem with DCEntry initialization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6032 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a704d82280
patch for problem with digest
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6031 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3029ef6eb3
fixed a recently introduced bug which prevented idx and gap files from being written.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6030 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b6e274f211
omit most of the forced crawl delays by using a separate delay table which flushes delayed URLs at the correct time
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6029 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
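A delay table that hands URLs back only once their crawl delay has passed maps naturally onto java.util.concurrent.DelayQueue: the crawler polls for ready URLs instead of sleeping. This is a sketch under that assumption; the commit does not say how YaCy's delay table is implemented, and DelayedUrl and DelayTable are hypothetical names.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Hypothetical entry: a URL that may be fetched only after its crawl delay. */
final class DelayedUrl implements Delayed {
    final String url;
    private final long readyAtNanos;

    DelayedUrl(String url, long delayMillis) {
        this.url = url;
        this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

/** Hypothetical delay table: parks URLs and releases them when their delay has expired. */
final class DelayTable {
    private final DelayQueue<DelayedUrl> delayed = new DelayQueue<>();

    /** Park a URL instead of sleeping the crawler thread. */
    void defer(String url, long delayMillis) {
        delayed.put(new DelayedUrl(url, delayMillis));
    }

    /** Returns a URL whose delay has expired, or null if none is ready; never blocks. */
    String pollReady() {
        DelayedUrl d = delayed.poll();
        return d == null ? null : d.url;
    }
}
```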
orbiter
d50be59088
- added an automatic reconstruction of the domain stack after 10 minutes. This then adds URLs to the domain stack that were left over because of stack size limitations when the domain stack was last created
- changed the busy sleep time for the crawl thread to 30 milliseconds. This is sufficient to crawl with 2000 PPM.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6028 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
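The 2000 PPM figure follows from the sleep time alone if, as assumed here, each busy cycle dispatches one URL and the fetching itself does not add to the cycle time:

```latex
\text{rate} = \frac{60\,000\ \text{ms/min}}{30\ \text{ms per cycle}} = 2000\ \text{pages/min (PPM)}
```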
orbiter
5fdba0fa51
- fixed a non-working selection rule in balancer
- more safety regarding crawl-delay, be more fail-safe
- better logging in case of long forced crawl-delays
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6027 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f5602404d5
another speed boost for the balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6026 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
95e8cbd1c3
new fully redesigned balancer and bugfixes regarding lost profile handles and killed crawls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6025 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c062385552
fix for http://forum.yacy-websuche.de/viewtopic.php?p=15555#p15555
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6024 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
42ae40b9f6
some bugfixes to database close() methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6023 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a0c53abbe1
- wait until local results are computed during search, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2167&hilit=&p=15521#p15521
- show only x+1 pages in page navigator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6022 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9bfd22f65d
fix for http://forum.yacy-websuche.de/viewtopic.php?p=15523#p15523
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6020 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1c77db670f
re-designed response format for navigation:
- changed json and rss response templates
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6019 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
15fad767c0
some refactoring of topic generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6018 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
cc49aedf12
- fixed problem with remote search NPE
- more abstraction for search requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6015 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
9e18abc2ac
* fix charset detection, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2137
* why has this been uncommented???
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6014 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c38c852090
modified access method to get index entries out of an array of BLOBs:
iterate over them and merge along the way, instead of collecting them all first and merging afterwards.
This should use less memory and may behave better in an environment with many queries.
To ensure that too many queries will not cause total blocking,
a time-out of one second was also added. After the time-out
the index data that was collected so far is returned.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6013 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
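The iterate-and-merge access with a time-out could look like the following sketch: entries are merged while each BLOB's iterator is walked, and when the time budget runs out, the partial result is returned. Entries are modeled as (key, count) pairs combined by addition; that representation, the merge rule and the class DeadlineMerge are assumptions for illustration.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: merge entries from several BLOB iterators, stopping at a deadline. */
final class DeadlineMerge {
    static Map<String, Integer> merge(List<Iterator<Map.Entry<String, Integer>>> sources, long budgetMillis) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis);
        Map<String, Integer> merged = new TreeMap<>();
        for (Iterator<Map.Entry<String, Integer>> it : sources) {
            while (it.hasNext()) {
                // time-out: return what has been merged so far
                if (System.nanoTime() - deadline >= 0) return merged;
                Map.Entry<String, Integer> e = it.next();
                // merge while iterating; no per-BLOB list is collected first
                merged.merge(e.getKey(), e.getValue(), Integer::sum); // placeholder combine rule
            }
        }
        return merged;
    }
}
```

Calling DeadlineMerge.merge(iterators, 1000) corresponds to the one-second time-out in the commit.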
orbiter
ab06a6edd2
renamed topwords to topics and enhanced computation methods of topics
topics will now only be computed using the document title, not the document url,
because the host navigator is now responsible for statistical effects of urls.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6011 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a5d481eab1
enhanced navigation
- fixed too early computation of navigation
- moved navigation rendering to yacysearchtrailer
- added more asserts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6006 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7639ec2f38
- fixed letter case bug for dc record creation
- dc parser is now tolerant of letter case differences
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5998 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4522c13ee7
added option for a table prefix when importing phpbb3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5996 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1c69d9b8b6
more refactoring of the index classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5995 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3d5f2ff544
- added new servlets to support search portal administrators for the integration of yacy search fields in their web pages
- moved some servlets from here to there..
- changed menu structure
- removed yacyui-portaltest.html which contained an example for the live search which is now integrated on all pages in yacy. The code snippet example from that page is integrated into the ConfigLiveSearch.html servlet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5994 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4d4315687f
fix for problem with concurrency in host navigator, bug reported by wsb
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5993 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
88426912ad
more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5992 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
d813fd26ed
reset sent/received counters on index delete
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5991 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
99bf0b8e41
refactoring of plasmaWordIndex:
divided that class into three parts:
- the peers object is now hosted by the plasmaSwitchboard
- the crawler elements are now in a new class, crawler.CrawlerSwitchboard
- the index elements are core of the new segment data structure, which is a bundle of different indexes for the full text and (in the future) navigation indexes and the metadata store. The new class is now in kelondro.text.Segment
The refactoring is inspired by the roadmap to create index segments, the option to host different indexes on one peer.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5990 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
876746602d
catch problems of file hash computation, see also:
http://forum.yacy-websuche.de/viewtopic.php?p=15245#p15245
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5989 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fec6f9054f
some refactoring of search methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5988 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3d4b826ca5
migration of all databases that use the deprecated BLOBTree format into the BLOBHeap format. Old databases are migrated automatically.
This removes the last very IO-intensive data structures which were still used for Wiki, Blog and Bookmarks. Old database files will still remain in the DATA subdirectory but can be deleted manually if no major bugs appear during migration. There is no need for any user action, all migration is done automatically.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5986 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4b4bddca00
added new submenu to crawler menu: import of phpbb3 forum postings from mysql
- yacy can import phpbb3 posts without crawling
- all data is written as surrogate
- indexed surrogate files can be re-used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5985 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d8284046b0
enhanced speed of site navigation computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5980 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c72a5cf326
added stub for PHPBB3 extraction code using direct access to mySQL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5979 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e735d3a69f
fix for http://forum.yacy-websuche.de/viewtopic.php?p=15175#p15175
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5978 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
63a0255166
- refactoring: added new content package, which will contain connector classes for different types of data sources to import texts into the YaCy index
- refactoring: migrated data objects for the new connector classes
- added a DAO interface class to specify an abstract interface for database retrieval connector methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5977 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f246928c20
first attempt to add 'real' Navigation to yacy search results: host navigation
- after a search is started, it is analysed how many hits are in each site
- this can be done really efficiently, because the navigation information is hidden in the url hash and can be computed very fast
- the search result shows a column on the right with the hosts and the hits per host
- after a click on a host, the search is modified using the efficient site: operator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5976 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
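The per-host counts need no metadata lookup when the host can be read from the URL hash itself. A minimal sketch, assuming, as an illustration, 12-character URL hashes whose last six characters form a host hash; the commit only says the host information is contained in the URL hash, not where. HostNavigator and HOST_PART_START are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: hits per host computed from URL hashes alone. */
final class HostNavigator {
    // assumption: 12-character URL hash, characters 6..11 identify the host
    static final int HOST_PART_START = 6;

    static Map<String, Integer> hitsPerHost(List<String> urlHashes) {
        Map<String, Integer> counts = new HashMap<>();
        for (String hash : urlHashes) {
            counts.merge(hash.substring(HOST_PART_START), 1, Integer::sum);
        }
        return counts;
    }
}
```

Each count is then a candidate entry for the navigation column, and a click narrows the search with the site: operator.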
orbiter
54b9e99c01
- more information about peer tags
- peer tag is by default '*'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5975 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
26a46b5521
increased default maximum file size for database files to 2GB
Other file sizes can now be configured with the attributes
filesize.max.win and filesize.max.other
the default maximum file size for non-windows OS is now 32GB
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5974 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
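Read together, these lines suggest a per-OS default (2GB on Windows, 32GB elsewhere) that the two attributes can override. A minimal sketch of that selection, assuming the attribute values are plain byte counts and that 1GB means 2^30 bytes; neither the value format nor the class FileSizeLimit is confirmed by the commit.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical sketch: per-OS default file size limit, overridable by configuration. */
final class FileSizeLimit {
    static final long GB = 1L << 30; // assumption: 1GB = 2^30 bytes

    static long maxFileSize(Properties config, String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        String key = windows ? "filesize.max.win" : "filesize.max.other";
        long defaultLimit = windows ? 2 * GB : 32 * GB;
        String configured = config.getProperty(key);
        // assumption: configured values are plain byte counts
        return configured == null ? defaultLimit : Long.parseLong(configured.trim());
    }
}
```

A caller would pass System.getProperty("os.name") as the second argument.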
orbiter
addecdb18c
simplified code, removed one unused method in all implementing classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5972 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
47fce9020c
small change (Orbiter's wish)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5971 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
e07b14e5d7
finally a working fix for 5960
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5970 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
3ebb904d2c
fix for 5960, http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2119
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5969 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
734680dc70
initialize the ResourceObserver in its own thread
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5968 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e005cfea37
fix for bug in -incell option of URLAnalysis
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5967 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a7e392f31b
The collection index will not be supported any more.
Existing indexes based on the old index collections must be migrated with YaCy 0.8
- removed index collection classes and all migration tools
- added a 'incell' reference collection feature in URL analysis
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5966 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a2f48863fc
- added prototype for navigation index
- refactoring of word index prototype
(no functional changes so far)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5965 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
47fd226bdb
proper parsing of sentences
does not affect tokens/words
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5964 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
27eb8d62cb
- new development cycle
- removed the temporary configuration with a safe setting for indexer threads (=1) and replaced it with the best value computed during performance tests (1/2 of the number of processors)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5963 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
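The new default, half the number of processors, fits in a one-liner. The rounding down and the lower bound of one thread are assumptions made for this sketch, as is the class name IndexerThreads.

```java
/** Hypothetical sketch: indexer thread count as half the processors, but at least one. */
final class IndexerThreads {
    static int forProcessors(int processors) {
        return Math.max(1, processors / 2); // integer division rounds down (assumption)
    }

    static int forThisMachine() {
        return forProcessors(Runtime.getRuntime().availableProcessors());
    }
}
```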
orbiter
b7457d3807
patch for http://forum.yacy-websuche.de/viewtopic.php?p=14720#p14720
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5960 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bffbe43e09
fix for http://forum.yacy-websuche.de/viewtopic.php?p=14522#p14522
...
fix for http://forum.yacy-websuche.de/viewtopic.php?p=14955#p14955
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5959 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f133d6065c
fix for http://forum.yacy-websuche.de/viewtopic.php?p=14955#p14955
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5958 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
82af994041
added missing loglevel
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5956 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ad9762746d
no exception in case of uniq() time-out, see also
...
http://forum.yacy-websuche.de/viewtopic.php?p=13177#p13177
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5955 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1efe686e3f
fix for http://forum.yacy-websuche.de/viewtopic.php?p=13960#p13960
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5954 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
13fb84ab81
you can define the default number of displayed search results with search.items
...
this applies only to requests through the classic-style page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5953 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f2e4d156e8
removed debug messages
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5950 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
709bfc2cd4
added a memory check in http post protocol
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5949 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c01d6f43e1
- fixed problem with thread dump if no arguments are given
...
- rejecting peers that are older than 6 hours (not seen for 6 hours)
- 0.78, targeting 0.8 at the end of the week
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5948 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a49edd9415
fix for bug in search with site: constraint
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5947 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c1e5fad9a7
fix for http://forum.yacy-websuche.de/viewtopic.php?p=14767#p14767
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5944 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8ee3a94e82
fix for non-caching of sitehash, see http://forum.yacy-websuche.de/viewtopic.php?p=14440#p14440
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5942 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
21930d05ed
fix for [B@...
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5941 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b6ba387e01
fix for http://forum.yacy-websuche.de/viewtopic.php?p=14751#p14751
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5940 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4338dcf936
fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2093&hilit=
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5937 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
bad7ce9286
experimental option trayIcon.force for unsupported platforms. java 1.6 needed
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5936 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
ea27853c59
*) some refactoring
...
*) added one assertion
*) no functional changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5935 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
d164b42604
*) cosmetics
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5934 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
17150b2950
fixed bug in snippet computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5932 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
89aeb318d3
enhanced the wikimedia dump import process
...
enhanced the wiki parser and condenser speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5931 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5fb77116c6
added a submenu to index administration to import a wikimedia dump (e.g. a dump from wikipedia) into the YaCy index: see
...
http://localhost:8080/IndexImportWikimedia_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5930 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
hermens
df733af4fa
Try not to lose content from ram during IndexCell.delete by moving ram.delete after the dangerous operations on the array (array.get and array.delete)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5929 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
hermens
ac72005f2f
Let IndexCell.remove remove entries from the ram portion of the DB as well.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5928 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8ba7ff5353
a fix and another speed enhancement for the RWI cache
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5927 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
05f077e85f
added stack trace output to solve problem in
...
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2076&hilit=&p=14612#p14612
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5926 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
71a4cadf31
better and more performant synchronization in SimpleARC, the caching object for word hashes. Speeds up indexing.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5925 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
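The commit above (r5925) concerns SimpleARC, the cache for word hashes. As a rough illustration of an ARC-style cache, the sketch below keeps two LRU levels: entries seen once live in `levelA`, and an entry that is hit again is promoted to `levelB`, so frequently used word hashes survive a burst of one-off words. That SimpleARC is organized exactly this way is an assumption, and the class name `TwoLevelCache` is hypothetical; the commit does not describe its synchronization scheme, so this sketch simply uses `synchronized` methods as one possible choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the YaCy SimpleARC source: a two-level cache in the
// spirit of ARC. New entries go to levelA; an entry that is hit again is
// promoted to levelB. Each level is an LRU map holding half of the capacity.
// Null values are not supported (null means "not cached").
final class TwoLevelCache<K, V> {
    private final Map<K, V> levelA;
    private final Map<K, V> levelB;

    TwoLevelCache(int capacity) {
        int half = Math.max(1, capacity / 2);
        this.levelA = lru(half);
        this.levelB = lru(half);
    }

    private static <X, Y> Map<X, Y> lru(final int max) {
        // accessOrder = true: the eldest entry is the least recently accessed one
        return new LinkedHashMap<X, Y>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<X, Y> eldest) {
                return size() > max;
            }
        };
    }

    /** Stores an entry; keys that were already promoted are updated in levelB. */
    synchronized void put(K key, V value) {
        if (levelB.containsKey(key)) {
            levelB.put(key, value);
        } else {
            levelA.put(key, value);
        }
    }

    /** Returns the cached value or null; a hit in levelA promotes the entry. */
    synchronized V get(K key) {
        V value = levelB.get(key);
        if (value != null) return value;
        value = levelA.remove(key);
        if (value != null) levelB.put(key, value);
        return value;
    }

    synchronized int size() {
        return levelA.size() + levelB.size();
    }
}
```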
orbiter
e6773cbb33
better handling of RWI cache for concurrency and less overhead when writing new entries -> even more indexing speed
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5924 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c097531e3d
added a catch (Exception) to all threads to check whether any of them silently dies without any other notification
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5922 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
083533e5ec
fix for bugs in IODispatcher
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5921 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
21fbca0410
better scaling of HEAP dump writer for small memory configurations;
...
should prevent OOMs during cache dumps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5920 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6e0b57284d
better care for states of the IODispatcher
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5919 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1db9cdd4e4
fixed bug in writing robots.txt entries when host names exceeded 64 characters, and some other problems
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5918 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
bde88b684a
* split off yacyRelease from yacyVersion
...
* added some gui infos about signatures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5916 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
057ce14c8e
more fixes (character encoding, parser exceptions, http client failure, blob writing)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5914 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d2ac0aa682
- fixed possible bugs in Stack (may affect Crawler reset) and RandomAccess handling
...
- increased default memory size to 180MB
- fixed possible bug in http client reset (there was a deadlock)
- bug in BLOBHeap marked but not solved; the cause is still unknown.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5912 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
1351d903a1
don't follow links like mailto:
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5909 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e88a66bcae
temporarily disabled computation of all sublinks (check needed)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5908 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
ff5f82d780
*) removed description of removed commands from wikiHelp ([= =])
...
*) used format function of Netbeans for wikiCode to make it more readable, no functional changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5907 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
eacf95213a
fix for crawling of mailto-links
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5906 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9c6ac43f66
fixes for wiki parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5905 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3a64c9d02f
- fix for problem with concurrency when computing word hashes
...
- fix for search in case that a urlfilter was used and zero results were returned
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5904 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d3f8aa5a2a
set of small fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5903 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
78ffb61297
*) got rid of unnecessary variable which might also fix IndexOutOfBoundsException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5902 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d31e6f9c14
fix for http://forum.yacy-websuche.de/viewtopic.php?p=14457#p14457
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5899 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8d6212233b
fix for IODispatcher
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5896 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f678472f46
fix for quote problem in json output
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5895 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d079d6dfdb
small changes in surrogate reader, wiki code and portal test
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5894 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
07f09742bb
set of small fixes and comments
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5893 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
06ed4ef7b3
* better picture handling
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5891 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5a634cab23
removed generation of anchor link sets in document types that describe container formats.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5890 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
f1244264b8
*) hopefully fixed bug reported in http://forum.yacy-websuche.de/viewtopic.php?t=2057
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5882 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
2e3186189b
fix for mediawikiIndex surrogate producer + added concurrency
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5880 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
6f5ea7b1a8
small fix for previous post
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5879 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
138a0747e3
added serverObjects.putJSON because JSON has very particular encoding requirements
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5877 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d977dd9a96
fix for surrogate loader
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5870 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9cb68353da
fix for bug in ProfilingGraph for ppm >> 10000 ppm (!)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5868 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9e4db75aac
reduced internal logging and reduced memory that internal logging can use
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5867 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c10c257255
attempt to fix a deadlock situation where the IODispatcher did not work.
...
I suspect that the dispatcher thread crashed and the queues filled up, so no indexing process was able to write data.
This fix tries to heal the problem, but I am unsure if it helps. To get a better view of the problem, some more log output has been added.
Also added a new attribute indexer.threads to control the default number of indexer threads (default is 1)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5866 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
09987e93fd
fixed some more bad handling of byte[]
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5865 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1bcc1450cb
more explanatory error message in case of IOExceptions during html parsing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5864 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fe51f4d668
less synchronization may help to prevent deadlocks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5863 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
58802e4201
added missing success test in storeDocumentIndex,
...
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1922&hilit=
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5862 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
171e62bee5
addition to the fix from last commit (which did not work)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5860 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
059949a0d1
tried to fix problem with snippet fetch for second search page when verify=false
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5859 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
b08991e278
moved some constants, rename of Tray class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5858 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
138422990a
- removed useCell option: the indexCell data structure is now the default index structure; old collection data is still migrated
...
- added some debugging output to balancer to find a bug
- removed unused classes for index collection handling
- changed some default values for the process handling: more memory needed to prevent OOM
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5856 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1b9e532c87
some concurrency for wikipedia dump reader
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5855 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
25d2160288
small fix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5853 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
16baa7ad24
To translate a mediawiki dump into the YaCy surrogate format do the following:
...
- download a wikipedia dump, e.g. dewiki-20090311-pages-articles.xml.bz2
from http://download.wikimedia.org/dewiki/20090311/
- move dewiki-20090311-pages-articles.xml.bz2 to DATA/HTCACHE/
- start the conversion; open a command shell, move to the yacy home directory and execute
java -Xmx2000m -cp classes:lib/bzip2.jar de.anomic.tools.mediawikiIndex -convert DATA/HTCACHE/dewiki-20090311-pages-articles.xml.bz2 DATA/SURROGATES/in/ http://de.wikipedia.org/wiki/
this generates a series of files in DATA/SURROGATES/in
if YaCy is running (it may run concurrently), it fetches all new dumps in the surrogate-in directory. The export process is transaction-safe, which means YaCy will not start reading a dump until the dump is completely finished.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5851 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
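A common way to get the transaction-safe behaviour described in r5851 (and the temporary-name-then-rename approach of the later index dump commit) is to write under a temporary name and rename the file only when it is complete. The sketch below shows that pattern with `java.nio.file`; the `.prt` suffix and the `SurrogateWriter` name are hypothetical and not taken from YaCy, which may mark unfinished dumps differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical writer, not YaCy code: the ".prt" suffix marks an unfinished file.
final class SurrogateWriter {
    static final String PARTIAL_SUFFIX = ".prt";

    private SurrogateWriter() {}

    /** Writes content under a temporary name, then renames it to its final name. */
    static Path writeAtomically(Path dir, String name, String content) throws IOException {
        Path tmp = dir.resolve(name + PARTIAL_SUFFIX);
        Path target = dir.resolve(name);
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE: a reader sees either no file or the complete file.
        // If the file system cannot rename atomically, this throws
        // AtomicMoveNotSupportedException instead of silently copying.
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** A reader polling the directory should skip files still marked as partial. */
    static boolean isReady(Path file) {
        return !file.getFileName().toString().endsWith(PARTIAL_SUFFIX);
    }
}
```

The reader side only needs the `isReady` check: because the final name appears in a single rename, a dump is never picked up half-written.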
orbiter
0b2c98edc9
some more work on the wikipedia-dump exporter (not finished yet)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5850 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5195c94838
two patches for performance enhancements of the index handover process from documents to the index cache:
...
- one word prototype is generated for each document, which is re-used when a specific word is stored.
- the index cache now uses ByteArray objects instead of byte[] to reference the RWIs. This speeds up access to the map that stores the cache. To dump the cache to the FS, the content must be sorted, but sorting takes less time than maintaining a sorted map during caching.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5849 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
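The reason r5849 wraps byte[] keys is that a bare Java array inherits identity-based `equals()`/`hashCode()` from `Object`, so two arrays with the same hash content would be different `HashMap` keys. The sketch below shows a content-based wrapper plus the "sort once at dump time" step; `ByteArrayKey` is a hypothetical name and is not YaCy's `ByteArray` class, and the unsigned byte order (`Arrays.compareUnsigned`, Java 9+) is an assumption rather than the order YaCy uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper, not YaCy's ByteArray class. A bare byte[] uses
// identity-based equals()/hashCode(); this wrapper compares content instead.
final class ByteArrayKey implements Comparable<ByteArrayKey> {
    private final byte[] bytes;
    private final int hash; // cached, assuming the wrapped array is never modified

    ByteArrayKey(byte[] bytes) {
        this.bytes = bytes;
        this.hash = Arrays.hashCode(bytes);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ByteArrayKey && Arrays.equals(bytes, ((ByteArrayKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    /** Unsigned lexicographic order (Java 9+), only needed when the cache is dumped. */
    @Override
    public int compareTo(ByteArrayKey other) {
        return Arrays.compareUnsigned(bytes, other.bytes);
    }

    /** Sorts the keys once at dump time instead of maintaining a sorted map while caching. */
    static <V> List<ByteArrayKey> sortedKeys(Map<ByteArrayKey, V> cache) {
        List<ByteArrayKey> keys = new ArrayList<>(cache.keySet());
        keys.sort(null); // null comparator = natural order via compareTo
        return keys;
    }
}
```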
orbiter
9416f5c26f
more speed test cases: kelondro provides map functions that are more than 20% faster than the standard java classes and use less than half of the memory of the java classes:
...
just start IndexTest (here with 1000000 test objects)
Performance test: comparing HashMap, TreeMap and kelondroRow
generated 1000000 test data entries
STANDARD JAVA CLASS MAPS
sorted map
time for TreeMap<byte[]> generation: 2110
time for TreeMap<byte[]> test: 2516, 0 bugs
memory for TreeMap<byte[]>: 29 MB
unsorted map
time for HashMap<String> generation: 1157
time for HashMap<String> test: 1516, 0 bugs
memory for HashMap<String>: 61 MB
KELONDRO-ENHANCED MAPS
sorted map
time for kelondroMap<byte[]> generation: 1781
time for kelondroMap<byte[]> test: 2452, 0 bugs
memory for kelondroMap<byte[]>: 15 MB
unsorted map
time for HashMap<ByteArray> generation: 828
time for HashMap<ByteArray> test: 953, 0 bugs
memory for HashMap<ByteArray>: 9 MB
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5847 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b53790abb1
more performance hacks: 10% more speed for Base64.compare() which is really often used in YaCy code
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5846 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8ffb9889e1
some fixes and performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5845 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
dfb96ecb72
more fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5844 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1b8d346b4c
fixes in connection with transition to byte[] hashes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5843 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
0b0a46d35a
* fix transferRWI as suggested by celle (thanks!)
...
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2000#p14023
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5842 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
996572de95
quickfix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5841 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
380ed2dac0
performance and debugging additions
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5840 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
635b0a9da7
code-split
...
allow cgi indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5839 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fa3adbbfc6
added domain checks to surrogate reader and RWI transfer receiver to prevent spamming via surrogates
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5837 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
76af84d732
* add custom comparator to ScoreCluster for byte[]
...
* fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2010
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5836 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
ab0030d7a7
allow dht-out for remote-crawl processing peers on default settings
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5834 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
d1116c049f
*) added new method "contains()" to Blacklist interface
...
*) implemented contains() in class AbstractBlacklist
*) used new method in Blacklist_p to prevent double entries in blacklists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5832 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
08445e42f0
* don't throw an exception in case of a bad charset in the http header
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5831 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
2f860a2564
* convert byte[] hashes to string for log output
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5830 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
d93a2a6552
* ignore whitespace so you can copy&paste signatures more easily
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5828 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fbcbcc5bdb
export of yacy document objects as dublin core record in xml
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5826 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d7cbf4cdd4
more performance hacks: less overhead in word hash computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5825 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
29e96c1a60
bugfixes and performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5824 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4e97a31009
corrections in dublin core syntax
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5823 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
44daec7936
* introduce signatures to autoupdate
...
as long as no public keys are set for the update locations,
no signatures are checked
* wiki-article follows...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5822 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
538e375901
replaced the old caching method for computed word hashes with a better method. Word hash computation is a new performance bottleneck (after the IO bottleneck was removed with the IndexCell data structure), so better caching of word hashes was necessary.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5821 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9e853e1977
partly reverting SVN 5818: identical comparator required for join operator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5820 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e16c25ddf7
(peak-) performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5819 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
63cd152969
fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5818 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7dfe7e7cc6
fixed some problems with surrogate reader. This is now ready for testing.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5817 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3a1364ed5c
removed example lines from SurrogateReader sources; added additional example file
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5816 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9050a3c4c5
alpha version of surrogate reading and indexing.
...
see the example file for an explanation.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5815 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b15b059c0d
fix for latest commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5813 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c8624903c6
full redesign of index access data model:
...
terms (words) are no longer retrieved by their word hash string, but by a byte[] containing the word hash.
this has strong advantages when RWIs are sorted in the ReferenceContainer Cache and compared using the standard Java TreeMap, which previously needed getBytes() and new String() conversions.
Many thousands of such conversions are now omitted every second, which increases the indexing speed by a factor of two.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5812 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
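To illustrate the r5812 change, the sketch below keys a `TreeMap` directly by byte[] hashes, so no `new String()`/`getBytes()` conversion is needed per comparison. Because byte[] does not implement `Comparable`, the map must be given an explicit comparator; here `Arrays::compareUnsigned` (Java 9+) supplies a plain unsigned lexicographic order. YaCy's own Base64-based ordering is not reproduced, and `HashIndexSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical sketch, not YaCy code: a sorted map keyed directly by byte[] hashes.
// byte[] does not implement Comparable, so a TreeMap without a comparator would
// throw ClassCastException on insert. Arrays::compareUnsigned (Java 9+) supplies
// an unsigned lexicographic order; YaCy's Base64-based ordering is not reproduced.
final class HashIndexSketch {
    private final TreeMap<byte[], Integer> counts =
            new TreeMap<byte[], Integer>(Arrays::compareUnsigned);

    /** Counts one reference for the hash; no new String()/getBytes() round trip. */
    void add(byte[] wordHash) {
        counts.merge(wordHash, 1, Integer::sum);
    }

    int count(byte[] wordHash) {
        return counts.getOrDefault(wordHash, 0);
    }

    int size() {
        return counts.size();
    }

    /** Example helper: ASCII bytes of a hash string, used only to build sample keys. */
    static byte[] bytes(String ascii) {
        return ascii.getBytes(StandardCharsets.US_ASCII);
    }
}
```

Since the comparator looks at content, two distinct arrays holding the same hash land on the same map entry.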
f1ori
dd6b5005ff
* fix missing charset handling in getpageinfo_p
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5811 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bd5f4c78d8
- added default profile for surrogate indexing
...
- integrated surrogate indexing into indexing queue process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5810 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ad78e3a59f
- less lines in rssTerminal
...
- crawl more documents: if remote crawling is enabled, a remote crawl list is also loaded while a local crawl is running, provided that the indexer is idle
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5809 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bc80dc913a
added new surrogate reader (surrogates are parsed documents in batches)
...
this will open a new way to insert indexes into YaCy (instead of crawling)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5808 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
12d81e98eb
- fixed bad search results when searching for empty string
...
- simplified result handling and page composition in case that nothing was searched
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5807 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8a24350036
- fix for join method with new generalized RWI data structure (caused by latest commit)
...
- added more functions to mediawiki parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5806 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e58320a507
added more info in log for debugging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5805 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
89ec3acb3e
- full abstraction of index content type: the kelondro full text index may now also contain indexes about content other than text, e.g. navigation indexes or reverse linking indexes.
...
- during index joins all word positions are maintained: better ranking for word distance possible; exact phrase match can be implemented soundly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5804 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
7a48090fcf
- fix for "uk" language
...
- svn attributes added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5803 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
dc2af61bc9
allow up to 50 results from remote peers
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5802 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c0e8ed5461
fixed problem with not http client
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5801 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8862a2fed0
ups
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5799 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
de68948bc5
better handling of free memory computation and emergency cache flush for index cell
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5798 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
fcb77c3140
* added .im (Isle of Man) to TLD-list
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5794 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b81c7467d8
protection against too many files in RICELL in case of massive emergency dumps caused by low memory
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5791 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d4d87d90c4
- extended experimental wikipedia dump parser
...
- removed historic, possibly unused code from wiki parser that was in conflict with actual wikipedia wiki code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5790 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c3aff2521e
fix for NPE
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5789 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
57c00dd8c9
fix for bad filtering of common http error
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5788 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
14361f1ca4
added log message for index generation in HeapReader
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5787 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c08f9b36a4
refactoring of wiki parser.
...
This was done to prepare the wiki parser as a parser for wikipedia dumps, which will be used for performance tests (to avoid crawling)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5785 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
44e01afa5b
- refactoring
...
- a little bit more abstraction
- new interfaces for index abstraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5783 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
82fb60a720
increased memory limit for emergency cache flush
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5782 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
9180617dd9
*) Classes to handle import of lists (especially blacklists) from XML files, not used yet, but will be used soon.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5780 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
596e6215dc
fix in case of white space in path name
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5779 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b887f4a116
keep more free mem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5778 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c2359f20dd
refactoring: better abstraction of reference and metadata prototypes.
...
This prepares the introduction of index tables other than the reverse text indexes used so far. The next application of the reverse index is a citation index.
Moved to version 0.74
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5777 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ab656687d7
stricter BLOB initialization; may also help to save some RAM
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5776 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5b138ada16
fixes to web structure reference collection and url construction
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5775 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a29a11e526
added evaluation of incoming links in webstructure api
...
the api has changed; new XML schema.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5774 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f6691411b5
- migration of files from SplitTable (which are used for the URL-DB) to a different file name format.
...
- the file generation logic is slightly different: files may now have only a maximum size of one gigabyte and a maximum age of one month.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5773 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
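The file generation rule in r5773 (a maximum size of one gigabyte and a maximum age of one month per file) can be expressed as a single predicate that decides when a new table file is started. In the sketch below, "one gigabyte" is assumed to mean 2^30 bytes and "one month" is approximated as 30 days; `SplitFilePolicy` and `needsNewFile` are hypothetical names, not the SplitTable API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical rule, not the SplitTable source: start a new table file when the
// current one has reached one gigabyte or is older than one month.
final class SplitFilePolicy {
    static final long MAX_BYTES = 1L << 30;              // assumed binary gigabyte
    static final Duration MAX_AGE = Duration.ofDays(30); // "one month", assumed 30 days

    private SplitFilePolicy() {}

    /** True if the current file is too large or too old to receive more entries. */
    static boolean needsNewFile(long currentSize, Instant created, Instant now) {
        return currentSize >= MAX_BYTES
            || Duration.between(created, now).compareTo(MAX_AGE) > 0;
    }
}
```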
shostakovich
1f37cc6107
Robots.txt is now reused after one day. See forum-topic:
...
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1669&p=13565#p13565
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5772 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f21a8c9e9c
a different naming scheme for BLOBArray files. This may be necessary if blobs are written more often than once per second.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5771 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7ba078daa1
- added fast site-operator
...
- refactoring merge into BLOBArray
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5770 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b4126432bc
hardening of index dump write process
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5769 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9bfb2641db
- removed deprecated threads
...
- added automatic http client reset. This was necessary because excessive intranet crawling caused deadlocks; this hack solved the problem.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5768 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
293290c317
fix for bad assert in last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5767 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bd409fb7ba
added web structure analysis for a specific domain that can be requested from the API.
...
Example:
http://localhost:8080/api/webstructure.xml?about=www.yacy.net
returns an XML document with the following content:
<?xml version="1.0"?>
<webstructure>
<domains reference="reverse" count="1" maxref="300">
<domain host="www.yacy.net" id="FXg39Q" date="20090401">
<citation host="java.sun.com" id="o-R3yY" count="1" />
<citation host="yacy-suche.de" id="-KCLaB" count="1" />
<citation host="suma-ev.de" id="VRAHIA" count="1" />
<citation host="www.kit.edu" id="EMaLDQ" count="1" />
<citation host="yacy.net" id="Fh1hyQ" count="1" />
<citation host="www.fzk.de" id="V2Kl-A" count="1" />
<citation host="en.wikipedia.org" id="rwtdfR" count="3" />
<citation host="vimeo.com" id="MmdQDY" count="3" />
<citation host="liebel.fzk.de" id="sX4ozA" count="6" />
</domain>
</domains>
</webstructure>
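A client of this API can read the citation list with the JDK's built-in DOM parser. The sketch below is a hypothetical helper (not part of YaCy) that relies only on the element and attribute names visible in the example response: it maps the `host` attribute of every `<citation>` element to its `count`, summing if a host appears under several `<domain>` elements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for webstructure.xml responses. */
final class WebStructureReader {
    /** Maps each citation host to its count attribute, in document order. */
    static Map<String, Integer> citations(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("citation");
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element c = (Element) nodes.item(i);
                // sum counts if the same host is listed under more than one <domain>
                counts.merge(c.getAttribute("host"), Integer.parseInt(c.getAttribute("count")), Integer::sum);
            }
            return counts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a readable webstructure document", e);
        }
    }
}
```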
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5766 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b6c2167143
- patch for bad web structure dumps
...
- added automatic slow-down of access to specific domains when access to a web page fails
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5765 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0139988c04
- added writing to temporary file names and renaming to the final file name when an index dump/merge is done. Interrupted merges can be cleaned up.
...
- added clean-up of unfinished merges and unused idx/gap files
- enhanced merge file selection method
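The "write to a temporary name, rename when done" pattern from this commit can be sketched with `java.nio.file`. The `.tmp` suffix and the class name are assumptions for illustration; the commit does not say how YaCy names its unfinished files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: dumps appear under their final name only when complete. */
final class SafeDump {
    // hypothetical suffix marking files whose dump/merge has not finished
    static final String TMP_SUFFIX = ".tmp";

    static void writeThenRename(Path target, byte[] data) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + TMP_SUFFIX);
        Files.write(tmp, data);                                   // an interruption here leaves only the .tmp file
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);  // final name appears in one step
    }

    /** Start-up clean-up: deletes leftovers of interrupted dumps/merges and returns how many were removed. */
    static int deleteUnfinished(Path dir) throws IOException {
        List<Path> leftovers = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*" + TMP_SUFFIX)) {
            for (Path p : stream) leftovers.add(p);
        }
        for (Path p : leftovers) Files.delete(p);
        return leftovers.size();
    }
}
```

Note that with `ATOMIC_MOVE` the JDK leaves it implementation-specific whether an existing target is replaced or an `IOException` is thrown, so a real merge would write to a new final name rather than overwrite an existing file.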
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5764 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3621aa96ab
- added a memory protection for the IndexCell migration
...
- fix for bad cell file selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5763 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
568e8f1741
fix in unmountBLOB
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5762 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9da69d6b68
- better selection of files to be merged
...
- fix for getChannel().close(), which works on Windows but not on Mac and Linux
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5761 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d39a5b42ca
more care about open file handles. Now files are also closed on Windows and can be deleted afterwards.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5760 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
029495e64d
fixed bug introduced in SVN 5756 in EcoTable.put()
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5759 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
587838bd09
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5758 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d2e2420a68
- added another file selection method for index cell merge
...
- more hacks to check that files are closed properly and file handles do not exist after files are closed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5757 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
96eaecda3e
- added migration class to go from index collections to the index cell data structure.
...
- added better control over file deletion, because this sometimes fails, especially on Windows
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5756 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0f0b4aec75
better index cell merge logic
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5754 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
832fef670f
migration of urls-files into subdirectory METADATA
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5753 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fa07234d4e
fix for clear method: now deletes files
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5752 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lulabad
df87e4dbf6
missing count of sent Index and URLs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5747 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
c450e3746b
svn attributes added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5736 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
37f892b988
added new concurrent merger class for IndexCell RWI data
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5735 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
8c494afcfe
svn attributes added
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5734 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
67aaffc0a2
- added Latency control to the crawler:
...
because of the strongly enhanced indexing speed when using the new IndexCell RWI data structures (> 2000PPM on my notebook), it is now necessary to control the crawling speed depending on the response time of the target server (which is also YaCy in some intranet indexing use cases).
The latency factor in crawl delay times is derived from the time that a target host takes to answer HTTP requests. For internet domains, the crawl delay is at least twice the response time; in intranet cases the delay time is now half of the response time.
- added API to monitor the latency times of the crawler:
a new API at /api/latency_p.xml returns the current response times of domains, the time of the crawler's last access to each domain, and many more attributes.
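The latency rule above reduces to a small function of the measured response time. In this sketch, `baseDelayMs` is a hypothetical stand-in for whatever other delay rules apply to internet hosts (the commit only says the delay is *at least* twice the response time); the class name is also hypothetical.

```java
/** Hypothetical latency-based crawl delay: internet hosts wait >= 2x response time, intranet hosts 0.5x. */
final class CrawlDelay {
    /**
     * @param responseTimeMs measured time the target host took to answer an HTTP request
     * @param intranet       whether the target host is in the local network
     * @param baseDelayMs    delay required by other rules for internet hosts (assumption)
     */
    static long delayMs(long responseTimeMs, boolean intranet, long baseDelayMs) {
        if (intranet) return responseTimeMs / 2;               // half of the response time
        return Math.max(baseDelayMs, 2 * responseTimeMs);      // never less than twice the response time
    }
}
```

A slow server thus automatically slows the crawler down, while fast intranet peers (often other YaCy instances) can be crawled aggressively.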
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5733 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0926310461
another performance hack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5731 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ebe5d69d14
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5730 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
61f9dbf0cc
- fixed a display problem in watch crawler
...
- another small enhancement in balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5729 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b3f75e48fa
- enhanced balancer: auto-solving of waiting-deadlocks
...
- removed deprecated cache-init size value
- more debug lines for IndexCell cache dump merge
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5728 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9a90ea05e0
added a merge operation for IndexCell data structures
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5727 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d99ff745aa
fix for http://forum.yacy-websuche.de/viewtopic.php?p=13378#p13378
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5726 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0c3ab291c4
fix for http://forum.yacy-websuche.de/viewtopic.php?p=13354#p13354
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5725 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a9cea419ef
Integration of the new index data structure IndexCell
...
This is the start of a testing phase for IndexCell data structure which will replace
the collections and caching strategy. IndexCell creation and maintenance is fast, has
no caching overhead, very low IO load and is the basis for the next data structure,
index segments.
IndexCell files are stored at DATA/<network>/TEXT/RICELL
With this commit the old data structures are still used unless a flag in yacy.conf is set.
To switch to the new data structure, set
useCell = true
in yacy.conf. Then you will no longer have access to TEXT/RICACHE and TEXT/RICOLLECTION
This code is still bleeding-edge development. Please do not use the new data structure for
production yet. Future versions may change data types or storage locations.
The next main release will have a migration feature for old data structures.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5724 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
fd0976c0a7
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5723 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
83792d9233
more refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5722 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
ce79239322
"typo"
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5721 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
cdbdc731c5
small updates: unescape, isCGI
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5720 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
474aac65af
more refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5719 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
209f25f5f5
refactoring to integrate indexCell data structures
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5718 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
359a238acf
faster isCGI()
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5717 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
f75628e53b
some corrections
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5716 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b7138e5fcb
even more efficient comparator calls (less System.arraycopy for primary keys)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5715 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
65784eb656
- more efficient comparator calls
...
- fix for http://forum.yacy-websuche.de/viewtopic.php?p=13331#p13331
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5714 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
44874cb550
added a deleteOnExit for blob file deletion in case a deletion is not successful.
...
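The fallback described in this commit maps onto two standard `java.io.File` calls: try `delete()` first and, if that fails (for example because a handle is still open on Windows), register the file with `deleteOnExit()` so the JVM removes it at shutdown. The helper name is hypothetical.

```java
import java.io.File;

/** Hypothetical helper for blob file deletion with a deferred fallback. */
final class BlobFiles {
    /**
     * Deletes f now if possible; otherwise schedules deletion at JVM exit.
     * Returns true if the file is gone now (including when it never existed).
     */
    static boolean deleteOrDefer(File f) {
        if (!f.exists() || f.delete()) return true;
        f.deleteOnExit();   // retried by the JVM during normal shutdown
        return false;
    }
}
```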
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5713 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
66f78d67e0
bad idea. Concurrency in index management will be done differently
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5712 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7dff1cba62
removed option to use different primary keys in kelondro tables
...
this option was never used, and there is also no reason to make any column other than the first the primary key. As a result, access methods to the key do not need to compute key positions, and they work faster.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5711 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7f67238f8b
refactoring of plasmaWordIndex: fewer methods in the class, separated the index into CachedIndexCollection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5710 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
14a1c33823
refactoring of wordIndex class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5709 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d49238a637
more performance hacks: better default values for scaling, less memory usage
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5708 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
39644dc14e
performance hacks to compare methods in database core
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5707 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e2e7949feb
replaced old PPM computation with a better one that simply sums up events that had been stored in the profiling table.
...
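Computing PPM by summing stored events amounts to counting the events of the last sixty seconds. A minimal sketch, assuming each indexed page is recorded as one timestamped event; it keeps the timestamps in an in-memory deque instead of YaCy's profiling table, and the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical PPM counter: pages per minute = indexing events recorded in the last 60 seconds. */
final class PpmCounter {
    private static final long WINDOW_MS = 60_000L;
    private final Deque<Long> events = new ArrayDeque<>();  // event timestamps, oldest first

    synchronized void record(long nowMs) {
        events.addLast(nowMs);
    }

    synchronized int ppm(long nowMs) {
        // drop events that fell out of the one-minute window, then count the rest
        while (!events.isEmpty() && events.peekFirst() <= nowMs - WINDOW_MS) events.pollFirst();
        return events.size();
    }
}
```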
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5706 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f6d989aa04
added new class RowSetArray which arranges RowSet objects like Elements in a hashtable, but still provides the functionality of sorted enumeration. The new class is now integrated into the ObjectIndexCache, which is the core class to provide index functions to all database files. The new index access is about twice as fast as before. This has strong speed enhancement effects on all parts of YaCy.
...
The speed of the kelondro indexing class ObjectIndexCache can be compared with Java's standard TreeMap using the main method in IntegerHandleIndex. The result is that the kelondro indexing needs only 1/5 of the memory that TreeMap uses! In exchange, the kelondro classes are slower than TreeMap, about four (!) times slower. However, this is not so bad, because the better use of memory is a strong advantage and makes it possible for YaCy to maintain such a large number of documents (> 50 million) in one peer.
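The RowSetArray idea (sets arranged "like elements in a hashtable" that still allow sorted enumeration) can be illustrated with a hypothetical structure: keys are spread over several small sorted buckets by hash, so a lookup searches only one bucket, and sorted output is produced by a k-way merge over all buckets. This is a sketch of the general technique with String keys, not the kelondro implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;

/** Hypothetical hash-partitioned sorted set with merged sorted enumeration. */
final class SortedBucketArray {
    private final List<TreeSet<String>> buckets = new ArrayList<>();

    SortedBucketArray(int n) {
        for (int i = 0; i < n; i++) buckets.add(new TreeSet<>());
    }

    private TreeSet<String> bucketFor(String key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void put(String key) { bucketFor(key).add(key); }

    /** Only one small bucket is searched, as in a hashtable. */
    boolean contains(String key) { return bucketFor(key).contains(key); }

    /** Current smallest key of one bucket plus the iterator over that bucket's remaining keys. */
    private static final class Head {
        final String key;
        final Iterator<String> rest;
        Head(String key, Iterator<String> rest) { this.key = key; this.rest = rest; }
    }

    /** Sorted enumeration over all buckets via a k-way merge. */
    List<String> sorted() {
        PriorityQueue<Head> heads = new PriorityQueue<Head>((a, b) -> a.key.compareTo(b.key));
        for (TreeSet<String> b : buckets) {
            Iterator<String> it = b.iterator();
            if (it.hasNext()) heads.add(new Head(it.next(), it));
        }
        List<String> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            Head h = heads.poll();
            out.add(h.key);
            if (h.rest.hasNext()) heads.add(new Head(h.rest.next(), h.rest));
        }
        return out;
    }
}
```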
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5705 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
0a2fabeef3
static TMPDIR
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5704 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
9f7e62e900
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5703 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
f35dc11dc4
allow crawl start from pages with script tags
...
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1910
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5702 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6958eff196
removed unnecessary exceptions, extended testing in IntegerHandleIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5701 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
13c666adef
performance hack to ObjectIndex put() method:
...
Java's standard classes provide a Map interface whose put() method returns the object that was replaced by the argument of the put call. The kelondro ObjectIndex defined its put method the same way, i.e. it also returned the previous value of the Entry object. However, in most cases the calling code did not use this value. This change implements a put method that does not return the previous value, reflecting the common use; omitting the return of previous values gives some performance benefit. The functionality to get the previous value is still maintained and is provided by a new 'replace' method.
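The API split described above can be sketched as an interface with a `void put` and a value-returning `replace`. All names here are hypothetical; the TreeMap-backed implementation only shows the contract. A store that must decode the old entry to return it can skip that work in `put()`, whereas `java.util.TreeMap`, used here for brevity, gains nothing.

```java
import java.util.TreeMap;

/** Hypothetical index contract mirroring the commit: put() returns nothing, replace() returns the old value. */
interface SimpleObjectIndex<K, V> {
    void put(K key, V value);    // common case: no previous value is produced
    V replace(K key, V value);   // previous value, or null if the key was absent
    V get(K key);
}

final class TreeObjectIndex<K extends Comparable<K>, V> implements SimpleObjectIndex<K, V> {
    private final TreeMap<K, V> map = new TreeMap<>();

    @Override public void put(K key, V value) { map.put(key, value); }
    @Override public V replace(K key, V value) { return map.put(key, value); }
    @Override public V get(K key) { return map.get(key); }
}
```

Unlike `java.util.Map.replace(key, value)`, which only replaces an entry that is already present, this `replace` always stores the value and returns whatever was there before.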
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5700 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1f1be1518c
added stub for another performance hack: concurrent indexes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5699 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3e4c28e188
enhanced count feature for kelondroRowSet. This is about twice as fast as before. Should speed up the collection analysis (half time!)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5698 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
84e37387a2
fix for last commit and more testing stub
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5697 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ca006c506d
stub for performance enhancements for RowSet (no functional change yet)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5696 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d988204875
better shutdown of tools
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5695 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
100247bdda
also added an export and a delete feature to the URLAnalysis. This completes the clean-up feature for URLs. To do a complete clean-up of the URL database, run the following:
...
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -incollection DATA/INDEX/freeworld/TEXT/RICOLLECTION used.dump
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -export DATA/INDEX/freeworld/TEXT xml urls.xml diffurlcol.dump
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -delete DATA/INDEX/freeworld/TEXT diffurlcol.dump
The export feature is optional; its purpose is to provide a back-up of the URLs to be deleted. The export function can also be used to create html files with embedded links and simple text files: simply replace the 'xml' word with 'html' or 'text'. The last argument in the call, the diffurlcol.dump value, can also be omitted; in that case the complete URL database is exported. This is an alternative to the export function of the web interface.
The delete feature is the only destructive method of the four presented here. Please use it with care. It is better to make a back-up of the URL database files before starting the deletion.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5694 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
hermens
8c60d6d117
In DHT selection delete only those references that were actually selected
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5693 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
60078cf322
added next tool for URL analysis: check for references that occur in the URL-DB but not in the RICOLLECTIONS
...
to use this, you must first run the -incollection command (see SVN 5687), and you need a
used.dump file that has been produced with that process.
Now you can use that file to compare the URL hashes with the URLs in the URL-DB. To do that, execute
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump
or use different names for the dump files or more memory.
As a result, you get the file diffurlcol.dump which contains all the url hashes that occur in the URL database, but not in the collections.
The file has the format
{hash-12}*
that means: 12 byte long hashes are listed without any separation.
The next step could be to process this file and delete all these URLs with the computed hashes, or to export them before deletion.
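A program that processes diffurlcol.dump only needs to cut the byte stream into 12-byte records, since the `{hash-12}*` format has no separators. A minimal sketch with a hypothetical class name; it assumes the hashes consist of ASCII characters (as YaCy's base64-style hashes do) and treats a trailing partial record as corruption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for dumps in the {hash-12}* format. */
final class HashDumpReader {
    static final int HASH_LENGTH = 12;

    static List<String> readHashes(InputStream in) throws IOException {
        List<String> hashes = new ArrayList<>();
        byte[] buf = new byte[HASH_LENGTH];
        while (true) {
            int n = in.readNBytes(buf, 0, HASH_LENGTH);   // blocks until 12 bytes or end of stream (Java 9+)
            if (n == 0) break;                             // clean end of the dump
            if (n < HASH_LENGTH) throw new IOException("truncated hash at end of dump");
            hashes.add(new String(buf, StandardCharsets.US_ASCII));
        }
        return hashes;
    }
}
```

Wrapping a `FileInputStream` in a `BufferedInputStream` before calling `readHashes` avoids one system call per hash on large dumps.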
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5692 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b1ddc4a83f
do not merge collections if ram == false
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5691 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
dbdd10da84
better logging and startup behaviour for referenceHash computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5690 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d64836c34f
added statistical analysis of URL reference
...
use that with the following command on a Linux shell:
java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -incollection DATA/INDEX/freeworld/TEXT/RICOLLECTION used.dump
for freeworld indexes.
For more details, please see the discussion at:
http://forum.yacy-websuche.de/viewtopic.php?p=13204#p13204
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5687 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3b28daab40
code-beautification (to be consistent with external documentation paper)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5686 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
485c9406e5
fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1915&hilit=&p=13249#p13249
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5684 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
858f800a07
more logging in httpd to detect shutdown cause. See also:
...
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1914&hilit=&p=13246#p13246
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5683 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b80db04667
- refactoring of IntegerHandleIndex and LongHandleIndex (better method names)
...
- fix for problem in httpdFileHandler: missing close of open files if template cache was disabled
- more memory for DHT selection required
- stub for URL reference hash statistics in index collections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5682 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
8ee946bf1d
show upnp status
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5679 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
16f5c6a85e
fixed merge method initialization in ReferenceContainer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5676 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d7a493b4f5
added experimental timeline api
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5672 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
efcd95dc37
simplification of (internal) query process / refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5671 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f1b712c29a
small corrections to image loading methods in result presentation
...
especially loading of favicons in search results. This is a fix that
affects only searches in intranet/repository configurations.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5670 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d4b56d5819
added more asserts to BLOBHeap.flushBuffer() to fix the problem described in
...
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1679&hilit=&p=13109#p13109
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5666 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
c545fcb9fa
* add class to handle keys and signatures
...
* fix bug in serverCharBuffer
* add build-target to sign tar.gz (run ant dist sign)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5665 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
aa44d9bad9
more refactoring of kelondro.text / deleted de.anomic.index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5664 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6ffc6e3389
more refactoring of indexer and kelondro classes;
...
- integrating the indexer into kelondro as package 'text'
- renaming of classes in kelondro.index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5663 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
404bc21da9
simplification of (internal) query process / refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5662 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
76ef5f0f14
refactoring of index package: better names for the classes (to be continued)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5661 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
2df57b1fd1
refactoring of index collection class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5660 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
39a177649b
* added upnp listener for devices that do not respond to discovery but advertise themselves
...
* moved package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5659 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d1d9fbae5c
enabling the URLAnalysis to operate on multiple input files; just use a wildcard when calling the class from the command line
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5658 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c728879ab8
fixes to yacyURL - more exceptions in case URLs are malformed
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5657 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7542336ae5
performance enhancement to yacyURL: omit second processing of resolveBackpath. This method is already applied during initialization of the object and was called a second time when the URL was exported.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5656 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7ea53fe47b
added another url list transformation option:
...
- check the list and remove lines that do not contain valid URLs
- normalize the URLs
- remove doubles
- sort the list
- split the list into smaller chunks
This is all done in one process, which can be called with the new -sort option
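The five -sort steps listed above form a short pipeline. The commit does not state URLAnalysis's exact normalization rules, so the ones below (lower-cased scheme and host, "/" for an empty path, fragment dropped, only http/https accepted) are assumptions, and the class name is hypothetical. A `TreeSet` handles de-duplication and sorting in one step.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical -sort pipeline: validate, normalize, remove doubles, sort, split into chunks. */
final class UrlListSorter {
    static List<List<String>> process(List<String> lines, int chunkSize) {
        TreeSet<String> unique = new TreeSet<>();        // removes doubles and keeps the URLs sorted
        for (String line : lines) {
            String url = normalize(line.trim());
            if (url != null) unique.add(url);             // lines without a valid URL are dropped
        }
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String url : unique) {
            current.add(url);
            if (current.size() == chunkSize) { chunks.add(current); current = new ArrayList<>(); }
        }
        if (!current.isEmpty()) chunks.add(current);
        return chunks;
    }

    /** Assumed normalization rules (see lead-in); returns null for anything that is not a usable http(s) URL. */
    static String normalize(String s) {
        try {
            URI u = new URI(s);
            if (u.getScheme() == null || u.getHost() == null) return null;
            String scheme = u.getScheme().toLowerCase(Locale.ROOT);
            if (!scheme.equals("http") && !scheme.equals("https")) return null;
            String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();
            String port = u.getPort() == -1 ? "" : ":" + u.getPort();
            String query = u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
            return scheme + "://" + u.getHost().toLowerCase(Locale.ROOT) + port + path + query;
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```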
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5655 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e521e81148
bugfix in yacyURL (for latest performance hack)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5654 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
54625360f7
performance update
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5653 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d884c4718a
added gzip support for URLAnalysis:
...
url lists can also be compressed with gzip
If such a file is handed over to URLAnalysis, the output is also written as a .gz file
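The gzip behaviour described in this commit (compressed input is read transparently, and the output is compressed whenever the input was) can be sketched with `java.util.zip`. The class name is hypothetical, and detecting compression by the `.gz` extension alone is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip in means gzip out. */
final class GzipAwareFiles {
    static boolean isGzip(File f) {
        return f.getName().endsWith(".gz");   // decided by file extension only (assumption)
    }

    /** Opens a URL list, decompressing it if it is a .gz file. */
    static InputStream openInput(File input) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(input));
        return isGzip(input) ? new GZIPInputStream(in) : in;
    }

    /** The output is compressed exactly when the input is; the caller chooses a matching file name. */
    static OutputStream openOutput(File input, File output) throws IOException {
        OutputStream out = new BufferedOutputStream(new FileOutputStream(output));
        return isGzip(input) ? new GZIPOutputStream(out) : out;
    }
}
```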
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5652 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
46632f4385
performance update to yacyURL
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5651 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
cf9b74e6e3
added another method to process url lists: extract hosts only
...
This can be used like
java -Xmx2000m -cp classes de.anomic.data.URLAnalysis -host DATA/EXPORT/20090224213823.txt
also changed the call to generate statistics; please now use
java -Xmx2000m -cp classes de.anomic.data.URLAnalysis -stat DATA/EXPORT/20090224213823.txt
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5650 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
89d8e824ed
memory protection for URLAnalysis
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5649 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0f6fa804ff
performance update to URLAnalysis
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5648 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8444357291
added new row iterator in kelondro table files that enumerates rows
...
without ordering by the primary key. The result is a very fast enumeration of the Eco table data structure. Other table data types are not affected.
The new enumerator is used for the URL export function that can be accessed from the online interface (Index Administration -> URL References -> Export). This export should now be much faster if all URL database files are of type Eco.
The new enumeration is also used by other functions in YaCy, e.g. the initialization of the crawl balancer and the initialization of YaCy News.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5647 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e8f5f2f612
added tool to analyse url strings
...
and to generate statistics about words occurring in urls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5646 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
6117e083e5
option to customize tray label (tooltip) with tray.label
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5642 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b8c3803bfc
don't panic when canceling server sessions
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5641 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
de714783b1
- added host, path, filename to search result
...
- modified yacyinteractive, now also shows the date
- added size attribute to export file in xml format
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5639 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
9519d84372
changed "dooble" variable to "browserintegration" to be less specific
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5636 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
8429083972
adjusted tray for dooble:
...
you can now set dooble=true in yacy.init to disable the menu and browser popups by default
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5633 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ef62ec635e
removed overwriting of logging config
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5629 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c852d2d70e
- reject too old seeds
...
- do not store the complete seed in the reverse name cache, only the hash of the peer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5628 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
aca973e2d9
catch more exceptions
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5627 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9559bc23fd
automatic clean-up of dead connections
...
(hope that works well..)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5626 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
hermens
02dfd6183b
Fix logging in serverCore
...
Prevent NPEs from keeping stopped Sessions in the pool and blocking slots
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5625 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
hermens
d30456e2c8
Fix logging in serverCore
...
Prevent NPE:
I 2009/02/20 15:15:56 PLASMA check for Session_77.37.19.225:38812#0: 86515 ms alive, stopping thread
I 2009/02/20 15:15:56 PLASMA Closing main socket of thread 'Session_77.37.19.225:38812#0'
E 2009/02/20 15:15:56 SERVER receive interrupted - exception 2 = Socket closed
Exception in thread "Session_77.37.19.225:38812#0" java.lang.NullPointerException
at de.anomic.server.serverCore$Session.run(serverCore.java:623)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5624 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4f9dae2571
remove reference in crawl entries
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5623 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1ba4301920
automated interruption of dead incoming connections if they have been open for more than one minute
...
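A watchdog for incoming sessions can remember when each session thread started and interrupt those alive for more than one minute. This is a hypothetical sketch, not serverCore's code. Interrupting a thread does not unblock a plain blocking socket read in Java, so a real server also has to close the session's socket (the serverCore log lines quoted further down show both "stopping thread" and "Closing main socket").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical watchdog that interrupts incoming sessions older than one minute. */
final class SessionReaper {
    static final long MAX_SESSION_MS = 60_000L;
    private final Map<Thread, Long> started = new ConcurrentHashMap<>();

    void register(Thread session, long nowMs) { started.put(session, nowMs); }

    void unregister(Thread session) { started.remove(session); }

    /** Interrupts and forgets every session older than the limit; returns the interrupted threads. */
    List<Thread> reap(long nowMs) {
        List<Thread> interrupted = new ArrayList<>();
        for (Map.Entry<Thread, Long> e : started.entrySet()) {   // safe to remove while iterating a ConcurrentHashMap
            if (nowMs - e.getValue() > MAX_SESSION_MS) {
                e.getKey().interrupt();   // the session's socket should be closed as well
                started.remove(e.getKey());
                interrupted.add(e.getKey());
            }
        }
        return interrupted;
    }
}
```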
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5622 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c12bb8a6d0
- refactoring of the http client
...
- added a protection against memory leaks for the access tracker
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5621 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5d3983faae
the soLinger parameter was wrong.
...
With soLinger=true the httpd loses connections.
The effect can be seen when crawling the internal repository:
lost connections filled the client process queue until it was full
and no more connections were possible.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5620 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
62505bb3cb
more bugfixes as recommended by findbugs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5619 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6b450d09ca
some fixes recommended by findbugs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5618 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4db80065ac
select more
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5617 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
94c42691d8
- reject fewer transmissions as transmission receiver
...
- as sender, do not flag receivers too aggressively when something goes wrong during transmission
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5616 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f887fc159f
try to reduce the large number of unclosed incoming connections
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5615 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e04a0e05c3
fix for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5614 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a9ad863686
second part of 'doubles' fix - better handling of doubles in RAMIndex. More logging.
...
still missing: deletion of double entries in collections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5613 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
59427064fb
first part of 'doubles' fix (not fully ready yet)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5612 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
26978b2a25
- better memory protection in kelondro caches: computation of needed memory for cache grow
...
- removed excessive gc calls
- step to 16 vertical DHT partitions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5611 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
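A hypothetical sketch of "computation of needed memory for cache grow": before a cache grows, estimate the extra bytes it will need and compare them with what the JVM can still allocate. `MemoryGuard`, the per-row size estimate and the safety reserve are assumptions, not kelondro code; the check uses only `Runtime` figures and deliberately does not call `System.gc()`.

```java
/** Hypothetical pre-check used before a cache is allowed to grow. */
final class MemoryGuard {
    /** Bytes the JVM could still allocate: headroom up to -Xmx plus currently free heap. */
    static long available() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    }

    /** Pure decision rule, separated out so it can be tested without a real heap. */
    static boolean fits(long neededBytes, long reserveBytes, long availableBytes) {
        return neededBytes + reserveBytes <= availableBytes;
    }

    /** Grow only if the additional rows, plus a safety reserve, fit into the remaining heap. */
    static boolean mayGrow(int currentRows, int newRows, int bytesPerRow, long reserveBytes) {
        long needed = (long) Math.max(0, newRows - currentRows) * bytesPerRow;
        return fits(needed, reserveBytes, available());
    }
}
```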
lotus
e9e2fff47a
better scaling on performance graph
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5610 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
4aad461100
added UPnP support
...
YaCy can now automatically forward ports on home routers
off by default
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5609 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
99b9788e54
fix for possible 100% CPU caused by concurrent access of HashMap
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5607 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
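A HashMap that several threads modify at once is not just racy: in JDKs of that era a concurrent resize could leave a cycle in a bucket chain, after which a `get()` loops forever at 100% CPU. A minimal sketch of the usual remedy, assuming the map holds simple counters; `SharedCounters` and `hammer` are illustrative names, not the classes changed in the commit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SharedCounters {
    // ConcurrentHashMap instead of HashMap: safe for concurrent reads and writes.
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    void increment(String key) {
        counts.merge(key, 1, Integer::sum); // atomic read-modify-write per key
    }

    int get(String key) {
        return counts.getOrDefault(key, 0);
    }

    /** Lets several threads update the same keys concurrently. */
    static SharedCounters hammer(int threads, int perThread) throws InterruptedException {
        SharedCounters c = new SharedCounters();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) c.increment("k" + (i % 100));
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return c;
    }
}
```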
orbiter
be0c492ae5
fix for memory leak bug in new dht transmissions
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5606 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
hermens
2173865f92
Prevent race condition when switching timezones.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5605 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
40d9849aa4
- better control of chunk size in dht selection
...
- more restrictive values in selection
- step to 4 vertical partitions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5603 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
30a1de41b3
disabled the BufferedIOChunks, because I consider it broken.
...
I will try to fix that, but it is better to not use a buffer than using a broken buffer.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5600 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
411f2212f2
more memory leak fixing hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5599 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
985d421f91
found and fixed some memory leaks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5596 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
333489420b
- fix for NPE when loading the cytag image
...
- some hacks for less memory usage:
-- less usage of buffer and cache memory in EcoFS
-- buffer allocation on-demand in BufferedIOChunks
-- removed largest ybr idx
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5595 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6a32193916
- refactoring of cache naming in web index cache (no more dht semantics there)
...
- activating a feature in the thread dump that cuts off the dumping of traces of inside-java-core events
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5593 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6c627dbdff
update to the server core
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5591 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5393f356aa
fix for termination problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5589 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6a876ecb88
first fixes to the DHT transmission process
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5588 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c25c334b75
replaced the old DHT transmission method with a new method. Many things have changed! Some of them:
...
- after an index selection is made, the index is split into its vertical components
- from different index selections, the split components can be accumulated before they are placed into the transmission queue
- each split chunk gets its own transmission thread
- multiple transmission threads are started concurrently
- the process can be monitored with the blocking queue servlet
To implement that, a new package de.anomic.yacy.dht was created. Some old files have been removed.
The new index distribution model using a vertical DHT was implemented. An abstraction of this model
is implemented in the new dht package as interface. The freeworld network has now a configuration
of two vertical partitions; sixteen partitions are planned and will be configured if the process is bug-free.
This modification has three main targets:
- enhance the DHT transmission speed
- with a vertical DHT, searches will speed up: two times with two partitions, sixteen times with sixteen.
- the vertical DHT will apply a semi-DHT for URLs, and peers will receive a fraction of the overall URLs they received before:
with two partitions, half as many; with sixteen partitions, 1/16 of the previous number of URLs.
BE CAREFUL, THIS IS A MAJOR CODE CHANGE, POSSIBLY FULL OF BUGS AND HARMFUL THINGS.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5586 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
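The vertical split described above can be sketched as follows: every URL reference in a word's index entry is assigned to one of N partitions based only on the URL's identity, so each chunk can be queued and transmitted by its own thread. This is a minimal sketch under assumptions: `VerticalSplit` is a hypothetical class, and `String.hashCode()` stands in for whatever bits of the URL hash YaCy actually uses to compute the vertical position.

```java
import java.util.ArrayList;
import java.util.List;

final class VerticalSplit {
    /** Hypothetical vertical position of a URL: any stable function of the URL hash works for the sketch. */
    static int verticalPosition(String urlHash, int partitions) {
        return Math.floorMod(urlHash.hashCode(), partitions);
    }

    /** Splits one word's reference list into one chunk per vertical partition. */
    static List<List<String>> split(List<String> urlHashes, int partitions) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < partitions; i++) chunks.add(new ArrayList<>());
        for (String u : urlHashes) chunks.get(verticalPosition(u, partitions)).add(u);
        return chunks;
    }
}
```

Because the partition depends only on the URL, the same URL always lands in the same partition for every word, which is what gives the semi-DHT for URLs: with 16 partitions, a peer responsible for one partition sees roughly 1/16 of the URLs.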
orbiter
e9a4182e6a
using a concurrent hash map for the template cache
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5584 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
e8ae2599fd
* some refactoring/moves to consoleInterface
...
* added possibility to find maximum possible heap size
you can get it via getWin32MaxHeap.bat
this may cause high system load
moreover, the limit found is no guarantee of stable startups, since it depends on the system configuration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5583 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
01b97ef3f8
added new cybertag-tracking feature that was inspired by itgrl
...
from the forum discussion in
http://forum.yacy-websuche.de/viewtopic.php?p=12612#p12612
The feature will provide two basic entities:
- you can integrate image links which point to your yacy installation anywhere in the web.
the image can be loaded with
<img src="http://<yourpeer>:<yourport>/cytag.png?icon=invisible&nick=<yournickname_or_community_id>&tag=<anything>">
This will place an invisible 1-pixel image. If you change icon=invisible to icon=redpill, you will see a red pill instead.
Use this to track your activity on the web.
- you can view your tracks at
http://localhost:8080/Tracks.html
- There is a public api to your tracks at
http://localhost:8080/api/tracks_p.json
which needs authentication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5581 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
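The `cytag.png` link shown above can be generated programmatically. A minimal sketch, assuming only the three query parameters named in the commit (`icon`, `nick`, `tag`); the class `CytagLink` is hypothetical, and each value is URL-encoded so that nicknames or tags with spaces or special characters stay valid.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class CytagLink {
    /** Builds the src URL of the cytag.png tracking image for a given peer. */
    static String src(String peerHost, int port, String icon, String nick, String tag) {
        return "http://" + peerHost + ":" + port + "/cytag.png"
                + "?icon=" + enc(icon)
                + "&nick=" + enc(nick)
                + "&tag=" + enc(tag);
    }

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8); // form encoding: space becomes '+'
    }
}
```

For example, `src("localhost", 8080, "invisible", "alice", "my blog")` yields `http://localhost:8080/cytag.png?icon=invisible&nick=alice&tag=my+blog`.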
borg-0300
b19bc611b0
gc: better logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5578 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b1f9c00118
fix for bug in merge operator initialization
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5577 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b57c9da1f8
- fixes to doc, ppt, xls parser: better title
...
- fixes to httpd server response header generation
- fixes to a server date computation bug
- new Button in indexControl to view content of url in ViewFile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5576 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
7936e58fe7
* sorry, previous version didn't compile
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5575 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
76cdc59789
* added some conversions to and from UTF-8
...
* this might fix problems on windows systems
(like http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1824 )
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5574 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
65a1de6c05
longer timeout for remote crawl queries
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5573 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
072dd01ac8
more logging for RSS parser (to fix the remote crawl problem)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5572 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9d282d2c16
- renamed interactivesearch to yacyinteractive
...
- added a configuration option to set the pop up page in Config Appearance
- added a minimized header option to yacyinteractive
- fixed a bug in yacysearch: default values when no query is done
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5569 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
180fe81ef7
quick hack to copy new log configuration over old one
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5565 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d3e33fd6c1
removed strange retry logic from DHT transfer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5564 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
db510b5d52
more exception logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5561 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ef82cced01
removed default line 'P2P WEB SEARCH' if no line is given
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5553 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
78b7361937
fixed problem with logging
...
YOU MUST DELETE DATA/LOG TO MAKE THIS WORK! (sorry..)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5552 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
f136ddcfd4
*) this change is supposed to prevent the creation of temporary files by Apache Commons Fileupload library in cases where it is not necessary (as proposed by thq in http://forum.yacy-websuche.de/viewtopic.php?f=8&t=1806 )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5546 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
94110df85a
moved logging partially to kelondro
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5545 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
024da2916b
refactoring of logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5544 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
83ce65707a
(almost) completed partition of classes in kelondro
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5543 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7ee494fde5
more refactoring of kelondro:
...
- separated BLOB from table classes
- renamed 'coding' package to 'order'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5542 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
d4281b78da
dynamic memory scale
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5541 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bf93767ec6
refactoring of kelondro database classes
...
(to be continued)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5540 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fc27bf8c4c
refactoring of kelondro classes:
...
kelondro shall become independent from other packages.
moved bytebuffer, date and memory to kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5539 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fe77fc3d62
- added new property setting 'repositoryPath'
...
which can be used to map any path to http://localhost:8080/repository/
This can be used to do intranet indexing without setting up
symbolic links, which do not work in Windows environments.
Now also Windows users can index their file system easily
using the intranet use case.
- fixed some problems with the identification of the alternative
path in DATA/HTDOCS in the httpd file server
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5538 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
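The `repositoryPath` mapping can be sketched as a resolver that turns a request path below `/repository/` into a file below the configured directory. This is an illustration under assumptions, not the httpd file server code: `RepositoryMapping` is hypothetical, and the traversal check is an addition that a real mapping would presumably need as well.

```java
import java.nio.file.Path;
import java.util.Optional;

final class RepositoryMapping {
    static final String PREFIX = "/repository/";

    /** Resolves a request path under /repository/ against the configured repositoryPath. */
    static Optional<Path> resolve(Path repositoryPath, String requestPath) {
        if (!requestPath.startsWith(PREFIX)) return Optional.empty();
        Path root = repositoryPath.toAbsolutePath().normalize();
        Path target = root.resolve(requestPath.substring(PREFIX.length())).normalize();
        // Refuse "../" (or absolute) tricks that would escape the mapped directory.
        return target.startsWith(root) ? Optional.of(target) : Optional.empty();
    }
}
```

Because the mapping is done by the server itself, it works the same way on Windows, where the symbolic-link approach is not available.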
orbiter
6cbca1e508
extended last fix, preventing more sorts
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5533 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f9672d3f97
applied fix for inefficient put method as recommended by celle, see
...
http://forum.yacy-websuche.de/viewtopic.php?p=12424#p12424
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5532 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
3484e55be4
- small fix for bookmarksDB
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5527 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
6dd52422ea
- added two dialogs to manage bookmark tags in YaCy-UI
...
- fixed renameTag() in bookmarksDB
- added /api/bookmarks/tags/addTag.xml
- added /api/bookmarks/tags/editTag.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5525 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3154926311
some better memory protection and OOM prevention in EcoFS
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5523 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
aaafe05c02
* revert debug change
...
* contains instead of startsWith, because there might be localized strings
* decode punycode for every domain part separately (see http://forum.yacy-websuche.de/viewtopic.php?f=9&t=1749 )
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5516 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
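Decoding punycode label by label can be sketched with the JDK's `java.net.IDN`. The class `HostDecoding` is hypothetical and is not YaCy's own decoder; the loop only makes the per-label rule explicit and leaves non-ACE labels (those without the `xn--` prefix) untouched.

```java
import java.net.IDN;
import java.util.Locale;
import java.util.StringJoiner;

final class HostDecoding {
    /** Decodes every dot-separated label on its own; labels without the "xn--" prefix are kept as-is. */
    static String toUnicode(String host) {
        StringJoiner out = new StringJoiner(".");
        for (String label : host.split("\\.", -1)) {
            boolean ace = label.toLowerCase(Locale.ROOT).startsWith("xn--");
            out.add(ace ? IDN.toUnicode(label) : label);
        }
        return out.toString();
    }
}
```

For example, `toUnicode("www.xn--mnchen-3ya.de")` returns `www.münchen.de`. Note that `IDN.toUnicode` never throws: if a label cannot be decoded, it returns that label unchanged.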
f1ori
5570fa817b
* remove & from openBrowser command (see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1728&hilit=#p12321 )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5515 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
419469ac27
added more methods to control the vertical DHT (not yet active .. )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5514 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
4ef6b15eb8
limit -Xmx setting to 1999m on win32. bigger values would never work.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5513 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
dedfc7df7f
removed distinction between DHT-in and DHT-out. This is necessary to make room for the new cell data structure, which cannot use this distinction in the first place, but will provide the same semantics through different mechanisms (segments, later)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5511 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b74159feb8
preparations to integrate the new 'cell' index data structure
...
(this commit is just to move development files to my other computer, no functionality change so far)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5509 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d399444e49
added debug information to class loader
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5508 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5080fc33bf
fix for http://forum.yacy-websuche.de/viewtopic.php?p=12247#p12247
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5506 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
335d6ce8fc
fix for class loading problem
...
see also http://forum.yacy-websuche.de/viewtopic.php?p=12153#p12153
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5505 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
78778df464
*) this should adjust the Dev/Main detection of the updater to the new version numbers (0.7x is Dev, if x != 0)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5504 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
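The updater rule "0.7x is Dev, if x != 0" can be expressed as a small check on the version string. A minimal sketch, assuming versions look like `0.70` or `0.71`; the class `ReleaseType` is hypothetical and ignores any suffixes such as an SVN revision.

```java
final class ReleaseType {
    /**
     * "0.7x is Dev, if x != 0": looks at the second digit after the decimal point,
     * so "0.70" is a main release and "0.71" a development release.
     */
    static boolean isDevelopment(String version) {
        int dot = version.indexOf('.');
        if (dot < 0 || dot + 2 >= version.length()) return false; // e.g. "0.7": no x digit, treat as main
        char x = version.charAt(dot + 2);
        return Character.isDigit(x) && x != '0';
    }
}
```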
orbiter
b423d0a036
moved all servlets from htroot/xml to htroot/api
...
the file server contains a patch that temporarily maps all xml paths to api,
so all interfaces still work. Please adapt all your interfaces to the new path.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5497 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
91af105373
last changes before release
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5493 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
7eade3f181
* fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1728
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5489 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d1bace5e4d
enhanced cleanup function
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5488 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
cb76d9e0e4
more synchronization in BLOBHeap (this will not fix the problem with the Runtime-Error reported in the forum)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5487 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ff41da613e
removed exception printout during load of snippets
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5484 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
814a28775f
removed thread dump writing in case of invocation target exception in httpd (looked bad, not serious)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5483 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bed38a5f8c
fix for uncaught exception in RSSReader
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5482 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
05c235de32
fix for npe
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5481 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
7608944081
*) bugfix for REMOTE_HOST environment variable in CGI code (shows hostname of client instead of hostname of YaCy peer now)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5480 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
a6b29cf72c
reverted change of search event processing in SVN 5460. The new code did not work properly,
...
it gave remote search requests too little time
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5479 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9ef77d57f5
added an access control to the search interface using white/blacklists:
...
in the network configuration, you can configure a whitelist and a blacklist:
- blacklisted clients cannot search
- whitelisted clients never get any search restrictions
- all other clients get DoS search restrictions
Please see the example configuration in yacy.network.freeworld.unit.
By default, all clients from localhost are whitelisted.
If you have your own YaCy network, please put all the IPs of your peers into the whitelist.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5475 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
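The three-way decision described above can be sketched as follows. This is a minimal illustration under assumptions: `SearchAccessControl` is hypothetical, entries are matched as exact IP strings (the real network configuration may support patterns), and the blacklist is assumed to take precedence over the whitelist.

```java
import java.util.Set;

final class SearchAccessControl {
    enum Decision { DENY, UNRESTRICTED, DOS_LIMITED }

    private final Set<String> whitelist;
    private final Set<String> blacklist;

    SearchAccessControl(Set<String> whitelist, Set<String> blacklist) {
        this.whitelist = Set.copyOf(whitelist);
        this.blacklist = Set.copyOf(blacklist);
    }

    Decision decide(String clientIp) {
        if (blacklist.contains(clientIp)) return Decision.DENY; // blacklisted clients cannot search
        if (isLocal(clientIp) || whitelist.contains(clientIp)) return Decision.UNRESTRICTED;
        return Decision.DOS_LIMITED; // everyone else gets the DoS search restrictions
    }

    /** localhost is whitelisted by default. */
    private static boolean isLocal(String ip) {
        return ip.equals("127.0.0.1") || ip.equals("::1") || ip.equals("0:0:0:0:0:0:0:1");
    }
}
```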
orbiter
efe801173c
better dht-in cache flush. see also:
...
http://forum.yacy-websuche.de/viewtopic.php?p=11936#p11936
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5472 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
941ab78d9b
better termination for blocking threads
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5471 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
apfelmaennchen
3dc208fad0
bugfix: bookmarks can now handle folder names like /news and /newspaper without getting confused...
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5470 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e948df68ac
longer timeout for queues during shutdown
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5469 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
2b32248079
fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1516&p=10545#p10545
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5468 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
c1330f5743
*) added environment variable DOCUMENT_ROOT
...
*) caught exception
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5466 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
f26b8fcb1b
*) comment mode is 'moderated' instead of 'activated' by default now (to avoid spam being visible)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5465 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b2a8c653ee
small fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5464 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f675d47f86
better protection against database failures
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5463 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4f45605f04
small update for timing in search result processing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5460 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9d119c6b61
migration of auto-update rules to new release strategy:
...
next stable will be 0.7, development releases are 0.*x, experimental will be if x = 1, 2, 3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5458 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4d5b401f00
try to fix some performance problems with the internal index management:
...
- ensuring that ordered indexes stay ordered during remove
- no unnecessary ordering checks
- better test logic in crawl stacker
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5457 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
borg-0300
a0605325bb
fixed a NullPointer Exception
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5452 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b2b7edae18
fixed interactive search
...
- added dummy servlet class, because otherwise the template engine is not triggered.
This is because the yacy httpd works much faster as a normal file server without a scan
of the served pages. Therefore, each page with templates must now have a class file associated with it.
- fixed json output format of yacysearch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5449 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
2be119f0df
adjusted big peer to 28M links
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5448 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c6880ce28b
removed the permanent cache flush and replaced it with a periodic cache flush
...
The cache is now flushed only for one second every ten seconds. During a crawl the cache
fills up completely, and is only flushed if space is needed for more documents.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5446 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
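The periodic flush rule can be written as a pure decision function. This is one reading of the commit message, stated as an assumption: flush during the first second of every ten-second period, and additionally whenever the cache is full and space is needed for more documents. `FlushPolicy` and its parameters are hypothetical.

```java
final class FlushPolicy {
    static final long PERIOD_MILLIS = 10_000L; // "every ten seconds"
    static final long WINDOW_MILLIS = 1_000L;  // "only for one second"

    /** True if the cache should be flushed at time nowMillis. */
    static boolean shouldFlush(long nowMillis, int cacheSize, int maxCacheSize) {
        if (cacheSize >= maxCacheSize) return true; // space is needed for more documents
        return Math.floorMod(nowMillis, PERIOD_MILLIS) < WINDOW_MILLIS;
    }
}
```

During a crawl, the cache reaches its maximum quickly, so in practice most flushing is then triggered by the size condition rather than by the time window.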
orbiter
ef7fe537c5
fixed a cache-bug in cachedFileRA
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5445 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6c7e83909b
- refactoring of data access methods to be prepared for new cell data structure
...
- removed a memory overhead in collections to prevent OOM exceptions in low-memory configurations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5443 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
c8451614f3
fix for overflow
...
http://forum.yacy-websuche.de/viewtopic.php?p=11696#p11696
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5440 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c4c4c223b9
fixed a problem with attribute flags on RWI entries that prevented proper selection of index-of constraint
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5437 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
6072831235
no cr transmission for robinson peers
...
see also: http://forum.yacy-websuche.de/viewtopic.php?p=10290#p10290
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5436 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
afe98bc11c
*) added changes as proposed by Halborinda in http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1674
...
*) changed indentation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5434 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
07fc115e90
removed active profiling in kelondroRowSet
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5433 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
be4c458951
refactoring (implemented Iterable in kelondroRowCollection)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5432 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
bb5c2cd12e
*) ISINDEX parameters will no longer be put on the command line, to prevent possible security hazards (better safe than sorry). In the ISINDEX case, too, parameters will have to be read from QUERY_STRING, which does not seem to be uncommon behaviour for web servers: http://vms.pdv-systeme.de/users/martinv/cgi_basics/cgi_basics.html#Datenuebergabe
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5431 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
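Passing request parameters only through the environment can be sketched with `ProcessBuilder`. This is not YaCy's CGI code: `CgiLauncher` and its parameters are hypothetical. `GATEWAY_INTERFACE`, `REQUEST_METHOD` and `QUERY_STRING` are standard CGI/1.1 variables, and `DOCUMENT_ROOT` is the common extension mentioned in a later commit.

```java
import java.util.List;
import java.util.Map;

final class CgiLauncher {
    /**
     * Prepares (but does not start) a CGI process. Query parameters go only into
     * QUERY_STRING; the command line holds just the script path, so nothing from
     * the request can turn into an extra argument.
     */
    static ProcessBuilder prepare(String scriptPath, String method, String queryString, String documentRoot) {
        ProcessBuilder pb = new ProcessBuilder(List.of(scriptPath)); // no request data in argv
        Map<String, String> env = pb.environment();
        env.put("GATEWAY_INTERFACE", "CGI/1.1");
        env.put("REQUEST_METHOD", method);
        env.put("QUERY_STRING", queryString == null ? "" : queryString);
        env.put("DOCUMENT_ROOT", documentRoot);
        return pb;
    }
}
```

The script then reads its parameters from `QUERY_STRING`, both for ISINDEX queries and for normal GET requests.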
orbiter
b6bba18c37
replaced the storing procedure for the index ram cache with a method that generates BLOBHeap-compatible dumps
...
this is a migration step to support a new method to store the web index, which will also be based on the same data structure. Also did a lot of refactoring for a better structure of the BLOBHeap class.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5430 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
db1cfae3e7
*) cleaning up after myself
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5429 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
f547f9a78c
*) added CGI capabilities (run Perl scripts and other software via HTTP GET and POST)
...
*) set cgi.allow to true in yacy.conf to enable CGI (CGI is disabled by default)
*) edit cgi.suffixes in yacy.conf if necessary to use additional script types
ATTENTION: This is a rather experimental feature, not all environment variables are set yet.
Only enable CGI if you know what you are doing. Poorly implemented CGI scripts can put a system's integrity at risk!
Implementation of more environment variables and documentation is due in the next few days.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5428 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
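How the two settings could be evaluated is sketched below. The property names `cgi.allow` and `cgi.suffixes` come from the commit; the comma-separated list format, the empty default suffix list and the class `CgiConfig` are assumptions.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

final class CgiConfig {
    final boolean allow;
    final Set<String> suffixes;

    CgiConfig(Properties conf) {
        this.allow = Boolean.parseBoolean(conf.getProperty("cgi.allow", "false")); // CGI is disabled by default
        this.suffixes = Arrays.stream(conf.getProperty("cgi.suffixes", "").split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    /** True if the requested file should be executed as a CGI script rather than served as-is. */
    boolean isCgi(String fileName) {
        if (!allow) return false;
        int dot = fileName.lastIndexOf('.');
        return dot >= 0 && suffixes.contains(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```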
f1ori
bdc380cd84
* add lastModified to templateCache
...
-> no outdated files from cache anymore...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5427 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
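A cache that remembers each template's modification time can detect outdated entries on lookup. The sketch below assumes templates are files read whole into memory; `TemplateCache` is a hypothetical class, and it uses a `ConcurrentHashMap` like the template cache change mentioned in an earlier commit of this log.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class TemplateCache {
    private static final class Entry {
        final long lastModified;
        final byte[] content;

        Entry(long lastModified, byte[] content) {
            this.lastModified = lastModified;
            this.content = content;
        }
    }

    private final Map<File, Entry> cache = new ConcurrentHashMap<>();

    /** Returns the file content, re-reading it if it changed since it was cached. */
    byte[] get(File f) throws IOException {
        long modified = f.lastModified();
        Entry e = cache.get(f);
        if (e == null || e.lastModified != modified) { // missing or outdated: reload from disk
            e = new Entry(modified, Files.readAllBytes(f.toPath()));
            cache.put(f, e);
        }
        return e.content;
    }
}
```

Two threads may occasionally reload the same file at once; that costs an extra read but never serves stale content after the file's timestamp changes.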
f1ori
025094675f
* remove empty directory
...
* add necessary dependency for pdfParser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5424 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
c5691180cb
* skip style-tags in HTML-files
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5423 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3567c58b18
added another field to BLOBHeap dumps: the gaps
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5420 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
abdd4aa414
added an index dump for blob heaps:
...
this will increase the shutdown time by at most a few seconds, but will speed up the start-up
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5419 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
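The trade-off (a little extra work at shutdown, a faster start-up) comes from writing the in-memory key index to a file instead of rebuilding it by scanning the whole heap file. A minimal sketch, assuming the index maps string keys to file offsets; `HeapIndexDump` and its on-disk layout are invented for illustration and do not reproduce the BLOBHeap dump format.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

final class HeapIndexDump {
    /** At shutdown: write the number of entries, then each key and its offset in the heap file. */
    static void write(Map<String, Long> index, File dump) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(dump)))) {
            out.writeInt(index.size());
            for (Map.Entry<String, Long> e : index.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
        }
    }

    /** At start-up: read the index back instead of scanning the whole heap file. */
    static Map<String, Long> read(File dump) throws IOException {
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(dump)))) {
            int n = in.readInt();
            Map<String, Long> index = new TreeMap<>();
            for (int i = 0; i < n; i++) index.put(in.readUTF(), in.readLong());
            return index;
        }
    }
}
```

A real implementation must also make sure the dump still matches the heap file (for example after a crash), otherwise it has to fall back to a full scan.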
orbiter
8c3205b62e
fix for OOB Exception
...
see http://forum.yacy-websuche.de/viewtopic.php?p=11598#p11598
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5417 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
78c568331e
added test channel to /xml/feed.rss
...
can be obtained with
http://localhost:8080/xml/feed.rss?set=TEST
always returns a single feed entry with a fresh date
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5416 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e004da48d3
- added fast fingerprint computation for files (any). Will be used in new index dump method
...
- refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5415 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
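The commit does not say how the fast fingerprint is computed. One common approach, assumed here, hashes the file length plus only the first and last 64 KiB with MD5 instead of reading the whole file. `FastFingerprint` and the sample size are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class FastFingerprint {
    static final int SAMPLE = 64 * 1024;

    /** MD5 over the file length, the first SAMPLE bytes and the last SAMPLE bytes, as 32 hex digits. */
    static String of(File file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            md.update(ByteBuffer.allocate(Long.BYTES).putLong(length).array()); // size is part of the print
            byte[] head = new byte[(int) Math.min(SAMPLE, length)];
            raf.readFully(head);
            md.update(head);
            if (length > SAMPLE) {
                raf.seek(Math.max(SAMPLE, length - SAMPLE)); // tail sample, never overlapping the head
                byte[] tail = new byte[(int) (length - raf.getFilePointer())];
                raf.readFully(tail);
                md.update(tail);
            }
        }
        return String.format("%032x", new BigInteger(1, md.digest()));
    }
}
```

The cost is constant regardless of file size; the price is that a change confined to the middle of a large file, without a change in length, goes unnoticed.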
f1ori
2d2ce24011
* remove all encoding-stuff from proxy
...
encoding is handled by parsers or browser, proxy only passes through
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5410 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
73c8a0839c
* abort download, when proxy connection is closed
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5409 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
bb935fdbb0
less organization overhead for DNS caching and prefetching
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5408 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
4907697cfa
* make file uploads larger than 65500 bytes possible through the proxy
...
* remove gzip-encoding for files from cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5407 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fc8189f3fb
better self-healing of corrupted databases
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5406 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
963da8c3f9
* updated tm-extractors to new version 1.0
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5405 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e34ac22fbd
- added new monitoring servlet at
...
http://localhost:8080/PerformanceConcurrency_p.html
- used the new monitoring to do some fine-tuning of the indexing queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5402 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
449e697436
fix for null-seed in seedfile
...
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1653
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5401 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d376d81fc4
replaced busy thread control of crawl stacker by blocking threads
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5400 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
f29b48d9ff
patch for IndexOutOfBoundsException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5399 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
0881190b19
* Robots.txt: don't interpret Crawl-Delays for other robots
...
fixes: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1647
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5398 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
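Honouring only the Crawl-delay lines that apply to one's own robot can be sketched as a small robots.txt scan that tracks which User-agent group each rule belongs to. This is a simplified illustration, not YaCy's robots.txt parser: `CrawlDelayParser` is hypothetical, agent names are matched exactly (case-insensitively), and a delay for the named robot wins over one for `*`.

```java
import java.util.Locale;
import java.util.OptionalInt;

final class CrawlDelayParser {
    /** Returns the Crawl-delay (seconds) for robotName; delays in groups for other robots are ignored. */
    static OptionalInt crawlDelay(String robotsTxt, String robotName) {
        String me = robotName.toLowerCase(Locale.ROOT);
        boolean groupIsMine = false, groupIsWildcard = false, inAgentLines = false;
        Integer specific = null, wildcard = null;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw.replaceFirst("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                if (!inAgentLines) { // first User-agent line after rules: a new group starts
                    groupIsMine = false;
                    groupIsWildcard = false;
                }
                inAgentLines = true;
                String agent = value.toLowerCase(Locale.ROOT);
                if (agent.equals("*")) groupIsWildcard = true;
                else if (agent.equals(me)) groupIsMine = true; // simplified exact-name match
            } else {
                inAgentLines = false;
                if (key.equals("crawl-delay")) {
                    try {
                        int delay = (int) Math.ceil(Double.parseDouble(value));
                        if (groupIsMine) specific = delay;
                        else if (groupIsWildcard && wildcard == null) wildcard = delay;
                    } catch (NumberFormatException ignored) {
                        // malformed value: ignore the line
                    }
                }
            }
        }
        Integer result = specific != null ? specific : wildcard;
        return result == null ? OptionalInt.empty() : OptionalInt.of(result);
    }
}
```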
orbiter
243e73f53b
removed unnecessary usage of kelondroBLOBTree
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5397 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8cb7170b75
- set status of kelondroTree, kelondroBLOBTree and kelondroFlexTable to deprecated
...
- removed initialization and/or usage of kelondroFlexTable (should no longer be in use by now)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5396 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7535fd7447
- refactoring of CrawlEntry and CrawlStacker
...
- introduced blocking queues in CrawlStacker to make it ready for concurrency
- added a second busy thread for the CrawlStacker
The CrawlStacker is multithreaded. It shall be transformed into a BlockingThread in another step.
The concurrency of the stacker will hopefully solve some problems with cases where DNS blocks.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5395 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
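The move from busy threads to blocking queues can be sketched with a `LinkedBlockingQueue` shared by several worker threads: workers sleep in `take()` until a URL arrives, so one slow check (such as a blocking DNS lookup) no longer stalls the whole stacker. `BlockingStacker`, the `check` function and the stop sentinel are hypothetical; the `accepted` queue passed in must be thread-safe.

```java
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

final class BlockingStacker {
    // Unique sentinel compared by identity; one per worker tells it to stop.
    private static final String POISON = new String("<stop>");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread[] workers;

    /** check returns the accepted entry, or null to reject the URL. */
    BlockingStacker(int threads, Function<String, String> check, Queue<String> accepted) {
        workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    String url;
                    while ((url = queue.take()) != POISON) { // blocks instead of busy-polling
                        String entry = check.apply(url);     // may block, e.g. on a DNS lookup
                        if (entry != null) accepted.add(entry);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "stacker-" + i);
        }
    }

    void start() {
        for (Thread w : workers) w.start();
    }

    void enqueue(String url) {
        queue.add(url);
    }

    /** Lets the workers drain everything queued so far, then stops them. */
    void shutdown() throws InterruptedException {
        for (int i = 0; i < workers.length; i++) queue.add(POISON);
        for (Thread w : workers) w.join();
    }
}
```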
lotus
18513e2ee2
npe fix: http://forum.yacy-websuche.de/viewtopic.php?t=1646
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5393 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
2802138787
- refactoring of CrawlStacker (to prepare it for new multi-Threading to remove DNS lookup bottleneck)
...
- fix of shallBeOwnWord target computation heuristic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5392 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
db6b3bf5a3
speed enhancement for integrated http server:
...
- tuning hacks in template engine
- bypassing the template engine if no servlet present
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5389 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
7cd08bd5fb
fix for NPE in BLOBCompressor
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5388 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
5b94498643
fine-tuning of cache usage from SVN 5386 and a bug fix for overflow in available() method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5387 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1779c3c507
- added a read cache to the RAFile interface to RandomAccessFile
...
- added a write buffer to BLOBHeap
- modified the BLOBBuffer (is now only to buffer non-compressed content)
- added content compression to the HTCache
The new read cache will decrease the start/initialization time of BLOB files,
like the HTCache, RobotsTxt and other BLOBHeap structures.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5386 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
e1acdb952c
fix for problem with userDB and bookmarksDB which was caused by changes in kelondroRA in SVN 5376
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5385 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
4a2dac659e
more speed hacks:
...
- modified and activated write buffer
- increased cache flush factor
- fixed a problem with deadlocking of indexing process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5382 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
47292e696a
more performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5379 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
759cef23dd
fix for bug in kelondroAbstractRA.readFully
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5378 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d39d420b39
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5376 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0b4808ba3d
added new interactive search feature:
...
- while the user types a search query, the local database is searched
- results are presented interactively
This was implemented using a new JSON result format for search results in YaCy
- added JSON as file format for servlets
- refactoring of current search servlets (xml and html)
- added JSON output format for search results
- added an AJAX-based search page that uses the yacysearch.json servlet to print results as a query is typed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5373 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
74a3d86114
fixed an error response that might reveal confidential information
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5372 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c6525ab75f
fix for NPE in seed handling
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5371 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
1951d30a62
addendum to last commit
...
handle words with length < 3 correctly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5369 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
325ba7bfb8
only query words with length > 2
...
this is not complete yet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5368 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
low012
e423fa9846
*) added method to only get file names in directory listing which match a filter
...
*) only files which end with .black will be listed as blacklists
*) added a little bit of Javadoc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5366 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
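Restricting a directory listing to blacklist files can be done with the JDK's `File.list(FilenameFilter)` overload. The helper below is a hypothetical sketch of such a method, not the one added in this commit.

```java
import java.io.File;
import java.util.Arrays;

/** Hypothetical helper: list only the blacklist files (*.black) in a directory. */
final class BlacklistFiles {
    static String[] listBlacklists(File dir) {
        // FilenameFilter is a functional interface, so a lambda can be passed
        String[] names = dir.list((d, name) -> name.endsWith(".black"));
        if (names == null) return new String[0]; // dir missing or not readable
        Arrays.sort(names);                      // File.list() guarantees no order
        return names;
    }
}
```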
orbiter
513179f404
changed interface to collectionIndex and adapted all implementing classes:
...
do not return a result of a double-check when adding entries with addUnique
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5363 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
9d64693cfb
reverted the changes to the new concurrent chunkIterator again
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5362 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
45ad1c3dd5
- re-activated concurrent iterator for EcoFiles
...
- added javadoc for new concurrent initialization in kelondroBytesLongMap
- switched default value for commons storage to false
- version step
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5361 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
2e2120046f
speed enhancement for BLOBHeap opening process
...
by running file IO concurrently with content processing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5360 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fa26a8f25a
fix for deadlock-like behavior in balancer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5358 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1918a0173e
added more exception handling during crawling
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5357 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
10f5ec1040
reverted last commit (more testing needed)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5356 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
5af8923f37
* distribute forgotten jar-file in parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5355 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
b0f2003792
fast database initialization and fast start-up of YaCy:
...
- applied the approach of concurrent file stream reading and index processing from the wikimedia reader
to the EcoTable initialization process: the file reader now runs concurrently with the index generation
- also changed some initialization processes to avoid some pauses during initialization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5354 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0ca4bc7b79
- added reader and visualization for mediawiki-export files:
...
files exported from mediawiki using the xml schema according to
http://www.mediawiki.org/xml/export-0.3/
can be processed to be viewed in a YaCy servlet.
To access such a file, place it into
DATA/HTCACHE/mediawiki/
e.g. the export from the German Wikipedia would be:
DATA/HTCACHE/mediawiki/wikipedia.de.xml
This file can then be accessed using the URL
http://localhost:8080/mediawiki_p.html?dump=wikipedia.de.xml&title=YaCy
The first time this is done, an index file is created
(in this case more than 4 million lines must be written, which takes about 15 minutes).
Then try the same URL again.
- also enhanced the md5 computation speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5352 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
danielr
2e63f03ca5
forgot to copy&paste :/
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5351 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
danielr
cd8082b4e3
fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1111#p11166
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5350 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
4f996a7651
fix for logparser pattern
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5349 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
d18c18971e
* dirlisting in UTF-8 encoding
...
* fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550&hilit=#p11108
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5348 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
867d0f2f56
removed some unnecessary pause delays
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5346 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
d49ffcd818
* files distributed by yacy are utf-8, files from repository use the system default charset
...
* fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1564#p11092
and http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5345 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
8c96bc2ac1
do not use proxy caching rules for crawling
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5344 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
dba7ef5144
extended crawling constraints:
...
- removed never-used secondary crawl depth
- added a must-not-match filter that can be used to exclude urls from a crawl
- added stub for crawl tags which will be used to identify search results that had been produced from specific crawls
please update the yacybar: replace the property name 'crawlFilter' with 'mustmatch'.
Additionally, a new parameter named 'mustnotmatch' can be used, which should default to the empty string (match-never)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5342 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
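The two crawl filters combine as "accept if the URL matches `mustmatch` and does not match `mustnotmatch`". The sketch below assumes both are Java regular expressions applied to the whole URL; the class name is hypothetical. With full matching, an empty pattern only matches the empty string, so it never excludes a real URL. That is why the empty string works as match-never; the explicit null check just makes the intent visible.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the mustmatch / mustnotmatch crawl constraint. */
final class CrawlUrlFilter {
    private final Pattern mustMatch;
    private final Pattern mustNotMatch; // null stands for "match-never"

    CrawlUrlFilter(String mustMatch, String mustNotMatch) {
        this.mustMatch = Pattern.compile(mustMatch);
        this.mustNotMatch = mustNotMatch.isEmpty() ? null : Pattern.compile(mustNotMatch);
    }

    /** A URL is crawled only if it matches mustmatch and does not match mustnotmatch. */
    boolean accept(String url) {
        if (!mustMatch.matcher(url).matches()) return false;
        return mustNotMatch == null || !mustNotMatch.matcher(url).matches();
    }
}
```

For example, `new CrawlUrlFilter(".*example\\.org.*", ".*\\.pdf")` crawls example.org but skips its PDF files.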
orbiter
96174b2b56
more debugging / better result status logging for parser/caching errors
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5341 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
90e78b2cf6
* improve encoding detection of http service
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5337 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
ef66438662
- more space in error db to store larger error messages
...
- added hash to HTCACHE storage files which will make it possible to join separate caches by just copying files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5329 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
674ad2d55b
different handling of error cases that occur during loading files with http or ftp:
...
methods now throw an exception instead of returning an error string
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5328 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
danielr
538359a0ff
simple fix to get DHT working again (maybe something more has to be done ;)
...
fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1578
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5327 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
7e1fe05e3c
* added utf8-encoding to many getBytes-calls
...
* utf8 should work now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5323 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
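Why the explicit encoding matters: `String.getBytes()` without an argument uses the platform default charset, so the same string can produce different bytes (and thus different hashes) on different systems. The illustrative class below uses `StandardCharsets.UTF_8` (Java 7+); code from the Java 1.5 era would pass the charset name `"UTF-8"` instead and handle `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;

/** Illustration only: explicit UTF-8 gives identical bytes on every platform. */
final class Utf8Bytes {
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8); // independent of file.encoding
    }

    static String decode(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }
}
```

The five-character word "Grüße" encodes to 7 bytes in UTF-8, because "ü" and "ß" take two bytes each.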
lotus
fad044fb54
update to snippet marker:
...
- do not display indexed html (solves xss issues)
the single words are analyzed for parts that are already marked; this is needed to avoid wrongly encoding the marker (<b>) tags.
- improved speed for existing routine
heavily used regex patterns are now precompiled
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5322 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
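One way to keep the marker tags from being encoded is to find the query words in the raw text and escape each text segment separately, wrapping only the matches in `<b>` tags. This is a different technique from the one described in the commit, which analyzes words for already marked parts. The class `SnippetMarker`, its escaping rules and the case-insensitive word matching are assumptions. The escaping pattern is compiled once, in the spirit of the precompiled patterns mentioned above.

```java
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical snippet marker: find words in raw text, escape every segment, wrap hits in <b>. */
final class SnippetMarker {
    // compiled once and reused for every call
    private static final Pattern SPECIAL = Pattern.compile("[&<>\"]");

    static String escapeHtml(String text) {
        Matcher m = SPECIAL.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String rep;
            switch (m.group().charAt(0)) {
                case '&': rep = "&amp;"; break;
                case '<': rep = "&lt;"; break;
                case '>': rep = "&gt;"; break;
                default:  rep = "&quot;";
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(rep));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Indexed text is always escaped (never shown as raw HTML); query words are marked bold. */
    static String mark(String text, Collection<String> words) {
        StringBuilder alternation = new StringBuilder();
        for (String w : words) {
            if (w.isEmpty()) continue;
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(Pattern.quote(w));
        }
        if (alternation.length() == 0) return escapeHtml(text);
        // one combined pattern per query, applied to the raw text, so inserted tags are never re-matched
        Pattern hit = Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = hit.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escapeHtml(text.substring(last, m.start())));
            out.append("<b>").append(escapeHtml(m.group())).append("</b>");
            last = m.end();
        }
        out.append(escapeHtml(text.substring(last)));
        return out.toString();
    }
}
```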
lotus
16723d0fa6
ask another peer if crawljob loading fails
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5321 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
1b18d4bcf3
enhancement to crawling and remote crawling:
...
- for redirector and remote crawling place crawling url on notice queue instead of direct enqueueing in crawler queue
- when a request to a remote crawl provider fails, remove the peer from the network to prevent the url fetcher from getting stuck again
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5320 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
3f746be5d4
- consolidation and refactoring of many DHT target-computing methods
...
- implemented vertical DHT acceptance ("my own DHT") to accept new targets
- added new target computation for global search: addresses vertical targets also
- enhanced remote crawling: collection of remote crawl urls if queue has less than 100 entries (was: 0 entries)
- better performance value computations for PPM selection in network configuration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5319 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
d014b2728a
Design-check, Extension and Refactoring of DHT target position computation:
...
- two different (but mathematically equivalent) computations of the DHT distance have been consolidated
- moved from 0.0 .. 1.0 double-range position computation to 0 .. Long.Max range for DHT targets
- added fast Long-to-hash computation
- high-precision target computation of gaps for new peers
- added new target computation for horizontal and vertical DHT targets (not yet in use)
- old horizontal-only DHT targets will be upwards compatible to new horizontal and vertical DHT positions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5318 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
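A sketch of positions on a `0 .. Long.MAX_VALUE` ring and a clockwise distance on it. Deriving the position from 63 bits of an MD5 digest, and the clockwise distance definition, are assumptions for illustration; the commit message does not describe YaCy's actual mapping from its base64 hashes to ring positions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical DHT ring with positions in 0 .. Long.MAX_VALUE (ring size 2^63). */
final class DhtRing {
    /** Maps a key to a ring position by taking 63 bits of its MD5 digest (assumed mapping). */
    static long position(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long p = 0;
            for (int i = 0; i < 8; i++) p = (p << 8) | (d[i] & 0xFF);
            return p & Long.MAX_VALUE; // clear the sign bit
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be supported by every Java platform", e);
        }
    }

    /** Clockwise distance from 'from' to 'to', i.e. (to - from) mod 2^63. */
    static long distance(long from, long to) {
        // for a negative difference, clearing the sign bit adds 2^63 (wrap-around)
        return (to - from) & Long.MAX_VALUE;
    }
}
```

Integer positions avoid the rounding of a `0.0 .. 1.0` double range: a double has only 53 bits of mantissa, while the long range keeps all 63 bits.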
orbiter
dd27ce7216
added control logic to ECO tables that deletes ram copies of the tables if they get too large
...
table copies in ram are now abandoned if less than 20 MB ram is left
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5317 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
38e6ba5d00
forgot to re-rename commonsPath
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5316 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
22989d0d8a
added property index.storeCommons to switch commons storage on or off
...
with index.storeCommons=false all currently stored commons are deleted!
Default is now 'true', but in future full releases it will be switched to 'false'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5315 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
4b4ce75396
* http-server: submit charset from html metatags
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5314 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
69e695bd4b
* detect charset for directory index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5313 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
f1ori
340ecd919d
* include non ascii characters in visible characters
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5312 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
5cf0cbb47e
javadoc
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5311 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
lotus
8d07607d1d
update to resource observer:
...
- returns high/medium/low disk space
- pauses crawling on medium disk space
- disables index receive on low disk space
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5310 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
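The three disk-space levels and the reactions listed above could be modelled as follows. The thresholds (1 GB and 100 MB) and all names are hypothetical; the commit message only defines the levels and the two reactions.

```java
import java.io.File;

/** Hypothetical model of the resource observer's disk-space levels. */
final class DiskSpaceObserver {
    enum Level { HIGH, MEDIUM, LOW }

    static final long MEDIUM_LIMIT = 1024L * 1024 * 1024; // below 1 GB: medium (assumed)
    static final long LOW_LIMIT = 100L * 1024 * 1024;     // below 100 MB: low (assumed)

    static Level classify(long freeBytes) {
        if (freeBytes < LOW_LIMIT) return Level.LOW;
        if (freeBytes < MEDIUM_LIMIT) return Level.MEDIUM;
        return Level.HIGH;
    }

    static Level check(File dataDir) {
        return classify(dataDir.getUsableSpace());
    }

    /** Crawling is paused at MEDIUM (and therefore also at LOW). */
    static boolean crawlingAllowed(Level l) { return l == Level.HIGH; }

    /** Receiving index transfers is disabled only at LOW. */
    static boolean indexReceiveAllowed(Level l) { return l != Level.LOW; }
}
```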
f1ori
d0543a7c39
* fix the debug ant-target
...
* fix yacy-subdomain handling (http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1556 )
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5307 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
low012
baae3d91b1
*) fixed warning when compiling listManager
...
*) fixed display of the information for which parts of YaCy (crawler, proxy, ...) a blacklist is activated
*) replaced regular put() with putXML() in several cases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5305 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
a4fb76e93c
undo r5300 (not fixed as seen after longer run)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5303 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
low012
a99a629ed4
*) quick fix to prevent comments for blog entries which don't exist ( http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1554 )
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5302 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
low012
00e27e5050
*) fixed bug which made it possible to write files outside of the DATA/LIST directory when creating a new blacklist
...
*) a blacklist will only be created if no blacklist with same name exists (some refactoring has been necessary for this)
*) further minor fixes
*) to be continued...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5301 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
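A common way to close this kind of hole is to normalize the path built from the user-supplied name and reject it unless it is a direct child of the list directory. `BlacklistPath` and the `.black` suffix check are illustrative assumptions, not the code of this commit.

```java
import java.nio.file.Path;

/** Hypothetical guard against writing blacklist files outside the list directory. */
final class BlacklistPath {
    /** Returns the target file path, or null if the name is unsafe. */
    static Path resolve(Path listDir, String name) {
        if (name.isEmpty() || !name.endsWith(".black")) return null;
        Path base = listDir.toAbsolutePath().normalize();
        Path target = base.resolve(name).normalize(); // collapses "." and ".." segments
        // absolute names and "../" tricks end up with a different parent and are rejected
        if (!base.equals(target.getParent())) return null;
        return target;
    }
}
```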
danielr
0f9c0bd0d5
fix for ConcurrentModificationException at de.anomic.index.indexContainerHeap$heapCacheIterator.next(indexContainerHeap.java:324)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5300 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
103ad2a437
some javadoc
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5299 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
b098522977
some very small advances to index utf-8 (not working yet), also inserted debugging code
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5298 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
2f49666908
integrated the character decoding into the parser, removed old code
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5297 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
49293c1358
fix for deadlock in new encoder :-(
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5296 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
0edec2b760
FULL redesign of algorithms in htmlTools to encode/decode strings from/to unicode and html.
...
The old process used a not really efficient way to detect html encoding strings in texts.
All calling methods have been adapted to call the new class in an enhanced way with fewer parameters.
Many classes in interfaces used only XML encoding (instead of full html conversion from unicode to html); this behavior was not changed with this commit but should be checked again since it points to possible XSS leaks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5295 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
958ec20cd0
removed specialized umlaut handling in html parser. This has to be replaced by something that is able to transfer all possible html encodings into utf-8. Please see SVN 5293 for test cases.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5294 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
f1ori
2e53cbc66a
should compile now
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5292 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
f1ori
f3bf2e379e
should compile again
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5291 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
f1ori
dd8441f102
fix bug: data from plasmaParser is already converted to UTF-8
...
After removing the restrictions in the code, YaCy should be able to index Unicode characters!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5290 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
6941bf42b1
performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5288 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
9b0c4b1063
redesign of parts of the new BLOB buffer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5287 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
1778fb420d
- added some performance tweaks to the new BLOB buffer
...
- removed the now superfluous HT storage thread
- reduced the number of file decompressions by deferring the moment of compression
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5286 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
9663e61449
added another class to handle BLOB writings to the new HTCACHE data storage:
...
- entries are buffered and written as stream with many entries at once (saves many IO accesses)
- entries are compressed with gzip: increases capacity of cache
- concurrency for stream-writing and compression: all writings to the cache are non-blocking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5284 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
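The per-entry gzip step described above maps directly onto the JDK's `GZIPOutputStream` and `GZIPInputStream`. This hypothetical helper covers only the compression of a single entry; the buffering, streamed writing and concurrency of the actual BLOB buffer are not modelled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical per-entry gzip helpers for a BLOB-backed cache. */
final class BlobGzip {
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(raw.length / 2 + 32);
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) { // not expected for an in-memory stream
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // read after close(), which writes the gzip trailer
    }

    static byte[] decompress(byte[] gzipped) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) out.write(buf, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Text-like cache content (HTML, XML) usually compresses well, which is where the capacity gain comes from; already compressed media gains little.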
orbiter
382226da94
fix for bug introduced in SVN 5281: parameters were switched
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5282 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
f2fd043797
refactoring (moved duplicate code into methods)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5281 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
c612046e5e
made r5278 java 1.5 compatible
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5280 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
f1ori
af71ec93bf
oops, forgot to import something
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5279 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
f1ori
9e65e9141c
* always use UTF-8 for encoding hashes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5278 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
826ca79735
refactoring and new architecture to store the files of the web cache:
...
- files are not stored any more as individual files
- a new database structure using BLOBHeap files stores many cache entries in common files
- all file-writing procedures had been migrated to generate byte[] objects which are written with the new database methods
this is only an intermediate step to the final architecture, where cached files are written together with their metadata in one single database structure.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5276 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
f095137238
- respecting httpdMaxBusySessions (refusing new connections if limit is hit)
...
- comments in serverBusyThread converted to JavaDoc
- better debug output for npe-case in diskUsage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5274 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
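Refusing new connections once the session limit is hit, rather than queueing them, can be expressed with the non-blocking `Semaphore.tryAcquire()`. In this sketch `maxBusySessions` plays the role of `httpdMaxBusySessions`; the class `SessionLimiter` is an assumption, not the actual server code.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical session limiter: reject instead of queueing when all slots are busy. */
final class SessionLimiter {
    private final Semaphore slots;

    SessionLimiter(int maxBusySessions) {
        this.slots = new Semaphore(maxBusySessions);
    }

    /** Runs the session if a slot is free; returns false if the connection was refused. */
    boolean serve(Runnable session) {
        if (!slots.tryAcquire()) return false; // limit hit: refuse immediately, do not wait
        try {
            session.run();
        } finally {
            slots.release(); // always free the slot, even if the session throws
        }
        return true;
    }
}
```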
orbiter
8ba33f104e
fix for npe
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5269 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
998861acfd
- some refactoring in BLOBHeap to enable more gap processing functions
...
- better gap merging in BLOBHeap
- shrinking of heap file if gap is at end of file when file is closed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5268 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
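Merging neighbouring gaps and cutting off a trailing gap when the file is closed can be modelled with a `TreeMap` from gap offset to gap length. This is an illustrative model only; `GapList` and its fields are assumptions and do not reflect BLOBHeap's actual data structures.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical free-space list of a heap file: offset -> length of each gap. */
final class GapList {
    final TreeMap<Long, Long> gaps = new TreeMap<>();
    long fileLength;

    GapList(long fileLength) { this.fileLength = fileLength; }

    /** Marks [offset, offset + length) as free and merges it with touching gaps. */
    void free(long offset, long length) {
        Map.Entry<Long, Long> prev = gaps.floorEntry(offset);
        if (prev != null && prev.getKey() + prev.getValue() == offset) { // gap ends where we start
            gaps.remove(prev.getKey());
            offset = prev.getKey();
            length += prev.getValue();
        }
        Long nextLength = gaps.get(offset + length); // gap starting where we end
        if (nextLength != null) {
            gaps.remove(offset + length);
            length += nextLength;
        }
        gaps.put(offset, length);
    }

    /** On close: a gap that reaches the end of the file is cut off instead of kept. */
    void shrinkOnClose() {
        Map.Entry<Long, Long> last = gaps.lastEntry();
        if (last != null && last.getKey() + last.getValue() == fileLength) {
            gaps.remove(last.getKey());
            fileLength = last.getKey(); // a real file would call RandomAccessFile.setLength here
        }
    }
}
```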
lotus
9d50bfd0b3
fix for npe: http://forum.yacy-websuche.de/viewtopic.php?p=10562
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5267 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
766cad6e93
enhancement in memory management of BLOB Heap files / merging of deleted entries
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5266 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
7860d5d632
fix for bug in seed list management (cause was bad class overloading, only visual effects!)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5265 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
ffed5fc415
fixed problem with lost peers in database
...
migrated seedDB from BLOBTree to BLOBHeap
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5263 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
6fb865fbdc
- fix of bug in iterator in kelondroBLOBHeap which caused bug in crawl profile listing
...
- some refactoring of classes that use kelondroMap (Map instead of HashMap)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5262 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
2d65887723
- fix for bug in new profile handling
...
- added a new feature in ymageChart (cannot be seen yet, just wait... will be used in profiling chart)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5261 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
ff68f394dd
fix for problem with balancer and lost crawl profiles:
...
if crawl profile is lost, no robots.txt is loaded any more
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5258 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
lotus
fb8d9850ea
fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1462
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5248 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
lotus
0d1a2f6183
fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1461
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5247 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
9ac16f565b
- fixed several bugs in database management functions
...
- fixed a display bug for the performance graph
- fixed deadlock when initialization of awt happens simultaneously
- removed some debugging output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5245 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
820a03f9d6
- removed some warnings
...
- used fix in SVN 5233 for ysearch.java and search.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5237 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
lotus
fe2792e9ce
use accept-language header instead of user agent for language detection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5235 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
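Choosing a language from the `Accept-Language` header means honouring the optional `q` weights defined for that header in RFC 7231 (section 5.3.5). The hypothetical parser below is simplified: it returns the primary subtag (e.g. `de` for `de-DE`) with the highest weight, ignores `*`, and otherwise falls back to a default.

```java
import java.util.Locale;

/** Hypothetical Accept-Language evaluation: preferred primary language tag, or a fallback. */
final class AcceptLanguage {
    static String preferred(String header, String fallback) {
        if (header == null || header.trim().isEmpty()) return fallback;
        String best = fallback;
        double bestQ = 0;
        for (String part : header.split(",")) {
            String[] fields = part.trim().split(";");
            String tag = fields[0].trim();
            if (tag.isEmpty() || tag.equals("*")) continue;
            double q = 1.0; // weight defaults to 1 when no q parameter is given
            for (int i = 1; i < fields.length; i++) {
                String f = fields[i].trim();
                if (f.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(f.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0; // malformed weight: treat as not acceptable
                    }
                }
            }
            if (q > bestQ) { // q=0 means "not acceptable"; ties keep the earlier entry
                bestQ = q;
                best = tag.split("-")[0].toLowerCase(Locale.ROOT); // "de-DE" -> "de"
            }
        }
        return best;
    }
}
```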
orbiter
c8bdd965ec
- larger update time for status page
...
- balancer writes cause of robots.txt in log file for crawl delay
- removed log output for forced GC
- smaller RAM flush for RWI cache, should cause more usage of cache and faster crawling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5228 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
lotus
dda771db9d
- search result layout
...
- tray only for windows
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5222 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
ce4715e305
removed indexing of anchor links and tagging such words as part of urls (that was wrong)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5219 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
ce57de6cb3
- fixed re-setting of DHT Send/Receive settings
...
- small change to network graphics: smaller circles / more URLs necessary for full radius; more PPM necessary for full crawling circles
- fixed exclusion search ('-' did not work any more)
- fixed NPE bug when FTP loader wrote to the error-db
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5218 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
lotus
31c31e54e4
new tray icon image for different icon sizes (e.g. linux)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5216 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
f1ori
9589dfe080
* removed trayicon popupmenu title
...
* added some menu items to trayicon
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5213 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
lotus
5a637f004d
localized tray
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5212 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
lotus
9d4f0325e1
- removed shutdown from search page (we have it in tray now!)
...
- fixed doubleclick action for tray
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5211 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
lotus
214277dad6
- revert r5202
...
- cleanup
- installer checks for JRE 1.6 only
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5210 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
f1ori
7afa084207
* add native java trayicon, using reflection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5209 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
apfelmaennchen
b97ff24b43
bookmarksDB / xbel.xml:
...
- added support for folder=/foldername
- it crashes if foldername ends with /
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5207 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
6e7d113eac
fix for wrong index initialization after network switch
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5203 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
lotus
0a0cc3bf67
added missing classes to build target "run"
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5201 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
7b35d54c6c
fixed some problems with network switching (was not completely 'clean')
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5200 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
f0b42e5a98
fixed NPE
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5199 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
8e0de7f180
update to language statistic evaluation:
...
- the condenser does not abandon too small words any more before feeding the statistics
- for text indexing no more urls are used to feed the index (this was wrong, but in contrast the indexing of urls for media search is necessary)
- urls are not used any more to feed the statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5197 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
1198eeecc7
added language selection to search query:
...
- the language can be selected using a LANGUAGE:<language> element in the query line, e.g.:
java LANGUAGE:en
- the language can be selected with a request parameter in google-style syntax, using the 'lr' element:
?lr=lang_en&query=java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5193 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
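Both ways of selecting the language could be handled by a helper like the following: a `LANGUAGE:xx` token is removed from the query text, and otherwise an `lr` parameter of the form `lang_xx` is used. The class, the two-letter restriction and the precedence of the query token over `lr` are assumptions for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical extraction of the language constraint from a search request. */
final class QueryLanguage {
    private static final Pattern LANG_TOKEN =
            Pattern.compile("(?:^|\\s)LANGUAGE:([a-zA-Z]{2})(?=\\s|$)");

    /** Query text without the token, plus the language code (null if none was given). */
    static final class Parsed {
        final String query;
        final String language;
        Parsed(String query, String language) { this.query = query; this.language = language; }
    }

    static Parsed parse(String query, String lrParam) {
        Matcher m = LANG_TOKEN.matcher(query);
        if (m.find()) { // the token in the query line takes precedence (assumed)
            String rest = (query.substring(0, m.start()) + " " + query.substring(m.end())).trim();
            return new Parsed(rest.replaceAll("\\s+", " "), m.group(1).toLowerCase(Locale.ROOT));
        }
        if (lrParam != null && lrParam.startsWith("lang_") && lrParam.length() == 7) {
            return new Parsed(query.trim(), lrParam.substring(5).toLowerCase(Locale.ROOT));
        }
        return new Parsed(query.trim(), null);
    }
}
```

Both `parse("java LANGUAGE:en", null)` and `parse("java", "lang_en")` yield the query `java` with language `en`.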