danielr
94d3d3a86f
fixed Proxy (for GET, POST still does not work!)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4665 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
081ed1d3ec
HTTPLoader: reduced stackTraces
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4664 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
8b2efb6f8c
fixed garbage in HTCACHE
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4663 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
225f9fd429
various fixes
...
- shutdown behavior (killing of client sessions)
- EcoFS reading better
- another synchronization in balancer.size()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4662 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
6e36c156e8
added more logging to EcoFS
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4661 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
fb541f9162
HTTPC: default timeout half-hour
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4660 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
a94f6cdca4
HTTPC: allowed self-signed certs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4659 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
ab330cfdca
Network.html: removed ; from location
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4658 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
319144f4b2
fix for out-of-bounds exception in EcoFS chunk iterator
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4657 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
a9cf6cf2f4
generalization of index container-heap class.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4654 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
f099061944
protection against bad dht-flush word selection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4653 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
5e4fddc1e6
more logging for new EcoFS.ChunkIterator to find bug for
...
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1024&hilit=&p=6806#p6806
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4652 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
117ae78001
speed enhancement for reading of eco-table indexes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4647 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
7c149a4ee8
- undo less 'binary data found'
...
- removed duplicate stackTrace
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4643 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
96cce8bed9
reduced 'Binary data found' errors
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4642 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
2aef1414f5
removed test (in yacy.init)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4641 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
5c3c1fdf41
replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
daa04f5db9
added an additional check in the file handler to prevent url attacks from being hidden in url path encodings
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4637 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
783a4c9edb
strong speed enhancements for the index cache dump and restore:
...
storage and loading is 30 times faster! a cache of 100000 RWIs needed 180 seconds
to store and 100 seconds to restore; now the same cache needs only 6 seconds to store and
3 seconds to restore. The cache size has decreased now by 30% (95 MB instead of 150 MB).
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4634 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
442204a1c8
fix for ConcurrentModificationException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4633 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
d2f4926951
- more logging for balancer to get a hint where the problem is
...
- fix for new concurrency method in kelondroSplitTable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4631 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
20dadba426
- added a deadlock prevention function in cache flushing
...
- removed unused methods in collection index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4630 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
764a40e37d
speed enhancements for crawler and url retrieval (affects also search speed)
...
- concurrency for LURL-fetching: this is done using concurrent lookups into the separate url databases. Concurrency is possible because there is no IO during lookup. The more LURL-Tables are present, the greater the speedup. More CPUs will also increase speed
- because a large number of LURL-lookups are made during crawling (for double-check), the LURL-lookup speed enhancement also increases crawling speed
- search speed also profits from the LURL-lookup enhancement
- changed some flushing parameters in word index caching which should make better use of large word index caches and should speed up indexing
- removed flush chunksize parameter, because this was only useful for IO path enhancement feature which was removed some weeks ago to prevent blocking and deadlocks during search requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4628 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
3ce3a4a3a1
added stub for new index container heap data structure (purpose: index folding)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4627 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
2c34038912
addition/correction to last commit: usage of concurrent-classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4626 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
b2150057d2
removed unnecessary cleanup method
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4625 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
lulabad
c4c0d54b22
* added regex extended blacklistengine
...
* removed my own engines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4618 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
368593e449
enhanced the concurrency handling of indexing process (better queue size control, better data concept, better shutdown behavior)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4617 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
be58135b3e
possible fix for deadlock in search execution
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4612 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
0241d070bc
added concurrency to indexing process:
...
- the methods {parsing, semantic analysis (condensing), structure analysis (web structure)} in the serialized indexing path have been made concurrent.
- four BlockingQueues handle concurrency and hand-over of the indexing objects; the last object in the queue is stored into a BlockingQueue of maximum size 1 to serialize the process for storage (which uses IO and therefore should not be made concurrent here)
- a concurrency of (CPUs + 1) is default. Single-CPU users will profit from the change because large files cannot block the indexing process any more.
- removed the secondary indexing thread, which is superfluous now. Concurrency is default for all users.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4609 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
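The queue hand-over described in the commit above can be illustrated with a minimal sketch. This is not the YaCy code: the class name PipelineSketch, the method names, the string "documents" and the poison-pill shutdown are all assumptions made for illustration, and the sketch uses three queues and two concurrent stages rather than the four queues the commit mentions. It only shows the general pattern: (CPUs + 1) workers per concurrent stage, and a final queue of capacity 1 feeding a single storage thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

// Hypothetical illustration; none of these names come from the YaCy sources.
class PipelineSketch {
    // Sentinel compared by identity; tells a worker to stop.
    private static final String POISON = new String("POISON");

    // Starts n workers that move items from 'in' to 'out', applying 'step'.
    static List<Thread> startStage(int n, BlockingQueue<String> in,
                                   BlockingQueue<String> out, UnaryOperator<String> step) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    for (String s = in.take(); s != POISON; s = in.take()) {
                        out.put(step.apply(s));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        return threads;
    }

    // Sends one poison pill per worker, then waits until all workers have finished.
    static void stopStage(List<Thread> workers, BlockingQueue<String> in)
            throws InterruptedException {
        for (int i = 0; i < workers.size(); i++) in.put(POISON);
        for (Thread t : workers) t.join();
    }

    static List<String> run(List<String> docs) {
        int workers = Runtime.getRuntime().availableProcessors() + 1; // CPUs + 1
        BlockingQueue<String> toParse = new LinkedBlockingQueue<>();
        BlockingQueue<String> toCondense = new LinkedBlockingQueue<>();
        // Capacity 1: concurrent stages block until the single storer is ready.
        BlockingQueue<String> toStore = new ArrayBlockingQueue<>(1);
        List<String> stored = new ArrayList<>();
        try {
            List<Thread> parsers = startStage(workers, toParse, toCondense, s -> "parsed(" + s + ")");
            List<Thread> condensers = startStage(workers, toCondense, toStore, s -> "condensed(" + s + ")");
            // Single storage thread: the IO-bound step stays serial.
            Thread storer = new Thread(() -> {
                try {
                    for (String s = toStore.take(); s != POISON; s = toStore.take()) {
                        stored.add(s);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            storer.start();
            for (String d : docs) toParse.put(d);
            stopStage(parsers, toParse);       // all parsed items are now queued for condensing
            stopStage(condensers, toCondense); // all condensed items have been handed to the storer
            toStore.put(POISON);
            storer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stored;
    }

    public static void main(String[] args) {
        // Output order may vary between runs because the stages run concurrently.
        System.out.println(run(Arrays.asList("a", "b", "c")));
    }
}
```

Because the last queue holds at most one element, the concurrent stages can never run far ahead of storage; a large document only occupies one worker while the others keep processing, which matches the benefit for single-CPU users noted in the commit.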
lulabad
9fb5d661f2
added my Blacklistengines
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4608 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
bca87f1e38
- refactoring of serverThreads: renaming to distinguish busy-threads and blocking-threads
...
- added blockingThreads which are threads that are not driven by pause times but by BlockingQueue lookup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4606 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
968c775025
- preparation of parsing/indexing queue for concurrent execution
...
- remote crawl receipts are now transmitted concurrently in separate threads (makes remote crawls much faster!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4605 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
9b0e20fb06
next refactoring step in document indexing to prepare concurrency environment for document parsing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4604 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
7f9f639d20
- refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering
...
- refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling
- removed unused code parts from condenser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
d6050b9ffb
- separated the LURL data storage and Crawl result stack for process supervision.
...
this is another step to enable multiple, concurrent fulltext-indexes
- another try to make the yacy-httpc more stable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4602 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
8d6a13bc07
refactoring of parsing-condensing-indexing process:
...
- separated parts
- removed storagePeer function
next step will be parallelization of processes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4600 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
d3b06913ec
protection against seed-db failure during enumeration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4598 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
5aa96dbc36
fix for shutdown configuration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4596 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
93633abed8
- removed some debugging code from search process - should speed up now
...
- added some profiling code to search event - more time details in PerformanceSearch_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4594 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
fba46c51d7
fixed non-termination bug in qsort
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4593 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
541b817502
refactoring of switchboard queueing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
fc94fbe224
another improvement to the collection sorting
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4589 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
11270d450e
better quicksort-pivot computation: 30% faster (measured with test program)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4588 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
3e44293f07
- fixed a problem with thread pools in row collection
...
- added a line-viewing feature in threaddump
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4587 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
e43051b125
- fixed Threaddump output (html-escaped, e.g. <init>)
...
- in EcoFS converted comments to javadoc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4586 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
433ff855f7
- fixed another concurrency problem in collection sorting
...
- fixed a typing problem that was introduced in svn 4579 and caused the crawl monitor to fail
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4585 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
19286fa2d1
tried to fix seed2.old.db-problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4584 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
orbiter
f3996e63b8
tried to fix more deadlocks:
...
- changed connection modes in ftpc
- replaced sort thread pool in row collections by a new one using util.concurrent. the old pool had caused blocking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4582 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago
danielr
7008a218b3
avoid ConcurrentModificationException in plasmaCrawlerQueues
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4579 6c8d7289-2bf4-0310-a012-ef5d649a1542
17 years ago