orbiter
d1973bae2a
code cleanup: removed unused code and unused methods
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6559 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a3b8b7b5c5
some redesign of the main menu structure:
...
- moved all index generation servlets to their own main menu item, including proxy indexing
- removed external index import because this operation is not recommended any more. Joining an index can simply be done by moving the index files from one peer to the other peer; they will be merged automatically
- fix to prevent endless loops when disconnecting http sessions
- fix to prevent application of bad blacklist entries that can cause a 'Dangling meta character' exception
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6558 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
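The 'Dangling meta character' exception named in r6558 above is what `java.util.regex.Pattern.compile` reports when a pattern begins with a quantifier such as `*`. The following is a minimal sketch of rejecting such entries before they are applied. The class and method names are hypothetical and are not taken from the YaCy blacklist code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper for illustration; not the actual YaCy blacklist class.
class BlacklistPatternCheck {

    /** Returns only those entries that compile as regular expressions. */
    static List<String> validEntries(List<String> entries) {
        List<String> valid = new ArrayList<>();
        for (String entry : entries) {
            try {
                Pattern.compile(entry);
                valid.add(entry);
            } catch (PatternSyntaxException e) {
                // A leading quantifier such as "*.example.com" ends up here.
                System.err.println("ignoring bad blacklist entry: " + entry);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        System.out.println(validEntries(Arrays.asList(".*\\.example\\.com", "*.example.com")));
    }
}
```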
lotus
ab3cf60dbe
fix for npe
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6557 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
7f20963b41
add-on to last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6556 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
eeca2ded92
fix for http://forum.yacy-websuche.de/viewtopic.php?p=18500#p18500
...
- catch uncaught OOM
- less wasting of memory
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6555 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
lotus
32972139af
added nice configuration for the resource observer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6554 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
bb2e03761c
- fix for deadlock with 100% CPU during search
...
- fix for failure of ranking because of a ConcurrentModificationException
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6553 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
3f771d2a16
fix for rss parser: be lazy when rss is not well-formed
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6552 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
dff4f95c78
some patches to get the torrent parser working
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6551 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
hermens
574f49903e
Prevent blob merge from possibly losing the last container
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6549 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
83d05e9176
added sixcoolers hack with some modifications:
...
http://forum.yacy-websuche.de/viewtopic.php?p=15004#p15004
old index blobs where deletions have been made because of DHT transmission should be melted down to new blobs. This uses sixcoolers methods from the forum thread but modifies the process in such a way that the blobs are not merged with themselves but simply rewritten to smaller files.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6548 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
fbd24c2d84
integrated the torrent parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6547 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
bd32f8b8cb
added a torrent metadata file parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6546 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
610e3ffffb
Added new classes for the implementation of concurrent greedy algorithms.
...
These classes can be used to build an abstract worker process for common problems in artificial intelligence, such as game playing and problem solving. They will serve as an abstraction layer for a new search process in YaCy. The classes were created while searching for an abstraction of the current search process. It turned out that an abstraction of the YaCy search process is also an abstraction for problems in artificial intelligence, so the classes were designed to cover not only the YaCy-specific problem but also the more generic problems in AI. To test the classes, they were used in a ConnectFour implementation (game playing).
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6545 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
d0b7bf9ca2
added a decoder class for Bencoding
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6544 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
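For readers unfamiliar with the format behind r6544 above: Bencoding, the encoding used by torrent metadata files, has four value types, namely integers (`i42e`), length-prefixed byte strings (`4:spam`), lists (`l...e`) and dictionaries (`d...e`). The sketch below is an independent, minimal decoder and not the class added in that commit. All names are hypothetical, and it does no validation of malformed input.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative decoder only; the real YaCy class may look quite different.
class BencodeSketch {
    private final byte[] data;
    private int pos = 0;

    private BencodeSketch(byte[] data) { this.data = data; }

    /** Decodes one value: Long, String, List<Object> or Map<String, Object>. */
    static Object decode(byte[] data) {
        return new BencodeSketch(data).next();
    }

    private Object next() {
        char c = (char) data[pos];
        if (c == 'i') {                       // integer: i<digits>e
            pos++;
            int end = indexOf('e');
            long value = Long.parseLong(ascii(pos, end));
            pos = end + 1;
            return value;
        }
        if (c == 'l') {                       // list: l<values>e
            pos++;
            List<Object> list = new ArrayList<>();
            while (data[pos] != 'e') list.add(next());
            pos++;
            return list;
        }
        if (c == 'd') {                       // dictionary: d<key><value>...e
            pos++;
            Map<String, Object> map = new LinkedHashMap<>();
            while (data[pos] != 'e') {
                String key = (String) next();
                map.put(key, next());
            }
            pos++;
            return map;
        }
        // byte string: <length>:<bytes>; ISO-8859-1 keeps every byte value intact
        int colon = indexOf(':');
        int length = Integer.parseInt(ascii(pos, colon));
        String s = new String(data, colon + 1, length, StandardCharsets.ISO_8859_1);
        pos = colon + 1 + length;
        return s;
    }

    private int indexOf(char ch) {
        int i = pos;
        while (data[i] != ch) i++;
        return i;
    }

    private String ascii(int from, int to) {
        return new String(data, from, to - from, StandardCharsets.US_ASCII);
    }
}
```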
low012
028657f019
*) adding more SVN properties
...
*) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6542 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
82d740050f
*) adding more SVN properties
...
*) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6541 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
e04cb8cef0
*) adding more SVN properties
...
*) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6540 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
dcb1096fb0
*) adding more SVN properties
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6539 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
7d610e0063
*) minor changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6538 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
82198acc06
*) minor changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6537 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
b75547fc60
*) minor changes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6536 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
lotus
9bee0ac780
more logging for DHTrule
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6533 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
57d729e377
fix for negative numbers in network statistic
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6532 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4ac4fe952c
patch for npe in bookmarks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6530 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
c14233a933
fix for a OOM in MapView that can cause unavailability of
...
- seed list
- bookmarks
during very low memory configuration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6529 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
d548bd41ad
fix for a npe during search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6528 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
37245430c3
fix for NPE during DHT RWI selection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6527 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
959b38b61b
fix for memory tracker
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6526 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a37878b7d5
url parser regex performance hack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6524 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
b527d2ebfa
fix for media search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6522 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
362b7a929b
added extensive memory protection logic to avoid out of memory errors that may be caused by the RowCollection memory allocation function
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6521 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
8281e29963
- more configuration for profiling graph (number of events)
...
- more logging for a shutdown: print reason and accessing IP into log
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6520 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
f1ori
5f0f6b71b4
* revert last commit, something is more broken than before
...
* UTC timestamps and lastseen properties still need some debugging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6519 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
f1ori
8c8b642eba
* fix timezone problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6518 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
lotus
713cb26a27
update for memory observer algorithm
...
disable dht if memory is less than threshold
after 4 times, maximum 11 minutes between each detection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6517 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4782d2c438
fix for search bug that appeared when looking at page 3 of results or further
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6515 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
29fde9ed49
better control of ranking order in sort stack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6514 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
93caa38d55
fix for bug in SortStack (did not appear to shrink according to required size) - caused bad and insufficient search results
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6513 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
66923ebc6c
- modified method in RequestHeader that delivers the host name of requester: no more reverse domain lookup (may have killed interface performance in some cases)
...
- added logging output for shutdown servlet: show ip of requester of the shutdown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6512 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
e34e63a039
preset of proper HashMap dimensions: should prevent re-hashing and increase performance
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6511 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
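The idea in r6511 above can be illustrated as follows. A `java.util.HashMap` with the default load factor of 0.75 resizes once its size exceeds capacity times load factor, so a map meant to hold a known number of entries can be created with a capacity that avoids any resize. This is a sketch with hypothetical names, not the code from the commit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; shows the sizing arithmetic only.
class PresizedMap {

    /** Initial capacity that holds expectedEntries without a resize at load factor 0.75. */
    static int capacityFor(int expectedEntries) {
        return (int) (expectedEntries / 0.75f) + 1;
    }

    static <K, V> Map<K, V> newMap(int expectedEntries) {
        return new HashMap<>(capacityFor(expectedEntries));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = newMap(1000);
        for (int i = 0; i < 1000; i++) counts.put("key" + i, i);
        System.out.println(counts.size());
    }
}
```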
orbiter
4a5100789f
replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where a size() operation becomes a large iteration.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6510 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
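One well-known JDK case that matches the reasoning in r6510 above is `java.util.concurrent.ConcurrentLinkedQueue`, whose documentation states that `size()` is not a constant-time operation because it traverses the queue. A small sketch of the replacement pattern, with hypothetical names:

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustration of the size() == 0 to isEmpty() replacement; names are made up.
class EmptinessCheck {

    /** Preferred form: does not need to count the elements. */
    static boolean hasWork(Collection<?> jobs) {
        return !jobs.isEmpty();          // instead of jobs.size() > 0
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> jobs = new ConcurrentLinkedQueue<>();
        System.out.println(hasWork(jobs));
        jobs.add("crawl");
        System.out.println(hasWork(jobs));
    }
}
```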
orbiter
f4946eaf27
- better thread dump
...
- suppressed one server exception
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6509 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
9743b70d1c
disabled keep-alive of server, not really needed for speed but a cause for much trouble and memory occupancy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6508 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
491ba6a1ba
- some refactoring in workflow
...
- some refactoring in search process
- fixed image search for json and rss output
- search navigation on bottom of search result page in cases where there are more than 6 results on page
- fixes for number of displayed documents
- disabled pseudostemming
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6504 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
969123385b
added json and rss output for image search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6503 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
d183f8d980
refactoring (moved code from ContentTransformer to TemplateEngine)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6498 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
23aef43786
- better synchronization in SortStack
...
- better ThreadGroup organization
- fewer worker threads for media search (64 was too much...)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6497 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
7b1f5b0430
- better media search ranking
...
- better concurrency with enhanced synchronization in sort stack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6496 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4df88a4e7a
- fixes for missing or bad hashCode computation
...
- fixes for bad equals() methods that had not been used by hash maps and therefore some classes did not work as objects in hash maps.
- this may also affect some cases where double-checks should have been, but did not work.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6495 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
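The class of bug addressed in r6495 above usually comes from an `equals` that does not override `Object.equals(Object)` or from a missing `hashCode`. A minimal sketch of a key class that works in hash maps and sets; the class is hypothetical and not one of the classes fixed in the commit.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class for illustration.
class HostPort {
    private final String host;
    private final int port;

    HostPort(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Must take Object: an equals(HostPort) overload is never called by hash maps.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof HostPort)) return false;
        HostPort that = (HostPort) other;
        return port == that.port && host.equals(that.host);
    }

    // Equal objects must produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(host, port);
    }

    public static void main(String[] args) {
        Set<HostPort> seen = new HashSet<>();
        seen.add(new HostPort("example.org", 8090));
        System.out.println(seen.contains(new HostPort("example.org", 8090)));
    }
}
```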
orbiter
dbdf2570ba
added comparator and more fixes for SortStack/SortStore
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6494 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
d2938c44a1
- added bmp parser to the document parsers
...
- image parsers that implement the document parser interface return the image itself in the list of images of the document, so that parsed images contribute to the image search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6493 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
1dff620181
Better implementation of SortStack and SortStore and adaptations in all classes that use them to implement the necessary Comparable interface and hash code computation.
...
The better SortStack performance affects crawling and image search speed and quality.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6492 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
fe41a84330
some enhancements in web caching: avoid double loading of response metadata and/or content
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6491 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
06d0dcde20
more enhancements to image search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6490 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4c6312d103
enhanced image search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6489 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2d8f3ee301
some performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6488 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
94b2a664f3
- use a static DiskFileItemFactory (one instantiation is enough)
...
- use more memory for the DiskFileItemFactory to avoid IO when POST commands come
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6485 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
fd0658ce7c
avoid forced execution of InetAddress.getLocalHost() at startup, because that hangs on some strangely configured Linux systems. The Domains.localHostAddresses object is first instantiated with simpler logic and then enriched with more host addresses by a concurrent thread that does not block the startup process.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6482 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
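The approach described in r6482 above can be sketched roughly like this: start with an address that needs no name lookup and let a daemon thread add the result of `InetAddress.getLocalHost()` later. The class and method names are hypothetical, and the real `Domains.localHostAddresses` logic may differ.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the YaCy Domains class.
class LocalAddresses {

    /** Returns immediately; the set is enriched in the background. */
    static Set<InetAddress> collect() {
        final Set<InetAddress> addresses = ConcurrentHashMap.newKeySet();
        // The loopback address is available without any name service lookup.
        addresses.add(InetAddress.getLoopbackAddress());

        Thread lookup = new Thread(() -> {
            try {
                // May block for a long time on misconfigured hosts.
                addresses.add(InetAddress.getLocalHost());
            } catch (UnknownHostException e) {
                // Keep the simple initial set if the lookup fails.
            }
        }, "localhost-lookup");
        lookup.setDaemon(true);   // never keeps the JVM alive
        lookup.start();
        return addresses;
    }

    public static void main(String[] args) {
        System.out.println(collect());
    }
}
```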
orbiter
013f337d3f
- avoid unnecessary host name lookups for localhost
...
- avoid unnecessary reverse domain name lookups for remote access
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6481 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
20c5d78a5c
fix for a ConcurrentModificationException
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6478 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
5afd9f7a91
fix for crlf writing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6477 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
7144d2df6e
added crawlReceipt servlet as individual class to examine OOM problem as documented in
...
http://forum.yacy-websuche.de/viewtopic.php?p=18120#p18120
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6476 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2d3c98b742
less computation within synchronized blocks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6475 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
1a146b0d73
added a patch to ignore bad mime-ignore patterns
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6474 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
29fe436e36
- fixed post-ranking including prefer mask
...
- enhanced a core database access method / less wasted ram
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6473 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
5399d1e2bc
refactoring (reason: get more abstraction to use the blacklist class; for integration in other servlets)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6471 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a97fdb4566
catch for NPE in image parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6470 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
534182559c
removed concurrency hacks from SplitTable because they caused a deadlock-like situation.
...
see thread dump at http://forum.yacy-websuche.de/viewtopic.php?p=18081#p18081
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6468 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
1fa0ac26e9
better protection against NPEs during search/ranking
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6467 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4c99d4683d
possible fix for lost crawl profile handles: clean-up job did wrong measurement to see if crawl is still running.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6465 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
cd6745b292
accept rss feeds without channel descriptions
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6464 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
08f1cbb125
another update to the pdf parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6463 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
54c54fb144
get a handle for grep: 'StackTrace'
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6462 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
605e896d6c
more details for exception catching when parsing pdfs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6461 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
18b21eaffe
small fixes to search default values and server logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6460 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
lotus
6edc168cfe
option to disable dht by memory limit:
...
memory.acceptDHT in kbytes
not yet pre-enabled, will clear on every startup
please review since this could break dht in freeworld
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6459 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4431b9767e
added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6458 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
e3025ee691
- new icon for OAI-PMH loading action
...
- added many stack trace outputs for exceptions in crawl profile handler to find the 'missing profile handle' bug
- caught one more timeout exception in httpd file loader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6457 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
f0b8db93f0
- more abstraction of serverCore thread access
...
- no more keep-alive when the number of connections exceeds 1/2 of the allowed number of connections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6456 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
19f31bb043
- moved OAI-PMH source list file from SETTINGS to DICTIONARIES/harvesting
...
- added convenience method for loading of files from the web in LoaderDispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6455 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2889b9426e
missing code for last commit
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6454 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
b6a8887ff5
better handling of running sessions without explicit hashtable
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6453 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
1dc7ea986a
added a dynamic keep-alive time-out for http server sessions:
...
if there are many concurrent server sessions, the timeout is decreased.
This should avoid a situation where the clean-up thread is too
late to stop running http sessions that should be terminated
before the maximum number of server sessions is reached.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6452 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
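A dynamic keep-alive time-out as described in r6452 above could look like the sketch below. The linear scaling is an assumption made for illustration; the commit message does not state the actual formula, and all names are hypothetical.

```java
// Hypothetical formula: the time-out shrinks linearly with the share of used sessions.
class KeepAliveTimeout {

    static long timeoutMillis(long baseMillis, int activeSessions, int maxSessions) {
        if (maxSessions <= 0 || activeSessions >= maxSessions) {
            return 0L;                               // no keep-alive when the server is full
        }
        int used = Math.max(0, activeSessions);
        return baseMillis * (maxSessions - used) / maxSessions;
    }

    public static void main(String[] args) {
        for (int active = 0; active <= 100; active += 25) {
            System.out.println(active + " sessions: " + timeoutMillis(10_000L, active, 100) + " ms");
        }
    }
}
```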
low012
e77c906673
*) minor changes mainly in comments
...
*) added svn:keyword settings for several files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6451 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
f1740edbf8
*) added script to change memory settings, password and port (experimental, don't blame me if it messes up your configuration)
...
*) minor change in Digest class, added option in main method, might not be optimal yet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6450 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
11f7da06ed
- fixes to csv parser
...
- automatic OAI-PMH import by just clicking on one link from the provided resource list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6449 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
9b6762ec2e
- added a csv "comma separated values" parser to parse OAI-PMH sources from
...
http://roar.eprints.org/index.php?action=csv
- integrated the csv parser into the crawlers parser list
- added an extension to the OAI-PMH import function to download and show the roar csv file using the csv parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6448 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
176e334aa4
fixes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6446 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2fa6bf440b
workflow update to OAI-PMH importer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6445 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
b0b7a4f9a5
- added function to OAI-PMH reader that can pull all records from a server using an evaluation of the resumption token to get URL to retrieve remaining records
...
- added monitoring for retrieved records
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6444 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
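In OAI-PMH, a follow-up `ListRecords` request carries only the verb and the `resumptionToken` returned by the previous response. The URL computation mentioned in r6444 above can be sketched as follows; the class name, method name and example URL are hypothetical, and this is not the YaCy importer code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building the follow-up request URL.
class OaiPmhResumption {

    /** Builds the URL that retrieves the next batch of records. */
    static String resumptionUrl(String baseUrl, String resumptionToken) {
        try {
            // URLEncoder writes spaces as '+'; convert them to %20 for a plain percent-encoded URL.
            String encoded = URLEncoder.encode(resumptionToken, "UTF-8").replace("+", "%20");
            return baseUrl + "?verb=ListRecords&resumptionToken=" + encoded;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resumptionUrl("http://example.org/oai", "batch 2/oai_dc"));
    }
}
```

A harvesting loop would call this repeatedly until a response contains an empty or missing resumption token.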
orbiter
350d13e153
very first working version of oai-pmh importer: if given the right url, the importer can read and index listRecord xml files and calculate the right resumptionURL which is then given as next default start point for the importer url input.
...
no automatic harvesting yet; this will be done later
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6443 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
lotus
58616d99e4
patch for yacy disk usage detection on lvm host
...
by Michael S.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6442 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
lotus
79251e6f60
configurable disk space hardlimit for dht
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6441 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a0e891c63d
- some redesign in UI menu structure to make room for new 'Content Integration' main menu containing import servlets for Wikimedia Dumps, phpbb3 forum imports and OAI-PMH imports
...
- extended the OAI-PMH test applet and integrated it into the menu. It still does not import OAI-PMH records, but shows that it is able to read and parse this data
- some redesign in ZURL storage: refactoring of access methods, better concurrency, less synchronization
- added a limitation to the LURL metadata database table cache to 20 million entries: this cache was until now not limited and only limited by the available RAM which may have caused a memory-leak-like behavior.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6440 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4240785f20
added anti-alias function for line drawing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6438 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
30f108f97d
added stub of oai-pmh importer (not working yet)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6437 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
77c99e500f
added more control over memory allocation
...
should avoid some of the OOMs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6436 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
52470d0de4
- fix for xls parser
...
- fix for image parser
- temporary integration of images as document types in the crawler and indexer for testing of the image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6435 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
5e8038ac4d
- refactoring of blacklists
...
- refactoring of event origin encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6434 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
26fafd85a5
- more refactoring
...
- fixed problem with parsers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6433 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
3528b970d6
- refactoring
...
- added new experimental (not-yet-working) image parser
- added new test image
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a8ce192f63
- shifted main classes to new package net.yacy
...
- fixed some bugs in last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6427 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
b79f4f062f
refactoring of yacy documents and parsers: they depend now only on the kelondro classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
hermens
0fd9540866
Configuration of HTTPDProxyHandler logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6425 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
cee7a05ff2
- de-serialized the pdf parser
...
- added fail callback for file indexer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6415 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
9db928ce53
replaced fontbox 0.7.3 with fontbox 0.8.0
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6414 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
c2272785c7
- fix for xlsx and pptx parsing
...
- less exception logging for swf parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6413 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
c864901087
- moved httpd.mime to defaults path
...
- some documentation fixes
- adapted a default setting for the search window: moves css setting to base.css
- some enhancements for the DocumentIndex class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6410 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
8829ec5f18
*) made sure that is replaced with a space and not just deleted in CharacterCoding.java
...
*) added annotations and made minor changes to serverObjects.java
*) set subversion properties for several files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6409 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
6c347a37eb
more options for DocumentIndex
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6408 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
6192205533
more final modifier
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6407 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
0f6b011e1a
fix for new index location and better way to use own classes by reflection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6406 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
7a3bbd950f
:-(
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6405 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
b953f04f90
one more reflection fix
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6404 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
77d6604856
fix for npe, see http://forum.yacy-websuche.de/viewtopic.php?p=17727#p17727
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6403 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2a7fe35f92
performance tuning using more final modifiers in the kelondro core
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6402 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
cb4de9ceee
fixed a bug in table iterator (did not recognize elements in write buffer)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6401 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
e7f18ba24b
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6399 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
ce8dc575ca
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6398 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
bea3b99aff
moved table and util classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6397 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
bd876eb4b7
moved io classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6396 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
c0e0e1f422
moved blob classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6395 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
1e4f8b56ed
accumulated classes from different packages into the new rwi package
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6394 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
194da25a2f
moved kelondro index
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6393 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4446acc8cd
moved kelondro order
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6392 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
f677d534b1
start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
...
- moved here the logging classes as part of the new net.yacy.kelondro package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6391 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
ea473e32b8
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6390 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
735e2737e3
* added index segments
...
This is a major change in the organization of indexes.
Please consider a back-up of your data before you run this update.
All existing index files will be moved and renamed to a new position.
With this change, it will be possible to maintain different indexes for different purposes and to distinguish between DHT-in and DHT-out specific indexes. Tenants may also have their own index, and it may be possible to have histories and back-ups of indexes. This is just the beginning; many servlets must be adapted after this change, but all functions that existed before should still work.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6389 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
09de5da74a
once again a performance hack
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6388 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2f6d88403e
uä
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6387 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
d2615ea5a8
increased memory for scraper buffer to enhance parsing speed
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6386 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4bbbb74ec4
removed unnecessary synchronization
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6385 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
hermens
67e5464cc2
Fix for SVN6380: x[] Arrays are unsuitable Keys for Maps without using a proper Comparator.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6384 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
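The array-key problem named in the r6384 message above can be illustrated with a minimal sketch. It is not the code from that commit; the class name `ByteArrayKeys`, the comparator `BY_CONTENT`, and both helper methods are hypothetical. Java arrays inherit identity-based `hashCode()` and `equals()` from `Object`, so a `HashMap` does not find an equal-content copy of a key, while a `TreeMap` with a content comparator does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ByteArrayKeys {

    // Hypothetical comparator: orders byte[] keys by content (unsigned, lexicographic).
    static final Comparator<byte[]> BY_CONTENT = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    };

    // Stores one array and looks up a second array with the same content.
    // Arrays compare by identity in a HashMap, so the lookup misses.
    static boolean foundInHashMap(String key) {
        Map<byte[], String> map = new HashMap<>();
        map.put(key.getBytes(StandardCharsets.UTF_8), "value");
        return map.containsKey(key.getBytes(StandardCharsets.UTF_8));
    }

    // Same experiment with a TreeMap that compares keys by content.
    static boolean foundInTreeMap(String key) {
        Map<byte[], String> map = new TreeMap<>(BY_CONTENT);
        map.put(key.getBytes(StandardCharsets.UTF_8), "value");
        return map.containsKey(key.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(foundInHashMap("abc")); // false
        System.out.println(foundInTreeMap("abc")); // true
    }
}
```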
hermens
aeab8c7917
Prevent failed DHT attempts from overwriting newer peer info
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6382 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
hermens
9324b5b6c5
Enhancements to DHT
...
- speed up deletion of containers when selected from whole index
- correctly eliminate all references to unavailable URLs, not just the first encountered
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6381 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
hermens
e49e2d75fe
Limit the time Transmission.Chunks stay in the transmissionCloud by using a Map that iterates entries in insertion order.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6380 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
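The idea in the r6380 message above, a map that iterates in insertion order so that the oldest entries can be dropped first, might look roughly like the following sketch. It is an illustration under assumptions, not the YaCy code: the class `ExpiringCloud`, its methods, and the explicit `now` parameter are all hypothetical, and callers are assumed to pass non-decreasing timestamps.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExpiringCloud<K, V> {

    // Value together with the time it was stored.
    private static final class Timed<V> {
        final V value;
        final long storedAt;
        Timed(V value, long storedAt) { this.value = value; this.storedAt = storedAt; }
    }

    // LinkedHashMap iterates in insertion order by default.
    private final LinkedHashMap<K, Timed<V>> entries = new LinkedHashMap<>();
    private final long maxAgeMillis;

    public ExpiringCloud(long maxAgeMillis) { this.maxAgeMillis = maxAgeMillis; }

    public synchronized void put(K key, V value, long now) {
        entries.remove(key); // re-inserting moves the key to the end of the iteration order
        entries.put(key, new Timed<>(value, now));
    }

    public synchronized V get(K key) {
        Timed<V> t = entries.get(key);
        return t == null ? null : t.value;
    }

    // Removes entries older than maxAgeMillis and returns how many were removed.
    // Because iteration starts at the oldest entry, the loop can stop at the
    // first entry that is still young enough.
    public synchronized int expire(long now) {
        int removed = 0;
        Iterator<Map.Entry<K, Timed<V>>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue().storedAt <= maxAgeMillis) break;
            it.remove();
            removed++;
        }
        return removed;
    }

    public synchronized int size() { return entries.size(); }
}
```

Stopping at the first young entry keeps each `expire` call proportional to the number of entries actually removed rather than to the size of the map.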
orbiter
92db7c5d07
increased timeout for index retrieval
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6379 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
lotus
386b9f35f6
activated resource observer for windows 7
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6378 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
6e0dc39a7d
- some fixes to prevent blocking situations
...
- better logging for the crawler
- better default values for the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6377 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
51f2bbf04b
possible fix for problem in http://forum.yacy-websuche.de/viewtopic.php?p=17655#p17655
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6376 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
f8371707e5
- possibly better termination for SplitTable
...
- better abstraction in DidYouMean
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6375 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
87780f2562
produce did-you-mean also for queries with more than one word
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6374 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
04a548a1e3
- temporarily integrated the transferURL servlet as a static class instead of as a class that is called using reflection, to investigate the OOM problems in that class
...
- fixes for numerous other problems
- removed dead code
- redesign of the strings-method, which now produces less memory overhead and may help to prevent OOMs
- another fix for the deadlock problem in SplitTable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6373 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
ea427df944
fixed a worst case situation of the condenser which may cause a temporary full CPU load because of a bad data structure usage
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6372 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
3e38035389
fix for interrupted thread during has() property check
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6370 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
5bd1c1d205
just added some comments that had been produced to learn about OAI-PMH
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6369 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
6aa474f529
- better logging for web cache access and fail reasons
...
- better Exception handling for web cache access
- distinction between access of web cache for proxy and crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6367 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
3671c37989
added experimental oai-pmh reader and integrated it with the existing dublin core parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6366 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
58a00205d5
re-activated the emergency close when too many server connections exist
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6364 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
c57d2070e6
more logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6363 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a995b95367
tried a fix for the httpd access bug (too many unclosed sessions)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6362 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
e1fba41cad
better logging
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6361 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2275f885a8
possible fix for concurrency problem
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6360 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
a6a3090c3d
*) blacklist cleaner supports usage of regular expressions now
...
*) refactored BlacklistCleaner_p.java for better readability
*) moved check of validity of patterns to the Blacklist implementation since patterns might be valid in one implementation, but not in another
*) added method to check validity to Blacklist interface
*) fixed some minor issues like typos or wrong whitespaces
*) set subversion properties for a whole bunch of files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6359 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
5a93807781
improved web cache speed:
...
- removed one computation out of a synchronization
- removed one unnecessary has() call
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6358 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2e8b2867ff
double performance of store method because it avoids one 'has'
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6357 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
afda5b1adc
new join method for indexes (not yet used)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6356 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
65b66c2c18
better handling of array files of length 0
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6355 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
1957b5797a
fix for seed generation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6354 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
432154f725
new strategy for concurrent database index key retrieval
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6353 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a11cd9f80f
- removed reverse name lookup for http access logging (grr..)
...
- removed a synchronization in seed info string generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6351 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2e6bdce086
- added more logging to balancer
...
- changed balancer logic slightly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6350 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
1171a72006
fix for deadlock as seen in http://forum.yacy-websuche.de/viewtopic.php?p=17521#p17521
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6343 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
031e6eefbd
some updates to dublin core, metadata browsing, file indexing and parser stability
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6342 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
hermens
62a7341c4d
Fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2204
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6341 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
f65bfaa9af
*) Removed base tag from error page. I added this a long time ago as a workaround for some weird behavior of my router, but as it turns out, it does more harm than good in general: if HTTPS is used for communication with YaCy, entering a wrong password led to an error page with a form which would send username and password unencrypted, with the user possibly being unaware of this.
...
*) changed some comments, added some annotations, added SVN properties here and there
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6340 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
e4797ebcde
fix for http://forum.yacy-websuche.de/viewtopic.php?p=17509#p17509
...
corrupted files are ignored
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6339 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
efa7fb34f0
better oom-awareness of miss-cache in cache
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6338 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
3e9dcfc204
fix for http://forum.yacy-websuche.de/viewtopic.php?p=17504#p17504
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6337 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
c3a4aee255
some redesign with a possible fix for the ReferenceContainerCache.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6336 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
aca8a78eb8
fix for shutdown of DocumentIndex objects
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6333 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
23ab6fbca4
- navigation appears at the correct position when opengeodb-results are also presented after a search
...
- show an about box if about.headline and about.body is set
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6332 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4db34eea73
fix for OOM problem in kelondro Cache
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6331 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
8ea1d7ab59
fix for wrong assert condition in search abstract generation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6330 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
fbd77bd77c
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6328 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
54c7cbf1d9
- fast result for local search in case that fewer than 10 hits exist
...
- small change in display of RAM in profiling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6326 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
28d4b921b6
different approach for file search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6325 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
f99f86c5c5
added concurrency to file indexing class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6324 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
902d16cf6c
fixes to parser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6323 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4a1c852435
fix in usage of RAM copy for Table objects and some cosmetics in asserts.
...
This bug affected Tables in case that a removeOne() was called and a RAM copy of the table was active. It may happen for peer owners with a lot of RAM assigned to YaCy. The bug appeared especially during crawling when the balancer tried to get new entries from the crawl queue.
This fix may help to resolve the report at
http://forum.yacy-websuche.de/viewtopic.php?p=17417#p17417
and will be tracked there
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6322 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
lotus
dce450e2e0
possible fix for "hung" doc-documents
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6320 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
e627f75415
one more fix to badwords and stopwords
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6316 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
721b88efbd
- fixed a problem loading blacklists with new yacycore.jar
...
- fixed badwords and stopwords initialization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6315 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
80d5005044
fixed seed upload methods - replaced reflection with direct instantiation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6314 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
68465c37af
added a convenience class to add files into a YaCy index
...
to make this possible, the yacyURL must be able to process file:// urls, which has also been implemented
testing of the new class resulted in some bugfixes in other classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6313 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2e41e10ffd
- updates to yacyVersion parser (remove old targets)
...
- added javadoc target to build script (does not work yet without errors)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6312 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
27d00285aa
- added a new file reader cache that may serve as full-file-copy of blob database files. This is not yet used
...
- removed class FileWriter and replaced all usage of that class with CachedFileWriter
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6309 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
fd6b9cb7dc
refactoring of IO access classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6308 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
d64569aa39
return only recommendations of words that have a greater count than the original word
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6307 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
604c37927f
used comparator for did-you-mean that uses index sizes for comparison, but:
...
- limit comparison to only the first 10 elements that had been sorted before without IO
- added a size cache to index computation because the size is computed at least twice in set comparator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6306 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a58d9cae7d
- show location name in geolocalization search result
...
- added link from location icon to openstreetmap browser with coordinates
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6305 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
573d03c7d7
added configuration to enable ram table copy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6304 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
3be54e1891
fix to rule when to use a ram table copy
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6302 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
700218846c
disabled or removed sleep calls
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6301 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
342c5d0fd4
fixed city name detection: now also finds substrings of city names
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6300 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
18aa0609ca
fix for caching of word hash computation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6299 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
a10a6cce45
patch for http://forum.yacy-websuche.de/viewtopic.php?p=17289#p17289
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6298 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
53bbdfd19a
*) setting SVN keywords
...
*) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6297 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
25f6145934
*) preventing null pointer exception in case an empty search word or only one character is entered, or all search words are removed by filters
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6296 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
248f3fd9b5
*) cleaned up code for better readability
...
*) added a few copyright notices
*) removed redundancy in constructors of ListToken
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6295 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
eaddf2d464
- corrected layout of map preview
...
- added caption to maps containing latitude and longitude information
- prevented maps from appearing on the second search page
- added location names to did-you-mean
- some refactoring of did-you-mean
- added equal and compareTo test to Coordinates class to make that work in set
- fixed utf-8 support for library files
- fixed a bug in images search icon view caption
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6294 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
hermens
4b83875abd
Small fixes for the heapCacheIterator in ReferenceContainerCache:
...
- Start the iteration at startWordHash
- When used with rotation, let the iteration stop when the cache is empty
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6293 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
fd668f531b
fixed map layout
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6292 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
2740d9dd79
added integration of osm maps for search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6291 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
af3a696fc4
added a fast-fail concept in search processes. The search now has better control if all the remote searches may bring any result. If all processes are finished, then all search tasks fail fast.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6290 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
ce972ff4ef
update to default ranking profile which has now some settings to deny some phpbb3 pages which are redundant in the index when crawling phpbb3.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6288 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
44579fa06d
- fixed a problem loading images through yacy's document loader,
...
this denied non-parseable documents which excluded all images
- fixed url of osm tile server
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6287 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
67eddaec4b
changed way to integrate dictionary files:
...
they must be downloaded manually by the user and placed in DATA/DICTIONARIES/source
for each externally imported dictionary file there will be a translator that converts the input file once
into a YaCy-internal data format.
Files that will be provided together with yacy releases may still be placed in <root>/dictionaries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6286 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
d656a94f55
fix for bad paths in dictionary processing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6285 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
3b9aaf9e9f
- inserted new library tests inside DidYouMean
...
- some redesign of DidYouMean that was necessary to follow
  a special rule for how a library should be used:
  - the library provides words that start or end with a test
    word, which may also be an empty set of words
  - all words that DidYouMean produced with the four
    production rules are used to generate a set of
    library-completed words
  - if this process results in any words from the library,
    only library-generated words are taken
  - if there is no library-generated word at all, take the
    artificially generated words
  - all words that result from these rules are tested against
    the index
  - the result is ordered using a lightweight comparator that
    prefers short words
  - a not-so-much-IO test against the index is being prepared
    next
- inserted the library initialization into the switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6284 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
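The library rule described in the r6284 message above can be sketched as follows. This is a simplified illustration, not the DidYouMean implementation: the class `LibraryCompletion`, the method `suggest`, and the comparator `SHORT_FIRST` are hypothetical names, the library is modeled as a plain set of words, and the subsequent test against the index is left out.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class LibraryCompletion {

    // Lightweight ordering that prefers short words; ties are broken alphabetically.
    static final Comparator<String> SHORT_FIRST = (a, b) ->
            a.length() != b.length() ? Integer.compare(a.length(), b.length()) : a.compareTo(b);

    // candidates: artificially generated words (e.g. from the production rules)
    // library:    known words from a dictionary file
    static SortedSet<String> suggest(Set<String> candidates, Set<String> library) {
        SortedSet<String> fromLibrary = new TreeSet<>(SHORT_FIRST);
        for (String candidate : candidates) {
            for (String word : library) {
                // the library provides words that start or end with the test word
                if (word.startsWith(candidate) || word.endsWith(candidate)) {
                    fromLibrary.add(word);
                }
            }
        }
        // if the library produced anything, only library-generated words are taken
        if (!fromLibrary.isEmpty()) return fromLibrary;

        // otherwise fall back to the artificially generated words
        SortedSet<String> artificial = new TreeSet<>(SHORT_FIRST);
        artificial.addAll(candidates);
        return artificial;
    }
}
```

The nested loop is quadratic and only meant to show the rule; a real word list would more likely use a sorted structure for prefix lookups.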
orbiter
8c35ffe34c
fixes to the dymlib
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6283 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
bfa273bcc1
added a library provider which holds libraries in static objects,
...
which can be used by any other classes to support their functions.
libraries are designed in such a way that users can create and
insert their own library files, but can also be imported from
other sources. As an example the "Korpusbasierte Wortgrundformliste
DeReWo des Institut für Deutsche Sprache" from
http://www.ids-mannheim.de has been integrated. This dictionary
is licensed to be used for all non-profit purposes. In case that
YaCy is used for commercial uses, this library must be removed.
The new library provider reads the original source and translates
it into a simple word list to be used for the did-you-mean library
provider. More libraries may be provided in the future using
a download-servlet which puts files from the internet into the
<application-root>/dictionaries/ path.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6282 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
1762a7bcd6
- moved DidYouMean to the data package
...
- added a DidYouMeanLibrary class that shall support the did you mean function with additional word lists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6281 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
bf8ed00e9e
removed debugging code
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6280 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
ead48c4b25
fix for preparation of search result pages with offset > 10:
...
- less pages are fetched in advance
- just-in-time fetch of next required pages
- fix for missing hand-over of offset to fetch threads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6279 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
39a311d608
better care not to lose the merge/dump thread
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6278 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
10d3e856b5
better concurrency, less blocking & performance hacks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6277 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
1a9cfd8718
some performance hacks (CPU only, not IO)
...
this will cause better computation speed for single- and multi-core;
there are enhancements that will speed up old and slow machines as well
as multi-core CPUs. Indexing of surrogates has been sped up
from 4000 PPM to over 20000 PPM on a simple dual core office computer.
Since the enhancements are mostly in core routines, the hack should also
speed up search performance.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6276 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
92407009b2
cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6275 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
0ba1beaf56
separated rwi constraint evaluation from rwi ranking and added concurrency
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6274 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
ce7924d712
better concurrency for rwi entry parsing during search processing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6273 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
b0637600d5
enhanced url constraint computation: better position of constraint check during retrieval process
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6272 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
61748285c3
more refactoring of search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6270 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
323a8e733d
removed unused classes
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6269 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
72e5407115
refactoring of snippet cache
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6268 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
0e471ba33b
- fixed a bug in fast digest computation
...
- added an open-on-demand hack to heap files: when a heap file is
opened the first time, it is first scanned to get a key index
and then it is closed again. This will free up file pointers
in cases where a really large number of blob files are opened
upon initialization of ArrayStack objects. This should solve
also a problem reported in
http://forum.yacy-websuche.de/viewtopic.php?p=17191#p17191
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6267 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
low012
93b2622503
*) repaired and added IM online status indicators
...
*) added some missing SVN properties
*) removed unnecessary comment, added missing copyright notice
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6266 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
e7736d9c8d
more refactoring: made all variables in SearchEvent private
...
to prepare splitting of the class into two parts: local and remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6265 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
4b92d0b9b7
patch for possible problems with normalization of '/' in urls. This applies in rare cases when '/' appear in post-properties
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6264 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
d8ca6e6bf1
more refactoring for search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6263 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
fe4a4e3f6b
added missing class
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6261 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
72ac5bd80f
refactoring of search process.
...
this is the beginning of some architecture changes that will hopefully bring some more stability, speed and transparency to the search process.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6260 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
hermens
c4d0e22a77
Further speed-up of concurrent DHT-receive
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6259 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
hermens
2fbc0696bf
Fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2334
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6258 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
f1ori
d515bc11e2
added ooxmlparser
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6256 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
d9744b1b5d
replaced old caching strategy control class with lightweight simplearc
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6254 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
8e56c2ace6
fix for fixes from this afternoon
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6253 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
cf739edc2e
fix for possible deadlock, see
...
http://forum.yacy-websuche.de/viewtopic.php?p=17017#p17017
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6252 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
6354b5e447
removed possible deadlock, see
...
http://forum.yacy-websuche.de/viewtopic.php?p=17017#p17017
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6251 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
5cc17ccf8a
a better caching with less overhead and more appropriate
...
synchronisation use in more than 10 different data objects
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6250 6c8d7289-2bf4-0310-a012-ef5d649a1542
15 years ago
orbiter
92edd24e70
fixed problem with switching of networks
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6247 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
0575f12838
fix for deadlock
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6246 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
fbfdaf063d
- patch to avoid an IndexOutOfBoundsException when a b64-encoded key appears not to be well-formed. In that case the key is still accepted but rated higher than other regular keys to create a virtual ordering between well-formed and ill-formed keys
...
- check routine at the beginning of the import of table keys that checks that all imported keys are well-formed. All records that have an ill-formed key are deleted. This is a hack and is not tested since I don't have bad data here to test with. If the effect is seen in the wild, please report in the forum.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6245 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c0e17de2fb
- fixes for some problems with the new crawling/caching strategies
...
- speed enhancements for the cache-only cache policy by using special no-delay rules in the balancer
- fixed some deadlock- and 100% CPU problems in the balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6243 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
634a01a9a4
replaced wget-requests with caching requests
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6242 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c6c97f23ad
- added cache usage properties to crawl start
...
- added special rule to balancer to omit forced delays if cache is used exclusively
- extended the htCache size by default to 32GB
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6241 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
c4ae2cd03f
fixed bug that caused deletion of crawl profiles at every application startup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6240 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter
161d2fd2ef
redesign of access to the HTCache (now http.client.Cache):
...
- better control of the cache by using combined request-header and content access methods
- refactoring of many classes to comply with this new access method
- make sure that the cache is always written if something was loaded
- some redesign of the process by which http response results are fed into the new indexing queue
- introduction of a cache read policy:
* never use the cache
* use the cache if the entry exists
* use the cache if the proxy freshness rule confirms
* use only the cache and never go online
- added configuration options for the crawl profiles to use the new cache policies. There is not yet an input during crawl start to set the policy but this will be added in another step.
- set the default policies for the existing crawl profiles. If you want them to appear in your default profiles you must delete the crawl profiles database; otherwise the policy is 'proxy freshness rule'
- enhanced some cache access methods in such a way that unnecessary retrievals are omitted (i.e. for size computation). That should reduce some IO but also a lot of CPU computation because sizes were computed after decompression of content after retrieval of the content from the disc.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6239 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
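The four cache read policies listed in the r6239 message above can be read as a small decision table. The sketch below is one possible reading, not the code from that commit: the enum constants, the `Source` type, and the `choose` method are hypothetical names, and the proxy freshness rule is reduced to a boolean supplied by the caller.

```java
public class CachePolicyDemo {

    // Hypothetical names for the four policies in the commit message.
    enum CachePolicy { NEVER_USE_CACHE, IF_EXISTS, IF_FRESH, CACHE_ONLY }

    // Where the content would be loaded from; NONE means the request cannot be served.
    enum Source { CACHE, NETWORK, NONE }

    static Source choose(CachePolicy policy, boolean inCache, boolean fresh) {
        switch (policy) {
            case NEVER_USE_CACHE:
                return Source.NETWORK;                              // never use the cache
            case IF_EXISTS:
                return inCache ? Source.CACHE : Source.NETWORK;     // use the cache if the entry exists
            case IF_FRESH:
                return (inCache && fresh) ? Source.CACHE : Source.NETWORK; // freshness rule must confirm
            case CACHE_ONLY:
                return inCache ? Source.CACHE : Source.NONE;        // never go online
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(CachePolicy.IF_FRESH, true, false));    // NETWORK
        System.out.println(choose(CachePolicy.CACHE_ONLY, false, false)); // NONE
    }
}
```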
f1ori
ba2e6de538
fix empty version string again
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6236 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago