yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	12 years ago
Michael Peter Christen	0301aba1e9	removed unused method parameters	13 years ago
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	13 years ago
Michael Peter Christen	16b21f7a5b	Added more steering in Crawler_p.html interface	13 years ago
orbiter	3a807e10cf	- added a cache for active crawl profiles to the crawl switchboard - moved the domain cache for domain counter from the crawl switchboard to the crawl profiles. the crawl domain counter is now therefore relative for each crawl start, not for the whole crawler. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8018 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	b250e6466d	implemented crawl restrictions for IP pattern and country lists git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7980 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5ad7f9612b	added crawl settings for three new filters for each crawl: must-match for IPs (IPs that are known after DNS resolving for each URL in the crawl queue) must-not-match for IPs must-match against a list of country codes (allows only loading from hosts that are hostet in given countries) note: the settings and input environment is there with that commit, but the values are not yet evaluated git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7976 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	d2ea250d99	refactoring: - moved many classes from de.anomic to net.yacy - made more sub-packages for search classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
low012	c7b95e8c81	*) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file can not be edited yet, but it shoudl be easy to extend the CrawlProfileEditor accordingly. *) Corrupt crawlProfilesPassive.heap would cause crawlProfilesActive.heap to be deleted. Don't know if this ever happend, but will not happen anymore. *) Cleaned up a little bit. *) Added some comments. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7827 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	4fe1329de2	*) trying to at least fix symptoms of http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3293#p22791 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7799 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
sixcooler	7fea51ecee	check filter to bee a correct pattern on edit CrawlProfiles see; http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3277&p=22662#p22660 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7764 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3ec94d87c4	show dom counter only for active crawls where the dom counter is enabled within the crawl profile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7731 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4588b5a291	- fixed document number limitation for crawls that restrict the number of documents per domain - some restructuring of the document counting and logging structures was necessary - better abstraction of CrawlProfiles - added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation - more refactoring to get the LibraryProvider more clean - some refactoring of the Condenser class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	e7552bd719	*) cleaning up the code a little bit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7343 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	2c549ae341	fixed a number of small bugs: - better crawl star for files paths and smb paths - added time-out wrapper for dns resolving and reverse resolving to prevent blockings - fixed intranet scanner result list check boxes - prevented htcache usage in case of file and smb crawling (not necessary, documents are locally available) - fixed rss feed loader - fixes sitemap loader which had not been restricted to single files (crawl-depth must be zero) - clearing of crawl result lists when a network switch was done - higher maximum file size for crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7214 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	f6eebb6f99	replaced auto-dom filter with easy-to-understand Site Link-List crawler option - nobody understand the auto-dom filter without a lenghtly introduction about the function of a crawler - nobody ever used the auto-dom filter other than with a crawl depth of 1 - the auto-dom filter was buggy since the filter did not survive a restart and then a search index contained waste - the function of the auto-dom filter was in fact to just load a link list from the given start url and then start separate crawls for all these urls restricted by their domain - the new Site Link-List option shows the target urls in real-time during input of the start url (like the robots check) and gives a transparent feed-back what it does before it can be used - the new option also fits into the easy site-crawl start menu git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7213 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	377f001e0d	sorting of crawl profile names in crawl profile editor, see http://forum.yacy-websuche.de/viewtopic.php?p=20851#p20851 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7172 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	65eaf30f77	redesign of crawl profiles data structure. target will be: - permanent storage of auto-dom statistics in profile - storage of profiles in WorkTable data structure not finished yet. No functional change yet. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7088 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3197ca42ed	preparations to move the HTCache into cora: - move the header framework classes to cora - move the ARC caching classes to cora - refactoring of code to call these classes from cora git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	5a994c9796	added a scheduler based on API actions - every process that is monitored with the API Steering interface can now be scheduled! - added input methods in Steering interface to set a scheduling time - added a view on the steering api that shows only crawl jobs inside the Crawl Profile servlet - added a scheduling call process in the cleanup process handler that triggers the scheduled processes This causes that the cleanup now also looks for scheduled processes. Such processes are therefore not executed at the same time as given in the target execution time but they will be executed within the cleanup process time window. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7050 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	ad96a14d0a	*) jump to Crawl Profile editor if a profile is selected to be edited git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6991 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	b7556893c6	removed terminate buttons for build-in crawl profiles in crawl profile editor git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6883 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	25aef069a6	continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6790 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	34354cf9b2	added a servlet that has been removed in SVN 4881; this servlet is now splitted and will be used for a simple crawl start and a remote crawl monitor (not yet integrated into the interface) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6582 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	a06f7ddb33	more PMD recommendations git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6572 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	dd459281c8	applied code changes that are recommended by PMD git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6563 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	362b7a929b	added extensive memory protection logic to avoid out of memory errors that may be caused by the RowCollection memory allocation function git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6521 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	5841ee83d3	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6400 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	04a548a1e3	- temporary integrated the transferURL servlet as static class instead as a class that is called using reflection to investigate the OOM problems in that class - fixes for numerous other problems - removed dead code - resdesign of the strings-method, which produces now less memory overhead and may help to prevent OOMs - another fix for the deadlock problem in SplitTable git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6373 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
low012	5e4f267a36	*) added subversion properties and edited a few comments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6348 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	1d8d51075c	refactoring: - removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here: http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html We stil have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages. - cleaned up the http package: better structure of that class and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http. - because the plasmaSwitchboard is now part of the search package all servlets had to be touched to declare a different package source. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6232 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	5bb8074150	removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency. - The indexing queue was a historic data structure that was introduced at the very beginning at the project as a part of the switchboard organisation object structure. Without the indexing queue the switchboard queue becomes also superfluous. It has been removed as well. - Removing the switchboard queue requires that all servlets are called without a opaque generic ('<?>'). That caused that all serlets had to be modified. - Many servlets displayed the indexing queue or the size of that queue. In the past months the indexer was so fast that mostly the indexing queue appeared empty, so there was no use of it any more. Because the queue has been removed, the display in the servlets had also to be removed. - The surrogate work task had been a part of the indexing queue control structure. Without the indexing queue the surrogates needed its own task management. That has been integrated here. - Because the indexing queue had a special queue entry object and properties attached to this object, the propterties had to be moved to the queue entry object which is part of the new indexing queue withing the blocking queue, the Response Object. That object has now also the new properties of the removed indexing queue entry object. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6225 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	154bbc3364	code cleanup: call of static methods directly to the class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6155 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	99bf0b8e41	refactoring of plasmaWordIndex: divided that class into three parts: - the peers object is now hosted by the plasmaSwitchboard - the crawler elements are now in a new class, crawler.CrawlerSwitchboard - the index elements are core of the new segment data structure, which is a bundle of different indexes for the full text and (in the future) navigation indexes and the metadata store. The new class is now in kelondro.text.Segment The refactoring is inspired by the roadmap to create index segments, the option to host different indexes on one peer. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5990 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	bd5f4c78d8	- added default profile for surrogate indexing - integrated surrogate indexing into indexing queue process git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5810 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	10f5ec1040	reverted last commit (more testing needed) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5356 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	dba7ef5144	extended crawling constraints: - removed never-used secondary crawl depth - added a must-not-match filter that can be used to exclude urls from a crawl - added stub for crawl tags which will be used to identify search results that had been produced from specific crawls please update the yacybar: replace property name 'crawlFilter' with 'mustmatch'. Additionally, a new parameter named 'mustnotmatch' can be used, which should be by default the empty sring (match-never) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5342 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	0edec2b760	FULL redesign of algorithms in htmlTools to encode/decode strings from/to unicode and html. The old process used a not really efficient way to detect html encoding strings in texts. All calling methods had been adoped to call the new class in an enhanced way with less parameters. Many classes in interfaces used a XML encoding only (instead of full html conversion from unicode to html); this behavior was not changed with this commit but should be controlled again since it points out possible XSS leaks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5295 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	d9d9c522a1	addendum to last commit moved recrawl times for standard profiles to constants calculate new specific dates in cleanup job git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5082 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	536e77e8b7	modifications towards a single database operation to read/write http header and cached file at once: - removed distinction between header file types for http and ftp; ftp is simulated by using http properties - removed all old resourceInfo classes that handled this distinction - introduced a new distinction between http request and http response objects - unified new response objects with two other object types that had been introduced elsewhere - changed all servlet call methods to use the new http request header object type - divided static object keys for http header properties into request and response types - refactoring here and there (a large number of type changes and many methods merged/moved) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5079 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
danielr	3bb870bfcd	added final where possible git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	7feae906aa	- organize imports - removed potential null pointer accesses - removed unnecessary casts git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	cfe6790498	- added option to switch between yacy networks, especially between the two default networks (freeworld and intranet), from the ConfigNetwork online interface - to make this possible, a large refactoring and reorganisation of data structures was necessary git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4803 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	dd75b3cabc	- patch for bad profiles - time-out when deleting profiles git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4793 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	1689030ee8	refactoring: moved all crawler classes into their own package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4768 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
f1ori	b9602e891a	* added CrawlProfileEditor_p.xml for monitoring in yacybar git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4708 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	d03940f2ec	- included patch from http://forum.yacy-websuche.de/viewtopic.php?p=7193#p7193 - fixed problem with crawl profile editor after deletion of a crawl profile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4706 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	b2150057d2	removed unnecessary cleanup method git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4625 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
low012	b63cf2fc1c	*) added button to Crawl Profile Editor to delete all terminated crawl jobs (only visible if there are terminated crawl jobs) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4620 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	541b817502	refactoring of switchboard queueing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago

62 Commits (ea49a8aa8ca6d09beda9318423093795bd9b4f04)