yacy_search_server

Commit Graph

Author	SHA1	Message	Date
orbiter	d2ba1fd2ab	major step forward to network switching (target is easy switch to intranet or other networks .. and back) This change is inspired by the need to see a network connected to the index it creates in an indexing team. It is not possible to divide the network and the index. Therefore all control files for the network were moved to the network within the INDEX/<network-name> subfolder. The remaining YACYDB is superfluous and can be deleted. The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods, using static access to yacySeedDB had to be rewritten. A special problem had been all the port forwarding methods which had been tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore the port forwarding had been deleted (I guess nobody used it and it can be simulated by methods outside of YaCy). The mySeed.txt is automatically moved to the current network position. A new effect causes that every network will create a different local seed file, which is ok, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT). No other functional change has been made. The next steps to enable network switching are: - shift of crawler tables from PLASMADB into the network (crawls are also network-specific) - possibly shift of plasmaWordIndex code into yacy package (index management is network-specific) - servlet to switch networks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	d4bce6affd	refactoring (initialized static fields, removed empty if/else, serialized some fields in serializable classes) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4755 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
danielr	5c3c1fdf41	replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	0241d070bc	added concurrency to indexing process: - the methods {parsing, semantic analysis (condensing), structure analysis (web structure)} in the serialized indexing path had been made concurrent. - four BlockingQueues handle concurrency and hand-over of the indexing objects, the last object in the queue is stored into a blockingQueue of maximum size 1 to serialize the process for storage (which uses IO and therefore here should not be deserialized) - a concurrency of (CPUs + 1) is default. Single-CPU users will profit from the change because large files cannot block the indexing process any more. - removed the secondary indexing thread, which is superfluous now. Concurrency is default for all users. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4609 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	bca87f1e38	- refactoring of serverThreads: renaming to distinguish busy-threads and blocking-threads - added blockingThreads which are threads that are not driven by pause times but by BlockingQueue lookup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4606 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
low012	8e889de50b	*) Added Lotus' patch (http://forum.yacy-websuche.de/viewtopic.php?t=979 ), user will be taken back to last opened page after making changes in Advanced Settings. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4601 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	541b817502	refactoring of switchboard queueing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	9d693ee635	more generics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4415 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	0f5c4abaca	more generics git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4414 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
fuchsi	d517e96714	last cleanup bits to serverDate before the release. only safe refactoring (method renaming) changes outside of serverDate. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4289 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
fuchsi	0e1738899f	* Complete number localization and provide a more reasonable interface to serverObjects: - put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation. - putASIS(...) have been removed, now done with simple put(...) (see above). - putNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()). - putHTML(...) escapes special characters into corresponding HTML entities ('<' => '&lt;') which was done with put(...) before and so was called too often, because it is necessary only for very few cases. Additionally there is a "forXML" mode which only replaces < > & ". In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value. A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values. * added additional Sum/Avg rows to access tracker pages, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=456 * removed duplicate code (mostly related to the big changes above). TODO: - make sure, number formats work as expected _everywhere_, report overseen stuff http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437 - probably a good idea to add special putDate() methods as they are used in many pages and create duplicated formatting code + maybe some centralized handling for memory value formatting. - further improve the speed of page creation for the WatchCrawler. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4178 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	01e0669264	re-designed some parts of DHT position calculation (effect is the same as before) and replaced old first hash computation by new method that tries to find a gap in the current dht. to do this, it is necessary that the network bootstrapping is done before the own hash is computed; this made further redesigns in peer initialization order necessary git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	0ad8499e66	- all parsers are activated by default for pro releases - slightly higher file size limits for parsers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4051 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	a9e73b6852	fixed great mess with localization paths. the problem was: automatic re-translation after update did not work. hopefully now git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3952 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	d25caa07bf	redesigned some parts of http authentication added another access check for peer hops git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3340 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
auron_x	6ac0021e14	*) fixed static ip being cleared when no port was given *) fixed static ip being enabled although it was cleaned to "" git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3198 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
karlchenofhell	4a1dd8ecc8	- final step for Advanced Settings pages to XHTML git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3121 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
rramthun	882ebf34fa	*) Some enhancements to the statuspage, which save one point from the advanced config *) ant clean now doesn't remove RPMs anymore git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3079 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
borg-0300	83cdf056c1	check for valid static IP git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3038 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	2e4aa6a170	refactoring of Advanced Config: - removed settings that are in Basic Settings - joined pages that belong together - moved include pages from yacy/ to / git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2726 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	fded1f4a5d	*) better handling of maximum file size limit in crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2543 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	63893003be	*) Adding settings page for the crawler which allows to specify a file size limit and the timeout to use. *) adding first version of maximum filesize check for the crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2534 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	6ad471ef96	* applied many compiler warning recommendations * cleaned up code * added unit test code * migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	6e676224d0	*) adding support for upnp A new port forwarding method for upnp was added. If this method is enabled, yacy automatically determines a UPnP-capable internet gateway and configures the gateway port forwarding settings properly. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2328 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	b594ee9a5a	*) Adding possibility to configure if the http proxy should send the X-forwarded-for header (requested by TeeSee) See: http://www.yacy-forum.de/viewtopic.php?t=2577 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2257 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	04ab5da350	fixing server & proxy access settings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1857 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	20ba05ad41	*) Bugfix for 'Dangling meta character' bug See: http://www.yacy-forum.de/viewtopic.php?p=18585#18585 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1838 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	3b7e66ab48	staticIP should now work (with resolved Conflict) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1785 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	8fcb25f9f9	*) Setting via header according to rfc - can be disabled via settings dialog git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1662 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	9086261476	refactoring of base64 encoding: the kelondro database needs specific information about the order of base64-encoded keys. Since no other package depends on base64 (only the httpd uses base64 for encryption, but does not need to encode these strings) it is good to move base64 encoding to the new ordering classes in kelondro. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1284 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	61bded057a	*) Bugfix for Server Port configuration. Status-Info was not displayed correctly. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1194 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	5a1d45715d	*) Bugfix for parser configuration bug - it was not possible to disable all parsers See: http://www.yacy-forum.de/viewtopic.php?t=1579 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1191 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	8f1f2daa5e	implemented interactive link deletion of search results. next steps: attach voting and restrict to administrator to see the deletion button, move the mouse pointer to the left of a search result git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1172 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	44fa94ac52	*) Modifications for dbImport functionality - dbImporter threads are now shutdown by the switchboard on server shutdown - adding possibility to pause an importer thread via GUI - Bugfix for abort function See: http://www.yacy-forum.de/viewtopic.php?p=13363#13363 *) Modification of content parser configuration - now it's possible to configure which parsers should be enabled for the proxy, crawler, icap, etc. separately - *) htmlFilterContentScraper.java - adding regular expression to normalize URLs containing /../ and /./ parts *) httpc.java - adding functionality to unzip gzipped content - requested by roland: should be used later to allow gzipped seed lists git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1170 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	a04930f025	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	8e308cf50e	*) Possibility to change the server port on-the-fly. - Now it's possible to change the server port without the need to restart the whole server. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1089 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	f8f9d509d5	removed dead Code git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1078 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	cb69047b91	*)cleanup access static methods and fields git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1016 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	56b9f34411	*)removed unused imports git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1015 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
hydrox	295aff52a3	*)added offline-browsing-support (onlineMode=0) *)online-mode now can be changed in Status.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1010 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	ec3af327f7	*) Bugfix for Proxy-Authentication against remote proxy See: http://www.yacy-forum.de/viewtopic.php?p=11804#11804 *) Adding first version of db test for mysql NOTES: - db user + db + db table must be created before starting the test - db table must be empty. Entries can not be updated at the moment - db connection properties must be changed in the sourcecode at the moment TODOs: - accepting connection properties via command line - implementing update + remove + read operations - 'maybe' adding code to create db + table if it doesn't exist git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@991 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	02d9af1a70	*) Restructuring and extending of Remote Proxy Support - remote proxy configuration can now be "really" changed on the fly and takes effect immediately - adding possibility to disable remote proxy usage for yacy->yacy communication - adding possibility to disable remote proxy usage for ssl - restructuring proxy configuration so that it is stored in a single place now *) Adding possibility to import a foreign word DB (or even more of them in parallel) at runtime into the peers DB - this can be done by calling IndexImport_p.html - ATTENTION: please note that at the moment this thread must be aborted via gui before a normal server shutdown is done. - TODO: integrating IndexImport Thread into normal server shutdown - TODO: Adding possibility to import crawl-queues, etc. from foreign peers - TODO: removing old import function from yacy.java and calling the new routines instead git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@968 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
borg-0300	e642a5d8b7	more constants git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@947 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
allo	f65c939a60	userDB Auth git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@874 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
theli	a2fa75e688	*) Asynchronous queuing of crawl job URLs (stackCrawl) various checks like the blacklist check or the robots.txt disallow check are now done by a separate thread to unburden the indexer thread(s) TODO: maybe we have to introduce a threadpool here if it turns out that this single thread is a bottleneck because of the time consuming robots.txt downloads *) improved index transfer The index selection and transmission are done in parallel now to improve index transfer performance. TODO: maybe we could speed up performance by using multiple transmission threads in parallel instead of only a single one. *) gzip encoded post requests it is now configurable if a gzip encoded post request should be sent on index transfer/distribution *) storage Peer (very experimental and not optimized yet) Now it's possible to send the result of the yacy indexer thread to a remote peer instead of storing the indexed words locally. This could be done by setting the property "storagePeerHash" in the yacy config file - Please note that if the index transfer fails, the index is stored locally. - TODO: currently this index transfer is done by the indexer thread. To speed up the indexer a) this transmission should be done in parallel and b) multiple chunks should be bundled and transferred together *) general performance improvements - better memory cleanup after http request processing has finished - replacing some string concatenations with stringBuffers - replacing BufferedInputStreams with serverByteBuffer - replacing vectors with arraylists wherever possible - replacing hashtables with hashmaps wherever possible This was done because function calls to vector or hashtable functions take 3 times longer than calls to functions of arraylists or hashmaps. TODO: we should take a look at the class serverObject which is inherited from hashmap Do we really need a synchronization for this class? TODO: replace arraylists with linkedLists if random access to the list elements is not needed *) Robots Parser supports if-modified-since downloads now If the downloaded robots.txt file is older than 7 days the robots parser tries to download the robots.txt with the if-modified-since header to avoid unnecessary downloads if the file was not changed. Additionally the ETag header is used to detect changes. *) Crawler: better handling of unsupported mimeTypes + FileExtension *) Bugfix: plasmaWordIndexEntity was not closed correctly in - query.java - plasmaswitchboard.java *) function minimizeUrlDB added to yacy.java this function tests the current urlHashDB for unused urls ATTENTION: please don't use this function at the moment because it causes the wordIndexDB to flush all words into the word directory! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	3c1d968d29	fix-fix for 792 and small changes in ftpc/download/dir experiments git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@797 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	dc474aa22f	various bug-fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@792 6c8d7289-2bf4-0310-a012-ef5d649a1542	19 years ago
orbiter	2d8557cb10	minor changes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@487 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
theli	228b04b499	*) Bugfix for "wrong seed-upload timestamp" problem http://www.yacy-forum.de/viewtopic.php?t=817 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@480 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago
orbiter	85877413a0	tried to fix principal bug .. not succeeded git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@440 6c8d7289-2bf4-0310-a012-ef5d649a1542	20 years ago

70 Commits (fbb712c669dc001230cbfac24065e9b91dd52c19)