yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	5f0ab25382	removed the option to prevent removal of & parts inside of the MultiProtocolURI during normalform computation because that should always be done and also be done during initialization of the MultiProtocolURI Object. The new normalform method takes only one argument which should be 'true' unless you know exactly what you are doing.	12 years ago
Michael Peter Christen	a06930662c	replaced some more .getBytes() with UTF8/ASCII.getBytes()	12 years ago
Michael Peter Christen	2f536cb54d	code cleanup: removed unused methods and made more methods and objects private	12 years ago
Michael Peter Christen	1533bfd63b	refactoring	12 years ago
Michael Peter Christen	e49359cc95	removed tenant query attribute since it is not used any more and is replaced by the site-operator in the GSA interface. This operator can also be simulated in the Solr interface using the collections_sxt field.	12 years ago
Michael Peter Christen	e57bf2ca39	simplified DHT classes	12 years ago
Michael Peter Christen	8219a445f3	refactoring	12 years ago
Michael Peter Christen	00c1c777fa	refactoring	12 years ago
Michael Peter Christen	f75b3f8a47	added more patches to work without RWI data structure	12 years ago
Michael Peter Christen	31d4d38804	- extended the solr interface by a references-by-word-count method - reduced danger that a non-existing RWI database causes NPEs - added Solr queries to did-you-mean: this makes it possible that our did-you-mean algorithm works together with only Solr and without RWIs	12 years ago
Michael Peter Christen	a06123aec6	more abstraction and less parameter overhead for remote search	12 years ago
orbiter	6f01542aaa	explicit double-check in transferURL	12 years ago
Michael Peter Christen	0cab06c47c	refactoring	12 years ago
Michael Peter Christen	18f989dfb1	- refactoring (load -> getMetadata) - added getDocument to retrieve Solr documents which shall replace getMetadata	12 years ago
Michael Peter Christen	6197caf698	added clear-text search words in query params	12 years ago
Michael Peter Christen	597bb76e4f	get the peer location more quickly	12 years ago
orbiter	9b88433f45	patch from hint in http://forum.yacy-websuche.de/viewtopic.php?p=26858#p26858 from gaston	12 years ago
orbiter	e816b88b55	changed behaviour of metadata storage: in case that any solr is attached, the metadata is not written to the metadata-db, even if it is enabled but instead to solr. This prevents that metadata is written in two store systems at the same time. It is also the next step to migrate the current metadata-db to solr.	12 years ago
Michael Peter Christen	f9c0e6e950	- Implemented and integrated the URIMetadataNode object which is a metadata representation from the solr index. This shall replace metadata from the built-in database in the future. - added the Solr-driven metadata into the search index of YaCy which makes it now possible to run YaCy without the old metadata index. This is a major step forward to a full migration to Solr.	12 years ago
orbiter	67edfd991c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	d9173ba7ed	added more solr fields to integrate values from URIMetadataRow. All writings to the Metadata-DB are now also done to solr. This includes metadata transfer during search and rwi transfer. The new/added solr fields are: ## time when resource was loaded load_date_dt ## date until resource shall be considered as fresh fresh_date_dt ## id of the host, a 6-byte hash that is part of the document id host_id_s ## ids of referrer to this document referrer_id_ss ## the md5 of the raw source md5_s ## the name of the publisher of the document publisher_t ## the language used in the document; starts with primary language language_ss ## an external ranking value ranking_i ## the size of the raw source size_i ## number of links to audio resources audiolinkscount_i ## number of links to video resources videolinkscount_i ## number of links to application resources applinkscount_i	12 years ago
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	12 years ago
orbiter	69e743d9e3	- more abstraction for the RWI index as preparation for solr integration - added options in search index to switch parts of the index on or off	12 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
orbiter	62202e2d71	refactoring of query attribute variable names for better consistency with (next) stored query words	13 years ago
Michael Peter Christen	0301aba1e9	removed unused method parameters	13 years ago
Michael Peter Christen	241dd8410a	removed snippet pattern filter - it was not used	13 years ago
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	13 years ago
Michael Peter Christen	03280fb161	removed segments-concept and the Segments class: the segments had been there to create a tenant-infrastructure but were never used since that was all much too complex. There will be a replacement using a solr navigation using a segment field in the search index.	13 years ago
Michael Peter Christen	b9d42fd9c8	using com.google.common.io.Files instead of homebrew methods	13 years ago
Michael Peter Christen	8b53771db2	changed behavior of navigation processing: - vocabulary annotation is not done any more into the metadata of urldb - vocabularies are written into the jena triplestore using a rdf vocabulary - vocabularies for rdf triples must be updated; refactoring done - with the new navigation tags in the triplestore a faster pre-urldb-lookup is possible: navigation is processed now within the RWI during pre-ranking retrieval - added also an Owl vocabulary stub to add the plain-text url to the triplestore using the owl:sameas predicate	13 years ago
Roland 'Quix0r' Haeder	edaa09b9b1	Rewrote all String blacklist types to enum 'BlacklistType', closes bug #143 Conflicts: htroot/Supporter.java htroot/yacy/crawlReceipt.java htroot/yacy/transferRWI.java htroot/yacy/transferURL.java source/de/anomic/crawler/CrawlStacker.java source/de/anomic/data/ListManager.java source/net/yacy/peers/Protocol.java source/net/yacy/repository/Blacklist.java source/net/yacy/repository/LoaderDispatcher.java source/net/yacy/search/Switchboard.java source/net/yacy/search/index/MetadataRepository.java source/net/yacy/search/index/Segment.java source/net/yacy/search/query/RWIProcess.java source/net/yacy/search/snippet/MediaSnippet.java	13 years ago
Michael Peter Christen	2fe207f813	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	5aee19daa4	added show from cache in search results (not yet finished)	13 years ago
Michael Peter Christen	e0d8643226	- performance hacks - added log warnings in case that search processes run into time-out situations - better concurrency for Integer formatter (used a non-synchronized formatter before) - bugfix for search termination (a poison pill was missing) - added timeout parameters for search (again) -> target is, that they are never reached.	13 years ago
Michael Peter Christen	9b4c699526	enhanced location search: - search requests are now made using a map boundary - search results are only computed for the map boundary - the number of results is adapted to the results in the visible range - added a double-buffering for the search result markers - added a search query option for the search results: /radius/<lat>/<lon>/<radius>	13 years ago
Michael Peter Christen	71c3163f3d	- fixes to node identification - added link to node in network list - added marking of portal search node peers	13 years ago
Michael Peter Christen	7bf421b9dd	- fixed image search page navigation - removed some deadlocks and ConcurrentModificationExceptions during DidYouMean collection	13 years ago
Michael Peter Christen	ba6aaabc51	refactoring + parser bugfixes	13 years ago
Michael Peter Christen	f8cd57c92f	new indexing strategy: ALL links that appear anywhere are indexed, not only links where the content can be parsed. All non-parseable links are placed into the noload queue. The search process must therefore be able to filter out non-text search results. - This fixes the problem that image search results appeared in the text search. - The interactive search can retrieve now ALL types of links - The p2p interface is now extended to retrieve only certain types of links (text, image, video, apps) - The search process has an extension to filter the right document type according to the search query	13 years ago
Michael Peter Christen	14f67f217c	refactoring of ContentDomain: now subclass of Classification	13 years ago
Michael Peter Christen	a5d7da68a0	refactoring: removed dependency from switchboard in Balancer/CrawlQueues	13 years ago
Michael Peter Christen	a9b4d49b75	removed debug output	13 years ago
Michael Peter Christen	9ad1d8dde2	complete redesign of crawl queue monitoring: do not look at a ready-prepared crawl list but at the stacks of the domains that are stored for balanced crawling. This affects also the balancer since that does not need to prepare the pre-selected crawl list for monitoring. As an effect: - it is no longer possible to see the correct order of next to-be-crawled links, since that depends on the actual state of the balancer stack the next time another url is requested for loading - the balancer works better since the next url can be selected according to the current situation and not according to a pre-selected order.	13 years ago
Michael Peter Christen	b4bc1e2875	remote search does not do snippet generation	13 years ago
Michael Peter Christen	83009d86f7	added the vocabulary navigator. It can be very simply tested by switching on the locale dictionaries.	13 years ago
Michael Christen	20e3084bd4	redesign of finding of peers by ip: more lightweight method to read the seed databases	13 years ago
Michael Christen	9e5894c784	Removed handling of components objects for URIMetadataRows. This is a preparation to replace these rows with nodes from the node store.	13 years ago
Michael Christen	c04bfaa51b	refactoring	13 years ago
Michael Peter Christen	0bcef2d156	added feature as requested in http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461 The search can now be configured with a non-display host list. The search will always exclude the given list of hosts unless they are requested directly using the host navigation	13 years ago
orbiter	ebd840ebf6	- enhanced description on search front page - fixed language and heuristic modifier - added hint to crawl start that we can do also ftp and smb crawls - added a protocol extension to remote crawls to transport all search modifiers to remote peers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8108 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5a55397f99	some last-minute performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	c9216d5adf	fixed secondary remote search (the process that finds distributed join situations) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8098 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	507c9d478d	much better timing when search globally; less blocking; more results earlier! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8084 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	8e0b2c5832	fixed cluster search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8083 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	368b51ed5b	argh.. fixed bad SVN 8080 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8081 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	eb4436defb	removed limitation to cluster peers if peer is asked remotely. This enables single-linked clusters which naturally is there first if a new cluster is created git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8080 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	5581be12fb	YMarks: - added backend and api for tag management git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8058 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	84c3fc9d97	local/global fixes in search, better abstraction git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8054 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	5a7cec59f3	moved ynetSearch to get all files out of htroot/api/util/ git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8042 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	f8b8c82421	- refactoring of getpageinfo_p.xml (moved out of util) - added more logging in getpageinfo_p.xml git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8037 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
apfelmaennchen	4d7ae76017	- update to jquery 1.7 (does not apply to all jquery code, old version is additionally kept for compatibility) - update to jquery-ui 1.8.16 (includes themes) - introduced new portalsearch (as default) - old portalsearch is still available and accessible, but will eventually be removed - jquery and portal search is now loaded by special header templates for maintenance reasons - update to new autocomplete, solves bug: http://bugs.yacy.net/view.php?id=29 - many improvements to YMarks GUI and API...more to come anytime soon Sorry, this is a rather large commit, I hope it doesn't break anything essential, but I need to consolidate some of my efforts in order to move ahead. Especially the update to the portalsearch widget might not be welcomed, but the old one is simply incompatible with newer jquery and jquery-ui libraries, sorry. The code tree /yacy/ui/... is obsolete and will be removed in the future. At that point all productive portalsearches should have migrated to the new version. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8014 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	9e4875230f	performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8001 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	204e98db3a	added a protection against rwi flooding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7993 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	a7df70221e	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	813f297a95	another performance hack: re-use of known host addresses for isLocal property; avoids look-up in local hash git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7983 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	d2ea250d99	refactoring: - moved many classes from de.anomic to net.yacy - made more sub-packages for search classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	734059d33e	performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7955 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	22d69a6368	refactoring in cora: added sorting package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7890 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	51cf697acd	refactoring: moved all score-related classes to new ranking package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7889 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	11dc653de3	added a visualization of peer pings to the performance graphic git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7837 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	115abc8917	- more attributes for search progress bar - moved cache strategy to cora package git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4bea3f9714	hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources: used an ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes). The new ASCII String <-> byte[] conversion methods have less computation overhead than the UTF8 conversion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	e28bd0d038	fix for some possible causes of memory leaks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7741 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	10e2f588f8	- enhanced ybr ranking computation - many speed/performance hacks - added solr sharding and new sharding web interface - added option to switch off the yacy index when using solr - added new fail-url categories which are used to make a distinction which fail-urls to be sent to solr - refactoring/renaming of some method names to distinguish host/url hashes better - a large number of bug/npe fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b45701d20f	this is a re-implementation of the YaCy Block Rank feature. This time it works like this: - each peer provides its ranking information using the yacy/idx.json servlet - peers with more than 1 GB ram will load this information from all other peers, combine that into one ranking table and store it locally. This happens during the start-up of the peer concurrently. The new generated file with the ranking information is at DATA/INDEX/<network>/QUEUES/hostIndex.blob - this index is then computed to generate a new fresh ranking table. Peers which can calculate their own ranking table will do that every start-up to get latest feature updates until the feature is stable - I computed new ranking tables as part of the distribution and commit them here also - the YBR feature must be enabled manually by setting the YBR value in the ranking servlet to level 15. A default configuration for that is also in the commit but it does not affect your current installation, only fresh peers - a recursive block rank refinement is implemented but disabled at this point. It needs more testing. Please play around with the ranking settings and see if this helped to make search results better. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7729 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	123375bfba	added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy. This servlet currently only serves for indexes to the web structure hosts. It can be tested by calling http://localhost:8090/yacy/idx.json?object=host This yacy protocol servlet is the first one that returns JSON code and that also shows index entries in a readable format. This will make the development of API applications much easier. This is also an example implementation for possible json versions of the other existing YaCy protocol interfaces. The main purpose of this new feature is to provide a distributed block rank collection feature. Creating a block rank is very difficult if the forward-link data is first collected and then one peer must create a backward-link index. This interface provides already a partial backward index and therefore a collection of all these indexes needs only to be joined which is very easy. The result should be the computation of new block rank tables that all peers can perform. To reduce load from peers this servlet buffers all data and refreshes it only once in 12 hours. This very slow update cycle is needed because the interface will be called round-robin from all peers once after start-up. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7724 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	5b579e21a3	code cleanup git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7713 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	0621a15f89	fix for wrong search result counter: added a counter for all filtered out entities see also http://bugs.yacy.net/view.php?id=5 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7704 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	deda54d684	- relaxed matching of string-search (this is now case-insensitive) - added transport of string-search pattern to remote search protocol - fixed a problem parsing snippets with a '-' inside git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7700 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	6e42d4de88	- added full-String search function: find things that match exactly what is quoted in the query - re-structuring authentication methods to fix a problem with API steering git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7697 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
apfelmaennchen	8b8db2aaba	YMarks: some small changes/fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7695 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	6fa439c82b	- refactoring of robots - added option to crawler to send error-URLs to solr - changed solr scheme slightly (no multi-value fields where no multi values are) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7693 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
apfelmaennchen	e7c2ea193b	YMark: - general improvements on importers, especially on auto tagging - added get_tags (needed for tag clouds etc.) - improved flexigrid support - added YMarks.html (not fully working) that will eventually replace Bookmarks.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7691 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3b578a28ef	some patches to prevent that empty or bad IP information is broadcasted - on client-side: fix bad IP reports from remote Peers by replacing their reported IP with their server IP if the reported IP is bad, broken or disallowed - on server-side: the same during a peer ping (here the ping'ed server acts also as client during the back-ping) and also when receiving a message or a search where the client sends also its seed. Here the IP is replaced by the client IP if the reported IP is broken or bad git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7687 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	8b95a26866	better magic git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7684 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	2700a58e5a	added a magic to the peer ping that will be used in case that the contacting peer requests that its reported IP shall be used for a back-ping. The back-ping now also returns the same magic which will make it possible that the requested peer can verify that the back-pinged peer is actually the same peer. This is also a protection against the forced fake of an external IP: if such an IP was faked, then the next ping from the affected peer to another peer looks like a staticIP report. Such a bad staticIP-by-faked-response can now be discovered and fixed by the peer that gets the second ping after the first ping contained a faked response. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7683 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
apfelmaennchen	b2281f0b7d	YMark: intermediate work towards flexigrid support git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7670 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
lotus	06afa94f9d	hups git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7626 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
lotus	a9a9db98c8	better rename modified version git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7625 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
lotus	e19ca27004	do not autocomplete on mouseover. this has resulted in unwanted autocomplete. fixes bug #3 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7624 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	2861d0888a	*) simplified code *) fixed potential NumberFormatExceptions git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7600 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	8f11d3a5bb	redesigned the ScoreMap classes: - new concurrent score map using atomic operations from java concurrency classes - redesigned difference between StaticScore and Dynamic Score into ScoreMap and ReversibleScoreMap allowed that many classes can now use simple ScoreMap Objects which can be used better in concurrent environments using the ConcurrentScoreMap - switched from DynamicScore to ConcurrentScoreMap usage wherever possible git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7586 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	694fa3a2a5	- replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion - changed menu structure slightly git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7583 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	cb1f49d0f2	replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	bed79402be	introduction of a new remote search load control: the remote search has taken 10 results per peer with a time-out of 3 seconds so far. The attributes of number of results per peer and time-out time can now be configured. This has two aspects: the user who searches may want to increase these values to get more results and more load on the remote side and the user of the server which is accessed for this search may want to restrict the load. Both sides can now be configured. The server-side maximum load parameters are defined by a network definition and the client-side search request load can be defined by each user individually but when the remote search is done the requested service is limited to the network definition. You can find now in the network definition file: network.unit.remotesearch.maxcount and network.unit.remotesearch.maxtime and in the yacy.conf file: remotesearch.maxcount and remotesearch.maxtime There is currently no web interface to define the client-side remote search attributes, please set them manually git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7548 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	5e186e0122	continuing the fight against deadlocks during time formatting: better caching. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7531 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4473cf8c61	replaced utf-8 with UTF-8 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7485 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	5892fff51f	introduction of dht-burst modes: this can expand the number of target peers in some cases where a better heuristic is needed. The problematic cases are either when a multi-word search is made (still a hard case for our term-oriented DHT) or when a network operator wants that all robinson peers are asked. We therefore introduced two new network steering values that switch on more peers during the peer selection. Because the number of peers can now be very large, the number of maximum httpc connections was also increased. Please see new comments in yacy.network.freeworld.unit for details of the new DHT selection methods. The number of maximum peers is now not fixed to a specific number but may increase with - the partition exponent - the number of redundant peers - the robinson burst percentage - the multiword burst percentage The maximum can then be the number of senior peers (all visible peers). git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7479 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	4588b5a291	- fixed document number limitation for crawls that restrict the number of documents per domain - some restructuring of the document counting and logging structures was necessary - better abstraction of CrawlProfiles - added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation - more refactoring to get the LibraryProvider more clean - some refactoring of the Condenser class git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	88773e4daa	changed the default port from 8080 to 8090 see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	28f669bf0b	- fixed/enhanced move to SD/16:9 images (network, web structure) - added logging in peer ping to analyse time-consuming elements which could be cause for disappearing peers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7450 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	efb4ca8fa8	modified auto-delete of search failure-words: - words are now not deleted from the search index automatically if index receive is switched off - a flag in the network definition defines if this feature is switched on at all - the search filter for not-found word references is switched off for server-side remote searches git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7441 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	6c1b14c8e1	- more control in access tracker: count number of returned search results (not only info how much is in the index) - extended query params for this - enhanced cora git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7430 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	10ae8d961b	- cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring) - cleaned up (removed special code and documentation for 27c3) - added remote search functions to be used within cora git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7420 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	a4c9d27287	- moved some variables from Switchboard to new class AccessTracker - added a limitation in access tracking to delete queries which are older than 10 minutes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7410 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	a563b05b60	enhanced crawler: - added a new queue 'noload' which can be filled with urls where it is already known that the content cannot be loaded. This may be because there is no parser available or the file is too big - the noload queue is emptied with the parser process which indexes the file names only - the 'start from file' functionality now also reads from ftp crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7368 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	db99db4be9	some redesign of the search-fail-response mechanism: when a search fails for a single url because the snippet cannot be generated, then the url reference is deleted from the index. This mechanism was redesigned and enhanced. The process now also writes into the work tables into the table searchfl to prepare a re-indexing mechanism. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7364 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	18d33b5c6d	fixed several search result navigation bugs fixed bad behaviours during search result collection git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7362 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	49b5a206cd	- better calculation of search result size - predefined search recommendations git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7361 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	a9f754c45f	removed unused CR accumulation and distribution process this was never used and extended in the last years. The resulting YBR ranking criteria is still a good idea and will be used in the future. Possible generation methods for YBR ranking are: - "trust-rank" using the link structure as can be discovered in a single crawl (idea from FSCONS) - "block-rank" calculated from the local link structure - a distributed "block-rank" using the xml API to the link structure from other peers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7349 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	e7552bd719	*) cleaning up the code a little bit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7343 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	38fdf43587	*) renamed classes according to standard Java coding conventions *) String.isEmpty() was introduced in Java 1.6, but we still use Java 1.5 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7330 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
apfelmaennchen	808edffaf6	ymarks - some refactoring - working xbel and html import (/api/ymarks/test_import.html) - working treeview (/api/ymarks/test_treeview.html) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7312 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
apfelmaennchen	43586a2ace	an update to ymarks (please test if you wish): - import HTML (e.g. FF export) via /api/ymarks/import.html - view your import via /api/ymarks/test.html - get an xml list via /api/ymarks/get_ymark_list.xml?tags=&folders= - delete bookmark tables via standard interface /Tables_p.html it is still very experimental!! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7299 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	25a8e55bc9	more logging about bad seeds git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7275 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	959b8c6fa0	- allow greater seed size - more logging for bad seeds git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7274 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
apfelmaennchen	7adfe4a1c1	fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=35#p21092 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7269 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	863065abc4	added user agent logging to access tracker git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7256 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	ed4371dcf3	enhanced navigation implementation and enhanced tag cloud computation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7252 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	ca738ac924	- added a tag cloud to search results (using the topics) - some refactoring of score classes - added default package for new classes add_ymark and delete_ymark git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7251 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
apfelmaennchen	beb65437d2	additional fix for the widget - now a second result page is loaded automatically in case of too little search results for the scroll event to trigger git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7245 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
apfelmaennchen	2bb0c9b503	Fix for search widget keyup event handling. ESC will close the widget window and RIGHT will load additional search results, especially when the scroll event won't work because of too few results. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7244 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	a59c885ee0	autocomplete and did-you-mean can now understand _all_ languages and can generate suggestions in all languages and character types git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7242 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b7acd92ce4	Auto-Suggestions for YaCy Search: - added a suggest servlet according to opensearch and firefox standard - integrated the suggest servlet into opensearch description file - integrated an autocomplete plugin for jquery - added an autocomplete addition to the yacy search windows showing autosuggest queries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7241 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	24f1cba7b2	performance hacks: - faster generation of index abstract compression during remote search - less synchronization in IO record reading - request index abstract generation only if necessary and faster time-out in remote search git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7239 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	45b1ab3d07	custom + generic skins: - added a generic skin which is filled with actual color assignment using a servlet - enabled css servlets - added a generic color scheme in configuration file - added configuration input in Customization/Appearance servlet - added a jquery color picker widget - placed color picked widget to input field of generic colour definition input fields git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7235 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
apfelmaennchen	dffa142529	Fix for author navigator in yacyui-portalsearch.js git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7219 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	aacf572a26	- enhancements for search speed - bug fixes in many classes including basic data structure classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7217 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
low012	f32bb5e51f	*) Changed image in Steering.html from linked image to embedded image because shutdown is so fast now, browsers can't load image before Yacy instance is gone already. Had to make image smaller since IE does not accept large Base64 encoded images. *) Decreases wait time in Steering.html before first check since *) HTML fixes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7165 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	ac1c08924e	more performance hacks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7149 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	2e75879504	fix for latest commit git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7145 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	6e4653cf50	remove DoS protection in remote search for intranet hosts git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7144 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	906c572621	- enhanced index create menu structure - clear search log caches each time a search is done git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7142 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	64860dc1bb	enhanced search event logging (to be used for further improvements) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7140 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	348dece62f	redesign of the SortStack and SortStore classes: created a WeakPriorityBlockingQueue as special implementation of a PriorityBlockingQueue with a weak object binding. - better abstraction of ordering technique - fixed some bugs related to result numbering (distinguish different counters in Queue) - fixed an ordering bug in post-ranking (ordering was decreased instead of increased) - reversed ordering numbering using a reversed ordering. The higher the ranking number the better (now). git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7128 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	9d080f387e	change in handling of the all-visible home path for storage in YaCy: the home path can now be distinguished between - data home; the path where the DATA directory is created - application home; everything else This will make it possible to store application data on Mac releases within the ~/Library/YaCy directory; a place where Mac applications write their data. Similar techniques will be possible for debian and windows. To use the new data path, YaCy can be started with -start <data path> or -gui <data path> git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3197ca42ed	preparations to move the HTCache into cora: - move the header framework classes to cora - move the ARC caching classes to cora - refactoring of code to call these classes from cora git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
mikeworks	aa663cda4d	ConfigUpdate_p.html and ConfigUpdate_p.java: Added check for downloaded releases and disabled buttons in case no new releases available de.lng: Updated German translation for additional String in ConfigUpdate_p.html XHTML 1.0 Strict fixes for all the other .html files yacy/ui/css/yacyui-portalsearch.css: added .hidden class that was removed from ConfigProperties_p.html Switchboard.java: Added URL for thread Remote Crawl Job and set URL for Remote Crawl URL Loader to null to fix empty href="" git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6996 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
mikeworks	0f248e7433	ConfigBasic.html: XHTML 1.0 Strict fixes DictionaryLoader_p.html: Filled <dt> elements to eliminate warnings Moved CSS for portalsearch field from header to metas template because it belongs in the <head>er yacyui-portalsearch.css Added #yacylivesearch form { display: inline; } because XHTML 1.0 Strict does not allow <form><input> and the added <p> would otherwise provoke a line break de.lng: Updates translations for added <dt> elements and deactivated statement in DictionaryLoader_p.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6994 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	b6fb239e74	redesign of parser interface: some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. Using this parser infrastructure it is not possible to parse document containers that contain individual files. An example is an rss file where the rss messages can be treated as individual documents with their own url reference. Another example is a surrogate file which was treated with a special operation outside of the parser infrastructure. This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface has now only one entry point and always returns a set of parsed documents. In case of single documents the parser method returns a set of one document. To be compliant with the new interface, the zip and tar parsers had also been completely redesigned. All parsers are now much simpler and cleaner in their structure. The switchboard operations had been extended to operate with sets of parsed files, not single parsed files. Additionally, parsing of jar manifest files had been added. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	bf25407fdd	added peer hash to internal RSSFeed. The hash will be used to display news activities in the network graphic. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6949 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	1557e0f2d0	- some refactoring for internal RSSFeed (protocol of all actions as seen on status page) - added dht-out to internal RSSFeed (you can see now messages about distributed indexes on status page) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6948 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	7bcfa033c9	more abstraction of the htcache when using the LoaderDispatcher: a cache access shall not be made directly to the cache any more, all loading attempts shall use the LoaderDispatcher. To control the usage of the cache, an enum instance from CrawlProfile.CacheStrategy shall be used. Some direct loading methods without the usage of a cache strategy have been removed. This also affects the verify-option of the yacysearch servlet. If there is a 'verify=false' now after this commit this does not necessarily mean that no snippets are generated. Instead, all snippets that can be retrieved using the cache only are presented. This still means that the search hit was not verified because the snippet was generated using the cache. If a cache-based generation of snippets is not possible, then verify=false means that the link is not rejected. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6936 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	090eae2cf5	fix for broken index abstract generation git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6928 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	1610c81dff	fixes for embedded search / search widget git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6911 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	11639aef35	- added new protocol loader for 'file'-type URLs - it is now possible to crawl the local file system with an intranet peer - redesign of URL handling - refactoring: created LGPLed package cora: 'content retrieval api' which may be used externally by other applications without yacy core elements because it has no dependencies to other parts of yacy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	e40542579e	fixes for wrong attribute name search->query (SRU) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6895 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	7b880d73d0	adjustments to granted query size git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6868 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	586bc4d920	- remove superfluous entries in remote search tracker handles - avoid concurrent access from same client this is a fix for http://forum.yacy-websuche.de/viewtopic.php?p=20045#p20045 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6866 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago

889 Commits (a26f1b3cd749d336963967481b70e50659ec61e7)