yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	f5ca5cea44	- added field options to all solr queries. This can be used to restrict the actual data which is fetched from solr. - used the new field options to reduce generic options like getting the load date or the count of search results. should increase overall speed - used the new field options to reduce overhead in the host browser during acquisition of links. - used the field options to make checking of links in crawler faster - if the crawler is paused, the crawl queue is not cleaned	12 years ago
cominch	2bb8f045cc	content control: use up-to-date definitions	12 years ago
cominch	d2a94cc55e	refactor package	12 years ago
cominch	21df1ad9e0	update and generalization of the SMW import and content control routines	12 years ago
Michael Peter Christen	71ed8e5e07	bugfixes for crawler	12 years ago
Michael Peter Christen	158732af37	automatically delete entries from the crawl profile list if crawl is terminated.	12 years ago
Michael Peter Christen	15d1460b40	added information about the reason for pausing crawls	12 years ago
Michael Peter Christen	791e1dcfdf	when a new crawl is started, delete all entries about error-urls for crawl-start domains	12 years ago
Michael Peter Christen	8fb370d9f8	renovated the way search results are counted. should be correct now...	12 years ago
Michael Peter Christen	6629e37685	tried to clean up the search process mess	12 years ago
Michael Peter Christen	f8f05ecba7	- added a delete button in host browser to delete a complete subpath - removed storage of default collection name - default is now "user" - made stacking of crawl start points concurrent	12 years ago
Michael Peter Christen	4a14122ba7	in case a crawl profile has a collection assigned, use the collection to show a name in the web interface. This should prevent overly long names from making the interface unusable.	12 years ago
Michael Peter Christen	0833937c1c	better balancing and due-time computation also for no-delay intranet hosts	12 years ago
Michael Peter Christen	c25d7bcb80	- added concurrency for robots.txt loading - changed data model for domain counter	12 years ago
Michael Peter Christen	2d9e577ad0	replaced the custom robots.txt loader by the standard http loader	12 years ago
Michael Peter Christen	799d71bc67	enhanced solr caching: - increased cache size which is needed for longer solr commit time - speed hacks on cache write code	12 years ago
Michael Peter Christen	a33e2742cb	- removed unnecessary synchronized and deadlock in crawler - removed problem with monitoring object on Balancer.wait - added missing user agent settings	12 years ago
sixcooler	47ae7e322e	smaller dhtDispatcher.cloudSize @Orbiter: we talked about this a while ago - please revert if I'm wrong	12 years ago
Michael Peter Christen	ccc3760a47	Refactoring and redesign of data architecture to make URIMetadataRow superfluous. The target is to make a solr document the core of YaCy documents, which would allow many conversions to be removed. On the way to this target the equivalence of URIMetadataRow and URIMetadataNode had to be removed to expose the usage of the old URIMetadataRow data structure. This refactoring already removes unnecessary conversions and should lower memory usage during indexing.	12 years ago
Michael Peter Christen	e5b3c172ff	removed hack which translated Solr documents to virtual RWI entries which were then mixed with remote RWIs. Now these Solr documents are fed into the result set as they appear during local and remote search. That makes the search much faster.	12 years ago
Michael Peter Christen	43f3345c90	- removed dependencies from URIMetadataRow and made direct access to URIMetadataNode which creates the opportunity to access Solr objects directly and use their information richness - lazy initialization of the URIMetadataNode object - should cause less computation and memory usage during search. - removed dead code	12 years ago
Michael Peter Christen	21fe8339b4	- enhanced generation of url objects - enhanced computation of link structure graphics - enhanced collection of data for link structures	12 years ago
Michael Peter Christen	5f0ab25382	removed the option to prevent removal of & parts inside of the MultiProtocolURI during normalform computation because that should always be done and also be done during initialization of the MultiProtocolURI Object. The new normalform method takes only one argument which should be 'true' unless you know exactly what you are doing.	12 years ago
Michael Peter Christen	bd769de604	since the solr index is now used for all pages that are indexed locally, there is no need for the RWI index if the index is not transferred to another peer. Therefore the creation of RWI index data is now suppressed if DHT is disabled. This applies to all intranet and portal mode configurations, but not to public robinson modes. A robinson may switch back to public mode and then transmit its data. That means if someone never wants to switch to DHT mode, it would be more appropriate to choose the portal mode.	12 years ago
Michael Peter Christen	4b5e0c1500	added an url rewriter which can be used to remove session ids from urls	12 years ago
Michael Peter Christen	76d218fbef	fixes to crawl profiles	12 years ago
sof	5cb244b79b	Merge remote branch 'origin/master'	13 years ago
apfelmaennchen	88b062210c	Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based on the jaudiotagger library. The parser is disabled by default as it needs to store temporary files for non file:// protocols, which might be disliked. For your local MP3-collection it loads nicely Artist, Title, Album etc. from the audio files meta data.	13 years ago
orbiter	3190347814	added a synonyms_t field to solr and a process to read synonym files. This can be used to add another stemming to solr using stemming files that are expressed as synonyms for grammatical alternatives. The synonym/stemming files must have the following form: - each line is a comma-separated list of synonyms - the list of synonyms may be enclosed with {} (like the GSA synonyms file) - the file may contain comments which are lines starting with a '#' The synonym file(s) must be placed in DATA/DICTIONARIES/synonyms/ and are activated by default whenever a synonym file is in place. Then, for each word that is found in a document all synonyms are added to a long text field which is stored into synonyms_t. Processes using the synonyms must query with that field as optional matcher.	13 years ago
Michael Peter Christen	f45f7fc12e	added new Host Browser to main menu: this new search interface is something completely new for search, but completely common on desktops: browse a web space like one would browse a file system in a file browser. The file listing is created using the search index and a faceted restriction to specific domains.	13 years ago
Michael Peter Christen	8556a3d521	extended solr connector with a method to retrieve a single facet.	13 years ago
Michael Peter Christen	23f68f2a69	force usage of default faceting mechanisms for search	13 years ago
Michael Peter Christen	a4214694df	We assert that no other metadata storage than solr is used now. Therefore a property like solrConnected() must be true all the time. Removal of this method causes removal of all write operations to the old metadata index.	13 years ago
Michael Peter Christen	1533bfd63b	refactoring	13 years ago
Michael Peter Christen	872f83ebe0	refactoring	13 years ago
Michael Peter Christen	8219a445f3	refactoring	13 years ago
Michael Peter Christen	00c1c777fa	refactoring	13 years ago
orbiter	63762d8f89	removed kelondro dependencies from cora	13 years ago
orbiter	60b1e23f05	added new crawl options: - indexUrlMustMatch and indexUrlMustNotMatch which can be used to select loaded pages for indexing. Default patterns are set so that all loaded pages are also indexed (as before), but when doing an expert crawl start, the user may select only specific urls to be indexed. - crawlerNoDepthLimitMatch is a new pattern that can be used to remove the crawl depth limitation. This filter is a never-match by default (so the depth limit is used), but the user can select paths which will be loaded completely even if the crawl depth is reached.	13 years ago
Michael Peter Christen	6ec02deec6	added new crawl attributes in crawl profile (not active yet)	13 years ago
orbiter	a55e77a115	added twitter search heuristic	13 years ago
Michael Peter Christen	b2b516cc3e	added a collection attribute to crawls and searches: - a solr field collection_sxt can be used to store a set of crawl tags - when this field is activated, a crawl tag can be assigned when crawls are started - the content of the collection field can be comma-separated, all of them are assigned to the documents when they are indexed as result of such a crawl start - a search result can be drilled down to a specific collection; this is currently only available in the solr interface and also in the gsa interface using the 'site' option - this adds a mandatory field for gsa queries (the google api demands that field all the time)	13 years ago
Michael Peter Christen	31d4d38804	- extended the solr interface by a references-by-word-count method - reduced danger that a non-existing RWI database causes NPEs - added Solr queries to did-you-mean: this makes it possible that our did-you-mean algorithm works together with only Solr and without RWIs	13 years ago
cominch	dc468dad01	add content control features for custom filter lists	13 years ago
reger	65d49df865	security fix: clear automatic password only if adminAccountForLocalhost=false to prevent remote access to protected pages after restart. if adminAccountForLocalhost=true leave automatic password unchanged so access from localhost is granted but remote access is prevented from the 1st second.	13 years ago
Michael Peter Christen	48a82bc705	log queries anonymously from gsa+solr requests	13 years ago
Michael Peter Christen	0cab06c47c	refactoring	13 years ago
Michael Peter Christen	06a78eecb7	code simplification	13 years ago
Michael Peter Christen	18f989dfb1	- refactoring (load -> getMetadata) - added getDocument to retrieve Solr documents which shall replace getMetadata	13 years ago
Michael Peter Christen	23226676c6	FOR THE BRAVE.. this is a forced migration to solr which is now ready for production as a replacement of the metadata-db. This intermediate release 1.041 will switch on the previously optional solr index and the old metadata-db will still work as it did before. Solr+metadata are accessed in mixed mode, no migration is done yet. If this does not cause a catastrophe by the end of the weekend, we will do a YaCy 1.1 main release containing this as default.	13 years ago

149 Commits (f5ca5cea447b08f582b8bb44bcb33973d16fe7f5)
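Commit 3190347814 describes a synonym/stemming file format: each line is a comma-separated list of synonyms, optionally enclosed in {}, with lines starting with '#' treated as comments; for each word found in a document, all its synonyms go into one long text field (synonyms_t). The following is a minimal Python sketch of that format, not YaCy's actual (Java) code. Skipping blank lines, lower-casing lookup keys, excluding a word from its own synonym list, and the helper names parse_synonym_lines, build_synonym_index and synonyms_field are all assumptions made for illustration.

```python
def parse_synonym_lines(lines):
    """Parse synonym-file lines into groups of synonyms.

    Each non-comment line is a comma-separated list, optionally wrapped in {}.
    Assumption: blank lines are skipped as well.
    """
    groups = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment or (assumed) blank line
        if line.startswith("{") and line.endswith("}"):
            line = line[1:-1]  # GSA-style braces around the list
        words = [w.strip() for w in line.split(",") if w.strip()]
        if words:
            groups.append(words)
    return groups


def build_synonym_index(groups):
    """Map each word (lower-cased, an assumption) to the other words in its group."""
    index = {}
    for group in groups:
        for word in group:
            others = (w for w in group if w.lower() != word.lower())
            index.setdefault(word.lower(), set()).update(others)
    return index


def synonyms_field(text, index):
    """Collect the synonyms of every word in text into one string,
    in the spirit of the synonyms_t field (hypothetical helper)."""
    found, seen = [], set()
    for word in text.lower().split():
        for syn in sorted(index.get(word, ())):
            if syn not in seen:
                seen.add(syn)
                found.append(syn)
    return " ".join(found)
```

Per the commit message, a query would then use such a field only as an optional matcher, so documents still match on their original words.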