yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	8fc3679c66	using more pre-compiled patterns for split methods	12 years ago
Michael Peter Christen	5e182a566f	- added another enumeration method in kelondro data structure to get a more random access to data for the balancer - added random access inside the balancer	12 years ago
Michael Peter Christen	d6b82840f8	added a feature to find similarities in documents. This uses an enhanced version of the Nutch/Solr TextProfileSignature. As a result, a signature of the document is written to the solr search index. Additionally, each time a signature is written, it is checked whether the signature already exists in the index. If the signature does not exist, the document is marked as unique. The unique attribute can now be used to sort document lists and bring duplicates to the end of a result list. To enable this, a large portion of the search api to Solr had to be changed. This mainly affected caching of 'exists' searches to enhance the check for existing signatures and do this without actually doing a solr query. Because this is the first time a long number is used as a value in the Solr store, the value naming in the YaCySchema also had to be adapted and normalized. As a result, many files had to be changed.	12 years ago
Michael Peter Christen	f5ca5cea44	- added field options to all solr queries. This can be used to restrict the actual data which is fetched from solr. - used the new field options to reduce generic options like getting the load date or the count of search results. should increase overall speed - used the new field options to reduce overhead in the host browser during acquisition of links. - used the field options to make checking of links in crawler faster - if the crawler is paused, the crawl queue is not cleaned	13 years ago
Michael Peter Christen	832eead998	Merge remote-tracking branch 'regerdev/master'	13 years ago
Michael Peter Christen	570e42c4e3	fix for filetype navigator	13 years ago
reger	633fbe9188	Fix Metadata handling - language default on missing lang property to "uk" (fix set to nothing) - language set to TLD (added call to existing language calculation from TLD) - coordinate number exception on possible lat/lon content of "NaN,NaN" adjust Netbeans IDE classpath (for Solr/Lucene 4.0.0 jars)	13 years ago
Michael Peter Christen	c5f67a5d6d	fixed a problem with local search from solr results: now all results from solr are shown (again)	13 years ago
Michael Peter Christen	f8f05ecba7	- added a delete button in host browser to delete a complete subpath - removed storage of default collection name - default is now "user" - made stacking of crawl start points concurrently	13 years ago
Michael Peter Christen	a33e2742cb	- removed unnecessary synchronized and deadlock in crawler - removed problem with monitoring object on Balancer.wait - added missing user agent settings	13 years ago
orbiter	354f0d9acd	moved static method from ClusteredScoreMap to MapDataMining because it was not used in the ClusteredScoreMap class but only in MapDataMining	13 years ago
Michael Peter Christen	1baf498d59	- show more lines in online log - reverse order is default now	13 years ago
Michael Peter Christen	f2d0418218	because the new PngEncoder had a problem with the PixelGrabber which is caused by a JRE bug, the PixelGrabber had to be circumvented using a custom frame buffer which can be read without a PixelGrabber. This resulted in an ultra-fast and much less memory-consuming transformation. YaCy images are now generated really fast!	13 years ago
orbiter	276dd6452b	removed warnings	13 years ago
Michael Peter Christen	ce0e5b1e17	- more refactoring / private methods - fix for usage of custom solr field names	13 years ago
Michael Peter Christen	ccc3760a47	Refactoring and redesign of data architecture to make URIMetadataRow superfluous. The target is to make a solr document the core of YaCy documents, which would allow many conversions to be removed. On the way to this target the equivalence of URIMetadataRow and URIMetadataNode had to be removed to expose the usage of the old URIMetadataRow data structure. This refactoring already removes unnecessary conversions and should lower memory usage during indexing.	13 years ago
Michael Peter Christen	b400fc7b4d	fix for file parser problem	13 years ago
Michael Peter Christen	e5b3c172ff	removed hack which translated Solr documents to virtual RWI entries which were then mixed with remote RWIs. Now these Solr documents are fed into the result set as they appear during local and remote search. That makes the search much faster.	13 years ago
Michael Peter Christen	6017691522	added an exception catch	13 years ago
Michael Peter Christen	43f3345c90	- removed dependencies from URIMetadataRow and made direct access to URIMetadataNode which creates the opportunity to access Solr objects directly and use their information richness - lazy initialization of the URIMetadataNode object - should cause less computation and memory usage during search. - removed dead code	13 years ago
Michael Peter Christen	21fe8339b4	- enhanced generation of url objects - enhanced computation of link structure graphics - enhanced collection of data for link structures	13 years ago
Michael Peter Christen	613cf7da7f	enhancement to post argument parsing - possible fix to zero-filled parameter values	13 years ago
Michael Peter Christen	5f0ab25382	removed the option to prevent removal of & parts inside of the MultiProtocolURI during normalform computation because that should always be done and also be done during initialization of the MultiProtocolURI Object. The new normalform method takes only one argument which should be 'true' unless you know exactly what you are doing.	13 years ago
Michael Peter Christen	a06930662c	replaced some more .getBytes() with UTF8/ASCII.getBytes()	13 years ago
Michael Peter Christen	2f536cb54d	code cleanup: removed unused methods and made more methods and objects private	13 years ago
Michael Peter Christen	584663ae8c	- redesign of solr query construction - fix for solr boosts and location search - fix for number of search results in local search	13 years ago
Michael Peter Christen	a8167e6e5b	clean-up: removed unused methods in kelondro	13 years ago
Michael Peter Christen	24d2ee3c52	- better date ranking - more protection against NPE and time travel effects	13 years ago
Michael Peter Christen	ca313e404f	- if a "/date" modifier is used, the solr remote query applies an ordering by date (ascending) - added also some 'anti-timetravel' protection (check if date is in the future within any metadata date field)	13 years ago
Michael Peter Christen	24f4ca4d85	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
apfelmaennchen	116f429e35	fix for java.lang.RuntimeException: TableColumnIndex not available...	13 years ago
Michael Peter Christen	1533bfd63b	refactoring	13 years ago
Michael Peter Christen	872f83ebe0	refactoring	13 years ago
Michael Peter Christen	8219a445f3	refactoring	13 years ago
Michael Peter Christen	00c1c777fa	refactoring	13 years ago
orbiter	563d584420	removed more dependencies in cora from kelondro	13 years ago
Michael Peter Christen	e072632a54	no complaints about memory if the database is empty	13 years ago
Michael Peter Christen	e65cecc419	- updated lucene libraries to 3.6.1 - added lucene-grouping which enables faceted search; try this: http://localhost:8090/solr/select?q=*:*&start=0&rows=3&facet=true&facet.field=host_s	13 years ago
Michael Peter Christen	4d29f59a27	removed warnings	13 years ago
Michael Peter Christen	8c099d2106	Merge remote-tracking branch 'origin/master' Conflicts: htroot/api/ymarks/import_ymark.java source/de/anomic/data/ymark/YMarkEntry.java source/de/anomic/data/ymark/YMarkTables.java	13 years ago
apfelmaennchen	d31a632951	- added dmoz RDF dump importer - added indexing to Tables columns to support larger bookmark collections - added RDF output (HTTP) for public bookmarks at /YMarks.rdf - YMarkRDF also provides a Jena RDF Model as "internal" API - various other changes/fixes for YMarks (mainly backend)	13 years ago
Michael Peter Christen	d8425e6809	added collections to crawl monitor	13 years ago
Michael Peter Christen	528d6763fa	- added new solr fields: title_count_i, title_chars_val, title_words_val description_count_i, description_chars_val, description_words_val - added many asserts to ensure data type correctness from YaCy to Solr and vice versa - made many fixes according to new findings from these asserts (!)	13 years ago
Michael Peter Christen	316b5fe116	- added a solr type definition verifier - fixed type definition found by the verifier - added multivalue-string fields for solr with extension 'sxt' - added multivalue-integer fields for solr with extension 'val' - renamed some solr attributes from txt to sxt - changed solr query line to an explicit AND/OR structure - added a country code second level domain list to Domains class; with parser - added a host string parser to get domain class name, country-code second-level domain and subdomain out of it - removed old coordinate attributes	13 years ago
Michael Peter Christen	e8acd542b5	- added faceted drill-down for host and geolocation to solr queries - added a new geolocation field to index schema, the old values are migrated if possible	13 years ago
orbiter	2094df2e4e	- correct length computation for BStringObject (bugfix suggested by apfelmaennchen) - using ASCII for string conversion for Strings generated from Integer	13 years ago
Michael Peter Christen	4716546ef5	- reduced memory usage in index transmission using a transformation of Node to Row objects - removed peerDeparture in solr remote search in case that peer does not answer (this may be normal because it is allowed to switch this off)	13 years ago
Michael Peter Christen	06b0081fdc	fix for NPE during host navigation computation	13 years ago
orbiter	acb9f04e80	removed unused classes	13 years ago
Michael Peter Christen	755f5e76cf	removed strange assert statements and simplified code in metadata transformation	13 years ago

641 Commits (1faa045dc1e63852562f335600f565b22e1d22f8)