yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	24c9bb35f7	extended the Scheduler: introduced scheduled events - an event type (once, regular) can be selected - for this event type, a fixed time can be selected. This may be either directly after startup or at one of the full hours at a day (==25 options) The main point about this feature is the opportunity to start an action directly after startup. That makes it possible to create YaCy distributions which, after started at the first time, start to index parts of the intranet/internet by itself.	12 years ago
reger	ad71747525	fix: set default language to "en"	12 years ago
orbiter	712cc37c40	if maxFileSize < 0 then the file size limit is without limit.	12 years ago
Michael Peter Christen	8fc3679c66	using more pre-compile pattern for split methods	12 years ago
Michael Peter Christen	5e182a566f	- added another enumeration method in kelondro data structure to get a more random access to data for the balancer - added random access inside the balancer	12 years ago
Michael Peter Christen	d6b82840f8	added a feature to find similarities in documents. This uses an enhanced version of the Nutch/Solr TextProfileSignature. As a result, a signature of the document is written to the solr search index. Additionally for each time when a signature is written, it is checked if the signature exists already in the index. If the signature does not exist, the document is marked as unique. The unique attribute can now be used to sort document lists and bring duplicates to the end of a result list. To enable this, a large portion of the search api to Solr had to be changed. This affected mainly caching of 'exists' searches to enhance the check for existing signatures and do this without actually doing a solr query. Because here the first time a long number is used as value in the Solr store, also the value naming in the YaCySchema had to be adapted and normalized. This caused that many files had to be changed.	12 years ago
Michael Peter Christen	f5ca5cea44	- added field options to all solr queries. This can be used to restrict the actual data which is fetched from solr. - used the new field options to reduce generic options like getting the load date or the count of search results. should increase overall speed - used the new field options to reduce overhead in the host browser during acquisition of links. - used the field options to make checking of links in crawler faster - if the crawler is paused, the crawl queue is not cleaned	12 years ago
Michael Peter Christen	832eead998	Merge remote-tracking branch 'regerdev/master'	12 years ago
Michael Peter Christen	570e42c4e3	fix for filetype navigator	12 years ago
reger	633fbe9188	Fix Metadata handling - language default on missing lang property to "uk" (fix set to nothing) - language set to TLD (added call to existing language calculation from TLD) - coordinate number exception on possible lat/lon content of "NaN,NaN" adjust Netbeans IDE classpath (for Solr/Lucene 4.0.0 jars)	12 years ago
Michael Peter Christen	c5f67a5d6d	fixed a problem with local search from solr results: now all results from solr are shown (again)	12 years ago
Michael Peter Christen	f8f05ecba7	- added a delete button in host browser to delete a complete subpath - removed storage of default collection name - default is now "user" - made stacking of crawl start points concurrently	12 years ago
Michael Peter Christen	a33e2742cb	- removed unnecessary synchronized and deadlock in crawler - removed problem with monitoring object on Balancer.wait - added missing user agent settings	12 years ago
orbiter	354f0d9acd	moved static method from ClusteredScoreMap to MapDataMining because it was not used in the ClusteredScoreMap class but only in MapDataMining	12 years ago
Michael Peter Christen	1baf498d59	- show more lines in online log - reverse order is default now	12 years ago
Michael Peter Christen	f2d0418218	because the new PngEncoder had a problem with the PixelGrabber which is caused by a JRE bug, the PixelGrabber had to be circumvented using an own frame buffer which can be read without a PixelGrabber. This resulted in ultra-fast and much less memory-consuming transformation. YaCy images are now generated really fast!	12 years ago
orbiter	276dd6452b	removed warnings	12 years ago
Michael Peter Christen	ce0e5b1e17	- more refactoring / private methods - fix for usage of custom solr field names	12 years ago
Michael Peter Christen	ccc3760a47	Refactoring and redesign of data architecture to make URIMetadataRow superfluous. The target is to make a solr document as the core of YaCy documents which would cause that many conversions can be removed. On the way to this target the Equivalence of URIMetadataRow and URIMetadataNode had to be removed to expose the usage of the old URIMetadataRow data structure. This refactoring already removes unnecessary conversions and should make memory usage during indexing lower.	12 years ago
Michael Peter Christen	b400fc7b4d	fix for file parser problem	12 years ago
Michael Peter Christen	e5b3c172ff	removed hack which translated Solr documents to virtual RWI entries which had been then mixed with remote RWIs. Now these Solr documents are fed into the result set as they appear during local and remote search. That makes the search much faster.	12 years ago
Michael Peter Christen	6017691522	added an exception catch	12 years ago
Michael Peter Christen	43f3345c90	- removed dependencies from URIMetadataRow and made direct access to URIMetadataNode which creates the opportunity to access Solr objects directly and use their information richness - lazy initialization of the URIMetadataNode object - should cause less computation and memory usage during search. - removed dead code	12 years ago
Michael Peter Christen	21fe8339b4	- enhanced generation of url objects - enhanced computation of link structure graphics - enhanced collection of data for link structures	12 years ago
Michael Peter Christen	613cf7da7f	enhancement to post argument parsing - possible fix to zero-filled parameter values	13 years ago
Michael Peter Christen	5f0ab25382	removed the option to prevent removal of & parts inside of the MultiProtocolURI during normalform computation because that should always be done and also be done during initialization of the MultiProtocolURI Object. The new normalform method takes only one argument which should be 'true' unless you know exactly what you are doing.	13 years ago
Michael Peter Christen	a06930662c	replaced some more .getBytes() with UTF8/ASCII.getBytes()	13 years ago
Michael Peter Christen	2f536cb54d	code cleanup: removed unused methods and made more methods and objects private	13 years ago
Michael Peter Christen	584663ae8c	- redesign of solr query construction - fix for solr boosts and location search - fix for number of search results in local search	13 years ago
Michael Peter Christen	a8167e6e5b	clean-up: removed unused methods in kelondro	13 years ago
Michael Peter Christen	24d2ee3c52	- better date ranking - more protection against NPE and time travel effects	13 years ago
Michael Peter Christen	ca313e404f	- if a "/date" modifier is used, the solr remote query applies an ordering by date (ascending) - added also some 'anti-timetravel' protection (check if date is in the future within any metadata date field)	13 years ago
Michael Peter Christen	24f4ca4d85	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
apfelmaennchen	116f429e35	fix for java.lang.RuntimeException: TableColumnIndex not available...	13 years ago
Michael Peter Christen	1533bfd63b	refactoring	13 years ago
Michael Peter Christen	872f83ebe0	refactoring	13 years ago
Michael Peter Christen	8219a445f3	refactoring	13 years ago
Michael Peter Christen	00c1c777fa	refactoring	13 years ago
orbiter	563d584420	removed more dependencies in cora from kelondro	13 years ago
Michael Peter Christen	e072632a54	no complaints about memory if the database is empty	13 years ago
Michael Peter Christen	e65cecc419	- updated lucene libraries to 3.6.1 - added lucene-grouping which enables faceted search; try this: http://localhost:8090/solr/select?q=:&start=0&rows=3&facet=true&facet.field=host_s	13 years ago
Michael Peter Christen	4d29f59a27	removed warnings	13 years ago
Michael Peter Christen	8c099d2106	Merge remote-tracking branch 'origin/master' Conflicts: htroot/api/ymarks/import_ymark.java source/de/anomic/data/ymark/YMarkEntry.java source/de/anomic/data/ymark/YMarkTables.java	13 years ago
apfelmaennchen	d31a632951	- added dmoz RDF dump importer - added indexing to Tables columns to support larger bookmark collections - added RDF output (HTTP) for public bookmarks at /YMarks.rdf - YMarkRDF also provides a Jena RDF Model as "internal" API - various other changes/fixes for YMarks (mainly backend)	13 years ago
Michael Peter Christen	d8425e6809	added collections to crawl monitor	13 years ago
Michael Peter Christen	528d6763fa	- added new solr fields: title_count_i, title_chars_val, title_words_val description_count_i, description_chars_val, description_words_val - added many asserts to ensure data type correctness from YaCy to Solr and vice versa - made many fixes according to new findings from these asserts (!)	13 years ago
Michael Peter Christen	316b5fe116	- added a solr type definition verifier - fixed type definition found by the verifier - added multivalue-string fields for solr with extension 'sxt' - added multivalue-integer fields for solr with extension 'val' - renamed some solr attributes from txt to sxt - changed solr query line to an explicit AND/OR structure - added a country code second level domain list to Domains class; with parser - added a host string parser to get domain class name, country-code second-level domain and subdomain out of it - removed old coordinate attributes	13 years ago
Michael Peter Christen	e8acd542b5	- added faceted drill-down for host and geolocation to solr queries - added a new geolocation field to index schema, the old values are migrated if possible	13 years ago
orbiter	2094df2e4e	- correct length computation for BStringObject (bugfix suggested by apfelmaennchen) - using ASCII for string conversion for Strings generated from Integer	13 years ago
Michael Peter Christen	4716546ef5	- reduced memory usage in index transmission using a transformation of Node to Row objects - removed peerDeparture in solr remote search in case that peer does not answer (this may be normal because it is allowed to switch this off)	13 years ago
Michael Peter Christen	06b0081fdc	fix for NPE during host navigation computation	13 years ago
orbiter	acb9f04e80	removed unused classes	13 years ago
Michael Peter Christen	755f5e76cf	removed strange assert statements and simplified code in metadata transformation	13 years ago
orbiter	ee01c12e56	fixes for putDocument and putMetadata	13 years ago
Michael Peter Christen	f9fc5cfaba	better check for bad urls in url transmission	13 years ago
Michael Peter Christen	40c0856489	refactoring	13 years ago
Michael Peter Christen	9bece5ac5f	enhanced snippet fetch - removed a bug that caused documents to be parsed even if a solr text was available	13 years ago
Michael Peter Christen	395b78a0d8	using the solr search index to concurrently search within solr and the rwis during local search requests.	13 years ago
Michael Peter Christen	e5ef840f40	- renamed DoubleSolrConnector to MirrorSolrConnector and added a hit/miss/document cache to the MirrorSolrConnector. - more abstraction to SolrDocument in Connector interface - bugfixes in Solr field reader	13 years ago
Michael Peter Christen	94a334f128	another fix to the Solr metadata reading process and to the shutdown process	13 years ago
Michael Peter Christen	b51df6c7e8	- added coordinate storage in solr schema - fixed shutdown process - fixed some solr-to-metadata reading - added a large number of metadata attributes in ViewFile.html	13 years ago
Michael Peter Christen	f9c0e6e950	- Implemented and integrated the URIMetadataNode object which is a metadata representation from the solr index. This shall replace metadata from the built-in database in the future. - added the Solr-driven metadata into the search index of YaCy which makes it now possible to run YaCy without the old metadata index. This is a major step forward to a full migration to Solr.	13 years ago
Michael Peter Christen	dcc72799c4	better abstraction for result writers using controlled vocabularies and URIRefs	13 years ago
Michael Peter Christen	a12f693ec9	added two response writers for embedded solr interface: a rss/opensearch writer and an enhanced solr xml writer. The enhanced solr writer has less configuration overhead than the original writer and should be slightly faster. The rss/opensearch writer is at this time slightly incomplete compared with the already existing rss search result from YaCy and also snippets are missing at this time. To test the new interface, open for example: http://localhost:8090/solr/select?wt=rss&q=olympia The wt-codes for the new result writers are: wt=rss for opensearch, wt=exml for the enhanced solr xml writer. Additionally, the SRU search parameters had been added to the solr interface which can now also be used for a normal solr/xml search.	13 years ago
sixcooler	f32aa9a49c	prevent merge of blobs that can't be handled in memory	13 years ago
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	13 years ago
Michael Peter Christen	e432bb9cd9	better calculation of possible saving in HeapReader index data structure	13 years ago
Michael Peter Christen	9549984c65	documentation/comments	13 years ago
Michael Peter Christen	826967513b	changed options in IndexFederated_p to switch on/off parts of the index individually. The settings are experimental and the values of the settings will be overwritten when an index migration from urldb to solr starts.	13 years ago
orbiter	69e743d9e3	- more abstraction for the RWI index as preparation for solr integration - added options in search index to switch parts of the index on or off	13 years ago
Michael Peter Christen	f0a079ac9f	allow larger log entries	13 years ago
Michael Peter Christen	784a4abb18	enhancement in internal data organization which should generate less synchronizations in database access	13 years ago
Michael Peter Christen	f78ce93a80	collection of speed and memory saving hacks	13 years ago
orbiter	a196f24f60	prevent enqueueing of non-loggeable logging entries	13 years ago
orbiter	482afed07c	reduced logging overhead (a bit)	13 years ago
orbiter	e76159040b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
orbiter	bbfa497a3c	replaced more size() > 0 by !isEmpty()	13 years ago
Michael Peter Christen	83da68c4c1	fixed a memory leak inside the logger which appeared if the log was written faster than the logger is able to print this out to its out stream. A very large collection of unwritten log outputs had been seen during strong crawling. The new ArrayBlockingQueue is limited to prevent this case.	13 years ago
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	13 years ago
Michael Peter Christen	1addbc792c	use less memory for md5 cache	13 years ago
Michael Peter Christen	f32de94723	more logging	13 years ago
Michael Peter Christen	8efc1c1078	- fixed a memory leak (or bad usage) during parsing/snippet fetch - more logging for errors	13 years ago
Michael Peter Christen	b0c408788b	made class methods static where possible	13 years ago
Michael Peter Christen	5bd3c90907	- removed unnecessary semicolons - added default case for switch	13 years ago
Michael Peter Christen	132afaf687	removed unaccessible code	13 years ago
Michael Peter Christen	7c1ba99755	removed more unused method parameters	13 years ago
Michael Peter Christen	83701a1b4c	removed unused ImageReference package	13 years ago
Michael Peter Christen	0301aba1e9	removed unused method parameters	13 years ago
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	13 years ago
Michael Peter Christen	ea10766bfd	cleaned unnecessary nested code	13 years ago
Michael Peter Christen	1481037820	replaced non-generic array with collection	13 years ago
Michael Peter Christen	613b45f604	- better data structures in secondary search - fixed a big memory leak in secondary search	13 years ago
Michael Peter Christen	8a82609360	- smaller caches to save memory - close cloneable iterators to free memory	13 years ago
Michael Peter Christen	ce8d4b87d9	fixes for new eclipse 'Juno' warning 'Resource leak'.	13 years ago
Michael Peter Christen	0c345d1559	giving threads name so its easier to see whats happening during debugging and within a thread dump	13 years ago
Michael Peter Christen	b9d42fd9c8	using com.google.common.io.Files instead of homebrew methods	13 years ago
Michael Peter Christen	de3ef8ad73	removed unimportant warnings	13 years ago
Michael Peter Christen	9264d8b4af	removed old navigation practice using subject tags in favor of triplestore-tags	13 years ago
Michael Peter Christen	61bb52d55c	- using http://purl.org/dc/terms/references to refer from an auto-annotated document to a 'pseudo-linked' document which has an url created with an object-prefix as defined in the vocabulary file	13 years ago
Michael Peter Christen	8b53771db2	changed behavior of navigation processing: - vocabulary annotation is not done any more into the metadata of urldb - vocabularies are written into the jena triplestore using a rdf vocabulary - vocabularies for rdf tripel must be updated; refactoring done - with the new navigation tags in the triplestore a faster pre-urldb-lookup is possible: navigation is processed now within the RWI during pre-ranking retrieval - added also a Owl vocabulary stub to add the plain-text url to the triplestore using the owl:sameas predicate	13 years ago
Michael Peter Christen	bef823c247	close the reader if finished	13 years ago
cominch	9cbfc1a1c0	augmentedProxy, which forwards every proxy request to a rewrite engine to customize existing webpages. originally implemented by Florian Richter. Conflicts: source/de/anomic/http/server/HTTPDProxyHandler.java	13 years ago
Michael Peter Christen	3b992e6b00	using utf8 String compression in Webstructure database	13 years ago
Michael Peter Christen	2280a7b276	- changed initialization order to prefer allocation of memory for table files first - bugfixes in memory amount calculation	13 years ago
Michael Peter Christen	0746308bc2	only the metadata tables shall be able to use the tail cache	13 years ago
Michael Peter Christen	7ec9bef0c3	fix for OOM	13 years ago
Michael Peter Christen	41c02cb10e	- less restrictions for usage of Table RAM copy - new limit to use the table copy (instead of flag): 400MB available. If less is available, then a copy is never used. If more is available, then it can be used if there is a remaining space of at least 200MB - flush caches more often: flush the Digest cache	13 years ago
Michael Peter Christen	b8f56a9803	npe bugfix	13 years ago
Michael Peter Christen	ba10caf89a	lazy initialization of database tables	13 years ago
Michael Peter Christen	701b9a28a0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: htroot/PerformanceMemory_p.java	13 years ago
Michael Peter Christen	10c9c17d51	fixed handlemap spread factor and null iterator handling	13 years ago
Michael Peter Christen	b0095c8d3c	flush the compressor cache when a cleanup is done	13 years ago
Michael Peter Christen	96e9d77270	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/cora/sorting/WeakPriorityBlockingQueue.java	13 years ago
Michael Peter Christen	00f2df1120	a variety of possible memory leak fixes	13 years ago
Michael Peter Christen	3dd8376825	added automatic cleaning of cache if metadata and file database size is not equal. It might happen that these data is different because one of that caches is cleaned after a while or when it is too big. The metadata is then not cleaned, but now wiped after a checkup process at every application start. This should cause a bit less memory usage.	13 years ago
Michael Peter Christen	6bb07afcc3	accept also files with other file prefix; used to read 'foreign' cache files	13 years ago
Michael Peter Christen	461a0ce052	removed warnings	13 years ago
Michael Peter Christen	407fdf6968	more bug fixes and performance hacks for search process	13 years ago
Michael Peter Christen	a1fe65b115	performance hacks	13 years ago
Michael Peter Christen	e0d8643226	- performance hacks - added log warnings in case that search processes run into time-out situations - better concurrency for Integer formatter (used a non-synchronized formatter before) - bugfix for search termination (a poison pill was missing) - added timeout parameters for search (again) -> target is, that they are never reached.	13 years ago
Michael Peter Christen	9b4c699526	enhanced location search: - search requests are now made using a map boundary - search results are only computed for the map boundary - the number of results is adapted to the results in the visible range - added a double-buffering for the search result markers - added a search query option for the search results: /radius/<lat>/<lon>/<radius>	13 years ago
Michael Peter Christen	1f48d1528b	performance hacks	13 years ago
Michael Peter Christen	10da7335ea	performance hack: use a hash cache for all hashes that are computed by a byte array. If this hash is used in a HashMap (which is very often the case) then this hack eliminates a lot of re-computations of the same hash.	13 years ago
Michael Peter Christen	7c1feefb28	introduced a default 10 second time-out in rwi normalization time during search process to prevent endless deadlocks after a very long running search	13 years ago
Michael Peter Christen	8d997d55b6	better logging	13 years ago
Michael Peter Christen	43c2c6e588	better logging	13 years ago
Michael Peter Christen	c15fcde1c8	add-on to latest commit	13 years ago
Michael Peter Christen	cf47d94888	performance hack to parse numbers inside of substrings without actually generating a substring. This avoids the allocation of a String object each time a substring is parsed. Should affect CPU load during RWI transmission.	13 years ago
Michael Peter Christen	7e0ddbd275	added a "fromCache" flag in Response object to omit one cache.has() check during snippet generation. This should cause less blockings	13 years ago
Michael Peter Christen	c6a09eab0b	synchronization needed	13 years ago
reger	6696cb1313	bugfix: lookup of peernames no result for active peer in page IndexControlRWIs_p.html -> Transfer RWI to other Peer. SeedDB.lookupByName searches for lowercase peerNames, while MapColumnIndex.getIndex uses peername as is in the keyset. Changed the index init to insert lowercase peer names as key	13 years ago
Michael Peter Christen	f294f2e295	bugfix to http://bugs.yacy.net/view.php?id=181 tried to make a bit less 'noise' to dns server also included: less processes in snippet fetch to reduce load during search on small computers	13 years ago
Michael Peter Christen	acf8d521a2	fix for http://bugs.yacy.net/view.php?id=126	13 years ago
Michael Peter Christen	fa735f4f04	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	3e1bc9477f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
Michael Peter Christen	6f8a2fef1f	small speed enhancement using a column factory	13 years ago
Roland 'Quix0r' Haeder	d10627d591	More sync in close() methods Conflicts: source/net/yacy/kelondro/logging/GuiHandler.java source/net/yacy/kelondro/workflow/InstantBusyThread.java	13 years ago
Roland 'Quix0r' Haeder	fbb946f913	Made a method static (Eclipse suggested it), removed unused import, pk=null check does now output a warning in logfile	13 years ago
Michael Peter Christen	89142d1e8d	removed (not all) warnings	13 years ago
Michael Peter Christen	15db703808	added missing serialization to remove all warnings	13 years ago
Michael Peter Christen	1795a7325b	made HandleSet serializable	13 years ago
Roland 'Quix0r' Haeder	a093ccf5eb	Now used synchronization in all close() methods to make sure all objects are 'closed' in an ordered way Conflicts: source/de/anomic/http/server/ChunkedInputStream.java source/de/anomic/http/server/ChunkedOutputStream.java source/de/anomic/http/server/ContentLengthInputStream.java source/net/yacy/cora/protocol/Domains.java source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java source/net/yacy/document/content/dao/PhpBB3Dao.java source/net/yacy/document/parser/html/AbstractTransformer.java source/net/yacy/kelondro/blob/BEncodedHeap.java source/net/yacy/kelondro/blob/HeapReader.java source/net/yacy/kelondro/index/RAMIndexCluster.java source/net/yacy/kelondro/io/ByteCountInputStream.java source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java source/net/yacy/kelondro/table/SQLTable.java	13 years ago
Michael Peter Christen	0cf3d36eae	more tolerance in case of corrupted file	13 years ago
Michael Peter Christen	34f4225d7e	less 'wellformed' calls without asserts	13 years ago
Michael Peter Christen	ba6aaabc51	refactoring + parser bugfixes	13 years ago
Michael Christen	e32055aa15	added stub classes for - a new database for url reference data ('seen links') - a new database extending the references to the full url metadata attributes set which shall replace the old metadata database if it is finished - migration help classes stub to use old and new metadata databases simultaneously	13 years ago
Michael Peter Christen	2fc8ecee36	ConcurrentLinkedQueue has a VERY long return time on the .size() method. See http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html and the following test program: public class QueueLengthTimeTest { public static long countTest(Queue<Integer> q, int c) { long t = System.currentTimeMillis(); for (int i = 0; i < c; i++) { q.add(q.size()); } return System.currentTimeMillis() - t; } public static void main(String[] args) { int c = 1; for (int i = 0; i < 100; i++) { Runtime.getRuntime().gc(); long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c); Runtime.getRuntime().gc(); long t2 = countTest(new LinkedBlockingQueue<Integer>(), c); Runtime.getRuntime().gc(); long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c); System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1 + ", LinkedBlockingQueue = " + t2 + ", ConcurrentLinkedQueue = " + t3); c = c * 2; } } }	13 years ago
Michael Peter Christen	213c8d97f2	use less processes in process pool	13 years ago
Michael Peter Christen	c639248c23	protection against strange answers from remote peers during search	13 years ago
Michael Peter Christen	1cd711d005	added classes for citation references (for new citation ranking)	13 years ago
Michael Peter Christen	e0f1e7d904	added new citation reference data structure that shall be used for a citation ranking	13 years ago
Michael Peter Christen	e18a4f6b74	more tolerant merge iterator	13 years ago
Michael Peter Christen	7e4e3fe5b6	free some memory after parsing html	13 years ago
Michael Peter Christen	4540174fe0	memory hacks	13 years ago
Michael Peter Christen	b4409cc803	small redesign of blob column index and usage	13 years ago
Michael Peter Christen	d5c1f2746e	performance hack	13 years ago
Michael Peter Christen	803963aebd	performance hack: better space grow in CharBuffer (speeds up html parser)	13 years ago
Michael Peter Christen	e2f8f263e8	changed storage of search words: keep order	13 years ago
Michael Peter Christen	0b67a0a5d8	added a column index for tables in blob files. This is heavily used during receiving of DHT submissions and when answering remote search requests. Both events together may have caused IO-deadlocking and this commit shall fix that.	13 years ago
Michael Peter Christen	e3bb73c3d6	serialized some database access methods	13 years ago
Michael Peter Christen	2ea585d616	fix for host navigator	13 years ago
Michael Peter Christen	ef78f22ee1	performance hack	13 years ago
Michael Peter Christen	a02fdf8625	better error messages	13 years ago
Michael Peter Christen	c6ba44468e	timeout = 5000 instead 3000	13 years ago
low012	8776b84c10	*) small fix to make password change function of reconfigureYACY.sh work again	13 years ago
Michael Peter Christen	4901cee3cc	suppress auto-tagged subject entries when sending out or receiving metadata from other peers	13 years ago
sixcooler	985b78cf89	correct 'avaiable()' to use max of young / eden	13 years ago
sixcooler	4da8746275	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	13 years ago
sixcooler	c9aaa9e00a	respect non-reserved Memory in GenerationMemoryStrategy and enable it again	13 years ago
Michael Peter Christen	37f2d1b3e9	replaced Thread initialization with ExecutorService pool for delete method. This is much faster and produces less blocking when using the Compressor class which is used by the HTCache. I.e. picture search is much faster now.	13 years ago
Michael Peter Christen	0d6176804b	emergency disabling of GenerationMemoryStrategy because of non-working available-method	13 years ago
Michael Peter Christen	87f0210480	enriched log output to find NPE in HeapReader	13 years ago
Michael Peter Christen	254adea51c	small fixes	13 years ago
Michael Peter Christen	49be60a7c8	WorkflowProcess is forced to make small pauses if shortMemoryStatus is reached.	13 years ago
Michael Peter Christen	b7bb84c0bb	set a limit to CharBuffer object size to fight against bad/too large content	13 years ago
Marek Otahal	72adbeae90	!Important: move from Hashtable to HashMap Hashtable is an obsolete collection v1, now since v2 offers HashMap with same or better functionality. Please review, almost all code was already moved, so only a few changes. That is not the issue, but I found notices that some (ugly big) helper classes had to be created in past to compensate missing Hashtable's functionality. I'd like input if we can remove some of them. look for //FIX: if these commits Signed-off-by: Marek Otahal <markotahal@gmail.com>	13 years ago
Marek Otahal	f75b5e40e0	little fix in copy() Signed-off-by: Marek Otahal <markotahal@gmail.com>	13 years ago
Michael Christen	216a287a85	Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r Conflicts: source/de/anomic/crawler/CrawlQueues.java	13 years ago
Michael Christen	20962a4ed7	added metadata node stub for metadata from blobs	13 years ago
Michael Christen	575dbbaa93	enhancements in Blob retrieval: try to use less CPU resources by testing a blob first that most certainly has wanted entries.	13 years ago
Roland 'Quix0r' Haeder	6d4e08ed06	Rewrote filesize() to (hopefully) avoid a NPE, rewrote Blacklist class to concurrent classes to avoid a CME	13 years ago
Roland 'Quix0r' Haeder	fa08ed5ae5	Fixed a lot CHMOD rights (no need for execute flag on .java/.html) and introduced local/remote crawl size ratio based check	13 years ago
Michael Christen	9e5894c784	Removed handling of components objects for URIMetadataRows. This is a preparation to replace this rows with nodes from the node store.	13 years ago
Michael Christen	c04bfaa51b	refactoring	13 years ago
Michael Peter Christen	613ab6a69d	added BEncodedHeapBag and BEncodedHeapShard which are storage container for a new metadata store. An abstraction of the content for this storage is defined with MapStore. A MapStore is an abstraction of a RDF Node store.	13 years ago
Michael Christen	6fecd0db88	one more performance hack to prevent costly md5 computation	13 years ago
Michael Christen	e13441b069	better digest pool size (smaller by default but unlimited)	13 years ago
Michael Christen	1f4afb4dc0	performance hacks	13 years ago
Michael Christen	e9dc99fe15	added rules to set specific RWIs as private RWIs which are not transmitted to remote peers. This will be used for private index copies and phonetic indexes.	13 years ago
Michael Peter Christen	0bcef2d156	added feature as requested in http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461 The search can now be configured with a non-display host list. The search will always exclude the given list of hosts unless they are requested directly using the host navigation	13 years ago
Michael Christen	204c29f010	small bugfixes for search result display and cache display	13 years ago
Michael Christen	078fcde0dd	bad initialization	13 years ago
Michael Christen	14e45e90fd	patch for a bug that I don't understand by now.	13 years ago
Michael Christen	86b3385847	fixed a deadlock during secondary remote search	13 years ago
Michael Christen	404758698a	less io operations	13 years ago
Michael Christen	044f83feed	added some pauses into the search process which shall produce better-ranked search results. without that pauses the result page will only contain links from the peer that answers first which is not a good average picture of all the peers that provided results	13 years ago
sixcooler	448656087a	probably fix for http://bugs.yacy.net/view.php?id=94 (don't know how to force this exception)	13 years ago
Michael Christen	d35bdc2df6	removed npe	13 years ago
Michael Christen	e7e429705a	- less automatic indexing after a search (needs to reset the default crawl profiles) - fix for concurrency problem in storage of serverSwitch Properties - markup update	13 years ago
Michael Christen	9cd469e6d6	added pull request from als plus an NPE fix	13 years ago

794 Commits (f901e7d3cf5e9b963cb849087234621f3f8cecd5)