yacy_search_server

Commit Graph

Author	SHA1	Message	Date
luccioman	4dd9c0d5d9	Merge from main repository	9 years ago
reger	e37a4f0b3d	prevent metadata records in index w/o valid url by throwing MalformedURL exception on URIMetadataNode creation	9 years ago
reger	41c4eade51	extract modification date from vCard (vcfParser)	9 years ago
reger	8768896975	extract lastmodified from openoffice doc set lastmod date in office document parsers	9 years ago
Michael Peter Christen	c40c302748	when many crawl queues are generated, this NPE can occur; probably caused as concurrency issue: W 2015/09/05 14:09:10 ConcurrentLog java.lang.NullPointerException java.lang.NullPointerException at java.util.TreeMap.rotateRight(TreeMap.java:2239) at java.util.TreeMap.fixAfterInsertion(TreeMap.java:2271) at java.util.TreeMap.put(TreeMap.java:582) at net.yacy.kelondro.table.Table.<init>(Table.java:235) at net.yacy.crawler.HostQueue.openStack(HostQueue.java:229) at net.yacy.crawler.HostQueue.getStack(HostQueue.java:204) at net.yacy.crawler.HostQueue.push(HostQueue.java:397) at net.yacy.crawler.HostBalancer.push(HostBalancer.java:237) at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:184) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:355) at net.yacy.crawler.CrawlStacker.job(CrawlStacker.java:134) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:101) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)	9 years ago
reger	367fe388b9	fix exception throw after sendError in DefaultServlet - reduce debug exception logs in crawler	9 years ago
luccioman	9752bd5f88	Added utils to help translation without launching full YaCy application : - translate all source files with a locale - list all non translated files with a locale	9 years ago
luccioman	2f0f0180e2	Added a function to list files recursively.	9 years ago
luccioman	7e4c1d2282	Translator refactoring : - deleted useless new StringBuilder allocation - use of a new reusable FileNameFilter - added javadoc	9 years ago
reger	802ccaead6	fix init of error cache, use latest faildates => load_date_dt	9 years ago
reger	dba7f15073	apply same size constraint on result image from doc as for linked images, see `19f1308bf0`	9 years ago
reger	4cf875336c	complete TODO: getFileExtension handle dot in query part + testcase	9 years ago
sixcooler	87e4abe393	fight the fieldcache by using DocValues: in Solr-5.x the fieldcache has moved and was not cleared anymore. This results in a huge fieldcache. (http://lucene.apache.org/#highlights-of-the-lucene-release-include https://issues.apache.org/jira/browse/LUCENE-5666) Here I try to use DocValues where it is possible. For this I used the Api-Scheme as new basis for the Solr-Schema. This needs at least a complete optimization of the Solr-Index to get a smaller FieldCache. Everything that is indexed with these settings will not use the FieldCache at all.	9 years ago
reger	eaf0e8ff2c	start recording/indexing pixel size for image document as for linked images	9 years ago
reger	c33229fc0c	check mime prior to ext for metadata modification for images	9 years ago
reger	19f1308bf0	enforce the result images limit to > 16x16px for linked images http://mantis.tokeek.de/view.php?id=594	9 years ago
reger	0e4ba0360b	fix NPE on .yacyh result url of disconnected peer (cleanup yacyshare remaining)	9 years ago
reger	7ed812a2bf	log missing seed.port in favour of exception to prevent repeating throws	9 years ago
reger	206883f80d	fix: Preserve protocol in url proxy to connect to http/https. Display warning if https target is viewed over http	9 years ago
reger	f7b0b3b7b3	avoid runtime exception by earlier testing for seed.ip=null	9 years ago
Michael Peter Christen	906b5fd742	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	8f90767889	fix for filesystem crawl	9 years ago
sixcooler	a3dd4be749	added / corrected charset to be 1.7 compatible. @Orbiter: please check if this is ok for you	9 years ago
Michael Peter Christen	8028410ab7	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	df3314ac1a	added a new facet type based on a probabilistic classifier using bayesian filters. This can be used to classify documents during indexing-time using a pre-defined bayesian filter. New wordings: - a context is a class where different categories are possible. The context name is equal to a facet name. - a category is a facet type within a facet navigation. Each context must have several categories, at least one custom name (things you want to discover) and one with the exact name "negative". To use this, you must do: - for each context, you must create a directory within DATA/CLASSIFICATION with the name of the context (the facet name) - within each context directory, you must create text files with one document each per line for every category. One of these categories MUST have the name 'negative.txt'. Then, each new document is classified to match within one of the given categories for each context.	9 years ago
reger	1409cabe8b	exclude more default search fields from text copy to text_t for metadata index documents	9 years ago
reger	e2e73258ca	remove obsolete interface SearchAccumulator and unused SRURSSConnector Thread inheritance	9 years ago
Michael Peter Christen	dbbad23e12	removed warnings	9 years ago
Michael Peter Christen	500cfa9457	enhanced logging	9 years ago
Michael Peter Christen	c14bc8d9b7	revert of fq transformation (recent fix)	9 years ago
Michael Peter Christen	203df5a750	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	9 years ago
reger	fa08ca207e	! finish running crawls before applying ! Allow crawl urls up to 2048 characters, fix for http://mantis.tokeek.de/view.php?id=575	9 years ago
reger	ee77f24e52	use some more declared HeaderFramework constants	9 years ago
Michael Peter Christen	11a848da5a	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	b94bd7f20a	a collection of search query enhancements: - fixed superfluous space in query field list - fixed filter query logic - removed look-ahead query which caused that each new search page submitted two solr queries - fixed random solr result orders in case that the solr score was equal: this was then re-ordered by YaCy using the document hash which came from the solr object and that appeared to be random. Now the hash of the url is used and the score is additionally modified by the url length to prevent that this particular case appears at all.	9 years ago
reger	dbe2594c38	replace deprecated myPublicLocalIP() in AbstractRemoteHandler	9 years ago
reger	6d3534e725	remove unused Transmission hit counter	9 years ago
reger	cb67eb7baf	use more absolute path for config file opening as suggested in pull request 5 (https://github.com/yacy/yacy_search_server/pull/5)	9 years ago
Michael Peter Christen	1ccbf739b1	added bayes filter from Philipp Nolte, originally taken from https://github.com/ptnplanet/Java-Naive-Bayes-Classifier and modified inside the loklak.org project. After optimization in loklak it was inserted into the net.yacy.cora.bayes package. It shall be used to create custom search navigation filters. The original copyright notice was copied from the README.md from https://github.com/ptnplanet/Java-Naive-Bayes-Classifier/blob/master/README.md The original package domain was de.daslaboratorium.machinelearning.classifier	9 years ago
Michael Peter Christen	1bced1ae60	using latest enhanced (un/)gzip methods from loklak for yacy	9 years ago
Michael Peter Christen	3e6657288d	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	de8cfbe1d7	added export option to export the fulltext of the search index text only	9 years ago
reger	2fb6ebe88a	move java environment parameter setting disabling SNI (Server Name Indication) support for https connections from code to startup script, allowing the admin to easily and transparently alter the YaCy default FALSE setting. Background: some users report problems with connecting to/crawling some sites via https which require SNI support (by default switched off in YaCy). On the other hand, systems not demanding SNI support are sometimes not properly configured, and due to a bug/feature in java 1.7 the connection is aborted. The latter is more often the case, so the default is still fine. With the java start parameter, expert users can now alter the start parameter to -Djsse.enableSNIExtension=true (java default) if they crawl more hosts requiring SNI support. The alternative to let YaCy try both during https handshake (deep inside the httpclient) is not pursued at this time.	9 years ago
Michael Peter Christen	fbeae20b3a	try a healing of the cache if the index file is corrupted	9 years ago
Michael Peter Christen	03ea723889	added log lines for query performance profiling	9 years ago
Michael Peter Christen	0e87a99ab8	more fixes for special windows paths	9 years ago
Michael Peter Christen	e5b6424eed	patch for bad windows file paths	9 years ago
Michael Peter Christen	0aa6fcf259	remove old vocabularies and synonyms before adding new	9 years ago
Michael Peter Christen	289018b559	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	7b412e8c07	added msg (text emails) format; should be handled by html parser.	9 years ago
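
The directory layout described in commit df3314ac1a (one directory per context under DATA/CLASSIFICATION, one text file per category, one of them named 'negative.txt') can be checked with a small standalone sketch. This is not YaCy code: the class name `ClassificationLayoutCheck`, its methods, and the example context "topic" with category "sports" are hypothetical and only illustrate the rule stated in the commit message.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of YaCy: validates a context directory layout.
class ClassificationLayoutCheck {

    /** Lists category names (file names without ".txt") in one context directory, sorted. */
    static List<String> categories(Path contextDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(contextDir, "*.txt")) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                names.add(name.substring(0, name.length() - ".txt".length()));
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Per the commit message: a context needs "negative" plus at least one custom category. */
    static boolean isUsable(Path contextDir) throws IOException {
        List<String> found = categories(contextDir);
        return found.contains("negative") && found.size() >= 2;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway DATA/CLASSIFICATION tree; "topic" and "sports" are made-up names.
        Path root = Files.createTempDirectory("yacy-demo").resolve("DATA").resolve("CLASSIFICATION");
        Path context = Files.createDirectories(root.resolve("topic"));

        // Each category file holds one training document per line.
        Files.write(context.resolve("sports.txt"),
                Arrays.asList("the team won the match", "final score of the game"),
                StandardCharsets.UTF_8);
        Files.write(context.resolve("negative.txt"),
                Arrays.asList("recipe for apple pie", "weather forecast for monday"),
                StandardCharsets.UTF_8);

        System.out.println(categories(context) + " usable=" + isUsable(context));
    }
}
```

A context that contains only 'negative.txt', or that lacks it, is reported as not usable, which mirrors the requirement in the commit message that each context has at least one custom category and one named "negative".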


3356 Commits (711183bd72e290fa70069c32016385d050fe866e)