yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	6ebc2451a9	Merge pull request #14 from luccioman/master Translator refactoring : no more regular expression processing	9 years ago
reger	2f51baff4f	check for loading error (includes unsupported formats) to prevent blank thumbnail display in image search because of not handled source which don't load on click. Now the cross icon indicates the problem (including not supported format)	9 years ago
luc	5578886f6f	Merge branch 'master' of https://github.com/luccioman/yacy_search_server.git	9 years ago
luc	c38d6c1f37	Correction for mantis 535: inurl: parameter doesn't work on URLs with upper-case letters	9 years ago
reger	52e3eb4ce8	harmonize/correct assignment to Ymarkmeta.mime replace use of deprecated	9 years ago
Michael Peter Christen	87f358058e	Fix for index entries which have id's not computed as hash from the url. This makes it possible to operate with outside-computed url hashes in enterprise environments not using the build-in crawler from YaCy.	9 years ago
reger	3f2b8ab5e5	optionally include mime in p2p url exchange string if doctype decodes to ambiguous mime and default conversion is not equal to original	9 years ago
reger	a3195d78ae	add Portuguese month names to date recognition	10 years ago
reger	d2cc11ea8f	fix html parser taking <style> content as text. Noticed some result description contain css content from style tag. Added <style> to tag list to scrape its content not as text + test case included	10 years ago
Michael Peter Christen	5f706797cb	patch for a bug inside of solr since solr 5.0 when using a boost function with a numeric date field: "unexpected docvalues type NUMERIC for field 'last_modified' (expected one of [SORTED, SORTED_SET]). Use UninvertingReader or index with docvalues." This is a well-known bug inside solr which prevents that now the 'sort by date' in the YaCy search interface can be used. Without this patch no results at all is displayed (since the exception prevents that). Now there is at least a result but it is not ordered properly.	10 years ago
reger	7889fc2389	Hack to prevent Solr issue on partial update on a document containing multivalued date field (regardless if these fields part of update). Switch partial update option off in postprocessing if schema contains *_dts (multivalued date field). see http://mantis.tokeek.de/view.php?id=601	10 years ago
reger	b4cbdea1e7	adapt SolrServerConnector.add to handle error on partial update input document. In case of error we deleted the original document and added the new doc to the index. This is not valid for partial update documents (which contain only a subset of the fields). Remove the "delete" error handling step.	10 years ago
reger	98ab655917	on reindex delete index document with invalid url if discovered	10 years ago
reger	1e8369e18b	use a parsed date in Document.toString	10 years ago
luccioman	199b2ce52d	Translator refactoring : to simplify locale files writing, process keys as simple string and no more as regular expressions. Updated all locale files to adapt to refactored Translator : removed useless escaped characters and did minor corrections. Performed minor syntax corrections on some html source files. Added an util to translate all html source files with all locales without launching full YaCy application. Corrected main arguments parsing on other translation utils.	10 years ago
luccioman	4dd9c0d5d9	Merge from main repository	10 years ago
reger	3428b6f13b	improve filtering by filetype navigator. The used url-filter for filetype doesn't require ".ext" resulting in too many matches, add a sort-out filter for RWI results.	10 years ago
reger	e37a4f0b3d	prevent metadata records in index w/o valid url by throwing MalformedURL exception on URIMetadataNode creation	10 years ago
reger	41c4eade51	extract modification date from vCard (vcfParser)	10 years ago
reger	8768896975	extract lastmodified from openoffice doc set lastmod date in office document parsers	10 years ago
Michael Peter Christen	c40c302748	when many crawl queues are generated, this NPE can occur; probably caused as concurrency issue: W 2015/09/05 14:09:10 ConcurrentLog java.lang.NullPointerException java.lang.NullPointerException at java.util.TreeMap.rotateRight(TreeMap.java:2239) at java.util.TreeMap.fixAfterInsertion(TreeMap.java:2271) at java.util.TreeMap.put(TreeMap.java:582) at net.yacy.kelondro.table.Table.<init>(Table.java:235) at net.yacy.crawler.HostQueue.openStack(HostQueue.java:229) at net.yacy.crawler.HostQueue.getStack(HostQueue.java:204) at net.yacy.crawler.HostQueue.push(HostQueue.java:397) at net.yacy.crawler.HostBalancer.push(HostBalancer.java:237) at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:184) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:355) at net.yacy.crawler.CrawlStacker.job(CrawlStacker.java:134) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:101) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)	10 years ago
reger	367fe388b9	fix exception throw after sendError in DefaultServlet - reduce debug exception logs in crawler	10 years ago
luccioman	9752bd5f88	Added utils to help translation without launching full YaCy application : - translate all source files with a locale - list all non translated files with a locale	10 years ago
luccioman	2f0f0180e2	Added a function to list files recursively.	10 years ago
luccioman	7e4c1d2282	Translator refactoring : - deleted useless new StringBuilder allocation - use of a new reusable FileNameFilter - added javadoc	10 years ago
reger	802ccaead6	fix init of error cache, use latest faildates => load_date_dt	10 years ago
reger	dba7f15073	apply same size constraint on result image from doc as for linked images see `19f1308bf0`	10 years ago
reger	4cf875336c	complete TODO: getFileExtension handle dot in query part + testcase	10 years ago
sixcooler	87e4abe393	fight the fieldcache by using DocValues: in Solr-5.x the fieldcache has moved and was not cleared anymore. This results in a huge fieldcache. (http://lucene.apache.org/#highlights-of-the-lucene-release-include https://issues.apache.org/jira/browse/LUCENE-5666) Here I try to use DocValues where it is possible. For this I used the Api-Scheme as new basis for the Solr-Schema. This needs at least a complete optimization of the Solr-Index to get a smaller FieldCache. Everything that is indexed with these settings will not use the Fieldcache at all.	10 years ago
reger	eaf0e8ff2c	start recording/indexing pixel size for image document as for linked images	10 years ago
reger	c33229fc0c	check mime prior to ext for metadata modification for images	10 years ago
reger	19f1308bf0	enforce the result images limit to > 16x16px for linked images http://mantis.tokeek.de/view.php?id=594	10 years ago
reger	0e4ba0360b	fix NPE on .yacyh result url of disconnected peer (cleanup yacyshare remaining)	10 years ago
reger	7ed812a2bf	log missing seed.port in favour of exception to prevent repeating throws	10 years ago
reger	206883f80d	fix: Preserve protocol in url proxy to connect to http/https. Display warning if https target is viewed over http	10 years ago
reger	f7b0b3b7b3	avoid runtime exception by earlier testing for seed.ip=null	10 years ago
Michael Peter Christen	906b5fd742	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	10 years ago
Michael Peter Christen	8f90767889	fix for filesystem crawl	10 years ago
sixcooler	a3dd4be749	added / corrected charset to be 1.7 compatible. @Orbiter: please check if this is ok for you	10 years ago
Michael Peter Christen	8028410ab7	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	10 years ago
Michael Peter Christen	df3314ac1a	added a new facet type based on a probabilistic classifier using bayesian filters. This can be used to classify documents during indexing-time using a pre-defined bayesian filter. New wordings: - a context is a class where different categories are possible. The context name is equal to a facet name. - a category is a facet type within a facet navigation. Each context must have several categories, at least one custom name (things you want to discover) and one with the exact name "negative". To use this, you must do: - for each context, you must create a directory within DATA/CLASSIFICATION with the name of the context (the facet name) - within each context directory, you must create text files with one document each per line for every category. One of these categories MUST have the name 'negative.txt'. Then, each new document is classified to match within one of the given categories for each context.	10 years ago
reger	1409cabe8b	exclude more default search fields from text copy to text_t for metadata index documents	10 years ago
reger	e2e73258ca	remove obsolete interface SearchAccumulator and unused SRURSSConnector Thread inheritance	10 years ago
Michael Peter Christen	dbbad23e12	removed warnings	10 years ago
Michael Peter Christen	500cfa9457	enhanced logging	10 years ago
Michael Peter Christen	c14bc8d9b7	revert of fq transformation (recent fix)	10 years ago
Michael Peter Christen	203df5a750	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	10 years ago
reger	fa08ca207e	! finish running crawls before applying ! Allow crawl urls up to 2048 character fix for http://mantis.tokeek.de/view.php?id=575	10 years ago
reger	ee77f24e52	use some more declared HeaderFramework constants	10 years ago
Michael Peter Christen	11a848da5a	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	10 years ago

3372 Commits (23f6294a2dcb3ab4601ce0260e1a16e17e7d7b22)