yacy_search_server

Commit Graph

Author	SHA1	Message	Date
Michael Peter Christen	7466d390b2	small refactoring + do not accept too old peers during bootstrap	8 years ago
reger	8d58a48029	remove wrong log line in CrawlSwitchboard + don't allow CrawlSwitchboard to exit application making network param unused	8 years ago
reger	5aaa057c65	ignore empty input lines in FileUtils.getListArray() to poka-yoke blacklist read. equalizes behavior with getListString() improves: case where blacklist file contained an undesired empty line, not fixed by blacklist-cleaner.	9 years ago
reger	41c36ffd75	exclude rejected results from result count (by using the resultcontainer.size instead of input docList.size) skip waiting for write-search-result-to-local-index (by removing the Thread.join - which will bring a small performance increase)	9 years ago
reger	d4da4805a8	internal wiki code, require header line to start with markup (to allow something like "one=two" as text) + incl. test case	9 years ago
reger	e952e355a2	have Translator servlet adhoc apply added translation by translating a single file + fix NPE in Translator, coming from translation read by TranslatorXliff which allows null content for not translated key's	9 years ago
reger	b119ff65be	clean out not used Switchboard variables counter indexedPages, const xstackCrawlSlots	9 years ago
reger	223071337b	Translator to take caution of word boundaries to identify text portion to be translated. To avoid key="TEST" sourcetext="this is a myTESTcase for it" translation of partial terms/words. Add check of word boundary before and after sourcetext (incl. take care of current practice for key to be delimited by > <) + add test case	9 years ago
luccioman	009657791e	Merge remote-tracking branch 'origin/master' into LibreJS	9 years ago
luccioman	a73c9327a5	JavaScript License fixes for LibreJS compatibility	9 years ago
reger	0c40401d28	fix MessageBoard test for null data	9 years ago
reger	5b22c63030	Adjust TranslatorXliff to load default 1st and merge downloaded or modified local translation. process 1. load default from locales/. 2. load and merge(overwrite) from DATA/LOCALE/. (can be partial translation as it is merged) - include all entries from DATA/LOCAL to be edited in Translator servlet and save just modifications (instead of full list) to DATA/LOCALE This shall make it easy to share modifications.	9 years ago
reger	a2e0f00456	optimize Translator - translateFilesRecursive: load translation once (reduce io), return true on complete success - remove resulting unused translateFiles() variant - translate: use StringBuilder parameter (skip toString conversion) - remove not needed static declaration - upd some javadoc	9 years ago
reger	a6ba1faa80	introduce a translation edit servlet Translator_p.html YaCy's UI text translation This is the 1st rudimentary approach to support the translation utilities. It allows currently to edit untranslated text and save it in a local translation file in the DATA/LOCALE directory. + refactor Translator (less static's) to leverage on class overrides and support garbage collection for this 1 time routine + adjust TranslatorXliff to check for local translations in DATA/LOCALE, this includes storing manually downloaded translation files in DATA as well (to keep default untouched) + on 1st call of Translator_p a master translation file is generated, checking the supported languages for missing translation text (later this masterfile is planned to be part of the distribution, to harmonize translation key text between the languages) Outlook: the local modifications (possibly as translation fragments instead of complete file) to be shared with maintainer using xliff features.	9 years ago
reger	b3c9041f79	remove with localHostNames redundant (but unused) publicIPv4HostNames and publicIPv6HostNames to free unused resources	9 years ago
reger	bd8f7c11f5	Use transparent addToCrawler in AutoSearch instead of addToIndex This would likely also be of advantage for RSS import/schedule as following bug-reports suggest http://mantis.tokeek.de/view.php?id=569 http://mantis.tokeek.de/view.php?id=655	9 years ago
reger	f23d8ab47b	fix 2 more servlet RuntimeException in intranet mode thrown due to seed.getIP() returning null in intranet mode (in servlets: ConfigSearchBox, Load_PHPBB3 +remove unused (const ∅) seed.IPTYPE	9 years ago
reger	bb0076c3dd	fix: assure close inputstream in TranslatorXliff after reading xlf file by using try-with-resources block	9 years ago
reger	6384b7d82e	fix NPE in Load_MediawikiWiki servlet in intranet mode - in intranet mode getip returns null causing a NPE - adjust starturl (which was set to http://localip/repository) which is never the start url for the Mediawiki + correct javadoc for seed.getIP()	9 years ago
Michael Peter Christen	596b5dfa59	add the JRE version in the seed. Purpose: identify if it is possible to migrate to new JRE version	9 years ago
reger	4cc38e979d	add InputStream close after reading input file (Vocabulary_p servlet)	9 years ago
reger	6bf9c55584	adjust Solr select servlet to latest bugfix for boostquery (bq param) to split query into multiple parameter on line separator in input query. e.g. split "crawldepth_i_0^10.0 \n crawldepth_i:1^5.0" but allow "url_file_ext_s:jpg OR url_file_ext_s:png" to be unsplit	9 years ago
Burkhard	9a18e2297b	Merge pull request #51 from JeremyRand/multiple-boost-query Fix multiple boost queries	9 years ago
reger	f0d7b93372	make use and activate autodetect charset in Vocabulary input from file + revert mistake of empty cn.lng	9 years ago
JeremyRand	433217b33e	Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)	9 years ago
JeremyRand	58824dfa6c	Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx.	9 years ago
reger	9e94989237	upd to PDFBox 2.0.1	9 years ago
reger	d0a571bed2	del cytag trail for own index.html (save resource not used by default)	9 years ago
reger	de46879637	fix SeedDB.get(byte[]) hash string compare (for returning own seed shortcut)	9 years ago
reger	24b0fa2a38	extend snapshot Html2Image.pdf2image to use PDFBox image export capability if no external tool installed (and for Win) Resulting jpg are not always perfect (if graphic included) but imho sufficient.	9 years ago
reger	eb2a00b1d8	fix NPE on missing crawldepth_i	9 years ago
reger	efb9f1a8b7	save resource for unused blacklistFiles map	9 years ago
reger	5f113be760	cleanup connectPeer & yacyVersion.latestRelease usage obsolete since `527b3decde`	9 years ago
reger	7097dcbdbd	cleanup hack for partial Solr update on multivalued datefields has been fixed in Solr http://issues.apache.org/jira/browse/SOLR-8050	9 years ago
reger	f10ea3c155	clean-out unused SwitchboardConstants	9 years ago
reger	ef24593347	delete obsolete SEARCHRESULT busythread constants not used since 29.05.2013 18:27:27 `0c1a018bbd`	9 years ago
reger	125b5e26a5	apply bugfix for ChartPlotter from Pullreq 42 https://github.com/yacy/yacy_search_server/pull/42 thanks to otteresk (https://github.com/otteresk)	9 years ago
reger	06ce9ae711	prevent "unchecked conversion" compiler message + include "translate" property in xlf "trans-unit" export	9 years ago
reger	b4a576dbdf	exclude unused protocol param "duetime" (receiver interprets param "time" only)	9 years ago
reger	3bd6ae8d8b	keep addon/Notepad++ keyword marker on lng export (length of remarks divider line) + harmonize status_p.inc lng text	9 years ago
reger	16837d60c7	fix version in locale version file (it's compared to full version)	9 years ago
reger	0fb01e429e	fix migration, account for ssl port in config (for auto-disable https)	9 years ago
reger	7be1c7a05a	fix logger name	9 years ago
reger	1d940e5a94	upd commons-compress 1.11	9 years ago
reger	7789c32c82	delete crawl queue on init exception (happens occasionally on path name violation and will never get resolved)	9 years ago
reger	f781b9dd47	revert call condition f. migration.installSkins (a bug introduced in `fb8ae14b21` , see comment on that commit )	9 years ago
reger	3adb670f44	remove never used Domains.myHostNames set	9 years ago
reger	6ecc180299	fix rwi doubledom return best (highest) ranking	9 years ago
reger	2343e3f1cd	keep and update existing xlf translation master instead of create new in utility CreateTranslationMasters + small fixes in lng's	9 years ago
reger	a1935f485f	Added utility class CreateTranslationMasters to create a language independent translation master as source to harmonize individual translation files Included a main to create masters in YaCy in xliff format for testing + restrict TranslatorXliff to use only entries with State=translated P.S. used https://open-language-tools.java.net/editor/about-xliff-editor.html to experiment with xlf output (haven't a Pootle avail.)	9 years ago
reger	acaf51b296	keep ConfigLanguage_p as 1st entry in exported translation file + rem untranslated text & some typo fixes in several translations (considering to create a translation master file to harmonize entries)	9 years ago
reger	61c5b6b403	fix empty drop down list in ConfigLanguage after wrong/empty download + add xliff translated attribute + append japanese lng name	9 years ago
reger	4eddabee42	translate Network History screen -> de + remove leftover debug line	9 years ago
reger	90c79014ae	remove unused translator routine which also doesn't handle rel path input + correct some language file match issues	9 years ago
reger	902e79e261	Introduce a TranslatorXliff which can read/write xliff from/to internal translation map. This eases up suggested initiatives from http://mantis.tokeek.de/view.php?id=649 Allows longer term also to store translation maps for the htroot files in standardized/reusable xliff format ( http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html ). + added test case creating and comparing xliff file with internal custom prop file. (currently the introduced class is not used in core code)	9 years ago
reger	d9adc2c255	load handler for Transparent Proxy on startup only if feature is activated to save the resources and keep handler chain small if the feature is not used. +add a warning message on settingsack_p page to restart on first activation	9 years ago
reger	ec24a0c85a	add test case for optimized toTokens()	9 years ago
reger	cada24f918	adjust utility ListNonTranslatedFiles for path compare on windows (backslash replace)	9 years ago
reger	fb8ae14b21	make migration version safe	9 years ago
reger	258cd41577	reduce logging (EmbeddedSolrConnector.query) mainly to reduce the frequent metadata checks like > EmbeddedSolrConnector.query QUERY: q={!cache=false raw f=id}xXxXxX&rows=1&start=0&fl=id,load_date_dt (p.s. direct servlet queries logged via AccessTracker.addToDump)	9 years ago
reger	6783ef5540	move example code SearchClient out of yacycore package to example directory	9 years ago
Michael Peter Christen	b89465d952	0N - basic dump upload servlet infrastructure, to share index dumps within an experimental new sharing model	9 years ago
Michael Peter Christen	f12a900f3e	harmonization of http post of files for one and several files - this had been differently - and wrong for several files. also: base64-encoding for gzipped push files because our data structures currently only supports ASCII POST pushes..	9 years ago
Michael Peter Christen	849ab671a9	0n: modified the p2p bootstrapping process - rules had been too tight and did not support the re-start of a network with just one principal peer.	9 years ago
reger	764f5100f0	fix delete of temp file after odt % ooxml parser Close zipfile after parsing	9 years ago
reger	379e9b330d	use supplied url port to get robots.txt in crawlers hostqueue	9 years ago
reger	58a959403d	fix mixed logfactory in UrlProxyServlet, Class doesn't use functions of declared ancestor, change to extend on httpservlet	9 years ago
Michael Peter Christen	2494a820c7	0N - added recording of dump exports if given time frame is not negative	9 years ago
Michael Peter Christen	ef2cc4f690	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	9 years ago
Michael Peter Christen	a6bf0b1649	0N - added option to generate index export files for a specific number of minutes in the past and reverted latest change. The export file dump will now contain four data elements: f - first date of index entry write date, l - last date of index write date, n - now-date of index dump time, c - count of numbers inside the dump. '0N' denotes a series of changes which will lead to the opportunity to exchange index data dumps in a way that is needed to integrate ZeroNet index data. This will be based on index dump sharing; that causes this commit.	9 years ago
reger	6d56beaed8	fix assertion exception in toString of MultiProtocolURL toString of AnchorURL and MultiProtocolURL are identical code (no need to override or to protect call to parent) as reported in https://github.com/yacy/yacy_search_server/issues/43	9 years ago
reger	42a7bdb2af	fix SolrSelectServlet authentication to default to true	9 years ago
reger	dbb28bb4f3	del unused statistic parameter (from status servlet)	9 years ago
reger	06d0e2aeb9	result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method. - Above brought up that parser start url parameter, declared as AnchorURL uses only methods of parent object DigestURL (changed parameter declaration accordingly).	9 years ago
reger	caf9e98f09	put metadata dc_publisher in corresponding schema field	9 years ago
reger	38e2b054d4	remove servlet classloader internal cache map (to save the resources, cache hits marginal) - DefaultServlet includes already a class cache "templateMethodCache" which is emptied on low mem status - avoid classloader cache gets has no hits but over time holds all (used) servlet classes	9 years ago
reger	6f0b073bf3	override detected language (statistic langdetect) only with TLD determined language if langdetect probability is not high. + additionally truncate zh-cn / zh-tw returned by langdetect to 2 char ISO639-1 zh used by YaCy	9 years ago
reger	b65e2b527d	include use of condenser's content text for language detection. Language identification may show poor performance on documents with short or no title but clear lang indication in text content. Using content text too improves lang detection. + remove double caching of text in Identificator	9 years ago
reger	937fbb0b9f	correct isHidden() for smb from last commit	9 years ago
reger	535d4bf75f	respect hidden attribute for file and smb directory listing (hidden directories are not listed, effects crawling of local file system)	9 years ago
reger	c28142095a	add findClass() to servlet class loader (used in YaCyDefaultServlet) In the 2 cases where servlet calls servlet the jvm classloader chain is invoked and servlet class loaded by jvm loader (successful while requiring htroot in system classpath). This patch uses the standard override design for loaders to handle these cases (making it no longer crucial to have htroot in system classpath, as this classLoader is mainly used for servlets and looks in this case for the class in the configured path). + As the default classloader is parallelcapable we should register this too.	9 years ago
reger	a6617ad887	expand initRemoteCrawler() to terminate worker threads if called to deactivate remote crawl. On startup we save the resources for remote crawler if disabled. Once started threads are running idle after disable remote crawl. Now threads are terminated to save the resources also while disabling during runtime. + remove empty class Channels	9 years ago
reger	2048b7e057	support scraping start-/enddate from html tag with property "datetime" This may be used in html5 <time> tag (which we don't explicite support yet for date in content scraping).	9 years ago
reger	900d4584ba	complete resource cleanup of lists in contentscraper's close()	9 years ago
reger	1f18653de0	pass parsed swf content through htmlscraper Swf may contain subset of html tags which should appear as text. Especially <font> tag may totally screw up metadata servlet if not filtered out.	9 years ago
reger	18ecf57792	add support of compressed swf to swfParser from JavaSWF2 (source compatible to WebCat). Moved swf file signature check to parser Changed use of synced vector to list swf InStream	9 years ago
sixcooler	5cb7ba0dc4	fix for connections not getting closed to get favicon.ico during search	9 years ago
reger	ed3e16e092	apply remote result count config value to Bookmark Autosearch + prepare to make the widely unused Bookmark feature optional	9 years ago
Ryszard Goń	a98c395023	Add the Autocrawl thread	9 years ago
Ryszard Goń	1728cd30c6	Create autocrawl profiles	9 years ago
reger	ff27824964	fix swfParser reading file signature before passing to library (current version expects data w/o signature)	9 years ago
reger	c91e712178	further refactor using standard java / (one) utf-8 charset variable extending initiative of commit `9a25751850`	9 years ago
luc	571bc55937	Refactoring : use StandardCharsets constants instead of hard-coded charset names.	9 years ago
reger	1af0e9ef74	remove workaround for Solr bug regarding multivalued date fields fixed in 5.4.0 http://issues.apache.org/jira/browse/SOLR-8050	9 years ago
sixcooler	5a35f9383a	bump to solr/lucene 5.4.0	9 years ago
reger	a58d34a4e8	check error URL cache before adding errorDoc to index - del obsolete related switchboardconstant	9 years ago
reger	e9539b1086	reintroduce special handling of file upload multipart/form-data from HTTPDemon.parseMultipart - add filename to parameter fieldname - add filecontent to special parameter fieldname$file (some servlets use this $file parameter) fix for http://mantis.tokeek.de/view.php?id=542	9 years ago
reger	cd26717ba2	fix low memory status hint (dht-in disabled) http://mantis.tokeek.de/view.php?id=619	9 years ago
reger	a5faf73afa	remove obsolete yacy.init entries interaction.* (related to removed triplestore)	9 years ago
sixcooler	dce1cb65c4	Merge remote-tracking branch 'choose_remote_name/master'	9 years ago

8074 Commits (421a6e3a95ee7db257a79c35c5bb3bf6067d676b)