yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	3b694b3935	add some javadoc to rwi wordreference distance, position to remember facts for http://mantis.tokeek.de/view.php?id=683 Init missing word position to 0 like in other non text body words	8 years ago
reger	a4465c97d6	as requested, disable/remove old swf parser http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5861#p33098	8 years ago
reger	7f63fc50f3	prepare a IndexSegment test case for RWI index testing + prevent NPE in Segment.clear() on missing embedded solr instance.	8 years ago
reger	96467c5467	remove not needed counter in Tokenizer (completing last changes) including a small change, word posintext counting. We remember/store 1st posintext. Previously following words got a handle (posintext) excluding found. Now it just counts and assigns true posintext as handle (posintext)	8 years ago
luccioman	d66b0f7b7b	Fixed french messages encoding in YaCy tray. Also added the missing french translations.	8 years ago
reger	7efb66ee10	adjust the WordReference.join wordsintext calc to take the max (instead of sum) The reference is for the same url (add same for title and phrases). + del redundant join() procedure	8 years ago
luccioman	0a9ff14d96	Fixed NullPointerException case and added Javadoc	8 years ago
luccioman	06d4f93d03	Merged master into postprocessing branch	8 years ago
Michael Peter Christen	b73d2db914	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	25a3c7a6d0	catch exception and write end of object	8 years ago
reger	272cdd496a	reactivate sentence counter in WordTokenizer for phrasepos ranking, by counting punctuation (delivered as 1 char word) again.	8 years ago
Michael Peter Christen	5e165a8150	removed unused imports	8 years ago
Michael Peter Christen	c716648c78	enhanced json encoding of strings	8 years ago
Michael Peter Christen	6139bd85a8	fix for broken facet names	8 years ago
Michael Peter Christen	5060f9fee9	fix for too long snippets	8 years ago
Michael Peter Christen	8681cee3f3	fix for bad comma	8 years ago
Michael Peter Christen	db6d8fc197	fix for bad json	8 years ago
Michael Peter Christen	8f4a341735	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	9934f546bb	added default fl to solr query, removed large texts retrieval and changed snippet to description tag if no other description is available	8 years ago
reger	120bf7e6e2	implemented RWI WordReference to return the word position value (was always left empty) This is needed and enables existing word position ranking for RWI. The upcoming concurrency issue in word position min/max calculation was eliminated by iterator.hasNext check before next() access.	8 years ago
reger	e310ec5f70	fix posInText ranking calculation to score 0 on no position info + fix Word posInText calc in Tokenizer to start with 1 + test case	8 years ago
luccioman	74f9927ddc	Merge remote-tracking branch 'origin/master' into dist_macOS	8 years ago
reger	51c077f493	adjust the getTopics() and getTopicNavigator() to current usage - move the maxcount limit restriction completely to getTopicNavigator (as there not used in getTopics) - let search servlet use getTopics by default (w/o RWI connected check, as of now, Topics are available w/o any additional index interaction)	8 years ago
reger	39dd244693	fix ConcurrentScoreMap.set() calculation of totalCount() + test case	8 years ago
reger	ebf818ad95	log a error on aborted news publish (due to duplicate news.id) + change printed err msg to log entry in PeerAction.processPeerArrival	8 years ago
reger	cc2d9dd3f1	reactivate the use of included-in-topwords boost in postRanking + changed the postRanking to add one score only if word appears more than one time. + getTopics() unused code block rem'd (save performance)-> routine needs rework !	8 years ago
luccioman	39ea28adfd	Merged master to dist_macOS branch.	8 years ago
luccioman	8255e91c99	Fixed serverClassLoader.findClass method htroot is a supposed to be a subfolder of appPath and not of dataPath, as assumed in other places where htroot is loaded. This issue was not visible when dataPath and appPath are equals.	8 years ago
reger	6801673a07	apply postranking media search boost only on media queries	8 years ago
luccioman	1dc4306058	Fixed indentation for better readability.	8 years ago
luccioman	8c49a755da	Postprocessing refactoring Added Javadocs to refactored methods. Added log warnings instead of silently failing some errors. Only fill collection1hosts when required ( shallComputeCR true).	8 years ago
luccioman	42f45760ed	Refactored postprocessing For easier understanding and performances profiling.	8 years ago
reger	4386e84b55	correct NewsPool retention calculation (was still clearing everything after one day)	8 years ago
reger	5e72d37f0a	TransNews_p: add ad-hoc translation of target file on positive vote (addition to local translation) + errmsg on language=default	8 years ago
reger	9462a32244	Added news service for easy, community driven UI translation support. New or modified translation (via /Translator_p.html) can be shared/distributed via the YaCy internal news service. Remote peers can see and vote on the translation via the new http://localhost:8090/TransNews_p.html servlet. A positive vote will add the received translation to the local translation list and post a voting message to the news service. (at this no processing of received votings is implemented) + fixed the msg service retention time check (NewsPool.automaticProcessP)	8 years ago
reger	f8d6543a23	Rename class CreateTranslationMaster to TranslationManager and add additional routines and the capability to handle translation maps internally (to reduce complexity of handling translation maps for calling servlets)	8 years ago
reger	19b4509d54	speed-up reading of xliff language file, by using xmlparser (stax) instead of jaxb making xliff-core-1.2-1.1.jar obsolete	8 years ago
Michael Peter Christen	e1fac86f53	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	a9316ceff6	force browser-caching of favicons from search results	8 years ago
Orbiter	503312ca43	Merge pull request #61 from luccioman/heroku_experiments Deploy YaCy on Heroku	8 years ago
reger	33bf35d90f	missing file for prev commit "Introduction of additional language setting browser"	8 years ago
reger	16e8ed3f01	Introduce additional language setting "browser/Browser Language" for UI internationalization. If language is set to "browser" the client/user browser language is used to choose from available translation. simply: one users browser speaks English -> YaCy responds in English, other users browser speaks French -> YaCy responds in French. ! To make a translation/language available you have to activate the language once ! (or manually use the utility class TranslateAll) In ConfigBasic.html available translations are marked green on setting language=Browser The client language is determined by http header Accept-Language (checked in DefaultServlet)	8 years ago
reger	3b47a07dd1	change unused servletProperties entry CONNECTION_PROP_CLIENT_REQUEST_HEADER to use directly HttpServletRequest. This is used to get the http protocol version in HTTPDProxyHandler.fulfillRequestFromWeb() for error response to client. - adjust YaCyProxyServlet and UrlProxyServlet accordingly - use more http_version constants in headerframework and httpdeamon - equalize servlets (3) use of HeaderFramework.CONNECTION_PROP_HOST to HeaderFramework.HOST	8 years ago
reger	036c1dc6ef	fix CookieTest_p formatting (output of <br> as text), change to dataoutput only by servlet, leave formatting to html. + removed link to obsolete env/grafics gif	8 years ago
Michael Peter Christen	bf6709d196	fixed missing browser activation in linux	8 years ago
Michael Peter Christen	d8504418b6	enhanced browser-caching of static content	8 years ago
Michael Peter Christen	079112358c	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Michael Peter Christen	efeb592661	don't do solr optimization, this create high IO load. We should leave this task to solr to do that on it's own instead of forcing it.	8 years ago
luccioman	46b8836548	Copy image resources contained in donation iframe. Handle eventual images loading errors.	8 years ago
reger	4c7a77662a	eliminate dependency on file-extension in storeDocument but use supported mime-type to also support handling of urls w/o corresponding file-extension. For this refactor use of document.getParserObject() to always return a Parser (for clean logic) and define/move the scraperObject as local var of AbstractParser. Adjust related calls to getParserObject (where actually a scraperObject is wanted). Additionally skip appending url token to parsed text for dht metadata entries (by default returned as result by rwi index).	8 years ago
reger	ebde21079a	refactor xlsParser to include Excel file attribute (like author) in parser result doc. Similar to ppt and doc parser, completing a TODO in xlsParser.	8 years ago
luccioman	744c9a2615	Opensearch desc : handle https protocol url with default port (443) This completes modifications made for mantis 669 (http://mantis.tokeek.de/view.php?id=669)	8 years ago
luccioman	b9c28893ee	Merged master to 'heroku' branch.	8 years ago
Michael Peter Christen	103a8348b3	fix for NPE and small performance enhancement	8 years ago
reger	2910fe35c1	add missing scheduler calc of next exec_date (call of calculateAPIScheduler) - after last_exec_date is altered, next_exec_date should be recalculated - makes the recalculation of next_exec in advance (without api call surely made) in Switchboard.schedulerJob() obsolete Slightly modify next_exec calc. on missed event to now+schedule_time (from fix 10min)	8 years ago
reger	70d47ae38a	keep scheduler selection by repeat entry from `07311020d4` to allow exec schedule on actual exec event. Iterate on exec date (of advantage after interruption/shutdown) to schedule older or missed events first.	8 years ago
reger	7c3f932e5d	revert due to conflict with double count recording by scheduler / servlet by the commit under normal operation (no shutdown)	8 years ago
reger	07311020d4	postpone apicall exec date init until actual call fix for http://mantis.tokeek.de/view.php?id=677 The difference is on scheduling a large number of rss feeds and loading is not finished before shutdown of YaCy. The change makes sure not already loaded RSS will be loaded by the scheduler on next startup.	8 years ago
reger	5e335b32da	fix Blacklist.contains() matching path pattern to string similar to `5e9e871192` + add proof testcase	8 years ago
reger	5e9e871192	fix Blacklist.remove by using pattern.toString to find pattern to remove, parameter String path did never equal Pattern. + delete unused removeAll, as it does not persist changes after restart	8 years ago
reger	1843ea7e69	on Blacklist.add pattern to source file also update internal entry maps as in Blacklist.add(blacklistType) to make entry effective w/o restart fix for http://mantis.tokeek.de/view.php?id=676	8 years ago
reger	bf6ce33da3	Correct use of _htDocsPath config in YaCyDefaultServlet to use servlet config variable + add some javadoc and remove a not useful static declaration	8 years ago
luccioman	480027ec98	Merge remote-tracking branch 'origin/master' into heroku_experiments	8 years ago
reger	fcad2d0744	add uses of config constant INDEX_RECEIVE_ALLOW	8 years ago
reger	226f81cfcf	declare poison pill url MultiProtocolURL() as protected to make sure not used from outside. After double checking use of poison url revert path init from commit `f8632ad292`	8 years ago
reger	f8632ad292	prevent string index out of bounds MultiProtocolURL.getPaths as path maybe a empty string + init path to "" also in init for poison url (to guarantee success for all existing uses of path w/o check for null)	8 years ago
reger	35a7d57260	update lucenematchversion to current (5.2.0 -> 5.5.0) there should be no need for reindex by the update	8 years ago
reger	9b07bbf955	deprecate newurl(), not used and already replaced instead of making it handle all supported the protocols	8 years ago
luccioman	47d486298f	Merged changes from master.	8 years ago
reger	774b3906a9	fix GenericFormatter.parse ("time","timeoffset") change: UTC offset internally expected in minutes	8 years ago
reger	27163af0e1	improve detection of referenced links by taking http and https link protocol into account + correct query start detection of commit `f89d4eb51d`	8 years ago
reger	f89d4eb51d	fix MultiProtocolURL init (assign of host) for urls with '/' in query part + add to test case	8 years ago
reger	87fcfc6d78	Adjusted hash computation and toNormalform for file:// protocol to deliver same hash same file on Windows filesystem path with forward- and backslash in path. Background see http://mantis.tokeek.de/view.php?id=671 +Test case	8 years ago
luccioman	d6bf90803f	Merged from main master branch.	8 years ago
luccioman	9b9c112263	Handle more properly local port configuration by system property And prefixed property with "net.yacy" to avoid ambiguity.	8 years ago
reger	3811184abd	fix GSA servlet clientIP retrieval	8 years ago
reger	7ab41d4ff1	use directories original lastmodified date in file- & smbloader in response	8 years ago
reger	708bcbb042	one more replacement to use cached hosthash vs. calculated	8 years ago
luccioman	b57a06d88e	Let Heroku decide which http port to use	8 years ago
reger	22db449f2a	to prevent crawler to concurrently access and alter same crawl queue after restart, put hosthash in queue's filename (which is used as primary key for crawl queue. Hint: initial hosthash from url and recalculated hosthash from just hostname:port are not the same. fixes http://mantis.tokeek.de/view.php?id=668 (partially)	8 years ago
luccioman	893a40995a	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	8 years ago
Orbiter	50c5ddf1a1	Merge pull request #56 from luccioman/LibreJS LibreJS compliance : YaCy JavaScript license information	8 years ago
Michael Peter Christen	7466d390b2	small refactoring + do not accept too old peers during bootstrap	8 years ago
luccioman	6e96c7341a	Merge remote-tracking branch 'origin/master' Conflicts: htroot/Load_MediawikiWiki.java htroot/Load_PHPBB3.java htroot/ViewImage.java	8 years ago
reger	8d58a48029	remove wrong log line in CrawlSwitchboard + don't allow CrawlSwitchboard to exit application making network param unused	8 years ago
reger	5aaa057c65	ignore empty input lines in FileUtils.getListArray() to poka-yoke blacklist read. equalizes behavior with getListString() improves: case where blacklist file contained a undesired empty line, not fixed by blacklist-cleaner.	9 years ago
reger	41c36ffd75	exclude rejected results from result count (by using the resultcontainer.size instead of input docList.size) skip waiting for write-search-result-to-local-index (by removing the Thread.join - which will bring a small performance increase)	9 years ago
reger	d4da4805a8	internal wiki code, require header line to start with markup (to allow something like "one=two" as text) + incl. test case	9 years ago
reger	e952e355a2	have Translator servlet adhoc apply added translation by translating a single file + fix NPE in Translator, coming from translation read by TranslatorXliff which allows null content for not translated key's	9 years ago
reger	b119ff65be	clean out not used Switchboard variables counter indexedPages, const xstackCrawlSlots	9 years ago
reger	223071337b	Translator to take caution of word boundaries to identify text portion to be translated. To avoid key="TEST" sourcetext="this is a myTESTcase for it" translation of partial terms/words. Add check of word boundary before and after sourcetext (incl. take care of current practice for key to be delimited by > < + add test case	9 years ago
luccioman	009657791e	Merge remote-tracking branch 'origin/master' into LibreJS	9 years ago
luccioman	a73c9327a5	JavaScript License fixes for LibreJS compatibility	9 years ago
reger	0c40401d28	fix MessageBoard test for null data	9 years ago
reger	5b22c63030	Adjust TranslatorXliff to load default 1st and merge downloaded or modified local translation. process 1. load default from locales/. 2. load and merge(overwrite) from DATA/LOCALE/. (can be partial translation as it is merged) - include all entries from DATA/LOCAL to be edited in Translator servlet and save just modifications (instead of full list) to DATA/LOCALE This shall make it easy to share modifications.	9 years ago
reger	a2e0f00456	optimize Translator - translateFilesRecursive: load translation once (reduce io), return true on complete success - remove resulting unused translateFiles() variant - translate: use StringBuilder parameter (skip toString conversion) - remove not needed static declaration - upd some javadoc	9 years ago
reger	a6ba1faa80	introduce a translation edit servlet Translator_p.html YaCy's UI text translation This is the 1st rudimentary approach to support the translation utilities. It allows currently to edit untranslated text and save it in a local translation file in the DATA/LOCALE directory. + refactor Translator (less static's) to leverage on class overrides and support garbage collection for this 1 time routine + adjust TranslatorXliff to check for local translations in DATA/LOCALE, this includes storing manually downloaded translation files in DATA as well (to keep default untouched) + on 1st call of Translator_p a master translation file is generated, checking the supported languages for missing translation text (later this masterfile is planned to be part of the distribution, to harmonize translation key text between the languages) Outlook: the local modifications (possibly as translation fragments instead of complete file) to be shared with maintainer using xliff features.	9 years ago
reger	b3c9041f79	remove with localHostNames redundant (but unused) publicIPv4HostNames and publicIPv6HostNames to free unused resources	9 years ago
reger	bd8f7c11f5	Use transparent addToCrawler in AutoSearch instead of addToIndex This would likely also be of advantage for RSS import/schedule as following bug-reports suggest http://mantis.tokeek.de/view.php?id=569 http://mantis.tokeek.de/view.php?id=655	9 years ago
reger	f23d8ab47b	fix 2 more servlet RuntimeException in intranet mode thrown due to seed.getIP() returning null in intranet mode (in servlets: ConfigSearchBox, Load_PHPBB3 +remove unused (const ∅) seed.IPTYPE	9 years ago
reger	bb0076c3dd	fix: assure close inputstream in TranslatorXliff after reading xlf file by using try-with-resources block	9 years ago
reger	6384b7d82e	fix NPE in Load_MediawikiWiki servlet in intranet mode - in intranet mode getip returns null causing a NPE - adjust starturl (which was set to http://localip/repository) which is never the start url for the Mediawiki + correct javadoc for seed.getIP()	9 years ago
Michael Peter Christen	596b5dfa59	add the JRE version in the seed. Purpose: identify if it is possible to migrate to new JRE version	9 years ago
reger	4cc38e979d	add InputStream close after reading input file (Vocabulary_p servlet)	9 years ago
reger	6bf9c55584	adjust Solr select servlet to latest bugfix for boostquery (bq param) to split query into multiple parameter on line separator in input query. e.g. split "crawldepth_i_0^10.0 \n crawldepth_i:1^5.0" but allow "url_file_ext_s:jpg OR url_file_ext_s:png" to be unsplitted	9 years ago
Burkhard	9a18e2297b	Merge pull request #51 from JeremyRand/multiple-boost-query Fix multiple boost queries	9 years ago
reger	f0d7b93372	make use and activate autodetect charset in Vocabulary input from file + revert mistake of empty cn.lng	9 years ago
JeremyRand	433217b33e	Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)	9 years ago
JeremyRand	58824dfa6c	Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx.	9 years ago
reger	9e94989237	upd to PDFBox 2.0.1	9 years ago
reger	d0a571bed2	del cytag trail for own index.html (save resource not used by default)	9 years ago
reger	de46879637	fix SeedDB.get(byte[]) hash string compare (for returning own seed shortcut)	9 years ago
reger	24b0fa2a38	extend snapshot Html2Image.pdf2image to use PDFBox image export capability if no external tool installed (and for Win) Resulting jpg are not always perfect (if graphic included) but imho sufficient.	9 years ago
reger	eb2a00b1d8	fix NPE on missing crawldepth_i	9 years ago
reger	efb9f1a8b7	save resource for unused blacklistFiles map	9 years ago
reger	5f113be760	cleanup connectPeer & yacyVersion.latestRelease usage obsolete since `527b3decde`	9 years ago
reger	7097dcbdbd	cleanup hack for partial Solr update on multivalued datefields has been fixed in Solr http://issues.apache.org/jira/browse/SOLR-8050	9 years ago
reger	f10ea3c155	clean-out unused SwitchboardConstants	9 years ago
reger	ef24593347	delete obsolete SEARCHRESULT busythread constants not used since 29.05.2013 18:27:27 `0c1a018bbd`	9 years ago
reger	125b5e26a5	apply bugfix for ChartPlotter from Pullreq 42 https://github.com/yacy/yacy_search_server/pull/42 thanks to otteresk (https://github.com/otteresk)	9 years ago
reger	06ce9ae711	prevent "unchecked conversion" compiler message + include "translate" property in xlf "trans-unit" export	9 years ago
reger	b4a576dbdf	exclude unused protocol param "duetime" (receiver interprets param "time" only)	9 years ago
reger	3bd6ae8d8b	keep addon/Notepad++ keyword marker on lng export (length of remarks divider line) + harmonize status_p.inc lng text	9 years ago
reger	16837d60c7	fix version in locale version file (it's compared to full version)	9 years ago
reger	0fb01e429e	fix migration, account for ssl port in config (for auto-disable https)	9 years ago
reger	7be1c7a05a	fix logger name	9 years ago
reger	1d940e5a94	upd commons-compress 1.11	9 years ago
reger	7789c32c82	delete crawl queue on init exception (happens occasionally on path name violation and will never get resolved)	9 years ago
reger	f781b9dd47	revert call condition f. migration.installSkins (a bug introduced in `fb8ae14b21` , see comment on that commit )	9 years ago
reger	3adb670f44	remove never used Domains.myHostNames set	9 years ago
reger	6ecc180299	fix rwi doubledom return best (highest) ranking	9 years ago
reger	2343e3f1cd	keep and update existing xlf translation master instead of create new in utility CreateTranslationMasters + small fixes in lng's	9 years ago
reger	a1935f485f	Added utility class CreateTranslationMasters to create a language independent translation master as source to harmonize individual translation files Included a main to create masters in YaCy and xliff format for testing + restrict TranslatorXliff to use only entries with State=translated P.S. used https://open-language-tools.java.net/editor/about-xliff-editor.html to experiment with xlf output (haven't a Pootle avail.)	9 years ago
reger	acaf51b296	keep ConfigLanguage_p as 1st entry in exported translation file + rem untranslated text & some typo fixes in several translations (considering to create a translation master file to harmonize entries)	9 years ago
reger	61c5b6b403	fix empty drop down list in ConfigLanguage after wrong/empty download + add xliff translated attribute + append japanese lng name	9 years ago
reger	4eddabee42	translate Network History screen -> de + remove leftover debug line	9 years ago
reger	90c79014ae	remove unused translator routine which also doesn't handle rel path input + correct some language file match issues	9 years ago
reger	902e79e261	Introduce a TranslatorXliff which can read/write xliff from/to internal translation map. This eases up suggested initiatives from http://mantis.tokeek.de/view.php?id=649 Allows longer term also to store translation maps for the htroot files in standardized/reusable xliff format ( http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html ). + added test case creating and comparing xliff file with internal custom prop file. (currently the introduced class is not used in core code)	9 years ago
reger	d9adc2c255	load handler for Transparent Proxy on startup only if feature is activated to save the resources and keep handler chain small if the feature is not used. +add a warning message on settingsack_p page to restart on first activation	9 years ago
reger	ec24a0c85a	add test case for optimized toTokens()	9 years ago
reger	cada24f918	adjust utility ListNonTranslatedFiles for path compare on windows (backslash replace)	9 years ago
reger	fb8ae14b21	make migration version safe	9 years ago
reger	258cd41577	reduce logging (EmbeddedSolrConnector.query) mainly to reduce the frequent metadata checks like > EmbeddedSolrConnector.query QUERY: q={!cache=false raw f=id}xXxXxX&rows=1&start=0&fl=id,load_date_dt (p.s. direct servlet queries logged via AccessTracker.addToDump)	9 years ago
reger	6783ef5540	move example code SearchClient out of yacycore package to example directory	9 years ago
Michael Peter Christen	b89465d952	0N - basic dump upload servlet infrastructure, to share index dumps within an experimental new sharing model	9 years ago
Michael Peter Christen	f12a900f3e	harmonization of http post of files for one and several files - this had been differently - and wrong for several files. also: base64-encoding for gzipped push files because our data structures currently only supports ASCII POST pushes..	9 years ago
Michael Peter Christen	849ab671a9	0n: modified the p2p bootstrapping process - rules had been too tight and did not support the re-start of a network with just one principal peer.	9 years ago
reger	764f5100f0	fix delete of temp file after odt & ooxml parser Close zipfile after parsing	9 years ago
reger	379e9b330d	use supplied url port to get robots.txt in crawlers hostqueue	9 years ago
reger	58a959403d	fix mixed logfactory in UrlProxyServlet, Class doesn't use functions of declared ancestor, change to extend on httpservlet	9 years ago

3783 Commits (4eeb448eb3d0b0fda80375aae866a5a6c914e30f)