yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	105cf8f593	changes to adjust jetty to recent code changes	11 years ago
reger	aafef72a8a	merged current rc1/master into jetty branch to allow further development with latest version ServerSideIncludes and servlet return values need further work (for working jetty integration) - TODO: added nasty quickfix to allow SSI - needs further work - TODO: YaCy servlet return values/parameters are not handled	11 years ago
Michael Peter Christen	5b7c0d0745	update to pdfbox 1.8.2	11 years ago
Michael Peter Christen	f13df9dbb6	migration to solr 4.4.0	11 years ago
Michael Peter Christen	dc1002e511	cleaned sourcepaths from eclipse classpath	11 years ago
Michael Peter Christen	c4538d8d91	added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib	12 years ago
Michael Peter Christen	9bd2aee180	migrated to solr 4.3.0	12 years ago
Michael Peter Christen	ad050ec88d	- upgraded httpclient, httpcore and httpmime - removed httpclient 3.1 which has been used by solrj < 4.x.x and is now not used any more - fixed some parts in YaCy which used methods from httpclient 3.1	12 years ago
Michael Peter Christen	4b100f8b48	Merge branch 'master' of ssh://gitorious.org/yacy/rc1	12 years ago
Michael Peter Christen	3abf516ca7	merged classpath Please enter a commit message for your changes. Lines,	12 years ago
orbiter	48e9a54e80	updated pdf parser	12 years ago
Michael Peter Christen	27907c9739	added missing library after solr upgrade	12 years ago
Michael Peter Christen	cf0acd2cb4	upgrade to solr 4.2.1	12 years ago
Michael Peter Christen	461d46101d	- Removed log4j from libraries. This can be removed because the package log4j-over-slf4j is there. From slf4j all loggings are routed to the jdk logger. Now all loggings are consistently done to the jdk logger. - added some lines to the logging properties to suppress many solr logging statements. The number of the logging entries had already become a performance issue, therefore removing these from the log should increase performance.	12 years ago
Michael Peter Christen	b349c8145b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
orbiter	36f9b0fc16	updated wstx-asl to 3.2.9	12 years ago
Michael Peter Christen	788288eb9e	added the generation of 50 (!!) new solr fields in the core 'webgraph'. The default schema uses only some of them and the resting search index has now the following properties: - webgraph size will have about 40 times as much entries as default index - the complete index size will increase and may be about the double size of current amount As testing showed, not much indexing performance is lost. The default index will be smaller (moved fields out of it); thus searching can be faster. The new index will cause that some old parts in YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to: - search within link texts of linked but not indexed documents (about 20 times of document index in size!!) - get a very detailed link graph - enhance ranking using a complete link graph To get the full access to the new index, the API to solr has now two access points: one with attribute core=collection1 for the default search index and core=webgraph to the new webgraph search index. This is also available for p2p operation but client access is not yet implemented.	12 years ago
Michael Peter Christen	09a2b09c48	guava update	12 years ago
Michael Peter Christen	80fe3d7860	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java	12 years ago
Michael Peter Christen	4323621a76	update to Solr 4.1.0	12 years ago
sixcooler	639c114199	remove jetty from classpath - as it was moved last commit	12 years ago
sixcooler	f3e705c4fe	bump to httpclient / httpcore 4.2.3 (bugfix-release)	12 years ago
Michael Peter Christen	9dfc9c95d8	updated slf4j and log4j	12 years ago
Michael Peter Christen	95712fdc8b	update to pdf parser	12 years ago
Michael Peter Christen	a1a4d9aa94	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: source/net/yacy/cora/federate/solr/connector/MirrorSolrConnector.java	12 years ago
Michael Peter Christen	e2c4c3c7d3	migration to solr 4.0.0	12 years ago
Michael Peter Christen	69aa39d664	update to libraries required by solr 4.0.0	12 years ago
sixcooler	9d062873d2	bump to httpclient-4.2.2	12 years ago
sof	5cb244b79b	Merge remote branch 'origin/master'	12 years ago
apfelmaennchen	88b062210c	Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based on the jaudiotagger library. The parser is disabled by default as it needs to store temporary files for non file:// protocols, which might be disliked. For your local MP3-collection it loads nicely Artist, Title, Album etc. from the audio files meta data.	12 years ago
sixcooler	9aa21506be	bump to httpcore-4.2.2 (maintenance release)	12 years ago
Michael Peter Christen	d0015df61c	added lucene memory library which is now necessary as solr has to process more complex queries	12 years ago
Michael Peter Christen	e65cecc419	- updated lucene libraries to 3.6.1 - added lucene-grouping which enables faceted search; try this: http://localhost:8090/solr/select?q=:&start=0&rows=3&facet=true&facet.field=host_s	12 years ago
Michael Peter Christen	ff3eaa21b0	added remote search to solr on YaCy peers! - when doing a remote search, node peers are selected for solr queries - the solr query is done concurrently to the standard YaCy rwi search - the solr search result is fed into the same data structure that prepares the rwi search result - the same remote search that is done to several outside peers is done to the local solr index - the search process works now also without any 'old' RWI data using solr	12 years ago
Michael Peter Christen	d39463a85c	added deleteByQuery to solr connectors	12 years ago
Michael Peter Christen	2ccf1dba71	upgrade to solr 3.6.1	12 years ago
Michael Peter Christen	ea49a8aa8c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	12 years ago
Michael Peter Christen	d988ba50cf	added a very rudimentary, incomplete, non-verified GSA response writer for solr. Try this: http://localhost:8090/gsa/searchresult?q=pdf&site=col1&num=10	12 years ago
cominch	e74d66e28c	augmented browsing: remove htmlparser library	12 years ago
cominch	e2119f4e76	augmented browsing: replace htmlparser by jsoup, which is more stable and reliable	12 years ago
Michael Peter Christen	bf4968d748	source change in classpath	13 years ago
sixcooler	a99ef68422	bump to httpclient-4.2.1	13 years ago
Michael Peter Christen	65f56b1fd4	Merge branch 'master' of ssh://gitorious.org/yacy/rc1 into jetty Conflicts: .classpath build.xml htroot/Status.java source/de/anomic/http/server/HTTPDProxyHandler.java source/net/yacy/yacy.java	13 years ago
Michael Peter Christen	7b53be141f	upgraded to pdfbox 1.7.0 changes in http://www.apache.org/dist/pdfbox/1.7.0/RELEASE-NOTES.txt with many bugfixes, including performance related	13 years ago
Michael Peter Christen	fad3b14813	added jetty libraries, needed for future use as web server and as application server for the solr search interface	13 years ago
Michael Peter Christen	b9d42fd9c8	using com.google.common.io.Files instead of homebrew methods	13 years ago
Michael Peter Christen	1be0025a9c	- added test for EmbeddedSolrConnector - added needed libraries for this test this includes most (all) files needed for an embedded solr	13 years ago
Michael Peter Christen	90b82ce994	using guava for host resolution (non-blocking for ips) and time-out	13 years ago
Michael Peter Christen	3f55dc7c1e	- added solr core and libraries that solr needs (lucene is missing, will follow later) - added embedded solr connector which can connect to solr programmatically (without using a server in between)	13 years ago
Michael Peter Christen	5fc6524ca8	- moved triple store to net.yacy.cora.lod (should be generalized there later - added abstract add, delete, get methods in the triplestore - added generation of triples after auto-annotation - migrated all MultiProtocolURI objects to DigestURI in the parser since the url hash is needed as subject value in the triples in the triple store	13 years ago
cominch	5d20cd324a	Add Triplestore and RDF query interface Conflicts: build.xml defaults/yacy.init source/net/yacy/interaction/AugmentHtmlStream.java	13 years ago
cominch	b21048892b	augmentedParser add features and integrate external html parser to modify existing web pages Conflicts: addon/YaCy.app/Contents/Info.plist build.xml	13 years ago
sixcooler	56087c1f23	bump to httpclient- httpcore-, httpmime- 4.2	13 years ago
Michael Peter Christen	4d3cc02168	replaced old bzip2 library against better documented commons-compress package from http://commons.apache.org/compress/	13 years ago
Michael Peter Christen	1795a7325b	made HandleSet serializable	13 years ago
Michael Peter Christen	62f2554a01	- fixed build problems (deprecated methods using httpclient 3.1) - removed httpclient 3.1 lib which was used by solrj (solrj now uses httpclient 4)	13 years ago
Michael Peter Christen	248299d10f	updated solrj lib	13 years ago
Michael Peter Christen	f838997126	updated commons io from 2.0.1 to 2.1	13 years ago
Michael Peter Christen	eeb57ae824	updated http client libraries	13 years ago
Michael Peter Christen	ef5192f8c9	using the generic document parser for crawl starts instead of the html parser. This makes it possible that every type of document can be a crawl start point, not only text documents or html documents. Tested this with a pdf document.	13 years ago
Michael Peter Christen	a30b028cc0	updated libraries	13 years ago
Michael Christen	e69afae87e	class path for servlets in eclipse	13 years ago
Al Sutton	8993cac4d8	Initial performance improvements	13 years ago
orbiter	5a7cec59f3	moved ynetSearch to get all files out of htroot/api/util/ git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8042 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	65ab067491	migration to solrj 3.4.0 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7952 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	52b477cf6f	bump to httpclient-4.1.2, httpcore-4.1.3 - bugfixrelease git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7876 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
sixcooler	48560a44a9	bump to httpcore-4.1.2: a bugfixrelease git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7853 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	c0d9474b31	update to eclipse class path environment git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7834 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	528b59e078	replaced xerces.jar library that was originally added 2005 with SVN 126 to the libx directory and that was moved to lib in SVN 5781 the new replacement is taken from http://xerces.apache.org and has the version 2.11.0 and was inside the file Xerces-J-bin.2.11.0.tar.gz and consists of two files named xercesImpl.jar and xml-apis.jar The original purpose of that library was to support: - content parsers - optional seed uploader - SOAP API (which will be committed later) Since the SOAP API does not exist any more the purpose is to support content parser and an optional seed uploader git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7819 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	77fe69395d	added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7774 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
sixcooler	efcd21e0ed	new httpclient, httpcore (bugfixrelease) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7769 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	761b1c71dc	added latest pdfbox git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7761 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
sixcooler	0abd99621c	correct slip of click in classpath from last commit - I wonder there are 7658'is around apflemaenchen, please don't take this amiss git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7659 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
apfelmaennchen	a0e4960a4d	YMark: - first attempt for a firefox json bookmark importer - added JSON library json-simple-1.1.jar git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7658 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	19fd13d3bc	Added federated index storage to solr. YaCy supports now the storage to remote solr indexes. More federated storage (and search) methods may follow. The remote index scheme is the same as produced by the SolrCell; see http://wiki.apache.org/solr/ExtractingRequestHandler Because this default scheme is used, the default example scheme can be used as solr configuration This is also the same scheme that solr uses if documents are imported with apache tika. federated solr storage is switched off by default. To use this, do the following: - set federated.service.solr.indexing.enabled = true - download solr from http://www.apache.org/dyn/closer.cgi/lucene/solr/ - extract the solr (3.1) package, 'cd example' and start solr with 'java -jar start.jar' - start yacy and then start a crawler. The crawler will fill both, YaCy and solr indexes. - to check whats in solr after indexing, open http://localhost:8983/solr/admin/ Until now it is not possible to use the solr index to search with YaCy in that solr index. This functionality is now available for two reasons: 1) to compare the functionality of Solr and YaCy and to compare the search speed 2) to use YaCy as a search appliance for people who need a crawler or other source harvesting methods that YaCy provides (like dublin core reading, wikimedia dump reading, rss feed reader etc) if people still want to use solr instead of YaCy. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7654 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
Florian Richter	351d264a48	* yacy domain handler for jetty * rewrite from / to /index.html	14 years ago
Florian Richter	68ca0fbb2e	* add copyright info * implement basic authentication * update jetty to 7.3.0	14 years ago
sixcooler	9199b9e3c6	also putting jcifs-1.3.15 into classpath (let me build YaCy again :-) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7588 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
Florian Richter	1989ba64c0	* jetty	14 years ago
sixcooler	45dcfa3460	update to httpclient-4.1 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7473 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	ca738ac924	- added a tag cloud to search results (using the topics) - some refactoring of score classes - added default package for new classes add_ymark and delete_ymark git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7251 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
sixcooler	f4357dff03	bump to httpclient-4.0.3 which fixes a number of bugs git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7197 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
f1ori	e670e1ef8e	add charset auto-detection for htmlParser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7186 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3552476fbe	terminated migration from apache httpclient-3.1 to 4.1: - remove the library - added two classes from the httpclient-3.1 library as source code to YaCy because these classes were used by the YaCy HTTP Server - modified the added classes ChunkedInputStream and ContentLengthInputStream in such a way that: * there are no more dependencies to httpclient-3.1 * these classes had been simplified to serve only the purpose for the YaCy httpd git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7171 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	f9a27a05e5	migrated to log4j 1.2.16 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7153 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	5c67e6ca49	migrated to latest apache commons fileupload 1.2.2 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7152 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	5fe828fa06	- replaced pdfbox and fontbox version 1.1.0 with 1.2.1 - added some clear statements that shall clear static cache size within the pdfbox library - the pdfbox library contains a memory leak; it is unsafe to run a peer with pdf parser permanently on. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7120 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
lotus	965aa97993	including sbbi upnplib as source again http://www.sbbi.net/site/upnp/index.html renamed package to yacy all options are also named "yacy" instead of "sbbi" git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6986 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
sixcooler	c5c67f0504	start migrating to HttpComponents-Client-4.x see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2872 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6965 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	b5e190099d	- updated pdfbox and fontbox to 1.1.0 - added license file to sbbi-upnplib git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6946 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	37b8827a7a	- removed the UPnP library sources from sbbi and added the jar library again. The library was included to get support for fedora releases, but after this time the fact that the sbbi cannot be part of fedora should be re-discussed. If this will still not be possible, then we may integrate the sbbi UPnP package using reflection. - cleaned up the code. The new eclipse helios provided new warnings for dead code. This change cleans up most of these warnings git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6945 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	56ff9d5fd4	- extended news size from 512 to 1024 characters - a new news db will be created (news1024.db), the old one (news.db) can be deleted - peers with too large news payload are not ignored any more (they may have been invisible because they had a too large news payload!) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6917 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	fc5efcc05a	enhanced and fixed OAI-PMH import - now importing OAI-PMH server list from two sources - simultaneous import from several servers (even > 2000) - check buttons on OAI-PMH server list to select multiple servers for import start - it is possible to select all servers at once for import - imported XML data is gzipped after import from surrogate reader git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6847 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	24e5faee75	added exif parsing for jpg images git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6745 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	1bbe14d23f	SVN 6716 unfortunately contained parts of the unfinished SMB integration. To fix compile errors the remaining parts of the SMB implementation stub is added with this commit. This adds the jcifs smb library. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6717 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	f5ec7ad077	replaced four old libraries with latest version git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6702 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	1e2c011c98	updated the jsch lib from 0.1.21 to 0.1.42 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6688 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	c2b505ae87	updated bouncy castle libraries git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6687 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	681f4d185f	replaced microsoft office document parser POI 3.5 with latest version 3.6 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6686 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	e9cdddcd0f	updated parser libraries fontbox and pdfbox with latest version of jar files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6685 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago


228 Commits (ea633a794cb25fb03030dd4f535c59680ec5e2ac)