yacy_search_server

Commit Graph

Author	SHA1	Message	Date
reger	160ce568b3	move testing SolrServlet.main to test, making include of jetty.jar in distribution and classpath obsolete - move jetty.jar to test library - move SolrServlet.main as is to test, add also a junit test simulating main - add build.xml cleanup for EmbeddedSolrConnectorTest created test/DATA - adjust some test compile errors	12 years ago
orbiter	d2ea250d99	refactoring: - moved many classes from de.anomic to net.yacy - made more sub-packages for search classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	49e5ca579f	added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is default now), the all embedded image, audio and video links from all parsed documents are added to the search index as individual document. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542	13 years ago
orbiter	cb1f49d0f2	replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	cd19d0517e	added dns resolve to HTTPClient POST using a dns cache to prevent that that not-thread-safe built-in dns cache inside apache http client is used git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7513 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
f1ori	01cb3bbaec	* fix patchCharsetEncoding-test (patchCharsetEncoding now returns null on input null) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7465 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
f1ori	fd74bc388c	* fix small bug in sessionid-removal * add testcase for seesionid-removal git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7333 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	3197ca42ed	preparations to move the HTCache into cora: - move the header framework classes to cora - move the ARC caching classes to cora - refactoring of code to call these classes from cora git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	844f158686	- removed dependencies in header framework: moved http date methods from DateFormatter to HeaderFramework changed logging to log4j - added ftp load access to MultiProtocolURI - ensured termination of RSS feed iteration git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7067 6c8d7289-2bf4-0310-a012-ef5d649a1542	14 years ago
orbiter	b6fb239e74	redesign of parser interface: some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. Using this parser infrastructure it is not possible to parse document containers that contain individual files. An example is a rss file where the rss messages can be treated as individual documents with their own url reference. Another example is a surrogate file which was treated with a special operation outside of the parser infrastructure. This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface has now only one entry point and returns always a set of parsed documents. In case of single documents the parser method returns a set of one documents. To be compliant with the new interface, the zip and tar parser had been also completely redesigned. All parsers are now much more simple and cleaner in its structure. The switchboard operations had been extended to operate with sets of parsed files, not single parsed files. additionally, parsing of jar manifest files had been added. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	11639aef35	- added new protocol loader for 'file'-type URLs - it is now possible to crawl the local file system with an intranet peer - redesign of URL handling - refactoring: created LGPLed package cora: 'content retrieval api' which may be used externally by other applications without yacy core elements because it has no dependencies to other parts of yacy git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	b68deb407a	- moved test data from /bin to /test/words - refactoring of stopYACY.sh by introduction of /bin/apicall which is able to call any api file with attached authorization git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6691 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	3528b970d6	- refactoring - added new experimental (not-yet-working) image parser - added new test image git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	b79f4f062f	refactoring of yacy documents and parsers: they depend now only on the kelondro classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
f1ori	34c71b22e8	fix and enable parser unit tests (tested with eclipse) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6419 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	ce8dc575ca	refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6398 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	bea3b99aff	moved table and util classes git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6397 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	ce7924d712	better concurrency for rwi entry parsing during search processing git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6273 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	72ac5bd80f	refactoring of search process. this is the beginning of some architecture changes that will hopefully bring some more stability, speed and transparency to the search process. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6260 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
f1ori	d515bc11e2	added ooxmlparser git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6256 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
f1ori	8c1b02af04	* fix warning in testcase git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6255 6c8d7289-2bf4-0310-a012-ef5d649a1542	15 years ago
orbiter	65b1d51e70	added xml version of windows office test files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6244 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	67da20647f	* add new odf parser based on sax-xml-parser * remove odf_utils-jar * test metadata in ParserTest git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6231 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	06557485f5	* added parser unittest! git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6229 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
f1ori	69dfd03985	reactivate unittests * fix old tests * add buildtarget "ant test" git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6228 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	d553e4ff39	added visio test files and mime types git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6165 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
lotus	bb570716e6	added more testfiles git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5347 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	84185baa81	added more test files for windows from lulabad git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5340 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	3246358485	mistake -> rename git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5336 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	55ec57d27f	added linux umlute test files from low012 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5335 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	e9262b3890	re-named old test files added more mac test files git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5333 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	ff2a54da68	added more umlaute test files: mac git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5332 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	204220ecd5	added test files for UTF-8 / Umlaute - Testing: These 3 files contain the same text in different HTML encodings. We use this documents to test if the parser and indexer creates the same set of word hashes for all three texts. To use these files, run a indexing/crawling on them. To get the files inside the localhost-path, do the following: cd <yacy-home> rmdir DATA/HTDOCS/repository ln -s test/parsertest DATA/HTDOCS/repository you have then linked the test directory as repository directory which you can reach in yacy if you switch to intranet indexing mode. So the next step is to start yacy, then - switch to intranet use case - go to the crawl start page - the repository directory should be the default path as crawl start - start the crawl - search for any word that appears in the demo texts - search not only for words with umlautel but also for words without umlaute to ensure that you find _all_ three documents - see how yacy presents the snippet with the text containing umlaute git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5293 6c8d7289-2bf4-0310-a012-ef5d649a1542	16 years ago
orbiter	daf0f74361	joined anomic.net.URL, plasmaURL and url hash computation: search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542	17 years ago
orbiter	40b0547611	- documentaton changes (removed old forum links) - different handling of link quotation - different handling of link normalization - enhanced html/unicode en/de-coding git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	2399ed817c	) robots.txt parser now extracts the sitemap-URL (will be used later) ) some javadoc added *) junit testclass for robots.txt parser added git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3602 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	1b7fda12ee	*) SOAP: separate function to get the active/passive/potential peer list git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3526 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
auron_x	d451ad48d3	*) improved peerloadgraphic: - unnecessary (0 %) pieces are removed - percent-values of each thread displayed in legend git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3474 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
karlchenofhell	a1d68fe092	- use .class rather than Class.forName for classes in class-path - added Bost's patch for Diff.findDiagonale() from: http://www.yacy-forum.de//files/patch_685.txt - fixed minor bugs in Blog git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3416 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
orbiter	d25caa07bf	redesigned some parts of http authentication added another access check for peer hops git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3340 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	eb20ec3837	*) soap-service: adding function to check if a specific url is blacklisted git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3014 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	5c0669429e	*) soap: adding function to query the peer list git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2968 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	203f2bde9a	*) adding function to query the pause/resume state of the crawling queues git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2958 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	6d3a130878	*) bugfix needed because of db refactoring git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2957 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	892b9f2fc4	*) additional soap function to query peer status git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2920 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	bd3710a974	) new xml template to view peer profile as xml ) bugfix for wrong profile display (some fields where displayed twice) *) new soap functions to get and set peer profile git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2919 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	d1afe1ce6b	) adding xml template to get the message list as xml ) Bugfix in client stub jar generation (too many files where added) *) new soap service to manage peer messages git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2918 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	f37e2041e8	) adding soap function to import yacy bookmarks from xml or html (transfered via soap attachments) ) soapHandler: code cleanup for service deployment git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2915 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	4a3ec63e34	*) new soap service to manage yacy bookmarks git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2906 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago
theli	5e57e0814d	*) new soap function to display log git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2902 6c8d7289-2bf4-0310-a012-ef5d649a1542	18 years ago

1 2

62 Commits (19c46e4acf282c9c3cc32adf1a63a754cabc7308)