Commit Graph

3030 Commits (aa0faeabc55d202c3019c2cd730a0e6031e4ada4)

Author SHA1 Message Date
Michael Peter Christen 0838326a76 changed error message, see http://mantis.tokeek.de/view.php?id=439
10 years ago
reger b5e0f70197 - remove repositoryPath post from ConfigBasic (obsolete)
10 years ago
reger 8931e14514 fix NPE in image search
10 years ago
Michael Peter Christen 1735dbc9d9 enhanced image search: bugfixes and performance enhancements
10 years ago
Michael Peter Christen ebd0be2cea fixes and speed updates for search process
10 years ago
Michael Peter Christen 7611bf79bd Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1
10 years ago
Michael Peter Christen 524bedc00a fixed text in startup tray icon and added shutdown icon during shutdown
10 years ago
Michael Peter Christen 4709d8417c npe fix for non-tray users
10 years ago
orbiter 5b5635e187 replaced font for boot tray icon with image and added some more images
10 years ago
orbiter aa6cdc4ab5 speed-up of start process if remote DNS waits for timeout
10 years ago
orbiter 40b3977c21 added an animation of the tray icon during the boot phase of YaCy.
10 years ago
Michael Peter Christen ec6082c872 very bad language detection hack fix hack
10 years ago
Michael Peter Christen 39615de3f9 adding the buffer size is not wrong but may cause confusing information
10 years ago
Michael Peter Christen 395edec6f1 changed strategy to count the number of documents: get the max of
10 years ago
Michael Peter Christen e87dc08c0d set the correct fail time in error docs
10 years ago
Michael Peter Christen cfb20bc0ce removing the [] for ipv6 addresses may be a bad idea..
10 years ago
orbiter b6d57f06eb enhanced the apk parser (up to being production-ready).
10 years ago
Michael Peter Christen a7dd89c4de changed method to write the citation index: do not catch up references
10 years ago
Michael Peter Christen 57ce7eeff3 fixed localhost authorization and replaced the adminRealm with an info
10 years ago
orbiter f318d7c285 enhanced date-ordered ranking
10 years ago
reger a6891ff7f8 fix Querygoal.parse exception on +/-null-term
10 years ago
reger c7335318eb remove unused legacy procedure from httpserver
10 years ago
Michael Peter Christen eab0d3e1a9 bugfix for wrong lock display, see
10 years ago
orbiter 49d4f95faf bugfix to latest commit
10 years ago
orbiter 68211f8244 enable Crawler_p servlet if a rss feed or a wiki dump import was
10 years ago
orbiter a65df4ce7e do not push noindex errors into log if in intranet mode. noindex
10 years ago
orbiter 688c6d8954 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
10 years ago
orbiter 4ae7aead28 addon to latest fix
10 years ago
Marc Nause 2af56fa37d Improved UPnP. (still not perfect)
10 years ago
orbiter b3ebd38079 removed the HTDOCS repository concept because the concept to host files
10 years ago
reger 1fdcc2d67b change seedfile upload ip check to allow intranet ip in intranet mode
10 years ago
reger e31b0e6d67 - update javadoc Seed.getIP
10 years ago
reger 350c6b8250 in IntranetMode allow intranet hosted seedlist with Network_Domain "any"
10 years ago
orbiter d68438c3d9 make sure that the postprocessing background thread never dies by any
10 years ago
orbiter b4f2a1db6e added an unlock icon for all protected pages that are unlocked because
10 years ago
reger ea6c9e9b07 reduce mem buffer overhead for gap files during r/w
10 years ago
reger e88537522d allow single quote " ' " in query
10 years ago
orbiter 487021fb0a snippet computation update
10 years ago
orbiter 1c2f1f233a Merge branch 'master' of git@gitorious.org:yacy/rc1.git
10 years ago
reger 5a4995ded3 fill solr rss writer dc:subject tag with keyword content
10 years ago
orbiter 927aaa95a6 concurrency bugfix
10 years ago
orbiter c9e593cf78 removed warnings
10 years ago
reger 7584352e7b use more predefined Solr query parameter constants
10 years ago
reger f9db5dd6c5 reduce doublecontent check document (prevent out of memory)
10 years ago
reger e9eae45b55 simplify rssreader and improve atom feed link extraction
10 years ago
reger a8508417d1 catch NPE during crawl (OAI import)
10 years ago
reger 3dde94422f center searchevent lines on network graph
10 years ago
Michael Peter Christen 3860711aef fix for possible interruption of concurrent queries
10 years ago
Michael Peter Christen 6344718f8b reducing the concurrent query stack size and reduced concurrency of
10 years ago
Michael Peter Christen eca9380e3d bugfix for crawler double-check: if an url is redirected, the
10 years ago
Michael Peter Christen 9ac0c93f17 fix for subpath crawl filter
10 years ago
Michael Peter Christen 66106bdaf0 fix for crawler attribute maxdompages
10 years ago
Michael Peter Christen 49d91b94c3 npe fix in crawler
10 years ago
Michael Peter Christen b7183a7321 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
reger ea2e627662 fix ConfigAccounts del user with uppercase letter in name
10 years ago
Michael Peter Christen c465b791af typo
10 years ago
Michael Peter Christen 191ec8c82a added concurrency to postprocess rewrite process
10 years ago
Michael Peter Christen a1e8bdd5e9 log ppm instead of docs/second
10 years ago
Michael Peter Christen cc0ded7abd set process type of web graph according to fields as defined in the
10 years ago
Michael Peter Christen 12fb9d7cd1 log postprocessing constraints in case that postprocessing is not
10 years ago
Michael Peter Christen 3c23b89823 less logging
10 years ago
Michael Peter Christen a0c53174c5 better solr query logging to detect unnecessary sort requests for more
10 years ago
Michael Peter Christen 338f574bdc no sorting if http/www unique fields are not demanded (makes query
10 years ago
Michael Peter Christen 1609763be5 toString fix
10 years ago
Michael Peter Christen b983e68254 more retries, less sleep
10 years ago
Michael Peter Christen 1503ba7794 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
reger 8f77719091 fix "Ljava.lang.String" in crawl queue anchor name
10 years ago
Michael Peter Christen 0ceeceb35e more logic on Solr queries; usage of the query terms in postprocessing,
10 years ago
orbiter 38864ae004 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
10 years ago
orbiter 4099296b45 added new classes which shall reduce call overhead to Solr (stub)
10 years ago
reger d0c02e1de7 adjust rss lat/lon to double
10 years ago
orbiter 3491ab4c38 removed unused images from webgraph edge computation
10 years ago
orbiter 2371d6b8db target linktexts must be string to enable search facets on these fields
10 years ago
Michael Peter Christen 001e05bb80 do not store failure of loading of robots.txt into the index as a fail
10 years ago
Michael Peter Christen 05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser.
10 years ago
orbiter 22ce4fb4dd better error handling for remote solr queries and exists-checks
10 years ago
Marc Nause 9df14fc126 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Marc Nause 477be17c51 Replaced old UPNP library with Weupnp. UPNP should
10 years ago
orbiter 738989aab7 reverted commit f94c91315b because the
10 years ago
orbiter e9163e7e10 fix for malformed hostpath names in crawl balancer
10 years ago
Michael Peter Christen c115f3869c enhanced snippet computation and test method in ViewFile
10 years ago
reger 6c10b59f3e move bootstrap peers test systems to its test class
10 years ago
orbiter 1027f3d04a fix for the usage of ready-prepared solr queries, some queries are
10 years ago
Michael Peter Christen f94c91315b if the webgraph is used, then use it also for reference computation to
10 years ago
Michael Peter Christen 6e1dc444c3 added a snippet test function in ViewFile: you can now search for a
10 years ago
orbiter 4b06adb751 fix for file urls
10 years ago
orbiter 08409ec680 no idea why the words max was an ordered one. This change increases speed
10 years ago
reger e5854a5cdb fix localhost link to opensearchdescription.xml
10 years ago
Michael Peter Christen b44626e55b fixed target_alt_t in webgraph
10 years ago
Michael Peter Christen 504327b15c fix for condition for writing the webgraph
10 years ago
Michael Peter Christen 542c20a597 changed handling of crawl profile field crawlingIfOlder: this should be
10 years ago
Michael Peter Christen 4eec1a7452 refactoring (change Metadata name of load time data structure to avoid
10 years ago
reger c95ba52cf0 improve logexception info
10 years ago
orbiter e441831a24 reverted toString() change in AnchorURL to prevent mistakenly used
10 years ago
reger 47f201a6b8 Add Solr default query fields (&qf) to select servlet
10 years ago
reger f96cfdc84d prevent array out of bound exception on getRankingProfile(x)
10 years ago
reger 5f5fb4ecdc remove unused static (RSS)search from protocol
10 years ago
reger 7c1706d83a use CRLF in generated bat command scripts for windows
10 years ago
reger a2cb366b25 Combine /heuristic search modifier with opensearch configured targets
10 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
10 years ago
Michael Peter Christen bf1b6b93e7 do not write CR values to webgraph if no CR values are computed
10 years ago
Michael Peter Christen e039e78210 small bugfixes
10 years ago
Michael Peter Christen 32a2ff925c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen d07cdd8c3b added SolrCloud access mode and configuration
10 years ago
Michael Peter Christen 8514bffc22 enhanced postprocessing status report
10 years ago
reger b24572f304 fix GSA filter query assignment
10 years ago
Michael Peter Christen b5fc2b63ea removed exist() retrieval functions from error cache and replaced it
11 years ago
Michael Peter Christen 62c72360ee cleanup of checkAcceptanceInitially in CrawlStacker, should avoid
11 years ago
Michael Peter Christen dd5cdfe212 reverted filter query hack, it did not work
11 years ago
Michael Peter Christen b5d78ba156 reduced number of solr queries during crawling
11 years ago
Michael Peter Christen 5326970d6c enhanced solr queries for single document extraction
11 years ago
Michael Peter Christen 525575bd97 added debugging of filter queries in thread dump thread names
11 years ago
Michael Peter Christen f319ef268f testing filter queries instead of queries to retrieve documents by id
11 years ago
Michael Peter Christen fd87fa1613 removed more unnecessary exist-checks in ErrorCache
11 years ago
Michael Peter Christen f2b476e08b don't do a double check to solr for failed documents if they are not
11 years ago
Michael Peter Christen 06ab72d1af enhanced crawler host round-robin strategy
11 years ago
orbiter dab9a0786a Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 51bf5c85b0 Renamed the transmission cloud to buffer in dispatcher since the name
11 years ago
Michael Peter Christen a694b6a8fc another fix for unique field computation
11 years ago
Michael Peter Christen fb3dd56b02 fix for processing of noindex flag in http header
11 years ago
Michael Peter Christen b0d941626f fixed bugs in canonical, robots and title/description unique calculation
11 years ago
reger d9472d043a cleanup older unused classes
11 years ago
reger 665e12f88e move startup time from old serverCore to switchboard (most used here)
11 years ago
reger 336425912a remove unused localSearchThread from SearchEvent
11 years ago
reger 32bd2a61c1 add local ip to AbstractRemoteHandler local hostname cache
11 years ago
Michael Peter Christen f3a6b6e21e fix for bad URL decoding
11 years ago
Michael Peter Christen 1092e798a5 fixed double content postprocessing
11 years ago
Michael Peter Christen aee5b108e5 added linkScraperParser, a parser which ignores the text like the
11 years ago
reger 2b8cc5832c fix seek error for 0 file size records file
11 years ago
reger 2ba394333f fix Crawler HostQueue release of stackfile
11 years ago
reger 40133ba2d0 fix NPE in Condenser,
11 years ago
orbiter 59160984cc timeline performance update
11 years ago
orbiter 54bea96e67 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
Michael Peter Christen 841cc77391 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen e09218129c remove check for local solr. This check was made during a time when Solr
11 years ago
orbiter 2073e69034 fix for long periods in timeline
11 years ago
reger 1f94df29e7 fix NPE in solr rss where snippet contains only the title text
11 years ago
Michael Peter Christen 09dcdb9b19 update to solr 4.9.0
11 years ago
Michael Peter Christen 1cd4b2e8be Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 8c52f0651b refactoring of AccessTracker events & timeline fix
11 years ago
reger 431a5f9c4e added test case for TextSnippet,
11 years ago
Michael Peter Christen 5b94a257ce no timeout for large reference collections
11 years ago
Michael Peter Christen f5b817bac4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
reger cb2c17d236 extract author and keywords in .doc and .ppt parser
11 years ago
reger a5707cd2eb enable proper Author navigator
11 years ago
Michael Peter Christen 74206a10c7 refactoring
11 years ago
orbiter fec673c9d1 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 4a66af716d added apkParser stub (work in progress)
11 years ago
orbiter c59da9fe7a added access tracker log reader stub
11 years ago
reger 2d67f29244 adjust mergeDocument after parsing to
11 years ago
Michael Peter Christen 0d29b972cc Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 36e623d8bf enhanced metadata enrichment for media file type search:
11 years ago
Michael Peter Christen 49886fab08 enhanced debugging
11 years ago
Michael Peter Christen b893c42a0f bugfix for image search
11 years ago
Michael Peter Christen c7995d3e2a increased fixed limit for http POST request sizes to 100MB
11 years ago
reger 7847a93558 fix AbstractParser.singleList not adding null strings
11 years ago
Michael Peter Christen 8acae852a0 write <em>-tagged texts also into the bold_txt field
11 years ago
reger 90c4576361 add a link to recrawl index entry to metadata html page
11 years ago
Michael Peter Christen 2626c8f6db using concurrency to do base64 encoding in file POST commands
11 years ago
Michael Peter Christen e132689818 fixed and enhanced Base64 (en)coder (again)
11 years ago
Michael Peter Christen 2415e3db43 enhanced ASCII byte[] -> String conversion
11 years ago
Michael Peter Christen 4751ed974f enhanced base64 encoding
11 years ago
Michael Peter Christen e949071160 removed superfluous date method
11 years ago
Michael Peter Christen 501d55cd35 removed superfluous assert
11 years ago
orbiter 0bbb5040b8 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 9d5d86cd03 Added filter query options to the ranking servlet /RankingSolr_p.html.
11 years ago
Michael Peter Christen d2151857f1 Added collection navigation:
11 years ago
Michael Peter Christen 74c249288a added a push api to make it possible to upload files directly without
11 years ago
Michael Peter Christen f13c8aa7dd re-implementation of file push option in the context of POST http
11 years ago
Michael Peter Christen ba6ffddefc refactoring
11 years ago
reger 982601017e crawling of filenames with + fails due to url decoding
11 years ago
reger 3b559e7846 optimize pdfParser
11 years ago
reger 09f73b790f fix pdfParser not closed warning from pdfbox
11 years ago
reger 92d1604a31 Crawler hostbalancer does not delete finished queue files,
11 years ago
Michael Peter Christen 0c324d735c NPE fix for postprocessing without term index
11 years ago
Michael Peter Christen 922979aae1 added option to prefer http over https in unique-protocol ranking
11 years ago
Michael Peter Christen b3b174e2b8 fixed webgraph postprocessing and status display in Crawler_p servlet
11 years ago
Michael Peter Christen e6b28f5958 removed check on protocol for double content (user request)
11 years ago
reger d8d318233e fix logging settings
11 years ago
Michael Peter Christen 698f053658 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen f23c4142e0 added option to configure a custom user agent within allip networks
11 years ago
reger 8e233e2eb4 - fix typo in Message_p (defaultpath)
11 years ago
orbiter d7d38f9135 made number of open files in crawler configurable and increased default
11 years ago
Michael Peter Christen 8ad41a882c fixed several problems with postprocessing:
11 years ago
reger ca5437dd50 fix crawl of file:// , also http://mantis.tokeek.de/view.php?id=149
11 years ago
Michael Peter Christen ff5b3ac84d added new fields http_unique_b and www_unique_b which can be used for
11 years ago
sixcooler 5b1c4ef191 Monitoring and limit connection-count for Jetty
11 years ago
Michael Peter Christen f0db501630 better handling of ranking parameters and new default values for date
11 years ago
Michael Peter Christen 53948da7d0 tried to make last_modified recognition smarter
11 years ago
Michael Peter Christen 2d03037965 'Last-Modified', not 'Last-modified' according to
11 years ago
Michael Peter Christen 3dc5fb0050 fix for operator precedence bug (cast binds stronger than bitwise AND)
11 years ago
Michael Peter Christen 6634b5b737 debug code for index distribution testing
11 years ago
orbiter 49e344e8d9 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 7705e36703 fix for latest generic warning fix
11 years ago
sixcooler 10326892a8 avoid errors from ConnectHandler, correction for #6d16fa9
11 years ago
orbiter 97983ba89f fixed generics warnings for generic array instantiation that appeared
11 years ago
sixcooler 830057d788 lower Segment-size (hope to get Segments of 10GB)
11 years ago
orbiter c028ae9b09 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
reger e31493e139 "Use remote proxy for yacy" has no function, remove option and related config item
11 years ago