1264 Commits (bf8a6d984855f3aaca35b866a1e27d988e933f21)

Author SHA1 Message Date
reger 2bc9cb5828 fix early return in addToCrawler
10 years ago
Michael Peter Christen 0710648c31 enable api calls with very long urls
10 years ago
reger 1481a8ab56 add opensearch rss results to dht collection (due to text = snippet)
10 years ago
reger 752eec6697 fix NPE in addToIndex when used outside searchEvent
10 years ago
Michael Peter Christen ff29b0e503 added option to re-index exported xml snapshot dumps to
10 years ago
Michael Peter Christen 6f4fe4b175 revert of 8a7c68e4c7
10 years ago
Michael Peter Christen 97930a6aad added must-not-match filter to snapshot generation.
10 years ago
Michael Peter Christen 9d8f426890 adding a try-catch to link graph processing to prevent that a single
10 years ago
reger 8a5b8f8789 on bookmarking of search result, remember orig. query in separate bookmark property
10 years ago
reger 7224209486 break out of NormalizeDistributor loop on timeout
10 years ago
reger 47e61f8325 fix typo in image filter query
10 years ago
reger 4b4ab6799f fix String out of range in Collection Nav
10 years ago
reger 5408448a56 skip redundant add. of keywords to text
10 years ago
reger 296e97c78e put https port in peers dna
10 years ago
Michael Peter Christen fed26f33a8 enhanced timezone management for indexed data:
10 years ago
Michael Peter Christen b060ba900d added parsing of contentprop attribute in html tags for
10 years ago
Michael Peter Christen 4cb4f67f38 added parsing of dd, dt and article html fields. The parsed result is
10 years ago
reger 1395f10e95 fix typecast for css links
10 years ago
Michael Peter Christen abaaaef5f1 fix for filter queries
10 years ago
Michael Peter Christen f5a032f293 split query into filter query and text query to get better ranking
10 years ago
Michael Peter Christen 2e88028c1a when selecting collections in navigation, do show the un-selected
10 years ago
Michael Peter Christen fa7edc9f7a refactoring of filter queries (several queries instead only one)
10 years ago
Michael Peter Christen 40389987ec Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen f9ba50379d added an expansion option to search facets on result page:
10 years ago
reger 1f0f77bb77 make location facet return results
10 years ago
Michael Peter Christen 9bf0d7ecb9 added a new collection type 'dht' to all documents from the peer-to-peer
10 years ago
reger f63fff9008 fix snippet containing number with comma as desmo point http://mantis.tokeek.de/view.php?id=344
10 years ago
reger b241264632 fix error on *abc query input
10 years ago
reger 7e09bff4a1 exclude default search fields from text copy to text_t
10 years ago
reger 8af70950d9 harmonize snippet computation
10 years ago
Michael Peter Christen fd4e2c809a Show dates in the content of a document in the search result:
10 years ago
Michael Peter Christen d9d3111d10 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen 535f1ebe3b added a new way of content browsing in search results:
10 years ago
reger d7259419f3 postpone raw snippet html encoding upon use
10 years ago
reger 9b0de2de64 introduce getQueryFields to return default query fields (queryparameter QF)
10 years ago
reger 4b97ddb9ec stop sending crawl receipts if receiver got offline
10 years ago
reger fba34e12ef fix formatting issue if snippet contains html code
10 years ago
reger e48720a58c fix NPE in snippet computation
10 years ago
Michael Peter Christen 97ba5ddbb7 configuration option for maxload limit for remote search
10 years ago
reger 9e1ec5fec4 refactor: just some more usages of constant for term ":[* TO *]"
10 years ago
reger 8c491f51a5 remove hardcoded initialization of language nav if not used
10 years ago
Michael Peter Christen b5ac29c9a5 added a html field scraper which reads text from html entities of a
10 years ago
Michael Peter Christen 68c605d637 replace with CommonPattern.SPACE for split
10 years ago
Michael Peter Christen a8a2b7a803 persistency for vocabulary facet switch
10 years ago
Michael Peter Christen 69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where
10 years ago
Michael Peter Christen 5a060c9f26 refactoring of reindexSolr (just replaced constant string)
10 years ago
Michael Peter Christen 3d717b749a fix for urlmaskfilter
10 years ago
Michael Peter Christen bee5ee7cce removed some warnings
10 years ago
reger 42b0672be3 Let auto-disabled crawls recover if low resource condition vanished.
10 years ago
Michael Peter Christen 7db2888336 fixed font size and print page generation in pdf snapshots
10 years ago
reger 24f68a4eb7 refactor opensearch heuristic
10 years ago
Michael Peter Christen 3b51636ecb fix for mediawiki import
10 years ago
Michael Peter Christen 8cafdb989a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
reger 66839f73fa remove debug limit from commit before
10 years ago
reger 4214f250d0 Add option for extended search (Autosearch) to Bookmark.html asking all connected peers for the searchterm added as description to the bookmark created by the bookmark icon.
10 years ago
Michael Peter Christen 3e6c3e2237 documents pushed over the api/push_p.html interface will have their
10 years ago
reger 4eb89d7f15 revert clickservlet
10 years ago
reger d44d8996d0 Added a “don't store remote search results” option
10 years ago
Michael Peter Christen d2792a43fd do not write iframe and embed links into webgraph, but use them anyway
10 years ago
Michael Peter Christen ecb6a59e9e do not translate gif images into png images for thumbnails. Instead,
10 years ago
reger 73ba5d8ef7 adjust fieldtype and description of field httpstatus_redirect_s in CollectionSchema
10 years ago
Michael Peter Christen eb78388a98 changed prefer strategy for http unique in such a way that http is
10 years ago
Michael Peter Christen aaf7d4775a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen 8c3e5b7b6d added experimental pdf splitting which enables YaCy to split pdfs during
10 years ago
reger 198102304b refactor size() -> filesize() of URIMetadataNode
10 years ago
Michael Peter Christen d3e71ed070 fixes for searches when initialization of large autotagging libraries
10 years ago
Michael Peter Christen 28683530cd fixes to usage of no-cache: use and recognize also the no-store
10 years ago
reger 13cca2b114 fix missing AppPath
10 years ago
Michael Peter Christen 65125439fe added query modifier 'on'. This makes it possible to search for date
10 years ago
Michael Peter Christen 932faafffe reactivated on-demand snapshot loading
10 years ago
Michael Peter Christen 66b5a56976 Added and integrated new date detection class which can identify date
10 years ago
Michael Peter Christen 6a1865f507 refactoring date -> lastModified
10 years ago
Michael Peter Christen 7bfc5b80cb added new options to vocabulary editor:
10 years ago
Michael Peter Christen 8df8ffbb6d enhanced the snapshot functionality:
10 years ago
reger 5f0bb1214f modified FieldReIndex to reindex queries with low number of documents first
10 years ago
reger e52370728a fix startup stop on missing HTCACHE/SNAPSHOT directory
10 years ago
reger 70cf7060a4 coding fixes suggested in
10 years ago
reger ff18129def ViewFile servlet: update index if newer,
10 years ago
Michael Peter Christen 60f27bdf49 added the property timeoutrequests to configuration to disable
10 years ago
Michael Peter Christen 97f6089a41 YaCy can now create web page snapshots as pdf documents which can later
10 years ago
reger 0c97cc2440 skip unused call parameter for hashSentence()
10 years ago
Michael Peter Christen ad0da5f246 added new web page snapshot infrastructure which will lead to the
10 years ago
Michael Peter Christen 1d45d9405a security bugfix
10 years ago
Michael Peter Christen ff728b4aa5 ignore url errors during search
10 years ago
Michael Peter Christen 8317914ce3 changed vocabulary navigator object type to TreeMap to get a specific
10 years ago
Michael Peter Christen 041b605cfe Merge branch 'master' of git@gitorious.org:yacy/rc1.git
10 years ago
Michael Peter Christen 30276a2b48 prevent that a local Solr search and a local RWI search are running
10 years ago
reger 1e7ee72240 fix path lookup to ./defaults/yacy.badwords
10 years ago
reger ee277b9b3e allow for local yacy.stopwords and yacy.badwords list (in DATA/SETTINGS/)
10 years ago
reger de56266bcb remove redundant toLower for topwords
10 years ago
Michael Peter Christen 70f03f7c8e do not cache search requests to Solr if the result is used for
10 years ago
reger ef5dc68313 include domtype to searcheventcache id
10 years ago
Michael Peter Christen 6a2a669db4 added loading of the synonyms file from addon/synonyms into the
10 years ago
Michael Peter Christen c67c5c0709 added new solr schema fields which record the occurrences of vocabulary
10 years ago
Michael Peter Christen 0550b54d56 added fix to postprocessing: avoid caching of postprocessing collection
10 years ago
Michael Peter Christen 68e8039fd1 added high-precision scheduler for API processes. This allows also to
10 years ago
Michael Peter Christen 7e1b0b6712 fix for wildcard patch in search queries
10 years ago
Michael Peter Christen 0a879c98e7 added new 'firstSeen' database table and necessary data structures which
10 years ago
sixcooler 9c6e3a6b1c fix assertion failure in version-string for Solr-4.10.2 by changing
10 years ago
sixcooler 725b206fb4 update to solr-/lucene-4.10.2
10 years ago
Michael Peter Christen 5c97ecb30f fix of bad query generation for search facets
10 years ago
Michael Peter Christen 95d87f00b3 fix for bad query generation in doublecheck in postprocessing
10 years ago
orbiter 5be352da99 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
10 years ago
orbiter 0fcd8097a3 removed unused options from BusyThreads
10 years ago
Michael Peter Christen 92007e5d2d more enhancements to postprocessing speed
10 years ago
Michael Peter Christen 9a7fe9e0d1 fix for bad timing computation in postprocessing
10 years ago
Michael Peter Christen bd16119a00 another fix for postprocessing (the query for "" on numeric field did
10 years ago
Michael Peter Christen 327e83bfe7 more fixes in postprocessing: partitioning of the complete queue to
10 years ago
orbiter 71758f0d62 enhanced postprocessing by usage of a field-list generation to prevent
10 years ago
Michael Peter Christen fe537679de fix for exact_signature_unique_b, exact_signature_copycount_i,
10 years ago
Michael Peter Christen 2e5214eb21 added field postprocessing.partialUpdate to settings which can be used
10 years ago
Michael Peter Christen 77662e08e1 concurrently initialize the error cache; extended also the cache by
10 years ago
Michael Peter Christen 07c5b57953 removed warnings
10 years ago
Michael Peter Christen 2e09da9832 npe fix
10 years ago
Michael Peter Christen d80418f1b1 added partial updates to solr during postprocessing: during
10 years ago
Michael Peter Christen b1cfbc4a04 added new solr field url_paths_count_i which can be used to enhance the
10 years ago
Michael Peter Christen 30d4402cd1 fixed location search
10 years ago
Michael Peter Christen 8c1a89cb34 added another decoration flag to switch off network graphics in crawler
10 years ago
Michael Peter Christen 5082feb103 less volume for effect sounds
10 years ago
Michael Peter Christen 0bfc69b29b more ipv6 bugfixes
10 years ago
Michael Peter Christen 883622306e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen 0843b12ef3 ipv6 fix: avoid that shrinked own ip set is overwritten with (non-valid)
10 years ago
orbiter cddf884bc4 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
10 years ago
Michael Peter Christen 74957f3760 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen 2a052f446a Added an experimental audio feedback system.
10 years ago
Marc Nause 1e6e69bc40 Finished implementation of UPNP:
10 years ago
orbiter f3a12801f0 Merge branch 'master' of git@gitorious.org:yacy/rc1.git
10 years ago
orbiter d93325a578 lazy handling of process_sxt field (part of postprocessing)
10 years ago
reger b5ca20de15 preserve content_type (mime) if supplied in preference of constructing it from file type.
10 years ago
reger fb1fcc2b03 handle noarchive tag, skip writing page to cache
10 years ago
Michael Peter Christen 3073c69aee Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
10 years ago
Michael Peter Christen 6491270b3a large IPv6 redesign of peer ping methods!
10 years ago
reger 8b1ce49ee6 remove unused variable timeout
10 years ago
orbiter a922b122a3 added a hack to forward solr search results from an external attached
11 years ago
Michael Peter Christen 2645dc816a added warning for not well-formed postprocessing queries
11 years ago
Michael Peter Christen 6d3d4c4ea6 changed the concurrent enumeration of query results in such a way that
11 years ago
Michael Peter Christen ad35d9294f added a 'stats' table which records some peer statistics twice every
11 years ago
reger 8284ea751a catch TimeoutException during ping and do not delete yacy.conf during prereadconfigfile
11 years ago
reger ffa7c7116f better fix for NPE in image search
11 years ago
Michael Peter Christen f1032fb8fe more enhancements to image search in case that a restriction to a single
11 years ago
Michael Peter Christen 475125f9d7 hack to get more results when doing a remote site search
11 years ago
Michael Peter Christen 81f9b34da7 increased ability to search for all images on a single server within
11 years ago
reger b5e0f70197 - remove repositoryPath post from ConfigBasic (obsolete)
11 years ago
reger 8931e14514 fix NPE in image search
11 years ago
Michael Peter Christen 1735dbc9d9 enhanced image search: bugfixes and performance enhancements
11 years ago
Michael Peter Christen ebd0be2cea fixes and speed updates for search process
11 years ago
Michael Peter Christen 7611bf79bd Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1
11 years ago
Michael Peter Christen 524bedc00a fixed text in startup tray icon and added shutdown icon during shutdown
11 years ago
Michael Peter Christen e87dc08c0d set the correct fail time in error docs
11 years ago
Michael Peter Christen a7dd89c4de changed method to write the citation index: do not catch up references
11 years ago
orbiter f318d7c285 enhanced date-ordered ranking
11 years ago
reger a6891ff7f8 fix Querygoal.parse exception on +/-null-term
11 years ago
orbiter a65df4ce7e do not push noindex errors into log if in intranet mode. noindex
11 years ago
Marc Nause 2af56fa37d Improved UPnP. (still not perfect)
11 years ago
orbiter d68438c3d9 make sure that the postprocessing background thread never dies by any
11 years ago
reger e88537522d allow single quote " ' " in query
11 years ago
orbiter 487021fb0a snippet computation update
11 years ago
orbiter 927aaa95a6 concurrency bugfix
11 years ago
reger 7584352e7b use more predefined Solr query parameter constants
11 years ago
reger f9db5dd6c5 reduce doublecontent check document (prevent out of memory)
11 years ago
reger a8508417d1 catch NPE during crawl (OAI import)
11 years ago
Michael Peter Christen 6344718f8b reducing the concurrent query stack size and reduced concurrency of
11 years ago
Michael Peter Christen c465b791af typo
11 years ago
Michael Peter Christen 191ec8c82a added concurrency to postprocess rewrite process
11 years ago
Michael Peter Christen a1e8bdd5e9 log ppm instead of docs/second
11 years ago
Michael Peter Christen cc0ded7abd set process type of web graph according to fields as defined in the
11 years ago
Michael Peter Christen 12fb9d7cd1 log postprocessing constraints in case that postprocessing is not
11 years ago
Michael Peter Christen 338f574bdc no sorting if http/www unique fields are not demanded (makes query
11 years ago
Michael Peter Christen 0ceeceb35e more logic on Solr queries; usage of the query terms in postprocessing,
11 years ago
orbiter 4099296b45 added new classes which shall reduce call overhead to Solr (stub)
11 years ago
orbiter 3491ab4c38 removed unused images from webgraph edge computation
11 years ago
orbiter 2371d6b8db target linktexts must be string to enable search facets on these fields
11 years ago
Michael Peter Christen 001e05bb80 do not store failure of loading of robots.txt into the index as a fail
11 years ago
Michael Peter Christen 05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
11 years ago
Michael Peter Christen 98f45c9032 fix for image alt attachment to AnchorURLs in html parser.
11 years ago
orbiter 22ce4fb4dd better error handling for remote solr queries and exists-checks
11 years ago
orbiter 738989aab7 reverted commit f94c91315b because the
11 years ago
Michael Peter Christen c115f3869c enhanced snippet computation and test method in ViewFile
11 years ago
orbiter 1027f3d04a fix for the usage of ready-prepared solr queries, some queries are
11 years ago
Michael Peter Christen f94c91315b if the webgraph is used, then use it also for reference computation to
11 years ago
Michael Peter Christen 6e1dc444c3 added a snippet test function in ViewFile: you can now search for a
11 years ago
Michael Peter Christen b44626e55b fixed target_alt_t in webgraph
11 years ago
Michael Peter Christen 504327b15c fix for condition for writing the webgraph
11 years ago
Michael Peter Christen 542c20a597 changed handling of crawl profile field crawlingIfOlder: this should be
11 years ago
Michael Peter Christen 4eec1a7452 refactoring (change Metadata name of load time data structure to avoid
11 years ago
reger f96cfdc84d prevent array out of bound exception on getRankingProfile(x)
11 years ago
reger a2cb366b25 Combine /heuristic search modifier with opensearch configured targets
11 years ago
Michael Peter Christen 2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
11 years ago
Michael Peter Christen bf1b6b93e7 do not write CR values to webgraph if no CR values are computed
11 years ago
Michael Peter Christen d07cdd8c3b added SolrCloud access mode and configuration
11 years ago
Michael Peter Christen 8514bffc22 enhanced postprocessing status report
11 years ago
Michael Peter Christen b5fc2b63ea removed exist() retrieval functions from error cache and replaced it
11 years ago
Michael Peter Christen 62c72360ee cleanup of checkAcceptanceInitially in CrawlStacker, should avoid
11 years ago
Michael Peter Christen b5d78ba156 reduced number of solr queries during crawling
11 years ago
Michael Peter Christen fd87fa1613 removed more unnecessary exist-checks in ErrorCache
11 years ago
Michael Peter Christen f2b476e08b don't do a double check to solr for failed documents if they are not
11 years ago
orbiter dab9a0786a Merge branch 'master' of git@gitorious.org:yacy/rc1.git
11 years ago
orbiter 51bf5c85b0 Renamed the transmission cloud to buffer in dispatcher since the name
11 years ago
Michael Peter Christen fb3dd56b02 fix for processing of noindex flag in http header
11 years ago
Michael Peter Christen b0d941626f fixed bugs in canonical, robots and title/description unique calculation
11 years ago