Commit Graph

335 Commits (b73ea6581d8269ddf120e15e270688baf28df2ff)

Author SHA1 Message Date
orbiter e3025ee691 - new icon for OAI-PMH loading action
15 years ago
orbiter b0b7a4f9a5 - added function to OAI-PMH reader that can pull all records from a server using an evaluation of the resumption token to get URL to retrieve remaining records
15 years ago
lotus 79251e6f60 configurable disk space hardlimit for dht
15 years ago
orbiter a0e891c63d - some redesign in UI menu structure to make room for new 'Content Integration' main menu containing import servlets for Wikimedia Dumps, phpbb3 forum imports and OAI-PMH imports
15 years ago
orbiter 30f108f97d added stub of oai-pmh importer (not working yet)
15 years ago
orbiter 52470d0de4 - fix for xls parser
15 years ago
orbiter 5e8038ac4d - refactoring of blacklists
15 years ago
orbiter 26fafd85a5 - more refactoring
15 years ago
orbiter 3528b970d6 - refactoring
15 years ago
orbiter a8ce192f63 - shifted main classes to new package net.yacy
15 years ago
orbiter b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
15 years ago
orbiter e7f18ba24b refactoring
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter bea3b99aff moved table and util classes
15 years ago
orbiter c0e0e1f422 moved blob classes
15 years ago
orbiter 1e4f8b56ed accumulated classes from different packages into the new rwi package
15 years ago
orbiter 194da25a2f moved kelondro index
15 years ago
orbiter 4446acc8cd moved kelondro order
15 years ago
orbiter f677d534b1 start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
15 years ago
orbiter 735e2737e3 * added index segments
15 years ago
orbiter 6e0dc39a7d - some fixes to prevent blocking situations
15 years ago
orbiter 04a548a1e3 - temporary integrated the transferURL servlet as static class instead as a class that is called using reflection to investigate the OOM problems in that class
15 years ago
orbiter 6aa474f529 - better logging for web cache access and fail reasons
15 years ago
orbiter 3671c37989 added experimental oai-pmh reader and integrated it with the existing dublin core parser
15 years ago
orbiter 2e6bdce086 - added more logging to balancer
15 years ago
hermens 62a7341c4d Fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2204
15 years ago
low012 f65bfaa9af *) Removed base tag from errror page. This has been added by myself a long time ago as a workaround for some weird behavior of my router, but as it turns out, it does more bad than good in general: If HTTPS is used for communication with YaCy, entering a wrong passwort led to an errror page with a form which would send username and password unencrypted with the user possibly being unaware of this.
15 years ago
orbiter 3e9dcfc204 fix for http://forum.yacy-websuche.de/viewtopic.php?p=17504#p17504
15 years ago
orbiter 573d03c7d7 added configuration to enable ram table copy
15 years ago
orbiter ce972ff4ef update to default ranking profile which has now some settings to deny some phpbb3 pages which are redundant in the index when crawling phpbb3.
15 years ago
orbiter 44579fa06d - fixed a problem loading images through yacy's document loader,
15 years ago
orbiter 72e5407115 refactoring of snippet cache
15 years ago
orbiter 8e56c2ace6 fix for fixes from this afternoon
15 years ago
orbiter cf739edc2e fix for possible deadlock, see
15 years ago
orbiter 92edd24e70 fixed problem with switching of networks
16 years ago
orbiter 0575f12838 fix for deadlock
16 years ago
orbiter c0e17de2fb - fixes for some problems with the new crawling/caching strategies
16 years ago
orbiter 634a01a9a4 replaced wget-requests with caching requests
16 years ago
orbiter c6c97f23ad - added cache usage properties to crawl start
16 years ago
orbiter c4ae2cd03f fixed bug that caused deletion of crawl profiles at every application startup
16 years ago
orbiter 161d2fd2ef redesign of access to the HTCache (now http.client.Cache):
16 years ago
orbiter 4da9042e8a code simplification
16 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter b332dfad67 - inserted request object into response object which carries this now instead generating new objects
16 years ago
orbiter ca72ed7526 -removed superfluous crawl cache
16 years ago
orbiter 13c63f4082 a set of small fixes to crawling behaviour
16 years ago
orbiter 43c8defd79 enhanced parser with more extension + mime attributes
16 years ago
orbiter b2263bc720 enhanced document type recognition
16 years ago
lotus aa38eb5a20 * maxfilesize -1 for infinite filesize
16 years ago
lotus 9cfe89c8fc * process content-length as soon as it is received
16 years ago
lotus 9f083bb6b2 check filetype before loading (no more mp4 loading)
16 years ago
f1ori f814e0fa81 enable warnings and fix most of it
16 years ago
orbiter 57a88d435b redesign of parser mime type detection and parser steering
16 years ago
orbiter 21b8704fb4 refactoring of the ParserDispatcher and ParserConfig: resulted into Idiom, Parser and Classification classes
16 years ago
orbiter dafffd0153 refactoring of parsers and document processing
16 years ago
orbiter 024744245c small refactoring to prepare for new queues
16 years ago
orbiter 24cb6d68bc - renamed Stack to RecordStack to avoid name confusion with new classes
16 years ago
orbiter 995da28c73 all stack/heap files that had been stored in DATA/PLASMA are now stored in the network-specific QUEUES path
16 years ago
orbiter 409538e17a code cleanup and code simplifcation
16 years ago
orbiter 1f1399e5c5 extending visibility of objects and methods to avoid synthetic accessor methods and increase performance
16 years ago
orbiter 154bbc3364 code cleanup: call of static methods directly to the class
16 years ago
orbiter 222850414e simplification of the code: removed unused classes, methods and variables
16 years ago
orbiter 93dfb51fd4 problems with code style
16 years ago
orbiter 9a674d8047 - After the removal of the Tree class some code simplifications are possible. This affects mostly the Records class, which can be refactored and the result of the refactoring results in a reduced number of classes.
16 years ago
orbiter c5122d6836 completed migration of BLOBTree to BLOBHeaps:
16 years ago
orbiter ae015e8e98 refactoring of blob package classes
16 years ago
orbiter ce1adf9955 serialized all logging using concurrency:
16 years ago
orbiter b8e738a7be a collection of
16 years ago
orbiter d58b395993 fix for http://forum.yacy-websuche.de/viewtopic.php?p=15693#p15693
16 years ago
orbiter b6e274f211 omit most of forced crawl delays by using a separat delay table which flushes delayed URLs at the correct time
16 years ago
orbiter d50be59088 - added a automatic re-construction of the domain stack after 10 minutes. this includes then urls to the domain stack that were left over in case of stack size limitations when the domain stack was created the last time
16 years ago
orbiter 5fdba0fa51 - fixed a not working selection rule in balancer
16 years ago
orbiter f5602404d5 another speed boost for the balancer
16 years ago
orbiter 95e8cbd1c3 new fully redesigned balancer and bugfixes regarding lost profile handles and killed crawls
16 years ago
orbiter 42ae40b9f6 some bugfixes to database close() methods
16 years ago
orbiter 88426912ad more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter 3d4b826ca5 migration of all databases that use the deprecated BLOBTree format into the BLOBHeap format. Old databases are migrated automatically.
16 years ago
orbiter 63a0255166 - refactoring: added new content package, which will contain connector classes for different types of data sources to import texts into the YaCy index
16 years ago
orbiter addecdb18c simplified code, removed one unused method in all implementing classes
16 years ago
lotus 734680dc70 initialize the ResourceObsever in own thread
16 years ago
orbiter d2ac0aa682 - fixed possible bugs in Stack (may affect Crawler reset) and RandomAccess handling
16 years ago
orbiter 138422990a - removed useCell option: the indexCell data structure is now the default index structure; old collection data is still migrated
16 years ago
lotus 635b0a9da7 code-split
16 years ago
orbiter fa3adbbfc6 added domain checks to surrogate reader and RWI transfer receiver to prevent spaming using surrogates
16 years ago
lotus ab0030d7a7 allow dht-out for remote-crawl processing peers on default settings
16 years ago
orbiter 4e97a31009 corrections in dublin core syntax
16 years ago
orbiter 7dfe7e7cc6 fixed some problems with surrogate reader. This is now ready for testing.
16 years ago
orbiter 9050a3c4c5 alpha version of surrogate reading and indexing.
16 years ago
orbiter ad78e3a59f - less lines in rssTerminal
16 years ago
orbiter bc80dc913a added new surrogate reader (surrogates are parsed documents on batches)
16 years ago
orbiter e58320a507 added more info in log fore debugging
16 years ago
orbiter c0e8ed5461 fixed problem with not http client
16 years ago
orbiter c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
16 years ago
shostakovich 1f37cc6107 Robots.txt is now reused after one day. See forum-topic:
16 years ago
orbiter 9bfb2641db - removed deprecated threads
16 years ago
orbiter b6c2167143 - patch for bad web structure dumps
16 years ago
orbiter 0139988c04 - added writing of temporary file names and renaming to final file name when index dump/merge are done. Interrupted merges can be cleaned up.
16 years ago
orbiter 3621aa96ab - added a memory protection for the IndexCell migration
16 years ago
orbiter d39a5b42ca more care about open file handles. Now files also close on windows and can be deleted afterwards.
16 years ago
orbiter 029495e64d fixed bug introduced in SVN 5756 in EcoTable.put()
16 years ago
orbiter 587838bd09 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5758 6c8d7289-2bf4-0310-a012-ef5d649a1542
16 years ago
orbiter 96eaecda3e - added migration class to go from index collections to the index cell data structure.
16 years ago
orbiter 37f892b988 added new concurrent merger class for IndexCell RWI data
16 years ago
borg-0300 8c494afcfe svn attributes added
16 years ago
orbiter 67aaffc0a2 - added Latency control to the crawler:
16 years ago
orbiter 61f9dbf0cc - fixed a display problem in watch crawler
16 years ago
orbiter b3f75e48fa - enhanced balancer: auto-solving of waiting-deadlocks
16 years ago
orbiter d99ff745aa fix for http://forum.yacy-websuche.de/viewtopic.php?p=13378#p13378
16 years ago
borg-0300 fd0976c0a7 refactoring
16 years ago
borg-0300 ce79239322 "typo"
16 years ago
orbiter 7dff1cba62 removed option to use different primary keys in kelondro tables
16 years ago
orbiter 7f67238f8b refactoring of plasmaWordIndex: less methods in the class, separated the index to CachedIndexCollection
16 years ago
orbiter 14a1c33823 refactoring of wordIndex class
16 years ago
orbiter f6d989aa04 added new class RowSetArray which arranges RowSet objects like Elements in a hashtable, but still provides the functionality of sorted enumeration. The new class is now integrated into the ObjectIndexCache, which is the core class to provide index functions to all database files. The new index access is about twice as fast as before. This has strong speed enhancement effects on all parts of YaCy.
16 years ago
orbiter efcd95dc37 simplification of (internal) query process / refactoring
16 years ago
orbiter aa44d9bad9 more refactoring of kelondro.text / deleted de.anomic.index
16 years ago
orbiter 76ef5f0f14 refactoring of index package: better names for the classes (to be continued)
16 years ago
orbiter 8444357291 added new row interator in kelondro tables files that enumerates rows
16 years ago
orbiter 9559bc23fd automatic clean-up of dead connections
16 years ago
orbiter 4f9dae2571 remove reference in crawl entries
16 years ago
orbiter c12bb8a6d0 - refactoring of the http client
16 years ago
orbiter 62505bb3cb more bugfixes as recommendet by findbugs
16 years ago
orbiter 411f2212f2 more memory leak fixing hacks
16 years ago
orbiter 6c627dbdff update to the server core
16 years ago
orbiter 6a876ecb88 first fixes to the DHT transmission process
16 years ago
orbiter c25c334b75 replaced old DHT transmission method with new method. Many things have changed! some of them:
16 years ago
orbiter 65a1de6c05 longer timeout for remote crawl queries
16 years ago
orbiter 94110df85a moved logging partially to kelondro
16 years ago
orbiter 024da2916b refactoring of logging
16 years ago
orbiter 83ce65707a (almost) completed partition of classes in kelondro
16 years ago
orbiter 7ee494fde5 more refactoring of kelondro:
16 years ago
orbiter bf93767ec6 refactoring of kelondro database classes
16 years ago
orbiter fc27bf8c4c refactoring of kelondro classes:
16 years ago
orbiter 91af105373 last changes before release
16 years ago
orbiter 05c235de32 fix for npe
16 years ago
orbiter 2b32248079 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1516&p=10545#p10545
16 years ago
orbiter 4d5b401f00 try to fix some performance problems with the internal index management:
16 years ago
orbiter c6880ce28b removed the permanent cache flush and replaced it with a periodic cache flush
16 years ago
orbiter 07fc115e90 removed active profiling in kelondroRowSet
16 years ago
orbiter e004da48d3 - added fast fingerprint computation for files (any). Will be used in new index dump method
16 years ago
orbiter bb935fdbb0 less organization overhead for DNS caching and prefetching
16 years ago
orbiter e34ac22fbd - added new monitoring servlet at
16 years ago
orbiter d376d81fc4 replaced busy thread control of crawl stacker by blocking threads
16 years ago
f1ori 0881190b19 * Robots.txt: don't interpret Crawl-Delays for other robots
16 years ago
orbiter 243e73f53b removed unnecessary usage of kelondroBLOBTree
16 years ago
orbiter 7535fd7447 - refactoring of CrawlEntry and CrawlStacker
16 years ago
orbiter 2802138787 - refactoring of CrawlStacker (to prepare it for new multi-Threading to remove DNS lookup bottleneck)
16 years ago
orbiter 1779c3c507 - added a read cache to the RAFile interface to RandomAccessFile
16 years ago