Commit Graph

484 Commits (7b108dadf77b0e2734570b872ea132b26b4dc474)

Author SHA1 Message Date
Michael Peter Christen 21fe8339b4 - enhanced generation of url objects
12 years ago
Michael Peter Christen 5f0ab25382 removed the option to prevent removal of & parts inside of the
12 years ago
Michael Peter Christen abab291162 made the index schema retrieval public and allow cross-domain retrieval
12 years ago
Michael Peter Christen 1533bfd63b refactoring
12 years ago
Michael Peter Christen 872f83ebe0 refactoring
12 years ago
Michael Peter Christen 8219a445f3 refactoring
12 years ago
Michael Peter Christen 00c1c777fa refactoring
12 years ago
orbiter 563d584420 removed more dependencies in cora from kelondro
12 years ago
orbiter 63762d8f89 removed kelondro dependencies from cora
12 years ago
Michael Peter Christen b69ed96f0b - added collections to yacydoc
12 years ago
Michael Peter Christen 4d29f59a27 removed warnings
12 years ago
Michael Peter Christen 8c099d2106 Merge remote-tracking branch 'origin/master'
12 years ago
apfelmaennchen d31a632951 - added dmoz RDF dump importer
12 years ago
Michael Peter Christen 8ca842b137 added new button design to more buttons
12 years ago
Michael Peter Christen b2b516cc3e added a collection attribute to crawls and searches:
12 years ago
Michael Peter Christen a427a68bac removed many warnings
12 years ago
Michael Peter Christen 31d4d38804 - extended the solr interface by a references-by-word-count method
12 years ago
Michael Peter Christen 528d6763fa - added new solr fields:
12 years ago
Michael Peter Christen 75d5e3475d Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
12 years ago
Michael Peter Christen 316b5fe116 - added a solr type definition verifier
12 years ago
reger 2d2be546fe fix path to env/grafics to display api icon on meta data page
12 years ago
Michael Peter Christen 0cab06c47c refactoring
12 years ago
Michael Peter Christen 06a78eecb7 code simplification
12 years ago
Michael Peter Christen 18f989dfb1 - refactoring (load -> getMetadata)
12 years ago
Michael Peter Christen 136fcb1ad9 refactoring
12 years ago
Michael Peter Christen 24d9db1613 snippet retrieval loading processes may use a smaller minimum load time
12 years ago
Michael Peter Christen 1687737771 Abstraction of HandleMap and HandleSet
12 years ago
Michael Peter Christen 6f1ddb2519 Moved solr index-add method to the same method where the YaCy index is
12 years ago
orbiter 69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
12 years ago
Michael Peter Christen f78ce93a80 collection of speed and memory saving hacks
13 years ago
orbiter 0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
13 years ago
Michael Peter Christen b0c408788b made class methods static where possible
13 years ago
Michael Peter Christen 5bd3c90907 - removed unnecessary semicolons
13 years ago
Michael Peter Christen 241dd8410a removed snippet pattern filter - it was not used
13 years ago
Michael Peter Christen d3964253ae - added @SuppressWarnings to unused servlet method parameters
13 years ago
Michael Peter Christen ea10766bfd cleaned unnecessary nested code
13 years ago
Michael Peter Christen 1825f165b8 better integration of blacklist according to use case
13 years ago
Michael Peter Christen 03280fb161 removed segments-concept and the Segments class:
13 years ago
Michael Peter Christen 9116013c64 - allow lazy initialization of solr value (if using 'lazy', then no
13 years ago
cominch 011f8a5818 Auto Tagging: Add hyperlinks to tags (provisional)
13 years ago
Michael Peter Christen 52f5d40043 better abstraction of document model generation
13 years ago
Michael Peter Christen 8b7c4d3144 produce a rdf output containing the triplestore with yacydoc; ie:
13 years ago
cominch d8815db877 Merge remote-tracking branch 'original yacy/master'
13 years ago
cominch e4dab19045 Augmented Browsing: added template for document info bar
13 years ago
Michael Peter Christen b2d1c25ebb removed warnings/unused entities
13 years ago
Michael Peter Christen 64c0268b2b show triplestore metadata in yacydoc and viewfile
13 years ago
Roland 'Quix0r' Haeder edaa09b9b1 Rewrote all String blacklist types to enum 'BlacklistType', closes bug
13 years ago
cominch 87a3fbb3c2 interaction javascript
13 years ago
Michael Peter Christen 8b974905ee changed log-in text for all servlets with authentication:
13 years ago
reger b2175ea4ef Add possibility to set custom Solr field names for the YaCy default Solr attributes.
13 years ago
Michael Peter Christen c00efc2717 made the solr connection more generic
13 years ago
Michael Peter Christen 453010bd68 - solved problems with backpath normalization
13 years ago
Michael Peter Christen 0e13022147 - enhanced solr field documentation
13 years ago
Michael Peter Christen e377092198 fix to xml output format
13 years ago
Michael Christen 41be98dc9d extended webstructure api to show together with incoming links also
13 years ago
Michael Christen 8f89c8ef07 added information about inbound, outbound and citation links into
13 years ago
Michael Christen 71649a1296 added an api to retrieve the new citation.index with the
13 years ago
Michael Peter Christen 9ad1d8dde2 complete redesign of crawl queue monitoring: do not look at a
13 years ago
Michael Peter Christen e2f8f263e8 changed storage of search words: keep order
13 years ago
Michael Peter Christen c166eb68b6 fixes in solr schema file
13 years ago
Lotus 335a776351 xss hardening on Status.html
13 years ago
Michael Peter Christen ef5192f8c9 using the generic document parser for crawl starts instead of the html
13 years ago
Michael Peter Christen ce620be783 fix for crawl start with smb url
13 years ago
Michael Peter Christen 7053f8ab46 added automatic generation of a solr schema.xml file
13 years ago
Michael Peter Christen 2ee8cbeb2c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
13 years ago
Michael Peter Christen 992dbdf4bb added noload statistic to servlets
13 years ago
Roland 'Quix0r' Haeder fa08ed5ae5 Fixed a lot of CHMOD rights (no need for execute flag on *.java/*.html) and introduced local/remote crawl size ratio based check
13 years ago
Michael Christen 9e5894c784 Removed handling of components objects for URIMetadataRows.
13 years ago
Michael Christen c04bfaa51b refactoring
13 years ago
Michael Christen e9dc99fe15 added rules to set specific RWIs as private RWIs which are not
13 years ago
Michael Christen 204c29f010 small bugfixes for search result display and cache display
13 years ago
apfelmaennchen ff19fcdb28 bugfix for YMarks XBEL import and export; thanks to Dominic
13 years ago
orbiter 11729061f2 added an option in the bookmark import process to put everything into the crawler
13 years ago
apfelmaennchen 8f30d288e9 small change to mouse over text for crawl starts within bookmarks
13 years ago
apfelmaennchen 29e97f94f2 small optical enhancements to ymarks treeview
13 years ago
apfelmaennchen 77a080ced9 smaller fixes for YMarks
13 years ago
orbiter e22f8497c9 - tested the ARC methods
13 years ago
orbiter 5a55397f99 some last-minute performance hacks
13 years ago
apfelmaennchen dd1482aaf5 further update to YMarks
13 years ago
apfelmaennchen 564374d1fe - included YMarks in addition to old bookmarks in yacysearchitem.html; don't get confused by the old bookmark dialog, the ymark is automatically added silently beforehand.
13 years ago
orbiter 05f34a3fa7 added a full, complete, database insert, update and delete API for the tables.
13 years ago
orbiter c461c1eebf fixed xml output for table retrieval
13 years ago
orbiter c93f10417a add a bookmark automatically each time a new crawl is started
13 years ago
apfelmaennchen 6287c2b4a9 YMarks:
13 years ago
apfelmaennchen 5581be12fb YMarks:
13 years ago
apfelmaennchen a3eebfdcba YMarks:
13 years ago
apfelmaennchen 4f95f72124 YMarks:
13 years ago
orbiter 017a01714d - enhanced logging in robots.txt parser for remote debugging
13 years ago
orbiter 5a7cec59f3 moved ynetSearch to get all files out of htroot/api/util/
13 years ago
apfelmaennchen a8dfe787ed - updated to jquery flexigrid 1.1
13 years ago
cominch cef8ebc41d getpageinfo: Checks if there is an OAI repository behind the URL.
13 years ago
orbiter eb1c7c041d write info about robots.txt evaluation into getpageinfo_p.xml
13 years ago
orbiter f8b8c82421 - refactoring of getpageinfo_p.xml (moved out of util)
13 years ago
apfelmaennchen abba31f02e - bugfix for correctly sorting ymarks
13 years ago
apfelmaennchen 5f7dbe1c42 - some refactoring (ymarks)
13 years ago
apfelmaennchen 4d7ae76017 - update to jquery 1.7 (does not apply to all jquery code, old version is additionally kept for compatibility)
13 years ago
orbiter a7df70221e refactoring
13 years ago
orbiter d2ea250d99 refactoring:
13 years ago
orbiter 2d03dc1804 removed unnecessary warning
13 years ago
orbiter cf8e3b0df8 small fix for count: overXX includes the count
13 years ago
orbiter 6db8921a0f enhanced termlist
13 years ago
sixcooler d40a177c05 Generation Memory Strategy fine tuning
13 years ago
orbiter a5541751a8 - added memory computation to termlist_p.xml
13 years ago
orbiter 9bdee5c71c added a servlet that produces a list of term hashes that appear more than 10000 times
13 years ago
sixcooler 916d79111e Runtime.maxMemory() DOES change @ runtime:
14 years ago
orbiter 9ebc75db4b fix for channel authorization
14 years ago
orbiter 115abc8917 - more attributes for search progress bar
14 years ago
orbiter 4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
14 years ago
orbiter 123375bfba added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy.
14 years ago
orbiter 5b579e21a3 code cleanup
14 years ago
apfelmaennchen 61c9a791c4 YMarks: sidebar with tabs for tags and folders
14 years ago
apfelmaennchen 8b8db2aaba YMarks: some small changes/fixes
14 years ago
apfelmaennchen 441035f1f4 YMarks: some improvements to flexigrid quick search on YMarks.html
14 years ago
orbiter 6fa439c82b - refactoring of robots
14 years ago
apfelmaennchen e7c2ea193b YMark:
14 years ago
apfelmaennchen b2281f0b7d YMark: intermediate work towards flexigrid support
14 years ago
apfelmaennchen 60412d2bb3 YMark:
14 years ago
apfelmaennchen 62855f9567 YMark: code clean up and some small fixes
14 years ago
apfelmaennchen 667e912b19 YMark:
14 years ago
apfelmaennchen a0e4960a4d YMark:
14 years ago
orbiter 19fd13d3bc Added federated index storage to solr.
14 years ago
apfelmaennchen 78d6d6ca06 refactoring for ymarks
14 years ago
orbiter b2fe4b7b1a added a handling of appearances of yacy bot entries in robots.txt if this entry addresses the yacy peer
14 years ago
low012 1ff9947f91 *) added new user right: extended search right (allows to define users who can query more results than anonymous users)
14 years ago
orbiter 9b25d07295 - added geo information parsing to html parser
14 years ago
orbiter b1a8d0c020 enhancements to web cache and less strict caching rules
14 years ago
low012 2861d0888a *) simplified code; *) fixed potential NumberFormatExceptions
14 years ago
orbiter dc0db3550e avoid string conversion
14 years ago
orbiter 694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
14 years ago
orbiter e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations
14 years ago
orbiter cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
14 years ago
orbiter 5e186e0122 continuing the fight against deadlocks during time formatting: better caching.
14 years ago
low012 c5051c4020 *) fixed bug which caused entries to not be deleted when deleting by URL on IndexCreateWWWLocalQueue_p.html (I hope this did not break anything else)
14 years ago
orbiter 4473cf8c61 replaced utf-8 with UTF-8
14 years ago
orbiter c93f4dda72 - cleaned up yacy news
14 years ago
orbiter 10ae8d961b - cora package has now no dependencies to other yacy packages and becomes a 'base' package (refactoring)
14 years ago
orbiter c288fcf634 redesigned CrawlStartScanner user interface and added more features:
14 years ago
low012 6f4f957e50 *) cleaning up the code a little bit
14 years ago
f1ori 9d2159582f * fix system update if urls are in blacklist (for example for very general blacklists like *.de)
14 years ago
orbiter a563b05b60 enhanced crawler:
14 years ago
apfelmaennchen 737aaf6952 various small changes to ymarks
14 years ago
apfelmaennchen 8a50670546 some code clean up for the last post
14 years ago
apfelmaennchen 442497868d another step towards an auto tagging function for YMarks
14 years ago
low012 dad5818b40 *) cleaning up the code a little bit
14 years ago
low012 eb79b952ef *) cleaner code
14 years ago
low012 38fdf43587 *) renamed classes according to standard Java coding conventions
14 years ago
apfelmaennchen 54e63b556e intermediate step for a YMark auto-tagging function based on word frequencies.
14 years ago
apfelmaennchen 403ee9c014 added a drill-down for metadata and word count to /api/ymarks/test_treeview.html
14 years ago
apfelmaennchen f147a022f8 enabled YMark Import for /Table_YMark_p.html
14 years ago
apfelmaennchen 94a9be18a4 added a ymark table administration: /Table_YMark_p.html
14 years ago
apfelmaennchen 25339f93c7 more updates to ymarks
14 years ago
apfelmaennchen cdd65aca71 update to ymarks
14 years ago
apfelmaennchen 808edffaf6 ymarks
14 years ago
apfelmaennchen 43586a2ace an update to ymarks (please test if you wish):
14 years ago
apfelmaennchen f5324b27f2 more updates to the new bookmarks (ymarks)....
14 years ago
orbiter 70c95608d4 Added CORS Access header for yacysearch.rss output
14 years ago
apfelmaennchen efe0667fdd more new bookmark (ymarks) code with experimental html and xbel import
14 years ago
f1ori 7d8de34778 * add a bit of documentation to DigestURI, use DigestURI(string) instead of DigestURI(string, null)
14 years ago
apfelmaennchen d0e6c03b51 some updates to the new bookmark code...
14 years ago
apfelmaennchen 9c94ebdee4 small changes to new bookmark code...
14 years ago
apfelmaennchen 244b56e9d3 an update to the new bookmark code...
14 years ago
apfelmaennchen f035f257da added some more bookmark code...
14 years ago
apfelmaennchen a79728b97d some updates to experimental bookmark code...
14 years ago
apfelmaennchen ef782cd026 and even more experimental bookmark code...
14 years ago
apfelmaennchen 7aca763ca8 Some more experimental bookmark code...
14 years ago
apfelmaennchen 4270ed696c Experimental code (I need to transfer the code to my macbook, sorry) for the new bookmarks API based on the Tables concept (same as for crawl starts). Currently you can add a bookmark by api/ymarks/add_ymark.xml?url=http://www.yacy.net&title=YaCy and watch the result via the standard view Tables_p.html.
14 years ago
orbiter 2c549ae341 fixed a number of small bugs:
14 years ago
orbiter f6eebb6f99 replaced auto-dom filter with easy-to-understand Site Link-List crawler option
14 years ago
mikeworks ad7efe6016 rssTerminal.html: Fixing the 'null' is null or not an object in rss2.js when viewing the YaCy default Status page http://localhost:8080/Status.html with Internet Explorer
14 years ago
orbiter 39f409a7bb performance hacks
14 years ago
sixcooler 17eebd4ef8 counting crawler traffic again:
14 years ago
orbiter 9d080f387e change in handling of the all-visible home path for storage in YaCy:
14 years ago
orbiter 875741bcff fix for http://forum.yacy-websuche.de/viewtopic.php?p=20657#p20657
14 years ago
orbiter 0010cd9db1 Support for indexing of RSS feeds!
14 years ago
orbiter 3197ca42ed preparations to move the HTCache into cora:
14 years ago
orbiter 189a986ebd - modified api-call interface to record api calls with references to api-call database (carries pk)
14 years ago
sixcooler 15e8c13526 ... migrating to HttpComponents-Client-4.x ...
15 years ago
orbiter b6fb239e74 redesign of parser interface:
15 years ago
orbiter 1557e0f2d0 - some refactoring for internal RSSFeed (protocol of all actions as seen on status page)
15 years ago
orbiter 777195e8d1 more abstraction for access of LoaderDispatcher and cache
15 years ago
orbiter 3a1cebb598 bugfixes
15 years ago
orbiter 56ff9d5fd4 - extended news size from 512 to 1024 characters
15 years ago
orbiter 3f93a0cc8f redesign of remote proxy settings
15 years ago
orbiter 11639aef35 - added new protocol loader for 'file'-type URLs
15 years ago
orbiter 9842fab6e4 - fixes to query parameter
15 years ago
orbiter 1defd580bc - added option to localization search to distinguish between a search for a location according to the search word only or for the relation between web search results and locations found in the metadata fields
15 years ago
orbiter 2126c03a62 - removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer
15 years ago
orbiter cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data
15 years ago
orbiter c45117f81f fixed dates in metadata
15 years ago
orbiter 06ff0c5b06 fixes for metadata retrieval and presentation
15 years ago
suessthomas 5c5e6accdb Fixes for (X)HTML compatibility.
15 years ago
orbiter 7ab207d93a better presentation of search result metadata and fixes to htcache loading
15 years ago
orbiter 90c3e5d6f6 - cleanup, removed unused imports
15 years ago
orbiter 1a8a134e0c continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
15 years ago
orbiter 25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
15 years ago
mikeworks 7a3c19846f Updated German translation de.lng: added new Table_RobotsTxt_p.html and some other changes
15 years ago
orbiter 1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
15 years ago
orbiter 0018163c07 moved table row/column matching method from front-end to back-end
15 years ago
orbiter 3300930fc5 - (almost) fixed FTP crawler
15 years ago
orbiter 27b2998eb4 added searchtable function to more tables in interface
15 years ago
orbiter 3014e5f6f9 - integrated live search in the IndexControlURLs input window for URLs:
15 years ago
orbiter 0769517129 added a robots.txt monitor in the crawler monitor submenu
15 years ago
orbiter 840527689b more simplification of bookmark class
15 years ago
orbiter ada0ce9de3 refactoring of bookmarks: there is a big performance problem in the bookmarks code and furthermore the bookmarks
15 years ago
orbiter 2113fcd7e5 - fixed usage of isEmpty() which is not available in java 1.5
15 years ago
orbiter dd459281c8 applied code changes that are recommended by PMD
15 years ago
orbiter 362b7a929b added extensive memory protection logic to avoid out of memory errors that may be caused by the RowCollection memory allocation function
15 years ago
orbiter e34e63a039 preset of proper HashMap dimensions: should prevent re-hashing and increase performance
15 years ago
orbiter 4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where a size() operation becomes a large iteration.
15 years ago
orbiter 5399d1e2bc refactoring (reason: get more abstraction to use the blacklist class; for integration in other servlets)
15 years ago
orbiter 4c99d4683d possible fix for lost crawl profile handles: clean-up job did wrong measurement to see if crawl is still running.
15 years ago
orbiter 4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
15 years ago
orbiter 5e8038ac4d - refactoring of blacklists
15 years ago
orbiter 26fafd85a5 - more refactoring
15 years ago
orbiter 3528b970d6 - refactoring
15 years ago
orbiter b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
15 years ago
orbiter 5841ee83d3 refactoring
15 years ago
orbiter ce8dc575ca refactoring
15 years ago
orbiter bea3b99aff moved table and util classes
15 years ago
orbiter 1e4f8b56ed accumulated classes from different packages into the new rwi package
15 years ago
orbiter 4446acc8cd moved kelondro order
15 years ago
orbiter 735e2737e3 * added index segments
15 years ago
orbiter 031e6eefbd some updates to dublin core, metadata browsing, file indexing and parser stability
15 years ago
orbiter c0e17de2fb - fixes for some problems with the new crawling/caching strategies
16 years ago
orbiter 634a01a9a4 replaced wget-requests with caching requests
16 years ago
orbiter 1d8d51075c refactoring:
16 years ago
orbiter 5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
16 years ago
orbiter ca72ed7526 -removed superfluous crawl cache
16 years ago
orbiter 13c63f4082 a set of small fixes to crawling behaviour
16 years ago
f1ori 8931c8d6b4 improvements to Debian package:
16 years ago
orbiter 0e8647d62f refactoring of search classes
16 years ago
orbiter dafffd0153 refactoring of parsers and document processing
16 years ago
orbiter 154bbc3364 code cleanup: call of static methods directly to the class
16 years ago
orbiter bc6dd8194b refactoring: moved search query class to new search package
16 years ago
orbiter 945777aa80 replaced rwi term counting method by one that computes the maximum of the blobs that contribute to the RWI. An addition of the blob sizes is wrong/incorrect and does not reflect the real size. Truncating the size operation to the maximum of all blobs is also incorrect, but not as wrong as the sum of all blob sizes, which double-counts many rwi entries.
16 years ago
orbiter cc49aedf12 - fixed problem with remote search NPE
16 years ago
orbiter 88426912ad more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
16 years ago
orbiter 99bf0b8e41 refactoring of plasmaWordIndex:
16 years ago
orbiter fec6f9054f some refactoring of search methods
16 years ago
orbiter 63a0255166 - refactoring: added new content package, which will contain connector classes for different types of data sources to import texts into the YaCy index
16 years ago
orbiter e16c25ddf7 (peak-) performance hacks
16 years ago
orbiter c8624903c6 full redesign of index access data model:
16 years ago
f1ori dd6b5005ff * fix missing charset handling in getpageinfo_p
16 years ago
orbiter 89ec3acb3e - full abstraction of index content type: the kelondro full text index may now also contain indexes about other content than text, i.e. navigation indexes or reverse linking indexes.
16 years ago
orbiter c2359f20dd refactoring: better abstraction of reference and metadata prototypes.
16 years ago
orbiter a29a11e526 added evaluation of incoming links in webstructure api
16 years ago
orbiter 7ba078daa1 - added fast site-operator
16 years ago
orbiter bd409fb7ba added web structure analysis for a special domain that can be requested from the api.
16 years ago
borg-0300 8c494afcfe svn attributes added
16 years ago
orbiter 67aaffc0a2 - added Latency control to the crawler:
16 years ago